Project Python Foundations: FoodHub Data Analysis¶
Context¶
The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.
The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.
Objective¶
The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business.
Data Description¶
The data contains the different data related to a food order. The detailed data dictionary is given below.
Data Dictionary¶
- order_id: Unique ID of the order
- customer_id: ID of the customer who ordered the food
- restaurant_name: Name of the restaurant
- cuisine_type: Cuisine ordered by the customer
- cost_of_the_order: Cost of the order
- day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
- rating: Rating given by the customer out of 5
- food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
- delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off information
Let us start by importing the required libraries¶
# Installing the libraries with the specified version.
!pip install numpy==1.25.2 pandas==1.5.3 matplotlib==3.7.1 seaborn==0.13.1 -q --user
Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.
# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
Understanding the structure of the data¶
# uncomment and run the following lines for Google Colab
# from google.colab import drive
# drive.mount('/content/drive')
data = pd.read_csv('foodhub_order.csv')
data.head()
order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time | |
---|---|---|---|---|---|---|---|---|---|
0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | Not given | 25 | 20 |
1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | Not given | 25 | 23 |
2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5 | 23 | 28 |
3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3 | 25 | 15 |
4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4 | 25 | 24 |
Question 1: How many rows and columns are present in the data? [0.5 mark]¶
data.shape
(1898, 9)
Observations:¶
There are 1898 rows and 9 columns
Question 2: What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]¶
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1898 entries, 0 to 1897 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 order_id 1898 non-null int64 1 customer_id 1898 non-null int64 2 restaurant_name 1898 non-null object 3 cuisine_type 1898 non-null object 4 cost_of_the_order 1898 non-null float64 5 day_of_the_week 1898 non-null object 6 rating 1898 non-null object 7 food_preparation_time 1898 non-null int64 8 delivery_time 1898 non-null int64 dtypes: float64(1), int64(4), object(4) memory usage: 133.6+ KB
Observations:¶
- The data appears complete in each column. No missing (null) data.
- There are 5 numerical columns (two of which are lengths of time and two id columns).
- Four sting columns (day_of_the_week, cuisine_type, and rating are probably catagorical).
Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]¶
data.isnull().sum()
order_id 0 customer_id 0 restaurant_name 0 cuisine_type 0 cost_of_the_order 0 day_of_the_week 0 rating 0 food_preparation_time 0 delivery_time 0 dtype: int64
Observations:¶
- There are no missing values in the 9 columns of data
- The data does not need to be corrected for missing values
Check for duplicate data¶
data.duplicated().sum()
0
Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]¶
data.describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
order_id | 1898.0 | 1.477496e+06 | 548.049724 | 1476547.00 | 1477021.25 | 1477495.50 | 1.477970e+06 | 1478444.00 |
customer_id | 1898.0 | 1.711685e+05 | 113698.139743 | 1311.00 | 77787.75 | 128600.00 | 2.705250e+05 | 405334.00 |
cost_of_the_order | 1898.0 | 1.649885e+01 | 7.483812 | 4.47 | 12.08 | 14.14 | 2.229750e+01 | 35.41 |
food_preparation_time | 1898.0 | 2.737197e+01 | 4.632481 | 20.00 | 23.00 | 27.00 | 3.100000e+01 | 35.00 |
delivery_time | 1898.0 | 2.416175e+01 | 4.972637 | 15.00 | 20.00 | 25.00 | 2.800000e+01 | 33.00 |
Observations:¶
order_id
andcustomer_id
are indexes and only thecount
is meaningful: 1898 where all rows have the index valuescost_of_the_order
: The average cost of the order is 16.50. 75% of the orders cost less than 27.05. Orders can be as high as 40.53. All orders are greater than or equal to $4.47. There are outliers for costfood_preparation_time
: The entire range is between 20 and 35 with the average being 27. At first glance, this appears to be a normal distribution. The mean and the 50% percentile numbers are close. There are outliers, especially on the max side.delivery_time
: The delivery time is between 15 and 33. At first glance, this appears to be a normal distribution. The mean and the 50% percentile numbers are close. There are outliers on both sides.
Question 5: How many orders are not rated? [1 mark]¶
print(data['rating'].value_counts())
Not given 736 5 588 4 386 3 188 Name: rating, dtype: int64
Observation¶
- 736 are
Not given
- About 39% of orders are not rated
Exploratory Data Analysis (EDA)¶
Univariate Analysis¶
Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]¶
Observations on cost of order
#observation on cost_of_the_order
sns.displot(data=data,x='cost_of_the_order',kind='kde')
plt.show()
sns.boxplot(data=data,x='cost_of_the_order')
plt.show()
- The distribution is skewed toward the right
- There do not appear to be any outliers
Observations on food preparation time
#observation on food_preparation_time
sns.displot(data=data,x='food_preparation_time',kind='kde')
plt.show()
sns.boxplot(data=data,x='food_preparation_time')
plt.show()
- The distribution of
food_perparation_time
appears somewhat normal
Observation on delivery time
#observation on delivery_time
sns.displot(data=data,x='delivery_time',kind='kde')
plt.show()
sns.boxplot(data=data,x='delivery_time')
plt.show()
- Delivery time is left skewed
Observation on cuisine type
#observation of the cusine_type
sns.countplot(data=data,x='cuisine_type')
plt.xticks(rotation=90)
plt.show()
print(data['cuisine_type'].value_counts())
American 584 Japanese 470 Italian 298 Chinese 215 Mexican 77 Indian 73 Middle Eastern 49 Mediterranean 46 Thai 19 French 18 Southern 17 Korean 13 Spanish 12 Vietnamese 7 Name: cuisine_type, dtype: int64
- 14 cusine types
- American is the most popular
- The top 4 cuisines have 83% share
Observation on the day of the week
#observation of the day_of_the_week
sns.countplot(data=data,x='day_of_the_week')
plt.xticks(rotation=90)
plt.show()
print(data['day_of_the_week'].value_counts())
Weekend 1351 Weekday 547 Name: day_of_the_week, dtype: int64
- Weekend is 71%; weekday is 28%
- Weekend has high activity
Observation by restaurant
#observation of restaurant_name
plt.figure(figsize=(15,5))
sns.countplot(data=data,x='restaurant_name')
plt.xticks(rotation=90)
plt.show()
C:\Users\bruce\anaconda3\Lib\site-packages\IPython\core\pylabtools.py:152: UserWarning: Glyph 140 (\x8c) missing from current font. fig.canvas.print_figure(bytes_io, **kw) C:\Users\bruce\anaconda3\Lib\site-packages\IPython\core\pylabtools.py:152: UserWarning: Glyph 142 (\x8e) missing from current font. fig.canvas.print_figure(bytes_io, **kw)
print(data['restaurant_name'].value_counts())
Shake Shack 219 The Meatball Shop 132 Blue Ribbon Sushi 119 Blue Ribbon Fried Chicken 96 Parm 68 ... Sushi Choshi 1 Dos Caminos Soho 1 La Follia 1 Philippe Chow 1 'wichcraft 1 Name: restaurant_name, Length: 178, dtype: int64
Observation of restaurants¶
- Top three restaurants had nearly 25% of the orders
- Top five restaurants had a third of the orders
- A few restaurants dominate based on the number of order
#observation of rating
sns.countplot(data=data,x='rating')
plt.xticks(rotation=90)
plt.show()
print(data['rating'].value_counts())
Not given 736 5 588 4 386 3 188 Name: rating, dtype: int64
- Nearly 39% of
rating
has valueNot given
- Rating of 5 had the highest number
- Rating of 4 and 3 combined had nearly the same number of orders as the 5s
Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]¶
data['restaurant_name'].value_counts().head(n=5)
Shake Shack 219 The Meatball Shop 132 Blue Ribbon Sushi 119 Blue Ribbon Fried Chicken 96 Parm 68 Name: restaurant_name, dtype: int64
Observations:¶
- Top 5 restaurants are 33% of the total
- Shake Shack has the most number of orders by far
Question 8: Which is the most popular cuisine on weekends? [1 mark]¶
sns.countplot(data=data.loc[data['day_of_the_week']=="Weekend"],x='cuisine_type')
plt.xticks(rotation=90)
plt.show()
weekend_cuisines = data.loc[data['day_of_the_week']=="Weekend",'cuisine_type']
print(weekend_cuisines.value_counts(dropna=False))
weekend_cuisines.count()
American 415 Japanese 335 Italian 207 Chinese 163 Mexican 53 Indian 49 Mediterranean 32 Middle Eastern 32 Thai 15 French 13 Korean 11 Southern 11 Spanish 11 Vietnamese 4 Name: cuisine_type, dtype: int64
1351
Observations:¶
- American is the most popular
- The top four (American, Japanese, Italian, Chinese) dominate both weekend and weekday
- The top four cuisines have 83% of the weekend orders
Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]¶
order_count_greater_than_20 = data.loc[data['cost_of_the_order']>20].shape[0]
format(order_count_greater_than_20/data['cost_of_the_order'].count(), ".0%")
'29%'
Observations:¶
29% of the orders cost more than \$20
Question 10: What is the mean order delivery time? [1 mark]¶
data['delivery_time'].mean()
24.161749209694417
Observations:¶
- Mean order delivery time is 24
Question 11: The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]¶
data['customer_id'].value_counts().head(3)
52832 13 47440 10 83287 9 Name: customer_id, dtype: int64
Observations:¶
The three most frequest customers were ID: 52832, 47442, and 83287
Multivariate Analysis¶
Question 12: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables) [10 marks]¶
# plt.figure(figsize=(10,5))
data_without_id_columns= data.drop(columns=["order_id", "customer_id"])
sns.heatmap(data_without_id_columns.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
plt.show()
C:\Users\bruce\AppData\Local\Temp\ipykernel_23616\4285889247.py:3: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning. sns.heatmap(data_without_id_columns.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
sns.pairplot(data, diag_kind="kde");
- Weak correlation between cost, preparation time, and delivery time
plt.figure(figsize=(10,5))
sns.scatterplot(data=data,x='food_preparation_time',y='delivery_time')
plt.show()
When comparing food preparation time to delivery time, no pattern is revealed
Cuisine¶
Lets check relationship between cuisine and cost of the order
sns.boxplot(data=data,x='cuisine_type',y='cost_of_the_order')
plt.xticks(rotation=90)
plt.show()
# Dispersion of price in every region
sns.catplot(x='cost_of_the_order',
col='cuisine_type',
data=data,
col_wrap=4,
kind="violin")
plt.show()
- Southern has the biggest IRQ range and Korean has the smallest
- French has the highest average cost per order
- Vietnamese has the least average cost per order
Lets check relationship between cuisine and food preparation time
sns.boxplot(data=data,x='cuisine_type',y='food_preparation_time')
plt.xticks(rotation=90)
plt.show()
- Food takes between 25 and 28 minutes to prepare
- Vietnamese and Korean take the shortest time to prepare
- Italian and Thai take the longest to prepare
Lets check the relationship between cuisine and delivery time
sns.boxplot(data=data,x='cuisine_type',y='delivery_time')
plt.xticks(rotation=90)
plt.show()
- Generally Japanese, American, Italian, Mediteranean, Middle Easter, Southern can have delivery times between the 15 (min) and 32 (max).
- French on average has the longest delivery times
- Korean has the shortest delivery time
- Japanese, American, Italian, Indian, Southern have similar average delivery times
Lets check the relationship between cuisine and the day of the week
sns.countplot(data=data,x='cuisine_type',hue='day_of_the_week')
plt.xticks(rotation=90)
plt.show()
- The orders seem to be proportional between each of the cuisine types regardless of the day of the week.
Lets compare the cusisine type to cost of the order and consider the day of the week
sns.scatterplot(data=data, x="cuisine_type", y="cost_of_the_order", hue="day_of_the_week")
plt.xticks(rotation=90)
plt.show()
There may be a lot of overlapping data, so we may want to use a stripplot
plt.figure(figsize=(25, 7)) # To resize the plot
sns.stripplot(data = data, x = "cuisine_type", y = "cost_of_the_order", hue="day_of_the_week"
)
plt.xticks(rotation=90);
Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]¶
Let's begin by keeping the data that is rated
# remove the orders that were 'Not Rated'
rated_data = data.loc[data['rating'] != 'Not given'].reset_index()
# convert the ratings to an int
rated_data['order_rating'] = rated_data.loc[:,'rating'].astype(int)
rated_data.head()
index | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time | order_rating | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5 | 23 | 28 | 5 |
1 | 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3 | 25 | 15 | 3 |
2 | 4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4 | 25 | 24 | 4 |
3 | 5 | 1477224 | 147468 | Tamarind TriBeCa | Indian | 25.22 | Weekday | 3 | 20 | 24 | 3 |
4 | 7 | 1477859 | 89574 | Barbounia | Mediterranean | 5.97 | Weekday | 3 | 33 | 30 | 3 |
Next, let's get restrauants who have more than 50 ratings
# create a variable that contains the restraurants and their rating counts
restaurant_rating_count = rated_data.groupby(['restaurant_name'])['order_rating'].count().reset_index()
# rename the rating column to rating_counts and sort them in descending order
restaurant_rating_count.rename(columns={'order_rating':'rating_counts'},inplace=True)
# restaurant_rating_count.sort_values(ascending=False, by= 'rating_counts')
# get list of restaurant with rating greater than 50
restaurant_rating_50 = restaurant_rating_count[restaurant_rating_count['rating_counts']>50]
restaurant_rating_50
restaurant_name | rating_counts | |
---|---|---|
16 | Blue Ribbon Fried Chicken | 64 |
17 | Blue Ribbon Sushi | 73 |
117 | Shake Shack | 133 |
132 | The Meatball Shop | 84 |
Now let's get the restrauants who have an average rating greater than 4
restaurant_average_rating = rated_data.groupby(['restaurant_name'])[['order_rating']].mean().reset_index()
restaurant_average_rating.rename(columns={'order_rating':'avg_rating'},inplace=True)
# get the list of restraurants who have a rating greater than 4
rating_greater_4 = restaurant_average_rating[restaurant_average_rating['avg_rating']>4]
rating_greater_4
restaurant_name | avg_rating | |
---|---|---|
0 | 'wichcraft | 5.000000 |
1 | 12 Chairs | 4.500000 |
3 | 67 Burger | 5.000000 |
4 | Amma | 4.500000 |
6 | Anjappar Chettinad | 5.000000 |
... | ... | ... |
149 | Yama 49 | 5.000000 |
150 | Yama Japanese Restaurant | 4.500000 |
153 | da Umberto | 5.000000 |
154 | ilili Restaurant | 4.153846 |
155 | indikitch | 4.500000 |
110 rows × 2 columns
We have a list of restaurants that have more than 50 ratings (in a dataframe named restaurant_rating_50
)
And we have a list of restraunts that have an average rating greater than 4 (in a dataframe named rating_greater_4
)
Merge both datasets, and create a combined view on the basis of average rating and rating counts
promo_restaurants = restaurant_rating_50.merge(rating_greater_4, on = 'restaurant_name', how = 'inner')
promo_restaurants
restaurant_name | rating_counts | avg_rating | |
---|---|---|---|
0 | Blue Ribbon Fried Chicken | 64 | 4.328125 |
1 | Blue Ribbon Sushi | 73 | 4.219178 |
2 | Shake Shack | 133 | 4.278195 |
3 | The Meatball Shop | 84 | 4.511905 |
Observations:¶
4 restaurants had more than 50 ratings
110 restaurants had a rating above 4
4 restaurants qualified for the promotion:
- Blue Ribbon Fried Chicken
- Blue Ribbon Sushi
- Shake Shack
- The Meatball Shop
Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]¶
Create a function with conditional statements (for each category, 25%, 15% and 0%(else condition) and mention the revenue for each condition.
def calculate_revenue(cost_of_the_order):
if cost_of_the_order > 20:
return cost_of_the_order * .25
elif cost_of_the_order > 5:
return cost_of_the_order * .15
else:
return 0
# test the function, numbers should match
print(20*.15, calculate_revenue(20))
print(22*.25, calculate_revenue(22))
print(5*0, calculate_revenue(5))
3.0 3.0 5.5 5.5 0 0
Apply the function to the cost_of_the_order column to calculate the revenue, sum the value in a revenue column.
data['revenue'] = data['cost_of_the_order'].map(lambda x: calculate_revenue(x))
# sanity check the revenue column with the cost_of_order in the first two rows
print(30.75*.25, calculate_revenue(30.75))
print(12.08*.15, calculate_revenue(12.08))
data
7.6875 7.6875 1.8119999999999998 1.8119999999999998
order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time | revenue | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | Not given | 25 | 20 | 7.6875 |
1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | Not given | 25 | 23 | 1.8120 |
2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5 | 23 | 28 | 1.8345 |
3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3 | 25 | 15 | 7.3000 |
4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4 | 25 | 24 | 1.7385 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1893 | 1476701 | 292602 | Chipotle Mexican Grill $1.99 Delivery | Mexican | 22.31 | Weekend | 5 | 31 | 17 | 5.5775 |
1894 | 1477421 | 397537 | The Smile | American | 12.18 | Weekend | 5 | 31 | 19 | 1.8270 |
1895 | 1477819 | 35309 | Blue Ribbon Sushi | Japanese | 25.22 | Weekday | Not given | 31 | 24 | 6.3050 |
1896 | 1477513 | 64151 | Jack's Wife Freda | Mediterranean | 12.18 | Weekday | 5 | 23 | 31 | 1.8270 |
1897 | 1478056 | 120353 | Blue Ribbon Sushi | Japanese | 19.45 | Weekend | Not given | 28 | 24 | 2.9175 |
1898 rows × 10 columns
The sum of the revenue column will give the total revenue
data['revenue'].sum()
6166.303
Observations:¶
- Of the 1898 orders, the revenue was 6166.30.
- The revenue for the average order was
data['revenue'].mean().round(2)
3.25
Question 15: The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]¶
Create a column with the total food preparation time and the delivery time
data['total_time'] = data.apply(lambda x: x['food_preparation_time'] + x['delivery_time'] , axis=1)
# sanity check the result
data.loc[:, ['food_preparation_time', 'delivery_time', 'total_time']]
food_preparation_time | delivery_time | total_time | |
---|---|---|---|
0 | 25 | 20 | 45 |
1 | 25 | 23 | 48 |
2 | 23 | 28 | 51 |
3 | 25 | 15 | 40 |
4 | 25 | 24 | 49 |
... | ... | ... | ... |
1893 | 31 | 17 | 48 |
1894 | 31 | 19 | 50 |
1895 | 31 | 24 | 55 |
1896 | 23 | 31 | 54 |
1897 | 28 | 24 | 52 |
1898 rows × 3 columns
Get the number of rows where the total time is greater than 60 and divide it by the total number of rows
orders_greaterthan_60 = data[data.total_time > 60].shape[0]
np.round(orders_greaterthan_60/data.shape[0], 2)
0.11
Determine if correlations between total time and other variables
plt.figure(figsize=(10,5))
data_without_id_columns= data.drop(columns=["order_id", "customer_id"])
sns.heatmap(data_without_id_columns.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
plt.show()
C:\Users\bruce\AppData\Local\Temp\ipykernel_23616\2183855587.py:3: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning. sns.heatmap(data_without_id_columns.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
Observations:¶
- 11 percent of the orders take more than 60 minutes considering the time it takes to prepare food and then delivered
- Total time for the order is highly correlated to delivery time
Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]¶
Get the mean delivery time for all of the days
data.loc[:,'delivery_time'].mean()
24.161749209694417
Get the mean delivery time for just the weekends
data.groupby(['day_of_the_week'])[['delivery_time']].mean()
delivery_time | |
---|---|
day_of_the_week | |
Weekday | 28.340037 |
Weekend | 22.470022 |
View the boxpot to visually inspect the delivery time for the weekend and weekday
sns.boxplot(data=data,x='day_of_the_week',y='delivery_time')
plt.xticks(rotation=90)
plt.show()
Observations:¶
- Weekend deliveries are about 6 minutes faster than weekdays
- There are more orders on weekends with faster deliveries than weekdays
- Weekday deliveries are 4 minutes longer than the average delivery
- Weekend deliveries are 2 minutes faster than the average delivery
Conclusion and Recommendations¶
Question 17: What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]¶
Conclusions:¶
- Weekends dominate over weekday: Weekend is 71%; weekday is 28%
- Weekend deliveries are faster than weekday deliveries
- Shake Shack has significant share of about 12% of the orders
- The top four cuisines (American, Japanese, Italian, Chinese) dominate
- The top four cuisines have 83% of the weekend orders
- nearly 40% of ratings are not given
- The same restaurants that qualified for the promotion are the top four restaurants for orders
- 11% of orders take an hour or more to arrive
- 29% of orders are over $20
Recommendations:¶
Grow the business on weekdays
- The service has capacity to grow on weekdays as suggested by the disparity of weekends dominating the number of orders
- Promote the promotional offer to the top restaurants on weekdays
Improve delivery times on weekdays
- Weekdays have an opportunity to grow. Suggest working closely with delivery firms/individuals to speed up weekday delivery
- Provide offers for weekday buyers near to restarurants
Grow the business with cuisines outside the top 4
- Highlight lesser served restaurants as alternative choices
- Target customers who have alternative restaurants nearby (reducing delivery time) with promotion
- Reward customers who often use the service to try alternative restaurants
Increase ratings for order
- Provide incentives for new raters
- Follow up those who do not rate on receipt a few hours after their order is delivered
Increase order size
- Offer incentives to increase order size to over $20