Project Python Foundations: FoodHub Data Analysis¶

Context¶

The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.

Objective¶

The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business.

Data Description¶

The data contains the different data related to a food order. The detailed data dictionary is given below.

Data Dictionary¶

  • order_id: Unique ID of the order
  • customer_id: ID of the customer who ordered the food
  • restaurant_name: Name of the restaurant
  • cuisine_type: Cuisine ordered by the customer
  • cost_of_the_order: Cost of the order
  • day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
  • rating: Rating given by the customer out of 5
  • food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
  • delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off information

Let us start by importing the required libraries¶

In [4]:
# Installing the libraries with the specified version.
!pip install numpy==1.25.2 pandas==1.5.3 matplotlib==3.7.1 seaborn==0.13.1 -q --user

Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.

In [6]:
# import libraries for data manipulation
import numpy as np
import pandas as pd

# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

Understanding the structure of the data¶

In [8]:
# uncomment and run the following lines for Google Colab
# from google.colab import drive
# drive.mount('/content/drive')
In [9]:
data = pd.read_csv('foodhub_order.csv')
In [10]:
data.head()
Out[10]:
order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time
0 1477147 337525 Hangawi Korean 30.75 Weekend Not given 25 20
1 1477685 358141 Blue Ribbon Sushi Izakaya Japanese 12.08 Weekend Not given 25 23
2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28
3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15
4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24

Question 1: How many rows and columns are present in the data? [0.5 mark]¶

In [12]:
data.shape
Out[12]:
(1898, 9)

Observations:¶

There are 1898 rows and 9 columns

Question 2: What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]¶

In [15]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order_id               1898 non-null   int64  
 1   customer_id            1898 non-null   int64  
 2   restaurant_name        1898 non-null   object 
 3   cuisine_type           1898 non-null   object 
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object 
 6   rating                 1898 non-null   object 
 7   food_preparation_time  1898 non-null   int64  
 8   delivery_time          1898 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB

Observations:¶

  • The data appears complete in each column. No missing (null) data.
  • There are 5 numerical columns (two of which are lengths of time and two id columns).
  • Four sting columns (day_of_the_week, cuisine_type, and rating are probably catagorical).

Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]¶

In [18]:
data.isnull().sum()
Out[18]:
order_id                 0
customer_id              0
restaurant_name          0
cuisine_type             0
cost_of_the_order        0
day_of_the_week          0
rating                   0
food_preparation_time    0
delivery_time            0
dtype: int64

Observations:¶

  • There are no missing values in the 9 columns of data
  • The data does not need to be corrected for missing values

Check for duplicate data¶

In [21]:
data.duplicated().sum()
Out[21]:
0

Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]¶

In [23]:
data.describe().T
Out[23]:
count mean std min 25% 50% 75% max
order_id 1898.0 1.477496e+06 548.049724 1476547.00 1477021.25 1477495.50 1.477970e+06 1478444.00
customer_id 1898.0 1.711685e+05 113698.139743 1311.00 77787.75 128600.00 2.705250e+05 405334.00
cost_of_the_order 1898.0 1.649885e+01 7.483812 4.47 12.08 14.14 2.229750e+01 35.41
food_preparation_time 1898.0 2.737197e+01 4.632481 20.00 23.00 27.00 3.100000e+01 35.00
delivery_time 1898.0 2.416175e+01 4.972637 15.00 20.00 25.00 2.800000e+01 33.00

Observations:¶

  • order_id and customer_id are indexes and only the count is meaningful: 1898 where all rows have the index values
  • cost_of_the_order: The average cost of the order is 16.50. 75% of the orders cost less than 27.05. Orders can be as high as 40.53. All orders are greater than or equal to $4.47. There are outliers for cost
  • food_preparation_time: The entire range is between 20 and 35 with the average being 27. At first glance, this appears to be a normal distribution. The mean and the 50% percentile numbers are close. There are outliers, especially on the max side.
  • delivery_time: The delivery time is between 15 and 33. At first glance, this appears to be a normal distribution. The mean and the 50% percentile numbers are close. There are outliers on both sides.

Question 5: How many orders are not rated? [1 mark]¶

In [26]:
print(data['rating'].value_counts())
Not given    736
5            588
4            386
3            188
Name: rating, dtype: int64

Observation¶

  • 736 are Not given
  • About 39% of orders are not rated

Exploratory Data Analysis (EDA)¶

Univariate Analysis¶

Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]¶

Observations on cost of order

In [32]:
#observation on cost_of_the_order
sns.displot(data=data,x='cost_of_the_order',kind='kde')
plt.show()
sns.boxplot(data=data,x='cost_of_the_order')
plt.show()
No description has been provided for this image
No description has been provided for this image
  • The distribution is skewed toward the right
  • There do not appear to be any outliers

Observations on food preparation time

In [35]:
#observation on food_preparation_time
sns.displot(data=data,x='food_preparation_time',kind='kde')
plt.show()
sns.boxplot(data=data,x='food_preparation_time')
plt.show()
No description has been provided for this image
No description has been provided for this image
  • The distribution of food_perparation_time appears somewhat normal

Observation on delivery time

In [38]:
#observation on delivery_time
sns.displot(data=data,x='delivery_time',kind='kde')
plt.show()
sns.boxplot(data=data,x='delivery_time')
plt.show()
No description has been provided for this image
No description has been provided for this image
  • Delivery time is left skewed

Observation on cuisine type

In [41]:
#observation of the cusine_type
sns.countplot(data=data,x='cuisine_type')
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image
In [42]:
print(data['cuisine_type'].value_counts())
American          584
Japanese          470
Italian           298
Chinese           215
Mexican            77
Indian             73
Middle Eastern     49
Mediterranean      46
Thai               19
French             18
Southern           17
Korean             13
Spanish            12
Vietnamese          7
Name: cuisine_type, dtype: int64
  • 14 cusine types
  • American is the most popular
  • The top 4 cuisines have 83% share

Observation on the day of the week

In [45]:
#observation of the day_of_the_week
sns.countplot(data=data,x='day_of_the_week')
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image
In [46]:
print(data['day_of_the_week'].value_counts())
Weekend    1351
Weekday     547
Name: day_of_the_week, dtype: int64
  • Weekend is 71%; weekday is 28%
  • Weekend has high activity

Observation by restaurant

In [49]:
#observation of restaurant_name
plt.figure(figsize=(15,5))
sns.countplot(data=data,x='restaurant_name')
plt.xticks(rotation=90)
plt.show()
C:\Users\bruce\anaconda3\Lib\site-packages\IPython\core\pylabtools.py:152: UserWarning: Glyph 140 (\x8c) missing from current font.
  fig.canvas.print_figure(bytes_io, **kw)
C:\Users\bruce\anaconda3\Lib\site-packages\IPython\core\pylabtools.py:152: UserWarning: Glyph 142 (\x8e) missing from current font.
  fig.canvas.print_figure(bytes_io, **kw)
No description has been provided for this image
In [50]:
print(data['restaurant_name'].value_counts())
Shake Shack                  219
The Meatball Shop            132
Blue Ribbon Sushi            119
Blue Ribbon Fried Chicken     96
Parm                          68
                            ... 
Sushi Choshi                   1
Dos Caminos Soho               1
La Follia                      1
Philippe Chow                  1
'wichcraft                     1
Name: restaurant_name, Length: 178, dtype: int64

Observation of restaurants¶

  • Top three restaurants had nearly 25% of the orders
  • Top five restaurants had a third of the orders
  • A few restaurants dominate based on the number of order
In [52]:
#observation of rating
sns.countplot(data=data,x='rating')
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image
In [53]:
print(data['rating'].value_counts())
Not given    736
5            588
4            386
3            188
Name: rating, dtype: int64
  • Nearly 39% of rating has value Not given
  • Rating of 5 had the highest number
  • Rating of 4 and 3 combined had nearly the same number of orders as the 5s

Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]¶

In [56]:
data['restaurant_name'].value_counts().head(n=5)
Out[56]:
Shake Shack                  219
The Meatball Shop            132
Blue Ribbon Sushi            119
Blue Ribbon Fried Chicken     96
Parm                          68
Name: restaurant_name, dtype: int64

Observations:¶

  • Top 5 restaurants are 33% of the total
  • Shake Shack has the most number of orders by far

Question 8: Which is the most popular cuisine on weekends? [1 mark]¶

In [59]:
sns.countplot(data=data.loc[data['day_of_the_week']=="Weekend"],x='cuisine_type')
plt.xticks(rotation=90)
plt.show()

weekend_cuisines = data.loc[data['day_of_the_week']=="Weekend",'cuisine_type']
print(weekend_cuisines.value_counts(dropna=False))
weekend_cuisines.count()
No description has been provided for this image
American          415
Japanese          335
Italian           207
Chinese           163
Mexican            53
Indian             49
Mediterranean      32
Middle Eastern     32
Thai               15
French             13
Korean             11
Southern           11
Spanish            11
Vietnamese          4
Name: cuisine_type, dtype: int64
Out[59]:
1351

Observations:¶

  • American is the most popular
  • The top four (American, Japanese, Italian, Chinese) dominate both weekend and weekday
  • The top four cuisines have 83% of the weekend orders

Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]¶

In [62]:
order_count_greater_than_20 = data.loc[data['cost_of_the_order']>20].shape[0]
format(order_count_greater_than_20/data['cost_of_the_order'].count(), ".0%")
Out[62]:
'29%'

Observations:¶

29% of the orders cost more than \$20

Question 10: What is the mean order delivery time? [1 mark]¶

In [65]:
data['delivery_time'].mean()
Out[65]:
24.161749209694417

Observations:¶

  • Mean order delivery time is 24

Question 11: The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]¶

In [68]:
data['customer_id'].value_counts().head(3)
Out[68]:
52832    13
47440    10
83287     9
Name: customer_id, dtype: int64

Observations:¶

The three most frequest customers were ID: 52832, 47442, and 83287

Multivariate Analysis¶

Question 12: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables) [10 marks]¶

In [72]:
# plt.figure(figsize=(10,5))
data_without_id_columns= data.drop(columns=["order_id", "customer_id"])
sns.heatmap(data_without_id_columns.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
plt.show()
C:\Users\bruce\AppData\Local\Temp\ipykernel_23616\4285889247.py:3: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  sns.heatmap(data_without_id_columns.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
No description has been provided for this image
In [73]:
sns.pairplot(data, diag_kind="kde");
No description has been provided for this image
  • Weak correlation between cost, preparation time, and delivery time
In [75]:
plt.figure(figsize=(10,5))
sns.scatterplot(data=data,x='food_preparation_time',y='delivery_time')
plt.show()
No description has been provided for this image

When comparing food preparation time to delivery time, no pattern is revealed

Cuisine¶

Lets check relationship between cuisine and cost of the order

In [78]:
sns.boxplot(data=data,x='cuisine_type',y='cost_of_the_order')
plt.xticks(rotation=90)
plt.show()

# Dispersion of price in every region
sns.catplot(x='cost_of_the_order',
            col='cuisine_type', 
            data=data,
            col_wrap=4,
            kind="violin")
plt.show()
No description has been provided for this image
No description has been provided for this image
  • Southern has the biggest IRQ range and Korean has the smallest
  • French has the highest average cost per order
  • Vietnamese has the least average cost per order

Lets check relationship between cuisine and food preparation time

In [81]:
sns.boxplot(data=data,x='cuisine_type',y='food_preparation_time')
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image
  • Food takes between 25 and 28 minutes to prepare
  • Vietnamese and Korean take the shortest time to prepare
  • Italian and Thai take the longest to prepare

Lets check the relationship between cuisine and delivery time

In [84]:
sns.boxplot(data=data,x='cuisine_type',y='delivery_time')
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image
  • Generally Japanese, American, Italian, Mediteranean, Middle Easter, Southern can have delivery times between the 15 (min) and 32 (max).
  • French on average has the longest delivery times
  • Korean has the shortest delivery time
  • Japanese, American, Italian, Indian, Southern have similar average delivery times

Lets check the relationship between cuisine and the day of the week

In [87]:
sns.countplot(data=data,x='cuisine_type',hue='day_of_the_week')
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image
  • The orders seem to be proportional between each of the cuisine types regardless of the day of the week.

Lets compare the cusisine type to cost of the order and consider the day of the week

In [90]:
sns.scatterplot(data=data, x="cuisine_type", y="cost_of_the_order", hue="day_of_the_week")
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image

There may be a lot of overlapping data, so we may want to use a stripplot

In [92]:
plt.figure(figsize=(25, 7)) # To resize the plot
sns.stripplot(data = data, x = "cuisine_type", y = "cost_of_the_order", hue="day_of_the_week"
            )
plt.xticks(rotation=90);
No description has been provided for this image

Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]¶

Let's begin by keeping the data that is rated

In [95]:
# remove the orders that were 'Not Rated'
rated_data = data.loc[data['rating'] != 'Not given'].reset_index()
# convert the ratings to an int
rated_data['order_rating'] = rated_data.loc[:,'rating'].astype(int)
rated_data.head()
Out[95]:
index order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time order_rating
0 2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28 5
1 3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15 3
2 4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24 4
3 5 1477224 147468 Tamarind TriBeCa Indian 25.22 Weekday 3 20 24 3
4 7 1477859 89574 Barbounia Mediterranean 5.97 Weekday 3 33 30 3

Next, let's get restrauants who have more than 50 ratings

In [97]:
# create a variable that contains the restraurants and their rating counts
restaurant_rating_count = rated_data.groupby(['restaurant_name'])['order_rating'].count().reset_index()
# rename the rating column to rating_counts and sort them in descending order
restaurant_rating_count.rename(columns={'order_rating':'rating_counts'},inplace=True)
# restaurant_rating_count.sort_values(ascending=False, by= 'rating_counts')
# get list of restaurant with rating greater than 50
restaurant_rating_50 = restaurant_rating_count[restaurant_rating_count['rating_counts']>50]
restaurant_rating_50
Out[97]:
restaurant_name rating_counts
16 Blue Ribbon Fried Chicken 64
17 Blue Ribbon Sushi 73
117 Shake Shack 133
132 The Meatball Shop 84

Now let's get the restrauants who have an average rating greater than 4

In [99]:
restaurant_average_rating = rated_data.groupby(['restaurant_name'])[['order_rating']].mean().reset_index()
restaurant_average_rating.rename(columns={'order_rating':'avg_rating'},inplace=True)
# get the list of restraurants who have a rating greater than 4
rating_greater_4 = restaurant_average_rating[restaurant_average_rating['avg_rating']>4]
rating_greater_4
Out[99]:
restaurant_name avg_rating
0 'wichcraft 5.000000
1 12 Chairs 4.500000
3 67 Burger 5.000000
4 Amma 4.500000
6 Anjappar Chettinad 5.000000
... ... ...
149 Yama 49 5.000000
150 Yama Japanese Restaurant 4.500000
153 da Umberto 5.000000
154 ilili Restaurant 4.153846
155 indikitch 4.500000

110 rows × 2 columns

We have a list of restaurants that have more than 50 ratings (in a dataframe named restaurant_rating_50) And we have a list of restraunts that have an average rating greater than 4 (in a dataframe named rating_greater_4)

Merge both datasets, and create a combined view on the basis of average rating and rating counts

In [101]:
promo_restaurants = restaurant_rating_50.merge(rating_greater_4, on = 'restaurant_name', how = 'inner')
promo_restaurants
Out[101]:
restaurant_name rating_counts avg_rating
0 Blue Ribbon Fried Chicken 64 4.328125
1 Blue Ribbon Sushi 73 4.219178
2 Shake Shack 133 4.278195
3 The Meatball Shop 84 4.511905

Observations:¶

  • 4 restaurants had more than 50 ratings

  • 110 restaurants had a rating above 4

  • 4 restaurants qualified for the promotion:

    • Blue Ribbon Fried Chicken
    • Blue Ribbon Sushi
    • Shake Shack
    • The Meatball Shop

Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]¶

Create a function with conditional statements (for each category, 25%, 15% and 0%(else condition) and mention the revenue for each condition.

In [106]:
def calculate_revenue(cost_of_the_order):
    if cost_of_the_order > 20:
        return cost_of_the_order * .25
    elif cost_of_the_order > 5:
        return cost_of_the_order * .15
    else:
        return 0

# test the function, numbers should match
print(20*.15, calculate_revenue(20))
print(22*.25, calculate_revenue(22))
print(5*0, calculate_revenue(5))
3.0 3.0
5.5 5.5
0 0

Apply the function to the cost_of_the_order column to calculate the revenue, sum the value in a revenue column.

In [108]:
data['revenue'] = data['cost_of_the_order'].map(lambda x: calculate_revenue(x))
# sanity check the revenue column with the cost_of_order in the first two rows
print(30.75*.25, calculate_revenue(30.75))
print(12.08*.15, calculate_revenue(12.08))
data
7.6875 7.6875
1.8119999999999998 1.8119999999999998
Out[108]:
order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time revenue
0 1477147 337525 Hangawi Korean 30.75 Weekend Not given 25 20 7.6875
1 1477685 358141 Blue Ribbon Sushi Izakaya Japanese 12.08 Weekend Not given 25 23 1.8120
2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28 1.8345
3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15 7.3000
4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24 1.7385
... ... ... ... ... ... ... ... ... ... ...
1893 1476701 292602 Chipotle Mexican Grill $1.99 Delivery Mexican 22.31 Weekend 5 31 17 5.5775
1894 1477421 397537 The Smile American 12.18 Weekend 5 31 19 1.8270
1895 1477819 35309 Blue Ribbon Sushi Japanese 25.22 Weekday Not given 31 24 6.3050
1896 1477513 64151 Jack's Wife Freda Mediterranean 12.18 Weekday 5 23 31 1.8270
1897 1478056 120353 Blue Ribbon Sushi Japanese 19.45 Weekend Not given 28 24 2.9175

1898 rows × 10 columns

The sum of the revenue column will give the total revenue

In [110]:
data['revenue'].sum()
Out[110]:
6166.303

Observations:¶

  • Of the 1898 orders, the revenue was 6166.30.
  • The revenue for the average order was
In [113]:
data['revenue'].mean().round(2)
Out[113]:
3.25

Question 15: The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]¶

Create a column with the total food preparation time and the delivery time

In [116]:
data['total_time'] = data.apply(lambda x: x['food_preparation_time'] + x['delivery_time'] , axis=1)
# sanity check the result
data.loc[:, ['food_preparation_time', 'delivery_time', 'total_time']]
Out[116]:
food_preparation_time delivery_time total_time
0 25 20 45
1 25 23 48
2 23 28 51
3 25 15 40
4 25 24 49
... ... ... ...
1893 31 17 48
1894 31 19 50
1895 31 24 55
1896 23 31 54
1897 28 24 52

1898 rows × 3 columns

Get the number of rows where the total time is greater than 60 and divide it by the total number of rows

In [118]:
orders_greaterthan_60 = data[data.total_time > 60].shape[0]
np.round(orders_greaterthan_60/data.shape[0], 2)
Out[118]:
0.11

Determine if correlations between total time and other variables

In [120]:
plt.figure(figsize=(10,5))
data_without_id_columns= data.drop(columns=["order_id", "customer_id"])
sns.heatmap(data_without_id_columns.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
plt.show()
C:\Users\bruce\AppData\Local\Temp\ipykernel_23616\2183855587.py:3: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  sns.heatmap(data_without_id_columns.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
No description has been provided for this image

Observations:¶

  • 11 percent of the orders take more than 60 minutes considering the time it takes to prepare food and then delivered
  • Total time for the order is highly correlated to delivery time

Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]¶

Get the mean delivery time for all of the days

In [125]:
data.loc[:,'delivery_time'].mean()
Out[125]:
24.161749209694417

Get the mean delivery time for just the weekends

In [127]:
data.groupby(['day_of_the_week'])[['delivery_time']].mean()
Out[127]:
delivery_time
day_of_the_week
Weekday 28.340037
Weekend 22.470022

View the boxpot to visually inspect the delivery time for the weekend and weekday

In [129]:
sns.boxplot(data=data,x='day_of_the_week',y='delivery_time')
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image

Observations:¶

  • Weekend deliveries are about 6 minutes faster than weekdays
  • There are more orders on weekends with faster deliveries than weekdays
  • Weekday deliveries are 4 minutes longer than the average delivery
  • Weekend deliveries are 2 minutes faster than the average delivery

Conclusion and Recommendations¶

Question 17: What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]¶

Conclusions:¶

  • Weekends dominate over weekday: Weekend is 71%; weekday is 28%
  • Weekend deliveries are faster than weekday deliveries
  • Shake Shack has significant share of about 12% of the orders
  • The top four cuisines (American, Japanese, Italian, Chinese) dominate
  • The top four cuisines have 83% of the weekend orders
  • nearly 40% of ratings are not given
  • The same restaurants that qualified for the promotion are the top four restaurants for orders
  • 11% of orders take an hour or more to arrive
  • 29% of orders are over $20

Recommendations:¶

Grow the business on weekdays

  • The service has capacity to grow on weekdays as suggested by the disparity of weekends dominating the number of orders
  • Promote the promotional offer to the top restaurants on weekdays

Improve delivery times on weekdays

  • Weekdays have an opportunity to grow. Suggest working closely with delivery firms/individuals to speed up weekday delivery
  • Provide offers for weekday buyers near to restarurants

Grow the business with cuisines outside the top 4

  • Highlight lesser served restaurants as alternative choices
  • Target customers who have alternative restaurants nearby (reducing delivery time) with promotion
  • Reward customers who often use the service to try alternative restaurants

Increase ratings for order

  • Provide incentives for new raters
  • Follow up those who do not rate on receipt a few hours after their order is delivered

Increase order size

  • Offer incentives to increase order size to over $20