Dunzo Most Asked Data Science Interview Questions

We know that each domain requires a different type of preparation, so we have divided our books in the same way:

Our best seller:
✅Become a Full Stack Analytics Professional with The Data Monk’s master e-book with 2200+ interview questions covering 23 topics – 2200 Most Asked Interview Questions

Machine Learning e-book
✅Data Scientist and Machine Learning Engineer ->23 e-books covering all the ML Algorithms Interview Questions

Domain wise interview e-books
✅Data Analyst and Product Analyst Interview Preparation ->1100+ Most Asked Interview Questions
✅Business Analyst Interview Preparation ->1250+ Most Asked Interview Questions

The Data Monk – 30 Days Mentorship program

We are a group of 30+ people with ~8 years of Analytics experience in product-based companies. We conduct interviews daily for our organizations, so we know well what is asked in them.
Other skill-enhancement websites charge Rs. 2 lakh + GST for courses ranging from 10 to 15 months.

We focus only on helping you clear interviews with ease. We have released our Become a Full Stack Analytics Professional e-book for anyone from the 2nd year of graduation to 8-10 YOE. This book contains 23 topics, and each topic is divided into 50/100/200/250 questions and answers. Pick up the book, read it thrice, learn it, and appear for the interview.

We also have a complete Analytics interview package
– 2200 questions ebook (Rs.1999) + 23 ebook bundle for Data Science and Analyst role (Rs.1999)
– 4 one-hour mock interviews, every Saturday (top mate – Rs.1000 per interview)
– 4 career guidance sessions, 30 mins each on every Sunday (top mate – Rs.500 per session)
– Resume review and improvement (Top mate – Rs.500 per review)

YouTube channel covering all the interview-related important topics in SQL, Python, MS Excel, Machine Learning Algorithm, Statistics, and Direct Interview Questions
Link –The Data Monk Youtube Channel

Dunzo Data Science Interview Question

Company: Dunzo
Designation: Data Scientist
Year of Experience Required: Not Mentioned
Technical Expertise: SQL, Python/R, Statistics, Machine Learning, Case Studies
Number of Rounds: 5

Dunzo is a leading Indian company providing hyperlocal delivery services across major cities like Bengaluru, Delhi, Gurugram, Pune, Chennai, Jaipur, Mumbai, and Hyderabad. Founded in 2015 and headquartered in Bengaluru, Dunzo has gained significant traction, including funding from Google in 2017. The company also operates a bike taxi service in Gurugram.

Difficulty of Questions

SQL – 9/10

1) What is the difference between VARCHAR and CHAR data types in MySQL?

CHAR is a fixed-length string data type. A CHAR(10) column always stores 10 characters, right-padding with spaces if the actual string is shorter. By default, MySQL strips those trailing spaces again when the value is retrieved.
VARCHAR is a variable-length string data type. VARCHAR(10) can store up to 10 characters, but it uses only the space needed for the actual string plus a 1- or 2-byte length prefix. This saves storage when values are usually shorter than the maximum.
CHAR can be slightly faster when every value has the same length (e.g., fixed-width codes), while VARCHAR is more space-efficient for data of varying lengths.

2) What is the purpose of the PRIMARY KEY constraint in MySQL?

The PRIMARY KEY constraint uniquely identifies each record in a table.
It must contain unique values and cannot contain NULL values.
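A table can have only one PRIMARY KEY, although that key may span several columns (a composite key).
It is often referenced by FOREIGN KEY columns in other tables to establish relationships between tables.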
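
As a minimal sketch of the uniqueness rule, the snippet below tries to insert the same key twice. It uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server, and the orders table is hypothetical. Only uniqueness is shown here; NULL handling and error types differ between SQLite and MySQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'Bengaluru')")

try:
    # Same order_id again: the PRIMARY KEY constraint should reject it.
    conn.execute("INSERT INTO orders VALUES (1, 'Pune')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
print(duplicate_rejected, row_count)  # True 1
```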

3) What is the difference between DELETE and TRUNCATE statements in MySQL?

DELETE removes rows from a table, optionally filtered by a condition (using a WHERE clause); without a WHERE clause it removes every row. It is a DML (Data Manipulation Language) command. It logs each row deletion, so it is slower on large tables.
TRUNCATE removes all rows from a table and does not accept a WHERE clause. It is a DDL (Data Definition Language) command. It does not log individual row deletions and resets the table’s auto-increment counter. Therefore, it is usually significantly faster than DELETE on large tables.
DELETE can be rolled back (if within a transaction on a transactional engine such as InnoDB), while TRUNCATE cannot, because it causes an implicit commit in MySQL.

4) What is the purpose of the WHERE clause in a SELECT statement?

The WHERE clause is used to filter records and retrieve only those that meet a specified condition.
It allows you to specify criteria that rows must satisfy to be included in the result set.
Without a WHERE clause, all rows of the table are returned.
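
A small sketch of the difference, again using Python's sqlite3 module as a stand-in for MySQL. The deliveries table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deliveries (order_id INTEGER, distance_km REAL)")
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?)",
    [(1, 2.5), (2, 7.0), (3, 11.2)],
)

# No WHERE clause: every row comes back.
all_rows = conn.execute(
    "SELECT order_id FROM deliveries ORDER BY order_id"
).fetchall()

# WHERE clause: only rows that satisfy the condition are returned.
long_rows = conn.execute(
    "SELECT order_id FROM deliveries WHERE distance_km > ? ORDER BY order_id",
    (5,),
).fetchall()
conn.close()

print(all_rows)   # [(1,), (2,), (3,)]
print(long_rows)  # [(2,), (3,)]
```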

5) What is the purpose of the JOIN clause in MySQL?

The JOIN clause is used to combine rows from two or more tables based on a related column between them.
It allows you to retrieve data from multiple tables in a single query.
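Common JOIN types in MySQL include INNER JOIN, LEFT JOIN, RIGHT JOIN, and CROSS JOIN. MySQL does not support FULL OUTER JOIN directly; it is usually emulated by combining a LEFT JOIN and a RIGHT JOIN with UNION.
Joins are essential for relational databases because normalized data is split across several tables, and joins put it back together at query time.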
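
To make the INNER vs LEFT distinction concrete, here is a minimal sketch using Python's sqlite3 module in place of MySQL. The vendors and orders tables are hypothetical; the join syntax shown is the same in both databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, vendor_id INTEGER);
    INSERT INTO vendors VALUES (1, 'Grocery Hub'), (2, 'Pharma Point');
    INSERT INTO orders VALUES (101, 1), (102, 1);
""")

# INNER JOIN keeps only vendors that have at least one matching order.
inner_rows = conn.execute("""
    SELECT v.name, o.order_id
    FROM vendors v
    INNER JOIN orders o ON o.vendor_id = v.vendor_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN keeps every vendor; missing orders show up as NULL (None).
left_rows = conn.execute("""
    SELECT v.name, o.order_id
    FROM vendors v
    LEFT JOIN orders o ON o.vendor_id = v.vendor_id
    ORDER BY v.vendor_id, o.order_id
""").fetchall()
conn.close()

print(inner_rows)  # [('Grocery Hub', 101), ('Grocery Hub', 102)]
print(left_rows)   # [('Grocery Hub', 101), ('Grocery Hub', 102), ('Pharma Point', None)]
```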

R/Python – 7/10

1) Write a Python list comprehension that creates a list containing the squares of all even numbers between 1 and 10 (inclusive).
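
One possible answer, as a minimal sketch (the variable name is arbitrary):

```python
# range(1, 11) covers 1..10 inclusive; the condition keeps only even numbers.
even_squares = [n ** 2 for n in range(1, 11) if n % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]
```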

2) Write a Python function that takes two dictionaries as input and returns a new dictionary that contains all the key-value pairs from both dictionaries. If there are duplicate keys, the values from the second dictionary should take precedence.
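
One possible answer is sketched below; the function name merge_dicts is just an illustrative choice. On Python 3.9+ the expression `first | second` gives the same result, as does `{**first, **second}` on older versions.

```python
def merge_dicts(first: dict, second: dict) -> dict:
    """Return a new dict with all pairs from both inputs; `second` wins on duplicate keys."""
    merged = dict(first)   # shallow copy so neither input is modified
    merged.update(second)  # values from `second` overwrite duplicate keys
    return merged


result = merge_dicts({"a": 1, "b": 2}, {"b": 20, "c": 3})
print(result)  # {'a': 1, 'b': 20, 'c': 3}
```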

3) Write a Python function that reads a text file and returns the number of lines in the file.
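
A possible answer, sketched under the assumption that the file is UTF-8 text. Iterating over the file handle avoids loading the whole file into memory; the temporary file exists only to make the example runnable.

```python
import os
import tempfile


def count_lines(path: str) -> int:
    """Return the number of lines in the text file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)  # the file object yields one line at a time


# Usage with a throwaway file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\nsecond\nthird\n")

line_total = count_lines(tmp.name)
os.remove(tmp.name)
print(line_total)  # 3
```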

4) Given the variables name = "Alice" and age = 30, create a formatted string that says “Alice is 30 years old.” using f-strings.
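
One possible answer:

```python
name = "Alice"
age = 30

# Expressions inside {} are evaluated and converted to strings.
message = f"{name} is {age} years old."
print(message)  # Alice is 30 years old.
```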

5) Write a Python function that divides two numbers and handles the ZeroDivisionError if the second number is zero.
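
A possible answer is sketched below. Returning None on division by zero is an assumption; the question does not say what should happen, and raising a custom error or returning a default value would be equally valid.

```python
def safe_divide(numerator: float, denominator: float):
    """Return numerator / denominator, or None when the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Assumption: signal the failure with None instead of crashing.
        return None


print(safe_divide(10, 2))  # 5.0
print(safe_divide(1, 0))   # None
```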

Statistics/ML

1) Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy? Which one would you prefer?

It depends on the context of the problem. If the solution is time-sensitive and the 10% inaccuracy has minimal impact, a 90% accurate solution in 5 days is preferable. However, if the problem is critical (e.g., healthcare or finance), investing 10 days for 100% accuracy is justified. In most business scenarios, a 90% solution that can be iteratively improved is often the better choice.

2) How do data management procedures like missing data handling make selection bias worse?

Missing data handling techniques like listwise deletion or mean imputation can introduce selection bias if the data is not missing completely at random. For example, if data is missing mostly for a specific subgroup (e.g., high-income customers), dropping those rows or filling them with the overall mean under-represents that subgroup, which skews the analysis and leads to incorrect conclusions. To mitigate this, analyze the missing data pattern before handling it and use techniques like Multiple Imputation.
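
The effect is easy to see on a toy example. In the sketch below, all numbers are invented, and the assumption is that only the two highest earners left the income field blank.

```python
from statistics import mean

# Hypothetical incomes (in thousands). None marks a missing value.
# Assumption: only the two highest earners skipped the question.
incomes_true = [20, 25, 30, 35, 40, 150, 200]
incomes_observed = [20, 25, 30, 35, 40, None, None]

true_mean = mean(incomes_true)

# Listwise deletion: drop every record with a missing value.
complete_cases = [x for x in incomes_observed if x is not None]
listwise_mean = mean(complete_cases)

# Mean imputation: fill the gaps with the mean of the observed values.
imputed = [x if x is not None else listwise_mean for x in incomes_observed]
imputed_mean = mean(imputed)

print(round(true_mean, 1), listwise_mean, imputed_mean)
```

Both strategies report a mean of 30 while the true mean is above 70, because neither can recover information about the subgroup that is missing.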

3) When does regularization become necessary in Machine Learning? Explain with example situations.

Regularization is necessary when a model is overfitting, i.e., performing well on training data but poorly on unseen data. For example:

In linear regression, adding L1 (Lasso) or L2 (Ridge) regularization helps reduce overfitting by penalizing large coefficients.

In deep learning, dropout regularization is used to prevent overfitting in neural networks.
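
To illustrate how an L2 penalty shrinks coefficients, here is a minimal sketch for the simplest case: one feature and no intercept. Minimizing sum((y - w*x)^2) + penalty * w^2 gives w = sum(x*y) / (sum(x^2) + penalty). The data points are made up, and a real project would normally use a library implementation instead.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form slope for one-feature ridge regression without an intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Toy data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

slopes = [ridge_slope(xs, ys, penalty) for penalty in (0.0, 1.0, 10.0)]
print([round(s, 3) for s in slopes])  # [2.0, 1.867, 1.167]
```

With no penalty the slope is exactly 2; as the penalty grows, the slope is pulled toward zero, which is the mechanism that limits overfitting when there are many noisy features.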

4) How to optimize a web crawler to run much faster, extract better information, and better summarize data to produce cleaner databases?

Faster Execution: Use asynchronous programming or parallel processing to handle multiple requests simultaneously.

Better Information Extraction: Implement advanced parsing techniques (e.g., regex, XPath) and machine learning models to identify relevant data.

Cleaner Databases: Use data validation rules, deduplication, and normalization techniques to ensure data quality.
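
The first and last points can be sketched together with asyncio. This is an illustrative outline, not a production crawler: fetch() only sleeps instead of making a network request, the URLs are placeholders, and the normalization rule (trim whitespace and a trailing slash) is a deliberately simple assumption.

```python
import asyncio


async def fetch(url: str) -> str:
    """Stand-in for a real HTTP request; sleeps instead of touching the network."""
    await asyncio.sleep(0.1)
    return f"<html>{url}</html>"


def normalize(url: str) -> str:
    """Trim whitespace and a trailing slash so near-duplicate URLs collapse."""
    return url.strip().rstrip("/")


async def crawl(urls, max_concurrency: int = 5) -> dict:
    unique_urls = sorted({normalize(u) for u in urls})  # deduplicate before fetching
    semaphore = asyncio.Semaphore(max_concurrency)      # cap simultaneous requests

    async def bounded_fetch(url):
        async with semaphore:
            return url, await fetch(url)

    # gather() runs the fetches concurrently instead of one after another.
    results = await asyncio.gather(*(bounded_fetch(u) for u in unique_urls))
    return dict(results)


pages = asyncio.run(
    crawl(["https://example.com/a", "https://example.com/a/", "https://example.com/b"])
)
print(sorted(pages))  # ['https://example.com/a', 'https://example.com/b']
```

The semaphore keeps the crawler from overwhelming a site, and deduplicating before fetching saves both requests and cleanup work later.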

5) You are about to send one million emails (marketing campaign). How do you optimise delivery? How do you optimise response?

Optimize Delivery: Use a reliable email service provider (ESP), segment your email list, and ensure your emails comply with anti-spam regulations.

Optimize Response: Personalize emails, use A/B testing to refine subject lines and content, and include clear calls-to-action (CTAs).
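
For the A/B testing step, a two-proportion z-test is one common way to check whether a difference in response rates is more than noise. The sketch below uses only the standard library; the counts are invented for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference between two response rates."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through erf, then doubled for a two-sided test.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical campaign: subject line A got 200 clicks out of 10,000 emails,
# subject line B got 260 clicks out of 10,000.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(round(z, 2), p < 0.01)
```

If the p-value is below the chosen significance level, the better-performing subject line can be rolled out to the rest of the list.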

Case Study

Problem Statement:

Dunzo wants to improve its delivery time prediction model to provide customers with more accurate estimated delivery times (ETAs). Your task as a data scientist is to analyze delivery data, identify key factors affecting delays, and suggest strategies to enhance delivery time accuracy.

Dataset Overview:

You have access to a dataset containing past delivery records. The dataset includes the following attributes:

Order_ID – Unique identifier for each order
User_ID – Unique identifier for the customer
Vendor_ID – Unique identifier for the store/restaurant partner
Delivery_Partner_ID – Unique identifier for the delivery agent
Order_Placement_Time – Timestamp when the order was placed
Order_Confirmation_Time – Timestamp when the vendor accepted the order
Order_Pickup_Time – Timestamp when the delivery agent picked up the order
Order_Delivery_Time – Timestamp when the order was delivered
Distance_KM – Distance between the vendor and customer in kilometers
Traffic_Conditions – Traffic level at the time of delivery (Low, Medium, High)
Weather_Conditions – Weather at the time of delivery (Clear, Rainy, Stormy)
Delivery_Delay_Flag – Whether the delivery was delayed (1 = Yes, 0 = No)
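
A natural first step with these columns is to turn the four timestamps into durations. The sketch below assumes the timestamps are strings in "YYYY-MM-DD HH:MM:SS" format; the record and the feature names are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical record using the column names listed above.
record = {
    "Order_Placement_Time": "2024-01-05 19:00:00",
    "Order_Confirmation_Time": "2024-01-05 19:02:00",
    "Order_Pickup_Time": "2024-01-05 19:20:00",
    "Order_Delivery_Time": "2024-01-05 19:45:00",
}


def minutes_between(start: str, end: str) -> float:
    """Minutes elapsed between two timestamp strings."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds() / 60


features = {
    "confirmation_minutes": minutes_between(
        record["Order_Placement_Time"], record["Order_Confirmation_Time"]
    ),
    "prep_and_wait_minutes": minutes_between(
        record["Order_Confirmation_Time"], record["Order_Pickup_Time"]
    ),
    "transit_minutes": minutes_between(
        record["Order_Pickup_Time"], record["Order_Delivery_Time"]
    ),
    "total_minutes": minutes_between(
        record["Order_Placement_Time"], record["Order_Delivery_Time"]
    ),
}
print(features)
```

Splitting the total time this way separates vendor-side delays from on-road delays, which matter for different parts of the analysis.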

Key Questions to Answer:

1. What are the main factors influencing delivery delays?

Does distance significantly impact delivery time?
How do traffic and weather conditions affect delivery times?
Do vendor preparation times vary across different types of stores?
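
A quick way to start on these questions is to compare delay rates across categories. The sketch below groups a handful of invented rows by Traffic_Conditions; the same helper works for Weather_Conditions or any other categorical column.

```python
from collections import defaultdict

# Hypothetical rows; only the two columns needed for the example are included.
rows = [
    {"Traffic_Conditions": "Low", "Delivery_Delay_Flag": 0},
    {"Traffic_Conditions": "Low", "Delivery_Delay_Flag": 0},
    {"Traffic_Conditions": "Medium", "Delivery_Delay_Flag": 0},
    {"Traffic_Conditions": "Medium", "Delivery_Delay_Flag": 1},
    {"Traffic_Conditions": "High", "Delivery_Delay_Flag": 1},
    {"Traffic_Conditions": "High", "Delivery_Delay_Flag": 1},
]


def delay_rate_by(rows, column):
    """Share of delayed orders for each distinct value of `column`."""
    counts = defaultdict(lambda: [0, 0])  # value -> [delayed, total]
    for row in rows:
        bucket = counts[row[column]]
        bucket[0] += row["Delivery_Delay_Flag"]
        bucket[1] += 1
    return {value: delayed / total for value, (delayed, total) in counts.items()}


rates = delay_rate_by(rows, "Traffic_Conditions")
print(rates)  # {'Low': 0.0, 'Medium': 0.5, 'High': 1.0}
```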

2. How can we improve ETA predictions?

Can we build a machine learning model to predict delivery times more accurately?
Should we personalize ETAs based on historical data of specific delivery partners and vendors?
Can real-time factors like current traffic and weather be integrated into the prediction model?

3. How can Dunzo optimize its delivery operations?

Should delivery partners be assigned based on historical speed and efficiency?
Can we dynamically adjust ETAs based on current road conditions?
How can we use past data to predict peak demand times and optimize delivery capacity?

Key Insights & Business Recommendations

1. Identifying and Addressing Delay Causes

Distance and Traffic Impact: Orders with longer distances and higher traffic congestion tend to have more delays. Prioritizing deliveries based on real-time traffic updates can improve ETAs.
Weather Conditions: Rain and storms significantly increase delays. During adverse weather, Dunzo should dynamically adjust ETAs and notify users about possible delays.
Vendor Preparation Time: Some restaurants and stores take longer to prepare orders. Dunzo should use vendor-specific data to provide more accurate pickup estimates.

2. Enhancing ETA Predictions Using Real-Time Data

Integrating Traffic APIs: Using real-time traffic data from Google Maps or other sources can improve ETA predictions.
Personalized ETAs: ETAs should be personalized based on the historical speed of a specific delivery partner rather than using a general estimate.
Dynamic ETA Adjustments: If an order experiences unexpected delays (e.g., a delivery partner is stuck in traffic), the system should automatically update the ETA.
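
As a rough baseline for the personalized and dynamic ETA ideas, the sketch below uses each partner's historical average speed, falls back to a fleet-wide default for unknown partners, and scales the result by a traffic multiplier. The function, the speeds, and the multiplier are all illustrative assumptions; a real model would be learned from the delivery data.

```python
def estimate_eta_minutes(
    distance_km: float,
    partner_id: str,
    partner_speed_kmph: dict,
    default_speed_kmph: float = 20.0,
    traffic_multiplier: float = 1.0,
) -> float:
    """Travel-time estimate in minutes from a partner's historical speed."""
    # Fall back to the fleet-wide default when the partner has no history.
    speed = partner_speed_kmph.get(partner_id, default_speed_kmph)
    return distance_km / speed * 60 * traffic_multiplier


# Hypothetical historical average speeds per delivery partner.
speeds = {"P1": 30.0}

eta_known = estimate_eta_minutes(5.0, "P1", speeds)
eta_unknown = estimate_eta_minutes(5.0, "P9", speeds)
eta_heavy_traffic = estimate_eta_minutes(5.0, "P1", speeds, traffic_multiplier=1.5)
print(eta_known, eta_unknown, eta_heavy_traffic)  # 10.0 15.0 15.0
```

Re-running the estimate with an updated traffic multiplier while the order is in transit is one simple way to implement dynamic ETA adjustments.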

3. Optimizing Delivery Partner Assignments

Assigning the Nearest Available Partner: Instead of assigning randomly, Dunzo should prioritize the nearest delivery partner with a strong on-time delivery history.
Predicting Peak Demand Hours: Analyzing past data can help predict high-demand times, allowing Dunzo to pre-position delivery partners in key areas.
Encouraging Efficiency with Incentives: Offering performance-based incentives to fast and reliable delivery partners can reduce delays and improve customer satisfaction.
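
The assignment rule in the first point can be sketched as a two-step filter: keep partners whose on-time rate clears a threshold, then pick the nearest one. The field names, the threshold, and the fallback to the nearest partner overall are assumptions made for illustration.

```python
def pick_partner(partners, min_on_time_rate: float = 0.9) -> str:
    """Return the ID of the nearest partner with a strong on-time history."""
    reliable = [p for p in partners if p["on_time_rate"] >= min_on_time_rate]
    # Assumption: if nobody clears the threshold, fall back to everyone available.
    pool = reliable or partners
    return min(pool, key=lambda p: p["distance_km"])["partner_id"]


# Hypothetical available partners.
partners = [
    {"partner_id": "P1", "distance_km": 0.5, "on_time_rate": 0.70},
    {"partner_id": "P2", "distance_km": 1.2, "on_time_rate": 0.95},
    {"partner_id": "P3", "distance_km": 2.0, "on_time_rate": 0.97},
]

chosen = pick_partner(partners)
chosen_strict = pick_partner(partners, min_on_time_rate=0.99)
print(chosen, chosen_strict)  # P2 P1
```

P1 is closest but has a weak on-time record, so P2 is chosen; with an unreachable threshold the rule degrades to plain nearest-partner assignment.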

4. Improving Customer Communication and Satisfaction

Proactive Delay Notifications: If an order is delayed, customers should receive real-time updates along with revised ETAs.
Transparency in ETAs: Showing the factors affecting delivery time (traffic, vendor delay, etc.) can improve customer trust.
Customer Feedback Integration: Analyzing customer feedback on delivery delays can help refine the prediction model and improve service quality.

Basically, you can practice a lot of case studies and other statistics topics here –
http://thedatamonk.com/data-science-resources/

For any information related to courses or e-books, please send an email to nitinkamal132@gmail.com
