If you are preparing for an Uber Data Scientist interview, you are entering one of the most rigorous DS pipelines in the tech industry. The Uber Data Scientist Interview Questions test your ability to design surge pricing experiments, detect demand-supply imbalances using statistics, build ML models for ETA prediction, and run A/B tests on marketplace features β all in the context of real ride-sharing operations at scale. This guide covers all four rounds with exact questions, complete SQL and Python answers, and the frameworks that Uber interviewers score you on.
Python
Machine Learning
Statistics
A/B Testing
Round 1 β Uber Data Scientist Interview Questions: SQL & Data Analysis (45 mins)
| Questions | 4 |
| Duration | 45 minutes |
| Difficulty | MediumβHard |
| Key Topics | Fulfillment RateDriver UtilizationSurge AnalysisWindow FunctionsCohort Retention |
The growth team wants to understand how well Uber retains first-time riders. For riders who completed their first ride in Week 1 of a given month, compute their retention rate in each of the next 4 weeks. A rider is retained in a week if they complete at least one ride that week.
Table:
- trips(trip_id, rider_id, driver_id, city, request_ts, end_ts, status, fare_usd)
MIN(request_ts) per rider, then LEFT JOINing subsequent week activity. Many candidates forget to LEFT JOIN (using INNER JOIN instead), which drops riders who churned and makes retention look artificially high. The interviewer will explicitly ask you to show weeks where riders are absent too.WITH cohort AS (
SELECT
rider_id,
DATE_TRUNC('week', MIN(request_ts)) AS cohort_week
FROM trips
WHERE status = 'completed'
GROUP BY rider_id
HAVING DATE_TRUNC('week', MIN(request_ts))
= DATE_TRUNC('week', '2026-01-01'::date) -- Week 1 of target month
),
weeks AS (
SELECT generate_series(1, 4) AS week_num
),
activity AS (
SELECT
t.rider_id,
(DATE_TRUNC('week', t.request_ts) - c.cohort_week) / 7 AS weeks_since_first
FROM trips t
JOIN cohort c USING (rider_id)
WHERE t.status = 'completed'
AND t.request_ts >= c.cohort_week + INTERVAL '7 days'
GROUP BY t.rider_id, c.cohort_week, DATE_TRUNC('week', t.request_ts)
)
SELECT
w.week_num,
COUNT(DISTINCT a.rider_id) AS retained_riders,
COUNT(DISTINCT c.rider_id) AS cohort_size,
ROUND(COUNT(DISTINCT a.rider_id)::numeric
/ COUNT(DISTINCT c.rider_id) * 100, 1) AS retention_pct
FROM weeks w
CROSS JOIN cohort c
LEFT JOIN activity a
ON a.rider_id = c.rider_id
AND a.weeks_since_first = w.week_num
GROUP BY w.week_num
ORDER BY w.week_num;
CROSS JOIN cohort before LEFT JOIN activity is what ensures weeks with zero retained riders still appear in the output. generate_series(1,4) creates synthetic week rows so the query handles “no riders in Week 3” correctly. If you used an INNER JOIN, any week with 0% retention would vanish from results β that is a critical Uber interviewer flag.- How would you break this down by city to identify where Uber has the biggest Week 2 drop-off?
- A rider requested a ride but no driver accepted. Does that count as “active” for retention purposes? What data would you check?
Compute driver earnings per online hour for each city for last week. A driver is “online” when their app status is available or on_trip. Only count completed trips for earnings. Flag cities where earnings/hour dropped more than 10% week-over-week.
Tables:
- driver_status_log(driver_id, city, status, event_ts)
- trips(trip_id, driver_id, city, end_ts, status, driver_earnings_usd)
LEAD(event_ts) to calculate time spent in each status. The interviewer specifically checks whether you filter only available + on_trip statuses (not offline) for online time. The WoW comparison using CTEs is a bonus that shows analytical maturity.WITH status_sessions AS (
SELECT
driver_id,
city,
status,
event_ts AS start_ts,
LEAD(event_ts) OVER (PARTITION BY driver_id ORDER BY event_ts) AS end_ts
FROM driver_status_log
),
online_hours AS (
SELECT
city,
DATE_TRUNC('week', start_ts) AS week_start,
SUM(EXTRACT(EPOCH FROM (end_ts - start_ts)) / 3600) AS total_online_hrs
FROM status_sessions
WHERE status IN ('available', 'on_trip')
AND end_ts IS NOT NULL
GROUP BY city, DATE_TRUNC('week', start_ts)
),
earnings AS (
SELECT
city,
DATE_TRUNC('week', end_ts) AS week_start,
SUM(driver_earnings_usd) AS total_earnings
FROM trips
WHERE status = 'completed'
GROUP BY city, DATE_TRUNC('week', end_ts)
),
weekly AS (
SELECT
e.city,
e.week_start,
e.total_earnings / NULLIF(o.total_online_hrs, 0) AS earnings_per_hr
FROM earnings e
JOIN online_hours o USING (city, week_start)
)
SELECT
curr.city,
curr.earnings_per_hr AS curr_week_eph,
prev.earnings_per_hr AS prev_week_eph,
ROUND((curr.earnings_per_hr - prev.earnings_per_hr)
/ NULLIF(prev.earnings_per_hr, 0) * 100, 1) AS pct_change,
CASE WHEN curr.earnings_per_hr < prev.earnings_per_hr * 0.9
THEN 'FLAG' ELSE 'OK' END AS status
FROM weekly curr
JOIN weekly prev
ON curr.city = prev.city
AND curr.week_start = prev.week_start + INTERVAL '7 days'
WHERE curr.week_start = DATE_TRUNC('week', CURRENT_DATE - INTERVAL '7 days')
ORDER BY pct_change;
LEAD(event_ts) converts the event log into sessions. NULLIF(total_online_hrs, 0) prevents division-by-zero if a driver had no online time recorded. The 10% threshold flag catches cities where the utilization model is degrading β at Uber this often signals a supply shortage or price elasticity shift that the PM team needs to investigate.- A driver went online at 11:58 PM and offline at 12:05 AM β how do you handle sessions that cross midnight in your week boundary logic?
- How would you identify drivers in the bottom 10th percentile of earnings/hour who might churn? What table would you add?
The pricing team wants to identify zones where demand consistently outstrips supply during peak hours (7β9 AM, 5β8 PM weekdays). For each zone and peak-hour slot, compute the unfulfilled request rate = requests with no driver acceptance within 5 minutes / total requests. Return zones where this rate exceeded 15% on more than 3 days in the last 30 days.
Tables:
- ride_requests(request_id, rider_id, zone_id, request_ts, status)
- trip_matches(request_id, driver_id, match_ts)
match_ts - request_ts <= INTERVAL '5 minutes' inside the JOIN condition, not a WHERE clause (otherwise LEFT JOINs lose their unmatched rows). The interviewer also checks whether you correctly filter peak hours using EXTRACT(HOUR FROM request_ts) and exclude weekends.WITH peak_requests AS (
SELECT
r.request_id,
r.zone_id,
r.request_ts::date AS req_date,
CASE
WHEN EXTRACT(HOUR FROM r.request_ts) BETWEEN 7 AND 8 THEN 'morning'
WHEN EXTRACT(HOUR FROM r.request_ts) BETWEEN 17 AND 19 THEN 'evening'
END AS peak_slot,
m.match_ts
FROM ride_requests r
LEFT JOIN trip_matches m
ON m.request_id = r.request_id
AND m.match_ts - r.request_ts <= INTERVAL '5 minutes'
WHERE r.request_ts >= CURRENT_DATE - INTERVAL '30 days'
AND EXTRACT(DOW FROM r.request_ts) BETWEEN 1 AND 5 -- MonβFri
AND (
EXTRACT(HOUR FROM r.request_ts) BETWEEN 7 AND 8
OR EXTRACT(HOUR FROM r.request_ts) BETWEEN 17 AND 19
)
),
daily_zone AS (
SELECT
zone_id,
req_date,
peak_slot,
COUNT(*) AS total_requests,
SUM(CASE WHEN match_ts IS NULL THEN 1 ELSE 0 END) AS unfulfilled,
ROUND(SUM(CASE WHEN match_ts IS NULL THEN 1.0 ELSE 0 END)
/ COUNT(*) * 100, 1) AS unfulfilled_rate
FROM peak_requests
GROUP BY zone_id, req_date, peak_slot
)
SELECT
zone_id,
peak_slot,
COUNT(*) FILTER (WHERE unfulfilled_rate > 15) AS high_miss_days,
ROUND(AVG(unfulfilled_rate), 1) AS avg_miss_rate
FROM daily_zone
GROUP BY zone_id, peak_slot
HAVING COUNT(*) FILTER (WHERE unfulfilled_rate > 15) > 3
ORDER BY high_miss_days DESC, avg_miss_rate DESC;
match_ts = NULL, which the SUM(CASE WHEN match_ts IS NULLβ¦) counts as unfulfilled. COUNT(*) FILTER (WHERE β¦) is a cleaner PostgreSQL idiom than a nested CASE. The HAVING clause is the final precision filter for chronic under-served zones.- How would you determine if adding more incentive bonuses for drivers in those zones during peak hours actually improved the fulfillment rate? Design the experiment.
- A zone has 95% fulfillment but riders are cancelling within 2 minutes of match β is that a supply problem? How do you diagnose it?
Round 2 β Uber Data Scientist Interview Questions: Statistics & A/B Testing (50 mins)
| Questions | 4 |
| Duration | 50 minutes |
| Difficulty | Hard |
| Key Topics | A/B TestingCUPEDNetwork EffectsHypothesis TestingVariance Reduction |
Uber wants to test a new surge pricing algorithm. You propose a standard rider-level A/B test: half the riders see the new surge, half see the old one. Your manager says this will produce biased results. Explain why, and propose a valid experiment design.
The core problem is interference between units β rides are not independent. When you run a rider-level A/B test for surge pricing:
- Treatment riders face higher surge β some cancel β more drivers become available
- Control riders (same city, same time) now get faster pickups and lower fares due to the excess supply they did not “earn”
- The control group’s metrics are inflated by the treatment, making the new algorithm look worse
Valid design options:
Uber typically uses geographic holdout with hexagonal cells (H3 grid) or interleaved switchback tests for pricing experiments because they minimize spillover while preserving statistical power.
- If you use a geographic holdout and the treatment city has a higher baseline cancellation rate, how do you control for that pre-existing difference?
- How do you compute the minimum detectable effect for a switchback test where the variance per time slot is 3x higher than a standard A/B test?
You are running an A/B test on a new ETA prediction model. Pre-experiment data shows high rider-level variance in trip frequency. Explain how CUPED (Controlled-experiment Using Pre-Experiment Data) works and implement it in Python to reduce variance in the metric trips_per_day.
Y_cuped = Y - theta * (X - E[X]) where theta = Cov(Y, X) / Var(X). You should be able to explain why this reduces variance: it removes the part of Y explained by the stable pre-period behavior X.import pandas as pd
import numpy as np
from scipy import stats
# df columns: rider_id, group ('control'/'treatment'),
# trips_pre (pre-experiment avg trips/day),
# trips_post (experiment avg trips/day)
def apply_cuped(df, metric_col='trips_post', covariate_col='trips_pre'):
"""Apply CUPED variance reduction to an experiment metric."""
X = df[covariate_col].values
Y = df[metric_col].values
# Estimate theta via OLS: Cov(Y, X) / Var(X)
theta = np.cov(Y, X, ddof=1)[0, 1] / np.var(X, ddof=1)
# Compute CUPED-adjusted metric
df = df.copy()
df['Y_cuped'] = Y - theta * (X - X.mean())
return df, theta
df, theta = apply_cuped(df)
print(f"Theta: {theta:.4f}")
# Variance reduction
var_original = df['trips_post'].var()
var_cuped = df['Y_cuped'].var()
print(f"Variance reduction: {(1 - var_cuped/var_original)*100:.1f}%")
# t-test on CUPED metric
control = df[df['group'] == 'control']['Y_cuped']
treatment = df[df['group'] == 'treatment']['Y_cuped']
t_stat, p_val = stats.ttest_ind(treatment, control)
print(f"t-stat: {t_stat:.4f}, p-value: {p_val:.4f}")
# Effect size
effect = treatment.mean() - control.mean()
print(f"ATE (CUPED): {effect:.4f} trips/day")
E[X - E[X]] = 0 β subtracting the covariate adjustment has zero expected effect on the true treatment effect. The variance reduction factor equals the squared correlation between Y and X: if r = 0.7, you reduce variance by 49%, which halves required sample size. At Uber, CUPED on pre-period trips/day typically achieves 30β50% variance reduction, cutting experiment runtime from 3 weeks to under 2.- What if a rider had no trips in the pre-experiment period? How do you handle the missing covariate β impute with 0, drop the row, or use a different baseline?
- Can you use multiple pre-period covariates (trips, cancellations, app opens) in CUPED? What method handles this?
Round 3 β Uber Data Scientist Interview Questions: ML System Design (60 mins)
| Questions | 2 |
| Duration | 60 minutes |
| Difficulty | Hard |
| Key Topics | ETA PredictionChurn PredictionFeature EngineeringModel MonitoringOnline vs Batch |
Uber is losing 18% of drivers annually. Design a driver churn prediction model that identifies drivers likely to stop driving in the next 30 days, so the retention team can intervene with targeted incentives before churn happens.
Step 1 β Define churn precisely: A driver is churned if they complete zero trips in the next 30 days AND did not self-deactivate (voluntary exit) AND their account is not suspended. We exclude seasonal drivers (those with known seasonal patterns in the historical data) from the model.
Step 2 β Feature engineering (as of prediction date T):
- Recency features: Days since last trip, days since last login, days since last surge earnings
- Frequency features: Trips per week (last 4 weeks), weekly active hours, weekend vs weekday ratio
- Earnings features: 30-day earnings trend (slope), earnings vs city median ratio, tip rate
- Experience features: Days since onboarding, total lifetime trips, rating trend (last 20 trips)
- Behavioral signals: App opens without trips (frustration signal), cancellation rate last 14 days, support tickets last 30 days
Step 3 β Model choice: XGBoost or LightGBM (handles mixed feature types, missing values naturally, fast training). Binary classification with probability output so the retention team can tier drivers by risk score (high/medium/low risk buckets).
Step 4 β Training setup:
- Training window: 12 months of historical data, sliding windows to avoid temporal leakage
- Label: 1 if driver churned 30 days after feature snapshot, 0 otherwise
- Class imbalance: ~82% non-churners β use class weights or SMOTE, optimize for F1 at business-relevant threshold
Step 5 β Deployment & monitoring: Batch scoring daily (not real-time β churn is a slow process). Monitor: feature drift via PSI (Population Stability Index), label drift via weekly churn rate, model calibration via expected vs actual churn rate per score decile.
- How do you measure whether the intervention (incentive sent to predicted churners) actually caused drivers to stay, vs they would have stayed anyway?
- A driver completed 1 trip last month vs 50 trips 6 months ago β which feature captures this better: recency or a time-weighted frequency score?
π Recommended Resources to Crack Uber DS Interview
Used by 40,000+ analysts & data scientists to crack product company interviews
- 2,200+ questions Β· 23 topics
- ML, Statistics, SQL, Python, A/B Testing
- Scenario-based questions with detailed answers
- 220+ questions Β· SQL advanced, data models, DBMS
- Window functions, CTEs, query optimization
- Covers Uber-style marketplace query patterns
4-Week Prep Plan: Crack Uber Data Scientist Interview
Build Foundation β SQL, Python, Statistics Basics
Master window functions, CTEs, and marketplace query patterns (cohort retention, funnel analysis, utilization metrics). Refresh statistics: hypothesis testing, p-values, confidence intervals, CLT. Study Uber’s marketplace model β demand/supply balance, surge pricing mechanics, driver incentive structure. Read TDM’s guide on A/B testing for two-sided marketplaces.
2200 Most Asked Analytics Interview Questions β DS & ML Sections
Cover SQL, Python, case study, statistics, and ML sections of the 2200 questions ebook. Focus on the experiment design and A/B testing chapters β these are Uber’s differentiation from standard DA roles. Aim for 100+ questions per day. End of Week 2: First mock interview β
Ace Any SQL Interview β Full Coverage + ML Design Practice
Complete the Ace Any SQL Interview ebook cover to cover β 220+ questions. Practice 2 ML system design questions per day (churn, ETA, pricing models). For each, walk through: problem definition β data β features β model β evaluation β deployment β monitoring. End of Week 3: Second mock interview β
Timed Practice + Revision
Revise first 5 chapters of 2200 ebook. Timed practice: 15 min per SQL, 20 min per case study, 25 min per ML design. Simulate real Uber interview pacing β they move fast. End of Week 4: Final mock interview β