If you are preparing for a PhonePe Data Scientist role, you are entering one of India’s most competitive DS interview processes. The PhonePe Data Scientist Interview Questions cover advanced statistics (Bayesian inference, CLT violations, non-parametric tests), fraud detection ML, UPI transaction pattern analysis in SQL, Python-based experimentation, and product case studies centered on India’s digital payments ecosystem. This guide covers all four rounds with real questions, complete answers, and the frameworks that PhonePe interviewers score on.
Python
Machine Learning
Statistics
Fraud Detection
Round 1 โ PhonePe Data Scientist Interview Questions: SQL & Analytics (45 mins)
| Questions | 4 |
| Duration | 45 minutes |
| Difficulty | MediumโHard |
| Key Topics | UPI FunnelReferral AttributionMerchant Success RateWindow FunctionsCohort Analysis |
PhonePe runs a referral program where existing users earn cashback when they refer new users who complete their first UPI transaction. The marketing team disputes the attribution model: some new users were referred by multiple people before signing up. Compute referral earnings under both first-touch (credit the first referrer) and last-touch (credit the most recent referrer) models for last month, and show the difference in payout per referrer.
Tables:
- referrals(referral_id, referrer_user_id, referred_user_id, referral_ts)
- transactions(txn_id, user_id, amount, status, created_at)
- Cashback: โน100 per successful first transaction of a referred user
ROW_NUMBER() OVER (PARTITION BY referred_user_id ORDER BY referral_ts ASC) = 1 for first-touch and ... ORDER BY referral_ts DESC) = 1 for last-touch. The trap: only referrals that preceded the first qualifying transaction should be eligible โ a referral sent AFTER the user’s first transaction is not a valid attribution source.WITH first_txn AS (
SELECT
user_id,
MIN(created_at) AS first_txn_ts
FROM transactions
WHERE status = 'success'
AND DATE_TRUNC('month', created_at) = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
GROUP BY user_id
),
eligible_referrals AS (
SELECT
r.referral_id,
r.referrer_user_id,
r.referred_user_id,
r.referral_ts,
f.first_txn_ts,
ROW_NUMBER() OVER (PARTITION BY r.referred_user_id
ORDER BY r.referral_ts ASC) AS first_touch_rank,
ROW_NUMBER() OVER (PARTITION BY r.referred_user_id
ORDER BY r.referral_ts DESC) AS last_touch_rank
FROM referrals r
JOIN first_txn f
ON f.user_id = r.referred_user_id
AND r.referral_ts < f.first_txn_ts -- referral must precede first txn
),
first_touch AS (
SELECT referrer_user_id, COUNT(*) * 100 AS first_touch_cashback
FROM eligible_referrals
WHERE first_touch_rank = 1
GROUP BY referrer_user_id
),
last_touch AS (
SELECT referrer_user_id, COUNT(*) * 100 AS last_touch_cashback
FROM eligible_referrals
WHERE last_touch_rank = 1
GROUP BY referrer_user_id
)
SELECT
COALESCE(ft.referrer_user_id, lt.referrer_user_id) AS referrer_user_id,
COALESCE(ft.first_touch_cashback, 0) AS first_touch_cashback,
COALESCE(lt.last_touch_cashback, 0) AS last_touch_cashback,
COALESCE(ft.first_touch_cashback, 0)
- COALESCE(lt.last_touch_cashback, 0) AS cashback_diff
FROM first_touch ft
FULL OUTER JOIN last_touch lt USING (referrer_user_id)
ORDER BY ABS(cashback_diff) DESC;
r.referral_ts < f.first_txn_ts filter inside the JOIN is the critical eligibility gate โ referrals sent after the user already transacted are not valid. The FULL OUTER JOIN on the final aggregation ensures referrers who appear in only one model are still shown. The ABS(cashback_diff) sort surfaces the referrers most impacted by the attribution model choice โ these are the ones the finance team should scrutinize first.- PhonePe moves to a linear attribution model where cashback is split equally across all valid referrers. How does your query change?
- The referral_ts and first_txn_ts are in different time zones (referral in UTC, transaction in IST). How do you normalize before comparison?
The merchant success team wants to identify city+category combinations where the UPI payment success rate dropped by more than 5 percentage points compared to the same period last month. They also want to flag where the drop is driven specifically by bank_timeout failures (vs other failure reasons).
Tables:
- merchant_payments(payment_id, merchant_id, city, category, status, failure_reason, amount, created_at)
WITH monthly AS (
SELECT
city,
category,
DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS total_attempts,
SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) AS successes,
SUM(CASE WHEN failure_reason = 'bank_timeout' THEN 1 ELSE 0 END) AS timeout_failures
FROM merchant_payments
WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '2 months')
GROUP BY city, category, DATE_TRUNC('month', created_at)
),
curr AS (SELECT * FROM monthly WHERE month = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')),
prev AS (SELECT * FROM monthly WHERE month = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '2 months'))
SELECT
c.city,
c.category,
ROUND(c.successes * 100.0 / NULLIF(c.total_attempts, 0), 1) AS curr_success_rate,
ROUND(p.successes * 100.0 / NULLIF(p.total_attempts, 0), 1) AS prev_success_rate,
ROUND(c.successes * 100.0 / NULLIF(c.total_attempts, 0)
- p.successes * 100.0 / NULLIF(p.total_attempts, 0), 1) AS rate_change_pp,
ROUND(c.timeout_failures * 100.0 / NULLIF(c.total_attempts-c.successes, 0), 1)
AS timeout_share_of_failures,
CASE
WHEN c.timeout_failures * 1.0 / NULLIF(c.total_attempts-c.successes, 0) > 0.5
THEN 'BANK_TIMEOUT_DRIVEN' ELSE 'OTHER_FAILURES'
END AS failure_driver
FROM curr c
JOIN prev p USING (city, category)
WHERE c.successes * 100.0 / NULLIF(c.total_attempts, 0)
- p.successes * 100.0 / NULLIF(p.total_attempts, 0) < -5
ORDER BY rate_change_pp;
NULLIF(total_attempts - successes, 0) prevents division by zero when computing timeout share of failures โ if there are zero failures, the denominator is zero (which NULLIF catches). The CASE WHEN timeout_failures > 50% of failures THEN 'BANK_TIMEOUT_DRIVEN' flag lets the ops team immediately route the issue to the bank’s payment gateway team rather than spending time diagnosing internal Paytm server issues.- Some city+category combos have only 50 transactions this month โ is a 5 pp drop statistically significant? How do you add a minimum volume filter?
- How would you build a real-time alert that fires when any city’s 30-minute rolling success rate drops below 85%?
Round 2 โ PhonePe Data Scientist Interview Questions: Statistics & Experimentation (50 mins)
| Questions | 4 |
| Duration | 50 minutes |
| Difficulty | Hard |
| Key Topics | Non-Parametric TestsPower AnalysisBayesian A/BOutlier HandlingMultiple Testing |
You are running an A/B test on a new UPI cashback feature. The success metric is average transaction amount. You run a two-sample t-test and get p = 0.03. Your senior data scientist says the t-test result is unreliable for this metric. Explain why, and show what to do instead in Python.
import numpy as np
import pandas as pd
from scipy import stats
# Check distribution shape
ctrl = df[df['group']=='control']['txn_amount']
trt = df[df['group']=='treatment']['txn_amount']
print("Skewness โ Control:", ctrl.skew(), "Treatment:", trt.skew())
print("Kurtosis โ Control:", ctrl.kurtosis(), "Treatment:", trt.kurtosis())
# Problem: highly skewed โ t-test unreliable even at n=5000
# Option 1: Mann-Whitney U (compares distributions, not means)
stat, p_mw = stats.mannwhitneyu(trt, ctrl, alternative='two-sided')
print(f"\nMann-Whitney p-value: {p_mw:.4f}")
# Interprets: P(trt amount > ctrl amount) vs 0.5
# Option 2: Bootstrap CI on mean difference
n_boot = 10000
boot_diffs = []
for _ in range(n_boot):
b_ctrl = ctrl.sample(len(ctrl), replace=True).mean()
b_trt = trt.sample(len(trt), replace=True).mean()
boot_diffs.append(b_trt - b_ctrl)
boot_diffs = np.array(boot_diffs)
ci_lo, ci_hi = np.percentile(boot_diffs, [2.5, 97.5])
print(f"\nBootstrap 95% CI on mean diff: [{ci_lo:.2f}, {ci_hi:.2f}]")
print("Significant?" , "YES" if ci_lo > 0 or ci_hi < 0 else "NO")
# Option 3: Log-transform if ratio is more meaningful than absolute diff
ctrl_log = np.log1p(ctrl)
trt_log = np.log1p(trt)
t_stat, p_log = stats.ttest_ind(trt_log, ctrl_log)
print(f"\nt-test on log(amount): p={p_log:.4f}")
- You capped transaction amounts at the 99th percentile to reduce outlier influence. What are the risks of this approach and how do you report it?
- You are running 10 metrics simultaneously in this A/B test. How do you control for false discoveries across all 10 tests?
PhonePe’s cashback experiment has: baseline daily transactions per user = 2.3, standard deviation = 4.1, minimum detectable effect = 5% lift (0.115 additional transactions/user/day). With ฮฑ = 0.05 and power = 0.80, how many users are needed per group? Compute this in Python and explain each parameter’s business meaning.
from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
# Parameters
baseline_mean = 2.3 # avg transactions/user/day in control
std_dev = 4.1 # std of daily transactions/user
mde = 0.115 # 5% lift = 0.05 * 2.3
effect_size = mde / std_dev # Cohen's d
# Sample size per group
n_per_group = analysis.solve_power(
effect_size = effect_size,
alpha = 0.05,
power = 0.80,
alternative = 'two-sided'
)
print(f"Cohen's d: {effect_size:.4f}")
print(f"Users needed per group: {int(n_per_group)+1:,}")
print(f"Total users: {(int(n_per_group)+1)*2:,}")
# Business interpretation
daily_active_users = 50_000_000 # PhonePe's ~50M DAU
exposure_pct = (n_per_group * 2) / daily_active_users
print(f"Exposure needed: {exposure_pct*100:.1f}% of DAU")
# If sample available โ estimate runtime
# Assume 30% of DAU eligible for cashback experiment
eligible_per_day = daily_active_users * 0.30
days_needed = (n_per_group * 2) / eligible_per_day
print(f"Days needed at 30% eligibility: {days_needed:.1f} days")
- If after running for 3 days the p-value is already 0.001, should you stop the experiment early? What is peeking and why is it a problem?
- How does the required sample size change if you use CUPED with a pre-experiment covariate that has r=0.6 with the outcome?
Round 3 โ PhonePe Data Scientist Interview Questions: ML & Fraud Detection (60 mins)
| Questions | 2 |
| Duration | 60 minutes |
| Difficulty | Hard |
| Key Topics | Fraud DetectionReal-time ScoringClass ImbalanceModel ExplanationFeature Engineering |
PhonePe processes 1.2 billion UPI transactions per month. Design a real-time fraud detection system that scores every transaction within 150ms of initiation, blocks high-confidence fraud, and sends medium-confidence cases through additional friction (OTP re-verification). The fraud rate is ~0.01% and false positive cost (blocking a legitimate transaction) is 10x more damaging to PhonePe than a false negative (missing fraud).
Architecture โ Two-Stage Scoring:
Stage 1 โ Rule Engine (<5ms): Hard-block transactions matching known fraud patterns without ML overhead:
- VPA (Virtual Payment Address) blacklist lookup from Redis (O(1), <1ms)
- Velocity rules: >5 transactions within 1 minute from same device ID
- Amount rules: first transaction to a new VPA >โน50,000 without prior relationship
- Geographic anomaly: device GPS in Mumbai, transaction VPA location in Rajasthan
Stage 2 โ ML Model (<120ms): LightGBM (fast inference, <10ms per record) scores transactions passing Stage 1.
Feature engineering (all computed in <100ms from Redis/pre-computed stores):
- User-level recency features: Count of unique VPAs transacted to in last 7 days, transaction amounts 30-day rolling avg and std (pre-computed, updated nightly)
- Session features: Transactions in current session, time since last transaction, device-user mismatch score
- Network features: Is the recipient VPA new (registered <7 days ago)? Has recipient VPA received any chargebacks?
- Behavioral biometrics: Typing speed on PIN entry, scroll pattern during transaction flow (engineered from app telemetry, served from feature store)
Training setup:
- Class imbalance: 0.01% fraud โ use SMOTE + class_weight=’balanced’ + optimize PR-AUC (not ROC-AUC, which is misleading at this imbalance ratio)
- Cost-sensitive learning: set
scale_pos_weight = 10in LightGBM (1 fraud miss costs 10x a false positive) - Threshold: set above 0.85 for hard-block; 0.50โ0.85 for friction; <0.50 for pass-through
Model explanation for compliance: Use SHAP values pre-computed for top 5 features per transaction. If a user disputes a block: “Your transaction was flagged because you sent to a VPA registered 2 days ago, this is 8x your usual transaction amount, and your device is new on this account.”
- Your model has 99.5% accuracy. Is that a useful metric for a 0.01% fraud rate? What do you report to the VP of Risk instead?
- A fraudster reverse-engineers your velocity rule (5 transactions/minute) and starts doing 4 per minute. How do you detect this adversarial adaptation without retraining?
๐ Recommended Resources to Crack PhonePe DS Interview
Used by 40,000+ data scientists to crack fintech and product company interviews
- 2,200+ questions ยท 23 topics
- ML, Statistics, SQL, Fraud Detection patterns
- Scenario-based questions with detailed answers
- 220+ questions ยท SQL advanced, data models, DBMS
- Window functions, CTEs, query optimization
- Covers fintech payment analytics patterns
4-Week Prep Plan: Crack PhonePe Data Scientist Interview
Build Foundation โ SQL, Statistics, Fintech Product Context
Master window functions, referral attribution SQL, and running balance patterns. Refresh statistics: CLT limitations, non-parametric tests, power analysis, Bayesian vs frequentist. Study PhonePe’s product: UPI payment flows, merchant categories, referral programs, cashback mechanics. Read TDM’s posts on DS interview frameworks for fintech.
2200 Most Asked Analytics Interview Questions โ DS, ML, Stats Sections
Cover statistics, ML, SQL, and case study chapters. Focus on class imbalance, A/B testing, experiment design, and fraud detection questions. Do 100+ questions per day. End of Week 2: First mock interview โ
Ace Any SQL Interview + ML System Design
Complete all 220+ questions in Ace Any SQL Interview. Practice 2 ML system design problems per day โ fraud detection, churn, recommendation. Walk through: data โ features โ model โ threshold โ monitoring โ regulatory compliance. End of Week 3: Second mock interview โ
Timed Practice + Final Revision
Revise first 5 chapters of 2200 ebook. Timed statistics: 20 min per question. Simulate PhonePe’s 4-round gauntlet. End of Week 4: Final mock interview โ