Home โ€บ Case Study โ€บ PhonePe Data Scientist Interview Questions and Answers 2026
Case Study Company Interview Questions Data Analyst Interview Questions

PhonePe Data Scientist Interview Questions and Answers 2026

If you are preparing for a PhonePe Data Scientist role, you are entering one of India’s most competitive DS interview processes. The PhonePe Data Scientist Interview Questions cover advanced statistics (Bayesian inference, CLT violations, non-parametric tests), fraud detection ML, UPI transaction pattern analysis in SQL, Python-based experimentation, and product case studies centered on India’s digital payments ecosystem. This guide covers all four rounds with real questions, complete answers, and the frameworks that PhonePe interviewers score on.

PhonePe
Data Scientist
Payments Intelligence / Fraud Detection / Growth
SQL
Python
Machine Learning
Statistics
Fraud Detection
Salary Range
โ‚น18โ€“32 LPA
2โ€“5 Years Exp

Round 1 โ€” PhonePe Data Scientist Interview Questions: SQL & Analytics (45 mins)

SQL Analytics Round (Payments Focus)
PhonePe’s SQL round tests your ability to analyze UPI transaction data โ€” user payment patterns, merchant acceptance rates, peak transaction detection, and referral program attribution. You must work with high-cardinality tables (300M+ daily UPI transactions) and explain how your queries handle scale. The interviewer probes your understanding of NULL handling, time-zone normalization, and filter order optimization.
Questions 4
Duration 45 minutes
Difficulty Mediumโ€“Hard
Key Topics UPI FunnelReferral AttributionMerchant Success RateWindow FunctionsCohort Analysis
Q1 โ€” UPI Referral Attribution: First-Touch vs Last-Touch
Hardโฑ 20 min

PhonePe runs a referral program where existing users earn cashback when they refer new users who complete their first UPI transaction. The marketing team disputes the attribution model: some new users were referred by multiple people before signing up. Compute referral earnings under both first-touch (credit the first referrer) and last-touch (credit the most recent referrer) models for last month, and show the difference in payout per referrer.

Tables:

  • referrals(referral_id, referrer_user_id, referred_user_id, referral_ts)
  • transactions(txn_id, user_id, amount, status, created_at)
  • Cashback: โ‚น100 per successful first transaction of a referred user
๐ŸŽฏ What the interviewer tests
First-touch vs last-touch attribution requires two separate window functions on the same referral table. Use ROW_NUMBER() OVER (PARTITION BY referred_user_id ORDER BY referral_ts ASC) = 1 for first-touch and ... ORDER BY referral_ts DESC) = 1 for last-touch. The trap: only referrals that preceded the first qualifying transaction should be eligible โ€” a referral sent AFTER the user’s first transaction is not a valid attribution source.

WITH first_txn AS (
    SELECT
        user_id,
        MIN(created_at)  AS first_txn_ts
    FROM transactions
    WHERE status = 'success'
      AND DATE_TRUNC('month', created_at) = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
    GROUP BY user_id
),
eligible_referrals AS (
    SELECT
        r.referral_id,
        r.referrer_user_id,
        r.referred_user_id,
        r.referral_ts,
        f.first_txn_ts,
        ROW_NUMBER() OVER (PARTITION BY r.referred_user_id
                           ORDER BY r.referral_ts ASC)  AS first_touch_rank,
        ROW_NUMBER() OVER (PARTITION BY r.referred_user_id
                           ORDER BY r.referral_ts DESC) AS last_touch_rank
    FROM referrals r
    JOIN first_txn f
        ON f.user_id = r.referred_user_id
       AND r.referral_ts < f.first_txn_ts   -- referral must precede first txn
),
first_touch AS (
    SELECT referrer_user_id, COUNT(*) * 100 AS first_touch_cashback
    FROM eligible_referrals
    WHERE first_touch_rank = 1
    GROUP BY referrer_user_id
),
last_touch AS (
    SELECT referrer_user_id, COUNT(*) * 100 AS last_touch_cashback
    FROM eligible_referrals
    WHERE last_touch_rank = 1
    GROUP BY referrer_user_id
)
SELECT
    COALESCE(ft.referrer_user_id, lt.referrer_user_id)  AS referrer_user_id,
    COALESCE(ft.first_touch_cashback, 0)                AS first_touch_cashback,
    COALESCE(lt.last_touch_cashback, 0)                 AS last_touch_cashback,
    COALESCE(ft.first_touch_cashback, 0)
        - COALESCE(lt.last_touch_cashback, 0)           AS cashback_diff
FROM first_touch ft
FULL OUTER JOIN last_touch lt USING (referrer_user_id)
ORDER BY ABS(cashback_diff) DESC;
๐Ÿ’ก Key insight
The r.referral_ts < f.first_txn_ts filter inside the JOIN is the critical eligibility gate โ€” referrals sent after the user already transacted are not valid. The FULL OUTER JOIN on the final aggregation ensures referrers who appear in only one model are still shown. The ABS(cashback_diff) sort surfaces the referrers most impacted by the attribution model choice โ€” these are the ones the finance team should scrutinize first.
๐Ÿ” Common Follow-Up Questions
  • PhonePe moves to a linear attribution model where cashback is split equally across all valid referrers. How does your query change?
  • The referral_ts and first_txn_ts are in different time zones (referral in UTC, transaction in IST). How do you normalize before comparison?
Q2 โ€” Merchant Payment Success Rate by City and Category
Mediumโฑ 15 min

The merchant success team wants to identify city+category combinations where the UPI payment success rate dropped by more than 5 percentage points compared to the same period last month. They also want to flag where the drop is driven specifically by bank_timeout failures (vs other failure reasons).

Tables:

  • merchant_payments(payment_id, merchant_id, city, category, status, failure_reason, amount, created_at)

WITH monthly AS (
    SELECT
        city,
        category,
        DATE_TRUNC('month', created_at)                               AS month,
        COUNT(*)                                                       AS total_attempts,
        SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END)           AS successes,
        SUM(CASE WHEN failure_reason = 'bank_timeout' THEN 1 ELSE 0 END) AS timeout_failures
    FROM merchant_payments
    WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '2 months')
    GROUP BY city, category, DATE_TRUNC('month', created_at)
),
curr AS (SELECT * FROM monthly WHERE month = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')),
prev AS (SELECT * FROM monthly WHERE month = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '2 months'))
SELECT
    c.city,
    c.category,
    ROUND(c.successes * 100.0 / NULLIF(c.total_attempts, 0), 1)    AS curr_success_rate,
    ROUND(p.successes * 100.0 / NULLIF(p.total_attempts, 0), 1)    AS prev_success_rate,
    ROUND(c.successes * 100.0 / NULLIF(c.total_attempts, 0)
        - p.successes * 100.0 / NULLIF(p.total_attempts, 0), 1)    AS rate_change_pp,
    ROUND(c.timeout_failures * 100.0 / NULLIF(c.total_attempts-c.successes, 0), 1)
                                                                    AS timeout_share_of_failures,
    CASE
        WHEN c.timeout_failures * 1.0 / NULLIF(c.total_attempts-c.successes, 0) > 0.5
        THEN 'BANK_TIMEOUT_DRIVEN' ELSE 'OTHER_FAILURES'
    END  AS failure_driver
FROM curr c
JOIN prev p USING (city, category)
WHERE c.successes * 100.0 / NULLIF(c.total_attempts, 0)
    - p.successes * 100.0 / NULLIF(p.total_attempts, 0) < -5
ORDER BY rate_change_pp;
๐Ÿ’ก Key insight
NULLIF(total_attempts - successes, 0) prevents division by zero when computing timeout share of failures โ€” if there are zero failures, the denominator is zero (which NULLIF catches). The CASE WHEN timeout_failures > 50% of failures THEN 'BANK_TIMEOUT_DRIVEN' flag lets the ops team immediately route the issue to the bank’s payment gateway team rather than spending time diagnosing internal Paytm server issues.
๐Ÿ” Common Follow-Up Questions
  • Some city+category combos have only 50 transactions this month โ€” is a 5 pp drop statistically significant? How do you add a minimum volume filter?
  • How would you build a real-time alert that fires when any city’s 30-minute rolling success rate drops below 85%?


Round 2 โ€” PhonePe Data Scientist Interview Questions: Statistics & Experimentation (50 mins)

Statistics & Hypothesis Testing Round
PhonePe’s statistics round goes deep โ€” they expect you to reason about distribution violations, non-parametric alternatives, Bayesian vs frequentist trade-offs, and experiment design for financial products where high-value outlier transactions can skew averages. This is where most candidates without a statistics background get eliminated.
Questions 4
Duration 50 minutes
Difficulty Hard
Key Topics Non-Parametric TestsPower AnalysisBayesian A/BOutlier HandlingMultiple Testing
Q3 โ€” Why t-test Fails for UPI Transaction Amounts: What to Use Instead
Hardโฑ 18 min

You are running an A/B test on a new UPI cashback feature. The success metric is average transaction amount. You run a two-sample t-test and get p = 0.03. Your senior data scientist says the t-test result is unreliable for this metric. Explain why, and show what to do instead in Python.

๐ŸŽฏ What the interviewer tests
UPI transaction amounts are highly right-skewed โ€” most transactions are โ‚น100โ€“โ‚น500 but a small fraction are โ‚น50,000+ merchant payments. The t-test assumes approximate normality (via CLT) but CLT convergence is very slow when the distribution is this heavy-tailed. The interviewer wants to hear: (1) t-test assumptions โ€” normality or large n; (2) the problem โ€” high kurtosis means CLT kicks in only at n > 10,000+; (3) the solution โ€” Mann-Whitney U test (non-parametric) or bootstrap confidence intervals.

import numpy as np
import pandas as pd
from scipy import stats

# Check distribution shape
ctrl = df[df['group']=='control']['txn_amount']
trt  = df[df['group']=='treatment']['txn_amount']

print("Skewness โ€” Control:", ctrl.skew(), "Treatment:", trt.skew())
print("Kurtosis โ€” Control:", ctrl.kurtosis(), "Treatment:", trt.kurtosis())

# Problem: highly skewed โ†’ t-test unreliable even at n=5000

# Option 1: Mann-Whitney U (compares distributions, not means)
stat, p_mw = stats.mannwhitneyu(trt, ctrl, alternative='two-sided')
print(f"\nMann-Whitney p-value: {p_mw:.4f}")
# Interprets: P(trt amount > ctrl amount) vs 0.5

# Option 2: Bootstrap CI on mean difference
n_boot = 10000
boot_diffs = []
for _ in range(n_boot):
    b_ctrl = ctrl.sample(len(ctrl), replace=True).mean()
    b_trt  = trt.sample(len(trt),  replace=True).mean()
    boot_diffs.append(b_trt - b_ctrl)

boot_diffs = np.array(boot_diffs)
ci_lo, ci_hi = np.percentile(boot_diffs, [2.5, 97.5])
print(f"\nBootstrap 95% CI on mean diff: [{ci_lo:.2f}, {ci_hi:.2f}]")
print("Significant?" , "YES" if ci_lo > 0 or ci_hi < 0 else "NO")

# Option 3: Log-transform if ratio is more meaningful than absolute diff
ctrl_log = np.log1p(ctrl)
trt_log  = np.log1p(trt)
t_stat, p_log = stats.ttest_ind(trt_log, ctrl_log)
print(f"\nt-test on log(amount): p={p_log:.4f}")
๐Ÿ’ก Key insight
The Mann-Whitney U test is distribution-free but tests for stochastic dominance, not mean equality โ€” so it answers “is one group’s distribution shifted higher?” rather than “is the mean different?”. Bootstrap CIs are the most interpretable option for business stakeholders because you can report “we estimate the cashback feature increased average transaction value by โ‚น45โ€“โ‚น120 with 95% confidence.” Log-transform works only if the team cares about percentage lift rather than absolute rupee increase.
๐Ÿ” Common Follow-Up Questions
  • You capped transaction amounts at the 99th percentile to reduce outlier influence. What are the risks of this approach and how do you report it?
  • You are running 10 metrics simultaneously in this A/B test. How do you control for false discoveries across all 10 tests?
Q4 โ€” Power Analysis: How Long to Run the Cashback Experiment
Hardโฑ 18 min

PhonePe’s cashback experiment has: baseline daily transactions per user = 2.3, standard deviation = 4.1, minimum detectable effect = 5% lift (0.115 additional transactions/user/day). With ฮฑ = 0.05 and power = 0.80, how many users are needed per group? Compute this in Python and explain each parameter’s business meaning.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Parameters
baseline_mean = 2.3          # avg transactions/user/day in control
std_dev       = 4.1          # std of daily transactions/user
mde           = 0.115        # 5% lift = 0.05 * 2.3
effect_size   = mde / std_dev  # Cohen's d

# Sample size per group
n_per_group = analysis.solve_power(
    effect_size = effect_size,
    alpha       = 0.05,
    power       = 0.80,
    alternative = 'two-sided'
)
print(f"Cohen's d: {effect_size:.4f}")
print(f"Users needed per group: {int(n_per_group)+1:,}")
print(f"Total users: {(int(n_per_group)+1)*2:,}")

# Business interpretation
daily_active_users = 50_000_000  # PhonePe's ~50M DAU
exposure_pct = (n_per_group * 2) / daily_active_users
print(f"Exposure needed: {exposure_pct*100:.1f}% of DAU")

# If sample available โ†’ estimate runtime
# Assume 30% of DAU eligible for cashback experiment
eligible_per_day = daily_active_users * 0.30
days_needed = (n_per_group * 2) / eligible_per_day
print(f"Days needed at 30% eligibility: {days_needed:.1f} days")
๐Ÿ’ก Key insight
With std_dev = 4.1 and MDE = 0.115, Cohen’s d โ‰ˆ 0.028 โ€” a very small effect size. This means you need a very large sample (typically 20,000+ per group). The interviewer expects you to flag this: “at PhonePe’s scale, we can run this in 1โ€“2 days if we expose 30% of DAU, but we should also consider whether 5% lift is the right MDE from a business ROI perspective โ€” if cashback costs โ‚น50/user and 5% lift = โ‚น12/user in additional GMV, that’s a negative ROI.” This business contextualization is what separates strong candidates from average ones.
๐Ÿ” Common Follow-Up Questions
  • If after running for 3 days the p-value is already 0.001, should you stop the experiment early? What is peeking and why is it a problem?
  • How does the required sample size change if you use CUPED with a pre-experiment covariate that has r=0.6 with the outcome?


Round 3 โ€” PhonePe Data Scientist Interview Questions: ML & Fraud Detection (60 mins)

Machine Learning & Fraud Intelligence Round
PhonePe’s ML round focuses on real-time fraud detection โ€” the toughest ML design problem in fintech. You will be asked to design systems that must make decisions in <200ms, handle extreme class imbalance (0.01% fraud rate), and operate without compromising legitimate user experience. Cost-asymmetric loss functions and model explanation for regulatory compliance are key themes.
Questions 2
Duration 60 minutes
Difficulty Hard
Key Topics Fraud DetectionReal-time ScoringClass ImbalanceModel ExplanationFeature Engineering
Q5 โ€” Design a Real-Time UPI Fraud Detection System
Hardโฑ 35 min

PhonePe processes 1.2 billion UPI transactions per month. Design a real-time fraud detection system that scores every transaction within 150ms of initiation, blocks high-confidence fraud, and sends medium-confidence cases through additional friction (OTP re-verification). The fraud rate is ~0.01% and false positive cost (blocking a legitimate transaction) is 10x more damaging to PhonePe than a false negative (missing fraud).

๐ŸŽฏ What the interviewer tests
This question tests multiple DS dimensions simultaneously: (1) Feature engineering for temporal patterns without batch lookup; (2) Model latency constraints โ€” not all models work at <150ms; (3) Class imbalance handling at 0.01% rate; (4) Asymmetric cost function โ€” 10:1 false positive penalty means you need a high threshold for blocking; (5) Regulatory explainability โ€” RBI requires fraud decisions to be explainable to users who dispute a block.

Architecture โ€” Two-Stage Scoring:

Stage 1 โ€” Rule Engine (<5ms): Hard-block transactions matching known fraud patterns without ML overhead:

  • VPA (Virtual Payment Address) blacklist lookup from Redis (O(1), <1ms)
  • Velocity rules: >5 transactions within 1 minute from same device ID
  • Amount rules: first transaction to a new VPA >โ‚น50,000 without prior relationship
  • Geographic anomaly: device GPS in Mumbai, transaction VPA location in Rajasthan

Stage 2 โ€” ML Model (<120ms): LightGBM (fast inference, <10ms per record) scores transactions passing Stage 1.

Feature engineering (all computed in <100ms from Redis/pre-computed stores):

  • User-level recency features: Count of unique VPAs transacted to in last 7 days, transaction amounts 30-day rolling avg and std (pre-computed, updated nightly)
  • Session features: Transactions in current session, time since last transaction, device-user mismatch score
  • Network features: Is the recipient VPA new (registered <7 days ago)? Has recipient VPA received any chargebacks?
  • Behavioral biometrics: Typing speed on PIN entry, scroll pattern during transaction flow (engineered from app telemetry, served from feature store)

Training setup:

  • Class imbalance: 0.01% fraud โ†’ use SMOTE + class_weight=’balanced’ + optimize PR-AUC (not ROC-AUC, which is misleading at this imbalance ratio)
  • Cost-sensitive learning: set scale_pos_weight = 10 in LightGBM (1 fraud miss costs 10x a false positive)
  • Threshold: set above 0.85 for hard-block; 0.50โ€“0.85 for friction; <0.50 for pass-through

Model explanation for compliance: Use SHAP values pre-computed for top 5 features per transaction. If a user disputes a block: “Your transaction was flagged because you sent to a VPA registered 2 days ago, this is 8x your usual transaction amount, and your device is new on this account.”

๐Ÿ’ก Key insight
The 10:1 false positive penalty directly translates to threshold design: even if the model outputs 30% fraud probability, you do NOT block unless the cost-adjusted expected loss of blocking is lower than the expected loss of passing. With 10:1 ratio, you need P(fraud) > 0.91 before hard-blocking (threshold = 10/(10+1)). At PhonePe, this is implemented as a loss-threshold lookup table updated weekly as fraud pattern shifts.
๐Ÿ” Common Follow-Up Questions
  • Your model has 99.5% accuracy. Is that a useful metric for a 0.01% fraud rate? What do you report to the VP of Risk instead?
  • A fraudster reverse-engineers your velocity rule (5 transactions/minute) and starts doing 4 per minute. How do you detect this adversarial adaptation without retraining?


๐Ÿ“š Recommended Resources to Crack PhonePe DS Interview

Used by 40,000+ data scientists to crack fintech and product company interviews

โญ Bestseller

2200 Most Asked Analytics Interview Questions
  • 2,200+ questions ยท 23 topics
  • ML, Statistics, SQL, Fraud Detection patterns
  • Scenario-based questions with detailed answers
For: anyone targeting DS roles at fintech or product companies
โ‚น1,999
โ‚น7,999

Buy Now

๐Ÿ”ฅ SQL Focused

Ace Any SQL Interview
  • 220+ questions ยท SQL advanced, data models, DBMS
  • Window functions, CTEs, query optimization
  • Covers fintech payment analytics patterns
For: anyone who wants to crack any SQL round
โ‚น799

Buy Now

4-Week Prep Plan: Crack PhonePe Data Scientist Interview

WEEK 1

Build Foundation โ€” SQL, Statistics, Fintech Product Context

Master window functions, referral attribution SQL, and running balance patterns. Refresh statistics: CLT limitations, non-parametric tests, power analysis, Bayesian vs frequentist. Study PhonePe’s product: UPI payment flows, merchant categories, referral programs, cashback mechanics. Read TDM’s posts on DS interview frameworks for fintech.

WEEK 2

2200 Most Asked Analytics Interview Questions โ€” DS, ML, Stats Sections

Cover statistics, ML, SQL, and case study chapters. Focus on class imbalance, A/B testing, experiment design, and fraud detection questions. Do 100+ questions per day. End of Week 2: First mock interview โ†’

WEEK 3

Ace Any SQL Interview + ML System Design

Complete all 220+ questions in Ace Any SQL Interview. Practice 2 ML system design problems per day โ€” fraud detection, churn, recommendation. Walk through: data โ†’ features โ†’ model โ†’ threshold โ†’ monitoring โ†’ regulatory compliance. End of Week 3: Second mock interview โ†’

WEEK 4

Timed Practice + Final Revision

Revise first 5 chapters of 2200 ebook. Timed statistics: 20 min per question. Simulate PhonePe’s 4-round gauntlet. End of Week 4: Final mock interview โ†’

๐Ÿ“… Book 1:1 mock โ€” expert feedback from practitioners who’ve cracked DS interviews at top fintech companies: Book Mock Interview
~950 questions + 3 mock interviews = enough to crack any DS role at a fintech company. Or take the mentorship program below.

Scroll to Top