Home β€Ί Business Intelligence Engineer β€Ί Paytm BI Engineer Interview Questions and Answers 2026
Business Intelligence Engineer Case Study Company Interview Questions Interview Questions

Paytm BI Engineer Interview Questions and Answers 2026

If you are targeting a Paytm BI Engineer role, you need to be ready for a pipeline-heavy interview unlike standard DA roles. The Paytm BI Engineer Interview Questions test advanced SQL for payments data (failed transactions, wallet balances, UPI flows), data pipeline architecture (Spark, Hive, Airflow scheduling), dashboard design for regulatory and business reporting, and your ability to handle near-real-time fintech data at scale. This guide covers all three rounds with exact questions, complete answers, and the frameworks that Paytm interviewers look for.

Paytm
BI Engineer
Payments Analytics / Data Infrastructure
Advanced SQL
Python
Data Pipelines
Dashboard Design
Spark/Hive
Salary Range
β‚Ή12–22 LPA
2–5 Years Exp

Round 1 β€” Paytm BI Engineer Interview Questions: Advanced SQL (60 mins)

Advanced SQL Round (Payments Data Focus)
Paytm’s SQL round is heavier than a standard DA round β€” you will be expected to write multi-level CTEs, handle NULL-heavy fintech data, compute running balances, and build failure attribution reports. The interviewer specifically tests whether you understand the difference between payment attempts and successful settlements, and how to track wallet credits/debits accurately.
Questions 5
Duration 60 minutes
Difficulty Medium–Hard
Key Topics Running BalancesFailure AttributionUPI FunnelWindow FunctionsMerchant Analytics
Q1 β€” UPI Transaction Failure Attribution by Bank
Hard⏱ 20 min

The payments ops team wants to understand which banks are causing the most UPI failures. For each bank (identified by IFSC prefix) and failure reason, compute: total failures, failure rate vs total attempts, and consecutive failure streak β€” the maximum number of consecutive failed transactions for any single user on that bank in the last 7 days.

Tables:

  • upi_transactions(txn_id, user_id, bank_ifsc, amount, status, failure_reason, created_at)
🎯 What the interviewer tests
The failure attribution is straightforward; the consecutive streak is the trap. This requires a gaps-and-islands pattern: assign each transaction a group number by subtracting ROW_NUMBER() from a running count of failures β€” consecutive failures will share the same group. The interviewer wants to see if you know this pattern without a hint. Mention IFSC prefix extraction via LEFT(bank_ifsc, 4) to group banks.

WITH recent_txns AS (
    SELECT
        txn_id,
        user_id,
        LEFT(bank_ifsc, 4)   AS bank_prefix,
        status,
        failure_reason,
        created_at,
        ROW_NUMBER() OVER (PARTITION BY user_id, LEFT(bank_ifsc,4)
                           ORDER BY created_at)           AS rn_all,
        ROW_NUMBER() OVER (PARTITION BY user_id, LEFT(bank_ifsc,4),
                           CASE WHEN status='failed' THEN 1 ELSE 0 END
                           ORDER BY created_at)           AS rn_status
    FROM upi_transactions
    WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
),
failure_groups AS (
    SELECT
        txn_id, user_id, bank_prefix, status, failure_reason, created_at,
        CASE WHEN status = 'failed'
             THEN rn_all - rn_status END           AS grp_id
    FROM recent_txns
    WHERE status = 'failed'
),
streaks AS (
    SELECT
        user_id, bank_prefix,
        grp_id,
        COUNT(*) AS streak_len
    FROM failure_groups
    GROUP BY user_id, bank_prefix, grp_id
),
max_streaks AS (
    SELECT bank_prefix, MAX(streak_len) AS max_streak
    FROM streaks
    GROUP BY bank_prefix
),
failure_summary AS (
    SELECT
        LEFT(bank_ifsc, 4)   AS bank_prefix,
        failure_reason,
        COUNT(*)                                                  AS total_failures,
        COUNT(*) * 1.0 / NULLIF(SUM(COUNT(*)) OVER
            (PARTITION BY LEFT(bank_ifsc,4)), 0) * 100           AS failure_rate_pct
    FROM upi_transactions
    WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY LEFT(bank_ifsc, 4), failure_reason
)
SELECT
    fs.bank_prefix,
    fs.failure_reason,
    fs.total_failures,
    ROUND(fs.failure_rate_pct, 1)  AS failure_rate_pct,
    ms.max_streak                  AS max_consecutive_failures
FROM failure_summary fs
LEFT JOIN max_streaks ms USING (bank_prefix)
ORDER BY fs.total_failures DESC;
πŸ’‘ Key insight
The gaps-and-islands trick: rn_all - rn_status gives a stable group ID for consecutive runs of the same status. Within a failure group, all rows share the same grp_id even if their absolute row numbers differ. NULLIF(SUM(COUNT(*)) OVER PARTITION BY bank_prefix, 0) computes total attempts per bank (not just failures) as the denominator for failure rate.
πŸ” Common Follow-Up Questions
  • If a bank has a 40% failure rate on UPI but it’s only 2% of Paytm’s transaction volume, how do you decide whether to escalate to the bank’s API team?
  • How would you build a real-time alert when any bank’s 15-minute rolling failure rate exceeds 25%? What infrastructure would you use?
Q2 β€” Wallet Running Balance: Detect Negative Balance Events
Medium⏱ 15 min

The risk team reports that some users had a negative wallet balance at certain points (a system bug). Compute each user’s running wallet balance after every transaction (credits positive, debits negative) and flag the first transaction that caused a negative balance for each user.

Table:

  • wallet_transactions(txn_id, user_id, txn_type, amount, created_at)
  • txn_type: 'credit' or 'debit'
🎯 What the interviewer tests
Running balance with SUM() OVER (PARTITION BY user_id ORDER BY created_at ROWS UNBOUNDED PRECEDING) is the core technique. The interviewer checks whether you handle txn_type correctly β€” debits must be sign-flipped. The “first negative” flag requires a secondary filter: WHERE running_balance < 0 followed by ROW_NUMBER() = 1 per user to get only the first offending event.

WITH signed_txns AS (
    SELECT
        txn_id,
        user_id,
        created_at,
        txn_type,
        amount,
        CASE WHEN txn_type = 'credit'  THEN  amount
             WHEN txn_type = 'debit'   THEN -amount
        END  AS signed_amount
    FROM wallet_transactions
),
running AS (
    SELECT
        txn_id,
        user_id,
        txn_type,
        amount,
        created_at,
        SUM(signed_amount) OVER (
            PARTITION BY user_id
            ORDER BY created_at
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        )  AS running_balance
    FROM signed_txns
),
first_negative AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS rn
    FROM running
    WHERE running_balance < 0
)
SELECT
    user_id,
    txn_id                           AS first_negative_txn,
    txn_type,
    amount,
    created_at,
    running_balance                  AS balance_after_txn
FROM first_negative
WHERE rn = 1
ORDER BY created_at;
πŸ’‘ Key insight
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is explicit and avoids subtle bugs with duplicate timestamps. If two transactions share the same created_at, add a tie-breaker: ORDER BY created_at, txn_id. At Paytm, this type of audit query is used post-incident to identify the exact debit transaction that triggered the race condition in the wallet service.
πŸ” Common Follow-Up Questions
  • If the same user had 3 separate negative balance events across different dates, how do you report all 3 with the balance before and after each?
  • The wallet table has 500 million rows. How would you optimize this query β€” partitioning strategy, indexing, or materialized views?
Q3 β€” Merchant GMV Rank with Category Percentile
Medium⏱ 15 min

The merchant analytics team wants a report showing each merchant’s GMV for last month, their rank within their category, and their GMV percentile within that category. Also flag merchants who dropped from the top quartile last month to below the median this month.

Tables:

  • merchant_transactions(txn_id, merchant_id, category, gmv, txn_date, status)

WITH monthly_gmv AS (
    SELECT
        merchant_id,
        category,
        DATE_TRUNC('month', txn_date)  AS month,
        SUM(gmv)                       AS total_gmv
    FROM merchant_transactions
    WHERE status = 'success'
      AND txn_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '2 months')
    GROUP BY merchant_id, category, DATE_TRUNC('month', txn_date)
),
ranked AS (
    SELECT
        merchant_id,
        category,
        month,
        total_gmv,
        RANK()         OVER (PARTITION BY category, month ORDER BY total_gmv DESC)  AS gmv_rank,
        PERCENT_RANK() OVER (PARTITION BY category, month ORDER BY total_gmv)       AS gmv_percentile
    FROM monthly_gmv
),
curr  AS (SELECT * FROM ranked WHERE month = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')),
prev  AS (SELECT * FROM ranked WHERE month = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '2 months'))
SELECT
    c.merchant_id,
    c.category,
    c.total_gmv,
    c.gmv_rank,
    ROUND(c.gmv_percentile * 100, 1)            AS gmv_percentile_pct,
    CASE
        WHEN p.gmv_percentile >= 0.75
         AND c.gmv_percentile < 0.50 THEN 'DROPPED_FROM_TOP_QUARTILE'
        ELSE 'STABLE'
    END  AS status_flag
FROM curr c
LEFT JOIN prev p USING (merchant_id, category)
ORDER BY c.category, c.gmv_rank;
πŸ’‘ Key insight
PERCENT_RANK() returns 0 for the lowest value and 1 for the highest. So gmv_percentile >= 0.75 means top quartile last month. The LEFT JOIN preserves merchants who are new this month (no prev record) β€” they get NULL for prev percentile and default to STABLE in the CASE. Make sure to filter status = 'success' β€” Paytm’s payment tables include pending and failed rows that inflate GMV if unfiltered.
πŸ” Common Follow-Up Questions
  • How would you build a Metabase or Superset dashboard that shows this merchant ranking table and auto-refreshes daily from the data pipeline?
  • A merchant operates across two categories β€” how do you report their cross-category total GMV while preserving category-level ranking?


Round 2 β€” Paytm BI Engineer Interview Questions: Pipeline Design & Architecture (45 mins)

Data Pipeline & BI Architecture Round
This round tests whether you can design end-to-end data pipelines for fintech workloads β€” ingestion, transformation, scheduling, and monitoring. Paytm processes hundreds of millions of transactions daily, so interviewers specifically probe your understanding of partitioning, incremental loads, data quality checks, and SLA monitoring for regulatory dashboards.
Questions 3
Duration 45 minutes
Difficulty Medium
Key Topics Airflow DAGsIncremental LoadPartitioningData QualityReal-time vs Batch
Q4 β€” Design a Daily Payments KPI Pipeline for Finance Reporting
Medium⏱ 25 min

The Finance team needs a daily dashboard showing: total transaction volume (success/failure by category), net settlement amount, and top 10 UPI failure reasons β€” all refreshed by 7 AM IST from yesterday’s data. Design the complete pipeline from raw data to dashboard.

🎯 What the interviewer tests
The interviewer evaluates four things: (1) Do you separate raw, staging, and presentation layers? (2) Do you think about incremental loads β€” not full scans every day? (3) Do you add data quality gates before dashboard refresh? (4) Do you understand SLA implications β€” if the pipeline fails at 6:55 AM, what’s the recovery plan? Many candidates jump straight to “write a Spark job” without thinking about the full pipeline lifecycle.

Layer 1 β€” Raw Ingestion (12 AM – 2 AM):

  • Kafka consumers write payment events to HDFS/S3 raw zone, partitioned by dt=YYYY-MM-DD/hr=HH
  • No transformation β€” raw events preserved for audit and replay
  • Airflow DAG: ingest_payments_raw β€” hourly, checks last Kafka offset committed per partition

Layer 2 β€” Staging / Transformation (2 AM – 4 AM):

  • PySpark job reads yesterday’s raw partition, applies cleaning: dedup on txn_id, correct IFSC format, cast amount to decimal
  • Write to staging Hive table: stg_payments partitioned by dt
  • Data quality checks (Great Expectations or custom):
    • Row count vs Kafka offset count: <0.1% variance allowed
    • NULL rate on bank_ifsc ≀ 2% (more means upstream issue)
    • Settlement amount sign consistency: all debits negative

Layer 3 β€” Aggregation / Presentation (4 AM – 5:30 AM):

  • Hive/SparkSQL job computes KPI aggregates β†’ writes to rpt_daily_payments_kpi
  • Table materialized in the BI layer (Redshift or BigQuery) for fast dashboard queries

Layer 4 β€” Dashboard Refresh (5:30 AM – 6:30 AM):

  • Superset/Metabase scheduled refresh triggered by Airflow sensor on KPI table write timestamp
  • Finance team alert: Slack notification at 7 AM confirming dashboard ready

SLA breach handling: If staging fails, Airflow retries 3x with 10-minute backoff. If still failing at 6 AM, PagerDuty alert to on-call BI engineer. Dashboard shows “Data as of D-2” with a banner until D-1 data is backfilled.

πŸ” Common Follow-Up Questions
  • A batch from 2:15 AM arrived late at 4:30 AM. How does your pipeline detect and re-process the late data without re-running the full day?
  • The Finance team wants the dashboard refreshed every 15 minutes during market hours (9 AM – 5 PM). How does your architecture change?


πŸ“š Recommended Resources to Crack Paytm BI Engineer Interview

Used by 40,000+ analysts to crack fintech and product company interviews

⭐ Bestseller

2200 Most Asked Analytics Interview Questions
  • 2,200+ questions Β· 23 topics
  • SQL, Python, Data Engineering, BI Design
  • Fintech-specific scenario questions
For: BI Engineers targeting fintech and product companies
β‚Ή1,999
β‚Ή7,999

Buy Now

πŸ”₯ SQL Focused

Ace Any SQL Interview
  • 220+ questions Β· SQL advanced, data models, DBMS
  • Window functions, CTEs, query optimization
  • Covers fintech payment table patterns
For: anyone who wants to crack any SQL round
β‚Ή799

Buy Now

4-Week Prep Plan: Crack Paytm BI Engineer Interview

WEEK 1

SQL Foundation β€” Window Functions + Pipeline Basics

Master running totals, gaps-and-islands, percentile functions, and multi-level CTEs. Learn Airflow DAG design basics. Study Paytm’s product model: UPI, wallet, merchant payments, KYC. Read TDM posts on BI engineer interview patterns for fintech.

WEEK 2

2200 Most Asked Analytics Interview Questions β€” SQL & DE Sections

Cover SQL, Python, data engineering, and BI design chapters. Focus on payment-domain questions (transaction success rates, rolling averages, funnel analysis). Do 100+ questions per day. End of Week 2: First mock interview β†’

WEEK 3

Ace Any SQL Interview + Pipeline Architecture Practice

Complete all 220+ questions in Ace Any SQL Interview. Practice 2 pipeline design questions per day β€” walk through: ingestion β†’ staging β†’ aggregation β†’ dashboard β†’ SLA. Study Spark partitioning, Hive optimization, and Airflow retry patterns. End of Week 3: Second mock interview β†’

WEEK 4

Timed Practice + Revision

Revise first 5 chapters of 2200 ebook. Timed SQL practice: 15 min per query. Simulate Paytm’s pipeline design round under 45-minute time pressure. End of Week 4: Final mock interview β†’

πŸ“… Book 1:1 mock interview β€” get expert feedback before the real thing: Book Mock Interview
~950 questions + 3 mock interviews = enough to crack any BI Engineer role at a fintech company. Or take the mentorship program below.

Scroll to Top