Home β€Ί Case Study β€Ί Microsoft Senior Data Analyst Interview Questions and Answers...
Case Study Company Interview Questions Data Analyst Interview Questions

Microsoft Senior Data Analyst Interview Questions and Answers 2026

If you are targeting the Microsoft Senior Data Analyst role, this guide covers all four interview rounds β€” Advanced SQL, Python analytics, product case studies, and Power BI + Azure Synapse β€” with real questions, model answers, and exactly what each interviewer is evaluating. Questions are sourced from 2024–2026 interview experiences across Microsoft Bangalore.

Microsoft
Senior Data Analyst
Product Analytics / Business Intelligence
T-SQL
Power BI
Python
Azure Synapse
DAX
Excel
Salary Range
β‚Ή40–60 LPA
5–8 Years Exp

Round 1 β€” Advanced SQL

Advanced SQL Round (T-SQL focused)
Microsoft uses T-SQL in production (Azure SQL, SQL Server). Expect syntax differences from PostgreSQL β€” TOP N instead of LIMIT, DATEADD, GETDATE(). The interviewer will probe query plan awareness and indexing knowledge.
Questions 5
Duration 60 minutes
Difficulty Medium–Hard
Key Topics Window FunctionsSelf-JoinRolling AveragesGap DetectionT-SQL Syntax
Q1 β€” 7-Day Rolling Average of Daily Active Users
Medium⏱ 12 min

The product team wants a smoothed view of daily engagement for Microsoft Teams. Calculate the 7-day rolling average of daily active users to remove weekend dips.

Table: user_activity(activity_date, user_id)

🎯 What the interviewer tests
Do you know ROWS BETWEEN vs RANGE BETWEEN? Most candidates write a subquery loop β€” the window function approach is 10Γ— faster. They will also ask: “What does the result look like on the first 6 days?”

WITH daily_dau AS (
    SELECT
        activity_date,
        COUNT(DISTINCT user_id) AS dau
    FROM user_activity
    GROUP BY activity_date
)
SELECT
    activity_date,
    dau,
    AVG(dau) OVER (
        ORDER BY activity_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7d_avg
FROM daily_dau
ORDER BY activity_date;
πŸ’‘ Key insight
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW includes exactly 7 rows (today + 6 prior days) regardless of gaps. Use ROWS not RANGE here β€” RANGE compares values, not row positions, which gives wrong results if dates are missing.
πŸ” Common Follow-Up Questions
  • The first 6 days will have averages based on fewer than 7 data points β€” is that acceptable? How would you handle it?
  • How would you calculate a 28-day rolling average alongside the 7-day one in the same query?
  • How would you adapt this for a partitioned approach (per product, per region)?

Q2 β€” Employees Earning More Than Their Manager
Easy⏱ 8 min

A classic Microsoft interview question β€” find all employees whose salary exceeds their direct manager’s salary.

Table: employees(employee_id, name, salary, manager_id)

🎯 What the interviewer tests
Self-join pattern. They watch whether you use an inner join (misses employees with no manager) vs left join. They will also ask: “What about a manager who earns more than their skip-level manager?”

SELECT
    e.name         AS employee_name,
    e.salary       AS employee_salary,
    m.name         AS manager_name,
    m.salary       AS manager_salary
FROM employees e
JOIN employees m
    ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;
πŸ’‘ Key insight
Self-join β€” alias the same table as e (employee) and m (manager), then join on manager_id = employee_id. Use INNER JOIN here intentionally β€” employees with no manager (CEO) don’t have a comparison to make.


πŸ“š Recommended Resources to Crack Microsoft Interview
BESTSELLER
2200 Most Asked Analytics Interview Questions
β‚Ή1,999 β‚Ή7,999
  • 500+ SQL questions with solutions
  • 300+ Python & stats questions
  • Case study frameworks with 120+ examples
  • Covers all top product companies
Best for: Full interview prep in one place

Get the Book β†’

SQL FOCUSED
550 SQL Interview Questions to Crack Any Analytics Interview
β‚Ή499 β‚Ή1,999
  • 550 SQL questions across all difficulty levels
  • T-SQL specific patterns & syntax
  • Window functions, recursive CTEs, gaps
  • Microsoft & enterprise SQL patterns
Best for: T-SQL heavy rounds like Round 1

Get the Book β†’

Q3 β€” Month-over-Month Revenue Growth %
Medium⏱ 14 min

The finance team wants a month-by-month revenue report showing absolute growth and percentage change vs the prior month.

Table: orders(order_date DATE, revenue DECIMAL)

WITH monthly_rev AS (
    SELECT
        DATEADD(MONTH, DATEDIFF(MONTH, 0, order_date), 0) AS month_start,
        SUM(revenue) AS total_revenue
    FROM orders
    GROUP BY DATEADD(MONTH, DATEDIFF(MONTH, 0, order_date), 0)
)
SELECT
    month_start,
    total_revenue,
    LAG(total_revenue) OVER (ORDER BY month_start) AS prev_month_revenue,
    ROUND(
        100.0 * (total_revenue - LAG(total_revenue) OVER (ORDER BY month_start))
              / NULLIF(LAG(total_revenue) OVER (ORDER BY month_start), 0),
        1
    ) AS mom_growth_pct
FROM monthly_rev
ORDER BY month_start;
πŸ’‘ Key insight
NULLIF(..., 0) prevents division-by-zero when prior month revenue is zero. DATEDIFF(MONTH, 0, order_date) is the T-SQL idiom for month-start truncation β€” equivalent to DATE_TRUNC('month', ...) in PostgreSQL.
πŸ” Common Follow-Up Questions
  • How would you show a 3-month moving average of growth % alongside the monthly figure?
  • If a month has zero revenue (e.g. holiday shutdown), how should the % growth be displayed?

Q4 β€” Find Missing Dates in a Time Series
Hard⏱ 20 min

The telemetry pipeline occasionally misses days. Identify all dates between the first and last record date that have no data in the daily_metrics table.

Table: daily_metrics(metric_date DATE, value INT)

🎯 What the interviewer tests
Can you generate a date series in T-SQL? Most candidates are stumped here because T-SQL has no generate_series(). The recursive CTE approach is the expected answer.

WITH date_range AS (
    SELECT MIN(metric_date) AS start_date,
           MAX(metric_date) AS end_date
    FROM daily_metrics
),
all_dates AS (
    -- Recursive CTE to generate every date in range
    SELECT start_date AS dt FROM date_range
    UNION ALL
    SELECT DATEADD(DAY, 1, dt)
    FROM all_dates
    WHERE dt < (SELECT end_date FROM date_range)
)
SELECT a.dt AS missing_date
FROM all_dates a
LEFT JOIN daily_metrics d ON a.dt = d.metric_date
WHERE d.metric_date IS NULL
OPTION (MAXRECURSION 3650);  -- allow up to 10 years
πŸ’‘ Key insight
Recursive CTEs are the T-SQL way to generate sequences. The OPTION (MAXRECURSION 3650) hint is required β€” SQL Server defaults to 100 recursion levels, which only covers ~3 months. Always mention this to the interviewer β€” it shows production awareness.
πŸ” Common Follow-Up Questions
  • How would you modify this to find gaps longer than 3 consecutive missing days?
  • Could you use a Numbers table instead of a recursive CTE? Which is faster at scale?

Q5 β€” Identify Power Users (Top 10% by Engagement)
Medium⏱ 12 min

The growth team wants to identify “power users” β€” defined as the top 10% of users by total session time in the past 30 days. Used for targeted feature rollout and referral campaigns.

Table: sessions(user_id, session_date DATE, duration_seconds INT)

WITH user_totals AS (
    SELECT
        user_id,
        SUM(duration_seconds) AS total_seconds,
        NTILE(10) OVER (ORDER BY SUM(duration_seconds) DESC) AS decile
    FROM sessions
    WHERE session_date >= DATEADD(DAY, -30, GETDATE())
    GROUP BY user_id
)
SELECT
    user_id,
    total_seconds,
    ROUND(total_seconds / 3600.0, 1) AS total_hours
FROM user_totals
WHERE decile = 1
ORDER BY total_seconds DESC;
πŸ’‘ Key insight
NTILE(10) divides users into 10 equal buckets β€” decile 1 is the top 10%. This is cleaner than computing a manual percentile cutoff. GETDATE() is the T-SQL equivalent of CURRENT_DATE in standard SQL.


🎯 Crack Microsoft With The Data Monk Mentorship
Starter Bundle
β‚Ή2,999
eBooks + email support + interview checklist

Get Started β†’

Elite Bundle
β‚Ή9,999
Everything + 1:1 mentorship + job referral network

Get Started β†’


Round 2 β€” Python Analytics

Python + Statistical Analysis
Microsoft Senior DA interviews lean toward statistical thinking alongside coding. Expect questions on A/B testing, RFM analysis, time series, and data wrangling. You will be judged on whether you validate assumptions before computing results.
Questions 4
Duration 45 minutes
Difficulty Medium–Hard
Key Topics PandasA/B TestingRFM AnalysisJSON ParsingStatistical Validation
Q1 β€” RFM Customer Segmentation
Medium⏱ 15 min

Segment customers into Champions, At-Risk, and Churned using Recency, Frequency, and Monetary scores. Microsoft uses this for Azure subscription renewals and Office 365 upsell targeting.

🎯 What the interviewer tests
Whether you understand what RFM means in a business context β€” not just the code. They will ask: “What would you do differently for a B2B customer vs a consumer?” That distinction matters at Microsoft.

import pandas as pd
from datetime import datetime

snapshot_date = datetime(2025, 5, 1)

rfm = (
    df.groupby("customer_id").agg(
        recency   = ("order_date", lambda x: (snapshot_date - x.max()).days),
        frequency = ("order_id",   "nunique"),
        monetary  = ("revenue",    "sum")
    ).reset_index()
)

# Score each dimension 1-5 (5 = best)
rfm["r_score"] = pd.qcut(rfm["recency"],   5, labels=[5,4,3,2,1])
rfm["f_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1,2,3,4,5])
rfm["m_score"] = pd.qcut(rfm["monetary"],  5, labels=[1,2,3,4,5])

rfm["rfm_score"] = (
    rfm["r_score"].astype(int)
  + rfm["f_score"].astype(int)
  + rfm["m_score"].astype(int)
)

def segment(score):
    if score >= 13: return "Champion"
    if score >= 9:  return "Loyal"
    if score >= 6:  return "At Risk"
    return "Churned"

rfm["segment"] = rfm["rfm_score"].apply(segment)
print(rfm["segment"].value_counts())
πŸ’‘ Key insight
pd.qcut creates equal-frequency bins β€” better than equal-width bins for skewed revenue data. Recency is inverted (lower days = better = score 5). Use rank(method="first") on frequency to handle ties in qcut.

Q2 β€” A/B Test: Is the Conversion Lift Statistically Significant?
Hard⏱ 18 min

A new Teams onboarding flow (variant B) shows 5.2% conversion vs control’s 4.8%. With 10,000 users per group, is this lift statistically significant at 95% confidence?

🎯 What the interviewer tests
This is the most important question for a Senior DA at Microsoft. They want to see: (1) you know to run a two-proportion z-test, (2) you check sample size before trusting the result, (3) you can interpret a p-value for a non-technical audience.

from statsmodels.stats.proportion import proportions_ztest
import numpy as np

n_control  = 10_000
n_variant  = 10_000
conv_ctrl  = 0.048  # 4.8%
conv_var   = 0.052  # 5.2%

count = np.array([conv_var * n_variant, conv_ctrl * n_control])
nobs  = np.array([n_variant, n_control])

z_stat, p_value = proportions_ztest(count, nobs, alternative='larger')

print(f"Z-statistic : {z_stat:.3f}")
print(f"P-value     : {p_value:.4f}")
print(f"Significant : {'Yes βœ“' if p_value < 0.05 else 'No βœ—'}")

# Minimum detectable effect sanity check
lift = (conv_var - conv_ctrl) / conv_ctrl * 100
print(f"Relative lift: {lift:.1f}%")
πŸ’‘ Key insight
A 5.2% vs 4.8% lift (8.3% relative) with 10K users per group gives a p-value β‰ˆ 0.046 β€” just barely significant at 95%. Always report the relative lift (not absolute) to stakeholders. Mention Type I/II error tradeoff β€” the interviewer expects statistical maturity at senior level.
πŸ” Common Follow-Up Questions
  • How would you determine the minimum sample size needed before launching the test?
  • What if the test ran for only 3 days during a holiday week β€” would you trust the results?
  • How would you explain this result to a VP who doesn’t know what a p-value is?

Q3 β€” Parse Nested JSON from Microsoft Graph API
Medium⏱ 12 min

Microsoft Graph API returns deeply nested JSON. Flatten the response into a clean DataFrame with one row per user event.

import pandas as pd
import json

# Simulated Graph API response
raw = '''
{
  "value": [
    {"userId": "u1", "displayName": "Alice",
     "activities": [{"type": "login", "ts": "2025-05-01"},
                    {"type": "file_edit", "ts": "2025-05-02"}]},
    {"userId": "u2", "displayName": "Bob",
     "activities": [{"type": "login", "ts": "2025-05-01"}]}
  ]
}'''

data = json.loads(raw)

rows = []
for user in data["value"]:
    for activity in user.get("activities", []):
        rows.append({
            "user_id"    : user["userId"],
            "name"       : user["displayName"],
            "event_type" : activity["type"],
            "event_ts"   : activity["ts"],
        })

df = pd.DataFrame(rows)
df["event_ts"] = pd.to_datetime(df["event_ts"])
print(df)
πŸ’‘ Key insight
Avoid pd.json_normalize for deeply nested structures β€” a manual loop gives more control and handles missing keys gracefully. Always parse timestamps with pd.to_datetime() immediately β€” raw strings cause bugs in every downstream operation.

Q4 β€” Feature Adoption Funnel Analysis
Easy⏱ 8 min

Calculate step-by-step funnel conversion rates for a Teams feature rollout. Steps: Invited β†’ Clicked β†’ Activated β†’ Retained (used again in 7 days).

funnel = df.groupby("funnel_step")["user_id"].nunique().reset_index()
funnel.columns = ["step", "users"]

# Order steps manually
step_order = ["Invited", "Clicked", "Activated", "Retained"]
funnel["step"] = pd.Categorical(funnel["step"], categories=step_order, ordered=True)
funnel = funnel.sort_values("step").reset_index(drop=True)

top = funnel.loc[0, "users"]
funnel["pct_of_top"]  = (funnel["users"] / top * 100).round(1)
funnel["step_dropoff"] = funnel["users"].pct_change().mul(100).round(1)

print(funnel[["step","users","pct_of_top","step_dropoff"]])
πŸ’‘ Key insight
Always show both absolute-to-top conversion AND step-to-step drop-off β€” PMs need both to prioritise where to fix the funnel. pd.Categorical with ordered=True ensures steps sort correctly regardless of alphabetical order.


Round 3 β€” Product Analytics Case Study

Product Analytics & Metric Design
Microsoft interviews heavily test product intuition. You are expected to define metrics, design experiments, diagnose anomalies, and communicate findings as if presenting to a General Manager. Structure your answer before diving into analysis β€” use the MECE framework.
Questions 1–2
Duration 45 minutes
Difficulty Hard
Key Topics Metric DesignRoot Cause AnalysisMECE FrameworkExperiment Design
Q1 β€” Bing Search Volume Dropped 15% Last Week. Diagnose It.
Hard⏱ 25 min
🎯 What the interviewer tests
This is a real Microsoft interview question reported across 2024 Bangalore interviews. They score you on: (1) segmentation before hypothesising, (2) knowing which internal data to pull first, (3) not jumping to “it’s a product bug” without ruling out external factors.

Use the MECE framework β€” structure the problem before touching data:

πŸ—‚οΈ MECE Analysis β€” Bing 15% Drop
External Factors (Not Microsoft)
Holiday / week with fewer working days Β· Major news event reducing commercial intent Β· Google launched competing feature Β· Seasonal trend (compare same week last year)
Internal / Product Factors
Feature flag deployment changed ranking Β· CDN issue in specific region Β· Autocomplete regression Β· New ad layout reducing organic searches Β· Edge browser update affecting default search
Segment: Where Is the Drop?
By region (US vs EU vs Asia) Β· By device (desktop vs mobile vs API) Β· By search type (web / image / news) Β· By user cohort (logged-in vs anonymous) Β· By query category (informational vs transactional)
Data Pulls to Validate
Compare same week last year (YoY) Β· Check query volume by entry point Β· Pull error rate and latency by region Β· Review recent deployment log Β· Check bounce rate and zero-results rate trend
πŸ’‘ Key insight
Always start with “Is this a real drop or a measurement issue?” β€” check if logging changed. Then segment by the most likely partition (region, device) before hypothesising. At Microsoft, they expect you to say “I’d first pull YoY for the same calendar week” because Bing has strong seasonal patterns around holidays.
πŸ” Common Follow-Up Questions
  • The drop is concentrated in mobile in India β€” what does that suggest and what would you investigate next?
  • How would you set up an ongoing alerting system so this is caught within 2 hours instead of a week?
  • How would you present your findings in a 5-minute slot to the Bing GM?


Round 4 β€” Power BI, DAX and Azure Synapse

BI Tooling + Cloud Data Warehouse
This round is unique to Microsoft DA roles. Interviewers expect hands-on Power BI experience β€” not just awareness. DAX measure design, DirectQuery vs Import Mode tradeoffs, and Azure Synapse architecture are all fair game. Production experience is the differentiator.
Questions 3
Duration 60 minutes
Difficulty Medium–Hard
Key Topics Power BI / DAXDirectQuery vs ImportAzure SynapseStar Schema
Q1 β€” DAX: Calculated Measure vs Calculated Column
Medium⏱ 14 min

When would you use a DAX calculated measure vs a calculated column in Power BI? Write both versions for “Revenue Per User” and explain the performance difference.

🎯 What the interviewer tests
This is the most common Power BI question at Microsoft. Candidates who say “they’re the same” immediately get marked down. The interviewer will push on storage, refresh cost, and filter context.

-- CALCULATED COLUMN (row context, stored in model)
-- Adds a physical column to the table β€” computed at refresh time
Revenue Per User Column =
    DIVIDE(Sales[revenue], Sales[user_count], 0)


-- CALCULATED MEASURE (filter context, computed at query time)
-- Evaluated dynamically based on slicer/visual filters
Revenue Per User =
    DIVIDE(
        SUM(Sales[revenue]),
        DISTINCTCOUNT(Sales[user_id]),
        0
    )
Calculated Column Calculated Measure
Computed At data refresh At query time
Stored in Model (increases file size) Not stored
Context Row context (per row) Filter context (per visual)
Use when Filtering/slicing by value Aggregating across visuals
Performance Slower refresh, fast query Fast refresh, computed live
πŸ’‘ Key insight
Default to measures for aggregations β€” they respond to filter context (slicers, cross-filtering) correctly. Use calculated columns only when you need to filter or sort by the derived value. Columns inflate model size significantly on large fact tables.

Q2 β€” Power BI: Import Mode vs DirectQuery
Medium⏱ 12 min

Your Power BI dashboard serves 300 daily users on top of Azure Synapse. The dashboard data updates every 4 hours. Which connection mode would you choose and why?

Factor Import Mode DirectQuery
Data freshness Scheduled refresh (min 30 min) Real-time (live query)
Performance Sub-second (in-memory VertiPaq) Depends on Synapse performance
Data size limit 1 GB compressed (Premium: larger) No limit
Synapse cost One refresh query per cycle 300 users Γ— queries per session
DAX features Full DAX support Some DAX functions restricted
Best for Historical, high-traffic dashboards Real-time ops, very large data
πŸ’‘ Key insight
For 300 users + 4-hour refresh cadence β†’ Import Mode. It costs one Synapse query per 4 hours vs 300 Γ— N live queries per day with DirectQuery. At β‚Ή40–60 LPA senior level, the interviewer expects you to quantify the cost tradeoff, not just state the answer.

Q3 β€” Design a Star Schema for Office 365 Usage Analytics
Hard⏱ 20 min

Design a data warehouse schema to track daily Office 365 feature usage (Word, Excel, Teams, Outlook) across tenants, enabling product adoption analytics.

🎯 What the interviewer tests
Whether you default to a star schema (correct) vs a normalised 3NF schema (wrong for analytics). They will ask about grain, slowly changing dimensions, and how you’d handle tenant-level privacy constraints.

-- FACT TABLE (grain = one row per user per product per day)
fact_usage (
    usage_id      BIGINT         PRIMARY KEY,
    date_key      INT            FK β†’ dim_date,
    user_key      INT            FK β†’ dim_user,
    product_key   INT            FK β†’ dim_product,
    tenant_key    INT            FK β†’ dim_tenant,
    sessions      INT,
    active_minutes INT,
    files_created INT,
    files_shared  INT
)

-- DIMENSION: Date
dim_date (date_key, full_date, year, quarter, month, week, weekday, is_holiday)

-- DIMENSION: User (SCD Type 2 for role changes)
dim_user (user_key, user_id, email, department, role, country, is_current)

-- DIMENSION: Product
dim_product (product_key, product_name, product_family, tier)

-- DIMENSION: Tenant (customer organisation)
dim_tenant (tenant_key, tenant_id, industry, size_band, region, subscription_plan)
πŸ’‘ Key insight
Grain choice is critical β€” one row per user/product/day gives maximum query flexibility. Use SCD Type 2 on dim_user to track role changes over time (employees change departments). Always add a dim_tenant for multi-tenant SaaS products β€” it enables cohort analysis by customer segment.
πŸ” Common Follow-Up Questions
  • How would you handle a user who belongs to two tenants simultaneously?
  • How would you model a “feature flag” dimension β€” some users have Teams Copilot, others don’t?
  • At what point would you consider a snowflake schema instead of a star schema?


4-Week Study Plan for Microsoft Senior DA Interview

Microsoft evaluates breadth + depth β€” you need both SQL/Python skills AND product thinking. This plan builds both in parallel.

Week 1
Build your foundation β€” T-SQL syntax (DATEADD, GETDATE, recursive CTEs), Python pandas, Power BI basics, and MECE case study frameworks. Read The Data Monk’s Microsoft interview pattern posts.
Week 2
2200 Most Asked Analytics Interview Questions ebook β€” cover the SQL, Python, and case study sections. Focus on A/B testing, RFM, funnel analysis, and metric design questions. End of Week 2: First mock interview β†’
Week 3
Ace Any SQL Interview ebook β€” all 200 questions. Focus on T-SQL patterns: window functions, self-joins, recursive CTEs, gap problems. End of Week 3: Second mock interview β†’
Week 4
Revise + Power BI deep dive β€” build a sample dashboard, practice DAX measures vs columns, study DirectQuery vs Import tradeoffs. Run 2 metric drop case studies per day. End of Week 4: Final mock interview β†’
~950 questions + 3 mock interviews = enough to crack any product company. Or fast-track with the mentorship program below.

Scroll to Top