Home β€Ί Business Intelligence Engineer β€Ί Google Senior Business Intelligence Analyst Interview Questions and...
Business Intelligence Engineer Case Study Company Interview Questions FAANG

Google Senior Business Intelligence Analyst Interview Questions and Answers 2026

The Google Senior Business Intelligence Analyst role in Bangalore is one of the most data-intensive analyst positions in the industry. Google expects you to go beyond dashboards β€” they test your ability to drive product decisions using BigQuery SQL, Looker/LookML, Python, and dbt, while applying deep product intuition to YouTube, Search, Ads, and Maps data. This guide covers all four rounds with real questions from 2025–2026 interview experiences.

Google
Senior Business Intelligence Analyst
Product Analytics / Growth BI
BigQuery
Looker
Python
dbt
Data Studio
SQL
Salary Range
β‚Ή35–55 LPA
5–8 Years Exp

Round 1 β€” BigQuery SQL

BigQuery SQL Round (Google-Scale Data)
Google uses BigQuery as its primary analytics warehouse. Expect standard SQL with BigQuery-specific functions β€” DATE_DIFF, EXTRACT, ARRAY_AGG, APPROX_QUANTILES, UNNEST. The interviewer tests whether you reason about partition pruning and cost before writing the query.
Questions 5
Duration 60 minutes
Difficulty Medium–Hard
Key Topics Window FunctionsRetention CohortsFunnel AnalysisBot DetectionBigQuery Syntax
Q1 β€” 30-Day User Retention Cohort by Signup Week
Hard⏱ 20 min

The growth team wants to understand how well each weekly signup cohort retains over 30 days. For each signup week, compute the % of users who return and perform at least one action on Day 7, Day 14, and Day 30.

Tables:

  • users(user_id, signup_date)
  • events(user_id, event_date, event_type)
🎯 What the interviewer tests
Whether you use DATE_DIFF correctly for cohort offset and whether you pivot retention across multiple day windows without running three separate subqueries. Google interviewers also ask: “How would you partition this table for cost efficiency?” Mentioning _PARTITIONTIME or event_date partitioning immediately signals production BigQuery experience.

WITH cohorts AS (
    SELECT
        u.user_id,
        DATE_TRUNC(u.signup_date, WEEK)                AS cohort_week,
        e.event_date,
        DATE_DIFF(e.event_date, u.signup_date, DAY)    AS days_since_signup
    FROM users u
    JOIN events e USING (user_id)
    WHERE e.event_date BETWEEN u.signup_date
                             AND DATE_ADD(u.signup_date, INTERVAL 30 DAY)
),
cohort_sizes AS (
    SELECT cohort_week, COUNT(DISTINCT user_id) AS cohort_size
    FROM cohorts
    GROUP BY cohort_week
)
SELECT
    c.cohort_week,
    cs.cohort_size,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN days_since_signup BETWEEN 6  AND 8  THEN c.user_id END) / cs.cohort_size, 1) AS day7_pct,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN days_since_signup BETWEEN 13 AND 15 THEN c.user_id END) / cs.cohort_size, 1) AS day14_pct,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN days_since_signup BETWEEN 29 AND 31 THEN c.user_id END) / cs.cohort_size, 1) AS day30_pct
FROM cohorts c
JOIN cohort_sizes cs USING (cohort_week)
GROUP BY c.cohort_week, cs.cohort_size
ORDER BY c.cohort_week;
πŸ’‘ Key insight
Use a Β±1-day window around each retention milestone (Day 7 = days 6–8) to account for users who return slightly early or late due to timezone differences. DATE_TRUNC(..., WEEK) groups cohorts by Monday-start weeks. Always join back to cohort_sizes β€” otherwise the denominator silently excludes users who never returned.
πŸ” Common Follow-Up Questions
  • How would you compute this if the events table has 10 billion rows β€” what partition strategy would you use?
  • Day-30 retention is 8% but Day-7 is 40%. What does that shape tell you about the product?
  • How would you visualise this cohort table in Looker as a heatmap?

Q2 β€” Top 10% Google Ads Advertisers by Spend with Cohort Rank
Medium⏱ 12 min

The ads team needs to identify their top-decile advertisers by total spend in the last 90 days and show each advertiser’s rank, spend, and the % of total platform spend they account for.

Tables:

  • ad_spend(advertiser_id, campaign_date, spend_usd)
🎯 What the interviewer tests
Whether you use NTILE(10) or PERCENT_RANK() and can explain the difference. They also want you to compute share-of-wallet (each advertiser’s % of total spend) β€” a common metric in Google Ads analytics. Expect: “What if an advertiser spent across 3 separate accounts β€” how do you handle that?”

WITH spend_90d AS (
    SELECT
        advertiser_id,
        SUM(spend_usd) AS total_spend
    FROM ad_spend
    WHERE campaign_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
    GROUP BY advertiser_id
),
ranked AS (
    SELECT
        advertiser_id,
        total_spend,
        RANK()   OVER (ORDER BY total_spend DESC)                  AS spend_rank,
        NTILE(10) OVER (ORDER BY total_spend DESC)                 AS decile,
        ROUND(100.0 * total_spend / SUM(total_spend) OVER (), 3)   AS pct_of_total
    FROM spend_90d
)
SELECT
    advertiser_id,
    total_spend,
    spend_rank,
    pct_of_total
FROM ranked
WHERE decile = 1
ORDER BY total_spend DESC;
πŸ’‘ Key insight
SUM(total_spend) OVER () without a PARTITION BY computes the grand total across all rows β€” use this for share-of-wallet. NTILE(10) is the right choice here: it guarantees exactly 10% of advertisers in each bucket, whereas PERCENT_RANK() gives a continuous 0–1 score and doesn’t naturally create equally-sized buckets.


πŸ“š Recommended Resources to Crack Google Interview
BESTSELLER
2200 Most Asked Analytics Interview Questions
β‚Ή1,999 β‚Ή7,999
  • 500+ SQL questions with solutions
  • 300+ Python & stats questions
  • Case study frameworks with 120+ examples
  • Covers all top product companies
Best for: Full interview prep in one place

Get the Book β†’

SQL FOCUSED
550 SQL Interview Questions to Crack Any Analytics Interview
β‚Ή499 β‚Ή1,999
  • 550 SQL questions across all difficulty levels
  • BigQuery-compatible SQL patterns
  • Window functions, cohort analysis, funnels
  • Google-scale data problems
Best for: SQL-heavy rounds like Round 1

Get the Book β†’

Q3 β€” YouTube Creator MoM Watch-Time Growth by Category
Medium⏱ 14 min

The YouTube team wants to identify which creator categories are growing watch-time fastest month-over-month. Compute MoM growth % for each category and flag categories with >20% MoM decline.

Tables:

  • video_views(video_id, view_date, watch_seconds)
  • videos(video_id, creator_id, category)

WITH monthly_watch AS (
    SELECT
        v.category,
        DATE_TRUNC(vv.view_date, MONTH)              AS month_start,
        SUM(vv.watch_seconds) / 3600.0               AS total_watch_hours
    FROM video_views vv
    JOIN videos v ON vv.video_id = v.video_id
    GROUP BY v.category, month_start
),
with_lag AS (
    SELECT
        category,
        month_start,
        total_watch_hours,
        LAG(total_watch_hours) OVER (
            PARTITION BY category
            ORDER BY month_start
        ) AS prev_month_hours
    FROM monthly_watch
)
SELECT
    category,
    month_start,
    ROUND(total_watch_hours, 0)  AS watch_hours,
    ROUND(
        100.0 * (total_watch_hours - prev_month_hours)
              / NULLIF(prev_month_hours, 0), 1
    )                             AS mom_growth_pct,
    CASE
        WHEN (total_watch_hours - prev_month_hours) / NULLIF(prev_month_hours, 0) < -0.20
        THEN '⚠️ Declining'
        ELSE 'OK'
    END                           AS status
FROM with_lag
WHERE prev_month_hours IS NOT NULL
ORDER BY month_start DESC, mom_growth_pct ASC;
πŸ’‘ Key insight
LAG() OVER (PARTITION BY category ORDER BY month_start) computes the prior-month value within each category β€” no self-join needed. Convert watch_seconds to hours before aggregating to keep numbers business-readable. The NULLIF guard is critical: a new category in month 2 with zero prior data would cause division by zero without it.
πŸ” Common Follow-Up Questions
  • Gaming category dropped 25% in March β€” what external factors would you check first?
  • How would you weight watch-time growth by creator count to avoid small categories skewing the ranking?
  • How would you expose this data in Looker so category managers can self-serve the analysis?

Q4 β€” Google Search CTR by Query Position and Device
Medium⏱ 14 min

The Search team wants to understand click-through rate (CTR = clicks / impressions) segmented by result position (1–10) and device type. Identify position–device combinations where CTR is statistically below the platform average.

Table:

  • search_results(query_date, query_id, position, device_type, impressions, clicks)
🎯 What the interviewer tests
Whether you correctly compute CTR as SUM(clicks)/SUM(impressions) rather than AVG(clicks/impressions) β€” a common error that gives incorrect results when impressions vary widely. They also test whether you can identify “below average” segments using a window function global average, rather than a hardcoded threshold.

WITH ctr_by_segment AS (
    SELECT
        position,
        device_type,
        SUM(clicks)                                      AS total_clicks,
        SUM(impressions)                                 AS total_impressions,
        SAFE_DIVIDE(SUM(clicks), SUM(impressions))       AS segment_ctr
    FROM search_results
    WHERE query_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY position, device_type
),
with_platform_avg AS (
    SELECT
        *,
        AVG(segment_ctr) OVER () AS platform_avg_ctr
    FROM ctr_by_segment
)
SELECT
    position,
    device_type,
    ROUND(segment_ctr * 100, 2)      AS ctr_pct,
    ROUND(platform_avg_ctr * 100, 2) AS platform_avg_pct,
    ROUND((segment_ctr - platform_avg_ctr) * 100, 2) AS ctr_gap_pp,
    CASE
        WHEN segment_ctr < platform_avg_ctr * 0.8 THEN 'Under-performing'
        ELSE 'Within range'
    END AS performance_flag
FROM with_platform_avg
ORDER BY ctr_gap_pp ASC;
πŸ’‘ Key insight
SAFE_DIVIDE is the BigQuery-idiomatic way to handle division by zero β€” it returns NULL instead of throwing an error, which is safer than NULLIF in a large production pipeline. Reporting CTR gap in percentage points (pp) is more actionable than relative lift β€” a PM can directly see “mobile position 4 is 3.2pp below average” and prioritise the fix.

Q5 β€” Detect Bot Traffic via Session Anomaly Detection
Hard⏱ 22 min

Flag user sessions that show bot-like behaviour β€” defined as: more than 3 standard deviations above the mean session event count, or more than 500 events in a session, or less than 1 second average time between events.

Table:

  • session_events(session_id, user_id, event_ts TIMESTAMP, event_type)
🎯 What the interviewer tests
Whether you can combine statistical anomaly detection (3Οƒ rule) with rule-based thresholds in a single query. Google specifically tests this because bot filtering is core to Ads quality. They will ask: “What false-positive rate would you accept? How would you validate your bot labels against ground truth?”

WITH session_stats AS (
    SELECT
        session_id,
        user_id,
        COUNT(*)                                                        AS event_count,
        TIMESTAMP_DIFF(MAX(event_ts), MIN(event_ts), SECOND)           AS session_duration_secs,
        SAFE_DIVIDE(
            TIMESTAMP_DIFF(MAX(event_ts), MIN(event_ts), SECOND),
            NULLIF(COUNT(*) - 1, 0)
        )                                                               AS avg_secs_between_events
    FROM session_events
    GROUP BY session_id, user_id
),
platform_stats AS (
    SELECT
        AVG(event_count) AS mean_events,
        STDDEV(event_count) AS std_events
    FROM session_stats
)
SELECT
    s.session_id,
    s.user_id,
    s.event_count,
    s.avg_secs_between_events,
    ROUND((s.event_count - p.mean_events) / NULLIF(p.std_events, 0), 2) AS z_score,
    CASE
        WHEN s.event_count > 500 THEN 'rule:high_volume'
        WHEN s.avg_secs_between_events < 1 THEN 'rule:too_fast'
        WHEN (s.event_count - p.mean_events) / NULLIF(p.std_events, 0) > 3 THEN 'stat:3sigma'
        ELSE 'clean'
    END AS bot_flag
FROM session_stats s
CROSS JOIN platform_stats p
WHERE
    s.event_count > 500
    OR s.avg_secs_between_events < 1
    OR (s.event_count - p.mean_events) / NULLIF(p.std_events, 0) > 3
ORDER BY z_score DESC;
πŸ’‘ Key insight
CROSS JOIN platform_stats β€” a single-row CTE β€” efficiently broadcasts the global mean and stddev to every session row without a subquery per row. Labelling the bot reason (rule vs stat) is critical for downstream analysis: rule-based flags catch obvious bots; stat-based flags catch sophisticated bots that stay under fixed thresholds but are still statistical outliers.
πŸ” Common Follow-Up Questions
  • What precision/recall tradeoff would you accept for ad fraud detection vs content recommendation?
  • How would you handle a legitimate power user (e.g., a developer testing an app) who triggers the 3Οƒ flag?
  • How would you run this as an incremental dbt model on daily partitions instead of a full scan?


🎯 Crack Google With The Data Monk Mentorship
Starter Bundle
β‚Ή2,999
eBooks + email support + interview checklist

Get Started β†’

Elite Bundle
β‚Ή9,999
Everything + 1:1 mentorship + job referral network

Get Started β†’


Round 2 β€” Python Analytics

Python + Statistical Analysis
Google’s Python round tests statistical rigour alongside coding ability. You need to validate assumptions, handle at-scale data patterns, and communicate results clearly. Expect questions on A/B testing, forecasting, segmentation, and geospatial market analysis.
Questions 4
Duration 45 minutes
Difficulty Medium–Hard
Key Topics A/B TestingForecastingSegmentationPandasStatsmodels
Q1 β€” A/B Test: YouTube Shorts Recommendation Algorithm
Hard⏱ 20 min

YouTube tested a new Shorts recommendation algorithm (variant B) on 200,000 users per group. Variant B shows 4.8% 7-day retention vs control’s 4.3%. Evaluate statistical significance and practical impact at Google scale.

🎯 What the interviewer tests
Google expects you to go beyond a simple z-test. They want: (1) network effect check β€” do users in variant B interact with users in control? (2) novelty effect check β€” did retention spike in week 1 only? (3) practical significance at scale β€” 0.5% absolute lift on 1B YouTube users is enormous. Candidates who only run the test and report p-value miss the full picture.

from statsmodels.stats.proportion import proportions_ztest
import numpy as np

n = 200_000
conv_ctrl = 0.043  # 4.3%
conv_var  = 0.048  # 4.8%

counts = np.array([conv_var * n, conv_ctrl * n])
nobs   = np.array([n, n])

z, p = proportions_ztest(counts, nobs, alternative='larger')
print(f"Z-stat  : {z:.3f}")
print(f"P-value : {p:.6f}")
print(f"Result  : {'Significant βœ“' if p < 0.05 else 'Not significant βœ—'}")

# Relative and absolute lift
abs_lift = conv_var - conv_ctrl
rel_lift = abs_lift / conv_ctrl * 100
print(f"\nAbsolute lift  : {abs_lift*100:.1f}pp")
print(f"Relative lift  : {rel_lift:.1f}%")

# Google-scale business impact
monthly_users = 1_000_000_000
retained_delta = abs_lift * monthly_users
print(f"\nAt Google scale (1B users):")
print(f"  Additional retained users/month : {retained_delta:,.0f}")
print(f"  Estimated ARR impact (β‚Ή12/user) : β‚Ή{retained_delta * 12 / 1e9:.2f}B")
πŸ’‘ Key insight
At 200K users per group, even a 0.5pp lift is highly significant (p < 0.001). But the business question is whether it’s practically significant β€” at 1B users, 0.5pp retention lift translates to 5 million additional retained users per month. Always frame statistical results in business terms. Also flag: if the test ran for less than 2 weeks, the result may be inflated by the novelty effect.
πŸ” Common Follow-Up Questions
  • How would you check if the lift is driven by novelty effect (week 1 spike that fades)?
  • The algorithm change affects content recommendations β€” could this create a filter bubble concern?
  • How would you compute the minimum detectable effect before designing the experiment?

Q2 β€” Forecast Google Ads Revenue for Next 30 Days
Medium⏱ 15 min

The ads finance team needs a 30-day forward forecast of daily revenue. Use the last 180 days of data with proper seasonality handling (weekly + monthly cycles).

🎯 What the interviewer tests
Whether you reach for Prophet (the right tool) or try to use a simple moving average (wrong for seasonal data). They also test whether you can quantify forecast uncertainty with confidence intervals β€” a PM needs to know the range, not just the point estimate. Google additionally asks: “How would you detect when the forecast has gone stale?”

import pandas as pd
from prophet import Prophet

# Prepare data in Prophet format
df_prophet = df[["date", "revenue"]].rename(
    columns={"date": "ds", "revenue": "y"}
)

# Build model with weekly + yearly seasonality
model = Prophet(
    weekly_seasonality=True,
    yearly_seasonality=True,
    changepoint_prior_scale=0.05,   # conservative β€” avoids overfitting
    interval_width=0.90             # 90% confidence interval
)
model.fit(df_prophet)

# Generate 30-day forecast
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# Extract key columns
result = forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(30)
result.columns = ["date", "forecast", "lower_90ci", "upper_90ci"]

# Sanity check: flag days where range is > 40% of forecast (high uncertainty)
result["uncertainty_flag"] = (
    (result["upper_90ci"] - result["lower_90ci"]) / result["forecast"] > 0.40
)
print(result.head(10))
πŸ’‘ Key insight
changepoint_prior_scale=0.05 makes Prophet conservative about detecting trend changes β€” useful for ad revenue which has gradual, not sudden, trend shifts. The 90% CI rather than 95% is intentional for planning: you want a tighter range that’s still credible. Always flag high-uncertainty days β€” they indicate upcoming holidays or events where the model has little historical signal.

Q3 β€” User Segmentation for Google Maps Feature Rollout
Medium⏱ 12 min

Segment Google Maps users into Heavy, Moderate, and Light users based on weekly session frequency and navigation distance, to prioritise which segment to target for a new offline maps feature.

import pandas as pd
import numpy as np

# Compute weekly summary per user
user_weekly = (
    df.groupby("user_id")
    .agg(
        avg_weekly_sessions = ("session_id", "nunique"),
        avg_nav_km          = ("navigation_km", "mean"),
        total_searches      = ("search_query", "count")
    )
    .reset_index()
)

# Normalise each feature to 0-1 for scoring
for col in ["avg_weekly_sessions", "avg_nav_km", "total_searches"]:
    user_weekly[f"{col}_norm"] = (
        (user_weekly[col] - user_weekly[col].min())
        / (user_weekly[col].max() - user_weekly[col].min())
    )

# Composite engagement score (equal weights)
user_weekly["engagement_score"] = (
    user_weekly["avg_weekly_sessions_norm"] * 0.40
  + user_weekly["avg_nav_km_norm"]          * 0.35
  + user_weekly["total_searches_norm"]      * 0.25
)

# Segment using percentile cutoffs
user_weekly["segment"] = pd.cut(
    user_weekly["engagement_score"],
    bins=[0, 0.33, 0.66, 1.0],
    labels=["Light", "Moderate", "Heavy"],
    include_lowest=True
)

print(user_weekly["segment"].value_counts(normalize=True).mul(100).round(1))
πŸ’‘ Key insight
Normalising before combining prevents high-magnitude features (navigation_km can be hundreds) from dominating low-magnitude features (sessions = 1–10). Weighting sessions highest (40%) reflects that frequency of use is the strongest proxy for engagement in navigation apps. Always validate segment sizes β€” if Heavy is only 3% of users, the offline feature rollout economics may not justify the investment.
πŸ” Common Follow-Up Questions
  • How would you validate that “Heavy” users are genuinely high-value, not just high-frequency bots?
  • The offline maps feature requires storage space β€” how would you factor device storage into the targeting model?

Q4 β€” Geographic Market Penetration Analysis for Google Pay
Easy⏱ 8 min

Compute Google Pay market penetration (active users / total smartphone users) per Indian state, identify states where penetration is below 20%, and calculate the potential user uplift if they reached the national average.

import pandas as pd

# df: state | active_users | smartphone_users
df["penetration_pct"] = (df["active_users"] / df["smartphone_users"] * 100).round(2)

national_avg = (df["active_users"].sum() / df["smartphone_users"].sum() * 100)
print(f"National penetration: {national_avg:.1f}%")

low_pen = df[df["penetration_pct"] < 20].copy()
low_pen["target_users"] = (national_avg / 100 * low_pen["smartphone_users"]).astype(int)
low_pen["uplift_users"] = low_pen["target_users"] - low_pen["active_users"]

print(
    low_pen[["state","penetration_pct","uplift_users"]]
    .sort_values("uplift_users", ascending=False)
    .to_string(index=False)
)
print(f"\nTotal uplift potential: {low_pen['uplift_users'].sum():,} users")
πŸ’‘ Key insight
Always compute the national average as total_active / total_smartphones (weighted), not mean(state_penetrations) (unweighted) β€” small states would distort the unweighted average. The uplift calculation shows the business value of market expansion in absolute user numbers, which is what growth teams budget against.


Round 3 β€” Product Analytics Case Study

Product Analytics & Root Cause Analysis
Google’s case study round tests your ability to think like a product manager with data. You are expected to define the right metrics, structure the problem before touching data, and reason about second-order effects across products. Structure matters as much as the answer.
Questions 1–2
Duration 45 minutes
Difficulty Hard
Key Topics Root Cause AnalysisMetric DesignMECE FrameworkCross-Product Effects
Q1 β€” Google Maps DAU Dropped 18% in India Last Week. Diagnose It.
Hard⏱ 25 min
🎯 What the interviewer tests
Google scores this on (1) whether you distinguish DAU from new user vs returning user drop β€” each has different implications, (2) whether you consider cross-product effects (Search, Assistant, Waze all feed Maps traffic), and (3) whether you think about measurement issues before jumping to product hypotheses. Most candidates go straight to “app bug” β€” that’s wrong.

πŸ—‚οΈ MECE Analysis β€” Google Maps India DAU Drop
Measurement / Data Issues (Check First)
Analytics SDK update silently filtering sessions Β· BigQuery pipeline delay masking sessions Β· Device fingerprinting change reducing deduplication Β· New privacy manifest (iOS/Android) blocking DAU logging Β· Daylight saving or timezone offset in date aggregation
Product / Feature Issues
New app version with Maps crash on Android (check crash rate) Β· Search box UX regression reducing query entry Β· Offline maps update broke navigation for non-premium users Β· Maps API latency spike causing users to abandon before load Β· Location permissions prompt change reducing GPS opt-in
Segment: Where Is the Drop?
By state (Bihar vs Maharashtra vs Delhi) Β· By device OS (Android vs iOS) Β· By use case (navigation vs search vs transit) Β· By user tenure (new vs 1yr+ users) Β· By access channel (app vs web vs embedded in Search)
External / Cross-Product Factors
Public holiday reducing commute-driven Maps use (check YoY same week) Β· Competitor (Apple Maps, OlaCabs) promotion reducing intent Β· Google Assistant query volume drop reducing Maps handoffs Β· Fuel price spike reducing driving and navigation intent Β· Government regulation affecting location data collection
πŸ’‘ Key insight
Always begin with: “Is this a measurement failure or a real drop?” β€” 30% of senior-level metric drops at Google are pipeline or SDK issues. Then decompose by user type: if new user DAU dropped but returning user DAU is flat, the problem is acquisition or onboarding. If both dropped equally, the cause is likely product-wide. Google Maps in India is heavily commute-driven β€” always check if the drop coincides with a public holiday or hartal before escalating.
πŸ” Common Follow-Up Questions
  • You find the drop is entirely in Android users on app version 11.2. What do you do next?
  • How would you build a real-time alert that catches an 18% DAU drop within 4 hours?
  • How would you present this investigation to the Maps India GM in a 3-slide deck?


Round 4 β€” BI Tools & Data Modeling

Looker / LookML + dbt + Data Warehouse Design
This round is unique to the Google SBIA role. You are expected to have hands-on experience with Looker (Google’s primary BI tool), understand LookML model design, know dbt incremental model patterns for BigQuery, and be able to design a star schema for a Google-scale analytics use case.
Questions 3
Duration 60 minutes
Difficulty Medium–Hard
Key Topics LookML / Lookerdbt IncrementalStar SchemaBigQuery Optimization
Q1 β€” Design a Looker Dashboard for YouTube Creator Analytics
Medium⏱ 18 min

A YouTube creator partnership manager needs a self-serve Looker dashboard to track top creators by watch-time, subscriber growth, and revenue per creator category. Design the LookML model and explain the key design decisions.

🎯 What the interviewer tests
Whether you know the LookML view/explore/model hierarchy and understand the difference between a dimension and a measure. They will specifically ask: “How would you handle the fan-out problem if a creator has multiple categories?” and “Why use derived tables instead of exposing the raw BigQuery table directly?”

# LookML View: creators
view: creators {
  sql_table_name: `analytics.dim_creators` ;;

  dimension: creator_id {
    type: string
    primary_key: yes
    sql: ${TABLE}.creator_id ;;
  }

  dimension: creator_name { type: string; sql: ${TABLE}.creator_name ;; }
  dimension: primary_category { type: string; sql: ${TABLE}.primary_category ;; }
  dimension: country { type: string; sql: ${TABLE}.country ;; }
  dimension: is_partner { type: yesno; sql: ${TABLE}.is_partner ;; }
}

# LookML View: video_performance (fact)
view: video_performance {
  sql_table_name: `analytics.fact_video_daily` ;;

  dimension_group: report {
    type: time
    timeframes: [date, week, month, quarter]
    sql: ${TABLE}.report_date ;;
  }

  measure: total_watch_hours {
    type: sum
    sql: ${TABLE}.watch_seconds / 3600.0 ;;
    value_format: "#,##0.0"
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.revenue_usd ;;
    value_format: "$#,##0.00"
  }

  measure: subscriber_net_gain {
    type: sum
    sql: ${TABLE}.subscribers_gained - ${TABLE}.subscribers_lost ;;
  }

  measure: revenue_per_watch_hour {
    type: number
    sql: ${total_revenue} / NULLIF(${total_watch_hours}, 0) ;;
    value_format: "$0.00"
  }
}

# Explore: joins creators to performance
explore: video_performance {
  label: "YouTube Creator Performance"
  join: creators {
    type: left_outer
    sql_on: ${video_performance.creator_id} = ${creators.creator_id} ;;
    relationship: many_to_one
  }
}
πŸ’‘ Key insight
Always use many_to_one relationship when joining a fact table to a dimension β€” this tells Looker not to fan out the aggregation. The revenue_per_watch_hour derived measure uses Looker’s ${measure} references rather than raw SQL β€” this ensures it automatically respects all active filters and drill-down dimensions. Using a separate dim_creators table rather than the raw creators table gives you control over which fields to expose and allows adding business-logic columns (like is_partner) without modifying the source system.

Q2 β€” Design a Star Schema for Google Ads Performance Reporting
Hard⏱ 22 min

Design a BigQuery data warehouse schema to support daily Google Ads performance reporting β€” clicks, impressions, conversions, cost β€” across campaigns, ad groups, keywords, and geographic targets.

🎯 What the interviewer tests
Whether you get the grain right (one row = one keyword/ad-group/day combination), whether you include surrogate keys (required for SCD Type 2 on keyword dimensions), and whether you think about BigQuery-specific optimisations like partition by report_date and cluster by advertiser_id. They will ask: “How does this schema handle keyword bid changes over time?”

-- FACT TABLE (grain: keyword Γ— ad_group Γ— date)
-- BigQuery: PARTITION BY report_date, CLUSTER BY advertiser_id
CREATE TABLE fact_ads_performance (
    report_date          DATE,          -- partition key
    advertiser_id        STRING,        -- cluster key
    campaign_key         INT64,         -- FK β†’ dim_campaign
    ad_group_key         INT64,         -- FK β†’ dim_ad_group
    keyword_key          INT64,         -- FK β†’ dim_keyword (SCD Type 2)
    geo_key              INT64,         -- FK β†’ dim_geography
    device_type          STRING,        -- mobile | desktop | tablet
    impressions          INT64,
    clicks               INT64,
    conversions          INT64,
    cost_usd             FLOAT64,
    -- Derived metrics (pre-computed for Looker performance)
    ctr                  FLOAT64,       -- clicks / impressions
    cpc                  FLOAT64,       -- cost / clicks
    conversion_rate      FLOAT64        -- conversions / clicks
);

-- DIM: Campaign
CREATE TABLE dim_campaign (
    campaign_key    INT64,
    campaign_id     STRING,
    campaign_name   STRING,
    campaign_type   STRING,            -- search | display | shopping | video
    bidding_strategy STRING,
    daily_budget_usd FLOAT64
);

-- DIM: Keyword (SCD Type 2 β€” tracks bid and match type changes)
CREATE TABLE dim_keyword (
    keyword_key     INT64,             -- surrogate key
    keyword_id      STRING,            -- natural key
    keyword_text    STRING,
    match_type      STRING,            -- broad | phrase | exact
    bid_usd         FLOAT64,
    valid_from      DATE,
    valid_to        DATE,
    is_current      BOOL
);

-- DIM: Geography
CREATE TABLE dim_geography (
    geo_key         INT64,
    country         STRING,
    state           STRING,
    city            STRING,
    geo_tier        STRING             -- tier1 | tier2 | tier3
);
πŸ’‘ Key insight
Pre-computing CTR, CPC, and conversion_rate in the fact table is a BigQuery best practice β€” it avoids recomputing division in every query and reduces slot consumption. SCD Type 2 on keywords is essential: if a keyword’s bid changes mid-campaign, you need to know what bid was active on each report date to correctly attribute cost to the bid strategy. Always specify PARTITION BY report_date CLUSTER BY advertiser_id β€” on a billion-row ads table, this reduces query cost by 80–90%.
πŸ” Common Follow-Up Questions
  • How would you handle a keyword that belongs to multiple ad groups β€” does your schema support that?
  • A new advertiser launches mid-month β€” how do you handle partial-month cohort analysis in this schema?
  • At what point would you use a wide denormalized table instead of a star schema for BigQuery?

Q3 β€” dbt Incremental Model for Daily Ad Event Log Processing
Medium⏱ 16 min

You have a raw ad click events table in BigQuery that receives 500M+ rows per day. Write a dbt incremental model that processes only new partitions each run, handles late-arriving data up to 3 days old, and builds the daily aggregated fact table.

🎯 What the interviewer tests
Whether you know the merge incremental strategy (required for late-arriving data β€” append would create duplicates) and whether you correctly use is_incremental() to conditionally filter the source. Google SBIA candidates must understand why unique_key is needed and what happens without it.

-- models/mart/fct_ads_daily.sql
{{
    config(
        materialized     = 'incremental',
        incremental_strategy = 'merge',
        unique_key       = ['report_date', 'advertiser_id', 'campaign_id', 'ad_id'],
        partition_by     = {'field': 'report_date', 'data_type': 'date'},
        cluster_by       = ['advertiser_id'],
        on_schema_change = 'sync_all_columns'
    )
}}

WITH raw_events AS (
    SELECT
        DATE(event_timestamp)     AS report_date,
        advertiser_id,
        campaign_id,
        ad_id,
        event_type,               -- 'click' | 'impression' | 'conversion'
        revenue_usd
    FROM {{ source('raw', 'ad_events') }}

    {% if is_incremental() %}
    -- Only process last 3 days to handle late-arriving events
    WHERE DATE(event_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
    {% endif %}
)
SELECT
    report_date,
    advertiser_id,
    campaign_id,
    ad_id,
    COUNTIF(event_type = 'impression')            AS impressions,
    COUNTIF(event_type = 'click')                 AS clicks,
    COUNTIF(event_type = 'conversion')            AS conversions,
    SUM(CASE WHEN event_type = 'click'
             THEN revenue_usd ELSE 0 END)         AS revenue_usd,
    SAFE_DIVIDE(
        COUNTIF(event_type = 'click'),
        NULLIF(COUNTIF(event_type = 'impression'), 0)
    )                                             AS ctr,
    CURRENT_TIMESTAMP()                           AS dbt_updated_at
FROM raw_events
GROUP BY report_date, advertiser_id, campaign_id, ad_id
πŸ’‘ Key insight
incremental_strategy = 'merge' with unique_key ensures that when late-arriving events come in (e.g., a click logged 2 days late), the row for that date is replaced not duplicated. The INTERVAL 3 DAY lookback covers Google’s SLA for late-arriving ad event data. Always include dbt_updated_at β€” it lets you audit when each row was last reprocessed and detect if a partition stopped being refreshed.
πŸ” Common Follow-Up Questions
  • What happens if you use append strategy instead of merge for this model?
  • How would you test this model with dbt tests to ensure no duplicate unique keys appear after a merge?
  • At what point would you move from incremental to a full-refresh model β€” and what triggers that decision?


4-Week Study Plan for Google Senior BI Analyst Interview

Google tests the deepest combination of SQL, BI tooling, and product thinking in the industry. The 4-week plan below builds all three systematically.

Week 1
Build your foundation β€” BigQuery SQL (DATE_DIFF, SAFE_DIVIDE, ARRAY_AGG, UNNEST), Python pandas, and product analytics frameworks (MECE, metric trees). Read The Data Monk’s Google analytics interview pattern posts to understand exactly what they test.
Week 2
2200 Most Asked Analytics Interview Questions ebook β€” cover the SQL, Python, and case study chapters. Focus on cohort analysis, A/B testing, and product metric design questions. End of Week 2: First mock interview β†’
Week 3
Ace Any SQL Interview ebook β€” all 200 questions. Focus on window functions, cohort retention patterns, and funnel analysis. Also spend 2 days on Looker LookML basics and dbt incremental models. End of Week 3: Second mock interview β†’
Week 4
Revise + deep dive on Google products β€” understand YouTube, Ads, Maps, and Google Pay metrics. Practice 2 metric-drop case studies per day with MECE framing. Run timed SQL sessions (20 min per query). End of Week 4: Final mock interview β†’
~950 questions + 3 mock interviews = enough to crack any product company. Or fast-track with the mentorship program below.

Scroll to Top