LinkedIn is one of the most sought-after companies for data roles in India. Preparing for LinkedIn Data Analyst interview questions requires a solid grip on SQL graph traversal, feed ranking logic, and product metrics — skills that go well beyond standard SQL. This post covers all three rounds with real questions from 2024–2026 LinkedIn interview experiences, exact SQL solutions, and scoring rubrics for each question.
Graph Queries
Product Analytics
Python
Tableau
Feed Ranking
Top Resources for LinkedIn Interview Prep
Two ebooks that cover everything you need — from SQL graph problems to product case studies.
- 2,200+ questions · 23 topics
- SQL, Python, Case Studies, Statistics
- Company-wise question breakdowns
- 220+ questions · SQL advanced, data models, DBMS
- Window functions, CTEs, graph traversal
- LinkedIn-style query patterns included
Round 1 — SQL & Graph Queries
| Questions | 5 |
| Duration | 45 minutes |
| Difficulty | Medium–Hard |
| Key Topics | Graph TraversalSelf-JoinsWindow FunctionsFeed RankingFunnel Analysis |
Find all second-degree connections for a given user (user_id = 12345): people who are friends-of-friends but NOT direct connections and NOT the user themselves. Also include the count of mutual connections for each result.
Table:
- connections(user_id_1, user_id_2, connected_date) — undirected: both (A,B) and (B,A) are stored
WITH direct AS (
-- All direct connections of user 12345
SELECT user_id_2 AS friend_id
FROM connections
WHERE user_id_1 = 12345
),
second_degree_raw AS (
-- Friends of direct connections
SELECT c.user_id_2 AS candidate_id
FROM connections c
JOIN direct d ON c.user_id_1 = d.friend_id
WHERE c.user_id_2 != 12345
AND c.user_id_2 NOT IN (SELECT friend_id FROM direct)
),
mutual_count AS (
-- Count mutual connections for each candidate
SELECT
s.candidate_id,
COUNT(DISTINCT d.friend_id) AS mutual_connections
FROM second_degree_raw s
JOIN direct d
ON d.friend_id IN (
SELECT user_id_2
FROM connections
WHERE user_id_1 = s.candidate_id
)
GROUP BY s.candidate_id
)
SELECT
candidate_id,
mutual_connections
FROM mutual_count
ORDER BY mutual_connections DESC;
- How would you modify this query if the table stored only one direction per edge?
- How would you find third-degree connections? What happens to query performance?
- LinkedIn shows “X mutual connections” on the profile — how would you expose this at scale for all user pairs?
For each post in the last 7 days, calculate a ranking score = (likes × 3 + comments × 5 + shares × 7 − skips × 2) / SQRT(hours_since_posted). Then rank posts by score within each post_type using window functions.
Tables:
- posts(post_id, author_id, post_type, created_at)
- interactions(interaction_id, user_id, post_id, action_type, ts) — action_type ∈ {view, like, comment, share, skip}
NULLIF or a GREATEST guard to prevent division by zero for posts less than an hour old. Interviewers also watch for whether you pivot the action counts cleanly using conditional aggregation vs. multiple joins.WITH interaction_counts AS (
SELECT
post_id,
SUM(CASE WHEN action_type = 'like' THEN 1 ELSE 0 END) AS likes,
SUM(CASE WHEN action_type = 'comment' THEN 1 ELSE 0 END) AS comments,
SUM(CASE WHEN action_type = 'share' THEN 1 ELSE 0 END) AS shares,
SUM(CASE WHEN action_type = 'skip' THEN 1 ELSE 0 END) AS skips
FROM interactions
GROUP BY post_id
),
post_scores AS (
SELECT
p.post_id,
p.author_id,
p.post_type,
p.created_at,
ic.likes,
ic.comments,
ic.shares,
ic.skips,
EXTRACT(EPOCH FROM (NOW() - p.created_at)) / 3600.0 AS hours_since_posted,
(
ic.likes * 3
+ ic.comments * 5
+ ic.shares * 7
- ic.skips * 2
) / SQRT(
GREATEST(EXTRACT(EPOCH FROM (NOW() - p.created_at)) / 3600.0, 1)
) AS ranking_score
FROM posts p
LEFT JOIN interaction_counts ic ON p.post_id = ic.post_id
WHERE p.created_at >= NOW() - INTERVAL '7 days'
)
SELECT
post_id,
author_id,
post_type,
ranking_score,
RANK() OVER (
PARTITION BY post_type
ORDER BY ranking_score DESC
) AS rank_within_type
FROM post_scores
ORDER BY post_type, rank_within_type;
GREATEST(..., 1) is a clean way to avoid division-by-zero without NULLIF and ensures posts less than an hour old get the maximum time-decay penalty (i.e., dividing by 1 instead of a fraction, which would artificially inflate them). This is a real production concern in feed systems.- How would you add a personalization factor — e.g., posts from closer connections get a score boost?
- What’s the difference between RANK() and DENSE_RANK() here? Which is more appropriate?
- How would you refresh these scores in near-real-time without recomputing all posts every minute?
Calculate: (a) view-to-apply rate by job category, (b) application-to-shortlist rate by company size, (c) median time in days from application to rejection. Identify job categories where apply rate > 15% AND shortlist rate > 25%.
Tables:
- job_views(user_id, job_id, viewed_at)
- applications(app_id, user_id, job_id, applied_at, status) — status ∈ {submitted, viewed_by_recruiter, shortlisted, rejected, hired}
- jobs(job_id, job_category, company_id, company_size)
MEDIAN in most dialects. Interviewers look for whether you use PERCENTILE_CONT(0.5) (ANSI standard) or correctly simulate it with NTILE. Also check that you join correctly — a user can view a job without applying, so the funnel denominator must come from views, not applications.-- (a) View-to-apply rate by job category
WITH view_counts AS (
SELECT
j.job_category,
COUNT(DISTINCT jv.user_id || '-' || jv.job_id) AS total_views
FROM job_views jv
JOIN jobs j ON jv.job_id = j.job_id
GROUP BY j.job_category
),
apply_counts AS (
SELECT
j.job_category,
COUNT(DISTINCT a.app_id) AS total_applications
FROM applications a
JOIN jobs j ON a.job_id = j.job_id
GROUP BY j.job_category
),
-- (b) Application-to-shortlist rate by company size
shortlist_by_size AS (
SELECT
j.company_size,
COUNT(DISTINCT a.app_id) AS total_apps,
COUNT(DISTINCT CASE WHEN a.status = 'shortlisted' THEN a.app_id END) AS shortlisted
FROM applications a
JOIN jobs j ON a.job_id = j.job_id
GROUP BY j.company_size
),
-- (c) Median days from application to rejection
rejection_times AS (
SELECT
a.app_id,
EXTRACT(DAY FROM (r.rejected_at - a.applied_at)) AS days_to_rejection
FROM applications a
JOIN (
SELECT app_id, MAX(applied_at) AS rejected_at
FROM applications
WHERE status = 'rejected'
GROUP BY app_id
) r ON a.app_id = r.app_id
WHERE a.status = 'rejected'
)
-- (a) view-to-apply, filtered for high performers
SELECT
v.job_category,
v.total_views,
COALESCE(ap.total_applications, 0) AS total_applications,
ROUND(100.0 * COALESCE(ap.total_applications, 0) / NULLIF(v.total_views, 0), 2) AS apply_rate_pct
FROM view_counts v
LEFT JOIN apply_counts ap ON v.job_category = ap.job_category
ORDER BY apply_rate_pct DESC;
-- Shortlist rate by company size
SELECT
company_size,
total_apps,
shortlisted,
ROUND(100.0 * shortlisted / NULLIF(total_apps, 0), 2) AS shortlist_rate_pct
FROM shortlist_by_size
ORDER BY shortlist_rate_pct DESC;
-- Median time to rejection
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY days_to_rejection) AS median_days_to_rejection
FROM rejection_times;
- How would you handle users who viewed the same job multiple times — should each view count separately?
- How would you extend this to measure time-to-hire, not just time-to-rejection?
- If rejection rate spikes suddenly, what would you investigate first?
Evaluate LinkedIn’s “People You May Know” quality by: (a) acceptance rate by mutual friends count, (b) time-to-accept for accepted recommendations, (c) users who got 10+ recommendations but accepted 0 — flag them for reduced frequency.
Table:
- recommendations(user_id, recommended_user_id, recommendation_date, accepted, accepted_date, mutual_friends_count)
-- (a) Acceptance rate by mutual friends bucket
WITH bucketed AS (
SELECT
user_id,
recommended_user_id,
accepted,
mutual_friends_count,
CASE
WHEN mutual_friends_count = 0 THEN '0 mutuals'
WHEN mutual_friends_count BETWEEN 1 AND 2 THEN '1-2 mutuals'
WHEN mutual_friends_count BETWEEN 3 AND 5 THEN '3-5 mutuals'
ELSE '6+ mutuals'
END AS mutual_bucket
FROM recommendations
)
SELECT
mutual_bucket,
COUNT(*) AS total_recommendations,
SUM(CASE WHEN accepted = TRUE THEN 1 ELSE 0 END) AS accepted_count,
ROUND(100.0 * SUM(CASE WHEN accepted = TRUE THEN 1 ELSE 0 END)
/ NULLIF(COUNT(*), 0), 2) AS acceptance_rate_pct
FROM bucketed
GROUP BY mutual_bucket
ORDER BY MIN(mutual_friends_count);
-- (b) Time-to-accept for accepted recommendations
SELECT
user_id,
AVG(EXTRACT(DAY FROM (accepted_date - recommendation_date))) AS avg_days_to_accept,
PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY EXTRACT(DAY FROM (accepted_date - recommendation_date))
) AS median_days_to_accept
FROM recommendations
WHERE accepted = TRUE
GROUP BY user_id
ORDER BY avg_days_to_accept;
-- (c) Users with 10+ recommendations and 0 acceptances
SELECT
user_id,
COUNT(*) AS total_recommendations,
SUM(CASE WHEN accepted = TRUE THEN 1 ELSE 0 END) AS total_accepted
FROM recommendations
GROUP BY user_id
HAVING COUNT(*) >= 10
AND SUM(CASE WHEN accepted = TRUE THEN 1 ELSE 0 END) = 0
ORDER BY total_recommendations DESC;
- What other features beyond mutual_friends_count would you add to a recommendation quality model?
- How would you A/B test reducing recommendation frequency for low-acceptance users?
- If acceptance rate is 15% globally but drops to 8% in a specific country, what would you investigate?
Build a weekly Creator Performance Scorecard: total impressions, engagement rate, viral coefficient (shares / total interactions), average dwell time, and new follows within 48 hours of publishing — per creator per week.
Tables:
- posts(post_id, creator_id, content_type, published_at, word_count)
- interactions(interaction_id, user_id, post_id, action_type, dwell_time_seconds, ts) — action_type ∈ {view, like, comment, share, skip}
- follows(follower_id, creator_id, followed_at)
WITH weekly_posts AS (
SELECT
post_id,
creator_id,
content_type,
published_at,
DATE_TRUNC('week', published_at) AS post_week
FROM posts
),
interaction_stats AS (
SELECT
p.creator_id,
p.post_week,
COUNT(DISTINCT CASE WHEN i.action_type = 'view' THEN i.interaction_id END) AS impressions,
COUNT(DISTINCT CASE WHEN i.action_type != 'view' THEN i.interaction_id END) AS total_engagements,
COUNT(DISTINCT CASE WHEN i.action_type = 'share' THEN i.interaction_id END) AS shares,
COUNT(DISTINCT i.interaction_id) AS total_interactions,
AVG(CASE WHEN i.dwell_time_seconds > 0 THEN i.dwell_time_seconds END) AS avg_dwell_time
FROM weekly_posts p
JOIN interactions i ON p.post_id = i.post_id
GROUP BY p.creator_id, p.post_week
),
follow_attribution AS (
SELECT
p.creator_id,
p.post_week,
COUNT(DISTINCT f.follower_id) AS new_follows_48h
FROM weekly_posts p
JOIN follows f
ON f.creator_id = p.creator_id
AND f.followed_at BETWEEN p.published_at
AND p.published_at + INTERVAL '48 hours'
GROUP BY p.creator_id, p.post_week
)
SELECT
ist.creator_id,
ist.post_week,
ist.impressions,
ROUND(100.0 * ist.total_engagements / NULLIF(ist.impressions, 0), 2) AS engagement_rate_pct,
ROUND(1.0 * ist.shares / NULLIF(ist.total_interactions, 0), 4) AS viral_coefficient,
ROUND(ist.avg_dwell_time, 1) AS avg_dwell_time_seconds,
COALESCE(fa.new_follows_48h, 0) AS new_follows_48h,
RANK() OVER (
PARTITION BY ist.post_week
ORDER BY ist.impressions DESC
) AS impressions_rank
FROM interaction_stats ist
LEFT JOIN follow_attribution fa
ON ist.creator_id = fa.creator_id
AND ist.post_week = fa.post_week
ORDER BY ist.post_week DESC, impressions_rank;
CASE WHEN action_type != 'view' pattern for engagements — this cleanly separates impressions from real engagements without a subquery. The 48-hour BETWEEN join is intentionally tricky: if a creator posted 5 times in a week, a follow within 48h of any post counts — clarify this assumption with the interviewer before writing.- How would you normalize the scorecard so creators with very different follower counts are comparable?
- How would you attribute a follow to the most recent post vs. any post within 48 hours?
- What would a “creator health score” look like if you had to summarize all 5 metrics into one number?
Round 2 — Product Analytics
| Questions | 3 |
| Duration | 30 minutes |
| Difficulty | Medium |
| Key Topics | Funnel MetricsFeature AdoptionCohort AnalysisLeading Indicators |
InMail is LinkedIn’s paid messaging feature for recruiters. Define the full conversion funnel, the metric at each stage, a realistic benchmark, and what a 10% drop at each stage indicates.
The InMail Funnel:
| Stage | Metric | Benchmark | 10% Drop Indicates |
|---|---|---|---|
| InMail Sent | Total InMails sent / recruiter / week | 15–20 | Recruiter fatigue or quota exhaustion |
| InMail Opened | Open rate = opened / sent | 40–55% | Subject line irrelevance; wrong audience targeting |
| InMail Replied | Reply rate = replied / opened | 18–25% | Message content mismatch; role not compelling to candidate |
| Interview Scheduled | Interview rate = scheduled / replied | 30–40% | Mismatch between candidate expectations and role reality |
| Offer Made | Offer rate = offers / interview | 15–25% | Skills gap discovered in interviews; JD-to-interview misalignment |
| Hire | Hire rate = hired / offer | 60–80% | Competing offers; compensation not competitive |
- How would you build a “InMail Effectiveness Score” for each recruiter that compares them to peers in the same industry?
- If open rate is high but reply rate drops sharply, what A/B tests would you run?
- How do you handle attribution when a candidate gets 3 InMails from different recruiters and only responds to one?
LinkedIn Learning adoption is 8% of active users. Target: 20% in 6 months. Segment users by learning propensity, design 3 interventions, and define success metrics beyond the headline number.
(a) User Segmentation by Learning Propensity
Signals to build a propensity score:
- Skill endorsements on profile (indicated desire to signal skills)
- Recent job change or job seeking activity (career transition motivation)
- Profile views from recruiters in the past 30 days (external motivation)
- Incomplete skills section on profile (explicit gap)
- Connection with peers who completed Learning courses (social proof signal)
(b) 3 Targeted Interventions
- Intervention 1 — Skills Gap Nudge (High propensity, career seekers): Show “Recruiters searched for [skill] 3,400 times this week — you don not have it listed” in the feed. Hypothesis: personalized scarcity messaging increases Learning starts by 15%.
- Intervention 2 — Peer Learning Social Proof (Mid propensity, active networkers): Notify users when 3+ connections complete the same course. Hypothesis: social proof increases first course starts by 12%.
- Intervention 3 — Manager-Assigned Learning (Low propensity, Premium B2B users): Enable team leads to assign courses in LinkedIn Recruiter. Hypothesis: top-down assignment increases Learning starts among non-adopters by 20% in enterprise accounts.
(c) Success Metrics Beyond 20% Adoption
- Completion rate (starting a course is not learning)
- Return sessions (did users come back for a second course?)
- Skill add rate (did completing a course lead to a skill added on profile?)
- Job outcome correlation (did Learning completers get more recruiter InMails?)
- How would you set up an A/B test for Intervention 1 without contaminating the control group?
- If all three interventions are launched together, how do you measure the incremental contribution of each?
- What is the risk of optimizing for adoption rate at the expense of course quality?
Define 5 leading indicators that predict whether a free user will convert to LinkedIn Premium. How would you build a lookalike model, and what look-back window should you use for features?
5 Leading Indicators for Premium Conversion:
- InMail receipt rate — Free users who receive InMails see Premium value first-hand; they experience what recruiters can do with Premium
- Profile view velocity — Users whose profile views are accelerating are more likely to upgrade to see who viewed them (a core Premium feature)
- Search activity — Users searching for people with “Premium” filters blocked are frustrated by free-tier limits
- Job application rate — Active job seekers (5+ applications in 30 days) are high-conversion Premium Career candidates
- Connection request to acceptance ratio — Users with low acceptance rates are likely to upgrade for broader outreach via InMail
Lookalike Model Approach: Take users who converted to Premium in the last quarter. Extract the 5 features above for the 30-day window before their conversion date. Train a logistic regression or gradient boosted model on converters vs. matched non-converters. Score all active free users using the same 30-day window of features.
Look-back Window: 30 days is the recommended window. 7 days is too noisy (captures one-off spikes). 90 days dilutes the signal because LinkedIn usage patterns shift with job-seeking cycles. For seasonal roles (e.g., campus recruiting in October), add a 12-month seasonal feature on top of the 30-day window.
- How would you handle feature leakage in the lookalike model — some features might only appear because the user is already deciding to upgrade?
- How would you evaluate the model beyond AUC — what business metric matters for the sales team?
- Which of the 4 Premium tiers (Career, Business, Sales, Recruiter) would you build separate models for, and why?
Round 3 — Business Case Study
| Questions | 2 |
| Duration | 20 minutes |
| Difficulty | Medium |
| Key Topics | Root Cause AnalysisROI FrameworksGeo SegmentationSurvivorship Bias |
LinkedIn’s job recommendation CTR is 4.2% globally but only 1.8% in India. Walk through your analysis framework: 5 hypotheses for the gap, data to test each, and recommendations to close it.
5 Hypotheses (MECE):
- H1 — Profile Completeness (Data Quality): Indian users have less complete profiles (missing skills, work history gaps), so the recommendation model has fewer signals and suggests irrelevant roles. Test: Compare average profile completeness score for India vs. global; correlate profile completeness with CTR.
- H2 — Job Supply Mismatch (Supply Side): The ratio of senior/niche roles to mid-level roles in India on LinkedIn may be skewed, causing mismatches for the majority of users. Test: Compute job_role_level distribution of recommended vs. available jobs in India vs. globally.
- H3 — Salary Transparency Gap (Demand Side): Indian job listings often omit salary information. Users may click through less if they cannot quickly assess fit. Test: Compare CTR for India listings with vs. without salary data.
- H4 — Mobile vs. Desktop Experience (UX): India has higher mobile penetration. If the job card UI on mobile is less engaging or harder to click, CTR will be structurally lower. Test: Segment CTR by device type in India vs. globally.
- H5 — Model Training Data Skew (Algorithm): If the recommendation model was trained predominantly on US/EU click data, it may not generalize to Indian user intent patterns. Test: Retrain the model with India-specific features and run a holdout experiment.
Priority Recommendation: Start with H1 and H4 — they are testable in days with existing data. If profile completeness explains the gap, launch a profile completion nudge campaign targeted at India. If mobile UX explains it, run a mobile card redesign experiment. H5 (model retraining) is a 2–3 month project — validate it only after ruling out simpler causes.
- How would you design the experiment for H5 without disrupting the current recommendation system for all India users?
- CTR is a proxy metric — what is the north star metric for job recommendations, and could improving CTR hurt it?
- If you had to present one recommendation to the India VP in 5 minutes, which hypothesis would you lead with and why?
A customer’s ROI analysis team asks: is Sales Navigator worth $79/month? Define an ROI framework, identify control/treatment groups, handle survivorship bias in renewal data, and give a recommendation for a 3-month sales cycle.
ROI Framework:
- Revenue benefit: Incremental deals closed attributable to Sales Navigator leads × average deal value
- Time benefit: Hours saved on prospecting × hourly cost of sales rep time
- Cost: $79/month × number of licenses × months
- ROI = (Revenue Benefit + Time Benefit − Cost) / Cost
Control vs. Treatment Groups:
- Treatment: Sales reps with Sales Navigator access
- Control: Comparable reps at similar companies without Sales Navigator (or a pre-post comparison at the same company if a randomized holdout is not possible)
- Match on: company size, industry, sales cycle length, and rep tenure
Handling Survivorship Bias: The renewal data only includes companies who continued paying. To get an unbiased estimate, you must include churned customers in the denominator. Pull 12-month cohorts including both renewed and churned accounts. For churned accounts, impute the outcome period as zero incremental revenue (conservative) or use exit survey data if available. Failure to include churned accounts will systematically overstate ROI by 20–40%.
Recommendation for 3-Month Sales Cycle: With a 3-month sales cycle, a 1-month trial of Sales Navigator will not show deal closure outcomes — leads sourced in month 1 close in month 4. The right measurement window is 6 months minimum. Recommend a 6-month cohort study with a holdout group. Present interim leading indicators (pipeline value added, InMail reply rates, meetings booked) to the customer at the 3-month mark so they have evidence before their renewal decision.
- How would you structure the ROI analysis differently for a large enterprise (500+ licenses) vs. a small team (5 licenses)?
- If the customer’s industry has a 9-month average sales cycle, how does that change the study design?
- How would you handle the confound that companies who buy Sales Navigator may already be better at sales?
4-Week Preparation Plan for LinkedIn Interview
Build the Foundation
Master SQL window functions, self-joins for graph traversal, and product analytics frameworks. Read The Data Monk posts on LinkedIn-style SQL problems and feed ranking patterns.
2200 Most Asked Analytics Interview Questions
Work through the SQL + Python + Case Study sections. Focus on graph-query patterns and funnel metric questions. End of Week 2: First Mock Interview.
Ace Any SQL Interview Ebook
Go through all questions in the Ace Any SQL Interview ebook — no shortcuts. Pay special attention to window functions and recursive patterns. End of Week 3: Second Mock Interview.
Revise + Timed Practice
Revise first 5 chapters of the 2200 ebook. Do timed practice: 15 min/SQL, 20 min/case study. End of Week 4: Final Mock Interview.