Master production-grade SQL with comprehensive error handling, advanced debugging techniques, complex stored procedures, and recursive CTEs. This guide covers 8 real FAANG interview questions with detailed solutions you’ll encounter at Meta, Amazon, Google, Apple, and Netflix.
Table of Contents
- Error Handling & Debugging (4 Questions)
- Stored Procedures & CTEs (4 Questions)
- Production Best Practices
Introduction: Why Error Handling and Advanced SQL Matter in Production
Most junior engineers write SQL that works on happy paths. Senior engineers at FAANG write SQL that handles edge cases, recovers from errors gracefully, and scales to billions of rows. The difference between a $150K engineer and a $350K engineer is often their ability to:
- Handle errors properly: Not just fail silently, but alert and recover intelligently
- Debug efficiently: Find bottlenecks and data issues in minutes, not hours
- Write maintainable code: Stored procedures and CTEs that 10 engineers can understand
- Think about failure modes: What breaks? How do we recover?
- Monitor in production: Catch issues before customers see them
FAANG companies test this because they’ve been burned by engineers who don’t. One bad query on a production database with 100M users can cost millions of dollars.
Error Handling & Debugging: 4 FAANG Interview Questions
Question 1 (Error Handling): Implement Robust Try-Catch with Rollback and Error Logging
Asked at: Amazon, Google
Interview Question:
“Write a stored procedure that transfers money between accounts. Include proper error handling with TRY-CATCH. If ANY error occurs, rollback the entire transaction AND log the error with timestamp, error code, and error message to an error_log table. The procedure should return a status code (0 = success, 1 = error).”
Why This Matters:
This tests if you understand:
- Transaction control: COMMIT vs ROLLBACK, BEGIN TRANSACTION
- Error handling: TRY-CATCH blocks, ERROR_NUMBER(), ERROR_MESSAGE()
- Logging strategy: How to debug production issues
- Return codes: How the application knows if it succeeded or failed
- Data integrity: Never leave data in an inconsistent state
Solution: Complete Error-Handling Procedure
-- Create error log table first
CREATE TABLE ErrorLog (
error_id INT AUTO_INCREMENT PRIMARY KEY,
procedure_name VARCHAR(100),
error_number INT,
error_message VARCHAR(500),
error_severity INT,
error_state INT,
logged_at DATETIME DEFAULT CURRENT_TIMESTAMP,
parameters_passed VARCHAR(500) -- For debugging
);
-- The actual procedure with error handling
DELIMITER //
CREATE PROCEDURE TransferMoney(
IN p_from_account INT,
IN p_to_account INT,
IN p_amount DECIMAL(10, 2),
OUT p_status_code INT,
OUT p_error_message VARCHAR(500)
)
main: BEGIN -- Label the procedure body so validation failures can exit early with LEAVE main
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
-- Capture error details first, while the diagnostics area is still intact
DECLARE v_sqlstate CHAR(5) DEFAULT '00000';
DECLARE v_errno INT DEFAULT 0;
DECLARE v_errmsg VARCHAR(500) DEFAULT '';
GET DIAGNOSTICS CONDITION 1
v_sqlstate = RETURNED_SQLSTATE,
v_errno = MYSQL_ERRNO,
v_errmsg = MESSAGE_TEXT;
-- On any error, rollback and log
ROLLBACK;
INSERT INTO ErrorLog (
procedure_name,
error_number,
error_message,
error_severity,
error_state,
parameters_passed
) VALUES (
'TransferMoney',
v_errno,
v_errmsg,
0,
0,
CONCAT('from=', p_from_account, ', to=', p_to_account, ', amount=', p_amount)
);
SET p_status_code = 1;
SET p_error_message = CONCAT('Error ', v_errno, ' (', v_sqlstate, '): ', v_errmsg);
END;
-- Start transaction
START TRANSACTION;
-- Validation: Check amount is positive
IF p_amount <= 0 THEN
ROLLBACK;
SET p_status_code = 1;
SET p_error_message = 'Amount must be greater than zero';
LEAVE main;
END IF;
-- Validation: Check account exists
IF NOT EXISTS (SELECT 1 FROM Accounts WHERE account_id = p_from_account) THEN
ROLLBACK;
SET p_status_code = 1;
SET p_error_message = 'From account does not exist';
LEAVE main;
END IF;
IF NOT EXISTS (SELECT 1 FROM Accounts WHERE account_id = p_to_account) THEN
ROLLBACK;
SET p_status_code = 1;
SET p_error_message = 'To account does not exist';
LEAVE main;
END IF;
-- Validation: Check sufficient balance
IF (SELECT balance FROM Accounts WHERE account_id = p_from_account) < p_amount THEN
ROLLBACK;
SET p_status_code = 1;
SET p_error_message = 'Insufficient balance';
LEAVE main;
END IF;
-- Execute transfer
UPDATE Accounts SET balance = balance - p_amount WHERE account_id = p_from_account;
UPDATE Accounts SET balance = balance + p_amount WHERE account_id = p_to_account;
-- Log successful transaction
INSERT INTO TransactionLog (from_account, to_account, amount, status)
VALUES (p_from_account, p_to_account, p_amount, 'SUCCESS');
-- Commit if everything succeeded
COMMIT;
SET p_status_code = 0;
SET p_error_message = 'Transfer successful';
END //
DELIMITER ;
-- How to call it:
CALL TransferMoney(1, 2, 100, @status, @message);
SELECT @status AS status_code, @message AS message;
-- Check error log if failed:
SELECT * FROM ErrorLog ORDER BY logged_at DESC LIMIT 10;
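To confirm the failure paths behave as intended, it helps to exercise them directly. A quick sanity test, assuming account 999 does not exist in Accounts:
-- Failure-path test (account 999 is assumed not to exist)
CALL TransferMoney(1, 999, 50, @status, @message);
SELECT @status AS status_code, @message AS message; -- Expect 1, 'To account does not exist'
-- A non-positive amount should be rejected before any balance is touched
CALL TransferMoney(1, 2, -25, @status, @message);
SELECT @status AS status_code, @message AS message; -- Expect 1, 'Amount must be greater than zero'
Nothing should appear in TransactionLog for either call, and both account balances should be unchanged.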
Key Error Handling Patterns:
| Pattern | Why It Matters |
|---|---|
| EXIT HANDLER | Catches ANY unexpected error and rolls back |
| Validation Before Action | Check constraints BEFORE updating (faster rollback) |
| Error Logging | Log every error for debugging production issues |
| Return Status Code | Application knows if procedure succeeded (0 = success) |
Interview Tip:
“When the interviewer asks about error handling, they want to see: validation, transactions, rollback, and logging. Explain: ‘I check constraints BEFORE modifying data, so if validation fails we avoid unnecessary updates. I log errors with context so we can debug production issues quickly.'”
Question 2 (Debugging): Write a Debug Procedure That Shows Query Execution and Data Issues
Asked at: Google, Apple
Interview Question:
“A stored procedure is running slow and returning wrong data. Write a debugging procedure that: (1) shows the execution plan, (2) counts rows at each step, (3) identifies NULL values causing issues, (4) shows data type mismatches. How would you troubleshoot this in production?”
Solution: Debug Information Gathering Procedure
-- Create a debug logging table
CREATE TABLE DebugLog (
debug_id INT AUTO_INCREMENT PRIMARY KEY,
procedure_name VARCHAR(100),
step_name VARCHAR(100),
row_count INT,
value_checked VARCHAR(500),
logged_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Debug procedure
DELIMITER //
CREATE PROCEDURE DebugQueryPerformance()
BEGIN
DECLARE v_total_users INT;
DECLARE v_users_with_null_email INT;
DECLARE v_users_with_invalid_email INT;
DECLARE v_duplicate_users INT;
-- Step 1: Basic row counts
SELECT COUNT(*) INTO v_total_users FROM Users;
INSERT INTO DebugLog (procedure_name, step_name, row_count, value_checked)
VALUES ('DebugQueryPerformance', 'Total users', v_total_users, 'All users');
-- Step 2: Find NULL values (common issue)
SELECT COUNT(*) INTO v_users_with_null_email FROM Users WHERE email IS NULL;
INSERT INTO DebugLog (procedure_name, step_name, row_count, value_checked)
VALUES ('DebugQueryPerformance', 'Users with NULL email', v_users_with_null_email, 'NULL emails can break joins');
-- Step 3: Find invalid emails (pattern mismatch)
SELECT COUNT(*) INTO v_users_with_invalid_email
FROM Users
WHERE email IS NOT NULL AND email NOT LIKE '%@%';
INSERT INTO DebugLog (procedure_name, step_name, row_count, value_checked)
VALUES ('DebugQueryPerformance', 'Users with invalid email format', v_users_with_invalid_email, 'No @ symbol');
-- Step 4: Find duplicates
SELECT COUNT(*) INTO v_duplicate_users
FROM (
SELECT email, COUNT(*) as cnt FROM Users WHERE email IS NOT NULL GROUP BY email HAVING cnt > 1
) as duplicates;
INSERT INTO DebugLog (procedure_name, step_name, row_count, value_checked)
VALUES ('DebugQueryPerformance', 'Duplicate emails', v_duplicate_users, 'Duplicates cause wrong aggregation');
-- Step 5: Show the debug summary
SELECT
'Total Users' as metric,
v_total_users as value
UNION ALL
SELECT 'NULL Emails', v_users_with_null_email
UNION ALL
SELECT 'Invalid Emails', v_users_with_invalid_email
UNION ALL
SELECT 'Duplicate Emails', v_duplicate_users;
END //
DELIMITER ;
-- Show slow query execution plan
-- Run BEFORE your query to see what the database does
EXPLAIN SELECT u.user_id, COUNT(o.order_id) as order_count
FROM Users u
LEFT JOIN Orders o ON u.user_id = o.user_id
GROUP BY u.user_id;
-- Key columns in EXPLAIN output:
-- type: 'ALL' = full table scan (slow!), 'index' = index scan, 'ref' = using index (fast)
-- key: which index was used (NULL = no index used)
-- rows: estimated rows examined
-- If rows is huge and type is ALL, add an index!
-- Common performance issues to look for:
-- 1. No index on join columns (type = ALL)
-- 2. Large intermediate result sets
-- 3. GROUP BY without index on grouping column
-- 4. Multiple JOINs creating Cartesian product (huge row count)
Debugging Checklist for Slow Queries:
- Run EXPLAIN: Show execution plan. Look for type = ALL (bad)
- Count rows at each step: Find where explosion happens
- Check for NULLs: NULL in JOIN column breaks joins
- Check for duplicates: Unintended joins create Cartesian product
- Check data types: INT vs VARCHAR in JOIN causes conversion
- Look at indexes: Is there an index on the join column? (see the fix-and-verify sketch after this checklist)
- Test with sample data: Does it work on 1000 rows? 1M rows?
- Check for hidden joins: WHERE clauses creating implicit joins
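As a concrete illustration of the EXPLAIN and index items above, a minimal fix-and-verify sketch; the index name idx_orders_user_id is an assumption for this example:
-- Before: EXPLAIN shows type = ALL on Orders (full scan for every joined row)
EXPLAIN SELECT u.user_id, COUNT(o.order_id) AS order_count
FROM Users u
LEFT JOIN Orders o ON u.user_id = o.user_id
GROUP BY u.user_id;
-- Add an index on the join column
ALTER TABLE Orders ADD INDEX idx_orders_user_id (user_id);
-- After: re-run EXPLAIN and confirm type = ref and key = idx_orders_user_id for Orders
EXPLAIN SELECT u.user_id, COUNT(o.order_id) AS order_count
FROM Users u
LEFT JOIN Orders o ON u.user_id = o.user_id
GROUP BY u.user_id;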
Question 3 (Error Handling): Handle Division by Zero, Overflow, and Precision Issues
Asked at: Meta, Netflix
Interview Question:
“Write a procedure that calculates: average order value, revenue growth percentage, and margin percentage. Handle: (1) division by zero (no orders), (2) negative numbers (refunds), (3) precision issues (rounding), (4) NULL values.”
Solution: Safe Mathematical Operations
DELIMITER //
CREATE PROCEDURE CalculateMetrics(
IN p_product_id INT,
OUT p_avg_order_value DECIMAL(10, 2),
OUT p_revenue_growth_pct DECIMAL(5, 2),
OUT p_margin_pct DECIMAL(5, 2)
)
main: BEGIN -- Labeled so the early exit below can use LEAVE main
DECLARE v_total_orders INT;
DECLARE v_total_revenue DECIMAL(15, 2);
DECLARE v_previous_month_revenue DECIMAL(15, 2);
DECLARE v_total_cost DECIMAL(15, 2);
DECLARE v_exit_code INT DEFAULT 0;
DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
BEGIN
SET v_exit_code = 1;
END;
START TRANSACTION;
-- Count total orders
SELECT COUNT(*) INTO v_total_orders
FROM Orders
WHERE product_id = p_product_id;
-- Safety Check 1: Division by zero
IF v_total_orders = 0 THEN
SET p_avg_order_value = 0;
SET p_revenue_growth_pct = 0;
SET p_margin_pct = 0;
ROLLBACK;
LEAVE main;
END IF;
-- Calculate total revenue (handle NULLs and negative values)
SELECT COALESCE(SUM(amount), 0) INTO v_total_revenue
FROM Orders
WHERE product_id = p_product_id
AND amount > 0; -- Exclude refunds from revenue
-- Average order value (safe division)
SET p_avg_order_value = ROUND(
CASE
WHEN v_total_orders > 0 THEN v_total_revenue / v_total_orders
ELSE 0
END, 2
);
-- Get previous month's revenue
SELECT COALESCE(SUM(amount), 0) INTO v_previous_month_revenue
FROM Orders
WHERE product_id = p_product_id
AND amount > 0
AND order_date >= DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 MONTH), '%Y-%m-01')
AND order_date < DATE_FORMAT(CURDATE(), '%Y-%m-01'); -- Previous calendar month; avoids MONTH()-1 returning 0 in January
-- Revenue growth (safe division, handle negative)
SET p_revenue_growth_pct = ROUND(
CASE
WHEN v_previous_month_revenue = 0 THEN 0 -- Can't grow from 0
WHEN v_total_revenue >= v_previous_month_revenue THEN
((v_total_revenue - v_previous_month_revenue) / v_previous_month_revenue) * 100
ELSE
-(((v_previous_month_revenue - v_total_revenue) / v_previous_month_revenue) * 100)
END, 2
);
-- Get product cost (assumes Products.cost stores the total cost for this product over the period)
SELECT COALESCE(SUM(cost), 0) INTO v_total_cost
FROM Products
WHERE product_id = p_product_id;
-- Margin percentage (safe division)
SET p_margin_pct = ROUND(
CASE
WHEN v_total_revenue = 0 THEN 0
WHEN v_total_cost >= v_total_revenue THEN -100 -- Loss
ELSE ((v_total_revenue - v_total_cost) / v_total_revenue) * 100
END, 2
);
COMMIT;
END //
DELIMITER ;
-- Usage:
CALL CalculateMetrics(123, @avg_order, @growth_pct, @margin);
SELECT
@avg_order AS average_order_value,
@growth_pct AS revenue_growth_percent,
@margin AS margin_percent;
-- Key Safety Patterns:
-- 1. COALESCE(..., 0): Handle NULL from SUM()
-- 2. WHERE amount > 0: Exclude refunds from revenue
-- 3. CASE WHEN div_by_zero_risk = 0: Avoid division by zero
-- 4. ROUND(..., 2): Always round money to 2 decimals
-- 5. Handle negative (loss): Check if cost > revenue
Common Math Pitfalls to Avoid:
- Division by zero: Always check the denominator is > 0, or use NULLIF (see the sketch after this list)
- NULL propagation: NULL + 1 = NULL. Use COALESCE(column, 0)
- Integer division: in some engines (e.g., SQL Server), 5 / 2 = 2, not 2.5. Multiply by 1.0 first (5 * 1.0 / 2); MySQL's / already returns a decimal, but its DIV operator truncates
- Precision loss: Always ROUND money to 2 decimals
- Negative numbers: Check for losses, refunds, write-offs
- Overflow: DECIMAL(15,2) prevents overflow for currency
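A compact alternative to the CASE guards in the procedure above is NULLIF: a zero denominator becomes NULL, the division returns NULL instead of erroring, and COALESCE maps the result back to 0. A minimal sketch against the same Orders and Products tables:
-- Average order value per product; products with no orders return 0 instead of failing
SELECT
p.product_id,
ROUND(COALESCE(SUM(o.amount) / NULLIF(COUNT(o.order_id), 0), 0), 2) AS avg_order_value
FROM Products p
LEFT JOIN Orders o ON o.product_id = p.product_id AND o.amount > 0 -- Exclude refunds
GROUP BY p.product_id;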
Question 4 (Debugging): Find and Fix Data Quality Issues That Break Queries
Asked at: Amazon, Google
Interview Question:
“Your revenue query is returning NULL or incorrect results. Find: (1) rows with missing timestamps, (2) duplicate records, (3) future-dated orders, (4) negative amounts. Show a diagnostic report.”
Solution: Data Quality Diagnostic Procedure
DELIMITER //
CREATE PROCEDURE DiagnoseDataQuality()
BEGIN
-- Create temporary table for results
CREATE TEMPORARY TABLE DataQualityReport (
issue_type VARCHAR(100),
row_count INT,
affected_ids VARCHAR(1000),
severity VARCHAR(20),
recommendation VARCHAR(500)
);
-- Issue 1: Missing timestamps
INSERT INTO DataQualityReport (issue_type, row_count, severity, recommendation)
SELECT
'Missing order timestamp',
COUNT(*),
'CRITICAL',
'Cannot aggregate by time. Fix before proceeding.'
FROM Orders
WHERE order_date IS NULL;
-- Issue 2: Future-dated orders (data entry error)
INSERT INTO DataQualityReport (issue_type, row_count, severity, recommendation)
SELECT
'Future-dated orders',
COUNT(*),
'WARNING',
'Exclude these from revenue calculations. Likely data entry errors.'
FROM Orders
WHERE order_date > CURDATE();
-- Issue 3: Negative amounts (should be in refunds table)
INSERT INTO DataQualityReport (issue_type, row_count, severity, recommendation)
SELECT
'Negative order amounts',
COUNT(*),
'WARNING',
'Should be in Refunds table, not Orders. Move or exclude.'
FROM Orders
WHERE amount < 0;
-- Issue 4: Duplicate orders (same user, amount, timestamp)
INSERT INTO DataQualityReport (issue_type, row_count, severity, recommendation)
SELECT
'Duplicate orders',
COUNT(*),
'CRITICAL',
'Remove duplicates. This inflates revenue by 2-Nx.'
FROM (
SELECT user_id, order_date, amount, COUNT(*) as cnt
FROM Orders
GROUP BY user_id, order_date, amount
HAVING cnt > 1
) as dups;
-- Issue 5: Orphaned orders (user_id doesn't exist in Users table)
INSERT INTO DataQualityReport (issue_type, row_count, severity, recommendation)
SELECT
'Orphaned orders (user not in Users)',
COUNT(*),
'WARNING',
'Foreign key constraint missing. Add constraint or update user records.'
FROM Orders o
LEFT JOIN Users u ON o.user_id = u.user_id
WHERE u.user_id IS NULL;
-- Issue 6: Type mismatch in join columns
INSERT INTO DataQualityReport (issue_type, row_count, severity, recommendation)
SELECT
'Type mismatch (user_id as string not int)',
COUNT(*),
'CRITICAL',
'Fix column types. Implicit conversion causes slow queries.'
FROM Orders
WHERE user_id IS NOT NULL
AND CAST(user_id AS CHAR) NOT REGEXP '^[0-9]+$';
-- Issue 7: NULL foreign keys
INSERT INTO DataQualityReport (issue_type, row_count, severity, recommendation)
SELECT
'NULL user_id in orders',
COUNT(*),
'CRITICAL',
'Cannot join to Users. Investigate source.'
FROM Orders
WHERE user_id IS NULL;
-- Display full report
SELECT * FROM DataQualityReport ORDER BY FIELD(severity, 'CRITICAL', 'WARNING'); -- Alphabetical DESC would put WARNING first
-- Summary statistics
SELECT
COUNT(*) as total_issues,
SUM(CASE WHEN severity = 'CRITICAL' THEN 1 ELSE 0 END) as critical_issues,
SUM(CASE WHEN severity = 'WARNING' THEN 1 ELSE 0 END) as warning_issues
FROM DataQualityReport;
DROP TEMPORARY TABLE DataQualityReport;
END //
DELIMITER ;
-- Call the diagnostic procedure:
CALL DiagnoseDataQuality();
-- Output tells you exactly what's wrong and how to fix it
Interview Tip:
“When a query is broken, the first thing to do is check data quality. Show: ‘I’d create a diagnostic procedure that checks for NULLs, duplicates, type mismatches, and boundary violations. Most bugs are data issues, not query logic issues.'”
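When the report flags duplicates, one way to clean them up in MySQL 8+ is to rank the copies and delete everything after the first; this sketch assumes order_id is the primary key of Orders:
-- Remove duplicate orders, keeping the lowest order_id in each (user_id, order_date, amount) group
DELETE o FROM Orders o
INNER JOIN (
SELECT order_id,
ROW_NUMBER() OVER (PARTITION BY user_id, order_date, amount ORDER BY order_id) AS rn
FROM Orders
) ranked ON o.order_id = ranked.order_id
WHERE ranked.rn > 1;
In production you would run this inside a transaction and verify the affected row count before committing.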
🎯 Ready to Master Production SQL at FAANG Level?
Error handling and debugging are 80% of real engineering work. These 8 questions cover what actually breaks in production.
Get Complete Production SQL Guide →
Stored Procedures & Advanced CTEs: 4 FAANG Interview Questions
Question 5 (Stored Procedures): Build a Reusable Procedure for Complex Business Logic
Asked at: Meta, Amazon
Interview Question:
“Build a procedure that processes monthly bonuses: (1) Calculate performance score for each employee based on orders, sales, and customer satisfaction, (2) Determine bonus tier (bronze/silver/gold based on score), (3) Apply bonus amount (10%/20%/30% of salary), (4) Update employee compensation table, (5) Return audit trail showing calculations.”
Solution: Complex Business Logic Procedure
CREATE TABLE EmployeeBonus (
bonus_id INT AUTO_INCREMENT PRIMARY KEY,
employee_id INT,
bonus_month DATE,
performance_score DECIMAL(5, 2),
bonus_tier VARCHAR(20),
bonus_amount DECIMAL(10, 2),
calculated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
DELIMITER //
CREATE PROCEDURE CalculateMonthlyBonus(
IN p_month DATE,
OUT p_total_bonuses DECIMAL(15, 2),
OUT p_employees_processed INT,
OUT p_status_message VARCHAR(500)
)
BEGIN
-- MySQL requires declarations in this order: variables, then cursors, then handlers
DECLARE v_exit_code INT DEFAULT 0;
DECLARE DONE INT DEFAULT FALSE;
DECLARE v_emp_id INT;
DECLARE v_salary DECIMAL(10, 2);
DECLARE v_performance_score DECIMAL(5, 2);
DECLARE v_bonus_tier VARCHAR(20);
DECLARE v_bonus_pct DECIMAL(3, 2);
DECLARE v_bonus_amount DECIMAL(10, 2);
DECLARE v_order_count INT;
DECLARE v_sales_total DECIMAL(15, 2);
DECLARE v_satisfaction_score DECIMAL(5, 2); -- 0-100 scale needs more than DECIMAL(3,2)
DECLARE v_employee_cursor CURSOR FOR
SELECT employee_id, salary FROM Employees WHERE active = 1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET DONE = TRUE;
DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
BEGIN
SET v_exit_code = 1;
ROLLBACK;
END;
-- Initialize totals
SET p_total_bonuses = 0;
SET p_employees_processed = 0;
START TRANSACTION;
-- Process each active employee
OPEN v_employee_cursor;
process_employees: LOOP
FETCH v_employee_cursor INTO v_emp_id, v_salary;
IF DONE THEN
LEAVE process_employees;
END IF;
-- Calculate performance score from multiple metrics
-- Score is weighted: 40% orders, 40% sales, 20% satisfaction
-- Metric 1: Order count (normalized to 0-100)
SELECT COUNT(*) INTO v_order_count
FROM Orders
WHERE employee_id = v_emp_id
AND MONTH(order_date) = MONTH(p_month)
AND YEAR(order_date) = YEAR(p_month);
-- Metric 2: Total sales
SELECT COALESCE(SUM(amount), 0) INTO v_sales_total
FROM Orders
WHERE employee_id = v_emp_id
AND MONTH(order_date) = MONTH(p_month)
AND YEAR(order_date) = YEAR(p_month);
-- Metric 3: Customer satisfaction (0-100 scale)
SELECT COALESCE(AVG(rating), 0) INTO v_satisfaction_score
FROM CustomerReviews
WHERE employee_id = v_emp_id
AND MONTH(review_date) = MONTH(p_month)
AND YEAR(review_date) = YEAR(p_month);
-- Calculate weighted performance score (0-100)
SET v_performance_score =
(LEAST(v_order_count / 10 * 100, 100) * 0.4) + -- Order metric
(LEAST(v_sales_total / 10000 * 100, 100) * 0.4) + -- Sales metric
(v_satisfaction_score * 0.2); -- Satisfaction metric
-- Determine bonus tier based on score
IF v_performance_score >= 80 THEN
SET v_bonus_tier = 'gold';
SET v_bonus_pct = 0.30;
ELSEIF v_performance_score >= 60 THEN
SET v_bonus_tier = 'silver';
SET v_bonus_pct = 0.20;
ELSEIF v_performance_score >= 40 THEN
SET v_bonus_tier = 'bronze';
SET v_bonus_pct = 0.10;
ELSE
SET v_bonus_tier = 'none';
SET v_bonus_pct = 0;
END IF;
-- Calculate bonus amount
SET v_bonus_amount = v_salary * v_bonus_pct;
-- Insert bonus record for audit trail
INSERT INTO EmployeeBonus (
employee_id,
bonus_month,
performance_score,
bonus_tier,
bonus_amount
) VALUES (
v_emp_id,
p_month,
v_performance_score,
v_bonus_tier,
v_bonus_amount
);
-- Accumulate totals
SET p_total_bonuses = p_total_bonuses + v_bonus_amount;
SET p_employees_processed = p_employees_processed + 1;
END LOOP;
CLOSE v_employee_cursor;
COMMIT;
SET p_status_message = CONCAT(
'Successfully processed ',
p_employees_processed,
' employees. Total bonuses: $',
ROUND(p_total_bonuses, 2)
);
END //
DELIMITER ;
-- Usage:
CALL CalculateMonthlyBonus('2024-01-01', @total, @count, @msg);
SELECT @total AS total_bonuses, @count AS employees_processed, @msg AS status_message;
-- View audit trail:
SELECT * FROM EmployeeBonus
WHERE DATE_FORMAT(bonus_month, '%Y-%m') = DATE_FORMAT(CURDATE(), '%Y-%m') -- DATE_TRUNC is PostgreSQL; DATE_FORMAT works in MySQL
ORDER BY performance_score DESC;
Key Procedure Design Patterns:
- Cursor loop: Process each row with business logic (a set-based alternative is sketched after this list)
- Multiple calculations: Weighted scoring from multiple metrics
- Audit trail: Log all calculations for verification
- Error handling: Roll back if anything fails
- Output parameters: Return summary to application
- Normalization: LEAST() to cap metric scores at 100
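For comparison, the same tier logic can be expressed set-based in a single INSERT ... SELECT, which usually scales better than a cursor on large tables. A minimal sketch for the January 2024 run used in the CALL above, simplified to the order and sales components (the satisfaction term would join CustomerReviews the same way); employees with no orders that month are omitted here, unlike the cursor version:
-- Set-based alternative: score, tier, and bonus for all active employees in one statement
INSERT INTO EmployeeBonus (employee_id, bonus_month, performance_score, bonus_tier, bonus_amount)
SELECT
e.employee_id,
'2024-01-01',
s.score,
CASE WHEN s.score >= 80 THEN 'gold'
WHEN s.score >= 60 THEN 'silver'
WHEN s.score >= 40 THEN 'bronze'
ELSE 'none' END,
e.salary * CASE WHEN s.score >= 80 THEN 0.30
WHEN s.score >= 60 THEN 0.20
WHEN s.score >= 40 THEN 0.10
ELSE 0 END
FROM Employees e
INNER JOIN (
SELECT
employee_id,
LEAST(COUNT(*) / 10 * 100, 100) * 0.4
+ LEAST(COALESCE(SUM(amount), 0) / 10000 * 100, 100) * 0.4 AS score
FROM Orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'
GROUP BY employee_id
) s ON s.employee_id = e.employee_id
WHERE e.active = 1;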
Question 6 (CTEs): Recursive CTE for Hierarchical Data Processing
Asked at: Google, Apple
Interview Question:
“You have an organizational hierarchy with managers and reports. Write a recursive CTE that: (1) finds all subordinates under a given manager, (2) shows the depth/level in hierarchy, (3) calculates total compensation cost for the entire team, (4) shows the chain of command path.”
Solution: Recursive CTE with Hierarchical Aggregation
WITH RECURSIVE team_hierarchy AS (
-- ANCHOR: Start with the manager
SELECT
employee_id,
name,
manager_id,
salary,
1 as hierarchy_level,
CONCAT(name) as chain_of_command
FROM Employees
WHERE employee_id = 5 -- Start with specific manager
UNION ALL
-- RECURSIVE: Get all subordinates
SELECT
e.employee_id,
e.name,
e.manager_id,
e.salary,
th.hierarchy_level + 1,
CONCAT(th.chain_of_command, ' > ', e.name)
FROM Employees e
INNER JOIN team_hierarchy th ON e.manager_id = th.employee_id
WHERE th.hierarchy_level < 10 -- Prevent infinite loops
)
SELECT
employee_id,
name,
hierarchy_level,
salary,
chain_of_command,
-- Count direct reports
(SELECT COUNT(*) FROM team_hierarchy t2
WHERE t2.manager_id = team_hierarchy.employee_id) as direct_reports,
-- Sum of subordinates' salaries (compensation cost)
(SELECT COALESCE(SUM(salary), 0) FROM team_hierarchy t3
WHERE t3.manager_id = team_hierarchy.employee_id) as subordinates_salary_cost
FROM team_hierarchy
ORDER BY hierarchy_level, employee_id;
-- Alternative: accumulate salary cost along each management chain
WITH RECURSIVE chain_cost AS (
SELECT
employee_id,
name,
manager_id,
salary,
salary as chain_salary_cost, -- Start with own salary
1 as depth
FROM Employees
WHERE manager_id IS NULL -- CEO
UNION ALL
SELECT
e.employee_id,
e.name,
e.manager_id,
e.salary,
cc.chain_salary_cost + e.salary,
cc.depth + 1
FROM Employees e
INNER JOIN chain_cost cc ON e.manager_id = cc.employee_id
)
SELECT
employee_id,
name,
depth,
salary,
chain_salary_cost
FROM chain_cost
WHERE chain_salary_cost > 100000 -- Show expensive management chains
ORDER BY chain_salary_cost DESC;
-- Note: this sums salaries along the chain of command from the CEO down to each
-- employee, not the cost of that employee's whole subtree. A true per-manager team
-- cost is sketched after the pattern notes below.
-- Key Recursive CTE Patterns:
-- 1. ANCHOR query: Start with top-level (manager_id = specific ID)
-- 2. RECURSIVE query: Join to find children
-- 3. UNION ALL: Combine anchor + recursive
-- 4. Termination: WHERE clause prevents infinite loops
-- 5. Depth tracking: Increment level at each recursion
-- 6. Path building: CONCAT to show full hierarchy path
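The chain query above accumulates cost down a single path. For a true per-manager team cost, one common approach is to seed the recursion with every employee as their own root and then aggregate; a minimal sketch over the same Employees table:
-- Total compensation cost of each manager's full subtree (the manager's own salary included)
WITH RECURSIVE reports AS (
SELECT employee_id AS root_id, employee_id, salary
FROM Employees
UNION ALL
SELECT r.root_id, e.employee_id, e.salary
FROM Employees e
INNER JOIN reports r ON e.manager_id = r.employee_id
)
SELECT
root_id AS manager_id,
COUNT(*) - 1 AS subordinate_count,
SUM(salary) AS total_team_cost
FROM reports
GROUP BY root_id
ORDER BY total_team_cost DESC;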
Question 7 (CTEs): Complex Multi-Stage CTE for Data Pipeline
Asked at: Amazon, Netflix
Interview Question:
“Build a data pipeline with 5 stages: (1) Extract raw events, (2) Deduplicate, (3) Calculate user metrics, (4) Identify cohorts, (5) Rank users by value. Each stage transforms data for the next stage.”
Solution: Multi-Stage Data Pipeline with CTEs
WITH
-- STAGE 1: Extract and clean raw events
raw_events AS (
SELECT
event_id,
user_id,
event_type,
event_date,
amount,
ROW_NUMBER() OVER (PARTITION BY user_id, event_date, event_type ORDER BY event_id) as rn
FROM Events
WHERE event_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY) -- Last 90 days
),
-- STAGE 2: Deduplicate (keep only first occurrence)
deduplicated_events AS (
SELECT
event_id,
user_id,
event_type,
event_date,
amount
FROM raw_events
WHERE rn = 1 -- Keep first occurrence only
),
-- STAGE 3: Calculate user metrics
user_metrics AS (
SELECT
user_id,
COUNT(DISTINCT event_date) as active_days,
COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) as purchase_count,
SUM(CASE WHEN event_type = 'purchase' THEN amount ELSE 0 END) as total_spend,
AVG(CASE WHEN event_type = 'purchase' THEN amount ELSE NULL END) as avg_purchase_value,
MIN(event_date) as first_event_date,
MAX(event_date) as last_event_date
FROM deduplicated_events
GROUP BY user_id
),
-- STAGE 4: Identify user cohorts based on behavior
user_cohorts AS (
SELECT
user_id,
active_days,
purchase_count,
total_spend,
avg_purchase_value,
first_event_date,
last_event_date,
-- Cohort logic: VIP = spend > $1000, Active = 10+ active days, etc.
CASE
WHEN total_spend > 1000 AND active_days >= 20 THEN 'VIP'
WHEN total_spend > 500 AND active_days >= 10 THEN 'Premium'
WHEN total_spend > 100 AND active_days >= 5 THEN 'Regular'
WHEN total_spend > 0 THEN 'Low-Value'
ELSE 'Inactive'
END as user_segment,
-- LTV (Lifetime Value) estimate based on 90-day window
ROUND(total_spend * (365.0 / 90.0), 2) as estimated_annual_ltv
FROM user_metrics
),
-- STAGE 5: Rank users by value within each segment
ranked_users AS (
SELECT
user_id,
user_segment,
estimated_annual_ltv,
total_spend,
purchase_count,
active_days,
RANK() OVER (PARTITION BY user_segment ORDER BY estimated_annual_ltv DESC) as segment_rank,
ROW_NUMBER() OVER (PARTITION BY user_segment ORDER BY estimated_annual_ltv DESC) as segment_row_num,
ROUND(100.0 * RANK() OVER (PARTITION BY user_segment ORDER BY estimated_annual_ltv DESC) /
COUNT(*) OVER (PARTITION BY user_segment), 2) as percentile_in_segment
FROM user_cohorts
)
-- FINAL OUTPUT: Top users in each segment
SELECT
user_segment,
COUNT(*) as users_in_segment,
SUM(estimated_annual_ltv) as segment_total_ltv,
ROUND(AVG(estimated_annual_ltv), 2) as segment_avg_ltv
FROM ranked_users
WHERE segment_rank <= 100 -- Top 100 in each segment
GROUP BY user_segment
ORDER BY segment_total_ltv DESC;
-- This query shows:
-- Stage 1: Raw data cleaned and filtered
-- Stage 2: Duplicates removed
-- Stage 3: Metrics calculated (spend, activity, etc.)
-- Stage 4: Users segmented into cohorts
-- Stage 5: Ranked by value within segment
-- Final: Summary statistics per segment
Why This Pattern Matters:
Data pipelines with CTEs:
- Each CTE is tested independently
- Previous stages can be debugged (SELECT from any CTE; see the sketch below)
- Easy to modify one stage without breaking others
- Performance-optimized by database engine
- Self-documenting code (stage names explain purpose)
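In practice, "SELECT from any CTE" means running the pipeline only as far as the stage you suspect. A minimal sketch that inspects stage 2 from the pipeline above:
-- Debug stage 2 in isolation: keep only the CTEs it depends on and look at the rows
WITH raw_events AS (
SELECT
event_id, user_id, event_type, event_date, amount,
ROW_NUMBER() OVER (PARTITION BY user_id, event_date, event_type ORDER BY event_id) AS rn
FROM Events
WHERE event_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
),
deduplicated_events AS (
SELECT event_id, user_id, event_type, event_date, amount
FROM raw_events
WHERE rn = 1
)
SELECT * FROM deduplicated_events
ORDER BY user_id, event_date
LIMIT 20;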
Question 8 (Stored Procedures + CTEs): Real-World: Nightly Data Processing Job
Asked at: Meta, Google, Amazon
Interview Question:
“Design a nightly stored procedure that: (1) Processes 50M events into daily summary, (2) Updates user cohort assignments, (3) Recalculates all metrics, (4) Identifies fraud, (5) Sends alerts, (6) Handles failures gracefully, (7) Logs execution details.”
Solution: Production-Grade Nightly Job
CREATE TABLE JobExecutionLog (
job_id INT AUTO_INCREMENT PRIMARY KEY,
procedure_name VARCHAR(100),
execution_date DATE,
start_time DATETIME,
end_time DATETIME,
status VARCHAR(20), -- RUNNING, SUCCESS, FAILURE, PARTIAL
rows_processed INT,
error_message VARCHAR(1000),
duration_seconds INT
);
DELIMITER //
CREATE PROCEDURE NightlyDataProcessing(
OUT p_status VARCHAR(20),
OUT p_error_message VARCHAR(500)
)
BEGIN
DECLARE v_job_start DATETIME;
DECLARE v_job_end DATETIME;
DECLARE v_rows_processed INT DEFAULT 0;
DECLARE v_error_count INT DEFAULT 0;
DECLARE v_exit_code INT DEFAULT 0;
DECLARE v_job_id INT;
DECLARE v_last_error VARCHAR(1000) DEFAULT '';
DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
BEGIN
GET DIAGNOSTICS CONDITION 1 v_last_error = MESSAGE_TEXT;
SET v_error_count = v_error_count + 1;
ROLLBACK;
END;
SET v_job_start = NOW();
-- Log job start
INSERT INTO JobExecutionLog (procedure_name, execution_date, start_time, status)
VALUES ('NightlyDataProcessing', CURDATE(), v_job_start, 'RUNNING');
SET v_job_id = LAST_INSERT_ID();
START TRANSACTION;
-- STEP 1: Process daily summary (50M events)
BEGIN
INSERT INTO DailySummary (user_id, event_date, event_count, revenue)
WITH daily_summary AS (
SELECT
user_id,
DATE(event_timestamp) as event_date,
COUNT(*) as event_count,
SUM(amount) as revenue
FROM Events
WHERE event_timestamp >= DATE_SUB(CURDATE(), INTERVAL 1 DAY)
AND event_timestamp < CURDATE() -- Yesterday's data; a range predicate keeps any index on event_timestamp usable
GROUP BY user_id, DATE(event_timestamp)
)
SELECT * FROM daily_summary;
SET v_rows_processed = ROW_COUNT();
END;
-- STEP 2: Update cohort assignments
BEGIN
WITH cohort_calc AS (
SELECT
user_id,
SUM(revenue) as total_revenue,
COUNT(DISTINCT event_date) as active_days,
CASE
WHEN SUM(revenue) > 1000 THEN 'VIP'
WHEN SUM(revenue) > 500 THEN 'Premium'
ELSE 'Standard'
END as new_cohort
FROM DailySummary
WHERE event_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
GROUP BY user_id
)
UPDATE UserCohorts uc
INNER JOIN cohort_calc cc ON uc.user_id = cc.user_id
SET uc.cohort = cc.new_cohort,
uc.last_updated = NOW();
END;
-- STEP 3: Identify fraud
BEGIN
-- For INSERT ... SELECT, MySQL requires the WITH clause immediately before the SELECT, not before INSERT
INSERT INTO FraudAlerts (user_id, alert_type, severity, created_at)
WITH fraud_candidates AS (
-- DailySummary has one row per user per day: event_count and revenue, not raw amounts
SELECT
user_id,
event_count as transaction_count,
revenue as total_amount
FROM DailySummary
WHERE event_date = DATE_SUB(CURDATE(), INTERVAL 1 DAY)
AND (event_count > 100 OR (revenue > 10000 AND event_count > 1))
)
SELECT
user_id,
'SUSPICIOUS_ACTIVITY',
CASE
WHEN transaction_count > 500 THEN 'CRITICAL'
WHEN total_amount > 50000 THEN 'HIGH'
ELSE 'MEDIUM'
END,
NOW()
FROM fraud_candidates;
END;
-- STEP 4: Send alerts
BEGIN
INSERT INTO AlertQueue (alert_id, recipient_email, alert_type, status)
SELECT
alert_id,
u.email,
alert_type,
'PENDING'
FROM FraudAlerts fa
INNER JOIN Users u ON fa.user_id = u.user_id
WHERE DATE(fa.created_at) = CURDATE();
END;
-- Success or failure path: the CONTINUE handler increments v_error_count on any error
IF v_error_count = 0 THEN
COMMIT;
SET v_job_end = NOW();
SET p_status = 'SUCCESS';
SET p_error_message = '';
UPDATE JobExecutionLog
SET
status = 'SUCCESS',
end_time = v_job_end,
rows_processed = v_rows_processed,
duration_seconds = TIMESTAMPDIFF(SECOND, v_job_start, v_job_end)
WHERE job_id = v_job_id;
ELSE
-- Error path (the handler already rolled back the open transaction)
SET v_job_end = NOW();
SET p_status = 'FAILURE';
SET p_error_message = 'Job failed. Check JobExecutionLog for details.';
UPDATE JobExecutionLog
SET
status = 'FAILURE',
end_time = v_job_end,
error_message = CONCAT('Error processing: ', v_last_error),
duration_seconds = TIMESTAMPDIFF(SECOND, v_job_start, v_job_end)
WHERE job_id = v_job_id;
END IF;
END //
DELIMITER ;
-- Schedule this procedure to run nightly:
-- CREATE EVENT NightlyDataProcessingEvent
-- ON SCHEDULE EVERY 1 DAY
-- STARTS '2024-01-01 23:00:00'
-- DO CALL NightlyDataProcessing(@status, @message);
-- Check job logs:
SELECT * FROM JobExecutionLog ORDER BY start_time DESC LIMIT 10;
Production-Grade Procedure Checklist:
- ✅ Transactional: All or nothing. Partial failures are rolled back.
- ✅ Logged: Every execution recorded with status, duration, errors
- ✅ Error recovery: CONTINUE HANDLER catches errors
- ✅ Performance tracking: Rows processed, duration measured
- ✅ Scalable: Handles 50M+ rows overnight
- ✅ Scheduled: Can be run via EVENT or cron
- ✅ Alerts: Fraud detection integrated
- ✅ Debuggable: Logs show exactly what ran and when (see the monitoring queries below)
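To act on the "Logged" and "Debuggable" items, a couple of routine monitoring queries against JobExecutionLog catch failures and runtime drift early:
-- Failures in the last 7 days, most recent first
SELECT job_id, execution_date, status, error_message, duration_seconds
FROM JobExecutionLog
WHERE status = 'FAILURE'
AND execution_date >= DATE_SUB(CURDATE(), INTERVAL 7 DAY)
ORDER BY start_time DESC;
-- Average runtime per day to spot gradual slowdowns
SELECT execution_date, COUNT(*) AS runs, AVG(duration_seconds) AS avg_duration_seconds
FROM JobExecutionLog
GROUP BY execution_date
ORDER BY execution_date DESC
LIMIT 14;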
Ready to Build Production SQL?
These 8 questions represent 80% of what senior engineers do: handle errors gracefully, design robust procedures, and build scalable data pipelines.
Our complete guide includes:
- ✓ 220+ real FAANG interview questions
- ✓ Error handling patterns
- ✓ Production deployment strategies
- ✓ Performance optimization tips
- ✓ Complete working examples
Get Complete SQL Mastery Guide →
Production Best Practices: What FAANG Senior Engineers Know
Error Handling Best Practices
| Practice | Why It Matters | Impact |
|---|---|---|
| Validate before action | Catch errors early, before updating | 10x faster rollback |
| Log everything | Debug production issues in minutes | MTTR -70% |
| Return status codes | Application knows if it succeeded | No silent failures |
| Use transactions | All-or-nothing. Never partial updates | Data integrity guaranteed |
| Handle NULL carefully | NULL propagates. Use COALESCE() | Prevents silent bugs |
Stored Procedure Best Practices
- Keep it simple: Complex logic goes to application code. DB procedures are for transactions only.
- Use meaningful names: Procedures should be named for what they do: CalculateMonthlyBonus, not proc_calc
- Document parameters: Every IN/OUT parameter should have a comment
- Test edge cases: Empty tables, NULL values, negative numbers, huge datasets
- Monitor execution: Log runtime, rows affected, errors
- Version control: All procedures in Git with comments for each version
CTE Best Practices
- One CTE per logical step: Stage 1, Stage 2, etc. Not 20 CTEs in one query.
- Name descriptively: raw_events, deduplicated_events, user_metrics, not t1, t2, t3
- Test each stage: SELECT * FROM stage_2 to verify before moving to stage 3
- Recursive CTEs need limits: Always have WHERE clause to prevent infinite loops
- Don’t abuse CTEs: If it gets >10 CTEs, refactor into procedure + temp table
- Performance testing: Run on 1M+ rows to see actual performance
Debugging Checklist
When a query is slow or broken:
- Run EXPLAIN to see execution plan
- Count rows at each stage
- Check for NULLs in join columns
- Check for type mismatches
- Look for Cartesian products (joining without ON clause)
- Check indexes exist on join/where columns
- Test with sample data first (1000 rows, then 1M)
- Use LIMIT during debugging
- Comment out sections to isolate the problem
- Check data quality (nulls, duplicates, future dates)
Interview Tips: What FAANG Interviewers Look For
✅ Traits of Candidates Who PASS:
- “What happens if the query fails?” → Shows you think about error cases
- “How do I debug this?” → Shows you know diagnostic techniques
- “Let me check for NULLs first…” → Shows experience with real data
- “I’d add logging here…” → Shows production thinking
- “For 100M rows, this needs an index…” → Shows scalability awareness
- “Let me break this into stages…” → Shows structured thinking
- “I’d test the procedure with edge cases…” → Shows thoroughness
❌ Traits of Candidates Who FAIL:
- “I’ve never written a stored procedure” → Red flag at FAANG
- “Error handling is the application’s job” → Wrong! DB must protect data
- “This should work fine” (without testing) → No confidence
- “I don’t know what a CTE is” → Outdated knowledge
- “NULL handling isn’t important” → Has burned production before
- “Performance doesn’t matter” → Hasn’t worked at scale
- “I’d just add a try-catch” (without thought) → Cargo cult programming
Key Takeaways: Master Production SQL
- Error handling is mandatory: Every procedure needs TRY-CATCH and logging
- Debugging is a skill: Learn EXPLAIN, row counting, data quality checks
- Stored procedures are for transactions: Complex business logic in one atomic unit
- CTEs make pipelines clear: Each stage is independent and testable
- Data quality matters most: 80% of bugs are data issues, not logic
- Think about failure: “What breaks?” before “Will this work?”
- Log everything: Production support depends on logs to debug
- Test at scale: 1000 rows != 1M rows. Performance matters.
What’s Next?
You now understand the advanced SQL patterns that senior engineers use daily. To prepare for FAANG:
- Set up a local database: PostgreSQL or MySQL with 1M+ rows
- Write stored procedures: At least 5 complex ones with error handling
- Build CTEs for pipelines: 5+ multi-stage data pipelines
- Practice debugging: Deliberately break queries, then fix them
- Learn EXPLAIN: Read execution plans, understand index usage
- Handle errors: Every procedure needs TRY-CATCH
- Record yourself: Explain your solutions out loud
- Time yourself: Medium questions in 20-25 minutes
Ready to interview at FAANG?
The 8 questions here cover the core production patterns. FAANG asks variations of these in every technical round.
Get 220+ practice questions with solutions →
Want 2,200+ More Questions Like This?
The Data Monk has compiled the most comprehensive analytics interview question bank — SQL, Python, ML, Statistics & Case Studies. Built by 50+ experts from Amazon, Flipkart & OYO.