5 Most Important SQL questions before you appear for your Data Science Interview

SQL is the bread and butter of an analyst. You can’t survive in the Data Science industry with a grip on this ‘easy-looking’ query language. I have been interviewed for more than 30 companies in the past 3-4 years. SQL rounds are mostly a rapid fire round where you either keep on answering all the answers or start missing them after a threshold.

This is one of those rounds in which you can impress the interviewer. Recently, I have been taking interviews and I can assure you that most of the logics asked in the interviews are repeated, no upto some extent, but completely repeated 😛

Following is the point charter:-
5 correct – SQL God, You are going to nail 9/10 SQL interviews
4 correct – Really Good, clearing the SQL rounds should not be a problem
3 correct – Ummm..Dicey, you should be able to crack a few rounds
2 correct – You need at least 2-3 weeks before you start applying
1 correct – Padh lo Beta, sochna bhi nai apply krne ka
0 correct – Try an MBA/MS in Analytics/SBI PO/UPSC

Please comment your answer or send it directly to me over Linkedin

Q1. Suppose there is a Movie Theatre with 26 rows(A,B,C..Z) and in each row you have 6 seats. The structure of table is given below

DateRow_NoSeat_NoOccupiedName
04-Apr-20AA1YesKuchi Bhi
04-Apr-20AA2YesKuchi Bhi
04-Apr-20AA3No
04-Apr-20AA4No
04-Apr-20AA5No
04-Apr-20AA6No
04-Apr-20BB1YesKuchi Bhi
04-Apr-20BB2No
04-Apr-20BB3No
04-Apr-20BB4YesKuch Bhi
04-Apr-20BB5No
04-Apr-20BB6YesKuchi Bhi
Table Name – PVR

First, let me know all the starting seats where the number of consecutive vacant seats is 2 (B2 here)
Secondly, write a generalised approach to solve for any number of vacant seats. Basically, you need to create a table with two columns,
1. Seat_No
2. Number of consecutive vacant seats

Q2. There are multiple ways to get the 3rd highest salary, write down at least three. This question is important because the moment you tell the interviewer the first way, he/she will ask to solve the same in any other way

Hint –
1. Naive Approach
2. Inner Query
3. Ranking


Q3. I don’t remember any interview which doesn’t have this question

Table ATable B
11
11
11
1
1

There are two tables, column name in Table A is X and in Table B is Y

How many rows will the resultant have, if you do:-
a. inner join
b. left join
c. right join
d. outer join
e. cross join

Justify your answer

Q4. I own 5 restaurants in Bangalore, following is the table of business, get me the date on which each of these restaurant crossed a total revenue of Rs.10,000

RestaurantDateRevenue
Bangalore Mandrian01/04/205000
Bangalore Mandrian02/04/204000
Bangalore Mandrian03/04/203000
Bangalore Mandrian04/04/206000
Boondock Bistro01/04/207000
Boondock Bistro02/04/203000
Boondock Bistro03/04/205000
Oliver’s02/04/2011000
Oliver’s03/04/208000
Oliver’s04/04/209000
Whitefiled Social02/04/2010000
Punjabi By Nature03/04/208000

Q5. There is a table with employee and manager information, the schema is given below. Write a query to get the EmployeeName and ManagerName.

Hint – Consider the edge cases, i.e. your query should cater the Manager Name of the CEO as well

You can access – 300+ Data Science Interview Questions covering SQL,R,Python,Case Studies, Guesstimates, Statistics, and Machine Learning Questions in the embedded link.

Keep Learning 🙂

The Data Monk

Author: TheDataMonk

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience 3+ years at Mu Sigma 2 years at OYO 1 year and counting at The Data Monk I am an active trader and a logically sarcastic idiot :)

18 thoughts on “5 Most Important SQL questions before you appear for your Data Science Interview”

  1. I have made my attempt at these questions. I want to find the author’s approach to these queries, where can I find that?

    TIA

  2. Q2. 3rd Highest Salary

    — creating Employee temporary table
    CREATE TABLE Employee (name varchar(10), salary int);

    — inserting sample data into Employee table
    INSERT INTO Employee VALUES (‘Rohit’, 3000);
    INSERT INTO Employee VALUES (‘John’, 4000);
    INSERT INTO Employee VALUES (‘Shaan’, 3000);
    INSERT INTO Employee VALUES (‘Christopher’, 5000);
    INSERT INTO Employee VALUES (‘Mike’, 7000);

    Assuming N= 3

    1. Inner Query Method. Generic solution for most DBs
    –1.1. 3rd highest , use n-1 = 2 with condition Emp2.Salary > Emp1.Salary
    SELECT *
    FROM Employee Emp1
    WHERE (2) = (
    SELECT COUNT(DISTINCT(Emp2.Salary))
    FROM Employee Emp2
    WHERE Emp2.Salary > Emp1.Salary)

    –1.2 3rd highest , use n = 3 with condition Emp2.Salary >= Emp1.Salary
    SELECT *
    FROM Employee Emp1
    WHERE (2) = (
    SELECT COUNT(DISTINCT(Emp2.Salary))
    FROM Employee Emp2
    WHERE Emp2.Salary > Emp1.Salary)

    2.Using Ranking Function. We should use DENSE_RANK() instead of ROW_NUMBER() or RANK() because in case of repetitive salary dense rank will give correct results.

    SELECT * FROM
    ( SELECT e.*, DENSE_RANK() OVER
    (ORDER BY salary DESC) rn FROM Employee e )
    WHERE rn = 3;

    3. Naive Methods :

    3.1 SELECT salary FROM Employee ORDER BY salary DESC LIMIT N-1, 1 — works for mysql
    3.2 SELECT TOP 1 salary FROM ( SELECT DISTINCT TOP 3 salary FROM Employee ORDER BY salary DESC ) AS temp ORDER BY salary — works for sql server

  3. Hi Data Monk Team , Please clear that In question 3 table B has 3 records only with 1,1,1 or is there any null also to consider (1,1,1,null,null) in tableB ??

  4. Q3.
    /*
    — Tables creation
    CREATE LOCAL TEMPORARY TABLE t1
    (ID INT)
    ON COMMIT PRESERVE ROWS;

    CREATE LOCAL TEMPORARY TABLE t2
    (ID INT)
    ON COMMIT PRESERVE ROWS;

    INSERT INTO t1
    select 1
    union all
    select 1
    union all
    select 1
    union all
    select 1
    union all
    select 1

    INSERT INTO t2
    select 1
    union all
    select 1
    union all
    select 1 ;

    */

    ANS : /* Below OUTPUT is same for all the joins. Reason is explained further.
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    */

    Total Number of Rows = 15 , because in inner join for every 1 in table t1 join with every 1 in table t2
    5 ones in t1 , 3 ones in t2. total rows = 5*3 = 15

    Total Number of Rows = 15 , because in left join all records of table t1 will join with matching records in table t2
    all 5 ones in table t1 will match with 3 ones in t2. total rows = 5*3 = 15

    Total Number of Rows = 15 , because in right join all records of table t2 will join with matching records in table t1
    all 3 ones in table t2 will match with 5 ones in t1. total rows = 5*3 = 15

    Total Number of Rows = 15 , because in outer join/full join all records of table t1 will join with all records in table t2
    all 5 ones in table t1 will match with all 3 ones in t2. total rows = 5*3 = 15

    Total Number of Rows = 15 , cross join is simple cartesian product of both number of records in table.
    5 records in table t1 multiply with 3 records in table t2. total rows = 5*3 = 15

  5. Hi Data Monk Team , Please delete previous post as it was incomplete and get posted automatically while i was writing. This is the correct answer.
    Q4.
    Answer :

    Step1 : Temp Table creation and Data Loading
    /*
    –Syntax of local temp table creation in vertica
    CREATE LOCAL TEMPORARY TABLE Cafe_Accounts
    (Resturant VARCHAR(100),
    Date VARCHAR(100),
    Revenue BIGINT
    )
    ON COMMIT PRESERVE ROWS;

    INSERT INTO Cafe_Accounts
    SELECT ‘Bangalore Mandrian’, ’01/04/20′, 5000 UNION ALL
    SELECT ‘Bangalore Mandrian’, ’02/04/20′, 4000 UNION ALL
    SELECT ‘Bangalore Mandrian’, ’03/04/20′, 3000 UNION ALL
    SELECT ‘Bangalore Mandrian’, ’04/04/20′, 6000 UNION ALL
    SELECT ‘Boondock Bistro’, ’01/04/20′, 7000 UNION ALL
    SELECT ‘Boondock Bistro’, ’02/04/20′, 3000 UNION ALL
    SELECT ‘Boondock Bistro’, ’03/04/20′, 5000 UNION ALL
    SELECT ‘Oliver’s’, ’02/04/20′, 11000 UNION ALL
    SELECT ‘Oliver’s’, ’03/04/20′, 8000 UNION ALL
    SELECT ‘Oliver’s’, ’04/04/20′, 9000 UNION ALL
    SELECT ‘Whitefiled Social’, ’02/04/20′, 10000 UNION ALL
    SELECT ‘Punjabi By Nature’, ’03/04/20′, 8000
    */

    Step2 : Query Writing :
    SELECT Resturant , min(Date) as DATE , min(Running_Sum) AS Running_Sum
    FROM
    (
    SELECT a.Resturant , a.Date , SUM(b.Revenue) as Running_Sum
    FROM Cafe_Accounts a
    INNER JOIN Cafe_Accounts b
    ON b.date = 10000
    GROUP by a.Resturant
    order by a.Resturant;

    Step3: Output we got
    Resturant DATE Running_Sum
    Bangalore Mandrian 03/04/20 12000
    Boondock Bistro 02/04/20 10000
    Oliver’s 02/04/20 11000
    Whitefiled Social 02/04/20 10000

  6. @Data Monk Team Why my answer of Q4. is deleted 🙁 ????
    It was correct answer with the desired output attached as well?

  7. Q4.
    Answer : Reposting the solution
    Step1 : Temp Table creation and Data Loading
    /*
    –Syntax of local temp table creation in vertica
    CREATE LOCAL TEMPORARY TABLE Cafe_Accounts
    (Resturant VARCHAR(100),
    Date VARCHAR(100),
    Revenue BIGINT
    )
    ON COMMIT PRESERVE ROWS;

    INSERT INTO Cafe_Accounts
    SELECT ‘Bangalore Mandrian’, ’01/04/20′, 5000 UNION ALL
    SELECT ‘Bangalore Mandrian’, ’02/04/20′, 4000 UNION ALL
    SELECT ‘Bangalore Mandrian’, ’03/04/20′, 3000 UNION ALL
    SELECT ‘Bangalore Mandrian’, ’04/04/20′, 6000 UNION ALL
    SELECT ‘Boondock Bistro’, ’01/04/20′, 7000 UNION ALL
    SELECT ‘Boondock Bistro’, ’02/04/20′, 3000 UNION ALL
    SELECT ‘Boondock Bistro’, ’03/04/20′, 5000 UNION ALL
    SELECT ‘Oliver’s’, ’02/04/20′, 11000 UNION ALL
    SELECT ‘Oliver’s’, ’03/04/20′, 8000 UNION ALL
    SELECT ‘Oliver’s’, ’04/04/20′, 9000 UNION ALL
    SELECT ‘Whitefiled Social’, ’02/04/20′, 10000 UNION ALL
    SELECT ‘Punjabi By Nature’, ’03/04/20′, 8000
    */

    Step2 : Query Writing :
    SELECT Resturant , min(Date) , min(Running_Sum)
    FROM
    (
    SELECT a.Resturant , a.Date , SUM(b.Revenue) as Running_Sum
    FROM Cafe_Accounts a
    INNER JOIN Cafe_Accounts b
    ON b.date = 10000
    GROUP by a.Resturant
    order by a.Resturant;

    Step3: Output we got
    Resturant DATE Running_Sum
    Bangalore Mandrian 03/04/20 12000
    Boondock Bistro 02/04/20 10000
    Oliver’s 02/04/20 11000
    Whitefiled Social 02/04/20 10000

  8. Answer 4:
    Approach 1:
    select max(day) from
    (select restaurant, min(day) as day, total_revenue
    from
    (select *, sum(revenue)
    over (partition by restaurant order by day asc) AS total_revenue
    from business) AS table_a
    where
    total_revenue > 10000
    group by
    restaurant) AS table_b
    ;

    Approach 2:
    SELECT MAX(DATE)
    FROM
    (SELECT DATE
    FROM
    (SELECT *,
    ROW_NUMBER() OVER (PARTITION BY RESTAURANT ORDER BY CUMULATIVE_SUM ASC,DATE ASC) AS ROW
    FROM
    (Select *,
    SUM(REVENUE) OVER (PARTITION BY RESTAURANT ORDER BY DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUMULATIVE_SUM
    FROM table) AS INTER_1
    WHERE CUMULATIVE_SUM>10000) AS INTER_2
    WHERE ROW=1) AS INTER_3

  9. Answer 1:

    SELECT seat_no, vacant_seats
    FROM
    ( SELECT date
    ,row_no
    ,seat_no
    ,occupied
    ,name
    ,ROW_NUMBER() OVER (PARTITION BY date, row_no ORDER BY seat_no ASC) AS rn
    ,ROW_NUMBER() OVER (PARTITION BY date, row_no ORDER BY seat_no DESC) AS vacant_seats
    FROM theatre
    ) AS a
    WHERE occupied = ‘No’ AND rn = 1 AND vacant_seats = 2

    — The query is generalized for any number of vacant seats.

  10. A4:
    SELECT resturant,
    MIN(date_1)
    FROM
    (SELECT resturant,
    date_1,
    revenue,
    SUM(revenue)over (partition BY resturant order by date_1 range BETWEEN unbounded preceding AND CURRENT row) AS running_sum
    FROM Cafe_Accounts
    )
    WHERE running_sum >=’10000′
    GROUP BY resturant;

    output :
    Bangalore Mandrian 03/04/20
    Boondock Bistro 02/04/20
    Oliver’s 02/04/20
    Whitefiled Social 02/04/20

  11. Answer 1:
    — I see that there is an issue in my previous answer to this question.
    Here is another attempt. (I know its a bit long, would be glad to hear about a shorter and smarter way about doing this.)

    SELECT
    date
    ,seat_no
    ,vacant_seats
    FROM
    (
    SELECT
    t.*
    ,(COALESCE( LEAD( rn ) OVER (PARTITION BY date, row_no ORDER BY rn) , 7) – rn) AS vacant_seats
    FROM
    (
    SELECT
    date
    ,row_no
    ,seat_no
    ,occupied
    ,name
    ,RANK() OVER (PARTITION BY date, row_no ORDER BY seat_no) AS rn
    ,LAG(occupied) OVER (PARTITION BY date, row_no ORDER BY seat_no) AS lag_occupied
    FROM movie_theater
    ) AS t
    WHERE occupied != lag_occupied
    ) AS t2
    WHERE occupied = ‘No’

  12. Answer 4:

    SELECT Restaurant, MIN(Date) Date, MIN(Running_Sum) TotalRevenue FROM (
    SELECT Restaurant, Date, Running_Sum FROM
    (SELECT Restaurant , Date, SUM(Revenue) OVER(PARTITION BY Restaurant ORDER BY Date) As Running_Sum
    FROM Cafe) As t1
    WHERE Running_Sum >= 10000) As t2
    GROUP BY Restaurant

  13. select seat_no,vaccant_seat from (
    select *,count(csum) over (partition by csum order by csum) -1 vaccant_seat from
    (select *,sum (flag) over(order by seat_no ) csum from (select * ,case when occupied = ‘yes’ then 1
    when occupied = ‘no’ then 0 end Flag ))) where occupied=’Yes’ and vaccant_seat >1;

Comments are closed.