5 Most Important SQL questions for your Data Science Interview

SQL is the bread and butter of an analyst. You can’t survive in the Data Science industry with a grip on this ‘easy-looking’ query language. I have been interviewed for more than 30 companies in the past 3-4 years. SQL rounds are mostly a rapid fire round where you either keep on answering all the answers or start missing them after a threshold.

This is one of those rounds in which you can impress the interviewer. Recently, I have been taking interviews and I can assure you that most of the logics asked in the interviews are repeated, no upto some extent, but completely repeated 😛

Following is the point charter:-
5 correct – SQL God, You are going to nail 9/10 SQL interviews
4 correct – Really Good, clearing the SQL rounds should not be a problem
3 correct – Ummm..Dicey, you should be able to crack a few rounds
2 correct – You need at least 2-3 weeks before you start applying
1 correct – Padh lo Beta, sochna bhi nai apply krne ka
0 correct – Try an MBA/MS in Analytics/SBI PO/UPSC

Please comment your answer or send it directly to me over Linkedin

Q1. Suppose there is a Movie Theatre with 26 rows(A,B,C..Z) and in each row you have 6 seats. The structure of table is given below

Date	Row_No	Seat_No	Occupied	Name
04-Apr-20	A	A1	Yes	Kuchi Bhi
04-Apr-20	A	A2	Yes	Kuchi Bhi
04-Apr-20	A	A3	No
04-Apr-20	A	A4	No
04-Apr-20	A	A5	No
04-Apr-20	A	A6	No
04-Apr-20	B	B1	Yes	Kuchi Bhi
04-Apr-20	B	B2	No
04-Apr-20	B	B3	No
04-Apr-20	B	B4	Yes	Kuch Bhi
04-Apr-20	B	B5	No
04-Apr-20	B	B6	Yes	Kuchi Bhi

Table Name – PVR

First, let me know all the starting seats where the number of consecutive vacant seats is 2 (B2 here)
Secondly, write a generalised approach to solve for any number of vacant seats. Basically, you need to create a table with two columns,
1. Seat_No
2. Number of consecutive vacant seats

Q2. There are multiple ways to get the 3rd highest salary, write down at least three. This question is important because the moment you tell the interviewer the first way, he/she will ask to solve the same in any other way

Hint –
1. Naive Approach
2. Inner Query
3. Ranking

Q3. I don’t remember any interview which doesn’t have this question

Table A		Table B
1		1
1		1
1		1
1
1

There are two tables, column name in Table A is X and in Table B is Y

How many rows will the resultant have, if you do:-
a. inner join
b. left join
c. right join
d. outer join
e. cross join

Justify your answer

Q4. I own 5 restaurants in Bangalore, following is the table of business, get me the date on which each of these restaurant crossed a total revenue of Rs.10,000

Restaurant	Date	Revenue
Bangalore Mandrian	01/04/20	5000
Bangalore Mandrian	02/04/20	4000
Bangalore Mandrian	03/04/20	3000
Bangalore Mandrian	04/04/20	6000
Boondock Bistro	01/04/20	7000
Boondock Bistro	02/04/20	3000
Boondock Bistro	03/04/20	5000
Oliver’s	02/04/20	11000
Oliver’s	03/04/20	8000
Oliver’s	04/04/20	9000
Whitefiled Social	02/04/20	10000
Punjabi By Nature	03/04/20	8000

Q5. There is a table with employee and manager information, the schema is given below. Write a query to get the EmployeeName and ManagerName.

Hint – Consider the edge cases, i.e. your query should cater the Manager Name of the CEO as well

You can access – 300+ Data Science Interview Questions covering SQL,R,Python,Case Studies, Guesstimates, Statistics, and Machine Learning Questions in the embedded link.

Keep Learning 🙂

The Data Monk

The Data Monk services

We are well known for our interview books and have 70+ e-book across Amazon and The Data Monk e-shop page . Following are best-seller combo packs and services that we are providing as of now

YouTube channel covering all the interview-related important topics in SQL, Python, MS Excel, Machine Learning Algorithm, Statistics, and Direct Interview Questions
Link – The Data Monk Youtube Channel
Website – ~2000 completed solved Interview questions in SQL, Python, ML, and Case Study
Link – The Data Monk website
E-book shop – We have 70+ e-books available on our website and 3 bundles covering 2000+ solved interview questions. Do check it out
Link – The Data E-shop Page
Instagram Page – It covers only Most asked Questions and concepts (100+ posts). We have 100+ most asked interview topics explained in simple terms
Link – The Data Monk Instagram page
Mock Interviews/Career Guidance/Mentorship/Resume Making
Book a slot on Top Mate

The Data Monk e-books

We know that each domain requires a different type of preparation, so we have divided our books in the same way:

1. 2200 Interview Questions to become Full Stack Analytics Professional – 2200 Most Asked Interview Questions
2.Data Scientist and Machine Learning Engineer -> 23 e-books covering all the ML Algorithms Interview Questions
3. 30 Days Analytics Course – Most Asked Interview Questions from 30 crucial topics

You can check out all the other e-books on our e-shop page – Do not miss it

For any information related to courses or e-books, please send an email to [email protected]

Comments ( 18 )

Pradeep Jain
May 17, 2020 at 5:37 pm

I have made my attempt at these questions. I want to find the author’s approach to these queries, where can I find that?

TIA
- xtramous1 Grand Master
  May 17, 2020 at 5:45 pm
  
  Please post your answers in the comment section. We will evaluate 🙂
Payal Chuhan Member
June 22, 2020 at 7:10 pm

Q2. 3rd Highest Salary

— creating Employee temporary table
CREATE TABLE Employee (name varchar(10), salary int);

— inserting sample data into Employee table
INSERT INTO Employee VALUES (‘Rohit’, 3000);
INSERT INTO Employee VALUES (‘John’, 4000);
INSERT INTO Employee VALUES (‘Shaan’, 3000);
INSERT INTO Employee VALUES (‘Christopher’, 5000);
INSERT INTO Employee VALUES (‘Mike’, 7000);

Assuming N= 3

1. Inner Query Method. Generic solution for most DBs
–1.1. 3rd highest , use n-1 = 2 with condition Emp2.Salary > Emp1.Salary
SELECT *
FROM Employee Emp1
WHERE (2) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)

–1.2 3rd highest , use n = 3 with condition Emp2.Salary >= Emp1.Salary
SELECT *
FROM Employee Emp1
WHERE (2) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)

2.Using Ranking Function. We should use DENSE_RANK() instead of ROW_NUMBER() or RANK() because in case of repetitive salary dense rank will give correct results.

SELECT * FROM
( SELECT e.*, DENSE_RANK() OVER
(ORDER BY salary DESC) rn FROM Employee e )
WHERE rn = 3;

3. Naive Methods :

3.1 SELECT salary FROM Employee ORDER BY salary DESC LIMIT N-1, 1 — works for mysql
3.2 SELECT TOP 1 salary FROM ( SELECT DISTINCT TOP 3 salary FROM Employee ORDER BY salary DESC ) AS temp ORDER BY salary — works for sql server
- Amit
  July 12, 2020 at 2:59 pm
  
  We can do it by FIRST_VALUE() function too , if you are using PostgresSQL.
Payal Chuhan Member
June 22, 2020 at 7:19 pm

Hi Data Monk Team , Please clear that In question 3 table B has 3 records only with 1,1,1 or is there any null also to consider (1,1,1,null,null) in tableB ??
Payal Chuhan Member
June 23, 2020 at 8:18 am

Q3.
/*
— Tables creation
CREATE LOCAL TEMPORARY TABLE t1
(ID INT)
ON COMMIT PRESERVE ROWS;

CREATE LOCAL TEMPORARY TABLE t2
(ID INT)
ON COMMIT PRESERVE ROWS;

INSERT INTO t1
select 1
union all
select 1
union all
select 1
union all
select 1
union all
select 1

INSERT INTO t2
select 1
union all
select 1
union all
select 1 ;

*/

ANS : /* Below OUTPUT is same for all the joins. Reason is explained further.
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
*/

Total Number of Rows = 15 , because in inner join for every 1 in table t1 join with every 1 in table t2
5 ones in t1 , 3 ones in t2. total rows = 5*3 = 15

Total Number of Rows = 15 , because in left join all records of table t1 will join with matching records in table t2
all 5 ones in table t1 will match with 3 ones in t2. total rows = 5*3 = 15

Total Number of Rows = 15 , because in right join all records of table t2 will join with matching records in table t1
all 3 ones in table t2 will match with 5 ones in t1. total rows = 5*3 = 15

Total Number of Rows = 15 , because in outer join/full join all records of table t1 will join with all records in table t2
all 5 ones in table t1 will match with all 3 ones in t2. total rows = 5*3 = 15

Total Number of Rows = 15 , cross join is simple cartesian product of both number of records in table.
5 records in table t1 multiply with 3 records in table t2. total rows = 5*3 = 15
- TheDataMonk Grand Master
  June 23, 2020 at 9:09 am
  
  This is superb explanation 🙂
Payal Chuhan Member
June 23, 2020 at 10:19 am

Hi Data Monk Team , Please delete previous post as it was incomplete and get posted automatically while i was writing. This is the correct answer.
Q4.
Answer :

Step1 : Temp Table creation and Data Loading
/*
–Syntax of local temp table creation in vertica
CREATE LOCAL TEMPORARY TABLE Cafe_Accounts
(Resturant VARCHAR(100),
Date VARCHAR(100),
Revenue BIGINT
)
ON COMMIT PRESERVE ROWS;

INSERT INTO Cafe_Accounts
SELECT ‘Bangalore Mandrian’, ’01/04/20′, 5000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’02/04/20′, 4000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’03/04/20′, 3000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’04/04/20′, 6000 UNION ALL
SELECT ‘Boondock Bistro’, ’01/04/20′, 7000 UNION ALL
SELECT ‘Boondock Bistro’, ’02/04/20′, 3000 UNION ALL
SELECT ‘Boondock Bistro’, ’03/04/20′, 5000 UNION ALL
SELECT ‘Oliver’s’, ’02/04/20′, 11000 UNION ALL
SELECT ‘Oliver’s’, ’03/04/20′, 8000 UNION ALL
SELECT ‘Oliver’s’, ’04/04/20′, 9000 UNION ALL
SELECT ‘Whitefiled Social’, ’02/04/20′, 10000 UNION ALL
SELECT ‘Punjabi By Nature’, ’03/04/20′, 8000
*/

Step2 : Query Writing :
SELECT Resturant , min(Date) as DATE , min(Running_Sum) AS Running_Sum
FROM
(
SELECT a.Resturant , a.Date , SUM(b.Revenue) as Running_Sum
FROM Cafe_Accounts a
INNER JOIN Cafe_Accounts b
ON b.date = 10000
GROUP by a.Resturant
order by a.Resturant;

Step3: Output we got
Resturant DATE Running_Sum
Bangalore Mandrian 03/04/20 12000
Boondock Bistro 02/04/20 10000
Oliver’s 02/04/20 11000
Whitefiled Social 02/04/20 10000
Payal Chuhan Member
June 23, 2020 at 12:53 pm

TheDataMonk Monk Team Why my answer of Q4. is deleted 🙁 ????
It was correct answer with the desired output attached as well?
Payal Chuhan Member
June 23, 2020 at 12:57 pm

Q4.
Answer : Reposting the solution
Step1 : Temp Table creation and Data Loading
/*
–Syntax of local temp table creation in vertica
CREATE LOCAL TEMPORARY TABLE Cafe_Accounts
(Resturant VARCHAR(100),
Date VARCHAR(100),
Revenue BIGINT
)
ON COMMIT PRESERVE ROWS;

INSERT INTO Cafe_Accounts
SELECT ‘Bangalore Mandrian’, ’01/04/20′, 5000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’02/04/20′, 4000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’03/04/20′, 3000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’04/04/20′, 6000 UNION ALL
SELECT ‘Boondock Bistro’, ’01/04/20′, 7000 UNION ALL
SELECT ‘Boondock Bistro’, ’02/04/20′, 3000 UNION ALL
SELECT ‘Boondock Bistro’, ’03/04/20′, 5000 UNION ALL
SELECT ‘Oliver’s’, ’02/04/20′, 11000 UNION ALL
SELECT ‘Oliver’s’, ’03/04/20′, 8000 UNION ALL
SELECT ‘Oliver’s’, ’04/04/20′, 9000 UNION ALL
SELECT ‘Whitefiled Social’, ’02/04/20′, 10000 UNION ALL
SELECT ‘Punjabi By Nature’, ’03/04/20′, 8000
*/

Step2 : Query Writing :
SELECT Resturant , min(Date) , min(Running_Sum)
FROM
(
SELECT a.Resturant , a.Date , SUM(b.Revenue) as Running_Sum
FROM Cafe_Accounts a
INNER JOIN Cafe_Accounts b
ON b.date = 10000
GROUP by a.Resturant
order by a.Resturant;

Step3: Output we got
Resturant DATE Running_Sum
Bangalore Mandrian 03/04/20 12000
Boondock Bistro 02/04/20 10000
Oliver’s 02/04/20 11000
Whitefiled Social 02/04/20 10000
Aayush
July 8, 2020 at 5:12 pm

Answer 4:
Approach 1:
select max(day) from
(select restaurant, min(day) as day, total_revenue
from
(select *, sum(revenue)
over (partition by restaurant order by day asc) AS total_revenue
from business) AS table_a
where
total_revenue > 10000
group by
restaurant) AS table_b
;

Approach 2:
SELECT MAX(DATE)
FROM
(SELECT DATE
FROM
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY RESTAURANT ORDER BY CUMULATIVE_SUM ASC,DATE ASC) AS ROW
FROM
(Select *,
SUM(REVENUE) OVER (PARTITION BY RESTAURANT ORDER BY DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUMULATIVE_SUM
FROM table) AS INTER_1
WHERE CUMULATIVE_SUM>10000) AS INTER_2
WHERE ROW=1) AS INTER_3
Himanshu Sharma Member
July 12, 2020 at 9:24 pm

Answer 1:

SELECT seat_no, vacant_seats
FROM
( SELECT date
,row_no
,seat_no
,occupied
,name
,ROW_NUMBER() OVER (PARTITION BY date, row_no ORDER BY seat_no ASC) AS rn
,ROW_NUMBER() OVER (PARTITION BY date, row_no ORDER BY seat_no DESC) AS vacant_seats
FROM theatre
) AS a
WHERE occupied = ‘No’ AND rn = 1 AND vacant_seats = 2

— The query is generalized for any number of vacant seats.
Himanshu Sharma Member
July 12, 2020 at 9:40 pm

*** Forgot to add “occupied” column in the PARTITION BY clause.
Vinod K
July 13, 2020 at 7:15 am

A4:
SELECT resturant,
MIN(date_1)
FROM
(SELECT resturant,
date_1,
revenue,
SUM(revenue)over (partition BY resturant order by date_1 range BETWEEN unbounded preceding AND CURRENT row) AS running_sum
FROM Cafe_Accounts
)
WHERE running_sum >=’10000′
GROUP BY resturant;

output :
Bangalore Mandrian 03/04/20
Boondock Bistro 02/04/20
Oliver’s 02/04/20
Whitefiled Social 02/04/20
Himanshu Sharma Member
July 15, 2020 at 7:31 pm

Answer 1:
— I see that there is an issue in my previous answer to this question.
Here is another attempt. (I know its a bit long, would be glad to hear about a shorter and smarter way about doing this.)

SELECT
date
,seat_no
,vacant_seats
FROM
(
SELECT
t.*
,(COALESCE( LEAD( rn ) OVER (PARTITION BY date, row_no ORDER BY rn) , 7) – rn) AS vacant_seats
FROM
(
SELECT
date
,row_no
,seat_no
,occupied
,name
,RANK() OVER (PARTITION BY date, row_no ORDER BY seat_no) AS rn
,LAG(occupied) OVER (PARTITION BY date, row_no ORDER BY seat_no) AS lag_occupied
FROM movie_theater
) AS t
WHERE occupied != lag_occupied
) AS t2
WHERE occupied = ‘No’
- Ananya
  June 16, 2021 at 6:54 am
  
  The previous answer works well
Harshit Goyal Contributor
July 31, 2020 at 11:27 am

Answer 4:

SELECT Restaurant, MIN(Date) Date, MIN(Running_Sum) TotalRevenue FROM (
SELECT Restaurant, Date, Running_Sum FROM
(SELECT Restaurant , Date, SUM(Revenue) OVER(PARTITION BY Restaurant ORDER BY Date) As Running_Sum
FROM Cafe) As t1
WHERE Running_Sum >= 10000) As t2
GROUP BY Restaurant
Ari_Klaus Newbie
August 11, 2020 at 8:10 am

select seat_no,vaccant_seat from (
select *,count(csum) over (partition by csum order by csum) -1 vaccant_seat from
(select *,sum (flag) over(order by seat_no ) csum from (select * ,case when occupied = ‘yes’ then 1
when occupied = ‘no’ then 0 end Flag ))) where occupied=’Yes’ and vaccant_seat >1;

Register Now

Login

Lost Password