SQL is the bread and butter of an analyst. You can’t survive in the Data Science industry with a grip on this ‘easy-looking’ query language. I have been interviewed for more than 30 companies in the past 3-4 years. SQL rounds are mostly a rapid fire round where you either keep on answering all the answers or start missing them after a threshold.
This is one of those rounds in which you can impress the interviewer. Recently, I have been taking interviews and I can assure you that most of the logics asked in the interviews are repeated, no upto some extent, but completely repeated 😛
Following is the point charter:-
5 correct – SQL God, You are going to nail 9/10 SQL interviews
4 correct – Really Good, clearing the SQL rounds should not be a problem
3 correct – Ummm..Dicey, you should be able to crack a few rounds
2 correct – You need at least 2-3 weeks before you start applying
1 correct – Padh lo Beta, sochna bhi nai apply krne ka
0 correct – Try an MBA/MS in Analytics/SBI PO/UPSC
Please comment your answer or send it directly to me over Linkedin
Q1. Suppose there is a Movie Theatre with 26 rows(A,B,C..Z) and in each row you have 6 seats. The structure of table is given below
Date | Row_No | Seat_No | Occupied | Name |
04-Apr-20 | A | A1 | Yes | Kuchi Bhi |
04-Apr-20 | A | A2 | Yes | Kuchi Bhi |
04-Apr-20 | A | A3 | No | |
04-Apr-20 | A | A4 | No | |
04-Apr-20 | A | A5 | No | |
04-Apr-20 | A | A6 | No | |
04-Apr-20 | B | B1 | Yes | Kuchi Bhi |
04-Apr-20 | B | B2 | No | |
04-Apr-20 | B | B3 | No | |
04-Apr-20 | B | B4 | Yes | Kuch Bhi |
04-Apr-20 | B | B5 | No | |
04-Apr-20 | B | B6 | Yes | Kuchi Bhi |
First, let me know all the starting seats where the number of consecutive vacant seats is 2 (B2 here)
Secondly, write a generalised approach to solve for any number of vacant seats. Basically, you need to create a table with two columns,
1. Seat_No
2. Number of consecutive vacant seats
Q2. There are multiple ways to get the 3rd highest salary, write down at least three. This question is important because the moment you tell the interviewer the first way, he/she will ask to solve the same in any other way
Hint –
1. Naive Approach
2. Inner Query
3. Ranking
Q3. I don’t remember any interview which doesn’t have this question
Table A | Table B | |
1 | 1 | |
1 | 1 | |
1 | 1 | |
1 | ||
1 |
There are two tables, column name in Table A is X and in Table B is Y
How many rows will the resultant have, if you do:-
a. inner join
b. left join
c. right join
d. outer join
e. cross join
Justify your answer
Q4. I own 5 restaurants in Bangalore, following is the table of business, get me the date on which each of these restaurant crossed a total revenue of Rs.10,000
Restaurant | Date | Revenue |
Bangalore Mandrian | 01/04/20 | 5000 |
Bangalore Mandrian | 02/04/20 | 4000 |
Bangalore Mandrian | 03/04/20 | 3000 |
Bangalore Mandrian | 04/04/20 | 6000 |
Boondock Bistro | 01/04/20 | 7000 |
Boondock Bistro | 02/04/20 | 3000 |
Boondock Bistro | 03/04/20 | 5000 |
Oliver’s | 02/04/20 | 11000 |
Oliver’s | 03/04/20 | 8000 |
Oliver’s | 04/04/20 | 9000 |
Whitefiled Social | 02/04/20 | 10000 |
Punjabi By Nature | 03/04/20 | 8000 |
Q5. There is a table with employee and manager information, the schema is given below. Write a query to get the EmployeeName and ManagerName.
Hint – Consider the edge cases, i.e. your query should cater the Manager Name of the CEO as well
You can access – 300+ Data Science Interview Questions covering SQL,R,Python,Case Studies, Guesstimates, Statistics, and Machine Learning Questions in the embedded link.
Keep Learning 🙂
The Data Monk
The Data Monk services
We are well known for our interview books and have 70+ e-book across Amazon and The Data Monk e-shop page . Following are best-seller combo packs and services that we are providing as of now
- YouTube channel covering all the interview-related important topics in SQL, Python, MS Excel, Machine Learning Algorithm, Statistics, and Direct Interview Questions
Link – The Data Monk Youtube Channel - Website – ~2000 completed solved Interview questions in SQL, Python, ML, and Case Study
Link – The Data Monk website - E-book shop – We have 70+ e-books available on our website and 3 bundles covering 2000+ solved interview questions. Do check it out
Link – The Data E-shop Page - Instagram Page – It covers only Most asked Questions and concepts (100+ posts). We have 100+ most asked interview topics explained in simple terms
Link – The Data Monk Instagram page - Mock Interviews/Career Guidance/Mentorship/Resume Making
Book a slot on Top Mate
The Data Monk e-books
We know that each domain requires a different type of preparation, so we have divided our books in the same way:
1. 2200 Interview Questions to become Full Stack Analytics Professional – 2200 Most Asked Interview Questions
2.Data Scientist and Machine Learning Engineer -> 23 e-books covering all the ML Algorithms Interview Questions
3. 30 Days Analytics Course – Most Asked Interview Questions from 30 crucial topics
You can check out all the other e-books on our e-shop page – Do not miss it
For any information related to courses or e-books, please send an email to [email protected]
Comments ( 18 )
I have made my attempt at these questions. I want to find the author’s approach to these queries, where can I find that?
TIA
Please post your answers in the comment section. We will evaluate 🙂
Q2. 3rd Highest Salary
— creating Employee temporary table
CREATE TABLE Employee (name varchar(10), salary int);
— inserting sample data into Employee table
INSERT INTO Employee VALUES (‘Rohit’, 3000);
INSERT INTO Employee VALUES (‘John’, 4000);
INSERT INTO Employee VALUES (‘Shaan’, 3000);
INSERT INTO Employee VALUES (‘Christopher’, 5000);
INSERT INTO Employee VALUES (‘Mike’, 7000);
Assuming N= 3
1. Inner Query Method. Generic solution for most DBs
–1.1. 3rd highest , use n-1 = 2 with condition Emp2.Salary > Emp1.Salary
SELECT *
FROM Employee Emp1
WHERE (2) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)
–1.2 3rd highest , use n = 3 with condition Emp2.Salary >= Emp1.Salary
SELECT *
FROM Employee Emp1
WHERE (2) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)
2.Using Ranking Function. We should use DENSE_RANK() instead of ROW_NUMBER() or RANK() because in case of repetitive salary dense rank will give correct results.
SELECT * FROM
( SELECT e.*, DENSE_RANK() OVER
(ORDER BY salary DESC) rn FROM Employee e )
WHERE rn = 3;
3. Naive Methods :
3.1 SELECT salary FROM Employee ORDER BY salary DESC LIMIT N-1, 1 — works for mysql
3.2 SELECT TOP 1 salary FROM ( SELECT DISTINCT TOP 3 salary FROM Employee ORDER BY salary DESC ) AS temp ORDER BY salary — works for sql server
We can do it by FIRST_VALUE() function too , if you are using PostgresSQL.
Hi Data Monk Team , Please clear that In question 3 table B has 3 records only with 1,1,1 or is there any null also to consider (1,1,1,null,null) in tableB ??
Q3.
/*
— Tables creation
CREATE LOCAL TEMPORARY TABLE t1
(ID INT)
ON COMMIT PRESERVE ROWS;
CREATE LOCAL TEMPORARY TABLE t2
(ID INT)
ON COMMIT PRESERVE ROWS;
INSERT INTO t1
select 1
union all
select 1
union all
select 1
union all
select 1
union all
select 1
INSERT INTO t2
select 1
union all
select 1
union all
select 1 ;
*/
ANS : /* Below OUTPUT is same for all the joins. Reason is explained further.
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
*/
Total Number of Rows = 15 , because in inner join for every 1 in table t1 join with every 1 in table t2
5 ones in t1 , 3 ones in t2. total rows = 5*3 = 15
Total Number of Rows = 15 , because in left join all records of table t1 will join with matching records in table t2
all 5 ones in table t1 will match with 3 ones in t2. total rows = 5*3 = 15
Total Number of Rows = 15 , because in right join all records of table t2 will join with matching records in table t1
all 3 ones in table t2 will match with 5 ones in t1. total rows = 5*3 = 15
Total Number of Rows = 15 , because in outer join/full join all records of table t1 will join with all records in table t2
all 5 ones in table t1 will match with all 3 ones in t2. total rows = 5*3 = 15
Total Number of Rows = 15 , cross join is simple cartesian product of both number of records in table.
5 records in table t1 multiply with 3 records in table t2. total rows = 5*3 = 15
This is superb explanation 🙂
Hi Data Monk Team , Please delete previous post as it was incomplete and get posted automatically while i was writing. This is the correct answer.
Q4.
Answer :
Step1 : Temp Table creation and Data Loading
/*
–Syntax of local temp table creation in vertica
CREATE LOCAL TEMPORARY TABLE Cafe_Accounts
(Resturant VARCHAR(100),
Date VARCHAR(100),
Revenue BIGINT
)
ON COMMIT PRESERVE ROWS;
INSERT INTO Cafe_Accounts
SELECT ‘Bangalore Mandrian’, ’01/04/20′, 5000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’02/04/20′, 4000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’03/04/20′, 3000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’04/04/20′, 6000 UNION ALL
SELECT ‘Boondock Bistro’, ’01/04/20′, 7000 UNION ALL
SELECT ‘Boondock Bistro’, ’02/04/20′, 3000 UNION ALL
SELECT ‘Boondock Bistro’, ’03/04/20′, 5000 UNION ALL
SELECT ‘Oliver’s’, ’02/04/20′, 11000 UNION ALL
SELECT ‘Oliver’s’, ’03/04/20′, 8000 UNION ALL
SELECT ‘Oliver’s’, ’04/04/20′, 9000 UNION ALL
SELECT ‘Whitefiled Social’, ’02/04/20′, 10000 UNION ALL
SELECT ‘Punjabi By Nature’, ’03/04/20′, 8000
*/
Step2 : Query Writing :
SELECT Resturant , min(Date) as DATE , min(Running_Sum) AS Running_Sum
FROM
(
SELECT a.Resturant , a.Date , SUM(b.Revenue) as Running_Sum
FROM Cafe_Accounts a
INNER JOIN Cafe_Accounts b
ON b.date = 10000
GROUP by a.Resturant
order by a.Resturant;
Step3: Output we got
Resturant DATE Running_Sum
Bangalore Mandrian 03/04/20 12000
Boondock Bistro 02/04/20 10000
Oliver’s 02/04/20 11000
Whitefiled Social 02/04/20 10000
TheDataMonk Monk Team Why my answer of Q4. is deleted 🙁 ????
It was correct answer with the desired output attached as well?
Q4.
Answer : Reposting the solution
Step1 : Temp Table creation and Data Loading
/*
–Syntax of local temp table creation in vertica
CREATE LOCAL TEMPORARY TABLE Cafe_Accounts
(Resturant VARCHAR(100),
Date VARCHAR(100),
Revenue BIGINT
)
ON COMMIT PRESERVE ROWS;
INSERT INTO Cafe_Accounts
SELECT ‘Bangalore Mandrian’, ’01/04/20′, 5000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’02/04/20′, 4000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’03/04/20′, 3000 UNION ALL
SELECT ‘Bangalore Mandrian’, ’04/04/20′, 6000 UNION ALL
SELECT ‘Boondock Bistro’, ’01/04/20′, 7000 UNION ALL
SELECT ‘Boondock Bistro’, ’02/04/20′, 3000 UNION ALL
SELECT ‘Boondock Bistro’, ’03/04/20′, 5000 UNION ALL
SELECT ‘Oliver’s’, ’02/04/20′, 11000 UNION ALL
SELECT ‘Oliver’s’, ’03/04/20′, 8000 UNION ALL
SELECT ‘Oliver’s’, ’04/04/20′, 9000 UNION ALL
SELECT ‘Whitefiled Social’, ’02/04/20′, 10000 UNION ALL
SELECT ‘Punjabi By Nature’, ’03/04/20′, 8000
*/
Step2 : Query Writing :
SELECT Resturant , min(Date) , min(Running_Sum)
FROM
(
SELECT a.Resturant , a.Date , SUM(b.Revenue) as Running_Sum
FROM Cafe_Accounts a
INNER JOIN Cafe_Accounts b
ON b.date = 10000
GROUP by a.Resturant
order by a.Resturant;
Step3: Output we got
Resturant DATE Running_Sum
Bangalore Mandrian 03/04/20 12000
Boondock Bistro 02/04/20 10000
Oliver’s 02/04/20 11000
Whitefiled Social 02/04/20 10000
Answer 4:
Approach 1:
select max(day) from
(select restaurant, min(day) as day, total_revenue
from
(select *, sum(revenue)
over (partition by restaurant order by day asc) AS total_revenue
from business) AS table_a
where
total_revenue > 10000
group by
restaurant) AS table_b
;
Approach 2:
SELECT MAX(DATE)
FROM
(SELECT DATE
FROM
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY RESTAURANT ORDER BY CUMULATIVE_SUM ASC,DATE ASC) AS ROW
FROM
(Select *,
SUM(REVENUE) OVER (PARTITION BY RESTAURANT ORDER BY DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUMULATIVE_SUM
FROM table) AS INTER_1
WHERE CUMULATIVE_SUM>10000) AS INTER_2
WHERE ROW=1) AS INTER_3
Answer 1:
SELECT seat_no, vacant_seats
FROM
( SELECT date
,row_no
,seat_no
,occupied
,name
,ROW_NUMBER() OVER (PARTITION BY date, row_no ORDER BY seat_no ASC) AS rn
,ROW_NUMBER() OVER (PARTITION BY date, row_no ORDER BY seat_no DESC) AS vacant_seats
FROM theatre
) AS a
WHERE occupied = ‘No’ AND rn = 1 AND vacant_seats = 2
— The query is generalized for any number of vacant seats.
*** Forgot to add “occupied” column in the PARTITION BY clause.
A4:
SELECT resturant,
MIN(date_1)
FROM
(SELECT resturant,
date_1,
revenue,
SUM(revenue)over (partition BY resturant order by date_1 range BETWEEN unbounded preceding AND CURRENT row) AS running_sum
FROM Cafe_Accounts
)
WHERE running_sum >=’10000′
GROUP BY resturant;
output :
Bangalore Mandrian 03/04/20
Boondock Bistro 02/04/20
Oliver’s 02/04/20
Whitefiled Social 02/04/20
Answer 1:
— I see that there is an issue in my previous answer to this question.
Here is another attempt. (I know its a bit long, would be glad to hear about a shorter and smarter way about doing this.)
SELECT
date
,seat_no
,vacant_seats
FROM
(
SELECT
t.*
,(COALESCE( LEAD( rn ) OVER (PARTITION BY date, row_no ORDER BY rn) , 7) – rn) AS vacant_seats
FROM
(
SELECT
date
,row_no
,seat_no
,occupied
,name
,RANK() OVER (PARTITION BY date, row_no ORDER BY seat_no) AS rn
,LAG(occupied) OVER (PARTITION BY date, row_no ORDER BY seat_no) AS lag_occupied
FROM movie_theater
) AS t
WHERE occupied != lag_occupied
) AS t2
WHERE occupied = ‘No’
The previous answer works well
Answer 4:
SELECT Restaurant, MIN(Date) Date, MIN(Running_Sum) TotalRevenue FROM (
SELECT Restaurant, Date, Running_Sum FROM
(SELECT Restaurant , Date, SUM(Revenue) OVER(PARTITION BY Restaurant ORDER BY Date) As Running_Sum
FROM Cafe) As t1
WHERE Running_Sum >= 10000) As t2
GROUP BY Restaurant
select seat_no,vaccant_seat from (
select *,count(csum) over (partition by csum order by csum) -1 vaccant_seat from
(select *,sum (flag) over(order by seat_no ) csum from (select * ,case when occupied = ‘yes’ then 1
when occupied = ‘no’ then 0 end Flag ))) where occupied=’Yes’ and vaccant_seat >1;