SQL interview questions for Data Science and Business Analyst role

Before you start with the questions, make sure you have your basics clear. Look for explanation if you don’t have enough context about some syntax or function. We have provided the table schema wherever required, in case the schema is not provided, please assume the table and column names.

1. How to find the third highest salary in an employee table with employee number and employee salary?

SELECT * FROM Employee_Table t1
WHERE 3 =
(SELECT Count(distinct Salary) from Employee_Table t2 WHERE t1.Salary <= t2.Salary);

2. Write a query to find maximum salary of each department in an organisation.

SELECT Department_Name, Max(Salary)
FROM Department_Table
GROUP BY Department_Name

3.What is wrong with the following query?
SELECT Id, Year(PaymentDate) as PaymentYear
FROM Bill_Table
WHERE PaymentYear > 2018;

Though the variable PaymentYear has already been defined in the first line of the query, but this is not the correct logical process order. The correct query will be
SELECT Id, Year(PaymentDate) as PaymentYear
FROM Bill_Table
WHERE Year(PaymentDate) > 2018;

4. What is the order of execution in a query?

The order of query goes like this:-
FROM – Choose and join tables to get the raw data
WHERE – First filtering condition
GROUP BY – Aggregates the base data
HAVING – Apply condition on the base data
SELECT – Return the final data
ORDER BY – Sort the final data
LIMIT – Apply limit to the returned data

5.What is ROW_NUMBER() function?

It assigns a unique id to each row returned from the query ,even if the ids are the same. Sample query:-

SELECT emp.*,
row_number() over (order by salary DESC) Row_Number
from Employee emp;

Employee NameSalaryRow_Number
Amit70001
Bhargav60002
Chirag60003
Dinesh50004
Esha30005
Farhan30006

Even when the salary is the same for Bhargav and Chirag, they have a different Row_Number, this means that the function row_number just gives a number to every row

6. What is RANK() function?

RANK() function is used to give a rank and not a row number to the data set. The basic difference between RANK() and ROW_NUMBER is that Rank will give equal number/rank to the data points with same value. In the above case, RANK() will give a value of 2 to both Bhargav and Chirag and thus will rank Dinesh as 4. Similarly, it will give rank 5 to both Esha and Farhan.

SELECT emp.*,
RANK() over (order by salary DESC) Ranking
from Employee emp;

7. What is NTILE() function?

NTILE() function distributes the rows in an ordered partition into a specific number of groups. These groups are numbered. For example, NTILE(5) will divide a result set of 10 records into 5 groups with 2 record per group. If the number of records is not divided equally in the given group, the function will set more record to the starting groups and less to the following groups.

SELECT emp.*,
NTILE(3) over (order by salary DESC) as GeneratedRank
from Employee emp

This will divide the complete data set in 3 groups from top. So the GeneratedRank will be 1 for Amit and Bhargav, 2 for Chirag and Dinesh: 3 for Esha and Farhan

8. What is DENSE_RANK() ?

This gives the rank of each row within a result set partition, with no gaps in the ranking values. Basically there is no gap, so if the top 2 employees have the same salary then they will get the same rank i.e. 1 , much like the RANK() function. But, the third person will get a rank of 2 in DENSE_RANK as there is no gap in ranking where as the third person will get a rank of 3 when we use RANK() function. Syntax below:-

SELECT emp.*,
DENSE_RANK() OVER (order by salary DESC) DenseRank
from Employee emp;

9. Write a query to get employees name starting with vowels.

SELECT EmpID,EmpName
FROM Employee
where EmpName like ‘[aeiou]%’

10. Write a query to get employee name starting and ending with vowels.

SELECT EmpID,EmpName
FROM Employee
where EmpName like ‘[aeiou]%[aeiou]’

11. What are the different types of statements supported in SQL?

There are three types of statements in SQL:-
a. DDL – Data Definition Language
b. DML – Data Manipulation Language
c. DCL – Data Control Language

12. What is DDL?

It is used to define the database structure such as tables. It includes 3 commands:-
a. Create – Create is for creating tables
CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
   ….
);
b. Alter – Alter table is used to modifying the existing table object in the database.
ALTER TABLE table_name
ADD column_name datatype
c. Drop – If you drop a table, all the rows in the table is deleted and the table structure is removed from the database. Once the table is dropped, you can’t get it back

13. What is DML?

Data Manipulation Language is used to manipulate the data in records. Commonly used DML commands are Update, Insert and Delete. Sometimes SELECT command is also referred as a Data Manipulation Language.

14. What is DCL?

Data Control Language is used to control the privileges by granting and revoking database access permission

15. What is the difference between DELETE and TRUNCATE?

16. What are the important SQL aggregate functions?
a. AVG()
b. COUNT()
c. MAX()
d. MIN()
e. SUM()

17. What is Normalization? How many Normalization forms are there?

Normalization is used to organize the data in such manner that data redundancy will never occur in the database and avoid insert, update and delete anomalies.
There are 5 forms of Normalization
First Normal Form (1NF): It removes all duplicate columns from the table. Creates a table for related data and identifies unique column values
Second Normal Form (2NF): Follows 1NF and creates and places data subsets in an individual table and defines the relationship between tables using a primary key
Third Normal Form (3NF): Follows 2NF and removes those columns which are not related through primary key
Fourth Normal Form (4NF): Follows 3NF and do not define multi-valued dependencies. 4NF also known as BCNF

18. How to fetch only common records between two tables?

SELECT * FROM Employee
INTERSECT
SELECT * FROM Employee1

19. Full Outer Join is a combination of which of the following:-
a. Left Outer and Left Inner Join
b. Left Outer and Right Inner Join
c. Left Outer and Right Outer Join
d. Left Outer and Right Outer Join

A. Full Outer Join is a combination of Left Outer and Right Outer Join in SQL

20. Right Outer Join is similar to:-
a. Right Inner Join
b. Left Inner Join
c. Left Outer Join
d. Right Outer Join

A. Right Outer Join is similar to Left Outer Join in SQL

If you want to practice SQL questions before an interview or to hone up your SQL skills, you can either practice at www.sqlzoo.net or you can buy 112 Questions to Crack Business Analyst Interview from Amazon.























Author: TheDataMonk

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience 3+ years at Mu Sigma 2 years at OYO 1 year and counting at The Data Monk I am an active trader and a logically sarcastic idiot :)