Company Name – BookMyShow
Location –
Position – Data Scientist

Number of Rounds – 4 
Round 1 – Written open-book SQL and R/Python round
Round 2 – Case Study
Round 3 – Statistics and Project Discussion
Round 4 – HR Round

Round 1 – SQL and R/Python open-book written test

1. Find the total salary for each department number that has more than 2 employees.
2. How can I retrieve all records in emp1 that are not present in emp2?
3. How do you fetch only the common records from two tables, emp and emp1?
4. How do you get the nth maximum salary?
5. How do you get the 3 minimum salaries?
6. Select all customers who purchased at least two items on two separate days.
7. Given a table with a combination of flight paths, how would you identify unique flights if you don’t care which city is the departure or arrival location?
8. If you have two SQL database tables that are not joined together, how would you create another table to join them?
9. There were plotting questions in R and Python: the basic syntax of ggplot in R and of the seaborn package in Python to create a countplot and a catplot. Go through the visualizations in R/Python.
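
For practice, here is a rough sketch of how questions 1 to 7 could be answered. It is not the expected answer from the interview, just one way to do it. It uses Python's built-in sqlite3 module with an in-memory database, and all table layouts, column names and sample rows (emp, emp1, emp2, purchases, flights) are my own assumptions. Question 6 is read as "at least two purchase rows spread over at least two distinct dates".

```python
import sqlite3

# In-memory database with hypothetical tables and sample rows (assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp  (id INTEGER, name TEXT, dept_no INTEGER, salary INTEGER);
CREATE TABLE emp1 (id INTEGER, name TEXT, dept_no INTEGER, salary INTEGER);
CREATE TABLE emp2 (id INTEGER, name TEXT, dept_no INTEGER, salary INTEGER);
CREATE TABLE purchases (customer_id INTEGER, item TEXT, purchase_date TEXT);
CREATE TABLE flights (origin TEXT, destination TEXT);

INSERT INTO emp VALUES
  (1, 'Asha', 10, 100), (2, 'Bala', 10, 200), (3, 'Chitra', 10, 300),
  (4, 'Dev', 20, 400), (5, 'Esha', 20, 500);
INSERT INTO emp1 VALUES
  (1, 'Asha', 10, 100), (4, 'Dev', 20, 400), (6, 'Farid', 30, 600);
INSERT INTO emp2 VALUES (4, 'Dev', 20, 400);
INSERT INTO purchases VALUES
  (1, 'lamp', '2023-01-01'), (1, 'rug', '2023-01-02'),
  (2, 'lamp', '2023-01-01'), (2, 'rug', '2023-01-01'),
  (3, 'lamp', '2023-01-03');
INSERT INTO flights VALUES ('DEL', 'BOM'), ('BOM', 'DEL'), ('DEL', 'BLR');
""")


def run(sql, params=()):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


# Q1: total salary per department, only for departments with > 2 employees.
q1 = run("""SELECT dept_no, SUM(salary) FROM emp
            GROUP BY dept_no HAVING COUNT(*) > 2""")

# Q2: rows of emp1 that do not appear in emp2.
q2 = run("SELECT * FROM emp1 EXCEPT SELECT * FROM emp2 ORDER BY 1")

# Q3: rows common to emp and emp1.
q3 = run("SELECT * FROM emp INTERSECT SELECT * FROM emp1 ORDER BY 1")

# Q4: nth highest distinct salary (here n = 2).
n = 2
q4 = run("""SELECT DISTINCT salary FROM emp
            ORDER BY salary DESC LIMIT 1 OFFSET ?""", (n - 1,))

# Q5: three lowest distinct salaries.
q5 = run("SELECT DISTINCT salary FROM emp ORDER BY salary ASC LIMIT 3")

# Q6: customers with at least two purchases on at least two distinct days.
q6 = run("""SELECT customer_id FROM purchases
            GROUP BY customer_id
            HAVING COUNT(*) >= 2 AND COUNT(DISTINCT purchase_date) >= 2""")

# Q7: direction-agnostic routes. SQLite's multi-argument MIN/MAX are scalar
# functions; many other databases call them LEAST/GREATEST.
q7 = run("""SELECT DISTINCT MIN(origin, destination) AS city_a,
                            MAX(origin, destination) AS city_b
            FROM flights ORDER BY 1, 2""")

print(q1, q2, q3, q4, q5, q6, q7, sep="\n")
```

EXCEPT and INTERSECT compare whole rows, so they only work this simply when both tables have the same columns. With different layouts, a LEFT JOIN ... WHERE key IS NULL or an INNER JOIN on the key would be the usual alternative.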

Round 2 – Case Study

Case Study 1 – A client has a Diwali-themed e-commerce shop that sells five items. What are some potential problems you foresee with their revenue streams?

Case Study 2 – Taj Group of Hotels is planning to start a new branch. What are the parameters it should consider to find an appropriate location?

Round 3 – Statistics and Project Discussion
My project was on Natural Language Processing, so the questions were mostly around the same topic.

a. Give an example of Normal Distribution from daily life.
b. Why do we have N-1 as the denominator when calculating sample variance and N when calculating population variance?
c. How do you remove your own list of stop words from the line of text given below?
‘Book My Show is the best website to book a show’
d. What is the difference between stemming and lemmatization?
e. What were the packages which you used in this project?
f. Suppose there is a column in a text file with lots of text, and you have to take only the words and exclude special characters and numbers.
g. What are the steps involved in a typical text analytics project?
h. How many bi-grams can be generated from the given sentence:
“Sachin Tendulkar is the best batsman in the World”
i. What is the significance of TF-IDF?
j. What is normalization in text, or text normalization?
k. What kinds of features can be used in NLP to improve the accuracy of a classification model?
l. How does a sentiment analysis algorithm for customer reviews work?
m. Then how do you counter sarcasm?
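
A few of these (b, c, f and h) are easy to try out in plain Python. The sketch below uses only the standard library. The stop word list, the sample numbers and the sample text for question f are my own made-up assumptions, and whitespace splitting is used as a stand-in for a real tokenizer.

```python
import re
import statistics

# (b) Population variance divides by N, sample variance by N - 1.
# The numbers are an arbitrary illustrative sample.
values = [2, 4, 4, 4, 5, 5, 7, 9]
population_var = statistics.pvariance(values)  # sum of squared deviations / N
sample_var = statistics.variance(values)       # sum of squared deviations / (N - 1)

# (c) Remove a custom list of stop words (this list is an assumption).
sentence = "Book My Show is the best website to book a show"
custom_stop_words = {"is", "the", "to", "a"}
kept_words = [w for w in sentence.split() if w.lower() not in custom_stop_words]

# (f) Keep only alphabetic words, dropping digits and special characters.
raw_text = "Seat no. 42 booked!!! Enjoy the show :)"  # hypothetical cell value
words_only = re.findall(r"[A-Za-z]+", raw_text)

# (h) Bi-grams are pairs of adjacent tokens, so N tokens give N - 1 bi-grams.
tokens = "Sachin Tendulkar is the best batsman in the World".split()
bigrams = list(zip(tokens, tokens[1:]))

print(population_var, sample_var)
print(kept_words)
print(words_only)
print(len(tokens), len(bigrams))
```

With whitespace tokenization the sentence in question h has 9 tokens, so it yields 8 bi-grams. For question b, the sample variance comes out larger than the population variance on the same numbers, which is the effect of the N-1 correction.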

Round 4 – HR Round
Basic HR Questions

That was it.

