Big Data Interview Questions | Day 5

Welcome to Day 5 of Big Data Interview Questions.
You can go through the questions and answers for:
Day 1
Day 2
Day 3
Day 4

Big Data Interview Questions

In this series, all you need to do is go through the questions and try to answer them with whatever knowledge you have.

Before or after you attempt the questions, do go through the complete roadmap of what you need to study to crack an analytics role. It covers everyone from third-year college students to candidates with up to 8 years of analytics or non-analytics experience. Make sure you understand where to put more emphasis (after all, at the end of the day you want to crack the interview).

Complete Roadmap for cracking an Analytics Role in the next 30 days

Now, coming back to our questions. Remember, these are among the most frequently asked interview questions. If you are shy about answering them in the comment section, make sure to Google the answers and jot them down in your notebook.

Following are the topic-wise questions:


SQL

  1. What is the difference between 2NF and 3NF in normalization?
  2. There is a Student table with columns Name, Roll No, Marks, and Class.
    Find the class with the highest average score.


Python

  1. Import a CSV file using pandas and then remove all the columns that contain any null value.
  2. Name multiple methods or functions that give a quick overview of the data in a table, such as the number of rows and columns, the mean, etc.

Case Study

  1. Flipkart has recently observed that people are dropping out of its buy funnel, i.e. the sequence of pages a customer goes through before making a payment.
    The pages are: Listing page, Description page, Delivery option page, Payment page, and Redirection page.

    How would you identify the problem faced by the customers?

Machine Learning

  1. What is the F1 score?
  2. What are the ROC curve and AUC?

The Data Monk Interview Books – Don’t Miss

We are now also available on our website, where you can directly download the PDF for the topic you are interested in. On Amazon, each book costs ~299; on our website, they are available at a 60-80% discount. There are ~4000 solved interview questions prepared for you.

10 e-book bundle with 1400 interview questions spread across SQL, Python, Statistics, Case Studies, and Machine Learning Algorithms: ideal for candidates with 0-3 years of experience

23 e-book bundle with ~2000 interview questions spread across AWS, SQL, Python, 10+ ML algorithms, MS Excel, and Case Studies: a complete package for candidates with 0 to 8 years of experience (the 10 e-book bundle above contains a completely different set of e-books)

12 e-books for 12 Machine Learning algorithms with 1000+ interview questions: for candidates who want to include a Machine Learning algorithm in their resume and learn or revise the important concepts. These 12 e-books are part of the 23 e-book package.

50+ individual e-books on separate topics

Important Resources to crack interviews (Mostly Free)

There are a few resources that can be very useful for your preparation:

The Data Monk YouTube channel: here you will find videos only on topics that are asked in interviews for Data Analysts, Data Scientists, Machine Learning Engineers, Business Intelligence Engineers, Analytics Managers, etc.
Go through the playlists on the topics that make you uncomfortable:

The full list of 200 videos
Complete Python Playlist for Data Science
Company-wise Data Science Interview Questions – Must Watch
All important Machine Learning Algorithm with code in Python
Complete Python Numpy Playlist
Complete Python Pandas Playlist
SQL Complete Playlist
Case Study and Guesstimates Complete Playlist
Complete Playlist of Statistics

About TheDataMonk (Grand Master)

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)

Comments (2)

  1. SQL
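    2. with class_results as
    -- average marks per class
    (select class, avg(marks) as avg_marks from student
    group by class)
    -- sort in the outer query: an ORDER BY inside the CTE does not guarantee the final order
    select class, avg_marks from class_results
    order by avg_marks desc
    limit 1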
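To check the logic end to end, the same idea can be run against a small in-memory SQLite database from Python's standard library. This is a sketch under assumptions: the table and column names (`student`, `name`, `roll_no`, `marks`, `class`) and the sample rows are hypothetical, and SQLite stands in for whatever database an interviewer has in mind. The sorting happens in the outer query so that `LIMIT 1` reliably picks the top average.

```python
import sqlite3

# Hypothetical sample data (name, roll_no, marks, class); not from the original post.
ROWS = [
    ("Asha", 1, 82, "A"),
    ("Ravi", 2, 74, "A"),
    ("Meera", 3, 91, "B"),
    ("John", 4, 88, "B"),
    ("Sara", 5, 65, "C"),
]

# Average marks per class in a CTE, then sort and keep the top row in the outer query.
QUERY = """
WITH class_results AS (
    SELECT class, AVG(marks) AS avg_marks
    FROM student
    GROUP BY class
)
SELECT class, avg_marks
FROM class_results
ORDER BY avg_marks DESC
LIMIT 1
"""


def top_class(rows):
    """Load rows into an in-memory table and return (class, avg_marks) for the top class."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE student (name TEXT, roll_no INTEGER, marks REAL, class TEXT)"
        )
        conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_class(ROWS))
```

With these made-up rows, class A averages 78.0, B averages 89.5 and C averages 65.0, so `top_class(ROWS)` returns `('B', 89.5)`. Note that `LIMIT 1` returns a single class even when two classes tie for the highest average; a `RANK()` window function would return all of them.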

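    1. import pandas as pd
    # read the CSV file into a DataFrame
    file = pd.read_csv('data.csv')
    # axis=1 drops columns; how='any' drops a column with at least one null value
    file_v2 = file.dropna(axis=1, how='any')
    2. describe() (mean, std, min, max and quartiles for the numeric columns), info() (data type and non-null count of each column), shape (number of rows and columns)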
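A small self-contained sketch of both answers, assuming hypothetical column names and an in-memory CSV (`io.StringIO`) in place of `data.csv`. `dropna(axis=1, how='any')` removes every column that has at least one missing value, while `how='all'` would only remove columns that are entirely empty.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'; the columns are made up.
CSV_TEXT = """name,age,city,score
Asha,25,Pune,81.5
Ravi,,Delhi,74.0
Meera,31,,90.0
"""


def drop_null_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop every column that contains at least one missing value."""
    # axis=1 -> operate on columns; how="any" -> one null is enough to drop the column
    return df.dropna(axis=1, how="any")


if __name__ == "__main__":
    df = pd.read_csv(io.StringIO(CSV_TEXT))
    print(drop_null_columns(df).columns.tolist())

    # Quick overview of the data
    print(df.shape)         # (number of rows, number of columns)
    df.info()               # data type and non-null count of each column
    print(df.describe())    # count, mean, std, min, quartiles, max of numeric columns
    print(df.head())        # first five rows
    print(df.isna().sum())  # number of nulls in each column
```

Here `age` and `city` each contain one blank, so only `name` and `score` survive. `df.isna().sum()` is a handy extra for spotting which columns have nulls before dropping anything.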

    Case Study
    The problem: people are dropping out of Flipkart’s buy funnel (across the different pages mentioned).
    Clarification questions: Have there been any UI/UX or other changes to any of the pages recently? Are customers of all profiles/locations dropping out of the funnel, or only specific ones? Over what time frame has this fall-out been happening?
    Approach: Calculate the drop-off rate, bounce rate, click rates on page elements, error alerts shown to customers, and average time spent on each of these pages. This data helps narrow down which page is problematic, and the tracked metrics also point to the likely cause. If error alerts are high, there is probably a back-end bug to fix. If the drop-off rate is high, check the ‘items in the cart’ reminder alert system and customers’ response to it (one probable solution). If click rates on page elements have dropped, check when the UI was last updated and run A/B tests to address the problem.
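One way to make the drop-off analysis concrete is to compute the step-to-step drop-off rate for a baseline period and for the current period, then look for the step whose drop-off grew the most. This is a minimal sketch with made-up session counts; in practice these counts would come from clickstream or event logs, and the numbers here are assumptions, not Flipkart data.

```python
# Hypothetical unique-session counts reaching each funnel page (made-up numbers),
# for a baseline period before the problem and for the current period.
PAGES = ["Listing page", "Description page", "Delivery option page",
         "Payment page", "Redirection page"]
BASELINE = [10000, 6000, 4200, 3500, 3300]
CURRENT = [10000, 6100, 4300, 2400, 2250]


def drop_off_rates(counts):
    """Fraction of sessions lost between each page and the next one."""
    return [1 - nxt / cur if cur else 0.0 for cur, nxt in zip(counts, counts[1:])]


def biggest_regression(pages, baseline, current):
    """Return (from_page, to_page, baseline_rate, current_rate) for the step
    whose drop-off rate increased the most compared with the baseline."""
    steps = zip(pages, pages[1:], drop_off_rates(baseline), drop_off_rates(current))
    return max(steps, key=lambda step: step[3] - step[2])


if __name__ == "__main__":
    for src, dst, before, now in zip(PAGES, PAGES[1:],
                                     drop_off_rates(BASELINE), drop_off_rates(CURRENT)):
        print(f"{src} -> {dst}: baseline {before:.1%}, current {now:.1%}")
    print("Investigate first:", biggest_regression(PAGES, BASELINE, CURRENT)[:2])
```

With these numbers, the Delivery option page to Payment page step jumps from about 16.7% to about 44.2% drop-off, so that transition would be the first place to look for UI changes, errors, or payment issues.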

    Machine Learning
    1. F1 score is used as a metric for classification models. It is the harmonic mean of precision and recall, 2 * precision * recall / (precision + recall), and ranges from 0 to 1. An F1 score of 1 means the model has perfect precision and recall.
    2. The ROC (receiver operating characteristic) curve is the graph of TPR (true positive rate) against FPR (false positive rate) as the classification threshold varies, and AUC is the area under this curve, which indicates the model’s ability to distinguish between the classes.
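Both metrics can be computed from scratch in a few lines of plain Python, which is a useful way to check the definitions above. This is an illustrative sketch for binary labels (1 = positive), not a production implementation; in practice, `sklearn.metrics` provides `f1_score`, `roc_curve` and `roc_auc_score`. Here the ROC points come from using each distinct score as a threshold, and the AUC is the trapezoidal area under those points.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are 0 (or undefined), so report 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_points(y_true, scores):
    """(FPR, TPR) points from using each distinct score as the decision threshold.
    Assumes both classes are present in y_true."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    print(f1_score([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
    print(auc(roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])))
```

The F1 example has 3 true positives, 1 false positive and 1 false negative, so precision = recall = 0.75 and F1 = 0.75. For `y_true = [0, 0, 1, 1]` and `scores = [0.1, 0.4, 0.35, 0.8]`, the ROC points are (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), giving an AUC of 0.75.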