Statistics Interview Questions for Data Science

Here is a set of 17 statistics interview questions that you should understand before your data science interviews. These are very basic statistics questions that check your elementary knowledge.

15+ correct = Very strong fundamentals
10-14 correct = On par with the concepts; try to complete one basic book on statistics
<10 correct = Go through at least one book and cover 80-100 MCQs

Let’s get to the statistics interview questions

1. The mean of a distribution is 20 and the standard deviation is 5. What is the value of the coefficient of variation?

A. Coefficient of Variation = (Standard Deviation/Mean)*100
= (5/20)*100
= 25%
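
To double-check the arithmetic, here is a minimal Python sketch of the same formula. The function name `coefficient_of_variation` is just an illustrative choice, not something from a library.

```python
def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Return the coefficient of variation as a percentage."""
    return (std_dev / mean) * 100


# Values taken from the question: mean = 20, standard deviation = 5
cv = coefficient_of_variation(mean=20, std_dev=5)
print(f"{cv}%")  # 25.0%
```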

2. When the mean is less than mode and median, then what type of distribution is it?

A. Negatively Skewed
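
A quick way to see this is with a small made-up sample that has a long tail on the left. The numbers below are purely illustrative; the sketch uses Python's built-in `statistics` module.

```python
import statistics

# Made-up scores: most values are high, one very low value drags the mean down
scores = [1, 6, 7, 8, 8]

mean = statistics.mean(scores)      # 6
median = statistics.median(scores)  # 7
mode = statistics.mode(scores)      # 8

# In a negatively (left) skewed sample the mean sits below the median and mode
print(mean < median < mode)  # True
```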

3. Which of the following describes the middle part of a group of numbers?
a. Measure of Variability
b. Measure of Central Tendency
c. Measure of Association
d. Measure of Shape

A. Measure of Central Tendency

4. According to the empirical rule, approximately what percent of the data should lie within μ±2σ?

A. 95% of the data should lie between μ±2σ
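
If you want to see where the 95% comes from, here is a small sketch that assumes the data is exactly normally distributed and uses `statistics.NormalDist` from the Python standard library. The exact figure is slightly above 95%, which is why the rule says "approximately".

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# Probability mass between mu - 2*sigma and mu + 2*sigma
within_2_sigma = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(round(within_2_sigma * 100, 2))  # 95.45
```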

5. The sum of the deviations about the mean is always:
a. Range
b. Zero
c. Total Deviation
d. Positive

A. Zero
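
You can verify this on any list of numbers. The sample below is made up; with non-integer means the sum may come out as a tiny floating point number instead of an exact zero, so in real code you would compare with a tolerance.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]  # illustrative sample
mean = statistics.mean(data)   # 18

# Deviation of every value from the mean
deviations = [x - mean for x in data]
total = sum(deviations)

print(deviations)  # [-14, -10, -3, -2, 5, 24]
print(total)       # 0
```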

6. The middle value of an ordered array of numbers is the

  1. Mode
  2. Mean
  3. Median
  4. Standard Deviation

    Try this one yourself 🙂

7. Height of employees is a :-
a. Continuous value
b. Qualitative value
c. Discrete value
d. None of these


8. Which of these is a measure of dispersion:-
a. Mean
b. Median
c. Quartile
d. Standard Deviation


A. Standard deviation is a measure of dispersion

9. The variance of a dataset is 144, what is the Standard Deviation?

A. Standard deviation is the square root of variance, so the standard deviation will be 12


10. Which of these is qualitative data:-
a. Weight of family members
b. Salary
c. Feedback of 100 customers about your website
d. Number of burgers sold in India


A. Feedback of 100 customers about your website. The rest are all quantitative (numeric) data

11. Which of these is/are measure of central tendency?
a. Median
b. Mean
c. Mode
d. Mid range
e. Mid hinge


A. All of these are measures of central tendency

12. What divides a data set into 10 equal parts?
a. Deciles
b. Percentile
c. Quartile
d. Standard Deviation

A. Deciles divide the complete dataset into 10 equal parts
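
As a rough sketch, Python's `statistics.quantiles` can compute the decile cut points when called with `n=10`. The data here (the numbers 1 to 100) is only an example. Note that 10 groups need 9 cut points.

```python
import statistics

data = list(range(1, 101))  # illustrative data: 1, 2, ..., 100

# n=10 asks for the cut points that split the data into 10 equal-sized groups
decile_cut_points = statistics.quantiles(data, n=10)

print(len(decile_cut_points))  # 9
```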

13. What is Mid-range?

A. The arithmetic mean of the maximum and minimum values of a dataset is called the mid-range

14. What is Mid hinge?
A. The arithmetic mean of the first and third quartiles (Q1 and Q3) is called the mid hinge
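
Here is a small sketch of both measures on a made-up sample with one extreme value. The quartiles come from `statistics.quantiles`, which uses one of several common quartile conventions, so other tools may give slightly different Q1 and Q3 values.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 30]  # illustrative sample with one extreme value

# Mid-range: average of the smallest and largest values
mid_range = (min(data) + max(data)) / 2

# Mid hinge: average of the first and third quartiles
q1, _, q3 = statistics.quantiles(data, n=4)
mid_hinge = (q1 + q3) / 2

print(mid_range)  # 16.0
print(mid_hinge)  # 8.0
```

Notice how the single extreme value (30) pulls the mid-range up, while the mid hinge barely cares about it.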

15. What is Inter Quartile Range?
a. 0-50th percentile
b. 25-50th percentile
c. 25-75th percentile
d. 50-100th percentile

A. The range from the 25th to the 75th percentile is called the IQR, i.e. Inter Quartile Range. Its value is the 75th percentile minus the 25th percentile

16. What is a cap in a box plot?
A. The upper cap covers the values that fall between the 75th percentile and 75th percentile + 1.5*IQR. Similarly, the lower cap covers the values that fall between the 25th percentile and 25th percentile - 1.5*IQR

17. What values are termed as an outlier in a box plot?
A. Any value that is above the upper cap or below the lower cap is termed an outlier
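
The last three answers (IQR, caps and outliers) can be put together in a short sketch. The sample is made up, the names `lower_limit` and `upper_limit` are just illustrative, and the quartiles follow the default convention of `statistics.quantiles`, which may differ a little from what your plotting library uses.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 30]  # illustrative sample with one extreme value

# quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# The caps can reach at most 1.5 * IQR beyond the quartiles
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Anything beyond these limits is an outlier
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q3, iqr)               # 4.0 12.0 8.0
print(lower_limit, upper_limit)  # -8.0 24.0
print(outliers)                  # [30]
```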

You can also check out DataCamp for Statistics courses

If you are wondering how to study and be interview ready for SQL, Python/R, Statistics, Machine Learning, Case Study and Guesstimates, then you can have a look at our 7 min-read article – How to make a career in Data Science

The Data Monk Interview Books – Don’t Miss

Now we are also available on our website where you can directly download the PDF of the topic you are interested in. At Amazon, each book costs ~299, on our website we have put it at a 60-80% discount. There are ~4000 solved interview questions prepared for you.

10 e-book bundle with 1400 interview questions spread across SQL, Python, Statistics, Case Studies, and Machine Learning Algorithms – Ideal for 0-3 years experienced candidates

23 E-book with ~2000 interview questions spread across AWS, SQL, Python, 10+ ML algorithms, MS Excel, and Case Studies – Complete Package for someone between 0 to 8 years of experience (The above 10 e-book bundle has a completely different set of e-books)

12 E-books for 12 Machine Learning algorithms with 1000+ interview questions – For those candidates who want to include any Machine Learning Algorithm in their resume and to learn/revise the important concepts. These 12 e-books are a part of the 23 e-book package

Individual 50+ e-books on separate topics

Important Resources to crack interviews (Mostly Free)

There are a few things which might be very useful for your preparation

The Data Monk YouTube channel – Here you will get only videos on topics that are asked in interviews for Data Analysts, Data Scientists, Machine Learning Engineers, Business Intelligence Engineers, Analytics managers, etc.
Go through the playlists on the topics you are not comfortable with:-

The complete list of 200 videos
Complete Python Playlist for Data Science
Company-wise Data Science Interview Questions – Must Watch
All important Machine Learning Algorithm with code in Python
Complete Python Numpy Playlist
Complete Python Pandas Playlist
SQL Complete Playlist
Case Study and Guesstimates Complete Playlist
Complete Playlist of Statistics

How to make a career in Data Science? Data Science Preparation

People have always been curious about how to make a career in data science or how to enter this domain after an experience of, say, 3-4 years.
We will help you with a Data Science preparation strategy

Should I do a course from Simplilearn?
Should I try Udemy?
Is Python the key to Data Science?
Is it about a lot of Mathematics?

And after this, a promotional line comes in which sounds something like this

Do you wish to get into Data Science?
We have four ways to be a Data Scientist in 12 hours.


Just follow our course which will cost you only Rs. 24,999 or a course at Indian Class of Business for Rs. 10 lakhs


I would suggest you do not fall into these traps!!

No, we are not here to brag about our course or books or anything. We will genuinely tell you what you should do to enhance your skills and how you can be a strong candidate.

I can assure you that you can be a strong candidate in a DS interview and you can convert 1 in every 8-9 interviews. Why? Because the competition is too freaking high 🙂

Follow our Data Science preparation strategy. Before that, make a profile on www.thedatamonk.com and make sure to check our page How to use The Data Monk

Let’s divide a Data Scientist interview into 5 parts:-

-SQL
-R/Python
-Visualization Tools
-Statistics, Machine Learning Algorithms and practicing Hackathons
-Case Studies and Guesstimates


Some may argue that SQL and Visualization tools are not used in DS domain, but here I am catering to a broader audience and a good knowledge of each will definitely be fruitful. I mean, DEFINITELY 🙂

1. How to prepare SQL for Data Science?


This is one hell of a sexy language (come on, it’s my private space :P)
I love SQL <3

How to prepare –
Give one complete day to SQL from either W3Schools or Tutorialspoint
After that, give it 2 hours daily. Start with SQLZOO (complete it in 4-5 days)
Then move to HackerRank SQL questions (complete them in 4-5 days)

I assume that after W3Schools, SQLZOO, and HackerRank you are in pretty good shape.


You can call it marketing gimmick or promotion or anything.
Look, if you buy these two books during your SQL preparation, then you will have a lower chance of getting eliminated in the interviews. I am not saying that if you don’t do these questions then you will be eliminated.

These two books are like archive questions of IIT-JEE because most of these are those questions which were actually asked in the companies in which you will apply 🙂

112 Questions To Crack Business Analyst Interview Using SQL
Crack your next Data Science Interview with 300+ Questions

If you follow these steps, I can assure you that you won’t be eliminated in SQL round.

P.S. – We are writing a final book on SQL which will have around 200 intermediate to difficult interview questions, so stay connected with me on LinkedIn for the giveaway


2. How to prepare R/Python for Data Science?


Everyone says that Language does not matter but I would suggest that if you are starting your preparation then go for Python.

Preparation

Again, go with W3Schools or Tutorialspoint, but if you are going for Python, I suggest you study the first hour of a 6-hour long video freely available on YouTube. Search ‘Python for Data Science’; the video is by Intellipaat.

For R/Python I would definitely recommend Data Camp. The courses here are way too good for any beginner, it might cost you something around 6-7k. But this is totally worth it 🙂

It would take you one full day to go through the installation, and these study materials.

Then solve this many questions:

If else = 10 questions
function = 20 questions
lambda function = 10 questions
loop = 10 questions

You can get these questions on the internet

If you want to go deep in the syntax part then solve 20-25 beginner level questions on Codechef

Now you are good with Python basics, now you need to give one complete day on EACH of these packages
-pandas
-NumPy
-seaborn and sklearn

We have a book on these, ‘Learn Pandas and Numpy the Data Science way’ (will be published soon). You can get a copy if you want to get the basics better

But, I will highly recommend you
125 Must Have Python Questions

Again, whenever you think you need an extra push, you can buy the book. The 300 questions book mentioned above also has questions on Python, so that will cover it as well

If you are good till here, then you would be quite decent in SQL and Python(Syntax)

3. How to study Power BI/Tableau for Data Science?


You can opt for either Power BI or Tableau. If you have industrial experience in Tableau then these rounds or questions would not be difficult.

If you are a fresher or want to move to DS then do try any course on Udemy. It will cost you around ~500 rupees or $7

I would recommend you to take any free or paid certification. I have an e-book on Power BI with 100 questions. Ping in the comment section or send a mail to contact@thedatamonk.com to receive a copy of the same.

Mention in the mail that you are here after reading our post on the website to get the book for free 🙂

4. How to prepare for Case Studies and Guesstimates


I will be frank here: if you are already very good at case studies, then there is no need to spend much time. But if you are not in the top 2-3 percentile, there is just one book with most of the interview questions asked in Myntra, Flipkart, Google, Ola, OYO, Uber, etc.

We have collated it here in this book, this is a must-buy
Business Analyst and MBA Aspirant’s Complete Guide to Case Study

Take it as a cheat-sheet or an archived question bank, but this contains around 30 case studies and guesstimates which can surely help you


5. Statistics, Machine Learning Algorithms and practicing Hackathons

This is where you need to keep your focus. Start with any book on basic statistics. You need to cover the basics of population, sampling, error, prediction, normal distribution, linear equation, etc.

You just need to be a bit confident about the topics.

Broadly speaking, the following are the different algorithms/domains of Data Science, choose one or two from the below list:-

1. Linear Regression(LASSO, Elastic Net)
2. Logistic Regression
3. NLP
4. Time Series Forecasting
5. K-Nearest Neighbors
6. K-means
8. Boosting Algorithms(Ada, Gradient, XGBoost)
9. Artificial Intelligence
10. Neural Network
11. Random Forest and Decision Tree


I can go on and on, but wait, you need to start with Linear Regression, yes, that’s it!!

For Machine Learning also I would definitely recommend Data Camp. They have very interactive courses on different algorithms. Again, look into the website for the course fee, at the time of writing there were some discounts as well.

If not Data Camp, then we can surely help you

Start with the implementation part of Linear Regression and then hop to Logistic Regression or Natural Language processing.

Linear Regression is both a simple and an important algorithm to clear your basics.

There are a few good blogs on Kaggle and Analytics Vidhya, but if you feel lost, we are here with our book on Linear Regression and ARIMA Time Series Forecasting

Complete Linear Regression and ARIMA Forecasting project using R

I would also recommend you to solve a few of these problems on Kaggle
-Titanic Survival Problem
-Boston Housing Problem
-Petal-Sepal Problem


These will definitely get you a gist of different algorithms. So, go and make a profile on Kaggle and Analytics Vidhya.

Analytics Vidhya also organizes daily hackathons, I would suggest you to participate in at least one per month.

You can also look around on this website to understand a few tricky concepts.

This whole exercise will take you around 10-12 days(Part 5)

So, in a month or so you can definitely be quite good in most of the parts of Data Science.

Now, if you are planning to appear for any Data Science interviews, then you need to know what type of questions are asked in the interview

We have three books which together cover more than 700 interview questions. You should consider getting your hands on these

What do they ask in top Data Science Interviews?
What do they ask in top Data Science Interviews Part 2
Crack your next Data Science Interview with 300+ Questions

All of our books have good reviews, so you can try these.

If you are serious about getting into Data Science, we have 10 e-book Bundle which covers more than 1400 interview questions distributed in 10 e-books. Following can help you in framing right Data

1. 112 Questions to crack Business Analyst interview using SQL
2. 100 Supervised Learning Questions (To give a background of ML)
3. 125 Must have Python Interview Questions(To start with Python)
4. Visualization in Python (Covering Visualization in Python)
5. 100 Power BI Questions (Covering Visualization in Power BI)
6. Learn Statistics in Python (Now, you start getting better in Python)
7. Top 300+Interview Questions in Data Science(Covering statistics,python,SQL,case studies,guesstimates)
8. What do they ask in Top Data Science interviews – Part 1 (Now you understand the complete flow)
9. What do they ask in Top Data Science interviews – Part 2
10. Case Studies and Guesstimates for Interviews

These books are marked at Rs.2300 on Amazon, but we are offering it at Rs.549.

If you want to avail these PDFs then comment in the section below or drop a mail to contact@thedatamonk.com or nitinkamal132@gmail.com or Whatsapp at 9798049301

All the questions are more than welcome.

Keep Learning 🙂

The Data Monk



What is Stationarity in Time Series?

Stationarity in Time Series
The first step for any time series analysis is to make the data set stationary. Stationarity means a nearly constant mean and variance across time.

[Figure: a time series with an increasing trend (red line) and the de-trended series (blue line)]

The red line above shows an increasing trend and the blue line is the result of de-trending the series. De-trending means fitting a regression line and then subtracting it from the original data
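
To make de-trending concrete, here is a minimal pure-Python sketch. It fits a straight line to the series with ordinary least squares and subtracts it. The helper name `detrend` and the toy series are assumptions made for illustration; in practice you would use a regression routine from your favourite library.

```python
def detrend(series):
    """Fit a straight line (least squares) against time and subtract it."""
    n = len(series)
    times = list(range(n))
    t_mean = sum(times) / n
    y_mean = sum(series) / n

    # Ordinary least squares slope and intercept of y against time
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, series)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    intercept = y_mean - slope * t_mean

    # What is left after removing the fitted line is the de-trended series
    return [y - (intercept + slope * t) for t, y in zip(times, series)]


# Toy series: upward trend (3 + 2t) plus a small alternating wiggle
trending = [3 + 2 * t + (1 if t % 2 == 0 else -1) for t in range(10)]
detrended = detrend(trending)

print(trending[0], trending[-1])  # 4 20
print(all(abs(value) < 2 for value in detrended))  # True: it now hovers around zero
```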

Stationarity does not mean that the series does not change over time, just the way it changes does not itself change over time.

The reason why we need stationary data is simple – it’s easier to analyze and predict a data set with stationarity. If a series is consistently increasing over time (like the one above), then the sample mean and variance will also grow with the size of the sample, and your model or the proposed time series solution will always underestimate the mean and variance in the future periods.

How do you check the stationarity of a series?
In general, we use the Augmented Dickey Fuller (ADF) test or the KPSS test to check the stationarity of the series. Here we will discuss only the ADF test; we will cover KPSS some other time

ADF is a statistical significance test (a test which involves null and alternate hypothesis) and it falls under the category of ‘unit root test’. Now, what is a unit root test?

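A unit root is said to exist in a series when the coefficient α equals 1 in an equation of the form Yt = α·Y(t−1) + β·Xe + ϵ, where Yt is the value of the time series at time ‘t’ and Xe is an exogenous variable (a separate explanatory variable, which is also a time series).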

The presence of a unit root means the time series is non-stationary. Besides, the number of unit roots contained in the series corresponds to the number of differencing operations required to make the series stationary.

A time series is a process that can be written in its components which contains ‘roots’. For example:

v(t) = c + a1 v(t−1) + ϵ(t−1)

The coefficient a1 is a root. You can interpret this process/formula as ‘the value of today depends on the value of yesterday and some randomness we can’t predict’. When a1 is between −1 and 1, we expect this process to always revert to a fixed level, c/(1−a1), which is simply c when c=0 as in the example below.

Try this out with an example:
suppose c=0 and a1=0.5.

If yesterday v(t−1) the value was 100, then we expect that today the value will be around 50. Tomorrow, we expect the value to be 25 and so on.

You see that this series will ‘come home’, in this case meaning it will converge back to the value of c, which is 0 here.

When one of the roots is a unit, i.e. equal to 1 (in this example when a1=1), then this series will not recover to its origin. You can see this by using the example given above.
That is why the concepts of unit roots and unit root tests are useful: it gives us insights into whether the time series will recover to its expected value. If this is not the case, then the process will be very susceptible to shocks and hard to predict and control.
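
The example above can be sketched in a few lines of Python. To keep things readable, the sketch below leaves out the random term and only iterates the deterministic part of the formula; the function name `simulate` is just an illustrative choice.

```python
def simulate(c, a1, start, steps):
    """Iterate v(t) = c + a1 * v(t-1), ignoring the random shock term."""
    values = [start]
    for _ in range(steps):
        values.append(c + a1 * values[-1])
    return values


# a1 = 0.5: the series halves every step and 'comes home' towards 0
print(simulate(c=0, a1=0.5, start=100, steps=3))  # [100, 50.0, 25.0, 12.5]

# a1 = 1 (unit root): the series never recovers, it just stays where it is
print(simulate(c=0, a1=1, start=100, steps=3))    # [100, 100, 100, 100]
```

With a unit root, any shock added along the way would stay in the series forever, which is exactly why such a series is hard to predict.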

What is the significance of p-value in ADF test?
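The null hypothesis of the ADF test is that the series has a unit root, i.e. it is non-stationary. A high p-value, say 0.87, means we cannot reject this null hypothesis, so we treat the series as non-stationary. It does not mean that there is an 87% probability that the series is non-stationary. A p-value below the chosen significance level (commonly 0.05) lets us reject the null hypothesis and treat the series as stationary.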
We do multiple differencing in the dataset to make it stationary

library(tseries)  # adf.test() is available in the tseries package
adf.test(diff(time_series))

In the above snippet, we difference the time series once and then test for stationarity using the ADF test in R
You can also try second-order differencing, or differencing after taking the log, to check the stationarity (if the noise is high)

adf.test(diff(log(time_series)))

A rule of thumb – Don’t over-difference, i.e. don’t apply 6-7 rounds of differencing just to push the p-value down and make the dataset look stationary.

In the case of a first difference, we are literally taking the difference between a value and the one for the time period immediately before it. If you need a high order of differencing, it clearly means that your data has too much noise to follow a time series pattern

Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and therefore eliminating (or reducing) trend and seasonality.
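
Here is a tiny Python sketch of what differencing does to a trend. The helper `difference` is a hand-written stand-in for what R's `diff()` does with its default settings, and the series is made up.

```python
def difference(series, lag=1):
    """Return each value minus the value `lag` periods before it."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


trend = [10, 12, 14, 16, 18]  # illustrative series with a steady upward trend

first_diff = difference(trend)
second_diff = difference(first_diff)

print(first_diff)   # [2, 2, 2, 2] -> the trend is gone, the level is constant
print(second_diff)  # [0, 0, 0]    -> differencing again adds nothing useful here
```

Each round of differencing also makes the series one observation shorter, which is one more reason not to overdo it.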

Bottom line :-
-the value of today depends on the value of yesterday and some randomness we can’t predict
-Stationarity is useful to identify the pattern in order to predict values
-You do a difference of order one, two, three, etc. to get to a stationary value for the dataset
-Do an ADF or a KPSS test to check if the series is stationary
-After that, relax 🙂

The Data Monk e-books

Tired of online courses costing 2 to 8 lakh and taking more than a year to complete?
Tired of going through 500+ hours of videos at a super slow pace?
We at The Data Monk believe that you have to start and complete things as quickly as possible. We believe in a target-based study where we break a topic into 100 questions and make sure that if you cover these questions you will surely be able to crack the interview questions. Rest all theory and practical can ONLY be learned while working in an organization.


Pick any of our books from our e-shop page and complete it in 6-8 hours, learn the 100 questions and write it in your resume. We guarantee you that you will nail 8 out of 10 interviews

We also have 3 bundles at a price that is affordable to everyone. We are a group of people placed in the best of the product-based companies and we take 100+ interviews per week. Do we know what is being asked and what is not? So, just grab any of the following book bundles and give not more than 30 days to LEARN all the questions. We guarantee you that you will become a very strong candidate in any analytics interview

Set A – [3rd/4th year/ and 0 to 3 years of experience]

Crack any analytics or data science interview with our 1400+ interview questions which focus on multiple domains i.e. SQL, R, Python, Machine Learning, Statistics, and Visualization. – https://thedatamonk.com/product/books-to-crack-analytics-interview/

Set B – [0-5 Years of Experience]

1200+ Interview Questions on all the important Machine Learning algorithms (including complete Python code) Ada Boost, CNN, ANN, Forecasting (ARIMA, SARIMA, ARIMAX), Clustering, LSTM, SVM, Linear Regression, Logistic Regression, Sentiment Analysis, NLP, K-Mean – https://thedatamonk.com/product/machine-learning-interview-questions/

Set C – [0-7 Years of Experience]

2000+ interview questions that include 100 questions each on 12 most asked Machine Learning Algorithms, Python, Numpy and Pandas – 300 Interview Questions, Pandas,PCA,AWS,Data Preprocessing,Case Studies, and many more
https://thedatamonk.com/product/the-data-monk-e-book-bundle/

Note – Set C contains all the questions of Set B


Youtube Channel – The Data Monk

Unlike other YouTube channels, we do not teach basic stuff; we teach only topics that are asked in interviews. If the interviewer asks about p-value, we will have a video on that topic.
If the interviewer is interested in asking the sequence of execution of SQL commands, then we will give you an overview of all the commands but stress that question so much that you can answer it comfortably in the interview. We definitely recommend you to follow our YouTube channel for any topic that you are interested in or weak at.

If you wish to get all the study material and topics to cover for an interview at one place, then you can subscribe to our channel. We have covered the complete syllabus of
Get all the youtube videos playlist on our youtube Channel – The Data Monk

Code in Python for Data Science – Understand one algorithm at a time in 30 minutes (theory and python code)
Company-wise Data Science Interview Questions – 15 videos on how to crack analytics interview
Complete Numpy Tutorial – 14 videos on all the functions and questions on Numpy
Complete Python Pandas Tutorial – 15 videos to completely cover Pandas
SQL Complete Playlist – 20 highly recommended videos to cover all the interview questions
Case Study and Guesstimates Complete Playlist –  Real-life interview case study asked in 2021
Statistics– 10 videos to completely cover Statistics for interviews


Lastly,
If you are in dire need of any help, be it book-wise or guidance-wise, then you can definitely connect with me on Linkedin. We will try to help as much as possible