
Missing Value Treatment – Mean, Median, Mode, KNN Imputation, and Prediction

Missing value treatment is one of the most important steps in building a model. Why?
Because we can't afford to drop every row that has a missing value in any of its columns. On real data, that can throw away a large share of the rows. We need to handle missing values in the best possible way. There are multiple ways to deal with them, and these are my top four methods (plus a bonus):


1. Mean – When should you fill a column with its average? There is a saying that goes like this: "When a billionaire walks into a small bar, everyone becomes a millionaire" (on average).
So avoid using the mean for missing value treatment when the values span a very wide range, i.e. when there are extreme outliers. Suppose there are 10,000 employees with a salary of Rs. 40,000 each and 100 employees with a salary of Rs. 1,00,000 each. The few higher salaries barely move the average, so in this case you can consider using the mean for missing value treatment.

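But suppose there are 10 employees: 8 earn Rs. 40,000, one earns Rs. 10,00,000, and the tenth employee's salary is missing. The mean of the known salaries is about Rs. 1,46,667, more than three times what most of them earn, so you should avoid using the mean here. You can use the mode instead!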
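A minimal base-R sketch of mean imputation, using made-up salary vectors modelled on the two examples above (the vectors and the helper name `impute_mean` are illustrative, not from a real dataset). The first vector has 10,000 salaries of Rs. 40,000, 100 of Rs. 1,00,000 and one missing value; the second has eight salaries of Rs. 40,000, one of Rs. 10,00,000 and one missing value. It shows how a single extreme value drags the fill value away from what a typical employee earns.

```r
# Hypothetical helper: replace NAs in a numeric vector with the mean of the observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Example 1: large, fairly homogeneous group -> the mean stays close to typical salaries
salaries_1 <- c(rep(40000, 10000), rep(100000, 100), NA)
mean(salaries_1, na.rm = TRUE)   # about 40,594

# Example 2: small group with one extreme salary -> the mean is misleading
salaries_2 <- c(rep(40000, 8), 1000000, NA)
mean(salaries_2, na.rm = TRUE)   # about 146,667, far above what 8 of the 9 known people earn

filled <- impute_mean(salaries_2)
filled[10]                       # the missing salary is filled with that inflated mean
```

In the first case the mean is a sensible stand-in; in the second it describes nobody in the group.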

2. Median – The median is the middle value when the values are sorted in ascending or descending order (with an even count, it is the average of the two middle values). Can you think of an example where you would use it? The answer is at the bottom of the article.
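A small base-R sketch of median imputation on a made-up age column (the data and the helper name `impute_median` are illustrative). With one unusually high age in the column, the median stays near the bulk of the values while the mean is pulled upward.

```r
# Hypothetical helper: replace NAs with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Made-up ages: eight observed values (one of them an outlier) and one missing
ages <- c(23, 25, 26, 27, 28, 30, 31, 85, NA)

median(ages, na.rm = TRUE)   # 27.5: even count, so the average of 27 and 28
mean(ages, na.rm = TRUE)     # 34.375: pulled up by the 85

ages_filled <- impute_median(ages)
ages_filled[9]               # the missing age becomes 27.5
```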

3. Mode – The mode is the most frequently occurring value. As discussed in point 1, you can use the mode when values repeat a lot. It also works for categorical columns, where a mean or median cannot be computed.
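Base R has no built-in statistical mode (`mode()` returns an object's storage type), so this sketch defines a hypothetical helper, `stat_mode`, and uses it to fill gaps in a made-up salary vector and a made-up categorical column.

```r
# Hypothetical helper: most frequent non-missing value (the first one wins on ties)
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Replace NAs with the mode of the observed values
impute_mode <- function(x) {
  x[is.na(x)] <- stat_mode(x)
  x
}

salaries <- c(rep(40000, 8), 1000000, NA)
stat_mode(salaries)                    # most frequent salary: 40000
salaries_filled <- impute_mode(salaries)

# The same helper works for a categorical column
city <- c("Delhi", "Pune", "Delhi", NA, "Mumbai")
city_filled <- impute_mode(city)       # the NA becomes "Delhi"
```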

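4. KNN Imputation – In my view, this is the best of the four. For each row with a missing value, the k most similar rows (its nearest neighbours) are found, and the missing value is filled in from their values. The similarity of two rows is measured with a distance function, such as Euclidean distance.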
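To make the idea concrete, here is a simplified, from-scratch KNN imputation in base R. It is not the code of any package: the helper name `knn_impute`, the column standardisation and the plain (unweighted) average of neighbours are my assumptions. It also assumes all columns are numeric, no column is constant, and there are at least k complete rows.

```r
# Simplified KNN imputation for an all-numeric data frame (base R only).
# For each row with NAs: measure Euclidean distance to every complete row
# on the columns that row does have, take the k closest complete rows,
# and fill each missing cell with the mean of that column over those rows.
knn_impute <- function(df, k = 3) {
  m <- as.matrix(df)
  # Standardise columns so one large-scale column does not dominate the distance
  centers <- colMeans(m, na.rm = TRUE)
  scales  <- apply(m, 2, sd, na.rm = TRUE)
  z <- sweep(sweep(m, 2, centers), 2, scales, "/")

  complete <- which(complete.cases(m))
  out <- m
  for (i in which(!complete.cases(m))) {
    obs <- !is.na(m[i, ])   # columns observed in row i
    d <- apply(z[complete, obs, drop = FALSE], 1,
               function(r) sqrt(sum((r - z[i, obs])^2)))
    nn <- complete[order(d)[seq_len(min(k, length(complete)))]]
    out[i, !obs] <- colMeans(m[nn, !obs, drop = FALSE])
  }
  as.data.frame(out)
}

# Made-up data: row 5 is missing age; its income and experience
# are closest to rows 3 and 4 (ages 45 and 47)
df <- data.frame(
  age    = c(25, 27, 45, 47, NA, 26),
  income = c(30, 32, 80, 85, 82, 31),
  exp    = c(2, 3, 20, 22, 21, 2)
)
res <- knn_impute(df, k = 2)
res$age[5]   # average of the two neighbours' ages: 46
```

Package implementations add refinements (for example, distance-weighted averages or categorical support), but the core loop is the same: find neighbours, then borrow their values.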

In one hackathon, I had to impute the missing values of an age column, so I tried the following (in R):

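library(DMwR2)   # knnImputation() is in DMwR2 (the successor of the archived DMwR package)
new_dataset <- knnImputation(data = df, k = 8)   # fill NAs in df using the 8 nearest neighbours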

k-nearest neighbours can impute both qualitative (categorical) and quantitative (numeric) attributes, but it is slow and CPU-heavy on large datasets, because it has to compute distances between many pairs of rows.

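For time-series data, the imputeTS package can fill gaps by interpolating between neighbouring observations (imputeTS 3.0 and later name the function na_interpolation(); older versions used na.interpolation()):

install.packages("imputeTS")   # one-time install
library(imputeTS)

# Short series with one missing value at position 5
x <- ts(c(12, 23, 41, 52, NA, 71, 83, 97, 108))

na_interpolation(x)                      # linear interpolation (default)

na_interpolation(x, option = "spline")   # spline interpolation

na_interpolation(x, option = "stine")    # Stineman interpolation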
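To see what linear interpolation does, here is a base-R sketch using `stats::approx()` (my choice for illustration, not imputeTS's own code). Each missing point is read off the straight line between its nearest observed neighbours, so the gap between 52 and 71 in the series above becomes 61.5.

```r
# Same values as the series above, as a plain numeric vector
x <- c(12, 23, 41, 52, NA, 71, 83, 97, 108)
idx <- seq_along(x)
known <- !is.na(x)

# approx() interpolates linearly between the known (index, value) pairs
filled <- x
filled[!known] <- approx(idx[known], x[known], xout = idx[!known])$y
filled[5]   # (52 + 71) / 2 = 61.5
```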



5. Bonus – Prediction
This is another way of filling in missing values: train a model, such as linear regression or a time-series model, on the rows where the column is present, then use it to predict the values in the rows where it is missing.
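A minimal sketch of regression imputation with base R's `lm()`. The data frame is synthetic and chosen so that age is exactly 22 + experience, which makes the predictions easy to check; on real data the fit would not be exact.

```r
# Synthetic data: age is missing in rows 4 and 7
df <- data.frame(
  age        = c(22, 25, 30, NA, 39, 45, NA),
  experience = c(0, 3, 8, 12, 17, 23, 5)
)

# Fit the model only on rows where age is known
train <- df[!is.na(df$age), ]
fit <- lm(age ~ experience, data = train)

# Predict age for the rows where it is missing
missing <- is.na(df$age)
df$age[missing] <- predict(fit, newdata = df[missing, ])
df$age   # rows 4 and 7 become 34 and 27
```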

Answer to the median question: you can use the median to fill a column such as age when there is low variance in age.


I came across KNN imputation recently, so I thought I'd share it!

Keep Learning 🙂
The Data Monk

The Data Monk services

We are well known for our interview books and have 70+ e-books across Amazon and The Data Monk e-shop page. Following are the best-seller combo packs and services we currently provide:

  1. YouTube channel covering all the important interview topics in SQL, Python, MS Excel, Machine Learning Algorithms, Statistics, and Direct Interview Questions
    Link – The Data Monk Youtube Channel
  2. Website – ~2000 solved interview questions in SQL, Python, ML, and Case Studies
    Link – The Data Monk website
  3. E-book shop – We have 70+ e-books available on our website and 3 bundles covering 2000+ solved interview questions. Do check it out
    Link – The Data Monk E-shop Page
  4. Instagram Page – It covers only the most asked interview questions and concepts, with 100+ topics explained in simple terms
    Link – The Data Monk Instagram page
  5. Mock Interviews/Career Guidance/Mentorship/Resume Making
    Book a slot on Top Mate

The Data Monk e-books

We know that each domain requires a different type of preparation, so we have divided our books in the same way:

1. 2200 Interview Questions to become Full Stack Analytics Professional – 2200 Most Asked Interview Questions
2. Data Scientist and Machine Learning Engineer – 23 e-books covering all the ML Algorithms Interview Questions
3. 30 Days Analytics Course – Most Asked Interview Questions from 30 crucial topics

You can check out all the other e-books on our e-shop page – do not miss it!


For any information related to courses or e-books, please send an email to nitinkamal132@gmail.com

About TheDataMonk (Grand Master)

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)
