Multicollinearity in Simple Terms

We all know the definition of multicollinearity: when two or more explanatory variables in a multiple regression model are highly linearly related, it is called multicollinearity.

Examples:
Age and selling price of a car
Education and Annual Income
Height and Weight

Why should we remove multicollinearity from our model?
For example, suppose you are watching WWE and Batista is thrashing Undertaker one on one. You can tell that Batista is the better wrestler.
But suppose it's a Royal Rumble where 5 wrestlers are beating Undertaker simultaneously. Now you can't say who is attacking with what intensity, and so you can't say which of the five wrestlers is better.

Thus, when you have multiple variables that are correlated with each other, the model cannot assign a proper weight to the impact of each variable. So, we need to remove the redundant variables.
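To see the Royal Rumble effect in numbers, here is a small simulation sketch. Everything in it is an assumption made for illustration: the data is synthetic, the function name coefficient_spread is hypothetical, and NumPy is used for the least-squares fit. It refits the same two-variable regression on many fresh samples and measures how much the estimated weight of the first variable jumps around.

```python
import numpy as np

def coefficient_spread(correlation, n_samples=200, n_trials=300, seed=0):
    """Std. dev. of the fitted weight of x1 across repeated synthetic samples."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_trials):
        x1 = rng.normal(size=n_samples)
        noise = rng.normal(size=n_samples)
        # Build x2 so that its correlation with x1 is about `correlation`.
        x2 = correlation * x1 + np.sqrt(1 - correlation**2) * noise
        # The true weights of x1 and x2 are both 1.
        y = x1 + x2 + rng.normal(size=n_samples)
        X = np.column_stack([np.ones(n_samples), x1, x2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta[1])  # fitted weight of x1
    return float(np.std(estimates))

spread_independent = coefficient_spread(0.0)   # x1 and x2 unrelated
spread_correlated = coefficient_spread(0.99)   # x1 and x2 almost identical
print(spread_independent, spread_correlated)
```

When the two variables are almost copies of each other, the fitted weight of x1 should vary several times more from sample to sample, even though its true weight never changes.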

Which methods are used to remove multicollinearity?
There are two methods:
1. VIF – It stands for Variance Inflation Factor. In regression analysis, VIF measures how strongly each factor is correlated with the other factors (multicollinearity). Such correlation can distort p-values and make the model less reliable.

Factors with a high VIF should be removed. A VIF of 1 suggests no correlation with the other factors.

2. PCA – Principal component analysis (PCA) is a technique used to emphasise variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualise. The principal components it produces are uncorrelated with each other, so using them in place of the original variables removes the collinearity.
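Both methods can be tried out with a few lines of NumPy. The sketch below is only an illustration on made-up data: the function names variance_inflation_factors and principal_components are hypothetical, and the height, weight and age columns are synthetic. The VIF of a variable is computed as 1 / (1 - R^2), where R^2 comes from regressing that variable on all the others. The PCA part standardises the columns and projects them onto the principal directions obtained from an SVD.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ beta
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

def principal_components(X):
    """Scores of the standardised data on its principal directions."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T

# Synthetic data (an assumption for this sketch): weight is tied to height,
# age is unrelated to both.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=500)
weight = 0.9 * height + rng.normal(0, 3, size=500)
age = rng.normal(35, 8, size=500)
features = np.column_stack([height, weight, age])

vifs = variance_inflation_factors(features)
scores = principal_components(features)
score_corr = np.corrcoef(scores, rowvar=False)

print("VIF (height, weight, age):", vifs)
print("Correlation between components:\n", np.round(score_corr, 6))
```

Height and weight should show a high VIF while age stays close to 1, and the correlation matrix of the components should come out as the identity matrix.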

How do we deal with multicollinearity in our model?
1. You can use feature engineering to combine the two variables into one and then use this combined variable

2. Use VIF/PCA to eliminate one of the variables
You should eliminate the one which is less strongly correlated with the target variable
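The elimination rule in point 2 can be sketched as follows. This is a minimal illustration, not a complete feature selection routine: the helper choose_feature_to_drop is a hypothetical name, and the education, noisy_education and income columns are synthetic data invented for the example.

```python
import numpy as np

def choose_feature_to_drop(x_a, x_b, y):
    """Return 'a' or 'b': the feature with the weaker absolute correlation to y."""
    corr_a = abs(np.corrcoef(x_a, y)[0, 1])
    corr_b = abs(np.corrcoef(x_b, y)[0, 1])
    return "a" if corr_a < corr_b else "b"

# Synthetic data (an assumption for this sketch): income depends on education,
# and noisy_education is a noisier measurement of the same thing.
rng = np.random.default_rng(0)
education = rng.normal(14, 3, size=400)
noisy_education = education + rng.normal(0, 1, size=400)
income = 5 * education + rng.normal(0, 5, size=400)

dropped = choose_feature_to_drop(education, noisy_education, income)
print("Drop feature:", dropped)
```

Here the two features carry almost the same information, so the one that tracks the target less closely is the natural candidate to drop.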

I think this is enough about multicollinearity. Let me know if you have any questions.

Keep Learning 🙂
The Data Monk

About TheDataMonkNewbie

I am the Co-Founder of The Data Monk. I have a total of 4+ years of analytics experience with 3+ years at Mu Sigma and 1 year at OYO. I am an active trader and a logically sarcastic idiot :)
