What should you do if two variables in a linear regression model show high multicollinearity? How do you decide which one to remove from the model?

Question

You have two variables, age and years of experience, to predict a person's salary.
Age and years of experience are highly correlated. What should you do?

TheDataMonk (Grand Master) · 6 answers · 746 views

Answers (6)


    We can check for multicollinearity using the variance inflation factor (VIF). A feature is usually considered affected by multicollinearity when its VIF is greater than 10. So, if two columns are highly correlated with each other, we can drop one of them.
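A minimal sketch of the VIF check, in Python with NumPy, on made-up data (the `vif` helper and the age/experience numbers are hypothetical, for illustration only). Each column is regressed on the others with an intercept, and VIF = 1 / (1 - R^2). statsmodels also offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; this version simply avoids the dependency. One thing the example makes visible: with exactly two predictors, both get the same VIF, 1 / (1 - r^2) where r is their correlation, so VIF flags the pair as collinear but cannot by itself say which of the two to drop.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared of an OLS
    regression (with intercept) of column j on all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Hypothetical data: years of experience is roughly age - 22, plus noise.
rng = np.random.default_rng(0)
age = rng.uniform(22, 60, size=200)
experience = age - 22 + rng.normal(0, 2, size=200)
X = np.column_stack([age, experience])

vifs = vif(X)
r = np.corrcoef(age, experience)[0, 1]
print("VIF (age, experience):", vifs.round(2))
print("1 / (1 - r^2):        ", round(1 / (1 - r ** 2), 2))
print("VIF > 10:             ", vifs > 10)
```

With this synthetic data both VIFs land well above 10 and agree with each other up to floating-point rounding.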

  1. We can remove the variable that has the weaker correlation with the dependent variable.

    Alternatively, we can remove whichever of the two has the higher VIF.

  2. If age and years of experience are highly correlated with each other, then one of the variables will usually have to be removed so that the model gives reliable estimates of each variable's effect on the salary of the person.
    What happens is that, because age and years of experience are so strongly correlated, the model cannot cleanly separate their effects: part of the effect attributed to age actually comes through the ‘years of experience’ variable, so the estimated effect of age on salary is distorted and its standard error is inflated.
    To deal with this multicollinearity, we compute the variance inflation factor (VIF) for each variable. The variable with the highest VIF is the one to remove from the model, since its coefficient estimate is the one most inflated by its overlap with the other predictors.

  3. We can keep the column that has the stronger correlation with the target variable. We can also fit two separate models, each keeping only one of the variables, and see which gives better predictive accuracy (for example, lower error on held-out data). If that is also the same, we can look at VIF.

  4. Suppose we have two independent variables, age and years of experience, that are highly correlated with each other. We then need to remove one of them to avoid multicollinearity, which would otherwise make the model's estimates of how each variable affects the salary of a person unreliable.
    We can remove the variable that is not significant, or less significant, in the model.
    We can also use the variance inflation factor (VIF) to detect multicollinearity for each variable. The variable with the highest VIF is the one to remove from the model, since its coefficient estimate is the one most inflated by multicollinearity.

  5. VIF method: calculate the VIF for each variable; if it is greater than 10, that variable is a candidate for removal.
    Keep the variable that is more highly correlated with the dependent variable and remove the other.
    Compare the error using forward and backward stepwise regression, and keep the variable accordingly.
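The comparison suggested in answers 1, 3 and 5 (keep the variable that correlates more strongly with salary, or that gives the lower error when used on its own) can be sketched as follows. This is a minimal illustration under stated assumptions: the data is synthetic and hypothetical, generated so that salary depends on experience while age only tracks experience; a single random 70/30 holdout split stands in for proper cross-validation; and `holdout_rmse` is a hypothetical helper. Picking the better of the two single-variable models is roughly what the first step of forward selection would do with these two candidates.

```python
import numpy as np

# Hypothetical synthetic data: salary is driven by experience, and age
# only tracks experience (experience ~ age - 22 plus noise).
rng = np.random.default_rng(1)
n = 500
age = rng.uniform(22, 60, size=n)
experience = age - 22 + rng.normal(0, 3, size=n)
salary = 30_000 + 2_000 * experience + rng.normal(0, 5_000, size=n)


def holdout_rmse(x, y, train_frac=0.7, seed=0):
    """Fit y = a + b*x by least squares on a random training split and
    return the root-mean-squared error on the held-out rows."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_frac * len(y))
    train, test = idx[:cut], idx[cut:]
    b, a = np.polyfit(x[train], y[train], deg=1)  # returns [slope, intercept]
    pred = a + b * x[test]
    return float(np.sqrt(np.mean((y[test] - pred) ** 2)))


results = {}
for name, x in {"age": age, "experience": experience}.items():
    corr = np.corrcoef(x, salary)[0, 1]   # correlation with the target
    rmse = holdout_rmse(x, salary)         # error of the one-variable model
    results[name] = (corr, rmse)
    print(f"{name:>10}: corr with salary = {corr:.3f}, holdout RMSE = {rmse:,.0f}")

# Keep the variable whose single-variable model has the lower holdout error.
keep = min(results, key=lambda k: results[k][1])
print("Keep:", keep, "| drop:", "age" if keep == "experience" else "experience")
```

On real data the two criteria can disagree and either variable may win; the sketch shows the procedure, not a general result.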
