Walmart Labs Interview Question | Better Regression
Question
Which one would likely perform better: Linear Regression or Random Forest Regression? Why?
Machine Learning
4 years
Answers ( 5 )
Linear regression works well when the relationship between the features and the target is approximately linear. It fits the best-fit line to the data by least squares, so the error is low in that case.
But if the data is non-linear and has multiple features, random forest works well, as it combines several decision trees to fit the data.
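The contrast described above can be checked on synthetic data. The following is a minimal sketch, assuming scikit-learn and NumPy are available; the data, the noise level, and the names `targets` and `scores` are made up for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(600, 1))
noise = rng.normal(0.0, 0.1, size=600)

# Two illustrative targets: one linear in x, one strongly non-linear in x.
targets = {
    "linear": 2.0 * X[:, 0] + 1.0 + noise,
    "non-linear": np.sin(2.0 * X[:, 0]) + noise,
}

scores = {}
for name, y in targets.items():
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    lin = LinearRegression().fit(X_train, y_train)
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
    # .score() returns R^2 on the held-out data.
    scores[name] = {
        "linear_regression": lin.score(X_test, y_test),
        "random_forest": rf.score(X_test, y_test),
    }
    print(name, scores[name])
```

On the linear target both models should score close to 1, while on the sine-shaped target the straight line explains little of the variance and the forest follows the curve. This is one toy setup, not a general benchmark.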
Linear Regression works well when the features are roughly linearly related to the target variable and are mostly continuous.
Random Forest Regression works well when there are many categorical features and the relationship between the features and the target variable is non-linear.
Another situation is when the data is highly skewed: non-parametric models like RF Regression work very well in such cases.
Depends on the data.
If there is a fairly linear relationship between the dependent and independent variables, then Linear Regression would perform better.
If the relationship is not linear, then Linear Regression would not be able to capture the non-linear patterns, and hence Random Forest would perform better in such scenarios.
For a more detailed explanation, refer to the following link:
https://www.quora.com/When-are-random-forests-RFs-better-than-linear-regression-models-Do-RFs-fit-nonlinear-data-better-from-the-get-go-without-a-plotting-transform-dance-Are-there-other-cases-where-I-should-use-RF-instead-of-trusty-old-linear-models
There certainly are situations where Linear Regression outperforms Random Forests, but I think the more important thing to consider is the complexity of the model.
Linear models have very few parameters; Random Forests have a lot more. That means Random Forests can overfit more easily than Linear Regression, especially on small or noisy datasets.
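One way to probe the complexity argument is to compare training and test scores on a small, noisy dataset whose true relationship is linear. This is a rough sketch assuming scikit-learn and NumPy; the coefficients, sample size, and noise level are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
# Hypothetical ground truth: linear in three of the five features, plus noise.
true_coef = np.array([1.5, -2.0, 0.5, 0.0, 0.0])
y = X @ true_coef + rng.normal(0.0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

lin = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_train, y_train)

# Gap between training R^2 and test R^2: a larger gap suggests more overfitting.
lin_gap = lin.score(X_train, y_train) - lin.score(X_test, y_test)
rf_gap = rf.score(X_train, y_train) - rf.score(X_test, y_test)
print(f"linear regression gap: {lin_gap:.3f}")
print(f"random forest gap:     {rf_gap:.3f}")
```

In a setup like this the forest typically scores much higher on the training split than on the test split, while the linear model scores about the same on both. How large the gap is depends on the data and on forest settings such as tree depth.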
Key advantages of linear models over tree-based ones are:
1. They can extrapolate (e.g. if labels are between 1 and 5 in the train set, the tree-based model will never predict 10, but a linear model will)
2. They can be used for anomaly detection because they extrapolate
3. Interpretability (yes, tree-based models have feature importance, but it’s only a proxy; the weights in a linear model are better)
4. They need less data to get good results
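The extrapolation point in item 1 can be reproduced with a toy dataset whose labels lie between 1 and 5. This is a minimal sketch assuming scikit-learn and NumPy; the data (y equal to x) is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Training data: x in [1, 5] and y = x, so all labels lie between 1 and 5.
X_train = np.linspace(1.0, 5.0, 50).reshape(-1, 1)
y_train = X_train[:, 0]

lin = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Query a point far outside the training range.
X_new = np.array([[10.0]])
lin_pred = lin.predict(X_new)[0]
rf_pred = rf.predict(X_new)[0]

print("linear regression at x=10:", lin_pred)
print("random forest at x=10:   ", rf_pred)
print("largest training label:  ", y_train.max())
```

The linear model continues the fitted line and predicts about 10, whereas the forest averages training labels stored in its leaves, so its prediction cannot exceed the largest label seen in training.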
Depends on the data.
Linear regression will perform better when the dependent variable is continuous and there exists a linear relationship between the dependent and independent variables.
Random forest will perform better when there is a larger number of categorical variables.