What makes Ridge Regression different from Linear Regression?
Question
Explain in simple terms
Answers ( 8 )
1) The main idea behind Linear Regression is to minimize the Residual Sum of Squares (RSS),
which is the sum of the squared vertical distances of each point from the 'line of fit'.
The aim of Linear Regression is to minimize -> e1^2 + e2^2 + ... + en^2 (where e1, e2, ..., en are the respective error terms).
2) The main idea behind Ridge Regression is to shrink the coefficients of the Linear Regression
and introduce a little bit of bias into the model so that it gives better overall predictions
when it is exposed to unseen data.
The aim of Ridge Regression is to minimize -> e1^2 + e2^2 + ... + en^2 + (lambda * slope^2)
Here, slope refers to the slope of the line. With more than one feature, the penalty is lambda times the sum of the squared coefficients.
The (slope^2) term is the penalty, and lambda determines how severe the penalty is going to be.
The value of lambda is usually chosen by cross-validation.
If we look at the figure above, where we are predicting size from weight,
when the slope of the line is small, the predictions of size are much less sensitive to changes in weight.
This is what Ridge Regression tries to achieve.
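To make the two objectives above concrete, here is a minimal sketch for the one-feature case. It assumes the intercept is not penalized (a common convention, but an assumption here), and the weight/size numbers are made up for illustration. Under that assumption the ridge slope has a closed form: Sxy / (Sxx + lambda), which reduces to the usual least-squares slope when lambda is 0.

```python
def fit_line(x, y, lam=0.0):
    """Fit y ~ intercept + slope * x while penalizing lam * slope**2.

    lam = 0 gives ordinary least squares; lam > 0 gives ridge.
    Assumption: the intercept is left unpenalized.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sxy and Sxx are the usual centered sums from simple linear regression.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + lam)  # lam in the denominator shrinks the slope
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up data, for illustration only.
weight = [1.0, 2.0, 3.0, 4.0, 5.0]
size = [1.2, 1.9, 3.2, 3.8, 5.1]

ols_intercept, ols_slope = fit_line(weight, size)
ridge_intercept, ridge_slope = fit_line(weight, size, lam=5.0)
print("OLS slope:  ", ols_slope)
print("Ridge slope:", ridge_slope)
```

The ridge slope comes out smaller than the least-squares slope, so the predicted size reacts less strongly to a change in weight.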
Ridge regression can be considered a regularization technique that is used to reduce over-fitting. Essentially, it adds an extra term, lambda*slope**2, to the loss, which penalizes the model for large coefficients.
To explain why we use Ridge regression, let's first try to understand what Bias and Variance are.
Bias – The inability of a machine learning method (like linear regression) to capture the true relationship is called Bias.
Because a straight line can't be curved like the 'true' relationship, it has a relatively large amount of Bias.
Variance – The difference in fits between data sets is called Variance.
Now, coming to Ridge Regression: we add a little bias so that the fit changes less from one data set to another (low Variance), which tends to give better predictions on future data.
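The trade-off described above can be checked with a small simulation. This is only a sketch under stated assumptions: the true slope (1.0), the noise level, the fixed x values, and lambda = 5.0 are all made up, and the one-feature closed form Sxy / (Sxx + lambda) is used for the slope. The idea is to draw many data sets from the same relationship, fit each one with and without the penalty, and compare how much the fitted slopes move around.

```python
import random


def slope(x, y, lam=0.0):
    """One-feature slope; lam = 0 is least squares, lam > 0 is ridge."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / (sxx + lam)


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


random.seed(0)  # fixed seed so the run is repeatable
x = [1.0, 2.0, 3.0, 4.0, 5.0]
true_slope = 1.0  # assumed "true" relationship for the simulation

ols_slopes, ridge_slopes = [], []
for _ in range(1000):
    # Each pass simulates a fresh data set from the same relationship.
    y = [true_slope * xi + random.gauss(0.0, 1.0) for xi in x]
    ols_slopes.append(slope(x, y))
    ridge_slopes.append(slope(x, y, lam=5.0))

print("OLS   mean slope, variance:", mean(ols_slopes), variance(ols_slopes))
print("Ridge mean slope, variance:", mean(ridge_slopes), variance(ridge_slopes))
```

In this setup the least-squares slopes average close to the true slope but spread out more, while the ridge slopes average below the true slope (that is the added bias) and spread out less (that is the lower variance).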
Ridge Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When
multicollinearity occurs, least squares estimates are unbiased, but their variances are large so they may be far from
the true value. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.
It is hoped that the net effect will be to give estimates that are more reliable.
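A small sketch of the multicollinearity point, under stated assumptions: the two features below are made up and almost identical, y is roughly 2 * x1 plus a little noise, everything is already centered (so no intercept is needed), and lambda = 1.0 is an arbitrary choice. The coefficients come from solving (X'X + lambda * I) beta = X'y, written out by hand for the 2x2 case.

```python
def ridge_two_features(x1, x2, y, lam=0.0):
    """Solve (X'X + lam * I) beta = X'y for two centered features."""
    a = sum(u * u for u in x1) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    r1 = sum(u * t for u, t in zip(x1, y))
    r2 = sum(v * t for v, t in zip(x2, y))
    det = a * d - b * b  # close to zero when x1 and x2 are nearly collinear
    return (d * r1 - b * r2) / det, (a * r2 - b * r1) / det


# Made-up, centered data: x2 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 1.1, 1.9]
y = [-3.7, -2.3, 0.0, 1.7, 4.3]  # roughly 2 * x1 plus small noise

ols = ridge_two_features(x1, x2, y)             # lam = 0: least squares
ridge = ridge_two_features(x1, x2, y, lam=1.0)  # lam > 0: ridge
print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

With lambda = 0 the two coefficients come out large and of opposite sign, even though the features carry almost the same information. With the penalty, the effect is split between the two features in a much more moderate way.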
Linear Regression: Linear Regression is used for prediction. It fits the coefficients that minimize the squared error on the training data (and hence the training MSE and RMSE), with no penalty on the size of the coefficients.
Ridge, by contrast, is used to shrink the coefficients of Linear Regression. It penalizes the coefficients for being too large, which matters most when multicollinearity is present. The strength of the penalty is set by a parameter called alpha (or lambda), which is chosen by cross-validation. The penalty is equal to lambda * (sum of squares of the weights); some formulations write it as (lambda/2) * (sum of squares of the weights).
It shrinks the coefficients of linear regression, but never to the point where they become exactly zero.
Because of that, Ridge does not perform feature selection on its own: variables whose coefficients are small in Ridge stay in the model, only with less influence on the predictions.
1) Ridge regression is basically a regularized linear regression model. Regularization helps to find a trade-off between fitting the training data well and overfitting it.
2) L = ∑(Ŷi − Yi)^2 + λ∑β^2 is the loss function for ridge regression. This loss function includes two elements. The first one is the sum of squared differences between each prediction and its ground truth. The second element is the regularization term. It sums the squared β values and multiplies the sum by another parameter, λ. The reason for doing that is to "punish" the model for high values of the coefficients β (slopes). We thus push the coefficients to be as small as possible, which keeps the model simple and reduces overfitting.
3) The λ parameter is a scalar that should be chosen using cross-validation.
4) A super important fact we need to notice about ridge regression is that it forces the β coefficients to be smaller, but it does not force them to be zero. That is, it will not get rid of irrelevant features but rather minimize their impact on the trained model.
EXTRA: The Lasso method addresses this limitation of Ridge regression by not only punishing high values of the coefficients β but actually setting them to zero if they are not relevant. Therefore, you might end up with fewer features included in the model than you started with. The loss function for lasso regression is L = ∑(Ŷi − Yi)^2 + λ∑|β|
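The difference between the two penalties in the points above shows up even with a single feature. The sketch below assumes one centered feature and no intercept, and the data is made up. For the ridge loss the minimizing slope is Sxy / (Sxx + λ); for the lasso loss it is the soft-thresholding formula sign(Sxy) * max(|Sxy| − λ/2, 0) / Sxx, which follows from setting the derivative of each loss to zero.

```python
import math


def ridge_slope(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam * b^2 for one centered feature."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam * |b| (soft-thresholding solution)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)  # exactly zero below the threshold
    return math.copysign(shrunk, sxy) / sxx


# Made-up, centered data.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_relevant = [-4.0, -2.0, 0.0, 2.0, 4.0]   # y = 2 * x
y_irrelevant = [0.1, -0.2, 0.0, 0.3, -0.1]  # barely related to x

lam = 1.0
print("relevant:  ", ridge_slope(x, y_relevant, lam), lasso_slope(x, y_relevant, lam))
print("irrelevant:", ridge_slope(x, y_irrelevant, lam), lasso_slope(x, y_irrelevant, lam))
```

For the barely related feature, ridge leaves a small non-zero slope, while lasso returns a slope of zero, which is what removes the feature from the model.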
Linear regression is the one where the model is not penalized for its choice of weights at all. That means that, during the training stage, if the model finds one feature particularly important, it may place a large weight on that feature. This sometimes leads to overfitting on small datasets.
Whereas,
Ridge goes a step further and penalizes the model for the sum of the squared values of the weights. Thus, the weights not only tend to have smaller absolute values and be more evenly distributed, but also tend to be close to zero.
Linear Regression – Its purpose is to minimize the Residual Sum of Squares (RSS). The Ordinary Least Squares (OLS) method finds unbiased coefficient estimates, i.e. it treats all independent variables alike and does not shrink any of them.
It does not take multicollinearity into account; thus, an OLS model becomes more complex as new variables are added, which can lead to overfitting.
Ridge Regression – It is basically a regularized linear regression model. Regularization helps to find a trade-off between variance and bias to prevent overfitting.
The idea here is to add a bias and shrink the coefficients of Linear Regression. We can tune the lambda parameter to control how much the model coefficients Beta (β) shrink: as lambda increases, they move toward zero without reaching it exactly.
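Several answers mention tuning lambda by cross-validation. A minimal sketch of that procedure for the one-feature case is shown below, using leave-one-out cross-validation. The data, the candidate lambda values, and the choice of leave-one-out are all assumptions made for illustration; the intercept is left unpenalized.

```python
def fit_line(x, y, lam):
    """One-feature ridge fit with an unpenalized intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + lam)
    return y_mean - slope * x_mean, slope


def loo_cv_error(x, y, lam):
    """Mean squared error when each point is predicted by a fit on the rest."""
    total = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train, lam)
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total / len(x)


# Made-up data, for illustration only.
weight = [1.0, 2.0, 3.0, 4.0, 5.0]
size = [1.2, 1.9, 3.2, 3.8, 5.1]

candidates = [0.0, 0.1, 1.0, 10.0]  # hypothetical grid of lambda values
cv_errors = {lam: loo_cv_error(weight, size, lam) for lam in candidates}
best_lam = min(cv_errors, key=cv_errors.get)
print(cv_errors)
print("lambda with the lowest held-out error:", best_lam)
```

Whichever lambda gives the lowest held-out error is the one to keep. On clean, nearly linear data like this, that can well be a very small lambda or even 0, in which case ridge and linear regression give almost the same line.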