Regularization

Question

I have been reading about regularization for some time and would like to hear what it means, in the simplest possible terms.

Also, please define L1 and L2 regularization as if you were talking to a 10-year-old child.

Raahul, 1 Answer, 592 views

Answer ( 1 )

  1. Regularization is a technique that slightly modifies the learning algorithm so that the model generalizes better, which improves its performance on unseen data. It works by penalizing large coefficients; in deep learning, that means penalizing the entries of each layer’s weight matrix. L1 and L2 are the most common types of regularization. Both modify the cost function by adding another term, known as the regularization term.

    Cost function = Loss (say, binary cross entropy) + Regularization term

    Because of this added term, training pushes the weight values toward smaller magnitudes. The underlying assumption is that a neural network with smaller weights is a simpler model, so the penalty also reduces overfitting to a considerable extent.

    However, this regularization term differs in L1 and L2.
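
    As a rough sketch of that difference, the two penalties can be written out as below. The symbols are notation assumed here rather than taken from the answer: w_j stands for the individual weights (or coefficients), and λ ≥ 0 is a strength setting you choose. Some texts put a factor of 1/2 in front of the L2 sum, which only rescales λ.

    ```latex
    \text{Cost function} = \text{Loss} + \lambda \, \Omega(w),
    \qquad
    \Omega_{L1}(w) = \sum_j |w_j|,
    \qquad
    \Omega_{L2}(w) = \sum_j w_j^2
    ```

    L1 adds up the absolute sizes of the weights, while L2 adds up their squares, so L2 punishes one large weight much more heavily than several small ones.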

  2. Okay, let’s say you have data on some people’s height, weight, and grade 12 marks. You want to predict a person’s weight from their height and grade 12 marks.

    Fitting a regression, you will get an equation such as:
    Weight = a * Height + b * Marks in 12th + Error

    Now, given a person’s height and grade 12 marks, you can predict their weight by plugging in the values. This is ordinary linear regression (with two predictors, strictly speaking multiple linear regression).

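    But there is a problem with this approach. Even though weight has nothing to do with grade 12 marks, plain regression will still try hard to fit a line (a plane in this case) through the data points, and it will usually give marks a nonzero coefficient b just to match noise in the training data.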

    This is where regularization comes into the picture. It penalizes the model when it tries to learn such estimates. In other words, the regularized line will fit the training data slightly less well, in exchange for better predictions on new data.

    In very simple words:
    L2 regularization (ridge regression):
    It shrinks the coefficients, but in general it does not make them exactly zero. There will still be a very small effect of marks on the weight.

    L1 regularization (lasso regression):
    It can set some coefficients to exactly 0. So you can get a model where marks are not considered at all.
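
To see the difference between the two penalties on the height and marks example, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data is synthetic (weight depends only on height plus noise), the helper names `standardize`, `soft_threshold`, and `fit` are hypothetical, the penalty strength `lam = 2.0` is arbitrary, and the solver is a simple coordinate descent rather than what any particular library does.

```python
import random

def standardize(values):
    """Rescale a list to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def soft_threshold(z, t):
    """Move z toward zero by t; anything within t of zero becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit(columns, y, lam, penalty, n_iter=100):
    """Coordinate descent on (1/2n) * sum(residual^2) + penalty.

    penalty="l1" adds lam * sum(|w_j|); penalty="l2" adds (lam/2) * sum(w_j^2).
    Assumes every column is standardized and y is centered.
    """
    n, p = len(y), len(columns)
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Residual of the fit that ignores feature j.
            resid = [
                y[i] - sum(w[k] * columns[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(columns[j][i] * resid[i] for i in range(n)) / n
            if penalty == "l1":
                w[j] = soft_threshold(rho, lam)  # can land exactly on 0
            else:
                w[j] = rho / (1.0 + lam)         # shrinks, but stays nonzero
    return w

# Synthetic data (an assumption): marks are unrelated to weight.
random.seed(0)
n = 200
height = [random.gauss(170, 10) for _ in range(n)]             # cm
marks = [random.gauss(70, 10) for _ in range(n)]               # grade 12 marks
weight = [0.9 * h - 90 + random.gauss(0, 5) for h in height]   # kg

columns = [standardize(height), standardize(marks)]
mean_weight = sum(weight) / n
y = [v - mean_weight for v in weight]

lam = 2.0  # arbitrary penalty strength chosen for this illustration
ridge_w = fit(columns, y, lam, "l2")
lasso_w = fit(columns, y, lam, "l1")
print("L2 (ridge) [height, marks]:", ridge_w)
print("L1 (lasso) [height, marks]:", lasso_w)
```

With this setup, the L1 fit should report a marks coefficient of exactly 0.0, while the L2 fit keeps a small nonzero one. Both shrink the height coefficient, which is the price paid for the penalty; a smaller `lam` shrinks less but is also less likely to zero out marks.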
