How do you ensure you’re not overfitting with a model?

Question

Explain methods to avoid overfitting

Pragya Rani 55 years 4 Answers 2016 views Contributor 1

Answers ( 4 )

  1. The question restates a fundamental problem in machine learning: a model can fit the noise in its training data, and because that noise does not repeat in unseen data, the model generalizes poorly.
    There are three main methods to avoid overfitting:
    1- Keep the model simpler: reduce variance by using fewer variables and parameters, so the model has less capacity to fit the noise in the training data.
    2- Use cross-validation techniques such as k-fold cross-validation to estimate how the model performs on data it was not trained on.
    3- Use regularization techniques such as LASSO, which penalize large model coefficients and so discourage the model from fitting noise.
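To make points 1 and 2 concrete, here is a minimal sketch in plain Python, under stated assumptions: the data is synthetic (y = 2x plus noise), and the helper names `fit_linear`, `fit_1nn`, `mse` and `kfold_mse` are made up for this illustration. It compares a simple linear fit with a 1-nearest-neighbour model that memorizes the training set, using training error and 5-fold cross-validation error.

```python
import random

def fit_linear(xs, ys):
    """Least-squares line for one feature; returns a predict function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_1nn(xs, ys):
    """1-nearest-neighbour regressor: it simply memorizes the training set."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a predict function on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def kfold_mse(fit, xs, ys, k=5):
    """Average held-out MSE over k folds; fold i holds out every k-th point."""
    scores = []
    for i in range(k):
        held_out = set(range(i, len(xs), k))
        train_x = [x for j, x in enumerate(xs) if j not in held_out]
        train_y = [y for j, y in enumerate(ys) if j not in held_out]
        test_x = [xs[j] for j in sorted(held_out)]
        test_y = [ys[j] for j in sorted(held_out)]
        model = fit(train_x, train_y)
        scores.append(mse(model, test_x, test_y))
    return sum(scores) / k

# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

for name, fit in [("linear", fit_linear), ("1-NN", fit_1nn)]:
    train_error = mse(fit(xs, ys), xs, ys)
    cv_error = kfold_mse(fit, xs, ys)
    print(f"{name}: train MSE = {train_error:.3f}, 5-fold CV MSE = {cv_error:.3f}")
```

The 1-NN model scores a training error of exactly zero because every training point is its own nearest neighbour, yet its held-out error is not zero. That gap between training and cross-validation error is the signature of overfitting; the linear model's two errors should typically sit much closer together.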

  2. Initially, I split the data into three parts: training, validation and test. Then I choose a cross-validation strategy that suits the objective, such as KFold, Stratified KFold, GroupKFold or TimeSeriesSplit, train my model on some of the folds and validate it on the held-out validation fold. I calculate the evaluation score for each fold, then the CV score (the mean of the evaluation scores over all folds) and its standard deviation. If the standard deviation is too large and the score on the test data falls outside my confidence interval (cv_score ± std), then I am overfitting the model.
    The reason might be a large number of features, or irrelevant or redundant features.
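The check described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `cv_check`, the `max_std` threshold and the score values are hypothetical, and in practice the per-fold scores would come from whichever cross-validation routine you use. The sketch reports the two signals separately so you can decide how to combine them.

```python
from statistics import mean, stdev

def cv_check(fold_scores, test_score, max_std):
    """Apply the heuristic above to per-fold scores and a final test score."""
    cv_score = mean(fold_scores)
    cv_std = stdev(fold_scores)  # sample standard deviation across folds
    low, high = cv_score - cv_std, cv_score + cv_std
    return {
        "cv_score": cv_score,
        "cv_std": cv_std,
        # Signal 1: fold scores vary more than the chosen threshold allows.
        "unstable_folds": cv_std > max_std,
        # Signal 2: the test score lies outside cv_score +/- std.
        "test_outside_interval": not (low <= test_score <= high),
    }

# Hypothetical accuracy values, not results from a real model.
fold_scores = [0.80, 0.82, 0.78, 0.81, 0.79]
report = cv_check(fold_scores, test_score=0.70, max_std=0.05)
print(report)
```

With these made-up numbers the folds agree closely (standard deviation about 0.016), but a test score of 0.70 falls well below the interval around the CV score of 0.80, which is worth investigating on its own. What counts as "too large" a standard deviation depends on the metric and the problem, so `max_std` is left as a parameter.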

  3. Ways to avoid overfitting:
    – use a simpler model, such as linear regression.
    – use an ensemble of multiple models.
    – use ridge and lasso regression techniques.
    – perform k-fold or stratified k-fold cross-validation.

  4. 1) You can introduce a little bias into the model, for example by using a simpler model such as linear regression, so that you reduce the variance.
    2) You can use k-fold cross-validation instead of splitting your data into a single train and test set.
    3) You can use regularization techniques like ridge and lasso, which make the predicted target less sensitive to changes in the independent variables.
    4) You can use ensemble methods like random forests, which average the results of many different trees.
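Point 3 can be illustrated with a minimal sketch of ridge regression for a single feature without an intercept, where the penalized least-squares slope has a closed form. The function name `ridge_slope` and the toy data are assumptions made for this example; real libraries handle many features and an intercept.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w * (sxx + lam) = sxy.
    return sxy / (sxx + lam)

# Toy data (an assumption for illustration), roughly y = 2x with small noise.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.9, 0.2, 2.1, 3.8]

slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, w in slopes.items():
    print(f"lambda = {lam:>5}: slope = {w:.3f}")
```

As the penalty `lam` grows, the slope shrinks from about 1.98 towards zero, so a given change in the input moves the prediction less. That is the bias-for-variance trade mentioned in point 1: the penalized model fits the training data slightly worse but reacts less to noise in it.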

  5. By using cross-validation
    By reducing variance at the cost of adding some bias to the model
    By using regularization techniques such as L1 and L2
    By checking the accuracy score on an unseen test dataset
    By splitting the data into train, validation and test sets
    Etc.
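For the train, validation and test split mentioned above, a minimal sketch in plain Python could look like this. The function name `train_val_test_split` and the 60/20/20 proportions are assumptions for illustration, and the shuffle is only appropriate when the rows are not ordered in time.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of rows and cut it into three disjoint parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

rows = list(range(100))  # stand-in for 100 labelled examples
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))  # 60 20 20
```

The validation part is used to choose models and hyperparameters, while the test part is kept aside and scored once at the end, so its score stays an honest estimate of performance on unseen data.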
