Define Degrees of Freedom with an example

Question

Give a proper example. You can also use it to show how degrees of freedom impact your selection of a linear regression model.


Answer ( 1 )

  1. Degrees of freedom (df) is the number of observations in a sample or dataset that are free to vary without violating any constraint, i.e., that can take any value. The larger the sample size, the more degrees of freedom the variables have, for a given number of estimated parameters. Each parameter estimated from the sample imposes one constraint.

    For example: given the mean of 10 observations, 9 of them can take any value, but the value of the last observation is then fully determined by the calculated mean; it cannot be anything else. In this case, the degrees of freedom are n − 1 = 9 (n being the number of observations, 10, and 1 being the number of parameters estimated, in this case the mean).
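    A minimal Python sketch of this idea (the mean value and the random draws are illustrative assumptions, not from the original answer): once the mean is fixed, only n − 1 observations are free.

    ```python
    import numpy as np

    n, mean = 10, 5.0                    # assume the mean is known to be 5.0

    # The first n - 1 = 9 observations are free to take any value.
    rng = np.random.default_rng(0)
    free = rng.uniform(0, 10, size=n - 1)

    # The last observation is fully determined by the mean constraint.
    last = n * mean - free.sum()

    sample = np.append(free, last)
    print(sample.mean())                 # 5.0 -- the constraint cost one df
    ```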

    In regression models (linear regression), the total variation in y can be decomposed into three sums of squares: the explained sum of squares (ESS), the total sum of squares (TSS), and the residual sum of squares (RSS). The degrees of freedom split the same way:
    TSS = ESS + RSS
    df: (n − 1) = (k) + (n − k − 1)

    Where
    TSS is the total sum of squares, ESS the explained sum of squares, and RSS the residual sum of squares
    n = number of observations
    k = number of predictors (explanatory variables) in the regression
    n − 1 for TSS, because the mean of y is estimated
    n − k − 1, or n − (k + 1), for RSS in a model with a constant term and k predictors, since k + 1 parameters are estimated
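    A small sketch (synthetic data; statsmodels is assumed to be available) that checks both the decomposition and the degrees of freedom reported for an OLS fit:

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, k = 50, 3                                # 50 observations, 3 predictors
    X = rng.normal(size=(n, k))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

    model = sm.OLS(y, sm.add_constant(X)).fit()  # constant term + k predictors

    tss = ((y - y.mean()) ** 2).sum()
    ess = ((model.fittedvalues - y.mean()) ** 2).sum()
    rss = (model.resid ** 2).sum()

    print(np.isclose(tss, ess + rss))           # True: TSS = ESS + RSS
    print(model.df_model, model.df_resid)       # 3.0 and 46.0, i.e. k and n - k - 1
    ```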

    Therefore, if the number of observations stays the same but the number of explanatory variables increases, the df for RSS goes down, which pushes up the standard error of the regression. That is not a good sign: the standard error of the regression represents the average distance that the observed values fall from the regression line. Conveniently, it tells you how wrong the regression model is on average, in the units of the response variable. Smaller values are better, because they indicate that the observations are closer to the fitted line.
    So the higher the residual df, the better. Overfitting, i.e., adding unnecessary explanatory variables to the model, therefore costs degrees of freedom and inflates the standard error. The same effect shows up in the adjusted R-squared metric, which decreases when irrelevant explanatory variables are added, because it also accounts for the loss of degrees of freedom.
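    To see this effect, one can extend the sketch above by adding pure-noise predictors (again an illustrative setup, not from the original answer): residual df falls, and adjusted R-squared typically drops even though plain R-squared can only rise.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 50
    x = rng.normal(size=(n, 1))
    y = 2.0 * x[:, 0] + rng.normal(size=n)

    for extra in (0, 10, 30):                    # irrelevant noise predictors
        X = np.hstack([x, rng.normal(size=(n, extra))])
        fit = sm.OLS(y, sm.add_constant(X)).fit()
        s = np.sqrt(fit.scale)                   # standard error of the regression
        print(f"k={1 + extra:2d}  df_resid={int(fit.df_resid):2d}  "
              f"s={s:.3f}  R2={fit.rsquared:.3f}  adj_R2={fit.rsquared_adj:.3f}")
    ```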

    https://stats.stackexchange.com/questions/63245/df-for-regression-ss-in-multiple-linear-regression-when-sigma-is-known
    https://blog.minitab.com/en/adventures-in-statistics-2/regression-analysis-how-to-interpret-s-the-standard-error-of-the-regression
    https://blog.minitab.com/en/adventures-in-statistics-2/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables

