Define Degree of Freedom with example

Question

Give a proper example. You can also use it to show how degrees of freedom impact your choice of a linear regression model.


Answer ( 1 )

  1. Degrees of freedom (df) is the number of observations in a sample that can vary freely, i.e. take any value, given the constraints. The larger the sample size, the more degrees of freedom the variables have for a given set of parameters. Each parameter estimated from the sample imposes one constraint.

    For example: given the mean of 10 observations, 9 of them can take any value, while the value of the remaining one is fully determined by the calculated mean; it can't be anything else. In this case, the degrees of freedom are n-1 = 9 (n being the number of observations, 10, and 1 being the number of parameters estimated, in this case the mean).
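
    A minimal sketch of this constraint, assuming NumPy and an arbitrary target mean of 5.0: nine values are drawn freely, and the tenth is forced by the mean.

    ```python
    import numpy as np

    # Hypothetical example: constrain the mean of n = 10 observations to 5.0.
    n, target_mean = 10, 5.0

    rng = np.random.default_rng(42)
    free = rng.uniform(0, 10, size=n - 1)  # 9 values can vary without restriction
    last = n * target_mean - free.sum()    # the 10th is fixed by the mean constraint

    sample = np.append(free, last)
    print(sample.mean())  # 5.0 -- only n - 1 = 9 values were free to vary
    ```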

    In the case of regression models (linear regression), the total variation splits into three parts: the total sum of squares (TSS), the explained sum of squares (ESS) and the residual sum of squares (RSS). Both the sums of squares and their degrees of freedom decompose the same way:
    TSS   =  ESS  +  RSS
    (n-1) =  (k)  +  (n-k-1)

    Where
    n = number of observations
    k = number of explanatory variables (predictors) in the regression
    TSS has n-1 degrees of freedom, one lost to calculating the mean value of y
    RSS has n-k-1, i.e. n-(k+1), degrees of freedom, since a model with a constant term and k predictors estimates k+1 parameters
    ESS gets the remaining k degrees of freedom
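
    A small sketch verifying both decompositions on made-up data, assuming statsmodels (the data, coefficients and seed below are purely illustrative):

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: n = 50 observations, k = 2 predictors.
    rng = np.random.default_rng(0)
    n, k = 50, 2
    X = rng.normal(size=(n, k))
    y = 1.0 + X @ np.array([2.0, -1.5]) + rng.normal(size=n)

    model = sm.OLS(y, sm.add_constant(X)).fit()

    # Sums of squares decompose: TSS = ESS + RSS
    print(model.centered_tss, model.ess + model.ssr)  # equal up to rounding

    # ...and so do the degrees of freedom: (n-1) = (k) + (n-k-1)
    print(n - 1, model.df_model, model.df_resid)      # 49, 2.0, 47.0
    ```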

    Therefore, if the number of observations stays the same but the number of explanatory variables increases, the df for RSS goes down, which tends to increase the standard error of the regression. That is not a good sign: the standard error of the regression represents the average distance that the observed values fall from the regression line. Conveniently, it tells you how wrong the regression model is on average, in the units of the response variable. Smaller values are better because they indicate that the observations lie closer to the fitted line.
    So the higher the residual df, the better. Overfitting, i.e. adding unnecessary explanatory variables to the model, therefore costs degrees of freedom and inflates the standard error. The same penalty shows up in the adjusted R-squared metric, which falls when irrelevant explanatory variables are added because it accounts for the loss of degrees of freedom (see the sketch below).
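
    A sketch of that effect, again with statsmodels and made-up data: padding a one-predictor model with ten pure-noise columns burns ten residual degrees of freedom, which typically drags adjusted R-squared down even though plain R-squared can only go up.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 40
    x = rng.normal(size=(n, 1))
    y = 3.0 * x[:, 0] + rng.normal(size=n)

    base = sm.OLS(y, sm.add_constant(x)).fit()

    # Add 10 irrelevant noise predictors: df for RSS drops from n-2 to n-12.
    noise = rng.normal(size=(n, 10))
    bloated = sm.OLS(y, sm.add_constant(np.hstack([x, noise]))).fit()

    # Standard error of the regression, S = sqrt(RSS / df_resid)
    print(np.sqrt(base.ssr / base.df_resid), np.sqrt(bloated.ssr / bloated.df_resid))

    # Adjusted R^2 penalises the lost degrees of freedom; it typically falls here.
    print(base.rsquared_adj, bloated.rsquared_adj)
    ```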

    https://stats.stackexchange.com/questions/63245/df-for-regression-ss-in-multiple-linear-regression-when-sigma-is-known
    https://blog.minitab.com/en/adventures-in-statistics-2/regression-analysis-how-to-interpret-s-the-standard-error-of-the-regression
    https://blog.minitab.com/en/adventures-in-statistics-2/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables

