Very briefly explain the assumptions of Linear Regression
This might look like a standard subjective question, but any interviewer will assume that you know at least Linear and Logistic Regression, irrespective of whether they are mentioned on your resume.
This is the first question you should start with. Do not copy and paste. Go through a couple of tutorials or websites, understand the concept, and then post your answer here. Even if you do not post, do understand the concept.
Answers (20)
1. Y and X have a linear relationship between them (a quick check is sketched below)
2. All the features (Xi's) are independent of each other
3. The error term follows a normal distribution
4. The covariance between the Xi's and the error term is 0
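One quick way to probe point 1 on real data is to look at the correlation between each X and Y. A minimal Python sketch on synthetic data that is linear by construction; the numbers are purely illustrative:

```python
# Minimal sketch: eyeball the linearity assumption via Pearson correlation.
# The data below is synthetic and linear by construction.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 2.0 + rng.normal(0, 1.5, 200)

r, p_value = pearsonr(x, y)
print(f"Pearson r = {r:.3f} (p = {p_value:.3g})")
# |r| near 1 suggests a strong linear relationship; a scatter plot of
# x vs. y remains the most direct visual check.
```

Note that Pearson r only captures linear association with a single feature; with multiple features, plot each one against the target.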
1. Independent and dependent variables must have a linear relationship
2. The independent variables must not be correlated with one another (see the VIF sketch below)
3. The residuals of the model must be normally distributed
4. Residuals must have constant variance
5. No correlation between error terms
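For point 2, a common diagnostic is the variance inflation factor (VIF). A hedged sketch with statsmodels, on synthetic data where one feature is deliberately a near-copy of another:

```python
# Sketch: flag multicollinearity with variance inflation factors (VIF).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
x3 = x1 + 0.05 * rng.normal(size=300)  # nearly a copy of x1: collinear

# Include an intercept column, then compute one VIF per feature column.
X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, round(variance_inflation_factor(X, i), 1))
# Rule of thumb: VIF above ~5-10 flags problematic multicollinearity;
# x1 and x3 should show very large VIFs here, while x2 stays near 1.
```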
1. Linear relationship between the features and the target: according to this assumption, there is a linear relationship between the features and the target.
2. There is no multicollinearity between the features.
3. Normal distribution of the error terms: as per this assumption, the errors follow a normal distribution.
4. Homoscedasticity: the error terms have constant variance.
1) The dependent and independent variables should have a linear relationship
2) The independent variables should be independent of each other
3) The error terms should be normally distributed
4) There should be no correlation between residuals or error terms
5) The error terms should have constant variance (see the Breusch-Pagan sketch below)
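For point 5 (constant variance), the Breusch-Pagan test is a standard check. A minimal sketch with statsmodels, on synthetic data that is heteroscedastic by construction:

```python
# Sketch: test for heteroscedastic errors with Breusch-Pagan.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 300)
y = 2.0 * x + rng.normal(0, x)  # noise grows with x: heteroscedastic

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4g}")
# A small p-value rejects homoscedasticity, as expected for this data.
```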
1. The target variable and feature variables are assumed to be linearly related.
2. Feature variables are linearly independent.
3. Residuals are the difference between the true value and the predicted value. It is assumed that the mean of the residuals is zero.
4. Residuals are assumed to have equal variance across the regression line.
5. Residuals are assumed to be normally distributed.
6. Residuals are assumed not to be correlated. If the residuals are correlated, it means that the present value depends on the previous value; this usually happens in time-series data (a Durbin-Watson check is sketched below).
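For point 6, the Durbin-Watson statistic is the usual screen for serially correlated residuals. A sketch on synthetic data with AR(1) errors, the time-series situation described above:

```python
# Sketch: detect autocorrelated residuals with the Durbin-Watson statistic.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 300
x = np.arange(n, dtype=float)
e = np.zeros(n)
for t in range(1, n):                  # AR(1) errors, as in time series
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1.5 * x + e

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(f"Durbin-Watson: {durbin_watson(resid):.3f}")
# Values near 2 suggest no autocorrelation; values well below 2
# (as here) indicate positive serial correlation.
```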
1. The relationship between X and Y is linear.
2. The errors are normally distributed with mean 0.
3. The errors have constant variance, i.e., they are homoscedastic.
4. The X variables are uncorrelated with one another; there is no multicollinearity.
5. The X variables and the errors are uncorrelated.
6. The errors are not autocorrelated.
1. The dependent and independent variables should have a linear relationship
2. No correlation between the error (residual) terms
3. The independent variables should not be correlated with each other; there should be no multicollinearity
4. The error terms must have constant variance (i.e., homoscedasticity)
5. The error terms should be normally distributed (see the Shapiro-Wilk sketch below)
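For point 5, a Shapiro-Wilk test on the residuals is one common normality check. A minimal sketch on synthetic, well-behaved data, so the test should not reject:

```python
# Sketch: check residual normality with a Shapiro-Wilk test.
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 300)
y = 4.0 * x + 1.0 + rng.normal(0, 2.0, 300)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
stat, p_value = shapiro(resid)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")
# A large p-value is consistent with normal residuals; a Q-Q plot
# (scipy.stats.probplot) is the usual visual companion.
```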
1. The dependent variable Y and the independent variable X are linearly related (straight-line relationship).
2. On average, the errors are zero.
3. The errors are homoscedastic, i.e., they have the same variance (homo means same).
4. There is no covariance between the independent variables (X), i.e., there is no multicollinearity.
5. The covariance between the X variables and the errors is zero.
6. There is no autocorrelation between the errors of two time periods.
There are four assumptions associated with a linear regression model:
1- Linearity: The relationship between X and the mean of Y is linear.
2- Homoscedasticity: The variance of the residuals is the same for any value of X.
3- Independence: Observations are independent of each other.
4- Normality: For any fixed value of X, Y is normally distributed.
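Conveniently, the summary of a fitted statsmodels OLS model already reports several of these diagnostics in one place. A short illustrative sketch on synthetic data:

```python
# Sketch: statsmodels OLS summary bundles several assumption diagnostics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 2.5 * x + 1.0 + rng.normal(0, 1.0, 200)

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.summary())
# The summary includes Omnibus/Jarque-Bera (residual normality) and the
# Durbin-Watson statistic (residual independence) alongside the fit.
```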
1. The relationship between individual features and the target variable should be linear.
2. No feature should be predictable from any other feature (no two features should be highly correlated).
3. The residuals should be normally distributed.
4. The variance of the residuals should be constant across all values of the features.
Basic assumptions of the linear regression model are:
1. The relationship between X and the mean of Y must be linear.
2. Independent observations
3. The variance of the residuals is the same for all values of X
4. For a fixed value of X, Y is normally distributed.
The relationship between the dependent and independent variables should be linear
The variance of the residuals should be the same for all values of the independent variables
The residuals should be normally distributed with mean 0
Observations should be independent of each other
Assumptions of linear regression:
– Existence of a linear relationship between the variables under study
– Homoscedasticity of the error terms
– No autocorrelation
– Independence of observations
– The error terms follow a normal distribution with constant variance and zero mean
1. The response variable y should be linearly related to the explanatory variable x. This can be verified using a correlation matrix (see the sketch below).
2. Residual errors should be:
a. Random variables
b. Independent random variables
c. Identically distributed (their probability distributions should be the same)
Residual errors may not be independent for the following reasons:
1. One or more important explanatory variables are missing from the model.
2. The explanatory variables are themselves related to each other (multicollinearity).
3. Residual errors should be normally distributed.
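The correlation-matrix check mentioned in point 1 is a one-liner with pandas. A sketch with illustrative column names, where x3 is deliberately built to be collinear with x1:

```python
# Sketch: a correlation matrix screens for linearity and multicollinearity.
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
df["x3"] = df["x1"] * 0.9 + rng.normal(0, 0.1, 200)  # correlated with x1
df["y"] = 3 * df["x1"] + 2 * df["x2"] + rng.normal(0, 1, 200)

print(df.corr().round(2))
# High |correlation| between y and a feature supports linearity; high
# |correlation| between two features (x1 and x3 here) flags multicollinearity.
```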
1. The values of the dependent variable are independently distributed.
2. There is a linear relationship between X and y.
3. Errors are normally distributed.
4. Errors should have constant variance (homoscedasticity).
5. There is no correlation between the independent variables (no multicollinearity).
6. Errors should be independently distributed (no autocorrelation).
7. There are no extreme outliers.
No multicollinearity
Linearity
Normality
No autocorrelation
Homoscedasticity
1. There should be a linear relationship between x and y
2. Error terms should be normally distributed
3. Error terms are independent of each other
4. Error terms have constant variance
1) There should be a linear relationship between the dependent and independent variables.
2) Multicollinearity could exist in the data, but a good linear regression model should not have multicollinear independent variables.
3) Gradient descent plays a major role here: behind the scenes, the algorithm tries to find the best values of m (coefficient) and c (intercept) for the equation y = mx + c.
4) Until it finds appropriate values for m and c, it keeps searching and calculating.
5) Once we have the best m and c, it finally calculates R squared (a goodness-of-fit measure). A minimal sketch of this procedure follows below.
6) Why we prefer smaller m values: a higher m means a steeper slope, so even a small change in x causes a large change in the predicted value; smaller coefficients generally give a more stable model.
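A minimal gradient-descent sketch for y = mx + c, matching the description in points 3-5 above; the learning rate, iteration count, and synthetic data are arbitrary illustrative choices, not a definitive implementation:

```python
# Sketch: fit y = m*x + c by gradient descent on the MSE, then report R^2.
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 5.0 + rng.normal(0, 1.0, 100)

m, c = 0.0, 0.0
lr = 0.01                              # learning rate (arbitrary choice)
for _ in range(5000):
    error = (m * x + c) - y
    m -= lr * 2 * np.mean(error * x)   # dMSE/dm
    c -= lr * 2 * np.mean(error)       # dMSE/dc

ss_res = np.sum((y - (m * x + c)) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(f"m={m:.3f}, c={c:.3f}, R^2={1 - ss_res / ss_tot:.4f}")
# m and c should land near the true 3.0 and 5.0 used to generate the data.
```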
Assumptions of Linear Regression:
1) There should be a linear relationship between the dependent and independent variables.
2) The independent variables should not be correlated. (Multicollinearity should not be present.)
3) There should be no correlation between residual (error) terms. (Autocorrelation should not be present.)
4) The error terms must have constant variance. (Homoscedasticity should hold.)
5) The error terms must be normally distributed.