Linear Regression Quick Guide

Linear Regression: the most asked interview questions, with complete code in Python. This mini-book contains 50+ interview questions asked in analytics interviews. If you have mentioned regression on your resume, do go through this book before your interviews.

Get your hands on the 50 most asked interview questions on Linear Regression. Link to the e-shop
A 30-minute complete code guide on Linear Regression – Linear Regression Quick Guide

Why do you need Linear Regression Quick Guide?

Linear Regression is one of the most asked algorithms in any analytics interview. It is also one of the most used algorithms in the analytics domain, largely because of the flexibility it offers to include new variables.

What type of questions can be asked in an interview?

You can play to your strengths and pitch that you have built a model in L.R.; the next thing you will receive is questions like the following:

1. What is Linear Regression?
2. What is the mathematical formula of L.R.?
3. What is the R squared error?
4. What is Adjusted R squared error?
5. What is the difference between R squared and Adjusted R squared error? Which is better?
6. How do you measure the accuracy of your model?
7. What is the degree of freedom?
8. What is correlation and what is its range?
9. What are the assumptions of Linear Regression?
10. Can you use Linear Regression for Classification?


We have covered all the important questions in our book below. It’s also available on Amazon.

Linear Regression Quick Guide

Introduction to Linear Regression:
Linear Regression is one of the most commonly used supervised machine learning algorithms. It is used to discover a linear relationship between one dependent variable (Y) and one or more independent variables (X). The most common fitting method is Ordinary Least Squares (OLS), so linear regression is often referred to as OLS regression. Linear regression is used to predict future values of the dependent variable (Y) based on the measured values of the independent variable (X) when the dependent variable (Y) is continuous, such as salary, age, sales, or product price.
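As a minimal sketch of the idea, here is a one-variable OLS fit in plain NumPy. The data (years of experience vs. salary in thousands) is made up purely for illustration:

```python
import numpy as np

# Toy data, invented for illustration: years of experience (X)
# vs. salary in thousands (Y).
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([30, 35, 42, 48, 55], dtype=float)

# Ordinary least squares fit of Y = b0 + b1 * X.
# np.polyfit with deg=1 returns [slope, intercept].
b1, b0 = np.polyfit(X, Y, deg=1)

# Predict the salary for 6 years of experience.
y_pred = b0 + b1 * 6
```

The fitted line minimises the sum of squared vertical distances between the points and the line, which is exactly the OLS criterion.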

Importance Of Linear Regression:

Linear Regression is important because it is one of the easiest machine learning algorithms to understand and interpret. It can help business owners grow by revealing, from the data they have, which factors contribute to the growth of the business and which do not help at all. Once they know which variables contribute significantly to growth, they can act on those variables for maximum profit.
The linear regression algorithm is used for:
1. Predicting the sales of a company.
2. Predicting house prices.
3. Insurance prediction.

Correlation:
Correlation measures the relative strength of the linear relationship between two variables, the independent variable (X) and the dependent variable (Y).
Correlation ranges from -1 to 1.
The closer to -1, the stronger the negative linear relationship.
The closer to 1, the stronger the positive linear relationship.
The closer to 0, the weaker the linear relationship.
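The Pearson correlation can be computed directly with NumPy; the advertising-spend vs. sales numbers below are hypothetical:

```python
import numpy as np

# Hypothetical data: advertising spend (X) and sales (Y).
X = np.array([10, 20, 30, 40, 50], dtype=float)
Y = np.array([12, 24, 33, 45, 51], dtype=float)

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the Pearson correlation of X and Y.
r = np.corrcoef(X, Y)[0, 1]
```

Here Y rises steadily with X, so r comes out close to +1, a strong positive linear relationship.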

Assumptions of Linear Regression:
The assumptions of linear regression are as follows:
Assumption 1: There should be no outliers.
Assumption 2: Assumption of linearity.
Assumption 3: Assumption of normality.
Assumption 4: Assumption of no multicollinearity.
Assumption 5: Assumption of independence of errors.

Effects of Multicollinearity on Linear Regression:
Multicollinearity is a scenario in which the predictor (independent) variables are highly correlated with each other. Multicollinearity is problematic because it violates an assumption of linear regression. If you are building the model purely for prediction, multicollinearity may not cause any problem. But if you want to interpret the regression coefficients, i.e. how each independent variable affects the dependent variable, multicollinearity can be problematic because the individual coefficient estimates become unstable.
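A common way to detect multicollinearity is the variance inflation factor (VIF). A hand-rolled NumPy sketch (the `vif` helper and the synthetic data below are my own illustration, not from the book): each column is regressed on the others, and VIF_j = 1 / (1 - R_j²).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the other columns (with an intercept).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS fit
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

# Synthetic example: x2 is almost a copy of x1, so both should
# show a large VIF, while the independent x3 stays near 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)                    # independent
vifs = vif(np.column_stack([x1, x2, x3]))
```

A frequently quoted rule of thumb is that a VIF above roughly 5–10 signals problematic multicollinearity; in practice you would typically use `variance_inflation_factor` from statsmodels instead of a hand-rolled helper.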

Evaluation metrics for Linear Regression:


R Square:
R square is also described as the coefficient of determination. R square measures how much of the variation in the dependent variable is explained by the independent variables. In simple terms, R square tells us how much better our regression model is compared to simply predicting the average. R square ranges between 0 and 1; the higher the number, the better the fit of the model. An R square greater than 70%, i.e. 0.7, usually indicates a good-fit model.

Adjusted R Square:
The Adjusted R square is a modified version of R square. Adding more independent variables will increase the value of R square irrespective of whether the new variable is significant or not. Adjusted R square, by contrast, can decrease when an insignificant variable is added, making it a better, more reliable evaluation metric.
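The standard formula, sketched as a small helper (the numbers in the example are hypothetical):

```python
# Adjusted R square penalises extra predictors:
#   adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
# where n = number of observations, k = number of independent variables.

def adjusted_r_square(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical numbers: R2 = 0.80 from a model with 50 rows
# and 5 predictors. The adjusted value is always <= R2.
adj = adjusted_r_square(0.80, n=50, k=5)
```

Because the penalty grows with k, adding a useless variable can push the adjusted value down even while plain R square creeps up.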

RMSE:
RMSE stands for Root Mean Square Error and is a standard way to measure the error of a model. RMSE is the standard deviation of the residuals (errors). Residuals, or errors, measure how far the data points are from the regression line. The closer RMSE is to 0, the better.
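RMSE is the square root of the mean of the squared residuals; again the data points are made up for illustration:

```python
import numpy as np

# Hypothetical actual vs. predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# RMSE: square the residuals, average them, take the square root.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
```

Unlike R square, RMSE is in the same units as Y, so an RMSE of 0.35 on salaries in thousands means the typical prediction error is about 350.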

Steps for Performing Linear Regression in Python:
1. Create a dataframe properly –> pd.read_csv(), pd.read_excel()
2. Assumption 1 - There should be no outliers in the data –> df.boxplot()
3. Assumption 2 - Assumption of linearity –> pairplot()
4. Create X and Y
5. Assumption 3 - Assumption of normality of Y –> distplot(), log()
6. Handle the skewness in X –> skew(), log1p()
7. Assumption 4 - There should be no multicollinearity –> corr(), heatmap(), vif()
8. Splitting the data –> train_test_split(), manual splitting
9. Build the model:
   a. Create the model object –> obj = LinearRegression()
   b. Train the model –> obj.fit(X_train, Y_train)
   c. Predict using the model –> Y_pred = obj.predict(X_test)
10. Evaluating the model:
   a. R square
   b. Adjusted R square
   c. RMSE (Root Mean Square Error)
11. Tuning the model –> manual feature selection, p-values, Ridge Regression, Lasso Regression, feature engineering, PCA (Principal Component Analysis)
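Steps 8–10 above can be sketched end to end. This NumPy-only version uses synthetic data and a manual split instead of pandas/sklearn, so it is an illustration of the flow rather than the exact code from the book:

```python
import numpy as np

# Synthetic data standing in for a cleaned dataframe:
# Y = 3 + 2*x1 - 1*x2 + small noise.
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 2))
Y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Step 8: split the data (80% train, 20% test), done manually.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
Y_train, Y_test = Y[:split], Y[split:]

# Step 9: build and train the model -- OLS via least squares.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(A_train, Y_train, rcond=None)

# Predict on the held-out data.
A_test = np.column_stack([np.ones(len(X_test)), X_test])
Y_pred = A_test @ beta

# Step 10: evaluate -- R square, Adjusted R square, RMSE.
ss_res = np.sum((Y_test - Y_pred) ** 2)
ss_tot = np.sum((Y_test - Y_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
n_test, k = X_test.shape
adj_r2 = 1 - (1 - r2) * (n_test - 1) / (n_test - k - 1)
rmse = np.sqrt(np.mean((Y_test - Y_pred) ** 2))
```

With sklearn, steps 9a–9c become `obj = LinearRegression()`, `obj.fit(X_train, Y_train)`, and `obj.predict(X_test)`, exactly as listed above.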

Linear Regression Interview Questions

If you want to explore the video version of the topic and the most asked questions, then please subscribe to our YouTube channel – The Data Monk

About TheDataMonk

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)
