When to use Linear and when to use Logistic regression?
Answers ( 8 )
Linear Regression is used when you are dealing with continuous target variables (for example, 12.5, 75, 34.7).
Examples –
1) Suppose you want to predict the rent of houses. The rent is going to be something like 10000, 150000, 25000.
2)You want to predict the final score of an innings in a one day match given the situation at the end of 25 overs.
Logistic Regression is used when your target variable is binary (for example, 1 and 0).
You will find its application in cases where there are only two possible outcomes.
Examples –
1) Credit default, wherein the person who has taken the credit will either default or pay the requisite amount.
2) Whether a person will be diagnosed with cancer or not.
Also, Logistic Regression gives probabilities as the output, which are also called propensity scores.
You can choose a cut-off point value for these probabilities to get your binary outcomes according to the business problem you are solving.
For Example – If you choose a cut-off point of 0.4, all the values above 0.4 will be categorized as 1 and all the values below
0.4 will be categorized as 0.
Now, you might wonder: propensity scores are continuous values, so why not use Linear Regression to predict them? Linear Regression can produce such scores, but there is a high chance it will predict values higher than 1 or lower than 0, which is not valid for probabilities in a binary problem.
The predictor variables can be of any type in both cases: continuous as well as categorical.
The target variable largely decides the kind of algorithm you choose to apply to solve the problem.
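To make the cut-off idea concrete, here is a minimal sketch (my own toy credit data, not from the original answer) using scikit-learn's LogisticRegression: predict_proba gives the propensity scores, and a business-chosen cut-off of 0.4 turns them into 0/1 labels.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy credit data: [income in thousands, outstanding balance in thousands]
X = np.array([[20, 15], [35, 5], [50, 2], [25, 20], [60, 1], [30, 18]])
y = np.array([1, 0, 0, 1, 0, 1])  # 1 = default, 0 = paid back

model = LogisticRegression()
model.fit(X, y)

# Propensity scores: probability of class 1 (default) for each applicant
propensity = model.predict_proba(X)[:, 1]

# Business-driven cut-off of 0.4 instead of the default 0.5
cutoff = 0.4
predicted_class = (propensity >= cutoff).astype(int)
print(propensity)
print(predicted_class)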
Linear regression general equation: Y = b0 + b1*x1 + b2*x2
Logistic regression general equation: Y = b0 + b1*x1 + b2*x2
Both equations look similar, but they are not the same. In linear regression, Y is a continuous quantity, for example the sales price of a product, whereas in logistic regression Y is the log of odds, log(p(x)/(1-p(x))), where p(x) is the probability of an outcome. Y itself is not our aim; our aim is to find that probability.
So Linear regression is used when our aim is to find a target that is continuous in nature.
But can we use linear regression for classification tasks? The answer is no. First, if we apply linear regression to a multiclass problem, changing the order of the class labels will change the model's parameters, because there is no fixed rule for assigning numeric codes to the categorical classes.
Second, in a binary classification problem, if we have probabilities we can classify points into two classes based on some threshold value. But linear regression can give values that are negative or even greater than 1.
So instead of linear regression, we use logistic regression for classification tasks. It uses the sigmoid function (an S-shaped curve), which keeps the output between 0 and 1, and it can be used for binary as well as multiclass classification problems.
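As a rough illustration of the point about the log of odds, here is a small sketch with made-up coefficients b0, b1, b2 showing how the linear part becomes a probability through the sigmoid.

import numpy as np

def sigmoid(z):
    # Maps any real number (the log-odds) into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

b0, b1, b2 = -1.5, 0.8, 0.3   # hypothetical coefficients
x1, x2 = 2.0, 1.0             # hypothetical feature values

log_odds = b0 + b1 * x1 + b2 * x2   # same form as the linear equation above
probability = sigmoid(log_odds)     # what logistic regression reports
print(log_odds, probability)        # prints 0.4 and roughly 0.60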
I have a question for the community. I have mentioned that the sigmoid function is used in logistic regression. What other function could replace the sigmoid function?
Answer to Richa Shankar: since the sigmoid is an activation function, we can use some other activation function such as tanh or ReLU.
Classification And Regression
y = mx + c (both start from the same form of equation but work differently)
Both refer to prediction: Regression refers to predicting a continuous value, and Classification refers to predicting an event.
Examples of Regression:
Predicting the sales of a car
Predicting the cost of advertising needed to increase sales.
Examples of Classification:
Whether a fire alarm rings or not
Whether a person has cancer or not
In regression, the model predicts the value of y, which is continuous, and the values can also be greater than 1 or less than 0.
In a classification model, we instead model the log-odds, log(p(x)/(1-p(x))), convert it to a probability p(x), and, by choosing a threshold on that probability, divide the outputs into 2 classes, i.e., either the event happens or not (either 1 or 0).
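A quick sketch of this difference on invented toy data: a straight line fitted to 0/1 labels can go below 0 or above 1, while logistic regression keeps its probabilities inside (0, 1).

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])   # binary event: did it happen or not

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

X_new = np.array([[-2], [0], [3.5], [9]])
print(lin.predict(X_new))                # can be < 0 or > 1
print(log.predict_proba(X_new)[:, 1])    # always between 0 and 1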
The essential difference between these two is that Logistic regression is used when the dependent variable is binary in nature. In contrast, Linear regression is used when the dependent variable is continuous and the nature of the regression line is linear.
Linear Regression is a supervised regression model.
Logistic Regression is a supervised classification model.
Outcome
In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values.
In logistic regression, the outcome (dependent variable) has only a limited number of possible values.
The dependent variable
Logistic regression is used when the response variable is categorical in nature. For instance, yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc.
Linear regression is used when your response variable is continuous. For instance, weight, height, number of hours, etc.
Equation
Linear regression gives an equation of the form Y = mX + C, i.e., an equation of degree 1.
However, logistic regression gives an equation of the form Y = e^X / (1 + e^X), i.e., 1 / (1 + e^-X).
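A small numeric sketch of the two forms (slope and intercept values are mine, chosen only for illustration): the straight line is unbounded, while the logistic form squeezes the same line into the (0, 1) range.

import numpy as np

m, C = 1.2, -3.0                      # hypothetical slope and intercept
X = np.linspace(-5, 10, 7)

Y_linear = m * X + C                            # unbounded straight line
Y_logistic = 1 / (1 + np.exp(-(m * X + C)))     # squeezed into (0, 1)

print(np.round(Y_linear, 2))
print(np.round(Y_logistic, 2))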
Linear Regression:
1) Linear regression is used to predict a continuous dependent variable using a given set of independent variables.
2) Linear Regression is used for solving regression problems.
3) The least squares estimation method is used to estimate the model coefficients.
4) The model fits a straight best-fit line.
5) The output is a continuous variable like price, quantities, etc
Logistic Regression:
1) Logistic Regression is used to predict the categorical dependent variable using a given set of independent variables
2) Logistic regression is used for solving Classification problems
3) The maximum likelihood estimation method is used to estimate the model coefficients.
4) The model fits an S-shaped curve (the sigmoid).
5) Output: 0 or 1
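To illustrate point 3) on both sides, here is a brief sketch on simulated data: the least squares coefficients of the line can be computed in closed form with numpy, while scikit-learn fits the logistic model by (regularised) maximum likelihood.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))

# Regression target: price-like continuous value with noise
y_cont = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1, 50)
# Least squares estimate of [intercept, slope] in closed form
A = np.column_stack([np.ones(50), X[:, 0]])
coef_ls, *_ = np.linalg.lstsq(A, y_cont, rcond=None)
print(coef_ls)                         # roughly [5, 3]

# Classification target: 0/1 label
y_bin = (X[:, 0] > 5).astype(int)
# scikit-learn fits this by (regularised) maximum likelihood
clf = LogisticRegression().fit(X, y_bin)
print(clf.intercept_, clf.coef_)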
Linear regression makes a linear approximation of a problem where y = f(x), with x and y being continuous variables.
Now imagine that you want to predict a kind of boolean behavior (yes/no) based on an x value. For example, based on your salary, are you happy or not.
You can say happy = 1 and not happy = 0. You can make a scatter plot with all pairs (salary, happy) (happy on the vertical axis).
You can try to fit a line to separate the happy people from the unhappy ones, but you’ll quickly see that it doesn’t work well (what does a value in the middle mean, etc.).
A better idea would be to draw a kind of s curve which will pass as best through the points you have.
That’s what logistic regression does. It basically linearizes the S curve by transforming the y values: this is the logit function.
We then say that we predict “True” if the predicted value is higher than a threshold. This threshold often corresponds to 0.5, which is the inflection point of the curve (some tools only allow 0.5); the threshold is in fact a threshold on the probability.
When we have more than yes/no possibilities, one solution is to run a logistic regression for each of the possibilities (e.g. if you have A, B, C, that will be A/not A, B/not B, C/not C) and take the class that gives you the highest probability. This is called the “one-vs-all” approach.
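Here is a short sketch of the one-vs-all idea with invented labels A, B and C, using scikit-learn's OneVsRestClassifier wrapped around logistic regression; it fits one A/not A, B/not B, C/not C model each and picks the class with the highest probability.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.array([[1.0], [1.5], [2.0], [5.0], [5.5], [6.0], [9.0], [9.5], [10.0]])
y = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])

# One binary logistic regression per class: A/not A, B/not B, C/not C
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)

X_new = np.array([[1.2], [5.2], [9.8]])
print(ovr.predict_proba(X_new))   # one column of probabilities per class
print(ovr.predict(X_new))         # the class with the highest probability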
Linear Regression is used for predicting continuous variables.
Logistic Regression is used for predicting variables which can take only a limited set of values.
For instance, if X contains the area in square feet of houses, and Y contains the corresponding sale price of those houses, you could use linear regression to predict the selling price as a function of house size. While the selling price cannot literally take any possible value, there are so many possible values that a linear regression model is the better choice.
If, instead, you wanted to predict, based on size, whether a house would sell for more than 200K, you would use logistic regression. The possible outputs are either Yes, the house will sell for more than $200K, or No, the house will not.
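A closing sketch of the two tasks in this answer, on fabricated house data: linear regression predicts the price itself, and logistic regression predicts the probability that the price exceeds $200K. The exact numbers are made up for illustration only.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

sqft = np.array([[800], [1200], [1500], [1800], [2200], [2600], [3000]])
price = np.array([120_000, 165_000, 190_000, 215_000, 260_000, 310_000, 350_000])

# Regression: predict the selling price from the size
reg = LinearRegression().fit(sqft, price)
print(reg.predict([[2000]]))              # an estimated price in dollars

# Classification: predict whether the price exceeds $200K
over_200k = (price > 200_000).astype(int)
clf = LogisticRegression().fit(sqft, over_200k)
print(clf.predict_proba([[2000]])[:, 1])  # probability of selling above $200K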