Linear regression is an approach for modelling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.

In classification we are allowed to have arbitrary input values, but the outputs tend to be binary. This kind of output is called discrete, but in many learning problems the output can be continuous as well.
For example, if the input is the height of a person and the output is the weight, then what you find is probably a function that says the taller a person is, the more that person weighs. In this case the output is not a binary concept like light or heavy; it is a continuous quantity. This is what we call continuous supervised learning, and it is the setting regression deals with.

The linear regression equation is:

y = mx + b

where y is the predicted output, x is the input, m is the slope and b is the intercept. The prediction takes the form of a line.

Example code:

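from sklearn import linear_model

# feature_train is assumed to be the training counterpart of feature_test
reg = linear_model.LinearRegression()
reg.fit(feature_train, target_train)  # learn slope and intercept from the training data
print(reg.coef_)       # slope(s), one per input feature
print(reg.intercept_)  # intercept
print(reg.score(feature_test, target_test))  # r-squared on the test data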

To find the coefficients and intercept of the line we can use reg.coef_ and reg.intercept_, as in the code above.
To measure the performance of our regression we can call the score function on it. The performance metric it reports is r-squared: the higher the r-squared, the better, with a maximum value of one.
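As a minimal end-to-end sketch of the steps above, the following generates made-up height and weight data and fits a line to it. The data, the true slope of 0.9, the noise level and the 75/25 split are all assumptions chosen for illustration; only the LinearRegression calls come from the code above.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Hypothetical data: weight (kg) = 0.9 * height (cm) - 90, plus random noise
rng = np.random.default_rng(0)
heights = rng.uniform(150, 200, size=200)
weights = 0.9 * heights - 90 + rng.normal(0, 3, size=200)

# scikit-learn expects a 2D feature array: one row per sample, one column per feature
features = heights.reshape(-1, 1)
feature_train, feature_test, target_train, target_test = train_test_split(
    features, weights, test_size=0.25, random_state=0)

reg = linear_model.LinearRegression()
reg.fit(feature_train, target_train)

slope = reg.coef_[0]        # m in y = mx + b
intercept = reg.intercept_  # b in y = mx + b
test_r2 = reg.score(feature_test, target_test)  # r-squared on held-out data
print(slope, intercept, test_r2)
```

With this setup the fitted slope should land close to the 0.9 used to generate the data, and the r-squared on the test set should be close to one because the noise is small relative to the trend.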

Error is a technical term here: it is the difference between the actual output and the output predicted by our regression line.
The best fit to the data is found by minimizing the sum of the squared errors (SSE) over all the data points. Squaring gives the same advantage as taking the absolute value of the error: a negative error becomes positive when squared, and a positive error stays positive, so errors of opposite sign cannot cancel each other out.
But there is a problem with SSE, as in the picture below:
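A small sketch of this calculation, using made-up numbers: the raw errors below cancel to zero, while the squared errors do not. The helper name sum_of_squared_errors is hypothetical, not part of scikit-learn.

```python
import numpy as np

def sum_of_squared_errors(actual, predicted):
    """Sum of (actual - predicted)^2 over all data points."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sum(errors ** 2))

# Made-up values: the errors are 1, 0 and -1
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 8.0]

raw_sum = float(np.sum(np.array(actual) - np.array(predicted)))  # 1 + 0 - 1
sse = sum_of_squared_errors(actual, predicted)                   # 1 + 0 + 1
print(raw_sum, sse)
```

The raw sum suggests a perfect fit even though two of the three predictions are off, which is why the errors are squared before summing.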

The distribution on the right has a larger sum of squared errors even though it is probably not doing a much worse job of fitting the data than the distribution on the left. This is one of the shortcomings of the sum of squared errors as an evaluation metric.

As we add more data, the sum of squared errors will almost certainly go up, but that does not necessarily mean our fit is doing a worse job.

This becomes a big problem when we compare two data sets that have different numbers of points: if we use the sum of squared errors to decide which one is being fit better, the result can be skewed by the number of data points alone, even though both fits might be perfectly fine.
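The effect can be sketched with synthetic data. Both data sets below are drawn from the same assumed line (y = 2x + 1) with the same noise level, so the fits are equally good, yet the larger set ends up with a much larger SSE. The helper fit_and_sse and all the numbers are illustrative assumptions.

```python
import numpy as np
from sklearn import linear_model

def fit_and_sse(n_points, seed=0):
    """Fit a line to n_points noisy samples of y = 2x + 1 and return the SSE."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, size=(n_points, 1))
    y = 2.0 * x[:, 0] + 1.0 + rng.normal(0, 1, size=n_points)
    reg = linear_model.LinearRegression().fit(x, y)
    residuals = y - reg.predict(x)
    return float(np.sum(residuals ** 2))

sse_small = fit_and_sse(50)   # 50 points
sse_large = fit_and_sse(500)  # 500 points, same line, same noise
print(sse_small, sse_large)
```

Since each point contributes roughly the same squared error on average, the SSE grows roughly in proportion to the number of points, regardless of the quality of the fit.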
So we use another metric for regression, called r-squared.
R-squared is a number that effectively answers the question: how much of the change in my output is explained by the change in my input? For a least-squares line evaluated on the data it was fit to, r-squared lies between 0 and 1 (on held-out data, reg.score() can even come out negative when the fit is very poor). If the number is very small, that generally means your regression line is not doing a good job of capturing the trend in the data. On the other hand, if r-squared is large, close to 1, your regression line is doing a good job of describing the relationship between your input, or x variable, and your output, or y variable.
The whole point of performing a regression is to come up with a mathematical formula that describes this relationship.

The good thing about r-squared is that it does not grow with the number of training points the way the sum of squared errors does, because it is measured relative to the total variation in the output. This makes it more reliable than the sum of squared errors, especially if the number of points in the data set could be changing.
In the code above, reg.score() gives us the r-squared value of our regression.
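As a sketch of what that value is, r-squared can be computed by hand as 1 minus the ratio of the sum of squared errors to the total squared variation of the output around its mean. The helper r_squared and the synthetic data (an assumed line y = 2x + 1 with unit noise) are illustrative only; the comparison with reg.score() is included to check that the two agree.

```python
import numpy as np
from sklearn import linear_model

def r_squared(actual, predicted):
    """1 - (sum of squared errors) / (total squared variation around the mean)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(1)
r2_by_size = {}
for n_points in (50, 500):
    x = rng.uniform(0, 10, size=(n_points, 1))
    y = 2.0 * x[:, 0] + 1.0 + rng.normal(0, 1, size=n_points)
    reg = linear_model.LinearRegression().fit(x, y)
    # Store the hand-computed value next to scikit-learn's own score
    r2_by_size[n_points] = (r_squared(y, reg.predict(x)), reg.score(x, y))

print(r2_by_size)
```

Unlike the sum of squared errors, the two r-squared values should come out similar even though one data set has ten times as many points as the other.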

Classification vs regression:
1. Output type: In regression the output variable takes continuous values. In classification the output variable takes class labels.
2. What we try to find: In classification this is usually a decision boundary; depending on where a point falls relative to that boundary, you assign it a class label. In regression we are trying to find a best-fit line.
3. Evaluation: In supervised classification we usually use accuracy, which is the fraction of class labels predicted correctly on the test set. For regression we have different evaluation metrics, one of which is the sum of squared errors. Another is r-squared.
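The difference in evaluation can be sketched with scikit-learn's accuracy_score and r2_score functions. The labels and weights below are made up for illustration and do not come from any fitted model.

```python
from sklearn.metrics import accuracy_score, r2_score

# Classification: discrete labels, scored by the fraction predicted correctly
true_labels = ["light", "heavy", "heavy", "light"]
predicted_labels = ["light", "heavy", "light", "light"]
accuracy = accuracy_score(true_labels, predicted_labels)  # 3 of 4 correct

# Regression: continuous values, scored here by r-squared
true_weights = [60.0, 70.0, 80.0, 90.0]
predicted_weights = [62.0, 69.0, 81.0, 88.0]
r2 = r2_score(true_weights, predicted_weights)

print(accuracy, r2)
```

A classification prediction is either right or wrong, while a regression prediction can be close without being exact, which is why the two tasks need different metrics.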