Regressions
Here is an example of discrete data. The input is numerical, but the output is binary (categorical, so to speak; there is no inherent order).
Here is a continuous example, where both input and output are numerical. For example, a taller person usually weighs more; predicting a numerical output like this is called continuous learning. Here are a few other examples to test your understanding of discrete and continuous learning. Here is one more example.
Linear regression is meant for numerical results (this chapter is only about linear regression). Logistic regression is for discrete results (used for classification).
Both linear regression and logistic regression have the same drawbacks. Both have a tendency to “overfit,” which means the model adapts too exactly to the training data at the expense of the ability to generalise to previously unseen data. Because of that, both models are often “regularised,” which means certain penalties are added to prevent overfitting. Another drawback of linear models is that, since they are so simple, they tend to have trouble predicting more complex behaviours.
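As a sketch of what regularisation looks like in practice: scikit-learn's Ridge adds an L2 penalty and Lasso adds an L1 penalty on the coefficients. The data and alpha values below are invented for illustration, not recommended settings.

```python
from sklearn.linear_model import Ridge, Lasso
import numpy as np

# Toy data, invented for illustration.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# alpha controls the penalty strength; alpha=0 would reduce
# to plain (unregularised) linear regression.
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty

print(ridge.coef_, ridge.intercept_)
print(lasso.coef_, lasso.intercept_)
```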
Slope determines the angle of the line. Intercept determines where the line is positioned (where it crosses the y-axis). Here is another example showing slope and intercept. In scikit-learn, coef_ is the slope.
Here is an example of how to train a linear regression algorithm. We print out the slope (coef_) and the intercept (intercept_). Then you can access the r-squared score for a dataset (it is a way to test a regression algorithm). The higher the r-squared, the better the algorithm is performing (the maximum is 1.0).
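A minimal sketch using scikit-learn's LinearRegression; the height/weight numbers are invented for illustration:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Invented example data: height (cm) as input, weight (kg) as output.
X = np.array([[150], [160], [170], [180], [190]])
y = np.array([50, 58, 66, 75, 84])

model = LinearRegression()
model.fit(X, y)

print("slope:", model.coef_)            # coef_ holds the slope
print("intercept:", model.intercept_)   # where the line crosses the y-axis
print("r-squared:", model.score(X, y))  # closer to 1.0 means a better fit
```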
The best regression line is the one that minimizes SUM(error²), the sum of squared errors (SSE). Here, m is the slope and b is the intercept of the line y = mx + b. We can use these algorithms to minimize the squared error:
Ordinary least squares (OLS)
Why is it good to think of regression as minimizing squared errors? Because it finds the line that is, overall, as close as possible to all the points in the training dataset.
The error is the vertical distance between the regression line and the actual position of a point. In this example, that distance is -18.75, so the error for the point is -18.75. What is a good criterion for a good fit? Summing the squared errors (or the absolute values |error|) over all points, so that positive and negative errors do not cancel out.
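As a sketch of the math behind OLS (these are the standard textbook formulas, with x̄ and ȳ denoting the means of the inputs and outputs):

```latex
\mathrm{SSE}(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
```

Setting the partial derivatives with respect to m and b to zero gives the closed-form OLS solution:

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m \bar{x}
```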
Gradient descent
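A minimal sketch of gradient descent for fitting y = mx + b by minimizing the mean squared error; the data, learning rate, and iteration count are invented for illustration:

```python
import numpy as np

# Invented example data: height (cm) as input, weight (kg) as output.
x = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
y = np.array([50.0, 58.0, 66.0, 75.0, 84.0])

# Standardize the input so gradient descent converges quickly.
x_scaled = (x - x.mean()) / x.std()

m, b = 0.0, 0.0      # start with a flat line
learning_rate = 0.1  # arbitrary choice for this sketch
for _ in range(1000):
    errors = m * x_scaled + b - y
    # Partial derivatives of the mean squared error with respect to m and b.
    grad_m = 2 * np.mean(errors * x_scaled)
    grad_b = 2 * np.mean(errors)
    # Step downhill along the gradient.
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print("slope (on scaled input):", m)
print("intercept:", b)
```

Each iteration nudges m and b in the direction that reduces the squared error, so the line gradually settles onto the best fit instead of being computed in one shot as OLS does.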
SSE is not a perfect metric because it depends on the number of points in the training dataset: adding more points increases the sum even if the quality of the fit stays the same.
R-squared is a better metric for evaluating a regression when the number of points in the dataset changes. If r-squared is close to 1, the regression line does a good job of predicting. Here is some additional info on regression metrics.
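For reference, the standard definition of r-squared, where ŷᵢ are the predictions and ȳ is the mean of the observed values:

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

The numerator is just the SSE; dividing it by the total sum of squares normalizes away the size of the dataset, which is why r-squared stays comparable across datasets while raw SSE does not.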