Regressions

Discrete vs Continuous learning

Here is an example of discrete data: the input is numerical, but the output is binary (categorical, so to say; there is no order among the values).

Linear regression is meant for numerical (continuous) results; this chapter is only about linear regression. Logistic regression is for discrete results and is used for classification.

Both linear regression and logistic regression share the same drawbacks. Both have a tendency to “overfit,” which means the model adapts too exactly to the training data at the expense of the ability to generalise to previously unseen data. Because of that, both models are often “regularised,” which means certain penalties are added to prevent overfitting. Another drawback of linear models is that, since they are so simple, they tend to have trouble predicting more complex behaviour.
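
As a minimal sketch (not from the original notes), scikit-learn ships regularised variants of linear regression; Ridge, for example, adds an L2 penalty whose strength is set by the alpha parameter (alpha=1.0 below is just an example value):

from sklearn import linear_model

# Ridge = linear regression with an L2 penalty on the coefficients.
# A larger alpha shrinks the coefficients more, which helps against
# overfitting on small or noisy datasets.
ridge = linear_model.Ridge(alpha=1.0)
ridge.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(ridge.coef_, ridge.intercept_)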

Slope And Intercept

Code example

Here is an example of how to train a linear regression algorithm. We print out the slope (coef_) and the intercept (intercept_). Then you can compute the r-squared score for a dataset (it is a way to evaluate a regression algorithm). The higher the r-squared, the better the algorithm is performing (the maximum is 1.0).

from sklearn import linear_model

# Train a linear regression on a tiny toy dataset
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

print(reg.coef_)       # slope (one coefficient per input feature)
print(reg.intercept_)  # intercept
# r-squared score; here you should use test data (not training data like I did)
print(reg.score([[0, 0], [1, 1], [2, 2]], [0, 1, 2]))
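
A minimal sketch of scoring on held-out data instead, assuming a slightly larger made-up dataset and scikit-learn's train_test_split:

from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Made-up data, just for illustration
X = [[i, i] for i in range(10)]
y = list(range(10))

# Hold out part of the data so the score measures generalisation,
# not how well the model memorised the training points
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))  # r-squared on unseen data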

How to evaluate linear regression errors

Minimizing error using the Sum of Squared Errors

The best regression is the one that minimizes the sum of squared errors, SUM( (actual - predicted)^2 ), where the prediction for a point x is m*x + b; m is the slope and b is the intercept. We can use algorithms such as the following to minimize the squared error:

  • Ordinary least squares (OLS)

Why is it good to think of regression as minimizing squared errors? Because it gives a precise target for accuracy: the best line is the one that, overall, stays as close as possible to all the points in the training dataset.
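
A small sketch of what that means in code: computing the SSE of a candidate line by hand (numpy and the toy numbers are just for illustration):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])

m, b = 1.0, 0.0              # slope and intercept of a candidate line
predicted = m * x + b
errors = y - predicted       # error = actual - predicted
sse = np.sum(errors ** 2)    # sum of squared errors; OLS picks the m, b that minimise this
print(sse)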

SSE is not perfect

R squared
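
One common drawback of SSE is that it grows simply because the dataset has more points, so it is hard to compare across datasets. R squared normalises for that; a minimal sketch, assuming the usual definition r-squared = 1 - SSE / SST (SST is the total sum of squares around the mean of the targets):

import numpy as np

y_actual = np.array([0.0, 1.0, 2.0, 3.0])
y_predicted = np.array([0.1, 0.9, 2.1, 2.9])

sse = np.sum((y_actual - y_predicted) ** 2)        # sum of squared errors
sst = np.sum((y_actual - np.mean(y_actual)) ** 2)  # total sum of squares
r_squared = 1 - sse / sst
print(r_squared)  # close to 1.0 means the line explains the data well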
