Regression is an important approach for data modeling and analysis. It is a form of predictive modeling that examines the relationship between a dependent variable (target) and independent variables (features). A line is fitted to the data points so that the distances between the data points and the fitted line or curve are minimized. This method is considered a machine learning algorithm based on supervised learning. It is used for forecasting, time series analysis, and finding causal relationships between variables.
Simple linear regression is a type of regression analysis in which only one feature is used and there is a linear relationship between the independent and dependent variables. Based on the given data points, the line that models the points best is plotted. The hypothesis line can be modeled with the linear equation:

ŷ = θ₀ + θ₁ · x

where θ₀ is the intercept and θ₁ is the slope of the line.
Training the model means finding values of θ₀ and θ₁ so that the resulting line fits the given data points best. After training, the model can predict the value of y for an input value of x.
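As a quick illustration, the best-fit θ₀ and θ₁ for a toy data set can be computed with the classical closed-form least-squares expressions. This is a minimal sketch using NumPy; the data values are made up for the example:

```python
import numpy as np

# Toy data: y is roughly 2x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.2])

# Closed-form least-squares estimates of the slope and intercept.
theta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
theta_0 = y.mean() - theta_1 * x.mean()

def predict(x_new):
    """Predicted y for a new input, using the trained line."""
    return theta_0 + theta_1 * x_new

print(theta_0, theta_1)  # close to 1 and 2, the values used to generate the data
print(predict(6.0))      # close to 13
```

The same fit can also be obtained with `np.polyfit(x, y, 1)`; the explicit formulas are shown here to make the training step transparent.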
To obtain the best-fit regression line, the model minimizes the error difference between the predicted value and the true value. This difference is called the cost function. The most commonly used cost function for linear regression is the Root Mean Squared Error (RMSE):

RMSE = √( (1/n) · Σᵢ (ŷᵢ − yᵢ)² )

where n is the number of data points, yᵢ is the observed value, and ŷᵢ is the predicted value.
Because of the cost function form, this technique is called the least-squares method.
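The RMSE formula translates directly into code. A minimal sketch with made-up numbers:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# RMSE: square the residuals, average them, then take the square root.
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
print(rmse)  # about 0.158
```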
To train the model with a given cost function, we need the next important concept: gradient descent. The idea is to start with random θ₀ and θ₁ values and then iteratively update them, moving toward the minimum cost. Gradient descent tells us how to change the parameter values. As the method's name suggests, gradients of the cost function are used. To find these gradients, partial derivatives of the cost J are taken with respect to θ₀ and θ₁. The parameter values for the next iteration are computed from the current values and the calculated gradients:

θ₀ := θ₀ − α · ∂J/∂θ₀
θ₁ := θ₁ − α · ∂J/∂θ₁

where α is the learning rate that controls the step size.
The linear regression cost function is convex and has a simple form; its minimum can even be found analytically, without gradient descent. However, in more complicated cases with a non-convex cost function, gradient descent can get trapped in a local minimum, and a careful choice of the learning rate helps to avoid this.
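The iterative updates described above can be sketched as a short loop. Here the gradients are those of the mean squared error (the square root in RMSE does not change the location of the minimum), and the learning rate and iteration count are hand-picked for this toy data:

```python
import numpy as np

# Same toy data as before: y is roughly 2x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.2])

theta_0, theta_1 = 0.0, 0.0  # start from arbitrary initial values
alpha = 0.05                 # learning rate (an assumed, hand-picked value)

for _ in range(5000):
    y_hat = theta_0 + theta_1 * x
    error = y_hat - y
    # Partial derivatives of the MSE cost with respect to theta_0 and theta_1.
    grad_0 = 2 * np.mean(error)
    grad_1 = 2 * np.mean(error * x)
    # Step against the gradient direction.
    theta_0 -= alpha * grad_0
    theta_1 -= alpha * grad_1

print(theta_0, theta_1)  # converges to the analytic least-squares solution
```

Too large an α makes the updates overshoot and diverge; too small an α makes convergence needlessly slow.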
We have seen the concept of simple linear regression, where a single feature x was used to predict the target y. In many applications, more than one factor influences the response. Multiple linear regression models thus describe how a single response depends linearly on several predictors. The term was first used by Pearson in 1908. The model has the following form:

y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε

where x₁, …, xₖ are the features, β₀, …, βₖ are the coefficients, and ε is the error term.
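Fitting such a model amounts to solving a least-squares problem with a design matrix that has one column per feature plus a column of ones for the intercept. A minimal sketch on synthetic data with two predictors:

```python
import numpy as np

# Synthetic data: y = 1 + 2*x1 - 3*x2 plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.01 * rng.normal(size=100)

# Prepend a column of ones so the intercept beta_0 is estimated too.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [1, 2, -3]
```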
Multiple regression is based on the following assumptions:
- The relationship between the variables is linear. In practice, this assumption is rarely fulfilled exactly. However, if curvature in the relationships is evident, you may consider a feature transformation.
- Residuals (the differences between predicted and observed values) have a Gaussian (normal) distribution. In many cases, this assumption can be replaced by asymptotic normality, provided by the central limit theorem.
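The linearity assumption and the transformation remedy can be illustrated on synthetic exponential-growth data, where a log transform of the target straightens the relationship. This is a sketch under assumed data, with a hypothetical helper `fit_rmse`:

```python
import numpy as np

# Exponential-growth data: y vs. x is clearly curved, violating linearity.
x = np.linspace(1.0, 5.0, 50)
rng = np.random.default_rng(1)
y = 2.0 * np.exp(0.8 * x) * np.exp(0.05 * rng.normal(size=50))

def fit_rmse(feature, target):
    """Fit a line by least squares and return the RMSE of its residuals."""
    A = np.column_stack([np.ones_like(feature), feature])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residuals = target - A @ coef
    return np.sqrt(np.mean(residuals ** 2))

# log y = log 2 + 0.8 x + noise is linear in x, so the transformed
# target is fitted far better by a straight line than the raw one.
print(fit_rmse(x, y))          # large error on the raw scale
print(fit_rmse(x, np.log(y)))  # small error after the transform
```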
This technique tempts one to use as many features as possible, and usually at least a few of them turn out to be significant. There is a variant of regression suited to this situation, stepwise regression, in which features are added to (or removed from) the model step by step; the best result across all steps is then chosen. The number of features is restricted by the number of observations: you cannot add endless features to the model. Most authors recommend having at least 10 to 20 times as many observations as features. Otherwise, the regression estimates will be unstable, and the model will have poor predictive capacity after learning.
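Forward stepwise selection can be sketched as a greedy loop: at each step, add the candidate feature that lowers the training RMSE the most, stopping when no candidate improves the fit appreciably. The data, the helper `rmse_with`, and the stopping threshold below are illustrative assumptions, not a standard API:

```python
import numpy as np

# Hypothetical setup: 6 candidate features, only two of them truly matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

def rmse_with(features):
    """RMSE of a least-squares fit using the given feature indices."""
    A = np.column_stack([np.ones(len(X))] + [X[:, j] for j in features])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sqrt(np.mean((y - A @ coef) ** 2))

# Forward stepwise selection: greedily add the feature that lowers RMSE most.
selected, remaining = [], list(range(6))
current = rmse_with(selected)
while remaining:
    best = min(remaining, key=lambda j: rmse_with(selected + [j]))
    best_rmse = rmse_with(selected + [best])
    if current - best_rmse < 0.01:  # assumed improvement threshold
        break
    selected.append(best)
    remaining.remove(best)
    current = best_rmse

print(sorted(selected))  # the two informative features should be picked
```

In practice the improvement would be judged on held-out data or with a criterion such as AIC, rather than on training RMSE alone, to avoid the overfitting discussed above.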
It is essential to understand the most fundamental limitation of the regression approach: it gives tools to discover relationships between features and targets, but we can never be completely sure about the underlying causal mechanism. Regression analysis can only give food for thought for understanding the reasons behind the discovered relations.