General information

Linear regression is one of the simplest ML models for regression tasks. It models a linear dependence between one dependent variable (the target) and one or more independent variables (features).

Linear Regression fits a linear model with coefficients $w = (w_1, \ldots, w_n)$ that minimizes the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.
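As a minimal sketch of the least-squares idea, the Python snippet below fits a single-feature model in closed form and compares its residual sum of squares (RSS) with that of a hand-picked line. The function names and the sample data are illustrative assumptions, not the brick's implementation, which works with multiple features.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted targets."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data: one feature, four observations
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_simple_ols(xs, ys)
rss_fitted = residual_sum_of_squares(xs, ys, slope, intercept)
rss_guess = residual_sum_of_squares(xs, ys, 2.0, 1.0)  # a hand-picked line

print(slope, intercept, rss_fitted, rss_guess)
```

The fitted line never has a larger RSS than any other line on the same data, which is what "minimizes the residual sum of squares" means in practice.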

Description

Brick Locations

Bricks → Machine Learning → Linear Regression

Brick Parameters

Simple mode

Regularization

Regularization is a technique that adds a penalty term on the model coefficients to the error function, which reduces overfitting. The model supports the following types of regularization:
- Lasso Regression (L1): Lasso (Least Absolute Shrinkage and Selection Operator) is a type of linear regression that uses shrinkage. Shrinkage means that estimates are pulled towards a central point; in Lasso, the coefficients are pulled towards zero, and some of them become exactly zero. The lasso procedure therefore encourages simple, sparse models (i.e. models with fewer parameters). This type of regression is well suited to models showing high levels of multicollinearity or to cases where you want to automate parts of model selection, such as variable selection or parameter elimination. The cost function for Lasso regression is:
  
  $$ \sum_{i=1}^n\Big(y_i-\sum_{j}x_{ij} w_j\Big)^2+\lambda\sum_{j}|w_j| $$
- Ridge Regression (L2): also known as Tikhonov regularization, ridge regression shrinks the coefficients, which helps to reduce model complexity and the effects of multicollinearity. The cost function for Ridge regression is:
  
  $$ \sum_{i=1}^n\Big(y_i-\sum_{j}x_{ij} w_j\Big)^2+\lambda\sum_{j}w_j^2 $$
- ElasticNet: linearly combines the L1 and L2 penalties of the Lasso and Ridge methods.

In both cost functions, $x_{ij}$ is the value of feature $j$ for observation $i$, and $\lambda \ge 0$ controls the strength of the penalty.
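To make the three penalties concrete, here is a minimal Python sketch that evaluates each cost for a fixed weight vector. The function names, the sample data, and the `l1_ratio` mixing parameter used for ElasticNet are assumptions made for illustration; the exact weighting the brick uses is not stated here.

```python
def predict(row, weights):
    """Linear prediction: sum of feature values times their weights."""
    return sum(x * w for x, w in zip(row, weights))


def rss(X, y, weights):
    """Residual sum of squares over all observations."""
    return sum((target - predict(row, weights)) ** 2 for row, target in zip(X, y))


def lasso_cost(X, y, weights, lam):
    """RSS plus the L1 penalty: lam * sum(|w_j|)."""
    return rss(X, y, weights) + lam * sum(abs(w) for w in weights)


def ridge_cost(X, y, weights, lam):
    """RSS plus the L2 penalty: lam * sum(w_j ** 2)."""
    return rss(X, y, weights) + lam * sum(w * w for w in weights)


def elastic_net_cost(X, y, weights, lam, l1_ratio):
    """RSS plus a linear mix of both penalties (assumed parameterization)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return rss(X, y, weights) + lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)


# Hypothetical data that the weights below fit exactly, so RSS is 0
X = [[1.0, 2.0], [3.0, 4.0]]
y = [5.0, 11.0]
weights = [1.0, 2.0]

lasso = lasso_cost(X, y, weights, lam=0.5)
ridge = ridge_cost(X, y, weights, lam=0.5)
elastic = elastic_net_cost(X, y, weights, lam=0.5, l1_ratio=0.5)

print(lasso, ridge, elastic)
```

Because the weights fit the sample data exactly, each printed value is the penalty term alone, which shows how the same coefficients are charged differently under each scheme.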

Target variable

The column that contains the values the model should predict.

Disallow negative predictions

When this checkbox is selected, negative predicted values are replaced with 0.
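The effect can be sketched in a few lines of Python, assuming the option simply clips each prediction at zero after the model has produced it. The helper name and the sample values are hypothetical.

```python
def clip_negative(predictions):
    """Replace every negative prediction with 0.0 (assumed behaviour)."""
    return [max(0.0, p) for p in predictions]


# Hypothetical raw model outputs
raw_predictions = [12.5, -3.2, 0.0, -0.1, 7.0]
clipped = clip_negative(raw_predictions)

print(clipped)
```

This is useful for targets that cannot be negative, such as counts or prices.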

Columns

Columns from the dataset that are ignored during training. They are still present in the resulting dataset. Multiple columns can be selected by clicking the + button.

If you want to remove a large number of columns, you can select the columns to keep and use the ‘Remove all except selected’ flag.

Advanced mode

Advanced mode has the same parameters as simple mode, plus one additional parameter:

Train Explainer

If checked, a model explainer is built for use through the API.

Brick Inputs/Outputs

Inputs

The brick takes a dataset.

Outputs

- A dataset with an extra column containing the target values predicted by the model
- A trained model that can be used as an input to other bricks

Additional Features

Model performance

Lets you check the model's performance metrics so that you can adjust your pipeline if needed. Available after a successful brick run.

Supported metrics: MAPE (Mean Absolute Percentage Error), $R^2$ (coefficient of determination), RMSE (Root Mean Square Error).
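For reference, the three metrics can be computed as in the minimal Python sketch below. It uses the standard textbook definitions; the function names and the sample values are illustrative, and reporting MAPE as a percentage (rather than a fraction) is an assumption about how the brick displays it.

```python
import math


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent. Undefined if any true value is 0."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted targets
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

mape_value = mape(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
r2_value = r2(y_true, y_pred)

print(mape_value, rmse_value, r2_value)
```

Lower MAPE and RMSE indicate a better fit, while $R^2$ closer to 1 indicates that the model explains more of the variance in the target.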