Linear regression is one of the simplest machine learning models for regression tasks. It models a linear relationship between one dependent variable (the target) and one or more independent variables (features).
Linear Regression fits a linear model with coefficients $w = (w_1, \ldots, w_n)$ to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.
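As an illustration of the least-squares fit described above, here is a minimal sketch in plain Python that solves the normal equations for a tiny dataset. The function names `fit_ols` and `predict` are hypothetical, the intercept term is an added assumption, and this is not the brick's actual implementation.

```python
def fit_ols(X, y):
    """Return [intercept, w_1, ..., w_n] minimizing the residual sum of squares."""
    # Assumption: an intercept is fitted, so prepend a constant 1.0 to each row.
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Build the normal equations (A^T A) w = A^T y as an augmented matrix.
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coef, X):
    """Apply the fitted coefficients to new rows."""
    return [coef[0] + sum(w * x for w, x in zip(coef[1:], row)) for row in X]


# Toy data generated from y = 1 + 2*a + 3*b, so the fit should recover 1, 2, 3.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in X]
coef = fit_ols(X, y)
```

On noise-free data like this the recovered coefficients match the generating ones up to floating-point error; on real data the fit is the best linear approximation in the least-squares sense.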
Bricks → Machine Learning → Linear Regression
Simple mode
Regularization
Regularization adds a penalty term on the model coefficients to the error function, which constrains the coefficients and reduces overfitting. The model supports the following types of regularization:
Lasso Regression (L1) - Least Absolute Shrinkage and Selection Operator, a type of linear regression that uses shrinkage: coefficient values are shrunk towards zero, and some can become exactly zero. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This type of regression is well suited for data with high levels of multicollinearity or for automating parts of model selection, such as variable selection/parameter elimination. The cost function for Lasso regression is:
$$ \sum_{i=1}^n(y_i-\hat{y}_i)^2+\lambda\sum_{j}|w_j|=\sum_{i=1}^n\Big(y_i-\sum_{j}x_{ij} \cdot w_j\Big)^2+\lambda\sum_{j}|w_j| $$
Ridge Regression (L2) - also known as Tikhonov regularization. Ridge regression shrinks the coefficients towards zero, which helps to reduce model complexity and the effect of multicollinearity. The cost function for Ridge regression is:
$$ \sum_{i=1}^n(y_i-\hat{y}_i)^2+\lambda\sum_{j}w_j^2=\sum_{i=1}^n\Big(y_i-\sum_{j}x_{ij} \cdot w_j\Big)^2+\lambda\sum_{j}w_j^2 $$
ElasticNet - linearly combines the L1 and L2 penalties of the Lasso and Ridge methods.
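The three penalties can be compared by evaluating each cost on the same coefficients. The sketch below is illustrative only: the function names are hypothetical, and the ElasticNet mixing parameter `l1_ratio` and its exact weighting are assumptions, since this page does not state the formula the brick uses.

```python
def rss(X, y, w):
    """Residual sum of squares for predictions sum_j x_ij * w_j."""
    return sum(
        (yi - sum(xij * wj for xij, wj in zip(row, w))) ** 2
        for row, yi in zip(X, y)
    )


def lasso_cost(X, y, w, lam):
    # L1 penalty: lambda times the sum of absolute coefficient values.
    return rss(X, y, w) + lam * sum(abs(wj) for wj in w)


def ridge_cost(X, y, w, lam):
    # L2 penalty: lambda times the sum of squared coefficient values.
    return rss(X, y, w) + lam * sum(wj ** 2 for wj in w)


def elastic_net_cost(X, y, w, lam, l1_ratio):
    # Assumed form: a convex combination of the L1 and L2 penalties.
    l1 = sum(abs(wj) for wj in w)
    l2 = sum(wj ** 2 for wj in w)
    return rss(X, y, w) + lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)


X = [[1.0, 2.0], [3.0, 4.0]]
y = [1.0, 2.0]
w = [0.5, -0.5]

base = rss(X, y, w)                          # 8.5
lasso = lasso_cost(X, y, w, lam=2.0)         # 8.5 + 2 * 1.0 = 10.5
ridge = ridge_cost(X, y, w, lam=2.0)         # 8.5 + 2 * 0.5 = 9.5
enet = elastic_net_cost(X, y, w, lam=2.0, l1_ratio=0.5)  # 8.5 + 2 * 0.75 = 10.0
```

With `l1_ratio=1.0` the assumed ElasticNet cost reduces to the Lasso cost, and with `l1_ratio=0.0` it reduces to the Ridge cost.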
Target variable
The column that contains the values the model should predict.
Disallow negative predictions
If checked, negative predictions are replaced with 0.
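The effect of this option can be pictured as clipping the raw predictions at zero. A minimal sketch, assuming the brick post-processes predictions this way (the helper name `clip_negative` is hypothetical):

```python
def clip_negative(predictions):
    """Replace every negative prediction with 0.0 and keep the rest unchanged."""
    return [max(0.0, p) for p in predictions]


raw = [12.5, -3.2, 0.0, -0.1, 7.0]
clipped = clip_negative(raw)
```

This is useful for targets that cannot be negative by definition, such as counts or prices.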
Columns
Columns from the dataset that are ignored during training. They are still present in the resulting dataset. Multiple columns can be selected by clicking the + button.
To exclude a large number of columns, select the columns to keep and enable the ‘Remove all except selected’ flag.
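The two selection modes can be sketched as follows. The function `select_feature_columns` and its parameters are hypothetical names used only for illustration, and dropping the target column from the feature list is an assumption about how the brick treats it.

```python
def select_feature_columns(all_columns, selected, target,
                           remove_all_except_selected=False):
    """Return the columns that would be used as features for training."""
    if remove_all_except_selected:
        # Flag enabled: keep only the selected columns.
        keep = [c for c in all_columns if c in selected]
    else:
        # Default: ignore the selected columns, keep everything else.
        keep = [c for c in all_columns if c not in selected]
    # Assumption: the target column is never used as a feature.
    return [c for c in keep if c != target]


columns = ["id", "area", "rooms", "city", "price"]

# Default mode: ignore "id" and "city" during training.
features_default = select_feature_columns(columns, {"id", "city"}, "price")

# 'Remove all except selected': train on "area" only.
features_keep = select_feature_columns(
    columns, {"area"}, "price", remove_all_except_selected=True
)
```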
Advanced mode
Advanced mode has the same parameters as simple mode, plus one additional parameter:
Train Explainer
If checked, a model explainer is built for use through the API.
Inputs
The brick takes a dataset as input.
Outputs
Model performance
Lets you check the model's performance (a.k.a. metrics) and adjust your pipeline if needed. Available after a successful brick run.
Supported metrics: MAPE (Mean Absolute Percentage Error), R2 (coefficient of determination), RMSE (Root Mean Square Error).
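For reference, the three metrics can be computed from the observed and predicted targets as in the sketch below. These are the standard textbook definitions; whether the brick reports MAPE as a fraction or as a percentage is not stated here, so the fraction form is an assumption.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (requires non-zero targets)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

mape_value = mape(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
r2_value = r2(y_true, y_pred)
```

Lower MAPE and RMSE indicate a better fit, while R2 closer to 1 indicates that the model explains more of the variance in the target.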