Random forest is an ensemble learning method based on the construction of a multitude of decision trees. It solves regression tasks by outputting the averaged prediction of the individual trees.
Bricks → Machine Learning → Random Forest Regression
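The averaging behavior described above can be sketched in plain Python; the per-tree values here are hypothetical placeholders, not output from a real trained model:

```python
# Hypothetical per-tree predictions for one input row; in a real model
# each value would come from a trained decision tree.
tree_predictions = [14.2, 13.8, 15.1, 14.5]

# Random forest regression: the ensemble prediction is the mean
# of the individual tree predictions.
ensemble_prediction = sum(tree_predictions) / len(tree_predictions)

print(ensemble_prediction)
```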
Target variable
Numeric target column to predict from the input dataset.
Optimize
This checkbox enables Bayesian hyperparameter optimization, which tunes the learning rate, the number of iterations, and the number of leaves to find the model configuration with the best metrics.
Be aware that this process is time-consuming.
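The brick's internal optimizer is not exposed, but the idea can be sketched with a simplified random search over the same three parameters. This is a stand-in for illustration only, not true Bayesian optimization, and the `score` function is a hypothetical toy objective rather than a real cross-validated model score:

```python
import random

def score(config):
    # Toy objective with a known optimum; a real run would train and
    # evaluate a model for each candidate configuration instead.
    return -((config["learning_rate"] - 0.1) ** 2
             + (config["num_iterations"] - 200) ** 2 / 1e4
             + (config["num_leaves"] - 31) ** 2 / 1e2)

random.seed(0)
best_config, best_score = None, float("-inf")
for _ in range(50):
    # Sample a candidate configuration from plausible ranges.
    config = {
        "learning_rate": random.uniform(0.01, 0.3),
        "num_iterations": random.randint(50, 500),
        "num_leaves": random.randint(8, 128),
    }
    s = score(config)
    if s > best_score:
        best_config, best_score = config, s

print(best_config)
```

Each candidate requires training and evaluating a full model, which is why enabling this option is time-consuming.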
Disallow negative predictions
This checkbox forces the model to replace negative predictions with 0.
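The effect of this option amounts to a simple post-processing step; the prediction values below are illustrative:

```python
# Hypothetical raw model predictions, some of which are negative.
raw_predictions = [12.3, -0.7, 4.1, -2.5]

# "Disallow negative predictions": replace any negative value with 0.
clipped = [max(0.0, p) for p in raw_predictions]

print(clipped)  # [12.3, 0.0, 4.1, 0.0]
```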
Filter Columns Settings - Columns
Columns from the dataset that are ignored during training but not removed from the dataset. Multiple columns can be selected by clicking the + button.
If you want to remove a large number of columns, you can instead select the columns to keep and enable the ‘Remove all except selected’ flag.
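The two selection modes above can be sketched as simple list filters; the column names are hypothetical:

```python
# Hypothetical dataset columns; the names are illustrative only.
columns = ["id", "age", "income", "city", "target"]

# Default mode: the selected columns are ignored during training.
ignored = {"id", "city"}
used_for_training = [c for c in columns if c not in ignored]

# 'Remove all except selected' mode: list the columns to keep instead.
keep = {"age", "income", "target"}
kept = [c for c in columns if c in keep]

print(used_for_training)  # ['age', 'income', 'target']
print(kept)               # ['age', 'income', 'target']
```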
The advanced mode features an additional set of parameters:
Learning rate
Boosting learning rate. This parameter controls how quickly or slowly the algorithm learns. Generally, a larger learning rate allows the model to learn faster, while a smaller learning rate tends to produce a better final model at the cost of longer training.
Number of boosted trees
The number of estimators (boosted trees) to fit.
Number of leaves
Maximum number of leaves for base learners. This is the main parameter for controlling model complexity: higher values can increase accuracy but might lead to overfitting.
Minimum data in leaves
The minimum number of samples required in a child (leaf).
Minimum Loss Reduction
The minimum loss reduction (gain) required to perform a split. Higher values can be used to speed up training.
Bagging Fraction
The fraction of data samples (rows) that is randomly selected for each iteration (tree).
Feature Fraction
The fraction of features (columns) that is randomly selected for each iteration (tree).
L1 regularization weight
Lasso Regression (L1, Least Absolute Shrinkage and Selection Operator) regularization term on weights.
L2 regularization weight
Ridge Regression (L2) regularization term on weights.
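Taken together, the advanced parameters map naturally onto the configuration of a gradient-boosting library. The sketch below assumes LightGBM-style parameter names, which is an assumption about the backend; the values are illustrative defaults, not tuned recommendations:

```python
# Hedged mapping of the advanced parameters above to LightGBM-style
# names (an assumption about the underlying library); values are
# illustrative, not recommendations.
params = {
    "learning_rate": 0.1,        # Learning rate
    "n_estimators": 100,         # Number of boosted trees
    "num_leaves": 31,            # Number of leaves
    "min_child_samples": 20,     # Minimum data in leaves
    "min_split_gain": 0.0,       # Minimum loss reduction
    "subsample": 0.8,            # Bagging fraction
    "colsample_bytree": 0.8,     # Feature fraction
    "reg_alpha": 0.0,            # L1 regularization weight
    "reg_lambda": 0.0,           # L2 regularization weight
}

print(sorted(params))
```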