This brick provides an easy interface for creating your own out-of-the-box, distributed gradient boosted decision tree model for multiclass classification tasks (you can use this brick for binary classification as well, but we recommend using LGBM Binary instead). Thanks to its leaf-wise tree growth, the created model can be trained efficiently on large datasets while giving strong results.
The models are built on three important principles: weak learners, gradient optimization, and boosting.
In this case, the weak learners are multiple sequential, specialized decision trees, each of which is trained to correct the errors made by the ensemble built before it.
All of those trees are trained by propagating the error gradients through the system.
The main drawback of gradient boosted trees in general is that finding the best split points in each tree node is both a time-consuming and memory-consuming operation; LightGBM addresses this by bucketing continuous feature values into discrete histogram bins before searching for splits.
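Below is a minimal sketch of what this kind of brick does under the hood, using the `lightgbm` Python package directly. The toy dataset and all parameter values are illustrative assumptions, not the brick's defaults; the brick's parameters appear to map onto the library's `learning_rate`, `n_estimators`, and `num_leaves` arguments.

```python
# Minimal sketch: training a LightGBM multiclass classifier directly with
# the `lightgbm` package. Dataset and parameter values are illustrative.
import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # a small 3-class toy dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each boosting iteration adds one more weak learner (a small decision
# tree) fitted to the error gradients of the ensemble built so far.
model = lgb.LGBMClassifier(
    objective="multiclass",
    learning_rate=0.1,  # assumed counterpart of the Learning rate parameter
    n_estimators=100,   # assumed counterpart of the Number of iterations parameter
    num_leaves=31,      # assumed counterpart of the Number of leaves parameter
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy on the hold-out set
```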
Bricks → Analytics → Data Mining / AI → Classification Models → LGBM Multiclass
Learning rate
Boosting learning rate. This parameter controls how quickly or slowly the algorithm learns the problem. Generally, a larger learning rate allows the model to learn faster, while a smaller learning rate tends to yield a more optimal result at the cost of requiring more boosting iterations.
Number of iterations
The number of boosting iterations. It is recommended to set this parameter inversely to the selected learning rate: when you decrease one, increase the other, as the sketch below shows.
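As an illustration of this trade-off (the values are assumptions, not recommendations), two roughly comparable configurations of the underlying library might look like this:

```python
import lightgbm as lgb

# Fewer, larger steps: trains faster, may stop short of the best optimum.
fast = lgb.LGBMClassifier(learning_rate=0.2, n_estimators=50)

# More, smaller steps: slower to train, usually generalizes a bit better.
slow = lgb.LGBMClassifier(learning_rate=0.05, n_estimators=200)
```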
Number of leaves
The main parameter for controlling model complexity. Higher values increase the model's capacity, which can improve accuracy but also raises the risk of overfitting; see the sketch below.
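For illustration, here is how this knob might be varied through the underlying library; the values are assumptions, and the LightGBM documentation additionally advises keeping `num_leaves` below `2**max_depth` when a maximum depth is also set.

```python
import lightgbm as lgb

# Lower capacity: simpler trees, lower risk of overfitting.
simple = lgb.LGBMClassifier(num_leaves=15)

# Higher capacity: can fit more structure, so watch the validation score.
flexible = lgb.LGBMClassifier(num_leaves=255)
```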
Prediction mode
This parameter specifies the format in which the model returns predictions of the target variable:
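The brick's exact option names are not reproduced here. As an illustration of the distinction, the underlying library exposes the two usual output formats; `model` and `X_test` refer to the training sketch above.

```python
labels = model.predict(X_test)        # hard class labels, shape (n_samples,)
probas = model.predict_proba(X_test)  # class probabilities, shape (n_samples, n_classes)
```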