XGBoost is an ensemble learning algorithm that applies gradient boosting to decision trees and uses a second-order approximation of the scoring function. This approximation allows XGBoost to calculate the optimal “if” condition (split) and its impact on performance. XGBoost can then keep these results in memory and reuse them when building the next decision tree instead of recomputing them.
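As a rough illustration of the second-order approximation, the sketch below assumes the binary logistic loss and an L2 penalty `lam` on leaf values (both are assumptions made for the example, not settings exposed by this brick). For every row it computes the first and second derivative of the loss, and the best value for a leaf is then -G / (H + lam), where G and H are the sums of those derivatives over the rows in the leaf.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grad_hess(y_true, raw_score):
    """First and second derivative of the log loss w.r.t. the raw score."""
    p = sigmoid(raw_score)
    return p - y_true, p * (1.0 - p)

def leaf_weight(grads, hessians, lam=1.0):
    """Leaf value that minimizes the second-order approximation of the loss."""
    return -sum(grads) / (sum(hessians) + lam)

# Four rows that fall into the same leaf; the model currently predicts 0.0 for all.
labels = [1, 1, 1, 0]
scores = [0.0, 0.0, 0.0, 0.0]

pairs = [grad_hess(y, s) for y, s in zip(labels, scores)]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]

weight = leaf_weight(grads, hessians)
print(weight)  # positive, because most rows in the leaf belong to class 1
```

The leaf value is positive here because three of the four rows are positive examples, so the next tree pushes the raw score for these rows upward.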
While training, the XGBoost algorithm constructs a graph that examines the input under various “if” statements (vertices in the graph). Whether an “if” condition is satisfied determines the next “if” condition and the eventual prediction. XGBoost progressively adds more “if” conditions to the decision tree to build a stronger model. In doing so, the algorithm increases the number of tree levels, which amounts to a level-wise tree growth approach.
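The “if” structure described above can be pictured with a small hand-written sketch. The two trees, the feature names `age` and `income`, the thresholds, and the leaf values are all hypothetical; a trained model would learn them from data. The sketch also assumes a binary task, where the summed tree outputs are turned into a probability with the logistic function.

```python
import math

def tree_1(row):
    # Each "if" is one vertex; the branch taken decides which "if" comes next.
    if row["age"] < 30:
        if row["income"] < 40000:
            return -0.4
        return 0.1
    return 0.3

def tree_2(row):
    if row["income"] < 60000:
        return -0.2
    return 0.25

def predict_proba(row, trees, base_score=0.0):
    """Sum the leaf values of all trees and squash the result into (0, 1)."""
    raw = base_score + sum(tree(row) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw))

row = {"age": 42, "income": 75000}
probability = predict_proba(row, [tree_1, tree_2])
print(round(probability, 3))
```

Each additional tree contributes one more term to the sum, which is how the ensemble becomes stronger than any single tree.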
XGBoost learns a model faster than many other machine learning models (especially among the other ensemble methods) and works well on categorical data and limited datasets.
This specific brick solves the classification task. If you need to solve a regression task, then you may be interested in the ‣
Bricks → Machine Learning → XGBoost Classification
Prediction mode
This parameter specifies the model's prediction format of the target variable:
Target Variable
The column that has the values you are trying to predict. Note that the column must contain only categorical values and no missing values; otherwise, a corresponding error message will be shown.
Optimize
This checkbox enables Bayesian hyperparameter optimization, which tunes the learning rate, the number of iterations, and the number of leaves to find the model configuration with the best metrics.
Be aware that this process is time-consuming.
Filter Columns
If you have columns in your data that need to be ignored (but not removed from the data set) during the training process (and later during the predictions), you should specify them in this parameter. To select multiple columns, click the '+' button in the brick settings.
In addition, you can ignore all columns except the ones you specified by enabling the "Remove all except selected" option. This may be useful if you have a large number of columns while the model should be trained just on some of them.
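The effect of the two filtering modes can be sketched as follows. The helper `select_training_columns` and the column names are hypothetical and only illustrate the behavior described above; the sketch also assumes that the target column is never used as a feature.

```python
def select_training_columns(all_columns, selected, target, keep_only_selected=False):
    """Return the feature columns used for training, in their original order."""
    if keep_only_selected:
        # "Remove all except selected": train only on the listed columns.
        return [c for c in all_columns if c in selected and c != target]
    # Default mode: ignore the listed columns, train on everything else.
    return [c for c in all_columns if c not in selected and c != target]

columns = ["id", "age", "income", "city", "churned"]

ignored_mode = select_training_columns(columns, ["id"], target="churned")
keep_mode = select_training_columns(
    columns, ["age", "income"], target="churned", keep_only_selected=True
)
print(ignored_mode)
print(keep_mode)
```

In both modes the data set itself is left unchanged; only the list of columns passed to the model differs.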
Tree method
The tree construction algorithm. Supports the following options:
Learning rate
Boosting learning rate. This parameter controls how much each new tree contributes to the model, and therefore how quickly the algorithm learns a problem. Generally, a larger learning rate lets the model learn faster, while a smaller learning rate tends to give a better final result, provided enough boosting iterations are used.
Number of boosted iterations
The number of boosting iterations. It is recommended to set this parameter inversely to the selected learning rate: when you decrease one, increase the other.
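The inverse relationship can be seen in an idealized sketch. It assumes that every tree fits the current residual perfectly, so each iteration removes a fraction of the remaining error equal to the learning rate. Real trees fit the residual only approximately, so the numbers below are illustrative rather than a tuning rule.

```python
def iterations_to_reach(residual_goal, learning_rate):
    """Count iterations until the remaining error fraction drops to the goal."""
    remaining = 1.0
    iterations = 0
    while remaining > residual_goal:
        # Each tree corrects only `learning_rate` of what is still wrong.
        remaining *= 1.0 - learning_rate
        iterations += 1
    return iterations

slow = iterations_to_reach(0.01, learning_rate=0.1)
fast = iterations_to_reach(0.01, learning_rate=0.3)
print(slow, fast)  # 44 13
```

Under this assumption, dropping the learning rate from 0.3 to 0.1 requires roughly three times as many iterations to reach the same error level.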
Gamma
Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger this parameter is, the more conservative the algorithm will be.
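A minimal sketch of how this threshold acts on a candidate split, using the standard second-order gain formula. The gradient and Hessian sums and the L2 penalty `lam` are made-up example values, not settings of this brick.

```python
def score(grad_sum, hess_sum, lam=1.0):
    """Quality score of a node built from its gradient and Hessian sums."""
    return grad_sum ** 2 / (hess_sum + lam)

def split_gain(g_left, h_left, g_right, h_right, gamma=0.0, lam=1.0):
    """Loss reduction from splitting a leaf, minus the gamma penalty."""
    parent = score(g_left + g_right, h_left + h_right, lam)
    children = score(g_left, h_left, lam) + score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma

# The same candidate split, evaluated with a permissive and a strict gamma.
permissive = split_gain(-2.0, 1.0, 2.0, 1.0, gamma=0.0)
strict = split_gain(-2.0, 1.0, 2.0, 1.0, gamma=3.0)

print(permissive > 0)  # True: the split is kept
print(strict > 0)      # False: the split is rejected
```

The split itself is identical in both cases; only the bar it has to clear changes, which is why a larger Gamma yields smaller, more conservative trees.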
Maximum depth
Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit.
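For readers who know the open-source XGBoost library, the settings above roughly correspond to the following parameters of its scikit-learn style classifier. The mapping is an assumption, since this page does not state how the brick passes its settings to the library, and the values are arbitrary examples.

```python
# Assumed mapping from brick settings to XGBoost parameter names.
params = {
    "tree_method": "hist",   # Tree method
    "learning_rate": 0.1,    # Learning rate
    "n_estimators": 200,     # Number of boosted iterations
    "gamma": 0.0,            # Gamma
    "max_depth": 6,          # Maximum depth
}

# With level-wise growth, a tree of depth d has at most 2**d leaves,
# so every extra level can double the number of leaves.
max_leaves_per_tree = 2 ** params["max_depth"]
print(max_leaves_per_tree)  # 64

# If the xgboost package is installed, the dictionary could be used as:
#   from xgboost import XGBClassifier
#   model = XGBClassifier(**params)
```

The leaf count shows why depth drives complexity: going from depth 6 to depth 8 raises the upper bound from 64 to 256 leaves per tree.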