Classes weights can be used in order to balance the dataset. They give all the classes equal importance on gradient updates when model is training regardless of how many samples we have from each class in the training data. This prevents models from predicting the more frequent class more often just because it’s more common.
Bricks → Machine Learning → Classes Weighting
Target variable
The column which contains target variable of classes.
Weights adjustment
In order to activate it, run the pipeline.
There you can enter the weights or use ‘Auto Suggestion’ to do it in the optimized way.
Inputs
Brick takes the classified dataset.
Outputs
Brick produces the dataset with a weighted column.
Let's consider the binary classification problem Binary classification : Titanic
The dataset is unbalanced since Survived column contains less zero than one. That is why it may be useful to do classes weighting before using machine learning for classification.
We use simple Logistic Regression to classify data and get following result. We notice that even for class 1 we get more predictions of 0 because of imbalance in initial data.
Let’s add Class Weighting brick to see what will change.
As a ‘Target variable’ we choose ‘survived’ and use ‘Auto Suggestion’.