General information

Isolation Forest is an unsupervised ensemble learning algorithm for anomaly detection. It works on the principle of isolating anomalies in the leaves of randomly built trees.

Isolation Forest isolates observations by randomly selecting a feature and then randomly selecting a split value between the minimum and maximum values of the selected feature.

The path length from the root node to the terminating node, averaged over a forest of such random trees, is a measure of normality and serves as the decision function.

Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produces shorter path lengths for particular samples, those samples are very likely to be anomalies.
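The idea described above can be illustrated with a minimal sketch in plain Python. It is a simplified illustration, not the Brick's implementation: the function names (isolation_path_length, average_path_length) and the toy data are hypothetical, each "tree" is followed only along the path of a single point, and the full data is used instead of a random subsample.

```python
import random


def isolation_path_length(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Randomly select a feature, then a split value between its min and max.
    feature = rng.randrange(len(point))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:
        return depth  # simplification: stop if the chosen feature cannot be split
    split = rng.uniform(lo, hi)
    # Keep only the samples that fall on the same side of the split as `point`.
    if point[feature] < split:
        same_side = [row for row in data if row[feature] < split]
    else:
        same_side = [row for row in data if row[feature] >= split]
    return isolation_path_length(point, same_side, rng, depth + 1, max_depth)


def average_path_length(point, data, n_trees=100, seed=0):
    """Average the path length of `point` over `n_trees` random partitionings."""
    rng = random.Random(seed)
    total = sum(isolation_path_length(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Toy data (hypothetical): a dense 2-D cluster, one central point, one far-away point.
rng = random.Random(42)
inlier = (0.0, 0.0)
outlier = (8.0, 8.0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [inlier, outlier]

inlier_avg = average_path_length(inlier, data)
outlier_avg = average_path_length(outlier, data)
print(f"average path length, inlier:  {inlier_avg:.2f}")
print(f"average path length, outlier: {outlier_avg:.2f}")
```

In this sketch the far-away point should need far fewer random splits to be separated from the rest than the point in the middle of the cluster, which is the property the algorithm relies on.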

Description

Brick Locations

Bricks → Machine Learning → Isolation Forest

Brick Parameters

Brick Inputs/Outputs

Example of usage

Let’s try to detect anomalies in the data from the ‘segmentation_moons.csv’ dataset using the Isolation Forest algorithm. The dataset consists of three columns: ‘Unnamed: 0’, ‘0’ and ‘1’.

We can connect this dataset directly to the Isolation Forest Brick and keep the default parameter values: ‘Number of estimators’ equal to 100 and ‘Contamination’ equal to 0.1. We should also filter out the column ‘Unnamed: 0’, as it only holds the index of the record and doesn’t represent any feature of the sample.
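For readers who want to reproduce a comparable setup outside the Brick, the following is a rough sketch using scikit-learn's IsolationForest. Several things here are assumptions: that scikit-learn's n_estimators and contamination correspond to the Brick's ‘Number of estimators’ and ‘Contamination’, and that a two-moons dataset generated with make_moons is a reasonable stand-in for the feature columns ‘0’ and ‘1’ of ‘segmentation_moons.csv’ (the file itself is not loaded, and the index column ‘Unnamed: 0’ is simply never created).

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import IsolationForest

# Stand-in data (assumption): two feature columns, playing the role of '0' and '1'.
X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

# Same parameter values as in the example: 100 estimators, contamination 0.1.
model = IsolationForest(n_estimators=100, contamination=0.1, random_state=0)

# fit_predict returns 1 for samples considered normal and -1 for anomalies.
labels = model.fit_predict(X)

# decision_function returns a score per sample; negative scores are the
# samples predicted as anomalies.
scores = model.decision_function(X)

n_anomalies = int((labels == -1).sum())
print(f"flagged {n_anomalies} of {len(X)} samples as anomalies")
```

With a contamination of 0.1, roughly 10% of the samples should be flagged, since that parameter sets the score threshold used to separate anomalies from normal samples.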