- non-linear ensemble method, used to better represent a non-linear feature space
- basically any feature space where the data is not arranged in an apparently linear fashion
- since trees don’t use coefficients, they handle non-linear data easily and avoid the problems that all linear models face (see the sketch after this list):
- no need for regularisation across data features
- no problem of collinearity among features
- no problems from non-linear data
- no problems from correlated or non-constant-variance error terms (linear regression assumes independent errors with constant variance; violations show up as patterns such as funnel-shaped residual plots)
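A quick sketch of the non-linearity point, assuming scikit-learn and NumPy with synthetic data (none of this is from the notes themselves): a linear model fails badly on a sine-shaped target, while a shallow regression tree fits it with no linear assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # deliberately non-linear target

linear = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

# the tree's R^2 should be far higher: it carves the space into regions
# instead of forcing a single global linear fit
print("linear R^2:", linear.score(X, y))
print("tree   R^2:", tree.score(X, y))
```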
How are they made?
- similar to decision trees, we follow a top-down approach, starting with any single feature
- with this single-node tree we go through the data, picking a candidate threshold and computing the sum of squared residuals (SSR) of the two resulting groups, each predicted by its mean
- we then slide the threshold across the data, recomputing the SSR each time, and keep the threshold with the smallest value
- this search is repeated for every feature in the feature space, and the root node is assigned to the feature (and threshold) with the lowest SSR
- from there the same procedure is applied recursively to each branch until every path ends in a leaf node (split search sketched below)
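A bare-bones sketch of the split search for a single feature; `best_split` is a hypothetical helper name of my own, not a library function. To grow a full tree you would run this over every feature and recurse into each side of the winning split.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, ssr) minimising the sum of squared residuals for one feature."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (None, np.inf)
    # candidate thresholds: midpoints between consecutive sorted values
    for i in range(1, len(x)):
        t = (x[i - 1] + x[i]) / 2
        left, right = y[:i], y[i:]
        # each side is predicted by its mean, so SSR measures split quality
        ssr = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if ssr < best[1]:
            best = (t, ssr)
    return best

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.1, 0.9, 1.0, 5.2, 4.8, 5.0])
print(best_split(x, y))  # threshold ~6.5 cleanly separates the two clusters
```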

<aside>
💡
even though this would grow an impeccable tree that predicts the training data PERFECTLY, that’s a trap, the Bias-Variance Trap: zero bias on the training set, huge variance on new data
- this can be mitigated by setting a threshold on the minimum number of data points required per split (see the snippet below)
</aside>
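In scikit-learn terms this guard corresponds to the `min_samples_split` / `min_samples_leaf` parameters; a minimal illustration (the values 20 and 10 are arbitrary picks, not recommendations):

```python
from sklearn.tree import DecisionTreeRegressor

# put a floor on node size so leaves can't shrink to single memorised points
tree = DecisionTreeRegressor(min_samples_split=20, min_samples_leaf=10)
```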
How to prune regression trees?
- a method to prevent overfitting to the training data
- it usually works by replacing an internal node with a leaf, keeping the replacement as long as accuracy on held-out test data improves (toy sketch below)
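A toy sketch of this replace-node-with-leaf idea (often called reduced-error pruning), under assumptions of my own: the tree is nested dicts where internal nodes carry `feature`, `threshold`, `mean` (training mean of the node’s region), `left`, `right`, and leaves carry only `value`. None of these names come from a library.

```python
import numpy as np

def predict(node, x):
    if "value" in node:                      # leaf: constant prediction
        return node["value"]
    branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
    return predict(node[branch], x)

def val_sse(node, X_val, y_val):
    # sum of squared errors of this (sub)tree on held-out data
    preds = np.array([predict(node, x) for x in X_val])
    return ((y_val - preds) ** 2).sum()

def prune(node, X_val, y_val):
    """Bottom-up: collapse a subtree to a leaf whenever validation error doesn't worsen."""
    if "value" in node:
        return node
    # route validation points the same way the tree routes training points
    mask = X_val[:, node["feature"]] <= node["threshold"]
    node["left"] = prune(node["left"], X_val[mask], y_val[mask])
    node["right"] = prune(node["right"], X_val[~mask], y_val[~mask])
    leaf = {"value": node["mean"]}           # candidate leaf: region's training mean
    if val_sse(leaf, X_val, y_val) <= val_sse(node, X_val, y_val):
        return leaf                          # collapsing did not hurt, so prune
    return node

# tiny hand-built tree splitting feature 0 at 5.0
tree = {"feature": 0, "threshold": 5.0, "mean": 3.0,
        "left": {"value": 1.0}, "right": {"value": 5.0}}
X_val = np.array([[2.0], [8.0]])
y_val = np.array([1.1, 4.9])
print(prune(tree, X_val, y_val))  # the split helps here, so it is kept
```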
Cost Complexity Pruning

- take all the sub-trees formed by this pruning procedure and calculate their respective sums of squared residuals
Tree Score
$$
\text{Tree Score} = SSR + \alpha T
$$
- where $\alpha$ is a hyperparameter and $T$ is the number of leaves, so the $\alpha T$ term penalises trees with more leaves
- then pick the sub-tree with the lowest Tree Score (see the sketch below)
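A hedged sketch using scikit-learn’s cost-complexity utilities, which do exist as shown (`cost_complexity_pruning_path`, `ccp_alpha`, `get_n_leaves`): the path enumerates the sub-trees produced by pruning, and for each we compute the Tree Score from the formula above. Note sklearn’s own path is computed from total impurity rather than raw SSR, so this score is the notes’ formula applied on top, not sklearn’s internal criterion; the data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=100)

# effective alphas at which the optimal sub-tree changes
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

for alpha in path.ccp_alphas:
    # each alpha selects one pruned sub-tree from the sequence
    tree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X, y)
    ssr = ((y - tree.predict(X)) ** 2).sum()
    T = tree.get_n_leaves()
    print(f"alpha={alpha:.4f}  leaves={T:3d}  tree score={ssr + alpha * T:.3f}")
```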
How to find the best $\alpha$ and make the best tree?