- non-linear ensemble method, used to better represent a non-linear feature space
- basically any feature space where the data is not arranged in an apparently linear fashion
- since trees don’t use coefficients, they handle non-linear data easily and avoid the problems that all linear models face (see the sketch after this list):
- no need for regularisation across data features
- no problem of collinearity among features
- no problems from non-linear data
- no problems from correlated or non-constant-variance error terms (linear regression assumes independent errors with constant variance; violations show up as patterns such as funnel-shaped residual plots)
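A quick sketch of the non-linearity point, assuming scikit-learn and NumPy with synthetic data (none of this is from the notes themselves): a linear model fails badly on a sine-shaped target, while a shallow regression tree fits it with no linear assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # deliberately non-linear target

linear = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

# the tree's R^2 should be far higher: it carves the space into regions
# instead of forcing a single global linear fit
print("linear R^2:", linear.score(X, y))
print("tree   R^2:", tree.score(X, y))
```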
How are they made?
- similar to decision trees, we follow a top-down approach, starting with any single feature
- with this single-node tree we go through the data, picking a candidate threshold and computing the sum of squared residuals (SSR) of the two resulting groups, each predicted by its mean
- we then slide the threshold across the data, recomputing the SSR each time, and keep the threshold with the smallest value
- this search is repeated for every feature in the feature space, and the root node is assigned to the feature (and threshold) with the lowest SSR
- from there the same procedure is applied recursively to each branch until every path ends in a leaf node (split search sketched below)
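A bare-bones sketch of the split search for a single feature; `best_split` is a hypothetical helper name of my own, not a library function. To grow a full tree you would run this over every feature and recurse into each side of the winning split.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, ssr) minimising the sum of squared residuals for one feature."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (None, np.inf)
    # candidate thresholds: midpoints between consecutive sorted values
    for i in range(1, len(x)):
        t = (x[i - 1] + x[i]) / 2
        left, right = y[:i], y[i:]
        # each side is predicted by its mean, so SSR measures split quality
        ssr = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if ssr < best[1]:
            best = (t, ssr)
    return best

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.1, 0.9, 1.0, 5.2, 4.8, 5.0])
print(best_split(x, y))  # threshold ~6.5 cleanly separates the two clusters
```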

<aside>
💡
even though this would grow an impeccable tree that predicts the training data PERFECTLY, that’s a trap, the Bias-Variance Trap: zero bias on the training set, huge variance on new data
- this can be mitigated by setting a threshold on the minimum number of data points required per split (see the snippet below)
</aside>
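In scikit-learn terms this guard corresponds to the `min_samples_split` / `min_samples_leaf` parameters; a minimal illustration (the values 20 and 10 are arbitrary picks, not recommendations):

```python
from sklearn.tree import DecisionTreeRegressor

# put a floor on node size so leaves can't shrink to single memorised points
tree = DecisionTreeRegressor(min_samples_split=20, min_samples_leaf=10)
```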
How to prune regression trees?
- a method to prevent overfitting to the training data
- it usually works by replacing an internal node with a leaf, keeping the replacement as long as accuracy on held-out test data improves (toy sketch below)
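A toy sketch of this replace-node-with-leaf idea (often called reduced-error pruning), under assumptions of my own: the tree is nested dicts where internal nodes carry `feature`, `threshold`, `mean` (training mean of the node’s region), `left`, `right`, and leaves carry only `value`. None of these names come from a library.

```python
import numpy as np

def predict(node, x):
    if "value" in node:                      # leaf: constant prediction
        return node["value"]
    branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
    return predict(node[branch], x)

def val_sse(node, X_val, y_val):
    # sum of squared errors of this (sub)tree on held-out data
    preds = np.array([predict(node, x) for x in X_val])
    return ((y_val - preds) ** 2).sum()

def prune(node, X_val, y_val):
    """Bottom-up: collapse a subtree to a leaf whenever validation error doesn't worsen."""
    if "value" in node:
        return node
    # route validation points the same way the tree routes training points
    mask = X_val[:, node["feature"]] <= node["threshold"]
    node["left"] = prune(node["left"], X_val[mask], y_val[mask])
    node["right"] = prune(node["right"], X_val[~mask], y_val[~mask])
    leaf = {"value": node["mean"]}           # candidate leaf: region's training mean
    if val_sse(leaf, X_val, y_val) <= val_sse(node, X_val, y_val):
        return leaf                          # collapsing did not hurt, so prune
    return node

# tiny hand-built tree splitting feature 0 at 5.0
tree = {"feature": 0, "threshold": 5.0, "mean": 3.0,
        "left": {"value": 1.0}, "right": {"value": 5.0}}
X_val = np.array([[2.0], [8.0]])
y_val = np.array([1.1, 4.9])
print(prune(tree, X_val, y_val))  # the split helps here, so it is kept
```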
Cost Complexity Pruning

- take all the sub-trees formed by this pruning procedure and calculate their respective sums of squared residuals
Tree Score
$$
\text{Tree Score} = SSR + \alpha T
$$
- where $\alpha$ is a hyperparameter and $T$ is the number of leaves, so the $\alpha T$ term penalises trees with more leaves
- then pick the sub-tree with the lowest Tree Score (see the sketch below)
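A hedged sketch using scikit-learn’s cost-complexity utilities, which do exist as shown (`cost_complexity_pruning_path`, `ccp_alpha`, `get_n_leaves`): the path enumerates the sub-trees produced by pruning, and for each we compute the Tree Score from the formula above. Note sklearn’s own path is computed from total impurity rather than raw SSR, so this score is the notes’ formula applied on top, not sklearn’s internal criterion; the data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=100)

# effective alphas at which the optimal sub-tree changes
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

for alpha in path.ccp_alphas:
    # each alpha selects one pruned sub-tree from the sequence
    tree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X, y)
    ssr = ((y - tree.predict(X)) ** 2).sum()
    T = tree.get_n_leaves()
    print(f"alpha={alpha:.4f}  leaves={T:3d}  tree score={ssr + alpha * T:.3f}")
```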
How to find the best $\alpha$ and make the best tree?