Decision Trees
Question
Why do decision trees and their ensembles have such strong predictive power (and why are they prone to overfitting the dataset)?
Machine Learning
1 Answer
Answer (1)
A decision tree first selects a feature and then splits on it (the split is chosen to optimize a metric such as Gini impurity or information gain, or the feature may be picked randomly, as in some ensemble methods). This process is controlled by hyperparameters. For a continuous feature, the split point could in principle lie anywhere on the real line, but the training error can only change at the observed data values, so the standard greedy (CART-style) procedure only needs to evaluate a discrete set of candidate thresholds, typically the midpoints between consecutive sorted feature values, and picks the one that minimizes the chosen impurity or error metric. Splitting recursively in this way partitions the feature space into regions and assigns each region a label (or a value, if the dependent variable is continuous).
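To make the split search concrete, here is a minimal sketch, assuming a binary classification task, NumPy, and Gini impurity as the split metric (the function names and data are illustrative, not a particular library's implementation). It exhaustively evaluates the discrete set of candidate thresholds for one continuous feature:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Search candidate thresholds for one continuous feature x,
    returning the threshold minimizing the weighted Gini impurity
    of the two resulting regions."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    # Candidate split points: midpoints between consecutive sorted
    # feature values -- a finite, discrete set.
    candidates = np.unique((x_sorted[:-1] + x_sorted[1:]) / 2.0)
    best_t, best_impurity = None, np.inf
    n = len(y_sorted)
    for t in candidates:
        left, right = y_sorted[x_sorted <= t], y_sorted[x_sorted > t]
        if len(left) == 0 or len(right) == 0:
            continue
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity

# Illustrative data: labels determined by whether x exceeds 5.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = (x > 5).astype(int)
print(best_split(x, y))  # recovers a threshold near 5.0 with impurity ~0
```

A full tree applies this search at every node, over every feature, and recurses on the two resulting regions.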
Because the algorithm partitions the feature space into regions and assigns a prediction per region, an unconstrained tree can keep splitting until each region is nearly pure, carving out tiny regions around individual noisy points. The noise thus gets learned along with the signal, and that is why decision trees are prone to overfitting unless their growth is limited (e.g., via maximum depth or minimum samples per leaf) or averaged away in an ensemble.
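To illustrate the overfitting claim, here is a small sketch using scikit-learn's DecisionTreeClassifier on synthetic data with injected label noise (the data-generating setup and the 20% noise rate are assumptions for demonstration). The unrestricted tree memorizes the noisy training labels almost perfectly but generalizes worse than a depth-limited tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
# True signal depends only on the first feature; then flip ~20% of labels.
y = (X[:, 0] > 5).astype(int)
noise = rng.random(len(y)) < 0.2
y[noise] = 1 - y[noise]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in [None, 3]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
# Expected pattern: the unrestricted tree scores ~1.00 on training data
# (it has partitioned around the noise) but lower on test data than the
# depth-limited tree, which captures only the coarse signal.
```

Ensembles such as random forests attack the same problem from the other direction: each tree still overfits, but averaging many de-correlated trees cancels much of the variance that the noise induces.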