SAP interview Questions | Binary Classification
Question
Running a binary classification tree algorithm is simple. But how does tree splitting actually take place? How does the tree decide which variable to split on at the root node and which at its child nodes?
Answers ( 2 )
Decision trees can use several criteria to decide how to split a node into two or more sub-nodes. Each split is chosen to increase the homogeneity of the resulting sub-nodes; in other words, the purity of the nodes increases with respect to the target variable. The tree evaluates candidate splits on all available variables and then selects the split that results in the most homogeneous sub-nodes.
The two main splitting criteria are:
1. Gini Index
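Taking the Gini index to mean Gini impurity (1 minus the sum of squared class proportions, so 0 is a pure node), splitting starts from the variable whose split gives the lowest weighted Gini index in the sub-nodes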
2. Information gain
Information gain is the reduction in entropy produced by a split. Whichever variable has the highest information gain, splitting starts from there
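To make the two criteria concrete, here is a minimal sketch in plain Python. It assumes the standard impurity definitions (Gini impurity and Shannon entropy); the toy rows, the feature names `student` and `owns_home`, and the helper names are invented for illustration and are not from any particular library.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). 0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits. 0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def weighted_child_impurity(rows, labels, feature, impurity):
    """Impurity of the sub-nodes after splitting on one categorical feature,
    weighted by the share of samples that falls into each sub-node."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    n = len(labels)
    return sum(len(g) / n * impurity(g) for g in groups.values())

def best_feature(rows, labels, impurity=gini):
    """Return the feature with the largest impurity reduction, plus all gains.
    With impurity=entropy the reduction is the information gain."""
    parent = impurity(labels)
    gains = {
        f: parent - weighted_child_impurity(rows, labels, f, impurity)
        for f in rows[0]
    }
    return max(gains, key=gains.get), gains

# Hypothetical toy data: "student" separates the classes, "owns_home" does not.
rows = [
    {"student": "yes", "owns_home": "no"},
    {"student": "yes", "owns_home": "yes"},
    {"student": "no", "owns_home": "no"},
    {"student": "no", "owns_home": "yes"},
]
labels = [1, 1, 0, 0]

print(best_feature(rows, labels, impurity=gini))
print(best_feature(rows, labels, impurity=entropy))
```

On this toy data both criteria pick `student` for the root, because its sub-nodes are pure. The same function would then be applied again inside each sub-node to choose the splits at the child nodes.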
If you are solving a regression problem, the decision tree looks at all the variables and
all the candidate cut points, and chooses the cut point that leads to the maximum reduction
in RSS (residual sum of squares).
It repeats this process in each resulting node until a stopping criterion is met.
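The cut-point search for a single numeric variable can be sketched as follows. This is only an illustration under simple assumptions: candidate cuts are the midpoints between consecutive distinct sorted values, each side is predicted by its mean, and the data and function names are made up for the example.

```python
def rss(values):
    """Residual sum of squares around the mean of the node."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_cut_point(xs, ys):
    """Try every midpoint between consecutive distinct x values and return
    (cut, total_rss) for the cut that minimises RSS(left) + RSS(right)."""
    pairs = sorted(zip(xs, ys))
    best_cut, best_total = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut fits between identical x values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = rss(left) + rss(right)
        if total < best_total:
            best_cut, best_total = cut, total
    return best_cut, best_total

# Hypothetical data with an obvious jump in y between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.0]

cut, total = best_cut_point(xs, ys)
print(cut, total, rss(ys) - total)  # cut point, RSS after split, RSS reduction
```

Minimising the RSS after the split is the same as maximising the reduction in RSS, since the RSS of the parent node is fixed. A full tree would run this search over every variable and keep the best (variable, cut point) pair.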
If you are solving a classification problem, the reduction in the classification error rate can be
used as the splitting criterion. Alternatives are the Gini index and entropy, which both measure
node purity.
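A small sketch of how these criteria can disagree, using a standard textbook-style illustration rather than anything from the answers above: a parent node with 400 samples of each class and two hypothetical candidate splits, described only by the class counts in their sub-nodes.

```python
def misclassification(counts):
    """Classification error rate of a node that predicts its majority class."""
    return 1.0 - max(counts) / sum(counts)

def gini(counts):
    """Gini impurity of a node given its per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted(children, impurity):
    """Impurity of a split: child impurities weighted by child size."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * impurity(c) for c in children)

# Each tuple is (class 0 count, class 1 count) in one sub-node.
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]   # second sub-node is pure

for name, split in [("A", split_a), ("B", split_b)]:
    print(name, weighted(split, misclassification), weighted(split, gini))
```

Both splits have the same weighted classification error, but the Gini index is lower for split B because it produces one pure sub-node. This extra sensitivity to purity is one reason the Gini index or entropy is often used when growing the tree.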