SAP Interview Questions | Binary Classification

Question

Executing a binary classification tree algorithm is a simple task. But how does tree splitting actually take place? How does the tree determine which variable to split on at the root node and which at its child nodes?

Dhruv2301

Answers ( 2 )

  1. Decision trees can use several criteria to decide how to split a node into two or more sub-nodes. Each split is chosen to increase the homogeneity of the resulting sub-nodes; in other words, the purity of the nodes increases with respect to the target variable. The tree evaluates candidate splits on all available variables and then selects the split that yields the most homogeneous sub-nodes.

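    The two most common criteria are:
    1. Gini index
    For each candidate split, compute the weighted Gini impurity of the resulting sub-nodes. The split with the lowest weighted Gini impurity (that is, the purest sub-nodes) is chosen first.

    2. Information gain
    Based on entropy rather than Gini impurity. The split with the highest information gain, i.e. the largest reduction in entropy from the parent node to its sub-nodes, is chosen first.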
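
To make the two criteria concrete, here is a minimal sketch in plain Python. The helper names (`gini`, `entropy`, `weighted_impurity`, `information_gain`) and the toy labels are illustrative assumptions, not part of any particular library; real implementations differ in details such as how candidate splits are enumerated.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). 0.0 means a perfectly pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def weighted_impurity(groups, impurity):
    """Average impurity of the sub-nodes, weighted by sub-node size."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * impurity(g) for g in groups)

def information_gain(parent, groups):
    """Entropy of the parent minus the weighted entropy of its sub-nodes."""
    return entropy(parent) - weighted_impurity(groups, entropy)

# Hypothetical toy data: 8 samples, two candidate splits of the same parent.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
split_a = [[1, 1, 1, 1], [0, 0, 0, 0]]  # separates the classes perfectly
split_b = [[1, 1, 0, 0], [1, 1, 0, 0]]  # leaves both sub-nodes mixed

gini_a = weighted_impurity(split_a, gini)   # 0.0
gini_b = weighted_impurity(split_b, gini)   # 0.5
gain_a = information_gain(parent, split_a)  # 1.0
gain_b = information_gain(parent, split_b)  # 0.0
```

Both criteria agree here: `split_a` has the lower weighted Gini impurity and the higher information gain, so it would be chosen over `split_b`.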

  2. If you are solving a regression problem, the decision tree looks at all the variables and
    all the cut points, and chooses the cut point that leads to the maximum reduction in RSS
    (residual sum of squares). It repeats this process within each resulting region until a
    stopping criterion is met.
    If you are solving a classification problem, the reduction in the classification error rate
    can be used when splitting. In practice, the Gini index and entropy, both of which measure
    node purity, are usually preferred because they are more sensitive to changes in purity.
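
The regression case can be sketched the same way for a single variable. The function names (`rss`, `best_cut_point`) and the toy data are assumptions made for illustration; a full tree would repeat this search over every variable and then recurse into each side of the chosen split.

```python
def rss(values):
    """Residual sum of squares around the mean of the node."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_cut_point(x, y):
    """Try midpoints between consecutive distinct x values and return
    (cut, total_rss) for the cut with the lowest left + right RSS."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut point can separate identical x values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = rss(left) + rss(right)
        if best is None or total < best[1]:
            best = (cut, total)
    return best

# Hypothetical toy data with an obvious jump between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5, 5, 5, 20, 20, 20]
cut, total = best_cut_point(x, y)
```

Minimising the combined RSS of the two sides is equivalent to maximising the reduction in RSS relative to the parent node, since the parent's RSS is the same for every candidate cut.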
