Barclays Interview Questions | Imbalanced Dataset

Question

What is one way that you would handle an imbalanced data set that’s being used for prediction (i.e., vastly more negative classes than positive classes)?

Dhruv2301 4 years 4 Answers 1814 views Great Grand Master 0

Answers ( 4 )

  1. Assume the negative class is labelled 0 and the positive class 1.
    When there are vastly more negative examples than positive ones, we can oversample: the minority class is grown, typically by duplicating its examples, until it matches the size of the majority class. Plain oversampling has two disadvantages:
    1. It introduces duplicate data.
    2. It can lead to overfitting, because the model sees the same minority examples repeatedly.
    A more effective form of oversampling is SMOTE (Synthetic Minority Oversampling Technique).
    SMOTE works by selecting minority-class examples that are close in the feature space, drawing a line between them, and generating a new sample at a point along that line.
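
    The interpolation step described above can be sketched in plain Python. This is only a rough illustration, not the reference SMOTE implementation: the function name `smote_sketch`, its parameters, and the toy data are all made up for this example, and it assumes numeric features and at least two minority samples.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only; assumes len(minority) >= 2.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two features per sample (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

    Every synthetic point lies on a segment between two existing minority samples, so the new data are not exact duplicates, which is what distinguishes this from plain random oversampling.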

  2. Dealing with imbalanced datasets entails strategies such as improving the classification algorithm or balancing the classes in the training data (data preprocessing) before passing the data to the machine learning algorithm. The latter approach is preferred because it applies more widely.

    Some of the common techniques are:
    1. Random Under-Sampling of the dominant class
    2. Random Over-Sampling of the non-dominant class
    3. Cluster-Based Over-Sampling
    4. Algorithmic Ensemble Techniques
    5. Bagging Based techniques for imbalanced data

  3. Use a technique such as SMOTE, which is implemented in libraries such as imbalanced-learn, to synthetically create additional minority-class data and balance out the classes.

  4. There are several techniques for handling an imbalanced dataset:
    We can use random undersampling, where instances of the majority class (here the negative class) are randomly deleted.
    We can also use random oversampling, where instances of the minority class (here the positive class) are randomly duplicated.
    We can use SMOTE (Synthetic Minority Oversampling Technique) to add instances of the minority class. SMOTE uses the nearest neighbours of minority-class samples to create synthetic data.
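
    The two random resampling strategies mentioned above can be sketched as follows. This is a minimal illustration using only the standard library; the function names and the 95/5 toy split are assumptions made for the example, and balancing to an exact 1:1 ratio is just one possible target.

```python
import random


def random_undersample(majority, minority, seed=0):
    """Randomly drop majority samples until both classes have the same size."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no majority sample is repeated.
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, seed=0):
    """Randomly duplicate minority samples until both classes have the same size."""
    rng = random.Random(seed)
    # choice() draws with replacement, so minority samples appear multiple times.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


# Toy data: 95 negative and 5 positive samples (made-up values).
negatives = [("neg", i) for i in range(95)]
positives = [("pos", i) for i in range(5)]

under_neg, under_pos = random_undersample(negatives, positives)
over_neg, over_pos = random_oversample(negatives, positives)

print(len(under_neg), len(under_pos))  # both classes shrunk to the minority size
print(len(over_neg), len(over_pos))    # both classes grown to the majority size
```

    Undersampling throws information away, while oversampling repeats it, so either is usually applied to the training split only and the test data is left at its original class ratio.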
