McKinsey Interview Questions | Imbalanced Dataset

Question

How would you handle an imbalanced dataset?

Dhruv2301 4 years 2 Answers 976 views Great Grand Master 0

Answers ( 2 )

  1. Oversampling and undersampling are common ways to handle an
    imbalanced data set. We can oversample the class that has the smaller
    share of the data, undersample the class that has the larger share, or do both,
    so that the train set has an adequate proportion of each class. Only the train
    set should be resampled; the test set should keep the original class distribution
    so that evaluation reflects real data. SMOTE, available in libraries like
    imbalanced-learn, is one way to oversample.

  2. An imbalanced data set can be handled by undersampling the dominant (majority) class or oversampling the minority class.

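  3. Python code: x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y)
    Explanation: passing stratify=y tells scikit-learn's train_test_split to keep the same class proportions in the train and test splits as in the input dataset. It does not balance the classes; it only ensures that neither split ends up with a distorted share of the minority class.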
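
To make the oversampling idea from the answers above concrete, here is a minimal sketch of random oversampling using only the Python standard library. It is not SMOTE (SMOTE synthesizes new minority points instead of duplicating existing ones), and the function name random_oversample and the toy data are made up for illustration. It is meant to be applied to the train set only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy train set (made up): 8 rows of class 0 and 2 rows of class 1.
x_train = [[i] for i in range(10)]
y_train = [0] * 8 + [1] * 2

x_balanced, y_balanced = random_oversample(x_train, y_train)
print(Counter(y_train))
print(Counter(y_balanced))
```

After the call, both classes have 8 rows in the resampled train set. Random undersampling is the mirror image: instead of growing the smaller classes, randomly drop rows from the larger class until the counts match.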
