McKinsey Interview Questions | Imbalanced Dataset
Question
How would you handle an imbalanced dataset?
Statistics
Answers ( 2 )
We can use resampling techniques such as oversampling and undersampling to
handle imbalanced data sets. We can oversample the class
that has the smaller proportion in the data set, undersample the class
that has the larger proportion, or do both, so that the train set has an adequate proportion
of both classes. The test set should keep its original class distribution so that evaluation reflects real data.
SMOTE, available in the imbalanced-learn library, is one oversampling technique for this; it generates synthetic minority-class samples.
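To make the two resampling ideas concrete, here is a minimal sketch in plain Python. It uses simple random resampling (duplicating or dropping existing rows), not SMOTE, which instead interpolates new synthetic points. The function names `random_oversample` and `random_undersample` and the toy data are assumptions made for illustration only.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        # Draw without replacement, so no row is repeated.
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy train set (hypothetical): 90 rows of class 0 and 10 rows of class 1.
train_rows = [[i] for i in range(100)]
train_labels = [0] * 90 + [1] * 10

over_rows, over_labels = random_oversample(train_rows, train_labels)
under_rows, under_labels = random_undersample(train_rows, train_labels)

print(Counter(over_labels))   # 90 rows per class
print(Counter(under_labels))  # 10 rows per class
```

The outputs come back grouped by class, so they would normally be shuffled before training. Only the train set is resampled here; the test set is left untouched.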
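An imbalanced data set can be handled by undersampling the dominant class or oversampling the minority class.
python code :
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y)
explanation : setting stratify=y (the label array itself, not the string 'y') makes train_test_split keep the class proportions of y the same in the train and test splits. It does not balance the classes: an imbalanced input data set gives equally imbalanced splits, so any resampling still has to be applied to the train split.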
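A quick way to see what stratify does is to count the labels in each split. The sketch below assumes scikit-learn is installed and uses a made-up data set of 900 negatives and 100 positives; the data and the `random_state` value are illustrative assumptions.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 90% class 0, 10% class 1.
x = [[i] for i in range(1000)]
y = [0] * 900 + [1] * 100

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0
)

# Both splits keep the original 9:1 ratio; neither is balanced.
train_counts = Counter(y_train)
test_counts = Counter(y_test)
print(train_counts)  # 720 of class 0, 80 of class 1
print(test_counts)   # 180 of class 0, 20 of class 1
```

Stratifying matters most when the minority class is small: without it, a random split could leave very few minority samples in the test set.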