Oracle Interview Questions | Prediction
Question
What is one way that you would handle an imbalanced data set that’s being used for prediction?
Statistics
Answers ( 3 )
You can use techniques like oversampling and undersampling to deal with imbalanced data sets.
You can increase the proportion of the class that has fewer observations and decrease the
proportion of the class that has more observations. Techniques like SMOTE, available in libraries such as
imbalanced-learn, can help you achieve that. Also, using evaluation metrics other than just accuracy, such as
precision, recall, or F1 score, can help you evaluate your model more reliably.
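To illustrate why accuracy alone can mislead, here is a minimal sketch in plain Python. The toy labels, the always-majority "model", and the helper names are assumptions made up for illustration; undefined ratios are reported as 0.0 by convention here.

```python
# Toy labels (assumed for illustration): 1 = minority class, 0 = majority class.
y_true = [1] * 10 + [0] * 990
# A useless "model" that always predicts the majority class.
y_pred = [0] * 1000

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def safe_div(a, b):
    """Divide a by b, returning 0.0 when the ratio is undefined."""
    return a / b if b else 0.0

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = safe_div(tp + tn, len(y_true))
precision = safe_div(tp, tp + fp)
recall = safe_div(tp, tp + fn)
f1 = safe_div(2 * precision * recall, precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.99 precision=0.00 recall=0.00 f1=0.00
```

The model never finds a single minority example, yet its accuracy is 0.99. Recall and F1 expose the problem.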
To handle an imbalanced dataset, we can use undersampling, where instances of the majority class are deleted. We can also use oversampling through SMOTE (Synthetic Minority Oversampling Technique) to add synthetic instances of the minority class.
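The idea behind SMOTE can be sketched roughly as follows: pick a minority example, pick one of its nearest minority neighbours, and create a new point somewhere on the line between them. This is a simplified illustration in plain Python, not the implementation from any library; the function name `smote_sketch`, its parameters, and the toy points are all assumptions.

```python
import math
import random

def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Simplified illustration of the SMOTE idea (hypothetical helper).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Toy 2-D minority examples (assumed data).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6)
print(len(new_points))  # 6
```

Unlike plain duplication, the new points are not copies of existing rows, but they always lie between existing minority examples.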
One approach to addressing the problem of class imbalance is to randomly resample the training dataset. The two main approaches to random resampling for imbalanced classification are:
Random Oversampling: Randomly duplicate examples in the minority class.
Random Undersampling: Randomly delete examples in the majority class.
Combining Random Oversampling and Undersampling: Apply both techniques to the same training dataset. This can result in improved overall performance compared to performing either technique in isolation.
For example, if we had a dataset with a 1:100 class distribution, we might first apply oversampling to increase the ratio to 1:10 by duplicating examples from the minority class, then apply undersampling to further improve the ratio to 1:2 by deleting examples from the majority class.
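The 1:100 example above can be sketched in plain Python. The helper names `random_oversample` and `random_undersample`, the toy data, and the fixed seed are assumptions for illustration; in practice the resampling would be applied to the training split only.

```python
import random

def random_oversample(examples, target_size, rng):
    """Duplicate randomly chosen examples until the class has target_size items."""
    extra = rng.choices(examples, k=max(0, target_size - len(examples)))
    return list(examples) + extra

def random_undersample(examples, target_size, rng):
    """Keep a random subset of target_size examples and drop the rest."""
    return rng.sample(examples, k=min(target_size, len(examples)))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
minority = [("minority", i) for i in range(10)]
majority = [("majority", i) for i in range(1000)]  # 1:100 class distribution

# Step 1: oversample the minority class to reach 1:10 (100 vs 1000).
minority_over = random_oversample(minority, len(majority) // 10, rng)

# Step 2: undersample the majority class to reach 1:2 (100 vs 200).
majority_under = random_undersample(majority, 2 * len(minority_over), rng)

print(len(minority_over), len(majority_under))  # 100 200
```

Oversampling first means fewer majority examples have to be thrown away to reach the final ratio than with undersampling alone.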