Oracle Interview Questions | Prediction

Question

What is one way that you would handle an imbalanced data set that’s being used for prediction?

in progress 0
Dhruv2301 4 years 3 Answers 1140 views Great Grand Master 0

Answers ( 3 )

  1. You can use techniques like over sampling and under sampling to deal with imbalanced data sets.
    You can increase the proportion of the class which has lesser no of observations and decrease the
    proportion of the class which has higher no of observations. Libraries like SMOTE can help you to achieve
    that. Also using evaluation metrics other than just accuracy can help you to evaluate your model more
    accurately.

  2. To handle imbalanced dataset, we can use undersampling where the number of instance of majority class is deleted. We can also use Oversampling through SMOTE ( Synthetic Minority Oversampling Technique) to add instance of minority class

  3. One approach to addressing the problem of class imbalance is to randomly resample the training dataset. The two main approaches to randomly resampling an imbalanced dataset are to delete examples from the majority class, called undersampling, and to duplicate examples from the minority class, called oversampling.

    There are two main approaches to random resampling for imbalanced classification; they are oversampling and undersampling.

    Random Oversampling: Randomly duplicate examples in the minority class.
    Random Undersampling: Randomly delete examples in the majority class.

    Combining Random Oversampling and Undersampling:
    This can result in improved overall performance compared to performing one or the other techniques in isolation.

    For example, if we had a dataset with a 1:100 class distribution, we might first apply oversampling to increase the ratio to 1:10 by duplicating examples from the minority class, then apply undersampling to further improve the ratio to 1:2 by deleting examples from the majority class.

Leave an answer

Browse
Browse