BCG Interview Question | Data Distribution

Question

What issues can arise if the distribution of the test data is significantly different from the distribution of the training data?

Dhruv2301 4 years 2 Answers 703 views Great Grand Master 0

Answers (2)

  1. Some of the issues can be:

    1. Covariate shift: the training and test inputs follow different distributions, but the functional relation between inputs and outputs remains unchanged.
    2. Sample selection bias: the training examples were obtained through a biased method, such as non-uniform selection, so they do not represent the population the model is tested on.
    3. Non-stationary environments: the training environment differs from the test environment, whether due to a temporal or a spatial change. Typical scenarios are adversarial classification problems, such as spam filtering and network intrusion detection.
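
A quick way to check for the kind of input shift listed above is to compare a feature's empirical distribution in the training and test sets. The following is only a minimal sketch using the standard library: the synthetic Gaussian data, the variable names, and the choice of a hand-written two-sample Kolmogorov-Smirnov statistic are all assumptions for illustration, not something from the original answer.

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap is always attained at one of the observed values.
    for v in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Synthetic data (assumed for illustration only).
rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution
shifted_x = [rng.gauss(1.5, 1.0) for _ in range(2000)]  # mean moved by 1.5

stat_same = ks_statistic(train_x, same_x)
stat_shifted = ks_statistic(train_x, shifted_x)
print(f"no shift: {stat_same:.3f}  shifted: {stat_shifted:.3f}")
```

A statistic close to 0 suggests the two samples look alike on that feature, while a value approaching 1 suggests a shift. This looks at one feature at a time, so it can miss shifts that only show up in how features combine.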

  2. It would be difficult to gauge the performance of the model: scores measured on data that resembles the training set say little about how the model behaves on test samples that look very different.
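
To make the evaluation problem concrete, here is a small toy sketch. Everything in it is assumed for illustration: the labeling rule `true_label`, the one-threshold model, and the uniform input ranges. The labeling rule is the same everywhere; only the input range changes between training and test.

```python
import random

def true_label(x):
    # Hypothetical labeling rule, identical for training and test data.
    return 1 if abs(x) > 1.0 else 0

def accuracy(t, xs, ys):
    """Accuracy of the rule 'predict 1 if x > t' on (xs, ys)."""
    hits = sum(int(x > t) == y for x, y in zip(xs, ys))
    return hits / len(xs)

def fit_threshold(xs, ys):
    """Pick the observed x value that works best as the cut t."""
    best_t, best_acc = xs[0], -1.0
    for t in sorted(xs):
        acc = accuracy(t, xs, ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def sample(rng, low, high, n):
    xs = [rng.uniform(low, high) for _ in range(n)]
    return xs, [true_label(x) for x in xs]

rng = random.Random(0)
train_xs, train_ys = sample(rng, 0.0, 2.0, 300)   # training inputs
val_xs, val_ys = sample(rng, 0.0, 2.0, 300)       # held out, same range
test_xs, test_ys = sample(rng, -2.0, 0.0, 300)    # shifted inputs

t = fit_threshold(train_xs, train_ys)
val_acc = accuracy(t, val_xs, val_ys)
shifted_acc = accuracy(t, test_xs, test_ys)
print(f"held-out accuracy: {val_acc:.2f}  shifted accuracy: {shifted_acc:.2f}")
```

In this toy setup the held-out score from the training range looks near perfect, while accuracy on the shifted inputs drops to roughly chance, so the in-distribution score is a poor guide to test performance.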
