BCG Interview Question | Data Distribution

Question

What issues can arise if the distribution of the test data is significantly different from the distribution of the training data?

Machine Learning · Dhruv2301 (Great Grand Master) · 4 years · 2 Answers · 703 views

Answers ( 2 )

Ognish Master · Answer 1 · July 23, 2020

Some of the issues can be:

1. Covariate shift: the training and test inputs follow different distributions, but the functional relation between inputs and outputs remains unchanged.
2. Sample selection bias: the training examples were obtained through a biased method, such as non-uniform selection, so they do not represent the population seen at test time.
3. Non-stationary environments: the training environment differs from the test environment because of a temporal or spatial change. A typical scenario is adversarial classification, such as spam filtering and network intrusion detection, where the adversary keeps changing the data.
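One way to check whether a single feature has shifted between the training and test sets is a two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical CDFs. The sketch below is only an illustration and is not part of the answer above: the helper name `ks_statistic`, the synthetic Gaussian samples, and the size of the shift are all assumptions chosen for the demo.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The maximum gap is always reached at one of the observed sample points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Synthetic data (assumed for illustration): one feature, three samples.
rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution as training
shifted_x = [rng.gauss(1.5, 1.0) for _ in range(2000)]  # mean moved by 1.5

ks_same = ks_statistic(train_x, same_x)
ks_shifted = ks_statistic(train_x, shifted_x)
print(f"KS train vs same-distribution test: {ks_same:.3f}")
print(f"KS train vs shifted test:           {ks_shifted:.3f}")
```

A value near 0 suggests the feature looks alike in both sets, while a value near 1 means the two samples barely overlap. This check is per feature, so it can miss shifts that only show up in the joint distribution of several features.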

swap007 Grand Master · Answer 2 · August 1, 2020 at 3:15 pm

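It would be difficult to gauge the performance of the model, because the training and test samples are very different from each other: a score measured on held-out data drawn from the training distribution would no longer be a reliable estimate of how the model does on the test data.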
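The point about gauging performance can be illustrated with a small experiment. This is a sketch under assumptions, not something from the answer: the target function `x * x`, the input ranges, the noise level, and the helper names `fit_line`, `mse`, and `sample` are all made up for the demo. A straight line is fit on inputs from [0, 1], then scored on a validation set from the same range and on a test set from [2, 3], where the input-output relation is unchanged but the inputs have moved.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)


def sample(lo, hi, n):
    """Draw inputs uniformly from [lo, hi]; the relation y = x^2 + noise never changes."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


train_xs, train_ys = sample(0.0, 1.0, 500)
val_xs, val_ys = sample(0.0, 1.0, 500)    # same input distribution as training
test_xs, test_ys = sample(2.0, 3.0, 500)  # shifted input distribution

model = fit_line(train_xs, train_ys)
val_mse = mse(model, val_xs, val_ys)
test_mse = mse(model, test_xs, test_ys)
print(f"validation MSE (same distribution): {val_mse:.4f}")
print(f"test MSE (shifted distribution):    {test_mse:.4f}")
```

The validation error stays small because the line is a reasonable local approximation on [0, 1], while the error on the shifted inputs is far larger. Anyone relying on the validation score alone would badly overestimate how the model performs on the test data.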
