Question

BCG Interview Question | Data Distribution

What issues can arise if the distribution of the test data is significantly different from the distribution of the training data?

Machine Learning · Dhruv2301 · 55 years · 2 Answers · 1043 views · Great Grand Master · 0

Answers ( 2 )

Ognish Master · Answer 1 · July 23, 2020

The mismatch can take several forms, each of which can degrade the model's performance on the test data:

1. Covariate shift: the training and test inputs follow different distributions, but the functional relationship between inputs and outputs remains unchanged.
2. Sample selection bias: the training examples were collected through a biased method, such as non-uniform selection, so they do not represent the population the model is tested on.
3. Non-stationary environments: the training environment differs from the test environment because of a temporal or spatial change. A typical scenario is adversarial classification, such as spam filtering and network intrusion detection.
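
To make the covariate shift case concrete, here is a minimal sketch in Python with NumPy. Everything in it is an illustrative assumption rather than part of the question: the sine target, the input ranges, the noise level, and the choice of a straight-line model. The input-output relationship is identical for both samples; only the range of the inputs changes.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground truth: the same relationship holds for training and test data.
    return np.sin(x)

def sample(low, high, n):
    # Draw inputs uniformly from [low, high) and add small Gaussian noise to the targets.
    x = rng.uniform(low, high, size=n)
    y = true_fn(x) + rng.normal(0.0, 0.05, size=n)
    return x, y

x_train, y_train = sample(0.0, 2.0, 500)   # training inputs
x_val, y_val = sample(0.0, 2.0, 500)       # held-out data from the training range
x_test, y_test = sample(2.0, 4.0, 500)     # test inputs from a shifted range

# Fit a straight line on the training data (np.polyfit returns slope first).
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def mse(x, y):
    # Mean squared error of the fitted line on the given sample.
    pred = slope * x + intercept
    return float(np.mean((pred - y) ** 2))

val_mse = mse(x_val, y_val)
test_mse = mse(x_test, y_test)
print(f"MSE on held-out data from the training range: {val_mse:.4f}")
print(f"MSE on shifted test data: {test_mse:.4f}")
```

In this toy setup the line is a reasonable approximation where the training data lives, but it extrapolates poorly into the shifted range, so the error on the shifted sample is much larger than the held-out error would suggest.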

swap007 Grand Master · Answer 2 · August 1, 2020

August 1, 2020 at 3:15 pm

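It would be difficult to gauge the performance of the model, because the training and test samples are very different from each other. A score estimated on held-out training data would no longer be a reliable indicator of how the model performs on the test data.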
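
One way to check whether the two samples really are "very different" is to compare the empirical distributions of a feature. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) with plain NumPy. The function name `ks_statistic` and the synthetic Gaussian features are assumptions made for illustration, and the statistic covers only one feature at a time.

```python
import numpy as np

def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    # difference between the empirical CDFs of a and b.
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, size=2000)    # hypothetical training feature
similar_test = rng.normal(0.0, 1.0, size=2000)     # test feature, same distribution
shifted_test = rng.normal(1.5, 1.0, size=2000)     # test feature, shifted mean

same_stat = ks_statistic(train_feature, similar_test)
shifted_stat = ks_statistic(train_feature, shifted_test)
print(f"KS statistic, same distribution: {same_stat:.3f}")
print(f"KS statistic, shifted distribution: {shifted_stat:.3f}")
```

A value near 0 suggests the two samples look alike for that feature, while a value approaching 1 suggests they barely overlap. What counts as "too different" depends on the sample size and the application.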
