Question

Feature Selection

Two people, A and B, train an algorithm on the same dataset. A first selects the 10 most important features from the whole dataset using an algorithm C, then randomly splits the dataset into training and test sets. B first randomly splits the dataset, then selects the 10 most important features from the training set only, using the same algorithm C, and then trains and tests the model. Both report a relevant metric D on their respective test sets. Whose value of the metric D is more reliable?

Interview Question · demogorgon · 1 Answer · 806 views

Answer ( 1 )

demogorgon Member · Answer 1 · November 24, 2020

Assuming both train with the same hyperparameters and the same training time, and assuming the dataset is not too small, B's result is more reliable. In machine learning we try to learn a general function from a sampled dataset and then evaluate it on a completely new test set, so the test set should not influence training in any way. A's feature selection algorithm sees both the training and the test data, so information from the test set leaks into the model through the chosen features. The metric A reports on the test set therefore tends to be optimistically biased: it overestimates how the model would do on unseen data and wrongly rewards the algorithm. B selects features using the training set only, so B's test set stays untouched and the reported metric is an honest estimate.
