Feature Selection

Question

Two people, A and B, train an algorithm on the same dataset. A selects the 10 most important features from the full dataset using an algorithm C, then randomly splits the dataset into training and test sets. B first randomly splits the dataset, then selects the 10 most important features from the training set only, using the same algorithm C, and then trains and tests. Both report a relevant metric D on their respective test sets. Whose value of the metric D is more reliable?

Interview Question · asked by demogorgon 3 years ago · 1 answer · 619 views

Answer ( 1 )

demogorgon Member · Answer 1 · November 24, 2020

Assuming both train the algorithm with the same hyperparameters and the same training time, and that the dataset is not too small, B's result is more reliable. In machine learning, we try to learn a general function from a sampled dataset and evaluate it on a completely new test set, and the test set should not influence training in any way. A's feature selection runs on the full dataset, so it has already seen the examples that later end up in the test set (and, if algorithm C is supervised, their labels too). The metric A reports on the test set will therefore tend to be optimistically biased: it overstates how well the model generalizes and wrongly rewards the algorithm. B's test set plays no part in feature selection, so B's metric is an honest estimate of performance on unseen data.
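To make the leakage concrete, here is a minimal sketch of both workflows on pure-noise data, where no feature has any real relationship to the label and an honest accuracy should sit near 0.5. Everything specific in it is an assumption for illustration and not part of the question: scikit-learn's SelectKBest with f_classif stands in for algorithm C, accuracy stands in for metric D, logistic regression is the model, and the data sizes (100 samples, 5000 features) are chosen to make the effect easy to see.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def make_noise_data(rng, n_samples=100, n_features=5000):
    """Random features and balanced random labels: nothing real to learn."""
    X = rng.standard_normal((n_samples, n_features))
    y = rng.permutation(np.repeat([0, 1], n_samples // 2))
    return X, y


def score_a(X, y, seed):
    """Person A: select features on the FULL dataset, then split."""
    X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sel, y, test_size=0.3, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model.score(X_te, y_te)


def score_b(X, y, seed):
    """Person B: split first, then select features on the training set only."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    selector = SelectKBest(f_classif, k=10).fit(X_tr, y_tr)
    model = LogisticRegression(max_iter=1000).fit(selector.transform(X_tr), y_tr)
    return model.score(selector.transform(X_te), y_te)


# Average over several independent noise datasets to smooth out split luck.
rng = np.random.default_rng(0)
scores_a, scores_b = [], []
for seed in range(20):
    X, y = make_noise_data(rng)
    scores_a.append(score_a(X, y, seed))
    scores_b.append(score_b(X, y, seed))

mean_a = float(np.mean(scores_a))
mean_b = float(np.mean(scores_b))
print(f"A (select, then split): mean test accuracy = {mean_a:.2f}")
print(f"B (split, then select): mean test accuracy = {mean_b:.2f}")
```

Since the labels are random, any accuracy well above 0.5 is an artifact of the procedure. A's score should come out clearly inflated because the selected features were chosen partly for their chance correlation with the test labels, while B's score should stay close to chance. In practice, wrapping the selector and the model in a scikit-learn Pipeline and fitting it only on the training data gives B's behaviour automatically.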
