ML Interview Question | Reducing Dimension
Question
You are given a training data set with 2,000 rows and 1.2 million columns. The data set is for a classification problem. You are asked to reduce its dimensionality so that model computation time becomes manageable. What would you suggest?
(You are free to make practical assumptions.)
Answers (5)
To reduce dimensionality, we can separate the numerical and categorical variables and remove highly correlated (redundant) variables. For numerical variables, we'll use correlation; for categorical variables, we'll use the chi-square test.
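A minimal sketch of the two filters described above, assuming the data sit in a NumPy array. `drop_correlated` greedily keeps a numeric column only if its absolute Pearson correlation with every column kept so far is below a threshold (0.95 is an arbitrary choice), and `chi2_statistic` computes the Pearson chi-square statistic for two categorical variables, where a large value suggests strong association. Both function names are hypothetical. A full pairwise scan over 1.2 million columns means roughly 7.2 × 10^11 pairs, so in practice this would be run on blocks of columns or after a cheaper pre-filter.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Return indices of columns to keep: a column is kept only if its
    |Pearson r| with every previously kept column is below `threshold`."""
    Xc = X - X.mean(axis=0)                  # center each column
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0                  # constant columns: avoid division by zero
    Z = Xc / norms                           # Z[:, a] @ Z[:, b] is the Pearson r of columns a, b
    kept = []
    for j in range(Z.shape[1]):
        if all(abs(Z[:, j] @ Z[:, k]) < threshold for k in kept):
            kept.append(j)
    return kept

def chi2_statistic(a, b):
    """Pearson chi-square statistic for the contingency table of two
    categorical variables (large values suggest they are associated)."""
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    observed = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(observed, (ia, ib), 1)         # count co-occurrences of each level pair
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    return float(((observed - expected) ** 2 / expected).sum())

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, 2 * base + 0.01 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])
print(drop_correlated(X))                    # → [0, 2]: column 1 is almost 2 × column 0

a = rng.integers(0, 3, size=300)             # a 3-level categorical variable
c = rng.integers(0, 3, size=300)             # an unrelated one
print(chi2_statistic(a, a), chi2_statistic(a, c))
```

For a variable compared with itself, with k levels and n rows, the statistic equals n(k − 1), here 600 up to floating-point rounding; the unrelated pair gives a much smaller value.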
Also, we can use PCA and keep the components that explain the most variance in the data set.
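A minimal PCA sketch using NumPy's SVD, assuming a dense in-memory array; the function name and the 95% variance target are illustrative choices. It centers the data, computes each component's share of the total variance, and keeps the smallest number of components whose cumulative share reaches the target. With only 2,000 rows, the centered data have at most 1,999 non-zero principal components however many columns there are, so PCA alone already bounds the output dimension.

```python
import numpy as np

def pca_fit_transform(X, var_target=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches var_target."""
    Xc = X - X.mean(axis=0)                           # PCA works on centered data
    # Thin SVD: for an n x p matrix it returns at most min(n, p) components
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratios = s ** 2 / np.sum(s ** 2)                  # explained-variance ratio per component
    k = int(np.searchsorted(np.cumsum(ratios), var_target)) + 1
    return Xc @ Vt[:k].T, ratios[:k]                  # component scores and their ratios

rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 3))                    # 3 hidden factors
X = latent @ rng.normal(size=(3, 500)) + 0.01 * rng.normal(size=(100, 500))
Z, ratios = pca_fit_transform(X)
print(Z.shape, ratios.sum())
```

Because the synthetic 500-column data are driven by three hidden factors plus tiny noise, at most three components are needed to pass the 95% target.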
We can use PCA or LDA (Linear Discriminant Analysis) to reduce the number of dimensions.
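Unlike PCA, LDA uses the class labels: it looks for directions that separate the class means relative to the spread within each class, and for C classes it yields at most C − 1 directions, so a single dimension for a binary problem. The sketch below is two-class Fisher LDA in NumPy; the function name is hypothetical. The within-class scatter matrix S_w is p × p and singular whenever p ≥ n, so the small ridge term `reg` added to it is an assumption of this sketch. With 1.2 million columns, a common approach is to apply LDA to PCA scores rather than to the raw columns.

```python
import numpy as np

def fisher_lda_direction(X, y, reg=1e-3):
    """Two-class Fisher LDA direction w ∝ (S_w + reg*I)^-1 (mu1 - mu0), unit length.
    `reg` is an assumed ridge term, needed because S_w is singular when p >= n."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: summed outer products of each class's centered rows
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw + reg * np.eye(X.shape[1]), mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 5))
y = (np.arange(400) >= 200).astype(int)       # 200 samples per class
X[y == 1, 0] += 3.0                           # classes differ only along feature 0
w = fisher_lda_direction(X, y)
z = X @ w                                     # the single (C - 1 = 1) discriminant coordinate
print(np.round(w, 2))
```

The learned direction points almost entirely along feature 0, the only feature that separates the two classes.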
We can use dimensionality reduction techniques such as Principal Component Analysis (PCA), so that highly correlated features are represented as a single dimension.
Variable importance plots can be created using decision trees, and features can be selected according to their importance.
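Full decision-tree or random-forest importances are normally taken from a library. As a self-contained stand-in, the sketch below scores each column by the largest Gini-impurity decrease that a single threshold split on that column can achieve, i.e. a depth-1 tree (decision stump) per feature; the function names are hypothetical. This is a simplification: unlike a deeper tree, it cannot detect features that matter only in interaction with others, and the pure-Python inner loop is written for clarity, not speed.

```python
import numpy as np

def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    n = counts.sum()
    return 0.0 if n == 0 else 1.0 - float(np.sum((counts / n) ** 2))

def stump_importance(X, y):
    """Score each column by the largest weighted Gini decrease that one
    threshold split on that column alone can achieve (a decision stump)."""
    _, y_idx = np.unique(y, return_inverse=True)
    n, p = X.shape
    total = np.bincount(y_idx).astype(float)     # class counts at the root
    parent = gini(total)
    scores = np.zeros(p)
    for j in range(p):
        order = np.argsort(X[:, j], kind="stable")
        xs, ys = X[order, j], y_idx[order]
        left = np.zeros_like(total)
        for i in range(n - 1):                   # candidate split after sorted position i
            left[ys[i]] += 1
            if xs[i] == xs[i + 1]:
                continue                         # no threshold separates equal values
            right = total - left
            child = ((i + 1) * gini(left) + (n - i - 1) * gini(right)) / n
            scores[j] = max(scores[j], parent - child)
    return scores

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = (X[:, 2] > 0).astype(int)                    # only feature 2 determines the class
scores = stump_importance(X, y)
print(np.argsort(scores)[::-1])                  # features ranked by score
```

Features would then be kept in order of score until a chosen budget of columns is reached.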
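We can use dimensionality reduction techniques such as PCA; feature selection techniques such as L1-regularized (Lasso) regression or recursive feature elimination; or an SVM, since SVMs work well on high-dimensional data. Ridge regression is not a feature selection technique on its own: its L2 penalty shrinks coefficients toward zero but does not set any of them exactly to zero.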
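L1 (Lasso) rather than L2 (ridge) regularization is the one used for feature selection because the L1 penalty drives some coefficients exactly to zero, while the L2 penalty only shrinks them. The NumPy sketch below fits both on synthetic data where only the first two of ten standardized features matter. It uses a regression target to keep the coordinate-descent update short; for a classification problem the same idea would apply through an L1-penalized logistic regression. The function names and the `alpha` and `lam` values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + alpha * ||b||_1.
    Assumes X's columns and y are centered, so no intercept is fitted."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    col_sq[col_sq == 0] = 1.0                        # all-zero columns then stay at 0
    r = y.astype(float).copy()                       # residual y - Xb with b = 0
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                      # remove feature j from the fit
            b[j] = soft_threshold(X[:, j] @ r / n, alpha) / col_sq[j]
            r -= X[:, j] * b[j]                      # add it back with the new value
    return b

def ridge(X, y, lam):
    """Closed-form minimizer of (1/2n)||y - Xb||^2 + (lam/2) * ||b||^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)             # standardize every column
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=100)
y = y - y.mean()
b_lasso, b_ridge = lasso_cd(X, y, alpha=0.5), ridge(X, y, lam=0.5)
print(np.flatnonzero(b_lasso))                       # indices of features Lasso keeps
print(np.count_nonzero(b_ridge))                     # number of non-zero ridge coefficients
```

The non-zero Lasso coefficients identify the selected columns; the ridge fit keeps a small but non-zero weight on every column, so it does not reduce the dimension by itself.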
Dimensionality reduction is a way to manage a dataset with a large number of variables.
There are two classes of dimensionality reduction:
1. Feature Elimination
2. Feature Extraction
Feature Elimination drops the variables that are considered unimportant for the analysis. Although the dropped variables are judged to carry little information relevant to the target, we may lose some important information associated with them.
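One of the simplest feature-elimination rules, shown here as an illustration rather than something the answer prescribes, is to drop columns whose variance is essentially zero, since a constant column carries no information about the class. The sketch processes columns in blocks, so the same loop could be pointed at a `np.memmap` array that does not fit in memory; the function name, `min_var`, and `block` size are assumptions.

```python
import numpy as np

def variance_filter(X, min_var=1e-8, block=100_000):
    """Return indices of columns whose variance exceeds min_var,
    reading X one block of columns at a time."""
    keep = []
    for start in range(0, X.shape[1], block):
        v = np.asarray(X[:, start:start + block]).var(axis=0)  # per-column variance of this block
        keep.extend((start + np.flatnonzero(v > min_var)).tolist())
    return keep

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 5))
X[:, 1] = 7.0                                  # constant column
X[:, 3] = 0.0                                  # constant column
print(variance_filter(X, block=2))             # → [0, 2, 4]
```

Such a cheap pass can shrink the column count before more expensive steps like correlation filtering or PCA.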
Feature Extraction creates new variables (in PCA, mutually uncorrelated ones), each of which is a combination of the original variables. Because every new variable combines information from all the original variables, we can drop the least important new variables while keeping most of the information in the data.
PCA (Principal Component Analysis) is a Feature Extraction technique and is widely used for dimensionality reduction.