ML Interview Question | Reducing Dimension

Question

You are given a training data set with 2,000 rows and 1.2 million columns. The data set is for a classification problem. You are asked to reduce the dimensionality of this data so that model computation time becomes manageable. What would you suggest?

(You are free to make practical assumptions.)
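
For a rough sense of scale, the arithmetic behind the question can be sketched as follows. This is only an illustration under stated assumptions: the values are assumed to be stored densely as 8-byte floats, which the question does not specify.

```python
# Back-of-the-envelope sizes for the data set in the question.
n_rows = 2_000
n_cols = 1_200_000
bytes_per_value = 8  # assumption: dense float64 storage

# Memory needed to hold the full data matrix.
dense_gib = n_rows * n_cols * bytes_per_value / 2**30

# Memory needed for a full column-by-column correlation matrix.
corr_tib = n_cols * n_cols * bytes_per_value / 2**40

# Centered data with n rows has rank at most n - 1, so PCA can
# produce at most this many components with non-zero variance.
max_components = min(n_rows - 1, n_cols)

print(f"data matrix: {dense_gib:.1f} GiB")
print(f"correlation matrix: {corr_tib:.1f} TiB")
print(f"max useful PCA components: {max_components}")
```

Under these assumptions the data matrix itself is large but workable, while anything that compares every column with every other column is not, and the number of rows caps how many principal components can carry information.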

Dhruv2301 55 years 5 Answers 994 views Great Grand Master 0

Answers ( 5 )

  1. To reduce dimensionality, we can separate the numerical and categorical variables and remove the redundant ones. For numerical variables, we can use correlation and drop one variable from each highly correlated pair. For categorical variables, we can use the chi-square test of independence to find strongly associated variables.
    Also, we can use PCA and keep the components that explain most of the variance in the data set.

  2. We can use PCA or LDA to reduce the number of dimensions. Note that LDA is supervised and yields at most (number of classes - 1) components.

  3. We can use dimensionality reduction techniques like Principal Component Analysis (PCA)
    so that highly correlated features are represented as a single dimension.
    Variable importance plots can be created using decision trees, and features can be selected
    according to their importance.

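  4. We can use dimensionality reduction techniques like PCA, or feature selection techniques like L1-regularized (lasso) regression or recursive feature elimination. Ridge regression only shrinks coefficients and does not drop features on its own. Alternatively, we can use an SVM, since SVMs work well for high-dimensional data.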

  5. Dimensionality reduction is a way to manage a dataset with a large number of variables.

    There are two classes of dimensionality reduction:
    1. Feature Elimination
    2. Feature Extraction

    Feature Elimination removes the variables that are considered unimportant for the analysis. Although a dropped variable may seem to add little to the targeted analysis, we may lose some important information associated with it.

    Feature Extraction creates new independent variables, where each new variable is a combination of the original variables. Since every original variable contributes to each new variable, we can drop the least important new variables while still retaining information from all the original ones.

    PCA (Principal Component Analysis) comes under Feature Extraction.
    PCA helps in dimensionality reduction.
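
The two classes described in the answers above can be sketched together in a few lines. This is a minimal sketch, not a definitive implementation: the data is a small synthetic stand-in, the variance threshold and the 90% variance target are illustrative choices, and PCA is computed directly with NumPy's SVD rather than with a dedicated library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real data: few rows, many columns.
n_rows, n_cols = 200, 5_000
X = rng.normal(size=(n_rows, n_cols))
X[:, :100] = 1.0  # make the first 100 columns constant (uninformative)

# Feature elimination: drop columns whose variance is (almost) zero.
variances = X.var(axis=0)
keep = variances > 1e-8  # illustrative threshold
X_kept = X[:, keep]

# Feature extraction: PCA via SVD of the column-centered matrix.
# With far fewer rows than columns, the reduced SVD stays cheap.
X_centered = X_kept - X_kept.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance explained by each component.
explained = S**2 / np.sum(S**2)

# Smallest number of components whose cumulative share reaches the target.
target = 0.90  # illustrative target
k = int(np.searchsorted(np.cumsum(explained), target) + 1)

# Data projected onto the first k principal components.
X_reduced = U[:, :k] * S[:k]

print("after elimination:", X_kept.shape)
print("after extraction: ", X_reduced.shape)
```

Because the number of components is bounded by the number of rows, the extracted representation can never be wider than the row count, however many columns the original data has.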
