While working on a data set, how do you select important variables?

Question

Explain your methods.


Answers (2)

  1. There are various methods of feature selection. Variance, correlation, and importance taken from the model itself are a few of them. You can refer to the following article: https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e

    To explain:
    1. Variance: You can eliminate features with very little variance, since they provide almost no information for the prediction and mostly add noise that can contribute to overfitting (see the first sketch after this list).
    2. Correlated features: When several features are highly correlated with one another, we can keep a few and eliminate the rest; the retained features are enough to explain the variability carried by the eliminated ones, and keeping all of them only adds to the curse of dimensionality.
    3. From the model itself: This method of feature selection is more time consuming, but it gives us the important features as identified by the models themselves. We can fit two or three models (or more, depending on your budget) on the training set, ask each one for, say, its n most important features, and eliminate the others. The feature_selection module of sklearn provides RFE and RFECV, which we can use for this (see the second sketch below).
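
    A minimal sketch of points 1 and 2, assuming a pandas DataFrame X of numeric features (the data here is randomly generated just for illustration, and the 0.01 and 0.9 thresholds are arbitrary choices):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical feature matrix; replace with your own data.
X = pd.DataFrame(np.random.rand(100, 5), columns=list("abcde"))

# 1. Drop near-constant features (variance below a chosen threshold).
selector = VarianceThreshold(threshold=0.01)
selector.fit(X)
X = X.loc[:, selector.get_support()]

# 2. Drop one feature from each highly correlated pair (|r| > 0.9 here).
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
X = X.drop(columns=to_drop)
print("Remaining features:", list(X.columns))
```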
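    And a sketch of point 3 with sklearn's RFE and RFECV, assuming a feature matrix X and target y (a toy classification set is generated here; the estimator and n_features_to_select are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LogisticRegression

# Toy data just for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# RFE: keep a fixed number of features, ranked by the fitted estimator.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)
print("RFE keeps features:", rfe.support_)

# RFECV: let cross-validation decide how many features to keep.
rfecv = RFECV(estimator=LogisticRegression(max_iter=1000), cv=5)
rfecv.fit(X, y)
print("RFECV optimal number of features:", rfecv.n_features_)
```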

    Best answer
  2. 1) In R, you can use the step() function and pass your model as the parameter; step() internally builds a sequence of candidate models and determines which predictors go into the best one.
     2) In sklearn, feature importance is built into tree-based classifiers (the feature_importances_ attribute); you can plot the importance of every feature relative to the most important one.
     3) You can also use the correlation matrix, which tells you how each variable correlates with the target variable; features that do not have a strong correlation with the target can be eliminated (see the sketch below).
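
     A minimal sketch of points 2 and 3 in sklearn/pandas, using the built-in breast cancer dataset as stand-in data (the 0.1 correlation cutoff is an arbitrary choice):

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

df = load_breast_cancer(as_frame=True).frame  # features plus a "target" column
X, y = df.drop(columns="target"), df["target"]

# 2) Feature importances from a tree-based classifier, plotted relative to one another.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
importances.sort_values().plot(kind="barh")
plt.tight_layout()
plt.show()

# 3) Correlation of every feature with the target; weakly correlated ones are candidates to drop.
corr_with_target = df.corr()["target"].drop("target").abs()
print("Weakly correlated features:", corr_with_target[corr_with_target < 0.1].index.tolist())
```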
