American Express Interview Question | Categorical Variable

Question

Would treating a categorical variable as a continuous variable result in a better predictive model? If so, how?

Dhruv2301 4 years 1 Answer 869 views Great Grand Master 0

Answer ( 1 )

  1. Okay, I will be a bit descriptive in this answer to help my peers understand properly:

    Challenges faced with categorical variables:

    1. A categorical variable may have too many levels, which pulls down the performance of the model. For example, a categorical variable such as “zip code” would have numerous levels.

    2. A categorical variable may have levels that rarely occur. Many of these levels have minimal chance of making a real impact on model fit.

    3. One level may dominate, i.e. most of the observations in the data set take the same level. Such variables fail to make a positive impact on model performance because of their very low variation.

    4. If the categorical variable is masked, deciphering its meaning becomes a laborious task. Such situations are common in data science competitions.

    5. We can’t fit categorical variables into a regression equation in their raw form. They must be treated (encoded as numbers) first.
    Most algorithms (and ML libraries) produce better results with numerical variables. In Python, the “sklearn” library requires features to be in numerical arrays.
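
    To make challenges 1, 2 and 5 concrete, here is a minimal sketch in plain Python of one common treatment: merge rarely occurring levels into a single bucket, then one-hot encode what remains. The function names, the `min_count` threshold, the "OTHER" label and the zip-code values are all made up for illustration; in practice you would more likely reach for something like `pandas.get_dummies` or `sklearn.preprocessing.OneHotEncoder`.

```python
from collections import Counter


def group_rare_levels(values, min_count=2, other_label="OTHER"):
    """Replace levels seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def one_hot_encode(values):
    """Return (sorted levels, rows of 0/1 indicators), one column per level."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * len(levels)
        row[index[v]] = 1  # mark the column that matches this observation
        rows.append(row)
    return levels, rows


# Hypothetical zip-code column (illustrative values only).
zip_codes = ["10001", "10001", "94105", "60601", "94105", "73301"]

# Levels that occur only once are merged into a single "OTHER" level.
grouped = group_rare_levels(zip_codes, min_count=2)

# Each remaining level becomes its own 0/1 column.
levels, encoded = one_hot_encode(grouped)
```

    Merging rare levels first keeps the number of indicator columns small, which matters for a variable like zip code where one-hot encoding every level would create a very wide, sparse feature matrix.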

    One of the major reasons we convert categorical variables into factors, i.e. numbers, is that it makes things easier and satisfies the input format and constraints of ML algorithms. For example, if you fit a binary logistic regression, the dependent variable (the one you want to predict) is conventionally coded as binary, i.e. 0 or 1. In that case you convert the categorical variable into a numeric one. The conversion is what lets the model use the variable at all, and that in turn can help model performance.
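
    As a rough illustration of the 0/1 coding described above, here is a small sketch in plain Python. The helper name `encode_binary_target` and the "yes"/"no" labels are hypothetical and only stand in for whatever two-class target you have.

```python
def encode_binary_target(labels, positive_label):
    """Map a two-class categorical target to 1 (positive) and 0 (other)."""
    levels = set(labels)
    if len(levels) > 2:
        raise ValueError(f"expected at most 2 classes, found {len(levels)}")
    if positive_label not in levels:
        raise ValueError(f"{positive_label!r} does not occur in the labels")
    return [1 if label == positive_label else 0 for label in labels]


# Hypothetical target column (illustrative values only).
approved = ["yes", "no", "no", "yes"]
y = encode_binary_target(approved, positive_label="yes")
```

    Choosing the positive label explicitly, rather than relying on alphabetical order, keeps the meaning of 1 and 0 unambiguous when you later interpret the model's coefficients.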
