The Chi-Squared test of independence tells you whether two categorical variables are significantly associated, but it says nothing about how strong that association is. To measure strength, follow it with Cramér's V, which rescales the chi-squared statistic to a value between 0 (no association) and 1 (perfect association).
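As a rough sketch of the chi-squared test followed by Cramér's V, assuming pandas and SciPy are available: the helper name chi2_and_cramers_v and the sample columns are made up for illustration, and the continuity correction is switched off so that V is computed from the plain chi-squared statistic.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_and_cramers_v(x, y):
    """Return (chi2, p_value, cramers_v) for two categorical Series.

    Hypothetical helper; assumes each variable has at least two
    observed categories, otherwise V is undefined.
    """
    table = pd.crosstab(x, y)  # contingency table of observed counts
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()   # total number of observations
    k = min(table.shape) - 1     # min(rows - 1, cols - 1)
    cramers_v = np.sqrt(chi2 / (n * k))
    return chi2, p_value, cramers_v


# Illustrative data only
df = pd.DataFrame({
    "plan": ["basic", "basic", "pro", "pro", "basic", "pro", "basic", "pro"],
    "churned": ["yes", "yes", "no", "no", "yes", "no", "no", "yes"],
})
chi2, p_value, v = chi2_and_cramers_v(df["plan"], df["churned"])
print(f"chi2={chi2:.3f}, p={p_value:.3f}, Cramer's V={v:.3f}")
```

The p-value answers "is there an association?", while V answers "how strong is it?", so it is usually worth reporting both.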
For ordinal variables, use Spearman's rank correlation to check whether there is a monotonic association between them.
Pearson’s correlation coefficient measures the strength of the linear relationship between two variables on a
continuous scale.
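To illustrate the difference between the two coefficients, here is a small sketch using scipy.stats (assumed to be installed) on made-up data where y always increases with x, but not linearly. Spearman works on ranks, so it captures the monotonic relationship fully, while Pearson only credits the linear part.

```python
from scipy.stats import pearsonr, spearmanr

# Illustrative data: y grows monotonically with x, but as a cubic
x = [1, 2, 3, 4, 5, 6]
y = [value ** 3 for value in x]

spearman_rho, spearman_p = spearmanr(x, y)  # rank-based
pearson_r, pearson_p = pearsonr(x, y)       # linear

print(f"Spearman rho = {spearman_rho:.3f}")
print(f"Pearson r    = {pearson_r:.3f}")
```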
If there are only two variables, one continuous and one categorical, Pearson's correlation
does not apply directly. Correlation measures linear dependency, that is, whether one variable
increases or decreases as the other increases, and an unordered categorical variable has no notion of "increasing".
However, in a supervised learning setting where both variables are independent variables (features), you can
perform one-hot encoding on the categorical variable and compute a correlation matrix.
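A minimal sketch of the one-hot approach, assuming pandas; the column names (color, size_cm, price) and values are invented for the example. pd.get_dummies creates one 0/1 column per category, and corr() then treats them like any other numeric column.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({
    "color": ["red", "green", "blue", "red", "green", "blue"],
    "size_cm": [10.0, 12.5, 9.0, 11.0, 13.0, 8.5],
    "price": [5.0, 7.5, 4.0, 5.5, 8.0, 3.5],
})

# One 0/1 indicator column per category of "color"
encoded = pd.get_dummies(df, columns=["color"], dtype=float)

# Pearson correlation between every pair of columns
corr_matrix = encoded.corr()
print(corr_matrix.round(2))
```

Keep in mind that the indicator columns of one variable are correlated with each other by construction, so the interesting entries are the ones between an indicator and a numeric column.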
You can also use an ANOVA test, which determines whether a categorical variable has a significant effect
on the mean value of a continuous variable.
Case 1: When an Independent Variable Only Has Two Values
Point Biserial Correlation:
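If a categorical variable has only two values (e.g. true/false), we can convert it into a numeric datatype (0 and 1). Since it becomes a numeric variable, we can compute its correlation with the continuous variable using the pandas DataFrame.corr() method. The Pearson correlation between a 0/1 variable and a continuous variable is exactly the point-biserial correlation.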
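For example, a short sketch assuming pandas and SciPy, with invented column names (passed, score). It computes the correlation through pandas and cross-checks it against scipy.stats.pointbiserialr, which should give the same coefficient and also returns a p-value.

```python
import pandas as pd
from scipy.stats import pointbiserialr

# Illustrative data only
df = pd.DataFrame({
    "passed": [True, False, True, True, False, False],
    "score": [82, 55, 91, 78, 60, 49],
})

# Convert the two-valued variable to 0/1
df["passed_num"] = df["passed"].astype(int)

# Pearson correlation via pandas
r_pandas = df["passed_num"].corr(df["score"])

# Same coefficient via SciPy, plus a p-value
r_scipy, p_value = pointbiserialr(df["passed_num"], df["score"])

print(f"pandas r = {r_pandas:.4f}, scipy r = {r_scipy:.4f}, p = {p_value:.4f}")
```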
Case 2: More Than Two Values: Use ANOVA (Analysis of Variance)
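A one-way ANOVA for this case might look like the sketch below, assuming pandas and SciPy. The helper name anova_by_category and the sample data are made up. It splits the continuous variable by category and passes the groups to scipy.stats.f_oneway.

```python
import pandas as pd
from scipy.stats import f_oneway


def anova_by_category(df, cat_col, num_col):
    """Return (F statistic, p-value) of a one-way ANOVA.

    Hypothetical helper: tests whether the mean of num_col differs
    across the categories of cat_col.
    """
    groups = [values.to_numpy() for _, values in df.groupby(cat_col)[num_col]]
    f_stat, p_value = f_oneway(*groups)
    return f_stat, p_value


# Illustrative data only: three categories with clearly different means
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "value": [20, 21, 19, 22, 20,
              30, 29, 31, 32, 30,
              40, 41, 39, 42, 38],
})

f_stat, p_value = anova_by_category(df, "group", "value")
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")
```

A small p-value suggests that at least one category mean differs from the others. It does not say which one, so a post-hoc test is normally needed for that.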