Flipkart Interview Question | What are the basic checks you do for cleaning the data?
Question
Answers ( 4 )
Before cleaning the data there are certain checks that one should do.
1. We should first look at the shape of the data, i.e. the number of rows and columns in the dataframe.
Code for the above (df.shape)
2. We should check the info of the dataframe, which tells us the datatype and non-null count of each feature.
Code for the above (df.info())
3. We should also check the summary statistics of the dataframe (count, mean, standard deviation, minimum, quartiles and maximum), which include the 5 point summary.
Code for the above (df.describe())
4. We can also check for skewness, kurtosis and outliers.
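The checks in this answer can be sketched roughly as follows. This is a minimal illustration on a made-up dataframe: the column names `price` and `units` are hypothetical, and the 1.5 * IQR rule is only one common convention for flagging outliers, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical toy data; the column names are for illustration only.
df = pd.DataFrame({
    "price": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0],
    "units": [1, 2, 2, 3, 2, 2],
})

print(df.shape)       # (rows, columns); shape is an attribute, not a method
df.info()             # dtypes and non-null counts; prints directly
print(df.describe())  # count, mean, std, min, quartiles, max

# Skewness and excess kurtosis of each numeric column.
skewness = df.skew(numeric_only=True)
kurtosis = df.kurt(numeric_only=True)
print(skewness)
print(kurtosis)

# Flag outliers in one column with the 1.5 * IQR rule (an assumed convention).
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
outlier_mask = (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)
outliers = df.loc[outlier_mask, "price"]
print(outliers)
```

Whether a flagged value is an error or a genuine extreme observation still has to be judged from the domain.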
1) Missing values per column
2) Presence of outliers and the reason for their existence.
3) Treatment for missing values.
4) Converting datatypes of certain features.
5) Merging and restructuring datasets.
6) Performing one-hot encoding if required.
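A rough sketch of several of these steps in pandas is given here. The dataframe and its columns (`age`, `city`, `joined`) are invented for illustration, and median/mode imputation is just one possible treatment for missing values; merging and restructuring are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are for illustration only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Delhi", "Mumbai", None, "Delhi"],
    "joined": ["2021-01-05", "2021-02-10", "2021-03-15", "2021-04-20"],
})

# 1) Missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# 3) One possible treatment: median for numeric, most frequent value for categorical.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# 4) Convert a date stored as text into a proper datetime column.
df["joined"] = pd.to_datetime(df["joined"])

# 6) One-hot encode the categorical column.
encoded = pd.get_dummies(df, columns=["city"])
print(encoded.head())
```

The right imputation strategy depends on why the values are missing, so the choices above should be treated as placeholders.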
Basic checks:
df.head()
df.shape
df.describe()
df.info()
df.isnull().sum()
1) missing values per column
2) outliers
3) standardisation issues
4) impractical values
5) incorrect datatypes
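Checks 3 to 5 in this list could look something like the following. This is only a sketch on invented data: the `city` and `age` columns are hypothetical, and the 0 to 120 range for a plausible age is an assumption made for the example.

```python
import pandas as pd

# Hypothetical toy data; the column names are for illustration only.
df = pd.DataFrame({
    "city": [" Delhi", "delhi", "MUMBAI", "Mumbai "],
    "age": ["34", "-5", "29", "210"],
})

# Standardisation issues: the same category written in different ways.
print(df["city"].unique())
df["city"] = df["city"].str.strip().str.title()

# Incorrect datatypes: age is stored as text, so convert it to numbers.
# errors="coerce" turns unparseable entries into NaN instead of raising.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Impractical values: ages outside an assumed plausible range of 0 to 120.
impractical = df[(df["age"] < 0) | (df["age"] > 120)]
print(impractical)
```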