American Express Interview Question | Data Cleansing
Question
Explain why data cleansing is essential. Which methods do you use to maintain clean data?
Answers ( 4 )
Data cleaning is one of the most important steps in solving an ML problem. We cannot feed raw data directly to a machine learning model. The main data cleaning activities to perform are:
1. Handling missing values: We can use mean, median, mode, or KNN imputation.
2. Handling categorical variables: A categorical variable should be one-hot encoded if it is nominal and label encoded if it is ordinal.
3. Feature scaling: If the features are on very different scales, they should be standardized or normalized.
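The three activities above can be sketched in plain Python. This is only an illustration under assumptions: the helper names (`impute_median`, `one_hot`, `ordinal_encode`, `standardize`) and the sample columns are made up for the example, and in practice a library such as pandas or scikit-learn would normally do this work.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """One-hot encode a nominal column; categories are sorted for a stable column order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal_encode(values, order):
    """Map each ordinal level to its position in the given order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread to rescale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample columns, for illustration only.
ages_filled = impute_median([25, None, 35, 45])
colors_encoded = one_hot(["red", "blue", "red"])
sizes_encoded = ordinal_encode(["small", "large", "medium"],
                               order=["small", "medium", "large"])
scaled = standardize([1, 2, 3])
```

Median imputation is used here because it is less sensitive to extreme values than the mean; which imputation is appropriate depends on the data.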
Data cleaning is important for getting the best out of our models.
1) Handle missing values: remove or impute them, depending on the data and the problem.
2) Inspect outliers and remove them if needed.
3) Transform the data as needed, for example by one-hot encoding categorical variables.
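For step 2, one common way to inspect outliers is the interquartile range (IQR) rule. The answer does not name a specific method, so the rule, the factor `k=1.5`, the helper names, and the sample values below are all assumptions for illustration.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences of the IQR rule for a numeric column."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Separate values inside the fences from those outside, so outliers can be inspected first."""
    low, high = iqr_bounds(values, k)
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged


# Hypothetical income column with one suspicious entry.
incomes = [30, 32, 35, 36, 38, 40, 400]
kept, flagged = split_outliers(incomes)
```

Returning the flagged values separately keeps the "inspect first, remove if needed" order: an extreme value may be a data entry error or a genuine observation, and only the former should be dropped.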
Important steps to consider in data cleaning:
1. Remove duplicate and irrelevant observations
2. Fix structural errors
3. Filter unwanted outliers
4. Handle missing data
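Steps 1 and 2 often go together, because structural errors such as inconsistent capitalization or stray whitespace can hide duplicates. A minimal sketch, assuming rows are stored as dictionaries; the helper names and sample rows are hypothetical.

```python
def normalize_label(text):
    """Fix common structural errors: trim, collapse repeated whitespace, lowercase."""
    return " ".join(text.split()).lower()


def clean_rows(rows):
    """Normalize string fields, then drop rows that become exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = {
            key: normalize_label(value) if isinstance(value, str) else value
            for key, value in row.items()
        }
        # A hashable fingerprint of the row, independent of key order.
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(fixed)
    return cleaned


# Hypothetical records: the first two differ only in spacing and case.
rows = [
    {"city": "New York", "plan": "Gold"},
    {"city": " new  york", "plan": "gold "},
    {"city": "Boston", "plan": "Silver"},
]
cleaned = clean_rows(rows)
```

Normalizing before deduplicating matters: comparing the raw rows would treat "New York" and " new  york" as different observations.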
1) Handle missing values: remove or impute them, depending on the data and the problem.
2) Inspect outliers and remove them if needed.
3) Transform the data as needed, for example by one-hot encoding categorical variables.
4) Feature scaling: If the features are on very different scales, they should be standardized or normalized.