SAP interview Questions | Cleaning Data
Question
What are the steps for wrangling and cleaning data before applying machine learning algorithms?
Machine Learning
Answers ( 2 )
Steps for data cleaning:
1. Remove unwanted data
– This includes duplicate or irrelevant observations.
2. Fix Structural Errors
– Structural errors are those that arise during measurement, data transfer, or other types of poor data management, such as typos, inconsistent capitalization, or mislabeled categories.
3. Filter Unwanted Outliers
– Outliers can cause problems with certain types of models. For example, linear regression models are less robust to outliers than decision tree models. In general, if you have a legitimate reason to remove an outlier (for example, it is a measurement or data-entry error), removing it will help your model’s performance. Do not remove a value just because it is large.
4. Handle Missing Data
– There are two common ways to deal with missing values:
Dropping observations that have missing values
Imputing the missing values based on other observations
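
The four steps above can be sketched in pandas roughly as follows. This is only an illustration, not a fixed recipe: the column names `city` and `income`, the 1.5 * IQR outlier rule, and median imputation are all assumptions picked for the example. Structural fixes are applied before de-duplication here because inconsistent spellings can hide duplicates.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to a small illustrative table."""
    out = df.copy()

    # Step 2 (done first): fix structural errors so that "berlin " and
    # "Berlin" are treated as the same category.
    out["city"] = out["city"].str.strip().str.title()

    # Step 1: remove unwanted data, here exact duplicate rows.
    out = out.drop_duplicates()

    # Step 3: filter outliers with the 1.5 * IQR rule (an assumed rule;
    # use domain knowledge where you have it). Missing values are kept
    # for now so that step 4 can handle them.
    q1, q3 = out["income"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out = out[in_range | out["income"].isna()]

    # Step 4: handle missing data by imputing the median of the
    # remaining observations (dropping the rows is the other option).
    out["income"] = out["income"].fillna(out["income"].median())

    return out.reset_index(drop=True)


# Hypothetical toy data: one duplicate, one extreme value, one gap.
raw = pd.DataFrame(
    {
        "city": ["Berlin", "berlin ", "Paris", "Paris", "Rome", "Rome", "Oslo"],
        "income": [50.0, 50.0, 52.0, None, 48.0, 51.0, 900.0],
    }
)
cleaned = clean(raw)
print(cleaned)
```

On this toy table the second Berlin row is dropped as a duplicate, the 900.0 row is filtered as an outlier, and the missing income is filled with the median of the remaining values.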
1) Eliminate columns that have a high proportion of missing values and do not
make sense in the model. The same applies to rows.
2) Identify the remaining missing values and impute them with the mean, median,
k-NN imputation, etc., according to the need.
3) If there are too many features, perform feature selection, or compute new columns
(feature engineering) if necessary.
4) Combine multiple data sets
5) Check for outliers and decide on the course of action. Remove them if needed.
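
A minimal pandas sketch of points 1) to 4), under stated assumptions: the tables `customers` and `orders`, their column names, the 50% missing-value threshold, and the derived `spend_per_order` feature are all hypothetical and only serve to make the steps concrete. The data sets are combined first here so that the missing-value checks see the merged table; a different order may suit other data better.

```python
import numpy as np
import pandas as pd


def prepare(customers, orders, max_missing=0.5):
    """Merge two tables, prune sparse columns/rows, impute, add a feature."""
    # 4) Combine multiple data sets with a left join on the shared key.
    df = customers.merge(orders, on="customer_id", how="left")

    # 1) Drop columns, then rows, whose share of missing values
    #    exceeds the (assumed) threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    df = df[df.isna().mean(axis=1) <= max_missing].copy()

    # 2) Impute the remaining numeric gaps with the column median.
    #    The key column is excluded so it is never imputed.
    numeric = df.select_dtypes(include="number").columns.drop("customer_id")
    df[numeric] = df[numeric].fillna(df[numeric].median())

    # 3) Feature engineering: derive a new column from existing ones.
    df["spend_per_order"] = df["total_spend"] / df["n_orders"]
    return df


# Hypothetical toy data.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4],
        "age": [34, np.nan, 29, 41],
        "fax": [np.nan, np.nan, np.nan, "555-0100"],  # mostly empty column
    }
)
orders = pd.DataFrame(
    {
        "customer_id": [1, 2, 3],
        "n_orders": [2, 4, 1],
        "total_spend": [40.0, 100.0, np.nan],
    }
)
prepared = prepare(customers, orders)
print(prepared)
```

Median imputation is used only because it is simple and less sensitive to extreme values than the mean; mean or k-NN imputation would slot into the same place.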