SAP interview Questions | Cleaning Data

Question

What are the steps for wrangling and cleaning data before applying machine learning algorithms?

Dhruv2301 4 years 2 Answers 964 views Great Grand Master 0

Answers ( 2 )

  1. Steps for data cleaning:
    1. Remove unwanted data
    – This includes duplicate or irrelevant observations.

    2. Fix Structural Errors
    – Structural errors are those that arise during measurement, data transfer, or other kinds of poor data management.

    3. Filter Unwanted Outliers
    – Outliers can cause problems with certain types of models. For example, linear regression models are less robust to outliers than decision tree models. In general, removing an outlier helps your model’s performance only if you have a legitimate reason to remove it.

    4. Handle Missing Data
    – The two common approaches are:
    Dropping observations that have missing values
    Imputing the missing values based on other observations
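
As a rough illustration of the four steps in this answer, here is a minimal pandas sketch. The toy data, the column names (`city`, `price`), the 1.5 × IQR outlier rule, and the choice of median imputation are all assumptions made for the example; the answer itself does not prescribe any of them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["Pune", "pune ", "Delhi", "Delhi", "Mumbai", "Mumbai", "Pune"],
    "price": [100.0, 110.0, 105.0, 105.0, np.nan, 5000.0, 95.0],
})


def clean(frame):
    # 1. Remove unwanted data: drop exact duplicate rows.
    out = frame.drop_duplicates().copy()

    # 2. Fix structural errors: normalize whitespace and capitalization
    #    so that "pune " and "Pune" become the same label.
    out["city"] = out["city"].str.strip().str.title()

    # 3. Filter outliers with the 1.5 * IQR rule (an assumed rule of thumb).
    #    Rows with a missing price are kept here and handled in step 4.
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out = out[in_range | out["price"].isna()].copy()

    # 4. Handle missing data: impute with the median of the remaining rows.
    #    Dropping instead would be out.dropna(subset=["price"]).
    out["price"] = out["price"].fillna(out["price"].median())
    return out.reset_index(drop=True)


cleaned = clean(df)
print(cleaned)
```

Whether to drop or impute, and which outlier rule to use, depends on the data and the model; the thresholds here are only placeholders.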

  2. 1) Eliminate columns that have a high proportion of missing values and do not
    make sense in the model. The same applies to rows.
    2) Identify missing values and impute them with the mean, median, k-NN imputation, etc.,
    as needed.
    3) If there are too many features, perform feature selection, or compute new columns
    (feature engineering) if necessary.
    4) Combine multiple data sets.
    5) Check for outliers and decide on a course of action. Remove them if needed.
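
A few of the points in this second answer (dropping sparse columns, imputing, and combining data sets) could be sketched in pandas as follows. The tables, the column names, the `customer_id` join key, and the 50% missing-value threshold are hypothetical choices for the example. The sketch merges first so that the missing-value checks run on the combined table, and it leaves out feature selection and outlier handling.

```python
import numpy as np
import pandas as pd

# Hypothetical tables; names and values are illustrative only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34.0, np.nan, 51.0, 29.0],
    "fax": [np.nan, np.nan, np.nan, "555-0101"],
})
orders = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "total_spent": [120.0, 80.0, np.nan, 60.0],
})

MAX_MISSING = 0.5  # assumed cutoff for "high proportion of missing values"


def prepare(customers, orders, max_missing=MAX_MISSING):
    # Combine multiple data sets on the shared key.
    df = customers.merge(orders, on="customer_id", how="left")

    # Drop columns whose share of missing values exceeds the cutoff.
    missing_share = df.isna().mean()
    df = df.loc[:, missing_share <= max_missing].copy()

    # Impute the remaining numeric gaps with each column's median,
    # skipping the identifier column.
    numeric_cols = df.select_dtypes(include="number").columns.drop("customer_id")
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    return df


prepared = prepare(customers, orders)
print(prepared)
```

Mean imputation would be the same call with `.mean()` in place of `.median()`; for k-NN imputation, scikit-learn's `KNNImputer` is one option.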
