Tiger Analytics Interview Question | Data Cleaning
Question
What are the steps for wrangling and cleaning data before applying machine learning algorithms?
Machine Learning
Answer ( 1 )
Data wrangling is the process of converting and mapping data from its raw form into a format that is more useful for analysis and modelling.
The important steps in data wrangling are:
1. Acquiring data: This is often a tedious step and can take the most time. Sources for data collection: data is publicly available on various websites such as kaggle.com, data.gov, the World Bank, AWS Datasets, and Google Datasets.
2. Data cleaning: Data cleaning is an essential component of data wrangling and requires a lot of patience. To make the job easier, first format the data so that it is readable for humans. The essentials are to find outliers (data points that do not match the rest of the dataset) and to find missing values and remove them from the data set (left untreated, missing values can cause many algorithms to fail or yield unreliable models).
3. Data computation: At times, your machine may not have enough resources to run your algorithm, e.g. you might not have a GPU. In these cases, you can use publicly available hosted platforms to run your algorithm. These are services reachable over the web that let you use remote computing power and process data without having to rely on your own system. An example is the Google Colab platform.
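To make the data cleaning step more concrete, here is a minimal sketch of dropping rows with missing values and then flagging outliers. It uses only the Python standard library, and several things in it are assumptions rather than part of the answer above: the records in `raw` are made up, the helper names `drop_missing` and `iqr_outliers` are hypothetical, and the 1.5 × IQR rule is just one common convention for deciding what counts as an outlier. In practice a library such as pandas is often used for this instead.

```python
import statistics


def drop_missing(rows):
    """Keep only rows in which every field has a value (not None, not empty)."""
    return [r for r in rows if all(v is not None and v != "" for v in r.values())]


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical raw records, invented for illustration only.
raw = [
    {"age": 25, "income": 40000},
    {"age": 31, "income": 52000},
    {"age": None, "income": 48000},   # missing age
    {"age": 29, "income": 45000},
    {"age": 35, "income": 50000},
    {"age": 28, "income": 990000},    # suspiciously large income
    {"age": 40, "income": ""},        # missing income
    {"age": 33, "income": 47000},
    {"age": 27, "income": 43000},
    {"age": 38, "income": 49000},
]

clean = drop_missing(raw)                              # remove incomplete rows
outliers = iqr_outliers([r["income"] for r in clean])  # flag unusual incomes

print(len(raw), "rows before cleaning,", len(clean), "after")
print("income outliers:", outliers)
```

Whether a flagged value should be removed, corrected, or kept is a judgment call that depends on the data; the sketch only identifies candidates.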