Day 9 – EDA in Python

The complete Machine Learning journey can be penned down in 4 steps:-

1. Exploratory Data Analysis This is the first thing you do when you get a dataset. Before jumping on to building models, you need to first understand the nature of the data. EDA helps in making it easier for the audience to get along with data.

EDA includes visualizing the raw data, looking for correlation in the dataset and finding missing values in the data set. In short you have to plot a hell lot of graphs to understand the dataset.

2. Cleaning the data – You will spend more than 50 percent of your time in cleaning the data and doing missing value treatment. Cleaning is important because the accuracy of your model will depend on the number of proper data points.

3. Building models – We are talking about Machine Learning algorithms, so, once you have the clean data, you need to build models, visualize results, check the results and improve the success metric of the model

4. Result Presentation – You have the results of the model. This result is of no use until and unless it is consumed by the audience. You will again need the power of visualizations to prove the result of your analysis

We will take us a data set and will build some graphs from scratch.
I will be using the Titanic Data set because of the following reasons:-
1. Firstly, It is freely and easily available at this link – https://www.kaggle.com/c/titanic/data
2. It’s clean and easy to understand

Gist of the dataset – Titanic train dataset contains the various information(like. age, sex, family size of the passenger, etc.) about those who survived and who could not survive

Leave a Reply

Your email address will not be published. Required fields are marked *