Visualizations in Python

Data visualization is the practice of placing data in a visual context so that patterns, trends, and correlations that might otherwise go undetected become visible. It is a basic but very important tool in your data science career.
Python has several good libraries for visualization.

Open Jupyter notebook or any other IDE of your preference.

Library to use – There are many good visualization libraries, but Matplotlib is the preferred one to start with because it is simple to use.
So, we will mostly concentrate on Matplotlib.

Import the library and give it the standard alias plt:

import matplotlib.pyplot as plt

The following two functions will come in handy throughout this book:

To display a chart, use – plt.show()
To save the chart as an image, use – plt.savefig("Filename.png")
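As a minimal sketch of saving a chart, the snippet below plots a few illustrative points and writes the figure to a PNG file. The sample data and the file name first_chart.png are assumptions made for this example only. The chart is saved before it is displayed, because calling plt.show() first can leave the saved image empty.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Illustrative data (not from the article)
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

plt.figure()
plt.plot(x, y)

# Save first: calling plt.show() before plt.savefig() can leave an empty image,
# because the figure may be discarded once it has been displayed.
out_path = os.path.join(tempfile.gettempdir(), "first_chart.png")
plt.savefig(out_path)
saved_size = os.path.getsize(out_path)  # size of the written file in bytes
plt.close()
```

Outside a notebook, add plt.show() after plt.savefig() if you also want to see the chart on screen.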

Popular plotting libraries in Python are:

1. Matplotlib – Best to start with. It provides easy implementation and gives a lot of freedom
2. Seaborn – It has a high level interface and great default styles
3. Plotly – To create interactive plots
4. Pandas Visualization – Easy interface, built on Matplotlib

Line Chart

A line chart or line graph is a type of chart which displays information as a series of data points called ‘markers’ connected by straight line segments.

So, a line plot is a very basic plot used to show observations collected at regular intervals. The x-axis represents the interval and the y-axis represents the values.

Let's plot our first graph:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6]
y = [10, 12, 20, 21, 30, 35]
plt.plot(x, y)
plt.show()
Here is what you will get:

Graph 1 – Basic Line Chart

Plot a cos graph using a line plot

import matplotlib.pyplot as plt
from numpy import cos

x = [x*0.01 for x in range(100)]
y = cos(x)
plt.plot(x,y)
plt.show()


Here is what you get as a cos graph

Graph 2 – Cos graph using line plot
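The x values above run only from 0 to 0.99, so the plot shows just the first part of the curve. A possible variation, sketched here with numpy.linspace, covers one full period from 0 to 2π. The point count of 200 and the axis labels are choices made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# 200 evenly spaced points covering one full period, 0 to 2*pi
x = np.linspace(0, 2 * np.pi, 200)
y = np.cos(x)

plt.figure()
plt.plot(x, y)
plt.xlabel("x (radians)")
plt.ylabel("cos(x)")
```

Passing a NumPy array to np.cos computes the whole curve in one call, so no list comprehension is needed.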

You know how to plot a line graph, but some important things are still missing from it: the x-axis and y-axis labels and the plot title. Let's create another line plot, this time for the number of students in each class, using the following data:

c = [1,2,3,4,5,6]
student = [40,52,50,61,70,78]

The following commands set the x-axis label, y-axis label, and chart title:

plt.xlabel("Label")
plt.ylabel("Label")
plt.title("Title")

The code is given below:
c = [1, 2, 3, 4, 5, 6]
student = [40, 52, 50, 61, 70, 78]
plt.xlabel("Class")
plt.ylabel("Number of Students")
plt.title("Class vs Number of students")
plt.plot(c, student)

Graph 3 – Class vs Number of Students chart with proper labels and plot title
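The same chart can also be written with Matplotlib's object-oriented interface, where the labels and title are set on an Axes object instead of through plt. This is only an alternative sketch of the listing above; the names fig and ax are conventional, not required.

```python
import matplotlib.pyplot as plt

c = [1, 2, 3, 4, 5, 6]
student = [40, 52, 50, 61, 70, 78]

# plt.subplots() returns a Figure and, by default, a single Axes
fig, ax = plt.subplots()
ax.plot(c, student)
ax.set_xlabel("Class")
ax.set_ylabel("Number of Students")
ax.set_title("Class vs Number of students")
```

This style tends to be easier to manage once a figure holds more than one chart, since every call names the Axes it applies to.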

Do you want to change the color of the line?
Try the following code instead to make the line green:
plt.plot(c, student, color='g')

Graph 4 – Adding color to the same graph
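Color is only one of the styling options plt.plot accepts. The sketch below also sets a line style, a line width, and a marker for the same data. The particular values (dashed line, width 2, circle markers) are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

c = [1, 2, 3, 4, 5, 6]
student = [40, 52, 50, 61, 70, 78]

plt.figure()
# Green dashed line, 2 points wide, with a circle at each data point
lines = plt.plot(c, student, color="g", linestyle="--", linewidth=2, marker="o")
plt.xlabel("Class")
plt.ylabel("Number of Students")
```

plt.plot returns a list of Line2D objects, one per plotted line, which is why the result is stored in lines here.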

Multi Line Chart

You can also add multiple plots to the same graph. Let's put a couple of new lines in the graph: the number of teachers and the average marks.

Graph 5 – Adding multiple lines to a graph

To add a legend, give a label to each line you plot and then specify a location for the legend.

The code is given below:

c = [1, 2, 3, 4, 5, 6]
student = [40, 52, 50, 61, 70, 78]
avg_marks = [34, 43, 54, 44, 50, 55]
num_of_teachers = [10, 12, 13, 10, 15, 10]
plt.xlabel("Class")
#plt.ylabel("Number of Students")
plt.title("Class vs Number of students")
plt.plot(c, student, color='orange', label='Student')
plt.plot(c, avg_marks, color='red', label='Marks')
plt.plot(c, num_of_teachers, color='green', label='Teachers')
plt.legend(loc="upper left")
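When the number of lines grows, the repeated plt.plot calls can be driven from a dictionary instead. The sketch below is one possible arrangement of the same data; the series dictionary and its layout are assumptions made for this example.

```python
import matplotlib.pyplot as plt

c = [1, 2, 3, 4, 5, 6]
# label -> (values, color), using the same data as the listing above
series = {
    "Student": ([40, 52, 50, 61, 70, 78], "orange"),
    "Marks": ([34, 43, 54, 44, 50, 55], "red"),
    "Teachers": ([10, 12, 13, 10, 15, 10], "green"),
}

fig, ax = plt.subplots()
for label, (values, color) in series.items():
    ax.plot(c, values, color=color, label=label)
ax.set_xlabel("Class")
ax.legend(loc="upper left")

# The legend entries appear in the order the lines were plotted
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

Adding another line then only needs one more dictionary entry.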


Bar Chart

“A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.”

After the line chart, the second basic but highly used chart is the bar chart

To create a bar chart – plt.bar(x,y)

We will plot a few graphs first; you can add labels, a title, and legends later.

import matplotlib.pyplot as plt
a = ['Apple', 'Mango', 'Pineapple']
b = [40, 60, 50]
plt.bar(a, b)

Graph 6 – A simple bar chart

Use random values between 0 and 100 to create the same graph.

import matplotlib.pyplot as plt
from random import seed
from random import randint
seed(123)
x = ['Apple', 'Mango', 'Pineapple']
y = [randint(0, 100), randint(0, 100), randint(0, 100)]
plt.bar(x, y)

Graph 7 – Bar chart with random values

Adding color, labels, and title to the random values bar chart
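One way this could look is sketched below. The colors, axis labels, and title text are assumptions chosen for the example; only the data setup comes from the previous listing.

```python
import matplotlib.pyplot as plt
from random import seed, randint

seed(123)
x = ["Apple", "Mango", "Pineapple"]
y = [randint(0, 100) for _ in x]

plt.figure()
# A list of colors assigns one color per bar
bars = plt.bar(x, y, color=["red", "gold", "green"])
plt.xlabel("Fruit")
plt.ylabel("Random value")
plt.title("Random values per fruit")
```

Passing a single color name instead of a list, for example color="orange", paints every bar the same color.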

100% stacked bar chart with sub-components
Use this chart when you have to show how each bar splits into components, with every bar scaled to 100%, as in the graph below.

Example of a 100% stacked bar chart

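import numpy as np
import matplotlib.pyplot as plt

x = ["a", "b", "c", "d"]
y1 = np.array([3, 8, 6, 4])
y2 = np.array([10, 2, 4, 3])
y3 = np.array([5, 6, 2, 5])

# total of the three components for each bar
snum = y1 + y2 + y3

# normalization: convert each component to a percentage of its bar's total
y1 = y1 / snum * 100.
y2 = y2 / snum * 100.
y3 = y3 / snum * 100.
plt.figure(figsize=(4, 3))

# stack bars: each series starts where the previous ones end
plt.bar(x, y1, label='y1')
plt.bar(x, y2, bottom=y1, label='y2')
plt.bar(x, y3, bottom=y1 + y2, label='y3')
plt.legend()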

Graph 8 – A 100% stacked bar chart
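The stacking pattern extends to any number of components by keeping a running bottom that grows after each series is drawn. The sketch below reuses the numbers from the listing above; the 2D array layout and the names values, percent, and bottom are assumptions made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = ["a", "b", "c", "d"]
# One row per component, same numbers as above
values = np.array([
    [3, 8, 6, 4],
    [10, 2, 4, 3],
    [5, 6, 2, 5],
], dtype=float)

# Divide each column by its total so every bar sums to 100
percent = values / values.sum(axis=0) * 100

fig, ax = plt.subplots(figsize=(4, 3))
bottom = np.zeros(len(x))
for i, row in enumerate(percent, start=1):
    ax.bar(x, row, bottom=bottom, label=f"y{i}")
    bottom += row  # the next component starts where this one ends
ax.legend()
```

After the loop, bottom holds the total height of each bar, which should be 100 for every category.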

Histogram

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data. The idea is to locally represent the data density by counting the number of observations in a sequence of consecutive intervals (bins).

To plot a histogram, use – plt.hist(x, bins=n), where x is the data and n is the number of bins

A simple histogram plot
q = [1,2,34,5,44,66,66,90,33,45,2,1,2,3,4]
plt.hist(q, bins=3, color='green')

Graph 9 – A simple histogram
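To see the numbers behind the bars, the same binning can be reproduced with numpy.histogram, which returns the count in each bin and the bin edges. This is a small sketch for inspection only; it does not draw anything.

```python
import numpy as np

q = [1, 2, 34, 5, 44, 66, 66, 90, 33, 45, 2, 1, 2, 3, 4]

# Three equal-width bins spanning min(q) to max(q), as plt.hist does by default
counts, edges = np.histogram(q, bins=3)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {n} values")
```

For this data the three bins hold 8, 4, and 3 values, which matches the bar heights in the histogram.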

Create a list of random values and plot it in 4 bins

import random
my_rand = random.sample(range(1, 30), 20)
print(my_rand)
print(type(my_rand))
plt.hist(my_rand, bins=4, color='orange')

Graph 10 – A histogram made with random variables
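The bins argument also accepts a list of bin edges, which keeps the bins fixed no matter which random values are drawn. The sketch below assumes a seed of 7 and edges at 0, 8, 16, 24, and 32; both are arbitrary choices for illustration.

```python
import random
import matplotlib.pyplot as plt

random.seed(7)  # fixed seed so the sample is the same on every run
my_rand = random.sample(range(1, 30), 20)

plt.figure()
# Explicit edges: four bins covering 0-8, 8-16, 16-24, and 24-32
counts, edges, patches = plt.hist(my_rand, bins=[0, 8, 16, 24, 32], color="orange")
plt.xlabel("Value")
plt.ylabel("Frequency")
```

plt.hist returns the bin counts and edges along with the drawn bars, so the counts can be checked without a separate calculation.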

In a histogram you can also pass more than one dataset to draw the bars side by side.

import random
my_rand = random.sample(range(1, 30), 20)
my_rand2 = random.sample(range(1, 25), 20)
print(my_rand)
print(type(my_rand))
plt.hist([my_rand, my_rand2], bins=4, color=['green', 'red'])
legend = ['Rand1', 'Rand2']
plt.legend(legend)
plt.xlabel("Bins")
# the y-axis of a histogram shows how many values fall into each bin
plt.ylabel("Frequency")
plt.title("Random Variable distribution")

Graph 11 – Parallel histogram

Horizontal Bar Chart

import matplotlib.pyplot as plt
name = ['Nitin', 'Saurabh', 'Rahul', 'Gaurav', 'Amit']
run = [200, 70, 130, 120, 100]
# plt.barh draws a horizontal bar chart: categories on the y-axis, values on the x-axis
plt.barh(name, run, color='orange')
plt.xlabel("Runs Scored")
plt.ylabel("Cricketer")
plt.title("Runs scored by cricketers")
plt.show()

Graph 12 – A horizontal bar chart
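Horizontal bars are often easier to compare when they are sorted by value. The sketch below sorts the same data before plotting; the helper names pairs, sorted_names, and sorted_runs are introduced only for this example.

```python
import matplotlib.pyplot as plt

name = ["Nitin", "Saurabh", "Rahul", "Gaurav", "Amit"]
run = [200, 70, 130, 120, 100]

# Pair each cricketer with his runs and sort ascending by runs;
# barh draws the first item at the bottom, so the top scorer ends up on top
pairs = sorted(zip(name, run), key=lambda p: p[1])
sorted_names = [p[0] for p in pairs]
sorted_runs = [p[1] for p in pairs]

plt.figure()
plt.barh(sorted_names, sorted_runs, color="orange")
plt.xlabel("Runs Scored")
plt.ylabel("Cricketer")
plt.title("Runs scored by cricketers")
```

Sorting in descending order instead would place the top scorer at the bottom of the chart.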

Keep making irrelevant and unnecessary graphs.
Keep practicing 🙂

XtraMous

Author: TheDataMonk

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)