Visualization in Python Part 2

We have already plotted some basic graphs. Now it’s time to plot some more.

Line Histogram

Now let’s create a line histogram with some random data

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
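# 1000 samples from a standard normal distribution
noise = np.random.normal(0, 1, (1000, ))
# Kernel density estimate of the samples
density = stats.gaussian_kde(noise)
# Step-style (outline only) histogram, normalised so that its total area is 1
n, x, _ = plt.hist(noise, bins=np.linspace(-3, 3, 50), histtype='step', density=True)
# Overlay the density curve, evaluated at the bin edges
plt.plot(x, density(x))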
plt.show()

Graph 13 – A line histogram
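The density=True argument is what lets the histogram and the density curve share one y-axis. The sketch below is an illustration rather than part of the original example: it uses np.histogram (the NumPy routine for the same binning) and an arbitrary seed to check that the normalised bar heights times the bin widths add up to 1.

```python
import numpy as np

rng = np.random.default_rng(0)        # seed chosen only for reproducibility
noise = rng.normal(0, 1, 1000)
edges = np.linspace(-3, 3, 50)        # same bin edges as the example above

# density=True divides each count by (number of counted points * bin width)
heights, edges = np.histogram(noise, bins=edges, density=True)

# Area under the histogram = sum of height * width over all bins
area = np.sum(heights * np.diff(edges))
print(round(area, 6))                 # should be 1.0 up to floating point error
```

Samples that fall outside the range -3 to 3 are not counted, so the area is normalised over the points that land inside the bins.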

Variable Width histogram

A variable width histogram is a histogram whose columns do not all have the same width.

Let’s create one with our dataset:

import numpy as np
import matplotlib.pyplot as plt
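freqs = np.array([2, 7, 21, 15, 12])
bins = np.array([65, 75, 80, 90, 105, 110])
# Width of each bin
widths = bins[1:] - bins[:-1]
# Height = frequency per unit width, so each column's area equals its frequency
# (np.float has been removed from recent NumPy versions; the built-in float works)
heights = freqs.astype(float) / widths
plt.fill_between(bins.repeat(2)[1:-1], heights.repeat(2), facecolor='orange')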
plt.show()

Graph 14 – A variable width histogram
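The division by the bin width in the code above is the key step: in a variable width histogram the area of a column, not its height, represents the frequency. A minimal check with the same freqs and bins, written only as an illustration:

```python
import numpy as np

freqs = np.array([2, 7, 21, 15, 12])
bins = np.array([65, 75, 80, 90, 105, 110])

widths = np.diff(bins)            # [10, 5, 10, 15, 5]
heights = freqs / widths          # frequency per unit of the x-axis

# Multiplying back gives the original frequencies, i.e. area == frequency
areas = heights * widths
print(heights)
print(areas)
```

The second bin (75 to 80) holds only 7 observations, but because it is narrow its column is taller than the first bin's column.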

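One more example below:

import numpy as np
import matplotlib.pyplot as plt
x = np.sort(np.random.rand(6))
y = np.random.rand(5)
# align='edge' makes each bar start at its left bin edge instead of being centred on it
plt.bar(x[:-1], y, width=x[1:] - x[:-1], align='edge')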
plt.show()

Graph 15 – Variable width histogram

Area Chart

An area chart is a line chart in which the region under each line is filled with color.

Let’s create a basic area chart with some dummy data:

import numpy as np
import matplotlib.pyplot as plt
# Data
x=range(1,6)
y=[ [1,4,6,8,9], [2,2,7,10,12], [2,8,5,10,6] ]

# Plot
plt.stackplot(x, y, labels=['A', 'B', 'C'])
plt.legend(loc='upper left')
plt.show()

Graph 16 – A basic area chart

You already know how to add x-labels, y-labels, title, etc.
Go ahead and add these in the graph above
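One possible solution to this exercise is sketched below. The axis labels and the title are placeholders invented for the dummy data, so replace them with whatever your data actually represents.

```python
import matplotlib.pyplot as plt

x = range(1, 6)
y = [[1, 4, 6, 8, 9], [2, 2, 7, 10, 12], [2, 8, 5, 10, 6]]

fig, ax = plt.subplots()
ax.stackplot(x, y, labels=['A', 'B', 'C'])

# Placeholder texts: the dummy data has no real meaning
ax.set_xlabel('Day')
ax.set_ylabel('Units')
ax.set_title('Stacked area chart of A, B and C')
ax.legend(loc='upper left')
```

Call plt.show() afterwards to display the figure.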

Box and Whisker Plot

A box and whisker plot, or boxplot for short, is generally used to summarize the distribution of a data sample.
The x-axis is used to represent the data sample, where multiple boxplots can be drawn side by side on the x-axis if desired.

The box plot is one of the most common types of graphic. It gives a compact summary of one or more numeric variables. The line that divides the box into two parts is the median of the numbers.
The ends of the box represent the first and third quartiles (the 25th and 75th percentiles).

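import random
import matplotlib.pyplot as plt

random.seed(123)
# Four samples of 20 distinct integers between 1 and 99
a = random.sample(range(1,100),20)
b = random.sample(range(1,100),20)
c = random.sample(range(1,100),20)
d = random.sample(range(1,100),20)
list_Ex = [a,b,c,d]
# One box per sample, drawn side by side
plt.boxplot(list_Ex)
plt.show()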

Graph 17 – A basic Box-Whisker graph

Now we will try to make the graph look better by adding color to the plot. The box plot shows the median, the 25th and 75th percentiles, and the outliers. Try giving a different color to each of these elements to make the plot more appealing.

When you plot a boxplot, you can style the following 5 elements of the plot:
a. box – To modify the color, line width, etc. of the central box
b. whisker – To modify the color and line width of the line which connects the box to the cap
c. cap – The short horizontal line at the end of each whisker
d. median – The line inside the box that marks the median
e. flier – The individual points drawn beyond the caps, i.e. the outliers

The box spans the 1st to the 3rd quartile, and its length is the IQR, i.e. the Inter Quartile Range. The lower fence is at Q1 - 1.5*IQR and the upper fence is at Q3 + 1.5*IQR. Any point which falls below the lower fence or above the upper fence is called a flier or outlier.
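The fence arithmetic can be worked through by hand. The sketch below uses a small made-up sample (not data from this article) and np.percentile with its default linear interpolation to find the quartiles, the IQR, both fences and the fliers.

```python
import numpy as np

# Made-up sample with one obviously large value
data = np.array([12, 15, 17, 19, 20, 22, 24, 25, 27, 60])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Fliers are the points outside the two fences
fliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, q3, iqr)               # 17.5 24.75 7.25
print(lower_fence, upper_fence)  # 6.625 35.625
print(fliers)                    # [60]
```

Here only the value 60 lies above the upper fence, so it is the single point that would be drawn as a flier.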

Following is the code with some fancy colors to help you understand each term individually.

# patch_artist=True draws the boxes as filled patches so they can be colored
bp = plt.boxplot(list_Ex, patch_artist=True)
for box in bp['boxes']:
    box.set(color='orange', linewidth=2)
for whisker in bp['whiskers']:
    whisker.set(color='red', linewidth=2)
for cap in bp['caps']:
    cap.set(color='green', linewidth=2)
for median in bp['medians']:
    median.set(color='blue', linewidth=2)
for flier in bp['fliers']:
    flier.set(marker='o', color='black', alpha=0.5)
plt.show()

Graph 18 – Box Whisker Chart

Box-plot practice

Following is one more piece of code, which draws box plots for three samples taken from Gaussian distributions with different spreads

from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot

seed(1)
# random numbers drawn from a Gaussian distribution
x = [randn(1000), 5 * randn(1000), 10 * randn(1000)]
# create box and whisker plot
pyplot.boxplot(x)
# show the plot
pyplot.show()

Graph 19 – A Box-Whisker Plot

Scatter plot

A scatter plot is an easy-to-make but interesting visualization which gives a clear picture of how the data is distributed.

Let’s take the example of 10 innings played by Sachin, Dhoni, and Kohli and see how their scores are distributed. The code is fairly easy to understand:

import matplotlib.pyplot as plt

sachin = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
kohli = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
dhoni = [23, 45, 67, 76, 65, 45, 100, 12, 34, 65]
run = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# One scatter call per player, each in its own color
plt.scatter(run, sachin, color='red')
plt.scatter(run, kohli, color='green')
plt.scatter(run, dhoni, color='blue')
plt.xlabel('Score Range')
plt.ylabel('Run scored')
plt.show()

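You can also add a legend to the plot by using the following commands. Call them before plt.show(); the names are matched to the scatter calls in the order in which those calls were made.

legend = ['sachin', 'kohli', 'dhoni']
plt.legend(legend)

The plot will now look like this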

Graph 20 – A scatter plot

Below is one more scatter plot, where you give each point a weight and the area of its circle is set on the basis of that weight

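import random
import numpy as np
import matplotlib.pyplot as plt

N = 40  # number of points
random.seed(123)
np.random.seed(123)
x = random.sample(range(1,100),N)
y = random.sample(range(1,100),N)
colors = np.random.rand(N)
# s is the marker area in points squared
area = (30*np.random.rand(N))**2
plt.scatter(x,y,s=area,c=colors,alpha=0.5)
plt.show()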


Graph 21 – A scatter plot with area of bubble denoting the volume

Pie Chart

Create a pie chart for the number of centuries scored by Sachin, Dhoni, Dravid, and Kohli.

labels = 'Sachin', 'Dhoni', 'Kohli', 'Dravid'
size = [100, 25, 70, 50]
colors = ['pink', 'blue', 'red', 'orange']
# Offset of each slice from the center; only the first slice is pulled out
explode = (0.1, 0, 0, 0)
plt.pie(size, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
# Equal axis scaling keeps the pie circular
plt.axis('equal')
plt.show()

explode is used to set the first slice apart from the rest of the pie chart. Everything else in the code is self-explanatory. Below is the plot

Graph 22 – Pie chart showing performance of cricketers
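The percentage printed on each slice comes from the autopct format string, which is applied to each value's share of the total. The sketch below reproduces that arithmetic with NumPy alone, as an illustration of what the labels should read for the sizes used above.

```python
import numpy as np

labels = ['Sachin', 'Dhoni', 'Kohli', 'Dravid']
size = np.array([100, 25, 70, 50])

# Share of each slice in percent of the total (245)
pct = 100 * size / size.sum()

# Same format string as autopct='%1.1f%%'
texts = ['%1.1f%%' % p for p in pct]
for name, text in zip(labels, texts):
    print(name, text)
# Sachin 40.8%
# Dhoni 10.2%
# Kohli 28.6%
# Dravid 20.4%
```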

Some cool Visualizations

Create a stacked chart to demonstrate the number of people voting for either Python or Java in 5 countries, namely, India, USA, England, S.A., Nepal

import numpy as np
import matplotlib.pyplot as plt
Python = (20, 35, 30, 35, 27)
Java = (25, 32, 34, 20, 25)
ind = np.arange(len(Python))  # x positions of the 5 countries
width = 0.35       # the width of the bars: can also be len(x) sequence
p1 = plt.bar(ind, Python, width)
# bottom=Python stacks the Java bars on top of the Python bars
p2 = plt.bar(ind, Java, width, bottom=Python)
plt.ylabel('Votes')
plt.title('Number of people using Python or Java')
plt.xticks(ind, ('India', 'USA', 'England', 'S.A.', 'Nepal'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('Python', 'Java'))
plt.show()


xticks sets the tick positions and labels on the x-axis, and yticks does the same for the y-axis.

Graph 23 – Stacked Bar graph

A cool area graph

import numpy as np
import matplotlib.pyplot as plt
# create data
x=range(1,15)
y=[1,4,6,8,4,5,3,2,4,1,5,6,8,7]

# Change the color and its transparency
plt.fill_between(x, y, color="red", alpha=0.4)
plt.show()
# Same, but add a stronger line on top (edge)
plt.fill_between(x, y, color="red", alpha=0.2)
plt.plot(x, y, color="red", alpha=0.6)
plt.show()

The parameter alpha controls the opacity of the color, from 0 (fully transparent) to 1 (fully opaque). The first plot uses 0.4 for the fill. In the second plot, 0.6 is given to the edge line and 0.2 is given to the fill.

Graph 24 – An area graph

One of the most important things is to understand when to use which graph, and to keep a list of all the graphs you know.

There are four types of information which we can display using any plot:
1. Distribution
2. Comparison
3. Relationship
4. Composition


1. Distribution shows how the data is spread across your data set.
For example: how many people are from which state of the country?

a. Histogram – If you have few data points
b. Line Histogram – When you have a lot of data points
c. Scatter plot – When you have to show the distribution of 2-3 variables

2. Comparison – When you have to compare something over 2 or more categories

a. Variable width chart – When you have to compare two variables per item
b. Tables with embedded charts – When there are many categories, basically a matrix of charts
c. Horizontal or Vertical Histogram – When there are few categories in a data set
d. If you want to compare something over time
    i. Line Chart
    ii. Vertical Bar Chart
    iii. Many categories line chart

3. Relationship Charts – When you want to see the relationship between two or more variables

a. Scatter Plot
b. Scatter plot bubble chart

4. Composition Charts – When you have to show a percentage or composition of variables

a. Pie Chart – Very basic plot when there are 3-6 categories
b. Stacked 100% bar chart with sub component – When you have to show components of components
c. Stacked 100% bar chart – When you have to look into the contribution of each component.
d. Stacked area chart – When relative and absolute difference matters
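As an illustration of the composition charts listed above, here is a minimal sketch of a stacked 100% bar chart. It reuses the Python and Java vote counts from the stacked bar graph earlier in this article; the variable names and the y-axis label are assumptions made for this example. Each bar is rescaled so that the two components of a country add up to 100.

```python
import numpy as np
import matplotlib.pyplot as plt

countries = ('India', 'USA', 'England', 'S.A.', 'Nepal')
python_votes = np.array([20, 35, 30, 35, 27])
java_votes = np.array([25, 32, 34, 20, 25])

# Convert absolute votes to each language's share per country
totals = python_votes + java_votes
python_pct = 100 * python_votes / totals
java_pct = 100 * java_votes / totals

ind = np.arange(len(countries))
fig, ax = plt.subplots()
ax.bar(ind, python_pct, 0.35, label='Python')
# Java shares are stacked on top of the Python shares
ax.bar(ind, java_pct, 0.35, bottom=python_pct, label='Java')
ax.set_xticks(ind)
ax.set_xticklabels(countries)
ax.set_ylabel('Share of votes (%)')
ax.legend()
```

Call plt.show() to display it. Every bar now has the same height, so the chart shows only the contribution of each component and hides the differences in total votes between countries.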



About TheDataMonk

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)
