Visualization in Python Part 2
We have already plotted some basic graphs. Now it's time to plot a few more.
Line Histogram
Now let’s create a line histogram with some random data
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
noise = np.random.normal(0, 1, (1000, ))
density = stats.gaussian_kde(noise)
n, x, _ = plt.hist(noise, bins=np.linspace(-3, 3, 50), histtype='step', density=True)
plt.plot(x, density(x))
plt.show()
Graph 13 – A line histogram
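As a quick check on what density=True does in the call above, the sketch below recomputes the same bins with np.histogram (used here instead of plt.hist so that nothing has to be drawn) and confirms that the bar areas add up to 1. The seed and the variable names are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
noise = rng.normal(0, 1, 1000)

edges = np.linspace(-3, 3, 50)  # 50 edges give 49 bins
heights, edges = np.histogram(noise, bins=edges, density=True)

# With density=True each height is count / (points_in_range * bin_width),
# so the areas of all bars add up to 1.
total_area = float(np.sum(heights * np.diff(edges)))
print(round(total_area, 6))
```

This is why the KDE curve and the step histogram can share one y-axis: both are scaled as probability densities.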
Variable Width histogram
A variable width histogram has columns of different widths.
Let's create one with our dataset
import numpy as np
import matplotlib.pyplot as plt
freqs = np.array([2, 7, 21, 15, 12])
bins = np.array([65, 75, 80, 90, 105, 110])
widths = bins[1:] - bins[:-1]
# np.float was removed from NumPy; the built-in float does the same job
heights = freqs.astype(float) / widths
plt.fill_between(bins.repeat(2)[1:-1], heights.repeat(2), facecolor='orange')
plt.show()
Graph 14 – A variable width histogram
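The reason for dividing the frequencies by the bin widths can be checked numerically. The sketch below reuses the freqs and bins values from the code above and shows that, once the heights are scaled this way, the area of each column equals its frequency. The areas variable is added only for this check.

```python
import numpy as np

freqs = np.array([2, 7, 21, 15, 12])
bins = np.array([65, 75, 80, 90, 105, 110])

widths = np.diff(bins)    # [10, 5, 10, 15, 5]
heights = freqs / widths  # frequency per unit of x

# Area of each column = height * width, which recovers the frequency
areas = heights * widths
print(heights)
print(areas)
```

Without this scaling a wide bin would look more important than a narrow one simply because it covers more of the x-axis.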
One more example below
import numpy as np
import matplotlib.pyplot as plt
x = np.sort(np.random.rand(6))
y = np.random.rand(5)
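# align='edge' makes each bar start at its left bin edge instead of being centred on it
plt.bar(x[:-1], y, width=x[1:] - x[:-1], align='edge')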
plt.show()
Graph 15 – Variable width histogram
Area Chart
Below is what an area chart looks like:
Let’s create a basic area chart with some dummy data
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Data
x=range(1,6)
y=[ [1,4,6,8,9], [2,2,7,10,12], [2,8,5,10,6] ]
# Plot
plt.stackplot(x, y, labels=['A', 'B', 'C'])
plt.legend(loc='upper left')
plt.show()
Graph 16 – A basic area chart
You already know how to add x-labels, y-labels, title, etc.
Go ahead and add these in the graph above
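One possible way to do this is sketched below. It rebuilds the same stacked area chart on an explicit Axes object and adds the labels and title. The label texts are placeholders chosen for the example, not something prescribed by the data.

```python
import matplotlib.pyplot as plt

x = range(1, 6)
y = [[1, 4, 6, 8, 9], [2, 2, 7, 10, 12], [2, 8, 5, 10, 6]]

fig, ax = plt.subplots()
ax.stackplot(x, y, labels=['A', 'B', 'C'])
ax.set_xlabel('Time period')        # placeholder label
ax.set_ylabel('Value')              # placeholder label
ax.set_title('A basic area chart')
ax.legend(loc='upper left')
```

Call plt.show() afterwards to display the figure. The plt.xlabel, plt.ylabel and plt.title functions would work equally well with the original code.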
Box and Whisker Plot
A box and whisker plot, or boxplot for short, is generally used to summarize the distribution of a data sample.
The x-axis is used to represent the data sample, where multiple boxplots can be drawn side by side on the x-axis if desired.
The box plot is one of the most common plots for this purpose.
The ends of the box represent the 1st and 3rd quartiles of the data.
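import random
import matplotlib.pyplot as plt
random.seed(123)  # fix the seed so the samples are reproducible
a = random.sample(range(1, 100), 20)
b = random.sample(range(1, 100), 20)
c = random.sample(range(1, 100), 20)
d = random.sample(range(1, 100), 20)
list_Ex = [a, b, c, d]
plt.boxplot(list_Ex)
plt.show()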
Graph 17 – A basic Box-Whisker graph
Now we will try to make the graph look better by adding color to the plot. The box plot shows the median, the 25th and 75th percentiles, and outliers. Try giving a different color to each of these elements to make the plot more appealing.
When you plot a boxplot, you can style the following 5 elements of the plot:-
a. box – To modify the color, line width, etc. of the central box
b. whisker – To modify the color and line width of the line which connects the box to the cap
c. cap – The horizontal line at the end of each whisker
d. median – The line inside the box that marks the median
e. flier – The points plotted beyond the caps, i.e. the outliers
The box spans from the 1st quartile (Q1) to the 3rd quartile (Q3), and this distance is called the IQR i.e. the Inter Quartile Range. The lower fence is at Q1 - 1.5*IQR and the upper fence is at Q3 + 1.5*IQR. Any point which falls below the lower fence or above the upper fence is called a flier or outlier
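The fence rule can be worked through by hand on a small sample. The sketch below uses a made-up array of ten values (an assumption for illustration only) and np.percentile to compute the quartiles, the IQR, both fences and the resulting fliers.

```python
import numpy as np

# Made-up sample, chosen so that one value is clearly an outlier
data = np.array([2, 4, 5, 7, 8, 9, 10, 12, 13, 40])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the two fences counts as a flier
fliers = data[(data < lower_fence) | (data > upper_fence)]
print(q1, q3, iqr)               # 5.5 11.5 6.0
print(lower_fence, upper_fence)  # -3.5 20.5
print(fliers)                    # [40]
```

Here only the value 40 lies beyond the upper fence of 20.5, so it is the single point that a box plot of this sample would draw as a flier.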
Following is the code with some fancy colors to help you understand each term individually.
bp = plt.boxplot(list_Ex, patch_artist=True)
for box in bp['boxes']:
    box.set(color='orange', linewidth=2)
for whisker in bp['whiskers']:
    whisker.set(color='red', linewidth=2)
for cap in bp['caps']:
    cap.set(color='green', linewidth=2)
for median in bp['medians']:
    median.set(color='blue', linewidth=2)
for flier in bp['fliers']:
    flier.set(marker='o', color='black', alpha=0.5)
plt.show()
Graph 18 – Box-plot practice
Following is one more example, which draws box plots for three samples taken from Gaussian distributions with different spreads
from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot
seed(1)
# random numbers drawn from a Gaussian distribution
x = [randn(1000), 5 * randn(1000), 10 * randn(1000)]
# create box and whisker plot
pyplot.boxplot(x)
# show the plot
pyplot.show()
Graph 19 – A Box-Whisker Plot
Scatter plot
A scatter plot is an easy to make but interesting visualization which gives a clear picture of how the data is distributed.
Let's take the example of 10 innings played by Sachin, Dhoni, and Kohli and see how their scores are distributed. The code is fairly easy to understand
sachin = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
kohli = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
dhoni = [23,45,67,76,65,45,100,12,34,65]
run = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
plt.scatter(run, sachin, color='red')
plt.scatter(run, kohli, color='green')
plt.scatter(run, dhoni, color='blue')
plt.xlabel('Score Range')
plt.ylabel('Run scored')
plt.show()
You can also add a legend before calling plt.show()
legend = ['sachin', 'kohli', 'dhoni']
plt.legend(legend)
The plot will now look like this
Graph 20 – A scatter plot
Below is one more scatter plot where you pass an area for each point, and the size of each circle is based on that area
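import random
import numpy as np
import matplotlib.pyplot as plt
random.seed(123)     # seeds random.sample
np.random.seed(123)  # seeds np.random.rand
N = 40  # number of points
x = random.sample(range(1, 100), N)
y = random.sample(range(1, 100), N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()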
Graph 21 – A scatter plot with area of bubble denoting the volume
Pie Chart
Create a pie chart for the number of centuries scored by Sachin, Dhoni, Dravid,
and Kohli.
labels = 'Sachin', 'Dhoni', 'Kohli', 'Dravid'
size = [100, 25, 70, 50]
colors = ['pink', 'blue', 'red', 'orange']
explode = (0.1, 0, 0, 0)
plt.pie(size, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()
explode is used to set apart the first slice of the pie chart (here the 0.1 offset applies to Sachin). Everything else in the code is self-explanatory. Below is the plot
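The percentages printed on the slices come from the autopct format string applied to each value's share of the total. As a rough sketch of that arithmetic, the code below computes the shares by hand with the same '%1.1f%%' format. The formatted list is a name introduced only for this example.

```python
labels = ['Sachin', 'Dhoni', 'Kohli', 'Dravid']
size = [100, 25, 70, 50]

total = sum(size)  # 245
percentages = [100 * s / total for s in size]

# Same format string as the autopct argument in the pie chart code
formatted = ['%1.1f%%' % p for p in percentages]
for label, text in zip(labels, formatted):
    print(label, text)
# Sachin 40.8%
# Dhoni 10.2%
# Kohli 28.6%
# Dravid 20.4%
```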
Graph 22 – Pie chart showing performance of cricketers
Some cool Visualizations
Create a stacked chart to demonstrate the number of people voting for either Python or Java in 5 countries, namely, India, USA, England, S.A., Nepal
import numpy as np
import matplotlib.pyplot as plt
Python = (20, 35, 30, 35, 27)
Java = (25, 32, 34, 20, 25)
ind = np.arange(5)  # the x locations of the 5 countries
width = 0.35  # the width of the bars: can also be len(x) sequence
p1 = plt.bar(ind, Python, width)
p2 = plt.bar(ind, Java, width, bottom=Python)
plt.ylabel('Votes')
plt.title('Number of people using Python or Java')
plt.xticks(ind, ('India', 'USA', 'England', 'S.A.', 'Nepal'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('Python', 'Java'))
plt.show()
xticks sets the tick positions and labels on the x-axis, and yticks does the same for the y-axis.
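The stacking itself comes from the bottom argument: each Java bar starts where the Python bar of the same country ends. The sketch below works out those start and end heights from the vote counts used above. The names python_votes, java_votes, java_bottom and java_top are introduced only for this illustration.

```python
python_votes = (20, 35, 30, 35, 27)
java_votes = (25, 32, 34, 20, 25)

# The Java bars start at the top of the Python bars (bottom=Python)
java_bottom = list(python_votes)
# The top of each stacked bar is the sum of both vote counts
java_top = [p + j for p, j in zip(python_votes, java_votes)]

print(java_top)       # [45, 67, 64, 55, 52]
print(max(java_top))  # 67
```

The tallest stack reaches 67, which is why y-axis ticks running from 0 to 80 are enough for this chart.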
Graph 23 – Stacked Bar graph
A cool area graph
import numpy as np
import matplotlib.pyplot as plt
# create data
x=range(1,15)
y=[1,4,6,8,4,5,3,2,4,1,5,6,8,7]
# Change the color and its transparency
plt.fill_between(x, y, color="red", alpha=0.4)
plt.show()
# Same, but add a stronger line on top (edge)
plt.fill_between(x, y, color="red", alpha=0.2)
plt.plot(x, y, color="red", alpha=0.6)
plt.show()
The parameter alpha controls the opacity of the color, from 0 (fully transparent) to 1 (fully opaque). The first plot uses 0.4 for the fill. In the second plot, 0.2 is given to the fill and 0.6 is given to the edge line
Graph 24 – An area graph
One of the most important
There are four types of information which we can display using any plot:-
1. Distribution
2. Comparison
3. Relationship
4. Composition
1. Distribution – For example, how many people are from which state of the country?
a. Histogram – If you have few data points
b. Line Histogram – When you have a lot of data points
c. Scatter plot – When you have to show the distribution of 2-3 variables
2. Comparison – When you have to compare something over 2 or more categories
a. Variable width chart – When you have to compare two variables per item
b. Tables with embedded charts – When there are many categories, basically a matrix of charts
c. Horizontal or Vertical Histogram – When there are few categories in a data set
d. If you want to compare something over time
i. Line
iii. Many categories line chart
3. Relationship Charts – When you want to see the relationship between two or more variables then you have to use relationship charts
a. Scatter Plot
b. Scatter plot bubble chart
4. Composition Charts – When you have to show a percentage or composition of variables.
a. Pie Chart – Very basic plot when there are 3-6 categories
b. Stacked 100% bar chart with sub component – When you have to show components of components
c. Stacked 100% bar chart – When you have to look into the contribution of each component.
d. Stacked area chart – When relative and absolute difference matters
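To make the stacked 100% bar chart idea concrete, the sketch below converts the Python and Java vote counts from the stacked bar example into percentage shares per country, so that every stack adds up to 100. The variable names are assumptions introduced for this example.

```python
countries = ('India', 'USA', 'England', 'S.A.', 'Nepal')
python_votes = (20, 35, 30, 35, 27)
java_votes = (25, 32, 34, 20, 25)

totals = [p + j for p, j in zip(python_votes, java_votes)]
python_share = [100 * p / t for p, t in zip(python_votes, totals)]
java_share = [100 * j / t for j, t in zip(java_votes, totals)]

for country, p, j in zip(countries, python_share, java_share):
    print('%s: Python %.1f%%, Java %.1f%%' % (country, p, j))
```

Passing python_share and java_share to plt.bar, with bottom=python_share for the second call, would draw bars of equal total height, so only the contribution of each component is compared.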