## Visualization in Python Part 2

We have already plotted some basic graphs. Now it’s time to plot some more.

**Line Histogram**

Now let’s create a line histogram with some random data:

import matplotlib.pyplot as plt

import numpy as np

import scipy.stats as stats

noise = np.random.normal(0, 1, (1000, ))

density = stats.gaussian_kde(noise)

n, x, _ = plt.hist(noise, bins=np.linspace(-3, 3, 50), histtype='step', density=True)

plt.plot(x, density(x))

plt.show()

**Graph 13 –** A line histogram
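The density=True argument in the call above rescales the bar heights so that the histogram encloses a total area of 1, which is what lets it share an axis with the density curve. A minimal sketch of that normalization follows, using np.histogram in place of plt.hist so that no plot window is needed; the seeded generator is an assumption added only to make the run repeatable.

```python
import numpy as np

# Seeded generator: an assumption for reproducibility, not part of the original example
rng = np.random.default_rng(0)
noise = rng.normal(0, 1, 1000)

# Same bin edges as the line histogram above
edges = np.linspace(-3, 3, 50)

# density=True divides each count by (number of counted points * bin width)
heights, edges = np.histogram(noise, bins=edges, density=True)

# Summing height * width over all bins gives the total area under the histogram
area = float(np.sum(heights * np.diff(edges)))
print(round(area, 6))
```

The printed area is 1.0 (up to floating-point rounding), regardless of how many points were drawn.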

**Variable Width Histogram**

In a variable-width histogram, each column can have a different width.

Let’s create one with our dataset:

import numpy as np

import matplotlib.pyplot as plt

freqs = np.array([2, 7, 21, 15, 12])

bins = np.array([65, 75, 80, 90, 105, 110])

widths = bins[1:] - bins[:-1]

heights = freqs.astype(float) / widths

plt.fill_between(bins.repeat(2)[1:-1], heights.repeat(2), facecolor='orange')

plt.show()

**Graph 14 –** A variable width histogram
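The code for this histogram divides each frequency by its bin width. The reason is that, with unequal bins, it is the area of a column (height times width) that should represent the frequency, not the height alone. The short sketch below reuses the same freqs and bins arrays to show the arithmetic; np.diff is used here as an equivalent of bins[1:] - bins[:-1].

```python
import numpy as np

freqs = np.array([2, 7, 21, 15, 12])
bins = np.array([65, 75, 80, 90, 105, 110])

widths = np.diff(bins)        # bin widths: 10, 5, 10, 15, 5
heights = freqs / widths      # frequency per unit of width
areas = heights * widths      # multiplying back recovers the frequencies

for left, right, h in zip(bins[:-1], bins[1:], heights):
    print(f"{left}-{right}: height {h:.2f}")
```

The 75-80 bin holds only 7 observations but is drawn taller than the 90-105 bin with 15, because it is three times narrower.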

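One more example is below. Here align='edge' places the left side of each bar at its x value, so the bars sit next to each other instead of overlapping:

import numpy as np
import matplotlib.pyplot as plt
x = np.sort(np.random.rand(6))
y = np.random.rand(5)
plt.bar(x[:-1], y, width=x[1:] - x[:-1], align='edge')
plt.show()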

**Graph 15 –** Variable width histogram

**Area Chart**

Let’s create a basic area chart with some dummy data:

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

# Data

x=range(1,6)

y=[ [1,4,6,8,9], [2,2,7,10,12], [2,8,5,10,6] ]

# Plot

plt.stackplot(x, y, labels=['A', 'B', 'C'])

plt.legend(loc='upper left')

plt.show()

**Graph 16 –** A basic area chart

You already know how to add x-labels, y-labels, title, etc.

Go ahead and add these in the graph above
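One possible way to do this is sketched below. It uses the same dummy data, but draws on an explicit Axes object (fig, ax) rather than the plt functions; the label and title strings are placeholders made up for illustration.

```python
import matplotlib.pyplot as plt

# Same dummy data as the basic area chart
x = range(1, 6)
y = [[1, 4, 6, 8, 9], [2, 2, 7, 10, 12], [2, 8, 5, 10, 6]]

fig, ax = plt.subplots()
ax.stackplot(x, y, labels=['A', 'B', 'C'])

# Placeholder texts: replace them with whatever describes your data
ax.set_xlabel('Period')
ax.set_ylabel('Total value')
ax.set_title('A basic area chart')
ax.legend(loc='upper left')
```

Call plt.show() afterwards to display the figure.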

**Box and Whisker Plot**

A box and whisker plot, or boxplot for short, is generally used to summarize the distribution of a data sample.

The x-axis is used to represent the data sample, where multiple boxplots can be drawn side by side on the x-axis if desired.

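The box plot is one of the most common ways to summarize a distribution. The ends of the box represent the first and third quartiles.

import random

import matplotlib.pyplot as plt

random.seed(123)

a = random.sample(range(1, 100), 20)

b = random.sample(range(1, 100), 20)

c = random.sample(range(1, 100), 20)

d = random.sample(range(1, 100), 20)

list_Ex = [a, b, c, d]

plt.boxplot(list_Ex)

plt.show()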

**Graph 17 –** A basic Box-Whisker graph

Now we will make the graph look better by adding color to the plot. The box plot shows the median, the 25th and 75th percentiles, and the outliers. Try giving each of these elements a different color to make the plot more appealing.

When you draw a boxplot, you can style the following 5 components of the plot:

**a. box –** To modify the color, line width, etc. of the central box

**b. whisker –** To modify the color and line width of the line that connects the box to the cap

**c. cap –** The horizontal line at the end of each whisker

**d. median –** The line inside the box that marks the median

**e. flier –** The points that fall beyond the whiskers, i.e. the outliers

The box spans the 1st and 3rd quartiles (Q1 and Q3). The distance between them is called the IQR, i.e. the Inter Quartile Range. The lower fence is at Q1 – 1.5 × IQR and the upper fence is at Q3 + 1.5 × IQR. Any point that falls below the lower fence or above the upper fence is called a flier or outlier.
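To make the fence rule concrete, here is a small worked example. The sample values are made up for illustration, and the quartiles are computed with np.percentile using its default linear interpolation; other quartile conventions can give slightly different numbers.

```python
import numpy as np

# Made-up sample with one obviously extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, q3 = np.percentile(data, [25, 75])   # first and third quartiles
iqr = q3 - q1                            # inter quartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are the fliers (outliers)
fliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, q3, iqr)
print(lower_fence, upper_fence)
print(fliers)
```

Here Q1 is 3.25 and Q3 is 7.75, so the IQR is 4.5 and the fences sit at -3.5 and 14.5. Only the value 30 lies outside them.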

The following code uses some fancy colors to help you understand each term individually:

bp = plt.boxplot(list_Ex, patch_artist=True)

for box in bp['boxes']: box.set(color='orange', linewidth=2)

for whisker in bp['whiskers']: whisker.set(color='red', linewidth=2)

for cap in bp['caps']: cap.set(color='green', linewidth=2)

for median in bp['medians']: median.set(color='blue', linewidth=2)

for flier in bp['fliers']: flier.set(marker='o', color='black', alpha=0.5)

plt.show()

**Graph 18 –** A box plot with colored boxes, whiskers, caps, medians, and fliers

**Box-plot practice**

The following code draws box plots for three samples drawn from Gaussian distributions with different spreads:

from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot
seed(1)
# random numbers drawn from a Gaussian distribution
x = [randn(1000), 5 * randn(1000), 10 * randn(1000)]
# create box and whisker plot
pyplot.boxplot(x)
# show the plot
pyplot.show()

**Graph 19 –** A Box-Whisker Plot

**Scatter Plot**

A scatter plot is an easy-to-make but interesting visualization that gives a clear picture of how the data is distributed.

Let’s take the example of 10 innings played by Sachin, Dhoni, and Kohli and see how their scores are distributed. The code is fairly easy to understand:

sachin = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]

kohli = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]

dhoni = [23,45,67,76,65,45,100,12,34,65]

run = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

plt.scatter(run, sachin, color='red')

plt.scatter(run, kohli, color='green')

plt.scatter(run, dhoni, color='blue')

plt.xlabel('Score Range')

plt.ylabel('Run scored')

plt.show()

You can also add a legend by placing these two lines before plt.show():

legend = ['sachin', 'kohli', 'dhoni']

plt.legend(legend)

The plot will now look like this:

**Graph 20 –** A scatter plot
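Passing a list of names to plt.legend relies on the names being in the same order as the scatter calls. An alternative, sketched below, is to attach a label to each scatter call so that every series carries its own name. This is one possible variant rather than the only way to do it, and it uses an explicit Axes object (fig, ax) instead of the plt functions.

```python
import matplotlib.pyplot as plt

sachin = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
kohli = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
dhoni = [23, 45, 67, 76, 65, 45, 100, 12, 34, 65]
run = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

fig, ax = plt.subplots()

# Each series gets its label at the point where it is drawn
series = [('sachin', sachin, 'red'), ('kohli', kohli, 'green'), ('dhoni', dhoni, 'blue')]
for name, scores, color in series:
    ax.scatter(run, scores, color=color, label=name)

ax.set_xlabel('Score Range')
ax.set_ylabel('Run scored')
legend = ax.legend()   # picks up the labels set above
```

Call plt.show() afterwards to display the figure.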

Below is one more scatter plot, where each point is given a weighted area that determines the size of its circle:

import random
import numpy as np
import matplotlib.pyplot as plt
random.seed(123)
np.random.seed(123)
N = 40
x = random.sample(range(1, 100), N)
y = random.sample(range(1, 100), N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N)) ** 2
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

**Graph 21 –** A scatter plot with area of bubble denoting the volume

**Pie Chart**

Create a pie chart for the number of centuries scored by Sachin, Dhoni, Dravid,
and Kohli.

labels = 'Sachin', 'Dhoni', 'Kohli', 'Dravid'
size = [100, 25, 70, 50]
colors = ['pink', 'blue', 'red', 'orange']
explode = (0.1, 0, 0, 0)
plt.pie(size, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()

explode is used to set the first slice apart from the rest of the pie chart. Everything else in the code is self-explanatory. The plot is shown below.

**Graph 22 –** Pie chart showing performance of cricketers
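The percentages printed on the slices come from the autopct format string, which is applied to each value's share of the total. The sketch below reproduces that arithmetic in plain Python for the same numbers; it illustrates the calculation and does not call Matplotlib itself.

```python
labels = ['Sachin', 'Dhoni', 'Kohli', 'Dravid']
size = [100, 25, 70, 50]

total = sum(size)   # 245 centuries in all

# Apply the same '%1.1f%%' format string that was passed as autopct
shares = {name: '%1.1f%%' % (100 * s / total) for name, s in zip(labels, size)}
print(shares)
```

Sachin’s 100 out of 245 is formatted as 40.8%, Dhoni’s share as 10.2%, Kohli’s as 28.6%, and Dravid’s as 20.4%.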

**Some Cool Visualizations**

Create a stacked chart to show the number of people voting for either Python or Java in 5 countries, namely India, USA, England, S.A., and Nepal:

import numpy as np

import matplotlib.pyplot as plt

Python = (20, 35, 30, 35, 27)

Java = (25, 32, 34, 20, 25)

ind = np.arange(5) # the x locations of the 5 countries

width = 0.35 # the width of the bars: can also be len(x) sequence

p1 = plt.bar(ind, Python, width)

p2 = plt.bar(ind, Java, width, bottom=Python)

plt.ylabel('Votes')

plt.title('Number of people using Python or Java')

plt.xticks(ind, ('India', 'USA', 'England', 'S.A.', 'Nepal'))

plt.yticks(np.arange(0, 81, 10))

plt.legend((p1[0], p2[0]), ('Python', 'Java'))

plt.show()

xticks sets the tick positions and labels on the x-axis, and yticks does the same for the y-axis.

**Graph 23 –** Stacked Bar graph

A cool area graph:

import numpy as np
import matplotlib.pyplot as plt
# create data
x = range(1, 15)
y = [1, 4, 6, 8, 4, 5, 3, 2, 4, 1, 5, 6, 8, 7]
# Change the color and its transparency
plt.fill_between(x, y, color="red", alpha=0.4)
plt.show()
# Same, but add a stronger line on top (edge)
plt.fill_between(x, y, color="red", alpha=0.2)
plt.plot(x, y, color="red", alpha=0.6)
plt.show()

The parameter alpha controls the opacity of the color, from 0 (fully transparent) to 1 (fully opaque). The first plot fills the area at 0.4. The second plot uses 0.2 for the fill and 0.6 for the edge line.

**Graph 24 –** An area graph

One of the most important steps in visualization is choosing the right plot for your data.

There are four types of information that we can display using a plot:

**1. Distribution**

**2. Comparison**

**3. Relationship**

**4. Composition**

**1. Distribution –** Shows how the data is spread across your data set. For example: how many people are from each state of the country?

**a. Histogram –** When you have few data points

**b. Line Histogram –** When you have a lot of data points

**c. Scatter plot –** When you have to show the distribution of 2-3 variables

**2. Comparison –** When you have to compare something over 2 or more categories

**a. Variable width chart –** When you have to compare two variables per item

**b. Tables with embedded charts –** When there are many categories, basically a matrix of charts

**c. Horizontal or Vertical Histogram –** When there are few categories in a data set

**d. If you want to compare something over time**

i. Line chart

ii. Vertical bar chart

iii. Line chart with many categories

**3. Relationship Charts –** When you want to see the relationship between two or more variables

**a. Scatter plot**

**b. Scatter plot bubble chart**

**4. Composition Charts –** When you have to show a percentage or composition of variables

**a. Pie Chart –** Very basic plot when there are 3-6 categories

**b. Stacked 100% bar chart with sub component –** When you have to show components of components

**c. Stacked 100% bar chart –** When you have to look into the contribution of each component

**d. Stacked area chart –** When relative and absolute difference matters
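The stacked 100% bar chart from the list above was not drawn earlier, so here is a minimal sketch of one. It reuses the Python and Java vote counts from the stacked bar example and converts each country’s votes to percentages before stacking; the variable names and the axis label are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

countries = ['India', 'USA', 'England', 'S.A.', 'Nepal']
python_votes = np.array([20, 35, 30, 35, 27])
java_votes = np.array([25, 32, 34, 20, 25])

# Convert the raw votes to each language's share within a country
total = python_votes + java_votes
python_pct = 100 * python_votes / total
java_pct = 100 * java_votes / total

fig, ax = plt.subplots()
ax.bar(countries, python_pct, width=0.35, label='Python')
# bottom= stacks the Java share on top of the Python share
ax.bar(countries, java_pct, width=0.35, bottom=python_pct, label='Java')

ax.set_ylabel('Share of votes (%)')
ax.set_ylim(0, 100)
ax.legend()
```

Every bar now reaches 100, so the chart shows only each language’s contribution within a country, not the absolute number of votes. Call plt.show() afterwards to display the figure.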
