Supply Chain Analytics in Python

Let’s take a case study of Supply Chain optimization.

A restaurant serves Mega Pizza (40”). It has 1 oven, 3 bakers, and 2 packers. The table below shows the time each resource needs per pizza, the number of days each resource works, and the profit per pizza:

Resource  Number     Pizza A  Pizza B  Pizza C  Working Days
Oven      1 Oven     1 Day    0.5 Day  1 Day    30 Days
Baker     3 Bakers   1 Day    2 Days   2 Days   30 Days
Packer    2 Packers  1 Day    1 Day    1 Day    20 Days
Profit               $30      $40      $50

Now you have to maximize the profit using the PuLP library. Use decision variables, an objective function, and constraints.

How many pizzas of each type should we make in 30 days?

First, let’s look at the code in Python

from pulp import *

# Avoid spaces in the problem name; PuLP warns about them
model = LpProblem("Maximize_Pizza_Profit", LpMaximize)

# Declare decision variables (number of pizzas of each type)
A = LpVariable('A', lowBound=0, upBound=None, cat='Integer')
B = LpVariable('B', lowBound=0, upBound=None, cat='Integer')
C = LpVariable('C', lowBound=0, upBound=None, cat='Integer')

# Define objective function (total profit)
model += 30*A + 40*B + 50*C

# Define constraints
# For Oven
model += 1*A + 0.5*B + 1*C <= 30
# For Baker
model += 1*A + 2*B + 2*C <= 90
# For Packer
model += 1*A + 1*B + 1*C <= 40

# Solve model
model.solve()
print("Produce {} Pizza A".format(A.varValue))
print("Produce {} Pizza B".format(B.varValue))
print("Produce {} Pizza C".format(C.varValue))


Now let’s understand the code

from pulp import *
Here you are importing everything from the PuLP package

model = LpProblem("Maximize_Pizza_Profit", LpMaximize)
Here you are defining the model using LpProblem. LpMaximize tells the solver to maximize the objective, i.e. the profit. If you want the minimum value instead, use LpMinimize, for example when the goal is to reduce wastage.

A = LpVariable('A', lowBound=0, upBound=None, cat='Integer')
Here we define each variable using LpVariable. lowBound is the lowest possible value of the variable.
The number of pizzas cannot be negative, so we have given the value 0. upBound is the maximum value of the variable.
None means there is no upper limit.
cat is the category of the variable. It can be 'Continuous', 'Integer', or 'Binary'.

model += 30*A + 40*B + 50*C
This is the objective function, i.e. the total profit at $30, $40, and $50 per pizza. An expression added to the model without a comparison operator is treated as the objective.

model += 1*A + 0.5*B + 1*C <= 30
This is the constraint for the oven. A requires 1 day, B requires 0.5 day, and C requires 1 day. The limit is 30 because there is one oven which works for 30 days.

model += 1*A + 2*B + 2*C <= 90
Similar to the above, the bakers need 1, 2, and 2 days for A, B, and C respectively. There are 3 bakers who each work 30 days, so the limit is 30*3 = 90.

#For Packer
model += 1*A + 1*B + 1*C <= 40

A packer takes 1, 1, and 1 day for pizzas A, B, and C. There are 2 packers who each work 20 days, so the limit is 20*2 = 40.
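
The problem is small enough to cross-check without a solver. The sketch below is not PuLP; it is a plain brute-force search over every whole-number plan, using the same profits and constraints as above. The names `feasible`, `best_plan`, and `best_profit` are made up for this illustration.

```python
# Brute-force check of the pizza problem (no solver needed).
PROFIT = {"A": 30, "B": 40, "C": 50}


def feasible(a, b, c):
    """Return True if the plan respects oven, baker, and packer capacity."""
    oven = 1 * a + 0.5 * b + 1 * c <= 30
    baker = 1 * a + 2 * b + 2 * c <= 90
    packer = 1 * a + 1 * b + 1 * c <= 40
    return oven and baker and packer


best_profit, best_plan = -1, None
# The packer constraint caps every variable at 40, so 0..40 covers all cases.
for a in range(41):
    for b in range(41):
        for c in range(41):
            if feasible(a, b, c):
                profit = PROFIT["A"] * a + PROFIT["B"] * b + PROFIT["C"] * c
                if profit > best_profit:
                    best_profit, best_plan = profit, (a, b, c)

print(best_plan, best_profit)  # (0, 20, 20) 1800
```

So the best plan is no Pizza A, 20 of Pizza B, and 20 of Pizza C, for a profit of $1800. A solver should report the same plan once the objective function is part of the model.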


Day 31 – Supervised Learning Overview

The word “Supervised” means monitored. A supervised learning algorithm is one that you train on a data set whose outputs are already known; the trained model then takes new inputs and predicts the outcome. Confusing?

Let’s try an example
You own a restaurant and you have collected various information about your customers like Name, Status, Job, Salary, Address, Home town, Food item they ordered, etc.
Now you want to build a recommendation engine which uses a new customer’s data to pick a free dish for that customer. You took the data of all the customers and fed it into your model. Now this model knows that if a person is from Punjab (a state in India) and is 26 years old, then there is a high chance of him ordering Paratha (sorry if I am stereotyping :P)

So, you already have the historic data and, most importantly, you know the output for each row of data. Using this historic data you created a model which learns and makes recommendations in real time. This whole process is based on the fact that “The model creates a set of rules which enables it to understand the nature of the data, and it can then use this set of rules for further prediction”

Interestingly most of the work you will do in your Data Science job will revolve around Supervised Learning.

Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that when you have new input data (X), you can predict the output variable (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
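
To make Y = f(X) concrete, here is a minimal sketch that learns a straight-line f from a few known (X, Y) pairs using ordinary least squares. The data is invented for the illustration (it follows Y = 2*X + 1 exactly), and `fit_line` is a hypothetical helper, not a library function.

```python
# Training data: inputs X with known outputs Y (made-up numbers, Y = 2*X + 1).
X = [1, 2, 3, 4, 5]
Y = [3, 5, 7, 9, 11]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(X, Y)


def f(x):
    """The learned mapping function."""
    return slope * x + intercept


print(f(6))  # 13.0, a prediction for an input the model has never seen
```

Real data never fits this cleanly, which is why the later steps measure how far the predictions are from the known answers.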

The most important Supervised Learning algorithms are:-
1. Support Vector Machines
2. Linear Regression
3. Logistic Regression
4. Naive Bayes
5. Linear Discriminant Analysis (LDA)
6. Decision Tree
7. K-Nearest Neighbor
8. Neural Network
9. Similarity Learning

You will learn about each of these algorithms one by one, but first let’s look into the process involved in building these models

Step 1 – Gather your data.
Step 2 – Clean the data. It will occupy a lot of your time.
Step 3 – Feature Engineering. You might need to create or derive new features from the data you already have. Each input object is transformed into a feature vector, which contains a number of features that describe the object.
Step 4 – Determine which algorithm you want to implement on your data set.
Step 5 – Run the model on the training data set. Some supervised learning algorithms require the user to set certain control parameters. These parameters may be tuned by optimizing performance on a subset of the training set (called a validation set), or via cross-validation.
Step 6 – Evaluate the performance or accuracy of the model. If everything is fine, then run the model on the test dataset.
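
The steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the customer rows are invented, the features (age, orders per month) are hypothetical, and the "algorithm" is a hand-written 1-nearest-neighbour rule rather than a library model.

```python
import math

# Steps 1-3: a tiny, already-clean data set of (features, label) rows.
# Features are hypothetical: (age, orders per month).
rows = [
    ((22, 8), "regular"), ((25, 9), "regular"), ((27, 7), "regular"),
    ((24, 10), "regular"), ((23, 9), "regular"),
    ((48, 1), "occasional"), ((52, 2), "occasional"), ((55, 1), "occasional"),
    ((50, 3), "occasional"), ((47, 2), "occasional"),
]

# Hold out every fifth row for testing; train on the rest.
train = [row for i, row in enumerate(rows) if i % 5 != 4]
test = [row for i, row in enumerate(rows) if i % 5 == 4]


# Steps 4-5: the "model" is 1-nearest-neighbour over the training rows.
def predict(features):
    """Return the label of the closest training row."""
    nearest = min(train, key=lambda row: math.dist(row[0], features))
    return nearest[1]


# Step 6: accuracy on rows the model has not seen.
correct = sum(1 for features, label in test if predict(features) == label)
accuracy = correct / len(test)
print(accuracy)  # 1.0 on this deliberately easy data
```

The two groups here are far apart on purpose, so the score is perfect. On real data the interesting part is what happens when it is not.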

Above we saw the list of Supervised Learning algorithms. Supervised Learning problems can further be divided into two categories:-
a. Classification – A classification problem is one where the output variable is a categorical variable. If you are predicting a disease on the basis of symptoms, then that falls under Classification.

b. Regression – Regression is used when you need to predict continuous values like the number of customers coming to a restaurant, the number of visitors on a website, etc.

Some of the applications of Supervised Learning:-

1. Use a predictive algorithm to find out how many marks each student will get
2. Use Logistic Regression to find out which customers will encash their insurance policy
3. Predicting house prices
4. Weather forecasting
5. Classification of emails (Spam and non-spam)
6. In supervised learning for image processing, for example, an AI system might be provided with labeled pictures of vehicles in categories such as cars and trucks. After a sufficient amount of observation, the system should be able to distinguish between and categorize unlabeled images, at which time training can be said to be complete.

Supervised Learning is like learning from a teacher. The teacher shows you how to answer questions and evaluates your learning. You can expect the same types of questions to appear in the examination, i.e. your testing condition, and you answer according to your understanding. Your marks are your accuracy.

Courtesy – Big Data Made Simple

We will use Python to train our Supervised Learning algorithm in the next few Days.

Keep Learning 🙂

XtraMous








Day 30 – What is Machine Learning?

Big Data, Machine Learning, Artificial Intelligence, etc.
If you follow the news, then you must have heard a lot about these words. Let’s try to understand them with examples

What is Big Data?
In the 1990s the size of data used to be small and people used to store only relevant data points. With the World Wide Web (WWW) boom, data became omnipresent. There was a way to store a good amount of data in Excel files and other applications. But the major change happened with the advancement in mobile technologies.
Smartphones brought a lot of data with them. Every application and website stores a plethora of data ranging from your personal to your professional information. Almost all the clicks you make on the internet are being stored somewhere in the world.
When you are working with a lot of data, that data is termed Big Data.

So, it’s not like Big Data is a new concept. It’s just that the size of data increased many times over, and in order to store this data we needed new tools and technologies. This complete eco-system is called Big Data.

Now, what is Machine Learning?
Machine Learning is a way to train a machine to start learning from the user’s behavior and then provide useful information or take actions accordingly. You can see Machine Learning examples around you.

1. You click an advertisement on Google and the next day you get similar ads. This is because your interest was tagged in this brief span of time and now you are bombarded with the advertisements.

2. Ever heard of driverless cars? Can you even imagine the rate at which the back-end algorithms need to work in order to identify an object and take action accordingly? The acceptable margin of error is almost zero because we are talking about real life. This is where image recognition and several other algorithms come into the picture.

3. Machine Learning is learning from data. Artificial Intelligence, on the other hand, is a buzzword. There are so many problems which you can solve using machine learning. You will understand the capabilities of this domain in the coming Days

4. 10 years back, Software Engineers used to work on these predictive models, clustering and classifying data, etc. But as the amount of data started increasing, handling the data and getting insights from it became difficult. This gave rise to new job opportunities which go by the name of Data Scientist, Data Analyst, Decision Scientist, Big Data Analyst, etc.
So, this thing is not new, it just got scaled up

5. Most of the hard work for machine learning is data transformation. From reading the hype about new machine learning techniques, you might think that machine learning is mostly about selecting and tuning algorithms. The reality is more prosaic: most of your time and effort goes into data cleansing and feature engineering — that is, transforming raw features into features that better represent the signal in your data.

6. AI is not going to become self-aware, rise up, and destroy humanity. A surprising number of people seem to be getting their ideas about artificial intelligence from science fiction movies. We should be inspired by science fiction, but not so credulous that we mistake it for reality. There are enough real and present dangers to worry about, from consciously evil human beings to unconsciously biased machine learning models. So you can stop worrying about Skynet and superintelligence.

7. ML is a computer science discipline that consists in making computers “learn” from data rather than programming instructions. For example, imagine you had to implement a gender (male vs female) recognition software. If you had to implement this in the traditional way, you would need to extract features that would help you decide. Then, you would write a lot of code to instruct the computer how to use these features. Unfortunately, this approach is tedious and not robust enough. On the other hand, the ML approach consists in collecting lots of images and labeling them. Then, running an ML algorithm that will learn the task by observing the data. By the way, this approach is called supervised learning.

8. ML is an evolving and exciting field. Many jobs exist and many more will. It is the modern form of literacy in our technological and data-driven society. Learn about it as much as you can.

9. You can very well make a career in Machine Learning and Data Science. You just have to practice playing with data and understanding it. In my personal opinion, Machine Learning is here to stay, so it’s better if you take some time to understand it

Go through all the overview Days in this challenge, pick a few topics, and then gain expertise in a couple of them.



Common Statistical Tests

There are various statistical tests in Data Analysis. Following are the tests and their uses:-

Correlation – These tests look at the association between variables

1. Pearson Correlation -> It tests the strength of the linear association between two continuous variables
2. Spearman Correlation -> It tests the strength of association between two ordinal (ranked) variables
3. Chi-Square -> It tests for an association between two categorical variables
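
The difference between Pearson and Spearman is easiest to see on data that rises steadily but not in a straight line. The sketch below uses hand-written helpers (`pearson`, `ranks`, `spearman` are names chosen for this example) and assumes there are no tied values, which keeps the ranking code short.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5]
y = [value ** 3 for value in x]  # always increasing, but not linear

print(pearson(x, y))   # below 1, because the relationship is curved
print(spearman(x, y))  # 1 (up to rounding), because the order is preserved
```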

Comparison of Means – The tests below look for a difference between the means of variables

1. Paired T-test -> Tests for a difference between two related variables, e.g. the same subjects measured twice
2. Independent T-Test -> Tests for a difference between two independent variables, e.g. two separate groups
3. ANOVA -> It stands for Analysis of Variance. It is a statistical method used to test differences between two or more means.
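
As a worked example of the paired T-test, the sketch below computes the t statistic by hand from the per-subject differences. The before and after scores are invented for the illustration; the formula is the mean difference divided by its standard error.

```python
import math
import statistics

# Hypothetical scores of the same five students before and after a course.
before = [72, 68, 75, 80, 66]
after = [75, 70, 78, 84, 67]

# A paired test works on the per-student differences.
diffs = [a - b for a, b in zip(after, before)]

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation
standard_error = sd_diff / math.sqrt(len(diffs))

t_stat = mean_diff / standard_error
degrees_of_freedom = len(diffs) - 1

print(round(t_stat, 3), degrees_of_freedom)  # 5.099 4
```

With 4 degrees of freedom the two-sided 5% critical value is about 2.776, so a t statistic of about 5.1 would count as a significant difference on this made-up data.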

Regression – It assesses whether a change in one variable predicts a change in another variable

1. Simple Regression – Tests how a change in the predictor variable predicts the level of change in the outcome variable
2. Multiple Regression – Tests how a change in the combination of two or more predictor variables predicts the level of change in the outcome variable

Non-Parametric – These tests are used when the data does not meet assumptions required for parametric tests

1. Wilcoxon rank-sum test -> Tests for a difference between two independent variables, takes into account magnitude and direction of difference

2. Wilcoxon signed-rank test -> Tests for a difference between two related variables, takes into account magnitude and direction of difference

3. Sign test -> Tests if two related variables are different. It ignores the magnitude of change and only takes into account the direction
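
The sign test is simple enough to write out in full. The sketch below counts how many pairs went up and how many went down, then asks how likely such a lopsided split is if up and down were equally likely. The function name and the data are made up for this example.

```python
from math import comb


def sign_test_p_value(before, after):
    """Two-sided sign test: ties are dropped, only the direction counts."""
    ups = sum(1 for b, a in zip(before, after) if a > b)
    downs = sum(1 for b, a in zip(before, after) if a < b)
    n = ups + downs
    k = min(ups, downs)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired measurements: 7 values went up, 1 went down.
before = [10, 12, 9, 14, 11, 13, 10, 12]
after = [12, 15, 10, 13, 14, 15, 11, 16]

p_value = sign_test_p_value(before, after)
print(p_value)  # 0.0703125
```

Note that the size of each change never enters the calculation, which is exactly what separates the sign test from the Wilcoxon signed-rank test.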

We will have a separate blog for each of these tests. Till then, google.





R – Cheat Sheet

This is a cheat sheet which aims to give all the important concepts in a very crisp manner. Just give it a read before starting a new project in R or interviewing for a Data Science, Data Analyst, or Business Analyst post.
There are various advantages and disadvantages of using R over Python, but we will not dig deep into them.
This is a cheat sheet, so if you need more help, there is this awesome website www.google.com

We will start directly with Data Types which are the building blocks of a programming language.

There are 6 object types supported in R:-
1. Vectors
2. Lists
3. Matrices
4. Arrays
5. Factors
6. Data Frames

These objects are built from 6 atomic data types:-
1. Logical – TRUE, FALSE
2. Numeric – 56.4, 45.3
3. Integer – 1L, 2L, 3L, 4L
4. Complex – 6+2i
5. Character – “the”, “data”, “monk”
6. Raw – Raw bytes, e.g. charToRaw(“data”)

Let’s briefly look into each Object types:-
1. Vector

Use c() to create a vector, e.g. x <- c(1, 2, 3). A vector holds a single data type, so if you mix data types, R coerces all the elements to a common type.

The only thing worth mentioning here is that a negative index drops the element at that position, e.g. x[-1] returns every element except the first.

2. List
Vector was the simplest object in R, next in the list is a List 😛

Let’s create a list which will include a list, a vector, and a matrix. If you don’t know much about matrix, just remember it’s a 2-dimensional object which is defined as x<-matrix(c(1,2,3,4,5,6),nrow=2) to create a matrix with 2 rows.

Give names to the elements of list. See the example below

Merge two lists

Converting a list to a vector i.e. unlisting a list


Basically all you need to know about list are:-
1. How to create a list (x <- list())
2. What all can a list include? (Anything ranging from a vector to arrays, matrix, etc.)
3. Giving name to each element of the list (use the function names())
4. Accessing elements of the list (Use [[ ]] to get a single element and [ ] to get a sub-list)

Matrix
A matrix is a 2-dimensional object. You specify the number of rows and columns and, optionally, the dim names while declaring a matrix. Here dim names are the names given to the rows and columns 😛

x <- matrix(c(1,2,3,4,56,23), nrow=3, ncol=2, dimnames=list(row_names, col_names))

Here row_names and col_names are character vectors holding the row and column names. I think you must have got the gist of a matrix. You can definitely create two matrices and apply arithmetic operations like addition, subtraction, etc.
Multiplication is also simple: * multiplies element by element and %*% does matrix multiplication, see the examples below

Accessing elements in a matrix

Arrays
Arrays can store data in more than 2 dimensions. A vector is one-dimensional, a matrix is 2-dimensional, and now this array can have more than 2 dimensions. God knows where this programming language is going 😛

Give names to columns, rows, and matrix

Day 29 – Visualization in Python cont.

We have already plotted some basic graphs. Now it’s time to plot some more graphs:-

Line Histogram

Now let’s create a line histogram with some random data

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
noise = np.random.normal(0, 1, (1000, ))
density = stats.gaussian_kde(noise)
n, x, _ = plt.hist(noise, bins=np.linspace(-3, 3, 50), histtype='step', density=True)
plt.plot(x, density(x))
plt.show()

Graph 13 – A line histogram

Variable Width histogram

This is what a variable column width histogram looks like

Let’s create one with our dataset

import numpy as np
import matplotlib.pyplot as plt
freqs = np.array([2, 7, 21, 15, 12])
bins = np.array([65, 75, 80, 90, 105, 110])
widths = bins[1:] - bins[:-1]
# np.float was removed from recent NumPy versions; the built-in float works
heights = freqs.astype(float)/widths
plt.fill_between(bins.repeat(2)[1:-1], heights.repeat(2), facecolor='orange')
plt.show()

Graph 14 – A variable width histogram
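
Why divide the frequencies by the widths? In a variable width histogram the area of a bar, not its height, represents the frequency. The plain-Python sketch below redoes the arithmetic from the code above so you can see the numbers; it draws nothing.

```python
import math

freqs = [2, 7, 21, 15, 12]
bins = [65, 75, 80, 90, 105, 110]

# Width of each bin: right edge minus left edge.
widths = [right - left for left, right in zip(bins[:-1], bins[1:])]

# Height is frequency per unit of width, so wide bins do not look inflated.
heights = [freq / width for freq, width in zip(freqs, widths)]

# Multiplying back gives the original frequencies: area = height * width.
areas = [height * width for height, width in zip(heights, widths)]

print(widths)   # [10, 5, 10, 15, 5]
print(heights)  # [0.2, 1.4, 2.1, 1.0, 2.4]
```

The 90 to 105 bin holds 15 observations but gets a height of only 1.0, because those observations are spread over a bin three times as wide as its neighbours.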

One more example below

import numpy as np
import matplotlib.pyplot as plt
x = np.sort(np.random.rand(6))
y = np.random.rand(5)
# align='edge' makes each bar start at x[i] and span its own width
plt.bar(x[:-1], y, width=x[1:] - x[:-1], align='edge')
plt.show()

Graph 15 – Variable width histogram

Area Chart

Below is how an area chart looks like:

Let’s create a basic area chart with some dummy data

import numpy as np
import matplotlib.pyplot as plt
# Data
x=range(1,6)
y=[ [1,4,6,8,9], [2,2,7,10,12], [2,8,5,10,6] ]

# Plot
plt.stackplot(x,y, labels=['A','B','C'])
plt.legend(loc='upper left')
plt.show()

Graph 16 – A basic area chart

You already know how to add x-labels, y-labels, title, etc.
Go ahead and add these in the graph above

Box and Whisker Plot

A box and whisker plot, or boxplot for short, is generally used to summarize the distribution of a data sample.
The x-axis is used to represent the data sample, where multiple boxplots can be drawn side by side on the x-axis if desired.

A box plot is one of the most common types of graphics. It gives a nice summary of one or more numeric variables. The line that divides the box into two halves is the median of the numbers.
The ends of the box represent the first and third quartiles.

import random
import matplotlib.pyplot as plt

random.seed(123)
a = random.sample(range(1,100),20)
b = random.sample(range(1,100),20)
c = random.sample(range(1,100),20)
d = random.sample(range(1,100),20)
list_Ex = [a,b,c,d]
plt.boxplot(list_Ex)
plt.show()

Graph 17 – A basic Box-Whisker graph

Now we will try to make the graph look better by adding color to the plot. The box-plot shows median, 25th and 75th percentile, and outliers. You should try to give different color to these points to make the plot more appealing.

When you plot a boxplot, you can use the following 5 attributes of the plot:-
a. box – To modify the color, line width, etc. of the central box
b. whisker – To modify the color and line width of the line which connects the box to the cap
c. cap – The horizontal line at the end of each whisker
d. median – The line inside the box which marks the median
e. flier – The points drawn beyond the caps, i.e. the outliers

The box spans the 1st Quartile (Q1) to the 3rd Quartile (Q3), and its height is the IQR, i.e. the Inter Quartile Range (Q3 - Q1). The lower fence is at Q1 - 1.5*IQR and the upper fence is at Q3 + 1.5*IQR. Any point which falls below the lower fence or above the upper fence is called a flier or outlier.
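
The fence arithmetic can be checked by hand. The sketch below uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points; that this matches the quartiles Matplotlib draws is my assumption, not something stated above. The sample is made up, with one deliberately extreme value.

```python
import statistics

# Made-up sample with one obviously extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# quantiles(n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences would be drawn as a flier.
fliers = [value for value in data if value < lower_fence or value > upper_fence]

print(q1, q3, iqr)               # 3.25 7.75 4.5
print(lower_fence, upper_fence)  # -3.5 14.5
print(fliers)                    # [50]
```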

Following is the code with some fancy colors to help you understand each term individually.

bp = plt.boxplot(list_Ex, patch_artist=True)
for box in bp['boxes']:
    box.set(color='orange', linewidth=2)
for whisker in bp['whiskers']:
    whisker.set(color='red', linewidth=2)
for cap in bp['caps']:
    cap.set(color='green', linewidth=2)
for median in bp['medians']:
    median.set(color='blue', linewidth=2)
for flier in bp['fliers']:
    flier.set(marker='o', color='black', alpha=0.5)
plt.show()

Graph 18 – Box Whisker Chart

Box-plot practice

Following is one more piece of code, which draws box plots for samples taken from Gaussian distributions with different spreads

from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot

seed(1)
# random numbers drawn from a Gaussian distribution
x = [randn(1000), 5 * randn(1000), 10 * randn(1000)]
# create box and whisker plot
pyplot.boxplot(x)
# show the plot
pyplot.show()

Graph 19 – A Box-Whisker Plot

Scatter plot

A scatter plot is an easy-to-make but interesting visualization which gives a clear picture of how the data is distributed.

Let’s take the example of 10 innings played by Sachin, Dhoni, and Kohli and see how their scores are distributed. The code is fairly easy to understand

sachin = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
kohli = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
dhoni = [23,45,67,76,65,45,100,12,34,65]
run = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
plt.scatter(run, sachin, color='red')
plt.scatter(run, kohli, color='green')
plt.scatter(run, dhoni, color='blue')
plt.xlabel('Score Range')
plt.ylabel('Run scored')
plt.show()

You can also add a legend to the plot by adding the following lines before plt.show()

legend = ['sachin','kohli','dhoni']
plt.legend(legend)
The plot will now look like this

Graph 20 – A scatter plot

Below is one more scatter plot where you give each point a weighted area, and the size of each circle is based on that area

import random
import numpy as np
import matplotlib.pyplot as plt

random.seed(123)
np.random.seed(123)
N = 40  # number of points
x = random.sample(range(1,100), N)
y = random.sample(range(1,100), N)
colors = np.random.rand(N)
area = (30*np.random.rand(N))**2
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()


Graph 21 – A scatter plot with area of bubble denoting the volume

Pie Chart

Create a pie chart for the number of centuries scored by Sachin, Dhoni, Dravid, and Kohli.

labels = 'Sachin','Dhoni','Kohli','Dravid'
size = [100,25,70,50]
colors = ['pink','blue','red','orange']
explode = (0.1,0,0,0)
plt.pie(size, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()

explode is used to set the first slice apart from the rest of the pie chart. Everything else in the code is self-explanatory. Below is the plot

Graph 22 – Pie chart showing performance of cricketers

Some cool  Visualizations

Create a stacked chart to demonstrate the number of people voting for either Python or Java in 5 countries, namely, India, USA, England, S.A., Nepal

import numpy as np
import matplotlib.pyplot as plt
Python = (20, 35, 30, 35, 27)
Java = (25, 32, 34, 20, 25)
ind = np.arange(len(Python))  # the x positions of the five countries
width = 0.35       # the width of the bars: can also be len(x) sequence
p1 = plt.bar(ind, Python, width)
p2 = plt.bar(ind, Java, width, bottom=Python)
plt.ylabel('Votes')
plt.title('Number of people using Python or Java')
plt.xticks(ind, ('India', 'USA', 'England', 'S.A.', 'Nepal'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('Python', 'Java'))
plt.show()


xticks is used to give labels to the x-axis and yticks give labels to the y-axis.

Graph 23 – Stacked Bar graph

A cool area graph

import numpy as np
import matplotlib.pyplot as plt
# create data
x=range(1,15)
y=[1,4,6,8,4,5,3,2,4,1,5,6,8,7]

# Change the color and its transparency
plt.fill_between(x, y, color="red", alpha=0.4)
plt.show()
# Same, but add a stronger line on top (edge)
plt.fill_between(x, y, color="red", alpha=0.2)
plt.plot(x, y, color="red", alpha=0.6)
plt.show()

The alpha parameter controls how opaque the color is, from 0 (fully transparent) to 1 (fully opaque). In the second plot, 0.6 is given to the edge and 0.2 is given to the fill

Graph 24 – An area graph

One of the most important things is to understand when to use which graph, and to keep a list of all the graphs you know.

There are four types of information which we can display using any plot:-
1. Distribution
2. Comparison
3. Relationship
4. Composition


1. Distribution – Shows how the data is spread across your data set.
For example, how many people are from which state of the country?

a. Histogram – If you have few data points
b. Line Histogram – When you have a lot of data points
c. Scatter plot – When you have to show the distribution of 2-3 variables

2. Comparison – When you have to compare something over 2 or more categories

a. Variable width chart – When you have to compare two variables per item
b. Tables with embedded charts – When there are many categories, basically a matrix of charts
c. Horizontal or Vertical Histogram – When there are few categories in a data set
d. If you want to compare something over time
    i.   Line Chart
    ii.  Bar Vertical Chart
    iii. Many categories line chart

3. Relationship Charts – When you want to see the relationship between two or more variables then you have to use relationship charts

a. Scatter Plot
b. Scatter plot bubble chart

4. Composition Charts –
When you have to show a percentage or composition of variables.

a. Pie Chart – Very basic plot when there are 3-6 categories
b. Stacked 100% bar chart with sub component – When you have to show components of components
c. Stacked 100% bar chart – When you have to look into the contribution of each component.
d. Stacked area chart – When relative and absolute difference matters



Day 28 – Visualization in Python

Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends, and correlations that might not otherwise be detected can be exposed. It is one of the most basic but also most important weapons in your Data Science career.
Python is blessed with some good libraries for visualization.

Open Jupyter notebook or any other IDE of your preference.

Library to use – There are a lot of good visualization libraries, but Matplotlib is the most preferred one to start with because of its simple implementation.
So, we will mostly concentrate on Matplotlib.

Import the library and give it the standard alias plt:
import matplotlib.pyplot as plt

Following are two important functions which will come in handy:-

To display a chart you should use – plt.show()
To save the chart as an image, use the code – plt.savefig("Filename.png")

Popular plotting libraries in Python are:-

1. Matplotlib – Best to start with. It provides easy implementation and gives a lot of freedom
2. Seaborn – It has a high level interface and great default styles
3. Plotly – To create interactive plots
4. Pandas Visualization – Easy interface, built on Matplotlib

Line Chart

A line chart or line graph is a type of chart which displays information as a series of data points called ‘markers’ connected by straight line segments.

So, a line plot is a very basic plot which is used to show observations collected after a regular interval. The x-axis represents the interval and the y-axis represents the values.

Let’s plot our first graph
import matplotlib.pyplot as plt
x = [1,2,3,4,5,6]
y = [10,12,20,21,30,35]
plt.plot(x,y)
Here is what you will get

Graph 1 – Basic Line Chart

Plot a cos graph using line plot

import matplotlib.pyplot as plt
from numpy import cos

x = [x*0.01 for x in range(100)]
y = cos(x)
plt.plot(x,y)
plt.show()


Here is what you get as a cos graph

Graph 2 – Cos graph using line plot

You know how to plot a line graph, but some important things are missing from the graph, i.e. the x-axis and y-axis labels and the plot title. Let’s create another line plot for the number of students in each class using the following data

c = [1,2,3,4,5,6]
student = [40,52,50,61,70,78]

The following commands are used to add the x-axis label, y-axis label, and chart title

plt.xlabel("Label")
plt.ylabel("Label")
plt.title("Title")

The code is given below
c = [1,2,3,4,5,6]
student = [40,52,50,61,70,78]
plt.xlabel("Class")
plt.ylabel("Number of Students")
plt.title("Class vs Number of students")
plt.plot(c, student)

Graph 3 – Class vs Number of Students chart with proper labels and plot title

Do you want to change the color of the line?
Try the following code instead to make the line green
plt.plot(c,student,color='g')

Graph 4 – Adding color to the same graph

Multi Line Chart

You can also add multiple plots in the same graph. Let’s try to put a couple of new lines in the graph i.e. number of teachers and average marks

Graph 5 – Adding multiple lines to a graph

To add a legend, give a label to each line which you want to plot and then specify a location for the legend

The code is self-explanatory and is given below:-

c = [1,2,3,4,5,6]
student = [40,52,50,61,70,78]
avg_marks = [34,43,54,44,50,55]
num_of_teachers = [10,12,13,10,15,10]
plt.xlabel("Class")
#plt.ylabel("Number of Students")
plt.title("Class vs Number of students")
plt.plot(c,student,color='orange',label='Student')
plt.plot(c,avg_marks,color='red',label='Marks')
plt.plot(c,num_of_teachers,color='green',label='Teachers')
plt.legend(loc="upper left")


Bar Chart

“A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.”

After the line chart, the second basic but highly used chart is the bar chart

To create a bar chart – plt.bar(x,y)

We will plot a few graphs first, and then you can add labels, titles, and legends later.

import matplotlib.pyplot as plt
a = ['Apple','Mango','Pineapple']
b = [40,60,50]
plt.bar(a,b)

Graph 6 – A simple bar chart

Use random values between 1 and 100 to create the same graph.

import matplotlib.pyplot as plt
from random import seed
from random import randint
seed(123)
x = ['Apple','Mango','Pineapple']
y = [randint(0,100),randint(0,100),randint(0,100)]
plt.bar(x,y)

Graph 7 – Bar chart with random values

Adding color, labels, and title to the random values bar chart

Stacked 100% bar chart with sub component
When you have to show components of components like the graph below

Example of 100% bar chart

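import numpy as np
import matplotlib.pyplot as plt

x = ["a","b","c","d"]
y1 = np.array([3,8,6,4])
y2 = np.array([10,2,4,3])
y3 = np.array([5,6,2,5])

snum = y1+y2+y3

# normalization: convert each value to a percentage of its column total
y1 = y1/snum*100.
y2 = y2/snum*100.
y3 = y3/snum*100.
plt.figure(figsize=(4,3))

# stack bars: each series starts where the previous ones end
plt.bar(x, y1, label='y1')
plt.bar(x, y2, bottom=y1, label='y2')
plt.bar(x, y3, bottom=y1+y2, label='y3')
plt.legend()
plt.show()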

Graph 8 – A 100% stacked bar chart
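
The normalization step is what turns an ordinary stacked chart into a 100% chart. Here is the same arithmetic in plain Python, without NumPy or plotting, so you can see that every column ends up summing to 100. The variable names are chosen for this sketch.

```python
import math

series = {
    "y1": [3, 8, 6, 4],
    "y2": [10, 2, 4, 3],
    "y3": [5, 6, 2, 5],
}

# Column totals: one total per x position (a, b, c, d).
totals = [sum(column) for column in zip(*series.values())]

# Each value becomes its share of the column total, in percent.
percentages = {
    name: [value / total * 100 for value, total in zip(values, totals)]
    for name, values in series.items()
}

# After normalization every stacked bar has the same height.
column_sums = [sum(column) for column in zip(*percentages.values())]

print(totals)  # [18, 16, 12, 12]
print([round(s, 6) for s in column_sums])  # [100.0, 100.0, 100.0, 100.0]
```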

Histogram

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data. The idea is to locally represent the data density by counting the number of observations in a sequence of consecutive intervals (bins).

To plot a histogram use this code – plt.hist(x, bins)

A simple histogram plot
q = [1,2,34,5,44,66,66,90,33,45,2,1,2,3,4]
plt.hist(q,bins = 3,color='green')

Graph 9 – A simple histogram
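
To see what "counting observations in consecutive intervals" means, the sketch below does the binning by hand for the same list q and 3 bins. It assumes equal-width bins running from the smallest to the largest value, with the largest value falling into the last bin; the helper name `bin_counts` is made up.

```python
q = [1, 2, 34, 5, 44, 66, 66, 90, 33, 45, 2, 1, 2, 3, 4]


def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return counts


counts = bin_counts(q, 3)
print(counts)  # [8, 4, 3]
```

Eight of the fifteen numbers are small, so the first bar of the histogram is the tallest.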

Create a list using random variables and plot it in 4 bins

import random
my_rand = random.sample(range(1,30),20)
print(my_rand)
print(type(my_rand))
plt.hist(my_rand,bins=4,color='orange')

Graph 10 – A histogram made with random variables

In a histogram you can also plot more than one data set to get parallel bars.

import random
my_rand = random.sample(range(1,30),20)
my_rand2 = random.sample(range(1,25),20)
print(my_rand)
print(type(my_rand))
plt.hist([my_rand,my_rand2],bins=4,color=['green','red'])
legend = ['Rand1','Rand2']
plt.legend(legend)
plt.xlabel("Bins")
plt.ylabel("Frequency")
plt.title("Random Variable distribution")

Graph 11 – Parallel histogram

Horizontal Histogram

import numpy as np
import matplotlib.pyplot as plt
name = ['Nitin','Saurabh','Rahul','Gaurav','Amit']
run = [200,70,130,120,100]
plt.barh(name,run,color='orange')
plt.xlabel("Runs Scored")
plt.ylabel("Cricketer")
plt.title("Runs scored by cricketers")
plt.show()

Graph 12 – A horizontal histogram

Keep making irrelevant and unnecessary graphs.
Keep practicing 🙂

XtraMous

Day 27 – Reading and Writing files in Python

You will constantly have to read and write files, whether you are working for a company or taking part in a hackathon. So, it is necessary to know how to read different types of files.

Let’s start the boring but important part

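The most important command for working with a file in Python is the open() function. It takes two parameters: the name of the file and the mode.

Like most other programming languages, Python has 4 modes to access a file:-
1. "r" – Read – Opens a file for reading (the default); raises an error if the file does not exist
2. "a" – Append – Opens a file for appending; creates the file if it does not exist
3. "w" – Write – Opens a file for writing; creates the file if it does not exist and overwrites it if it does
4. "x" – Create – Creates the specified file; raises an error if the file already exists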

Apart from these you can also specify the format in which you want to open the file:
1. t for Text(Default)
2. b for Binary file

Open a file
x = open("Analytics.txt","rt")
This opens the file for reading in text mode. The contents are not actually read until you call a method such as read() on x.

Read the file
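Here is a minimal sketch of reading a whole file with read(). The file contents are made up for illustration, and the sketch first writes the file into a temporary folder only so that it can run on its own.

```python
import os
import tempfile

# Hypothetical file, created in a temporary folder so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "Analytics.txt")
with open(path, "wt") as f:
    f.write("Day 27\nReading and writing files\n")

# "with" closes the file automatically when the block ends
with open(path, "rt") as f:
    content = f.read()  # the whole file as one string

print(content)
```

Using "with" is a design choice, not a requirement: you could also call x.close() yourself after reading.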

You can also read the file line by line by the following method or by using readline() method
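A short sketch of both approaches, assuming a small made-up text file (created here in a temporary folder so the code runs as is):

```python
import os
import tempfile

# Hypothetical file with three lines, created only for this example
path = os.path.join(tempfile.mkdtemp(), "Analytics.txt")
with open(path, "wt") as f:
    f.write("first line\nsecond line\nthird line\n")

# Looping over the file object gives one line at a time
lines = []
with open(path, "rt") as f:
    for line in f:
        lines.append(line.rstrip("\n"))  # drop the trailing newline

# readline() returns the next line on every call
with open(path, "rt") as f:
    first = f.readline()
    second = f.readline()

print(lines)
print(first, second)
```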

Write something in a file
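A minimal sketch of writing with the "w" and "a" modes. The file name and the text are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical file path inside a temporary folder
path = os.path.join(tempfile.mkdtemp(), "Notes.txt")

# "w" creates the file, or overwrites it if it already exists
with open(path, "w") as f:
    f.write("Hello\n")

# "a" keeps the existing content and adds to the end
with open(path, "a") as f:
    f.write("World\n")

with open(path, "r") as f:
    result = f.read()

print(result)
```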

Delete a file

Use the "os" package and then run the remove() function
import os
os.remove("file name")

God forbid, if you ever have to delete a folder and want to look cool in front of your friends, you can use the following command. Note that it only works on an empty folder.

os.rmdir("Name of directory")

Reading CSV file
Comma Separated Values, or CSV, is one of the most used file formats, and you will definitely have to read CSV files often.
The easiest way to read one is with the pandas library

import pandas as pd
x = pd.read_csv("File Path")

P.S. – This will load the file into a DataFrame

You can read about different parameters here

There are a lot of file formats, but we covered only those which are of utmost importance. In case you need more information, try this link from DataCamp or you can trust your best friend Stack Overflow 😛

If you need information about a specific file format, do comment below.

Keep learning 🙂

XtraMous

Day 26 – Tricky Interview Questions (Python)

The reason why I put interview questions as the title of a lot of posts is because:
1. It makes you click on the post
2. It makes you feel that these are very important questions and you can nail an interview with it
3. These are actual interview questions asked in companies like Myntra, Flipkart, BookMyShow, WNS, Sapient, etc.
4. You have to practice to become perfect. You can practice here or anywhere else. But make sure you know all the questions given below.


So, without any nonsense, let’s start with the questions 😛

1. Which data type is mutable and ordered in Python?
List

2. Can a dictionary contain another dictionary?
Yes, a dictionary can contain another dictionary. In fact, this is the main advantage of using dictionary data type.
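For example, here is a small sketch of a nested dictionary. The keys and values are made up for illustration.

```python
# Each pizza name maps to another dictionary holding its details (hypothetical data)
menu = {
    "Pizza A": {"profit": 30, "oven_days": 1},
    "Pizza B": {"profit": 40, "oven_days": 0.5},
}

# Chain the keys to reach a value in the inner dictionary
profit_b = menu["Pizza B"]["profit"]
print(profit_b)

# Inner dictionaries can be changed like any other dictionary
menu["Pizza A"]["profit"] = 35
```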

3. When to use list, set or dictionaries in Python?
A list keeps its items in order and lets you access them by position; a set has no order (a dict remembers insertion order since Python 3.7, but you look items up by key, not by position). So when you care about position, use a list.

A dict associates a value with each key, while a list and a set just contain values: very different use cases.
A set requires its items to be hashable, a list does not: if you have non-hashable items, you cannot use a set and must use a list instead. A set also keeps only unique items.

4. Write a program where you first create an empty list and then add the elements.
basic_list = []
basic_list.append('Alpha')
basic_list.append('Beta')
basic_list.append('Gamma')

5. What does this mean: *args, **kwargs? And why would we use it?
We use *args when we aren’t sure how many arguments are going to be passed to a function, or if we want to pass a stored list or tuple of arguments to a function. **kwargs is used when we don’t know how many keyword arguments will be passed to a function, or it can be used to pass the values of a dictionary as keyword arguments. The identifiers args and kwargs are a convention; you could also use *bob and **billy, but that would not be wise.
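A small sketch of both uses. The function name join_all and the "sep" option are made up for this example.

```python
def join_all(*args, **kwargs):
    # args is a tuple of the positional arguments,
    # kwargs is a dict of the keyword arguments
    sep = kwargs.get("sep", " ")
    return sep.join(str(a) for a in args)

# Any number of positional arguments
spaced = join_all("Fox", "ate", "pizza")

# Unpacking a stored list and a stored dictionary into the call
numbers = [1, 2, 3]
options = {"sep": "-"}
joined = join_all(*numbers, **options)

print(spaced)
print(joined)
```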

6. What are negative indexes and why are they used?
Sequences in Python can be indexed with both positive and negative numbers. Positive indexes count from the start: 0 is the first item, 1 is the second, and so on. Negative indexes count from the end: -1 is the last item, -2 is the second last, and so on. They are used to reach items at the end of a sequence without knowing its length.
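A quick illustration, using a made-up list:

```python
letters = ["a", "b", "c", "d"]

last = letters[-1]         # last item
second_last = letters[-2]  # second item from the end
tail = letters[-3:]        # slice of the last three items

print(last, second_last, tail)
```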

7. Randomly shuffle the content of a list
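One possible answer, as a sketch using shuffle() from the standard random module (the list is made up):

```python
import random

cards = [1, 2, 3, 4, 5, 6]

# shuffle() rearranges the list in place and returns None
random.shuffle(cards)

print(cards)  # same elements, random order
```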

8. Take a random sample of 20 elements and put it in a list
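One possible answer, assuming the sample is drawn from the numbers 1 to 100 (the question does not say where the elements come from):

```python
import random

# sample() picks 20 distinct values and returns them as a list
sample_list = random.sample(range(1, 101), 20)

print(sample_list)
print(type(sample_list))
```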

9. Take a list and sort it
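A sketch of the two usual ways, with a made-up list: sorted() returns a new list, while the sort() method changes the list itself.

```python
scores = [200, 70, 130, 120, 100]

# sorted() leaves the original list untouched and returns a new sorted list
ascending = sorted(scores)

# sort() sorts the list in place; reverse=True gives descending order
scores.sort(reverse=True)

print(ascending)
print(scores)
```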

10. Explain split() and sub() function from the “re” package
split() – uses a regex pattern to “split” a given string into a list
sub() – finds all substrings where the regex pattern matches and then replace them with a different string
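A small sketch of both functions. The strings and patterns are made up for illustration.

```python
import re

# split(): break the string wherever commas, semicolons or spaces appear
parts = re.split(r"[,;\s]+", "Fox, ate;the pizza")

# sub(): replace every run of digits with "#"
cleaned = re.sub(r"\d+", "#", "Order 66 has 3 pizzas")

print(parts)
print(cleaned)
```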

11. What are the supported data types in Python?
The most important data types include the following:
1. Number
2. String
3. List
4. Tuple
5. Dictionary
6. Set

12. What is the function to reverse a list?
list.reverse()

13. How to remove the last object from the list?
list.pop()
Removes and returns the last object from the list. Pass an index, as in list.pop(i), to remove and return the object at that position instead.

14. What is a dictionary?
A dictionary is one of the built-in data types in Python. It defines a mapping of unique keys to values (since Python 3.7 it also remembers the order in which keys were inserted). Dictionaries are indexed by keys, and the values can be any valid Python data type (even a user-defined class). Notably, dictionaries are mutable, which means they can be modified. A dictionary is created with curly braces and indexed using the square bracket notation.

15. Python is an object oriented language. What are the features of an object oriented programming language?
OOP is the programming paradigm based on classes and instances of those classes called objects. The features of OOP are:
Encapsulation, Data Abstraction, Inheritance, Polymorphism.

16. What is the difference between append() and extend() method?
Both append() and extend() are methods of list. They are used to add elements at the end of the list.
append(element) – adds the given element, as a single item, at the end of the list on which it is called.
extend(another-list) – adds each element of another-list at the end of the list on which it is called.
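The difference shows up most clearly when the argument is itself a list, as in this small sketch:

```python
a = [1, 2]
a.append([3, 4])  # the whole list is added as one element

b = [1, 2]
b.extend([3, 4])  # each element is added separately

print(a)
print(b)
```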

17. Write a program to check if a string is a palindrome?
A palindrome is a string which reads the same forwards and backwards, like aba, nitin, nurses run, etc.

Below is the code, write it down yourself 😛
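One possible version, as a sketch. It assumes spaces and letter case should be ignored, so that a phrase like "nurses run" counts as a palindrome. The function name is made up.

```python
def is_palindrome(text):
    # Drop spaces and ignore case before comparing (an assumption of this sketch)
    cleaned = text.replace(" ", "").lower()
    # A string is a palindrome if it equals its own reverse
    return cleaned == cleaned[::-1]

print(is_palindrome("nitin"))
print(is_palindrome("nurses run"))
print(is_palindrome("pizza"))
```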

18. Take a random list and plot a histogram with 3 bins.

19. What is the difference between range() and xrange() functions in Python?
In Python 2, range() returns a list whereas xrange() returns an object that generates numbers on demand. In Python 3, xrange() no longer exists and range() itself generates numbers on demand.

20. Guess the output of the following code
x = "Fox ate the pizza"
print(x[:7])

You can find Python interview questions on many websites, and we will keep on updating this list. Time for some marketing: if you want to get some more interview questions on Python, then click below:-

100 Python Questions to crack Data Science/Analyst Interview

Keep practicing 🙂

XtraMous