Supervised Learning Overview

The word “supervised” means monitoring. A supervised learning algorithm is one in which you train a model on a data set where the correct output is already known for every input; the trained model then takes new inputs and predicts their outcome. Confusing?

Let’s try an example
You own a restaurant and have collected various information about your customers, such as name, status, job, salary, address, hometown, and the food items they ordered.
Now you want to build a recommendation engine that uses a new customer’s data to offer that customer a free dish. You take the data of all your past customers and feed it into your model. The model learns, for example, that if a person is from Punjab (a state in India) and is 26 years old, there is a high chance that he will order a Paratha (sorry if I am stereotyping :P).

So, you already have historic data and, most importantly, you know the output for each row of that data. Using this historic data you create a model that learns from it and makes recommendations in real time. The whole process rests on one idea: “The model creates a set of rules that enables it to understand the nature of the data, and it can then use these rules for further prediction.”
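To make the idea of “a set of rules” concrete, here is a minimal Python sketch. Everything in it is hypothetical: the customer rows, the two age buckets, and the fallback dish are made up for illustration, and the “learning” is just counting which dish each (state, age group) ordered most often. It is far simpler than a real recommendation engine, but it shows the supervised pattern: known outputs in, rules out, rules applied to a new customer.

```python
from collections import Counter, defaultdict

# Hypothetical historic data: (state, age, dish_ordered). The dish is the known output (label).
history = [
    ("Punjab", 26, "Paratha"),
    ("Punjab", 24, "Paratha"),
    ("Punjab", 45, "Lassi"),
    ("Kerala", 27, "Appam"),
    ("Kerala", 29, "Appam"),
    ("Kerala", 52, "Dosa"),
]

def age_group(age):
    """Bucket ages (hypothetical cut-off at 35) so rules generalise to unseen ages."""
    return "under_35" if age < 35 else "35_plus"

def learn_rules(rows):
    """Learn one rule per (state, age group): predict the dish ordered most often."""
    counts = defaultdict(Counter)
    for state, age, dish in rows:
        counts[(state, age_group(age))][dish] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def recommend(rules, state, age, default="Chef's special"):
    """Apply the learned rules to a new customer; use a hypothetical default if no rule matches."""
    return rules.get((state, age_group(age)), default)

rules = learn_rules(history)
print(recommend(rules, "Punjab", 26))  # Paratha
print(recommend(rules, "Goa", 30))     # Chef's special (no rule learned for Goa)
```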

Interestingly, most of the work you will do in your data science job will revolve around supervised learning.

Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict the output variable (Y) for that data.
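As a small worked case of approximating Y = f(X), the sketch below fits a straight line Y ≈ a + b·X by ordinary least squares, using one input variable. The data points are hypothetical and were generated from Y = 3 + 2X without noise, so the learned approximation should recover a = 3 and b = 2 and then predict Y for an X it has never seen.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (a, b) in y ≈ a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical labelled data: every Y was produced by Y = 3 + 2X.
X = [1.0, 2.0, 3.0, 4.0]
Y = [5.0, 7.0, 9.0, 11.0]

a, b = fit_line(X, Y)

def f_hat(x):
    """The learned approximation of the unknown mapping f."""
    return a + b * x

print(f_hat(10.0))  # 23.0, a prediction for new input data
```

With real, noisy data the fit would only approximate f, which is exactly the point of the paragraph above.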

It is called supervised learning because the process of an algorithm learning from the training data set can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
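The “predict, get corrected, repeat until good enough” loop is easiest to see in the perceptron, a single-neuron ancestor of neural networks, used here purely as an illustration. In this minimal sketch the known labels play the teacher: every wrong prediction nudges the weights, and training stops once accuracy on the training data reaches an acceptable level (the `target_accuracy` parameter, an assumption of this sketch). The toy task (logical AND) is hypothetical and chosen because it is linearly separable, so the loop is guaranteed to finish.

```python
def predict(w, bias, x):
    """Return +1 or -1 depending on which side of the line w·x + bias = 0 the point x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1 if score > 0 else -1

def train_perceptron(samples, labels, lr=1.0, target_accuracy=1.0, max_epochs=100):
    """Learn from labelled examples; each wrong answer triggers a correction."""
    w = [0.0] * len(samples[0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        for x, y in zip(samples, labels):
            if predict(w, bias, x) != y:  # the "teacher" marks this answer wrong...
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]  # ...so nudge the weights
                bias += lr * y
        correct = sum(predict(w, bias, x) == y for x, y in zip(samples, labels))
        if correct / len(samples) >= target_accuracy:  # acceptable performance reached
            return w, bias, epoch
    return w, bias, max_epochs

# Hypothetical toy task: label +1 only when both inputs are 1 (logical AND).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]

w, bias, epochs = train_perceptron(samples, labels)
print(epochs, [predict(w, bias, x) for x in samples])
```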

Some of the most widely used supervised learning algorithms are:
1. Support Vector Machines
2. Linear Regression
3. Logistic Regression
4. Naive Bayes
5. Linear Discriminant Analysis (LDA)
6. Decision Trees
7. K-Nearest Neighbors
8. Neural Networks
9. Similarity Learning

You will learn about each of these algorithms one by one, but first let’s look at the process involved in building these models.

Step 1 – Gather your data
Step 2 – Clean the data. This will occupy a lot of your time.
Step 3 – Feature engineering. You might need to create or derive new features from the existing data set. Each input object is transformed into a feature vector, which contains a number of features that describe the object.
Step 4 – Decide which algorithm you want to apply to your data set.
Step 5 – Train the model on the training data set. Some supervised learning algorithms require the user to set certain control parameters. These parameters can be tuned by optimizing performance on a subset of the training data (called a validation set), or via cross-validation.
Step 6 – Evaluate the performance (for example, the accuracy) of the model. Once you are satisfied with it, run the model on the held-out test data set to estimate how it will perform on new data.
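The six steps can be walked through end to end on a toy problem. This is a minimal sketch under stated assumptions: the points are made up, “cleaning” just drops a row with a missing value, feature engineering is skipped, the algorithm is a hand-written k-nearest neighbours classifier rather than a library implementation, and the split uses a fixed index pattern (real projects usually shuffle first, and often use cross-validation instead of a single validation set). The key habit it shows is that k is chosen on the validation set, and the test set is touched only once at the end.

```python
import math
from collections import Counter

# Step 1 (hypothetical data): two well-separated groups of 2-D points, plus one broken row.
raw = [((x, y), "A") for x in range(3) for y in range(3)]
raw += [((x, y), "B") for x in range(6, 9) for y in range(6, 9)]
raw.append(((None, 4), "A"))  # a record with a missing value

# Step 2: clean the data by dropping rows with missing feature values.
data = [(features, label) for features, label in raw if None not in features]

# Step 3 (kept trivial here): the feature vector is just the (x, y) pair.

# Step 4: choose an algorithm, here k-nearest neighbours with a tunable k.
def knn_predict(train, point, k):
    """Majority label among the k training points closest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, rows, k):
    """Fraction of rows whose predicted label matches the known label."""
    return sum(knn_predict(train, f, k) == lbl for f, lbl in rows) / len(rows)

# Split: every 5th row goes to test, the next one to validation, the rest to training.
test_rows = [row for i, row in enumerate(data) if i % 5 == 0]
val_rows = [row for i, row in enumerate(data) if i % 5 == 1]
train_rows = [row for i, row in enumerate(data) if i % 5 > 1]

# Step 5: tune the control parameter k on the validation set only.
best_k = max([1, 3, 5], key=lambda k: accuracy(train_rows, val_rows, k))

# Step 6: evaluate once on the held-out test set with the chosen k.
print(best_k, accuracy(train_rows, test_rows, best_k))
```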

Above we saw the list of supervised learning algorithms. Supervised learning problems can further be divided into two categories:
a. Classification – A classification problem is one where the output variable is categorical. If you are predicting which disease a patient has on the basis of their symptoms, that falls under classification.

b. Regression – Regression is used when you need to predict continuous values, such as the number of customers coming to a restaurant or the number of visitors to a website.
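The difference between the two categories lies in the type of output, not necessarily in the algorithm. In this sketch the same nearest-neighbour idea handles both: for classification it takes a majority vote over the neighbours’ labels, and for regression it averages their numeric values. The restaurant history (advance bookings, customers who came, and a “busy”/“quiet” tag) is entirely hypothetical.

```python
from collections import Counter

# Hypothetical history: (tables booked in advance, customers who came, busy or quiet?)
history = [
    (2, 20, "quiet"), (3, 26, "quiet"), (5, 41, "quiet"),
    (8, 66, "busy"), (9, 70, "busy"), (12, 95, "busy"),
]

def nearest(rows, bookings, k):
    """The k past days whose booking count is closest to `bookings`."""
    return sorted(rows, key=lambda r: abs(r[0] - bookings))[:k]

def classify(rows, bookings, k=3):
    """Classification: the output is a category, so take the neighbours' majority label."""
    labels = [r[2] for r in nearest(rows, bookings, k)]
    return Counter(labels).most_common(1)[0][0]

def regress(rows, bookings, k=3):
    """Regression: the output is a number, so average the neighbours' customer counts."""
    values = [r[1] for r in nearest(rows, bookings, k)]
    return sum(values) / len(values)

print(classify(history, 10), regress(history, 10))  # busy 77.0
```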

Some applications of supervised learning:

1. Predicting how many marks each student will score
2. Using logistic regression to find out which customers will cash in their insurance policies
3. Predicting house prices
4. Weather forecasting
5. Classifying emails as spam or non-spam
6. In supervised learning for image processing, for example, an AI system might be provided with labeled pictures of vehicles in categories such as cars and trucks. After a sufficient amount of observation, the system should be able to distinguish between and categorize unlabeled images, at which time training can be said to be complete.
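Spam filtering is a classic classification task, and Naive Bayes (from the algorithm list above) is a common baseline for it. The sketch below is a minimal hand-written multinomial Naive Bayes with add-one (Laplace) smoothing; the six labelled emails are made up, and splitting on whitespace stands in for real text preprocessing. It works in log probabilities to avoid multiplying many small numbers together.

```python
import math
from collections import Counter

# Hypothetical labelled emails; the labels "spam" / "ham" are the known outputs.
emails = [
    ("win money now", "spam"),
    ("claim your free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team on monday", "ham"),
    ("project report attached", "ham"),
]

def train_nb(docs):
    """Count words per class and documents per class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab

def predict_nb(model, text):
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(class_counts[label] / total_docs)
        for word in text.split():
            # Add-one smoothing so an unseen word does not zero out the whole class
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(emails)
print(predict_nb(model, "free money"))           # spam
print(predict_nb(model, "monday team meeting"))  # ham
```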

Supervised learning is like learning from a teacher. The teacher shows you how to answer questions and evaluates your learning. You can expect the same types of questions to appear in the examination, i.e., your testing condition, and you answer according to your understanding. Your marks are your accuracy.

Courtesy – Big Data Made Simple

We will use Python to train our supervised learning algorithms over the next few days.

Keep Learning 🙂

XtraMous

Author: TheDataMonk

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)