1. What is Machine Learning?
a. Have you ever shopped online? While browsing a product, did you notice that the site recommends products similar to the one you are looking at? Or did you see a “people who bought this product also bought this” combination of products? How do they make these recommendations? This is machine learning.
b. Did you ever get a call from a bank or finance company asking you to take a loan or an insurance policy? Do they call everyone? No, they call only selected customers who they think will purchase their product. How do they select them? This is target marketing, and it can be done using clustering. This is machine learning.
c. Do you go to the supermarket for groceries? Ever noticed that chips are placed near cold drinks? Why? Whoever buys a cold drink will most likely buy chips too. How did they find out about this relationship? This is association rule mining (market basket analysis). This is machine learning.
d. Did you know that most of the time when you chat on a website, you are not talking to a human? It’s a bot. This is machine learning.
e. The driverless car is a direct product of machine learning and artificial intelligence.
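The chips-and-cold-drinks relation in (c) is usually measured with two simple numbers from association rule mining: support (how often items appear together) and confidence (how often the second item appears given the first). Here is a minimal sketch in plain Python. The five baskets are made-up toy data, and the helper names `support` and `confidence` are only illustrative.

```python
# Toy shopping baskets (invented for illustration, not real sales data).
baskets = [
    {"cold drink", "chips"},
    {"cold drink", "chips", "bread"},
    {"cold drink"},
    {"bread", "milk"},
    {"chips", "milk"},
]

def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# How often does a cold-drink buyer also buy chips?
rule_confidence = confidence({"cold drink"}, {"chips"}, baskets)
print(f"confidence(cold drink -> chips) = {rule_confidence:.2f}")
```

A high confidence for "cold drink -> chips" is the kind of signal that would justify placing the two items next to each other.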
2. What is Supervised Learning?
– We can go on and on with the basic definition of supervised learning, but that would defeat the purpose of the book. “Supervised” means something which you can monitor. Supervised learning includes all the algorithms where you know the output for some of the data. You train your model on this data, assuming that these data points are correct, and then use the trained model to predict the output for new data.
Example – We want to know the number of customers who will come to our restaurant in November. We have the number of customers who visited the restaurant over the last 3 years. Since we have these data points from the past, we can build a forecasting model on them and then predict the number of customers visiting this coming November.
Anything for which we know the output for at least a few data points falls under supervised learning.
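To make the restaurant example concrete, here is a minimal sketch that fits a straight line to past November counts and extends it by one year. The customer numbers are invented, and a straight-line fit is just one simple modeling choice assumed here; a real forecast would likely use more history and a richer model.

```python
# Hypothetical November customer counts for the last 3 years (toy numbers).
years = [1, 2, 3]
customers = [1000, 1100, 1200]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# "Training": learn the line from data points whose output we already know.
slope, intercept = fit_line(years, customers)

# "Prediction": apply the learned line to year 4, the coming November.
forecast = slope * 4 + intercept
print(f"Forecast for the coming November: {forecast:.0f} customers")
```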
3. What are the applications of supervised learning?
– PC Games
– Chat Bots
– Forecasting number of visitors on Amazon
– Classification of objects for Tesla
4. What is unsupervised learning?
– A supervised learning algorithm needs some known output to build a model. An unsupervised learning algorithm needs no labeled output. It builds a model on your training dataset by finding connections between the different values, and it keeps iterating until all the data points have been considered. An example will help you understand better:
Example – You have things with different geometric shapes: some are circular, some are oval, square, rectangular, etc. You need to bucket these into 4 groups. The algorithm you use does not know anything about the buckets themselves; it only knows that you need 4 of them. It will most probably take the first 4 items and place them on a coordinate plane. Each object that comes in after that will be allocated to the nearest of the four buckets. The algorithm keeps iterating until you are done with all the items. By the end of the run, you will have 4 buckets. This is unsupervised learning.
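The bucketing procedure described above is close in spirit to k-means clustering, so that is what this sketch implements (my assumption; the text does not name a specific algorithm). It starts from the first k items as bucket centers, assigns every item to its nearest center, and recomputes the centers. The points are toy 2-D coordinates, and the function name `kmeans` is illustrative.

```python
import math

def kmeans(points, k, n_iter=10):
    """Very small k-means: the first k points seed the buckets."""
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to the nearest bucket center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a bucket is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

# Toy items described by two numeric features each (invented values).
points = [
    (0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0),
    (0.5, 0.2), (9.6, 0.3), (0.1, 9.8), (10.2, 9.9),
]
labels, centers = kmeans(points, k=4)
print(labels)
```

No labels were given to the algorithm; the only input besides the data is the number of buckets.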
5. What is reinforcement learning?
We talked about supervised and unsupervised learning, but that is not all. There is a third type of machine learning, called reinforcement learning.
Example – Let’s imagine that a newborn baby comes across a lit candle. The baby does not know what happens if it touches the flame. Eventually, out of curiosity, the baby tries to touch the flame and gets hurt. After this incident, the baby learns that repeating the same thing might get it hurt again. So, the next time it sees a burning candle, it will be more cautious.
That is exactly how reinforcement learning works. Reinforcement learning is a kind of machine learning in which the system being trained for a particular job learns on its own, based on its previous experiences and their outcomes while doing a similar kind of job.
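The candle story can be sketched as a tiny learning loop: the agent keeps an estimated value for each action, always picks the action it currently values most, and nudges that estimate toward the reward it receives. Everything here is an illustrative assumption: the two actions, the reward values, the "curious" starting estimate and the learning rate.

```python
# Estimated value of each action; "touch" starts slightly positive to model curiosity.
values = {"touch": 0.5, "avoid": 0.0}

# Reward the environment gives back for each action (touching the flame hurts).
rewards = {"touch": -1.0, "avoid": 0.0}

learning_rate = 0.5
history = []

for encounter in range(5):
    # Pick the action that currently looks best.
    action = max(values, key=values.get)
    reward = rewards[action]
    # Move the estimate for that action toward the reward actually received.
    values[action] += learning_rate * (reward - values[action])
    history.append(action)

print(history)
print(values)
```

After one painful experience the estimate for "touch" drops below the one for "avoid", so the agent stops touching the flame.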
6. Explain reinforcement learning with an example (Sandeep Dayananda)
Consider the following steps.
1. You provide the system with an image of an apple and ask it to identify it.
2. The computer comes up with an answer: it says it’s a ‘mango’.
3. You tell the system that it’s a wrong answer and the image is of an apple. That’s the feedback.
4. The machine learns from the feedback.
5. Finally, if it comes across another image of an apple, it will be able to identify it correctly.
That’s reinforcement learning.
7. What are the predictor and target variables?
A target variable is the one you want to predict or forecast, and the predictor variables are the ones that actually affect the prediction.
Example – Suppose I want to predict the number of customers coming to my restaurant next Monday. The number of customers is the target variable, and the predictor variables will be things like: is it a holiday (a binary variable), the season of the year (maybe more people visit a restaurant in winter), and whether there is a festival (if there is, more people may come to the restaurant). These are your predictor variables.
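In code, the split between predictors and target is usually just a matter of which columns you hand to the model as inputs and which one as the output. A small sketch with invented restaurant records follows. The column names are hypothetical, and calling the two parts `X` and `y` is a common convention rather than a requirement.

```python
# Invented daily records for the restaurant example.
rows = [
    {"is_holiday": 1, "season": "winter", "is_festival": 0, "customers": 120},
    {"is_holiday": 0, "season": "summer", "is_festival": 0, "customers": 80},
    {"is_holiday": 0, "season": "winter", "is_festival": 1, "customers": 150},
]

predictor_names = ["is_holiday", "season", "is_festival"]
target_name = "customers"

# X holds the predictor variables, one list per record.
X = [[row[name] for name in predictor_names] for row in rows]

# y holds the target variable, one value per record.
y = [row[target_name] for row in rows]

print(X)
print(y)
```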
8. Different names for predictor and target variable
This question is here because, if you are into supervised modeling, you will explore a lot of material and go through multiple blogs and study resources. The different naming conventions will definitely confuse you. So, remember what is written below:
a. Features = Predictor Variable = Independent variable
b. Target variable = Dependent variable = Response Variable
9. Uses of Supervised Learning
– Predicting the future
– Demand Supply prediction
10. How to get labeled data?
To perform supervised learning, all you need is labeled data, i.e. a dataset which has the target values. There are three ways in which you can get labeled data:
a. Historic data – The restaurant example will again come to my rescue: suppose I have historical data for the last 3 years, so I have the target variable, which is the monthly number of customers.
b. Experiments to label data
c. Crowdsource labeled data
11. What are the packages in Python?
There are multiple packages which are very handy for supervised learning, but we will use the most common one, i.e. the scikit-learn (sklearn) package.
Other important libraries are TensorFlow and Keras.
12. What is scikit?
Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python.
It is licensed under a permissive simplified BSD license and is distributed with many Linux distributions, encouraging academic and commercial use.
The library is built upon SciPy (Scientific Python), which must be installed before you can use scikit-learn. This stack includes:
NumPy: Base n-dimensional array package
SciPy: Fundamental library for scientific computing
Matplotlib: Comprehensive 2D/3D plotting
IPython: Enhanced interactive console
Sympy: Symbolic mathematics
Pandas: Data structures and analysis
13. What is the main difference between classification problems and regression problems in machine learning?
Regression is used to predict continuous values. Classification is used to predict which class a data point is part of (discrete value).
Example: I have a house with W rooms, X bathrooms, Y square-footage and Z lot-size. Based on other houses in the area that have recently sold, how much (dollar amount) can I sell my house for? I would use regression for this kind of problem.
Example: I have an unknown fruit that is yellow in color, 5.5 inches long, diameter of an inch, and density of X. What fruit is this? I would use classification for this kind of problem to classify it as a banana (as opposed to an apple or orange).
14. Is logistic regression classification or regression?
Logistic regression is emphatically not a classification algorithm on its own. It is only a classification algorithm in combination with a decision rule that makes dichotomous the predicted probabilities of the outcome. Logistic regression is a regression model because it estimates the probability of class membership as a (transformation of a) multilinear function of the features.
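The two pieces described above, a probability estimate and a separate decision rule, can be seen in a few lines. This is a sketch under stated assumptions: the weights and bias are made-up numbers standing in for a fitted model, and the function names are illustrative.

```python
import math

def predict_probability(features, weights, bias):
    """Regression part: a linear function of the features passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Decision rule: turn the probability into a class label (0 or 1)."""
    return 1 if probability >= threshold else 0

# Hypothetical coefficients, as if they had already been learned from data.
weights = [0.8, -0.4]
bias = -0.2

probability = predict_probability([2.0, 1.0], weights, bias)
label = classify(probability)

print(f"estimated probability of class 1: {probability:.3f}")
print(f"label with threshold 0.5: {label}")
print(f"label with threshold 0.9: {classify(probability, threshold=0.9)}")
```

The same probability gives different labels under different thresholds, which is why the classification step is a choice layered on top of the regression model.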
15. What is data.shape?
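data.shape is an attribute (not a function call, so there are no parentheses) of a pandas DataFrame or NumPy array that gives you the dimensions of the dataset as a tuple: (number of rows, number of columns).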
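A quick illustration, assuming `data` is a pandas DataFrame. The column names and values are invented for the example.

```python
import pandas as pd

# A tiny made-up dataset: 3 rows, 2 columns.
data = pd.DataFrame(
    {
        "is_holiday": [1, 0, 0],
        "customers": [120, 80, 95],
    }
)

# shape is a tuple: (number of rows, number of columns).
n_rows, n_cols = data.shape
print(data.shape)
print(f"{n_rows} rows, {n_cols} columns")
```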
16. Types of classifiers?
There are different types of classifiers:
b. Naïve Bayes
c. Decision Tree
d. Logistic Regression
e. K-Nearest Neighbor
f. Artificial Neural Network
g. Support Vector Machine
17. What is KNN?
You will get 100s of definitions of KNN on the internet, but let’s keep it simple. The basic idea of K-Nearest Neighbor is to predict the label of any new item by looking at the labels of the k items closest to it. Okay, let’s understand it: you have a sample of animals whose labels you already know, a few labeled as 4-legged and others as 2-legged. Now you set the value of k to 3, which means the algorithm will look at the 3 nearest neighbors. When a new animal comes in, the algorithm places it on the same coordinates as the known samples, finds the 3 samples closest to it, and gives the new animal the label that most of those 3 neighbors carry. In this way every new item ends up with one label or the other.
18. How does KNN work?
KNN works by analogy. The idea is that you are what you resemble.
So when we want to classify a point we look at its K-closest (most similar) neighbors and we classify the point as the majority class in those neighbors.
KNN depends on two things: A metric used to compute the distance between two points and the value of “k” the number of neighbors to consider.
When “k” is a very small number, KNN can overfit: it will classify based only on the closest neighbors instead of learning a good separating frontier between classes. But if “k” is a very big number, KNN will underfit; in the limit, if k=n, KNN will assign every point to the class that has the most samples.
KNN can also be used for regression: just average the values of the k nearest neighbors of a point to predict the value for a new point.
One nice advantage of KNN is that it can work fine if you only have a few samples for some of the classes.
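The points above (majority vote, the role of the distance metric and of k, and averaging for regression) fit in a short plain-Python sketch. Euclidean distance is assumed as the metric, the data points are invented, and the function names are illustrative.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k training samples closest to `query` (Euclidean distance)."""
    return sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]

def knn_classify(train, query, k):
    """Majority class among the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k):
    """Average target value of the k nearest neighbors."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Toy labeled points: three of class "A", two of class "B".
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"),
]
query = (4.8, 5.1)  # sits right next to the "B" points

label_k3 = knn_classify(train, query, k=3)           # looks at nearby points only
label_kn = knn_classify(train, query, k=len(train))  # k = n: every point votes

# Toy regression data: one feature, numeric target.
train_reg = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
estimate = knn_regress(train_reg, (2.1,), k=3)

print(label_k3, label_kn, estimate)
```

With k=3 the query follows its close neighbors, while with k=n it is simply given the class that has more samples overall, which is the underfitting case described above.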
19. What is fitting the model?
Fitting the model is training a model, i.e. you pass two arguments, the independent variables and the dependent variable, and train your model on these data points.
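In scikit-learn this is the `fit` method, which takes the features and the target as its two arguments. Below is a minimal sketch using `KNeighborsClassifier`; the data points are invented, and naming the two arguments `X` and `y` is just the usual convention.

```python
from sklearn.neighbors import KNeighborsClassifier

# Independent variables (features): one row per sample. Toy values.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]]

# Dependent variable (target): one label per sample.
y = ["A", "A", "A", "B", "B"]

# Create the model, then fit (train) it on the two arguments.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Once fitted, the model can predict labels for new points.
prediction = model.predict([[4.8, 5.1]])
print(prediction[0])
```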
20. EDA using seaborn’s countplot
The countplot function of the seaborn library is a very useful function for plotting the count of each value of a categorical variable. Let’s suppose we have the Titanic dataset, which has a column “Survived”; it is a binary variable with 0 denoting not survived and 1 denoting survived. Plotting this using countplot is very easy.
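A minimal sketch of such a plot. The DataFrame here is a tiny hand-made stand-in with a "Survived" column, not the real Titanic data, and the non-interactive "Agg" backend is chosen only so that the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen; drop this line in a notebook

import pandas as pd
import seaborn as sns

# Toy stand-in for the Titanic data: 0 = not survived, 1 = survived.
titanic = pd.DataFrame({"Survived": [0, 1, 1, 0, 0, 1, 0, 0]})

# One bar per category, bar height = number of rows in that category.
ax = sns.countplot(x="Survived", data=titanic)
ax.set_title("Count of passengers by survival")

# The same counts as numbers, useful for checking the plot.
counts = titanic["Survived"].value_counts().to_dict()
print(counts)
```

In an interactive session you would typically follow this with `matplotlib.pyplot.show()`, or save the figure with `ax.figure.savefig(...)`.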
In the next blog, we will create a dataset and build our supervised model.
Keep Learning 🙂