## Ada Boost Algorithm in Python

The gist of the adaptive boosting algorithm in layman’s terms: if you want to improve the performance of a class, you should concentrate on raising the average marks of the class. To raise the average, you need to focus on the weaker section of the class, because the toppers will perform well anyway. The toppers might dip from 95% to 90%, but that won’t matter much if you can lift the bottom 10 students from 35% to 80%, which is relatively easier than training the toppers to improve from 90% to 95%.

When I started my career as a Decision Scientist, I had very limited knowledge of anything even remotely close to the domain. With time I was exposed to regression models, which tempted me to try new algorithms on my data sets.

Linear and Logistic Regression are still the best algorithms to start exploring Data Science with. But you will soon start to feel that there is a lot more to regression than it first appears.

Before you start exploring XGBoost, the queen algorithm of Kaggle solutions, you should learn about Gradient Boosting (GBM), and before exploring GBM, you should understand Ada Boost.

`Ada Boosting -> Gradient Boosting -> XGBoosting`

Boosting in general is a method to empower weak learners, i.e. it combines many weak learners into one strong learner.

Let’s take a simple example: you have a dataset in which you are predicting the sale of cakes in a particular region. The strong learners here are things like festivals and birthdays, i.e. whenever there is a festival or a birthday, the sale of cakes increases.

A weak learner could be something like temperature or rainfall, which on its own predicts sales only slightly. The challenge is to convert it into a strong learner to sharpen the prediction.

Ada Boost starts by training a decision tree in which each observation is assigned an equal weight. Then comes the interesting part: once the first tree is evaluated, you know which observations it predicts well and which it gets wrong. Ada Boost lowers the weight of the well-predicted observations and gives more weight to the poorly predicted ones.

The second tree is thus grown on the reweighted observations.

```
Tree 1 = All the observations with equal weight
Tree 2 = More weight to the observations Tree 1 got wrong and less weight to the ones it got right
```

The error is then calculated on the second tree, and the same reweighting is repeated for every further tree. The prediction of the final ensemble model is therefore the weighted sum of the predictions made by the individual tree models.
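The reweighting loop described above can be sketched in a few lines. This is only a minimal illustration of discrete Ada Boost for a two-class problem, not what scikit-learn does internally; the helper names `fit_adaboost` and `predict_adaboost`, the synthetic dataset, and the choice of 20 rounds are my own assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=10):
    """Hypothetical helper: discrete Ada Boost for labels in {-1, +1} with depth-1 trees."""
    n = len(y)
    weights = np.full(n, 1.0 / n)  # Tree 1: every observation gets equal weight
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        # Weighted error of this tree, clipped to keep the log finite
        error = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - error) / error)  # this tree's say in the final vote
        # Raise the weight of misclassified observations, lower the rest
        weights = weights * np.exp(-alpha * y * pred)
        weights /= weights.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def predict_adaboost(stumps, alphas, X):
    """Sign of the alpha-weighted sum of the individual tree predictions."""
    total = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(total >= 0, 1, -1)


# Synthetic two-class data, made up purely for this demo
X_demo, y01 = make_classification(n_samples=300, n_features=5, random_state=0)
y_demo = np.where(y01 == 1, 1, -1)

stumps, alphas = fit_adaboost(X_demo, y_demo, n_rounds=20)
train_accuracy = (predict_adaboost(stumps, alphas, X_demo) == y_demo).mean()
print("Training accuracy of the hand-rolled ensemble:", train_accuracy)
```

Each `alpha` is the weight a tree gets in the final weighted sum: the lower a tree's weighted error, the larger its say.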

The major difference between Ada Boost and Gradient Boosting is the way the shortcomings of the weak learners are handled. While Ada Boost increases the weight of the observations that were predicted poorly, Gradient Boosting uses a loss function: each new learner is fitted to the gradient of that loss.
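To make the contrast concrete, here is a rough sketch of the Gradient Boosting idea for a regression problem with squared-error loss, where the negative gradient is simply the residual. The synthetic sine data, the tree depth, the learning rate of 0.1 and the 50 rounds are all assumptions picked for the demo, not recommended settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, made up for this demo: a noisy sine curve
rng = np.random.default_rng(0)
X_gb = rng.uniform(0, 6, size=(200, 1))
y_gb = np.sin(X_gb[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full(200, y_gb.mean())  # start from a constant prediction
mse_history = [np.mean((y_gb - prediction) ** 2)]

for _ in range(50):
    residuals = y_gb - prediction  # negative gradient of 0.5 * squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X_gb, residuals)
    prediction += learning_rate * tree.predict(X_gb)  # small step towards the residuals
    mse_history.append(np.mean((y_gb - prediction) ** 2))

print("MSE before boosting:", mse_history[0])
print("MSE after 50 rounds:", mse_history[-1])
```

Notice that no observation weights appear anywhere: every new tree is trained on what the current ensemble still gets wrong, as measured by the loss.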

A loss function measures how well a machine learning or statistical model fits empirical data of a certain phenomenon (e.g. speech or image recognition, predicting the price of real estate, describing user behavior on a web site). The lower the loss, the better the fit.

The choice of loss function depends on the problem we want to solve. If we are predicting the number of tickets created in a particular month, the loss function will be based on the difference between the actual and predicted values, for example the squared error.
If you want to predict whether a person is suffering from a particular disease, a classification loss such as log loss is the usual choice, and a confusion matrix helps you evaluate the resulting predictions.
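A quick sketch of both cases using scikit-learn's metric functions. The ticket counts and disease probabilities below are made-up numbers for illustration only.

```python
from sklearn.metrics import confusion_matrix, log_loss, mean_squared_error

# Regression case: tickets created per month (made-up values)
actual_tickets = [120, 135, 150]
predicted_tickets = [110, 140, 150]
mse = mean_squared_error(actual_tickets, predicted_tickets)
print("Mean squared error:", mse)

# Classification case: 1 = has the disease, 0 = does not (made-up values)
has_disease = [1, 0, 1, 0]
predicted_prob = [0.9, 0.2, 0.6, 0.4]  # predicted probability of disease
ll = log_loss(has_disease, predicted_prob)
print("Log loss:", ll)

# The confusion matrix needs hard labels, so threshold the probabilities at 0.5
predicted_label = [1 if p >= 0.5 else 0 for p in predicted_prob]
cm = confusion_matrix(has_disease, predicted_label)
print("Confusion matrix:")
print(cm)
```

Log loss penalises confident wrong probabilities heavily, while the confusion matrix only counts how many hard predictions landed in each cell.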

One of the biggest motivations for using Gradient Boosting is that it allows one to optimize a user-specified cost function, instead of a fixed loss function that usually offers less control and does not necessarily correspond to real-world applications. Below is the flow diagram of the prediction process

Let’s build our first Ada Boost model

```
from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import metrics
```

First, import all the important packages in Python. The code above is ready to use, but I would recommend typing it out yourself.
Now load the famous iris dataset

```
iris_dataset = datasets.load_iris()
X = iris_dataset.data
y = iris_dataset.target
```

Taking the test size as 0.25

`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25) `
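Because the split is random, every run gives slightly different train and test sets. If you want a repeatable split, one option is the sketch below; the `random_state=42` seed and the `stratify=y` argument are my additions and not part of the original walkthrough.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris_dataset = datasets.load_iris()
X = iris_dataset.data
y = iris_dataset.target

# random_state fixes the shuffle; stratify keeps the class proportions in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
```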

Creating our classifier

n_estimators = The maximum number of estimators at which boosting is terminated. In case of a perfect fit, the learning procedure is stopped early. The default value of n_estimators is 50

learning_rate = Shrinks the contribution of each classifier by learning_rate. There is a trade-off between learning_rate and n_estimators: a smaller learning rate usually needs more estimators

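base_estimator = The base estimator from which the boosted ensemble is built. By default it is a decision tree with max_depth=1. Newer scikit-learn releases call this parameter estimator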

`classifier = AdaBoostClassifier(n_estimators=40, learning_rate=1)`
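If you want to see the base estimator spelled out, here is a small sketch that passes a depth-1 decision tree explicitly. I pass it as the first positional argument on the assumption that this sidesteps the `base_estimator` / `estimator` naming difference between scikit-learn versions; the `random_state=0` seed is also my addition.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = datasets.load_iris(return_X_y=True)

# A decision stump: a tree with a single split
stump = DecisionTreeClassifier(max_depth=1)

clf = AdaBoostClassifier(stump, n_estimators=40, learning_rate=1.0, random_state=0)
clf.fit(X_iris, y_iris)

# estimators_ holds the trees that were actually built (at most n_estimators)
print("Trees actually built:", len(clf.estimators_))
print("Depth of the first tree:", clf.estimators_[0].get_depth())
```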

Fitting the classifier on the training data

`model_1 = classifier.fit(X_train, y_train)`

Predicting the response for test dataset

`y_pred = model_1.predict(X_test)`

Checking the accuracy of the model

`print("Model_1 accuracy",metrics.accuracy_score(y_test, y_pred))`
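A single train/test split gives an accuracy that moves around from run to run. As a rough cross-check, you could average the accuracy over several folds; the 5 folds and the `random_state=0` seed below are assumptions for this sketch.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X_all, y_all = datasets.load_iris(return_X_y=True)

# 5-fold cross-validation: train on 4 folds, score on the remaining one, repeat
fold_scores = cross_val_score(
    AdaBoostClassifier(n_estimators=40, random_state=0), X_all, y_all, cv=5
)
print("Accuracy per fold:", fold_scores)
print("Mean accuracy:", fold_scores.mean())
```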

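Complete Code (note that this run uses test_size=0.35 and n_estimators=50, and the split is random, so your accuracy may differ)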

```
from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import metrics

iris_dataset = datasets.load_iris()
X = iris_dataset.data
y = iris_dataset.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35)

classifier = AdaBoostClassifier(n_estimators=50,
                                learning_rate=1)
model_1 = classifier.fit(X_train, y_train)
y_pred = model_1.predict(X_test)
print("Model_1 accuracy",metrics.accuracy_score(y_test, y_pred))
```
`Model_1 accuracy 0.9622641509433962`

If you have made it this far, you should be comfortable with the basic concepts and, above all, able to build a simple adaptive boosting model.
The way Ada Boost works:
It takes the training set and repeatedly trains a weak model on it, checking after each round which observations of the training dataset were predicted correctly.
The main part is that it assigns a higher weight to the observations that were predicted wrongly, so the next model concentrates on them. This process iterates until the complete training data is fitted without any error or until the specified maximum number of estimators is reached.
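You can watch this iteration happen with `staged_score`, which reports the accuracy of the ensemble after each additional tree. This is just a sketch; the seeds and the 0.25 test size are my assumptions.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X_s, y_s = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_s, y_s, test_size=0.25, random_state=0)

booster = AdaBoostClassifier(n_estimators=40, random_state=0).fit(X_tr, y_tr)

# One test accuracy per boosting round: after 1 tree, 2 trees, and so on
staged = list(booster.staged_score(X_te, y_te))

for n_trees in sorted({1, min(5, len(staged)), len(staged)}):
    print(n_trees, "trees -> test accuracy", round(staged[n_trees - 1], 3))
```

If boosting stops early, `staged` simply has fewer entries than `n_estimators`.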

Keep Learning 🙂

The Data Monk