How to select number of trees in Random forest?
Question
Answers ( 7 )
One technique is to use GridSearchCV() in scikit-learn, where you tune the n_estimators parameter to find the right number of trees. You also have to pass an adequate list of candidate values for n_estimators.
Example – n_estimators = [10, 30, 100]
Typical starting values are 10, 30, or 100.
Passing too few trees will not actually give you the benefit of the Random Forest method, because you lose the advantage of building a large number of trees and averaging their outputs.
On the other hand, creating far more trees than required will increase the training time, and beyond a certain limit you will not get substantial gains in accuracy.
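As a minimal sketch of the approach above (the dataset and candidate values here are illustrative, not from the original answer):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy dataset standing in for your own data.
X, y = make_classification(n_samples=500, random_state=42)

# Candidate tree counts, as in the answer's example list.
param_grid = {"n_estimators": [10, 30, 100]}

search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_["n_estimators"])
```

GridSearchCV fits one forest per candidate value per fold, so keep the candidate list small when the dataset is large.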
There is no hard rule, but here is how I choose the number of trees in tree-based algorithms.
I start with the default, generally 100 trees. If the dataset is small, I tune n_estimators using GridSearchCV or a Bayesian optimization method. If the dataset is very large, then rather than tuning it, I try some large number of trees like 1000, 2000, 5000, 100000, etc., and use early stopping to handle overfitting.
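Random Forests in scikit-learn have no built-in early stopping, but a similar effect can be sketched with warm_start and the out-of-bag score: grow the forest in chunks and stop once the OOB score stops improving. The chunk size and improvement threshold below are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset standing in for a real one.
X, y = make_classification(n_samples=500, random_state=0)

# warm_start=True keeps existing trees and only adds new ones on refit;
# oob_score=True gives a validation-like score without a holdout set.
rf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)

best_oob, best_n, stall = 0.0, 0, 0
for n in range(25, 301, 25):        # grow in chunks of 25 trees
    rf.set_params(n_estimators=n)
    rf.fit(X, y)
    if rf.oob_score_ > best_oob + 1e-3:
        best_oob, best_n, stall = rf.oob_score_, n, 0
    else:
        stall += 1
        if stall >= 3:              # no meaningful OOB gain: stop early
            break
print(best_n)
```

This trades a few refits for not having to guess the tree count up front.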
We can use cross-validation techniques like GridSearchCV or RandomizedSearchCV to find these hyperparameters.
I use the GridSearchCV cross-validation technique and tune the n_estimators parameter.
I use early stopping with a large number of trees to handle overfitting. This seems best to me.
In a Random Forest, the more trees you build, the more bootstrap samples of your data you create, and the more samples you create, the more you reduce the variance of your predictions. But a point comes where you have enough samples, and further samples are essentially duplicating the data. To find the optimal number of trees, we can use cross-validation techniques like GridSearchCV or RandomizedSearchCV to tune the n_estimators hyperparameter.
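A minimal sketch of the randomized alternative mentioned above; instead of trying every candidate value, RandomizedSearchCV samples a few at random. The candidate list and n_iter below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy dataset standing in for your own data.
X, y = make_classification(n_samples=400, random_state=1)

# Sample 4 of these candidate tree counts rather than trying all 6.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1),
    param_distributions={"n_estimators": [10, 25, 50, 100, 150, 200]},
    n_iter=4, cv=3, random_state=1)
search.fit(X, y)
print(search.best_params_["n_estimators"])
```

Randomized search scales better than grid search when several hyperparameters are tuned at once.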
I would prefer either GridSearchCV or RandomizedSearchCV.