vAlgo ML: AdaBoost Alternatives
Hello Learners,
Continuing with my vAlgo Machine Learning experiment (original post here).
In my previous post, I wrote about creating the Malicious URLs Model. I used a Decision Tree algorithm, and the accuracy came in at a whopping 3%. Wanting to increase that accuracy score, I decided to try a booster, the AdaBoost Classifier to be specific.
It did not work, literally. Whenever it got to the training step (model.fit), the kernel would always die, even after adjusting the parameters (n_estimators=50, learning_rate=1, random_state=0).
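For context, here is a minimal sketch of roughly what that AdaBoost attempt looked like; X_train and y_train are placeholders for the Malicious URLs features and labels built in the previous post, not code copied from it.

```python
from sklearn.ensemble import AdaBoostClassifier

# Sketch of the AdaBoost configuration described above.
# X_train / y_train are placeholders for the Malicious URLs features and labels.
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1, random_state=0)
clf.fit(X_train, y_train)  # the training step where the kernel kept dying
```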
My lab is an Ubuntu VM with Anaconda installed, which lets me run Jupyter notebooks locally. I don't have a GPU, so everything runs on CPU with that setup.
I did some research, and it turns out (as I expected) that my setup likely doesn't have enough power to run AdaBoost properly. But I still wanted to find a way to get that accuracy score up, so I asked ChatGPT for alternatives to AdaBoost. These were the recommendations it gave me.
1. **Random Forest:**
Random Forest is an ensemble method that combines multiple decision trees. It’s less prone to overfitting compared to a single decision tree, and it’s often more computationally efficient than AdaBoost. Random Forests can handle large datasets and are less sensitive to hyperparameter tuning.
```python
from sklearn.ensemble import RandomForestClassifier

# Create a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, max_depth=None, min_samples_split=2, random_state=0)

# Train and use the classifier as usual
```
2. **Gradient Boosting with Decision Trees:**
Gradient Boosting methods, such as Gradient Boosted Trees (GBT) or XGBoost, can be used with decision trees as base learners. These methods build trees sequentially, focusing on examples that were previously misclassified, which can lead to strong predictive performance.
```python
from sklearn.ensemble import GradientBoostingClassifier

# Create a Gradient Boosting classifier with decision trees as base learners
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)

# Train and use the classifier as usual
```
3. **LightGBM and CatBoost:**
These are gradient boosting libraries that are designed to be highly efficient and memory-friendly. They can handle large datasets and often perform better than traditional gradient boosting methods. Both libraries are known for their speed and effectiveness.
```python
# LightGBM
import lightgbm as lgb

# Create a LightGBM classifier
clf = lgb.LGBMClassifier(n_estimators=100)

# Train and use the classifier as usual
```

```python
# CatBoost
from catboost import CatBoostClassifier

# Create a CatBoost classifier
clf = CatBoostClassifier(iterations=100, depth=6, learning_rate=0.1)

# Train and use the classifier as usual
```
These alternatives provide various options that are less intensive than AdaBoost. Your choice depends on the specific requirements of your project and the computational resources available.
What next?
Well, I’m going to start with the Random Forest Classifier and see if that makes a difference. The goal now is to get the accuracy above 3%. Once I achieve that, I will see whether one of the other options can meet or exceed it. A rough sketch of that first run is below. Alright, coding time.
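Here is a minimal sketch of what that first experiment will look like; X and y are placeholders for the Malicious URLs features and labels from the previous post, so the exact preprocessing is assumed rather than shown.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X and y are placeholders for the Malicious URLs features and labels
# built in the previous post; they are not defined here.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Swap the single Decision Tree for a Random Forest
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Compare this score against the 3% baseline from the Decision Tree model
predictions = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```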
Wish me luck.
Thanks for reading.
Let's code something cool.
Ash, The Machine Learner.
Support The Project.
Buy me a coffee | Become my GitHub Sponsor | Become a Patreon