XGBoost 2.0, wait whaaaat?

Ahmedabdullah
Published in Tensor Labs
4 min read · Jan 29, 2024

Oh hi there, my fellow companion on the data journey! I must admit I'm a bit late this time: it's been a couple of months since the release of our very own favorite XGBoost's version 2.0. In my defense, I only came to know about it today and thought I should share it with all my data companions in case they missed it just like me. By now, every data enthusiast has probably met XGBoost, and it has most likely been a good friend throughout your journey, just as it has been for me. But guess what? With its recent update, our good friend has become even more amazing.

XGBoost (eXtreme Gradient Boosting) is an open-source software library which provides a regularizing gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Microsoft Windows, and macOS. From the project description, it aims to provide a “Scalable, Portable and Distributed Gradient Boosting (GBM, GBRT, GBDT) Library”. It runs on a single machine, as well as on the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask.

XGBoost 101: A Quick Recap

Before we dive into the new features, let’s refresh our memory about what XGBoost is. Think of XGBoost as your GPS for machine learning. It’s like plotting the best course to descend a hill, much like the way we calculate how each step affects the slope when hiking downhill. It’s smart and efficient, just like using the Newton-Raphson method to reach the bottom of that hill faster!

What’s New in XGBoost 2.0?

XGBoost 2.0 is not just an update; it’s a significant upgrade packed with new features, optimizations, and improvements. Let’s explore them under the following sections for clarity.

Unified Device Parameter [Updated]

  • Previous Version: Separate parameters for CPU and GPU.
  • XGBoost 2.0: A single ‘device’ parameter.
  • Simplifying life for developers, XGBoost 2.0 merges the CPU and GPU parameters into one. Think of it like a universal remote that controls all your devices instead of juggling multiple remotes.
# XGBoost 2.0: one 'device' parameter replaces the old GPU-specific settings
model = XGBClassifier(tree_method='hist', device='cuda')

Multi-Target Trees with Vector-Leaf Outputs [New]

  • Previous Version: Limited to single-target trees.
  • XGBoost 2.0: Supports multi-target regression and classification.
  • Imagine teaching a robot to not only water plants but also to adjust sunlight and temperature. This multi-tasking is what XGBoost 2.0 achieves with multi-target trees, enhancing prediction accuracy and preventing overfitting.

GPU-Based Approx Tree Method [New]

  • Previous Version: No GPU support for ‘approx’ tree method.
  • XGBoost 2.0: Initial support for running ‘approx’ on GPUs.
  • This is akin to upgrading from a regular bicycle to an electric one. The GPU support for the ‘approx’ tree method means faster computations, especially beneficial for large datasets.
# XGBoost 2.0
model = XGBClassifier(tree_method='approx', device='cuda')

Optimizing Histogram Size on CPU [New]

  • Previous Version: Histogram size wasn’t optimized, leading to excessive memory usage.
  • XGBoost 2.0: Introduces ‘max_cached_hist_node’ to limit histogram size.
  • It’s like having a smart wallet that expands or contracts based on your spending, ensuring your CPU’s memory is used efficiently.

Learning to Rank [Improved]

  • Previous Version: Basic ranking capabilities.
  • XGBoost 2.0: Enhanced with new parameters and NDCG as the default objective function.
  • Think of this as refining your recipe based on customer feedback in a restaurant. The new learning-to-rank implementation fine-tunes the results, making them more relevant and accurate.

Quantile Regression [New]

  • Previous Version: Not available.
  • XGBoost 2.0: Support for quantile regression.
  • This feature is like setting milestones in a marathon. Quantile regression helps predict not just the average but also the variability in your data, offering a comprehensive view of potential outcomes.

PySpark Interface [Improved]

  • Previous Version: Limited features.
  • XGBoost 2.0: Enhanced with GPU prediction, improved logs, and Python typing support.
  • It’s as if your car got an upgrade with a better navigation system, more informative dials, and a smoother steering experience.

Federated Learning Support [New]

  • Previous Version: Not supported.
  • XGBoost 2.0: Introduction of vertical federated learning.
  • Imagine a group of detectives (data sources) working on a case (model) together but without sharing their notes (data). That’s vertical federated learning, enhancing privacy and collaboration.

Improved External Memory Support [Improved]

  • Previous Version: Basic external memory support.
  • XGBoost 2.0: Enhanced performance with memory mapping.
  • This improvement is like moving from a cluttered room (inefficient memory usage) to a well-organized one (optimized memory usage), especially beneficial for large datasets.

Final Notes

XGBoost 2.0 is more than an update; it’s a significant leap forward. With its enhanced multi-tasking capabilities, improved memory management, and a host of new features, it sets a new standard in the machine learning landscape. Whether you’re a seasoned data scientist or a machine learning enthusiast, XGBoost 2.0 offers the tools and flexibility to push the boundaries of your predictive models. With this recent update, your friend and companion on the data journey is now equipped with some pretty amazing stuff. I wish you all the very best for the next steps in your data journey.

While we’re on the subject of partners on the data journey, I’d also love for you to think about a partner for your AI journey, especially if you have an idea that you think can really stand out and you’re now in need of an AI partner.

A partner that can not only take care of the technical side but also provide detailed consultation on how to take your idea to market, including market analysis and feature consultation.

That’s where TensorLabs comes in. TensorLabs is a group of exceptionally talented AI engineers and developers who, within 2023 alone, transformed five concepts from mere ideas into MVPs and then into fully operational products. Want to discuss your ideas with us? Feel free to reach out, and happy AI 🚀✨.

https://tensorlabs.io/
