Speeding up Xgboost models with Treelite

Machine learning, thou art too slow

Yuri Veremeyenko
Homeday
5 min read · Dec 16, 2019


In our first two articles about the Homeday Price Explorer (part one, part two), we touched on applying machine learning to real estate.

Today, let’s take a peek under the hood. We’re going to examine the next thing a developer usually stumbles upon after getting the machine learning to work right: performance.

We’ll deal with Python, but you don’t need to be a Python expert; it’s all pretty straightforward.

Our goal is to measure the response time of our machine learning API (hint: it has not been fast) and to make it faster.

TLDR

If you use Xgboost, check out treelite. Treelite allows you to generate C sources from your Xgboost models, so you can compile the whole trained model into a shared library. In our case, using compiled treelite models gave us:

  • roughly ten times less storage space needed for compiled models;
  • ability to preload compiled models (with plain Xgboost we just ran out of RAM);
  • on average, about 10x faster prediction;
  • on average, about 20% faster response time of our machine learning API.

Machine learning API

To give a little bit of context, we’re using gradient-boosted decision tree models, powered by Xgboost.

The training process is automated and produces Xgboost models that can then estimate prices for properties.
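Just to illustrate the shape of that step, training and saving such a model looks roughly like this (a minimal sketch with made-up data and parameters, not our actual pipeline):

import numpy as np
import xgboost as xgb

# made-up stand-ins for the real training data: property features and prices
X_train = np.random.rand(1000, 40)
y_train = np.random.rand(1000) * 500_000

dtrain = xgb.DMatrix(X_train, label=y_train)
booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=100)

# persist the model so the API can load and serve it
booster.save_model("model.bin")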

After training, the models are deployed behind an API.

There are two main endpoints that this API provides:

  • GET /prices/area : this returns prices for a specific area, like a city, district, zip code, or building block
  • GET /prices/estimation : this returns a price estimation (aka valuation or appraisal) for a specific property

The GET /prices/area endpoint is used by the Price Explorer and can be cached: areas like cities or zip codes don’t usually change their shapes within days or weeks. So we can introduce a caching layer between the Price Explorer and the ML service:

machine learning and the price caching layer

Here, the PriceService works as a caching layer that stores prices in the database: if a subsequent request carries the same {address, property_type, marketing_type}, the price is fetched from the DB, sparing the machine learning API call.
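In code, the idea boils down to something like this (a hypothetical sketch, with an in-memory dict standing in for the database and fetch_from_ml_api standing in for the actual API client):

# hypothetical sketch of the PriceService caching logic
price_cache = {}  # stands in for the prices table in the DB

def get_area_price(address, property_type, marketing_type, fetch_from_ml_api):
    key = (address, property_type, marketing_type)
    if key in price_cache:
        return price_cache[key]  # cache hit: no ML API call needed
    price = fetch_from_ml_api(address, property_type, marketing_type)  # cache miss
    price_cache[key] = price
    return price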

For GET /prices/estimation, each request specifies a number of property-specific parameters, such as:

  • coordinates (lat, lng)
  • construction year
  • number of rooms
  • living space
  • floor (if any)
  • basement (if any)
  • number of balconies
  • parking (if any)
  • bathrooms (if any)
  • geo-specific features (separate DB calls)
  • and about 40 other features

The problem here is that a call to GET /prices/estimation is slow: it takes about nine seconds to return results, which is enough to make the service look unresponsive, degrade the user experience, and make any kind of responsive UI very hard to implement.

There’s no easy way to apply caching here directly either, as a change in any of the 40+ parameters invalidates the cache.

So let’s try to speed this endpoint up. The first step is to understand what exactly makes it slow, which brings us to profiling.

Profiling

Python has a built-in cProfile module which is pretty straightforward to use. So let’s profile GET /prices/estimation:

from estimation import estimate
import cProfile

pr = cProfile.Profile()
pr.enable()
estimate(params)
pr.disable()
pr.dump_stats('/tmp/profiling/estimate.dump')

In the snippet above, estimate, imported from the estimation package, is the actual call that returns the estimated price for the property, and params is a dictionary of the features described above.
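Just for illustration, params might look something like this (the keys here are made up; the real feature set has 40+ entries):

# a made-up example of the params dictionary
params = {
    "lat": 52.52,
    "lng": 13.405,
    "construction_year": 1995,
    "number_of_rooms": 3,
    "living_space": 82.5,  # square meters
    "floor": 2,
    "basement": True,
    # ... balconies, parking, bathrooms, geo features, and ~40 more
}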

Now that we have the profiler dump, there are a number of ways to visualize the profiling data.
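For instance, the standard library’s pstats module can print a plain-text summary of the same dump:

import pstats

stats = pstats.Stats('/tmp/profiling/estimate.dump')
stats.sort_stats('cumulative').print_stats(10)  # top 10 calls by cumulative time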

In this article, we’ll use snakeviz because it is super easy to use and it gives you a very clear visual representation of how long your function calls take: the longer the function bar on your screen, the longer the function takes to execute.
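snakeviz is a pip install away, and you point it at the dump file from the command line (it opens an interactive view in your browser):

pip install snakeviz
snakeviz /tmp/profiling/estimate.dump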

Here’s snakeviz showing the profile stats dump from the code above:

Profiling the estimate call

Here, we can see the following:

  • a total of 8.72 sec has been spent in estimate.py
  • process_df.py took 5.48 sec
  • predict_from_df.py (the magenta rectangle on the picture above) took 1.9 sec
  • a bunch of other calls took much less overall time

At the moment, we can’t change dataframe processing, as our current models require some support data to be present, so speeding up process_df.py is a topic for another article.

For now, let’s take a closer look at predict_from_df.py:

zooming into predict_from_df.py

The core.py call that takes 95% of the total time in this frame looks like this:

# somewhere above
import xgboost as xgb

est = xgb.Booster({"nthread": 4})  # init model
est.load_model(file)

Optimizing model load

So we spend just a bit less than two seconds loading our Xgboost model. It feels too long; what can we do to make it faster?

Our data science expert Dr. Artiom Kovnatsky suggested using the awesome treelite library. Treelite allows you to generate C source files from your Xgboost models, so you can compile the whole trained model into a shared library and use it to make predictions.

Treelite even provides a helper method to get a compiled shared lib:

import treelite

# load the standard Xgboost model
model = treelite.Model.load(bin_model_path, model_format='xgboost')

# compile the model into a shared library (*.so)
model.export_lib(
    toolchain='clang',
    libpath=compiled_model_path,
    params={"parallel_comp": parallel_comp},  # this one is worth a separate story :)
    nthread=4,
    verbose=True,
)

Having compiled the models, we can change the estimation code to:

import treelite

est = treelite.runtime.Predictor(model_path, verbose=True)
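To get a prediction out of the compiled model, the feature matrix is wrapped in a treelite Batch. A rough sketch, based on the 0.x runtime API we used at the time (newer treelite releases moved this into a separate treelite_runtime package):

import numpy as np

# est is the Predictor loaded above; the row width must match
# the number of features the model was trained on
features = np.random.rand(1, 40).astype(np.float32)  # dummy feature row
batch = treelite.runtime.Batch.from_npy2d(features)
prediction = est.predict(batch)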

And now, let’s profile the things again!

Results

Having switched from xgboost to treelite, let’s get back to the initial profiling call:

from estimation import estimate
import cProfile

pr = cProfile.Profile()
pr.enable()
estimate(params)
pr.disable()
pr.dump_stats('/tmp/profiling/estimate.dump')

And point snakeviz at it:

It’s not obvious at first, but predict_from_df.py now takes so little time that you have to deliberately look for it. Do you see the little magenta rectangle in the image above? ;)

By using treelite, we cut the prediction time from 1.8 sec down to 0.12 sec, which is… well, 15 times faster. We can call ourselves 15x developers now (pun intended :).

The overall response time also decreased by about 20%.

Next steps

While it’s been very interesting (and relatively simple) to optimize the Xgboost model load, the next topic will be optimizing dataframe processing, which now takes the majority of the time. So stay tuned for further updates!

Do you like what you see?

Give us a clap, leave a comment, or share anywhere. We appreciate feedback.

Also, check out the other articles in our blog; we deal with many interesting things.

And if you like what we do and could consider joining our dev team, check out our openings!
