Scalable machine learning with Apache Ignite, Python and Julia: from prototype to production

Peter Gagarinov
Alliedium
May 28, 2021

In the previous article, Boosting Jira Cloud app development with Apache Ignite[44], we explained the benefits of using Apache Ignite in combination with Spring Boot for building scalable and distributed backends for Jira add-ons. One important aspect of the backend design that was left without much attention there was the machine learning infrastructure. In this post I’ll explain why we started with a combination of Scikit-learn and Celery and ended up with a combination of Apache Ignite, Ray Serve[4],[5],[6], Scikit-learn and PyTorch.

Our Alliedium AIssistant Jira add-on[1] simplifies project management by automating ticket assignment, labeling and ranking by priority; it works by using ML to infer statistical patterns from existing Jira tickets. This requires a scalable backend infrastructure for online training and serving of a large number of both classical ML and Deep Learning models in parallel for different clients (treated as separate tenants). Initially we had a prototype of the ML pipeline implemented with Python’s Scikit-learn library, so it was quite natural to use the Python-based distributed task queue framework Celery as the model orchestration layer inside a Kubernetes cluster. PostgreSQL was our first choice of database (until we switched to Apache Ignite later).

As the number of users and the complexity of our ML models increased, we quickly started to feel that PostgreSQL was a limiting factor on the way to a fault-tolerant, scalable and distributed backend architecture. We could scale Celery by running more pods in Kubernetes, and we could scale the web server part of the backend by running more Spring Boot pods, but we could not scale PostgreSQL without switching to one of its distributed derivatives such as Greenplum[17], CitusDB[16] or TimescaleDB[18]. Manual sharding among different PostgreSQL instances was not an option either, as we wanted a more elegant solution that would be less painful to use in a multi-tenant setting with complex schemas, preferably without bulky ORM tools such as Hibernate. After evaluating a few alternatives we chose Apache Ignite because it

  • integrates with Java natively
  • is highly available and horizontally scalable
  • is fault-tolerant and distributed
  • supports distributed ACID transactions
  • provides both persistent and in-memory storage
  • supports SQL for distributed data (see the short sketch after this list)
  • supports user-defined distributed jobs
  • provides automatic failover (for jobs and DB connections)
  • provides Transparent Data Encryption for safety reasons
  • supports native configurations for deployment in Kubernetes
  • is free and open-source
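As a short illustration of the SQL-related points above, here is a minimal sketch using the Python thin client (pyignite); the table and its columns are hypothetical and serve only to show the flavor of the API:

```python
# Minimal sketch using the pyignite thin client; the table and columns are hypothetical.
from pyignite import Client

client = Client()
client.connect('127.0.0.1', 10800)  # default thin-client port of a local Ignite node

# Create a table backed by a distributed (partitioned) cache and insert a row.
client.sql(
    'CREATE TABLE IF NOT EXISTS PROJECT_TICKET '
    '(id INT PRIMARY KEY, project VARCHAR, assignee VARCHAR) '
    'WITH "template=partitioned"'
)
client.sql("INSERT INTO PROJECT_TICKET (id, project, assignee) VALUES (1, 'AIS', 'alice')")

# The same SQL works regardless of how many cluster nodes actually hold the data.
for row in client.sql('SELECT project, assignee FROM PROJECT_TICKET WHERE id = ?', query_args=[1]):
    print(row)

client.close()
```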

Switching from Celery to Apache Ignite + Ray Serve

We decided to keep Celery for the time being (though it was no longer a perfect fit, since Apache Ignite had everything we needed for running distributed user-defined jobs on the cluster; see the next section for details). The problem was that Celery didn’t have any native Java API, while the Apache Ignite Python thin client API didn’t have all the features we needed (see https://ignite.apache.org/docs/latest/thin-clients/getting-started-with-thin-clients). To work around this limitation we wrapped the full-featured thick Java client for Apache Ignite with the Python-Java bridge Py4J (https://www.py4j.org/).
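In practice the bridge boils down to starting a Py4J GatewayServer inside the JVM process that holds the Ignite thick client and calling it from Python. Below is a simplified sketch of the Python side only; the getIgnite() entry-point method and the cache name are hypothetical rather than the actual add-on code:

```python
# Simplified sketch of calling the Ignite thick Java client from Python via Py4J.
# It assumes a JVM process is already running a py4j.GatewayServer whose entry point
# exposes a started Ignite instance (the entry-point method name is hypothetical).
from py4j.java_gateway import JavaGateway

gateway = JavaGateway()                    # connects to the GatewayServer on localhost:25333
ignite = gateway.entry_point.getIgnite()   # hypothetical accessor returning org.apache.ignite.Ignite

# From here on we are calling the regular Ignite Java API, proxied by Py4J.
cache = ignite.getOrCreateCache('ml-models')
cache.put('tenant-42/model-version', 7)
print(cache.get('tenant-42/model-version'))

# Distributed jobs can be submitted the same way, e.g. via ignite.compute(),
# as long as the corresponding Java classes are deployed on the cluster nodes.
```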

Celery wasn’t a great fit for building a distributed ML serving engine simply because it is a generic task queue rather than a system specialized for ML serving. Ray Serve seemed a much better choice (see the Machine Learning Serving is Broken article[5] for details). However, Celery provided the ability to run generic, not necessarily ML-related, tasks on the cluster, which was important for us because PostgreSQL, being a pure SQL database, didn’t have any features for running user-defined jobs. The strong sides of Celery for our use case were

  • Celery is written in Python and is thus easier to integrate with Python-based ML frameworks (see the task sketch after this list)
  • An “At Least Once” delivery guarantee for Celery message queues (implemented via RabbitMQ)[38]
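For context, a generic Celery task in such a setup looks roughly like the sketch below; the broker URL, task name and task body are placeholders rather than our actual code:

```python
# Minimal sketch of a generic Celery task backed by RabbitMQ; names and URLs are placeholders.
from celery import Celery

app = Celery('alliedium_tasks', broker='amqp://guest:guest@rabbitmq:5672//')

@app.task(acks_late=True)  # acknowledge only after the task finishes ("at least once" semantics)
def retrain_tenant_models(tenant_id: str) -> str:
    # ...load the tenant's tickets, retrain the Scikit-learn models, persist them...
    return f'models retrained for tenant {tenant_id}'

# A producer (e.g. the web backend) would enqueue the task with:
#   retrain_tenant_models.delay('tenant-42')
```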

Apache Ignite, on the other hand, has built-in support for running distributed jobs on a cluster, which eliminated the need for Celery. We appreciated the following strong sides of Apache Ignite as a compute grid:

  • Native Java API for messages and distributed computing tasks
  • Built-in distributed basic ML models
  • Automatic connection failover for both thick and thin clients

Despite Apache Ignite having a set of built-in ML models, we decided not to use them (for the reasons explained below) and to keep using Python-based pipelines. The other problems we noticed with Apache Ignite as a compute grid are:

  • Weaker delivery guarantees; not suitable for critical messages (e.g. in finance)[42]
  • Built-in ML models lack certain features we need for our use case
  • The Python thin client supports neither the messaging nor the compute API[34]
  • Using the thick client API from Python requires the Py4J Python-Java bridge[36]

Celery was far from perfect either, because it

  • Requires a separate message broker (RabbitMQ) for submitting tasks[39]
  • Requires a separate results backend for large results[39]
  • Provides no out-of-the-box pure Java API[40]
  • Requires special care for RabbitMQ auto-failover when not run inside K8s[41]
  • Offers automatic connection failover only inside Kubernetes

In the end the strong sides of Apache Ignite outweighed the weak ones, as we found the following workarounds:

  • Weaker delivery guarantees were handled on the front-end side inside JavaScript
  • The thick-client Java API wrapped with Py4J was used instead of the less-featured Python thin client API

While we no longer needed Celery for running generic distributed computing tasks, we still needed a place to train and serve our ML models. Thus, after releasing a version with a combination of Celery and Apache Ignite, we decided to replace Celery with a combination of Apache Ignite’s built-in distributed computing capabilities and Ray Serve for training and serving ML models.
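To give a feel for what serving a model with Ray Serve involves, here is a minimal sketch of exposing a pickled Scikit-learn model over HTTP. It uses the current Ray Serve deployment API (which differs from the 1.x API we started with), and the model path and payload format are hypothetical:

```python
# Minimal Ray Serve sketch for serving a pickled Scikit-learn model over HTTP.
# The model path and the JSON payload format are hypothetical.
import pickle
import numpy as np
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=2)  # Ray Serve scales the replicas across the Ray cluster
class TicketClassifier:
    def __init__(self, model_path: str):
        with open(model_path, 'rb') as f:
            self.model = pickle.load(f)  # e.g. a fitted sklearn Pipeline

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        features = np.asarray(payload['features'], dtype=float).reshape(1, -1)
        return {'label': self.model.predict(features).tolist()}

# Deploys the model behind http://127.0.0.1:8000/ by default.
serve.run(TicketClassifier.bind('ticket_classifier.pkl'))
```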

Evaluating Julia language as Python alternative for ML

While some of our models are Deep Learning-based and run on GPU, most of our models are still in the classical ML camp and run on CPU. Training speed is a very important factor for us, as it directly impacts user wait times and, consequently, the user experience. Scikit-learn models run quite fast, but we wanted to see if we could make the classical ML models even faster and started to evaluate the following alternatives: Linfa (Rust), CuML (RAPIDS), MLJ (Julia)[48] and Apache Ignite ML.

Linfa didn’t seem mature enough for our case (e.g. it didn’t have a multinomial logistic regression classifier at the time of writing this post), though it turned out to be very fast.

CuML requires a GPU, which is why we didn’t seriously consider it.

Out of all four options we considered, Julia’s MLJ was the most promising one. Firstly, we were already familiar with Julia from a few previous projects. Secondly, Julia was built for speed and doesn’t have the overhead introduced by calling NumPy functions (written in C) from high-level Python functions in Scikit-learn.

As for Apache Ignite ML, it wasn’t built for fast ML model training on small datasets (~10k observations), but we decided to include it in our comparison anyway.

Here are the results of our comparison between Scikit-learn (Python), MLJ (Julia) and Apache Ignite ML:

We noticed that

  • ML in both Julia and Python is much faster than Apache Ignite ML for our case (~8–10k observations)
  • Apache Ignite ML offers a limited set of optimization solvers (e.g. only LogisticRegressionSGDTrainer for LogisticRegressionModel, while Scikit-learn offers 5 solvers for logistic regression)
  • Apache Ignite ML has neither nested cross-validation[51] nor stratified cross-validation[52], both of which Scikit-learn provides (see the sketch below)

and concluded that Python is stronger than Julia’s MLJ when it comes to

  • The Apache Ignite thin client for Python (there is no such client for Julia)
  • Ray Serve (e.g. Genie.jl + Dagger.jl is not an equivalent replacement)[4],[5],[6]
  • A much more mature ML ecosystem
  • Raw speed in some cases: Scikit-learn is sometimes faster than MLJ
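To make the cross-validation point above concrete, here is a minimal Scikit-learn sketch (on synthetic data, purely for illustration) of the kind of setup Apache Ignite ML lacks: a multinomial logistic regression with a selectable solver evaluated under stratified cross-validation:

```python
# Scikit-learn sketch: multinomial logistic regression with a selectable solver,
# evaluated with stratified k-fold cross-validation (synthetic data for illustration only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=10_000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)

model = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # preserves class ratios per fold

scores = cross_val_score(model, X, y, cv=cv, scoring='f1_macro')
print(scores.mean(), scores.std())
```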

Python and Julia go head to head for

  • Calling Apache Ignite thick client Java API: Py4J (Python)[36] vs JavaCall.jl (Julia)[53]
  • Calling Apache Ignite thick client C++ API: Cython (Python)[54] vs CxxWrap.jl (Julia)[55]

Julia in its turn is stronger than Python when it comes to

  • Being a more flexible language (custom models and transformations do not have to be written in C to be fast; they can be written in Julia itself)
  • Easier parallelism (native threads in Julia vs GIL in Python)[56]

Based on the results of this comparison, the decision was to stick with Python (Scikit-learn for classical ML algorithms and PyTorch for Deep Learning models) for now and to keep an eye on the Julia ML ecosystem, most notably the MLJ framework[48].

As for Apache Ignite ML, it showed lower performance for our use case (~10k observations) as expected, simply because it was designed for large datasets distributed across multiple nodes of an Apache Ignite cluster and wasn’t optimized for small datasets like ours.

The Final Architecture Layout

After switching from PostgreSQL + Celery to Apache Ignite + Ray Serve we ended up with the following architecture layout of the backend:

The blue hexagons represent Apache Ignite nodes (each running in a separate Kubernetes pod when the cloud deployment is used), while the green hexagons represent the Ray Serve nodes. In our case both blue and green pods run inside the same Kubernetes cluster, and the grey hexagons represent the GPU-enabled Deep Learning containers managed by the Amazon SageMaker service. The diagram above hides certain aspects of the design, such as the Spring Boot-based web server pods and the load balancer in front of them. When an Alliedium client performs an action requiring training of an ML model (or triggering inference for an already trained model), the load balancer forwards the request to one of the Spring Boot Kubernetes pods, which in its turn forwards it to one of the Ray Serve pods. If the request is related to a classical ML model, the Ray Serve pod handles it all by itself, running all the calculations on CPU. Deep Learning model-related requests are forwarded by Ray Serve to the GPU-enabled Amazon SageMaker pods.
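The routing step inside a Ray Serve deployment can be sketched roughly as follows; the deployment structure, the SageMaker endpoint name and the request schema are placeholders for illustration, not our actual configuration:

```python
# Rough sketch of the routing step inside a Ray Serve deployment:
# classical ML requests are handled in-process on CPU, Deep Learning requests
# are forwarded to a GPU-backed Amazon SageMaker endpoint (all names are placeholders).
import json
import boto3
from ray import serve
from starlette.requests import Request

@serve.deployment
class ModelRouter:
    def __init__(self):
        self.sagemaker = boto3.client('sagemaker-runtime')
        self.classical_models = {}  # tenant_id -> fitted Scikit-learn model, loaded elsewhere

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        if payload['model_type'] == 'classical':
            model = self.classical_models[payload['tenant_id']]
            return {'label': model.predict([payload['features']]).tolist()}
        # Deep Learning: delegate to the GPU-enabled SageMaker endpoint.
        response = self.sagemaker.invoke_endpoint(
            EndpointName='alliedium-dl-endpoint',   # placeholder endpoint name
            ContentType='application/json',
            Body=json.dumps({'features': payload['features']}),
        )
        return json.loads(response['Body'].read())

serve.run(ModelRouter.bind())
```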

References

[1] Alliedium AIssistant Jira App
[2] Johansson Lovisa, Running Celery with RabbitMQ, www.cloudamqp.com, 2019
[3] Scikit-learn: Machine Learning in Python, scikit-learn.org, ver. 0.24, 2020
[4] Ray Serve: Scalable and Programmable Serving, docs.ray.io, ver. 1.3.0, 2021
[5] Mo Simon, Machine Learning Serving is Broken: And How Ray Serve Can Fix it, medium.com, 2020
[6] Oakes Edward, The Simplest Way to Serve your NLP Model in Production with Pure Python, medium.com, 2020
[7] How would you compare Scikit-learn with PyTorch?, www.quora.com, 2020
[8] K Dhiraj, Why PyTorch Is the Deep Learning Framework of the Future, medium.com, 2019
[9] Chiniara Dan, Installing PostgreSQL for Mac, Linux, and Windows, medium.com, 2019
[10] Atlassian Connect Spring Boot, bitbucket.org, ver. 2.1.6, 2021
[11] Oliveira Junior, The best and easy way to handle database migrations (version control), medium.com, 2019
[12] Gopal Vineet, Move fast and migrate things: how we automated migrations in Postgres, medium.com, 2019
[13] PostgreSQL: Appendix D. SQL Conformance, www.postgresql.org, ver. 13.2, 2021
[14] PostgreSQL vs SQL Standard, wiki.postgresql.org, 2020
[15] Kuizinas Gajus, Lessons learned scaling PostgreSQL database to 1.2bn records/month: Choosing where to host the database, materialising data and using database as a job queue, medium.com, 2019
[16] Slot Marco, Why the RDBMS is the future of distributed databases, ft. Postgres and Citus, www.citusdata.com, 2018
[17] Knoldus Inc., Want to know about Greenplum?, medium.com, 2020
[18] TimescaleDB 2.0: A multi-node, petabyte-scale, completely free relational database for time-series, blog.timescale.com, 2020
[19] Chen Neil, Rise and Fall for an expected feature in PostgreSQL — Transparent Data Encryption, highgo.ca, 2020
[20] PostgreSQL Transparent Data Encryption, www.cybertec-postgresql.com, 2021
[21] Huang Cary, Approaches to Achieve in-Memory Table Storage with PostgreSQL Pluggable API, highgo.ca, 2020
[22] Westermann Daniel, Can I put my temporary tablespaces on a RAM disk with PostgreSQL?, blog.dbi-services.com, 2020
[23] PostgreSQL: 22.6. Tablespaces, www.postgresql.org, ver. 13.2, 2021
[24] Ringer Craig, Putting a PostgreSQL tablespace on a ramdisk risks ALL your data, 2014
[25] Apache Ignite: ACID Transactions with Apache Ignite, ignite.apache.org, ver. 2.10.0, 2021
[26] Apache Ignite: Cluster Snapshots: Current Limitations, ignite.apache.org, ver. 2.10.0, 2021
[27] Ignite in-memory + other SQL store without fully loading all data into Ignite, Apache Ignite Users, 2020
[28] SQL Conformance, ignite.apache.org, ver. 2.10.0, 2021
[29] Apache Ignite: SQL Transactions, ignite.apache.org, ver. 2.10.0, 2021
[30] Using Spring Boot: 8. Developer Tools, 8.2.7. Known Limitations, docs.spring.io, ver. 2.4.6, 2021
[31] ClassCastException while fetching data from IgniteCache (with custom persistent store), Apache Ignite Users, 2016
[32] Spring Session and Dev Tools Cause ClassCastException, github.com/spring-projects/spring-boot, 2017
[33] Bhuiyan Shamim, A Simple Checklist for Apache Ignite Beginners (5. Ghost Nodes), dzone.com, 2019
[34] Apache Ignite: Thin Clients Overview, ignite.apache.org, ver. 2.10.0, 2021
[35] Kulichenko Valentin, Apache Ignite: Client Connectors Variety, dzone.com, 2020
[36] Py4J; A Bridge between Python and Java, py4j.org, ver. 0.10.9.2, 2021
[37] MavenRepository: Ignite Spring, ver. 2.10.0, 2021
[38] RabbitMQ: Reliability Guide, Acknowledgements and Confirms, ver. 3.8.16, 2021
[39] First Steps with Celery: Configuration, docs.celeryproject.org, ver. 5.1.0, 2021
[40] Celery: Message Protocol, docs.celeryproject.org, ver. 5.1.0, 2021
[41] Paudice Genny, High availability with RabbitMQ, blexin.com, 2019
[42] Messaging Reliability, Apache Ignite Users, 2016
[43] Apache Ignite: Python Thin Client, ignite.apache.org, ver. 2.10.0, 2021
[44] Peter Gagarinov & Ilya Roublev, Boosting Jira Cloud app development with Apache Ignite, medium.com, 2020
[45] Apache Ignite JavaDoc: Interface Binarylizable, ignite.apache.org, ver. 2.10.0, 2021
[46] Apache Ignite: SQL API, Query Entities, ignite.apache.org, ver. 2.10.0, 2021
[47] Unable to query system cache through Visor console, Apache Ignite Users, 2017
[48] A Machine Learning Framework for Julia, alan-turing-institute.github.io, ver. 0.16.4, 2021
[49] Credit Card Fraud Detection dataset, kaggle.com, 2018
[50] Chaudhri Akmal, Using Apache Ignite’s Machine Learning for Fraud Detection at Scale, dzone.com, 2018
[51] Brownlee Jason, Nested Cross-Validation for Machine Learning with Python, machinelearningmastery.com, 2021
[52] Brownlee Jason, How to Fix k-Fold Cross-Validation for Imbalanced Classification, machinelearningmastery.com, 2020
[53] Call Java programs from Julia, juliainterop.github.io, ver. 0.7.7, 2021
[54] Using C++ in Cython, cython.readthedocs.io, ver. 0.29.6, 2019
[55] Janssens Bart, Wrapping a C++ library using CxxWrap.jl, JuliaCon, 2020
[56] Ajitsaria Abhinav, What Is the Python Global Interpreter Lock (GIL)?, realpython.com, 2018
