Microservices and Kafka — Part 2

Evolution of the eBay Kleinanzeigen platform architecture

--

The biggest German classifieds platform, eBay Kleinanzeigen, was built in 2009 as a scalable web platform to process 10K new ads per day. We began with a plain old database backend to store all the information, a search backend based on Apache Solr™ (with a Master/Slave setup) to publish the content, and some web servers to talk to the public. Done.

We launched with around 700K live ads. People still used newspapers to sell used goods at the time, and the smartphone era had just begun. After several years with this architecture, it was time to evolve and take our technology stack to the next level: by then, 30 million unique users were visiting our platform, and at peak times we handled more than 50K requests per second.

We began our journey with Apache Kafka® in 2015, building on the microservice architecture blueprint described in Part 1 of our series.

The joy of microservices

Messaging

We introduced Kafka to break out from the monolith.

Like other platforms, we had the idea to inform our users about new content on our classifieds platform. We introduced a model to save a search and send push notifications whenever new results were available. Could we do this in real time with MySQL, running modified queries at scale? The answer was no.

We needed to decouple this into a separate, complex subsystem. Our core system (a.k.a. “the monolith”) was built on Java. To break out, we introduced Kafka as a message broker and created a message for every important event, command, or entity update.
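As a rough sketch of what this looks like in code (not our production code; the topic name "ad-events", the JSON string payload, and the error handling are placeholders for illustration), a Java producer that publishes an event for every ad update could be as simple as this:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AdEventPublisher {

    private final KafkaProducer<String, String> producer;

    public AdEventPublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for all in-sync replicas
        this.producer = new KafkaProducer<>(props);
    }

    // One message per ad update, keyed by the ad id so updates stay ordered per ad.
    public void publishAdUpdated(String adId, String adJson) {
        producer.send(new ProducerRecord<>("ad-events", adId, adJson), (metadata, exception) -> {
            if (exception != null) {
                // A real service would report this to monitoring and retry.
                exception.printStackTrace();
            }
        });
    }

    public void close() {
        producer.close();
    }
}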

With this approach we created our scalable “saved searches” backend: it creates a message for every search match, consumes it, and decides how to handle the match (update a view, send a push notification, etc.). A scalable notification system was born.
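On the consuming side, a match consumer can be a plain Kafka consumer loop. The following is a minimal sketch, assuming string keys and JSON values; the topic name "search-matches", the group id, and the println handler are placeholders:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SavedSearchMatchConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "saved-search-notifier");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("search-matches"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> match : records) {
                    // Key: saved-search id, value: JSON describing the matching ad.
                    // A real handler would decide between updating a view,
                    // sending a push notification, and so on.
                    System.out.printf("notify search %s about %s%n", match.key(), match.value());
                }
            }
        }
    }
}

Because consumers in the same group share the partitions of a topic, scaling this out is simply a matter of starting more instances.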

As of 2019, we have 45 million saved searches and process up to 1 million new ads per day, and our Kafka system still handles it.

By introducing Kafka with a truly scalable use case, we gained a lot of trust and now use this reliable message backbone to distribute all of our data.

Key Learning:
Using Kafka for messaging and to decouple systems works, and it scales!

Storage (+ Backup)

Since we started using Kafka for messaging and to feed some of our systems with data, we came up with the idea to use it as storage. What if we loaded all the data needed for a service from a Kafka topic on startup?
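A simplified sketch of that startup loading could look like the following; it assumes string keys and JSON string values and keeps the latest value per key in memory (the real service would of course map this into its own domain model):

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TopicStateLoader {

    // Reads the topic from the beginning and returns the latest value per key.
    public static Map<String, String> loadState(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, String> state = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                    .map(info -> new TopicPartition(topic, info.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            // Poll until every partition has reached the end offset captured above.
            while (partitions.stream().anyMatch(tp -> consumer.position(tp) < endOffsets.get(tp))) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    if (record.value() == null) {
                        state.remove(record.key()); // tombstone: the entity was deleted
                    } else {
                        state.put(record.key(), record.value());
                    }
                }
            }
        }
        return state;
    }
}

On a compacted topic (see below) this gives exactly the latest state of every entity.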

The example below shows how we use Kafka as storage for our search system.

Our search system is based on a large Elasticsearch cluster. With more than 34 million live ads and up to 1 million new ads plus 1 million edited ads per day, we use Elasticsearch as a distributed, scalable search backend. The challenge is to recreate a full search cluster from scratch and bring it to production within a small amount of time (important for recovery or schema changes).

Since we had introduced Kafka and already had an indexer application running, the idea was to fully index our search cluster from a dedicated Kafka topic. For this we used a Kafka feature called “compaction”: instead of deleting all messages of a certain age, Kafka keeps the latest message per key until it receives an explicit delete message for that key (value = null). This of course means we need to store the full entity in each message value. With a meaningful replication factor for our Kafka infrastructure, we can easily scale up multiple instances of the indexer application. With this infrastructure, we are able to fully index 34 million documents into our search cluster in less than 30 minutes.
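For illustration, such a compacted topic could be created with the Kafka AdminClient roughly like this; the topic name, partition count, and replication factor are placeholders, not our actual settings:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopicSetup {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            NewTopic ads = new NewTopic("ads-compacted", 12, (short) 3)
                    .configs(Map.of(
                            // Keep the latest record per ad id instead of deleting by age.
                            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(ads)).all().get();
        }
    }
}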

[Figure: indexer architecture using a Kafka topic as the source of truth]

Streaming

Messaging and storage work fine. Time to evaluate: our data is no longer static. It moves around, just like the users visiting and leaving our platform. So why not present them the things they might like when they come back? For that, we cannot treat data as static, because it changes every month, every week, every day, every hour, and every minute. Let’s build a system that tracks the behavior of our users and continuously enriches this data with incoming new ads.

Using Kafka and feeding it with all this information allows us to build a feed system which continuously delivers new content to our users when they use our apps, without them running a single search. When users open our app, the content is consumed like a stream.
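One way to sketch such a feed pipeline is with Kafka Streams. We are not claiming this is exactly our implementation; the topic names and the naive category-based join are purely illustrative:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class FeedBuilder {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "feed-builder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Latest set of interested users per category, derived from tracking events.
        KTable<String, String> usersByCategory = builder.table("interested-users-by-category");

        // New ads, keyed by category so they can be matched against those interests.
        KStream<String, String> newAds = builder.stream("new-ads-by-category");

        // Naive sketch: pair every new ad with the users interested in its category
        // and write candidates to a feed topic. A real pipeline would score and rank.
        newAds.join(usersByCategory, (adJson, userIds) -> adJson + "|" + userIds)
              .to("feed-candidates");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}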

Summary

Over the last few years, our team has grown along with our businesses, software, and services. The introduction of Kafka was a big factor in our success. We learned a lot and used it not only for scaling and decoupling; Kafka also plays a role when it comes to storage and streaming.

For more details, see Part 1 of our series and watch our talk at the Devoxx conference in Kyiv on November 1. Special thanks to Christiane Lemke and Grygoriy Gonchar for presenting more details, use cases, and learnings.

Update

The talk was recorded and is now available on YouTube:
