You have probably heard about Tarantool. It is a very fast DBMS with a built-in application server. Tarantool is an open source project that has been around for more than 8 years. At Mail.Ru Group, we use it widely in more than half of our services, including Email, Cloud, MyWorld and Mail.Ru Agent. We contribute all our changes to Tarantool back to the community, so our users run the same version of Tarantool that we do. By now, we have client libraries for nearly all popular languages; this was our priority over the last year. Some of the libraries were written by the community, and we immensely appreciate it. When we come across a really efficient library, we include it in our public package repositories. We're trying hard to deliver the DBMS and its libraries right out of the box.
The main thing about Tarantool itself is that it is a perfect combination of a DBMS and a cache. A classical DBMS is a durable storage system with ACID transactions, a server-side language, tables, primary/secondary indexes, stored procedures and many more features. A cache, on the contrary, has nothing like that, but it is lightning fast in terms of throughput and latency. These are two different worlds, and they both converge in one product: Tarantool. Its main purpose is to be the single source of truth for web-scale applications that need to work with hot data.
Comparison with classical DBMSs
If you're using a traditional DBMS, e.g. Oracle or MySQL, then you probably miss some cache features, namely fast request processing and low latency. Traditional DBMSs are good at many kinds of work, but raw speed is not one of them. At the same time, caches have their own cons, like no transactions and no stored procedures. Even if you put a cache on top of a DBMS, you still face trade-offs: you lose some DBMS features, like ACID transactions, stored procedures and secondary indexes, and you lose some cache features, like high write throughput. Plus, you get new problems, the most serious being data inconsistency and the cold start problem.
If you aren’t OK with these trade-offs, and you want to have a real DBMS and a real cache all in one, then consider Tarantool. It was designed specifically to address these issues.
Now a couple of words about what's under the hood of Tarantool, more specifically about its in-memory engine. Tarantool basically stores a database in two files: a snapshot file capturing the whole database image at some point in time, and a transaction log file storing all the transactions committed after that point.
This architecture allows a database to warm up on startup as quickly as possible. Try testing Tarantool's cold start time. "Cold start" means reading the whole snapshot into memory. Sequential reading from disk runs at about 100 MB/sec even with a spinning disk. So, if the dataset size is 100 GB, for example, it will be loaded into memory in roughly 1,000 seconds, which is just under 17 minutes.
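The snapshot-plus-log scheme can be sketched in a few lines. This is a minimal illustration in Python, not Tarantool's actual implementation; the file formats and function names are made up:

```python
import json

def save_snapshot(db, path):
    # Capture the whole database image at a point in time.
    with open(path, "w") as f:
        json.dump(db, f)

def append_to_log(log_path, op):
    # Every committed transaction is appended to the transaction log.
    with open(log_path, "a") as f:
        f.write(json.dumps(op) + "\n")

def recover(snapshot_path, log_path):
    # Cold start: read the snapshot, then replay the log on top of it.
    with open(snapshot_path) as f:
        db = json.load(f)
    with open(log_path) as f:
        for line in f:
            op = json.loads(line)
            if op["type"] == "set":
                db[op["key"]] = op["value"]
            elif op["type"] == "delete":
                db.pop(op["key"], None)
    return db
```

The point of the design is that recovery is one long sequential read (snapshot) plus a short replay (log), which is why the warm-up speed is bounded mainly by disk throughput.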
When we tested the cold start time of MySQL and PostgreSQL, the results were much worse. Unlike Tarantool, they start accepting queries right after startup, before the database is warmed up, but you still can't really use a cold database, and the warm-up speed is a couple of orders of magnitude slower than Tarantool's: around 1–2 MB/sec. This means you need dirty tricks, e.g. running the "cat" command against some database files to warm up an index in advance, otherwise your database will be warming up for ages. Anyone who administers MySQL knows this kind of stuff, and they're mostly unhappy with it. Tarantool, in contrast, is simply up and running, with the shortest possible cold start time.
Tarantool is not yet perfect. One of the main improvements we're working on right now is an on-disk store. Tarantool was designed as an in-memory DBMS. It beats traditional DBMSs on speed and total cost of ownership. But what do we do with cold data in Tarantool? It handles hot data efficiently, but it holds everything in memory, including the cold part of the dataset, which is not efficient.
Without an on-disk storage engine, we can’t implement any smart algorithm to move cold data away from the memory. For example, users’ profiles at Mail.Ru Group are handled by 8 servers with Tarantool. With MySQL, it could have been a million-dollar farm with hundreds of servers. But on the other hand, there could have been only 4 servers with Tarantool if we were able to displace cold data to a disk store.
Next, we’re working hard on an automatic cluster solution. We actually have a couple of such solutions already in production, but these are proprietary solutions designed for specific tasks. What we want to achieve here is a universal cluster solution with automatic replication, sharding and resharding, but without system administration pains.
One more restriction to pinpoint: Tarantool does not yet support SQL. That's another big feature we're working on right now. SQL support will enable a smooth migration path to Tarantool from any other SQL database, for existing systems and applications alike. For example, we have more than 100 MySQL servers behind the email service at Mail.Ru Group, and we could easily handle their workload with a couple of Tarantool servers, but we won't do that because we'd need to rewrite tons of code. So we'd better add SQL support to Tarantool and close this issue once and for all.
Finally, ACID-transactions are only supported inside stored procedures. So you can’t say BEGIN and COMMIT outside of a stored procedure, which is not convenient in some cases. We’re working on this problem right now.
Tarantool has a fantastic memory footprint. Its overhead for storing data is very low: the real size of data on disk or in RAM is usually only a couple of percent larger than the raw data itself. The overhead never exceeds 10%, plus the memory used by indexes.
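As a back-of-the-envelope check, only the "couple of percent, never more than 10%" ceiling comes from the text above; the concrete numbers below are purely illustrative:

```python
def estimated_footprint(raw_bytes, overhead_ratio=0.02, index_bytes=0):
    # Storage overhead is typically a couple of percent of the raw data
    # and never exceeds 10%, plus whatever the indexes occupy.
    assert overhead_ratio <= 0.10
    return raw_bytes * (1 + overhead_ratio) + index_bytes

GB = 1024 ** 3
# e.g. 100 GB of raw data with 2% overhead and 5 GB of indexes
total = estimated_footprint(100 * GB, 0.02, 5 * GB)
```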
Our use cases
At Mail.Ru Group, Tarantool is used for a wide variety of tasks. We have a couple hundred (!) Tarantool deployments. Three of them handle the heaviest workloads: the authentication system, the push notification system and the advertising system. Let's talk about each of them in more detail.
Perhaps, every website or mobile application has its own authentication system. I mean a system that authenticates an end user by a login/password pair, or by a session ID, or by a token. Mail.Ru Group is no exception here, as it authenticates all web and mobile users. This system has non-trivial requirements which could be considered contradictory:
- Heavy workloads. Every page, every AJAX request, every API call in mobile applications use this system in order to authenticate the user being served.
- Low latency. Users are surprisingly impatient. :) They want to have all the information they requested right away. So, each call must be handled ASAP.
- High availability. The authentication system must serve every single request; otherwise, the user is guaranteed to get an HTTP 500 error, because we can't handle a request until the user is authenticated.
- Every hit hits a database. Every hit to the authentication system triggers a check of some credentials stored in a database. Moreover, every hit needs to be checked against anti-bruteforce and anti-fraud systems, which in their turn query the database and add a record to the user's authentication history (current IP address, geographical location, time, authentication client, etc.). Plus, we need to update the last session/token access time, and we need to update the anti-bruteforce/anti-fraud databases with all the changes we've made. For login/password authentication, we also need to create a session in the session database, i.e. insert a row into a table.
See, there is a lot of work to do on, let me say it again, EVERY hit to our web or mobile applications. And this work includes not only elementary read operations, but also UPDATE and INSERT queries to a database. On top of that, there are a lot of hackers constantly trying to break into our authentication system. This entails additional workload, which is quite heavy but absolutely useless to us.
- Fairly large dataset. This system stores a lot of information about all our users.
- Expiration. Some pieces of the dataset need to be expired, e.g. sessions/tokens. Every expiration requires an UPDATE transaction.
- Persistence. Every single change must be written securely to disk. We can't afford to store sessions in Memcached because, when it goes down and comes back up, we'd lose all the sessions, forcing ALL our users to remember and enter their logins and passwords again, and they would most probably hate us. The same goes for anti-bruteforce data: it's our main weapon against hackers, and we can't risk losing it. And of course, we don't want to lose the database with the hashes of user passwords; that's one of the worst things that could happen to a website or a mobile application.
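The per-hit work described in the list above can be sketched roughly like this. It's Python pseudocode with invented names and in-memory dictionaries standing in for database tables, not Mail.Ru's actual code:

```python
import time

# Stand-in "databases"; in production each of these would be a real table.
sessions = {}             # session_id -> {"user": ..., "last_seen": ...}
auth_history = []         # append-only log of authentication events
bruteforce_counters = {}  # ip -> number of recent failed attempts

MAX_ATTEMPTS = 10  # illustrative anti-bruteforce threshold

def authenticate(session_id, ip):
    # 1. Anti-bruteforce check: every hit queries the counters.
    if bruteforce_counters.get(ip, 0) >= MAX_ATTEMPTS:
        return None
    # 2. Credential check against the session database.
    session = sessions.get(session_id)
    if session is None:
        # Failed attempt: update the anti-bruteforce database.
        bruteforce_counters[ip] = bruteforce_counters.get(ip, 0) + 1
        return None
    # 3. Record the hit in the user's authentication history.
    auth_history.append({"user": session["user"], "ip": ip, "ts": time.time()})
    # 4. Update the last session access time.
    session["last_seen"] = time.time()
    return session["user"]
```

Even this toy version shows that a single hit performs several reads and several writes, which is why every hit really does hit the database.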
That was our authentication system. How do you like it? A bunch of very strict requirements. Some of them are met if you use a cache, e.g. the ability to withstand heavy workloads, low request latency and data expiration. Others are met if you use a database, e.g. persistence. Which implies that the authentication system should be based on a cache and a database combined into a single solution. It has to be as durable as a truck and as fast as a red sports car, or at least a yellow one. :-)
Right now we get around 50K queries per second that require login/password authentication. This rate seems relatively low, but there is a lot of work to do on each request, i.e. perform a number of queries and transactions. Tarantool takes care of it perfectly.
On the other hand, the session/token authentication workload reaches 1 million queries per second! This is the total workload of the entire Mail.Ru Group portal. And this workload is handled by only 12 servers with Tarantool: 4 servers with sessions and 8 servers with user profiles. Replicas are already included in these numbers! By the way, those servers are far from reaching their maximum CPU capacity. The CPU usage is around 15–20% as of now.
These days there are many users switching from laptops to mobile devices. They mostly use applications rather than mobile web interfaces. And of course, you know that mobile devices send push notifications.
A push notification is used when there is an event on the server side, and this event should be delivered to the end user’s mobile device.
An interesting fact about the notification delivery process is that the server side does not send any message directly to the user's mobile device. Instead, it sends a message to a special iOS or Android service that takes care of delivery.
iOS/Android services need to somehow authenticate a user, which is done via a token. You need to store these tokens in a database. Plus, a user can have more than one device, and therefore there can be many tokens per user. There are lots of events on the server side, and the more often you notify your users, the more engaged they are with your application.
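A minimal sketch of such a token store (one user, many devices) might look like this; the structure and names are invented for illustration, and in production this would live in a database table keyed by user:

```python
# user_id -> {device_id: push_token}
tokens_by_user = {}

def register_token(user_id, device_id, token):
    # A user can have several devices, hence several tokens.
    tokens_by_user.setdefault(user_id, {})[device_id] = token

def tokens_for_user(user_id):
    # The notification sender looks up every token the user has,
    # so each server-side event fans out to all of the user's devices.
    return list(tokens_by_user.get(user_id, {}).values())
```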
Having said that, it becomes clear that a push notification system generates a heavy workload on the underlying database. Worse still, this heavy workload comes with sub-millisecond latency requirements, because you don't want to slow down your backend and keep it waiting for the database. Heavy workloads and low latencies are exactly what Tarantool was created for.
But this is not the only job for Tarantool in the push notification system. What else? The short answer is: queues. The long answer follows.
What would you say if your server side had to talk to the iOS/Android APIs directly? I bet you'd say "never!" And I would agree, because those APIs can slow down, become unavailable or unreachable, and in all of these cases your backend performance would suffer severely. Obviously, you need a queue as intermediate storage for all the notifications. A queue, but a fast one, reliable, durable, and with replication. And again, this is Tarantool. It can be perfectly used as a queue, here is an interesting article about this.
Our push notification system at Mail.Ru Group handles 200K queries and transactions per second. By the way, each access to the queue is a transaction because, whether you push or pop, you need to change the state of the queue and commit all the changes to disk.
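A toy version of such a durable queue, where every push and pop changes the queue state and is committed to disk before returning, can be sketched as follows. This is an illustration of the idea, not Tarantool's actual queue module:

```python
import json, os

class DurableQueue:
    def __init__(self, log_path):
        self.log_path = log_path
        self.items = []
        # Recover queue state by replaying the on-disk log.
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    op = json.loads(line)
                    if op["op"] == "push":
                        self.items.append(op["item"])
                    else:  # pop
                        self.items.pop(0)

    def _commit(self, record):
        # Each access is a transaction: the change hits disk before we return.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def push(self, item):
        self.items.append(item)
        self._commit({"op": "push", "item": item})

    def pop(self):
        item = self.items.pop(0)
        self._commit({"op": "pop"})
        return item
```

The `fsync` call is what makes each push and pop a committed transaction, and also what makes a naive implementation slow; a real queue batches commits to sustain hundreds of thousands of operations per second.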
Mail.Ru Group is a huge web portal, and of course it has advertisements on the majority of its pages. We have a sophisticated high-performance system that determines which advertisement to show. The system keeps a lot of information about users, their interests and other kind of stuff that helps to understand what advertisement to show right now and right here to the specific user on the specific page.
The main challenge of an advertising system is to handle heavy workloads at millisecond speed. This system is exposed to even heavier workloads than the authentication system.
As an example, let’s say that we have 10 advertisement slots on a page. For each slot, we need to look up many data sources, then aggregate the results, then determine which advertisement to show and then show it. It’s obvious, but let’s put it clearly: advertisements don’t offer any functionality to the end user, so their existence can’t be an excuse for a slowdown in the main services. So, all this stuff needs to be done in a few milliseconds.
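The per-page flow might look roughly like this. It's a hedged sketch: the slot count comes from the example above, but the data sources, scoring and function names are all invented:

```python
from concurrent.futures import ThreadPoolExecutor

SLOTS = 10  # advertisement slots on the page

def lookup_sources(slot, user_id):
    # Query several data sources for one slot (stubbed out here);
    # each source returns candidate ads with scores.
    return [{"ad": f"ad-{slot}-{i}", "score": i} for i in range(3)]

def pick_ad(candidates):
    # Aggregate the results and pick the best-scoring advertisement.
    return max(candidates, key=lambda c: c["score"])["ad"]

def ads_for_page(user_id):
    # The whole fan-out must finish within a few milliseconds, so the
    # slots are resolved concurrently rather than one after another.
    with ThreadPoolExecutor(max_workers=SLOTS) as pool:
        futures = [pool.submit(lookup_sources, s, user_id) for s in range(SLOTS)]
        return [pick_ad(f.result()) for f in futures]
```

With sequential lookups the page latency would be the sum of all slot latencies; with a concurrent fan-out it is roughly the slowest single lookup, which is what makes the millisecond budget feasible.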
Our advertising system runs on one of the biggest Tarantool clusters in the world. Every single second, it handles 3 million read transactions and 1 million write transactions.
Last but not least
Tarantool was born for heavy workloads. But even with a moderate workload, Tarantool is valuable for its latency: in most cases, it's one millisecond or less. In fact, Tarantool delivers good latencies no matter what the workload is. Traditional databases can't respond that fast. Sometimes, to process a single user request, you need to make many queries to a database. The latencies add up, so the total latency per user request can reach a really unpleasant value. Here Tarantool helps again: the shorter the time to process one query, the shorter the total request time.
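To make the summing concrete, here is the arithmetic with purely illustrative numbers (these are not benchmark results):

```python
# One user request that needs 20 sequential database queries: the
# per-query latency multiplies into the total request latency.
queries_per_request = 20
traditional_ms = 5.0   # hypothetical per-query latency of a disk-based DBMS
in_memory_ms = 0.5     # hypothetical sub-millisecond per-query latency

traditional_total = queries_per_request * traditional_ms  # 100 ms
in_memory_total = queries_per_request * in_memory_ms      # 10 ms
```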
Tarantool provides you with high throughput, low latency and great uptime. It gets every last drop of performance out of your servers whilst being a real database with transactions, replication and stored procedures.