DBMS as an application server

Published in Tarantool Database, Mar 6, 2016

Some of you have probably heard of Tarantool: a super-fast persistent in-memory data store with replication and procedures in Lua. Imagine that we take some pieces of Redis, add frozen Node.js, dress it with Go, and then cook it stirring slowly for 5 minutes after it boils.

Surprising as it may seem, such awesome products as nginx, Go, Node.js, Redis and MongoDB have a lot in common architecturally. To build a high-end network server, you need a set of libraries providing non-blocking input/output, asynchronous event processing, error handling, logging, daemonization and so on. Such a runtime is usually a complicated piece of software that requires low-level system programming skills and a lot of effort to support different operating systems.

At the Tarantool team, we’ve come a long way building our home-grown server runtime based on green threads with cooperative multitasking (fibers). In the end, we got something that architecturally most resembles Go (yeah, geeks think alike), but in the form of C libraries. Thanks to this foundation, Tarantool can process up to 2 million SELECT requests per second (2M RPS) on a single core of a regular laptop.

Let’s program the database!

Want to send JSON to a cloud service when some database records have been updated? Impossible in PL/SQL? Still need to poll your database from a client?

Why not add a general-purpose programming language to the DBMS and let developers use any frameworks they want? Tarantool embeds Lua, a Turing-complete high-level programming language. It is like JavaScript, but much simpler. All the technologies provided by the C runtime and used by the DBMS under the hood are available to developers via Lua bindings:

fiber: cooperative multitasking and channels (like Go);
socket: asynchronous non-blocking socket I/O;
fio: asynchronous non-blocking file I/O (using a thread pool);
mysql/pg: non-blocking client libraries for MySQL and PostgreSQL to use from Tarantool;
net.box: connections to other Tarantool databases from Tarantool;
http: primitive HTTP server and client;
json/msgpack/yaml: data serializers and encoders;
digest: cryptographic hash functions;
clock: high-precision timers;
console: server state introspection, hot code loading, on-the-fly configuration changes;
log: event logging, logrotate.
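
As an illustration of the fiber module from the list above, here is a minimal sketch of two cooperating fibers exchanging messages over a channel, much like goroutines in Go. The channel capacity, the 'done' sentinel and the variable names are made up for this example.

```lua
local fiber = require('fiber')

-- A buffered channel holding up to 10 messages (the capacity is arbitrary)
local channel = fiber.channel(10)

-- Producer fiber: sends three numbers, then a sentinel value
fiber.create(function()
    for i = 1, 3 do
        channel:put(i)
    end
    channel:put('done')
end)

-- Consumer (the main fiber): doubles every number until the sentinel arrives
local results = {}
while true do
    local msg = channel:get()
    if msg == 'done' then
        break
    end
    table.insert(results, msg * 2)
end

print(table.concat(results, ', '))
```

No locks are needed here: fibers are scheduled cooperatively, so only one of them runs at any moment and control changes hands only at yield points such as a blocking put() or get().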

The icing on the cake is box, Tarantool’s multi-engine database, which supports multi-master replication and runs in the same address space as your code. Just write the code. Developers, developers, developers!

What’s the profit?

As a Tarantool developer, I am frequently asked whether Tarantool has queues, data expiration, pub/sub, MGET and other built-in features of the kind Redis offers. No, there is nothing like that in Tarantool. There never was and never will be (sorry, guys). As the old wisdom goes, “Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime”. Tarantool gives you tools and frameworks (“a fishing rod”) that can be used to solve a variety of tasks, including ones that don’t fit ready-made templates and patterns.

Want to send an HTTP request to a cloud service, parse JSON and then store the data persistently? No problem, just use the HTTP client with the JSON serializer. What about automatic removal of old data? Just start a background fiber. You can go deeper and create your own REST service where data is served directly from the DBMS. Don’t limit yourself to the old-school Apache-PHP-MySQL stack. Think outside the box!
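
The “background fiber” idea can be sketched as follows. This is only an illustration: the sessions space, its layout (id, creation time in seconds), the one-hour lifetime and the function name are assumptions made for the example, not something Tarantool ships with. The index definition follows the 1.6-style syntax used elsewhere in this article.

```lua
local fiber = require('fiber')

box.cfg{}

-- Hypothetical space: { id, created_at }
box.once('sessions_schema', function()
    local s = box.schema.space.create('sessions')
    s:create_index('primary', { type = 'tree', parts = { 1, 'num' } })
end)

-- Delete every record older than max_age seconds; return how many were removed
local function expire_older_than(max_age)
    local deadline = fiber.time() - max_age
    local stale = {}
    -- Collect the keys first, so nothing is deleted while iterating
    for _, tuple in box.space.sessions:pairs() do
        if tuple[2] < deadline then
            table.insert(stale, tuple[1])
        end
    end
    for _, id in ipairs(stale) do
        box.space.sessions:delete(id)
    end
    return #stale
end

-- Background fiber: clean up once a minute for as long as the process runs
fiber.create(function()
    while true do
        expire_older_than(3600)
        fiber.sleep(60)
    end
end)
```

A full scan is fine for a small space; for a large one, a secondary index on the timestamp field would let the fiber visit only the stale records.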

What about my PHP/Python/Ruby?

“Do we really have to move all the business logic into the database instead of keeping a dedicated application server (Node.js, PHP, Python, whatever) and a DBMS (Redis, MongoDB, etc.)?” you might ask. That’s a fair question. But no, we aren’t trying to turn the world upside down. Let’s take a more pragmatic look.

Nowadays, when complex applications can be written right in the browser, the server side is needed mostly for storing and processing data. A browser can request and update all data dynamically via AJAX. So, what does your application server (PHP/Node/Go/Python) do in this situation? Does it make sense for it to sit idle waiting for the database response and then pass all the data unchanged to, say, nginx? Your application opens a transaction, gets some data from the DBMS over the network, changes some fields on the application server, sends the updates back to the database, commits the transaction and then returns the processed data to nginx. Isn’t that too many network and userspace ↔ kernel round trips just to serve a simple web page?

For these kinds of micro-services, Tarantool suggests writing a stored procedure close to the data. With the tarantool-http module and nginx_tarantool_upstream, it is easy to build a REST service on Tarantool, which simplifies your architecture and removes the extra link, i.e. the dedicated application server. That said, nobody is suggesting that you rewrite the whole application: you can turn just the most heavily loaded parts of the project into micro-services where the performance of traditional solutions is no longer enough. For the rest of your tasks, Tarantool can be used as a general-purpose DBMS via connectors for various programming languages.
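
To make “a stored procedure close to the data” more concrete, here is a minimal sketch of the read-modify-write cycle described above, done in one call inside the database instead of several network round trips. The accounts space, its layout (id, balance) and the transfer function are hypothetical names chosen for this example.

```lua
box.cfg{}

-- Hypothetical space: { id, balance }
local accounts = box.schema.space.create('accounts', { if_not_exists = true })
accounts:create_index('primary', {
    type = 'tree', parts = { 1, 'num' }, if_not_exists = true,
})

-- Global on purpose, so that a connector can call the procedure by name.
-- Moves `amount` from one account to another; returns true on success.
function transfer(from_id, to_id, amount)
    box.begin()
    local from = accounts:get(from_id)
    local to = accounts:get(to_id)
    if from == nil or to == nil or from[2] < amount then
        box.rollback()
        return false
    end
    accounts:update(from_id, {{ '-', 2, amount }})
    accounts:update(to_id, {{ '+', 2, amount }})
    box.commit()
    return true
end
```

The check and both updates happen in the same address space as the data, so the client sends one request and receives one answer.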

How hard is it to speak Lua?

“We can’t write in Lua! Where would we find Lua developers?” could be your next question. Don’t panic! Lua is an easy-as-pie language that doesn’t require studying collected works in ten volumes. At Mail.Ru, Lua procedures are written efficiently by C/C++ programmers as well as by Python, Perl, Ruby and JS developers.

But why Lua? Lua is very simple and fast. With Lua, there is no controversy about which 10% of the language’s functionality the company’s projects should be allowed to use. By the way, Tarantool also has a C API for modules that lets you reach a new level of efficiency and, theoretically, integrate ${your_favorite_language_here}.

Let’s give it a shot!

Install Tarantool from http://tarantool.org/download.html. The site’s repository has binary packages for the major Linux distributions, ports for FreeBSD and brew formulas for OS X. After the installation, enter “tarantool” in the console. You’ll get an interactive console (like Python, Node, irb and so on):

roman@book:~$ tarantool
tarantool: version 1.6.8-123-gbe2ce21
type 'help' for interactive help
tarantool>

You can enter some Lua code in the interpreter, and the result will be displayed in a human-readable format (YAML):

tarantool> 2 + 2
---
- 4
...
tarantool> { name = "Roman", gender = "male" }
---
- name: Roman
  gender: male
...

You can write the same as a standalone script:

#!/usr/bin/env tarantool
print('Hello world!')

Run this script just like Bash, Python or Ruby:

roman@desktop:~$ edit trololo.lua
roman@desktop:~$ chmod a+x trololo.lua
roman@desktop:~$ ./trololo.lua
Hello world!

Tarantool is fully compatible with LuaJIT and Lua 5.1 at the script level and can be used as a drop-in replacement. Most Lua modules work with Tarantool.
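
As a small illustration of that compatibility, the following sketch uses nothing but the standard Lua 5.1 library, so the same file should run unchanged under lua, luajit or tarantool. The word_count function and the sample sentence are invented for the example.

```lua
-- Pure Lua 5.1: no Tarantool-specific modules are used
local function word_count(text)
    local counts = {}
    for word in text:gmatch('%a+') do
        word = word:lower()
        counts[word] = (counts[word] or 0) + 1
    end
    return counts
end

local counts = word_count('Lua is simple and Lua is fast')
print(counts.lua, counts.is, counts.fast)
```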

The box.cfg{} function configures and activates the built-in database (box); after that, you can create tables and work with data:

tarantool> box.cfg {}
[cut]
tarantool> space = box.schema.space.create('test')
[cut]
tarantool> box.space.test:create_index('primary', { type = 'tree', parts = { 1, 'num' }})
[cut]
tarantool> box.space.test:insert({48, 'some data', { key = 'value', key2 = 'value2' }})
---
- [48, 'some data', {'key': 'value', 'key2': 'value2'}]
...
tarantool> box.space.test:select()
---
- - [48, 'some data', {'key': 'value', 'key2': 'value2'}]
...
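
Beyond insert and select, the same space object supports point lookups, updates and deletes. The sketch below repeats the setup from the session above as a standalone script; the if_not_exists options and the use of replace() are choices made here so that the script can be run more than once.

```lua
box.cfg{}

-- if_not_exists makes the script safe to run repeatedly
local space = box.schema.space.create('test', { if_not_exists = true })
space:create_index('primary', {
    type = 'tree', parts = { 1, 'num' }, if_not_exists = true,
})

-- replace() inserts the tuple or overwrites one with the same primary key
space:replace({ 48, 'some data' })

-- Point lookup by primary key
local tuple = space:get(48)
print(tuple[2])

-- Assign a new value to field 2, then read it back
space:update(48, {{ '=', 2, 'new data' }})
local updated = space:get(48)
print(updated[2])

-- Delete the record; get() returns nil afterwards
space:delete(48)
local deleted = space:get(48)
```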

If you stop the interactive console now (with CTRL+D or os.exit(0)), you’ll see new *.snap and *.xlog files in the filesystem. These files provide persistence for our data, so it is still there after a restart:

roman@book:~$ tarantool
tarantool: version 1.6.8-123-gbe2ce21
type 'help' for interactive help
tarantool> box.cfg{} -- start a database and recover data
---
...
tarantool> box.space.test:select()
---
- - [48, 'some data', {'key': 'value', 'key2': 'value2'}]
...

And now let’s try something more challenging:

#!/usr/bin/env tarantool

box.cfg{}

-- Create tables ("spaces") at first run
box.once('schema', function()
    box.schema.create_space('hosts')
    box.space.hosts:create_index('primary', { type = 'hash',
       parts = {1, 'str'} })
end)

-- Handle GET-requests to /
local function handler(self)
    -- Get the client's IP address
    local ipaddr = self.peer.host
    -- Insert a new record for the address and increment the record counter
    box.space.hosts:upsert({ ipaddr, 1 }, {{'+', 2, 1}})
    -- Return all records in JSON to the client
    return self:render{ json = box.space.hosts:select() }
end

local httpd = require('http.server')
local server = httpd.new('127.0.0.1', 8080)
server:route({ path = '/' }, handler)
server:start()

Install the tarantool-http package for this example from tarantool.org:

# Debian/Ubuntu
sudo apt-get install -y tarantool-http

# CentOS/Fedora
sudo yum install -y tarantool-http

On the first run, the script creates the hosts table; after that, the HTTP server increments a counter for the client’s IP address on every request to ‘/’ and returns all addresses to the client as JSON. Nginx can easily be placed in front of the service. By the way, try.tarantool.org is written in Tarantool itself. We eat our own dog food and try to make developers’ lives better.

Going into production

So, what’s the best way to deploy our simple application to a production server? As you know, playing in the console is one thing, and putting things into production is quite another. But don’t worry, it will be easy. Let’s copy our script to “/etc/tarantool/instances.enabled/myapp.lua” and run it with the init scripts (“service tarantool restart”) or via “tarantoolctl start myapp”.

You can create as many applications as you’d like; the init scripts will automatically run the Tarantool daemons and monitor them. We recommend running slightly fewer Tarantool daemons than the number of physical cores you have. This approach will give you the best per-server performance and save millions of dollars. You can easily find a daemon by your application name in the process list. By default, the log file is “/var/log/tarantool/myapp.log”, data is stored in “/var/lib/tarantool/myapp/”, and the pid file is “/var/run/tarantool/myapp.pid”. In other words, the paths are exactly where your favorite distribution expects them. You can use our template for RPM and DEB packages.

Another useful Tarantool command is “tarantoolctl enter myapp”, which opens a console connection to the running application and lets you inspect the daemon’s state and change the code on the fly. It is also possible to open a network port using box.cfg({ listen = 3313 }) and access the daemon through connectors from other programming languages and frameworks (as you remember, we promised not to ruin your universe completely!).
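
A minimal sketch of the network setup might look like this. Opening the port alone is usually not enough, because remote clients connect as the guest user, which has no access to your spaces by default. The blanket grant below is an assumption made for local experiments only; a real deployment would create a dedicated user with narrower privileges.

```lua
-- Accept connections from external connectors on port 3313
box.cfg{ listen = 3313 }

-- Run once: let unauthenticated (guest) clients read, write and call functions.
-- Deliberately permissive, for a local sandbox only.
box.once('guest_access', function()
    box.schema.user.grant('guest', 'read,write,execute', 'universe')
end)
```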

What next?

Questions and comments are welcome!