Hammerspace: Persistent, Concurrent, Off-heap Storage
By Jon Tai
What is hammerspace?
According to Wikipedia, “hammerspace is a fan-envisioned extradimensional, instantly accessible storage area in fiction, which is used to explain how animated, comic, and game characters can produce objects out of thin air.”
We recently built a library that stores strings off the ruby heap but still allows fast access to those strings from within ruby. Applications can use it to produce strings out of thin air, so we named the gem hammerspace. Hammerspace provides persistent, concurrently-accessible off-heap storage of strings with a familiar hash-like interface. It is optimized for bulk writes and random reads.
We are pleased to announce that hammerspace is now an open source project — the code is available on GitHub.
The Problem
The weekly performance report arrived in my inbox, but I already knew what it would say. For the seventh week in a row, our overall application response time was up. And I had no idea why.
We had looked at all the obvious things — traffic was not up significantly, and there were no big jumps in response time after a code change was deployed. None of our external service dependencies had regressed significantly. In fact, a steady increase in garbage collection time seemed to be the biggest contributor to the regression. But that just added to the mystery.
We theorized that a slow regression over many weeks must be caused by data growth, and that led us to translations.
Airbnb’s web site and mobile apps have over 80,000 translatable strings. Using our translation tools, our community of translators localizes Airbnb for over 30 locales. The number of translated strings grows every day as we introduce new strings, translate more strings, and localize Airbnb for more locales. And the growth looked more or less like our response time.
Loading Translations
Each locale’s translated strings are stored in a database. This allows our translators to update translations and see the results immediately without having to do a code deploy. When a rails process renders a page in a given locale for the first time, the locale’s translations are loaded from the database into a ruby hash.
Translations are bulk loaded into memory because accessing translations is very sensitive to latency. Rendering a page may require accessing hundreds of translations. Incurring a 2ms delay to access each one from an external cache would be prohibitively slow. Fetching all translations for a page at once is not straightforward, since the strings required to render a page are not known ahead of time.
Translations are updated relatively infrequently. If a translator updates a string, the subsequent request must reload the updated locale’s translations to provide the translator with instant feedback, but otherwise translations can be stale by tens of minutes or more. Thus, it is efficient to incur the cost of loading all translations for a locale once if the translations can be accessed quickly thereafter.
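The pattern looks roughly like the sketch below. The class and model names are hypothetical, not our actual translation tools; it simply illustrates paying the bulk-load cost once per locale and then serving lookups from an in-memory hash.
# Hypothetical sketch of lazy, bulk loading; TranslationCache and Translation
# are illustrative names, not our actual translation tools.
class TranslationCache
  def initialize
    @locales = {}
  end

  # The first lookup in a locale pays the bulk-load cost; every lookup after
  # that is a plain in-memory hash access.
  def fetch(locale, key)
    @locales[locale] ||= load_locale(locale)
    @locales[locale][key]
  end

  private

  def load_locale(locale)
    # One bulk query instead of hundreds of per-key round trips per page render.
    Translation.where(locale: locale).each_with_object({}) do |t, hash|
      hash[t.key] = t.value
    end
  end
end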
Growing Pains
Over its lifetime, a process accumulates many translations. This poses a number of problems. First, the translated strings are stored on the heap, where the garbage collector must scan over them on every run (at least in the case of Ruby MRI). For a mature process with all the translations loaded, that is almost 1 million objects! Second, each process has its own copy of the strings, which total over 80 MB when all the translations are loaded. Since each process has its own copy, that 80 MB is multiplied by the number of processes on the machine. Third, each machine has only a finite amount of memory. As each process on the machine reaches maturity, memory pressure builds and a process must be sacrificed to keep the machine from running out of memory. (We have a watchdog process that kills ruby processes that are using too much memory — this keeps the kernel’s out-of-memory killer at bay.)
Of course, we can prevent processes from being killed by adding more memory to each machine or by limiting the number of processes that can run on each machine — in fact, we employed both of these measures as stopgap solutions. However, these solutions are not scalable because the number of active translations is constantly growing.
Over the summer, as the number of translated strings grew, we saw processes being killed more and more frequently. As process lifetimes became shorter, bulk loading translations became less efficient because the up-front cost of loading the translations was no longer being offset by a long period of fast accesses.
We needed a solution that would store translations off the heap, allow sharing between processes, and persist translations across process restarts — while still providing fast access.
Early Solutions
We considered storing translations in a local memcache instance. This would get them off the heap, allow sharing between processes, and persist them across process restarts. However, accessing memcache over a local socket was still several orders of magnitude slower than a ruby hash access, so we ruled out memcache.
Next, we benchmarked cdb and sparkey. These libraries essentially provide on-disk hash tables. They are optimized for bulk writes and random reads. The numbers were encouraging — writing a locale’s translations was almost as fast as a ruby hash, and reads were only about twice as slow. The filesystem cache helps a lot here — moving the files to an in-memory filesystem made little difference. Unfortunately, neither library supports concurrent writers, so files cannot be shared between processes because multiple processes might try to load the same locale at the same time. Another drawback is that these libraries require special usage patterns, so we would have to tightly couple our translation tools with these libraries.
We needed a layer on top of these lower-level libraries that would support concurrent writers. Since we had to build something anyway, we made the interface as similar to ruby’s hash as possible. This solved the problem of tight coupling with the application code — we could just substitute a ruby hash with a disk-based object that acted like a hash, and the application code would function just the same.
Hammerspace
So we built hammerspace. Hammerspace adds concurrency control to allow multiple processes to update and read from a single shared copy of the data safely. Its interface is designed to mimic Ruby’s Hash to make integrating with existing applications simple and straightforward. We chose to build hammerspace on top of sparkey, but support for other low-level libraries can be added by implementing a new backend class.
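Sparkey is the default backend. To our recollection of the gem’s README, an alternative backend can be passed in when the hammerspace is created; treat the option name below as an assumption rather than a guaranteed API.
# The :backend option reflects our reading of the gem's README and may differ
# from the current API; Sparkey is the default.
h = Hammerspace.new("/tmp/hammerspace", :backend => Hammerspace::Backend::Sparkey)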
We are using hammerspace to store translations, but it should be generally applicable to any data that is bulk-loaded from some authoritative external source and benefits from low-latency reads. If your application downloads a bunch of data or loads a bunch of data into memory on startup, hammerspace may be worth a look. Hammerspace is an open source project — the code is available on GitHub.
Usage
For the most part, hammerspace acts like a Ruby hash. But since it’s a hash that persists on disk, you have to tell it where to store the files. Hammerspace objects are backed by files on disk, so even a new object may already have data in it.
h = Hammerspace.new("/tmp/hammerspace")

h["cartoons"] = "mallets"
h["games"] = "inventory"
h["rubyists"] = "data"h.size #=> 3
h["cartoons"] #=> "mallets"h.map { |k,v| "#{k.capitalize} use hammerspace to store #{v}." }h.close
You should call close on the hammerspace object when you're done with it. This flushes any pending writes to disk and closes any open file handles.
Concurrency
Multiple concurrent readers are supported. Readers are isolated from writers, i.e., reads are consistent to the time that the reader was opened.
h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"
h.close

reader1 = Hammerspace.new("/tmp/hammerspace")
reader1["foo"] #=> "bar"writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "updated"
writer.close

# Still "bar" because reader1 opened its files before the write
reader1["foo"] #=> "bar"# Updated key is visible because reader2 opened its files after the write
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2["foo"] #=> "updated"
reader2.close

reader1.close
Multiple concurrent writers are also supported. When a writer flushes its changes, it overwrites any previous version of the hammerspace.
In practice, this fits our use case because hammerspace is designed to hold data that is bulk-loaded from some authoritative external source. Rather than block writers to enforce consistency, it is simpler to allow writers to concurrently attempt to load the data. The last writer to finish loading the data and flush its writes will have its data persisted.
Flushing a write incurs some overhead to build the on-disk hash structures that allow fast lookups later. To avoid the overhead of rebuilding the hash after every write, most write operations do not implicitly flush. Writes can be flushed explicitly by calling close.
writer1 = Hammerspace.new("/tmp/hammerspace")
writer1["color"] = "red"# Can start while writer1 is still open
writer2 = Hammerspace.new("/tmp/hammerspace")
writer2["color"] = "blue"
writer2["fruit"] = "banana"
writer2.close

# Reads at this point see writer2's data
reader1 = Hammerspace.new("/tmp/hammerspace")
reader1["color"] #=> "blue"
reader1["fruit"] #=> "banana"
reader1.close

# Replaces writer2's data
writer1.close

# Reads at this point see writer1's data; note that "fruit" key is absent
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2["color"] #=> "red"
reader2["fruit"] #=> nil
reader2.close
Behind the scenes, hammerspace maintains several versions of the sparkey files in separate directories. The current version of the files is pointed to by a current symlink. Each writer begins by copying the current version of the files into its own private directory. Writes are then made to the private files. When the writes are flushed, the current symlink is atomically updated to point to the directory with the updated files, and the previous target of the current symlink is unlinked. Updating the symlink atomically ensures that the hammerspace will never be left in a corrupt state, even if a writer should crash while performing the update. Also, unlinking a file does not actually remove the contents of the file until all open file handles are closed, so if there are readers that still have the old files open they can continue to use them.
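The core of that swap can be expressed with ordinary filesystem calls. The following is a simplified illustration of the technique, not hammerspace’s actual code; it assumes version directories are referenced by absolute paths.
require 'fileutils'

# Simplified illustration of an atomic "current" pointer swap (not the actual
# hammerspace implementation).
def publish(root, new_version_dir)
  current = File.join(root, "current")
  old_target = File.readlink(current) rescue nil

  # Create the new symlink under a temporary name, then rename it over
  # "current"; rename(2) is atomic, so readers never see a missing or
  # half-written pointer.
  tmp_link = File.join(root, "current.#{Process.pid}")
  File.symlink(new_version_dir, tmp_link)
  File.rename(tmp_link, current)

  # Unlinking the old files does not disturb readers that still have them
  # open; the data disappears only after the last file handle is closed.
  FileUtils.rm_rf(old_target) if old_target && File.directory?(old_target)
end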
Integration
Integrating hammerspace with our translation tools was mostly straightforward, although there were a few tricky bits.
The code was already structured to do a bulk write from the database to a hash, but we needed to ensure that clearing the existing translations, writing the new translations, and setting the "last updated" timestamp are all flushed in a single write. Otherwise, multiple writers might overlap and we would end up with inconsistent data.
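In hammerspace terms, that means performing the clear, all of the writes, and the timestamp update on a single writer and only then calling close, so everything lands in one flush. A minimal sketch of the shape of that write (the timestamp key is illustrative, and whether clear buffers or flushes in the real gem should be verified against its documentation):
h = Hammerspace.new("/tmp/hammerspace")
h.clear                                    # drop the old translations (assumed buffered like other writes)
translations.each { |key, value| h[key] = value }
h["__translations_last_updated__"] = Time.now.to_i.to_s  # illustrative key name
h.close                                    # one flush: readers see all of it or none of it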
We also needed to ensure that strings are encoded as UTF-8 going into hammerspace so that we can force the encoding to be UTF-8 when pulling the strings back out. Sparkey just stores a sequence of bytes. Every hammerspace access results in a new string object created from the bytes stored in sparkey, so we cannot rely on the original string's encoding to persist.
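A small illustration of the round trip (the wrapper methods are hypothetical, not our actual integration code):
# Hypothetical helpers showing the encoding round trip.
def write_translation(h, key, value)
  h[key] = value.encode(Encoding::UTF_8)          # normalize to UTF-8 bytes on the way in
end

def read_translation(h, key)
  value = h[key]                                  # a new String built from sparkey's bytes
  value && value.force_encoding(Encoding::UTF_8)  # tag it as UTF-8 on the way out
end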
When we were using ruby hashes, translations were updated when a translator made a request, or when the process was killed and a new process was spawned. New processes were spawned fairly regularly, but a code deploy served as an upper bound: all processes would be restarted, and we were guaranteed to reload the latest translations from the database. With hammerspace, the loaded translations persist across process restarts, so the only thing that causes translations to be reloaded is a request made by a translator. It is unlikely that our translators can hit every machine in every locale in a timely manner, so we needed a new trigger for reloading translations, or at least for checking whether updated translations were available.
We considered checking for updated translations on every request, but that seemed like overkill. Ideally, a single process would check for updated translations every few minutes. If updated translations were available, that process would load them into the shared hammerspace files on the machine for all other processes to use. A rake task scheduled by a cron job fit the bill perfectly. And as a bonus, a rake task takes translation loading out of the request cycle entirely (except for translators).
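The task itself can stay small. This is a hedged sketch, not our actual task; the model, key name, and helper method are hypothetical.
# Hypothetical cron-driven refresh task.
namespace :translations do
  desc "Reload translations into hammerspace if the database has newer data"
  task :refresh => :environment do
    h = Hammerspace.new("/tmp/hammerspace")
    last_loaded = h["__translations_last_updated__"].to_i
    latest_in_db = Translation.maximum(:updated_at).to_i
    # Only pay the bulk-load cost when something actually changed.
    reload_translations_into(h) if latest_in_db > last_loaded
    h.close
  end
end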
The final piece of the puzzle was a before filter that runs on every request. The before filter simply closes any open hammerspace files so they are reopened on first access. Hammerspace reads are consistent to the time the reader is opened, so without the before filter, processes would continue to read from old hammerspace files indefinitely.
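A hedged sketch of that filter (the TranslationCache accessor is hypothetical; in a Rails app of that era this would be a before_filter):
# Hypothetical before filter; closing the handle forces a reopen, and thus a
# fresh view of the current hammerspace files, on the next translation lookup.
class ApplicationController < ActionController::Base
  before_filter :reopen_hammerspace

  private

  def reopen_hammerspace
    TranslationCache.hammerspace.close
  end
end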
Impact
"I would merge this so hard that only the true king of England could pull it out again."
Over the four hours between 17:15 and 21:15, we launched hammerspace to 20%, 50%, then 100% of our application servers. The graphs below show a few key metrics during the rollout.
The first graph shows the number of locales loaded and where they are loaded from. There are two trends in this graph. First, the number of locales loaded from the database drops by 86.5%, while the number of locales loaded from hammerspace increases. Second, the total number of locale loads from both sources drops by 72.6% as process lifetimes increase. The second graph, processes killed, confirms this. Fewer processes are killed, meaning processes are staying alive longer.
The third and fourth graphs show runs of the scheduled rake task and the number of processes using the hash store versus the hammerspace store. These are straightforward and trend as expected.
The fifth graph shows a decrease in total in-request garbage collection time. By looking at some other metrics, we were able to determine that the time of each garbage collection run remained roughly the same, so this decrease in total time is due to fewer garbage collection runs. In-request garbage collection runs decreased by 67.3%; total time decreased by 66.3%.
The final graph shows the number of locales available in each process. When locales were loaded into memory, processes were never able to get all locales loaded before they were killed. With hammerspace, it is possible for all locales to be loaded.
Thanks to the improvements in garbage collection time and less time spent loading locales and doing other process startup tasks, overall application response time dropped by 17.3%!
We had originally set out to curb a garbage collection regression. Along the way, we also fixed some inefficiencies in the way we load translations. But none of us expected such a dramatic improvement. In the future, we will definitely pay more attention to lower-level issues such as memory usage, garbage collection time, and application startup time.
Check out all of our open source projects over at airbnb.io and follow us on Twitter: @AirbnbEng + @AirbnbData
Originally published at nerds.airbnb.com on January 7, 2014.