Did you know there were NoSQL engines prior to 2009?

Olga Tchistopolskii
Oct 31, 2022


The NoSQL wave is the wave of technology that began around 2009. Basically, a “NoSQL engine” is “a storage engine that is not a relational database, usually produced after 2008”. Well-known examples are MongoDB, Redis, Hadoop, etc. But such engines existed before 2009. The most popular one (in the Internet era) was Memcached. My rewrite of Memcached was called Univca, and here is a little bit of its history.

In 2006, while employed by Live Nation, I was asked to write a Memcached replacement that would actually be robust and ‘just work.’ At that point, Memcached was a rather questionable piece of software with zero support, because the original author of Memcached had abandoned its code. You can find it all in the archives, I guess. Like I did in 2006. Anyway. There was no support for Memcached, there was not a single stress test, but there were (many) bugs. There were bugs in Memcached’s homegrown network protocol. There were bugs in Memcached’s homegrown memory allocator. The only expiration policy was LRU, so the implementation of more complex expiration policies was pushed out to the application. Re-fetching data and placing it back into the cache is also pushed out to the application. That creates race conditions, especially if you must deal with the expiration of objects that depend on each other. Compare this to a ‘universal’ cache that can handle all those complexities on its own and free developers from dealing with those issues. The Live Nation engineering management (the best I’ve met in SFBA) decided not to gamble their enterprise careers by deploying Memcached as a cornerstone of an enterprise website. Compare that to Digg’s engineering management, who at some point made a comparable gamble on Cassandra. And lost. You can find it all in the archives. Fun stuff.

Long story short. I implemented a simple HTTP 1.1 protocol (1.1 because of keep-alive), so that we could use Netscaler if we needed load balancing (we turned out fine without it). Univca version 1 supported everything Memcached did (get/put, etc.). The main difference was that Univca was stress-tested and benchmarked with AB (ApacheBench), which was possible only because Univca used HTTP as its protocol, so I got the stress-test system for free. Having a stress-test system means that I can deploy the component into an enterprise website and sleep well, knowing that the website will not collapse. Engineering approach vs. bullshit approach, basically. That was before 2008, of course. I also compared PHP/Memcached vs. PHP/Univca benchmarks running for days: there were no performance differences. So all that noise about Memcached being super efficient is mostly bullshit. It is not super efficient. Anybody can take the stock STL hashtable, slap an HTTP server on top, and get a (robust) Memcached clone in 1–2 weeks. Try it. Another interesting thing that came out of running C++ code for days was the impact that time has on the performance of a hash map vs. a binary tree. I suggest you try that too; it was a bit surprising. But then again, not many people have the task of stress-testing a caching server, right?

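
To give a feel for how little code the "stock STL hashtable" part takes, here is a minimal sketch of an LRU key/value store built from `std::unordered_map` and `std::list`. This is not Univca's code: the class name `LruCache`, its methods, and the string-only values are assumptions made for illustration, and the HTTP front end is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch (not the original Univca code): a fixed-capacity
// LRU key/value store on top of the standard hash table.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);  // a zero-capacity cache makes no sense here
    }

    // Returns the cached value, if any, and marks the key as most recently used.
    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);  // move to front
        return it->second->second;
    }

    // Inserts or overwrites a value; evicts the least recently used entry
    // when the cache is full.
    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (index_.size() == capacity_) {
            index_.erase(order_.back().first);  // drop the oldest key
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const { return index_.size(); }

private:
    using Entry = std::pair<std::string, std::string>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With a capacity of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `a` was touched more recently. A real server would still need locking (or a single-threaded event loop), memory accounting, and the protocol layer on top.
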
After version 1, I implemented MySQL read-through, so that Univca goes directly to the MySQL server if the key/value is missing from the cache. Then, in another couple of weeks, I implemented XSP, a PHP-to-C++ cross-compiler/transpiler of sorts. I re-used the caching component from Univca, but I also had to piggyback one more C++ component for that. That was before HipHop. The performance gains of XSP were measured in thousands of percent. We switched parts of the Live Nation production website from Zend+Memcached to XSP and went from 50 hps to 5000 hps, measured by AB on the local network, with a cache hit ratio close to 100% (on purpose, because that was the business case). That is, naturally, way better than HipHop, primarily because HipHop does not optimize the things that are worth optimizing.
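
The read-through idea can be sketched in a few lines. In this sketch the MySQL lookup is replaced by a caller-supplied callback, so no real database API is shown; `ReadThroughCache`, `Loader`, and `invalidate` are hypothetical names, and eviction is omitted to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of read-through caching: on a miss, the cache itself
// asks the backing store (here an injected callback standing in for MySQL).
class ReadThroughCache {
public:
    // The loader returns the value for a key, or nothing if the key
    // does not exist in the backing store.
    using Loader = std::function<std::optional<std::string>(const std::string&)>;

    explicit ReadThroughCache(Loader loader) : loader_(std::move(loader)) {
        assert(static_cast<bool>(loader_));  // a loader is required
    }

    std::optional<std::string> get(const std::string& key) {
        auto it = store_.find(key);
        if (it != store_.end()) return it->second;        // cache hit
        std::optional<std::string> loaded = loader_(key);  // miss: go to backend
        if (loaded) store_.emplace(key, *loaded);          // remember the result
        return loaded;
    }

    // Forgets a key so that the next get() goes back to the backend.
    void invalidate(const std::string& key) { store_.erase(key); }

private:
    Loader loader_;
    std::unordered_map<std::string, std::string> store_;
};
```

The point of the design is that the application only ever calls `get`; the fetch-and-store step that Memcached leaves to every client lives in one place, inside the cache. Misses for nonexistent keys are not cached in this sketch, so each such lookup hits the backend again.
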

I’m writing this down because I’ve been asked about it a lot recently. Also because it shows how long it took me to arrive at the ‘infinite MySQL’ architecture: there was some preceding work. To be precise, it all started several years earlier, when a little SFBA startup asked me to implement the fastest possible XSLT engine. The way we did it was by cross-compiling/transpiling XSLT stylesheets into C code. One can do some cool things that way. That’s why Steve Jobs was trying to shit on cross-compilers (see “Steve Jobs says cross-compilers (like Flash CS5) make sub-standard apps”).

Maybe somebody finds this interesting. Eventually.

For a while now, I have been waiting for somebody (anybody) to make the next natural generalized step after Memcached. Alas, nobody did. They are all rehashing Memcached, not adding much power to it. The next step is (naturally) to blend the caching layer with declarative layers, like ‘make’ did. Patentable, clearly.
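
One possible reading of "blend the caching layer with declarative layers, like make did" is a cache where derived entries declare what they depend on and how to rebuild themselves, so a change to a source entry marks everything downstream as stale. The sketch below is only an illustration of that reading, not a description of any existing product: `DeclarativeCache`, `set_source`, `declare`, and `Rule` are hypothetical names, and the dependency graph is assumed to be acyclic.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: a make-like cache. Derived keys declare their
// dependencies and a rebuild rule; values are rebuilt lazily when stale.
class DeclarativeCache {
public:
    // A rule receives the current values of the dependencies, in order.
    using Rule = std::function<std::string(const std::vector<std::string>&)>;

    // Sets a source value (a leaf) and marks everything built from it stale.
    void set_source(const std::string& key, std::string value) {
        Node& n = nodes_[key];
        n.deps.clear();
        n.rule = nullptr;
        n.value = std::move(value);
        n.fresh = true;
        mark_dependents_stale(key);
    }

    // Declares a derived value. Assumes each key is declared once and
    // that the dependency graph has no cycles.
    void declare(const std::string& key, std::vector<std::string> deps, Rule rule) {
        assert(static_cast<bool>(rule));
        for (const auto& d : deps) dependents_[d].push_back(key);
        Node& n = nodes_[key];
        n.deps = std::move(deps);
        n.rule = std::move(rule);
        n.fresh = false;
        mark_dependents_stale(key);
    }

    // Returns the value, rebuilding it (and its stale inputs) only if needed.
    // Throws std::out_of_range for keys that were never declared.
    const std::string& get(const std::string& key) {
        Node& n = nodes_.at(key);
        if (!n.fresh) {
            std::vector<std::string> inputs;
            for (const auto& d : n.deps) inputs.push_back(get(d));
            n.value = n.rule(inputs);
            n.fresh = true;
        }
        return n.value;
    }

private:
    struct Node {
        std::vector<std::string> deps;
        Rule rule;
        std::string value;
        bool fresh = false;
    };

    // A fresh node always has fresh inputs, so an already-stale node
    // implies its dependents are stale too and the walk can stop there.
    void mark_dependents_stale(const std::string& key) {
        auto it = dependents_.find(key);
        if (it == dependents_.end()) return;
        for (const auto& child : it->second) {
            Node& c = nodes_.at(child);
            if (c.fresh) {
                c.fresh = false;
                mark_dependents_stale(child);
            }
        }
    }

    std::unordered_map<std::string, Node> nodes_;
    std::unordered_map<std::string, std::vector<std::string>> dependents_;
};
```

For example, a `total` entry declared over `price` and `qty` is computed once, served from the cache on repeated reads, and rebuilt on the next read after `price` changes. The application never decides when to expire `total`; the declared dependencies do.
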

References

Univca was effectively a (load-balancing) server, consisting of an HTTP protocol layer with an LRU in-memory cache, which eventually mutated into a super-efficient PHP transpiler (hence becoming a pragmatic replacement for the entire LAMP stack).

After 2008, the following products inherited some of Univca’s features:

  1. Unicorn (application server; inherited the name)
  2. HipHop (inherited the PHP transpiler)
  3. Redis (tried to extend the key-value pair into a key-complex-type pair)

By Paul Tchistopolskii
