What Does OpenAI’s Acquisition of Rockset Mean for AI Companies?

Madhukar Kumar
madhukarkumar
Jun 22, 2024

In March of 2020, as the world huddled behind closed doors to stare into their glowing devices’ screens, a curious little tech company emerged from nowhere. It was called Clubhouse. Imagine a live audio platform where people could gather in virtual rooms, listen to speakers in real-time, and then watch the content vanish into the digital void once the session ended. This technology was eerily similar to another lesser-known groundbreaking tech that came before it — the Radio!

In just a few short months, Clubhouse became the shiny new toy everyone wanted to play with, boasting around 10 million active users per week.

Then, a year later, similar to the content that vanished into the ether after being aired, Clubhouse itself vaporized from the frequently overloaded and permanently ephemeral attention of the masses.

This was not the first time we saw such a movie.

A similar tale unfolded years ago with Evernote, though it still lingers on, or with del.icio.us, my personal favorite at the time. This creatively crafted tool, with its even more creative domain URL (https://del.icio.us), was a social bookmarking service that once held a special place in the hearts of early-2000s internet aficionados, yours truly included.

So, what happened to these companies and their once-coveted shiny tech toys? Why did they disappear into oblivion after reaching such dizzying heights in their 15 minutes of fame?

I call it the “Microsoftization” of companies.

Essentially, these entities were never truly products; they were features masquerading as products. And when features try to stand alone, the overlords of the industry (the behemoth tech giants) swoop in, incorporating these features into their own sprawling ecosystems, which already boast massive audiences and even more massive distribution channels. Pinterest absorbed the essence of del.icio.us, Twitter and Discord added audio-only channels, and Google, Microsoft, and Apple all have their own takes on note-taking in their ecosystems.

So, why am I bringing this up now?

Well, OpenAI just announced the acquisition of Rockset, a relatively lesser-known database.

To me, it screams of two things:

  1. AI is now a feature and not a product.
  2. Correspondingly, a Vector-only database is a feature, not a product.

Stay with me as we unpack this.

But, before we dive into what this means for companies, especially those currently using Rockset, let’s ponder this: Vector-only databases like Pinecone, Milvus, and Weaviate have been all the rage recently (anyone remember how Pinecone was valued at $750M when they raised $100M?)

So why did OpenAI choose to acquire Rockset, which touts itself as a real-time analytics database, instead of one of these trendy vector-only shiny options?

Before we answer this question, let’s look at what exactly Rockset is and what it is known for from a database perspective.

Let’s stroll down memory lane. Some years back, Google released an embedded key-value storage library called LevelDB. It was very fast as long as the data fit in memory, but it turned out to perform dismally once the dataset grew larger than the memory of the machine running it.

Some Facebook engineers who were working with this library forked it into an embedded database, open sourced it, and called it RocksDB. RocksDB was tuned for flash-based storage, so the fast performance was no longer limited to data that fits in memory.

Cut to a few years later and some of the same Facebook engineers then went on to start a database company built on top of RocksDB, and called it — yes you guessed it — Rockset.

Rockset took LevelDB and RocksDB a few steps further, adding connectors, support for real-time analytics, and three-tier storage that spans memory, local disks, and object storage. Believe it or not, this three-tiered storage architecture is the same approach SingleStore takes.
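To make the three tiers a little more concrete, here is a minimal sketch of a tiered read path in Python. It is a toy under stated assumptions, not how Rockset or SingleStore actually implement storage: the `TieredStore` class, its plain-dict tiers, and the oldest-first eviction rule are all hypothetical stand-ins.

```python
class TieredStore:
    """Toy read path: check memory first, then local disk, then object storage."""

    def __init__(self, object_store, memory_capacity=2):
        self.memory = {}                  # hot tier: smallest and fastest
        self.local_disk = {}              # warm tier: stand-in for local SSD
        self.object_store = object_store  # cold tier: stand-in for cloud object storage
        self.memory_capacity = memory_capacity

    def get(self, key):
        """Return (value, tier_that_served_the_read)."""
        if key in self.memory:
            return self.memory[key], "memory"
        if key in self.local_disk:
            value, tier = self.local_disk[key], "local_disk"
        else:
            value, tier = self.object_store[key], "object_store"
            self.local_disk[key] = value  # keep a copy on the warm tier
        self._promote(key, value)
        return value, tier

    def _promote(self, key, value):
        """Copy a value into memory, evicting the oldest entry when full."""
        if len(self.memory) >= self.memory_capacity:
            oldest = next(iter(self.memory))  # dicts preserve insertion order
            del self.memory[oldest]
        self.memory[key] = value


store = TieredStore({"a": 1, "b": 2, "c": 3})
first = store.get("a")   # cold read, served from object storage
second = store.get("a")  # now served from memory
store.get("b")
store.get("c")           # memory is full, so "a" is evicted
third = store.get("a")   # still on local disk
print(first, second, third)
```

The point of the layering is that hot data is served from the fastest tier while the full dataset lives cheaply in object storage.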

For full transparency, I work for SingleStore, and I promise this blog is not about what SingleStore does, but feel free to check out the docs on your own.

But the similarity between Rockset and SingleStore ends there, because Rockset works on semi-structured data (its primitive datatype is a Document) and does real-time analytics. In other words, you can bring in a JSON document, which then gets flattened into a table, and now you can run SQL queries, especially analytics-based queries, on the data. In SingleStore, on the other hand, you get support for ALL datatypes including JSON except for RDF (Graph) across trillions of rows with sub-second latency, but I digress.
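Here is a rough sketch of the "JSON document in, SQL analytics out" idea, using Python's built-in `sqlite3` as a stand-in. This is not Rockset's ingestion or indexing code; the `flatten` helper, the dotted column names, and the sample events are all made up for illustration.

```python
import json
import sqlite3


def flatten(doc, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical event documents, as they might arrive from an app or a stream.
raw_docs = [
    '{"user": {"id": 1, "country": "US"}, "event": "click", "ms": 120}',
    '{"user": {"id": 2, "country": "IN"}, "event": "click", "ms": 80}',
    '{"user": {"id": 1, "country": "US"}, "event": "view", "ms": 200}',
]
rows = [flatten(json.loads(d)) for d in raw_docs]

# Derive the columns from the documents themselves (no schema declared upfront).
columns = sorted({col for row in rows for col in row})

conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f'"{c}"' for c in columns)
conn.execute(f"CREATE TABLE events ({col_defs})")
placeholders = ", ".join("?" for _ in columns)
conn.executemany(
    f"INSERT INTO events VALUES ({placeholders})",
    [[row.get(c) for c in columns] for row in rows],
)

# An analytics-style query over the flattened fields.
result = conn.execute(
    'SELECT "user.country", COUNT(*), AVG(ms) FROM events '
    'GROUP BY "user.country" ORDER BY "user.country"'
).fetchall()
print(result)
```

Nested fields such as `user.country` become ordinary columns, which is what lets plain SQL aggregate over data that started life as documents.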

So back to OpenAI acquiring Rockset.

Why, on God’s green earth, did OpenAI acquire Rockset instead of a Vector-only database given that it is an AI company?

Two words — structured and semi-structured data.

Ok, that is more than two words, but you get the point.

When OpenAI released a feature called Assistants (we all know it as Agents!) last year, it shipped with a tool called Retrieval. I wrote about this at length here in case you are interested in going down rabbit holes.

Screenshot of OpenAI Assistant’s Retrieval Tool

This retrieval tool, which was available in both no-code and API versions, allowed users to upload documents that would become available as context for ChatGPT. In other words, it did Retrieval Augmented Generation (RAG) invisibly by using vector search (aka semantic search) as a feature! Hopefully you are now getting where I am going with this whole product vs feature thing.
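If you have not seen RAG spelled out, the core loop is small: embed the question, find the most similar chunk of your documents, and paste it into the prompt. Below is a toy sketch of that loop. It is not OpenAI's Retrieval tool; the bag-of-words `embed` function is a deliberately crude stand-in for a real embedding model, and the function names and sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=1):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders over 50 dollars.",
]
question = "How long do refunds take?"
top = retrieve(question, chunks)
prompt = build_prompt(question, chunks)
print(prompt)
```

Swap the toy `embed` for a real embedding model and the list of chunks for a database, and you have the feature that vector-only databases are selling as a product.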

Then a few weeks ago, OpenAI added a few new features: the ability to read through Google Sheets and create native tables for data analysis and building charts through ChatGPT.

OpenAI Data Analysis Feature
OpenAI Tables Features

So, what was missing from all of this?

Real-time analytics!

And real-time analytics, as we all know, is not typically done from PDF docs but rather from tabular aka structured and semi-structured data.

My bet is that OpenAI will now extend the Retrieval tool with features to connect with more structured data sources: MySQL, SingleStore (MySQL wire protocol compatible), Postgres, Kafka, and others. This means that users will now start bringing both structured and unstructured data into the OpenAI ecosystem.

But wait, what is all this talk about product vs feature then?

Three things:

  1. AI is a feature (powered by LLMs).
  2. Vectors are a feature (did I mention that Rockset and every other database and their grandmother already have vectors as a feature now?).
  3. Data is the real product.

The company that arguably created the first LLM is now betting on not just its LLM but also data as its true differentiators.

If you are a Rockset customer, I feel for you, but given that the query language is SQL, the outlook is not as bad as having to rewrite all your app code.

As for the data itself, consider taking control of it immediately (too bad Rockset’s snapshotting and restore process is still in private preview 🫣). I could give you a step-by-step guide, but it would be biased towards the product I love and work for.

However, if you think of yourself as an AI-differentiated company, I would really start thinking hard about data, especially real-time curation of YOUR data for your AI features.

✌️

Madhukar Kumar
madhukarkumar

CMO @SingleStore, tech buff, ind developer, hacker, distance runner ex @redislabs ex @zuora ex @oracle. My views are my own