The New World of Vectorization and AI-First
Author: Sam Ramji
In this episode of the Open||Source||Data podcast, Sam Ramji talks to Bob van Luijt, CEO and co-founder of SeMI Technologies and co-creator of Weaviate. They discuss vector databases, how they improve search, and the components that make up the AI-first ecosystem.
We’ve been through many stages of database development and storage over the years: hierarchical to relational, SQL to NoSQL, datacenter to cloud-native. According to Bob van Luijt, we’re now entering the AI-first era. Van Luijt is the CEO and co-founder of SeMI Technologies and co-creator of Weaviate, an open-source vector search engine that stores both objects and vectors.
In this episode of the Open||Source||Data podcast, I invited Bob van Luijt on the show to break down vectorization and the advantages it provides. We also discuss the AI-first ecosystem and the niches that make it up.
The Possibilities with Vectorization
As the co-creator of Weaviate, an open-source vector search engine, van Luijt was the perfect guest to break down vectorization. The idea of vector databases emerged when people were leaning into machine learning to solve problems but needed a way to efficiently search through millions of vectors at scale, either through a database or a search engine. So vector databases were born, which helped bring machine learning-based applications to production.
While you can vectorize any data, a traditional keyword search is limited to the literal information stored in the document. For example, if the JSON object of a document contains “Eiffel Tower” and its address, “Champ de Mars, 5 Av. Anatole France, 75007 Paris,” you would only find that document by searching for some combination of the words Eiffel, tower, and Paris. You wouldn’t find it by searching for “landmarks in France,” for instance.
However, if you ran the same query as a vector search over that JSON document, the engine would compare the vector of your query with the vectors of the stored documents and return the document closest to your search. A vector database allows people to describe the higher-order relationships between data sets and discover similarities that wouldn’t be possible with a traditional rectangular query.
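The difference between the two kinds of search can be sketched in a few lines of Python. This is only an illustration, not how Weaviate works internally: the three-dimensional vectors are written by hand (a real embedding model would produce hundreds of dimensions), the meaning of each axis is an assumption made up for this example, and the function names are hypothetical.

```python
import math

# Toy 3-dimensional "embeddings", hand-written for illustration only.
# Assumed axes: [landmark-ness, France-ness, food-ness]
documents = {
    "Eiffel Tower, Champ de Mars, 5 Av. Anatole France, 75007 Paris": [0.9, 0.8, 0.1],
    "Recipe for New York cheesecake": [0.0, 0.1, 0.9],
    "Statue of Liberty, Liberty Island, New York": [0.9, 0.1, 0.0],
}


def keyword_search(query, docs):
    """Return the documents whose text contains every word of the query."""
    words = query.lower().split()
    return [text for text in docs if all(w in text.lower() for w in words)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vector, docs):
    """Return the document whose vector is most similar to the query vector."""
    return max(docs, key=lambda text: cosine_similarity(query_vector, docs[text]))


query = "landmarks in France"
# Hypothetical embedding of the query on the same assumed axes.
query_vector = [0.8, 0.9, 0.0]

keyword_hits = keyword_search(query, documents)   # no document contains all three words
best_match = vector_search(query_vector, documents)

print("Keyword hits:", keyword_hits)
print("Closest vector:", best_match)
```

The keyword search comes back empty because no document literally contains the word “landmarks,” while the vector search returns the Eiffel Tower document because its vector points in nearly the same direction as the query vector. This sketch compares the query against every document in turn; a production vector search engine would typically rely on an index to avoid scanning millions of vectors one by one.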
“I think if you look at platforms like Hugging Face, you see the great things people are building and what people are vectorizing. [A vector is] just basically the thing that comes out of the machine learning model.” — Bob van Luijt, CEO and co-founder of SeMI Technologies and co-creator of Weaviate
The 4 Niches of the AI-First Ecosystem
According to van Luijt, we are seeing the third wave of database technology. The first wave was in the 1970s and ’80s with SQL. The second wave was NoSQL. Now, we’re entering what van Luijt calls an AI-first wave.
Within this wave are four niches:
- Embedding providers
- Neural search frameworks
- Feature stores
- Search engines
Embedding providers are companies like Hugging Face and OpenAI, which provide the embeddings needed for vectorization. Neural search frameworks, like Haystack or Jina, are tools for working with the models. Feature stores, like Featureform and Tecton, store the large amounts of data used for vectorization. And finally, search engines, like Weaviate, search through vectorized data at large scale.
“I’d have to argue that we see a third wave coming up, and the third wave, I simply call it AI-first.” — Bob van Luijt, CEO and co-founder of SeMI Technologies and co-creator of Weaviate
This AI-first wave of database technology is a new way of looking at, working with, and structuring data. We will still need the databases of prior eras, as they solve problems in classical applications and processes extremely well. But for the future of autonomous applications, we’ll need the AI-first stack. It’s an exciting time and there is much for us to learn together.
About Bob van Luijt
Bob van Luijt is the CEO of SeMI Technologies and co-creator of Weaviate, the open-source vector search engine. Bob started his first software company at age 15, while growing up in the Netherlands. He went to music school in Holland and moved to the U.S. to study music at Berklee College of Music in Boston, all while growing the company. Bob also has a TEDx talk on the relationship between language and software, and why he believes we can create anything with software.