Neural Search and the unbundling of Google

The idea of Neural Search isn’t new — it’s been a hot topic at NeurIPS (the leading deep learning conference) for the last five or so years. As with most technology developments, incremental progress usually lets incumbents capture the value. We believe there’s been a step change in the Neural Search space due to recent advances in LLMs, vector databases, neural frameworks and multi-modal AI, opening up the opportunity for startups in the space.

Sivesh Sukumar
Balderton
5 min read · Dec 21, 2022


People have been preaching the death of Google since it was founded — has the time finally come?

What’s happening?

Understanding the shift from symbolic search to semantic search

In a nutshell, Neural Search just means leveraging neural networks to understand and interpret user queries. It is an evolution of search engines, which until recently used symbolic / lexical algorithms (i.e. word matching). The issue with lexical search is that you have to define every single rule to robustly handle typos, acronyms and a range of other things that we take for granted when using search engines — for example, computers don’t just know that red can also mean maroon; they have to be told. Even tech titans struggle with search today — have you ever found yourself searching Twitter or LinkedIn via Google?
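To make that limitation concrete, here is a toy sketch over a hypothetical two-document corpus (the documents and queries are made up for illustration): pure word matching finds exact tokens, so a synonym never matches.

```python
# Toy lexical search: it only matches exact tokens, so a query for
# "maroon" never finds the document that says "red".
corpus = {
    "doc1": "a deep red evening dress",
    "doc2": "blue denim jacket",
}

def lexical_search(query: str, docs: dict) -> list:
    """Return ids of docs that contain every query token verbatim."""
    tokens = query.lower().split()
    return [doc_id for doc_id, text in docs.items()
            if all(tok in text.lower().split() for tok in tokens)]

print(lexical_search("red dress", corpus))     # ['doc1']
print(lexical_search("maroon dress", corpus))  # [] (the synonym is missed)
```

A semantic engine would map "maroon" and "red" to nearby points in an embedding space, so the second query would still surface doc1.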

Elasticsearch has been the go-to tool for lexical search for a decade (with Algolia hot on its tail), and even in the current environment it has a $5bn market cap. It’s clear Search is a large market, and that’s without discussing the $1tn company built on it. Around five years ago that company, Google, began to shift its algorithms from symbolic search to semantic search (i.e. algorithms which can understand and interpret intent), which is the basis of neural search, and it did so using BERT, one of the most notable LLMs.

But today, the landscape is changing, and many companies leverage semantic search. Platforms such as Hugging Face have opened up access to this technology, meaning semantic search is no longer reserved for Big Tech, which creates significant opportunities for startups in the space.

Note: It’s worth mentioning that there are often scenarios where symbolic search makes sense, so it’s optimal to find a balance of both symbolic and neural search (hybrid search).
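As a rough sketch of what hybrid search means in practice, one common approach is to blend a lexical score with a semantic (vector-similarity) score. The function names, toy vectors and the 0.5 weight below are illustrative assumptions, not any particular library’s API:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def token_overlap(query: str, doc: str) -> float:
    """Naive lexical score: fraction of query tokens found in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def hybrid_score(query, doc_text, query_vec, doc_vec, alpha=0.5):
    """alpha=1.0 is pure lexical; alpha=0.0 is pure semantic."""
    return (alpha * token_overlap(query, doc_text)
            + (1 - alpha) * cosine(query_vec, doc_vec))

# A doc phrased with synonyms scores 0 lexically but can still rank
# well if its embedding sits close to the query's embedding.
score = hybrid_score("red dress", "maroon gown", [0.9, 0.1], [0.8, 0.2])
```

Tuning alpha is how real systems decide when exact matching (product codes, names) should dominate over semantic similarity.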

Sentence Transformers and Embedding Valuation

Why now?

Opening up access to embedding models, vector databases, neural frameworks and multi-modal AI

Much of Machine Learning is built on vector embeddings, which essentially means converting content (text, video, images etc.) into vectors so that computers can understand it. Vector databases handle the unique structure of vectors and are able to group them, allowing for fast retrieval and similarity search. Vector databases have a huge range of benefits, from semantic search, to ranking, to deduplication and anomaly detection. Crucially, they not only help you find information but also help you group and detect similarities, which is great for matching engines — an increasingly important technology in the rise of B2B marketplaces.
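At its core, the retrieval step a vector database performs can be sketched as a nearest-neighbour search over stored embeddings. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions, and production databases use approximate indexes (e.g. HNSW) rather than the brute-force scan shown here:

```python
import numpy as np

# Hypothetical stored embeddings, one row per document.
embeddings = np.array([
    [0.90, 0.10, 0.00],   # "red dress"
    [0.80, 0.20, 0.10],   # "maroon gown" (close in meaning, close in space)
    [0.00, 0.10, 0.95],   # "denim jacket"
])

def nearest(query_vec: np.ndarray, index: np.ndarray, k: int = 2) -> np.ndarray:
    """Indices of the k rows most cosine-similar to the query vector."""
    sims = (index @ query_vec) / (
        np.linalg.norm(index, axis=1) * np.linalg.norm(query_vec))
    return np.argsort(-sims)[:k]

query = np.array([0.85, 0.15, 0.05])  # imagined embedding of "dark red dress"
top = nearest(query, embeddings)       # the two clothing-colour docs rank first
```

The same similarity machinery powers the grouping and deduplication use cases mentioned above: near-duplicates are simply points that sit very close together in the embedding space.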

But until recently, only tech giants could develop and manage these vector databases, as it’s hard to get an ROI unless they’re calibrated correctly. Now, that’s starting to change, with new models and startups making them accessible to the masses.

Last week OpenAI released a new model to make SOTA embeddings incredibly cost-effective, and Cohere also has its roots in these concepts. A range of startups are on a mission to bring vector search to the masses, including the vector databases Weaviate and Qdrant. Some notable Neural Search frameworks, which help tie everything together, are Deepset and Jina.ai.

Neural Search is intertwined with the concept of multi-modal AI: models that can handle multiple modes of content (image, text or video). This increases the amount of contextual information a model has and can greatly improve performance, which we’re seeing with multi-modal Neural Search models. The idea of being able to search all the files on your laptop at once (not just by name but also by scanning the underlying content, whether it be text, image or video) has long been discussed. Apple gave it a good go with Spotlight but fell short — now Rewind is taking this head on with multi-modal AI. In the same way Algolia emerged as “Elasticsearch but easy”, we’re seeing a range of companies emerge with the same mindset for multi-modal Search, such as Marqo, Twelvelabs and Nuclia.

What’s next?

Verticalised Search Applications and the unbundling of Google

We’ve all heard tales of the death of Google, but it’s worth noting that roughly 40% of Gen Z go to TikTok or Instagram, rather than Google, for everyday searches. With SOTA search technology now out in the open and no longer locked away in Mountain View, there’s an emerging opportunity for search applications that focus on a specific vertical and a great UX.

For example, Bloop is building a platform for searching large and complex enterprise codebases with ease — once a superpower granted only to engineers at Facebook and Google, but now a productivity boost available to engineers at any enterprise. Similar business models are being applied to verticals such as medical, legal and financial services, and I’m excited to see how they evolve.

Vertical Search Apps — Market Map

Final thoughts

There’s a lot of deserved hype surrounding Generative AI, and in many ways Search is just overfitted Generative AI. Google has long been the gateway to the internet, and ChatGPT has shown us that might not always be the case. At Balderton, we believe that Search is one of the most exciting areas in AI today — we’ve backed many AI-native companies in the past 12 months, such as Levity, PhotoRoom and more that are unannounced. If you’re building in the space, feel free to reach out to me at ssukumar@balderton.com.
