The Next Generation of Gen AI Apps and How to Build One

Madhukar Kumar
madhukarkumar
6 min read · May 7, 2024


Sitting in a room at the iconic British Library is a book that was printed more than 1,000 years ago.

It is called the Diamond Sutra.

The ancient artifact, printed in 868 AD, is considered the earliest dated printed book. It records a spoken dialogue between the Buddha and his disciple.

Remarkable as the book’s subject is, what is even more interesting is that it took a few hundred years after the first printed book appeared for us humans to realize the power of the new medium. Eventually, the mass production of printed books led to the diffusion of knowledge about politics, religion, music, theatre, and fiction, as well as the coining of the famous adage, “The pen is mightier than the sword.”

Printed books led to newspapers, a way to disseminate news to the masses. Then came radio, which initially mimicked the newspaper and was used primarily to distribute news. Finally came TV, which initially mimicked radio by airing shows that were like radio shows but with pictures.

Call it “Media Mimicry” or “Evolution through Echoes,” but throughout the ages, every new medium has begun by mimicking the one before it. Only later was its full potential realized, transforming human civilization and taking it to the next level.

We are seeing this all over again with the web and year two of generative AI.

Many believe that the web has already been through three evolutions. Web 1.0 was read-only and looked a lot like newspapers.

Web 2.0 became read-write: users started engaging with websites that evolved into web apps, and they began generating their own content through social media and similar platforms.

Web 3.0, some argue, was about reading, writing, and owning content through decentralization with blockchain.

And now we are seeing the birth of a new web. Unlike Web 1.0, which had no personalization, and Web 2.0, which had partial personalization, the new web is custom-built for each individual. It is driven by knowledge synthesis and is agentic (it can take actions on your behalf).

Above all, the new web is conversational.

Let’s look at an example (see video below). The user chats with an LLM and gets back not just text but also videos, images, and live widgets. The user can then save these widgets and artifacts, and every conversation thread becomes a new “web page” and another layer of context for future interactions.

In this article, we will look at how to build this new generation of generative UI apps.

But before we get to that, let’s answer a crucial question. LLMs have become commoditized to an extent in the last few months, so if everyone has access to the same LLMs, how can your app stand out? What makes your application different from every other one out there?

The answer is less dramatic than you might expect. It is something we humans have relied on since the dawn of civilization to make each of us unique: knowledge and tools.

Differentiating your AI App

What sets any AI app apart from the plethora of apps available today comes down to two key factors: knowledge and tools (APIs). Together, these are evolving into what some companies call agents and others call assistants. Each agent or assistant has custom knowledge and custom tools, or skills, that in some ways mimic humans.

Let’s look at each of these from the perspective of a company that is looking to differentiate its products in this rapidly changing age of AI.

1. Knowledge: the data or information specific to an agent, consisting of both structured and unstructured data. For companies, this includes relational data in tables across different databases, semi-structured data such as JSON, and unstructured data such as PDFs, images, and videos.

2. Tools: other apps that an agent or assistant can use to take actions, such as searching the web, submitting a form, or fetching additional information. For companies, these include internal and external API calls that let them perform actions, for example, sending an email or updating a payroll system.
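To make the two ingredients concrete, here is a minimal TypeScript sketch of an agent as a bundle of knowledge sources and tools. All names here (`Agent`, `Tool`, `invokeTool`, the `payroll-assistant` example) are hypothetical and do not correspond to any OpenAI, Vercel, or SingleStore API; the `send_email` tool is a stub standing in for a real API call.

```typescript
// Hypothetical types for illustration; not an OpenAI, Vercel, or SingleStore API.
type KnowledgeSource =
  | { kind: "structured"; table: string }           // relational tables
  | { kind: "semi-structured"; collection: string } // JSON documents
  | { kind: "unstructured"; uri: string };          // PDFs, images, videos

interface Tool {
  name: string;
  description: string;
  // A real tool would call an internal or external API; here it is a stub.
  run: (args: Record<string, unknown>) => Promise<string>;
}

interface Agent {
  name: string;
  knowledge: KnowledgeSource[];
  tools: Tool[];
}

const payrollAssistant: Agent = {
  name: "payroll-assistant",
  knowledge: [
    { kind: "structured", table: "employees" },
    { kind: "semi-structured", collection: "benefit_plans" },
    { kind: "unstructured", uri: "docs/payroll-policy.pdf" },
  ],
  tools: [
    {
      name: "send_email",
      description: "Send an email to an employee",
      // Stub: pretends to queue an email and reports back.
      run: async (args) => `email queued for ${String(args.to)}`,
    },
  ],
};

// Find a tool by name and run it, as an agent would when it decides to act.
async function invokeTool(agent: Agent, toolName: string, args: Record<string, unknown>): Promise<string> {
  const tool = agent.tools.find((t) => t.name === toolName);
  if (!tool) throw new Error(`unknown tool: ${toolName}`);
  return tool.run(args);
}

invokeTool(payrollAssistant, "send_email", { to: "ana@example.com" }).then(console.log);
```

Two agents built on the same LLM differ only in what goes into `knowledge` and `tools`, which is exactly where the differentiation lives.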

When it comes to managing knowledge or data for building applications, the entire world seems to be heavily over-rotated toward mining unstructured data using Retrieval Augmented Generation (RAG).

But here is the dirty secret about data in large enterprises: most of their business-critical data is structured.

So, how do you do RAG over structured data?

What good is a vector database when it comes to either relational tables or mining insights through analytics?

I can think of a couple of options.

Option 1: Use LLMs to convert user queries from natural language to SQL and send them to the database of your choice. However, this approach has a few issues. First, LLM-generated SQL is often wrong; most LLMs are only about 30% accurate when generating SQL. Second, many companies have dashboards backed by SQL statements more than three pages long, and LLMs are not yet good enough to build such queries. Finally, the data carries context that is specific to each company. For example, my company defines its fiscal year differently from other companies.
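The fiscal-year problem is easy to underestimate, so here is a small sketch of the company-specific context an NL-to-SQL model would need to know. It assumes a hypothetical convention in which the fiscal year starts on the first day of `startMonth` and is named after the calendar year in which it ends; a model that silently assumes a calendar year produces a different date filter for the same question.

```typescript
// Hypothetical company convention: the fiscal year starts on the 1st of `startMonth`
// and is named after the calendar year in which it ends (e.g. FY2024 ends in 2024).
interface FiscalCalendar {
  startMonth: number; // 1 = January ... 12 = December
}

const pad = (n: number): string => String(n).padStart(2, "0");

// Half-open date range [start, end) covered by fiscal year `fy`.
function fiscalYearRange(fy: number, cal: FiscalCalendar): { start: string; end: string } {
  const startYear = cal.startMonth === 1 ? fy : fy - 1;
  const start = `${startYear}-${pad(cal.startMonth)}-01`;
  const end = `${startYear + 1}-${pad(cal.startMonth)}-01`;
  return { start, end };
}

// The WHERE-clause fragment that "revenue for FY2024" actually means at this company.
function fiscalYearFilter(column: string, fy: number, cal: FiscalCalendar): string {
  const { start, end } = fiscalYearRange(fy, cal);
  return `${column} >= '${start}' AND ${column} < '${end}'`;
}

const calendarYear: FiscalCalendar = { startMonth: 1 };
const ourCompany: FiscalCalendar = { startMonth: 2 }; // hypothetical: FY starts Feb 1

console.log(fiscalYearFilter("order_date", 2024, calendarYear));
// order_date >= '2024-01-01' AND order_date < '2025-01-01'
console.log(fiscalYearFilter("order_date", 2024, ourCompany));
// order_date >= '2023-02-01' AND order_date < '2024-02-01'
```

Unless this rule is spelled out to the model on every request, "FY2024" quietly turns into the wrong eleven months of overlap.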

Option 2: Use function calling with pre-written SQL queries, combined with hybrid search over structured and unstructured data (incidentally, SingleStore allows SQL over structured and vector data in a single query). In this option, the SQL queries are written or reused, then exposed as functions in an OpenAPI schema so that LLMs can use them as tools. The functions become API endpoints that LLMs can discover and call when users ask specific questions, instead of generating SQL directly.
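Here is a minimal sketch of Option 2, assuming a hypothetical `orders` table. Each pre-written, parameterized SQL query is paired with a tool definition in the JSON Schema-based shape used by OpenAI’s function calling (an OpenAPI spec would describe the same parameters as an HTTP endpoint). Only the definitions are sent to the LLM; when it requests a tool, the app fills the query’s placeholders from the arguments rather than running model-written SQL.

```typescript
// A pre-written, reviewed SQL query exposed to the LLM as a callable tool.
interface SqlTool {
  definition: {
    type: "function";
    function: {
      name: string;
      description: string;
      parameters: {
        type: "object";
        properties: Record<string, { type: string; description: string }>;
        required: string[];
      };
    };
  };
  sql: string;          // parameterized with ? placeholders
  paramOrder: string[]; // which argument fills each placeholder, in order
}

// Hypothetical tool over a hypothetical `orders` table.
const tools: SqlTool[] = [
  {
    definition: {
      type: "function",
      function: {
        name: "revenue_by_region",
        description: "Total revenue per region between two dates (start inclusive, end exclusive).",
        parameters: {
          type: "object",
          properties: {
            start_date: { type: "string", description: "ISO date, e.g. 2024-02-01" },
            end_date: { type: "string", description: "ISO date, exclusive" },
          },
          required: ["start_date", "end_date"],
        },
      },
    },
    sql:
      "SELECT region, SUM(amount) AS revenue FROM orders " +
      "WHERE order_date >= ? AND order_date < ? GROUP BY region ORDER BY revenue DESC",
    paramOrder: ["start_date", "end_date"],
  },
];

// What gets sent to the LLM: only the schemas, never the SQL text.
const toolDefinitions = tools.map((t) => t.definition);

// When the LLM returns a tool call, map it onto the pre-written query.
function toQuery(name: string, argsJson: string): { sql: string; params: unknown[] } {
  const tool = tools.find((t) => t.definition.function.name === name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  const args = JSON.parse(argsJson) as Record<string, unknown>;
  for (const key of tool.definition.function.parameters.required) {
    if (!(key in args)) throw new Error(`missing argument: ${key}`);
  }
  return { sql: tool.sql, params: tool.paramOrder.map((k) => args[k]) };
}

console.log(toQuery("revenue_by_region", '{"start_date":"2023-02-01","end_date":"2024-02-01"}'));
```

Because the LLM only ever chooses a tool and supplies arguments, the SQL itself stays human-written and reviewable, and the arguments travel as bound parameters rather than being spliced into the query text.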

Fun fact: OpenAI seems to be headed in this direction as well, since both its Assistants and custom GPTs have the same constituents: retrieval (upload your files) and function calling (also called actions or tools) to call APIs. For retrieval, OpenAI recently announced that up to 10,000 files can be uploaded to a single assistant. However, OpenAI still doesn’t support retrieval over structured data, and if I were to guess, they might add it by acquiring a small database company.

How do we build Next-Gen AI apps?

Let’s break the app down into two key requirements. For this example, I will use Next.js with the Vercel AI SDK and its generative UI (GenUI) features, since together they give us a full-stack React framework. For the backend, we will use SingleStore, a full-stack database that supports SQL-based relational data, JSON, and vector data types, along with split-second analytics across petabytes of data.

Requirement 1 (UI): The ability to respond with data in the form of widgets. When a user asks a question, we want the response to include not only text but also widgets, such as buttons and charts with visual analytical insights.
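One way to model Requirement 1 is to treat each reply as a list of typed parts, where each part type maps to its own UI component. The sketch below is plain TypeScript with hypothetical part types; `render` returns strings as a stand-in for the React components a Next.js app would stream to the browser, and it is not the Vercel AI SDK’s API.

```typescript
// Hypothetical response shape: a reply is a list of parts, each rendered by a different component.
type ResponsePart =
  | { type: "text"; content: string }
  | { type: "chart"; title: string; kind: "bar" | "line"; points: { label: string; value: number }[] }
  | { type: "button"; label: string; action: string };

// Stand-in for React components: each part type gets its own renderer.
function render(part: ResponsePart): string {
  switch (part.type) {
    case "text":
      return part.content;
    case "chart": {
      const series = part.points.map((p) => `${p.label}: ${p.value}`).join(", ");
      return `[${part.kind} chart "${part.title}"] ${series}`;
    }
    case "button":
      return `[button "${part.label}" -> ${part.action}]`;
    default: {
      // Compile-time check that every part type is handled.
      const unreachable: never = part;
      throw new Error(`unknown part: ${JSON.stringify(unreachable)}`);
    }
  }
}

const reply: ResponsePart[] = [
  { type: "text", content: "Revenue grew in every region last quarter." },
  {
    type: "chart",
    title: "Revenue by region",
    kind: "bar",
    points: [{ label: "EMEA", value: 120 }, { label: "APAC", value: 95 }],
  },
  { type: "button", label: "Save to my page", action: "save_artifact" },
];

console.log(reply.map(render).join("\n"));
```

A discriminated union keeps the contract between the LLM’s structured output and the UI explicit: adding a new widget type forces every renderer to handle it.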

Requirement 2 (Data): We want the database to run SQL queries with analytics and combine them with keyword and vector search over unstructured data. For example, a user might upload an image of a shoe and ask to see similar products, filtered by brand, price, popularity, and availability.
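The shoe example boils down to two steps: structured filters narrow the candidates, and vector similarity ranks whatever is left. The in-memory sketch below shows that logic with cosine similarity and toy three-dimensional vectors; the catalog, field names, and embeddings are all hypothetical, and in practice the database would do this work inside one query over real image embeddings.

```typescript
interface Product {
  id: string;
  brand: string;
  price: number;
  popularity: number;  // e.g. units sold in the last 30 days
  inStock: boolean;
  embedding: number[]; // image embedding; toy 3-d vectors here
}

interface Filters {
  brands?: string[];
  maxPrice?: number;
  minPopularity?: number;
  inStockOnly?: boolean;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Structured filters narrow the candidates; vector similarity ranks what remains.
function similarProducts(query: number[], catalog: Product[], f: Filters, k = 3): Product[] {
  return catalog
    .filter((p) => !f.brands || f.brands.includes(p.brand))
    .filter((p) => f.maxPrice === undefined || p.price <= f.maxPrice)
    .filter((p) => f.minPopularity === undefined || p.popularity >= f.minPopularity)
    .filter((p) => !f.inStockOnly || p.inStock)
    .map((p) => ({ p, score: cosine(query, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.p);
}

// Hypothetical catalog.
const catalog: Product[] = [
  { id: "runner-a", brand: "Acme", price: 120, popularity: 900, inStock: true, embedding: [0.9, 0.1, 0.0] },
  { id: "runner-b", brand: "Acme", price: 180, popularity: 400, inStock: true, embedding: [0.8, 0.2, 0.1] },
  { id: "boot-c", brand: "Acme", price: 90, popularity: 700, inStock: true, embedding: [0.1, 0.9, 0.2] },
  { id: "runner-d", brand: "Zeta", price: 110, popularity: 950, inStock: true, embedding: [0.95, 0.05, 0.0] },
  { id: "runner-e", brand: "Acme", price: 100, popularity: 800, inStock: false, embedding: [0.92, 0.08, 0.0] },
];

const uploadedShoe = [1, 0.1, 0]; // embedding of the user's photo (hypothetical)
const hits = similarProducts(uploadedShoe, catalog, { brands: ["Acme"], maxPrice: 150, inStockOnly: true });
console.log(hits.map((p) => p.id).join(", ")); // runner-a, boot-c
```

Note that `runner-d` and `runner-e` are the closest visual matches but are excluded by brand and availability, which is why the structured side of the query matters as much as the vector side.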

Here is an example of a single powerful query that involves vector search, relational joins and analytics (example shown with SingleStore).
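As a hedged sketch of what such a query might look like (not the query from the original example), here is a parameterized SingleStore SQL statement built in TypeScript. It assumes hypothetical `products`, `inventory`, and `orders` tables and uses SingleStore’s `DOT_PRODUCT` and `JSON_ARRAY_PACK` functions on a packed-blob embedding column; `DOT_PRODUCT` only equals cosine similarity when the stored and query vectors are normalized.

```typescript
// Hypothetical inputs for one "find similar products" request.
interface SimilarProductsQuery {
  embedding: number[]; // query image embedding (assumed normalized)
  brands: string[];
  maxPrice: number;
  limit: number;
}

// Builds one query mixing vector search, relational joins, and an aggregate (30-day orders).
function buildSimilarProductsSql(q: SimilarProductsQuery): { sql: string; params: unknown[] } {
  if (q.brands.length === 0) throw new Error("at least one brand is required");
  if (!Number.isInteger(q.limit) || q.limit <= 0) throw new Error("limit must be a positive integer");
  const brandPlaceholders = q.brands.map(() => "?").join(", ");
  const sql = `
WITH recent_sales AS (
  SELECT product_id, COUNT(*) AS orders_30d
  FROM orders
  WHERE created_at >= NOW() - INTERVAL 30 DAY
  GROUP BY product_id
)
SELECT p.id, p.name, p.brand, p.price,
       DOT_PRODUCT(p.image_embedding, JSON_ARRAY_PACK(?)) AS similarity,
       COALESCE(s.orders_30d, 0) AS orders_30d
FROM products p
JOIN inventory i ON i.product_id = p.id
LEFT JOIN recent_sales s ON s.product_id = p.id
WHERE p.brand IN (${brandPlaceholders})
  AND p.price <= ?
  AND i.quantity > 0
ORDER BY similarity DESC, orders_30d DESC
LIMIT ${q.limit}`.trim();
  // Parameter order must match the order of ? placeholders in the SQL text.
  const params = [JSON.stringify(q.embedding), ...q.brands, q.maxPrice];
  return { sql, params };
}

const { sql, params } = buildSimilarProductsSql({
  embedding: [0.1, 0.2, 0.3],
  brands: ["Acme", "Zeta"],
  maxPrice: 150,
  limit: 10,
});
console.log(sql);
console.log(params);
```

The resulting `sql` and `params` can be passed to any MySQL-compatible client, since SingleStore speaks the MySQL wire protocol. The only value interpolated directly into the text is the validated integer `limit`; everything user-supplied is bound as a parameter.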

Now let’s look at the overall flow of how this gets called using an LLM’s function calling feature.
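The flow can be sketched as a loop: send the conversation to the model, run any tool it asks for, append the result, and repeat until it answers in plain text. The sketch below simulates that loop in plain TypeScript with a scripted `fakeModel` and a stub `revenue_by_region` tool, both hypothetical; with the OpenAI API or the Vercel AI SDK, the model call and message format would differ.

```typescript
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | null; toolCall?: { name: string; args: string } }
  | { role: "tool"; name: string; content: string };

type Model = (history: Message[]) => Message;

// Hypothetical stand-in for the LLM: requests a tool first, then answers once it sees the result.
const fakeModel: Model = (history) => {
  const last = history[history.length - 1];
  if (last.role === "tool") {
    return { role: "assistant", content: `Here are the results: ${last.content}` };
  }
  return {
    role: "assistant",
    content: null,
    toolCall: { name: "revenue_by_region", args: '{"start_date":"2023-02-01","end_date":"2024-02-01"}' },
  };
};

// Hypothetical tool implementations; a real one would run its pre-written SQL against the database.
const toolImpls: Record<string, (args: Record<string, unknown>) => string> = {
  revenue_by_region: () => JSON.stringify([{ region: "EMEA", revenue: 120 }]),
};

// The function-calling loop: model -> tool -> model, until a plain-text answer arrives.
function runConversation(userText: string, model: Model, maxSteps = 5): { answer: string; history: Message[] } {
  const history: Message[] = [{ role: "user", content: userText }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(history);
    history.push(reply);
    if (reply.role !== "assistant") throw new Error("model must reply as assistant");
    const call = reply.toolCall;
    if (!call) return { answer: reply.content ?? "", history };
    const impl = toolImpls[call.name];
    if (!impl) throw new Error(`unknown tool: ${call.name}`);
    const result = impl(JSON.parse(call.args) as Record<string, unknown>);
    history.push({ role: "tool", name: call.name, content: result });
  }
  throw new Error("too many tool calls");
}

const { answer } = runConversation("What was revenue by region last fiscal year?", fakeModel);
console.log(answer);
```

The `maxSteps` cap is a simple guard against a model that keeps requesting tools without ever producing an answer.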

Conclusion

A new medium always mimics the old one until it reaches an inflection point and evolves into something entirely new. With generative AI, we are now squarely in the middle of that inflection point. In this article, we looked at the fundamentals and architecture of building enterprise-grade, next-gen AI apps.

I will also be sharing the demo app URL and the GitHub repo shortly, so you can use the code as a starter for your own next-gen AI app.

✌️

Madhukar Kumar

CMO @SingleStore, tech buff, indie developer, hacker, distance runner ex @redislabs ex @zuora ex @oracle. My views are my own