LLM — Landscape

Chaim Turkel
Published in Israeli Tech Radar
5 min read · May 20, 2024

LLMs, and providers such as Gemini and OpenAI, are currently generating a lot of buzz on the internet and within many companies. Any company that takes itself seriously must have some form of AI in its toolkit.

I’ve recently started a proof of concept (POC) for an LLM project at a company and would like to share my journey and the insights we’ve gained along the way with you.

I work at a professional services company called Tikal, where we assist other companies with their projects. Recently, we received a client request for assistance with an LLM (Large Language Model) project. The project is led by a data scientist who needed help creating a proof of concept (POC) for the CTO. I joined forces with another backend developer, and later on, another data scientist joined our team.

The actual details of the project do not matter, so I will focus on the technological aspects and, even more, on the scope of the project.

Quality

No matter your project — whether it involves creating content, activating APIs, or retrieving documents — the most crucial step is to validate the quality of your application.

Like any ML model you create, you need to take the following steps:

Data Preparation: Depending on what you want to do with the LLM, you need to make sure that the data you feed the LLM is as good as possible.

Training the Model: While prompt engineering is essential for getting the most out of any LLM application, in some cases, fine-tuning the underlying LLM itself can provide an extra layer of customization for your specific needs.

Testing the Model: It is essential to have a varied and thorough test suite to verify that your code is operating correctly and yielding the anticipated outcomes. It is also common practice to let others use the application and record their interactions, in order to build a realistic set of tests.

You also need to save the parameters under which the application was run alongside the test set. This includes details such as the language model (e.g., Gemini, OpenAI) and its version. If you’re using prompt engineering, it’s important to version your prompts and save the version numbers accordingly.
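One way to keep test results reproducible is to persist the run configuration next to them. The sketch below is illustrative: the field names, the `RunConfig` record, and the example values are assumptions, not a specific tool's schema.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record of the configuration a test run was executed under.
@dataclass
class RunConfig:
    model: str           # e.g. "gemini" or an OpenAI model name
    model_version: str
    prompt_id: str       # identifier of the versioned prompt template
    prompt_version: int
    temperature: float

def save_test_run(config: RunConfig, results: dict, path: str) -> None:
    """Persist the run configuration alongside the test results."""
    with open(path, "w") as f:
        json.dump({"config": asdict(config), "results": results}, f, indent=2)

config = RunConfig("gemini", "1.5", "summarize-movie", 3, 0.2)
save_test_run(config, {"passed": 41, "failed": 2}, "run_2024_05_20.json")
```

With every test run stored this way, you can later trace a regression back to the exact model version and prompt version that produced it.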

Model Validation: Just as with testing, you should be able to perform a suite of checks to validate whether your application is still producing the correct results that you desire.

Model Deployment: One of the major challenges in deployment is monitoring your model to ensure that it does not drift. New tools, such as Langsmith, have been developed to address this issue.
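At its simplest, drift detection means comparing a quality metric over recent traffic against a baseline. The sketch below is a hand-rolled stand-in for what tools like Langsmith automate; the scores and the tolerance threshold are made-up example values.

```python
from statistics import mean

def drift_alert(baseline_scores: list[float],
                recent_scores: list[float],
                tolerance: float = 0.1) -> bool:
    """Flag drift when the mean quality score drops beyond the tolerance."""
    return (mean(baseline_scores) - mean(recent_scores)) > tolerance

# Hypothetical evaluation scores from two time windows.
baseline = [0.91, 0.88, 0.93, 0.90]
recent = [0.72, 0.70, 0.75, 0.71]
print(drift_alert(baseline, recent))  # True: the drop exceeds the tolerance
```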

What does your app do?

Now that we have a framework for running and testing the app, we can start to look at what the app does.

“LLM” is an umbrella term covering multiple types of applications. The simplest, of course, is RAG.

RAG

The core idea behind a RAG (Retrieval-Augmented Generation) application is to incorporate context that the language model (LLM) does not inherently possess into its requests. This can be achieved through a vector database or any data source that suits your needs. Several existing products facilitate this process: hosted solutions like Bedrock, frameworks like LangChain, or pipelines such as Airbyte feeding ChromaDB.
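The retrieve-then-prompt flow can be sketched in a few lines. This is a toy: the character-frequency "embedding" stands in for a real embedding model, and the in-memory list stands in for a vector database.

```python
import math

def embed(text: str) -> list[float]:
    """Toy embedding: a character-frequency vector. A real system would
    call an embedding model instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query; a vector DB does this at scale."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Inject the retrieved context into the request sent to the LLM."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["The movie was filmed in New Zealand.", "The director won an award."]
print(build_prompt("Where was the movie filmed?", docs))
```

The point is the shape of the flow, not the similarity function: embed the query, fetch the closest context, and prepend it to the prompt.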

Data Enrichment

Let’s say Netflix decided to add a feature that allows users to search for movies based on their content, whether it’s the location where the movie was filmed or specific phrases and sentences spoken in the movie.

Starting simply is always best. Therefore, for movie metadata that Netflix can obtain from external sources, I would store it in a standard relational database.
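For that relational side, a plain SQL table is enough. The schema and the example row below are illustrative, not Netflix's actual data model.

```python
import sqlite3

# A minimal relational schema for externally sourced movie metadata.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movies (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        filming_location TEXT,
        release_year INTEGER
    )
""")
conn.execute(
    "INSERT INTO movies (title, filming_location, release_year) VALUES (?, ?, ?)",
    ("The Piano", "New Zealand", 1993),
)
rows = conn.execute(
    "SELECT title FROM movies WHERE filming_location = ?", ("New Zealand",)
).fetchall()
print(rows)  # [('The Piano',)]
```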

Now we come to the text spoken in the movie. Netflix would need to get this from a third-party source or use a speech-to-text library. Once the text is extracted, the best way to search it is with a vector database. There are many out there; I will not compare them here, but I will list some issues that you need to take into account:

  1. Chunking of the data. The text of a movie can be very lengthy, and searching within long texts is not always efficient.
  2. Caching: Since LLM calls can be expensive, check whether you can cache requests so that not every request has to go to the LLM. This also improves performance.
  3. Retrieving Data: We know how to add and retrieve data from a vector database, and we know how to do it from a relational DB, but what about both? What do you do when you need to select by multiple criteria? This is something to take into account when choosing a vector database.
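The chunking and caching points above can be sketched in Python. The chunk sizes are arbitrary, and `cached_llm_call` is a placeholder stub, not a real LLM client.

```python
from functools import lru_cache

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split long text into overlapping chunks so each fits an embedding/search window."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

@lru_cache(maxsize=1024)
def cached_llm_call(prompt: str) -> str:
    """Memoize identical prompts so repeats never reach the (expensive) LLM."""
    return f"response for: {prompt}"  # placeholder instead of a real API call

chunks = chunk_text("x" * 500, size=200, overlap=50)
print(len(chunks))  # a 500-char text yields 3 overlapping 200-char chunks
```

The overlap matters: without it, a sentence cut at a chunk boundary can become unfindable from either side.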

Chat

Another main feature of most LLM applications is the chat. You get text from the user, and based on this text you will find data or run some API, and then return a result to the user.

This means that you usually need at least two APIs. The first is the chat API, which must allow the user to refine their request. You might have an interface where you receive the text, analyze it, and return to the user what you understood from it. The user can then either ask to “run the query” or refine the request.

You then also need another API to actually run the request. This can be a search that returns results, or it can run some application logic, for instance to create a presentation based on the request.
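The two-endpoint flow might look like the sketch below. The function names, the intent-detection rule, and the tiny in-memory catalog are all illustrative, not a specific framework's API.

```python
def analyze_request(text: str) -> dict:
    """First API: interpret the user's text and echo back the understood
    intent so the user can refine it before anything runs."""
    intent = "search" if "find" in text.lower() else "unknown"
    return {"intent": intent, "query": text}

def run_request(parsed: dict) -> list[str]:
    """Second API: execute the confirmed request, e.g. run the search."""
    catalog = ["Movies filmed in New Zealand", "Movies about pianos"]
    if "new zealand" in parsed["query"].lower():
        return [item for item in catalog if "New Zealand" in item]
    return catalog

parsed = analyze_request("Find movies filmed in New Zealand")
print(parsed["intent"])  # "search" — shown to the user for confirmation
print(run_request(parsed))
```

Splitting the flow this way lets the user correct a misunderstood request before any expensive or irreversible action is taken.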

GUI

Most applications have a front end for interacting with an LLM. The simplest frameworks that support chat-type LLMs are Chainlit and Streamlit. Each has its pros and cons, but Streamlit tends to gain more traction and offers more UI components.

While both are nice, in my opinion, they are mainly for a POC and not production grade. For production, you need to embed the LLM capabilities within your current application.

Summary

LLM applications hold immense promise and have a vast amount of untapped potential. Their capabilities extend far beyond what we’ve seen so far, and I am eager to see the innovative and diverse ways in which people will leverage these technologies. From enhancing customer service with more intuitive chatbots to revolutionizing content creation, the possibilities are endless. It’s exciting to imagine how these applications will evolve and the creative solutions they will inspire across different industries and domains.
