Is it still worth it to develop a custom RAG application in 2024?

Ürgen Sõukand
Published in Enefit IT
Mar 13, 2024

In short, RAG, or Retrieval-Augmented Generation, is a technique that improves the capabilities of large language models (LLMs) by giving them access to relevant outside information before generating a response. The system first finds the most relevant information in a collection of documents, and then uses that information to create its response. This makes the responses more informed and up-to-date, enhancing the overall quality of the interaction. It’s like giving the system the ability to do quick research before answering, making it more reliable and helpful.
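To make the mechanics concrete, here is a conceptual sketch of that retrieve-then-generate loop. The retriever and llm objects are hypothetical stand-ins, not any specific library's API:

```python
def rag_answer(question: str, retriever, llm) -> str:
    # 1. Retrieve: find the most relevant passages in the document collection
    passages = retriever.search(question, top_k=3)
    context = "\n\n".join(passages)

    # 2. Generate: let the LLM answer, grounded in the retrieved passages
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.complete(prompt)
```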

Some time has now passed since the unveiling of ChatGPT GPTs. They make it easy to create custom versions of ChatGPT that combine instructions, extra knowledge, and tools such as a code interpreter. At the time of release, the results were not looking too promising. However, these systems have since matured and produce quite good results. Additionally, Azure has now announced its Assistants API, which promises a Copilot-like experience with similar capabilities. OpenAI is also testing ChatGPT with memory. All of these systems already have RAG working in the background, so is it even worth it to make your own RAG application in 2024?

Image generated by DALL·E 3

When to use built-in RAG service

If the data and use-case you are after are generic and straightforward, then in my opinion, there is no longer any point in building a custom RAG application. Moreover, I would recommend setting your application up with one of the above services anyway. It is simple and will serve as a great starting point.

In the best-case scenario you will solve your use-case and start producing value. But even if it doesn’t work as well as you expected, it is a proof-of-concept that you can showcase and use as a base to build on. If it doesn’t work at all, then a custom RAG application most likely won’t solve it either.

Keep in mind that no LLM application is perfect and going custom isn’t going to make it so either!

When to go custom

“If a tool can be used for everything, it isn’t great at anything” — my woodworking teacher about multi-tools

A multi-tool is easy to use, relatively cheap, and can accomplish a wide variety of tasks, but it falls short on more specialized jobs. Similarly, if a ready-made service showed promise but didn’t produce good enough results, developing a custom RAG solution might help improve them.

The biggest issue with built-in services is that everything happens in a black box. You have no visibility into which parts of the RAG system are causing issues. Building a custom solution gives you the power to choose between different retrieval strategies, control document ingestion, add additional processing, and much more. However, this requires a deep understanding of the techniques and a lot of custom work. If you are up for the task, the next chapter gives some pointers on where to get started and what to keep in mind.

Cheat sheet from the LlamaIndex blog about advanced RAG techniques

Where to get started on your custom RAG application

Use frameworks

There is no point re-inventing the wheel. I’d recommend the two most established frameworks: LlamaIndex and LangChain. LlamaIndex puts more emphasis on providing the data for your LLM applications, while LangChain offers similar capabilities with a focus on chaining requests and agent features. The two are not mutually exclusive and can be combined to use the best parts of both frameworks.

Frameworks let you skip the cumbersome tasks of setting up the structure and allow you to focus on tuning your LLM application itself. They take care of error handling, retry logic, logging, vector DB connections and LLM connections. In addition, frameworks provide easy integration with observability applications to give insight into how your LLM is working and performing. Lastly, frameworks include ready-made strategies to improve your LLM application, which you can then customize to give the best results for your use-case. Among many others, these include data retrieval strategies, pre- and post-processing of LLM answers, and example prompts for different use-cases.
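To show how little boilerplate is left once a framework handles ingestion, indexing and querying, here is a minimal sketch of a RAG pipeline in LlamaIndex (assuming llama-index >= 0.10, an OpenAI API key in the environment, and a hypothetical ./company_docs folder):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest: load and parse everything in the (hypothetical) ./company_docs folder
documents = SimpleDirectoryReader("./company_docs").load_data()

# Index: chunk the documents, embed the chunks, and store them
# in an in-memory vector store
index = VectorStoreIndex.from_documents(documents)

# Query: retrieve the most relevant chunks and pass them to the LLM as context
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("How many vacation days do employees get?"))
```

Every step here (reader, chunking, embedding model, vector store, retriever) can later be swapped for a custom component, which is exactly the flexibility you want when tuning.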

Observability

Knowing what goes on inside the RAG system is essential for finding errors and areas to improve. A custom RAG solution has a lot of parts that impact performance, so an overview of the whole system and of its individual components is crucial. Fortunately, the frameworks mentioned above give you tools to achieve this. LlamaIndex has built-in handlers that let you trace the steps happening in the application, and this data can also be fed into external monitoring tools. LangChain has developed its own observability tool, LangSmith, offering a full monitoring solution. In addition, I would recommend making the LLM cite its sources in the answer, making it less prone to hallucinations.
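As a rough sketch of what that tracing looks like with LlamaIndex’s built-in debug handler (assuming llama-index >= 0.10; LangSmith plays a similar role for LangChain apps):

```python
from llama_index.core import Settings
from llama_index.core.callbacks import (
    CallbackManager,
    CBEventType,
    LlamaDebugHandler,
)

# Print a trace of retrieval, synthesis and LLM steps after each query
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([llama_debug])

# ... build your index and run queries as usual ...

# Inspect the exact prompts sent to the LLM and the responses that came back
for start_event, end_event in llama_debug.get_event_pairs(CBEventType.LLM):
    print(start_event.payload)
    print(end_event.payload)
```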

Know your data

Answers from your application can only be as good as the underlying data that is retrieved and used to augment the prompt. Some of the things to keep in mind are:

Is the data in the knowledge base correct — Quite obvious: wrong data produces wrong results.

Format of your data — All data needs to be processed into plain text, but table- and image-based documents, for example, require a different kind of processing than PDF and text documents. Your document processing should accommodate the specific structure of your source information.

Is everything relevant — If your document knowledge base contains a lot of unrelated data, the information retrieved by RAG becomes muddy. Less relevant augmented data means answer quality will suffer.

Make use of metadata — Besides their content, documents often include metadata like title, creation date, author etc. This can give context to the LLM when answering and enables it to produce better results (see the sketch after this list).

How to gather knowledge-base data — Keeping the knowledge base up to date is essential for good answers. Building the data retrieval and update pipeline is a difficult task, so make sure you have the means to automate it.
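As a small sketch of the metadata point above, this is how a document with metadata could be constructed in LlamaIndex (llama-index >= 0.10; the field names and values here are hypothetical):

```python
from llama_index.core import Document

doc = Document(
    text="Full plain-text content of the policy document...",
    metadata={
        "title": "Remote Work Policy",  # hypothetical example values
        "author": "HR Department",
        "created_at": "2023-11-02",
        "language": "et",
    },
)
# By default, LlamaIndex injects metadata into both the embedding text and
# the prompt context, so it influences retrieval as well as the final answer.
```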

Testing is key

At first you can evaluate improvements or regressions yourself, but this is subjective and only noticeable when the changes are large.

For a production application, a more automatic and programmatic approach is needed. Several frameworks have now been developed for testing RAG application quality. One popular option is Ragas. In essence, you define a set of baseline questions and the “perfect” answer you would expect your RAG application to give. Ragas then leverages an LLM to rate the actual answer against the “perfect” answer. It can also check things like whether the information retrieved by RAG is contained in the “perfect” answer.
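Here is a hedged sketch of what such an evaluation looks like with Ragas (assuming ragas 0.1.x, the Hugging Face datasets package, and an OpenAI API key for the judge LLM; the sample data is made up):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_recall, faithfulness

# One made-up baseline example; a real suite would contain dozens of these
eval_data = {
    "question": ["How many vacation days do employees get?"],
    "answer": ["Employees get 28 vacation days per year."],          # your RAG app's answer
    "contexts": [["The vacation policy grants 28 days per year."]],  # the chunks it retrieved
    "ground_truth": ["Employees are entitled to 28 vacation days per year."],  # the "perfect" answer
}

result = evaluate(
    Dataset.from_dict(eval_data),
    metrics=[faithfulness, answer_relevancy, context_recall],
)
print(result)  # per-metric scores between 0 and 1
```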

Creating the baseline question-answer pairs is quite a lot of work, but it will give you concrete metrics to measure new developments and their impact on the system. It’s well worth it!

Embrace the evolution

RAG and LLMs are in constant evolution. New competing models come out and new RAG improvement techniques are researched. It is important to keep yourself up to date and pick the best new tools. One of my favorite newsletters for this is Last Week in AI, but choose whatever suits you best.

To leverage this, keep your options open. Structure your code so that components can be switched out easily. For example, if a better-quality or cheaper LLM comes out, make sure you can plug it into your RAG application with ease. This is where frameworks help a lot: new model integrations are released quickly, and switching out components like models and vector databases is a matter of changing the configuration.
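For instance, in LlamaIndex swapping the model behind the whole application can be a one-line configuration change (a sketch assuming the llama-index-llms-openai and llama-index-llms-anthropic integration packages are installed):

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Today's choice of model for the whole application
Settings.llm = OpenAI(model="gpt-4-turbo")

# If a better or cheaper model appears, the swap is one line, e.g.:
# from llama_index.llms.anthropic import Anthropic
# Settings.llm = Anthropic(model="claude-3-sonnet-20240229")
```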

Why take my advice

I am a cloud developer who took an interest in LLM technologies, and for the past half a year I’ve been leading the development of an LLM application in our innovation unit.

Long before OpenAI Assistants or MS Copilot, shortly after GPT-3.5 was released, we started to tinker with the idea of creating a chatbot that could give answers to employees based on company documents. Our company, Enefit, is a large energy producer group operating in the Baltics, Finland and Poland. We have more than 5,000 employees and tons of documents in multiple languages. This kind of application greatly improves information search and employee onboarding.

Our application is still by no means perfect. We built a question-answering chatbot from scratch, but it is also evident that the new built-in services are catching up with our use-case. However, for the time being, our own solution is still cheaper and more secure than the ready-made services, and it gave us the knowledge to start developing RAG applications for more domain-specific data and use-cases.
