FAQ & Tips/Tricks

AutoRAG
Jun 18, 2024


🛣️ Support plans & roadmap

📌 Are there plans for AutoRAG to support chunking optimization?

Currently, evaluations in AutoRAG are chunk-dependent.

So it is important to build a good corpus before evaluating it.

Of course we are aware of chunking’s importance.

And we have plans to develop it in the future.

We do have a support plan, but it’s a topic that requires a lot of thought and research, so we can’t say exactly what it will be!

📌 Are there plans to support estimating the cost of an experiment before running it?

Since AutoRAG makes heavy use of LLMs, many of you are wondering how much an experiment will cost before you start running it. Of course, it’s in the support plan!

We’ll support it in that issue :)

📌 Any plans to support multiple vector DBs?

There are no plans to support multiple vector DBs yet.

We judged that the major vector DBs have all leveled up to a sufficiently high standard of performance.

We’re currently using ChromaDB running locally.

AutoRAG finds the best RAG pipeline for your data; we don’t provide a distribution or production code, so we have no plans to support different vector DBs at this time :)

💻 Hardware specs

📌 GPU minimum specs?

To fully utilize AutoRAG, we recommend a computer with a CUDA GPU.

Specifically, we recommend a GTX 1000-series or newer GPU.

If you also want to run an LLM locally with AutoRAG, you will need an RTX 3090- or 4090-class GPU with at least 20GB of VRAM.

  • If you don’t have CUDA : It may be difficult to run the various reranker modules. → Using API modules such as OpenAI is recommended.
  • GTX 1000-series or higher GPU (CUDA) : Can run reranker modules, but cannot run an LLM locally.
  • RTX 3090 and above : All functions can be run.
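
If you’re not sure which of these tiers your machine falls into, a quick way to check for CUDA and available VRAM is with PyTorch (a minimal sketch, assuming torch is installed; this is not part of AutoRAG itself):

import torch

# True means CUDA is available, so reranker modules can run on the GPU.
print(torch.cuda.is_available())

# If a GPU is present, check its VRAM to see whether running an LLM locally is feasible.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, round(props.total_memory / 1024**3, 1), "GB VRAM")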

⭐ About running AutoRAG

📌 How do I use an LLM other than OpenAI or a local LLM (how do I configure the YAML)?

AutoRAG can use any LLM supported by LlamaIndex.

1. Use vllm module

We recommend using vLLM for fast inference!

We developed it separately to be parallelizable, so you can experiment faster than using the llama_index_llm module :)

  • Docs: vllm
  • Sample YAML
modules:
  - module_type: vllm
    llm: mistralai/Mistral-7B-Instruct-v0.2
    temperature: [0.1, 1.0]
    max_tokens: 512

2. Use llama_index_llm module

modules:
  - module_type: llama_index_llm
    llm: openailike
    model: mistralai/Mistral-7B-Instruct-v0.2
    api_base: your_api_base
    api_key: your_api_key
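
Whichever module you choose, once the YAML file is written you can kick off an evaluation from Python. Below is a minimal sketch; the Evaluator class matches the restart example later in this post, while start_trial and the placeholder paths are our reading of the usual usage, so double-check them against the docs:

from autorag.evaluator import Evaluator

# Build the evaluator from your QA and corpus parquet files (placeholder paths).
evaluator = Evaluator(qa_data_path='your/path/to/qa.parquet', corpus_data_path='your/path/to/corpus.parquet')

# Run a trial with a YAML config like the ones shown above.
evaluator.start_trial('your/path/to/config.yaml')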

❗ How to use LLMs other than the 3 LLM model types (e.g. Ollama, Groq)

[Tutorial] Use Ollama

  1. Register Ollama with the Python code below
import autorag
from llama_index.llms.ollama import Ollama

autorag.generator_models["ollama"] = Ollama

2. Configuring a YAML file

node_lines:
  - node_line_name: node_line_1
    nodes:
      - node_type: generator
        modules:
          - module_type: llama_index_llm
            llm: ollama
            model: [llama3, qwen, mistral]

Additional parameters can be passed directly in the YAML file.

📌 Where can I find RAG code with an optimized pipeline?

We do not provide ready-made code in the form of LangChain or LlamaIndex.

AutoRAG provides the following two features:

  1. Optimized RAG pipeline YAML file extraction.
autorag extract_best_config --trial_path your/path/to/trial_folder --output_path your/path/to/pipeline.yaml

2. RAG answers for the optimal pipeline are available in the CLI, API Server, and Web Interface (Streamlit).

If you want to build it for production yourself, you can write RAG code using the best methodology!
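
For example, if you want to serve the extracted pipeline yourself, AutoRAG’s deploy module can load the YAML and answer queries. A minimal sketch, assuming the Runner class from autorag.deploy and a placeholder question; verify the exact API against the docs:

from autorag.deploy import Runner

# Load the optimal pipeline extracted with `autorag extract_best_config`.
runner = Runner.from_yaml('your/path/to/pipeline.yaml')

# Ask a question through the optimized RAG pipeline.
answer = runner.run('What does AutoRAG optimize?')
print(answer)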

📌 Even with compact.yaml it takes 1-2 hours, which seems too long. Is this normal?

Unless you have a CUDA-enabled environment, the reranker modules may take a long time when using compact.yaml.

In that case, we recommend using the simple_openai.yaml file first!

📌 If I get interrupted during an evaluation, is there a way to resume?

If an evaluation is interrupted, we have a restart_evaluate function that can resume the evaluation from where it left off.

  • Use in the CLI
autorag restart_evaluate --trial_path your/path/to/trial_folder
  • Use in Python code
from autorag.evaluator import Evaluator

evaluator = Evaluator(qa_data_path='your/path/to/qa.parquet', corpus_data_path='your/path/to/corpus.parquet')
evaluator.restart_trial(trial_path='your/path/to/trial_path')

→ Docs: Restart a trial if an error occurs during the trial

📌 Can I use languages other than English in the same way?

AutoRAG defaults to English!

You can use other languages in the same way, but if you are using a module that relies on prompts, you will get better results if you provide the prompts in that language!

📌 Is it possible to do table Q&A within documents?

To build a RAG pipeline that uses tables within a document for QA, you will first need to extract the information from the tables with a separate OCR tool or parser that can handle them :)
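
As a rough illustration only (pdfplumber and the row-flattening format here are our assumptions, not something AutoRAG provides), table rows could be turned into plain text before chunking:

import pdfplumber

# Extract tables from a PDF and flatten each row into a text line,
# so the table contents can be chunked and embedded like normal text.
rows_as_text = []
with pdfplumber.open("report.pdf") as pdf:
    for page in pdf.pages:
        for table in page.extract_tables():
            header, *rows = table
            for row in rows:
                cells = [f"{h}: {c}" for h, c in zip(header, row) if c]
                rows_as_text.append(", ".join(cells))

# rows_as_text can now go into your corpus before building the QA dataset.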

🍯 Tips

📌 What do you think is an appropriate number of QA datasets?

I think you need around 100 or so.

In some cases, you may need much more than that if your RAG questions are diverse.

📌 I’m curious about your favorite LLMs to experiment with.

Our team is using SOTA models within acceptable security boundaries.

Currently, we are primarily using GPT-4o.

📌 Instead of embedding the chunked text directly, do you sometimes prepend metadata (document title, etc.) to the text before embedding? I’m wondering whether this is a meaningful preprocessing step for retrieval.

Instead of using the chunked text directly as the contents of the corpus, you can embed (title) + (summary or metadata) + (chunked text), as sketched below.
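
A minimal sketch of what that preprocessing could look like; the column names and sample values are hypothetical, not an AutoRAG requirement:

import pandas as pd

# Hypothetical chunked data with per-chunk metadata.
chunks = pd.DataFrame({
    "title": ["AutoRAG FAQ", "AutoRAG FAQ"],
    "summary": ["Hardware requirements", "Supported LLMs"],
    "chunked_text": ["A CUDA GPU is recommended ...", "Any LlamaIndex LLM works ..."],
})

# Prepend title and summary/metadata to the chunked text before embedding.
chunks["contents"] = chunks["title"] + "\n" + chunks["summary"] + "\n" + chunks["chunked_text"]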

It’s a good tip and trick to try when improving retrieval performance :)

❗But remember, different data will perform differently. ❗

Some preprocessors may improve Retrieval performance significantly on certain data, while others may only improve it marginally or even decrease it.

In the end, you’ll need to experiment to find the best method for your data.

AutoRAG was created to make these experiments easy and fast, so we recommend using AutoRAG to do some quick experiments 😁.

📌 Share your tricks for boosting RAG performance in domain-specific, jargon-laden documentation!

The more jargon-heavy a domain is, the more important it is to construct a realistic evaluation QA dataset.

Non-experts are less familiar with the jargon and ask vaguer, less precise questions, so retrieval using a vector DB with high semantic similarity may perform better.

Conversely, for experts, the matching-centric BM25 may perform better because they know the exact terms.

Therefore, it is important to construct a well-crafted evaluation QA dataset that reflects whether RAG will be used more by laymen or by experts in your real-world environment. In the end, everything should be judged by realistic, data-driven experiments!
