Cutting Cost and Enhancing Performance: Minifying Markdown Tables to Improve Token Efficiency in Retrieval Augmented Generation (RAG)

Budi Syahiddin
Government Digital Products, Singapore
May 16, 2024

Introduction

During my summer internship at GovTech GDS Central, I am working on an LLM-agnostic Visual Studio Code extension that lets developers chat with an LLM inside the editor to help with their daily coding needs. In addition to chatting with various LLMs, the extension can connect to any backend server that does Retrieval Augmented Generation (RAG): the server retrieves relevant content from a knowledge base and uses it to generate a response to a given query. One of the biggest use cases for developers is looking up internal documentation that LLMs have no knowledge of.

What internal documents are we talking about?

SHIP (Secure Hybrid Integration Pipeline)-HATS (Hive Agile Testing Solutions) is a Continuous Integration/Continuous Delivery (CI/CD) component within the Singapore Government Tech Stack (SGTS), with security and governance guardrails that enable developers to plan, build, test and deploy code to production. The SHIP team has written many high-quality, reusable templates and documentation to help developers get started on the SHIP-HATS platform. However, there are many of them, and developers who are new to CI/CD may find it daunting to get started. An example of a template looks like this:

```yaml
.publish-maven-artefact:
  image: $NEXUSREPO_DOCKER_PROXY_HOSTNAME/maven:latest
  tags:
    - non_privileged
    - no_root
    - cstack
  script:
    - export VARIABLE_NAME=MVN_SETTINGS_FILE
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=NEXUSREPO_REPO_GROUP_ID
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=ARTEFACT_ID
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=ARTEFACT_VERSION
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=ARTEFACT_PACKAGE
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=MAVEN_SETTINGS_SERVER_ID
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=ARTEFACT
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=NEXUSREPO_REPO_ID
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=NEXUS_REPO_USERNAME
    - !reference [.check-variable-sh, script]
    - export VARIABLE_NAME=NEXUS_REPO_PASSWORD
    - !reference [.check-variable-sh, script]
    - mvn --settings $MVN_SETTINGS_FILE deploy:deploy-file -DgroupId=$NEXUSREPO_REPO_GROUP_ID -DartifactId=$ARTEFACT_ID -Dversion=$ARTEFACT_VERSION -DgeneratePom=false -Dpackaging=$ARTEFACT_PACKAGE -DrepositoryId=$MAVEN_SETTINGS_SERVER_ID -Durl=$NEXUSREPO_URL/repository/$NEXUSREPO_REPO_ID/ -Dfile=$ARTEFACT
```

Corresponding Documentation

## File: [.gitlab-ci-publish-to-nexus.yml](.gitlab-ci-publish-to-nexus.yml)

### Template: .publish-maven-artefact

### Target user:

This template allows users to publish maven artefacts to SHIP's Nexus Repo as the main job script.

### Add on implementation should go in:

* before_script
* after_script

### See example/s:

* Nexus-Pull-And-Publish-MVN: [Sample usage code](../gitlab-ci/Nexus-Pull-And-Publish-MVN.yml) and [Documentation](../gitlab-ci#example-nexus-pull-and-publish-mvn)
* Maven-Artefact-Checksum-Verification: [Sample usage code](../gitlab-ci/Maven-Artefact-Checksum-Verification.yml) and [Documentation](../gitlab-ci#example-maven-artefact-checksum-verification)
* [supporting-apps/maven-simple-app](../supporting-apps/maven-simple-app)

### Variable/s to set:
|Key |Required |Description |
|----------------|-----------|-----------|
|MAVEN_SETTINGS_SERVER_ID | Y | Server id in settings.yml.|
|NEXUSREPO_REPO_ID | Y | Nexus repo name that your team uses.|
|NEXUSREPO_REPO_GROUP_ID | Y | Directory structure storing your artefact specified like Java's package name rules.|
|MVN_SETTINGS_FILE | Y | Authentication details to access Nexus Repo. Check out an [example](../supporting-apps/maven-simple-app/settings.xml).|
|ARTEFACT | Y | Path to artefact.|
|ARTEFACT_ID | Y | Artefact id.|
|ARTEFACT_VERSION | Y | Artefact version.|
|ARTEFACT_PACKAGE | Y | Artefact package. Common values - jar/war. |

### CICD Settings/Variables to set:
|Key |Required |Description|
|----------------|-----------|-----------|
|NEXUS_REPO_USERNAME | Y | Service account username required in settings.yml to authenticate with Nexus Repo.|
|NEXUS_REPO_PASSWORD | Y | Service account password required in settings.yml to authenticate with Nexus Repo.|

### Known issue/s:
NA

### Reference/s:
NA

Since these two files live in different locations, integrating a template into a pipeline is not as seamless as it could be. Here’s an example workflow of a developer trying to integrate a template into their pipeline:

- Find a template that they want to use
- Copy the source code of the template
- Paste the source code into their pipeline
- Read the documentation to understand how to use the template
- Make changes to the template to suit their needs
- Test the pipeline
- Deploy the pipeline
- Check that the pipeline is working as expected
- Go back to square one if the pipeline is not working as expected

Now that we understand how cumbersome this workflow is, wouldn’t it be great if we could just use an LLM to help us with it?

Understanding Markdown Chunking and Token Efficiency

Much of the internal documentation is written in markdown and stored in a git repository. The backend ingests these markdown files and stores them in a vector database. However, some of the documents are large, and we have to chunk them so that they work within the limits of the vector database and the LLM. This is advantageous: chunking the documentation into smaller pieces gives us more control over the responses we get from the LLM. However, it also means that some chunks may lack the full context of the documentation and may not produce the best response. We typically split markdown files by headers and code blocks.
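Header-based splitting can be sketched in a few lines of Python. This is an illustrative simplification rather than our backend's actual splitter: it ignores fenced code blocks (a `#` inside a code fence would wrongly start a new chunk) and does not enforce a maximum chunk size.

```python
import re

def chunk_markdown(text: str) -> list[str]:
    """Naively split a markdown document into chunks at header lines."""
    chunks: list[list[str]] = []
    for line in text.splitlines():
        # Start a new chunk at every header ('#' to '######'),
        # or at the very beginning of the document.
        if re.match(r"^#{1,6}\s", line) or not chunks:
            chunks.append([])
        chunks[-1].append(line)
    return ["\n".join(c).strip() for c in chunks]
```

The chunks below are an example of a markdown file split this way.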

Chunk 1

# Introduction

Rust is a modern systems programming language that empowers developers to build fast, reliable, and secure software.
Designed with a strong focus on memory safety and concurrency, Rust combines low-level control over hardware with
high-level abstractions to ensure efficient and safe code execution. Originally developed by Mozilla, Rust has
gained popularity for its exceptional performance, expressive syntax, and extensive tooling support. It offers
a unique set of features such as ownership and borrowing, which eliminate common bugs like null pointer dereferences
and data races. With a growing community and an increasing number of projects using Rust, this language is changing the
landscape of systems programming, offering a compelling solution for developers seeking both performance and safety.

Chunk 2

### `Option<T>`

Implementation of `Option<T>`,

| Method | Description | Function Signature |
| --------- | --------------------------------------------------- | ---------------------------------------------- |
| `is_some` | Returns `true` if the option is a `Some` value. | `const fn is_some(&self) -> bool` |
| `is_none` | Returns `true` if the option is a `None` value. | `const fn is_none(&self) -> bool` |
| `as_ref` | Converts from `&Option<T>` to `Option<&T>`. | `const fn as_ref(&self) -> Option<&T>` |
| `as_mut` | Converts from `&mut Option<T>` to `Option<&mut T>`. | `const fn as_mut(&mut self) -> Option<&mut T>` |

Chunk 3

### `Result<T, E>`

Implementation of `Result<T, E>`,
| Method | Description | Function Signature |
| -------- | -------------------------------------------------- | ------------------------------------------ |
| `is_ok` | Returns `true` if the result is an `Ok` value. | `const fn is_ok(&self) -> bool` |
| `is_err` | Returns `true` if the result is an `Err` value. | `const fn is_err(&self) -> bool` |
| `as_ref` | Converts from `&Result<T, E>` to `Result<&T, &E>`. | `const fn as_ref(&self) -> Result<&T, &E>` |
| `ok` | Converts from `Result<T, E>` to `Option<T>`. | `fn ok(self) -> Option<T>` |

Assuming the user's prompt is "List down the methods that are const functions", the backend will most likely use only chunk 2 and chunk 3, since those are the most semantically related to the prompt. The final prompt might look like this:

Given the following information, respond as best as you can. If you don't know the answer to the question, you can respond with "I don't know".

### `Option<T>`

Implementation of `Option<T>`,

| Method | Description | Function Signature |
| --------- | --------------------------------------------------- | ---------------------------------------------- |
| `is_some` | Returns `true` if the option is a `Some` value. | `const fn is_some(&self) -> bool` |
| `is_none` | Returns `true` if the option is a `None` value. | `const fn is_none(&self) -> bool` |
| `as_ref` | Converts from `&Option<T>` to `Option<&T>`. | `const fn as_ref(&self) -> Option<&T>` |
| `as_mut` | Converts from `&mut Option<T>` to `Option<&mut T>`. | `const fn as_mut(&mut self) -> Option<&mut T>` |

### `Result<T, E>`

Implementation of `Result<T, E>`,
| Method | Description | Function Signature |
| -------- | -------------------------------------------------- | ------------------------------------------ |
| `is_ok` | Returns `true` if the result is an `Ok` value. | `const fn is_ok(&self) -> bool` |
| `is_err` | Returns `true` if the result is an `Err` value. | `const fn is_err(&self) -> bool` |
| `as_ref` | Converts from `&Result<T, E>` to `Result<&T, &E>`. | `const fn as_ref(&self) -> Result<&T, &E>` |
| `ok` | Converts from `Result<T, E>` to `Option<T>`. | `fn ok(self) -> Option<T>` |

---
Prompt: List down the methods that are const functions
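
For concreteness, here is a minimal sketch of how a backend might pick the top-k chunks and assemble that final prompt. The `embed` callable, the template text and `k=2` are assumptions for illustration, not the actual backend code; a real backend would also precompute chunk embeddings and store them in the vector database instead of embedding on every query.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, chunks: list[str], embed, k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query's."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Splice the retrieved chunks into the final prompt shown above."""
    context = "\n\n".join(context_chunks)
    return (
        "Given the following information, respond as best as you can. "
        "If you don't know the answer to the question, you can respond with \"I don't know\".\n\n"
        f"{context}\n\n---\nPrompt: {question}"
    )
```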

This technique works well until your markdown files become large. In such cases, you are restricted to providing only a limited number of chunks due to the LLM’s token limit. To address this issue, we can improve the token efficiency of certain chunks, particularly markdown tables, which tend to consume a significant number of tokens for relatively little information. Using the OpenAI tokenizer, we can count the number of tokens exactly (a small counting script is sketched after the list). For the preceding chunks, the token counts are as follows:

- Chunk 1: 158 tokens, 827 characters
- Chunk 2: 361 tokens, 895 characters
- Chunk 3: 355 tokens, 755 characters
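
These counts are easy to reproduce with the tiktoken library. The cl100k_base encoding here is an assumption; pick the encoding that matches your model.

```python
import tiktoken

# cl100k_base is the encoding used by OpenAI's chat models at the time of writing.
enc = tiktoken.get_encoding("cl100k_base")

def report(name: str, text: str) -> None:
    print(f"{name}: {len(enc.encode(text))} tokens, {len(text)} characters")

# Run this on each chunk to get the numbers above, e.g. with a single table row:
report("table row", "| `is_some` | Returns `true` if the option is a `Some` value. |")
```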

As we can see, despite having a similar number of characters, chunks 2 and 3 take up about twice as many tokens as chunk 1. The screenshot below shows why: many of the tokens go into the scaffolding of the markdown table, such as the runs of spaces used for padding.

Token Efficiency of Tables in Markdown

With a more token-efficient format for markdown tables, we can reduce cost by using fewer tokens to represent the same information. Smaller chunks also mean that we can provide more information to the LLM, potentially improving the quality of the response.

Using mdt2json to convert Markdown Tables to Minified JSON

Minified JSON has long been used in places such as REST APIs to cut down the number of bytes sent over the wire. We can bring the same idea to markdown tables, which is what mdt2json, a library/CLI I wrote, does.
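Conceptually, the conversion looks like the sketch below. This is a naive string-splitting illustration, not mdt2json's actual implementation: splitting on `|` breaks on escaped pipes inside cells, which is one reason the real tool parses the markdown into an AST instead.

```python
import json

def table_to_minified_json(md_table: str) -> str:
    """Convert a simple markdown table into a minified JSON array of row objects."""
    lines = [line.strip().strip("|") for line in md_table.strip().splitlines()]
    headers = [h.strip() for h in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        cells = [c.strip() for c in line.split("|")]
        rows.append(dict(zip(headers, cells)))
    # separators=(",", ":") drops the whitespace json.dumps inserts by default.
    return json.dumps(rows, separators=(",", ":"))
```

Below are chunk 2 and chunk 3 converted to minified JSON with mdt2json.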

Chunk 2

### `Option<T>`

Implementation of `Option<T>`,

```json
[{"Method":"`is_some`","Description":"Returns `true` if the option is a `Some` value.","Function Signature":"`const fn is_some(&self) -> bool`"},{"Method":"`is_none`","Description":"Returns `true` if the option is a `None` value.","Function Signature":"`const fn is_none(&self) -> bool`"},{"Method":"`as_ref`","Description":"Converts from `&Option<T>` to `Option<&T>`.","Function Signature":"`const fn as_ref(&self) -> Option<&T>`"},{"Method":"`as_mut`","Description":"Converts from `&mut Option<T>` to `Option<&mut T>`.","Function Signature":"`const fn as_mut(&mut self) -> Option<&mut T>`"}]```

Chunk 3

### `Result<T, E>`

Implementation of `Result<T, E>`,

```json
[{"Method":"`is_ok`","Description":"Returns `true` if the result is an `Ok` value.","Function Signature":"`const fn is_ok(&self) -> bool`"},{"Method":"`is_err`","Description":"Returns `true` if the result is an `Err` value.","Function Signature":"`const fn is_err(&self) -> bool`"},{"Method":"`as_ref`","Description":"Converts from `&Result<T, E>` to `Result<&T, &E>`.","Function Signature":"`const fn as_ref(&self) -> Result<&T, &E>`"},{"Method":"`ok`","Description":"Converts from `Result<T, E>` to `Option<T>`.","Function Signature":"`fn ok(self) -> Option<T>`"}]```

As we can see, the minified JSON format is much more token efficient. The following are the token counts for the minified JSON format:

- Chunk 2: 212 tokens, 654 characters
- Chunk 3: 215 tokens, 633 characters

Can we do better?

mdt2json provides two layouts to choose from: Array of Structures (AoS), the array of row objects shown above, and Structure of Arrays (SoA), a single object whose values are column arrays. Switching to SoA improves token efficiency further, since each column name is stored once rather than repeated for every row.
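The reshaping itself is a one-liner; here is a minimal sketch (a hypothetical helper, not mdt2json's actual API):

```python
def aos_to_soa(rows: list[dict]) -> dict:
    """Reshape an array of row objects (AoS) into a dict of column arrays (SoA)."""
    if not rows:
        return {}
    # Every row of a markdown table has the same columns, so rows[0] is enough.
    return {key: [row[key] for row in rows] for key in rows[0]}
```

Below are chunk 2 and chunk 3 in the SoA layout.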

Chunk 2

### `Option<T>`

Implementation of `Option<T>`,

```json
{"Method":["`is_some`","`is_none`","`as_ref`","`as_mut`"],"Description":["Returns `true` if the option is a `Some` value.","Returns `true` if the option is a `None` value.","Converts from `&Option<T>` to `Option<&T>`.","Converts from `&mut Option<T>` to `Option<&mut T>`."],"Function Signature":["`const fn is_some(&self) -> bool`","`const fn is_none(&self) -> bool`","`const fn as_ref(&self) -> Option<&T>`","`const fn as_mut(&mut self) -> Option<&mut T>`"]}```

Chunk 3

### `Result<T, E>`

Implementation of `Result<T, E>`,

```json
{"Method":["`is_ok`","`is_err`","`as_ref`","`ok`"],"Description":["Returns `true` if the result is an `Ok` value.","Returns `true` if the result is an `Err` value.","Converts from `&Result<T, E>` to `Result<&T, &E>`.","Converts from `Result<T, E>` to `Option<T>`."],"Function Signature":["`const fn is_ok(&self) -> bool`","`const fn is_err(&self) -> bool`","`const fn as_ref(&self) -> Result<&T, &E>`","`fn ok(self) -> Option<T>`"]}```

As we can see, the SoA format is even more token efficient. The following are the token counts for the SoA format:

- Chunk 2: 191 tokens, 520 characters
- Chunk 3: 194 tokens, 499 characters

Below is a comparison of the token efficiency of the different formats for chunk 2. As we can see, the SoA layout is the most token efficient, at about 47% fewer tokens than the markdown table.

| Format | Tokens | Characters |
| ------------------- | ------ | ---------- |
| Markdown table | 361 | 895 |
| Minified JSON (AoS) | 212 | 654 |
| Minified JSON (SoA) | 191 | 520 |

Evaluating the impact of token efficiency on RAG

One concern I had while working on this project was whether these more token-efficient table formats would have a significant impact on the performance of RAG. We will use the prompt below to evaluate this across the three formats.

Prompt: List down the methods that are const functions

Markdown

The methods that are const functions are:
- is_some in Option<T>
- is_none in Option<T>
- as_ref in Option<T>
- as_mut in Option<T>
- is_ok in Result<T, E>
- is_err in Result<T, E>
- as_ref in Result<T, E>

Minified JSON (AoS)

The methods that are const functions in the Option<T> implementation are:

- is_some
- is_none
- as_ref
- as_mut

The methods that are const functions in the Result<T, E> implementation are:

- is_ok
- is_err
- as_ref

Minified JSON (SoA)

The methods that are const functions are:

- is_some in Option<T> (Function Signature: const fn is_some(&self) -> bool)
- is_none in Option<T> (Function Signature: const fn is_none(&self) -> bool)
- as_ref in Option<T> (Function Signature: const fn as_ref(&self) -> Option<&T>)
- as_mut in Option<T> (Function Signature: const fn as_mut(&mut self) -> Option<&mut T>)
- is_ok in Result<T, E> (Function Signature: const fn is_ok(&self) -> bool)
- is_err in Result<T, E> (Function Signature: const fn is_err(&self) -> bool)
- as_ref in Result<T, E> (Function Signature: const fn as_ref(&self) -> Result<&T, &E>)

The responses generated by the LLM for the three formats are very similar; the main difference is that the responses based on the minified JSON formats are more verbose. While this test is not conclusive, it suggests that the more token-efficient table formats do not significantly affect the quality of RAG responses. If you are replicating this, you will need to run more tests to confirm it, as LLMs are not deterministic and can be sensitive to the prompt.

Conclusion

While the minified JSON output is not as human-readable as the original tables, it is meant as a preprocessing step before the content is passed to the LLM; the original document is retained for normal human reading. The minified JSON format is more token efficient than the markdown format, and it produced similar results to the markdown format when used with RAG. Hence, I don’t see any reason to prefer markdown tables over minified JSON for this pipeline.

It has been an interesting journey working on this project. I have learnt a lot about the different formats and how they can be used to improve the performance of LLMs. I also learnt that working from an AST (as mdt2json does) makes it much easier to generate the different output formats than regex-based parsing.

P.S. I would like to thank my supervisor Liyana Fauzi for proofreading this blog post!
