The Art of AI Brevity: Minimizing Tokens to Maximize Savings

Dylan Tullberg
AImonks.io
5 min read · May 16, 2023

In the realm of generative artificial intelligence (AI), size does indeed matter. The determining factor is not physical dimensions or processing power, but the size of your prompts and their resulting outputs. Tokens (the words, word fragments, punctuation, and symbols that make up your AI text) directly influence the cost of operating these models.

Think of tokens as the DNA of your dialogue with the AI. To make your AI interactions economical, you need to become an adept practitioner of brevity. Consider it your Hemingway moment in AI discourse.

Generative AI LLMs have token limits. To hold an extended, meaningful conversation or to batch requests through the API within these constraints, your prompt must fit the limit, and you may also want to shrink the output. So, how can we minimize token counts and, consequently, cost?

1. Pruning with Precision: Techniques to Reduce Prompt Size

Reducing tokens involves a careful trimming process. Like a seasoned hairdresser, you must chop off the excess while maintaining an appealing and coherent style. Here are some fundamental techniques:

1.1. Eliminate Redundancy: Suppose, for instance, that you’re asking your AI to concoct a short story about a haunted house. Your initial prompt might read: “Could you possibly generate a short scary story for me about a haunted house that is located deep in the heart of a dense, dark forest, and it’s considered by many to be extremely terrifying?”

This prompt is teeming with redundancy. The words “scary”, “haunted”, and “extremely terrifying” express the same sentiment. The house’s location need not be specified unless integral to the story.

A succinct version of this prompt could be: “Generate a short story about a haunted house in a forest.”
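
To see the savings concretely, you can count tokens locally before sending anything over the wire. Here’s a minimal sketch using OpenAI’s tiktoken library to compare the two haunted-house prompts; the per-token price is purely illustrative, so plug in your model’s actual rate:

import tiktoken

# cl100k_base is the tokenizer used by gpt-3.5-turbo and gpt-4
enc = tiktoken.get_encoding("cl100k_base")

verbose = ("Could you possibly generate a short scary story for me about a "
           "haunted house that is located deep in the heart of a dense, dark "
           "forest, and it's considered by many to be extremely terrifying?")
concise = "Generate a short story about a haunted house in a forest."

for label, prompt in [("verbose", verbose), ("concise", concise)]:
    n = len(enc.encode(prompt))
    cost = n / 1000 * 0.002  # illustrative rate: $0.002 per 1K tokens
    print(f"{label}: {n} tokens, ~${cost:.6f}")

Across thousands of requests, shaving a couple dozen tokens per prompt adds up fast.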

1.2. Parameterization: Suppose you’re analyzing the populations of different cities. Your original prompt might be: “How many people live in New York City? How many people live in Los Angeles? How many people live in Chicago?”

Rather than repeating the question for each city, parameterize the city name: “How many people live in [city]?”

By substituting specific values (New York City, Los Angeles, Chicago) with a parameter ([city]), you can adapt your prompt for various cities without adding superfluous tokens.
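
In code, parameterization is just string templating: define the prompt once, then substitute the value for each request. A quick sketch in Python:

# One reusable template instead of three hand-written questions.
template = "How many people live in {city}?"

for city in ["New York City", "Los Angeles", "Chicago"]:
    prompt = template.format(city=city)
    print(prompt)  # send each filled-in prompt to the model in turn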

1.3. Generalization: Consider a prompt like: “Can you, my trusty AI, weave a compelling tale for me about a unique character, who against all odds, steps up, confronts adversity and ultimately saves the day in a spectacular fashion?”

This prompt, while vibrant and detailed, includes several redundant and nonessential tokens. The AI doesn’t need this degree of detail to grasp the task.

An abstracted prompt would be: “Write a story about a hero.” This version removes all nonessential elements, focusing on the key component: a hero. This prompt is shorter, granting the AI the flexibility to interpret the hero, the adversity they face, and their ultimate triumph.

While the revised prompt is less specific, it still directs the AI: crafting a hero’s story. It’s a perfect example of how to distill your prompts, saving tokens while achieving your intended result.

Prompt Input Size Reduction Cheat Sheet

A verbose prompt like: “Please reduce this prompt to the minimum possible length while still preserving the meaning and desired output format…” can be trimmed to: “Please revise this prompt to its shortest length, while maintaining the original meaning.”

And for those with more analytical purposes, you can be even more concise: “Condense this prompt. Remove redundancies, unnecessary formats, filler words, truncate excess terms, and combine ranges. Maintain meaning.”
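
You can even delegate the trimming to the model itself. Below is a rough sketch using the OpenAI Python SDK (v1+); the model name, system message, and function name are illustrative choices, not a prescribed recipe:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COMPRESS = ("Condense this prompt. Remove redundancies, unnecessary formats, "
            "filler words, truncate excess terms, and combine ranges. "
            "Maintain meaning.")

def shrink(prompt: str) -> str:
    """Ask the model to rewrite a prompt using fewer tokens."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": COMPRESS},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

Keep in mind the compression call costs tokens too, so this pays off mainly for prompts you plan to reuse many times.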

Output Brevity: Minimalist yet Significant

For article writing, you wouldn’t want to excessively reduce output. Focus on eliminating unnecessary formatting instead. For analytical tasks, you can drastically reduce output by requesting serialized JSON or a similar format.

Remember, output tokens are billed just like input tokens. Strive for precision, conciseness, and wisdom on both sides.

A Recipe for Success: Streamlined Prompts and Outputs

Consider a verbose, less optimized version of a prompt that instructs an AI to analyze and score several categories:

“I would like you to analyze XYZ input and score the following categories: Product Quality, Customer Service, and Pricing. Please use a scoring scale of 1.0 to 5.0. After that, could you also analyze and score these additional categories: Delivery Speed, Packaging, and Return Policy. Again, use a scoring scale of 1.0 to 5.0. Once you’ve completed this, I would like the output presented in a JSON serialized format.”

While this prompt is comprehensive and clear, it’s quite wordy and uses far more tokens than necessary. Let’s see how we can optimize it:

“Scores (1.0–5.0), JSON serialized: Product Quality, Customer Service, Pricing, Delivery Speed, Packaging, Return Policy. Analyze XYZ.”

Here is a concrete example of scoring this article you’re reading with a few sentiment categories via ChatGPT.

Prompt:

Scores (1.0 - 5.0), JSON serialized, no analysis:
{formality, humor, enthusiasm}
Analyze {The Art of AI Brevity article is placed here}

Note: I added “no analysis” to remove any extra commentary ChatGPT may append after scoring, so it returns scores only. Each AI may behave differently. It may seem counterintuitive, but it works with ChatGPT and Bard.

Answer:

{
  "formality": 4.2,
  "humor": 2.0,
  "enthusiasm": 3.7
}

This streamlined version carries the same instructions as the verbose one, but in a far more compact form and with extra attention paid to the output. The categories are listed concisely, and the scoring scale and desired format are clearly indicated. By adhering to this format, you can significantly reduce token usage while still extracting the information you need.
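
Here’s roughly what that round trip looks like in code, again sketched with the OpenAI Python SDK and an illustrative model name. Because the reply is pure JSON, json.loads can parse it directly; if the model wraps the JSON in extra prose, parsing fails, which is exactly why the “no analysis” instruction earns its keep:

import json
from openai import OpenAI

client = OpenAI()

article = "..."  # paste the text you want scored here

prompt = ("Scores (1.0 - 5.0), JSON serialized, no analysis:\n"
          "{formality, humor, enthusiasm}\n"
          "Analyze " + article)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

scores = json.loads(response.choices[0].message.content)
print(scores["formality"], scores["humor"], scores["enthusiasm"])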

However, remember that the art of brevity doesn’t stop here. The strategies presented in the article are merely the tip of the iceberg. There’s ample room for innovation and personalization in your quest to minimize both input and output. The creative prompter can continuously discover new techniques to trim unnecessary tokens, tailor instructions more concisely, and optimize output formatting according to the specific needs of each project.

In conclusion, mastering brevity in generative AI interactions not only minimizes costs but also enhances the efficiency and effectiveness of your AI models. So, channel your inner Hemingway, and make every token count!
