Mastering Token Costs in ChatGPT and other Large Language Models

Learn how token costs work in AI model interactions, from free-form conversations to content revision, with tips and examples for managing usage cost-effectively.

Russell Kohn
Geek Culture
5 min read · Apr 1, 2023


[MidJourney image generated from the prompt: “a simple logo of a post modern female ai prompt engineer, screen-print, flat, vector — no realistic photo text — v 5”]
“Prompt Engineer” — prompt by Russ, rendering by MidJourney

I. Introduction

As AI models and their APIs gain popularity, mastering token costs is crucial for efficient and cost-effective usage. This article offers a clear explanation of token costs in AI model interactions, focusing on two use cases: free-form conversations and content revisions. By the end, readers will have a solid understanding of token cost calculations and effective management strategies. Although primarily aimed at API implementers, advanced ChatGPT users and budding prompt engineers will also find this article highly beneficial.

II. Token Basics

A token is a small piece of text that can be as short as one character or as long as one word. Tokens play a crucial role in AI model interactions, as they represent the units of text that AI models process. On average, there are about 750 words in 1000 tokens.
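For hands-on counting, OpenAI’s open-source tiktoken library encodes text the same way its models do. Here is a minimal Python sketch, assuming tiktoken is installed:

```python
import tiktoken

# Load the tokenizer that GPT-4 uses (cl100k_base under the hood).
enc = tiktoken.encoding_for_model("gpt-4")

text = "Mastering token costs is crucial for efficient usage."
tokens = enc.encode(text)
print(len(tokens))         # number of tokens in the text
print(enc.decode(tokens))  # round-trips back to the original string
```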

III. Token Costs in Conversations

When using an AI model like GPT-4 through an API, both questions (prompts) and answers (completions) have associated costs based on the number of tokens they contain. The more tokens, the higher the cost. For instance, the GPT-4 API currently costs $0.03 per 1,000 prompt tokens and $0.06 per 1,000 completion tokens. ChatGPT Plus subscribers, on the other hand, are limited not by tokens but by message count: at the time of writing, the cap is 25 messages every 3 hours.
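As a sketch of the arithmetic, using the per-1,000-token rates above (which may change):

```python
# Estimate the dollar cost of a single GPT-4 API call at the example
# rates: $0.03 per 1K prompt tokens, $0.06 per 1K completion tokens.
def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens * 0.03 / 1000 + completion_tokens * 0.06 / 1000

# A 500-token question answered with 700 tokens:
print(f"${estimate_cost(500, 700):.4f}")  # $0.0570
```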

Consider the following conversation between a user and an LLM like ChatGPT or GPT-4 discussing token costs:

[Screenshot: a conversation between a user and an AI with eight messages discussing token costs.]

Table 1 provides a detailed breakdown of token costs in this basic conversation. As the conversation progresses, the total cost increases due to the accumulation of tokens in both prompts and completions. For example, note how the final message thanking the AI carries the largest cost with the least informational value, because the entire prior history of the conversation was sent to the AI along with the “Thanks.”
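To see why, here is a minimal simulation of that accumulation; the per-message token counts are illustrative assumptions, not the actual figures from Table 1:

```python
# Each API call resends the entire conversation history as part of the prompt.
# (new_prompt_tokens, completion_tokens) for each exchange; the short
# closing "Thanks" still drags the whole history along with it.
turns = [(25, 150), (40, 200), (30, 180), (10, 20)]

history, total = 0, 0.0
for i, (new_prompt, completion) in enumerate(turns, 1):
    sent = history + new_prompt          # prior context + new message
    cost = sent * 0.03 / 1000 + completion * 0.06 / 1000
    total += cost
    history = sent + completion          # the reply joins the context too
    print(f"Exchange {i}: {sent} prompt tokens, ${cost:.4f}")
print(f"Conversation total: ${total:.4f}")  # about $0.07
```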

The context window, which represents the AI’s memory limit for a single conversation, may also play a role in determining costs. If a conversation exceeds the context window, the AI may “forget” earlier parts, affecting the conversation’s flow and potentially increasing token usage by requiring users to resubmit information. (For more information about Context, Tokens and Token Limits, see my article Mastering Token Limits and Memory in ChatGPT and other Large Language Models: A Guide for the Everyday User)

[Table image: costs for the eight messages of the conversation, illustrating how costs accumulate.]
Table 1: Basic Conversation Token Cost Example.

IV. Token Costs in Revising Content

In a content revision process such as refining an email or revising a report or website article, token costs are determined by the tokens in the prior context, new prompt, and completion. These can grow quickly if the initial draft is long. For example, in Table 2, we see the token costs associated with an article draft and its revisions. As the draft is refined and additional suggestions are made, the total cost increases primarily due to the accumulation of tokens in the prior context (conversation history).

Table 2: Content Revision Token Cost Example
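Since Table 2’s exact figures aren’t reproduced here, the sketch below uses assumed numbers (a 2,000-token draft, 50-token revision requests, and the GPT-4 rates from earlier) to show how revision costs compound:

```python
DRAFT = 2000        # tokens in the initial draft (assumed)
INSTRUCTION = 50    # tokens per revision request (assumed)

context, total = 0, 0.0
for rnd in range(1, 4):
    # The draft is sent once; afterward it lives in the prior context.
    prompt = context + (DRAFT if rnd == 1 else 0) + INSTRUCTION
    completion = DRAFT                   # a full revised draft comes back
    total += prompt * 0.03 / 1000 + completion * 0.06 / 1000
    context = prompt + completion        # history grows every round
    print(f"Revision {rnd}: {prompt} prompt tokens")
print(f"Total for three revisions: ${total:.2f}")  # about $0.73
```

Note that the second and third revisions each cost more than the first, even though the instructions are short, because the full draft and every prior revision ride along in the context.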

V. Cost Mitigation Strategies

In an API-mediated environment, different strategies can be employed to reduce token costs while maintaining the quality of AI interactions:

1. Utilize appropriate models: Match the model to the task’s complexity. Less expensive models can handle simpler tasks, while more complex tasks can be directed to higher-cost models for better results. For example, gpt-3.5-turbo is more than 10x less expensive than gpt-4.

2. Minimize unnecessary chatter: Keeping chat sessions focused on the objective helps avoid extra token usage and costs.

3. Combine questions or information: Submitting multiple questions or providing additional information within a single prompt is more efficient than submitting them separately, except when an answer is necessary before proceeding to the next prompt.

4. Remove irrelevant context: Deleting messages from the context that are no longer needed reduces the prior context size, lowering prompt costs.

5. Reset the context: Starting a new conversation resets the context and prompt size, potentially decreasing overall token usage and costs.

6. Use a prompt manager, or build token cost tracking directly into your solution, as sketched below.
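As one way to implement strategy 6, the class below tracks a running total with tiktoken. The model names and per-token rates are the example figures from this article and will change over time:

```python
import tiktoken

# (prompt rate, completion rate) in dollars per token, per the examples above.
RATES = {"gpt-4": (0.03 / 1000, 0.06 / 1000),
         "gpt-3.5-turbo": (0.002 / 1000, 0.002 / 1000)}

class CostTracker:
    """Accumulates estimated API spend across calls."""
    def __init__(self, model: str = "gpt-4"):
        self.enc = tiktoken.encoding_for_model(model)
        self.prompt_rate, self.completion_rate = RATES[model]
        self.total = 0.0

    def record(self, prompt: str, completion: str) -> float:
        cost = (len(self.enc.encode(prompt)) * self.prompt_rate
                + len(self.enc.encode(completion)) * self.completion_rate)
        self.total += cost
        return cost

tracker = CostTracker()
tracker.record("What is a token?", "A token is a small piece of text...")
print(f"Running total: ${tracker.total:.5f}")
```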

VI. Breakeven Analysis: ChatGPT Plus vs. API

To determine the breakeven point between using ChatGPT Plus and the API, start from the average number of messages per conversation. For example, if an average conversation consists of 8 messages, the 25-messages-per-3-hours cap works out to roughly three conversations every 3 hours, or about 1 conversation per hour.

To compare the costs, calculate the token usage for an average conversation using the API and multiply it by the number of conversations you expect over a billing period. If the total API cost exceeds the ChatGPT Plus subscription fee for the same period, ChatGPT Plus may be the more cost-effective choice. Conversely, if the API costs are lower, using the API might be the better option. By comparing the costs of both options, users can make informed decisions on the most cost-effective approach for their specific use case. In many cases cost will not be the only determinant.
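A back-of-the-envelope version of that comparison, assuming the $20/month ChatGPT Plus fee and an average API cost of about $0.07 per conversation (the total from the simulation earlier; your own averages will differ):

```python
PLUS_FEE = 20.00               # ChatGPT Plus, dollars per month (assumed)
COST_PER_CONVERSATION = 0.07   # average GPT-4 API cost per conversation (assumed)

breakeven = PLUS_FEE / COST_PER_CONVERSATION
print(f"API is cheaper up to ~{breakeven:.0f} conversations per month")  # ~286
```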

VII. Conclusion

Understanding token costs in AI conversations and draft revisions is crucial for efficient and cost-effective AI model interactions. By following the tips and strategies outlined in this article, users can optimize their AI interactions and API usage, ultimately saving time and money.

by Russ Kohn

https://www.linkedin.com/in/russkohn

Credits: This article was written by me with the assistance of GPT-4 by OpenAI for editorial cleanup and revision, using a custom prompt pipeline in software I wrote myself. Image generation by MidJourney. Token counts were obtained from OpenAI’s Tokenizer utility. Pricing examples are for GPT-4, taken from OpenAI’s pricing page.

According to GPT-4: “The author is a seasoned software developer and web services architect with over 30 years of experience. They specialize in API integration, cloud development, FileMaker Pro, project management, and public speaking. Having implemented innovative solutions for diverse industries, the author is an internationally recognized expert in API integration and FileMaker Pro.” I’m now also doing deep dives into how AI can be safely used in the real world, which for completion models means learning how to write effective prompts and back-end filters…more about that some other day.
