
LLM Cost Estimation Guide: From Token Usage to Total Spend

7 min read · May 22, 2025


Nowadays, there is a lot of buzz around large language models (LLMs) such as ChatGPT, Gemini, Mistral Large, and others.

If you are a Data Scientist — or aspiring to become one — and are planning to build an LLM-based solution, then estimating the cost of using LLMs is a critical part of your project planning.

This article provides a practical guide on how to estimate LLM costs without actually running the model, helping you make informed decisions early in the development process.

Photo by Annie Spratt on Unsplash

How LLMs Process Texts

LLMs process text as tokens. An LLM learns the statistical relationships between tokens and, given a sequence of tokens, tries to predict the tokens that follow.

What are tokens?

In the context of Natural Language Processing (NLP), tokens are the basic units of text. When you take a piece of text and break it down into smaller components, such as words or subwords, each of these components is a token.

Tokenizer (Image by Author)

Tokens are the fundamental unit of an LLM.

Types of Tokenization


There are three types of tokenization:

  1. Character tokenization: breaks the text into individual characters.
  2. Word tokenization: breaks the text into individual words.
  3. Subword tokenization: breaks the text into smaller subword units, so large or complex words are split into meaningful parts. For example, the word “sleeping” might be tokenized as “sleep” and “##ing” (where “##” indicates a continuation of the previous subword).

Typically, LLMs use subword tokenization.
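As a toy illustration (not a real LLM tokenizer), the three styles can be contrasted in a few lines of Python. The suffix list in the subword splitter below is a made-up stand-in for the merge rules that a real BPE or WordPiece tokenizer learns from data:

```python
text = "I am sleeping"

# 1. Character tokenization: every character is a token
char_tokens = list("sleeping")  # ['s', 'l', 'e', 'e', 'p', 'i', 'n', 'g']

# 2. Word tokenization: split on whitespace
word_tokens = text.split()  # ['I', 'am', 'sleeping']

# 3. Subword tokenization (toy version): split a known suffix off a word.
# Real tokenizers learn these splits from a corpus rather than a fixed list.
def toy_subword(word, suffixes=("ing", "ed", "s")):
    for suf in suffixes:
        if word.endswith(suf) and len(word) > len(suf):
            return [word[: -len(suf)], "##" + suf]
    return [word]

print(char_tokens)
print(word_tokens)
print(toy_subword("sleeping"))  # ['sleep', '##ing']
```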

Counting the Tokens in LLMs

I will use the prompt and output below (taken from gpt-4o) as a running example in this section.

Input Prompt:

''' As an AI assistant, your task is to take out the city name from the given sentences.
Please output only the city names from each sentence. If there is no city present, then output NA.
Please output the key of each sentence also. Output should be in JSON format.
Below are the list of the sentences:

{"0": "I live in Bangalore.", "1": "NewYork Times is a great media company", "2":"The Area my home is 1,600.50 sq. feet"}'''

gpt-4o Output:

'''{
"0": "Bangalore",
"1": "NA",
"2": "NA"
}'''

1. General Rule of Thumb

According to OpenAI, a helpful rule of thumb is that ~750 words correspond to ~1,000 tokens.

You can paste the prompt into any word counter tool (for example, a Word document) and count the number of words in it.

Note that this rule of thumb works well when the prompt is a simple paragraph. If the prompt contains a lot of punctuation, delimiters, numbers, etc., the actual token count will be considerably higher.

Below is the calculation based on this rule of thumb:

### Token Count Estimation

- Rule of thumb:
~750 words ≈ 1000 tokens
⇒ 1 word ≈ 1.33 tokens

- Total words in the prompt: 77

- Calculation:

77 words × 1.33 tokens/word ≈ **102 tokens**

Hence, the approximate token count is 100–110 tokens.
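The arithmetic above can be wrapped in a small helper, assuming the ~1.33 tokens-per-word ratio from the rule of thumb:

```python
TOKENS_PER_WORD = 1.33  # from the ~750 words ≈ 1,000 tokens rule of thumb

def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace-based word count."""
    word_count = len(text.split())
    return round(word_count * TOKENS_PER_WORD)

# A 77-word prompt lands at roughly 102 estimated tokens
sample = " ".join(["word"] * 77)
print(estimate_tokens(sample))  # 102
```

Remember this is only a first-pass estimate; punctuation-heavy or JSON-heavy prompts will come out higher, as the actual count of 123 tokens for our example prompt shows.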

2. OpenAI’s Tokenizer

You can paste the prompt into OpenAI’s open-source tokenizer at https://platform.openai.com/tokenizer.

Similarly, several websites let you paste in your prompt and select the model you are using to get accurate token counts, for example https://www.prompttokencounter.com/ and https://tokencalc.com/.

3. OpenAI’s chat.completions API

We can also leverage the response.usage property returned by OpenAI’s chat completions API to get the exact token counts. See the sample code below:

import os
from openai import AzureOpenAI
from dotenv import load_dotenv

# Load environment variables from a .env file
load_dotenv()

# Load Azure OpenAI credentials and model config
azure_endpoint = os.environ.get("EMBEDDING_AZURE_OPENAI_ENDPOINT")
openai_api_version = os.environ.get("AZURE_API_VERSION")
api_key = os.environ.get("EMBEDDING_AZURE_OPENAI_API_KEY")
model = os.environ.get("AZURE_OPENAI_DEPLOYMENT")

# Initialize the AzureOpenAI client with the credentials loaded above
client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    api_key=api_key,
    api_version=openai_api_version,
)

# System and user prompts
sys_prompt = "You are a helpful AI assistant"

user_prompt = '''As an AI assistant, your task is to take out the city name from the given sentences.
Please output only the city names from each sentence. If there is no city present, then output NA.
Please output the key of each sentence also. Output should be in JSON format.
Below are the list of the sentences:
{"0": "I live in Bangalore.", "1": "NewYork Times is a great media company", "2":"The Area my home is 1,600.50 sq. feet"}'''

# Chat message history
messages = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": user_prompt},
]

# Make the chat completion request
response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0,
)

# Extract the generated content
output_text = response.choices[0].message.content
print("Output:")
print(output_text)

# Extract token usage info
usage = response.usage
prompt_tokens = usage.prompt_tokens
completion_tokens = usage.completion_tokens
total_tokens = usage.total_tokens

print("\nToken Usage:")
print(f"Prompt Tokens: {prompt_tokens}")
print(f"Completion Tokens: {completion_tokens}")
print(f"Total Tokens: {total_tokens}")

Output:

```
{
  "0": "Bangalore",
  "1": "NewYork",
  "2": "NA"
}

Token Usage:
Prompt Tokens: 123
Completion Tokens: 30
Total Tokens: 153
```

Cost Calculation for an LLM-Based System

1. LLM-Based System Description

We consider a recommendation system that leverages large language models (specifically GPT-4o or GPT-4o-mini, plus text-embedding-3-large) across the following four stages:

Step 1: Image Description Generation

  • An image is processed by a multimodal LLM along with text input.
  • The model generates a descriptive caption based on the visual content.
  • Includes image token cost and text prompt/completion cost.

Step 2: Embedding Generation

  • The generated caption is sent to the text-embedding-3-large model.
  • This produces vector embeddings used for semantic search or similarity.
  • Cost is based on input tokens only.

Step 3: Chunk Summarization

  • A large body of text (e.g. document or transcript) is divided into chunks.
  • Each chunk is summarized by the LLM to extract its meaning.
  • Significant input and output tokens contribute to cost here.

Step 4: Final Recommendation Generation

  • Summarized content is provided to the LLM to generate actionable recommendations.
  • This step has a higher output token cost due to the detailed response generated.

2. Cost Calculation Table:

We now calculate the cost to process a single query end-to-end. Please refer to the tables below.

Cost calculation using gpt-4o model
Cost calculation using gpt-4o-mini model

Please note that we have assumed approximate input and completion token counts for each step.

Hence, the cost to process one query is $0.1082 (GPT-4o) and $0.0136 (GPT-4o-mini). Notice the difference in image input token costs between GPT-4o and GPT-4o-mini.
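As a sketch of how such a table can be built, the per-step cost follows directly from token counts and per-million-token prices. The prices and token counts below are assumptions for illustration only (prices change over time, so always check OpenAI's current pricing page), and image token costs are not modeled:

```python
# Illustrative per-million-token prices in USD (assumed values, not
# authoritative; check the official pricing page before relying on them)
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def step_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one pipeline step for the given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical (input_tokens, output_tokens) pairs for the four steps above;
# only text tokens are counted here, which is why this will not reproduce
# the article's $0.1082 figure (that also includes image input tokens)
steps = [(1200, 150), (300, 0), (4000, 800), (1500, 600)]
query_cost = sum(step_cost("gpt-4o", inp, out) for inp, out in steps)
print(f"Cost per query: ${query_cost:.4f}")
```

Note that Step 2 actually uses text-embedding-3-large, which has its own input-only price; for simplicity the sketch folds every step into the same chat-model price shape.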

Now, considering 50 daily active users and 20 queries per user daily, please refer to the table below for the monthly cost calculation:

| Metric                       | GPT-4o        | GPT-4o-mini |
|------------------------------|---------------|-------------|
| Cost per Query               | $0.1082       | $0.0136     |
| Number of Users              | 50            | 50          |
| Queries per User per Day     | 20            | 20          |
| Total Queries per Day        | 1,000         | 1,000       |
| Days per Month               | 30            | 30          |
| Total Monthly Queries        | 30,000        | 30,000      |
| **Total Monthly Cost (USD)** | **$3,246.00** | **$408.00** |

Note: The cost-per-query values are illustrative estimates. Actual costs may vary based on specific usage patterns and token consumption.

Hence, the total monthly cost comes out to $3,246 (with the gpt-4o model) and $408 (with the gpt-4o-mini model).
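The monthly roll-up in the table above reduces to a single multiplication:

```python
def monthly_cost(cost_per_query: float, users: int,
                 queries_per_user_per_day: int, days: int = 30) -> float:
    """Total monthly cost = per-query cost x total queries in a month."""
    return cost_per_query * users * queries_per_user_per_day * days

print(monthly_cost(0.1082, 50, 20))  # ~3246.0 (GPT-4o)
print(monthly_cost(0.0136, 50, 20))  # ~408.0  (GPT-4o-mini)
```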

Similar calculations can be done with other models also.

Here is a tabular comparison of input and completion (output) token costs per million tokens for 10 popular Large Language Models (LLMs) based on the latest API pricing data from multiple sources:

LLM API Cost Comparison

Conclusion

This article provided a practical framework for estimating the cost of LLM-based systems, using a recommendation system as an example. By breaking down each processing step and analyzing token usage, users can better understand how input/output volumes and model choices impact overall costs.

Comparing token pricing across popular LLMs shows that while powerful models like GPT-4o offer strong performance, more cost-effective options like GPT-4o-mini may suit many use cases. With this approach, teams can make informed decisions to balance performance, scalability, and budget.

References:

  1. https://platform.openai.com/tokenizer
  2. https://www.prompttokencounter.com/
  3. https://platform.openai.com/docs/pricing
  4. https://christophergs.com/blog/understanding-llm-tokenization
  5. https://yourgpt.ai/tools/openai-and-other-llm-api-pricing-calculator
  6. https://mem0.ai/llm/compare?models=OpenAI%3AGPT+4o%2CxAI%3AGrok+2
  7. https://docsbot.ai/tools/gpt-openai-api-pricing-calculator
