Toolkit to Calculate/Predict the Cost of Using OpenAI’s API Models

Vahid Faraji · Published in Kariyer.net Tech · 6 min read · Jul 13, 2024

(Our experiment in Kariyer.net)

Intro

OpenAI’s GPT models and other large language models bill you per token, a unit of text. These models don’t see text the way we do; instead, they see it as a series of numbers, with each token mapped to a numeric ID.

In very basic terms, Byte Pair Encoding (BPE) is the method used to convert text into tokens and back again. It merges frequently co-occurring character sequences into single tokens, which makes processing more efficient.

A token is a basic unit that falls somewhere between characters and words. For instance, the word “tokenizing” might be split into “token” and “izing” as two separate tokens.

Keep in mind that a slight change in wording can lead to a different tokenization and, consequently, a different response from the model. Knowing how to structure your prompts to get the desired results involves a good grasp of tokenization principles. OpenAI models learn the statistical relationships between tokens: a sequence of tokens leads to the generation of the next token.

The cornerstone of pricing for any model you use is the token. Clearly, the more tokens you send as input or generate as output, the higher the cost.

This makes understanding tokens crucial not just for using AI effectively, but also for managing costs efficiently. Let’s discuss the best toolkit for calculating the cost of OpenAI, specifically when working with the APIs:

First: Strategy

Depending on your production needs, you have two options: streaming (real-time) requests or the Batch API. Batch is more economical, at a 50% lower price, for tasks like clustering, classification, or anything that does not need a real-time response.
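As a minimal sketch of how a batch job is prepared: the Batch API takes a JSONL file in which each line is one request. Building that file needs nothing beyond the standard library (the model name and prompts below are placeholders, and you should confirm the request format against the Batch guide linked below):

```python
import json

# Placeholder prompts for a non-real-time task such as classification.
prompts = ["Classify this job posting: ...", "Classify this resume: ..."]

lines = []
for i, prompt in enumerate(prompts):
    # custom_id lets you match each result back to its input
    # when the batch completes.
    request = {
        "custom_id": f"task-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    lines.append(json.dumps(request))

# The resulting file is what you upload when creating the batch job.
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```

You then upload this file and create a batch job against it; the 50% discount applies because results are returned asynchronously, typically within 24 hours.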

To learn more about Batch Pricing you can visit: https://platform.openai.com/docs/guides/batch/overview

Second: Metrics

You should define some metrics up front and track them at regular intervals, for example token counts (context or generated), rate limits, and so on. Consider metrics like:

Token Usage:

  • Total Tokens Consumed: The total number of tokens used in API calls.
  • Tokens per Request: The average number of tokens used per API request.
  • Input vs. Output Tokens: Differentiates between tokens sent in the request and tokens received in the response.

API Latency:

  • Response Time: Measures the time taken for the API to respond to a request.
  • Request Processing Time: Tracks the time taken by the model to generate a response.

Cost Metrics:

  • Cost per Token: Calculates the cost incurred for each token used.
  • Total API Costs: Tracks the overall expenditure on API usage over a specific period.
  • Cost per Request: Measures the average cost per API request.

Usage Patterns:

  • Request Frequency: Monitors the number of API requests made over time.
  • Peak Usage Times: Identifies the times when API usage is highest.

Performance Metrics:

  • Accuracy and Quality of Responses: Evaluates the relevance and correctness of the responses generated by the model.
  • Error Rates: Tracks the number of failed or erroneous requests.

Scalability Metrics:

  • Concurrent Requests: Monitors the number of simultaneous API requests being handled.
  • Throughput: Measures the number of requests processed per unit of time.

These pre-defined metrics can be used with any LLM, not just OpenAI’s models.
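As a minimal sketch, several of the token and cost metrics above can be computed from a list of request logs. The record fields and prices here are made-up examples; check the current pricing page for real per-token rates:

```python
# Hypothetical request log: each entry records tokens used by one API call.
requests_log = [
    {"prompt_tokens": 120, "completion_tokens": 80},
    {"prompt_tokens": 300, "completion_tokens": 150},
    {"prompt_tokens": 90, "completion_tokens": 60},
]

# Example prices per 1M tokens (placeholders, not real pricing).
PRICE_IN_PER_1M = 5.00    # input tokens
PRICE_OUT_PER_1M = 15.00  # output tokens

# Input vs. output tokens, and total tokens consumed.
total_in = sum(r["prompt_tokens"] for r in requests_log)
total_out = sum(r["completion_tokens"] for r in requests_log)
total_tokens = total_in + total_out

# Total API cost, tokens per request, and cost per request.
total_cost = (total_in * PRICE_IN_PER_1M + total_out * PRICE_OUT_PER_1M) / 1_000_000
tokens_per_request = total_tokens / len(requests_log)
cost_per_request = total_cost / len(requests_log)

print(total_tokens, round(total_cost, 6), round(tokens_per_request, 1))
```

Computing these per API key or per time window gives you the usage-pattern and scalability views as well.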

  1. Tokenizer Playground:

Before writing any prompt, check the token size here! This shows how a single character can be counted as a token. However, do not remove delimiters if you need them, as they can be important for guiding your prompt as system or role indicators.

Here, the split of colors shows the unit of tokens. For instance, the “!” symbol can have a different location if we add a single space.

You can use Tokenizer to see the size of your prompt or output. As OpenAI says, the general rule is that one token corresponds to about 4 characters of text for common English text. This means roughly 3/4 of a word (so 100 tokens are about 75 words).
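That rule of thumb is easy to turn into a quick pre-flight estimator. This is only the rough 4-characters-per-token heuristic from above, not a real tokenizer; use tiktoken (covered later) for exact counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using OpenAI's rule of thumb: ~4 characters per
    token for common English text. Approximation only."""
    return max(1, round(len(text) / 4))

# 25 characters -> roughly 6 tokens under the heuristic
print(estimate_tokens("Hello, how are you today?"))
```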

If you read the OpenAI reference carefully, you can easily pick up the best tactics, and with trial and error you will find your optimal prompt size. For output, you can play with the max_tokens parameter.

2. Tiktoken Library

This is my bible for learning about tokens! The GitHub repo at https://github.com/openai/tiktoken can help you understand how tokens work. Knowing the number of tokens in a text string helps determine (a) if the string is too lengthy for a text model to handle and (b) the cost of an OpenAI API call, as pricing is based on token usage. This library can help you count tokens or encode text. You can check the libraries available for the stack you use:

Tokenizer libraries by language: for the cl100k_base and p50k_base encodings, the tiktoken README lists ports for several stacks; for r50k_base (gpt2) encodings, tokenizers are available in many languages.

3. openai-cost-tracker 0.5 (Python Library)

You can use this lightweight wrapper to track the cost of each request. It not only handles your API interactions but also calculates how much each request will cost you. You need to install the package first.

Installation

pip install openai-cost-tracker

Usage

Import the query_openai function from openai_cost_tracker:

from openai_cost_tracker import query_openai

You can follow the instructions here: https://pypi.org/project/openai-cost-tracker/

4. Dashboard of Platform.OpenAI (Excel, Google Sheets, or Shiny!)

4.1 Logic

The simple approach is to export your activity logs periodically to track and analyze your costs per API key. This gives you a comprehensive view of your expenditure. With simple spreadsheet functions like COUNT or SUMIF, you can precisely compute the costs associated with each individual API key, which helps you manage and optimize your API usage efficiently.
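The same SUMIF-style aggregation can be done in a few lines of Python over an exported usage log. The CSV column names below are assumptions; adjust them to whatever your export actually contains:

```python
import csv
from collections import defaultdict
from io import StringIO

# Stand-in for an exported activity log; column names are hypothetical.
exported_log = """api_key_name,tokens,cost_usd
search-service,1200,0.006
search-service,800,0.004
matching-service,5000,0.025
"""

# Equivalent of SUMIF: accumulate cost and tokens per API key.
costs_per_key = defaultdict(float)
tokens_per_key = defaultdict(int)

for row in csv.DictReader(StringIO(exported_log)):
    costs_per_key[row["api_key_name"]] += float(row["cost_usd"])
    tokens_per_key[row["api_key_name"]] += int(row["tokens"])

print(dict(costs_per_key))
```

To read a real export, replace StringIO(exported_log) with open("usage_export.csv").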

The pricing model of OpenAI, platform.openai.com (Last update: July 2024)

4.2 Real Time Dashboarding

The best part is that you can create a real-time dashboard with Shiny for Python. If you fetch the data programmatically, or feed it directly into a Shiny for Python app, you can analyze it in more detail.

You can check and learn shiny here: https://shiny.posit.co/

An example endpoint: to connect the billing usage endpoint (https://api.openai.com/v1/dashboard/billing/usage) to a Shiny application, you can use the httr package in R to make HTTP requests and retrieve the data, or define a helper function in Python:

import requests

def fetch_api_data(api_key, start_date, end_date):
    url = "https://api.openai.com/v1/dashboard/billing/usage"
    headers = {
        "Authorization": f"Bearer {api_key}"
    }
    params = {
        "start_date": start_date,
        "end_date": end_date
    }
    # Send the request and return the parsed JSON payload
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    return response.json()

5. OpenAI observability tools (at extra cost to you!)

Not only can you monitor incidents, but you can also use the following tools to analyze more details of cost. Here are some available platforms or open-source tools:

a) Grafana: you can monitor, check, and track incidents of OpenAI models here. Check: https://grafana.com/solutions/openai/monitor/

b) New Relic: integrates OpenAI observability alerts with your favorite company communication tools, like Slack, so you can follow incidents and cost volume.

c) WhyLabs: offers additional functions to monitor and control your models.

d) Lakera: has a well-written guide on monitoring and observability of LLMs if you use more than one model.

e) Guardrails: Guardrails is a Python framework designed to enhance the reliability of AI applications. Its guards detect, quantify, and mitigate specific types of risk in your application, helping ensure that inputs and outputs are managed in a way that minimizes potential vulnerabilities and errors.

And of course, if you search you can find more, like LangChain or other open-source tools on Hugging Face. But my motivation is simply to help you find the tools or approach that suit you.

Final tip: You can use the OpenAI Cookbook or Forum to learn more. If you keep yourself updated, I am sure you will pick up more tips there!

More resources you may want to look at:

https://lilianweng.github.io/

https://github.com/openai/tiktoken/blob/main/README.md

https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them

https://neptune.ai/blog/tokenization-in-nlp

https://tiktokenizer.vercel.app/

https://bea.stollnitz.com/blog/how-gpt-works-technical/
