My name is Jerry… What’s my name?

Jerry Cuomo
5 min read · Aug 2, 2023


Think A.I. by Jerry Cuomo: Article Series 2023

Exploring Tokens in AI through Prompt Engineering

There’s a little game I enjoy, one that is not only amusing but also quite useful when engaging in ‘prompt engineering’ with AI models. Here’s how it goes: I start the conversation by introducing myself, “My name is Jerry.” In response, the AI, ever the courteous entity, usually responds, “Pleased to meet you, Jerry!” As the chat progresses, weaving through various topics, I periodically toss in the question, “What’s my name?” Without a pause, the AI confirms, “Your name is Jerry.” However, after the conversation stretches over a considerable span of time, the plot thickens. I pose the same question, but this time the AI seemingly hesitates, and then answers, “I’m sorry, but I don’t know your name.” Hold on… what exactly transpired here?

Welcome to the fascinating world of prompt engineering with Large Language Models (LLMs) and the curious case of tokens!

You can think of tokens as little chunks of conversation. They’re bits of words, whole words, and sometimes even the spaces around them. Every time you chat with an AI model, it takes your input, breaks it down into these tokens, and processes them.

Tokens, essentially the building blocks AI models use to comprehend and create language, are placed into a sequence. This arrangement of tokens is then analyzed by the AI to understand the relationship between them. It’s through this process that an AI model is able to generate meaningful responses or perform tasks. Therefore, tokenization plays a vital role in how AI interacts with and understands human language.
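To make this concrete, here’s a minimal sketch of tokenization using the GPT-2 tokenizer from Hugging Face’s transformers library (any model’s tokenizer will do, though each one splits text a little differently):

```python
# pip install transformers
from transformers import AutoTokenizer

# GPT-2's tokenizer is a convenient, openly available example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "My name is Jerry."
tokens = tokenizer.tokenize(text)   # the human-readable chunks
ids = tokenizer.encode(text)        # the integer IDs the model actually sees

print(tokens)          # e.g. ['My', 'Ġname', 'Ġis', 'ĠJerry', '.']
print(len(ids), "tokens")
```

Notice the ‘Ġ’ markers: in GPT-2’s vocabulary they indicate that a token begins with a space, which is why the spaces around words count toward your token total.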

But here’s the catch: these models have a token limit. Think of it like a glass that fills up with each token. Once the glass is full, the oldest tokens start to spill out, causing the model to “forget” them. This is exactly what happened to me, Jerry; my name simply spilled out of the AI’s memory. This phenomenon is called ‘truncation’.

For English text, these tokens roughly correspond to:

1 token ~= 4 characters

1 token ~= ¾ of a word

100 tokens ~= 75 words

To illustrate how tokens are counted, let’s take a couple of examples. (By the way, I used the Llama Token Counter on Hugging Face to do the counting; feel free to try it yourself.)

Consider Wayne Gretzky’s famous quote, “You miss 100% of the shots you don’t take.” It contains 12 tokens.

A heftier piece of text like the US Declaration of Independence? That’s 1,927 well-stated tokens.

And this article? A comparatively verbose 1,313 tokens.
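If you’d rather count programmatically than paste text into a web tool, a few lines of Python do the job. This sketch uses the GPT-2 tokenizer as a stand-in, so its counts may differ slightly from the Llama counter’s:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # counts vary by tokenizer

quote = "You miss 100% of the shots you don't take."
actual = len(tokenizer.encode(quote))
estimate = len(quote) / 4  # the "1 token ~= 4 characters" rule of thumb

print(f"actual: {actual} tokens, rule-of-thumb estimate: {estimate:.0f}")
```

The rule of thumb lands close to the real count, which is all you need when budgeting prompts.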

Now, why should you care about these tokens when working with LLMs? Because they’re the key to making the most of your prompts. The better you understand how tokens work, the more you’ll get out of the model.

Let’s say you’re aware that a model can juggle a maximum of 4096 tokens. You can then strategically orchestrate your chat to make sure you don’t cross this limit. But what if there are certain details you need the AI to remember? Here are a few tactics you might try.

Remembering Important Information: Let’s say you’re having a chat with your AI and there are key pieces of information you want it to remember. Much like gently nudging a friend to remember an important point in your conversation, you need to periodically bring up these crucial details to ensure they stay fresh in the AI’s memory.
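One simple way to do this in code is to “pin” the key facts so they’re sent with every request rather than left to drift out of the window. Here’s a sketch, where send_to_model() and the message format are stand-ins for whatever chat API you’re using:

```python
# Facts we never want the model to "forget".
pinned_facts = "The user's name is Jerry. He plays bass."

def build_messages(history, user_input):
    # The pinned facts always ride along at the front, so they can't
    # spill out of the glass no matter how long the chat runs.
    return (
        [{"role": "system", "content": pinned_facts}]
        + history                                    # recent turns only
        + [{"role": "user", "content": user_input}]
    )

# reply = send_to_model(build_messages(history, "What's my name?"))  # hypothetical call
```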

Managing the Context Window: Imagine you’re editing a movie script. You would cut out the fluff and make sure only the most meaningful parts — those pushing the story forward — are left in. It’s the same with managing a language model’s context window. You’ve got to keep only the most recent and relevant tokens (up to 4096 for many models, like Llama 2) in play.
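In code, that editing job becomes a token budget: count the tokens in the running conversation and drop the oldest turns until it fits. A sketch, again with GPT-2’s tokenizer standing in for your model’s own:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # swap in your model's tokenizer
MAX_TOKENS = 4096  # e.g. Llama 2's context window
RESERVED = 512     # leave headroom for the model's reply

def count_tokens(message):
    return len(tokenizer.encode(message["content"]))

def trim_history(history):
    """Drop the oldest turns until the conversation fits the budget."""
    budget = MAX_TOKENS - RESERVED
    while history and sum(count_tokens(m) for m in history) > budget:
        history = history[1:]  # the oldest tokens spill out of the glass
    return history
```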

Document embedding: By transforming an entire document into a single vector representation, an ‘embedding’, we can, in essence, ‘converse with a document’. This approach also provides a way to manage token efficiency. Instead of feeding the AI assistant a large document that might contain thousands of tokens, we identify the relevant sections of tokenized text, pass only those to the AI, and stay comfortably within the token limit. This not only enables more meaningful interactions but also optimizes token usage. Look out for a deeper exploration of this strategy in a forthcoming article.
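As a preview, here’s roughly what that looks like. This sketch uses the sentence-transformers library, with naive paragraph splitting standing in for a real chunking strategy:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

document = open("declaration.txt").read()          # any long document
chunks = [p for p in document.split("\n\n") if p]  # naive paragraph chunking

chunk_vectors = embedder.encode(chunks)            # embed once, reuse many times

question = "When was it signed?"
scores = util.cos_sim(embedder.encode(question), chunk_vectors)[0]
top_chunks = [chunks[i] for i in scores.argsort(descending=True)[:3]]
# Build the prompt from just the top chunks: thousands of tokens saved.
```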

Working with APIs: Many AI models provide APIs, such as Hugging Face’s Transformers API, that give you the ability to manipulate tokens directly. These APIs come with parameters like ‘temperature’ and ‘max tokens’ which you can adjust to influence the AI’s responses. For instance, if you’re penning a mystery narrative and desire more dramatic twists, tweaking the ‘temperature’ setting could result in more diverse and unpredictable AI contributions. Tokens are more than mere counts; they also influence the model’s output bias. Consider this: you’re designing an AI Baking Assistant and you’d prefer eggless recipes. By setting a negative bias for tokens like ‘egg’, ‘eggs’, and ‘gg’, you can steer the AI away from suggesting egg-inclusive recipes. Stay tuned for future articles where we’ll touch on more sophisticated strategies for model customization.
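Here’s a sketch of both ideas using Hugging Face’s transformers library: ‘temperature’ and ‘max_new_tokens’ shape the sampling, while ‘bad_words_ids’ applies the strongest form of negative bias by banning token sequences outright (hosted APIs often expose a softer ‘logit_bias’ knob instead). GPT-2 stands in for a real chat model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Here is a simple cookie recipe:"
inputs = tokenizer(prompt, return_tensors="pt")

# Token sequences the model is never allowed to emit.
banned = [tokenizer.encode(w, add_special_tokens=False)
          for w in (" egg", " eggs", "egg")]

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.9,       # higher = more diverse, unpredictable text
    max_new_tokens=100,    # hard cap on generated tokens
    bad_words_ids=banned,  # steer the "AI Baking Assistant" away from eggs
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```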

It’s all about humans and AI, not one or the other in isolation.

Mastering prompt engineering involves remembering a crucial fact: while AI tools are capable of remarkable things, they don’t relieve us of our responsibility to think, analyze, and double-check. Think of it as a dance — a partnership where you’re leading, and the AI is following. However, every once in a while, the AI might surprise you with an unexpected twirl. Be ready for it, and remember, the beauty of this dance lies in the balance between your skill and the AI’s capabilities.

Modern language models, as impressive as they are, aren’t flawless. They might occasionally lose track of details, misinterpret context, or even forget your name. This isn’t a journey where AI does all the thinking; it’s about pairing your human intuition, judgment, and skills with the AI’s capabilities.

Prompt engineering is an art form, blending an understanding of your AI partner’s strengths and limitations. Be it for text generation, prompt creation, or casual conversation, it’s crucial that you’re an active participant in the dialogue. Remember, it’s on you to ensure ‘Jerry’ doesn’t become just another forgotten token. (Apologies for the AI humor; couldn’t resist it!)

Make sure to listen to the related episode on the Art of A.I. for Business podcast, where you’ll find more insights on topics like this. Also, enjoy a bonus episode featuring the podcast’s host, Robo, the digital colleague.

References:

Selin, L. (2023, May 23). Demystifying Tokens in LLMs: Understanding the Building Blocks of Large Language Models. LinkedIn. https://www.linkedin.com/pulse/demystifying-tokens-llms-understanding-building-blocks-lukas-selin/

OpenAI. (n.d.). What are tokens and how to count them? OpenAI Help Center. Retrieved August 2, 2023, from https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them

Acknowledgments:

I wish to express my gratitude for the ongoing collaboration and learning opportunities offered by the following individuals:

Criveti Mihai — @CrivetiMihai (Twitter) — Solution Architect, Inner-source expert, and A.I. guru in IBM Consulting.

Chris Hay — IBM Distinguished Engineer and a savant of Prompt Engineering at IBM Consulting. Also the host of the popular Chris Hay YouTube Channel.
