Appreciate What You Have

Vjeran Buselic
In Search for Knowledge
11 min read · Aug 28, 2024

The old proverb “Don’t look a gift horse in the mouth” teaches us the value of gratitude. Originating from the ancient practice of examining a horse’s teeth to assess its age and health, the saying reminds us that when receiving a gift, the focus should be on appreciating the gesture rather than scrutinizing the gift’s quality or worth. This timeless wisdom encourages us to accept generosity with humility and thankfulness, recognizing the goodwill behind the gift instead of questioning its value. The true worth of a gift lies in the intention of the giver, not in its material value.

Even though the Trojans might not completely agree with that.

While gratitude is important, there are times when evaluating a gift can be equally wise. This perspective, famously championed by economist Milton Friedman, holds that a closer inspection of a gift is sometimes necessary, not to offend the giver, but to ensure the gift aligns with our needs and goals.

Friedman’s critique is rooted in the idea that well-intended government welfare programs can often lead to unintended negative consequences: they create dependency, reduce individual initiative, and breed inefficiencies through bureaucratic management.

So, it is not that we are being ungrateful; we are just ensuring that the gift serves its purpose effectively, without leading to dependency or inefficiency.

Only by fully understanding a gift’s true impact and long-term consequences can we make better use of its value.

Let’s Look a Gift Horse in the Mouth

Let’s try to understand LLMs

If you somehow get lost in my analogies, or just want to get back on track with understanding Generative AI, consider rereading the Understanding Generative AI article (with the new pair of eyes you have developed in the meantime), especially its Knowing More section, where LLMs are explained at just the level we need.

Spoiler alert! This article contains a larger, even better LLM overview to consider 😊

To understand the basic principles, constraints, and strengths of LLMs, we will select a minimal number of building components combined with basic principles, hoping to cover all the important topics. At least for our goal: using Generative AI as a personal source for knowledge gathering.

The five concepts outlined in this article — LLMs, Tokens, Transformers, Probability, and Context — cover the core principles that users must understand to effectively use Generative AI.

1. Large Language Models (LLMs)

Basically, LLMs are a subset of the artificial intelligence family that leverages deep learning techniques, particularly neural networks, to process and generate human-like text.

These models are trained on vast amounts of text data, learning patterns, structures, and relationships inherent in the language itself.

Spoiler alert! They are trained mostly on the English language!

This extensive training enables them to predict the next word in a sequence, thereby generating coherent and contextually relevant text.

LLMs use techniques like tokenization (breaking down text into smaller units) and attention mechanisms (focusing on relevant parts of the input) to comprehend and create responses.

Their performance improves with model size and the diversity of the training data, but we will not go there: we are trying to understand common principles, not the specifics and differences between models, which would make this understanding extremely hard.

2. Tokens

To be precise, transformers operate not on words but on tokens. You can think of tokens as pieces of words used for natural language processing. They are the fundamental units of text that the model processes. A token can be as small as a single character, a word, or even part of a word.

For English text, 1 token is approximately 4 characters or 0.75 words.
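
If you want to see this in action, here is a minimal sketch using OpenAI’s open-source tiktoken library (cl100k_base is one common encoding; other models use different ones):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by the GPT-3.5/GPT-4 family of models
enc = tiktoken.get_encoding("cl100k_base")

text = "The cat sat on the mat."
token_ids = enc.encode(text)

print(token_ids)                              # a list of integer token IDs
print(f"{len(text)} characters -> {len(token_ids)} tokens")
print([enc.decode([t]) for t in token_ids])   # the text piece behind each token
```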

In a business sense, the token is the currency in which all transactions are calculated, particularly regarding token-based pricing and usage. When interacting with LLMs, it’s important to be mindful of how many tokens your inputs and the model’s outputs consume, as this directly impacts costs.

Many platforms impose limitations based on tokens, such as free-tier quotas or usage caps. For instance, a free tier might allow 10,000 tokens per month, meaning both your inputs and the model’s responses draw from this allocation, and understanding those limits is essential for navigating the pricing structures of different providers.
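
To make the arithmetic concrete, here is a back-of-envelope sketch; the quota, price, and expected reply length below are illustrative assumptions, not any provider’s actual terms:

```python
# Back-of-envelope token budgeting; all numbers here are made up for illustration.
FREE_TIER_TOKENS = 10_000        # hypothetical monthly free quota
PRICE_PER_1K_TOKENS = 0.002      # hypothetical paid rate in USD

def estimate_tokens(text: str) -> int:
    """Rough estimate for English text: ~4 characters per token."""
    return max(1, len(text) // 4)

prompt = "Summarize Milton Friedman's critique of welfare programs."
expected_reply_tokens = 400      # your guess at the answer's length

used = estimate_tokens(prompt) + expected_reply_tokens
print(f"This exchange uses ~{used} tokens "
      f"({used / FREE_TIER_TOKENS:.1%} of the free tier).")
print(f"On a paid plan it would cost ~${used / 1000 * PRICE_PER_1K_TOKENS:.4f}")
```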

While free usage capacities are generally sufficient for casual, non-profit use, one should be aware of these limitations for urgent or large-scale tasks. Additionally, around-the-clock availability of LLMs may be subject to these constraints as well.

So, plan accordingly!

3. Transformer

The transformer model, a fundamental architecture behind LLMs, is built on layers of self-attention and feed-forward neural networks.

Unlike traditional models, transformers process input data in parallel rather than sequentially, allowing for faster and more efficient computations.

The architecture consists of two main parts:

- Encoder: Processes the input data (e.g., a sentence, or a prompt, if you want to use the lingo) and generates a series of contextualized representations.
- Decoder: Uses these representations to generate the output (e.g., predicting the next word in a sequence, not a sentence, if you properly understand its basic function).

In the context of LLMs, the decoder is often the focal point, especially in autoregressive models like GPT, which generate text one token at a time.

Self-Attention Mechanism

The self-attention mechanism is the core of transformers. It enables the model to focus on different parts of the input when generating each word.

For instance, in the sentence “The cat sat on the mat,” when predicting “mat,” the model can “attend” to “sat” and “on” to understand the context.

This mechanism calculates attention scores to determine the importance of each word in the input relative to others, allowing the model to understand relationships within the text, irrespective of word order.
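
To make the arithmetic tangible, here is a toy, single-head sketch of scaled dot-product self-attention in NumPy. Real transformers add learned query/key/value projections, multiple heads, and positional encodings; this shows only the core attention computation:

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """Toy scaled dot-product self-attention (single head, no learned weights).

    X has shape (sequence_length, d). Real models first project X into
    separate query, key, and value matrices with learned weights.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)     # how strongly each token attends to the others
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax -> attention weights
    return weights @ X                # each output is a context-weighted mix of tokens

# Six "tokens" ("The cat sat on the mat") as random 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
print(self_attention(X).shape)        # (6, 8): one contextualized vector per token
```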

4. Important Probability Concepts to Understand

LLMs rely heavily on probability to predict and generate the next word in a sequence, a process grounded in statistical language modeling. The model assigns probabilities to potential next words based on the context provided by previous words, which is essential for generating coherent text.

When processing input text, the model breaks it down into tokens (words, subwords, or characters) and converts them into numerical representations known as embeddings. These embeddings capture the contextual meaning of each token. The model processes these embeddings through layers of neural networks, using attention mechanisms to weigh the relevance of different parts of the context.

The model generates a probability distribution over its entire vocabulary for the next word, reflecting the likelihood of each word being the correct continuation of the sentence.

For instance, in the sentence “The cat sat on the _____,” the model might assign high probabilities to words like “mat,” “floor,” or “couch” based on learned language patterns.

The model then selects the next word based on this probability distribution, using strategies like greedy search, beam search or sampling, which balance precision and creativity.

The chosen word is then added to the sequence, and the process repeats for subsequent words until the sentence is complete.
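
A toy illustration of that selection step, with a five-word vocabulary and probabilities invented for the example:

```python
import numpy as np

# A made-up distribution for "The cat sat on the ____"
vocab = ["mat", "floor", "couch", "roof", "moon"]
probs = np.array([0.45, 0.25, 0.20, 0.07, 0.03])    # illustrative numbers only

# Greedy search: always pick the single most probable word
print("greedy:", vocab[int(np.argmax(probs))])      # -> "mat", every time

# Sampling: draw from the distribution, so less likely words occasionally win
rng = np.random.default_rng()
print("sampled:", rng.choice(vocab, p=probs))       # usually "mat", sometimes not
```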

Why do we need to understand these details (more in the Knowing More section)?

Understanding these probability-driven processes is crucial because users can influence them directly.

By controlling specific parameters, users can shape the model’s behavior, enhancing or tailoring the generated content to meet their needs.

5. The Power of Context

Context is a crucial element in how LLMs generate text, encompassing the information provided in the input prompt and any prior interactions within the same session.

Which, as we remember, is handled through the chatbot component.

The quality of the output is significantly influenced by how well the input prompt is structured — clear, specific, and relevant prompts lead to more accurate and contextually appropriate responses.

We will spend most of our time teaching ourselves this dialogue!
YES, prompt engineering is in essence a dialogue, not the skill of firing off the right questions.

However, several limitations impact how context is managed in LLMs. One key limitation is the session length, as models typically do not retain information from one session to the next unless it is explicitly reintroduced in subsequent prompts.

Additionally, the amount of context that a model can consider is constrained by token limits, which define the maximum combined length of input and output tokens.

As the context length increases, the model’s ability to accurately recall earlier parts of the context diminishes, potentially leading to less coherent responses in longer conversations. Luckily for us, new versions tend to increase these limits in our favor.

To effectively use LLMs, it’s important to craft prompts that prioritize key information and manage the flow of context, ensuring that the most relevant details are clear and accessible to the model.
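
One simple, common tactic is to keep only the most recent turns of the conversation within a fixed token budget. A minimal sketch, reusing the rough 4-characters-per-token estimate (the budget number is arbitrary):

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb for English: ~4 characters per token
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit the token budget,
    dropping the oldest turns first (a simple, common strategy)."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

history = ["(old) Hi!", "(old) Tell me about LLMs.", "(new) And what are tokens?"]
print(trim_history(history, budget=10))  # only the newest turn fits
```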

Tired, huh?

If you’ve made it this far and feel a bit lost, don’t worry — you’re not alone 😊

I encourage you to revisit my first column, where I discuss the importance of personal knowledge, the kind that you actively engage with and truly believe in. This kind of knowledge is often what we accumulate through our education and experiences.

For instance, we don’t personally prove that the Earth is round; we believe so because we’ve seen enough evidence — photos, scientific articles, and other credible sources.

So, if you feel you haven’t gotten it so far, take some time to practice, read more, and reflect. And reread again. Or ask me to clarify any concept you don’t get; maybe it’s my bad.

You can also relax and trust my approach; it has worked for me so far, and I stand by what I write. Maybe it’s worth considering.

Simply believe 😊

Knowing More

A Comprehensive Overview of Large Language Models

This survey from June 2023 is a valuable resource for anyone wanting to understand not only how models are built and trained, but also how they work. This self-contained, comprehensive overview of LLMs discusses relevant background concepts and covers a wide range of topics, including architectural innovations, training strategies, context-length improvements, fine-tuning, multi-modal LLMs, and their applications in fields such as robotics and natural language processing.

If you are looking for one research paper to start from, it has my warm recommendation.

Constraints of LLMs

While LLMs are powerful, they have notable limitations.

First, they rely on the data they were trained on, meaning they may produce outdated or biased information.

They also struggle with understanding context beyond a certain length and can generate plausible-sounding but incorrect or nonsensical answers.

Additionally, LLMs lack true comprehension and reasoning — they mimic understanding based on learned patterns rather than genuine cognitive processing.

Ethical concerns, such as generating harmful content or reinforcing stereotypes, are also critical issues, requiring careful consideration and mitigation strategies.

Strengths of LLMs in Personalized Knowledge Tasks

LLMs excel in several tasks that benefit from personalized knowledge application. They are effective in content generation, summarization, translation, and providing contextualized explanations.

For personalized learning, they can adapt responses based on user needs and abilities, offering tailored educational content.

LLMs are also useful for brainstorming, drafting, and supporting decision-making processes by synthesizing information from multiple sources.

However, their outputs should be validated by users for accuracy and relevance, given their potential to hallucinate — generate misleading content.

Utilizing Probability Parameters to Influence LLM Behavior

Several probability-related parameters in LLMs allow users to fine-tune how the model generates text, offering control over the output’s creativity, coherence, and relevance. The key parameters that can be adjusted include temperature, top-p (nucleus sampling), and top-k. Each of these parameters plays a unique role in guiding the model’s decision-making process during text generation.

1. Temperature: This parameter controls the randomness of the model’s predictions. A lower temperature (e.g., 0.2) makes the model more deterministic by favoring high-probability words, leading to more conservative and predictable outputs. Conversely, a higher temperature (e.g., 1.0 or above) increases randomness, allowing for more creative or unexpected word choices. Adjusting temperature is particularly useful when you want to balance between precision and creativity in responses.

2. Top-p (Nucleus Sampling): This setting dynamically selects from a smaller subset of the most probable words whose cumulative probability exceeds a certain threshold (e.g., 0.9). This approach retains diversity while filtering out unlikely options, resulting in text that is both coherent and varied.
It’s ideal for generating high-quality text without the risk of incoherence.

3. Top-k: Similar to top-p, this parameter limits the selection to the top k most probable words, but instead of focusing on cumulative probability, it simply restricts the pool to a fixed number of options. This can be particularly effective when you want to constrain the model’s choices to the most relevant or contextually appropriate words.

By combining these parameters, users can exert nuanced control over the text generation process.

For example, you might set a lower temperature with a high top-p value to ensure the generated text is coherent and contextually appropriate while still allowing for some creative variation.

Alternatively, adjusting top-k alongside temperature can produce text that is both focused and original, depending on the needs of the specific application.
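
Here is a toy sampler sketching how the three knobs could be combined; the vocabulary, scores, and exact filtering order are illustrative assumptions, not any particular provider’s implementation:

```python
import numpy as np

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
    """Toy next-token sampler combining temperature, top-k, and top-p."""
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                     # softmax over the whole vocabulary

    if top_k is not None:                    # keep only the k most probable words
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    if top_p is not None:                    # smallest set with cumulative mass >= top_p
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return int(rng.choice(len(probs), p=probs))

vocab = ["mat", "floor", "couch", "roof", "moon"]
logits = [2.0, 1.2, 1.0, -0.5, -1.5]         # made-up raw model scores
print(vocab[sample_next(logits, temperature=0.7, top_k=3, top_p=0.9)])
```

Low temperature sharpens the distribution before the cut-offs, top-k removes the long tail outright, and top-p keeps however many words are needed to reach the chosen probability mass.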

Understanding and manipulating these probability parameters enables users to optimize the output of LLMs, enhancing their effectiveness in generating contextually appropriate and engaging text across different scenarios.

The aim of this article is to understand which parameters are available and how they influence the final result. A more thorough, but still ‘readable’, explanation can be found in the LangChain 101 course chapter How Does an LLM Generate Text by Ivan Reznikov.

Implicit vs. Explicit Context in LLMs

Implicit Context from Input Sentences
When an LLM processes an input sentence, it automatically derives context based on the words and their relationships. This process relies entirely on the information provided in the prompt.

For example, if you input “The cat curled up on the warm blanket,” the model understands the context to be about a cat’s cozy behavior. The LLM uses its training to infer relationships, such as “warm blanket” being associated with comfort, to generate the next words or sentences.

Here, the context is dynamically built from the prompt and the model’s learned patterns.

Limitation: The context is limited to the scope and clarity of the input. If the input is ambiguous or lacks detail, the model may generate a less relevant or inaccurate response.

Explicit Context through Customization Settings

In models like GPT-3.5/4 or other configurable versions, you can explicitly set context through customization settings.

This might involve defining specific instructions, parameters, or background information that guides how the model interprets inputs.

For example, you can customize the model to prioritize a formal tone, focus on educational content, or assume that discussions involve technology topics. This setting influences how the model processes subsequent prompts, ensuring consistency and alignment with your goals.

Example: If you set a customization that emphasizes pet care, the model will interpret “The cat curled up on the warm blanket” with a focus on comfort and well-being, possibly generating responses about how to maintain a cozy environment for pets.

Of course, you can use both implicit and explicit context together. For example, you might set explicit context through customization to ensure the model prioritizes technical accuracy, while still allowing it to interpret specific prompts dynamically.
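
As a concrete sketch of combining the two, here is what that looks like with OpenAI’s Python SDK; the model name is just an example, and any chat-capable provider with system-message support works similarly:

```python
# pip install openai  (assumes OPENAI_API_KEY is set in your environment)
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; substitute whatever you use
    messages=[
        # Explicit context: a standing instruction that frames the whole session
        {"role": "system",
         "content": "You are a pet-care assistant. Prioritize animal comfort "
                    "and well-being, and keep a friendly, practical tone."},
        # Implicit context: the model infers the rest from the prompt itself
        {"role": "user",
         "content": "The cat curled up on the warm blanket. "
                    "What does that tell me?"},
    ],
)
print(response.choices[0].message.content)
```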

Context provides the necessary background, such as the purpose, audience, and key points, which guides the interpretation of content. And do not forget that it consists of both content and form.

Content is the actual information or message being conveyed. Form encompasses the format, structure, and style in which the content is presented: the same message would take a different form in a blog post than in a formal report.

Only when you define the context clearly can you ensure that your message is understood as intended.

In Search for Knowledge publication
Mastering Insightful Dialogue with Gen AI

<PREV AI — Simplified to The Core
NEXT> Invert, always invert

