10 Things People Get Wrong About AI models (Don’t Be One of Them!)

Olejniczak Lukasz
Google Cloud - Community
17 min readMay 29, 2024

Jackie Stewart, the legendary three-time F1 world champion, coined the term “Mechanical Sympathy” to describe the harmonious collaboration between driver and machine. The basic idea is that Formula One driver doesn’t need to know how to build an engine to be F1 champion, but he/she needs to have mechanical sympathy which is understanding of car’s mechanics to drive it effectively and get the best out of it. The same applies to AI, we don’t need to be AI experts to harness the power of language models but we need to develop a certain level of “mechanical sympathy” with AI to work in harmony, not just as users, but as collaborators who understand the underlying mechanics.

However, the world of artificial intelligence is a captivating blend of scientific breakthroughs and speculative fiction. It’s a realm where innovation and imagination intertwine, often blurring the lines between reality and fantasy. As a result a cloud of myths and misconceptions swirls around AI, making it difficult to separate fact from fiction. My goal in this blog post, is to debunk common misconceptions hoping it will help you develop mechanical sympathy with AI.

Myth №1 — AI models learn everything by eating up all the stuff they find online.

It’s a common misconception that AI models constantly consume every bit of data they encounter on the internet, similar to Pac-Man devouring everything in its path. This image, while vivid, is inaccurate. AI models are trained on large amounts of data, but this involves a very sophisticated data curation and engineering process. It's far more than simply feeding raw data into a model. Just like crude oil, raw data has immense potential value, but it needs refinement to become truly useful. If left unprocessed, it’s merely digital sand — abundant, ubiquitous, but lacking inherent worth.

Therefore while many believe that AI models continuously learn and evolve just by absorbing raw data, the reality is that models we interact with are static snapshots of a learning process. They are trained on a fixed dataset and then deployed, with their capabilities largely frozen at that point.

Myth №2 — The AI Encyclopedia: Is Everything on the Internet Stored Inside Language Models?

While large language models are trained on vast amounts of data, that data may not cover every possible topic or scenario. If a model hasn’t been exposed to certain information during training, it won’t know about it.

Also, AI models don’t store everything verbatim. Instead, they learn patterns, relationships, and associations between tokens representing text, images, audio and videos. Think of it as learning the grammar and vocabulary of a language, not necessarily memorizing every text ever written.

These models are specifically designed to generate human-like text based on the patterns they’ve learned during training and as such they can answer questions, summarize information, or even create content, but they don’t have a perfect recollection of every piece of information they’ve encountered. If you think of them as AI Encyclopedia the reality is that even the most sophisticated AI model, have their limitations. Their knowledge isn’t a solid block; it’s more like a piece of Swiss cheese.

This is especially true if you take into account that world is constantly changing, and new information is emerging all the time. Language models cannot keep up with this pace in real time. They rely on periodic updates and retraining to stay relatively current, but there will always be a gap between the most recent information and what the model has been trained on.

AI is still a rapidly developing field, and researchers are constantly working to address these limitations. Many consider language models as knowledge compression algorithm that has some flaws but with further advancements in AI the “holes” in models knowledge will gradually shrink. But for now, it’s important to remember that they’re not all-knowing, and to use them with a healthy dose of skepticism and critical thinking.

We have techniques to deal with these limitations. For example the limitations of static model knowledge can be addressed by injecting into model context the most relevant information from external knowledge repositories. This approach, known as retrieval augmented generation (RAG), has become increasingly popular. It leverages the wealth of information stored in various sources like Google Search or company systems to enhance the model’s responses. High quality external knowledge sources are therefore a key components of every AI application because they can be used not just as a primary source of information but also as grounding mechanism for the generated responses. Specifically, when a model generates a response, the RAG pipeline can cross-reference it with information from external sources reducing the risk of hallucinations or misinformation.

Very sophisticated grounding capabilities are quite unique differentiator for Google Cloud Vertex AI users who can enable grounding for generated responses against Google Search or .. private Google Search built on company data (Vertex AI Search):

All these show that language models are not static knowledge databases in the traditional sense. They are dynamic information synthesizers. They don’t simply store and retrieve facts; they process, interpret and generate quite coherent, and contextually relevant responses by synthesizing the information retrieved from external knowledge repositories and their pre-existing knowledge based on patterns and associations between tokens learned from massive amounts of data. Still, very sophisticated prompt engineering techniques and mechanism like grounding help to reduce hallucinations and make the generated responses more factual.

Myth №3: The AI Illusion: Are AI Advancements Just Marketing Smoke and Mirrors, or Are There Real Breakthroughs?

While new AI models are indeed released frequently, most of these releases are incremental improvements rather than groundbreaking innovations. They often build upon existing architectures and techniques, introducing smart optimizations that enhance efficiency in training and/or serving of models. With the rapid pace of AI development, it’s crucial to distinguish major breakthroughs from the noise. Google’s contributions to AI have definitely been groundbreaking, far from mere marketing hype. Their Attention Is All You Need paper introduced transformers, a revolutionary architecture now underpinning all large language models like Gemini, GPT, LLaMA and many more.

Another breakthrough was Google’s switch transformer — novel architecture for large language models that significantly improved efficiency and scalability. Unlike traditional transformer models, which activate all parameters for every input, the Switch Transformer selectively activates a subset of parameters, referred to as an “expert,” for each input. This approach, known today as Mixture of Experts (MoE), dramatically reduces computational costs while maintaining or even enhancing performance.

Additionally, Google’s advancements in multimodality, exemplified by models like Gemini, enable seamless interaction between text, images, audio, video and other data types. Their vision of “anything to anything” AI, where models can translate between different modalities effortlessly, holds the promise of transformative applications in many domains.

Google’s recent breakthroughs include extending context windows to an astounding 10 million tokens, allowing models to process vast amounts of information simultaneously.

Why large context is sooo important? Large context allows your model to retain information from the beginning of a conversation, even after numerous exchanges. It also enables the model to process lengthy and complex documents, videos, images, and audio files simultaneously, drawing insights and connections across different modalities.

There is more. While existing RAG techniques provide the model with just the essential details relevant to user questions, they act like executive summaries, only scratching the surface. This is done to avoid overwhelming the limited context that other models have. However, Gemini breaks free from these limitations. It can dive deeper, processing significantly more online details and generating nuanced answers that take into account subtleties that would otherwise be filtered out prematurely.

While Google hasn’t released detailed information about Project Astra’s inner workings, it’s highly likely that large context windows will play a crucial role in its functionality. Project Astra aims to create AI assistants that can understand and interact with the real world in real time.

Google Project Astra https://deepmind.google/technologies/gemini/project-astra/

This requires processing and integrating vast amounts of information from various sources, including visual input, audio, and other contextual data. We can anticipate breakthroughs around:

  • Combining the video and speech input into a timeline of events
  • Continuously encoding video frames rather than importing large video batches
  • Caching information for efficient recall for long, multimodal conversations with AI assistants so that it can quickly access and utilize it in subsequent exchanges.

Myth №4: The AI Snail: Are LLMs Slow and Steady

The notion that Large Language Models (LLMs) are inherently slow is in general true for that larger, more complex models that can have slower response times due to their increased computational demands. There are however options available that prioritize speed without sacrificing performance.

Gemini 1.5 Flash, as showcased in the table from its technical report, is a prime example of this balance. It offers impressive speed, generating output characters within milliseconds even with 100 tokens in context. This makes it ideal for real-time applications where quick responses are crucial.

Furthermore, Gemini 1.5 Flash doesn’t compromise on quality or capabilities. It provides performance comparable to top-tier models while being the most cost-effective option. Additionally, it inherits all the amazing multimodal capabilities of its larger sibling, Gemini 1.5 Pro, including handling text, images, video, video with audio, audio, and PDFs.

I wanted to stress here that real-world applications often involve additional layers like safety filters and security measures (as discussed in Myth #10), therefore end-to-end latency of your system will also depend on other factors. The core speed of Gemini 1.5 Flash, however, remains a significant advantage.

By choosing the right model and optimizing its implementation, developers can create AI-powered applications that deliver exceptional performance and responsiveness, so expected by modern users and businesses.

Myth №5: The AI Patriot: Are Gemini Models Exclusively American, or Are They Available Globally?

The misconception that Gemini models are exclusively American and only accessible through VPNs from countries like Poland is inaccurate. While access was initially limited during testing phases, Gemini models are now available globally through various channels:

  • Enterprise Users, Startup and Developers: Businesses worldwide can access Gemini models through Google Cloud, allowing them to integrate these powerful AI capabilities into their products and services.
  • Researchers, Hobbyists, and Developers: Google AI Studio (https://aistudio.google.com/) provides a platform for researchers, hobbyists, and developers to experiment with and utilize Gemini models for their projects and research with no need for Google Cloud account.
Google Cloud Vertex AI Studio
https://aistudio.google.com/

Myth №6: The AI Goliath: Do You Always Need the Biggest and Baddest Model, or Can Smaller Ones Get the Job Done?

Gemini, like many other large language models, is available in different sizes (nano, flash, pro, ultra). This allows users to choose a model that best suits their needs and resources. Smaller Gemini models may be a good starting point for those who need a more affordable and faster option, while larger models may be better suited for complex tasks or applications that require a wide range of capabilities. Impressive performance of language models across a wide range of tasks is attributed to training technique called multi-task instruction tuning (or just instruction tuning).

This process involves training the model on a collection of datasets phrased as instructions with corresponding outputs. This technique has been shown to improve both model performance and its capabilities to generalize to unseen tasks. Very interesting observation is that as the scale of the model increases, the performance of the model improves across tasks while also unlocking new capabilities (so-called emerging abilities).

https://research.google/blog/pathways-language-model-palm-scaling-to-540-billion-parameters-for-breakthrough-performance/

However, when choosing an AI model for your use case, bigger isn’t always better. While large language models like Gemini Ultra offer impressive capabilities, its size often comes with trade-offs in terms of cost and speed. Think about it: GPT4 is 20 times (!) more expensive than Gemini Pro. If you have a well-defined task in mind, many times, a smaller, more specialized model like fine-tuned Gemini Pro (or Flash) can be a more efficient solution. For example, if your task is to extract values from JSON messages chances are that Gemini Pro fine-tuned with your examples of JSON inputs and expected outputs will perform as well as Ultra or GPT4.

Myth №7: The AI Extravaganza: Is Fine-Tuning a Costly, Difficult and Data-Hungry Ordeal, or Can It Be Simple and Affordable?

While pre-training and instruction-tuning of large AI models like Gemini is both quite challenging and resource-intensive, advancements in parameter efficient fine-tuning (PEFT) techniques, which let you tune instruction-tuned model have made it more accessible and affordable than ever before. Definitely, it is no longer an exclusive domain of AI experts. Google Cloud has made is as simple as it is only possible so users can focus on curating a dataset that accurately reflects their specific use case, while Google Cloud handles the intricate details of the fine-tuning process.

But Google Cloud has democratized the fine-tuning of language models, making it accessible even to those without machine learning expertise. For example, data analysts can now effortlessly customize Google’s language models using their own data within BigQuery, leveraging simple SQL commands to initiate and manage the fine-tuning process.:

Fine-tuned model is available as yet another option for developers in Vertex AI studio or as Model object in BigQuery empowering analysts to extract deeper insights and generate more valuable outputs from their domain specific data.

Parameter Efficient Fine-tuning is also available to Google AI Studio Users. Again all you need to do is to prepare a dataset that accurately reflects your specific use case:

You might be thinking, “All I need is a dataset with examples for fine-tuning, right?” But then you hear everyone saying deep learning needs massive amounts of data, and you wonder, “How many examples is enough?

The truth is, while more data can help, it’s not always about quantity. Having 100–500 high-quality examples that truly represent your specific use case is often sufficient for effective fine-tuning. Remember, quality trumps quantity here.

Then you might be thinking: Okay, so if I don’t need a ton of data, what about the cost and time of fine-tuning with all this fancy AI computing power? Is it going to break the bank and take weeks to finish?

Here’s the exciting part: fine-tuning your AI model might only cost you around $100 and take as little as two hours! Plus, the real kicker is that you won’t face any extra charges to use your newly refined model. The pricing remains the same as the base model you started with.

Isn’t that incredible? You essentially pay a one-time fee to create a customized model that’s perfectly tailored to your specific needs, while enjoying the same affordability and speed as the smaller base model. You get the best of both worlds: top-notch performance for your unique use case, combined with the cost-effectiveness and responsiveness of a smaller model.

So, stop worrying about sky-high expenses and lengthy training times. Fine-tuning your AI model is easier and more budget-friendly than you might have imagined.

Myth №8: The AI Narcissist: Is My Model Always the Best, or Are Other Models Just as Good (or Even Better)?

The notion that your AI model is always superior is a common misconception fueled by the rapid advancements and marketing hype surrounding artificial intelligence. In reality, the AI landscape is vast and diverse, with a multitude of models excelling in different areas. The only way to find out which model works best for you is to measure its performance on your use case and then consider other factors like latency, cost, ease of use, stability, availability ….. But the very first step is to measure if you assumptions hold true.

In fact when you think about it, all these shows that it’s not just about the model itself; it’s about the platform that empowers you to create trustworthy AI applications and guides you in making informed choices. Vertex AI exemplifies this, offering a comprehensive suite of tools and resources to build, deploy, and manage AI models effectively. It’s the platform that truly unlocks the potential of AI, not just the model alone.

One of the tools it offers to make informed decisions about the model is automated side by side (AutoSxS) comparison of distinct models on your inputs.

To understand how it works, imagine you and your friend are playing a drawing game. You both get the same instructions, like “draw a cat wearing a hat,” and you both draw your own versions. Now, imagine there’s a third person, like a teacher or a parent, who looks at both drawings and decides which one is better.

In the world of AI, this is kind of like how AutoSxS works. There are two AI models, let’s call them Model A and Model B. They both get the same instructions, or “prompts”. Each model writes its own response, just like you and your friend drew your own cats.

Then, there’s another AI model, called the “autorater”. This is like the teacher or parent in our drawing game. The autorater looks at both stories and decides which one is better. It does this by using a set of rules, or “criteria,” to judge things like how well the stories follow the instructions, how interesting they are, and how well they’re written:

https://cloud.google.com/vertex-ai/generative-ai/docs/models/side-by-side-eval

The autorater then tells us which model did a better job overall, and it can even explain why it chose one story over the other. It also tells us how confident it is in its decision, just like a teacher might say “I’m very sure your drawing is better” or “It’s a close call, but I think your friend’s drawing is slightly better.”

This is how AutoSxS helps us figure out which AI model is better at a particular task, like writing stories or answering questions. It’s like a fair and objective judge for AI models.

Once the rating is done you can visit Vertex AI console and see a list of your input questions, and for each question, you’ll know which model’s answer the autorater preferred. The autorater explains why it chose that answer and how confident it feels about its judgment. The win rate tells you which model overall performed better across all your questions.

Remember, even though there’s only one winner per question, both models might give correct answers. It’s just that one answer might be better based on the criteria the autorater used, like being more informative or easier to understand. It is universal truth: you can not improve things that can’t be measured so make data driven decisions. AutoSxS is a very powerful tool in your toolkit.

You may want to go one step further with LLM Comparator tool — open-source project from Google (https://github.com/PAIR-code/llm-comparator) that computes aggregated statistics that help to understand when, how and why the winner is better.

https://pair-code.github.io/llm-comparator/?results_path=https%3A%2F%2Fpair-code.github.io%2Fllm-comparator%2Fdata%2Fexample_arena.json

Myth №9: The AI Gemini Obsession: Is It All About Gemini, or Are There Other Models Worth Exploring?

It’s easy to get caught up and assume that Google Cloud requires you to use Gemini, but remember, it’s not just about the model itself. The true power lies in the platform that enables you to harness a diverse range of AI models, tools, infrastructure and operationalize your ML projects.

Vertex AI, for instance, goes beyond Gemini and offers a vast Model Garden filled with options:

  • Open-Source Models: including open-source models from Google like Gemma, or multimodal PaliGemma, but also from Mistral, Meta, Databricks and models from HuggingFace.
  • Partner Models: including models from Anthropic and others.

By exploring this diverse model ecosystem, you can find the perfect fit for your specific needs, whether it’s a smaller model for a niche task or a specialized model tailored to your industry.

So, while Gemini is undoubtedly a groundbreaking model, it’s not the only star in the AI universe. Vertex AI’s Model Garden opens up a world of possibilities, allowing you to discover and experiment with a variety of models to find the perfect solution for your unique challenges.

Myth №10: The AI Playground: Is the Internet a Safe Playground for Language Models, or Is It a Minefield of Risks?

The internet is far from a safe playground and this applies to language models as well. In fact, it’s a minefield of risks, as demonstrated by the recent massive DDoS attack that Google mitigated. This attack targeted HTTP/2-capable servers, highlighting the vulnerabilities of systems exposed to the internet.

https://cloud.google.com/blog/products/identity-security/google-cloud-mitigated-largest-ddos-attack-peaking-above-398-million-rps?e=48754805

Language models, especially those integrated into web applications and services, face similar risk but there ar also new ones specific to this new field. Here are some key concerns:

  • Prompt Injection: Malicious actors can manipulate input prompts to trick the model into revealing sensitive information, generating harmful content, or performing unintended actions.
  • Data Leakage: Users might inadvertently share Personally Identifiable Information (PII) through interactions with language model-powered interfaces, leading to privacy breaches.
  • Misuse and Abuse: Language models can be exploited to generate misinformation, propaganda, or harmful content, potentially causing widespread harm.

Google Cloud users already have the option to protect their endpoints with Cloud Armor. Soon, they’ll be able to fortify their AI solutions even further with Model Armor, a new layer of protection specifically designed to safeguard the language models powering these solutions.

Summary

The world of AI is rife with misconceptions, and the myths we’ve explored today are just the tip of the iceberg. It’s crucial to approach AI with a discerning eye, separating fact from fiction to fully grasp its potential and limitations.

We’ve learned that AI models don’t simply “eat up” all online data, but rely on curated datasets and sophisticated learning processes. They are not all-knowing encyclopedias but rather information synthesizers with gaps in their knowledge, akin to Swiss cheese.

We’ve dispelled the notion that fine-tuning is an expensive and complex endeavor, highlighting how platforms like Google Cloud have made it accessible and affordable. We’ve also challenged the idea that larger models are always superior, emphasizing the value of smaller, specialized models for specific tasks.

The internet, while a valuable resource for AI, is not without its risks. From prompt injection to data leakage, language models face various threats that require robust security measures and responsible AI practices.

Finally, we’ve debunked the myth that Gemini is the only game in town, showcasing the diverse array of models available on platforms like Vertex AI.

By understanding these nuances, we can make informed decisions about AI adoption and development, harnessing its power for good while mitigating its potential risks. As AI continues to evolve, it’s essential to stay informed and critical, separating hype from reality to unlock its true potential.

This article is authored by Lukasz Olejniczak — Customer Engineer at Google Cloud. The views expressed are those of the authors and don’t necessarily reflect those of Google.

Please clap for this article if you enjoyed reading it. For more about google cloud, data science, data engineering, and AI/ML follow me on LinkedIn.

--

--