Creating Wonders: A Closer Look at the Inner Workings of Generative AI

Slalom Boston Technology Enablement
Published in Slalom Data & AI
8 min read · Sep 5, 2023

By Trupti Sanikop, Jonah Abraham, Karthik Sridhar, Brandon Mino, Archisa Sood, and Tanya Gupta

Generative artificial intelligence (generative AI, or GenAI) is the hottest topic in industry today. This technology is a powerful tool revolutionizing workflows and opening new possibilities for efficient problem-solving, personalized experiences, and enhanced productivity. Unlike past AI models, generative AI can create novel, human-readable content at scale. With generative AI, businesses and individuals can generate text, audio, video, and other forms of media with machines.

According to Bloomberg, generative AI is expected to be a $1.3 trillion market by 2032. While the roots of the technology can be traced back to the 1950s, the current boom resulted from advances in machine learning and neural network construction. Additionally, the rise of cheap, accessible cloud computing, combined with foundation models that anyone can fine-tune, has democratized the technology, allowing the public to experience generative AI themselves.

History of generative AI

As computer science rapidly advanced during the Second World War, scientists began to wonder if machines could process and output data in the same way as the human brain. British mathematician Alan Turing was particularly interested in this concept, creating the “Turing test” in 1950. Still familiar to many today, the Turing test posited that “artificial intelligence” could be assessed by whether a human could distinguish a computer’s conversation from a human’s. Turing’s paper “Computing Machinery and Intelligence” is still cited today as one of the foundational works of AI. As computing power improved dramatically through the 1950s and 1960s, a hype cycle began as scientists and futurists envisioned true artificial intelligence within the next few decades.

By the 1970s, however, the hype cycle dissolved into an “AI winter.” The fading interest in AI is best represented by the Lighthill report, which expressed a profound disappointment in AI advances to the British Science Research Council.

“Workers entered the field around 1950, and even around 1960, with high hopes that are very far from having been realised in 1972. In no part of the field have the discoveries made so far produced the major impact that was then promised.”

This cycle of hype and bust, common to many burgeoning technologies, repeated several times over the following decades. However, by the 1990s, as multilayer neural networks became increasingly accessible to researchers, interest surged once again. Of particular interest is the 2017 paper “Attention Is All You Need” by the Google Brain team, which introduced the transformer, an architecture built entirely on attention mechanisms. This allowed AI models to be trained significantly faster and on far more data. The resulting models trained with this “semi-supervised” framework, such as GPT-3, became public in the early 2020s and are now driving the current AI surge. Let’s look at how these models are trained.

Revealing the magic behind GenAI: Semi-supervised learning, neural networks, and transformers

Semi-supervised learning in generative AI combines unsupervised and supervised approaches to train models (see Figure 1). At its core, generative AI relies on neural networks, which are computational models inspired by the human brain. Initially, models undergo unsupervised learning, analyzing vast amounts of unlabeled data to discover patterns and gain insights. This unsupervised pretraining forms a strong language foundation, leveraging neural networks to process and understand language structures. Then the models are fine-tuned using supervised learning on labeled data, where neural networks are adjusted based on known input-output pairs, making them more accurate and applicable to specific tasks.

Figure 1: Example of supervised and unsupervised learning
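To make the two-phase recipe concrete, here is a minimal sketch in PyTorch: a tiny shared backbone is first pretrained on unlabeled token sequences with a next-token objective, then fine-tuned on a small labeled set. The model, data, and dimensions are toy assumptions for illustration only, not a real large language model.

```python
# A minimal sketch of the two-phase recipe described above (toy model and data).
import torch
import torch.nn as nn

vocab_size, embed_dim, num_classes = 100, 32, 2

# Shared "backbone" that both phases train.
backbone = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Flatten(),
    nn.Linear(embed_dim * 8, 64),
    nn.ReLU(),
)

# Phase 1: unsupervised pretraining -- predict the next token from unlabeled sequences.
lm_head = nn.Linear(64, vocab_size)
unlabeled = torch.randint(0, vocab_size, (256, 9))          # toy "raw text" sequences
opt = torch.optim.Adam(list(backbone.parameters()) + list(lm_head.parameters()))
for _ in range(5):
    inputs, targets = unlabeled[:, :8], unlabeled[:, 8]      # predict the 9th token
    loss = nn.functional.cross_entropy(lm_head(backbone(inputs)), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: supervised fine-tuning -- reuse the backbone on a small labeled set.
clf_head = nn.Linear(64, num_classes)
labeled_x = torch.randint(0, vocab_size, (32, 8))            # toy labeled examples
labeled_y = torch.randint(0, num_classes, (32,))
opt = torch.optim.Adam(list(backbone.parameters()) + list(clf_head.parameters()), lr=1e-4)
for _ in range(5):
    loss = nn.functional.cross_entropy(clf_head(backbone(labeled_x)), labeled_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key point is that the backbone keeps the knowledge it gained from the large unlabeled corpus, so the second phase needs only a small amount of labeled data to adapt it to a specific task.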

In recent years, the transformer architecture has revolutionized generative AI by enhancing the ability to capture long-range dependencies in language. While unsupervised learning dominates the training process, the fusion of both methods, powered by neural networks and transformers, empowers the latest generative AI models with context-awareness, creativity, and a broader range of capabilities.

Figure 2: Example of learning using neural networks
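For the curious, the core operation of the transformer is scaled dot-product attention from “Attention Is All You Need.” The sketch below, using toy shapes, shows how every output position is built as a weighted mix of all positions in the sequence, which is what lets these models capture long-range dependencies.

```python
# A minimal sketch of scaled dot-product attention (toy dimensions, no training).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each output row is a weighted mix of *all* value vectors."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every query to every key
    weights = softmax(scores, axis=-1)        # attention weights sum to 1 per query
    return weights @ V

seq_len, d_model = 6, 8                        # a 6-token toy sequence
x = np.random.randn(seq_len, d_model)
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)        # (6, 8): one context-aware vector per token
print(out.shape)
```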

Having taken a “peek behind the curtain” of generative AI, let us now delve into foundation models and how they allow even nontechnical users to build AI tools and platforms.

Foundation models

Foundation models (FMs) are large AI models that have been trained on massive amounts of data to perform multiple downstream tasks. Instead of being trained to solve a specific task, FMs are trained to understand the very nature of an entire problem space. This approach has helped to overcome some of the drawbacks of traditional AI models across three major dimensions:

  1. Expanded problem-solving scope
  2. Massively improved performance
  3. Increased availability and accessibility

FMs have created a paradigm shift in the way artificial intelligence can be applied to real-world problems: individuals no longer need expertise in machine learning algorithms to enjoy the benefits of AI. These models come pretrained and can be fine-tuned as required, leaving the rest of us to act as “consumers” of AI foundation models.

Figure 3: Foundation model lifecycle

Examples of foundation models

Foundation models are being developed across different problem spaces such as text and language, image, video, audio, and code.

An example of a text-based foundation model is GPT-3, developed by OpenAI. GPT-3 serves as a powerful base upon which various language understanding and generation tasks can be built and fine-tuned. It is also the driving force behind the ChatGPT application. Some examples of tasks that GPT-3 can perform are text generation, language translation, text summarization, question-answering, chatbot interactions, sentiment analysis, text completion, text classification, and other language understanding tasks. Competitors to GPT-3 include Amazon’s Titan and Claude by Anthropic.

Source: ChatGPT
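As an illustration, here is a hedged sketch of consuming a GPT-style model through the OpenAI Python client (v1.x); the model name and prompt are assumptions chosen for demonstration.

```python
# A minimal sketch of calling a GPT-family model via the OpenAI Python client.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative GPT-3-family chat model
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the Turing test in two sentences."},
    ],
)
print(response.choices[0].message.content)
```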

An example of an image-based foundation model is Stable Diffusion by Stability AI. It can perform image-based tasks such as text-to-image generation, image inpainting, and image-to-image translation. It can also be easily fine-tuned with a few pictures of a new subject (even a picture of yourself!) to then generate images that contain the new subject. Other image-based foundation models include DALL-E by OpenAI and Imagen by Google. The visual below demonstrates an example of inpainting, where edits are made within an existing image.

Animation of inpainting with Stable Diffusion (Source: Stable Diffusion)
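For readers who want to try this themselves, below is a hedged sketch of text-to-image generation with Stable Diffusion through the Hugging Face diffusers library; the model ID, GPU assumption, and prompt are illustrative choices.

```python
# A hedged sketch of text-to-image generation with diffusers (assumes a CUDA GPU).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint choice
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a watercolor painting of Boston in autumn").images[0]
image.save("boston_autumn.png")
```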

Fine-tuned models

As opposed to foundation models, which do many things well, fine-tuned models are designed to perform one task extremely well. An example of a fine-tuned model is OpenAI Codex, which was derived from GPT-3. GitHub’s Copilot service makes use of Codex to understand code context and generate code that is efficient and secure.

Generative AI in the real world — chatbots

While generative AI has created exciting new opportunities, it also offers improvements to current technologies. One such use case is enhancing chatbots. Generative AI can craft more natural and contextually relevant responses based on brand-specific data. Unlike traditional chatbots, where the conversation follows a predefined knowledge graph, GenAI chatbots can understand and produce more humanlike responses, making interactions feel more engaging and authentic. Additionally, the knowledge base of a chatbot engine can be expanded using generative AI capabilities, allowing it to handle a wider range of user queries. For instance, imagine the chatbot incorporating prior answers or prior chat-session data to drive the conversation. By using generative AI, chatbots become more versatile, effective, and user-friendly.
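As a rough sketch of that idea, the snippet below grounds a chatbot in brand-specific facts and feeds prior turns of the session back into each request. The OpenAI client, model name, and brand data are assumptions; any chat-capable LLM would work the same way.

```python
# A minimal sketch of a GenAI chatbot grounded in brand data and session history.
from openai import OpenAI

client = OpenAI()
brand_facts = "Acme returns are free within 30 days. Support hours: 9am-5pm ET."  # hypothetical brand data
history = [{"role": "system", "content": f"You are Acme's support bot. Facts:\n{brand_facts}"}]

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})  # prior answers drive later turns
    return answer

print(chat("Can I return a jacket I bought three weeks ago?"))
print(chat("And what were those support hours again?"))  # answered from session context
```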

Cloud services and generative AI

The rapid rise and broad benefits of generative AI have led the major cloud providers to adopt the technology quickly. The three biggest — Amazon Web Services, Microsoft Azure, and Google Cloud — have already established their own services and offerings to make generative AI available to their customers. These services are generally available in two formats:

  1. As fully managed services that provide access to foundation models through API calls.
  2. As models that can be deployed, fine-tuned, and customized on cloud computing resources.

The cloud services that can be used are as follows:

Table comparing AI offerings from cloud providers

An added benefit — since these offerings are presented as managed services in your own cloud environment, any private data used with these models stays private within your account. It only exists in the instance of the model deployed in your private cloud environment.
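As an example of the first format, the hedged sketch below invokes a managed foundation model through Amazon Bedrock with boto3. The model ID and request/response fields are assumptions that vary by provider and model, so check the service documentation before relying on them.

```python
# A hedged sketch of calling a managed foundation model via Amazon Bedrock.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # assumed Titan text model ID
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Write a one-line tagline for a coffee shop."}),
)
payload = json.loads(response["body"].read())
print(payload)  # response schema differs per model; inspect before parsing further
```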

Self-hosted generative AI

For those who prefer a self-hosted approach with more control over their data and fine-tuning, Hugging Face provides open-source models for natural language, audio, and image processing. Driven by a community that promotes open-source contributions, it also offers paid services, counting big names like Microsoft, Apple, and Meta as consumers of its datasets and models. BLOOM, or BigScience Large Open-science Open-access Multilingual Language Model, is Hugging Face’s transformer-based LLM. BLOOM’s largest model has 176B parameters (GPT-3 has 175B), supporting text in 46 natural languages and 13 programming languages.
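As a minimal sketch of the self-hosted route, the snippet below loads a small open BLOOM checkpoint with the Hugging Face transformers library; the checkpoint choice and generation settings are illustrative, and the full 176B-parameter model requires far more substantial hardware.

```python
# A minimal sketch of self-hosting an open model with Hugging Face transformers.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")  # small BLOOM variant
print(generator("Generative AI is", max_new_tokens=30)[0]["generated_text"])
```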

We are starting to see other user-created self-hosted options as well. A tool called “llama.cpp” demonstrated how to run Meta’s LLaMA on a MacBook. Enterprising techies have figured out how to run LLaMA on Windows, a Google Pixel 6, and a Raspberry Pi … but would you want to?

Risks of generative AI

As with any new technology, understanding the risks is essential to building safe and effective tools. A model is only as good as the data used to train it and the design of the model itself. A generative AI model does not understand the responses it is generating — it simply provides the most probabilistically accurate output based on the input prompt. Hence, not everything created by generative AI models can be automatically trusted, as one lawyer found out the hard way. Additionally, the quality of the data used to train the model largely determines the quality of the outputs generated. As a result, any inherent biases in the data are also learned by the model. This article does a great job of walking through some of the observed biases in image-based generative AI foundation models.

Conclusion

Generative AI may be the most revolutionary technology since the introduction of the smartphone. It promises to change the way we do business and interact with media. As foundation models and cloud computing become ever more accessible, even individuals can incorporate AI into their daily lives. The rapid advance of the technology in the last few years shows no signs of slowing, and the future is bright for even more exciting breakthroughs in AI.

Slalom is a global consulting firm that helps people and organizations dream bigger, move faster, and build better tomorrows for all. Learn more about Slalom’s human-centered AI approach and reach out today.
