An Introduction to Generative AI on AWS

Adithya Bodi
Published in TrackIt
Dec 6, 2023

Generative AI (also known as GenAI) is a subset of artificial intelligence that focuses on creating novel content across various domains, from art and literature to music and beyond. It has emerged as a transformative force, enabling computers to produce content that closely resembles human-generated data. Unlike traditional AI, which primarily deals with pattern recognition, Generative AI generates new and unique data patterns based on existing information.

The following sections offer a beginner-friendly introduction to Generative AI, define its key terminology, and explore the tools available within the AWS (Amazon Web Services) ecosystem for building and scaling generative AI applications.

Defining Key Terms

Artificial Intelligence (or AI) refers to the development of computer systems that exhibit intelligence, enabling them to perform tasks traditionally requiring human expertise. AI encompasses a broad spectrum of applications, ranging from visual perception and language processing to speech recognition. Its primary goal is to create machines capable of mimicking human cognitive functions.

Neural Networks are models inspired by the structure of the human brain, comprising interconnected nodes known as “perceptrons”. These nodes, organized in layers, process information and learn patterns through the adjustment of connection weights. Neural networks are fundamental to AI, serving as the foundation for various applications, including image recognition and natural language processing.
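As a rough illustration of how a node combines its inputs, the sketch below computes the output of a single node and of a tiny two-layer network. The bias term, the sigmoid activation, and all numeric values are assumptions chosen for the example; the description above only mentions connection weights.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


def layer_output(inputs, layer_weights, layer_biases):
    """Apply every node in a layer to the same input vector."""
    return [neuron_output(inputs, w, b) for w, b in zip(layer_weights, layer_biases)]


# Hypothetical weights: two inputs -> two hidden nodes -> one output node.
hidden = layer_output([1.0, 0.5], [[0.2, -0.4], [0.7, 0.1]], [0.0, -0.3])
output = neuron_output(hidden, [1.5, -1.0], 0.1)
print(hidden, output)
```

Training a network means nudging these weights and biases until the outputs match the desired results; the sketch only shows the forward computation.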

Parameters, in the context of AI, refer to the individual weights associated with the connections between nodes, or perceptrons, within a neural network. Each connection has an associated weight, governing how input values are combined as they propagate through the network’s layers. The number and values of these parameters define the model’s structure and play a crucial role in determining its performance and capabilities. In essence, parameters serve as the adjustable elements that influence how a neural network processes and interprets data.
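To make the idea of a parameter count concrete, here is a minimal sketch that counts the weights in a fully connected network. The layer sizes are hypothetical, and the optional per-node bias is a common addition that the definition above does not mention.

```python
def count_parameters(layer_sizes, include_biases=True):
    """Count weights (and optionally biases) in a fully connected network.

    layer_sizes lists the number of nodes per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out      # one weight per connection between layers
        if include_biases:
            total += n_out         # one bias per node in the receiving layer
    return total


# Hypothetical network: 784 inputs, 128 hidden nodes, 10 outputs.
print(count_parameters([784, 128, 10]))                        # 101770
print(count_parameters([784, 128, 10], include_biases=False))  # 101632
```

Even this small network has about a hundred thousand parameters, which gives a sense of scale for models with billions of them.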

Generative AI (or Gen AI) refers to models capable of creating media, such as text or images, from text input, or prompts. Unlike traditional AI, which focuses on recognizing patterns, Generative AI excels at generating new data patterns. This capability is now driving significant advances in fields such as art, literature, music, and beyond.

Large Language Models (or LLMs) are Generative AI models designed to work specifically with human language. With billions of parameters, these models can perform diverse language tasks based on a given prompt. LLMs, such as the models behind ChatGPT, represent a paradigm shift because they are not task-specific and can handle a wide range of language-related applications.

Foundation Models (or FMs) are prebuilt machine learning models trained on extensive datasets, making them adaptable to various downstream tasks. A Foundation Model is characterized by its versatility, serving as a ready-made solution that organizations can seamlessly integrate into their projects, accelerating the development of advanced machine learning applications.

Pretraining involves the creation of a Foundation Model by training a model with terabytes of unlabeled text or multi-modal data. This unlabeled data, often obtained by crawling the internet, forms the basis for the Foundation Model’s understanding of patterns and features, allowing it to be fine-tuned for specific tasks.

Inference, in the context of Foundation Models, refers to the process of using a trained model to make predictions or generate outputs based on new, unseen data. It is the phase where the model applies its learned knowledge to perform tasks, such as generating text or making classifications, without further training.
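The difference between pretraining and inference can be sketched with a toy next-word model. This is an illustration only: real Foundation Models are neural networks trained on terabytes of data, whereas the sketch below simply counts word pairs in a made-up corpus. It does show the two phases, though: learning from unlabeled text, then generating output without any further training.

```python
from collections import Counter, defaultdict


def pretrain(corpus: str):
    """'Pretraining': count which word follows each word in unlabeled text."""
    words = corpus.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(model, prompt: str, max_new_words: int = 5) -> str:
    """'Inference': extend the prompt with the most frequent next word.

    The model is only read here; its counts are never updated.
    """
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        followers = model.get(words[-1])
        if not followers:
            break  # the model never saw this word during pretraining
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Made-up corpus; no labels are needed because the text supervises itself.
corpus = "the model reads text . the model writes text . the model reads text ."
model = pretrain(corpus)
print(generate(model, "the", max_new_words=3))  # the model reads text
```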

Transformers are neural network architectures that lie at the heart of Generative AI. Efficient, scalable, and parallelizable, transformers are exemplified by models such as GPT (Generative Pre-trained Transformer). They process entire input sequences at once during training, which has significantly improved the training speed and scalability of Generative AI models.
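The mechanism that lets a transformer look at a whole sequence at once is self-attention, which the description above does not spell out. The following is a simplified sketch in plain Python: it uses the token vectors directly as queries, keys, and values, whereas a real transformer first applies learned projections and runs many attention heads in parallel on accelerators. The token values are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Each output row depends only on the inputs, not on other output rows,
    which is why every position can be computed in parallel.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens, tokens, tokens):
    print(row)
```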

Building Generative AI Applications on AWS

AWS’s approach to generative AI is structured around three distinct layers, each catering to specific aspects of application development and deployment.

Layer 1: Compute

Effective computation is paramount to Gen AI. AWS addresses this need with specialized “accelerator” chips — Inferentia and Trainium — designed for running Large Language Models (LLMs). These accelerators help optimize the costs associated with the development of generative AI models.

AWS Inferentia is an accelerator that powers Amazon EC2 Inf1 instances. It raises throughput and lowers the cost per inference, enabling faster and more economical deployment of models, including large language models (LLMs) and vision transformers, at lower latency than comparable Amazon EC2 instances.

AWS Trainium is a second-generation machine learning accelerator that powers Amazon EC2 Trn1 instances. It enhances the efficiency and cost-effectiveness of deep learning training for models with over 100 billion parameters, enabling faster training times and up to 50% savings in training cost compared to comparable Amazon EC2 instances. Trainium is particularly beneficial for tasks such as natural language processing, computer vision, and recommender models.

EC2 UltraClusters

Amazon EC2 UltraClusters enable seamless scaling to thousands of GPUs or purpose-built ML accelerators such as AWS Trainium. They democratize access to supercomputing-class performance for machine learning, generative AI, and high-performance computing developers through a simple pay-as-you-go model without setup or maintenance costs. Amazon EC2 P5 instances, EC2 P4d instances, and EC2 Trn1 instances (powered by AWS Trainium) are all deployed in EC2 UltraClusters.

Layer 2: Custom Models

The second layer focuses on enabling users to create their own models using Amazon SageMaker JumpStart. Using the purpose-built AWS Trainium chip, users can train their own Large Language Models (LLMs). Alternatively, SageMaker JumpStart offers a curated selection of pre-trained models that users can retrain on their own datasets, giving them the flexibility to tailor models to their needs.

Layer 3: Foundation Models (FMs)

The third layer of AWS’s approach to Gen AI focuses on making Foundation Models (FMs) more accessible to users. With billions of parameters, pretraining FMs demands substantial time and resources.

AWS addresses this challenge with Amazon Bedrock, a fully managed service that streamlines the accessibility of high-performing foundation models. By offering a range of FMs, Amazon Bedrock enables users to easily build and scale generative AI applications, ushering in a new era of innovation and efficiency in model development on the AWS platform.
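For a sense of what calling a foundation model through Amazon Bedrock can look like, here is a minimal sketch using the AWS SDK for Python (boto3). Several things are assumptions rather than details from this article: the model ID `anthropic.claude-v2`, the `us-east-1` region, and the request and response fields, which follow the text-completion format Anthropic's Claude models used on Bedrock at the time of writing. Other model providers expect different request bodies, and the call only succeeds with valid AWS credentials and model access enabled in the account.

```python
import json


def build_claude_request(user_prompt: str, max_tokens: int = 300) -> str:
    """Build a JSON request body in the assumed Claude text-completion format."""
    return json.dumps({
        "prompt": f"\n\nHuman: {user_prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    })


def ask_bedrock(user_prompt: str,
                model_id: str = "anthropic.claude-v2",  # assumed model ID
                region: str = "us-east-1") -> str:      # assumed region
    """Send the prompt to Bedrock and return the generated text."""
    import boto3  # imported here so the payload helper works without the SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=model_id,
        body=build_claude_request(user_prompt),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["completion"]


# Inspect the request body locally; no AWS call is made here.
print(build_claude_request("Summarize what a foundation model is."))
```

With credentials configured, `ask_bedrock("Summarize what a foundation model is.")` would return the model's reply as a string. No model training or infrastructure setup is involved, which is the point of the managed service.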

Additional AWS Gen AI Services

Amazon Q

Unveiled during the keynote address at re:Invent 2023 by AWS CEO Adam Selipsky, Amazon Q is an AI-powered chatbot that goes beyond conventional question-answering. It draws on 17 years’ worth of AWS knowledge, enabling users to engage in dynamic conversations, generate content, and execute various actions. By connecting Amazon Q to company data through over 40 built-in connectors, businesses can tailor the chatbot to enhance operational efficiency.

AWS HealthScribe

AWS HealthScribe is a service that empowers healthcare software vendors to build clinical applications by combining speech recognition and generative AI to automatically generate clinical notes from patient-clinician conversations. HealthScribe streamlines the clinical documentation process and supports responsible use of AI with built-in security and privacy measures.

Amazon CodeWhisperer

Amazon CodeWhisperer is an AI coding companion that accelerates application development by providing intelligent code suggestions and security scanning, supporting faster and more secure software development.

Conclusion

The impact of Generative AI is predicted to be substantial, with Goldman Sachs forecasting a remarkable $7 trillion increase in the global gross domestic product (GDP) over the next decade. In this rapidly evolving landscape, AWS emerges as a key player, offering powerful tools and services that empower both individuals and organizations to dive into the world of Generative AI. As the demand for this technology grows, AWS’s commitment to innovation solidifies its crucial role in shaping the future of this dynamic industry, making it a go-to platform for creating and scaling innovative Generative AI applications across various sectors.

About TrackIt

TrackIt is an Amazon Web Services Advanced Tier Services Partner specializing in cloud management, consulting, and software development solutions based in Marina del Rey, CA.

TrackIt specializes in Modern Software Development, DevOps, Infrastructure-As-Code, Serverless, CI/CD, and Containerization with specialized expertise in Media & Entertainment workflows, High-Performance Computing environments, and data storage.

In addition to providing cloud management, consulting, and modern software development services, TrackIt also provides an open-source AWS cost management tool that allows users to optimize their costs and resources on AWS.
