Before GenAI impact comes GenAI architecture (and infrastructure)

Several steps exist between an idea for using GenAI and a finished solution. We walk through four — backwards.

Lily Hicks
Slalom Data & AI
8 min read · May 15, 2024

--

You’ve done it. Your organization has used generative AI (GenAI) to improve the experience of your customers or users in a big way.

If you’ve played your cards right, you’ve arrived at the launch of your GenAI-powered solution, service, or experience with budget to spare: money for marketing what you’ve made possible and for continuing to make it better. You’re entering this launch confident in its success, and your fellow stakeholders feel the same.

How did you get here? How do any of us get here? One way to pave your way to continued GenAI success is to use Amazon’s approach of working backwards — in this case, from GenAI-powered solution back to GenAI-friendly chip (compute matters!).

Step #4: Low-risk experimentation

To be able to bring a use case to production and feel confident it’s going to create business and customer value, chances are you’ve tested it out. Organizations can make it easier for more teams to independently explore different GenAI use cases by giving them user-friendly platforms for experimentation, or “testbeds,” that securely connect with company data sources.

At Slalom, we’ve built an accelerator — which we define as a reusable asset that combines code, tools, and processes — to help our customers quickly set up this type of testbed. “Customers can upload their documents, including ones that aren’t easily used, and then swap between different large language models to interact with those documents in a chat-based UI,” says Stephen Nichols, director of cloud solutions at Slalom and lead creator of the accelerator. “The first thing our customers want to do is see if an LLM can make heads or tails out of these documents.”

What helps keep these documents secure is that the testbed accelerator is deployed in the customer’s own Amazon Web Services (AWS) environment. “You can see where all the buckets are, you can see the permissions on every single file, nothing’s leaving your environment,” says Nichols. Internal deployment lowers the risk of employees mistakenly sharing company data with the GenAI technology provider, a risk that is greater with subscription-based, externally hosted GenAI applications.

We used the testbed accelerator to help set up a GenAI platform at a major airline. The platform allows teams across the organization to experiment with different models and determine which ones work best for their unique use cases. The airline’s now moving to production with a selection of promising use cases aimed at improving customer communications.

“The key element here is that we were able to validate and confirm the value of generative AI and then develop different ways to unlock that value across the enterprise,” says Papa Ndir, a director at Slalom who helped lead our work at the airline.

The architecture for Slalom’s GenAI testbed accelerator on AWS.

Step #3: Good techniques

Just as there are many different large language models — known more broadly as foundation models (FMs) — that can and do work well for GenAI use cases, there are many different techniques to explore. While the techniques you should focus on depend on the use cases you want to pursue, there are a few standouts.

Retrieval-augmented generation (RAG)

RAG is the process of optimizing the output of an LLM or an FM by having it reference a knowledge base outside of its training data — such as a customer data platform or a company website — before generating a response. With RAG, you essentially customize the model’s response without customizing the model itself, which is an important distinction considering the high resource and computing demands of customizing models through fine-tuning.

Our GenAI testbed accelerator works because of RAG. When customers upload their documents to the testbed, they’re uploading them to a RAG architecture, thereby equipping all the models they’re testing out with the same information to make responses more accurate and relevant.
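To make the retrieve-then-generate flow concrete, here is a minimal sketch of RAG in plain Python. The bag-of-words "embedding" and cosine ranking are toy stand-ins for a real embedding model and vector store, and the final model call is left as a comment; none of this is the accelerator's actual code.

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the question; keep the top k.
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    # Augment the prompt with retrieved context; the model itself is unchanged.
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The cafeteria on the third floor opens at 8 a.m.",
]
prompt = build_prompt("What is your refund policy for returns?", docs)
# In production, `prompt` would be sent to an LLM (for example, via Amazon
# Bedrock) rather than printed.
print(prompt)
```

The key point the sketch illustrates: the model's weights are never touched — only the prompt changes, which is why RAG is so much cheaper than fine-tuning.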

We use RAG architecture in more than just our testbed accelerator. A major use case for RAG is when you want to create specialized chatbots or personalized search capabilities. For this reason, we’ve used RAG architecture to build a GenAI accelerator for chat-based search, which you can learn more about in our post about chat-based search on the AWS Partner Network (APN) Blog.

In the words of Jack McCush, a director at Slalom and the author of the blog post: “In essence, RAG architecture enables users to engage with search systems conversationally. It empowers users to pose questions, seek clarification, and obtain valuable information effortlessly, mirroring the experience of interacting with an expert.”

The RAG-based architecture for Slalom’s GenAI accelerator for chat-based search, originally published on the APN Blog.

Image description

Image description is just one example of the multimodality that is now possible with GenAI. Generating text descriptions of images with AI can be useful across a variety of industries, functions, and groups of people (for example, people with visual impairments). In terms of industry and function, one of the ripest business opportunities for image description is in marketing and advertising. Knowing this, we’ve built a GenAI accelerator for online marketing automation, including automating the creation of product descriptions using an image-to-text model on Amazon SageMaker. But the automation doesn’t stop there.

“After it generates the basic description, it will automatically generate marketing content for different channels like search or social media using an LLM that is hosted on Amazon Bedrock,” says Brandon Ryan, a senior engineer at Slalom and the creator of the accelerator. “Then this information will be passed along to the frontend, which is being hosted on Amazon S3. The calls between the backend and the frontend are being made with an API that’s hosted on Amazon API Gateway.”

The accelerator also gives you the opportunity to add information about your brand, such as your messaging voice and tone, as instruction for personalizing the marketing content.
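The handoff Ryan describes — image description in, channel-specific copy out, steered by brand guidance — can be sketched as a prompt-assembly step like the one below. The function name, channel rules, and brand-voice wording are illustrative, not the accelerator's actual code.

```python
def build_marketing_prompt(description: str, channel: str, brand_voice: str) -> str:
    """Assemble an LLM prompt that turns a generated image description
    into channel-specific marketing copy, steered by brand guidelines."""
    channel_rules = {
        # Illustrative per-channel instructions; a real pipeline would
        # tune these for each platform.
        "search": "Write a concise, keyword-rich ad headline and description.",
        "social": "Write an engaging post with a call to action, under 280 characters.",
    }
    return (
        f"You are a marketing copywriter. Brand voice: {brand_voice}\n"
        f"Product description: {description}\n"
        f"Task: {channel_rules[channel]}"
    )

description = "A lightweight waterproof hiking jacket in forest green."
prompt = build_marketing_prompt(description, "social", "friendly and adventurous")
# In the accelerator's flow, a prompt like this would be sent to an LLM
# hosted on Amazon Bedrock; here we only inspect the assembled text.
print(prompt)
```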

A look at Slalom’s online marketing automation accelerator, which leverages image description.

Step #2: Smart storage & compute

Lest we forget: a massive amount of data puts the “large” in large language model. Much attention is paid to the computational power required to use all that data to train an LLM or other FM. But compute is not something that only organizations building models need to think about. Even if you’re augmenting an existing FM with techniques like RAG, compute should still be a consideration. The same goes for storage.

Storage: Vector databases & search

Think about the data your organization might combine with an FM to create a solution or service that your customers will love. Now think of that data as fuel (to use a somewhat contentious metaphor). Vector databases have been called the engines of the AI era, capable of turning all that fuel into a smooth-running machine that generates accurate, contextual information and content. They aren’t the only storage option for GenAI, but they stand out among storage options at least as much as RAG stands out among techniques, and the two pair naturally: vector databases are purpose-built for RAG’s retrieval step.

So, do you need a vector database for GenAI? There’s some debate about whether you can only reap the benefits of vector databases by creating new, purpose-built databases, or whether you can achieve comparable results just by layering or embedding vector search capabilities into an existing database.

Does AWS offer a vector database? Short answer: yes! In 2021, AWS launched support for OpenSearch, the open-source search and analytics suite used for purposes like website search. AWS has called Amazon OpenSearch Service a “vector datastore,” stopping short of “database.” Terminology aside, we’ve enjoyed using OpenSearch to enable vector search for use cases such as website chatbots.
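As a rough illustration of what vector search looks like in practice, here is the shape of a k-NN query against an OpenSearch index with a `knn_vector` field. The field name (`doc_embedding`) and the embedding values are made up; a real application would generate the query vector with an embedding model.

```python
def build_knn_query(query_vector: list[float], k: int = 3) -> dict:
    # Shape of a k-nearest-neighbors search body for OpenSearch's k-NN
    # capability. "doc_embedding" is an illustrative field name.
    return {
        "size": k,
        "query": {
            "knn": {
                "doc_embedding": {
                    "vector": query_vector,
                    "k": k,
                }
            }
        },
    }

query = build_knn_query([0.12, -0.45, 0.89], k=2)
# With the opensearch-py client, a body like this would be passed to
# client.search(index="my-index", body=query).
print(query["size"])
```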

Compute: AI hardware

Must we talk about hardware? Known by such names as silicon, chips, processors, GPUs, and accelerators, AI hardware is a tangled web of loose synonyms, rising competition, and long supply chains. But it would be a shame to your water cooler game and to your wallet if you didn’t know what’s up with this stuff.

You might have heard: “Chips” and “GPUs” appear frequently in AI news headlines. So does NVIDIA, a leader in the 19-company industry that designs and produces GenAI-friendly GPUs. Slalom joined the NVIDIA Partner Network last year, and cloud providers including AWS collaborate with NVIDIA to offer GPU-based technology solutions.

You might not know: GPUs aren’t the only computing solution for GenAI. For example, AWS designs two chips, or accelerators (not our kind of accelerators, though!), specifically for AI/ML applications: AWS Trainium for training models, and AWS Inferentia for inference. Inference is the underappreciated process after training an AI model where the model applies what it learned during training to respond to new inputs. Inference isn’t just for GenAI; it’s for ML overall. For example, we helped Finch Computing use Amazon EC2 Inf1 Instances to release its product in four new languages while reducing its inference costs by more than 80%.

“Given the cost of GPUs, we simply couldn’t have offered our customers additional languages while keeping our product profitable,” says Finch’s founder and chief technology officer, Scott Lightner, in an AWS case study. “Amazon EC2 Inf1 Instances changed that equation for us.”

Why care about hardware? That’s why.

Step #1: Solid strategy

Testbeds will only ever be as good as the ideas you tuck under the covers. While the purpose of this article is to illustrate — literally — the technology infrastructure, architecture, and techniques that help organizations go from GenAI ideas to GenAI prototypes and products, learning how to come up with good ideas and evaluate them before you even do a proof of concept (POC) is arguably the most important part of succeeding with GenAI. You don’t need technology to do that; you need strategy. Business alignment. A way to calculate ROI. Employees who understand how the technology works — or at least how to use it to create value.

And even in the realm of technology, there are more steps to GenAI success than choosing smart storage and compute options, using good techniques, and providing internal platforms for experimentation — especially if your data is disorganized. But we predict that these steps (with a solid AI strategy!) will stay relevant as GenAI continues to evolve. So, come what may, we invite you to come back to this article whenever you’re searching for some direction. Good luck out there!

PS: AI can help us humans do more of what we do best: care. Care to know more about safely implementing GenAI as an AWS user? We have a collection of insights on our website — check it out.

Slalom is a next-generation professional services company creating value at the intersection of business, technology, and humanity. Learn more and reach out today.
