AWS NLP Conference 2023: Key Takeaways

Konrad Bachusz
Credera Engineering
6 min read · Oct 10, 2023

Introduction

I recently attended the AWS NLP Conference 2023 in AWS’s London offices. The event is tailored for data specialists and technology leaders, with global leaders in Large Language Models (LLMs) and Natural Language Processing (NLP) in attendance. In this blog, I’m sharing some of my key takeaways from the event.

The majority of the talks, demos, and workshops focused on LLMs and Generative AI. Given the popularity of LLMs and tools like ChatGPT, GitHub Copilot, and Midjourney, Generative AI is currently considered to be at the Peak of Inflated Expectations stage of the Gartner Hype Cycle for Emerging Technologies.

Peak of Inflated Expectations

To find out more about common use cases and challenges of Generative AI in NLP, please refer to my other blog post:

For now, let’s dive into some of the key takeaways from the AWS NLP Conference 2023.

Release of Falcon 180B language model

On the 6th of September 2023, a new open-source language model was released by the Technology Innovation Institute (TII). Falcon 180B is a powerful language model with 180 billion parameters, trained on 3.5 trillion tokens using a staggering 4,000 GPUs! It’s currently at the top of the Hugging Face leaderboard for pre-trained open Large Language Models and is available for both research and commercial use.

The model performs exceptionally well in a number of tasks such as reasoning, coding, proficiency, and knowledge tests, even beating competitors like Meta’s LLaMA 2. Notably, the model does not use reinforcement learning from human feedback (RLHF).

Amongst closed-source models, it ranks just behind OpenAI’s GPT-4 and performs on par with Google’s PaLM 2 Large, which powers Bard, despite being half that model’s size.

Falcon 180B is accessible to developers through a royalty-free licence based on Apache 2.0. The licence includes restrictions on illegal or harmful use and requires those intending to provide hosted access to the model to seek additional consent from TII.

The model is free of charge to download, use, and integrate into applications and end-user products. You can access it here.
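As an illustration of how accessible the model is, below is a minimal sketch of running it with the Hugging Face transformers library. It assumes you have accepted the licence terms for the gated `tiiuae/falcon-180B` repository on the Hugging Face Hub and have enough accelerator memory to host a 180-billion-parameter model (roughly 400 GB for bf16 inference), which in practice means a multi-GPU server.

```python
# Minimal sketch: generating text with Falcon 180B via Hugging Face transformers.
# Assumes the gated repo licence has been accepted (huggingface-cli login) and
# that sufficient accelerator memory is available (~400 GB for bf16 inference).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory versus fp32
    device_map="auto",           # shards the model across available GPUs
)

inputs = tokenizer("Summarise the key ideas behind NLP:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```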

Given the rapid pace of development in the NLP space, Falcon 180B may well have been overtaken by an even more powerful model by the time you read this.

Model fine-tuning with JumpStart

AWS offers a solution known as SageMaker JumpStart that provides pre-trained, open-source models for a wide range of problem types to help you get started with machine learning. You can incrementally train and tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, as well as executable example notebooks for machine learning with SageMaker.

A relatively new addition to the JumpStart family is Foundation Models.

Foundation models are large-scale machine learning (ML) models that contain billions of parameters and are pre-trained on vast datasets of text and images. They can perform a wide range of tasks, such as article summarisation and text, image, or video generation. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customisation for your use case. JumpStart supports popular publicly available foundation models such as the Llama 2 family, Falcon 180B, and Stable Diffusion. According to AWS, JumpStart offers over 300 models.

List of models available in SageMaker JumpStart

You can try out popular pre-trained foundation models without needing to deploy them yourself. To get started with the latest models that are in preview, or to try out models in a playground, you need to request access; you’ll receive an email once it’s ready. Once you have preview access, choose Foundation models under JumpStart in the Amazon SageMaker console. The playground lets you experiment with model capabilities directly from the AWS console.

AWS Console interface to send prompts to a model in the playground
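Beyond the console, JumpStart models can also be deployed programmatically with the SageMaker Python SDK. The snippet below is a minimal sketch: it assumes the `sagemaker` package is installed, AWS credentials and a SageMaker execution role are configured, and your account has quota for the underlying GPU instance; the model ID shown (a smaller Falcon variant) is an illustrative example.

```python
# Minimal sketch: deploying a JumpStart foundation model with the SageMaker
# Python SDK. Assumes AWS credentials, a SageMaker execution role, and quota
# for the required instance type; the model ID below is an illustrative example.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-falcon-7b-bf16")

# Spins up a real-time inference endpoint (this incurs AWS charges).
predictor = model.deploy()

response = predictor.predict({"inputs": "What is natural language processing?"})
print(response)

# Tear the endpoint down when finished to stop billing.
predictor.delete_endpoint()
```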

More information on how to try the model in Playground mode, fine-tune it, or deploy it to production can be found in the links below:

Introducing Amazon Bedrock

Amazon Bedrock is a fully managed, serverless service that makes foundation models available through a single API, so you can choose from various foundation models to find the one best suited to your use case. It was announced in preview in April this year; at the time of the conference access was limited to a preview release, but it has since become generally available (September 2023).

With the Amazon Bedrock serverless experience, you can quickly get started, easily experiment with foundation models (FMs), privately customise them with your own data, and integrate and deploy them into your applications using familiar AWS tools and capabilities.
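To give a flavour of what that API looks like, the sketch below calls a model through the Bedrock runtime using boto3. It assumes a recent boto3 version, configured credentials, and that your account has been granted access to the chosen model; Anthropic’s Claude v2 model ID is used as an illustrative example, and the request body format differs per model provider.

```python
# Minimal sketch: invoking a foundation model through Amazon Bedrock with boto3.
# Assumes recent boto3, configured credentials, and model access granted in the
# Bedrock console; the model ID and body format are provider-specific.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "prompt": "\n\nHuman: Give me three use cases for NLP in retail.\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

response = client.invoke_model(
    modelId="anthropic.claude-v2",  # illustrative example
    body=body,
)

print(json.loads(response["body"].read())["completion"])
```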

Compared with Amazon Bedrock, the previously mentioned SageMaker JumpStart supports custom model training and utilises user-provided data, making it suitable for those who need to build and tune their own models. Amazon Bedrock simplifies the process by leveraging pre-trained models behind a managed API, eliminating the need for custom training.

More information about Bedrock can be found here:

New AWS purpose-built accelerators for deep learning

Training, fine-tuning, or making inferences from LLMs can be time-consuming and expensive. Fortunately, AWS released a set of new compute accelerators to help with some of those challenges.

AWS Trainium

AWS Trainium is the second-generation machine learning (ML) accelerator that AWS purpose-built for deep learning training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (EC2) Trn1 instance deploys up to 16 AWS Trainium accelerators. Trainium-based EC2 Trn1 instances provide faster time to train whilst offering up to 50% cost-to-train savings over comparable Amazon EC2 instances. Trainium has been optimised for training NLP, computer vision, and recommender models.

AWS Inferentia

AWS Inferentia accelerators are designed by AWS to deliver high performance at the lowest cost for your deep learning (DL) inference applications. The first-generation AWS Inferentia accelerator powers Amazon EC2 Inf1 instances, which deliver up to 2.3x higher throughput and up to 70% lower cost per inference than comparable Amazon EC2 instances.

AWS Inferentia2

Amazon EC2 Inf2 instances are purpose-built for DL inference. They deliver high performance at the lowest cost in Amazon EC2 for generative artificial intelligence (AI) models, including large language models (LLMs) and vision transformers. Inf2 instances raise the performance of Inf1 by delivering 3x higher compute performance, 4x larger total accelerator memory, up to 4x higher throughput, and up to 10x lower latency. A sketch of compiling a model for these accelerators is shown below.
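To give a flavour of how these accelerators are used, the sketch below compiles a PyTorch model for Inferentia2/Trainium with the AWS Neuron SDK’s torch-neuronx package. It must be run on an Inf2 or Trn1 instance with the Neuron SDK installed; the model and example input are illustrative choices, not the only ones supported.

```python
# Minimal sketch: compiling a PyTorch model for AWS Inferentia2 / Trainium
# with torch-neuronx. Run on an Inf2/Trn1 instance with the AWS Neuron SDK
# installed; the model and example input below are illustrative.
import torch_neuronx
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torchscript=True)
model.eval()

# Trace/compile the model ahead of time for the Neuron cores.
example = tokenizer("I loved this conference!", return_tensors="pt")
neuron_model = torch_neuronx.trace(model, (example["input_ids"], example["attention_mask"]))

# Inference now runs on the Neuron accelerator.
logits = neuron_model(example["input_ids"], example["attention_mask"])
print(logits)
```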

Conclusion

Although the field is developing rapidly, we are already seeing the emergence of useful tools, available via open source or the large cloud vendors, that help organisations implement their AI solutions. I would highly recommend the AWS NLP Conference to anyone interested in AI, data, and NLP.

If you are considering starting a new Generative AI initiative but find it hard to make sense of the ever-changing AI landscape, we at Credera have vast experience in implementing AI solutions for our clients and we’ll be more than happy to help you. You can find more information on our website: Credera AI

Got a question?

Please get in touch to speak to a member of our team.
