How To: Deploy GPT2 NLG with Flask on AWS ElasticBeanstalk

Paul Watson
Mar 16, 2020 · 5 min read

At CaliberAI we use GPT2 natural language generation to aid our data annotation. We had been running a Python Flask app using Hugging Face’s Transformers on DigitalOcean. That setup was fairly simple: start a server, configure Ubuntu and Apache, deploy the Python Flask code, and leave it alone. If it stopped working I would just restart the server or SSH in and look at the logs. That is no way to run a service for the long term, and as we scaled up our data annotation it became clear we needed a more resilient setup. We could have kept using DigitalOcean, but around the same time we received some Amazon Web Services credits and realised Elastic Beanstalk could provide all of the production goodness we needed.

Now, if you read the AWS Elastic Beanstalk guides you’ll think this is a very simple task and that it should all just work. However, we encountered quite a learning curve in getting from a hello-world Python Flask app to a Python Flask app running a GPT2 machine learning model. In the end the setup is not complicated, and it taught us some important Elastic Beanstalk concepts. You can see the working code here.

eb CLI

You can use the AWS console to create, configure, and deploy your code, but as coders we found that using the Elastic Beanstalk command line interface fit our development practices better. On macOS you can install it with pip and then configure it with your AWS credentials.
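As a sketch of the workflow (the environment name below is an illustrative placeholder, not from our repository):

```shell
# Install the EB CLI and verify it is on your PATH
pip install awsebcli
eb --version

# One-time setup in the project root: choose a region, application
# name, and platform, and store your AWS credentials
eb init

# Create an environment (name is a placeholder) and deploy the
# current committed code to it
eb create gpt2-flask-env
eb deploy
```

After the first `eb create`, subsequent changes only need `eb deploy`.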


An incomplete example of env.yaml

This file, in the root of your project, contains high-level configuration instructions. We highly recommend you do all your configuration and setup through files checked into your source repository (e.g. git). Many settings, like instance size, default paths, and logging, can be changed via the AWS Elastic Beanstalk console, but then you’ll have to remember to redo them each time you create a new environment (dev, staging, production, etc.). You also do not want to SSH into a running instance to change anything, because any new instances will then not have that change.
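As a hedged sketch of the file’s shape (the stack name and instance type here are illustrative placeholders; check option names against the Elastic Beanstalk documentation):

```yaml
# env.yaml in the project root (sketch; values are illustrative)
AWSConfigurationTemplateVersion: 1.1.0.0
SolutionStack: <full Solution Stack Name for a Python platform>
OptionSettings:
  aws:autoscaling:launchconfiguration:
    InstanceType: t3.large
```

Each block under OptionSettings is an Elastic Beanstalk configuration namespace; the individual settings below each live in one of these namespaces.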

Python supported platform(s)

The SolutionStack setting is an example of how Elastic Beanstalk can be inconsistent and verbose. For our needs, a value that gives us Python 3.6 in a Linux OS works. You can see a list of supported platforms here, but make sure to always use the full Solution Stack Name rather than a shortened platform name.
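If you would rather not hunt through the docs, the AWS CLI can list the exact names for you:

```shell
# Print every full Solution Stack Name your account can use;
# grep narrows it to the Python stacks
aws elasticbeanstalk list-available-solution-stacks | grep -i python
```

Copy the name verbatim into env.yaml, punctuation and all.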

top showing memory usage

The WSGI process and thread counts are set to 1 each, as they default to 3. Loading GPT2 models eats memory and in our case we don’t need concurrency. Elastic Beanstalk puts a load balancer in front of your instances, so you can also scale out rather than up.
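In env.yaml terms, this is a sketch of the relevant Python-platform namespace (option names per the Elastic Beanstalk Python platform docs):

```yaml
OptionSettings:
  aws:elasticbeanstalk:container:python:
    NumProcesses: 1
    NumThreads: 1
```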

AWS CloudWatch showing a Python error log

Streaming logs directs any default logs from your instances and from the Elastic Beanstalk system to CloudWatch. This was especially useful while troubleshooting our setup, and generally you want to be able to see your application logs somewhere. We didn’t find the logs shown in the Elastic Beanstalk console particularly up to date or useful. In our case we set the log retention to 1 day, but you may want it to be at least 3 to cover weekends.
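A sketch of the CloudWatch log settings (namespace and option names per the Elastic Beanstalk docs; the retention value is ours):

```yaml
OptionSettings:
  aws:elasticbeanstalk:cloudwatch:logs:
    StreamLogs: true
    DeleteOnTerminate: false
    RetentionInDays: 1
```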

AWS Elastic Beanstalk showing instance health

Enhanced health reporting is very useful if you want to see the status of individual instances and not just an overall green, yellow, or red health indicator.
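In configuration terms this is a single option; a sketch per the Elastic Beanstalk health reporting docs:

```yaml
OptionSettings:
  aws:elasticbeanstalk:healthreporting:system:
    SystemType: enhanced
```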

AWS EC2 T3 instance types

The instance type is what you think it is: the type of EC2 instance you want to launch. AWS has many different instance types with different CPU architectures, memory and disk sizes, network speeds, etc. To run GPT2 we found a T3 instance with 8 gibibytes of memory worked fine. 4 gibibytes of memory is too little, while 16 gibibytes would allow for a larger GPT2 model or for concurrency. You could try out the P3 GPU instance types, but they are significantly more expensive and we’d need to alter the code to use torch’s CUDA support.
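A sketch of where this lives in env.yaml (the instance type shown is the 8 GiB T3 size, as an illustrative value):

```yaml
OptionSettings:
  aws:autoscaling:launchconfiguration:
    InstanceType: t3.large
```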


The .ebextensions directory holds files that can do high-level instance configuration as well as lower-level instance deployment commands. You can rewrite most of env.yaml in a config file, but we preferred the separation of instance commands and setup from configuration.

The config files use the files directive to create a file on every instance, in the specified location, with the permissions and even the contents that you want. In this case we use it to set a WSGI configuration value. We are not WSGI experts, so we don’t fully understand why a Flask app requires this, but it does.
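As a sketch of the files directive syntax (the path and the WSGI directive here are placeholders we chose for illustration, not necessarily the exact ones from our repository):

```yaml
# .ebextensions/01-files.config (sketch; path and contents illustrative)
files:
  "/etc/httpd/conf.d/wsgi_custom.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      WSGIApplicationGroup %{GLOBAL}
```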

If there isn’t a setting for what you need, then container commands are available. These run on every instance on every deploy and can do almost anything you need. Ours ensure the application directory exists and has the right permissions and ownership for Flask, and they also run a script which downloads the Hugging Face GPT2 model files into a sub-directory of the app. You’ll see both a staging app directory and a current app directory when working with Elastic Beanstalk. You generally want to work in the former, as it is a staging directory during deployment which is then moved to the latter on success.
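A sketch of the container_commands syntax (the directory, ownership, and script name are illustrative placeholders; commands run in the staging app directory during deployment):

```yaml
# .ebextensions/02-commands.config (sketch; names illustrative)
container_commands:
  01_model_dir:
    # Ensure the model cache directory exists with permissions the
    # WSGI user can use
    command: "mkdir -p models && chown -R wsgi:wsgi models"
  02_download_model:
    # Fetch the Hugging Face GPT2 model files into the directory above
    command: "python download_model.py"
```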


By default, Elastic Beanstalk looks for an application.py file and an exported application callable to run. Originally our code used a different entry point and had its own WSGI server for production serving of Flask, but Elastic Beanstalk already runs its own WSGI server and it works fine. You can configure Elastic Beanstalk to look elsewhere via the WSGIPath option, but we changed to the default conventions.
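A minimal sketch of what that convention looks like (the /generate route and its response shape are illustrative, not our exact API):

```python
# application.py — Elastic Beanstalk's Python platform looks for this
# module and for a WSGI callable named "application" exported from it.
from flask import Flask, jsonify, request

application = Flask(__name__)


@application.route("/")
def health():
    # A simple root route so load-balancer health checks get a 200.
    return jsonify(status="ok")


@application.route("/generate", methods=["POST"])
def generate():
    # Placeholder for the GPT2 call; the real app would run the
    # Hugging Face model loaded at startup instead of echoing.
    prompt = request.get_json(force=True).get("prompt", "")
    return jsonify(prompt=prompt, text=prompt + " ...")


if __name__ == "__main__":
    # Local development only; Elastic Beanstalk runs its own WSGI server.
    application.run(debug=True)
```

Note that the module exports `application` rather than the more common Flask name `app` — that naming is what lets Elastic Beanstalk find it without extra configuration.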

And that’s it. If you clone our code and run the instructions with your AWS Elastic Beanstalk account, you should get a running HTTP API that generates text using GPT2.

OpenAPI/Swagger UI showing GPT2 text generation example

The Startup


Paul Watson

Written by

Web-developer // Remote working CTO for CaliberAI // Formerly [@Kinzen, ChangeX, Storyful, FeedHenry] // Learning to code since 1993 // South Africa // EOF
