At CaliberAI we use GPT2 natural language generation to aid our data annotation. We have been running a Python Flask app using Hugging Face’s Transformers on DigitalOcean. That setup was fairly simple: start a server, configure Ubuntu and Apache, deploy the Python Flask code, and leave it alone. If it stopped working I would just restart the server or SSH in and look at the logs. This is no way to run a service for the long term, and as we scaled up our data annotation it became clear we needed a more resilient setup. We could have kept on using DigitalOcean, but around the same time we received some Amazon Web Services credits and realised Elastic Beanstalk could provide all of the production goodness we needed.
Now if you read the AWS Elastic Beanstalk guides you’ll think this is a very simple task and it should all just work. However, we encountered quite a learning curve in getting from a hello-world Python Flask app to one running a GPT2 machine learning model. In the end the setup is not complicated, and it taught us some important Elastic Beanstalk concepts. You can see the working code here:
Demo of deploying a Python Flask Connexion OpenAPI that generates natural language texts using Huggingface's GPT-2…
You can use the AWS console to create, configure, and deploy your code, but as coders we found the Elastic Beanstalk command line interface (EB CLI) a better fit for our development practices. On macOS you can install it with brew and then configure it with your AWS credentials.
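For example, a minimal sketch (the Homebrew formula name is an assumption; pip install awsebcli also works):

```sh
# Install the EB CLI, then initialise it inside your project.
brew install awsebcli
# eb init prompts for a region, your AWS credentials, and the application name.
eb init
```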
The env.yaml file, in the root of your project, contains high-level configuration instructions. We highly recommend you do all your configuration and setup through files checked into your source repository (e.g. git). Many settings like instance size, default paths, logging, etc. can be changed via the AWS Elastic Beanstalk console, but then you’ll have to remember to do them each time you want a new environment (dev, staging, production, etc.). You also do not want to SSH into a running instance to change anything, because any new instances will then not have that change.
SolutionStack is an example of how Elastic Beanstalk can be inconsistent and verbose. For our needs the exact value 64bit Amazon Linux 2018.03 v2.9.6 running Python 3.6 works and gives us Python 3.6 in a Linux OS. You can see a list of supported platforms here, but make sure to always use the full Solution Stack Name, e.g. 64bit Amazon Linux v2.14.2 running GlassFish 5.0 Java 8 (Preconfigured - Docker) and not Glassfish 5.0 (Docker) version 2.14.2.
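If you have the AWS CLI installed, one reliable way to get the exact names is:

```sh
# Lists every full Solution Stack Name your account can use.
aws elasticbeanstalk list-available-solution-stacks
```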
NumProcesses and NumThreads are set to 1 each as they default to 3. Loading GPT2 models eats memory, and in our case we don’t need concurrency. Elastic Beanstalk puts a load balancer in front of your instances, so you can also scale out rather than up.
StreamLogs directs any default logs from your instances and from the Elastic Beanstalk system to CloudWatch. This was especially useful while troubleshooting our setup, and generally you want to be able to see your application logs somewhere. We didn’t find eb logs --stream particularly up to date or useful. In our case we set RetentionInDays to 1, but you may want it to be at least 3 to cover weekends.
SystemType: enhanced for healthreporting is very useful if you want to see the status of individual instances and not just an overall green, yellow, or red health indicator.
InstanceType is what you think it is: the type of instance you want to launch. AWS has many different instance types with different CPU architectures, memory and disk sizes, network speeds, etc. To run GPT2 we found t3.large worked fine with 8 gibibytes of memory. t3.medium has 4 gibibytes of memory, which is too little, while t3.xlarge has 16 gibibytes and would allow for a larger GPT2 model or concurrency. You could try out the P3 GPU instance types, but they are significantly more expensive and we’d need to alter the code to use torch cuda.
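Putting those settings together, a minimal env.yaml might look like the sketch below. The environment name and the NumProcesses value are our assumptions; the option namespaces are the standard Elastic Beanstalk ones.

```yaml
# env.yaml — a sketch of the settings discussed above.
EnvironmentName: gpt2-annotation-api  # hypothetical name
SolutionStack: 64bit Amazon Linux 2018.03 v2.9.6 running Python 3.6
OptionSettings:
  aws:elasticbeanstalk:container:python:
    NumProcesses: '1'  # assumption: set alongside NumThreads
    NumThreads: '1'    # GPT2 eats memory and we don't need concurrency
  aws:elasticbeanstalk:cloudwatch:logs:
    StreamLogs: true
    RetentionInDays: 1  # consider at least 3 to cover weekends
  aws:elasticbeanstalk:healthreporting:system:
    SystemType: enhanced  # per-instance health, not just green/yellow/red
  aws:autoscaling:launchconfiguration:
    InstanceType: t3.large  # 8 GiB is enough for the base GPT2 model
```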
The .ebextensions directory contains .config files that can do high-level instance configuration as well as lower-level instance deployment commands. You can rewrite most of env.yaml in a .ebextensions config file, but we preferred to keep instance commands and setup separate from the configuration.
wsgi.config uses the files directive to create a file on every instance, in the specified location, with the permissions and even the contents that you want. In this case we are setting WSGIApplicationGroup to GLOBAL. We are not WSGI experts, so we don’t fully understand why a Flask app requires this, but it does.
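A sketch of that config, using the Apache snippet commonly seen on the Amazon Linux Python platform (the target path is an assumption):

```yaml
# .ebextensions/wsgi.config — a sketch
files:
  "/etc/httpd/conf.d/wsgi_custom.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      WSGIApplicationGroup %{GLOBAL}
```

For what it’s worth, WSGIApplicationGroup %{GLOBAL} tells mod_wsgi to run the app in the main Python interpreter rather than a sub-interpreter, and C extensions such as PyTorch are known to misbehave in sub-interpreters, which is likely why the app requires it.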
If there isn’t a setting for what you need then container_commands are available. These run on every instance on every deploy and can do almost anything you need. Ours make sure /home/wsgi exists and has the right permissions and ownership for Flask, and they also run download_model.sh, which uses wget to download the Hugging Face GPT2 model files into a sub-directory of /opt/python/ondeck/app. You’ll see both /opt/python/ondeck/app and /opt/python/current/app when working with Elastic Beanstalk. You generally want to work in the former, as it is a staging directory during deployment which is then moved to the latter on success.
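A sketch of both pieces; the command names, the script location, and the Hugging Face download URLs are our assumptions:

```yaml
# .ebextensions/commands.config — a sketch
container_commands:
  01_wsgi_home:
    # mod_wsgi needs a home directory with the right ownership.
    command: "mkdir -p /home/wsgi && chown wsgi:wsgi /home/wsgi"
  02_download_model:
    # container_commands run from the staging directory
    # (/opt/python/ondeck/app) by default.
    command: "bash download_model.sh"
```

```bash
#!/usr/bin/env bash
# download_model.sh — a sketch; the file list and URLs are assumptions.
set -euo pipefail
MODEL_DIR="models/gpt2"
mkdir -p "$MODEL_DIR"
for f in config.json vocab.json merges.txt pytorch_model.bin; do
  # -nc skips files that have already been downloaded.
  wget -nc -P "$MODEL_DIR" "https://huggingface.co/gpt2/resolve/main/$f"
done
```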
By default Elastic Beanstalk looks for application.py and an exported application callable to run. Originally our code used gunicorn and had a wsgi.py for production serving of Flask, but Elastic Beanstalk already runs its own WSGI server and it works fine. You can configure Elastic Beanstalk to use something else, like app, but we changed to the default conventions.
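A minimal sketch of that convention, using plain Flask rather than our Connexion setup; the /generate endpoint and its parameters are illustrative assumptions:

```python
# application.py — a sketch, not the project's actual Connexion code.
from flask import Flask, jsonify, request
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Elastic Beanstalk's WSGI server looks for this exact callable name.
application = Flask(__name__)

# Load once at startup; this is what eats the instance's memory.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

@application.route("/generate")
def generate():
    prompt = request.args.get("prompt", "Hello")
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_length=50, do_sample=True)
    return jsonify(text=tokenizer.decode(output[0], skip_special_tokens=True))

if __name__ == "__main__":
    application.run()
```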
And that’s it. If you clone our code and run the README instructions with your AWS Elastic Beanstalk account, you should get a running HTTP API that generates text using GPT2.
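Roughly, the deploy boils down to something like this (the environment name and URL are hypothetical):

```sh
eb init                  # attach the project to your AWS account
eb create gpt2-api-dev   # builds the environment from env.yaml and .ebextensions
curl "http://<your-environment-url>/generate?prompt=The weather today"
```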