Deploying an Anime Character Generator using AWS Spot Fleet

Jesse Fredrickson · Published in Analytics Vidhya · Jan 22, 2021 · 12 min read
Image by 🎄Merry Christmas 🎄 from Pixabay

Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should

Abstract

In this post, I describe the process of building and deploying a Python-based Discord bot which connects to a Discord server I own and responds to user requests with custom AI-generated images of anime characters. On the back end, I’m using a StyleGAN model implemented in ONNX, deployed on an AWS Spot Fleet (my full cloud infrastructure uses EC2, S3, and Secrets Manager). I’ll walk the reader through a high-level discussion of the model and its implementation, the code I used to serve model inferences on Discord, and finally the cloud infrastructure I set up for a sustainable deployment. All of my code is available at https://github.com/jfreds91/waifu_bot

Motivation

This project started when I came across This Waifu Does Not Exist (TWDNE), a passion project by Gwern Branwen which showcases two contemporary machine learning models being used to generate a) unique images of anime characters (StyleGAN) and b) an accompanying text description for each (GPT-3). The images below are examples, and I should emphasize that these are completely original works of art generated by a neural network. Though in some cases they may bear a resemblance to existing characters, they are unique — the neural network is drawing their pose, expression, hair and eye color, background, and art style all on its own, based on an input of random noise. I found this capability incredibly impressive, and thought my friends would enjoy the ability to use this model to generate their own artwork (who wouldn’t), so I set about co-opting the underlying model and figuring out how to deploy it myself.

This Waifu Does Not Exist example images. For the uninitiated, a “waifu” (WAIF-oo) is “…a fictional character, usually in anime or related media, that someone has great, and sometimes romantic, affection for”.

Model

General GAN architecture as designed by Ian Goodfellow, 2014. A general GAN discussion is beyond the scope of this post.

A bit of machine learning context: in December 2018, NVIDIA researchers published a paper on their new neural network architecture, StyleGAN. GANs, or Generative Adversarial Networks, seek to generate high-quality outputs (in this case, images) from random input, and StyleGAN was a technological leap forward which could learn to disentangle abstract target features (like age, eye color, and background color) in order to create extremely convincing images. When the model architecture was open-sourced in 2019, researchers from around the world were able to retrain StyleGAN to suit their own needs, and in the case of TWDNE, Gwern trained it on a custom curated dataset of over 3 million images of anime characters. Gwern has written extensively about his process on his blog, here. He trained StyleGAN for more than 2 weeks on 4 high-end consumer-grade GPUs, and he and his collaborators exposed TensorFlow, PyTorch, and ONNX versions of the finished model. In order to use it, I simply need to download the version I’m interested in and then write supporting code to invoke the model for inference.

ONNX adds a layer of abstraction which allows me to serve any trained model on any device, using a common runtime.

Aside — I opted to use the ONNX model. It happens to be the most compact of the three (118 MB as opposed to the 320 MB TensorFlow model), but I chose it explicitly to try out the ONNX runtime. ONNX is a common model interoperability platform introduced by Facebook and Microsoft in 2017, which allows users to run framework-agnostic models by converting them to its own common format. In my case it means I don’t need to worry about any of the libraries associated with training the model, because I’m using a common runtime. If someone comes along and trains a new model version using a different library, as long as they expose an ONNX version of the model, it’s just plug-and-play for me.

Note to the reader: per Gwern’s blog, this implementation of StyleGAN is actually trained on both male and female subjects — and in fact, during my experimentation I found both traditionally female and traditionally male outputs. Gwern has also exposed transfer-learned versions trained on all-male and all-female datasets, but I opted to work with the unisex version.

The anime face StyleGAN does in fact have male faces in its dataset as [Gwern] did no filtering — it’s merely that female faces are overwhelmingly frequent (and it may also be that male anime faces are relatively androgynous… [and it may be] hard to tell any difference between a female with short hair & a [male]).

Local Deploy

With the model in hand, my first task was to write code just to invoke the model. I used the onnxruntime Python library to load the model and run inference. The model takes two inputs: a [512,512] array of random noise, and a ‘truncation’ constant, which the model uses to preprocess the noise. A truncation constant of 1 means that the model will reset any noise values more than 1 standard deviation from the mean back to the population mean, which has the effect of forcing the model to generate more ‘conservative’ images. Raising the truncation term affords the model more ‘artistic license’, but runs the risk of introducing artifacts.
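To make this concrete, here is a minimal sketch of what that invocation looks like with onnxruntime. The model file name and the assumption that the noise input comes before the truncation input are mine; checking session.get_inputs() reveals the names and shapes the exported model actually expects.

```python
# Minimal inference sketch. The file name and input ordering are assumptions;
# inspect session.get_inputs() for the exported model's actual interface.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("stylegan_anime.onnx")
inputs = session.get_inputs()

# Random noise shaped to match the first model input (unknown dims default to 1)
noise_shape = [d if isinstance(d, int) else 1 for d in inputs[0].shape]
latents = np.random.randn(*noise_shape).astype(np.float32)
truncation = np.array([1.0], dtype=np.float32)  # higher = more 'artistic license'

feed = {inputs[0].name: latents, inputs[1].name: truncation}
image_array = session.run(None, feed)[0]  # e.g. [batch, channel, row, col]
```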

Model inferences generated at truncation=0.25, 0.75, 1.25, respectively. Note that with increasing truncation (decreasing input-smoothing), images may get more expressive, but may introduce weird artifacts like the disfigured hand (?) on the right.

Easy enough; I can use numpy and PIL to generate input, transform output (an array) into a standard jpeg format, and save off a resulting image. (Specifically, I have to remove the batch dimension and reorder the remaining dimensions as row/column/channel before saving as a jpeg.) I’m now able to generate anime characters on my laptop whenever I want, and admittedly I spent a while doing just that.
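A sketch of that post-processing step is below; the assumption that the model emits pixel values in a 0–1 range is mine, so the scaling may need adjusting.

```python
from PIL import Image
import numpy as np

def save_jpeg(image_array: np.ndarray, path: str) -> None:
    """Convert a [batch, channel, row, col] model output into a JPEG on disk.
    The 0-1 float range is an assumption; adjust the scaling if the model
    emits a different range."""
    arr = image_array[0]                # drop the batch dimension
    arr = np.transpose(arr, (1, 2, 0))  # reorder to row/column/channel
    arr = np.clip(arr * 255.0, 0, 255).astype(np.uint8)
    Image.fromarray(arr).save(path, format="JPEG")
```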

Next, I wanted to share this newfound power with my friends. I have an active Discord server where I hang out with friends, and I decided to write a simple bot that would live on the server, listen for keywords on the text channels, and respond when invoked by generating and posting an image to the channel. To get started, I had to generate a token with associated permissions to allow my bot to connect to Discord, and then write supporting Python to use that token and connect via the Discord API. The API allows me to define a bot object which initiates and holds open a connection to my Discord server. I can simply define my bot’s behavior and then call bot.run(token). Running this script, I can see my bot suddenly appear online in Discord.
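The skeleton below sketches that setup with the discord.py library. The command name and prefix mirror what I describe in this post, but the token handling and the message-content intent (required by current discord.py releases, which postdate this project) are assumptions.

```python
import os
import discord
from discord.ext import commands

# The message-content intent is required by current discord.py releases;
# the original 2021-era code predates this requirement.
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="$", intents=intents)

@bot.event
async def on_ready():
    print(f"Connected as {bot.user}")

@bot.command(name="claim_waifu")
async def claim_waifu(ctx, *, name: str = ""):
    # Placeholder response; inference is wired in further below
    await ctx.send(f"Claiming a waifu{f' named {name}' if name else ''}...")

# Token sourced from an environment variable here; the post reads it from a
# credentials file (and later from Secrets Manager)
bot.run(os.environ["DISCORD_TOKEN"])
```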

Voila! She’s alive!
A waifu generated on Discord in real time. I wrote the $claim_waifu command such that if it receives text after the keyword command, it sets the random seed based on that string. This way, any time I call for a waifu named “Medium”, I get the same one.

Prior to connecting, the bot I wrote creates an onnxruntime session and loads the model for inference. I wrote a few event listeners which trigger asynchronous bot actions according to keywords of my choosing; the main one responds to “$claim_waifu” by running model inference, saving the resulting image as a tempfile, and then posting the image along with some flavor text. The onnxruntime ops are massively parallelized, and there is only about a one-second delay between request and response.
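Building on the earlier sketches (the onnxruntime session, the save_jpeg helper, and the bot object), the command handler looks roughly like this, replacing the placeholder handler from the skeleton above. The hashing scheme that maps a requested name to a fixed seed is an illustration, not the exact code from the repo.

```python
import hashlib
import tempfile
import numpy as np
import discord

def latents_for_name(name: str, shape) -> np.ndarray:
    """Map a requested name to a fixed seed so 'Medium' always yields the same waifu.
    The hashing scheme is an assumption for illustration."""
    seed = int(hashlib.sha256(name.encode("utf-8")).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).standard_normal(shape).astype(np.float32)

@bot.command(name="claim_waifu")
async def claim_waifu(ctx, *, name: str = ""):
    latents = (latents_for_name(name, noise_shape) if name
               else np.random.randn(*noise_shape).astype(np.float32))
    image_array = session.run(None, {inputs[0].name: latents,
                                     inputs[1].name: truncation})[0]
    with tempfile.NamedTemporaryFile(suffix=".jpg") as tmp:
        save_jpeg(image_array, tmp.name)        # helper from the earlier sketch
        await ctx.send("A new waifu appears!",  # flavor text goes here
                       file=discord.File(tmp.name))
```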

Dedicated EC2 Deploy

This is great — but there’s a problem. I now have a bot that is running full time on my local laptop. If I restart my laptop, or lose internet, or accidentally brew upgrade Python in my virtual environment to 3.9, I incur bot downtime where my users are suddenly unable to claim their precious waifus. Unacceptable.

One resolution would be to purchase a baremetal server, install an OS, configure data and network redundancies, hire a Server Administrator to schedule and perform maintenance, and deploy my script from there. Or better yet, just deploy my script in the cloud — enter AWS, GCP, Azure, etc. I opted to use AWS because I already had an account.

Simple enough. I can deploy an EC2 instance running a machine image of my choice on hardware of my choice, and assign a security group that allows my IP address to ssh in and control its behavior. I chose a t2.medium machine, which gave me 4 GB of RAM — through experimentation I had learned that the Python process on my machine consumed up to 3.4 GB of RAM when running inference, a tradeoff of space for time which I am happy with. I can deploy it with an ssh key-pair to use for connecting to it securely, and once it’s live, ssh in, clone my git repo to download my source code, use scp to add anything the repo might be missing (the model, the Discord token), and finally launch my bot process. I configured the security group to allow all outbound traffic, so the instance establishes a Discord API connection with no problem.
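For reference, the same launch can be expressed with boto3; the region, AMI ID, key-pair name, and security group ID below are placeholders rather than values from my actual deployment.

```python
import boto3

# Rough boto3 equivalent of the console steps; all identifiers are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # Ubuntu 20.04 image, per the post
    InstanceType="t2.medium",                   # 4 GB RAM for the ~3.4 GB inference process
    KeyName="waifu-bot-keypair",                # ssh key-pair for remote access
    SecurityGroupIds=["sg-0123456789abcdef0"],  # allows ssh in, all traffic out
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```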

It worked great. Suddenly my bot was entirely hands-off, and available at all times. Any time I wanted to check its logs I could ssh into the instance, and if I wanted to deploy new code, it was as easy as git pull.

Spot Fleet Deploy

However, there was another problem. Since I needed 4GB of RAM, I was not able to use hardware eligible for “free tier” (a t2.micro, at 1GB RAM, would have allowed me 12 months of free runtime), and since I have to have the instance always up, I’m now paying for 730 hours of compute per month. That comes in at just shy of $40 a month billed to my AWS account, which is a lot to pay for computer-drawn waifus, especially considering that the vast majority of the time, my EC2 instance was just idle.

Two solutions occurred to me.

  1. I could stand up a t2.micro to act as a listener, which could be up all the time but invoke a Lambda function to run inference when needed. A year of free Discord listening, and I only pay for inference when I need it.
  2. I could migrate from a dedicated EC2 instance to a Spot Instance. A spot instance could save me approximately 70% per month, at the potential cost of uptime if spot availability diminishes.

I opted for solution 2 for a few reasons. For one thing, moving to a more complex EC2-Lambda infrastructure would require a thoughtful code refactor and additional testing, and would introduce more potential points of failure (additional networking and asset management on my part). In addition, it could potentially introduce lag due to dynamically loading the model in Lambda, or the added file transfer from Lambda to EC2. Moving to a spot instance on the other hand would allow me to keep all my code nearly exactly the same and preserve current behavior at the risk of losing my instance if the spot price were to rise above a threshold I set — but there are ways to mitigate this.

Enter Spot Fleet. A Spot Fleet request allows me to define a target workload capacity and a set of rules that AWS will use to fulfill that capacity with dedicated (on-demand) EC2 instances, spot instances, or both. In my case, I only need one spot instance up at a time, but I can write a request that allows AWS to select from a variety of hardware types that fit my needs. That way, for example, if there are no t2.medium spot instances available (AWS reports a 5–10% interruption rate for t2.medium spot instances), my workload can roll over to a t3.medium, a t2.large, etc. This could mean incurring a slightly higher spot instance fee at times (t3.medium spot instances are $0.0125/hr and t3.large are $0.025/hr at the time of writing), but I can specify that AWS always choose the lowest-cost option, and I can also set a ceiling on how much I’m willing to bid. All of the spot instances in the hardware classes I ended up choosing still come out cheaper than a dedicated t2.medium, so even bumping up to a larger instance is still a cost savings for me. If I wanted to, I could even allow my fleet to use dedicated instances when no spot instances are available to guarantee constant uptime — but I’ll pass on that for now.

The first thing I needed to do to move my workload to a spot fleet was to define a Launch Template. A launch template is a blueprint for a deployed instance — either spot or dedicated. It tells AWS what hardware I want, what OS image to boot, what role (policies), security group, and key-pairs to assign it, and it allows me to write my own custom startup script. I’ll create a launch template that mirrors my current dedicated EC2 instance, with a startup script that will automate the deployment of the Discord bot code.

To automate the bot deployment, I needed a way to get my code onto the instance. I could add an ssh key-pair to the instance and use it to clone my git repo as I did previously, but a slightly easier approach is to store the head of my master git branch in S3 and let the instance ingest it from there. That way I can also store my ONNX model alongside my code (it’s not good practice to use GitHub to store data files) and allow EC2 to get everything from one place. Any time I push new changes to my code repo, I just need to hydrate the S3 bucket I defined — this could be automated as well (for instance, using GitHub Actions).
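Hydrating the bucket can be as simple as a couple of boto3 uploads; the bucket and key names here are placeholders.

```python
import boto3

# One way to 'hydrate' the bucket after a push; bucket and key names are placeholders.
s3 = boto3.client("s3")
bucket = "waifu-bot-artifacts"

# Code bundle (e.g. a zip of the master branch) plus the ONNX model
s3.upload_file("waifu_bot.zip", bucket, "code/waifu_bot.zip")
s3.upload_file("stylegan_anime.onnx", bucket, "model/stylegan_anime.onnx")
```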

S3 bucket containing all the startup files I need for my launch template

Here I’ve created an S3 bucket to house my code and model. Notice I have an aws/ folder now — that wasn’t there before. I did have to make one small addition to my codebase for this to work.

AWS Secrets Manager, including my Discord token

Remember that I needed a token to connect to Discord? That’s another thing I wouldn’t want to store on GitHub. If someone were to obtain that token, they could connect to my Discord server themselves. Locally, I was reading the token from a .gitignore’d credentials file, but I don’t want to publish that file to S3. A better idea is to use Secrets Manager, an AWS resource designed for exactly this purpose. I entered my private info into AWS Secrets Manager and created a helper Python script in the new aws/ folder that reads from it if it can’t find the original credentials file. All I needed to do next was modify the launch template role to allow access to Secrets Manager, which was easy enough.
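The helper looks roughly like this; the secret name, region, credentials file layout, and JSON key are assumptions, since the real values live in my AWS account.

```python
import json
import os
import boto3

# Sketch of the aws/ helper: prefer the local (gitignored) credentials file, and
# fall back to Secrets Manager on EC2. Secret name and JSON key are assumptions.
def get_discord_token(credentials_path: str = "credentials.json") -> str:
    if os.path.exists(credentials_path):
        with open(credentials_path) as f:
            return json.load(f)["discord_token"]
    client = boto3.client("secretsmanager", region_name="us-east-1")
    secret = client.get_secret_value(SecretId="waifu-bot/discord-token")
    return json.loads(secret["SecretString"])["discord_token"]
```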

Everything is now ready for my launch template definition. I create my OS, hardware, and key-pair requirements here:

Launch template VM definition. Note that I’ve changed to a RHEL AMI (I was using Ubuntu 20.04 previously), which comes with the AWS CLI pre-installed for me. I had to modify my startup script to use the yum package manager instead of apt-get, but didn’t have to install any AWS tools in order to access S3 or Secrets Manager later.

And I enter my startup script here:

Launch template startup script

The startup script installs packages required to run the ONNX runtime, downloads my code from S3, installs Python and pip, installs necessary Python packages from my requirements file, and launches my bot with its output redirected to a log file.
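For completeness, registering an equivalent template with boto3 would look roughly like the sketch below. The AMI ID, names, and instance profile are placeholders, and the user data is the startup script described above, base64-encoded as EC2 expects.

```python
import base64
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# The startup script (install dependencies, pull code + model from S3, launch the
# bot) lives in a local file here; EC2 expects it base64-encoded in the template.
with open("startup.sh", "rb") as f:
    user_data = base64.b64encode(f.read()).decode("utf-8")

ec2.create_launch_template(
    LaunchTemplateName="waifu-bot-template",   # placeholder name
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",    # RHEL AMI, per the post
        "InstanceType": "t3.medium",
        "KeyName": "waifu-bot-keypair",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "IamInstanceProfile": {"Name": "waifu-bot-role"},  # S3 + Secrets Manager access
        "UserData": user_data,
    },
)
```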

I can use this template to launch a solitary dedicated instance or spot instance, and this is what I did to debug. Once I was satisfied that I had the template configured correctly, I was ready to write my Spot Fleet Request, here:

Fulfilled Spot Fleet Request

As you can see, I’ve requested a fleet with a target capacity of 1 instance, with 0 on demand, meaning it will always choose a spot instance. I’ve selected a variety of instance types which can support my bot, and selected a “lowestPrice” allocation strategy. The t3.medium spot instance is currently the cheapest, but if that were to change, or if t3.medium availability were to subside, my fleet would terminate my current t3.medium instance and deploy the next cheapest option.
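Expressed through boto3, the request I describe above would look roughly like this; the fleet IAM role ARN and launch template name are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Sketch of the fleet request shown above: one spot instance, lowest-price
# allocation, several eligible instance types. Role ARN and names are placeholders.
ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        "IamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role",
        "TargetCapacity": 1,
        "OnDemandTargetCapacity": 0,
        "AllocationStrategy": "lowestPrice",
        "Type": "maintain",   # keep one instance up; replace it if it is interrupted
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "waifu-bot-template",
                "Version": "$Latest",
            },
            "Overrides": [
                {"InstanceType": "t3.medium"},
                {"InstanceType": "t3.large"},
                {"InstanceType": "t2.medium"},
                {"InstanceType": "t2.large"},
            ],
        }],
    }
)
```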

Conclusion

Once I deployed my fleet, beloved Waifu_bot rejoined my Discord server, now running on a cheap and reliable set of spot instances. All of the inference images I shared in this post were actually generated using this infrastructure, which has been running on a single spot instance for the past few days. To drive home my savings, I’ve posted a screenshot of my fleet resource consumption, which lists my instance costs and confirms I’m getting the savings I aimed for!

Once again, I’d like to give a special shoutout to my friend and colleague Stuart Minshull, a Senior Cloud Architect at Amazon Web Services, who consulted with me on the general architecture and best practices I employed in this project!

github repo: https://github.com/jfreds91/waifu_bot
