Infrastructure as Code for Artificial Intelligence Applications

Wilder Rodrigues
Applied Artificial Intelligence
5 min read · Jan 5, 2018

A month ago, I started a series of stories about Deep Learning for NLP. My initial idea was to have at least 4 instalments in which I would explore and share the realm of Deep Learning when it comes to Natural Language Processing. After writing the first two parts, which can be found here and here, I decided to change course a bit and get into a very important aspect of AI applications: an environment to run them.

Throughout 2017, I invested 13+8+16+4+11 weeks on Coursera, going from Andrew Ng’s Machine Learning course (from 2011) and Geoffrey Hinton’s Neural Networks for Machine Learning (from 2013), through a couple of courses on Computational Neuroscience and Algorithms, to Andrew’s newest Deep Learning Specialisation. In those 52 weeks, I learned a lot and also wasted a lot of time running models on my MacBook. I believe that many people around the world start with AI the same way: going through Coursera and running models locally.

With that in mind, I decided to put my software and cloud knowledge into practice to speed things up, and to make it public so others can use the same strategy I did.

In this story, I will show you how to create a fully automated AWS environment, with a GPU-powered instance, using Terraform and NVIDIA Docker.

AWS

Before we get to the Terraform code, you need an AWS account in order to create a user under AWS IAM (Identity and Access Management). Below I list the things you have to do before proceeding with the automation of the environment. These steps are done only once.

  • AWS Account
  • IAM User
  • Access Key (this is done after the IAM user is created)
  • IAM Policies (you need to attach AmazonEC2FullAccess and AmazonS3FullAccess to the user that you have created)

Keep your Access Key and Secret Key safe! Do not share them with anyone, unless you would like to see someone else creating AWS resources on your account!
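
For reference, this is roughly how the keys are picked up later on. It is a minimal sketch, assuming you export them as environment variables or keep them in ~/.aws/credentials, so that nothing sensitive ever lands in the .tf files; the region below is just an example.

```
# Sketch of the provider configuration. The AWS provider reads
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment,
# or from the shared credentials file, so no keys are hard-coded.
provider "aws" {
  region = "${var.region}"
}

variable "region" {
  default = "eu-west-1" # assumption: pick the region closest to you
}
```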

Once the steps above have been successfully completed, you will need to do the following:

  • Go to the AWS Console / EC2 Dashboard and create a Key Pair. You should be able to find it under the Network & Security section on the left side of your browser. It looks like the image below:
Source: https://console.aws.amazon.com/ec2/v2/home

Download the key you have created and keep it safe.
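
In the Terraform code, that key pair is only referenced by name, so the instance can later be reached over SSH. A tiny sketch, with a hypothetical key name:

```
# The key pair itself was created in the console; Terraform only
# needs its name to attach it to the EC2 instance.
variable "key_name" {
  default = "deep-learning-key" # hypothetical: use the name of the key you created
}

# Later, inside the aws_instance resource:
#   key_name = "${var.key_name}"
```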

Terraform

The whole idea behind Terraform is to offer a way to improve operators’ productivity. It has also become one of the most used tools by engineers who have one foot in operations and the other in development, a.k.a. DevOps Engineers.

With such cloud-agnostic tools, one can create environments on the major cloud providers, or on other open-source cloud platforms, with a few hundred lines of code.

To get Terraform installed on your MacBook / Laptop, please go to https://www.terraform.io/.

Resources

To get going with our idea here, we will have to create several resources. Clicking through a browser is not an option! Keeping GPU instances running is expensive, creating the environment over and over again is tiresome, and doing repetitive tasks by hand adds no value.

To get an idea of how many things we need, have a look at the list below:

  1. Virtual Private Cloud (a.k.a. VPC);
  2. Internet Gateway;
  3. Security Group;
  4. Subnet;
  5. Route Table;
  6. Virtual Machine;
  7. Volume;
  8. Elastic IP; and
  9. S3 Bucket.

As the saying goes: a picture is worth a thousand words!

Source: https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpce-gateway.html

The infrastructure we will be creating is pretty similar to the picture above, except that we will have only one subnet and one VM. And that’s it: we will write 125 lines of Terraform code that create the whole thing. Any time you want, just destroy it and then create it again. That’s perfect for running models!
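
To give you a feel for it, here is a condensed sketch of the resources from the list above. The full, working version lives in the repository; the names, CIDR blocks, AMI and instance type below are illustrative assumptions on my part.

```
# Sketch only: a minimal version of the environment described above.

resource "aws_vpc" "ai" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
}

resource "aws_internet_gateway" "ai" {
  vpc_id = "${aws_vpc.ai.id}"
}

resource "aws_subnet" "ai" {
  vpc_id                  = "${aws_vpc.ai.id}"
  cidr_block              = "10.0.1.0/24"
  map_public_ip_on_launch = true
}

resource "aws_route_table" "ai" {
  vpc_id = "${aws_vpc.ai.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.ai.id}"
  }
}

resource "aws_route_table_association" "ai" {
  subnet_id      = "${aws_subnet.ai.id}"
  route_table_id = "${aws_route_table.ai.id}"
}

resource "aws_security_group" "ssh" {
  vpc_id = "${aws_vpc.ai.id}"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # better: restrict this to your own IP
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "gpu" {
  ami                    = "ami-xxxxxxxx"  # assumption: an Ubuntu 16.04 AMI in your region
  instance_type          = "g2.2xlarge"    # assumption: any GPU instance type will do
  key_name               = "${var.key_name}"
  subnet_id              = "${aws_subnet.ai.id}"
  vpc_security_group_ids = ["${aws_security_group.ssh.id}"]

  # Extra volume for datasets and checkpoints; the repository may instead
  # declare a separate aws_ebs_volume plus aws_volume_attachment.
  ebs_block_device {
    device_name = "/dev/sdh"
    volume_size = 100
  }
}

resource "aws_eip" "gpu" {
  instance = "${aws_instance.gpu.id}"
  vpc      = true
}

# The S3 bucket is covered in its own section below.
```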

NVIDIA Drivers & Docker

Oh yes! We are going to create a GPU-powered instance. That means we need some other things in place in order to use the full power of our 8 vCPU / 15 GB RAM instance: NVIDIA drivers! But that’s not the only thing. We also need to install Docker CE and the NVIDIA Docker runtime. Otherwise it would be a waste of time.

To get the extra tooling installed, we have to provision a shell script, which will run once the VM is created. Terraform takes care of all of this; we just have to write the script, which is an extra 48 lines of code.
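
One way this could be wired in is through user_data, as sketched below. It is the same aws_instance from the earlier sketch, shown here with only the provisioning part; the driver package name and repository URLs are the usual ones for Ubuntu 16.04 and are assumptions, the actual script in the repository is more complete.

```
# Sketch: pass a bootstrap script to the instance via user_data.
resource "aws_instance" "gpu" {
  # ... the arguments from the previous sketch ...

  user_data = <<-EOF
    #!/bin/bash
    set -e

    # NVIDIA driver (assumption: Ubuntu 16.04; a reboot may be needed
    # before the kernel module is usable)
    apt-get update
    apt-get install -y nvidia-384

    # Docker CE
    curl -fsSL https://get.docker.com | sh

    # NVIDIA Docker runtime
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
      > /etc/apt/sources.list.d/nvidia-docker.list
    apt-get update
    apt-get install -y nvidia-docker2
    pkill -SIGHUP dockerd
  EOF
}
```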

Why Docker?

Well, we don’t want to create the infrastructure and then keep using it to write our AI code. We also don’t want to keep cloning Git repositories every time we create the infrastructure. Instead, we keep the AI code in a separate Git project. We use that project to build a Docker image, which is then pulled onto our GPU instance and executed.
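
To make that concrete, here is one way the pull-and-run step could look, sketched as a remote-exec provisioner over SSH. The image name is a placeholder for whatever you push to Docker Hub, and the key path assumes the hypothetical key name used earlier.

```
# Sketch: pull the AI image and run it with the NVIDIA runtime
# once the instance is reachable.
resource "null_resource" "run_model" {
  connection {
    host        = "${aws_eip.gpu.public_ip}"
    user        = "ubuntu"
    private_key = "${file("~/.ssh/deep-learning-key.pem")}" # the key pair downloaded earlier
  }

  provisioner "remote-exec" {
    inline = [
      "sudo docker pull youruser/your-ai-model:latest",   # placeholder image name
      "sudo nvidia-docker run -d youruser/your-ai-model:latest",
    ]
  }
}
```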

Why a S3 Bucket?

Running the model might still take some time, depending on its complexity. We don’t want to sit there watching it, waiting for it to finish, and then copy stuff around. So, what we do is: the model runs with a model checkpoint, which stores the best model on disk. If things go weird, like the loss increasing, the checkpoint stops the run and copies the best weights to the S3 bucket. We can then simply download them from there and destroy the whole environment.
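
The bucket itself is only a couple of lines of Terraform; a sketch is below, with a placeholder bucket name since S3 names are globally unique. The force_destroy flag is worth calling out: without it, terraform destroy refuses to remove a bucket that still has weights in it.

```
resource "aws_s3_bucket" "weights" {
  bucket        = "my-ai-model-weights" # placeholder: bucket names must be globally unique
  acl           = "private"
  force_destroy = true # allow `terraform destroy` even if weights are still in the bucket
}
```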

It’s Coding Time!

No worries, I’m not going to put you through all those lines of code: they are all available on GitHub! There is also a Docker image based on a model I have running for a Kaggle competition. That code is also on GitHub and can be used as a template for your own model.

If you are now ready to create your environment and see how it works, clone the repository, read the code, read the README for further instructions, and have fun!

Using Your Own Image

If you want to use your own Docker images, please feel free to clone / fork the repository below and play with the code:

I’m constantly working on those repositories to add more features and models, so keep an eye on them if you are interested.

Acknowledgements

Thank you for your time. I hope I have taught you something new, or at least given you some ideas on how to build on the knowledge I shared.
