Auto-ML in a Flash

Shantanu Acharya
8 min read · Dec 6, 2021


In this post, I’ll explain how I built an Auto-ML platform called Flash. I’ll describe in detail how the different components of the platform work and communicate with each other to act as a fully automated service.

Before we move forward, here are the links to the source code as well as to the platform itself.

Introduction

Flash is an end-to-end Deep Learning platform that allows users to create, train, and deploy their own neural network models in a matter of minutes without writing a single line of code.

The platform currently supports two types of tasks:

Image Classification

Classify images from your own dataset by using them to train a ResNet-34 or MobileNet v2 model. Training happens via transfer learning: the available models come pre-trained on the ImageNet dataset.
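For a concrete picture, here is a minimal sketch of what such a transfer-learning setup could look like with torchvision. The function name and the choice to replace only the final classification layer are my illustrative assumptions, not necessarily what Flash does internally.

```python
# A minimal transfer-learning sketch (illustrative; build_image_classifier
# and the head-replacement choice are assumptions, not Flash's exact code).
import torch.nn as nn
from torchvision import models

def build_image_classifier(arch: str, num_classes: int) -> nn.Module:
    if arch == "resnet34":
        model = models.resnet34(pretrained=True)  # ImageNet weights
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    elif arch == "mobilenet_v2":
        model = models.mobilenet_v2(pretrained=True)  # ImageNet weights
        model.classifier[-1] = nn.Linear(
            model.classifier[-1].in_features, num_classes)
    else:
        raise ValueError(f"Unsupported architecture: {arch}")
    return model
```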

Text Classification

Classify sentences by training an LSTM- or GRU-based sequential model on your own dataset. These models are trained from scratch.
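As a rough illustration, a from-scratch sequence classifier along these lines could be defined as follows; the class name and layer sizes are placeholders, not the platform’s exact values.

```python
# An illustrative LSTM/GRU sentence classifier trained from scratch
# (class name and layer sizes are placeholders, not Flash's exact values).
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, cell="lstm",
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
        self.rnn = rnn_cls(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):             # tokens: (batch, seq_len)
        embedded = self.embedding(tokens)  # (batch, seq_len, embed_dim)
        output, _ = self.rnn(embedded)     # (batch, seq_len, hidden_dim)
        return self.fc(output[:, -1, :])   # logits from the last time step
```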

How it Works

Using Flash is easy: with just a few clicks you can train and deploy your models automatically. Select a model, upload your dataset, and you’re good to go. No code or prior experience is required.

How to train and deploy models with Flash

Training

To train a model, upload your own dataset and select the model parameters. Depending on the size of the dataset, the platform can take anywhere between 3 and 10 minutes to train and deploy your model.

After you upload your configuration, the platform will assign you a unique token. Please save the token as it will be used to test the model on the inference page.

Training a model

Inference

You can perform inference on a trained model by using the token provided to you after submitting the training configuration on the training page.

After submitting the token, you’ll get a form where you can upload inputs to check the performance of your trained model. The inference page also shows the results of the training process: the model’s accuracy on the validation set and how that accuracy changed during training.

Performing inference on a deployed model

Behind the scenes

Now let’s delve into the technical concepts.

At a high level, the platform has two major components: the frontend and the backend.

  • The frontend is built with React.js and hosted via GitHub Pages. For the specifics of the frontend architecture, such as which packages are used, check out the README on GitHub.
  • The backend is built by connecting several AWS services, such as S3, Lambda, and EC2.

The Flow

To understand how the backend works, let’s first walk through the general flow that occurs when a user submits a training job.

  • After receiving a training request, the backend first checks whether the training server is in live mode or dev mode. This matters because if the server requires maintenance, setting it to dev mode lets it ignore all incoming user requests so that the maintenance can happen without interference.
  • If the server is in live mode, the backend then checks whether the server is currently busy with another task. If it is, the user is notified and asked to submit the job again later.
  • If the server is available, the training configuration (hyperparameters and model settings) and the dataset are sent to an AWS Lambda function, which stores all of this in an S3 bucket. After uploading the dataset, the lambda function generates a unique token that is sent to the user; the user later uses this token to perform inference on the trained model (a condensed sketch of this step follows the list).
  • As soon as the job is created, a file with all the job details is stored on S3, and the lambda function boots up the training server (a P3 instance on AWS EC2).
  • On boot, a cron job on the training server looks for the job file and fetches the training configuration and the dataset from S3 onto the instance.
  • Using the downloaded data, the desired model is trained, and the checkpoint with the best accuracy on the validation set is saved.
  • The checkpoint is then uploaded back to S3. Once the upload finishes, the training server deletes the locally downloaded training data as well as the copy on S3, then shuts itself down to minimize costs.
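To make the submission step concrete, here is a condensed sketch using boto3. The bucket name and key layout are placeholders, and the real train lambda (described later) also validates the dataset and updates the server status.

```python
# A condensed sketch of job submission (bucket name and key layout are
# hypothetical; the real lambda also validates the dataset).
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "flash-backend"  # hypothetical bucket name

def submit_job(config: dict) -> str:
    token = uuid.uuid4().hex  # unique token later used on the inference page
    job = {"token": token, "config": config}
    # Creating this job file emits the S3 event that boots the training server.
    s3.put_object(Bucket=BUCKET, Key="training/training.json",
                  Body=json.dumps(job))
    return token
```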

Now let’s see the flow when a user submits the assigned job token on the inference page of the platform.

  • When the user submits the job token, a lambda function is triggered to validate it.
  • If the token is valid, the training details, such as the final accuracy and a plot of how the accuracy changed during training, are shown to the user. A form is also provided on the webpage where the user can upload sample images (or write sample sentences) to see the predictions of the trained image (or text) classification model.
  • Besides this, a separate cleaning lambda function runs on the backend every two hours. It looks at the deployed models stored on S3 and deletes any that are more than two hours old, to minimize storage costs.

Now that we have an overall understanding of the backend, let’s dive deeper into its architecture.

The Backend

There are three major components in the backend architecture:

  1. Config Files
  2. Training Models
  3. Serving Models (Performing Inference)

Let’s look at each of them one by one.

Config Files

The platform uses four config files (in JSON format) throughout the flow to ensure smooth communication among the different components; illustrative examples follow the list.

  • status.json: Stores information about the state of the training server and whether the project is currently in dev or live mode.
  • cleanup.json: Contains a flag that tells the cleanup lambda function whether to clean up the trained models on S3.
  • training.json: Stores the training configuration, which includes the model parameters and the dataset.
  • inference.json: Keeps information regarding the models which are currently available for inference.
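To make their roles concrete, here are hypothetical examples of two of these files. The status, dev_mode, and created keys come from the flow described below; the remaining keys and values are illustrative, and the exact schema lives in the repo. A possible status.json:

```json
{
  "status": "sleeping",
  "dev_mode": false
}
```

And a possible inference.json entry, keyed by the user’s token:

```json
{
  "a1b2c3d4": {
    "task": "classification",
    "created": "2021-12-06T10:00:00+00:00",
    "checkpoint": "models/a1b2c3d4.pt",
    "accuracy": 0.93
  }
}
```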

For in-depth details of the configuration files mentioned above, check out the GitHub page.

Training Models

The training component is further divided into two sub-components:

  1. training lambda: Manages EC2 instances and acts as an interface between the frontend and the training process
  2. training server: Trains and deploys the model for inference

The code and in-depth details of each of the sub-components can be found on the GitHub page.

Training Lambda

The training lambda sub-component contains four lambda functions.

  1. status: Fetches the status.json file from S3 and confirms the availability of the server
  2. train: Receives the training configuration from the frontend, validates the dataset, and stores it in training/training.json in S3
  3. start: Turns on the EC2 training server
  4. stop: Turns off the EC2 training server and changes the status to sleeping in status.json

How it Works

  • Upon invocation by the frontend, the status function fetches status.json from S3 and returns the availability status of the training server.
  • If the status is sleeping, the frontend sends the training configuration along with the dataset to the train function.
  • train first updates the server status in status.json to active, then reads the training configuration, validates the dataset, and creates the training.json file on S3.
  • As soon as the training.json config file is created, an S3 event notification is triggered, which invokes the start lambda function.
  • start fetches status.json and checks the dev_mode flag. If dev_mode is set to False, the EC2 instance is turned on (a sketch of this handler follows the list).
  • After training is completed, the training server deletes the training config (training.json) file from S3.
  • As soon as the training.json config file is deleted, an S3 event notification is triggered, which invokes the stop lambda function.
  • stop fetches status.json and checks the dev_mode flag. If dev_mode is set to False, the EC2 instance is turned off. It then updates the status in status.json to sleeping.
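Under the behavior described above, the start handler might look roughly like this; stop is symmetric (stop_instances plus a status update). The bucket name and instance ID are placeholders.

```python
# A minimal sketch of the start lambda (bucket name and instance ID are
# placeholders); it is triggered by the S3 event fired when
# training/training.json is created.
import json

import boto3

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")
BUCKET = "flash-backend"             # hypothetical
INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical P3 training instance

def start_handler(event, context):
    status = json.loads(
        s3.get_object(Bucket=BUCKET, Key="status.json")["Body"].read())
    if not status["dev_mode"]:  # ignore user jobs while in dev mode
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
```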

Training Server

  • The training script (named flash.py) fetches the training.json file from S3 and parses the information.
  • If the task type is classification, the training configuration is passed to the image_classification module.
  • If the task type is textclassification, the training configuration is passed to the text_classification module.
  • After training is completed, flash.py updates the inference.json file on S3 and uploads the trained model.
  • After uploading, flash.py deletes the training.json config file from S3 (a rough sketch of this script follows the list).
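Pieced together from the steps above, flash.py’s dispatch logic might look roughly like this. The train() entry points and the S3 key layout are assumptions; only the module names come from the description above.

```python
# A rough sketch of flash.py's dispatch (train() entry points and the S3
# key layout are assumptions; module names come from the repo).
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "flash-backend"  # hypothetical

def run_job():
    job = json.loads(
        s3.get_object(Bucket=BUCKET,
                      Key="training/training.json")["Body"].read())
    task = job["config"]["task"]
    if task == "classification":
        from image_classification import train
    elif task == "textclassification":
        from text_classification import train
    else:
        raise ValueError(f"Unknown task type: {task}")
    checkpoint_path = train(job["config"])  # path of the best checkpoint
    s3.upload_file(checkpoint_path, BUCKET, f"models/{job['token']}.pt")
    # ...update inference.json with the new model's entry here...
    # Deleting the job file fires the S3 event that invokes the stop lambda.
    s3.delete_object(Bucket=BUCKET, Key="training/training.json")
```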

Serving Models (Performing Inference)

The inference component contains three lambda functions.

  1. check: Checks if the token submitted by the user is valid.
  2. infer: Performs inference. It takes in the input and returns the model prediction.
  3. clean: Runs every two hours and deletes models whose validity has expired (i.e., models older than two hours).

The code and in-depth details of each of these functions can be found on the GitHub page.

How it works

  • The frontend invokes check when a user submits a token.
  • check fetches the inference.json file from S3 and confirms the validity of the token (a minimal sketch follows the list).
  • If the token is valid, the frontend invokes infer.
  • Using the token, infer fetches the required configuration from inference.json and returns the model’s prediction.
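A minimal sketch of check, assuming inference.json is keyed by token as in the earlier example:

```python
# A minimal sketch of the check lambda (the token-keyed layout of
# inference.json is an assumption).
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "flash-backend"  # hypothetical

def check_handler(event, context):
    token = event["token"]
    registry = json.loads(
        s3.get_object(Bucket=BUCKET, Key="inference.json")["Body"].read())
    if token not in registry:
        return {"valid": False}
    return {"valid": True, "details": registry[token]}
```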

clean is an independent function that runs every two hours. It checks for models that are more than two hours old (using the created key in inference.json) and deletes them, as sketched below.
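A sketch of that cleanup pass, again assuming the token-keyed layout; only the created key is confirmed by the description above.

```python
# A sketch of the clean lambda: drop registry entries (and their
# checkpoints) whose "created" timestamp is over two hours old. All key
# names except "created" are assumptions.
import json
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "flash-backend"  # hypothetical

def clean_handler(event, context):
    registry = json.loads(
        s3.get_object(Bucket=BUCKET, Key="inference.json")["Body"].read())
    cutoff = datetime.now(timezone.utc) - timedelta(hours=2)
    for token, entry in list(registry.items()):
        if datetime.fromisoformat(entry["created"]) < cutoff:
            s3.delete_object(Bucket=BUCKET, Key=entry["checkpoint"])
            del registry[token]
    s3.put_object(Bucket=BUCKET, Key="inference.json",
                  Body=json.dumps(registry))
```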

Current Limitations

The platform in its current capacity has several limitations:

  • Since the models are currently hosted via AWS Lambda functions, there is a storage limit on the models the platform can support. Due to Lambda’s limits, the platform currently cannot support models bigger than ResNet-34.
  • It can support the training requests of only one user at a time. This is intentional: since the project is not built for commercial purposes, resources on the training server are kept limited to reduce AWS bills.

While the limitations above keep the platform from being production-ready yet, with ample resources they can certainly be remedied.

Final Thoughts

Flash provides a very basic demonstration of how neural networks can be trained and deployed easily in a matter of minutes. While the platform is currently in its nascent stages, it shows how much room there is to expand beyond basic training tasks, and I plan to add many more features in the future.

Please feel free to reach out to me if you have any thoughts/questions. I’ll be happy to answer them.
