AI in Practice: Identify defective components with AutoML in the Google Cloud Platform

Nico · Published in Analytics Vidhya · Jul 8, 2020

Until recently, using artificial intelligence (AI) required considerable effort and building your own neural networks. Today, cloud computing services have dramatically lowered the barrier to entering the world of AI. As a result, current AI technology can be applied immediately to (partially) automate the quality control of components, without heavy investment in AI research.

In this article, we show an exemplary implementation of such an AI system on the Google Cloud Platform (GCP). We train a model with AutoML and then integrate it, via Cloud Functions and App Engine, into a process that still allows manual corrections in quality control.

The code for this project is in the repository gcp-automated-quality-inspection.

The architecture of the AI System

The following figure shows an example architecture on GCP for the step-by-step automation of the quality control of components.

An exemplary implementation of the AI system in GCP

To train a suitable model, we use the machine learning service AutoML. This service trains a state-of-the-art image classification model on a custom dataset. In addition, the trained model can be deployed directly as a REST endpoint via the service.

For the integration of the model, we use Google Cloud Storage (GCS) to upload the images. Each upload triggers a cloud function that calls the model's REST endpoint and classifies the image. The prediction result is then processed by a second cloud function, which is triggered through a Pub/Sub topic.

The second cloud function implements the processing logic to sort the images according to classification and the associated confidence. Predictions whose confidence level is below a selected threshold are called “uncertain”. These predictions must be post-processed by experienced workers. For this purpose, we wrote a web application that is used by the workers to check the uncertain predictions.

In the following, we describe the installation and deployment of the main components of the AI system:

  1. Preparation, Training and Serving with AutoML
  2. Integration of the model with cloud functions
  3. Deployment of the application for manual post-processing with App Engine

Requirements to follow this tutorial

Access to GCP is required to perform these steps. We recommend creating a new project and setting up the Google Cloud SDK in your local development environment. Download the complete code of the gcp-automated-quality-inspection repository, set up a Python 3.7 environment and install the requirements via pip install -r requirements.txt.

1. Preparation, Training and Serving with AutoML

We are using the casting product data for quality inspection dataset from Kaggle. This dataset contains images of cast components that are used to produce submersible pumps.

Firstly, we upload the data to a GCS bucket. Since AutoML is currently only available in the US-CENTRAL1 region, we must create the bucket inside this region by using the following commands. The GCP_PROJECT_ID can be found directly in the GCP Console.

export GCP_REGION="US-CENTRAL1"
export GCP_PROJECT_ID="<fill-with-your-project-id>"
export TRAINING_DATA_BUCKET="${GCP_PROJECT_ID}-product-quality"
gsutil mb -l $GCP_REGION gs://"${TRAINING_DATA_BUCKET}"

Next, we download the data from Kaggle and unzip them inside the data directory.

data
└── casting_data
    ├── test
    │   ├── def_front
    │   │   ├── ...
    │   │   └── new__0_9334.jpeg
    │   └── ok_front
    │       ├── ...
    │       └── cast_ok_0_9996.jpeg
    └── train
        ├── def_front
        │   ├── ...
        │   └── cast_def_0_9997.jpeg
        └── ok_front
            ├── ...
            └── cast_ok_0_9998.jpeg

Finally, we upload the data: gsutil -m cp -r data/ gs://${TRAINING_DATA_BUCKET}

After the upload, we create a CSV file that contains the necessary meta-information about the data to start the training with AutoML. This file consists of three columns:

  • SET: An optional field that assigns each sample to a split. The allowed values are TRAIN, VALIDATION and TEST. If the field is left empty, AutoML splits the dataset 8:1:1. If the field is assigned, all three values must be used.
  • IMAGE PATH: The GCS path of the image.
  • LABEL: The label of the sample.

We wrote a script prepare.py to generate this CSV file based on the blobs in the specified bucket. You can create this file by executing python automl/prepare.py and upload it to GCS with gsutil cp preparation.csv gs://"${TRAINING_DATA_BUCKET}".
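To give a concrete idea of what such a preparation script can do, here is a minimal sketch, assuming the google-cloud-storage client library and the upload layout shown above. The automl/prepare.py in the repository is the authoritative version and may assign the sets differently; in this sketch, the Kaggle test folder becomes the TEST set and a random tenth of the training images is marked as VALIDATION so that all three set values appear.

import csv
import os
import random

from google.cloud import storage

BUCKET = os.environ["TRAINING_DATA_BUCKET"]


def choose_set(split):
    # The Kaggle data only distinguishes train/test, so we carve a validation
    # set out of the training images to use all three AutoML set values.
    if split == "test":
        return "TEST"
    return "VALIDATION" if random.random() < 0.1 else "TRAIN"


def main():
    client = storage.Client()
    with open("preparation.csv", "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for blob in client.list_blobs(BUCKET, prefix="data/casting_data"):
            if not blob.name.endswith(".jpeg"):
                continue
            # e.g. data/casting_data/train/def_front/cast_def_0_9997.jpeg
            _, _, split, label, _ = blob.name.split("/")
            writer.writerow([choose_set(split), f"gs://{BUCKET}/{blob.name}", label])


if __name__ == "__main__":
    main()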

Now the dataset can be created in AutoML. To do this, select Single-Label Classification in the console and then select the uploaded CSV file. This imports the data into AutoML, which takes about 20 minutes.

Selection of the uploaded preparation.csv inside the AutoML UI

After the import, the data can be inspected in AutoML. This feature is especially useful for spot-checking the data quality. Now the training can be started.

In our case, we choose the option “Cloud-hosted” so that the model can be deployed in GCP quickly after training. The computing power during training is specified in “node hours”; behind each node is a compute instance with an NVIDIA Tesla V100 GPU. Each node hour is charged at $3.15. We choose the minimum of eight node hours and start the training.

After the training, a first evaluation of the model can be done in AutoML. Various quality metrics such as precision, recall and the confusion matrix are calculated and displayed. Furthermore, there are several ways to interactively visualise the model and its predictions.

Interactive analysis of the model results via the AutoML UI

To finish the AutoML part, we deploy the trained model as a service endpoint.

2. Integration of the model with cloud functions

The architecture of the cloud function integration

The AutoML model is integrated via two cloud functions, Prediction and Moving. The Prediction function is executed automatically for each uploaded image: it downloads the image, sends it to the model endpoint, and writes the result to the Pub/Sub topic, which in turn triggers the Moving function. The Moving function implements the logic to sort the images according to the classification result.

Prediction

First, we create the INBOUND_BUCKET and the PREDICTION_TOPIC:

export INBOUND_BUCKET="product-quality-inbound"
export PREDICTION_TOPIC="automl_predictions"
gsutil mb -l $GCP_REGION gs://"${INBOUND_BUCKET}"
gcloud pubsub topics create "${PREDICTION_TOPIC}"

The code of the function is in cloud_functions/predict/main.py.

At runtime, the image to be classified is downloaded from the bucket and sent to the AutoML service endpoint. The response, in Protocol Buffer format, is deserialised and then written to the PREDICTION_TOPIC as a message of the following form.

msg = {
    "bucket_name": data["bucket"],
    "image_name": data["name"],
    "prediction_label": result.get("display_name"),
    "prediction_score": result.get("classification").get("score"),
}
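The following is a minimal sketch of such a function, assuming the google-cloud-automl, google-cloud-pubsub and google-cloud-storage client libraries; cloud_functions/predict/main.py in the repository is the authoritative implementation. The GCP_PROJECT environment variable is provided automatically by the Python 3.7 runtime, while model_id and topic_id are set during deployment (see below).

import json
import os

from google.cloud import automl_v1, pubsub_v1, storage

PROJECT_ID = os.environ["GCP_PROJECT"]  # set automatically in the python37 runtime
MODEL_ID = os.environ["model_id"]
TOPIC_ID = os.environ["topic_id"]


def predict_image(data, context):
    """Triggered by google.storage.object.finalize on the inbound bucket."""
    # Download the uploaded image from GCS.
    blob = storage.Client().bucket(data["bucket"]).blob(data["name"])
    content = blob.download_as_bytes()

    # Send the image to the deployed AutoML model endpoint.
    prediction_client = automl_v1.PredictionServiceClient()
    model_name = prediction_client.model_path(PROJECT_ID, "us-central1", MODEL_ID)
    response = prediction_client.predict(
        name=model_name, payload={"image": {"image_bytes": content}}
    )
    result = response.payload[0]

    # Publish the prediction so that the Moving function can process it.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
    msg = {
        "bucket_name": data["bucket"],
        "image_name": data["name"],
        "prediction_label": result.display_name,
        "prediction_score": result.classification.score,
    }
    publisher.publish(topic_path, json.dumps(msg).encode("utf-8"))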

We deploy the cloud function via the Google Cloud SDK. For this, we need the MODEL_ID of the trained model, which can be found in the AutoML interface. Furthermore, the trigger event google.storage.object.finalize with the corresponding bucket INBOUND_BUCKET is specified.

export MODEL_ID="ICN690530685638672384"
export PREDICT_CLOUD_FUNCTION_PATH="cloud_functions/predict"
export PREDICT_CF_NAME="predict_image"
gcloud functions deploy "$PREDICT_CF_NAME" \
--source "$PREDICT_CLOUD_FUNCTION_PATH" \
--runtime python37 \
--trigger-resource "$INBOUND_BUCKET" \
--trigger-event google.storage.object.finalize \
--set-env-vars model_id="$MODEL_ID",topic_id="$PREDICTION_TOPIC"

Moving

The processing logic of the moving function which compares the prediction score with the threshold to decide where to move the image.

The Moving function processes messages from PREDICTION_TOPIC; each arriving message triggers it. The function is implemented in cloud_functions/move/main.py and processes the results according to confidence, label and threshold. Depending on these three values, the service moves the associated image from the INBOUND_BUCKET to one of the following directories in the PREDICTION_BUCKET (a sketch of this routing logic follows the list):

  • okay: images with no detected defects
  • defect: images with detected defects
  • unclear: images whose prediction confidence is below the chosen threshold
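Here is a minimal sketch of this routing logic, assuming the google-cloud-storage client library and that the prediction labels match the dataset folder names (ok_front and def_front); cloud_functions/move/main.py in the repository is the authoritative implementation.

import base64
import json
import os

from google.cloud import storage

PREDICTION_BUCKET = os.environ["prediction_bucket"]
THRESHOLD = float(os.environ["prediction_threshold"])


def move_image(event, context):
    """Triggered by a message on the prediction Pub/Sub topic."""
    msg = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    # Route the image according to confidence and label.
    if msg["prediction_score"] < THRESHOLD:
        folder = "unclear"
    elif msg["prediction_label"] == "ok_front":
        folder = "okay"
    else:
        folder = "defect"

    # Copy the image into the prediction bucket and delete it from the inbound bucket.
    client = storage.Client()
    source_bucket = client.bucket(msg["bucket_name"])
    source_blob = source_bucket.blob(msg["image_name"])
    target_bucket = client.bucket(PREDICTION_BUCKET)
    source_bucket.copy_blob(source_blob, target_bucket, f"{folder}/{msg['image_name']}")
    source_blob.delete()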

Before the deployment, we create the prediction bucket:

export PREDICTION_BUCKET="product-quality-prediction" 
gsutil mb -l $GCP_REGION gs://"${PREDICTION_BUCKET}"

Finally, we deploy the cloud function with the Google Cloud SDK and the associated environment variables.

export PREDICTION_THRESHOLD="0.8"
export MOVE_CLOUD_FUNCTION_PATH="cloud_functions/move"
export MOVE_CF_NAME="move_image"
gcloud functions deploy "$MOVE_CF_NAME" \
--source "$MOVE_CLOUD_FUNCTION_PATH" \
--runtime python37 \
--trigger-topic "$PREDICTION_TOPIC" \
--set-env-vars prediction_bucket="$PREDICTION_BUCKET",prediction_threshold="$PREDICTION_THRESHOLD"

3. Deployment of the application for manual post-processing with App Engine

Using a simple web application, we display the images from the unclear directory in the browser. Through this application, experienced workers check the product images in detail and manually classify them. We used FastAPI and React to implement the web application. The code of the application is inside the app_engine folder.

The architecture of the manual post-processing application using Cloud Storage and App Engine with FastAPI and React

Preparation of the Permissions

Before we deploy the application, we need to set up permissions so that App Engine can access Google Cloud Storage. Activating the App Engine API creates the default service account ${PROJECT_ID}@appspot.gserviceaccount.com. Through the IAM console, we create a key for this service account and store it as app_engine_service_account.json inside the app_engine directory.

IAM console — Create key for the service account

On startup, the application loads this key to obtain the necessary permissions. Note that this key should neither be shared nor versioned.

The application creates signed URLs so that the images can be loaded from the web frontend. For this, the service account requires the role Service Account Token Creator.

IAM — Assign “Role Service Account Token Creator” to the App Engine Service Account.
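To illustrate what the backend needs these permissions for, here is a minimal sketch of generating a signed URL with the google-cloud-storage library, using the downloaded key file; the object name is hypothetical, and the actual app_engine backend may sign via the IAM Credentials API instead, which is where the Service Account Token Creator role comes in.

from datetime import timedelta

from google.cloud import storage

# Sign locally with the service-account key created above.
client = storage.Client.from_service_account_json("app_engine_service_account.json")
blob = client.bucket("product-quality-prediction").blob("unclear/cast_def_0_9997.jpeg")  # hypothetical object

# V4 signed URL, valid for 15 minutes, that the React frontend can GET directly.
url = blob.generate_signed_url(version="v4", expiration=timedelta(minutes=15), method="GET")
print(url)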

Furthermore, the service account requires access to the PREDICTION_BUCKET. In the GCP console, we navigate to the Storage Browser and assign the service account the roles Storage Object Viewer and Storage Legacy Bucket Writer.

Deployment of the application on App Engine

Firstly, ensure that the app_engine_service_account.json is inside the app_engine directory. Secondly, set the PREDICTION_BUCKET as an environment variable in app.yaml.

We deploy the application with gcloud app deploy app_engine/app.yaml. Once the deployment is finished, we can open the application directly from the CLI with the command gcloud app browse.

Demo view of the application

The application reads from the unclear directory in the PREDICTION_BUCKET. To test it, we can upload an image into this directory. After an image has been manually classified, the backend stores it in the human_decided directory with the label as a prefix.
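As a rough illustration of this last step, here is a minimal sketch of a FastAPI endpoint that records a manual decision; the route and parameter names are hypothetical, and the backend in the app_engine folder is the authoritative implementation.

from fastapi import FastAPI
from google.cloud import storage

app = FastAPI()
client = storage.Client()
bucket = client.bucket("product-quality-prediction")  # PREDICTION_BUCKET


@app.post("/images/{image_name}/label")
def label_image(image_name: str, label: str):
    # Copy the image from unclear/ to human_decided/ with the label as prefix,
    # then remove the original.
    source = bucket.blob(f"unclear/{image_name}")
    bucket.copy_blob(source, bucket, f"human_decided/{label}_{image_name}")
    source.delete()
    return {"image": image_name, "label": label}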

How expensive is the operation of this AI system?

An important question is the cost of operating such a cloud application. Here we differentiate between one-time, fixed and dynamic costs:

  • One-off: costs that occur once, such as model training
  • Fixed: costs incurred to keep the application permanently available, for example the model's service endpoint and the web application on App Engine
  • Dynamic: costs that vary with usage and utilisation, such as storage space in GCS and computing time for cloud functions

An exact cost calculation is always difficult without a concrete use case. Besides the number of calls, the environment in which the solution is embedded also plays an important role.

Nevertheless, to give a more concrete initial idea of the costs, we make the following assumptions:

  • 1,000 images of 1 MiB each per day
  • 5% of the images are classified as unclear
  • The application runs 24 hours a day for 30 days

The costs are distributed across the usage of AutoML, App Engine, Cloud Storage and Cloud Functions. We do not consider network traffic or the GCP free tier, and we estimate the monthly costs generously:

  • Model development (one-off): $3.15 * 8 node hours = $25.20
  • Model deployment (fixed): $1.25 * 24 hours * 30 days = $900
  • Application hosting (fixed): $0.05 * 24 hours * 30 days = $36
  • GCS storage and read/write operations (dynamic): < $1.50
  • Cloud Functions (dynamic): < $1

It is clear from the calculation that model deployment causes the highest costs. These costs can be further reduced if necessary, for example, by edge deployment of the model or by not running the model around the clock, but only selectively during working hours.

Conclusion

In this article, we have shown how to implement an initial AI system for semi-automation in quality control on the Google Cloud Platform in a few steps.

By using cloud computing services, data-driven products can be prototyped quickly. Services such as App Engine and Cloud Functions enable developers to focus on actual value creation rather than on operating the applications.

Especially in image recognition, a sufficiently good AI model can be developed today without tedious effort. This lowers the entry barrier for validating data-driven products and, thanks to technological advances in cloud development, allows AI projects to be carried out in a fast, experimental mode.

I wrote this article together with my colleague Marcel Mikl. Many thanks for the fruitful cooperation. We work at codecentric, a company that helps other companies on their digital transformation journey.

If you are interested in projects or an exchange about data-driven products or AI in general, feel free to write us an email at ki@codecentric.de.

_____

If you have any feedback or questions, please feel free to contact me on LinkedIn Nico Axtmann.

______

Join the Machine Learning in Production LinkedIn group to learn how to put your models into production. Feel free to add me on LinkedIn; I am always open to discussing machine learning topics and giving advice on your data science projects and business implementations!
