Action Recognition for Karate Moves using Google Cloud Vertex AI and Cloud Storage

Published in

Google Cloud - Community

6 min read · Nov 20, 2023

Introduction

Action recognition aims to accurately describe human actions and their interactions in a previously unseen data sequence, such as a video. This article walks through the steps for building an action recognition model with Google Cloud Vertex AI.

Google Cloud Tech Used

Vertex AI — Machine Learning Platform to train and deploy ML models.
Cloud Storage — Managed service for storing unstructured data.

Create a New Google Cloud Project

After logging in to Google Cloud, open the Google Cloud Console landing page and click the name of the existing project. In the window that opens, click New Project.

Enter the project name.

Select the newly created project.

Enable Vertex AI API

Google Cloud APIs let us automate our workflows in our preferred language, either through REST calls or through client libraries available for popular programming languages.

Search for Vertex AI in the search bar and select the Vertex AI API.

Click the Enable button to enable the Vertex AI API, and then click Close.

Vertex AI is a machine learning (ML) platform that lets us train and deploy ML models and AI applications. Vertex AI provides several options for model training and deployment:

AutoML lets us train models on tabular, image, text, or video data without writing code or preparing data splits.
Custom training gives us complete control over the training process, including using our preferred ML framework, writing our own training code, and choosing hyperparameter tuning options.

The machine learning workflow on Vertex AI includes the following steps:

Data preparation: Apply data transformations and feature engineering to the data, and split it into training, validation, and test sets.
Model training: Choose a training method, train a model, and tune it for performance.
Model evaluation and iteration: Evaluate our trained model, adjust our data based on the evaluation metrics, and iterate on the model.
Model serving: Deploy our model to production and get predictions.

Prepare Data

Download sample videos from https://github.com/NehaKoppikar/ActionRecognitionKarateMoves. Alternatively, videos can be recorded. Vertex AI supports the following video formats:

MOV
MPEG4
MP4
AVI

It is recommended to have 100 annotations per label. The minimum number of videos per label is 11.
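Before uploading, the format and count requirements above can be checked locally. The following is a minimal sketch, not part of the original project: the function name, the label keys, and the exact file extensions mapped to each format are assumptions made for illustration.

```python
from pathlib import Path

# Extensions assumed to correspond to the formats listed above (MOV, MPEG4, MP4, AVI).
SUPPORTED_EXTENSIONS = {".mov", ".mpeg4", ".mp4", ".avi"}
# Minimum number of videos per label, as stated above.
MIN_VIDEOS_PER_LABEL = 11


def check_videos(paths_by_label):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for label, paths in paths_by_label.items():
        for path in paths:
            if Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
                problems.append(f"{label}: unsupported format for {path}")
        if len(paths) < MIN_VIDEOS_PER_LABEL:
            problems.append(
                f"{label}: only {len(paths)} videos, need at least {MIN_VIDEOS_PER_LABEL}"
            )
    return problems


# Hypothetical file names, used only to show the call.
sample = {
    "maai_geri": [f"maai_geri_{i}.mp4" for i in range(11)],
    "aage_uke": ["aage_uke_0.mkv"],
}
for problem in check_videos(sample):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to fix every issue in a folder of recordings in a single pass.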

Create Dataset

Scroll down to the Prepare Data card and click ‘Create a dataset.’

Enter the dataset name, click the Video tab, and choose the Video action recognition option.

Choose the region us-central1, because not all regions support action recognition yet. For encryption, keep the default option, Google-managed encryption key, and click the Create button.
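The same dataset can also be created programmatically. The sketch below assumes the Vertex AI SDK for Python (the google-cloud-aiplatform package) is installed and that credentials are already configured; the function name and its parameters are hypothetical, and this is not the method used in the original walkthrough, which relied on the console.

```python
# Region used in this article for action recognition.
DEFAULT_REGION = "us-central1"


def create_action_recognition_dataset(project_id, display_name, location=DEFAULT_REGION):
    """Create an empty video dataset; videos and annotations are added afterwards."""
    # Imported lazily so that this module loads even where the SDK is not installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=location)
    # No encryption key is passed, so the default Google-managed key applies.
    return aiplatform.VideoDataset.create(display_name=display_name)
```

A call would look like `create_action_recognition_dataset("my-project-id", "karate-moves")`, where both values are placeholders.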

Upload Prepared Data

Choose the Upload videos from your computer option.

Once all the videos are uploaded, they take a while to be processed.

An email notification arrives once the videos have been uploaded and processed.

Label and Annotate the Videos

Label: The name of a category that we want the model to predict, in this case a karate move.
Annotation: The assignment of a label to a part of a video. Annotations are what the model learns from during training, and a single video can carry multiple annotations.

Add the labels

Three labels are created for this project:

Maai Geri: Front Kick
Oi Tzsuki Chudan: Moving Forward Stomach-level punch
Aage Uke: Face Punch Block

Annotate the videos. For this project, there were 11 annotations per label. Therefore, there were 33 annotations in total.
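The annotation count can be verified with a short tally. This is an illustrative sketch only: the (video, label) records and file names are hypothetical stand-ins for the annotations made in the console.

```python
from collections import Counter

LABELS = ["Maai Geri", "Oi Tzsuki Chudan", "Aage Uke"]

# Hypothetical annotation records: 11 (video file, label) pairs for each label.
annotations = [(f"video_{i}.mp4", label) for label in LABELS for i in range(11)]

per_label = Counter(label for _, label in annotations)
total = sum(per_label.values())

for label, count in per_label.items():
    print(f"{label}: {count} annotations")
print(f"Total: {total} annotations")  # 3 labels x 11 annotations = 33
```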

Train Model

Click on the Training tab under the Model Development Section and click on Train New Model.

Choose the dataset and the annotation set, and select AutoML as the model training method. Click the Start Training button.

AutoML enables developers with limited machine-learning expertise to train high-quality models for their business needs and build custom machine-learning models in minutes.

The screen looks something like the following image.

Model training takes about 1 hour. Once it is complete, an email notification is sent and the status of the model changes to Finished.
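For reference, an equivalent training job can be started from code instead of the console. This is a sketch under stated assumptions: the Vertex AI SDK for Python is installed, credentials are configured, and the dataset already holds annotated videos. The function name and its parameters are hypothetical.

```python
def train_action_recognition_model(
    project_id, dataset_id, model_display_name, location="us-central1"
):
    """Start an AutoML video action recognition training job and return the model."""
    # Imported lazily so that this module loads even where the SDK is not installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=location)
    # Look up the existing dataset by its numeric ID or full resource name.
    dataset = aiplatform.VideoDataset(dataset_id)
    job = aiplatform.AutoMLVideoTrainingJob(
        display_name=f"{model_display_name}-training",
        prediction_type="action_recognition",
    )
    # With the default sync=True, run() blocks until training finishes.
    return job.run(dataset=dataset, model_display_name=model_display_name)
```

Because the call blocks for the full training time, it is better suited to a script or notebook left running than to an interactive session.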

Get Predictions

From the Vertex AI Dashboard, click Batch Prediction.

In the New batch prediction tab, enter the batch prediction name, and choose the model name and the destination path. Click the Create button.
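A batch prediction job can also be submitted with the Vertex AI SDK for Python. The sketch below assumes the SDK is installed, credentials are configured, and the input is a JSONL file already stored in Cloud Storage; the function names and parameters are hypothetical.

```python
def require_gcs_uri(uri):
    """Return the URI unchanged if it points to Cloud Storage, otherwise raise."""
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return uri


def start_batch_prediction(
    project_id, model_id, job_name, input_uri, output_prefix, location="us-central1"
):
    """Submit a batch prediction job that reads a JSONL file and writes JSONL results."""
    # Imported lazily so that this module loads even where the SDK is not installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=location)
    model = aiplatform.Model(model_id)
    return model.batch_predict(
        job_display_name=job_name,
        gcs_source=require_gcs_uri(input_uri),
        gcs_destination_prefix=require_gcs_uri(output_prefix),
        instances_format="jsonl",
        predictions_format="jsonl",
        sync=False,  # return immediately instead of waiting for the job to finish
    )
```

Checking the URIs before the job is submitted catches a mistyped local path early, rather than after the job has been queued.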

In Cloud Storage, add a pred.jsonl file. JSONL (JSON Lines) is a human-readable text format in which each line holds one complete JSON object.

The JSONL file looks something like this:

{"content": "gs://cloud-ai-platform-29df1d5e-efde-4329-a061-654f3620c290/SaveTube.io-Age Uke-(480p).mp4", "mimeType": "video/mp4", "timeSegmentStart": "5.0s", "timeSegmentEnd": "8.0s"}

The JSONL file has the following keys:

content: Cloud Storage path of the video to run the prediction on
mimeType: MIME type of the video file
timeSegmentStart: Start of the video segment to run the prediction on
timeSegmentEnd: End of the video segment to run the prediction on
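When several videos need predictions, the input file can be generated rather than written by hand. Here is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the bucket path in the usage line is a placeholder.

```python
import json


def make_instance(gcs_uri, start_seconds, end_seconds, mime_type="video/mp4"):
    """Build one prediction request with the four keys described above."""
    if end_seconds <= start_seconds:
        raise ValueError("the segment end must come after the segment start")
    return {
        "content": gcs_uri,
        "mimeType": mime_type,
        # Times are written as seconds with an "s" suffix, for example "5.0s".
        "timeSegmentStart": f"{start_seconds:.1f}s",
        "timeSegmentEnd": f"{end_seconds:.1f}s",
    }


def write_jsonl(instances, path):
    """Write one JSON object per line, which is what the JSONL format requires."""
    with open(path, "w", encoding="utf-8") as handle:
        for instance in instances:
            handle.write(json.dumps(instance) + "\n")


# Placeholder bucket and file name.
print(json.dumps(make_instance("gs://my-bucket/clip.mp4", 5, 8)))
```

The resulting file can then be uploaded to Cloud Storage as pred.jsonl.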

Once the predictions are ready, an email notification is sent.

The result looks like the following image.

The prediction file looks something like this:

{"instance":{"content":"gs://cloud-ai-platform-29df1d5e-efde-4329-a061-654f3620c290/SaveTube.io-Age Uke-(480p).mp4","mimeType":"video/mp4","timeSegmentStart":"5.0s","timeSegmentEnd":"8.0s"},"prediction":["aage_uke"]}

The prediction file has the following keys:

instance: The corresponding line from the input JSONL file
prediction: The predicted value. In this case, aage_uke was predicted.

This file can be found at the destination path chosen while creating the batch prediction.
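Once downloaded, the prediction file can be read line by line. The sketch below assumes every line has the shape shown above, with an instance object and a prediction list; the function name is hypothetical, and output from other models or settings may be structured differently.

```python
import json


def read_predictions(lines):
    """Return (video, segment start, segment end, predicted labels) for each line."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        instance = record["instance"]
        results.append(
            (
                instance["content"],
                instance["timeSegmentStart"],
                instance["timeSegmentEnd"],
                record["prediction"],
            )
        )
    return results


# The prediction line shown above, used as sample input.
sample_line = (
    '{"instance":{"content":"gs://cloud-ai-platform-29df1d5e-efde-4329-a061-'
    '654f3620c290/SaveTube.io-Age Uke-(480p).mp4","mimeType":"video/mp4",'
    '"timeSegmentStart":"5.0s","timeSegmentEnd":"8.0s"},"prediction":["aage_uke"]}'
)
for video, start, end, labels in read_predictions([sample_line]):
    print(f"{video} [{start} to {end}]: {', '.join(labels)}")
```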

Conclusion

This project cost $4.30 using Google Cloud Vertex AI and Cloud Storage. Thank you for reading.