The Increasing Importance & Applicability of ML/AI Models

Nathan Agez
Published in TrackIt
Apr 15, 2021

With growing volumes and varieties of data, cheaper and more powerful computational processing, and affordable storage, machine learning and artificial intelligence (ML/AI) models are becoming increasingly common. It is now fairly easy for the modern-day enterprise to use trained ML models to automatically detect specific objects, recognize sounds, and more, and so extract more value from its data.

This article explores some of the possible applications of ML/AI models and provides some insight into what these pipelines would look like.

Services and Libraries Required

  • Matplotlib: Python plotting library used to create audio spectrograms
  • Amazon Rekognition: AWS’s machine learning service used for object detection
  • MXNet: Deep learning library used to train ML models
  • Amazon SageMaker: AWS’s service that helps developers build, train, and deploy ML models

Use Case 1: Recognizing objects in images and videos

Examples/Applications:

  • Logos
  • Faces
  • Objects
  • Shapes

What an object detection pipeline looks like:

  • Step 1: Images containing the objects and scenes to be identified are collected. The number of images needed for proper training depends on the use case.
  • Step 2: A training dataset is created to simplify the process of training. This could mean uploading and labeling images from a computer or an S3 bucket, or importing a SageMaker Ground Truth .manifest file that contains previously labeled images.
  • Step 3: A test dataset is created to evaluate the model’s performance. This could involve selecting an existing dataset or splitting off part of the training dataset for testing.
  • Step 4: A custom ML model is trained with Amazon Rekognition Custom Labels using the training and test datasets.
  • Step 5: The model’s performance is evaluated on the test dataset. The model can be later improved by adding more images to the training dataset.
  • Step 6: The custom model is used to analyze images with a simple API call.
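
Step 6 can be sketched in Python with boto3, the AWS SDK for Python. This is a minimal sketch rather than the pipeline's actual code: it assumes the model was trained with Amazon Rekognition Custom Labels and that the model version is already running. The helper names and the ARN, bucket, and key arguments are hypothetical placeholders.

```python
from typing import Any, Dict, List, Tuple


def summarize_custom_labels(
    response: Dict[str, Any], min_confidence: float = 0.0
) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs from a DetectCustomLabels response,
    highest confidence first, dropping anything below min_confidence."""
    pairs = [
        (item["Name"], float(item["Confidence"]))
        for item in response.get("CustomLabels", [])
        if float(item["Confidence"]) >= min_confidence
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def analyze_image(
    model_arn: str, bucket: str, key: str, min_confidence: float = 80.0
) -> List[Tuple[str, float]]:
    """Ask a running custom model to label one image stored in S3.

    model_arn, bucket, and key are placeholders supplied by the caller.
    """
    # Imported here so the parsing helper above can be used without boto3.
    import boto3

    client = boto3.client("rekognition")
    response = client.detect_custom_labels(
        ProjectVersionArn=model_arn,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,  # assumed threshold, in percent
    )
    return summarize_custom_labels(response, min_confidence)
```

Sorting by confidence makes it easy to act only on the strongest detection. The default threshold of 80 is an arbitrary starting point that would need tuning for each use case.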

Use Case 2: Recognizing sounds from audio segments

Examples/Applications:

  • Recognizing a person’s voice
  • Recognizing the sound a specific car engine makes
  • Detecting the sound a bird or animal makes

What a sound recognition pipeline looks like:

  • Step 1: The audio challenge of detecting a sound is converted into an image classification problem by creating spectrograms of audio segments and associating them with specific names.
  • Step 2: To train the model, a few thousand short audio samples (between 3 and 10 seconds each) are required.
  • Step 3: To simplify the handling of data, a data frame is built. The data frame contains information about the files that are being analyzed and the labels that need to be attached to the data.
  • Step 4: The original dataset is split into three datasets: training, validation, and test.
  • Step 5: The training dataset is used to train the model. The validation dataset is used to improve the model, i.e. training is tuned to increase performance on the validation dataset. In production, the model with the highest performance metric on the validation dataset is used. The test dataset is used to get a real feel for how the model will perform with unknown data.
  • Step 6: The training of the model is done on an Amazon SageMaker training instance. The Amazon SageMaker framework requires a script conforming to a predefined interface.
  • Step 7: Once the model training is complete, the custom model is deployed as a service using Amazon SageMaker. Amazon SageMaker creates an endpoint.
  • Step 8: A user can now make API calls to the created endpoint. MXNet runs the submitted data through the custom model and either returns a label with a confidence percentage or lets the user know that nothing was recognized.
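
The dataset split in Step 4 and the model selection in Step 5 can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the article's training code: the 80/10/10 proportions, the per-label (stratified) shuffling, and the function and file names are all hypothetical choices.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (path to an audio clip or its spectrogram, label)


def split_dataset(
    samples: List[Sample],
    val_fraction: float = 0.1,   # assumed proportion, not from the article
    test_fraction: float = 0.1,  # assumed proportion, not from the article
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split samples into training, validation, and test lists.

    Each label is shuffled and split separately so that every label is
    represented in all three datasets in roughly the same proportion.
    """
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[Sample] = []
    val: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = int(len(group) * val_fraction)
        n_test = int(len(group) * test_fraction)
        val.extend(group[:n_val])
        test.extend(group[n_val:n_val + n_test])
        train.extend(group[n_val + n_test:])
    return train, val, test


def pick_best_model(validation_scores: Dict[str, float]) -> str:
    """Return the name of the model with the highest validation metric."""
    return max(validation_scores, key=validation_scores.get)


# Hypothetical file names and labels, for illustration only.
samples = [(f"clips/engine_{i}.wav", "engine") for i in range(50)]
samples += [(f"clips/bird_{i}.wav", "bird") for i in range(50)]
train, val, test = split_dataset(samples)
best = pick_best_model({"epoch-1": 0.81, "epoch-2": 0.93, "epoch-3": 0.90})
```

Keeping the test list untouched until the very end is what lets it stand in for unknown data: the validation list steers model selection, so scores on it tend to be optimistic.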

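The endpoint call in Step 8 might look like the following sketch, which uses boto3's SageMaker runtime client. The request and response formats depend entirely on the inference script deployed with the model, so the content type, the JSON list of per-label probabilities, the 0.5 threshold, and the function names below are assumptions made for illustration.

```python
import json
from typing import Optional, Sequence, Tuple


def interpret_prediction(
    probabilities: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,  # assumed cut-off for "nothing recognized"
) -> Optional[Tuple[str, float]]:
    """Return (label, probability) for the most likely label,
    or None when no label reaches the threshold."""
    if len(probabilities) != len(labels):
        raise ValueError("one probability is expected per label")
    if not labels:
        return None
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None
    return labels[best], probabilities[best]


def classify_spectrogram(
    endpoint_name: str,
    image_bytes: bytes,
    labels: Sequence[str],
    threshold: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Send one spectrogram image to a SageMaker endpoint and interpret the reply.

    Assumes the deployed inference script accepts raw image bytes and
    returns a JSON list with one probability per label.
    """
    # Imported here so interpret_prediction can be used without boto3.
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",  # assumed input format
        Body=image_bytes,
    )
    probabilities = json.loads(response["Body"].read())
    return interpret_prediction(probabilities, labels, threshold)
```

Returning `None` below the threshold mirrors the "nothing was recognized" outcome, which keeps low-confidence guesses from being stored as labels.
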
Potential for Monetization

One particular benefit of labeling information is the potential for monetization — a topic that we’ve elaborated on in our previous whitepaper titled ‘The Media & Entertainment Workflow Transformation Continues: From Videotape to Non-Linear Digital to the Cloud’. Companies and individuals could leverage custom AI/ML models to sift through old media files and find images, audio, and video segments that other parties may be willing to pay for.

About TrackIt

TrackIt is an Amazon Web Services Advanced Consulting Partner based in Venice, CA, specializing in cloud management, consulting, and software development solutions.

TrackIt specializes in Modern Software Development, DevOps, Infrastructure-As-Code, Serverless, CI/CD, and Containerization with specialized expertise in Media & Entertainment workflows, High-Performance Computing environments, and data storage.

TrackIt’s forte is cutting-edge software design with deep expertise in containerization, serverless architectures, and innovative pipeline development. The TrackIt team can help you architect, design, build, and deploy customized solutions tailored to your exact requirements.

In addition to providing cloud management, consulting, and modern software development services, TrackIt also provides an open-source AWS cost management tool that allows users to optimize their costs and resources on AWS.
