Building AI Solutions for Real-World Challenges as an Intern at HTX

Wen Hui Leng
HTX S&S COE
Sep 13, 2023

Training a neural network model for optical character recognition in class, I was struck by how we can design a system that processes data in a way inspired by the human brain. This first foray into Machine Learning (ML) sparked my interest in the field. I was curious to explore how ML models can be applied to real-world problems and decided to look for an internship that would allow me to gain hands-on experience in this area.

I am Wen Hui, and I am a third-year undergraduate student studying Electrical and Computer Engineering (ECE) at Carnegie Mellon University in the United States.

I chose to major in ECE as its curriculum spans topics from low-level circuits to high-level software systems. I wanted to gain a comprehensive view of how computers operate and understand how hardware components integrate with software. Over the course of my studies, my interest in the software aspect of my major grew, including the use of data to improve decision-making.

After my first experience with neural networks in class, I went on to assist with a Computer Vision (CV) deep learning project at my university’s Robotics Institute. The exposure to various advanced algorithms further ignited my interest in Artificial Intelligence (AI), prompting me to take the CV course offered at school. In the course, I gained hands-on experience with traditional image processing techniques such as edge detection before advancing to neural networks. I found the interpretation of images using algorithms particularly exciting and decided to specialise in AI and ML.

As I was keen to gain hands-on experience with real-world data and learn the end-to-end process of software development for ML applications and cloud deployment, I sought an internship with the Sense-making and Surveillance (S&S) Centre of Expertise (CoE) in HTX. My 3-month internship from May to August 2023 at S&S proved to be an enriching and intellectually stimulating experience that shaped my understanding of ML engineering.

Vehicle Make and Model Recognition (VMMR) Project

My primary project in S&S was to develop and benchmark AI models to classify the make/model of heavy vehicles from images. The objective of the project was to develop a customised heavy vehicle classifier that can be used by the Home Team. Here’s how I went about tackling the project:

1. Data processing and preparation

The first step was to curate the dataset for model training for this specific use case. There were various types of data available, including a master-list of vehicle number plates with their corresponding makes/models, and raw images of the vehicles extracted from video footage.

To extract the vehicle number plates, I first used the open-source YOLO (You Only Look Once) object detection model to obtain cropped images containing only vehicles from the scene. These were passed through Amazon Rekognition, a cloud-based CV Application Programming Interface (API) service by Amazon Web Services (AWS), to extract the text detected from the number plates. The raw images and their corresponding detected text were then reconciled with the master-list of vehicle number plates (see diagram below).

Data processing pipeline for curating dataset of vehicle classes images

The data processing pipeline was automated using a Python data analysis library (Pandas) to match the raw images to the vehicle make/model using the number plate (shown below). This streamlines data preparation for current and future model training.

# Extract cropped vehicle images using YOLO
from ultralytics import YOLO
model = YOLO("yolov8m.pt")
results = model(source)  # source: path to a raw image or video frame

# Extract detected text using Amazon Rekognition
import boto3
client = boto3.client('rekognition')
with open(photo, 'rb') as image:  # photo: path to a cropped vehicle image
    response = client.detect_text(Image={'Bytes': image.read()})
textDetections = response['TextDetections']

# Match images to vehicle make/model via number plate using Pandas
import pandas as pd
df_image_classes = pd.merge(df_vehicle_number_classes, df_image_detected_text, on=['Vehicle Number'], how='left')
df_image_classes = df_image_classes.drop_duplicates(subset='Vehicle Number')

The top vehicle makes/models with sufficient data samples were selected for model training and benchmarking. The dataset was then split into train, validation and test sets. The train set was used to fit the model, while the validation and test sets were used to evaluate the model’s performance during the training and evaluation stages respectively. Part of the data was split into the train and validation sets with a pre-defined ratio, while the rest was reserved solely for the test set. This ensured there were no duplicate vehicle images between the train/validation sets and the test set, so that the evaluation performance would be unbiased.
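One way to sketch this split is to hold out vehicles (rather than individual images) for the test set, so that images of the same vehicle never appear on both sides. The column names, toy data and split ratios below are illustrative only:

```python
import pandas as pd

# Hypothetical dataframe of labelled images, as produced by the pipeline above
df = pd.DataFrame({
    "Vehicle Number": ["SBA1234A", "SBC5678B", "SBD9012C", "SBE3456D"],
    "Make/Model": ["MakeA", "MakeB", "MakeA", "MakeB"],
})

# Hold out a fixed fraction of vehicles as the test set, so no vehicle
# appears in both the train/validation data and the test data
test_frac = 0.25
vehicles = df["Vehicle Number"].drop_duplicates().sample(frac=1, random_state=42)
n_test = int(len(vehicles) * test_frac)
test_vehicles = set(vehicles[:n_test])

df_test = df[df["Vehicle Number"].isin(test_vehicles)]
df_trainval = df[~df["Vehicle Number"].isin(test_vehicles)]

# Split the remainder into train and validation with a pre-defined ratio
df_train = df_trainval.sample(frac=0.8, random_state=42)
df_val = df_trainval.drop(df_train.index)
```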

2. Model training

With the data preparation done, I experimented with several pre-trained image classification models such as ResNet50, VGG16, InceptionV3 and EfficientNetB7. Using the TensorFlow framework, I applied transfer learning to train these models on the dataset of full-frame images of the vehicle classes. These pre-trained models can pick up key features and increase the efficiency of the training process without the need for a large dataset. As part of the training process, I fine-tuned the models by adjusting hyperparameters such as the learning rate and number of epochs to minimise the training loss. After that, I researched different evaluation metrics for classification models and wrote a script to evaluate each model based on accuracy, performance and computing resources.
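A minimal sketch of this transfer learning setup in TensorFlow/Keras might look like the following. The class count and hyperparameters are illustrative, and `weights=None` stands in for `weights="imagenet"` (which would download the pre-trained features) to keep the sketch self-contained:

```python
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical number of vehicle make/model classes

# Pre-trained backbone; in practice weights="imagenet" loads the
# pre-trained features (None here avoids the download)
base = tf.keras.applications.ResNet50(
    weights=None, include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False  # freeze the backbone for transfer learning

# New classification head for the vehicle classes
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # tuned hyperparameter
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```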

Upon evaluation, I noticed that the model performed better on vehicle classes with more training data, possibly due to biases towards those classes with higher occurrences within the dataset (see table below).

Table of selected evaluation results for each model tried

To address this bias from vehicle class imbalance, I used downsampling on the dataset to ensure fair representation of classes during training (see chart below). This means randomly selecting a subset of the training data for vehicle classes with larger sample counts, so that the model’s weights are adjusted more evenly across all classes. Additionally, I integrated class weights into the model training process. This is a technique where each class’s contribution to the loss is scaled by a factor inversely proportional to the class frequency. With this adjustment, more significance is placed on weight updates for minority vehicle classes, reducing bias towards the majority classes.

Example of downsampling on the distribution of data across classes
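The inverse-frequency class weights described above can be computed in a few lines of Python. The class names and counts below are made up for illustration:

```python
from collections import Counter

# Hypothetical per-image labels after data preparation
labels = ["MakeA"] * 500 + ["MakeB"] * 200 + ["MakeC"] * 100

counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# Weight each class inversely proportional to its frequency, so that
# minority classes contribute more to weight updates during training
class_weight = {
    cls: n_samples / (n_classes * count) for cls, count in counts.items()
}
# Keras expects integer class indices as keys, so these names would be
# mapped to indices before passing class_weight to model.fit(...)
```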

Upon reviewing the training curves of loss and accuracy, I noticed signs of potential overfitting, where the model performs well on training data but poorly on validation data. In this case, the sign was that the training curve continued to improve while the validation curve had plateaued. A possible explanation is the limited amount of data available to train the model, due to the short data collection period.

To mitigate this, I experimented with data augmentation techniques such as adjustments to brightness and contrast, artificially creating samples from existing data to improve the model’s ability to generalise. However, this did not work, as the augmentation did not accurately represent the variation in input images. As seen from the image samples below, the original image (left) had consistent brightness levels, but the colours of the augmented image (right) looked unnatural.

With no improvement in results, I explored adding a dropout layer to the model architecture, where the outputs of randomly selected nodes are ignored during training. This reduces the model’s over-reliance on particular nodes to determine its output, which can lead to overfitting. This technique yielded positive results and was integrated into the final model design.

Data augmentation outcome on sample image from Stanford Cars Dataset
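Both techniques can be expressed as Keras layers. The sketch below shows augmentation layers of the kind experimented with, and a classification head with a dropout layer; the dropout rate and class count are hypothetical:

```python
import tensorflow as tf

# Augmentation of the kind experimented with: random brightness and
# contrast shifts applied to training images
augment = tf.keras.Sequential([
    tf.keras.layers.RandomBrightness(0.2),
    tf.keras.layers.RandomContrast(0.2),
])

# Classification head with dropout: during training, a random fraction
# of node outputs is zeroed, so the model cannot over-rely on any
# individual node
head = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),  # hypothetical dropout rate
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 example classes
])
```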

To further improve the model performance, I tried two alternative model designs.

● I used a two-step model with a detector model followed by a classification model. A pre-trained YOLO detector model detects the vehicle in the full-frame image and passes the cropped image to the classification model to predict the make/model of the vehicle. This minimises the amount of irrelevant background pixel information fed into the model. This focused classification approach proved more successful, increasing the model’s F1 score by 8%.

● Besides the two-step model, I also experimented with training a YOLOv8 detector model directly on the vehicle classes, so that the detector labels each vehicle with its corresponding make/model. This proved to be the most successful option that I experimented with, improving the F1 score by 16% over the original one-stage classification model on full-frame images. (See process below)

Summary of the model types experimented with
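The focused classification in the two-step design comes down to cropping the detector’s bounding boxes out of the full frame before classification. A minimal sketch, with the detector output stubbed as a hypothetical box list (in practice the boxes would come from the YOLO model’s results):

```python
import numpy as np

def crop_detections(frame, boxes):
    """Crop detected vehicle regions (x1, y1, x2, y2) out of a full frame,
    so that only vehicle pixels are passed to the classifier."""
    return [frame[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

# Hypothetical full frame and one detector bounding box
frame = np.zeros((480, 640, 3), dtype=np.uint8)
boxes = [(100, 50, 300, 200)]  # stand-in for a YOLO detector's output

crops = crop_detections(frame, boxes)
# each crop would then be fed to the classification model, e.g.:
# make_model = classifier.predict(preprocess(crops[0]))
```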

3. Model deployment & Software development

For model deployment, I developed a Flask REST API and containerised it using Docker. Dockerisation enables easy deployment across different environments. This back-end application performs image inference when the API is called with an input image. To test the application, I used a GitHub tutorial that my colleague developed for learning the concepts behind test-driven development (TDD). In this process, I learnt about unit testing with PyTest and API testing with Postman, ensuring robustness and efficiency of the API functions.
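A minimal sketch of such a Flask inference endpoint, with the model call stubbed out as a hypothetical placeholder:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_make_model(image_bytes):
    # Placeholder for the actual inference; in the real service this would
    # decode the image and run the trained vehicle classifier on it
    return {"make_model": "unknown", "confidence": 0.0}

@app.route("/predict", methods=["POST"])
def predict():
    # Reject requests that do not attach an image file
    if "image" not in request.files:
        return jsonify({"error": "no image provided"}), 400
    image_bytes = request.files["image"].read()
    return jsonify(predict_make_model(image_bytes))

# The app would be packaged in a Docker image and run with a WSGI server
```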

The next stage of the project was exciting for me as it was my first exposure to cloud platforms. To start off, I studied the cloud deployment pipeline that my supervisor built on AWS. This helped me grasp foundational networking concepts such as web requests, APIs and internet protocol (IP) routing. With this newfound knowledge, I started experimenting with deploying an Amazon Elastic Compute Cloud (EC2) instance, a virtual machine (VM) service on AWS.

Following that, I proceeded to implement the architecture from scratch on Microsoft Azure (as shown below). I started off by watching online tutorials and reading documentation on Azure services. As I had no prior knowledge of the platform, it was quite overwhelming at first, and I was grateful for the useful resources recommended by colleagues. Eventually, I managed to build an application on Azure that connects to a CosmosDB database. The Docker image was then deployed on an Azure VM along with a series of networking resources like the Load Balancer and Network Address Translation (NAT) gateway. The entire process required multiple iterations of trial and error and frequent clarifications with my colleagues. Over time, I became more familiar with cloud computing concepts and the Azure platform, something I had not yet learnt in school.

Cloud architecture for model deployment on Microsoft Azure

I ended my project by optimising and benchmarking model prediction duration across different CPU and GPU resources, and by helping to identify the most cost-effective option to guide the final steps of deploying the model into production.
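A simple timing harness for this kind of benchmarking might look like the following, with warm-up runs excluded so that one-off initialisation does not skew the average. The workload here is a stand-in for an actual model forward pass:

```python
import time

def benchmark(infer_fn, n_runs=20, n_warmup=3):
    """Average duration of one call to infer_fn, excluding warm-up runs
    where caches and lazy initialisation can inflate the timing."""
    for _ in range(n_warmup):
        infer_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical stand-in for a model prediction call
avg = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Running the same harness against the deployed API on different VM sizes gives comparable per-prediction durations to weigh against their cost.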

Challenges Faced

One challenge that I encountered in my project was the imperfection of real-world data. In previous projects, I had worked with more straightforward data that were already pre-processed. For this project, the dataset was more complicated, and I had to think about how to prepare and process the data. For example, the records showed the vehicle number plates with their corresponding makes/models, but there was no link to the raw images captured. As such, I had to write a script to clean and process the raw data to build up the training dataset. To verify the correctness of the data preparation, I also had to visualise sample images from the generated dataset. There were quite a few data pre-processing steps involved, which required an understanding of the type of data I was dealing with.

The other challenge was that I had no prior experience with cloud platforms. When I started software development on Azure, I was at a loss as to where to begin. I was glad that my colleague recommended some online learning resources on cloud platform services. This gave me a systematic framework to explore the available Azure services before integrating them into the model deployment architecture. It was a steep learning curve initially, especially in navigating the cloud platform. The mentorship and guidance provided by my colleagues were crucial in helping me pick up these skills and gain exposure to the available tools for the ML engineering process.

Having said that, I am glad to have met and overcome these challenges, as they gave me the motivation to learn and grow, which was the goal of my internship with HTX.

Work Environment and Team Culture

My internship would not have been as fulfilling without the supportive mentorship and guidance that I received from everyone in the team. It was easy for me to ask for advice or assistance when I encountered any challenges in my project. My colleagues would take the time to explain concepts which were foreign to me or assist me in finding the solution to a problem I was stuck on. I never felt like I was denied the opportunity to learn just because I was an intern. The open and supportive work environment made my internship an enjoyable experience!

S&S CoE Team Picture

The team has a mentorship system that supported my technical development as an intern. There were weekly check-ins with my supervisor to discuss my progress and any roadblocks I was facing. I also met with the Deputy Director at key checkpoints to seek his input on my project direction. Overall, the nurturing and supportive environment helped develop my technical skills in ML and CV.

S&S has fostered an inclusive environment with opportunities for growth and contribution for everyone in the team. With an emphasis on building relevant hands-on experience and encouraging positive learning attitudes, all the engineers, regardless of background, experience or gender, get equal opportunities to work on impactful projects ranging from software engineering to deepfake research.

The team has a culture of open communication and exchange of ideas. At meetings, every member of the team updates the others on their project’s progress and can seek input from the rest. There is also a feedback session at the end, where the Director and Deputy Directors open the floor to questions and feedback from the team. The openness to feedback and emphasis on communication are key to the cohesion of the team.

My Takeaways

This internship has given me a more in-depth understanding of the ML engineering process. My previous experiences were mainly in model training; here in S&S, I gained hands-on experience with data preparation and model architecture modifications. The significance of data processing to a model’s performance became apparent to me. I also sharpened my critical thinking skills while looking for ways to improve the evaluation results. My biggest achievement is the implementation of cloud architecture for model deployment. Working on the Microsoft Azure cloud deployment gave me broader exposure to technologies related to ML applications. This has been a refreshing learning opportunity that I would otherwise not have had in the classroom.

Beyond technical skills, I have honed my problem-solving skills, resourcefulness and adaptability. Drawing on my prior experience in Python projects, I learnt to find online resources relevant to my task, such as concepts in the TensorFlow framework and solutions for improving model performance. I realised the importance of thinking out of the box and approaching a problem from different perspectives in ML.

S&S has provided me with opportunities for growth not just in technical knowledge but also exposure to projects across HTX. It was especially rewarding to work on real-world projects that would enhance the operations of the Home Team. Besides working on my project, I also attended the launch of HTX’s innovation centre, Hatch, which serves as a collaborative space between HTX and startups. This was in conjunction with TechXplore 5 where various CoEs within HTX showcased their science and technology capabilities. The exposure gave me a more comprehensive overview of HTX’s efforts in supporting Home Team operations.

What’s next

Inspired by my internship with HTX, I plan to continue specialising in ML and CV through advanced coursework and extracurricular activities. The exposure to real-world projects at S&S has left a positive impact on me, motivating me to consider pursuing a career in this area.

I highly recommend an internship at HTX S&S for students eager to have hands-on experience in AI, CV, or software development.

I hope my article has provided insights into life as an intern at S&S and HTX. If you are interested in finding out more about my experience, feel free to reach out to me!

[1] Precision measures the proportion of correct positive predictions by the model out of the total number of positive predictions made by the model, weighted for each class.

[2] Recall measures the proportion of correct positive predictions by the model out of the total number of positive samples for that class, weighted for each class.

[3] F1 score is the harmonic mean of the precision and recall scores, calculated as twice the product of the precision and recall divided by their sum.
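As a concrete example, the three metrics can be computed for a single class from its true positive, false positive and false negative counts (the counts below are made up):

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from true positive, false positive and
    false negative counts."""
    precision = tp / (tp + fp)          # correct positives / predicted positives
    recall = tp / (tp + fn)             # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts for one vehicle class
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
# p = 0.8, r ≈ 0.667, f1 ≈ 0.727
```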
