How I went from a mechanical engineering graduate to a Master's student in Machine Learning

Udacity Deep Learning Nanodegree, AI Startup, and lessons I learned in between

Lucrece (Jahyun) Shin
6 min read · Aug 17, 2021

Summer 2018

Fresh Mechanical Engineering Graduate

As a fresh graduate with a Bachelor’s degree in Mechanical & Industrial Engineering, I was thinking about how to set the direction for my professional career. I had the option to go back to the hardware verification engineer role I had held at AMD during my professional experience (co-op) year. I could also work on some unfinished robotics projects to aim for a job in mechatronics. But something inside me kept reminding me of the Introduction to Machine Learning course I had taken in the computer science department during my last year. I was astonished by the mechanized pattern recognition processes that enable a computer to “perceive” given data. Possibly biased by the media, I felt that machine learning was a big part of what the future holds.

At that time, having a career in AI felt like a far-away mountain, given my mechanical engineering degree and lack of related experience. After some online research, however, I came across many useful blogs about switching careers to AI and MOOC resources for self-learning. At times it was confusing, since so many AI terminologies were being used simultaneously (artificial intelligence, machine learning, deep learning, data science, etc.), but I started to get a sense of them after days of research. Now I believe that the following diagram I found online is quite an accurate representation of the scope:

Source: https://www.researchgate.net/figure/Data-science-vs-data-mining-vs-AI-vs-ML-vs-deep-learning_fig1_338594198

Trying to find an organized and focused online course that would keep me motivated and on schedule, I came across Daniel Bourke’s blog post about Udacity’s Deep Learning Nanodegree. www.udacity.com was an online learning platform offering a wide range of courses, especially in AI/ML fields. Their Nanodegree was an intensive 4-month online bootcamp that required completion of multiple projects in order to graduate (and cost $1,200). They also had Nanodegrees for artificial intelligence, data science, and data analysts, but following Daniel’s suggestion, I became most interested in their Deep Learning Nanodegree.

And there it was. As crazy as it sounds, I went for it. When my classmates were applying for jobs and getting interviews, I registered for the 4-month Deep Learning Nanodegree program at Udacity.

Fall 2018 - Spring 2019

Udacity’s Deep Learning Nanodegree and NLP Nanodegree

This was a fun time filled with learning new concepts and completing various deep learning projects. Udacity’s Deep Learning Nanodegree program covered a wide range of topics including CNN, RNN, GAN and their inner workings via backpropagation, gradient descent, and different types of training losses.

Since the program was heavily project-based, I completed a related project after each unit, which involved testing a model on different datasets or reproducing results from a research paper. While doing the projects, I learned to build a deep learning model architecture using PyTorch, set up a training environment, and work with CUDA GPUs. Most of my early deep learning projects can be seen on my GitHub page. I won’t go into details about individual projects, but here’s what page 2 of my resume looked like after completing Udacity’s Deep Learning and Natural Language Processing Nanodegrees:

By completing all these projects, I got a sense of how to code the general pipeline for training a deep learning model (mostly using PyTorch): transforming the dataset into iterable batches, building a model architecture, designing a training process with appropriate loss function and optimizer, and testing the model with new data.
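That pipeline can be sketched end to end. The snippet below is a minimal, illustrative example with toy data and a toy architecture (not one of the actual Udacity projects): batching a dataset, defining a model, training it with a loss function and optimizer, and testing on new data.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# 1. Transform the dataset into iterable batches (toy data: 64 samples, 10 features)
X, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

# 2. Build a model architecture (and move it to a CUDA GPU if one is available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2)).to(device)

# 3. Design the training process with an appropriate loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(3):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

# 4. Test the model with new data
with torch.no_grad():
    preds = model(torch.randn(8, 10).to(device)).argmax(dim=1)
```

Swap in a real `Dataset`, a deeper architecture, and a proper train/validation split, and this skeleton covers most of the Udacity projects.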

Now that I look back at my work as a novice (2.5 years later), I find two major shortcomings:

  1. I did not inspect much of the given data.

The image/text data for my projects was usually provided by Udacity or an online repository, cleanly organized into folders by labels, and I did not look much closer than visualizing or printing out a few samples. When debugging a model’s performance, I focused on optimizing the model architecture or training procedure rather than inspecting the given data. This is somewhat understandable, since most of what I was learning was about how modelling and optimization were done, rather than about the data the model was learning from.

  2. I focused more on completing as many projects as possible (breadth) rather than optimizing/looking deeper into a few projects (depth).

I was so amazed at the exceptional, human-like results of deep learning models that I wanted to try many different projects across different areas, including image classification, image generation, text classification, machine translation, speech recognition, and speech synthesis. When I thought I had a reasonable outcome, I moved on to another project. Although this allowed me to taste a bit of each topic, it prevented me from digging deeper into a single one to deepen my understanding. For each project, I downloaded an open-source dataset tailored for convenient usage with organized folders. In addition, I was still intimidated by machine learning research papers.

Overall, it was an amazing experience even though I was always on a budget since I had to rely on my part-time modelling job for finances. I started applying for jobs in Spring 2019, and was fortunate enough to be offered a machine learning engineer position at a high-tech AI startup that was developing AI voice actors.

Summer 2019 - Spring 2020

Machine Learning Engineer at AI Startup

For 11 months, I worked as a machine learning research engineer at Neosapience Inc., a startup developing AI voice actors that can convey human-like emotions. Their product interface was a website where a user could type a sentence and choose a desired voice based on sex, age, and personality. I specialized in the natural language processing side, optimizing the user’s text before it was passed to a text-to-speech model that generated the voice. I developed a semantic segmentation model that splits an input sentence into shorter segments, since shorter texts were synthesized with better quality than longer ones. Other tasks included collecting English text and speech data from open-source websites such as medium.com.
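The model itself was learned, but the idea can be illustrated with a toy rule-based splitter (the function name and word limit below are hypothetical, purely for illustration): split at clause punctuation, then cap each segment’s length.

```python
import re

def split_into_segments(text, max_words=12):
    """Split text at clause punctuation, then cap each segment at max_words words."""
    clauses = re.split(r"(?<=[,.;:!?])\s+", text.strip())
    segments = []
    for clause in clauses:
        words = clause.split()
        # Greedily pack words into chunks of at most max_words
        for i in range(0, len(words), max_words):
            segments.append(" ".join(words[i:i + max_words]))
    return segments

print(split_into_segments(
    "Deep learning is exciting, and text-to-speech models often "
    "produce better audio when the input is short."
))
# → ['Deep learning is exciting,',
#    'and text-to-speech models often produce better audio when the input is short.']
```

Each resulting segment can then be sent to the text-to-speech model independently.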

Overall, it was a challenging yet enriching experience. I would summarize some of the most valuable things I learned as follows:

  • Reading machine learning research papers and applying what I learned to my research experiments
  • Seeing a company’s full AI product pipeline: machine learning R&D, backend/frontend development, UI/UX design, and business marketing
  • Working with real customer data, in contrast to the clean, already-tailored data of my Udacity projects; practicing data augmentation and pre-processing to fit the desired outcome
  • Working with great teammates who shared their problem-solving skills

I also realized some aspects of research where I needed improvement:

  • Computing quantitative test metrics (e.g. accuracy, recall) to compare results across experiments, instead of looking at a few test samples qualitatively
  • Always first inspecting the data more closely before searching for appropriate deep learning models for the problem
  • When given a problem, doing a thorough research about related papers instead of trying to build a model from scratch
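The first point costs only a few lines of code. Here is a toy sketch (with made-up labels and predictions) of computing accuracy and recall over a whole test set rather than eyeballing a handful of samples:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually caught."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

y_true = [1, 0, 1, 1, 0, 1]  # made-up ground-truth labels
y_pred = [1, 0, 0, 1, 1, 1]  # made-up model predictions

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(recall(y_true, y_pred))              # 0.75
```

In practice, libraries like scikit-learn provide these metrics out of the box (e.g. `sklearn.metrics.recall_score`), but computing them by hand once makes the definitions concrete.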

Although the work was engaging, observing my teammates, who all held graduate degrees in machine learning, made me feel that I could become a better researcher and build stronger problem-solving skills through higher education in machine learning. I applied to the University of Toronto’s Master of Engineering program in the Department of Industrial Engineering, which offered an emphasis in analytics and machine learning. I was fortunate enough to get accepted.

Summer 2020

Preparing for a Master’s in Machine Learning

As I was preparing to start the master’s in September, I contacted a few professors in my department about possible research projects in machine learning. Luckily, one professor offered me a list of projects, one of which caught my eye. Here is a snapshot from the professor’s email:

At first sight, I thought it resembled some image classification projects I had worked on before. Although it sounded quite familiar and simple (as overconfident as that sounds), since it was the only one on the list that involved deep learning, I decided to go for it. And that’s how my computer vision research journey began.
