Making the Leap from Hardware to Machine Learning, Part 2

Learning about the machine learning industry before you dive in

Matt Chang
8 min read · Mar 17, 2024

After earning a PhD in optical chip design and working 7 years in the hardware industry, I decided to make a career pivot into machine learning software. In this blog series, I discuss why I made the switch, what I found helpful for preparation, projects that I worked on, and my job search and interview process as an outsider. Throughout, I’ll highlight lessons learned and things that I wish I knew before starting. My target audience is anyone with a technical, but non-computer science background, who is considering making the leap. I hope you can learn from my journey to make your own transition smoother and faster.

This is post #2: Learning about the machine learning industry before you begin technical preparation.

If you’re interested in other parts of the process, check out the other posts in this series.

Intro

As an engineer and scientist, it’s really tempting to jump directly into watching tutorials, taking courses, working on projects, writing code, and reading papers. And to be honest, that’s exactly what I did. But in that process, I learned a lot about the machine learning industry, product cycle, and various companies. In retrospect, having this knowledge ahead of time would have let me assess my personal interest in different ML jobs and focus and prioritize my technical preparation.

In this post, I’ll highlight some core concepts that are helpful to know about the ML industry if you’re just getting started. Of course, what I write here barely scratches the surface, so I’ve also provided resources and methods so that you can dig deeper in your own time.

Sections:
· ML vs. AI vs. Software Engineering
· The ML Product Cycle
· A Machine Learning Product is Dynamic
· Research vs. Engineering
· Next Steps: Networking to Ask Questions
· Next Step: Scraping company websites
· Next Step: Keep your eye on the OpenAI residency

ML vs. AI vs. Software Engineering

The industry seems to use the terms ML and AI loosely and interchangeably, which can lead to confusion. Some people use ML to mean the traditional machine learning techniques that existed before deep learning entered the mainstream (e.g. SVMs, decision trees, KNN, PCA). These same people are likely to reserve AI for deep neural network-based methods. Yet other people, and many companies, broadly use ML to cover any of these technologies.

There’s also no standardization between role titles at different companies. Software Engineers (SWE) and Machine Learning Engineers (MLE) could refer to any or none of the above. So when you read about these roles or talk to people and companies about them, be sure to differentiate what is actually being discussed based on context or detailed job descriptions.

After a year learning about machine learning, I consider ML/AI to be a subset of software engineering. Manipulating code and data and scaling it up to serve millions of customers is something that software engineers have been doing long before deep learning arrived. Many tools, infrastructure, and best practices are entrenched in software engineering. You should be prepared not to learn ML in isolation but together with software engineering concepts.

In these posts, I will use ML as the catch-all term for both traditional and deep learning-based methods. There’s so much commonality between them that, for us hardware engineers, it’s not helpful to differentiate them in the early stages.

The ML Product Cycle

Model training gets a disproportionate amount of attention in social media and public discourse. In 2023, everyone was training their own LLM from scratch (OpenAI, Anthropic, Mistral, Meta, Google etc.) and VCs went crazy funding these efforts. But in the machine learning product cycle, model training is just one of many pieces that must work well together.

Machine Learning Pipeline Infrastructure. Source: https://developers.google.com/machine-learning/testing-debugging/pipeline/overview.

Translation: ML engineering can involve any or all of these tasks. At a larger company, you may be responsible for evaluating models, handling the data pipeline, or building the context retrieval algorithm. At a smaller company, you will probably be responsible for multiple areas. The main takeaway is not to spend all your time focused on training models, despite the hype. You should understand the other phases of the product life cycle because your future job will likely involve them, and you will certainly be asked about them in interviews.

If you want a light introduction to the machine learning product cycle and operations (i.e. MLOps), I recommend Machine Learning in Production, a Coursera course by Andrew Ng. It’s pretty high-level so you can get through it rather quickly, and it’ll introduce you to some terminology and concepts that will provide good perspective down the road.

A Machine Learning Product is Dynamic

In traditional software engineering, once you’ve coded up and deployed your product on a cloud platform, it’s “done”. You may occasionally fix bugs or make updates due to configuration changes and customer needs, but the product itself does not change much.

For a machine learning product, this is no longer true. An ML product needs to be constantly monitored and updated. This is because an ML product is composed of both the underlying model and its training data, which is effectively encoded in the model’s weights. If the training data has become stale compared to new input data, then the model’s performance will degrade.

A classic case study of this phenomenon is machine learning models used to detect credit card fraud during the Covid pandemic. These models were trained on data from before Covid, but once the pandemic hit and lockdowns were issued, consumers’ buying patterns fundamentally changed. As a result, the performance of these algorithms significantly degraded (e.g. many false positives), and banks were forced to adapt the models using more recent data. This is an extreme example, of course, but such data distribution shifts occur regularly in every machine learning product. Because the model’s training data is encoded in its weights while the real world continues to drift, a machine learning product’s performance and input data must be constantly monitored, evaluated, and analyzed. How frequently to retrain or otherwise adapt a model based on this feedback remains an open question.
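One standard way to quantify this kind of drift is the Population Stability Index (PSI), which compares how a feature is distributed in the training data versus in live traffic. Below is a minimal pure-Python sketch; the spending numbers are made up for illustration, and the thresholds in the docstring are common rules of thumb, not from any particular production system:

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample (e.g. the
    training data) and a live sample. Rough rule of thumb:
    PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift."""
    lo, hi = min(expected), max(expected)

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            # clamp values outside the reference range into the edge bins
            i = min(max(int((x - lo) / (hi - lo) * bins), 0), bins - 1)
            counts[i] += 1
        # floor at a tiny value so log() is defined for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train   = [random.gauss(100, 15) for _ in range(5000)]  # "pre-Covid" spend
live    = [random.gauss(100, 15) for _ in range(5000)]  # same distribution
shifted = [random.gauss(140, 30) for _ in range(5000)]  # "post-lockdown" spend

print(f"no drift PSI: {psi(train, live):.3f}")
print(f"drifted PSI:  {psi(train, shifted):.3f}")
```

A monitoring job would run a check like this per feature on a schedule and alert (or trigger retraining) when the index crosses a threshold.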

In the era of LLMs, there’s potentially another layer on top of this: prompting and providing context to LLMs during inference. Often, this involves retrieving information from a database based on a user query and feeding it to the LLM together with the user query. This database will likely evolve with new content and formats, as well as shifting user preferences, which in turn require the context retrieval algorithms to update.
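To make the retrieval step concrete, here is a toy sketch. The documents and the word-overlap scoring are illustrative stand-ins; production systems typically retrieve by embedding similarity from a vector database, but the overall shape of the pipeline is the same:

```python
import re

def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, k=2):
    """Rank documents by naive word overlap with the query.
    Stand-in for embedding-similarity search against a vector DB."""
    q = tokenize(query)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]

def build_prompt(query, docs):
    # The retrieved passages are fed to the LLM alongside the user query.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical support-article snippets standing in for a real database.
docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "To request a refund, email support with your order number.",
]
print(build_prompt("How do I get a refund?", docs))
```

As the underlying documents and user queries evolve, it is this retrieval layer (and its evaluation) that has to keep up, independent of the model itself.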

I highlight the dynamic nature of machine learning products because I don’t believe it’s mentioned enough in standard introductions to machine learning material. Specifically, machine learning products are not a “set and forget” technology. Once deployed, a model needs to be constantly monitored, evaluated, and updated (together with any upstream/downstream models), and this work is as important as the actual model development itself.

For a great resource to learn about deploying machine learning systems in the real world, I recommend Designing Machine Learning Systems by Chip Huyen. If you’re just getting started, you may want to hold off on reading this until you’ve learned the basics of machine learning. But if you’re familiar with the basics, this book is a great supplement discussing practical aspects of deploying machine learning products.

Research vs. Engineering

There is a distinction between ML research and engineering roles, though it may blur in some smaller companies. ML research roles usually require a publication record in major ML/AI conferences. These roles focus on cutting-edge model development and techniques. On the other hand, ML engineering roles focus more on commercializing ML products. This distinction is obvious by name, but it’s helpful to keep in mind as you scan ML job postings.

Next Steps: Networking to Ask Questions

You know how important networking is for landing interviews, so I won’t harp on that. Instead, at this stage, network to gain an insider’s perspective of ML. As an outsider, I had no idea where to begin; blogs (!), news articles, social media, and arXiv are constantly buzzing with the latest and greatest LLMs that seem to be released only weeks apart. If you have friends or colleagues who work in ML, you have a great opportunity to get a more detailed and honest description of the industry and day-to-day work.

One of the most exciting things about ML/AI is that it’s very early and changing constantly, so you won’t get exactly the same answer from any two people. Talk to as many people as you can to gain wider perspectives. Here are some questions I found helpful to ask:

  • How are ML teams organized at your company?
  • Which part of the ML product cycle do you work on?
  • What does your day-to-day job look like?
  • What new features are you working on?
  • What problems are you trying to solve?
  • What constraints drive your team’s decision making?
  • What kind of interview questions does your team ask?
  • If you had to learn ML today, what would you focus your efforts on? What would you skip?
  • What do you like most and least about your job?

Of course, everyone’s answers will differ based on their specific company and team. After all, there’s no standardization yet. But hopefully, across multiple conversations at multiple companies, you’ll be able to separate industry-wide trends from company-specific details. Also, don’t forget to keep in touch with these people; they will be your referrals down the road.

Next Step: Scraping company websites

I’ve found that many ML companies make an effort to publish blogs on their websites. The quality varies widely, but in some cases you can find excellent learning material there. If you’re interested in a specific company, I recommend scraping their website for blogs, papers, and other materials. Not only will it give you more detailed insight into the kind of work they do and the problems they’re trying to solve, it will also give you an edge when you interview there.
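Even a small script goes a long way here. Below is a hedged sketch using only the standard library; the HTML snippet and the `/blog/` URL pattern are made-up placeholders, since every company structures its site differently (in practice you’d fetch the real index page with `urllib.request` and check the site’s terms of use first):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags whose path looks like a
    blog post. Feed it the HTML of a company's blog index page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "/blog/" in href:  # placeholder pattern; adjust per site
                self.links.append(href)

# Placeholder HTML standing in for a fetched blog index page.
html = """
<ul>
  <li><a href="/blog/scaling-our-training-cluster">Scaling our training cluster</a></li>
  <li><a href="/careers">Careers</a></li>
  <li><a href="/blog/evals-in-production">Evals in production</a></li>
</ul>
"""
parser = LinkExtractor()
parser.feed(html)
print(parser.links)
```

From the collected links, you can build a reading list per company and skim it before an interview.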

Next Step: Keep your eye on the OpenAI residency

This last item is not a specific preparation step, but rather an incredible opportunity to keep on your radar. I didn’t apply for the OpenAI residency, but you might. From the website itself, the OpenAI residency…

is ideal for researchers specializing in fields outside of deep learning like mathematics, physics, or neuroscience. The program can also work well for exceptionally talented software engineers who seek to transition into full-time Research based positions in the field of AI.

When speaking with folks who applied, I learned that the interview process doesn’t expect you to have pre-existing ML knowledge. Rather, they are looking for people with exceptional math and statistics fundamentals who can be trained up as ML engineers. It is a competitive program, of course, but a golden opportunity to learn from some of the best minds in the industry. Even better, they pay a full-time salary ($210k + benefits).

Google and Meta also have residency programs, but they seem geared toward undergraduates and act as a sort of substitute for a PhD.

In the next post, I will describe the technical preparation I found most helpful for learning about machine learning and preparing for a machine learning interview.
