Switching from Software Engineer to Research Engineer
It took me about 9 months to go from being a SWE at <large_tech_company> to being an RE at <adjacent_ml_or_ai_company>. At the start of those 9 months, I knew squat about Machine Learning. I am not exaggerating when I say I did not know how linear regression worked, or what a neural network was. So, despite the terrifying results you will see if you Google search this, making the switch is possible*, and I’m hoping this article can help you to do the same.
*It is entirely possible (likely?) that I was incredibly, wildly lucky in this process, so please read the disclaimers below. Also, this article is aimed at SWEs who are looking to become an RE at an adjacent division or sister company, not a different company altogether. Finally, this article represents my personal views, not necessarily those of my employer.
You are an asset
It is important to know that strong software engineers are an asset to research organizations. Many of these orgs have mountains of unkempt research code, and they know that some solid software practitioners can go a long way to make their codebases readable and extendable. Because of this, having excellent SWE skills and mediocre ML skills can be a viable path:
Overall, you should stress in your résumé and interviews that you are a very strong, proven software engineer while showing that you are able to learn the required machine learning skills in a reasonable amount of time.
Gaining ML Experience
That being said, your skill makeup may (as mine did) fall so far into the top-left as to land in the red bar, i.e. having too little ML knowledge. Indeed, astute readers will notice that the leftmost red bar is wider than the rightmost one; in other words, companies are usually more willing to hire strong ML practitioners with weak SWE skills than the other way around. Because of this, you should plan to spend anywhere from 6 months to several years bridging that gap.
Broadly, you should be able to take a real-world problem and describe how you would build a solution end-to-end. Crucially, this entails having a strong knowledge of machine learning and deep learning fundamentals. It also means you should have a working understanding of popular architectures in areas such as computer vision, natural language processing, recommendation systems, and reinforcement learning. In practice, it’s OK to have one area you focus on, but you should at least be able to talk about the general ideas and latest advances in all those topics, or at least the ones most relevant to the company you are applying to.
General Learning
That’s a lot to learn! In general, when learning anything new, it can be useful to split activities up into 3 levels:
Intense learning is the most important by far. It involves spending extended amounts of completely uninterrupted time really banging your head against a problem. It should be painful, but rewarding. A prime example is implementing a neural network from memory using only numpy. If you’re like me, you’ll have to spend hours writing out the math on a whiteboard, writing the code, doing really in-depth debugging, and pondering why the hell it isn’t working. But this exercise of working through something by yourself, strenuous as it is, gives you a deep and permanent understanding in a way that passive learning can’t.
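To make that exercise concrete, here is a minimal sketch of the numpy-only version: a two-layer network trained on XOR with hand-derived gradients. The layer sizes, learning rate, and iteration count are arbitrary illustrative choices, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)

# The XOR problem: the classic tiny task a linear model can't solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 units (arbitrary size)
W1 = rng.normal(0, 1, (2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(0, 1, (8, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
losses = []
for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass: gradients of the MSE loss, derived by hand
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0, keepdims=True)

    # Plain gradient descent step
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(losses[0], losses[-1])  # loss should drop substantially
```

Deriving those six gradient lines yourself on a whiteboard, then debugging why they don't match numerical gradients, is exactly the kind of painful-but-rewarding work this level of learning is about.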
Moderate learning, such as reading books or blog posts, can also be valuable. The crucial objective with moderate learning is gaining intuition about a wider array of topics. For example, when reading Sutton and Barto’s excellent book Reinforcement Learning, don’t memorize the Bellman Equation verbatim.
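For reference, one common form of the equation (the optimality form for action values, as in Sutton and Barto) is:

```latex
Q^*(s, a) = \mathbb{E}\big[\, R_{t+1} + \gamma \max_{a'} Q^*(S_{t+1}, a') \mid S_t = s,\; A_t = a \,\big]
```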
Instead, intuitively understand that the value of taking an action in a particular state is the immediate reward you would get plus some discounted estimate of the best action in the next state.
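That intuition fits in a few lines of code. In this toy backup, every number is invented for illustration: the point is only the shape of the update, immediate reward plus a discounted look-ahead at the best next action.

```python
gamma = 0.9  # discount factor

# Hypothetical Q-values for the next state s', one entry per action
q_next = {"left": 1.0, "right": 4.0}

# Taking action a in state s yields reward r and lands in s'
r = 2.0

# The Bellman backup: immediate reward plus the discounted
# value of the best action available in the next state
q_sa = r + gamma * max(q_next.values())
print(q_sa)  # 2.0 + 0.9 * 4.0 = 5.6
```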
On the flip side, and contrary to popular advice, I would not recommend going through certain popular textbooks, such as DeGroot and Schervish’s Probability and Statistics. These old-school books are largely hundreds of “Conjecture: Proof” pairs that do little to actually give you intuition about what you are learning, and I forgot most of the material after finishing (even though I did many of the exercises). Instead, instructional YouTube videos, like those of Grant Sanderson, taught me much faster (but make sure to pair watching videos with some intense practice problems).
Finally, passive learning has its place. It can be nice to sit back and watch interesting videos on recent papers, or to listen in on conversations with top AI researchers while commuting. You should always be cognizant of how much of this you will forget, though, and consider that the time spent on passive learning could be better spent pondering a problem from your intense work, or simply recharging by focusing on something else entirely.
Concrete Resources
Below are some resources and activities that I found helpful during my learning process:
- Activities: Solve a personal or work problem with machine learning (if you had to pick one activity from this entire list, this would be it); Re-implement common algorithms, from linear regression to K-nearest neighbors to a neural network, from scratch; Implement an RL algorithm using the OpenAI Gym; You can even try improving on published models, either by following their “Future Work” sections or coming up with your own ideas, and oftentimes the authors will love to hear your results, positive or not (suggested by Kevin Summerian)!
- Courses: Andrew Ng has a highly intuitive Coursera Course; The UCL course taught by Reinforcement Learning legend David Silver is a great primer on RL.
- Books: Grokking Deep Learning is a good introduction to Deep Learning and has code for implementing networks from scratch. Sutton and Barto’s Reinforcement Learning book is a classic. Goodfellow’s Deep Learning Book is also a must-read, especially Parts I and II.
- Blogs: https://towardsdatascience.com/ has tons of high-quality posts and tutorials; Chris Olah’s blog is a goldmine of intuition, e.g. his articles on LSTMs and Deep Learning.
- YouTube Channels: Two-Minute Papers; 3blue1brown (especially the series on linear algebra, calculus, and probability); PyData conference talks have some good tidbits; mathematicalmonk has a really solid ML course; Steve Brunton has incredibly high-quality content, such as his course on SVD; UCL’s deep learning lecture series was recently released by DeepMind.
- Podcasts: ML Street Talk goes really deep with top AI researchers; Linear Digressions has some good content on Data Science; Lex Fridman interviews pioneers in the AI field.
- Papers: Deep Neural Networks for YouTube Recommendations; Hidden Technical Debt in Machine Learning Systems; Attention Is All You Need. In general, you shouldn’t focus too much on open-ended reading of papers. Instead, use them as a tool for a specific project you are working on.
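As a starting point for the “re-implement from scratch” activity above, here is a sketch of linear regression solved via least squares on the design matrix. The data and true coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 3x + 2 plus a little noise
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=100)

# Append a column of ones so the intercept is learned too,
# then solve the least-squares problem directly
A = np.hstack([X, np.ones((100, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coef)  # approximately [3.0, 2.0]
```

A good follow-up exercise is to recover the same coefficients with your own gradient-descent loop and check the two answers agree.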
Possible Pre-Career Paths
It can be incredibly useful to actually work with a team that you would like to join. If you are lucky enough that your company allows dedicated rotations or 20% time, you should 100% take advantage of that. If not, you may still consider reaching out to teams and spending some of your personal time doing work for them. Not many teams will turn down free labor, and this sort of real-world experience is invaluable. Additionally, showing a team that you can make an impact gives them a stronger signal than they could ever get from interviews alone, and hiring managers can point to that signal during the hiring process.
Alternatively, you may consider making lateral moves across the company to get to where you want to go. I have a friend who got hired as a UX designer, switched to SWE, and finally moved over to be an RE! Besides ladder changes, consider finding halfway points to RE, such as a SWE team that works with machine learning directly, on machine learning infrastructure, or even on products that have machine learning in them. Finally, for a less extreme route, you can become your team’s resident ML expert. Oftentimes, your team will be happy for you to curate a presentation on the latest ML technologies in the space, and you may even be able to explore adding ML-based features.
Conclusion
I’m still a fresh Research Engineer, and I’m sure my opinions on the above advice will change in the coming years. I’m absolutely open to feedback, questions, comments, etc., so please help me advance my thinking on this subject. Together, I hope we can create a useful resource for anyone trying to make the switch, and help you and others find a path to your dream job, as I have :)
Disclaimers
Now that you’ve read the entire article, here are a bunch of good reasons why you shouldn’t listen to me:
- I have a B.S. in Math and an M.S. in Applied CS. The former just means I’m decent at math, and the latter might have helped me get an interview, although in practice I don’t remember much from my college classes.
- I joined an Applied Research team, and I imagine the qualifications for an RE in the pure research part of the company would be more stringent.
- At the time of switching jobs, I had no kids and my extremely supportive wife was not working, so I had the time, freedom, and support system that others may not.
- This is a sample size of 1 person at 1 company, and while I am confident in my abilities, I’m also aware that I got stupid lucky with my interviews.