Switching from Software Engineer to Research Engineer

Kevin Villela
Oct 5, 2021

It took me about 9 months to go from being a SWE at <large_tech_company> to being an RE at <adjacent_ml_or_ai_company>. At the start of those 9 months, I knew squat about Machine Learning. I am not exaggerating when I say I did not know how linear regression worked, or what a neural network was. So, despite the terrifying results you will see if you Google search this, making the switch is possible*, and I’m hoping this article can help you to do the same.

*It is entirely possible (likely?) that I was incredibly, wildly lucky in this process, so please read the disclaimers below. Also, this article is aimed at SWEs who are looking to become an RE at an adjacent division or sister company, not a different company altogether. Finally, this article represents my personal views, not necessarily those of my employer.

You are an asset

It is important to know that strong software engineers are an asset to research organizations. Many of these orgs have mountains of unkempt research code, and they know that some solid software practitioners can go a long way to make their codebases readable and extendable. Because of this, having excellent SWE skills and mediocre ML skills can be a viable path:

Skill makeup for research engineers between Software and ML.

Overall, you should stress in your résumé and interviews that you are a very strong, proven software engineer while showing that you are able to learn the required machine learning skills in a reasonable amount of time.

Gaining ML Experience

That being said, your skill makeup (as mine was) may be so far in the top-left as to be in the red bar, i.e. having too little ML knowledge. Indeed, astute readers will notice that the leftmost red bar is wider than the rightmost one — in other words, companies are usually more willing to hire strong ML practitioners with weak SWE skills than the other way around. Because of this, you should likely take anywhere from 6 months to several years to bridge that gap.

Broadly, you should be able to take a real-world problem and describe how you would build a solution end-to-end. Crucially, this entails having a strong knowledge of machine learning and deep learning fundamentals. It also means you should have a working understanding of popular architectures in areas such as computer vision, natural language processing, recommendation systems, and reinforcement learning. In practice, it’s OK to have one area you focus on, but you should at least be able to talk about the general ideas and latest advances in all those topics, or at least the ones most relevant to the company you are applying to.

General Learning

That’s a lot to learn! In general, when learning anything new, it can be useful to split activities up into 3 levels:

Amount of energy you should spend on each type of learning.

Intense learning is the most important by far. It involves spending extended amounts of completely uninterrupted time really banging your head against a problem. It should be painful, but rewarding. A prime example is implementing a neural network from memory using only numpy. If you’re like me, you’ll have to spend hours writing out the math on a whiteboard, writing the code, doing really in-depth debugging, and pondering why the hell it isn’t working. But this exercise of working through something by yourself, strenuous as it is, gives you a deep and permanent understanding in a way that passive learning can’t.
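To make that exercise concrete, here is a minimal sketch of what a numpy-only network might look like. It assumes a single hidden layer trained on XOR with full-batch gradient descent; the function names, layer sizes, learning rate, and step count are illustrative choices of this sketch, not something prescribed above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, n_out, rng):
    # Hypothetical layout: one tanh hidden layer, one sigmoid output layer.
    return {
        "W1": rng.normal(0.0, 1.0, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    h = np.tanh(X @ params["W1"] + params["b1"])      # hidden activations
    p = sigmoid(h @ params["W2"] + params["b2"])      # predicted probabilities
    return h, p


def loss_fn(params, X, y):
    # Binary cross-entropy, summed over outputs and averaged over samples.
    _, p = forward(params, X)
    eps = 1e-12  # guards against log(0)
    total = np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return float(-total / X.shape[0])


def gradients(params, X, y):
    # Backpropagation written out by hand, layer by layer.
    h, p = forward(params, X)
    dz2 = (p - y) / X.shape[0]          # gradient w.r.t. pre-sigmoid outputs
    dh = dz2 @ params["W2"].T           # gradient flowing into the hidden layer
    dz1 = dh * (1.0 - h ** 2)           # tanh'(z) = 1 - tanh(z)^2
    return {
        "W1": X.T @ dz1,
        "b1": dz1.sum(axis=0),
        "W2": h.T @ dz2,
        "b2": dz2.sum(axis=0),
    }


def train(params, X, y, lr=0.2, steps=5000):
    # Plain full-batch gradient descent, updating parameters in place.
    for _ in range(steps):
        grads = gradients(params, X, y)
        for name in params:
            params[name] -= lr * grads[name]
    return params


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

params = init_params(2, 8, 1, np.random.default_rng(0))
initial_loss = loss_fn(params, X, y)
train(params, X, y)
final_loss = loss_fn(params, X, y)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The useful part of the exercise is deriving `gradients` yourself rather than copying it. One way to check your derivation is to compare each analytic gradient against a finite-difference estimate of `loss_fn`.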

Moderate learning, such as reading books or blog posts, can also be valuable. The crucial objective with moderate learning is that you should be gaining intuition about a wider array of topics. For example, if reading Sutton and Barto’s excellent book Reinforcement Learning, don’t memorize the Bellman Equation verbatim.

Instead, intuitively understand that the value of taking an action in a particular state is the immediate reward you would get plus a discounted estimate of the value of the best action in the next state.
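For reference, that intuition corresponds to the Bellman optimality equation for action values. One common way to write it, using the notation of Sutton and Barto's book (the symbols are not defined elsewhere in this article), is:

$$
q_*(s, a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \,\middle|\, S_t = s, A_t = a \right]
$$

Here $R_{t+1}$ is the immediate reward, $\gamma \in [0, 1]$ is the discount factor, and $\max_{a'} q_*(S_{t+1}, a')$ is the value of the best action available in the next state.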

On the flip side, and contrary to popular advice, I would not recommend going through certain popular textbooks, such as DeGroot and Schervish’s Probability and Statistics. These old-school books are largely hundreds of “Conjecture:Proof” pairs that do little to actually give you an intuition about what you are learning, and I forgot most of it after finishing (even though I did many of the exercises). Instead, instructional YouTube videos, like those of Grant Sanderson, taught me much faster (but make sure to pair watching videos with some intense practice problems).

Finally, passive learning has its place. It can be nice to sit back and watch interesting videos on recent papers or listen in on some top AI researchers while commuting. You should always be cognizant of how much of this you will forget, though, and consider whether the time you spend on passive learning could be better spent pondering a problem you had during intense work or simply recharging by focusing on something else entirely.

Concrete Resources

Below are some resources and activities that I found helpful during my learning process:

Possible Pre-Career Paths

It can be incredibly useful to actually work with a team that you would like to join. If you are lucky enough that your company allows you to do dedicated rotations or 20% time, you should 100% take advantage of that. If not, you may still consider reaching out to teams and spending your personal time doing some work for them. Not many teams will turn down free labor, and this sort of real-world experience is invaluable. Additionally, showing a team that you can make an impact gives them a stronger signal than they could ever get with just interviews, and hiring managers should flag that in the hiring process.

Alternatively, you may consider making lateral moves across the company to get to where you want to go. I have a friend who got hired as a UX designer, switched to SWE, and finally moved over to be an RE! Besides ladder changes, consider finding halfway points to RE, such as a SWE team that works with machine learning directly, on machine learning infrastructure, or even on products that have machine learning in them. Finally, for a less extreme route, you can become your team’s resident ML expert. Oftentimes, your project will be happy for you to curate a presentation on the latest ML technologies in the space, and you may even be able to explore adding ML-based features.

Conclusion

I’m still a fresh Research Engineer and I’m sure my opinions on the above advice will change in the coming years. I’m absolutely open to feedback, questions, comments, etc., so please help me advance my thinking on this subject. Together, I hope we can create a useful resource for anyone trying to make the switch, and that we can help you or others find their path to their dream job, like I have :)

Disclaimers

Now that you’ve read the entire article, here are a bunch of good reasons why you shouldn’t listen to me:

  • I have a B.S. in Math and an M.S. in Applied CS. The former just means I’m decent at math, and the latter might have helped me get an interview, although in practice I don’t remember much from college classes.
  • I joined an Applied Research team, and I imagine the qualifications for an RE in the pure research part of the company would be more stringent.
  • At the time of switching jobs, I had no kids and my extremely supportive wife was not working, so I had the time, freedom, and support system that others may not.
  • This is a sample size of 1 person at 1 company, and while I am confident in my abilities, I’m also aware that I got stupid lucky with my interviews.

Kevin Villela

AI Research Engineer who is into volleyball, performance psychology, and learning :)