Everything you need to become a self-taught Machine Learning Engineer

Jason Benn
10 min read · Mar 13, 2020

In 2011, I graduated from UVA with a degree in Economics and joined a boutique consulting firm in Virginia.

None of that is related to what I do today.

Because today, I am a Machine Learning Engineer at a well-funded ML startup in Silicon Valley, living my dream. I work on the latest in AI research, implement AI papers, and explore new product ideas in my specialty fields of NLP and ML infrastructure.

The thing you do after graduation is not always your permanent path. I absolutely hated everything about consulting. I lasted about a year and a quarter, and then I decided to quit and have a mulligan on my whole life.

I made a pros and cons list of all the possible careers I could do instead. My favorite book was Cal Newport’s “So Good They Can’t Ignore You”, which persuaded me that my career North Star should just be to learn really useful skills. By that metric, programming was at the top of my list.

“My career North Star should just be to learn really useful skills. By that metric, programming was at the top of my list.”

As I saw it, the steps to get there were: buy a bunch of textbooks, move into my mom’s basement, read those books, and program until I got a job.

ML hack: first, become a programmer

Because I was starting from scratch, I decided to join a short programming school. I signed up for Dev Bootcamp, a 3-month program designed to help me get my first job.

Dev Bootcamp has since shut down, so my #1 recommendation now is Lambda School, which is a 9-month program where you learn how to code from teachers alongside other newbies. Lambda School’s business model is special: you don’t have to pay anything upfront; instead, you pay a percentage of your income over your first two years as a programmer. In other words, Lambda School is incentivized to find you a great job after graduation, and a lot of great companies hire out of Lambda School.

You can go from a bank teller to a programmer in 9 months.

What’s a helpful career path? I think it’s important to know that there is a broad variety of jobs that use machine learning. It would be pretty hard to go straight into machine learning engineering without a background in Computer Science (CS). But it’s not as hard to go from non-CS into Data Science (and yes, Lambda School also has a Data Science program).

This article is about machine learning engineering (MLE), but remember that the foundation of any great programming career, including MLE, is computer science. The ML in MLE is only 10% of the job. Another 10% is data science. The last 80% is computer science.

After Dev Bootcamp, I learned computer science through Bradfield, a school where working programmers can take a series of one-month courses. You can take them part-time in addition to your day job. I did six between 2015 and 2017 and another one in 2019, and now they’re fully online. They’re intense (10 hours of class and 10 hours of homework a week) but worth it.

If you want to do the same without paying for their classes, they make their curriculum free online at teachyourselfcs.com.

ML hack: create a group of ML enthusiasts

Three years into my programming career, and near the end of my CS curriculum, I started looking for my next big useful skill. I picked machine learning. First, I co-organized an ML book club with a coworker. We read An Introduction to Statistical Learning: with Applications in R (which is my second-choice recommendation after Hands-On Machine Learning with Scikit-Learn and TensorFlow).

I wanted a group to keep me accountable for skill building, so I launched my own Paper Club, an initiative where two friends and I went through all of fast.ai in six months. Fast.ai is an online course with two parts. Each is about 15 hours, plus another 15 hours of homework (so, 60 hours total). We met one night per week on Zoom or in a local park. After we finished fast.ai, we started reading research papers. We would go through each paper, page by page, and discuss blockers or challenges, with the end goal of being able to understand any paper. It was tough: for the first few months, it took us 6 hours to read each paper (and each paper is only 10 pages long!). Paper Club lasted about two years, and without that social accountability there’s no way I would have been able to keep studying for so long.

After one year of Paper Club, I decided it was time to quit my job again.

I spent two months on sabbatical, learning ML all day. I focused on diving deeper into the fundamentals and reimplementing neural nets from scratch.
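
To give a sense of what "from scratch" can mean here, below is a minimal sketch in plain Python (standard library only) of three small pieces: a sigmoid, a softmax, and the hand-derived gradient for a single sigmoid neuron, checked against a numerical gradient. The function names and the one-neuron setup are illustrative assumptions, not the exercises I actually worked through.

```python
import math


def sigmoid(x):
    """Logistic function, branching on sign so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(scores):
    """Turn raw scores into probabilities; subtracting the max keeps exp() stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def loss(w, b, x, y):
    """Binary cross-entropy of one sigmoid neuron on a single (x, y) example."""
    p = sigmoid(w * x + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def backprop(w, b, x, y):
    """Hand-derived gradients: dL/dz = p - y, then the chain rule to w and b."""
    p = sigmoid(w * x + b)
    dz = p - y
    return dz * x, dz


def numerical_grad(w, b, x, y, eps=1e-6):
    """Central-difference estimate of the same gradients, used as a sanity check."""
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(backprop(0.5, -0.2, 1.5, 1.0))
    print(numerical_grad(0.5, -0.2, 1.5, 1.0))
```

If the two printed pairs agree to several decimal places, the hand-derived gradient is very likely right. That kind of check is a cheap way to know you are doing the right thing before scaling up to a full network.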

If you’re wondering about helpful books, here is a 10-week course I assembled for a few friends. And here’s a post about learning to become a SWE and an MLE and the books I read/liked (scroll to the bottom for the ML content).

My top four ML books I usually recommend are:

Hands-On Machine Learning with Scikit-Learn and TensorFlow

Introduction to Statistical Learning: with Applications in R

Deep Learning by Ian Goodfellow

Neural Networks and Deep Learning by Michael Nielsen (free online!)

All of these books are 400–500 pages long, with the first two being about statistical ML and the last two being about deep learning.

Grab these books and find your people. Look for places where other curious programmers are spending time. For me, that was Bradfield. The kind of person who spends 10–20 hours/week learning is exactly the kind of person I wanted to study with.

ML hack: find great ML papers to read and implement

The gold standard of ML competence is being able to implement most research papers (and even better, implement the entire paper in one day — and last week was my first time hitting that goal!).

To do this, you first have to find great ML papers. Follow people you admire on Twitter. And when you take ML courses, be sure to note the papers the classes mention.

My favorite spot to find ML papers to read or implement is Andrej Karpathy’s Arxiv Sanity Preserver, which aggregates recently popular papers.

Another option for hunting down a great ML paper: pick a field you’re interested in (e.g., grounded language learning) and go on a targeted search to find a “survey paper” (a review of the literature on that topic). Then follow the citations to uncover the most foundational papers on the topic, and implement those.

ML hack: keep up with the speed of innovation

Be early, but not too early.

Based on the trajectory of the research community, I would recommend learning PyTorch over TensorFlow. There are some fancy new programming languages designed from the ground up for ML, but they’re too early. Stick to Python, the language of ML.

People love asking for my go-to blogs, so here are my top recommendations: distill.pub, thegradient.pub, inference.vc, blog.acolyer.org, openai.com’s blog, and ruder.io.

Distill.pub is far and away the best blog.

ML hack: perfect your working environment

I practice deep work. Or I try to.

Deep work is an extended state of flow where you are focusing intensely and not getting distracted. It’s easy to talk about but hard to achieve reliably (much like meditation). And though it’s always hard, you can make it easier if you design your environment properly.

My average work day is structured into two deep work blocks. I start my day around 9am with a 4-hour deep work block in the morning. I stop at lunch to eat and take meetings. And then have one more deep work block until the day is done. Caffeine helps to supercharge these blocks.

I was actually profiled in Cal Newport’s Deep Work and so people often ask me how to get to this flow state. You want the secrets? I try and block all distractions. I block my favorite websites. I prefer to read books instead of blogs, which trains my brain to focus longer. I block all notifications and put my phone in airplane mode. I work in places without internet, ideally outdoors. Whenever possible, I work on pen and paper. I keep bright lighting and find that a slightly colder working temperature can feel like an extra 50mg of caffeine. I listen to brain.fm — wordless music that’s the perfect tempo for focusing.

If I’m struggling to get started, I sometimes jumpstart things with a depth ritual. I use a MacBook Pro (if you’re on a PC, install Linux) and I’ve memorized dozens (hundreds?) of hotkeys. That last one isn’t relevant to deep work, I just really love hotkeys.

“You want the secrets? I try and block all distractions.”

How do I avoid people interrupting me? It’s helpful that deep work is a cultural value at my startup, so we batch all of our meetings into the middle of the day. Otherwise, I just wear big headphones.

I also find that working with someone else is a great way to stay in flow. I love pairing (and love it more than working alone).

ML hack: get an ML project done

Hackathons might not be a go-to place to find a job, but they’re a great place to work out an ML project. During my last hackathon, my three friends and I built AI Rock Band in just under 24 hours as part of Developer Week 2020.

Find a good starter project to build up your ML skills.

Aside from homework problems and implementing little pieces of neural nets (like the sigmoid function, softmax function, or backpropagation), I would recommend picking a project with friends and hacking it out. Pick something that’s in your daily life. Example: if you live with roommates, and there are always dirty dishes in the sink, you could build (with a webcam) an image classifying network, so that whenever someone leaves a dirty dish in the sink for longer than 3 hours, the WiFi shuts off!
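
The dirty-dish idea splits nicely into two parts: an image classifier that says whether the sink is dirty, and a bit of bookkeeping that tracks how long it has stayed that way. Here is a rough sketch of the second part only. The classifier and the code that actually toggles the WiFi are assumed to exist elsewhere; `DishWatcher`, `sink_is_dirty`, and the timestamps are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD_SECONDS = 3 * 60 * 60  # the 3-hour limit from the example


@dataclass
class DishWatcher:
    """Tracks how long the sink has been continuously dirty."""

    threshold: float = THRESHOLD_SECONDS
    dirty_since: Optional[float] = None  # timestamp of the first dirty frame

    def update(self, sink_is_dirty: bool, now: float) -> bool:
        """Feed in one classifier result; return True if the WiFi should be off.

        `sink_is_dirty` is assumed to come from a webcam image classifier,
        and `now` is a timestamp in seconds (for example, time.time()).
        """
        if not sink_is_dirty:
            self.dirty_since = None  # sink was cleaned, so reset the clock
            return False
        if self.dirty_since is None:
            self.dirty_since = now  # first frame showing a dirty dish
        return now - self.dirty_since > self.threshold


if __name__ == "__main__":
    watcher = DishWatcher()
    # Simulated readings: dirty at t=0, still dirty just past 3 hours, then cleaned.
    for dirty, t in [(True, 0), (True, 2 * 3600), (True, 3 * 3600 + 1), (False, 3 * 3600 + 2)]:
        print(t, watcher.update(dirty, now=t))
```

Keeping the timer logic separate from the model means you can test it with fake readings long before the classifier works, which is handy in a 24-hour hackathon.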

ML hack: know your math

I come from a lineage that is notoriously bad at statistics. My grandfather failed his stats exam four times. My dad’s only B in college (nerd) was stats. And I got a C in stats in college — and I needed a C+ to pass.

But I wasn’t deterred. And now I’m essentially a statistician.

Linear algebra is key but all you need to get started is a four-hour YouTube series by 3blue1brown.

As for calculus, it’s nice to review it a little bit. You don’t use it a lot unless you’re doing ML research. The minimum: know enough math to not be afraid of all the symbols you see in research papers.

“Know enough math to not be afraid of all the symbols you see in research papers.”

Right now, I’m doing CS294 (Deep Unsupervised Learning by Pieter Abbeel) through Berkeley, all online. It’s a second-year grad school course with four homework assignments. Each homework involves implementing multiple papers. It’s really hard, but such a great exercise (particularly because you immediately know if you’re doing the right thing).

Realistically, you don’t need this level of mathematics to get your first job. You can become an MLE and just reuse pre-trained models (which are often packaged up as libraries on GitHub) or scikit-learn for statistical ML (as long as you have at least one textbook’s worth of theoretical background).

ML hack: tell everyone you’re looking for a job

Once you get your foot in the door, you’ll steadily improve at ML.

But first, you have to get that foot in.

For me, it was helpful to achieve conversational competence in ML. Once you hit conversational competence, it’s easier to find a job. And you get there by learning the fundamentals with fast.ai or textbooks (see above), reading papers, and ideally producing 1–4 Anki flashcards per hour of learning. If you haven’t yet heard the Good News about spaced repetition, read this primer from Michael Nielsen.

“Once you hit conversational competence, it’s easier to find a job.”

After that, it’s important to network and be scrappy. People knew I was a competent software engineer, and I was willing to take a short-term contract. A friend of mine was starting an ML company, so I negotiated a 3-month contract to do some early ML work for them. Funny enough, that contract ended up being a great experience, and I’m still with them over two years later!

In machine learning, building up your credibility is important. And you can do that by posting your work on Medium (check out my articles on multimodal feature representations and CNNs for text classification), Twitter, GitHub, and LinkedIn.

I’ve also had friends leverage their ML learnings to transfer internally at their company (from a non-ML dev role to an MLE role) or found a startup.

The biggest takeaways here are: talk about it with friends, be open to a contracting gig, embrace networking, build up implementation proof online, and establish credibility. Finding an ML job comes down to: network effects, scrappiness, and credibility.

“Finding an ML job comes down to: network effects, scrappiness, and credibility.”

ML hack: know when to do a PhD and when to skip it

If you want to go into ML research, a PhD is going to help you a lot, especially a PhD from a top program. For becoming an MLE, a PhD isn’t necessary.

Evidence of being able to implement a paper or produce high-quality writing is the right credibility to shoot for as an MLE. It shows you understand the material.

A PhD is grueling but will give you a lot of time to do original research and a support structure for doing that research, which is important for getting a job as a researcher. So if you’re inspired by the research route after MLE work (and aren’t turned off by a six-year commitment), consider a PhD.

The final ML hack: just get started!

Machine learning is a really useful skill, and it’s not too late to start learning. I’ve armed you with the right books, blogs, papers, classes, deep work, and job search hacks. And feel free to reach out and ask any additional questions.

Excited for you all to get started (you know, so I can hire you sooner).
