Engineering your own education

The journey of a lifetime

Jeff Smith
Data Engineering

--

One of the fun parts of being on a rocket ship of a startup is that I get to spend some of my time recruiting. I know that not every engineer enjoys recruiting, but personally, I really love it. When I recruit for Intent Media, I’m helping to introduce some smart, motivated people to awesome jobs where they will have every chance to succeed in their professional goals and have a great time along the way. It’s sort of like being Santa Claus for good engineers.

One of the most salient aspects of recruiting for me is the difference between people working in data science and engineering and those looking to get started in the field. I got my first job in a data science group back in 2008, before there was a lot of clarity around this emerging profession. When I was at Perlegen, our next door neighbors and softball rivals LinkedIn were doing a lot to clarify these concepts, but it was clear even in the biotech world that there sure were a lot of important jobs having to do with data that required different sorts of engineers than had been involved in the past.

Now, when I talk to young mathematically inclined people who can write some code, they all want to talk about data science. Recent grads with degrees in things like electrical engineering, physics, and biomedical engineering are all rushing to brand themselves as data scientists. I understand the desire, but I often have to hold myself back from getting all Lloyd Bentsen on them.

http://youtu.be/O-7gpgXNWYI

I understand the young technical person’s desire to align themselves with a hot profession, overflowing with good paying jobs. As a strategy it makes sense, but young engineers can fail to appreciate the negatives such an approach has.

What I want to know most when I meet a young engineer is not what they know; it’s what they’ll learn in the first year that I work with them.

So when a twenty-two-year-old physics grad tells me that he has a strong background in data science, I get quite concerned about where he sits on the Dunning-Kruger curve. It’s okay to know that you know nothing. If I’m considering working with you, I’d prefer that you know that you know nothing. It’s cool. I know that I also know nothing, but with a bit of luck and strategy, we might find a way to implement the next feature story in our backlog. We just need to make sure that a little thing like hubris doesn’t get in the way of learning the things we need to know to solve the real-world problems in front of us.

Learning how to learn

I enjoy it when a junior engineer asks what they should learn to get a job in data science and engineering. It shows that they appreciate that there is more to be learned. But I would love to be asked how they should learn those things.

Even though no one asked, I’m going to tell you anyway. This is not another post on, “What does a data scientist do?” Those resources are out there. I’m going to assume you can Google as well as me.

Instead, I want to focus on the lifelong process of becoming a better engineer. Because I work on fun big data stuff, that certainly influences my perspective and what people want to talk to me about. But I think these tools and techniques apply equally to all kinds of engineers, including those oh, so sexy data scientists. They’re engineers too.

The Old (and New) School

You can’t talk about learning and software qualifications without wading into the contentious territory of whether or not programmers need a CS degree. This is a complicated issue that some people have very strong opinions on. I’ve blogged about it in the past, and I would encourage you to read that post.

To summarize my views on formal CS education, I’d say that it’s great preparation if you can get it. Of course, I worked for a fair bit in software without having a degree in CS, so I don’t think that it’s mandatory to have the qualification. But it should be mandatory to have the knowledge, at least some of it. When I worked in software without a formal qualification, I was useful because I had still learned all sorts of things about software.

How did I learn those things? In my case, I learned mostly from books and practice. When I did finally decide to get a CS degree, it was in large part to signal to employers that I had in fact read those books and practiced the material in them. If I had to sum this up in a nice prescriptive bullet point it would be something like this:

ProTip: If you can’t get a degree in CS, at least read the textbooks and do the work. That will give you most of the benefit of formal CS education.

Of course, the 2014 version of this advice has to acknowledge that if you can do that, then you can probably do a bunch of MOOCs. Coursera, Udacity, edX, and all the other MOOC providers offer world-class education for little to no money at any pace (I prefer to watch lectures at 2X, personally). There’s nothing stopping you from getting just as good at programming and statistics as the best and brightest from Stanford and MIT.

I started taking MOOCs during that first experimental session taught by Thrun, Norvig, Ng, and Widom at Stanford. While I was traveling across Indonesia for a month, I was also traveling through the inner depths of databases, guided by Jennifer Widom.

For various reasons, it was one of the best courses that I’ve ever taken in any topic. If you haven’t taken it, you should, especially if you’re interested in data science or engineering. Much of what data engineers do is to solve the old problems of databases at a new scale. Having a solid foundation in databases ends up being useful nearly every day of your working life.

More generally, if you’re an ambitious engineer who wants to do great things in tech, I have a hard time taking you seriously if you don’t take advantage of world-class resources like MOOCs. Lots of us come from modest backgrounds that didn’t necessarily give us much of a chance to go to a place like Stanford. But now that Stanford is just giving away their education, you need to step up to the plate if you have big data dreams.

Learning on the clock

I think that most smart junior engineers have an appreciation for the material in the previous section. Most promising candidates I meet understand that they need to be as well prepared as they can be to snag a fun, challenging tech job. The point I see many miss, though, is that their tech job is the beginning of the next phase of education, not the end of their education.

One of the most important things a professor ever said to me was the following.

Everything that I’m going to teach you in this class was invented after I left school.

The professor was Dennis Kroll and the topic of the course was Java development (that thing I now do for a living). His point was about the duration of our educational journey as engineers. If we wanted to stay useful and relevant as engineers, we were going to have to continually find ways to learn more, throughout our professional careers.

If that sounds unreasonably challenging, I have good news for you: we have the tools to make this easy. Working in an exciting technology company involves all sorts of learning as an embedded part of the work.

For example, at Intent Media, within the past year we’ve had study groups of various sorts around TDD, management, QA, and functional programming. If you’re a student getting bored and exhausted with tedious homework, this might sound like a negative to you, studying more once you already have a job. It’s actually a huge benefit. These sorts of opportunities give you a chance to continue to get better at the things you do and the things you want to do in the future, guided and supported by the people who know your work the best, your coworkers.

I would be lying if I said that every company offered you the same level of opportunity to learn and grow. I’ve been at a bunch of great tech and biotech startups, filled with smart, motivated people who wanted to continue growing and learning for the rest of their lives. Not every place is like that. In my one tenure at a Fortune 500 company, the environment was so unstimulating and unsupportive that I ended up learning bioinformatics at night, mostly out of frustration and boredom.

That was a younger me. I like to think that a more mature me would have found a better way to engage my colleagues in developing a stronger learning culture at work. Since that time, I’ve been entirely in the startup world. Startups can be incredible places to learn, when built properly.

Beyond simple things like study groups, startups can be places where people experiment with and rapidly adopt new techniques that are going to allow you to develop and grow as an engineer more effectively. I’m thinking of things like agile squads (which we just adopted at IM) to organize the teams and lean coffee to run meetings. These are techniques which will have an explicit impact on how well you are supported and guided to become a better engineer. The more hierarchical and bureaucratic organizational techniques common at larger companies can actively prevent an engineer from learning and growing. Startups can offer a stark contrast to those sorts of environments.

Photo from Intent Media

Another great technique that a company can use is pair programming. If you’ve never tried it, pair programming might sound scary and invasive. It’s not. In fact, pair programming is one of the best tools that an organization can use to ensure that all of its engineers are growing and learning. When you pair program, you as a junior engineer are directly in the code learning from a more senior engineer how they write good code. The examples could not be less abstract; you learn from implementing your actual work. This is miles away from the frustrating artificiality of school homework.

But pair programming at its best is not just a one-way street. It’s a great way for everyone to learn from everyone else. Even if your more advanced code knowledge is limited to the 10 lines of Java you wrote yesterday, that’s still 10 lines of code that I know nothing about, and if I’m on your team, I want to learn what you know about it.

As a cultural practice, pair programming promotes a more egalitarian environment where it’s explicitly recognized that all engineers have knowledge to share and deserve the opportunity to learn from each other. On the data engineering squad that I work on, we’ve formalized this with this saying:

We are all teachers and students.

By which we mean that we should all be humble enough to learn from each other.

Fundamentally, many of these practices come from valuing equality. Obviously, equality is a value that has huge impact at the broader social level, but I think that it’s also a fundamental part of the connection between how we work and how we live. I believe that the tech professionals who agree that unconferences are the best way to learn from each other are the same sort of people who are going to fight for a more democratic society.

This applies the same at the team level. As I talked about in this post and my recent talk at CITCON, data engineering teams need to ensure that all team members have a shared understanding of the problem, the application, the techniques, the failure modes, etc. One of the obvious first steps for a team looking to build reliable machine learning software is to ensure that everyone has some basic level of education in what machine learning is and how it works.

When I worked for AI pioneer Ben Goertzel, he never hesitated to teach new hires concepts as basic as the difference between precision and accuracy. If he has the time to teach new members of his team, then I think that all senior engineers or data scientists should have the time to ensure that their team members know what they need to know to be able to make valuable contributions. Training team members so that they understand the problem domain should be a given.

And, of course, teaching a topic is also one of the best ways to learn a topic. So, as a junior engineer grows into a senior engineer, they may find themselves switching formal roles from student to teacher, but they should still be learning through teaching. One of my primary motivations in starting up blogging again was to do a bit of teaching as part of my personal educational activities.

Learning how to choose

I hope all of the above material is useful to the junior engineer looking to understand how they develop the skills to get a good tech job. I understand that it can be hard. Despite the enormous shortage of software engineers, it can still be quite difficult to find the right job when you’re just starting out.

The well prepared software engineer is likely to have a wealth of choices, though.

Even in that situation, it can be hard to understand the basis for which you should choose Company A over Company B. Given how important learning and growth are to your future as an engineer, I’d strongly encourage you to think about how you’re going to grow before and after you get that first exciting tech job. It’s been my experience that startups offer great environments to learn and grow as an engineer. If you asked me to recommend just one company, I think it’s clear whom I would recommend.

But regardless of whether you come to work with me, I think that you should work at some place that values your lifelong development into a better technical professional. My best advice on how to get into such a place is to start thinking and acting in terms of that scale: your entire working life as an engineer (even if you wind up in management someday).

You owe it to yourself and everyone who helped you get this far to never stop learning. The journey of becoming a better engineer is never ending but never disappointing, either.

--

Jeff Smith
Data Engineering

Author of Machine Learning Systems @ManningBooks. Building AIs for fun and profit. Friend of animals.