I graduated in the first cohort of Berkeley’s Master of Information and Data Science program in August 2015. After graduation, I applied for a number of Data Science jobs, and ultimately accepted a position at Airbnb. Since I was one of the first from my degree program to go through the job hunt process, I decided to share some of what I learned along the way. This blog post is adapted from the notes I wrote at the time. Note that these opinions and advice are all my own, not Airbnb’s.
I organize this post into the following sections:
- Finding Jobs: Factors to Consider
- Researching Companies
- Preparing for Interviews
- The Offer Stage
Finding Jobs: Factors to Consider
Data Sophistication: Analysts vs. Researchers
Job titles in this field are not very informative. A “Data Scientist” at one company could do the same sort of work as a “Data Analyst” at another. The job responsibilities for a “Data Scientist” could vary wildly not only between companies, but often within companies. There’s a sliding scale for sophistication required in data jobs. On the low end, it’s primarily business intelligence work: lots of Excel and PowerPoint, possibly with some basic SQL thrown in. On the high end, statistics and CS PhDs construct novel machine learning algorithms and deploy them in production environments. Thus, the only way to understand whether the job you’re applying for matches your desired level of sophistication is to read the requirements carefully and, if possible, to talk to people at the company.
One common way that companies deal with the need to have data workers at different levels of sophistication is to have two different data teams. The first is often called something like the Product Team or Analytics Team. Example job titles on these teams include “Product Analyst,” “Analyst,” “Senior Analyst” or “Data Scientist in Analytics.” This team focuses on tactical, product-based decision making. They do a lot of exploratory analysis and reporting. Sometimes they also run experiments. They write SQL queries, and may also use R or Python. I’ll label these as Analyst jobs.
The second is often called something like the Data Science team, Core Data Science, the Research team, or Advanced Analytics. Example job titles on these teams include “Data Scientist,” “Researcher” or “Machine Learning Expert.” This team is usually made up entirely, or almost entirely, of PhDs. They tend to work on data products, prediction problems, and machine learning. Sometimes they do experiments. Sometimes they’ll publish white papers. They tend to work on long-term, less tactical, more high-level projects. I’ll call these Researcher jobs.
I read a lot of Data Science job descriptions over the course of my job search. I came to feel that most Data Science jobs comprise a combination of three tasks: exploratory data analysis and reporting, experimentation, and machine learning. My Airbnb colleague Robert Chang independently came up with a similar task taxonomy when he was a Data Scientist at Twitter. You can read his excellent description of these three tasks, plus ‘data pipelining’, in his blog post. Analyst jobs are usually high on exploratory data analysis and reporting. Researcher jobs are usually, but not always, high on machine learning. Either can include experimentation.
There tend to be many more Analyst jobs available than Researcher jobs. Analyst jobs often, but not always, pay less. Many companies will not need Researchers for many years, as Researchers tend to require large quantities of clean data and solve specialized problems that may not be high priorities for companies until they are sufficiently mature.
A lot of people are drawn to data work because they hear about the sophisticated analysis and high salaries of the Researchers. I’ve talked with many people who don’t have any graduate degrees, but who are interested in getting into this kind of work. The usual strategy is to get an Analyst position with the hopes of transitioning to Research work. However, companies vary a lot in their willingness to let people move between the groups. If you’re thinking of making this leap, check to make sure that there’s precedent for this kind of move within the company. You might also ask whether there is mentorship available in the areas that interest you, and whether you can occasionally work on projects that move you closer to the type of work you’re hoping to do.
Keep in mind that recruiters and interviewers will be trying to figure out where you fall on the sophistication spectrum, both in terms of evaluating your fit for a given job description and in terms of figuring out your market value. I confused a few interviewers when I mentioned my machine learning projects from grad school, as I wasn’t applying for their companies’ machine learning specialized jobs.
One final word on this topic, which probably deserves its own post. In grad school, I asked every visiting Data Science panelist or speaker the same question: What is the most common misconception about Data Science among newcomers to the field? I almost always got the same answer, that Data Scientists are always using the most sophisticated methods available. In a business, what truly matters is doing impactful work, not sophistication for its own sake. A lot of important business questions can be answered with univariate bar charts. Even if you have a prediction problem to solve, it’s often not worth going beyond a basic logistic regression. More complex methods are usually less interpretable and harder to implement efficiently at scale, so the marginal improvement in performance they provide is not worth the additional costs. Sophisticated work can unquestionably be extremely impactful in the right situation, but if you’re primarily motivated by the thought of pushing the limits of neural networks rather than doing the dirty work of improving a product incrementally, academia might be a better fit for you than industry.
Distributed vs. Centralized
Some Data Science teams are organized in a centralized fashion. These teams function as an “internal consulting group” to which other teams bring problems. These teams’ Data Scientists tend to sit together and work together. They tend to report to other Data Scientists.
Other data teams are distributed. These teams’ Data Scientists are embedded in product teams, sitting with engineers and designers rather than with one another. They may report to managers on product teams rather than to other Data Scientists.
There are tradeoffs for each style. The centralized style means the Data Scientists work more closely together and learn more from one another. They can retain more objectivity in their work since they’re not reporting to product managers who are incentivized to use data to present a sunny picture. In the distributed style, meanwhile, the Data Scientists work more closely with their business partners. They have better context when working to solve problems, and are often able to have more of a tactical impact.
Most companies follow the centralized model in the beginning, when there are only a handful of Data Scientists, and then move closer to the distributed model over time. Many data teams, mine included, end up somewhere in between the two styles.
Small vs. Large
Companies vary wildly in the size and maturity of their data teams. There are, of course, lots of tradeoffs between working at large versus small companies in general, but some of these tradeoffs are specific to Data Scientist roles. If you work at a smaller or less mature data team, you’ll likely be building the data infrastructure yourself. The ‘Data Scientist’ job ends up looking a lot like a Data Engineer job. This blog post does a great job of describing the evolution companies go through regarding data infrastructure. Additionally, if there are few Data Scientists, you’re less likely to have good mentorship, as there are fewer people to learn from.
On the other hand, since the team hasn’t been around for as long, early Data Scientists can make bigger impacts on the company and answer more fundamental questions. They may prepare data that is shown directly to the company’s board members. They tend to get more freedom to work on what they think will be valuable. Also, they are in a better position to become managers if the data team grows in size down the road. Conversely, larger teams may have better data infrastructure in place and more opportunities for mentorship, but their projects will have narrower scopes.
Sources for Job Postings
I found job postings through a variety of sources:
- Recommendations from friends
- My grad school’s job board
- LinkedIn’s job board pages
- KD Nuggets / Kaggle / other Data Science website forums or job boards
- Looking at where Insight Data Science and Galvanize bootcamp graduates went after graduating
- Andy Rachleff’s list (see below)
- AngelList, for startups
A Data Scientist I met during my job search introduced me to Andy Rachleff’s writing. He’s a Stanford Graduate School of Business professor who was a cofounder of the fintech company Wealthfront and the venture capital firm Benchmark Capital. He publishes this blog on the Wealthfront website that has a lot of great advice for tech workers. He originally shared it with Stanford MBAs only, but he eventually opened it up to the public. I don’t agree with all of it, but it’s still very informative. In particular, he has a guide to careers in Silicon Valley that I found persuasive. Every year, he publishes a list of “career launching companies” that he thinks are good to join, following his own advice in the career guide. I found this list a valuable source of tech companies to investigate. Another similar list is available here.
Posting on Facebook
When I started my job search, I posted on Facebook to solicit my friends for job opening suggestions. The post received a ton of attention, and people came out of the woodwork to send me great job openings that I never would have found otherwise. I highly recommend doing so if you’re able to announce publicly that you’re looking to change jobs.
LinkedIn and Second Degree Connections
I never, never, never applied to any companies without an introduction to someone who worked at the company. I assumed that blind online applications have a 0% success rate, or close to it. Instead, once I was interested in a company, I would use LinkedIn to find a first- or second-degree connection at the company. I would write to that connection, asking to talk to them about their experience at the company and, if possible, whether they’d be able to connect me to someone on the Data Science team. Whenever I could, I did in-person meetings (coffee or lunch) instead of phone calls. As an aside, Trey Causey recently wrote a great post on how to ask for just these kinds of meetings. I would never ask for a job directly, but they would usually ask for my resume and offer to submit me as an internal referral, or put me in touch with a hiring manager. If they didn’t seem comfortable doing so, I wouldn’t push it; I’d just thank them for their time and move on. It was helpful to contact people for several reasons. For one, being an internal referral gives you much better odds than being a random person on the Internet. Another advantage is that talking to real employees teaches you things that you could never learn otherwise: how the interview process works, what challenges the company is facing at the moment, what tools they use, what they value in candidates, whether it’s a good match for you, etc. This strategy was time-consuming, but productive.
Researching Companies
I mostly vetted companies through conversations with employees, as described above. However, other tools were helpful. In particular, Glassdoor company reviews gave me a good general sense of how people felt at each company. Glassdoor can also give you a vague sense of how people are compensated. Quora sometimes gave good color on how companies were perceived in the tech industry. I would look up my interviewers on Quora, Github and LinkedIn to get a sense of what they were interested in and, in some cases, gauge what kinds of questions I was likely to get.
Some people choose to hire professional recruiters / headhunters for finding Data Science jobs. I didn’t do so, and I’m pretty wary of the concept. Recruiters’ incentives usually aren’t aligned with yours. They often want to get you into a job as soon as possible in order to get paid. A lot of the best companies don’t work with recruiters. Also, recruiters are expensive, and their help isn’t necessary in my opinion.
Preparing for Interviews
The Cadence of Interviews
The interview process usually went something along these lines:
- Talk to a recruiter on the phone / light phone screen
- Takehome data homework problem
- Technical phone screen
- Onsite interview
- Sometimes, a second onsite interview
I made a personal website to showcase my projects. It was the first thing I did after updating my resume. I used SquareSpace, which made it easy to do, though it is somewhat expensive. (If you go with SquareSpace, make sure to support your favorite podcast by using their coupon code!) I tried to make the page as visual as possible. I knew most people wouldn’t want to read much of the text, so I kept the text short. I linked to my Github and my project websites whenever possible. I made the first project on the page my grad school capstone project. What I didn’t realize is that people almost always read only the first project on the page. Therefore, do your best to have one great project, put it first, and don’t expect people to read anything else on the page!
Don’t Focus on Kaggle; Instead Do Your Own Projects
A lot of people recommended that I do Kaggle competitions to get ready for interviews. I found that to be generally unhelpful advice, unless you’re specifically applying to machine-learning-only jobs. Interviews, and most real jobs, rarely feature such clean data and well-defined problems. As mentioned above, most jobs don’t involve building complicated machine learning algorithms to eke out the last 2% of accuracy. Instead, I practiced for interviews by scraping data from the web and doing my own analysis projects. I’d talk about these projects passionately since I picked topics that interested me. Additionally, since I was using real, messy data, it was a closer proxy for the actual job I’d be doing.
You will need SQL, in some form, at almost every data job. I got a ton of challenging SQL questions in interviews. By challenging, I mostly mean questions that involved tricky joins and subqueries. I read this book over a few hours to brush up and found it fairly helpful.
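To give a flavor of what “tricky joins and subqueries” can look like, here is a small practice sketch. It is not a question I was actually asked: the users and bookings tables, their columns, and the data are all hypothetical, and I run the query through Python’s built-in sqlite3 module so it is easy to try at home. The question it answers is: which users spent more in total than the average user, counting users with no bookings as zero?

```python
import sqlite3

# Hypothetical schema and data, invented purely for practice.
SETUP = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'FR'), (4, 'FR');
INSERT INTO bookings VALUES
    (1, 1, 100.0), (2, 1, 300.0), (3, 2, 50.0), (4, 3, 200.0);
"""

# The LEFT JOIN keeps users who never booked; COALESCE turns their NULL sum into 0.
# The subquery in the WHERE clause computes the average total across all users.
QUERY = """
WITH totals AS (
    SELECT u.user_id, u.country, COALESCE(SUM(b.amount), 0) AS total_spend
    FROM users AS u
    LEFT JOIN bookings AS b ON b.user_id = u.user_id
    GROUP BY u.user_id, u.country
)
SELECT user_id, country, total_spend
FROM totals
WHERE total_spend > (SELECT AVG(total_spend) FROM totals)
ORDER BY total_spend DESC;
"""


def above_average_spenders():
    """Return (user_id, country, total_spend) rows for users above the average spend."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in above_average_spenders():
    print(row)
```

The detail interviewers tend to probe in a question like this is the join type: an inner join would silently drop the user with no bookings and change the average.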
Pick R or Python
A lot of companies have at least one homework stage. They send you data and ask you to answer a question using it. You usually get to pick your favorite tool for this task. In the real job, it’s great to know several languages. In job interviews, however, pick one language and stick with it. You have to be fast in interviews. You don’t have time to switch back and forth. I realized halfway through an interview that I knew how to visualize and clean data fastest in R but had only ever done machine learning in Python. That interview didn’t go so well. That weekend, I studied up on the caret package in R and used R exclusively for the rest of my interviews.
Learn your basic data manipulation functions and presentation methods really well
Speaking of these homework assignments, most of them involve manipulating data quickly. If you’re going with R, learn dplyr and ggplot2 like the back of your hand. The same goes for pandas and matplotlib if you’re using Python. Also, you’ll have to present your work at the end of every homework assignment, so get comfortable with R Markdown in R or IPython Notebooks in Python. I usually just wrote my code in R Markdown from the start and then did a little bit of editing at the end to make a presentation, saving myself precious time for analysis.
Prepare for “Why us?” and “What feature or product would you add?”
Most companies ask why you’re interested in them and, if you could add a feature to their offerings, what you’d add. Sometimes they ask for your favorite feature of their product. All of these test your knowledge of and interest in the company. Have prepared answers to all of them before every interview. Obviously, use the company’s product before the interview if you can.
Use the Product
Speaking of using the product, if you can go deep into the product beforehand, you can help yourself stand out. When my dad went to interview at Money magazine, he showed up at the interview with a list of article ideas he knew they hadn’t covered in the last year of issues. They hadn’t asked or expected candidates to go this extra mile, and they were impressed! I tried to follow his example for companies I was especially excited about. For instance, before applying to Slack, I built a Slackbot. I stayed at an Airbnb the night before my final round interview at Airbnb and interviewed my host about his experience on the platform. This sort of activity won’t get you the job itself, of course, but it can help you express that you are really committed to a company.
Ask good questions at the end of interviews
You’ll want to have questions ready for the interviewer at the end of the interview. Two that I liked to ask were: 1) “Do you have any concerns about my skillset or background that I can address for you?” and 2) “If you could go back in time to the day you started at this company and give yourself a piece of advice about the job, what would you tell yourself?” The first allows you to play offense a bit. It shows you’re confident in yourself. The second tends to be one they don’t have a prepared answer for, and usually teaches you something valuable about the company and the job. Even if you don’t end up working at that company, you tend to get some good advice for wherever you do end up working!
Another one from Zach Beaver, a Data Scientist friend at Nest: “I liked to ask ‘If you had to give your company’s data infrastructure/cleanliness a grade (A — F), what would it be and why?’ It allowed me to perceive two things: (1) To what extent might my work devolve into data engineering tasks and (2) How mature is the company? Is there a lot of technical debt? I’ve heard that private equity firms often ask this question in a circuitous way by requesting some kind of analysis of the company they’re investigating; if it takes the company longer than a day or two to respond to the request, they infer that the company’s data infrastructure is lacking.”
Specific Resources I Used to Prepare
Here’s a list of resources I used during the preparation process:
- Data Science Interview Questions: This resource was just okay. It’s a list of interview questions and some links to related answers on Quora. I found it worthwhile, but just barely.
- An Introduction to Statistical Learning (a machine learning textbook)
- Quora for learning about Data Science and for learning about companies
- Various statistics textbooks (specifically this one)
- Coursera (specifically Andrew Ng’s class on machine learning and the Johns Hopkins Data Science Specialization)
- Email newsletters: R-bloggers, DataScienceWeekly.org’s newsletter, Center for Data Innovation’s newsletter, O’Reilly Data Newsletter
- Google Alerts for companies I was interviewing at
- Attending various Data Science meetups in the Bay Area
- Twitter accounts of people such as Hadley Wickham, Wes McKinney, David Robinson, Josh Wills, William Chen, Jeff Hammerbacher, Peter Skomoroch, Hilary Mason, Hilary Parker, and Mike Conover
- Podcasts such as Data Skeptic, Not So Standard Deviations, Re/code Decode for specific tech company CEO interviews
- YouTube Channels such as Data Driven NYC (so good!) and Wrangle Conference
Some Example Questions I Got in Interviews
- Here’s experiment data. Analyze it. What do you think?
- Here’s a bunch of data. Build a model to predict this metric. Why’d you do it that way?
- Here’s a scenario. What dependent variable would you pick? What experiment would you run to measure impact?
- Here are some schemas for a database. Write SQL to answer this question with this data.
- Multiple choice test which required calculating answers from data quickly
- Basic algorithms + data structures questions, big-O notation, etc.
- A management consulting / business school style business case
- Here’s a Data Science scenario. What questions would you want to ask of data?
- My favorite interview question: “Pick a Data Science project you did, and let’s walk through it in detail”
- Fit interviews, talking about the product, my interests and my working style
- Didn’t get any probability questions or code whiteboarding, but many peers did
Basically, anything goes.
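For the “here’s experiment data, analyze it” style of question, one common starting point (certainly not the only one, and not necessarily what any given interviewer is looking for) is to compare conversion rates between control and treatment with a two-proportion z-test. The sketch below assumes a simple A/B test with one binary outcome per user; the function name and the example counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: conversions and users in control (hypothetical inputs).
    conv_b / n_b: conversions and users in treatment.
    Returns (absolute lift, z statistic, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Made-up example: 200 of 2,000 control users converted vs. 250 of 2,000 in treatment.
lift, z, p_value = two_proportion_ztest(200, 2000, 250, 2000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

In an interview, the arithmetic is usually the easy part. Expect follow-up questions about whether the metric is the right one, whether the sample size was decided in advance, and whether the lift is large enough to matter to the business.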
The Offer Stage
Learn about RSUs and Options
If you’re applying to companies that offer equity or stock options, as many tech companies do, then learn about RSUs and options before you get to the offer stage. This Andy Rachleff post is a great start. Especially for options, this Github page is a great resource.
Don’t Negotiate Early / Don’t Give a Number
As it says in the Github page above (under “Negotiation Tips”): “Companies will always ask you what you want for compensation. And you should always be cautious about answering. If you name a number that you’ll accept, you can be fairly sure the company won’t exceed it, at least not by much.”
Most companies did ask me early in the process. I tried not to be the first to say a number whenever possible, answering things like, “What do you think the position is worth?” Once I had offers, though, I definitely communicated more openly about them to negotiate.
“So, where else are you applying?”
Most companies seemed to ask where else I was applying. I was pretty uneasy about this. They probably just want to know what their competition is, but maybe they’re going to judge me based on where else I’m applying. (A competitor? Companies they don’t respect? A different type of job than the one they’re offering?) Also, I was worried that if I got rejected by Company X, I’d have to tell this to Company Y and it’d hurt my candidacy at Company Y. In general, I tried to give companies a rough idea of the sorts of companies I was applying to, and was very detailed about where I was in the process so they knew how fast they had to move before I had offers on the table elsewhere, while still withholding company-specific details.
I’d love to hear your thoughts about applying to Data Science jobs, so feel free to respond to this post below. Finally, if you know anyone you think would benefit from seeing this post, please send it their way.
I’d like to thank Robert Chang, Zachary Beaver, and Kate Vinton for reading earlier drafts of this piece.