How to stand out in a crowd: Tips for transitioning from academia to data science
Most people who receive a PhD will not remain in academia after graduate school. Instead, they will likely enter “industry” and apply the skills they’ve learned during graduate school in a new field (if they are lucky…). For many graduates in the quantitative sciences, this new field is data science!
I made this transition myself a couple of years ago and, rather than join one of the many companies hiring data scientists these days, I co-founded Strong Analytics, a Chicago-based data science consulting firm. We help organizations of all sizes research, build, and deploy machine learning algorithms and other data products — from startups you haven’t heard of yet to companies like Mercedes-Benz and Dictionary.com.
Although we are a small (read: “boutique”) firm, we have received a lot of applications for our open data science roles. The job market for data science and machine learning is very hot — meaning there is both lots of interest in hiring data scientists and lots of competition for data science positions. After all, who wouldn’t want “the best job in America”?
Ex-academics from quantitative sciences like physics and the social sciences are typically very competitive for these roles because of their experience with applied statistics. Nevertheless, if you’re an aspiring data scientist graduating from your PhD this spring/summer, here are a few things that you can work on to boost your chances of standing out in the crowd.
Speaking from experience, although PhD graduates typically have a solid understanding of applied statistics in theory and in clean, experimental settings, what they often don’t have is much experience applying statistics to messy, real-world data in service of building and delivering a data product.
But this is an easy problem to solve: go build something! Take what you know about statistics and apply it in a novel way to an interesting set of data. Along the way, you’re going to learn a lot that makes you super valuable to a new employer, namely that:
- Real-world observational data is way messier than even your “messiest” experimental data (did I mention that nothing is actually normally-distributed?).
- Statistical modelling is a small part of delivering a useful data product.
- It feels amazing to ship a finished product!
What could you build? A web app is nice, maybe something using Plotly Dash or Shiny. A Python package or R package is really great too, and perhaps easier if you don’t want to get involved in the web side. (That said, if the web stuff is making you nervous, you should probably do that too…)
Just build something, and tell folks about it!
Learn a new programming language
Most aspiring data scientists leave graduate school knowing either R or Python. Few know both, which makes sense: your Science paper isn’t going to get rejected based on the programming language in which you did your analyses.
But now that you’re transitioning to data science, learning another programming language is really valuable, for several reasons. First, every time you learn a new language, you learn a little more about the languages you already know, and programming in general. Knowing more than one language is an excellent signal of your programming proficiency. Second, most firms (including ours) bounce back and forth between programming languages to use the best tool for each task. (For the record, we tend to do our exploratory analytics and research in R but build and deploy machine learning pipelines in Python).
Practice your SQL
Every project we do begins with the data. After finding out where the data are (not always trivial), we need to query them to extract what we need for our work.
SQL is the universal language of data. Once you have a decent grasp on SQL syntax, you can query standard relational databases (e.g., Postgres, MySQL), data warehouses (e.g., Redshift), and distributed data stores (e.g., Hadoop, S3; via Hive, Spark, Presto, Athena…). (In truth, many of these systems use different SQL dialects, but the differences between them are greatly outweighed by their similarities).
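To make the point about SQL concrete, here is a minimal sketch of the kind of aggregate query that carries over, with small dialect tweaks, to most of the systems mentioned above. It uses Python’s built-in sqlite3 module with an in-memory database so it runs anywhere; the `orders` table, its columns, and the sample rows are all hypothetical and exist only for illustration.

```python
import sqlite3


def average_order_value_by_region(rows):
    """Load (region, amount) rows into an in-memory SQLite table and summarize them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, made up for this example.
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", rows
        )
        cursor = conn.execute(
            """
            SELECT region, COUNT(*) AS n_orders, AVG(amount) AS avg_amount
            FROM orders
            WHERE amount IS NOT NULL   -- messy data: drop rows with a missing amount
            GROUP BY region
            ORDER BY region
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data, including one row with a missing value.
sample_rows = [
    ("midwest", 10.0),
    ("midwest", 30.0),
    ("south", 5.0),
    ("south", None),
]
summary = average_order_value_by_region(sample_rows)
print(summary)  # [('midwest', 2, 20.0), ('south', 1, 5.0)]
```

The SELECT / WHERE / GROUP BY / ORDER BY skeleton is the part worth practicing; the connection code around it is what changes from one data store to the next.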
If you know SQL by the time you begin your first role, you are much less likely to stumble over the first and arguably most important hurdle in your work as a data scientist.
Perhaps you have already established a decent grasp on SQL? In that case, level up on the engineering side by learning about the various SQL execution engines that parse your query and return the results you want. Learning more about the underlying query execution engines of popular data stores will enable you to write more efficient queries and better identify possible data access/storage challenges in your future projects.
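One low-stakes way to start peeking at an execution engine is to ask it how it plans to run your query. The sketch below uses SQLite’s `EXPLAIN QUERY PLAN` through Python’s sqlite3 module and compares the plan for the same query before and after adding an index. SQLite is only a stand-in here (other engines have their own `EXPLAIN` variants with different output), and the `orders` table and `idx_orders_region` index are hypothetical names chosen for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan for a query, without running the query itself."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the description of that step.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")  # hypothetical table

sql = "SELECT amount FROM orders WHERE region = ?"

# Without an index, the engine has to scan the whole table.
plan_before = query_plan(conn, sql, ("midwest",))

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_region ON orders (region)")
plan_after = query_plan(conn, sql, ("midwest",))

print("before:", plan_before)
print("after: ", plan_after)
conn.close()
```

The exact wording of the plan varies across SQLite versions, but the first plan should describe a full scan of `orders` and the second a search using `idx_orders_region`. Reading plans like these is a small first step toward understanding why one query is fast and another is not.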
By no means are the suggestions above requirements for applying to a data science position. That said, they should definitely help your chances of landing the data science job you’ve always wanted by showing you can make an impact in your new role on day one.
Oh, and if you find you love building data products, tackling difficult programming/engineering challenges, and flexing your SQL skills, send your resumé to us at Strong!