Data Science Resources For Beginners

Carson Forter
Oct 17, 2017


I occasionally have conversations with people who are interested in getting into data science but don’t know where to start. There’s a ton of information out there, so to make it a little less intimidating, I’ve put together a list of resources to help you get a handle on the most basic data science skills.

This list is not exhaustive and is geared towards beginners who are interested in following a generalist path. In other words, going through the resources below will not make you an expert machine learning engineer, but it will set you up to land an entry-level analyst or data science position.

SQL

SQL is a simple language for retrieving data from a database. Pretty much every interview and data science job requires you to know and write SQL.

  1. Get the basics from this Codecademy tutorial
  2. Get advanced lessons from Mode’s fantastic walkthroughs
  3. Check out my article on SQL interviews in data science once you feel comfortable.
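
To give a flavor of what "retrieving data from a database" looks like, here is a minimal sketch that runs a typical aggregation query against an in-memory SQLite database using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are all made up for illustration; real interview questions and production schemas will look different.

```python
import sqlite3

# Hypothetical example data: the table name, columns, and rows are invented
# purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (2, 10.0), (3, 50.0), (3, 15.0)],
)

# Number of orders and total spend per user, biggest spenders first.
query = """
    SELECT user_id, COUNT(*) AS n_orders, SUM(amount) AS total_spent
    FROM orders
    GROUP BY user_id
    ORDER BY total_spent DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

SELECT, GROUP BY, and ORDER BY (plus JOIN, which this sketch leaves out) cover a large share of everyday analyst queries, so they are a reasonable place to focus first.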

Statistics and R

Learning statistics is probably the biggest challenge when starting out, but you don’t need to go super deep right away. A really thorough understanding of the basics (broadly speaking: distributions, hypothesis testing, confidence intervals) is better than a mediocre understanding of tons of advanced stuff.

This Coursera course provides a great foundation. It covers all the important basic statistical concepts you need to know and has you implement some of them in R, a programming language for data analysis that you’ll want to be familiar with. In particular, pay close attention to the first three sections: Central Limit Theorem and Confidence Intervals, Inference and Significance, and Inference for Comparing Means.

In terms of R, I’d recommend learning how to use ggplot2 to visualize data, and dplyr to transform and manipulate data, after you’ve worked through the above course.
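
If you want to see a confidence interval in action before (or alongside) the course, the following is a rough sketch in Python using only the standard library. It is not taken from the course, and the same exercise is easy to do in R. It assumes an exponential population with a true mean of 1.0, a sample size of 100, and the normal approximation (z = 1.96) for a 95% interval; all of those choices are illustrative.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    return mean - z * std_err, mean + z * std_err


# Assumed population: a skewed (exponential) distribution with true mean 1.0.
TRUE_MEAN = 1.0
n_trials = 2000
covered = 0

for _ in range(n_trials):
    sample = [random.expovariate(1.0 / TRUE_MEAN) for _ in range(100)]
    low, high = mean_confidence_interval(sample)
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / n_trials
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

Even though the underlying data is far from normal, the share of intervals that contain the true mean should land close to 95%. That is the Central Limit Theorem doing its job: sample means are approximately normal once the sample is reasonably large.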

Experiments and A/B Testing

Essentially, you just need to understand very thoroughly why correlation does not (ever, no matter what anyone says) equal causation, unless you have some source of random variation, such as randomly assigning people to a treatment or control group. I had a great class on this in my master’s program, but that’s probably overkill for most people. This is a really good HBR article that summarizes the basic principles.

Hopefully what you’ll take away from this is why random assignment is necessary to establish causality. I also have an article that talks about some of these ideas in the introduction.
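
A small simulation can make the point about random assignment concrete. The sketch below is an illustrative toy model, not something from the articles mentioned above: it assumes a hypothetical "engagement" trait that drives the outcome, and a feature whose true effect is exactly zero. When engaged users opt in on their own, a naive comparison still shows a large "effect"; when assignment is a coin flip, the difference shrinks to roughly zero.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable


def difference_in_means(assign_treatment, n_users=20000):
    """Mean outcome of treated users minus mean outcome of control users.

    The treatment has NO real effect here: the outcome depends only on a
    hypothetical 'engagement' trait plus random noise.
    """
    treated, control = [], []
    for _ in range(n_users):
        engagement = random.random()  # hidden confounder between 0 and 1
        outcome = 10 * engagement + random.gauss(0, 1)
        if assign_treatment(engagement):
            treated.append(outcome)
        else:
            control.append(outcome)
    return statistics.mean(treated) - statistics.mean(control)


# Observational data: more engaged users are more likely to opt in.
observational = difference_in_means(lambda e: random.random() < e)

# Experiment: a coin flip decides who gets the treatment.
randomized = difference_in_means(lambda e: random.random() < 0.5)

print(f"Self-selected difference: {observational:.2f}")
print(f"Randomized difference:    {randomized:.2f}")
```

The self-selected comparison picks up the difference in engagement between the two groups, not any effect of the feature. Random assignment balances engagement across groups on average, which is why the randomized difference stays near zero.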

Product Terminology

While not a technical skill per se, one of the potentially frustrating things for new entrants to the data science field can be the terminology. A co-worker of mine recommended this book on analytics as a good way to get accustomed to all the industry-specific language you’re expected to know in an interview and on the job.

That’s all for now. I’ll update this list periodically if I come across new material that I think is worthwhile. Feel free to shoot me questions over Twitter or LinkedIn.
