Data Science Books to Stuff Your Stockings

Clare Corthell
Dec 12, 2015

Happy Holidays! 🎉🎄🎀 It’s that time of year when I hide inside, drink hot toddies, and throw all my personal data from 2015 into spreadsheets, notebooks, and distributed clusters to examine the year through data.

Top 10 Bestselling Data Science Masters Books

The Open Source Data Science Masters is the most popular Data Science Masters curriculum on the internet, and the only open source one. It spans statistics, linear algebra, algorithms, distributed computing, machine learning, natural language processing, visualization, and data analysis.

While the curriculum at its core is open source, it also includes great books from publishers who require payment for their products. As people click through to books, I can see what they end up buying. I’ll assume that buying something demonstrates intent to read it — here’s the list of the most popular Data Science reads this year!

The Most Popular Books in 2015

Data Analysis with Open Source Tools

A crowd favorite, widely considered full of insights though lighter on application and technical details. The author, a consultant, draws on diverse experience to build intuition for evaluating data analysis solutions.

How to Solve It: A New Aspect of Mathematical Method

It really excited me to see this so close to the top of the list. When my dear friend Nate Stockham was teaching me linear algebra, he required that I read this. It undoubtedly changed the rigor of my thinking and even sparked an interest in proofs (remarkable). It was written in 1944; the book is brilliant and succinct, and its author is essentially pragmatic. This book could just as well have been called I Will Teach You Not to Fear Math.

Anyone can be great at math, but abstraction scares most people away. In fact, math is where children first encounter abstract concepts. Math is a special, explicit language for solving problems described by numbers, and How to Solve It details how to approach this language in the mathematical problem-solving process (and beyond).

Statistics in a Nutshell

This is one of the books I can’t leave behind. It concisely covers fundamental statistical concepts with clear explanations. For example, did you know:

There is no absolute agreement among statisticians about how to define outliers, but nearly everyone agrees that it is important that they be identified and that appropriate analytical techniques be used for data sets that contain outliers. Basically, an outlier is a data point or observation whose value is quite different from the others in the data set being analyzed.

Wow. Opine on Twitter!

Stats in a Nutshell is literally at hand on my desk, next to my phone (i.e., the Instagram machine) and my coffee.

Frankly, I’m appalled that Anthropologie doesn’t have these mugs.
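The outlier passage quoted from Statistics in a Nutshell deliberately leaves the definition open. As one illustration, and not the book’s own prescription, here is a minimal sketch of a widely used rule of thumb, Tukey’s fences: flag any value more than 1.5 interquartile ranges below the first quartile or above the third. The sample data and the 1.5 multiplier are assumptions chosen for the example, and quartile conventions vary between tools.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # "inclusive" is one of its two interpolation conventions (Python 3.8+).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep the original values that fall below the low fence or above the high one.
    return [v for v in values if v < low or v > high]


# Made-up sample: one value sits far away from the rest.
data = [12, 14, 14, 15, 16, 17, 18, 19, 95]
outliers = tukey_outliers(data)
print(outliers)
```

Raising `k` makes the rule more permissive; other definitions (z-scores, domain-specific thresholds) are equally legitimate, which is exactly the quoted point.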

Python for Data Analysis

Wes McKinney’s invaluable book introduces Python’s open source data analysis suite, including his own package, pandas. I highly recommend starting with this book if you’re new to working with data structures and data analysis. I used pandas to load and aggregate my Amazon book data in three lines of code:

import pandas as pd
# header=1: take the column names from the file's second row
amz = pd.read_csv('amz2015.csv', header=1)
print(amz['Item Name'].value_counts())
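To try the same recipe without the original export, here is a minimal self-contained sketch. The file contents and item names are made up; the assumption that the real export has a title line above the column header is only inferred from `header=1`, not confirmed. Because `value_counts()` sorts from most to least frequent, `head(n)` yields a top-n list like the one in this post.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'amz2015.csv': a made-up title line comes first,
# so header=1 is needed to read the column names from the second line.
fake_export = io.StringIO(
    "Hypothetical orders report\n"
    "Item Name\n"
    "Book A\n"
    "Book B\n"
    "Book A\n"
    "Book C\n"
    "Book A\n"
    "Book B\n"
)

# header=1 takes column names from the second line and discards the first.
amz = pd.read_csv(fake_export, header=1)

# value_counts() counts each distinct title, most frequent first.
counts = amz["Item Name"].value_counts()

# The n most frequently ordered titles (n=2 here).
top_titles = counts.head(2)
print(top_titles)
```

Note that `value_counts()` counts rows, so each order line counts once regardless of any quantity field the export might carry.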

Doing Data Science: Straight Talk from the Frontline

From Rachel Schutt’s Data Science course at Columbia comes an application-focused technical book touching on all the topics of the trade. It offers broad exposure to applications for readers with less industry experience and a hunger to learn how to build applications that solve real business problems. Topics include the data science process, spam filters, logistic regression, financial modeling, causality, visualization, data journalism, and more. One of the best features of her teaching: it always centers on an applied example with code.

Editor’s picks, you ask?

Top stocking stuffers for any data-curious or already-infatuated person:

Dataclysm

I’ve gifted this beautifully typeset book to a number of people already. Even as a data person, I was surprised that I couldn’t put this one down. Christian Rudder is a fantastic storyteller with insights derived from real human behavior in the unconventional data playground of dating and love.

The Information

I’ve read this one twice, and I’ll probably read it again. Gleick’s unique and detailed style gives a new lens on the stories of Ada Lovelace, the telegraph, African talking drums, and every other technology that has carried information.

The Signal and the Noise

I thought you’d never ask! This is one of the best pop-sci books about data ever written. It explains why acts of terrorism, earthquakes, and global warming are, for very good reasons, almost impossible to predict. This one is foundational: read it.

Amazon Order Numbers

I knew you’d be curious.

Data Analysis with Open Source Tools                        114
How to Solve It: A New Aspect of Mathematical Method         89
Statistics in a Nutshell                                     79
Python for Data Analysis                                     48
Doing Data Science: Straight Talk from the Frontline         35
Data Science from Scratch: First Principles with Python      23
Think Bayes                                                  22
Think Stats                                                  20
Linear Algebra                                               18
Envisioning Information                                      15
The Visual Display of Quantitative Information               14

How could I not be pleased to see Tufte so well represented? People have their hearts in the right places.

Conversion stats, because you’re curious:

Product Link Conversion  0.51%
Product Link Clicks      35063
Total Items Ordered       1104

But I’m a woman concerned with edge cases — so of course I looked at the long tail.

Everything Is Illuminated: A Novel                                            1
D'Addario EJ83L Gypsy Jazz Acoustic Guitar Strings                            1
101 Nights of Great Sex: Sealed Secrets. Anticipation. Seduction.             1
NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence  1
Ameda Purely Yours Breast Pump - 17070                                        1
Philips Sonicare HX6062/64 Diamondclean Replacement Brush Heads, Standard     1
Dallas Cowboys Men's NFL Ugly Sweater Cardigan                                1
NOW Foods Liquid Melatonin, 2-Fluid Ounces                                    1

This data hereby defeats any stereotypes of techies being literarily unread, unmusical, unsexed, unfunny, male, unhygienic, poorly clad, or inartistic.

And one poor soul out there needs sleep.
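For the curious: the long tail falls out of the same `value_counts()` result as the top of the list, by keeping only the items whose count is exactly 1. A minimal sketch, using a small made-up Series in place of the real order data:

```python
import pandas as pd

# Made-up order lines standing in for the 'Item Name' column.
items = pd.Series(
    ["Book A", "Book A", "Book A", "Book B", "Book B", "Guitar Strings", "Melatonin"],
    name="Item Name",
)

# Count each distinct item, most frequent first.
counts = items.value_counts()

# Boolean indexing keeps only the items that were ordered exactly once.
long_tail = counts[counts == 1]
print(long_tail)
```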
