My Reading List as a Data Scientist

Chris Choy
4 min read · Nov 1, 2016


A collection of good reads that I recommend in 2016.

General Reading

For analytics-related books, I recommend those that are practical and actionable.

A fairly easy read that gives you a head start in understanding how data is used in practice, told through real-world stories.

This set of books relates more closely to the marketing analytics work I have been doing. Marketing and Sales Analytics: Proven Techniques and Powerful Applications from Industry Leaders is a guide to change management in an organization.

This book by Cole Nussbaumer Knaflic is a quick but worthwhile read, packed with practical tips that make your charts better.

Data Warehouse

A compilation of articles and tips that is useful for data warehousing. The Kimball compilation is not to be missed.

If you would like a concise introduction to the star schema design pattern:

Closely related to the star schema is PowerPivot in Excel. With a properly designed star schema, you can slice and dice your data quickly with PowerPivot.
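To make the idea concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date) and the sample rows are made up for illustration and do not come from any of the books. One fact table holds the measures, and each dimension table holds the attributes you slice by.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and two dimension tables (hypothetical names and data)."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Books"), (2, "Games")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, "2016-10"), (2, "2016-11")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (1, 2, 15.0), (2, 1, 20.0),
                      (2, 2, 5.0), (1, 2, 2.5)])


def sales_by_category_and_month(conn):
    """Join the fact table to its dimensions and aggregate, pivot-table style."""
    return conn.execute("""
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_category_and_month(conn)
for category, month, total in rows:
    print(category, month, total)
```

Because every dimension joins directly to the fact table, any combination of dimension attributes can be grouped with the same query shape, which is roughly what a pivot tool does behind the scenes.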

Python

To get you started in Python, the book by the author of pandas is not to be missed:

To become more effective in Python, read the following book. It is packed with examples that explain pitfalls and more advanced features of the Python language.

R

When I first learned R back in 2006, the book Modern Applied Statistics with S by Venables and Ripley was one of the better ones. I’m sure the world has moved on. The following is the modern recommendation:

A good reference book for inspiration:

Theoretical Probability / Statistics / Optimisation

The books in this section are more theoretical and might not be of interest to everyone.

The following book is a good theoretical introduction to many machine learning algorithms.

For a good and concise introduction to probability:

My go-to reference book for frequentist statistics:

Packed with examples, this is my go-to book for Bayesian analysis.

Another book from Ross. It contains an easy-to-follow introduction to Markov Chain Monte Carlo.
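For a flavour of what Markov Chain Monte Carlo looks like in code, here is a minimal random-walk Metropolis sampler in plain Python. It is a sketch of my own, not taken from the book: the function name metropolis, the step size, and the standard normal target are all assumptions chosen to keep the example short.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density returns the log of the target density up to an additive constant.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a move from a symmetric Gaussian centred on the current point.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # On rejection the chain stays at x, and x is recorded again.
        samples.append(x)
    return samples


def standard_normal_log_density(x):
    """Log density of N(0, 1), dropping the normalising constant."""
    return -0.5 * x * x


samples = metropolis(standard_normal_log_density, start=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(variance, 2))
```

The sample mean and variance should land close to 0 and 1. The sampler only needs the density up to a constant, which is why the method is handy for Bayesian posteriors.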

I find this book easy to follow and packed with useful walkthroughs. Essential if you are going to write your own optimization algorithm.

Correlation does not imply causation. Under some circumstances, however, you can actually infer causation, so not all is lost. Discover more via the following classics.

Technical Books that not everyone might find useful

It is helpful to have more than one tool in the toolkit. The following book introduces you to some basic command-line programs such as cut, sort, and jq, which are helpful for getting a quick glimpse of the data.

It is this book that allowed me to learn more about functional languages, e.g. Clojure. I could see the benefits of the functional programming paradigm in data science.

Another book in the series that gives you broad exposure to different types of databases such as MongoDB, PostgreSQL, and Redis. It includes a brief introduction to the CAP theorem, which is one of the reasons why there are so many different types of databases.
