There are so many great resources out there to learn data science and analysis for free. Over the last year I have read quite a few data science books, and I wanted to share some of the best here. If you are studying or practising data science and haven't read these books, I really think they are worth adding to your reading list for 2019. Below are the 10 I have found most useful over the last few years, all of which are currently available online.
Automate the boring stuff
I really love this book. It is a simple introduction to getting started with Python from a practical point of view. Although it is not specifically a data science book, it covers most of the basic concepts behind using Python for data science, including flow control, functions, web scraping, working with CSV and JSON files, and running programs. It is very much aimed at absolute beginners, so it is a great book for getting started with Python. As well as step-by-step instructions for each technique, every chapter ends with practice questions and problems.
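To give a flavour of the CSV and JSON handling mentioned above, here is a minimal sketch using only Python's standard `csv` and `json` modules. It is not code from the book: the column names and values are made up, and an in-memory string stands in for a file on disk.

```python
import csv
import io
import json

# Made-up CSV content standing in for a file such as a hypothetical scores.csv.
CSV_TEXT = """name,score
alice,90
bob,75
carol,82
"""

def csv_to_records(text):
    """Parse CSV text into a list of dicts, converting the score column to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "score": int(row["score"])} for row in reader]

def records_to_json(records):
    """Serialise the records to an indented JSON string."""
    return json.dumps(records, indent=2)

records = csv_to_records(CSV_TEXT)
payload = records_to_json(records)
print(payload)
```

With a real file you would pass `open("scores.csv", newline="")` to `csv.DictReader` instead of the `io.StringIO` wrapper.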
Data science at the command line
I started using Python for data analysis purely in Jupyter Notebooks. However, over time I found that using the command line made me much more efficient in my work. For example, I can very quickly obtain data, run programs and search through files, all by typing commands and pressing enter in the terminal window. This book is a highly accessible and comprehensive guide to data science at the command line. With working examples throughout, it covers how to obtain, clean, explore, model and interpret data via the command line.
Think Stats 2e
By Allen B. Downey. The book can be downloaded as a PDF, and code examples and solutions are available from a GitHub repository.
This is a really practical overview of statistics for data science. The book draws mainly on one data set, the National Survey of Family Growth from the US Centers for Disease Control and Prevention, to explain the core concepts in probability and statistics needed for data science and analysis. It is another highly practical book, with lots of example Python code and simple programs to explain the concepts. It is much more lightweight than many of the more theoretical textbooks on this subject, and I found it really suited my learning style.
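Think Stats builds up ideas such as the probability mass function (PMF) directly in code. The sketch below is a standard-library-only illustration of that idea, not the book's own `thinkstats2` code: it builds a PMF from a made-up sample and computes the mean and variance from it, then checks them against Python's `statistics` module.

```python
import statistics
from collections import Counter

# Made-up sample values, for illustration only (not data from the book).
sample = [39, 40, 39, 38, 41, 39, 40, 37, 39, 40]

def pmf(values):
    """Probability mass function: map each distinct value to its relative frequency."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}

def pmf_mean(p):
    """Mean of a PMF: sum of value * probability."""
    return sum(value * prob for value, prob in p.items())

def pmf_var(p):
    """Variance of a PMF: probability-weighted squared deviation from the mean."""
    mu = pmf_mean(p)
    return sum(prob * (value - mu) ** 2 for value, prob in p.items())

p = pmf(sample)
print(p)
print(pmf_mean(p), pmf_var(p))

# The PMF-based results should agree with the population statistics below.
print(statistics.mean(sample), statistics.pvariance(sample))
```

Working from the PMF rather than the raw list makes the definition of the mean and variance explicit: each distinct value is weighted by its probability.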
Python data science handbook
This is a really comprehensive guide to Python for data science, building from beginner to advanced concepts. There is a chapter on IPython, which made a real difference to my efficiency as a data science practitioner. The book also covers NumPy, data manipulation with pandas, visualisation methods, and Machine Learning. The Machine Learning chapter in particular is really good, covering both the practical use of the various libraries and the nuts and bolts of how the algorithms work.
R for data science
I mainly work in Python, but I still find it really useful to have at least a working knowledge of R. I have often found that if a good library for a particular method is not available in Python, R usually has one. This book is a really comprehensive guide to doing data science with R, and covers everything from data visualisation and transformation, to the R workflow, to data modelling.
Probabilistic Programming and Bayesian methods for hackers
In the author's own words, this book is an attempt to “bridge the gap between Bayesian mathematics and probabilistic programming”, and I believe it does this very well. As with Think Stats, it moves away from heavily theoretical textbooks and offers practical use cases for Bayesian inference, putting computational understanding first and mathematical understanding second. It is another Python-based book with lots of practical examples, and it predominantly uses the PyMC libraries.
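To illustrate the "computation first" approach without any third-party dependencies, here is a minimal sketch of Bayesian inference by grid approximation. Note that this is a swapped-in technique: the book itself uses PyMC and MCMC sampling, whereas this example simply evaluates prior times likelihood on a grid of candidate values. The coin-flip data (7 heads in 10 flips) and the function name `grid_posterior` are hypothetical.

```python
def grid_posterior(heads, flips, grid_size=101):
    """Grid-approximate the posterior for a coin's probability of heads.

    Assumes a uniform prior over [0, 1] and a binomial likelihood.
    Returns (grid, posterior), where posterior sums to 1.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # uniform prior: every candidate equally plausible
    # Binomial likelihood up to a constant: p^heads * (1 - p)^(flips - heads)
    likelihood = [p ** heads * (1 - p) ** (flips - heads) for p in grid]
    unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalised)
    posterior = [u / total for u in unnormalised]
    return grid, posterior

grid, posterior = grid_posterior(heads=7, flips=10)
post_mean = sum(p * w for p, w in zip(grid, posterior))
post_mode = grid[posterior.index(max(posterior))]
print(f"posterior mean ~ {post_mean:.3f}, mode = {post_mode:.2f}")
```

With a uniform prior the exact posterior is Beta(heads + 1, flips − heads + 1), whose mean here is 8/12 ≈ 0.667, so the grid result can be checked against the closed form. Grid approximation only scales to a handful of parameters, which is why libraries like PyMC turn to sampling instead.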
Machine learning yearning
This book has been released in draft by Andrew Ng this year. It is designed to teach data scientists how to structure Machine Learning projects, and set direction for a data science team. It is a good overview of when and how to use Machine Learning, and how to handle the complexities involved in implementing AI in the real world.
Ethics and data science
There has been a lot in the news this year about bias in machine learning applications, and about data protection and privacy. I read this book because I wanted to make sure I had the knowledge required to practise good data science. It covers how to put ethical principles into practice in data science projects. It includes a really good checklist to go through when designing a project, as well as lots of suggestions for building ethics into a wider data culture. Another resource released this year along very similar lines was deon, a command line tool from drivendata.org that lets you build an ethics checklist into data science projects. This is definitely something I will be incorporating into my work in the new year.
Deep learning
This is an excellent book, now available to read for free online. It covers the applied maths behind Machine Learning and places a large emphasis on deep learning in particular, including the mathematics behind key concepts such as convolutional networks, regularisation, and recurrent and recursive nets. It is very much a theory-based book, but it gives a deep understanding of the subject, and it also includes chapters on the practical implementation of these techniques.
Rules for machine learning
This is really more of an ebook or paper, only about 24 pages long. However, I have to include it here as it is such a great resource, and I found it by chance on Twitter this year. It sets out some of Google's best practices for implementing a machine learning project, and emphasises that good data engineering, meaning great features and a solid data pipeline, matters more than machine learning expertise.
These books have been really useful to me over the last couple of years, and I am always amazed at the quantity and quality of free resources available online. I am sure I will continue to refer back to them in 2019 and beyond, and hopefully find some more brilliant resources to share. Happy New Year!