Top Books For DATA SCIENCE (must-read)

Anushka Bajpai
15 min read · Mar 22, 2022

“What you can’t find in someone’s voice, you might find in someone’s writing” - unknown

Overwhelmed by the plethora of resources available today? Here is an updated collection of the best data science books one must read!

I was always more inclined towards video tutorials and lectures when it came to studying something on my own from the web. I found them easier and less cumbersome (just like most of you).

I kept feeling that way until a few years ago, when I came across a niche book on statistics that changed how I look at books (from ‘boredom’ to ‘magically intriguing’). Kudos to the writers and publishers for their consistent efforts to create something so valuable for the entire world out there.

I did extensive research before picking my top books for data science, and today I will share them with all you aspiring data enthusiasts and practitioners. These are books that will make you think twice before turning away from them.

Let’s get started . . .

I have divided the books into different domains to make it easier to pick one:

  • Books on Statistics / Probability
  • Books on Machine Learning
  • Books on Data Visualization and Storytelling
  • Books on Deep Learning
  • Books on Natural Language Processing (NLP)
  • Books on Computer Vision
  • Books on Artificial Intelligence
  • Books on Tools/Languages

Books on Statistics

1. An Introduction to Statistical Learning

Authors: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

An all-time classic. This book covers basic statistics as well as machine learning techniques. The awesome thing about this book is that each concept is explained with case studies in R. So once you have a handle on programming, you can always come back and try out each concept again.

2. Think Stats: Probability and Statistics for Programmers

Author: Allen B. Downey

This book sits at the top of many data science book lists and comes with plenty of accompanying resources. It is especially useful for folks who know the basics of Python, which is used to demonstrate real-world examples.

3. The Art of Statistics: Learning from Data

Author: David Spiegelhalter

Relatable, human examples, a practical dissection of each problem statement, and a gradual, consistent build-up of intuition towards the statistical solution for a given problem. In short, statistics made easy!

4. Probability: For the Enthusiastic Beginner

Author: David Morin

Ideal book for beginners. All the basics are covered — combinatorics, the rules of probability, Bayes’ theorem, expectation value, variance, probability density, common distributions, the law of large numbers, the central limit theorem, correlation, and regression.

5. Introduction to Probability

Authors: J. Laurie Snell and Charles Miller Grinstead

Another introductory book covering basic probability concepts. Like the book above, this one is a comprehensive text written with college graduate students in mind.

6. Naked Statistics — Stripping the Dread from the Data

By Charles Wheelan

Statistics can sometimes be a daunting topic to dive into. In this book, the author clarifies key concepts like inference, correlation, and regression analysis in a fun and less dreadful way.

Books on Machine Learning

1. The Hundred-Page Machine Learning Book

Author: Andriy Burkov

I absolutely loved this book. Having read a ton of books trying to teach machine learning from various angles and perspectives, I struggled to find one that could succinctly summarize difficult topics and equations. Until Andriy Burkov managed to do it in some 100-odd pages. It is beautifully written, is easy to understand and has been endorsed by many.

2. Introducing Data Science by Davy Cielen et al., published by Manning Publications

I like this book for a special reason: it covers not only the data science topics we see everywhere but also other aspects of data science as a field. I highly recommend giving it a read and adding those extra skills to your arsenal.

3. Data Science from Scratch by Joel Grus, published by O’Reilly

The second edition of this book is already out, and it has been popular because it brings many fundamentals together in a single volume. It’s a full package, and you should definitely consider giving it a read.

4. Python Data Science Handbook by Jake VanderPlas, published by O’Reilly

This book is best for those who have just started doing data analysis or data science and need a go-to reference for the techniques and library functionality. It will strengthen your grip on Python for data science and help you make it work for you.

5. Build a Career in Data Science by Emily Robinson and Jacqueline Nolis

This one mostly focuses on the non-technical side of learning data science. Published in 2020, it teaches you how the data science industry works. And that’s why it’s a must-read.

6. The Art of Data Science — A Guide for Anyone Who Works With Data

By Roger D. Peng and Elizabeth Matsui

This book provides an excellent overview of the data analysis workflow. Moreover, it articulates well how despite the presence of many tools, data analysis is fundamentally an art, involving an iterative process where information is learned at every step.

Books on Data Visualization and Storytelling

1. Fundamentals of Data Visualization — A Primer on Making Informative and Compelling Figures

By Claus O. Wilke

This book presents the basic principles alongside good and bad contrasting examples of data visualization. It is a book that can help you understand the rationale behind an effective visualization and can teach you to design more meaningful plots that get the right message across.

2. Beautiful Visualization: Looking at Data Through the Eyes of Experts

Authors: Julie Steele, Noah Iliinsky

“Beautiful Visualization” explores storytelling with data, communicating through visual indicators such as color, and research methods to put it all together.

This book describes the design and development of some well-known visualizations.

3. MakeoverMonday — Improving How We Visualize and Analyze Data, One Chart at a Time

By Andy Kriebel

This book is an extension of the #MakeoverMonday project, where members of the data visualization community share their improved take on existing charts and data. It emphasizes that while there is variability in designing visualizations, there are key techniques you can follow to make sure your chart makes an impact.

4. Storytelling with Data — A Data Visualization Guide for Business Professionals

By Cole Nussbaumer Knaflic

This is a must-read book for anyone who wants to get better at presenting information in a clear, concise, and graphical way. This book teaches you the fundamentals of data visualization and how to effectively communicate with data, complete with numerous real-world examples.

5. Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks

By Jonathan Schwabish

The book is organized into three sections. It begins with a brief primer on data visualization best practices. Part two is the bulk of the book: chart types. Schwabish dives deep into different types of graphs that go well beyond the standards of lines and bars.

Books on Deep Learning

1. Deep Learning with Python

By Francois Chollet

Francois is the creator of Keras so who better to teach you this topic? I also recommend following Francois on Twitter — there is a lot we can learn from him.

2. The Deep Learning with Python

The book takes a practical approach from the start, so you pick up several helpful techniques straight away. It is also very realistic: you can apply what you learn to your own work as soon as you finish reading. This is an absolute must-read in deep learning.

3. Foundations of Deep Reinforcement Learning — Theory and Practice in Python

By Laura Graesser and Wah Loon Keng

A rather advanced textbook that explores deep reinforcement learning, where artificial agents learn to solve sequential decision-making problems. A well-written book for anyone who has a working knowledge of machine learning and wants to solve problems using deep RL.

4. Deep Learning Illustrated — A Visual, Interactive Guide to Artificial Intelligence

By John Krohn, Grant Beyleveld, and Aglae Bassens

This is a practical reference that can help you build your intuition on deep learning algorithms. In this visual, interactive guide, you will learn theories together with examples you can run through on the accompanying Jupyter notebooks.

5. Hands On Machine Learning

By Aurelien Geron

This book sits somewhere between the intermediate and advanced stages of machine learning, and it caters both to specialists in the area and to those who are not. It starts with a gentle introduction to machine learning and deep learning and then moves on to more advanced techniques. A fantastic book!

Books on Natural Language Processing (NLP)

1. Natural Language Processing with Python

Authors: Steven Bird, Ewan Klein and Edward Loper

Another book in this collection that sticks to the learn-by-doing approach. You’ll pick up Python concepts you otherwise wouldn’t have and will navigate the world of NLP using the NLTK library (Natural Language Toolkit).

2. Foundations of Statistical Natural Language Processing

Authors: Christopher Manning and Hinrich Schütze

It’s a very comprehensive guide to the broader sub-topics in NLP, such as text categorization, part-of-speech tagging, and probabilistic parsing, among various other things. The authors provide rigorous coverage of the mathematical and linguistic foundations. The book is quite detailed, so keep that in mind.

3. Speech and Language Processing

Authors: Daniel Jurafsky and James H. Martin

The emphasis of this book is on practical applications and scientific evaluation in the scope of natural language and speech. Jurafsky and Martin have written an in-depth book on NLP and computational linguistics. This one is from the masters themselves.

Books on Computer Vision

1. Computer Vision: Algorithms and Applications

Author: Richard Szeliski

Explore a variety of common computer vision techniques in this book. It’s a comprehensive text that takes a scientific approach to solving basic vision challenges.

2. Programming Computer Vision with Python

Author: Jan Erik Solem

Before you dive into this awesome book, go to the website I’ve linked above and download the datasets, the code notebooks and clone the GitHub repository mentioned there. They are excellent companions in this REALLY hands-on introduction to the world of computer vision.

3. Computer Vision: Models, Learning, and Inference

Author: Dr. Simon J.D. Prince

The book starts off from scratch by introducing us to the concepts of probability and quickly picks up pace from there. More than 70 algorithms have been introduced and the text is beautifully complemented by over 350 illustrations.

Books on Artificial Intelligence

1. Artificial Intelligence: A Modern Approach

Authors: Stuart Russell and Peter Norvig

A book written by Stuart Russell and Peter Norvig? I am sold. It is the leading book in Artificial Intelligence. Covering the length and breadth of AI components — speech recognition, autonomous vehicles, machine translation, and computer vision among other things, this can be considered the Bible of AI.

2. Artificial Intelligence for Humans

Author: Jeff Heaton

What are the foundational algorithms underneath artificial intelligence? This book packs a lot of technical know-how into just 222 pages. This is volume 1 of a series of books on the techniques behind AI (dimensionality, distance metrics, clustering, error calculation, hill climbing, Nelder Mead, and linear regression). There is an accompanying site as well which contains examples cited in the book + a GitHub repository containing the code.

3. The Master Algorithm

Author: Pedro Domingos

If you’re looking for a technical book on AI, this isn’t it. Will we ever find a single algorithm (or ‘The Master Algorithm’) that is capable of deriving all knowledge from data? Join Pedro Domingos in his quest to find out.

Books on Tools/Languages

1. Learning Pandas — Python Data Discovery and Analysis Made Easy

By Michael Heydt

Learning Pandas is another beginner-friendly book that spoon-feeds you the technical knowledge required to ace data analysis with Pandas. One of its best attributes is that it focuses only on Pandas rather than a hundred other libraries, which keeps the reader from getting confused and makes it one of the best books to learn Pandas.

2. Learning the Pandas Library

By Matt Harrison

Simple, accurate and versatile are the best terms to describe this book. Hailed as one of the best books to learn Pandas, it is lightweight compared to the other books on the list. It offers thorough coverage of the Pandas DataFrame and the various operations you can perform with DataFrames.

3. Pandas Cookbook

By Theodore Petrou

This book can be termed as the perfect reference book. With more than 95 recipes to showcase the power of the library, readers will be able to analyze data like never before.

4. Pandas for Everyone: Python Data Analysis

By Daniel Y. Chen

This book is aimed at absolute beginners who have zero programming knowledge. It picks you up, guides you, and shows you exactly what you need to start data analysis with Python and Pandas. As the title states, it is for everyone. This makes it a definite pick as one of the best books to learn Pandas.

5. Hands-On Data Analysis with NumPy and Pandas

By Curtis Miller

This Pandas book is slightly tougher to get your mind around, and it’s recommended that you have some knowledge of Pandas and NumPy before starting it. It might not be beginner-friendly, but that doesn’t keep it from being one of the best books to learn Pandas.

6. The Pragmatic Programmer — Your Journey To Mastery

By David Thomas and Andrew Hunt

This is a timeless book that “examines the very essence of software development, independent of any particular language, framework, or methodology”. Not only does it discuss techniques to keep your code adaptable and easy to reuse, but it also explores topics on personal responsibility and career development.

7. Clean Code — A Handbook of Agile Software Craftsmanship

By Robert C. Martin

This book explains the principles and best practices of writing clean code, illustrated with several case studies. Writing clean code is an important skill for data professionals working in a collaborative setting, and it can prepare you and your team to produce better data products.

8. Fluent Python: Clear, Concise and Effective Programming

Author: Luciano Ramalho

There are way too many resources out there to learn Python, but nothing teaches you programming like a good old-fashioned book. As you might expect from a coding book, it’s a hands-on guide to help you understand how Python works and how to write awesome and effective Python code. At 794 pages, this book is worth the spend.

9. Programming Python: Powerful Object-Oriented Programming

Author: Mark Lutz

Wait, another Python book?! If you thought the above book taught you everything you need to know about Python, think again. This is a vast programming language with a lot more left to cover. Once you’ve mastered the fundamentals from the above book by Luciano Ramalho, try this one by Mark Lutz. There are in-depth tutorials on a wide variety of topics: databases, networking, text processing, GUIs, etc. Tons and tons of examples are included. A must-read for programming geeks.

10. Mastering Python for Data Science

Author: Samir Madhavan

The two books we have covered so far for learning Python looked at the language from a programming perspective. Now it’s time to learn it from the data science angle. Which data science libraries are commonly used and how? How can you create data visualizations and mine for patterns in Python? And how can you code advanced data science/machine learning techniques to build models? These questions and more are answered by Samir Madhavan in this excellent write-up.

11. R for Data Science

Authors: Garrett Grolemund and Hadley Wickham

Anyone who has remotely heard of R programming will have brushed across Hadley Wickham’s work. His work in this language is unparalleled — I could go on and on about him. The perfect book to learn data science through coding in R.

12. R for Everyone

Author: Jared P. Lander

This is a great book if you’re from a non-technical and non-statistical background.

13. R Cookbook

Author: Paul Teetor

The R Cookbook is an excellent addition to your budding data science reading list. It contains more than 200 practical recipes to help you get started with analyzing and manipulating data in R.

FINAL THOUGHTS

The idea behind this blog was to give the reader the best books available for data science. I have supported each recommendation with a brief summary so that readers can choose as per their REQUIREMENTS, INTERESTS and FUTURE ASPIRATIONS.

Whenever someone tries to confuse you, remember this:

There are multiple paths to success in data science, and the path you choose should be simple enough to help you take action. You feel overwhelmed because of the information overload all these options bring. Instead of spending more time thinking about and planning how to acquire the skill, just pick one of the above books as per your current needs and get started. The key is to take action consistently.

Read with an intent to learn and discover the magic behind the minds of these great authors and practitioners.

Happy Learning!!!
