The best MOOCs and Books for Data Scientists

Vagner Zeizer C. Paes
Published in Geek Culture
9 min read · Dec 28, 2022

In this story, I want to share the best courses I have taken and the best books I have read so far in data science. I hope it helps people find a good path when learning data science: the number of courses and books has grown enormously in the last few years, and it can be hard to pick the right path for you in order to land a good job.

MOOCs

1. Introduction to Machine Learning with TensorFlow (Udacity)

In this Nanodegree, you are introduced to the basics of supervised and unsupervised learning in Python with scikit-learn, and to neural networks with the TensorFlow framework (there is also an equivalent Nanodegree that uses PyTorch). There are three projects: supervised learning (easy), a dense neural network that works as an image classifier (intermediate), and unsupervised learning to create customer segments. This Nanodegree is basic, so if you wish you can skip it and go straight to the Data Scientist Nanodegree.

2. Data Analyst Nanodegree (Udacity)

Although this Nanodegree is not essential, I think it is good for a data scientist to take it. When I took it, the course covered SQL, web scraping, data exploration, statistics (which was not that easy to grasp at first), and an introduction to machine learning methods. Thanks to its great real-world projects, I came away (more than two years ago) with a good understanding of data science methods, and I could face data science interviews and take-home tasks with confidence. I think it is definitely worth doing.

3. Data Scientist Nanodegree (Udacity)

This Nanodegree is one of the most famous data science courses on the internet. It covers how to define a data science problem and the common techniques for solving it, machine learning for Natural Language Processing (NLP), recommender systems, and a final project that you can either choose from a list or build entirely on your own with data you find yourself. Below, I briefly discuss the projects in each of the four courses within this Nanodegree and what I learned from them. First, you can study software engineering if you want (it is not mandatory). After that, you get a general overview of data science methods and the CRISP-DM process, and apply them to one of the suggested datasets, such as Airbnb. In the second project, you learn how to build machine learning pipelines, combined with NLP tools, that predict multiple classes, and use them to build a disaster response pipeline. In the third part, you learn statistical testing, such as A/B testing, and recommender systems, and you build your own recommendation engine. The last part is the Capstone project, which you can pick from the projects already available in the workspace or define yourself. I chose the Sparkify project, coded in PySpark, which can be found here. The full list of projects I have built can be found in my GitHub account, whose link is provided here. This Nanodegree is definitely worth doing, even if you are an experienced data scientist.

4. Deep Learning Specialization (Coursera)

Undoubtedly, this is considered the best specialization on Coursera. It is taught by Andrew Ng, and in it you learn how to build almost any kind of neural network from scratch. Moreover, you apply the concepts from the lessons in real-world projects, which range from building DNNs, CNNs, and RNNs from scratch to computer vision applications. I am not a big fan of Coursera compared to Udacity, but this course stands out, and it is much cheaper (and much shorter) than the Udacity Deep Learning Nanodegree.

5. MLOps Specialization (Coursera)

I would say that this specialization is more of an “Introduction to MLOps”. You learn how to design an ML production system end to end: project scoping, data needs, modeling strategies, and deployment requirements. You also learn the concepts of data drift and concept drift, and a valuable tool named TensorFlow Extended (TFX), which can perform feature engineering, transformation, and selection. Likewise, you learn how to serve offline and online inference requests. You also get an overview of how to apply best practices and progressive delivery techniques to maintain a continuously operating production system. This specialization is easy to intermediate in level and, with hard work, can be completed in one month. Just one remark: I am personally not a big fan of the way Coursera teaches cloud computing, because the tasks are mostly copy and paste.

6. Practical Data Science on the AWS Cloud (Coursera)

In this specialization, which I think offers a fair trade-off between what you learn and what you pay, you get a general overview, with some intermediate-level tasks, of how to use the AWS SageMaker cloud tools. You learn how to ingest, register, and explore datasets on the AWS Cloud, detect statistical bias, use AutoML, and save and manage features in a feature store. The course works through a sentiment analysis task, so you use built-in algorithms and custom BERT models; debug, profile, and compare models to improve performance; create an end-to-end ML pipeline; and perform hyperparameter tuning on AWS. After that, you deploy and monitor the models. The most interesting aspect of the course, I think, is building a human-in-the-loop pipeline to improve model performance, in which the model asks for human intervention when its certainty about the predicted class does not reach a threshold.
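
To make the human-in-the-loop idea concrete, here is a minimal sketch in plain Python. It is not the course's code and does not use any AWS API: the function name `route_prediction`, the 0.8 threshold, and the example probabilities are all assumptions made up for illustration. It only shows the routing rule: accept the prediction when the model is confident enough, otherwise send it to a person.

```python
from typing import Dict, Tuple


def route_prediction(class_probs: Dict[str, float],
                     threshold: float = 0.8) -> Tuple[str, str]:
    """Return (predicted_label, route) for one prediction.

    class_probs maps each class label to the model's probability.
    The threshold value is a hypothetical choice, not a recommended one.
    """
    # Pick the class with the highest predicted probability.
    label = max(class_probs, key=class_probs.get)
    confidence = class_probs[label]
    # Below the threshold, the example is sent to a human reviewer.
    route = "auto" if confidence >= threshold else "human_review"
    return label, route


# Two made-up sentiment predictions: one confident, one uncertain.
examples = [
    {"positive": 0.95, "neutral": 0.03, "negative": 0.02},
    {"positive": 0.45, "neutral": 0.40, "negative": 0.15},
]
results = [route_prediction(probs) for probs in examples]
print(results)
```

In a real pipeline, the labels produced by the reviewers would typically be fed back as new training data, which is how the loop can improve the model over time.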

TECHNICAL BOOKS

1. Data Science from Scratch

I do recommend this book if you are taking your first steps in data science. In it, you learn how data science works by implementing (simple) algorithms from scratch, ranging from the simplest Naive Bayes to recommender systems. This is important for beginners because ML algorithms are commonly seen as black-box models, and a more precise view of them helps you understand what is going on under the hood. The book also shows use cases, which can help you get acquainted with data science procedures used in the industry.

2. Data Science for Business

This book is intended for people who already have a basic data science foundation, and it aims to show how data science can bring real value to your company. It is a great way to get acquainted with what the industry expects from a data science professional. Basically, you learn how to frame data science questions, which involves learning about correlation and segmentation, model fitting, similarity, and clustering. The book also discusses what a good model is, how to visualize model performance, and how to approach text mining.

3. Data Science Projects with Python

This book presents a detailed walk-through of a data science project. By presenting business problems, the author shows the what, why, and how of deciding how to explore, clean, and model data in real-world situations. He does a very good job of explaining exactly what is happening in the code used throughout the book. The best point of this book is that the code is simple, and the machine learning algorithms used in the modeling part are explained without in-depth math.

4. Approaching (Almost) any Machine Learning Problem

This book focuses much more on methods and good approaches for tackling data science and machine learning problems. It contains a lot of code, but since the code is not hosted anywhere for reproduction, the aim of the book is to expand your toolbox and discuss ways of thinking as a data science problem solver. The flow of the book is very good, ranging from basic concepts to image problems and many advanced concepts!

5. Estatística básica

This book (in Brazilian Portuguese) gives you the statistics you need to know and understand in order to tackle real-world problems and perform valuable data analysis. It shows how data visualizations can be used to get insights from the data, how probabilities are handled and applied, and how regression models are used on some real data. The book focuses more on theoretical explanations; therefore, I recommend it for readers who do not have a strong, or even a basic, statistics background.

6. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

In this book, you learn, in a smart way, how to use common supervised machine learning models, such as KNN, tree-based models, Support Vector Machines, and so on. But the strongest part, in my view, is its treatment of neural networks with TensorFlow: it bridges theory and real-world applications.

7. Getting Started with SQL: A Hands-On Approach for Beginners

This book gives you basic, and sometimes sufficient, SQL knowledge to run intermediate-level queries on datasets. It covers the SELECT, WHERE, GROUP BY, and ORDER BY clauses, as well as how to UPDATE tables and use JOINs. It is ideal for beginners and can easily be absorbed in two weeks or less, depending on your pace.
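
As a rough illustration of the statements listed above (not an example taken from the book), the sketch below runs them against an in-memory SQLite database through Python's built-in `sqlite3` module. The `customers` and `orders` tables and all their values are made up for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and rows, invented for this illustration.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Bruno")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0), (4, 2, 5.0)],
)

# UPDATE: change the amount of a single order.
cur.execute("UPDATE orders SET amount = 25.0 WHERE id = 3")

# SELECT with JOIN, WHERE, GROUP BY, and ORDER BY:
# total per customer, counting only orders of at least 10.
cur.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount >= 10
    GROUP BY c.name
    ORDER BY total DESC
    """
)
rows = cur.fetchall()
print(rows)
conn.close()
```

Note that WHERE filters individual rows before they are grouped, so the order of 5 is dropped before the totals are computed.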

DATA SCIENCE/AI BOOKS FOR FUN

1. AI Superpowers: China, Silicon Valley, and the New World Order

This book is basically about the technological dispute between the USA and China and how supremacy in Artificial Intelligence will be the key to becoming the dominant nation. The author discusses four “waves of AI” and why, in the long run, China will have AI supremacy.

2. AI 2041: Ten Visions for Our Future

This book discusses what the future may look like twenty years from now. Through short stories, it shows how AI will likely shape job reallocation, virtual and mixed reality, quantum computing with blockchain, possible COVID scenarios, and computer vision in future daily life. It is a kind of sci-fi, great for people who enjoy the genre. I personally found the book too dense and too long.

3. Life 3.0: Being Human in the Age of Artificial Intelligence

This futuristic book written by Max Tegmark (Physics Professor at MIT) shows possible scenarios of life with the rapid growth of artificial intelligence in our daily life. It begins by arguing about three stages of life:

  1. Life 1.0 refers to biological origins, such as viruses;
  2. Life 2.0 refers to cultural developments in humanity;
  3. Life 3.0 refers to technologies such as Artificial General Intelligence (AGI) that may someday, in addition to being able to learn on their own, also be able to redesign their own hardware and internal structure, which the other two kinds of life cannot do.

After reviewing current issues in AI, Tegmark considers a range of possible futures involving intelligent machines or humans. The book also covers potential outcomes, such as altered social structures, integration of humans and machines, and both positive scenarios (AGI helping us or AGI being “enslaved”) and negative ones, like authoritarian AI or an AI apocalypse (AGI that may enslave or conquer the earth, quite possibly without us even noticing).

Finally, Tegmark offers insights into what the world could look like a thousand or ten thousand years from now.

4. Superintelligence: Paths, Dangers, Strategies

The author discusses several concepts and interesting questions about the characteristics and consequences of the rise of a “superintelligence”. The book points out the risks and potential of an uncontrolled superintelligence, and how it can be wisely managed. It is not for beginners: at a minimum, you need some knowledge of computer science, politics, and economics.

5. Você, Eu e os Robôs: Como se Transformar no Profissional Digital do Futuro (Brazilian Portuguese)

This book covers the impacts of the Digital Revolution, along with the transformations, challenges, and opportunities that we are going to experience. The book is structured into three main parts:

- You and I: the human beings, focusing on humankind transformed by digital technologies;

- They: the rise of digital beings, discussing AI, robotics, and technological trends;

- We: the hybrid future (human beings + digital beings), where we ask ourselves where we are going, how we are merging with machines, and how machines could extend us technologically.

That is all, folks! I think I have covered a lot of ground.

If you liked this story, please give it some claps.

You can add me on LinkedIn here.

Best wishes!
