5 Best Reads for Solidifying Your Data Science Skills in 2021
Before I dive into the list, a disclaimer: no, I am not receiving any compensation for promoting these books. Though it would be nice, my only goal with this list is to provide up-and-coming data scientists with review material tailored to workplace and industry skills. This list also assumes you are familiar with a programming language at the level covered in a beginner MOOC (such as PY4E, Python for Everybody).
A brief plug about me: I have worked professionally as a data scientist for 3 years without formal training in computer science; my degree is in Neuroscience. In my short career, I have had the privilege of working in both industry and academia. I believe this short list of books will help aspiring data scientists be confident and successful when starting a position in industry or solidifying their position in business. However, this is just my opinion, as there are no perfect lists.
Finally, if you are on the fence about applying for jobs, or fighting imposter syndrome and worrying that you do not know enough, I HIGHLY ENCOURAGE you to apply for entry-level data scientist roles. Often the best way to learn, and the best way to overcome imposter syndrome, is on-the-job training. Without further ado, let's dive into the list.
#1: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
I would call this the Data Science/ML Bible for all things machine learning and model building. Géron is an ex-Googler and a fantastic author who covers the core concepts behind the most common Supervised, Unsupervised, Reinforcement, and Deep Learning algorithms. The book is written at a high level, but it is broken down in such a manner that it serves as the ultimate reference to own if you work as a data scientist. It is a guide you can use to double- and triple-check your approach when you are building your own business solutions. Furthermore, Géron provides walkthrough examples as well as leveled coding challenges (with solutions) after each chapter, which makes this a superb choice for aspiring Data Scientists too.
#2: Practical Statistics for Data Scientists by Peter Bruce, Andrew Bruce & Peter Gedeck
The second book on the list is also an essential reference text to have. Many times in industry and business, your role as a data scientist is that of a consultant who interprets the data that has been sent your way. To do this effectively and avoid drawing incorrect conclusions, it's important to have a good understanding of statistics. This text reviews these concepts thoroughly, with examples from the world of advertising and finance in both R and Python code. It presents the digestible essentials (not the math-heavy jargon) on topics including hypothesis testing, significance testing, and good exploratory data analysis. To cap it off, each section/topic covered includes references for further reading, both whitepapers and full reference textbooks.
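To make the idea of significance testing a little more concrete, here is a minimal sketch of a permutation test for a difference in means. This is my own illustration rather than code from the book: the function name `permutation_test`, the page labels, and the session times are all made up for the example, and it only uses Python's standard library.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Approximate two-sided permutation test for a difference in means.

    Returns the observed difference (mean of A minus mean of B) and the
    fraction of random relabelings whose difference is at least as extreme.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values and split them into two new groups,
        # as if the group labels carried no information.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    return observed, extreme / n_permutations


# Hypothetical session times (in seconds) for two versions of a web page.
page_a = [21, 35, 67, 85, 118, 147, 149, 171, 200, 211]
page_b = [45, 75, 96, 130, 152, 187, 215, 238, 252, 290]

observed_diff, p_value = permutation_test(page_a, page_b)
print(f"Observed difference in means: {observed_diff:.1f} s")
print(f"Approximate two-sided p-value: {p_value:.3f}")
```

A small p-value would suggest the gap between the two pages is unlikely to come from chance alone, while a large one is a reminder not to over-interpret a difference in a small sample.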
#3: Data Science at the Command Line by Jeroen Janssens
The third book on this list is a bit of a slog. It is dense with information that may be more relevant to those with a computer science background; however, as a Data Scientist it is beneficial to have at least some exposure to these processes, since you may not always have a data engineer around to build model and visualization pipelines for production. This book gives you a deep dive into the built-in Unix-based tools you have available on nearly any operating system. It explores how to build pipelines to clean your data, run tasks in parallel, design effective visuals for deployment on dashboard tools, and implement sustainable model solutions. If your company is small or doesn’t have a large IT department, this work often falls on the shoulders of Data Scientists, so understanding the world outside of your JupyterLab environment is essential.
#4: Git for Teams by Emma Jane Hogbin Westby
At number four, we have a book on Git. In business and academia, more often than not you will have to collaborate with others on a joint project. Luckily, Git is an open source tool for monitoring and tracking changes, whether on the data engineering side or the data science side. With Git, multiple data scientists can work on and contribute to the same project and keep track of updates to their visualizations or models. There are many books on Git; however, I find this one to be among the best because it is not really a reference textbook but rather an introductory guide written for people who have not used Git. It explains a lot of the ‘why’ behind command use and the design structure of branching and merging, all while letting you follow along from your own command line.
#5: How to Win Friends & Influence People by Dale Carnegie
The final book on the list is not a Data Science book per se, but it teaches one of the most important skills to have as a data scientist: effective communication. There are a TON, and I mean a TON, of highly intelligent people in the tech world, and it's okay to feel overwhelmed or inferior. Your role as a data scientist in a company is not to be some super coder or master software engineer, but rather to help drive business solutions by leveraging data, and half the battle is communicating results. I like this book because it illustrates the importance of building good working relationships and shows how to do so. Having good relationships with your co-workers, your boss, and other employees at the company in general will make both your job easier and your life more meaningful.
Thank you for taking the time to read the list. Again, I want to reiterate that I am not sponsored by any of these authors or by O’Reilly Media. I hope you will find this of benefit to your Data Science journey, and if you have any other suggestions, I would be happy to have a discussion down in the comments section.