Applied Predictive Modeling
“Data Science” is one of the most exciting research and professional fields these days. It is creating a lot of buzz, both within academia and in the business world. Detractors like to point out that most of the topics and techniques used by people who call themselves Data Scientists have been around for decades, if not longer. However, it has often been the case that a combination of topics and methodologies becomes important and concrete enough that a truly new subfield emerges.
Predictive Modeling is a particularly exciting subfield of Data Science. Thanks to a few recent high-profile, headline-grabbing success stories (the 2012 US presidential election, the Netflix Prize, etc.), it has attracted a lot of attention and prominence. With the growing use and availability of data in all walks of life, we are increasingly able to make reliable predictions and estimates about topics and issues that affect us in very substantive ways. This ability may sometimes seem almost magical, but behind it lie some very accessible ideas and techniques. “Applied Predictive Modeling” aims to present many of these techniques in a very readable and self-contained book.
This is a very applied and hands-on book. It guides the reader through many examples that illustrate the main points, and it raises issues and considerations that are often overlooked or not sufficiently reflected upon. For instance, the way we model even something as simple as a calendar date can have a significant impact on the kind of analysis and predictive model we choose. This is the kind of information that is rarely discussed in other modeling books, and it can take years of practical experience before its impact is fully appreciated.
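To make the calendar-date point concrete, here is a minimal sketch, written in Python purely for illustration (the book itself works in R). The helper `date_features` is hypothetical and not taken from the book; it simply shows a few common ways a single date can be turned into model inputs, each of which builds in a different assumption about how time affects the outcome.

```python
from datetime import date
import math


def date_features(d: date) -> dict:
    """Hypothetical helper: several alternative encodings of one calendar date."""
    day_of_year = d.timetuple().tm_yday  # 1..365 (366 in leap years)
    return {
        # Raw ordinal: one steadily increasing number, suited to long-term trends only
        "ordinal": d.toordinal(),
        # Calendar parts: useful when the outcome depends on weekday or month
        "weekday": d.weekday(),  # Monday == 0 ... Sunday == 6
        "month": d.month,
        "is_weekend": d.weekday() >= 5,
        # Cyclical encoding: places Dec 31 and Jan 1 close together on a circle
        "doy_sin": math.sin(2 * math.pi * day_of_year / 365.25),
        "doy_cos": math.cos(2 * math.pi * day_of_year / 365.25),
    }


print(date_features(date(2015, 6, 30)))
```

A model given only the ordinal value can pick up a long-term trend but not weekly or seasonal patterns; the weekday, month, and cyclical features expose those patterns, at the cost of extra columns for the model to handle.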
The book has a fairly low barrier to entry, but it is definitely not intended for a complete novice. It assumes a reasonably solid background in statistics and the R language, and at least a passing understanding of machine learning. Many of these topics are covered in the book, but mainly as summaries and refreshers. Each of them could fill a book of its own, or even a whole collection of books.
One of the best features of this book is that the authors understand that predictive modeling is not just a bunch of statistical and computational techniques. Understanding the data, and how to obtain, manipulate, and format it, are some of the most crucial steps in predictive modeling (and other data-driven fields), yet they are often overlooked and not sufficiently explained in many other books and papers that I have come across. The same can be said about model selection: the choice of a model and its predictive power will depend crucially on the kind of phenomena we are predicting, as well as on what exactly we are trying to predict. This book does an excellent job of guiding the reader along these paths and instilling the intuitions required for successful predictive modeling. Here too, as with most things in life, there is no substitute for years of experience working with actual real-world problems, but going through this book will ensure that you don’t stumble too much in your first steps.
**** Book provided for review purposes. ****
Originally published at www.tunguzreview.com on June 30, 2015.