Book Summary — Data Smart

Published in MBReads · Jan 4, 2019 · 3 min read

You can find all my book summaries — here.

This was the most intellectually demanding book to comprehend, but also incredibly stimulating. It goes through machine learning and AI models at incredible speed and in great detail, with very specific examples.

Absolutely loved it, but so hard to follow every step.

Data Smart gives you a full insight into statistics by running the reader through basic machine learning, AI, forecasting and optimisation models in Excel, but then also provides a bridge into how to solve all those problems in R.

Everything you ever needed to know about Spreadsheets

The first chapter kicks off with a great list of every Excel command you will need to build models. I thought it was an incredibly comprehensive list of all the things you need to know.

Once you know them you’re sorted.

shortcuts (cmd + right, etc)
pane freezes
copy paste (transpose, values only etc)
fixed references
conditional formatting
Formulas — Index, Match, Offset, Vlookup, Sumproduct
Filtering / Sorting
PivotTables
Array Formulas
Solver (which is essential for those models)

Then the book dives straight into examples and theory.

Cluster Analysis

Unsupervised analysis

Instead of blasting your whole database with emails, a simple cluster analysis can identify what different segments have in common.

The idea is that you pick K cluster centres (the “means” in K-means) and check which centre each observation/customer/user (whatever you’re analysing) is closest to. The observations that share a nearest centre make up a cluster.
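To make the assignment step concrete, here is a minimal sketch of K-means in plain Python. It uses the standard "assign, then re-average" loop rather than the book's Excel setup, and the customer data, the starting centres and the function name `kmeans` are all made up for illustration.

```python
import math


def kmeans(points, initial_centres, iterations=10):
    """Assign each point to its nearest centre, then move each centre to the mean of its points."""
    centres = list(initial_centres)
    k = len(centres)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [min(range(k), key=lambda i: math.dist(p, centres[i])) for p in points]
    return centres, labels


# Hypothetical customers described by two made-up features.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
# Hypothetical starting guesses for the two cluster centres.
centres, labels = kmeans(customers, initial_centres=[(0.0, 0.0), (6.0, 6.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centres, so in practice it is worth trying several starting points and comparing the clusterings.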

Silhouette → (avg distance to those in the nearest neighbouring cluster minus avg distance to those in my cluster) / max of those two averages
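The silhouette formula can be sketched for a single observation as follows. This is an illustration only: the points are invented, the function name `silhouette` is my own, and it assumes Euclidean distance and a cluster with at least two members.

```python
import math


def silhouette(point, own_cluster, other_clusters):
    """Silhouette of one point: (b - a) / max(a, b).

    a = average distance to the other members of the point's own cluster
    b = average distance to the members of the nearest neighbouring cluster
    """
    others_in_own = [p for p in own_cluster if p != point]
    a = sum(math.dist(point, p) for p in others_in_own) / len(others_in_own)
    b = min(
        sum(math.dist(point, p) for p in cluster) / len(cluster)
        for cluster in other_clusters
    )
    return (b - a) / max(a, b)


# Made-up example: one tight cluster and one neighbouring cluster.
score = silhouette((0, 0), [(0, 0), (0, 1)], [[(3, 0), (4, 0)]])
print(round(score, 3))  # 0.714
```

A value close to 1 means the observation sits comfortably in its cluster, while a value near 0 or below suggests it could just as well belong to the neighbouring one.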

K-median → modify clustering to only use values present in customers’ deal vectors

Measuring distances:

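Euclidean: take Pythagoras’ distance, as the “crow flies”
Manhattan/Hamming: count the characteristics on which you differ from someone else (equivalently, giving points when you match). Cosine distance makes it even better for yes/no data, because it only rewards matches on things both customers actually did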
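The three distance measures can be compared side by side on small yes/no vectors. This is a rough sketch with invented deal vectors, and the function names are my own.

```python
import math


def euclidean(u, v):
    """Straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute differences; on 0/1 vectors this is the Hamming distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 minus the cosine of the angle between the vectors (undefined for all-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)


# Hypothetical deal vectors: 1 = customer took the deal, 0 = did not.
alice = [1, 1, 0, 0]
bob = [1, 0, 0, 0]
carol = [0, 0, 1, 0]

print(manhattan(alice, bob), manhattan(bob, carol))  # 1 2
print(round(cosine_distance(alice, bob), 3))         # 0.293
print(cosine_distance(bob, carol))                   # 1.0
```

Bob and Carol agree on two of the four deals (neither took them), but cosine distance still puts them as far apart as possible because they share no deal they both took.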

Naive Bayes Model

Supervised Analysis

You train a model so that it recognises patterns in datasets — “bags of words”

remove punctuation, capitalisation and short words
crosscheck words between dataset and bag
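The two steps above can be sketched as a tiny naive Bayes classifier. Everything here is an assumption for illustration: the training texts and labels are invented, the cut-off of "three letters or fewer" for short words is my own choice, and add-one smoothing is one common way to handle words the model has not seen.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase, strip punctuation, and drop words of three letters or fewer."""
    words = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [w for w in words if len(w) > 3]


def train(labelled_docs):
    """Build one bag of words (word counts) per class."""
    bags, doc_counts = {}, Counter()
    for text, label in labelled_docs:
        bags.setdefault(label, Counter()).update(tokenize(text))
        doc_counts[label] += 1
    return bags, doc_counts


def classify(text, bags, doc_counts):
    """Pick the class with the highest log-probability for the words in the text."""
    vocabulary = set().union(*bags.values())
    scores = {}
    for label, bag in bags.items():
        total_words = sum(bag.values())
        score = math.log(doc_counts[label] / sum(doc_counts.values()))
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out the probability.
            score += math.log((bag[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training data with two hypothetical classes.
docs = [
    ("The email API keeps sending duplicate messages!", "app"),
    ("Love the new email templates and delivery reports", "app"),
    ("Saw a huge monkey at the zoo today", "other"),
    ("That monkey documentary was really funny", "other"),
]
bags, doc_counts = train(docs)
print(classify("Why is the email delivery delayed?", bags, doc_counts))  # app
print(classify("funny monkey video", bags, doc_counts))                  # other
```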

Optimisation Models

An artificial intelligence model predicts the result of a process by analyzing its inputs. That’s not what this is about.

Min / Max optimisation with conditions → use Solver
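For a feel of what "maximise subject to conditions" means, here is a toy problem solved by simply trying every combination. This brute-force search is not how Solver works internally; it is only a stand-in that fits in a few lines, and the profit figures and constraints are invented.

```python
from itertools import product


def best_plan(max_units=10):
    """Maximise profit 3x + 5y subject to three made-up capacity constraints."""
    feasible = []
    for x, y in product(range(max_units + 1), repeat=2):
        if x <= 4 and 2 * y <= 12 and 3 * x + 2 * y <= 18:
            feasible.append((3 * x + 5 * y, x, y))
    return max(feasible)  # tuple with the highest profit wins


profit, x, y = best_plan()
print(profit, x, y)  # 36 2 6
```

Trying every combination stops being practical very quickly as the number of decisions grows, which is where a dedicated solver earns its keep.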

Artificial Intelligence Model

Training the model to identify the best predicting factors “via stumps”.

Train linear regression model by minimising squared error.

F-test to show the overall statistical significance of the model
Check model coefficients
t-test to show the statistical significance of each individual coefficient
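Minimising squared error for a single input has a closed-form answer, which the sketch below uses together with the textbook t statistic for the slope. The data is invented and the function name `fit_line` is my own. The book works with several inputs, which this simplification does not cover.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus the slope's t statistic."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared errors: the quantity the fit minimises.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / standard_error


# Made-up observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, t_stat = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 1.99 0.05
```

A large t statistic (in absolute terms) suggests the coefficient is unlikely to be zero. Turning it into a p-value needs the t distribution with n - 2 degrees of freedom.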

Ensemble model — build a couple AI models and let them vote on the most likely outcome
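The voting idea can be sketched with stumps, i.e. models that make a yes/no call from a single threshold on a single feature. In this simplified version I train one stump per feature and take a majority vote. The data, labels and function names are all hypothetical, and this is not necessarily the exact ensemble method the book builds.

```python
from collections import Counter


def train_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows.

    The stump predicts 1 when the feature value is above the threshold.
    """
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted({row[feature] for row in rows}):
        predictions = [1 if row[feature] > threshold else 0 for row in rows]
        errors = sum(p != y for p, y in zip(predictions, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature, best_threshold


def predict_stump(stump, row):
    feature, threshold = stump
    return 1 if row[feature] > threshold else 0


def vote(stumps, row):
    """Majority vote across all stumps."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


# Invented training rows with three features each, and a 0/1 outcome.
rows = [(1, 10, 0.2), (2, 12, 0.1), (3, 11, 0.4), (7, 30, 0.9), (8, 28, 0.3), (9, 31, 0.8)]
labels = [0, 0, 0, 1, 1, 1]
stumps = [train_stump(rows, labels, feature) for feature in range(3)]

print(vote(stumps, (8, 29, 0.15)))  # 1
print(vote(stumps, (2, 11, 0.5)))   # 0
```

In both examples the third stump is outvoted by the other two, which is the point: each model can be weak on its own as long as the group gets it right.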

Forecasting

Exponential smoothing: smooth out the noise in the series while giving the most recent observations a higher weight
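In its simplest form, exponential smoothing is one line of arithmetic repeated per period. The sketch below assumes a smoothing constant `alpha` of 0.5 and an invented demand series. Starting the level at the first observation is one common choice, not the only one.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: new level = alpha * latest value + (1 - alpha) * old level."""
    level = series[0]  # assumption: initialise the level with the first observation
    levels = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


# Made-up monthly demand.
demand = [10, 12, 11, 13]
levels = exponential_smoothing(demand, alpha=0.5)
print(levels)      # [10, 11.0, 11.0, 12.0]
print(levels[-1])  # 12.0, the forecast for the next period
```

A higher `alpha` reacts faster to recent changes, while a lower one smooths more heavily.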

Check for autocorrelation in the forecast errors. If it shows up at a seasonal lag, do more smoothing, ie add seasonality (estimated with a 2x12 months moving average)
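Both ideas fit in a short sketch: an autocorrelation function, and a 2x12 moving average that strips a 12-month seasonal pattern out of a series. The series below is synthetic (a straight-line trend plus an invented seasonal pattern that sums to zero), and the function names are my own.

```python
def autocorrelation(series, lag):
    """Correlation of the series with itself shifted by `lag` periods."""
    mean = sum(series) / len(series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, len(series))
    )
    denominator = sum((x - mean) ** 2 for x in series)
    return numerator / denominator


def centred_2x12_ma(series):
    """2x12 moving average: half weight on the two end months, full weight on the 11 in between."""
    smoothed = []
    for t in range(6, len(series) - 6):
        window = 0.5 * series[t - 6] + sum(series[t - 5 : t + 6]) + 0.5 * series[t + 6]
        smoothed.append(window / 12)
    return smoothed


# Synthetic data: three years of a linear trend plus a made-up seasonal pattern.
pattern = [3, 1, -2, -4, -1, 0, 2, 4, 1, -1, -2, -1]
series = [t + pattern[t % 12] for t in range(36)]

trend = centred_2x12_ma(series)  # trend[i] lines up with month i + 6
seasonal = [series[i + 6] - trend[i] for i in range(len(trend))]

print(round(trend[0], 6))                       # 6.0
print(round(autocorrelation(seasonal, 12), 6))  # 0.5
```

The moving average recovers the trend, and what is left over correlates strongly with itself twelve months earlier, which is the signature of yearly seasonality.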

Systemise it

The last chapter redoes everything the book has shown step by step in Excel, this time in a few simple lines of R code.