# Data and Ethics: Three Book Reviews

The three books I’ll be reviewing today look at the ways statistics and math can be badly misused. In an age when data is “the new oil” and the Internet is full of unchecked facts, it is very important to know your brain’s blind spots and to be equipped with the tools to analyse the information presented to you and tell truth from trickery.

#### Weapons of Math Destruction by Cathy O’Neil

Weapons of Math Destruction, as its witty name suggests, describes the mathematical machinery used in the new world of “big data” that amplifies human biases, introduces new ones (some we would never even think of), or draws arbitrary conclusions from badly analysed data.

The book separates the models people use for analysis into two categories. Reasonable ones use feedback data to reinforce and improve themselves, reduce the number of poor decisions, and make sure the results reflect the initial expectations. Bad ones, which the author calls WMDs (Weapons of Math Destruction), often work as black boxes, misrepresent the data, base important decisions on data that has no direct connection to the measured outcome, and have no feedback mechanism, so they never evolve or improve.

WMDs, the author says, can be found everywhere these days. In *education*, where teachers can be ranked (and subsequently fired for underperformance) by their students’ test scores, and where universities are ranked on metrics that have very little to do with actual student performance or the impact on students’ lives. In *insurance*, where rates are calculated by sorting people into a “group” they allegedly belong to, so that prices reflect the group’s presumed behaviour rather than the individual’s. In *politics*, where models help plan campaigns with minimal investment by identifying the people who might actually change their minds (so-called swing voters) and presenting the candidate in a way tailored to each voter’s tastes and preferences. In *finance*, where bad credit scores can send entire social groups into a financial death spiral. In *employment*, where the same credit scores get used as a predictor of success. In *advertising*, where our desires and opinions can be subtly manipulated to get us to spend money and push us into irrational and unwise decisions.

This book touches on subjects that are extremely important in modern society, subjects that many industries ignore out of either incompetence or malice. Sitting back and thinking that WMDs are OK as long as they do not affect you personally (because of whatever imaginary privilege) is a slippery slope: nobody is special in that regard, and WMDs see all of us as mere data points and will crunch us mercilessly. Check out the book to learn more.

While I was discussing this book online, someone suggested that it doesn’t go deep enough into the use and abuse of data collection and algorithms. The first thing that came to mind was that there’s already a great book on the subject:

#### Statistics Done Wrong by Alex Reinhart

Available both as a free online book and as an extended print edition, Statistics Done Wrong is the most technical of the three books discussed today. It also deals with numbers, statistics and mathematics (“big data”, if you please), but this time it looks at the technicalities of statistics and describes exactly how numbers can be used to fool us.

The author explains how scientists publish irreproducible research; how they skew results by filtering out contradicting data points so that their conclusions match the paper’s abstract; how many studies are run *without the means to reasonably reject the “null hypothesis”*, which makes it impossible to establish a strong relationship between the subject of the study (say, a new type of flu medicine) and improvements in patients’ health; and how some studies claim an *absence of adverse effects* for a new medicine, saying there’s no statistically significant difference between the two groups studied, when in fact they should have said they did not have enough data to detect anything but the largest differences.
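To make that last point concrete, here is a minimal Python sketch, not taken from the book, that estimates *statistical power*: how often a study of a given size detects an effect that really exists. The effect size of 0.2 standard deviations, the group sizes and the helper name `simulated_power` are all hypothetical, and the sketch assumes normally distributed outcomes with a known standard deviation of 1, so that a simple two-sample z-test applies.

```python
import random
from statistics import NormalDist, fmean


def simulated_power(n_per_group, true_effect, trials=1000, alpha=0.05, seed=0):
    """Fraction of simulated studies whose two-sample z-test reaches p < alpha.

    Hypothetical setup: outcomes are normal with a known standard deviation of 1,
    and the treated group's mean is shifted by `true_effect`.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = (2 / n_per_group) ** 0.5  # standard error of the difference in means
    significant = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            significant += 1
    return significant / trials


for n in (20, 100, 500):
    print(f"{n:>3} per group: power = {simulated_power(n, true_effect=0.2):.2f}")
```

With 20 participants per group, only a small fraction of the simulated studies reach significance even though the effect is real by construction, so a “no significant difference” result from a study that small says very little about whether the effect is absent.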

The book is also full of stories, such as how the “right turn on red” rule was established in the USA. It describes statistical paradoxes, such as how rural counties in the Midwest, the South and the West can have simultaneously the lowest **and** the highest rates of kidney cancer. It analyses the methods well-meaning scientists use to inflate the importance of their findings just to get heard in a world of noise, and shows how scientists keep testing variations of a hypothesis until one of them appears to confirm it, just like the “Jelly Bean & Acne” XKCD describes.
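Both traps can be reproduced in a few lines of Python. The sketch below is my own illustration with made-up numbers (the county sizes, the shared true rate of 0.001 and the function names are hypothetical): every simulated county has exactly the same underlying risk, yet small counties take both the lowest and the highest spots, simply because a handful of cases, or none at all, swings their rate wildly. The second function computes the chance of at least one false positive when 20 independent tests are run at α = 0.05 and no effect is real, which is the trap behind the jelly bean comic.

```python
import random


def county_rates(populations, true_rate, seed=1):
    """Observed incidence per county when every county shares the same true rate."""
    rng = random.Random(seed)
    rates = []
    for pop in populations:
        # Each resident independently becomes a case with probability true_rate.
        cases = sum(rng.random() < true_rate for _ in range(pop))
        rates.append((cases / pop, pop))
    return rates


# Hypothetical setup: 200 small counties and 20 large ones with identical risk.
populations = [200] * 200 + [20_000] * 20
rates = sorted(county_rates(populations, true_rate=0.001))
lowest, highest = rates[:10], rates[-10:]
print("sizes of the 10 lowest-rate counties: ", {pop for _, pop in lowest})
print("sizes of the 10 highest-rate counties:", {pop for _, pop in highest})


def family_wise_error(alpha, tests):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** tests


print(f"20 jelly bean colours at alpha = 0.05: {family_wise_error(0.05, 20):.0%}")
```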

SDW, like WMD, discusses ethics, although in this case mostly the ethics of research and publication. It tries to bring the subject to public attention and kick off a discussion, because statistics is extremely important to further scientific progress.

The book is very nicely written and definitely worth reading. Happily, even though some statistics is involved, it remains very approachable and easy to read.

When I finished reading Weapons of Math Destruction, I went to put it on the bookshelf and started wondering where it belonged. As I browsed the titles on one of the shelves, it struck me that it is very close in subject to another interesting book I read recently:

#### How Not To Be Wrong by Jordan Ellenberg

The book starts with a World War II story about Abraham Wald and the Statistical Research Group where he worked. Among other things, the group analysed data to make predictions and help change the course of the war. One of their tasks was to work out how to reinforce Allied planes with armour while keeping them light enough to stay manoeuvrable and consume a reasonable amount of fuel, yet better protected from being shot down by enemy fighters.

Long story short, after trying many different things, including reinforcing the parts where the enemy seemed to be aiming (and hitting) most successfully, they noticed no improvement in the percentage of planes returning. Wald tried many approaches, but one day he decided to look at the data from a different perspective: the planes available for statistical analysis were, in fact, the ones that had made it back to base. They had bullet holes in the fuselage, the fuel system, the wings and other parts, but it struck him that the number of planes with a damaged engine was stunningly low. All that time they had been asking the wrong question. They had tried to understand where the bullet holes were, when they should have tried to understand where they were missing: the planes hit in the engine were the ones that never came back.
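Wald’s insight is a textbook case of survivorship bias, and a small simulation shows why counting holes on returning planes misleads. In this hypothetical Python sketch, the section names come from the story above, but the per-hit lethality numbers, the three hits per plane and the uniform hit distribution are invented for illustration. Every section is hit equally often, yet engine hits are underrepresented among the planes that make it back, because a hit there is the one most likely to bring a plane down.

```python
import random
from collections import Counter

SECTIONS = ["engine", "fuselage", "fuel system", "wings"]
# Hypothetical chance that a single hit in each section brings the plane down.
LETHALITY = {"engine": 0.6, "fuselage": 0.05, "fuel system": 0.1, "wings": 0.05}


def observed_hits(planes=10_000, hits_per_plane=3, seed=2):
    """Count hits per section for all planes and for the planes that return."""
    rng = random.Random(seed)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(planes):
        # Hits land uniformly at random across the sections.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if it survives every single hit.
        if all(rng.random() >= LETHALITY[s] for s in hits):
            survivor_hits.update(hits)
    return all_hits, survivor_hits


all_hits, survivor_hits = observed_hits()
for section in SECTIONS:
    share_all = all_hits[section] / sum(all_hits.values())
    share_seen = survivor_hits[section] / sum(survivor_hits.values())
    print(f"{section:12} all planes: {share_all:.0%}  returning planes: {share_seen:.0%}")
```

Only the “returning planes” column is visible to an analyst on the ground, which is exactly why the missing engine hits are the informative part of the data.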

After a new recommendation based on that insight was put into effect, they saw a major improvement in the statistics. It turns out the answer was hiding in plain sight, as so often happens.

The book is full of similar stories and mathematical tricks: the author discusses taxation (and the Laffer curve), the problem with precision (and why the Earth may appear flat), why 0.999… = 3 × 1/3 = 1, limits and infinitesimals, and the problems with linear regression and projections. He shows how a linear projection “proves” that 100% of Americans will be obese by 2048, and then points out that the same kind of projection has 100% of black men reaching that point only in 2095, two claims that cannot both be true.
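For readers who want the arithmetic behind the 0.999… claim, here it is in two lines. The first is the 3 × 1/3 argument mentioned above; the second is the standard geometric-series version, which I’ve added myself (it is not necessarily how the book presents it) and which shows where limits come in: an infinite decimal is defined as the limit of its partial sums.

$$
\begin{aligned}
0.999\ldots &= 3 \times 0.333\ldots = 3 \times \tfrac{1}{3} = 1 \\
0.999\ldots &= \sum_{k=1}^{\infty} \frac{9}{10^{k}} = \frac{9/10}{1 - 1/10} = 1
\end{aligned}
$$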

I don’t want to spoil too much of the book, but I hope you get the idea. The author presents plenty of examples showing how traditional, widely used tools can produce unintuitive and sometimes even ridiculous results. He covers how statistics are used in basketball (to detect players with a “hot hand”), in pharmaceutics (to test new medicines), in lotteries (with a detour into the Monty Hall problem) and in social networks, and he discusses statistical significance, false positives and the danger they pose to every one of us.
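Since the Monty Hall problem comes up, a quick simulation is the easiest way to convince yourself of its counterintuitive answer. This Python sketch assumes the standard rules (the host knows where the car is, always opens a different door hiding a goat, and always offers the switch); the `monty_hall` function is my own illustration, not code from the book.

```python
import random


def monty_hall(switch, games=10_000, seed=3):
    """Fraction of games won under the standard Monty Hall rules."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens a door that is neither the contestant's pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Switch to the one door that is neither picked nor opened.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / games


print("stay:  ", monty_hall(switch=False))
print("switch:", monty_hall(switch=True))
```

Staying wins about a third of the time and switching about two thirds: the first pick is right only one time in three, and the host’s choice of door never moves the car.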

The book is quite entertaining and insightful. It’s easy to read and has many interesting, thought-provoking examples.

#### Have fun reading!

I highly recommend all three books. They discuss important problems we face as a society and suggest some ways out. None of them requires a strong math background, and all of them can be read recreationally. Nevertheless, the thoughts these books provoke may make you want to understand some of the unfamiliar concepts more deeply, or to revisit your point of view on things you already knew about.

If you liked the post and would like to be notified about the next ones, you can follow me on Twitter.