In 2012, the New York Times published a story under the headline "How Companies Learn Your Secrets." Among other things, the article discusses how and why a marketing team at Target tried to build a model to predict which shoppers were pregnant. Partway through the article, there is an anecdote:

About a year after Pole created his pregnancy-prediction model, a man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry, according to an employee who participated in the conversation.

“My…


Three Fundamental Things That Deep Learning Can’t Do

AI is a huge and vague term that means different things to different people, but everyone agrees about one thing: AI is the future. Soon enough, AI will be driving all of our Ubers, directing our Netflix specials, and delivering our Amazon Prime orders. Indeed, in many cases these things are already happening to some extent.

The crown jewel of modern AI hype is the family of machine learning models called Deep Neural Networks, or, more often these days, Deep Learning. …


In June 2017, a paper by Jardim et al. popped up at the National Bureau of Economic Research titled "Minimum Wage Increases, Wages, and Low-Wage Employment: Evidence From Seattle." This is the first of what will probably be many studies investigating the effects of Seattle's minimum wage changes, and it generated a lot of press in a short amount of time, with headlines ranging in political charge from CNBC's tame "Seattle's minimum wage hike may have cut wages and jobs: Study author" to the LA Times' more biting "Seattle's experience shows liberals are clueless about raising the minimum wage" and…


The paper was posted to NBER a few days ago and has gotten quite a bit of attention. Unfortunately, there doesn't seem to be an ungated version floating around right now, but many universities have access to NBER papers, so if you have a .edu email address, you might be able to get the paper by email by giving them that address.

The broad question that the authors set out to answer is: what are the economic outcomes for refugees in the United States? This question is split into roughly three parts.

  1. How do economic outcomes for refugees differ according…

A foray into the mathematics of NPS, complete with code for Excel and R

In my last piece about the Net Promoter Score and how it's not a very useful metric, partly because of the extremely wide confidence intervals it generates, I wrote:

Anyone using anything at least as powerful as Excel should have no problem producing [NPS] standard errors and confidence intervals and p-values and whatever else we normally report along with sample-based estimates of population statistics.

Since then, I’ve had a few people reach out and ask how to actually compute those confidence intervals, since it’s not necessarily obvious if you aren’t a statistician. So here is a little tutorial…
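The tutorial's own spreadsheet and R code are not reproduced here. As a rough illustration, below is a minimal Python sketch of one standard approach, assuming the usual NPS definition (promoters score 9 or 10, detractors 0 through 6) and a normal-approximation interval. It treats each response as +1 (promoter), 0 (passive), or −1 (detractor), so NPS is the mean of those values and its standard error is the square root of their variance divided by n. The function name `nps_with_ci` and the default `z=1.96` (a 95% interval) are illustrative choices, not necessarily what the tutorial uses, and the approximation can be poor for small samples.

```python
import math


def nps_with_ci(scores, z=1.96):
    """Return (nps, lower, upper) on the -100..100 scale.

    Hypothetical helper for illustration. `scores` are 0-10 survey responses;
    promoters are 9-10, detractors are 0-6, passives are 7-8.
    """
    scores = list(scores)
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one response")

    # Proportions of promoters and detractors in the sample.
    p_pro = sum(1 for s in scores if s >= 9) / n
    p_det = sum(1 for s in scores if s <= 6) / n
    nps = p_pro - p_det

    # Each response is +1, 0, or -1, so E[X^2] = p_pro + p_det and
    # Var(X) = E[X^2] - E[X]^2. The mean of n responses has variance Var(X)/n.
    var = p_pro + p_det - nps ** 2
    se = math.sqrt(var / n)

    # Normal-approximation interval, scaled to the usual -100..100 NPS range.
    return 100 * nps, 100 * (nps - z * se), 100 * (nps + z * se)


if __name__ == "__main__":
    # 50 promoters, 30 passives, 20 detractors out of 100 responses.
    print(nps_with_ci([10] * 50 + [8] * 30 + [3] * 20))
```

With 50 promoters, 30 passives, and 20 detractors, the sketch gives an NPS of 30 with a 95% interval running from roughly 14.7 to 45.3, which shows how wide these intervals get even with 100 responses.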


Update: For those who wish to know how to find confidence intervals for the Net Promoter Score, I’ve written a guide complete with a downloadable spreadsheet or R code for doing just that. Hopefully someone out there can use that information to improve their NPS reporting with some measures of statistical confidence.

People love the Net Promoter Score. According to the Wikipedia article on NPS, two-thirds of Fortune 1000 companies make use of it somehow. If you don't know about it, NPS is a survey-based management tool meant to quantify customer satisfaction and loyalty. The folks who sell the NPS…


If you want to know something about your customers or market, it seems pretty obvious that a good way to find out is to ask. And indeed, surveying is probably one of the most effective knowledge-generation activities a company can undertake, provided the survey is well executed. But a survey that is incompletely or badly executed is at best expensive and of questionable value, and at worst misleading and damaging.

Surveys come with a host of hidden costs, aside from the fees charged by vendors. First of all, the pool of survey participants is finite, and…


How to avoid letting a model cheat on you and break your heart

There are a lot of ways to screw up when you're trying to do data science. Some of them are very technical. Leakage is not one of those very technical screw-ups. In fact, you hardly have to know anything about mathematics or statistics or computer science at all to understand the basic premise of leakage. However, it is one of the sneakiest, most sinister, and most widespread mistakes a data scientist can make. The Handbook of Statistical Analysis and Data Mining Applications calls leakage one of the top ten data mining mistakes. I might put it at number one.

The most…


This is the second post in a series about why business leaders need to know a little data science if they want to use data science as a tool. In the first post, I wrote about how, without understanding a little about model error and how to evaluate model performance, you can unknowingly end up in a situation where your model sucks. Moreover, unscrupulous (or maybe just oblivious) data science snake oil salespeople can take advantage of your lack of understanding to make you think you've got a model that works great. …


“I’m not a math guy. I don’t need to look at graphs. Just give me the punchline.”

I've been in the data science biz for a few years now, and everywhere I go, this is what the big boss has to say. And that's fine: the bosses are busy people with a lot of responsibilities, and after all, if everyone loved poring over coefficient tables and plotting their own ROC curves, I'd be out of a job. Besides, in my experience we data science folks can be a little… rigorous. …

Colin Fraser

Data Scientist at Facebook
