Awesome Stories of Data Science

Dunnhumby

Clive Humby is a mathematician trained at the University of Sheffield. After resigning from CACI, a multinational professional services and information technology company, he and his wife Edwina Dunn started their own company, Dunnhumby, in 1989 in the kitchen of their home in Chiswick, West London. The company specialised in analysing data to work out the spending patterns of consumers. Humby’s insight was that retailers at the time knew how much of a particular product they were selling, but did not know which combinations of products consumers typically bought together. This is close to what we now call market basket analysis, and Dunnhumby was able to provide exactly this information.
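To make the idea of basket analysis concrete, here is a minimal sketch in Python. It is not Dunnhumby’s method: the baskets and the function names `pair_counts` and `support` are invented for illustration. It simply counts how often two products appear in the same basket.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(pair, baskets):
    """Fraction of all baskets that contain both products of the pair."""
    first, second = pair
    hits = sum(1 for basket in baskets if first in basket and second in basket)
    return hits / len(baskets)


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count, support(top_pair, baskets))
```

In this toy data, bread and butter are bought together more often than any other pair. That is the kind of combination a retailer could not see from product-level sales totals alone.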

Their first client was the UK food wholesaler Booker. Later, in 1994, Humby met Grant Harrison, the key person behind Tesco’s new loyalty card project, at a conference where Humby was speaking. Harrison told him that Tesco, then second in the UK retail market behind Sainsbury’s, wanted to create a new loyalty card to lift its sales and broaden its market share. The two agreed to run a trial before signing up for the whole project. Subsequently, Dunn and Humby were invited to present the findings from the initial trial to Tesco’s directors. After they had finished their report, an awkward silence followed for more than a minute. It was eventually broken by Tesco’s then chairman, Lord MacLaurin, who declared:

“What scares me is that you know more about my customers after three months than I know after 30 years”.

The couple had shown the Tesco board that their tiny business had the software and skills to do something the supermarket group had not been able to do for itself: work out almost exactly what Tesco’s customers were buying. Tesco quickly gave them a long-term contract and used their expertise to launch the Tesco Clubcard, the world’s first supermarket loyalty card. It is estimated that since its introduction, the loyalty scheme has saved Tesco £350m a year on expensive blanket marketing campaigns. In just over a year it enabled Tesco to overtake Sainsbury’s and become the UK’s largest retailer. Tesco bought a 53% stake in Dunnhumby in 2001 for a reported £30m, and increased its holding to 84% in 2006. Dunnhumby used data analytics to lift retail performance by understanding customers’ behaviour more deeply.

Oakland Athletics

The star of this story is William Lamar “Billy” Beane, an American former professional baseball player. As a young man, Billy faced a dilemma: accept a scholarship to Stanford or start a professional baseball career, since the scouts who came to see him predicted that he would be the next star of the game. In short, he trusted the scouts and chose baseball. However, his playing career, which began with the New York Mets, failed to meet the scouts’ expectations. After five disappointing seasons, he joined the Oakland Athletics front office as a scout in 1990. Seven years later he was named general manager of the team. In the 2001 season Oakland won 102 games, but afterwards they lost three key players: Jason Giambi moved to the New York Yankees, Johnny Damon signed with the Boston Red Sox and Jason Isringhausen joined the St. Louis Cardinals.

Billy Beane had to replace them for the 2002 season, but the team did not have enough money to sign the best players. His funds were so limited that he had to build an entire team on a budget that rich teams could spend on a single player. Beane worked with Paul DePodesta, a Harvard graduate who showed how sabermetric principles could be used to find the best players to complete the team despite the financial constraints. Beane used DePodesta’s expertise to pursue under-the-radar signings based on statistical analysis of player performance, bringing in players such as Scott Hatteberg, David Justice and Ray Durham, among others. The new-look Athletics, despite a comparative lack of star power, surprised the baseball world by becoming the first team in the 100-plus years of American League baseball to win 20 consecutive games, between August 13 and September 4, 2002.

The lesson here is how data analytics can be applied to baseball, evaluating players by their statistical profiles instead of relying on human instinct. Although the Athletics did not go on to win the championship, they overcame their financial constraints to build a highly competitive team using analytics, and in the process broke a long-standing record. The Athletics’ season was the subject of Michael Lewis’ 2003 book Moneyball: The Art of Winning an Unfair Game and of a film adaptation, Moneyball, released in 2011. After the 2002 season, the Boston Red Sox offered Beane $12.5 million to become their general manager, which would have made him the highest-paid general manager ever, but he declined.

FiveThirtyEight

Nathaniel “Nate” Silver, an economics graduate of the University of Chicago, had developed PECOTA, a system for forecasting the performance and career development of Major League Baseball players, before he started his own website, FiveThirtyEight.com, which analysed and predicted the result of the 2008 United States presidential election. His forecast correctly predicted the winner in 49 of the 50 states as well as the District of Columbia. The only state he missed was Indiana, which went for Barack Obama by 1%. He also correctly predicted the winner of all 35 Senate races that year. Following that result, Time named him one of The World’s 100 Most Influential People in April 2009. His forecasts, which relied on demographic analysis, proved substantially more accurate than those of many professional pollsters and earned him visibility and professional credibility. On June 1, 2008, in an article in the New York Post, he explained why he found politics more interesting than predicting the performance of baseball players:

“In polling and politics, there is nearly as much data as there is for the first basemen. In this year’s Democratic primaries, there were statistics for every gender, race, age, occupation and geography, reasons why Clinton won older women, or Obama took college students. But the understanding has lagged behind. Polls are cherry-picked based on their brand name or shock value rather than their track record of accuracy. Demographic variables are misrepresented or misunderstood.”

Nate Silver kept using his analytics system to make election predictions on his blog. In the 2012 US presidential election between Barack Obama and Mitt Romney, Silver correctly predicted the winner in all 50 states and the District of Columbia. The lesson here is that political outcomes can be measured with data analytics and forecast with good accuracy. Since its founding on March 7, 2008, FiveThirtyEight has received many awards. In September 2008 it became the first blog ever selected as a Notable Narrative by the Nieman Foundation for Journalism at Harvard University, and it won a Webby Award for “Best Political Blog” from the International Academy of Digital Arts and Sciences, among others. FiveThirtyEight partnered with The New York Times for three years, from 2010 to 2013, before being acquired by ESPN, where it relaunched in 2014 with Nate Silver as editor-in-chief. Under ESPN ownership, FiveThirtyEight has covered a broad spectrum of subjects including politics, sports, science, economics and popular culture.

LinkedIn

In the internet era, data keeps getting bigger. Internet companies such as Google, Facebook, Twitter and LinkedIn all face the problem of handling it. Thanks to data analytics, however, these companies have been able to build the data products that many of us enjoy when using their services: Google’s search engine finds the best answer to a question, and social networks suggest new friends to connect with. With that in mind, here is a story from LinkedIn that served as the opening example of Thomas H. Davenport and D.J. Patil’s Harvard Business Review article, “Data Scientist: The Sexiest Job of the 21st Century”.

Jonathan Goldman, who holds a PhD in physics from Stanford, arrived at LinkedIn in June 2006. He was expected to help LinkedIn’s user growth take off. At the time the company had just under 8 million user accounts, but users did not seem to be seeking out connections with the people who were already on the site. One LinkedIn manager described the situation this way:

“It was like arriving at a conference reception and realising you don’t know anyone. So you just stand in the corner sipping your drink — and you probably leave early.”

Something was apparently missing from the social experience on the site. Goldman started to explore the data and discovered how people were connected to each other. He began to see possibilities and formed hypotheses. He then tested them and found patterns that allowed him to predict whose networks a given profile would land in. Eventually he could imagine a new feature that might provide value to users. But LinkedIn’s engineering team, caught up in the challenges of scaling up the site, seemed uninterested. Some were openly dismissive of Goldman’s ideas. Their reasoning was: why would users need LinkedIn to figure out their network for them when the site already had an address book importer that could pull in all of a member’s connections?

Fortunately, Reid Hoffman, LinkedIn’s cofounder and CEO at the time, had faith in the power of analytics because of his experience at PayPal, and he granted Goldman a high degree of autonomy. He allowed Goldman to publish small modules in the form of ads on the site’s most popular pages. Through one such simple module, Goldman began to test what would happen if the site presented users with the names of people they had not yet connected with but seemed likely to know. Within days it was obvious that something remarkable was taking place: the click-through rate on those ads was the highest ever seen. It did not take long for LinkedIn’s top managers to recognise a good idea and make it a standard feature. That’s when things really took off. “People You May Know” ads achieved a click-through rate 30% higher than the rate obtained by other prompts to visit more pages on the site. Data products of this kind have helped many internet companies achieve their most promising growth. Netflix and Spotify, for example, also built recommendation features on data analytics to help users find the movies or music they really like. Such features improve user loyalty by creating a value-added relationship.
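One simple way to guess whom a member “seemed likely to know” is to count mutual connections. The sketch below is not Goldman’s actual method, which this story does not describe. The network data and the function name `suggest` are hypothetical, and real systems use many more signals.

```python
from collections import Counter

# Hypothetical network: each member maps to the set of their connections.
connections = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}


def suggest(member, connections, limit=3):
    """Rank non-connections by how many connections they share with member."""
    direct = connections[member]
    mutual = Counter()
    for friend in direct:
        for candidate in connections[friend]:
            # Skip the member themselves and people already connected.
            if candidate != member and candidate not in direct:
                mutual[candidate] += 1
    # Most mutual connections first; ties broken alphabetically.
    ranked = sorted(mutual.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


print(suggest("ann", connections))
```

Here “dan” is suggested to “ann” first because they share two connections, while “eve” shares only one.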

Limitless Data Science

Data science and data analytics have solved a great many problems. These four stories are only a small part of a much bigger picture of how people, working with data, have overcome genuinely hard problems. Data science can be applied everywhere from business to politics and from sport to the internet. The field is still growing and finding its place in everyday human activity. We hope that data science can contribute even more in the future: that the methodology will become more sophisticated, the technology more advanced, and people’s lives better.