The Blind Side of Data-Driven Design
Big data and data-driven design are often regarded as ‘the goose that laid the golden eggs’ in the tech industry. Some companies, like Netflix and Amazon, have successfully created new business models using elaborate data mining tools. Even so, I think they were partly just lucky. In fact, I’m quite skeptical of data-driven design.
Let me start this essay with my first startup experience. While I was preparing my application for a design program, I used to attend a user experience design meetup, where I had a chance to meet a former Samsung employee who had quit his job to start his own company. Having worked in HR, he was highly interested in data mining and career consulting. With an ambitious dream, he decided to create a big data-based service that would help college students find the right career and coach them along the way. It was one of the most attractive business ideas I had ever heard. I was deeply impressed by his goal and philosophy, so I worked on his project for free every weekend.
When I first saw his minimum viable product, however, my expectations turned to doubt. The algorithm worked like this: a user enters a profile and the career they want, and the solution calculates their current probability of getting that job and suggests a set of actions to increase it. It mined data from 300 million CVs collected from LinkedIn, Facebook, and many other social media and headhunting websites.
Well, the solution might be able to find correlations between occupations and qualifications or experience, but how do these figures help students? Besides, what the system suggested to me was complete nonsense. When I entered that I wanted to be a UX designer, the algorithm told me that my chance of becoming one would be 10% higher if I got a degree from Stanford University. The figure only said that Stanford graduates showed up more often among UX designers in the data; it said nothing about why. Should I REALLY transfer to Stanford just because it would raise my chances of becoming a UX designer? Of course not.
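To make the point concrete, here is a minimal, hypothetical sketch of a purely correlational recommendation of this kind. It is not how that startup’s product actually worked: the toy records, the function names `job_rate` and `suggest_action`, and the “lift over the baseline” measure are all my own assumptions for illustration. It simply picks whichever school’s graduates hold the target job most often, with no idea why.

```python
# Hypothetical toy CV records: (school, current job). Invented for illustration only.
CVS = [
    ("Stanford", "UX designer"),
    ("Stanford", "UX designer"),
    ("Stanford", "engineer"),
    ("State U", "UX designer"),
    ("State U", "engineer"),
    ("State U", "engineer"),
    ("State U", "engineer"),
    ("State U", "teacher"),
]


def job_rate(records, job):
    """Fraction of records whose job equals `job` (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for _, j in records if j == job) / len(records)


def suggest_action(records, job):
    """Return the school whose graduates hold `job` most often, and how many
    percentage points above the overall rate that is. Purely correlational:
    nothing here models *why* those graduates ended up in that job."""
    baseline = job_rate(records, job)
    schools = {school for school, _ in records}
    best = max(schools, key=lambda s: job_rate([r for r in records if r[0] == s], job))
    lift = job_rate([r for r in records if r[0] == best], job) - baseline
    return best, round(lift * 100)


print(suggest_action(CVS, "UX designer"))  # ('Stanford', 29)
```

The model “recommends” Stanford only because two of its three Stanford graduates happen to be UX designers; the context behind each record never enters the calculation.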
I think this is a great example of the metadata trap. Metadata is like a set of hashtags used to categorize data, and most databases are organized by it. For example, when you search #CMU on Facebook, you see thousands of photos and posts, each telling a completely different story. When big data is processed through metadata, however, that context is stripped away and very different pieces of data are lumped together under the same attribute. This makes it easier to do the math and build a neat model explaining the relationships between variables. As David Cole has noted, a large enough volume of metadata can tell you everything about someone’s life without any of its content.*
Consequently, ‘refined’ big data sometimes produces unexpected, even absurd results, such as my career recommendation or the case of Sarah Wysocki, a Washington, D.C., teacher who was fired after a low evaluation. Michelle Rhee’s IMPACT evaluation model could not account for the possibility that her students’ initial scores had been inflated, and as a result Sarah received a low evaluation score despite her excellent teaching.** One question arises at this point: did the data-driven solution really improve the quality of public education? I suspect it only improved the visible, quantified figures. Students’ grades never tell us what happened before.
Data is the quantified consequence of an event, and it is a great tool for seeing how causes and effects relate. But to understand a phenomenon accurately, we also need to understand the background of the event. Without context, data-driven design cannot create the best solution for people. Yet in most cases, big data analysis cannot account for every contextual variable, because doing so would make the model far too complicated. In other words, big data on its own cannot be a tool for human-centered design. This is why I no longer believe in big data-based design.
* Cole, David. “‘We Kill People Based on Metadata.’” The New York Review of Books, May 10, 2014. http://www.nybooks.com/daily/2014/05/10/we-kill-people-based-metadata/
** O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.