We Still Need Economic Models to Learn From Big Data

Ben Charoenwong
3 min read · Dec 8, 2015

Source: http://thenextweb.com/wp-content/blogs.dir/1/files/2014/03/data-flow.jpg

As data storage becomes cheaper, an important question arises: how do we convert raw data into business insights? Descriptive statistics are useful, but they can often be misleading. A summary of 12 terabytes of regional data sampled at hourly frequencies may show that a rooster crows every morning before the sun rises, but does this suggest that if we strangle the rooster the sun will stop rising? Of course not.

Yet, all over the media, we see quotes like “being rich is hard because you have to buy a house, a car, and a yacht, all while sending your kids to private school.” Both the rooster and this quote highlight the fact that in-sample correlations and predictive ability alone are not insights. To actually learn something from correlations in the data, we must have a mental model of how we think the world functions. In business, an economic model or hypothesis facilitates this synthesis of contextual information with data. It is the common sense that says strangling the rooster will have absolutely no effect on whether the sun will rise.

Drawing business insights from data is fundamentally deeper than just filtering signal from noise; it is about knowing what to do with the signal after the noise has been purged. The signal might be that companies with more female board members have a 36% higher return on equity than similar companies with mostly male board members, but it does not mean that replacing male board members with female counterparts necessarily improves performance. Perhaps companies that have more female board members also tend to be companies that are more informed about current fads, so they simultaneously have more female board members and choose to sell more fashionable products.
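
To make the board example concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the hidden trait `trend_aware`, the effect sizes, and the noise levels are not estimates from any real data. By construction, the female share of the board has no causal effect on return on equity, yet the two end up correlated because the hidden trait drives both.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_boards(n=20000, seed=0):
    """Simulate firms where a hidden trait drives both variables (all numbers hypothetical)."""
    rng = random.Random(seed)
    firms = []
    for _ in range(n):
        trend_aware = rng.random() < 0.5  # hypothetical hidden trait
        # The trait raises the female share of the board (clipped to [0, 1])...
        female_share = min(1.0, max(0.0, 0.2 + 0.2 * trend_aware + rng.gauss(0, 0.1)))
        # ...and separately raises ROE. female_share never enters this line.
        roe = 10 + 4 * trend_aware + rng.gauss(0, 2)
        firms.append((trend_aware, female_share, roe))
    return firms

def correlations(firms):
    """Return the pooled correlation and the correlation within each trait group."""
    overall = pearson([f[1] for f in firms], [f[2] for f in firms])
    within = {}
    for flag in (False, True):
        group = [f for f in firms if f[0] == flag]
        within[flag] = pearson([f[1] for f in group], [f[2] for f in group])
    return overall, within

overall, within = correlations(simulate_boards())
print(f"pooled correlation: {overall:.2f}")
print(f"within trend-unaware firms: {within[False]:.2f}")
print(f"within trend-aware firms: {within[True]:.2f}")
```

The pooled correlation comes out clearly positive, while the correlation inside each group hovers around zero. A data scientist can only run the second comparison if someone supplies the idea that trend awareness might matter, and that idea comes from a model of the business, not from the data.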

Data scientists can help with the filtration process to estimate that 36%, but interpreting the signal requires the corporate decision-maker to have a deeper understanding of economics and the business climate. The challenge boils down to distinguishing causation from mere statistical association. Since running large-scale experiments is usually not an option in most business settings, an economic model provides a lens through which to view the data and infer potential causal relationships.

To get a clear picture of why we need models, we only need to realize that people and companies tend to optimize before making decisions. Before concluding that a corporate training program is successful, it is paramount to consider whether the people who showed up were the ones who stood to gain the most in the first place. An executive who understands this will realize that unless the estimated statistical model accounts for this selection, the effect of the corporate training is likely overstated. Realizing this will allow her to adjust her expectations when deciding whether to implement a new program.
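
The selection problem in the training example can be sketched the same way. This is an illustrative simulation under assumed numbers, not an analysis of any real program: each employee has a hypothetical individual `gain` from training, and employees with larger gains are more likely to attend. The naive estimate compares attendees with non-attendees; the benchmark is the average gain if everyone were trained.

```python
import random
import statistics

def simulate_training(n=20000, seed=0):
    """Compare a naive attendee-vs-rest estimate with the true average gain.

    All parameters are hypothetical and chosen only to illustrate selection.
    """
    rng = random.Random(seed)
    gains, attended, skipped = [], [], []
    for _ in range(n):
        baseline = rng.gauss(50, 5)            # performance without training
        gain = max(0.0, rng.gauss(2.0, 2.0))   # what training would add for this person
        gains.append(gain)
        # People optimize: the more they expect to gain, the likelier they show up.
        shows_up = rng.random() < min(1.0, gain / 6.0)
        if shows_up:
            attended.append(baseline + gain)
        else:
            skipped.append(baseline)
    naive_effect = statistics.mean(attended) - statistics.mean(skipped)
    average_gain = statistics.mean(gains)
    return naive_effect, average_gain

naive_effect, average_gain = simulate_training()
print(f"naive estimate from attendees vs. non-attendees: {naive_effect:.2f}")
print(f"average gain if everyone were trained: {average_gain:.2f}")
```

In this setup the naive estimate exceeds the average gain, because it reflects only the people who chose to attend. Rolling the program out to everyone would deliver something closer to the smaller number.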

As the cost of processing and storing data plummets, we will have a much larger number of “signals”. But knowing what to do with all these signals will still require an economic model. Data science, by itself, will not be able to save the day. Decision-makers will need to bear some of the burden of turning data into insight, and although maintaining databases may require some specialized knowledge, having a sense of how the world works hardly requires a Ph.D. in computer science.

Ben Charoenwong

Assistant Professor of Finance at the National University of Singapore. Michigan and Chicago alum. I write random musings and complain about business media.