What the $&@#^ is Applied Big Data?

Feb 26, 2013

We’ve frequently referred to Applied Big Data as one of our core investment themes. It seemed about time to drill down and discuss how we think about it in more detail. The simplest answer is that it means data-driven applications at massive scale.

For those who haven’t already figured out what Big Data is (in spite of the hype), I’ll summarize it in layman’s terms. Big Data refers to the extraordinary wealth of data that the current world generates. As we used to say at Netscape in the mid-90s, Internet scale is WAY bigger than Enterprise scale. With Facebook at 1 billion users and 2 billion smartphones generating clickstream data, this statement is doubly true. Broad sensor networks, from logistics infrastructure to street parking, similarly generate tons of useful data.

Traditional Enterprise data was structured, meaning it fit neatly into rows and columns like an Oracle database (or even a spreadsheet). Facebook’s relationship data and Google’s logfile data are unstructured and no longer fit that approach. Open Source tools like Hadoop, Cassandra, MongoDB and Hive were developed to store, manage, search and analyze both structured and unstructured data at massive scale.

There is clearly much good work and many investment opportunities in the platforms themselves, but years of innovation history suggest that Applied Big Data is as big an opportunity as the underlying platforms. Our focus at Costanoa is on data-driven applications, marrying: #1 a solution to a large business problem; #2 best-in-class machine learning and predictive analytics; AND #3 proprietary data. When these elements are combined they can create a tremendous value proposition for customers and provide the foundation for a great business.

There are several good examples, some well known and some less so. The most notable company that fits this profile today is Splunk. It provides a highly scalable platform for companies to manage their logfiles. My own thinking on Applied Big Data originated with my investment behind Matt Glickman and Mark Selcow in the Series A of Merced Systems in 2001. They believed they could use data to help improve the productivity of human capital. By focusing initially on the call center, where every action is already instrumented, they were able to prove value very quickly. Companies like Dell, Sprint, and Delta Airlines successfully reached for the product when in operational crisis. As a result, Merced Systems was sold for $192m to Nice Systems in 2011, having raised a total of $2.5m. The key lesson here was “in-stream data aggregation”: finding places where processes are instrumented, but existing data assets are messy, hard to integrate, or simply left on the cutting room floor.

A canonical example in the Costanoa portfolio is Demandbase, which provides a Business-to-Business (B2B) Marketing Optimization platform leveraging proprietary data that appends IP addresses to traditional business information. As a result, customers like Salesforce.com and Hewlett Packard can identify anonymous users visiting their site by business identity and type, target their messaging, improve conversion rates and generate more qualified leads. Demandbase recently rolled out a B2B product that targets advertising by company, enabling customers to increase engagement with their key prospects. The company began by aggregating existing data and spent years curating it. But the part that got us so excited is that the majority of the data has now been generated by Demandbase and its customers as they interact with the system, providing classic network effects. At Costanoa, we’ve invested with similar theses in companies like Kenna Security (fka Risk I/O) for vulnerability intelligence management, Guardian Analytics for prevention of online banking fraud, and Return Path in email intelligence.

We’ve learned a few things in pursuit of this hypothesis.

All of these companies started with a high value business or technical problem. Demonstrating early that this data-driven approach creates value for customers is a critical first step.
In-stream data aggregation, tapping into an existing data flow, enables each user to get much more out of the system than they put in. Expecting users to input data is a long putt, and a bet we’d rather not make.
In order to create real enterprise value, it is important to aggregate, curate, and originate data. A clever data origination process creates network effects.
Best in class predictive analytics and machine learning can provide exceptional value to customers, especially when operating on proprietary data. To the question of which is better, great algorithms or great data, we say BOTH.

When these elements are combined, cloud-based products and services can create great value for customers and let entrepreneurs build important and valuable companies. We’re looking for more such opportunities.

Originally published at www.costanoavc.com.

Written by Greg Sands