Cologne, Germany. By MICHAEL HAUSENBLAS, 05/2013—donated to the public domain.

Ubi fumus, ibi ignis. (Where there is smoke, there is fire.)

Thoughts on TCO of data-driven business decisions.

Michael Hausenblas
Large-scale Data Processing
3 min read · May 20, 2013


Recently, after a customer engagement, I started to spend a bit more time thinking about aspects of the total cost of ownership (TCO) of data-driven businesses. Some insights and even more questions follow.

Claremont Resort, CA.

Let’s step back a bit. In 2008, leading database researchers and practitioners, including Eric Brewer, Alon Halevy, Mike Stonebraker, and Gerhard Weikum, met at the Claremont Resort in Berkeley, California, to discuss the state of the field. One insight particularly caught my attention:

Data analysis as a profit center: In traditional enterprise settings, the barriers between the IT department and business units are quickly dropping, and there are many examples of companies where the data is the business. As a consequence, data capture, integration and analysis are no longer considered a business cost; they are the keys to efficiency and profit. The industry supporting data analytics is growing quickly as a result. …

At the same time, a growing number of non-technical decision-makers want to ‘get their hands on the numbers’ as well.

Fast-forward to 2013 and to a seemingly simple question: while I agree with the sentiment wholeheartedly, there are still costs associated with ramping up and operating the Big Data infrastructure, right? So I was wondering, how are enterprises spending their $$$ in this context? How much goes into hardware and how much into software (and, of course, into the services/training/education needed to use both)?

So I started to dig around a bit and found a nice paper from the late 1980s called An Empirical Analysis of Software and Hardware Spending that suggests the following:

S-shaped budget curve, from ‘An Empirical Analysis of Software and Hardware Spending’, 1989.

Hmm. So it seems that over time hardware (HW) and software (SW) costs have settled into a rough 1:1 ratio, while the majority of spending goes into maintenance. I suppose operations is included in the latter category.

Now, to see where we stand some 20 years later, I tried to source some more recent stats, including OECD and Gartner data. I can tell you, it ain’t easy. However, the results of this (admittedly not very scientific) survey show essentially two things:

  • The revenue per HW unit sold over the past 40 years seems to be pretty stable, at around $1,000.
  • The SW:HW cost ratio seems to be rather constant as well, at around 0.6, meaning that for every $1 spent on SW, some $1.60 to $1.70 goes into hardware (a quick back-of-the-envelope check follows below).
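
To make the ratio concrete, here is a quick sketch in Python; the 0.6 ratio is the one from the survey above, while the $5M combined budget is purely a made-up example for illustration.

```python
# Back-of-the-envelope check of the SW:HW spending ratio.
# The 0.6 ratio comes from the (admittedly unscientific) survey above;
# the $5M combined budget is purely illustrative.

sw_to_hw_ratio = 0.6          # $ spent on SW per $ spent on HW
budget = 5_000_000            # hypothetical combined HW + SW budget

# If sw = 0.6 * hw and sw + hw = budget, then hw = budget / 1.6.
hw_spend = budget / (1 + sw_to_hw_ratio)
sw_spend = budget - hw_spend

print(f"HW: ${hw_spend:,.0f}  SW: ${sw_spend:,.0f}")
print(f"For every $1 on SW, ${1 / sw_to_hw_ratio:.2f} goes into HW")
```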

Of course, you don’t have to believe me. The raw data of the survey is available as a Google spreadsheet under a CC BY 2.0 license.

Another interesting aspect is how storage costs have plummeted. This is sort of incredible: have a look at A History of Storage Cost, which essentially states that the cost per GB came down from some $193,000 in 1980 to a mere $0.07 in mid-2009.
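
Just to get a feel for the magnitude, here is a small sketch that turns those two data points into an overall factor and an implied annual rate of decline; the endpoints are the ones quoted above, and the assumption of a smooth exponential decline is of course a simplification.

```python
# Rough rate of decline of storage cost per GB, based on the two
# endpoints quoted above: $193,000/GB in 1980 and $0.07/GB in mid-2009.
# Assumes a smooth exponential decline, which is a simplification.

cost_1980 = 193_000.0   # USD per GB
cost_2009 = 0.07        # USD per GB
years = 2009 - 1980     # roughly 29 years between the two data points

total_factor = cost_1980 / cost_2009
annual_factor = total_factor ** (1 / years)

print(f"Overall drop: roughly {total_factor:,.0f}x")
print(f"Implied annual decline: about {(1 - 1 / annual_factor) * 100:.0f}% per year")
```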

Now, given the above, the MapReduce/Hadoop way of thinking (storage is cheap, so let’s keep all the data around in raw form; we’ll surely figure out how to query it, and hence answer business questions, tomorrow or the day after at the latest) makes increasing sense. There is a reason why, say, $1M lets you store and process some 20PB in Hadoop, while the same budget buys you 0.5PB to 1PB in a traditional SAN storage or NAS filer environment.
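
Expressed as cost per terabyte, the gap looks roughly like this; the $1M budget, the 20PB Hadoop figure, and the 0.5PB to 1PB SAN/NAS range are the ones quoted above, the rest is simple arithmetic.

```python
# Rough cost-per-TB comparison, using the figures quoted above:
# $1M buys roughly 20PB in Hadoop vs. 0.5-1PB in a SAN/NAS environment.

budget = 1_000_000          # USD
hadoop_pb = 20              # PB for that budget in Hadoop
san_nas_pb = (0.5, 1.0)     # PB range for the same budget in SAN/NAS

tb_per_pb = 1000

hadoop_cost_per_tb = budget / (hadoop_pb * tb_per_pb)
san_nas_cost_per_tb = [budget / (pb * tb_per_pb) for pb in san_nas_pb]

print(f"Hadoop:  roughly ${hadoop_cost_per_tb:,.0f} per TB")
print(f"SAN/NAS: roughly ${san_nas_cost_per_tb[1]:,.0f} to ${san_nas_cost_per_tb[0]:,.0f} per TB")
```

In other words, we are talking about some $50 per TB versus $1,000 to $2,000 per TB, a factor of 20 to 40.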

Is this the final answer? I guess not. There are a number of factors one needs to take into account, including the required knowledge, both in terms of the underlying technology and of operations. But given that it also took the relational database community, back in the 1980s, a few years to establish profiles such as the DBA and to ramp up the general availability of people with the right skill set, I think it’s fair to claim that we will get there, sooner rather than later.
