Know the Value-Per-Byte of your Data

Dhruba Borthakur
Published in 97 Things, Jun 28, 2019
Cartoon https://www.silicon.co.uk/workspace/know-much-personal-data-worth-154406

As a data engineer at various data-driven software companies, I saw that new technologies like Hadoop and AWS S3 enabled product teams to store a lot of data. Compared to earlier systems, these new systems reduced the Cost-Per-Byte of data so much that it became economically feasible to store terabytes of data without burning a hole in your pocket. It was easy to calculate this metric: divide the total cost of storing your data set by its total size, and you have your Cost-Per-Byte metric.

Product engineers started to log every event in their application without a second thought: “I will log fine-grained details about the event although I really need only one small piece of the event. It is cheap to log, so why bother to reduce the size of my log record?”

We data engineers were thrilled to flaunt our terabyte-size datasets compared to traditional database administrators, who typically managed up to a few hundred GB of data. General Electric’s locomotives generate 1 TB on a single freight route. A Boeing 787 generates half a terabyte of data per flight. And data engineers help manage this data. This was in the mid-2010s, when enterprises leveraged the rapidly diminishing Cost-Per-Byte to practically never delete their log data (other than for compliance reasons).

Fast forward to the late 2010s. Today, I am not challenged by the size of the data that I need to maintain. It is the value that the enterprise extracts from the data that is important to me. What insights are we able to extract from our datasets? Can I use the data when I need it, or do I have to wait for it? These questions are best captured by a new metric, Value-Per-Byte.

For my enterprise, I have my own way to compute Value-Per-Byte. If any query touched one specific byte of data, then the value of that byte is 1. If a specific byte is not touched by any query, then the value of that byte is 0. I compute my Value-Per-Byte as the percentage of unique bytes that were used to serve any query. For my multi-terabyte dataset, I found that my Value-Per-Byte is 2.5%. This means that for every 100 bytes of data that I help manage, I am only using the information stored in 2.5 bytes.
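
To make the calculation concrete, here is a minimal Python sketch of the idea. It assumes, purely for illustration, that you can extract from your query logs the byte ranges each query read, expressed as half-open `(start, end)` offsets into the dataset. The names `unique_bytes_touched` and `value_per_byte` are hypothetical, and this is not necessarily how the metric was computed for the dataset described above.

```python
from typing import Iterable, Tuple

# Half-open [start, end) byte offsets read by a query (an assumed representation).
ByteRange = Tuple[int, int]


def unique_bytes_touched(ranges: Iterable[ByteRange]) -> int:
    """Count the distinct bytes covered by the ranges, merging overlaps."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(ranges):
        if end <= start:
            continue  # skip empty or malformed ranges
        if current_end is None or start > current_end:
            # This range is disjoint from the running one: close out the old one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running range.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def value_per_byte(dataset_size: int, ranges: Iterable[ByteRange]) -> float:
    """Percentage of the dataset's bytes that were read by at least one query."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    touched = min(unique_bytes_touched(ranges), dataset_size)
    return 100.0 * touched / dataset_size


if __name__ == "__main__":
    # Three queries over a 1,000-byte dataset; the first two overlap.
    queries = [(0, 10), (5, 20), (100, 105)]
    print(value_per_byte(1000, queries))  # 25 unique bytes out of 1,000 -> 2.5
```

A byte read by many queries still counts only once, which is why the overlapping ranges are merged before counting. In practice you would likely track usage at a coarser granularity (columns, files, or blocks) rather than individual byte offsets.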

What is the Value-Per-Byte of your enterprise? You might calculate it differently, but if you can increase the Value-Per-Byte of your system, you can positively impact the data-driven decisions in your enterprise.


Dhruba is the CTO at Rockset (http://rockset.com). Rockset is a serverless cloud service that powers Operational Analytics.