Both Appealing and Difficult to Digest — The H’s in Big Data

Henry Zhao
Published in CISS AL Big Data
3 min read · Dec 12, 2023

Big Data has earned, and continues to earn, its fame through the “V’s” that describe its vast extent. However, V isn’t the only letter that can describe this field: there are also 3 H’s, which capture its vast expanse and the convenience it brings us, its responsiveness to new inputs, and its usefulness, as seen in Fig. 1.

Fig. 1: Depiction of how Big Data is being currently applied

The first H in Big Data stands for “heavy”. Big Data’s expanse lets us access data from nearly every kind of source, including documents, internet clickstream logs, and social media networks. As a result, it is heavily loaded with datasets, variables, and numbers from studies and research across the world. Big Data has developed so rapidly that it gives us a great deal of freedom to explore various datasets. Many people, not used to this freedom, still feel that the data they can access is limited and that they’ll only be able to collect a small number of statistics; in the end, they usually act on this presumption and collect only minimal amounts of data. In addition, people who do recognize the freedom and convenience that Big Data brings will likely find it hard to interpret the statistics it gives them access to. Datasets used for studies usually contain a large number of variables to consider; combined with a large number of samples, any way of expressing these data (text, numbers, images, etc.) is quite difficult for an average person to digest in a short period. The heaviness of Big Data gives us the convenience of accessing data that was out of reach in the past, but people still need time to accept that the amount of data available to us has grown exponentially over the past years.

The second H stands for “hyperreactive”, indicating how sensitive Big Data is to the addition of new samples. A prime example is Oren Etzioni’s model for predicting plane ticket prices. Etzioni began with a model built on 12,000 samples that he obtained by “scraping” information from a travel website over 41 days. The model had no understanding of the variables that go into airline pricing decisions and could only make generalized predictions about the ticket price of a particular flight. However, as the model evolved and received information from one of the flight reservation databases, it could predict the price of every seat on every flight for most routes in American commercial aviation. With his price prediction model, Etzioni demonstrated how adding samples to existing Big Data can drastically improve the precision of the predictions being made. In the field of Big Data this is often the case: extending the range of data can reveal how accurate a model really is and make it considerably more accurate and precise.
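To make the effect of adding samples concrete, here is a minimal sketch in Python. It is not Etzioni’s model: the pricing rule, the noise level, and every function name are invented for illustration, and the data are synthetic. It fits a straight line to noisy “ticket prices” and reports how far the fitted line is from the true rule as the number of samples grows.

```python
import random

def true_price(days_before):
    # Hypothetical pricing rule with invented numbers (not real fare data):
    # the fare rises as the departure date approaches.
    return 400.0 - 5.0 * days_before

def sample_prices(n, rng):
    # Draw n noisy price observations spread over a 41-day window.
    xs = [rng.uniform(0, 41) for _ in range(n)]
    ys = [true_price(x) + rng.gauss(0, 60) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # Ordinary least squares for a single predictor.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def average_error(n, trials=30, seed=0):
    # Mean absolute gap between the fitted line and the true rule,
    # averaged over several independently drawn datasets of size n.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs, ys = sample_prices(n, rng)
        slope, intercept = fit_line(xs, ys)
        gaps = [abs(slope * d + intercept - true_price(d)) for d in range(42)]
        total += sum(gaps) / len(gaps)
    return total / trials

if __name__ == "__main__":
    for n in (10, 100, 1000, 12000):
        print(n, "samples -> average error", round(average_error(n), 2))
```

Under these assumptions the average error shrinks steadily as the sample count rises, which is the behaviour the “hyperreactive” label points at. A real pricing model would involve far more variables than this one-line rule.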

The third H stands for “handy”, which indicates how Big Data plays a core role in our lives. Even though some of us might not have realized it, Big Data has become quite useful even for an average person. With access to more data, we can use it in ways that frequently benefit us: for example, navigation applications combine real-time traffic flow data with the distance to the destination to choose routes that reduce travel time. Corporations’ access to Big Data benefits us as well: by observing past purchasing trends, they can understand customers’ needs more efficiently and try to satisfy those needs as closely as possible.
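The navigation example can be sketched with a standard shortest-path search (Dijkstra’s algorithm) in which each road is weighted by travel time instead of distance. This is only an illustration: the road map, the speeds, and the function name `fastest_route` are made up, and real navigation apps are certainly more elaborate.

```python
import heapq

def fastest_route(roads, start, goal):
    # roads maps a place to a list of (neighbor, distance_km, current_speed_kmh).
    # Returns (total_minutes, path) for the quickest route, or None if unreachable.
    queue = [(0.0, start, [start])]
    settled = {}
    while queue:
        minutes, place, path = heapq.heappop(queue)
        if place == goal:
            return minutes, path
        if place in settled and settled[place] <= minutes:
            continue  # already reached this place at least as quickly
        settled[place] = minutes
        for neighbor, km, speed in roads.get(place, []):
            travel = 60.0 * km / speed  # minutes at the current traffic speed
            heapq.heappush(queue, (minutes + travel, neighbor, path + [neighbor]))
    return None

# Hypothetical map: the downtown route is shorter (7 km) but congested,
# while the ring road is longer (13 km) but flowing freely.
roads = {
    "home": [("downtown", 4, 10), ("ring_road", 6, 60)],
    "downtown": [("office", 3, 15)],
    "ring_road": [("office", 7, 70)],
}

if __name__ == "__main__":
    print(fastest_route(roads, "home", "office"))
```

With these invented speeds the search prefers the longer ring road, because live traffic data, not distance alone, decides which route is fastest.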

From the 3 terms discussed above, we can see that Big Data might not be the optimal solution to every existing issue, but it still makes life more convenient for most, if not all, people. It does so mainly by enabling more accurate predictions of the future and by helping individuals as well as large businesses make the decisions that benefit them most.
