The Four A’s of Big Data

Cathy Tu
Published in CISS AL Big Data
Sep 6, 2023

The Oversaturation of the V’s of Big Data

Almost everyone who has studied Big Data knows about the V’s of Big Data. The majority of us have heard of volume, variety, velocity, veracity, and value (Gillis, 2021), but some of us have also come across validity, visualization, virality, volatility, and so on. The list goes on. A quick Google search can provide articles on the five V’s all the way up to the 51 V’s (Khan et al., 2019). Now, it feels as though scholars have been forcibly linking any word that starts with the letter V to Big Data. The oversaturation of the V’s has blurred the focus of Big Data and its key elements.

As a result, we turn to a lesser-known letter: A. The four A’s of Big Data are automation, accuracy, agility (Nair, 2020), and adaptation. These four terms capture the fundamental characteristics of Big Data without redundancy. The rest of this article explains what each term means and why it matters.

Automation

The first A is automation. The key difference between traditional statistics and Big Data is not that more information has been created; it is that more information can be discovered and more data can be stored thanks to expanded computing power, as portrayed in Figure 1. Processing large amounts of data places heavy demand on computing infrastructure because processing workloads need to be distributed across hundreds or thousands of commodity servers (Botelho et al., 2022). As that infrastructure develops, total general computing power is predicted to see a tenfold increase and reach 3.3 ZFLOPS by 2030 (Brinkmann et al., 2021). Being able to store and read data is the prerequisite to making any predictions or answering any questions.

Fig. 1: The expansion of computing power gives way to Big Data (Pandey, 2021).
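To make the idea of distributing a workload concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular Big Data platform works: worker threads on one machine stand in for commodity servers, and the function names (split_into_chunks, count_chunk, distributed_word_count) are hypothetical. Each worker counts words in its own slice of the data, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_workers):
    """Partition the records into roughly equal chunks, one per worker."""
    size = max(1, -(-len(records) // n_workers))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_chunk(chunk):
    """Map step: a worker counts the words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(records, n_workers=4):
    """Run the map step on every chunk, then merge the partial counts."""
    chunks = split_into_chunks(records, n_workers)
    # Threads stand in for separate servers in this sketch (an assumption).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    total = Counter()
    for partial in partials:  # reduce step
        total.update(partial)
    return total


# Made-up input data for illustration.
records = ["big data needs big compute", "data data everywhere"] * 1000
totals = distributed_word_count(records, n_workers=8)
print(totals.most_common(2))
```

The merged result is the same as counting everything on one worker; splitting the work only changes how many machines share the load.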

Accuracy

The second A is accuracy. Although statistics and random sampling provide affordable shortcuts, the accuracy they bring does not come close to that of Big Data analytics. One inherent weakness of sampling is systematic bias. According to Dutch philosopher Baruch Spinoza, randomness does not exist (Spinoza, 1677). Systematic biases in the way data are collected could produce significant errors (Mayer-Schönberger et al., 2013, p. 23). For instance, election polling that relies on landline phones would underrepresent people who use only cell phones, a group that tends to be younger and more liberal, so the sample would not reflect the whole electorate (Mayer-Schönberger et al., 2013, p. 24). With Big Data, n = all. Even if some of the measurements contain slight errors, the sheer amount of data still points analysts in the right direction. Additionally, sampling is not always capable of capturing outliers, which are sometimes exactly what people are looking for. As an example, the detection of credit card fraud, which is depicted in Figure 2, is only possible by investigating anomalies in spending. Picking apart the details can provide some of the most valuable information.
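The landline example can be sketched with a small simulation. Everything in it is hypothetical: the electorate of 1,000 voters, the split between landline owners and cell-only voters, and the support rates are made-up numbers chosen only to show how a poll that reaches one group can miss the overall picture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voter:
    has_landline: bool
    supports_a: bool


def build_population():
    """Hypothetical electorate: 600 landline owners, 400 cell-only voters."""
    landline = [Voter(True, i < 240) for i in range(600)]    # 40% support A
    cell_only = [Voter(False, i < 280) for i in range(400)]  # 70% support A
    return landline + cell_only


def support_share(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(v.supports_a for v in voters) / len(voters)


population = build_population()
landline_poll = [v for v in population if v.has_landline]

print(f"Landline-only poll: {support_share(landline_poll):.0%}")  # 40%
print(f"Whole population:   {support_share(population):.0%}")     # 52%
```

In this toy electorate the landline poll predicts a loss for candidate A, while the full data (n = all) shows a narrow majority.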

Fig. 2: Credit card fraud detection through Big Data analytics (Smith, 2022).
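The point about outliers can be sketched the same way. The snippet below is a simplified illustration, not a production fraud model: it flags spending amounts that sit far from the median, using the median absolute deviation (MAD) and a commonly used cutoff of 3.5. The transaction amounts and the function name flag_anomalies are hypothetical.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold.

    The median and the median absolute deviation (MAD) are used because
    they are less distorted by extreme values than the mean and the
    standard deviation.
    """
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        return []  # no spread to measure against
    return [a for a in amounts if 0.6745 * abs(a - center) / mad > threshold]


# Made-up card transactions; one charge is far outside the usual range.
spending = [12.5, 40.0, 23.9, 18.2, 35.0, 27.4, 2400.0, 31.1, 15.8, 22.0]

print(flag_anomalies(spending))        # all records: [2400.0]
print(flag_anomalies(spending[1::2]))  # a sample that skips the charge: []
```

With every transaction available, the unusual charge stands out. A sample that happens to skip it has nothing to flag, which is why sampling is a poor fit for this kind of question.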

Agility

The third A is agility. Data agility refers to the speed and flexibility that allow the demands of a business or study to be met promptly, dependably, and at scale (Talend, 2023). As substantial flows of data pour in, business decisions must be made just as quickly. Analysis of the data must be available almost instantly to assist in drawing conclusions and providing timely resolutions (Gillis, 2021). Take Instagram as an example: there are more than 600 million Instagram users, 400 million of whom are active at least once every day (Marr, 2021). Every minute of the day, 66,000 images are shared on the app (Domo, 2023). Many of Instagram’s features rely on personalization, including the Explore page, recommended posts, and sponsored content. To increase engagement and cater to users’ preferences, Instagram records how many people like, comment on, share, and save every post and how quickly they do so, along with each person’s interaction history with others, among other signals. The interconnection of users adds another layer of complexity to the data, since no user exists in isolation (Instagram for Creators, 2022). This implies that Instagram must select which posts to show, and decide how to order them, for 400 million users at least once every day. And that does not account for how long users spend scrolling or how many times they open the app. As shown in Figure 3, every component of one’s feed comes with a purposeful decision. Instagram is only one example of many. The speed at which an Instagram feed is personalized may not seem pivotal, but the same mechanism is applied in situations that involve human lives and security. The rapid processing of Big Data has the potential to enhance patient outcomes in the healthcare sector (Dolley, 2018), provide disaster resilience to individuals affected by natural disasters (Chapman, 2023), and furnish vital information from surveillance systems for crime prevention (Takemura, 2020).

Fig. 3: Personalization of Instagram feed (Hatmaker, 2021).
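As a rough illustration of how signals like these could be combined into an ordering, consider the sketch below. It is emphatically not Instagram’s actual ranking system, which is not described in the sources cited here: the Post fields, the weights, and the scoring formula are all hypothetical. It only shows the general shape of the task, which is to turn engagement counts, their speed, and a viewer’s history with the author into one score per post, then sort.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    likes: int
    comments: int
    shares: int
    saves: int
    hours_since_posted: float
    viewer_interactions_with_author: int


# Hypothetical weights, chosen for illustration only.
WEIGHTS = {"likes": 1.0, "comments": 2.0, "shares": 3.0, "saves": 3.0}


def engagement_score(post: Post) -> float:
    """Combine how much and how quickly people engage with a post."""
    raw = (WEIGHTS["likes"] * post.likes
           + WEIGHTS["comments"] * post.comments
           + WEIGHTS["shares"] * post.shares
           + WEIGHTS["saves"] * post.saves)
    # "How quickly": engagement per hour, with a one-hour floor.
    velocity = raw / max(post.hours_since_posted, 1.0)
    # Interaction history: boost authors the viewer already engages with.
    affinity = 1.0 + 0.1 * post.viewer_interactions_with_author
    return velocity * affinity


def rank_feed(posts):
    """Order candidate posts from highest to lowest score."""
    return sorted(posts, key=engagement_score, reverse=True)


candidates = [
    Post("a", 100, 10, 5, 5, 10.0, 0),
    Post("b", 40, 5, 2, 2, 2.0, 5),
    Post("c", 300, 20, 10, 10, 48.0, 0),
]
print([p.post_id for p in rank_feed(candidates)])  # ['b', 'a', 'c']
```

Post "b" has the fewest likes but ranks first because it is recent and comes from an author the viewer often interacts with. Repeating a computation like this for hundreds of millions of users is what makes speed a requirement.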

Adaptation

The last A is adaptation. One of the greatest shortcomings of random sampling is its lack of extensibility or malleability (Mayer-Schönberger et al., 2013, p. 25). When designing a sample for research, one must prepare all the questions beforehand and tailor the sampling method to them. The sample can provide answers that are generally correct, but it cannot answer questions that were never considered before the data were collected. Every time a new question is proposed, data must be gathered again. This leaves the sample “dead” once conclusions have been drawn from it. On the other hand, when n = all, all aspects of a phenomenon are recorded and digitized. Even if a question was never asked in the first place, it can often be answered with Big Data because of the scope of information collected. Big Data can adapt immediately to new circumstances and answer spontaneous questions, which contributes to its efficiency and usefulness.
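A small sketch shows the difference between a summary built for one question and raw records kept in full. The purchase records and the helper total_by are hypothetical. The planned question is revenue per city; the unplanned question, revenue per hour of day, can still be answered because every record was kept.

```python
from collections import defaultdict

# Hypothetical raw records: every purchase is stored in full.
purchases = [
    {"customer": "ana", "city": "Lisbon", "hour": 9, "amount": 12.0},
    {"customer": "ben", "city": "Porto", "hour": 9, "amount": 8.0},
    {"customer": "ana", "city": "Lisbon", "hour": 20, "amount": 30.0},
    {"customer": "cai", "city": "Porto", "hour": 20, "amount": 10.0},
]


def total_by(records, key):
    """Sum purchase amounts grouped by any recorded attribute."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


# The question planned in advance.
print(total_by(purchases, "city"))  # {'Lisbon': 42.0, 'Porto': 18.0}

# A question nobody planned for: no new data collection is needed.
print(total_by(purchases, "hour"))  # {9: 20.0, 20: 40.0}
```

Had only the per-city totals been kept, the per-hour question would require gathering the data again.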

The Fundamental Aspects of Big Data

Through the four A’s of automation, accuracy, agility, and adaptation, Big Data has helped the world advance in countless sectors. While the discussion of Big Data will continue to evolve, focusing on its key characteristics rather than on tangential attributes like the dozens of V’s helps highlight its transformational power and its ability to tackle challenges in novel ways.

References

5 A’s to Big Data success. (2020, December 15). BrightTalk. https://www.brighttalk.com/webcast/9059/453683

Botelho, B. et al. (2022). Big Data. Data Management; TechTarget. https://www.techtarget.com/searchdatamanagement/definition/big-data

Brinkmann, A. et al. (2021). Computing 2030: building a fully connected, intelligent world. Huawei. https://www-file.huawei.com/-/media/corp2020/pdf/giv/industry-reports/computing_2030_en.pdf

Chapman, A. (2023). Leveraging Big Data and AI for disaster resilience and recovery. Texas A&M University. https://engineering.tamu.edu/news/2023/06/leveraging-big-data-and-ai-for-disaster-resilience-and-recovery.html

Data never sleeps 10.0. (2023). Domo. https://www.domo.com/data-never-sleeps

Dolley, S. (2018). Big Data’s role in precision public health. Frontiers in Public Health, 6. https://doi.org/10.3389/fpubh.2018.00068

Gillis, A. S. (2021). 5 V’s of Big Data. Data Management; TechTarget. https://www.techtarget.com/searchdatamanagement/definition/5-Vs-of-big-data

Hatmaker, T. (2021, June 23). Instagram’s newest test mixes “Suggested Posts” into the feed to keep you scrolling. TechCrunch. https://techcrunch.com/2021/06/23/instagram-suggested-posts-test-topics/

Khan, G., Naim, A., Hussain, M. R., Naveed, Q. R., Ahmad, N., & Qamar, S. (2019). The 51 V’s of Big Data. Proceedings of the International Conference on Omni-Layer Intelligent Systems. Association for Computing Machinery. https://dl.acm.org/doi/10.1145/3312614.3312623

Marr, B. (2021, July 2). How much data do we create every day? The mind-blowing stats everyone should read. Bernard Marr & Co. https://bernardmarr.com/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/

Mayer-Schönberger, V. et al. (2013). Big Data: the essential guide to work, life and learning in the age of insight.

Pandey, V. (2021). How Big Data analytics is revolutionizing industry and delivering business value. LinkedIn. https://www.linkedin.com/pulse/how-big-data-analytics-revolutionizing-industry-business-pandey-

Recommendations on Instagram: what creators need to know. (2022). Instagram for Creators. https://creators.instagram.com/blog/instagram-recommendations-eligibility-tips-creators

Smith, L. (2022, December 19). Credit card fraud. ClearScore. https://www.clearscore.com/au/learn/credit-cards/credit-card-fraud

Spinoza, B. (1677). Ethics, part 1.

Takemura, N. (2020). AI-algorithm-Big Data, predictive criminal justice and hyper crime/social control: surveillance capitalism after ‘singularity’ and prospects of informational civilization. United Nations. https://www.unodc.org/documents/commissions/Congress/documents/written_statements/Individual_Experts/Takemura_Big_data_V2100865.pdf

What is data agility? (2023). Talend. https://www.talend.com/resources/what-is-data-agility/
