With Big Data, Comes Great Power

Ryan Tang
Published in CISS AL Big Data
5 min read · Oct 25, 2022

Have you ever wondered how a billion users of TikTok become more and more addicted to the app? Or how shopping apps such as TaoBao are able to precisely determine what you wish for even before you click on the search key? Well, the answer to these questions is simple: Big Data.

Figure 1: Big Data in TikTok (https://techcrunch.com/2022/07/13/tiktok-to-roll-out-content-filters-and-maturity-ratings-in-pledge-to-make-app-safer/)

In a world where electronics have become an essential part of our lives, virtually every action or decision we make is transformed into data, whether we are conscious of it or not. Companies such as TikTok (Figure 1) and TaoBao apply the concepts of Big Data Analytics to the data they collect from their millions of users. TikTok, for instance, is known for its ability to create a unique, personalized feed for each of its users: the “For You Page”. It achieves this with the “TikTok Algorithm”, which gathers and analyzes essentially every move the user makes on the app, from likes, comments, shares, and search queries to the time spent on each video. But before the algorithm can begin to model a user’s interests, the uploaded videos, which form its database, must be categorized. When users upload new videos, they are encouraged to add captions and hashtags describing the content, and these are then used to categorize the videos. Although hashtags provide “messy” data, a term explained later in the article, this unstructured data is still useful for categorizing content. In addition to captions and hashtags, the algorithm also captures the sounds and effects used in each video, categorizing the videos further to better match users’ preferences. With these requirements met, the algorithm analyzes all the data collected from users and videos and assembles a “For You Page” tailored to each user. This algorithm, which applies Big Data Analytics, is one of the chief ingredients of TikTok’s success. In fact, as shown in Figure 2, TikTok generated $4.6 billion in revenue in 2021, a 142% increase year-on-year.
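The categorize-then-score idea described above can be illustrated with a toy example. This is only a sketch under stated assumptions, not TikTok's actual algorithm, which is not public: the signal weights, the `build_interest_profile` and `rank_candidates` helpers, and the sample videos are all hypothetical.

```python
from collections import defaultdict

# Hypothetical weights for each interaction signal; the real values are not public.
SIGNAL_WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}


def build_interest_profile(events, video_tags):
    """Accumulate a per-hashtag interest score from a user's interaction events."""
    profile = defaultdict(float)
    for event in events:
        # Unknown actions (such as a plain view) contribute no action weight.
        weight = SIGNAL_WEIGHTS.get(event["action"], 0.0)
        # Watch time counts too: the fraction of the video watched, from 0 to 1.
        weight += event.get("watch_fraction", 0.0)
        for tag in video_tags[event["video_id"]]:
            profile[tag] += weight
    return dict(profile)


def rank_candidates(profile, candidates, video_tags):
    """Order candidate videos by the summed interest of their hashtags."""
    def score(video_id):
        return sum(profile.get(tag, 0.0) for tag in video_tags[video_id])
    return sorted(candidates, key=score, reverse=True)


# Made-up videos, categorized by the hashtags their uploaders chose.
video_tags = {
    "v1": ["#cooking", "#pasta"],
    "v2": ["#football"],
    "v3": ["#cooking", "#baking"],
    "v4": ["#football", "#highlights"],
}
# The user liked and fully watched a cooking video, and skipped a football one.
events = [
    {"video_id": "v1", "action": "like", "watch_fraction": 1.0},
    {"video_id": "v2", "action": "view", "watch_fraction": 0.1},
]

profile = build_interest_profile(events, video_tags)
feed = rank_candidates(profile, ["v3", "v4"], video_tags)
print(feed)  # ['v3', 'v4']
```

Because the user engaged far more with the cooking video, the unseen cooking video ranks ahead of the unseen football one. A real system would use many more signals, including the sounds and effects mentioned above.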

Figure 2: Quarterly revenue of TikTok from 2017–2022. (https://www.businessofapps.com/data/tik-tok-statistics/#:~:text=TikTok%20has%20rapidly%20increased%20its,increase%20year%2Don%2Dyear.)

But what exactly is Big Data, one may ask? Put simply, Big Data refers to volumes of data so large that they cannot be processed without specialized technology. Big Data Analytics processes these datasets to extract information and gain insights. More specifically, Big Data is characterized by three major Vs: Velocity, Variety, and Volume.

Nowadays, the three original Vs of Big Data have multiplied into as many as 42 Vs, including terms such as Vastness, Venue, and Voyage. The emergence of these new terms reflects our growing understanding of Big Data Analytics. Fortunately, the most commonly used, and arguably the most significant, Vs are still the original three: Velocity, Variety, and Volume.

Figure 3: Every Minute of the day in Big Data (https://www.smartinsights.com/internet-marketing-statistics/happens-online-60-seconds/)

Generally, when people hear the word “Velocity”, the first thing that comes to mind is the speed at which something moves in a particular direction. In Big Data, this is only partially correct: “Velocity” describes how rapidly data is generated and how quickly it moves. In healthcare, for example, many medical devices monitor patients and collect data. From in-hospital equipment to wearable devices, these technologies collect data that must reach its destination and be analyzed quickly to protect the patient’s health. In fact, so much data is created every 60 seconds that no single computer could hold all of it (Figure 3). In other words, “Velocity” is how quickly raw data is turned into something an organization can benefit from.
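One simple way to put a number on velocity is to count how many readings arrive within a recent time window. The sketch below assumes a hypothetical wearable that sends one reading every half second; the `RateMeter` class and the 60-second window are illustrative choices, not part of any particular product.

```python
from collections import deque


class RateMeter:
    """Track how many events arrived within the last `window_seconds`."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event that arrived at `timestamp` (in seconds)."""
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now):
        # Drop readings that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()

    def events_per_second(self, now):
        """Average arrival rate over the window ending at `now`."""
        self._evict(now)
        return len(self.timestamps) / self.window_seconds


meter = RateMeter(window_seconds=60.0)
# Simulate a hypothetical wearable sending one reading every 0.5 s for two minutes.
for i in range(240):
    meter.record(i * 0.5)

rate = meter.events_per_second(now=119.5)
print(rate)  # 2.0 readings per second
```

Old timestamps are discarded as new ones arrive, so memory use depends on the window size, not on the total amount of data that has ever passed through.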

Figure 4: Unstructured Data vs. Structured Data. (https://www.youtube.com/watch?v=sf2S6ZI9BD0)

“Variety” in Big Data is straightforward: data is collected from a vast number of sources, though not all of them provide the same level of value or relevance. In Big Data terms, it refers to the diverse types in which data is stored: structured, semi-structured, and unstructured (Figure 4). Structured data is defined in a fixed format, such as dates, names, and transaction information. Semi-structured data sits in between: formats such as JSON or XML label their fields but do not enforce a fixed schema. Unstructured data, by contrast, takes many forms, such as text files, PDF documents, and social media posts. Variety is all about the ability to classify incoming data of various types into categories.
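The three data types can be seen side by side in a few lines of Python. The sample values below are made up for illustration. The point is how much work each type needs before it can be analyzed: structured data maps straight onto columns, semi-structured data carries its own labels, and unstructured text needs structure extracted from it.

```python
import csv
import io
import json
import re

# Structured: fixed columns, and every row has the same fields.
structured = "date,name,amount\n2022-10-25,Alice,19.99\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields may differ between records.
semi_structured = '{"user": "alice", "tags": ["travel"], "location": {"city": "Shanghai"}}'
record = json.loads(semi_structured)

# Unstructured: free text; structure must be extracted, here with a simple regex.
unstructured = "Best pasta I ever made! #cooking #pasta"
hashtags = re.findall(r"#\w+", unstructured)

print(rows[0]["amount"])           # 19.99
print(record["location"]["city"])  # Shanghai
print(hashtags)                    # ['#cooking', '#pasta']
```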

Figure 5: Messiness in Big Data. (https://sahaysdailypost.com/the-messiness-of-data/)

Volume refers to the “Big” in Big Data. It is the vast amount of data that is collected and stored, waiting to be analyzed. At companies such as Walmart, Apple, and eBay, data is measured in multiple petabytes. This enormous amount of data gives companies a clearer view of the details. However, with great quantity come small errors. Under traditional statistical sampling, this “messiness” would be a notable flaw, because small samples are sensitive to errors. Yet the colossal amount of data collected in Big Data compensates for these minute errors and tolerates the inexactitude. For this reason, “Volume” is arguably the most valuable of the Vs in Big Data.
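A small simulation can make this tolerance for messiness concrete. The sketch below assumes that the errors are random and unbiased, which is the case where volume helps; a systematic error, such as a sensor that always reads too high, would not average out in the same way. The `mean_abs_error` helper, the true value of 50, and the noise level are all hypothetical.

```python
import random
import statistics


def mean_abs_error(sample_size, trials, true_value=50.0, noise_sd=10.0, seed=0):
    """Average distance between the sample mean and the true value over several trials."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each "messy" reading is the true value plus random, unbiased noise.
        sample = [rng.gauss(true_value, noise_sd) for _ in range(sample_size)]
        errors.append(abs(statistics.fmean(sample) - true_value))
    return statistics.fmean(errors)


small = mean_abs_error(sample_size=20, trials=200)
large = mean_abs_error(sample_size=20_000, trials=20)
print(f"typical error with 20 readings:     {small:.3f}")
print(f"typical error with 20,000 readings: {large:.3f}")
```

With only 20 noisy readings the estimate is typically off by a noticeable margin, while with 20,000 readings the individual errors largely cancel out and the estimate lands very close to the true value.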

Circling back to the “TikTok Algorithm”: without the substantial quantity of data collected, which acts as the foundation for all data analytics, the algorithm could not build the personalized For You Page so precisely, and it is this precision that brought TikTok a fortune.

To summarize, the basics of Big Data are “Velocity”, “Variety”, and “Volume”, keeping in mind that the various other Vs added over the years are important as well. Big Data exists all around us: every action is now digitized and collected as data with great potential value. In fact, the act of you reading this article will be recorded as data and become a part of Big Data.

References

ACCA. (n.d.). Big data 1: What is big data? | ACCA Global. Www.accaglobal.com. Retrieved October 18, 2022, from https://www.accaglobal.com/gb/en/student/exam-support-resources/fundamentals-exams-study-resources/f5/technical-articles/what-is-big-data.html

Mage. (2022, February 17). How does TikTok use machine learning? DEV Community. https://dev.to/mage_ai/how-does-tiktok-use-machine-learning-5b7i#:~:text=TikTok%20gathers%20a%20large%20sum

Memon, M. (2020, July 29). How the TikTok Algorithm Works in 2020 (and How to Work With It). Social Media Marketing & Management Dashboard. https://blog.hootsuite.com/tiktok-algorithm/

Terra, J. (2022, April 18). Characteristics of Big Data? | 5V’s, Types, Benefits | Simplilearn. Simplilearn.com. https://www.simplilearn.com/5-vs-of-big-data-article

Wang, C. (2020, June 7). Why TikTok made its user so obsessive? The AI Algorithm that got you hooked. Medium. https://towardsdatascience.com/why-tiktok-made-its-user-so-obsessive-the-ai-algorithm-that-got-you-hooked-7895bb1ab423
