The Five S’s of Big Data: Significance in the World of Big Data

Simon Jang
Published in CISS AL Big Data
Sep 8, 2023

Big Data is an integral part of our modern world. It enables us to gather and analyze vast data sets and apply the findings to real-world problems. Simply put, Big Data is the use of advanced computing software and hardware to analyze massive amounts of data (Botelho & Bigelow, 2022). However, it was not until the 2000s, with the spread of digital data and the internet, that this became possible. Before this, only a quarter of data sets were digital, making it impossible to analyze them comprehensively. More recently, the “V’s” of Big Data have emerged as a way to highlight its main features. However, these terms have become overused and redundant. Instead, I present to you the five S’s: Scalability, Storage, Speed, Security, and Smart Analytics.

Scalability

Scalability refers to the ability of a system to handle increasing volumes of data efficiently. As the amount of digital information keeps growing, scalability plays a vital role in managing and processing large datasets. Over the past two decades, the world has witnessed the rapid digitization of books, films, and various other forms of media. Additionally, with the widespread accessibility of the internet, massive data sets have formed; Google alone reportedly gathers 2.5 exabytes of data per day, which is equivalent to roughly 833 million 4K movies at about 3 GB each. Distributed computing frameworks such as Hadoop allow organizations to expand their data infrastructure by adding machines as the volume of data grows. Scalability not only allows for the analysis of large data sets but also enables more accurate generalizations than the small samples obtained through traditional sampling methods.
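To make the idea of scaling out a little more concrete, here is a minimal sketch of the map, shuffle, and reduce pattern that frameworks like Hadoop popularized. It is not Hadoop's API: the function names and the sample lines are my own assumptions, and everything runs in a single process. The point is only that the result stays the same no matter how many partitions (stand-ins for machines) the data is split across.

```python
from collections import defaultdict
from itertools import chain


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into partitions, as if sent to separate workers."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def map_phase(partition):
    """Emit a (word, 1) pair for every word in one partition."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_partitions):
    """Group all emitted values by key across every partition."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapped_partitions):
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, num_partitions):
    partitions = split_into_partitions(records, num_partitions)
    # On a real cluster each partition would be mapped on a different machine.
    mapped = [map_phase(p) for p in partitions]
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data, counted with two and then three partitions.
lines = ["big data is big", "data grows fast", "big systems scale"]
counts_2 = word_count(lines, num_partitions=2)
counts_3 = word_count(lines, num_partitions=3)
```

Because each partition is mapped independently, adding partitions changes how the work is divided but not the answer, which is what lets this style of processing grow with the data.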

Figure 2 Data volume growth by year in zettabytes
https://www.researchgate.net/figure/Data-volume-growth-by-year-in-zettabytes_fig2_313400371

Storage

Storage is another essential component of Big Data. With vast amounts of information being generated daily, it becomes crucial to store and manage it effectively. As mentioned, the rapid digitization of documents and datasets on the internet has made storage a crucial component of Big Data (Figure 2). Traditional storage systems, such as conventional databases, often fall short in handling the sheer volume and variety of data. This has led to the emergence of innovative storage solutions, like distributed file systems and cloud-based storage, that provide efficient and scalable means to store and retrieve data for analysis. As data keeps growing exponentially and more people come online, being able to store and navigate that data will prove pivotal in making predictions. Additionally, new designs such as Microsoft’s underwater data center experiments aim to be reliable, practical, and energy-sustainable, potentially allowing Microsoft to run large data centers while saving money on energy-related expenditures and being protected from possible attacks by electromagnetic pulses (Figure 3). Without a viable storage system, Big Data cannot function to its full potential.
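As a rough illustration of how a distributed file system can spread one large file over many machines, the sketch below splits data into fixed-size blocks and assigns each block to several nodes. The node names, block size, and hash-based placement rule are all assumptions made for this example; real systems use more elaborate placement policies.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_block(block_id: str, nodes: list, replicas: int) -> list:
    """Pick `replicas` distinct nodes, starting from a hash-derived position."""
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    start = int(hashlib.sha256(block_id.encode("utf-8")).hexdigest(), 16) % len(nodes)
    return [nodes[(start + k) % len(nodes)] for k in range(replicas)]


def plan_storage(name: str, data: bytes, nodes: list, block_size: int, replicas: int) -> dict:
    """Map every block of a file to the nodes that should hold a copy of it."""
    plan = {}
    for index, _block in enumerate(split_into_blocks(data, block_size)):
        block_id = f"{name}#{index}"
        plan[block_id] = place_block(block_id, nodes, replicas)
    return plan


# Hypothetical cluster of four nodes storing a 1000-byte file in 256-byte blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
data = b"x" * 1000
plan = plan_storage("logs.csv", data, nodes, block_size=256, replicas=3)
```

Keeping three copies of every block means that losing any single node does not lose data, at the cost of using three times the raw storage.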

Figure 3 Microsoft’s underwater data center off the coast of Scotland’s Orkney Islands (2020) https://news.microsoft.com/source/features/sustainability/project-natick-underwater-datacenter/

Speed

Speed is essential when it comes to analyzing datasets. At the rate data is currently generated, organizations must process and analyze information in real time to derive actionable insights. Technologies like in-memory databases and stream processing frameworks enable rapid data processing, allowing businesses to respond quickly to changing market dynamics and make data-driven decisions. Because Big Data aims to provide fast and reliable predictions, it requires enough computing power to process petabytes of data rapidly. Big Data isn’t aimed at being exact; rather, “it’s a tradeoff with less error from sampling we can accept more measurement error” (Mayer-Schönberger & Cukier, 2017). In this view, speed matters more than exactness.
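One common building block in stream processing is a sliding time window: instead of storing everything and analyzing it later, the system keeps only recent events and updates its answer as each new one arrives. The sketch below is a simplified illustration rather than any particular framework's API. The class name, the three-second window, and the sample events are assumptions, and it expects events to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the events seen in the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Record one event and return the average of the current window."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that are a full window (or more) older than the newest one.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Hypothetical sensor readings as (timestamp in seconds, value).
window = SlidingWindowAverage(window_seconds=3)
stream = [(0, 10), (1, 20), (2, 30), (5, 40)]
averages = [window.add(ts, value) for ts, value in stream]
```

Each update touches only the events entering or leaving the window, so the cost per event stays small even when the stream never ends.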

Security

Security is a significant concern in the realm of Big Data. As data becomes an increasingly valuable asset, protecting it from unauthorized access, breaches, and privacy violations is of utmost importance. Robust security measures, including encryption, access controls, and data anonymization techniques, are implemented to safeguard sensitive information. As data breaches become more common, the risk of your personal information being leaked or stolen becomes very real. Thus, compliance with regulations like the GDPR (General Data Protection Regulation) and the CCPA (California Consumer Privacy Act) helps ensure responsible handling of personal data. For example, SOC 2, an industry framework developed by the American Institute of CPAs (AICPA), attests to a company’s preparedness for security breaches, which can help companies strengthen their cybersecurity (The Economic Times, 2023).
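As a small example of one such technique, the sketch below replaces an identifying field with a keyed hash (HMAC-SHA256) from Python's standard library. Strictly speaking this is pseudonymization rather than full anonymization, since anyone holding the key can recompute the mapping. The field names, sample record, and key are made up for illustration, and a real deployment would keep the key in a secrets manager.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed hash of the value, normalized so equal inputs match."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, sensitive_fields: set, secret_key: bytes) -> dict:
    """Copy a record, replacing only the sensitive fields with keyed hashes."""
    return {
        field: pseudonymize(value, secret_key) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Hypothetical key and record, for demonstration only.
key = b"demo-key-do-not-use-in-production"
record = {"email": "Alice@example.com", "country": "DE", "purchase_total": 42.5}
safe = pseudonymize_record(record, {"email"}, key)
```

The same email always maps to the same token under one key, so analysts can still count repeat customers without ever seeing the address itself.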

Smart Analytics

Lastly, Smart Analytics refers to the process of deriving insightful conclusions from large amounts of data. Using modern techniques such as machine learning, natural language processing, and data mining, organizations can discover patterns, trends, and correlations within large datasets. This gives decision-makers the tools they need to make informed decisions, run their businesses more efficiently, and identify new business opportunities. Two people might be given the same data, but if one of them fails to identify the most effective way to analyze it, they might end up with an outcome that is either inaccurate or completely wrong.
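Finding a correlation is one of the simplest forms of this kind of analysis. The sketch below computes the Pearson correlation coefficient by hand, so that the steps are visible, on a few made-up numbers. The variable names and figures are purely illustrative, and a strong correlation on its own does not show that one quantity causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical weekly figures for a small shop.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
returns = [5, 4, 3, 2, 1]

spend_vs_sales = pearson(ad_spend, sales)      # close to +1: they rise together
spend_vs_returns = pearson(ad_spend, returns)  # close to -1: one falls as the other rises
```

A value near +1 or -1 signals a strong linear relationship, while a value near 0 suggests there is little linear pattern to act on.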

In conclusion, the Five S’s — Scalability, Storage, Speed, Security, and Smart Analytics — hold great significance in the realm of big data. These concepts drive the effective management, processing, and analysis of large volumes of data, enabling organizations to harness the power of information for innovation and growth. By embracing these principles, businesses can leverage big data to gain a competitive edge, drive insights, and shape the future. As we navigate the big data landscape, let us adopt the Five S’s as guiding principles for success.

Figure 4 A server room with the capacity to store datasets and analyze large amounts of data. https://www.poweradmin.com/blog/recommended-server-room

Citations:

1. Botelho, B., & Bigelow, S. J. (2022, January 5). What is Big Data and why is it important? Data Management. https://www.techtarget.com/searchdatamanagement/definition/big-data

2. GeeksforGeeks. (2023, February 21). 6V’s of Big Data. GeeksforGeeks. https://www.geeksforgeeks.org/5-vs-of-big-data/

3. TechVidvan Team. (2021, June 16). 5 V’s of Big Data. TechVidvan. https://techvidvan.com/tutorials/5-vs-of-big-data/

4. Roach, J. (2023, July 24). Microsoft finds underwater data centers are reliable, practical, and use energy sustainably. Microsoft Source. https://news.microsoft.com/source/features/sustainability/project-natick-underwater-datacenter/

5. The Economic Times. (2023). What are the benefits and challenges of Big Data Analytics? https://economictimes.indiatimes.com/jobs/c-suite/what-are-the-benefits-and-challenges-of-big-data-analytics/articleshow/103312056.cms?from=mdr

6. Mayer-Schönberger, V., & Cukier, K. (2017). Big Data: The Essential Guide to work, life and learning in the age of insight. John Murray Publishers.
