About BIG DATA

5 min read · Sep 17, 2020

INTRODUCTION

Big data burst upon the scene in the first decade of the 21st century, and the first organizations to embrace it were online and startup firms. Arguably, firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning. They didn’t have to reconcile or integrate big data with more traditional sources of data and the analytics performed upon them, because they didn’t have those traditional forms. They didn’t have to merge big data technologies with their traditional IT infrastructures because those infrastructures didn’t exist. Big data could stand alone, big data analytics could be the only focus of analytics, and big data technology architectures could be the only architecture.

Consider, however, the position of large, well-established businesses. Big data in those environments shouldn’t be separate but must be integrated with everything else that’s going on in the company. Analytics on big data have to coexist with analytics on other types of data. Hadoop clusters have to do their work alongside IBM mainframes. Data scientists must somehow get along and work jointly with mere quantitative analysts.

How new is big data?

Big data may be new for startups and for online firms, but many large firms view it as something they have been wrestling with for a while. Some managers appreciate the innovative nature of big data, but more find it “business as usual” or part of a continuing evolution toward more data. They have been adding new forms of data to their systems and models for many years, and don’t see anything revolutionary about big data. Put another way, many were pursuing big data before big data was big. When these managers in large firms are impressed by big data, it’s not the “bigness” that impresses them. Instead, it’s one of three other aspects of big data: the lack of structure, the opportunities presented, and the low cost of the technologies involved.

“It’s About Variety, not Volume: Big companies are focused on the variety of data, not its volume, both today and in three years. The most important goal and potential reward of Big Data initiatives is the ability to analyze diverse data sources and new data types, not managing very large data sets.”

Firms that have long handled massive volumes of data are beginning to enthuse about the ability to handle a new type of data — voice or text or log files or images or video. Companies can have a much more complete picture of their customers and operations by combining unstructured and structured data.

Objectives for Big Data

Big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings. Like traditional analytics, it can also support internal business decisions. The technologies and concepts behind big data allow organizations to achieve a variety of objectives, but most organizations focus on one or two. The chosen objectives have implications not only for the outcome and financial benefits of big data, but also for the process: who leads the initiative, where it fits within the organization, and how to manage the project.

Big Data’s moving parts

No single business trend in the last decade has as much potential impact on incumbent IT investments as big data. Indeed, big data promises (or threatens, depending on how you view it) to upend legacy technologies at many big companies.

Companies are not only replacing legacy technologies in favour of open source solutions like Apache Hadoop, but they are also replacing proprietary hardware with commodity hardware, custom-written applications with packaged solutions, and decades-old business intelligence tools with data visualization. This new combination of big data platforms, projects, and tools is driving new business innovations, from faster product time-to-market to an authoritative — finally! — single view of the customer to custom-packaged product bundles and beyond.

The Big Data Stack

As with all strategic technology trends, big data introduces highly specialized features that set it apart from legacy systems.

Each component of the stack is optimized around the large, unstructured and semi-structured nature of big data. Working together, these moving parts comprise a holistic solution that’s fine-tuned for specialized, high-performance processing and storage.

Hadoop

Hadoop is an important part of the NoSQL movement. The name usually refers to a pair of open source products: the Hadoop Distributed File System (HDFS), a derivative of the Google File System, and MapReduce. The Hadoop family of products, however, keeps growing beyond these two. HDFS and MapReduce were co-designed, developed, and deployed to work together.
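To make the MapReduce half of that pairing more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It runs in a single Python process and does not use Hadoop or its API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration. In a real cluster the framework would run the map and reduce steps on many machines against files stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this across machines; here it is a dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Chain the three steps over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input, standing in for lines read from a large file.
    sample = ["big data is big", "data about data"]
    print(word_count(sample))
```

The point of the split is that each map call looks at only one line and each reduce call looks at only one key, so both steps can be spread over many machines without coordination beyond the shuffle.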

Hadoop adoption — a bit of a hurdle to clear — is worth it when the unstructured data to be managed reaches dozens of terabytes. Hadoop scales very well, and relatively cheaply, so you do not have to accurately predict the data size at the outset. Summaries of the analytics are likely valuable to the data warehouse, so interaction will occur.

The user consumption profile is not necessarily a high number of user queries from a modern business intelligence tool, and the ideal resting state of the data model is not dimensional. These are data-intensive workloads, and the schemas are more of an afterthought. Fields can vary from record to record. From one record to another, it is not necessary to use even one common field, although Hadoop is best for a small number of large files that tend to have some repeatability from record to record.

Record sets that have at least a few similar fields tend to be called “semi-structured,” as opposed to unstructured. Web logs are a good example of semi-structured data. Either way, Hadoop is the store for these “nonstructured” sets of big data.
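As a rough illustration of fields varying from record to record, the sketch below reads a few JSON-formatted log lines. It measures how often each field appears, then applies a schema only at read time by picking the fields of interest. The sample records, the field names, and the helper functions `field_coverage` and `project` are assumptions for this example, not part of Hadoop or any specific tool.

```python
import json
from collections import Counter

# Hypothetical web-log records; note that no two share the same set of fields.
RAW_RECORDS = [
    '{"ip": "10.0.0.1", "path": "/home", "status": 200}',
    '{"ip": "10.0.0.2", "path": "/cart", "status": 500, "referrer": "ad"}',
    '{"ip": "10.0.0.3", "user_agent": "bot"}',
]


def field_coverage(raw_lines):
    """Return, for each field, the share of records in which it appears."""
    records = [json.loads(line) for line in raw_lines]
    counts = Counter(field for record in records for field in record)
    return {field: n / len(records) for field, n in counts.items()}


def project(raw_lines, fields, default=None):
    """Apply a schema at read time: keep the chosen fields, fill gaps with a default."""
    for line in raw_lines:
        record = json.loads(line)
        yield {f: record.get(f, default) for f in fields}


if __name__ == "__main__":
    print(field_coverage(RAW_RECORDS))
    for row in project(RAW_RECORDS, ["ip", "status"]):
        print(row)
```

Fields that show up in most records, like `ip` here, are what make such a set "semi-structured" and not fully unstructured. The missing values are handled when the data is read, not when it is stored.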

Instagram is a Surprisingly Effective Big Data Source

Instagram is a behemoth in the world of social media. There are more than 800 million active users on it every month. 51 percent of this user base accesses it on a daily basis. 95 million photos and videos get uploaded to the platform each day. Since its inception in 2010, there have been over 40 billion photos and videos shared in total.

It becomes more and more staggering the closer you look at it all. So, it’s no wonder that businesses have set their sights on Instagram as a resource for mining big data. The information and insights gained from it have proven an invaluable resource for personalized marketing and research.