Data Is So Big!
To really understand big data, it helps to have some historical background. Big data refers to data sets so large and complex that traditional data processing software simply can't manage them. It is data that arrives in increasing volumes, with ever-higher velocity, and in greater variety, often from new data sources. Such collections are huge in volume, keep growing exponentially over time, and cannot be stored or processed efficiently by traditional data management tools.
The Three Vs of Big Data
- Volume: The amount of data matters. With big data, you'll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a web page or mobile app, or data from sensor-enabled equipment.
- Velocity: Velocity is the fast rate at which data is received and acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk.
- Variety: Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database.
The History of Big Data
Although the concept of big data itself is relatively new, the origins of large data sets go back to the 1960s and ’70s when the world of data was just getting started with the first data centers and the development of the relational database.
Around 2005, people began to realize just how much data users generated through Facebook, YouTube, and other online services. Hadoop (an open-source framework created specifically to store and analyze big data sets) was developed that same year. NoSQL also began to gain popularity during this time.
The development of open-source frameworks such as Hadoop (and, more recently, Spark) was essential for the growth of big data because they make big data easier to work with and cheaper to store. In the years since then, the volume of big data has skyrocketed. Users are still generating huge amounts of data, but it's not just humans who are doing it.
With the advent of the Internet of Things, more objects and devices are connected to the internet, gathering data on customer usage patterns and product performance. The emergence of machine learning has produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud computing has expanded big data possibilities even further. The cloud offers truly elastic scalability, where developers can simply spin up ad hoc clusters to test a subset of data.
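To make the Hadoop discussion above concrete, here is a minimal sketch of the MapReduce programming model that Hadoop popularized, written in plain Python. The function names are illustrative only; the real framework distributes these same phases across a cluster of machines.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data grows fast"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

The appeal of the model is that map and reduce are independent, pure functions, so a framework can run them in parallel over data too large for any single machine.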
Statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is generated mainly through photo and video uploads, message exchanges, comments, and so on.
A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
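A quick back-of-the-envelope calculation shows how the jet-engine figure above adds up. The flight count below is a hypothetical stand-in for "many thousands of flights per day":

```python
# Rough arithmetic for the jet-engine example; the flight count is an assumption.
TB_PER_ENGINE_HALF_HOUR = 10   # >10 TB per 30 minutes of flight time (from the text)
FLIGHTS_PER_DAY = 25_000       # hypothetical daily flight count

tb_per_day = TB_PER_ENGINE_HALF_HOUR * FLIGHTS_PER_DAY  # one 30-minute window per flight
pb_per_day = tb_per_day / 1_000                         # 1 PB = 1,000 TB (decimal units)
print(f"{pb_per_day:.0f} PB per day")  # 250 PB per day
```

Even with conservative assumptions, aviation telemetry alone lands comfortably in the petabytes-per-day range.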
Benefits of Big Data Processing
- Businesses can utilize outside intelligence when making decisions
- Improved customer service
- Early identification of risk to the product/services
- Better operational efficiency
Big Data Use Cases
Product Development
Companies like Netflix and Procter & Gamble use big data to anticipate customer demand. They build predictive models for new products and services by classifying key attributes of past and current products or services and modeling the relationship between those attributes and the commercial success of the offerings.
Predictive Maintenance
Factors that can predict mechanical failures may be deeply buried in structured data, such as the year, make, and model of equipment, as well as in unstructured data covering millions of log entries, sensor readings, error messages, and engine temperatures.
Customer Experience
The race for customers is on. A clearer view of the customer experience is more possible now than ever before. Big data enables you to gather data from social media, web visits, call logs, and other sources to improve the interaction experience and maximize the value delivered.
Machine Learning
Machine learning is a hot topic right now, and data, specifically big data, is one of the reasons why. We are now able to teach machines instead of programming them. The availability of big data to train machine learning models makes that possible.
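The "teach rather than program" idea can be sketched with a toy learner, here a nearest-centroid classifier in plain Python. The feature (message length), labels, and numbers are made up for illustration; real systems learn from vastly larger data sets.

```python
def train_centroids(examples):
    # "Teach" the machine: learn one centroid per label from labeled examples,
    # instead of hand-coding the decision rule.
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    # Classify a new value by its nearest learned centroid.
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Toy training data: (message length, label). More data -> better centroids.
data = [(12, "short"), (15, "short"), (230, "long"), (310, "long")]
centroids = train_centroids(data)
print(predict(centroids, 20))   # short
print(predict(centroids, 400))  # long
```

The decision rule here was never written by hand; it falls out of the training data, which is exactly why more (and better) data tends to mean better models.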
Fraud and Compliance
When it comes to security, it's not just a few rogue hackers you're up against, but entire expert teams. Security landscapes and compliance requirements are constantly evolving. Big data helps you identify patterns in data that indicate fraud, and aggregate large volumes of information to make regulatory reporting much faster.
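As a toy illustration of the kind of pattern-finding involved, here is a simple z-score check that flags outlier transaction amounts. Real fraud detection runs far richer models over far more data; the threshold and figures below are assumptions for the sketch.

```python
import statistics

def flag_anomalies(amounts, threshold=2.0):
    # Flag amounts whose z-score exceeds the threshold -- a toy stand-in
    # for the pattern detection that big data platforms run at scale.
    mean = statistics.mean(amounts)
    stdev = statistics.stdev(amounts)
    return [a for a in amounts if abs(a - mean) / stdev > threshold]

# Mostly routine charges plus one outlier (made-up figures).
transactions = [42.0, 38.5, 45.2, 40.1, 39.9, 41.3, 9800.0]
print(flag_anomalies(transactions))  # [9800.0]
```

A single statistical rule like this breaks down quickly against adaptive adversaries, which is why production systems combine many signals across large volumes of data.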
Big Data Challenges
1. Dealing with data growth
The most obvious challenge associated with big data is simply storing and analyzing all that information. IDC estimates that the amount of information stored in the world’s IT systems is doubling about every two years. By 2020, the total amount will be enough to fill a stack of tablets that reaches from the earth to the moon 6.6 times.
Much of that data is unstructured, meaning that it doesn’t reside in a database. Documents, photos, audio, videos and other unstructured data can be difficult to search and analyze.
2. Generating insights in a timely manner
Organizations don't just want to store their big data; they want to use it to achieve business goals. According to the NewVantage Partners survey, the most common goals associated with big data projects include:
- Decreasing expenses through operational cost efficiencies
- Establishing a data-driven culture
- Creating new avenues for innovation and disruption
- Accelerating the speed with which new capabilities and services are deployed
- Launching new product and service offerings
3. Recruiting and retaining big data talent
In order to develop, manage, and run the applications that generate insights, organizations need professionals with big data skills. That has driven up demand for big data experts, and big data salaries have increased dramatically as a result.
The 2017 Robert Half Technology Salary Guide reported that big data engineers were earning between $135,000 and $196,000 on average, while data scientist salaries ranged from $116,000 to $163,500. Even business intelligence analysts were very well paid, making $118,000 to $138,750 per year.
4. Integrating disparate data sources
The variety associated with big data leads to challenges in data integration. Big data comes from many different places: enterprise applications, social media streams, email systems, employee-created documents, and more.
To meet that challenge, many enterprises are turning to new technology solutions. In the IDG report, 89 percent of those surveyed said that their companies planned to invest in new big data tools in the next 12 to 18 months. When asked which kinds of tools they were planning to purchase, integration technology was second on the list, behind data analytics software.
5. Securing big data
Security is also a big concern for organizations with big data stores. After all, some big data stores can be attractive targets for hackers or advanced persistent threats (APTs).
Most organizations seem to believe that their existing data security methods are sufficient for their big data needs. The most popular of those methods include identity and access control (59 percent), data encryption (52 percent), and data segregation (42 percent).
Big Data Best Practices
To help you on your big data journey, we’ve put together some key best practices for you to keep in mind. Here are our guidelines for building a successful big data foundation.
— Align Big Data with Specific Business Goals
More extensive data sets enable you to make new discoveries. Still, it is important to base new investments in skills, organization, or infrastructure on a strong business-driven context to guarantee ongoing project investments and funding. Ask how big data supports and enables your top business and IT priorities.
— Ease Skills Shortage with Standards and Governance
One of the biggest obstacles to benefiting from your investment in big data is a skills shortage. You can mitigate this risk by ensuring that big data technologies, considerations, and decisions are added to your IT governance program.
— Optimize Knowledge Transfer with a Center of Excellence
Use a center of excellence approach to share knowledge, control oversight, and manage project communications. Whether big data is a new or expanding investment, the soft and hard costs can be shared across the enterprise.
— Plan Your Discovery Lab for Performance
Discovering meaning in your data is not always straightforward. Sometimes we don't even know what we're looking for. That's expected. Management and IT need to support this "lack of direction" or "lack of clear requirement."
— Align with the Cloud Operating Model
Big data processes and users require access to a broad array of resources for both iterative experimentation and running production jobs. A big data solution includes all data realms including transactions, master data, reference data, and summarized data.