Data Lake: Biggest Data and Analytics trends for 2017

Manish Garg
Published in Data and Analytics
3 min read · Dec 29, 2016

If you work in Data and Analytics, 2017 will be a very interesting year. Data & Analytics infrastructure is finally becoming sexy after decades of being dominated by the Data Warehouse.

Three main macro trends will influence and disrupt the state of data management (and the companies playing in this field):

  1. Most applications in the next 2–5 years will be intelligent applications
  2. Users will demand information in real time and more automation in the software
  3. Companies will need a more scalable and flexible infrastructure to manage their data (especially big data)

Intelligent applications, user demand for real-time data, and the need to analyze more data faster will make the Data Lake a necessity.

1. Most applications in the next 2–5 years will be intelligent applications:

Companies in both the consumer and enterprise spaces are investing heavily in machine learning and artificial intelligence. Gartner predicts that by 2018 even ERP applications will be infused with a higher level of intelligence. Facebook and Google have been investing heavily in the consumer space, and enterprise companies are not far behind. IBM Watson, Salesforce Einstein, Microsoft Machine Learning, and of course the Amazon Machine Learning platform, as well as Amazon Rekognition, are examples of both machine learning platforms and solutions in the enterprise space.

As more applications are built on these platforms, existing applications will be forced to evolve to similar sophistication. Machine learning becomes possible only when large data sets can be analyzed, and current infrastructure wasn't built for that.

2. Users will demand information in real time and more automation in the software:

People will demand more of their software for two reasons. First, the workforce is becoming increasingly computer savvy as younger people, already comfortable with mobile and the internet, join it. Second, people have started to experience intelligent technology in their personal lives (recommendations from Netflix, Amazon, and Facebook; self-driving cars becoming a reality; personal assistants; etc.), and they will demand similar intelligence from their enterprise software. This includes information in real time and more automation to make manual tasks redundant. Real-time information means streaming data and fast data analysis.

3. Companies will need a more scalable and flexible infrastructure to manage their data (especially big data):

As this chart illustrates, data growth will continue to accelerate. And as unstructured data from various sources grows, it will be difficult to manage and analyze this data using traditional analytics infrastructure like the Data Warehouse.

Therefore, companies will turn toward newer data management strategies like the Data Lake.

Existing technologies are great for structured queries and batch replication of data from transaction to analytics databases. But when the structure of incoming data is unknown, or when the data has no structure at all, the Data Warehouse fails miserably.

Nick Heudecker, research director at Gartner, puts it well: "The idea is simple: instead of placing data in a purpose-built data store, you move it into a data lake in its original format. This eliminates the upfront costs of data ingestion, like transformation. Once data is placed into the lake, it's available for analysis by everyone in the organization."
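The "store raw, apply structure at read time" idea Heudecker describes can be sketched in a few lines. This is only an illustration of the schema-on-read principle, not any particular product; all names here (`ingest`, `analyze`, the lake directory layout) are invented for the example.

```python
# Minimal sketch of a data lake's schema-on-read approach: records land
# in their original format with no upfront transformation, and structure
# is interpreted only when someone analyzes the data.
import json
import tempfile
from collections import Counter
from pathlib import Path

def ingest(lake_dir: Path, source: str, raw_records: list) -> Path:
    """Land records as-is in the lake -- no cleaning, no schema enforced."""
    out = lake_dir / source / "events.jsonl"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(raw_records))
    return out

def analyze(lake_dir: Path) -> Counter:
    """Apply structure at read time: parse and interpret only now."""
    counts = Counter()
    for path in lake_dir.rglob("*.jsonl"):
        for line in path.read_text().splitlines():
            event = json.loads(line)  # schema interpreted here, not at ingest
            counts[event.get("type", "unknown")] += 1
    return counts

lake = Path(tempfile.mkdtemp())
ingest(lake, "web", ['{"type": "click"}', '{"type": "click"}'])
ingest(lake, "crm", ['{"type": "signup", "plan": "pro"}'])
print(analyze(lake))  # Counter({'click': 2, 'signup': 1})
```

Contrast this with a warehouse pipeline, where the transformation (and the failure, for malformed or schema-less records) would happen at ingest time instead.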

In conclusion, any organization investing in Data Analytics ought to be thinking about the Data Lake as its data-organizing principle and rethinking its Data Warehouse-based organization scheme.
