Analytics Challenges — Data Fabric

Cengiz Kayay (CK)
Published in Analytics Vidhya
Dec 27, 2019

In the previous post, I listed the Analytics Challenges that most organizations face today. This post explains the Data Fabric concept.

Data Fabric Conceptual Image

Today, structured, semi-structured, and unstructured data are all fed into Data Lakes to create Data Marts and Data Assets for regulatory reports, strategic KPIs, and similar outputs. Enterprises also run siloed BI solutions that measure operational performance and deliver analytics, decisions, or recommendations for specific lines of business such as CRM, ERP, and Marketing (Web/Mobile Analytics), as shown below:

Simplified Example — Current Analytics Landscape

Document Management Systems and siloed BI solutions hold different data assets and data types, which may need to be integrated using the 'Data Fabric' concept to produce collective insights across the enterprise.

Gartner describes the ‘Data Fabric’ as the means of supporting “frictionless access and sharing of data in a distributed network environment.” The data fabric architecture joins these decentralized data assets and their respective management systems. Although this architecture may involve any number of competing vendors, graph technology and semantic standards play a pivotal role in its implementation.

The primary driver behind the data fabric architecture is the limitations of traditional data management options. Hadoop-inspired data lakes can successfully co-locate disparate data, but they struggle to actually find and integrate datasets.

Such data lakes can excel at cheaply processing vast, simple datasets, but they have limited utility when operating over complex data spanning multiple interrelated entities.

Data warehouses can offer excellent integration performance for structured data, but they were designed for the slower pace of the pre-big-data era. They are too inflexible and difficult to change to keep up with the sophisticated and ever-increasing demands of today’s data integrations, and they are poorly suited to tying together the unstructured (textual and visual) data inundating enterprises today. Cognitive computing applications such as machine learning require far more data and many more intricate transformations, which calls for modern integration methods.

The foremost benefit of using semantic graphs in a data fabric architecture is seamless data integration. This approach blends together not only various datasets, data types, and structures, but also the outputs of entirely distinct toolsets and their supporting technologies. By placing a semantic graph integration layer atop this architecture, organizations can reconcile the most fundamental differences at the data and tool levels of these underlying technologies. Whether organizations choose different options for data virtualization, storage tiering, ETL, data quality, or other functions, the semantic graph layer can integrate the resulting data for any downstream use.
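To make the integration idea concrete, here is a minimal sketch in plain Python. It assumes two hypothetical sources, a CRM and an ERP system, that describe the same customer with different column names; a per-source mapping translates both into shared ontology predicates (the `ex:` terms are invented for illustration), so the records merge onto one graph node. Plain tuples stand in for RDF triples; a production layer would more likely use an RDF store and a mapping standard such as W3C R2RML.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical column that identifies the customer in each source system.
KEY_COLUMN: Dict[str, str] = {"crm": "cust_id", "erp": "CustomerNo"}

# Hypothetical mapping from each source's column names to shared ontology predicates.
PREDICATES: Dict[str, Dict[str, str]] = {
    "crm": {"full_name": "ex:hasName", "segment": "ex:inSegment"},
    "erp": {"Name": "ex:hasName", "OpenInvoices": "ex:openInvoices"},
}


def to_triples(source: str, record: Dict[str, object]) -> List[Triple]:
    """Translate one source record into (subject, predicate, object) triples."""
    # Both systems share the customer number, so they resolve to the same node.
    subject = f"ex:customer/{record[KEY_COLUMN[source]]}"
    triples: List[Triple] = [(subject, "rdf:type", "ex:Customer")]
    for column, value in record.items():
        predicate = PREDICATES[source].get(column)
        if predicate is not None:  # unmapped columns (including the key) are skipped
            triples.append((subject, predicate, str(value)))
    return triples


def integrate(feeds: Iterable[Tuple[str, List[Dict[str, object]]]]) -> Set[Triple]:
    """Merge records from every source into one de-duplicated set of triples."""
    graph: Set[Triple] = set()
    for source, records in feeds:
        for record in records:
            graph.update(to_triples(source, record))
    return graph


graph = integrate([
    ("crm", [{"cust_id": 42, "full_name": "Acme Ltd", "segment": "SME"}]),
    ("erp", [{"CustomerNo": 42, "Name": "Acme Ltd", "OpenInvoices": 3}]),
])
for triple in sorted(graph):
    print(triple)
```

Because the graph is a set, facts that both sources assert (the customer's type and name) collapse into a single triple, while source-specific facts (segment, open invoices) attach to the same node.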

The data blending and data discovery advantages of semantic graphs come from their ability to define, standardize, and harmonize the meaning of all incoming data. Moreover, they do so in terms that business end-users can understand, fostering an intuitive grasp of the relationships between data elements. The result is a rich, contextualized understanding of how data interrelates, which supports informed data discovery and, ultimately, timely data integrations for cutting-edge applications and analytics such as machine learning.
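Relationship discovery over such a graph can be sketched as a traversal. The example below is illustrative only: the triples and `ex:` identifiers are hypothetical, and the `related` function performs a breadth-first search that follows edges in both directions to list every entity within a given number of hops of a starting node, together with its distance.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples, as they might appear after integration.
TRIPLES: List[Triple] = [
    ("ex:customer/42", "ex:placedOrder", "ex:order/7"),
    ("ex:order/7", "ex:contains", "ex:product/widget"),
    ("ex:product/widget", "ex:managedBy", "ex:employee/jane"),
    ("ex:customer/42", "ex:inRegion", "ex:region/north"),
    ("ex:employee/bob", "ex:worksIn", "ex:region/south"),
]


def related(start: str, triples: List[Triple], max_hops: int) -> Dict[str, int]:
    """Return every node reachable from `start` within `max_hops`, mapped to its hop count."""
    # Build an undirected adjacency list: a node is related to whatever it points at
    # and to whatever points at it.
    adjacency: Dict[str, Set[str]] = {}
    for subject, _, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)

    distances: Dict[str, int] = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, set()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)

    del distances[start]
    return distances


result = related("ex:customer/42", TRIPLES, max_hops=2)
print(result)
```

With `max_hops=2`, the customer's order, region, and ordered product are returned; raising the limit to 3 also reaches the employee who manages that product. Following edges in both directions is a deliberate simplification: it treats any shared edge as a relationship, which suits discovery but ignores predicate semantics.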

The knowledge graph produced by these integrations can be quickly spun up in containers and deployed in any cloud or hybrid cloud setting. With modern pay-on-demand cloud delivery, in which APIs and Kubernetes let users automatically place compute where it is needed, data fabric architectures are becoming the most financially feasible choice for the distributed demands of the modern data ecosystem.

The diagram below shows a Data Fabric architecture with two separate components: a Semantic Knowledge Graph that creates Graph Marts, and a Semantic Search & Intention Engine.

Simplified Example — Data Fabric Architecture

Data from all systems can be extracted in micro-batches into the Semantic Knowledge Graph component, which is implemented on a performant OLAP graph database and structured according to an enterprise-specific knowledge ontology.
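The micro-batch load step might look like the sketch below. Everything in it is an assumption for illustration: `InMemoryGraphStore` stands in for the OLAP graph database, `SALES_ONTOLOGY` is a hypothetical slice of an enterprise ontology, and incremental extraction uses a simple `updated_at` watermark. Only rows changed since the last run are extracted, split into fixed-size batches, mapped to triples, and loaded batch by batch.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, Any]

# Hypothetical slice of an enterprise ontology: source column -> predicate.
SALES_ONTOLOGY: Dict[str, str] = {"region": "ex:inRegion", "amount": "ex:amount"}


class InMemoryGraphStore:
    """Hypothetical stand-in for the OLAP graph database's bulk-load interface."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.batches_loaded = 0

    def load(self, triples: List[Triple]) -> None:
        self.triples.update(triples)
        self.batches_loaded += 1


def micro_batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def row_to_triples(row: Row) -> List[Triple]:
    """Map one sales row onto the ontology's predicates."""
    subject = f"ex:sale/{row['sale_id']}"
    triples: List[Triple] = [(subject, "rdf:type", "ex:Sale")]
    for column, predicate in SALES_ONTOLOGY.items():
        triples.append((subject, predicate, str(row[column])))
    return triples


def run_pipeline(rows: List[Row], store: InMemoryGraphStore,
                 watermark: int, batch_size: int = 2) -> int:
    """Load rows changed after `watermark` in micro-batches; return the new watermark."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    for batch in micro_batches(changed, batch_size):
        store.load([t for row in batch for t in row_to_triples(row)])
    return max((r["updated_at"] for r in changed), default=watermark)


rows: List[Row] = [
    {"sale_id": 1, "region": "North", "amount": 100, "updated_at": 1},
    {"sale_id": 2, "region": "North", "amount": 250, "updated_at": 2},
    {"sale_id": 3, "region": "South", "amount": 80, "updated_at": 3},
    {"sale_id": 4, "region": "South", "amount": 120, "updated_at": 4},
    {"sale_id": 5, "region": "North", "amount": 60, "updated_at": 5},
]
store = InMemoryGraphStore()
watermark = run_pipeline(rows, store, watermark=1)
print(watermark, store.batches_loaded, len(store.triples))
```

Here rows 2 to 5 are loaded in two batches of two. Returning the new watermark lets the next run pick up only newer changes; rerunning with the returned value loads nothing.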

The Semantic Search & Intention Engine interprets the intent of questions asked in free text and generates SPARQL queries against the Graph Marts, supporting dynamic aggregations. ML/AI algorithms can be trained on the Graph Marts and called directly from the same component to answer business questions such as:

  • What are the total sales for Region X?
  • What skills are available in the organization for ‘XYZ’, and what is their current utilization?
  • How many projects are running across the organization, and what is their completion rate?
  • What are the scenarios to improve sales by 10%?
    The engine runs a query to retrieve all sales data, then automatically runs several scenarios for each region to find the one that yields the best result.
  • What is the best product to offer the customer with account number XYZ?
    The engine runs a query to retrieve the details of customer XYZ and runs the next-best-offer model to recommend products.
  • What is the best location in Region XYZ to open a new store?
    The engine runs a query to retrieve the details of Region XYZ and runs the store recommendation model.
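The question-to-query step of the Semantic Search & Intention Engine can be sketched as follows. A production engine would likely rely on a natural-language understanding model; this sketch swaps in simple regular-expression templates to show the shape of the step. The `ex:` ontology, its `ex:Sale`, `ex:Project`, `ex:inRegion`, `ex:amount`, and `ex:status` terms, and the template set are all hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable, List, Optional, Tuple

PREFIXES = (
    "PREFIX ex: <http://example.org/ontology#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def total_sales_query(match: re.Match) -> str:
    """Sum the amounts of all sales in the requested region."""
    region = match.group("region")  # \w+ only, so it cannot escape the quoted literal
    return PREFIXES + (
        "SELECT (SUM(xsd:decimal(?amount)) AS ?total)\n"
        "WHERE {\n"
        "  ?sale a ex:Sale ;\n"
        f'        ex:inRegion "{region}" ;\n'
        "        ex:amount ?amount .\n"
        "}"
    )


def running_projects_query(match: re.Match) -> str:
    """Count projects whose status is Running (the match carries no parameters)."""
    return PREFIXES + (
        "SELECT (COUNT(?project) AS ?running)\n"
        "WHERE {\n"
        "  ?project a ex:Project ;\n"
        '           ex:status "Running" .\n'
        "}"
    )


# Hypothetical intent templates: question pattern -> SPARQL builder.
INTENTS: List[Tuple[re.Pattern, Callable[[re.Match], str]]] = [
    (re.compile(r"total sales? for region (?P<region>\w+)", re.IGNORECASE), total_sales_query),
    (re.compile(r"how many projects are running", re.IGNORECASE), running_projects_query),
]


def to_sparql(question: str) -> Optional[str]:
    """Return a SPARQL query for the first matching intent, or None if nothing matches."""
    for pattern, build in INTENTS:
        match = pattern.search(question)
        if match:
            return build(match)
    return None


print(to_sparql("What is the total sale for Region North?"))
```

Restricting captured values to `\w+` keeps user text from breaking out of the quoted literal; a real engine would bind query parameters rather than format strings.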
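The pattern described for several questions in the list above (query the Graph Mart, then run a model on the result) can be sketched for the 10% sales question. The rows, scenario names, and uplift multipliers below are invented for illustration, and picking the scenario with the highest projected sales per region is an assumed, deliberately simple stand-in for whatever scenario model an enterprise would actually use.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Stand-in for the rows a Graph Mart query would return (illustrative values).
SALES_ROWS: List[Tuple[str, float]] = [
    ("North", 100.0), ("North", 250.0), ("South", 80.0), ("South", 120.0),
]

# Hypothetical scenarios: scenario name -> assumed sales multiplier per region.
SCENARIOS: Dict[str, Dict[str, float]] = {
    "baseline": {"North": 1.00, "South": 1.00},
    "price_promotion": {"North": 1.04, "South": 1.12},
    "extra_sales_rep": {"North": 1.09, "South": 1.06},
}


def sales_by_region(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate raw sales rows into a total per region."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def best_scenarios(totals: Dict[str, float],
                   scenarios: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[str, float]]:
    """For each region, pick the scenario with the highest projected sales."""
    plan: Dict[str, Tuple[str, float]] = {}
    for region, current in totals.items():
        plan[region] = max(
            ((name, current * uplift[region]) for name, uplift in scenarios.items()),
            key=lambda pair: pair[1],
        )
    return plan


def meets_target(totals: Dict[str, float], plan: Dict[str, Tuple[str, float]],
                 target_growth: float) -> bool:
    """Check whether the combined plan reaches the requested overall growth."""
    current = sum(totals.values())
    projected = sum(projection for _, projection in plan.values())
    return projected >= current * (1 + target_growth)


totals = sales_by_region(SALES_ROWS)
plan = best_scenarios(totals, SCENARIOS)
print(plan, meets_target(totals, plan, target_growth=0.10))
```

With these illustrative numbers, the plan picks a different scenario for each region and projects total sales just above the 10% target.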

Conclusion:

So far, we have discussed how ‘Semantic Graphs’ may help deliver the ‘Data Fabric’ concept, which blends data from disparate sources and makes it discoverable.

Data Fabric would enable you to directly raise business questions and get answers by utilizing all of the data and insights available in your enterprise.

The Everis Knowler SaaS solution is one such implementation. It extracts data from Office 365 (as the content and document management system), blends it with other data sources based on a custom enterprise ontology, and allows semantic search through a portal-like search interface. Check it out.

For specific questions, you can contact me at ckayay@gmail.com.

