Data Democratization: Giving Power Back To The People

Veer · Published in babelfishing · 4 min read · Dec 3, 2018

Data democratization is the process of putting analytics in everyone’s hands, empowering more people within an organization to extract the insights that inform decisions. In effect, data democratization removes the gatekeepers that create a bottleneck at the gateway to the data.

It’s also essential that easy access be accompanied by an easy way for people to understand the data, so that they can use it to expedite decision-making and uncover opportunities for the organization. The goal is to let anybody use data at any time, making decisions with no barriers to access or understanding.

When implemented strategically with the right technology, data democratization benefits everyone — from the data scientists who have more time for bigger tasks, right down to the customer. Data democratization will catapult companies to new heights of performance — if done right.

Tech innovation that propels data democratization

Data federation software: This software uses metadata to aggregate data from a variety of sources into a virtual database.

Self-service BI applications: These applications make it easier for non-technical users to interpret data analysis. We can now have a machine look at data and explain it to non-technical users.

What kind of technology would be required to do this?

An Artificial Intelligence system that is capable of the following:

1. Comprehend questions asked by business users across the organization and deliver answers/results at runtime

2. Understand different question types, covering past/present (descriptive) and future (predictive) contexts

3. Reason over data and discover insights

4. Identify users and serve data within their access role

5. Generate the right charts based on the data extracted

6. Narrate results, generating meaningful natural-language responses

To put it in a technical context, we need the following tech:

  • An NLP solution that can comprehend user queries
  • A data model / network that can extract time and space features
  • A network capable of finding missing relationships and influencers
  • A network with memory which identifies users and their roles
  • A network capable of learning from past patterns of queries and visualization
  • An NLG solution capable of incorporating context keys into preprocessed response templates
  • An integration of the NLP (for language) and NN (for analytics), with the ability to process in real time (AutoML)
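As a rough illustration of how these pieces could fit together, here is a minimal sketch of the query pipeline. All function names, the toy data store, and the keyword heuristics are hypothetical assumptions, not an actual implementation:

```python
# Hypothetical sketch of the query pipeline: NLP parse -> role-aware
# analytics -> NLG narration. Each stage is a stub standing in for the
# components described above.

def parse_query(text):
    """NLP stage: extract a rough intent and candidate entities."""
    # Toy heuristic: keyword check for predictive intent; capitalized
    # words as candidate entities. A real system would use NER/SRL models.
    intent = "predictive" if "will" in text or "forecast" in text else "descriptive"
    return {"intent": intent, "entities": [w for w in text.split() if w.istitle()]}

def run_analytics(parsed, user_role):
    """Data-model stage: answer only within the user's access role."""
    # Pretend data store keyed by entity; role-based filtering (assumption).
    store = {"Sales": {"value": 120, "roles": {"sales", "exec"}}}
    return {e: store[e]["value"] for e in parsed["entities"]
            if e in store and user_role in store[e]["roles"]}

def narrate(results):
    """NLG stage: fill a preprocessed response template with context keys."""
    if not results:
        return "No accessible data found for this query."
    return "; ".join(f"{k}: {v}" for k, v in results.items())

answer = narrate(run_analytics(parse_query("What were Sales last month?"), "sales"))
print(answer)  # Sales: 120
```

In practice each stub would be replaced by a trained model; the point is the contract between the stages, not the toy logic inside them.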

Technical Challenges and Babelfish Solutions

NLP

  • Challenges: Though there are multiple NLP solutions that recognize named entities and label roles semantically, the available solutions are generic to text applications and do not help in recognizing entities within business data comprising both text and numbers.
  • Solution: To close this gap, Babelfish has developed a proprietary Named Entity Recognition algorithm that auto-detects entities in business data, allowing NER to work seamlessly.
  • To close the gap in dependency parsing, Babelfish’s proprietary Semantic Role Labeling logic assigns roles to keywords based on their business meaning. This establishes the right relationships between nouns and verbs, which in turn resolves the intended meaning of a sentence.
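To illustrate the kind of mixed text-and-number recognition involved, here is a toy recognizer for business sentences. The patterns and labels are illustrative assumptions, not Babelfish’s proprietary algorithm:

```python
import re

# Toy recognizer for business data mixing words and numbers.
# Pattern set and labels are assumptions for illustration only.
PATTERNS = [
    ("MONEY",   re.compile(r"\$\d[\d,]*(?:\.\d+)?")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
    ("METRIC",  re.compile(r"\b(revenue|sales|churn|margin)\b", re.I)),
]

def tag_entities(text):
    """Return (label, matched_text) pairs found in a business sentence."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((label, match.group()))
    return found

print(tag_entities("Q3 revenue grew 12% to $4,200,000"))
# [('MONEY', '$4,200,000'), ('PERCENT', '12%'), ('METRIC', 'revenue')]
```

A production system would learn these entity types from the customer’s schema rather than hard-coding them, but the output shape — typed spans over mixed text and numbers — is the same.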

Data Model / Neural Network

  • Challenge 1: Though there are numerous data architectures and neural networks, no single model or network can connect the different data types of different departments. This is essential if the system is to respond to every user in an organization.
  • Solution: A unified model capable of connecting all data types for multidimensional processing at runtime. The Babelfish Unified Data Model is a hierarchical structure built on the concepts of the Recursive Neural Network, which allows back-propagation through structure. The hierarchy defined in the model connects all the data types, enabling it to process queries from any department.
  • Challenge 2: The data model also has to ensure that queries can be answered irrespective of the question type, which may call for descriptive analysis, predictive analysis, or recommendations.
  • Solution: The hierarchy within the Babelfish model allows for modular data organization suited to the analysis type for the detected context. Using scoring algorithms, the model can run probabilistic attribution or collaborative filtering over the hierarchical sequences to output predictive as well as prescriptive (recommended) answers.
  • Challenge 3: It is a known challenge that neural networks are still not able to generate reasoning. For example: Why did sales fall last month? Without reasoning, business queries like this are unlikely to find answers.
  • Solution: The Babelfish hierarchy supports case-based reasoning: a matching function compares patterns, extracts the influencing parameters, and finds missing relationships between entities, allowing the machine to detect the parameters that influenced a past event.
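The hierarchical-model idea can be sketched minimally: each node aggregates scores from its children recursively, loosely analogous to processing over a tree structure. The node layout and values below are assumptions for illustration:

```python
# Minimal sketch of a hierarchical data model with recursive scoring.
# Structure and numbers are hypothetical; the point is that a query
# against any node can be answered by recursing over its subtree.

class Node:
    def __init__(self, name, value=0.0, children=None):
        self.name = name
        self.value = value
        self.children = children or []

    def score(self):
        # A node's score is its own value plus its children's scores,
        # computed recursively over the hierarchy.
        return self.value + sum(child.score() for child in self.children)

org = Node("Company", children=[
    Node("Sales", children=[Node("EMEA", 40.0), Node("APAC", 60.0)]),
    Node("Marketing", 25.0),
])
print(org.score())  # 125.0
```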
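Case-based reasoning with a matching function can likewise be sketched. The cases, features, and similarity measure below are hypothetical stand-ins; a real system would build them from historical data:

```python
# Sketch of case-based reasoning: match the current situation against
# past cases and surface the parameter that influenced the outcome.
# All cases and features are fabricated for illustration.

past_cases = [
    {"features": {"price_change": 0.10, "promo": 0, "stockouts": 1},
     "outcome": "sales_drop", "driver": "stockouts"},
    {"features": {"price_change": 0.00, "promo": 1, "stockouts": 0},
     "outcome": "sales_rise", "driver": "promo"},
]

def similarity(current, case_features):
    """Simple matching function: negative L1 distance over shared keys."""
    return -sum(abs(current[k] - case_features[k]) for k in current)

def explain(current):
    """Return the influencing parameter of the best-matching past case."""
    best = max(past_cases, key=lambda c: similarity(current, c["features"]))
    return best["driver"]

print(explain({"price_change": 0.08, "promo": 0, "stockouts": 1}))  # stockouts
```

This is how a question like “Why did sales fall last month?” can get an answer without the network having to reason from scratch: the match to a prior case carries the explanation with it.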

Real-time Processing

  • Challenge: To process learning and discovery in real time, the data types need to be integrated. This involves real-time ranking or weighting, as well as maintaining states with aggregate definitions.
  • Solution: Babelfish’s state-based synthesized nodes within the hierarchy of the unified model allow real-time scoring to output an aggregate (net weight), which is categorized into a state based on a weight threshold. This lets the system compute at runtime and deliver answers from both historical and current data.
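A minimal sketch of the state-based scoring idea, assuming a simple blend of historical and current signals and arbitrary thresholds (both are illustrative assumptions):

```python
# Sketch: combine a historical aggregate with a live reading into a net
# weight, then bucket the weight into a state by threshold. The blend
# ratio and thresholds are arbitrary choices for illustration.

def net_weight(historical, current, hist_weight=0.7):
    """Blend a historical aggregate with a current reading."""
    return hist_weight * historical + (1 - hist_weight) * current

def to_state(weight, thresholds=((0.66, "high"), (0.33, "medium"))):
    """Categorize a net weight into a state via descending thresholds."""
    for cutoff, state in thresholds:
        if weight >= cutoff:
            return state
    return "low"

w = net_weight(historical=0.8, current=0.4)
print(round(w, 2), to_state(w))  # 0.68 high
```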
