In the Grand Schema of Things

Semantic Schema to extract information across multiple business systems

Introduction
An EDM (Enterprise Data Model) is the starting point for all data system designs. It falls in the realm of information architecture. The model can be compared to an architectural blueprint and its relationship with the finished building: a blueprint provides a visualization and acts as a framework supporting the planning, building, and implementation of data systems. Likewise, an EDM helps you plan data integration and eliminates data silos. The unified data then presents the “single version of the truth” for the benefit of all consumers. It minimizes data redundancy, disparity, and errors, which is core to data quality, consistency, and accuracy.

The efficiency of a good EDM lies in the design of its logical data model. The logical data model defines the relationships between subject areas such as people, places, things, and events, along with their associated rules. Using taxonomies, data can be organized hierarchically as part of the master data management environment.

The Babelfish schema is designed primarily to fetch complex results in a minimum number of steps (the shortest path to responses), irrespective of the number of entities. The schema not only establishes explicit relationships among data entities, but also incorporates a comprehensive metadata management system with rules, in order to compute in real time.

Explaining the Data Model
The design incorporates data entities encompassing both raw and synthesized data, in order to compute and retrieve responses on the fly.

The schema is designed to achieve descriptive, predictive, and prescriptive insights with a single design. It allows collaborative filtering of data, using the customer node as the primary filter. This design provides two primary advantages: (1) it creates a single data organization that connects every entity available across all data sources, and (2) it fetches results in a minimum number of steps (the shortest path to responses).
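The "shortest path to responses" idea can be illustrated with a minimal sketch. The entity names and edges below are hypothetical placeholders, not the actual Babelfish schema; the point is that with the customer node as the root, any connected entity is reachable via a breadth-first traversal in a minimum number of hops.

```python
from collections import deque

# Hypothetical adjacency list: the customer node as the primary filter
# in a small entity graph. Edges connect entities across data sources.
GRAPH = {
    "Customer": ["Product", "Behavior"],
    "Product": ["Price", "Inventory"],
    "Behavior": ["Channel"],
    "Price": [],
    "Inventory": [],
    "Channel": [],
}

def shortest_path(start, goal):
    """Breadth-first search: returns the minimum-hop path from one
    entity to another -- the 'shortest path to responses'."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in GRAPH.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("Customer", "Price"))  # ['Customer', 'Product', 'Price']
```

Because the customer node sits at the root, any query pivots from it directly, so result retrieval never depends on how many total entities exist in the graph.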

Notes on Schema

· The Customer node is connected to the Product node, which is further broken down into four sub-entities: Customer States, Behavior, Product States, and Price. The Customer State defines the states (VAPR), scores, and any unique segmentation definitions that relate a particular customer to the product. VAPR stands for Viewed, Added-to-Cart, Purchased, and Rejected; the customer-product relationship is always in one of these four states. The Purchased state further branches into sub-states that include post-purchase (customer service) states.

· The Behavior entity is one of the core entities. It covers all data coming from channels, which can be further broken down by sources and intent states. The intent states are calculated from scores assigned to stages. In parallel, data is stacked as a time series, which allows the machine to update patterns for decision matching.

· The Product States entity comprises the states of the product life cycle up to product delivery. Once the final product state is met, the customer state updates to Purchased within the Customer States; together, these states allow quick retrieval of insights for both customer-facing and operational intelligence systems.

· The Inventory data can be further classified by procurement type, which may be self-manufactured or sourced through a vendor.

· In the case of manufacturing, you can add materials or other relevant data attributes that can be used for operational and price deductions.

· The Price entity linked to the product is broken down into actual price (deduced from expense feeds) and selling price (collected from channel data) to understand margins, discounts, and customer revenues.

· The Price entity is a derivative of all these inputs, and is used to deduce NPV, product pricing, and changes in margins, and to propose attractive offers, all in real time.

· The Actual Price is further broken down into Fixed and Dynamic entities, which derive their values from People, Place, S&M, GA, and other documented costs.
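Two of the structures above — the VAPR customer states and the price breakdown into actual (fixed plus dynamic costs) versus selling price — can be sketched as simple types. This is an illustrative sketch, not the production schema; the cost figures and field names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class CustomerState(Enum):
    # VAPR: the customer-product relationship is always in one of these
    VIEWED = "V"
    ADDED_TO_CART = "A"
    PURCHASED = "P"
    REJECTED = "R"

@dataclass
class ActualPrice:
    fixed_costs: float    # e.g. People, Place
    dynamic_costs: float  # e.g. S&M, GA, other documented costs

    @property
    def total(self) -> float:
        return self.fixed_costs + self.dynamic_costs

@dataclass
class Price:
    actual: ActualPrice   # deduced from expense feeds
    selling_price: float  # collected from channel data

    @property
    def margin(self) -> float:
        # Margin is the gap between what a channel charges and what
        # the product actually costs across fixed and dynamic inputs.
        return self.selling_price - self.actual.total

price = Price(ActualPrice(fixed_costs=40.0, dynamic_costs=15.0), selling_price=80.0)
print(price.margin)  # 25.0
```

Keeping the actual price as its own entity (rather than a single number) is what lets margin, discount, and offer calculations stay live as expense feeds update.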

The design is well suited to instant learning and response selection, as it allows data to form patterns that can be matched, weighted, and synthesized.

Using the timestamp of each data record, relationships are built between data points based on the hierarchical design, allowing the machine to extract patterns on the fly. Using the strength of a node and its cumulative weight, the machine can prioritize which response to select. The diagram below illustrates how a pattern may get highlighted (based on the focus rule) and how it can reveal associations that help the machine predict the likely outcomes of a repetitive path. The depth of repetition is used to reach a confirmation state once it attains a certain threshold.
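The "depth of repetition" mechanism can be sketched as a counter over observed relationship paths: once a path has been seen enough times, the pattern is confirmed. The threshold value and the path encoding below are assumptions for illustration.

```python
from collections import Counter

CONFIRMATION_THRESHOLD = 3  # assumed value; tuned per deployment

path_counts = Counter()

def observe(path):
    """Record one traversal of a relationship path. The running count
    is the 'depth of repetition'; crossing the threshold moves the
    pattern into a confirmation state."""
    key = " -> ".join(path)
    path_counts[key] += 1
    return path_counts[key] >= CONFIRMATION_THRESHOLD

# The same customer journey observed three times confirms the pattern.
events = [["Viewed", "Added-to-Cart", "Purchased"]] * 3
results = [observe(p) for p in events]
print(results)  # [False, False, True]
```

A confirmed pattern can then be given a higher cumulative weight, so the machine prioritizes it when selecting a response.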

Selecting an outcome involves simply following the relationship trail. This method can support near-instantaneous decisions, without parsing through redundant computational steps.

To achieve such rapid processing, the data model for AI needs to be centered on the primary node and cascade down to its inputs, so that clusters can form at run time across any dimension simply by pivoting at the node. For run-time pattern detection, data and its relationships are converted to a string. These strings are matched against the incoming data set, and the differences and similarities are further used for sub-classification and labeling.
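The string-matching step above can be sketched as follows. The serialization format and the relationship tokens are hypothetical; the idea is that once a node and its relationships are flattened into a canonical string, set operations surface the similarities (for matching) and differences (for sub-classification and labeling).

```python
def to_string(node, relations):
    """Serialize a node and its relationships into a canonical string
    so patterns can be matched as plain text at run time."""
    return node + "|" + ",".join(sorted(relations))

def compare(known, incoming):
    """Return (shared, differing) relationship tokens between a known
    pattern string and an incoming one. Shared tokens drive matching;
    differing tokens drive sub-classification and labeling."""
    a = set(known.split("|")[1].split(","))
    b = set(incoming.split("|")[1].split(","))
    return a & b, a ^ b

known = to_string("Customer:123", ["Viewed:sku9", "Channel:web"])
incoming = to_string("Customer:123", ["Viewed:sku9", "Channel:app"])
same, diff = compare(known, incoming)
print(sorted(same))  # ['Viewed:sku9']
print(sorted(diff))  # ['Channel:app', 'Channel:web']
```

Here the shared `Viewed:sku9` token matches the incoming record to the known pattern, while the channel difference could label it as a new sub-class (app traffic versus web traffic).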