Introduction to The Tranquil Data™ Enterprise Edition
With our announcement this week about the upcoming “Dragons” Early Access, we’re taking some time to catch everyone up on what the product is and what it does. Tomorrow we’ll talk about a streamlined edition of the product, and later this week we’ll cover how the software is applied. Today, let’s focus on a high-level view of the core product.
A System of Record for Data Context
Tranquil Data is the first commercial System of Record specifically for Data Context. There’s a lot to unpack in that sentence, so let’s take it in pieces.
A System of Record (SoR) is an authority for some set of data. If you want to know the absolute truth about something, you consult your SoR. Organizations have lots of data across many types of systems and processes, so we have domain-specific SoRs, like CRM for sales processes or ERP for business processes. They may contain a gold-standard set of data, or they may represent a metadata view connecting collections of data.
Like all SoRs, Tranquil Data is first a database. That is, it provides a general service where users can define data models, program rules and queries, and track changes over time. It promises durability and recovery, defines semantics for real-time access, and integrates with security infrastructure to control use. It supports live upgrades, provides logs for operations and audit tasks, and exposes APIs to administer running services.
Like many SoRs, Tranquil Data is domain-specific. Think about Salesforce: under the covers it is a general-purpose database, built to scale and meet all the requirements of a modern CRM. The problem it’s trying to solve, however, is rooted in the sales domain, so customer interaction with “the database” isn’t via SQL but through metaphors that map to sales and marketing processes. Similarly, the interfaces that Tranquil Data exposes are defined to support “correct use” of data, driven by context. So, what is context?
Data Context is not the data that you’re using today, but knowledge about where that data came from, why you have it, and therefore what you can do with it. Data doesn’t exist in a vacuum. If it’s first-party data, then a user agreed to some privacy policy, signed some consent, or interacted in some environment where regulations were in place (like how HIPAA governs visits to doctors in the US). If it’s third-party data, there was some back-end contract like an MSA or BAA defining rights and obligations about how that data can be used or shared. Those rights and obligations, because of regulatory requirements, may change depending on the type of data, the country it came from, the age of an associated user, whether that user had a particular employer or spouse at the time their data was acquired, etc.
Data Context is the metadata needed to define this complete framework, and Tranquil Data is the System of Record for a materialized version of this context. Context takes the form of a graph, but unlike other “knowledge graphs,” which are designed to collect as much data as possible, Tranquil Data purposefully curates this knowledge down to the minimum set needed to select the applicable reason that grants or denies use of data for any purpose in context. Our software allows you to apply this knowledge in real time to ensure that data is being used and shared correctly, and provides access to the versioned context graph so you can look back over time and understand not just what decisions were made, but why.
Categories of Context
This context graph is at the core of Tranquil Data. It is what allows our software to ensure correct use of data against complex frameworks of rules and requirements. In the Dragons release of Tranquil Data, there are three broad categories of context that we connect.
Entity Context is the building block used to reason about all other types of context. It consists of:
- Policies, written in the OASIS XACML 3.0 policy language. They are decomposed and versioned, and they support a syntax for querying context during evaluation, providing a rich expression model.
- Models, which define which fields from any given database are meaningful from a context perspective, how those properties should be interpreted, and how fields of data are categorized.
- Domains, namespaces that bind a root policy and a model together for generating other context and evaluating use of data.
- Peers, individual actors with unique cryptographic identities and operational properties, which use the other three types of entity context to export access to data and APIs for real-time decision-making.
Entity context is provisioned via API into our software, is accepted through consensus (more below on service capabilities), and is written to a distributed ledger. Each component-version exists at a unique logical time, so that there’s a timestamp we can use to connect all other context back to this state of the world. The ledger itself provides proof of integrity and is used on each process restart to replay context into a known-good state.
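As a concrete illustration, provisioning a policy as entity context might look something like the sketch below. This is a hedged example: the endpoint path, payload shape, and response fields are assumptions for illustration, not the product’s documented API.

```python
import requests

ENGINE_URL = "https://tranquil-engine.example.com:8443"  # hypothetical address

# An OASIS XACML 3.0 policy document, authored separately.
with open("consent-policy.xml") as f:
    policy_xml = f.read()

# Hypothetical provisioning endpoint; the real path and payload will differ.
response = requests.post(
    f"{ENGINE_URL}/v1/context/entity/policies",
    headers={
        "Authorization": "Bearer <token>",
        "Content-Type": "application/xml",
    },
    data=policy_xml,
)
response.raise_for_status()

# Assume the response echoes back the unique logical time the ledger
# assigned to this component-version, so other context can reference it.
print(response.json()["logicalTime"])
```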
Record Context is knowledge about data, at the record-version level, in some external database. This might be (e.g.) a row in a Postgres DBMS or a record in a FHIR R4 service. Most of the content in these records is typically ignored. Entity Models let you define which specific fields are meaningful (like fields that identify an associated user, a sensitive type of data, a location where the data was created, etc.) and include those properties in context. Those properties can then be referenced by policy. The model also defines category-field mapping, so that policies can be written for collections with stable names like “contact information” (which non-technical roles will also understand) without having to know how structures in different databases map fields to that category.
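To make the category-field mapping concrete, here is a minimal sketch of the idea in plain Python; the structure and names are invented for illustration, not the product’s actual model syntax.

```python
# Illustrative only: a stable category name mapped to the concrete fields
# that hold that data in two different databases.
CONTACT_INFORMATION = {
    "postgres/customers": ["email", "phone", "mailing_address"],
    "fhir-r4/Patient": ["telecom", "address"],
}

def fields_in_category(datastore: str) -> list[str]:
    """A policy written against "contact information" applies to all of
    these fields, without the author knowing either schema."""
    return CONTACT_INFORMATION.get(datastore, [])

print(fields_in_category("fhir-r4/Patient"))  # -> ['telecom', 'address']
```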
Subject Context is knowledge about the owners and actors around this data. Without requiring any specific schema or structure for context elements, it lets you define users or organizations who are associated with data, along with their properties and relationships, like the state where they live, the name of a spouse, or a specific consent that they’ve granted. This knowledge can be connected to groups, which similarly have properties and members, so you can define collections like the third parties that you work with or types of organizations.
The query syntax supported by policies lets you connect this knowledge so that you could (e.g.) find the user associated with the record being requested and, from a relationship that user has, resolve properties of a group as input to a rule.
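A rough sketch of that traversal, with invented element shapes (the product does not require this schema), shows how a rule can reach a group property starting from nothing but a record:

```python
# Invented shapes for illustration: subject and group context as simple maps.
subjects = {
    "user-17": {
        "properties": {"state": "MA"},
        "relationships": {"spouse": "user-42"},
        "groups": ["acme-partners"],
    },
}
groups = {
    "acme-partners": {"properties": {"type": "third-party"}},
}
record_context = {
    "rec-9": {"subject": "user-17", "categories": ["contact information"]},
}

def group_type_for_record(record_id: str) -> str:
    """Find the user behind a record, then resolve a property of a group
    that user belongs to, as a policy query might."""
    user_id = record_context[record_id]["subject"]
    group_id = subjects[user_id]["groups"][0]
    return groups[group_id]["properties"]["type"]

print(group_type_for_record("rec-9"))  # -> "third-party"
```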
The Tranquil Data™ Context Engine
At the heart of creating and curating this graph is the Tranquil Data™ Context Engine. This is a piece of software, provided as a native binary or a Docker container image, that you deploy and run in your environment. It runs in any cloud or virtualized environment, and in its simplest deployment requires only a durable volume. The design of this engine has a few components that will be familiar to anyone who’s worked on databases, operating systems, or similar infrastructure.
There are two ways to interact with the context graph that our software forms. One is via APIs. You can create explicit structure like a subject or policy, and read those values directly. There is also a decision endpoint that lets a caller specify a domain and provide some input, to get details about whether a given purpose is valid in context, and why that decision was made. These interfaces are useful for automating data flows, asking “what if” style questions, testing collections of data before using them, etc.
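For example, a call to the decision endpoint might look like the sketch below; the path, field names, and response shape are assumptions for illustration rather than the documented interface.

```python
import requests

# Hypothetical decision request: "may this purpose be applied to this
# record, in this domain, and why?"
decision = requests.post(
    "https://tranquil-engine.example.com:8443/v1/decision",  # hypothetical
    headers={"Authorization": "Bearer <token>"},
    json={
        "domain": "patient-data",
        "purpose": "marketing",
        "input": {"record": "rec-9", "requestor": "analytics-service"},
    },
).json()

# The value of the endpoint is the "why": the applicable reason, not
# just a permit/deny bit.
print(decision["decision"], decision["reason"])
```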
The second way is that our engine speaks a number of database wire protocols and can act as an intermediary where authenticated connections are terminated. This lets a client that expects to talk with (e.g.) Postgres, MongoDB, or S3 run unmodified, while your policies and context definitions ensure that every CRUD operation on any record is valid, and it automates the formation of Record Context (in a model consistent with the semantics of any given database) as data flows through the engine. The software can be configured to act as a validating component, an enforcement point, or a row- or field-level redaction service. Deployment in the data path supports API hardening when data is leaving an environment, CI testing to validate application behavior, or data virtualization where you want to export access based on context.
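Because the engine terminates the wire protocol itself, pointing an existing client at it is typically just a connection-string change. Assuming the engine exports a Postgres endpoint at a hypothetical host and port, an unmodified psycopg2 client might look like this:

```python
import psycopg2

# The only change from talking to Postgres directly is the host and port:
# the engine terminates the authenticated connection and checks every
# operation against your policies before data flows in either direction.
conn = psycopg2.connect(
    host="tranquil-engine.example.com",  # the engine, not Postgres itself
    port=5433,                           # hypothetical exported endpoint
    dbname="customers",
    user="app_user",
    password="app_password",
)
with conn.cursor() as cur:
    cur.execute("SELECT email FROM customers WHERE id = %s", (17,))
    row = cur.fetchone()  # returned, redacted, or denied based on context
    print(row)
conn.close()
```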
In addition to these two paths, there are REST APIs used to configure and monitor the software, and they may be configured with different types of security access. For instance, access to a MySQL instance might use a specific LDAP service, while a suite of administrative APIs could require OAuth tokens with specific values.
As context forms, the engine acts like any other database, isolating changes locally until a session is committed. A cache provides low-latency access to frequently or recently used context, and a Write-Ahead Log absorbs the latency of writes. The Change Data Capture stream provides a complete, historical view of the context graph for audit, data management, discovery, and similar tasks often associated with knowledge graphs. The runtime context store discards much of this knowledge, keeping only the “computed” most-recent version of any given context element. This bounds the size of the operational store and turns the graph into an efficient key-value (KV) model for cache and storage. The runtime model may be kept in common operational systems like Postgres or Dynamo, or stored on disk as an internal component of the engine.
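One way to picture the runtime store’s reduction: fold the full version history down to the most recent version of each element, so the operational footprint is bounded by the number of live elements rather than the length of the history. A minimal sketch, with invented element shapes:

```python
# Illustrative only: reduce a versioned context stream to a KV model that
# keeps just the computed most-recent version of each element.
history = [
    {"key": "subject/user-17", "logical_time": 3, "value": {"state": "MA"}},
    {"key": "subject/user-17", "logical_time": 7, "value": {"state": "NY"}},
    {"key": "policy/consent", "logical_time": 5, "value": {"version": "1.1"}},
]

runtime_store: dict[str, dict] = {}
for change in sorted(history, key=lambda c: c["logical_time"]):
    runtime_store[change["key"]] = change["value"]  # later versions win

# The full history remains on the CDC stream for audit; the runtime store
# holds only what real-time decisions need.
print(runtime_store["subject/user-17"])  # -> {'state': 'NY'}
```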
The Tranquil Data™ Enterprise Edition provides access to all of these capabilities. It also provides the ability to deploy multiple instances of the engine and connect them into a logical service. In the initial Dragons release, each peer maintains its own local copy of replicated entity context, but shares a common store for record and subject context. The peers coordinate to accept entity changes or export datastore access.
A service deployment may be needed for traditional operational requirements like availability and redundancy, increased throughput, etc. It may also be used to deploy instances into physically separate locations to support data-residency requirements, or to tag instances with different properties that will be inherited by record context, so that (e.g.) it’s possible to write policies that apply or allow use only when records are known to have originated through specific sources or streams. Individual peers may also have their own policies that are combined with domain policies, so that (e.g.) exporting access to the same Postgres instance through two different peers results in different rules being applied.
How to Get Started
Tranquil Data offers a wide set of building blocks to solve a diverse set of challenges. This article was intended to give you an overview of the pieces without diving deep into any specific detail. It explained that the goal of this system is to connect the contextual knowledge used to select and evaluate rules within a framework that defines correct use. What “correct” means will depend on the application, the vertical, and the business requirements.
That said, there are a few common uses and deployment models where our users tend to start. Tomorrow we’ll talk about a second product edition that focuses on these, hiding most of the details of context and the engine, and focusing on streamlined adoption. Tune in to learn more about how to get started with Tranquil Data Dragons.