Datacoral: Innovations to Unleash the Power of Data

Raghu Murthy
Mar 27

In my initial post, I introduced Datacoral and our focus on helping enterprises everywhere get the maximum value out of their data.

This week I am kicking off a series of posts that dive into Datacoral’s core technology and offer practical “how to” examples demonstrating what life as a data engineer or scientist looks like for Datacoral customers. But first, let’s get oriented on the problem, one that has been talked about a lot!

The Challenge — The Data Hairball

A recent Wall Street Journal article highlights how unlocking the value of data is at the core of every company that wants to become a technology company. We identify two key reasons why it is hard for companies to truly leverage the data they have.

  1. There is a mind-boggling number of technologies and services to choose from for collecting, analyzing, and managing data — ingest tools and services, ETL and job orchestration systems, data warehouses and big data query engines, to name a few. Significant expertise is required to make those choices and actually assemble a system that works for end-to-end data flows — from data in different sources to insights that are acted upon. Such expertise is very much in short supply.
  2. Data flows are coded up as data pipelines, which require a deep understanding of the underlying technologies. Data pipelines implement the business logic of data in scripts that are filled with boilerplate code to handle the integration points between the different systems, plus orchestration logic to make sure that data is processed in the right order. These data pipelines become brittle and hard to maintain over time as the business logic of the data changes, again requiring expertise that is hard to come by.
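To make the second point concrete, here is a sketch of a typical hand-coded ingest task. Every name in it (`fetch_page`, `load_rows`) is a hypothetical stand-in for a source API and a warehouse loader; the point is how much of the code is glue rather than business logic.

```python
import time

def run_ingest(fetch_page, load_rows, max_retries=3):
    """Pull pages from a source until exhausted, loading each into the warehouse.

    fetch_page(cursor) -> (rows, next_cursor); load_rows(rows) persists a batch.
    Both are hypothetical stand-ins for real source/warehouse clients.
    """
    total, cursor = 0, None
    while True:
        # Boilerplate: retry loop with backoff around the source API call.
        for attempt in range(1, max_retries + 1):
            try:
                rows, cursor = fetch_page(cursor)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff between retries
        if not rows:
            return total
        load_rows(rows)  # the one line of actual business logic
        total += len(rows)
```

Multiply this glue by every source, every transformation, and every destination, and the pipeline's business logic is buried under integration code.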

At Datacoral, we have worked to overcome the complexities of securely piecing together the systems needed for end-to-end data flows and have dramatically simplified how data flows are specified.

Data Programming — Moving Data Teams Up the Stack

We call it Data Programming, rather than Data Engineering.

Data engineering today really boils down to integrating a variety of systems and scripting the data pipelines that span them. Data engineers have to understand the architecture of the underlying data infrastructure, the jobs in data pipelines used for automation, and the dependencies that govern how orchestration happens at scale.

In the world of programming, this is akin to working with an assembly language, where there is a strong correspondence between the language’s syntax and structure and the architecture of the target microprocessor. Most programmers instead use higher-level languages and platforms (Node.js, Java, .NET, Python) that allow them to write code that describes the business logic of an application and is portable across hardware architectures.

So, why shouldn’t there be a higher-level language to create programs that focus on the business logic of the data without having to know about the architecture of the underlying data pipeline and data infrastructure?

At Datacoral, we are introducing a SQL-like high-level language — the Data Programming Language (DPL) — which allows data professionals to author data programs that manage end-to-end data flows without having to understand the underlying systems.

So, for example, instead of thinking about building an ingest pipeline from Salesforce into a data warehouse with its myriad jobs and tasks, one would just write a single statement for the “collect” data function, i.e., something like:

UPDATE SCHEMA salesforce
FROM sfdc-connection-params

Once such a statement is executed, data from the different Salesforce objects starts automatically flowing into corresponding tables in the salesforce schema of the data warehouse!

We have added SQL-like syntax to specify the different data functions that are typically performed in end-to-end data flows. The signature, or type, of a data function is essentially the schema of the data it returns. So, data programs can be statically type checked: for example, changing the schema of a data function without also changing all the transformations that use its output results in a compile-time error. Having such a capability significantly simplifies how end-to-end data flows are built and maintained over time.
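The schema-as-type idea can be sketched outside of DPL. This toy checker (all names are hypothetical, and it is not Datacoral’s actual compiler) flags a mismatch between a data function’s declared output schema and the columns a downstream transformation reads, before any data moves:

```python
# Toy illustration of schema-as-type checking: each data function
# declares an output schema; each transformation declares the upstream
# it reads from and the columns it expects to find there.

def check_dataflow(functions, transformations):
    """Return a list of type errors found before any data is processed."""
    errors = []
    for t in transformations:
        upstream = functions.get(t["reads_from"])
        if upstream is None:
            errors.append(f"{t['name']}: unknown upstream {t['reads_from']}")
            continue
        missing = set(t["columns"]) - set(upstream["schema"])
        if missing:
            errors.append(f"{t['name']}: missing columns {sorted(missing)}")
    return errors

functions = {"salesforce.accounts": {"schema": ["id", "name", "region"]}}
transformations = [
    {"name": "accounts_by_region", "reads_from": "salesforce.accounts",
     "columns": ["id", "region"]},
]
print(check_dataflow(functions, transformations))  # → [] (no errors)
```

Dropping "region" from the upstream schema without touching the transformation would make the same check report a missing-column error, which is the static guarantee described above.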

Data programs get compiled into data pipelines that then get executed on Datacoral’s data programming runtime platform. The platform consists of:

  1. a scalable way to manage state through a shared metadata layer and
  2. a data-event driven pipeline orchestration layer

The runtime captures the necessary state to provide users visibility into both data freshness and data quality. The platform itself has been built in a fully serverless manner. More on this in later posts.
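A minimal sketch of the general data-event-driven orchestration technique (an illustration of the idea, not Datacoral’s runtime; all names are hypothetical) looks like this: each step declares the datasets it depends on and runs as soon as all of them have arrived, with its own output becoming a data event in turn.

```python
class Orchestrator:
    """Run pipeline steps in response to data events rather than a fixed schedule."""

    def __init__(self):
        self.waiting = []    # (step name, needed datasets, action) tuples
        self.arrived = set() # datasets whose data events have fired
        self.ran = []        # execution order, for visibility

    def register(self, name, depends_on, action):
        self.waiting.append((name, frozenset(depends_on), action))

    def publish(self, dataset):
        """A data event: `dataset` is now fresh; run any unblocked steps."""
        self.arrived.add(dataset)
        for item in list(self.waiting):
            if item not in self.waiting:
                continue  # already triggered by a recursive event
            name, deps, action = item
            if deps <= self.arrived:
                self.waiting.remove(item)
                action()
                self.ran.append(name)
                self.publish(name)  # a step's output is itself a data event
```

Because downstream steps fire off data events instead of clock schedules, a step never runs against stale or partial inputs, which is what makes the freshness and quality visibility above possible.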

Datacoral Slices — Abstracting the Complexity of System Integration

We have built an extensive catalog of slices of different types. Collect slices make raw data available consistently: they provide modular endpoints for instrumentation, capture changes in production databases, and retrieve data from any API. Organize slices use the notion of materialized views to support consistently transforming data in any query engine. Harness slices can publish data to third-party apps for company-wide use, or to production databases for direct access by applications.
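The materialized-view idea behind Organize slices can be illustrated generically (a sketch using sqlite, not Datacoral’s implementation; the table and view names are made up): the transformation is declared once as a query, and its result is persisted as a table that is rebuilt as upstream data changes.

```python
import sqlite3

def materialize(conn, name, query):
    """(Re)build table `name` from `query`, replacing any previous result."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {query}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("west", 10), ("west", 5), ("east", 7)])

# Declare the transformation once; rerun materialize() whenever new data lands.
materialize(conn, "revenue_by_region",
            "SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region")
print(conn.execute("SELECT * FROM revenue_by_region ORDER BY region").fetchall())
# → [('east', 7), ('west', 15)]
```

Downstream consumers query the materialized table directly, and re-materializing on upstream data events keeps it consistent without hand-written refresh logic.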

Serverless all the way — Scalable and Secure Architecture that Works where Your Data is

The deployment and consumption of Datacoral itself is similar to that of other AWS services. Our software gets deployed inside our customers’ VPC, which means that their data never leaves their environment and is also encrypted using customer-managed keys. The result is an unprecedented level of security for an end-to-end data infrastructure stack in the cloud.

Solving Real Problems for Enterprises Today

Check out the Data Engineering Podcast I did on Serverless Data Pipelines using Datacoral to learn more.

Simply sign up for a demo or email us at hello@datacoral.co to learn how.

We are also looking for strong engineers to join the team!

Next up — the data programming interface.

Datacoral

A place for our points of view and news

Written by Raghu Murthy, Founder and CEO, Datacoral
