Introducing Prophecy.io - Cloud Native Data Engineering

Published in

Prophecy.io

5 min read · Aug 15, 2019

Prophecy.io is a high-performance, zero compromise cloud native data engineering product powered by Spark & Kubernetes for Enterprise Data Engineering teams. Prophecy also provides a highly automated replacement for legacy ETL products, to accelerate the journey to open source and cloud computing.

Starting in systems engineering and excited by the potential of GPUs, I moved from the Microsoft Visual Studio team to become an early engineer on the CUDA team at NVIDIA, where I played a key role in the compiler optimizations that deliver high performance on GPUs. It’s a delight to see how far CUDA has come in powering deep learning and bitcoin mining. Passionate about building a startup, I moved on to learn Product Marketing & Product Management, leading both functions for ClustrixDB through a hard pivot to a limited, repeatable product-market fit.

At Hortonworks, I product managed Apache Hive through the IPO. It was not fun to be in front of customers and see them struggle with Data Engineering. The Hortonworks team put in a massive effort to make Hive better & simpler — fewer configurations, faster performance, a real cost based optimizer, and a simplified stack in Hive 2.0 and beyond.

Taking what we have learned about the technology and the market, we’ve decided to build a product-centric company with a relentless focus on customer needs and an “it just works!” experience, coupled with a sprinkle of joy!

My co-founder Rohit Bakhshi brings strong product expertise, with the go-to-market experience from Hadoop, GraphQL Apollo and Kafka - scaling multiple Enterprise Data Engineering platform companies to unicorns.

We’re delighted to raise our seed round from SignalFire, with Ilya Kirnos joining the board & Andrew Ng working closely with us. They join our existing investors, Berkeley SkyDeck (Fall 2018 cohort) and friends from the Enterprise software industry who invested Angel funds! SignalFire is unique - besides knowing the technology & market well, they get their hands dirty, running Spark in-house for Beacon, their Machine Learning based product for locating talent!

There are two things I want to talk about today:

Enterprise Data Engineering is a Journey
Enterprise Data Engineering needs new Interfaces

Enterprise Data Engineering is a Journey

Solving the real problem!

Large Enterprises have tens of thousands of ETL workflows in production on premises, in a legacy ETL format, and they’re paying through the nose. There are compelling reasons to move to Open Source and Apache Spark - freedom, agility, talent and cost - that are well understood by the leadership of these Enterprises.

Our strong view is that putting a product on the public cloud and asking the Enterprise to figure out the transition sucks! Are tens of thousands of workflows to be rewritten? Is every environment to have a separate scheduler and a separate ops team? How are graphs of dependent workflows to be migrated?

We think “Legacy ETL workflows” or “dataflows” are too low a level of abstraction to focus on. We’re focused on building products for the complete “Data Engineering” journey!

Let’s look at some specific solutions:

1. Legacy ETL to Spark: Transpilers!

Apache Spark is ubiquitous in public and private clouds with managed services in public clouds.

Transpilers (drawn as dragons, after the Dragon Book) are our compiler-based products that convert Legacy ETL assets into Prophecy, with restructuring support. This includes workflows, configurations and datasets, which we transpile to an open source technology stack. This is the first step to freedom!
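To make the idea concrete, here is a minimal sketch of what a transpiler does in principle: it reads a structured description of a workflow and emits equivalent Spark code. Everything in it is an assumption for illustration: the Step record, the four step kinds and the file paths are hypothetical and do not reflect any real legacy ETL format or Prophecy’s actual implementation. The generated text uses standard PySpark calls (spark.read.parquet, filter, select, write.mode).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One step of a hypothetical, linear legacy workflow (illustrative only)."""
    kind: str  # "read", "filter", "select" or "write"
    arg: str   # a path, a filter expression, or a comma-separated column list


def transpile(steps: List[Step]) -> str:
    """Emit PySpark source text for a linear list of steps."""
    lines = []
    for step in steps:
        if step.kind == "read":
            lines.append(f"df = spark.read.parquet({step.arg!r})")
        elif step.kind == "filter":
            lines.append(f"df = df.filter({step.arg!r})")
        elif step.kind == "select":
            cols = ", ".join(repr(c.strip()) for c in step.arg.split(","))
            lines.append(f"df = df.select({cols})")
        elif step.kind == "write":
            lines.append(f"df.write.mode('overwrite').parquet({step.arg!r})")
        else:
            # A real transpiler would need a mapping for every legacy construct.
            raise ValueError(f"unsupported step kind: {step.kind}")
    return "\n".join(lines)


# Hypothetical workflow: keep large orders and write them back out.
workflow = [
    Step("read", "/data/orders"),
    Step("filter", "amount > 100"),
    Step("select", "order_id, amount"),
    Step("write", "/data/large_orders"),
]
generated = transpile(workflow)
print(generated)
```

A real workflow is a graph rather than a straight line, and it carries configurations and dataset definitions alongside the transformations, so this sketch only shows the general shape of the translation.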

2. Multi-Cloud: Unified Control Plane & Distributed Data Plane

As Enterprises move to public clouds and multiple data centers, hybrid cloud is the new normal:

Enterprise IT infrastructure will always span private clouds, multiple public cloud providers and multiple regions.
Data sovereignty and data localization laws mean slices of data will be spread and processed across many geographies.

In these cases, keeping a single control plane while distributing the data plane provides significant simplicity and cost savings. The motivation for distributing the data plane might be regulation, performance or reliability. This requires the right abstractions in the Data Engineering product for development and production.
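One way to picture the split is a single scheduler that knows about every region but sends each job to the cluster where its data lives. The sketch below is a toy model under that assumption only. The ControlPlane and DataPlane classes, the region names and the job names are all hypothetical, and a real product would also handle authentication, retries and monitoring.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataPlane:
    """A hypothetical regional cluster that runs jobs next to the data."""
    name: str
    region: str
    submitted: List[str] = field(default_factory=list)

    def submit(self, job_name: str) -> None:
        # Stand-in for submitting a Spark job to this region's cluster.
        self.submitted.append(job_name)


@dataclass
class ControlPlane:
    """A single place to schedule jobs across all registered data planes."""
    planes: Dict[str, DataPlane] = field(default_factory=dict)

    def register(self, plane: DataPlane) -> None:
        self.planes[plane.region] = plane

    def schedule(self, job_name: str, data_region: str) -> str:
        # Data stays in its region; only the job definition travels.
        plane = self.planes.get(data_region)
        if plane is None:
            raise LookupError(f"no data plane registered for {data_region}")
        plane.submit(job_name)
        return plane.name


control = ControlPlane()
eu = DataPlane(name="dp-frankfurt", region="eu-central")
us = DataPlane(name="dp-oregon", region="us-west")
control.register(eu)
control.register(us)

ran_on_eu = control.schedule("daily_orders_eu", "eu-central")
ran_on_us = control.schedule("daily_orders_us", "us-west")
print(ran_on_eu, ran_on_us)
```

The point of the design is that developers and operators interact with one scheduler and one view of all workflows, while each dataset is processed only inside the region it is allowed to live in.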

Enterprise Data Engineering needs new Interfaces

As we talk to Enterprise teams, we’re finding that with different roles come different preferences:

Some developers prefer a visual drag-and-drop interface
Some developers prefer a code development interface
Architects prefer standardized components
Test and support engineers prefer visual interfaces

Clearly, current interfaces are not meeting these needs, and as we look to design the right interface, let’s review the strengths of each:

We believe there is a much better way of doing this! With compiler magic we have a Unified Visual & Code Interface that provides Interactive Execution.

We’re super excited to support the journey of moving Enterprise Data Engineering to Open Source Runtimes and Cloud Native Infrastructure, while innovating on interfaces - providing a delightful user experience!

There are a few other unique features, such as user-defined components, that I’ll talk about in subsequent posts. Stay tuned!

PS. We’re working on hard problems and looking for top engineers to help us get there! If you’re interested, reach out to me at raj.bains@prophecy.io

Written by Raj Bains