Introducing Fybrik

Ronen Kat
Published in fybrik
May 24, 2021

co-authored with Roee Shlomo and Sima Nadler

We are announcing Fybrik (previously called Mesh for Data), an open platform that handles common data usage functions such as access, mobility, governance, and control, so that applications don’t need to implement these functions themselves and manual data usage processes can be automated. Fybrik provides an extensible Kubernetes-based framework to control and optimize the data flows between applications and data sources.

What problem does it solve?

Organizations seek to unlock value from their data, but taking advantage of that data is hard. Data use is governed by regulatory requirements and business rules, and meeting them becomes more challenging when data is stored both on premises and in cloud-based platforms.

Application developers who write and run code that accesses data are required to tackle a labyrinth of protocols, APIs, location constraints, performance concerns, security requirements, audit mechanisms, lineage updates, and more.

Fybrik simplifies how a data user (e.g., a data scientist, analyst, developer) works with data, based on common patterns that we have observed. Fybrik automates and handles the interactions between applications, data users, and data sources, which typically involve interacting with IT operators, data governance officers, and data stewards, among others.

How does it work?

We will use an example to demonstrate the concept of the platform.

In the example, Alice is part of a team that develops a fraud detection microservice. The microservice needs to consume some of the organization’s financial datasets. However, governance policies dictate that sensitive parts of financial datasets must be anonymized before use.

The financial data is already registered in a data catalog, and the governance officer has defined data policies in a policy manager. Fybrik integrates with external data catalogs, policy managers, and credential stores through pluggable connectors. For the sake of the example, the connectors to the organization’s favorite tools already exist.

Alice looks in the data catalog and picks the datasets that the microservice needs. Now all she needs to do is tell Fybrik about it! In our example, the organization’s rules dictate that the intent for using the data must also be declared, so Alice submits the data usage properties of the microservice, which include “fraud detection” as the intent and a list of the datasets, to the Fybrik controller. Technically, in Fybrik this is done by applying a custom Kubernetes resource of kind `FybrikApplication`.
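To make Alice's step more concrete, here is a minimal sketch of what such a resource could look like, built as a Python dictionary and printed as JSON. The API version `app.fybrik.io/v1alpha1`, the field names (`selector`, `appInfo`, `data`, `dataSetID`), and the dataset identifiers are assumptions for illustration only; check the Fybrik documentation for the schema of the release you are using.

```python
import json


def build_fybrik_application(name, intent, dataset_ids):
    """Return a FybrikApplication-style manifest as a plain dict.

    Field names below are assumed for illustration, not taken from the post.
    """
    return {
        "apiVersion": "app.fybrik.io/v1alpha1",  # assumed API group/version
        "kind": "FybrikApplication",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Assumed: label selector identifying the workload that consumes the data.
            "selector": {"workloadSelector": {"matchLabels": {"app": name}}},
            # The declared intent for using the data.
            "appInfo": {"intent": intent},
            # One entry per dataset picked from the data catalog.
            "data": [{"dataSetID": ds} for ds in dataset_ids],
        },
    }


# Hypothetical catalog identifiers for the fraud detection example.
manifest = build_fybrik_application(
    name="fraud-detection",
    intent="fraud detection",
    dataset_ids=["finance/transactions", "finance/accounts"],
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and applied to a cluster where the Fybrik custom resource definitions are installed.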

Now the magic comes in. The Fybrik controller interacts with the data catalog, policy manager and other infrastructure components to gather additional requirements such as data governance requirements or infrastructure constraints. It processes all these requirements to create an optimized data flow that connects the microservice to the data, even across multiple clusters. If the requirements change, the data flow is automatically adjusted.

In our example, one of the governance requirements dictates that the data must be anonymized. To compose such a data flow, a component that performs data anonymization must be available so that it can be deployed as part of the flow.

In Fybrik, these components, which run in the data plane, are called modules, and they are another kind of plugin. Anyone can develop modules that implement new functionality and make them available to Fybrik so they can be used in a data flow when required. The functionality may be related to data transformations, data mobility, data access APIs, observability, caching, or anything else that one can imagine.
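The idea of composing a data flow from modules can be illustrated with a toy planner. This is a conceptual sketch only, not Fybrik's actual algorithm: the `Module` class, the capability names, and the module names in the registry are all hypothetical. It simply walks a registry and picks modules until every required capability is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A hypothetical data-plane module and the capabilities it offers."""
    name: str
    capabilities: frozenset


def plan_data_flow(required, modules):
    """Pick modules, in registry order, until all required capabilities are covered."""
    remaining = set(required)
    chosen = []
    for module in modules:
        covered = remaining & module.capabilities
        if covered:
            chosen.append(module.name)
            remaining -= covered
    if remaining:
        # No registered module can satisfy these requirements.
        raise LookupError(f"no module provides: {sorted(remaining)}")
    return chosen


# Hypothetical registry of available modules.
registry = [
    Module("arrow-flight-reader", frozenset({"read"})),
    Module("anonymizer", frozenset({"anonymize"})),
    Module("copy-job", frozenset({"copy"})),
]

# The governance requirement adds "anonymize" on top of plain read access.
flow = plan_data_flow({"read", "anonymize"}, registry)
print(flow)  # ['arrow-flight-reader', 'anonymizer']
```

In this sketch, a requirement that no module can satisfy raises an error instead of silently producing an incomplete flow, which mirrors the point above that an anonymization component must exist before it can be deployed.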

Try it out!

We recently released version 0.1 of Fybrik as an initial demonstration of the platform’s vision. Give it a try! Future releases will document other features that are already supported, such as multi-cluster environments, and introduce exciting new features.

We also published a white paper describing our vision for Fybrik. It describes the motivation, main ideas and the principles upon which Fybrik is built.

We welcome contributors, feedback, and comments. The project is young, and we continue to add building blocks to Fybrik as we work towards this vision. We are soliciting help in defining and realizing it. Please feel free to reach out on GitHub Discussions or via the contact list in the white paper to share your thoughts, use cases, suggestions for improvements, and code contributions.

Stay tuned for future blog posts that will highlight how a single source of governance truth can be enforced across different cloud instances, cloud vendors, and on-premises environments.

Updates:
August 1, 2020 — Content and links updated to reflect new project name fybrik.
