Fybrik Open Architecture and Ecosystem

Ronen Kat
Published in fybrik
5 min read · Dec 21, 2021

Fybrik is designed to help organizations extract value from their data while upholding the constraints, rules, and regulations that must be followed when using the organization’s data. Navigating these constraints can be difficult, especially since the data is often dispersed across multiple systems, locations, and technologies. To address this challenge, we designed Fybrik to be an open platform, allowing it to be easily used with any tools and technologies for data management and integration. In this blog we describe the Fybrik architecture and its open approach for integrating with data catalogs, data governance policy managers, credential managers, and data access and enforcement tools, whether commercial or open source.

Fybrik is an open platform for orchestrating non-functional aspects of data, so you are free to focus on your core business goals. For example, the platform can be used to enforce data governance, handle data credentials, and optimize performance.

A key point in the system’s design is its open architecture and open ecosystem approach. Fybrik is designed to enable integration with existing data tools and solutions. As shown in Figure 1, the open integration approach covers the following components:

  1. Data catalog — contains metadata details describing the data, schema, and tags
  2. Data governance policy manager — contains enterprise policies that dictate how data can/should be used
  3. Credentials store — the enterprise repository for data access credentials
  4. Data plane components — software and tools that perform some functionality on the data before, during, or after the data is accessed by an application

Figure 1. Fybrik open architecture.

Let’s look at how Fybrik orchestrates the compliant use of data by integrating with the above components. When running an application, Fybrik deploys components called data plane modules, which form a layer between the application and the data. These components handle non-functional actions such as access control, enforcing data governance (e.g., redact, mask, encrypt), auditing and tracking the use of data, and more. Any actions taken are derived from policies that were defined by the enterprise and are enforced by the data plane components that Fybrik deploys.

Using the following components (shown in Figure 1), Fybrik translates the policies into concrete actions. It gathers information on the data used by the workload, obtains the relevant policy decisions, selects appropriate data plane modules from a module library, deploys the components, and provides access to credentials.
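
The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Module` class, the `plan_data_plane` function, and the toy catalog and policy manager are hypothetical names invented for this example, and the simple greedy selection is not how Fybrik actually chooses modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A data plane module and the actions it can enforce (hypothetical)."""
    name: str
    supported_actions: frozenset


def plan_data_plane(asset_id, catalog, policy_manager, module_library):
    """Return the names of the modules to deploy for one data asset.

    catalog and policy_manager are plain callables standing in for the
    external systems; they are not Fybrik APIs.
    """
    metadata = catalog(asset_id)              # gather information on the data
    required = set(policy_manager(metadata))  # obtain the policy decisions
    chosen = []
    for module in module_library:             # greedy pass over the library
        covered = required & module.supported_actions
        if covered:
            chosen.append(module.name)
            required -= covered
    if required:
        raise LookupError(f"no module can enforce: {sorted(required)}")
    return chosen


# Toy stand-ins for the external catalog and policy manager.
def toy_catalog(asset_id):
    return {"id": asset_id, "tags": ["PII"]}


def toy_policy_manager(metadata):
    return ["mask"] if "PII" in metadata["tags"] else []


library = [
    Module("audit-module", frozenset({"audit"})),
    Module("transform-module", frozenset({"mask", "redact"})),
]

plan = plan_data_plane("customers", toy_catalog, toy_policy_manager, library)
print(plan)
```

The point of the sketch is the separation of concerns: the orchestrator only asks what the data is, what must be done to it, and which module can do it.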

Data catalog: Fybrik connects to an external data catalog to obtain the properties of the data that is used by the workload. The properties of the data include metadata, tags, and connection information for accessing the data. Fybrik can integrate with any data catalog by building a connector that implements the Fybrik data catalog API. For example, you can add support for data catalogs such as Egeria and Amundsen by developing a connector using the data catalog API.
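
To make the connector idea more concrete, here is a minimal sketch of what a catalog connector contract could look like. The class names, the `get_asset_info` method, and the shape of the returned dictionary are assumptions made for illustration; the actual Fybrik data catalog API is defined by the project and will differ.

```python
from abc import ABC, abstractmethod


class CatalogConnector(ABC):
    """Hypothetical connector contract, not the real Fybrik data catalog API."""

    @abstractmethod
    def get_asset_info(self, asset_id: str) -> dict:
        """Return metadata, tags, and connection information for one asset."""


class InMemoryCatalogConnector(CatalogConnector):
    """Toy connector backed by a dict instead of a real data catalog."""

    def __init__(self, assets: dict):
        self._assets = assets

    def get_asset_info(self, asset_id: str) -> dict:
        try:
            record = self._assets[asset_id]
        except KeyError:
            raise LookupError(f"asset not found: {asset_id}") from None
        # Normalize the record into the three property groups named in the text.
        return {
            "metadata": dict(record.get("metadata", {})),
            "tags": list(record.get("tags", [])),
            "connection": dict(record.get("connection", {})),
        }


connector = InMemoryCatalogConnector({
    "customers": {
        "metadata": {"owner": "sales"},
        "tags": ["PII"],
        "connection": {"type": "s3", "format": "parquet"},
    }
})
info = connector.get_asset_info("customers")
print(info["tags"], info["connection"])
```

A connector for a real catalog would keep the same outward contract and translate each call into that catalog's own API.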

Data governance policy manager: Fybrik connects to an external data governance policy manager to obtain the required enterprise policy decisions. The policy decisions specify the actions that Fybrik must perform in order for the use of the data to be compliant. Fybrik then selects the data plane components that can enforce and apply the actions listed in the policy decisions. For example, an organization’s policy may require that any customer name, age, and address data be masked in some cases. Fybrik can integrate with any policy manager through a connector that implements the Fybrik data policy connector API. For example, the data policy connector API is used to communicate with policy managers such as Open Policy Agent (OPA).
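
The masking example can be illustrated with a small sketch that separates the decision (what must be done) from the enforcement (doing it). Everything here is hypothetical: the `decide` and `enforce` functions, the decision format, and the exempt purpose `fraud-detection` are assumptions for illustration and do not reflect the data policy connector API or any OPA policy.

```python
# Columns named in the example policy from the text.
SENSITIVE_COLUMNS = {"name", "age", "address"}


def decide(columns, purpose):
    """Return the list of required actions; an empty list means none."""
    if purpose == "fraud-detection":  # hypothetical purpose exempt from masking
        return []
    to_mask = sorted(SENSITIVE_COLUMNS & set(columns))
    return [{"action": "mask", "columns": to_mask}] if to_mask else []


def enforce(rows, decisions):
    """Apply mask actions to a list of row dicts, as a data plane module might."""
    masked = {c for d in decisions if d["action"] == "mask" for c in d["columns"]}
    return [
        {key: ("XXXX" if key in masked else value) for key, value in row.items()}
        for row in rows
    ]


rows = [{"name": "Ada", "age": 36, "balance": 120.5}]
decisions = decide(rows[0].keys(), purpose="marketing")
protected = enforce(rows, decisions)
print(decisions)
print(protected)
```

Keeping the two steps apart mirrors the split in the text: the policy manager returns decisions, and a separately deployed component applies them.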

Credential store: Fybrik provides a mechanism for the data plane components, which act as the layer between the application and the data and perform the actual data access, to obtain data access credentials. As a result, users don’t need to supply credentials themselves in order to access data.
The data plane components use the HashiCorp Vault client API to retrieve the secrets needed for accessing data, and developers can add support for additional credential stores by implementing Vault’s de facto standard secret engine backends. In Fybrik, we developed a secret backend for Vault that reads credentials from Kubernetes secrets.
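
The pluggable-backend idea can be sketched as follows. This is a toy model only: `SecretStore`, `KubernetesSecretBackend`, the `k8s` mount prefix, and the path layout are hypothetical and do not use or reproduce the Vault client API or the Fybrik secret backend.

```python
class KubernetesSecretBackend:
    """Toy backend reading from a dict shaped like Kubernetes secrets."""

    def __init__(self, secrets):
        # Keys are (namespace, secret name); values are the secret's data.
        self._secrets = secrets

    def read(self, path):
        namespace, name = path.split("/", 1)
        return dict(self._secrets[(namespace, name)])


class SecretStore:
    """Routes each read to the backend mounted at the path's first segment."""

    def __init__(self):
        self._mounts = {}

    def mount(self, prefix, backend):
        self._mounts[prefix] = backend

    def read(self, path):
        prefix, _, rest = path.partition("/")
        if prefix not in self._mounts:
            raise LookupError(f"no backend mounted at {prefix!r}")
        return self._mounts[prefix].read(rest)


store = SecretStore()
store.mount("k8s", KubernetesSecretBackend({
    ("default", "s3-creds"): {"access_key": "EXAMPLE", "secret_key": "EXAMPLE"},
}))

# A data plane module asks the store by path; the application user never
# handles the credentials.
creds = store.read("k8s/default/s3-creds")
print(sorted(creds))
```

Supporting another credential store in this model means mounting one more backend, while the modules keep using the same read call.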

Data plane components: The data plane components, called modules, are deployed by Fybrik from an open and extensible library of modules. Developers can build similar modules by following the Fybrik guidelines and specifications for building modules. These modules provide functions such as access to data, inspection, audit, and more. Adding a data plane module to Fybrik is simple; it requires creating a YAML specification called FybrikModule, which points to a Helm chart (stored in a repository) that deploys the module according to inputs from Fybrik.
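
As a rough illustration of a module specification that points to a Helm chart, the sketch below builds such a description as a Python dictionary and prints it as JSON. The field names under `spec`, the `make_module_spec` helper, and the chart reference are assumptions made for this example; consult the Fybrik module documentation for the actual FybrikModule schema.

```python
import json


def make_module_spec(name, chart, capabilities):
    """Build a FybrikModule-like description (field names are assumptions)."""
    if not chart:
        raise ValueError("a module must point to a Helm chart")
    return {
        "kind": "FybrikModule",
        "metadata": {"name": name},
        "spec": {
            # Reference to the Helm chart that deploys the module.
            "chart": {"name": chart},
            # What the module can do, so an orchestrator can select it.
            "capabilities": [{"capability": c} for c in capabilities],
        },
    }


spec = make_module_spec(
    name="example-read-module",
    chart="registry.example.com/charts/example-read-module:0.1.0",  # hypothetical
    capabilities=["read"],
)
print(json.dumps(spec, indent=2))
```

The description carries only two essentials: where the deployable chart lives and what the module is able to do.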

The open architecture enables Fybrik to be part of a larger ecosystem. For example, Fybrik as an orchestrator can help drive lineage information by connecting with tools such as OpenLineage, a standard for metadata and lineage collection, or Marquez, a tool that shows how data is used.

One relevant community for Fybrik is the Linux Foundation AI & Data (LF AI), home to many of the projects named above. Engaging with this community is also part of the Fybrik journey toward building ties with additional related projects. In October 2021, we presented Fybrik to the LF AI DataOps committee forum. We invite you to look at the presentation, which offers more details on Fybrik, the open architecture design, and its relation to potential tools. Both the recording from the meeting and the presentation slides are available online.

The Fybrik team is looking forward to expanding the platform’s integration and connection with additional solutions, whether open source or third-party. We welcome your feedback and would love to hear from you through the discussions and issues on the project’s GitHub. Contributions that extend Fybrik support for additional data catalogs, policy managers, credential stores, or data plane modules are very welcome.
