Fybrik 0.6.0 Released

Sima Nadler
Published in fybrik
3 min read · Jan 19, 2022

Together with the New Year comes a new and exciting release of Fybrik!

As we described in our first blog post, Fybrik simplifies how a data user (e.g., a data scientist, analyst, or developer) works with data, based on common patterns that we have observed. Fybrik automates and handles the interactions between applications, data users, and data sources, which typically involve interacting with IT operators, data governance officers, and data stewards, among others. In a subsequent blog post, we shared use cases implemented in collaboration between ING, a Dutch multinational bank and finance corporation, and IBM.

Version 0.6, released last week, adds several exciting features that make Fybrik even more flexible and powerful.

Taxonomy

The taxonomy defines common terms across components. One of the powerful aspects of Fybrik is its ability to build and orchestrate data planes at the infrastructure level that contain components from disparate development groups or organizations, coupled with its ability to support different data catalogs and policy engines. However, doing this without requiring upgrades or redeployment of components whenever new capabilities are added or changes are made is a major challenge. To address this challenge, taxonomies were added to Fybrik. Learn more by reading the taxonomy overview and the guide on how to use the taxonomy.

IT Config Policies

The ING use cases demonstrated Fybrik's ability to build a data plane based on data governance policies. Doing this as an extension to Kubernetes, rather than handling governance in a siloed solution, was a breakthrough approach to handling sensitive data.

In the new release, support for IT config policies was added alongside the existing support for data governance policies. These policies allow the organization to influence which data plane components Fybrik chooses for a given workload and data set, and in which cluster each component will run. This new capability helps your organization make better use of the available infrastructure and better manage the costs associated with it.

Examples of the types of policies that can be implemented:

  • For development workloads, low-cost storage should be used when data is copied.
  • For high-priority production workloads, latency should be less than 0.5 seconds.

For more on IT config policies, see the configuration policy introduction.
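
To make the two example policies above concrete, here is a minimal sketch in plain Python of how such rules could narrow down the deployment options for a workload. This is not Fybrik's policy syntax or API. The `Workload` and `Candidate` types, their fields, and the cluster names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Workload:
    # Hypothetical workload attributes, not Fybrik's actual schema.
    environment: str   # "development" or "production"
    priority: str      # "high" or "normal"
    copies_data: bool  # whether the data plane copies the data


@dataclass(frozen=True)
class Candidate:
    # A hypothetical deployment option the control plane could choose.
    cluster: str
    storage_tier: str        # "low-cost" or "premium"
    latency_seconds: float   # expected latency for this option


# A policy returns True when the candidate is acceptable for the workload.
Policy = Callable[[Workload, Candidate], bool]


def dev_uses_low_cost_storage(w: Workload, c: Candidate) -> bool:
    # Development workloads that copy data must use low-cost storage.
    if w.environment == "development" and w.copies_data:
        return c.storage_tier == "low-cost"
    return True  # the policy does not apply to other workloads


def prod_high_priority_latency(w: Workload, c: Candidate) -> bool:
    # High-priority production workloads need latency below 0.5 seconds.
    if w.environment == "production" and w.priority == "high":
        return c.latency_seconds < 0.5
    return True


POLICIES: List[Policy] = [dev_uses_low_cost_storage, prod_high_priority_latency]


def allowed_candidates(workload: Workload, candidates: List[Candidate]) -> List[Candidate]:
    # Keep only the candidates that satisfy every policy.
    return [c for c in candidates if all(p(workload, c) for p in POLICIES)]


candidates = [
    Candidate(cluster="cluster-a", storage_tier="premium", latency_seconds=0.2),
    Candidate(cluster="cluster-b", storage_tier="low-cost", latency_seconds=0.9),
]
dev = Workload(environment="development", priority="normal", copies_data=True)
prod = Workload(environment="production", priority="high", copies_data=False)

print([c.cluster for c in allowed_candidates(dev, candidates)])
print([c.cluster for c in allowed_candidates(prod, candidates)])
```

In this sketch each policy only restricts the workloads it applies to, so adding a new rule does not require changing the existing ones.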

Standardization of Fybrik Logging

The Fybrik control plane as well as the components in the data planes it orchestrates generate log files. Because of the many disparate components involved in the solution and the fact that multi-cluster data planes are supported, there are many different log files generated for a given workload.

Fybrik, like Kubernetes and other infrastructure solutions, does not provide an out-of-the-box solution for aggregating log files, parsing them, and delivering the relevant information to the relevant actors.

However, as part of the new release, Fybrik provides guidelines and libraries to standardize the structure and content of the generated logs. In addition, the Fybrik control plane now generates logs in the new JSON format and passes a globally unique identifier associated with the FybrikApplication instance to all components. This is an important step toward using standard log aggregation solutions with Fybrik.
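
As a rough illustration of why a shared identifier in JSON logs helps aggregation, the sketch below uses only Python's standard `logging` and `json` modules. It does not use Fybrik's logging libraries, and the field names (`app_uuid`, `component`, and so on) and component names are assumptions made for this example, not the fields Fybrik actually emits.

```python
import io
import json
import logging
import uuid
from collections import defaultdict
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
            # Hypothetical field name for the per-application identifier.
            "app_uuid": getattr(record, "app_uuid", None),
        }
        return json.dumps(entry)


def make_logger(component: str, stream: io.StringIO) -> logging.Logger:
    # One logger per component, all writing JSON lines to the given stream.
    logger = logging.getLogger(component)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def group_by_app(lines):
    # Collect entries from any component under the identifier they carry.
    grouped = defaultdict(list)
    for line in lines:
        entry = json.loads(line)
        grouped[entry["app_uuid"]].append(entry)
    return dict(grouped)


stream = io.StringIO()  # stands in for log files gathered from several clusters
control = make_logger("control-plane", stream)
module = make_logger("data-plane-module", stream)

app_a = str(uuid.uuid4())  # identifier for one application instance
app_b = str(uuid.uuid4())  # identifier for another

control.info("data plane requested", extra={"app_uuid": app_a})
module.info("read module started", extra={"app_uuid": app_a})
module.info("read module started", extra={"app_uuid": app_b})

grouped = group_by_app(stream.getvalue().splitlines())
print([e["component"] for e in grouped[app_a]])
```

Because every line is valid JSON and carries the same identifier, entries written by different components can be grouped per application without custom parsing for each component.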

Sima Nadler, editor for fybrik. IBM Research. Expert in privacy & hybrid cloud data protection. Opinions expressed are my own.