Teradata Opens Its Data Lake Management Strategy with Kylo: Literally

Published in D of Things, Apr 17, 2018

Still reaping the benefits of its acquisition of the consultancy Think Big Analytics, Teradata, a powerhouse in the data management market, has taken a further step to expand its data management stack and to make an interesting contribution to the open source community.

In March 2017 the company launched Kylo, a full data lake management solution developed entirely by the team at Think Big Analytics, with an interesting twist: it was released as a contribution to the open source community.

Offered as an open source project under the Apache 2.0 license, Kylo is, according to Teradata, a new enterprise-ready data lake management platform that enables self-service data ingestion and preparation, as well as the necessary functionality for managing metadata, governance and security.

One appealing aspect of Kylo is that it was developed over an eight-year period, as the result of a number of internal projects with Fortune 1000 customers, which has enabled Teradata to incorporate several best practices into Kylo. This way, Teradata has given the project the maturity and testing in real production environments needed to launch a mature product.

Using some of the latest open source technologies, including Apache Hadoop, Apache Spark and Apache NiFi, Kylo was designed by Teradata to help organizations address the common challenges of a data lake implementation and to provide the common use cases that will help reduce implementation cycles that average 6 to 12 months.

Teradata’s decision to release Kylo through an open source model, instead of a traditional commercial one, also comes against an interesting backdrop.

Traditionally a fully commercial software provider, the company has undergone a core transformation in recent years, becoming increasingly open to new business models and approaches, including its Teradata Everywhere strategy, which aims to broaden access to Teradata solutions and services across all possible on-premises and cloud platforms.

This broad strategy includes increased support for the open source community, as is the case with various Hadoop community projects, with Presto, and now of course with Kylo.

Teradata’s business model for Kylo is based on the services that its big data services company, Think Big, can offer on top of Kylo. These optional services include support and training, as well as implementation and managed services.

According to Teradata, Kylo will enable organizations to address specific challenges inherent in common data lake implementation efforts, including:

Shortage of skilled and experienced software engineers and administrators
Implementation of best practices regarding data lake governance
Extension of data lake adoption beyond engineers and specific IT teams

With Kylo, Teradata aims for a data lake platform that requires no code and enables self-service data ingestion and preparation through an intuitive user interface, helping to accelerate the development process by providing reusable templates that increase productivity.

From a functions and features perspective, Kylo has been designed to provide the necessary data management capabilities for the deployment of a data lake:

Data Ingestion. Self-service data ingest capabilities along with data cleansing, validation, and automatic profiling.
Data Preparation. Data handling capabilities through a visual SQL and interactive data transformation user interface.
Data Discovery. Capabilities for searching and exploring data and metadata, and for viewing lineage and profile statistics.
Data Monitoring. Capabilities for monitoring the health of feeds and services across the complete data lake, as well as for tracking service level agreements (SLAs) and troubleshooting performance.
Data Pipeline Design. Capabilities for designing batch and/or streaming pipeline templates in Apache NiFi to be registered with Kylo, enabling user self-service.
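
To make the ingestion step more concrete, here is a minimal Python sketch of what cleansing, validation and profiling can look like on a handful of records. It is only an illustration of the general idea and does not use or reflect Kylo's own API or implementation; the function names, the validation rules and the sample records are all hypothetical.

```python
# Illustrative only: NOT Kylo's API. All names and rules here are hypothetical.

def cleanse(record):
    """Trim whitespace from string values and turn empty strings into None."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None
        cleaned[key] = value
    return cleaned


def validate(record, rules):
    """Return the names of the fields whose value fails its rule."""
    return [field for field, rule in rules.items() if not rule(record.get(field))]


def profile(records):
    """Compute per-field null counts and distinct-value counts."""
    fields = sorted({field for record in records for field in record})
    stats = {}
    for field in fields:
        values = [record.get(field) for record in records]
        non_null = [value for value in values if value is not None]
        stats[field] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return stats


def ingest(raw_records, rules):
    """Cleanse each record, split valid from rejected, and profile the valid ones."""
    valid, rejected = [], []
    for raw in raw_records:
        record = cleanse(raw)
        errors = validate(record, rules)
        if errors:
            rejected.append((record, errors))
        else:
            valid.append(record)
    return valid, rejected, profile(valid)


# Hypothetical rules: "id" must be numeric, "country" must be present.
RULES = {
    "id": lambda v: v is not None and v.isdigit(),
    "country": lambda v: v is not None,
}

SAMPLE = [
    {"id": " 1 ", "country": "MX "},
    {"id": "2", "country": ""},
    {"id": "x3", "country": "US"},
    {"id": "4", "country": "MX"},
]

if __name__ == "__main__":
    valid, rejected, stats = ingest(SAMPLE, RULES)
    print(len(valid), "valid,", len(rejected), "rejected")
    print(stats)
```

In a platform of this kind, such rules would typically be configured through the user interface rather than written by hand; the sketch only shows the kind of work the ingestion step automates.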

In the words of Oliver Ratzesberger, Executive Vice President and Chief Product Officer at Teradata:

“Kylo is an exciting first in open source data lake management, and perfectly represents Teradata’s vision around big data, analytics, and open source software. Teradata has a rich history in the development of many open source projects, including Presto and Covalent. We know how commercial and open source should work together. So we engineer the best of both worlds, and we pioneer new approaches to open source software as part of our customer-choice strategy, improving the commercial and open source landscape for everyone.”

With Kylo, Teradata aspires to play a leadership role in the data lake, governance, and stewardship market. It is a difficult goal, given the competition from niche vendors like Zaloni and Podium Data and from big vendors like Informatica with its Data Lake Management solution stack. Still, at first glance Kylo looks like a solution to follow closely, especially considering the price point its business model allows compared with the other commercial offerings.

Want more information?

Kylo software, documentation and tutorials can be found on the Kylo project website or at the project’s GitHub site, and the project also has a page on YouTube.

Originally published at www.dofthings.com.

Written by Jorge Garcia