Trino, Superset and Ranger on Kubernetes: What, Why, How?

Madokai
Published in Geek Culture
Feb 22, 2022

Photo by Visual Stories || Micheile on Unsplash

This article gives an opinionated SRE point of view on an open-source stack that makes it easy to query, visualize, audit and secure access to data spread across multiple data sources. It is the first part of a series of articles dedicated to MLOps topics. So, let’s start with the theory!

What Is Trino?

Trino is an open-source distributed SQL query engine for running ad hoc and batch queries against many types of data sources. Trino is not a database: it stores no data itself. Instead, it runs fast analytical queries over data that lives elsewhere, whether in big data storage systems (such as Hadoop HDFS, AWS S3 or Google Cloud Storage) or in other data stores (such as MySQL, MongoDB, Cassandra, Kafka or Druid). One of Trino's great advantages is that a single query can read from several of these sources and join the results, which makes the data much easier to access.
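To make the idea of joining across sources concrete, here is a minimal Python sketch. In Trino every table is addressed as `catalog.schema.table`, where a catalog is a configured data source, so one SQL statement can join tables that live in different systems. The catalog, schema, table and column names (`mysql.shop.customers`, `hive.web.clicks`, `country`, `customer_id`) are hypothetical. The optional `run_query` helper assumes the `trino` Python client (`pip install trino`) and a coordinator reachable on `localhost:8080`.

```python
"""Build (and optionally run) a federated Trino query across two catalogs.

Hypothetical setup: catalog "mysql" -> a MySQL database, catalog "hive" ->
files on object storage (e.g. S3) described by a Hive metastore.
"""


def qualified(catalog: str, schema: str, table: str) -> str:
    """Quote and join catalog, schema and table into a fully qualified name."""
    parts = (catalog, schema, table)
    # Trino delimits identifiers with double quotes; a literal quote is doubled.
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def clicks_per_country_sql(customers: str, clicks: str) -> str:
    """Count click events per customer country.

    `customers` and `clicks` may live in different catalogs (i.e. different
    systems); Trino performs the join itself.
    """
    return (
        "SELECT cu.country, count(*) AS clicks\n"
        f"FROM {clicks} AS c\n"
        f"JOIN {customers} AS cu ON cu.id = c.customer_id\n"
        "GROUP BY cu.country\n"
        "ORDER BY clicks DESC"
    )


def run_query(sql: str, host: str = "localhost", port: int = 8080, user: str = "analyst"):
    """Execute `sql` on a Trino coordinator; needs `pip install trino`."""
    from trino.dbapi import connect  # lazy import: the builders above work without it

    conn = connect(host=host, port=port, user=user)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sql = clicks_per_country_sql(
        customers=qualified("mysql", "shop", "customers"),
        clicks=qualified("hive", "web", "clicks"),
    )
    print(sql)  # pass to run_query(sql) once a cluster is available
```

Building the SQL separately from executing it keeps the sketch usable without a running cluster. The application only ever talks to Trino, never directly to MySQL or to the object store.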

By offering a single entry point to all these database systems, Trino allows:

  • developers to avoid writing, duplicating and maintaining the code needed to connect to each database management system,
  • administrators to maintain the various database systems more easily, thanks to the abstraction of the infrastructure that Trino provides. Applications no longer connect…
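Behind that single entry point, each data source is declared once as a Trino catalog. A catalog is a `.properties` file in the `etc/catalog/` directory of every Trino node, and its `connector.name` key selects the connector. The Python sketch below renders such files. The catalog names, hosts and environment variable names are hypothetical placeholders. On Kubernetes these files would usually be delivered by your deployment tooling (for example a ConfigMap) rather than written by hand.

```python
"""Render one Trino catalog file per data source.

Each catalog becomes etc/catalog/<name>.properties on every Trino node.
Names, hosts and environment variables below are hypothetical placeholders.
"""
import tempfile
from pathlib import Path
from typing import Dict, List

CATALOGS: Dict[str, Dict[str, str]] = {
    # Relational database reached over JDBC.
    "mysql": {
        "connector.name": "mysql",
        "connection-url": "jdbc:mysql://mysql.example.internal:3306",
        "connection-user": "trino",
        # Trino replaces ${ENV:NAME} with that environment variable when loading the file.
        "connection-password": "${ENV:MYSQL_PASSWORD}",
    },
    # Files on object storage, described by a Hive metastore.
    "hive": {
        "connector.name": "hive",
        "hive.metastore.uri": "thrift://hive-metastore.example.internal:9083",
    },
}


def render_properties(props: Dict[str, str]) -> str:
    """Serialize a flat mapping as key=value lines."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def write_catalogs(catalog_dir: Path, catalogs: Dict[str, Dict[str, str]]) -> List[Path]:
    """Write <catalog_dir>/<name>.properties for each catalog; return the paths."""
    catalog_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, props in catalogs.items():
        path = catalog_dir / f"{name}.properties"
        path.write_text(render_properties(props))
        written.append(path)
    return written


if __name__ == "__main__":
    # Demo into a throwaway directory instead of a real Trino installation.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_catalogs(Path(tmp) / "catalog", CATALOGS):
            print(f"--- {path.name}\n{path.read_text()}")
```

Applications then connect only to the Trino coordinator and reference tables as `mysql.<schema>.<table>` or `hive.<schema>.<table>`. Credentials and connection details stay in one place that administrators control.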
