It is a little late to write something about Flink Forward 2018, but I have to, because we (well, not me personally) co-presented with Mesosphere about Flink on Mesos!
As I write this, Flink 1.4 has been released and the 1.5 snapshot is already out, but we are still on Flink 1.3.2.
I want to talk a little bit about Flink's externalized checkpoints. Flink's checkpointing is a great…
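For context, in the Flink 1.3/1.4 line an externalized checkpoint needs a target directory in `flink-conf.yaml`; a minimal sketch (the HDFS path below is an illustrative placeholder, not from the original post):

```yaml
# flink-conf.yaml -- where externalized checkpoint metadata is written.
# Example path only; point this at your own HDFS/S3 location.
state.checkpoints.dir: hdfs:///flink/checkpoints
```

In the job itself, externalized checkpoints are then switched on via `env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)`, which keeps the checkpoint around after the job is cancelled so it can be used for a restore.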
As we rolled out and stabilized our real-time Flink Parquet data warehouse, we considered ingesting Parquet data into Druid directly. We followed the guideline here, and everything seemed to work well in the beginning. Then, when our QA team ran integration tests on…
Sorry, I trolled a bit with the title. Probably because I expected too much from it.
Back in early last year, Dremio came to our office and gave a demo. It was a very informative talk, and we asked a lot of Parquet-related questions since they are contributors…
From the last post, we learned that if we want streaming ETL output in Parquet format, we need to implement a Flink Parquet writer. So let's implement the Writer interface.
public class FlinkAvroParquetWriterV1<T> implements…
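The class declaration above is cut off, so the following is only a sketch of what such a writer could look like, assuming Flink 1.3's `Writer<T>` interface from `flink-connector-filesystem` (used by `BucketingSink`) and parquet-avro's `AvroParquetWriter`; the field names and the schema-as-string trick are my assumptions, not the original implementation.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.streaming.connectors.fs.Writer;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

// Sketch only: one possible shape for a Parquet Writer plugged into BucketingSink.
public class FlinkAvroParquetWriterV1<T extends GenericRecord> implements Writer<T> {

    // Avro's Schema class is not reliably serializable, so ship it as a string
    // and re-parse it on the task manager (an assumption, not the post's code).
    private final String schemaString;
    private transient ParquetWriter<T> writer; // created lazily in open()
    private transient long position;

    public FlinkAvroParquetWriterV1(Schema schema) {
        this.schemaString = schema.toString();
    }

    @Override
    public void open(FileSystem fs, Path path) throws IOException {
        Schema schema = new Schema.Parser().parse(schemaString);
        writer = AvroParquetWriter.<T>builder(path).withSchema(schema).build();
    }

    @Override
    public void write(T element) throws IOException {
        writer.write(element);
        position = writer.getDataSize(); // approximate bytes buffered/written so far
    }

    @Override
    public long flush() throws IOException {
        // ParquetWriter has no incremental flush; data is only durable on close().
        return position;
    }

    @Override
    public long getPos() throws IOException {
        return position;
    }

    @Override
    public void close() throws IOException {
        if (writer != null) {
            writer.close();
        }
    }

    @Override
    public Writer<T> duplicate() {
        return new FlinkAvroParquetWriterV1<>(new Schema.Parser().parse(schemaString));
    }
}
```

The lack of a real `flush()` is the known pain point with this approach: Parquet's columnar layout means a file is only readable after `close()`, which interacts badly with the sink's checkpoint-time flushing.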
“Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.”
Recently I have been working on migrating our current pipelines (mostly PySpark) to JVM-based ones. Our plan is to use Spark for batch processing and Flink for real-time processing.
Cluster Setup:
Presto: