Hao Gao in Hadoop Noob · fluentd out_forward plugin is stuck longer than expected · Recently I have been working on a log ingestion project in which we use fluentd to collect and stream logs to Kinesis for further data processing. We… · Feb 18, 2019
Hao Gao in Hadoop Noob · json vs msgpack · Which is better? It is really hard to say without some context or constraints. Because if I could build it from scratch, I may… · Sep 14, 2018
Hao Gao in Hadoop Noob · Debug Flink OOM in Docker Container · Recently I was planning to deploy a new Flink pipeline. I tested it in my local and staging environments. When I deploy it to serve the full… · Sep 10, 2018
Hao Gao in Hadoop Noob · Notes on Apache Mesos Setup · Recently, I have been working on building new data ingestion pipelines. I need to ingest data from Kinesis and dump it on S3. Since I am familiar… · Jul 30, 2018
Hao Gao in Hadoop Noob · Flink Forward 2018 · It is a little late to write something about Flink Forward 2018. But I have to, because we (well, not me) actually co-presented with Mesosphere… · Apr 18, 2018
Hao Gao in Hadoop Noob · Flink Fault tolerance using externalized checkpoint · As I am writing this, Flink is already on the 1.4 release and the 1.5 snapshot is already out. But we are still on Flink 1.3.2. · Apr 9, 2018
Hao Gao in Hadoop Noob · Druid parquet extension on Array/List type · As we rolled out and stabilized our Realtime Flink Parquet Data Warehouse, we are considering ingesting Parquet data into Druid directly. We… · Apr 6, 2018
Hao Gao in Hadoop Noob · Dremio — best Parquet viewer · Sorry, I trolled with the title. Probably because I expected too much of it. · Feb 13, 2018
Hao Gao in Hadoop Noob · Flink Parquet Writer · From the last post, we learned that if we want a streaming ETL in Parquet format, we need to implement a Flink Parquet writer. So let’s… · Nov 8, 2017