Flink Forward 2018

Published in Hadoop Noob · 2 min read · Apr 18, 2018

It is a little late to write about Flink Forward 2018, but I have to, because we (well, not me personally) co-presented with Mesosphere about Flink on Mesos!

If you are interested in the presentation, you can look for the recorded video later (it has not been published yet). I think it will be useful for anyone looking into the Flink + Mesos combination.

Here are some quick stats about our Mesos cluster and Flink jobs:

We run 50 streaming jobs in Flink. They read from Kafka and persist data into HDFS/Druid.
The streaming jobs handle 120k rps. The biggest Kafka topic we have is around 70k rps, and we assign 32 cores to the Flink streaming job that handles it.
The streaming jobs process 2TB of raw data (protobuf) per day and persist 200GB on HDFS (parquet).
We run a Mesos cluster with more than 200 nodes in AWS.
We run Flink on Mesos both as a framework and as a Marathon app. For streaming, we run one pipeline per cluster as a Mesos framework. Flink on Marathon is usually used for ad-hoc batch jobs.
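To put these numbers in perspective, here is a rough back-of-envelope calculation. It is only a sketch under a few assumptions of mine: that "rps" means records per second, that 120k rps is a sustained average rather than a peak, and that TB/GB are decimal units. The variable names are purely illustrative.

```python
# Back-of-envelope figures derived from the cluster stats.
# Assumptions (not stated in the post): "rps" = records per second,
# 120k rps is a sustained average, and TB/GB are decimal units.
SECONDS_PER_DAY = 24 * 60 * 60

raw_bytes_per_day = 2 * 10**12        # 2 TB of raw protobuf per day
stored_bytes_per_day = 200 * 10**9    # 200 GB of parquet per day
total_rps = 120_000                   # all streaming jobs
biggest_topic_rps = 70_000            # largest Kafka topic
cores_for_biggest_topic = 32          # cores assigned to that job

# Average raw ingest rate in MB/s.
avg_ingest_mb_per_sec = raw_bytes_per_day / SECONDS_PER_DAY / 10**6

# Average raw record size in bytes, if 120k rps holds all day.
avg_record_bytes = raw_bytes_per_day / (total_rps * SECONDS_PER_DAY)

# How much smaller the persisted parquet is than the raw input.
size_reduction = raw_bytes_per_day / stored_bytes_per_day

# Records per second handled by each core on the biggest topic.
rps_per_core = biggest_topic_rps / cores_for_biggest_topic

print(f"average ingest: {avg_ingest_mb_per_sec:.1f} MB/s")
print(f"average record size: {avg_record_bytes:.0f} bytes")
print(f"raw-to-parquet size reduction: {size_reduction:.0f}x")
print(f"records per second per core: {rps_per_core:.0f}")
```

Under those assumptions this works out to roughly 23 MB/s of raw input, records of a bit under 200 bytes, a 10x reduction from raw protobuf to parquet on HDFS, and about 2,200 records per second per core on the biggest topic.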

Although we already run Flink on Mesos in production, we still have to work around some limitations in Flink's cluster management itself (not only on Mesos). For example, unlike Spark, when you launch a Flink application on Mesos, the driver doesn't wait until the required resources (slots) are ready; it just starts and fails. If you are interested in more details, you can refer to FLIP-6. I see lots of people, including Mesosphere, actively working on it. We will definitely see more at the next Flink Forward!

I hope this brief note gives you some confidence in Flink on Mesos. If you are more interested in Flink Forward 2018 in general, I found this article useful :)
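One possible way to cope with the missing wait is to poll the JobManager before submitting a job. The sketch below is not necessarily what we run in production. It assumes the JobManager's REST endpoint is reachable at a hypothetical `base_url` and uses the `slots-available` field returned by Flink's `/overview` endpoint. The function names, timeouts, and the injectable `fetch`/`sleep` parameters are illustrative choices of mine.

```python
import json
import time
import urllib.request


def fetch_overview(base_url):
    """GET <base_url>/overview from the Flink REST API and parse the JSON."""
    with urllib.request.urlopen(f"{base_url}/overview", timeout=5) as resp:
        return json.load(resp)


def wait_for_slots(required, base_url="http://localhost:8081",
                   timeout_s=300.0, poll_s=5.0,
                   fetch=fetch_overview, sleep=time.sleep):
    """Return True once `required` slots are free, False on timeout.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a running cluster.
    """
    waited = 0.0
    while True:
        try:
            available = fetch(base_url).get("slots-available", 0)
        except OSError:
            # JobManager not reachable yet (URLError is an OSError).
            available = 0
        if available >= required:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

A launcher script could call `wait_for_slots(32)` and only submit the job when it returns `True`. This narrows the window in which the job starts without resources, but it does not close it: slots can still disappear between the check and the submission.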


Written by Hao Gao