Flink Forward 2018

Kevin Barresi
Published in FTS Engineering
Apr 11, 2018
Greetings from San Francisco

FinTech Studios is always trying to stay ahead of the curve on the latest and greatest technologies, so we were excited to attend Flink Forward 2018 in San Francisco on April 9–10, 2018.

Flink Forward is the top conference for all things Apache Flink, an open-source stream processing framework specializing in distributed, high-performance data streaming applications. We were glad to have the chance to meet up with other industry leaders and discuss the future of data streaming technologies.

Core Takeaways

  • Flink is a very healthy Apache project, and we can foresee a robust development effort and pipeline in the future.
  • There are several projects, like Apache SAMOA and Apache Beam, that aim to generalize SDK languages and domain-specific functionality (e.g. machine learning), so application developers don’t have to worry about the underlying stream processor (Flink would work the same as Apache Apex, Spark Streaming, etc., and the developer wouldn’t know the difference). Extra language support seems to be a common theme.
  • Many people use ElasticSearch as a sink for analytics, and then Kibana to display.
  • SQL CLI in Flink 1.5.0!

Nifty Tips

  • The Flink AsyncFunction interface is very powerful: it lets you integrate external operations inside a Flink workflow. However, there are several factors to keep in mind or you’ll run into issues. For example, AsyncFunction returns results with complete(), not collect(), so the Flink UI shows no backpressure. This can make for some interesting debugging scenarios.
  • The SQL/Table API is great for data pipelines, low-latency ETL, streaming analytics, and powering live dashboards.
  • Kafka MirrorMaker makes it easy to duplicate streams across regions for active-active, or any HA architecture.
  • Bootstrapping is not supported out of the box in Flink. However, one option is to use the stream’s own retention (Kafka, Kinesis, etc.) rather than rely on Flink itself.
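
To make the AsyncFunction tip above more concrete, here is a minimal sketch in plain Python. It uses only the standard library, not the Flink API, and the names lookup, async_map, and capacity are made up for illustration. It shows the general pattern behind asynchronous I/O in a stream: each record triggers an external call, results are handed back when that call completes, and only a fixed number of calls may be in flight. Once that capacity is exhausted, the loop feeding the operator has to wait, and that waiting is where backpressure comes from, even though no call ever blocks.

```python
import asyncio


async def lookup(key: str) -> str:
    """Hypothetical external call (stand-in for a database or REST lookup)."""
    await asyncio.sleep(0.01)
    return key.upper()


async def async_map(records, capacity: int = 2):
    """Apply lookup() to each record with at most `capacity` calls in flight.

    Returns the results in input order plus the peak number of
    concurrent calls that was observed.
    """
    slots = asyncio.Semaphore(capacity)
    in_flight = 0
    peak = 0
    tasks = []

    async def run(record):
        nonlocal in_flight
        try:
            return await lookup(record)
        finally:
            # The call has completed: free its slot for the next record.
            in_flight -= 1
            slots.release()

    for record in records:
        # Waits here when all slots are taken; this is the backpressure point.
        await slots.acquire()
        in_flight += 1
        peak = max(peak, in_flight)
        tasks.append(asyncio.create_task(run(record)))

    # gather() returns results in the order the tasks were submitted.
    results = await asyncio.gather(*tasks)
    return list(results), peak


results, peak = asyncio.run(async_map(["a", "b", "c", "d", "e"], capacity=2))
print(results, peak)
```

In this sketch the stall happens inside the feeding loop rather than in the external call itself, which is one reason the symptom can be hard to spot when you are only watching the operator that makes the calls.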

Conference Summary

The main theme here, without a doubt, is big data. Specifically, how to build an architecture and system that can efficiently scale to meet the growing needs of production applications.

Apache Flink is clearly a very healthy open-source project: its mailing lists are among the 10 most active at Apache, and it averages 12k downloads per month. The diverse set of use cases is a factor in its quick uptake: capital risk, fraud detection, real-time business intelligence, and machine learning, to name a few.

Stream processing is about building applications.

Stream processing changes the database-centric architecture.

As a general trend, we’re seeing a shift away from the late 2000s “data lake” architecture, where the idea is to collect and store everything now, and figure it out later. The data lake trend clearly coincided with the dramatic drop in disk and memory prices, and the rise of powerful batch “hammers” such as Hadoop.

Instead, we’re seeing a move towards the Kappa architecture (or something closer to that), where stream analytics play a crucial part in the larger architecture.

However, with Apache Flink as an example, stream-centric architectures are causing a stronger coupling of the application and the database. This in turn creates challenges in the operational aspects of maintaining these types of systems throughout the software development lifecycle. Some examples:

  • How do you carry state forward when updating, changing, or scaling Flink?
  • How can you fix issues in historical data (i.e. data reprocessing/replay)?

These are issues anyone trying to deploy Flink into production would run into. Data Artisans, the company that is running Flink Forward, has a platform (dA Platform) that aims to solve these issues.
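
The two questions above can be illustrated with a small sketch in plain Python. It is not Flink code: the log, the process/snapshot/restore helpers, and the transform parameter are all hypothetical stand-ins. The idea it assumes is that application state is a fold over an ordered, retained log. Carrying state forward then means saving the state together with the log offset it corresponds to and resuming from that offset, and fixing history means replaying the retained log from the beginning with corrected logic.

```python
import json
from collections import Counter

# Hypothetical retained log (stand-in for a Kafka/Kinesis stream with retention).
LOG = [("aapl", 1), ("msft", 2), ("aapl", 3), ("goog", 4), ("aapl", 5)]


def process(log, state=None, offset=0, transform=lambda v: v):
    """Fold log[offset:] into per-key sums; return (state, next_offset)."""
    state = Counter(state or {})
    for key, value in log[offset:]:
        state[key] += transform(value)
    return state, len(log)


def snapshot(state, offset):
    """Serialize the state together with the log position it reflects."""
    return json.dumps({"state": dict(state), "offset": offset})


def restore(blob):
    """Inverse of snapshot(): recover (state, offset) from the serialized form."""
    data = json.loads(blob)
    return Counter(data["state"]), data["offset"]


# 1. Process the first three events, then take a snapshot before "upgrading".
state, offset = process(LOG[:3])
blob = snapshot(state, offset)

# 2. The "new" job restores the snapshot and resumes from the saved offset.
restored_state, restored_offset = restore(blob)
resumed, _ = process(LOG, restored_state, restored_offset)

# 3. Reprocessing: replay the whole retained log with corrected logic.
replayed, _ = process(LOG, transform=lambda v: v * 10)

print(dict(resumed))
print(dict(replayed))
```

Storing the offset alongside the state is the design choice that matters here: without it, a restored job cannot know which events are already reflected in its state and would either skip or double-count them.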

As the core Flink project is being improved, several open-source projects specializing in certain use cases or sections of Flink are popping up. These include Pravega, which focuses on stream storage, and Apache Beam, which aims to provide multiple language bindings and domain-specific tools, generalized to work with underlying “runners” (including Flink).

Overall, the general “atmosphere” is that Flink is a great tool for production use today in several use cases, but it’s not yet a full enterprise streaming platform. For that, it needs things like RBAC, metadata management, failover/failback, etc.
