Hi, I was wondering if you know of a way to do this:

Hi Ashwin,

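I think what you are looking for is a broadcast hash join. In this case, Flink sends the (small) left side to all machines and partitions the probe side.
Locally, each machine then executes a hash join: it builds a hash table from the broadcast side and probes it with its share of the other side.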
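
To make the idea concrete, here is a minimal plain-Java sketch of a broadcast hash join. It does not use Flink and is not how Flink implements it internally; the class and method names are made up for illustration, the "workers" are simulated by a loop, and the small side is assumed to have unique keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastHashJoinSketch {

    // Work done on ONE worker: build a hash table from its copy of the
    // small side, then probe it with the local partition of the large side.
    public static List<String> joinOnWorker(Map<Integer, String> smallSide,
                                            List<int[]> probePartition) {
        // Build phase: every worker holds the complete small side.
        Map<Integer, String> hashTable = new HashMap<>(smallSide);

        // Probe phase: rec[0] is the join key, rec[1] is the payload.
        List<String> out = new ArrayList<>();
        for (int[] rec : probePartition) {
            String match = hashTable.get(rec[0]);
            if (match != null) {
                out.add(rec[0] + ":" + match + ":" + rec[1]);
            }
        }
        return out;
    }

    // Simulates the cluster: the large side is split across the workers
    // (round-robin here, an arbitrary choice), the small side is copied to each.
    public static List<String> broadcastHashJoin(Map<Integer, String> smallSide,
                                                 List<int[]> largeSide,
                                                 int workers) {
        List<List<int[]>> partitions = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < largeSide.size(); i++) {
            partitions.get(i % workers).add(largeSide.get(i));
        }

        List<String> result = new ArrayList<>();
        for (List<int[]> partition : partitions) {
            result.addAll(joinOnWorker(smallSide, partition));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> small = new HashMap<>();
        small.put(1, "DE");
        small.put(2, "US");

        List<int[]> large = Arrays.asList(
                new int[]{1, 10}, new int[]{3, 30}, new int[]{2, 20}, new int[]{1, 40});

        // Records with key 3 have no partner on the small side and are dropped.
        System.out.println(broadcastHashJoin(small, large, 2));
    }
}
```

The point of the sketch is that the large side never has to be shuffled by key: because every worker has the full small side, any split of the large side gives the correct result.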

Have a look at this blog post, explaining how joins are done in Flink: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html

Flink’s optimizer picks the join strategy automatically. You can check in Flink’s web client which strategy it has chosen.
If the strategy is not the one you intended, you can enforce a strategy manually.
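
As a rough sketch of enforcing a strategy, assuming the batch DataSet API and a Flink dependency on the classpath: a `JoinHint` can be passed as the second argument to `join`. The data sets, field positions, and class name below are invented for the example, so adapt them to your job.

```java
import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class BroadcastJoinHintExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical small (left) input: (id, country).
        DataSet<Tuple2<Integer, String>> small = env.fromElements(
                new Tuple2<>(1, "DE"), new Tuple2<>(2, "US"));

        // Hypothetical large (right) input: (id, amount).
        DataSet<Tuple2<Integer, Double>> large = env.fromElements(
                new Tuple2<>(1, 9.5), new Tuple2<>(2, 3.0), new Tuple2<>(1, 1.25));

        // BROADCAST_HASH_FIRST asks Flink to broadcast the first (left) input
        // and build the hash table from it; the second input is the probe side.
        small.join(large, JoinHint.BROADCAST_HASH_FIRST)
             .where(0)      // join key of the left input
             .equalTo(0)    // join key of the right input
             .print();      // triggers execution of the batch job
    }
}
```

If the small input is the second one instead, `JoinHint.BROADCAST_HASH_SECOND` is the corresponding hint.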
