Running Apache Flink on Windows

Pritam Pathak
6 min read · Mar 14, 2023


To give a bit of context on why this blog exists in the first place:

If you’re new to Flink, it’s probably a good idea to set up a standalone cluster locally on your machine and submit jobs to it, so you can get a flavor of how Flink works.

While trying to do this, I ran into multiple issues setting up the environment on Windows and found that most blogs and tutorials run their local demos on Ubuntu. Also, newer versions of Flink have deprecated the Windows executables, and there are quite a few bugs you’ll run into if you plan to get it done on Windows.

So, in this tutorial, I’ve tried to cover everything you’d need to do in order to spin up a Flink standalone cluster locally on Windows.

1. Prerequisites

There are a couple of things you need before installing Flink. First of all, ensure a JDK is installed on your system. Open up the command prompt and execute:

$ java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)

If you don’t have Java installed on your system, you can download it from here.

Please note that Apache Flink supports both Java 8 and Java 11. If you are starting out afresh, the recommendation from Flink is to use Java 11, since support for Java 8 is deprecated. The devs are working on Java 17 integration, and it is expected to be supported in newer releases. For the purpose of this tutorial, I’ll be using Java 8 to stay compatible with older systems.

Once Java is up and running, you’ll need to install Cygwin to emulate UNIX commands. As of Flink 1.15, the devs have removed support for Windows, so Cygwin helps us run the Linux shell scripts. This part can be replaced with WSL as well.

In the setup tool, ensure that mintty (a terminal emulator) and netcat (a utility for reading from and writing to network sockets) are installed. Netcat isn’t selected by default, so it needs to be added explicitly; it will be needed for the demo in a later section.

After installing, navigate to the Cygwin folder and locate .bash_profile in your home directory (typically under C:\cygwin64\home if the installation path is unaltered). Append these two lines at the end of the file and save it:

export SHELLOPTS
set -o igncr

Basically, these two lines ensure that 1. the shell options are passed down correctly to all child processes (that’s what exporting SHELLOPTS does) and 2. carriage returns from Windows-style line endings are ignored instead of breaking command execution (that’s what igncr does).
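
As an aside: without these settings, shell scripts that end up with Windows-style CRLF line endings tend to fail in Cygwin with cryptic errors such as

$'\r': command not found

so if you hit something like that later on, this is the first place to check.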

2. Install Apache Flink

Get the latest Flink distribution from here. The latest version available as of this writing is 1.16, and I’ll be using it throughout the tutorial.

Unpack the tar file using any file archival tool (WinRAR, 7-Zip, etc.).

Once unpacked, the folder hierarchy looks like this:

  • bin: The shell scripts to start and stop the cluster and task managers, plus the flink CLI used to submit jobs.
  • conf: The default configuration files, which can be updated to suit our needs.
  • examples: Sample executable jars to test out standard use cases. The source code of these can be found here.
  • lib: Flink’s core jar files live here.
  • log: During a session, the cluster and task managers’ logs are dumped here. This is very useful for debugging runtime issues.
  • plugins: Pluggable components, such as metrics reporters and file system implementations.
  • opt: Optional libraries from Flink’s ecosystem; move a jar from here into lib to enable it.

Important!

One tweak to the default config is needed to ensure the Flink processes can execute as expected. Open the flink-conf.yaml file under conf in any editor of your choice and append the following line to the end of the file:

# localhostLogs can be replaced with any other name of your choice.
taskmanager.resource-id: localhostLogs

This instructs Flink to create a temp folder with this name under Cygwin for maintaining its state and logs while it spins up a cluster.

This works around an existing issue in Flink where the default name contains a colon, which is a legal character in folder names on UNIX but not on Windows. If you’re interested in reading further about this issue, you can check this mail thread.

3. Starting Flink cluster locally

Now we are all set to start a Flink cluster locally (on Windows). Open up Cygwin, navigate to the unpacked Flink folder, and execute the following command:

$ ./bin/start-cluster.sh

Now, if everything goes well, you should be able to see two Flink processes when you check the list of active Java processes.

One of them is the standalone session cluster that’s running locally on your system, and the second is a task manager, which provides the resources (task slots) in which a job’s tasks execute.
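
One way to verify this is with jps, the JVM process listing tool that ships with the JDK. In a Cygwin terminal it should show the two Flink daemons, something along these lines (the process IDs will differ on your machine):

$ jps
8013 StandaloneSessionClusterEntrypoint
8273 TaskManagerRunner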

You can also open up the Flink dashboard on localhost:8081.

If you run into any issues here, you can open up the log files under the log folder to debug further.

4. Executing a demo streaming application

We’ll take the traditional word count problem and see how that works in Flink in a streaming fashion.

To explain the problem statement in brief:

  • Read incoming words in a streaming fashion.
  • Apply a filter: keep only words starting with S.
  • Count the number of occurrences of each word so far.

The code for this can be found here. You can clone it locally and run mvn clean install to generate the jar file. I won’t explain the code here, as I’ve added ample comments to it, so it should be pretty self-explanatory.
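
If you’re curious what such a job looks like, here’s a minimal sketch using Flink’s DataStream API. To be clear, this is my own illustration rather than the exact code in the linked repo; the class name, job name, and whitespace tokenization are assumptions. It implements the same three steps and reads from the netcat socket we’ll open on port 9999.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCountStreaming {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read lines from the netcat socket and split them into individual words.
        DataStream<String> words = env
                .socketTextStream("localhost", 9999)
                .flatMap((String line, Collector<String> out) -> {
                    for (String word : line.split("\\s+")) {
                        if (!word.isEmpty()) {
                            out.collect(word);
                        }
                    }
                })
                .returns(Types.STRING);

        // Keep only words starting with "S", then count occurrences per word so far.
        words.filter(word -> word.startsWith("S"))
                .map(word -> Tuple2.of(word, 1))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(t -> t.f0)
                .sum(1)
                .print(); // results show up in the task manager's StdOut

        env.execute("Streaming word count");
    }
}

With the default standalone config (parallelism 1), the print() output lands in the task manager’s StdOut, which is exactly where we’ll look for results in the dashboard.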

After the jar file is available, it's time to see our code in action!

For this, you’ll need to open up two Cygwin terminals. In the first one, execute this:

$ nc -l 9999

This opens up a listening socket on port 9999, which will form our input stream.

In the second terminal, execute this set of commands:

prita@Pritam-PC /cygdrive/c/Study/Flink/flink-1.16.1
$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host Pritam-PC.
Starting taskexecutor daemon on host Pritam-PC.

prita@Pritam-PC /cygdrive/c/Study/Flink/flink-1.16.1
$ ./bin/flink run ../Tutorials/wordCountStreaming/target/wordCountStreaming-1.0-SNAPSHOT.jar
Job has been submitted with JobID 364898bd09fff839bf38b549b8b04781

This submits our jar file to the Flink cluster. Voila! We have successfully set up the streaming job!

Now, head over to the dashboard. In the overview section, our job should appear.

Go to Task Managers. Here, you’ll find a task manager with the name localhostLogs (or whatever name you specified in the Flink config); click on it. You’ll now see all the details for this task manager. Open up StdOut, as the outputs will be logged there.

Seeing it in action:
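
As an illustration (assuming the job tokenizes on whitespace, as in the sketch above), typing this into the netcat terminal:

Spark Storm Flink Spark Samza

should produce output along these lines in the task manager’s StdOut, with Flink dropped by the filter since it doesn’t start with S (exact ordering and formatting may differ):

(Spark,1)
(Storm,1)
(Spark,2)
(Samza,1)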

5. Stop the Cluster

Once you’re done experimenting, you can terminate the netcat session by pressing Ctrl+C.

To stop the Flink cluster, execute this shell command:

$ ./bin/stop-cluster.sh
Stopping taskexecutor daemon (pid: 8273) on host Pritam-PC.
Stopping standalonesession daemon (pid: 8013) on host Pritam-PC.

With that, we’ve executed a streaming application in Flink end-to-end, locally on Windows! 🎉

And, that completes our tutorial!

I’ve planned to release further tutorials on Apache Flink, so please watch this space to stay updated.
