Run flink cluster in Mac
Context
Flink is a popular opensource data streaming framework for big data processing. In this article, we gonna show how to run Flink cluster in MacBook.
Prerequisite
- Set up Java env in MacBook first, you can follow the instruction here: https://medium.com/@databackendtech/set-up-java-development-environment-in-macbook-with-java-19-and-maven-746ab6446587
- We gonna use Flink 1.16.0 version, so you would need to download the package here: https://www.apache.org/dyn/closer.lua/flink/flink-1.16.0/flink-1.16.0-bin-scala_2.12.tgz
Run cluster
- unzip the downloaded tgz package
tar -xvf flink-1.16.0-bin-scala_2.12.tgz
- In the unzipped folder, there is a ‘conf’ sub-folder, which contains the main configuration file flink-conf.yaml for Flink cluster. The config file contains some default config already, so no change needed for now.
data_backend_tech@hacking... % ll conf
total 112
drwxr-xr-x@ 13 data_backend_tech staff 416 Dec 26 07:50 .
drwxr-xr-x@ 13 data_backend_tech staff 416 Oct 20 02:58 ..
-rw-r--r--@ 1 data_backend_tech staff 12943 Dec 26 07:55 flink-conf.yaml
...
...
- set the environment variable FLINK_CONF_DIR for conf dir
export FLINK_CONF_DIR='/some_path_to_conf_dir/conf/'
- Run cluster
./bin/start-cluster.sh
If the cluster ran successfully, you gonna see following output in console:
data_backend_tech@hacking... % ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host ip-192-168-1-6.ec2.internal.
No workers file. Please specify workers in 'conf/workers'.
Then use browser to view localhost:8081 for the admin page of local Flink cluster:
Some potential errors
- IllegalConfigurationException: Exception in thread “main” org.apache.flink.configuration.IllegalConfigurationException: JobManager memory configuration failed: Either required fine-grained memory (jobmanager.memory.heap.size), or Total Flink Memory size (Key: ‘jobmanager.memory.flink.size’ , default: null (fallback keys: [])), or Total Process Memory size (Key: ‘jobmanager.memory.process.size’ , default: null (fallback keys: [])) need to be configured explicitly.
- Solution: you would need to manually define jobmanager.memory.heap.size or jobmanager.memory.process.size in the config file
Additional Note
If you want different configuration settings for different scenario of usage of the Flink cluster, you can put flink-conf.yaml in different folder, and override FLINK_CONF_DIR when running Flink cluster.
Summary
In this article, we showed how to run Flink cluster in session mode, in another follow-up article, we gonna show how to run job with the Flink cluster. Happy Flinking!