An exercise in Discovery, Building Docker Images, using Makefiles & Docker Compose. — Part 5b

George Leonard
2 min read · Aug 25, 2024


Let’s build an Apache Hadoop DFS cluster.

(See: Part 4)

(25 August 2024)

For the full entrypoint.sh, please see the repo.

...

configure /etc/hadoop/core-site.xml core CORE_CONF
configure /etc/hadoop/hdfs-site.xml hdfs HDFS_CONF
configure /etc/hadoop/yarn-site.xml yarn YARN_CONF
configure /etc/hadoop/httpfs-site.xml httpfs HTTPFS_CONF
configure /etc/hadoop/kms-site.xml kms KMS_CONF
configure /etc/hadoop/mapred-site.xml mapred MAPRED_CONF

...

Well, the magic is all in this small section of the much larger entrypoint.sh. Let’s have a look.

What this does is call the configure function, each time with three arguments. I’m not going to discuss the actual code of the configure function, only touch at a high level on what it accomplishes.

The first argument is the output file (i.e.: /etc/hadoop/core-site.xml), the second is a module name, and the third is the module prefix (i.e.: CORE_CONF): the leading part of the environment variables to read. Those variables are injected into the container at startup via the env_file: ./hdfs/hadoop.env setting in the docker-compose.yml, and the configure function extracts each matching variable and writes it as a property to the specified output file. Notice that this output location is specific to where we installed the Hadoop binaries in the base image build.
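To make the mechanism concrete, below is a sketch of what such a configure function typically looks like, modeled on the widely used docker-hadoop pattern. The repo’s entrypoint.sh is the source of truth; treat the code below as illustrative only.

# Sketch of a configure function, in the spirit of the common
# docker-hadoop pattern; see the repo's entrypoint.sh for the real code.

# Append one <property> entry just before the closing </configuration>
# tag of a Hadoop XML config file.
addProperty() {
  local path=$1 name=$2 value=$3
  local entry="<property><name>${name}</name><value>${value}</value></property>"
  local escaped=$(echo "$entry" | sed 's/\//\\\//g')
  sed -i "/<\/configuration>/ s/.*/${escaped}\n&/" "$path"
}

# configure <output file> <module name> <env var prefix>
# e.g. CORE_CONF_fs_defaultFS=hdfs://namenode:9000 becomes a
# fs.defaultFS property in /etc/hadoop/core-site.xml.
configure() {
  local path=$1 module=$2 envPrefix=$3
  local c name var value

  echo "Configuring $module"
  for c in $(printenv | perl -sne 'print "$1 " if m/^${envPrefix}_(.+?)=.*/' -- -envPrefix="$envPrefix"); do
    # Map the env-var naming back to dotted property names:
    # ___ becomes -, __ becomes _, and _ becomes .
    name=$(echo "$c" | perl -pe 's/___/-/g; s/__/@/g; s/_/./g; s/@/_/g;')
    var="${envPrefix}_${c}"
    value=${!var}
    echo " - Setting $name=$value"
    addProperty "$path" "$name" "$value"
  done
}

The naming convention does the heavy lifting: an environment variable such as CORE_CONF_fs_defaultFS maps to the property fs.defaultFS, so any Hadoop setting can be expressed without changing the image.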

Hint: With very little work this can be made to do the same for other systems…
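For example (purely hypothetical, Hive is not part of this build), a single extra line would template another system’s config the same way:

configure /opt/hive/conf/hive-site.xml hive HIVE_CONF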

By using this approach our image stays static, while adding or removing variables changes the behavior and configuration of our cluster.
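A few illustrative hadoop.env entries (placeholder values, not necessarily the repo’s actual settings):

CORE_CONF_fs_defaultFS=hdfs://namenode:9000
HDFS_CONF_dfs_replication=3
HDFS_CONF_dfs_namenode_name_dir=file:///hadoop/dfs/name

The CORE_CONF_ entries land in core-site.xml, the HDFS_CONF_ entries in hdfs-site.xml, and so on, all without rebuilding the image.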

Well, there you have it: an Apache Hadoop 3.3.5 DFS (Distributed File System) cluster built on OpenJDK 11 on top of an Ubuntu 20.04 OS, with 5 data nodes, all from scratch.
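For reference, the compose wiring looks roughly like the fragment below. Service names and image tags here are assumptions for illustration; the repo’s docker-compose.yml is the real thing.

services:
  namenode:
    image: hadoop-namenode:3.3.5   # hypothetical tag from our base image build
    env_file:
      - ./hdfs/hadoop.env          # the variables configure reads at startup
    ports:
      - "9870:9870"                # NameNode web UI
  datanode1:
    image: hadoop-datanode:3.3.5   # hypothetical tag
    env_file:
      - ./hdfs/hadoop.env
    depends_on:
      - namenode
  # datanode2 through datanode5 follow the same pattern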

Sorry, the section was a bit long… but that was primarily because we copied large sections of code into it.

In the next section I will quickly touch on the process of doing multi-stage Dockerfile builds.

My Repos

All the code used in this article is available in the Git repo below.

Building Docker Images

About Me

I’m a techie, a technologist, always curious, and I love data. For as long as I can remember I have worked with data in one form or another: database admin, database product lead, data platforms architect, infrastructure architect hosting databases, backing them up, optimizing performance, accessing them. Data, data, data… it makes the world go round.

In recent years, I’ve pivoted into a more generic technology architect role, capable of full-stack architecture.

George Leonard

georgelza@gmail.com
