Deploying an enterprise-grade bare metal OpenShift cluster in less than a week

Ben Randall
5 min read · Mar 4, 2024

Let’s talk criteria for this challenge. Our goal is to get a Red Hat OpenShift cluster up and running in a reliable manner, and to have that cluster be resilient. That means no single points of failure. We’ll need network redundancy through the full stack. We’ll also need a resilient storage layer — preferably Container Native Storage, so we don’t have a dependency on an additional storage appliance that needs to be managed separately.

The solution I’m going to be looking at is IBM Fusion HCI — a hyper-converged appliance positioned as “OpenShift in a box”. It meets a number of my criteria:

- As an appliance, it comes with all of the compute, network, and storage needed for a bare metal OpenShift deployment.

- The networking is configured out of the box with full redundancy. There are four switches (two for appliance management, two high-speed switches for OpenShift), and all of the nodes are cabled to all four switches.

- The nodes in the appliance can be configured with an adjustable number of NVMe drives, and Data Foundation (built on Ceph) provides highly available storage via three-way replication.

The hardware that makes up the Fusion HCI appliance.

Fusion HCI isn’t just a hardware deployment, though. It orchestrates the installation of OpenShift when you first bring it online, resulting in an operational OpenShift cluster that’s ready for use. Fusion also has built-in software for backup/recovery and disaster recovery, which further addresses my resiliency requirement.

Overall, this is a promising pick, as I don’t need to choose compute, network, and storage vendors, then architect the whole thing for resiliency, figure out how to deploy OpenShift on it, test it out, harden it, find a vendor for data protection… plus maintain it.

So let’s see what it’s like to set Fusion HCI up. I set a target of “less than a week” because this is a full 42U rack appliance, and we’re all going to deal with different logistics for getting it into the data center, set up, connected to the network, and powered on. My rack was completely set up for me by IBM. Below are my key notes for the rest of the installation.

Getting Fusion HCI on the network

Preparing the network for Fusion HCI consists of two key steps:

1. Figuring out where I’m going to put Fusion on my network.

2. Configuring DHCP and DNS.

The key consideration for connecting Fusion HCI to the network is that it is a switch-to-switch connection, not a node-to-switch connection. The Fusion HCI appliance features high-speed switches that are used for the OpenShift and storage networks. All of the nodes in the appliance are redundantly cabled to these switches. Those high-speed switches are then connected to switches in your data center. As such, you can think of the appliance switches as leaf switches that are connected directly to core switches.

The Fusion HCI documentation covers how DHCP and DNS are configured for the nodes in the appliance. A list of MAC addresses is provided for the nodes in the appliance, and these are used to configure DHCP. DNS is then configured using naming conventions prescribed by the appliance.
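To make those requirements concrete, here’s a rough sketch of the two pieces, assuming an ISC dhcpd server and a BIND-style zone file. Every MAC address, hostname, and IP address below is illustrative; the real values and naming conventions come from the appliance and its documentation.

```
# dhcpd.conf -- illustrative reservations; the real MAC addresses come
# from the list provided with the appliance.
host fusion-node-1 {
  hardware ethernet 08:00:27:aa:bb:01;
  fixed-address 10.0.10.11;
}
host fusion-node-2 {
  hardware ethernet 08:00:27:aa:bb:02;
  fixed-address 10.0.10.12;
}
```

```
; Zone file entries -- hostnames are illustrative. OpenShift clusters
; also typically need api and wildcard *.apps records.
api.ocp.example.com.       IN A 10.0.10.5
*.apps.ocp.example.com.    IN A 10.0.10.6
fusion-node-1.example.com. IN A 10.0.10.11
fusion-node-2.example.com. IN A 10.0.10.12
```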

I didn’t do any of this part myself, as there’s a team that manages the network, but it was important that they understood the requirements.

Mirroring images

The Fusion HCI appliance pulls images for OpenShift and Fusion software during its install. Those images could come directly from Red Hat and IBM’s registries, or I can mirror the images to a private registry. I have to think that most people are going to use the latter approach, as it enables image scanning. I have an Artifactory instance, and I’m going to use that to hold the images.

I have two options for how I do this:

1. Stick all of the images into a single repository.

2. Use separate repositories for the OpenShift images and the Fusion images.

I like the second approach, because it allows me to separate the management of my OpenShift images from any other software.

The instructions for mirroring images can be found in the Fusion HCI documentation.
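I won’t reproduce those steps here, but to give a sense of the general shape: OpenShift releases are commonly mirrored with oc adm release mirror, and individual images can be copied with skopeo. Everything below is a placeholder sketch rather than the documented procedure: the registry host, repository paths, image names, and version tag are all mine.

```
# Mirror an OpenShift release into the first repository (version and
# paths are placeholders -- use the values from the documentation).
oc adm release mirror \
  --from=quay.io/openshift-release-dev/ocp-release:4.14.1-x86_64 \
  --to=registry.example.com/ocp4/openshift4 \
  --to-release-image=registry.example.com/ocp4/openshift4:4.14.1-x86_64

# Copy an individual image into the second repository with skopeo
# (the source image path is hypothetical).
skopeo copy --all \
  docker://cp.icr.io/cp/fusion/example-image:1.0 \
  docker://registry.example.com/fusion/example-image:1.0
```

Using two different destination repositories is what implements the second option above.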

Running the installer

Fusion HCI features a graphical installer that I access from my web browser; the IBM engineer who set up the appliance gave me the URL. The first thing I see is a health check of the network, which is green, so I move on to the next step. I’m prompted for an image registry to use for installing the OpenShift and Fusion software. Since I mirrored my images to my Artifactory instance, I choose the Private option and enter the connection details for the registry that holds my OpenShift images and for the registry that holds the rest.

Next I’m shown the default network settings that will be used for the OpenShift network and the storage network. Since both networks are internal to the appliance, and I’m not doing anything to expose them externally, I’m just going to accept the defaults. If I were configuring metro disaster recovery, which involves stretching a storage cluster between two HCI appliances, I’d need to make sure my storage network entries didn’t conflict between the two appliances, but I’m not doing that now.

Finally, I’m asked for an optional custom certificate. The installer will configure OpenShift with the certificate if I upload one; if not, a self-signed certificate is used. That means I’d get one of those annoying certificate warnings when I try to log into the OpenShift GUI, although I could always configure a custom certificate later. I’m going to configure the certificate now, just because it’s a best practice and it’s a simple file upload.
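For reference, if I’d skipped the upload, OpenShift’s usual procedure for replacing the default ingress certificate after the fact looks roughly like this (the secret name and file names here are mine):

```
# Store the custom certificate chain and key as a TLS secret.
oc create secret tls custom-ingress-cert \
  --cert=fullchain.crt --key=tls.key \
  -n openshift-ingress

# Point the default ingress controller at the new secret.
oc patch ingresscontroller.operator default \
  -n openshift-ingress-operator --type=merge \
  -p '{"spec":{"defaultCertificate":{"name":"custom-ingress-cert"}}}'
```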

OK, now I’m along for the ride. The installer is chugging away, showing progress messages while it works through what looks like a long list of steps. I can definitely see it initializing the OpenShift cluster. If you’re curious about how the installer actually works, you can read an interesting blog post here.

It took over an hour, but the installer completed and OpenShift is up and running. I’m prompted to download the credentials for the cluster and the CoreOS key. Trust me, this is important — if I lose these, I’m basically locked out of my cluster. I’m going to store them somewhere I won’t forget, at least until I’ve had a chance to log into the cluster and customize authentication.

The prompt to download credentials for the cluster. Make sure to do this step.

I’m given the choice to log into the Fusion GUI or the OpenShift GUI. I’m going to go right into OpenShift so I can check out the cluster.
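Assuming the downloaded credentials follow the usual OpenShift layout (a kubeconfig file plus a kubeadmin password), getting into the cluster from the CLI looks something like this; the API URL is a placeholder for whatever your DNS conventions produce:

```
# Option 1: use the downloaded kubeconfig directly.
export KUBECONFIG=./auth/kubeconfig
oc whoami

# Option 2: log in with the kubeadmin credentials.
oc login https://api.ocp.example.com:6443 \
  -u kubeadmin -p '<password-from-download>'
```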

The result

So I now have a three-node OpenShift cluster — a small cluster, to be sure, but sufficient for me to start customizing and hardening it. My rack contains three other nodes that aren’t yet part of the OpenShift cluster. I can use them to scale out the cluster when needed, which is cool, because I don’t need to license everything right out of the gate. When I do scale out, I’ll try to capture my notes and post them.
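Before I start changing things, a couple of stock oc commands make for a quick sanity check; on a compact three-node cluster, each node should carry both the control plane and worker roles:

```
# Each node in a compact cluster has both master/control-plane and
# worker roles.
oc get nodes -o wide

# Confirm all cluster operators have settled before customizing.
oc get clusteroperators
```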

If you want to learn more about Fusion HCI, check out this link.


Ben Randall

I'm a software development architect, and I've focused my career on enterprise storage and container workloads.