Transforming WebSphere to Open Liberty on OpenShift: An Operations point of view — Part 5: Operational Comparisons

Karri Carlson-Neumann
AI+ Enterprise Engineering
16 min read · Jul 7, 2021

This is Part 5. You can also review Part 1, Part 2, Part 3, and Part 4.

Introduction

This series is devoted to giving traditional WebSphere Application Server Network Deployment (WAS ND) architects and runtime operations teams a first look at Open Liberty on OpenShift Container Platform (OCP).

In this fifth installment of our series, we will focus on comparing a sampling of tasks performed in a WAS ND environment to the roughly equivalent tasks for Open Liberty applications in an OCP environment.

The background for the example tasks discussed in this document came from a review of RACIs (spreadsheets describing who is Responsible, Accountable, Consulted, or Informed for various tasks) from various sources. Setting some intricate custom implementation details aside, the majority of tasks are generally very similar, no matter which customer has allowed us to behold their RACI spreadsheets.

A side comment: this is a good time to review your organization’s RACI spreadsheet that spans your WAS developers, WAS admins, the other middleware admins with whom your WAS admins interact, day 2 operations teams (including monitoring, logging, and alerts), technical architects, and business owners.

Be it the first environment, or the fifty-third environment, every customer tends to have a variety of procedures that must come together in an orderly, approved, and budgeted sequence in order to create environments and deployment targets into which workload will be deployed.

In this post, we will examine a small number of very common tasks:

  • Creating the host environment
  • Creating a deployment target
  • Deploying workload
  • Scaling workload
  • Restarting workload

For each of these tasks, this post takes a very high level view of how they are generally approached in WAS ND environments and how they are generally approached in Open Liberty on OCP environments. As in previous installments of this series, the storytelling in these examples has been greatly simplified in order to focus on the major concepts.

Example Task: Creating the host environment

Essentially, this task can be interpreted as “installing the products” and “basic preparations of capabilities.”

First, let’s talk about “installing the products.”

There are volumes of pages-long instructions for installing each of these products.

To make this post easier to read, we’re not going to sift through all of the options and details.

What this means to WAS ND: A fairly common assumption is that the WAS ND environment will be installed on VMware. As with any major installation, you will need to coordinate with the networking and infrastructure teams to ensure that what you’re about to create will be allowed on the network, using approved images, and that you’re not breaking corporate rules.

You will obtain all of the WAS ND install packages and fixes, and then install the WAS ND product via Installation Manager. You will then create the Deployment Manager profile and some nodes via the Profile Management Tool or the manageprofiles command, and federate the nodes to the Deployment Manager. You will be able to administer the WAS ND environment via the WAS ND console and via the WebSphere scripting tool, wsadmin.
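
As a rough illustration only, the command-line flavor of those profile steps looks something like the sketch below. The install root, profile names, and Deployment Manager host and SOAP port are hypothetical examples; your Installation Manager repositories, security settings, and naming standards will differ.

    # Create the Deployment Manager profile (install root /opt/IBM/WebSphere/AppServer is an example)
    /opt/IBM/WebSphere/AppServer/bin/manageprofiles.sh -create \
      -profileName Dmgr01 \
      -templatePath /opt/IBM/WebSphere/AppServer/profileTemplates/management \
      -serverType DEPLOYMENT_MANAGER

    # Create a custom (node) profile on each node host
    /opt/IBM/WebSphere/AppServer/bin/manageprofiles.sh -create \
      -profileName Node01 \
      -templatePath /opt/IBM/WebSphere/AppServer/profileTemplates/managed

    # Federate the node to the Deployment Manager (host name and SOAP port are examples)
    /opt/IBM/WebSphere/AppServer/profiles/Node01/bin/addNode.sh dmgr.example.com 8879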

For the purpose of this story, at this point you have an empty WAS ND cell with no clusters.

What this means to OCP: This description can immediately mean different things, depending on what type of infrastructure or cloud provider is used.

Ordering up a new OCP cluster from a public cloud provider, such as Red Hat OpenShift Kubernetes Service (ROKS) on IBM Cloud, can be very straightforward, and it relieves you from a lot of preparation and maintenance activities. This post will skip that easy case for this task.

Creating a new OCP cluster from scratch in your own infrastructure is more detailed. The actions depend on whether your requirements and infrastructure allow you to use Installer-Provisioned Infrastructure (IPI) or require User-Provisioned Infrastructure (UPI). IPI is very opinionated, and it configures all of its necessary infrastructure underpinnings. If your corporate requirements do not align with the OCP installer’s opinions, or if you are installing on-premises and your infrastructure cannot meet all of the IPI requirements, then you will explicitly prepare infrastructure for UPI.

In either the IPI or UPI case, you will still need to coordinate with the networking and infrastructure teams to ensure that you are getting what is needed for the installation, and that you’re not breaking any corporate policies. After downloading the install files, editing the installation configuration file, and setting up your pull secret so that you can pull container images from registries, you run the openshift-install command to install the cluster. When that is done, download the oc and kubectl CLIs for managing the cluster via scripting. You can also use the OCP console. Finally, create some storage for the image registry operator.
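
As a minimal sketch of that flow (the working directory and cluster API endpoint below are hypothetical; the generated install-config.yaml is where you supply the pull secret, base domain, and platform details):

    # Generate a starter install-config.yaml, then edit it to match your environment
    openshift-install create install-config --dir=./mycluster

    # Run the installer (IPI flow); for UPI you generate manifests and ignition configs instead
    openshift-install create cluster --dir=./mycluster --log-level=info

    # Log in and verify the cluster with the oc CLI
    oc login --server=https://api.mycluster.example.com:6443 -u kubeadmin
    oc get nodes
    oc get clusteroperators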

For the purpose of this story, at this point you have an “empty” OCP cluster with no application-specific namespaces/projects. There are many namespaces for OCP- and Kubernetes-centric capabilities, but those are not shown in the figure below.

The example on the left illustrates an “empty” WAS ND cell. The example on the right illustrates an “empty” OCP cluster.

The simplified description of “installing the products” does not intend to gloss over the amount of planning and organization that truly occurred to get to this point; in either the WAS ND or OCP case it is considerable. For example, coordination with the networking and infrastructure teams is an obvious requirement. The degree of detail each may entail can vary significantly, depending on requirements, procedures, and the kind of infrastructure or cloud providers being used.

If you are fired up with a litany of additional considerations that you would include in the definition of a task for “creating the host environment”, then you are exactly the audience for this paper. Some of those “basic preparations of capabilities” include:

  • Setting up user/group authentication
  • Ensuring that the WAS ND cell or the OCP cluster is configured for your logging and monitoring requirements
  • Additional storage considerations

Once the encapsulating host environment exists, we can now turn our attention to the more granular (and more often repeated) tasks at hand.

Example Task: Creating the deployment target

This task picks up where the previous task (“Creating the host environment”) completed. Meaning, the WAS ND product is installed, and the nodes have been created and federated. This also assumes that the OCP product is installed and has several ready worker nodes.

What this means to WAS ND: Assuming capacity is available on the nodes, via the WAS ND Administration Console or wsadmin scripting, create the cluster onto which the workload apps will later be deployed. Double check JVM settings to ensure compliance with corporate standards.
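
If you prefer scripting to the console, a hedged wsadmin (Jython) sketch of creating the cluster and its first member might look like the following; the cluster, node, and member names are made up for illustration.

    # cluster_setup.py (wsadmin Jython script; names are examples)
    AdminTask.createCluster('[-clusterConfig [-clusterName Cluster1]]')
    AdminTask.createClusterMember('[-clusterName Cluster1 -memberConfig [-memberNode Node01 -memberName Cluster1_member1]]')
    AdminConfig.save()

    # Run it against the Deployment Manager
    /opt/IBM/WebSphere/AppServer/profiles/Dmgr01/bin/wsadmin.sh -lang jython -f cluster_setup.py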

At this point:

  • you do know how much memory each cluster member JVM has been configured to request, at least per the initial definitions. Memory use will grow over time, and it is just one of the things that should be addressed via performance testing.
  • you do know the host and port values associated with each cluster member’s JVM. Create or locate a webserver, such as IBM HTTP Server (aka IHS), which will be needed later for routing HTTPS traffic to the workload apps in each of the cluster member JVMs.

What this means to OCP: Via the OCP console or via scripted yaml, create a new namespace/project. Secure the namespace and set quotas for the resources that will soon be created in it.
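
A minimal sketch of that, using the oc CLI; the project name, group name, and quota values below are made-up examples.

    # Create the project/namespace
    oc new-project namespace1 --description="Workload namespace for team 1"

    # Grant a team's group edit access within (only) this namespace
    oc adm policy add-role-to-group edit app-team-1 -n namespace1

    # Cap how much the namespace as a whole can consume
    oc create quota namespace1-quota -n namespace1 \
      --hard=requests.cpu=4,requests.memory=8Gi,limits.cpu=8,limits.memory=16Gi,pods=20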

At this point:

  • you have not explicitly utilized any space (memory or disk use) from the available resources per worker node, because no pods have been scheduled and deployed.
  • you do not know service endpoint information, because none of the workload’s services have been deployed yet. An IHS server is no longer necessary to route incoming HTTPS traffic to dedicated endpoints as you did in WAS ND, because Kubernetes handles routing differently. Workload might come and go, and be scheduled to different nodes each time; the kube-proxy and/or ingress will deal with that after services are created.

On the left is a WAS ND cell that has 1 cluster. The example cluster (“Cluster 1”) has 1 cluster member JVM on each of the 3 nodes. No workload has been deployed to this cluster. On the right is an OCP cluster, which has a new namespace (“namespace1”). No workload has been deployed in this namespace yet. However, various policies have likely been created in this namespace to define who and what is allowed to happen in this namespace, such as user/group permissions and resource quotas.

Generically speaking, the deployment targets themselves are not particularly interesting. They become more interesting when a workload to be deployed there has been identified.

Example task: Deploying workload

In the end, a “workload” is the requests, responses, vCPU, executing code, and resources utilized to make bits of data become something useful. In order to be able to do useful things, a “workload” needs quite a few things, including:

  • the packaging of the application code
  • the resources which the application code needs to use at runtime
  • the method by which the application arrives at the deployment target

What this means to WAS ND: To deploy a workload to a WAS ND cluster, the following details are needed:

  • The application is packaged as EAR and WAR files.
  • A definition of the WebSphere resources used by the application at runtime. These resources generally include JDBC datasources and JMS connection factories, amongst others.
    The resources should be defined at the scope to which they are pertinent. This usually means that the scope is the specific cluster to which the application will be deployed. Some resources (such as certain resource adapters) must be defined at the node scope (meaning, repeat for every node hosting a member of the cluster). In any case, the thing to note is this: a lot of WebSphere resources are defined at scopes outside of the application.
  • There should be a pipeline that leads up to the deployment of the EAR to the cluster.
    This is usually an automated process with well-defined configuration (often using UrbanCode Deploy or Chef). However, this is sometimes implemented as a WAS admin responding to a ticket, manually retrieving an EAR from a repository, manually reviewing notes in a playbook about which resources need to exist and what parameters to use in the target environment, and then using the CLI or WAS ND admin console to create resources and deploy the EAR.

An EAR file is essentially a zip file with a lot of metadata, compiled code, and some exposed bindings so that when the EAR or WAR is deployed, the admin can tell the application how to look up and find the specific runtime resources it needs from the available scopes of JNDI namespace.

Prior to application deployment:

  • the application architects should have been communicating with the WebSphere admins to describe the Service Level Agreements (SLAs) and runtime requirements the application needs
  • the WebSphere admins should have enough information from the application developers (“hey, heads up, the app is expecting to use a database and a queue”) and from the MQ/DB2 admins (“if your app is trying to use my database, then use this host name and port number to find it”) to create scripts for the creation and configuration of those WebSphere runtime resources, such as JDBC datasources or JMS connection factories.

When the WebSphere admin deploys an application, either via scripting or the WAS ND admin console, they will ensure that the application is mapped to all the resources it requests. They will verify that the app deployment process completes without error, that the information has synchronized from the Deployment Manager to each of the nodes, and will review each cluster member’s logs to ensure that the application has started. A plugin-cfg.xml is generated and propagated to the webservers.
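
For the scripted path, a hedged wsadmin (Jython) sketch of installing the EAR to the cluster might look like this; the application name, EAR path, and cluster name are examples, and real deployments usually pass many more options (resource reference mappings, module-to-server mappings, and so on).

    # deploy_app.py (wsadmin Jython; names and paths are examples)
    AdminApp.install('/tmp/App1.ear', '[-appname App1 -cluster Cluster1 -usedefaultbindings]')
    AdminConfig.save()
    # Push the configuration change out to the nodes
    AdminNodeManagement.syncActiveNodes()

    # Run against the Deployment Manager
    /opt/IBM/WebSphere/AppServer/profiles/Dmgr01/bin/wsadmin.sh -lang jython -f deploy_app.py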

As a final sanity check, a small test load can be run to verify that the application is able to connect to and use its resources.

What this means to Open Liberty on OCP: To deploy an Open Liberty workload to an OCP namespace, the following details are needed:

  • the packaging of the application is a containerized image
  • the resources which the application code needs to use at runtime are a little different.
    Meaning, the application server resources for Open Liberty are defined in the server.xml file, which is already baked into the image as part of the build.
    At the same time, the application might need to use some resources defined in OCP, such as secrets or mappings to endpoints of other services. These OCP resources are defined in yaml files associated with the image.
  • the CI/CD pipeline has access to deploy the image and associated yaml to the namespace.
    This is an automated process (often using Jenkins) that includes a lot of code testing, image verification, and creation and validation of yaml.

Here, the workload is a containerized image. It arrives at the OCP cluster’s namespace fully baked and ephemeral. This means that the OCP admin does not create new definitions in the Open Liberty server.xml file at runtime. However, those resources might need to point to something. For example, a datasource defined in the server.xml might need to point to an existing DB2 instance, and of course this means a different DB2 instance for each environment (dev, test, production), which can be accessed via secrets and/or environment variables.
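
A hedged sketch of how that per-environment wiring is often done follows; the DB2 host, credentials, secret name, and deployment name are made-up examples, and the server.xml baked into the image would reference them as environment variables.

    # Store per-environment connection details in a secret (values are examples)
    oc create secret generic db2-connection -n namespace1 \
      --from-literal=DB2_HOST=db2.test.example.com \
      --from-literal=DB2_PORT=50000 \
      --from-literal=DB2_USER=appuser \
      --from-literal=DB2_PASSWORD='changeit'

    # Expose the secret's keys to the Open Liberty container as environment variables
    oc set env deployment/app1 -n namespace1 --from=secret/db2-connection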

Prior to application deployment:

  • the application architects should have been communicating with the OCP admins to describe the Service Level Agreements (SLAs) and runtime requirements of the application.
  • the OCP admins should know the endpoint locations of the target services, such as DB2 or MQ.
    In the case of legacy systems, the OCP admins will need to coordinate with the DBAs and/or the MQ admins to ensure that the correct connection information is defined. Alternatively, in the case where the target services are already cloud services (and depending on which cloud), the OCP admins might already know the service endpoint information.
  • the OCP admins should have been involved in the creation of the yaml for defining OCP resources and service endpoint information.

To deploy the application, the OCP admin could retrieve the image from its repository and deploy it manually. The more likely case is that the image is deployed as an automated step in the pipeline. Regardless, the status of the deployment process can be observed, and ultimately the OCP admins will watch for the pods to get to a Running state.
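
Whether a pipeline does it or an admin does it by hand, the deploy-and-watch step amounts to something like this sketch (the yaml file, deployment name, and namespace are examples):

    # Apply the Deployment/Service/Route yaml that references the image
    oc apply -f app1-deployment.yaml -n namespace1

    # Watch the rollout and wait for the pods to reach Running
    oc rollout status deployment/app1 -n namespace1
    oc get pods -n namespace1 -w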

As per the context of this document, a few things to remember are:

  • At image deploy time, the OCP admins are not creating any WebSphere things like JVMs or datasources or JMS connection factories in the OCP cluster.
  • It is no longer necessary to generate plugins for webservers.

On the left is a WAS ND cell where “App 1” has been deployed to “Cluster 1”. “Cluster 1” has 3 cluster members, so therefore there are 3 instances of the application. “App 1” uses a datasource “Ds1”, which has been created at cluster scope. “Ds1” points to an instance of DB2. On the right is an OCP cluster. The deployed application has resulted in a single pod, “Pod A”. “Pod A” has not yet been scaled up to have additional instances. “Pod A” contains the Open Liberty container, which includes the server.xml and the datasource, “Ds2”, defined there.

There are differences in how and when configuration is handled, and in who is responsible for resolving service requirements to actual endpoints. In the end, code and configuration still get deployed, but you can see how different this becomes in the world of Open Liberty and OpenShift.

Example task: scaling workloads

For this example, “scaling workloads” means that there is not enough capacity in the current deployment to handle all requests, and therefore additional instances are desired.

As mentioned earlier in this series, there are a number of ways to scale the WebSphere/Open Liberty workload:

  • Make the existing JVM bigger
  • Create additional instances
  • Create additional nodes in order to create capacity for hosting more instances

What this means to WAS ND: One way to increase capacity is for the WAS admin to increase the maximum JVM heap size for every member of the cluster. Depending on the root problem and assuming that there is available capacity on each of the host VMs, this may be a sufficient action. However, this implies a server restart for all impacted JVMs.

Another approach is to create additional instances. In the case of WAS ND applications, this means creating additional members of the cluster in which the application is deployed.

  • If the current WebSphere nodes have available capacity, it is possible to create new cluster members on the existing nodes, and generate and propagate the webserver plugin to the IHS.
  • If the current WebSphere nodes have no additional capacity, or if the customer has made it a standard procedure, then new VMs can be obtained (which may be non-trivial), WAS ND installed, a node profile created and federated to the Dmgr, and finally a cluster member added to the new node (a sketch follows this list). Generate and propagate the webserver plugin.
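
A hedged sketch of that last step, adding a member on the newly federated node via wsadmin (Jython); the node and member names are examples.

    # add_member.py (wsadmin Jython; names are examples)
    AdminTask.createClusterMember('[-clusterName Cluster1 -memberConfig [-memberNode Node04 -memberName Cluster1_member4]]')
    AdminConfig.save()

    # Run against the Deployment Manager, then regenerate and propagate the webserver plugin
    /opt/IBM/WebSphere/AppServer/profiles/Dmgr01/bin/wsadmin.sh -lang jython -f add_member.py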

Historically, any of these actions were manual and implied a change ticket because most customers have strict record keeping about environment settings.

What this means to OCP: In this case, we are talking about Open Liberty applications, where the JVM options are baked into the image (for Liberty, these typically live in a jvm.options file alongside the server.xml). Fortunately, the heap settings in the Liberty configuration and the settings used in the deployment config (a.k.a. the container limits) do not necessarily have to battle at runtime. The JVM will determine the heap size based on the container settings.

Whenever you deploy in Kubernetes, you should set container memory requests and a memory limit for your containers. The documentation says these are “optional”, but really, these should be set and monitored: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
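
These usually land in the Deployment yaml that the pipeline applies, but as a sketch of setting them after the fact (the deployment name and values are examples):

    # Set memory/CPU request and limit on an existing deployment's containers
    oc set resources deployment/app1 -n namespace1 \
      --requests=memory=256Mi,cpu=250m --limits=memory=512Mi,cpu=500m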

When we’re talking about scaling in OCP, what is really meant most of the time is the number of instances of a pod. This setting can be increased manually via the OCP dashboard or CLI, or can be handled automatically by setting up a Horizontal Pod Autoscaler (HPA). As long as there is available capacity on the worker nodes, and there isn’t some nodeSelector constraint keeping the pod off certain nodes, then additional instances of a pod can be automatically created and scheduled as needed.
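
Both flavors, as a sketch; the deployment name and thresholds are examples.

    # Manually scale to 3 instances of the pod
    oc scale deployment/app1 -n namespace1 --replicas=3

    # Or let an HPA manage the replica count based on CPU utilization
    oc autoscale deployment/app1 -n namespace1 --min=2 --max=6 --cpu-percent=70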

On the left is a WAS ND cell. It originally had 3 WebSphere nodes, with 1 cluster member on each node. If that cell was at full capacity (meaning none of the nodes had enough capacity to support an additional cluster member) or if the customer’s standard operational procedure dictated that each cluster should have no more than 1 member per node, then one way to increase capacity is to create an additional VM, create a new WebSphere node, federate that node to the Dmgr, and then add a cluster member on the new node. On the right is an OCP cluster with 3 nodes. The deployed application had only 1 pod originally. Because there is available capacity on the other worker nodes, the OCP admin could simply increase the number of instances for the pod, and the scheduler will find a worker node with sufficient capacity onto which an additional pod will be scheduled and started.

Earlier it was mentioned that the WAS ND cell might exhaust capacity on all nodes. Similarly, all available capacity could also be exhausted in an OCP cluster. This could be addressed in a variety of ways:

  • In OCP clusters that use the Machine API, the cluster autoscaler can be used to automatically adjust the size of the cluster to meet current deployment needs.
  • For OCP clusters that do not have the cluster autoscaler, we first take a look back at how the cluster was created in the first place. Depending on requirements, it could be as simple as ordering up additional nodes from the service provider. It could also be as detailed as working through additional network requirements, acquiring additional host systems, and setting up the additional target nodes in a very controlled on-premises environment. (A sketch of manually scaling worker capacity follows this list.)
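
On a cluster that does use the Machine API but where the cluster autoscaler has not been configured, worker capacity can also be added by hand; a hedged sketch, with a made-up MachineSet name:

    # See current MachineSets and how many replicas each has
    oc get machinesets -n openshift-machine-api

    # Add a worker by bumping the replica count on one MachineSet
    oc scale machineset mycluster-worker-us-east-1a -n openshift-machine-api --replicas=3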

In summary for this example, scaling workloads in WAS ND meant adding cluster members, which often resulted in tickets and manual work and ultimately additional nodes. Scaling workloads in OCP usually means creating additional instances of pods, and that could be simplified by creating an HPA to automatically handle it.

Example task: restarting workloads

For the purpose of this task, “restarting workload” means that the application has become unresponsive for some known or unknown reason. Although we all love a good deep dive into a possibly scary root cause analysis, the business is screaming for the application to be available. In this case, we will assume that restarting the application is sufficient for the immediate availability needs.

What this means to WAS ND: “Restarting workload” could mean a lot of things in WAS ND. In the simplest case, the WebSphere admin logs into the WAS ND admin console and stops the application. This should stop the application in all cluster members. The admin then checks the logs of all the cluster members to verify that the app has stopped everywhere. If it has, then it can be restarted.
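
The same stop/start can be scripted; a hedged wsadmin (Jython) sketch using the script libraries, with example application and cluster names:

    # restart_app.py (wsadmin Jython; names are examples)
    AdminApplication.stopApplicationOnCluster('App1', 'Cluster1')
    AdminApplication.startApplicationOnCluster('App1', 'Cluster1')

    # Run against the Deployment Manager
    /opt/IBM/WebSphere/AppServer/profiles/Dmgr01/bin/wsadmin.sh -lang jython -f restart_app.py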

If stopping the app was not sufficient, and the app remains “hung” in one or more cluster members, then the WebSphere admin starts killing cluster member JVMs. This impacts all applications that are deployed to the same JVMs. After the JVMs have been stopped, the admin will restart those cluster members. In a WebSphere cell that has a smaller total number of clusters and cluster member JVMs, with fewer and smaller applications deployed to each cluster, this may take many minutes. In a WebSphere cell that has a high total number of clusters and cluster member JVMs, and/or with many heavy applications in the cluster, this may take much longer to restart.

In a scenario where an entire node needs to be killed (or the host VM restarted), the node agent must be restarted first, and then the cluster member JVMs on that node can be restarted. However, in cases where there are many JVMs, great care must usually be taken to avoid trying to restart too many at once. Depending on the situation, this may take a very long time.

In summary, depending on the extent of what needs to be restarted, how many JVMs exist, how heavy are the applications in each of the JVMs, and special procedures, a restart could take anywhere from a few minutes to over an hour.

What this means to OCP: Quite consistently for Open Liberty applications on OCP, restarting the workload means that the OCP admin kills the pod. The OCP scheduler will immediately try to restart the pod, possibly on a different node. This normally takes a few seconds but can take up to a minute or two.
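
As a sketch (the pod, deployment, and namespace names are examples):

    # Kill one misbehaving pod; a replacement is created and scheduled automatically
    oc delete pod app1-6d9f7c9b8d-abcde -n namespace1

    # Or restart every pod in the deployment via a rolling replacement
    oc rollout restart deployment/app1 -n namespace1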

Having to restart an entire node is a somewhat rare occurrence. If the Open Liberty pod is running on a node where some of the other pods on that node are not running well, the OCP admin may restart that worker node. There are some node detection and timeout settings that can impact pod evictions, but overall the node and pods (possibly on different nodes) should be back up fairly quickly (seconds to a few minutes).

Summary of Part 5:

This article has compared a number of tasks performed in a WAS ND environment to the roughly equivalent tasks for the care and feeding of Open Liberty applications in an OCP environment.

Here is a table of overly condensed key terms from the comparisons made earlier in this post:

Series Conclusion

This brings our series of articles to an end, or at least a temporary end. Throughout the series we have taken a look at the similarities and differences in operations that are experienced as the transition from traditional WebSphere ND to Open Liberty on OpenShift occurs.

The second article provided a level-set on WAS ND, while the third article laid down some basics related to OpenShift (OCP) and Open Liberty and how they work together. From these foundations, Part 4 focused on topologies and considerations for planning OCP topologies that are able to deliver the service levels expected by WAS ND based applications, and sometimes do it more efficiently and with less human intervention. This final article, Part 5, then took a task oriented view, again comparing and contrasting how similar requests are fulfilled in the different environments.

If you’ve made it this far, we expect that you are now thinking about how you can leverage OCP to do what you were doing in WebSphere. We hope you are also thinking about capitalizing on the opportunities that a platform like OCP provides to solve problems that were hard or nearly impossible to solve in WebSphere. We expect that once you get the hang of life on OpenShift, you’ll agree with us that Kubernetes is a very capable platform when it comes to hosting those mission-critical applications which today are evolving from traditional WebSphere toward Open Liberty.

The authors would like to thank everyone who bothered to read all 5 parts of this series, and of course the trusted people who reviewed these articles, including: Eric Herness, Greg Hintermeister, John Alcorn, and Ryan Claussen.
