Winter is Coming — Expect “SNO(w)” (Single Node OpenShift)

Published in Cloud Pak for Data · 10 min read · May 10, 2023

Sachin Prasad — Chief Product Manager, Cloud Pak for Data, IBM
Claus Huempel — Technical Sales Specialist, Cloud Pak for Data, IBM

As I tucked my son into bed and pondered the title for my next article, I casually mentioned “winter is coming,” only to be met with confusion. “Dad, it’s not even summer,” my son replied. I smiled and explained that it was a metaphor for the “economic slowdown lurking in the shadows.” However, his young mind was too innocent to fully grasp the concept, and he quickly drifted off to sleep after a long day of playing basketball.

But it’s actually a fitting title -

With the economic slowdown, industries and businesses are winding down their capex/opex in preparation for the impending winter that is all but certain to come. This era has seen massive investment in AI and cloud-native applications driven by Kubernetes. Kubernetes is great, but it comes with its own set of complexities and costs, primarily because of the baggage of resource overhead that makes it a non-starter for cost-conscious businesses. But what if you could simplify that process and keep the greatness without the resource hog? What if it were easy to start and maintain, and didn’t take a village to deploy? Didn’t I say expect “SNO”?

Single Node OpenShift (SNO) is a highly efficient and cost-effective solution that enables users to run Cloud Pak for Data (CP4D) workloads on a single machine, making it ideal for smaller, short-term data and AI projects that can tolerate downtime. This solution can help organizations streamline their CP4D workflows while minimizing costs, which is crucial for businesses of all sizes in today’s competitive market. In this article, we will delve deeper into SNO and its capabilities and explore how it can help organizations optimize their Cloud Pak for Data workflows while keeping expenses under control.

Motivation

At the beginning of this year, Product Management set a clear goal: make the IBM Data & AI platform more accessible and available at a compelling price point. Various measures were taken from both technical and commercial angles -

  • IBM announced Cloud Pak for Data Express, the first step in this direction. Express is a bite-sized offering that opens up a whole new world of options for SMBs to get started on their AI journey.
  • Support for AWS FSx was announced, providing robust, enterprise-grade managed storage that beats EFS/EBS on cost and performance.
  • The CP4D Express footprint has been coming down through various innovations, modernization, and streamlining of software components, resulting in a much leaner posture.

The motivation to bring down the TCO continues, and support for Single Node OpenShift is another sincere attempt from IBM to help our customers achieve their goals without breaking the bank.

Official Support for Cloud Pak for Data Certification

Before we start: SNO support is part of the CP4D Express roadmap, and teams are working diligently to deliver the certification in the coming months. Due to some limitations of SNO (discussed below), support will initially be available only for lower, non-production environments, and customers are advised not to deploy SNO-based CP4D for production workloads.

We will share a more formal communication once the SNO tests are officially complete. I will include the announcement links in the reference section as they become available.

In the meantime, we had some fun and conducted a brief test of the SNO + CP4D combination, successfully deploying it with external NFS storage. This article aims to provide a glimpse of SNO and how it reduces the overall cost of CP4D deployments for certain use cases.

Why and why not Single Node OpenShift

Advantages of Single Node OpenShift

  1. Cost-effective: SNO requires less hardware, resulting in significant cost savings for users. It requires only one virtual machine, while other deployment options, such as a compact cluster, require three virtual machines, and a control plane/worker cluster requires six virtual machines.
  2. Streamlined deployment: SNO has a simpler deployment process compared to other deployment options, making it ideal for users who want to get up and running quickly.
  3. Flexibility: SNO provides a lot of flexibility as users can add or remove components as needed. This is especially useful for users who have varying workloads and want to adjust their deployment accordingly.
  4. Maintainability and Day 2 Ops: Fewer servers to maintain, patch, and back up means CP4D administrators get time back for other server and workload optimization and automation.
  5. Extensibility: One can start with a single node and extend the cluster later by adding worker nodes.

Disadvantages of Single Node OpenShift

  1. Limited scalability: SNO is not as scalable as other deployment options, and users may experience performance issues if they try to run too many applications on a single node.
  2. Single point of failure: Since SNO runs on a single node, it is more susceptible to downtime in the event of a hardware failure.
  3. Limited capacity: SNO has a limited capacity for storage and memory, and users may need to upgrade their hardware to accommodate larger workloads.

Things to be aware of when using Single Node OpenShift

  1. Hardware requirements: It is important to check the hardware requirements for SNO before deploying it, as insufficient hardware can lead to performance issues.
  2. Security considerations: Since SNO runs on a single node, users need to ensure that their system is secure and properly configured to prevent any security breaches.
  3. Backup and recovery: SNO requires a backup and recovery plan to ensure that data is not lost in the event of a hardware failure or other issues.
  4. Workload requirements: It is essential to consider the workload requirements when choosing a deployment option. SNO may not be suitable for larger workloads that require more scalability and capacity.

Use Cases where SNO is a perfect fit

SNO has the drawback of being a single point of failure. While we will discuss options for making it resilient in our next article, let’s review the scenarios that work best for an SNO deployment -

  1. Cloud Pak for Data Express Proof of Concept (PoC) — If a few services need to be deployed for the next demo to secure investment from higher-ups, or a business partner plans to stand up an on-prem instance for a sale, SNO works out best.
  2. Cloud Pak for Data Express Dev & Test Environments: As these environments normally have fewer requirements in terms of availability and failover, running CP4D on SNO is an ideal fit. In the next article, we will discuss ways to harden the cluster so that it gets closer to (if not matches) standard OpenShift resiliency.
  3. Cloud Pak for Data Express Edge Computing: As the cloud continues to advance, businesses and institutions are adopting edge computing devices to execute their processing tasks, aiming to cut costs and get more out of the infrastructure they have invested in. Single Node OpenShift is a perfect fit here, particularly for cost-effective far-edge setups. For example, an ETL job converts data from sensors into a format that a Watson Studio job can analyze to produce valuable insights; instead of raw data, real insights are transmitted periodically, resulting in numerous benefits -
  • Reducing latency: Heavy compute runs directly on the edge devices, reducing the latency of delivering that information.
  • Reducing bandwidth: Processing part of the data on the edge devices reduces the bandwidth used and the traffic on the network.
  • Reducing costs: Lower latency and bandwidth translate into lower operational costs, one of the most important benefits of edge computing.
  • Improving security: Edge computing uses data aggregation and data encryption to improve the security of data access.

Deployment Model & Resiliency

The most straightforward deployment approach is to use two x86 virtual machines -

“Bastion node (4 vCPU x 32 GB)” — Housing, among other usual services, an NFS service backed by a fast SSD disk (a disk separate from the OS disk is recommended). This machine is similar to a typical bastion in a Cloud Pak for Data installation. We opted for a 4x32 machine, which is adequate for a regular, non-I/O-intensive CP4D installation; a sketch of the NFS setup follows below.
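
For illustration, here is a minimal sketch of that NFS setup, assuming a RHEL bastion, an export path of /data/nfs, a storage class named managed-nfs-storage, and the community nfs-subdir-external-provisioner for dynamic provisioning; all of these are our choices for this sketch, not requirements.

```bash
# On the bastion (RHEL): install and export an NFS share backed by the fast SSD disk
sudo dnf install -y nfs-utils
sudo mkdir -p /data/nfs                      # mount point on the SSD disk
echo '/data/nfs *(rw,sync,no_root_squash)' | sudo tee -a /etc/exports
sudo systemctl enable --now nfs-server
sudo exportfs -rav
sudo firewall-cmd --permanent --add-service=nfs && sudo firewall-cmd --reload

# On the SNO cluster: create a storage class that provisions volumes on that share
helm repo add nfs-subdir-external-provisioner \
  https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=<BASTION_IP> \
  --set nfs.path=/data/nfs \
  --set storageClass.name=managed-nfs-storage
```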

“CP4D Express Node (32 vCPU x 128 GB)” — The second machine is the SNO where OCP and CP4D are deployed together. Given the large number of containers, we must account for the required ephemeral storage. Typically, 500 GB suffices, but it can be increased up to 1 TB if necessary. There is a minor change [TODO LINK]: SNO needs to be configured to raise the default pod limit from 250 to 500. Additionally, IOPS play a crucial role in the overall performance of Cloud Pak for Data, regardless of whether it is an SNO or a regular 3-control-node cluster. Before beginning installation, it is important to run storage performance tests to ensure that the disks meet expectations. [LINK] Both steps are sketched below.
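
To make those two steps concrete, here is a hedged sketch: a KubeletConfig that raises the pod limit to 500 (on SNO the single node sits in the master machine config pool), followed by a quick fio benchmark of the storage path. The object name set-max-pods, the test directory, and the fio parameters are our assumptions for illustration; tune them to your environment.

```bash
# Raise the per-node pod limit from the default 250 to 500
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  kubeletConfig:
    maxPods: 500
EOF
# The node reboots when the change rolls out; verify the new capacity with:
oc get node -o jsonpath='{.items[0].status.capacity.pods}'

# Quick IOPS sanity check: 4k random writes against the CP4D storage path
fio --name=cp4d-randwrite --directory=/mnt/cp4d-storage-test --direct=1 \
    --rw=randwrite --bs=4k --size=1G --numjobs=4 --iodepth=16 \
    --runtime=60 --time_based --group_reporting
```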

Although the configuration mentioned works well, there are other deployment techniques that can make SNO more resilient. Specifically, optimizations would involve capturing OpenShift state such as etcd, and CP4D state, via offline backups or an online backup-restore procedure when a CSI-compliant storage solution such as PX or Fusion is employed. In the absence of enterprise-grade storage, one can resort to traditional backup-restore mechanisms such as disk replication, RAID configurations, or even a pool of storage over LVM. Since this is a single machine, it may even be possible to take a snapshot of the entire machine at the VM level (VMware even provides live snapshots). Resiliency optimization is an interesting topic, and we hope to cover it soon; one example is sketched below.
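
As one concrete example of capturing OpenShift state, the standard etcd backup script can be run against the single node. The sketch below assumes the backup target /home/core/assets/backup; substitute your node name, and copy the artifacts off the node (for example, to the bastion’s NFS share) afterwards.

```bash
# Snapshot etcd and the static pod manifests on the SNO node
oc debug node/<SNO_NODE_NAME> -- chroot /host \
  /usr/local/bin/cluster-backup.sh /home/core/assets/backup

# The script leaves two artifacts that should be moved off the node:
#   snapshot_<timestamp>.db and static_kuberesources_<timestamp>.tar.gz
oc debug node/<SNO_NODE_NAME> -- chroot /host ls /home/core/assets/backup
```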

Installed components, footprint, and a word of caution

For this exercise, a subset of Cloud Pak for Data 4.6.4 services was installed on an OpenShift 4.12 SNO x86 cluster with their default scaleConfig (small), which is typically sufficient for PoC and demo-type environments. The architecture of Cloud Pak for Data optimizes resource consumption by deploying a core set of services that provide common functions, such as logging, monitoring, authentication, and day 2 operations, to all consumers. In this article, we will refer to this group of services as the “Platform”. For simplicity, we will also count the OpenShift base services, the CP4D operators, the catalog sources, and the CP4D Common Core Services as part of the Platform group.

Each bundle in Cloud Pak for Data Express offers unique value in the Data and AI space and has different CPU/memory requirements (footprint). We attempted to combine a subset of popular services in a way that optimizes hardware requirements while addressing a business use case. For more information on some of the use cases, please refer to the articles on Data Fabric and Express (see references).

The table below summarizes our results and findings -

The workload footprint above is an idle-time footprint, meaning the system is not being used. For most data science workloads, consumption increases as users start executing workloads such as Jupyter notebooks. Some services, such as Cognos, Db2, DMC, and DataStage, do not grow drastically with consumption.

Depending on which services you choose to deploy and how much they are expected to grow, you could start anywhere from a 32 vCPU to a 64 vCPU machine. For instance, with the first few combinations you would be fine with a 32 vCPU machine, but as you move down the list the buffer capacity diminishes; to account for expansion, you could start with 32 (the nominal size) + 8 = 40 vCPU, or even 32 + 16 = 48 vCPU. One way to check the remaining buffer on a live system is shown below.
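
This check compares live usage with the requests already committed on the node; the commands are standard oc tooling, though the exact output columns vary by version.

```bash
# Live CPU/memory usage of the single node (needs cluster metrics available)
oc adm top node

# Requests/limits already committed on the node, to gauge remaining headroom
oc describe node <SNO_NODE_NAME> | grep -A 8 'Allocated resources'
```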

Total cost of ownership

As we mentioned, it is all about the coming “winter” and the rush to bring down the cost of running Data and AI workloads in a cloud-native environment. Below is a cost comparison of a minimum 3 control plane/3 worker CP4D cluster vs. Single Node OpenShift. As you can see, the cost benefits range from roughly 35% to 50%; an illustrative calculation follows.
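
To show how savings in that range can arise, here is a purely illustrative back-of-the-envelope calculation. The VM sizes mirror the deployment model above, but the hourly rates are hypothetical placeholders, not figures from our comparison; plug in your provider’s real prices.

```bash
# Hypothetical rates ($/hr) for illustration only
awk 'BEGIN {
  classic = 3*0.40 + 3*0.80   # 3 control planes + 3 workers
  sno     = 1*0.20 + 1*1.60   # 1 bastion (4x32) + 1 SNO node (32x128)
  printf "classic: $%.2f/hr  sno: $%.2f/hr  savings: %.0f%%\n",
         classic, sno, (1 - sno/classic) * 100
}'
```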

Conclusion

In conclusion, with the impending winter of an economic slowdown, organizations are looking for more efficient and cost-effective ways to run their workloads. Single Node OpenShift (SNO) enables users to run Cloud Pak for Data (CP4D) workloads on a single machine, making it ideal for smaller, short-term data and AI projects that can tolerate downtime. Although it has limitations in scalability, storage, and memory capacity, SNO offers cost savings, streamlined deployment, flexibility, and extensibility. To take full advantage of SNO, it is essential to consider workload requirements, hardware requirements, security, and backup and recovery plans. With its potential to reduce the overall cost of CP4D deployments, SNO is a promising option for organizations of all sizes looking to optimize their workflows while keeping expenses under control.

References


Sachin’s day job includes helping customers build smart apps infused with AI to solve complex problems in a more sustainable way.