Installing a Highly-Available OpenShift Origin Cluster

Adilson Cesar
6 min read · Nov 12, 2017


This article proposes a reference architecture for a Highly Available installation of OpenShift Origin. We will outline the architecture of such an installation and walk through the installation process. The intention of this process is to perform iterative installations of the OpenShift Origin cluster.

Cluster Design & Architecture

Preparation…

Provision Servers

01 Master (OpenShift Web Console)
01 Infra (Router, Registry, Metrics, EFK)
01 Node (application pods)
All hosts run CentOS 7 x86_64 with a minimal installation.

Each of these servers should be provisioned with an SSH public key which can be used to access all hosts from the Ansible control node.
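If you have not set that up yet, a minimal sketch looks like the following; it assumes a root login and uses the hostnames from the DNS zone shown later in this article:

# ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# for host in master.cirrus.io infra.cirrus.io node1.cirrus.io; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host
  done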

Next, install the docker service on all nodes (Master, Infra, Node):

# yum install -y docker
# systemctl start docker.service
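Optionally, enable the service so docker comes back after a reboot, and confirm it is running:

# systemctl enable docker.service
# systemctl status docker.service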

Docker Storage Setup

During the Provision Servers step of this guide, we provisioned all of our nodes (including the master) with a docker volume attached as /dev/sdb. We’ll now configure docker to use that volume for all local docker storage.

Running a large number of containers in production requires a lot of storage space. Additionally, creating and running containers requires the underlying storage drivers to be configured to use the most performant options. The default storage options for Docker-formatted containers vary between the different systems and in some cases they need to be changed. A default installation of RHEL uses loopback devices, whereas RHEL Atomic Host has LVM thin pools created during installation. However, using the loopback option is not recommended for production systems.

Creating a new VG (Master, Infra, Node)

# vgcreate vgdocker /dev/sdb
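You can confirm that the physical volume and volume group were created as expected:

# pvs
# vgs vgdocker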

Edit the /etc/sysconfig/docker-storage-setup file and add the LV and VG names:

# cat /etc/sysconfig/docker-storage-setup
# Edit this file to override any configuration options specified in
# /usr/share/container-storage-setup/container-storage-setup.
#
# For more details refer to "man container-storage-setup"
CONTAINER_THINPOOL=docker
VG=vgdocker

Restarting Docker and Docker Storage Setup Services

# systemctl stop docker docker-storage-setup
# rm -rf /var/lib/docker/*
# systemctl start docker docker-storage-setup

A new logical volume will be created automatically. For more information, see How to use the Device Mapper storage driver.

# lvs
  LV     VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   centos   -wi-ao---- <8.00g
  swap   centos   -wi-ao----   1.00g
  docker vgdocker twi-a-t---  15.91g             0.12   0.10
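You can also check that docker picked up the devicemapper thin pool; the output should report the devicemapper storage driver and a pool name derived from the vgdocker/docker names configured above:

# docker info | grep -iE 'storage driver|pool name'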

Preparing the Installer…

OpenShift uses Ansible as its installation and configuration manager. As we walk through design decisions, we capture them in an INI-style config file referred to as the Ansible inventory file. To start, we’ll establish a project skeleton for storing this file and begin populating it with information specific to this cluster:
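The full inventory is not reproduced here, but a minimal sketch for the three-host topology above might look like the following. The hostnames match the DNS zone shown later; values such as the release, the apps subdomain, and the htpasswd file path (matching the Authentication section below) are assumptions you should adapt to your environment:

# cat hosts
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
openshift_release=v3.6
openshift_master_cluster_hostname=oc.cirrus.io
openshift_master_cluster_public_hostname=oc.cirrus.io
openshift_master_default_subdomain=apps.cirrus.io
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/htpasswd'}]

[masters]
master.cirrus.io

[etcd]
master.cirrus.io

[nodes]
master.cirrus.io
infra.cirrus.io openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
node1.cirrus.io openshift_node_labels="{'region': 'primary', 'zone': 'default'}"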

More information about each parameter is available in the OpenShift Origin advanced installation documentation.

*** All of the hosts in the cluster need to be resolvable via DNS. Additionally, if using a control node to serve as the Ansible installer, it too should be able to resolve all hosts in your cluster.

In an HA cluster there should also be two DNS names for the load-balanced IP address that points to the master server for access to the API, CLI, and Console services. One of these names is the public name that users will use to log into the cluster. The other is an internal name that components within the cluster will use to talk back to the master. These values should also resolve in DNS, and will be set as variables in the Ansible hosts file.

$TTL 86400
@ IN SOA xxxx.xxxxxx.xxx. xxxx.xxxxxx.xxx. (
2017010101; Serial
3600 ;Refresh
1800 ;Retry
604800 ;Expire
86400 ;Minimum TTL
)
@ IN NS xxxx.xxxxxx.xxx
@ IN A XXX.XXX.XXX.XXX
oc IN A XXX.XXX.XXX.XXX
master IN A XXX.XXX.XXX.XXX
infra IN A XXX.XXX.XXX.XXX
node1 IN A XXX.XXX.XXX.XXX
*.apps.cirrus.io IN A XXX.XXX.XXX.XXX
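Before kicking off the installer, it is worth confirming from the control node that both the host records and the wildcard record resolve; something like the following (hostnames taken from the zone above, "test" being an arbitrary name to exercise the wildcard) should return the expected addresses:

# for host in oc master infra node1; do dig +short $host.cirrus.io; done
# dig +short test.apps.cirrus.io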

Preparing for Install

At this point in the process we are ready to prepare our hosts for install. The following sections guide us through this process.

# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

The install will run for 15–20 minutes. Good time for a coffee break.

A few minutes later: voilà!

OpenShift Web Console (Master)

Authentication…

For the initial installation we are going to use htpasswd for simple authentication and seed it with a couple of sample users so we can log into the OpenShift Console and validate the installation.

# htpasswd -c /etc/origin/htpasswd admin
New password:
Re-type new password:
Adding password for user admin

Then grant the cluster-admin role to the admin user so it has access to all projects in the cluster:

# oadm policy add-cluster-role-to-user cluster-admin admin
cluster role "cluster-admin" added: "admin"
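As a quick check, you should now be able to log in with that user from the CLI. Here I'm assuming oc.cirrus.io is the name pointing at the master API (see the DNS zone above) and the default Origin API port 8443:

# oc login https://oc.cirrus.io:8443 -u admin
# oc get projects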

Done!

In the default project, we can see the Registry service already running "containerized".

Registry Service with router configured

You can access it at https://registry-console-default.apps.cirrus.io, the URL provided by the Router service.

Registry (Infra node)
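To confirm that the registry and router pods actually landed on the infra node, list the pods in the default project with the node column included:

# oc get pods -n default -o wide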

Validating an OpenShift Install…

After having gone through the process of building an OpenShift environment, we should validate that the cluster is healthy.

Validate Nodes

# oc get nodes
NAME STATUS AGE VERSION
infra.cirrus.io Ready 1d v1.6.1+5115d708d7
master.cirrus.io Ready 1d v1.6.1+5115d708d7
node1.cirrus.io Ready 1d v1.6.1+5115d708d7

Check the output to ensure that:

  • All expected hosts (masters and nodes) are listed and show as Ready in the Status field
  • All masters show as unschedulable
  • All labels that were listed in the Ansible inventory file are accurate (the checks below can help confirm the last two points)
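A couple of quick commands help here: node labels can be listed directly, and the master's scheduling status is visible in its node description (the master hostname below matches this cluster):

# oc get nodes --show-labels
# oc describe node master.cirrus.io | grep -i unschedulable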

Validate Status of Default Project

The oc status command is helpful for validating that a namespace is in the state you expect it to be in. This is especially helpful after an install to check, at a high level, that all of the supporting services and pods you expect to exist actually do. After a successful install, healthy output should look something like:

# oc status
In project default on server https://master.cirrus.io

https://docker-registry-default.apps.cirrus.io (passthrough) (svc/docker-registry)
  dc/docker-registry deploys docker.io/openshift/origin-docker-registry:v3.6.1
    deployment #1 deployed 26 hours ago - 1 pod

svc/kubernetes - 172.30.0.1 ports 443, 53->8053, 53->8053

https://registry-console-default.apps.cirrus.io (passthrough) (svc/registry-console)
  dc/registry-console deploys docker.io/cockpit/kubernetes:latest
    deployment #1 deployed 26 hours ago - 1 pod

svc/router - 172.30.50.237 ports 80, 443, 1936
  dc/router deploys docker.io/openshift/origin-haproxy-router:v3.6.1
    deployment #1 deployed 26 hours ago - 1 pod

View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.

Run Diagnostics

OpenShift provides an additional CLI tool that can perform more fine-grained diagnostics, including validating that services can see each other, that certificates are valid, and much more. The output of a diagnostics run can be quite verbose, but it will include a final report of Errors and Warnings at the end. If there are errors or warnings, you may want to review them and confirm that they do not affect any critical services.

Not all errors or warnings warrant action. The diagnostics check will additionally examine all deployed services and report anything out of the ordinary. This could include apps that may have been misconfigured by a developer, and would not necessarily warrant administrative intervention.

# oadm diagnostics

Uninstalling OpenShift Origin

You can uninstall OpenShift Origin hosts in your cluster by running the uninstall.yml playbook. This playbook deletes OpenShift Origin content installed by Ansible, including:

  • Configuration
  • Containers
  • Default templates and image streams
  • Images
  • RPM packages

The playbook will delete content for any hosts defined in the inventory file that you specify when running the playbook. If you want to uninstall OpenShift Origin across all hosts in your cluster, run the playbook using the inventory file you used when initially installing OpenShift Origin, or the one you ran most recently:

# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml
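If you only want to remove a subset of hosts, one approach is to point the same playbook at a pared-down inventory that lists just those hosts (if the rest of the cluster is staying up, drain and delete the node from the cluster first). A hypothetical example for removing only node1, using an inventory file name of my own choosing:

# cat hosts.node1-uninstall
[OSEv3:children]
nodes

[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin

[nodes]
node1.cirrus.io

# ansible-playbook -i hosts.node1-uninstall /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml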

Enjoy your Cluster :)
