Deploying Spectrum Scale with Ansible - Part 1 - Single Node

Ole Kristian Myklebust · Possimpible · Sep 3, 2021 · 12 min read

How to deploy a clustered filesystem with Red Hat Ansible.

IBM Spectrum Scale and Red Hat Ansible logos

It’s almost three years since I started deploying IBM Spectrum Scale with Ansible. This was part of a project where we had to deploy different application stacks that store their data on Spectrum Scale filesystems.

In that project, everything had to be written as code (even the data 😄), and the process had to be versioned and repeatable.

Luckily (for us), Christian Achim had already published an Ansible repo for the Spectrum Scale core, so we started building on top of that with new Ansible roles such as Scale GUI and CES (protocols), and integrated it with HashiCorp Consul, Vault and Terraform.

Since then, IBM development has taken over and created feature-rich Ansible roles for Spectrum Scale, including multiple roles for installing and configuring IBM Spectrum Scale (GPFS). All of the roles are publicly available on GitHub: ibm-spectrum-scale-install-infra.

So what is IBM Spectrum Scale?

For people who don’t know Spectrum Scale, here is how IBM describes it on its site:

IBM Spectrum® Scale is a clustered file system that provides concurrent access to a single file system or set of file systems from multiple nodes. The nodes can be SAN attached, network attached, a mixture of SAN attached, and network attached, or in a shared-nothing cluster configuration. This enables high performance access to this common set of data to support a scale-out solution or to provide a high availability platform.

So it’s a clustered filesystem, but it also provides many more functions, such as protocols, replication, snapshots, DR, HDFS, encryption, compression and file audit logging, to mention just a few.

In part one: 1⃣

In this part, I will go through how you can deploy a simple one-node cluster with the IBM Spectrum Scale Ansible roles.

In short, we will take a single Linux server, install and configure Spectrum Scale Core, the Management GUI and performance sensors on it, and create a filesystem from a local disk.

We will then build on this in the next parts, for example by adding extra nodes.

1. What do we need: ✅

  • One or more virtual or physical servers; these can be ppc64le, x86 or Z.
  • A supported operating system version. I will use RHEL 8.4.
    - Support for RHEL 7 on x86_64, PPC64 and PPC64LE
    - Support for RHEL 8 on x86_64 and PPC64LE
    - Support for Ubuntu 20 on x86_64 and PPC64LE
    - Support for SLES 15 on x86_64 and PPC64LE
  • One or more block/storage devices on the Linux host (for example /dev/sd*)
  • A package repository for the Scale node.
  • IBM Spectrum Scale packages.
    A Developer Edition Free Trial is available at this site
  • An Ansible control node. This can be a Linux or Mac computer.
  • Ansible 2.9.x installed on the control node (see point 3).

2. First Scale Node.

  • One VM with RHEL 8.4, named lab-scale1
  • The OS disk is 80 GB.
  • One network card and IP. (The Scale daemon can be bound to specific NICs and IPs.)
  • Register it with a package repository (most of the Ansible roles need this).
    I use Red Hat Subscription Manager.
  • One extra 25 GB disk for the filesystem fs1.
    PS: There are some support limitations when using a datastore disk,
    for example from VMware, where we need to use an RDM disk.
    However, for testing this is not necessary.
[root@lab-scale1 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 80G 0 disk
sdb 8:16 0 25G 0 disk
  • Create an SSH key if it’s not already done:
    $ ssh-keygen
    (Press Enter several times to accept the defaults.)
  • SELinux can cause problems, so set it to permissive or disabled if you don't “need” it.
sestatus
setenforce 0
sestatus
Also change SELINUX= to permissive in /etc/selinux/config:
vi /etc/selinux/config
SELINUX=permissive
  • Check that the Scale node can ping itself by hostname; if not, add the local IP to /etc/hosts.
[root@lab-scale1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.33.3.126 lab-scale1 lab-scale1.oslo.forum.ibm.com

3. Install and configure Ansible on your computer/control node. 🖥

To install Ansible on your control node (this could be a host, a Docker container or your local computer), there are many installation guides out there, so just search for “Ansible install” together with your OS.
Installing Ansible — Ansible Documentation

For CentOS/Red Hat Enterprise Linux:

  • To enable the Ansible Engine repository for RHEL 8, run the following commands:
    $ sudo subscription-manager repos --enable ansible-2.9-for-rhel-8-x86_64-rpms
    $ sudo yum install ansible
  • Or with pip:
    $ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
    $ python3 get-pip.py
    $ pip3 install 'ansible==2.9.*'

Mac:
The preferred way to install Ansible on a Mac is via pip
or use Brew if you have that installed.

Passwordless SSH between the Ansible controller and the Scale node

  • Check that DNS is working; if not, add the hostname to the local /etc/hosts file or to DNS.
# ping lab-scale1
PING lab-scale1 (10.33.3.126) 56(84) bytes of data.
64 bytes from lab-scale1 (10.33.3.126): icmp_seq=1 ttl=64 time=0.521 ms
  • Copy over the SSH key to enable passwordless SSH to the Scale node.
    $ ssh-copy-id lab-scale1
  • Then check that you can SSH to the node and accept the host key.
    $ ssh lab-scale1

4. Download the Spectrum Scale packages. 📦

A Developer Edition Free Trial is available at this site

There are four different ways the Ansible roles can fetch the binaries during installation.

In part one, I will use the local archive installation method. This means that the self-extracting archive containing the Spectrum Scale code is accessible on the Ansible control machine running the playbook.

The integrity of the archive will be validated by comparing checksums with a *.md5 reference file (if present), the archive will be copied to each managed node in your cluster (‘scale_install_localpkg_tmpdir_path’), and subsequently, the archive will be extracted. Packages will then be installed from the local files on each node.

  • Unzip the Spectrum Scale binary file on the Ansible controller to the desired location.
# unzip Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux.zip
Archive: Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux.zip
inflating: Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux.README
inflating: Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux-install
inflating: Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux-install.md5
inflating: SpectrumScale_public_key.pgp
# ls
Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux-install
Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux-install.md5
Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux.README
# pwd
/root/spectrum-scale-binary

5. Clone the IBM Spectrum Scale roles from GitHub to your Ansible host.

$ dnf install git -y
$ git clone https://github.com/IBM/ibm-spectrum-scale-install-infra.git

Change working directory
There are different methods for accessing the roles provided by this project. You can either change your working directory to the cloned repository and create your own files inside this directory (optionally copying examples from the samples/ subdirectory):

$ cd ibm-spectrum-scale-install-infra/

Alternatively, you can define an Ansible environment variable to make the roles accessible in any external project directory:

$ export ANSIBLE_ROLES_PATH=$(pwd)/ibm-spectrum-scale-install-infra/roles/

6. Create Ansible inventory

Define Spectrum Scale nodes in the Ansible inventory (e.g. ./hosts) in the following format:

vi /root/ibm-spectrum-scale-install-infra/hosts
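For this single-node setup, a hosts file could look roughly like the sketch below (illustrative only, not the authoritative copy). The group name cluster1 matches the playbook used later in this article, and the per-node variables are explained underneath; adjust the hostname to your own node.

# ./hosts - minimal single-node inventory (illustrative)
[cluster1]
lab-scale1 scale_cluster_quorum=true scale_cluster_manager=true scale_cluster_gui=true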

The host inventory above is just a minimal example; this method defines Ansible variables directly in the inventory.
There are other ways to define variables, such as host variables and group variables.

In this inventory file we have included some variables that are specific for this host:

  • scale_cluster_quorum=true
    If you don’t specify any quorum nodes, the first seven hosts in your
    inventory will automatically be assigned the quorum role, even if this
    variable is ‘false’.
  • scale_cluster_manager=true
    You’ll likely want to define per-node roles in your inventory.
    The default is false.
  • scale_cluster_gui=true
    Choose which node you want the GUI to be installed and running on.
    You’ll likely want to define per-node roles in your inventory.

To test the SSH connection we can run the Ansible ping command:

$ ansible all -i hosts -m ping
lab-scale1 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/libexec/platform-python"
},
"changed": false,
"ping": "pong"
}

7. Create Ansible playbook

There are sample playbooks in the repository’s samples/ directory, as mentioned earlier.

A basic Spectrum Scale Ansible playbook (e.g. ./scale-playbook.yml) looks as follows:
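Below is a sketch of what such a playbook could contain, assembled from the variables and roles discussed in this section. Treat it as illustrative only; role names can differ between versions of the repository, so the downloaded copy below and the samples/ directory are the authoritative references.

---
# scale-playbook.yml - illustrative sketch based on the variables and roles described below
- hosts: cluster1
  vars:
    # Path to the self-extracting archive on the Ansible controller (change this to your path)
    scale_install_localpkg_path: /root/spectrum-scale-binary/Spectrum_Scale_Developer-5.1.1.1-x86_64-Linux-install
    # Name of the Spectrum Scale cluster
    scale_cluster_clustername: scale-test
    # Exchange SSH keys between all nodes
    scale_prepare_exchange_keys: true
    # Disable SSH host key checking (StrictHostKeyChecking)
    scale_prepare_disable_ssh_hostkeycheck: true
  pre_tasks:
    # Import the filesystem definition (scale_storage) from a separate file
    - include_vars: storage_vars.yml
  roles:
    # Core Spectrum Scale installation
    - core/precheck
    - core/node
    - core/cluster
    - core/postcheck
    # Management GUI
    - gui/precheck
    - gui/node
    - gui/cluster
    - gui/postcheck
    # Performance monitoring (pmsensors/pmcollector)
    - zimon/precheck
    - zimon/node
    - zimon/cluster
    - zimon/postcheck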

Download a copy of the Playbook:

wget https://gist.githubusercontent.com/olemyk/9a44b564b173ddd854da355c4e830076/raw/8edbde8d7ec373b540ff1e053f02337426425d6c/scale-playbook.yml

There are different ways to include variables in an Ansible play.
One way is to add them to the Ansible hosts file, but with many variables and lines this can get messy. Another way is to create a host_vars folder containing a file named after the host you want to pass the variables to.
For group variables, you can create a group_vars folder with a file called “all” that is included in the run (see the sketch below); more on this later.
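As a rough illustration (file and folder names here are just placeholders for this lab), the project layout could look like this:

ibm-spectrum-scale-install-infra/
├── hosts                  # inventory with the cluster1 group
├── scale-playbook.yml     # the playbook from this section
├── storage_vars.yml       # filesystem/disk definitions imported via include_vars
├── host_vars/
│   └── lab-scale1         # variables that apply only to this host
└── group_vars/
    └── all                # variables that apply to every host in the play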

In this playbook, we include some variables that apply to the cluster:

Variables:

  • hosts: cluster1
    This points to the group in the Ansible inventory.
  • scale_install_localpkg_path:
    Points to the Spectrum Scale binary on the Ansible controller.
    Change this to your path.
  • scale_cluster_clustername: scale-test
    Name of the Spectrum Scale cluster.
  • scale_prepare_exchange_keys: true
    Exchange SSH keys between all nodes. Even though we have only one node now, Spectrum Scale talks to itself over SSH.
  • scale_prepare_disable_ssh_hostkeycheck: true
    Disable SSH host key checking (StrictHostKeyChecking).
  • pre_tasks:
    In part one we use a pre-task to import the filesystem information:
    include_vars: storage_vars.yml

Ansible Roles:

We include roles based on what we want to install.
The roles are divided into precheck, node, cluster and postcheck, and you should normally include all of them.

  • CORE is the base installation of Spectrum Scale.
  • GUI, as the name says, installs the Management GUI.
  • ZIMON is the performance monitoring tool that collects metrics from GPFS and protocols and provides performance information.
    pmsensors is normally installed on all nodes so that the pmcollector nodes can gather information. pmcollector is installed on GUI nodes automatically.

include_vars: storage_vars.yml

  • One important point for Spectrum Scale is that the scale_storage variable must be defined using group variables. Do not define disk
    parameters using host variables or inline variables in your playbook. Doing so would apply them to all hosts in the group/play, thus defining the same disk multiple times.
  • For the scale_storage variables, some parameters are mandatory and some are optional, as they have predefined default values:
    - The filesystem: parameter is mandatory.
    - Under the disks parameter of each filesystem, it’s mandatory to fill in device and servers.
    - All other filesystem and disk parameters are optional.
  • PS: In the example (see the sketch below), we have data and metadata on the same disk and the pool is system.
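A minimal sketch of what storage_vars.yml could look like for our single node with one local disk is shown below. The names and values are illustrative, chosen to match what we see later in the article (filesystem gpfs01 mounted on /gpfs/fs1, built on /dev/sdb); the downloadable copy below is the authoritative version.

---
# storage_vars.yml - illustrative sketch for one filesystem on one local disk
scale_storage:
  - filesystem: gpfs01            # filesystem (device) name - mandatory
    defaultMountPoint: /gpfs/fs1  # optional: where the filesystem will be mounted
    disks:
      - device: /dev/sdb          # mandatory: local block device for the NSD
        servers: lab-scale1       # mandatory: NSD server, our single node
        usage: dataAndMetadata    # optional: data and metadata on the same disk
        pool: system              # optional: default storage pool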

Download a copy of storage_vars.yml and change it to fit your environment:

wget https://gist.githubusercontent.com/olemyk/89705764c7afe5bec1aa72cded5512fb/raw/3e62e63cf61adbc36252c157c56469c465821a3a/storage_vars.yml

8. Run the playbook 🏃🚧

Run the playbook to install and configure the Spectrum Scale cluster

To run the playbook, use the ansible-playbook command:

ansible-playbook -i hosts scale-playbook.yml

Playbook execution screen.

[root@lab-scale-controller ibm-spectrum-scale-install-infra]# ansible-playbook -i hosts scale-playbook.yml

PLAY [cluster1] ***************************************************************************

TASK [Gathering Facts] ********************************************************************
ok: [lab-scale1]

TASK [include_vars] ***********************************************************************
ok: [lab-scale1]

TASK [common : check | Stat GPFS base package] ********************************************
skipping: [lab-scale1]

Recording from Ansible run:

When the playbook is done, you should hopefully see failed=0 in the recap:

PLAY RECAP ****************************************************************
lab-scale1 : ok=182 changed=30 unreachable=0 failed=0 skipped=214 rescued=0 ignored=0

9. Spectrum Scale GUI

Now we can point our web browser at the IP or hostname of the Scale node, for example: https://lab-scale1

You should be greeted by a screen saying “Welcome to IBM Spectrum Scale”, and you need to create a new GUI user.

So let us create a user and set a password for the GUI.

Log in to the Scale node with SSH and run the command to create the user:

[root@lab-scale1 ~]# /usr/lpp/mmfs/gui/cli/mkuser admin -g SecurityAdmin
EFSSG1007A Enter password for User :
EFSSG0225I Repeat the password:
EFSSG0019I The user admin has been successfully created.
EFSSG1000I The command completed successfully.

After creating your user, refresh the web browser and log in with your newly created GUI user.

You should now have a Spectrum Scale cluster 👊👏 🏁

10. Spectrum Scale commands 🖥

To run Spectrum Scale commands without typing the full path, you can add the Scale binary directory to your PATH:

export PATH=$PATH:/usr/lpp/mmfs/bin

To make this permanent for your user, create a file called /etc/profile.d/scale.sh with the content:

$ vi /etc/profile.d/scale.sh

PATH=$PATH:/usr/lpp/mmfs/bin
export PATH

Run $ source /etc/profile.d/scale.sh to update your running environment, or log out and back in on your terminal.

FYI: Almost all Spectrum Scale commands start with “mm”
(a reference to the original product, multimedia).

Some basic Spectrum Scale commands to know:

To get information about your cluster, run mmlscluster.
This will show you the cluster name, members and roles.

[root@lab-scale1 ~]# mmlscluster

GPFS cluster information
========================
GPFS cluster name: scale-test.oslo.forum.com
GPFS cluster id: 7171363335925894970
GPFS UID domain: scale-test.oslo.forum.com
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
--------------------------------------------------------------------
1 lab-scale1.oslo.forum.com 10.33.3.126 lab-scale1.oslo.forum.com quorum-manager-perfmon

To get the status of each node in the cluster, you can run mmgetstate -a.
The -a flag lists all nodes. In our case, we have only one node, and it should be in the active state.

[root@lab-scale1 ~]# mmgetstate -a

Node number  Node name        GPFS state
-------------------------------------------
1 lab-scale1 active

There is also a handy command that checks the status of each component in a Spectrum Scale cluster: mmhealth.

For example, mmhealth cluster show gives you a brief overview of whether everything is good or something is wrong in your cluster.

[root@lab-scale1 ~]# mmhealth cluster show

Component           Total         Failed       Degraded        Healthy          Other
--------------------------------------------------------------------
NODE 1 0 0 0 1
GPFS 1 0 0 0 1
NETWORK 1 0 0 1 0
FILESYSTEM 1 0 0 1 0
DISK 1 0 0 1 0
FILESYSMGR 1 0 0 1 0
GUI 1 0 0 1 0
PERFMON 1 0 0 1 0
THRESHOLD 1 0 0 1 0

To drill down further on each component, you can specify it, for example the node component:
mmhealth cluster show node

[root@lab-scale1 ~]# mmhealth cluster show node

Component     Node                              Status            Reasons
-------------------------------------------------------------
NODE lab-scale1.oslo.forum.com TIPS callhome_not_enabled,gpfs_pagepool_small

The filesystem we created should be mounted under /gpfs/fs1

[root@lab-scale1 ~]# df -h /gpfs/fs1
Filesystem Size Used Avail Use% Mounted on
gpfs01 25G 872M 25G 4% /gpfs/fs1

Also check out:
mmlsfs gpfs01
mmlsnsd

In the next part! ⏭

In this part, we deployed a simple single-node cluster to get us started!
In the next part, we will look into adding nodes, storage, configuration and parameters, and maybe also how to deploy the protocol functions SMB/NFS. Stay tuned 👋

Troubleshooting: ❓ 😤

If your run stops at this task: TASK [core/cluster : cluster | Create new cluster]

You can enable verbose mode with -vvv and rerun your playbook.
In my case this was caused by SSH host key checking; the Scale nodes need to be able to SSH to themselves.
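A quick way to verify this, assuming you are logged in on the Scale node itself, is to SSH to its own hostname; it should return without prompting for a password or a host key confirmation:

[root@lab-scale1 ~]# ssh lab-scale1 hostname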

Start daemons

I also experienced that the cluster didn’t want to start with mmstartup; it was the GPFS daemon that refused to start. mmfsenv gave us a clue, and then Jan-Frode Myklebust tipped me off that Secure Boot in VMware needs to be disabled.

 TASK [core/cluster : cluster | Start daemons] 
changed: [lab-scale1 -> lab-scale1]
RUNNING HANDLER [core/cluster : wait-daemon-active] *************************************************************
FAILED - RETRYING: wait-daemon-active (10 retries left).
FAILED - RETRYING: wait-daemon-active (9 retries left).
FAILED - RETRYING: wait-daemon-active (8 retries left).
FAILED - RETRYING: wait-daemon-active (2 retries left).
FAILED - RETRYING: wait-daemon-active (1 retries left).
fatal: [lab-scale1 -> lab-scale1]: FAILED! => {"attempts": 10, "changed": false, "cmd": "/usr/lpp/mmfs/bin/mmgetstate -N lab-scale1 -Y | grep -v HEADER | cut -d ':' -f 9", "delta": "0:00:01.426860", "end": "2021-08-19 18:42:44.656040", "rc": 0, "start": "2021-08-19 18:42:43.229180", "stderr": "", "stderr_lines": [], "stdout": "down", "stdout_lines": ["down"]}

[root@lab-scale1 ~]# mmfsenv
Unloading modules from /lib/modules/4.18.0-305.12.1.el8_4.x86_64/extra
Loading modules from /lib/modules/4.18.0-305.12.1.el8_4.x86_64/extra
insmod: ERROR: could not insert module /lib/modules/4.18.0-305.12.1.el8_4.x86_64/extra/tracedev.ko: Required key not available
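One way to confirm whether Secure Boot is active inside the guest (assuming the mokutil package is installed) is:

[root@lab-scale1 ~]# mokutil --sb-state
SecureBoot enabled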

Shut down the VM and uncheck Secure Boot.
