Deploying Spectrum Scale with Ansible - Part 2: Multiple Nodes

Ole Kristian Myklebust
Published in Possimpible
Feb 25, 2022

How to deploy a clustered filesystem with Red Hat Ansible.

IBM Spectrum Scale and Red Hat Ansible logos

So in Part 1, we learned how to use the Spectrum Scale Ansible roles to deploy a single-node cluster.

In this part, I will go through how you can deploy a multi-node cluster with filesystem replication using the IBM Spectrum Scale Ansible roles.

Filesystem Info

What we need to think more about now is how the filesystem is backed by block devices/disks. To simplify this, I will try to explain some of the different ways to create a filesystem. (There are several more options, like Erasure Code Edition (ECE) and GNR/ESS, just to mention some.)

  • NSD: IBM Spectrum Scale will use whatever block device you provide it, for example sdX, dmX, and so on. This is equal to what Scale calls a Network Shared Disk (NSD) after it's created. (There are some support statements here, so check those before you go bananas.) The filesystem you create is then backed by one or more of these NSDs.
  • NSD: Metadata and Data: We also have the option to configure specific NSDs to contain metadataOnly, dataOnly, or dataAndMetadata.
    More information and options are available in the IBM documentation.

    This was more common before, and when you are using different tiers.
    If you have only SSDs, there is not much reason to divide this; the main point is, either way, to place metadata on the fastest disks in the system.
  • NSD: Failure group: we will showcase this a little more later.
    The failure group is used when Spectrum Scale is doing the replication. Say we want 2 replicas, and we want the data in 2 different places: then we need 2 failure groups. If we have an identical number of NSDs in failure groups 10 and 20, we can lose one failure group without losing access to the data.
    One failure group can be a site or a data center.
  • Pools: NSDs can be attached to different pools under the filesystem. This can be used with ILM policies, for example: files that are older than 30 days or have not been accessed are placed into the pool slowdisk.
    By default, data is placed in a dataOnly pool (if one exists); if not, it is placed into the system pool on dataAndMetadata disks.
  • Filesystem: the filesystem is created based on the NSD and pool info.
    By default it will use a blockSize of 4M with 8 KB subblocks.
    Replication is also configured on the filesystem, so if you want multiple copies of the data, you can specify this for metadata and data (see the sketch just below this list):
    defaultMetadataReplicas: x
    defaultDataReplicas: x
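
In the Ansible roles, these filesystem-level settings end up in the scale_storage variable. A minimal sketch, assuming the parameter names documented in the role's Variables.md:

# Sketch: filesystem-level replication parameters in scale_storage
scale_storage:
  - filesystem: fs1
    blockSize: 4M
    defaultMetadataReplicas: 2
    defaultDataReplicas: 2
    maxMetadataReplicas: 3
    maxDataReplicas: 3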

FYI: when a server has NSDs attached, it is called an NSD server or storage server.

  • Shared disk.
    - The same block device/disk is attached to one or more NSD servers.
    - This could be FC/SAN-attached, so when one node writes to the filesystem, the data is written directly from that node to the shared disk.
  • Local disk with Scale replication. 🔁
    This part is a rabbit hole 🐇, so I will not go in-depth on all of it.

    In short: with Scale replication, we can tell the file system to write all data in multiple copies, one copy to each failure group. We can use that to distribute data copies to internal disks in several servers and implement a file system that isn't dependent on any single server.
    - When it comes to replication, we have multiple options. One important thing to consider is how many replicas of the data you can and want to have (this can be set individually for metadata and data).
    - So if we have three servers, we need a minimum of three disks of the same size attached, so we can replicate the data between the disks.
    - In this case, we need 3 failure groups, one per server
    (one failure group can contain several disks).
    - One or more local block devices/disks are attached to each server.

Just to give an example of how this could look with local disks and replication:
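
A minimal sketch of the disks section, assuming one local /dev/sdb per server and this lab's hostnames (the full scale_storage format follows later in the article):

disks:
  - device: /dev/sdb
    servers: scale1
    failureGroup: 10
  - device: /dev/sdb
    servers: scale2
    failureGroup: 20
  - device: /dev/sdb
    servers: scale3
    failureGroup: 30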

Quorum and Tiebreaker

As with all clustered systems, we need some technique to decide who is responsible here and who should continue serving data.
That's why quorum or a tiebreaker is normally used.

  • Quorum: For Spectrum Scale, we can assign the quorum role to each node. There should be an odd number of quorum nodes: 1, 3, 5, or 7. The Ansible role will do this for the first 7 nodes if we don't specify it in host vars.
    We also have the possibility to use a specific disk descriptor.
  • Tiebreaker:
    If the servers have access to the same disk, we can define one to three disks as tiebreaker disks. When we have tiebreaker disks, we can use an even number of quorum nodes... for more info check out this page. A configuration sketch follows below.
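
Tiebreaker disks are configured on the cluster with mmchconfig, not in the filesystem definition. A minimal sketch, assuming the NSD names used later in this lab:

# Define up to three existing NSDs as tiebreaker disks (semicolon-separated)
mmchconfig tiebreakerDisks="nsd_1;nsd_2;nsd_3"
# Verify the setting
mmlsconfig tiebreakerDisks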

1. What do we need? ✅

  • One or more virtual or physical servers; these could be ppc64le, x86, or Z.
  • A supported operating system version. I will use RHEL 8.5.
    - Support for RHEL 7 on x86_64, PPC64 and PPC64LE
    - Support for RHEL 8 on x86_64 and PPC64LE
    - Support for UBUNTU 20 on x86_64 and PPC64LE
    - Support for SLES 15 on x86_64 and PPC64LE
  • One or more block/storage devices on the Linux hosts (for example /dev/sd*)
  • A package repo for the Spectrum Scale nodes.
  • The IBM Spectrum Scale packages.
    A Developer Edition Free Trial is available at this site.
  • An Ansible control node; this can be a Linux or Mac computer.
  • Ansible 2.9.x installed on the control node. (See point 3.)

2. Three Scale Nodes. 💻

  • Three VMs with RHEL 8.5, named lab-scale1 to 3.
    Three servers are needed to have quorum, so that you can shut down one node.
  • I will use an OS disk that is 85 GB.
  • One network card and IP. (We can install the daemon on specific NICs and IPs.)
  • Register them against a package repo (most of the Ansible roles need this).
    I use Red Hat Subscription Manager.
  • One extra 25 GB disk on each server for the filesystem fs1.
    This can be a shared disk, but then you should not add the disks to different failure groups with replication set to 3 as we do later in the article.
  • PS: There are some support limitations when using datastore disks.
    For example, from VMware we need to use an RDM disk.
    However, for testing this is not necessary.
[root@scale1 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda    8:0    0  80G  0 disk
sdb    8:16   0  25G  0 disk
  • Create an SSH key if it's not already done:
    $ ssh-keygen
    Press Enter several times for the defaults.
  • SELinux can cause problems (I need to look into this more),
    so set it to permissive or disabled if you don't "need" it.

sestatus
setenforce 0

Also change SELINUX= to permissive in /etc/selinux/config:

vi /etc/selinux/config
SELINUX=permissive
  • Check that the Scale node can ping itself on its hostname with DNS; if not, add the local IP in /etc/hosts. (It's always a little safer to also add this in /etc/hosts, so that if there are issues with DNS, the cluster will still operate.)
[root@scale1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.33.3.100 scale1 scale1.oslo.forum.ibm.com
10.33.3.101 scale2 scale2.oslo.forum.ibm.com
10.33.3.102 scale3 scale3.oslo.forum.ibm.com

3. Install and configure Ansible on your computer/control node. 🖥

To install Ansible on your control node (this could be a host, a Docker container, or your local computer), there are many guides out there, so just google "Ansible install" and your OS.
Installing Ansible — Ansible Documentation

For CentOS/Red Hat Enterprise Linux:

  • To enable the Ansible Engine repository for RHEL 8, run the following command:
    $ sudo subscription-manager repos --enable ansible-2.9-for-rhel-8-x86_64-rpms
    $ sudo yum install ansible
  • Or via pip:
    $ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
    $ python get-pip.py
    $ pip3 install ansible==2.9

Mac:
The preferred way to install Ansible on a Mac is via pip,
or use Brew if you have that installed.

Passwordless SSH between the Ansible controller and the Spectrum Scale nodes

  • Check that DNS is working for all of your nodes; if not, add them to the local /etc/hosts or DNS.
# ping lab-scale1
PING lab-scale1 (10.33.3.126) 56(84) bytes of data.
64 bytes from lab-scale1 (10.33.3.126): icmp_seq=1 ttl=64 time=0.521 ms
  • Copy over the SSH keys for passwordless SSH to the Scale nodes:
    $ ssh-copy-id lab-scale1
  • Then check that you can SSH to the node and authorize the key:
    $ ssh lab-scale1

4. Download the Spectrum Scale packages. 📦

A Developer Edition Free Trial is available at this site.

Remember to download the latest version of Spectrum Scale to support the latest operating systems; for example, RHEL 8.5 is supported with Spectrum Scale 5.1.2.2. Check out the FAQ here.

We have 4 different ways the Ansible role can grab the binaries when installing.

As in part one, I will use the installation from local archive method. This means that the self-extracting archive containing the Spectrum Scale code is accessible on the Ansible control machine running the playbook.

The integrity of the archive is validated by comparing checksums with a *.md5 reference file (if present), the archive is copied to each managed node in your cluster ('scale_install_localpkg_tmpdir_path'), and subsequently the archive is extracted. Packages are then installed from the local files on each node.

  • Unzip the Spectrum Scale Binary file on the Ansible Controller to the desired location.
# cd /root/spectrum-scale-binary
# unzip Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux.zip
Archive: Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux.zip
inflating: Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux.README
inflating: Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux-install
extracting: Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux-install.md5
inflating: SpectrumScale_public_key.pgp
# ls
Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux-install
Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux-install.md5
Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux.README
# pwd
/root/spectrum-scale-binary

5. Clone the IBM Spectrum Scale roles from GitHub to your Ansible host.

We are working on creating a Galaxy repo; until that is ready, we will download the Ansible roles from GitHub.
To be sure we are working with tested versions, we normally want to use the latest release/tag. If you don't change the version, you will be working on the GitHub master branch.

Create project directory on Ansible control node
The preferred way of accessing the roles provided by this project is by placing them inside the collections/ansible_collections/ibm/spectrum_scale directory of your project, adjacent to your Ansible playbook. Simply clone the repository to the correct path:

# dnf install git -y
# mkdir -p /root/ibm-spectrum-scale-install
# cd ibm-spectrum-scale-install
# git clone https://github.com/IBM/ibm-spectrum-scale-install-infra.git collections/ansible_collections/ibm/spectrum_scale
# cd collections/ansible_collections/ibm/spectrum_scale
# git describe --tags
v2.1.3-60-g5839d3f
# git checkout v2.1.3-60-g5839d3f
# cd /root/ibm-spectrum-scale-install/

Be sure to clone the project under the correct subdirectory:

ibm-spectrum-scale-install/
├── collections/
│   └── ansible_collections/
│       └── ibm/
│           └── spectrum_scale/
│               └── ...
├── hosts
└── playbook.yml

An important part is that, as we are now moving over to an Ansible collection, we will use ibm.spectrum_scale as a collection in the playbook.

6. Create Ansible inventory

Define Spectrum Scale nodes in the Ansible inventory (e.g. ./hosts) in the following format:

vi /root/ibm-spectrum-scale-install/hosts

FYI: Let us just deploy on two servers first, and then add the third node later.

This host inventory is just a minimal example; this method defines Ansible variables directly in the inventory.
There are other ways to define variables, such as host variables and group variables.

In this inventory file we have included some variables that are specific to each host (a sketch of the file follows the list below):

  • scale_cluster_quorum=true
    If you don't specify any quorum nodes, then the first seven hosts in your
    inventory will automatically be assigned the quorum role, even if this variable is 'false'.
  • scale_cluster_manager=true
    You'll likely want to define per-node roles in your inventory.
    The default is false.
  • scale_cluster_gui=true
    Choose which node you want the GUI to be installed on and running on.
    You'll likely want to define per-node roles in your inventory.
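
A minimal sketch of the hosts file, assuming this lab's hostnames (scale3 is added later in the article):

# ./hosts
[cluster1]
scale1 scale_cluster_quorum=true scale_cluster_manager=true scale_cluster_gui=true
scale2 scale_cluster_quorum=true scale_cluster_manager=false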

To test the SSH connection we can run the Ansible ping command:

$ ansible all -i hosts -m ping
scale2 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/libexec/platform-python"
    },
    "changed": false,
    "ping": "pong"
}
scale1 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/libexec/platform-python"
    },
    "changed": false,
    "ping": "pong"
}

7. Create Ansible playbook ✍️

There are sample playbooks as mentioned in the repo: samples

We are working on creating one single document that has all the variables documented; at the moment this lives in the GitHub nextgen branch: Variables.md

And as mentioned, we have the samples folder, which includes sample playbooks and example vars files.

Playbook
A basic Spectrum Scale Ansible playbook (e.g. ./scale-playbook.yml) looks like the following:

git clone https://gist.github.com/e485836ad5db6613b694b2dc3c67f06a.git

There are different ways to include variables in an Ansible play.
One way is to add them in the Ansible hosts file; if there are multiple variables and lines this can get messy, so another way is to create a host_vars folder and in it create a file named after the host you want to pass the variables to.
For group vars, you could create a group_vars folder with a file called "all" that is included in the run; more on this later.

In this playbook, we will include some variables that are group/cluster variables and not host vars.

Variables:

  • hosts: cluster1
    This points to the group in the Ansible inventory.
  • scale_install_localpkg_path:
    Points to the Spectrum Scale binary on the Ansible controller.
    Change this to your path.
  • scale_cluster_clustername: scale-test
    Name of the Spectrum Scale cluster.
  • scale_prepare_exchange_keys: true
    Exchange SSH keys between all nodes. Even with only one node, Spectrum Scale talks with itself over SSH.
  • scale_prepare_disable_ssh_hostkeycheck: true
    Disable SSH host key checking (StrictHostKeyChecking).
  • scale_gui_admin_user: "admin"
    Specify a name for the admin user to be created.
  • scale_gui_admin_password: "Admin@GUI"
    Password to be set for the admin user.
  • scale_gui_admin_role: "SecurityAdmin,SystemAdmin"
    Role access for the admin user; check the IBM docs for valid roles.
  • pre_tasks:
    In this part, we will also use a pre-task to import the filesystem information:
    include_vars: storage_vars.yml

Ansible Roles:

We include some of the Ansible Spectrum Scale roles based on what we want in the Ansible play.
The roles are divided into */precheck, */node, */cluster, */postcheck.
You should normally include all of them, for example when adding gui/*. (A playbook sketch follows this list.)

  • CORE is the base installation of Spectrum Scale.
  • GUI, as the name says, is the installation of the management GUI.
  • ZIMON is the performance monitoring tool that collects metrics from GPFS and protocols and provides performance information.
    pmsensors is normally installed on all nodes so that the pmcollector nodes can gather information. pmcollector is installed on GUI nodes automatically.
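
Putting this together, the playbook could look roughly like this. This is a sketch based on the samples in the repo; verify role names and paths against your checkout and the gist above:

# scale_playbook_part2.yml (sketch)
- hosts: cluster1
  collections:
    - ibm.spectrum_scale
  vars:
    scale_install_localpkg_path: /root/spectrum-scale-binary/Spectrum_Scale_Developer-5.1.2.2-x86_64-Linux-install
    scale_cluster_clustername: scale-test
    scale_prepare_exchange_keys: true
    scale_prepare_disable_ssh_hostkeycheck: true
    scale_gui_admin_user: "admin"
    scale_gui_admin_password: "Admin@GUI"
    scale_gui_admin_role: "SecurityAdmin,SystemAdmin"
  pre_tasks:
    - name: Import filesystem information
      include_vars: storage_vars.yml
  roles:
    - core/precheck
    - core/node
    - core/cluster
    - core/postcheck
    - gui/precheck
    - gui/node
    - gui/cluster
    - gui/postcheck
    - zimon/precheck
    - zimon/node
    - zimon/cluster
    - zimon/postcheck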

include_vars: storage_vars.yml

  • One important part is that the scale_storage variable must be defined using group variables. Do not define disk parameters using host variables or inline variables in your playbook. Doing so would apply them to all hosts in the group/play, thus defining the same disk multiple times.
  • For the scale_storage variables, some parameters are mandatory and some are optional, as there are predefined values:
    - The filesystem: parameter is mandatory.
    - Under the disks parameter of each filesystem, it's mandatory to fill in device and servers.
    - All other filesystem and disk parameters are optional (but the defaults may not be what you want...).
  • In this example: local disks with replication.
    - Adjust the default replicas to 2, as we have 2 failure groups, and the max to 3, as we will add more servers/failure groups later.
    (In production this needs to be planned better.)
    - The disks/NSDs are local to each server: servers: scale*
    - Data and metadata are on the same disk: usage: dataAndMetadata
    - All disks/NSDs are used for the pool: system.
    - We put all NSDs/disks that are local to one server in the same failure group.

This is only to showcase what is possible with the Ansible roles, so please plan and investigate the node and storage options before running this in production.

Download a copy of storage_vars2.yml, change the content to your environment, and verify that the file name matches the one referenced in your playbook.

wget https://gist.githubusercontent.com/olemyk/0ee2e0f851693a43a658a319abce653b/raw/20611ab41628dc99c8db1a667d4015bbdd8e4803/storage_vars2.yml
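
The content of storage_vars2.yml looks roughly like this. A sketch only, assuming the scale_storage format from the role's documentation; verify against the gist:

# storage_vars2.yml (sketch) -- two local disks, one failure group per server
scale_storage:
  - filesystem: fs1
    blockSize: 4M
    defaultMetadataReplicas: 2
    defaultDataReplicas: 2
    maxMetadataReplicas: 3
    maxDataReplicas: 3
    disks:
      - device: /dev/sdb
        nsd: nsd_1
        servers: scale1
        failureGroup: 10
        usage: dataAndMetadata
        pool: system
      - device: /dev/sdb
        nsd: nsd_2
        servers: scale2
        failureGroup: 20
        usage: dataAndMetadata
        pool: system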

8. Run the playbook 🏃🚧

Run the playbook to install and configure the Spectrum Scale cluster.

To run the Ansible playbook, use the ansible-playbook command:

ansible-playbook -i hosts scale_playbook_part2.yml

When the play is done, you should hopefully have zero on failed=:

PLAY RECAP ********************************************************************
scale1 : ok=189 changed=33 unreachable=0 failed=0 skipped=216 rescued=0 ignored=0
scale2 : ok=100 changed=13 unreachable=0 failed=0 skipped=91 rescued=0 ignored=0

9. Spectrum Scale GUI

Now we can point our web browser at the IP or hostname of the Scale node, for example: https://scale1

You should be met with a screen saying "Welcome to IBM Spectrum Scale", and you can now log in with the user details you provided in the playbook.

Spectrum Scale Login Screen

After login, you should see 2 nodes and 2 NSDs

10. Spectrum Scale commands 🖥

To run the Spectrum Scale commands without typing the full path, you can add the path for your user with:

export PATH=$PATH:/usr/lpp/mmfs/bin

To make this permanent for your user, create a file called /etc/profile.d/scale.sh with the content:

$ vi /etc/profile.d/scale.sh

PATH=$PATH:/usr/lpp/mmfs/bin
export PATH

Run $ source /etc/profile.d/scale.sh to update your running environment, or log out and in again in your terminal.

FYI: Almost all of the Spectrum Scale commands start with "mm"
(a reference from the original product, multimedia).

11. Spectrum Scale Cluster and Filesystem info

Let us look into the NSDs and the filesystem: run mmlsnsd

Let us check the status of the NSDs/disks with mmlsdisk fs1

  • Here we can see that the disks are available and which failure groups they belong to.
[root@scale1 ~]# mmlsdisk fs1
disk         driver   sector     failure holds    holds                            storage
name         type       size       group metadata data  status        availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
nsd_1        nsd         512          10 yes      yes   ready         up           system
nsd_2        nsd         512          20 yes      yes   ready         up           system
nsd_3        nsd         512          30 yes      yes   ready         up           system

Let us check the replication of files. To do that, create a new file in the filesystem:

touch /gpfs/fs1/newfile

Then check the attributes of that file with the mmlsattr command.
In this case, we have 2 replicas.

[root@scale1 fs1]# mmlsattr -L newfile
file name: newfile
metadata replication: 2 max 3
data replication: 2 max 3
immutable: no
appendOnly: no
flags:
storage pool name: system
fileset name: root
snapshot name:
creation time: Thu Feb 24 15:58:13 2022
Misc attributes: ARCHIVE
Encrypted: no

Let us see what happens if we shut down one of the nodes.

[root@scale1 fs1]# mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------
      1       scale1     active
      2       scale2     active

[root@scale1 fs1]# mmshutdown -N scale2
Shutting down the following quorum nodes will cause the cluster to lose quorum:
scale2.oslo.forum.ibm.com
Do you wish to continue [yes/no]: yes
Thu Feb 24 16:02:56 CET 2022: mmshutdown: Starting force unmount of GPFS file systems
Thu Feb 24 16:03:01 CET 2022: mmshutdown: Shutting down GPFS daemons
Thu Feb 24 16:03:08 CET 2022: mmshutdown: Finished
[root@scale1 ~]# mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------
      1       scale1     arbitrating
      2       scale2     down
[root@scale1 ~]# mmstartup -a

And as you can see, the cluster loses quorum, as we don't have any third node to decide who should live.

12. Add the third node to the cluster

To showcase adding nodes, let us do this now.

Edit the Ansible hosts file and add the third node.
To showcase this with a dual GUI, you can also add the variable scale_cluster_gui=true on the third node.

Remember to also add the local NSD for server 3 and adjust the replication from the default of 2 to 3 in storage_vars2.yml (a sketch of the changes follows below).
Your data will then be located in each failure group, since we have 3 failure groups and the replica count is 3.
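
The changes to storage_vars2.yml could look roughly like this (a sketch; the existing nsd_1 and nsd_2 entries stay unchanged):

scale_storage:
  - filesystem: fs1
    defaultMetadataReplicas: 3   # was 2
    defaultDataReplicas: 3       # was 2
    maxMetadataReplicas: 3
    maxDataReplicas: 3
    disks:
      # ...existing nsd_1 and nsd_2 entries...
      - device: /dev/sdb
        nsd: nsd_3
        servers: scale3
        failureGroup: 30
        usage: dataAndMetadata
        pool: system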

Rerun the playbook. 🏃

ansible-playbook -i hosts scale_playbook_part2.yml

After the run is finished, we can see that the recap is now containing the scale3 node.

PLAY RECAP *******************************************************************
scale1 : ok=173 changed=5 unreachable=0 failed=0 skipped=238 rescued=0 ignored=0
scale2 : ok=99 changed=0 unreachable=0 failed=0 skipped=100 rescued=0 ignored=0
scale3 : ok=111 changed=0 unreachable=0 failed=0 skipped=88 rescued=0 ignored=0

Check the newly added node.

Let’s log in and check the cluster.
Now we can see that scale3 is added to the cluster.

[root@scale1 ~]# mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------
      1       scale1     active
      2       scale2     active
      3       scale3     active

And we can see that the drives/NSDs are attached.

After we change the replication, we will get a warning at the end of the mmlsdisk command:

Attention: Due to an earlier configuration change the file system
is no longer properly replicated.

To fix this, we need to restripe the filesystem with mmrestripefs fs1 -R (the -R option rewrites the data to match the current replication settings).

Let us check the replication, create a new file and check the replica level.

[root@scale1 fs1]# mmlsattr -L newfile3
file name: newfile3
metadata replication: 3 max 3
data replication: 3 max 3

13. Adding more capacity to FS with new NSDs.

Now we have 3 NSDs, one per server, each 25 GB, which gives 75 GB of unreplicated capacity.

Let us then add one more NSD to each server and failure group.

  • We are currently using sdb.
[root@scale2 ~]# lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0  83G  0 disk
sdb      8:16   0  25G  0 disk
└─sdb1   8:17   0  25G  0 part

After adding one more data disk in the VMs, now we have the sdc.

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0  83G  0 disk
sdb      8:16   0  25G  0 disk
└─sdb1   8:17   0  25G  0 part
sdc      8:32   0  25G  0 disk
  • Let us then change storage_vars.yml and add the new disks.
    As you can see, we add 3 more disks, one per server, named nsd_x_2 (a sketch follows below).
wget https://gist.githubusercontent.com/olemyk/3286017b407e9d52533e6eea4f5373f0/raw/85a26b563001040115e7c83f7fe9c685c5fa7ecd/storage_vars_6disks.yml

Make a local copy and change it to your environment.
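
The added disk entries could look roughly like this (a sketch; devices and NSD names taken from the lsblk and mmlsnsd output in this article):

      - device: /dev/sdc
        nsd: nsd_1_2
        servers: scale1
        failureGroup: 10
        usage: dataAndMetadata
        pool: system
      # ...and likewise nsd_2_2 on scale2 (failureGroup 20)
      # and nsd_3_2 on scale3 (failureGroup 30)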

  • Rerun the playbook, with the new storage_vars file. 🏃
ansible-playbook -i hosts scale_playbook_part2.yml

And there we have more NSDs in the filesystem.

[root@scale1 ~]# mmlsnsd

 File system   Disk name    NSD servers
--------------------------------------------------------------------
 fs1           nsd_1        scale1.oslo.forum.ibm.com
 fs1           nsd_2        scale2.oslo.forum.ibm.com
 fs1           nsd_3        scale3.oslo.forum.ibm.com
 fs1           nsd_1_2      scale1.oslo.forum.ibm.com
 fs1           nsd_2_2      scale2.oslo.forum.ibm.com
 fs1           nsd_3_2      scale3.oslo.forum.ibm.com

[root@scale1 ~]# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
fs1         150G  4.3G  146G     3%  /gpfs/fs1

New filesystem: if you want one more filesystem, you can just copy the whole filesystem section, add it to the same storage_vars.yml file (or a new one), and change the names and devices, for example:
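
A sketch, assuming a new device /dev/sdd and the filesystem name fs2 (both made up for illustration):

scale_storage:
  - filesystem: fs1
    # ...existing definition...
  - filesystem: fs2            # assumed name
    defaultMetadataReplicas: 3
    defaultDataReplicas: 3
    disks:
      - device: /dev/sdd       # assumed new device
        nsd: fs2_nsd_1
        servers: scale1
        failureGroup: 10
      # ...corresponding entries for scale2 and scale3...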

FYI: This is only to showcase what is possible with the Ansible roles, so please plan and investigate the node, cluster, and storage options before running this in production.

In the next part! ⏭

In this part, we deployed a multi-node cluster with local disks (nothing shared).
We added a new node to the cluster, changed the replication for the filesystem, and expanded the existing filesystem.

In the next part, we will look into how to create a "remote" Spectrum Scale cluster, where access to the filesystems is via IP or FC (remote mounted filesystem).
