Multipath iSCSI in Azure with GlusterFS and gluster-block on RHEL

Alessandro Vozza · Cooking with Azure · Apr 10, 2018

As part of my ongoing effort to get DB2 pureScale running in Azure over distributed, scalable block storage, I picked up the challenge of setting up a 3-node GlusterFS cluster exporting 4 LUNs via 3 iSCSI targets (the Gluster nodes themselves). All the following scripts are in this repo on GitHub. Let’s dive into it:

First, as always, we’ll need a resource group to keep all resources together:

rg=gluster-iscsi
az group create -n $rg --location westeurope

I’m going to create a vnet with two subnets; I want Gluster replication to happen over one subnet/network interface, and the communication between the clients and the iSCSI targets (the Gluster nodes) to go over another interface/subnet.

az network vnet create \
--resource-group $rg \
--name gluster \
--address-prefix 192.168.0.0/16 \
--subnet-name client \
--subnet-prefix 192.168.1.0/24
az network vnet subnet create \
--resource-group $rg \
--vnet-name gluster \
--name backend \
--address-prefix 192.168.2.0/24

Now we need a network security group:

az network nsg create \
--resource-group $rg \
--name gluster-nsg
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-ssh --description "SSHDB2" --protocol tcp --priority 101 --destination-port-range "22"
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-iscsi --description "iSCSI" --protocol tcp --priority 201 --destination-port-range "3260"
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-gluster-bricks --description "Gluster-bricks" --protocol tcp --priority 202 --destination-port-range "49152-49160"
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-gluster-daemon --description "Gluster-daemon" --protocol "*" --priority 203 --destination-port-range "24007-24010"
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-rpcbind --description "RPCbind" --protocol "*" --priority 204 --destination-port-range "111"
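
If you want to double-check that all five rules landed in the NSG:

az network nsg rule list -g $rg --nsg-name gluster-nsg -o table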

You could get fancy and apply different NSGs to different network interfaces, but we’ll keep it simple and move on to create six network interfaces (with accelerated networking, you must pre-create the NICs before the VMs):

az network nic create --resource-group $rg --name g1-client --vnet-name gluster --subnet client --network-security-group gluster-nsg --private-ip-address 192.168.1.10 --accelerated-networking true
az network nic create --resource-group $rg --name g1-backend --vnet-name gluster --subnet backend --network-security-group gluster-nsg --private-ip-address 192.168.2.10 --accelerated-networking true
az network nic create --resource-group $rg --name g2-client --vnet-name gluster --subnet client --network-security-group gluster-nsg --private-ip-address 192.168.1.11 --accelerated-networking true
az network nic create --resource-group $rg --name g2-backend --vnet-name gluster --subnet backend --network-security-group gluster-nsg --private-ip-address 192.168.2.11 --accelerated-networking true
az network nic create --resource-group $rg --name g3-client --vnet-name gluster --subnet client --network-security-group gluster-nsg --private-ip-address 192.168.1.12 --accelerated-networking true
az network nic create --resource-group $rg --name g3-backend --vnet-name gluster --subnet backend --network-security-group gluster-nsg --private-ip-address 192.168.2.12 --accelerated-networking true

Finally, let’s create some VMs:

az vm create --resource-group $rg --name g1 --image RedHat:RHEL:7-RAW-CI:latest --size Standard_DS3_v2 --admin-username rhel --nics g1-client g1-backend --data-disk-sizes-gb 1000 1000 --no-wait --custom-data install_config_gluster.sh
az vm create --resource-group $rg --name g2 --image RedHat:RHEL:7-RAW-CI:latest --size Standard_DS3_v2 --admin-username rhel --nics g2-client g2-backend --data-disk-sizes-gb 1000 1000 --no-wait --custom-data install_config_gluster.sh
az vm create --resource-group $rg --name g3 --image RedHat:RHEL:7-RAW-CI:latest --size Standard_DS3_v2 --admin-username rhel --nics g3-client g3-backend --data-disk-sizes-gb 1000 1000 --no-wait --custom-data install_config_gluster.sh

The magic of cloud-init and the install_config_gluster.sh script will do the rest (namely: enable the second network interface, prepare the two data disks with LVM, and install and start both the gluster and gluster-block services).
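
For reference, here’s a minimal sketch of what such a script needs to do on each node; the real install_config_gluster.sh in the repo is the authoritative version, and the data-disk device names (/dev/sdc, /dev/sdd), the volume group name and the package repository setup below are assumptions on my part:

#!/bin/bash
# Sketch only -- see install_config_gluster.sh in the repo for the real thing.

# Bring up the second (backend) NIC, which is not configured by default
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<EOF
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=dhcp
EOF
ifup eth1

# Prepare the two data disks with LVM and mount the result as the brick
pvcreate /dev/sdc /dev/sdd        # assumed device names for the two data disks
vgcreate vg_bricks /dev/sdc /dev/sdd
lvcreate -l 100%FREE -n db2data vg_bricks
mkfs.xfs /dev/vg_bricks/db2data
mkdir -p /bricks/db2data
mount /dev/vg_bricks/db2data /bricks/db2data
mkdir -p /bricks/db2data/db2data  # brick directory used when creating the volume

# Install and start gluster and gluster-block
# (assumes a repository providing these packages is already enabled)
yum -y install glusterfs-server gluster-block
systemctl enable glusterd gluster-blockd
systemctl start glusterd gluster-blockd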

At the end of the process, you’ll end up with 3 VMs ready for the Gluster+iSCSI setup.
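
One way to check that the three VMs are up and to grab their public IPs for SSH:

az vm list -g $rg -d --query "[].{name:name, ip:publicIps, state:powerState}" -o table
ssh rhel@<public-ip-of-g1>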

Execute the following on one node only! First, let’s cluster the nodes (assuming you’re running this on g1):

g1#> gluster peer probe g2b
g1#> gluster peer probe g3b
g1#> gluster pool list
UUID Hostname State
ede83b7b-f6ca-4d25-aec9-5746e59ee487 g2b Connected
ed23527b-2c7f-4d88-92cf-83d182578d66 g3b Connected
1fe6b392-f63a-4b63-801f-dd2a309306d7 localhost Connected
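
The g1b/g2b/g3b names resolve to the backend addresses. If you’re not relying on DNS, an /etc/hosts snippet on every node along these lines does the job (the mapping below is just my assumption, following the private IPs we assigned to the NICs earlier):

# backend (replication) interfaces
192.168.2.10 g1b
192.168.2.11 g2b
192.168.2.12 g3b
# client-facing interfaces
192.168.1.10 g1
192.168.1.11 g2
192.168.1.12 g3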

And create the volume:

g1#> gluster volume create db2data replica 3 g1b:/bricks/db2data/db2data g2b:/bricks/db2data/db2data g3b:/bricks/db2data/db2data
volume create: db2data: success: please start the volume to access data
g1#> gluster volume start db2data
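
A quick sanity check that the replica 3 volume exists and all three bricks are online:

g1#> gluster volume info db2data
g1#> gluster volume status db2data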

We will now mount the volume locally; targetcli will create two subfolders, block-meta and block-data, to hold the metadata and the actual disk files for the exported iSCSI LUNs:

g1#> mkdir -p /db2/data
g1#> mount -t glusterfs g1b:/db2data /db2/data/

Final step: create the LUNs with the gluster-block command (note the use of IP addresses from the client subnet; this way the targets are exposed over that network, keeping Gluster replication and client traffic separated for both security and performance; it took me a while and a GitHub issue to figure that out):

g1#> gluster-block create db2data/data ha 3 192.168.1.10,192.168.1.11,192.168.1.12 2480GiB
g1#> gluster-block create db2data/quorum ha 3 192.168.1.10,192.168.1.11,192.168.1.12 10GiB
g1#> gluster-block create db2data/log ha 3 192.168.1.10,192.168.1.11,192.168.1.12 500GiB
g1#> gluster-block create db2data/shared ha 3 192.168.1.10,192.168.1.11,192.168.1.12 10GiB
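
gluster-block can also list the block devices carved out of a volume and show the details (size, HA count, exporting nodes) of each one, which is handy to verify before touching any client:

g1#> gluster-block list db2data
g1#> gluster-block info db2data/data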

Lo and behold, the magic of multipath iSCSI targets:

I omitted the other 3 TPGs for brevity. Let’s set up a client to test the multipath devices.
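
Before moving to the client: if you want to inspect what gluster-block configured under the hood, targetcli on any of the Gluster nodes shows the iSCSI targets with their TPGs and portals:

g1#> targetcli ls /iscsi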

On a client machine (also RHEL 7.4, with only one NIC in the client subnet), install the necessary tools (check out this script):

#> yum -y install device-mapper-multipath iscsi-initiator-utils
#> modprobe dm_multipath
#> cat >> /etc/multipath.conf <<EOF
# LIO iSCSI
devices {
    device {
        vendor "LIO-ORG"
        user_friendly_names "yes" # names like mpatha
        path_grouping_policy "failover" # one path per group
        path_selector "round-robin 0"
        path_checker "tur"
        prio "const"
        rr_weight "uniform"
    }
}
EOF
#> systemctl start multipathd
#> systemctl enable multipathd

We just need to discover one target, and the rest will follow:

#> iscsiadm -m discovery --type sendtargets --portal 192.168.1.10 -l

This will log into each target and discover multiple paths to the same device:

#> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 32G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 31.5G 0 part /
sdb 8:16 0 14G 0 disk
└─sdb1 8:17 0 14G 0 part /mnt
sdc 8:32 0 10G 0 disk
└─mpathc 253:0 0 10G 0 mpath
sdd 8:48 0 2.4T 0 disk
└─mpathf 253:3 0 2.4T 0 mpath
sde 8:64 0 10G 0 disk
└─mpathd 253:1 0 10G 0 mpath
sdf 8:80 0 10G 0 disk
└─mpathc 253:0 0 10G 0 mpath
sdg 8:96 0 2.4T 0 disk
└─mpathf 253:3 0 2.4T 0 mpath
sdh 8:112 0 500G 0 disk
└─mpathe 253:2 0 500G 0 mpath
sdi 8:128 0 2.4T 0 disk
└─mpathf 253:3 0 2.4T 0 mpath
sdj 8:144 0 10G 0 disk
└─mpathd 253:1 0 10G 0 mpath
sdk 8:160 0 10G 0 disk
└─mpathc 253:0 0 10G 0 mpath
sdl 8:176 0 10G 0 disk
└─mpathd 253:1 0 10G 0 mpath
sdm 8:192 0 500G 0 disk
└─mpathe 253:2 0 500G 0 mpath
sdn 8:208 0 500G 0 disk
└─mpathe 253:2 0 500G 0 mpath
sr0 11:0 1 628K 0 rom
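
To confirm that the client logged into all three portals and that multipathd collapsed the duplicate disks into single devices (each mpath device above should show one path per Gluster node):

#> iscsiadm -m session
#> multipath -ll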

Now we just need to create a filesystem on each device and mount it:

#> for device in {a,b,c,d}; do mkfs.xfs /dev/mapper/mpath$device; done
#> mkdir -p /db2/{data,quorum,shared,logs}
#> mount /dev/mapper/mpatha /db2/data
#> mount /dev/mapper/mpathb /db2/quorum
#> mount /dev/mapper/mpathc /db2/shared
#> mount /dev/mapper/mpathd /db2/logs
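
To make the mounts survive a reboot you’d add them to /etc/fstab with the _netdev option, so they’re only mounted once the network and the iSCSI sessions are up. Bear in mind that user-friendly mpath names aren’t guaranteed to be stable (and, as the lsblk output above shows, may not even be mpatha-mpathd on your client), so aliasing on WWIDs in /etc/multipath.conf is the more robust route. A sketch, assuming the names used above:

# /etc/fstab -- illustrative only, adjust the device names to your system
/dev/mapper/mpatha /db2/data xfs _netdev 0 0
/dev/mapper/mpathb /db2/quorum xfs _netdev 0 0
/dev/mapper/mpathc /db2/shared xfs _netdev 0 0
/dev/mapper/mpathd /db2/logs xfs _netdev 0 0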

Done!
