Multipath iSCSI in Azure with GlusterFS and gluster-block on RHEL

As part of my ongoing effort to get DB2 pureScale running in Azure on top of distributed, scalable (block) storage, I picked up the challenge of setting up a 3-node GlusterFS cluster exporting 4 LUNs via 3 iSCSI targets (the Gluster nodes themselves). All the following scripts are in this repo on GitHub. Let’s dive into it:

First, as always, we’ll need a resource group to keep all resources together:

rg=gluster-iscsi
az group create -n $rg --location westeurope

I’m going to create a vnet with two subnets: I want the Gluster replication to happen over one subnet/network interface, and the communication between the clients and the iSCSI targets (the Gluster nodes) over the other interface/subnet.

az network vnet create \
--resource-group $rg \
--name gluster \
--address-prefix 192.168.0.0/16 \
--subnet-name client \
--subnet-prefix 192.168.1.0/24
az network vnet subnet create \
--resource-group $rg \
--vnet-name gluster \
--name backend \
--address-prefix 192.168.2.0/24

Now we need a network security group:

az network nsg create \
--resource-group $rg \
--name gluster-nsg
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-ssh --description "SSH" --protocol tcp --priority 101 --destination-port-range "22"
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-iscsi --description "iSCSI" --protocol tcp --priority 201 --destination-port-range "3260"
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-gluster-bricks --description "Gluster-bricks" --protocol tcp --priority 202 --destination-port-range "49152-49160"
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-gluster-daemon --description "Gluster-daemon" --protocol "*" --priority 203 --destination-port-range "24007-24010"
az network nsg rule create --nsg-name gluster-nsg -g $rg --name allow-rpcbind --description "RPCbind" --protocol "*" --priority 204 --destination-port-range "111"

You could get fancy and apply different NSGs to different network interfaces, but we’ll keep it simple and move on to create six network interfaces (with accelerated networking, the NICs must be created before the VMs):

az network nic create --resource-group $rg --name g1-client --vnet-name gluster --subnet client --network-security-group gluster-nsg --private-ip-address 192.168.1.10 --accelerated-networking true
az network nic create --resource-group $rg --name g1-backend --vnet-name gluster --subnet backend --network-security-group gluster-nsg --private-ip-address 192.168.2.10 --accelerated-networking true
az network nic create --resource-group $rg --name g2-client --vnet-name gluster --subnet client --network-security-group gluster-nsg --private-ip-address 192.168.1.11 --accelerated-networking true
az network nic create --resource-group $rg --name g2-backend --vnet-name gluster --subnet backend --network-security-group gluster-nsg --private-ip-address 192.168.2.11 --accelerated-networking true
az network nic create --resource-group $rg --name g3-client --vnet-name gluster --subnet client --network-security-group gluster-nsg --private-ip-address 192.168.1.12 --accelerated-networking true
az network nic create --resource-group $rg --name g3-backend --vnet-name gluster --subnet backend --network-security-group gluster-nsg --private-ip-address 192.168.2.12 --accelerated-networking true
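
If you prefer, those six nearly identical commands can be generated with a small loop (same parameters as above, just templated on the node number):

for n in 1 2 3; do
az network nic create --resource-group $rg --name g$n-client --vnet-name gluster --subnet client --network-security-group gluster-nsg --private-ip-address 192.168.1.$((9+n)) --accelerated-networking true
az network nic create --resource-group $rg --name g$n-backend --vnet-name gluster --subnet backend --network-security-group gluster-nsg --private-ip-address 192.168.2.$((9+n)) --accelerated-networking true
done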

Finally, let’s create some VMs:

az vm create --resource-group $rg --name g1 --image RedHat:RHEL:7-RAW-CI:latest --size Standard_DS3_v2 --admin-username rhel --nics g1-client g1-backend --data-disk-sizes-gb 1000 1000 --no-wait --custom-data install_config_gluster.sh
az vm create --resource-group $rg --name g2 --image RedHat:RHEL:7-RAW-CI:latest --size Standard_DS3_v2 --admin-username rhel --nics g2-client g2-backend --data-disk-sizes-gb 1000 1000 --no-wait --custom-data install_config_gluster.sh
az vm create --resource-group $rg --name g3 --image RedHat:RHEL:7-RAW-CI:latest --size Standard_DS3_v2 --admin-username rhel --nics g3-client g3-backend --data-disk-sizes-gb 1000 1000 --no-wait --custom-data install_config_gluster.sh

The magic of cloud-init and the install_config_gluster.sh script will do the rest (namely: enable the second network interface, prepare the two data disks with LVM, and install and start both the gluster and gluster-block system services).
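
The real script is in the repo linked above, but as a rough sketch of those steps it could look something like this (the device names /dev/sdc and /dev/sdd and the g1b/g2b/g3b host entries are my assumptions here, so check them against your environment):

#!/bin/bash
# Hypothetical sketch -- the actual install_config_gluster.sh is in the repo above.
# Enable the second (backend) NIC, which RHEL leaves unconfigured by default:
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<EOF
DEVICE=eth1
BOOTPROTO=dhcp
ONBOOT=yes
EOF
ifup eth1
# Make the backend hostnames used later (g1b, g2b, g3b) resolvable:
cat >> /etc/hosts <<EOF
192.168.2.10 g1b
192.168.2.11 g2b
192.168.2.12 g3b
EOF
# Prepare the two data disks with LVM and mount the result as the brick
# (on Azure the two data disks typically appear as /dev/sdc and /dev/sdd):
pvcreate /dev/sdc /dev/sdd
vgcreate vg_bricks /dev/sdc /dev/sdd
lvcreate -l 100%FREE -n db2data vg_bricks
mkfs.xfs /dev/vg_bricks/db2data
mkdir -p /bricks/db2data
echo "/dev/vg_bricks/db2data /bricks/db2data xfs defaults 0 0" >> /etc/fstab
mount /bricks/db2data
# Install and start gluster and gluster-block (assumes the needed repos are enabled):
yum -y install glusterfs-server gluster-block
systemctl enable --now glusterd gluster-blockd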

At the end of the process, you’ll end up with 3 VMs ready for the Gluster+iSCSI setup.
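
Since the VMs were created with --no-wait, give them a minute and check the provisioning and power state before continuing:

az vm list -d -g $rg -o table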

You must execute the following on one node only! First, let’s cluster the nodes (assuming you’re running this on g1):

g1#> gluster peer probe g2b
g1#> gluster peer probe g3b
g1#> gluster pool list
UUID Hostname State
ede83b7b-f6ca-4d25-aec9-5746e59ee487 g2b Connected
ed23527b-2c7f-4d88-92cf-83d182578d66 g3b Connected
1fe6b392-f63a-4b63-801f-dd2a309306d7 localhost Connected

And create the volume:

g1#> gluster volume create db2data replica 3 g1b:/bricks/db2data/db2data g2b:/bricks/db2data/db2data g3b:/bricks/db2data/db2data
volume create: db2data: success: please start the volume to access data
g1#> gluster volume start db2data
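
Before moving on, it’s worth checking that the replica-3 volume is up and all three bricks are online:

g1#> gluster volume info db2data
g1#> gluster volume status db2data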

We will now mount the volume locally; targetcli will create two subfolders, block-meta and block-data, to hold the metadata and the actual disk files for the exported iSCSI LUNs:

g1#> mkdir -p /db2/data
g1#> mount -t glusterfs g1b:/db2data /db2/data/
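
If you want this mount to survive a reboot, an fstab entry along these lines (with _netdev, so it waits for the network) should do the trick:

g1#> echo "g1b:/db2data /db2/data glusterfs defaults,_netdev 0 0" >> /etc/fstab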

Final step: create the LUNs with the gluster-block command. Note the use of the IP addresses from the client subnet: this way the targets are exposed over that network, keeping Gluster replication and client traffic separated for both security and performance (it took me a while and a GitHub issue to figure that out):

g1#> gluster-block create db2data/data ha 3 192.168.1.10,192.168.1.11,192.168.1.12 2480GiB
g1#> gluster-block create db2data/quorum ha 3 192.168.1.10,192.168.1.11,192.168.1.12 10GiB
g1#> gluster-block create db2data/log ha 3 192.168.1.10,192.168.1.11,192.168.1.12 500GiB
g1#> gluster-block create db2data/shared ha 3 192.168.1.10,192.168.1.11,192.168.1.12 10GiB
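
gluster-block can show you what it just created: list enumerates the block devices on a volume, and info prints the size, HA count and exporting portals for a single one:

g1#> gluster-block list db2data
g1#> gluster-block info db2data/data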

Lo and behold, the magic of multipath iSCSI targets: every LUN is now exported as its own iSCSI target, reachable through a portal on each of the three nodes.
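
You can take a read-only look at what gluster-block configured in LIO on any of the nodes (gluster-block drives targetcli under the hood, so the usual targetcli CLI works):

g1#> targetcli ls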

I’ve omitted the output for brevity; each target should show one TPG per node. Let’s set up a client to test the multipath devices.

On a client machine (also RHEL 7.4, with only one NIC in the client subnet), install the necessary tools (check this script out):

#> yum -y install device-mapper-multipath iscsi-initiator-utils
#> modprobe dm_multipath
#> cat >> /etc/multipath.conf <<EOF
# LIO iSCSI
devices {
    device {
        vendor "LIO-ORG"
        user_friendly_names "yes" # names like mpatha
        path_grouping_policy "failover" # one path per group
        path_selector "round-robin 0"
        path_checker "tur"
        prio "const"
        rr_weight "uniform"
    }
}
EOF
#> systemctl start multipathd
#> systemctl enable multipathd

We just need to discover one target, and the rest will follow:

#> iscsiadm -m discovery --type sendtargets --portal 192.168.1.10 -l

This will log into each target and discover multiple paths to the same device:

#> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 32G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 31.5G 0 part /
sdb 8:16 0 14G 0 disk
└─sdb1 8:17 0 14G 0 part /mnt
sdc 8:32 0 10G 0 disk
└─mpathc 253:0 0 10G 0 mpath
sdd 8:48 0 2.4T 0 disk
└─mpathf 253:3 0 2.4T 0 mpath
sde 8:64 0 10G 0 disk
└─mpathd 253:1 0 10G 0 mpath
sdf 8:80 0 10G 0 disk
└─mpathc 253:0 0 10G 0 mpath
sdg 8:96 0 2.4T 0 disk
└─mpathf 253:3 0 2.4T 0 mpath
sdh 8:112 0 500G 0 disk
└─mpathe 253:2 0 500G 0 mpath
sdi 8:128 0 2.4T 0 disk
└─mpathf 253:3 0 2.4T 0 mpath
sdj 8:144 0 10G 0 disk
└─mpathd 253:1 0 10G 0 mpath
sdk 8:160 0 10G 0 disk
└─mpathc 253:0 0 10G 0 mpath
sdl 8:176 0 10G 0 disk
└─mpathd 253:1 0 10G 0 mpath
sdm 8:192 0 500G 0 disk
└─mpathe 253:2 0 500G 0 mpath
sdn 8:208 0 500G 0 disk
└─mpathe 253:2 0 500G 0 mpath
sr0 11:0 1 628K 0 rom
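
Each LUN shows up three times (once per portal) and dm-multipath folds the three paths into a single mpathX device. You can inspect the path-to-LUN mapping, and make sure the iSCSI logins persist across reboots (node.startup should already default to automatic on RHEL, but it doesn’t hurt to enforce it):

#> multipath -ll
#> iscsiadm -m node --op update -n node.startup -v automatic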

Now we just need to create a filesystem on each device and mount it. One caveat: the mpathX letters are assigned in discovery order and may differ on your client (the lsblk output above, for instance, shows mpathc through mpathf), so match the devices by size before formatting:

for device in {a,b,c,d}; do mkfs.xfs /dev/mapper/mpath$device; done
mkdir -p /db2/{data,quorum,shared,logs}
mount /dev/mapper/mpatha /db2/data
mount /dev/mapper/mpathb /db2/quorum
mount /dev/mapper/mpathc /db2/shared
mount /dev/mapper/mpathd /db2/logs
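
To make the client mounts reboot-proof as well, fstab entries with _netdev should work. Since the friendly names aren’t guaranteed to be stable, double-check them with multipath -ll first (or use WWIDs instead); a sketch with the names used above:

#> cat >> /etc/fstab <<EOF
/dev/mapper/mpatha /db2/data xfs _netdev 0 0
/dev/mapper/mpathb /db2/quorum xfs _netdev 0 0
/dev/mapper/mpathc /db2/shared xfs _netdev 0 0
/dev/mapper/mpathd /db2/logs xfs _netdev 0 0
EOF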

Done!

