Using Ceph Distributed Storage Cluster on Oracle Cloud Infrastructure

This tutorial describes how to deploy a Ceph Distributed Storage Cluster on Oracle Cloud Infrastructure (OCI) using Oracle Linux.

Ceph is fully supported on Oracle Linux as described in the public documentation.

As an additional configuration option on OCI, you can either keep all Ceph resources in a single availability domain (AD) or spread them across all ADs. Depending on network traffic utilization, it might be better to keep all resources in a single AD, or to distribute them across availability domains for fault tolerance.

Here is an example of a Ceph Distributed Storage Cluster architecture that can be used on OCI.

What is Ceph Distributed Storage Cluster?

Ceph is a widely used open source storage platform that provides high performance, reliability, and scalability. This free distributed storage system offers interfaces for object, block, and file-level storage.

In this tutorial, you will install and build a Ceph cluster on Oracle Linux 7.X with the following components:

  • Ceph OSDs (ceph-osd) — Handle the data store, data replication, and recovery. A Ceph cluster needs at least two Ceph OSD servers, which in this tutorial are based on Oracle Linux.
  • Ceph Monitor (ceph-mon) — Monitors the cluster state, the OSD map, and the CRUSH map.
  • Ceph Metadata Server (ceph-mds) — Needed only if you want to use Ceph as a file system.

Additional details can be found in the Ceph public documentation, and it is important to understand these components before proceeding with the initial configuration.

Environment

  • 5 server nodes, all with Oracle Linux 7.X installed.
  • Root privileges on all nodes.

Getting Started

First of all, this tutorial is based on the “How To Build a Ceph Distributed Storage Cluster on CentOS 7” article. Secondly, for this setup you will need to provision at least five Oracle Bare Metal instances, as described in the environment section above, along with two additional block storage volumes that will be used later as Ceph data storage. After attaching a block storage volume to each OSD instance, run the iSCSI commands provided by the console so the operating system can see the volume (see the example below). Review the “Adding a Block Volume” section of the Bare Metal public documentation for more details, and pay special attention to the “_netdev” and “nofail” fstab options in case you need to use them.
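
The exact iSCSI attachment commands, including your volume’s IQN and portal IP address, are displayed in the OCI console once the volume is attached; they generally follow the pattern below, where <volume-IQN> and <iSCSI-IP> are placeholders for the values shown for your volume.

$ sudo iscsiadm -m node -o new -T <volume-IQN> -p <iSCSI-IP>:3260
$ sudo iscsiadm -m node -o update -T <volume-IQN> -n node.startup -v automatic
$ sudo iscsiadm -m node -T <volume-IQN> -p <iSCSI-IP>:3260 -l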

Ceph Configuration

In this step, you will configure all five nodes to prepare them for the installation of the Ceph cluster. Run all of the commands below on every node.

Create a Ceph User

Create a new user named ‘cephuser’ on all nodes.

$ sudo useradd -d /home/cephuser -m cephuser
$ sudo passwd cephuser

After creating the new user, you need to configure sudo for ‘cephuser’. The user must be able to run commands as root and obtain root privileges without a password.

Run the command below to edit the /etc/sudoers file and grant those privileges.

$ sudo visudo

Add the following at the bottom of the file

##cephuser sudo permissions
cephuser ALL = (root) NOPASSWD:ALL

Save and exit vim.
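
Alternatively, if you prefer a non-interactive approach, you can create a drop-in file under /etc/sudoers.d instead of editing /etc/sudoers with visudo. A minimal sketch:

$ echo "cephuser ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephuser
$ sudo chmod 0440 /etc/sudoers.d/cephuser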

Install and Configure NTP

Install NTP to synchronize date and time on all nodes. Run the ntpdate command to set the date and time via the NTP protocol; this tutorial uses the US pool NTP servers. Then start and enable the NTP service to run at boot time.

$ sudo yum install -y ntp ntpdate ntp-doc
$ sudo ntpdate 0.us.pool.ntp.org
$ sudo hwclock --systohc
$ sudo systemctl enable ntpd.service
$ sudo systemctl start ntpd.service

Disable SELinux

Disable SELinux on all nodes by editing the SELinux configuration file with the sed stream editor.

$ sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
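
The change in /etc/selinux/config takes effect at the next reboot. If you also want to switch SELinux to permissive mode immediately, without rebooting, you can additionally run:

$ sudo setenforce 0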

DNS Configuration

On OCI, internal DNS should work without any additional configuration. Test the network connectivity to make sure your Ceph hosts can ping each other as shown below.
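
For example, from the ceph-admin node you can check every host in one go (the hostnames below match the naming used in this tutorial):

$ for host in mon1 osd1 osd2 client; do ping -c 2 $host; done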

In case of failure, modify your Bare Metal Security Lists to allow internal communication. Edit the VCN Security List and open all ports for the Bare Metal internal network (NOT the public network), as shown below for network 172.0.0.0/16:

Source: 172.0.0.0/16

IP Protocol: All Protocols

Allows: all traffic for all ports

Alternatively, open only the specific ports needed.

Configure the SSH Server

In this step, you will configure the ceph-admin node. Log in to the ceph-admin node and become ‘cephuser’.

$ ssh -i public_ssh_key opc@ceph-admin-bmcs-public_IP
$ sudo su - cephuser

The admin node is used for installing and configuring all cluster nodes, so the user on the ceph-admin node must have privileges to connect to all nodes without a password.

You have to configure password-less SSH access for ‘cephuser’ from the ‘ceph-admin’ node. By default, on OCI only the opc user has a key to authenticate on the machines, so you will need to make an exception for the cephuser account so that password authentication (and ssh-copy-id) works for it. To do that, add the following at the bottom of the /etc/ssh/sshd_config file.

## adding a ssh password authentication permission to cephuser
Match User cephuser
PasswordAuthentication yes

Save and exit vim. Restart sshd to apply the change.

$ sudo service sshd restart

Now you’re ready to proceed with the next step. Generate the ssh keys for ‘cephuser’.
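
For example, as cephuser on the ceph-admin node:

$ ssh-keygen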

When prompted, leave the passphrase blank/empty.

Next, create the SSH client configuration file at ~/.ssh/config.

Paste the configuration below:

Host ceph-admin
 Hostname ceph-admin
 User cephuser
 
Host mon1
 Hostname mon1
 User cephuser
 
Host osd1
 Hostname osd1
 User cephuser
 
Host osd2
 Hostname osd2
 User cephuser
 
Host client
 Hostname client
 User cephuser

Save the file and change the permission of the config file.

$ chmod 644 ~/.ssh/config

Now add the SSH key to all nodes with the ssh-copy-id command.

$ ssh-keyscan osd1 osd2 mon1 client >> ~/.ssh/known_hosts
$ ssh-copy-id osd1
$ ssh-copy-id osd2
$ ssh-copy-id mon1
$ ssh-copy-id client

When you are finished, try to access the osd1 server from the ceph-admin node.
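
If the password-less setup worked, you should get a cephuser shell on osd1 without being prompted for a password, for example:

$ ssh osd1
[cephuser@osd1 ~]$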

Configure Firewalld

You will use Firewalld to protect the system. In this step, you will open the ports needed by ceph-admin, ceph-mon and ceph-osd.

Connect to the ceph-admin node and open ports 80, 2003 and 4505-4506, then reload the firewall as shown below.

$ sudo firewall-cmd --zone=public --add-port=80/tcp --permanent
$ sudo firewall-cmd --zone=public --add-port=2003/tcp --permanent
$ sudo firewall-cmd --zone=public --add-port=4505-4506/tcp --permanent
$ sudo firewall-cmd --reload

From the ceph-admin node, log in to the monitor node ‘mon1’ and open a new port on the Ceph monitor node, then reload the firewall.

$ sudo firewall-cmd --zone=public --add-port=6789/tcp --permanent
$ sudo firewall-cmd --reload

Finally, open ports 6800-7300 on each of the OSD nodes, osd1 and osd2. You can either log in to each OSD node from the ceph-admin node or connect through the Bare Metal Instance public IP if you’re using the public network.

$ sudo firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
$ sudo firewall-cmd --reload

Configure the Ceph OSD Nodes

In this tutorial, there are two OSD nodes, and each node has two disks:

  • /dev/sda for the root partition.
  • /dev/sdb, an empty 50 GB disk (the attached iSCSI block storage volume).

You will use /dev/sdb for the Ceph disk. From the ceph-admin node, log in to each OSD node and format the /dev/sdb disk with XFS.

$ sudo parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100%
$ sudo mkfs.xfs /dev/sdb -f

Now check the disk, and you should see that /dev/sdb has an XFS file system.

$ sudo blkid -o value -s TYPE /dev/sdb
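
If the disk was formatted successfully, the command should simply print the detected file system type:

xfs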

Build the Ceph Cluster

In this step, you will install Ceph on all nodes from the ceph-admin node.

Install ceph-deploy on the ceph-admin node

Log in to the ceph-admin node.

First, you need to enable the Ceph yum repository on Oracle Linux, which is disabled by default.

$ sudo vi /etc/yum.repos.d/public-yum-ol7.repo

Search for "ol7_ceph" and enable it ("enabled=1").

Save and exit vim.

Now, install ceph-deploy with the yum command and make sure the node is up to date.

$ sudo yum update -y && sudo yum install ceph-deploy -y

After the ceph-deploy tool has been installed, create a new directory for the Ceph cluster configuration and make sure it is owned by cephuser, because ceph-deploy writes its files into the working directory.

$ sudo mkdir /cluster
$ sudo chown cephuser:cephuser /cluster
$ cd /cluster/

Next, create a new cluster configuration with the ‘ceph-deploy’ command and define the monitor node to be ‘mon1’.
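
With ceph-deploy, this is done with the ‘new’ subcommand, which writes the initial ceph.conf, a monitor keyring, and a log file into the current directory:

$ ceph-deploy new mon1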

Edit the generated ceph.conf file with vim and, under the [global] block, paste the configuration below.

# Your network address
public network = 172.0.0.0/16
osd pool default size = 2

Save the file and exit vim.

Install Ceph on All Nodes

Now install Ceph on all other nodes from the ceph-admin node using the following single command.

$ ceph-deploy install ceph-admin mon1 osd1 osd2

The command will automatically install Ceph on all of the nodes (mon1, osd1, osd2 and ceph-admin), and it will take some time.

Now deploy the ceph-mon on mon1 node.

$ ceph-deploy mon create-initial

The command will create the monitor key. Check and gather the keys with the ‘ceph-deploy gatherkeys’ command.

$ ceph-deploy gatherkeys mon1

Adding OSDS to the Cluster

When Ceph has been installed on all nodes, you can add the OSD daemons to the cluster. The OSD daemons will create their data and journal partitions on the /dev/sdb disk. First, list the disks on both OSD nodes:

$ ceph-deploy disk list osd1 osd2

You should see the /dev/sdb disk with the XFS format created earlier.

Next, delete the /dev/sdb partition tables on both OSD nodes with the zap option. This command will delete all data on /dev/sdb on the Ceph OSD nodes.

$ ceph-deploy disk zap osd1:/dev/sdb osd2:/dev/sdb

Now prepare both OSD nodes and make sure there are no errors in the results.

$ ceph-deploy osd prepare osd1:/dev/sdb osd2:/dev/sdb

If the output reports that osd1 and osd2 are ready for OSD use, then the deployment was successful.

Activate the OSDs with the command below:

$ ceph-deploy osd activate osd1:/dev/sdb1 osd2:/dev/sdb1

Check the output for errors before you proceed. Now you can check the sdb disk on OSD nodes with the list command.

$ ceph-deploy disk list osd1 osd2

The result is that /dev/sdb now has two partitions:

  • /dev/sdb1 — Ceph Data
  • /dev/sdb2 — Ceph Journal

Or you can check that directly on the OSD node with fdisk.

$ ssh osd1
$ sudo fdisk -l /dev/sdb

Next, deploy the management-key to all associated nodes.

$ ceph-deploy admin ceph-admin mon1 osd1 osd2

Change the permission of the key file by running the command below on all nodes.

$ sudo chmod 644 /etc/ceph/ceph.client.admin.keyring

Testing your Ceph setup

You have installed and created your new Ceph cluster and added the OSD nodes to it. Now you can test the cluster and make sure there are no errors in the cluster setup.

From the ceph-admin node, log in to the ceph monitor server ‘mon1’.

Check the cluster health.

$ sudo ceph health

HEALTH_OK

Check the cluster status

$ sudo ceph -s
    cluster 66adb950-1fc4-447b-9898-6b6cd7c45a40
     health HEALTH_OK
     monmap e1: 1 mons at {mon1=172.0.0.28:6789/0}
            election epoch 3, quorum 0 mon1
     osdmap e10: 2 osds: 2 up, 2 in
            flags sortbitwise
      pgmap v21: 64 pgs, 1 pools, 0 bytes data, 0 objects
            68744 kB used, 92045 MB / 92112 MB avail
                  64 active+clean

Make sure the Ceph health is OK as shown above and that there is a monitor node ‘mon1’ with IP address ‘172.0.0.28’. There should be 2 OSD servers, both up and running, and an available capacity of about 100 GB (2 x 50 GB Ceph data partitions).

Your new Ceph cluster setup is done. Now you’re ready to use your new Ceph block device.

Configure Ceph Client Node

In this section, you will configure an Oracle Linux 7.x server as a Ceph client. The client node is configured in the same way as the other Ceph nodes (mon and osd).

Log in to the Ceph client node, either through the Ceph admin node or using the Bare Metal Instance public IP. Then add a new ‘cephuser’ account and set a new password for the user.

$ sudo useradd -d /home/cephuser -m cephuser
$ sudo passwd cephuser

Repeat the visudo process, disable SELinux, and configure NTP as described above.

Make sure you can SSH into the client instance from ceph-admin, as previously done for the other nodes.

$ ssh client
[cephuser@client ~]$

Install Ceph on Client Node

In this step, you will install Ceph on the client node from the ceph-admin node.

Log in to the ceph-admin node via SSH and become “cephuser” with su. Then go to the Ceph cluster configuration directory; earlier you used the ‘/cluster’ directory.

$ su - cephuser
Last login: Tue Jul 18 21:25:25 GMT 2017 on pts/0
$ cd /cluster/

Install Ceph on the client node with ceph-deploy and then push the configuration and the admin key to the client node.

$ ceph-deploy install client
$ ceph-deploy admin client

The Ceph installation will take some time (depending on the server and network speed). When the task has finished, connect to the client node and change the permission of the admin key.

$ ssh client
$ sudo chmod 644 /etc/ceph/ceph.client.admin.keyring

Ceph has been installed on the client node.

Configure and Mount Ceph as Block Device

Ceph allows users to use the Ceph cluster as a thin-provisioned block device. You can mount the Ceph storage like a normal hard drive on your system. Ceph Block Storage, or the Ceph RADOS Block Device (RBD), stores block device images as objects and automatically stripes and replicates your data across the Ceph cluster. Ceph RBD is integrated with KVM, so you can also use it as block storage on various virtualization platforms, for example.

Before creating a new block device on the client node, you must check the cluster status as done above. Log in to the Ceph monitor node and check the cluster state.

Make sure the cluster health is ‘HEALTH_OK’ and the pgmap is ‘active+clean’.
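
For example, from the ceph-admin node:

$ ssh mon1
$ sudo ceph -s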

After confirming that, you’re ready to proceed with the client configuration. For this tutorial, you will use Ceph as a block device, or block storage, on a client server with Oracle Linux 7 as the client node operating system. From the ceph-admin node, connect to the client node with SSH. No password is required, as you configured password-less logins for that node.

Ceph provides the rbd command for managing RADOS block device images. You can create a new image, resize it, create a snapshot, and export your block devices with the rbd command.

In this tutorial you will create a new rbd image with a size of 40 GB, and then check that ‘disk01’ is available in the rbd list.

$ rbd create disk01 --size 40960
$ rbd ls -l
NAME     SIZE PARENT FMT PROT LOCK
disk01 40960M          2
[cephuser@client ~]$

Next, activate the rbd kernel module.

$ sudo modprobe rbd
$ sudo rbd feature disable disk01 exclusive-lock object-map fast-diff deep-flatten

Now, map the disk01 image to a block device via the rbd kernel module, and then make sure disk01 appears in the list of mapped devices.

$ sudo rbd map disk01
/dev/rbd0
$ rbd showmapped
id pool image  snap device
0  rbd  disk01 -    /dev/rbd0

You can see that the disk01 image has been mapped as the ‘/dev/rbd0’ device. Before using it to store data, you have to format the disk01 image with the mkfs command. For this tutorial you will use the XFS file system.

$ sudo mkfs.xfs /dev/rbd0

Now, mount ‘/dev/rbd0’ to the /mnt directory.

$ sudo mount /dev/rbd0 /mnt

The Ceph RBD or RADOS Block Device has been configured and mounted on the system. Check that the device has been mounted correctly with the df command.

$ df -hT | grep rbd0
/dev/rbd0 xfs 40G 33M 40G 1% /mnt

Setup RBD at Boot time

After finishing the Ceph client configuration, you will configure the system to map and mount the Ceph block device automatically at boot time. One way of doing that is to create a script in the /usr/local/bin directory that mounts and unmounts the RBD disk01.

$ cd /usr/local/bin/
$ sudo vim rbd-mount

Paste the script below and feel free to modify it based on your requirements.

#!/bin/bash
# Script Author: http://bryanapperson.com/
# Change to your pool name
export poolname=rbd

# Change to your disk image name
export rbdimage=disk01

# Mount directory
export mountpoint=/mnt/mydisk

# The mount/unmount action is passed from the systemd service as an argument
# Determine if we are mounting or unmounting
if [ "$1" == "m" ]; then
 modprobe rbd
 rbd feature disable $rbdimage exclusive-lock object-map fast-diff deep-flatten
 rbd map $rbdimage --id admin --keyring /etc/ceph/ceph.client.admin.keyring
 mkdir -p $mountpoint
 mount /dev/rbd/$poolname/$rbdimage $mountpoint
fi
if [ "$1" == "u" ]; then
 umount $mountpoint
 rbd unmap /dev/rbd/$poolname/$rbdimage
fi

Save the file and exit vim, then make it executable with chmod.

$ sudo chmod +x rbd-mount

Next, go to the systemd directory and create the service file.

$ cd /etc/systemd/system/
$ sudo vim rbd-mount.service

Paste service configuration below:

[Unit]
Description=RADOS block device mapping for $rbdimage in pool $poolname
Conflicts=shutdown.target
Wants=network-online.target
After=NetworkManager-wait-online.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/rbd-mount m
ExecStop=/usr/local/bin/rbd-mount u
[Install]
WantedBy=multi-user.target

Save the file and exit vim.

Reload the systemd files and enable the rbd-mount service to start at boot time.

$ sudo systemctl daemon-reload
$ sudo systemctl enable rbd-mount.service
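
Before relying on a reboot, you can optionally test the unit by hand. A quick sketch, assuming /dev/rbd0 is still mapped and mounted on /mnt from the earlier manual test (unmount and unmap it first so the script can remap the image):

$ sudo umount /mnt
$ sudo rbd unmap /dev/rbd0
$ sudo systemctl start rbd-mount.service
$ df -hT | grep rbd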

If you reboot the client node now, rbd ‘disk01’ will automatically be mounted to the ‘/mnt/mydisk’ directory.

Your Ceph Distributed Storage Cluster and Client configuration are done!

Originally published at community.oracle.com on July 18, 2017.
