This blog is meant to help people resolve accounting issues caused by GlusterFS quota. It assumes a basic understanding of Gluster and of how quota is used in Gluster.
The blog is split into three parts. This part explains how accounting works in quota.
How quota lists the size of a directory in a Gluster volume:
Let's assume a directory dir1 with two files, f1 and f2.
For easier understanding, let's take a distributed Gluster volume where the data is distributed between two bricks, b1 and b2, and see how this layout is spread across them.
In Gluster, directories are present on all the bricks (subvolumes), while each file is stored only on its hashed subvolume (its respective brick); DHT takes care of this placement. Here, f1 is on the first subvolume and f2 is on the second.
The size of the directory dir1 is supposed to be the sum of the sizes of the files in it, that is, the size of f1 plus the size of f2. So quota has to get the sizes of f1 and f2, add them up, and show the result as the size consumed by dir1. With only two files, adding them up is cheap. But in a huge filesystem, a directory can contain a very large number of files.
For example, suppose there are two subvolumes, with 100 files on one and 500 files on the other. To calculate the size consumed by the directory, we would need to add up all 600 files. Fetching the size of each of the 600 files one after the other, every time, is expensive. So Gluster quota also maintains an xattr called size on the directory itself. This xattr holds the total size of the files inside the directory on that subvolume.
Note: The size of a directory reported by the underlying filesystem, such as XFS, is the size of the directory as a file (the content written to it), not the size of the files underneath it in the filesystem hierarchy.
So the subvolume with 100 files of 1 MB each will record a directory size of 100 MB, and the subvolume with 500 files of 1 MB each will record a directory size of 500 MB. In other words, one subvolume contributes 100 MB to the size and the other contributes 500 MB. This way we have a size for the same directory from each subvolume. These sizes have to be added to get the actual size of the directory as it is spread across the volume.
The quota daemon is responsible for aggregating the size of the same directory across all the subvolumes (two here; the number depends on the volume configuration). This aggregated size is then shown to the user as the size of the directory.
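To make the arithmetic concrete, here is a minimal Python sketch of the idea. It is only a toy model and not Gluster code: the brick names, the MB constant, and the two helper functions are assumptions made for illustration, and the per-brick value simply stands in for the size xattr described above.

```python
# Toy model, not Gluster code: each brick holds only the files that were
# hashed to it and records one size value for the directory.
MB = 1024 * 1024


def brick_contribution(file_sizes):
    """Size one brick records for the directory: the sum of its local files."""
    return sum(file_sizes)


def aggregate_directory_size(contributions):
    """Volume-wide directory size: the sum of the per-brick contributions."""
    return sum(contributions.values())


# Hypothetical layout: 100 files of 1 MB on b1 and 500 files of 1 MB on b2.
bricks = {
    "b1": [1 * MB] * 100,
    "b2": [1 * MB] * 500,
}

contributions = {name: brick_contribution(files) for name, files in bricks.items()}
total = aggregate_directory_size(contributions)

print(contributions["b1"] // MB, contributions["b2"] // MB, total // MB)
# prints: 100 500 600
```

The point of the sketch is that the aggregation step adds two numbers (one per subvolume) instead of 600 (one per file).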
The above applies to a directory structure that is one level deep. If the hierarchy is deeper, the size of each lowermost directory is added to its parent directory. Files add their size to their parent directory, and this keeps going recursively up the tree.
This way, a particular directory holds the sum of the sizes of all the files underneath it.
The way this is actually calculated is a bit complex, so I will skip it for now.
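Without going into the real mechanism, the basic idea of sizes rolling up to the parent can be shown with a small Python sketch. This is a deliberately simplified model and not how Gluster implements it: the Dir class and the account function are hypothetical names, and the model covers a single brick only.

```python
# Simplified model of upward size propagation on one brick.
# Dir and account() are hypothetical names used only for this illustration.
class Dir:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.size = 0  # cached size of everything underneath this directory


def account(directory, delta):
    """Add delta bytes to the directory and to every ancestor up to the root."""
    node = directory
    while node is not None:
        node.size += delta
        node = node.parent


root = Dir("/")
dir1 = Dir("dir1", parent=root)
dir2 = Dir("dir2", parent=dir1)

account(dir2, 300)  # a 300-byte file written in dir1/dir2
account(dir1, 200)  # a 200-byte file written directly in dir1

print(root.size, dir1.size, dir2.size)
# prints: 500 500 300
```

Each directory ends up with the total of all the files below it, so reading the size of any directory is a single lookup.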
So here is how quota shows the size consumed.
The quota list command creates an auxiliary mount with the client graph.
This client connects to the necessary servers (two here). The call travels from the auxiliary mount through the server graph to the underlying filesystem, gets the size, and returns to the client graph with it. The client graph is smart enough to aggregate the size (adding the values from the respective subvolumes) and give the output.
Now that we have an idea of how the aggregation is done for a plain distribute volume in Gluster, I'll give a gist of how it works for replicated volumes.
The client graph usually has DHT on top and AFR below it. The number of DHT subvolumes is determined by the number of servers among which the volume has to be distributed. The number of subvolumes under each AFR is decided by the replication factor.
So for a 10 x 3 volume, 10 is the distribution count and 3 is the replica count.
Each distribute subvolume has 3 replica subvolumes. This is how a 10 x 3 volume has 30 bricks.
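The brick count for such a volume can be worked out with a short Python sketch. The function name and the brick names below are made up for illustration and do not correspond to any Gluster naming scheme.

```python
# Illustrative only: build the brick grouping for a distribute x replica volume.
# brick_layout() and the "brick-N" names are hypothetical.
def brick_layout(distribute_count, replica_count):
    """Return one list of bricks per distribute subvolume (replica set)."""
    return [
        [f"brick-{d * replica_count + r}" for r in range(replica_count)]
        for d in range(distribute_count)
    ]


layout = brick_layout(10, 3)
total_bricks = sum(len(replica_set) for replica_set in layout)

print(len(layout), len(layout[0]), total_bricks)
# prints: 10 3 30
```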
This way, the aggregation that reaches DHT goes through the replica subvolumes.
Because of this, quota talks to DHT, which gets the value from AFR. AFR returns the value from one of its subvolumes. Once DHT has the value from AFR, it can use it for its own calculation (adding across the DHT subvolumes) without worrying about the replica count.
This is how the client takes care of the aggregation.
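The layering can be sketched in Python as follows. This is a toy model under stated assumptions and not the actual translator code: afr_size and dht_size are hypothetical names, and picking the first replica is an arbitrary choice made here, since the text above only says that AFR returns the value from one of its subvolumes.

```python
# Toy model of the client-side aggregation for a distributed-replicated volume.
# afr_size() and dht_size() are hypothetical names, not Gluster functions.
def afr_size(replica_sizes):
    """AFR answers with the value from one replica (arbitrarily the first here)."""
    return replica_sizes[0]


def dht_size(replica_sets):
    """DHT adds one value per distribute subvolume, ignoring the replica count."""
    return sum(afr_size(replica_sizes) for replica_sizes in replica_sets)


# Hypothetical 2 x 3 volume: every replica in a set holds the same data,
# so each set reports the same directory size three times (values in MB).
replica_sets = [
    [100, 100, 100],
    [500, 500, 500],
]

print(dht_size(replica_sets))
# prints: 600
```

Note that the result is 600 and not 1800: because AFR hands a single value up to DHT, the replicated copies are not counted more than once.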
The reason behind accounting mismatch is explained in the next blog.