Benchmark: Deep directory structure vs. flat directory structure to store millions of files on ext4
It seems to be common knowledge that you should use a deep (also called tree) directory structure (e.g., files/00/01/123.data) instead of a flat one (e.g., files/123.data) when you want to store millions of files. That may have been true for older filesystems like ext3, but is it still true for a more modern one like ext4?
Let’s verify that.
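To make the two strategies concrete, here is a small sketch of how a single MD5 key maps to a path under each layout. The helper names (flat_path, deep_path) are ours for illustration; the benchmark code below inlines this logic instead:

```ruby
require 'digest'

# Hypothetical helpers illustrating the two layouts.
def flat_path(key)
  # Flat: every file lives directly in one directory.
  "./dir_flat/#{key}"
end

def deep_path(key)
  # Deep: two directory levels, two hex characters each,
  # derived from the first four characters of the key.
  "./dir_deep/#{key[0..1]}/#{key[2..3]}/#{key}"
end

key = Digest::MD5.hexdigest("example")
puts flat_path(key)
puts deep_path(key)
```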
We’ll use Ruby to generate and benchmark both storage strategies. First we need a way to generate fake files, and we want 10 million of them. To do that, we’ll just use a hash with random MD5 keys and random MD5 values. This way we are sure we are reading something that can’t be cached by the system:
require 'digest'

hash = {}
10_000_000.times do
  key = Digest::MD5.hexdigest(rand.to_s)
  value = Digest::MD5.hexdigest(rand.to_s)
  hash[key] = value
end
We then need some code to write, then read, these files using a flat directory storage strategy:
require 'benchmark'

puts Benchmark.measure {
  hash.each do |key, value|
    File.write "./dir_flat/#{key}", value
  end
}

puts Benchmark.measure {
  hash.each do |key, value|
    File.read "./dir_flat/#{key}"
  end
}
And some code to write, then read, these files using a deep directory storage strategy. We chose two directory levels of two hexadecimal characters each, which should average 152–153 files per leaf directory (10,000,000 / (256 × 256) ≈ 152.6):
require 'fileutils'

puts Benchmark.measure {
  hash.each do |key, value|
    dir_path = "./dir_deep/#{key[0..1]}/#{key[2..3]}/"
    FileUtils.mkdir_p dir_path
    File.write dir_path + key, value
  end
}

puts Benchmark.measure {
  hash.each do |key, value|
    dir_path = "./dir_deep/#{key[0..1]}/#{key[2..3]}/"
    File.read dir_path + key
  end
}
We note that write performance is probably hurt by creating directories dynamically. Let’s pre-create the directory structure instead:
hash.keys.each do |key|
  dir_path = "./dir_deep/#{key[0..1]}/#{key[2..3]}/"
  FileUtils.mkdir_p dir_path
end

puts Benchmark.measure {
  hash.each do |key, value|
    dir_path = "./dir_deep/#{key[0..1]}/#{key[2..3]}/"
    File.write dir_path + key, value
  end
}

puts Benchmark.measure {
  hash.each do |key, value|
    dir_path = "./dir_deep/#{key[0..1]}/#{key[2..3]}/"
    File.read dir_path + key
  end
}
Here’s the final benchmark results:
Writes are 44% faster using a flat directory structure instead of a deep/tree directory structure. Reads are even 7.8x faster.
In conclusion, just use a flat directory structure. It’s easier to use, faster on writes, much faster on reads, saves inodes, and doesn’t need you to pre-create or dynamically generate the branch folders.
References: source — raw results
[Edit] After publishing this article, I found out that ext4’s limit is around 10,118,651 (or ~10,233,706) files per directory for MD5-length filenames.
I was trying to run the above benchmark with 20 million files, but I was getting an Errno::ENOSPC: No space left on device @ rb_sysopen error in Ruby. That was weird because both disk space and free inodes were fine.
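For reference, this is roughly how you can check that both space and inodes are fine (standard coreutils commands; the current directory stands in for the benchmark filesystem):

```shell
# Free disk space on the filesystem holding the current directory
df -h .

# Free inodes on the same filesystem -- ENOSPC can also mean
# "out of inodes", but here both were fine
df -i .
```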
In the dmesg log, however, I actually had “directory index full” errors:
ext4_dx_add_entry:2235: inode #258713: comm pry: Directory index full
[1718.956797] EXT4-fs warning (device vda1): ext4_dx_add_entry:2184: Directory (ino: 384830) index full, reach max htree level :2
[1718.956798] EXT4-fs warning (device vda1): ext4_dx_add_entry:2188: Large directory feature is not enabled on this filesystem
[10788.316073] EXT4-fs warning (device vda1): ext4_dx_add_entry:2184: Directory (ino: 384830) index full, reach max htree level :2
[10788.316075] EXT4-fs warning (device vda1): ext4_dx_add_entry:2188: Large directory feature is not enabled on this filesystem
Directory indexes in ext4 depend on filename size and the number of files, so this limit may vary on your system.
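The warning above mentions the large_dir feature, which extends the htree index to three levels and allows much larger directories. A sketch of how to inspect and enable it, assuming the filesystem lives on /dev/vda1 as in the logs (requires root, a reasonably recent kernel, and recent e2fsprogs; I haven’t benchmarked this option):

```shell
# Inspect the currently enabled features (large_dir was absent in my case)
tune2fs -l /dev/vda1 | grep -i features

# Enable the large_dir feature to lift the two-level htree limit
tune2fs -O large_dir /dev/vda1
```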
Following some commenters’ advice, redoing the 10M benchmark with real JSON files yields similar results though: reads are still 2x faster and writes are still faster by 20%.
References: source v2 — raw results v2 — tune2fs -l output
Dmke also made an awesome fork of the original code, adding benchmarks comparing how different directory depths perform against each other:
Conclusion 2: stick to common wisdom and use a deep directory structure. However, be wary of the performance cost of too many directory levels.