The Google File System

Krisha Mehta
Computers, Papers and Everything
5 min read · Jan 3, 2019

The paper we will study today is important in many respects. It describes a storage system and how it differed from the systems that existed before it. At the same time, for Google, this system was an important building block of MapReduce, which will be discussed in my next article.

Introduction

From the introduction of the paper, we have:

We have designed and implemented the Google File System (GFS) to meet the rapidly growing demands of Google’s data processing needs. GFS shares many of the same goals as previous distributed file systems such as performance, scalability, reliability, and availability. However, its design has been driven by key observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system design assumptions.

This is one of the most important features of GFS. GFS was built specifically to meet the growing processing demands that Google encountered. What was the best way to meet those demands?
To study and anticipate their workloads and operations, and to build a system that supports them.

GFS is different from other file systems. The main differences are:

  1. Hardware failures are treated as the norm rather than the exception. In a file system as huge as Google’s, built from very large numbers of commodity components, the quantity and quality of those components virtually guarantee that some are not functional at any given time. Because of this, constant monitoring, error detection, fault tolerance, and automatic recovery have to be built into the software itself.
  2. GFS mainly supports huge files. This is natural given the amount of data Google encounters and handles on a regular basis.
  3. Most files are mutated by appending new data rather than overwriting existing data; random writes within a file are practically non-existent. This access pattern shapes both the performance optimizations and the atomicity guarantees.
  4. The flexibility of the system is increased by co-designing the applications and the file system API.

Interface

GFS supports the usual file operations: create, delete, open, close, read, and write. In addition, it provides two operations of its own: snapshot and record append.

A traditional append lets a writer add, or “append,” data to the end of a file. This becomes complicated when two or more clients want to append at the same time, i.e. concurrently. Normally, such writes would have to be serialized, with only one append proceeding at a time. For a system like the one Google runs, this is too slow, because concurrent appends happen constantly. Consider a user who searches for the word “Universe”: several web crawlers may be working together, appending resources to the same file. Concurrent operations are bound to happen. With record append, GFS itself chooses the offset at which each record lands and guarantees the append is atomic, so the results from multiple clients are merged into one file without overwriting each other.
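To make that concrete, here is a toy Python sketch of the record-append idea (not GFS code): the file system, not the client, picks the offset, so concurrent appends from many clients are all preserved. The `ToyAppendFile` class and the crawler names are purely illustrative.

```python
import threading

class ToyAppendFile:
    def __init__(self):
        self._records = []          # each element is one appended record
        self._lock = threading.Lock()

    def record_append(self, data: bytes) -> int:
        """Append `data` atomically and return the offset the file system chose."""
        with self._lock:
            offset = len(self._records)
            self._records.append(data)
            return offset

f = ToyAppendFile()

def crawler(name: str):
    # Many crawlers append results concurrently; none of them overwrites another.
    for i in range(3):
        f.record_append(f"{name}-result-{i}".encode())

threads = [threading.Thread(target=crawler, args=(f"crawler{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(f._records))  # 12 records: every concurrent append survived
```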

Snapshot makes a copy of a file or directory tree almost instantaneously while minimizing any interruption to ongoing mutations. It is used to quickly create copies of a huge dataset, or to checkpoint the current state so that later changes can be rolled back. The paper explains how this works:

We use standard copy-on-write techniques to implement snapshots. When the master receives a snapshot request, it first revokes any outstanding leases on the chunks in the files it is about to snapshot. This ensures that any subsequent writes to these chunks will require interaction with the master to find the leaseholder. This will give the master an opportunity to create a new copy of the chunk first.

I will explain what the terms master and chunks mean in a moment.
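In the meantime, here is a minimal copy-on-write sketch of the snapshot idea, assuming a toy in-memory master that maps file names to chunk IDs and keeps a reference count per chunk. The class and method names are illustrative, not GFS’s actual interfaces.

```python
class ToyMaster:
    def __init__(self):
        self.files = {}      # file name -> list of chunk ids
        self.refcount = {}   # chunk id -> number of files referencing it
        self.chunks = {}     # chunk id -> bytes (stands in for the chunkservers)
        self._next_id = 0

    def _new_chunk(self, data):
        cid = self._next_id
        self._next_id += 1
        self.chunks[cid] = data
        self.refcount[cid] = 1
        return cid

    def create(self, name, data):
        self.files[name] = [self._new_chunk(data)]

    def snapshot(self, src, dst):
        # A snapshot copies only metadata; both files share the same chunks for now.
        self.files[dst] = list(self.files[src])
        for cid in self.files[dst]:
            self.refcount[cid] += 1

    def write_chunk(self, name, index, data):
        cid = self.files[name][index]
        if self.refcount[cid] > 1:
            # The chunk is shared with a snapshot: make a private copy first
            # (copy-on-write), then direct the write at the copy.
            self.refcount[cid] -= 1
            cid = self._new_chunk(self.chunks[cid])
            self.files[name][index] = cid
        self.chunks[cid] = data

m = ToyMaster()
m.create("/logs/day1", b"original")
m.snapshot("/logs/day1", "/backup/day1")    # near-instant: metadata only
m.write_chunk("/logs/day1", 0, b"updated")  # this write triggers the real copy
print(m.chunks[m.files["/backup/day1"][0]])  # b'original': the snapshot is unaffected
```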

Architecture

A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients, as shown in Figure 1.

Figure 1: GFS architecture (diagram taken from the paper).

Files in GFS are divided into chunks. Chunkservers store chunks on their local disks as Linux files and read or write chunk data specified by a chunk handle and byte range. Each chunk is also replicated on multiple chunkservers, which avoids any loss of data if and when hardware fails; the default number of replicas is three. The master maintains all file system metadata, which includes the namespace, access control information, the mapping from files to chunks, and the current locations of the chunks. Clients communicate with both the master and the chunkservers. Clients do not cache file data, since the files are huge; this also avoids cache-coherence issues. Chunkservers need not cache file data either, because chunks are stored as local files and Linux’s buffer cache already keeps frequently accessed data in memory.
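As a rough illustration, here is a small Python sketch of the kind of lookup this metadata enables, assuming the paper’s 64 MB chunk size. The file name, chunk handles, and chunkserver names are made up for the example.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size used in the paper

# Hypothetical metadata for one file split into three chunks, each replicated
# on three chunkservers (the default replication factor).
file_to_chunks = {
    "/crawl/universe.dat": ["handle-a", "handle-b", "handle-c"],
}
chunk_locations = {
    "handle-a": ["cs1", "cs4", "cs7"],
    "handle-b": ["cs2", "cs5", "cs8"],
    "handle-c": ["cs3", "cs6", "cs9"],
}

def locate(path: str, byte_offset: int):
    """Return (chunk handle, offset within chunk, replica list) for a read."""
    chunk_index = byte_offset // CHUNK_SIZE      # which chunk holds this byte
    offset_in_chunk = byte_offset % CHUNK_SIZE   # where inside that chunk
    handle = file_to_chunks[path][chunk_index]
    return handle, offset_in_chunk, chunk_locations[handle]

print(locate("/crawl/universe.dat", 150 * 1024 * 1024))
# ('handle-c', 23068672, ['cs3', 'cs6', 'cs9'])
```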

One of the most common issues with a single master in any system is that it becomes a bottleneck, and it also risks being a single point of failure, so load must be kept off it as much as possible. GFS avoids this by minimizing the master’s involvement in reads and writes. Clients never read or write file data through the master. Instead, a client asks the master which chunkservers it should contact, caches this information for a limited time, and interacts with the chunkservers directly for many subsequent operations. Hence, the master is kept off the critical path for data. To cope with master failures, mutations to the namespace and the file-to-chunk mapping are recorded in an operation log, or journal, which is replicated on remote machines, so a replacement master can replay the log and recover the file system’s state.
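The client-side flow described above can be sketched roughly as follows. `ask_master` and `read_from_chunkserver` are hypothetical stand-ins for the real RPCs, and the 60-second cache lifetime is just an illustrative value.

```python
import time

CACHE_TTL = 60.0  # seconds a cached chunk location stays valid (illustrative)

def ask_master(path, chunk_index):
    # Stand-in for the RPC to the master; returns fake location data here.
    return {"handle": f"{path}#{chunk_index}", "replicas": ["cs1", "cs2", "cs3"]}

def read_from_chunkserver(replica, handle, offset, length):
    # Stand-in for the direct client-to-chunkserver data transfer.
    return f"<{length} bytes of {handle} from {replica} at {offset}>"

_cache = {}  # (path, chunk_index) -> (location info, expiry time)

def read(path, chunk_index, offset, length):
    key = (path, chunk_index)
    entry = _cache.get(key)
    if entry is None or entry[1] < time.time():
        # Cache miss or expired entry: the only time the master is contacted.
        _cache[key] = (ask_master(path, chunk_index), time.time() + CACHE_TTL)
    info = _cache[key][0]
    # All file data flows through a chunkserver, never through the master.
    return read_from_chunkserver(info["replicas"][0], info["handle"], offset, length)

print(read("/crawl/universe.dat", 0, 0, 4096))     # first read asks the master
print(read("/crawl/universe.dat", 0, 4096, 4096))  # second read uses the cache
```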

The paper further talks about implementing such a system for production as well as for research. The Google File System demonstrates the qualities essential for supporting large-scale data processing workloads on commodity hardware. While some of its features may be specific to the work done at Google, most of it can be used by other organizations as well, especially by those that encounter large-scale data on a daily basis.
