BlockHeaded HardDrives

Here’s a fun bit of trivia: what takes up more space, 1 MB of data on a modern hard drive or 1 MB of data on a floppy?

Is this a trick question? Is this like the difference between a pound of lead and a pound of feathers? If you think so, you may be surprised to learn that in the majority of cases, the same amount of data takes up more space on a modern HD (and, for that matter, an SSD) than it does on a floppy. If that feels weird and you’re asking why that’s the case, that’s natural, and we’ll answer it in the coming paragraphs.

Figure: disk structure. A: Track. B: Geometric sector. C: Track sector. D: Block (or cluster, depending on context).

While we may have a conception of computer data as an enormous undifferentiated field of 1’s and 0’s, this isn’t the case. Our machines still need to be able to find the data they store on disk, so several layers of organization, abstract and physical, are used to organize and address it. To address data on a platter, your machine’s operating system (specifically the part of the OS called the filesystem) organizes the drive into units called blocks. It also takes care of splitting the data it stores into blocks during the write cycle (called, unimaginatively enough, blocking) and reassembling the data from those blocks during the read cycle (called, even more unimaginatively, deblocking). The number of blocks an OS can address limits the size of the disk it can use. In early versions of MS-DOS, this limit was roughly 32 MB. With Windows 10, it is something close to 9.3 zettabytes (9.3 x 10²¹ bytes).
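To make blocking, deblocking, and the addressing limit a little more concrete, here is a minimal sketch in Python. The function names are made up for illustration, and the 16-bit block number with 512-byte blocks is an assumption chosen because it reproduces the roughly 32 MB figure quoted above; it is not a description of how any particular filesystem is actually implemented.

```python
from typing import List


def to_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Blocking: split data into fixed-size blocks, zero-padding the last one."""
    blocks = []
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        # The final chunk may be short; pad it so every block is full size.
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


def from_blocks(blocks: List[bytes], length: int) -> bytes:
    """Deblocking: join the blocks and trim the padding using the known length."""
    return b"".join(blocks)[:length]


def max_addressable_bytes(address_bits: int, block_size: int) -> int:
    """Largest disk reachable when block numbers are `address_bits` wide."""
    return (2 ** address_bits) * block_size


if __name__ == "__main__":
    payload = b"hello world"
    blocks = to_blocks(payload, block_size=4)
    print(len(blocks))                          # 3 blocks for 11 bytes
    print(from_blocks(blocks, len(payload)))    # b'hello world'

    # Assumption: 16-bit block numbers and 512-byte blocks.
    print(max_addressable_bytes(16, 512))       # 33554432 bytes, i.e. 32 MiB
```

Note that the filesystem has to remember the real length of the data somewhere, otherwise deblocking could not tell padding apart from content.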

Physically, hard disk platters are organized into tracks (concentric rings of data, very much like the tracks on an old LP). These tracks are further divided during manufacture into sectors by writing headers that split each track into smaller pieces (a single track on an HD can have thousands of sectors). In SSDs, the constituent data recording units, NAND cells, are likewise grouped into sectors. The same goes for many other storage media, including CDs, floppies, and laserdiscs. Formerly, the standard sector size was 512 bytes; currently, the standard is 4096 bytes. The sector is the smallest unit of storage on a particular disk: ANY piece of data, even a single byte, takes up an entire sector. This is why the same data will almost always use less space on a floppy, with its smaller sectors, than on a modern hard disk or SSD. But this seems wasteful, so why do it?
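The arithmetic behind that claim is just rounding up to a whole number of sectors. Here is a rough sketch in Python; the function names are hypothetical, and the model deliberately ignores filesystem metadata and any small-file optimizations a real filesystem might apply.

```python
from typing import Iterable


def allocated_bytes(file_size: int, sector_size: int) -> int:
    """Space a file occupies when rounded up to whole sectors."""
    sectors = -(-file_size // sector_size)  # ceiling division
    return sectors * sector_size


def total_allocated(file_sizes: Iterable[int], sector_size: int) -> int:
    """Total space for many files, each rounded up on its own."""
    return sum(allocated_bytes(size, sector_size) for size in file_sizes)


if __name__ == "__main__":
    # A single byte still costs a full sector.
    print(allocated_bytes(1, 512))     # 512
    print(allocated_bytes(1, 4096))    # 4096

    # 1 MiB of data stored as 1,024 files of 1 KiB each.
    files = [1024] * 1024
    print(total_allocated(files, 512))     # 1048576
    print(total_allocated(files, 4096))    # 4194304
```

In this toy case the same 1 MiB of data occupies four times as much space with 4096-byte sectors. If it were stored as one large file instead, the difference would all but disappear, since only the last sector can be partly empty.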

A few reasons. First, sector headers contain information vital for telling the actual machine (specifically the read/write head) what part of the disk it’s looking at. The sector also contains error-correcting codes and alternate addressing in case the data is corrupted, the sector is physically damaged, or the contents are otherwise unreadable. Second, it improves read/write efficiency. In almost any computer, reading and writing to disk is the bottleneck that determines overall system performance (though this is also due to the limitations of bandwidth), and a little wasted space is a reasonable trade-off to improve the speed of the overall process. Think of it like a dresser drawer: sometimes you don’t mind some unfilled space if it keeps things organized well enough that you can find your clothes in a timely manner.

Yeah, try finding your stuff in that mess….

Filesystems generally attempt to keep the data from a single file in adjacent sectors and avoid mixing the contents of different files in the same sector. Furthermore, relational databases will try to keep the fields of a single record in the same sector, so that the database can grab the record in one sweep and not have to find its other pieces across far-flung regions of the disk. For Big Data applications, columnar databases store the values of each column together in contiguous blocks, so that a query over an entire dataset can scan just the columns it needs without having to read from the disk multiple times.
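To see why keeping a record inside one sector matters, it helps to count how many sectors a read has to touch. The sketch below is a simplified illustration in Python with hypothetical helper names and an assumed 4096-byte sector; real database engines use their own page layouts, which are more involved than this.

```python
def sectors_touched(offset: int, length: int, sector_size: int = 4096) -> int:
    """Number of sectors that must be read to fetch `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return last - first + 1


def place_record(offset: int, length: int, sector_size: int = 4096) -> int:
    """Return an offset at which the record does not straddle a sector boundary.

    If the record would spill into the next sector, skip ahead to the start of
    that sector, trading a little wasted space for a single-sector read.
    """
    used = offset % sector_size
    if length <= sector_size and used + length > sector_size:
        return offset + (sector_size - used)
    return offset


if __name__ == "__main__":
    # A 200-byte record starting near the end of the first sector.
    print(sectors_touched(4000, 200))    # 2
    aligned = place_record(4000, 200)
    print(aligned)                       # 4096
    print(sectors_touched(aligned, 200)) # 1
```

The 96 bytes skipped at the end of the first sector are the dresser-drawer trade-off in miniature: a bit of empty space in exchange for one read instead of two.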

For a database engineer, this read/write speed is a fact of life they deal with every day. For the normal consumer, the wasted space is not terribly noticeable: unless you’re storing millions of small files, filesystems are generally intelligent enough to optimize space. And for people designing applications that interact with these systems, it is always important to keep these things in mind, especially if performance is a priority.
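If you are curious how much space a file really occupies on your own machine, you can compare its apparent size with what the filesystem reports as allocated. This is a small sketch in Python that assumes a POSIX system (Linux or macOS), where `os.stat` exposes `st_blocks` in 512-byte units; that field is not available on Windows, and the exact numbers depend on your filesystem. The helper name is made up for this example.

```python
import os
import sys


def apparent_and_allocated(path: str):
    """Return (apparent size, allocated size) in bytes for a file.

    Assumes a POSIX platform: st_blocks counts 512-byte units there,
    regardless of the filesystem's actual block size.
    """
    st = os.stat(path)
    apparent = st.st_size
    allocated = st.st_blocks * 512
    return apparent, allocated


if __name__ == "__main__":
    # Usage: python script.py some_file
    for name in sys.argv[1:]:
        size, on_disk = apparent_and_allocated(name)
        print(f"{name}: {size} bytes of data, {on_disk} bytes allocated")
```

On many systems a tiny text file will show up as a few bytes of data but a full block allocated, though some filesystems store very small files more compactly, so treat the output as something to explore and not as a fixed rule.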
