Welcome to the Matrix. What data on your hard disk really looks like.

Few may actually wonder how the operating system, Windows for example, keeps track of all your data, your files. How does Windows find things on a disk? What does the data on a disk really look like? For the few taking the red pill, I write this. Let’s discover what’s underneath the world of stylish and polished icons. Welcome to the Matrix.

Sectors

To the operating system, a disk is nothing more than a big pool of numbered sectors. For years, sectors were 512 bytes in size. Modern hard disks use 4 KB physical sectors, although many still present 512-byte logical sectors to the operating system for compatibility. A sector is the smallest addressable unit on a hard disk.
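To make ‘numbered sectors’ concrete, here is a minimal Python sketch of reading one sector by its number. It assumes 512-byte logical sectors and treats any seekable binary stream (a disk image file, for example) as the disk. The names `read_sector` and `SECTOR_SIZE` are made up for this illustration.

```python
import io

SECTOR_SIZE = 512  # assumed logical sector size in bytes


def read_sector(device, lba, sector_size=SECTOR_SIZE):
    """Return the raw bytes of sector number `lba` from a seekable binary stream."""
    device.seek(lba * sector_size)  # byte offset = sector number x sector size
    data = device.read(sector_size)
    if len(data) != sector_size:
        raise ValueError(f"sector {lba} lies beyond the end of the device")
    return data


# A three-sector fake "disk": sector n is filled with the byte value n.
fake_disk = io.BytesIO(b"".join(bytes([n]) * SECTOR_SIZE for n in range(3)))
first_bytes = read_sector(fake_disk, 2)[:4]  # b"\x02\x02\x02\x02"
```

Everything else in this article boils down to working out which sector numbers to feed into a function like this one.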

To organize things, the operating system uses something called a ‘file system’. Before a file system can be created, a pool of sectors needs to be set aside for it. This is called a partition. A partition is nothing more than a contiguous range of sectors.

Partitions

Your hard disk contains at least one partition, but there can be more. On a disk using the classic MBR scheme, the first sector of the hard disk, the Master Boot Record (MBR), contains a partition table. To locate the data on your disk, the operating system has to start here. Here it finds where the partitions start and how large they are. This is also where the accidents happen that cause the entire disk to appear empty, or ‘unallocated’ as Windows Disk Management calls it.

On the left is what the MBR actually looks like. On the right is how the raw data is interpreted.
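The interpretation of the raw MBR bytes can be sketched in a few lines of Python. This assumes the classic MBR layout: four 16-byte entries starting at byte offset 446, followed by the signature bytes 55 AA at the very end of the sector. The function name `parse_mbr` and the dictionary keys are hypothetical, and the sketch ignores extended partitions and GPT disks.

```python
import struct

PARTITION_TABLE_OFFSET = 446  # the table follows the boot code
ENTRY_SIZE = 16               # each of the four entries is 16 bytes
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def parse_mbr(sector):
    """Decode the used primary partition entries of a 512-byte MBR sector."""
    if len(sector) < 512 or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR: signature 55 AA is missing")
    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # Layout: status, CHS start (skipped), type, CHS end (skipped),
        # first sector (LBA), number of sectors. All little-endian.
        status, ptype, first_lba, count = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:  # type 0 marks an unused slot
            continue
        partitions.append({
            "index": index,
            "bootable": status == 0x80,
            "type": ptype,
            "first_lba": first_lba,
            "sectors": count,
        })
    return partitions
```

If the signature or the entries are damaged, a parser like this finds no partitions at all, which is exactly the situation in which a disk shows up as unallocated.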

File systems

To store and find files on a partition, the partition needs to contain a file system. An example of a file system is NTFS. A file system is created when we format a partition: formatting writes the file system structures. The location of those structures is recorded in the first sector of the volume (volume = partition containing a file system). We call this sector the boot record. Little accidents that happen here result in a so-called ‘RAW file system’.

On the left ‘the matrix’, the RAW hex version of the boot record. On the right the decoded data.
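A rough Python sketch of that decoding step, for an NTFS boot record, could look like this. The field offsets follow the commonly documented NTFS boot sector layout, which this article does not spell out, so treat them as assumptions. `parse_ntfs_boot_record` and its dictionary keys are hypothetical names, and unusual cluster size encodings are ignored.

```python
import struct


def parse_ntfs_boot_record(sector):
    """Decode the fields of an NTFS boot record needed to locate the MFT."""
    if sector[3:11] != b"NTFS    ":
        raise ValueError("OEM ID is not 'NTFS    ': RAW or another file system")
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    total_sectors, mft_lcn, mft_mirror_lcn = struct.unpack_from("<QQQ", sector, 0x28)
    (raw_record_size,) = struct.unpack_from("<b", sector, 0x40)

    cluster_size = bytes_per_sector * sectors_per_cluster
    # A positive value counts clusters per file record; a negative value n
    # means the record is 2**(-n) bytes (for example -10 gives 1024 bytes).
    if raw_record_size > 0:
        file_record_size = raw_record_size * cluster_size
    else:
        file_record_size = 1 << -raw_record_size

    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "cluster_size": cluster_size,
        "total_sectors": total_sectors,
        "mft_lcn": mft_lcn,                # first cluster of the MFT
        "mft_mirror_lcn": mft_mirror_lcn,  # first cluster of the MFT mirror
        "file_record_size": file_record_size,
    }
```

When the check at the top fails because the sector has been overwritten or corrupted, there is nothing left to tell the operating system where the file system structures are.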

The smallest addressable unit on a volume is called a cluster. A cluster consists of one or more sectors. The cluster size is also stored in the boot record, as this value is needed to calculate the sector address. In the end, the hard disk is nothing more than a pool of sectors, and to actually read data from the disk we need the sector address. So, while file systems store the locations of files as cluster numbers, we need to convert those to sector numbers to know where the files actually are on the hard disk.
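The conversion itself is simple arithmetic. The sketch below assumes NTFS, where cluster 0 begins at the first sector of the volume; other file systems number their clusters differently. The function name and the example values are made up for illustration.

```python
def cluster_to_sector(lcn, volume_start_sector, sectors_per_cluster):
    """Translate a cluster number within a volume to an absolute disk sector."""
    return volume_start_sector + lcn * sectors_per_cluster


# Assumed example: the volume starts at sector 2048 and uses 8 sectors per
# cluster (4 KB clusters on 512-byte sectors); the MFT starts at cluster 786432.
mft_sector = cluster_to_sector(786432, 2048, 8)  # 2048 + 786432 * 8 = 6293504
```

The volume's start sector comes from the partition table and the number of sectors per cluster comes from the boot record, which is why both have to be read first.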

The Master File Table

The Master File Table (MFT) is the most important file system structure in the NTFS file system. It contains information about all files on the volume, and even about itself. The MFT is ‘self-referencing’: from the boot record we only know its first cluster. To determine all clusters allocated to the MFT, the operating system needs to interpret the first file record in the MFT. This is the file record in which the MFT ‘describes’ itself.

The first file record of the MFT. What a surprise, the filename is $MFT!
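As a rough sketch, the fixed header at the start of such a file record could be decoded like this in Python. The offsets follow the commonly documented NTFS layout, which this article does not spell out, so treat them as assumptions. `parse_file_record_header` is a hypothetical helper, and a real parser would also apply the update sequence fix-ups before trusting the last two bytes of each sector in the record.

```python
import struct

FLAG_IN_USE = 0x01     # record describes an existing (not deleted) file
FLAG_DIRECTORY = 0x02  # record describes a directory


def parse_file_record_header(record):
    """Decode the basic header fields of one MFT file record."""
    if record[0:4] != b"FILE":
        raise ValueError("not an MFT file record: 'FILE' signature is missing")
    # Offset 0x10: sequence number, hard link count, offset of the first
    # attribute, flags (16 bits each), then used and allocated size (32 bits).
    (sequence, links, first_attribute,
     flags, used_size, allocated_size) = struct.unpack_from("<HHHHII", record, 0x10)
    return {
        "sequence_number": sequence,
        "hard_links": links,
        "first_attribute_offset": first_attribute,
        "in_use": bool(flags & FLAG_IN_USE),
        "is_directory": bool(flags & FLAG_DIRECTORY),
        "used_size": used_size,
        "allocated_size": allocated_size,
    }
```

The attributes that follow the header, starting at `first_attribute_offset`, hold the interesting parts: the file name and the runs describing where the data lives.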

Final step — Your files

The clusters allocated to the MFT, or to any file for that matter, are described in so-called ‘runs’. For each run, a starting logical cluster number (LCN) and a length are recorded. This allows the operating system to process the entire MFT.
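On disk, runs are stored in a compact variable-length encoding. The Python sketch below decodes it under the commonly documented assumptions, none of which are spelled out in this article: each run starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, the offset is signed and relative to the start of the previous run, and a zero byte ends the list. `decode_data_runs` is a hypothetical name.

```python
def decode_data_runs(run_list):
    """Decode an NTFS run list into (start LCN, length in clusters) tuples."""
    runs = []
    pos = 0
    previous_lcn = 0
    while pos < len(run_list) and run_list[pos] != 0x00:
        header = run_list[pos]
        length_size = header & 0x0F  # bytes used by the length field
        offset_size = header >> 4    # bytes used by the offset field
        pos += 1
        length = int.from_bytes(run_list[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            # No offset field: a sparse run with no clusters on disk.
            runs.append((None, length))
            continue
        delta = int.from_bytes(run_list[pos:pos + offset_size], "little", signed=True)
        pos += offset_size
        previous_lcn += delta  # offsets are relative to the previous run
        runs.append((previous_lcn, length))
    return runs


# Two fragments: 8 clusters at LCN 100, then 4 clusters 10 clusters earlier.
fragments = decode_data_runs(bytes.fromhex("11 08 64 11 04 f6 00"))
# fragments == [(100, 8), (90, 4)]
```

Each tuple can then be turned into sector numbers by converting its start cluster, which is how a fragmented file is read back in the right order.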

And that’s how we finally get to our files. By decoding individual file records we get file names, file attributes, creation dates, etc., as well as the data runs that tell us where the file’s data is actually stored.

A file record for an individual file.
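To give an idea of what decoding a file record involves, here is a minimal Python sketch that walks the attributes of a record and pulls the name and creation date out of the file name attribute (type 0x30). The attribute and field offsets follow the commonly documented NTFS layout and are assumptions as far as this article is concerned. The function names are hypothetical, and the sketch skips the update sequence fix-ups that a real parser has to apply first.

```python
import struct
from datetime import datetime, timedelta, timezone

ATTR_FILE_NAME = 0x30   # attribute type holding the file name
ATTR_END = 0xFFFFFFFF   # marker that ends the attribute list
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert 100-nanosecond ticks since 1601-01-01 UTC to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def read_file_name(record):
    """Return (name, creation time) from the first file name attribute, or None."""
    if record[0:4] != b"FILE":
        raise ValueError("not an MFT file record: 'FILE' signature is missing")
    (offset,) = struct.unpack_from("<H", record, 0x14)  # first attribute
    while offset + 8 <= len(record):
        attr_type, attr_length = struct.unpack_from("<II", record, offset)
        if attr_type == ATTR_END or attr_length == 0:
            break
        if attr_type == ATTR_FILE_NAME:
            # Resident attribute: the content offset is stored at 0x14.
            (content_offset,) = struct.unpack_from("<H", record, offset + 0x14)
            body = offset + content_offset
            (created,) = struct.unpack_from("<Q", record, body + 0x08)
            name_length = record[body + 0x40]  # counted in UTF-16 code units
            raw_name = record[body + 0x42:body + 0x42 + 2 * name_length]
            return raw_name.decode("utf-16-le"), filetime_to_datetime(created)
        offset += attr_length  # jump to the next attribute
    return None
```

Run against the first record of the MFT, a function like this would be expected to return the name $MFT.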

A modern operating system hides all the raw hex data from the user. Users find themselves in a world of polished icons and symbols.

When accidents happen, they can rely on powerful, polished file recovery software that, without the help of the operating system, descends into this world of raw hex to retrieve the lost data.

An example of modern, powerful, yet easy-to-use file recovery software

Summarizing

To locate your data, your files, the operating system basically has to interpret and process a chain of on-disk structures. Of course, it doesn’t have to start at the first sector of the hard disk for each individual file. It only has to interpret the partition table once, because the location of a volume doesn’t change while the operating system is running. The same goes for the data in the boot record.

Other structures, like the MFT, are highly dynamic. As files are created, modified or deleted, the MFT needs to be constantly updated.

Comments? You can email me at joep@disktuna.com