WTF are memory-mapped files?

Tobias Andersen
6 min read · Aug 26, 2023

The groundwork for memory-mapped files was laid down with the development of virtual memory capabilities in operating systems during the 1950s and 1960s as computer systems became more complex and memory-intensive. Virtual memory allowed programs to access a larger address space than the physical memory available by utilizing a combination of RAM and disk storage.

The idea of memory-mapped files was popularized by the UNIX operating system in the 1980s. The mmap system call, introduced in BSD-derived versions of UNIX, allowed processes to map files or devices into their address space. This provided a seamless way to work with files as if they were in memory, without the need for explicit file I/O operations.

Following this development, memory-mapped files quickly gained prominence in the database domain. Databases could use memory-mapped files to manage their storage efficiently, allowing parts of their data to be paged in and out of memory as needed. This approach was particularly beneficial for read-heavy workloads and improved the performance of database systems.

As computing systems evolved, memory-mapped files became an essential feature in various programming languages and operating systems. APIs were developed to provide developers with standardized ways to work with memory-mapped files. For example, the Win32 API on Windows and the aforementioned mmap function in POSIX-compliant systems provided access to memory-mapped files.

Memory-mapped files continued to find applications in scenarios where efficient handling of large datasets is critical. This includes multimedia processing, database systems, scientific computing, and more. The introduction of managed languages like C# and Java brought memory-mapped files to a wider audience by providing higher-level abstractions that made their usage more accessible to developers.

In the .NET ecosystem, memory-mapped files are available through the System.IO.MemoryMappedFiles namespace. This API allows .NET developers to easily create and manage memory-mapped files, making them a part of the toolkit for efficient data handling.
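
As a minimal sketch of the API, assuming an existing, non-empty file named example.dat (the file name is just a placeholder), mapping it and reading its first byte could look roughly like this:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Map an existing file read-only and peek at its first byte.
// Once the file is mapped, no explicit Read/Seek calls against it are needed.
using var mmf = MemoryMappedFile.CreateFromFile(
    "example.dat", FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
using var accessor = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);

byte firstByte = accessor.ReadByte(0);
Console.WriteLine($"First byte: {firstByte}");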

A simple example using C# and .NET Core

Creating a complete streaming server using memory-mapped files involves several components and considerations. However, for my purposes, I will focus on a simplified example using C# and .NET Core that demonstrates the basic concept of serving a large file via memory-mapped files. Please note that this example was created for demonstration purposes (while I had a severe hangover) and doesn’t cover all the aspects of a production-grade streaming server, such as handling range requests, authentication, security, and performance optimizations.

using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        var blobFilePath = "sample.dat";
        var port = 8080;
        var listener = new TcpListener(IPAddress.Any, port);

        listener.Start();

        while (true)
        {
            var client = await listener.AcceptTcpClientAsync();

            // Handle each client on its own task so the accept loop keeps running.
            _ = HandleClientAsync(client, blobFilePath);
        }
    }

    static async Task HandleClientAsync(TcpClient client, string filePath)
    {
        using (client)
        {
            using var fileStream = File.OpenRead(filePath);
            using var mmf = MemoryMappedFile.CreateFromFile(fileStream, null, fileStream.Length,
                MemoryMappedFileAccess.Read, HandleInheritability.None, leaveOpen: false);
            using var networkStream = client.GetStream();

            var buffer = new byte[4096];

            // Consume the asynchronous stream of chunk sizes and forward each chunk to the client.
            await foreach (var bytesRead in ReadDataAsync(mmf, buffer))
            {
                await networkStream.WriteAsync(buffer, 0, bytesRead);
            }
        }
    }

    static async IAsyncEnumerable<int> ReadDataAsync(MemoryMappedFile mmf, byte[] buffer)
    {
        // The view stream must be opened with Read access to match the read-only mapping above;
        // the parameterless CreateViewStream() requests ReadWrite access and would throw here.
        using var stream = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read);

        while (true)
        {
            var bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length);

            if (bytesRead == 0)
                yield break;

            yield return bytesRead;
        }
    }
}

In the code I introduce an asynchronous stream, an IAsyncEnumerable<int> named ReadDataAsync, to build on my previous posts and ensure that we read data from the memory-mapped file asynchronously in chunks. The method opens a read-only view stream over the mapped file and yields the number of bytes read for each chunk.

The HandleClientAsync method in turn consumes this asynchronous stream with await foreach and writes each chunk to the client’s network stream before closing the connection. This approach works well for streaming large amounts of data, as the reads stay asynchronous without excessive overhead.
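
To try the example out locally, the server needs a sample.dat to stream; a throwaway snippet along these lines can generate one (the 64 MB size is an arbitrary choice, and the snippet is not part of the server itself):

using System;
using System.IO;

// One-off helper: fill sample.dat with 64 MB of random bytes for the server to stream.
var data = new byte[64 * 1024 * 1024];
new Random().NextBytes(data);
File.WriteAllBytes("sample.dat", data);

Any raw TCP client that then connects to port 8080 will receive the file’s contents.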

Benefits of memory-mapped files

Memory-mapped files provide a mechanism to map a portion of a file’s data directly into memory. This enables seamless interaction with large files as if they were entirely loaded into memory, even though only specific sections are loaded at a given time. This approach reduces the need for manual file I/O operations, which in turn yields a range of benefits when dealing with large files:

  1. Efficiency: Memory-mapped files eliminate the need for explicit read and write operations between memory and disk. Data is read directly from and written to the memory-mapped region, reducing unnecessary data transfers.
  2. Performance: The operating system handles data movement, making use of caching and other optimizations to provide efficient data access.
  3. Simplicity: Memory-mapped files simplify code by allowing developers to use memory operations (pointers) to interact with file data, akin to working with in-memory arrays (see the sketch after this list).
  4. Shared Memory: Multiple processes can map the same file, enabling efficient inter-process communication by sharing data through memory-mapped regions.
  5. Caching: Operating systems often utilize caching mechanisms to optimize memory-mapped file access. Frequently accessed portions of the file might be kept in memory, reducing the need to repeatedly read from disk.
  6. Sequential Access: Memory-mapped files can be particularly performant when there’s sequential access to the data. Sequential access patterns align well with the underlying disk I/O and memory management mechanisms.
  7. Large Datasets: When working with large files or datasets that cannot fit entirely in memory, memory-mapped files can be much more efficient than trying to load and manage chunks of data manually.
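
As a small illustration of points 3 and 4, the sketch below treats a file as if it were an array of integers through a view accessor; data.bin is a placeholder file created on the spot, and another process mapping the same file would see the same data:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Treat the mapped file as an in-memory array of ints (room for 1,024 of them).
using var mmf = MemoryMappedFile.CreateFromFile("data.bin", FileMode.Create, null, 1024 * sizeof(int));
using var accessor = mmf.CreateViewAccessor();

// Write the value 42 at "index" 10 and read it straight back, with no explicit file I/O.
accessor.Write(10 * sizeof(int), 42);
int value = accessor.ReadInt32(10 * sizeof(int));
Console.WriteLine(value); // 42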

Performance of memory-mapped files

Memory-mapped files can offer substantial performance benefits in situations where they align well with the characteristics of the data and access patterns. However, like any technology, their performance is influenced by various factors. It’s recommended to perform thorough testing and profiling for your specific use case to determine if memory-mapped files are the best choice for achieving your desired performance improvements. Factors affecting performance include, but are not limited to, the following:

  1. Access Patterns: The nature of read and write patterns impacts performance. Sequential access is usually more performant than random access, as it aligns well with the operating system’s data caching and disk read-ahead mechanisms.
  2. Memory Usage: If the memory-mapped region becomes too large and exceeds available physical memory, performance might degrade due to frequent swapping between RAM and disk.
  3. File Size: The size of the file being memory-mapped affects performance. Smaller files might be efficiently cached in memory, while larger files could have more variable performance depending on access patterns.
  4. OS & Hardware: Different operating systems and hardware architectures handle memory-mapped files differently. Modern systems are generally well-equipped to optimize memory-mapped file usage, but variations exist.
  5. Concurrent Access: Performance might be affected if multiple processes or threads access the same memory-mapped file concurrently without proper synchronization mechanisms (see the sketch after this list).
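
To illustrate the last point, here is a rough sketch of thread-level synchronization over a single mapped counter (counter.bin is a placeholder file); coordinating separate processes would additionally require an OS-level primitive such as a named Mutex on platforms that support it:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

// Several threads increment one mapped 64-bit counter; the lock prevents lost updates.
using var mmf = MemoryMappedFile.CreateFromFile("counter.bin", FileMode.Create, null, sizeof(long));
using var accessor = mmf.CreateViewAccessor();

var gate = new object();

Parallel.For(0, 1_000, _ =>
{
    lock (gate)
    {
        long current = accessor.ReadInt64(0);
        accessor.Write(0, current + 1);
    }
});

Console.WriteLine(accessor.ReadInt64(0)); // 1000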

Conclusion

Memory-mapped files provide an elegant solution for efficiently managing and processing large datasets in .NET Core and C#. By leveraging memory operations and the capabilities of the operating system, developers can significantly enhance their application’s performance and responsiveness.

Whether dealing with multimedia, databases, or other data-intensive tasks, memory-mapped files are a powerful tool in the modern developer’s toolkit. It is, however, important to keep in mind that “the right tool for the right job” still applies and, as always, there are no “golden hammers”. Memory-mapped files are most effective in scenarios where:

  • Large datasets need to be processed efficiently.
  • Sequential access patterns are predominant.
  • Multiple processes need to share data through shared memory.
  • Frequent read and write operations are required.
