Memory-mapping files and the mmap module in Python (with a lot of examples)

Siddharth Kshirsagar
Published in Analytics Vidhya
Aug 3, 2020
Photo by Kote Puerto on Unsplash

Definition: A memory-mapped file object maps the contents of a normal file into memory (the virtual address space of the process). This allows us to read and modify the file's contents directly in memory.

  1. Memory-mapped file objects behave like both bytearray objects and file objects. Hence all the operations that can be performed on a bytearray are available: indexing, slicing, assigning to a slice, or using the re module to search through the file.
  2. So are the operations that can be performed on a file object: reading and writing data starting at the current position, or using seek() to move the current position elsewhere.
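A minimal sketch of both behaviours follows. It assumes a small throwaway file created in a temporary directory; the file name `sample.bin` and its contents are made up for illustration. The file is mapped with `mmap.mmap(f.fileno(), 0)`, which works on both Windows and Unix:

```python
import mmap
import os
import re
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical sample file created only for this demonstration
    path = os.path.join(tmpdir, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"Hello mmap world, hello again")

    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        # bytearray-like behaviour: indexing, slicing and slice assignment
        first_byte = mm[0]          # indexing returns an int (72 for "H")
        greeting = mm[0:5]          # slicing returns bytes
        mm[6:10] = b"MMAP"          # slice assignment must keep the same length
        # the re module can search the mapped bytes directly
        world_at = re.search(rb"world", mm).start()

        # file-like behaviour: seek(), read() and tell()
        mm.seek(18)
        tail = mm.read(5)
        position = mm.tell()

    # the slice assignment was written through to the file on disk
    with open(path, "rb") as f:
        updated = f.read()

print(first_byte, greeting, world_at, tail, position)
print(updated)
```

Slice assignment cannot change the size of the map. The replacement bytes must be exactly as long as the slice they replace.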

Memory-mapped file object

The constructor of the memory-mapped file object differs between Unix and Windows systems. I will discuss the Windows version.

class mmap.mmap(fileno, length, tagname=None, access=ACCESS_DEFAULT[, offset])

fileno: mmap maps length bytes from the file specified by the file handle fileno (a file descriptor on Unix) and creates an mmap object.

file.fileno(): returns the file descriptor of the stream as an integer.

A file handle (file descriptor on Unix) is a number that uniquely identifies an open file within a process. It describes a data resource and how that resource may be accessed.

If length is larger than the current size of the file, the file is extended to contain length bytes. If length is 0, the maximum length of the map is the current size of the file. On Windows, an empty file cannot be mapped this way, and an exception is raised.
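The short sketch below exercises the fileno and length parameters on a hypothetical 100-byte file. It gets the descriptor with `f.fileno()`, maps the whole file with length 0, and maps only a prefix by passing an explicit length. Extending a file by passing a larger length is Windows behaviour, so the sketch does not exercise it, and the snippet also runs on Unix:

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"0123456789" * 10)           # a 100-byte file

    with open(path, "rb") as f:
        descriptor = f.fileno()               # integer handle/descriptor
        # length=0: map the whole file (the file must not be empty)
        with mmap.mmap(descriptor, 0, access=mmap.ACCESS_READ) as whole:
            whole_len = len(whole)
        # an explicit length maps only the first `length` bytes
        with mmap.mmap(descriptor, 10, access=mmap.ACCESS_READ) as head:
            head_bytes = head[:]

print(descriptor, whole_len, head_bytes)
```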

To map anonymous memory, -1 should be passed as the fileno along with the length.
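For example, the following sketch allocates 16 bytes of anonymous memory. No file backs the buffer, it starts zero-filled, and its contents are gone once the map is closed:

```python
import mmap

# fileno=-1 requests anonymous memory that is not backed by any file
with mmap.mmap(-1, 16) as anon:
    anon.write(b"scratch data")   # file-like write at the current position
    anon.seek(0)
    contents = anon.read(12)
    padding = anon[12:]           # the untouched bytes are still zero
    size = len(anon)

print(contents, padding, size)
```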

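tagname: if specified and not None, is a string giving a tag name for the mapping. Windows allows many different mappings against the same file. If the name of an existing tag is given, that tag is opened; otherwise a new tag of that name is created. If the parameter is omitted or None, the mapping is created without a name.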
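Because tagname exists only in the Windows signature, the following sketch is guarded by a `sys.platform` check and does nothing on other systems. On Windows, it creates two anonymous maps with the same tag name (a hypothetical, randomly generated one). Both refer to the same named mapping, so a write through one map is visible through the other:

```python
import mmap
import sys
import uuid

shared_value = None
if sys.platform == "win32":
    # tagname exists only in the Windows signature of mmap.mmap
    tag = "demo-" + uuid.uuid4().hex   # hypothetical, unique tag name
    with mmap.mmap(-1, 64, tagname=tag) as first:
        # same tag name: the existing mapping is opened, not a new one
        with mmap.mmap(-1, 64, tagname=tag) as second:
            first[0:5] = b"hello"
            shared_value = second[0:5]
else:
    print("tagname is only available on Windows; skipping")

print(shared_value)
```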

Understanding ACCESS_READ, ACCESS_WRITE, ACCESS_COPY

ACCESS_READ: read-only memory.

ACCESS_WRITE: write-through memory; changes affect both the memory and the underlying file.

ACCESS_COPY: copy-on-write memory; changes affect only the memory, not the underlying file.

Examples:

ACCESS_READ, ACCESS_WRITE and ACCESS_COPY.
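The following is a minimal sketch of the three modes, using a throwaway 8-byte file in a temporary directory. It checks three things:

- A read-only map rejects writes with TypeError.
- A copy-on-write map can be modified, but the file stays untouched.
- A write-through map changes the file.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes.bin")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"original")

    # ACCESS_READ: any attempt to modify the map raises TypeError
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as ro:
            try:
                ro[0:1] = b"X"
                read_only_blocked = False
            except TypeError:
                read_only_blocked = True

    # ACCESS_COPY: the map can be modified, but the file on disk is untouched
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as cow:
            cow[0:8] = b"COPYMODE"
            copy_view = cow[:]
    with open(path, "rb") as f:
        after_copy = f.read()

    # ACCESS_WRITE: changes are written through to the underlying file
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as rw:
            rw[0:8] = b"WRITTEN!"
            rw.flush()   # push the change to the file on disk
    with open(path, "rb") as f:
        after_write = f.read()

print(read_only_blocked, copy_view, after_copy, after_write)
```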

Now that we know how the mmap module works, let's compare it with normal file access.

  1. Assume there is a binary file larger than 15 MB (in this case a 20 MB PDF file) and that we process its contents as follows.
  2. From the current position, seek forward 64 bytes and process the data at that position (in simple words, we move the pointer 64 bytes forward).
  3. From the current position, seek back 32 bytes and process the data at that position (in simple words, we move the pointer back 32 bytes).
  4. Repeat steps 2 and 3 until more than 10 MB of data has been processed.

In this test, the version using memory maps was about 13 times faster than the version using normal file objects.
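The sketch below is a hypothetical reconstruction of the access pattern above, not the author's benchmark code. It makes three assumptions:

- A 21 MB file of random bytes stands in for the PDF.
- "Processing" means feeding 16 bytes into a running `zlib.crc32` checksum at each stop.
- The mmap version tracks the position by hand and uses slicing instead of seek() and read().

Both versions should produce the same checksum. The printed speed-up depends on the machine, Python version and OS cache, so it will not necessarily match the 13x reported above:

```python
import mmap
import os
import tempfile
import time
import zlib

FILE_SIZE = 21 * 1024 * 1024   # roughly the 20 MB file from the description
TARGET = 10 * 1024 * 1024      # stop once more than 10 MB has been processed
CHUNK = 16                     # hypothetical: bytes "processed" at each stop


def process_file(path):
    """Seek +64, process, seek -32, process, ... with an ordinary file object."""
    crc = processed = 0
    with open(path, "rb") as f:
        while processed <= TARGET:
            f.seek(64, os.SEEK_CUR)
            crc = zlib.crc32(f.read(CHUNK), crc)
            f.seek(-32, os.SEEK_CUR)
            crc = zlib.crc32(f.read(CHUNK), crc)
            processed += 2 * CHUNK
    return crc, processed


def process_mmap(path):
    """Same access pattern, tracking the position by hand and slicing the map."""
    crc = processed = pos = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            while processed <= TARGET:
                pos += 64
                crc = zlib.crc32(mm[pos:pos + CHUNK], crc)
                pos += CHUNK - 32      # back 32 bytes from the end of the read
                crc = zlib.crc32(mm[pos:pos + CHUNK], crc)
                pos += CHUNK
                processed += 2 * CHUNK
    return crc, processed


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "big.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(FILE_SIZE))   # stand-in for the 20 MB PDF

    start = time.perf_counter()
    file_result = process_file(path)
    file_time = time.perf_counter() - start

    start = time.perf_counter()
    mmap_result = process_mmap(path)
    mmap_time = time.perf_counter() - start

print(f"file: {file_time:.3f}s  mmap: {mmap_time:.3f}s  "
      f"speed-up: {file_time / mmap_time:.1f}x")
```

The file-object version benefits from Python's read buffering, so the gap this sketch measures can be smaller than in a benchmark that uses unbuffered reads or larger seeks.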

Utility function to create a memory map

```python
import os
import mmap
def memory_map(filename, access=mmap.ACCESS_WRITE):
    size = os.path.getsize(filename)
    # the os.open() method returns a file descriptor for the newly opened file
    file_descriptor = os.open(filename, os.O_RDWR)
    try:
        return mmap.mmap(file_descriptor, size, access=access)
    finally:
        # mmap keeps its own duplicate of the descriptor, so ours can be closed
        os.close(file_descriptor)
```
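Here is a usage sketch for the helper. It uses a hypothetical 1 MB file named `data` in a temporary directory, and it repeats the helper so the snippet runs on its own. The mmap objects are used as context managers so they are closed at the end of each block:

```python
import mmap
import os
import tempfile


def memory_map(filename, access=mmap.ACCESS_WRITE):
    # same helper as above, repeated so this snippet runs on its own
    size = os.path.getsize(filename)
    fd = os.open(filename, os.O_RDWR)
    try:
        return mmap.mmap(fd, size, access=access)
    finally:
        os.close(fd)   # the mmap object holds its own duplicate of the descriptor


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data")   # hypothetical file name
    # create a 1 MB file of zero bytes to map
    with open(path, "wb") as f:
        f.seek(999_999)
        f.write(b"\x00")

    with memory_map(path) as m:
        length = len(m)
        m[0:11] = b"Hello World"          # reassign a slice
        first = m[0:11]

    # map the file again, read-only, to confirm the change reached the file
    with memory_map(path, access=mmap.ACCESS_READ) as m:
        persisted = m[0:11]

print(length, first, persisted)
```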

Conclusion:

  1. Using the mmap to map files into memory can be an efficient and elegant means for randomly accessing the contents of a file.
  2. Instead of opening a file and performing various combinations of seek(), read() and write() calls, we can simply map the file and access its contents using slicing operations.
  3. It should be noted that memory-mapping a file does not cause the entire file to be read into memory. Instead, the operating system reserves a section of virtual memory for the contents of the file and loads pages from disk only as they are accessed; parts of the file that are never accessed stay on disk.
  4. If more than one Python interpreter memory-maps the same file, the resulting mmap objects can be used to exchange data between the interpreters.
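The following is a minimal sketch of point 4, using a hypothetical 64-byte file. It runs in three steps:

1. This interpreter maps the file.
2. It starts a second interpreter with `subprocess.run([sys.executable, "-c", ...])`. That interpreter maps the same file with ACCESS_WRITE and writes five bytes.
3. The first interpreter reads those bytes through its own, still-open map.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Code run by a second, separate Python interpreter (hypothetical example)
CHILD_CODE = """
import mmap, sys
with open(sys.argv[1], "r+b") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
        m[0:5] = b"HELLO"
"""

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "shared.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * 64)

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as shared:
            before = shared[0:5]
            # start another interpreter that maps the same file and writes to it
            subprocess.run([sys.executable, "-c", CHILD_CODE, path], check=True)
            # this interpreter's existing mapping now sees the child's write
            after = shared[0:5]

print(before, after)
```

A real exchange also needs some coordination, such as a lock or a flag byte, so the reader knows when the data is ready. Here, waiting for the child process to exit plays that role.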
