CODEX

Exploring Windows Memory Management —Swapping & Virtual Memory

Tuan Nguyen
CodeX

--

I used to be curious about how things happened inside my Windows. I would like to share with you what I explored on this topic. There are two memory management approaches: swapping and virtual memory. The difference between them is the way in which a process is loaded to the main memory.

Swapping

The operation of a swapping system is illustrated in the following figure. Initially, only process A is in memory. Then processes B and C are created or swapped in from the disk. In (d) A is swapped out to disk. Then D comes in and B goes out. Finally, A comes in again. Because A is now at a different location, addresses contained in it must be relocated, either by the software when it is swapped in or (more likely) by hardware during program execution.

When swapping creates multiple holes in memory, it is possible to combine them all into one big one by moving all the processes downward as far as possible. This technique is known as memory compaction. It is usually not done because it requires a lot of CPU time. For example, on a 1-GB machine that can copy at a rate of 2 GB/sec (0.5 nsec/byte), it takes about 0.5 sec to compact all of the memory. That may not seem like much time, but it would be noticeably disruptive to a user watching a video stream.

A point that is worth making concerns about how much memory should be allocated for a process when it is created or swapped in. If processes are created with a fixed size that never changes, then the allocation is simple: the operating system allocates exactly what is needed, no more and no less.

If, however, processes’ data segments can grow, for example, by dynamically allocating memory from a heap, as in many programming languages, a problem occurs whenever a process tries to grow. If a hole is adjacent to the process, it can be allocated and the process can be allowed to grow into the hole. On the other hand, if the process is adjacent to another process, the growing process will either have to be moved to a hole in memory large enough for it, or one or more processes will have to be swapped out to create a large enough hole. If a process cannot grow in memory and the swap area on the disk is full, the process will have to wait or be killed.

If it is expected that most processes will grow as they run, it is probably a good idea to allocate a little extra memory whenever a process is swapped in or moved, to reduce the overhead associated with moving or swapping processes that no longer fit in their allocated memory. However, when swapping processes to disk, only the memory actually in use should be swapped; it is wasteful to swap the extra memory as well. In the following figure, we see a memory configuration in which space for growth has been allocated to two processes.

If processes can have two growing segments, for example, the data segment being used as a heap for variables that are dynamically allocated and released and a stack segment for the normal local variables and return addresses, the alternative arrangement suggests itself, namely that of figure (b). In this figure, we see that each process illustrated has a stack at the top of its allocated memory that is growing downward, and a data segment just beyond the program text that is growing upward. The memory between them can be used for either segment. If it runs out, either the process will have to be moved to a hole with sufficient space, swapped out of memory until a large enough hole can be created, or killed.

Virtual Memory

The basic idea behind virtual memory is that the combined size of the program, data, and stack may exceed the amount of physical memory available for it. The operating system keeps those parts of the program currently in use in the main memory, and the rest on the disk. For example, a 512-MB program can run on a 256-MB machine by carefully choosing which 256 MB to keep in memory at each instant, with pieces of the program being swapped between disk and memory as needed.

Virtual memory can also work in a multiprogramming system, with bits and pieces of many programs in memory at once. While a program is waiting for part of itself to be brought in, it is waiting for I/O and cannot run, so the CPU can be given to another process, the same way as in any other multiprogramming system.

Paging

Most virtual memory systems use a technique called paging, which we will now describe. On any computer, there exists a set of memory addresses that programs can produce. When a program uses an instruction like

MOV REG, 1000

It does this to copy the contents of memory address 1000 to REG (or vice versa, depending on the computer). Addresses can be generated using indexing, base registers, segment registers, and other ways.

These program-generated addresses are called virtual addresses and form the virtual address space. On computers without virtual memory, the virtual address is put directly onto the memory bus and causes the physical memory word with the same address to be read or written. When virtual memory is used, the virtual addresses do not go directly to the memory bus. Instead, they go to an MMU (Memory Management Unit) that maps the virtual addresses onto the physical memory addresses.

A very simple example of how the mapping works is shown in the following figure. In this example, we have a computer that can generate 16-bit addresses, from 0 up to 64K. These are the virtual addresses. This computer, however, has only 32 KB of physical memory, so although 64-KB programs can be written, they cannot be loaded into memory in their entirety and run. A complete copy of a program’s memory image, up to 64 KB, must be present on the disk, however, so that pieces can be brought in as needed.

The virtual address space is divided up into units called pages. The corresponding units in physical memory are called page frames. The pages and page frames are always the same sizes. In this example, they are 4 KB, but page sizes from 512 bytes to 1 MB have been used in real systems. With 64 KB of virtual address space and 32 KB of physical memory, we get 16 virtual pages and 8-page frames. Transfers between RAM and disk are always in units of a page. When the program tries to access address 0, for example, using the instruction

MOV REG, 0

virtual address 0 is sent to the MMU. The MMU sees that this virtual address falls on page 0 (0 to 4095), which according to its mapping is page frame 2 (8192 to 12287). It thus transforms the address to 8192 and outputs address 8192 onto the bus. The memory knows nothing at all about the MMU and just sees a request for reading or writing address 8192, which it honors. Thus, the MMU has effectively mapped all virtual addresses between 0 and 4095 onto physical addresses 8192 to 12287. Similarly, an instruction

MOV REG, 8192

is effectively transformed into

MOV REG, 24576

because virtual address 8192 is on virtual page 2 and this page is mapped onto physical page frame 6 (physical addresses 24576 to 28671). As a third example, virtual address 20500 is 20 bytes from the start of the virtual page 5 (virtual addresses 20480 to 24575) and maps onto physical address 12288 + 20 = 12308.

By itself, this ability to map the 16 virtual pages onto any of the eight-page frames by setting the MMU’s map appropriately does not solve the problem that the virtual address space is larger than the physical memory. Since we have only eight physical page frames, only eight of the virtual pages are mapped onto physical memory. The others, shown as crosses in the figure, are not mapped. In the actual hardware, a present/absent bit keeps track of which pages are physically present in memory.

What happens if the program tries to use an unmapped page, for example, by using the instruction

MOV REG, 32780

which is byte 12 within virtual page 8 (starting at 32768)? The MMU notices that the page is unmapped (indicated by a cross in the figure) and causes the CPU to trap the operating system. This trap is called a page fault. The operating system picks a little-used page frame and writes its contents back to the disk. It then fetches the page just referenced into the page frame just freed, changes the map, and restarts the trapped instruction.

For example, if the operating system decided to evict page frame 1, it would load virtual page 8 at physical address 4K and make two changes to the MMU map. First, it would mark virtual page 1’s entry as unmapped, to trap any future accesses to virtual addresses between 4K and 8K. Then it would replace the cross in virtual page 8’s entry with a 1 so that when the trapped instruction is re-executed, it will map virtual address 32780 onto physical address 4108.

Now let us look inside the MMU to see how it works and why we have chosen to use a page size that is a power of 2. In the next figure, we see an example of a virtual address, 8196 (0010000000000100 in binary), being mapped using the MMU map of the previous one. The incoming 16-bit virtual address is split into a 4-bit page number and a 12-bit offset. With 4 bits for the page number, we can have 16 pages, and with 12 bits for the offset, we can address all 4096 bytes within a page.

The page number is used as an index into the page table, yielding the number of the page frame corresponding to that virtual page. If the present/absent bit is 0, a trap to the operating system is caused. If the bit is 1, the page frame number found in the page table is copied to the high-order 3 bits of the output register, along with the 12-bit offset, which is copied unmodified from the incoming virtual address. Together they form a 15-bit physical address. The output register is then put onto the memory bus as the physical memory address.

Segmentation

This is another scheme to manage memory bases on the virtual memory concept. In that way, the virtual address space is a collection of segments. Each segment has a name and a length. Thus the addresses specify both the segment name and the offset within that segment. The user, therefore, specifies each address by two quantities: a segment name and an offset. Contrast this scheme with the paging one, in which the user specifies only a single address which is partitioned by the hardware into a page number and an offset, all invisible to the programmer.

For simplicity of implementation, segments are numbered and are referred to by a segment number rather than by a segment name. That’s the reason why a logical address in the segmentation approach consists of two tuples:

<segment-number, offset>

Although the user can refer to objects in the program by a two-dimensional address, the actual physical address is still, of course, a one-dimensional sequence of bytes. Thus, the segment table is defined as a mapping from a two-dimensional user-defined address to a one-dimensional physical address. Each entry in the segment table has a segment base and a segment limit. The segment base contains the starting physical address where the segment resides in memory, whereas the segment limit specifies the length of the segment.

Intel Architecture’s strategy

Both paging and segmentation have advantages and disadvantages. In fact, some architectures provide both of them, and Intel is one of them. It supports both pure segmentation and segmentation with paging.

In the Pentium system, the CPU generates a logical address which is given to the segmentation unit. The segmentation unit produces a linear address for each logical address. The linear address is then given to the paging unit which in turn generates the physical address in the main memory. Thus, the segmentation unit and paging unit form the equivalent of the memory-management unit as follows:

The Pentium architecture allows a segment to be as 4GB and the maximum number of segments per process is 16KB. The logical address space is divided into two partitions. The first one consists of up to 8KB segments that are private to that process. The second consists of up to 8KB segments that are shared among all the processes. About the paging, a page size is either 4 KB or 4 MB.

Originally published at https://emerging-it-technologies.blogspot.com.

--

--