How does an OS enable virtual memory?

cstack
4 min read · Oct 16, 2016


I’m implementing virtual memory in my hobby kernel, so I’ve had to do a lot of digging and build up my intuition for how it works. Here’s my understanding so far!

Accessing memory (RAM) happens incredibly often. For example, EVERY SINGLE INSTRUCTION needs to be fetched from memory before the CPU can execute it. Virtual memory completely changes how each request to memory works. Once you turn it on, memory addresses no longer mean the same thing. So it’s hard to think about how to write a program (like an operating system) that transitions from not using it to using it.

Here’s my mental model for memory accesses. First, if virtual memory is turned OFF:

The CPU issues a request to read or write a certain address. The request goes to the chipset. In most cases, the chipset forwards the request to RAM. For certain addresses, the chipset forwards the request to a memory-mapped hardware device.

Two examples of memory requests with paging disabled. The chipset routes the request to RAM or to another hardware device.

To turn on virtual memory, the kernel first has to set up a couple of data structures in RAM called page tables. These are big tables that map “virtual” addresses to “physical” addresses. There may be multiple levels of page tables, in which case the highest-order bits of the virtual address determine which entry to look for in the first-level page table. That entry contains the physical address of another page table. The next-highest bits of the virtual address determine which entry to look for in the second-level page table. That entry contains the physical address of a page (section) of memory. The lowest-order bits of the virtual address determine the offset inside that physical page to fetch.

Diagram taken from https://61creview.wordpress.com/tag/tlb/
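
To make that concrete, here’s a tiny C sketch (my own illustration, not from any kernel) of how a 32-bit x86 virtual address gets split up with two-level paging and 4 KiB pages: the top 10 bits index the page directory, the next 10 bits index a page table, and the low 12 bits are the offset inside the page.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t vaddr = 0x00100ABC;                  /* an arbitrary example virtual address */

    uint32_t dir_index   = vaddr >> 22;           /* top 10 bits: which page-directory entry */
    uint32_t table_index = (vaddr >> 12) & 0x3FF; /* next 10 bits: which page-table entry */
    uint32_t offset      = vaddr & 0xFFF;         /* low 12 bits: offset within the 4 KiB page */

    printf("directory index: %u\n", dir_index);   /* 0 */
    printf("table index:     %u\n", table_index); /* 256 */
    printf("offset:          0x%03X\n", offset);  /* 0xABC */
    return 0;
}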

So when virtual memory is turned ON:

The CPU issues a request to read or write a certain address. The memory management unit (MMU) intercepts the request. It treats the requested address as a virtual address and translates it to a physical address using the page tables (which are themselves stored in RAM). The MMU then forwards the physical address to the chipset like normal.

An example memory request with paging enabled. The MMU translates the virtual address 0x01 to the physical address 0xC1. Page tables are stored somewhere in RAM but aren’t shown.

Note that every request from the CPU now requires an additional request to look up an entry in one or more page tables. This does not cause an infinite loop because page tables store physical addresses, so they don’t need to be translated.
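
Here’s a rough C sketch of that walk (again my own illustration, simulating “physical RAM” with an array, and skipping the present-bit and permission checks a real MMU does). The important point is that every address it follows is a physical address, so no lookup ever needs another translation.

#include <stdint.h>
#include <stdio.h>

static uint32_t ram[1 << 20];       /* 4 MiB of pretend physical memory, as 32-bit words */

static uint32_t phys_read32(uint32_t paddr) {
    return ram[paddr / 4];          /* a raw read of physical memory: no translation involved */
}

/* Translate a virtual address the way a two-level x86 walk does.
   cr3 holds the physical address of the page directory. */
static uint32_t translate(uint32_t cr3, uint32_t vaddr) {
    uint32_t dir_index   = vaddr >> 22;
    uint32_t table_index = (vaddr >> 12) & 0x3FF;
    uint32_t offset      = vaddr & 0xFFF;

    /* The directory entry holds the physical address of a page table
       (flag bits live in the low 12 bits and are masked off here). */
    uint32_t dir_entry   = phys_read32(cr3 + dir_index * 4);
    uint32_t table_paddr = dir_entry & 0xFFFFF000;

    /* The table entry holds the physical address of the final page. */
    uint32_t table_entry = phys_read32(table_paddr + table_index * 4);
    uint32_t page_paddr  = table_entry & 0xFFFFF000;

    return page_paddr | offset;
}

int main(void) {
    uint32_t cr3 = 0x1000;                  /* pretend the page directory lives at 0x1000 */
    ram[0x1000 / 4] = 0x2000 | 1;           /* directory entry 0 -> page table at 0x2000 */
    ram[0x2000 / 4] = 0xC000 | 1;           /* table entry 0     -> physical page at 0xC000 */

    printf("0x%08X\n", translate(cr3, 0x00000ABC)); /* prints 0x0000CABC */
    return 0;
}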

Also note that this would seem to increase the number of requests to RAM by 2x (or more!). Why doesn’t this make all our programs twice as slow? Because caching. The MMU caches the results of these translations in a small, very fast cache called the TLB (translation lookaside buffer).

So back to the question, how does the operating system enable paging? First, it creates some page tables in memory (addressing memory by its physical address). In x86, it passes the physical address of the top-level page table (or page directory) to the CPU in cr3, a control register.

Then the OS flips a bit in another control register (cr0) to tell the CPU to start sending addresses through the MMU and translating them using our page tables.

HERE’S THE TRICKY PART. As soon as you enable paging, all memory requests are treated as virtual addresses and go through the MMU. This includes the request for the next instruction. Let’s say you write this code (I label each line with a number to represent its address):

0: mov ecx, page_table ; Copy the address of the top-level page table
1: mov cr3, ecx ; Tell the CPU the address of the page table
2: mov ecx, cr0 ; Copy flags into register ecx
3: or ecx, 0x80000000 ; Set PG bit in CR0 to enable paging
4: mov cr0, ecx ; Save flags back into cr0. Paging is enabled
5: call main ; Call the main() function of our kernel

After instruction 4 runs, the CPU increments the instruction pointer like normal, then uses it to request the next instruction from memory. But that address is now translated by the MMU! We incremented a physical address, then treated the result as a virtual address. A couple things can go wrong:

  • If we didn’t initialize any entries in our page tables, the request for the next instruction would fail and cause a page fault.
  • If we mapped the address of our next instruction to some random physical address, the CPU would start executing whatever code is at that physical address and probably cause an exception.

To avoid those problems, we need to add some entries to our page table before enabling paging. The simplest solution is to identity-map the addresses that contain our kernel. So if our kernel starts at address 0x00100000, the page table should map “0x00100000” to “0x00100000”. That way, when the CPU fetches the next instruction, the MMU will not modify the address before sending it to RAM.

And our OS doesn’t crash.
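
For reference, here’s a rough C sketch of what that identity-mapping setup might look like on 32-bit x86 with 4 KiB pages. The names (page_directory, first_page_table, the flag constants) are my own, not from the post, and a real kernel also has to make sure the tables are 4 KiB-aligned and that cr3 gets a physical address.

#include <stdint.h>

#define PAGE_PRESENT  0x1   /* entry is valid */
#define PAGE_WRITABLE 0x2   /* entry is writable */

/* Both tables must be 4 KiB-aligned. Because paging is still off (or the
   kernel is identity-mapped), their addresses double as physical addresses. */
static uint32_t page_directory[1024]   __attribute__((aligned(4096)));
static uint32_t first_page_table[1024] __attribute__((aligned(4096)));

void setup_identity_map(void) {
    /* Identity-map the first 4 MiB: virtual page i -> physical page i.
       That covers a kernel loaded at 0x00100000. */
    for (uint32_t i = 0; i < 1024; i++) {
        first_page_table[i] = (i * 0x1000) | PAGE_PRESENT | PAGE_WRITABLE;
    }

    /* Directory entry 0 covers virtual addresses 0x00000000-0x003FFFFF.
       Point it at the table above; every other entry stays unmapped. */
    page_directory[0] = (uint32_t)first_page_table | PAGE_PRESENT | PAGE_WRITABLE;

    /* page_directory is what the assembly listing above loads into cr3
       before setting the PG bit in cr0. */
}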

But that isn’t very useful. Our code is using virtual memory right now, but it’s acting the exact same as before. In my next post I’ll explain how to use virtual memory for good by implementing a “higher half kernel.”
