How does a higher half kernel work?

cstack
Oct 18, 2016


This continues my previous post about how the OS enables virtual memory.

One of the main reasons for using virtual memory is to let user programs act as though they were loaded at address 0. Most executable formats assume they are loaded at address 0, so we would like the operating system to get out of the way and live in the higher reaches of memory, let’s say at address 0xC0000000 (3 GB). This leaves the first 3 GB of the address space for the user program, and the last GB for the kernel. Seems fair.

This is called a Higher Half Kernel. It requires some extra shenanigans when you enable paging.

If we want our kernel to run in the upper half of virtual memory, we need to tell the linker. In our linker script, we can include this directive to set the relocation address:

. = 0xC0100000; /* the code should be linked as though it were loaded at 3GB + 1MB. */

Now all addresses in our code will be computed as though our kernel is loaded into memory starting at address 0xC0100000. But what if the computer we’re running on doesn’t have 4 GB of memory? What if it only has 128 MB (the default value for the qemu emulator that I’m using)? We have to tell our bootloader, GRUB, to load the kernel at a lower physical address. We choose 1MB because it is the conventional spot: memory below 1MB is partly taken up by the BIOS, video memory and GRUB itself. In our linker script, we explicitly declare where to load each section of our program:

.text ALIGN (0x1000) : AT(ADDR(.text) - 0xC0000000)
...
.rodata ALIGN (0x1000) : AT(ADDR(.rodata) - 0xC0000000)
...

So our code will assume it’s loaded at address 0xC0100000 (the relocation address), but it will actually be loaded at 0x00100000 (the relocation address minus 0xC0000000).
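The arithmetic between the two addresses is simple enough to write down. Here is a minimal C sketch, assuming a 32-bit address space. KERNEL_VIRTUAL_BASE is the 0xC0000000 offset from the linker script; the helper names virt_to_phys and phys_to_virt are made up for illustration and don't come from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Offset between link-time (virtual) and load-time (physical) addresses. */
#define KERNEL_VIRTUAL_BASE 0xC0000000u

/* Hypothetical helper: where a linked kernel address actually sits in RAM. */
static uint32_t virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= KERNEL_VIRTUAL_BASE); /* only valid for kernel addresses */
    return vaddr - KERNEL_VIRTUAL_BASE;
}

/* Hypothetical helper: the address the linker assigned to a physical location. */
static uint32_t phys_to_virt(uint32_t paddr)
{
    assert(paddr < 0x40000000u); /* result must still fit below 4 GB */
    return paddr + KERNEL_VIRTUAL_BASE;
}
```

With these, virt_to_phys(0xC0100000) gives 0x00100000, which is exactly the pair of addresses the linker script sets up.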

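How will we reconcile this difference? By mapping a second set of addresses before we enable paging. In the previous article, we mapped the first 4MB of virtual memory to the first 4MB of physical memory. We also want to map 4MB of virtual memory starting at 0xC0000000 to that same first 4MB of physical memory. Like this:

Virtual 0x00000000 - 0x003FFFFF -> physical 0x00000000 - 0x003FFFFF (identity mapping)
Virtual 0xC0000000 - 0xC03FFFFF -> physical 0x00000000 - 0x003FFFFF (higher-half mapping)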

We set up this second mapping at the same time as the first one, before we turn on paging. We turn on paging as described in the previous post, the program counter is incremented, and we fetch the next instruction using the identity mapping in the first 4MB of memory.
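To see which page directory entries those two mappings land in, here is a small C sketch. It assumes 4MB pages, so the top 10 bits of a virtual address pick the directory entry. The flag names and the helpers pde_index and make_pde_4mb are my own labels for illustration; the bit positions are the standard x86 ones.

```c
#include <stdint.h>

#define PDE_PRESENT  0x001u /* bit 0: P,  the page is present     */
#define PDE_WRITABLE 0x002u /* bit 1: RW, the page is read/write  */
#define PDE_4MB      0x080u /* bit 7: PS, the entry maps 4MB      */

/* A 4MB page covers 2^22 bytes, so the top 10 bits select the entry. */
static uint32_t pde_index(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* Hypothetical helper: build an entry that maps one 4MB page.
 * Only the top 10 bits of the physical base are kept (4MB alignment). */
static uint32_t make_pde_4mb(uint32_t phys_base, uint32_t flags)
{
    return (phys_base & 0xFFC00000u) | PDE_4MB | flags;
}
```

pde_index(0x00000000) is 0 and pde_index(0xC0000000) is 768, so we only need to fill in two of the 1024 entries, and both get the value make_pde_4mb(0, PDE_PRESENT | PDE_WRITABLE), which is 0x83.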

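Here’s some assembly code that creates a page directory with these two mappings. Note that it is a single-level page table for simplicity: each entry maps a 4MB page directly, which only works if the PSE bit in CR4 is set before paging is enabled.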

KERNEL_VIRTUAL_BASE equ 0xC0000000
KERNEL_PAGE_NUMBER equ (KERNEL_VIRTUAL_BASE >> 22) ; Index in the page directory
section .data
align 0x1000 ; align to 4KB, the size of a page
BootPageDirectory:
; the first entry identity maps the first 4MB of memory
; All bits are clear except the following:
; bit 7: PS The kernel page is 4MB.
; bit 1: RW The kernel page is read/write.
; bit 0: P The kernel page is present.
dd 0x00000083 ; in binary this is 10000011
; entries for unmapped virtual addresses
times (KERNEL_PAGE_NUMBER - 1) dd 0 ; Pages before kernel space.
; entry for the kernel's virtual address
dd 0x00000083
; entries for unmapped addresses above the kernel
times (1024 - KERNEL_PAGE_NUMBER - 1) dd 0
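If the assembly is hard to picture, the same directory can be modelled in C and walked by hand. This is only a toy model of what the MMU does for 4MB pages, not real kernel code; boot_page_directory, init_boot_page_directory and translate are names I made up to mirror the listing.

```c
#include <stdbool.h>
#include <stdint.h>

#define PDE_PRESENT 0x001u
#define PDE_4MB     0x080u
#define KERNEL_PAGE_NUMBER (0xC0000000u >> 22) /* 768 */

static uint32_t boot_page_directory[1024];

/* Mirror of the assembly listing: two entries, both pointing at physical 0. */
static void init_boot_page_directory(void)
{
    for (int i = 0; i < 1024; i++)
        boot_page_directory[i] = 0;
    boot_page_directory[0] = 0x00000083u;                  /* identity mapping    */
    boot_page_directory[KERNEL_PAGE_NUMBER] = 0x00000083u; /* higher-half mapping */
}

/* Simplified walk for 4MB pages only. Returns false where the real CPU
 * would raise a page fault. */
static bool translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t pde = boot_page_directory[vaddr >> 22];
    if (!(pde & PDE_PRESENT) || !(pde & PDE_4MB))
        return false;
    /* Top 10 bits come from the entry, low 22 bits from the address. */
    *paddr = (pde & 0xFFC00000u) | (vaddr & 0x003FFFFFu);
    return true;
}
```

After init_boot_page_directory(), both translate(0x00100000, ...) and translate(0xC0100000, ...) produce physical address 0x00100000, while anything outside the two mapped 4MB windows fails.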

Then we jump to an address in that higher section of virtual memory. But since it maps to the same physical address, execution continues as normal. Finally, we can pass control over to our C code, which is linked assuming it is loaded at 0xC0100000 (which it is now, since paging is turned on).

HERE’S THE TRICKY PART OF A HIGHER HALF KERNEL. The code that runs before paging is turned on has to be very careful when it uses memory addresses. It was linked assuming it would be loaded at 0xC0100000 but in actuality, it was loaded at 0x00100000. Therefore, it needs to subtract 0xC0000000 from any addresses it uses. So loading the page tables looks more like:

mov ecx, (page_table - KERNEL_VIRTUAL_BASE) ; Copy the physical address of the top-level page table
mov cr3, ecx ; Tell the CPU the physical address of the page table

`page_table` is the virtual address of the page directory (`BootPageDirectory` in the listing above), which was calculated at link time from the relocation address. Subtracting KERNEL_VIRTUAL_BASE gives its physical address, which is what CR3 expects.
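Loading CR3 is only one of the register updates involved. With 4MB pages, the PSE bit (bit 4) of CR4 has to be set as well, and paging itself is switched on by the PG bit (bit 31) of CR0. The actual register writes are privileged and have to be done in assembly, so the C sketch below only models the bit arithmetic; the function names are hypothetical.

```c
#include <stdint.h>

#define CR4_PSE 0x00000010u /* bit 4:  allow 4MB pages */
#define CR0_PG  0x80000000u /* bit 31: enable paging   */

/* Value to write back to CR4 so that 4MB page entries are honoured. */
static uint32_t cr4_with_pse(uint32_t cr4)
{
    return cr4 | CR4_PSE;
}

/* Value to write back to CR0 to turn paging on. All other bits,
 * such as protected mode enable (bit 0), are left untouched. */
static uint32_t cr0_with_paging(uint32_t cr0)
{
    return cr0 | CR0_PG;
}
```

The order matters on real hardware: CR3 and CR4 are set up first, and CR0 last, because the very next instruction fetch after setting PG already goes through the page directory.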

We can now reference addresses like normal, since the page table does the subtraction for us, but we still need to update the instruction pointer, since it’s still pointing to an address in the lower half of memory. A plain jump to a label won’t do it: the assembler encodes `jmp label` as an offset relative to the current instruction pointer, so we would land on the identity-mapped copy of the code down near 1MB. What we need is a jump to an absolute address. We first put the full 32-bit virtual address into a register, then pass that register to the jump instruction:

lea ecx, [StartInHigherHalf] ; Load the 32-bit virtual address of the label StartInHigherHalf into register ecx
jmp ecx ; Jump to the address stored in ecx

`StartInHigherHalf` could be directly after the jump instruction, but we do the jump anyway to update the instruction pointer to start using higher-half addresses.
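Here is why the jump through a register matters, modelled in C. A direct jump is encoded as a displacement that the CPU adds to whatever the instruction pointer currently is, while a jump through a register replaces the instruction pointer outright. The addresses in the usage note and the function names are made up for illustration.

```c
#include <stdint.h>

/* The assembler encodes a direct jump as a displacement: the link address
 * of the target minus the link address of the instruction after the jump.
 * Unsigned wraparound stands in for the signed 32-bit displacement. */
static uint32_t encode_rel32(uint32_t next_insn_link, uint32_t target_link)
{
    return target_link - next_insn_link;
}

/* At run time the CPU adds that displacement to the EIP it actually has. */
static uint32_t relative_jump_target(uint32_t eip_after_jump, uint32_t disp)
{
    return eip_after_jump + disp;
}

/* A jump through a register simply makes the register value the new EIP. */
static uint32_t absolute_jump_target(uint32_t reg_value)
{
    return reg_value;
}
```

Say the jump is linked at 0xC0100030 and the target label at 0xC0100040, but we are really executing at 0x00100030. The relative jump lands at 0x00100040, still in the lower half. The absolute jump lands at 0xC0100040, which is what we want.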

There we go! The shenanigans are over and our kernel can now run as though it were loaded at 0xC0100000. Since we don’t need the identity mapping in the lower half of memory anymore, we can clean up after ourselves:

; zero-out the first entry in the page directory
mov dword [BootPageDirectory], 0
; tell the CPU the first entry has changed
invlpg [0]
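The invlpg is there because the CPU caches translations in the TLB, and clearing the directory entry in memory does not by itself drop the cached copy. Below is a toy C model of that behaviour, with one cached slot per directory entry. Real TLBs are far more complicated and are not guaranteed to keep a stale entry around, so treat this purely as an illustration; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PDE_PRESENT 0x001u

static uint32_t page_directory[1024];

/* Toy TLB: remembers the last entry seen for each directory slot. */
static uint32_t tlb_entry[1024];
static bool     tlb_valid[1024];

/* Returns false where the real CPU would raise a page fault. */
static bool translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t idx = vaddr >> 22;
    if (!tlb_valid[idx]) {
        uint32_t pde = page_directory[idx];
        if (!(pde & PDE_PRESENT))
            return false;
        tlb_entry[idx] = pde; /* cache the translation */
        tlb_valid[idx] = true;
    }
    *paddr = (tlb_entry[idx] & 0xFFC00000u) | (vaddr & 0x003FFFFFu);
    return true;
}

/* Model of invlpg: drop the cached translation for one page. */
static void model_invlpg(uint32_t vaddr)
{
    tlb_valid[vaddr >> 22] = false;
}
```

In this model, if you zero page_directory[0] after the low mapping has been used, translate(0x00100000, ...) keeps succeeding from the cached entry until model_invlpg(0) is called. After that it fails, while 0xC0100000 still resolves to physical 0x00100000.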

That’s it. Our kernel is still sitting at physical location 0x00100000, but it can run as though it were sitting at location 0xC0100000. And now we can load user programs at any old physical address that’s free, and map virtual address 0 to that physical address. The user program will never know it’s not at address 0. We’re so sneaky.


cstack

Writing codez @Square. Previously @Twitter. Graduated from University of Michigan. My heart is as big as a car.