Lecture: Virtual memory

preparation

read the xv6 book: §3, Page tables

administrivia

lab due on Sunday (instead of Friday)
update on make grade - see Canvas discussion
how to quit qemu: Ctrl-a and then (without holding Ctrl) x
challenge problems: no bonus points, though they might give you some ideas for the project
project: lab mmap/net or your own project; feel free to talk to us about the scope; start early (forming groups, etc.)

overview

virtual memory
- popular in modern OSes for isolation (and more)
- example: two user processes write to the same virtual address (e.g., pointers) 0x1000
  - the two writes go to different physical addresses 0x80001000 and 0x80002000
  - isolation is achieved through naming: one process cannot name a memory address privately owned by another process
- today’s focus: how is this achieved?
workflow
- both VA (virtual address) and PA (physical address) spaces divided into fixed-size chunks (e.g., 4KB) called pages
- hardware (CPU): perform VA → PA translation based on a data structure called page table
- software (OS): set up page tables for translation
  - isolate user processes from each other: per-process page tables
  - isolate kernel memory from user: permissions on page table (post-Meltdown OSes usually switch the kernel page table - more next week)
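
sketch (not xv6 code): a toy single-level, per-process table that reproduces the 0x1000 example above. the names toy_pagetable and toy_translate and the 16-page address space are made up for illustration; real hardware uses the multi-level format covered later.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* 4KB pages, as in the notes */
#define TOY_NPAGES 16u    /* hypothetical: tiny address space for illustration */

/* Toy per-process page table: index = virtual page number,
   value = physical page base address, 0 = no mapping. */
struct toy_pagetable {
  uint64_t page_base[TOY_NPAGES];
};

/* Returns 0 and stores the PA on success; returns -1 to model a page fault. */
int toy_translate(const struct toy_pagetable *pt, uint64_t va, uint64_t *pa)
{
  uint64_t vpn = va / PAGE_SIZE;     /* which page */
  uint64_t offset = va % PAGE_SIZE;  /* untranslated offset within the page */

  if (vpn >= TOY_NPAGES || pt->page_base[vpn] == 0)
    return -1;
  *pa = pt->page_base[vpn] + offset;
  return 0;
}
```

usage: give process A a table with page_base[1] = 0x80001000 and process B one with page_base[1] = 0x80002000; the same VA 0x1000 then resolves to a different PA in each, and neither table can name the other's page.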

hardware support

MMU (memory management unit)
- draw figure: CPU (MMU), memory
  - given VA (e.g., load/store instructions), consult page table to translate to PA
  - translation failure (due to no mapping or permission): raise an exception (page fault)
- how to make VA → PA lookup faster
  - cache: TLB (translation lookaside buffer)
  - increase page size
- notes
  - segmentation vs. paging
  - why do you get “Segmentation fault” when running the following C program (e.g., on Linux)? how?

int main(void)
{
  *(volatile int *)0 = 0;
  return 0;
}

registers (RISC-V)
- satp (Supervisor Address Translation and Protection) register holds the (physical) address of the page-table root
- stval (Supervisor Trap Value) register holds the page-fault (virtual) address
- other architectures have similar registers (e.g., %cr3 and %cr2 on x86)
instructions
- use csrw to set satp
- use sfence.vma to order changes to page table

page table

satp: Figure 4.12 of the RISC-V privileged spec
- MODE: xv6 uses 8 (sv39, three-level page table), see Table 4.3
- ASID: ignore (set to 0)
- PPN: physical page number of the page-table root
- example: #define MAKE_SATP(pagetable) ((8L << 60) | (((uint64)pagetable) >> 12)) in xv6’s kernel/riscv.h
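
sketch: the same encoding as MAKE_SATP, written as plain functions so the three fields are visible. the function names are made up here; the field positions (MODE in bits 63:60, ASID in 59:44, PPN in 43:0) are the RV64 satp layout.

```c
#include <stdint.h>

#define SATP_MODE_SV39 8ULL

/* Pack MODE, ASID, and the page-table root's physical address into satp. */
uint64_t make_satp(uint64_t mode, uint64_t asid, uint64_t root_pa)
{
  uint64_t ppn = (root_pa >> 12) & ((1ULL << 44) - 1);  /* drop the page offset */
  return ((mode & 0xFULL) << 60) | ((asid & 0xFFFFULL) << 44) | ppn;
}

/* Recover the MODE field. */
uint64_t satp_mode(uint64_t satp)
{
  return satp >> 60;
}

/* Recover the physical address of the page-table root. */
uint64_t satp_root_pa(uint64_t satp)
{
  return (satp & ((1ULL << 44) - 1)) << 12;
}
```

usage: make_satp(SATP_MODE_SV39, 0, 0x87fff000) yields 0x8000000000087fff, which is what MAKE_SATP computes for a root page at 0x87fff000.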
page table (sv39): Figure 3.2 of the xv6 book
- three-level page table
- 64-bit VA (→ 56-bit PA)
  - top 25 VA bits unused
  - next 27 VA bits used to index into page table (9 bits for each level)
  - bottom 12 bits untranslated and copied to PA
- each level (one page) contains 512 page-table entries (PTEs)
  - PTE = 44-bit PPN (for the next level) + 10-bit permission flags
  - V/R/W/X/U: valid/read/write/execute/user
  - A/D: accessed/dirty
- see PTE* macros in xv6’s kernel/riscv.h
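
sketch: the bit manipulation behind the layout above, modeled on (but not copied from) the macros in kernel/riscv.h. the function names are made up; it assumes the reserved upper PTE bits are zero.

```c
#include <stdint.h>

/* Low flag bits of an sv39 PTE. */
#define PTE_V (1ULL << 0)  /* valid */
#define PTE_R (1ULL << 1)  /* readable */
#define PTE_W (1ULL << 2)  /* writable */
#define PTE_X (1ULL << 3)  /* executable */
#define PTE_U (1ULL << 4)  /* user-accessible */

/* Build a PTE: PPN goes in bits 53:10, flags in bits 9:0. */
uint64_t pa_to_pte(uint64_t pa, uint64_t flags)
{
  return ((pa >> 12) << 10) | (flags & 0x3FFULL);
}

/* Physical address of the page (or next-level table) a PTE points to. */
uint64_t pte_to_pa(uint64_t pte)
{
  return (pte >> 10) << 12;
}

/* The 10 flag bits of a PTE. */
uint64_t pte_flags(uint64_t pte)
{
  return pte & 0x3FFULL;
}

/* 9-bit index into the page table at the given level (2 = root, 0 = leaf). */
unsigned va_index(uint64_t va, int level)
{
  return (unsigned)((va >> (12 + 9 * level)) & 0x1FFULL);
}
```

usage: va_index(0x10000000, 2), va_index(0x10000000, 1), and va_index(0x10000000, 0) give 0, 128, and 0.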
questions
- why not just one-level page table?
  - 2^27 * 8 = 1GB per page table (per process);
  - multiple levels efficiently encode sparse address space
- can a page table have more levels?
  - see RISC-V’s sv48 in the privileged spec
  - x86’s 4-level page table
  - x86’s 5-level page table
- it makes sense for user processes to use virtual addresses, but why does the kernel also use virtual addresses?
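
sketch for the first question above: compare the size of a flat table with the table pages a three-level layout needs for a small contiguous mapping. the function names are made up, and the count assumes only 4KB leaf mappings (no superpages).

```c
#include <stdint.h>

#define PTE_BYTES 8ULL
#define TABLE_PAGE_BYTES 4096ULL

/* One-level table: one 8-byte PTE for each of the 2^27 virtual pages. */
uint64_t flat_table_bytes(void)
{
  return (1ULL << 27) * PTE_BYTES;
}

/* Bytes of page-table pages needed to map npages (>= 1) contiguous pages
   starting at virtual page number first_vpn in a three-level sv39 table. */
uint64_t sv39_table_bytes(uint64_t first_vpn, uint64_t npages)
{
  uint64_t last_vpn = first_vpn + npages - 1;
  /* each level-0 table covers 512 pages; each level-1 table covers 512*512 */
  uint64_t l0_tables = (last_vpn >> 9) - (first_vpn >> 9) + 1;
  uint64_t l1_tables = (last_vpn >> 18) - (first_vpn >> 18) + 1;
  return (1 + l1_tables + l0_tables) * TABLE_PAGE_BYTES;  /* 1 = root */
}
```

usage: flat_table_bytes() is 1GB regardless of how much is mapped, while sv39_table_bytes(0, 512) is 12KB (three table pages).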

xv6 memory management

implement vmprint to print page table
- see the first part of lab lazy
- use it to show kernel and process address spaces
example
- insert a call to vmprint in kvminit(), right after the first kvmmap
- use it to translate VA 0x1000_0000 - what’s the resulting PA?

page table 0x0000000087fff000
 ..0: pte 0x0000000021fff801 pa 0x0000000087ffe000
 .. ..128: pte 0x0000000021fff401 pa 0x0000000087ffd000
 .. .. ..0: pte 0x0000000004000007 pa 0x0000000010000000
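
sketch: a software version of the walk the MMU performs, run over a simulated memory instead of real physical pages. sim_page, sim_lookup, and sim_walk are made-up names; the walk assumes leaf PTEs appear only at level 0 (no superpages) and checks only the V bit.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_V 1ULL

/* One simulated page-table page: its physical address and 512 PTEs. */
struct sim_page {
  uint64_t pa;
  uint64_t pte[512];
};

/* Find the simulated table page located at physical address pa. */
static const uint64_t *sim_lookup(const struct sim_page *pages, size_t n,
                                  uint64_t pa)
{
  for (size_t i = 0; i < n; i++)
    if (pages[i].pa == pa)
      return pages[i].pte;
  return NULL;
}

/* Translate va starting from the table at root_pa.
   Returns 0 and stores the PA, or -1 to model a page fault. */
int sim_walk(const struct sim_page *pages, size_t n, uint64_t root_pa,
             uint64_t va, uint64_t *pa_out)
{
  uint64_t next_pa = root_pa;

  for (int level = 2; level >= 0; level--) {
    const uint64_t *table = sim_lookup(pages, n, next_pa);
    if (table == NULL)
      return -1;
    uint64_t pte = table[(va >> (12 + 9 * level)) & 0x1FF];
    if ((pte & PTE_V) == 0)
      return -1;                  /* no mapping */
    next_pa = (pte >> 10) << 12;  /* next-level table, or the final page */
  }
  *pa_out = next_pa | (va & 0xFFF);  /* copy the 12-bit page offset */
  return 0;
}
```

usage: fill three sim_page entries with the PTEs printed above (index 0 of the root at 0x87fff000, index 128 of the table at 0x87ffe000, index 0 of the table at 0x87ffd000) and walk VA 0x10000000 from the root.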

kernel address space: Figure 3.3 of the xv6 book
- mostly identity mapping
- except for the trampoline page and the kernel stack - why not just identity mapping?
- inject a page fault in kernel/main.c
- can be quite different on other architectures/OS kernels - see Linux/x86-64
process address space: Figure 3.4 of the xv6 book
- sbrk for growing the heap: sys_sbrk, growproc, uvmalloc, mappages
- PTE_U set for user pages
- share the trampoline page with the kernel
- inject a page fault in user/init.c
- why not just map the entire kernel into the process address space?
notes
- superpages (huge pages)
- TLB shootdown in multiprocessors
- two-dimensional paging (second-level address translation/nested paging/extended page table)

applications

protect against stack overflow
- see Michael Barr’s Bookout v. Toyota, “Toyota’s major stack mistakes”
- trick: put a non-mapped, guard page right below user stack
- xv6: kernel/memlayout.h
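
sketch: the guard-page trick reproduced in user space, assuming a POSIX system such as Linux. this is not how xv6 sets up its stacks; guard_page_probe is a made-up name. a child process writes one byte below a two-page region's upper page, and the parent reports which signal killed it.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that killed the child, 0 if it exited normally,
   or -1 if the setup failed. */
int guard_page_probe(void)
{
  long psz = sysconf(_SC_PAGESIZE);
  if (psz <= 0)
    return -1;
  size_t page = (size_t)psz;

  /* Two pages: the lower one becomes the guard, the upper one the "stack". */
  char *region = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (region == MAP_FAILED)
    return -1;
  if (mprotect(region, page, PROT_NONE) != 0) {
    munmap(region, 2 * page);
    return -1;
  }

  pid_t pid = fork();
  if (pid < 0) {
    munmap(region, 2 * page);
    return -1;
  }
  if (pid == 0) {
    region[page] = 1;                           /* lowest stack byte: fine */
    *(volatile char *)(region + page - 1) = 1;  /* overflow into the guard */
    _exit(0);                                   /* not reached if it faults */
  }

  int status = 0;
  if (waitpid(pid, &status, 0) < 0) {
    munmap(region, 2 * page);
    return -1;
  }
  munmap(region, 2 * page);
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

on Linux the child is expected to die with SIGSEGV; some other systems report SIGBUS for a protection violation.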
implement null pointer dereference exception
- how would you implement this for Java, say obj->field
- trick: put a non-mapped page at VA zero
  - useful for catching program bugs
  - limitations?
limited physical memory
- applications need more memory than physical memory
  - early days: two floppy drives
  - strawman: applications store part of state to disk and load back later
  - hard to write applications
- virtual memory: offer the illusion of a large, contiguous memory
  - swap space: OS pages out some pages to disk transparently
  - distributed shared memory: access other machines’ memory across network
grow heap on demand (lab lazy)
- sbrk currently allocates memory upon invocation
- allocate memory lazily (upon page fault)
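
sketch: a user-space model of the lazy scheme, not the lab solution. toy_proc, toy_lazy_sbrk, and toy_page_fault are made-up names. sbrk only records the new size; a page is allocated the first time a fault lands inside the recorded range.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_PGSIZE 4096u
#define TOY_MAXPAGES 64u  /* hypothetical limit for the model */

struct toy_proc {
  uint64_t sz;                        /* process size in bytes */
  unsigned char *page[TOY_MAXPAGES];  /* NULL = not yet allocated */
};

/* Lazy sbrk: grow the size but allocate nothing.
   Returns the old size, or UINT64_MAX if the request does not fit. */
uint64_t toy_lazy_sbrk(struct toy_proc *p, uint64_t n)
{
  uint64_t old = p->sz;
  if (n > (uint64_t)TOY_MAXPAGES * TOY_PGSIZE - old)
    return UINT64_MAX;
  p->sz = old + n;
  return old;
}

/* Page-fault handler: allocate a zeroed page if va is below the size.
   Returns 0 if the fault was resolved, -1 if the access is invalid. */
int toy_page_fault(struct toy_proc *p, uint64_t va)
{
  if (va >= p->sz)
    return -1;  /* beyond what sbrk granted */
  uint64_t idx = va / TOY_PGSIZE;
  if (p->page[idx] == NULL) {
    p->page[idx] = calloc(1, TOY_PGSIZE);
    if (p->page[idx] == NULL)
      return -1;  /* out of memory */
  }
  return 0;
}

/* Number of pages actually backed by memory. */
unsigned toy_resident_pages(const struct toy_proc *p)
{
  unsigned n = 0;
  for (unsigned i = 0; i < TOY_MAXPAGES; i++)
    if (p->page[i] != NULL)
      n++;
  return n;
}
```

usage: after toy_lazy_sbrk(&p, 10 * 4096) no page is resident; one fault at an address inside the range makes exactly one page resident.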
copy-on-write fork (lab cow)
- strawman fork: copy all pages from parent to child
- observation: child and parent share most of the data
  - mark pages as copy-on-write
  - make a copy on page fault
- other sharing
  - multiple guest OSes running inside the same hypervisor
  - shared objects: .so/.dll files
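
sketch: copy-on-write modeled in user space with reference-counted frames, not the lab solution. toy_frame, toy_mapping, toy_cow_share, and toy_store are made-up names. toy_cow_share stands in for fork and toy_store for a store instruction plus its page-fault handler.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE 4096

struct toy_frame {
  int refcount;  /* number of mappings sharing this frame */
  unsigned char data[TOY_PAGE];
};

struct toy_mapping {
  struct toy_frame *frame;
  int writable;  /* models PTE_W */
  int cow;       /* models a software "copy-on-write" PTE bit */
};

/* "fork": the child shares the parent's frame; both lose write permission. */
void toy_cow_share(struct toy_mapping *parent, struct toy_mapping *child)
{
  parent->writable = 0;
  parent->cow = 1;
  *child = *parent;
  parent->frame->refcount++;
}

/* Store one byte. A store to a read-only COW mapping "faults" and is
   resolved by copying the frame if it is still shared.
   Returns 0 on success, -1 on an unresolvable fault. */
int toy_store(struct toy_mapping *m, size_t off, unsigned char value)
{
  if (off >= TOY_PAGE)
    return -1;
  if (!m->writable) {
    if (!m->cow)
      return -1;  /* genuinely read-only */
    if (m->frame->refcount > 1) {
      struct toy_frame *copy = malloc(sizeof *copy);
      if (copy == NULL)
        return -1;
      memcpy(copy->data, m->frame->data, TOY_PAGE);
      copy->refcount = 1;
      m->frame->refcount--;
      m->frame = copy;
    }
    /* sole owner now: restore write permission */
    m->writable = 1;
    m->cow = 0;
  }
  m->frame->data[off] = value;
  return 0;
}
```

design note: the last owner of a frame skips the copy and just regains write permission, which is why a reference count per physical page is needed.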
memory-mapped files (lab mmap)
- mmap(): map files, read/write files like memory
- simple programming interface
- when to page-in/page-out content?
- avoid data copying: send an mmaped file to network
  - compare to using read/write
  - no data transfer from kernel to user
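
sketch: reading a file through mmap() on a POSIX system instead of read(). mmap_count_newlines is a made-up helper; the page-in happens implicitly when the loop first touches each page.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count '\n' bytes in a file by mapping it and scanning it like an array.
   Returns the count, or -1 on error. */
long mmap_count_newlines(const char *path)
{
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct stat st;
  if (fstat(fd, &st) != 0) {
    close(fd);
    return -1;
  }
  if (st.st_size == 0) {  /* mmap rejects zero-length mappings */
    close(fd);
    return 0;
  }

  size_t len = (size_t)st.st_size;
  const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  /* the mapping remains valid after the descriptor is closed */
  if (data == MAP_FAILED)
    return -1;

  long count = 0;
  for (size_t i = 0; i < len; i++)  /* no read() calls, no user buffer */
    if (data[i] == '\n')
      count++;

  munmap((void *)data, len);
  return count;
}
```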

lab alloc

physical memory management (kernel/kalloc.c): a free list of physical pages, by embedding a struct run into each free page
slab allocation: feel free to design your own data structures for allocating small objects
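
sketch: the free-list idea from kernel/kalloc.c, modeled in user space over a small static arena. there is no locking and the toy_ names are made up; only struct run follows the notes. each free page stores the list link in its own first bytes, so the list costs no extra memory.

```c
#include <stddef.h>

#define PGSIZE 4096
#define TOY_NPAGES 8  /* hypothetical arena size */

/* A free page is reinterpreted as a list node. */
struct run {
  struct run *next;
};

/* A page is either a list node (when free) or raw bytes (when allocated). */
union toy_page {
  struct run run;
  unsigned char bytes[PGSIZE];
};

static union toy_page toy_arena[TOY_NPAGES];
static struct run *toy_freelist;

/* Return a page to the allocator: push it on the front of the free list. */
void toy_kfree(void *page)
{
  struct run *r = page;
  r->next = toy_freelist;
  toy_freelist = r;
}

/* Take one page off the free list; NULL if none are left. */
void *toy_kalloc(void)
{
  struct run *r = toy_freelist;
  if (r != NULL)
    toy_freelist = r->next;
  return r;
}

/* Start with every arena page on the free list. */
void toy_kinit(void)
{
  toy_freelist = NULL;
  for (size_t i = 0; i < TOY_NPAGES; i++)
    toy_kfree(&toy_arena[i]);
}
```

usage: after toy_kinit(), TOY_NPAGES calls to toy_kalloc() succeed and the next one returns NULL; freeing a page makes it the next page handed out, since the list is LIFO.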