Lecture: Virtual memory
preparation
- read the xv6 book: §2, Page tables
administrivia
- lab due on Sunday (instead of Friday)
- update on make grade
- see Canvas discussion
- how to quit qemu: Ctrl-a and then (without holding Ctrl) x
- challenge problems: no bonus points, though they might give you some ideas for the project
- project: lab mmap/net or your own project; feel free to talk to us about the scope; start early (forming groups, etc.)
overview
- virtual memory
- popular in modern OSes for isolation (and more)
- example: two user processes write to the same virtual address (e.g., through pointers) 0x1000
- the two writes go to different physical addresses 0x80001000 and 0x80002000
- isolation is achieved through naming: one process cannot name a memory address privately owned by another process
- today’s focus: how is this achieved?
- workflow
- both VA (virtual address) and PA (physical address) spaces are divided into fixed-size chunks (e.g., 4KB) called pages
- hardware (CPU): perform VA → PA translation based on a data structure called page table
- software (OS): set up page tables for translation
- isolate user processes from each other: per-process page tables
- isolate kernel memory from user: permissions on page table
(post-Meltdown OSes usually switch the kernel page table - more next week)
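- sketch (not from the lecture): the two-process example above, modeled with a toy one-level table per process. The names toy_pagetable, toy_map, and toy_translate and the 16-page address space are assumptions made for illustration; real hardware walks a multi-level table.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE  4096u   /* 4KB pages, as in the notes */
#define TOY_NPAGES 16u     /* toy VA space of 16 pages (assumption) */

/* Hypothetical per-process table: index = virtual page number,
 * value = physical page base address, 0 = not mapped. */
struct toy_pagetable {
    uint64_t page_pa[TOY_NPAGES];
};

/* Map the page containing va to the physical page starting at pa. */
void toy_map(struct toy_pagetable *pt, uint64_t va, uint64_t pa)
{
    assert(va / PAGE_SIZE < TOY_NPAGES);
    assert(pa != 0 && pa % PAGE_SIZE == 0);
    pt->page_pa[va / PAGE_SIZE] = pa;
}

/* Translate va. Returns 0 on success, -1 to stand in for a page fault. */
int toy_translate(const struct toy_pagetable *pt, uint64_t va, uint64_t *pa)
{
    uint64_t vpn = va / PAGE_SIZE;

    if (vpn >= TOY_NPAGES || pt->page_pa[vpn] == 0)
        return -1;
    /* The page offset is copied through untranslated. */
    *pa = pt->page_pa[vpn] + va % PAGE_SIZE;
    return 0;
}
```

- with one table per process, mapping VA 0x1000 to 0x80001000 in one table and to 0x80002000 in the other sends the same VA to different PAs; neither process has a name for the other's page.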
hardware support
- MMU (memory management unit)
- draw figure: CPU (MMU), memory
- given VA (e.g., load/store instructions), consult page table to translate to PA
- translation failure (due to no mapping or permission): raise an exception (page fault)
- how to make VA → PA lookup faster
- cache: TLB (translation lookaside buffer)
- increase page size
- notes
- segmentation vs. paging
- why do you get “Segmentation fault” when running the following C program (e.g., on Linux)? how?
- registers (RISC-V)
- satp (Supervisor Address Translation and Protection) register holds the (physical) address of the page-table root
- stval (Supervisor Trap Value) register holds the page-fault (virtual) address
- other architectures have similar registers (e.g., %cr3 and %cr2 on x86)
- instructions
- use csrw to set satp
- use sfence.vma to order changes to the page table (and flush stale TLB entries)
page table
- satp: Figure 4.12 of the RISC-V privileged spec
- MODE: xv6 uses 8 (sv39, three-level page table), see Table 4.3
- ASID: ignore (set to 0)
- PPN: physical page number of the page-table root
- example, in xv6’s kernel/riscv.h:
#define MAKE_SATP(pagetable) ((8L << 60) | (((uint64)pagetable) >> 12))
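- sketch: the satp fields above, packed and unpacked in plain C. The function names and the root address used in the note below are made up for illustration; the field positions (MODE in bits 63:60, ASID in bits 59:44, PPN in bits 43:0) follow the privileged spec.

```c
#include <assert.h>
#include <stdint.h>

#define SATP_MODE_SV39 8ULL

/* Build a satp value for a page-aligned root page-table address (ASID = 0). */
uint64_t make_satp(uint64_t root_pa)
{
    assert((root_pa & 0xfff) == 0);        /* root must be page aligned */
    return (SATP_MODE_SV39 << 60) | (root_pa >> 12);
}

/* Field accessors: MODE [63:60], ASID [59:44], PPN [43:0]. */
uint64_t satp_mode(uint64_t satp)    { return satp >> 60; }
uint64_t satp_asid(uint64_t satp)    { return (satp >> 44) & 0xffff; }
uint64_t satp_root_pa(uint64_t satp) { return (satp & ((1ULL << 44) - 1)) << 12; }
```

- for a hypothetical root at PA 0x87fff000, make_satp yields 0x8000000000087fff: mode 8 in the top nibble, PPN 0x87fff in the low bits. This sketch uses an unsigned constant for the shift; the xv6 macro writes 8L.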
- page table (sv39): Figure 3.2 of the xv6 book
- three-level page table
- 64-bit VA (→ 56-bit PA)
- top 25 VA bits unused
- next 27 VA bits used to index into page table (9 bits for each level)
- bottom 12 bits untranslated and copied to PA
- each level (one page) contains 512 page-table entries (PTEs)
- PTE = 44-bit PPN (for the next level) + 10-bit permission flags
- V/R/W/X/U: valid/read/write/execute/user
- A/D: accessed/dirty
- see PTE* macros in xv6’s kernel/riscv.h
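- sketch: the sv39 bit layout above as C helpers. The helper names are hypothetical (xv6 has its own macros for the same arithmetic); the flag bit positions follow the privileged spec.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V (1ULL << 0)   /* valid */
#define PTE_R (1ULL << 1)   /* readable */
#define PTE_W (1ULL << 2)   /* writable */
#define PTE_X (1ULL << 3)   /* executable */
#define PTE_U (1ULL << 4)   /* accessible from user mode */

/* 9-bit page-table index for a level: 2 = root, 0 = leaf. */
uint64_t va_index(uint64_t va, int level)
{
    assert(level >= 0 && level <= 2);
    return (va >> (12 + 9 * level)) & 0x1ff;
}

/* Bottom 12 bits: copied to the PA untranslated. */
uint64_t va_offset(uint64_t va) { return va & 0xfff; }

/* PTE layout: flags in bits 9:0, 44-bit PPN in bits 53:10. */
uint64_t pa_to_pte(uint64_t pa)  { return (pa >> 12) << 10; }
uint64_t pte_to_pa(uint64_t pte) { return ((pte >> 10) & ((1ULL << 44) - 1)) << 12; }
uint64_t pte_flags(uint64_t pte) { return pte & 0x3ff; }
```

- example: VA 0x10000000 splits into indices 0 (level 2), 128 (level 1), 0 (level 0) and offset 0.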
- questions
- why not just one-level page table?
- 2^27 PTEs * 8 bytes = 1GB per page table (per process)
- multiple levels efficiently encode sparse address space
- can a page table have more levels?
- it makes sense for user processes to use virtual addresses,
but why does the kernel also use virtual addresses?
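- sketch for the first question: the table-size arithmetic in C. The function names are made up, and the three-level count assumes the best case of npages contiguous pages starting at VA 0.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_BYTES 8ULL
#define PGSIZE    4096ULL

/* One-level table: a PTE for each of the 2^27 virtual pages, mapped or not. */
uint64_t flat_table_bytes(void)
{
    return (1ULL << 27) * PTE_BYTES;
}

/* Three-level table covering npages contiguous pages starting at VA 0:
 * one root page, plus the middle and leaf table pages actually needed. */
uint64_t sv39_table_bytes(uint64_t npages)
{
    assert(npages > 0 && npages <= (1ULL << 27));
    uint64_t leaves = (npages + 511) / 512;   /* each leaf table maps 512 pages */
    uint64_t mids   = (leaves + 511) / 512;   /* each middle table covers 512 leaves */
    return (1 + mids + leaves) * PGSIZE;
}
```

- a process with a single mapped page needs 3 table pages (12KB) instead of 1GB; a fully populated address space costs slightly more than the flat table, which is the price of the extra levels.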
xv6 memory management
- implement vmprint to print page table
- see the first part of lab lazy
- use it to show kernel and process address spaces
- example
- insert a call to vmprint in kvminit(), right after the first kvmmap
- use it to translate VA 0x1000_0000
- what’s the resulting PA?
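- sketch: a user-space model of the three-level walk, to check a translation by hand. This is not xv6 code. The names sv39_map and sv39_walk and the small pool of table pages are assumptions; table page i is treated as living at "physical address" i * 4096, so its PPN is simply i.

```c
#include <assert.h>
#include <stdint.h>

#define NTABLES 8            /* pool of page-table pages (assumption) */
#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)

typedef uint64_t pte_t;

static pte_t toy_tables[NTABLES][512];  /* table i sits at toy PA i * 4096 */
static int toy_used = 1;                /* table 0 is the root */

static uint64_t px(uint64_t va, int level)
{
    return (va >> (12 + 9 * level)) & 0x1ff;
}

/* Install va -> pa, allocating intermediate tables as needed.
 * flags should include PTE_R or PTE_W etc.; the toy does not check.
 * Returns 0 on success, -1 if the pool is exhausted. */
int sv39_map(uint64_t va, uint64_t pa, uint64_t flags)
{
    pte_t *table = toy_tables[0];

    for (int level = 2; level > 0; level--) {
        pte_t *pte = &table[px(va, level)];
        if (!(*pte & PTE_V)) {
            if (toy_used == NTABLES)
                return -1;
            /* Non-leaf PTE: PPN of the next table, valid bit only. */
            *pte = ((uint64_t)toy_used++ << 10) | PTE_V;
        }
        table = toy_tables[*pte >> 10];
    }
    table[px(va, 0)] = ((pa >> 12) << 10) | flags | PTE_V;
    return 0;
}

/* Walk the tables as the MMU would. Returns 0 and sets *pa,
 * or -1 where hardware would raise a page fault. */
int sv39_walk(uint64_t va, uint64_t *pa)
{
    pte_t *table = toy_tables[0];

    for (int level = 2; level > 0; level--) {
        pte_t pte = table[px(va, level)];
        if (!(pte & PTE_V))
            return -1;
        table = toy_tables[pte >> 10];
    }
    pte_t leaf = table[px(va, 0)];
    if (!(leaf & PTE_V))
        return -1;
    *pa = ((leaf >> 10) << 12) | (va & 0xfff);
    return 0;
}
```

- after sv39_map(0x10000000, 0x10000000, PTE_R | PTE_W), walking VA 0x10000004 follows indices 0, 128, 0 and returns PA 0x10000004; an unmapped VA such as 0x20000000 stops at the missing level-1 entry.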
- kernel address space: Figure 3.3 of the xv6 book
- mostly identity mapping
- except for the trampoline page and the kernel stack - why not just identity mapping?
- inject a page fault in kernel/main.c
- can be quite different on other architectures/OS kernels - see Linux/x86-64
- process address space: Figure 3.4 of the xv6 book
- sbrk for growing the heap: sys_sbrk, growproc, uvmalloc, mappages
- PTE_U set for user pages
- share the trampoline page with the kernel
- inject a page fault in user/init.c
- why not just map the entire kernel into the process address space?
- notes
- superpages (huge pages)
- TLB shootdown in multiprocessors
- two-dimensional paging (second-level address translation/nested paging/extended page table)
applications
- protect against stack overflow
- see Michael Barr’s Bookout v. Toyota, “Toyota’s major stack mistakes”
- trick: put a non-mapped, guard page right below user stack
- xv6: kernel/memlayout.h
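- sketch: the guard-page trick from user space on a POSIX system, using mmap and mprotect. This is an analogy, not how xv6 sets up its stacks (xv6 simply leaves the guard page unmapped in the page table); the function name alloc_with_guard is made up.

```c
#define _DEFAULT_SOURCE      /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `pages` usable pages plus one inaccessible guard page below them.
 * Returns the lowest usable address, or NULL on failure. */
void *alloc_with_guard(size_t pages)
{
    assert(pages > 0);
    size_t pgsize = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = (pages + 1) * pgsize;

    char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Revoke all access to the lowest page: any load or store there faults. */
    if (mprotect(base, pgsize, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + pgsize;
}
```

- a stack that grows down from the top of this region and overflows runs into the PROT_NONE page, so the overflow becomes an immediate fault instead of silently corrupting whatever lies below.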
- implement null pointer dereference exception
- how would you implement this for Java, say obj->field
- trick: put a non-mapped page at VA zero
- useful for catching program bugs
- limitations?
- limited physical memory
- applications need more memory than physical memory
- early days: two floppy drives
- strawman: applications store part of state to disk and load back later
- hard to write applications
- virtual memory: offer the illusion of a large, contiguous memory
- swap space: OS pages out some pages to disk transparently
- distributed shared memory: access other machines’ memory across network
- grow stack on demand (lab lazy)
- sbrk currently allocates memory upon invocation
- allocate memory lazily (upon page fault)
- copy-on-write fork (lab cow)
- strawman fork: copy all pages from parent to child
- observation: child and parent share most of the data
- mark pages as copy-on-write
- make a copy on page fault
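- sketch: copy-on-write bookkeeping in a toy model with one page per process, tiny pages, and a reference count per page. All names and sizes are assumptions for illustration; a real kernel keeps the read-only and COW state in PTE bits and does the copy in its page-fault handler.

```c
#include <assert.h>
#include <string.h>

#define TOY_PGSIZE 64
#define TOY_NPAGES 4

unsigned char toy_page[TOY_NPAGES][TOY_PGSIZE];
int toy_ref[TOY_NPAGES];                 /* 0 = free */

/* One mapping per "process": which page it names and how it may use it. */
struct toy_mapping { int page; int writable; int cow; };

/* Grab a free, zeroed page; returns its index or -1 if none is left. */
int toy_alloc(void)
{
    for (int i = 0; i < TOY_NPAGES; i++) {
        if (toy_ref[i] == 0) {
            toy_ref[i] = 1;
            memset(toy_page[i], 0, TOY_PGSIZE);
            return i;
        }
    }
    return -1;
}

struct toy_mapping cow_new_page(void)
{
    struct toy_mapping m = { toy_alloc(), 1, 0 };
    return m;
}

/* fork: the child shares the parent's page; a writable page becomes
 * read-only and copy-on-write in both. */
struct toy_mapping cow_fork(struct toy_mapping *parent)
{
    if (parent->writable) {
        parent->writable = 0;
        parent->cow = 1;
    }
    toy_ref[parent->page]++;
    return *parent;
}

/* Store one byte; models store -> page fault -> copy -> retry.
 * Returns 0 on success, -1 on a real protection fault or out of memory. */
int cow_write(struct toy_mapping *m, int off, unsigned char val)
{
    assert(off >= 0 && off < TOY_PGSIZE);
    if (!m->writable) {
        if (!m->cow)
            return -1;
        if (toy_ref[m->page] > 1) {      /* still shared: copy first */
            int fresh = toy_alloc();
            if (fresh < 0)
                return -1;
            memcpy(toy_page[fresh], toy_page[m->page], TOY_PGSIZE);
            toy_ref[m->page]--;
            m->page = fresh;
        }
        m->writable = 1;                 /* sole owner now */
        m->cow = 0;
    }
    toy_page[m->page][off] = val;
    return 0;
}
```

- after a fork, parent and child name the same page until one of them writes; the writer gets a private copy and the other side's data stays intact. The last owner of a COW page skips the copy and just regains write permission.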
- other sharing
- multiple guest OSes running inside the same hypervisor
- shared objects: .so/.dll files
- memory-mapped files (lab mmap)
- mmap(): map files, read/write files like memory
- simple programming interface
- when to page-in/page-out content?
- avoid data copying: send an mmapped file to the network
- compare to using read/write
- no data transfer from kernel to user
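- sketch: reading a file through mmap() on a POSIX system instead of read(). The function name count_byte_mmap is made up; the calls are standard POSIX.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the bytes equal to c in the file at path by mapping it into memory.
 * Returns the count, or -1 on error. */
long count_byte_mmap(const char *path, char c)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {       /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    const char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long n = 0;
    for (off_t i = 0; i < st.st_size; i++)   /* plain loads; pages fault in on demand */
        if (p[i] == c)
            n++;

    munmap((void *)p, (size_t)st.st_size);
    return n;
}
```

- there is no user buffer and no read() loop: the file's pages are mapped into the address space and the kernel decides when to page them in.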
lab alloc
- physical memory management (kernel/kalloc.c): a free list of physical pages, by embedding a struct run into each free page
- slab allocation: feel free to design your own data structures for allocating small objects
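- sketch: the embedded free list in user-space C, modeled loosely on the idea in kernel/kalloc.c. The toy_ names and the 4-page static pool are assumptions, and locking is omitted; this is not the xv6 source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PGSIZE     4096
#define TOY_NPAGES 4

/* A free page stores the list link in its own first bytes,
 * so the free list needs no memory of its own. */
struct run {
    struct run *next;
};

union toy_physpage {
    struct run run;
    unsigned char bytes[PGSIZE];
};

static union toy_physpage toy_mem[TOY_NPAGES];  /* stand-in for physical memory */
static struct run *freelist;

/* Return a page to the allocator by pushing it on the free list. */
void toy_kfree(void *pa)
{
    struct run *r = pa;

    memset(pa, 1, PGSIZE);       /* fill with junk to expose dangling references */
    r->next = freelist;
    freelist = r;
}

/* Pop one page off the free list; returns NULL when memory is exhausted. */
void *toy_kalloc(void)
{
    struct run *r = freelist;

    if (r) {
        freelist = r->next;
        memset(r, 5, PGSIZE);    /* junk again, so callers cannot rely on zeroes */
    }
    return r;
}

/* Put every page of the pool on the free list. */
void toy_kinit(void)
{
    freelist = NULL;
    for (int i = 0; i < TOY_NPAGES; i++)
        toy_kfree(&toy_mem[i]);
}
```

- allocation and free are both O(1) pushes and pops, and the most recently freed page is handed out first. A slab allocator for small objects can apply the same trick at finer granularity, carving each page into equal-sized slots with the link embedded in each free slot.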