Lecture: Address Translation & Paging
Physical Memory
  - byte addressable (can refer to each byte in memory), limited size

  - ~200 cycles access latency (for reference, common instructions complete within ~7 cycles)

  - a process's code and data need to be in memory to execute
 
Physical Memory Management
  - another resource allocation problem: limited physical memory, multiple processes
  
  - so how do we allocate memory?
  
    - simple case: one process at a time
    
      - give the entire physical memory to the process
 
      - no translation needed, process's address = physical address
 
      - pros vs cons?
 
    
     - actual case: multiple processes
    
      - attempt 1:
 
      
        - if we know how much memory a process needs, we can just put each process's memory into disjoint sections of the physical memory

        - do we need address translation now? how do we support fork?

          - virtual memory: every process has its own view of memory

          - virtual address vs physical address

          - hw support: base and bound registers

        - pros vs cons? what are the limitations? (see the base-and-bound sketch below)
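        - a minimal sketch (not from the lecture) of base-and-bound translation in C; the struct and field names are made up for illustration:

            #include <stdint.h>
            #include <stdbool.h>

            // hypothetical per-process relocation registers, loaded by the kernel on context switch
            struct base_bound {
                uint64_t base;   // start of the process's region in physical memory
                uint64_t bound;  // size of the region in bytes
            };

            // translate a process address to a physical address,
            // or report a fault if it falls outside the process's region
            static bool translate(const struct base_bound *bb, uint64_t vaddr, uint64_t *paddr) {
                if (vaddr >= bb->bound)
                    return false;            // out of bounds: raise a protection fault
                *paddr = bb->base + vaddr;   // in bounds: relocate by the base
                return true;
            }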
 
      
      - attempt 2:
 
      
        - do programs need all of their memory at once?

        - how can we make more efficient use of physical memory?

        - paging: divide a process's memory into fixed-size chunks, only keep the ones currently needed in physical memory
 
      
    
   
 
Paging
  - divide a process's memory into fixed-size pages (typically 4KB; see the address-split sketch below)

  - only keep the pages we currently need in memory (what might those be?)

  - dynamically load other pages into memory as needed
 
  
    - access to a page not in memory causes a page fault
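  - sketch (not from the lecture): with 4KB pages, a virtual address splits into a virtual page number and an offset within the page; the names below are made up for illustration:

      #include <stdint.h>

      #define PAGE_SHIFT 12                      // 4KB pages: 2^12 bytes
      #define PAGE_SIZE  (1UL << PAGE_SHIFT)
      #define PAGE_MASK  (PAGE_SIZE - 1)

      // virtual page number: which page the address falls in
      static inline uint64_t vpn(uint64_t vaddr)    { return vaddr >> PAGE_SHIFT; }

      // offset within the page: unchanged by translation
      static inline uint64_t offset(uint64_t vaddr) { return vaddr & PAGE_MASK; }

      // once the page's frame is known, the physical address is frame number + offset
      static inline uint64_t phys_addr(uint64_t frame, uint64_t off) {
          return (frame << PAGE_SHIFT) | off;
      }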
  
 
Address Translation With Paging
  - how would we implement this purely in software?
    
      - divide up physical memory into page-sized chunks

        - each chunk of physical memory is called a frame, page frame, or physical page
  
      - track which page is mapped to which frame (physical memory)
      
        - is this information per process or per entire system? 
        
        - what data structure can we use to store this info?

        - what's the cost for accessing the data structure?

        - where do we store the data structure?

        - how many of these translation mappings would we need to store?
      
 
      - on every memory access, transfer control to the kernel and ask it to perform the address translation
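      - one possible answer (a sketch, not the lecture's design): a flat per-process array indexed by virtual page number; simple, but it wastes space for sparse address spaces and, done purely in software, it would cost a trap on every memory access:

          #include <stdint.h>
          #include <stdbool.h>

          // hypothetical flat mapping for a small address space (4KB pages)
          #define NPAGES     (1UL << 20)
          #define NOT_MAPPED UINT64_MAX

          struct proc_mappings {
              uint64_t frame_of[NPAGES];     // vpn -> frame number, or NOT_MAPPED
          };

          // the kernel would have to run something like this on every load/store
          static bool sw_translate(struct proc_mappings *m, uint64_t vaddr, uint64_t *paddr) {
              uint64_t vpn = vaddr >> 12;
              if (vpn >= NPAGES || m->frame_of[vpn] == NOT_MAPPED)
                  return false;                              // no mapping: page fault
              *paddr = (m->frame_of[vpn] << 12) | (vaddr & 0xfff);
              return true;
          }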
 
    
   - how is it actually done?
  
    - page table
 
     
      - data structure for storing page to frame mappings
      
       - single level
 
      
      - multilevel
 
      
      - indirection can help with space saving (when does it not?) 
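       - sketch (not from the lecture) of why indirection can save space: in a two-level table, second-level tables are only allocated for parts of the address space actually in use, so a sparse address space costs little more than a top level full of NULLs (the saving disappears when the address space is densely used); the layout below is a simplified 32-bit-style split, not x86-64's:

           #include <stdbool.h>
           #include <stdint.h>
           #include <stdlib.h>

           #define L1_ENTRIES 1024
           #define L2_ENTRIES 1024

           struct l2_table { uint64_t frame[L2_ENTRIES]; };        // leaf: vpn -> frame number
           struct l1_table { struct l2_table *l2[L1_ENTRIES]; };   // top level: mostly NULL

           // map one 4KB page, allocating the second-level table lazily
           static bool map_page(struct l1_table *pt, uint64_t vaddr, uint64_t frame) {
               uint64_t i1 = (vaddr >> 22) & 0x3ff;   // top 10 bits of the vpn
               uint64_t i2 = (vaddr >> 12) & 0x3ff;   // low 10 bits of the vpn
               if (pt->l2[i1] == NULL) {
                   pt->l2[i1] = calloc(1, sizeof(struct l2_table));
                   if (pt->l2[i1] == NULL)
                       return false;                  // out of memory
               }
               pt->l2[i1]->frame[i2] = frame;
               return true;
           }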
 
    
    - how often do we need to perform address translation?
 
    - how can we speed it up?
 
    
      - cache the translation lookup! Translation Lookaside Buffer (TLB)
      
        - upon a memory access, the hardware checks if the translation for the page is cached in the TLB
 
        - if not, walk the page table to find the corresponding frame, and add that to the TLB
 
        
      
       - have hardware perform the translation lookup (page table walk); see the lookup sketch below
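       - a tiny software model (not the lecture's code) of the lookup order: check the TLB first, fall back to a page table walk on a miss and cache the result; walk_page_table is a hypothetical stand-in for whatever table structure the kernel set up:

           #include <stdbool.h>
           #include <stdint.h>

           #define TLB_ENTRIES 16

           struct tlb_entry { bool valid; uint64_t vpn, frame; };
           static struct tlb_entry tlb[TLB_ENTRIES];

           // stand-in for the real walk over the page table structure
           extern bool walk_page_table(uint64_t vpn, uint64_t *frame);

           static bool lookup(uint64_t vpn, uint64_t *frame) {
               // 1. TLB hit: no extra memory accesses needed for translation
               for (int i = 0; i < TLB_ENTRIES; i++)
                   if (tlb[i].valid && tlb[i].vpn == vpn) { *frame = tlb[i].frame; return true; }

               // 2. TLB miss: walk the page table, then cache the translation
               if (!walk_page_table(vpn, frame))
                   return false;                                    // no valid mapping: page fault
               tlb[vpn % TLB_ENTRIES] = (struct tlb_entry){ true, vpn, *frame };  // trivial replacement policy
               return true;
           }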
    
 
  
 
x86-64 Address Translation
  - the architecture specification defines the format of the page table
x86-64 page table format
  - 4 level page table
  
    - PML4: Page Map Level 4, top level page table, each entry stores the address of a PDPT
 
    - PDPT: Page Directory Pointer Table, 2nd level page table, each entry stores the address of a PDT
 
    - PDT: Page Directory Table, 3rd level page table, each entry stores the address of a PT
 
    - PT: Page Table, last level page table, each entry stores the address of the mapped frame
 
  
  
   - each table is 4KB in size and each table entry is 8 bytes
 
  
    - 4096 (table size) / 8 (entry size) = 512 (entries)
 
    - each table is indexed with 9 bits of the virtual address
 
    - what does the 8 byte page table entry look like? 
    
    - page table entry (decoded in the sketch below):

        - bits 0-11 contain information about the page (bit 0: present, bit 1: writable, bit 2: user accessible)

        - bits 12-47 contain the physical page number of the frame

        - bits 48-63 are reserved or hold other permission info about the page (bit 63: execute disable / NX)
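        - sketch (constants mine, not from the lecture) of how the four 9-bit indices and the entry fields fall out of the layout above; it assumes the kernel can dereference physical addresses directly (e.g. via an identity mapping) and omits present-bit checks:

            #include <stdint.h>

            // indices into the four levels: bits 47-39, 38-30, 29-21, 20-12 of the virtual address
            #define PML4_INDEX(va) (((va) >> 39) & 0x1ff)
            #define PDPT_INDEX(va) (((va) >> 30) & 0x1ff)
            #define PDT_INDEX(va)  (((va) >> 21) & 0x1ff)
            #define PT_INDEX(va)   (((va) >> 12) & 0x1ff)

            // page table entry: low 12 bits are flags, bits 12-47 are the frame address
            #define PTE_PRESENT    (1UL << 0)
            #define PTE_WRITABLE   (1UL << 1)
            #define PTE_USER       (1UL << 2)
            #define PTE_ACCESSED   (1UL << 5)
            #define PTE_DIRTY      (1UL << 6)
            #define PTE_NX         (1UL << 63)                       // execute disable
            #define PTE_ADDR(pte)  ((pte) & 0x0000fffffffff000UL)    // bits 12-47

            // conceptual 4-level walk; in hardware each step is one memory read
            static uint64_t walk(uint64_t *pml4, uint64_t va) {
                uint64_t *pdpt = (uint64_t *)PTE_ADDR(pml4[PML4_INDEX(va)]);
                uint64_t *pdt  = (uint64_t *)PTE_ADDR(pdpt[PDPT_INDEX(va)]);
                uint64_t *pt   = (uint64_t *)PTE_ADDR(pdt[PDT_INDEX(va)]);
                return PTE_ADDR(pt[PT_INDEX(va)]) | (va & 0xfff);    // frame address + page offset
            }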
 
        
      
 
  - why do we care about the format if hardware does the walk and permission checking?
 
  
    - the kernel is responsible for setting up the page tables and filling out the entries
 
    
    - the kernel can use these bits to make paging policy decisions
        
        - e.g. bit 5 indicates whether the page has been accessed, bit 6 indicates whether it has been written to
 
 
  
  
Page Faults
  - an exception raised by the hardware when something goes wrong during the page table walk

  - could be a missing translation mapping or an access permission violation

  - how does the kernel handle a page fault? (see the handler skeleton after the list below)
  
  
    - identify and handle valid page faults
 
    
      - stack or heap growth
 
      - memory mapped files
 
      - known permission mismatch
 
      - memory pressure (access to swapped pages)
 
    
    - terminate threads with invalid page faults 
    
      - nullptr, random address in unallocated virtual memory
 
      - actual permission mismatch
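    - a hedged skeleton (helper names are hypothetical, not xk's actual functions) showing the shape of that decision:

        #include <stdbool.h>
        #include <stdint.h>

        // hypothetical helpers; only the decision structure matters here
        extern bool in_stack_region(uint64_t addr);
        extern bool is_cow_page(uint64_t addr);
        extern bool is_swapped_out(uint64_t addr);
        extern void grow_stack(uint64_t addr);
        extern void copy_cow_page(uint64_t addr);
        extern void swap_in(uint64_t addr);
        extern void kill_current_process(void);

        // called from the trap handler with the faulting address (on x86-64, read from %cr2)
        void handle_page_fault(uint64_t fault_addr, bool is_write) {
            if (in_stack_region(fault_addr)) {
                grow_stack(fault_addr);          // valid: allocate and map a new stack page
            } else if (is_write && is_cow_page(fault_addr)) {
                copy_cow_page(fault_addr);       // valid: known permission mismatch (copy-on-write)
            } else if (is_swapped_out(fault_addr)) {
                swap_in(fault_addr);             // valid: page was evicted under memory pressure
            } else {
                kill_current_process();          // invalid: null/unallocated address or real violation
            }
            // on return, the faulting instruction is retried and should now succeed
        }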
 
    
   
  - needs bookkeeping structures to track information (unrelated to address translation) about each page
 
  - machine independent bookkeeping structures vs machine dependent page table
 
  
    - machine independent structures in xk: vspace, vregion, vpage_info

      - track the size of each region (stack, heap, code), whether a page is associated with any file, whether a page is copy-on-write (cow)

    - machine dependent structures in xk: the x86-64 page table (x86_64vm.c)

      - used for the actual translation information

    - you can update just the vspace and generate a new machine dependent page table with vspaceinvalidate
  
  - last thing: how does the TLB interact with page fault handling?
 
  
    - if we change the permission of a page while handling a page fault (e.g. copy-on-write), is the cached translation in the TLB still valid?

    - if we add a new mapping while handling a page fault (e.g. stack growth), do we need to do anything to the TLB?
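    - a sketch of the usual answers (my summary, not the lecture's): a changed mapping must be flushed from the TLB because the stale translation may still be cached, while a brand-new mapping needs no flush since the faulting access already missed the TLB and the retry will simply walk the updated page table; on x86-64 a single entry can be flushed with invlpg:

        #include <stdint.h>

        // flush one virtual address from the TLB (x86-64 invlpg instruction)
        static inline void tlb_invalidate(void *va) {
            asm volatile("invlpg (%0)" :: "r"(va) : "memory");
        }

        // e.g. after handling a copy-on-write fault by pointing the PTE at a private,
        // writable copy of the frame, flush the old translation before returning:
        //     *pte = new_frame | PTE_PRESENT | PTE_WRITABLE | PTE_USER;
        //     tlb_invalidate((void *)fault_addr);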