Min-hui Lin
Chris Mordue
CSE 451: OS
5-12-03

The point of the lecture was:
  Make the common case fast and exploit locality in utilizing resources.

Summary of details:
  TLBs can be implemented in software or hardware, and can be implemented
  efficiently in either. A TLB entry does not require all of the bits of the
  page table entry, but there are tradeoffs associated with including or not
  including each bit.

Final exam:
  Expect something similar to the midterm.

How does a TLB work?

  [Diagram: the virtual address (VA) is split into a VFN and an offset. The
  VFN is compared simultaneously against every (VFN, PFN) entry in the TLB.
  On a TLB hit, the matching PFN forms the physical address (PA) used to
  access physical memory.]

  TLB miss: HW or SW translates the VFN into a PFN (which gives the PA) and
  adds the entry to the TLB.

Different implementations of TLB have different trade-offs:

Hardware implementation:
  Layout: the size is set, each entry is precisely so many bits, and certain
          bits have a specific function.
  pro: fast
  con: can't represent higher-level structures (e.g. OS-defined segments)

  speed issue: the trend is that memory is getting big and slow relative to
    the CPU, so we can let the CPU do most of the work and still have pretty
    good speed.
  space issue: we usually allocate chunks of contiguous addresses, so a lot
    of the page table entries are pretty much the same. If we don't keep
    every address, and let the CPU do more work to figure out which address
    to use, we save some space.

Software implementation:
  On a TLB hit, the associated physical frame # is used in combination with
  the virtual address offset, just like a HW TLB hit.
  A TLB miss causes an exception, just like a page fault. A register in the
  "other stuff" part of the processor holds the TLB miss frame number. The
  OS kernel does the translation, loads the mapping into the TLB, and
  restarts the program.
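The hit and miss paths described above can be sketched in Python. This is only a toy model under stated assumptions, not how any particular processor or kernel does it: the class name SoftwareTLB, the 4096-byte PAGE_SIZE, the dictionary used as a page table, and the FIFO eviction policy are all hypothetical choices made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size; the notes do not specify one


class SoftwareTLB:
    """Toy software-managed TLB: a small VFN -> PFN map (hypothetical)."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # stand-in page table: dict of VFN -> PFN
        self.capacity = capacity      # number of TLB entries
        self.entries = {}             # VFN -> PFN, kept in insertion order
        self.misses = 0

    def handle_miss(self, vfn):
        """Stand-in for the OS miss handler: read the PTE, store it in the TLB."""
        self.misses += 1
        pfn = self.page_table[vfn]    # a KeyError here would model a page fault
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry; FIFO is an arbitrary choice for this sketch.
            del self.entries[next(iter(self.entries))]
        self.entries[vfn] = pfn

    def translate(self, va):
        """Translate a virtual address to a physical address."""
        vfn, offset = divmod(va, PAGE_SIZE)
        if vfn not in self.entries:   # TLB miss: run the handler, then retry
            self.handle_miss(vfn)
        # TLB hit: the physical frame # is combined with the VA offset.
        return self.entries[vfn] * PAGE_SIZE + offset


tlb = SoftwareTLB({0: 7, 1: 3})
pa = tlb.translate(0x1004)  # first access to VFN 1 misses, then is retried
```

A second translate call for an address in the same page would hit in the TLB and leave the miss count unchanged, which is the common case the lecture wants to be fast.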
  Pseudo-code for TLB exception handler:
    PFN = Memory[Base of the Page Table + Miss Frame #]
    STORE TLB (Miss Frame #, PFN)
  pros: Allows increased flexibility (e.g. it allows OS-defined segments),
        so it is easily changed and customized.

TLB Coverage:
  Use benchmarks to determine the effective access time (Ea) of the memory
  system.
    perfect: Ea = Tm'
    actual:  Ea = Pn(Tm') + (1-Pn)(Tx + Tm)
  where
    Tm  = time to access memory
    Tm' = time to access cache
    Pn  = probability of hit in TLB
    Tx  = time to translate (done by SW or HW, because the CPU is fast)
  Example: Given that Tm' = 1, Tm = 10.
    If Pn = 1,  then Ea = 1
    If Pn = .5, then Ea = 6
      (the notes do not give Tx; the result corresponds to Tx = 1,
       since .5(1) + .5(1 + 10) = 6)
  The typical hit ratio is ~80% in a modern OS (meaning that Pn = .8).

What bits to put in a TLB entry:

Process ID (PID):
  con: Increased space for the PID bits in each entry, and more space
       required for the HW comparators that compare each entry's PID with
       the running PID.
       You are limited in the # of PIDs you can have (ex: 4-bit PID = 16 PIDs).
  pro: don't have to flush the TLB as often

Valid bit:
  Not needed, because all entries in the TLB are either valid or removed.

Protection bit:
  Protection info is needed. When page protection info is changed, the TLB
  entry must be updated or invalidated (removed).

Modify bit:
  con: TLB miss becomes expensive: must update the PTE.
  If there is no modify bit in the TLB, how do we know if a page has been
  written? Load the TLB entry with its W bit cleared, so the first write
  traps:

                               in TLB     in PTE
                               R  W       M  R  W
    before attempt to write:   1  0       0  1  1
    after write:               1  1       1  1  1

  On an attempt to write, the W bit is 0 in the TLB -> page fault. The
  handler goes to the PTE and finds the W bit to be 1, so the write should
  be allowed; it updates the W bit in the TLB and the M bit in the PTE.
  trade-off of this implementation:
    con: slow on first write.
    pro: being slow only the first time is better than every miss or hit
         being slow, because we want to make the common case fast.

Reference bit:
  Not needed, because if an entry is in the TLB, it is valid and has been
  referenced.

Switching contexts between processes requires flushing and zero-filling the TLB.
When switching between threads of the same process we don't have to flush the TLB, because the threads share one address space. This shows the power of threading.
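
Returning to the effective access time formula from the TLB Coverage section, the arithmetic can be checked with a few lines of Python. This is a sketch only: the function name effective_access_time and its parameter names are made up for illustration, and Tx = 1 is an assumption, since the notes give Tm' = 1 and Tm = 10 but no value for Tx.

```python
def effective_access_time(p_hit, t_cache, t_mem, t_translate):
    """Ea = Pn*Tm' + (1 - Pn)*(Tx + Tm), using the symbols from the notes."""
    if not 0.0 <= p_hit <= 1.0:
        raise ValueError("p_hit must be a probability in [0, 1]")
    return p_hit * t_cache + (1.0 - p_hit) * (t_translate + t_mem)


# Tm' = 1 and Tm = 10 come from the notes; Tx = 1 is an assumption.
for p in (1.0, 0.8, 0.5):
    print(p, effective_access_time(p, t_cache=1, t_mem=10, t_translate=1))
```

With these inputs a hit probability of 1 gives Ea = 1 and a hit probability of .5 gives Ea = 6, matching the example in the notes. Trying other values of Pn shows how quickly Ea grows as the hit ratio drops.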