Min-hui Lin
Chris Mordue
CSE 451: OS
5-12-03

The point of the lecture was: 
	Make the common case fast and exploit locality when utilizing resources.

Summary of details:
	TLB miss handling can be implemented in software or hardware, 
	and either can be made efficient. A TLB entry does not need 
	every bit of the page table entry, but there are tradeoffs 
	associated with including or omitting each bit.


Final exam: Expect something similar to the midterm.

How does a TLB work?

  ________________________________
  |				 |
  |		TLB	  	 |
  |	    VFN     PFN		 | 	
  |	    ______________	 |
  |	___|	 |        |      |
VA      |  |_____|________|     \|/            physical memory
|_______|__|  	 |	  |____\ PA -------------[	]
 	|  |_____|________|    /	TLB hit
	|__|	 |	  |
	   |_____|________|
         
      simultaneously compare

	TLB miss: HW or SW translates the VFN into a PFN & adds the 
		entry to the TLB
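
The hit and miss paths in the diagram can be sketched in a few lines. This is only an illustration, not how any particular CPU does it: the 4096-byte page size, the dict standing in for the parallel compare hardware, and the names translate, tlb and page_table are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size; the notes do not give one

def translate(va, tlb, page_table):
    """Return (physical address, was it a TLB hit?) for virtual address va."""
    vfn, offset = divmod(va, PAGE_SIZE)   # split VA into VFN and offset
    hit = vfn in tlb                      # HW compares all entries at once; a dict stands in
    if not hit:
        tlb[vfn] = page_table[vfn]        # TLB miss: translate VFN and add the entry
    return tlb[vfn] * PAGE_SIZE + offset, hit

page_table = {0: 7, 1: 3}                 # hypothetical VFN -> PFN mappings
tlb = {}
first = translate(4100, tlb, page_table)  # VFN 1, offset 4: miss, then filled
second = translate(4200, tlb, page_table) # same page again: hit
```

The second access lands on the same page as the first, so it hits. That is the locality the lecture's main point relies on.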

Different implementations of TLB have different trade-offs:

Hardware implementation:
	Layout: Fixed size; each entry is exactly so many bits, and 
	certain bits have a specific function.

	pro: fast
	con: can't represent higher-level structures (e.g. OS-defined segments)

	speed issue- the trend is that memory has been getting big and slow
	relative to the CPU. We can let the CPU do most of the translation 
	work and still have pretty good speed.
	
	space issue- we usually allocate chunks of contiguous addresses,
	so a lot of the page table entries are pretty much the same.
	If we don't store every address, and instead let the CPU do more 
	work to figure out which address to use, we save some space.
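
One way to picture the space issue: store one record per contiguous chunk and let the CPU compute the frame number. The extent layout (first VFN, first PFN, number of pages) and the name lookup are assumptions for illustration; the lecture did not give a specific format.

```python
# Each extent: (first VFN, first PFN, number of pages). Hypothetical layout.
extents = [(0, 100, 4), (16, 200, 8)]

def lookup(vfn, extents):
    """Compute the PFN from the compact extent list; None if unmapped."""
    for base_vfn, base_pfn, length in extents:
        if base_vfn <= vfn < base_vfn + length:
            return base_pfn + (vfn - base_vfn)   # extra CPU work instead of a stored entry
    return None

mapped_pages = sum(length for _, _, length in extents)  # pages covered by 2 records
```

Here two records cover twelve pages. The cost is a short search and an addition on each lookup.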

Software implementation:
	On a TLB hit, the associated physical frame # is combined 
	with the virtual address offset, just like a HW TLB hit.
	A TLB miss causes an exception, just like a page fault does. 
	A register in the "other stuff" part of the processor holds the 
	TLB miss frame number (the VFN that missed). The OS kernel does 
	the translation, loads the mapping into the TLB and restarts the program. 
	

	Pseudo-code for TLB exception handler:
		PTE = LOAD (Base of the Page Table + Miss Frame #)
		PFN = Physical Frame # field of PTE
		STORE TLB (Miss Frame #, PFN)

	pro: Allows increased flexibility (e.g. it allows OS-defined 
		segments) and is therefore easily changed and customized
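
The software miss path (exception, kernel handler, restart) can be simulated roughly as follows. Everything here is a stand-in: TLBMiss plays the role of the hardware exception and its vfn field the miss register, the list page_table is a flat one-level table, and the 4096-byte page size is assumed.

```python
PAGE_SIZE = 4096  # assumed page size

class TLBMiss(Exception):
    """Simulated hardware exception; vfn plays the role of the miss register."""
    def __init__(self, vfn):
        super().__init__(vfn)
        self.vfn = vfn

def hw_access(va, tlb):
    """Simulated hardware: translate on a hit, raise on a miss."""
    vfn, offset = divmod(va, PAGE_SIZE)
    if vfn not in tlb:
        raise TLBMiss(vfn)
    return tlb[vfn] * PAGE_SIZE + offset

def os_miss_handler(miss_vfn, page_table, tlb):
    """Simulated kernel handler: look up the PTE and load the TLB."""
    pfn = page_table[miss_vfn]   # load the entry at (base of page table + miss VFN)
    tlb[miss_vfn] = pfn          # STORE TLB (miss VFN, PFN)

def run_access(va, tlb, page_table):
    """Issue an access, restarting it after the handler fills the TLB."""
    while True:
        try:
            return hw_access(va, tlb)
        except TLBMiss as miss:
            os_miss_handler(miss.vfn, page_table, tlb)

page_table = [5, 9, 6]           # index = VFN, value = PFN (hypothetical)
tlb = {}
pa = run_access(2 * PAGE_SIZE + 16, tlb, page_table)
```

Because the handler is ordinary code, the OS can replace the flat list with any structure it likes. That is the flexibility listed as the pro above.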

TLB Coverage:
	Use benchmarks to determine for the memory system what the 
	effective access time (Ea) is.
	perfect: Ea = Tm'	
	actual:  Ea = Pn(Tm') + (1-Pn)(Tx + Tm)

	Tm  = time to access memory
	Tm' = time to access cache
	Pn  = probability of hit in TLB
	Tx  = time to translate (done by SW or HW because the CPU is fast)
	
	Example: Given that Tm' = 1, Tm = 10, Tx = 1 (implied by Ea = 6 below). 
		If Pn = 1,  then Ea = 1
		If Pn = .5, then Ea = 6 

	The typical hit ratio is ~80% in a modern OS (meaning that Pn = .8)
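
The formula can be checked numerically. This sketch assumes Tx = 1, which is the value that reproduces the Ea = 6 result in the example; the function and parameter names are made up for illustration.

```python
def effective_access_time(pn, tm_cache, tm, tx):
    """Ea = Pn * Tm' + (1 - Pn) * (Tx + Tm)."""
    return pn * tm_cache + (1 - pn) * (tx + tm)

# Tm' = 1 and Tm = 10 come from the example; Tx = 1 is an assumption.
perfect = effective_access_time(1.0, 1, 10, 1)   # every access hits
half = effective_access_time(0.5, 1, 10, 1)      # half the accesses miss
typical = effective_access_time(0.8, 1, 10, 1)   # the ~80% hit ratio quoted above
```

With these numbers an 80% hit ratio gives an Ea of about 3, three times the perfect case. Small changes in Pn matter a lot.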


What bits to put in TLB entry
	
	Process ID (PID): con: Each entry needs extra space for the PID bits,
				and the HW comparators need more space
				to compare each entry's PID w/ the running PID.
			       The # of PIDs is limited by the # of PID bits 
				(ex: 4 bit PID # = 16 PIDs)
			  pro: don't have to flush TLB as often
	
	Valid bit: 	  It's not needed because all entries in the TLB are either
			  valid or removed 
	
	Protection bit:   Protection info is needed. When page protection info is changed,
			  the TLB entry must be updated or invalidated (removed).
	
	Modify bit: 	  con: TLB miss becomes expensive: must update the PTE
		
		    	  if no modify bit in TLB, how do we know if a page has 
		    	  been written?
	
			   before attempt to write:
 
		   	   R W(in TLB)   M     R W(in PTE)
		   	   1 0           0     1 1   
		
			  On a write attempt, the W bit is 0 in the TLB -> protection fault.
			  The handler goes to the PTE and finds the W bit is 1, so the 
			  write should be allowed; it sets the W bit in the TLB and 
			  the M bit in the PTE.
	
			  after write:
 		  	   R W(in TLB)   M     R W(in PTE)
		  	   1 1           1     1 1 

			  trade-off of this implementation: 
			  con: slow on first write.
			  pro: only the first write is slow, which is better than making 
				every miss or hit slow, because we want to make the 
				common case fast.
	
	Reference bit: 	  It's not needed because if it's in the TLB, it's valid 
		       	  and has been referenced.
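
The modify-bit trick above (keep W = 0 in the TLB until the first write) can be simulated as below. The dict-based TLB and PTE records, the WriteFault exception and the function names are assumptions for illustration only.

```python
class WriteFault(Exception):
    """Simulated fault raised when the TLB entry's W bit is 0."""

def hw_write(vfn, tlb):
    """Simulated hardware write check against the TLB entry."""
    if not tlb[vfn]["W"]:
        raise WriteFault(vfn)

def os_write_fault_handler(vfn, tlb, page_table):
    """Simulated kernel handler: consult the PTE, then record the write."""
    pte = page_table[vfn]
    if not pte["W"]:
        raise PermissionError("page is really read-only")
    pte["M"] = 1          # remember in the PTE that the page was written
    tlb[vfn]["W"] = 1     # later writes now pass the TLB check directly

def write(vfn, tlb, page_table):
    """Perform a write, returning how many faults it took."""
    faults = 0
    while True:
        try:
            hw_write(vfn, tlb)
            return faults
        except WriteFault:
            os_write_fault_handler(vfn, tlb, page_table)
            faults += 1

page_table = {3: {"R": 1, "W": 1, "M": 0}}   # state "before attempt to write"
tlb = {3: {"R": 1, "W": 0}}
first = write(3, tlb, page_table)    # takes one fault, sets W in TLB and M in PTE
second = write(3, tlb, page_table)   # no fault: the common case is fast
```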
		     
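Switching contexts between processes requires flushing (zero-filling) the TLB, 
unless entries are tagged with a PID. Switching between threads of the same 
process requires no TLB flush because the threads share one address space; 
this is one of the advantages of threading.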
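
The effect of PID tags on flushing can be seen in a small simulation. The TLB class, its tagged flag and the demo function are hypothetical; a thread switch is modeled as a switch to the same PID.

```python
class TLB:
    """Toy TLB; entries are keyed by (pid, vfn). Hypothetical model."""
    def __init__(self, tagged):
        self.tagged = tagged
        self.entries = {}
        self.current_pid = None

    def switch_to(self, pid):
        # Without PID tags, a switch to another process must flush everything.
        if pid != self.current_pid and not self.tagged:
            self.entries.clear()
        self.current_pid = pid

    def insert(self, vfn, pfn):
        self.entries[(self.current_pid, vfn)] = pfn

    def lookup(self, vfn):
        return self.entries.get((self.current_pid, vfn))

def demo(tagged):
    tlb = TLB(tagged)
    tlb.switch_to(1)
    tlb.insert(0, 42)
    tlb.switch_to(1)                  # thread switch inside process 1: no flush
    after_thread_switch = tlb.lookup(0)
    tlb.switch_to(2)                  # process switch
    tlb.switch_to(1)                  # back to process 1
    return after_thread_switch, tlb.lookup(0)

untagged_result = demo(tagged=False)
tagged_result = demo(tagged=True)
```

In both cases the entry survives the thread switch. Only the tagged TLB still has it after the round trip through process 2.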