Miscellaneous techniques
Improving on write time
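- Write the data in parallel with checking the tag (possible for a direct-mapped L1, since only one line can hold the address). On a hit, the write is already done. On a miss, the write has overwritten another block's data, so invalidate that line (safe only if the line is clean, e.g. in a write-through cache).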
- Pipeline the write: check the tag in one cycle and hold the data in a buffer, so the data write is delayed by one cycle and happens only after a hit is confirmed.
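
The first technique above can be sketched as a small software model. This is only an illustration under stated assumptions: the class `DirectMappedCache`, its one-word lines, the write-through policy, and the dictionary standing in for memory are all hypothetical and not taken from any particular machine.

```python
class DirectMappedCache:
    """Toy direct-mapped, write-through cache with one-word lines (assumed)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.valid = [False] * num_lines
        self.tag = [0] * num_lines
        self.data = [0] * num_lines

    def _split(self, addr):
        # Low-order bits select the line, the remaining bits form the tag.
        return addr % self.num_lines, addr // self.num_lines

    def fill(self, addr, value):
        """Install a word in the cache, as a read-miss refill would."""
        index, tag = self._split(addr)
        self.valid[index] = True
        self.tag[index] = tag
        self.data[index] = value

    def speculative_write(self, addr, value, memory):
        """Write the data without waiting for the tag check; return True on a hit."""
        index, tag = self._split(addr)
        # In hardware the tag compare and the data write occur in the same cycle.
        hit = self.valid[index] and self.tag[index] == tag
        self.data[index] = value   # written regardless of the outcome
        memory[addr] = value       # write-through keeps memory up to date
        if not hit:
            # The line now holds data for the wrong address: invalidate it.
            self.valid[index] = False
        return hit
```

Because the model is write-through, the invalidated line never held the only copy of its data, which is why discarding it on a miss is harmless here.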
For superscalar machines
- Duplicate the L1 cache(s) so that each copy can serve one access per cycle (could be cheaper than a single multiported cache?)
For (highly) associative caches
- For each set, keep the index of the MRU way so that it is checked first (cf. the MIPS R10000, which has an 8K x 1-bit prediction table for this purpose).
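
A minimal sketch of MRU-first lookup, assuming a hypothetical `SetAssociativeCache` class that probes ways one at a time and counts the probes. It does not model the R10000 table itself, only the general idea of trying the most recently used way before the others.

```python
class SetAssociativeCache:
    """Toy set-associative tag store with a per-set MRU way predictor (assumed)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.mru = [0] * num_sets  # predicted way for each set

    def _split(self, addr):
        return addr % self.num_sets, addr // self.num_sets

    def install(self, addr, way):
        """Place the block containing addr in the given way of its set."""
        index, tag = self._split(addr)
        self.tags[index][way] = tag

    def lookup(self, addr):
        """Return (hit, probes), probing the MRU way first."""
        index, tag = self._split(addr)
        first = self.mru[index]
        order = [first] + [w for w in range(self.ways) if w != first]
        for probes, way in enumerate(order, start=1):
            if self.tags[index][way] == tag:
                self.mru[index] = way  # remember this way for next time
                return True, probes
        return False, self.ways
```

A repeated access to the same block hits on the first probe, which is the case the MRU index is meant to make fast; other hits in the set cost extra probes.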