CSE 351 - Spring 2010 - Section 7

Basic Information Theory

If you have n bits, how many different items or choices can you represent? Let's take the simple example of n=2. Here are all possible 2-bit strings.

00
01
10
11

Now let's try it for n=3. Here are all possible 3-bit strings.

000
001
010
011
100
101
110
111

Notice that the number of possible choices (or strings) we can represent doubles each time we increase the number of bits by one. This is an exponential increase in choices for a linear increase in bits. That's why we can represent the total number of choices N as follows.

N = 2^n

Conversely, we can take the logarithm base 2 of both sides to find out how many bits we need to represent N choices.

n = log2(N)

This is not always a whole number. For example, to represent 5 choices, we need log2(5) ≈ 2.32 bits. However, there is no such thing as a fractional number of bits, so in real life, if we had to encode one of five choices, we would round up and use 3 bits. That is the same as rounding up to the next greater power of two (in this case, 8). You might as well, since choices 6, 7, and 8 come for free.

Odds are, you decided how many bits to use because you've memorized powers of two as part of this class, rather than being able to do logarithms in your head.
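If you would rather check the rounding-up rule with code than with memorized powers of two, here is a minimal C sketch. The function name bits_needed is just an illustrative choice, not something from the book.

    #include <stdio.h>

    /* Smallest n such that 2^n >= choices, i.e. log2(choices) rounded up. */
    static unsigned bits_needed(unsigned long choices)
    {
        unsigned n = 0;
        while ((1UL << n) < choices)
            n++;
        return n;
    }

    int main(void)
    {
        printf("%u\n", bits_needed(5));  /* 3: round 2.32 bits up      */
        printf("%u\n", bits_needed(8));  /* 3: exactly a power of two  */
        printf("%u\n", bits_needed(9));  /* 4: one past a power of two */
        return 0;
    }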

Cache Parameters

The following cache parameters are related by the exponential/logarithmic relationship above:

Parameter                 Number of Addresses to Represent   Number of Bits Needed to Select
Block Size (in bytes)     B = 2^b                            b = log2(B)   (Block Offset Bits)
Number of Sets            S = 2^s                            s = log2(S)   (Set Index Bits)
Main Memory Addresses     M = 2^m                            m = log2(M)   (Total Address Bits)

The other free parameter in a cache is the number of lines per set (E). The size of a cache is computed as C = B * E * S.
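To make these relationships concrete, here is a small C sketch that derives the field widths and the cache size from B, S, E, and m. The specific values (B = 4, S = 8, E = 2, m = 13) are only an example; they happen to match the practice-problem cache at the end of this handout (4-byte blocks, 8 sets, 2 lines per set, 8-bit tags).

    #include <stdio.h>

    /* log2 of a power of two, by counting how many shifts reach 1. */
    static unsigned log2u(unsigned long x)
    {
        unsigned n = 0;
        while (x > 1) {
            x >>= 1;
            n++;
        }
        return n;
    }

    int main(void)
    {
        unsigned long B = 4;    /* block size in bytes */
        unsigned long S = 8;    /* number of sets      */
        unsigned long E = 2;    /* lines per set       */
        unsigned      m = 13;   /* total address bits  */

        unsigned b = log2u(B);          /* block offset bits     */
        unsigned s = log2u(S);          /* set index bits        */
        unsigned t = m - s - b;         /* leftover bits are tag */
        unsigned long C = B * E * S;    /* total data capacity   */

        printf("b = %u, s = %u, tag bits = %u, C = %lu bytes\n", b, s, t, C);
        return 0;
    }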

Cache Questions

Why are set index bits in the middle, instead of the highest-order bits?

Sets are a basic restriction on where data from main memory can exist in the cache. If the set bits were the highest-order bits, then large consecutive ranges in main memory would map to the same cache set, causing many conflicts and evictions. This would give poor cache performance due to all the misses.

The number of sets times the block size (times the number of lines per set) tells you how many consecutive addresses from main memory you can store at one time.

On the other hand, the tag bits represent a "pressure valve" or "degree of freedom" that lets you store data from wildly different addresses in main memory together in the same cache.

In other words, low-order bits tend to change a lot when you are traversing an array or taking advantage of data locality. High-order bits tend to stay the same for those operations. You want to take advantage of different low-order bits to store a working set with good locality in a cache at the same time, to get a bunch of cache hits together.

The solution to Practice Problem 6.12 on page 645 gives an alternate answer to this question. Also, Figure 6.33 on page 606 gives a good graphical argument on why set bits should be in the middle.
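One way to see the split concretely is to pull the fields out of an address with shifts and masks. This is a sketch, not code from the book; it assumes the same example parameters as above (b = 2 offset bits, s = 3 set index bits), and 0x0E34 is just an arbitrary 13-bit address.

    #include <stdio.h>

    int main(void)
    {
        unsigned b = 2;            /* block offset bits (example) */
        unsigned s = 3;            /* set index bits (example)    */
        unsigned addr = 0x0E34;    /* an arbitrary 13-bit address */

        unsigned offset = addr & ((1U << b) - 1);        /* lowest b bits       */
        unsigned set    = (addr >> b) & ((1U << s) - 1); /* next s bits         */
        unsigned tag    = addr >> (b + s);               /* remaining high bits */

        printf("tag = 0x%X, set index = %u, offset = %u\n", tag, set, offset);
        return 0;
    }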

Can you get rid of the valid bit and just make all invalid cache lines contain bytes of zeros?

This is a bad idea for two reasons.

The main reason is that a block of all-zero bytes may be a legitimate value from main memory that we want to use in our program. If your cache required programmers to stop using the value of all zeros, it would not be a very popular technology.

Another reason is that the cache would need to be initialized to all zeros when it powered up. Initializing just the valid bits to zero takes a lot less work.
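For reference, here is one way a cache line is often modeled in software (a sketch, not real hardware): the valid flag is separate metadata, so the data bytes are free to hold any value, including all zeros, and "power up" only has to clear the flags.

    #include <stdint.h>

    #define BLOCK_SIZE 4   /* bytes per block, matching the example above */

    /* Software model of one cache line: the valid flag is separate
       metadata, so the data bytes can hold anything, including zeros. */
    struct cache_line {
        int     valid;
        uint8_t tag;
        uint8_t data[BLOCK_SIZE];
    };

    /* "Power up": clear only the valid flags; the data can be garbage. */
    static void cache_init(struct cache_line *lines, int n)
    {
        for (int i = 0; i < n; i++)
            lines[i].valid = 0;
    }

    int main(void)
    {
        struct cache_line set[2];   /* one set with E = 2 lines */
        cache_init(set, 2);
        return 0;
    }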

Cache Example

We did the following practice problems from the book: 6.13, 6.14, 6.15, 6.16, and 6.17, on pages 609-611. The answers are on pages 646-647.

Set index   Line 0                                         Line 1
            Tag  Valid  Byte 0  Byte 1  Byte 2  Byte 3     Tag  Valid  Byte 0  Byte 1  Byte 2  Byte 3
0           09   1      86      30      3F      10         00   0      --      --      --      --
1           45   1      60      4F      E0      23         38   1      00      BC      0B      37
2           EB   0      --      --      --      --         0B   0      --      --      --      --
3           06   0      --      --      --      --         32   1      12      08      7B      AD
4           C7   1      06      78      07      C5         05   1      40      67      C2      3B
5           71   1      0B      DE      18      4B         6E   0      --      --      --      --
6           91   1      A0      B7      26      2D         F0   0      --      --      --      --
7           46   0      --      --      --      --         DE   1      12      C0      88      37
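If you want to check your answers mechanically, the following C sketch hand-transcribes the table above (with B = 4, S = 8, E = 2, and 13-bit addresses) and walks through the same lookup steps you do on paper. The structure and function names are just illustrative, and the "--" bytes are left as zeros since their values are unknown.

    #include <stdio.h>
    #include <stdint.h>

    #define S 8   /* sets            */
    #define E 2   /* lines per set   */
    #define B 4   /* bytes per block */

    struct line { uint8_t tag; int valid; uint8_t data[B]; };

    /* Contents transcribed by hand from the table above; "--" bytes are 0. */
    static struct line cache[S][E] = {
        { {0x09, 1, {0x86, 0x30, 0x3F, 0x10}}, {0x00, 0, {0}} },
        { {0x45, 1, {0x60, 0x4F, 0xE0, 0x23}}, {0x38, 1, {0x00, 0xBC, 0x0B, 0x37}} },
        { {0xEB, 0, {0}},                      {0x0B, 0, {0}} },
        { {0x06, 0, {0}},                      {0x32, 1, {0x12, 0x08, 0x7B, 0xAD}} },
        { {0xC7, 1, {0x06, 0x78, 0x07, 0xC5}}, {0x05, 1, {0x40, 0x67, 0xC2, 0x3B}} },
        { {0x71, 1, {0x0B, 0xDE, 0x18, 0x4B}}, {0x6E, 0, {0}} },
        { {0x91, 1, {0xA0, 0xB7, 0x26, 0x2D}}, {0xF0, 0, {0}} },
        { {0x46, 0, {0}},                      {0xDE, 1, {0x12, 0xC0, 0x88, 0x37}} },
    };

    /* Split a 13-bit address into tag/set/offset and look for a hit. */
    static void lookup(unsigned addr)
    {
        unsigned offset = addr & 0x3;         /* b = 2 offset bits */
        unsigned set    = (addr >> 2) & 0x7;  /* s = 3 set bits    */
        unsigned tag    = addr >> 5;          /* t = 8 tag bits    */

        for (int i = 0; i < E; i++) {
            struct line *ln = &cache[set][i];
            if (ln->valid && ln->tag == tag) {
                printf("0x%04X: hit, byte = 0x%02X\n", addr, ln->data[offset]);
                return;
            }
        }
        printf("0x%04X: miss (set %u, tag 0x%02X)\n", addr, set, tag);
    }

    int main(void)
    {
        lookup(0x0E34);   /* example address: set 5, tag 0x71 -> a hit */
        return 0;
    }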