Computer Architecture & Organization — Cache Memory Design Issues

Sanjay Santokee
Nov 6, 2019


1. Cache Addresses

-Logical cache (virtual cache) stores data using virtual addresses. The processor accesses the cache directly, without going through the MMU.

-Physical cache stores data using main memory physical addresses.

One obvious advantage of the logical cache is that cache access speed is faster than for a physical cache, because the cache can respond before the MMU performs an address translation.

The disadvantage has to do with the fact that most virtual memory systems supply each application with the same virtual memory address space. That is, each application sees a virtual memory that starts at address 0. Thus, the same virtual address in two different applications refers to two different physical addresses. The cache memory must therefore be completely flushed with each application switch, or extra bits must be added to each line of the cache to identify which virtual address space this address refers to.
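As a rough sketch of that second option, the lookup below (Python; the class, field names, and sizes are illustrative assumptions, not any particular processor's design) tags each line of a logical cache with an address-space identifier (ASID), so identical virtual addresses from different applications cannot hit each other's data and no flush is needed on an application switch:

```python
# Minimal sketch of a virtually addressed (logical) cache whose lines carry an
# address-space identifier (ASID). All names and sizes here are illustrative.

class LogicalCacheLine:
    def __init__(self):
        self.valid = False
        self.asid = None      # which virtual address space the line belongs to
        self.tag = None
        self.data = None

class LogicalCache:
    def __init__(self, num_lines=64, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = [LogicalCacheLine() for _ in range(num_lines)]

    def lookup(self, asid, virtual_addr):
        """Return cached data on a hit, or None on a miss.
        The hit path uses the virtual address directly: no MMU translation."""
        block = virtual_addr // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        line = self.lines[index]
        if line.valid and line.asid == asid and line.tag == tag:
            return line.data      # hit: same address space and same tag
        return None               # miss: fall back to the MMU and main memory
```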

2. Cache Size

The larger the cache, the larger the number of gates involved in addressing the cache. The available chip and board area also limit cache size.

The more cache a system has, the more likely it is to register a hit on memory access because fewer memory locations are forced to share the same cache line.

Although an increase in cache size will increase the hit ratio, a continuous increase in cache size will not yield an equivalent increase of the hit ratio.

For example, an increase in cache size from 256K to 512K (a 100% increase) might yield a 10% improvement in the hit ratio, but a further increase from 512K to 1024K would yield less than a 5% improvement (the law of diminishing marginal returns).

3. Replacement Algorithm

Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced.

For direct mapping, there is only one possible line for any particular block, so no choice is possible: each block maps to exactly one line, and that line is replaced.

For the associative and set-associative techniques, a replacement algorithm is needed. To achieve high speed, such an algorithm must be implemented in hardware.

Least Recently Used (LRU) — Most Effective

Sock Drawer:

Least recently used: the socks you wear least often (not your favorites) sit at the back of the drawer.

Most recently used: your favorite pair sits at the front.

Replace that block in the set that has been in the cache longest with no reference to it.

For two-way set associative, this is easily implemented. Each line includes a USE bit. When a line is referenced, its USE bit is set to 1 and the USE bit of the other line in that set is set to 0. When a block is to be read into the set, the line whose USE bit is 0 is used.

Because we are assuming that more recently used memory locations are more likely to be referenced, LRU should give the best hit ratio. LRU is also relatively easy to implement for a fully associative cache. The cache mechanism maintains a separate list of indexes to all the lines in the cache. When a line is referenced, it moves to the front of the list. For replacement, the line at the back of the list is used. Because of its simplicity of implementation, LRU is the most popular replacement algorithm.
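A minimal sketch of the two-way USE-bit scheme just described, with illustrative sizes (one set shown; tags stand in for whole cache lines):

```python
# Sketch of LRU for one set of a two-way set-associative cache using a USE bit
# per line, as described above. Tags stand in for full lines; sizes are illustrative.

class TwoWaySet:
    def __init__(self):
        self.tags = [None, None]   # tag held in each of the two lines
        self.use = [0, 0]          # USE bit: 1 = more recently referenced line

    def access(self, tag):
        """Return True on a hit; on a miss, replace the line whose USE bit is 0."""
        for way in (0, 1):
            if self.tags[way] == tag:        # hit
                self.use[way] = 1            # mark this line as just used...
                self.use[1 - way] = 0        # ...and the other line as not
                return True
        victim = 0 if self.use[0] == 0 else 1   # miss: take the line with USE bit 0
        self.tags[victim] = tag
        self.use[victim] = 1
        self.use[1 - victim] = 0
        return False
```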

First-In-First-Out (FIFO):

Sock Drawer:

Replace the socks you have owned the longest (the oldest pair), regardless of how often they are worn.

If blocks from memory locations 1 through 5 were loaded into the cache in that order, location 1 has been in the cache the longest.

Replace that block in the set that has been in the cache longest. FIFO is easily implemented as a round-robin or circular buffer technique.
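A minimal sketch of FIFO as a round-robin (circular buffer) pointer over the lines of one set, again with illustrative sizes:

```python
# Sketch of FIFO replacement for one set: a single round-robin pointer cycles
# over the lines, so the line replaced is always the one loaded longest ago.

class FifoSet:
    def __init__(self, ways=4):
        self.tags = [None] * ways
        self.next_victim = 0                     # round-robin pointer

    def access(self, tag):
        """Return True on a hit; on a miss, overwrite the oldest line."""
        if tag in self.tags:
            return True                          # hit: FIFO order is unchanged
        self.tags[self.next_victim] = tag        # miss: replace the oldest line
        self.next_victim = (self.next_victim + 1) % len(self.tags)
        return False
```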

Least Frequently Used (LFU):

Replace that block in the set that has experienced the fewest references. LFU can be implemented by associating a counter with each line; the line with the lowest counter value is replaced.
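A minimal sketch of LFU with one counter per line, with illustrative sizes:

```python
# Sketch of LFU replacement for one set: each line keeps a reference counter,
# and the line with the lowest count is the one replaced.

class LfuSet:
    def __init__(self, ways=4):
        self.tags = [None] * ways
        self.counts = [0] * ways                 # references seen per line

    def access(self, tag):
        """Return True on a hit; on a miss, replace the least-referenced line."""
        if tag in self.tags:
            self.counts[self.tags.index(tag)] += 1
            return True
        victim = self.counts.index(min(self.counts))   # fewest references
        self.tags[victim] = tag
        self.counts[victim] = 1                  # start counting the new block
        return False
```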

Random:

A technique not based on usage (i.e., not LRU, LFU, FIFO, or some variant) is to pick a line at random from among the candidate lines. Simulation studies have shown that random replacement provides only slightly inferior performance compared with an algorithm based on usage.

4. Write Policy

Write policy determines when changes to cached data are saved back to main memory. Two techniques are commonly used:

Write Through:

Every time a write operation occurs, the data is stored to main memory as well as to the cache, simultaneously. Although this may take longer, it ensures that main memory is always up to date, which reduces the risk of data loss if the system shuts down because of a power failure. This approach is used for highly sensitive information.

Because every write operation is performed twice, the program or application in use must wait until the data has been written to both the cache and the storage device before it can proceed. This costs some system performance, but it is highly recommended for sensitive data that cannot be lost: if the computer crashes or the power goes out, the data can still be recovered without issue. Many businesses that handle sensitive customer information, such as payment details, choose this method because that data is critical to keep intact.

Write Back:

Saves data to the cache only; main memory is updated only at certain intervals or under certain conditions, typically when a modified line is about to be replaced.

Disadvantage: main memory is temporarily out of date, so there is a greater risk of data loss if the system fails before modified lines are written back.
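The contrast between the two policies can be sketched as follows (Python; the dictionaries standing in for the cache and main memory, and the dirty set, are deliberate simplifications):

```python
# Sketch contrasting write-through and write-back. Plain dictionaries stand in
# for the cache and main memory; 'dirty' marks write-back lines not yet flushed.

memory = {}
cache = {}
dirty = set()

def write_through(addr, value):
    """Update the cache and main memory together: memory is always current."""
    cache[addr] = value
    memory[addr] = value      # the second write costs time, but nothing can be lost

def write_back(addr, value):
    """Update the cache only and mark the line dirty; memory is updated later."""
    cache[addr] = value
    dirty.add(addr)           # memory is stale until this line is flushed

def flush(addr):
    """Write a dirty line back to memory, e.g. when it is replaced."""
    if addr in dirty:
        memory[addr] = cache[addr]
        dirty.discard(addr)
```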

5. Line Size

Another design element is the line size. When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved.

As the block size increases from very small to larger sizes, the hit ratio will at first increase because of the principle of locality, which states that data in the vicinity of a referenced word are likely to be referenced in the near future.

As the block size increases, more useful data are brought into the cache. The hit ratio will begin to decrease, however, as the block becomes even bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced.

Two specific effects come into play:

· Larger blocks reduce the number of blocks that fit into a cache. Because each block fetch overwrites older cache contents, a small number of blocks results in data being overwritten shortly after they are fetched.

· As a block becomes larger, each additional word is farther from the requested word and therefore less likely to be needed in the near future.

6. Number of Caches

Multilevel Caches:

· On-chip cache accesses are faster than accesses to a cache reachable via an external bus.

· On-chip cache reduces the processor's external bus activity and therefore speeds up execution time and improves system performance, since bus access times are eliminated for hits.

· L1 cache always on chip (fastest level)

· L2 cache may be off chip, in static RAM (SRAM).

· The L2 cache often does not use the system bus as the path for data transfer between the L2 cache and the processor; instead, a separate data path is used to reduce the burden on the system bus (transfers over the system bus are slower).

· In modern processor designs the L2 cache may be on chip as well, which means an L3 cache can be added over the external bus. However, some L3 caches are also installed on the microprocessor itself.

· In all of these cases there is a performance advantage to adding a third level cache.
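One rough way to see that advantage is to compute the average access time of a hypothetical hierarchy; the hit ratios and latencies below are invented purely for illustration, not measurements:

```python
# Average memory access time for hypothetical cache hierarchies. All hit ratios
# and latencies are made-up illustrative values.

def avg_access_time(levels, memory_time):
    """levels: list of (hit_ratio, access_time) ordered from L1 outward."""
    time, miss_prob = 0.0, 1.0
    for hit_ratio, access_time in levels:
        time += miss_prob * access_time      # a level is consulted only after misses above it
        miss_prob *= (1.0 - hit_ratio)       # only its misses continue outward
    return time + miss_prob * memory_time

two_level = [(0.95, 1), (0.80, 10)]                 # assumed L1 and L2 (cycles)
three_level = [(0.95, 1), (0.80, 10), (0.70, 30)]   # adds an assumed L3

print(avg_access_time(two_level, 100))    # roughly 2.5 cycles
print(avg_access_time(three_level, 100))  # roughly 2.1 cycles
```

With these made-up numbers, the three-level hierarchy comes out ahead because fewer misses fall all the way through to main memory.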

Unified (one cache for both data and instructions) vs. Split (two caches, one for data and one for instructions)

These two caches both exist at the same level, typically as two L1 caches.

When the processor attempts to fetch an instruction from main memory, it first consults the instruction L1 cache, and when the processor attempts to fetch data from main memory, it first consults the data L1 cache.

Advantages of unified cache:

-Higher hit rate: it balances the load between instruction and data fetches automatically. If an execution pattern involves many more instruction fetches than data fetches, the cache will tend to fill up with instructions; if the pattern involves relatively more data fetches, the opposite occurs.

-Only one cache to design and implement

Advantages of split cache:

-Data and instructions can be accessed in parallel

-The two caches can be configured differently

-Eliminates cache contention between the instruction fetch/decode unit and the execution unit

  • Important in pipelining

7. Mapping Function

Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines.

Further, a means is needed for determining which main memory block currently occupies a cache line. The choice of the mapping function dictates how the cache is organized. Three techniques can be used: direct, associative, and set-associative.
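For direct mapping, a minimal sketch of how an address splits into tag, line index, and word offset (cache and line sizes are illustrative):

```python
# Sketch of the direct-mapping address breakdown: tag | line index | word offset.
# Cache and line sizes are illustrative.

LINE_SIZE = 16        # bytes per line (block)
NUM_LINES = 128       # lines in the cache

def direct_map(address):
    """Return (tag, line, offset) for a byte address.
    Each memory block maps to exactly one cache line: block % NUM_LINES."""
    offset = address % LINE_SIZE
    block = address // LINE_SIZE
    line = block % NUM_LINES
    tag = block // NUM_LINES
    return tag, line, offset

# Two blocks separated by the whole cache size land on the same line and are
# distinguished only by their tags:
print(direct_map(0x1234))
print(direct_map(0x1234 + LINE_SIZE * NUM_LINES))   # same line and offset, tag + 1
```

Because any two blocks that share a line can be told apart only by their tag, direct mapping leaves no replacement choice, as noted in section 3.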
