The widening gap between the latency of the processor, DRAM, and secondary storage devices (hard disks) has created the need for a memory hierarchy based on speed, size, and cost. Historically, the speed gap between the processor, clocked on the order of gigahertz, and main memory has been growing exponentially. This problem is sometimes called the “memory wall” (previously the “memory gap”). A hierarchy of caches of various sizes and speeds is employed to bridge this gap and reduce latency.
Caches are made of cache lines, each of which is mapped to, and holds the data of, a memory block of the same size. Since caches are much smaller than main memory, a particular cache line may be mapped to different memory locations at different times. If the address requested by the processor is found in the cache at some level, the access is a cache hit; if not, it is a cache miss.
(Figure source: UC Berkeley)
Cache Hierarchies
Caches are designed around the observation that the memory locations requested by the processor during the execution of a program are highly predictable. Consider the pattern exhibited by the following simple loop:
loop: ADD R2, R1, R1
SUBI R3, R3, #1
BNEZ R3, loop
Observe that locations that have been referenced are likely to be referenced again in the near future (this is called temporal locality), and that nearby locations are also likely to be referenced in the near future (this is called spatial locality). Caches exploit temporal and spatial locality by remembering recently accessed locations and prefetching the blocks around them.
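As a concrete software analogue, the following C fragment (hypothetical, not from the original notes) shows both kinds of locality: the accumulator is re-referenced on every iteration (temporal locality), and the array is traversed sequentially so that consecutive elements share cache lines (spatial locality).

/* Hypothetical example: summing an array exhibits both kinds of locality. */
long sum_array(const int *a, int n)
{
    long sum = 0;                /* reused on every iteration: temporal locality           */
    for (int i = 0; i < n; i++)
        sum += a[i];             /* a[i], a[i+1], ... share cache lines: spatial locality  */
    return sum;
}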
A cache is made of two memories: a directory memory and a data memory. The directory memory contains the ID/tag of the memory block residing in each cache line and some state bits, including at least a validity bit. The data memory contains the cached copy of the memory block. The mapping between a memory block and a cache line is based on the block address. There are three types of cache mapping: direct mapping, set-associative mapping, and fully associative mapping.
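As a minimal sketch, the contents of one cache line (directory entry plus data) can be modelled in C as follows; the field names and the 64-byte line size are illustrative assumptions, not taken from any particular design.

#define LINE_SIZE 64                     /* bytes of data per cache line (assumed) */

struct cache_line {
    unsigned      valid : 1;             /* state bits: at least a validity bit          */
    unsigned long tag;                   /* directory memory: tag of the resident block  */
    unsigned char data[LINE_SIZE];       /* data memory: cached copy of the memory block */
};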
Direct mapped caches:
In a direct-mapped cache, a given memory block is always mapped to the same cache line by a hash function. The simplest hash function takes the least significant bits of the address: the lowest bits select a byte within the line (the offset), and the next bits, as many as needed to enumerate the cache lines, select the line (the index). The remaining most significant bits are stored as the address tag of that cache line.
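Assuming the LINE_SIZE constant from the sketch above and a cache of NUM_LINES lines (an illustrative value; both powers of two), this hash function amounts to slicing the address into offset, index, and tag fields:

#define NUM_LINES 1024   /* number of cache lines (assumed) */

/* Offset: least significant bits selecting a byte within the line.     */
/* Index:  the next bits, which select the cache line (the hash).       */
/* Tag:    the remaining most significant bits, kept in the directory.  */
static inline unsigned long offset_of(unsigned long addr) { return addr % LINE_SIZE; }
static inline unsigned long index_of(unsigned long addr)  { return (addr / LINE_SIZE) % NUM_LINES; }
static inline unsigned long tag_of(unsigned long addr)    { return addr / ((unsigned long)LINE_SIZE * NUM_LINES); }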
Read accesses proceed as follows:
1. Cache indexing. The index bits select a cache line: its directory entry is fetched while the data memory is accessed at the given offset. Cache indexing proceeds at standard SRAM speed.
2. Tag checking. The tag stored in the cache line is compared with the corresponding most significant bits of the block address. If they match, a validity check is performed on the state bits. If that succeeds, the access is a cache hit and the data is forwarded to the requester; otherwise a cache miss is triggered.
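Combining the line structure and the address-splitting helpers sketched above, a direct-mapped read lookup might look like the following (a sketch only; it returns a single byte and reports hit or miss).

/* Step 1 indexes the line; step 2 checks the tag and the validity bit. */
int cache_read(struct cache_line cache[NUM_LINES], unsigned long addr, unsigned char *out)
{
    struct cache_line *line = &cache[index_of(addr)];    /* 1. cache indexing            */
    if (line->valid && line->tag == tag_of(addr)) {      /* 2. tag check, validity check */
        *out = line->data[offset_of(addr)];              /* hit: forward the data        */
        return 1;
    }
    return 0;                                            /* miss: fetch the block from the next level */
}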
Set-associative caches:
The advantage of a direct-mapped cache is its fast access time on a hit. However, many memory locations are mapped to the same cache line, which can cause high miss rates when a large number of memory blocks compete for that line. To resolve this problem, most caches are set-associative. A set-associative cache groups cache lines into sets; each memory block is mapped to a single set but can reside in any line of that set. In a sense, this is similar to linear probing.
(Figure source: MIT 6.823)
During a read access, all lines of the selected set must now be checked for a matching tag and validity. The rest of the read procedure is the same as in a direct-mapped cache, which can also be viewed as a one-way set-associative cache. The number of cache lines per set is typically between 2 and 8. Too many lines per set require complex decoding and checking logic that increases read latency; a better solution is to increase the number of sets instead.
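A rough software model of the set-associative lookup is shown below (reusing LINE_SIZE from above; NUM_SETS and WAYS are illustrative values). Hardware compares the tags of all ways in parallel; the model does it with a short loop.

#define NUM_SETS 256     /* number of sets (assumed)        */
#define WAYS     4       /* lines per set, typically 2 to 8 */

struct line { unsigned valid : 1; unsigned long tag; unsigned char data[LINE_SIZE]; };
struct set  { struct line way[WAYS]; };

/* Returns the matching line of the selected set, or 0 (null) on a miss. */
struct line *lookup(struct set sets[NUM_SETS], unsigned long addr)
{
    unsigned long set_idx = (addr / LINE_SIZE) % NUM_SETS;
    unsigned long tag     = addr / ((unsigned long)LINE_SIZE * NUM_SETS);
    for (int w = 0; w < WAYS; w++) {                      /* checked in parallel in hardware */
        struct line *l = &sets[set_idx].way[w];
        if (l->valid && l->tag == tag)
            return l;                                     /* hit in way w */
    }
    return 0;                                             /* miss */
}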
A write access in a set-associative cache takes at least two cycles. The first cycle performs the tag and validity check; on a hit, the second cycle writes the data. These two cycles can be pipelined if write accesses occur in bursts.
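Using the lookup sketch above, the two phases of a write can be marked explicitly; the labelled comments correspond to the two cycles (again only a sketch, writing a single byte).

int cache_write(struct set sets[NUM_SETS], unsigned long addr, unsigned char byte)
{
    struct line *hit = lookup(sets, addr);       /* cycle 1: tag and validity check over the set */
    if (!hit)
        return 0;                                /* write miss: handled at the next level        */
    hit->data[addr % LINE_SIZE] = byte;          /* cycle 2: write the data                      */
    return 1;
}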
Fully associative caches:
A fully associative cache is drastically different from a set-associative cache. A memory block can be mapped anywhere in a fully associative cache and, as a result, its tag is the entire block address. A read or write access takes two cycles. In the first cycle, the block address is compared with the tags of all cache lines in parallel. In the second cycle, on a hit, data is read from or written to the matching line. A fully associative cache can be seen as a set-associative cache consisting of a single set.
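In terms of the set-associative sketch above, a fully associative cache is simply the degenerate parameter choice of a single set, for example:

#define NUM_SETS 1       /* fully associative: one set ...                        */
#define WAYS     64      /* ... containing every cache line (assumed 64 in total) */
/* With NUM_SETS == 1 the set index (addr / LINE_SIZE) % NUM_SETS is always 0,
 * and the stored tag addr / (LINE_SIZE * NUM_SETS) is the entire block address. */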