Lect 6 B
Agenda
1 Memory Hierarchy
2 Cache Memory
Mechanics of Technology
Locality of Reference
Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
Cache
Cache (2)
Cache Structure
Cache Design
Addressing
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Cache Addressing
Cache size
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Larger decoding circuits slow down a cache
An algorithm is needed to map main memory addresses to lines in the cache; evaluating the mapping takes more time than a simple direct RAM access
Mapping Functions
Cache Example
Direct Mapping
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
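A worked instance: with m = 2^14 = 16384 lines, main memory block j = 16385 maps to cache line 16385 mod 16384 = 1.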
Address structure: Tag (8 bits) | Line (14 bits) | Word (2 bits)
24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 − 14)
14-bit slot or line number
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
Cache line   Main memory blocks assigned
0            0, m, 2m, 3m, …, 2^s − m
1            1, m+1, 2m+1, …, 2^s − m + 1
…
m−1          m−1, 2m−1, 3m−1, …, 2^s − 1
(here s = 22, the block-identifier width, and m = 2^14 lines)
Direct Mapping pros & cons
Simple
Inexpensive
Fixed location for a given block
If a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high (thrashing; see the sketch below)
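A minimal sketch of direct-mapped lookup, assuming the 8/14/2 split above; the cache_line struct and lookup function are illustrative names, not from any real implementation. The main() demonstrates the thrashing case just noted.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define WORD_BITS 2                      /* 4-byte block           */
#define LINE_BITS 14                     /* 2^14 = 16384 lines     */
#define NUM_LINES (1u << LINE_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;                        /* 8-bit tag */
} cache_line;

static cache_line cache[NUM_LINES];

/* Returns true on a hit for the given 24-bit address. */
bool lookup(uint32_t addr)
{
    uint32_t line = (addr >> WORD_BITS) & (NUM_LINES - 1);  /* bits 2..15  */
    uint32_t tag  = addr >> (WORD_BITS + LINE_BITS);        /* bits 16..23 */

    if (cache[line].valid && cache[line].tag == tag)
        return true;                     /* hit */

    cache[line].valid = true;            /* miss: fetch block, record tag */
    cache[line].tag   = tag;
    return false;
}

int main(void)
{
    /* Blocks 0 and 2^14 both map to line 0 with different tags, so
     * alternating between them misses every time (the thrashing case). */
    printf("%d %d %d\n", lookup(0x000000), lookup(0x010000), lookup(0x000000));
    return 0;
}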
Associative Mapping
Address structure: Tag (22 bits) | Word (2 bits)
A block can load into any line of the cache, so every line's tag must be examined for a match
Set Associative Mapping
Address structure: Tag (9 bits) | Set (13 bits) | Word (2 bits)
A block maps to one set but may occupy any line within that set (see the decode sketch below)
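A minimal sketch of set-associative address decoding under the 9/13/2 split above; decode and its field names are illustrative assumptions. On a lookup only the lines of the selected set are tag-compared, rather than every line as in the fully associative case.

#include <stdint.h>

#define WORD_BITS 2
#define SET_BITS  13
#define NUM_SETS  (1u << SET_BITS)       /* 2^13 sets */

void decode(uint32_t addr, uint32_t *tag, uint32_t *set)
{
    *set = (addr >> WORD_BITS) & (NUM_SETS - 1);  /* bits 2..14          */
    *tag = addr >> (WORD_BITS + SET_BITS);        /* bits 15..23, 9 bits */
}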
Replacement Algorithms
Write Policy
Write through
Write back
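A minimal sketch contrasting the two policies, assuming an illustrative cache_line with a dirty bit; memory_write is a hypothetical stand-in for the backing-store write, not a real API.

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t data;
    bool     dirty;                      /* used by write back only */
} cache_line;

/* Hypothetical stand-in for writing a line back to main memory. */
static void memory_write(uint32_t data) { (void)data; }

/* Write through: every write is propagated to memory immediately, so
 * memory always holds the current value, at the cost of bus traffic
 * on each write. */
void write_through(cache_line *l, uint32_t value)
{
    l->data = value;
    memory_write(l->data);
}

/* Write back: the line is only marked dirty; memory is updated once,
 * when the line is eventually evicted. */
void write_back(cache_line *l, uint32_t value)
{
    l->data  = value;
    l->dirty = true;
}

void evict(cache_line *l)
{
    if (l->dirty) {
        memory_write(l->data);
        l->dirty = false;
    }
}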
Line Size
Multi-Level Caches
One cache for data and instructions (unified) or two, one for
data and one for instructions (split)
Advantages of unified cache
Higher hit rate
Balances load of instruction and data fetch
Only one cache to design & implement
Advantages of split cache
Eliminates cache contention between instruction fetch/decode
unit and execution unit
Important in pipelining
L1 cache
8k bytes
64 byte lines
Four way set associative
L2 cache
Feeding both L1 caches
256k bytes
128 byte lines
8 way set associative
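As a worked check on these parameters: number of sets = capacity / (line size × associativity), so the L1 cache has 8192 / (64 × 4) = 32 sets (a 5-bit set index) and the L2 cache has 262144 / (128 × 8) = 256 sets (an 8-bit set index).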
Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache; in that case, the Prefetcher is stalled while the Execution Unit's data access takes place. The Pentium therefore creates separate data and instruction caches.
Fetch/Decode Unit
Fetches instructions from L2 cache
Decode into micro-ops
Store micro-ops in L1 cache
Out of order execution logic
Schedules micro-ops
Based on data dependence and resources
May speculatively execute
Execution units
Execute micro-ops
Data from L1 cache
Results in registers
Memory subsystem: L2 cache and system bus