CSC 213: Computer Architecture


Lecture 6b: Cache Memory

November 30, 2021


Agenda

1 Memory Hierarchy

2 Cache Memory

3 Cache Design Issues


Importance of Memory System

Every instruction makes at least one memory reference
to fetch the instruction
Typically more memory references are made
to fetch an operand
to store an operand
A program's memory references often determine the ultimate
performance of the program


Processor-DRAM Performance Gap


How do you Bridge the Gap?

Goal: Provide an illusion of a fast, large and cheap memory system.
Method: Memory hierarchy.


Memory Hierarchy Diagram


Mechanics of Technology

The basic mechanics of creating memory directly affect the
first three characteristics of the hierarchy:
Decreasing cost per bit
Increasing capacity
Increasing access time
The fourth characteristic, decreasing frequency of access by the
processor, is met because of a principle known as locality of
reference


Locality of Reference

Due to the nature of programming, instructions and data tend
to cluster together (loops, subroutines, and data structures)
Over a long period of time, clusters will change
Over a short period, clusters will tend to be the same
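
This clustering is easy to see in ordinary code. Below is a minimal C
illustration (array size and contents are arbitrary): the sequential
sweep over data[] exhibits spatial locality, while the repeated
references to sum, i, and the loop instructions themselves exhibit
temporal locality.

#include <stdio.h>

int main(void) {
    int data[1024];
    int sum = 0;

    /* Spatial locality: consecutive elements of data[] fall in the
       same or adjacent memory blocks. */
    for (int i = 0; i < 1024; i++)
        data[i] = i;

    /* Temporal locality: sum, i, and the loop instructions are
       referenced over and over within a short period. */
    for (int i = 0; i < 1024; i++)
        sum += data[i];

    printf("sum = %d\n", sum);
    return 0;
}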


Breaking Memory into Levels

Assume a hypothetical system has two levels of memory
Level 2 contains all instructions and data
Level 1 doesn't have room for everything, so when a new
cluster is required, the cluster it replaces must be sent back to
level 2
These principles can be applied to more than just two levels


Performance of a Simple Two-Level Memory


Memory Hierarchy - Performance Examples

A processor has access to two levels of memory. Level 1 has
an access time of 0.01 µs and level 2 has an access time of
0.1 µs.
If 95% of the memory accesses are found in the faster level,
then the average access time might be:

(0.95)(0.01 µs) + (0.05)(0.01 µs + 0.1 µs)
= 0.0095 + 0.0055 = 0.015 µs
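
The same calculation as a small C sketch (the variable names are
mine; the miss term charges both levels because a missed word is
first brought into level 1 and then read from there):

#include <stdio.h>

int main(void) {
    /* Parameters from the example above; times in microseconds. */
    double t1  = 0.01;   /* level 1 access time                   */
    double t2  = 0.1;    /* level 2 access time                   */
    double hit = 0.95;   /* fraction of accesses found in level 1 */

    double avg = hit * t1 + (1.0 - hit) * (t1 + t2);

    printf("average access time = %.4f us\n", avg);  /* 0.0150 us */
    return 0;
}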


Hierarchy List

Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape


Cache

What is it? A cache is a small amount of fast memory
What makes small fast?
Simpler decoding logic
More expensive SRAM technology
Close proximity to processor – Cache sits between normal main
memory and CPU or it may be located on CPU chip or module


Cache (2)


Cache Structure

Cache includes tags to identify the address of the block of
main memory contained in a line of the cache
Each word in main memory has a unique n-bit address
There are M = 2^n / K blocks of K words in main memory
Cache contains C lines of K words each, plus a tag uniquely
identifying the block of K words


Cache Structure (2)


Cache operation – overview

CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to cache
Then deliver from cache to CPU
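
A minimal C sketch of this flow, using a toy direct-mapped cache
(direct mapping is covered later; the sizes here are illustrative,
not the 64 kByte example used below):

#include <stdio.h>
#include <stdbool.h>

#define NUM_LINES   8   /* cache lines          */
#define BLOCK_WORDS 4   /* words per block/line */

static unsigned memory[1024];   /* toy main memory (word-addressed) */
static unsigned cache[NUM_LINES][BLOCK_WORDS];
static unsigned tags[NUM_LINES];
static bool     valid[NUM_LINES];

unsigned read_word(unsigned addr) {
    unsigned word  = addr % BLOCK_WORDS;      /* offset within block */
    unsigned block = addr / BLOCK_WORDS;      /* memory block number */
    unsigned line  = block % NUM_LINES;       /* direct mapping      */
    unsigned tag   = block / NUM_LINES;

    if (!valid[line] || tags[line] != tag) {  /* miss                */
        for (int w = 0; w < BLOCK_WORDS; w++) /* read block to cache */
            cache[line][w] = memory[block * BLOCK_WORDS + w];
        tags[line]  = tag;
        valid[line] = true;
    }
    return cache[line][word];                 /* deliver from cache  */
}

int main(void) {
    for (unsigned i = 0; i < 1024; i++) memory[i] = i * 10;
    printf("%u\n", read_word(42));  /* miss: block fetched; prints 420 */
    printf("%u\n", read_word(43));  /* hit: same block; prints 430     */
    return 0;
}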


Cache Read Flowchart


Cache Design

Addressing
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches


Cache Addressing

Where does cache sit?
Between processor and virtual memory management unit (MMU)
Between MMU and main memory
Logical cache (virtual cache) stores data using virtual
addresses
Processor accesses cache directly, without going through the MMU
Cache access is faster, since it occurs before MMU address
translation
Virtual addresses use the same address space for different
applications
Must flush cache on each context switch
Physical cache stores data using main memory physical
addresses


Cache size

Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Larger decoding circuits slow down a cache
An algorithm is needed for mapping main memory addresses to
lines in the cache; this takes more time than a direct RAM access


Typical Cache Organization


Mapping Functions

A mapping function is the method used to locate a memory
address within a cache
It is used when copying a block from main memory to the
cache and it is used again when trying to retrieve data from
the cache
There are three kinds of mapping functions
Direct
Associative
Set Associative


Cache Example

These notes use an example of a cache to illustrate each of
the mapping functions.
The characteristics of the cache used are:
Size: 64 kByte
Block size: 4 bytes
i.e. the cache has 16k (2^14) lines of 4 bytes each
Address bus: 24-bit
i.e., 16 Mbytes of main memory divided into 4M four-byte blocks


Direct Mapping

Each block of main memory maps to only one cache line
i.e. if a block is in cache, it will always be found in the same
place
Line number is calculated using the following function

i = j modulo m

where
i = cache line number
j = main memory block number
m = number of lines in the cache


Direct Mapping Address Structure


Each main memory address can be divided into two fields
Least significant w bits identify a unique word within a block
Remaining s bits specify which block in memory. These are
divided into two fields
Least significant r bits of these s bits identify which line in
the cache
Most significant s-r bits uniquely identify the block held in a
line of the cache

Address layout: | Tag (s-r bits) | Line (r bits) | Word (w bits) |


Direct Mapping Address Structure - Example

Address layout: | Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits |

24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
8 bit tag (=22-14)
14 bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
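
A minimal C sketch of this decomposition using shifts and masks;
the address value itself is arbitrary, chosen only for illustration:

#include <stdio.h>

int main(void) {
    unsigned addr = 0x16339C;              /* a 24-bit byte address    */

    unsigned word = addr & 0x3;            /* bits 1..0:   2-bit word  */
    unsigned line = (addr >> 2) & 0x3FFF;  /* bits 15..2:  14-bit line */
    unsigned tag  = (addr >> 16) & 0xFF;   /* bits 23..16: 8-bit tag   */

    printf("tag=0x%02X line=0x%04X word=%u\n", tag, line, word);
    return 0;
}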


Direct Mapping Cache Organization

Cache line | Main memory blocks held
0          | 0, m, 2m, 3m, ..., 2^s - m
1          | 1, m+1, 2m+1, ..., 2^s - m + 1
...        | ...
m-1        | m-1, 2m-1, 3m-1, ..., 2^s - 1


Direct Mapping Cache Line Table


Direct Mapping Summary

Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = m = 2^r
Size of tag = (s - r) bits


Direct Mapping pros & cons

Simple
Inexpensive
Fixed location for given block
If a program repeatedly accesses 2 blocks that map to the same
line, cache misses are very high (thrashing)


Associative Mapping

A main memory block can load into any line of cache
Memory address is interpreted as:
Least significant w bits = word position within block
Most significant s bits = tag used to identify which block is
stored in a particular line of cache
Every line’s tag must be examined for a match
Cache searching gets expensive and slower


Associative Mapping from Cache to Main Memory


Fully Associative Cache Organization


Associative Mapping Address Structure - Example

Address layout: | Tag (22 bits) | Word (2 bits) |

22 bit tag stored with each 32 bit block of data
Compare tag field with tag entry in cache to check for hit
Least significant 2 bits of address identify the required word
within the 32 bit data block

Address | Tag    | Data     | Cache line
FFFFFC  | 3FFFFF | 24682468 | 3FFF
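
A minimal C sketch of the split (the address is the one from the
table row above; the tag is simply the address with the 2 word bits
shifted off, which is why 0xFFFFFC yields tag 0x3FFFFF):

#include <stdio.h>

int main(void) {
    unsigned addr = 0xFFFFFC;     /* 24-bit address from the example */

    unsigned word = addr & 0x3;   /* 2-bit word offset               */
    unsigned tag  = addr >> 2;    /* 22-bit tag: the block number    */

    printf("tag=0x%06X word=%u\n", tag, word);  /* tag=0x3FFFFF */
    return 0;
}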


Associative Mapping Summary

Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = undetermined
Size of tag = s bits


Set Associative Mapping

Cache is divided into a number of sets, v
Each set contains a number of lines, k
A given memory block maps to any line in a given set
e.g. Block B can be in any line of set i
2 lines per set is the most common organization,
called 2-way set associative mapping
A given block can be in one of 2 lines in only one set


Set Associative Mapping (2)

Address length is s + w bits
Cache is divided into a number of sets, v = 2^d
k blocks/lines can be contained within each set
k lines per set is called a k-way set associative mapping
Number of lines in the cache = v * k = k * 2^d
Size of tag = (s - d) bits
Hybrid of Direct and Associative:
k = 1: this is basically direct mapping
v = 1: this is associative mapping


Mapping From Main Memory to Cache: v Associative


Alternative Mapping: k-way Associative


K-Way Set Associative Cache Organization


Set Associative Mapping Example

Using a two-way set associative mapping
Divides the 16K lines into 8K sets
This requires a 13 bit set number
With 2 word bits, this leaves 9 bits for the tag
Block number in main memory is taken modulo 2^13
Blocks beginning with the addresses 000000h, 008000h,
010000h, 018000h, 020000h, 028000h, etc. map to the
same set, Set 0.
Blocks beginning with the addresses 000004h, 008004h,
010004h, 018004h, 020004h, 028004h, etc. map to the
same set, Set 1.


Set Associative Mapping Address Structure - Example

Address layout: | Tag (9 bits) | Set (13 bits) | Word (2 bits) |

Use set field to determine cache set to look in
Compare tag field to see if we have a hit
e.g.,
Address  | Tag | Data     | Set
1FF 7FFC | 1FF | 12345678 | 1FFF
001 7FFC | 001 | 11223344 | 1FFF
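
A minimal C sketch of the split for the two addresses above (written
out in full hex as 0xFFFFFC and 0x00FFFC): both map to set 0x1FFF
but carry different tags, so a two-way set can hold both at once.

#include <stdio.h>

int main(void) {
    unsigned addrs[] = { 0xFFFFFC, 0x00FFFC };

    for (int i = 0; i < 2; i++) {
        unsigned a    = addrs[i];
        unsigned word = a & 0x3;            /* bits 1..0:   2-bit word */
        unsigned set  = (a >> 2) & 0x1FFF;  /* bits 14..2:  13-bit set */
        unsigned tag  = (a >> 15) & 0x1FF;  /* bits 23..15: 9-bit tag  */
        printf("addr=0x%06X tag=0x%03X set=0x%04X word=%u\n",
               a, tag, set, word);
    }
    return 0;
}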


Set Associative Mapping Summary

Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in set = k
Number of sets = v = 2^d
Number of lines in cache = kv = k * 2^d
Size of tag = (s - d) bits


Replacement Algorithms

There must be a method for selecting which line in the cache
is going to be replaced when there's no room for a new line
Direct mapping
There is no need for a replacement algorithm with direct
mapping
Each block only maps to one line
Replace that line


Replacement Algorithms (2)


Associative & Set Associative
Hardware implemented algorithm (for speed)
Least recently used (LRU)
Replace the block that hasn't been touched in the longest
period of time
Two-way set associative simply uses a USE bit: when one block
is referenced, its USE bit is set while its partner in the set
is cleared (see the sketch after this list)
First in first out (FIFO)
Replace the block that has been in the cache longest
Least frequently used (LFU)
Replace the block which has had the fewest hits
Random
Only slightly lower performance than the use-based algorithms
LRU, FIFO, and LFU
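
A minimal C sketch of the USE-bit scheme for one two-way set (the
names touch and victim are mine):

#include <stdio.h>
#include <stdbool.h>

/* One USE bit per line is enough for a two-way set. */
typedef struct { bool use[2]; } set2_t;

/* Referencing a line sets its USE bit and clears its partner's. */
static void touch(set2_t *s, int line) {
    s->use[line]     = true;
    s->use[1 - line] = false;
}

/* The replacement victim is the line whose USE bit is clear. */
static int victim(const set2_t *s) {
    return s->use[0] ? 1 : 0;
}

int main(void) {
    set2_t s = { { false, false } };
    touch(&s, 0);                              /* line 0 referenced */
    printf("replace line %d\n", victim(&s));   /* prints 1          */
    return 0;
}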


Write Policy

Must not overwrite a cache block unless main memory is up
to date
Two main problems:
If cache is written to, main memory is invalid; if main
memory is written to, cache is invalid
Can occur if I/O can address main memory directly
Multiple CPUs may have individual caches; once one cache is
written to, the copies in the other caches become invalid


Write through

All writes go to main memory as well as cache
Multiple CPUs can monitor main memory traffic to keep local
(to CPU) cache up to date
Lots of traffic
Slows down writes


Write back

Updates initially made in cache only
Update bit for cache slot is set when update occurs
If block is to be replaced, write to main memory only if
update bit is set
Other caches get out of sync
I/O must access main memory through cache
Research shows that 15% of memory references are writes
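
The update-bit mechanics can be sketched in a few lines of C (the
type and function names are mine, and write_block_to_memory is a
stand-in for the real memory write):

#include <stdio.h>
#include <stdbool.h>

typedef struct {
    unsigned tag;
    unsigned data[4];
    bool     valid;
    bool     dirty;   /* the "update bit" described above */
} line_t;

/* Stand-in for the actual write of a block to main memory. */
static void write_block_to_memory(const line_t *l) {
    printf("flushing dirty block, tag=0x%X\n", l->tag);
}

/* A write touches only the cache and sets the update bit. */
static void write_word(line_t *l, int word, unsigned value) {
    l->data[word] = value;
    l->dirty = true;          /* main memory is now out of date */
}

/* Main memory is written only when a dirty line is replaced. */
static void evict(line_t *l) {
    if (l->valid && l->dirty)
        write_block_to_memory(l);
    l->valid = false;
    l->dirty = false;
}

int main(void) {
    line_t l = { .tag = 0xAB, .valid = true, .dirty = false };
    write_word(&l, 0, 42);
    evict(&l);    /* flushes, because the update bit was set */
    return 0;
}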


Multiple Processors/Multiple Caches

Even if a write through policy is used, other processors may
have invalid data in their caches
In other words, if a processor updates its cache and updates
main memory, a second processor may have been using the
same data in its own cache, which is now invalid.


Solutions to Prevent Problems with Multiprocessor/Cache Systems

Bus watching with write through
Each cache controller watches the bus to see if data it contains
is being written to main memory by another processor. All
processors must be using the write through policy
Hardware transparency
a “big brother” watches all caches, and upon seeing an update
to any processor’s cache, it updates main memory AND all of
the caches
Noncacheable memory
Any shared memory (identified with a chip select) may not be
cached.


Line Size

There is a relationship between line size (i.e., the number of
words in a line in the cache) and hit ratios
As the line size (block size) goes up, the hit ratio can go up,
since more words are available to exploit the principle of
locality of reference
As block size increases, however, the number of blocks that fit
in the cache goes down, and the hit ratio will begin to go back
down after a while
Lastly, as the block size increases, the chance of a hit to a
word farther from the initially referenced word goes down
No definitive optimum value has been found


Multi-Level Caches

Increases in transistor densities have allowed caches to be
placed inside the processor chip
Internal caches have very short wires (within the chip itself)
and are therefore quite fast, even faster than any zero
wait-state memory access outside of the chip
This means that a super fast internal cache (level 1) can sit
inside the chip, while an external cache (level 2) still provides
faster access than main memory


Unified versus Split Caches

One cache for data and instructions (unified) or two, one for
data and one for instructions (split)
Advantages of unified cache
Higher hit rate
Balances load of instruction and data fetch
Only one cache to design & implement
Advantages of split cache
Eliminates cache contention between instruction fetch/decode
unit and execution unit
Important in pipelining


Intel x86 caches

80386 – no on-chip cache
80486 – 8 kByte on-chip cache using 16 byte lines and four-way
set associative organization (main memory had 32 address lines – 4 GByte)
Pentium (all versions)
Two on chip L1 caches
Data & instructions
Pentium III – L3 cache added off chip


Pentium 4 L1 and L2 Caches

L1 cache
8k bytes
64 byte lines
Four way set associative
L2 cache
Feeding both L1 caches
256k
128 byte lines
8 way set associative


Intel Cache Evolution

Problem: External memory slower than the system bus.
Solution: Add external cache using faster memory technology. (386)

Problem: Increased processor speed results in the external bus
becoming a bottleneck for cache access.
Solution: Move the external cache on-chip, operating at the same
speed as the processor. (486)

Problem: Internal cache is rather small, due to limited space on chip.
Solution: Add external L2 cache using faster technology than main
memory. (486)

Problem: Contention occurs when both the Instruction Prefetcher and
the Execution Unit simultaneously require access to the cache. In
that case, the Prefetcher is stalled while the Execution Unit's data
access takes place.
Solution: Create separate data and instruction caches. (Pentium)

Problem: Increased processor speed results in the external bus
becoming a bottleneck for L2 cache access.
Solutions: Create a separate back-side bus that runs at a higher
speed than the main (front-side) external bus; the BSB is dedicated
to the L2 cache. (Pentium Pro) Move the L2 cache onto the processor
chip. (Pentium II)

Problem: Some applications deal with massive databases and must have
rapid access to large amounts of data. The on-chip caches are too small.
Solutions: Add external L3 cache. (Pentium III) Move the L3 cache
on-chip. (Pentium 4)


Pentium 4 Block Diagram


Pentium 4 Operation – Core Processor

Fetch/Decode Unit
Fetches instructions from L2 cache
Decode into micro-ops
Store micro-ops in L1 cache
Out of order execution logic
Schedules micro-ops
Based on data dependence and resources
May speculatively execute
Execution units
Execute micro-ops
Data from L1 cache
Results in registers
Memory subsystem – L2 cache and system bus


Pentium 4 Design Reasoning

Decodes instructions into RISC-like micro-ops before L1 cache
Micro-ops fixed length – superscalar pipelining and scheduling
Pentium instructions long & complex
Performance improved by separating decoding from scheduling
& pipelining
Data cache is write back – Can be configured to write through
L1 cache controlled by 2 bits in register
CD = cache disable
NW = not write through
2 instructions to invalidate (flush) cache and write back then
invalidate
L2 and L3 are 8-way set-associative – Line size 128 bytes
