DRAM Tutorial
DRAM Module and Chip
Goals
DRAM Chip
Sense Amplifier
Sense Amplifier – Two Stable States
Sense Amplifier Operation
DRAM Cell – Capacitor
Capacitor to Sense Amplifier
DRAM Cell Operation
DRAM Subarray – Building Block for DRAM Chip
DRAM Bank
DRAM Chip
DRAM Operation
RowClone
Memory Channel – Bottleneck
Goal: Reduce Memory Bandwidth Demand
Bulk Data Copy and Initialization
Bulk Copy and Initialization – Applications
Shortcomings of Existing Approach
Our Approach: In-DRAM Copy with Low Cost
RowClone: In-DRAM Copy
Two Key Observations
Bulk Copy in DRAM – RowClone
Fast Parallel Mode – Benefits
Fast Parallel Mode – Constraints
End-to-end System Design
Applications Summary
Results Summary

1. DRAM Tutorial

18-447 Lecture
Vivek Seshadri

2. DRAM Module and Chip

Vivek Seshadri – Thesis Proposal

3. Goals


Cost
Latency
Bandwidth
Parallelism
Power
Energy

4. DRAM Chip

[Figure: DRAM chip organization. Components shown: row decoders, cell arrays, arrays of sense amplifiers, and bank I/O.]

5. Sense Amplifier

[Figure: sense amplifier circuit. Labels: top, bottom, enable, inverter.]

6. Sense Amplifier – Two Stable States

[Figure: the two stable states of the sense amplifier. Logical “1”: one terminal at VDD (1), the other at 0. Logical “0”: the voltages are swapped.]

7. Sense Amplifier Operation

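[Figure: sense amplifier operation. The top terminal starts at VT and the bottom terminal at VB, with VT > VB. Once enabled, the amplifier drives the top terminal to VDD (1) and the bottom terminal to 0.]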
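The amplification step on this slide can be sketched as a toy model. This is only an illustration, assuming an idealized amplifier that, once enabled, drives the higher terminal to VDD and the lower one to 0. The function name `sense_amplify` and the supply voltage of 1.2 V are assumptions made for the example, not values from the lecture.

```python
VDD = 1.2  # assumed supply voltage in volts (illustrative value)


def sense_amplify(v_top: float, v_bottom: float, enable: bool = True):
    """Idealized sense amplifier: return the (top, bottom) voltages.

    When disabled, the terminals are left unchanged. When enabled, the
    terminal with the higher voltage is driven to VDD and the other to 0.
    """
    if not enable:
        return v_top, v_bottom
    if v_top == v_bottom:
        # A real amplifier would settle into one state because of noise;
        # this toy model refuses to guess.
        raise ValueError("no voltage difference to amplify")
    return (VDD, 0.0) if v_top > v_bottom else (0.0, VDD)


# VT > VB, so the top terminal ends at VDD and the bottom at 0.
print(sense_amplify(0.7, 0.5))
```

The model ignores timing and analog behavior; it only captures that a small difference between the terminals is amplified into one of the two stable states.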

8. DRAM Cell – Capacitor

Empty State: Logical “0”
Fully Charged State: Logical “1”
(1) Small – Cannot drive circuits
(2) Reading destroys the state

9. Capacitor to Sense Amplifier

[Figure: DRAM cell capacitor connected to a sense amplifier. Terminal labels: VDD (1) and 0.]

10. DRAM Cell Operation

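[Figure: DRAM cell operation. Both sense amplifier terminals start at ½VDD. Connecting a charged cell raises one terminal to ½VDD + δ. The enabled sense amplifier then drives that terminal to VDD (1) and the other terminal to 0.]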
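The small perturbation δ on this slide comes from charge sharing between the cell capacitor and the much larger bitline capacitance. The sketch below computes δ from charge conservation. The capacitance values (24 fF for the cell, 144 fF for the bitline) and the 1.2 V supply are assumptions chosen for illustration, not figures from the lecture.

```python
def bitline_voltage(v_cell, c_cell, c_bitline, vdd):
    """Bitline voltage after the cell is connected to it.

    The bitline starts precharged to vdd / 2. Total charge is conserved
    when the two capacitors are connected.
    """
    v_precharge = vdd / 2
    charge = c_cell * v_cell + c_bitline * v_precharge
    return charge / (c_cell + c_bitline)


VDD = 1.2          # volts (assumed)
C_CELL = 24.0      # femtofarads (assumed)
C_BITLINE = 144.0  # femtofarads (assumed)

delta_charged = bitline_voltage(VDD, C_CELL, C_BITLINE, VDD) - VDD / 2
delta_empty = bitline_voltage(0.0, C_CELL, C_BITLINE, VDD) - VDD / 2

print(f"charged cell: delta = {delta_charged:+.4f} V")
print(f"empty cell:   delta = {delta_empty:+.4f} V")
```

With these assumed values, δ is a small fraction of VDD, positive for a charged cell and negative for an empty one, which is why the sense amplifier is needed to recover a full logic level.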

11. DRAM Subarray – Building Block for DRAM Chip

[Figure: DRAM subarray. A row decoder and cell arrays attached to an array of sense amplifiers (the row buffer, 8Kb).]

12. DRAM Bank

[Figure: DRAM bank. Components shown: row decoders, cell arrays, arrays of sense amplifiers (8Kb), and bank I/O (64b). Signals: address (to the row decoders), address (to the bank I/O), and data.]

13. DRAM Chip

[Figure: DRAM chip. Eight banks, each with row decoders, cell arrays, arrays of sense amplifiers, and bank I/O, connect through a shared internal bus to the memory channel (8 bits).]

14. DRAM Operation

(1) ACTIVATE Row
(2) READ/WRITE Column
(3) PRECHARGE
[Figure: DRAM bank with row decoders, cell arrays, an array of sense amplifiers, and bank I/O. Signals: row address, column address, data.]
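The three commands on this slide can be sketched as a toy bank model. This is a simplified illustration, not a timing-accurate simulator: the class name `Bank`, the one-byte column width, and the error handling are all assumptions made for the example.

```python
class Bank:
    """Toy DRAM bank: a list of rows plus one row buffer (sense amplifiers)."""

    def __init__(self, num_rows: int, row_size: int):
        self.rows = [bytearray(row_size) for _ in range(num_rows)]
        self.open_row = None    # index of the activated row, if any
        self.row_buffer = None  # data latched in the sense amplifiers

    def activate(self, row: int) -> None:
        """ACTIVATE: latch the whole row into the sense amplifiers."""
        if self.open_row is not None:
            raise RuntimeError("bank must be precharged before ACTIVATE")
        self.open_row = row
        self.row_buffer = bytearray(self.rows[row])

    def read(self, column: int) -> int:
        """READ: return one column from the row buffer."""
        self._require_open_row()
        return self.row_buffer[column]

    def write(self, column: int, value: int) -> None:
        """WRITE: update the row buffer; the amplifiers also drive the cells."""
        self._require_open_row()
        self.row_buffer[column] = value
        self.rows[self.open_row][column] = value

    def precharge(self) -> None:
        """PRECHARGE: close the row so another row can be activated."""
        self.open_row = None
        self.row_buffer = None

    def _require_open_row(self) -> None:
        if self.open_row is None:
            raise RuntimeError("no row is activated")


bank = Bank(num_rows=4, row_size=8)
bank.activate(2)
bank.write(3, 0xAB)
bank.precharge()
bank.activate(2)
print(hex(bank.read(3)))
```

The model makes the ordering constraint explicit: a column access is only valid between an ACTIVATE and the following PRECHARGE.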

15. RowClone

Fast and Energy-Efficient In-DRAM
Bulk Data Copy and Initialization
Vivek Seshadri
Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun,
G. Pekhimenko, Y. Luo, O. Mutlu,
P. B. Gibbons, M. A. Kozuch, T. C. Mowry

16. Memory Channel – Bottleneck

[Figure: cores and a cache connect through the memory controller (MC) and the memory channel to memory.]
• Limited Bandwidth
• High Energy

17. Goal: Reduce Memory Bandwidth Demand

[Figure: cores, cache, memory controller (MC), memory channel, and memory.]
• Reduce unnecessary data movement

18. Bulk Data Copy and Initialization

Bulk Data Copy: src → dst
Bulk Data Initialization: val → dst
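The two operations on this slide can be stated in software terms as follows. This is a minimal sketch, assuming memory is modeled as a flat `bytearray` and that one operation covers a 4KB page; the function names `bulk_copy` and `bulk_init` are hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed granularity of one bulk operation


def _check_range(memory: bytearray, start: int, size: int) -> None:
    if start < 0 or size < 0 or start + size > len(memory):
        raise ValueError("range falls outside memory")


def bulk_copy(memory: bytearray, src: int, dst: int, size: int = PAGE_SIZE) -> None:
    """Copy `size` bytes from address `src` to address `dst`."""
    _check_range(memory, src, size)
    _check_range(memory, dst, size)
    memory[dst:dst + size] = memory[src:src + size]


def bulk_init(memory: bytearray, dst: int, val: int, size: int = PAGE_SIZE) -> None:
    """Set `size` bytes starting at address `dst` to the byte value `val`."""
    _check_range(memory, dst, size)
    memory[dst:dst + size] = bytes([val]) * size


memory = bytearray(3 * PAGE_SIZE)
memory[0:4] = b"data"
bulk_copy(memory, src=0, dst=PAGE_SIZE)
bulk_init(memory, dst=2 * PAGE_SIZE, val=0xFF)
```

In a conventional system, both functions correspond to data passing through the processor, which is the cost the rest of the deck sets out to avoid.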

20. Bulk Copy and Initialization – Applications

• Forking
• Zero initialization (e.g., security)
• Checkpointing
• VM Cloning
• Deduplication
• Page Migration
• Many more

21. Shortcomings of Existing Approach

[Figure: data path for copying src to dst: memory, channel, memory controller (MC), cache, cores.]
• High latency (1046ns to copy 4KB)
• High Energy (3600nJ to copy 4KB)
• Interference

22. Our Approach: In-DRAM Copy with Low Cost

[Figure: the same system, with src copied to dst inside memory (marked “?”). Crossed out (X): High Energy, High latency, Interference.]

23. RowClone: In-DRAM Copy


24. Two Key Observations

(1) Any operation on one sense amplifier can be easily performed in bulk
(2) Many DRAM cells share the same sense amplifier
[Figure: cell array with its row decoder.]

25. Bulk Copy in DRAM – RowClone

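[Figure: bulk copy in DRAM. The source cell perturbs its sense amplifier terminal from ½VDD to ½VDD + δ, and the amplifier drives the terminals to VDD (1) and 0. Connecting the destination cell to the same, still-enabled sense amplifier overwrites it: data gets copied.]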
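The copy mechanism on this slide can be sketched with a toy subarray model. This is an illustration under stated assumptions, not the chip's actual command protocol: it assumes the copy is triggered by activating the destination row while the sense amplifiers still hold the source row's data. The names `Subarray` and `rowclone_copy` are hypothetical.

```python
class Subarray:
    """Toy subarray: rows of cells that share one row of sense amplifiers."""

    def __init__(self, num_rows: int, row_size: int):
        self.rows = [bytearray(row_size) for _ in range(num_rows)]
        self.sense_amps = None  # None models the precharged state

    def activate(self, row: int) -> None:
        if self.sense_amps is None:
            # Precharged amplifiers sense and latch the row's data.
            self.sense_amps = bytearray(self.rows[row])
        else:
            # The amplifiers already drive full voltages, so they
            # overwrite the cells of the newly connected row.
            self.rows[row][:] = self.sense_amps

    def precharge(self) -> None:
        self.sense_amps = None


def rowclone_copy(subarray: Subarray, src_row: int, dst_row: int) -> None:
    """Copy a whole row inside the subarray without using the channel."""
    subarray.activate(src_row)  # source data lands in the sense amplifiers
    subarray.activate(dst_row)  # amplifiers overwrite the destination row
    subarray.precharge()


subarray = Subarray(num_rows=4, row_size=8)
subarray.rows[0][:] = b"rowclone"
rowclone_copy(subarray, src_row=0, dst_row=3)
print(subarray.rows[3])
```

Note that the whole row moves in one step and no data leaves the subarray, which matches the two observations about shared sense amplifiers and bulk operation.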

26. Fast Parallel Mode – Benefits

Bulk Data Copy (4KB across a module)
• Latency: 11X (1046ns to 90ns)
• Energy: 74X (3600nJ to 40nJ)
No bandwidth consumption
Very few changes to the DRAM chip
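To get a feel for these per-copy figures, the sketch below scales them to a larger bulk operation. It is a back-of-the-envelope estimate that assumes copies are performed one after another, that cost grows linearly with the number of 4KB copies, and that every copy is eligible for Fast Parallel Mode. The function name `bulk_copy_cost` is hypothetical.

```python
# Per-4KB-copy figures taken from the slide.
BASELINE = {"latency_ns": 1046, "energy_nj": 3600}
FAST_PARALLEL = {"latency_ns": 90, "energy_nj": 40}


def bulk_copy_cost(num_copies: int, per_copy: dict) -> dict:
    """Total latency and energy for `num_copies` serialized 4KB copies."""
    return {name: value * num_copies for name, value in per_copy.items()}


# Example: copying 1000 pages of 4KB each (about 4MB).
for label, per_copy in (("baseline", BASELINE), ("fast parallel", FAST_PARALLEL)):
    total = bulk_copy_cost(1000, per_copy)
    print(f"{label}: {total['latency_ns'] / 1000:.0f} us, "
          f"{total['energy_nj'] / 1000:.0f} uJ")
```

Real systems overlap operations across banks and channels, so the linear scaling here is only a rough guide.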

27. Fast Parallel Mode – Constraints

• Location constraint
– Source and destination in same subarray
• Size constraint
– Entire row gets copied (no partial copy)
(1) Can still accelerate many existing primitives (copy-on-write, bulk zeroing)
(2) Alternate mechanism to copy data across banks (pipelined serial mode – lower benefits than Fast Parallel)
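The two constraints above can be expressed as a simple eligibility check. This is a sketch under loudly stated assumptions: the row size, the number of rows per subarray, and the linear address-to-subarray mapping are all hypothetical (real memory controllers interleave addresses across channels and banks), and `can_use_fast_parallel` is an illustrative name.

```python
ROW_SIZE = 8192          # bytes per DRAM row (assumed)
ROWS_PER_SUBARRAY = 512  # rows in one subarray (assumed)


def subarray_of(addr: int) -> int:
    """Subarray index under a simple linear mapping (assumed)."""
    return addr // (ROW_SIZE * ROWS_PER_SUBARRAY)


def can_use_fast_parallel(src: int, dst: int, size: int) -> bool:
    """Check the size and location constraints for a copy request."""
    # Size constraint: only whole, row-aligned rows can be copied.
    if size <= 0 or size % ROW_SIZE != 0:
        return False
    if src % ROW_SIZE != 0 or dst % ROW_SIZE != 0:
        return False
    # Location constraint: each source row and its destination row
    # must sit in the same subarray.
    for offset in range(0, size, ROW_SIZE):
        if subarray_of(src + offset) != subarray_of(dst + offset):
            return False
    return True


print(can_use_fast_parallel(0, ROW_SIZE, ROW_SIZE))
print(can_use_fast_parallel(0, ROW_SIZE * ROWS_PER_SUBARRAY, ROW_SIZE))
```

A request that fails the check would fall back to another mechanism, such as the pipelined serial mode named on the slide or a conventional copy through the processor.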

28. End-to-end System Design

• Software interface
– memcpy and meminit instructions
• Managing cache coherence
– Use existing DMA support!
• Maximizing use of Fast Parallel Mode
– Smart OS page allocation
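One way to picture "smart OS page allocation" is an allocator that tries to place a destination page in the same subarray as its source, so that a later copy meets the location constraint of Fast Parallel Mode. The sketch below is purely hypothetical: the class `SubarrayAwareAllocator`, the tiny `PAGES_PER_SUBARRAY` value, and the linear page-to-subarray mapping are assumptions for illustration, not the design described in the lecture.

```python
from collections import defaultdict

PAGES_PER_SUBARRAY = 4  # assumed; kept tiny so the example is easy to follow


class SubarrayAwareAllocator:
    """Toy page allocator that keeps one free list per subarray."""

    def __init__(self, num_pages: int):
        self.free = defaultdict(list)
        for page in range(num_pages):
            self.free[page // PAGES_PER_SUBARRAY].append(page)

    def alloc(self, near=None) -> int:
        """Allocate a free page, preferring the subarray of page `near`."""
        if near is not None:
            preferred = self.free[near // PAGES_PER_SUBARRAY]
            if preferred:
                return preferred.pop(0)
        # Fall back to any subarray that still has a free page.
        for pages in self.free.values():
            if pages:
                return pages.pop(0)
        raise MemoryError("out of pages")


allocator = SubarrayAwareAllocator(num_pages=8)
src_page = allocator.alloc()
dst_page = allocator.alloc(near=src_page)
print(src_page, dst_page)
```

When the preferred subarray is full, the allocator still succeeds but the resulting copy would need a slower mechanism.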

29. Applications Summary

[Figure: fraction of memory traffic (0 to 1) due to Zero, Copy, Write, and Read for six workloads: bootup, compile, forkbench, mcached, mysql, shell.]

30. Results Summary

[Figure: IPC improvement and memory energy reduction compared to baseline (axis from 0% to 70%) for six workloads: bootup, compile, forkbench, mcached, mysql, shell.]