324.70K
Категория: ПрограммированиеПрограммирование

RISC-V Processor with “F” Extension

1.

RISC-V Processor with “F”
Extension
Kathy Camenzind and Miguel Gomez

2.

Outline
● Background
● Implementation Strategy



Combinational
Multicycle
4-stage pipeline
● Microarchitecture
● Testing


Connectal
Compliance tests
● Results
● Notable bugs & lessons learned

3.

Background
● RISC-V “F” extension


Single-precision floating point numbers
Includes encodings for +- Infinity, quiet/signaling NaN
● Instructions include:







FLW/FSW
FMV (between integer and FP registers)
FDIV/ADD/MUL/SQRT/MADD/…
FMIN/MAX, FEQ/FLT/FLE
FSGNJ (sign injection)
FCVT
FCLASS

4.

Overview of Required Modifications
● Based on Lab 5 Processor
● CSR instructions


Basic CSR reads and swaps
FCSR (rounding mode)
● Extending the Decode and Execute stages to handle new instructions
● Integrating Bluespec’s built in FloatingPoint library
● Adding tests that use floating point instructions

5.

Implementation Strategy
● Combinational


Executes floating-point operations in single cycle
Very long critical path
● Multicycle


Executes floating-point operations in multiple cycles
One instruction passes through processor at a time
● 4-Stage Pipeline with Bypassing


Executes floating-point operations in multiple cycles
Pipelining based on ThreeStageBypass from lab 5, with added writeback stage
● 4-Stage Superscalar Pipeline with Bypassing


Can execute multiple instructions at a time in different functional units
Preserves commit order within integer operations and floating point operations

6.

4-Stage Pipeline: Microarchitecture

7.

Superscalar Execute: Microarchitecture
Control Hazards:

Branch instructions resolve
in one cycle in Execute
Data Hazards

RAW: Still handled by
Scoreboard

WAW: Completion buffer
reorders same-RF
writebacks

8.

Synthesis Results
Combinational
Multicycle
FourStageBypass
Superscalar
Area (μm2)
~190,000
~187,000
~368,000
~397,000
Critical Path (ps)
~10,200
1,147
2,018
1,656

9.

Performance Results
Multicycle
(Floating Point)
Four Stage
Bypass (Floating
Point)
Four Stage
Superscalar
Bypass (Floating
Point)
towers
.1033
.1069
0.1069
median
.2768
.2872
0.2873
Multiply
.4763
.3838
0.3838
Qsort
.3817
n/a
Vvadd
.2229
.2241
n/a
0.2242

10.

Running on FPGA

11.

Notable Bugs
● Functionally incorrect functions in FloatingPoint.bsv
● Compliance tests that were non-compliant :)

Mostly centered around NaN operations
● Debugging



Implemented showInst for new instructions
Printed which stages were executing which instructions
Printing the current state of the completion buffer

12.

Conclusion / Things Learned
● Efficient pipelining is difficult
● Fixing edge cases should be a second priority, completing the bigger picture
is first
● Simulations can sometimes be unreliable
● Libraries can be incorrect
English     Русский Правила