# Using a Performance Model to Implement a Superscalar CVA6

**Côme Allart**<sup>1,2,\*</sup>, Jean-Roch Coulon<sup>2</sup>, André Sintzoff<sup>2</sup>, Olivier Potin<sup>1</sup> and Jean-Baptiste Rigaud<sup>1</sup>

<sup>1</sup>Mines Saint-Étienne, CEA, Leti, Centre CMP, F-13541, Gardanne, France

\*come.allart@thalesgroup.com

#### CONTEXT

- CVA6: a 32- or 64-bit RISC-V application processor
- Highly-configurable, 6-stage pipeline
- In-order issue, out-of-order execution
- Performance is 3.10 CoreMark/MHz in the single-issue configuration

How to improve performance further?



## 1. CYCLE-BASED MODEL

- **Goal:** Easily evaluate architecture improvements
- **Input:** RVFI trace (committed instructions) from CVA6
- **Output:** Cycle-annotated RVFI trace

#### Accuracy check:

- Use the 2<sup>nd</sup> iteration of CoreMark
- Measure each instruction's duration $\Delta t_i = t_i - t_{i-1}$
- Count correct results $\#\{i \mid \Delta t_i^{\mathsf{Model}} = \Delta t_i^{\mathsf{RTL}}\}$
- Divide by the number of instructions $\#\{i\}$

$$\mathsf{Accuracy} = \frac{\#\{i \mid \Delta t_i^{\mathsf{Model}} = \Delta t_i^{\mathsf{RTL}}\}}{\#\{i\}} = \mathbf{99.2\%}$$
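The accuracy metric above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: it assumes $t_i$ is the cycle at which instruction $i$ commits, and the function names and the sample cycle values are hypothetical.

```python
def durations(commit_cycles):
    """Per-instruction durations: delta_t[i] = t[i] - t[i-1] for i >= 1."""
    return [t - prev for prev, t in zip(commit_cycles, commit_cycles[1:])]


def accuracy(model_cycles, rtl_cycles):
    """Fraction of instructions whose duration is identical in model and RTL."""
    if len(model_cycles) != len(rtl_cycles):
        raise ValueError("both traces must hold the same committed instructions")
    model_dt = durations(model_cycles)
    rtl_dt = durations(rtl_cycles)
    if not rtl_dt:
        raise ValueError("at least two instructions are needed")
    matches = sum(m == r for m, r in zip(model_dt, rtl_dt))
    return matches / len(rtl_dt)


# Hypothetical commit cycles for five instructions (not measured data).
rtl = [10, 11, 12, 15, 16]
model = [10, 11, 12, 14, 16]
print(accuracy(model, rtl))
```

With this definition, a single mistimed commit changes two consecutive durations, so the sample traces score 2 matches out of 4.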

The model processes each instruction in three steps:

- **Issue:** Check for interactions between instructions
- **Execute:** Mark as done after a delay that depends on the instruction
- **Commit:** Check for the done mark
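The Issue, Execute and Commit steps can be illustrated with a toy cycle-based loop. The sketch below is not the CVA6 model itself. The `Instr` fields, the latencies, the register-based hazard check (RAW and WAW, no forwarding detail) and the `issue_ports` / `commit_ports` parameters are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    dst: Optional[int]              # destination register, None if no write
    srcs: Tuple[int, ...] = ()      # source registers
    latency: int = 1                # execute delay in cycles (assumed value)
    done_cycle: Optional[int] = None
    commit_cycle: Optional[int] = None


def simulate(trace, issue_ports=1, commit_ports=1):
    """Annotate each instruction of `trace` with the cycle at which it commits."""
    in_flight = []                  # issued but not committed, program order
    next_idx = 0
    cycle = 0
    while next_idx < len(trace) or in_flight:
        # Commit: retire in order, only instructions already marked done.
        for _ in range(commit_ports):
            if not in_flight or in_flight[0].done_cycle > cycle:
                break
            in_flight.pop(0).commit_cycle = cycle
        # Issue: in order; stall while a needed register is still being produced.
        for _ in range(issue_ports):
            if next_idx == len(trace):
                break
            instr = trace[next_idx]
            busy = {i.dst for i in in_flight
                    if i.dst is not None and i.done_cycle > cycle}
            if busy.intersection(instr.srcs) or instr.dst in busy:
                break               # RAW or WAW interaction: retry next cycle
            # Execute: mark as done after the instruction's own delay.
            instr.done_cycle = cycle + instr.latency
            in_flight.append(instr)
            next_idx += 1
        cycle += 1
    return [i.commit_cycle for i in trace]


def example_trace():
    # Hypothetical three-instruction trace: a load followed by a dependent add.
    return [
        Instr("lw", dst=1, srcs=(2,), latency=3),
        Instr("add", dst=3, srcs=(1,)),
        Instr("addi", dst=4, srcs=(2,)),
    ]


print(simulate(example_trace()))
print(simulate(example_trace(), issue_ports=2, commit_ports=2))
```

In this toy trace the dependent `add` waits for the load in both configurations. The second port only helps the independent `addi`, which then commits in the same cycle as `add`.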



## 2. PREDICTING PERFORMANCE

- a. Add the feature to the Python Model class; the data path is ignored here
- b. For a given benchmark, the model predicts the performance gain over the whole pipeline, even though the modification in (a.) was local
- c. Implement, rework or discard the feature
- **Superscalar:** Choose the number of issue and commit ports



- **Dual-issue:** Final prediction for performance gain
  - With renaming: +47% speed
  - Without renaming: +42% speed
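A predicted gain of this kind can be derived from the cycle counts a model reports for the same benchmark under different configurations. The sketch below assumes equal clock frequency, so fewer cycles means proportionally higher speed. The function names, the `(issue ports, commit ports)` keys and the cycle counts are hypothetical and do not come from the CVA6 model.

```python
def performance_gain(baseline_cycles, modified_cycles):
    """Relative speed gain at equal frequency: baseline / modified - 1."""
    if baseline_cycles <= 0 or modified_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    return baseline_cycles / modified_cycles - 1.0


def rank_configurations(baseline_cycles, cycles_by_config):
    """Return (configuration, gain) pairs sorted by predicted gain, best first."""
    gains = {cfg: performance_gain(baseline_cycles, cycles)
             for cfg, cycles in cycles_by_config.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical cycle counts keyed by (issue ports, commit ports); not measured data.
candidates = {(1, 2): 1000, (2, 1): 900, (2, 2): 800}
for config, gain in rank_configurations(1000, candidates):
    print(config, f"{gain:+.1%}")
```

Ranking the candidates this way makes it easy to see whether an extra port is worth its cost before any RTL is written.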

## 3. IMPLEMENTING SUPERSCALAR CVA6



Debug the performance by comparing with the model:

- (α) **Global performance:** Performance gain
- (β) **Local performance:** Instruction durations
- (γ) **Internal:** Pipeline state over time
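The local comparison (β) amounts to listing the instructions whose duration differs between the model and the RTL, which points at where a performance bug first shows up. The following is a minimal sketch under the assumption that both traces list the same committed instructions with their commit cycles. The function name and the sample data are hypothetical.

```python
def duration_mismatches(names, model_cycles, rtl_cycles):
    """List (index, name, model duration, RTL duration) where durations differ."""
    if not (len(names) == len(model_cycles) == len(rtl_cycles)):
        raise ValueError("traces must describe the same instructions")
    mismatches = []
    for i in range(1, len(names)):
        model_dt = model_cycles[i] - model_cycles[i - 1]
        rtl_dt = rtl_cycles[i] - rtl_cycles[i - 1]
        if model_dt != rtl_dt:
            mismatches.append((i, names[i], model_dt, rtl_dt))
    return mismatches


# Hypothetical traces: the model commits `addi` together with `add`,
# while the RTL needs one more cycle.
names = ["lw", "add", "addi", "sw"]
model = [3, 4, 4, 5]
rtl = [3, 4, 5, 6]
for index, name, model_dt, rtl_dt in duration_mismatches(names, model, rtl):
    print(f"#{index} {name}: model {model_dt} cycle(s), RTL {rtl_dt} cycle(s)")
```

Because durations are relative to the previous commit, one late instruction is reported once instead of shifting every later entry.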

### RESULTS

| Criteria       | Reference | Superscalar | Variation |
|----------------|-----------|-------------|-----------|
| CoreMark/MHz   | 3.10      | 4.35        | +40.1%    |
| Max. Frequency | 892 MHz   | 877 MHz     | -1.75%    |
| Power          | 32.453 mW | 34.844 mW   | +7.37%    |
| Area           | 250 kGE   | 278 kGE     | +11.1%    |



These activities are supported by the TRISTAN project funded by the Key Digital Technologies Joint Undertaking (KDT JU) under grant agreement 101095947. The present action reflects only the authors' view; the European Commission and the JU are not responsible for any use that may be made of the information it contains.









