
Commit fd73bca

feat: RV32A atomics, FENCE.I, per-FU pipeline, and cache coherence fixes
## New instructions

- **RV32A**: LR.W, SC.W, and all AMO variants (AMOSWAP, AMOADD, AMOXOR, AMOAND, AMOOR, AMOMAX, AMOMIN, AMOMAXU, AMOMINU) now route through the cache layer via `lr_w` / `sc_w` / `amo_w` on the `Bus` trait. LR/SC reservations are tracked per hart inside `CacheController` (and per hart in the CPU registers as a fallback for bypass mode). Any store to a reserved word cancels the reservation.
- **FENCE**: forwards to `Bus::fence()` (previously a no-op).
- **FENCE.I**: implemented as `Bus::fence_i()`; it invalidates the I-cache so self-modifying code picks up freshly written instructions.
- Instruction panel type badges extended: `[A]` (atomic) and `[F]` (float).

## Cache coherence bug fixes (multi-level hierarchy)

Four bugs caused silent data corruption when both the L1 D-cache and L2 were enabled (the program crashed at PC=0x00000000 with rust-to-raven.elf):

1. **Blanket L2 invalidation after write-back stores removed.** `dcache_store_bytes` was invalidating L2 after every store. On a write-allocate miss, this sequence ran: fetch the line from L2 → evict the dirty D-cache line → write its dirty data back to L2 → invalidate that same L2 line. The data ended up in neither cache level nor RAM (silent loss that corrupted stack return addresses, hence PC=0).
2. **Write-through stores now correctly invalidate L2.** Write-through keeps D-cache lines clean, so evictions never write back to L2. After a write-through store, the L2 copy is stale and must be dropped so that future misses re-read the updated value from RAM.
3. **`sync_to_ram` writeback order fixed.** Previously: D-cache first, then L2. Now: L2 (outer levels in reverse order) first, then D-cache. The D-cache is always authoritative; writing L2 last would overwrite correct D-cache data in RAM with stale L2 data.
4. **`flush_all` writeback order fixed** (same root cause as #3). `flush_all` runs when the user toggles the cache off; the same reversal is applied.

## Pipeline improvements

- Per-FU (ALU, MUL, DIV, FPU, LSU, SYS) stall counters and configurable functional-unit counts.
- Branch predictor extended: not-taken (default), always-taken, BTFNT, and 2-bit dynamic.
- Per-hazard stall tag breakdown in the pipeline footer.
- `.pcfg` config format updated: per-bypass-path flags replace the single `forwarding` boolean; `fu.*` counts added.
- Speculative Gantt: in-flight speculative instructions are shown with distinct styling; flushed instructions are marked.

## rust-to-raven

- `raven_api` restructured: `hart.rs` moved to `hardware_thread/`, and an `atomic/` module was added with `Arc`, `AtomicBool`, `AtomicU32`, etc., backed by LR.W/SC.W so they actually work on the simulator.
- `main.rs` updated to use `HartTask` with a closure-based API.
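The ordering argument behind fixes 3 and 4 is easy to check in a toy model. This is a minimal sketch, not Raven's `CacheController`: it assumes one value per line address and keeps only dirty lines in each level, and it borrows the name `sync_to_ram` from the commit message with a hypothetical signature. Outer levels are written outermost-first and the D-cache last, so the authoritative copy is the one left in RAM; the hypothetical `sync_to_ram_buggy` reproduces the pre-fix order.

```rust
use std::collections::HashMap;

/// Hypothetical model of a cache level: dirty lines only, one `u32` value per line address.
type Level = HashMap<u32, u32>;

/// Fixed order: walk `outer` (ordered L2, L3, ...) in reverse so the outermost
/// level is written first, then write the D-cache last so its authoritative
/// data overwrites any stale copy of the same line.
fn sync_to_ram(ram: &mut HashMap<u32, u32>, dcache: &Level, outer: &[Level]) {
    for level in outer.iter().rev() {
        for (&addr, &val) in level {
            ram.insert(addr, val);
        }
    }
    for (&addr, &val) in dcache {
        ram.insert(addr, val);
    }
}

/// Pre-fix order: D-cache first, then outer levels, so a stale outer copy
/// written afterwards clobbers the newer D-cache value in RAM.
fn sync_to_ram_buggy(ram: &mut HashMap<u32, u32>, dcache: &Level, outer: &[Level]) {
    for (&addr, &val) in dcache {
        ram.insert(addr, val);
    }
    for level in outer {
        for (&addr, &val) in level {
            ram.insert(addr, val);
        }
    }
}
```

With a stale L2 copy of line `0x100`, the fixed order leaves the D-cache value in RAM while the old order leaves the L2 value, which is exactly the silent corruption the fix addresses.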
1 parent 792b43c commit fd73bca

79 files changed

Lines changed: 8549 additions & 2085 deletions


README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -43,7 +43,7 @@ Requires Rust 1.75+. No other dependencies.
 - **R/W**: still RAM view, but auto-follows the last memory access address from `LOAD` and `STORE`
 - **Registers**: integer or float register bank with per-register age highlighting; pin with `P`
 - **Dyn**: self-narrating mode for single-stepping — STORE → RAM centered on the written address (``); LOAD / ALU / branch → register bank so you see the result
-- Instruction memory panel: type badge `[R][I][S][B][U][J]`, execution heat `×N`, branch outcome
+- Instruction memory panel: type badge `[R][I][S][B][U][J][A][F]`, execution heat `×N`, branch outcome
 - Instruction decoder: full field breakdown (opcode, funct3/7, rs1/rs2/rd, immediate, sign-extended)
 
 ### Cache Simulator (Tab 3)
````
````diff
@@ -66,7 +66,7 @@ Requires Rust 1.75+. No other dependencies.
 
 | Extension | Instructions |
 |-----------|-------------|
-| RV32I | ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU, ADDI, ANDI, ORI, XORI, SLTI, SLTIU, SLLI, SRLI, SRAI, LB, LH, LW, LBU, LHU, SB, SH, SW, BEQ, BNE, BLT, BGE, BLTU, BGEU, JAL, JALR, LUI, AUIPC, ECALL, EBREAK |
+| RV32I | ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU, ADDI, ANDI, ORI, XORI, SLTI, SLTIU, SLLI, SRLI, SRAI, LB, LH, LW, LBU, LHU, SB, SH, SW, BEQ, BNE, BLT, BGE, BLTU, BGEU, JAL, JALR, LUI, AUIPC, FENCE, FENCE.I, ECALL, EBREAK |
 | RV32M | MUL, MULH, MULHSU, MULHU, DIV, DIVU, REM, REMU |
 | RV32A | LR.W, SC.W, AMOSWAP.W, AMOADD.W, AMOXOR.W, AMOAND.W, AMOOR.W, AMOMAX.W, AMOMIN.W, AMOMAXU.W, AMOMINU.W |
 | RV32F | FADD.S, FSUB.S, FMUL.S, FDIV.S, FSQRT.S, FMIN.S, FMAX.S, FEQ.S, FLT.S, FLE.S, FLW, FSW, FMV.W.X, FMV.X.W, FCVT.W.S, FCVT.WU.S, FCVT.S.W, FCVT.S.WU, FCLASS.S, FMADD.S, FMSUB.S, FNMADD.S, FNMSUB.S, FNEG.S, FABS.S |
````
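The RV32A row lists nine AMO operations. Per the RISC-V unprivileged specification, each one atomically loads the old word into `rd`, combines it with `rs2`, and stores the result. The sketch below covers only that arithmetic; `AmoOp`, `amo_apply`, and `apply_amo_to_slice` are hypothetical names, and atomicity, the cache layer, and Raven's `Bus::amo_w` are not modeled.

```rust
/// The nine RV32A read-modify-write operations (hypothetical enum).
#[derive(Clone, Copy, Debug)]
enum AmoOp { Swap, Add, Xor, And, Or, Min, Max, Minu, Maxu }

/// Value an AMO*.W writes back to memory, given the old memory word and
/// the rs2 operand. `rd` always receives `old` unchanged.
fn amo_apply(op: AmoOp, old: u32, src: u32) -> u32 {
    match op {
        AmoOp::Swap => src,
        AmoOp::Add => old.wrapping_add(src),
        AmoOp::Xor => old ^ src,
        AmoOp::And => old & src,
        AmoOp::Or => old | src,
        // Signed variants compare the words as two's-complement i32.
        AmoOp::Min => (old as i32).min(src as i32) as u32,
        AmoOp::Max => (old as i32).max(src as i32) as u32,
        // Unsigned variants compare the raw u32 bits.
        AmoOp::Minu => old.min(src),
        AmoOp::Maxu => old.max(src),
    }
}

/// Apply an AMO to one word of a plain slice: stores the new value and
/// returns the old one (what lands in rd). Not atomic; illustration only.
fn apply_amo_to_slice(mem: &mut [u32], word_index: usize, op: AmoOp, src: u32) -> u32 {
    let old = mem[word_index];
    mem[word_index] = amo_apply(op, old, src);
    old
}
```

The signed/unsigned split matters only for MIN/MAX: `0xFFFF_FFFF` is -1 to AMOMAX.W but the largest value to AMOMAXU.W.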

docs/en/README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -40,7 +40,7 @@ Requires Rust 1.75+. No other dependencies.
 - **RAM**: `k` cycles region: Data / Stack / R/W / Heap (sbrk pointer with `▶HB` marker)
 - **R/W**: still RAM view, but auto-follows the last memory access address from `LOAD` and `STORE`
 - **Dyn**: STORE → RAM at written address; LOAD/ALU → register bank
-- Instruction memory panel: type badge `[R][I][S][B][U][J]`, execution heat `×N`, branch outcome
+- Instruction memory panel: type badge `[R][I][S][B][U][J][A][F]`, execution heat `×N`, branch outcome
 - Instruction decoder: full field breakdown (opcode, funct3/7, rs1/rs2/rd, immediate, sign-extended)
 
 ### Cache Simulator (Tab 3)
````
````diff
@@ -63,7 +63,7 @@ Requires Rust 1.75+. No other dependencies.
 
 | Extension | Instructions |
 |-----------|-------------|
-| RV32I | ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU, ADDI, ANDI, ORI, XORI, SLTI, SLTIU, SLLI, SRLI, SRAI, LB, LH, LW, LBU, LHU, SB, SH, SW, BEQ, BNE, BLT, BGE, BLTU, BGEU, JAL, JALR, LUI, AUIPC, ECALL, EBREAK |
+| RV32I | ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU, ADDI, ANDI, ORI, XORI, SLTI, SLTIU, SLLI, SRLI, SRAI, LB, LH, LW, LBU, LHU, SB, SH, SW, BEQ, BNE, BLT, BGE, BLTU, BGEU, JAL, JALR, LUI, AUIPC, FENCE, FENCE.I, ECALL, EBREAK |
 | RV32M | MUL, MULH, MULHSU, MULHU, DIV, DIVU, REM, REMU |
 | RV32A | LR.W, SC.W, AMOSWAP.W, AMOADD.W, AMOXOR.W, AMOAND.W, AMOOR.W, AMOMAX.W, AMOMIN.W, AMOMAXU.W, AMOMINU.W |
 | RV32F | FADD.S, FSUB.S, FMUL.S, FDIV.S, FSQRT.S, FMIN.S, FMAX.S, FEQ.S, FLT.S, FLE.S, FLW, FSW, FMV.W.X, FMV.X.W, FCVT.W.S, FCVT.WU.S, FCVT.S.W, FCVT.S.WU, FCLASS.S, FMADD.S, FMSUB.S, FNMADD.S, FNMSUB.S, FNEG.S, FABS.S |
````
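The commit message says LR/SC reservations are tracked per hart and that any store to a reserved word cancels the reservation. Below is a minimal standalone model of that rule, assuming one word-granular reservation per hart; `Reservations` and its methods are hypothetical, and the sketch ignores both the memory write itself and the CPU-register fallback used in bypass mode. Per the RISC-V specification, SC.W writes 0 to `rd` on success and a nonzero value on failure, and it always clears the issuing hart's reservation.

```rust
use std::collections::HashMap;

/// Hypothetical per-hart reservation table: at most one word-aligned
/// reserved address per hart.
#[derive(Default)]
struct Reservations {
    by_hart: HashMap<usize, u32>,
}

impl Reservations {
    /// LR.W: record (or replace) this hart's reservation.
    fn load_reserved(&mut self, hart: usize, addr: u32) {
        self.by_hart.insert(hart, addr & !3);
    }

    /// SC.W: returns the value written to rd (0 = success, 1 = failure).
    /// The hart's own reservation is cleared whether or not it succeeds.
    fn store_conditional(&mut self, hart: usize, addr: u32) -> u32 {
        if self.by_hart.remove(&hart) == Some(addr & !3) {
            // A successful SC.W is a store, so it cancels other harts'
            // reservations on the same word. The caller performs the write.
            self.on_store(addr);
            0
        } else {
            1
        }
    }

    /// Any store (of any width) into a reserved word cancels every
    /// reservation on that word.
    fn on_store(&mut self, addr: u32) {
        let word = addr & !3;
        self.by_hart.retain(|_, r| *r != word);
    }
}
```

Because a successful SC.W also counts as a store, exactly one of two harts racing on the same word can succeed; the loser sees a nonzero `rd` and retries its LR/SC sequence.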

docs/en/RELEASE-pipeline-hart.md

Lines changed: 13 additions & 4 deletions
````diff
@@ -60,7 +60,7 @@ The map distinguishes a forwarding-covered RAW (no stall, shown as `FWD`) from a
 
 ### Branch prediction
 
-Static prediction is configurable: **not-taken** (default) or **taken**. The pipeline attaches a prediction badge when the branch reaches `ID`. If the prediction is wrong, younger speculative instructions are flushed and the fetch redirects.
+Prediction is configurable: **not-taken** (default), **always-taken**, **BTFNT**, or **2-bit Dynamic**. The pipeline attaches a prediction badge when the branch reaches `ID`. If the prediction is wrong, younger speculative instructions are flushed and the fetch redirects.
 
 You can also choose where branches resolve: `ID` (1 bubble), `EX` (2 bubbles, default), or `MEM` (3 bubbles).
 
````
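The four prediction policies can be sketched as one small type. This is a textbook model under stated assumptions, not Raven's predictor: variant names follow the `.pcfg` `predict` values, BTFNT predicts taken when the target lies below the branch PC, and the 2-bit policy keeps one saturating counter per branch PC starting at "weakly not taken" (the initial state and table organization are assumptions). `Policy` and `Predictor` are hypothetical names.

```rust
use std::collections::HashMap;

/// Prediction policies, named after the `.pcfg` `predict` values.
#[derive(Clone, Copy, Debug)]
enum Policy { NotTaken, Taken, Btfnt, TwoBit }

struct Predictor {
    policy: Policy,
    /// 2-bit saturating counters keyed by branch PC: 0..=3, >= 2 predicts taken.
    counters: HashMap<u32, u8>,
}

impl Predictor {
    fn new(policy: Policy) -> Self {
        Predictor { policy, counters: HashMap::new() }
    }

    fn predict(&self, pc: u32, target: u32) -> bool {
        match self.policy {
            Policy::NotTaken => false,
            Policy::Taken => true,
            // Backward branches (typically loops) taken, forward not taken.
            Policy::Btfnt => target < pc,
            // Unseen branches start at 1, "weakly not taken" (assumption).
            Policy::TwoBit => self.counters.get(&pc).copied().unwrap_or(1) >= 2,
        }
    }

    /// Train on the resolved outcome; only the dynamic policy keeps state.
    fn update(&mut self, pc: u32, taken: bool) {
        if let Policy::TwoBit = self.policy {
            let c = self.counters.entry(pc).or_insert(1);
            *c = if taken { (*c + 1).min(3) } else { (*c).saturating_sub(1) };
        }
    }
}
```

The hysteresis is the point of the 2-bit counter: a single not-taken outcome at a loop exit does not flip the prediction for the next time the loop runs.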
````diff
@@ -74,13 +74,22 @@ The pipeline and cache share one clock model. An IF or MEM access pays the full
 
 ### Configuration file
 
-Pipeline settings persist in a `.pcfg` file (`Ctrl+Shift+P` to save/load):
+Pipeline settings persist in a `.pcfg` file:
 
 ```
 # Raven Pipeline Config v1
 enabled=true
-forwarding=true
+bypass.ex_to_ex=true
+bypass.mem_to_ex=true
+bypass.wb_to_id=true
+bypass.store_to_load=false
 mode=SingleCycle
+fu.alu=1
+fu.mul=1
+fu.div=1
+fu.fpu=1
+fu.lsu=1
+fu.sys=1
 branch_resolve=Ex
 predict=NotTaken
 speed=Normal
````
````diff
@@ -181,7 +190,7 @@ The pipeline tab footer shows live metrics:
 Cycle 342 Instr 211 CPI 1.62 Stalls 89 Flushes 12
 ```
 
-Per-class instruction counts are visible in the Gantt color legend.
+The per-hazard breakdown is reported as **stall tags**. A single stalled cycle can contribute to more than one tag when multiple causes overlap.
 
 ---
 
````
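Because one stalled cycle can carry several tags, the per-tag counts can sum to more than the stall count. A minimal sketch, assuming each cycle records its stall causes as a bitmask; the bit layout, `tally`, and `labeled` are hypothetical, and only the tag names (`RAW`, `load-use`, `branch`, `FU`, `mem`, from the CLI summary) come from the docs.

```rust
/// Hypothetical stall-cause bits.
const RAW: u8 = 1 << 0;
const LOAD_USE: u8 = 1 << 1;
const BRANCH: u8 = 1 << 2;
const FU: u8 = 1 << 3;
const MEM: u8 = 1 << 4;

const TAGS: [(u8, &str); 5] = [
    (RAW, "RAW"), (LOAD_USE, "load-use"), (BRANCH, "branch"), (FU, "FU"), (MEM, "mem"),
];

/// Tally a per-cycle trace of stall-cause masks (0 = no stall).
/// Returns (stalled cycles, per-tag counts in TAGS order).
fn tally(cycles: &[u8]) -> (u32, [u32; 5]) {
    let mut stalls = 0;
    let mut per_tag = [0u32; 5];
    for &mask in cycles {
        if mask != 0 {
            stalls += 1; // one stalled cycle, however many causes it has
        }
        for (i, &(bit, _name)) in TAGS.iter().enumerate() {
            if (mask & bit) != 0 {
                per_tag[i] += 1; // every overlapping cause gets credit
            }
        }
    }
    (stalls, per_tag)
}

/// Pair each count with its tag name for display.
fn labeled(per_tag: &[u32; 5]) -> Vec<(&'static str, u32)> {
    TAGS.iter().zip(per_tag).map(|(&(_, name), &n)| (name, n)).collect()
}
```

In a trace `[0, RAW, RAW | FU, MEM, 0]` there are 3 stalled cycles but 4 tag hits, because the third cycle is charged to both `RAW` and `FU`.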
docs/en/cli.md

Lines changed: 19 additions & 4 deletions
````diff
@@ -147,11 +147,13 @@ printf "42\n" | raven run calculator.fas --nout
 
 When `--pipeline` is enabled, Raven still writes the normal cache statistics, but also includes a pipeline summary:
 
+- scope (`selected` for pipeline-only export, `aggregate` in cache/program summaries)
 - committed instructions
 - pipeline cycles
 - stall count
 - flush count
 - pipeline CPI
+- stall-tag breakdown (`RAW`, `load-use`, `branch`, `FU`, `mem`)
 
 ### Assertions
 
````
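The pipeline CPI in this summary can be reproduced from the cycle and instruction counts. Assuming CPI is pipeline cycles divided by committed instructions (consistent with the footer sample `Cycle 342 Instr 211 CPI 1.62`, since 342 / 211 ≈ 1.62), a hypothetical summary struct looks like this. The field names are illustrative, this is not Raven's export format, and returning 0.0 before anything commits is a choice of the sketch.

```rust
/// Hypothetical pipeline summary record; field names are illustrative only.
struct PipelineSummary {
    committed: u64,
    cycles: u64,
    stalls: u64,
    flushes: u64,
}

impl PipelineSummary {
    /// Pipeline CPI = cycles / committed instructions (0.0 if nothing committed yet).
    fn cpi(&self) -> f64 {
        if self.committed == 0 {
            0.0
        } else {
            self.cycles as f64 / self.committed as f64
        }
    }
}
```

Formatting `cpi()` with two decimals for the footer sample gives `1.62`, matching the value shown in the pipeline tab.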
````diff
@@ -401,8 +403,17 @@ Controls pipeline-specific behavior used by the TUI pipeline tab and by `raven r
 ```ini
 # Raven Pipeline Config v1
 enabled=true
-forwarding=true
+bypass.ex_to_ex=true
+bypass.mem_to_ex=true
+bypass.wb_to_id=true
+bypass.store_to_load=false
 mode=SingleCycle
+fu.alu=1
+fu.mul=1
+fu.div=1
+fu.fpu=1
+fu.lsu=1
+fu.sys=1
 branch_resolve=Ex
 predict=NotTaken
 speed=Normal
````
````diff
@@ -411,10 +422,14 @@ speed=Normal
 Fields:
 
 - `enabled` — pipeline enabled in the TUI
-- `forwarding` — enable bypass/forwarding paths
-- `mode``SingleCycle` or `FunctionalUnits`
+- `bypass.ex_to_ex` — enable EX->EX bypass
+- `bypass.mem_to_ex` — enable MEM->EX bypass
+- `bypass.wb_to_id` — enable WB->ID bypass
+- `bypass.store_to_load` — enable store-to-load forwarding
+- `mode` — legacy field currently mapped in the UI as `Serialized` or `Parallel UFs`
+- `fu.alu` / `fu.mul` / `fu.div` / `fu.fpu` / `fu.lsu` / `fu.sys` — number of functional units of each type used by `Parallel UFs` mode
 - `branch_resolve``Id`, `Ex`, or `Mem`
-- `predict``NotTaken` or `Taken`
+- `predict``NotTaken`, `Taken`, `Btfnt`, or `TwoBit`
 - `speed` — TUI playback speed (`Slow`, `Normal`, `Fast`, `Instant`)
 
 Export / import from the TUI: **Pipeline tab → `Ctrl+E` / `Ctrl+L`**
````
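A loader for the fields listed above can be sketched as a line-oriented `key=value` parser. Everything here is hypothetical (`PipelineConfig`, `parse_pcfg`, the error strings): defaults mirror the sample config file rather than any confirmed Raven defaults, `mode` and `speed` are accepted but not interpreted, and accepted value spellings follow the field list.

```rust
/// Hypothetical in-memory form of the `.pcfg` keys.
#[derive(Debug, PartialEq)]
struct PipelineConfig {
    enabled: bool,
    ex_to_ex: bool,
    mem_to_ex: bool,
    wb_to_id: bool,
    store_to_load: bool,
    fu: [u32; 6], // alu, mul, div, fpu, lsu, sys
    branch_resolve: String,
    predict: String,
}

impl Default for PipelineConfig {
    // Mirrors the sample file, not necessarily Raven's own defaults.
    fn default() -> Self {
        PipelineConfig {
            enabled: true,
            ex_to_ex: true,
            mem_to_ex: true,
            wb_to_id: true,
            store_to_load: false,
            fu: [1; 6],
            branch_resolve: "Ex".to_string(),
            predict: "NotTaken".to_string(),
        }
    }
}

const FU_KEYS: [&str; 6] = ["fu.alu", "fu.mul", "fu.div", "fu.fpu", "fu.lsu", "fu.sys"];

fn parse_pcfg(text: &str) -> Result<PipelineConfig, String> {
    let mut cfg = PipelineConfig::default();
    for (n, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // blank lines and comments such as the version header
        }
        let (key, val) = line
            .split_once('=')
            .map(|(k, v)| (k.trim(), v.trim()))
            .ok_or_else(|| format!("line {}: expected key=value", n + 1))?;
        let flag = || val.parse::<bool>().map_err(|e| format!("{key}: {e}"));
        if let Some(i) = FU_KEYS.iter().position(|k| *k == key) {
            cfg.fu[i] = val.parse::<u32>().map_err(|e| format!("{key}: {e}"))?;
            continue;
        }
        match key {
            "enabled" => cfg.enabled = flag()?,
            "bypass.ex_to_ex" => cfg.ex_to_ex = flag()?,
            "bypass.mem_to_ex" => cfg.mem_to_ex = flag()?,
            "bypass.wb_to_id" => cfg.wb_to_id = flag()?,
            "bypass.store_to_load" => cfg.store_to_load = flag()?,
            "branch_resolve" if ["Id", "Ex", "Mem"].contains(&val) => {
                cfg.branch_resolve = val.to_string()
            }
            "predict" if ["NotTaken", "Taken", "Btfnt", "TwoBit"].contains(&val) => {
                cfg.predict = val.to_string()
            }
            // Accepted but not interpreted by this sketch.
            "mode" | "speed" => {}
            _ => return Err(format!("line {}: unsupported key or value `{line}`", n + 1)),
        }
    }
    Ok(cfg)
}
```

This sketch rejects unknown keys, so an old file that still uses the removed `forwarding` boolean fails loudly instead of being silently misread.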

docs/en/pipeline.md

Lines changed: 11 additions & 4 deletions
````diff
@@ -44,7 +44,7 @@ Each visible pipeline step corresponds to exactly one CPU clock cycle. Cache lat
 - Computes effective addresses for loads/stores/atomics
 - Applies forwarding again for EX-stage consumers
 - Holds instructions for the configured `CPIConfig` latency in both pipeline modes
-- Uses the functional-unit panel only as an expanded visualization when that mode is enabled
+- Always renders the functional-unit panel in the TUI so execution opportunities stay visible
 
 ### MEM
 
````
````diff
@@ -161,9 +161,16 @@ Visual markers:
 
 ---
 
-## Functional-unit mode
+## Execution model
 
-In functional-unit mode, `EX` can remain busy for multiple cycles depending on class:
+The pipeline config currently exposes two execution models:
+
+- `Serialized`
+- `Parallel UFs`
+
+Both use the same functional-unit panel in the UI. The difference is semantic, not cosmetic.
+
+In the current implementation, execution still behaves like a single in-order EX path, and `EX` can remain busy for multiple cycles depending on class:
 
 - `ALU`
 - `MUL`
````
````diff
@@ -175,7 +182,7 @@ In functional-unit mode, `EX` can remain busy for multiple cycles depending on c
 - `SYSTEM`
 - `FP`
 
-While a long-latency instruction holds `EX`, the front of the pipe remains blocked and Raven keeps that state visible without letting unrelated IF latency progress incorrectly. In `FunctionalUnits`, the same latency is also broken down by FU in the expanded EX panel.
+While a long-latency instruction holds `EX`, the front of the pipe remains blocked and Raven keeps that state visible without letting unrelated IF latency progress incorrectly. The functional-unit panel breaks that latency down by FU so the user can see which resource is active and where parallelism could exist once the execution model allows it.
 
 ---
 
````
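The single in-order EX path described in the new `pipeline.md` text can be modeled in a few lines. This sketch assumes one instruction per cycle reaches EX when nothing blocks and uses made-up per-class latencies (Raven takes its own from `CPIConfig`). The unit names follow the commit message (ALU, MUL, DIV, FPU, LSU, SYS); `Fu`, `ex_latency`, and `schedule_serial` are hypothetical.

```rust
/// Functional-unit classes named in the commit message.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Fu { Alu, Mul, Div, Fpu, Lsu, Sys }

/// Made-up EX latencies in cycles, for illustration only.
fn ex_latency(fu: Fu) -> u32 {
    match fu {
        Fu::Alu | Fu::Lsu | Fu::Sys => 1,
        Fu::Mul => 3,
        Fu::Fpu => 4,
        Fu::Div => 20,
    }
}

/// Single in-order EX path: an instruction can enter EX one cycle after its
/// predecessor entered, but only once EX is free. Returns the cycle each
/// instruction leaves EX and the total cycles spent waiting for EX.
fn schedule_serial(seq: &[Fu]) -> (Vec<u32>, u32) {
    let mut ex_free_at: u32 = 0; // first cycle EX can accept a new instruction
    let mut arrives: u32 = 0; // cycle the next instruction reaches EX if unblocked
    let mut waited: u32 = 0;
    let mut leaves = Vec::with_capacity(seq.len());
    for &fu in seq {
        let start = arrives.max(ex_free_at);
        waited += start - arrives; // front of the pipe blocked behind EX
        let end = start + ex_latency(fu);
        leaves.push(end);
        ex_free_at = end;
        arrives = start + 1;
    }
    (leaves, waited)
}
```

For `[Alu, Div, Alu]`, the trailing ALU instruction waits 19 cycles behind the DIV. Those are the cycles the functional-unit panel exposes as potential overlap, should a `Parallel UFs` model eventually allow it.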
docs/pt-BR/README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -34,7 +34,7 @@ Tudo vive em uma única TUI: escreva código, monte, execute passo a passo, insp
 ### Aba Run (Aba 2)
 **Memória de Instruções**
 - Headers de label e separadores de bloco renderizados inline
-- Badge de tipo por instrução (`[R]` `[I]` `[S]` `[B]` `[U]` `[J]`)
+- Badge de tipo por instrução (`[R]` `[I]` `[S]` `[B]` `[U]` `[J]` `[A]` `[F]`)
 - Heat coloring — sufixo `×N` de contagem de execuções colorido por frequência
 - Resultado de branch no PC atual: `→ 0xADDR (taken)` / `↛ (not taken)`
 - Breakpoints (`b`), saltar para endereço (`g`), painel de trace de execução (`t`)
````

docs/pt-BR/cli.md

Lines changed: 22 additions & 7 deletions
````diff
@@ -145,13 +145,15 @@ printf "42\n" | raven run calculadora.fas --nout
 | `fstats` | Tabela legível por humanos (`.fstats`) |
 | `csv` | CSV compatível com planilhas |
 
-Quando `--pipeline` está ativo, o Raven inclui também um resumo do pipeline:
+Quando `--pipeline` está ativo, o Raven ainda exporta as estatísticas normais de cache, mas inclui também um resumo do pipeline:
 
+- escopo (`selected` na exportação específica do pipeline, `aggregate` nos resumos de cache/programa)
 - instruções committed
 - ciclos do pipeline
-- stalls
-- flushes
+- contagem de stalls
+- contagem de flushes
 - CPI do pipeline
+- breakdown por tags de stall (`RAW`, `load-use`, `branch`, `FU`, `mem`)
 
 ### Asserções
 
````
````diff
@@ -396,8 +398,17 @@ Controla o comportamento do pipeline usado pela aba de pipeline da TUI e pelo `r
 ```ini
 # Raven Pipeline Config v1
 enabled=true
-forwarding=true
+bypass.ex_to_ex=true
+bypass.mem_to_ex=true
+bypass.wb_to_id=true
+bypass.store_to_load=false
 mode=SingleCycle
+fu.alu=1
+fu.mul=1
+fu.div=1
+fu.fpu=1
+fu.lsu=1
+fu.sys=1
 branch_resolve=Ex
 predict=NotTaken
 speed=Normal
````
````diff
@@ -406,10 +417,14 @@ speed=Normal
 Campos:
 
 - `enabled` — pipeline habilitado na TUI
-- `forwarding` — habilitar caminhos de bypass/forwarding
-- `mode``SingleCycle` ou `FunctionalUnits`
+- `bypass.ex_to_ex` — habilitar bypass EX->EX
+- `bypass.mem_to_ex` — habilitar bypass MEM->EX
+- `bypass.wb_to_id` — habilitar bypass WB->ID
+- `bypass.store_to_load` — habilitar forwarding store-to-load
+- `mode` — campo legado hoje mapeado na UI como `Serialized` ou `Parallel UFs`
+- `fu.alu` / `fu.mul` / `fu.div` / `fu.fpu` / `fu.lsu` / `fu.sys` — quantidade de unidades funcionais de cada tipo usada no modo `Parallel UFs`
 - `branch_resolve``Id`, `Ex` ou `Mem`
-- `predict``NotTaken` ou `Taken`
+- `predict``NotTaken`, `Taken`, `Btfnt` ou `TwoBit`
 - `speed` — velocidade de reprodução na TUI (`Slow`, `Normal`, `Fast`, `Instant`)
 
 Exportar / importar pela TUI: **aba Pipeline → `Ctrl+E` / `Ctrl+L`**
````

rust-to-raven/.cargo/config.toml

Lines changed: 1 addition & 1 deletion
````diff
@@ -5,4 +5,4 @@ target = "riscv32im-unknown-none-elf"
 build-std = ["core", "alloc"]
 
 [target.riscv32im-unknown-none-elf]
-rustflags = ["-C", "link-arg=-e", "-C", "link-arg=_start", "-C", "link-arg=--gc-sections"]
+rustflags = ["-C", "target-feature=+a,+f", "-C", "link-arg=-e", "-C", "link-arg=_start", "-C", "link-arg=--gc-sections"]
````

rust-to-raven/.vscode/settings.json

Lines changed: 2 additions & 2 deletions
````diff
@@ -7,10 +7,10 @@
 "cargo",
 "+nightly",
 "check",
-"-Zbuild-std=core,compiler_builtins",
+"-Zbuild-std=core,alloc,compiler_builtins",
 "-Zbuild-std-features=compiler-builtins-mem",
 "--target",
-"riscv32im-unknown-none-elf",
+"riscv32imafc-unknown-none-elf",
 "--message-format=json"
 ]
 }
````

rust-to-raven/API.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -22,7 +22,7 @@ use raven_api::{exit, pause_sim};
 use raven_api::{spawn_hart, spawn_hart_fn};
 ```
 
-The crate is `no_std` and requires the `riscv32im` target.
+The crate is `no_std` and targets Raven's RV32IM baseline with `A` and `F` enabled explicitly at build time.
 Build with:
 
 ```bash
````
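The commit message says `raven_api`'s atomic types are backed by LR.W/SC.W, and the build now enables `A` explicitly. The sketch below shows the underlying retry pattern with `std::sync::atomic` rather than `raven_api` (whose atomic API is not shown in this diff), so it can run on a host; `cas_increment` and `parallel_count` are hypothetical helpers. On an RV32A target, LLVM typically expands `compare_exchange_weak` into an LR.W/SC.W sequence, and a failed SC.W surfaces as an `Err` that the loop simply retries.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Increment via an explicit compare-and-swap retry loop; returns the old value.
/// `compare_exchange_weak` may fail spuriously, much like an SC.W that lost
/// its reservation, so failures are always retried.
fn cas_increment(counter: &AtomicU32) -> u32 {
    let mut cur = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(cur, cur.wrapping_add(1), Ordering::AcqRel, Ordering::Relaxed) {
            Ok(prev) => return prev,
            Err(actual) => cur = actual, // lost the race or failed spuriously: retry
        }
    }
}

/// Spawn `threads` workers that each increment a shared counter `iters` times.
fn parallel_count(threads: usize, iters: u32) -> u32 {
    let counter = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    cas_increment(&c);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}
```

Every increment eventually lands exactly once, so 4 workers × 1000 iterations always end at 4000 regardless of interleaving.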
