Design: CORE-ET (Erbium CPU Subsystem)
Repository: https://github.com/openhwgroup/core-et
Branch: erbium (note: not main)
Tip at filing: b38a1a3 — "Initial drop of the Erbium original RTL" (2026-05-04)
License: Apache-2.0
Stars: 56 (openhwgroup-hosted, multi-vendor consortium)
Last activity: 2026-05-07 (active)
Description
CORE-ET is the CPU subsystem of the Erbium SoC — a multi-core, multi-threaded RISC-V processor subsystem targeted at AI inference workloads.
- Each ET-Minion core is a dual-threaded, in-order, single-issue RV64IMFC processor extended with a custom 8-lane SIMD/vector unit that supports packed FP/integer, transcendental, and tensor (ML) instructions. Private 4 KB L1 D-cache per core; shared I-cache.
- 8 ET-Minions are grouped into an ET-Neighborhood, sharing I-cache infrastructure and I/O buses; 2× L0 ICache (1 per quad) backed by a 32 KB L1 ICache.
- The CPU Subsystem integrates one Neighborhood plus PLIC, CLINT, fast local barriers (FLB), fast credit counters (FCC), and subsystem CSRs.
- The subsystem is the major AXI4 transaction initiator of the wider Erbium SoC.
Why it's a good benchmark candidate
- New architecture for the suite: a multi-core, multi-threaded RISC-V cluster with custom SIMD + ML/tensor instructions — distinct from existing processors (minimax, NyuziProcessor, bp_processor) and from gemmini (which is a systolic-array accelerator rather than a core). Adds a coherent multi-core integration profile that the benchmark suite doesn't currently have.
- Industry-relevant: OpenHW Group designs (e.g. CVA6, CV32E40P) are widely used as industrial-quality reference cores; Erbium is the consortium's first SoC-scale drop.
- Active development: hosted in openhwgroup org with recent commits.
- Memory-heavy: shared L1 ICache (32 KB), 8× 4 KB private DCaches, plus L0 caches, register files, and various TLB/barrier structures — exercises the FakeRAM macro flow and macro placement heavily.
- Routing/PnR stress: 8 cores × SIMD lanes + shared cache fabric + AXI4 interconnect is exactly the kind of design that stresses placement density, routing congestion, and CTS.
Estimated complexity
- Gate count: Large. 8-core cluster with 8-lane vector units, tensor unit, shared cache infrastructure, PLIC/CLINT, AXI4 master logic. Repo is ~91 MB.
- Memories: Many — FakeRAM definitely needed (L1 ICache 32 KB shared, 8× 4 KB DCaches, L0 caches, RF, possibly TLBs). Likely candidates for partitioning per Neighborhood/Minion similar to bp_processor's bp_uno/bp_quad split.
- IO count: Moderate to high — AXI4 manager interface + interrupt/timer/peripheral I/O.
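As a rough sanity check on the figures above, the per-Neighborhood totals can be tallied in a few lines. This is only a sketch built from the numbers stated in this issue (8 Minions, 2 threads and 8 SIMD lanes each, 4 KB DCache per core, 32 KB shared L1 ICache). The L0 ICache, register-file, and TLB sizes are not given here, so they are left out rather than guessed, and the function and constant names are illustrative, not taken from the RTL.

```python
# Figures come from the issue text; names are hypothetical, not RTL identifiers.
MINIONS_PER_NEIGHBORHOOD = 8
THREADS_PER_MINION = 2
SIMD_LANES_PER_MINION = 8
L1_DCACHE_KB_PER_MINION = 4
L1_ICACHE_KB_SHARED = 32


def neighborhood_totals():
    """Tally the quantities that can be derived from the stated figures.

    L0 ICache, register-file, and TLB capacities are not stated in the
    issue, so the SRAM total is a lower bound on what FakeRAM must cover.
    """
    dcache_kb = MINIONS_PER_NEIGHBORHOOD * L1_DCACHE_KB_PER_MINION
    return {
        "hw_threads": MINIONS_PER_NEIGHBORHOOD * THREADS_PER_MINION,
        "simd_lanes": MINIONS_PER_NEIGHBORHOOD * SIMD_LANES_PER_MINION,
        "l1_dcache_kb": dcache_kb,
        "known_l1_sram_kb": dcache_kb + L1_ICACHE_KB_SHARED,
    }


totals = neighborhood_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

The L1 total is a lower bound only: the actual macro count depends on how the RTL banks each cache, which has to be read from the source.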
Verification
The dv/ directory contains a substantial verification environment: arch_monitors/, cosim/, dpi/, minion_common/, neigh_common/, noc/, ip/, ip_stub/. The README mentions a Verilator-based simulation path with RISC-V toolchain integration. The co-simulation infrastructure suggests Spike/ISA-level reference checking is wired up.
Conversion notes
Likely SystemVerilog (consistent with OpenHW Group convention). Will need either yosys-slang (preferred) or sv2v depending on construct coverage. RTL is organized under rtl/cpu_subsystem, rtl/shire, and rtl/libs, plus rtl/inc. The top-level extern/ directory suggests external IP submodules, which may need to be initialized recursively.
A reasonable first cut: start from the CPU Subsystem as the top (one Neighborhood), with the FLB/FCC/PLIC/CLINT logic, treating large SRAMs as FakeRAM macros. A smaller partition (single Minion) could follow if the full subsystem is too large for the smaller platforms, following the bp_uno/bp_quad precedent.
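For reproducibility, the checkout implied by the branch and submodule notes above could be scripted roughly as follows. This is a minimal sketch: it assumes extern/ is populated through git submodules (the issue only suggests this), and the helper names and the default destination directory are illustrative. The repository URL, branch name, and tip commit are the ones given at the top of this issue.

```python
import subprocess

REPO_URL = "https://github.com/openhwgroup/core-et"
BRANCH = "erbium"          # the RTL lives on this branch, not main
TIP_AT_FILING = "b38a1a3"  # short hash quoted in the issue header


def clone_command(dest="core-et"):
    """Build the git command that clones the erbium branch.

    --recurse-submodules is included on the assumption that extern/
    holds submodules; it is harmless if the repo has none.
    """
    return ["git", "clone", "--branch", BRANCH,
            "--recurse-submodules", REPO_URL, dest]


def pin_command(dest="core-et"):
    """Build the git command that pins the checkout to the filed tip."""
    return ["git", "-C", dest, "checkout", TIP_AT_FILING]


def fetch(dest="core-et"):
    """Run both commands; raises CalledProcessError if git fails."""
    subprocess.run(clone_command(dest), check=True)
    subprocess.run(pin_command(dest), check=True)
```

Calling fetch() performs the network clone; the two command builders are kept separate so the exact invocation can be logged or reviewed before anything is run.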
Target platforms