Nommu linux#21
Open
adambagley wants to merge 36 commits into
The core was Machine-mode only. Add User privilege (M+U; no Supervisor,
MMU, or PMP) so an M-mode kernel can run user processes in U-mode.
- riscv_pkg: ExcEcallUmode (cause 8); PrivU/PrivM encodings; mstatus
MPP/MPRV field positions.
- csr_file: current-privilege register (resets to M); mstatus.MPP as a
live WARL field {M,U} (was hardwired M) plus mstatus.MPRV. Trap entry
saves the privilege to MPP and enters M; MRET returns to MPP, sets
MPP=U, and clears MPRV when returning below M. misa advertises U
(0x4010_112F). New o_priv output.
- trap_unit: machine interrupts are taken while in U-mode regardless of
mstatus.MIE, so the timer can preempt user code.
- cpu_ooo: commit-time ECALL cause select (8 from U, 11 from M) into
mcause. MPRV is intentionally inert (no PMP/MMU); trapping illegal
CSR/MRET access from U-mode is intentionally deferred.
M-mode behavior is unchanged (the interrupt-enable reduces to mstatus.MIE
in M). Verified: hello_world cocotb smoke passes.
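As a sanity check on the misa value and ECALL causes quoted above, the sketch below decodes 0x4010_112F into its extension letters and models the commit-time cause select. The helper names are illustrative and not taken from riscv_pkg; the privilege encodings (U=0, M=3) are the standard RISC-V ones.

```python
PRIV_U = 0b00  # standard RISC-V encoding for User
PRIV_M = 0b11  # standard RISC-V encoding for Machine

EXC_ECALL_UMODE = 8   # ExcEcallUmode in the commit message
EXC_ECALL_MMODE = 11


def decode_misa(misa):
    """Return (XLEN, extension letters) for an RV32-layout misa value."""
    mxl = (misa >> 30) & 0x3  # on RV32, MXL sits in bits 31:30
    xlen = {1: 32, 2: 64, 3: 128}[mxl]
    letters = "".join(
        chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1
    )
    return xlen, letters


def ecall_cause(priv):
    """Model of the ECALL cause select: 8 from U, 11 from M."""
    return EXC_ECALL_MMODE if priv == PRIV_M else EXC_ECALL_UMODE


xlen, extensions = decode_misa(0x4010_112F)
print(xlen, extensions)  # bit 20 ('U') is the newly advertised extension
```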
The core now implements Machine (M) and User (U) privilege modes (was
M-mode only; still no S-mode/MMU/PMP). Sync docs and descriptive
comments to match the U-mode commit.
- READMEs (root, hw/rtl, cpu, verif, sw) + CONTRIBUTING: M-mode -> M/U
privilege wording; root extensions table gains a User Mode row;
CONTRIBUTING future-work list no longer implies S-mode is supported.
- RTL header/doc-block comments (csr_file, trap_unit, riscv_pkg,
cpu_and_mem, instr_decoder): M/U privilege; csr_file notes mstatus.MPP
WARL {M,U}, inert MPRV, and misa 0x4010_112F.
- sw headers (trap.h, csr.h), FreeRTOSConfig.h, __init__.py: M/U wording;
trap.h notes ECALL cause 8 (U) / 11 (M); csr.h notes MSTATUS_MPP WARL.
Comments/docs only; no functional, test, or test-skip changes.
Two trap-path bugs, surfaced by a U-mode directed test (they also affected M-mode interrupts):
- Interrupt mcause was 0. csr_file's mcause was driven from the ROB's synchronous cause, while trap_unit's arbitrated cause (the interrupt cause with the interrupt bit set, or the remapped exception cause) was left unconnected. Route trap_unit.o_trap_cause into csr_file.i_trap_cause. This is why FreeRTOS's preemptive tick never fired -- its mcause==0x8000_0007 check could not match; it now works.
- Double trap-entry. A single trap was applied twice, the second time in M-mode, corrupting mstatus.MPP/mcause. The exception path re-armed while the ROB's trap_pending was still asserted, and the registered interrupt_pending re-fired. Hold exception_pending cleared one extra cycle (via the existing trap_taken_prev) and gate interrupt_pending with !o_trap_taken.
Both feedback paths pass through a flop (no combinational loop) and stay off the take_trap->stall->cache critical cone.
Verified: freertos_demo passes (now with real preemption); the U-mode directed test's ecall-from-U (mcause=8) and timer-preempt-from-U (mcause=0x8000_0007, taken in U-mode) cases pass.
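For readers unfamiliar with the mcause layout, here is a minimal sketch of how 0x8000_0007 splits into the interrupt bit and cause code 7 (machine timer), together with the interrupt-enable rule described earlier in this PR (machine interrupts are taken in U-mode regardless of mstatus.MIE). Function names are hypothetical; this models the behavior in software and is not the trap_unit RTL.

```python
PRIV_U = 0b00
PRIV_M = 0b11
INTERRUPT_BIT = 1 << 31  # mcause[31] on RV32


def split_mcause(mcause):
    """Return (is_interrupt, cause_code) for a 32-bit mcause value."""
    return bool(mcause & INTERRUPT_BIT), mcause & 0x7FFF_FFFF


def machine_irq_enabled(priv, mstatus_mie):
    """Below M the interrupt is always enabled; in M it reduces to MIE."""
    return priv < PRIV_M or mstatus_mie


print(split_mcause(0x8000_0007))  # machine timer interrupt
print(split_mcause(8))            # ECALL from U-mode (synchronous)
```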
Enforce the U-mode privilege check at the ROB head: a U-mode access to an M-mode CSR (csr_addr[9:8] > priv) or an MRET is an illegal instruction (mcause=2). It is computed from existing head signals (head_is_csr, head_is_mret, head_csr_addr) plus a new i_priv input fed from csr_file.o_priv -- so no decode/dispatch/HeadMetaWidth changes are needed. Folded into head_exception/head_exc_cause at the source so every consumer (commit_en, o_csr_start/o_mret_start, o_trap_pending, the serial FSM, the commit record) treats it as a precise exception; the faulting op never executes or retires. It rides the same single-cycle exception path, so the trap_unit double-trap guard already covers it. M-mode-inert by construction (head_priv_fault is 0 when priv==M). i_priv is bridged cpu_ooo -> tomasulo_wrapper -> reorder_buffer. Verified: umode_test C (M-CSR-from-U) and D (MRET-from-U) trap mcause=2 from U-mode; freertos_demo still passes (M-mode unchanged). Note: head_priv_fault lands on the commit_en cone -- needs an X3 post-opt WNS check before merge.
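The privilege check above can be sketched as a pure function of the head signals. This is a behavioral model only; the argument names mirror the signals named in the commit message, but the function itself is illustrative and not the reorder_buffer RTL.

```python
PRIV_U = 0b00
PRIV_M = 0b11
EXC_ILLEGAL_INSTRUCTION = 2


def head_priv_fault(priv, head_is_csr, head_csr_addr, head_is_mret):
    """True when the ROB-head op must trap as an illegal instruction."""
    csr_min_priv = (head_csr_addr >> 8) & 0x3  # csr_addr[9:8]
    csr_fault = head_is_csr and csr_min_priv > priv
    mret_fault = head_is_mret and priv != PRIV_M
    return csr_fault or mret_fault


# mstatus (0x300) is a machine-level CSR; cycle (0xC00) is user-readable.
print(head_priv_fault(PRIV_U, True, 0x300, False))
print(head_priv_fault(PRIV_U, True, 0xC00, False))
```

Because `csr_min_priv > priv` can never hold when `priv` is M (3), the model is inert in M-mode, matching the "M-mode-inert by construction" claim.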
Self-checking app on the real core (<<PASS>>/<<FAIL>>) covering the M+U privilege support end-to-end:
  A  ecall-from-U -> mcause=8 (ExcEcallUmode)
  B  timer preempts U (MIE=0) -> mcause=0x8000_0007, taken in U-mode
  C  M-CSR read from U -> illegal instruction (mcause=2)
  D  MRET from U -> illegal instruction (mcause=2)
The naked M-mode handler records the first trap's mcause + originating privilege (mstatus.MPP) and bounces to a fixed continuation, so each case self-checks both the cause and that it was taken from U-mode. Registered in test_run_cocotb.py.
Also corrects the now-stale test_arch_compliance.py comment: Frost is M+U (not M-only); the privilege suite's U-mode tests drive an S-mode trap routine and need S/H extensions, so they remain filtered out, with U-mode covered by this directed test.
Present a word-stride 16550 register face at 0x4000_1000 (DTB reg-shift=2, reg-io-width=4) that aliases the native UART TX/RX, so a stock Linux 8250 console driver (earlycon=uart8250,mmio32) can drive FROST's UART. THR transmits when DLAB is clear; LSR reports THRE/TEMT from TX-ready and DR from RX-valid; IER/FCR/LCR/MCR/SCR plus the DLL/DLM baud divisor form a small register file (the divisor is accepted but ignored -- FROST's UART runs at a fixed baud). Widen MmioSizeBytes 0x2C -> 0x1_C000 so the new face (and the CLINT alias to follow) fall in the MMIO-decoded range. The native UART TX/RX, FIFO, and timer paths are unchanged: the THR alias only adds an inert OR term to o_uart_wr_en, and the register-file writes live in a separate block whose decode misses every existing address. Validated by sw/apps/ns16550_test: the 8250 init dance plus register-file and TX-ready checks pass, and a banner transmitted through the face appears on the UART TX line.
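To make the word-stride layout concrete, the sketch below computes register addresses for reg-shift=2 and models the LSR bits described above. The register indices and bit masks are the standard 16550 ones; the function names are illustrative, and this is a software model, not the FROST RTL.

```python
UART16550_BASE = 0x4000_1000
REG_SHIFT = 2  # DTB reg-shift=2: each register occupies one 32-bit word

# Standard 16550 register indices.
REG_THR, REG_IER, REG_FCR, REG_LCR, REG_MCR, REG_LSR, REG_MSR, REG_SCR = range(8)

# Standard 16550 LSR bit masks.
LSR_DR = 0x01    # data ready
LSR_THRE = 0x20  # transmit holding register empty
LSR_TEMT = 0x40  # transmitter empty


def reg_addr(index):
    """Byte address of a 16550 register under the word-stride face."""
    return UART16550_BASE + (index << REG_SHIFT)


def lsr_value(tx_ready, rx_valid):
    """LSR as described: THRE/TEMT follow TX-ready, DR follows RX-valid."""
    value = 0
    if tx_ready:
        value |= LSR_THRE | LSR_TEMT
    if rx_valid:
        value |= LSR_DR
    return value


print(hex(reg_addr(REG_LSR)), hex(lsr_value(tx_ready=True, rx_valid=False)))
```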
Expose a sifive,clint0-compatible window at 0x4001_0000 (msip @ +0, mtimecmp @ +0x4000, mtime @ +0xBFF8) that aliases the native FROST timer registers, so a stock Linux CLINT driver delivers the machine timer tick. The aliases read and write the same msip/mtimecmp/mtime as the native block (no new state); native-timer behavior is unchanged -- the CLINT addresses are added as extra read-mux cases and extra labels on the existing mtime/mtimecmp/msip write paths. Validated by sw/apps/clint_test: writes through the CLINT window are observable at the native timer addresses, and a machine timer interrupt set up entirely through the CLINT window fires with mcause=0x8000_0007. freertos_demo still passes (M-mode timer/UART unchanged).
Reflect the Linux glue (commits "Add ns16550a UART face for the Linux console" and "Add SiFive CLINT alias for the Linux timer") in hw/rtl/README.md: widen the MMIO region size (44 B -> 112 KiB), add the ns16550a face (0x4000_1000) and the SiFive CLINT alias (0x4001_0000) to the MMIO register table with a note on the device-tree binding, and correct the MMIO_SIZE_BYTES parameter (0x2C -> 0x1_C000).
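The size change can be checked with a few lines of arithmetic. The sketch assumes the MMIO region starts at 0x4000_0000; that base is not stated in these commits and is inferred from the native MTIMECMP_HI address (0x4000001C) quoted elsewhere in this PR, so treat it as an assumption.

```python
MMIO_BASE = 0x4000_0000       # assumption: inferred, not stated in the commit
MMIO_SIZE_OLD = 0x2C          # 44 B
MMIO_SIZE_NEW = 0x1_C000      # 112 KiB

UART16550_BASE = 0x4000_1000
CLINT_BASE = 0x4001_0000
CLINT_MSIP = CLINT_BASE + 0x0
CLINT_MTIMECMP = CLINT_BASE + 0x4000
CLINT_MTIME = CLINT_BASE + 0xBFF8


def in_mmio(addr, size):
    """True when addr falls inside the MMIO-decoded range of the given size."""
    return MMIO_BASE <= addr < MMIO_BASE + size


print(MMIO_SIZE_NEW // 1024, "KiB")
print(in_mmio(UART16550_BASE, MMIO_SIZE_OLD), in_mmio(UART16550_BASE, MMIO_SIZE_NEW))
print(in_mmio(CLINT_MTIME + 7, MMIO_SIZE_NEW))  # last byte of the 64-bit mtime
```

Under that assumed base, the last byte of the CLINT mtime register is also the last byte of the widened region, which would explain the choice of 0x1_C000.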
…nux to banner
The store-conditional resolution path deadlocked when several SCs are in flight under branch speculation (e.g. an LR/SC retry loop). Two bugs:
1. sc_pending was cleared on any partial flush ((speculative_partial_flush || is_younger(...))), which dropped a *surviving* older SC. Clear only when the pending SC is younger than the flush boundary (is_younger).
2. The mem-RS ready gate (!(sc_pending && mem_rs_next_is_sc)) blocked the *older* head SC from issuing whenever a *younger* speculative SC had already issued and set sc_pending. The head SC then never issued, never fired, and sc_pending never cleared -> deadlock.
sc_pending_unit now tracks multiple in-flight SCs in a small per-ROB-tag table (depth NumCheckpoints+1) and fires the SC whose tag matches the ROB head; the serialization gate is removed. BRAM LR/SC was unaffected (it resolves before a second SC issues); the longer cached-DDR latency exposed it (Linux printk _prb_commit).
Also adds the ddr_atomic_test directed reproducer and the linux_boot bring-up. With the fix the kernel now reaches "[ 0.000000] Linux version".
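A behavioral sketch of the multi-SC table may help when reviewing the fix. It is a software model under stated simplifications: the class and method names are hypothetical, the depth of 5 assumes NumCheckpoints=4 (not stated in the commit), and "younger" is modelled as a plain integer comparison rather than the wrap-aware ROB-tag compare the RTL needs.

```python
class ScPendingTable:
    """One pending entry per in-flight SC, keyed by ROB tag (model only)."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = set()

    def issue(self, rob_tag):
        """Record an SC that has issued and is waiting to reach the ROB head."""
        if len(self.pending) >= self.depth:
            raise RuntimeError("SC pending table full")
        self.pending.add(rob_tag)

    def partial_flush(self, is_younger):
        """Drop only the SCs younger than the flush boundary."""
        self.pending = {tag for tag in self.pending if not is_younger(tag)}

    def fire_at_head(self, head_tag):
        """Fire the SC whose tag matches the ROB head, if one is pending."""
        if head_tag in self.pending:
            self.pending.remove(head_tag)
            return True
        return False


table = ScPendingTable(depth=5)  # hypothetical NumCheckpoints=4
table.issue(5)                   # younger speculative SC issues first
table.issue(3)                   # older SC is no longer blocked by a gate
table.partial_flush(lambda tag: tag > 4)  # squash everything younger than tag 4
fired = table.fire_at_head(3)    # the surviving older SC still fires
print(fired, table.pending)
```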
- ddr_atomic_test: include_in_pytest=True so CI runs it (bram tier; it self-skips the ddr relink tier via its ddr_ name in DDR_TIER_EXCLUDE). linux_boot stays excluded from pytest (needs external kernel artifacts).
- tomasulo_wrapper/README.md: rewrite the SC-state-machine section and the sc_pending_unit row for the multi-SC table and the removed issue-serialization gate.
- sc_pending_unit.sv: correct the header comment (under speculation the head SC was blocked from issuing by the gate, not overwritten in the register).
- tests/README.md: list ddr_atomic_test among the DDR-tier-excluded programs.
…anic) An MRET returning below M-mode retires via the trap/MRET full flush, not the commit path, so cpu_ooo's interrupt_resume_pc was never refreshed to the MRET target and held the MRET instruction's own PC across the whole MRET-to-U window. A machine timer taken after privilege dropped to U (once the trap_unit inhibit lifted, before the first U-mode commit) then saved mepc = <MRET PC>; Linux later MRET'd to that kernel address in U-mode -> SIGILL at ret_from_exception+0x76 -> "Attempted to kill init" panic. Fix (cpu_ooo.sv): seed interrupt_resume_pc <= csr_mepc when mret_taken fires, so the U-target is in place before the inhibit window closes. Proven by a new directed test (sw/apps/mret_timer_resume_test): timer already pending across an MRET-to-U; asserts the saved resume PC is the U-mode target, not the MRET PC. FAIL (resume_mepc=MRET PC) before, PASS after. Directed regression green (umode, wfi_mepc, trap_unit, linux_irq stack/find/ddr/ active_ddr). On Genesys2 the boot clears the 0x80388bba panic and advances from ~0.85s to past initramfs unpacking. Also captures the entangled in-flight bring-up work in the same files (trap_unit MRET/interrupt inhibit window, slot-2 store-commit SQ guard, LR/SC load_queue), the linux_boot ret_from_exception image patch, and the fpga/load_software JTAG DDR-loader updates.
- cpu_ooo.sv: remove the temporary `ifndef SYNTHESIS FROST_HB/FROST_DBG
$display debug harness (the self-labeled "first-timer-IRQ ra-corruption hunt"
heartbeat + trap/RA traces). Pure deletion (31 lines, 0 added); the
test-facing dbg_* /* verilator public_flat_rd */ signals the cocotb harness
reads are untouched. Rebuilt clean; mret_timer_resume_test still PASSes
(resume_mepc = u_spin U-target).
- Docs synced to the committed RTL:
- tomasulo_wrapper/README.md: document the !i_flush_all mask on the registered
commit-valid outputs and the slot-2 raw store-commit SQ guard
(i_commit_valid_comb_2, previously tied to 1'b0).
- store_queue/README.md: the combinational commit guard covers any flush
racing a registered commit (partial-flush recovery AND full-flush trap/MRET/
FENCE.I drains); the wrapper now actually drives the slot-2 twin.
- verif/README.md: add the new cocotb_tests/control/ (trap_unit) entry.
- test_real_program.py: revert the linux_boot pass marker from the temporary
"Kernel panic" debug string (now obsolete -- the fix removed that panic) to
the "Linux version" boot banner. linux_boot is include_in_pytest=False (not in
CI); interim criterion pending boot-to-shell.
CI already covers the new tests (pytest -m cocotb / -k test_real_program);
runner + Makefile registration was already complete.
The MRET restore-window patch targeted a hardcoded image word offset, which shifts whenever the kernel is rebuilt (e.g. after editing entry.S to strip the bring-up IRQ probes). Locate the target by its unique machine-code word (sc.w == 18c1202f) instead, with an idempotency check (absent + NEW_WORD present => already patched) and an ambiguity guard (>1 occurrence => abort). The patch now survives kernel rebuilds. Context: the U-mode variant of the timer interrupt-resume-PC race is fixed in hardware (cpu_ooo.sv seeds interrupt_resume_pc from csr_mepc on mret_taken), but the M-mode restore-window variant is not yet -- an unpatched kernel hangs at the CLINT clocksource switch once the periodic timer tick ramps up. So this software crutch (clear mstatus.MIE in the restore window) is still required for now. Drop it once the M-mode window is fixed properly (RTL, or a clean kernel change that keeps the sc.w reservation-clear). Makefile comment updated to reflect this.
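The locate-by-content logic described above can be sketched as follows. This is an illustrative reimplementation, not the actual patch script: the function name is hypothetical, NEW_WORD is a placeholder value (the real replacement word is not given in the commit message), and the scan assumes a little-endian image with the target on a 32-bit boundary.

```python
import struct

OLD_WORD = 0x18C1202F  # the sc.w encoding named in the commit message
NEW_WORD = 0xDEADBEEF  # placeholder only; the real replacement is not stated


def find_patch_site(image, old_word=OLD_WORD, new_word=NEW_WORD):
    """Return the byte offset to patch, or None if already patched.

    Raises ValueError when the target is ambiguous or cannot be found.
    """
    count = len(image) // 4
    words = struct.unpack(f"<{count}I", image[: count * 4])
    hits = [index for index, word in enumerate(words) if word == old_word]
    if len(hits) > 1:
        raise ValueError(f"ambiguous target: {len(hits)} occurrences")
    if len(hits) == 1:
        return hits[0] * 4
    if new_word in words:
        return None  # idempotent: old word absent, new word present
    raise ValueError("target word not found")


image = struct.pack("<3I", 0x00000013, OLD_WORD, 0x00000013)
print(find_patch_site(image))
```

Searching by content instead of by offset is what lets the patch survive kernel rebuilds; the ambiguity guard keeps it from silently patching the wrong `sc.w` if a second identical encoding ever appears.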
…ated Capture the post-0x80388bba-fix bring-up state: kernel boots fully to the /sbin/init handoff on hardware; userspace execution + syscalls + vfork/exec/ wait proven working via minimal bFLT test inits; busybox blockers root-caused (bFLT stack 16KB too small -> Buildroot FLAT stack-size fix; 16MiB->64MiB RAM); and the remaining reliability blocker isolated to a residual M-mode machine- timer trap-return race (memory-size- and board-state-independent; ~33-67% flaky; often hangs at the clocksource switch where the unpatched kernel died). Next: directed sim of the M-mode ret_from_exception/MRET restore path to find and fix the residual race. Plus operational notes (autonomous bitstream programming, fast boot-watch, bFLT test technique).
…g repro WIP) Synthetic reproducer for the residual M-mode machine-timer trap-return race that intermittently hangs the no-MMU Linux boot. An M-mode loop (loads/stores/ ALU) is preempted by a frequent machine timer whose period is re-armed to a swept value (mtime + 24..87) each tick, so the timer lands at every cycle offset around the MRET across ~10k ticks; a deadlock would stall the loop and time out. Currently PASSES (survives 9851 IRQs) -- i.e. it does NOT yet reproduce the race, same as the existing linux_irq_* DDR tests. Kept as a regression + a starting point: the race needs more faithful conditions (full GPR save/restore to a DDR stack, WFI idle, or the exact clocksource-setup sequence). The full linux_boot sim is DDR-bound and too slow to reach the hang (25M cycles only reached early pre-timer init).
HW-verified on genesys2: Coremark Pro (9 workloads), Coremark, FreeRTOS demo, and isa_test all pass. No-MMU Linux 6.18.7 boots past the ___slab_alloc AMO wedge and the 0x38d7fa RVC fetch-misalign into the timer phase (furthest yet); next blocker is a timer-IRQ-dispatch data corruption, still open.
- if_stage/pc_controller/fetch_provider/cpu_and_mem/cpu_ooo: served-window resteer fix for the RVC fetch/decode alignment desync (passes isa + all fetch regressions)
- load_queue/lq_issue_selector: AMO-deadlock breaker (clears ___slab_alloc)
- trap_unit/reorder_buffer/sq_forwarding/store_queue/tomasulo_wrapper: interrupt + memory-path WIP
- new cocotb directed tests + bare-metal repros (sw/apps/*) + registry
- fpga/load_software DDR loader + linux_boot kernel-image patch script
WIP checkpoint: verified better than prior state, not final.
Sim-only parameter (default 0 = FPGA, set 1 for cocotb): completes fence.i L1 invalidate-all in O(1) (sdp_block_ram bulk-clear) and writeback-all in O(dirty) instead of O(NumLines), cutting fence.i from ~8998 to ~327 cycles (~27x). The default-0 path is byte-for-byte the original FSM; Yosys confirms the fast logic does not elaborate at param=0 (zero FPGA impact). Unblocks booting the real (un-noop'd) no-MMU Linux kernel in sim. Verified: frost_cache unit (on+off), ddr_smc_test (SMC via fence.i), isa_test all pass; verible clean.
linux/buildroot-external/ — BR2_EXTERNAL board support to build the FROST no-MMU RV32 kernel (Linux 6.18.7) reproducibly: defconfig, kernel config fragment, DTS, post-image packer (build_fpga_boot.py), rootfs. Plus .github/workflows/linux-boot-sim.yml — CI that builds via the external tree, stages sw*.mem, and runs the cocotb linux_boot sim. Packer validated byte-identical to the existing hand-built artifacts. NOTE (follow-up): add the Buildroot submodule pin — local tree is 2026.08-git @67449130 (not the tagged 2026.05 the README assumes); pick one before CI runs. The vendored build_fpga_boot.py carries a ruff D103/UP031 noqa pending refactor.
multi_lookup placed "maps" (len 4) in loadavg's RIGHT subtree (meminfo.left), where the length-first lookup walk can never reach it: a len-4 query always descends LEFT at the len-7 root into the cmdline subtree, hits cmdline.left=NULL, and returns 0 -- so the "maps found" assertion failed by tree construction, not by any RTL fault. Move maps onto the left spine (cmdline.left) so it is reachable; set_rb_links is (idx, RIGHT, LEFT). The full pass (16 iterations x 5 lookups x 2 variants) takes ~1.12M cycles -- it previously bailed early on the maps failure, masking the real budget, which exceeds the 500k default. Add PDE_RETURN_HAZARD_MAX_CYCLES=2M so the test runs to completion. Verified: all subtests + <<PASS>> at 1,124,182 cycles.
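The reachability argument can be reproduced with a small model of a length-first tree walk. The node class and lookup function below are illustrative, not the test's actual data structures; the walk compares name length first and only falls back to a byte comparison on equal lengths, as the commit message describes.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.left = None
        self.right = None


def lookup(root, name):
    """Length-first walk: shorter goes left, longer goes right, then bytes."""
    node = root
    while node is not None:
        if len(name) < len(node.name):
            node = node.left
        elif len(name) > len(node.name):
            node = node.right
        elif name < node.name:
            node = node.left
        elif name > node.name:
            node = node.right
        else:
            return node
    return None


def build(maps_parent, side):
    """Build loadavg/cmdline/meminfo and hang 'maps' off the given node."""
    nodes = {n: Node(n) for n in ("loadavg", "cmdline", "meminfo", "maps")}
    nodes["loadavg"].left = nodes["cmdline"]
    nodes["loadavg"].right = nodes["meminfo"]
    setattr(nodes[maps_parent], side, nodes["maps"])
    return nodes["loadavg"]


broken = build("meminfo", "left")  # original placement: in the RIGHT subtree
fixed = build("cmdline", "left")   # corrected placement: on the left spine
print(lookup(broken, "maps"), lookup(fixed, "maps").name)
```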
The fall-through link_address used is_compressed_for_link = sel_compressed_sc, whose stall_capture_reg flush-zeroes its saved bit on a flush-inside-a-stall. For a compressed branch held at fetch that makes is_compressed_for_link read 0 -> link_address = pc_reg+4 (one halfword too far); when that stale fall-through link is later consumed as a not-taken mispredict redirect target, fetch skips the branch's successor parcel. Drive is_compressed_for_link from a dedicated stall-capture that does NOT flush-zero, so the held instruction's true size sets the link. sel_compressed_sc's other consumers are replay-gated (sel_nop_saved=1 after a flush) and unaffected. Verified by a directed if_stage test (flush-in-stall must yield link=pc_reg+2) and isa_test 570/570; boot-safe in the no-MMU Linux capture. (Latent link bug found during the timer-IRQ fetch-drop hunt; distinct from that drop, which is a separate served-window-resteer issue still under investigation.)
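A small model shows why a flush-zeroed size bit yields a link that is one halfword too far. The class and function names are hypothetical and the example PC is arbitrary; only the pc_reg+2 versus pc_reg+4 relationship comes from the commit message.

```python
def link_address(pc_reg, is_compressed_for_link):
    """Fall-through link: +2 for a compressed parcel, +4 otherwise."""
    return pc_reg + (2 if is_compressed_for_link else 4)


class StallCapture:
    """Holds the 'compressed' bit of an instruction stalled at fetch."""

    def __init__(self, flush_zeroes):
        self.flush_zeroes = flush_zeroes
        self.saved = False

    def capture(self, is_compressed):
        self.saved = is_compressed

    def flush(self):
        if self.flush_zeroes:
            self.saved = False  # the behavior that corrupted the link


PC_REG = 0x8000_1000  # arbitrary example address

shared = StallCapture(flush_zeroes=True)      # like sel_compressed_sc
dedicated = StallCapture(flush_zeroes=False)  # like the new dedicated capture
for capture in (shared, dedicated):
    capture.capture(True)  # compressed branch held at fetch
    capture.flush()        # flush arrives inside the stall

print(hex(link_address(PC_REG, shared.saved)),
      hex(link_address(PC_REG, dedicated.saved)))
```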
The naked timer handler clobbers t1 (it addresses g_mepc/g_taken via t1, then loads MTIMECMP_HI=0x4000001C into t1 to ack the timer), but the WFI inline-asm clobber list listed only "t0","memory" -- not "t1". The compiler kept g_taken's base pinned in t1 across the WFI, so after the async timer interrupt clobbered t1 the post-WFI while(!g_taken) read a stale address and spun forever. DDR layout g_taken=0x800017d8 -> loop computes 2008(t1); with t1 clobbered to 0x4000001C it read 0x400007f4 (empty MMIO -> 0) and never saw g_taken. BlockRAM passed only by coincidence (offset 0 -> the clobbered read hit MTIMECMP_HI=0xffffffff -> beqz fell through). This was misread as a DDR-tier RTL trap/MRET deadlock; the trap/MRET/flush path is correct here. Add "t1" to the clobber list. Verified: wfi_mepc_test passes in DDR (<<PASS>> at 71k cycles).
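The address arithmetic in this diagnosis can be verified directly. The constants come from the commit message; the helper name is illustrative.

```python
G_TAKEN = 0x8000_17D8      # DDR-layout address of g_taken
LOAD_OFFSET = 2008         # the loop reads 2008(t1)
MTIMECMP_HI = 0x4000_001C  # value the handler leaves in t1


def load_address(t1, offset):
    """Effective address of 'lw rd, offset(t1)' on RV32."""
    return (t1 + offset) & 0xFFFF_FFFF


pinned_base = G_TAKEN - LOAD_OFFSET  # what the compiler kept in t1
print(hex(pinned_base))
print(hex(load_address(pinned_base, LOAD_OFFSET)))  # intended read
print(hex(load_address(MTIMECMP_HI, LOAD_OFFSET)))  # read after the clobber
print(hex(load_address(MTIMECMP_HI, 0)))            # BlockRAM layout, offset 0
```

With offset 0 the clobbered read lands exactly on MTIMECMP_HI, which is why the BlockRAM build passed by coincidence.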
…d-trip) The handler re-armed the timer to mtime+24..87 cycles, shorter than the ~90-cycle trap->handler->MRET->resume round-trip (two flush_all pipeline wipes per tick + store-drain). The timer was perpetually overdue, the handler saturated the core, and main's 20000-iteration loop never advanced -> <<PASS>> never printed and the harness timed out. Ground-truth wedge instrumentation showed this is a livelock in the TEST, not an RTL deadlock: the pipeline kept retiring (648-870 commits/2000cyc), flush_all only pulsed (~44/2000, never stuck), and MTIP was serviced every tick -- refuting the prior "flush_all stuck / undrained mtimecmp store" theory. Re-arm to 512..575 cycles (> round-trip) so main makes net progress while still preempting at every swept phase. Verified: <<PASS>> at 270k cycles, loop completes (g_loop=19999), 465 timer IRQs serviced.
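The livelock condition reduces to comparing the re-arm period with the trap round-trip. The sketch below is a deliberately crude model: it takes the round-trip as exactly 90 cycles (the commit says "~90"), and the sweep function is a hypothetical stand-in for however the test actually steps the period.

```python
ROUND_TRIP_CYCLES = 90  # approximate figure from the commit message
SWEEP_SPAN = 64         # 24..87 and 512..575 both cover 64 values


def swept_period(base, tick):
    """Hypothetical sweep: step the re-arm period through base..base+63."""
    return base + tick % SWEEP_SPAN


def main_cycles_per_tick(period):
    """Cycles left for the main loop once the trap round-trip is paid."""
    return max(0, period - ROUND_TRIP_CYCLES)


old = [main_cycles_per_tick(swept_period(24, t)) for t in range(SWEEP_SPAN)]
new = [main_cycles_per_tick(swept_period(512, t)) for t in range(SWEEP_SPAN)]
print(max(old), min(new))
```

In this model every period in the old sweep leaves the main loop zero cycles, while every period in the new sweep leaves several hundred, consistent with the observed handler saturation before the change and net progress after it.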
A None-safe background cocotb coroutine, gated on FROST_WEDGE_MONITOR=1 (off by default), that samples the trap/MRET/flush/IRQ/store-drain state every clock and emits aggregated snapshots (cycles-high, rising-edge counts, per-PC and per-load-address histograms) every FROST_WEDGE_DUMP_INTERVAL cycles, plus a banner when UART progress stalls. It dumps what is actually stuck -- head PC, trap_taken/mret_taken (+ registered forms), flush_all/flush_en, mip.MTIP, mstatus.MIE, mtime/mtimecmp, sq_committed_empty -- instead of guessing. This is what ground-truth-diagnosed mtimer_stress (interrupt-saturation livelock) and wfi_mepc_test (software t1-clobber): both were test bugs, not the RTL deadlocks previously assumed. Kept for re-examining wfi_lost_tick / mret_drain_deadlock and the SQ-drain-occupancy follow-up. Off by default; the existing suite is unaffected (verified -- wfi_mepc/mtimer runs passed with it present).
wfi_lost_tick passes on bram but the ddr axis needs 533752 cycles (cold I-cache fill + slower per-tick round-trip across the 3000-tick phase sweep), exceeding the 500000 default. Raise the per-app cap, mirroring coremark/pde_return_hazard. Observed via the wedge monitor: a flat tick rate and jiffies overshooting the iteration count -> no lost ticks; the failure was purely a budget overrun (~7%). RTL unchanged. (mret_drain_deadlock already passes; the deadlock its header fears was fixed in 3c30849 -- only the comment is stale.)
…al flush The sim-only assertion fired on i_alloc.valid during i_flush_all, but i_flush_all structurally squashes the allocation (priority else-if branches), so no entry is ever written; the RS issues the alloc handshake un-flush-gated for timing closure (safe per p_alloc_slot_free). Narrow the assertion to (i_flush_en && !i_flush_all) so only the genuinely-unsafe partial-flush case is checked. No hardware or drain-timing change -- the suspected SQ drain inefficiency was a measurement artifact (drain_wait <1.3%; occupancy is by-design registered drain + pessimistic empty-clear). mtimer_stress warnings 1222->0; store_queue unit 46/46, memory_test PASS.
3c30849 added the i_served_addr input and served-window guard, but the cocotb harness never drove the new port, so it defaulted to 0 and the guard mis-fired on cached test PCs -> 10/13 failures (sel_nop forced; BTB never trained). Drive i_served_addr to track pc_reg (offset 0 models the always-valid 1-cycle fetch provider; offset 1 for the F=W+1 lead/parity desync test). Harness-only, no RTL change. Suite 13/13; sibling instruction_aligner 10/10. The offset=0 experiment confirms the served-window guard RTL is sound (catches a real F=W+1 aligner parity desync), so no RTL bug.
The non-SIM_FAST_MAINT (FPGA) cache-maintenance FSM walked all NumLines on every fence.i (~8200 cyc for the 128 KiB/4096-line L1D), so smc_fencei_test (~193 fence.i) overran its cycle budget and timed out. Add a synthesizable dirty-index-range tracker (wb_lo/wb_hi/wb_any), set on the dirty-bit write hooks and cleared on writeback completion, so the writeback-all walks only [lo,hi]. Functionally identical (writeback still ordered before invalidate); the SIM_FAST_MAINT fast path and the L1I invalidate sweep are unchanged. Verified standalone: SIM_FAST_MAINT=0 smc_fencei_test + ddr_smc_test PASS; SIM_FAST_MAINT=1 unaffected; cache unit suite 12/12 on both L2 shapes. ~8200->7 cyc/fence.i. (Item framing was 'stale-code race' -- refuted by observation; this is an efficiency fix.)
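A behavioral sketch of the dirty-index-range tracker follows. The field names wb_lo/wb_hi/wb_any come from the commit message; the class, its methods, and the example indices are illustrative and not the cache RTL.

```python
NUM_LINES = 4096  # 128 KiB / 4096-line L1D, per the commit message


class DirtyRangeTracker:
    """Tracks the lowest and highest dirty line index since the last writeback."""

    def __init__(self):
        self.wb_any = False
        self.wb_lo = 0
        self.wb_hi = 0

    def mark_dirty(self, index):
        """Hook for every dirty-bit write."""
        if not self.wb_any:
            self.wb_any = True
            self.wb_lo = self.wb_hi = index
        else:
            self.wb_lo = min(self.wb_lo, index)
            self.wb_hi = max(self.wb_hi, index)

    def walk(self):
        """Line indices the writeback-all FSM must visit."""
        if not self.wb_any:
            return range(0)
        return range(self.wb_lo, self.wb_hi + 1)

    def writeback_done(self):
        """Cleared on writeback completion."""
        self.wb_any = False


tracker = DirtyRangeTracker()
for line in (130, 128, 133):  # arbitrary example indices
    tracker.mark_dirty(line)
print(len(tracker.walk()), "of", NUM_LINES, "lines walked")
tracker.writeback_done()
```

The range is conservative: clean lines between wb_lo and wb_hi are still visited, so the walk stays correct without per-line bookkeeping.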
The no-MMU M-mode Linux boot hung in the machine-timer IRQ dispatch at ~mc 20.96M. A pending BTB prediction for the halfword-target bgeu@0x8005a19c, while pc_reg sits on its immediate-predecessor load c.lw@0x8005a19a, made the pending-prediction fetch-holdoff squash that load AND the land-on-branch arm jump pc_reg past it, DROPPING the load -> stale a5 -> wrong __irq_resolve_mapping path -> wild jalr. Add an immediate-predecessor carve-out (pending_imm_pred_emit): when a pending prediction's branch is the compressed parcel right after pc_reg, let the older parcel EMIT and advance pc_reg sequentially onto the branch (the documented prediction_metadata_tracker intent), keeping the prediction live. The carve-out must be NARROW. The o_pc_reg+2 condition alone fires ~50k times/boot, mostly at wcs=0 dual-issue load+branch bundles where the load already emits -- and there, clearing sel_nop lets pc_reg_advance_sel_live pick +4 (slot-2) so pc_reg jumps PAST the branch, mishandling the pending prediction and corrupting a later return (stale-ra wild ret, observed at of_prop_next_string 0x8021fcae, ~mc 15.5M). So fire ONLY when the served window cannot deliver the load (raw window_cannot_serve_pc_reg = the true drop condition), and LATCH the engagement (carve_out_engaged_q) through the wcs=0 emit cycle so the served load is not re-NOP'd. Not a pc_reg hold (pim still advances pc_reg) -> cannot deadlock. Verified on linux_boot_128k (genesys2 shape: CACHED_HAS_L2=0, 128KiB L1I, SIM_FAST_MAINT=1): the boot now clears the timer phase and runs healthily into kernel init (devtmpfs) with 0 wild jumps / 0 panics; the revmap_size load EMITs at the gremlin site. UNOPTFLAT=0 (comb-loop clean), isa_test 570/570.
…bble rob_serializer only recognizes a serial head (CSR/FENCE/FENCE.I/WFI/MRET) while !i_flush_en (its SERIAL_IDLE guard), but commit_en did not gate on i_flush_en. During an early-backend / mispredict recovery bubble (i_flush_en=1, a fixed 1-cycle hold) a head FENCE.I therefore retired UNSERIALIZED -- skipping its cache sync (L1D writeback-all + L1I invalidate-all) entirely -- so store-produced code was never published to DDR and a post-fence fetch read stale bytes (the self-modifying-code bug). Gate commit_en (and the mirrored, currently-inert o_head_commit_misprediction_candidate) on !i_flush_en so commit_en is a strict subset of the serializer's IDLE guard: a serial head can no longer retire during the bubble; it commits and serializes once the bubble clears. The bubble never waits on the head committing, so this cannot deadlock, and slot-2 retire is already rooted in commit_en. The pre-existing o_csr_start already gates on !i_flush_en -- commit_en was simply missing the guard its siblings had. Confirmed cycle-exact (fence.i @ PC 0x1aae retired with head_is_fence_i=1, flush_en=1, commit_stall=0, serial_state=IDLE). The directed repro fencei_wide_test failed on baseline (got 0x5cb, want 0x559) and passes with the fix on SIM_FAST_MAINT=0 and =1. isa_test 570/570; smc_fencei/fence_speed/fencei_blob pass; UNOPTFLAT lint clean; gremlin linux_boot_128k capture clean to the 22M-cycle cap.
…lloc)
binfmt_flat passes no ELF auxv, so AT_PAGESZ is absent and uClibc's
_dl_aux_init left _dl_pagesize=0 -> getpagesize()=0 -> malloc rounds its
first heap-extend block to round_up(size,0)=0 -> mmap(0,0)=EINVAL -> the
first malloc() fails. busybox (init/sh) then aborts "out of memory", PID 1
exits, kernel panics ("Attempted to kill init") and the system never
reaches a shell.
Default _dl_pagesize to the compile-time PAGE_SIZE (the documented intended
behaviour, matching the ldso.c dynamic path) when AT_PAGESZ is missing.
Verified: busybox reaches an interactive shell on RV32 no-MMU under both
QEMU and the FROST FPGA. The FROST core was never at fault.
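The failure chain is easy to reproduce in a model. The sketch assumes the common power-of-two round-up idiom and a 4096-byte PAGE_SIZE; neither is quoted from uClibc, and the function names are illustrative.

```python
PAGE_SIZE = 4096  # assumption: typical compile-time page size


def round_up(size, align):
    """Common power-of-two round-up idiom (assumed, not quoted from uClibc)."""
    return (size + align - 1) & ~(align - 1)


def init_pagesize(at_pagesz):
    """Fixed behavior: fall back to PAGE_SIZE when AT_PAGESZ is missing."""
    return at_pagesz if at_pagesz else PAGE_SIZE


request = 100
print(round_up(request, 0))                    # broken: block size collapses
print(round_up(request, init_pagesize(None)))  # fixed: one full page
```

With `align` equal to 0 the mask `~(align - 1)` is zero, so every request rounds to a zero-length block, which is what turns the first heap extension into `mmap(0, 0)`.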
…e external
Make frost_nommu_rv32_defconfig reproduce the image we've been assembling by hand in ~/bigger_l0/linux-mvp, so the pipeline is buildable from a clean checkout / in CI:
- board/frost/busybox.config: minimal base + the bring-up toolkit (vi/find/awk/top/less/tar/devmem/gzip/gunzip, 177 applets); wired via BR2_PACKAGE_BUSYBOX_CONFIG.
- board/frost/device_table.txt: static /dev bootstrap nodes for the initramfs (console/null/zero/kmsg/ttyS0/tty), which an initramfs boot lacks and init's stdio needs before the sysinit devtmpfs mount; wired via BR2_ROOTFS_DEVICE_TABLE.
- board/frost/rootfs-overlay/etc/inittab: clean /sbin/init inittab (devtmpfs mount + getty, no debug markers); wired via BR2_ROOTFS_OVERLAY.
- board/frost/post-image.sh: apply patch_ret_from_exception (M-mode restore-window crutch) to the packed sw_ddr after build_fpga_boot; script vendored into board/frost so the external is self-contained.
- gitignore the generated frost-nommu-fpga.dts/.dtb and sw*.mem/.txt (the committed .dts was a stale standalone-run artifact; nothing reads it).
The uClibc no-MMU page-size fix is already committed under board/frost/patches/uclibc; build_fpga_boot.py already emits clean bootargs.
Vendor Buildroot as a submodule at linux/buildroot pinned to commit 67449130, so the FROST no-MMU M-mode Linux kernel + busybox initramfs + FROST memory images build reproducibly from a clean `git clone --recurse-submodules`, driven by the committed linux/buildroot-external BR2_EXTERNAL tree. Rework sw/apps/linux_boot/Makefile to build from that submodule instead of the out-of-repo ~/bigger_l0/linux-mvp artifacts. Two stages: stage 1 (slow, board-independent) builds the kernel + rootfs via Buildroot and is cached in linux/build; stage 2 (fast, board-dependent) repacks the DDR image for the board clock (FPGA_CPU_CLK_FREQ) and applies the M-mode timer patch. sw_ddr.mem now depends on the kernel Image + packer, killing the stale-image footgun where it never regenerated when the underlying image changed. load_software.py <board> linux_boot self-builds through that Makefile: it sizes the build timeout for the ~30-60 min first build, prints a preflight notice, threads the board clock (genesys2=133.33 MHz, x3=300 MHz), and fails fast with guidance when the submodule or host tools are missing. The Dockerfile gains Buildroot's host deps + QEMU as a late layer -- the single source of truth for the Linux build + boot CI jobs and the self-build. Generated build outputs (linux/build, linux/dl, linux/ccache, the packed images) are gitignored.
Add three jobs to the existing CI workflow (absorbing and deleting the draft
linux-boot-sim.yml -- one workflow, multiple jobs):
build-frost-linux builds the kernel + initramfs + FROST memory images from
the linux/buildroot submodule inside the frost-dev image,
caching Buildroot's dl/ and ccache, then uploads them.
linux-boot-cocotb boots that image on the FROST RTL in cocotb for ~22M
cycles in the genesys2 cache shape (128 KiB L1I, no L2).
That window is silent mem_init after devtmpfs, so it
captures the full boot and check_linux_boot_regression.py
asserts health: banner + devtmpfs, no panic, the periodic
CLINT timer tick serviced (mtimecmp re-armed -- the gremlin
hung here), and retire progress all the way to the cap.
linux-boot-qemu boots the SAME kernel + rootfs to a login prompt under
QEMU -- a fast full-userspace reference.
check_linux_boot_regression.py is the standalone boot-health checker; the
FROST_LINUX_RUN_FULL capture-mode comment documents its CI use.
No description provided.