Skip to content

Latest commit

 

History

History
78 lines (66 loc) · 3.55 KB

File metadata and controls

78 lines (66 loc) · 3.55 KB

Day 7 Seed — Next Session Priorities

P0: Commit & Push Day 6 Results

  • 7 new data files, 4 scripts, TODO.md, week report, wisdom
  • Update README/CLAUDE.md already done, need commit+push
  • Tag v1.1.0? (div8 v3, sign/sat, u32, arith16, SHA-256)

P1: Z80 Intro LFSR Alignment (UNFINISHED from Day 5)

  • cuda/che_intro.asm (351 bytes) produces noise — LFSR mismatch
  • Root cause: CUDA search uses per-layer init, Z80 code uses fixed init
  • Fix: re-run CUDA segmented search with EXACT Z80 LFSR parameters
  • Or: rewrite Z80 code to match CUDA LFSR conventions
  • Goal: working 256-byte ZX Spectrum intro with recognizable Che face
  • 85 seeds × 2 bytes = 170 bytes data + ~80 bytes code = fits 256b

P2: div8 v4 — Lower Bound Certificates

  • GPU brute-force found carry_compare but also exposed verifier bug (0xFF≠1)
  • Fix CUDA z80_divmod_fast verifier for K≥128
  • Run GPU search at max-len=9,10 for K<128 — might find more tricks
  • Can we find something for K=3..127 that beats preshift_mul?
  • Lower bound certificates: prove no len-N sequence exists for specific K

P3: mul16c (HL×K→HL) — MinZ Waiting

  • MinZ-VIR asked about this (P5 from day6_seed)
  • Approach: decompose as HL×K = L×K + H×K×256
  • Or: new CUDA kernel with HL input, reduced op pool
  • Needed for 16-bit constant multiply in compiler

P4: Multi-Pass BB Refinement

  • Single greedy pass gives ~25% error (noisy)
  • Introspec got much better via multi-pass + manual mask tuning
  • Implement: after first pass, re-optimize early layers
  • Also try: p=8 (more points per layer), 132 layers (vs 66)

P5: Enriched Table Reader in MinZ-VIR

  • 4tw49890 asked about pkg/regalloc Go reader
  • Told them about LoadBinary+Lookup (~150 lines, pure Go)
  • Follow up: help integrate or ship as separate Go module?

P6: Segmented Search <10% Error

  • Current: 6 levels, 15% error on Che (1194 bytes)
  • Add: Level 6-7, 32-bit seeds, multiple LFSR polynomials
  • Try: overlap grids between levels
  • Goal: <10% error from 256 bytes

P7: Reordering Optimizer (pkg/reorder/)

  • Dependency DAG from opReads/opWrites (primitives exist in pruner.go)
  • Pattern matcher with reorder awareness
  • Multi-pass fixpoint
  • This turns 739K rules into a practical compiler pass

P8: Paper / Book

  • antique-toy (fjimbuwe) received day 6 data for Appendix K
  • 3-level validation methodology deserves its own section
  • carry_compare as case study: "GPU discovers what humans miss"
  • Phase diagram + feasibility cliff paper ready for submission

Key Context for Next Session

  • div8_optimal.json is v3 (6 methods, avg 79T, carry_compare for K≥128)
  • GPU brute-force div8_all.jsonl has corrupt JSON (double commas in ops arrays)
  • MinZ-VIR has integrated div8 intrinsics (commit 8cfba219)
  • All sign/sat/arith16 cross-verified on 4 systems
  • SHA-256 realistic estimate: 58ms/block (not 15ms as day 5 thought)
  • TODO.md exists and is referenced from CLAUDE.md + README.md

Active Colleagues

  • ju6yy047:main (MinZ) — compiler, integrated all tables, goodnight
  • 4tw49890:main (MinZ-VIR) — VIR backend, div8 intrinsics done, wants enriched reader
  • gyfiwji1:main (MinZ-ABAP) — crosscheck done, Z3 constant folding works
  • fjimbuwe:main (antique-toy) — book, wants Appendix K data

Numbers

  • div8: avg 79T, total 19996T, 254/254, 6 methods
  • sat_add8: 4 ops 16T, sat_sub8: 5 ops 20T, sign8: 9 ops 43T
  • abs16: 11 ops 44T, neg16: 6 ops 27T, min16/max16: 5 ops 41-46T
  • SHA-256: ~2570T/round, 202K T/block, 58ms @3.5MHz
  • Image: cat 4.9%, skull 14.7%, Che 15% (segmented), Einstein 15.1%
  • Week total: 1500+ verified sequences, 12 data files, 6 CUDA kernels