|
| 1 | +# ModifierSystem Monthly Tick Performance (Continued) |
| 2 | +**Date**: 2026-02-03 |
| 3 | +**Session**: 4 |
| 4 | +**Status**: ✅ Complete |
| 5 | +**Priority**: High |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Session Goal |
| 10 | + |
| 11 | +**Primary Objective:** |
| 12 | +- Eliminate remaining ~377ms in `GetProvinceModifierFast` and ~3400ms in `MarkCountryProvincesDirty` |
| 13 | + |
| 14 | +**Success Criteria:** |
| 15 | +- Monthly tick runs without late-game stall (user ran past previous stuck point) |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## Context & Background |
| 20 | + |
| 21 | +**Previous Work:** |
| 22 | +- See: [Session 3 — ModifierSystem Performance](3-modifier-system-performance.md) — bitmask, dirty flags, unsafe pointers, reverse index |
| 23 | +- Session 3 reduced 7000ms → ~377ms `GetProvinceModifierFast` + ~3400ms `MarkCountryProvincesDirty` |
| 24 | + |
| 25 | +**Current State (start of session):** |
| 26 | +- `GetProvinceModifierFast`: 377ms total, 299ms in `ApplyModifier`, 277ms in `RebuildIfDirty` |
| 27 | +- `MarkCountryProvincesDirty`: 3400ms, 2917ms in `NativeArray<bool>.set_Item` |
| 28 | + |
| 29 | +**Root Cause Chain:** |
| 30 | +1. `RebuildIfDirty` slow path: `cachedModifierSet.Clear()` zeroes 8KB per province rebuild (~50k provinces × 8KB = 400MB zeroing) |
| 31 | +2. `RebuildIfDirty` slow path: country scope passed as `ScopedModifierContainer?` nullable = 8KB copy per province |
| 32 | +3. `MarkCountryProvincesDirty`: still O(k) per country change (665 × ~75 = ~50k `NativeArray<bool>.set_Item` calls with bounds checking) |
| 33 | +4. `ApplyModifier`: goes through `Get()` → `ModifierValue` → `Apply()` intermediary structs unnecessarily |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## What We Did |
| 38 | + |
| 39 | +### 1. Generation-Based Lazy Province Invalidation |
| 40 | +**Files Changed:** `Core/Modifiers/ModifierSystem.cs` |
| 41 | + |
| 42 | +- Added `NativeArray<uint> countryGeneration` — increments when any country modifier changes (O(1)) |
| 43 | +- Added `NativeArray<uint> provinceLastCountryGeneration` — stores the country gen each province was last built against |
| 44 | +- Added `uint globalGeneration` — increments on global modifier changes |
| 45 | +- `GetProvinceModifierFast` compares `provinceLastCountryGeneration[id] != countryGeneration[countryId]` to detect stale provinces |
| 46 | +- **Eliminated `MarkCountryProvincesDirty` entirely** — no per-province writes when country modifiers change |
| 47 | +- `AddCountryModifier` now just increments `countryGeneration[countryId]` — single `NativeArray<uint>` write |
| 48 | +- `SetCountryProvincesLookup` kept as no-op for API compatibility |
| 49 | +- Cost reduction: O(k) per country change → **O(1)** |
| 50 | + |
| 51 | +### 2. Full Unsafe Pointer Slow Path |
| 52 | +**Files Changed:** `Core/Modifiers/ModifierSystem.cs`, `Core/Modifiers/ScopedModifierContainer.cs` |
| 53 | + |
| 54 | +- `GetProvinceModifierFast` slow path now uses `provinceScopes.GetUnsafePtr()` — writes directly to NativeArray via pointer, no 8KB copy + write-back |
| 55 | +- Country scope accessed via `countryScopes.GetUnsafeReadOnlyPtr()` — no 8KB nullable copy |
| 56 | +- Added `RebuildIfDirtyFromParentPtr(ScopedModifierContainer*)` — takes unsafe pointer to parent, avoids nullable boxing |
| 57 | +- Eliminates: ~50k × 8KB province copy + ~50k × 8KB country nullable copy = **~800MB** memory traffic |
| 58 | + |
| 59 | +### 3. ClearActive() for Sparse Cache Reset |
| 60 | +**Files Changed:** `Core/Modifiers/ModifierSet.cs` |
| 61 | + |
| 62 | +- Added `ClearActive()` — zeros only modifier types tracked in bitmask, then clears mask |
| 63 | +- Typically 2-5 types × 2 longs = 10-20 bytes vs 8KB full clear |
| 64 | +- `RebuildIfDirtyFromParentPtr` uses `ClearActive()` instead of `Clear()` |
| 65 | +- Cost reduction per rebuild: 8KB zeroing → ~20 bytes = **~400x improvement** |
| 66 | + |
| 67 | +### 4. Optimized Clear() and ApplyModifier |
| 68 | +**Files Changed:** `Core/Modifiers/ModifierSet.cs` |
| 69 | + |
| 70 | +- `Clear()` now uses `UnsafeUtility.MemClear` for bulk zeroing (single call vs 1024-iteration loop) |
| 71 | +- `ApplyModifier` inlined: reads raw longs directly, early-outs when both additive and multiplicative are zero, skips multiplication when no multiplicative modifier |
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +## Decisions Made |
| 76 | + |
| 77 | +### Decision 1: Generation Counters over Per-Province Dirty Marking |
| 78 | +**Context:** `MarkCountryProvincesDirty` wrote `NativeArray<bool>` per owned province per country modifier change. Even with reverse index (Session 3), 665 countries × ~75 provinces = ~50k writes with bounds checking = 3400ms. |
| 79 | +**Options:** |
| 80 | +1. Optimize NativeArray write (unsafe bulk set) — still O(k) writes per change |
| 81 | +2. Generation counter per country + last-seen-gen per province — O(1) per change, lazy check on read |
| 82 | +3. Batch dirty marking to once per tick — still O(k), just less frequent |
| 83 | + |
| 84 | +**Decision:** Generation counter (Option 2) |
| 85 | +**Rationale:** Moves cost from write-time (every modifier change) to read-time (only when province is actually queried). Most provinces are never queried between changes. O(1) per modifier change regardless of province count. |
| 86 | + |
| 87 | +### Decision 2: Separate Rebuild Method for Unsafe Parent |
| 88 | +**Context:** `RebuildIfDirty(ScopedModifierContainer?)` copies 8KB via nullable on every call |
| 89 | +**Options:** |
| 90 | +1. Change existing method signature — breaks all callers |
| 91 | +2. Add `RebuildIfDirtyFromParentPtr` overload — hot path only, existing API unchanged |
| 92 | +3. Template/generic approach — C# doesn't support this well for structs |
| 93 | + |
| 94 | +**Decision:** New overload (Option 2) |
| 95 | +**Rationale:** Hot path (`GetProvinceModifierFast`) uses the unsafe overload. Non-hot-path callers (`GetProvinceModifier`, tooltips) keep the safe API. |
| 96 | + |
| 97 | +--- |
| 98 | + |
| 99 | +## What Worked |
| 100 | + |
| 101 | +1. **Generation-based lazy invalidation** |
| 102 | + - Eliminates O(k) write amplification entirely |
| 103 | + - Cost shifts to read-time where it's amortized across queries |
| 104 | + - Pattern: "don't push dirty state, let readers pull freshness" |
| 105 | + |
| 106 | +2. **Full unsafe pointer chain** |
| 107 | + - Province read + write + country read all via pointers = zero struct copies in hot path |
| 108 | + - Rebuilds happen in-place in the NativeArray |
| 109 | + |
| 110 | +3. **Sparse clear (ClearActive)** |
| 111 | + - 8KB → ~20 bytes per province clear when only 2-5 modifier types active |
| 112 | + - Bitmask from Session 3 enables this naturally |
| 113 | + |
| 114 | +--- |
| 115 | + |
| 116 | +## What Didn't Work |
| 117 | + |
| 118 | +### 1. Reverse Index for MarkCountryProvincesDirty (Session 3) |
| 119 | +- **What we tried:** Using province ownership reverse lookup to mark only owned provinces dirty |
| 120 | +- **Why it wasn't enough:** Even marking ~75 provinces per country × 665 countries = ~50k NativeArray writes. `NativeArray<bool>.set_Item` bounds checking overhead made this 3400ms. |
| 121 | +- **Lesson:** O(k) is still too expensive when k × frequency is large. O(1) write + lazy read is the correct pattern for high-frequency modifier changes. |
| 122 | + |
| 123 | +--- |
| 124 | + |
| 125 | +## Problems Encountered & Solutions |
| 126 | + |
| 127 | +### Problem 1: 3400ms in MarkCountryProvincesDirty |
| 128 | +**Symptom:** `TryBuildFarm → AddCountryModifier → MarkCountryProvincesDirty`: 3400ms, 2917ms in `NativeArray.set_Item` |
| 129 | +**Root Cause:** Even with reverse index, ~50k `NativeArray<bool>` writes per monthly tick. Bounds-checked indexer overhead dominates. |
| 130 | +**Solution:** Generation counters eliminate the method entirely. Country modifier change = increment one uint. Province detects staleness on read. |
| 131 | + |
| 132 | +### Problem 2: 277ms in RebuildIfDirty |
| 133 | +**Symptom:** `GetProvinceModifierFast` → `ApplyModifier` → `RebuildIfDirty`: 277ms |
| 134 | +**Root Cause:** Three costs per dirty province rebuild: |
| 135 | + 1. `cachedModifierSet.Clear()` zeroes 8KB (512×2 longs + mask) |
| 136 | + 2. Country scope passed as nullable = 8KB copy |
| 137 | + 3. Province scope copied from NativeArray = 8KB copy + write-back |
| 138 | +**Solution:** `ClearActive()` (sparse clear), unsafe parent pointer (zero copy), unsafe province pointer (in-place rebuild) |
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +## Architecture Impact |
| 143 | + |
| 144 | +### New Pattern: Generation-Based Lazy Invalidation (Pattern 11 Extension) |
| 145 | +- **When to use:** Parent-child relationships where parent changes should invalidate children, but children are queried much less frequently than parents change |
| 146 | +- **Mechanism:** Parent has generation counter (uint, incremented on change). Child stores last-seen parent generation. On child query, compare generations — mismatch means rebuild needed. |
| 147 | +- **Benefits:** O(1) per parent change regardless of child count. Cost deferred to query time. |
| 148 | +- **Trade-off:** Slight overhead on every child query (uint comparison). Worth it when parent changes >> child queries. |
| 149 | + |
| 150 | +### Pattern Reinforced: Unsafe Pointers for Large Struct NativeArrays |
| 151 | +- Session 3 used unsafe for read-only fast path |
| 152 | +- Session 4 extends to read-write (slow path rebuild) via `GetUnsafePtr()` |
| 153 | +- Rule: If struct > ~64 bytes and accessed in loops, always use unsafe pointers |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## Code Quality Notes |
| 158 | + |
| 159 | +### Performance |
| 160 | +- **Before (Session 3 end):** ~3400ms `MarkCountryProvincesDirty` + ~377ms `GetProvinceModifierFast` |
| 161 | +- **After Session 4:** User reports game runs past previous stuck point (monthly tick no longer stalls) |
| 162 | +- **Target:** <100ms monthly tick — appears met based on user testing |
| 163 | + |
| 164 | +### Technical Debt |
| 165 | +- `SetCountryProvincesLookup` is now a no-op — could be removed along with `GameState` wiring, but kept for API stability |
| 166 | +- `GetProvinceModifier` (non-fast path) still copies structs — acceptable for non-hot-path callers (tooltips, etc.) |
| 167 | +- `MarkAllDirty()` still does 8KB struct copy for 4096 countries — acceptable, only called on global modifier change (rare) |
| 168 | +- `RebuildIfDirty` (non-ptr overload) still uses `Clear()` not `ClearActive()` — only used by non-hot-path |
| 169 | + |
| 170 | +--- |
| 171 | + |
| 172 | +## Next Session |
| 173 | + |
| 174 | +### Immediate Next Steps |
| 175 | +1. Profile monthly tick with deep profiler to confirm <100ms target |
| 176 | +2. Look for other hotspots now that ModifierSystem is resolved |
| 177 | +3. Consider removing dead code (`SetCountryProvincesLookup` no-op, reverse index callback fields) |
| 178 | + |
| 179 | +### Questions to Resolve |
| 180 | +1. Are there other systems calling `GetProvinceModifier` (non-fast) that should migrate to `GetProvinceModifierFast`? |
| 181 | +2. Should `MarkAllDirty` also use generation bumps instead of per-province flag writes? |
| 182 | + |
| 183 | +--- |
| 184 | + |
| 185 | +## Quick Reference for Future Claude |
| 186 | + |
| 187 | +**Key implementations:** |
| 188 | +- Generation counters: `Core/Modifiers/ModifierSystem.cs` — `countryGeneration`, `provinceLastCountryGeneration`, `globalGeneration` |
| 189 | +- Unsafe rebuild: `Core/Modifiers/ScopedModifierContainer.cs` — `RebuildIfDirtyFromParentPtr(ScopedModifierContainer*)` |
| 190 | +- Sparse clear: `Core/Modifiers/ModifierSet.cs` — `ClearActive()` (uses bitmask to zero only active types) |
| 191 | +- Inlined apply: `Core/Modifiers/ModifierSet.cs` — `ApplyModifier()` reads raw longs, early-out on zero |
| 192 | +- Bulk clear: `Core/Modifiers/ModifierSet.cs` — `Clear()` uses `UnsafeUtility.MemClear` |
| 193 | +- Full unsafe fast path: `Core/Modifiers/ModifierSystem.cs` — `GetProvinceModifierFast()` uses `GetUnsafePtr()` for both read and write |
| 194 | + |
| 195 | +**Gotchas:** |
| 196 | +- `SetCountryProvincesLookup` is now a no-op — generation counters replaced `MarkCountryProvincesDirty` |
| 197 | +- `provinceLastCountryGeneration` must be updated after province rebuild in `GetProvinceModifierFast` |
| 198 | +- `countryGeneration` must be incremented in ALL country mutation methods: `AddCountryModifier`, `RemoveCountryModifiersBySource`, `ClearCountryModifiers`, `ExpireModifiers`, `LoadState`, `MarkAllDirty` |
| 199 | +- `RebuildIfDirtyFromParentPtr` assumes parent is already clean — always call `EnsureCountryScopeClean` first |
| 200 | +- `ClearActive()` relies on bitmask accuracy — if bitmask gets out of sync, stale values remain |
| 201 | + |
| 202 | +**Files changed this session:** |
| 203 | +- `Core/Modifiers/ModifierSystem.cs` — generation counters, eliminated `MarkCountryProvincesDirty`, full unsafe slow path |
| 204 | +- `Core/Modifiers/ModifierSet.cs` — `ClearActive()`, `UnsafeUtility.MemClear` in `Clear()`, inlined `ApplyModifier` |
| 205 | +- `Core/Modifiers/ScopedModifierContainer.cs` — `RebuildIfDirtyFromParentPtr(ScopedModifierContainer*)` |
| 206 | + |
| 207 | +--- |
| 208 | + |
| 209 | +## Related Sessions |
| 210 | +- [Session 3 — ModifierSystem Performance](3-modifier-system-performance.md) — bitmask, dirty flags, unsafe read path, reverse index |
| 211 | +- [Session 2 — Semaphore Fix](2-semaphore-waitforsignal-fix.md) — GPU sync stall |
| 212 | +- [Session 1 — Runtime Performance](1-runtime-performance-optimization.md) — initial fixes |
0 commit comments