|
| 1 | +# ModifierSystem Monthly Tick Performance |
| 2 | +**Date**: 2026-02-03 |
| 3 | +**Session**: 3 |
| 4 | +**Status**: 🔄 In Progress |
| 5 | +**Priority**: High |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Session Goal |
| 10 | + |
| 11 | +**Primary Objective:** |
| 12 | +- Eliminate 7000ms `GameState.Update → GC.Alloc` spike on monthly tick in late game (~665 AI countries, ~50k provinces) |
| 13 | + |
| 14 | +**Success Criteria:** |
| 15 | +- Monthly tick under 100ms |
| 16 | +- Zero GC allocations during monthly tick |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## Context & Background |
| 21 | + |
| 22 | +**Previous Work:** |
| 23 | +- See: [Session 2 — Semaphore Fix](2-semaphore-waitforsignal-fix.md) — GPU sync stall fix |
| 24 | +- See: [Session 1 — Runtime Performance](1-runtime-performance-optimization.md) — reverse index, initial modifier fix |
| 25 | + |
| 26 | +**Current State:** |
| 27 | +- Deep profiler showed two hotspots in `EventBus.ProcessEvents → MonthlyTickEvent`: |
| 28 | + - `AISystem.OnMonthlyTick`: 3513ms (TryBuildFarm: 3160ms, TryColonize: 350ms) |
| 29 | + - `EconomySystem.OnMonthlyTick`: 3468ms (CalculateCountryIncome: 3463ms) |
| 30 | + |
| 31 | +**Root Cause Chain:** |
| 32 | +1. `MarkCountryProvincesDirty` marked ALL 65,536 provinces dirty (O(65k) struct writes per call) |
| 33 | +2. Each AI farm build triggered this → 665 countries × 65k = **43M struct writes** |
| 34 | +3. `RebuildIfDirty` iterated 512 `MAX_MODIFIER_TYPES` slots to copy parent modifiers |
| 35 | +4. `ScopedModifierContainer` is ~8KB struct (contains `ModifierSet` with 512×2 fixed longs) — every NativeArray access copies 8KB |
| 36 | +5. `GetProvinceModifier` called 2× per province per country income calc → ~100k calls × 8KB = **800MB struct copying** |
| 37 | + |
| 38 | +--- |
| 39 | + |
| 40 | +## What We Did |
| 41 | + |
| 42 | +### 1. Reverse Index for MarkCountryProvincesDirty |
| 43 | +**Files Changed:** `Core/Modifiers/ModifierSystem.cs`, `Core/GameState.cs` |
| 44 | + |
| 45 | +- Added `Action<ushort, NativeList<ushort>> getCountryProvincesFunc` callback + `NativeList<ushort> countryProvinceBuffer` |
| 46 | +- `SetCountryProvincesLookup()` wired in `GameState.InitializeSystems()` after `Provinces.Initialize()` |
| 47 | +- `MarkCountryProvincesDirty` now marks only owned provinces (~75 avg) instead of all 65k |
| 48 | +- Cost reduction: 65,536 → ~75 struct writes per call = **~870x improvement** |
| 49 | + |
| 50 | +### 2. Bitmask Active Type Tracking in ModifierSet |
| 51 | +**Files Changed:** `Core/Modifiers/ModifierSet.cs`, `Core/Modifiers/ScopedModifierContainer.cs` |
| 52 | + |
| 53 | +- Added `fixed long activeTypeMask[8]` (64 bytes) to `ModifierSet` — tracks which of 512 types have values |
| 54 | +- `Add()` and `Set()` set corresponding bit; `Clear()` zeros the mask |
| 55 | +- Added `CopyActiveToSet(ref ModifierSet target)` using bit iteration (trailing zero count) |
| 56 | +- `ScopedModifierContainer.RebuildIfDirty` now uses `CopyActiveToSet` instead of 512-type loop |
| 57 | +- Cost reduction per rebuild: 512 iterations → ~3 active types = **~170x improvement** |
| 58 | + |
| 59 | +### 3. Separate Dirty Flag Arrays (Avoid 8KB Struct Copy) |
| 60 | +**Files Changed:** `Core/Modifiers/ModifierSystem.cs` |
| 61 | + |
| 62 | +- Added `NativeArray<bool> provinceDirtyFlags` and `countryDirtyFlags` |
| 63 | +- Kept in sync with `ScopedModifierContainer.isDirty` across all mutation methods |
| 64 | +- `MarkCountryProvincesDirty` now sets only 1-byte flags, no 8KB struct read-modify-write |
| 65 | +- `MarkAllDirty` skips province struct access entirely, only sets flag array |
| 66 | + |
| 67 | +### 4. GetProvinceModifierFast with Unsafe Pointer Access |
| 68 | +**Files Changed:** `Core/Modifiers/ModifierSystem.cs`, `Core/Modifiers/ScopedModifierContainer.cs`, `StarterKit/Systems/EconomySystem.cs` |
| 69 | + |
| 70 | +- `EnsureCountryScopeClean(countryId)` — rebuild country scope once before batch queries |
| 71 | +- `GetProvinceModifierFast()` — checks `provinceDirtyFlags[id]` (1-byte) before accessing struct: |
| 72 | + - **Clean (common case):** Unsafe pointer into NativeArray → `ApplyModifierFromCache()` — **zero struct copy** |
| 73 | + - **Dirty (rare):** Full struct copy + rebuild + write-back (same as before) |
| 74 | +- `ScopedModifierContainer` gained `IsDirty` property and `ApplyModifierFromCache()` |
| 75 | +- `EconomySystem.CalculateCountryIncome` calls `EnsureCountryScopeClean` once, then `GetProvinceModifierFast` per province |
| 76 | + |
| 77 | +### 5. Prior Session Fixes (from context summary) |
| 78 | +**Files Changed:** `StarterKit/Systems/AISystem.cs`, `StarterKit/Systems/EconomySystem.cs`, `StarterKit/Systems/ProvinceHistorySystem.cs`, `Core/Commands/CommandProcessor.cs`, `Core/Commands/ICommand.cs`, `StarterKit/Commands/ColonizeCommand.cs` |
| 79 | + |
| 80 | +- AISystem: replaced `ProvinceQueryBuilder` allocations with pre-allocated `NativeList<ushort>` buffers |
| 81 | +- EconomySystem: added pre-allocated `provinceBuffer` for `GetCountryProvinces` |
| 82 | +- ProvinceHistorySystem: cached `TimeManager` reference (was `FindFirstObjectByType` per event) |
| 83 | +- CommandProcessor: replaced reflection (`GetType().GetMethod()` + `method.Invoke()`) with `ICommandMessages` interface |
| 84 | +- ColonizeCommand: removed `LogExecution` string allocation, direct field access instead of `GetComponent` |
| 85 | + |
| 86 | +--- |
| 87 | + |
| 88 | +## Decisions Made |
| 89 | + |
| 90 | +### Decision 1: Callback Pattern for Cross-Layer Lookup |
| 91 | +**Context:** ModifierSystem (Core) needs province ownership data (ProvinceSystem) for `MarkCountryProvincesDirty` |
| 92 | +**Options:** |
| 93 | +1. Direct reference to ProvinceSystem — violates Core layer independence |
| 94 | +2. Callback/delegate set after init — mechanism-only, no import dependency |
| 95 | +3. Event-based — too complex for a sync query |
| 96 | + |
| 97 | +**Decision:** Callback (Option 2) |
| 98 | +**Rationale:** Engine provides mechanism (callback registration), game wires it. ProvinceSystem doesn't exist at ModifierSystem construction time anyway. |
| 99 | + |
| 100 | +### Decision 2: Unsafe Pointer for Hot Path |
| 101 | +**Context:** Even reading `NativeArray[i]` copies the entire 8KB struct |
| 102 | +**Options:** |
| 103 | +1. Restructure data to avoid large structs in NativeArrays — massive refactor |
| 104 | +2. Unsafe pointer access for read-only fast path — targeted fix |
| 105 | +3. Separate small query cache alongside NativeArray — memory overhead |
| 106 | + |
| 107 | +**Decision:** Unsafe pointer (Option 2) |
| 108 | +**Rationale:** `ScopedModifierContainer` is fully unmanaged (NativeArrays + fixed arrays + bool). Unsafe read is safe and zero-copy. Only used on clean provinces (common case). |
| 109 | + |
| 110 | +### Decision 3: Separate Dirty Flag Arrays |
| 111 | +**Context:** Checking `isDirty` required copying 8KB struct from NativeArray |
| 112 | +**Options:** |
| 113 | +1. Move `isDirty` out of struct — breaks encapsulation |
| 114 | +2. Parallel `NativeArray<bool>` kept in sync — small overhead, clean separation |
| 115 | +3. Bit-pack dirty flags — complex, minimal memory benefit over bool array |
| 116 | + |
| 117 | +**Decision:** Parallel bool array (Option 2) |
| 118 | +**Rationale:** 65KB for provinces + 4KB for countries is trivial. Keeping in sync requires discipline but all mutation goes through ModifierSystem methods. |
| 119 | + |
| 120 | +--- |
| 121 | + |
| 122 | +## What Worked |
| 123 | + |
| 124 | +1. **Profiler-driven investigation** |
| 125 | + - Deep profile breakdown pinpointed exact methods and call counts |
| 126 | + - Each fix targeted a measured bottleneck, not guesswork |
| 127 | + |
| 128 | +2. **Bitmask for sparse iteration** |
| 129 | + - 512 modifier types but only ~3 active → bitmask skips 99.4% of iterations |
| 130 | + - Trailing zero count gives O(k) where k = active types |
| 131 | + |
| 132 | +3. **Separate dirty tracking to avoid struct copies** |
| 133 | + - 1-byte dirty check vs 8KB struct copy = 8000x less memory traffic on fast path |
| 134 | + |
| 135 | +--- |
| 136 | + |
| 137 | +## What Didn't Work |
| 138 | + |
| 139 | +### 1. Frame-Batching AI Processing (Prior Session) |
| 140 | +- **What we tried:** Processing 50 AI countries per frame instead of all 665 at once |
| 141 | +- **Why it failed:** Total cost identical, just spread over more frames. User correctly identified: "did you actually fix anything or just spread stuff" |
| 142 | +- **Lesson:** Batching hides latency, doesn't fix throughput. Fix the algorithmic cost first. |
| 143 | + |
| 144 | +--- |
| 145 | + |
| 146 | +## Problems Encountered & Solutions |
| 147 | + |
| 148 | +### Problem 1: 43M Struct Writes per Monthly Tick |
| 149 | +**Symptom:** `TryBuildFarm` 3160ms — 9x slower than `TryColonize` despite simpler logic |
| 150 | +**Root Cause:** `BuildingSystem.ApplyBuildingModifiers` → `AddCountryModifier` → `MarkCountryProvincesDirty` marked ALL 65k provinces dirty per farm build. ~665 AI countries × 65k = 43M struct writes. |
| 151 | +**Solution:** Reverse index lookup via callback — only marks ~75 owned provinces per country. |
| 152 | + |
| 153 | +### Problem 2: 800MB Struct Copying in Income Calculation |
| 154 | +**Symptom:** `CalculateCountryIncome` 3463ms despite pre-allocated buffers |
| 155 | +**Root Cause:** `GetProvinceModifier` copies 8KB `ScopedModifierContainer` from NativeArray on every call. 2 calls × ~50k provinces = 100k copies × 8KB = 800MB. |
| 156 | +**Solution:** Separate dirty flag array + unsafe pointer read for clean provinces (zero copy). |
| 157 | + |
| 158 | +### Problem 3: RebuildIfDirty Iterating 512 Types |
| 159 | +**Symptom:** Even after dirty fix, `ApplyModifier` still 376ms |
| 160 | +**Root Cause:** Parent scope copy loop iterated all 512 `MAX_MODIFIER_TYPES` slots. With ~50k province rebuilds × 512 = 25M iterations. |
| 161 | +**Solution:** Bitmask tracking active types. `CopyActiveToSet` iterates only active types via trailing zero count. |
| 162 | + |
| 163 | +--- |
| 164 | + |
| 165 | +## Architecture Impact |
| 166 | + |
| 167 | +### New Anti-Pattern: Large Structs in NativeArray for Frequent Access |
| 168 | +- **What not to do:** Store >1KB structs in NativeArray and access them in tight loops |
| 169 | +- **Why it's bad:** Every `array[i]` copies the entire struct. 8KB × 100k accesses = 800MB traffic. |
| 170 | +- **Rule:** For hot-path access, either use unsafe pointers or keep frequently-checked flags in a separate small array. |
| 171 | + |
| 172 | +### Pattern Reinforced: Separate Dirty Tracking (Pattern 11 Extension) |
| 173 | +- When dirty flag is inside a large struct, checking it costs as much as the full access |
| 174 | +- Maintain parallel `NativeArray<bool>` for dirty state when struct size > ~64 bytes |
| 175 | +- Keep in sync through the system's mutation API (single point of control) |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## Code Quality Notes |
| 180 | + |
| 181 | +### Performance |
| 182 | +- **Before:** ~7000ms monthly tick (3500ms AI + 3500ms Economy) |
| 183 | +- **After fix 1 (reverse index + bitmask):** ~600ms `GetProvinceModifier` |
| 184 | +- **After fix 2 (dirty flags + unsafe + EnsureCountryScopeClean):** Improved, measuring ongoing |
| 185 | +- **Target:** <100ms monthly tick |
| 186 | + |
| 187 | +### Technical Debt |
| 188 | +- `MarkAllDirty()` still does 8KB struct copy for 4096 countries (acceptable, rare operation) |
| 189 | +- `GetProvinceModifier` (non-fast path) still copies structs — only `GetProvinceModifierFast` is optimized |
| 190 | +- `ExpireModifiers` iterates all 65k provinces — should use sparse tracking (no temp modifiers currently) |
| 191 | + |
| 192 | +--- |
| 193 | + |
| 194 | +## Next Session |
| 195 | + |
| 196 | +### Immediate Next Steps |
| 197 | +1. Profile remaining hotspots after current fixes — measure actual improvement |
| 198 | +2. If still slow: consider moving `ModifierSet` out of `ScopedModifierContainer` into separate NativeArray for cache-friendly access |
| 199 | +3. Verify province ownership changes still work correctly after all modifier changes |
| 200 | + |
| 201 | +### Questions to Resolve |
| 202 | +1. Is `GetProvinceModifierFast` sufficient or does the original `GetProvinceModifier` also need optimization? |
| 203 | +2. Are there other callers of `GetProvinceModifier` outside `EconomySystem` that need the fast path? |
| 204 | + |
| 205 | +--- |
| 206 | + |
| 207 | +## Quick Reference for Future Claude |
| 208 | + |
| 209 | +**Key implementations:** |
| 210 | +- Reverse index callback: `Core/Modifiers/ModifierSystem.cs` — `SetCountryProvincesLookup()`, `MarkCountryProvincesDirty()` |
| 211 | +- Bitmask iteration: `Core/Modifiers/ModifierSet.cs` — `activeTypeMask`, `CopyActiveToSet()` |
| 212 | +- Fast province query: `Core/Modifiers/ModifierSystem.cs` — `GetProvinceModifierFast()`, `EnsureCountryScopeClean()` |
| 213 | +- Dirty flag arrays: `Core/Modifiers/ModifierSystem.cs` — `provinceDirtyFlags`, `countryDirtyFlags` |
| 214 | +- Unsafe pointer access: `Core/Modifiers/ModifierSystem.cs:327` — `provinceScopes.GetUnsafeReadOnlyPtr()` |
| 215 | +- Wiring: `Core/GameState.cs:209-211` — `Modifiers.SetCountryProvincesLookup()` |
| 216 | + |
| 217 | +**Gotchas:** |
| 218 | +- `ScopedModifierContainer` is ~8KB — NEVER access from NativeArray in tight loops without unsafe |
| 219 | +- `provinceDirtyFlags` must be kept in sync with struct's `isDirty` across ALL mutation methods |
| 220 | +- `GetProvinceModifierFast` slow path must call `MarkDirty()` on struct before `RebuildIfDirty` (flag array doesn't set struct's field) |
| 221 | +- `MarkCountryProvincesDirty` with reverse index only sets flag array, NOT struct — `GetProvinceModifierFast` handles the sync on access |
| 222 | + |
| 223 | +**Files changed this session:** |
| 224 | +- `Core/Modifiers/ModifierSystem.cs` — reverse index callback, dirty flag arrays, fast query path, unsafe access |
| 225 | +- `Core/Modifiers/ModifierSet.cs` — bitmask active type tracking, `CopyActiveToSet()`, `BitCount` helper |
| 226 | +- `Core/Modifiers/ScopedModifierContainer.cs` — `IsDirty`, `ApplyModifierFromCache()`, bitmask-based `RebuildIfDirty` |
| 227 | +- `Core/GameState.cs` — wiring `SetCountryProvincesLookup` after province init |
| 228 | +- `StarterKit/Systems/EconomySystem.cs` — `EnsureCountryScopeClean` + `GetProvinceModifierFast` in income calc |
| 229 | +- `StarterKit/Systems/AISystem.cs` — pre-allocated NativeList buffers (prior session, carried forward) |
| 230 | +- `StarterKit/Systems/ProvinceHistorySystem.cs` — cached TimeManager (prior session) |
| 231 | +- `Core/Commands/CommandProcessor.cs` — ICommandMessages interface (prior session) |
| 232 | +- `Core/Commands/ICommand.cs` — ICommandMessages interface (prior session) |
| 233 | + |
| 234 | +--- |
| 235 | + |
| 236 | +## Related Sessions |
| 237 | +- [Session 2 — Semaphore Fix](2-semaphore-waitforsignal-fix.md) — GPU sync stall |
| 238 | +- [Session 1 — Runtime Performance](1-runtime-performance-optimization.md) — reverse index, initial fixes |
0 commit comments