Skip to content

Commit f4d72ac

Browse files
Eliminate MarkCountryProvincesDirty with generation-based lazy invalidation
Replaces O(k) per-province dirty writes with O(1) country generation counter: - countryGeneration increments on modifier change, provinces detect stale gen on query - Full unsafe pointer chain in GetProvinceModifierFast slow path (no 8KB copies) - RebuildIfDirtyFromParentPtr takes unsafe parent pointer, avoids nullable boxing - ClearActive() zeros only bitmask-tracked types (~20 bytes vs 8KB full clear) - ApplyModifier inlined with early-out for zero modifiers - Clear() uses UnsafeUtility.MemClear for bulk zeroing Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 0dbdb3c commit f4d72ac

4 files changed

Lines changed: 341 additions & 81 deletions

File tree

Lines changed: 212 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,212 @@
1+
# ModifierSystem Monthly Tick Performance (Continued)
2+
**Date**: 2026-02-03
3+
**Session**: 4
4+
**Status**: ✅ Complete
5+
**Priority**: High
6+
7+
---
8+
9+
## Session Goal
10+
11+
**Primary Objective:**
12+
- Eliminate remaining ~377ms in `GetProvinceModifierFast` and ~3400ms in `MarkCountryProvincesDirty`
13+
14+
**Success Criteria:**
15+
- Monthly tick runs without late-game stall (user ran past previous stuck point)
16+
17+
---
18+
19+
## Context & Background
20+
21+
**Previous Work:**
22+
- See: [Session 3 — ModifierSystem Performance](3-modifier-system-performance.md) — bitmask, dirty flags, unsafe pointers, reverse index
23+
- Session 3 reduced 7000ms → ~377ms `GetProvinceModifierFast` + ~3400ms `MarkCountryProvincesDirty`
24+
25+
**Current State (start of session):**
26+
- `GetProvinceModifierFast`: 377ms total, 299ms in `ApplyModifier`, 277ms in `RebuildIfDirty`
27+
- `MarkCountryProvincesDirty`: 3400ms, 2917ms in `NativeArray<bool>.set_Item`
28+
29+
**Root Cause Chain:**
30+
1. `RebuildIfDirty` slow path: `cachedModifierSet.Clear()` zeroes 8KB per province rebuild (~50k provinces × 8KB = 400MB zeroing)
31+
2. `RebuildIfDirty` slow path: country scope passed as `ScopedModifierContainer?` nullable = 8KB copy per province
32+
3. `MarkCountryProvincesDirty`: still O(k) per country change (665 × ~75 = ~50k `NativeArray<bool>.set_Item` calls with bounds checking)
33+
4. `ApplyModifier`: goes through `Get()``ModifierValue``Apply()` intermediary structs unnecessarily
34+
35+
---
36+
37+
## What We Did
38+
39+
### 1. Generation-Based Lazy Province Invalidation
40+
**Files Changed:** `Core/Modifiers/ModifierSystem.cs`
41+
42+
- Added `NativeArray<uint> countryGeneration` — increments when any country modifier changes (O(1))
43+
- Added `NativeArray<uint> provinceLastCountryGeneration` — stores the country gen each province was last built against
44+
- Added `uint globalGeneration` — increments on global modifier changes
45+
- `GetProvinceModifierFast` compares `provinceLastCountryGeneration[id] != countryGeneration[countryId]` to detect stale provinces
46+
- **Eliminated `MarkCountryProvincesDirty` entirely** — no per-province writes when country modifiers change
47+
- `AddCountryModifier` now just increments `countryGeneration[countryId]` — single `NativeArray<uint>` write
48+
- `SetCountryProvincesLookup` kept as no-op for API compatibility
49+
- Cost reduction: O(k) per country change → **O(1)**
50+
51+
### 2. Full Unsafe Pointer Slow Path
52+
**Files Changed:** `Core/Modifiers/ModifierSystem.cs`, `Core/Modifiers/ScopedModifierContainer.cs`
53+
54+
- `GetProvinceModifierFast` slow path now uses `provinceScopes.GetUnsafePtr()` — writes directly to NativeArray via pointer, no 8KB copy + write-back
55+
- Country scope accessed via `countryScopes.GetUnsafeReadOnlyPtr()` — no 8KB nullable copy
56+
- Added `RebuildIfDirtyFromParentPtr(ScopedModifierContainer*)` — takes unsafe pointer to parent, avoids nullable boxing
57+
- Eliminates: ~50k × 8KB province copy + ~50k × 8KB country nullable copy = **~800MB** memory traffic
58+
59+
### 3. ClearActive() for Sparse Cache Reset
60+
**Files Changed:** `Core/Modifiers/ModifierSet.cs`
61+
62+
- Added `ClearActive()` — zeros only modifier types tracked in bitmask, then clears mask
63+
- Typically 2-5 types × 2 longs = 10-20 bytes vs 8KB full clear
64+
- `RebuildIfDirtyFromParentPtr` uses `ClearActive()` instead of `Clear()`
65+
- Cost reduction per rebuild: 8KB zeroing → ~20 bytes = **~400x improvement**
66+
67+
### 4. Optimized Clear() and ApplyModifier
68+
**Files Changed:** `Core/Modifiers/ModifierSet.cs`
69+
70+
- `Clear()` now uses `UnsafeUtility.MemClear` for bulk zeroing (single call vs 1024-iteration loop)
71+
- `ApplyModifier` inlined: reads raw longs directly, early-outs when both additive and multiplicative are zero, skips multiplication when no multiplicative modifier
72+
73+
---
74+
75+
## Decisions Made
76+
77+
### Decision 1: Generation Counters over Per-Province Dirty Marking
78+
**Context:** `MarkCountryProvincesDirty` wrote `NativeArray<bool>` per owned province per country modifier change. Even with reverse index (Session 3), 665 countries × ~75 provinces = ~50k writes with bounds checking = 3400ms.
79+
**Options:**
80+
1. Optimize NativeArray write (unsafe bulk set) — still O(k) writes per change
81+
2. Generation counter per country + last-seen-gen per province — O(1) per change, lazy check on read
82+
3. Batch dirty marking to once per tick — still O(k), just less frequent
83+
84+
**Decision:** Generation counter (Option 2)
85+
**Rationale:** Moves cost from write-time (every modifier change) to read-time (only when province is actually queried). Most provinces are never queried between changes. O(1) per modifier change regardless of province count.
86+
87+
### Decision 2: Separate Rebuild Method for Unsafe Parent
88+
**Context:** `RebuildIfDirty(ScopedModifierContainer?)` copies 8KB via nullable on every call
89+
**Options:**
90+
1. Change existing method signature — breaks all callers
91+
2. Add `RebuildIfDirtyFromParentPtr` overload — hot path only, existing API unchanged
92+
3. Template/generic approach — C# doesn't support this well for structs
93+
94+
**Decision:** New overload (Option 2)
95+
**Rationale:** Hot path (`GetProvinceModifierFast`) uses the unsafe overload. Non-hot-path callers (`GetProvinceModifier`, tooltips) keep the safe API.
96+
97+
---
98+
99+
## What Worked
100+
101+
1. **Generation-based lazy invalidation**
102+
- Eliminates O(k) write amplification entirely
103+
- Cost shifts to read-time where it's amortized across queries
104+
- Pattern: "don't push dirty state, let readers pull freshness"
105+
106+
2. **Full unsafe pointer chain**
107+
- Province read + write + country read all via pointers = zero struct copies in hot path
108+
- Rebuilds happen in-place in the NativeArray
109+
110+
3. **Sparse clear (ClearActive)**
111+
- 8KB → ~20 bytes per province clear when only 2-5 modifier types active
112+
- Bitmask from Session 3 enables this naturally
113+
114+
---
115+
116+
## What Didn't Work
117+
118+
### 1. Reverse Index for MarkCountryProvincesDirty (Session 3)
119+
- **What we tried:** Using province ownership reverse lookup to mark only owned provinces dirty
120+
- **Why it wasn't enough:** Even marking ~75 provinces per country × 665 countries = ~50k NativeArray writes. `NativeArray<bool>.set_Item` bounds checking overhead made this 3400ms.
121+
- **Lesson:** O(k) is still too expensive when k × frequency is large. O(1) write + lazy read is the correct pattern for high-frequency modifier changes.
122+
123+
---
124+
125+
## Problems Encountered & Solutions
126+
127+
### Problem 1: 3400ms in MarkCountryProvincesDirty
128+
**Symptom:** `TryBuildFarm → AddCountryModifier → MarkCountryProvincesDirty`: 3400ms, 2917ms in `NativeArray.set_Item`
129+
**Root Cause:** Even with reverse index, ~50k `NativeArray<bool>` writes per monthly tick. Bounds-checked indexer overhead dominates.
130+
**Solution:** Generation counters eliminate the method entirely. Country modifier change = increment one uint. Province detects staleness on read.
131+
132+
### Problem 2: 277ms in RebuildIfDirty
133+
**Symptom:** `GetProvinceModifierFast``ApplyModifier``RebuildIfDirty`: 277ms
134+
**Root Cause:** Three costs per dirty province rebuild:
135+
1. `cachedModifierSet.Clear()` zeroes 8KB (512×2 longs + mask)
136+
2. Country scope passed as nullable = 8KB copy
137+
3. Province scope copied from NativeArray = 8KB copy + write-back
138+
**Solution:** `ClearActive()` (sparse clear), unsafe parent pointer (zero copy), unsafe province pointer (in-place rebuild)
139+
140+
---
141+
142+
## Architecture Impact
143+
144+
### New Pattern: Generation-Based Lazy Invalidation (Pattern 11 Extension)
145+
- **When to use:** Parent-child relationships where parent changes should invalidate children, but children are queried much less frequently than parents change
146+
- **Mechanism:** Parent has generation counter (uint, incremented on change). Child stores last-seen parent generation. On child query, compare generations — mismatch means rebuild needed.
147+
- **Benefits:** O(1) per parent change regardless of child count. Cost deferred to query time.
148+
- **Trade-off:** Slight overhead on every child query (uint comparison). Worth it when parent changes >> child queries.
149+
150+
### Pattern Reinforced: Unsafe Pointers for Large Struct NativeArrays
151+
- Session 3 used unsafe for read-only fast path
152+
- Session 4 extends to read-write (slow path rebuild) via `GetUnsafePtr()`
153+
- Rule: If struct > ~64 bytes and accessed in loops, always use unsafe pointers
154+
155+
---
156+
157+
## Code Quality Notes
158+
159+
### Performance
160+
- **Before (Session 3 end):** ~3400ms `MarkCountryProvincesDirty` + ~377ms `GetProvinceModifierFast`
161+
- **After Session 4:** User reports game runs past previous stuck point (monthly tick no longer stalls)
162+
- **Target:** <100ms monthly tick — appears met based on user testing
163+
164+
### Technical Debt
165+
- `SetCountryProvincesLookup` is now a no-op — could be removed along with `GameState` wiring, but kept for API stability
166+
- `GetProvinceModifier` (non-fast path) still copies structs — acceptable for non-hot-path callers (tooltips, etc.)
167+
- `MarkAllDirty()` still does 8KB struct copy for 4096 countries — acceptable, only called on global modifier change (rare)
168+
- `RebuildIfDirty` (non-ptr overload) still uses `Clear()` not `ClearActive()` — only used by non-hot-path
169+
170+
---
171+
172+
## Next Session
173+
174+
### Immediate Next Steps
175+
1. Profile monthly tick with deep profiler to confirm <100ms target
176+
2. Look for other hotspots now that ModifierSystem is resolved
177+
3. Consider removing dead code (`SetCountryProvincesLookup` no-op, reverse index callback fields)
178+
179+
### Questions to Resolve
180+
1. Are there other systems calling `GetProvinceModifier` (non-fast) that should migrate to `GetProvinceModifierFast`?
181+
2. Should `MarkAllDirty` also use generation bumps instead of per-province flag writes?
182+
183+
---
184+
185+
## Quick Reference for Future Claude
186+
187+
**Key implementations:**
188+
- Generation counters: `Core/Modifiers/ModifierSystem.cs``countryGeneration`, `provinceLastCountryGeneration`, `globalGeneration`
189+
- Unsafe rebuild: `Core/Modifiers/ScopedModifierContainer.cs``RebuildIfDirtyFromParentPtr(ScopedModifierContainer*)`
190+
- Sparse clear: `Core/Modifiers/ModifierSet.cs``ClearActive()` (uses bitmask to zero only active types)
191+
- Inlined apply: `Core/Modifiers/ModifierSet.cs``ApplyModifier()` reads raw longs, early-out on zero
192+
- Bulk clear: `Core/Modifiers/ModifierSet.cs``Clear()` uses `UnsafeUtility.MemClear`
193+
- Full unsafe fast path: `Core/Modifiers/ModifierSystem.cs``GetProvinceModifierFast()` uses `GetUnsafePtr()` for both read and write
194+
195+
**Gotchas:**
196+
- `SetCountryProvincesLookup` is now a no-op — generation counters replaced `MarkCountryProvincesDirty`
197+
- `provinceLastCountryGeneration` must be updated after province rebuild in `GetProvinceModifierFast`
198+
- `countryGeneration` must be incremented in ALL country mutation methods: `AddCountryModifier`, `RemoveCountryModifiersBySource`, `ClearCountryModifiers`, `ExpireModifiers`, `LoadState`, `MarkAllDirty`
199+
- `RebuildIfDirtyFromParentPtr` assumes parent is already clean — always call `EnsureCountryScopeClean` first
200+
- `ClearActive()` relies on bitmask accuracy — if bitmask gets out of sync, stale values remain
201+
202+
**Files changed this session:**
203+
- `Core/Modifiers/ModifierSystem.cs` — generation counters, eliminated `MarkCountryProvincesDirty`, full unsafe slow path
204+
- `Core/Modifiers/ModifierSet.cs``ClearActive()`, `UnsafeUtility.MemClear` in `Clear()`, inlined `ApplyModifier`
205+
- `Core/Modifiers/ScopedModifierContainer.cs``RebuildIfDirtyFromParentPtr(ScopedModifierContainer*)`
206+
207+
---
208+
209+
## Related Sessions
210+
- [Session 3 — ModifierSystem Performance](3-modifier-system-performance.md) — bitmask, dirty flags, unsafe read path, reverse index
211+
- [Session 2 — Semaphore Fix](2-semaphore-waitforsignal-fix.md) — GPU sync stall
212+
- [Session 1 — Runtime Performance](1-runtime-performance-optimization.md) — initial fixes

Scripts/Core/Modifiers/ModifierSet.cs

Lines changed: 45 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
using System.Runtime.InteropServices;
2+
using Unity.Collections.LowLevel.Unsafe;
23
using Core.Data;
34

45
namespace Core.Modifiers
@@ -101,23 +102,13 @@ public void Set(ushort modifierTypeId, FixedPoint64 value, bool isMultiplicative
101102

102103
/// <summary>
103104
/// Clear all modifiers (reset to zero)
104-
/// Vectorized for performance
105+
/// Uses UnsafeUtility.MemClear for fast bulk zeroing
105106
/// </summary>
106107
public void Clear()
107108
{
108-
fixed (long* add = additive, mult = multiplicative, mask = activeTypeMask)
109+
fixed (long* add = additive)
109110
{
110-
// Zero out both arrays
111-
for (int i = 0; i < MAX_MODIFIER_TYPES; i++)
112-
{
113-
add[i] = 0L;
114-
mult[i] = 0L;
115-
}
116-
// Zero out bitmask
117-
for (int i = 0; i < BITMASK_LONGS; i++)
118-
{
119-
mask[i] = 0L;
120-
}
111+
UnsafeUtility.MemClear(add, MAX_MODIFIER_TYPES * sizeof(long) * 2 + BITMASK_LONGS * sizeof(long));
121112
}
122113
}
123114

@@ -127,8 +118,47 @@ public void Clear()
127118
/// </summary>
128119
public FixedPoint64 ApplyModifier(ushort modifierTypeId, FixedPoint64 baseValue)
129120
{
130-
var mod = Get(modifierTypeId);
131-
return mod.Apply(baseValue);
121+
if (modifierTypeId >= MAX_MODIFIER_TYPES)
122+
return baseValue;
123+
124+
long addRaw = additive[modifierTypeId];
125+
long multRaw = multiplicative[modifierTypeId];
126+
127+
// Fast path: no modifiers active for this type
128+
if (addRaw == 0L && multRaw == 0L)
129+
return baseValue;
130+
131+
// (base + additive) * (1 + multiplicative)
132+
FixedPoint64 result = baseValue + FixedPoint64.FromRaw(addRaw);
133+
if (multRaw != 0L)
134+
result = result * (FixedPoint64.One + FixedPoint64.FromRaw(multRaw));
135+
return result;
136+
}
137+
138+
/// <summary>
139+
/// Clear only active modifier types using bitmask, then zero the mask.
140+
/// Much faster than full Clear() when only 2-5 types are active out of 512.
141+
/// </summary>
142+
public void ClearActive()
143+
{
144+
fixed (long* add = additive, mult = multiplicative, mask = activeTypeMask)
145+
{
146+
for (int wordIdx = 0; wordIdx < BITMASK_LONGS; wordIdx++)
147+
{
148+
long word = mask[wordIdx];
149+
if (word == 0L) continue;
150+
151+
while (word != 0)
152+
{
153+
int bitIdx = BitCount.TrailingZeroCount(word);
154+
int typeId = (wordIdx << 6) | bitIdx;
155+
add[typeId] = 0L;
156+
mult[typeId] = 0L;
157+
word &= word - 1;
158+
}
159+
mask[wordIdx] = 0L;
160+
}
161+
}
132162
}
133163

134164
/// <summary>

0 commit comments

Comments
 (0)