Background
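internal/brain is the module that implements the LLM client decorator chain and intent routing (including the brain/llm subpackage). The Phase 2 resource-mgmt + performance analysis found three unbounded maps without TTL eviction (one of them, CostCalculator.sessions, is already tracked in issue 501), an exclusive lock on the cache read path, and an O(n) copy in a metrics rolling window.
Scope: resource-mgmt, performance (cycle 203, module analysis pass 3)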
Key files: router.go, memory.go, llm/cost.go, llm/ratelimit.go, llm/metrics.go
Related: issue 501 (cost-calculator unbounded map, already tracked in Phase 1), issue 531 (SafetyGuard race)
Finding Summary
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Resource-mgmt | 0 | 0 | 2 | 0 |
| Performance | 0 | 0 | 2 | 0 |
| Total | 0 | 0 | 4 | 0 |
Findings
intent-router-exclusive-lock-on-cache-read
Severity: Medium | Confidence: High | ROI: Medium
Location: router.go:372-387
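Problem: getFromCache is mostly a read (a map lookup), plus LRU bookkeeping (MoveToFront), yet it holds the exclusive Lock() for the whole call instead of using RLock() for the lookup. Every lookup, hit or miss, therefore blocks all concurrent readers and writers. The RWMutex is already declared but its read lock is not used on this hot path.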
Current Pattern:
```go
func (r *IntentRouter) getFromCache(key string) *IntentResult {
	r.cacheMu.Lock() // exclusive lock for READ
	defer r.cacheMu.Unlock()
	result, exists := r.cache[key]
	if !exists {
		return nil
	}
	if elem, ok := r.lruIndex[key]; ok {
		r.lruList.MoveToFront(elem)
	}
	return result
}
```
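Proposed Fix: take RLock for the map lookup and release it (returning immediately on a miss); then acquire Lock separately for the LRU MoveToFront. sync.RWMutex cannot upgrade a read lock to a write lock, so the read lock must be released first.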
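Acceptance Criteria:
- getFromCache uses RLock for the map lookup
- TestIntentRouter_ConcurrentCacheAccess runs under -race with no data races reported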
rate-limiter-unbounded-models-map
Severity: Medium | Confidence: Medium | ROI: Medium
Location: llm/ratelimit.go:33, llm/ratelimit.go:187-208
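Problem: the RateLimiter.models map creates a rate.Limiter for every unique model name but never evicts any of them. If model names are dynamic (user configuration or router responses), the map grows without bound.
Proposed Fix: add a TTL eviction goroutine (consistent with the evictStaleLimiters pattern used for SafetyGuard.userLimiters).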
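Acceptance Criteria:
- lastAccess tracking and TTL eviction
- TestRateLimiter_ModelEviction verifies that expired models are cleaned up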
metrics-latency-ring-buffer-append-copy
Severity: Medium | Confidence: High | ROI: Medium
Location: llm/metrics.go:175-179
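Problem: the requestLatencies rolling window drops its oldest sample with append(slice[1:], val). Once the window is full, each call shrinks the slice's spare capacity by one, and whenever that capacity runs out the append reallocates the backing array and copies the 999 retained float64 values (about 8 KB). All of this happens under mu.Lock. The OTel histogram already records the latency distribution; the local window exists only to serve the GetStats() API.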
Current Pattern:
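```go
if len(mc.requestLatencies) >= mc.maxLatencySamples {
	// Drops the oldest sample; whenever spare capacity runs out, append
	// reallocates and copies the remaining n-1 samples (O(n) copy).
	mc.requestLatencies = append(mc.requestLatencies[1:], latencyMs)
}
```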
Proposed Fix: replace the window with a ring buffer, or serve GetStats() from the OTel histogram data.
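A minimal ring-buffer sketch of the first option, assuming GetStats() only needs order-insensitive statistics such as the mean and a percentile. latencyRing, Add, Snapshot, and summarize are hypothetical names; in metrics.go the buffer would more likely sit under the existing mu than carry its own mutex. Add is O(1) and allocation-free, while the O(n) copy and the sort move to the stats path, which runs only when stats are requested, and the sort happens outside the lock.

```go
package main

import (
	"math"
	"sort"
	"sync"
)

// latencyRing is a fixed-capacity ring buffer of latency samples in ms.
type latencyRing struct {
	mu      sync.Mutex
	samples []float64
	next    int  // slot the next sample will overwrite
	full    bool // true once every slot has been written at least once
}

// newLatencyRing allocates the buffer once; capacity must be > 0.
func newLatencyRing(capacity int) *latencyRing {
	return &latencyRing{samples: make([]float64, capacity)}
}

// Add records a sample, overwriting the oldest one once the buffer is full.
// O(1) with no allocation, so the critical section has constant size.
func (r *latencyRing) Add(ms float64) {
	r.mu.Lock()
	r.samples[r.next] = ms
	r.next = (r.next + 1) % len(r.samples)
	if r.next == 0 {
		r.full = true
	}
	r.mu.Unlock()
}

// Snapshot copies the valid samples (storage order, not arrival order)
// so statistics can be computed without holding the lock.
func (r *latencyRing) Snapshot() []float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	n := r.next
	if r.full {
		n = len(r.samples)
	}
	out := make([]float64, n)
	copy(out, r.samples[:n])
	return out
}

// summarize returns the mean and nearest-rank p95; it sorts its input in place.
func summarize(samples []float64) (mean, p95 float64) {
	if len(samples) == 0 {
		return 0, 0
	}
	sort.Float64s(samples)
	sum := 0.0
	for _, v := range samples {
		sum += v
	}
	idx := int(math.Ceil(0.95*float64(len(samples)))) - 1
	return sum / float64(len(samples)), samples[idx]
}
```

If GetStats() ever needs samples in arrival order, Snapshot can copy samples[next:] followed by samples[:next] when the buffer is full.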
Acceptance Criteria:
memory-manager-unbounded-preferences-map
Severity: Medium | Confidence: High | ROI: High
Location: memory.go:500-503, memory.go:516-524
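Problem: MemoryManager.preferences is a two-level map (userID -> key -> value) whose entries are only ever added, never removed. There is no TTL, no cap on the number of users, and no background cleanup. Unlike SafetyGuard.userLimiters (which has evictStaleLimiters running at a 10-minute interval), MemoryManager has no eviction mechanism at all.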
Current Pattern:
```go
type MemoryManager struct {
	preferences map[string]map[string]string // userID -> key -> value
	prefMu      sync.RWMutex
}

func (m *MemoryManager) RecordUserPreference(userID, key, value string) {
	m.prefMu.Lock()
	defer m.prefMu.Unlock()
	if m.preferences[userID] == nil {
		m.preferences[userID] = make(map[string]string)
	}
	m.preferences[userID][key] = value // only adds, never evicts
}
```
Proposed Fix: add lastAccess tracking and TTL eviction (the same pattern as ContextCompressor's startCleanupDaemon).
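Acceptance Criteria:
- lastAccess map[string]time.Time tracking
- TestMemoryManager_PreferenceEviction verifies TTL cleanup behavior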
Implementation Priority
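| Finding | Priority | Effort | Risk | Impact |
|---|---|---|---|---|
| memory-manager-unbounded | P1 | Small | Low | Prevents memory leaks in deployments with thousands of users |
| intent-router-lock | P1 | Small | Low | Higher cache-hit throughput |
| rate-limiter-unbounded | P2 | Medium | Low | Guards against unbounded growth with dynamic model names |
| metrics-ring-buffer | P2 | Small | Low | Eliminates the O(n) copy |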
Recommended starting point: memory-manager-unbounded + intent-router-lock, since both are P1, small-effort, low-risk changes.
Out of Scope
- CostCalculator.sessions unbounded map (already tracked in issue 501)
- Metrics OTel context.Background() trace break (already tracked in issue 501)
- fmt.Sprintf per-request prompt build (masked by LLM latency, low ROI)
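Verification
- make test passes with no regressions
- make lint produces no new warnings
- go test -race ./internal/brain/... reports no data races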