Skip to content

fix(worker/claudecode): split-lock in SendControlRequest + tempFiles race + silent usage unmarshal #541

@hrygo

Description

@hrygo

Background

internal/worker/claudecode/ 是 Claude Code stdio 适配器,负责 Worker 进程管理、stdin/stdout 管道、控制请求(permission response、context usage)、临时文件注入。Phase 2 error-handling + concurrency 分析发现 3 个 Medium 级别问题。Related: issue 502(claudecode readOutput bypasses Mapper), issue 511(worker/base Terminate lock-get-nil pattern)

Scope: error-handling, concurrency — cycle 199 (模块分析通过 2)
Key files: control.go, worker.go, parser.go


Finding Summary

Category Critical High Medium Low
Concurrency 0 0 2 0
Error Handling 0 0 1 0
合计 0 0 3 0

Findings

Concurrency

sendcontrolrequest-split-lock

Severity: Medium | Confidence: High | ROI: Medium
Location: control.go:155-172

Problem: SendControlRequest 在一次调用中 3 次获取/释放 h.mu:注册 pending channel → 写入 stdin → deferred cleanup。register 和 write 之间非原子,理论上另一个 goroutine 可能在中间插入写入。

Current Pattern:

ch := make(chan map[string]any, 1)
h.mu.Lock()
h.pendingRequests[reqID] = ch
h.mu.Unlock()
// ... defer cleanup ...
h.mu.Lock()
_, err = h.stdin.Write(data)
h.mu.Unlock()

Proposed Fix: 合并 register + write 为一次锁获取:

ch := make(chan map[string]any, 1)
h.mu.Lock()
h.pendingRequests[reqID] = ch
_, err = h.stdin.Write(data)
h.mu.Unlock()
if err != nil { /* cleanup + return */ }
defer func() {
    h.mu.Lock()
    delete(h.pendingRequests, reqID)
    h.mu.Unlock()
    select { case <-ch:; default: }
}()

Acceptance Criteria:

  • SendControlRequest register + write 在同一锁获取下完成
  • make test 零回归

tempfiles-no-mutex-protection

Severity: Medium | Confidence: High | ROI: Medium
Location: worker.go:950, worker.go:958-967

Problem: w.tempFileswriteTempFile 中 append(在 w.Mu 保护下),但 cleanupTempFiles 不获取 w.Mu 就遍历并置 nil。如果 Terminate 与 ResetContext 重叠调用,理论上存在竞争。

Current Pattern:

func (w *Worker) cleanupTempFiles() {
    for _, path := range w.tempFiles {
        // ...
    }
    w.tempFiles = nil
}

Proposed Fix:

func (w *Worker) cleanupTempFiles() {
    w.Mu.Lock()
    files := w.tempFiles
    w.tempFiles = nil
    w.Mu.Unlock()
    for _, path := range files {
        // ... os.Remove ...
    }
}

Acceptance Criteria:

  • cleanupTempFilesw.Mu 保护下读取和清空 tempFiles
  • 文件删除在锁释放后执行(避免持锁做 I/O)
  • make test 零回归

Error Handling

parse-result-swallows-unmarshal-errors

Severity: Medium | Confidence: High | ROI: High
Location: parser.go:337-341

Problem: parseResult_ = json.Unmarshal(...) 静默丢弃 msg.Usagemsg.ModelUsage 的反序列化错误。如果 Claude Code 改变 schema,下游统计会静默显示零 cost/tokens,极难诊断。

Current Pattern:

var usage, modelUsage map[string]any
if len(msg.Usage) > 0 {
    _ = json.Unmarshal(msg.Usage, &usage)
}
if len(msg.ModelUsage) > 0 {
    _ = json.Unmarshal(msg.ModelUsage, &modelUsage)
}

Proposed Fix:

var usage, modelUsage map[string]any
if len(msg.Usage) > 0 {
    if err := json.Unmarshal(msg.Usage, &usage); err != nil {
        p.log.Warn("parser: unmarshal usage", "err", err, "raw", string(msg.Usage))
    }
}
if len(msg.ModelUsage) > 0 {
    if err := json.Unmarshal(msg.ModelUsage, &modelUsage); err != nil {
        p.log.Warn("parser: unmarshal model_usage", "err", err, "raw", string(msg.ModelUsage))
    }
}

Acceptance Criteria:

  • usage 和 model_usage 的 unmarshal 失败时输出 Warn 日志
  • 日志包含原始 JSON 内容用于诊断
  • make test 零回归

Implementation Priority

Finding Priority Effort Risk Impact
parse-result-swallows-unmarshal-errors P0 Small Low ~8 行,恢复 usage 可观测性
sendcontrolrequest-split-lock P1 Small Low ~10 行,减少锁获取次数
tempfiles-no-mutex-protection P1 Small Low ~5 行,消除潜在竞争

Recommended starting point: 先修 unmarshal 错误吞咽(P0/High-ROI)


Out of Scope

  • readOutput bypasses Mapper layer(已在 issue 502)
  • Terminate/Kill/Wait lock-get-nil pattern(已在 issue 511)

Verification

  • make test 通过,无回归
  • go test -race ./internal/worker/claudecode/... 无竞争报告

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Medium: tech debt, refactoring, improvementsarchitectureDomain: design patterns, coupling, separation of concerns

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions