
fix(messaging/tts): fallback close error chain lost + moss cold-start mutex contention #540

@hrygo


Background

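internal/messaging/tts/ is the text-to-speech (TTS) module (Edge-TTS + MOSS). It contains the Synthesizer interface, FallbackSynthesizer, SharedSynthesizer, and MossProcess, which manages the sidecar subprocess. The Phase 2 error-handling and concurrency analysis found 2 Medium-severity issues.

Scope: error-handling, concurrency (cycle 198, module analysis pass 2)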
Key files: tts.go, moss_process.go


Finding Summary

| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Error Handling | 0 | 0 | 1 | 0 |
| Concurrency | 0 | 0 | 1 | 0 |
| Total | 0 | 0 | 2 | 0 |

Findings

Error Handling

fallback-close-errors-lost-chain

Severity: Medium | Confidence: High | ROI: High
Location: tts.go:59-75

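Problem: FallbackSynthesizer.Close aggregates close errors with fmt.Errorf("...: %v", errs). This formats the error slice as a string and drops the error chain, so callers cannot use errors.Is/errors.As to tell which synthesizer failed to close. The project is on Go 1.26, so errors.Join (available since Go 1.20) can be used directly.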

Current Pattern:

```go
// tts.go:71-74
if len(errs) > 0 {
    return fmt.Errorf("fallback close errors: %v", errs)
}
```

Proposed Fix:

```go
if len(errs) > 0 {
    return fmt.Errorf("fallback close: %w", errors.Join(errs...))
}
```

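Estimated Impact: 1-line change that restores the error chain

Acceptance Criteria:

  • FallbackSynthesizer.Close aggregates errors with errors.Join
  • Callers can use errors.Is/errors.As to detect which specific synthesizer failed to close
  • make test passes with zero regressions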

Concurrency

moss-start-holds-mutex-60s

Severity: Medium | Confidence: High | ROI: Medium
Location: moss_process.go:85-97, moss_process.go:176-236

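Problem: MossProcess.Synthesize calls ensureRunningLocked → start → waitForReady while holding p.mu. The sidecar warm-up polls every 500 ms for up to 60 seconds, and during that time every concurrent Synthesize call blocks on p.mu.Lock(). The hot path is unaffected (ensureRunningLocked returns immediately when the sidecar is already running).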

Current Pattern:

```go
// moss_process.go:85-97
func (p *MossProcess) Synthesize(ctx context.Context, text, voice string) ([]byte, error) {
    p.mu.Lock()
    // ...
    if err := p.ensureRunningLocked(ctx); err != nil {  // holds mu for up to 60s
        p.mu.Unlock()
        return nil, fmt.Errorf("tts moss: %w", err)
    }
    p.activeWg.Add(1)
    p.activeCount.Add(1)
    p.mu.Unlock()
    // ...
}
```

Proposed Fix: add starting bool and readyCh chan struct{} fields and implement a single-flight start pattern: the first caller starts the sidecar, and the other callers release mu and wait on readyCh.

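```go
// Called with p.mu held; returns with p.mu held.
func (p *MossProcess) ensureRunningLocked(ctx context.Context) error {
    if p.closed {
        return ErrSynthesizerClosed
    }
    if p.started && p.isAlive() {
        return nil // hot path
    }
    if p.starting {
        // Another caller is starting the sidecar: wait on readyCh without holding mu.
        ready := p.readyCh
        p.mu.Unlock()
        select {
        case <-ready:
        case <-ctx.Done():
            p.mu.Lock()
            return ctx.Err()
        }
        p.mu.Lock() // re-acquire before reading p.closed / p.started
        if p.closed {
            return ErrSynthesizerClosed
        }
        if !p.started || !p.isAlive() {
            return errors.New("sidecar failed to start") // caller adds the "tts moss:" prefix
        }
        return nil
    }
    // Starter path: p.start must set p.starting and p.readyCh, release p.mu while
    // waitForReady polls, and close readyCh when done; otherwise waiters still
    // block on p.mu.Lock() and never reach the branch above.
    return p.start(ctx)
}
```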

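Estimated Impact: ~30 changed lines; eliminates the up-to-60 s latency spike for concurrent callers during cold start

Acceptance Criteria:

  • MossProcess gains starting and readyCh fields
  • Concurrent Synthesize calls do not block on p.mu while the sidecar warms up
  • Hot-path performance is unaffected (confirmed by benchmark)
  • make test passes with zero regressions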

Implementation Priority

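| Finding | Priority | Effort | Risk | Impact |
|---|---|---|---|---|
| fallback-close-errors-lost-chain | P0 | Small | Low | 1 line, restores the error chain |
| moss-start-holds-mutex-60s | P1 | Medium | Medium | ~30 lines, eliminates cold-start latency |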

Recommended starting point: the 1-line errors.Join fix (P0, High ROI)


Out of Scope

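  • Edge-TTS read goroutine management (already correctly uses a buffered done channel)
  • SharedSynthesizer reference counting (already correctly serialized)
  • idleMonitor not waiting on the done channel (intentional design, avoids deadlock)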

Verification

  • make test passes with no regressions
  • go test -race ./internal/messaging/tts/... reports no data races
  • Verify that the FallbackSynthesizer.Close error chain can be inspected with errors.Is

Metadata


Assignees

No one assigned

    Labels

    P3 (Medium: tech debt, refactoring, improvements), architecture (Domain: design patterns, coupling, separation of concerns)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
