Background
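internal/gateway is HotPlex's core message-routing layer. The Phase 2 resource-mgmt + performance analysis found unnecessary repeated encoding and allocation overhead on the hot path.
Scope: resource-mgmt, performance (cycle 203, module analysis pass 3)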
Key files: conn.go, hub.go, bridge_forward.go, platform_writer.go
Finding Summary
| Category | Critical | High | Medium | Low |
| --- | --- | --- | --- | --- |
| Performance | 0 | 1 | 2 | 0 |
| Resource-mgmt | 0 | 0 | 0 | 0 |
| Total | 0 | 1 | 2 | 0 |
Findings
per-connection-re-encode-on-route
Severity: High | Confidence: High | ROI: Medium
Location: conn.go:656-671, hub.go:454-486, platform_writer.go:83-86
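Problem: Hub.routeMessage calls RouteWrite independently for each subscribed SessionWriter, and every call JSON-marshals the same Envelope again. For a session with 1 WS conn + 1 platform conn, each message.delta is encoded 2-3 times, and each encoding allocates a new []byte.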
Current Pattern:
```go
// hub.go routeMessage: per-conn encoding
for _, conn := range conns {
	if err := conn.RouteWrite(ctx, msg.Env); err == nil {
		continue
	}
}

// conn.go RouteWrite: encodes on every call
func (c *Conn) RouteWrite(_ context.Context, env *events.Envelope) error {
	data, err := aep.EncodeJSON(env) // allocates []byte per call
	...
	c.sendData(data)
}
```
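Proposed Fix: Encode once in routeMessage and send the raw bytes to WS conns; platform conns keep their own encoding (they need different handling).
Acceptance Criteria:
- Hub.routeMessage calls aep.EncodeJSON only once per Envelope
- SendData(data []byte) sends the pre-encoded bytes
- BenchmarkRouteMessage_Throughput shows encoding allocations reduced by N-1 (N = number of conns)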
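A minimal sketch of the encode-once idea from the proposed fix, not the HotPlex implementation. It substitutes encoding/json for aep.EncodeJSON, and the Envelope fields, the RawWriter interface, and the wsConn/platformConn types are hypothetical stand-ins. Only the names SessionWriter, RouteWrite, and SendData come from this issue, and their signatures here are assumed.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope stands in for events.Envelope; the fields are assumed.
type Envelope struct {
	Type string `json:"type"`
	Data any    `json:"data"`
}

// SessionWriter uses the name from the issue; this method set is assumed.
type SessionWriter interface {
	RouteWrite(env *Envelope) error
}

// RawWriter is a hypothetical optional interface for writers that can
// accept bytes that were already encoded by the caller.
type RawWriter interface {
	SendData(data []byte) error
}

// wsConn models a WebSocket connection that forwards bytes unchanged.
type wsConn struct{ sent [][]byte }

func (c *wsConn) RouteWrite(env *Envelope) error {
	data, err := json.Marshal(env)
	if err != nil {
		return err
	}
	return c.SendData(data)
}

func (c *wsConn) SendData(data []byte) error {
	c.sent = append(c.sent, data)
	return nil
}

// platformConn models a writer that needs its own encoding path.
type platformConn struct{ writes int }

func (p *platformConn) RouteWrite(env *Envelope) error {
	p.writes++
	return nil
}

// routeOnce marshals env at most once and shares the bytes with every
// RawWriter. It returns how many times the envelope was encoded here.
func routeOnce(conns []SessionWriter, env *Envelope) (int, error) {
	var (
		data    []byte
		encodes int
		errs    []error
	)
	for _, conn := range conns {
		raw, ok := conn.(RawWriter)
		if !ok {
			// Fall back to the writer's own encoding path.
			if err := conn.RouteWrite(env); err != nil {
				errs = append(errs, err)
			}
			continue
		}
		if data == nil {
			encoded, err := json.Marshal(env)
			if err != nil {
				return encodes, err
			}
			data = encoded
			encodes++
		}
		if err := raw.SendData(data); err != nil {
			errs = append(errs, err)
		}
	}
	return encodes, errors.Join(errs...)
}

func main() {
	conns := []SessionWriter{&wsConn{}, &platformConn{}, &wsConn{}}
	n, err := routeOnce(conns, &Envelope{Type: "message.delta", Data: "hi"})
	fmt.Println("encodes:", n, "err:", err)
}
```

Because every RawWriter receives the same slice, the encoded bytes have to be treated as read-only once they are shared. A type assertion is only one way to separate the two writer kinds; the real code may already distinguish WS and platform conns differently.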
forward-events-clone-per-event
Severity: Medium | Confidence: High | ROI: Medium
Location: bridge_forward.go:176-178, bridge_forward.go:156-159
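Problem: forwardEvents calls events.Clone() on every outbound event, deep-copying the Envelope and its map[string]any Data. Under high-frequency message.delta traffic (6-20/sec/session) this creates sustained allocation pressure. Clone is required for correctness (Hub.Run encodes concurrently), so the optimization goal is to reduce the cost of each copy.
Proposed Fix: Verify that the typed Event.Data path (MessageDeltaData, DoneData, etc.) dominates. If more than 80% of events use typed Data (not map[string]any), the current Clone is already close to optimal (a struct copy does not trigger deepCopyMap).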
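One way to check the 80% threshold is to count which branch Clone takes. The sketch below is illustrative only: Envelope, MessageDeltaData, cloneStats, and this deepCopyMap are simplified, hypothetical versions of the real types, and it assumes typed payloads are stored by value so that a struct copy is sufficient.

```go
package main

import "fmt"

// MessageDeltaData is a simplified stand-in for the typed payload.
type MessageDeltaData struct{ Content string }

// Envelope stands in for events.Envelope; the fields are assumed.
type Envelope struct {
	Type string
	Data any
}

// deepCopyMap recursively copies nested maps. Slices and other
// reference values are shared in this simplified version.
func deepCopyMap(m map[string]any) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		if nested, ok := v.(map[string]any); ok {
			out[k] = deepCopyMap(nested)
			continue
		}
		out[k] = v
	}
	return out
}

// cloneStats counts how often each clone path is taken.
type cloneStats struct{ typed, untyped int }

// clone copies the envelope struct and deep-copies only map payloads.
func (s *cloneStats) clone(env *Envelope) *Envelope {
	cp := *env // struct copy; a typed value payload needs nothing more
	if m, ok := env.Data.(map[string]any); ok {
		cp.Data = deepCopyMap(m)
		s.untyped++
	} else {
		s.typed++
	}
	return &cp
}

// typedRatio reports the share of clones that took the typed path.
func (s *cloneStats) typedRatio() float64 {
	total := s.typed + s.untyped
	if total == 0 {
		return 0
	}
	return float64(s.typed) / float64(total)
}

func main() {
	stats := &cloneStats{}
	for i := 0; i < 4; i++ {
		stats.clone(&Envelope{Type: "message.delta", Data: MessageDeltaData{Content: "x"}})
	}
	stats.clone(&Envelope{Type: "custom", Data: map[string]any{"k": "v"}})
	fmt.Printf("typed=%d untyped=%d ratio=%.2f\n", stats.typed, stats.untyped, stats.typedRatio())
}
```

In production the two counters would more likely be metrics than struct fields, and they would need to be safe for concurrent use.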
Acceptance Criteria:
snapshot-conns-per-route-allocation
Severity: Medium | Confidence: High | ROI: Medium
Location: hub.go:229-238, hub.go:454-455
Problem: snapshotConns allocates a new []SessionWriter slice on every routeMessage call. 10 events/sec * 10 sessions = 100 slice allocations/sec. Hub.Run is single-threaded, so it can iterate directly under the RLock instead of taking a snapshot.
Current Pattern:
```go
func (h *Hub) snapshotConns(sessionID string) []SessionWriter {
	h.mu.RLock()
	sessionConns := h.sessions[sessionID]
	conns := make([]SessionWriter, 0, len(sessionConns))
	for conn := range sessionConns {
		conns = append(conns, conn)
	}
	h.mu.RUnlock()
	return conns
}
```
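Proposed Fix: Hold the RLock in routeMessage and iterate directly, deferring error handling (removeSession) to a batch step after the iteration.
Acceptance Criteria:
- routeMessage no longer calls snapshotConns and iterates directly under the RLock
- BenchmarkRouteMessage_Allocs shows a reduction in allocs/op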
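The direct-iteration approach could look roughly like the sketch below. The Hub layout (a map of sessions to a set of writers), the string message type, and the per-conn removal are assumptions made for illustration; the real code removes via removeSession and may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// SessionWriter uses the name from the issue; this method set is assumed.
type SessionWriter interface {
	RouteWrite(msg string) error
}

// Hub is a simplified stand-in; the sessions layout is assumed.
type Hub struct {
	mu       sync.RWMutex
	sessions map[string]map[SessionWriter]struct{}
}

// routeMessage iterates under the read lock and defers removal of
// failed writers until the lock is released. It returns the number of
// writers removed.
func (h *Hub) routeMessage(sessionID, msg string) int {
	var failed []SessionWriter // stays nil when every write succeeds

	h.mu.RLock()
	for conn := range h.sessions[sessionID] {
		if err := conn.RouteWrite(msg); err != nil {
			failed = append(failed, conn)
		}
	}
	h.mu.RUnlock()

	if len(failed) == 0 {
		return 0
	}
	// Batch removal under the write lock, after the iteration is done.
	h.mu.Lock()
	for _, conn := range failed {
		delete(h.sessions[sessionID], conn)
	}
	h.mu.Unlock()
	return len(failed)
}

type okConn struct{ got []string }

func (c *okConn) RouteWrite(msg string) error {
	c.got = append(c.got, msg)
	return nil
}

type brokenConn struct{}

func (brokenConn) RouteWrite(string) error { return errors.New("closed") }

func main() {
	good := &okConn{}
	h := &Hub{sessions: map[string]map[SessionWriter]struct{}{
		"s1": {good: {}, brokenConn{}: {}},
	}}
	fmt.Println("removed:", h.routeMessage("s1", "hello"))
	fmt.Println("removed:", h.routeMessage("s1", "again"))
}
```

The trade-off is that RouteWrite now runs while the read lock is held. A slow write delays any goroutine waiting for the write lock, and RouteWrite must not acquire h.mu itself, since sync.RWMutex does not support recursive read locking. This is consistent with the Medium risk rating given to this finding.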
Implementation Priority
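| Finding | Priority | Effort | Risk | Impact |
| --- | --- | --- | --- | --- |
| per-connection-re-encode | P1 | Medium | Low | Hot-path encodings reduced by N-1 |
| forward-events-clone | P2 | Small | Low | Acceptable once the typed path is confirmed to dominate |
| snapshot-allocation | P2 | Medium | Medium | Must handle removal of failed conns while iterating under RLock |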
Recommended starting point: per-connection-re-encode (largest performance gain)
Out of Scope
- Hub.Run single-thread bottleneck (known design choice; a sharding refactor has low ROI)
- Conn.writeCh buffer size (64 is reasonable; no evidence of contention)
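Verification
- make test passes with no regressions
- make lint produces no new warnings
- go test -bench=BenchmarkRouteMessage -count=5 confirms the improvement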
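For a quick local check outside the project's own benchmarks, allocations per call can be compared with testing.AllocsPerRun from the standard library. The sketch below is a standalone approximation of the snapshot-versus-direct comparison; writer, snapshot, and measure are hypothetical names, and it does not replace BenchmarkRouteMessage.

```go
package main

import (
	"fmt"
	"testing"
)

// writer is a hypothetical stand-in for a session connection.
type writer struct{ n int }

// sink keeps the snapshot reachable so its allocation is not optimized away.
var sink []*writer

// snapshot copies the set into a fresh slice, like snapshotConns does.
func snapshot(set map[*writer]struct{}) []*writer {
	out := make([]*writer, 0, len(set))
	for w := range set {
		out = append(out, w)
	}
	return out
}

// measure returns the average allocations per call for both strategies.
func measure(set map[*writer]struct{}) (withSnapshot, direct float64) {
	withSnapshot = testing.AllocsPerRun(1000, func() {
		sink = snapshot(set)
		for _, w := range sink {
			w.n++
		}
	})
	direct = testing.AllocsPerRun(1000, func() {
		for w := range set {
			w.n++
		}
	})
	return withSnapshot, direct
}

func main() {
	set := map[*writer]struct{}{{}: {}, {}: {}, {}: {}}
	withSnapshot, direct := measure(set)
	fmt.Printf("snapshot allocs/op=%.0f direct allocs/op=%.0f\n", withSnapshot, direct)
}
```

Inside a real benchmark, calling b.ReportAllocs() or passing -benchmem to go test reports the same allocs/op figure alongside timing.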
Related: issue 526 (accumMu lock contention)