Skip to content

Add pprof & mcp metrics in api-service and request retry in aenv sdk#71

Merged
lanmaoxinqing merged 19 commits intomainfrom
jun/metrics
Mar 16, 2026
Merged

Add pprof & mcp metrics in api-service and request retry in aenv sdk#71
lanmaoxinqing merged 19 commits intomainfrom
jun/metrics

Conversation

@JacksonMei
Copy link
Copy Markdown
Collaborator

No description provided.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and observability of both the Python SDK and the Go API service. The Python SDK now features sophisticated retry logic for establishing MCP sessions and manages persistent sessions more efficiently, while carefully avoiding retries for potentially non-idempotent tool executions. Concurrently, the Go API service gains powerful profiling capabilities through pprof and introduces detailed Prometheus metrics for MCP proxy requests, offering deeper insights into its performance and behavior. Additionally, several Go dependencies and the Go toolchain have been updated, and HTTP client resource management has been improved.

Highlights

  • Python SDK (aenv) Request Retry Logic: Implemented robust retry mechanisms with exponential backoff and jitter for establishing MCP sessions within the aenvironment SDK. This enhances resilience against transient network issues or service unavailability during session setup.
  • Idempotent-Safe Tool Execution: Modified the call_tool method in the Python SDK to ensure idempotent-safe retries. Session establishment failures are retried, but actual tool execution failures are not re-invoked to prevent unintended side effects from non-idempotent operations.
  • Persistent MCP Session Management: Introduced persistent MCP session management within the Python SDK's async context manager, reducing overhead by avoiding repeated session initialization and deletion for every tool call.
  • Go API Service Profiling (pprof): Integrated pprof into the Go API service, enabling on-demand profiling endpoints to diagnose performance bottlenecks and memory usage.
  • New MCP Proxy Metrics: Added new Prometheus metrics (mcp_requests_total and mcp_request_duration_ms) to the Go API service for monitoring MCP proxy requests, including the extraction of JSON-RPC method names for granular observability.
  • Go API Service HTTP Client Enhancements: Improved the FaaS HTTP client in the Go API service by adding defer resp.Body.Close() for proper resource cleanup and increasing MaxIdleConnsPerHost to optimize connection reuse.
  • Dependency and Go Version Updates: Updated Python SDK version to 0.1.6rc1 and bumped the Go toolchain to 1.23.1 along with various Go module dependencies.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • aenv/pyproject.toml
    • Updated project version to 0.1.6rc1.
  • aenv/src/aenv/core/environment.py
    • Imported the 'random' module for jitter in backoff strategy.
    • Added '_backoff' asynchronous method for exponential backoff with jitter.
    • Implemented '_rebuild_mcp_client' asynchronous method to destroy and recreate the MCP client for session recovery.
    • Modified 'aenter' to ensure a persistent MCP session is established.
    • Modified 'aexit' to close the persistent MCP session.
    • Added '_ensure_mcp_session' asynchronous method to establish a persistent MCP session.
    • Added '_close_mcp_session' asynchronous method to close the persistent MCP session.
    • Updated 'release' method to catch 'asyncio.CancelledError' when closing the MCP client.
    • Added detailed retry strategy documentation to the 'call_tool' method.
    • Updated 'call_tool' method's 'Raises' section to include 'EnvironmentError' for session establishment failures.
    • Implemented retry logic for session establishment within 'call_tool' with exponential backoff and client rebuilds.
    • Modified exception handling in 'call_tool' to mark the MCP session as inactive and prevent retries for tool execution failures due to non-idempotent safety.
    • Updated exception handling in '_ensure_mcp_session' to include 'asyncio.CancelledError' when closing stale MCP clients.
    • Removed error logging from '_ensure_mcp_session' for failed session establishment, now handled by the calling context.
  • aenv/uv.lock
    • Updated 'aenvironment' package version to 0.1.6rc1.
    • Added new wheel URLs for 'caio' package across different Python versions and architectures.
  • api-service/go.mod
    • Simplified the 'replace' directive for 'envhub'.
    • Added 'github.com/gin-contrib/pprof' as an indirect dependency.
  • api-service/go.sum
    • Added new dependency entries for 'github.com/creack/pty', 'github.com/gin-contrib/pprof', 'github.com/gin-gonic/gin' (v1.8.1), 'github.com/go-playground/assert/v2' (v2.0.1), 'github.com/go-playground/locales' (v0.14.0), 'github.com/go-playground/universal-translator' (v0.18.0), 'github.com/go-playground/validator/v10' (v10.10.0), 'github.com/goccy/go-json' (v0.9.7), 'github.com/kr/pretty' (v0.2.1, v0.3.0), 'github.com/leodido/go-urn' (v1.2.1), 'github.com/mattn/go-isatty' (v0.0.14), 'github.com/pelletier/go-toml/v2' (v2.0.1), 'github.com/pkg/diff', 'github.com/rogpeppe/go-internal' (v1.6.1, v1.8.0), 'github.com/stretchr/testify' (v1.6.1), 'github.com/ugorji/go' (v1.2.7), 'google.golang.org/protobuf' (v1.28.0), 'gopkg.in/check.v1' (v1.0.0-20201130134442), 'gopkg.in/yaml.v3' (v3.0.0-20210107).
    • Updated existing dependency hashes for various Go modules.
  • api-service/main.go
    • Imported 'github.com/gin-contrib/pprof'.
    • Registered 'pprof' handlers with the main router.
    • Replaced 'middleware.MetricsMiddleware()' with 'middleware.MCPMetricsMiddleware()' for the MCP router.
  • api-service/metrics/metrics.go
    • Added 'MCPRequestsTotal' Prometheus counter vector for MCP proxy requests, labeled by method, endpoint, rpc_method, and status.
    • Added 'MCPRequestDurationMs' Prometheus histogram vector for MCP proxy request durations, labeled by method, endpoint, rpc_method, and status.
  • api-service/middleware/metrics.go
    • Imported 'bytes', 'encoding/json', and 'io' packages.
    • Updated 'MetricsMiddleware' comment to reflect its use of Gin's FullPath().
    • Introduced 'MCPMetricsMiddleware' to record MCP proxy request metrics, including extracting the JSON-RPC method from the request body.
  • api-service/service/faas_client.go
    • Added 'defer resp.Body.Close()' to ensure the response body is closed after 'InitializeFunction' call.
  • api-service/service/faas_model/http_client.go
    • Increased 'MaxIdleConnsPerHost' to 20 in the HTTP client transport configuration.
    • Added 'defer r.resp.Body.Close()' to ensure the response body is closed after 'Into' method call.
  • go.work
    • Updated Go version to 1.23.0.
    • Added 'toolchain go1.23.1' directive.
  • go.work.sum
    • Added new dependency entries for 'github.com/bytedance/sonic', 'github.com/cloudwego/base64x', 'github.com/gabriel-vasile/mimetype' (v1.4.3), 'github.com/go-playground/validator/v10' (v10.20.0), 'github.com/google/go-cmp' (v0.6.0), 'github.com/klauspost/cpuid/v2', 'github.com/pelletier/go-toml/v2' (v2.2.2), 'github.com/stretchr/testify' (v1.9.0), 'github.com/ugorji/go' (v1.2.7), 'golang.org/x/arch', 'golang.org/x/crypto' (v0.23.0, v0.33.0, v0.36.0), 'golang.org/x/mod' (v0.17.0), 'golang.org/x/net' (v0.21.0, v0.25.0, v0.33.0, v0.34.0), 'golang.org/x/sync' (v0.13.0), 'golang.org/x/sys' (v0.20.0, v0.30.0, v0.31.0), 'golang.org/x/term' (v0.30.0, v0.31.0), 'golang.org/x/text' (v0.15.0, v0.22.0, v0.23.0), 'golang.org/x/tools' (v0.21.1-0), 'google.golang.org/protobuf' (v1.34.1).
    • Updated existing dependency hashes for various Go modules.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several valuable improvements across the api-service and the aenv Python SDK, enhancing robustness, performance, and observability through additions like pprof endpoints and MCP request metrics, and a robust retry mechanism for MCP sessions. However, a security audit identified critical vulnerabilities, including the exposure of unauthenticated pprof endpoints, a potential Denial of Service (DoS) due to unrestricted memory allocation when reading request bodies, and the logging of potentially sensitive information in plain text. Addressing these security concerns is paramount. Additionally, a minor suggestion for code cleanup in the Python SDK has been noted to improve maintainability.


mainRouter.GET("/health", healthChecker)
mainRouter.GET("/metrics", gin.WrapH(promhttp.Handler()))
pprof.Register(mainRouter)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The pprof profiling endpoints are registered on the mainRouter without any authentication or access control. This exposes sensitive internal state and performance data of the application to anyone who can reach the service. It is recommended to wrap the pprof registration in an authentication middleware or expose it on a separate, internal-only port.

return func(c *gin.Context) {
var rpcMethod string
if c.Request.Method == "POST" && c.Request.Body != nil {
if body, err := io.ReadAll(c.Request.Body); err == nil && len(body) > 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The MCPMetricsMiddleware uses io.ReadAll to read the entire request body into memory without any size limit. An attacker can send a very large request body to exhaust the server's memory, leading to a Denial of Service. It is recommended to use io.LimitReader to restrict the maximum size of the request body that can be read into memory.

@@ -649,8 +697,39 @@ async def call_tool(
f"{self._log_prefix()} Executing tool: {actual_tool_name} in environment {self.env_name}, arguments={arguments}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The arguments dictionary, which may contain sensitive information such as API keys, passwords, or tokens, is logged in plain text when a tool execution is initiated or fails. This can lead to sensitive data being exposed in the logs.

f"NOT retrying (non-idempotent safety). | "
f"Environment: {self.env_name} | "
f"Tool: {actual_tool_name} | "
f"Arguments: {arguments} | "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The arguments dictionary is logged in plain text when a tool execution fails. This can lead to sensitive data being exposed in the logs.

Comment on lines +270 to +276
async def _ensure_mcp_session(self):
"""Ensure a persistent MCP session is established."""
if not self._mcp_session_active:
client = await self._get_mcp_client()
logger.info(f"{self._log_prefix()} Opening persistent MCP session")
await client.__aenter__()
self._mcp_session_active = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This method _ensure_mcp_session is defined twice in the Environment class. The second definition at line 1005 will overwrite this one at runtime, making this block of code unreachable. This can be confusing for future maintenance. To improve code clarity and remove dead code, I recommend deleting this first definition.

JacksonMei and others added 14 commits March 10, 2026 12:34
Add comprehensive Prometheus metrics (requests, latency, instance ops,
active instances gauge, uptime gauge) to api-service. Implement labels
propagation through full chain: SDK -> api-service -> faas-api-service
-> metaservice, enabling per-experiment/owner/app metric breakdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Share request metrics with MCP data plane traffic on port 8081.
Skip recording /health endpoint errors to avoid noise from
proxy-less K8s liveness probes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove mockEnvInstanceService and mockBackendClient that were defined
but never used, causing golangci-lint unused errors in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each call_tool/list_tools was wrapping the MCP client in `async with
client:`, causing a full session lifecycle (initialize/call/DELETE) per
invocation. When the server was busy, re-initialize could hang for 60s.

Move session management to Environment __aenter__/__aexit__ so one
session is reused for all tool calls within the context manager.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add gin-contrib/pprof v1.4.0 for runtime profiling
- Fix resp.Body leak in faas_model/http_client.go Into() method
- Fix resp.Body leak in faas_client.go InitializeFunction()
- Add HTTP Transport pool config (MaxIdleConns, IdleConnTimeout)
- Add MCPMetricsMiddleware with real path and JSON-RPC method extraction
- Remove unused mock types in labels test to fix golangci-lint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add circuit breaker (5 consecutive failures -> UnrecoverableEnvironmentError)
- Rebuild MCP client after tool execution failure to break ClosedResourceError cascade
- Add retry logic to __aenter__ session establishment (max 2 attempts)
- Remove duplicate _ensure_mcp_session definition (dead code)
- Add unit tests for circuit breaker and __aenter__ retry
- Add e2e test suite for session persistence verification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… leak

- Add shared http.Transport with connection pooling to MCPGateway
- SSE handler and HTTP proxy reuse transport instead of creating per-request
- Update Dockerfile from golang:1.21 to golang:1.23 to match go.work

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation tuning

Truncate request/response body to 2KB in logs, compress rotated files,
reduce max backups from 30 to 10, set 7-day max age, and raise file
log level from Debug to Info.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…accuracy

Add LimitReader(8KB) to prevent OOM, set GetBody for reverse proxy
rewind, skip SSE duration histogram, normalize endpoint labels to
prevent cardinality explosion, and store body/rpc_method in context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename the business metric label key from "env" to "envName" across
code, metrics, and tests to avoid ambiguity with environment variables.
Also fix typo in lumberjack import (natefinsh -> natefinch) and remove
unused truncateString function.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add test_e2e_resilience.py covering SDK recovery scenarios:
- Tool call recovery after api-service pod restart
- Session rebuild after transient network failure
- Instance preservation across session rebuilds
- Concurrent tool calls during network jitter
- Circuit breaker behavior under rapid failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
JacksonMei and others added 2 commits March 10, 2026 14:04
…ric label

The previous commit incorrectly renamed the instance label key from
"env" to "envName" in business logic and storage. The intent was only
to rename the Prometheus metric label to avoid collision with system
env variables. Instance labels remain "env" internally; the collector
maps inst.Labels["env"] to Prometheus label "envName" via
BusinessLabelKeys.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- aenv: increase __aenter__ max_attempts from 2 to 3 for consistency
  with call_tool session retry logic
- api-service: cap large body read at 1MB in MCPMetricsMiddleware to
  prevent memory exhaustion from abnormal requests; remove unused
  _req_body context key

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@daihaowz daihaowz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

严重问题 (CRITICAL / HIGH)

  1. Redis SCAN 游标变量遮蔽 — 大数据集下无限循环

文件: api-service/service/redis.go — ListEnvInstancesFromRedis 和 deleteKeysByScan

for {
scannedKeys, cursor, err := r.client.Scan(r.ctx, cursor, pattern, 100).Result()
// ^^^^^^ := 创建了新的局部变量,外层 cursor 始终为 0

:= 遮蔽了外层的 cursor,下一轮循环传给 Scan 的始终是 0,导致大数据集下无限循环。小数据集碰巧一次扫完所以没触发。(注:此 bug 在 main 上已存在,但本分支修改了 redis.go 却没有修复)

  1. pprof 端点在生产环境无鉴权暴露

文件: api-service/main.go

pprof.Register(mainRouter)

/debug/pprof/ 注册在外部可访问的主路由上,无任何鉴权中间件。攻击者可获取 goroutine dump、堆内存信息,甚至触发 CPU profiling 造成 DoS。

  1. _rebuild_mcp_client 未持有 session 锁 — 并发竞态

文件: aenv/src/aenv/core/environment.py:190-200

_rebuild_mcp_client 直接修改 self._mcp_client 和 self._mcp_session_active,未获取 _mcp_session_lock。而 _ensure_mcp_session 通过锁保护同样的字段。当多个协程共享同一个 Environment 实例时,会导致状态不一致(新创建的 client 被 rebuild 覆盖销毁)。

  1. envInstance.Env 空指针 panic

文件: api-service/service/redis.go — ListEnvInstancesFromRedis

if envInstance != nil && envInstance.Env.Name != "" {

当 envInstance 非 nil 但 envInstance.Env 为 nil 时,会空指针 panic。应改为 envInstance != nil && envInstance.Env != nil && envInstance.Env.Name != ""。

  1. environmentVariables key 更名 — API 兼容性破坏

文件: api-service/controller/env_instance.go

// main: backendEnv.DeployConfig["environmentVariables"] = ...
// branch: backendEnv.DeployConfig["environment_variables"] = ...

key 从 camelCase 改为 snake_case,如果下游 controller 读的还是 "environmentVariables",环境变量将静默丢失。

  1. Redis fallback 丢失 version 过滤

文件: api-service/controller/env_instance.go — ListEnvInstances

当 Redis 数据不完整回退到 service 层时,ListEnvInstances(envName) 只按名称过滤,丢弃了版本信息,返回所有版本的实例。


中等问题 (MEDIUM)

  1. _req_body 上下文键从未设置 — 死代码

文件: api-service/middleware/logging.go:866 + middleware/metrics.go

LoggingMiddleware 检查 c.Get("_req_body"),但 MCPMetricsMiddleware 从未调用 c.Set("_req_body", ...),导致该分支永远不会执行。

  1. Collector Stop() 无 double-close 保护

文件: api-service/metrics/collector.go

func (c *Collector) Stop() { close(c.stopCh) }

无 sync.Once 保护,重复调用会 panic: close of closed channel。

  1. select{} 之后的 defer 永远不会执行

文件: api-service/main.go

defer metricsCollector.Stop()
defer cleanManager.Stop()
select {} // 永远阻塞,defer 不可达

资源清理代码实际上是死代码,Stop() 永远不会被调用。

  1. 熔断器无恢复机制

文件: aenv/src/aenv/core/environment.py:697-701

_consecutive_tool_errors 达到阈值后 Environment 永久不可用。没有 half-open 状态、超时重置或手动 reset 方法。长期运行的 Environment 一旦触发熔断就永久报废。

  1. 异步锁延迟初始化存在竞态

文件: aenv/src/aenv/core/environment.py:808-809, 1046-1047

if not hasattr(self, "_init_lock"):
self._init_lock = asyncio.Lock()

两个协程可在 hasattr 检查和赋值之间交替执行,创建两个不同的锁实例。应在 init 中初始化。

  1. Prometheus 标签基数爆炸风险

文件: api-service/middleware/metrics.go + api-service/metrics/metrics.go

  • 主路由 FullPath() 对 404 路径回退到原始 URL → 扫描器可创建无限时序
  • InstanceUptimeSeconds 包含 instance_id 标签 → 高频创建/销毁实例时基数爆炸
  • CreateEnvInstance 在 binding 验证前就用 req.EnvName 做 label → 攻击者可注入任意值
  1. MCP body 静默截断

文件: api-service/middleware/metrics.go

body 超过 ~1MB+8KB 时被静默截断,读取错误被吞掉(remaining, _ := io.ReadAll(...)),向上游转发不完整的 body 而无任何告警。

  1. Warmup 未发送请求体

文件: api-service/service/env_instance.go

httpReq, err := http.NewRequest("PUT", url, nil) // body 是 nil

req *backend.Env 参数被接受但从未序列化发送。

  1. 5xx 响应不重试

文件: aenv/src/aenv/client/scheduler_client.py

重试逻辑只捕获 httpx.RequestError(网络层错误),服务端 500/503 等瞬态错误不会被重试。

  1. wait_for_status 不处理 TERMINATED 状态

文件: aenv/src/aenv/client/scheduler_client.py

等待 RUNNING 状态时只短路 FAILED,对 TERMINATED(终态)仍会一直轮询直到超时。

  1. Labels 在 list 响应中丢失

文件: api-service/service/schedule_client.go — ListEnvInstances

PodListResponseData 没有 Labels 字段,通过 ScheduleClient 路径 list 时创建时设置的 labels 全部丢失。

  1. Label 命名混淆

文件: api-service/metrics/metrics.go + metrics/collector.go

Prometheus label 名为 envName,但从 inst.Labels["env"] 取值。命名不一致容易引起后续维护 bug。


低等问题 (LOW)

┌─────┬──────────────────────────────┬────────────────────────────────────────────────────┐
│ # │ 文件 │ 问题 │
├─────┼──────────────────────────────┼────────────────────────────────────────────────────┤
│ 19 │ environment.py:252 │ initialize() dummy 路径返回 None 而非声明的 bool │
├─────┼──────────────────────────────┼────────────────────────────────────────────────────┤
│ 20 │ environment.py:513 │ call_function 使用可变默认参数 {} │
├─────┼──────────────────────────────┼────────────────────────────────────────────────────┤
│ 21 │ environment.py:220 │ aenter 循环后有不可达的 return self │
├─────┼──────────────────────────────┼────────────────────────────────────────────────────┤
│ 22 │ logging.py:82 │ getLogger 的 type
参数被完全忽略 │
├─────┼──────────────────────────────┼────────────────────────────────────────────────────┤
│ 23 │ scheduler_client.py │ 使用废弃的 asyncio.get_event_loop() │
├─────┼──────────────────────────────┼────────────────────────────────────────────────────┤
│ 24 │ redis.go │ Redis key 无 TTL,无状态对账机制,长期内存泄漏 │
├─────┼──────────────────────────────┼────────────────────────────────────────────────────┤
│ 25 │ env_instance.go (controller) │ TTL 行为从"始终设置"变为"条件设置",可能影响客户端 │
└─────┴──────────────────────────────┴────────────────────────────────────────────────────┘


已修复的 Bug (本分支修复了 main 上的问题)

  • ListEnvInstances 中 token 为 nil 时缺少 return,导致空指针 — 已修复
  • DeployConfig 为 nil 时直接写 map 导致 panic — 已修复(nil check 前移)

建议优先修复

  1. pprof 生产暴露 (#2) — 安全问题,必须立即修复(加 auth 或环境变量控制)
  2. Redis SCAN 游标遮蔽 (#1) — 将 := 改为 =
  3. _rebuild_mcp_client 加锁 (#3) — 加 _mcp_session_lock
  4. Env 空指针检查 (#4) — 加 nil check
  5. environment_variables key 名 (#5) — 确认下游是否已同步更改

Copy link
Copy Markdown
Collaborator

@lanmaoxinqing lanmaoxinqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lanmaoxinqing lanmaoxinqing merged commit a4680ab into main Mar 16, 2026
1 check passed
@lanmaoxinqing lanmaoxinqing deleted the jun/metrics branch March 16, 2026 05:13
JacksonMei added a commit that referenced this pull request Mar 25, 2026
Resolve conflicts in main.go keeping admission control changes
alongside upstream PR #71 updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants