feat(probe): add Kubernetes probe support with liveness, readiness, and startup checks #3213

Open
Alanxtl wants to merge 4 commits into apache:develop from Alanxtl:develop

Alanxtl (Contributor) commented Feb 13, 2026

This is the implementation of #2039 and a rewrite of #3047.
Usage is demonstrated in apache/dubbo-go-samples#1033.
Docs are written in apache/dubbo-website#3193.

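Kubernetes Probe Feature Overview

This module provides a standalone HTTP probe service for the three Kubernetes probe types: liveness, readiness, and startup.
It supports user-defined health check logic and can optionally align its internal state with the Dubbo Server lifecycle.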

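Design goals

  1. Extensible: custom check logic is registered through callbacks.
  2. Controlled risk: liveness carries no internal logic by default, which avoids unwanted restarts.
  3. Lifecycle alignment: readiness and startup can optionally use internal state.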

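Default HTTP paths

When probe is enabled, the following paths are exposed on port 22222 by default:

  • GET /live: liveness probe
  • GET /ready: readiness probe
  • GET /startup: startup probe

Response rules:

  • All checks pass: HTTP 200
  • Any check fails: HTTP 503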

Configuration with the new API

Pass the following Options via metrics.NewOptions(...):

ins, err := dubbo.NewInstance(
  dubbo.WithMetrics(
    metrics.WithProbeEnabled(),
    metrics.WithProbePort(22222),
    metrics.WithProbeLivenessPath("/live"),
    metrics.WithProbeReadinessPath("/ready"),
    metrics.WithProbeStartupPath("/startup"),
    metrics.WithProbeUseInternalState(true),
  ),
)

Configuration with the old API

A new probe sub-section is added under the metrics configuration:

metrics:
  probe:
    enabled: true
    port: "22222"
    liveness-path: "/live"
    readiness-path: "/ready"
    startup-path: "/startup"
    use-internal-state: true

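Configuration options:

  • enabled: whether to start the probe service
  • port: HTTP port of the probe service
  • liveness-path: path of the liveness probe
  • readiness-path: path of the readiness probe
  • startup-path: path of the startup probe
  • use-internal-state: whether to include the internal lifecycle state check; enabled by default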

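Internal state (UseInternalState)

When use-internal-state is true, the probes add an internal state check:

  • readiness depends on probe.SetReady(true/false)
  • startup depends on probe.SetStartupComplete(true/false)

Default behavior:

  • Once the application has started (Server.Serve() has run successfully), ready=true and startup=true are set
  • During graceful shutdown, ready is set to false

If it is set to false, the probe result is decided entirely by the callbacks the user registers.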

Custom health checks (recommended)

Probe logic is extended by registering callbacks:

import (
    "context"

    "dubbo.apache.org/dubbo-go/v3/metrics/probe"
)

// liveness example
probe.RegisterLiveness("db", func(ctx context.Context) error {
    // check the database connection
    return nil
})

// readiness example
probe.RegisterReadiness("cache", func(ctx context.Context) error {
    // check the cache or other middleware dependencies
    return nil
})

// startup example
probe.RegisterStartup("warmup", func(ctx context.Context) error {
    // check whether warm-up has finished
    return nil
})

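Notes

  • liveness risk: a failed liveness probe triggers a Pod restart, so configure it with care; it is best limited to process-level or core-dependency checks.
  • readiness fit: it can reflect the health of the registry, database, cache, downstream dependencies, and similar components.
  • startup fit: recommended for cold start, warm-up, or dependency-initialization scenarios.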

Kubernetes example

livenessProbe:
  httpGet:
    path: /live
    port: 22222
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /ready
    port: 22222
  initialDelaySeconds: 5
  periodSeconds: 5
startupProbe:
  httpGet:
    path: /startup
    port: 22222
  failureThreshold: 30
  periodSeconds: 10

Description

Fixes # (issue)

Checklist

  • I confirm the target branch is develop
  • Code has passed local testing
  • I have added tests that prove my fix is effective or that my feature works

@codecov-commenter

Codecov Report

❌ Patch coverage is 46.42857% with 105 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.93%. Comparing base (60d1c2a) to head (53a5f19).
⚠️ Report is 735 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| metrics/probe/server.go | 0.00% | 57 Missing ⚠️ |
| metrics/options.go | 0.00% | 19 Missing ⚠️ |
| config/metric_config.go | 0.00% | 8 Missing and 1 partial ⚠️ |
| global/metric_config.go | 63.63% | 5 Missing and 3 partials ⚠️ |
| compat.go | 81.81% | 2 Missing and 2 partials ⚠️ |
| metrics/probe/probe.go | 87.09% | 2 Missing and 2 partials ⚠️ |
| server/options.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| server/server.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3213      +/-   ##
===========================================
+ Coverage    46.76%   47.93%   +1.17%     
===========================================
  Files          295      467     +172     
  Lines        17172    33943   +16771     
===========================================
+ Hits          8031    16272    +8241     
- Misses        8287    16355    +8068     
- Partials       854     1316     +462     

Development

Successfully merging this pull request may close these issues.

Expose readiness and liveness apis so the process's status can be detected by the scheduling cluster like K8S.

2 participants