feat(metrics): enhance the monitoring metrics of rpc call failure under Triple protocol#3189

Open
Saramanda9988 wants to merge 10 commits into apache:develop from Saramanda9988:enhance-metrics

@Saramanda9988

Description

This PR follows the standard metrics reference at https://cn.dubbo.apache.org/en/overview/reference/metrics/standard_metrics/

It implements a more fine-grained error classification mechanism for the Triple protocol, so that RPC request failures are classified and counted on both the provider and the consumer side.

Supported categories: timeout, rate limiting (flow restriction), service unavailable, business failure, and unknown error, plus their aggregate versions, which align with dubbo-java.

preview: [two screenshots attached in the original PR; images not reproduced here]

Checklist

  • I confirm the target branch is develop
  • Code has passed local testing
  • I have added tests that prove my fix is effective or that my feature works

@Saramanda9988
Author

Currently, it seems that the Dubbo protocol in dubbo-go only propagates plain errors without structured error codes, and there is no unified abstraction to align error semantics across protocols. Any further handling would have to rely on comparing error text, which is fragile. Therefore, error classification for the Dubbo protocol is not implemented in this PR.

@codecov-commenter

codecov-commenter commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 24.19355% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.41%. Comparing base (60d1c2a) to head (8254c38).
⚠️ Report is 726 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| metrics/rpc/collector.go | 0.00% | 27 Missing ⚠️ |
| metrics/rpc/metric_set.go | 0.00% | 20 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3189      +/-   ##
===========================================
+ Coverage    46.76%   48.41%   +1.64%     
===========================================
  Files          295      462     +167     
  Lines        17172    33498   +16326     
===========================================
+ Hits          8031    16217    +8186     
- Misses        8287    15973    +7686     
- Partials       854     1308     +454     

Contributor

Copilot AI left a comment

Pull request overview

This PR enhances RPC metrics under the Triple protocol by introducing fine‑grained error classification and associated counters for both providers and consumers, and slightly relaxes some registry metrics time assertions to be less brittle.

Changes:

  • Extend rpcCommonMetrics and providerMetrics/consumerMetrics to include per‑error‑type counters (timeout, limit, service unavailable, business failure, unknown) and their aggregate variants.
  • Introduce error_classifier.go to map Triple/gRPC error codes into internal ErrorType values, and update rpcCollector.afterInvokeHandler to classify failures and increment the corresponding granular counters per side.
  • Adjust metrics/registry/event_test.go time assertions to accept End == Start by checking !End.Before(start) instead of requiring End.After(start).
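The classification step described in the changes above can be sketched roughly as follows. This is not the code from `error_classifier.go`. The `Code` constants use the standard gRPC status code numbers, `CodeBizError` is a made-up value for illustration, and the `ErrorType` constant names, `classify`, and `counters` are assumptions introduced for this example.

```go
package main

import "fmt"

// Code is a gRPC-style status code. The numeric values of the first four
// constants follow the gRPC status code table.
type Code uint32

const (
	CodeUnknown           Code = 2
	CodeDeadlineExceeded  Code = 4
	CodeResourceExhausted Code = 8
	CodeUnavailable       Code = 14
	// CodeBizError is a hypothetical code for business failures,
	// chosen for this sketch only.
	CodeBizError Code = 1000
)

// ErrorType is the high-level failure category used for metrics.
type ErrorType int

const (
	ErrorTypeUnknown ErrorType = iota
	ErrorTypeTimeout
	ErrorTypeLimit
	ErrorTypeServiceUnavailable
	ErrorTypeBusinessFailed
)

// classify maps a status code to a category; anything unmapped is unknown.
func classify(c Code) ErrorType {
	switch c {
	case CodeDeadlineExceeded:
		return ErrorTypeTimeout
	case CodeResourceExhausted:
		return ErrorTypeLimit
	case CodeUnavailable:
		return ErrorTypeServiceUnavailable
	case CodeBizError:
		return ErrorTypeBusinessFailed
	default:
		return ErrorTypeUnknown
	}
}

// counters tallies failures per side ("provider" or "consumer") and category.
// It is not safe for concurrent use; a real collector would need that.
type counters map[string]map[ErrorType]int

func (c counters) inc(side string, t ErrorType) {
	if c[side] == nil {
		c[side] = make(map[ErrorType]int)
	}
	c[side][t]++
}

func main() {
	cs := counters{}
	cs.inc("consumer", classify(CodeDeadlineExceeded))
	cs.inc("consumer", classify(CodeDeadlineExceeded))
	cs.inc("provider", classify(CodeResourceExhausted))
	cs.inc("provider", classify(Code(9999))) // unmapped code falls back to unknown

	fmt.Println(cs["consumer"][ErrorTypeTimeout], cs["provider"][ErrorTypeLimit], cs["provider"][ErrorTypeUnknown])
	// prints "2 1 1"
}
```

Defaulting to the unknown category keeps the per-type counters summing to the total failure count even when a new code appears.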

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| metrics/rpc/metric_set.go | Adds new counter and aggregate counter vectors for each granular error type on both provider and consumer metric sets. |
| metrics/rpc/error_classifier.go | Introduces ErrorType enum and classifyError helper to translate Triple error codes into high‑level categories used by metrics. |
| metrics/rpc/collector.go | Wires error classification into the after‑invoke path and updates per‑type failure counters for provider/consumer roles. |
| metrics/registry/event_test.go | Relaxes event timing expectations to avoid flakiness by only asserting that End is not before Start. |

@Saramanda9988
Author

@Alanxtl Done

Contributor

@Alanxtl Alanxtl left a comment

cool work, lgtm

@sonarqubecloud

sonarqubecloud bot commented Feb 5, 2026

@Saramanda9988
Author

@AlexStocks Done

5 participants