upgrade grpc to v1.73 #1779

Merged
ti-chi-bot[bot] merged 13 commits into tikv:master from joechenrh:try-bump-grpc
Jan 27, 2026

@joechenrh
Contributor

@joechenrh joechenrh commented Oct 30, 2025

Why upgrade grpc

Because we are using an experimental API that has since been renamed. This blocks libraries that depend on this one from using newer versions of grpc. We have also introduced github.com/apache/arrow-go/v18 into the TiDB repo, which requires google.golang.org/grpc@v1.73.

What has been changed

  • Replace WithRecvBufferPool with WithBufferPool.
  • Use the old codec from grpc 1.63 so that sharedBytes (tipb #32, kvproto #142) keeps working.

TiFlash test

This part verifies whether the previous issue (#9159) still exists. You can check the result on tcms.

sysbench test

This part is to verify that there won't be performance degradation after upgrading grpc version.

# range scan
sysbench oltp_read_only \
  --db-driver=mysql --mysql-host=172.16.4.180 --mysql-port=32003 \
  --mysql-user=root --mysql-db=test \
  --tables=1 --table-size=100000000 \
  --threads=16 --time=1800 --report-interval=10 \
  --rand-type=uniform --range_size=4096 --range_selects=on \
  --point_selects=0 \
  --db-ps-mode=disable run

For range scan, QPS and latency are almost the same for both versions. Memory usage with 1.73 is slightly lower (4.09G -> 3.98G), while the GC rate increases (0.25 -> 0.3), which is expected because grpc-go now uses sync.Pool internally.

# point lookup
sysbench oltp_read_only \
  --db-driver=mysql --mysql-host=172.16.4.180 --mysql-port=32003 \
  --mysql-user=root --mysql-db=test \
  --tables=1 --table-size=100000000 \
  --threads=16 --time=1800 --report-interval=10 \
  --rand-type=uniform --range_size=4096 --range_selects=on \
  --db-ps-mode=disable run

For point lookup, the two versions show almost no difference in QPS, latency, or memory.


Summary by CodeRabbit

  • Chores

    • Upgraded gRPC and Protocol Buffer dependencies for improved stability and compatibility across releases.
    • Updated example modules to align with the dependency upgrades.
  • New Features

    • Added a legacy protobuf codec to preserve compatibility with older gRPC/protobuf usage.
  • Performance

    • Reworked buffer pooling to reduce memory overhead and improve throughput.


Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@ti-chi-bot ti-chi-bot Bot added dco-signoff: yes Indicates the PR's author has signed the dco. contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels Oct 30, 2025
@ti-chi-bot

ti-chi-bot Bot commented Oct 30, 2025

Welcome @joechenrh!

It looks like this is your first PR to tikv/client-go 🎉.

I'm the bot that helps you request reviewers, add labels, and more. See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to tikv/client-go. 😃

@ti-chi-bot ti-chi-bot Bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 30, 2025
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@joechenrh
Contributor Author

/hold

@ti-chi-bot ti-chi-bot Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 30, 2025
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 31, 2025
@joechenrh joechenrh changed the title upgrade grpc to v1.66 upgrade grpc to v1.73 Nov 4, 2025
@joechenrh
Contributor Author

/ok-to-test

@ti-chi-bot

ti-chi-bot Bot commented Nov 4, 2025

@joechenrh: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@joechenrh
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 4, 2025
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@joechenrh
Contributor Author

/run-all-tests

@joechenrh
Contributor Author

joechenrh commented Nov 4, 2025

Previously, grpc was upgraded to 1.64 and then reverted (by #1369) due to errors in TiFlash regression tests, so I keep using the old DialContext API here.

I've run a test internally with this branch. The plan ID is 7972306, and you can check the result.


rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded
rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.200.86.26:3930: i/o timeout

(Although I suspect that the error is not related to the grpc API change, because it seems we couldn't connect to 10.200.86.26:3930. That is, the first error was the result and the second error was the cause, as the resolver didn't provide any address to the LB.)

@joechenrh joechenrh mentioned this pull request Nov 4, 2025
13 tasks
@kennytm kennytm mentioned this pull request Nov 13, 2025
@dveeden
Contributor

dveeden commented Nov 23, 2025

This would fix #1789, right?

@coderabbitai

coderabbitai Bot commented Jan 20, 2026

📝 Walkthrough


Upgrades gRPC/protobuf-related modules; replaces shared buffer pool with mem.NewTieredBufferPool or mem.NopBufferPool depending on flag; changes batch encoding to use pointer-backed slices; adds an internal legacy protobuf codec for older grpc-go behavior. No public API changes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Module files: `go.mod`, `examples/*/go.mod`, `integration_tests/go.mod` | Bumped google.golang.org/grpc to v1.73.0 and updated indirect versions for google.golang.org/genproto/... and google.golang.org/protobuf (v1.36.6); other example module require blocks updated similarly. |
| Conn pool / Dial: `internal/client/conn_pool.go` | Added import google.golang.org/grpc/mem; added //nolint:SA1019 on grpc.DialContext; when shared buffer pool is disabled, set ForceCodec to legacyCodec{} and use experimental.WithBufferPool(mem.NopBufferPool{}). |
| Batch encoding & pooling: `internal/client/client_batch.go` | Replaced global shared buffer pool with mem.NewTieredBufferPool(...); marshal into pointer-backed slice (*data), adjust Put/Get to use the pointer-held buffer and update size checks and assignments accordingly. |
| Legacy codec: `internal/client/legacy_codec.go` | New unexported legacyCodec implementing Marshal, Unmarshal, Name, and helper messageV2Of to adapt proto v1→v2 for pre-1.66 grpc-go expectations. |
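
To make the "pointer-backed slice" row more concrete, here is a minimal sketch of marshalling into a pooled `*[]byte`. It is not the code from `client_batch.go`: the `message` type, `bufPool`, `encode` and `release` are hypothetical names, and a plain `sync.Pool` stands in for the tiered pool. The point is only that the pool hands out a pointer, the size check and reslice happen through that pointer, and the same pointer goes back to the pool.

```go
package main

import (
	"fmt"
	"sync"
)

// message is a hypothetical stand-in for a protobuf message that knows its
// encoded size and can marshal into a caller-supplied buffer.
type message struct{ payload string }

func (m message) Size() int { return len(m.payload) }

func (m message) MarshalTo(dst []byte) int { return copy(dst, m.payload) }

// bufPool stores *[]byte rather than []byte, so returning a buffer does not
// allocate a fresh slice header on every Put.
var bufPool = sync.Pool{New: func() any {
	b := make([]byte, 0, 64)
	return &b
}}

// encode marshals m into a pooled buffer and returns the pointer that must
// later be handed to release.
func encode(m message) *[]byte {
	size := m.Size()
	data := bufPool.Get().(*[]byte)
	if cap(*data) < size {
		// Pooled buffer is too small: replace the backing array.
		*data = make([]byte, size)
	} else {
		*data = (*data)[:size]
	}
	n := m.MarshalTo(*data)
	*data = (*data)[:n]
	return data
}

// release resets the length and returns the buffer to the pool.
func release(data *[]byte) {
	*data = (*data)[:0]
	bufPool.Put(data)
}

func main() {
	data := encode(message{payload: "hello"})
	fmt.Printf("%q len=%d\n", *data, len(*data))
	release(data)
}
```

The caller must not touch `*data` after `release`, since another goroutine may already be writing into the same backing array.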

Sequence Diagram(s)

(Skipped)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

size/XL

Suggested reviewers

  • ekexium
  • cfzjywxk
  • lhy1024

Poem

🐰 I hopped through modules, buffers anew,
Old codecs whispered; tiered pools grew,
Pointer slices twirled in a neat ballet,
Nop pools waited when flags turned away,
🥕 Code binkies — forward we chew!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title "upgrade grpc to v1.73" is concise and clearly identifies the main change: a direct dependency upgrade from gRPC v1.63.2 to v1.73.0 across multiple go.mod files and related code adjustments. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/resourcecontrol/resource_control.go`:
- Around line 184-190: The code reads r.Data for a *tikvrpc.CopStreamResponse
even when r.Response can be nil, which can panic; update the CopStreamResponse
handling so the readBytes assignment is guarded by the same nil-check that sets
detailsV2/details (i.e., move readBytes = uint64(len(...)) inside the if
r.Response != nil block or use r.Response.Data instead of r.Data), ensuring
detailsV2, details and readBytes are only accessed when r.Response is non-nil.
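
A minimal sketch of the guard described above, using hypothetical stand-in types (`copResponse`, `copStreamResponse`, `readBytes`) rather than the real `tikvrpc` and `coprocessor` types. It assumes only what the comment states: the stream response embeds a pointer that may be nil, so reading the promoted `Data` field must sit behind the same nil check.

```go
package main

import "fmt"

// copResponse is a hypothetical stand-in for the embedded response message.
type copResponse struct{ Data []byte }

// copStreamResponse embeds a pointer, so s.Data is shorthand for
// s.copResponse.Data and panics when the embedded pointer is nil.
type copStreamResponse struct {
	*copResponse
}

// readBytes returns the payload size and treats a missing response as 0 bytes
// instead of dereferencing a nil pointer.
func readBytes(s *copStreamResponse) uint64 {
	if s == nil || s.copResponse == nil {
		return 0
	}
	return uint64(len(s.Data))
}

func main() {
	fmt.Println(readBytes(&copStreamResponse{}))
	fmt.Println(readBytes(&copStreamResponse{copResponse: &copResponse{Data: []byte("abc")}}))
}
```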

Comment thread internal/resourcecontrol/resource_control.go Outdated
@ti-chi-bot ti-chi-bot Bot added dco-signoff: no Indicates the PR's author has not signed dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed dco-signoff: yes Indicates the PR's author has signed the dco. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 21, 2026
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@ti-chi-bot ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 21, 2026
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>

  // globalEncodedMsgDataPool is used to pool pre-encoded message data for batch commands.
- var globalEncodedMsgDataPool = grpc.NewSharedBufferPool()
+ var globalEncodedMsgDataPool = mem.NewTieredBufferPool(
Contributor Author


NewSharedBufferPool is replaced with NewTieredBufferPool.

  • Previous size tier: 16B, 256B, 4KB, 64KB, 1MB
  • New size tier: 256B, 4KB, 16KB, 32KB, 1MB
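
As a rough illustration of what the tier change means for individual buffers, the sketch below maps a requested size to the smallest tier that can hold it. This is a simplified model, not grpc-go's implementation: `tierFor` is a hypothetical helper, and it assumes the usual "smallest tier that fits, otherwise fall back" selection rule.

```go
package main

import (
	"fmt"
	"sort"
)

// Tier sizes in bytes, taken from the two lists in the comment above.
var (
	oldTiers = []int{16, 256, 4 << 10, 64 << 10, 1 << 20}
	newTiers = []int{256, 4 << 10, 16 << 10, 32 << 10, 1 << 20}
)

// tierFor returns the smallest tier that can hold n bytes, or -1 when n is
// larger than every tier (a real pool would use some fallback in that case).
// tiers must be sorted in ascending order.
func tierFor(tiers []int, n int) int {
	i := sort.SearchInts(tiers, n)
	if i == len(tiers) {
		return -1
	}
	return tiers[i]
}

func main() {
	for _, n := range []int{10, 8 << 10, 40 << 10, 2 << 20} {
		fmt.Printf("request=%d old=%d new=%d\n", n, tierFor(oldTiers, n), tierFor(newTiers, n))
	}
}
```

Under this model, an 8KB message moves from the 64KB tier to the 16KB tier, while very small messages (under 16B) now land in the 256B tier.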

  }, opts...)
- if cfg.TiKVClient.GrpcSharedBufferPool {
- opts = append(opts, experimental.WithRecvBufferPool(grpc.NewSharedBufferPool()))
+ if !cfg.TiKVClient.GrpcSharedBufferPool {
Contributor Author

@joechenrh joechenrh Jan 21, 2026


Since grpc-go 1.66, the default pool has changed from the no-op pool to defaultBufferPool:
https://github.com/grpc/grpc-go/blob/c7ec4d9ae3281bc57a8adce59b572e56965fb728/dialoptions.go#L705-L718
Here we just align with the old behavior: use the no-op pool by default.
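
For readers unfamiliar with the distinction, a small sketch of the two behaviors follows. Everything in it is hypothetical and local to the example: `bufferPool` only mirrors the Get/Put shape of a byte-buffer pool, `nopPool` models "no pooling", and `reusePool` is a stand-in for whatever pooled default the library provides. It is not the grpc-go code.

```go
package main

import (
	"fmt"
	"sync"
)

// bufferPool is a hypothetical local interface for a byte-buffer pool.
type bufferPool interface {
	Get(length int) *[]byte
	Put(buf *[]byte)
}

// nopPool never reuses memory: Get always allocates and Put drops the buffer.
type nopPool struct{}

func (nopPool) Get(length int) *[]byte {
	b := make([]byte, length)
	return &b
}

func (nopPool) Put(*[]byte) {}

// reusePool recycles buffers through a sync.Pool. Reused buffers are not
// zeroed, so callers must overwrite the bytes they read.
type reusePool struct{ p sync.Pool }

func (r *reusePool) Get(length int) *[]byte {
	if v := r.p.Get(); v != nil {
		if b := v.(*[]byte); cap(*b) >= length {
			*b = (*b)[:length]
			return b
		}
	}
	b := make([]byte, length)
	return &b
}

func (r *reusePool) Put(buf *[]byte) { r.p.Put(buf) }

// choosePool mirrors the flag check: pooling is opt-in, no-op otherwise.
func choosePool(sharedPoolEnabled bool) bufferPool {
	if sharedPoolEnabled {
		return &reusePool{}
	}
	return nopPool{}
}

func main() {
	for _, enabled := range []bool{false, true} {
		pool := choosePool(enabled)
		buf := pool.Get(8)
		fmt.Printf("enabled=%v pool=%T len=%d\n", enabled, pool, len(*buf))
		pool.Put(buf)
	}
}
```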

Name: connName,
}
//nolint:SA1019
conn.ClientConn, err = grpc.DialContext(ctx, target, opts...)
Contributor Author

@joechenrh joechenrh Jan 21, 2026


Keep using the old (deprecated) API to avoid potential TiFlash problems.
Ref: pingcap/tiflash#9159

Comment thread go.mod
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@examples/txnkv/1pc_txn/go.mod`:
- Line 48: The gRPC upgrade to v1.73.0 enabled the least_request LB policy by
default which can cause RPCs to hang and produce DEADLINE_EXCEEDED/Unavailable
errors; mitigate by setting the environment variable
GRPC_EXPERIMENTAL_ENABLE_LEAST_REQUEST=false in the test/run environment to
revert to previous behavior, or explicitly configure your service config/xDS to
use a simpler policy like "pick_first" or "round_robin"; also verify your xDS
resources and endpoint READY state and, if you must keep least_request, increase
RPC deadlines (client-side timeout values) during connection warmup to avoid
premature deadline expiry.

In `@examples/txnkv/pessimistic_txn/go.mod`:
- Around line 46-49: The gRPC upgrade to google.golang.org/grpc v1.73.0 tightens
handling of grpc-timeout and rejects non-positive timeout header values and
changes the xDS locality ID metric label; audit any code that sets timeouts
(places constructing grpc metadata headers or passing timeout strings—look for
usages building "grpc-timeout" header, grpc.Dial options, context.WithTimeout
calls) to ensure no non-positive or malformed timeout values are sent (use
positive durations and proper formatting), and update any monitoring/dashboards
or metric label usage that referenced the old xDS locality id label to the new
label name; also confirm module line google.golang.org/grpc v1.73.0 in go.mod is
intentional.

In `@integration_tests/go.mod`:
- Around line 146-151: Update the OpenTelemetry v1.35.0 migration checklist:
ensure CI/build uses Go ≥ 1.22 (verify Go toolchain configuration used by CI
jobs), audit any Prometheus exporter metric name validation changes (check usage
of LegacyValidation vs NoEscaping and update dashboards/alerts referencing
metric names), search for direct imports of semantic conventions (e.g.,
go.opentelemetry.io/otel/semconv/v1.27.0) and update them to a compatible
v1.28.0 or v1.30.0 module path, replace any usages of moved internal logging
types (references to sdk/log/internal/... ) with the public API at
go.opentelemetry.io/otel/sdk/log, and retest application startup/shutdown and
global provider interactions to verify compatibility with the added
auto-instrumentation (auto/sdk v1.1.0 and related otelgrpc/otelhttp
dependencies).

Comment thread examples/txnkv/1pc_txn/go.mod
Comment thread examples/txnkv/pessimistic_txn/go.mod
Comment thread integration_tests/go.mod
@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 27, 2026
@ti-chi-bot ti-chi-bot Bot added the lgtm label Jan 27, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jan 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bufferflies, zyguan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 27, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jan 27, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-27 03:44:22.492429676 +0000 UTC m=+1070290.106386532: ☑️ agreed by bufferflies.
  • 2026-01-27 09:28:34.547418662 +0000 UTC m=+1090942.161375518: ☑️ agreed by zyguan.

@ti-chi-bot ti-chi-bot Bot merged commit f436a52 into tikv:master Jan 27, 2026
4 checks passed
wshwsh12 pushed a commit to wshwsh12/client-go that referenced this pull request Feb 5, 2026
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
wshwsh12 pushed a commit to wshwsh12/client-go that referenced this pull request Feb 5, 2026
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
wshwsh12 pushed a commit to wshwsh12/client-go that referenced this pull request Feb 5, 2026
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
wshwsh12 pushed a commit to wshwsh12/client-go that referenced this pull request Feb 10, 2026
 

Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
wshwsh12 pushed a commit to wshwsh12/client-go that referenced this pull request Feb 10, 2026
 

Signed-off-by: Ruihao Chen <joechenrh@gmail.com>

Labels

approved contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. lgtm size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants