upgrade grpc to v1.73 #1779
Conversation
Welcome @joechenrh!
/hold |
Force-pushed from 2bcb5a5 to 9bd9fb0 (Compare)
/ok-to-test
@joechenrh: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/unhold
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
/run-all-tests
Previously, grpc was upgraded to 1.64 and then reverted (by #1369) due to errors in TiFlash regression tests, so I keep using the old API here. I ran a test internally with this branch. The plan ID is 7972306, and you can check the result. (Although I suspect that error is not related to the grpc API change, because it seems like we couldn't connect to

This would fix #1789, right?
📝 Walkthrough

Upgrades gRPC/protobuf-related modules; replaces the shared buffer pool with mem.NewTieredBufferPool or mem.NopBufferPool depending on a flag; changes batch encoding to use pointer-backed slices; adds an internal legacy protobuf codec for older grpc-go behavior. No public API changes.
Sequence Diagram(s): skipped. Estimated code review effort: 🎯 4 (Complex), ⏱️ ~45 minutes.
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/resourcecontrol/resource_control.go`:
- Around line 184-190: The code reads r.Data for a *tikvrpc.CopStreamResponse
even when r.Response can be nil, which can panic; update the CopStreamResponse
handling so the readBytes assignment is guarded by the same nil-check that sets
detailsV2/details (i.e., move readBytes = uint64(len(...)) inside the if
r.Response != nil block or use r.Response.Data instead of r.Data), ensuring
detailsV2, details and readBytes are only accessed when r.Response is non-nil.
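A minimal sketch of the suggested guard, assuming the surrounding switch in resource_control.go and the field/accessor names reported by the bot above (not re-verified here):

```go
case *tikvrpc.CopStreamResponse:
	// Keep every access behind the nil-check: r.Data promotes the embedded
	// Response's field, so reading it while r.Response is nil would panic.
	if r.Response != nil {
		detailsV2 = r.Response.GetExecDetailsV2()
		details = r.Response.GetExecDetails()
		readBytes = uint64(len(r.Response.Data))
	}
```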
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Force-pushed from d98bf71 to fcf6d0c (Compare)
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
  // globalEncodedMsgDataPool is used to pool pre-encoded message data for batch commands.
- var globalEncodedMsgDataPool = grpc.NewSharedBufferPool()
+ var globalEncodedMsgDataPool = mem.NewTieredBufferPool(
NewSharedBufferPool is replaced with NewTieredBufferPool.
- Previous size tiers: 16B, 256B, 4KB, 64KB, 1MB
- New size tiers: 256B, 4KB, 16KB, 32KB, 1MB
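For reference, a sketch of how the new pool is likely constructed with those tiers (mem is google.golang.org/grpc/mem; the actual call site may differ):

```go
import "google.golang.org/grpc/mem"

// Tiers match the sizes listed above: 256B, 4KB, 16KB, 32KB, 1MB.
var globalEncodedMsgDataPool = mem.NewTieredBufferPool(
	256,
	4*1024,
	16*1024,
	32*1024,
	1024*1024,
)
```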
  }, opts...)
- if cfg.TiKVClient.GrpcSharedBufferPool {
-     opts = append(opts, experimental.WithRecvBufferPool(grpc.NewSharedBufferPool()))
+ if !cfg.TiKVClient.GrpcSharedBufferPool {
After grpc-go 1.66, the default pool was changed from noop pool to defaultBufferPool
https://github.com/grpc/grpc-go/blob/c7ec4d9ae3281bc57a8adce59b572e56965fb728/dialoptions.go#L705-L718
Here we just align with the old behavior: use noop pool by default.
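A sketch of the flag handling implied by the diff above, using the grpc-go ≥ 1.66 experimental and mem packages; the bufferPoolOption helper is hypothetical, not the actual code in this PR:

```go
import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/experimental"
	"google.golang.org/grpc/mem"
)

// bufferPoolOption keeps the pre-1.66 no-op receive buffer pool unless the
// shared pool is explicitly enabled via the client config flag.
func bufferPoolOption(sharedPoolEnabled bool) grpc.DialOption {
	if sharedPoolEnabled {
		return experimental.WithBufferPool(mem.DefaultBufferPool())
	}
	return experimental.WithBufferPool(mem.NopBufferPool{})
}
```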
      Name: connName,
  }
  //nolint:SA1019
  conn.ClientConn, err = grpc.DialContext(ctx, target, opts...)
Keep using the old (deprecated) API to avoid potential TiFlash problems.
Ref: pingcap/tiflash#9159
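For context, a hedged sketch of the trade-off between the two dial APIs (illustrative, not the exact code in this PR):

```go
// Kept: the deprecated API, which takes a context and preserves the connection
// behavior that TiFlash appeared to depend on (pingcap/tiflash#9159).
//nolint:SA1019
conn.ClientConn, err = grpc.DialContext(ctx, target, opts...)

// Not used: the replacement connects lazily and takes no context, which changes
// when connection errors surface.
// conn.ClientConn, err = grpc.NewClient(target, opts...)
```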
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@examples/txnkv/1pc_txn/go.mod`:
- Line 48: The gRPC upgrade to v1.73.0 enabled the least_request LB policy by
default, which can cause RPCs to hang and produce DEADLINE_EXCEEDED/Unavailable
errors; mitigate by setting the environment variable
GRPC_EXPERIMENTAL_ENABLE_LEAST_REQUEST=false in the test/run environment to
revert to previous behavior, or explicitly configure your service config/xDS to
use a simpler policy like "pick_first" or "round_robin"; also verify your xDS
resources and endpoint READY state and, if you must keep least_request, increase
RPC deadlines (client-side timeout values) during connection warmup to avoid
premature deadline expiry.
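If you do want to pin a simpler policy explicitly, a hedged sketch using grpc-go's default service config (illustrative; not part of this PR):

```go
// Option 1: pin round_robin (or pick_first) via the default service config.
opts = append(opts, grpc.WithDefaultServiceConfig(
	`{"loadBalancingConfig": [{"round_robin": {}}]}`,
))

// Option 2: revert the default via the environment before the process starts:
//   GRPC_EXPERIMENTAL_ENABLE_LEAST_REQUEST=false
```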
In `@examples/txnkv/pessimistic_txn/go.mod`:
- Around line 46-49: The gRPC upgrade to google.golang.org/grpc v1.73.0 tightens
handling of grpc-timeout and rejects non-positive timeout header values and
changes the xDS locality ID metric label; audit any code that sets timeouts
(places constructing grpc metadata headers or passing timeout strings—look for
usages building "grpc-timeout" header, grpc.Dial options, context.WithTimeout
calls) to ensure no non-positive or malformed timeout values are sent (use
positive durations and proper formatting), and update any monitoring/dashboards
or metric label usage that referenced the old xDS locality id label to the new
label name; also confirm module line google.golang.org/grpc v1.73.0 in go.mod is
intentional.
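A small hedged sketch of the deadline audit the comment asks for (client, SomeRPC, and req are hypothetical placeholders):

```go
// grpc-go 1.73 rejects non-positive grpc-timeout values, so derive deadlines
// from a strictly positive duration instead of hand-building the header.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
resp, err := client.SomeRPC(ctx, req) // hypothetical RPC for illustration
_, _ = resp, err
```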
In `@integration_tests/go.mod`:
- Around line 146-151: Update the OpenTelemetry v1.35.0 migration checklist:
ensure CI/build uses Go ≥ 1.22 (verify Go toolchain configuration used by CI
jobs), audit any Prometheus exporter metric name validation changes (check usage
of LegacyValidation vs NoEscaping and update dashboards/alerts referencing
metric names), search for direct imports of semantic conventions (e.g.,
go.opentelemetry.io/otel/semconv/v1.27.0) and update them to a compatible
v1.28.0 or v1.30.0 module path, replace any usages of moved internal logging
types (references to sdk/log/internal/... ) with the public API at
go.opentelemetry.io/otel/sdk/log, and retest application startup/shutdown and
global provider interactions to verify compatibility with the added
auto-instrumentation (auto/sdk v1.1.0 and related otelgrpc/otelhttp
dependencies).
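For the semantic-conventions item specifically, a hedged sketch of the import bump (versions as named in the comment; verify the chosen semconv package exists in your dependency graph):

```go
// Before: pinned to the older schema package.
// import semconv "go.opentelemetry.io/otel/semconv/v1.27.0"

// After: a schema package compatible with OpenTelemetry v1.35.0.
import semconv "go.opentelemetry.io/otel/semconv/v1.30.0"

var _ = semconv.SchemaURL // usage stays the same; only the import path changes
```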
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bufferflies, zyguan. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
[LGTM Timeline notifier] Timeline:
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Why upgrade grpc
Because we are using some experimental APIs that have already been renamed, libraries that depend on this package are blocked from using a newer version of grpc. Also, we have introduced github.com/apache/arrow-go/v18 into the TiDB repo, which requires google.golang.org/grpc@v1.73.
What has been changed
- Replace WithRecvBufferPool with WithBufferPool.
- sharedBytes (tipb #32, kvproto #142) works.

TiFlash test
This part verifies whether the previous issue (#9159) still exists. You can check the result on tcms.
sysbench test
This part verifies that there is no performance degradation after upgrading the grpc version.
For range scan, the QPS/latency are almost the same for both versions. Memory for 1.73 is slightly reduced (4.09G -> 3.98G), while the GC rate increases (0.25 -> 0.3), which is expected from the introduction of sync.Pool inside grpc-go.

For point lookup, both versions show almost no difference in QPS/latency/memory.
Summary by CodeRabbit
Chores
New Features
Performance