tests(tso): add dynamic switching integration tests#10478
tests(tso): add dynamic switching integration tests#10478YuhaoZhang00 wants to merge 4 commits intotikv:masterfrom
Conversation
Add integration tests for TSO dynamic switching between PD and TSO microservice, covering forward transition, fallback when the microservice stops, and leader transfer resilience. Also add a unit test for IsServiceIndependent. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @YuhaoZhang00. Thanks for your PR. I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
📝 WalkthroughWalkthroughAdded new tests exercising TSO dynamic switching and service-independence: a unit test for Server.IsServiceIndependent and multiple integration tests validating dynamic TSO enable/disable and leader-transfer behaviors, plus a helper to assert monotonic TSO responses. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
server/server_test.go (1)
29-40: BuildpersistOptionsfrom the target microservice config.Line 59 flips
s3.cfg.Microservice.EnableTSODynamicSwitchingafternewTestServerhas already createdpersistOptions, butIsServiceIndependentreads that flag throughs.GetMicroserviceConfig(). That makes this case depend oncfg/persistOptionsaliasing instead of the behavior under test. It would be more robust to pass the flag into the helper and set it beforeconfig.NewPersistOptions(cfg)runs.♻️ Suggested refactor
-func newTestServer(t *testing.T, keyspaceGroupEnabled bool) *Server { +func newTestServer(t *testing.T, keyspaceGroupEnabled, tsoDynamicSwitchingEnabled bool) *Server { t.Helper() cfg := config.NewConfig() + cfg.Microservice.EnableTSODynamicSwitching = tsoDynamicSwitchingEnabled s := &Server{ ctx: context.Background(), cfg: cfg, persistOptions: config.NewPersistOptions(cfg), isKeyspaceGroupEnabled: keyspaceGroupEnabled, } atomic.StoreInt64(&s.isRunning, 1) return s } @@ - s := newTestServer(t, false) + s := newTestServer(t, false, false) @@ - s2 := newTestServer(t, true) + s2 := newTestServer(t, true, false) @@ - s3 := newTestServer(t, true) - s3.cfg.Microservice.EnableTSODynamicSwitching = true + s3 := newTestServer(t, true, true)Also applies to: 58-60
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/server_test.go` around lines 29 - 40, newTestServer currently builds persistOptions before the microservice flag EnableTSODynamicSwitching is set, causing tests to rely on cfg/persistOptions aliasing; change newTestServer (and its callers) to accept an enableTSODynamicSwitching bool, set cfg.Microservice.EnableTSODynamicSwitching = enableTSODynamicSwitching on the cfg returned by config.NewConfig() before calling config.NewPersistOptions(cfg), so persistOptions is constructed from the intended microservice config; reference newTestServer, Server, config.NewConfig, config.NewPersistOptions, GetMicroserviceConfig, and IsServiceIndependent when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integrations/tso/client_test.go`:
- Around line 849-866: The loop doesn't assert leadership actually changed and
only checks TSO availability once; update the loop in the test to capture the
previous leader via pdCluster.WaitLeader(), call ResignLeaderWithRetry(), then
call pdCluster.WaitLeader() again and assert the new leader != previous leader
(use leaderName variable and compare), and invoke per-transfer availability
checks (either call utils.WaitForTSOServiceAvailable(ctx, re, pdClient) or the
existing checkTSO(...) around each transfer) before continuing; also keep the
ServiceIndependent assertion using
GetServer(...).GetServer().IsServiceIndependent(mcsconst.TSOServiceName) as you
already do.
---
Nitpick comments:
In `@server/server_test.go`:
- Around line 29-40: newTestServer currently builds persistOptions before the
microservice flag EnableTSODynamicSwitching is set, causing tests to rely on
cfg/persistOptions aliasing; change newTestServer (and its callers) to accept an
enableTSODynamicSwitching bool, set cfg.Microservice.EnableTSODynamicSwitching =
enableTSODynamicSwitching on the cfg returned by config.NewConfig() before
calling config.NewPersistOptions(cfg), so persistOptions is constructed from the
intended microservice config; reference newTestServer, Server, config.NewConfig,
config.NewPersistOptions, GetMicroserviceConfig, and IsServiceIndependent when
making this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fc5fc0eb-a84f-44d8-aa75-735ed49992f5
📒 Files selected for processing (2)
server/server_test.gotests/integrations/tso/client_test.go
|
/ok-to-test |
|
/retest |
Move WaitForTSOServiceAvailable inside the leader transfer loop so that TSO availability is verified per-iteration rather than only once after all transfers complete. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tests/integrations/tso/client_test.go (1)
850-865:⚠️ Potential issue | 🟡 MinorCapture old leader and assert leadership actually changed.
The loop overwrites
leaderNamewithout first capturing it, so there's no assertion that leadership actually transferred to a different node. The same PD member could regain leadership, making the test less meaningful.🔧 Suggested fix
for range 2 { - leaderName = pdCluster.WaitLeader() - re.NotEmpty(leaderName) - err = pdCluster.GetServer(leaderName).ResignLeaderWithRetry() + oldLeaderName := pdCluster.WaitLeader() + re.NotEmpty(oldLeaderName) + err = pdCluster.GetServer(oldLeaderName).ResignLeaderWithRetry() re.NoError(err) leaderName = pdCluster.WaitLeader() re.NotEmpty(leaderName) + re.NotEqual(oldLeaderName, leaderName) // ServiceIndependent must remain set after leader transfer. newLeader := pdCluster.GetServer(leaderName)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integrations/tso/client_test.go` around lines 850 - 865, Before calling ResignLeaderWithRetry(), save the current leader into a distinct variable (e.g., oldLeader := leaderName) and after pdCluster.WaitLeader() completes, assert the returned leaderName is different from oldLeader to ensure leadership actually changed; then use pdCluster.GetServer(leaderName) (the newLeader) for the existing ServiceIndependent assertion (GetServer().IsServiceIndependent) and for utils.WaitForTSOServiceAvailable checks. Adjust references to leaderName/oldLeader so the resign call uses oldLeader and the subsequent assertions use the newLeader to verify the transfer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/integrations/tso/client_test.go`:
- Around line 850-865: Before calling ResignLeaderWithRetry(), save the current
leader into a distinct variable (e.g., oldLeader := leaderName) and after
pdCluster.WaitLeader() completes, assert the returned leaderName is different
from oldLeader to ensure leadership actually changed; then use
pdCluster.GetServer(leaderName) (the newLeader) for the existing
ServiceIndependent assertion (GetServer().IsServiceIndependent) and for
utils.WaitForTSOServiceAvailable checks. Adjust references to
leaderName/oldLeader so the resign call uses oldLeader and the subsequent
assertions use the newLeader to verify the transfer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 79259e1b-cf56-4658-aca2-8c63bc6ae97b
📒 Files selected for processing (1)
tests/integrations/tso/client_test.go
|
/retest |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10478 +/- ##
==========================================
+ Coverage 78.86% 78.93% +0.06%
==========================================
Files 529 530 +1
Lines 71102 71548 +446
==========================================
+ Hits 56072 56473 +401
- Misses 11014 11044 +30
- Partials 4016 4031 +15
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
/retest |
|
/test pull-unit-test-next-gen-3 |
JmPotato
left a comment
There was a problem hiding this comment.
Overall the test structure is good and covers the key dynamic switching scenarios (forward, fallback, leader transfer). However, there are two main issues:
- The
usePDServiceModefailpoint bypasses client-side service-mode discovery, so these tests only validate server-side behavior. The scope should be documented. - TSO monotonicity is not checked across switches —
WaitForTSOServiceAvailableonly proves eventual availability, not correctness.checkTSOMonotonic()should be used instead.
The TestIsServiceIndependent unit test is well-structured and covers the state matrix thoroughly.
| } | ||
|
|
||
| // TestDynamicSwitchingPDToTSO tests that when dynamic switching is enabled and a TSO | ||
| // microservice starts, PD stops serving TSO locally, sets ServiceIndependent, |
There was a problem hiding this comment.
All three new integration tests (TestDynamicSwitchingPDToTSO, TestDynamicSwitchingTSOToPDFallback, TestDynamicSwitchingWithLeaderTransfer) enable the usePDServiceMode failpoint, which pins the client to PD_SVC_MODE by short-circuiting updateServiceModeLoop(). This means the tests do not exercise the client-side service-mode discovery path at all.
If the intent is to test server-side dynamic switching behavior only, please add a comment clarifying this scope limitation and noting that client-side service-mode discovery is covered by TestTSOServiceSwitch in tests/integrations/mcs/tso/server_test.go.
Otherwise, if these tests are meant to be end-to-end dynamic switching tests, the failpoint should be removed.
| // Start TSO microservice. | ||
| tsoCluster, err := tests.NewTestTSOCluster(ctx, 1, backendEndpoints) | ||
| re.NoError(err) | ||
| defer tsoCluster.Destroy() |
There was a problem hiding this comment.
This test (and the two below) only uses WaitForTSOServiceAvailable() which is essentially Eventually(client.GetTS() == nil) — it proves TSO becomes available at some point, but does not verify that timestamps remain monotonically increasing across the switch.
The repo already has a stronger helper checkTSOMonotonic() (used in TestTSOServiceSwitch). Consider tracking a globalLastTS across the switch boundary and asserting monotonicity:
var globalLastTS uint64
re.NoError(checkTSOMonotonic(ctx, pdClient, &globalLastTS, 10)) // before switch
// ... start TSO microservice ...
re.NoError(checkTSOMonotonic(ctx, pdClient, &globalLastTS, 10)) // after switchWithout this, a timestamp regression during the switch would not be caught.
|
|
||
| // PD should start serving TSO. | ||
| utils.WaitForTSOServiceAvailable(ctx, re, pdClient) | ||
|
|
There was a problem hiding this comment.
nit: The loop doesn't assert that leadership actually changed after ResignLeaderWithRetry(). Consider capturing oldLeaderName before resign and asserting re.NotEqual(oldLeaderName, leaderName) to ensure the test is truly exercising a leader transfer scenario.
oldLeaderName := pdCluster.WaitLeader()
err = pdCluster.GetServer(oldLeaderName).ResignLeaderWithRetry()
re.NoError(err)
leaderName = pdCluster.WaitLeader()
re.NotEqual(oldLeaderName, leaderName)| tsoCluster.Destroy() | ||
|
|
||
| // PD should resume serving TSO locally. | ||
| testutil.Eventually(re, func() bool { |
There was a problem hiding this comment.
nit: Missing blank line between TestDynamicSwitchingTSOToPDFallback and TestDynamicSwitchingWithLeaderTransfer.
[LGTM Timeline notifier]Timeline:
|
…witching-integration-tests
- Add scope comments noting usePDServiceMode failpoint limits tests to server-side behavior; client-side discovery is covered elsewhere. - Replace WaitForTSOServiceAvailable + separate monotonicity check with unified waitAndCheckTSOMonotonic that validates every successful GetTS (including the first post-switchover sample) against globalLastTS. - Refactor newTestServer to accept tsoDynamicSwitchingEnabled param so persistOptions is built from the intended config, not pointer aliasing. - Add blank line between TestDynamicSwitchingTSOToPDFallback and TestDynamicSwitchingWithLeaderTransfer. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tests/integrations/tso/client_test.go (1)
884-890:⚠️ Potential issue | 🟡 MinorAssert actual leader change after resignation.
Between Line 885 and Line 890, the test waits for a leader before and after resignation but never checks they differ. This can pass without a real transfer.
♻️ Proposed tightening
for range 2 { - leaderName = pdCluster.WaitLeader() - re.NotEmpty(leaderName) - err = pdCluster.GetServer(leaderName).ResignLeaderWithRetry() + oldLeaderName := pdCluster.WaitLeader() + re.NotEmpty(oldLeaderName) + err = pdCluster.GetServer(oldLeaderName).ResignLeaderWithRetry() re.NoError(err) leaderName = pdCluster.WaitLeader() re.NotEmpty(leaderName) + re.NotEqual(oldLeaderName, leaderName) // ServiceIndependent must remain set after leader resignation. newLeader := pdCluster.GetServer(leaderName)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integrations/tso/client_test.go` around lines 884 - 890, The test currently records leaderName before and after calling pdCluster.GetServer(leaderName).ResignLeaderWithRetry() but only asserts non-emptiness; update the test to assert the leader actually changed by checking that the new leaderName != old leaderName after pdCluster.WaitLeader() returns (use the same leaderName variable or a new one like newLeader and assert inequality), while retaining existing re.NotEmpty and re.NoError checks around pdCluster.WaitLeader() and ResignLeaderWithRetry().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/integrations/tso/client_test.go`:
- Around line 884-890: The test currently records leaderName before and after
calling pdCluster.GetServer(leaderName).ResignLeaderWithRetry() but only asserts
non-emptiness; update the test to assert the leader actually changed by checking
that the new leaderName != old leaderName after pdCluster.WaitLeader() returns
(use the same leaderName variable or a new one like newLeader and assert
inequality), while retaining existing re.NotEmpty and re.NoError checks around
pdCluster.WaitLeader() and ResignLeaderWithRetry().
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 12a6f11a-388d-41f2-9474-98bd756c6231
📒 Files selected for processing (2)
server/server_test.gotests/integrations/tso/client_test.go
What problem does this PR solve?
Issue Number: ref #10329
What is changed and how does it work?
Add integration tests for TSO dynamic switching between PD and TSO microservice, covering forward transition, fallback when the microservice stops, and leader transfer resilience.
Also add a unit test for
IsServiceIndependent.Check List
Tests
Code changes
None
Side effects
None
Related changes
None
Release note
Summary by CodeRabbit