[3.5] server: fix isLearner metric not set after snapshot#21487
[3.5] server: fix isLearner metric not set after snapshot#21487mango766 wants to merge 1 commit intoetcd-io:release-3.5from
Conversation
After a learner receives a snapshot or restarts from a snapshot, the etcd_server_is_learner Prometheus metric stays at 0 because the conf change log entries that originally set it have been compacted. The metric was only updated in applyConfChange, which is never called for entries already folded into a snapshot. Fix this by setting the isLearner metric right after Recover() in both the server startup path (NewServer) and the snapshot apply path (applySnapshot). This ensures the metric reflects the member's actual learner status regardless of how the cluster state was loaded. Fixes etcd-io#17175 Signed-off-by: mango766 <mango766@users.noreply.github.com> Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com> Signed-off-by: easonysliu <easonysliu@tencent.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mango766 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @mango766. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Summary
Backport of the fix for #17175 to
release-3.5. The original fix (#17176) was merged tomainbut the previous backport attempt (#17266) was closed without merging.After a learner node receives a snapshot (or restarts after one), the
etcd_server_is_learnerPrometheus metric stays at0instead of being set to1. This happens because the conf change entries that originally set the metric have been compacted into the snapshot, soapplyConfChangeis never called for them.Changes
isLearnermetric right afterRecover()inNewServer()(server startup/restart path)isLearnermetric right afterRecover()inapplySnapshot()(snapshot receive path)This is a simpler approach than the
mainfix — instead of moving the metric into the membership package, we just update it at the two call sites inserver.go. Both of the reporter's reproduction cases (learner joining after snapshot, learner restarting after snapshot) are covered.Fixes #17175