fix(controller): Deterministically select master pod #437

ashotland merged 31 commits into dragonflydb:main from
Conversation
Pull request overview
This pull request addresses a race condition in the Dragonfly operator where multiple pods could simultaneously be promoted to master. The fix introduces deterministic master pod selection to ensure only one pod becomes master at a time.
- Adds `selectMasterCandidate` function to deterministically select the lowest-ordinal ready pod as master
- Implements role verification to detect and fix labeled masters running as replicas
- Introduces `PhaseConfiguring` state with recovery logic to prevent stuck configurations
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| internal/controller/util.go | Adds getOrdinal and selectMasterCandidate functions for deterministic master selection |
| internal/controller/dragonfly_pod_lifecycle_controller.go | Updates master selection logic to use selectMasterCandidate, adds role verification for labeled masters, and enables recovery from PhaseConfiguring state |
| internal/controller/dragonfly_instance.go | Modifies checkAndConfigureReplicas to return configuration status, adds getRedisRole function, and updates phase transition logic |
| e2e/dragonfly_pod_lifecycle_controller_test.go | Adds test to verify recovery from stuck PhaseConfiguring phase |
```go
func getOrdinal(podName string) int {
	ordinal, err := strconv.Atoi(strings.Split(podName, "-")[1])
	if err != nil {
		return -1
```
Returning -1 on error could lead to unexpected behavior in selectMasterCandidate. A pod with a parsing error would be treated as having ordinal -1, which would always be less than valid ordinals (0, 1, 2, etc.), potentially selecting an invalid pod as the master candidate. Consider returning a larger sentinel value like math.MaxInt or handling the error more explicitly in the caller.
either check that getOrdinal is not -1 before using in selectMasterCandidate or return MaxInt as suggested here.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: kathleen xue <kathleen@shaped.ai>
```go
role, err := dfi.getRedisRole(ctx, master)
if err != nil {
	log.Error(err, "failed to get redis role for labeled master", "pod", master.Name)
} else if role == resources.Replica {
	log.Info("Pod labeled as master is running as replica. Promoting it.", "pod", master.Name)
	if err := dfi.replicaOfNoOne(ctx, master); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to promote master: %w", err)
	}
}
```
I am concerned with this logic and I think this is redundant - we already handle this in checkAndConfigureReplicas function. I don't see any reason to put it here. Why do you think the checkAndConfigureReplicas function is not enough?
I think the checkAndConfigureReplicas function only handles if the replica pods are configured correctly (not rogue, connected to the right master). But this logic checks if the master is unintentionally running as a replica, so it accounts for issues with the master pod that checkAndConfigureReplicas doesn't check.
Signed-off-by: kathleen xue <kathleen@shaped.ai>
Made a few more changes to ensure no race conditions @Abhra303 if you could review again. Summary of changes:
hey @Abhra303 would appreciate another review here 🙏

gentle bump @Abhra303
ashotland left a comment
Thanks for this PR. Sorry to take it a step back, but please see some comments.
```go
	}
}

masterPod, err := dfi.getMaster(ctx)
```
why do you call getMaster again?
also, dfi.replicaOfNoOne(ctx, master) seems to be called below.
can we just consistently use the master variable from line 77?
changed masterPod, err := dfi.getMaster(ctx) to use the existing master variable consistently
I still see masterPod, err := dfi.getMaster(ctx)
why can't you use master ?
should we use patch here too ?
```go
		continue
	}

if !isPodOnLatestVersion(&pod, updateRevision) && !isTerminating(&pod) && !isRunningAndReady(&pod) {
```
!isTerminating is now redundant ?
```go
for i := range pods {
	p := &pods[i]
	// We can't use isReady() because the readiness probe might not pass until replication is configured.
```
but it seems we check isReady right after calling this function.
also, IIUC, if pod 0 is slow to become ready we'll keep requeuing while there might be other ready pods, which is a considerable deviation from current behaviour.
can we do something like the following instead:
```go
func selectMasterCandidate(pods []corev1.Pod, isReady func(*corev1.Pod) bool) *corev1.Pod {
	var best *corev1.Pod
	for i := range pods {
		p := &pods[i]
		if !isReady(p) {
			continue
		}
		if best == nil || getOrdinal(p.Name) < getOrdinal(best.Name) {
			best = p
		}
	}
	return best
}
```
makes sense, updated as per this
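To see why this shape avoids blocking on pod 0, here is a runnable sketch. It uses a minimal stand-in `Pod` type and a last-dash-segment `getOrdinal` with a `math.MaxInt` error sentinel so it runs without the Kubernetes client libraries; the real code operates on `corev1.Pod`:

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Pod is a minimal stand-in for corev1.Pod, just enough for this sketch.
type Pod struct {
	Name  string
	Ready bool
}

// getOrdinal parses the trailing StatefulSet ordinal; math.MaxInt on failure.
func getOrdinal(podName string) int {
	parts := strings.Split(podName, "-")
	n, err := strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return math.MaxInt
	}
	return n
}

// selectMasterCandidate returns the ready pod with the lowest ordinal, or nil
// if no pod is ready -- mirroring the shape suggested in the review.
func selectMasterCandidate(pods []Pod, isReady func(*Pod) bool) *Pod {
	var best *Pod
	for i := range pods {
		p := &pods[i]
		if !isReady(p) {
			continue
		}
		if best == nil || getOrdinal(p.Name) < getOrdinal(best.Name) {
			best = p
		}
	}
	return best
}

func main() {
	pods := []Pod{
		{Name: "dragonfly-0", Ready: false}, // slow to start: skipped, not waited on
		{Name: "dragonfly-2", Ready: true},
		{Name: "dragonfly-1", Ready: true},
	}
	best := selectMasterCandidate(pods, func(p *Pod) bool { return p.Ready })
	fmt.Println(best.Name) // dragonfly-1
}
```

The key property: an unready pod 0 is skipped rather than causing a requeue, and among the ready pods the lowest ordinal wins deterministically.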
```go
	return ctrl.Result{}, nil
}

masterReady, err := dfi.isPodReady(ctx, masterCandidate)
```
I believe this check is redundant, as the master candidate must already be ready.
```go
	return err
}

dfiStatus.Phase = PhaseReady
```
this seems like a behaviour change that doesn't seem related to the goal of this PR.
currently we move to ready once the master is configured and serving, and I'm not sure we should change this.
generally it'd be great to limit the scope to changes that are required to achieve the purpose of the PR.
this was to deal with the test "DF Pod Lifecycle Reconciler Fail Over is working", which needs something that handles a pod event while in PhaseConfiguring before it transitions to PhaseReady. But I've reverted this change in favor of adding PhaseConfiguring back to the gate condition; after checkAndConfigureReplicas succeeds, we transition from PhaseConfiguring to PhaseReady.
Thanks, but you still moved dfiStatus.Phase = PhaseReady to later, which is why I think you added changes in DfPodLifeCycleReconciler to handle a 'stuck' PhaseConfiguring.
I still don't understand why this is needed for the purpose of this PR and why we can't keep dfiStatus.Phase = PhaseReady in its original state.
```diff
  log.Info("non-deletion event for a pod with an existing role. checking if something is wrong", "pod", pod.Name, "role", pod.Labels[resources.RoleLabelKey])

- if err = dfi.checkAndConfigureReplicas(ctx, master.Status.PodIP); err != nil {
+ if allConfigured, err := dfi.checkAndConfigureReplicas(ctx, master.Status.PodIP); err != nil {
```
not sure I understand why we need this change.
also, once we revert to setting PhaseReady when the master is configured, the dfi.getStatus().Phase == PhaseConfiguring branch below becomes dead code.
check my comment here, but reverted this logic
```go
	Expect(podRoles[resources.Replica]).To(HaveLen(replicas - 1))
})

It("Should recover from stuck Configuring phase", func() {
```
given the other comments this test may no longer be required
```go
}

// selectMasterCandidate deterministically selects a master candidate from the given list of pods.
func selectMasterCandidate(pods []corev1.Pod, isReady func(*corev1.Pod) bool) *corev1.Pod {
```
how about a unit test for this function ?
right, it's not needed anymore, so I've moved it back to its original place, thanks.
ashotland left a comment
thanks a lot, looks good now. please just update the branch and resolve conflicts
Signed-off-by: kathleen xue <kathleen@shaped.ai>
Abhra303 left a comment
LGTM, please check my comment. Should be ready to merge after this.
```go
role, err := dfi.getRedisRole(ctx, master)
if err != nil {
	log.Info("failed to verify master status in redis (ignoring)", "error", err)
} else if role == resources.Replica {
```
Can you please use else if role != resources.Master? Some Dragonfly versions return slave instead of replica as the role.
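A minimal sketch of why the `!= Master` comparison is the more robust check. The role constant values here are assumptions for illustration (the real ones live in the operator's `resources` package); the point is that comparing against master catches both "replica" and "slave":

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed role constant, standing in for resources.Master.
const Master = "master"

// needsPromotion reports whether a pod labeled as master is not actually
// serving as master. Comparing against Master (rather than == "replica")
// also catches Dragonfly versions that report "slave" as the role.
func needsPromotion(role string) bool {
	return strings.ToLower(role) != Master
}

func main() {
	fmt.Println(needsPromotion("master"))  // false
	fmt.Println(needsPromotion("replica")) // true
	fmt.Println(needsPromotion("slave"))   // true
}
```

With `== "replica"`, a version reporting "slave" would silently skip the promotion; with `!= Master`, any non-master role triggers it.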
Deterministically select the next master pod. Context: we were hitting an issue where the operator was failing to reconcile because there was a race condition where multiple pods were not finding a healthy master, then getting promoted to master simultaneously and trying to mark each other as replicas.