SLING-13169 - Fix JobManager readiness condition to preserve topology on transient probe failures #54
@daniancu can you please check the Sonar summary at https://sonarcloud.io/summary/new_code?id=apache_sling-org-apache-sling-event&pullRequest=54 and see if we can improve the rating? Thanks!

@daniancu is there already an issue for this at https://issues.apache.org/jira?
    final TopologyCapabilities caps = this.topologyCapabilities;
    final boolean active = caps != null && isJobProcessingEnabled();

Suggested change:

    - final TopologyCapabilities caps = this.topologyCapabilities;
    - final boolean active = caps != null && isJobProcessingEnabled();
    + final boolean active = this.topologyCapabilities != null && isJobProcessingEnabled();

I would collapse these two statements into one.
I agree that right now the semantics of
48001b9 to 92cbf76
…pology on transient probe failures

Remove stopProcessing() from unbindJobProcessingEnabledCondition() so that topology state survives readiness condition removal. Update notifyListeners() and addListener() to send combined state (topology AND readiness), ensuring all listeners, including JobSchedulerImpl, correctly stop when the readiness condition is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ai-Assisted-By: claude
92cbf76 to 44d02d3
Quality Gate Failed. Failed conditions: C Reliability Rating on New Code.
Summary
- Remove `stopProcessing()` from `unbindJobProcessingEnabledCondition()` so topology state survives readiness condition removal. This is the core fix enabling recovery after transient probe blips.
- Update `notifyListeners()` and `addListener()` to send combined state (`caps != null && isJobProcessingEnabled()`), ensuring all listeners, including `JobSchedulerImpl`, correctly stop when the readiness condition is absent.
- Add tests covering the condition toggle, the `addListener()` combined-state contract, and rapid toggle stress.

Context
When the readiness condition is removed (e.g. during a transient K8s probe failure), `unbindJobProcessingEnabledCondition()` called `stopProcessing()`, which destroyed `topologyCapabilities`. When the condition returned, `notifyListeners()` sent `active=false` (because topology was null) and job processing never resumed; recovery required an unrelated topology event.

Additionally, `notifyListeners()` and `addListener()` only sent `caps != null` to listeners, ignoring the readiness condition state. This meant `JobSchedulerImpl` would continue scheduling jobs even when the readiness condition was absent.

Changes
- `unbindJobProcessingEnabledCondition()`: Remove the `stopProcessing()` call to preserve topology state, so recovery is immediate when the condition returns.
- `notifyListeners()`: Send `caps != null && isJobProcessingEnabled()` instead of just `caps != null`, so all listeners see the correct combined state.
- `addListener()`: Apply the same combined-state logic to the initial callback on subscription, so freshly registered listeners don't become active when the condition is absent.

Test plan
- `testConditionTogglePreservesTopology`: verifies topology survives unbind/rebind and listeners get the correct active state.
- `testAddListenerSendsCombinedState`: verifies `addListener()` sends `false` when topology exists but the condition is absent (`JobSchedulerImpl` startup path).
- `testRapidConditionToggle`: verifies topology and state consistency after rapid toggles.
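For readers who want to see the combined-state contract in isolation, here is a minimal single-threaded sketch of the behavior described above. It is not the code from this PR: `ConfigSketch`, `Listener`, `topologyChanged()`, `hasTopology()` and the bind method are hypothetical stand-ins, the topology is modelled as a plain `Object`, and it assumes that both bind and unbind notify listeners. Only the method names `unbindJobProcessingEnabledCondition()`, `notifyListeners()`, `addListener()` and `isJobProcessingEnabled()` are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the configuration component; not the Sling class. */
class ConfigSketch {

    /** Hypothetical listener type; receives the combined active state. */
    interface Listener {
        void configurationChanged(boolean active);
    }

    private final List<Listener> listeners = new ArrayList<>();
    private Object topologyCapabilities;  // non-null once a topology view is known
    private boolean conditionPresent;     // true while the readiness condition is bound

    boolean isJobProcessingEnabled() {
        return conditionPresent;
    }

    boolean hasTopology() {
        return topologyCapabilities != null;
    }

    /** Assumed entry point for topology events. */
    void topologyChanged(Object caps) {
        topologyCapabilities = caps;
        notifyListeners();
    }

    void bindJobProcessingEnabledCondition() {
        conditionPresent = true;
        notifyListeners();
    }

    void unbindJobProcessingEnabledCondition() {
        conditionPresent = false;
        // No stopProcessing() here: topologyCapabilities is deliberately kept,
        // so a later bind can reactivate listeners without a topology event.
        notifyListeners();
    }

    void addListener(Listener listener) {
        listeners.add(listener);
        // The initial callback uses the same combined state as later notifications.
        listener.configurationChanged(isActive());
    }

    /** Combined state: topology known AND readiness condition present. */
    private boolean isActive() {
        return topologyCapabilities != null && isJobProcessingEnabled();
    }

    private void notifyListeners() {
        final boolean active = isActive();
        for (Listener listener : listeners) {
            listener.configurationChanged(active);
        }
    }

    public static void main(String[] args) {
        ConfigSketch config = new ConfigSketch();
        List<Boolean> seen = new ArrayList<>();

        config.topologyChanged(new Object());         // topology arrives first
        config.addListener(seen::add);                // condition absent, so false
        config.bindJobProcessingEnabledCondition();   // true
        config.unbindJobProcessingEnabledCondition(); // transient probe failure: false
        config.bindJobProcessingEnabledCondition();   // recovers immediately: true

        System.out.println(seen);                     // prints [false, true, false, true]
        System.out.println(config.hasTopology());     // prints true
    }
}
```

In this model the last bind reactivates the listener without any topology event, because the unbind left `topologyCapabilities` untouched. The real component presumably also has to deal with concurrent callbacks, which this sketch ignores.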