Bug Report
What did you do?
ResourceOperations.removeFinalizer ultimately calls client.resource(r).edit(unaryOperator) via ManagedInformerEventSource.eventFilteringUpdateAndCacheResource. When that edit() returns null (fabric8's behavior on HTTP 404 when the target is gone), the result is fed unguarded into TemporaryResourceCache.putResource, which calls ResourceID.fromResource(null) and throws.
The code path that handles this null is missing; see ManagedInformerEventSource.java:101-103 on current main (5.3.4-SNAPSHOT). For comparison, removeFinalizer's own retry-precondition guard already null-checks the resource and returns gracefully (ResourceOperations.java:636-639); this one does not.
Concrete observed reproduction: a controller implementing Reconciler + Cleaner, using KubernetesMockServer (fabric8 7.6.1) in test teardown. The mock server honours finalizers, so the typical sequence we observed was:
- Test calls
client.resources(...).inNamespace(ns).delete().
- Mock server sets
deletionTimestamp and keeps the resource (finalizer present).
- Informer emits an update event; JOSDK runs
handleCleanup; removeFinalizer patches the finalizer off successfully.
- Mock server actually removes the resource.
- A subsequent cleanup pass runs (the dispatcher had a superseding event queued — could be the delete event or another update; logs show
deleteEvent=false, so in this case it was an update event). This pass's patch reaches the API server after the resource is gone; edit() returns null; NPE.
We have not characterized this on a real cluster — no production observations either way. The bug is reported because the code path is unguarded against a documented null return from fabric8, and we have a deterministic in-process reproducer (below).
What did you expect to see?
A clean return (no exception), matching the existing precondition guard in ResourceOperations.removeFinalizer's retry path:
if (r == null) {
log.warn("Cannot remove finalizer since resource not exists.");
return false;
}
What did you see instead? Under which circumstances?
NullPointerException thrown from TemporaryResourceCache.putResource, surfaced as a WARN in EventProcessor ("Uncaught error during event processing ... another reconciliation will be attempted because a superseding event has been received").
java.lang.NullPointerException: Cannot invoke "io.fabric8.kubernetes.api.model.HasMetadata.getMetadata()" because "resource" is null
at io.javaoperatorsdk.operator.processing.event.ResourceID.fromResource(ResourceID.java:28)
at io.javaoperatorsdk.operator.processing.event.source.informer.TemporaryResourceCache.putResource(TemporaryResourceCache.java:180)
at io.javaoperatorsdk.operator.processing.event.source.informer.ManagedInformerEventSource.eventFilteringUpdateAndCacheResource(ManagedInformerEventSource.java:103)
at io.javaoperatorsdk.operator.api.reconciler.ResourceOperations.resourcePatch(ResourceOperations.java:567)
at io.javaoperatorsdk.operator.api.reconciler.ResourceOperations.jsonPatchPrimary(ResourceOperations.java:409)
at io.javaoperatorsdk.operator.api.reconciler.ResourceOperations.conflictRetryingPatchPrimary(ResourceOperations.java:668)
at io.javaoperatorsdk.operator.api.reconciler.ResourceOperations.removeFinalizer(ResourceOperations.java:630)
Triggering circumstances:
client.resources(...).inNamespace(ns).delete() is called against a resource whose reconciler manages a finalizer.
- Mock server (and real API server) honour the finalizer: deletionTimestamp is set, resource still present.
- Informer emits an update event with
deletionTimestamp set.
- JOSDK runs
handleCleanup and removeFinalizer patches the finalizer off successfully.
- Mock server removes the resource.
- A second cleanup pass is dispatched (superseding event already queued). This time
edit() returns null and the NPE fires.
The relevant call site (ManagedInformerEventSource.eventFilteringUpdateAndCacheResource, current main / 5.3.4-SNAPSHOT):
updatedResource = updateMethod.apply(resourceToUpdate); // can return null
log.debug("Resource update successful");
handleRecentResourceUpdate(id, updatedResource, resourceToUpdate); // NPE: no null guard
handleRecentResourceUpdate forwards to temporaryResourceCache.putResource(null), which fails at ResourceID.fromResource(null).
Environment
Kubernetes cluster type:
vanilla; reproducer is also deterministic against fabric8 KubernetesMockServer.
$ Mention java-operator-sdk version from pom.xml file
Originally observed on 5.3.3; reproducer (below) also confirmed against current main / 5.3.4-SNAPSHOT.
$ java -version
$ kubectl version
N/A (reproduced via KubernetesMockServer and via direct unit test; not cluster-dependent).
Possible Solution
Null-guard the call in ManagedInformerEventSource.eventFilteringUpdateAndCacheResource:
updatedResource = updateMethod.apply(resourceToUpdate);
if (updatedResource != null) {
log.debug("Resource update successful");
handleRecentResourceUpdate(id, updatedResource, resourceToUpdate);
} else {
log.debug("Update returned null, skipping cache update");
}
return updatedResource;
The existing finally block already null-checks updatedResource for version tracking, so only this direct call path needs the guard.
Additional context
Reproducer test (no cluster required), demonstrating the call path from removeFinalizer all the way down to the NPE:
@Test
void removeFinalizerHandlesAlreadyDeletedTargetGracefully() {
// When the underlying patch returns null because the target resource is already gone
// on the API server (fabric8's edit() returns null on 404), removeFinalizer must
// return cleanly rather than throw.
var resource = TestUtils.testCustomResource1();
resource.getMetadata().setResourceVersion("1");
resource.addFinalizer(FINALIZER_NAME);
when(context.getPrimaryResource()).thenReturn(resource);
var client = MockKubernetesClient.client(TestCustomResource.class);
NamespaceableResource<TestCustomResource> single = mock();
when(single.edit(any(UnaryOperator.class))).thenReturn(null);
when(client.resource(any(TestCustomResource.class))).thenReturn(single);
var eventSource = new ControllerEventSource<>(testController(client));
eventSource.start();
try {
when(context.getClient()).thenReturn(client);
var eventSourceRetriever = mock(EventSourceRetriever.class);
when(context.eventSourceRetriever()).thenReturn(eventSourceRetriever);
when(eventSourceRetriever.getControllerEventSource()).thenReturn(eventSource);
var result = resourceOperations.removeFinalizer(FINALIZER_NAME);
assertThat(result).isNull();
} finally {
eventSource.stop();
}
}
A second, narrower reproducer at the ManagedInformerEventSource layer:
@Test
void eventFilteringUpdateHandlesNullFromUpdateMethodGracefully() {
withRealTemporaryResourceCache();
var result = informerEventSource.eventFilteringUpdateAndCacheResource(testDeployment(), r -> null);
assertThat(result).isNull();
}
Both fail today with the NPE shown above and pass with the proposed null-guard.
Related prior issue (different flow, same class of bug): #1987 — fixed in the old ReconciliationDispatcher flow; the 5.x rewrite moved the logic to ResourceOperations + ManagedInformerEventSource and regressed null-safety on the cache-update side.
Bug Report
What did you do?
ResourceOperations.removeFinalizerultimately callsclient.resource(r).edit(unaryOperator)viaManagedInformerEventSource.eventFilteringUpdateAndCacheResource. When thatedit()returnsnull(fabric8's behavior on HTTP 404 when the target is gone), the result is fed unguarded intoTemporaryResourceCache.putResource, which callsResourceID.fromResource(null)and throws.The code path that handles this null is missing; see
ManagedInformerEventSource.java:101-103on currentmain(5.3.4-SNAPSHOT). For comparison,removeFinalizer's own retry-precondition guard already null-checks the resource and returns gracefully (ResourceOperations.java:636-639); this one does not.Concrete observed reproduction: a controller implementing
Reconciler+Cleaner, usingKubernetesMockServer(fabric8 7.6.1) in test teardown. The mock server honours finalizers, so the typical sequence we observed was:client.resources(...).inNamespace(ns).delete().deletionTimestampand keeps the resource (finalizer present).handleCleanup;removeFinalizerpatches the finalizer off successfully.deleteEvent=false, so in this case it was an update event). This pass's patch reaches the API server after the resource is gone;edit()returnsnull; NPE.We have not characterized this on a real cluster — no production observations either way. The bug is reported because the code path is unguarded against a documented null return from fabric8, and we have a deterministic in-process reproducer (below).
What did you expect to see?
A clean return (no exception), matching the existing precondition guard in
ResourceOperations.removeFinalizer's retry path:What did you see instead? Under which circumstances?
NullPointerExceptionthrown fromTemporaryResourceCache.putResource, surfaced as a WARN inEventProcessor("Uncaught error during event processing ... another reconciliation will be attempted because a superseding event has been received").Triggering circumstances:
client.resources(...).inNamespace(ns).delete()is called against a resource whose reconciler manages a finalizer.deletionTimestampset.handleCleanupandremoveFinalizerpatches the finalizer off successfully.edit()returnsnulland the NPE fires.The relevant call site (
ManagedInformerEventSource.eventFilteringUpdateAndCacheResource, current main / 5.3.4-SNAPSHOT):handleRecentResourceUpdateforwards totemporaryResourceCache.putResource(null), which fails atResourceID.fromResource(null).Environment
Kubernetes cluster type:
vanilla; reproducer is also deterministic against fabric8
KubernetesMockServer.$ Mention java-operator-sdk version from pom.xml fileOriginally observed on
5.3.3; reproducer (below) also confirmed against currentmain/5.3.4-SNAPSHOT.$ java -version$ kubectl versionN/A (reproduced via
KubernetesMockServerand via direct unit test; not cluster-dependent).Possible Solution
Null-guard the call in
ManagedInformerEventSource.eventFilteringUpdateAndCacheResource:The existing
finallyblock already null-checksupdatedResourcefor version tracking, so only this direct call path needs the guard.Additional context
Reproducer test (no cluster required), demonstrating the call path from
removeFinalizerall the way down to the NPE:A second, narrower reproducer at the
ManagedInformerEventSourcelayer:Both fail today with the NPE shown above and pass with the proposed null-guard.
Related prior issue (different flow, same class of bug): #1987 — fixed in the old
ReconciliationDispatcherflow; the 5.x rewrite moved the logic toResourceOperations+ManagedInformerEventSourceand regressed null-safety on the cache-update side.