Describe the bug
We observed a NullPointerException in FixedSizeExemplarReservoir.offerDoubleMeasurement(...) while recording metrics through the OpenTelemetry Java SDK.
The exception message indicates that the storage array itself was non-null, but one of its elements was observed as null:
Cannot invoke "io.opentelemetry.sdk.metrics.internal.exemplar.ReservoirCell.recordDoubleMeasurement(double, io.opentelemetry.api.common.Attributes, io.opentelemetry.context.Context)" because "this.storage[bucket]" is null
FixedSizeExemplarReservoir lazily initializes ReservoirCell[] storage, but storage is neither volatile nor initialized under synchronization.
    @Nullable private ReservoirCell[] storage;

    @Override
    public void offerDoubleMeasurement(double value, Attributes attributes, Context context) {
      if (storage == null) {
        storage = initStorage();
      }
      int bucket = reservoirCellSelector.reservoirCellIndexFor(storage, value, attributes, context);
      if (bucket != -1) {
        this.storage[bucket].recordDoubleMeasurement(value, attributes, context);
        this.hasMeasurements = true;
      }
    }
initStorage() initializes every element:
    private ReservoirCell[] initStorage() {
      ReservoirCell[] storage = new ReservoirCell[this.size];
      for (int i = 0; i < size; ++i) {
        storage[i] = new ReservoirCell(this.clock);
      }
      return storage;
    }
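There does not appear to be any code path that sets storage[bucket] back to null after initialization. This looks like unsafe publication during concurrent first-time metric recording: no happens-before relationship orders the element writes in initStorage() before another thread's read of the storage field, so that thread may observe the non-null array reference while still seeing the default null value in one of its elements.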
Steps to reproduce
I do not currently have a deterministic reproducer.
The issue occurred in a production Spring Boot workload under concurrent request handling while recording a repository invocation metric through Micrometer / OpenTelemetry metrics.
The observed stack path included:
OpenTelemetryTimer.recordNonNegative
MetricsRepositoryMethodInvocationListener.afterInvocation
FixedSizeExemplarReservoir.offerDoubleMeasurement
The failure appears to require concurrent metric recording during the first-time use of a lazily initialized exemplar reservoir. Since this appears to be a Java Memory Model unsafe publication race, it may be difficult to reproduce deterministically with a normal unit test.
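For anyone who wants to try anyway, a stress harness along the following lines could be a starting point. This is only a sketch: RacyReservoirModel and UnsafePublicationStress are hypothetical stand-ins that mimic the lazy initialization pattern, not the SDK classes, and the harness may well report zero observations on many JVM and hardware combinations. A dedicated concurrency testing tool such as OpenJDK jcstress is likely better suited to this kind of race.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class UnsafePublicationStress {
  private static final int SIZE = 16;

  /** Hypothetical stand-in for the reservoir; it is not the SDK class. */
  static final class RacyReservoirModel {
    // Deliberately neither volatile nor guarded by a lock, like the reported field.
    private Object[] storage;

    /** Returns true when a non-null array exposed a null element. */
    boolean offerSawNullCell(int bucket) {
      Object[] current = storage; // racy read
      if (current == null) {
        current = new Object[SIZE];
        for (int i = 0; i < SIZE; i++) {
          current[i] = new Object();
        }
        storage = current; // racy publication: readers get no happens-before edge
      }
      return current[bucket] == null;
    }
  }

  /** Runs the given number of first-use races and counts null-cell observations. */
  static int countNullObservations(int trials, int threads) {
    AtomicInteger nullsSeen = new AtomicInteger();
    for (int t = 0; t < trials; t++) {
      RacyReservoirModel model = new RacyReservoirModel();
      CountDownLatch start = new CountDownLatch(1);
      Thread[] workers = new Thread[threads];
      for (int i = 0; i < threads; i++) {
        int bucket = i % SIZE;
        workers[i] = new Thread(() -> {
          try {
            start.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
          if (model.offerSawNullCell(bucket)) {
            nullsSeen.incrementAndGet();
          }
        });
        workers[i].start();
      }
      start.countDown(); // release all workers at once
      for (Thread worker : workers) {
        try {
          worker.join();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(e);
        }
      }
    }
    return nullsSeen.get();
  }
}
```

Calling UnsafePublicationStress.countNullObservations(100_000, 8) returns how many times a worker saw a null element in a non-null array. A result of zero does not show the pattern is safe; it only means the race was not hit in that run.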
What did you expect to see?
Concurrent metric recordings should not observe a partially initialized ReservoirCell[] storage.
FixedSizeExemplarReservoir.offerDoubleMeasurement(...) should either initialize the reservoir safely or use an already fully initialized reservoir, and metric recording should not throw.
What did you see instead?
Metric recording threw the following NullPointerException, and the application request failed with a 500:
java.lang.NullPointerException: Cannot invoke "io.opentelemetry.sdk.metrics.internal.exemplar.ReservoirCell.recordDoubleMeasurement(double, io.opentelemetry.api.common.Attributes, io.opentelemetry.context.Context)" because "this.storage[bucket]" is null
What version and what artifacts are you using?
Artifacts:
- OpenTelemetry Java agent via OpenTelemetry Operator auto-instrumentation
- OpenTelemetry Java SDK metrics, shaded inside the Java agent
- Micrometer / Spring Boot metrics bridge recording into OpenTelemetry metrics
Version:
- OpenTelemetry Java instrumentation: 2.27.0
- OpenTelemetry Java SDK used by instrumentation: 1.61.0
- Java auto-instrumentation image: opentelemetry-operator/autoinstrumentation-java:2.27.0
I also checked opentelemetry-java v1.63.0 and current main, and FixedSizeExemplarReservoir appears to still have the same lazy initialization pattern.
How did you reference these artifacts?
The application uses Kubernetes auto-instrumentation:
instrumentation.opentelemetry.io/inject-java: addons-opentelemetry-operator/java-instrumentation
The injected init container image was:
opentelemetry-operator/autoinstrumentation-java:2.27.0
Environment
Compiler: not directly applicable; the application is instrumented at runtime.
Runtime:
- Spring Boot application running on Kubernetes
- OpenTelemetry Java auto-instrumentation agent 2.27.0
- OpenTelemetry Java SDK 1.61.0 bundled with the agent
OS:
- Amazon Linux container on Kubernetes with Amazon Corretto 25
Additional context
A possible fix would be to safely publish the lazily initialized array, for example with volatile and double-checked locking using a dedicated lock object:
    @Nullable private volatile ReservoirCell[] storage;
    private final Object storageLock = new Object();

    private ReservoirCell[] getOrInitStorage() {
      ReservoirCell[] currentStorage = storage;
      if (currentStorage == null) {
        synchronized (storageLock) {
          currentStorage = storage;
          if (currentStorage == null) {
            currentStorage = initStorage();
            storage = currentStorage;
          }
        }
      }
      return currentStorage;
    }
Then offerDoubleMeasurement(...) / offerLongMeasurement(...) can use the returned local array for bucket selection and recording.
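To illustrate how the recording path might look with that change, here is a simplified, self-contained model. It is a sketch under assumptions, not the SDK implementation: SafeReservoirModel, Cell, the modulo-based bucket selection, and countFailures are all hypothetical stand-ins, and the attributes and context parameters are omitted for brevity.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of the proposed fix; all names here are illustrative stand-ins. */
final class SafeReservoirModel {

  /** Hypothetical stand-in for ReservoirCell. */
  static final class Cell {
    private double lastValue;

    synchronized void recordDoubleMeasurement(double value) {
      lastValue = value;
    }
  }

  private final int size;
  private final Object storageLock = new Object();
  private volatile Cell[] storage; // the volatile write publishes the filled array
  private volatile boolean hasMeasurements;

  SafeReservoirModel(int size) {
    this.size = size;
  }

  private Cell[] getOrInitStorage() {
    Cell[] current = storage;
    if (current == null) {
      synchronized (storageLock) {
        current = storage;
        if (current == null) {
          current = new Cell[size];
          for (int i = 0; i < size; i++) {
            current[i] = new Cell();
          }
          storage = current; // published only after every element is set
        }
      }
    }
    return current;
  }

  void offerDoubleMeasurement(double value) {
    Cell[] cells = getOrInitStorage(); // read the field once, then use only the local
    // Hypothetical selector that always picks a cell; the real selector may return -1.
    int bucket = Math.floorMod(Double.hashCode(value), cells.length);
    cells[bucket].recordDoubleMeasurement(value);
    hasMeasurements = true;
  }

  boolean hasMeasurements() {
    return hasMeasurements;
  }

  /** Races several threads on first use and returns how many of them threw. */
  static int countFailures(int threads) {
    SafeReservoirModel model = new SafeReservoirModel(4);
    AtomicInteger failures = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      double value = i;
      workers[i] = new Thread(() -> {
        try {
          start.await();
          model.offerDoubleMeasurement(value);
        } catch (InterruptedException | RuntimeException e) {
          failures.incrementAndGet();
        }
      });
      workers[i].start();
    }
    start.countDown(); // release all workers at once
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return failures.get();
  }
}
```

The point of the local variable is that the field is read exactly once per call, so bucket selection and recording always operate on the same fully initialized array.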
As a workaround, setting the exemplar filter to always_off should avoid this code path:
OTEL_METRICS_EXEMPLAR_FILTER=always_off
This keeps metrics and traces enabled, but disables metric exemplars.