[ibm-mq-metrics] move to produce metrics via MetricProducer#2836
[ibm-mq-metrics] move to produce metrics via MetricProducer#2836atoulme wants to merge 6 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR replaces the OpenTelemetry Meter-based instrumentation in the ibm-mq-metrics module with a custom MetricProducer implementation that builds MetricData directly and registers itself with the SDK's MeterProvider via registerMetricProducer. The Metrics.java weaver template (which generated Meter-based factory methods) is replaced by a MetricProducer.java template that generates one record* method per metric, each appending a fully-formed MetricData to an internal list that the SDK drains on each collection cycle.
Changes:
- New
MetricProducerand (auto-value-backed)MetricDataclasses, plus updated weaver template/config;Metrics.java.j2removed. - All collectors (
ChannelMetricsCollector,QueueManagerMetricsCollector,QueueCollectionBuddy, etc.),WmqMonitor, andMainare reworked to take aMetricProducerinstead of aMeterand callproducer.record*directly. - All unit/integration tests are migrated off
OpenTelemetryExtension/Meterto construct aMetricProducerand assert onproducer.produce(Resource.empty()).
Reviewed changes
Copilot reviewed 28 out of 29 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
ibm-mq-metrics/templates/registry/java/weaver.yaml |
Switches generation to the new MetricProducer.java.j2 template. |
ibm-mq-metrics/templates/registry/java/Metrics.java.j2 |
Deleted (replaced by MetricProducer.java.j2). |
ibm-mq-metrics/templates/registry/java/MetricProducer.java.j2 |
New template generating one record* method per metric. |
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java |
Generated producer class implementing the SDK MetricProducer SPI. |
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricData.java |
AutoValue-backed MetricData with a createMetricData factory. |
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/WmqMonitor.java |
Drops Meter/Long* fields, uses producer directly for heartbeat & connection-error metrics. |
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/opentelemetry/Main.java |
Wires the producer into the auto-configured SDK via registerMetricProducer. |
ibm-mq-metrics/src/main/java/.../metricscollector/ChannelMetricsCollector.java |
Replaces gauge fields with producer recorder calls. |
ibm-mq-metrics/src/main/java/.../metricscollector/InquireChannelCmdCollector.java |
Replaces gauge fields with producer recorder calls. |
ibm-mq-metrics/src/main/java/.../metricscollector/InquireQueueManagerCmdCollector.java |
Replaces gauge field with producer recorder call. |
ibm-mq-metrics/src/main/java/.../metricscollector/InquireTStatusCmdCollector.java |
Replaces gauge fields with producer recorder calls. |
ibm-mq-metrics/src/main/java/.../metricscollector/ListenerMetricsCollector.java |
Replaces gauge field with producer recorder call. |
ibm-mq-metrics/src/main/java/.../metricscollector/PerformanceEventQueueCollector.java |
Replaces counter fields with producer recorder calls. |
ibm-mq-metrics/src/main/java/.../metricscollector/QueueCollectionBuddy.java |
Switches AllowedGauge registry from LongGauge to BiConsumer<Long,Attributes> recorder. |
ibm-mq-metrics/src/main/java/.../metricscollector/QueueManagerEventCollector.java |
Replaces counter field with producer recorder call. |
ibm-mq-metrics/src/main/java/.../metricscollector/QueueManagerMetricsCollector.java |
Replaces gauge fields with producer recorder calls. |
ibm-mq-metrics/src/main/java/.../metricscollector/QueueMetricsCollector.java |
Constructor now takes a MetricProducer. |
ibm-mq-metrics/src/main/java/.../metricscollector/ReadConfigurationEventQueueCollector.java |
Replaces gauge field with producer recorder call. |
ibm-mq-metrics/src/main/java/.../metricscollector/TopicMetricsCollector.java |
Constructor now takes a MetricProducer. |
ibm-mq-metrics/src/test/java/.../*CollectorTest.java (5 files) |
Migrated tests to construct a MetricProducer directly and assert on its produce() output. |
ibm-mq-metrics/src/integrationTest/java/.../TestWMQMonitor.java |
Holds a MetricProducer instead of a Meter. |
ibm-mq-metrics/src/integrationTest/java/.../WMQMonitorIntegrationTest.java |
Replaces OpenTelemetryExtension with a directly-instantiated MetricProducer. |
ibm-mq-metrics/build.gradle.kts |
Adds AutoValue annotations + processor for MetricData. |
ibm-mq-metrics/config.yml |
Trailing newline added. |
Comments suppressed due to low confidence (4)
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java:902
- The
produce()method copiesmetricDatainto a new list and then clears it as two separate operations. AlthoughmetricDatais asynchronizedList, the copy-then-clear sequence is not atomic: anyrecord*invocation that occurs between thenew ArrayList<>(this.metricData)(which internally synchronizes on the list) andthis.metricData.clear()will be silently dropped. Additionally, iterating the synchronized list inside theArrayListcopy constructor without an explicitsynchronized(this.metricData)block can throwConcurrentModificationExceptionif a recorder thread mutates the list during iteration. Wrap both operations in a singlesynchronized(this.metricData)block (or use a different concurrent structure such as draining viaConcurrentLinkedQueue).
public List<MetricData> produce(Resource resource) {
List<MetricData> collectedPoints = new ArrayList<>(this.metricData);
this.metricData.clear();
this.currentEpochNanos = Clock.getDefault().nanoTime();
return collectedPoints;
}
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java:902
- The
Resourceparameter passed by the SDK toproduce(Resource resource)is ignored — the producer always exports metrics tagged with theResourceprovided in the constructor. Per theMetricProducerSPI contract this argument is the resource the SDK wants the producer to associate metrics with (typically the SDK's own resource), so ignoring it can cause exported metrics to carry a different/empty resource than other metrics from the same SDK pipeline. Either use the suppliedresourcewhen building theMetricData, or document why it's deliberately ignored.
@Override
public List<MetricData> produce(Resource resource) {
List<MetricData> collectedPoints = new ArrayList<>(this.metricData);
this.metricData.clear();
this.currentEpochNanos = Clock.getDefault().nanoTime();
return collectedPoints;
}
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java:56
- Each
record*call allocates a singletonList, aLongPointData, aGaugeData/SumData, and aMetricDataobject, plus a synchronized-listadd(and aClock.getDefault()lookup). Previously,LongGauge.set/LongCounter.addcost essentially nothing on the hot path, with aggregation done at collection time. For collectors that fire frequently or with many attribute combinations this is a meaningful regression in CPU and GC pressure. Consider aggregating points per (name, attributes) so that only one point per metric stream is emitted per collection cycle, similar to what the SDK aggregator did before.
public void recordIbmMqMessageRetryCount(long value, Attributes attributes) {
this.metricData.add(
createMetricData(
this.resource,
this.instrumentationScopeInfo,
"ibm.mq.message.retry.count",
"Number of message retries",
"{message}",
MetricDataType.LONG_GAUGE,
GaugeData.createLongGaugeData(
Collections.singletonList(
LongPointData.create(
this.currentEpochNanos,
Clock.getDefault().nanoTime(),
attributes,
value)))));
}
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java:292
- Counter-style metrics (
recordIbmMq*Event,recordIbmMqUnauthorizedEvent,recordIbmMqConnectionErrors) emit oneLongPointDataper call carrying only the per-call delta value (e.g.1), but they are declared asAggregationTemporality.CUMULATIVEandisMonotonic=true. A cumulative sum point is supposed to carry the total sincestartEpochNanos, not a delta; with this implementation each export interval will emit several independent "cumulative" points all of value 1 with the same start/end (or with monotonic-time values), which downstream cumulative-temporality consumers will interpret incorrectly (e.g., as resets to 1 each interval). Either pre-aggregate per (name, attributes) before emission, or declare these asAggregationTemporality.DELTA.
public void recordIbmMqQueueDepthFullEvent(long value, Attributes attributes) {
this.metricData.add(
createMetricData(
this.resource,
this.instrumentationScopeInfo,
"ibm.mq.queue.depth.full.event",
"The number of full queue events",
"{event}",
MetricDataType.LONG_SUM,
SumData.createLongSumData(
/* isMonotonic= */ true,
AggregationTemporality.CUMULATIVE,
Collections.singletonList(
LongPointData.create(
this.currentEpochNanos,
Clock.getDefault().nanoTime(),
attributes,
value)))));
}
…etricProducer.java Co-authored-by: jason plumb <75337021+breedx-splk@users.noreply.github.com>
| public List<MetricData> produce(Resource resource) { | ||
| List<MetricData> collectedPoints = new ArrayList<>(this.metricData); | ||
| this.metricData.clear(); | ||
| this.currentEpochNanos = Clock.getDefault().nanoTime(); |
There was a problem hiding this comment.
| this.currentEpochNanos = Clock.getDefault().nanoTime(); | |
| this.currentEpochNanos = Clock.getDefault().now(); |
| List<MetricData> collectedPoints = new ArrayList<>(this.metricData); | ||
| this.metricData.clear(); |
There was a problem hiding this comment.
Unfortunately, this is not thread safe.
It's possible for other methods to add data points to the metricData as the new array is being copied but before the clear(). There is real risk of those data points being dropped.
Two possible ways that I can of solving this:
- ditch the
Collections.synchronizedListfor a plainArrayList, create a mutex object, and explicitly protect all accesses thru the mutex. It's a little cumbersome. - change
metricDatafromListtoAtomicReference<List<...>>and instead of clearing it here usemetricDataReference.getAndSet(new ArrayList<>())to atomically swap out the reference and get the underlying (old) list for returning.
Of those, I think 2 is a bit cleaner.
|
I don't fully understand this, but I'll share in case this makes sense to you @atoulme: it feels like a type of aggregation is being lost there, but I'm not sure. Other than that, this looks good. I'm honestly surprised at how much cleaner it makes things! 🥳 |
Description:
Move away from the meter, and use the SDK API to create a MetricProducer to produce data points to be directly sent to the backend.