fix: preserve counter metric types in OTLP#198
Conversation
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR introduces explicit metric type awareness (counter vs gauge) throughout the metrics pipeline: type constants and field, a settable counter collector, scraper-type detection, SQLite persistence and migration for metric types, OTLP export routing to Sum/Gauge, and metric-definition updates to use counter collectors. ChangesMetric Type System
Sequence Diagram(s)sequenceDiagram
participant Collector as Metric Collector
participant Scraper as Prometheus Scraper
participant Store as SQLite Store
participant Exporter as OTLP Exporter
Collector->>Scraper: expose Prometheus metric family
Scraper->>Scraper: detect counter vs gauge -> set Metric.Type
Scraper->>Store: write Metric with Type
Store-->>Store: ensure metric_type column / migration
Exporter->>Store: read Metric with Type
Store->>Exporter: return Metric with Type
Exporter->>Exporter: convertMetricToOTLP -> Sum (counter) or Gauge (gauge)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
third_party/fleet-intelligence-sdk/pkg/metrics/scraper/prometheus.go (1)
71-82:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSkip unsupported metric kinds before appending.
When a gathered metric is not counter/gauge,
m.Typeandm.Valueremain zero-values but the row is still appended. That emits bogus datapoints downstream (later defaulted/exported as gauge).Suggested fix
// for now, only support counter and gauge switch { case mtRaw.GetCounter() != nil: m.Type = pkgmetrics.MetricTypeCounter m.Value = mtRaw.GetCounter().GetValue() case mtRaw.GetGauge() != nil: m.Type = pkgmetrics.MetricTypeGauge m.Value = mtRaw.GetGauge().GetValue() + default: + continue } ms = append(ms, m)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@third_party/fleet-intelligence-sdk/pkg/metrics/scraper/prometheus.go` around lines 71 - 82, The loop currently always appends m even when mtRaw is not a counter or gauge, producing zero-valued bogus datapoints; update the metric parsing in prometheus.go (the switch handling mtRaw.GetCounter()/GetGauge() that sets m.Type and m.Value) to skip unsupported kinds by continuing the loop when neither case matches (i.e., if mtRaw.GetCounter() == nil && mtRaw.GetGauge() == nil then continue) so only properly populated metrics (pkgmetrics.MetricTypeCounter or pkgmetrics.MetricTypeGauge) are appended to ms.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@third_party/fleet-intelligence-sdk/pkg/metrics/store/sqlite.go`:
- Around line 277-309: ensureMetricTypeColumn currently interpolates the table
variable directly into PRAGMA and ALTER TABLE SQL which risks SQL injection and
broken identifiers; validate the table name with a strict identifier regexp
(e.g. ^[A-Za-z_][A-Za-z0-9_]*$) and/or reject unsafe names, then build a
properly quoted identifier by escaping any internal double-quotes and wrapping
the name in double quotes before using it in fmt.Sprintf; update uses in
ensureMetricTypeColumn (both the PRAGMA table_info(...) and ALTER TABLE ... ADD
COLUMN ...) to use the validated/quoted identifier and keep references to
columnMetricType and pkgmetrics.MetricTypeGauge unchanged.
---
Outside diff comments:
In `@third_party/fleet-intelligence-sdk/pkg/metrics/scraper/prometheus.go`:
- Around line 71-82: The loop currently always appends m even when mtRaw is not
a counter or gauge, producing zero-valued bogus datapoints; update the metric
parsing in prometheus.go (the switch handling mtRaw.GetCounter()/GetGauge() that
sets m.Type and m.Value) to skip unsupported kinds by continuing the loop when
neither case matches (i.e., if mtRaw.GetCounter() == nil && mtRaw.GetGauge() ==
nil then continue) so only properly populated metrics
(pkgmetrics.MetricTypeCounter or pkgmetrics.MetricTypeGauge) are appended to ms.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 989ad40d-1da4-4108-b0b1-d51db8aaeca0
📒 Files selected for processing (16)
internal/exporter/converter/otlp.gointernal/exporter/converter/otlp_test.gothird_party/fleet-intelligence-sdk/components/accelerator/nvidia/dcgm/mem/metrics.gothird_party/fleet-intelligence-sdk/components/accelerator/nvidia/dcgm/nvlink/metrics.gothird_party/fleet-intelligence-sdk/components/accelerator/nvidia/dcgm/pcie/metrics.gothird_party/fleet-intelligence-sdk/components/accelerator/nvidia/dcgm/power/metrics.gothird_party/fleet-intelligence-sdk/components/accelerator/nvidia/dcgm/thermal/metrics.gothird_party/fleet-intelligence-sdk/components/accelerator/nvidia/infiniband/metrics.gothird_party/fleet-intelligence-sdk/components/network/ethernet/metrics.gothird_party/fleet-intelligence-sdk/pkg/metrics/scraper/prometheus.gothird_party/fleet-intelligence-sdk/pkg/metrics/scraper/prometheus_test.gothird_party/fleet-intelligence-sdk/pkg/metrics/settable_const_metric.gothird_party/fleet-intelligence-sdk/pkg/metrics/settable_const_metric_test.gothird_party/fleet-intelligence-sdk/pkg/metrics/store/sqlite.gothird_party/fleet-intelligence-sdk/pkg/metrics/store/sqlite_test.gothird_party/fleet-intelligence-sdk/pkg/metrics/types.go
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Description
Checklist
Summary by CodeRabbit
New Features
Improvements
Tests