
feat: Add Elasticsearch Transform Metrics & Benchmarking (PR#2)#12

Merged
ricardozanini merged 12 commits into main from feat/elasticsearch-transform-metrics-pr2
May 15, 2026


@ricardozanini

Summary

Adds Prometheus-compatible metrics for Elasticsearch transforms and two performance benchmarks that verify smart filtering keeps processing time flat as data grows and that transform lag stays low under load.

Implementation

Metrics Collection

  • ElasticsearchTransformMetricsCollector - Scheduled job polls Transform Stats API every 30s
  • 5 Micrometer gauges per transform (2 transforms total):
    • documents_processed - Total documents processed
    • documents_indexed - Total documents indexed
    • lag - Processing lag (processed - indexed)
    • state - Transform state (0=stopped, 1=started, 2=failed, -1=unknown)
    • last_checkpoint - Last checkpoint timestamp
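The numeric encoding of `state` and the definition of `lag` listed above can be sketched in plain Java. This is a minimal illustration, not the collector from this PR: the class name `TransformGaugeValues` and its method names are hypothetical, and treating every state string other than stopped, started, or failed as unknown (-1) is an assumption.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not the collector class from this PR. */
final class TransformGaugeValues {

    private TransformGaugeValues() {}

    /** Encodes a transform state string as the numeric gauge value from the list above. */
    static double stateValue(String state) {
        if (state == null) {
            return -1.0; // unknown
        }
        switch (state.toLowerCase(Locale.ROOT)) {
            case "stopped": return 0.0;
            case "started": return 1.0;
            case "failed":  return 2.0;
            default:        return -1.0; // assumption: any other state is reported as unknown
        }
    }

    /** Lag as defined in the list above: documents processed minus documents indexed. */
    static long lag(long documentsProcessed, long documentsIndexed) {
        return documentsProcessed - documentsIndexed;
    }

    public static void main(String[] args) {
        System.out.println(stateValue("started"));
        System.out.println(lag(1_000, 986));
    }
}
```

Encoding the state as a number lets it be exported as a gauge and compared directly in alert rules (for example, alerting when the value equals 2).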

Configuration

# Enable/disable metrics
data-index.metrics.transform.enabled=true
# Poll frequency
data-index.metrics.transform.poll-interval=30s
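As an illustration of how a value such as `30s` might be turned into a `java.time.Duration`, here is a small plain-Java sketch. It is a simplified stand-in, not the converter the framework itself uses; the class name `PollIntervalParser` and the set of accepted suffixes are assumptions made for this example.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical parser for illustration; the real conversion is done by the framework. */
final class PollIntervalParser {

    private PollIntervalParser() {}

    static Duration parse(String raw) {
        String value = raw.strip();
        if (value.matches("\\d+")) {
            // Assumption: a bare number means seconds.
            return Duration.ofSeconds(Long.parseLong(value));
        }
        if (value.matches("\\d+ms")) {
            // Number followed by "ms" means milliseconds.
            return Duration.ofMillis(Long.parseLong(value.substring(0, value.length() - 2)));
        }
        if (value.matches("\\d+[smh]")) {
            // Number followed by s, m, or h: prefix with PT to get ISO-8601 form.
            return Duration.parse("PT" + value.toUpperCase(Locale.ROOT));
        }
        // Anything else is expected to already be in ISO-8601 form, e.g. PT30S.
        return Duration.parse(value);
    }

    public static void main(String[] args) {
        System.out.println(parse("30s"));
        System.out.println(parse("5s"));
    }
}
```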

Testing

  • 3 integration tests - Verify metrics collection, Prometheus endpoint, periodic updates
  • 2 performance benchmarks - Verify smart filtering scales, lag stays low under load
  • All tests passing (25 total, 0 failures)

Documentation

  • Updated TRANSFORM_OPTIMIZATION.md with comprehensive metrics guide
  • Added Grafana dashboard examples and alert rules
  • Updated CLAUDE.md with metrics section

Performance Results

Scaling Test:

  • Phase 1: 1K events, 90% terminal (old)
  • Phase 2: 11K events, 90% terminal (old)
  • Result: Phase 2 processing time was 1.00x that of Phase 1 (no slowdown with 11x the events, consistent with smart filtering working)
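The scaling criterion can be expressed as a ratio check. The sketch below assumes the pass condition described in the benchmark commit (less than a 50% increase in processing time between phases); the class name `ScalingCheck`, its method names, and the use of millisecond timings are hypothetical.

```java
/** Hypothetical helper for illustration; not the benchmark class from this PR. */
final class ScalingCheck {

    /** Pass threshold: Phase 2 may take at most 1.5x as long as Phase 1. */
    static final double MAX_RATIO = 1.5;

    private ScalingCheck() {}

    /** Ratio of Phase 2 processing time to Phase 1 processing time. */
    static double ratio(long phase1Millis, long phase2Millis) {
        if (phase1Millis <= 0) {
            throw new IllegalArgumentException("phase1Millis must be positive");
        }
        return (double) phase2Millis / phase1Millis;
    }

    /** True when processing time grew by less than 50% between phases. */
    static boolean scalesAcceptably(long phase1Millis, long phase2Millis) {
        return ratio(phase1Millis, phase2Millis) < MAX_RATIO;
    }

    public static void main(String[] args) {
        // Illustrative timings only; these are not measurements from the PR.
        System.out.println(scalesAcceptably(2_000, 2_000));
        System.out.println(scalesAcceptably(2_000, 20_000));
    }
}
```

Comparing a ratio rather than absolute times keeps the check independent of how fast the test machine is.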

Lag Test:

  • 1K events inserted rapidly
  • Lag observed: 14 documents (well below the 100-document threshold)
  • Lag stable over time
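A lag check over a series of samples might look like the following sketch. It assumes the criteria from the benchmark commit (maximum lag below 100 documents, and lag decreasing over time); interpreting "decreasing" as "the last sample is no higher than the first" is an assumption, and the class name `LagCheck` is hypothetical.

```java
import java.util.List;

/** Hypothetical helper for illustration; not the benchmark class from this PR. */
final class LagCheck {

    /** Threshold from the benchmark description: lag must stay below 100 documents. */
    static final long MAX_LAG = 100;

    private LagCheck() {}

    /** Largest lag value seen across all samples. */
    static long maxLag(List<Long> samples) {
        long max = Long.MIN_VALUE;
        for (long sample : samples) {
            max = Math.max(max, sample);
        }
        return max;
    }

    /** True when the peak lag is under the threshold and lag did not end higher than it started. */
    static boolean lagAcceptable(List<Long> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("at least one lag sample is required");
        }
        long first = samples.get(0);
        long last = samples.get(samples.size() - 1);
        return maxLag(samples) < MAX_LAG && last <= first;
    }

    public static void main(String[] args) {
        // Illustrative samples only; the peak of 14 mirrors the lag reported above.
        System.out.println(lagAcceptable(List.of(14L, 9L, 3L, 0L)));
    }
}
```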

Integration Test Results

Metrics Enabled:

  • ✅ Service starts successfully
  • ✅ All 10 metrics exposed at /q/metrics
  • ✅ Both transforms showing state=1.0 (started)
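The count of 10 follows from 5 gauges for each of the 2 transforms. A check of that kind against the text returned by /q/metrics could be sketched as below. The metric prefix `example_transform_` and the class name `PrometheusTextCheck` are placeholders, since the actual metric names are not shown in this description.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; metric names used with it are placeholders. */
final class PrometheusTextCheck {

    private PrometheusTextCheck() {}

    /** Returns the sample lines (not # HELP / # TYPE comments) whose metric name starts with the prefix. */
    static List<String> samplesWithPrefix(String body, String prefix) {
        return body.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .filter(line -> line.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up response body in Prometheus text exposition format.
        String body = String.join("\n",
                "# HELP example_transform_lag Placeholder help text",
                "# TYPE example_transform_lag gauge",
                "example_transform_lag{transform=\"a\"} 14.0",
                "example_transform_state{transform=\"a\"} 1.0",
                "jvm_threads_live_threads 23.0");
        System.out.println(samplesWithPrefix(body, "example_transform_").size());
    }
}
```

With metrics disabled, the same call would be expected to return an empty list.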

Metrics Disabled:

  • ✅ Service starts successfully
  • ✅ No transform metrics exposed
  • ✅ Configuration respected

Test Plan

  • Unit tests pass
  • Integration tests pass (3/3)
  • Performance benchmarks pass (2/2)
  • Regression tests pass (25 total, 0 failures)
  • Manual integration test verified
  • Metrics exposed at /q/metrics
  • Metrics can be disabled via configuration

Breaking Changes

None. Metrics are enabled by default and can be disabled via configuration.

Dependencies

Added: io.quarkus:quarkus-micrometer-registry-prometheus

🤖 Generated with Claude Code

ricardozanini and others added 12 commits May 14, 2026 18:39
Add quarkus-micrometer-registry-prometheus to enable metrics
collection and Prometheus endpoint exposure.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Scheduled job polls Transform Stats API every 30s and exposes
Micrometer gauges for documents processed, indexed, lag, state,
and checkpoint timestamp for both workflow and task transforms.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add configuration for metrics collection with sensible defaults:
- enabled=true (can be disabled to reduce ES load)
- poll-interval=30s (balances freshness vs overhead)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Test profile reduces metrics poll interval from 30s to 5s
to speed up integration tests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add ElasticsearchTransformMetricsIT with helper methods for testing metrics collection.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add tests verifying:
- Metrics are collected and updated
- Prometheus endpoint exposes metrics
- Metrics update periodically with new data

All 3 tests passing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add benchmark test class with helper methods for:
- Bulk event insertion with terminal/non-terminal ratio
- Age offset simulation for old events
- Transform lag monitoring

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Benchmark verifies processing time stays constant as data grows:
- Phase 1: 1K events
- Phase 2: 11K events
- Assert: < 50% processing time increase (vs 10x without smart filtering)

Test passing with 1.00x increase (no slowdown).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Benchmark verifies lag stays low under load:
- Insert 1K events rapidly
- Monitor lag over 25 seconds
- Assert: max lag < 100 documents, lag decreases over time

Both performance tests passing (lag=14, 1.00x scaling).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add comprehensive metrics documentation to TRANSFORM_OPTIMIZATION.md:
- Exposed Prometheus metrics (processed, indexed, lag, state, checkpoint)
- Grafana dashboard queries and alert rules
- Troubleshooting guide for high lag and metrics issues

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add section explaining:
- Prometheus metrics exposure
- Configuration options
- Grafana integration
- Performance benchmarking

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add detailed implementation plan for Elasticsearch Transform Metrics & Benchmarking feature.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ricardozanini ricardozanini merged commit 4984dcc into main May 15, 2026
2 checks passed