feat(flagd): Opentelemetry feature for metrics/spans #92

Draft
erenatas wants to merge 6 commits into main from erenatas/otel-metrics

Contributor

@erenatas erenatas commented Dec 15, 2025

This change builds on #91. I am temporarily changing the target branch so that the diff displays correctly.

Add optional otel cargo feature that enables OpenTelemetry tracing
and metrics for all evaluation modes (RPC, REST, in-process, file).

Tracing:

  • Flag evaluation spans with semantic attributes
  • gRPC client spans for RPC/in-process resolvers
  • HTTP client spans for REST resolver
  • Trace context propagation via HTTP headers
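
As an illustration of what propagation via HTTP headers puts on the wire, here is a minimal stdlib-only sketch that formats a W3C Trace Context `traceparent` header. It assumes the W3C format is the one in use; `SpanContext`, `traceparent`, and `inject` are hypothetical names for this sketch and are not taken from the PR, which presumably delegates this to an OpenTelemetry propagator.

```rust
/// Hypothetical, simplified span context (not the OpenTelemetry SDK type).
pub struct SpanContext {
    pub trace_id: u128,
    pub span_id: u64,
    pub sampled: bool,
}

/// Formats a W3C `traceparent` value:
/// version "00", 32 hex digits of trace id, 16 hex digits of span id, 2 hex digits of flags.
pub fn traceparent(ctx: &SpanContext) -> String {
    let flags: u8 = if ctx.sampled { 0x01 } else { 0x00 };
    format!("00-{:032x}-{:016x}-{:02x}", ctx.trace_id, ctx.span_id, flags)
}

/// Appends the header to an outgoing request's header list
/// (modelled here as plain name/value pairs, an assumption of this sketch).
pub fn inject(ctx: &SpanContext, headers: &mut Vec<(String, String)>) {
    headers.push(("traceparent".to_string(), traceparent(ctx)));
}
```

A server that understands the same header can then attach its spans to the caller's trace.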

Metrics:

  • feature_flag.evaluation_total (counter)
  • feature_flag.evaluation_duration (histogram)
  • feature_flag.evaluation_error_total (counter)

Attributes include:

  • feature_flag.key, feature_flag.provider_name, feature_flag.provider_version
  • feature_flag.variant, feature_flag.reason, resolver_type, error.type
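
To make the relationship between the three instruments concrete, here is a rough stdlib-only model of what one evaluation might record. It is not the OpenTelemetry API and not the PR's code: `EvalMetrics` and `record` are hypothetical, and both the millisecond unit and the choice to count failed evaluations in the total are assumptions of this sketch.

```rust
use std::time::Duration;

/// Hypothetical in-memory stand-in for the three instruments listed above.
#[derive(Default)]
pub struct EvalMetrics {
    /// Models feature_flag.evaluation_total (counter).
    pub evaluation_total: u64,
    /// Models feature_flag.evaluation_error_total (counter).
    pub evaluation_error_total: u64,
    /// Models feature_flag.evaluation_duration (histogram), as raw samples in ms.
    pub evaluation_duration_ms: Vec<f64>,
}

impl EvalMetrics {
    /// Records one evaluation. `error_type` mirrors the `error.type` attribute
    /// and is `None` when the evaluation succeeded.
    pub fn record(&mut self, duration: Duration, error_type: Option<&str>) {
        // Assumption: every evaluation, failed or not, counts toward the total.
        self.evaluation_total += 1;
        self.evaluation_duration_ms.push(duration.as_secs_f64() * 1000.0);
        if error_type.is_some() {
            self.evaluation_error_total += 1;
        }
    }
}
```

With real instruments, the attributes listed above (such as feature_flag.key and resolver_type) would be attached to each recorded data point.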

@gemini-code-assist
Contributor

Summary of Changes

Hello @erenatas, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the flagd crate by integrating OpenTelemetry for comprehensive observability. It introduces tracing for flag evaluations and network calls, along with detailed metrics to monitor performance and errors across all evaluation modes. The changes also include a more robust error handling system, an improved file watching mechanism for local evaluation, and refined connection management, ensuring a more stable and observable feature flagging experience.

Highlights

  • OpenTelemetry Instrumentation: Introduced an optional otel cargo feature to enable OpenTelemetry tracing and metrics across all evaluation modes (RPC, REST, in-process, file).
  • Tracing Capabilities: Added tracing spans for flag evaluations with semantic attributes, gRPC client spans for RPC/in-process resolvers, HTTP client spans for REST resolvers, and trace context propagation via HTTP headers.
  • Metrics Collection: Implemented OpenTelemetry metrics including feature_flag.evaluation_total (counter), feature_flag.evaluation_duration (histogram), and feature_flag.evaluation_error_total (counter) with detailed attributes.
  • Improved Error Handling: Refactored and expanded the FlagdError enum to include more specific error types (e.g., Sync, Parse, Timeout, Io, Json, Channel) and updated error conversions.
  • Enhanced File Watching: The in-process file connector now uses the notify crate for more robust and efficient file change detection, handling atomic writes and providing better error reporting.
  • Configuration and Connection Management: Added keep_alive_time_ms to FlagdOptions for HTTP/2 keepalive configuration and implemented channel reuse and invalidation logic for gRPC connections. Caching is now disabled by default for in-process and file modes per specification.
  • New Examples and Documentation: Included new examples demonstrating in-process evaluation with live file updates and updated README.md with detailed instructions on using the otel feature, emitted spans, and metrics.
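
For readers unfamiliar with the pattern behind the error-handling highlight, here is a minimal sketch of a custom error enum with a `From` conversion, using only the standard library. The variant names come from the summary above; the payload types, the `Display` wording, and the `read_flags` helper are assumptions made for illustration. The `Json` variant is omitted because it would need a JSON crate.

```rust
use std::fmt;
use std::io;

/// Sketch of a provider error enum; payload types are assumed, not from the PR.
#[derive(Debug)]
pub enum FlagdError {
    Sync(String),
    Parse(String),
    Timeout(String),
    Io(io::Error),
    Channel(String),
}

impl fmt::Display for FlagdError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FlagdError::Sync(msg) => write!(f, "sync error: {msg}"),
            FlagdError::Parse(msg) => write!(f, "parse error: {msg}"),
            FlagdError::Timeout(msg) => write!(f, "timeout: {msg}"),
            FlagdError::Io(err) => write!(f, "io error: {err}"),
            FlagdError::Channel(msg) => write!(f, "channel error: {msg}"),
        }
    }
}

impl std::error::Error for FlagdError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            FlagdError::Io(err) => Some(err),
            _ => None,
        }
    }
}

/// Lets `?` convert `io::Error` into `FlagdError` automatically.
impl From<io::Error> for FlagdError {
    fn from(err: io::Error) -> Self {
        FlagdError::Io(err)
    }
}

/// Hypothetical helper: reads a flag file, surfacing I/O failures as `FlagdError::Io`.
pub fn read_flags(path: &str) -> Result<String, FlagdError> {
    Ok(std::fs::read_to_string(path)?)
}
```

Compared with a catch-all error type, callers can match on the variant to decide, for example, whether a failure is worth retrying.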

@erenatas erenatas changed the base branch from main to erenatas/improve-sync-mode December 15, 2025 17:10
Signed-off-by: Eren Atas <eren_atas@hotmail.com>
@erenatas erenatas force-pushed the erenatas/otel-metrics branch from 33baa26 to 98d6905 Compare December 15, 2025 17:11
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant new feature: OpenTelemetry support for metrics and tracing across all evaluation modes. The implementation is comprehensive, including a new otel module with well-structured submodules for gRPC, HTTP, metrics, and spans, along with extensive tests using a fake collector.

Beyond the new feature, this PR includes several impressive refactorings that improve the provider's robustness and performance:

  • Error handling has been standardized to a custom FlagdError enum, replacing anyhow.
  • The file-based resolver now uses the notify crate for efficient, event-driven file watching instead of polling.
  • The gRPC stream connector has been enhanced with connection reuse, better keep-alive handling, and more resilient retry logic.
  • The in-process targeting engine now correctly serializes nested evaluation context structures, which is a critical improvement.

Overall, this is an excellent set of changes that significantly enhances the flagd provider. My only minor suggestion is to avoid a fixed sleep in one of the new examples to prevent potential flakiness.

I am having trouble creating individual review comments, so my feedback is listed here instead.

crates/flagd/examples/common/mod.rs (line 39), severity: medium

The use of a fixed-duration sleep can lead to flaky examples, as the required initialization time for flagd might vary depending on the environment. The container is already configured with with_wait_for to wait for a specific log message. If additional initialization is needed after the port is available, a more robust approach would be to poll a health endpoint or wait for another specific condition rather than using a fixed sleep. While acceptable for an example, avoiding arbitrary sleeps is a good practice to ensure reliability.
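
One way to replace a fixed sleep with a bounded poll is sketched below using only the standard library. `wait_until` is a hypothetical helper, not code from the PR, and the readiness check is passed in as a closure because the exact condition (a health endpoint, a successful connection) is not specified here. The real example is likely async, in which case the same loop would use the runtime's sleep instead of `thread::sleep`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `is_ready` until it returns true or `timeout` elapses.
/// Returns true if the condition was met before the deadline.
pub fn wait_until<F>(mut is_ready: F, timeout: Duration, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if is_ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Back off briefly so the loop does not spin at full speed.
        thread::sleep(interval);
    }
}
```

The caller returns as soon as the service is ready and fails with a clear timeout if it never is, rather than always waiting a fixed amount that may be too short on a slow machine.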

crates/flagd/src/resolver/in_process/targeting/mod.rs (lines 129-137), severity: medium

The previous implementation for handling EvaluationContextFieldValue::Struct used format!("{:?}", s), which would not correctly serialize the struct for the JSONLogic engine. This change to properly downcast to open_feature::StructValue and recursively serialize it is a critical fix that enables complex, nested evaluation contexts to work correctly with in-process targeting rules. This is an excellent improvement.
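
The difference between Debug-formatting a struct and recursively serializing it can be sketched as follows. `CtxValue` is a hypothetical stand-in for the context value type, not `open_feature::EvaluationContextFieldValue`, and the hand-written JSON writer is only there to keep the sketch dependency-free; the PR presumably builds the value with a JSON library.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified context value (not the open_feature type).
#[derive(Debug, Clone)]
pub enum CtxValue {
    Bool(bool),
    Int(i64),
    Str(String),
    Struct(BTreeMap<String, CtxValue>),
}

/// Quotes and escapes a string as a JSON string literal.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Recursively serializes a value, descending into nested structs
/// so that each level becomes a JSON object rather than a debug string.
pub fn to_json(value: &CtxValue) -> String {
    match value {
        CtxValue::Bool(b) => b.to_string(),
        CtxValue::Int(i) => i.to_string(),
        CtxValue::Str(s) => escape(s),
        CtxValue::Struct(fields) => {
            let body: Vec<String> = fields
                .iter()
                .map(|(k, v)| format!("{}:{}", escape(k), to_json(v)))
                .collect();
            format!("{{{}}}", body.join(","))
        }
    }
}
```

A targeting rule that reads a nested field such as `user.tier` can only resolve it when the context reaches the rule engine as nested objects, which is what the recursive branch produces.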

Member

@beeme1mr beeme1mr left a comment

Hey @erenatas, thanks for the PR. Would it be possible to publish this as a separate hook so that any provider could use it? Also, does the OTel Rust SDK support span events? That's the recommended way to capture feature flag evaluations (at least for in-process evals), since a span per evaluation tends to clutter most tracing UIs.

@erenatas erenatas changed the base branch from erenatas/improve-sync-mode to main December 23, 2025 09:36
@erenatas erenatas marked this pull request as draft December 29, 2025 12:21