Additional tracing spans for activation, persistence, and migration #9870

Merged
ReubenBond merged 35 commits into dotnet:main from rkargMsft:moar_tracing on Feb 13, 2026
@rkargMsft rkargMsft commented Jan 5, 2026

Adds Lifecycle and Storage ActivitySources and new spans for:

  • Lifecycle - Placement, activation, grain directory, migration, dehydrate/rehydrate
  • Storage - Persistence read/write/clear
  • Application - IAsyncEnumerable: main call span and nested Start/MoveNext/Dispose
    • Note that the Start/MoveNext/Dispose spans have moved out of the Runtime ActivitySource and into the Application ActivitySource. The versions of those ActivitySources have been updated to reflect this change.

Also adds public constants in Orleans.Diagnostics.ActivitySources to help with adding listeners, such as through OpenTelemetry's .AddSource(ActivitySources.ApplicationGrainActivitySourceName). There is also an ActivitySources.AllActivitySourceName value to get all Orleans spans.
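
For anyone subscribing without OpenTelemetry, selecting spans by source name can be approximated with a plain `System.Diagnostics.ActivityListener`. The sketch below is only an illustration of that idea: the source names `Example.Lifecycle` and `Example.Storage` and the span names are hypothetical stand-ins, not the values of the Orleans constants, and real code would pass the `ActivitySources` constants instead.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SourceFilterSketch
{
    // Hypothetical source names for illustration; not the values Orleans uses.
    public const string LifecycleSourceName = "Example.Lifecycle";
    public const string StorageSourceName = "Example.Storage";

    // Emits one span on each source and returns the spans seen by a listener
    // that is subscribed only to the given source names.
    public static List<string> CollectSpanNames(params string[] subscribedSources)
    {
        var stopped = new List<string>();
        var subscribed = new HashSet<string>(subscribedSources);

        using var listener = new ActivityListener
        {
            // Only sources whose name was requested are observed.
            ShouldListenTo = source => subscribed.Contains(source.Name),
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            ActivityStopped = activity =>
                stopped.Add($"{activity.Source.Name}:{activity.OperationName}"),
        };
        ActivitySource.AddActivityListener(listener);

        using var lifecycle = new ActivitySource(LifecycleSourceName, "1.0.0");
        using var storage = new ActivitySource(StorageSourceName, "1.0.0");

        // StartActivity returns null when no listener wants the source,
        // and a using statement over null is a no-op.
        using (lifecycle.StartActivity("activate grain")) { }
        using (storage.StartActivity("read storage")) { }

        return stopped;
    }
}
```

Calling `SourceFilterSketch.CollectSpanNames(SourceFilterSketch.LifecycleSourceName)` yields only the lifecycle span, because the storage source has no subscriber and never creates an Activity.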

Documentation PR: dotnet/docs#50919

Copilot AI left a comment

Pull request overview

This PR adds comprehensive distributed tracing support for Orleans grain lifecycle operations, including activation, placement, storage, and migration. The changes introduce new activity sources to organize telemetry spans and instrument key runtime operations that were previously not traced.

Key changes:

  • Introduces new ActivitySources and ActivityNames abstractions to centralize activity source management and operation naming
  • Adds tracing spans for grain activation, placement filtering, directory registration, and OnActivateAsync execution
  • Adds tracing spans for storage operations (read, write, clear) with proper parent context propagation
  • Adds tracing spans for grain migration (dehydrate/rehydrate) with context propagation across silos
  • Adds comprehensive test coverage to verify span creation and proper parent-child relationships
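
The parent-child relationships listed above depend on carrying a trace context from one operation to the next. A minimal sketch of that idea using only `System.Diagnostics` follows. It assumes the W3C ID format (the default on .NET 5 and later), and the source name `Example.Tracing` and both span names are hypothetical, not the ones this PR introduces. How Orleans itself carries the context between silos is not shown here.

```csharp
using System;
using System.Diagnostics;

public static class ParentContextSketch
{
    // Starts a parent span, serializes its identity, then starts a child
    // from the restored context and reports whether the linkage holds.
    public static (bool SameTrace, bool ChildOfParent) Run()
    {
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Example.Tracing",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        };
        ActivitySource.AddActivityListener(listener);
        using var source = new ActivitySource("Example.Tracing");

        string traceParent;
        ActivityTraceId traceId;
        ActivitySpanId parentSpanId;
        using (var parent = source.StartActivity("activate grain"))
        {
            if (parent is null || parent.Id is null)
                throw new InvalidOperationException("Parent span was not created.");
            traceParent = parent.Id;      // W3C traceparent string
            traceId = parent.TraceId;
            parentSpanId = parent.SpanId;
        }

        // Simulates the receiving side: rebuild the context from the string.
        if (!ActivityContext.TryParse(traceParent, null, out var restored))
            throw new InvalidOperationException("Could not parse trace parent.");

        using var child = source.StartActivity(
            "read storage", ActivityKind.Internal, restored);
        if (child is null)
            throw new InvalidOperationException("Child span was not created.");

        return (child.TraceId == traceId, child.ParentSpanId == parentSpanId);
    }
}
```

Passing the restored context explicitly means the child is parented correctly even though the parent span has already ended and `Activity.Current` no longer points at it.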

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| src/Orleans.Core.Abstractions/Diagnostics/ActivitySources.cs | New file defining centralized activity sources for Application, Runtime, Lifecycle, and Storage operations |
| src/Orleans.Core/Diagnostics/ActivityNames.cs | New file defining standardized operation names for tracing spans |
| src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs | Updated to use new ActivitySources abstraction and route different interface types to appropriate activity sources |
| src/Orleans.Core.Abstractions/Runtime/AsyncEnumerableRequest.cs | Adds session-level activity tracking for async enumerable operations with proper parent-child relationships |
| src/Orleans.Core.Abstractions/Properties/AssemblyInfo.cs | Exposes internal types to Tester assembly for testing |
| src/Orleans.Runtime/Catalog/Catalog.cs | Adds activation span creation with activity context propagation from request context |
| src/Orleans.Runtime/Catalog/ActivationData.cs | Adds detailed lifecycle tracing for activation, dehydration, and rehydration with events and error tracking |
| src/Orleans.Runtime/Storage/StateStorageBridge.cs | Adds storage operation tracing spans with parent context from activation activity |
| src/Orleans.Runtime/Placement/PlacementService.cs | Adds placement operation tracing and placement filter spans with activity context restoration |
| test/Tester/ActivationTracingTests.cs | Comprehensive test suite verifying all new tracing spans and their parent-child relationships |
| test/Tester/ActivityPropagationTests.cs | Updated to use new ActivitySources constants |
| test/Grains/TestGrainInterfaces/IActivityGrain.cs | Adds test interface for async enumerable activity tracing |
| test/Grains/TestGrains/ActivityGrain.cs | Adds test grain implementation for async enumerable activity tracing |

@rkargMsft rkargMsft marked this pull request as ready for review January 6, 2026 23:49
rkargMsft added a commit to rkargMsft/docs that referenced this pull request Jan 7, 2026
@rkargMsft commented

@ReubenBond This PR should be ready for review. Docs PR also linked.

Test failure appears to be a flaky test.

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 19 comments.

@ReubenBond ReubenBond left a comment

Looks great, thank you! Just some comments about moving this diagnostic code out of the main logic paths and into helpers. Copilot/etc. ought to be able to handle this change.

@shacal shacal left a comment

I'd also move the trace tags into a shared class of constants. That makes them easier to use downstream and avoids magic strings lying around.

Other than that, this is a welcome addition for filtering and managing traces.

Well done 🤌

@rkargMsft commented

The unit test failures appear to be flaky tests. They pass locally and don't appear to be related to the changes.

@rkargMsft commented

I somehow missed OnDeactivateAsync. Working on that now.

@rkargMsft commented

Working out some oddness with IAsyncEnumerable-returning calls and the dispose grain span.

rkargMsft and others added 6 commits February 3, 2026 11:39

Improve Orleans tracing: context propagation & test coverage

- Ensure placement filter and activation/deactivation spans are correctly parented and restore Activity.Current
- Defer placement filter span creation to enumeration for accurate timing and parentage
- PlaceGrainAsync restores parent activity context from request context
- Add multi-filter and migration placement filter test grains and tests
- Add comprehensive trace context propagation tests (client/server, nested, concurrent, activation, cross-silo, HTTP)

Use discard for unused placeGrainActivity variable

The variable returned by TryRestoreActivityContext is never referenced after
assignment. Use the discard pattern to clarify intent: the Activity exists only
for its scoped using lifetime.

Dispose _activationActivity after activation lifecycle completes

Replace Stop() calls with Dispose() (which internally calls Stop()) and null out
the field to release resources promptly. Activity.Dispose() signals the
ActivitySource that the span is complete and can be flushed, whereas Stop()
alone only records the end timestamp.
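
A small sketch of the dispose-and-null pattern from the commit message above about replacing Stop() with Dispose(), using only `System.Diagnostics`. The class, source, and span names are hypothetical, and this is not the ActivationData code itself. It relies on one BCL behavior: disposing an Activity stops it only if it has not been stopped already, so a listener sees a single stop notification even when cleanup runs twice.

```csharp
using System;
using System.Diagnostics;

public sealed class ActivationSketch
{
    // Hypothetical source name for illustration only.
    private static readonly ActivitySource Source = new ActivitySource("Example.Activation");

    // May be null: StartActivity returns null when nobody is listening.
    private Activity _activationActivity;

    public void BeginActivation() =>
        _activationActivity = Source.StartActivity("activate grain");

    public void CompleteActivation()
    {
        // Dispose ends the span if it is still running; clearing the field
        // drops the reference and makes a repeated call a no-op.
        _activationActivity?.Dispose();
        _activationActivity = null;
    }

    // Runs the pattern once and returns how many stop notifications
    // a listener observed.
    public static int CountStopNotifications()
    {
        var stops = 0;
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Example.Activation",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            ActivityStopped = activity => stops++,
        };
        ActivitySource.AddActivityListener(listener);

        var activation = new ActivationSketch();
        activation.BeginActivation();
        activation.CompleteActivation();
        activation.CompleteActivation(); // second call finds a null field
        return stops;
    }
}
```

`ActivationSketch.CountStopNotifications()` returns 1 in this sketch, which is the property that makes it safe to call the cleanup path from more than one place.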
@ReubenBond ReubenBond added this pull request to the merge queue Feb 13, 2026
Merged via the queue into dotnet:main with commit 88fbaa6 Feb 13, 2026
59 checks passed
rkargMsft added a commit to rkargMsft/orleans that referenced this pull request Feb 27, 2026
…otnet#9870)

* WIP

* Pulling more spans in to activation tracing

* Spans for persistence state

* dehydrate/rehydrate spans

* Separating class

* removing nullable

* reverting nullable operation

* removing unnecessary change

* Using activity source to ensure we see the root span in the test debug output

* aligning span naming with OTel conventions

https://opentelemetry.io/blog/2025/how-to-name-your-spans/#verb-object

* renaming remaining spans

https://opentelemetry.io/blog/2025/how-to-name-your-spans/#verb-object

* only add dehydrate/rehydrate for grain migration participants

* Refactor tracing: add granular ActivitySources and IAsyncEnumerable tracing

Introduce distinct ActivitySources for application, runtime, lifecycle, and storage tracing in Orleans. Update grain call filter and runtime logic to use the correct source based on operation type. Enhance async enumerable tracing with session activities and proper context propagation. Add new grain and tests to verify async enumerable activity spans. Improve test coverage and assertions for new tracing structure, increasing observability and diagnostic precision across subsystems.

* Implementing valid Copilot feedback

* Updating ActivitySource versions to denote changes around IAsyncEnumerable spans.

Runtime (1.0.0 -> 2.0.0) no longer gets the Start/MoveNext/Dispose spans
Application (1.0.0 -> 1.1.0) now gets the Start/MoveNext/Dispose spans and these are nested under a new span for the overall grain call being made.

* consolidate setting Activity error properties

* setting error description string

* correcting method call

* explicit using on activity

* fixing locking on `this`

* explicit discard

* more explicit discards

* set Activity error on rehydrate error

* restoring current activity after placement

* Commenting on current need to lock on `this`
Reverting System.Threading.Lock update

* Using constants for tags and error descriptions

* adding deactivation Activity

* Fixing errant using

* fix: adding deactivate Activity

* additional test scenario around IAsyncEnumerable use case and explicit dispose

* always stopping deactivate grain span

* correct parenting of spans for accepting migrations

* Improve Orleans tracing: context propagation & test coverage

- Ensure placement filter and activation/deactivation spans are correctly parented and restore Activity.Current
- Defer placement filter span creation to enumeration for accurate timing and parentage
- PlaceGrainAsync restores parent activity context from request context
- Add multi-filter and migration placement filter test grains and tests
- Add comprehensive trace context propagation tests (client/server, nested, concurrent, activation, cross-silo, HTTP)

* Use discard for unused placeGrainActivity variable

The variable returned by TryRestoreActivityContext is never referenced after
assignment. Use the discard pattern to clarify intent: the Activity exists only
for its scoped using lifetime.

* Dispose _activationActivity after activation lifecycle completes

Replace Stop() calls with Dispose() (which internally calls Stop()) and null out
the field to release resources promptly. Activity.Dispose() signals the
ActivitySource that the span is complete and can be flushed, whereas Stop()
alone only records the end timestamp.

---------

Co-authored-by: Reuben Bond <reuben.bond@gmail.com>
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2026