
Add release and identifier labels to phase_duration metric.#1009

Merged
rene-oromtz merged 1 commit into main from provisioning-metric-with-release-label
Apr 13, 2026


@gntzio
Collaborator

@gntzio gntzio commented Apr 9, 2026

Description

To enable more meaningful data separation for the provisioning duration metric, this PR adds 2 labels to the phase_duration metric: identifier and release.

  • identifier is derived from the identifier field in testflinger-agent.conf.
  • release indicates the OS release with which the DUT was provisioned. It only becomes available once the provisioning phase has finished, and it is applied to the provisioning phase and to all subsequent phases. Implementation details:
    • After provisioning is finished, the TF Agent will ssh the DUT to read /etc/os-release information.
    • If successful, the Agent normalizes the version to match the C3 definition (the logic is documented in test_os_release.py).
    • The normalization ensures consistent behavior in Grafana dashboards.
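The release detection step can be sketched as follows. This is a minimal illustration assuming the standard os-release `KEY=value` format with shell-style quoting; `parse_os_release` and `release_label` are hypothetical helper names, and the normalization rule (take the leading point release from `VERSION`, falling back to `VERSION_ID`) is an assumption, not the C3 definition implemented in this PR.

```python
import re
import shlex
from typing import Dict, Optional

# Hypothetical normalization: a leading "MAJOR.MINOR" or "MAJOR.MINOR.POINT".
_RELEASE_PATTERN = re.compile(r"^\d+\.\d+(?:\.\d+)?")


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style KEY=value lines into a dict."""
    fields: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and comments are allowed by the os-release format.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            # shlex removes the shell-style quotes around values.
            fields[key.strip()] = " ".join(shlex.split(value))
        except ValueError:
            # Unbalanced quotes: skip the malformed line.
            continue
    return fields


def release_label(fields: Dict[str, str]) -> Optional[str]:
    """Derive a release label, or None if no version field is usable."""
    # Assumption: prefer the point release at the start of VERSION
    # (e.g. "24.04.4 LTS (Noble Numbat)"), then fall back to VERSION_ID.
    for key in ("VERSION", "VERSION_ID"):
        match = _RELEASE_PATTERN.match(fields.get(key, ""))
        if match:
            return match.group(0)
    return None


sample = 'NAME="Ubuntu"\nVERSION_ID="24.04"\nVERSION="24.04.4 LTS (Noble Numbat)"\n'
print(release_label(parse_os_release(sample)))  # prints 24.04.4
```

Preferring `VERSION` over `VERSION_ID` in this sketch is what keeps the point release ("24.04.4" rather than "24.04"); whether the real C3 normalization does the same should be checked against test_os_release.py.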

Note: Additional rationale for the selected approach is provided in the linked ticket.

Resolved issues

CERTTF-832

Documentation

Unit tests + linked ticket.

Web service API changes

N/A

Tests

Unit tests.

@codecov

codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.12%. Comparing base (547b32d) to head (df423fa).
⚠️ Report is 27 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1009      +/-   ##
==========================================
+ Coverage   73.98%   74.12%   +0.14%     
==========================================
  Files         108      109       +1     
  Lines       10343    10398      +55     
  Branches      887      895       +8     
==========================================
+ Hits         7652     7708      +56     
  Misses       2503     2503              
+ Partials      188      187       -1     
Flag Coverage Δ *Carryforward flag
agent 75.78% <100.00%> (+1.14%) ⬆️
cli 89.56% <ø> (ø) Carriedforward from 547b32d
device 60.15% <ø> (ø) Carriedforward from 547b32d
server 87.88% <ø> (ø) Carriedforward from 547b32d

*This pull request uses carry forward flags.

Components Coverage Δ
Agent 75.78% <100.00%> (+1.14%) ⬆️
CLI 89.56% <ø> (ø)
Common ∅ <ø> (∅)
Device Connectors 60.15% <ø> (ø)
Server 87.88% <ø> (ø)


Copilot AI left a comment


Pull request overview

This PR adds two new labels (identifier, release) to the agent’s phase_duration_seconds Prometheus histogram to enable better segmentation of phase-duration metrics by device identifier and provisioned OS release.

Changes:

  • Extend phase_duration_seconds metric labels to include identifier (from agent config) and release (queried from DUT /etc/os-release after provisioning).
  • Add DUT OS release querying/parsing logic and unit tests for normalization behavior.
  • Update agent metrics unit test expectations to include the new labels.
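Because the release is only known once provisioning has finished, the provisioning duration has to be reported after the release query. The sketch below shows that ordering with a stdlib-only stand-in for the labeled histogram. `PhaseDurationRecorder` and the `unknown` placeholder for phases that finish before release detection are hypothetical (the PR does not say what earlier phases receive), and the durations are illustrative.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

# Hypothetical placeholder for phases that finish before the release is known.
UNKNOWN_RELEASE = "unknown"

# Label order: (agent_id, identifier, release, test_phase)
LabelSet = Tuple[str, str, str, str]


class PhaseDurationRecorder:
    """Stdlib stand-in for a labeled histogram: keeps observations per label set."""

    def __init__(self, agent_id: str, identifier: str) -> None:
        self.agent_id = agent_id
        self.identifier = identifier  # taken from the agent config
        self.release: Optional[str] = None  # set once the DUT release is detected
        self.observations: DefaultDict[LabelSet, List[float]] = defaultdict(list)

    def report(self, test_phase: str, seconds: float) -> None:
        labels = (
            self.agent_id,
            self.identifier,
            self.release or UNKNOWN_RELEASE,
            test_phase,
        )
        self.observations[labels].append(seconds)


recorder = PhaseDurationRecorder(agent_id="audino", identifier="202409-35503")
recorder.report("setup", 5.0)  # release not detected yet
# Provisioning finishes, the DUT release is queried, and only then is the
# provision duration reported, so it carries the release label.
recorder.release = "24.04.4"
recorder.report("provision", 823.0)
recorder.report("test", 120.0)
```

A real implementation would pass the same label values to a Prometheus histogram's `labels(...)` call instead of storing observations in a dict.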

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

File Description
agent/src/testflinger_agent/metrics.py Adds identifier/release labels to the phase duration histogram and updates the reporting API.
agent/src/testflinger_agent/agent.py Captures config identifier, queries DUT release after successful provision, and reports phase durations with the new labels.
agent/src/testflinger_agent/os_release.py Introduces SSH-based /etc/os-release query + parsing/normalization helpers.
agent/tests/test_os_release.py Adds unit tests for parsing, label derivation, and SSH query failure handling.
agent/tests/test_agent.py Updates metric label assertions to include identifier and release.
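The SSH-based query can be sketched as a function that returns `None` on any failure, so a missing release never fails the job. This assumes the DUT accepts non-interactive, key-based SSH as a hypothetical `ubuntu` user and that the device IP is already known. The option set, the timeouts, and the injectable `run` parameter (used here to fake SSH in tests) are illustrative choices, not the agent's actual code.

```python
import subprocess
from typing import Callable, List, Optional


def build_ssh_command(device_ip: str, user: str = "ubuntu") -> List[str]:
    # Hypothetical options: never prompt, skip host-key checks, fail fast.
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        "-o", "ConnectTimeout=10",
        f"{user}@{device_ip}",
        "cat", "/etc/os-release",
    ]


def query_os_release(
    device_ip: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 30.0,
) -> Optional[str]:
    """Return the DUT's /etc/os-release contents, or None on any failure."""
    if not device_ip:
        return None
    try:
        result = run(
            build_ssh_command(device_ip),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh binary missing or the command hung: treat as "release unknown".
        return None
    if result.returncode != 0:
        return None
    return result.stdout
```

Returning `None` instead of raising leaves the caller free to report the metric with a placeholder release, matching the "if successful" wording in the PR description.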


Comment thread agent/src/testflinger_agent/agent.py
Comment thread agent/src/testflinger_agent/agent.py
Comment thread agent/src/testflinger_agent/os_release.py
Comment thread agent/src/testflinger_agent/agent.py
Contributor

@rene-oromtz rene-oromtz left a comment


Thank you for taking care of this! Overall this looks good to me!
I took the liberty of testing it on staging and found a bug in the device_ip retrieval. Once that is addressed, it should be good, as I can see the identifier label was properly added:
phase_duration_seconds_sum{agent_id="audino", identifier="202409-35503", ..., test_phase="provision"}

Comment thread agent/src/testflinger_agent/agent.py Outdated
Comment thread agent/src/testflinger_agent/agent.py Outdated
Comment thread agent/src/testflinger_agent/os_release.py
Comment thread agent/src/testflinger_agent/os_release.py Outdated
Comment thread agent/src/testflinger_agent/os_release.py
Comment thread agent/tests/test_os_release.py
Comment thread agent/src/testflinger_agent/os_release.py
@gntzio gntzio requested review from ajzobro and rene-oromtz April 10, 2026 09:59
Contributor

@rene-oromtz rene-oromtz left a comment


Sweet! Working as intended:

Prometheus:
phase_duration_seconds_sum{agent_id="audino", identifier="202409-35503", ..., release="24.04.4", test_phase="provision"}

Agent logs:

[26-04-10 14:24:54]    INFO: (job.py:129)| Running setup_command: tf-setup
[26-04-10 14:24:59]    INFO: (job.py:129)| Running provision_command: tf-provision
[26-04-10 14:38:42]    INFO: (os_release.py:75)| DUT release detected: 24.04.4

There is just a pending discussion with @ajzobro, so let's see what he thinks. Besides that, LGTM.

@ajzobro
Collaborator

ajzobro commented Apr 10, 2026

I won't hold this up.

@gntzio
Collaborator Author

gntzio commented Apr 13, 2026

Thanks to both of you for the review.

@ajzobro, thanks for highlighting the licensing issue; it is important to keep it consistent in the long term.

It is not a problem to address your concerns and adapt the copyright headers in the newly added files. However, this would make them inconsistent with the other source files in the agent directory. Also, given that the COPYING file is not part of that directory, it could cause even more confusion.

I would suggest addressing the copyright topic for the entire agent component in a separate PR.

@gntzio gntzio force-pushed the provisioning-metric-with-release-label branch from f9f437a to df423fa April 13, 2026 14:12
@rene-oromtz rene-oromtz merged commit b796e8d into main Apr 13, 2026
16 checks passed
@rene-oromtz rene-oromtz deleted the provisioning-metric-with-release-label branch April 13, 2026 15:05

Labels

None yet

Projects

None yet


4 participants