fix(agent): include GlobalTimeout during allocate phase by rene-oromtz · Pull Request #928 · canonical/testflinger

rene-oromtz · 2026-02-18T20:32:11Z

Description

~~This PR adds schema validation for allocate_data.~~

This PR adds GlobalTimeout to the allocate phase. Without a proper parent job ID, the agent can wait indefinitely in the allocate phase until the job is manually cancelled. Ideally, the schema would be enforced on the server side, but for now this timeout should ensure that no agent exceeds the allowed amount of time.
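As a rough illustration of the behaviour described above, the allocate wait can be sketched as a polling loop guarded by a deadline. This is not the code in this PR: `wait_while_allocated`, `GlobalTimeoutError`, the `get_parent_state` callable, and the terminal state names are all assumptions made for the example.

```python
import time


class GlobalTimeoutError(Exception):
    """Hypothetical error raised when the allocate phase runs too long."""


def wait_while_allocated(
    get_parent_state,
    global_timeout,
    poll_interval=1.0,
    terminal_states=("complete", "cancelled"),  # assumed state names
    clock=time.monotonic,
    sleep=time.sleep,
):
    """Poll until the parent job reaches a terminal state or time runs out.

    get_parent_state is a hypothetical callable that returns the parent
    job state as a string, or None when no parent job ID is available.
    """
    # Fix the deadline once, at the beginning of the phase.
    deadline = clock() + global_timeout
    while True:
        if clock() >= deadline:
            raise GlobalTimeoutError("allocate phase exceeded global timeout")
        state = get_parent_state()
        if state in terminal_states:
            return state
        sleep(poll_interval)
```

With a missing parent job ID the lookup never yields a terminal state, so the deadline check is the only way out of the loop, which is the case the timeout is meant to cover.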

The rationale for not adding a schema on the server side is that some users who rely on Spread testing appear to use the allocate phase to fetch the IP and then use that IP to run manual tests. This is not ideal, as the reserve stage should serve that purpose, but we should at least give a proper warning before enforcing the restriction on the server side.

Resolved issues

Resolves #799
Resolves CERTTF-713

Documentation

Web service API changes

Tests

Added a couple of unit tests that check the agent exits from the allocate status.

codecov · 2026-02-18T20:46:20Z

Codecov Report

❌ Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.07%. Comparing base (9cbb26f) to head (d104f1d).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #928      +/-   ##
==========================================
+ Coverage   73.85%   74.07%   +0.21%     
==========================================
  Files         108      108              
  Lines       10313    10322       +9     
  Branches      886      888       +2     
==========================================
+ Hits         7617     7646      +29     
+ Misses       2508     2488      -20     
  Partials      188      188

| Flag | Coverage Δ | | *Carryforward flag |
|---|---|---|---|
| agent | `76.16% <90.90%> (+1.75%)` | ⬆️ | |
| cli | `89.56% <ø> (ø)` | | Carriedforward from 8a3e71c |
| device | `59.86% <100.00%> (ø)` | | |
| server | `87.85% <ø> (ø)` | | Carriedforward from 8a3e71c |

*This pull request uses carry forward flags.

| Components | Coverage Δ | |
|---|---|---|
| Agent | `76.16% <90.90%> (+1.75%)` | ⬆️ |
| CLI | `89.56% <ø> (ø)` | |
| Common | `∅ <ø> (∅)` | |
| Device Connectors | `59.86% <100.00%> (ø)` | |
| Server | `87.85% <ø> (ø)` | |



ajzobro · 2026-03-30T22:35:44Z


                parent_job_id = self.job_data.get("parent_job_id")
                if not parent_job_id:
+                    # TODO: Remove this path once Spread testing use cases


Do we have any sort of future ticket we can reference in this TODO to make sure this happens?

ajzobro · 2026-03-30T22:43:27Z

+                        logger.info("Parent job completed, exiting...")
+                        break
            except TFServerError:
                logger.warning("Failed to get allocated job status, retrying")


There are two jobs that could be responsible for this error -- parent and self.job_id; should we have two try statements to isolate them and provide a better error message and handling?
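A minimal sketch of what isolating the two lookups might look like, so that each failure is logged against the job that caused it. The `check_allocation_states` helper and the `get_state` callable are hypothetical, and the `TFServerError` class below is only a stand-in for the agent's real exception type.

```python
import logging

logger = logging.getLogger(__name__)


class TFServerError(Exception):
    """Stand-in for the agent's server error type (assumption)."""


def check_allocation_states(get_state, job_id, parent_job_id):
    """Return (own_state, parent_state); None marks a lookup that failed.

    get_state is a hypothetical callable that takes a job ID and returns
    its state, raising TFServerError when the server cannot be reached.
    """
    try:
        own_state = get_state(job_id)
    except TFServerError:
        logger.warning(
            "Failed to get status of allocated job %s, retrying", job_id
        )
        own_state = None

    try:
        parent_state = get_state(parent_job_id)
    except TFServerError:
        logger.warning(
            "Failed to get status of parent job %s, retrying", parent_job_id
        )
        parent_state = None

    return own_state, parent_state
```

Returning `None` for a failed lookup lets the caller retry on the next iteration while still acting on whichever state was retrieved successfully.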

ajzobro · 2026-03-30T22:43:29Z

                if not parent_job_id:
+                    # TODO: Remove this path once Spread testing use cases
+                    # are migrated to reserve phase instead of allocate phase.
                    logger.warning("No parent job ID found while allocated")


What will remedy this situation? We seem to get into this while loop, and if we don't have a parent_job_id, how will we get one? Will we just spin around in this loop and never leave?

ajzobro · 2026-03-30T22:49:09Z

        return exitcode

-    def allocate(self):
+    def allocate(self, _):


Why aren't we naming this new arg? Is allocate intended to be an abstract method for multi-device sub-classes?

rene-oromtz force-pushed the feat/add-multi-allocate branch from 16b0278 to 3614347 on February 18, 2026 20:44

rene-oromtz marked this pull request as draft February 18, 2026 21:00

rene-oromtz changed the title ~~fix(server): add validation for allocate phase only for multi device agents.~~ fix(agent): include GlobalTimeout during allocate phase Feb 19, 2026

rene-oromtz added 5 commits February 24, 2026 15:33

feat(server): add schema validation for allocate_data

98bf548

Revert "feat(server): add schema validation for allocate_data"

81c3dd2

This reverts commit 3614347.

fix(agent): include global timeout during allocate phase

7868cdf

set global timeout checker at the beginning of the phase

58df8ea

add unit tests

8a3e71c

rene-oromtz force-pushed the feat/add-multi-allocate branch from 878591e to 8a3e71c on February 24, 2026 22:33

devices: update allocate method

d104f1d

rene-oromtz marked this pull request as ready for review February 24, 2026 22:42

rene-oromtz requested a review from ajzobro February 24, 2026 22:58

ajzobro reviewed Mar 30, 2026

rene-oromtz wants to merge 6 commits into main from feat/add-multi-allocate
