Skip to content

In the event of a network outage, e2e tests can be come synchronized #31

@nkinkade

Description

@nkinkade

There was a long-standing bug recently discovered in the ndt_e2e.sh test which was causing tests to run at more random intervals and likely more frequently than every 10 minutes.

Yesterday, @stephen-soltesz discovered this:

script_exporter_e2e_synchronization

After the event (around 08:00 UTC) there is a very clear 10m period. After investigation, discovered that for some reason the tests had become more synchronized that they were before. In the event of a test failure (perhaps due to a network outage) all tests will begin running every single minute until a success is registered. When the network finally comes back the tests will start succeeding, but by this time they are all running close to within one minute of each other. From there forward, they will be less spaced out. It turns out that the bug referenced above was probably protecting us from this sort of synchronization.

We need to figure out a more robust way of keeping tests spaced as uniformly across 10m as possible.

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions