
perf(flagd): speed up e2e test execution via container pool and parallel scenarios#1752

Open
aepfli wants to merge 6 commits into main from feat/speed-up-flagd-e2e-tests

Conversation

@aepfli
Member

@aepfli aepfli commented Mar 30, 2026

Summary

Reduces flagd provider e2e test wall-clock time by ~75% through a pre-warmed container pool and Cucumber-level parallel scenario execution — no Surefire fork changes needed.

Changes

1. Container pool (ContainerPool + ContainerEntry)

The previous setup used a single shared Docker Compose stack for all scenarios within a runner. Since the flagd-testbed launchpad controls a single flagd process via /start, /stop, /restart, and /change HTTP endpoints, scenarios sharing one container would race on these operations and could not run concurrently.

The fix is a pre-warmed container pool: @BeforeAll starts N containers in parallel (~45s, once per JVM), and each Cucumber scenario borrows one ContainerEntry for its duration, giving it a fully isolated flagd process. After teardown the entry is returned to the pool.

Pool size is tunable via -Dflagd.e2e.pool.size=N (default: 2).

A reference counter ensures that when multiple suite runners share the same JVM (reuseForks=true), containers are only started on the first initialize() call and stopped on the last shutdown() call.
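
The pool mechanics described above can be sketched with plain `java.util.concurrent` primitives. This is a simplified illustration of the shape, not the PR's exact code (for instance, the real `initialize()` starts the containers in parallel via an `ExecutorService`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for the real ContainerEntry, which wraps a Docker Compose
// stack (flagd + envoy) plus a temp directory.
class ContainerEntry {
    void start() { /* docker compose up */ }
    void stop()  { /* docker compose down */ }
}

final class ContainerPool {
    private static final int POOL_SIZE = Integer.getInteger(
            "flagd.e2e.pool.size",
            Math.min(Runtime.getRuntime().availableProcessors(), 4));

    private static final BlockingQueue<ContainerEntry> pool = new LinkedBlockingQueue<>();
    private static final List<ContainerEntry> all = new ArrayList<>();
    private static final AtomicInteger refCount = new AtomicInteger();

    // @BeforeAll: only the first suite runner in the JVM warms the pool.
    static synchronized void initialize() {
        if (refCount.getAndIncrement() > 0) {
            return; // pool already warmed by another runner
        }
        for (int i = 0; i < POOL_SIZE; i++) {
            ContainerEntry entry = new ContainerEntry();
            entry.start(); // the real pool starts these in parallel
            all.add(entry);
            pool.add(entry);
        }
    }

    // Each scenario borrows an isolated container for its duration...
    static ContainerEntry acquire() {
        try {
            return pool.take(); // blocks until a container is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for a container", e);
        }
    }

    // ...and hands it back on scenario teardown.
    static void release(ContainerEntry entry) {
        pool.add(entry);
    }

    // @AfterAll: only the last runner to shut down stops the containers.
    static synchronized void shutdown() {
        if (refCount.decrementAndGet() > 0) {
            return;
        }
        all.forEach(ContainerEntry::stop);
        all.clear();
        pool.clear();
    }
}
```

Extra Cucumber worker threads beyond the pool size simply block in `acquire()` until a container frees up, which is safe.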

2. Parallel Cucumber scenarios

With cucumber.execution.parallel.enabled=true and fixed.parallelism=2 (matching the default pool size), scenarios within each runner execute concurrently.

Correctness safeguards via exclusive resource locks:

  • @env-var scenarios serialised behind ENV_VARS lock (requires companion PR flagd-testbed#359)
  • @grace scenarios (container restart + reconnection timing) serialised behind CONTAINER_RESTART lock
  • ConfigCucumberTest disables parallelism entirely (env-var mutations in <0.4s suite — no benefit)
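
On the Cucumber side, the relevant JUnit Platform properties look roughly like the following. The keys are standard Cucumber-JUnit-Platform keys; the exact values and file contents here are a sketch, not the PR's verbatim configuration:

```properties
# Run scenarios within a runner concurrently.
cucumber.execution.parallel.enabled=true
cucumber.execution.parallel.config.strategy=fixed
cucumber.execution.parallel.config.fixed.parallelism=2

# Serialise tagged scenarios behind exclusive read-write resource locks.
cucumber.execution.exclusive-resources.env-var.read-write=ENV_VARS
cucumber.execution.exclusive-resources.grace.read-write=CONTAINER_RESTART
```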

3. Per-provider teardown

Replaced OpenFeatureAPI.getInstance().shutdown() (global — tears down all providers) with a per-provider NoOpProvider swap through the SDK lifecycle. This properly detaches event emitters and is safe for parallel execution.
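
The swap pattern can be illustrated with minimal stubs. The real code goes through the SDK's `OpenFeatureAPI` (setting the SDK's `NoOpProvider` for the scenario's domain); the types below are hypothetical stand-ins that only show the per-domain idea:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stubs standing in for the OpenFeature SDK's provider types.
interface FeatureProvider {
    default void shutdown() {}
}
final class NoOpProvider implements FeatureProvider {}

// Stub provider that records whether it was shut down, for observation.
final class FlagdProviderStub implements FeatureProvider {
    boolean shutDown;
    @Override public void shutdown() { shutDown = true; }
}

// Minimal stand-in for the per-domain provider registry inside the API.
final class Api {
    private final Map<String, FeatureProvider> providers = new ConcurrentHashMap<>();

    // Swapping in a new provider shuts down the old one for that domain
    // only; other domains (other parallel scenarios) are untouched. In the
    // real SDK this path also detaches event emitters.
    void setProviderAndWait(String domain, FeatureProvider next) {
        FeatureProvider old = providers.put(domain, next);
        if (old != null) {
            old.shutdown();
        }
    }
}
```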

4. Event drain fix

EventSteps now drains events up to and including the first match instead of clear()-ing the entire queue. This prevents stale events (e.g. a READY from before a disconnect) from satisfying later assertions.
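
The drain logic can be sketched as a stand-alone method (a simplified version; the real `EventSteps` operates on the shared scenario `State` and the SDK's event details):

```java
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative event holder.
final class Event {
    final String type;
    Event(String type) { this.type = type; }
}

final class EventDrain {
    // Remove events up to and including the first one matching eventType,
    // leaving later arrivals queued for subsequent assertions.
    static Optional<Event> drainUntilMatch(Queue<Event> events, String eventType) {
        Event matched = null;
        while (!events.isEmpty()) {
            Event head = events.poll();
            if (eventType.equals(head.type)) {
                matched = head;
                break; // events after the match (e.g. a fresh READY) stay queued
            }
        }
        return Optional.ofNullable(matched);
    }
}
```

Unlike `clear()`, this cannot discard an event that arrives between the match and the next assertion.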

Architecture

Before:
  Runner 1 (sequential) → start 1 container → scenario 1,2,...N (sequential) → stop
  Runner 2 (sequential) → start 1 container → scenario 1,2,...N (sequential) → stop
  Runner 3 (sequential) → start 1 container → scenario 1,2,...N (sequential) → stop

After (pool + parallel):
  Runner 1: start pool(2) → 2 parallel scenarios → ... → defer shutdown
  Runner 2: reuse pool    → 2 parallel scenarios → ... → defer shutdown
  Runner 3: reuse pool    → 2 parallel scenarios → ... → stop pool

Dependencies

  • flagd-testbed#359 — adds @env-var tag to config scenarios (submodule temporarily pointed at PR branch)

Draft — watching CI

Opening as draft to observe CI behaviour before merging.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables parallel end-to-end test execution for the flagd provider by implementing a ContainerPool to manage multiple Docker Compose environments. It introduces ContainerEntry and ContainerPool classes, refactors test steps to use pooled containers, and updates Maven and Cucumber configurations for parallel forking. A review comment identifies a potential resource leak in ContainerPool.initialize() and suggests using a try-finally block to ensure the ExecutorService is always shut down and to prevent container leaks if an exception occurs.

@aepfli aepfli force-pushed the feat/speed-up-flagd-e2e-tests branch from e985cfb to 8057aa4 Compare March 31, 2026 08:50
@aepfli aepfli changed the title perf(flagd): speed up e2e test execution via parallel runners and container pool perf(flagd): speed up e2e test execution via container pool and parallel scenarios Mar 31, 2026
@aepfli aepfli force-pushed the feat/speed-up-flagd-e2e-tests branch 2 times, most recently from a985b64 to 209cce0 Compare March 31, 2026 08:57
aepfli and others added 4 commits March 31, 2026 11:01
Replace the single shared Docker Compose stack with a pre-warmed
ContainerPool. Each Cucumber scenario borrows its own isolated
ContainerEntry (flagd + envoy + temp dir), eliminating the process-level
contention that prevented parallel execution.

Key changes:
- ContainerEntry: encapsulates a single Docker Compose stack + temp dir
- ContainerPool: manages a fixed-size pool with acquire/release semantics
  and reference counting so multiple suite runners sharing a JVM only
  start/stop containers once
- ProviderSteps: borrows a container per scenario, replaces global
  API.shutdown() with per-provider NoOpProvider swap through the SDK
  lifecycle (properly detaches event emitters)
- State: carries the borrowed ContainerEntry and provider domain name

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Enable cucumber.execution.parallel.enabled=true with fixed parallelism
matching the container pool size (2).

Correctness safeguards:
- @env-var scenarios serialised behind an ENV_VARS exclusive resource
  lock (requires @env-var tag in test-harness, see companion PR)
- @grace scenarios serialised behind a CONTAINER_RESTART lock to avoid
  reconnection timeouts under parallel container restarts
- ConfigCucumberTest disables parallelism entirely (env-var mutations
  in <0.4s suite — no benefit, avoids races)
- EventSteps: drain-based event matching replaces clear() to prevent
  stale events from satisfying later assertions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Temporary: CI needs the @env-var tag from flagd-testbed#359.
Revert to released branch once that PR is merged and tagged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Switch Cucumber plugin from 'pretty' (prints every step) to 'summary'
(only prints failures and a final count). Keeps CI logs readable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
@aepfli aepfli force-pushed the feat/speed-up-flagd-e2e-tests branch from 209cce0 to f9e647c Compare March 31, 2026 09:01
aepfli and others added 2 commits March 31, 2026 11:08
Switch Cucumber strategy from 'fixed' to 'dynamic' (factor=1.0, i.e.
one thread per available processor). ContainerPool default pool size
also scales with availableProcessors() so pool slots match thread count.

Both are still overridable:
  -Dflagd.e2e.pool.size=N
  -Dcucumber.execution.parallel.config.dynamic.factor=N

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Default pool size was Runtime.availableProcessors() which on large machines
(22 CPUs) spawned too many simultaneous Docker Compose stacks and caused
ContainerLaunchException. Cap at min(availableProcessors, 4).

Cucumber threads still scale with CPUs (dynamic factor=1) — extra threads
simply block waiting for a free container, which is safe.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
@aepfli aepfli marked this pull request as ready for review April 1, 2026 08:05
@aepfli aepfli requested a review from a team as a code owner April 1, 2026 08:05
// later assertion that expects a *new* event of the same type, while still
// preserving events that arrived *after* the match for subsequent steps.
Event matched = null;
while (!state.events.isEmpty()) {
Contributor

Are we sure that no new events can be emitted while or immediately after this runs? Otherwise this loop might not be sufficient.

Member Author

What do you mean? We specifically want to keep events that are generated while this loop runs, because a READY event can happen shortly after a disconnect, and while we wait for the disconnect (including cleanup) we might otherwise remove the new READY.

Contributor

If we are worried about events shortly after a disconnect, we should wait for some time and check events afterwards or in the meantime. This loop might be done after 0 or 1 iterations, and might be done before we receive such an event that we would want to wait for

Member Author

This drains all the events that happened up to and including our matched event. Events that arrive in the meantime stay in the list, so the next check can match against all of them.

break;
}
}
state.lastEvent = java.util.Optional.ofNullable(matched);
Contributor

The naming is not ideal: this is not the last event, it's the last event that matches the eventType.

Member Author

lastEvent belongs to the current test state: it is the last tracked event. We do not care about the type; this is only used to verify some event information.

Comment on lines +59 to +60
// properly calls detachEventProvider (nulls onEmit) and shuts down the emitter
// executor — neither of which happens when calling provider.shutdown() directly.
Contributor

Is this something that should happen when we call shutdown?

Member Author

The tests now run in parallel, so shutting down the API is a no-go, as it would also interfere with other tests.

Contributor

No, but I mean in general, not specifically this test case

Member Author

Maybe, but currently the goal is speeding up tests ;) When we set a new provider, we are actually cleaning up.

}

public static void shutdown() {
int remaining = refCount.decrementAndGet();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be possible that all current users call shutdown even though there are still outstanding users who have not called initialize yet? Then we would shut down the pool even though there are still tests lined up.

Member Author

This happens in @BeforeAll, once per test suite. Test suites are not run in parallel anyway; there is another improvement planned for this.

Contributor

If the test suites do not run in parallel, then we don't need this sync mechanism. If they do run in parallel, the scenario in my comment could (even though it is unlikely) occur.

Member Author

I briefly ran it in parallel, and the next iteration will add more flexibility so that all the tests actually run in parallel, but that is for the next follow-up PR. I did run all 3 test suites in parallel with some hacks, but it was not worth the effort.

"flagd.e2e.pool.size", Math.min(Runtime.getRuntime().availableProcessors(), 4));

private static final BlockingQueue<ContainerEntry> pool = new LinkedBlockingQueue<>();
private static final List<ContainerEntry> all = new ArrayList<>();
Contributor

I think this needs to be a concurrent data structure too, so that we guarantee that all changes to the list are also visible to another thread calling shutdown
