Contributor

@breskeby breskeby commented Dec 8, 2025

This addresses issues with detecting Docker availability and skipping tests on Windows after updating Testcontainers to > 2.x. It is fallout from our improvements to rerunning periodic builds, where I ran into this repeatedly.

@breskeby breskeby added >non-issue :Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team auto-backport Automatically create backport pull requests when merged v9.3.0 v9.1.9 v8.19.9 v9.2.3 labels Dec 8, 2025
@breskeby breskeby self-assigned this Dec 8, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@breskeby breskeby requested a review from a team December 8, 2025 14:54
@breskeby breskeby added the test-windows Trigger CI checks on Windows label Dec 8, 2025
Contributor

@jozala jozala left a comment

I see the DockerAvailability part is mostly code moved from DockerEnvironmentAwareTestContainer, but I have some doubts about how this is expected to work.
I'd like to clarify the CI comment, but I would also like to understand the general aim.

  1. Do we want to ignore the problematic Docker test runs in CI or outside of it?
  2. I understand the EXCLUDED_OS handling makes sense in CI, because the OSes from the dockerOnLinuxExclusions file cannot run Docker for whatever reason. However, I think we should fail in CI if we are on a supported OS but Docker is unavailable (DOCKER_PROBING_SUCCESSFUL == false). With the current implementation, we hide infrastructure configuration issues and silently ignore the tests whenever Docker is not available.


static void assumeDockerIsAvailable() {
org.junit.Assume.assumeFalse("The current OS is excluded from Docker-based tests", EXCLUDED_OS);
org.junit.Assume.assumeTrue("The current OS is excluded from Docker-based tests", DOCKER_PROBING_SUCCESSFUL);
Contributor

NIT: Should this have a different message so it is easy to distinguish between excluded OS and Docker unavailable?

Contributor Author

fixed
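
To illustrate the nit in this thread, here is a minimal sketch of how the two checks could report distinct reasons. It deliberately avoids JUnit so it stays self-contained; the class name, the `skipReason` helper, and the wording of the second message are assumptions for illustration only and are not taken from the PR.

```java
import java.util.Optional;

// Hypothetical helper, not part of the PR: maps the two conditions to distinct messages.
final class SkipReasonSketch {
    // Message text as quoted in the reviewed snippet.
    static final String OS_EXCLUDED_MESSAGE = "The current OS is excluded from Docker-based tests";
    // Assumed wording; the actual message chosen in the fix is not shown in this conversation.
    static final String DOCKER_UNAVAILABLE_MESSAGE = "Docker is not available in this environment";

    // Returns the reason a test would be skipped, or empty if it may run.
    static Optional<String> skipReason(boolean excludedOs, boolean dockerProbingSuccessful) {
        if (excludedOs) {
            return Optional.of(OS_EXCLUDED_MESSAGE);
        }
        if (dockerProbingSuccessful == false) {
            return Optional.of(DOCKER_UNAVAILABLE_MESSAGE);
        }
        return Optional.empty();
    }
}
```

With separate messages, a skipped test in a report shows at a glance whether the OS was excluded or the Docker probe failed.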

Comment on lines +61 to +66
if (System.getProperty("os.name").toLowerCase().startsWith("windows")) {
return true;
}
Contributor

Why do we want to completely disable Docker-based tests on Windows outside of CI?
Is it really that bad with the Testcontainers support on Windows?

Comment on lines 57 to 60
if (CI) {
// we dont exclude OS outside of CI environment
return false;
}
Contributor

I may be getting lost in the number of negations here, but isn't this the opposite of what the comment says?

From the code, I understand that if CI == true then we do not exclude. That means we never exclude the OS when we are in CI and only exclude when running locally.
For a CI run, isExcludedOs() is always false. This value is used in assumeFalse, so in CI we get assumeFalse(false), and the test will always pass this assume invocation.

Contributor Author

Totally valid concern. I think it was actually already wrong originally. Fixed this.
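
The mismatch discussed in this thread can be made concrete with a small sketch. It assumes the rest of `isExcludedOs` can be reduced to a single boolean (`osOnExclusionList`), which is a simplification; the class and method names are hypothetical and only contrast the reviewed early return with what the code comment describes.

```java
// Hypothetical reduction of the reviewed logic; not the actual Elasticsearch code.
final class ExclusionLogicSketch {

    // Mirrors the snippet under review: the early return fires when ci is true,
    // so exclusions can only ever apply outside of CI.
    static boolean isExcludedOsAsReviewed(boolean ci, boolean osOnExclusionList) {
        if (ci) {
            return false;
        }
        return osOnExclusionList;
    }

    // Matches the code comment ("we don't exclude OS outside of CI environment"):
    // the early return fires when ci is false, so exclusions apply only in CI.
    static boolean isExcludedOsAsCommented(boolean ci, boolean osOnExclusionList) {
        if (ci == false) {
            return false;
        }
        return osOnExclusionList;
    }
}
```

In CI on an excluded OS, the first variant returns false, so an `assumeFalse` on that value never skips the test; the second variant returns true and the test is skipped as the comment intends.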

Contributor Author

breskeby commented Dec 9, 2025

I see the DockerAvailability part is mostly code moved from DockerEnvironmentAwareTestContainer, but I have some doubts about how this is expected to work. I'd like to clarify the CI comment, but I would also like to understand the general aim.

  1. Do we want to ignore the problematic Docker test runs in CI or outside of it?
  2. I understand the EXCLUDED_OS handling makes sense in CI, because the OSes from the dockerOnLinuxExclusions file cannot run Docker for whatever reason. However, I think we should fail in CI if we are on a supported OS but Docker is unavailable (DOCKER_PROBING_SUCCESSFUL == false). With the current implementation, we hide infrastructure configuration issues and silently ignore the tests whenever Docker is not available.

You are totally correct. I just moved things around and refactored, but the CI handling wasn't clear. So far we have indeed just ignored those tests when Docker is not available. That left us with a hole in our coverage, and detecting that Docker is broken on CI is hard. I think your suggestion is right and we should fail in environments that we expect to work. We can then be explicit about which OSes to exclude. Overall, I think we just don't test Docker on Windows either, as our images don't have Docker support.

    - Invert logic in isExcludedOs to correctly apply exclusions only in CI.
    - Fail tests in CI if Docker is missing on supported OS instead of skipping.
    - Update assumption messages for clarity.
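
The behaviour described in the bullets above can be summarised as a small decision function. This is a sketch under stated assumptions, not the merged code: the `Outcome` enum and `decide` method are hypothetical, and skipping outside CI when Docker is unavailable is an assumption based on the earlier discussion rather than something the commit message states.

```java
// Hypothetical decision table for Docker-based tests; names are illustrative only.
final class DockerGateSketch {
    enum Outcome { RUN, SKIP, FAIL }

    static Outcome decide(boolean ci, boolean osExcluded, boolean dockerAvailable) {
        if (ci) {
            // Exclusions apply only in CI.
            if (osExcluded) {
                return Outcome.SKIP;
            }
            // Supported OS in CI: missing Docker is an infrastructure problem, so fail loudly.
            return dockerAvailable ? Outcome.RUN : Outcome.FAIL;
        }
        // Assumption: outside CI, a missing Docker daemon just skips the test.
        return dockerAvailable ? Outcome.RUN : Outcome.SKIP;
    }
}
```

The key design point is the `FAIL` branch: a supported CI host without Docker surfaces as a red build instead of a silently shrinking test suite.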
@breskeby breskeby force-pushed the fix/docker-windows-availability branch from f26aea7 to 2d3604e Compare December 9, 2025 12:43
@breskeby breskeby requested a review from jozala December 9, 2025 12:43
Contributor

@jozala jozala left a comment

Looks good

@breskeby breskeby merged commit 410e573 into elastic:main Dec 10, 2025
37 of 40 checks passed
breskeby added a commit to breskeby/elasticsearch that referenced this pull request Dec 10, 2025
* Fix docker availability handling across platforms
    - Invert logic in isExcludedOs to correctly apply exclusions only in CI.
    - Fail tests in CI if Docker is missing on supported OS instead of skipping.
    - Update assumption messages for clarity.
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 9.1, 8.19, 9.2

elasticsearchmachine pushed a commit that referenced this pull request Dec 10, 2025
* Fix docker availability handling across platforms
    - Invert logic in isExcludedOs to correctly apply exclusions only in CI.
    - Fail tests in CI if Docker is missing on supported OS instead of skipping.
    - Update assumption messages for clarity.
breskeby added a commit that referenced this pull request Dec 19, 2025
* Fix docker availability handling across platforms
    - Invert logic in isExcludedOs to correctly apply exclusions only in CI.
    - Fail tests in CI if Docker is missing on supported OS instead of skipping.
    - Update assumption messages for clarity.
Labels

auto-backport Automatically create backport pull requests when merged :Delivery/Build Build or test infrastructure >non-issue Team:Delivery Meta label for Delivery team test-windows Trigger CI checks on Windows v8.19.9 v9.1.9 v9.2.3 v9.3.0

3 participants