forked from opensearch-project/OpenSearch
Committed-code failures detected on 2026-03-25
The following tests failed in gradle-check builds that ran against committed code (Timer runs on main or Post Merge Actions) within the past 24 hours. Historical failure data across all build types (including PR builds) is included to assess flake rates.
Failing Tests
1. MixedClusterClientYamlTestSuiteIT — 310_match_bool_prefix/multi_match multiple fields complete term
- Recent build: #73223
- First failure: 2024-03-25
- Total unique builds affected: 144
- Pattern: Chronic flaky test active for 2 years. Major spike in Sep 2024 (54 builds), then settled to a steady 1–5 builds/month through 2026. Still consistently failing every month. Stable (persistent low-rate flake).
2. MixedClusterClientYamlTestSuiteIT — 310_match_bool_prefix/multi_match multiple fields partial term
- Recent build: #73223
- First failure: 2024-03-25
- Total unique builds affected: 137
- Pattern: Nearly identical to the "complete term" variant above. Same Sep 2024 spike (55 builds), same persistent low-rate tail. Stable (persistent low-rate flake).
3. MixedClusterClientYamlTestSuiteIT — 110_strict_allow_templates (Index documents with setting dynamic parameter)
- Recent build: #73215
- First failure: 2024-06-26 (MixedCluster variant); 2024-08-06 (ClientYamlTestSuiteIT variant)
- Total unique builds affected: 48 (MixedCluster) + 58 (ClientYamlTestSuiteIT) = ~106 across variants
- Pattern: Sporadic flake. Had a large spike in Sep 2024 (39 builds for MixedCluster variant), then mostly quiet with occasional 1–4 builds/month. Jan 2026 saw a spike of 13 builds in the ClientYamlTestSuiteIT variant. Worsening slightly — Jan 2026 spike suggests renewed instability.
4. AwarenessAllocationIT — testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness
- Recent build: #73244
- First failure: 2024-08-31
- Total unique builds affected: 131
- Pattern: Dormant until Apr 2025, then became a high-frequency flake: 9→5→3→9→18→10→16→12→9→14→16→8 builds/month from Apr 2025 through Mar 2026. Worsening — escalated significantly since mid-2025 and remains at high levels.
5. RemoteSegmentMetadataHandlerTests — testWriteContent
- Recent build: #73253
- First failure: 2024-04-17
- Total unique builds affected: 8
- Pattern: Very rare flake — only 8 builds in nearly 2 years. Scattered across months with long quiet periods. Stable (rare, low-impact flake).
6. RemoteSegmentMetadataHandlerTests — classMethod
- Recent build: #73253
- First failure: 2024-04-17
- Total unique builds affected: 21
- Pattern: Low-frequency flake, 1–3 builds/month when it appears. Mar 2026 has 3 occurrences so far. Likely a suite-level setup/teardown issue that surfaces when testWriteContent or other tests in the class fail. Stable (low-rate, correlated with other test failures in the class).
7. AzureBlobStoreRepositoryTests — testWriteRead
- Recent build: #73222
- First failure: 2024-04-29
- Total unique builds affected: 75
- Pattern: Persistent flake for nearly 2 years. Rate has been increasing: 1–2 builds/month in early history, rising to 5–9 builds/month since Nov 2025. Worsening — clear upward trend in recent months.
8. NodeJoinLeftIT — testClusterStabilityWhenDisconnectDuringSlowNodeLeftTask
- Recent build: #73232
- First failure: 2025-06-09
- Total unique builds affected: 8
- Pattern: Rare flake, only 8 builds in ~10 months. Appears sporadically with 1–2 builds in scattered months. Stable (rare flake).
9. RemoteRestoreSnapshotIT — testClusterManagerFailoverDuringSnapshotCreation (writable_warm_index=true)
- Recent build: #73248
- First failure: 2025-06-02
- Total unique builds affected: 48
- Pattern: Consistent flake since introduction in Jun 2025. Running at 1–8 builds/month with no sign of improvement. Mar 2026 already at 8 builds. Worsening — Mar 2026 is on track to be the worst month.
10. RemoteRestoreSnapshotIT — classMethod
- Recent build: #73248
- First failure: 2024-08-30
- Total unique builds affected: 125
- Pattern: High-frequency flake. Escalated from 1–3 builds/month in late 2024 to 9–16 builds/month since mid-2025. Jan 2026 peaked at 16 builds. Worsening — significant escalation over the past year.
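The Stable/Worsening labels above are stated without a formula. One way to make such a label reproducible is to compare the mean of the later half of a monthly window with the earlier half. The sketch below applies that idea to the AwarenessAllocationIT counts quoted in item 4; the function name `classify_trend` and the 1.25 ratio cutoff are illustrative assumptions, not the rule this report actually used.

```python
from statistics import mean

# Monthly failing-build counts for AwarenessAllocationIT, Apr 2025 through
# Mar 2026, copied from the pattern description in item 4.
awareness_counts = [9, 5, 3, 9, 18, 10, 16, 12, 9, 14, 16, 8]


def classify_trend(monthly_counts, ratio_threshold=1.25):
    """Label a series by comparing its later-half mean with its earlier-half mean.

    ratio_threshold is an arbitrary illustrative cutoff (assumption),
    not a value taken from the report.
    """
    if len(monthly_counts) < 2:
        raise ValueError("need at least two months of data")
    half = len(monthly_counts) // 2
    earlier = mean(monthly_counts[:half])
    later = mean(monthly_counts[half:])
    if earlier == 0:
        # No failures early on: any later failure counts as worsening.
        return "Worsening" if later > 0 else "Stable"
    ratio = later / earlier
    if ratio >= ratio_threshold:
        return "Worsening"
    if ratio <= 1 / ratio_threshold:
        return "Improving"
    return "Stable"


label = classify_trend(awareness_counts)
window_total = sum(awareness_counts)
print(label, window_total)
```

With these counts the earlier half averages 9 builds/month and the later half 12.5, so the sketch agrees with the "Worsening" label. The window total of 129 also leaves only 2 of the 131 affected builds before Apr 2025, which matches the "dormant until Apr 2025" description.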
Summary Table
| # | Test | Recent Build | First Seen | Builds Affected | Trend |
|---|---|---|---|---|---|
| 1 | MixedClusterClientYamlTestSuiteIT — 310_match_bool_prefix complete term | #73223 | 2024-03 | 144 | Stable |
| 2 | MixedClusterClientYamlTestSuiteIT — 310_match_bool_prefix partial term | #73223 | 2024-03 | 137 | Stable |
| 3 | AwarenessAllocationIT — testThreeZone...LoadAwareness | #73244 | 2024-08 | 131 | Worsening |
| 4 | RemoteRestoreSnapshotIT — classMethod | #73248 | 2024-08 | 125 | Worsening |
| 5 | MixedClusterClientYamlTestSuiteIT — 110_strict_allow_templates | #73215 | 2024-06 | ~106 | Worsening slightly |
| 6 | AzureBlobStoreRepositoryTests — testWriteRead | #73222 | 2024-04 | 75 | Worsening |
| 7 | RemoteRestoreSnapshotIT — testClusterManagerFailover (warm=true) | #73248 | 2025-06 | 48 | Worsening |
| 8 | RemoteSegmentMetadataHandlerTests — classMethod | #73253 | 2024-04 | 21 | Stable |
| 9 | NodeJoinLeftIT — testClusterStability...SlowNodeLeftTask | #73232 | 2025-06 | 8 | Stable |
| 10 | RemoteSegmentMetadataHandlerTests — testWriteContent | #73253 | 2024-04 | 8 | Stable |
Key Observations
- 4 of 10 tests show clear upward trends in failure frequency: AwarenessAllocationIT, RemoteRestoreSnapshotIT (classMethod and testClusterManagerFailoverDuringSnapshotCreation), and AzureBlobStoreRepositoryTests. A fifth, 110_strict_allow_templates, is worsening slightly after its Jan 2026 spike.
- The MixedClusterClientYamlTestSuiteIT 310_match_bool_prefix tests are the longest-running flakes (first failure 2024-03-25, 2 years ago) but have stabilized at a low rate.
- RemoteRestoreSnapshotIT.classMethod is likely a suite-level issue that correlates with individual test failures in the class — fixing the underlying test flakes would likely resolve this.
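The suite-level hypothesis in the last observation can be checked by measuring how often a classMethod failure lands in a build where an individual test in the same class also failed. The sketch below is a minimal way to compute that overlap from two sets of build numbers. The helper name `co_occurrence_rate` is hypothetical, and every build number except #73248 is a made-up placeholder; real values would have to come from the metrics cluster.

```python
def co_occurrence_rate(suite_builds, test_builds):
    """Fraction of suite-level (classMethod) failing builds in which
    the individual test also failed."""
    suite_builds = set(suite_builds)
    if not suite_builds:
        return 0.0
    return len(suite_builds & set(test_builds)) / len(suite_builds)


# PLACEHOLDER DATA (assumption): only 73248 appears in this report;
# 1001-1004 are invented purely to exercise the function.
class_method_builds = {73248, 1001, 1002, 1003}
failover_builds = {73248, 1001, 1004}

rate = co_occurrence_rate(class_method_builds, failover_builds)
print(f"{rate:.0%} of classMethod failures coincide with the individual test")
```

A rate near 100% on real data would support treating classMethod as a side effect of the individual failures. A low rate would point to an independent setup/teardown problem that needs its own fix.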
Data sourced from the OpenSearch metrics cluster on 2026-03-25.