test(cloudflare): Unflake integration test#20208
Conversation
Semver Impact of This PR: 🟢 Patch (bug fixes)
// This is needed because wrangler dev may not guarantee waitUntil completion
// the same way production Cloudflare does. Without this delay, the last
// envelope's HTTP request may not complete before the test moves on.
const delay = () => new Promise(resolve => setTimeout(resolve, 50));
m: can we solve this differently? Specifically, is there some event we could await before moving on to the next request instead of adding a timeout? This might already help, but I am worried that it will not fully resolve the flakiness.
Right now there is no way, as the runner doesn't provide one. Wrangler, as of now, seems to drop waitUntil runs entirely when another request comes in. For that to work we would have to change how the runner works: instead of registering all expects at once and then calling the worker, we would call the worker and wait for the expect right after. But that would be a bigger piece of work, AFAICT.
ok, so to understand correctly: We delay here by 50ms so that a kicked off waitUntil task finishes before we start a new request? And we do this due to a local wrangler limitation (?)
Taking a step back: why is this test doing 5 request repetitions? I see we always assert on the same payload, without cross-envelope checks, so what do we gain from it? (Not saying we shouldn't, just that it's not clear.)
Given I understand correctly, I'd say the delay is fine (for the lack of better options). But can we make sure this is enough for CI? 50ms seems a bit short but then again, I'm not sure if it's necessary to wait longer. Maybe just deferring to the next tick is already enough?
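Such an event-based mechanism could be sketched as a small counter with waiters. This is a hedged sketch only, with all names hypothetical rather than the actual runner API: tests await a target envelope count instead of sleeping for a fixed time.

```typescript
// Hypothetical sketch: an envelope counter that lets tests await a target
// count instead of sleeping. Names (EnvelopeCounter, waitFor) are invented
// here for illustration, not taken from runner.ts.
class EnvelopeCounter {
  private count = 0;
  private waiters: Array<{ target: number; resolve: () => void }> = [];

  // Called by the runner each time an envelope arrives.
  increment(): void {
    this.count++;
    // Resolve every waiter whose target count has now been reached.
    this.waiters = this.waiters.filter(w => {
      if (this.count >= w.target) {
        w.resolve();
        return false;
      }
      return true;
    });
  }

  // Resolves once at least `target` envelopes have been counted; resolves
  // immediately if the target was already reached.
  waitFor(target: number): Promise<void> {
    if (this.count >= target) return Promise.resolve();
    return new Promise(resolve => this.waiters.push({ target, resolve }));
  }
}
```

A test could then `await counter.waitFor(1)` after the first request, resolving as soon as the envelope has actually been observed, however long wrangler dev takes.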
any chance this resolves #20209?
Potentially. I also have a bit more code locally where we wait entirely between tests. Regardless, I'll take a closer look at the other flake.
Force-pushed from 7aed56c to f77c6b1.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Fixed-delay sleep may still cause test flakiness
- Replaced the 50ms setTimeout delay with an event-based waitForEnvelopes() method that waits for actual envelope delivery, eliminating the race condition and test flakiness.
Or push these changes by commenting:
@cursor push 8dc8b96c93
Preview (8dc8b96c93)
diff --git a/dev-packages/cloudflare-integration-tests/runner.ts b/dev-packages/cloudflare-integration-tests/runner.ts
--- a/dev-packages/cloudflare-integration-tests/runner.ts
+++ b/dev-packages/cloudflare-integration-tests/runner.ts
@@ -50,6 +50,7 @@
path: string,
options?: { headers?: Record<string, string>; data?: BodyInit; expectError?: boolean },
): Promise<T | undefined>;
+ waitForEnvelopes(count: number): Promise<void>;
};
/** Creates a test runner */
@@ -112,12 +113,23 @@
let child: ReturnType<typeof spawn> | undefined;
let childSubWorker: ReturnType<typeof spawn> | undefined;
+ // Track promises waiting for specific envelope counts
+ const envelopeWaiters: Array<{ count: number; resolve: () => void }> = [];
+
/** Called after each expect callback to check if we're complete */
function expectCallbackCalled(): void {
envelopeCount++;
if (envelopeCount === expectedEnvelopeCount) {
resolve();
}
+
+ // Resolve any waiters that are waiting for this envelope count
+ for (let i = envelopeWaiters.length - 1; i >= 0; i--) {
+ if (envelopeCount >= envelopeWaiters[i].count) {
+ envelopeWaiters[i].resolve();
+ envelopeWaiters.splice(i, 1);
+ }
+ }
}
function assertEnvelopeMatches(expected: Expected, envelope: Envelope): void {
@@ -308,6 +320,15 @@
return;
}
},
+ waitForEnvelopes: async function (count: number): Promise<void> {
+ if (envelopeCount >= count) {
+ return Promise.resolve();
+ }
+
+ return new Promise<void>(resolveWaiter => {
+ envelopeWaiters.push({ count, resolve: resolveWaiter });
+ });
+ },
};
},
};
diff --git a/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts b/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
--- a/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
+++ b/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
@@ -45,20 +45,14 @@
// Expect 5 transaction envelopes — one per call.
const runner = createRunner(__dirname).expectN(5, assertDoWorkEnvelope).start(signal);
- // Small delay between requests to allow waitUntil to process in wrangler dev.
- // This is needed because wrangler dev may not guarantee waitUntil completion
- // the same way production Cloudflare does. Without this delay, the last
- // envelope's HTTP request may not complete before the test moves on.
- const delay = () => new Promise(resolve => setTimeout(resolve, 50));
-
await runner.makeRequest('get', '/');
- await delay();
+ await runner.waitForEnvelopes(1);
await runner.makeRequest('get', '/');
- await delay();
+ await runner.waitForEnvelopes(2);
await runner.makeRequest('get', '/');
- await delay();
+ await runner.waitForEnvelopes(3);
await runner.makeRequest('get', '/');
- await delay();
+ await runner.waitForEnvelopes(4);
await runner.makeRequest('get', '/');
await runner.completed();
});
Reviewed by Cursor Bugbot for commit f77c6b1.
await runner.makeRequest('get', '/');
await delay();
await runner.makeRequest('get', '/');
await delay();
Fixed-delay sleep may still cause test flakiness
Low Severity
The test introduces a hardcoded 50ms setTimeout delay between requests as a workaround for wrangler dev's waitUntil behavior. Per the review rules, timeouts or sleeps in tests are flagged as likely to introduce flakes — concrete events or signals to wait on are preferred. A 50ms delay may be insufficient under CI load, potentially causing the same flakiness this PR aims to fix. The PR discussion already acknowledges this limitation, noting that the runner doesn't currently provide an event-based mechanism.
Triggered by project rule: PR Review Guidelines for Cursor Bot
size-limit report 📦
Lms24 left a comment: Approving to unblock, but please take a look at my comments.
  },
},
sequence: {
  shuffle: true,
> With the shuffle flag it was consistently failing, this is why this is added in this PR as well

Maybe I misunderstand, but shouldn't shuffle be set to false then?
expect.objectContaining({ description: 'task-1', op: 'task' }),
expect.objectContaining({ description: 'task-2', op: 'task' }),
expect.objectContaining({ description: 'task-3', op: 'task' }),
expect.objectContaining({ description: 'task-4', op: 'task' }),
expect.objectContaining({ description: 'task-5', op: 'task' }),
Out of scope (so no need to do it in this PR), but I just saw this: is `task` a valid `op`? I didn't find it in our list of span operations. Not sure if this was discussed and agreed upon, but if yes, let's update the span operations doc in develop.
The test is timing out intermittently in CI, causing spurious failures. This will be fixed as part of #20208 Co-authored-by: Claude Opus 4 <noreply@anthropic.com>



There was one flaky test, which got me a little deeper into the `runner.ts` logic. This test was only passing when it was running / finishing first. With the `shuffle` flag it was consistently failing, which is why this is added in this PR as well.

Furthermore, a random port will be created for each runner by setting `--port 0`. This just makes sure that when running `wrangler dev` in another tab while the tests run, the local development server keeps the default `:8787` port.
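As a side note on `--port 0`: with a randomized port, the runner has to discover which port wrangler actually bound. A minimal sketch of extracting it from the process output follows; the exact log format is an assumption (wrangler dev typically prints a "Ready on http://localhost:<port>" line), and `parsePort` is a hypothetical helper, not part of the runner.

```typescript
// Hypothetical sketch: extract the bound port from a wrangler dev log line.
// Assumes a "Ready on http://<host>:<port>" startup message.
const READY_RE = /Ready on https?:\/\/[^:\s]+:(\d+)/;

function parsePort(logLine: string): number | undefined {
  const match = READY_RE.exec(logLine);
  return match ? Number(match[1]) : undefined;
}
```

The runner could scan the child process's stdout with this and only start issuing requests once a port has been parsed.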