test(cloudflare): Unflake integration test#20208
Conversation
Semver Impact of This PR: 🟢 Patch (bug fixes)
// This is needed because wrangler dev may not guarantee waitUntil completion
// the same way production Cloudflare does. Without this delay, the last
// envelope's HTTP request may not complete before the test moves on.
const delay = () => new Promise(resolve => setTimeout(resolve, 50));
m: can we solve this differently? Specifically, is there some event we could await before moving on to the next request instead of adding a timeout? This might already help, but I am worried that it will not fully resolve the flakiness.
Right now there is no way, as the runner doesn't provide one. Wrangler, as of now, seems to drop waitUntil runs entirely when another request comes in. For that to work we would have to change how the runner works: instead of registering all expects at once and then calling the worker, we would call the worker and wait for the expect right after. But that would be a bigger piece of work, AFAICT.
ok, so to understand correctly: We delay here by 50ms so that a kicked off waitUntil task finishes before we start a new request? And we do this due to a local wrangler limitation (?)
Taking a step back: why is this test doing 5 request repetitions? I see we always assert on the same payload, without cross-envelope checks, so what do we gain from it? (Not saying we shouldn't, just that it's not clear.)
Given I understand correctly, I'd say the delay is fine (for the lack of better options). But can we make sure this is enough for CI? 50ms seems a bit short but then again, I'm not sure if it's necessary to wait longer. Maybe just deferring to the next tick is already enough?
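Such an event-based mechanism could be sketched as a small counter with waiters. This is a hedged sketch only, with all names hypothetical rather than the actual runner API: tests await a target envelope count instead of sleeping for a fixed time.

```typescript
// Hypothetical sketch: an envelope counter that lets tests await a target
// count instead of sleeping. Names (EnvelopeCounter, waitFor) are invented
// here for illustration, not taken from runner.ts.
class EnvelopeCounter {
  private count = 0;
  private waiters: Array<{ target: number; resolve: () => void }> = [];

  // Called by the runner each time an envelope arrives.
  increment(): void {
    this.count++;
    // Resolve every waiter whose target count has now been reached.
    this.waiters = this.waiters.filter(w => {
      if (this.count >= w.target) {
        w.resolve();
        return false;
      }
      return true;
    });
  }

  // Resolves once at least `target` envelopes have been counted; resolves
  // immediately if the target was already reached.
  waitFor(target: number): Promise<void> {
    if (this.count >= target) return Promise.resolve();
    return new Promise(resolve => this.waiters.push({ target, resolve }));
  }
}
```

A test could then `await counter.waitFor(1)` after the first request, resolving as soon as the envelope has actually been observed, however long wrangler dev takes.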
any chance this resolves #20209?
Potentially. I also have a bit more code locally where we wait entirely between tests. Regardless, I'll take a closer look at the other flake.
Force-pushed from 7aed56c to f77c6b1.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Fixed-delay sleep may still cause test flakiness
- Replaced the 50ms setTimeout delay with an event-based waitForEnvelopes() method that waits for actual envelope delivery, eliminating the race condition and test flakiness.
Or push these changes by commenting:
@cursor push 8dc8b96c93
Preview (8dc8b96c93)
diff --git a/dev-packages/cloudflare-integration-tests/runner.ts b/dev-packages/cloudflare-integration-tests/runner.ts
--- a/dev-packages/cloudflare-integration-tests/runner.ts
+++ b/dev-packages/cloudflare-integration-tests/runner.ts
@@ -50,6 +50,7 @@
path: string,
options?: { headers?: Record<string, string>; data?: BodyInit; expectError?: boolean },
): Promise<T | undefined>;
+ waitForEnvelopes(count: number): Promise<void>;
};
/** Creates a test runner */
@@ -112,12 +113,23 @@
let child: ReturnType<typeof spawn> | undefined;
let childSubWorker: ReturnType<typeof spawn> | undefined;
+ // Track promises waiting for specific envelope counts
+ const envelopeWaiters: Array<{ count: number; resolve: () => void }> = [];
+
/** Called after each expect callback to check if we're complete */
function expectCallbackCalled(): void {
envelopeCount++;
if (envelopeCount === expectedEnvelopeCount) {
resolve();
}
+
+ // Resolve any waiters that are waiting for this envelope count
+ for (let i = envelopeWaiters.length - 1; i >= 0; i--) {
+ if (envelopeCount >= envelopeWaiters[i].count) {
+ envelopeWaiters[i].resolve();
+ envelopeWaiters.splice(i, 1);
+ }
+ }
}
function assertEnvelopeMatches(expected: Expected, envelope: Envelope): void {
@@ -308,6 +320,15 @@
return;
}
},
+ waitForEnvelopes: async function (count: number): Promise<void> {
+ if (envelopeCount >= count) {
+ return Promise.resolve();
+ }
+
+ return new Promise<void>(resolveWaiter => {
+ envelopeWaiters.push({ count, resolve: resolveWaiter });
+ });
+ },
};
},
};
diff --git a/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts b/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
--- a/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
+++ b/dev-packages/cloudflare-integration-tests/suites/tracing/durableobject-spans/test.ts
@@ -45,20 +45,14 @@
// Expect 5 transaction envelopes — one per call.
const runner = createRunner(__dirname).expectN(5, assertDoWorkEnvelope).start(signal);
- // Small delay between requests to allow waitUntil to process in wrangler dev.
- // This is needed because wrangler dev may not guarantee waitUntil completion
- // the same way production Cloudflare does. Without this delay, the last
- // envelope's HTTP request may not complete before the test moves on.
- const delay = () => new Promise(resolve => setTimeout(resolve, 50));
-
await runner.makeRequest('get', '/');
- await delay();
+ await runner.waitForEnvelopes(1);
await runner.makeRequest('get', '/');
- await delay();
+ await runner.waitForEnvelopes(2);
await runner.makeRequest('get', '/');
- await delay();
+ await runner.waitForEnvelopes(3);
await runner.makeRequest('get', '/');
- await delay();
+ await runner.waitForEnvelopes(4);
await runner.makeRequest('get', '/');
await runner.completed();
});
Reviewed by Cursor Bugbot for commit f77c6b1.
await runner.makeRequest('get', '/');
await delay();
await runner.makeRequest('get', '/');
await delay();
Fixed-delay sleep may still cause test flakiness
Low Severity
The test introduces a hardcoded 50ms setTimeout delay between requests as a workaround for wrangler dev's waitUntil behavior. Per the review rules, timeouts or sleeps in tests are flagged as likely to introduce flakes — concrete events or signals to wait on are preferred. A 50ms delay may be insufficient under CI load, potentially causing the same flakiness this PR aims to fix. The PR discussion already acknowledges this limitation, noting that the runner doesn't currently provide an event-based mechanism.
Triggered by project rule: PR Review Guidelines for Cursor Bot
size-limit report 📦
Lms24 left a comment: Approving to unblock, but please take a look at my comments.
  },
},
sequence: {
  shuffle: true,
> With the shuffle flag it was consistently failing, this is why this is added in this PR as well

Maybe I misunderstand, but shouldn't shuffle be set to false then?
expect.objectContaining({ description: 'task-1', op: 'task' }),
expect.objectContaining({ description: 'task-2', op: 'task' }),
expect.objectContaining({ description: 'task-3', op: 'task' }),
expect.objectContaining({ description: 'task-4', op: 'task' }),
expect.objectContaining({ description: 'task-5', op: 'task' }),
Out of scope (so no need to do it in this PR), but I just saw this: is `task` a valid `op`? I didn't find it in our list of span operations. Not sure if this was discussed and agreed upon, but if yes, let's update the span operations doc in develop.
The test is timing out intermittently in CI, causing spurious failures. This will be fixed as part of #20208 Co-authored-by: Claude Opus 4 <noreply@anthropic.com>



There was one flaky test, which got me a little deeper into the `runner.ts` logic. This test was only passing when it was running / finishing first. With the `shuffle` flag it was consistently failing, which is why this is added in this PR as well.

Furthermore, a random port will be created for each runner by setting `--port 0`. This just makes sure that when running `wrangler dev` in another tab while the tests run, the local development server keeps the default `:8787` port.
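As a side note on `--port 0`: with a randomized port, the runner has to discover which port wrangler actually bound. A minimal sketch of extracting it from the process output follows; the exact log format is an assumption (wrangler dev typically prints a "Ready on http://localhost:<port>" line), and `parsePort` is a hypothetical helper, not part of the runner.

```typescript
// Hypothetical sketch: extract the bound port from a wrangler dev log line.
// Assumes a "Ready on http://<host>:<port>" startup message.
const READY_RE = /Ready on https?:\/\/[^:\s]+:(\d+)/;

function parsePort(logLine: string): number | undefined {
  const match = READY_RE.exec(logLine);
  return match ? Number(match[1]) : undefined;
}
```

The runner could scan the child process's stdout with this and only start issuing requests once a port has been parsed.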