[Bug]: retain-on-failure produces truncated trace.zip files missing End of Central Directory record #41172

@ashblox

Description

Version

1.59.1

Steps to reproduce

  1. Configure trace: "retain-on-failure" with retries: 1 and a webServer in a Playwright config:
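import type { PlaywrightTestConfig } from "@playwright/test";

// Assumption: `ci` is derived from the CI environment variable;
// its definition was not included in the original snippet.
const ci = !!process.env.CI;

const config: PlaywrightTestConfig = {
  fullyParallel: true,
  retries: ci ? 1 : 0, // one retry in CI only
  use: {
    trace: "retain-on-failure", // keep trace.zip only for failed tests
    serviceWorkers: "block",
    navigationTimeout: 30000,
    actionTimeout: 10000,
  },
  webServer: {
    command: "pnpm start:mocked-auth",
    port: 3102,
    reuseExistingServer: !ci,
    timeout: 120000,
    stdout: "pipe",
  },
};

export default config;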

No custom context.tracing.start() / .stop() calls — trace collection is entirely config-driven.

  2. Run a test that fails in CI on a resource-constrained or contended runner (in our case, an ARM64 GitHub Actions runner in a shared job alongside build/lint/unit-test tasks running via Nx with parallel: 3).
  3. Inspect the resulting trace.zip files in test-results/.

We cannot reproduce the truncation locally (Windows, x64, headed or headless). It only occurs in CI.

Expected behavior

trace.zip should be a valid, complete ZIP archive that can be opened in the Trace Viewer.

Actual behavior

trace.zip starts with valid PK local file headers but is truncated — the End of Central Directory (EOCD) record is missing. The file cannot be opened by any tool:

$ npx playwright trace open trace.zip
Error: End of central directory record signature not found. Either not a zip file, or file is truncated.

Both the initial run and retry produce independently truncated files of nearly identical size:

| File | Size | Valid PK header | EOCD record |
|---|---|---|---|
| trace.zip (run 1) | 75,121 bytes | Yes (50 4B 03 04) | Missing |
| trace.zip (retry 1) | 75,222 bytes | Yes (50 4B 03 04) | Missing |
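
The two checks in the table above can be reproduced with a few lines of Node. This is a minimal sketch, assuming the archive fits in memory and is not ZIP64. The function names are illustrative and not part of Playwright or any ZIP library.

```typescript
import { readFileSync } from "node:fs";

const LOCAL_FILE_HEADER_SIG = 0x04034b50; // bytes 50 4B 03 04 on disk
const EOCD_SIG = 0x06054b50;              // bytes 50 4B 05 06 on disk
const EOCD_MIN_SIZE = 22;                 // EOCD record without a comment
const MAX_COMMENT_SIZE = 0xffff;          // comment length is a 16-bit field

// True when the file begins with a local file header signature.
function startsWithLocalHeader(buf: Buffer): boolean {
  return buf.length >= 4 && buf.readUInt32LE(0) === LOCAL_FILE_HEADER_SIG;
}

// Scans backwards for an EOCD record whose comment length is consistent
// with the end of the file. Returns its offset, or -1 when none is found.
function findEocdOffset(buf: Buffer): number {
  const lowest = Math.max(0, buf.length - EOCD_MIN_SIZE - MAX_COMMENT_SIZE);
  for (let i = buf.length - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (buf.readUInt32LE(i) !== EOCD_SIG) continue;
    const commentLength = buf.readUInt16LE(i + 20);
    if (commentLength === buf.length - i - EOCD_MIN_SIZE) return i;
  }
  return -1;
}

// Convenience wrapper for a file on disk.
function inspectZip(path: string): { size: number; validLocalHeader: boolean; eocdOffset: number } {
  const buf = readFileSync(path);
  return {
    size: buf.length,
    validLocalHeader: startsWithLocalHeader(buf),
    eocdOffset: findEocdOffset(buf),
  };
}
```

Calling `inspectZip` on a truncated trace of the kind described here would be expected to report `validLocalHeader: true` and `eocdOffset: -1`.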

The consistent ~75KB truncation point across both independent attempts suggests a systematic teardown cutoff rather than a random race. By manually walking the local file headers and inflating with zlib, we confirmed the trace data is partially present — action logs, route fulfillments, and fixture teardown events were all recoverable. The ZIP was simply never finalized.
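
The manual recovery described above can be sketched roughly as follows. This is an illustration and not the exact script we used: it assumes non-ZIP64 entries, only the stored (0) and deflate (8) methods, and that entries written with a data descriptor carry the optional `50 4B 07 08` descriptor signature. `recoverEntries` and `findDescriptor` are hypothetical names.

```typescript
import { inflateRawSync } from "node:zlib";

const LFH_SIG = 0x04034b50;
const LFH_FIXED_SIZE = 30;
const DESCRIPTOR_SIG = Buffer.from([0x50, 0x4b, 0x07, 0x08]);
const DESCRIPTOR_SIZE = 16; // signature, crc32, compressed size, uncompressed size

interface RecoveredEntry {
  name: string;
  data: Buffer | null; // null when the entry is cut off or cannot be decoded
}

// Finds a data descriptor whose compressed-size field matches its distance from dataStart.
function findDescriptor(zip: Buffer, dataStart: number): number {
  let at = zip.indexOf(DESCRIPTOR_SIG, dataStart);
  while (at >= 0 && at + DESCRIPTOR_SIZE <= zip.length) {
    if (zip.readUInt32LE(at + 8) === at - dataStart) return at;
    at = zip.indexOf(DESCRIPTOR_SIG, at + 1);
  }
  return -1;
}

function decode(raw: Buffer, method: number): Buffer | null {
  if (method === 0) return Buffer.from(raw);
  if (method !== 8) return null;
  try {
    return inflateRawSync(raw);
  } catch {
    return null; // corrupt or incomplete deflate stream
  }
}

// Walks local file headers from the start of the file, ignoring the missing central directory.
function recoverEntries(zip: Buffer): RecoveredEntry[] {
  const entries: RecoveredEntry[] = [];
  let offset = 0;
  while (offset + LFH_FIXED_SIZE <= zip.length && zip.readUInt32LE(offset) === LFH_SIG) {
    const flags = zip.readUInt16LE(offset + 6);
    const method = zip.readUInt16LE(offset + 8);
    let compressedSize = zip.readUInt32LE(offset + 18);
    const nameLength = zip.readUInt16LE(offset + 26);
    const extraLength = zip.readUInt16LE(offset + 28);
    const nameStart = offset + LFH_FIXED_SIZE;
    const dataStart = nameStart + nameLength + extraLength;
    if (dataStart > zip.length) break; // the header itself is truncated
    const name = zip.toString("utf8", nameStart, nameStart + nameLength);

    let next: number;
    if (flags & 0x08) {
      // Sizes live in a trailing data descriptor rather than in the header.
      const descriptorAt = findDescriptor(zip, dataStart);
      if (descriptorAt < 0) {
        entries.push({ name, data: null });
        break;
      }
      compressedSize = descriptorAt - dataStart;
      next = descriptorAt + DESCRIPTOR_SIZE;
    } else {
      next = dataStart + compressedSize;
    }

    if (dataStart + compressedSize > zip.length) {
      entries.push({ name, data: null }); // entry runs past the truncation point
      break;
    }
    entries.push({ name, data: decode(zip.subarray(dataStart, dataStart + compressedSize), method) });
    offset = next;
  }
  return entries;
}
```

Entries that come back with `data: null` mark where the archive was cut off; everything before them is intact trace data.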

This worked correctly on 1.58.x with the same config and CI setup.

Additional context

We run a second Playwright suite (E2E against a deployed environment) with the same trace: "retain-on-failure" config. That suite does not have retries enabled and its traces are always valid. The key differences in the failing (mocked) suite are retries, a webServer, and that it runs in a shared CI job with higher resource contention.

Suspected regression source

PR #39884 (commit a8ea6558, merged 2026-03-27) converted yazl/yauzl from static top-level imports to lazy await import() in the three codepaths that write trace.zip:

  1. localUtils.zip() — the primary path for config-driven retain-on-failure traces
  2. SerializedFS._performOperation('zip') in fileUtils.ts
  3. testTracing.stopIfNeeded() / mergeTraceFiles() — final trace.zip assembly

The teardown flow wraps stopIfNeeded() in _runWithTimeout(...).catch(() => {}) — errors and timeouts are silently swallowed. The lazy await import('../zipBundle') adds async latency at ZIP creation time that did not exist in 1.58 (where the module was already loaded at startup). Under resource contention in CI, this could widen the window for the teardown timeout to fire after yazl has started streaming local file entries but before zipFile.end() completes writing the central directory and EOCD record.
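
To make the suspected mechanism concrete, here is a small simulation. It is not Playwright's code: `runWithTimeout`, `writeArchive`, and `simulateTeardown` are hypothetical stand-ins, the archive is a plain object, and all durations are made up. It only shows how extra latency before ZIP creation, combined with a swallowed timeout, can leave entries written but the archive never finalized.

```typescript
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Hypothetical timeout wrapper: rejects if the task does not settle in time.
async function runWithTimeout<T>(task: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("teardown timed out")), timeoutMs);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

interface FakeArchive {
  entriesWritten: number;
  finalized: boolean; // stands in for the central directory and EOCD being written
}

// Simulated ZIP write: optional lazy-load delay, three entries, then finalization.
async function writeArchive(archive: FakeArchive, lazyImportMs: number): Promise<void> {
  await sleep(lazyImportMs); // stands in for the latency of a lazy `await import(...)`
  for (let i = 0; i < 3; i++) {
    await sleep(20);
    archive.entriesWritten++;
  }
  await sleep(20);
  archive.finalized = true;
}

// Returns the archive state at the moment teardown gives up or completes.
async function simulateTeardown(lazyImportMs: number, timeoutMs: number): Promise<FakeArchive> {
  const archive: FakeArchive = { entriesWritten: 0, finalized: false };
  // The timeout error is swallowed, mirroring the `.catch(() => {})` described above.
  await runWithTimeout(writeArchive(archive, lazyImportMs), timeoutMs).catch(() => {});
  return { ...archive }; // snapshot: what would be on disk if the worker exited now
}
```

With a 200 ms budget, `simulateTeardown(0, 200)` finishes with `finalized: true`, while `simulateTeardown(150, 200)` returns with `finalized: false` and no error surfaced to the caller. That matches the shape of the failure we see, though it does not prove this is the actual cause.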

A secondary area of change is the screencast/tracing refactor (PR #39512, PR #39520, PR #39937) which changed how screencast frames are captured during tracing. This could affect resource flush timing before ZIP assembly.

Environment

System:
  OS: Windows 11 10.0.26100
  CPU: (16) x64 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
  Memory: 4.51 GB / 31.80 GB
Binaries:
  Node: 22.22.0
  pnpm: 11.1.3
npmPackages:
  @playwright/test: 1.59.1 => 1.59.1

CI (where truncation occurs):
  OS: Ubuntu 24.04 (ARM64)
  Runner: GitHub Actions, 32 CPU / 128 GB
  Container: Playwright 1.59 image
