
Openspec/architecture alignment v2 #2

Merged
danylosoft-create merged 10 commits into main from
openspec/architecture-alignment-v2
Mar 31, 2026
Conversation


@danylosoft-create danylosoft-create commented Mar 23, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced @practica/test-eyes npm package—a Playwright reporter for automated test analytics integration.
    • Added automatic flaky test detection across multiple runs.
    • Implemented test statistics aggregation with pass/fail counts, average duration, and p95 percentile metrics.
    • Added dashboard deployment with test overview, flaky tests view, and slowest tests view.
    • Extended support for JUnit XML parsing alongside Playwright test collection.
  • Documentation

    • Added comprehensive setup and configuration guides for the new reporter package.

danylosoft-create and others added 10 commits March 20, 2026 13:33
- Create test-eyes-reporter package in collectors/playwright-reporter
- Implement Playwright Reporter interface for automatic data collection
- Add CLI for JUnit XML support (any test framework)
- Extend schema with flakyCount for tracking flaky tests
- Add deploy command for dashboard deployment
- Update frontend to display flaky test count column

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add structured error handling with detailed logging
- Add 19 unit tests for aggregate, junit-parser, file-operations
- Add vitest as dev dependency

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Break submitData() from 95 lines into 6 focused methods (~15 lines each)
- Add SubmitContext interface for cleaner data flow
- Extract mapStatus(), checkoutDataBranch(), saveData(), commitAndPush()
- Follow golden rule: functions under 20 lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
titlePath() returns empty strings for root suite/project, causing
test names like " >  > example.spec.ts > auth > ..."

Now correctly outputs: "example.spec.ts > auth > should login"

Bumped to v0.1.3 and published to npm.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously all runs on same day with same commit SHA would overwrite
each other. Now format is: date_HHMMSS_sha (e.g. 2026-03-20_144506_d68aea9)
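The date_HHMMSS_sha format above can be sketched as a small helper. This is illustrative only; the package's actual generateRunId implementation may differ.

```typescript
// Illustrative sketch of the run-id format described above (date_HHMMSS_sha);
// not the package's actual generateRunId.
function makeRunId(commitSha: string, now: Date = new Date()): string {
  const iso = now.toISOString()                     // e.g. 2026-03-20T14:45:06.000Z
  const date = iso.slice(0, 10)                     // 2026-03-20
  const time = iso.slice(11, 19).replace(/:/g, '')  // 144506
  return `${date}_${time}_${commitSha.slice(0, 7)}`
}
```

Because the time component changes every second, two runs with the same commit SHA on the same day no longer collide.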

Bumped to v0.1.4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

This pull request introduces the @practica/test-eyes Playwright reporter package that automates test analytics collection into a Git data branch. The implementation includes test result collection, flaky detection (tests passing after retry), JUnit XML parsing, multi-run aggregation with duration percentiles, GitHub Pages dashboard deployment, and corresponding frontend updates to display flaky test counts.
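The duration aggregation mentioned above (average plus p95 percentile) can be sketched as follows. The function names are illustrative, not the actual @practica/test-eyes API.

```typescript
// Illustrative average/percentile helpers matching the walkthrough's description;
// not the actual @practica/test-eyes implementation.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0
  const sorted = [...samples].sort((a, b) => a - b)
  // Nearest-rank method: the smallest value covering p% of samples.
  const rank = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)
  return sorted[rank]
}

function durationStats(durations: number[]) {
  const avg = durations.reduce((sum, d) => sum + d, 0) / durations.length
  return { avgDurationMs: Math.round(avg), p95DurationMs: percentile(durations, 95) }
}
```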

Changes

Cohort / File(s) Summary
Playwright Reporter - Core Package
collectors/playwright-reporter/package.json, tsconfig.json, README.md
Package metadata for @practica/test-eyes v0.1.4 with Playwright reporter and CLI binaries; comprehensive documentation of installation, configuration, CLI usage, and data schema.
Playwright Reporter - Main Implementation
collectors/playwright-reporter/src/reporter.ts, cli.ts, aggregate.ts, deploy.ts
Playwright reporter collects per-test results and lifecycle handlers; CLI dispatches collect (parses input, configures git, writes run data, aggregates) and deploy commands; aggregation rebuilds statistics from run files with duration tracking; deployment pushes frontend build and aggregated data to gh-pages branch.
Playwright Reporter - Utilities
collectors/playwright-reporter/src/types.ts, file-operations.ts, git-operations.ts, junit-parser.ts, index.ts
Type definitions for reporter options, test results, and run data; file I/O helpers for JSON persistence; git CLI wrappers (branch management, staging, commit/push); JUnit XML parser with self-closing tag and nested element support; module re-exports for public API.
Playwright Reporter - Tests
collectors/playwright-reporter/tests/aggregate.test.ts, file-operations.test.ts, junit-parser.test.ts
Vitest suites validating aggregation (flaky counting, multi-run totals, statistics), file operations (directory creation, filename generation, JSON load/save), and JUnit parsing (status extraction, duration conversion, multi-suite handling).
Frontend - Flaky Metrics
apps/frontend/src/App.tsx, hooks/useTestFilter.ts, types/index.ts
Extracted TestRow type to shared types module; added flakyCount field to TestRow interface and table column; updated data mapping to explicitly assign all stats including flaky count with backward-compatible defaults.
Test Processing - Flaky Tracking
apps/test-processing/src/types.ts, aggregate.ts
Extended TestResult with optional wasFlaky?: boolean flag; added flakyCount: number to TestStats; aggregation increments flaky count when wasFlaky is true.
OpenSpec - Playwright Reporter Lifecycle
.openspec/changes/playwright-reporter/01-problem.md, 02-solution.md, 03-tasks.md, 04-status.md
Problem statement defining manual JUnit integration limitations; solution design specifying reporter entry points and git flow; task tracking covering reporter implementation, external validation, and bug fixes; status documentation confirming v0.1.4 publication and completion checklist.
OpenSpec - Architecture Alignment
openspec/changes/architecture-alignment/ (README, design, proposal, task specs, flow-testing/spec, multi-format-collectors/spec, playwright-reporter/spec, test-aggregation/spec)
Architecture proposal to refactor Playwright reporter as thin collector delegating to shared collectFromRunData() core; flaky detection via test.outcome() === 'flaky'; bundling strategy via esbuild; JUnit collector relocation; flow test specifications with data factories and scenario coverage; multi-run aggregation contract specification.
Workspace Configuration
pnpm-workspace.yaml
Added collectors/* directory pattern to workspace packages list.
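The architecture-alignment spec's flaky-detection rule (test.outcome() === 'flaky') can be illustrated with minimal stand-in types; Playwright's real TestCase interface is much richer than this sketch.

```typescript
// Minimal stand-ins for Playwright reporter types, for illustration only.
type Outcome = 'expected' | 'unexpected' | 'flaky' | 'skipped'
interface TestCaseLike {
  outcome(): Outcome
}

// Flaky per the spec: the test ultimately passed, but only after retries.
function isFlaky(test: TestCaseLike): boolean {
  return test.outcome() === 'flaky'
}
```

Compared with the older result.status === 'passed' && result.retry > 0 check, delegating to outcome() keeps the reporter aligned with Playwright's own classification.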

Sequence Diagram(s)

sequenceDiagram
    participant Playwright as Playwright<br/>(test runner)
    participant Reporter as TestEyes<br/>Reporter
    participant Git as Git<br/>Operations
    participant DataBranch as gh-data<br/>Branch
    participant Agg as Aggregation<br/>Engine

    Playwright->>Reporter: onTestEnd(test)
    Reporter->>Reporter: Extract result,<br/>mark flaky if retried
    Playwright->>Reporter: onEnd()
    Reporter->>Git: getCurrentSha(),<br/>getCurrentBranch()
    Reporter->>Git: checkoutBranch<br/>(gh-data)
    Reporter->>Reporter: Generate run data<br/>(runId, tests)
    Reporter->>Reporter: saveRunData()<br/>to data/
    Reporter->>Agg: aggregate(dataDir)
    Agg->>Agg: Load existing<br/>main-test-data.json
    Agg->>Agg: Scan for new<br/>JSON run files
    Agg->>Agg: Merge test stats,<br/>calc avg & p95<br/>duration
    Agg->>Agg: Track flaky<br/>counts
    Agg->>DataBranch: Save<br/>main-test-data.json
    Reporter->>Git: stageFiles(),<br/>commit(),<br/>push(gh-data)
    Reporter->>Git: checkoutBranch<br/>(original)
sequenceDiagram
    participant CLI as test-eyes<br/>CLI
    participant Parser as JUnit<br/>Parser
    participant Reporter as Reporter/<br/>Aggregator
    participant Git as Git<br/>Operations
    participant Deploy as Dashboard<br/>Deploy

    CLI->>Parser: parseJUnitFile(path)
    Parser->>Parser: Extract testcase,<br/>status, duration
    Parser-->>CLI: TestResult[]
    CLI->>Git: configureGit()
    CLI->>Reporter: aggregate(dataDir)
    Reporter->>Reporter: Reconstruct<br/>statistics
    Reporter->>Reporter: Calc duration<br/>percentiles
    CLI->>Git: stage, commit,<br/>push to gh-data
    alt Deploy enabled
        CLI->>Deploy: deployDashboard()
        Deploy->>Git: checkoutBranch<br/>(gh-data)
        Deploy->>Deploy: Fetch aggregated<br/>data JSON
        Deploy->>Deploy: Copy frontend<br/>dist + data
        Deploy->>Git: checkoutBranch<br/>(gh-pages)
        Deploy->>Deploy: Commit & push<br/>gh-pages
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Poem

🐰 Hoppy news for test-eyes flow!
With reporters that watch and know,
Flaky tests are caught in flight,
Data branches, pages bright!
GitHub Pages dashboards glow,
All from npm—watch it grow! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.17% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Openspec/architecture alignment v2' clearly summarizes the primary change: documenting and specifying architectural alignment work for the test-eyes project, specifically an updated version of the architecture/design specification.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

Note

Due to the large number of review comments, only those of Critical and Major severity were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/frontend/src/App.tsx (1)

42-42: ⚠️ Potential issue | 🟡 Minor

Typo in state setter name.

setSllowestSorting has an extra 'l'. This doesn't affect functionality but reduces readability.

✏️ Fix typo
-  const [slowestSorting, setSllowestSorting] = useState<SortingState>([{ id: 'p95DurationMs', desc: true }])
+  const [slowestSorting, setSlowestSorting] = useState<SortingState>([{ id: 'p95DurationMs', desc: true }])

Also update the usage at line 98:

-    onSortingChange: setSllowestSorting,
+    onSortingChange: setSlowestSorting,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/frontend/src/App.tsx` at line 42, There's a typo in the state setter
name: rename setSllowestSorting to setSlowestSorting where the state is declared
(const [slowestSorting, setSllowestSorting] = useState<SortingState>(...)) and
update all usages of setSllowestSorting elsewhere in this component (e.g., the
handler referenced near the sorting update) to the corrected identifier so the
state follows the slowestSorting / setSlowestSorting convention and remains
consistent.
🟡 Minor comments (4)
.openspec/changes/playwright-reporter/02-solution.md-43-46 (1)

43-46: ⚠️ Potential issue | 🟡 Minor

Inconsistent flaky detection logic with updated spec.

This document specifies flaky detection as result.status === 'passed' && result.retry > 0, but the updated spec at openspec/changes/architecture-alignment/specs/playwright-reporter/spec.md requires using test.outcome() === 'flaky' instead. The documentation should be updated to reflect the canonical approach.

📝 Suggested update
 ## Flaky Detection Logic
 ```typescript
-wasFlaky: result.status === 'passed' && result.retry > 0
+isFlaky: test.outcome() === 'flaky'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.openspec/changes/playwright-reporter/02-solution.md around lines 43 - 46,
Update the flaky-detection expression to use the canonical Playwright API:
replace the old logic that computes wasFlaky using result.status and
result.retry (wasFlaky: result.status === 'passed' && result.retry > 0) with an
isFlaky check that calls test.outcome() === 'flaky' (rename the variable to
isFlaky and use test.outcome() to determine flakiness).

openspec/changes/architecture-alignment/specs/flow-testing/spec.md-28-30 (1)

28-30: ⚠️ Potential issue | 🟡 Minor

Inconsistency between spec and implementation: isFlaky vs wasFlaky.

The spec references isFlaky: true but the TestResult interface in both collectors/playwright-reporter/src/types.ts and apps/test-processing/src/types.ts uses wasFlaky. This should be aligned to avoid confusion during implementation.



📝 Proposed fix
 #### Scenario: Flaky test produces correct output
 - **WHEN** reporter receives 3 attempts (fail, fail, pass) for one test with outcome 'flaky'
-- **THEN** pushToGitHub receives RunData with 1 test having isFlaky: true, retries: 2, status: 'passed'
+- **THEN** pushToGitHub receives RunData with 1 test having wasFlaky: true, retries: 2, status: 'passed'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/architecture-alignment/specs/flow-testing/spec.md` around
lines 28 - 30, The spec uses isFlaky but the codebase defines wasFlaky on
TestResult; update the spec language so it references wasFlaky to match the
existing TestResult types (collectors/playwright-reporter/src/types.ts and
apps/test-processing/src/types.ts) and any RunData pushed to pushToGitHub so
expectations read "wasFlaky: true" (keep retries: 2 and status: 'passed'
unchanged); alternatively, if you prefer changing code instead, rename the field
on TestResult and all usages (including RunData and pushToGitHub consumers) from
wasFlaky to isFlaky consistently across those two modules.
.openspec/changes/playwright-reporter/04-status.md-15-15 (1)

15-15: ⚠️ Potential issue | 🟡 Minor

Typo in demo URL.

The URL appears to contain a typo: test-eyeT should likely be test-eyes.

📝 Proposed fix
-- **Demo:** danylosoft-create.github.io/test-eyeT/
+- **Demo:** danylosoft-create.github.io/test-eyes/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.openspec/changes/playwright-reporter/04-status.md at line 15, The demo URL
contains a typo: update the string "Demo:
danylosoft-create.github.io/test-eyeT/" to the correct URL "Demo:
danylosoft-create.github.io/test-eyes/" so the demo link points to the intended
"test-eyes" page (look for the line starting with "Demo:" that includes
"test-eyeT").
collectors/playwright-reporter/package.json-47-50 (1)

47-50: ⚠️ Potential issue | 🟡 Minor

Replace the placeholder repository metadata before publishing.

Leaving the manifest pointed at a placeholder repo will send npm users to a dead project page.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/package.json` around lines 47 - 50, The
package.json currently contains a placeholder "repository" object; update the
repository metadata by replacing the placeholder "type" and "url" values in the
repository field with the actual VCS type (e.g., "git") and the real repository
URL for this project (or remove the repository key if this package should not
expose a public repo), and confirm package.json's repository entry matches the
repo used by your CI/publishing pipeline so npm users are directed to the
correct project page.
🧹 Nitpick comments (9)
collectors/playwright-reporter/src/file-operations.ts (3)

10-13: Use one timestamp snapshot in generateRunId.

Line 11 and Line 12 call toISOString() separately; a single snapshot avoids edge-case date/time mismatch.

Suggested tweak
 export function generateRunId(commitSha: string): string {
-  const now = new Date()
-  const date = now.toISOString().split('T')[0]
-  const time = now.toISOString().split('T')[1].slice(0, 8).replace(/:/g, '')
+  const iso = new Date().toISOString()
+  const [date, timePart] = iso.split('T')
+  const time = timePart.slice(0, 8).replace(/:/g, '')
   const shortSha = commitSha.slice(0, 7)
   return `${date}_${time}_${shortSha}`
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/file-operations.ts` around lines 10 - 13,
The code builds date and time from two separate toISOString() calls causing
possible mismatch; in generateRunId capture a single timestamp (e.g., const iso
= new Date().toISOString()) and derive date and time from that single iso value
(use iso.split('T')[0] for date and
iso.split('T')[1].slice(0,8).replace(/:/g,'') for time) while keeping shortSha =
commitSha.slice(0,7); update occurrences of now/date/time to use the single
snapshot to avoid edge-case mismatches.

54-60: Differentiate missing-directory from real filesystem failures.

Line 58 currently converts all readdir failures into []; permission or transient FS failures should surface instead of silently skipping data.

Suggested error handling split
 export async function listJsonFiles(dirPath: string): Promise<string[]> {
   try {
     const files = await fs.readdir(dirPath)
     return files.filter((f: string) => f.endsWith('.json'))
-  } catch {
-    return []
+  } catch (error) {
+    const err = error as NodeJS.ErrnoException
+    if (err.code === 'ENOENT') return []
+    throw error
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/file-operations.ts` around lines 54 - 60,
The current listJsonFiles function swallows all fs.readdir errors and returns an
empty array; change error handling so only a missing-directory (ENOENT) returns
[] while other filesystem errors are rethrown. In practice, inside listJsonFiles
catch the thrown error, check error.code === 'ENOENT' (or error?.code) and
return [] for that case, otherwise throw the original error so permission or
transient failures surface; reference the listJsonFiles function and its use of
fs.readdir when making this change.

32-38: Avoid swallowing all JSON/file errors in loadJson.

Returning null for every failure path hides malformed JSON and permission issues, which makes production failures hard to diagnose.

Suggested error handling split
 export async function loadJson<T>(filepath: string): Promise<T | null> {
   try {
     const content = await fs.readFile(filepath, 'utf-8')
     return JSON.parse(content) as T
-  } catch {
-    return null
+  } catch (error) {
+    const err = error as NodeJS.ErrnoException
+    if (err.code === 'ENOENT') return null
+    throw error
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/file-operations.ts` around lines 32 - 38,
loadJson currently swallows all errors and always returns null, hiding malformed
JSON and permission problems; update the loadJson(filepath: string)
implementation so it only returns null for "file not found" (ENOENT) cases but
rethrows or propagates other errors (like JSON.parse syntax errors or EACCES
permission errors); specifically, inside loadJson catch the thrown error,
inspect error.code === 'ENOENT' and return null only then, otherwise rethrow the
error so callers see parse/permission failures.
collectors/playwright-reporter/tests/aggregate.test.ts (1)

71-107: Consider adding p95 calculation test.

The multi-run test verifies avgDurationMs but doesn't test p95DurationMs calculation. Given that p95 is displayed in the dashboard's "Slowest Tests" table, consider adding a test case with enough data points to verify percentile logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/tests/aggregate.test.ts` around lines 71 -
107, Add a test assertion that verifies the p95 percentile is computed by
aggregate: create enough run entries for the test case (use the existing
TEST_DIR, run objects like run1/run2 or add run3..runN) with varying durationMs
so the 95th percentile is meaningful, call aggregate(TEST_DIR), read
main-test-data.json into mainData, and add an expectation checking
mainData.tests['test1'].p95DurationMs equals the expected p95 value (referencing
aggregate and main-test-data.json to locate where the value is produced and
stored).
collectors/playwright-reporter/src/junit-parser.ts (3)

4-45: Unused interfaces and helper functions.

The interfaces JUnitTestCase, JUnitTestSuite, JUnitReport and helper functions normalizeToArray, getTestStatus, and parseTestCase are defined but never used. The actual parsing logic uses parseTestAttrs instead. These appear to be remnants from a previous XML library-based approach.

Consider removing this dead code to reduce maintenance burden.

🧹 Proposed cleanup
 import { readFile } from 'fs/promises'
 import type { TestResult } from './types.js'
 
-// JUnit XML types
-interface JUnitTestCase {
-  '@_classname'?: string
-  '@_name': string
-  '@_time'?: string
-  failure?: unknown
-  error?: unknown
-  skipped?: unknown
-}
-
-interface JUnitTestSuite {
-  '@_name'?: string
-  testcase?: JUnitTestCase | JUnitTestCase[]
-}
-
-interface JUnitReport {
-  testsuites?: { testsuite: JUnitTestSuite | JUnitTestSuite[] }
-  testsuite?: JUnitTestSuite
-}
-
-function normalizeToArray<T>(value: T | T[] | undefined): T[] {
-  if (!value) return []
-  return Array.isArray(value) ? value : [value]
-}
-
-function getTestStatus(testCase: JUnitTestCase): TestResult['status'] {
-  if (testCase.skipped !== undefined) return 'skipped'
-  if (testCase.failure !== undefined || testCase.error !== undefined) return 'failed'
-  return 'passed'
-}
-
-function parseTestCase(testCase: JUnitTestCase): TestResult {
-  const className = testCase['@_classname'] || ''
-  const testName = testCase['@_name'] || 'unknown'
-  const name = className ? `${className} ${testName}` : testName
-
-  return {
-    name: name.trim(),
-    durationMs: Math.round(parseFloat(testCase['@_time'] || '0') * 1000),
-    status: getTestStatus(testCase)
-  }
-}
-
 export function parseJUnitXml(xml: string): TestResult[] {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/junit-parser.ts` around lines 4 - 45,
Remove the dead/unused JUnit XML artifacts: delete the interfaces JUnitTestCase,
JUnitTestSuite, JUnitReport and the helper functions normalizeToArray,
getTestStatus, and parseTestCase from junit-parser.ts; verify that
parseTestAttrs (and any other active parsing helpers) remain untouched, search
the repo for any remaining references to those symbols before committing, and
run type-checks/tests to ensure no regressions.

71-74: Regex attribute parsing has edge-case limitations.

The attribute regex /name=["']([^"']+)["']/ won't handle:

  • XML entities in attribute values (e.g., &quot;, &amp;)
  • Test names containing the quote character used as delimiter

This is acceptable for most JUnit outputs, but consider documenting this limitation or using a more robust XML parsing approach if broader compatibility is needed.
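For reference, decoding the XML entities mentioned above takes only a handful of replacements. This is a hypothetical helper, not part of the package; note that &amp; must be decoded last to avoid double-decoding.

```typescript
// Hypothetical helper illustrating the entity issue noted in this comment;
// not part of the junit-parser module.
function unescapeXmlEntities(value: string): string {
  return value
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&') // last, so '&amp;lt;' decodes to '&lt;', not '<'
}
```

The quote-delimiter limitation is harder to fix with a regex alone, which is why a real XML parser is the safer route for broad compatibility.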

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/junit-parser.ts` around lines 71 - 74, The
regex-based parsing in parseTestAttrs (function parseTestAttrs) fails for
attributes containing XML entities or quote chars used as delimiters; update the
implementation to use a proper XML/HTML parser (e.g., DOMParser in Node/browser
or an XML library like fast-xml-parser/xml2js) to extract name, classname and
time attributes reliably and unescape entities, or at minimum document this
limitation near parseTestAttrs/TestResult explaining that names with embedded
quotes or XML entities are not supported and recommend using a real XML parser
for broader compatibility.

94-97: Consider adding error handling for file read failures.

The parseJUnitFile function will throw if the file doesn't exist or isn't readable. Consider whether a more graceful error message would help users diagnose issues.

💡 Optional improvement
 export async function parseJUnitFile(filepath: string): Promise<TestResult[]> {
-  const xml = await readFile(filepath, 'utf-8')
-  return parseJUnitXml(xml)
+  try {
+    const xml = await readFile(filepath, 'utf-8')
+    return parseJUnitXml(xml)
+  } catch (error) {
+    throw new Error(`Failed to parse JUnit file "${filepath}": ${error instanceof Error ? error.message : error}`)
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/junit-parser.ts` around lines 94 - 97, The
parseJUnitFile function currently lets readFile errors bubble up; wrap the await
readFile(filepath, 'utf-8') and subsequent parseJUnitXml(xml) in a try/catch
inside parseJUnitFile, and on error throw or reject with a clearer message that
includes the filepath and the original error (e.g., "Failed to read/parse JUnit
file: <filepath> - <original error>"), or return a controlled value if
preferred; refer to parseJUnitFile and parseJUnitXml when adding the try/catch
and include the original error as the cause/context for easier debugging.
collectors/playwright-reporter/src/aggregate.ts (1)

116-120: Skipped tests increment totalRuns but are not reflected in pass/fail counts.

A skipped test increments stats.totalRuns (line 114) but doesn't increment any status counter. This creates a gap where passCount + failCount < totalRuns for tests with skipped runs. If this is intentional, consider adding a skipCount field for completeness; otherwise, skipped tests may need different handling.
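One way to close the gap described above is the suggested skipCount field. A sketch, with a hypothetical stats shape rather than the package's actual TestStats:

```typescript
// Hypothetical stats shape including the suggested skipCount field.
interface TestStatsSketch {
  totalRuns: number
  passCount: number
  failCount: number
  skipCount: number
}

type Status = 'passed' | 'failed' | 'skipped'

// Every run bumps totalRuns and exactly one status counter, so the invariant
// passCount + failCount + skipCount === totalRuns always holds.
function recordRun(stats: TestStatsSketch, status: Status): void {
  stats.totalRuns++
  if (status === 'passed') stats.passCount++
  else if (status === 'failed') stats.failCount++
  else stats.skipCount++
}
```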

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/aggregate.ts` around lines 116 - 120, The
current aggregation increments stats.totalRuns but only updates stats.passCount
or stats.failCount for test.status 'passed'/'failed', leaving skipped runs
uncounted; modify the aggregation logic in aggregate.ts to handle test.status
'skipped' by adding a new counter stats.skipCount (initialize where stats is
created) and increment it when test.status === 'skipped', or alternatively
adjust existing logic to not increment totalRuns for skipped runs—update
references to stats.totalRuns, stats.passCount, stats.failCount, and the new
stats.skipCount accordingly so passCount + failCount + skipCount == totalRuns.
collectors/playwright-reporter/src/git-operations.ts (1)

55-58: Avoid shell error suppression with || true.

The || true masks all errors from git rm -rf ., making failures invisible. Handle the error in JavaScript instead, so you can differentiate between "nothing to remove" (expected) and actual failures.

♻️ Proposed fix
 export async function createOrphanBranch(branch: string): Promise<void> {
-  await run(`git checkout --orphan ${branch}`)
-  await run('git rm -rf . || true')
+  await run('git', ['checkout', '--orphan', branch])
+  try {
+    await run('git', ['rm', '-rf', '.'])
+  } catch {
+    // Ignore error when there are no files to remove (empty repo)
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/git-operations.ts` around lines 55 - 58,
In createOrphanBranch, stop using the shell-suppressed command "git rm -rf . ||
true" and instead await run('git rm -rf .') inside a try/catch so we can handle
failures explicitly: remove the "|| true", wrap the await run call in try/catch,
inspect the thrown error (message/stdout/stderr) from run and silently ignore
only the benign "did not match any files" / "pathspec" (no files to remove)
cases, but rethrow or log and propagate any other errors; reference
createOrphanBranch and the run helper to locate and update the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/test-processing/src/aggregate.ts`:
- Around line 87-89: The code increments stats.flakyCount when test.wasFlaky but
older aggregated records may lack flakyCount causing NaN; before incrementing in
the aggregation logic around the stats object (where stats and test.wasFlaky are
used) ensure flakyCount is initialized to a numeric 0 if missing or not a number
(e.g., set stats.flakyCount = 0 when typeof stats.flakyCount !== 'number' or use
a fallback) and then perform the increment so flaky totals remain correct.

In `@collectors/playwright-reporter/package.json`:
- Around line 30-38: The package.json for collectors/playwright-reporter is
missing a dependency on the shared collection core in apps/test-processing, so
the repo continues to use the duplicated runtime files (aggregate.ts,
git-operations.ts, file-operations.ts, deploy.ts); update
collectors/playwright-reporter's package.json to add a dependency (or workspace
reference) to the apps/test-processing package, change imports in the reporter
code to import the shared orchestration/collection functions from that package
instead of the local duplicate modules (search for imports referencing
aggregate, git-operations, file-operations, deploy) and remove the duplicated
modules from the runtime path so the reporter delegates collection/orchestration
to the shared core.
- Around line 12-18: The package.json exports currently include a "require"
condition pointing at "./dist/index.js" while the package uses "type": "module",
causing CommonJS consumers to fail; either remove the "require" field from the
exports map (exports["."].require) or produce and point it to a dedicated
CommonJS build (e.g., compile a .cjs bundle such as "./dist/index.cjs" and
update exports["."].require to that file), and ensure the "types" and ESM
"import"/"default" still point to the ESM outputs (e.g., "./dist/index.d.ts" and
"./dist/index.js") so Node CJS consumers can require the .cjs file without
breaking ESM consumers.

In `@collectors/playwright-reporter/README.md`:
- Around line 15-19: Update the installation instructions in the README: replace
the incorrect package name string "test-eyes-reporter" with the published
package name "@practica/test-eyes" in both npm and pnpm examples so users can
successfully install the released package; ensure both occurrences in the
markdown block use "@practica/test-eyes".
- Around line 30-37: The reporter string in the Playwright config's reporter
array ('test-eyes-reporter') must match the actual npm package name; open the
reporter configuration where the reporter array is defined (the reporter: [...]
block) and replace 'test-eyes-reporter' with the package name exactly as
specified in the reporter package.json (or the installed module name), keeping
the existing options object (dataBranch, deployBranch, deploy) intact so
Playwright can resolve the module.

In `@collectors/playwright-reporter/src/aggregate.ts`:
- Around line 88-92: The reconstructed durations map currently fills samples
using stats.avgDurationMs (for each entry in data.tests), which loses historical
variance and breaks p95 accuracy; modify the loop that builds durations (the
code iterating over data.tests and calling durations.set) to first check for and
use stats.durationSamples (an array of real samples) when present, falling back
to Array(stats.totalRuns).fill(stats.avgDurationMs) only for backward
compatibility, and additionally consider using a stored stats.p95 value when
neither samples nor reliable totals exist to avoid synthetic recalculation.

In `@collectors/playwright-reporter/src/cli.ts`:
- Around line 51-60: The CLI currently accepts unvalidated branch refs (options
'data-branch' and any deploy branch from collectCommand / parseArgs) and later
passes them into git helper shell commands, creating a command-injection risk;
update collectCommand to validate ref names (e.g., allow only [A-Za-z0-9._/-]
and reject/escape anything else) before using them in any git helper, or change
the git helper invocation to use argv-style process execution (spawn/execFile)
rather than shell interpolation; ensure you apply the same validation or
safe-call change for the deploy branch path referenced around the code handling
deploy (lines near 138-149) so both dataBranch and deployBranch are sanitized or
passed as safe args to the git helpers.
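
A hedged sketch of the allowlist validation suggested above (the helper name is hypothetical). Requiring an alphanumeric first character also prevents a ref like `--force` from being parsed as a git option:

```typescript
// Conservative allowlist for git ref names accepted from CLI options.
const SAFE_REF = /^[A-Za-z0-9][A-Za-z0-9._/-]*$/

function assertSafeRef(ref: string): string {
  // Reject shell metacharacters, leading dashes, and '..' path tricks.
  if (!SAFE_REF.test(ref) || ref.includes('..')) {
    throw new Error(`Unsafe git ref name: ${JSON.stringify(ref)}`)
  }
  return ref
}
```

Validation like this complements, but does not replace, argv-style execution.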
- Around line 92-133: The checkout to the data branch and subsequent
save/aggregate/push sequence can leave the repo on the data branch if an error
is thrown; wrap the block that begins after switching branches (the
save/aggregate/commit/push flow that calls ensureDir, saveRunData, aggregate,
stageFiles, hasChanges, commit, push, and deployDashboard) in a try/finally and
perform the checkout back to originalBranch in the finally (using
checkoutBranch(originalBranch)) so the workspace is always restored; preserve
the existing logic that skips checkout when originalBranch is falsy or equals
dataBranch and keep deployDashboard only after successful work inside the try.
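
A sketch of the try/finally shape described above, with the git and work helpers abstracted as parameters since the actual signatures in cli.ts may differ:

```typescript
// Always restore the original branch, even when the work throws.
async function withDataBranch(
  originalBranch: string | null,
  dataBranch: string,
  checkoutBranch: (branch: string) => Promise<void>,
  work: () => Promise<void>
): Promise<void> {
  await checkoutBranch(dataBranch)
  try {
    await work() // save, aggregate, commit, push, deploy
  } finally {
    // Skip restore when there was no original branch or we never left it.
    if (originalBranch && originalBranch !== dataBranch) {
      await checkoutBranch(originalBranch)
    }
  }
}
```

The error still propagates to the caller; only the checkout-back is guaranteed.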

In `@collectors/playwright-reporter/src/deploy.ts`:
- Around line 57-158: The deployDashboard function can leave the repo on the
wrong branch or skip removing the tempDir if an exception occurs; wrap the main
work (everything after saving originalBranch and before cleanup) in a
try/finally so that in the finally block you always call await
checkoutBranch(originalBranch) and await fs.rm(tempDir, { recursive: true,
force: true }) (or equivalent cleanup), and ensure any early returns (e.g., when
no local frontend) still perform the cleanup by moving that logic inside the try
and returning after cleanup; reference deployDashboard, tempDir, originalBranch,
checkoutBranch, and fs.rm to locate the changes.
- Around line 81-98: deployDashboard currently uses a hard-coded localDist
('apps/frontend/dist') preventing the exposed frontendDistPath option from being
honored; update deployDashboard (and its options type) to accept and use the
frontendDistPath option (or import frontendDistPath from
collectors/playwright-reporter/src/types.ts) instead of the hard-coded
localDist, then replace uses of localDist in the fileExists and copyDir calls
(and any logging) so external repos can pass their build output; ensure the
options/interface signature that calls deployDashboard (and the variable
tempDir/fileExists/copyDir usage) is updated accordingly so the override is
threaded through.

In `@collectors/playwright-reporter/src/git-operations.ts`:
- Around line 60-64: The stageFiles function is injecting untrusted pattern
strings directly into a shell command causing command injection; update
stageFiles to call the underlying runner with argv (not a shell-interpolated
string) — e.g. invoke the run helper or child_process API using an argument
array like ['git', 'add', pattern] (or run('git', ['add', pattern])) for each
pattern so the pattern is passed as a single argument and not interpreted by the
shell; ensure any helper run function supports argv arrays and adjust its call
sites if needed.
- Around line 75-91: The commit() and push() functions call run() with
interpolated strings making message and branch vulnerable to shell-injection and
their catch blocks swallow errors; update commit() and push() to call run() with
safe, argumentized execution (use an API that accepts argv arrays or escape
arguments instead of string interpolation) so the message and branch are passed
as separate args, and in both catch blocks log the caught error (include error
details and context mentioning commit(message) or push(branch)) before returning
null/false (or rethrow if preferred); reference the commit, push, run, and
getCurrentSha symbols when making these changes.
- Around line 33-53: The branch parameter is currently interpolated into shell
commands in fetchBranch, branchExists, and checkoutBranch which allows command
injection; update these functions to call the underlying execFile-style runner
with argument arrays (e.g., run('git', ['fetch', 'origin', branch + ':' +
branch]) or preferably split into ['fetch', 'origin', `${branch}:${branch}`],
run('git', ['rev-parse','--verify', branch]) and run('git', ['checkout',
branch]) ) instead of string interpolation, or modify the run helper to accept
(cmd, args) and use child_process.execFile/spawn without a shell; ensure you do
not reintroduce shell=true and keep error handling unchanged.
- Around line 20-23: The configureGit function is vulnerable to command
injection because it interpolates config.userName and config.userEmail into
shell strings passed to run; update configureGit to call a non-shell command
executor (e.g., use child_process.execFile or spawn with an args array) instead
of string interpolation, or change the run helper to accept a command and an
args array and call run('git', ['config', 'user.name', config.userName]) and
run('git', ['config', 'user.email', config.userEmail]); ensure inputs are passed
as arguments (not concatenated into a shell) and validate/trim
userName/userEmail before passing them.
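
The argv-style execution suggested across these comments could look like the following sketch, built on Node's execFile, which passes arguments directly to the binary without a shell. The run/configureGit signatures are assumptions, not the repo's actual API:

```typescript
import { execFile } from 'node:child_process'
import { promisify } from 'node:util'

const execFileAsync = promisify(execFile)

// argv-style runner: values are passed as discrete arguments, never
// interpolated into a shell string, so they cannot inject commands.
async function run(cmd: string, args: string[]): Promise<string> {
  const { stdout } = await execFileAsync(cmd, args)
  return stdout.trim()
}

// Example call site mirroring the configureGit helper named above (sketch):
async function configureGit(userName: string, userEmail: string): Promise<void> {
  await run('git', ['config', 'user.name', userName.trim()])
  await run('git', ['config', 'user.email', userEmail.trim()])
}
```

The same pattern applies to fetchBranch, branchExists, checkoutBranch, stageFiles, commit, and push: each untrusted value becomes one argv element.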

In `@collectors/playwright-reporter/src/reporter.ts`:
- Around line 112-117: The upload path is currently ignoring commit()/push()
failures and uses a force-push, so change saveToDataBranch (and any callers like
submitData) to fail-fast on any commit or push error: have saveToDataBranch
return/throw on push failure instead of returning success, and ensure
reporter.ts (the block around saveToDataBranch, returnToOriginalBranch, deploy)
checks that result and aborts deployment when upload fails. In addition, remove
or change the force-push behavior in git-operations.ts push() (stop using
--force) and handle push conflicts (return error to caller) so concurrent CI
jobs cannot silently overwrite each other. Ensure both the first occurrence
(around lines 112–117) and the other occurrence (around lines 177–188) follow
the same fail-fast behavior and propagate push errors up to prevent deploying
stale data.
- Around line 78-84: onTestEnd is currently appending an entry for every attempt
(using result.status), causing duplicate entries for retried tests; change it to
record only the final logical outcome by using test.outcome() instead of
result.status and derive wasFlaky from test.outcome() and test.retries/run
history (so you push one entry per TestCase into this.tests). Update the
onTestEnd implementation to use test.outcome() for status, compute wasFlaky from
the test's final/flaky indicators, and ensure aggregate() continues to operate
on this.tests which will now contain only final outcomes.
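
Playwright's TestCase.outcome() returns 'expected' | 'unexpected' | 'flaky' | 'skipped' as the final verdict across all attempts. A sketch of mapping that to the reporter's entry shape (the status/wasFlaky field names follow the review; the mapping itself is an assumption about the desired semantics):

```typescript
// Final logical outcome per TestCase, as reported by test.outcome().
type Outcome = 'expected' | 'unexpected' | 'flaky' | 'skipped'

function finalResult(outcome: Outcome, name: string, durationMs: number) {
  const status =
    outcome === 'unexpected' ? 'failed'
    : outcome === 'skipped' ? 'skipped'
    : 'passed' // 'expected' and 'flaky' both ultimately passed
  return { name, durationMs, status, wasFlaky: outcome === 'flaky' }
}
```

Calling this once per TestCase in onEnd (e.g., over suite.allTests()) yields one entry per logical test instead of one per retry attempt.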

In `@openspec/changes/architecture-alignment/proposal.md`:
- Around line 33-41: The CLI relocation is a breaking change because the
published package no longer exposes the previous binaries (test-eyes /
test-eyes-reporter) after removing collectors/playwright-reporter/src/cli.ts and
moving collection into collectors/junit/; either mark this in the proposal as a
breaking change or add a compatibility shim that re-exports or re-installs the
old CLI entrypoints (e.g., restore a small module at
collectors/playwright-reporter/src/cli.ts that imports and delegates to the new
collectors/junit/ entry, and update package.json "bin" to point to those legacy
paths) so existing consumers invoking the published binaries continue to work.

In
`@openspec/changes/architecture-alignment/specs/multi-format-collectors/spec.md`:
- Around line 3-12: Update the spec for collectFromRunData to require that
runData aggregation collapses multiple retry attempts of the same logical test
into a single TestResult (e.g., identify tests by a stable key such as
testId/name + suite/path) and compute flaky counts per logical test rather than
per-attempt; change both Playwright and JUnit scenarios to assert that
collectFromRunData deduplicates attempts, produces one TestResult per logical
test, and increments a flakyCount when a test has mixed pass/fail attempts
across retries before committing/pushing.

In `@openspec/changes/architecture-alignment/specs/playwright-reporter/spec.md`:
- Around line 29-38: The reporter currently duplicates collection, git and file
logic; remove the local implementations in git-operations.ts, file-operations.ts
and aggregate.ts and instead import collectFromRunData from the test-processing
package; update the reporter's onEnd handler (the function named onEnd in the
reporter) to call collectFromRunData(runData) and use its returned payload for
any subsequent save/commit/push operations, replacing calls to local functions
previously defined in git-operations.ts, file-operations.ts and aggregate.ts;
ensure any tests or callers now reference the imported collectFromRunData and
delete the redundant local modules.

---

Outside diff comments:
In `@apps/frontend/src/App.tsx`:
- Line 42: There's a typo in the state setter name: rename setSllowestSorting to
setSlowestSorting where the state is declared (const [slowestSorting,
setSllowestSorting] = useState<SortingState>(...)) and update all usages of
setSllowestSorting elsewhere in this component (e.g., the handler referenced
near the sorting update) to the corrected identifier so the state follows the
slowestSorting / setSlowestSorting convention and remains consistent.

---

Minor comments:
In @.openspec/changes/playwright-reporter/02-solution.md:
- Around line 43-46: Update the flaky-detection expression to use the canonical
Playwright API: replace the old logic that computes wasFlaky using result.status
and result.retry (wasFlaky: result.status === 'passed' && result.retry > 0) with
an isFlaky check that calls test.outcome() === 'flaky' (rename the variable to
isFlaky and use test.outcome() to determine flakiness).

In @.openspec/changes/playwright-reporter/04-status.md:
- Line 15: The demo URL contains a typo: update the string "Demo:
danylosoft-create.github.io/test-eyeT/" to the correct URL "Demo:
danylosoft-create.github.io/test-eyes/" so the demo link points to the intended
"test-eyes" page (look for the line starting with "Demo:" that includes
"test-eyeT").

In `@collectors/playwright-reporter/package.json`:
- Around line 47-50: The package.json currently contains a placeholder
"repository" object; update the repository metadata by replacing the placeholder
"type" and "url" values in the repository field with the actual VCS type (e.g.,
"git") and the real repository URL for this project (or remove the repository
key if this package should not expose a public repo), and confirm package.json's
repository entry matches the repo used by your CI/publishing pipeline so npm
users are directed to the correct project page.

In `@openspec/changes/architecture-alignment/specs/flow-testing/spec.md`:
- Around line 28-30: The spec uses isFlaky but the codebase defines wasFlaky on
TestResult; update the spec language so it references wasFlaky to match the
existing TestResult types (collectors/playwright-reporter/src/types.ts and
apps/test-processing/src/types.ts) and any RunData pushed to pushToGitHub so
expectations read "wasFlaky: true" (keep retries: 2 and status: 'passed'
unchanged); alternatively, if you prefer changing code instead, rename the field
on TestResult and all usages (including RunData and pushToGitHub consumers) from
wasFlaky to isFlaky consistently across those two modules.

---

Nitpick comments:
In `@collectors/playwright-reporter/src/aggregate.ts`:
- Around line 116-120: The current aggregation increments stats.totalRuns but
only updates stats.passCount or stats.failCount for test.status
'passed'/'failed', leaving skipped runs uncounted; modify the aggregation logic
in aggregate.ts to handle test.status 'skipped' by adding a new counter
stats.skipCount (initialize where stats is created) and increment it when
test.status === 'skipped', or alternatively adjust existing logic to not
increment totalRuns for skipped runs—update references to stats.totalRuns,
stats.passCount, stats.failCount, and the new stats.skipCount accordingly so
passCount + failCount + skipCount == totalRuns.
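
A sketch of the skipCount variant so the invariant passCount + failCount + skipCount === totalRuns always holds (field names follow the comment above; skipCount is the proposed addition):

```typescript
interface Stats {
  totalRuns: number
  passCount: number
  failCount: number
  skipCount: number // proposed new counter
}

// Every run increments exactly one outcome counter plus totalRuns.
function record(stats: Stats, status: 'passed' | 'failed' | 'skipped'): void {
  stats.totalRuns++
  if (status === 'passed') stats.passCount++
  else if (status === 'failed') stats.failCount++
  else stats.skipCount++
}
```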

In `@collectors/playwright-reporter/src/file-operations.ts`:
- Around line 10-13: The code builds date and time from two separate
toISOString() calls causing possible mismatch; in generateRunId capture a single
timestamp (e.g., const iso = new Date().toISOString()) and derive date and time
from that single iso value (use iso.split('T')[0] for date and
iso.split('T')[1].slice(0,8).replace(/:/g,'') for time) while keeping shortSha =
commitSha.slice(0,7); update occurrences of now/date/time to use the single
snapshot to avoid edge-case mismatches.
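
A sketch of the single-snapshot version; the exact join format of the run id is an assumption based on the fields named above:

```typescript
// One ISO snapshot means date and time can never straddle midnight
// between two separate Date constructions.
function generateRunId(commitSha: string, now: Date = new Date()): string {
  const iso = now.toISOString()                                 // 2026-03-23T14:05:09.123Z
  const date = iso.split('T')[0]                                // 2026-03-23
  const time = iso.split('T')[1].slice(0, 8).replace(/:/g, '')  // 140509
  const shortSha = commitSha.slice(0, 7)
  return `${date}-${time}-${shortSha}`
}
```
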
- Around line 54-60: The current listJsonFiles function swallows all fs.readdir
errors and returns an empty array; change error handling so only a
missing-directory (ENOENT) returns [] while other filesystem errors are
rethrown. In practice, inside listJsonFiles catch the thrown error, check
error.code === 'ENOENT' (or error?.code) and return [] for that case, otherwise
throw the original error so permission or transient failures surface; reference
the listJsonFiles function and its use of fs.readdir when making this change.
- Around line 32-38: loadJson currently swallows all errors and always returns
null, hiding malformed JSON and permission problems; update the
loadJson(filepath: string) implementation so it only returns null for "file not
found" (ENOENT) cases but rethrows or propagates other errors (like JSON.parse
syntax errors or EACCES permission errors); specifically, inside loadJson catch
the thrown error, inspect error.code === 'ENOENT' and return null only then,
otherwise rethrow the error so callers see parse/permission failures.
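
A sketch of the ENOENT-only fallback described above; other failures (malformed JSON, EACCES) propagate to the caller:

```typescript
import { promises as fs } from 'node:fs'

// Return null only when the file is missing; surface every other error.
async function loadJson<T>(filepath: string): Promise<T | null> {
  try {
    const raw = await fs.readFile(filepath, 'utf-8')
    return JSON.parse(raw) as T
  } catch (error) {
    if ((error as { code?: string })?.code === 'ENOENT') return null
    throw error // parse errors and permission failures are real problems
  }
}
```

The same ENOENT-vs-rethrow split applies to listJsonFiles and its fs.readdir call.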

In `@collectors/playwright-reporter/src/git-operations.ts`:
- Around line 55-58: In createOrphanBranch, stop using the shell-suppressed
command "git rm -rf . || true" and instead await run('git rm -rf .') inside a
try/catch so we can handle failures explicitly: remove the "|| true", wrap the
await run call in try/catch, inspect the thrown error (message/stdout/stderr)
from run and silently ignore only the benign "did not match any files" /
"pathspec" (no files to remove) cases, but rethrow or log and propagate any
other errors; reference createOrphanBranch and the run helper to locate and
update the code.

In `@collectors/playwright-reporter/src/junit-parser.ts`:
- Around line 4-45: Remove the dead/unused JUnit XML artifacts: delete the
interfaces JUnitTestCase, JUnitTestSuite, JUnitReport and the helper functions
normalizeToArray, getTestStatus, and parseTestCase from junit-parser.ts; verify
that parseTestAttrs (and any other active parsing helpers) remain untouched,
search the repo for any remaining references to those symbols before committing,
and run type-checks/tests to ensure no regressions.
- Around line 71-74: The regex-based parsing in parseTestAttrs (function
parseTestAttrs) fails for attributes containing XML entities or quote chars used
as delimiters; update the implementation to use a proper XML/HTML parser (e.g.,
DOMParser in Node/browser or an XML library like fast-xml-parser/xml2js) to
extract name, classname and time attributes reliably and unescape entities, or
at minimum document this limitation near parseTestAttrs/TestResult explaining
that names with embedded quotes or XML entities are not supported and recommend
using a real XML parser for broader compatibility.
- Around line 94-97: The parseJUnitFile function currently lets readFile errors
bubble up; wrap the await readFile(filepath, 'utf-8') and subsequent
parseJUnitXml(xml) in a try/catch inside parseJUnitFile, and on error throw or
reject with a clearer message that includes the filepath and the original error
(e.g., "Failed to read/parse JUnit file: <filepath> - <original error>"), or
return a controlled value if preferred; refer to parseJUnitFile and
parseJUnitXml when adding the try/catch and include the original error as the
cause/context for easier debugging.

In `@collectors/playwright-reporter/tests/aggregate.test.ts`:
- Around line 71-107: Add a test assertion that verifies the p95 percentile is
computed by aggregate: create enough run entries for the test case (use the
existing TEST_DIR, run objects like run1/run2 or add run3..runN) with varying
durationMs so the 95th percentile is meaningful, call aggregate(TEST_DIR), read
main-test-data.json into mainData, and add an expectation checking
mainData.tests['test1'].p95DurationMs equals the expected p95 value (referencing
aggregate and main-test-data.json to locate where the value is produced and
stored).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa738906-ddbe-4866-b85a-3feb0a30cd99

📥 Commits

Reviewing files that changed from the base of the PR and between f49f76a and d1fa915.

⛔ Files ignored due to path filters (6)
  • openspec/.DS_Store is excluded by !**/.DS_Store
  • openspec/changes/.DS_Store is excluded by !**/.DS_Store
  • openspec/changes/componentize-frontend/.DS_Store is excluded by !**/.DS_Store
  • openspec/changes/github-action-install/.DS_Store is excluded by !**/.DS_Store
  • openspec/changes/github-action-install/specs/.DS_Store is excluded by !**/.DS_Store
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (34)
  • .openspec/changes/playwright-reporter/01-problem.md
  • .openspec/changes/playwright-reporter/02-solution.md
  • .openspec/changes/playwright-reporter/03-tasks.md
  • .openspec/changes/playwright-reporter/04-status.md
  • apps/frontend/src/App.tsx
  • apps/frontend/src/hooks/useTestFilter.ts
  • apps/frontend/src/types/index.ts
  • apps/test-processing/src/aggregate.ts
  • apps/test-processing/src/types.ts
  • collectors/playwright-reporter/README.md
  • collectors/playwright-reporter/package.json
  • collectors/playwright-reporter/src/aggregate.ts
  • collectors/playwright-reporter/src/cli.ts
  • collectors/playwright-reporter/src/deploy.ts
  • collectors/playwright-reporter/src/file-operations.ts
  • collectors/playwright-reporter/src/git-operations.ts
  • collectors/playwright-reporter/src/index.ts
  • collectors/playwright-reporter/src/junit-parser.ts
  • collectors/playwright-reporter/src/reporter.ts
  • collectors/playwright-reporter/src/types.ts
  • collectors/playwright-reporter/tests/aggregate.test.ts
  • collectors/playwright-reporter/tests/file-operations.test.ts
  • collectors/playwright-reporter/tests/junit-parser.test.ts
  • collectors/playwright-reporter/tsconfig.json
  • openspec/changes/architecture-alignment/.openspec.yaml
  • openspec/changes/architecture-alignment/README.md
  • openspec/changes/architecture-alignment/design.md
  • openspec/changes/architecture-alignment/proposal.md
  • openspec/changes/architecture-alignment/specs/flow-testing/spec.md
  • openspec/changes/architecture-alignment/specs/multi-format-collectors/spec.md
  • openspec/changes/architecture-alignment/specs/playwright-reporter/spec.md
  • openspec/changes/architecture-alignment/specs/test-aggregation/spec.md
  • openspec/changes/architecture-alignment/tasks.md
  • pnpm-workspace.yaml

Comment on lines +87 to +89
if (test.wasFlaky) {
stats.flakyCount++
}

⚠️ Potential issue | 🟠 Major

Initialize legacy flakyCount before incrementing.

At Line 88, older aggregated records without flakyCount will produce NaN on increment, corrupting flaky totals.

Proposed fix
-    if (test.wasFlaky) {
-      stats.flakyCount++
-    }
+    if (typeof stats.flakyCount !== 'number' || Number.isNaN(stats.flakyCount)) {
+      stats.flakyCount = 0
+    }
+    if (test.wasFlaky === true) {
+      stats.flakyCount++
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (test.wasFlaky) {
stats.flakyCount++
}
if (typeof stats.flakyCount !== 'number' || Number.isNaN(stats.flakyCount)) {
stats.flakyCount = 0
}
if (test.wasFlaky === true) {
stats.flakyCount++
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/test-processing/src/aggregate.ts` around lines 87 - 89, The code
increments stats.flakyCount when test.wasFlaky but older aggregated records may
lack flakyCount causing NaN; before incrementing in the aggregation logic around
the stats object (where stats and test.wasFlaky are used) ensure flakyCount is
initialized to a numeric 0 if missing or not a number (e.g., set
stats.flakyCount = 0 when typeof stats.flakyCount !== 'number' or use a
fallback) and then perform the increment so flaky totals remain correct.

Comment on lines +12 to +18
"exports": {
".": {
"types": "./dist/index.d.ts",
"import": "./dist/index.js",
"require": "./dist/index.js",
"default": "./dist/index.js"
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Node.js package exports, if package.json has "type": "module" and "exports": { ".": { "require": "./dist/index.js" } }, can CommonJS require() load that target across Node 18, 20, and 22 without a separate .cjs build?

💡 Result:

No.

Because your package has "type": "module", the target file ./dist/index.js is treated as ESM (Node treats .js as ESM inside a "type": "module" package, and you’d need a .cjs file to force CommonJS). (nodejs.org)

So a CommonJS consumer doing require("your-pkg") will not be able to load ./dist/index.js on Node 18 (it errors, e.g. ERR_REQUIRE_ESM), and on Node 20/22 it only works if you opt into the newer require(esm) behavior (Node 20 added this in v20.17.0, and Node 22 documents it as only supported when --experimental-require-module is enabled, and only for synchronous ESM graphs). (nodejs.org)

For “just works” across Node 18, 20, and 22 without flags, you still need a real CommonJS entry (e.g. a .cjs build, or a CommonJS “wrapper” file / nested package.json with "type": "commonjs" in dist/). (nodejs.org)


Drop the require export or ship a separate CommonJS build.

With "type": "module", the file ./dist/index.js is treated as ESM. CommonJS consumers cannot load it via require() on Node 18, and on Node 20–22 it only works with experimental flags. Either remove the require condition from exports["."] or provide a proper .cjs build.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/package.json` around lines 12 - 18, The
package.json exports currently include a "require" condition pointing at
"./dist/index.js" while the package uses "type": "module", causing CommonJS
consumers to fail; either remove the "require" field from the exports map
(exports["."].require) or produce and point it to a dedicated CommonJS build
(e.g., compile a .cjs bundle such as "./dist/index.cjs" and update
exports["."].require to that file), and ensure the "types" and ESM
"import"/"default" still point to the ESM outputs (e.g., "./dist/index.d.ts" and
"./dist/index.js") so Node CJS consumers can require the .cjs file without
breaking ESM consumers.

Comment on lines +30 to +38
"peerDependencies": {
"@playwright/test": ">=1.40.0"
},
"devDependencies": {
"@playwright/test": "^1.52.0",
"@types/node": "^22.0.0",
"typescript": "^5.6.3",
"vitest": "^3.0.0"
},

🛠️ Refactor suggestion | 🟠 Major

This manifest still doesn't wire in the shared collection core.

The new OpenSpec change says collectors/playwright-reporter should delegate collection/orchestration to apps/test-processing, but there is no dependency here on that package. As-is, the duplicated aggregate.ts / git-operations.ts / file-operations.ts / deploy.ts stack remains the runtime path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/package.json` around lines 30 - 38, The
package.json for collectors/playwright-reporter is missing a dependency on the
shared collection core in apps/test-processing, so the repo continues to use the
duplicated runtime files (aggregate.ts, git-operations.ts, file-operations.ts,
deploy.ts); update collectors/playwright-reporter's package.json to add a
dependency (or workspace reference) to the apps/test-processing package, change
imports in the reporter code to import the shared orchestration/collection
functions from that package instead of the local duplicate modules (search for
imports referencing aggregate, git-operations, file-operations, deploy) and
remove the duplicated modules from the runtime path so the reporter delegates
collection/orchestration to the shared core.

Comment on lines +15 to +19
```bash
npm install test-eyes-reporter
# or
pnpm add test-eyes-reporter
```

⚠️ Potential issue | 🟠 Major

Package name inconsistency with published package.

The installation instructions reference test-eyes-reporter, but the status document indicates the published package name is @practica/test-eyes@0.1.4. This will cause installation failures for users.

📝 Proposed fix

-npm install test-eyes-reporter
+npm install @practica/test-eyes
 # or
-pnpm add test-eyes-reporter
+pnpm add @practica/test-eyes

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
npm install @practica/test-eyes
# or
pnpm add @practica/test-eyes

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/README.md` around lines 15 - 19, Update the
installation instructions in the README: replace the incorrect package name
string "test-eyes-reporter" with the published package name
"@practica/test-eyes" in both npm and pnpm examples so users can successfully
install the released package; ensure both occurrences in the markdown block use
"@practica/test-eyes".

Comment on lines +30 to +37
reporter: [
['list'],
['test-eyes-reporter', {
dataBranch: 'gh-data', // Branch to store test data
deployBranch: 'gh-pages', // Branch to deploy dashboard
deploy: true // Auto-deploy after test run
}]
]

⚠️ Potential issue | 🟠 Major

Reporter name should match package name.

The reporter configuration uses 'test-eyes-reporter' which should align with the actual package name.

📝 Proposed fix
   reporter: [
     ['list'],
-    ['test-eyes-reporter', {
+    ['@practica/test-eyes', {
       dataBranch: 'gh-data',    // Branch to store test data
       deployBranch: 'gh-pages', // Branch to deploy dashboard
       deploy: true              // Auto-deploy after test run
     }]
   ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/README.md` around lines 30 - 37, The reporter
string in the Playwright config's reporter array ('test-eyes-reporter') must
match the actual npm package name; open the reporter configuration where the
reporter array is defined (the reporter: [...] block) and replace
'test-eyes-reporter' with the package name exactly as specified in the reporter
package.json (or the installed module name), keeping the existing options object
(dataBranch, deployBranch, deploy) intact so Playwright can resolve the module.

Comment on lines +78 to +84
onTestEnd(test: TestCase, result: PlaywrightTestResult): void {
this.tests.push({
name: test.titlePath().filter(Boolean).join(' > '),
durationMs: result.duration,
status: this.mapStatus(result.status),
wasFlaky: result.status === 'passed' && result.retry > 0
})

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In the Playwright Reporter API, does onTestEnd(test, result) run once per retry attempt, and is test.outcome() the API for the final logical test outcome (including 'flaky')?

💡 Result:

Yes.

  • reporter.onTestEnd(test, result) is invoked after each individual test run in a worker (i.e., the initial run and every retry), because the result argument is explicitly “Result of the test run”, and each run has its own TestResult with a testResult.retry counter you can use to distinguish attempts. [1], [2]
  • test.outcome() is the API for the final logical outcome of the TestCase, computed from all runs (it can be "flaky" if it passed on a retry). [3]

Sources:
[1] Playwright Reporter API (onTestEnd) https://playwright.dev/docs/next/api/class-reporter
[2] Playwright TestResult.retry https://playwright.dev/docs/api/class-testresult
[3] Playwright TestCase.outcome() / TestCase.results https://playwright.dev/docs/next/api/class-testcase


Record final test outcome, not per-retry attempts.

onTestEnd() runs after each test attempt (initial run and every retry). This code appends an entry for every attempt, so a flaky test—initially failed, then passed on retry—creates two entries in this.tests (failed + passed). When aggregate() counts every entry, flaky tests are double-counted, inflating failure and pass totals.

Use test.outcome() to record only the final logical outcome per test, which correctly reflects flaky status and prevents duplicate counting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/reporter.ts` around lines 78 - 84,
onTestEnd is currently appending an entry for every attempt (using
result.status), causing duplicate entries for retried tests; change it to record
only the final logical outcome by using test.outcome() instead of result.status
and derive wasFlaky from test.outcome() and test.retries/run history (so you
push one entry per TestCase into this.tests). Update the onTestEnd
implementation to use test.outcome() for status, compute wasFlaky from the
test's final/flaky indicators, and ensure aggregate() continues to operate on
this.tests which will now contain only final outcomes.

Comment on lines +112 to +117
const success = await this.saveToDataBranch(context)
await this.returnToOriginalBranch(context.originalBranch)

if (success && this.options.deploy) {
await this.deploy()
}

⚠️ Potential issue | 🔴 Critical

Make the upload path fail-fast and stop force-pushing the data branch.

push() is implemented as a force-push in collectors/playwright-reporter/src/git-operations.ts, so concurrent CI jobs can overwrite each other's newly written run files. This path also ignores commit()/push() failures, so submitData() can still treat the upload as successful and proceed to deployment with stale remote data.

Also applies to: 177-188
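A control-flow sketch of the fail-fast shape being asked for. `GitOps` and `uploadRunData` are hypothetical names, not the package's actual API; the point is that a plain (non-force) `git push` rejects non-fast-forward updates, and any commit/push error aborts deployment:

```typescript
// Hypothetical injected git operations; both reject on failure instead of
// swallowing errors.
interface GitOps {
  commit(message: string): Promise<void>;
  push(branch: string): Promise<void>; // plain `git push`, never --force
}

async function uploadRunData(
  git: GitOps,
  branch: string,
  deploy: () => Promise<void>,
  wantDeploy: boolean,
): Promise<boolean> {
  try {
    await git.commit('test-eyes: add run data');
    // A concurrent job's earlier push makes this reject instead of being
    // silently overwritten by --force.
    await git.push(branch);
  } catch {
    return false; // fail fast: never deploy stale remote data
  }
  if (wantDeploy) await deploy();
  return true;
}
```

On a rejected push, the caller can fetch, rebase the data branch, and retry rather than forcing.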

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/playwright-reporter/src/reporter.ts` around lines 112 - 117, The
upload path is currently ignoring commit()/push() failures and uses a
force-push, so change saveToDataBranch (and any callers like submitData) to
fail-fast on any commit or push error: have saveToDataBranch return/throw on
push failure instead of returning success, and ensure reporter.ts (the block
around saveToDataBranch, returnToOriginalBranch, deploy) checks that result and
aborts deployment when upload fails. In addition, remove or change the
force-push behavior in git-operations.ts push() (stop using --force) and handle
push conflicts (return error to caller) so concurrent CI jobs cannot silently
overwrite each other. Ensure both the first occurrence (around lines 112–117)
and the other occurrence (around lines 177–188) follow the same fail-fast
behavior and propagate push errors up to prevent deploying stale data.

Comment on lines +33 to +41
**APIs**:
- New exports from test-processing: `collectFromRunData`, `pushToGitHub`
- TestResult type: add `retries?: number`

**Dependencies**:
- playwright-reporter depends on test-processing (workspace dependency, bundled on publish)

**Breaking**:
- Internal refactor only, npm package API unchanged

⚠️ Potential issue | 🟠 Major

Please call out the CLI move as a breaking change or ship a compatibility shim.

Moving collection into collectors/junit/ and deleting collectors/playwright-reporter/src/cli.ts changes the published test-eyes / test-eyes-reporter binaries this package currently exposes. That makes “npm package API unchanged” too strong as written.
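One possible shim is simply keeping the legacy bin names pointed at the relocated CLI. The paths and dist layout below are assumptions for illustration, not the repo's actual structure:

```json
{
  "bin": {
    "test-eyes": "./dist/junit/cli.js",
    "test-eyes-reporter": "./dist/junit/cli.js"
  }
}
```

With this, consumers invoking the published binaries keep working while the source moves under `collectors/junit/`.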

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/architecture-alignment/proposal.md` around lines 33 - 41,
The CLI relocation is a breaking change because the published package no longer
exposes the previous binaries (test-eyes / test-eyes-reporter) after removing
collectors/playwright-reporter/src/cli.ts and moving collection into
collectors/junit/; either mark this in the proposal as a breaking change or add
a compatibility shim that re-exports or re-installs the old CLI entrypoints
(e.g., restore a small module at collectors/playwright-reporter/src/cli.ts that
imports and delegates to the new collectors/junit/ entry, and update
package.json "bin" to point to those legacy paths) so existing consumers
invoking the published binaries continue to work.

Comment on lines +3 to +12
### Requirement: Format-agnostic collection core
The system SHALL provide a `collectFromRunData()` function in `apps/test-processing/` that accepts RunData from any collector and handles git operations, saving, and aggregation.

#### Scenario: Playwright collector calls shared core
- **WHEN** Playwright reporter calls `collectFromRunData({ runData, dataBranch: 'gh-data' })`
- **THEN** system saves run data to data branch, aggregates statistics, commits and pushes

#### Scenario: JUnit collector calls shared core
- **WHEN** JUnit collector calls `collectFromRunData({ runData, dataBranch: 'gh-data' })`
- **THEN** system saves run data to data branch, aggregates statistics, commits and pushes
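The scenarios above imply a signature like the sketch below. The `RunData` fields and the `CollectSteps` decomposition are assumptions used to make the orchestration explicit, not the actual test-processing API:

```typescript
// Hypothetical run payload any collector can produce.
interface RunData {
  runId: string;
  tests: Array<{ title: string; status: 'passed' | 'failed' | 'skipped'; durationMs: number }>;
}

// Hypothetical step interface; the real core would wire these to git and
// the filesystem.
interface CollectSteps {
  saveRun(runData: RunData, branch: string): Promise<void>;
  aggregate(branch: string): Promise<void>;
  commitAndPush(branch: string): Promise<void>;
}

// The format-agnostic core: Playwright and JUnit collectors both feed
// RunData through the same save -> aggregate -> commit/push pipeline.
async function collectFromRunData(
  opts: { runData: RunData; dataBranch: string },
  steps: CollectSteps,
): Promise<void> {
  await steps.saveRun(opts.runData, opts.dataBranch);
  await steps.aggregate(opts.dataBranch);
  await steps.commitAndPush(opts.dataBranch);
}
```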

⚠️ Potential issue | 🟠 Major

Encode the per-logical-test/flaky-count contract in the spec.

These scenarios only describe save/aggregate/commit/push orchestration. They never require retries to collapse into one logical TestResult, so the current per-attempt overcounting bug would still satisfy this spec.
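The contract being asked for can be sketched as a pure collapse step. The stable key (suite path plus title) and the field names are assumptions for illustration:

```typescript
interface Attempt {
  suite: string;
  title: string;
  status: 'passed' | 'failed' | 'skipped';
}

interface TestResult {
  key: string;
  status: Attempt['status'];
  flaky: boolean;
}

// Collapse per-attempt records into one TestResult per logical test.
// A test is flaky when it failed at least once but passed in the end.
function collapseAttempts(attempts: Attempt[]): TestResult[] {
  const byKey = new Map<string, Attempt[]>();
  for (const a of attempts) {
    const key = `${a.suite} > ${a.title}`; // assumed stable logical-test key
    let runs = byKey.get(key);
    if (!runs) byKey.set(key, (runs = []));
    runs.push(a);
  }
  const results: TestResult[] = [];
  for (const [key, runs] of byKey) {
    const final = runs[runs.length - 1]; // attempts assumed in retry order
    const flaky = final.status === 'passed' && runs.some((r) => r.status === 'failed');
    results.push({ key, status: final.status, flaky });
  }
  return results;
}
```

With this shape in the spec, the per-attempt overcounting bug would fail the scenario: two attempts of one test must yield one `TestResult`, and `flakyCount` becomes the number of results with `flaky === true`.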

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@openspec/changes/architecture-alignment/specs/multi-format-collectors/spec.md`
around lines 3 - 12, Update the spec for collectFromRunData to require that
runData aggregation collapses multiple retry attempts of the same logical test
into a single TestResult (e.g., identify tests by a stable key such as
testId/name + suite/path) and compute flaky counts per logical test rather than
per-attempt; change both Playwright and JUnit scenarios to assert that
collectFromRunData deduplicates attempts, produces one TestResult per logical
test, and increments a flakyCount when a test has mixed pass/fail attempts
across retries before committing/pushing.

Comment on lines +29 to +38
### Requirement: Import shared logic from test-processing
The reporter SHALL import `collectFromRunData` from test-processing instead of implementing its own collection logic.

#### Scenario: Uses collectFromRunData
- **WHEN** reporter's onEnd completes
- **THEN** it calls collectFromRunData from test-processing package

#### Scenario: No local collection implementation
- **WHEN** inspecting playwright-reporter source
- **THEN** there is no local implementation of git operations, file saving, or aggregation
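The wiring implied by these scenarios can be sketched as below. `collect` stands in for the `collectFromRunData` export from test-processing, injected here only so the sketch stays self-contained; the class and field names are assumptions:

```typescript
interface RunData {
  runId: string;
  tests: unknown[];
}

type Collect = (opts: { runData: RunData; dataBranch: string }) => Promise<void>;

class TestEyesReporter {
  private runData: RunData = { runId: 'run-1', tests: [] };

  constructor(private readonly collect: Collect) {}

  // All collection happens through the shared core: the reporter keeps no
  // local git, file-saving, or aggregation code of its own.
  async onEnd(): Promise<void> {
    await this.collect({ runData: this.runData, dataBranch: 'gh-data' });
  }
}
```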

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that playwright-reporter delegates to test-processing and doesn't have local implementations

echo "=== Checking for collectFromRunData import ==="
rg -n 'collectFromRunData' collectors/playwright-reporter/src/

echo ""
echo "=== Checking for local git operations in reporter ==="
rg -n 'git\s+(commit|push|checkout)' collectors/playwright-reporter/src/ || echo "No local git operations found"

echo ""
echo "=== Listing files in playwright-reporter/src ==="
fd . collectors/playwright-reporter/src/ --type f

Repository: practicajs/test-eyes



Reporter implementation violates shared logic architecture requirements.

The playwright-reporter contains local implementations of git operations, file saving, and aggregation that should be delegated to test-processing instead. Specifically:

- `git-operations.ts` implements local git operations (`git checkout`, commit, push)
- `file-operations.ts` contains local file handling logic
- `aggregate.ts` implements local data aggregation
- `collectFromRunData` from test-processing is not imported anywhere in the reporter source

Per the architecture spec, the reporter should import and use collectFromRunData from the test-processing package rather than duplicating this logic locally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/architecture-alignment/specs/playwright-reporter/spec.md`
around lines 29 - 38, The reporter currently duplicates collection, git and file
logic; remove the local implementations in git-operations.ts, file-operations.ts
and aggregate.ts and instead import collectFromRunData from the test-processing
package; update the reporter's onEnd handler (the function named onEnd in the
reporter) to call collectFromRunData(runData) and use its returned payload for
any subsequent save/commit/push operations, replacing calls to local functions
previously defined in git-operations.ts, file-operations.ts and aggregate.ts;
ensure any tests or callers now reference the imported collectFromRunData and
delete the redundant local modules.

@danylosoft-create danylosoft-create merged commit d1fa915 into main Mar 31, 2026
2 checks passed