Replace your QE E2E automation backlog with AI-generated Playwright tests, triggered on every commit.
AutoSpec AI is a GitHub Action that analyzes your code changes (via diff), understands what user-facing behavior changed, and generates production-quality Playwright E2E tests that match your existing test style.
Commit / PR → Diff Analysis → LLM Test Planning → Playwright Code Gen → PR with Tests
- Diff Extraction: Detects changed files from push events or pull requests (configurable).
- Smart Filtering: Ignores lockfiles, images, docs, and existing tests. Focuses on source code.
- Test Planning (Phase 1): The LLM analyzes the diff and produces a prioritized test plan in JSON.
- Code Generation (Phase 2): For each planned test, the LLM generates a Playwright spec file matching your existing test patterns.
- Delivery: Opens a PR with the generated tests (or commits directly).
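The Smart Filtering step above can be pictured as a simple predicate over changed file paths. The following is a minimal sketch only: the function names and the specific lockfile names, extensions, and test-file pattern are assumptions chosen for illustration, not AutoSpec's actual ignore list.

```typescript
// Hypothetical ignore rules; the real action may use a different list.
const IGNORED_BASENAMES = new Set(["package-lock.json", "yarn.lock", "pnpm-lock.yaml"]);
const IGNORED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".gif", ".svg", ".md"];
const TEST_FILE_PATTERN = /\.(spec|test)\.[jt]sx?$/;

// Returns true when a changed file looks like application source code.
function isSourceCandidate(filePath: string): boolean {
  const baseName = (filePath.split("/").pop() ?? filePath).toLowerCase();
  if (IGNORED_BASENAMES.has(baseName)) return false; // lockfiles
  if (IGNORED_EXTENSIONS.some((ext) => baseName.endsWith(ext))) return false; // images, docs
  if (TEST_FILE_PATTERN.test(baseName)) return false; // existing tests
  return true;
}

// Keeps only the files worth sending to the test-planning phase.
function filterChangedFiles(changedFiles: string[]): string[] {
  return changedFiles.filter(isSourceCandidate);
}
```

Filtering before the LLM call keeps the prompt focused on user-facing source changes and avoids spending tokens on files that cannot change behavior.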
# .github/workflows/autospec.yml
name: AutoSpec AI
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  contents: write
  pull-requests: write
jobs:
  generate-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Required for diff analysis
      - uses: autospec-ai/action@v1
        with:
          llm_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          base_url: 'http://localhost:3000'
          framework: 'react'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- uses: autospec-ai/action@v1
  with:
    llm_provider: anthropic
    llm_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    llm_model: claude-sonnet-4-20250514 # optional, this is the default

- uses: autospec-ai/action@v1
  with:
    llm_provider: openai
    llm_api_key: ${{ secrets.OPENAI_API_KEY }}
    llm_model: gpt-4o # optional, this is the default

- uses: autospec-ai/action@v1
  with:
    llm_provider: custom
    llm_api_key: ${{ secrets.TOGETHER_API_KEY }}
    llm_model: meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
    llm_base_url: https://api.together.xyz/v1

AutoSpec analyzes diffs, plans tests, and generates Playwright specs, all automatically. Tests are severity-tagged (@sev1 through @sev4) and match your existing test style.
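Because the generated tests carry @sev1 through @sev4 tags, a runner script can select a severity range with Playwright's `--grep` option, which takes a regular expression matched against test titles. The helper below is a hypothetical sketch (the name `severityGrep` is not part of AutoSpec) that builds such a pattern for "this severity and anything more critical".

```typescript
// Builds a --grep pattern covering @sev1 up to and including @sev<maxSeverity>.
function severityGrep(maxSeverity: number): string {
  if (!Number.isInteger(maxSeverity) || maxSeverity < 1 || maxSeverity > 4) {
    throw new RangeError("maxSeverity must be an integer from 1 to 4");
  }
  const tags: string[] = [];
  for (let sev = 1; sev <= maxSeverity; sev++) {
    tags.push(`@sev${sev}`);
  }
  return tags.join("|");
}

// Example: the pattern for sev1 + sev2, usable as
//   npx playwright test --grep "<pattern>"
const stagingPattern = severityGrep(2);
```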
Captures Playwright traces, screenshots, and video on test failures. Trace files are uploaded as GitHub Actions artifacts for one-click debugging.
trace_on_failure: 'true'
trace_mode: 'retain-on-failure' # on | off | retain-on-failure | on-first-retry

After a test run, view traces locally:

npx playwright show-trace test-results/<test-name>/trace.zip

Detects fetch, axios, useSWR, useQuery, and WebSocket patterns in your source code. Generates page.route() mocks in each test. When the number of route mocks in a single test exceeds the threshold, they are extracted into shared fixture files.

generate_api_mocks: 'true'
mock_error_states: 'true' # Also generate 4xx/5xx error test cases
fixture_extraction_threshold: '3' # Extract to fixtures when route count exceeds this

Adds toHaveScreenshot() assertions at visual checkpoints. On first run, Playwright generates baseline screenshots. Subsequent runs compare against them.

visual_regression: 'true'
visual_threshold: '0.2' # Pixel comparison threshold (0-1)
visual_max_diff_ratio: '0.05' # Max diff pixel ratio before failure
visual_full_page: 'false' # Viewport-only or full-page capture

Update baselines after intentional UI changes:

npx playwright test --update-snapshots

AutoSpec scans your project for existing page objects, utility functions, and test coverage before generating tests. It injects the actual source code of discovered POM classes and utilities into the LLM prompt, so the model can see exactly what methods, elements, and locators exist, with no regex extraction and no guessing.
What it discovers:
- Page Objects: Full source of classes matching POM patterns (methods, elements, locators, routes, everything)
- Utilities: Full source of exported helper functions and constants
- Test Coverage: Individual test names, flows, and routes already under test (so the LLM doesn't duplicate them)
Auto-detection works out of the box for common conventions (**/pages/**/*.ts, **/*.page.ts, **/*.po.ts, **/helpers/**/*.ts, etc.). If your project uses different naming, configure the patterns explicitly:
- uses: autospec-ai/action@v1
  with:
    llm_api_key: ${{ secrets.OPENAI_API_KEY }}
    test_directory: 'e2e/tests' # where to write generated tests
    pom_patterns: '**/*.po.ts,**/pageobjects/**/*.ts' # match your POM convention
    utility_patterns: '**/helpers/**/*.ts' # match your utility convention
    pom_output_directory: 'e2e/pages' # where to write NEW POM files
    project_context_budget: '12000' # increase if you have many POMs

pom_output_directory: When set, the LLM can create new page object classes as separate files instead of inlining them in test specs. If not set, the LLM is instructed to only use existing POMs or raw page.locator() calls.
Tip: Check the Action logs for a line like "Discovered: X page objects, Y utility files, Z tested files" to verify that the scanner is finding your project's artifacts. If the counts are 0, your file naming doesn't match the default patterns, so set pom_patterns and utility_patterns explicitly.
Adds toMatchAriaSnapshot() assertions to validate accessibility tree structure. Optionally generates a dedicated axe-core scan test case.
accessibility_assertions: 'true'
axe_scan: 'true'
axe_standard: 'wcag2aa' # wcag2a | wcag2aa | wcag21a | wcag21aa | best-practice

| Input | Default | Description |
|---|---|---|
| `llm_provider` | `anthropic` | `anthropic`, `openai`, or `custom` |
| `llm_api_key` | (required) | API key (use GitHub secrets) |
| `llm_model` | auto | Model name (provider-specific defaults) |
| `llm_base_url` | (none) | Custom endpoint for OpenAI-compatible APIs |
| `test_directory` | `e2e/generated` | Where to write generated test files |
| `test_pattern` | `e2e/**/*.spec.ts` | Glob to find existing tests for style matching |
| `base_url` | `http://localhost:3000` | App URL for Playwright config |
| `framework` | `generic` | `react`, `vue`, `svelte`, `angular`, `nextjs`, `generic` |
| `diff_mode` | `auto` | `auto`, `pr`, or `push` |
| `include_paths` | (none) | Comma-separated path prefixes to include |
| `exclude_paths` | `test/,tests/,...` | Comma-separated path prefixes to exclude |
| `auto_commit` | `false` | Commit tests directly to the branch |
| `auto_pr` | `true` | Create a separate PR with generated tests |
| `max_test_files` | `5` | Cap on tests generated per run |
| `dry_run` | `false` | Preview without writing files |
| `custom_instructions` | (none) | Additional context for the LLM |
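The include_paths and exclude_paths inputs are described as comma-separated path prefixes. As a rough sketch of how such prefixes could be applied to changed files, consider the following. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: excludes take precedence over includes, and an empty include list means "include everything".

```typescript
// Splits a comma-separated input value into trimmed, non-empty prefixes.
function parsePrefixes(csv: string): string[] {
  return csv
    .split(",")
    .map((prefix) => prefix.trim())
    .filter((prefix) => prefix.length > 0);
}

// Decides whether a changed file passes the include/exclude prefix filters.
function matchesPathFilter(filePath: string, includePaths: string, excludePaths: string): boolean {
  const includes = parsePrefixes(includePaths);
  const excludes = parsePrefixes(excludePaths);
  // Assumption: an exclude match always wins.
  if (excludes.some((prefix) => filePath.startsWith(prefix))) return false;
  // Assumption: no include prefixes means every remaining file is eligible.
  return includes.length === 0 || includes.some((prefix) => filePath.startsWith(prefix));
}
```

With include_paths set to 'src/components/,src/pages/' and exclude_paths set to 'src/components/__tests__/', a file under src/components/ passes unless it sits in the __tests__ folder.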
| Input | Default | Description |
|---|---|---|
| `trace_on_failure` | `true` | Enable Playwright trace collection for test failures |
| `trace_mode` | `retain-on-failure` | Trace mode: on, off, retain-on-failure, on-first-retry |
| Input | Default | Description |
|---|---|---|
| `generate_api_mocks` | `false` | Detect API dependencies and generate page.route() mocks |
| `mock_error_states` | `false` | Generate additional test cases for API error responses |
| `fixture_extraction_threshold` | `3` | Number of page.route() calls before extracting into a shared fixture |
| Input | Default | Description |
|---|---|---|
| `visual_regression` | `false` | Add toHaveScreenshot() assertions at visual checkpoints |
| `visual_threshold` | `0.2` | Pixel comparison threshold (0-1) |
| `visual_max_diff_ratio` | `0.05` | Maximum allowed diff pixel ratio (0-1) |
| `visual_full_page` | `false` | Capture full-page screenshots instead of viewport only |
| Input | Default | Description |
|---|---|---|
| `pom_patterns` | (auto-detected) | Comma-separated globs for page object files (e.g., `**/*.po.ts,**/pages/**/*.ts`) |
| `utility_patterns` | (auto-detected) | Comma-separated globs for helper/utility files (e.g., `**/helpers/**/*.ts`) |
| `pom_output_directory` | (none) | Directory for generated POM files. When set, the LLM creates new POMs as separate files instead of inlining them in test specs |
| `project_context_budget` | `8000` | Approximate token budget for project context injected into LLM prompts |
When left empty, the scanner uses built-in patterns:
- Page Objects: `**/*.page.ts,**/pages/**/*.ts,**/page-objects/**/*.ts,**/*.pom.ts,**/*.po.ts,**/pom/**/*.ts`
- Utilities: `**/helpers/**/*.ts,**/utils/**/*.ts,**/fixtures/**/*.ts,**/support/**/*.ts,**/*.helper.ts,**/*.util.ts`
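The project_context_budget input is described as an approximate token budget for the discovered sources injected into the prompt. One plausible way to respect such a budget is sketched below. Everything here is an assumption for illustration: the `SourceFile` shape, the function names, the rough four-characters-per-token estimate, and the policy of skipping files that do not fit.

```typescript
interface SourceFile {
  path: string;
  content: string;
}

// Rough heuristic (an assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily keeps files, in the given order, while they fit in the budget.
function selectWithinBudget(files: SourceFile[], budgetTokens: number): SourceFile[] {
  const selected: SourceFile[] = [];
  let usedTokens = 0;
  for (const file of files) {
    const cost = estimateTokens(file.content);
    if (usedTokens + cost > budgetTokens) continue; // skip files that would overflow
    selected.push(file);
    usedTokens += cost;
  }
  return selected;
}
```

Under this sketch, raising the budget (as in the '12000' example above) simply lets more page objects and utilities reach the model.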
| Input | Default | Description |
|---|---|---|
| `accessibility_assertions` | `false` | Add toMatchAriaSnapshot() assertions for changed components |
| `axe_scan` | `false` | Generate a dedicated axe-core accessibility scan test case |
| `axe_standard` | `wcag2aa` | axe-core standard: wcag2a, wcag2aa, wcag21a, wcag21aa, best-practice |
| Output | Description |
|---|---|
| `tests_generated` | Number of test files created |
| `test_files` | JSON array of generated test file paths |
| `fixture_files` | JSON array of generated fixture file paths (when API mock generation is enabled) |
| `pr_number` | PR number (if auto_pr is true) |
| `summary` | Human-readable summary |
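Since test_files and fixture_files are JSON arrays delivered as strings, a follow-up script needs to parse them before use. The helper below is a small defensive sketch; the name `parseFileListOutput` is hypothetical, and treating an empty or missing output as an empty list is an assumption.

```typescript
// Parses an action output such as test_files into a list of file paths.
function parseFileListOutput(raw: string | undefined): string[] {
  // Assumption: a missing or blank output means no files were generated.
  if (raw === undefined || raw.trim() === "") return [];
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every((item) => typeof item === "string")) {
    throw new TypeError("Expected a JSON array of file paths");
  }
  return parsed as string[];
}
```

In a workflow, the raw value would typically be passed in through an environment variable set from the step's outputs.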
- uses: autospec-ai/action@v1
  with:
    llm_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    base_url: 'http://localhost:3000'
    framework: 'react'
    # Trace
    trace_on_failure: 'true'
    trace_mode: 'retain-on-failure'
    # API Mocks
    generate_api_mocks: 'true'
    mock_error_states: 'true'
    fixture_extraction_threshold: '3'
    # Visual Regression
    visual_regression: 'true'
    visual_threshold: '0.2'
    visual_max_diff_ratio: '0.05'
    visual_full_page: 'false'
    # Accessibility
    accessibility_assertions: 'true'
    axe_scan: 'true'
    axe_standard: 'wcag2aa'
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- uses: autospec-ai/action@v1
  with:
    llm_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    include_paths: 'src/components/,src/pages/'
    exclude_paths: 'src/components/__tests__/'

- uses: autospec-ai/action@v1
  id: autospec
  with:
    llm_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    dry_run: 'true'
- name: Comment preview
  if: github.event_name == 'pull_request' && steps.autospec.outputs.tests_generated != '0'
  uses: actions/github-script@v7
  env:
    TESTS_GENERATED: ${{ steps.autospec.outputs.tests_generated }}
    SUMMARY: ${{ steps.autospec.outputs.summary }}
  with:
    script: |
      // Read outputs from env vars rather than interpolating them into the script
      const testsGenerated = process.env.TESTS_GENERATED;
      const summary = process.env.SUMMARY;
      // Await the call so a failed comment fails the step
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        body: `### AutoSpec Preview\nGenerated **${testsGenerated}** test(s).\n\n${summary}`
      });

- uses: autospec-ai/action@v1
  with:
    llm_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    auto_pr: 'false'
    auto_commit: 'false'
- name: Install Playwright
  run: npx playwright install --with-deps
- name: Run generated tests
  run: npx playwright test e2e/generated/

# Run only critical tests for hotfix branches
- name: Run critical tests
  if: startsWith(github.head_ref, 'hotfix/')
  run: npx playwright test --grep "@sev1"
# Run sev1 + sev2 for staging
- name: Run high-priority tests
  run: npx playwright test --grep "@sev1|@sev2"

- uses: autospec-ai/action@v1
  with:
    llm_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    custom_instructions: |
      - Our app uses Clerk for auth. Mock auth with: await clerk.signIn(page)
      - All API calls go through /api/v2/ prefix
      - Use data-cy attributes for selectors (our convention)
      - We use MSW for API mocking in tests

AutoSpec reads your existing test files (matched by test_pattern) and uses the best example as a style reference. The generated tests will match:
- Import patterns and test structure
- Naming conventions
- Selector strategies (data-testid, role, etc.)
- Setup/teardown patterns
- Assertion style
If no existing tests are found, it generates clean Playwright tests following official best practices.
src/
├── index.ts                    # Action entry point, config parsing
├── types.ts                    # Shared TypeScript types
├── providers/
│   ├── index.ts                # Provider factory
│   ├── anthropic.ts            # Anthropic Claude client
│   └── openai.ts               # OpenAI / compatible client
├── diff/
│   └── analyzer.ts             # Git diff extraction & filtering
├── discovery/
│   └── project-scanner.ts      # Scans for existing POMs, utilities, and test coverage
├── generator/
│   ├── prompts.ts              # Two-phase prompt construction with feature-conditional sections
│   └── test-generator.ts       # Orchestrates planning + generation + post-processing
└── utils/
    ├── git-ops.ts              # Commit & PR creation
    ├── test-post-processor.ts  # Trace injection, axe imports, screenshot normalization
    ├── fixture-extractor.ts    # Extracts page.route() mocks into shared fixture files
    └── trace-uploader.ts       # Uploads traces as GitHub Actions artifacts
Generated test code passes through a post-processing pipeline in this order:
- Strip markdown fences: Remove any `typescript` fence wrappers from LLM output
- Inject trace config: Add a test.use({ trace, screenshot, video }) block
- Ensure axe import: Add the @axe-core/playwright import if AxeBuilder is used
- Normalize screenshots: Add threshold/maxDiffRatio/fullPage options to toHaveScreenshot() calls
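To make the first post-processing step concrete, here is a minimal sketch of stripping a Markdown code fence from LLM output. It is an illustration under assumptions, not the code in test-post-processor.ts: the function name is hypothetical, and it only handles the case where the entire response is wrapped in a single fence.

```typescript
// Three backticks, built programmatically so this example stays easy to embed.
const FENCE = "`".repeat(3);

// Matches: opening fence with optional language tag, body, closing fence.
const FENCED_BLOCK = new RegExp(
  "^" + FENCE + "[\\w-]*[ \\t]*\\r?\\n([\\s\\S]*?)\\r?\\n?" + FENCE + "$"
);

// Returns the code inside a single wrapping fence, or the trimmed input if unfenced.
function stripMarkdownFences(llmOutput: string): string {
  const trimmed = llmOutput.trim();
  const match = trimmed.match(FENCED_BLOCK);
  return match ? match[1] : trimmed;
}
```

Running this step first means the later steps (trace injection, axe import, screenshot normalization) always operate on plain TypeScript source.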
When generate_api_mocks is enabled and a test contains more page.route() calls than fixture_extraction_threshold, the mocks are extracted into a fixtures/<name>.fixtures.ts file with a setupApiMocks(page) function. The test is rewritten to import and call it.
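The extraction rule above boils down to counting page.route() calls and comparing against the threshold. The sketch below shows that decision only; the function names are hypothetical, and counting with a regular expression is an assumption about the approach, which the real extractor may implement differently.

```typescript
// Counts page.route(...) call sites in generated test source.
function countRouteMocks(testSource: string): number {
  const matches = testSource.match(/\bpage\.route\s*\(/g);
  return matches ? matches.length : 0;
}

// Extraction applies only when the count is strictly greater than the threshold.
function shouldExtractFixtures(testSource: string, threshold: number): boolean {
  return countRouteMocks(testSource) > threshold;
}
```

With the default threshold of 3, a test with three route mocks keeps them inline, while a fourth mock triggers extraction into a fixture file.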
npm install
npm run build # Compile with ncc
npm run lint
npm test

MIT