
feat: page snapshot scripts for offline selector testing#10

Open
blubblubservice wants to merge 1 commit into main from feat/page-snapshots

Conversation


@blubblubservice blubblubservice commented Feb 18, 2026

What

  • scripts/save-snapshots.ts — Playwright script to fetch avto.net pages (may be blocked by Cloudflare)
  • scripts/test-selectors.ts — Load saved HTML snapshots and test all current selectors offline, reporting populated vs null fields
  • snapshots/ directory with README instructions
  • npm run snapshot and npm run test-selectors scripts

Why

Cloudflare blocks automated access to avto.net from our server. This tooling lets the team:

  1. Save HTML pages manually from a real browser
  2. Test/debug selectors offline without hitting the live site
  3. Iterate on the <!-- DATA --> / <!-- PRICE --> comment walkers
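
The comment walkers mentioned in step 3 read the text that follows `<!-- DATA -->` / `<!-- PRICE -->` marker comments in avto.net markup. As a rough offline illustration of the idea (the real selectors walk live DOM comment nodes via `document.createTreeWalker`; the helper name and regex here are hypothetical, string-level simplifications):

```typescript
// Hypothetical string-level sketch of the <!-- PRICE --> comment walker:
// find the marker comment and return the first price-like token after it.
// The real scripts do this over DOM comment nodes, not raw strings.
function priceAfterMarker(html: string): string | null {
  const afterMarker = html.match(/<!--\s*PRICE\s*-->\s*([\s\S]*?)</);
  if (!afterMarker) return null; // marker not present in this snapshot
  const token = afterMarker[1].trim().match(/[\d.,]+\s*€?/); // first number-ish token
  return token ? token[0].trim() : null;
}

console.log(priceAfterMarker('<td><!-- PRICE --> 12.990 €</td>')); // "12.990 €"
console.log(priceAfterMarker('<td>no marker here</td>'));          // null
```

A string-level check like this is handy in quick unit tests; the DOM-based walkers in the scripts are more robust against attribute noise and nested tags.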

Note

Cloudflare Turnstile challenge blocks all headless/headed Playwright from our server IP. HTML snapshots need to be saved manually from a real browser (Ctrl+S → HTML only). See snapshots/README.md.

Summary by CodeRabbit

Release Notes

  • Chores

    • Added new npm scripts for snapshot management and selector testing functionality.
  • Documentation

    • Added guide for offline snapshot usage and development workflows.

- scripts/save-snapshots.ts: Playwright script to fetch avto.net pages
- scripts/test-selectors.ts: Load saved HTML and test all selectors offline
- snapshots/README.md: Instructions for manual snapshot saving (Cloudflare blocks automation)
- npm run snapshot / npm run test-selectors

Note: Cloudflare blocks automated Playwright access. Save HTML manually from
a real browser (see snapshots/README.md), then use test-selectors to verify.

coderabbitai bot commented Feb 18, 2026

📝 Walkthrough

Introduces infrastructure for offline HTML snapshot capture and selector testing of avto.net pages. Adds two snapshot-saving scripts (save-snapshots.ts and save-snapshots-v2.ts) that use Playwright to capture search and detail page HTML, attempting to get past Cloudflare. Includes a test-selectors.ts script to validate DOM selectors against snapshots. Updates package.json with new npm scripts and adds documentation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Package Configuration**<br>`package.json` | Added "snapshot" and "test-selectors" npm scripts to execute corresponding TypeScript utilities; adjusted trailing comma formatting. |
| **Snapshot Capture Scripts**<br>`scripts/save-snapshots.ts`, `scripts/save-snapshots-v2.ts` | Two Playwright-based scripts that automate browser navigation to avto.net, capture full-page HTML snapshots of search and detail pages, handle Cloudflare detection, extract detail page links, and save outputs to the `snapshots/` directory. Both include retry logic, dynamic delays, and error handling. |
| **Selector Testing**<br>`scripts/test-selectors.ts` | New script that loads HTML snapshots from the `snapshots/` directory and analyzes DOM content to verify selector outputs. Extracts structured data (titles, prices, specs, links, pagination) from both detail and search pages, providing per-file validation status. |
| **Documentation**<br>`snapshots/README.md` | Usage guide for the HTML snapshot feature, including manual capture instructions, automated script details, offline testing methodology, and a categorized artifact listing. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Snapshots captured, Cloudflare we tame,
Selectors tested in offline domain,
Browser automation spins up with glee,
Detail pages parsed for all the world to see!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: page snapshot scripts for offline selector testing' clearly and concisely summarizes the main change: adding scripts for capturing HTML snapshots and testing selectors offline, which aligns directly with all the primary file additions (save-snapshots.ts, test-selectors.ts, and supporting documentation). |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
scripts/test-selectors.ts (1)

31-44: Empty catch {} blocks in a diagnostic tool hide selector errors.

In the snapshot scripts, swallowing errors is fine (best-effort). But in a testing tool whose purpose is to report what works and what doesn't, a silently caught exception will show up as ❌ null instead of revealing that the selector threw. Consider logging the error so you can distinguish "field not found" from "selector crashed."

Proposed fix (for both blocks)
```diff
-        } catch {}
+        } catch (e) { r.priceError = String(e); }
-        } catch {}
+        } catch (e) { r.specsError = String(e); }
```

Also applies to: 52-78

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/test-selectors.ts` around lines 31 - 44, The empty catch block in the
TreeWalker block (in scripts/test-selectors.ts) swallows selector errors causing
tests to report null instead of an error; update the catch to capture the
exception and record or log it (e.g., set r.priceComment or a new r.priceError
with the error.message/stack and/or call console.error) so failures that are
selector exceptions are distinguishable from "not found" results — apply the
same change to the other similar block that inspects nodes (lines referencing
document.createTreeWalker, r.priceComment, r.priceFromComment).
scripts/save-snapshots.ts (2)

73-81: Snapshots are saved without verifying Cloudflare was bypassed.

Unlike save-snapshots-v2.ts which checks page content and applies a -cf-blocked suffix, this script blindly saves whatever HTML is returned (line 80). If CF blocks the request, you get a challenge page named search-1.html that looks like a valid snapshot. When fed to test-selectors, every selector silently reports ❌ null with no indication the snapshot itself is bad.

Consider adding a basic content check (e.g., look for challenge-platform or the absence of expected markers) and either renaming or logging a warning.

Proposed fix
```diff
             const html = await page.content();
+            const cfBlocked = html.includes('challenge-platform') || (!html.includes('details.asp') && !html.includes('ResultsAd'));
-            const filename = `search-${i + 1}.html`;
+            const filename = cfBlocked ? `search-${i + 1}-cf-blocked.html` : `search-${i + 1}.html`;
+            if (cfBlocked) console.warn(`  ⚠️ Cloudflare may have blocked this page`);
             writeFileSync(join(SNAPSHOT_DIR, filename), html, 'utf-8');
```
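
The detection suggested above boils down to a pure string predicate that can be unit-tested without a browser. A sketch using the marker strings from this comment (the function name is an assumption, not code from the PR):

```typescript
// Hypothetical predicate for the Cloudflare heuristic sketched in the review:
// a snapshot is suspect if it contains the challenge script, or lacks both
// markers a real avto.net results page would contain.
function looksCfBlocked(html: string): boolean {
  const hasChallenge = html.includes('challenge-platform');
  const hasListings = html.includes('details.asp') || html.includes('ResultsAd');
  return hasChallenge || !hasListings;
}
```

Keeping the predicate separate from the save loop also lets test-selectors reuse it to skip or flag bad snapshots.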
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/save-snapshots.ts` around lines 73 - 81, The script saves snapshots
without checking for Cloudflare challenge pages, so update the block around
page.goto / page.content in scripts/save-snapshots.ts to detect CF by inspecting
the HTML for known markers (e.g., 'challenge-platform', 'Cloudflare', or absence
of expected page markers used by test-selectors), and if detected either append
a '-cf-blocked' suffix to the filename (constructed as `search-${i + 1}.html`)
or emit a clear warning log before writing; use the existing SNAPSHOT_DIR,
filename variable and writeFileSync call, mirror the detection/handling logic
from save-snapshots-v2.ts so test-selectors can skip or flag these snapshots.

1-144: Significant overlap with save-snapshots-v2.ts.

Both scripts share browser config, anti-detection setup, delay helpers, URL patterns, and file-saving logic. If you plan to iterate on these further, consider extracting shared config/utilities into a common module to avoid the two scripts drifting apart. Fine to defer given these are dev tools.
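
If the two scripts do get consolidated, the shared module this comment describes could be as small as the following (the file name, export names, and argument list are illustrative assumptions, not the scripts' actual values):

```typescript
// scripts/snapshot-utils.ts (hypothetical): config shared by both snapshot scripts.
import { join } from 'path';

// Output directory both scripts write snapshots into.
export const SNAPSHOT_DIR = join(process.cwd(), 'snapshots');

// Chromium flags used for the anti-detection launch (illustrative subset).
export const BROWSER_ARGS = [
  '--disable-blink-features=AutomationControlled',
  '--no-sandbox',
  '--disable-dev-shm-usage',
  '--window-size=1920,1080',
];

// Promise-based sleep used between page loads in both scripts.
export function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Each script would then import these and keep only its own page list and file-naming logic, so the two can't drift apart on launch config.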

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/save-snapshots.ts` around lines 1 - 144, Extract the duplicated setup
into a shared dev-utils module: move the delay(ms) helper, SNAPSHOT_DIR and
SEARCH_URLS constants, the Chromium launch/config object and the anti-detection
init script (the context.addInitScript block) into a common export (e.g., create
functions getBrowserOptions(), getContextInitScript(), getSnapshotDir(),
getSearchUrls(), delay()). Update both save-snapshots.ts and
save-snapshots-v2.ts to import these symbols and call them from their main()
functions (keep per-script logic like which pages to fetch and file naming in
each file); ensure writeFileSync usage remains local but uses the shared
SNAPSHOT_DIR value.

Comment on lines +46 to +58:

```ts
async function main() {
    console.log('Launching browser v2 (longer waits, Turnstile interaction)...');

    const browser = await chromium.launch({
        headless: false,
        args: [
            '--disable-blink-features=AutomationControlled',
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-dev-shm-usage',
            '--window-size=1920,1080',
        ],
    });
```

⚠️ Potential issue | 🟡 Minor

Browser is not closed if an error occurs mid-execution.

If any unhandled error is thrown between chromium.launch() and browser.close(), the browser process leaks. Wrap the body of main() in a try/finally to ensure cleanup.

🛡️ Proposed fix
```diff
 async function main() {
     console.log('Launching browser v2 (longer waits, Turnstile interaction)...');

     const browser = await chromium.launch({
         headless: false,
         args: [ /* ... */ ],
     });

+    try {
     const context = await browser.newContext({ /* ... */ });
     // ... rest of main body ...
-    await browser.close();
-    console.log('\nDone!');
+    } finally {
+        await browser.close();
+        console.log('\nDone!');
+    }
 }
```

Also applies to: 171-172

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/save-snapshots-v2.ts` around lines 46 - 58, The browser launched by
chromium.launch() in main() can leak if an exception occurs before
browser.close(); declare let browser; before calling chromium.launch(), then
wrap the launch and the rest of main()'s body in a try/finally block and call
await browser?.close() in the finally to guarantee cleanup; apply the same
try/finally pattern around the other launch/close pair referenced at lines
171-172 so every browser instance is closed on error.

Comment on lines +128 to +145:

```ts
// If we got through, try a second search page via pagination
if (detailUrls.length > 0) {
    console.log('\nTrying to navigate to page 2 via pagination...');
    await delay(10000);
    try {
        const nextLink = await page.$('a:has-text("Naslednja"), a:has-text("»"), a:has-text("2")');
        if (nextLink) {
            await nextLink.click();
            await page.waitForLoadState('domcontentloaded', { timeout: 30000 });
            await waitForCloudflare(page, 60);
            const html = await page.content();
            writeFileSync(join(SNAPSHOT_DIR, 'search-2.html'), html, 'utf-8');
            console.log(`Saved search-2.html (${(html.length / 1024).toFixed(0)} KB)`);
        }
    } catch (e) {
        console.log('Could not get page 2:', e);
    }
}
```

⚠️ Potential issue | 🟡 Minor

Pagination snapshot doesn't apply the cf-blocked naming convention.

Search page 1 (line 110) conditionally names the file with -cf-blocked when CF isn't passed, but the paginated search-2 (line 139) always saves as search-2.html even though waitForCloudflare is called on line 137. This could produce a snapshot that looks valid but actually contains the CF challenge page.

Proposed fix
```diff
-                await waitForCloudflare(page, 60);
+                const passed = await waitForCloudflare(page, 60);
                 const html = await page.content();
-                writeFileSync(join(SNAPSHOT_DIR, 'search-2.html'), html, 'utf-8');
-                console.log(`Saved search-2.html (${(html.length / 1024).toFixed(0)} KB)`);
+                const filename = passed ? 'search-2.html' : 'search-2-cf-blocked.html';
+                writeFileSync(join(SNAPSHOT_DIR, filename), html, 'utf-8');
+                console.log(`Saved ${filename} (${(html.length / 1024).toFixed(0)} KB)`);
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```ts
// If we got through, try a second search page via pagination
if (detailUrls.length > 0) {
    console.log('\nTrying to navigate to page 2 via pagination...');
    await delay(10000);
    try {
        const nextLink = await page.$('a:has-text("Naslednja"), a:has-text("»"), a:has-text("2")');
        if (nextLink) {
            await nextLink.click();
            await page.waitForLoadState('domcontentloaded', { timeout: 30000 });
            const passed = await waitForCloudflare(page, 60);
            const html = await page.content();
            const filename = passed ? 'search-2.html' : 'search-2-cf-blocked.html';
            writeFileSync(join(SNAPSHOT_DIR, filename), html, 'utf-8');
            console.log(`Saved ${filename} (${(html.length / 1024).toFixed(0)} KB)`);
        }
    } catch (e) {
        console.log('Could not get page 2:', e);
    }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/save-snapshots-v2.ts` around lines 128 - 145, The paginated snapshot
code always writes "search-2.html" even when the page is Cloudflare-challenged;
update the block that calls waitForCloudflare(page, 60) to mirror the earlier
search-1 logic: capture the page HTML into html, determine whether the page is
CF-blocked (use the same check/flag used for the first search capture), and set
the filename to include "-cf-blocked" when blocked before calling
writeFileSync(join(SNAPSHOT_DIR, filename), html, 'utf-8') and logging. Ensure
you reference the same detection mechanism used with waitForCloudflare and reuse
symbols html, SNAPSHOT_DIR, writeFileSync and waitForCloudflare so naming is
consistent with the initial search snapshot.

Comment on lines +142 to +172:

```ts
async function main() {
    if (!existsSync(SNAPSHOT_DIR)) {
        console.error('No snapshots/ directory found. Save HTML snapshots first (see snapshots/README.md).');
        process.exit(1);
    }

    const files = readdirSync(SNAPSHOT_DIR).filter(f => f.endsWith('.html'));
    if (files.length === 0) {
        console.error('No .html files in snapshots/. Save avto.net pages manually (see snapshots/README.md).');
        process.exit(1);
    }

    const browser = await chromium.launch({ headless: true });
    const page = await browser.newPage();

    const detailFiles = files.filter(f => f.startsWith('detail'));
    const searchFiles = files.filter(f => f.startsWith('search'));

    if (searchFiles.length > 0) {
        console.log('\n📋 SEARCH RESULTS PAGES');
        for (const f of searchFiles) await testSearchPage(page, f);
    }

    if (detailFiles.length > 0) {
        console.log('\n📄 DETAIL PAGES');
        for (const f of detailFiles) await testDetailPage(page, f);
    }

    await browser.close();
    console.log('\nDone.');
}
```

⚠️ Potential issue | 🟡 Minor

Browser not cleaned up on mid-execution failure; inconsistent exit behavior.

Same resource-leak pattern as save-snapshots-v2.ts: if an error occurs between chromium.launch() and browser.close(), the process leaks. Additionally, line 174 uses main().catch(console.error) which logs but exits with code 0, unlike the snapshot scripts that call process.exit(1). For a dev tool invoked via npm run test-selectors, a silent zero-exit on failure can be misleading.

Proposed fix
```diff
 async function main() {
     // ... validation ...
     const browser = await chromium.launch({ headless: true });
+    try {
     const page = await browser.newPage();
     // ... test logic ...
-    await browser.close();
-    console.log('\nDone.');
+    } finally {
+        await browser.close();
+        console.log('\nDone.');
+    }
 }

-main().catch(console.error);
+main().catch((e) => {
+    console.error('Fatal:', e);
+    process.exit(1);
+});
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/test-selectors.ts` around lines 142 - 172, The main() function may
leak the Playwright browser if an exception occurs between chromium.launch() and
browser.close(); wrap the browser lifecycle in a try/finally (or use a local
variable and close in finally) so browser.close() always runs even on errors,
and change the top-level caller (currently main().catch(console.error)) to log
the error and exit non-zero (e.g., process.exit(1)) so failures don’t return
code 0; update references inside main (browser, chromium.launch(),
browser.close(), testSearchPage, testDetailPage) accordingly to ensure
deterministic cleanup and non-zero exit on failure.

Comment on lines +1 to +33:

````markdown
# HTML Snapshots

Save HTML snapshots of avto.net pages here for offline selector development.

## How to save snapshots

Cloudflare blocks automated access. Save pages manually from a real browser:

1. Open Chrome, navigate to avto.net
2. Find a search results page → Right-click → "Save as" → "Webpage, HTML Only"
3. Save as `search-1.html`, `search-2.html`, etc.
4. Open 3-5 individual listing detail pages → Save each as `detail-1.html`, `detail-2.html`, etc.
5. Commit and push

Or use the automated script (may be blocked by Cloudflare):
```bash
npm run snapshot
```

## Testing selectors offline

Once you have HTML files here, test selectors against them:
```bash
npm run test-selectors
```

This loads each snapshot and runs the current detail/search selectors, reporting which fields are populated vs null.

## Files

- `search-*.html` — Search results pages
- `detail-*.html` — Individual listing detail pages
- `.gitkeep` — Keeps this directory in git
````

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check snapshots directory structure
echo "=== Snapshots directory structure ==="
fd --type f . snapshots/ 2>/dev/null | head -20

echo -e "\n=== Check for .gitkeep ==="
find snapshots -name ".gitkeep" 2>/dev/null && echo "Found" || echo "Not found"

echo -e "\n=== Check for .gitignore in snapshots ==="
find snapshots -name ".gitignore" 2>/dev/null && echo "Found" || echo "Not found"

echo -e "\n=== Root .gitignore snapshot patterns ==="
cat .gitignore 2>/dev/null | grep -i snapshot || echo "No snapshot pattern found"

echo -e "\n=== Check for save-snapshots-v2.ts ==="
find . -name "*save-snapshots*" -o -name "*snapshot*v2*" 2>/dev/null | grep -v node_modules
```

Repository: blubblub/test-scraper
Length of output: 392

🏁 Script executed:

```bash
#!/bin/bash
# Check package.json for snapshot-related scripts
echo "=== Package.json snapshot scripts ==="
if [ -f package.json ]; then
  jq '.scripts | to_entries[] | select(.key | contains("snapshot")) | "\(.key): \(.value)"' package.json 2>/dev/null
else
  echo "package.json not found"
fi
```

Repository: blubblub/test-scraper
Length of output: 148

🏁 Script executed:

```bash
#!/bin/bash
# Verify content of snapshots/README.md to confirm what it says
echo "=== Content of snapshots/README.md ==="
cat -n snapshots/README.md 2>/dev/null | head -35
```

Repository: blubblub/test-scraper
Length of output: 1300

🏁 Script executed:

```bash
#!/bin/bash
# Check what's in snapshots/.gitignore
echo "=== snapshots/.gitignore content ==="
cat snapshots/.gitignore

echo -e "\n=== Check if v2 is referenced in package.json ==="
cat package.json | grep -i "v2\|save-snapshots-v2"
```

Repository: blubblub/test-scraper
Length of output: 222

🏁 Script executed:

```bash
#!/bin/bash
# Check save-snapshots-v2.ts to understand its purpose
echo "=== save-snapshots-v2.ts first 30 lines ==="
head -30 scripts/save-snapshots-v2.ts

echo -e "\n=== Check for any documentation about v2 ==="
grep -r "save-snapshots-v2" --include="*.md" --include="*.ts" --include="*.js" . 2>/dev/null | head -10
```

Repository: blubblub/test-scraper
Length of output: 1458


Create snapshots/.gitignore to prevent HTML snapshots from being committed, and document or configure the v2 script.

The README instructs users to "Commit and push" HTML snapshots (step 5), but there is no snapshots/.gitignore to prevent large HTML files from bloating the repository history. Add a .gitignore file that ignores *.html while preserving .gitkeep.
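
A minimal `snapshots/.gitignore` along those lines might look like this (note that `*.html` alone would already leave `.gitkeep` tracked, so the negation is purely self-documenting):

```gitignore
# Keep saved HTML snapshots out of git history
*.html
# Explicitly keep the placeholder file (redundant with *.html, but self-documenting)
!.gitkeep
```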

Additionally, save-snapshots-v2.ts exists in the codebase with enhanced Cloudflare handling, but it is not:

  • Added to npm scripts (only v1 is configured)
  • Documented in the README or elsewhere

Either add v2 to the npm scripts and document when to use it, or remove the file if v1 is the intended approach.
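
If v2 is kept, the npm wiring could look like the following sketch (the `tsx` runner and exact script keys are assumptions; substitute whatever runner the existing "snapshot" script already uses):

```json
{
  "scripts": {
    "snapshot": "tsx scripts/save-snapshots.ts",
    "snapshot:v2": "tsx scripts/save-snapshots-v2.ts",
    "test-selectors": "tsx scripts/test-selectors.ts"
  }
}
```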

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snapshots/README.md` around lines 1 - 33, Add a snapshots/.gitignore that
ignores *.html but keeps .gitkeep (e.g., add a negation for .gitkeep) so saved
HTML snapshots aren’t committed; then update README.md to remove the "Commit and
push" step and document the new behavior. Also either wire up
save-snapshots-v2.ts by adding an npm script (e.g., "snapshot:v2": "ts-node
scripts/save-snapshots-v2.ts" or similar) and a short README note describing
when to use v2 (Cloudflare handling) or delete save-snapshots-v2.ts if v1 is the
intended/only supported script. Ensure references use the exact names
snapshots/.gitignore, README.md, save-snapshots-v2.ts and the npm script key you
add (e.g., snapshot:v2).
