Skip to content

chore(rrweb): Replace fast-mhtml with inline MHTML parser#274

Open
chargome wants to merge 5 commits intosentry-v2from
chargome/chore/remove-fast-mhtml
Open

chore(rrweb): Replace fast-mhtml with inline MHTML parser#274
chargome wants to merge 5 commits intosentry-v2from
chargome/chore/remove-fast-mhtml

Conversation

@chargome
Copy link
Copy Markdown
Member

@chargome chargome commented Mar 26, 2026

Remove the fast-mhtml dependency which was only used in one test utility function (packages/rrweb/test/utils.ts) for parsing MHTML snapshots in replayer E2E tests.

Replace with a minimal inline parser (~30 lines) that handles multipart MIME boundary splitting and quoted-printable content decoding. All 47 replayer tests pass.

fast-mhtml pulled in cheerio, express, undici, qs, bluebird, and cookie — a massive transitive tree for a simple test helper. This removes ~584 lines from yarn.lock.

Dependabot alerts resolved

Fully resolved (vulnerable package completely removed from lockfile):

Alert Severity Package Summary
#166 MEDIUM qs arrayLimit bypass allows DoS via memory exhaustion
#183 LOW qs arrayLimit bypass in comma parsing allows DoS

Partially resolved (some entries removed, but package still exists via other dependency chains):

Alert Severity Package Remaining source
#225, #224, #223, #222, #221, #170, #130, #112 HIGH/MEDIUM/LOW undici Still pulled in by puppeteer (Phase 3)
#100 LOW cookie Still pulled in by @sveltejs/kit (Phase 5)

closes https://linear.app/getsentry/issue/SDK-1097/replace-fast-mhtml-9-alerts

chargome and others added 2 commits March 26, 2026 15:29
Remove the fast-mhtml dependency which was only used in one test utility
function for parsing MHTML snapshots. Replace with a minimal inline
parser (~30 lines) that handles multipart MIME boundary splitting and
quoted-printable decoding.

fast-mhtml pulled in cheerio, express, undici, qs, and bluebird — a
massive transitive tree for a simple test helper. This removes ~584
lines from yarn.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@linear-code
Copy link
Copy Markdown

linear-code bot commented Mar 27, 2026

@chargome chargome marked this pull request as ready for review March 27, 2026 09:30
chargome and others added 2 commits March 30, 2026 14:21
Remove dead `filename.includes('frame')` check that could never trigger
after replacing fast-mhtml with inline parser. Fix `decodeQuotedPrintable`
to correctly handle multi-byte UTF-8 sequences using TextDecoder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ignore compiled vitest configs and rrvideo temp directories that
accumulate during local development and branch switching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
if (index !== -1) {
return `file-${prefix}-${index}`;
} else {
return `file-${prefix}-${resources.push(filename) - 1}`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex null result stringified as "null" in filenames

Low Severity

When parseMhtml produces parts without a Content-Location header (frame documents), filename is ''. Then const prefix = /^\w+/.exec('') returns null, and the template literal file-${prefix}-${index} coerces null to the string "null", producing the label file-null-0. This also means all frame parts share the identical label file-null-0 (they were previously distinguishable as file-frame-4, file-frame-5, etc.), making snapshot output less informative for debugging.

Additional Locations (1)
Fix in Cursor Fix in Web

Node.js 24 is now the latest LTS, which causes yarn v1 to hang
during install. Pin to Node 20 (our tested version) until we can
validate compatibility with newer versions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

const headers = part.substring(0, headerEnd);
const body = part
.substring(headerEnd + (separatorMatch?.[1].length || 4))
.trim();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trim before QP decode corrupts trailing soft breaks

Low Severity

The .trim() is applied to the MHTML body before passing it to decodeQuotedPrintable. In quoted-printable encoding, =\r\n is a soft line break. If the last content line ends with a soft break (e.g., long line wrapping), .trim() strips the \r\n, leaving a dangling =. The QP decoder then treats this = as a literal character instead of a continuation marker, producing an extra = in the decoded output. Moving the .trim() to after decoding (or trimming only the decoded result) would avoid this.

Additional Locations (1)
Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant