lib: correct windows-1252 decoding in TextDecoder #60737

yashwantbezawada · 2025-11-15T23:20:21Z

Description

This PR corrects the Windows-1252 decoding in TextDecoder by disabling the Latin-1 fast path for Windows-1252 encoding.

Problem

The TextDecoder was incorrectly using the Latin-1 fast path (via simdutf's convert_latin1_to_utf8) for Windows-1252 encoding. This caused incorrect decoding of bytes in the 0x80-0x9F range.

The root cause is that Windows-1252 differs from ISO-8859-1 (Latin-1) in this byte range:

- ISO-8859-1: bytes 0x80-0x9F are unassigned or C1 control characters that map directly to the Unicode code point of the same value (e.g., 0x92 → U+0092)
- Windows-1252: most of these bytes map to specific printable characters (e.g., 0x92 → U+2019 RIGHT SINGLE QUOTATION MARK, ’)
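To make the difference concrete, here is a minimal sketch that decodes the same bytes both ways. The helper names `decodeAsLatin1` and `decodeAsWindows1252` are hypothetical and exist only for this illustration; they are not Node.js internals. The table is the 0x80-0x9F slice of the WHATWG windows-1252 index.

```javascript
// Code points for bytes 0x80-0x9F, in byte order, from the WHATWG
// windows-1252 index. Five entries (0x81, 0x8D, 0x8F, 0x90, 0x9D) keep
// the same value as the byte; the other 27 differ from Latin-1.
const WINDOWS_1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

// Hypothetical helper: Latin-1 treats every byte as its own code point.
function decodeAsLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

// Hypothetical helper: windows-1252 is identical to Latin-1 except that
// bytes 0x80-0x9F are looked up in the table above.
function decodeAsWindows1252(bytes) {
  return Array.from(bytes, (b) => {
    const codePoint = b >= 0x80 && b <= 0x9F ? WINDOWS_1252_HIGH[b - 0x80] : b;
    return String.fromCharCode(codePoint);
  }).join('');
}

// "It’s" encoded as windows-1252: the apostrophe is byte 0x92.
const sample = Uint8Array.of(0x49, 0x74, 0x92, 0x73);

console.log(decodeAsLatin1(sample).codePointAt(2).toString(16));      // 92
console.log(decodeAsWindows1252(sample).codePointAt(2).toString(16)); // 2019
```

Any decoder that applies the first mapping to windows-1252 input produces an invisible control character wherever the text contains a curly quote, dash, or euro sign.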

Solution

Disable the Latin-1 fast path for Windows-1252 by setting this[kLatin1FastPath] = false. This forces the decoder to use the ICU converter (getConverter()), which correctly handles Windows-1252 character mappings according to the WHATWG Encoding Standard.

Changes

- Modified lib/internal/encoding.js (line 423) to disable the Latin-1 fast path
- Added test coverage for all 32 bytes in the 0x80-0x9F range
- Tests include the specific case from issue #56542 ("TextDecoder incorrectly decodes 0x92 and several other characters for Windows-1252") and realistic text samples

Test Coverage

The new test file test/parallel/test-whatwg-encoding-custom-windows-1252.js verifies:

- Specific issue case: byte 0x92 correctly decodes to U+2019 (’)
- All 32 bytes in the 0x80-0x9F range decode according to the WHATWG spec
- Common Windows-1252 encoding aliases (windows-1252, cp1252, x-cp1252)
- Realistic text samples with mixed special characters
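The checks listed above can be sketched as a small standalone script. This is an illustration of the approach only, not the contents of the test file in this PR; `EXPECTED_HIGH` and `findMismatches` are hypothetical names. It relies only on the global `TextDecoder` and assumes a Node.js build with ICU support.

```javascript
// Expected code points for bytes 0x80-0x9F, in byte order, taken from
// the WHATWG windows-1252 index.
const EXPECTED_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

// Hypothetical helper: decode each byte in 0x80-0x9F on its own with the
// supplied function and collect every result that differs from the table.
function findMismatches(decode) {
  const mismatches = [];
  for (let i = 0; i < EXPECTED_HIGH.length; i++) {
    const byte = 0x80 + i;
    const expected = String.fromCharCode(EXPECTED_HIGH[i]);
    const actual = decode(Uint8Array.of(byte));
    if (actual !== expected) mismatches.push({ byte, expected, actual });
  }
  return mismatches;
}

// Run the check against each alias named in the list above. The result
// depends on the Node.js version: releases with the broken fast path
// report mismatches, fixed releases report none.
for (const label of ['windows-1252', 'cp1252', 'x-cp1252']) {
  const decoder = new TextDecoder(label);
  const mismatches = findMismatches((bytes) => decoder.decode(bytes));
  console.log(label, mismatches.length === 0 ?
    'ok' : `${mismatches.length} mismatched bytes`);
}
```

Returning the mismatches instead of asserting on the first failure makes it easy to see whether a decoder is wrong for the whole range or only for individual bytes.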

References

- Issue: #56542 (TextDecoder incorrectly decodes 0x92 and several other characters for Windows-1252)
- WHATWG Encoding Standard: https://encoding.spec.whatwg.org/#windows-1252
- Windows-1252 Wikipedia: https://en.wikipedia.org/wiki/Windows-1252#Codepage_layout

lib/internal/encoding.js

test/parallel/test-whatwg-encoding-custom-windows-1252.js

The TextDecoder was incorrectly using the Latin-1 fast path for windows-1252 encoding, which caused incorrect decoding of bytes in the 0x80-0x9F range.

The issue occurs because windows-1252 differs from ISO-8859-1 (Latin-1) in this byte range. The simdutf library's convert_latin1_to_utf8 function directly maps bytes to Unicode codepoints (e.g., 0x92 → U+0092), which is correct for ISO-8859-1 but incorrect for windows-1252, where 0x92 should map to U+2019 (RIGHT SINGLE QUOTATION MARK, ’).

This fix disables the Latin-1 fast path for windows-1252, forcing the decoder to use the ICU converter, which correctly handles the windows-1252 specific character mappings according to the WHATWG Encoding Standard.

The fix includes comprehensive tests for all 32 bytes in the affected range (0x80-0x9F) to prevent regression.

Fixes: nodejs#56542
Refs: https://encoding.spec.whatwg.org/#windows-1252

codecov · 2025-11-16T02:39:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.56%. Comparing base (2271d2d) to head (de6c369).
⚠️ Report is 79 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #60737      +/-   ##
==========================================
+ Coverage   88.53%   88.56%   +0.02%     
==========================================
  Files         703      703              
  Lines      208226   208254      +28     
  Branches    40145    40172      +27     
==========================================
+ Hits       184352   184437      +85     
+ Misses      15884    15822      -62     
- Partials     7990     7995       +5

Files with missing lines	Coverage Δ
lib/internal/encoding.js	`99.50% <100.00%> (-0.01%)`	⬇️

... and 60 files with indirect coverage changes


The Latin-1 fast path was enabled only for the windows-1252 encoding, and incorrectly so: windows-1252 differs from ISO-8859-1 (Latin-1) in the 0x80-0x9F range. Since windows-1252 cannot use the Latin-1 fast path (it requires different character mappings via ICU) and no other encoding uses it, the entire Latin-1 fast path mechanism has been removed.

This simplifies the code while fixing the windows-1252 decoding issue. Windows-1252 now correctly uses the ICU decoder for all characters.

Fixes: nodejs#56542

lib/internal/encoding.js

Remove the decodeLatin1 import from encoding_binding as it is no longer used after disabling the Latin-1 fast path for Windows-1252.

ChALkeR · 2025-11-29T07:24:16Z

decodeLatin1 has no usage outside of this and should be just removed, it was added under a wrong assumption
this PR leaves it present but unused

i noticed this PR just now, but I already filed #60889

Tests here are useful though!

yashwantbezawada · 2025-11-29T09:53:30Z

Makes sense - #60889 is a more complete fix since it removes the C++ code too.

Happy to open a separate PR to add the Windows-1252 test file after #60889 lands, or I can add it directly to #60889 if that's easier. Let me know what works best.

ChALkeR · 2026-01-17T10:57:52Z

This can be closed, #61118 landed, which removed the broken codepath.
It also added tests for all single-byte encodings.

nodejs-github-bot added the encoding and needs-ci labels Nov 15, 2025

yashwantbezawada force-pushed the fix-windows-1252-decoding branch from 9226e25 to d57a575 November 15, 2025 23:35

yashwantbezawada changed the title ~~fix: correct Windows-1252 decoding in TextDecoder~~ lib: correct windows-1252 decoding in TextDecoder Nov 15, 2025

yashwantbezawada force-pushed the fix-windows-1252-decoding branch from d57a575 to ba71243 November 15, 2025 23:42

Renegade334 reviewed Nov 16, 2025

lib/internal/encoding.js (outdated)

test/parallel/test-whatwg-encoding-custom-windows-1252.js

Renegade334 added the semver-major and web-standards labels Nov 16, 2025

yashwantbezawada force-pushed the fix-windows-1252-decoding branch from ba71243 to 24a2207 November 16, 2025 00:55

anonrig reviewed Nov 23, 2025

lib/internal/encoding.js

anonrig removed the semver-major label Nov 23, 2025

lib: remove unused decodeLatin1 import

de6c369

