Skip to content

Conversation

@ChALkeR
Copy link
Member

@ChALkeR ChALkeR commented Dec 18, 2025

Tracking: #61041

This builds on top of #61093 and gives an additional ~2x improvement by moving the same logic to native

The old native fast path removed in #61093 had a bunch of nested ifs and converted data from utf16 to utf8 and then back to utf16
This instead constructs strings using direct maps as #61093 and returns them as raw buffers

windows-1252, main pre-#61093:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 0.31 GiB/s 0.272 ms
Complex 1 79.771 KiB 0.06 GiB/s 1.292 ms

windows-1252, main with #61093:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 33.41 GiB/s 0.003 ms
Complex 1 79.771 KiB 1.48 GiB/s 0.056 ms

windows-1252, this PR:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 36.83 GiB/s 0.002 ms
Complex 1 79.771 KiB 3.11 GiB/s 0.027 ms

This also similarly improves all other 1-byte encodings compared to #61093


For comparison, Bun:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 36.89 GiB/s 0.003 ms
Complex 1 79.771 KiB 0.23 GiB/s 0.329 ms

v8/jsc is not an issue, unoptimal code is

cc @nodejs/performance

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Dec 18, 2025
@ChALkeR ChALkeR changed the title perf: move all 1-byte encodings to native src: move all 1-byte encodings to native Dec 18, 2025
@ChALkeR ChALkeR force-pushed the chalker/decoder/single-byte/1 branch 7 times, most recently from 6130c12 to 4be8283 Compare December 19, 2025 00:16
@ChALkeR
Copy link
Member Author

ChALkeR commented Jan 17, 2026

#61093 landed, rebased.

@ChALkeR ChALkeR force-pushed the chalker/decoder/single-byte/1 branch 3 times, most recently from 51adfc3 to 82a6ecc Compare January 17, 2026 11:47
Comment on lines 370 to 387
const uint16_t* const tSingleByteEncodings[28] = {
tIBM866, tKOI8_R, tKOI8_U, tMacintosh, tXMacCyrillic,
tISO_8859_2, tISO_8859_3, tISO_8859_4, tISO_8859_5, tISO_8859_6, tISO_8859_7,
tISO_8859_8, tISO_8859_8, // second time for for 8-i
tISO_8859_10, tISO_8859_13, tISO_8859_14, tISO_8859_15, tISO_8859_16,
tWindows874, tWindows1250, tWindows1251, tWindows1252, tWindows1253,
tWindows1254, tWindows1255, tWindows1256, tWindows1257, tWindows1258
};

// Matches the list of encoding sin single-byte.js
const bool fSingleByteEncodings[28] = {
fIBM866, fKOI8_R, fKOI8_U, fMacintosh, fXMacCyrillic,
fISO_8859_2, fISO_8859_3, fISO_8859_4, fISO_8859_5, fISO_8859_6, fISO_8859_7,
fISO_8859_8, fISO_8859_8, // second time for for 8-i
fISO_8859_10, fISO_8859_13, fISO_8859_14, fISO_8859_15, fISO_8859_16,
fWindows874, fWindows1250, fWindows1251, fWindows1252, fWindows1253,
fWindows1254, fWindows1255, fWindows1256, fWindows1257, fWindows1258
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the header is included in more than one translation unit, it may cause ODR violations. We can either mark them inline or move their definitions to a .cc file.

@mertcanaltin
Copy link
Member

mertcanaltin commented Jan 17, 2026

For the cpp & other lint errors, this might help: make lint or CLANG_FORMAT_START=$(git merge-base HEAD main) make format-cpp

@ChALkeR ChALkeR force-pushed the chalker/decoder/single-byte/1 branch from 82a6ecc to 35b7960 Compare January 17, 2026 11:54
@ChALkeR ChALkeR force-pushed the chalker/decoder/single-byte/1 branch 7 times, most recently from b5d51ae to ee8b680 Compare January 17, 2026 13:03
@ChALkeR
Copy link
Member Author

ChALkeR commented Jan 17, 2026

@mertcanaltin lint was just not ready yet (as was noted in the PR description), fixed.
Thanks for the reviews!

@codecov
Copy link

codecov bot commented Jan 17, 2026

Codecov Report

❌ Patch coverage is 87.36842% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.50%. Comparing base (955d347) to head (1d96511).
⚠️ Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
src/encoding_binding.cc 73.91% 3 Missing and 9 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61118      +/-   ##
==========================================
- Coverage   88.52%   88.50%   -0.03%     
==========================================
  Files         704      703       -1     
  Lines      208802   208785      -17     
  Branches    40318    40312       -6     
==========================================
- Hits       184842   184784      -58     
- Misses      15947    15973      +26     
- Partials     8013     8028      +15     
Files with missing lines Coverage Δ
lib/internal/encoding.js 99.55% <100.00%> (-0.45%) ⬇️
src/encoding_binding.h 100.00% <ø> (ø)
src/string_bytes.cc 69.93% <100.00%> (+0.19%) ⬆️
src/string_bytes.h 77.77% <ø> (ø)
src/encoding_binding.cc 55.96% <73.91%> (+3.22%) ⬆️

... and 33 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Co-authored-by: Mert Can Altin <mertgold60@gmail.com>
@ChALkeR ChALkeR force-pushed the chalker/decoder/single-byte/1 branch from 0847010 to 1d96511 Compare January 17, 2026 14:50
Copy link
Member

@gurgunday gurgunday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

lgtm other than one comment

return;
}

uint16_t* dst = node::UncheckedMalloc<uint16_t>(length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can allocate a huge uint16_t[length] before any V8 string-length limit check, and if it ends up being higher than what V8 accepts, we would allocate for no reason

Maybe it would be better to guard this with sth like:

Suggested change
uint16_t* dst = node::UncheckedMalloc<uint16_t>(length);
if (length > static_cast<size_t>(v8::String::kMaxLength)) {
// throw
}
uint16_t* dst = node::UncheckedMalloc<uint16_t>(length);

Let me know if I'm missing something

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants