Conversation

@harbournick
Collaborator

@harbournick harbournick commented Sep 3, 2025

  • Files encoded in UTF-16 previously broke DocxZipper. This PR fixes that.
  • OOXML allows both UTF-16LE and UTF-16BE, so both are now handled.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes an encoding issue in DocxZipper where XML files encoded in UTF-16 would break the parser. The fix introduces comprehensive encoding detection and handling utilities.

  • Added encoding detection utilities for UTF-8, UTF-16LE, and UTF-16BE with BOM support
  • Replaced string-based ZIP entry extraction with byte-level extraction and proper decoding for XML files
  • Added comprehensive test coverage for various encoding scenarios
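The detection and decoding steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `sniffEncoding` and `decodeXmlBytes` are hypothetical, and the fallback heuristic for BOM-less input (checking for a NUL byte next to the leading `<`) is an assumption about how such detection is commonly done.

```javascript
// Hypothetical helper names; the real encoding-helpers.js may differ.

/** Guess the encoding of an XML part from its first bytes. */
function sniffEncoding(bytes) {
  const [b0, b1, b2] = bytes;
  // Byte order marks are the most reliable signal.
  if (b0 === 0xef && b1 === 0xbb && b2 === 0xbf) return 'utf-8';
  if (b0 === 0xff && b1 === 0xfe) return 'utf-16le';
  if (b0 === 0xfe && b1 === 0xff) return 'utf-16be';
  // No BOM (assumed heuristic): an XML part starts with '<' (0x3C),
  // so a NUL byte next to it hints at UTF-16.
  if (b0 === 0x3c && b1 === 0x00) return 'utf-16le';
  if (b0 === 0x00 && b1 === 0x3c) return 'utf-16be';
  return 'utf-8';
}

/** Decode the raw bytes of a ZIP entry into a JavaScript string. */
function decodeXmlBytes(bytes) {
  const encoding = sniffEncoding(bytes);
  if (encoding === 'utf-16be') {
    // Swap each byte pair so the data can be decoded as UTF-16LE,
    // which avoids relying on a runtime's 'utf-16be' decoder.
    const swapped = new Uint8Array(bytes.length - (bytes.length % 2));
    for (let i = 0; i < swapped.length; i += 2) {
      swapped[i] = bytes[i + 1];
      swapped[i + 1] = bytes[i];
    }
    return new TextDecoder('utf-16le').decode(swapped);
  }
  // TextDecoder drops a matching leading BOM by default.
  return new TextDecoder(encoding).decode(bytes);
}
```

With a JSZip-style archive (an assumption about the underlying ZIP library), the bytes would come from something like `await zip.file(name).async('uint8array')` rather than `async('string')`, since the string form assumes UTF-8.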

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/super-editor/src/core/encoding-helpers.js | New utility module with encoding detection, BOM handling, and XML string normalization functions |
| packages/super-editor/src/core/encoding-helpers.test.js | Comprehensive test suite covering all encoding scenarios and utility functions |
| packages/super-editor/src/core/DocxZipper.js | Updated to use encoding helpers for proper XML file extraction from ZIP archives |
| packages/super-editor/src/core/DocxZipper.test.js | Added integration test for UTF-16LE XML handling in DOCX files |
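
As an illustration of what "XML string normalization" might involve, here is a small sketch under stated assumptions. Once a UTF-16 part has been decoded into a JavaScript string, its XML declaration may still say `encoding="UTF-16"`, and a stray U+FEFF may remain at the start. The helper name `normalizeXmlString` and the choice to rewrite the declaration to UTF-8 are hypothetical; the PR's actual helper may behave differently.

```javascript
// Hypothetical helper; not taken from the PR's encoding-helpers.js.
function normalizeXmlString(xml) {
  // Drop a leading U+FEFF if one survived decoding.
  const withoutBom = xml.charCodeAt(0) === 0xfeff ? xml.slice(1) : xml;
  // Rewrite only the encoding pseudo-attribute of a leading XML declaration.
  // Strings without a declaration are returned unchanged.
  return withoutBom.replace(
    /^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2/i,
    '$1$2UTF-8$2',
  );
}
```

Rewriting the declaration keeps it consistent if the string is later re-serialized as UTF-8, which is the assumed motivation here.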

@harbournick harbournick merged commit 3a1be24 into main Sep 3, 2025
7 checks passed
@harbournick harbournick deleted the fix/utf-16-imports branch September 3, 2025 04:44
harbournick pushed a commit that referenced this pull request Sep 3, 2025
# [0.16.0-next.7](v0.16.0-next.6...v0.16.0-next.7) (2025-09-03)

### Bug Fixes

* imports encoded in utf-16 break DocxZipper ([#860](#860)) ([3a1be24](3a1be24))
@harbournick
Collaborator Author

🎉 This PR is included in version 0.16.0-next.7 🎉

Your semantic-release bot 📦🚀

harbournick pushed a commit that referenced this pull request Sep 3, 2025
## [0.16.1](v0.16.0...v0.16.1) (2025-09-03)

### Bug Fixes

* add safety check for clipboard usage ([#859](#859)) ([bfca96e](bfca96e))
* correct syntax in release workflow for semantic-release command ([3e6376e](3e6376e))
* dispatch tracked changes transaction only once at import ([31ecec7](31ecec7))
* imports encoded in utf-16 break DocxZipper ([6d09115](6d09115))
* imports encoded in utf-16 break DocxZipper ([9bc488d](9bc488d))
* imports encoded in utf-16 break DocxZipper ([#860](#860)) ([3a1be24](3a1be24))
* semantic release range ([505e27b](505e27b))
* update release naming pattern in .releaserc.json for better version matching ([1fda655](1fda655))
@harbournick
Collaborator Author

🎉 This PR is included in version 0.16.1 🎉

Your semantic-release bot 📦🚀

harbournick pushed a commit that referenced this pull request Sep 9, 2025
# [0.16.0](v0.15.18...v0.16.0) (2025-09-09)

### Bug Fixes

* add processing for line-height defined in px ([#880](#880)) ([3b61275](3b61275))
* add safety check for clipboard usage ([#859](#859)) ([bfca96e](bfca96e))
* additional fixes to list indent/outdent, split list, toggle list, types and more tests ([02e6cd9](02e6cd9))
* backspaceNextToList, toggleList and tests ([8b33258](8b33258))
* closing dropdown after clicking again ([#835](#835)) ([88ff88d](88ff88d))
* correct syntax in release workflow for semantic-release command ([3e6376e](3e6376e))
* createNewList in input rule to fix new list in tables, lint ([aa79655](aa79655))
* definition possibly missing name key, add jsdoc ([bb714f1](bb714f1))
* dispatch tracked changes transaction only once at import ([31ecec7](31ecec7))
* do not deploy next on oracle or yjs changes ([a02cf33](a02cf33))
* highlight selected value in font dropdowns ([#869](#869)) ([4a30f59](4a30f59))
* images are missing for the document in edit mode ([#831](#831)) ([a9af47e](a9af47e))
* imports encoded in utf-16 break DocxZipper ([#860](#860)) ([3a1be24](3a1be24))
* include package lock on tests folder ([#845](#845)) ([1409d02](1409d02))
* insertContentAt fails if new line characters (\n) inserted ([dd60d91](dd60d91))
* insertContentAt for html ([f6c53d3](f6c53d3))
* inserting html with heading tags does not render as expected (HAR-10430) ([#874](#874)) ([bba5074](bba5074))
* install http server ([#846](#846)) ([1a6e684](1a6e684))
* **internal:** remove pdfjs from build ([#843](#843)) ([021b2c1](021b2c1))
* japanese list numbering ([#882](#882)) ([d256a48](d256a48))
* regex improvements ([ee0333b](ee0333b))
* remove footer line length breaking deployments ([04766cd](04766cd))
* restore stored marks if they exist ([#863](#863)) ([0a2860e](0a2860e))
* restore stored marks if they exist ([#863](#863)) ([1961e5f](1961e5f))
* splitListItem if there are images or other atom nodes in list item, fix tests ([#878](#878)) ([535390f](535390f))
* **table:** add support for table row w:cantSplit ([#890](#890)) ([3467ad5](3467ad5))
* test ([8572b8a](8572b8a))
* test ([65126fd](65126fd))
* test ([42cb383](42cb383))
* test next release ([c3ac7d0](c3ac7d0))
* toggle list ([770998a](770998a))
* toggle list for multiple nodes and active selection ([69b3a1b](69b3a1b))
* toggle list inside tables ([091df80](091df80))
* update condition checks for screenshot updates in CI workflow ([e17fdf0](e17fdf0))

### Features

* add custom toolbar button example (HAR-10436) ([#868](#868)) ([c4fd4d5](c4fd4d5))
* add support for paragraph borders ([#862](#862)) ([2f98c07](2f98c07))
* begin v0.18 development ([ed5030f](ed5030f))
* enable dispatching example apps tests ([#844](#844)) ([8b2bc73](8b2bc73))
* filter out ooxml tags cli to highest priority namespaces ([23b1efa](23b1efa))
* ignore specific docx nodes during import ([#909](#909)) ([0a99a09](0a99a09))
harbournick pushed a commit that referenced this pull request Sep 9, 2025
# [0.16.0](v0.15.18...v0.16.0) (2025-09-09)

### Bug Fixes

* add processing for line-height defined in px ([#880](#880)) ([3b61275](3b61275))
* add safety check for clipboard usage ([#859](#859)) ([bfca96e](bfca96e))
* additional fixes to list indent/outdent, split list, toggle list, types and more tests ([02e6cd9](02e6cd9))
* backspaceNextToList, toggleList and tests ([8b33258](8b33258))
* closing dropdown after clicking again ([#835](#835)) ([88ff88d](88ff88d))
* correct syntax in release workflow for semantic-release command ([3e6376e](3e6376e))
* createNewList in input rule to fix new list in tables, lint ([aa79655](aa79655))
* definition possibly missing name key, add jsdoc ([bb714f1](bb714f1))
* dispatch tracked changes transaction only once at import ([31ecec7](31ecec7))
* do not deploy next on oracle or yjs changes ([a02cf33](a02cf33))
* highlight selected value in font dropdowns ([#869](#869)) ([4a30f59](4a30f59))
* images are missing for the document in edit mode ([#831](#831)) ([a9af47e](a9af47e))
* imports encoded in utf-16 break DocxZipper ([#860](#860)) ([3a1be24](3a1be24))
* include package lock on tests folder ([#845](#845)) ([1409d02](1409d02))
* insertContentAt fails if new line characters (\n) inserted ([dd60d91](dd60d91))
* insertContentAt for html ([f6c53d3](f6c53d3))
* inserting html with heading tags does not render as expected (HAR-10430) ([#874](#874)) ([bba5074](bba5074))
* install http server ([#846](#846)) ([1a6e684](1a6e684))
* **internal:** remove pdfjs from build ([#843](#843)) ([021b2c1](021b2c1))
* japanese list numbering ([#882](#882)) ([d256a48](d256a48))
* regex improvements ([ee0333b](ee0333b))
* remove footer line length breaking deployments ([04766cd](04766cd))
* restore stored marks if they exist ([#863](#863)) ([0a2860e](0a2860e))
* restore stored marks if they exist ([#863](#863)) ([1961e5f](1961e5f))
* splitListItem if there are images or other atom nodes in list item, fix tests ([#878](#878)) ([535390f](535390f))
* **table:** add support for table row w:cantSplit ([#890](#890)) ([3467ad5](3467ad5))
* test ([8572b8a](8572b8a))
* test ([65126fd](65126fd))
* test ([42cb383](42cb383))
* test next release ([c3ac7d0](c3ac7d0))
* toggle list ([770998a](770998a))
* toggle list for multiple nodes and active selection ([69b3a1b](69b3a1b))
* toggle list inside tables ([091df80](091df80))
* update condition checks for screenshot updates in CI workflow ([e17fdf0](e17fdf0))

### Features

* add custom toolbar button example (HAR-10436) ([#868](#868)) ([c4fd4d5](c4fd4d5))
* add support for paragraph borders ([#862](#862)) ([2f98c07](2f98c07))
* begin v0.18 development ([ed5030f](ed5030f))
* enable dispatching example apps tests ([#844](#844)) ([8b2bc73](8b2bc73))
* filter out ooxml tags cli to highest priority namespaces ([23b1efa](23b1efa))
* ignore specific docx nodes during import ([#909](#909)) ([0a99a09](0a99a09))
* new release cycle after version sync ([eb9684a](eb9684a))