feat(converter): render EMF+ images via embedded bitmaps (SD-2503) #3214

Open
gpardhivvarma wants to merge 3 commits into superdoc-dev:main from gpardhivvarma:feat/emf-plus-embedded-bitmap-sd-2503

@gpardhivvarma
Contributor

Summary

Renders EMF+ images that embed a compressed bitmap (PNG/JPEG/GIF) instead of falling back to the placeholder SVG. Most real-world EMF+ files generated by Office — cover slides, charts, illustrations — wrap a complete PNG or JPEG inside an EmfPlusObject(Image) record. Walking the EMF+ stream and pulling that bitmap out gives pixel-perfect rendering without implementing a GDI+ rasterizer.

Closes #3172.

What changed

packages/super-editor/src/editors/v1/core/super-converter/v3/handlers/wp/helpers/metafile-converter.js

  • New extractBitmapFromEmfPlus(buffer): walks EMR_COMMENT records carrying EMF+ payloads, scans inner EMF+ records for EmfPlusObject(Image) entries, reassembles continuation series using the TotalObjectSize field per MS-EMFPLUS § 2.3.5.1, and parses the resulting EmfPlusImage / EmfPlusBitmap to extract the encoded image bytes.
  • New parseEmfPlusImageObject(bytes): validates Image.Type=Bitmap and Bitmap.Type=Compressed, then returns the embedded PNG/JPEG/GIF as a data URI.
  • New detectCompressedImageFormat(bytes) helper using PNG/JPEG/GIF magic bytes.
  • Wired the extractor into convertEmfToSvg between the existing classic EMR_STRETCHDIBITS path and the EMF+ placeholder, so the placeholder remains the final fallback for pure-vector EMF+.
  • Pulled the literal 70 for EMR_COMMENT into a named constant shared with the existing isEmfPlus detector.
  • Updated the module-level docstring to reflect the layered strategy.
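
For reviewers who want a feel for the last two helpers in this list, here is a minimal sketch of the validation and magic-byte detection. It is not the code in this PR: the function names match the description, but the bodies, the `bytesToBase64` helper, and the field offsets (Version, Type, Width, Height, Stride, PixelFormat, Type, then data at byte 28) are assumptions based on my reading of MS-EMFPLUS.

```javascript
// Hypothetical sketch, not the PR's implementation.
// Assumed enum values: ImageDataTypeBitmap = 1, BitmapDataTypeCompressed = 1.
const IMAGE_TYPE_BITMAP = 1;
const BITMAP_TYPE_COMPRESSED = 1;
// Assumed layout: Version(4) Type(4) Width(4) Height(4) Stride(4) PixelFormat(4) Type(4)
const BITMAP_DATA_OFFSET = 28;

const MAGIC = [
  { mime: 'image/png', sig: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', sig: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif', sig: [0x47, 0x49, 0x46, 0x38] }, // "GIF8" covers GIF87a and GIF89a
];

// Returns a MIME type when the bytes start with a known signature, else null.
function detectCompressedImageFormat(bytes) {
  for (const { mime, sig } of MAGIC) {
    if (bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b)) return mime;
  }
  return null;
}

// Base64-encodes a Uint8Array using btoa, which exists in browsers and Node 16+.
function bytesToBase64(bytes) {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Accepts a reassembled EmfPlusImage object (Uint8Array); returns a data URI or null.
function parseEmfPlusImageObject(bytes) {
  if (bytes.length <= BITMAP_DATA_OFFSET) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(4, true) !== IMAGE_TYPE_BITMAP) return null; // Image.Type
  if (view.getUint32(24, true) !== BITMAP_TYPE_COMPRESSED) return null; // Bitmap.Type
  const encoded = bytes.subarray(BITMAP_DATA_OFFSET);
  const mime = detectCompressedImageFormat(encoded);
  return mime ? `data:${mime};base64,${bytesToBase64(encoded)}` : null;
}
```

Returning null at each failed check lets the caller fall through to the placeholder without special-casing pixel-format or metafile-type images.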

Spec correctness

Per MS-EMFPLUS § 2.3.5.1:

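  • First chunk: ContinueBit=1; the 16-byte record header carries TotalObjectSize between Size and DataSize, and ObjectData holds the first slice.
  • Middle chunks: ContinueBit=1, same 16-byte header; ObjectData bytes are appended as-is.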
  • Final chunk: ContinueBit=0 — the parser keys off this to flush.
  • Defensive fallback: if an off-spec encoder leaves ContinueBit=1 on the last record, the parser flushes early once TotalObjectSize bytes are accumulated.
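
The flush rules above can be sketched roughly as follows. This is an illustrative outline rather than the PR's parser: it assumes each EmfPlusObject record has already been decoded into a plain `{ continueBit, totalObjectSize, data }` object (a hypothetical shape), and it ignores ObjectID matching between records.

```javascript
// Hypothetical sketch: reassemble EmfPlusObject continuation series.
// `records` is an array of { continueBit, totalObjectSize, data } in stream order.
function reassembleObjects(records) {
  const objects = [];
  let chunks = [];
  let collected = 0;
  let expected = null;

  const flush = () => {
    const out = new Uint8Array(collected);
    let offset = 0;
    for (const chunk of chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    objects.push(out);
    chunks = [];
    collected = 0;
    expected = null;
  };

  for (const rec of records) {
    chunks.push(rec.data);
    collected += rec.data.length;
    if (!rec.continueBit) {
      // Spec-compliant end of series (or a standalone record).
      flush();
    } else {
      if (expected === null) expected = rec.totalObjectSize;
      // Lenient path: flush once the declared size has been accumulated,
      // even though ContinueBit is still set.
      if (collected >= expected) flush();
    }
  }
  // A series that never completes is dropped rather than returned partially.
  return objects;
}
```

Dropping an incomplete series is a design choice in this sketch; it keeps a truncated image from being handed to the browser as if it were whole.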

What this does NOT cover

  • Pure-vector EMF+ (logos drawn entirely with GDI+ paths) — those still hit the placeholder. Implementing a full GDI+ renderer is out of scope.
  • Pixel-format (uncompressed) EmfPlusBitmap — also still hits the placeholder; rasterizing raw pixel buffers requires the same infrastructure.

Acceptance criteria

  • EMF+ images with embedded compressed bitmaps render actual document content instead of the placeholder SVG.
  • Existing classic EMF/WMF rendering behavior is preserved (293 tests in the helpers directory still pass).
  • DOCX round-trip export continues to preserve the original metafile asset (the import path stores originalSrc / originalExtension when a metafile is converted; not changed).
  • Targeted test coverage: 6 new tests using synthetic in-memory EMF+ buffers cover PNG/JPEG extraction, spec-compliant continuation reassembly, off-spec lenient continuation flush, fallback for non-Image objects, and rejection of pixel-format bitmaps.

Test plan

  • pnpm exec vitest run src/editors/v1/core/super-converter/v3/handlers/wp/helpers/metafile-converter.test.js — 14/14 pass.
  • pnpm exec vitest run src/editors/v1/core/super-converter/v3/handlers/wp/helpers/ — 293/293 pass (no regressions in adjacent helpers).
  • pnpm exec prettier --check on both modified files — clean.
  • Open the reproducer (m3 proposal.docx) in the editor and verify the cover image renders.

EMF+ payloads use GDI+ drawing records that the rtf.js renderer doesn't
implement, so prior to this change every EMF+ image rendered as an
"Unable to render EMF+ image" placeholder.

Most real-world EMF+ files generated by Office (cover slides, charts,
illustrations) embed a complete PNG/JPEG inside an EmfPlusObject(Image)
record with BitmapDataType=Compressed. Walk the EMR_COMMENT records in
the EMF stream, parse the inner EMF+ records, reassemble continuation
series via the TotalObjectSize prefix on the first chunk
(MS-EMFPLUS § 2.3.5.1), and return the embedded image directly.
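
As a rough illustration of the first step (walking EMR_COMMENT records), the sketch below collects the EMF+ payload of every comment record. It is not taken from the PR. The function name `collectEmfPlusPayloads` is hypothetical, and the constants reflect my understanding of MS-EMF: record type 70 for EMR_COMMENT, 14 for EMR_EOF, and the "EMF+" comment identifier 0x2B464D45.

```javascript
// Hypothetical sketch of the EMR_COMMENT walk; names are illustrative only.
const EMR_COMMENT = 70;
const EMR_EOF = 14;
const EMFPLUS_SIGNATURE = 0x2b464d45; // "EMF+" read as a little-endian uint32

// Accepts an ArrayBuffer or Uint8Array; returns an array of Uint8Array payloads.
function collectEmfPlusPayloads(buffer) {
  const bytes = buffer instanceof Uint8Array ? buffer : new Uint8Array(buffer);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const payloads = [];
  let offset = 0;

  // Every EMF record starts with Type(4) Size(4); Size includes that header.
  while (offset + 8 <= view.byteLength) {
    const type = view.getUint32(offset, true);
    const size = view.getUint32(offset + 4, true);
    if (size < 8 || offset + size > view.byteLength) break; // malformed, stop

    if (type === EMR_COMMENT && size >= 16) {
      // DataSize counts the 4-byte comment identifier plus the EMF+ records.
      const dataSize = view.getUint32(offset + 8, true);
      const identifier = view.getUint32(offset + 12, true);
      if (identifier === EMFPLUS_SIGNATURE && dataSize >= 4 && 12 + dataSize <= size) {
        payloads.push(bytes.subarray(offset + 16, offset + 12 + dataSize));
      }
    }
    if (type === EMR_EOF) break;
    offset += size;
  }
  return payloads;
}
```

Each returned payload is a run of EMF+ records, which is where the EmfPlusObject scan would start.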

Pure-vector and pixel-format EMF+ images still fall back to the
placeholder — a full GDI+ rasterizer is out of scope here.

Closes superdoc-dev#3172

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7900e7f5eb

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

…(SD-2503)

Per MS-EMFPLUS § 2.3.5.1, when ContinueBit=1 the EmfPlusObject record
header is 16 bytes — TotalObjectSize sits between Size and DataSize and
is present on every continued record (not only the first). The previous
implementation read offset 8 as DataSize for continued records, which
is actually TotalObjectSize, and treated the first 4 bytes of ObjectData
as TotalObjectSize. The synthetic continuation tests built buffers with
the same wrong layout, so they passed without exercising the bug.

Real EMF+ files written by Office (the multi-record cover-image case)
follow the spec layout, so the prior code would have either bailed on
the bounds check or copied from the wrong offset and fallen through to
the placeholder.

Now:
  ContinueBit=1: Type(2) Flags(2) Size(4) TotalObjectSize(4) DataSize(4) ObjectData
  ContinueBit=0: Type(2) Flags(2) Size(4)                    DataSize(4) ObjectData

Tests rebuild the synthetic buffers with the correct layout and add
coverage for a 3+ record continuation series.
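
To make the two layouts concrete, here is a small sketch of a header parser that branches on ContinueBit. It is an illustration, not the code in this commit: the function name is hypothetical, and the record type 0x4008 and the Flags bit positions (bit 15 ContinueBit, bits 8-14 ObjectType, bits 0-7 ObjectID) are assumptions from my reading of MS-EMFPLUS.

```javascript
// Hypothetical sketch of the corrected header parse.
const EMFPLUS_OBJECT = 0x4008; // assumed EmfPlusObject record type
const CONTINUE_BIT = 0x8000; // assumed bit 15 of Flags

// `view` is a DataView over an EMF+ payload; `offset` points at a record start.
// Returns the decoded record, or null if it is not an object record or is truncated.
function parseEmfPlusObjectRecord(view, offset) {
  if (offset + 12 > view.byteLength) return null;
  const type = view.getUint16(offset, true);
  const flags = view.getUint16(offset + 2, true);
  const size = view.getUint32(offset + 4, true);
  if (type !== EMFPLUS_OBJECT) return null;

  const continueBit = (flags & CONTINUE_BIT) !== 0;
  // ContinueBit=1 inserts TotalObjectSize before DataSize, giving a 16-byte header.
  const headerSize = continueBit ? 16 : 12;
  if (offset + headerSize > view.byteLength) return null;

  const totalObjectSize = continueBit ? view.getUint32(offset + 8, true) : null;
  const dataSize = view.getUint32(offset + headerSize - 4, true);
  const dataStart = offset + headerSize;
  if (dataStart + dataSize > view.byteLength) return null;

  return {
    continueBit,
    objectType: (flags >> 8) & 0x7f,
    objectId: flags & 0xff,
    size,
    totalObjectSize,
    dataSize,
    data: new Uint8Array(view.buffer, view.byteOffset + dataStart, dataSize),
  };
}
```

Reading DataSize at `headerSize - 4` keeps the two layouts in one code path, so the offset-8 mix-up described above cannot recur for only one of them.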
@gpardhivvarma
Contributor Author

@caio-pizzol please review this

Development

Successfully merging this pull request may close these issues.

Feature: render EMF+ metafile images

2 participants