Smoke-tested the Python wrapper end-to-end against the public NVCA Model COI (same pipeline as the C# smoke test: strip 95 drafting-note footnotes, delete the Preliminary Notes / pay-to-play / Sections 6.1–6.5, choose between four alternative-section forks, fill 124 placeholders, mop up 2 bare-underscore runs). The wrapper produces a byte-identical 95,131-byte output to the C# driver and the final EditSummary reports 0/0/0/0. Bones are good.
Four DX gaps surfaced that I'd group as one polish pass on the API surface. Filing them together since they cluster — the wrapper is well-shaped enough that each is a small focused change; together they make 0.1.0a2 substantively nicer.
1. fill_placeholders / FillOptions / BulkEditResult not wrapped — biggest item
The C# convenience helper that's the centerpiece of the template-fill DX is missing from Python. Users have to hand-roll the multi-pass loop:
    filled_total = 0
    unfilled_seen = {}
    for _ in range(8):  # MaxPasses cap
        ph = session.find_placeholders(
            kinds=ds.PlaceholderKinds.ALL,
            boundary=ds.ContextBoundary.BRACKET,
        )
        ph_sorted = sorted(
            ph,
            key=lambda p: (p.match.enclosing_anchor.id, p.match.span.start),
            reverse=True,  # reverse-offset ordering
        )
        filled_this_pass = 0
        for p in ph_sorted:
            value = picker(p)
            if value is None:
                unfilled_seen.setdefault(key_for(p), p)
                continue
            if p.match.text.startswith("$") and not value.startswith("$"):
                value = "$" + value  # PreserveDollarPrefix
            if session.replace_match(p.match, value).success:
                filled_this_pass += 1
                unfilled_seen.pop(key_for(p), None)
        filled_total += filled_this_pass
        if filled_this_pass == 0:
            break
That's ~25 lines of subtle correctness (anchor-then-offset sort, dollar-prefix preservation, multi-pass nested-bracket convergence, dedup of unfilled across passes) that the C# helper exists specifically to encapsulate. Every Python integrator who hits a template will rediscover these the hard way.
Ask: wrap DocxSession.FillPlaceholders along with FillOptions and BulkEditResult as session.fill_placeholders(picker, *, kinds=..., scope=..., max_passes=8, preserve_dollar_prefix=True, context_chars=80, boundary=ContextBoundary.CHAR) returning a BulkEditResult(filled, skipped, passes, unfilled, errors).
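For illustration, here is a rough sketch of how the loop above could be folded into a single helper on the Python side. It is not the C# implementation: the BulkEditResult stand-in only borrows the field names from the ask, the default key_for rule and the meaning of skipped are guesses, and kinds / scope / context_chars / boundary are simply forwarded to find_placeholders as keyword arguments.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Hashable, Optional


@dataclass
class BulkEditResult:
    """Stand-in result type; field names follow the ask, semantics are assumed."""
    filled: int = 0
    skipped: int = 0
    passes: int = 0
    unfilled: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def _default_key(p: Any) -> Hashable:
    # Hypothetical dedup key (anchor id + placeholder text); the real rule may differ.
    return (p.match.enclosing_anchor.id, p.match.text)


def fill_placeholders(
    session: Any,
    picker: Callable[[Any], Optional[str]],
    *,
    max_passes: int = 8,
    preserve_dollar_prefix: bool = True,
    key_for: Callable[[Any], Hashable] = _default_key,
    **find_kwargs: Any,  # forwarded as-is to session.find_placeholders
) -> BulkEditResult:
    result = BulkEditResult()
    unfilled = {}  # dedup of declined placeholders across passes
    for _ in range(max_passes):
        result.passes += 1
        found = session.find_placeholders(**find_kwargs)
        # Work back-to-front within each anchor so earlier offsets stay valid.
        ordered = sorted(
            found,
            key=lambda p: (p.match.enclosing_anchor.id, p.match.span.start),
            reverse=True,
        )
        filled_this_pass = 0
        for p in ordered:
            value = picker(p)
            if value is None:
                unfilled.setdefault(key_for(p), p)
                continue
            if (preserve_dollar_prefix and p.match.text.startswith("$")
                    and not value.startswith("$")):
                value = "$" + value
            outcome = session.replace_match(p.match, value)
            if outcome.success:
                filled_this_pass += 1
                unfilled.pop(key_for(p), None)
            else:
                result.errors.append(outcome)
        result.filled += filled_this_pass
        if filled_this_pass == 0:
            break  # converged: another pass would change nothing
    result.unfilled = list(unfilled.values())
    result.skipped = len(result.unfilled)  # assumption: skipped = distinct declines
    return result
```

With something like this in place, the smoke script's fill step would shrink to one call, e.g. fill_placeholders(session, picker, kinds=ds.PlaceholderKinds.ALL, boundary=ds.ContextBoundary.BRACKET).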
2. undo() / redo() return bool — should return EditResult
    >>> import inspect, docx_scalpel as ds
    >>> inspect.signature(ds.DocxSession.undo)
    (self) -> 'bool'
    >>> inspect.signature(ds.DocxSession.redo)
    (self) -> 'bool'
The C# API returns EditResult from Undo() / Redo() so callers can read Created / Removed / Modified anchor lists and keep their cached projection in sync without re-projecting the whole document (per the anchor-lifecycle contract in docs/architecture/docx_mutation_api.md).
The Python bool throws that information away — callers either re-project() per undo (expensive on large docs) or maintain their own undo-aware state diff. Neither matches the documented contract.
Ask: change return type to EditResult matching the rest of the mutation surface. Pass-through is mechanical if the stdio host already returns the full envelope; if it currently returns just a boolean, that's a host-side fix too.
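To make the payoff concrete, here is a minimal sketch of what a caller could do once undo() / redo() return an EditResult. The EditResult below is a stand-in, not the wrapper's class: the snake_case created / removed / modified names, the idea that they hold anchor ids, and a project_anchor callable taking a single anchor id are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EditResult:
    """Stand-in for the wrapper's EditResult (field names assumed, mirroring
    the C# Created / Removed / Modified anchor lists)."""
    success: bool = True
    created: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)


def sync_cache(
    cache: Dict[str, str],
    result: EditResult,
    project_anchor: Callable[[str], str],
) -> None:
    """Patch a cached {anchor_id: projection} map in place after an edit,
    undo or redo, re-projecting only the anchors that actually changed."""
    if not result.success:
        return  # nothing happened, so the cache is still valid
    for anchor_id in result.removed:
        cache.pop(anchor_id, None)
    for anchor_id in result.created + result.modified:
        cache[anchor_id] = project_anchor(anchor_id)
```

The call site would then be roughly sync_cache(cache, session.undo(), session.project_anchor), with no full re-project per undo.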
3. __version__ == "0.1.0a0" while the PyPI release is "0.1.0a1"
    >>> import docx_scalpel
    >>> docx_scalpel.__version__
    '0.1.0a0'
    >>> # pip show docx-scalpel reports Version: 0.1.0a1
Just a forgotten bump in docx_scalpel/__init__.py (or wherever __version__ lives) when the release was cut. Cosmetic, but it makes pip show and runtime introspection give different answers, which costs time when debugging.
Ask: sync __version__ to the wheel version on release; consider a release-time check or setup.cfg/pyproject.toml-driven dynamic version so it can't drift.
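One possible shape for the release-time check, sketched with the standard-library importlib.metadata. The function name check_version_sync and the injectable lookup parameter are made up for illustration; nothing here reflects the project's actual release tooling.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable


def check_version_sync(
    dist_name: str,
    module_version: str,
    lookup: Callable[[str], str] = version,
) -> None:
    """Raise if installed distribution metadata and __version__ disagree."""
    try:
        installed = lookup(dist_name)
    except PackageNotFoundError:
        raise RuntimeError(
            f"{dist_name} is not installed; cannot verify version"
        ) from None
    if installed != module_version:
        raise RuntimeError(
            f"{dist_name}: metadata says {installed!r} "
            f"but __version__ is {module_version!r}"
        )
```

In CI this could run after installing the freshly built wheel, as check_version_sync("docx-scalpel", docx_scalpel.__version__). The drift-proof alternative is to drop the hard-coded string and set __version__ = importlib.metadata.version("docx-scalpel") in the package itself.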
4. from_wire classmethods are in the public namespace
AnchorTarget, TextMatch, EditResult, TemplatePlaceholder, etc. all expose a from_wire(d) classmethod that's clearly the JSON-deserializer for the stdio transport — not something callers should ever use.
    >>> [n for n in dir(ds.AnchorTarget) if not n.startswith('_')]
    ['from_wire', 'id', 'kind', 'part_uri', 'scope', 'text_preview', 'unid']
It shows up in dir(...), in IDE autocomplete, in help(...), and in generated API docs. New users will wonder what it's for.
Ask: rename to _from_wire (or move to a sibling internal _wire.py module that does AnchorTarget._wire_decode(...) instead). Keeps the public surface focused on what users actually call.
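A small sketch of the rename option, using a cut-down stand-in for AnchorTarget. The two fields and the wire keys "id" / "kind" are illustrative only; the real class and payload have more to them.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AnchorTarget:
    """Simplified stand-in; the real class carries more fields."""
    id: str
    kind: str

    @classmethod
    def _from_wire(cls, d: Dict[str, Any]) -> "AnchorTarget":
        # Transport-only decoder. The leading underscore marks it as internal
        # while leaving it callable from the stdio client code.
        return cls(id=d["id"], kind=d["kind"])


def public_names(obj: Any) -> List[str]:
    """Apply the same underscore filter as the dir() check in the report."""
    return [n for n in dir(obj) if not n.startswith("_")]
```

After the rename, the transport layer still calls AnchorTarget._from_wire(payload), but the decoder no longer appears in public_names(AnchorTarget).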
What works well (worth keeping)
- AnchorTarget is flatter than the C# shape (.id directly, no .anchor.id indirection). Better DX, arguably what C# should mirror.
- py.typed ships in the wheel, so type checkers see real signatures.
- The with open_session(bytes) as s: context-manager form is the natural Python shape.
- Idiomatic enum casing (PlaceholderKind.BLANK_FILL, EditErrorCode.ANCHOR_NOT_FOUND).
- EditError shape (code / message / anchor_id) preserved cleanly through the wire.
- Stdio host stays warm between calls: no reboot per request, latency feels native.
- All of find_by_kind, find_placeholders, grep, replace_match, delete_block, delete_range, delete_section, get_edit_summary, project, project_anchor, save, raw.get_xml/insert_xml/replace_xml, list_annotations are present and functional.
Repro / smoke artifacts
Driver script (smoke.py) plus the filled DOCX preserved at /tmp/docxodus-smoketest/py/ — happy to attach if useful. The script is ~330 lines and includes the hand-rolled fill loop showing exactly the workaround #1 forces on every user.