
dx(python): docx-scalpel 0.1.0a1 — missing FillPlaceholders, bool undo/redo, version drift, public from_wire #192

@JSv4

Description


Smoke-tested the Python wrapper end-to-end against the public NVCA Model COI (same pipeline as the C# smoke test: strip 95 drafting-note footnotes, delete the Preliminary Notes / pay-to-play / Sections 6.1–6.5, choose between four alternative-section forks, fill 124 placeholders, mop up 2 bare-underscore runs). The wrapper produces a byte-identical 95,131-byte output to the C# driver and the final EditSummary reports 0/0/0/0. Bones are good.

Four DX gaps surfaced that I'd group as one polish pass on the API surface. Filing them together since they cluster — the wrapper is well-shaped enough that each is a small focused change; together they make 0.1.0a2 substantively nicer.

1. fill_placeholders / FillOptions / BulkEditResult not wrapped — biggest item

The C# convenience helper that's the centerpiece of the template-fill DX is missing from Python. Users have to hand-roll the multi-pass loop:

```python
import docx_scalpel as ds

# `session` is an open DocxSession. `picker(p)` returns the fill value or None,
# and `key_for(p)` builds a dedup key; both are the caller's own helpers.
filled_total = 0
unfilled_seen = {}
for _ in range(8):                                       # MaxPasses cap
    ph = session.find_placeholders(
        kinds=ds.PlaceholderKinds.ALL,
        boundary=ds.ContextBoundary.BRACKET,
    )
    ph_sorted = sorted(
        ph,
        key=lambda p: (p.match.enclosing_anchor.id, p.match.span.start),
        reverse=True,                                    # reverse-offset ordering
    )
    filled_this_pass = 0
    for p in ph_sorted:
        value = picker(p)
        if value is None:
            unfilled_seen.setdefault(key_for(p), p)
            continue
        if p.match.text.startswith("$") and not value.startswith("$"):
            value = "$" + value                          # PreserveDollarPrefix
        if session.replace_match(p.match, value).success:
            filled_this_pass += 1
            unfilled_seen.pop(key_for(p), None)
    filled_total += filled_this_pass
    if filled_this_pass == 0:                            # no progress: stop
        break
```

That's ~25 lines of subtle correctness (anchor-then-offset sort, dollar-prefix preservation, multi-pass nested-bracket convergence, dedup of unfilled across passes) that the C# helper exists specifically to encapsulate. Every Python integrator who hits a template will rediscover these the hard way.

Ask: wrap DocxSession.FillPlaceholders along with FillOptions and BulkEditResult as session.fill_placeholders(picker, *, kinds=..., scope=..., max_passes=8, preserve_dollar_prefix=True, context_chars=80, boundary=ContextBoundary.CHAR) returning a BulkEditResult(filled, skipped, passes, unfilled, errors).
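For discussion, the control flow the helper would own can be sketched in pure Python. This is a model of the loop only, not the wrapper: Placeholder, FakeSession, and its find()/replace() methods are hypothetical stand-ins for TemplatePlaceholder, DocxSession.find_placeholders() and replace_match(), and BulkEditResult just reuses the field names from the ask above. One deliberate difference from the hand-rolled loop: unfilled is taken from the last pass instead of being accumulated, which dedups naturally and never reports outer-bracket text that a nested fill has since rewritten.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for TemplatePlaceholder."""
    anchor_id: str  # enclosing anchor id
    start: int      # character offset within that anchor
    text: str       # matched placeholder text, e.g. "$[Amount]"


@dataclass
class BulkEditResult:
    filled: int = 0
    skipped: int = 0
    passes: int = 0
    unfilled: list[Placeholder] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


class Session(Protocol):
    def find(self) -> list[Placeholder]: ...
    def replace(self, p: Placeholder, value: str) -> bool: ...


def fill_placeholders(
    session: Session,
    picker: Callable[[Placeholder], Optional[str]],
    *,
    max_passes: int = 8,
    preserve_dollar_prefix: bool = True,
) -> BulkEditResult:
    result = BulkEditResult()
    for _ in range(max_passes):
        result.passes += 1
        # Anchor-then-offset, descending: filling a later placeholder in an
        # anchor never shifts the offsets of the earlier ones still queued.
        queue = sorted(session.find(), key=lambda p: (p.anchor_id, p.start), reverse=True)
        filled_this_pass = 0
        unfilled_this_pass: list[Placeholder] = []
        for p in queue:
            value = picker(p)
            if value is None:
                unfilled_this_pass.append(p)
                continue
            if preserve_dollar_prefix and p.text.startswith("$") and not value.startswith("$"):
                value = "$" + value
            if session.replace(p, value):
                filled_this_pass += 1
            else:
                result.errors.append(f"replace failed at {p.anchor_id}:{p.start}")
        result.filled += filled_this_pass
        # Only the last pass's leftovers are reported, so nothing is listed twice.
        result.unfilled = unfilled_this_pass
        if filled_this_pass == 0:  # no progress: nothing left the picker can fill
            break
    result.skipped = len(result.unfilled)
    return result


class FakeSession:
    """Toy document (anchor id -> text) for exercising the loop; not docx-scalpel."""
    _PATTERN = re.compile(r"\$?\[[^\[\]]*\]")  # innermost [...] with optional "$"

    def __init__(self, paragraphs: dict[str, str]) -> None:
        self.paragraphs = dict(paragraphs)

    def find(self) -> list[Placeholder]:
        return [Placeholder(aid, m.start(), m.group())
                for aid, text in self.paragraphs.items()
                for m in self._PATTERN.finditer(text)]

    def replace(self, p: Placeholder, value: str) -> bool:
        text = self.paragraphs[p.anchor_id]
        end = p.start + len(p.text)
        if text[p.start:end] != p.text:
            return False  # stale offset
        self.paragraphs[p.anchor_id] = text[:p.start] + value + text[end:]
        return True


if __name__ == "__main__":
    doc = FakeSession({"p1": "Pay $[Amount] to [Insert [Company] name] by [Date]."})
    values = {"$[Amount]": "1,000", "[Company]": "Acme", "[Insert Acme name]": "Acme Corp"}
    res = fill_placeholders(doc, lambda p: values.get(p.text))
    print(doc.paragraphs["p1"])  # Pay $1,000 to Acme Corp by [Date].
    print(res.filled, res.passes, [p.text for p in res.unfilled])  # 3 3 ['[Date]']
```

The demo needs three passes: pass 1 fills [Company] and $[Amount], pass 2 fills the outer bracket that [Company] exposed, and pass 3 makes no progress, so the loop stops with [Date] reported once.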

2. undo() / redo() return bool — should return EditResult

```python
>>> import inspect, docx_scalpel as ds
>>> inspect.signature(ds.DocxSession.undo)
(self) -> 'bool'
>>> inspect.signature(ds.DocxSession.redo)
(self) -> 'bool'
```

The C# API returns EditResult from Undo() / Redo() so callers can read Created / Removed / Modified anchor lists and keep their cached projection in sync without re-projecting the whole document (per the anchor-lifecycle contract in docs/architecture/docx_mutation_api.md).

The Python bool throws that information away: callers either call project() again after every undo (expensive on large docs) or maintain their own undo-aware state diff. Neither matches the documented contract.

Ask: change return type to EditResult matching the rest of the mutation surface. Pass-through is mechanical if the stdio host already returns the full envelope; if it currently returns just a boolean, that's a host-side fix too.
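If undo() returned an EditResult, keeping a cached projection in sync would become a small patch step instead of a full re-project. The sketch below is hypothetical throughout: the created/removed/modified field names are a guess at how the C# Created/Removed/Modified lists would be snake-cased, the lists are assumed to carry anchor ids, and sync_projection plus the project_anchor callable are illustrative (the real project_anchor signature may differ).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EditResult:
    """Hypothetical Python mirror of the C# EditResult anchor lists."""
    success: bool
    created: tuple[str, ...] = ()
    removed: tuple[str, ...] = ()
    modified: tuple[str, ...] = ()


def sync_projection(
    cache: dict[str, str],
    result: EditResult,
    project_anchor: Callable[[str], str],
) -> None:
    """Patch an anchor_id -> projected-text cache in place from one EditResult."""
    if not result.success:
        return  # failed or no-op undo: assume the document is unchanged
    for anchor_id in result.removed:
        cache.pop(anchor_id, None)
    # Re-project only the anchors the edit (or its undo) actually touched.
    for anchor_id in (*result.created, *result.modified):
        cache[anchor_id] = project_anchor(anchor_id)


if __name__ == "__main__":
    cache = {"a1": "old text", "a2": "soon gone"}
    undo_result = EditResult(success=True, created=("a3",), removed=("a2",), modified=("a1",))
    sync_projection(cache, undo_result, lambda aid: f"<projected {aid}>")
    print(cache)  # {'a1': '<projected a1>', 'a3': '<projected a3>'}
```

With the proposed signature, a caller would pass the value of session.undo() straight into a helper like this; with today's bool there is nothing to patch from.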

3. __version__ == "0.1.0a0" while the PyPI release is "0.1.0a1"

```python
>>> import docx_scalpel
>>> docx_scalpel.__version__
'0.1.0a0'
>>> # pip show docx-scalpel reports Version: 0.1.0a1
```

Just a forgotten bump in docx_scalpel/__init__.py (or wherever __version__ lives) when the release was cut. Cosmetic, but it makes pip show and runtime introspection disagree, which is a confusion tax at debugging time.

Ask: sync __version__ to the wheel version on release; consider a release-time check or setup.cfg/pyproject.toml-driven dynamic version so it can't drift.

4. from_wire classmethods are in the public namespace

AnchorTarget, TextMatch, EditResult, TemplatePlaceholder, etc. all expose a from_wire(d) classmethod that's clearly the JSON-deserializer for the stdio transport — not something callers should ever use.

```python
>>> [n for n in dir(ds.AnchorTarget) if not n.startswith('_')]
['from_wire', 'id', 'kind', 'part_uri', 'scope', 'text_preview', 'unid']
```

It shows up in dir(...), in IDE autocomplete, in help(...), and in generated API docs. New users will wonder what it's for.

Ask: rename to _from_wire (or move to a sibling internal _wire.py module that does AnchorTarget._wire_decode(...) instead). Keeps the public surface focused on what users actually call.


What works well (worth keeping)

  • AnchorTarget is flatter than the C# shape (.id directly, no .anchor.id indirection). Better DX, arguably what C# should mirror.
  • py.typed ships in the wheel — type checkers see real signatures.
  • with open_session(bytes) as s: context-manager form is the natural Python shape.
  • Idiomatic enum casing (PlaceholderKind.BLANK_FILL, EditErrorCode.ANCHOR_NOT_FOUND).
  • EditError shape (code / message / anchor_id) preserved cleanly through the wire.
  • Stdio host stays warm between calls — no reboot per request, latency feels native.
  • All of find_by_kind, find_placeholders, grep, replace_match, delete_block, delete_range, delete_section, get_edit_summary, project, project_anchor, save, raw.get_xml/insert_xml/replace_xml, list_annotations present and functional.

Repro / smoke artifacts

Driver script (smoke.py) plus the filled DOCX preserved at /tmp/docxodus-smoketest/py/ — happy to attach if useful. The script is ~330 lines and includes the hand-rolled fill loop showing exactly the workaround #1 forces on every user.
