Retire _v2_image_token_to_v1_dict adaptor: migrate embedders + image helpers to v2 directly #1490

@JSv4


Background

PR #1488 completed the migration to canonical v2 PAWLs end-to-end with one residual adaptor: _v2_image_token_to_v1_dict in opencontractserver/utils/multimodal_embeddings.py. It reconstructs a v1 long-key dict (image_path, content_hash, base64_data, format, …) from a v2 TokenView so the embedder pipeline and image helpers (get_image_as_base64, get_image_data_url) — written against the v1 PawlsTokenPythonType shape — keep working unchanged.

This is the last point inside an internal runtime module where v1-shape data is materialized. Tracking it in this issue so it doesn't become forgotten dead code.

Scope

Migrate the v1-shape consumers to read v2 directly, then delete the adaptor:

  • opencontractserver/utils/pdf_token_extraction.py
    • get_image_as_base64(image_token: PawlsTokenPythonType) — read v2 short keys (b64, p, f) directly, or accept a TokenView.
    • get_image_data_url(image_token: PawlsTokenPythonType) — same.
  • opencontractserver/utils/multimodal_embeddings.py
    • Remove _v2_image_token_to_v1_dict.
    • Remove the call sites at lines ~236 and ~438 — pass TokenView (or v2 dict) straight through to the embedder/image-helper APIs.
  • opencontractserver/llms/tools/image_tools.py
    • Remove the _v2_token_to_v1_image_dict helper (mirror of the above) and the cast(PawlsTokenPythonType, …) calls at lines 240/247/299/303 once get_image_as_base64/get_image_data_url accept v2.
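
As a rough illustration of the first scope item, the helpers could read the v2 short keys directly from a plain dict. This is only a sketch under assumptions: the function names `image_base64_from_v2` and `image_data_url_from_v2`, the `load_bytes` callback, and the `"jpeg"` fallback are hypothetical, and the mapping of `b64`, `p`, `f` to inline base64 data, image path, and format is inferred from the v1 key names listed in the Background. The real helpers may prefer to accept a TokenView instead, whose API is not shown here.

```python
import base64
from typing import Callable, Mapping, Optional


def image_base64_from_v2(
    token: Mapping[str, object],
    load_bytes: Optional[Callable[[str], bytes]] = None,
) -> Optional[str]:
    """Return base64 image data from a v2 short-key token dict, or None.

    Assumed key meanings: "b64" is inline base64 data, "p" is a storage path.
    `load_bytes` is a hypothetical callback that reads raw bytes for a path.
    """
    inline = token.get("b64")
    if isinstance(inline, str) and inline:
        return inline

    path = token.get("p")
    if isinstance(path, str) and path and load_bytes is not None:
        return base64.b64encode(load_bytes(path)).decode("ascii")

    return None


def image_data_url_from_v2(
    token: Mapping[str, object],
    load_bytes: Optional[Callable[[str], bytes]] = None,
    default_format: str = "jpeg",  # hypothetical fallback, not from the source
) -> Optional[str]:
    """Build a data URL from a v2 token, using "f" (assumed: format) if present."""
    data = image_base64_from_v2(token, load_bytes)
    if data is None:
        return None
    fmt = token.get("f") or default_format
    return f"data:image/{fmt};base64,{data}"
```

With helpers shaped like this, the call sites could pass the v2 dict straight through, and neither adaptor nor the `cast(PawlsTokenPythonType, …)` calls would be needed.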

Acceptance criteria

  • _v2_image_token_to_v1_dict and _v2_token_to_v1_image_dict are deleted.
  • No call site under opencontractserver/ constructs a v1 image-token dict from v2 data at runtime.
  • Existing image / multimodal embedding tests still pass without modification (other than fixture-shape changes if the test was building v1 dicts by hand).
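
The first criterion could be checked mechanically with a small scan of the source tree. The sketch below is an assumption about how one might do that, not part of the project: `find_retired_references` is a hypothetical name, and it only detects leftover references to the two adaptor names. It does not prove the second criterion, since a v1 dict could still be built inline under a different name.

```python
import re
from pathlib import Path

# The two helper names this issue asks to delete.
RETIRED = ("_v2_image_token_to_v1_dict", "_v2_token_to_v1_image_dict")


def find_retired_references(root):
    """Return (path, line number, line) for each remaining mention under root."""
    pattern = re.compile("|".join(re.escape(name) for name in RETIRED))
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_retired_references("opencontractserver")` from the repository root should return an empty list once both helpers and their call sites are gone.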

Context

Surfaced in Claude review of #1488, item #6:

This backward-compat adapter converts v2 TokenView → v1 long-key dict so downstream embedders continue to receive the shape they were written for. The comment says "Phase 2 will migrate them off v1 entirely". This function and its callers are the one place where v1-shape survives inside an internal module — worth tracking in a follow-up issue so it doesn't become forgotten dead code.
