
feat: Add fips-agents add vision (closes #20) #39

Merged
rdwj merged 1 commit into main from feat/20-add-vision
May 4, 2026

Conversation

@rdwj (Collaborator) commented May 4, 2026

Summary

Wires multimodal (image input) example client into existing agent
projects. Closes #20.

The new `add vision` command:

  1. Verifies `server.files.enabled: true` in `agent.yaml` (precondition:
     the `file_id:` URL scheme resolves bytes via the BytesStore, which
     only exists when files support is enabled). Refuses to apply, with
     an actionable hint, when the precondition is not satisfied.
  2. Drops `examples/vision_client.py` showing the three `image_url` URL
     forms the agent runtime accepts: inline `data:` URIs, remote
     `https://` URLs, and the internal `file_id:` scheme, which the
     agent rewrites to a `data:` URI server-side.
  3. Prints next steps describing how to point the agent at a
     vision-capable model by setting `MODEL_ENDPOINT` and `MODEL_NAME`
     (Granite Vision 3.2-2B is the canonical example) and how to run
     the example script.
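
The three URL forms from step 2 can be sketched as OpenAI-style content blocks. This is a minimal illustration, not the shipped `examples/vision_client.py`; the `build_image_message` helper and the placeholder URLs are assumptions:

```python
import base64

def build_image_message(image_url: str, prompt: str) -> dict:
    """Build an OpenAI-style chat message with one image content block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# 1. Inline data: URI -- the image bytes travel inside the request itself.
png_bytes = b"\x89PNG\r\n\x1a\n"  # stand-in for real image bytes
data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")

# 2. Remote https:// URL -- fetched by the model endpoint.
https_url = "https://example.com/photo.png"

# 3. Internal file_id: scheme -- resolved via the agent's BytesStore
#    (requires server.files.enabled: true) and rewritten to data: server-side.
file_id_url = "file_id:abc123"  # hypothetical id

for url in (data_url, https_url, file_id_url):
    msg = build_image_message(url, "Describe this image.")
    print(msg["content"][1]["image_url"]["url"][:30])
```

All three forms produce the same message shape; only the `url` string differs, which is why the agent code itself needs no change.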

Pairs with `fips-agents add files` and requires `fipsagents>=0.20.0`
(image input in OpenAI content blocks) in the project's dependencies.

Notes on the issue wording

  • No `vision:` section added to `agent.yaml`. The agent-template
    audit for #101 (image input on the runtime side) explicitly chose a
    single multimodal endpoint via the existing `model.endpoint` —
    adding a `vision:` block now would bake in a split that hasn't
    been needed yet.
  • Example code lives at `examples/vision_client.py` (client-side),
    not `src/agent.py`. Content blocks are constructed by callers; the
    agent runtime resolves them automatically. The agent code itself
    doesn't need to change for image input.
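
The precondition from step 1 amounts to a single lookup in the parsed `agent.yaml`. A minimal sketch, assuming the config is already parsed into a dict; the function name and hint wording are illustrative, not the shipped CLI code:

```python
def check_files_enabled(agent_config: dict) -> tuple[bool, str]:
    """Return (ok, hint). Hint is non-empty only when the check fails."""
    enabled = (
        agent_config.get("server", {}).get("files", {}).get("enabled", False)
    )
    if enabled:
        return True, ""
    # Actionable hint: tell the user exactly which command unblocks them.
    return False, (
        "server.files.enabled is not true in agent.yaml; the file_id: "
        "scheme needs the BytesStore. Run `fips-agents add files` first."
    )

ok, _ = check_files_enabled({"server": {"files": {"enabled": True}}})
print(ok)  # True
ok, hint = check_files_enabled({"server": {}})
print(hint)
```

Failing closed with a hint, rather than dropping a broken example, matches the "refuses to apply" behavior described above.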

Test plan

  • `pytest` clean — 265 passed (was 261; +4 new tests for vision E2E
    + spec shape).
  • `black --check` and `ruff check` clean on the changed Python files.
  • Manual: `fips-agents add vision` in a fresh agent project
    without files enabled fails with the right hint; after
    `fips-agents add files`, succeeds and drops the example.
  • Idempotent: second run of `fips-agents add vision` reports
    `already exists` and exits 0.
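
The idempotency check in the last bullet can be sketched as a guarded file drop; `drop_example` and the status strings are assumptions for illustration, not the actual command internals:

```python
import tempfile
from pathlib import Path

def drop_example(project_root: Path, body: str) -> str:
    """Write examples/vision_client.py unless it already exists."""
    target = project_root / "examples" / "vision_client.py"
    if target.exists():
        # Second run: report and leave the file untouched (exit 0).
        return f"{target.name} already exists"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body)
    return f"created {target.name}"

root = Path(tempfile.mkdtemp())
print(drop_example(root, "# example"))  # created vision_client.py
print(drop_example(root, "# example"))  # vision_client.py already exists
```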

Closes #20.


Assisted-by: Claude Code (Opus 4.7)
@rdwj rdwj merged commit 4804217 into main May 4, 2026
3 checks passed


Development

Successfully merging this pull request may close these issues:

feat: Scaffold vision model configuration for image input (#20)

1 participant