fips-agents · rdwj · Apr 29, 2026
@@ -50,7 +50,8 @@ These are settled. Do not revisit without explicit discussion.
 - **Prometheus metrics** — `MetricsCollector` in `metrics.py` follows the TraceCollector observer pattern. Records `agent_requests_total`, `agent_request_duration_seconds`, `agent_model_call_duration_seconds`, `agent_tool_call_total`, `agent_tokens_total`. Exposed at `GET /metrics` in Prometheus text format. Optional `[metrics]` extra (`prometheus_client`). `NullMetricsCollector` when disabled.
 - **OTEL trace export** — `OTELTraceStore` in `otel.py` wraps an inner `TraceStore` with OpenTelemetry span export via OTLP. Span IDs are deterministically hashed (SHA-256) from internal string IDs. Monotonic-to-wallclock conversion anchored on `Trace.started_at`. Optional `[otel]` extra (`opentelemetry-sdk`, `opentelemetry-exporter-otlp-proto-grpc`). Configure via `traces.exporter: otel` in agent.yaml.
 - **Distributed trace propagation** — W3C Trace Context (`traceparent` header) extracted from incoming requests and injected into outgoing `RemoteNode` HTTP calls. `propagation.py` provides `extract_trace_context()` and `inject_trace_context()`. `TraceCollector` accepts `parent_trace_id`/`parent_span_id` to join distributed traces. `RemoteNode.set_trace_context()` injects headers.
-- **Server module structure** — `fipsagents.server` is a proper package: `app.py` (OpenAIChatServer), `models.py` (request/response schemas), `sessions.py` (session stores), `tracing.py` (trace model + stores), `collector.py` (TraceCollector), `metrics.py` (Prometheus metrics), `otel.py` (OTEL export), `propagation.py` (W3C Trace Context), `sqlite.py` (shared connection manager). `__init__.py` re-exports `OpenAIChatServer`, `ChatCompletionRequest`, `ChatMessage`.
+- **Server module structure** — `fipsagents.server` is a proper package: `app.py` (OpenAIChatServer), `models.py` (request/response schemas), `sessions.py` (session stores), `tracing.py` (trace model + stores), `collector.py` (TraceCollector), `metrics.py` (Prometheus metrics), `otel.py` (OTEL export), `propagation.py` (W3C Trace Context), `sqlite.py` (shared connection manager), `files.py` (FileStore + FileRecord), `bytes_store.py` (BytesStore), `parser.py` (FileParser), `scanner.py` (VirusScanner), `feedback.py`, `pricing.py`, `budget.py`, `http.py`. `__init__.py` re-exports `OpenAIChatServer`, `ChatCompletionRequest`, `ChatMessage`.
+- **File upload subsystem** — server-layer only, mirrors the sessions/traces pattern. Four ABCs: `FileStore` (metadata), `BytesStore` (raw bytes), `FileParser` (extract text by MIME), `VirusScanner` (HTTP sidecar). `FileStore` and `BytesStore` are deliberately split per [ADR-0001](docs/adr/0001-s3-bytes-backend.md): `SqliteFileStore` and `PostgresFileStore` compose with any `BytesStore` (`NullBytesStore` / `LocalFsBytesStore` / `S3BytesStore`). Multi-replica deployments use `bytes_backend.type: s3`; single-replica edge/dev keeps `local_fs` on a PVC. Parser dispatch is two-tier: `PlaintextParser` (text/* + JSON/YAML/XML, no extra deps) → `DoclingParser` (binary formats, opt-in via `[files]` extra ~5–6 GB). Optional extras: `[files]` (Docling), `[s3]` (aioboto3 ~30 MB). `OpenAIChatServer` builds and owns all four and closes them on shutdown. Endpoints: `POST/GET/DELETE/LIST /v1/files`. In addition, `file_ids: [...]` on `ChatCompletionRequest` resolves to system messages containing the extracted text, injected before the agent runs. `SqliteFileStore` metadata DB co-locates with bytes via `FilesConfig.sqlite_path` (chart sets `FILES_SQLITE_DB_PATH=<mount>/.metadata/agent.db` when persistence is on). Chunking + pgvector retrieval for large files is the next bullet of #100, designed in [ADR-0002](docs/adr/0002-large-file-chunking-pgvector.md), tracked in #137 (blocked on the ADR).
 - **`probe_role_support()`** is a diagnostic utility in `fipsagents.baseagent.diagnostics` -- probes whether a deployed model supports a given message role (e.g. `developer`). Template inspection (best-effort, checks vLLM model metadata) + canary completion (prompt token delta). Not on the hot path.
 - **`ThinkTagParser`** in `fipsagents.baseagent.reasoning` -- streaming parser that separates `<think>…</think>` blocks from content deltas. Auto-enabled for Granite and DeepSeek models (via `create_reasoning_parser(model_name)`). Wired in `setup()` step 11 and `astep_stream`. Falls back gracefully when vLLM's `--reasoning-parser` already handles extraction server-side.
 - **`McpServerConfig`** supports two YAML-configurable transports: HTTP (`url`) and stdio (`command`/`args`/`env`/`cwd`). Pydantic validator enforces exactly one. `connect_mcp()` also accepts FastMCP server objects for in-process transport (programmatic, not YAML).
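
The OTEL and propagation bullets above describe SHA-256-derived span IDs and `traceparent` handling. A minimal sketch of both ideas follows. The function names (`hashed_span_id`, `parse_traceparent`, `format_traceparent`) are hypothetical and are not the actual `otel.py` or `propagation.py` API, and truncating the digest to its first 8 bytes is an assumption; the notes say only that IDs are hashed with SHA-256.

```python
import hashlib
import re
from typing import Mapping, Optional, Tuple

# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def hashed_span_id(internal_id: str) -> str:
    """Derive a 64-bit span ID (16 hex chars) from an internal string ID.

    Assumption: the first 8 bytes of the SHA-256 digest are used.
    """
    return hashlib.sha256(internal_id.encode("utf-8")).hexdigest()[:16]


def parse_traceparent(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (trace_id, parent_span_id), or None if absent or malformed."""
    value = next(
        (v for k, v in headers.items() if k.lower() == "traceparent"), None
    )
    if value is None:
        return None
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, _flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version-00 traceparent value for an outgoing request."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
```

Because the span ID is a pure function of the internal ID, re-exporting the same trace yields the same span IDs, which is the property the deterministic hash provides.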
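
The file upload bullet above splits metadata (`FileStore`) from raw bytes (`BytesStore`) so that either side can be swapped independently. The sketch below illustrates that composition under loud assumptions: the method names (`put`, `get`, `delete`, `upload`, `read`), the `FileRecord` fields, and the in-memory `DictBytesStore` / `DictFileStore` classes are all hypothetical, the real stores are likely async, and only `BytesStore` is modeled as an ABC here for brevity.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FileRecord:
    """Metadata only; the raw bytes live in a BytesStore."""
    file_id: str
    filename: str
    mime_type: str
    size: int


class BytesStore(ABC):
    """Hypothetical raw-bytes interface keyed by file ID."""

    @abstractmethod
    def put(self, file_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, file_id: str) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, file_id: str) -> None: ...


class DictBytesStore(BytesStore):
    """In-memory stand-in for a local-filesystem or S3 backend."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, file_id: str, data: bytes) -> None:
        self._blobs[file_id] = data

    def get(self, file_id: str) -> Optional[bytes]:
        return self._blobs.get(file_id)

    def delete(self, file_id: str) -> None:
        self._blobs.pop(file_id, None)


class DictFileStore:
    """Metadata store that composes with any BytesStore."""

    def __init__(self, bytes_store: BytesStore) -> None:
        self._bytes = bytes_store
        self._records: Dict[str, FileRecord] = {}

    def upload(self, filename: str, mime_type: str, data: bytes) -> FileRecord:
        record = FileRecord(uuid.uuid4().hex, filename, mime_type, len(data))
        self._bytes.put(record.file_id, data)  # write bytes, then metadata
        self._records[record.file_id] = record
        return record

    def get(self, file_id: str) -> Optional[FileRecord]:
        return self._records.get(file_id)

    def read(self, file_id: str) -> Optional[bytes]:
        if file_id not in self._records:
            return None
        return self._bytes.get(file_id)

    def delete(self, file_id: str) -> bool:
        if self._records.pop(file_id, None) is None:
            return False
        self._bytes.delete(file_id)
        return True
```

Swapping `DictBytesStore` for another `BytesStore` subclass leaves `DictFileStore` untouched, which is the point of the split described in ADR-0001.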
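
The `ThinkTagParser` bullet above describes separating `<think>…</think>` blocks from content deltas while streaming. The hard part is that a tag can be split across deltas. One way to handle this, shown here as a sketch only, is to hold back any buffer suffix that could be the start of a tag. `ThinkTagSplitter` and its `feed`/`flush` methods are hypothetical names and do not reflect the real class's interface.

```python
from typing import List, Tuple

OPEN, CLOSE = "<think>", "</think>"


class ThinkTagSplitter:
    """Split streamed text into ("reasoning", text) and ("content", text) pieces."""

    def __init__(self) -> None:
        self._buf = ""
        self._in_think = False

    def _kind(self) -> str:
        return "reasoning" if self._in_think else "content"

    def feed(self, delta: str) -> List[Tuple[str, str]]:
        self._buf += delta
        out: List[Tuple[str, str]] = []
        while self._buf:
            tag = CLOSE if self._in_think else OPEN
            idx = self._buf.find(tag)
            if idx != -1:
                # Complete tag found: emit what precedes it, then switch mode.
                if idx:
                    out.append((self._kind(), self._buf[:idx]))
                self._buf = self._buf[idx + len(tag):]
                self._in_think = not self._in_think
                continue
            # No complete tag: hold back the longest suffix that could be
            # the beginning of the tag, and emit everything before it.
            keep = 0
            for n in range(min(len(tag) - 1, len(self._buf)), 0, -1):
                if self._buf.endswith(tag[:n]):
                    keep = n
                    break
            cut = len(self._buf) - keep
            if cut:
                out.append((self._kind(), self._buf[:cut]))
            self._buf = self._buf[cut:]
            break
        return out

    def flush(self) -> List[Tuple[str, str]]:
        """Emit any held-back text at end of stream."""
        out = [(self._kind(), self._buf)] if self._buf else []
        self._buf = ""
        return out
```

Feeding the deltas `"<thi"`, `"nk>plan</th"`, `"ink>Hello"`, `" world"` yields reasoning text `plan` and content text `Hello world`, with no tag fragments leaking into either stream.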

@@ -256,7 +256,6 @@ def test_empty_files_sqlite_path_falls_back_to_storage(self, tmp_path):
         # Default behavior — covered by the existing _build_server_with_files
         # fixture which sets storage.sqlite_path and leaves files.sqlite_path
         # empty. Just confirm uploads land at the storage path.
-        from pathlib import Path
         server = _build_server_with_files(tmp_path)
         storage_db = tmp_path / "agent.db"
         with TestClient(server.app) as client: