feat: batch-build with OpenAI Batch API + demo template (#71)
Conversation
Add OpenAI Batch API support for pipeline transforms, enabling async batch processing of LLM calls at a ~50% cost reduction.

Core implementation:

- `BatchLLMClient`: drop-in `LLMClient` replacement using control-flow exceptions (`BatchCollecting`/`BatchInProgress`) for lifecycle management (sketched below)
- `BatchState`: persistent JSON state tracking for multi-run batch builds
- `batch_runner`: DAG walker that routes layers to batch vs sync mode based on provider and explicit `batch=True/False/None` config

CLI commands (`synix batch-build`):

- `run`: create a build instance and submit batches
- `resume`: continue a previously submitted build
- `plan`: dry run showing which layers run batch vs sync
- `list`/`status`: inspect build instances

Cassette replay fix:

- Allow `SYNIX_CASSETTE_MODE=replay` when `batch_responses.json` exists
- Skip API key validation during cassette replay
- Enables deterministic demo replay without real API keys

Demo template (`05-batch-build`):

- 2-level pipeline: batch OpenAI work styles → sync Anthropic summary
- Full cassette coverage for both batch and sync layers
- Golden files with normalized batch IDs, timestamps, and pipeline hashes

Tests: 22 E2E tests plus unit tests for `BatchLLMClient` and `BatchState`.
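To make the lifecycle concrete, here is a minimal sketch of the control-flow-exception pattern. `BatchLLMClient`, `BatchState`, `BatchCollecting`, and `BatchInProgress` are the names from this PR; the `complete()` method and the `BatchState` helpers are illustrative assumptions, not the actual implementation.

```python
class BatchCollecting(Exception):
    """First pass: the request was recorded for batch submission, not executed."""

class BatchInProgress(Exception):
    """A batch was submitted earlier and has not finished yet."""

class BatchLLMClient:
    """Drop-in stand-in for LLMClient that defers calls to the Batch API."""

    def __init__(self, state):
        self.state = state    # JSON-backed BatchState shared across runs
        self.pending = []     # requests collected on this pass

    def complete(self, prompt: str, **params) -> str:
        key = self.state.request_key(prompt, params)   # hypothetical helper
        if self.state.has_result(key):
            return self.state.result(key)    # batch finished: return stored output
        if self.state.is_submitted(key):
            raise BatchInProgress(key)       # runner should poll and retry later
        self.pending.append((key, prompt, params))
        raise BatchCollecting(key)           # runner should submit this layer as a batch
```

The runner would catch `BatchCollecting` to bundle a layer's requests into one batch submission, and `BatchInProgress` to park the layer until a later `resume`.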
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — Medium risk:** this PR adds a new persistence format + CLI surface + execution mode (async batch) with several correctness and operability gaps that will be painful to unwind once users depend on it.

**Verdict: Block** — the feature is directionally fine, but the pipeline fingerprint guardrail is currently fake and the corrupted-state recovery is unsafe; both are foundational and must be fixed before shipping a new persistent build mode.
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary:** This PR adds OpenAI Batch API support as a new execution mode.

**Alignment:** Strong fit. The build-system analogy holds: batch-build is analogous to a distributed build backend — same DAG, same artifacts, same cache, different execution strategy. Artifacts remain content-addressed and immutable.

**Verdict:** This is a well-designed, well-tested incremental step that adds a meaningful cost optimization without disturbing the core build model — good to merge after addressing the observations above.
…eline hash

- Replace the fragile `BatchState.__new__` bypass with a `create_fresh()` classmethod for corrupted-state recovery (both reviewers flagged this)
- Strengthen `compute_pipeline_hash()` to include the pipeline `llm_config`, per-layer config dicts, and the `batch` parameter — changing model, temperature, or batch mode between submit and resume is now detected (sketched below)
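A sketch of what the strengthened fingerprint could hash. The commit specifies the inputs (pipeline `llm_config`, per-layer config dicts, the `batch` parameter), but the attribute names and payload structure below are assumptions:

```python
import hashlib
import json

def compute_pipeline_hash(pipeline) -> str:
    # Hash everything that can change results between submit and resume.
    payload = {
        "llm_config": pipeline.llm_config,       # model, temperature, ...
        "layers": [
            {
                "name": layer.name,
                "config": layer.config,          # per-layer config dict
                "batch": layer.batch,            # True / False / None
            }
            for layer in pipeline.layers
        ],
    }
    blob = json.dumps(payload, sort_keys=True, default=str)  # stable serialization
    return hashlib.sha256(blob.encode()).hexdigest()
```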
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — High risk:** it adds a new persistence format + CLI surface + alternate execution engine that will be very hard to unwind once users adopt it, and there are correctness holes around caching, state recovery, and batch semantics.

**Verdict: Block** — too many contract mismatches and scale/recovery holes for a feature that introduces durable state + a new public CLI/API. Fix eligibility semantics, add state schema/versioning, implement batch chunking, and tighten recovery before merging.
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary:** This PR adds OpenAI Batch API support as an experimental feature.

**Alignment:** Strong fit. This is a cost optimization layer that preserves all core invariants: artifacts remain content-addressed and immutable, the DAG walk order is unchanged, provenance chains are recorded identically, and cache keys are reused.

**Verdict:** This is a well-scoped, architecturally sound addition that respects all core invariants while addressing a real cost concern — the duplicated artifact-saving logic is the main thing to fix before merge.
- Extract a `_save_or_cache_artifact()` shared helper used by both the sync and batch transform runners (addresses review feedback on duplicated logic)
- Switch `max_polls` from a fixed iteration count to a 24-hour time-based loop (see the sketch below)
- Add `Artifact` to module-level imports for the type annotation
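A sketch of the time-based loop. `client.batches.retrieve()` and the terminal statuses are the real OpenAI SDK surface; the 30-second interval and the function shape are assumptions:

```python
import time

POLL_INTERVAL_S = 30                 # assumed; not specified in the commit
BATCH_WINDOW_S = 24 * 60 * 60        # OpenAI batches complete within 24 hours

def wait_for_batch(client, batch_id: str):
    # Poll until the 24-hour Batch API window elapses, not for N iterations.
    deadline = time.monotonic() + BATCH_WINDOW_S
    while time.monotonic() < deadline:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            return batch
        time.sleep(POLL_INTERVAL_S)
    raise TimeoutError(f"batch {batch_id} did not finish within 24h")
```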
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — Medium-high risk:** this adds a new execution mode with persistent state + new CLI surface area, and there are a couple of correctness/reliability gaps that will bite users in non-demo scenarios.

**Verdict: Block** — the auto-batch condition mismatch and the per-request failure handling are correctness issues that will produce confusing, silently partial builds; fix those, plus add state versioning, before merging.
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary:** This PR adds OpenAI Batch API support as an experimental feature.

**Alignment:** Strong fit. The build-system analogy extends naturally: batch builds are like distributed compilation — same DAG, same artifacts, different execution strategy. Artifacts remain content-addressed and immutable.

**Verdict:** This is a well-designed incremental step that adds meaningful cost optimization without compromising the core build-system invariants — the cassette-mode contradiction (finding #4) and the pipeline hash serialization edge case (#6) should be resolved before merge.
…tch-build demo

- Record real OpenAI Batch API + Anthropic cassettes (no more synthetic data)
- Add a sync build path (steps 7-10) to the demo — same pipeline, no Batch API
- Add a `batch-build status --latest` step to exercise all subcommands
- Add normalization rules for pipeline hashes, datetimes, dashboard URLs, and OpenAI batch IDs (sketched below)
- Regenerate all goldens from real API recordings
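The normalization rules might look like the regex table below; the patterns and placeholder strings are illustrative guesses, not the actual contents of `demo_commands.py`:

```python
import re

# (pattern, stable placeholder) pairs applied before golden comparison
NORMALIZERS = [
    (re.compile(r"batch_[A-Za-z0-9]+"), "batch_<ID>"),                     # OpenAI batch IDs
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*"), "<DATETIME>"),
    (re.compile(r"https://platform\.openai\.com/\S+"), "<DASHBOARD_URL>"),
    (re.compile(r"\b[0-9a-f]{64}\b"), "<PIPELINE_HASH>"),                  # sha256 hex
]

def normalize(text: str) -> str:
    for pattern, placeholder in NORMALIZERS:
        text = pattern.sub(placeholder, text)
    return text
```

Run output and golden files would both pass through `normalize()` before comparison, so run-specific values never cause spurious diffs.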
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — Medium risk:** you're introducing a new execution mode with persistent state + new CLI surface area, and the design has a couple of correctness holes that will bite under partial failures and resume scenarios.

**Verdict: Block** — doc/behavior mismatches, expired-batch behavior that contradicts the spec, and silent partial-output handling that is a correctness footgun for a build system. Fix those before shipping this CLI/state format.
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary:** This PR adds OpenAI Batch API support.

**Alignment:** Strong fit. DESIGN.md identifies LLM cost as a key differentiator from SQL-based build systems ("$0.01-$1.00 per call vs. milliseconds of SQL. Incremental rebuild matters more."). The Batch API directly addresses this. The design preserves core invariants: artifacts remain content-addressed and immutable, the DAG walk order is unchanged, and provenance chains are complete.

**Verdict:** This is a well-designed, thoroughly tested feature addition that directly serves the project's cost-reduction goals while preserving all core architectural invariants — a strong incremental step, pending resolution of the spec/code mismatch on auto-batch triggering and cassette-mode semantics.
The OpenAI SDK raises on client creation without `OPENAI_API_KEY`, even though replay mode never makes real API calls. Provide a placeholder key so SDK clients can be constructed in CI, where API keys aren't available.
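A sketch of the fix; `SYNIX_CASSETTE_MODE` is from this PR, while the helper name and the placeholder value are illustrative:

```python
import os

from openai import OpenAI

def make_openai_client() -> OpenAI:
    if os.environ.get("SYNIX_CASSETTE_MODE") == "replay":
        # Replay never hits the network; the key only satisfies the SDK constructor.
        return OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "sk-placeholder"))
    return OpenAI()  # normal path: OPENAI_API_KEY must be set in the environment
```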
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — High risk:** this PR introduces a new persistent on-disk state machine + new CLI surface area + control-flow-by-exception in core execution paths, with several correctness holes that will be painful to undo.

**Verdict: Block** — core correctness issues (request key collisions, wrong provider gating, unbounded state file, no concurrency safety) make this unsafe to ship even as "experimental".
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary:** This PR adds OpenAI Batch API support as an experimental feature.

**Alignment:** Strong fit. DESIGN.md §2.1 explicitly calls out that "LLM transforms are expensive — $0.01-$1.00 per call" and that caching matters more than in SQL pipelines. A 50% cost reduction directly serves this. Artifacts remain content-addressed and immutable.

**Verdict:** This is a well-structured, well-tested incremental step that directly addresses the project's cost-sensitivity thesis — the spec/implementation cassette-mode inconsistency (#1) and the duplicated config resolution logic (#2) should be resolved before merge.
- `batch=True` rejects non-OpenAI providers (deepseek, openai-compatible)
- `batch=True` rejects a custom `base_url` (the Batch API is platform-specific)
- `batch=None` (auto) only routes to batch for native OpenAI with no `base_url` (see the sketch below)
- Source layer failures propagate instead of being swallowed
- `BatchRequestFailed` raises `RuntimeError` instead of logging a warning
- Poll loop errors raise immediately instead of silently retrying
- Cassette parse errors fail hard (no silent `None` return)
- Plan mode still allows a missing API key (it shows sync), but provider misconfiguration fails immediately
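A sketch of the eligibility rules as a single resolver; the rules are from the list above, while the function name and signature are illustrative:

```python
def resolve_batch_mode(provider: str, base_url: str | None, batch: bool | None) -> bool:
    """Return True if the layer should run through the Batch API."""
    if batch is True:
        if provider != "openai":
            raise ValueError(f"batch=True is unsupported for provider {provider!r}")
        if base_url is not None:
            raise ValueError("batch=True requires the OpenAI platform (no custom base_url)")
        return True
    if batch is False:
        return False
    # batch=None (auto): only route to batch for native OpenAI
    return provider == "openai" and base_url is None
```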
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — High risk:** this PR introduces a new CLI surface + new persistence format + new execution mode, with several correctness holes that will create silent wrong builds or hard-to-debug behavior.

**Verdict: Block.** The docs promise behaviors the implementation doesn't provide (partial failures), the state/CLI surfaces are one-way doors without versioning/lock safety, and the template duplication will cause immediate user breakage. Fix the critical items before merging.
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary** — Adds OpenAI Batch API support as an experimental feature.

**Alignment** — Strong fit. DESIGN.md §2.1 explicitly calls out that "LLM transforms are expensive" and that caching/cost matter more than in dbt. A 50% cost reduction directly serves this.

**Verdict** — This is a well-designed, well-tested incremental step that directly addresses the project's cost-of-LLM-calls concern while preserving all core invariants. The duplicated config resolution and the unresolved cassette-mode documentation issue deserve follow-up, but neither blocks merge.
Add CLI reference rows for all batch-build subcommands and a full "Batch Build (Experimental)" section covering poll/resume workflows, the `batch` parameter, provider restrictions, and transform requirements.
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — High risk:** this PR introduces new persistent state formats + a new CLI surface + control-flow via exceptions; it's easy to get subtle correctness and "stuck build" behavior wrong.

**Verdict: Block** — the implementation contradicts the documented completion/error semantics and is unsafe under concurrent resume; fix those before shipping an experimental feature that writes persistent state and charges money.
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary:** This PR adds experimental OpenAI Batch API support.

**Alignment:** Strong fit. DESIGN.md explicitly calls out that "LLM transforms are expensive — $0.01-$1.00 per call" and that caching/incremental rebuild matter because of cost. Batch mode is a direct response: same DAG, same transforms, same content-addressed artifacts, 50% cheaper.

**Verdict:** This is a well-scoped, well-tested incremental addition that directly addresses the cost problem DESIGN.md highlights, without compromising any core invariants — a good step for the project, pending resolution of the cassette-mode contract inconsistency and the flagged missing pieces.
- `BatchState.create_fresh()` uses a `_skip_load` parameter instead of the unsafe `object.__new__` bypass
- `compute_pipeline_hash()` includes transform fingerprints for stronger change detection between submit and resume
- Auto-batch skips N:1 transforms (`estimate_output_count <= 1`) since they don't benefit from batching
- Per-unit `BatchRequestFailed` is caught and logged instead of aborting the entire layer — remaining units continue processing (see the sketch below)
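A sketch of the per-unit failure policy from the last bullet; `BatchRequestFailed` is the exception from this PR, while `run_layer` and its callables are illustrative:

```python
import logging

log = logging.getLogger("synix.batch")

class BatchRequestFailed(Exception):
    """One batch request failed; raised per unit, not per layer."""

def run_layer(units, run_unit, save_artifact):
    # A single failed request logs and skips its unit instead of
    # aborting the layer; the remaining units keep processing.
    for unit in units:
        try:
            artifact = run_unit(unit)
        except BatchRequestFailed as exc:
            log.warning("batch request failed for %r: %s; continuing", unit, exc)
            continue
        save_artifact(artifact)
```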
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — Medium-to-high risk:** you're introducing a new persisted state format + new CLI surface + control-flow-by-exception execution path, with several correctness holes that will be painful to unwind if they ship.

**Verdict: Block** — the cache/provenance correctness issues (label-based cache loads + allow-mismatch) and the cassette/spec inconsistencies are one-way mistakes that will create irreproducible or wrong builds. Fix those, add schema versioning + locking, and then reassess.
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary:** This PR adds OpenAI Batch API support.

**Alignment:** Strong fit. DESIGN.md explicitly identifies LLM cost as a key differentiator from SQL-based build systems ("$0.01-$1.00 per call"). Batch builds at 50% cost directly address this. The implementation preserves core invariants: artifacts remain content-addressed.

**Verdict:** This is a well-structured, thoroughly tested incremental step that adds meaningful cost optimization while preserving all architectural invariants — ship it after resolving the duplicate template files and the cassette-mode inconsistency.
Resolve conflicts from the ext transforms merge (PR #72):

- Keep the v0.12.1 version from main
- Keep HEAD's batch-build fixes (create_fresh, pipeline hash, N:1 gating, per-unit errors)
- Merge the CLI table styles from main with the full batch-build commands
**Red Team Review — OpenAI GPT-5.2** | Adversarial review (docs + diff only)

**Threat assessment — Medium risk:** it quietly changes batch/sync execution semantics and introduces new CLI surface area/state coupling without showing the supporting tests/spec alignment.

**Verdict: Block.** The new batch/sync heuristic is almost certainly wrong, and treating per-unit submission failure as non-fatal without strict downstream gating will create silently incomplete builds and broken provenance.
**Architectural Review — Claude Opus** | Blind review (docs + diff only)

**Summary:** This PR makes three improvements to the experimental batch-build feature: N:1 transforms are excluded from auto-batching, per-unit failures no longer abort a layer, and the pipeline hash now includes transform fingerprints.

**Alignment:** Strong fit. DESIGN.md §3.3 establishes that cache keys must "capture everything affecting what gets produced" — extending the pipeline hash to include transform fingerprints directly advances this. The non-fatal error handling aligns with the §10.2 stance: "Log malformed outputs but don't halt." The N:1 batching optimization is coherent with the build-system model — you don't batch a link step that produces one binary. Batch build itself advances Hypothesis 3 (architecture is a runtime concern) by making iteration cheaper.

**Verdict:** Good directional changes — the N:1 optimization, resilient error handling, and fingerprint-based change detection are all well-motivated — but the missing tests and the unaddressed partial-completion/provenance gap make this not quite ready to merge.
Summary
- `synix batch-build run/resume/plan/list/status` commands using the OpenAI Batch API for eligible transform layers, with automatic fallback to sync for non-OpenAI providers
- `SYNIX_CASSETTE_MODE=replay` check to allow demo replay when `batch_responses.json` exists, skipping API key validation during replay
- `05-batch-build` demo template — a 2-level pipeline (batch OpenAI work styles → sync Anthropic team summary) with full cassette coverage and golden files
- Normalization rules in `demo_commands.py` for stable golden comparison

Architecture
`BatchLLMClient` is a drop-in `LLMClient` replacement that uses control-flow exceptions (`BatchCollecting`, `BatchInProgress`) to signal the runner. `BatchState` persists build progress to JSON for multi-run resume support.
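For intuition, a hypothetical shape for the persisted state; the PR does not show the schema, so every field below is an assumption:

```python
# Hypothetical BatchState file contents, expressed as the dict that would
# be serialized to JSON. Field names are assumptions, not the real schema.
EXAMPLE_STATE = {
    "pipeline_hash": "3f1c9a...e2",        # detects config drift between submit and resume
    "batches": {
        "work_styles": {                   # layer name
            "batch_id": "batch_abc123",    # OpenAI Batch API id
            "status": "in_progress",
        },
    },
    "results": {},                         # request key -> completed output
}
```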
Test plan

- E2E tests plus unit tests for `BatchLLMClient` and `BatchState`
- `uv run release` passes (1008 tests, lint, template sync, all demos)