feat: introduce BDD behavioral test suite (delivery metrics, creative formats) #1146
Merged
ChrisHuie merged 1 commit into prebid:main on Mar 19, 2026
59a12b5 to 530cf25
… formats)

Introduces a BDD behavioral test suite using pytest-bdd, covering two use cases with 108 Gherkin scenarios. Scenarios were derived from the AdCP specification, imported as .feature files, then wired to production code through step definitions and a multi-transport test harness.

Use cases covered:
- UC-004: Deliver media buy metrics (webhooks, retry/backoff, circuit breaker, HMAC auth, reporting dimensions, placement breakdowns)
- UC-005: Discover creative formats (format discovery, input/output filtering, sandbox isolation, format assets)

822 passing tests across 4 transports (impl, a2a, mcp, rest). 642 xfailed tests for functionality not yet implemented.

Other changes:
- tox.ini: add bdd to env_list, remove phantom integration_v2/ui entries
- Makefile: add test-bdd target, setup target, explicit --db/--stack flags
- CI: point integration-tests-v2 job at tests/integration/
- creative_agent_registry: broadened SDK fallback + 429 retry with backoff
- GetProductsRequest: add preferred_delivery_types (AdCP spec alignment)
- DRY: extract duplicated media buy test helpers to shared module
- pyasn1: upgrade 0.6.2 → 0.6.3 (GHSA-jr27-m4p2-rc6r)
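The "429 retry with backoff" item can be illustrated with a minimal sketch. This is not the code in creative_agent_registry: `RateLimitedError`, `call_with_backoff`, and the delay values are hypothetical names and numbers chosen for illustration, assuming a simple exponential schedule.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 (Too Many Requests) response."""


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on RateLimitedError with exponential backoff.

    The delay doubles after each failed attempt (0.5s, 1s, 2s, ...).
    The last failure is re-raised so the caller can record the error.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the requested delays instead of actually waiting.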
530cf25 to 8a4f1c5
ChrisHuie approved these changes Mar 19, 2026
KonstantinMirin added a commit to KonstantinMirin/prebid-salesagent that referenced this pull request Mar 21, 2026

Merges origin/develop which includes:
- adcp library upgrade from 3.6.0 to 3.10.0
- BDD behavioral test suite extraction (prebid#1146)
- Alembic migration graph fix (prebid#1144)
- Creative agent format validation skip (prebid#1137)
- Release 1.6.0 (prebid#1122)

Conflict resolution strategy:
- Production code: our params + develop's type annotations
- BDD/harness: our version (more complete), with develop's guards adopted
- uv.lock: develop's version (adcp 3.10)
- Created merge migration for forked alembic heads
KonstantinMirin added a commit to KonstantinMirin/prebid-salesagent that referenced this pull request Mar 28, 2026

…rn []

_fetch_formats_from_agent and _fetch_formats_raw_mcp had 5 paths that silently returned [] on anomalous responses (completed/data=None, submitted status, unexpected status, unparseable raw MCP response, empty content). This made failures indistinguishable from "agent up, genuinely 0 formats", causing products.py to reject all format IDs.

Convert silent return [] to raises so list_all_formats_with_errors records errors and triggers graceful degradation. Cache poisoning (empty results cached for 1 hour) is fixed automatically since raises propagate before reaching the cache write.

Also restores .duplication-baseline tests count from 84→111 (PR prebid#1143 merge accidentally reverted PR prebid#1146's bump).

Fixes: prebid#1136
ChrisHuie pushed a commit that referenced this pull request Mar 30, 2026
…rn [] (#1167)

* fix: raise on anomalous empty format responses instead of silent return []

  _fetch_formats_from_agent and _fetch_formats_raw_mcp had 5 paths that silently returned [] on anomalous responses (completed/data=None, submitted status, unexpected status, unparseable raw MCP response, empty content). This made failures indistinguishable from "agent up, genuinely 0 formats", causing products.py to reject all format IDs.

  Convert silent return [] to raises so list_all_formats_with_errors records errors and triggers graceful degradation. Cache poisoning (empty results cached for 1 hour) is fixed automatically since raises propagate before reaching the cache write.

  Also restores .duplication-baseline tests count from 84→111 (PR #1143 merge accidentally reverted PR #1146's bump).

  Fixes: #1136

* chore: bump cryptography 46.0.5 → 46.0.6 (CVE-2026-34073)

  Fixes a bug where name constraints were not applied to peer names during verification when the leaf certificate contains a wildcard DNS SAN.

* refactor: remove dead isinstance(fmt, dict) checks in media_buy_create.py

  Product format_ids from DB JSONB are always dicts; schema Product format_ids are always FormatId objects. Removed unreachable branches and used the correct access pattern for each data source.

* refactor: remove dead isinstance(fmt, dict) checks in products.py

  Format parsing loops receive data from json.loads() which always returns dicts. Removed the dead isinstance guards from _parse_format_entries(), add_product() validation, and edit_product() validation. Also removed the test that exercised the dead branch with non-dict entries.

* fix: raise on anomalous empty responses in signals_agent_registry

  Convert 3 silent `return []` paths to raises in _get_signals_from_agent:
  - completed/data=None: raise ValueError (anomalous response)
  - submitted/submitted=None: raise ValueError (missing async info)
  - unexpected status: raise ValueError (unknown status)

  The submitted-with-valid-webhook path (async) correctly returns [] unchanged. Same pattern as creative_agent_registry fix (PR #1167). Regression tests in test_signals_agent_silent_empty_bug.py.

  Beads: salesagent-9eu

* refactor: remove dead isinstance(fmt, dict) guards from products.py parsing loops

  The format parsing loops in _parse_format_entries(), add_product(), and edit_product() receive data from json.loads() which always returns dicts. Removed the unreachable isinstance(fmt, dict) guards.

* refactor: DRY format entry parsing in products.py

  Replaced duplicated format entry building loops in add_product() and edit_product() with calls to _parse_format_entries(), then filtering by valid_format_ids. This eliminates 32 lines of copy-pasted logic.

* fix: restore isinstance guard for corrupt format_ids in approval flow

  The isinstance(fmt, dict) check in execute_approved_media_buy was not dead code: it protects against corrupt DB data (e.g. strings stored in format_ids JSONB column). Restores the guard with a descriptive ValueError instead of silently skipping. Fixes CI failure in test_invalid_format_unknown_type.

* refactor: replace isinstance dict-plucking with FormatId.model_validate

  Replace manual dict key extraction and isinstance(fmt, dict) dispatch with FormatId.model_validate() in the approval flow and format validation. Pydantic handles type coercion, required-field checks, and validation errors, so no manual field plucking is needed.

  Adds HTTP(S) scheme check after model_validate since FormatId.agent_url is AnyUrl (accepts ftp://) but our business rule requires HTTP(S). Strips AnyUrl trailing-slash normalization to match agent registration.

* fix: replace ValueError with AdCPNotFoundError in format_resolver.py

  get_format() now raises AdCPNotFoundError (HTTP 404) instead of generic ValueError when a format_id is not found. This aligns with the AdCP error hierarchy and allows callers to distinguish not-found errors from other validation failures.

* fix: replace ValueError/RuntimeError with AdCPAdapterError in agent registries

  Creative and signals agent registries raised bare ValueError/RuntimeError for agent misbehavior (empty responses, unexpected status, unparseable MCP results). Callers only caught ValueError, letting RuntimeError escape unhandled. Both registries now raise AdCPAdapterError (502, transient) which correctly represents agent transport/protocol failures.

  Addresses PR #1167 review feedback (salesagent-mouh, salesagent-3hbx).

* fix: widen catch sites to handle AdCPError subclasses from get_format()

  get_format_by_id in _base.py now catches AdCPNotFoundError (not just ValueError) and returns None. AdCPAdapterError still propagates: a broken agent is not "format not found". GAM orders.py catches both AdCPNotFoundError and AdCPAdapterError to produce contextual error messages for all format lookup failures.

  Addresses PR #1167 review feedback (salesagent-044q).

* docs: add test architecture guide and update outdated README

  tests/CLAUDE.md establishes the authoritative test architecture guide for AI agents (harness environments, factory-boy factories, transport dispatching, obligation tests) and explicitly calls out anti-patterns (session.add, get_db_session, dict factories) that exist in the codebase but must not be replicated.

  tests/README.md rewritten to reflect current state: tox-based runner, 5 test suites, harness system, BDD infrastructure, structural guards. Removed stale references to dict factories, nonexistent directories, and SQLite.
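The catch-site distinction described in the last few commits (not-found becomes None, a broken agent still propagates) might look roughly like the sketch below. The class names come from the commit messages, but their definitions here, the `status_code` attribute, the `registry` dict, and the `agent_healthy` flag are simplified stand-ins assumed for illustration only.

```python
from typing import Dict, Optional


class AdCPError(Exception):
    """Stand-in base class; the real hierarchy is not shown in this PR."""
    status_code = 500


class AdCPNotFoundError(AdCPError):
    status_code = 404  # the requested format does not exist


class AdCPAdapterError(AdCPError):
    status_code = 502  # the agent misbehaved (transient transport failure)


def get_format(format_id: str, registry: Dict[str, dict],
               agent_healthy: bool = True) -> dict:
    """Hypothetical resolver: raises a typed error for each failure mode."""
    if not agent_healthy:
        raise AdCPAdapterError("agent returned an unparseable response")
    try:
        return registry[format_id]
    except KeyError:
        raise AdCPNotFoundError(f"unknown format_id: {format_id}") from None


def get_format_by_id(format_id: str, registry: Dict[str, dict],
                     agent_healthy: bool = True) -> Optional[dict]:
    """Map "not found" to None, but let adapter failures propagate."""
    try:
        return get_format(format_id, registry, agent_healthy)
    except (AdCPNotFoundError, ValueError):
        return None
```

Because `AdCPAdapterError` is deliberately left out of the `except` clause, a caller never mistakes an agent outage for a missing format.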
KonstantinMirin added a commit to KonstantinMirin/prebid-salesagent that referenced this pull request Mar 30, 2026
Summary
Introduces a BDD behavioral test suite using pytest-bdd, covering two use cases with 108 Gherkin scenarios. Scenarios were derived from the AdCP specification, imported as .feature files, then wired to production code through step definitions and a multi-transport test harness. Multiple rounds of assertion quality review ensured that every Then step verifies actual values, not just existence checks.

Test suite reorganization
The test suites were consolidated and restructured:
- tests/integration_v2/ → merged into tests/integration/ (50 files renamed)
- tests/ui/ → renamed to tests/admin/ (matches the tox env name)
- tests/bdd/: new BDD behavioral test suite
- env_list fixed: unit, integration, e2e, admin, bdd (was missing bdd, had phantom integration_v2 and ui entries that silently no-op'd)

Test matrix
Suites: unit, integration, bdd, e2e, admin

Entity-scoped test runs
Tests are auto-tagged with entity markers by filename pattern, so any entity can be run across all suites.
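One way such filename-based tagging could work is a pytest collection hook, sketched here under assumptions: the `entities_for` helper and the whole-word matching rule are hypothetical, and only the entity names are taken from this PR. `pytest_collection_modifyitems` and `item.add_marker` are standard pytest (`item.path` needs pytest 7 or later).

```python
import re
from pathlib import Path
from typing import List

# Entity names as listed in this PR description.
ENTITIES = [
    "delivery", "creative", "product", "media_buy", "tenant", "auth",
    "adapter", "inventory", "schema", "admin", "architecture", "targeting",
    "transport", "workflow", "policy", "agent", "infra",
]


def entities_for(path: str) -> List[str]:
    """Return entities whose name appears as a whole word in the file stem.

    Assumed rule: words in the filename are separated by underscores, so
    "test_media_buy_delivery.py" matches both media_buy and delivery.
    """
    stem = Path(path).stem
    return [
        entity for entity in ENTITIES
        if re.search(rf"(?:^|_){re.escape(entity)}(?:_|$)", stem)
    ]


def pytest_collection_modifyitems(config, items):
    """conftest.py hook: tag each collected test with its entity markers."""
    for item in items:
        for entity in entities_for(str(item.path)):
            item.add_marker(entity)
```

With markers applied this way, standard pytest marker selection such as `pytest -m delivery` would pick up matching tests in every suite. The markers would also need to be registered in the pytest configuration to avoid unknown-marker warnings. The project's own command for entity-scoped runs is not shown here.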
17 entities: delivery, creative, product, media_buy, tenant, auth, adapter, inventory, schema, admin, architecture, targeting, transport, workflow, policy, agent, infra

BDD suite details
UC-004 — Deliver media buy metrics (60 scenarios): Webhook delivery, retry/backoff, circuit breaker, HMAC authentication, reporting dimensions, placement breakdowns, date range partitioning, error handling.
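For the HMAC authentication scenarios, the value of computing the expected signature independently, rather than calling the signer under test a second time, can be shown with a small sketch. Everything here is assumed for illustration: the SHA-256 digest, hex encoding, and the function names are not taken from the project.

```python
import hashlib
import hmac


def buggy_sign(secret: bytes, body: bytes) -> str:
    """Hypothetical signer with a bug: it ignores the request body."""
    return hmac.new(secret, b"", hashlib.sha256).hexdigest()


def circular_check(sign, secret: bytes, body: bytes, signature: str) -> bool:
    """Weak assertion: re-runs the code under test, so its bugs cancel out."""
    return sign(secret, body) == signature


def independent_check(secret: bytes, body: bytes, signature: str) -> bool:
    """Strong assertion: recomputes the HMAC from the raw secret and body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


secret = b"webhook-secret"
body = b'{"media_buy_id": "mb_1", "impressions": 1000}'
signature = buggy_sign(secret, body)

# The circular check passes despite the bug; the independent one catches it.
circular_ok = circular_check(buggy_sign, secret, body, signature)
independent_ok = independent_check(secret, body, signature)
```

Here `circular_ok` is True and `independent_ok` is False, which is the gap an independent computation in a Then step is meant to close.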
UC-005 — Discover creative formats (48 scenarios): Format discovery, input/output filtering, production account sandbox isolation, format asset validation, empty-result handling.
Every scenario runs across 4 transports (impl, a2a, mcp, rest); a test only graduates from @pending when all 4 transports pass. The 642 xfailed tests map to functionality not yet implemented.

Assertion quality
Every Then step was reviewed for assertion completeness. Specific fixes:
- _assert_partition_or_boundary: per-field content dispatch instead of just checking "response" in ctx
- then_hmac_computation: independent HMAC computation instead of circular re-use
- then_field_true/false: navigate to package level via _find_field_in_response()
- then_format_assets: check ALL formats, not just those with assets
- then_no_real_api_calls: subsystem-aware (adapter + registry)
- … test_bdd_partition_assertion_strength.py) locks this in

7 structural guards enforce BDD code quality on every make quality run: no-pass steps, no trivial assertions, no dict registries, no duplicate steps, no silent env checks.

Other changes
- … preferred_delivery_types field (AdCP spec alignment)
- … _make_create_request / _get_tenant_dict to media_buy_helpers.py
- … tests/integration/

Test results (full suite, all 5 environments)
Test plan
- make quality passes (4,042 unit tests, 0 failures)
- ./run_all_tests.sh full suite: 6,723 passed, 0 failed