feat: introduce BDD behavioral test suite (delivery metrics, creative formats) by KonstantinMirin · Pull Request #1146 · prebid/salesagent

KonstantinMirin · 2026-03-18T16:34:18Z

Summary

Introduces a BDD behavioral test suite using pytest-bdd, covering two use cases with 108 Gherkin scenarios. Scenarios were derived from the AdCP specification, imported as .feature files, then wired to production code through step definitions and a multi-transport test harness. Multiple rounds of assertion quality review ensured that every Then step verifies actual values rather than merely checking that something exists.

Test suite reorganization

The test suites were consolidated and restructured:

tests/integration_v2/ → merged into tests/integration/ (50 files renamed)
tests/ui/ → renamed to tests/admin/ (matches the tox env name)
tests/bdd/ — new BDD behavioral test suite
tox env_list fixed: unit, integration, e2e, admin, bdd (was missing bdd, had phantom integration_v2 and ui entries that silently no-op'd)

Test matrix

Suite	Files	Tests	What it covers
`unit`	299	4,042	Fast isolated tests, mocked externals
`integration`	166	1,773	Real PostgreSQL, full request pipelines
`bdd`	12	822 + 642 xfail	Gherkin behavioral scenarios, 4-transport parity
`e2e`	21	82	Full Docker stack (app + nginx + postgres)
`admin`	2	4	Admin UI template rendering

Entity-scoped test runs

Tests are auto-tagged with entity markers by filename pattern. Run any entity across all suites:

make test-entity ENTITY=delivery       # all delivery tests (unit + integration + e2e + admin)
make test-entity ENTITY=creative       # all creative tests
make test-entity ENTITY=product        # all product tests

17 entities: delivery, creative, product, media_buy, tenant, auth, adapter, inventory, schema, admin, architecture, targeting, transport, workflow, policy, agent, infra
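
A minimal sketch of how filename-based auto-tagging could work, assuming a pytest collection hook in conftest.py. The pattern table and the helper name entities_for are hypothetical; the PR does not show the actual mapping, and only four of the 17 entities are listed here.

```python
import re
from pathlib import Path

# Hypothetical subset of the entity patterns; the real table is not shown in the PR.
ENTITY_PATTERNS = {
    "delivery": re.compile(r"delivery"),
    "creative": re.compile(r"creative"),
    "product": re.compile(r"product"),
    "media_buy": re.compile(r"media_buy"),
}


def entities_for(path):
    """Return the sorted entity names whose pattern matches the test file name."""
    name = Path(str(path)).name
    return sorted(e for e, pat in ENTITY_PATTERNS.items() if pat.search(name))


def pytest_collection_modifyitems(config, items):
    """pytest hook: attach one marker per matching entity to each collected test."""
    for item in items:
        for entity in entities_for(item.path):
            item.add_marker(entity)
```

With markers attached this way, a single entity can be selected with pytest -m delivery; make test-entity presumably wraps a similar selection for each suite. Each marker name would also need to be registered in the pytest configuration if strict markers are enabled.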

BDD suite details

UC-004 — Deliver media buy metrics (60 scenarios): Webhook delivery, retry/backoff, circuit breaker, HMAC authentication, reporting dimensions, placement breakdowns, date range partitioning, error handling.

UC-005 — Discover creative formats (48 scenarios): Format discovery, input/output filtering, production account sandbox isolation, format asset validation, empty-result handling.

Every scenario runs across 4 transports (impl, a2a, mcp, rest) — a test only graduates from @pending when all 4 transports pass. The 642 xfailed tests map to functionality not yet implemented.
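
The graduation rule can be sketched as follows. This is an illustration only: the function names expand and can_graduate are hypothetical, and the real harness wires transports through pytest-bdd rather than plain tuples.

```python
TRANSPORTS = ("impl", "a2a", "mcp", "rest")


def expand(scenario, tags):
    """Return one (test_id, expect_xfail) pair per transport for a scenario.

    A scenario tagged "pending" is expected to fail on every transport.
    """
    pending = "pending" in tags
    return [(f"{scenario}[{t}]", pending) for t in TRANSPORTS]


def can_graduate(results):
    """A scenario may drop the pending tag only if all 4 transports passed.

    `results` maps transport name to True (passed) or False (failed).
    """
    return set(results) == set(TRANSPORTS) and all(results.values())
```

A scenario that passes on three transports but fails on the fourth stays pending, which keeps the four transports at parity.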

Assertion quality

Every Then step was reviewed for assertion completeness. Specific fixes:

_assert_partition_or_boundary: per-field content dispatch instead of just checking "response" in ctx
then_hmac_computation: independent HMAC computation instead of circular re-use
then_field_true/false: navigate to package level via _find_field_in_response()
then_format_assets: check ALL formats, not just those with assets
then_no_real_api_calls: subsystem-aware (adapter + registry)
Regression test (test_bdd_partition_assertion_strength.py) locks this in
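
To illustrate the then_hmac_computation fix, the sketch below contrasts a circular check with an independent one. The signer functions are hypothetical stand-ins, and HMAC-SHA256 with hex encoding is an assumption; the PR does not state the signature scheme.

```python
import hashlib
import hmac


def good_sign(secret: bytes, body: bytes) -> str:
    """Hypothetical correct signer: HMAC-SHA256 over the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def buggy_sign(secret: bytes, body: bytes) -> str:
    """Hypothetical broken signer: hashes the body but ignores the secret."""
    return hashlib.sha256(body).hexdigest()


def circular_check(signer, secret: bytes, body: bytes) -> bool:
    """Compares the signer's output with itself, so it can never fail."""
    received = signer(secret, body)
    return received == signer(secret, body)


def independent_check(signer, secret: bytes, body: bytes) -> bool:
    """Recomputes the expected digest from raw inputs, without the signer."""
    received = signer(secret, body)
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received)
```

The circular check accepts buggy_sign, while the independent check rejects it. This is the false-green pattern the review was meant to remove.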

7 structural guards enforce BDD code quality on every make quality run, including: no-pass steps, no trivial assertions, no dict registries, no duplicate steps, no silent env checks.

Other changes

creative_agent_registry: Broadened SDK fallback trigger + 429 retry with exponential backoff
GetProductsRequest: Added preferred_delivery_types field (AdCP spec alignment)
DRY: Extract duplicated _make_create_request/_get_tenant_dict to media_buy_helpers.py
CI workflow: Point integration-tests-v2 job at tests/integration/
pyasn1: Upgrade 0.6.2 → 0.6.3 (GHSA-jr27-m4p2-rc6r)
Tooling: Generic agentic workflows extracted to external plugin
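
The 429 retry with exponential backoff in creative_agent_registry follows a common pattern, sketched here under stated assumptions. The RateLimited exception, the attempt count, and the base delay are hypothetical; the PR does not give the actual values.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the creative agent."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on RateLimited, wait base_delay * 2**attempt and retry.

    The final failure is re-raised so the caller can record the error.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing sleep as a parameter lets a test substitute a recorder for time.sleep, so the delay sequence can be asserted without waiting.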

Test results (full suite, all 5 environments)

Suite	Passed	XFailed
unit	4,042	19
integration	1,773	39
bdd	822	642
e2e	82	0
admin	4	0
Total	6,723	700

Test plan

make quality passes (4,042 unit tests, 0 failures)
./run_all_tests.sh full suite — 6,723 passed, 0 failed
All 4 transports pass for every graduated BDD scenario
No new duplication violations
Regression test for assertion strength
642 xfailed scenarios tracked — each maps to unimplemented functionality

… formats)

Introduces a BDD behavioral test suite using pytest-bdd, covering two use cases with 108 Gherkin scenarios. Scenarios were derived from the AdCP specification, imported as .feature files, then wired to production code through step definitions and a multi-transport test harness.

Use cases covered:
- UC-004: Deliver media buy metrics (webhooks, retry/backoff, circuit breaker, HMAC auth, reporting dimensions, placement breakdowns)
- UC-005: Discover creative formats (format discovery, input/output filtering, sandbox isolation, format assets)

822 passing tests across 4 transports (impl, a2a, mcp, rest). 642 xfailed tests for functionality not yet implemented.

Other changes:
- tox.ini: add bdd to env_list, remove phantom integration_v2/ui entries
- Makefile: add test-bdd target, setup target, explicit --db/--stack flags
- CI: point integration-tests-v2 job at tests/integration/
- creative_agent_registry: broadened SDK fallback + 429 retry with backoff
- GetProductsRequest: add preferred_delivery_types (AdCP spec alignment)
- DRY: extract duplicated media buy test helpers to shared module
- pyasn1: upgrade 0.6.2 → 0.6.3 (GHSA-jr27-m4p2-rc6r)

Merges origin/develop which includes:
- adcp library upgrade from 3.6.0 to 3.10.0
- BDD behavioral test suite extraction (prebid#1146)
- Alembic migration graph fix (prebid#1144)
- Creative agent format validation skip (prebid#1137)
- Release 1.6.0 (prebid#1122)

Conflict resolution strategy:
- Production code: our params + develop's type annotations
- BDD/harness: our version (more complete), with develop's guards adopted
- uv.lock: develop's version (adcp 3.10)
- Created merge migration for forked alembic heads

…rn [] _fetch_formats_from_agent and _fetch_formats_raw_mcp had 5 paths that silently returned [] on anomalous responses (completed/data=None, submitted status, unexpected status, unparseable raw MCP response, empty content). This made failures indistinguishable from "agent up, genuinely 0 formats", causing products.py to reject all format IDs. Convert silent return [] to raises so list_all_formats_with_errors records errors and triggers graceful degradation. Cache poisoning (empty results cached for 1 hour) is fixed automatically since raises propagate before reaching the cache write. Also restores .duplication-baseline tests count from 84→111 (PR prebid#1143 merge accidentally reverted PR prebid#1146's bump). Fixes: prebid#1136
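
The reason raising also prevents cache poisoning can be sketched as follows. The function and exception names are hypothetical, the response shape is simplified to a dict, and the cache is a plain dict rather than the real one-hour cache.

```python
class AgentResponseError(Exception):
    """Hypothetical error type for an anomalous agent response."""


def fetch_formats(agent_url, call_agent, cache):
    """Return formats for an agent, caching only well-formed responses."""
    if agent_url in cache:
        return cache[agent_url]

    response = call_agent(agent_url)
    # Anomalous response: raise instead of returning [], so the caller can
    # tell "agent misbehaved" apart from "agent genuinely has 0 formats".
    if response.get("status") != "completed" or response.get("data") is None:
        raise AgentResponseError(f"anomalous response from {agent_url}")

    formats = response["data"].get("formats", [])
    # The cache write is only reached when no exception was raised above.
    cache[agent_url] = formats
    return formats
```

Because the raise happens before the cache write, a failed call leaves the cache untouched and the next call retries the agent. A well-formed response with zero formats is still cached as a legitimate empty result.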

…rn [] (#1167)

* fix: raise on anomalous empty format responses instead of silent return []

_fetch_formats_from_agent and _fetch_formats_raw_mcp had 5 paths that silently returned [] on anomalous responses (completed/data=None, submitted status, unexpected status, unparseable raw MCP response, empty content). This made failures indistinguishable from "agent up, genuinely 0 formats", causing products.py to reject all format IDs. Convert silent return [] to raises so list_all_formats_with_errors records errors and triggers graceful degradation. Cache poisoning (empty results cached for 1 hour) is fixed automatically since raises propagate before reaching the cache write. Also restores .duplication-baseline tests count from 84→111 (PR #1143 merge accidentally reverted PR #1146's bump). Fixes: #1136

* chore: bump cryptography 46.0.5 → 46.0.6 (CVE-2026-34073)

Fixes a bug where name constraints were not applied to peer names during verification when the leaf certificate contains a wildcard DNS SAN.

* refactor: remove dead isinstance(fmt, dict) checks in media_buy_create.py

Product format_ids from DB JSONB are always dicts; schema Product format_ids are always FormatId objects. Removed unreachable branches and used the correct access pattern for each data source.

* refactor: remove dead isinstance(fmt, dict) checks in products.py

Format parsing loops receive data from json.loads() which always returns dicts. Removed the dead isinstance guards from _parse_format_entries(), add_product() validation, and edit_product() validation. Also removed the test that exercised the dead branch with non-dict entries.

* fix: raise on anomalous empty responses in signals_agent_registry

Convert 3 silent `return []` paths to raises in _get_signals_from_agent:
- completed/data=None: raise ValueError (anomalous response)
- submitted/submitted=None: raise ValueError (missing async info)
- unexpected status: raise ValueError (unknown status)

The submitted-with-valid-webhook path (async) correctly returns [] unchanged. Same pattern as creative_agent_registry fix (PR #1167). Regression tests in test_signals_agent_silent_empty_bug.py. Beads: salesagent-9eu

* refactor: remove dead isinstance(fmt, dict) guards from products.py parsing loops

The format parsing loops in _parse_format_entries(), add_product(), and edit_product() receive data from json.loads() which always returns dicts. Removed the unreachable isinstance(fmt, dict) guards.

* refactor: DRY format entry parsing in products.py

Replaced duplicated format entry building loops in add_product() and edit_product() with calls to _parse_format_entries(), then filtering by valid_format_ids. This eliminates 32 lines of copy-pasted logic.

* fix: restore isinstance guard for corrupt format_ids in approval flow

The isinstance(fmt, dict) check in execute_approved_media_buy was not dead code: it protects against corrupt DB data (e.g. strings stored in format_ids JSONB column). Restores the guard with a descriptive ValueError instead of silently skipping. Fixes CI failure in test_invalid_format_unknown_type.

* refactor: replace isinstance dict-plucking with FormatId.model_validate

Replace manual dict key extraction and isinstance(fmt, dict) dispatch with FormatId.model_validate() in the approval flow and format validation. Pydantic handles type coercion, required-field checks, and validation errors, so no manual field plucking is needed. Adds HTTP(S) scheme check after model_validate since FormatId.agent_url is AnyUrl (accepts ftp://) but our business rule requires HTTP(S). Strips AnyUrl trailing-slash normalization to match agent registration.

* fix: replace ValueError with AdCPNotFoundError in format_resolver.py

get_format() now raises AdCPNotFoundError (HTTP 404) instead of generic ValueError when a format_id is not found. This aligns with the AdCP error hierarchy and allows callers to distinguish not-found errors from other validation failures.

* fix: replace ValueError/RuntimeError with AdCPAdapterError in agent registries

Creative and signals agent registries raised bare ValueError/RuntimeError for agent misbehavior (empty responses, unexpected status, unparseable MCP results). Callers only caught ValueError, letting RuntimeError escape unhandled. Both registries now raise AdCPAdapterError (502, transient) which correctly represents agent transport/protocol failures. Addresses PR #1167 review feedback (salesagent-mouh, salesagent-3hbx).

* fix: widen catch sites to handle AdCPError subclasses from get_format()

get_format_by_id in _base.py now catches AdCPNotFoundError (not just ValueError) and returns None. AdCPAdapterError still propagates, because a broken agent is not "format not found". GAM orders.py catches both AdCPNotFoundError and AdCPAdapterError to produce contextual error messages for all format lookup failures. Addresses PR #1167 review feedback (salesagent-044q).

* docs: add test architecture guide and update outdated README

tests/CLAUDE.md establishes the authoritative test architecture guide for AI agents (harness environments, factory-boy factories, transport dispatching, obligation tests) and explicitly calls out anti-patterns (session.add, get_db_session, dict factories) that exist in the codebase but must not be replicated. tests/README.md rewritten to reflect current state: tox-based runner, 5 test suites, harness system, BDD infrastructure, structural guards. Removed stale references to dict factories, nonexistent directories, and SQLite.

KonstantinMirin changed the title ~~fix: BDD assertion overhaul — eliminate false-green confidence from trivial Then steps~~ feat: introduce BDD behavioral test suite (UC-004, UC-005) with pytest-bdd Mar 18, 2026

KonstantinMirin changed the title ~~feat: introduce BDD behavioral test suite (UC-004, UC-005) with pytest-bdd~~ feat: introduce BDD behavioral test suite (delivery metrics, creative formats) Mar 18, 2026

KonstantinMirin force-pushed the feature/requirements-overhaul branch from 59a12b5 to 530cf25 Compare March 18, 2026 17:05

ChrisHuie requested review from ChrisHuie March 19, 2026 11:24

KonstantinMirin force-pushed the feature/requirements-overhaul branch from 530cf25 to 8a4f1c5 Compare March 19, 2026 17:40

ChrisHuie approved these changes Mar 19, 2026

ChrisHuie merged commit 7f0d45a into prebid:main Mar 19, 2026
19 of 21 checks passed

github-actions bot mentioned this pull request Mar 19, 2026

chore(main): release 1.7.0 #1147

Merged

KonstantinMirin mentioned this pull request Mar 28, 2026

fix: raise on anomalous empty format responses instead of silent return [] #1167

Merged

5 tasks

KonstantinMirin mentioned this pull request Apr 3, 2026

fix: prevent stale signup_flow from bypassing tenant access lookup #1173

Open

ChrisHuie merged 1 commit into prebid:main from KonstantinMirin:feature/requirements-overhaul
