feat: introduce BDD behavioral test suite (delivery metrics, creative formats) #1146

Merged
ChrisHuie merged 1 commit into prebid:main from KonstantinMirin:feature/requirements-overhaul
Mar 19, 2026

KonstantinMirin commented Mar 18, 2026

Summary

Introduces a BDD behavioral test suite using pytest-bdd, covering two use cases with 108 Gherkin scenarios. Scenarios were derived from the AdCP specification, imported as .feature files, then wired to production code through step definitions and a multi-transport test harness. Multiple rounds of assertion quality review ensured that every Then step verifies actual values — not just existence checks.

Test suite reorganization

The test suites were consolidated and restructured:

  • tests/integration_v2/ → merged into tests/integration/ (50 files renamed)
  • tests/ui/ → renamed to tests/admin/ (matches the tox env name)
  • tests/bdd/ — new BDD behavioral test suite
  • tox env_list fixed: unit, integration, e2e, admin, bdd (was missing bdd, had phantom integration_v2 and ui entries that silently no-op'd)

Test matrix

| Suite | Files | Tests | What it covers |
|---|---|---|---|
| unit | 299 | 4,042 | Fast isolated tests, mocked externals |
| integration | 166 | 1,773 | Real PostgreSQL, full request pipelines |
| bdd | 12 | 822 + 642 xfail | Gherkin behavioral scenarios, 4-transport parity |
| e2e | 21 | 82 | Full Docker stack (app + nginx + postgres) |
| admin | 2 | 4 | Admin UI template rendering |

Entity-scoped test runs

Tests are auto-tagged with entity markers by filename pattern. Run any entity across all suites:

make test-entity ENTITY=delivery       # all delivery tests (unit + integration + e2e + admin)
make test-entity ENTITY=creative       # all creative tests
make test-entity ENTITY=product        # all product tests

17 entities: delivery, creative, product, media_buy, tenant, auth, adapter, inventory, schema, admin, architecture, targeting, transport, workflow, policy, agent, infra
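
To illustrate the auto-tagging idea, here is a minimal sketch of mapping a test filename to entity markers. The function name `entity_markers_for`, the token-based matching rule, and the subset of entities shown are assumptions for illustration; the actual filename patterns used in the repository are not shown in this description.

```python
import re
from pathlib import PurePosixPath

# Hypothetical subset of the 17 entities listed above.
ENTITIES = ("delivery", "creative", "product", "media_buy", "tenant", "auth")


def entity_markers_for(test_path: str) -> list[str]:
    """Return the entities that appear as underscore-delimited tokens in the file stem."""
    stem = PurePosixPath(test_path).stem.lower()
    return [
        entity
        for entity in ENTITIES
        # Require "_" or a string boundary on both sides so that
        # "product" does not match "production".
        if re.search(rf"(?:^|_){re.escape(entity)}(?:_|$)", stem)
    ]
```

In a pytest `conftest.py`, a `pytest_collection_modifyitems` hook could call a function like this for each collected item and attach the result with `item.add_marker(...)`, which is one plausible way to get a single entity selector that works across suites.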

BDD suite details

UC-004 — Deliver media buy metrics (60 scenarios): Webhook delivery, retry/backoff, circuit breaker, HMAC authentication, reporting dimensions, placement breakdowns, date range partitioning, error handling.

UC-005 — Discover creative formats (48 scenarios): Format discovery, input/output filtering, production account sandbox isolation, format asset validation, empty-result handling.

Every scenario runs across 4 transports (impl, a2a, mcp, rest) — a test only graduates from @pending when all 4 transports pass. The 642 xfailed tests map to functionality not yet implemented.
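
The graduation rule can be sketched as a small predicate. This is only an illustration of the rule as stated; the names `can_graduate` and `failing_transports` and the shape of the results mapping are assumptions, not the harness's actual API.

```python
TRANSPORTS = ("impl", "a2a", "mcp", "rest")


def failing_transports(results: dict[str, bool]) -> list[str]:
    """Return transports that failed or have no recorded result."""
    return [t for t in TRANSPORTS if not results.get(t, False)]


def can_graduate(results: dict[str, bool]) -> bool:
    """A scenario leaves @pending only when all four transports pass."""
    return not failing_transports(results)
```

Treating a missing result as a failure keeps a scenario pending if one transport was never exercised, which matches the "all 4 transports pass" wording.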

Assertion quality

Every Then step was reviewed for assertion completeness. Specific fixes:

  • _assert_partition_or_boundary: per-field content dispatch instead of just checking "response" in ctx
  • then_hmac_computation: independent HMAC computation instead of circular re-use
  • then_field_true/false: navigate to package level via _find_field_in_response()
  • then_format_assets: check ALL formats, not just those with assets
  • then_no_real_api_calls: subsystem-aware (adapter + registry)
  • Regression test (test_bdd_partition_assertion_strength.py) locks this in
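
The `then_hmac_computation` fix above hinges on computing the expected signature independently of the code under test. A minimal sketch of that idea follows, assuming HMAC-SHA256 over the raw request body with a hex-encoded digest; the algorithm, the encoding, and both function names are assumptions, since the description does not specify them.

```python
import hashlib
import hmac


def expected_signature(secret: bytes, body: bytes) -> str:
    """Compute the signature from first principles, not via the code under test."""
    # Assumed scheme: HMAC-SHA256 over the raw body, hex-encoded.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def assert_signature_matches(secret: bytes, body: bytes, received: str) -> None:
    """Fail unless the received signature equals the independently computed one."""
    expected = expected_signature(secret, body)
    assert hmac.compare_digest(expected, received), (
        f"signature mismatch: expected {expected}, got {received}"
    )
```

A circular check would reuse the production signing helper to build the expected value, so a bug in that helper would cancel out on both sides of the comparison.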

7 structural guards enforce BDD code quality on every make quality run, including: no-pass steps, no trivial assertions, no dict registries, no duplicate steps, no silent env checks.

Other changes

  • creative_agent_registry: Broadened SDK fallback trigger + 429 retry with exponential backoff
  • GetProductsRequest: Added preferred_delivery_types field (AdCP spec alignment)
  • DRY: Extract duplicated _make_create_request/_get_tenant_dict to media_buy_helpers.py
  • CI workflow: Point integration-tests-v2 job at tests/integration/
  • pyasn1: Upgrade 0.6.2 → 0.6.3 (GHSA-jr27-m4p2-rc6r)
  • Tooling: Generic agentic workflows extracted to external plugin
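
For the 429 retry mentioned in the `creative_agent_registry` item above, a generic exponential-backoff loop might look like the following. This is a sketch under assumptions: `RateLimited` is a hypothetical stand-in for however the registry detects an HTTP 429, and the attempt count and delays are illustrative defaults rather than the values used in the PR.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def call_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the rate-limit error.
            sleep(base_delay * 2**attempt)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.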

Test results (full suite, all 5 environments)

| Suite | Passed | Failed | XFailed |
|---|---|---|---|
| unit | 4,042 | 0 | 19 |
| integration | 1,773 | 0 | 39 |
| bdd | 822 | 0 | 642 |
| e2e | 82 | 0 | 0 |
| admin | 4 | 0 | 0 |
| Total | 6,723 | 0 | 700 |

Test plan

  • make quality passes (4,042 unit tests, 0 failures)
  • ./run_all_tests.sh full suite — 6,723 passed, 0 failed
  • All 4 transports pass for every graduated BDD scenario
  • No new duplication violations
  • Regression test for assertion strength
  • 642 xfailed scenarios tracked — each maps to unimplemented functionality

KonstantinMirin changed the title from "fix: BDD assertion overhaul — eliminate false-green confidence from trivial Then steps" to "feat: introduce BDD behavioral test suite (UC-004, UC-005) with pytest-bdd" Mar 18, 2026
KonstantinMirin changed the title from "feat: introduce BDD behavioral test suite (UC-004, UC-005) with pytest-bdd" to "feat: introduce BDD behavioral test suite (delivery metrics, creative formats)" Mar 18, 2026
KonstantinMirin force-pushed the feature/requirements-overhaul branch from 59a12b5 to 530cf25 March 18, 2026 17:05
ChrisHuie requested review from ChrisHuie March 19, 2026 11:24
… formats)

Introduces a BDD behavioral test suite using pytest-bdd, covering two use
cases with 108 Gherkin scenarios. Scenarios were derived from the AdCP
specification, imported as .feature files, then wired to production code
through step definitions and a multi-transport test harness.

Use cases covered:
- UC-004: Deliver media buy metrics (webhooks, retry/backoff, circuit
  breaker, HMAC auth, reporting dimensions, placement breakdowns)
- UC-005: Discover creative formats (format discovery, input/output
  filtering, sandbox isolation, format assets)

822 passing tests across 4 transports (impl, a2a, mcp, rest).
642 xfailed tests for functionality not yet implemented.

Other changes:
- tox.ini: add bdd to env_list, remove phantom integration_v2/ui entries
- Makefile: add test-bdd target, setup target, explicit --db/--stack flags
- CI: point integration-tests-v2 job at tests/integration/
- creative_agent_registry: broadened SDK fallback + 429 retry with backoff
- GetProductsRequest: add preferred_delivery_types (AdCP spec alignment)
- DRY: extract duplicated media buy test helpers to shared module
- pyasn1: upgrade 0.6.2 → 0.6.3 (GHSA-jr27-m4p2-rc6r)
KonstantinMirin force-pushed the feature/requirements-overhaul branch from 530cf25 to 8a4f1c5 March 19, 2026 17:40
ChrisHuie merged commit 7f0d45a into prebid:main Mar 19, 2026
19 of 21 checks passed
KonstantinMirin added a commit to KonstantinMirin/prebid-salesagent that referenced this pull request Mar 21, 2026
Merges origin/develop which includes:
- adcp library upgrade from 3.6.0 to 3.10.0
- BDD behavioral test suite extraction (prebid#1146)
- Alembic migration graph fix (prebid#1144)
- Creative agent format validation skip (prebid#1137)
- Release 1.6.0 (prebid#1122)

Conflict resolution strategy:
- Production code: our params + develop's type annotations
- BDD/harness: our version (more complete), with develop's guards adopted
- uv.lock: develop's version (adcp 3.10)
- Created merge migration for forked alembic heads
KonstantinMirin added a commit to KonstantinMirin/prebid-salesagent that referenced this pull request Mar 28, 2026
…rn []

_fetch_formats_from_agent and _fetch_formats_raw_mcp had 5 paths that
silently returned [] on anomalous responses (completed/data=None,
submitted status, unexpected status, unparseable raw MCP response,
empty content). This made failures indistinguishable from "agent up,
genuinely 0 formats", causing products.py to reject all format IDs.

Convert silent return [] to raises so list_all_formats_with_errors
records errors and triggers graceful degradation. Cache poisoning
(empty results cached for 1 hour) is fixed automatically since
raises propagate before reaching the cache write.

Also restores .duplication-baseline tests count from 84→111 (PR prebid#1143
merge accidentally reverted PR prebid#1146's bump).

Fixes: prebid#1136
ChrisHuie pushed a commit that referenced this pull request Mar 30, 2026
…rn [] (#1167)

* fix: raise on anomalous empty format responses instead of silent return []

_fetch_formats_from_agent and _fetch_formats_raw_mcp had 5 paths that
silently returned [] on anomalous responses (completed/data=None,
submitted status, unexpected status, unparseable raw MCP response,
empty content). This made failures indistinguishable from "agent up,
genuinely 0 formats", causing products.py to reject all format IDs.

Convert silent return [] to raises so list_all_formats_with_errors
records errors and triggers graceful degradation. Cache poisoning
(empty results cached for 1 hour) is fixed automatically since
raises propagate before reaching the cache write.

Also restores .duplication-baseline tests count from 84→111 (PR #1143
merge accidentally reverted PR #1146's bump).

Fixes: #1136

* chore: bump cryptography 46.0.5 → 46.0.6 (CVE-2026-34073)

Fixes a bug where name constraints were not applied to peer names
during verification when the leaf certificate contains a wildcard
DNS SAN.

* refactor: remove dead isinstance(fmt, dict) checks in media_buy_create.py

Product format_ids from DB JSONB are always dicts; schema Product format_ids
are always FormatId objects. Removed unreachable branches and used the correct
access pattern for each data source.

* refactor: remove dead isinstance(fmt, dict) checks in products.py

Format parsing loops receive data from json.loads() which always returns dicts.
Removed the dead isinstance guards from _parse_format_entries(), add_product()
validation, and edit_product() validation. Also removed the test that exercised
the dead branch with non-dict entries.

* fix: raise on anomalous empty responses in signals_agent_registry

Convert 3 silent `return []` paths to raises in _get_signals_from_agent:
- completed/data=None: raise ValueError (anomalous response)
- submitted/submitted=None: raise ValueError (missing async info)
- unexpected status: raise ValueError (unknown status)

The submitted-with-valid-webhook path (async) correctly returns [] unchanged.

Same pattern as creative_agent_registry fix (PR #1167).
Regression tests in test_signals_agent_silent_empty_bug.py.

Beads: salesagent-9eu

* refactor: remove dead isinstance(fmt, dict) guards from products.py parsing loops

The format parsing loops in _parse_format_entries(), add_product(), and
edit_product() receive data from json.loads() which always returns dicts.
Removed the unreachable isinstance(fmt, dict) guards.

* refactor: DRY format entry parsing in products.py

Replaced duplicated format entry building loops in add_product() and
edit_product() with calls to _parse_format_entries(), then filtering
by valid_format_ids. This eliminates 32 lines of copy-pasted logic.

* fix: restore isinstance guard for corrupt format_ids in approval flow

The isinstance(fmt, dict) check in execute_approved_media_buy was not
dead code — it protects against corrupt DB data (e.g. strings stored
in format_ids JSONB column). Restores the guard with a descriptive
ValueError instead of silently skipping.

Fixes CI failure in test_invalid_format_unknown_type.

* refactor: replace isinstance dict-plucking with FormatId.model_validate

Replace manual dict key extraction and isinstance(fmt, dict) dispatch
with FormatId.model_validate() in the approval flow and format
validation. Pydantic handles type coercion, required-field checks,
and validation errors — no manual field plucking needed.

Adds HTTP(S) scheme check after model_validate since FormatId.agent_url
is AnyUrl (accepts ftp://) but our business rule requires HTTP(S).
Strips AnyUrl trailing-slash normalization to match agent registration.
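
The two follow-up rules described here (HTTP(S) only, no trailing slash) can be illustrated with the standard library alone. This sketch deliberately does not reproduce `FormatId.model_validate`; the function name `normalize_agent_url` is hypothetical.

```python
from urllib.parse import urlsplit


def normalize_agent_url(agent_url: str) -> str:
    """Reject non-HTTP(S) URLs and strip any trailing slash."""
    scheme = urlsplit(agent_url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(
            f"agent_url must use HTTP(S), got {scheme or 'no scheme'!r}"
        )
    # Undo trailing-slash normalization so the value matches the registered URL.
    return agent_url.rstrip("/")
```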

* fix: replace ValueError with AdCPNotFoundError in format_resolver.py

get_format() now raises AdCPNotFoundError (HTTP 404) instead of generic
ValueError when a format_id is not found. This aligns with the AdCP error
hierarchy and allows callers to distinguish not-found errors from other
validation failures.

* fix: replace ValueError/RuntimeError with AdCPAdapterError in agent registries

Creative and signals agent registries raised bare ValueError/RuntimeError
for agent misbehavior (empty responses, unexpected status, unparseable MCP
results). Callers only caught ValueError, letting RuntimeError escape
unhandled. Both registries now raise AdCPAdapterError (502, transient)
which correctly represents agent transport/protocol failures.

Addresses PR #1167 review feedback (salesagent-mouh, salesagent-3hbx).

* fix: widen catch sites to handle AdCPError subclasses from get_format()

get_format_by_id in _base.py now catches AdCPNotFoundError (not just
ValueError) and returns None. AdCPAdapterError still propagates — a
broken agent is not "format not found".

GAM orders.py catches both AdCPNotFoundError and AdCPAdapterError to
produce contextual error messages for all format lookup failures.

Addresses PR #1167 review feedback (salesagent-044q).
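
The catch-site behavior described in these commits (not-found becomes None, adapter failures propagate) can be sketched as follows. The exception classes below are simplified stand-ins that reuse the names and status codes from the commit messages; the `status_code` attribute, the `registry` parameter, and the function bodies are assumptions for illustration only.

```python
from typing import Optional


class AdCPError(Exception):
    """Stand-in base class for the AdCP error hierarchy."""


class AdCPNotFoundError(AdCPError):
    status_code = 404


class AdCPAdapterError(AdCPError):
    status_code = 502  # Transient agent transport/protocol failure.


def get_format(format_id: str, registry: Optional[dict]) -> dict:
    """Look up a format; registry=None simulates a broken agent."""
    if registry is None:
        raise AdCPAdapterError("agent returned an anomalous response")
    try:
        return registry[format_id]
    except KeyError:
        raise AdCPNotFoundError(f"format {format_id!r} not found") from None


def get_format_by_id(format_id: str, registry: Optional[dict]) -> Optional[dict]:
    """Map not-found to None; let AdCPAdapterError propagate."""
    try:
        return get_format(format_id, registry)
    except (AdCPNotFoundError, ValueError):
        return None
```

Catching only the not-found case keeps "this format does not exist" distinct from "the agent is broken", which is the distinction the commit message calls out.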

* docs: add test architecture guide and update outdated README

tests/CLAUDE.md establishes the authoritative test architecture guide
for AI agents — harness environments, factory-boy factories, transport
dispatching, obligation tests, and explicitly calls out anti-patterns
(session.add, get_db_session, dict factories) that exist in the codebase
but must not be replicated.

tests/README.md rewritten to reflect current state: tox-based runner,
5 test suites, harness system, BDD infrastructure, structural guards.
Removed stale references to dict factories, nonexistent directories,
and SQLite.
KonstantinMirin added a commit to KonstantinMirin/prebid-salesagent that referenced this pull request Mar 30, 2026
…rn []
