feat(chaos): audit trail for injection events (#44) #245

Open
mvillmow wants to merge 1 commit into main from 44-chaos-audit-trail

Summary

Closes #44. Adds ChaosAuditLog, a header-only structured JSON-lines audit trail that records every chaos fault injection and removal so chaos test outcomes are no longer ephemeral.

Each record captures:

  • timestamp: ISO-8601 UTC with millisecond precision
  • action: inject | remove
  • fault_type: network-partition, latency, kill, queue-starve, ...
  • fault_id: id returned by Agamemnon
  • target: Agamemnon base URL the fault was sent to
  • status: HTTP status returned by Agamemnon (0 on transport failure)
  • requester: CHAOS_AUDIT_REQUESTER env, falling back to $USER, then unknown
  • details: verbatim Agamemnon response body
  • schema_version: 1
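
To make the envelope concrete, here is a minimal sketch of how one record could be serialised as a single JSON line. It is not the code from chaos_audit.hpp: the names AuditRecord, iso8601_utc_ms, json_escape and to_json_line are hypothetical, the field order simply follows the list above, and it assumes a POSIX host (gmtime_r) and that details is stored as an escaped JSON string.

```cpp
#include <chrono>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical record type mirroring the fields listed above.
struct AuditRecord {
  std::string timestamp, action, fault_type, fault_id, target;
  int status = 0;  // 0 means transport failure
  std::string requester, details;
};

// Format a time point as ISO-8601 UTC with millisecond precision.
// Assumes a time point at or after the Unix epoch.
inline std::string iso8601_utc_ms(std::chrono::system_clock::time_point tp) {
  using namespace std::chrono;
  const auto since_epoch = duration_cast<milliseconds>(tp.time_since_epoch());
  const auto whole_secs = duration_cast<seconds>(since_epoch);
  const std::time_t t = static_cast<std::time_t>(whole_secs.count());
  const int ms = static_cast<int>((since_epoch - whole_secs).count());
  std::tm utc{};
  gmtime_r(&t, &utc);  // POSIX, thread-safe variant of gmtime
  char buf[64];
  std::snprintf(buf, sizeof buf, "%04d-%02d-%02dT%02d:%02d:%02d.%03dZ",
                utc.tm_year + 1900, utc.tm_mon + 1, utc.tm_mday,
                utc.tm_hour, utc.tm_min, utc.tm_sec, ms);
  return buf;
}

// Escape a string so it can sit inside a JSON string literal.
// Newlines must be escaped or the record would span several lines.
inline std::string json_escape(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {  // remaining control characters
          char esc[8];
          std::snprintf(esc, sizeof esc, "\\u%04x", static_cast<unsigned>(c));
          out += esc;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Serialise one record as a single JSON object without a trailing newline.
inline std::string to_json_line(const AuditRecord& r) {
  auto field = [](const char* key, const std::string& value) {
    return "\"" + std::string(key) + "\":\"" + json_escape(value) + "\",";
  };
  return "{" + field("timestamp", r.timestamp) + field("action", r.action) +
         field("fault_type", r.fault_type) + field("fault_id", r.fault_id) +
         field("target", r.target) +
         "\"status\":" + std::to_string(r.status) + "," +
         field("requester", r.requester) + field("details", r.details) +
         "\"schema_version\":1}";
}
```

Escaping the response body keeps each record on one line, which is what lets a line-oriented tailer treat every line as one event.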

Configuration

| Env var | Default | Purpose |
| --- | --- | --- |
| `CHAOS_AUDIT_LOG` | stderr | Destination for the audit trail. `-`, `stderr`, or unset emit to stderr. Any other value is treated as a file path, and records are appended to that file. |
| `CHAOS_AUDIT_REQUESTER` | `$USER` or `unknown` | Identity recorded in each record's requester field. |

If the requested file path cannot be opened the class falls back to stderr with a one-line warning rather than crashing the test process — audit emission is best-effort telemetry and must never break a chaos test. Writes are mutex-serialised, so one instance is safe across threads.
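
The behaviour described above (destination selection, stderr fallback, requester lookup, serialised writes) can be sketched roughly as follows. This is an illustration, not the PR's ChaosAuditLog: AuditSink, resolve_requester and audit_path_from_env are hypothetical names, and treating an empty environment variable the same as an unset one is an assumption.

```cpp
#include <cstdlib>
#include <fstream>
#include <initializer_list>
#include <iostream>
#include <mutex>
#include <string>

// Requester identity: CHAOS_AUDIT_REQUESTER, then USER, then "unknown".
// Assumption: an empty value is treated like an unset variable.
inline std::string resolve_requester() {
  for (const char* name : {"CHAOS_AUDIT_REQUESTER", "USER"}) {
    const char* value = std::getenv(name);
    if (value != nullptr && *value != '\0') return value;
  }
  return "unknown";
}

// Destination from CHAOS_AUDIT_LOG; an empty string means "not set".
inline std::string audit_path_from_env() {
  const char* value = std::getenv("CHAOS_AUDIT_LOG");
  return value != nullptr ? value : "";
}

// Hypothetical sink: appends to a file when one can be opened,
// otherwise writes to stderr. It never throws on a bad path.
class AuditSink {
 public:
  explicit AuditSink(const std::string& path) {
    if (path.empty() || path == "-" || path == "stderr") return;
    file_.open(path, std::ios::app);
    if (!file_.is_open()) {
      std::cerr << "chaos-audit: cannot open '" << path
                << "', falling back to stderr\n";
    }
  }

  bool to_file() const { return file_.is_open(); }

  // One lock per record so lines from different threads never interleave.
  void write_line(const std::string& line) {
    std::lock_guard<std::mutex> lock(mu_);
    std::ostream& out =
        file_.is_open() ? static_cast<std::ostream&>(file_) : std::cerr;
    out << line << '\n';
    out.flush();
  }

 private:
  std::ofstream file_;
  std::mutex mu_;
};
```

A caller would construct it once, for example `AuditSink sink(audit_path_from_env());`, and share that instance across threads.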

Why a file-based JSONL log (not NATS)?

The issue body asks for persistence "to stderr or a dedicated file". A header-only JSONL emitter:

  • Captures the audit gap immediately (timestamp + environment + fault) with zero new infrastructure.
  • Plays nicely with the existing hi.logs.> Promtail pipeline by tailing the file (or capturing stderr from the test runner) — no nats.c FetchContent needed.
  • Stays well inside the MEDIUM scope cap (6 files, ~340 LOC added).

A future ticket can layer NATS publishing on top by reusing the same envelope.

Changes

  • include/projectcharybdis/chaos_audit.hpp — new header-only ChaosAuditLog.
  • test/src/test_chaos_audit_unit.cpp — 6 unit tests covering envelope shape, inject vs remove, multi-line ordering, and stderr-fallback paths.
  • test/src/test_chaos_api.cpp — E01–E04 routed through inject() / remove_fault() helpers that emit audit records around every Agamemnon API call.
  • cmake/SourcesAndHeaders.cmake, test/CMakeLists.txt — wire the new header and unit test into the build.
  • README.md — documents CHAOS_AUDIT_LOG and CHAOS_AUDIT_REQUESTER.
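
As a rough picture of what "helpers that emit audit records around every Agamemnon API call" can look like, the sketch below wraps an arbitrary call and records its outcome. It is not the code in test_chaos_api.cpp: HttpResult, AuditEvent, AuditFn and audited_call are hypothetical, and signalling a transport failure with an exception is an assumption (the PR only states that status is 0 in that case).

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result of one HTTP call to Agamemnon.
struct HttpResult {
  int status = 0;
  std::string body;
};

// Hypothetical subset of the audit envelope, enough for this sketch.
struct AuditEvent {
  std::string action;      // "inject" or "remove"
  std::string fault_type;
  int status;
  std::string details;
};

using AuditFn = std::function<void(const AuditEvent&)>;

// Run `call`, then emit exactly one audit event describing the outcome.
// Assumption: a transport failure surfaces as an exception and is
// recorded with status 0 instead of propagating into the test.
template <typename Call>
HttpResult audited_call(const std::string& action,
                        const std::string& fault_type,
                        const AuditFn& audit, Call&& call) {
  HttpResult result;
  try {
    result = call();
  } catch (const std::exception& e) {
    result.status = 0;
    result.body = e.what();
  }
  audit(AuditEvent{action, fault_type, result.status, result.body});
  return result;
}
```

An inject() helper would pass "inject" plus the call that creates the fault, and remove_fault() would pass "remove" plus the call that deletes it, so both paths share one emission point.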

Test plan

  • Local build with conan + cmake: ProjectCharybdis_tests and ProjectCharybdis_integration_tests both link cleanly. clang-tidy and cppcheck were disabled locally because of a pre-existing stddef.h toolchain issue on this host; the issue is unrelated to this PR and also affects the unchanged src/http_test_client.cpp on main.
  • All 6 new unit tests pass.
  • No regression in the 40 pre-existing unit tests (the one failing test, HttpTestClientUnit.MalformedUrlFallsBackToDefaults, fails on main too on hosts where something happens to listen on localhost:8080).
  • CI: full cmake --preset ci build with -Werror, clang-tidy, cppcheck.

🤖 Generated with Claude Code

Closes #44

Adds `ChaosAuditLog`, a header-only structured JSON-lines audit trail that
records every chaos fault injection and removal so chaos test outcomes are
no longer ephemeral. Each record captures timestamp (ISO-8601 UTC ms),
action (inject|remove), fault_type, fault_id, target Agamemnon URL, HTTP
status, requester identity, and the verbatim Agamemnon response body.

Destination is controlled by `CHAOS_AUDIT_LOG` (file path; `-`/`stderr`/
unset emit to stderr). Requester identity comes from `CHAOS_AUDIT_REQUESTER`,
falling back to `$USER` then "unknown". Emission is best-effort: a bad
audit-log path falls back to stderr with a warning rather than failing the
test, and writes are mutex-serialised so a single instance is safe across
threads.

`test_chaos_api.cpp` (E01-E04) is wired through `inject()` / `remove_fault()`
helpers that emit audit records around every Agamemnon API call.

Six new unit tests in `test_chaos_audit_unit.cpp` exercise envelope shape,
inject/remove distinction, multi-line ordering, and stderr-fallback paths
without requiring Agamemnon or NATS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mvillmow enabled auto-merge (squash) May 14, 2026 03:10

Development

Successfully merging this pull request may close these issues.

[MINOR] §15: No audit trail for chaos injection events
