-
Notifications
You must be signed in to change notification settings - Fork 2.2k
chore(tests): Antithesis Vector end-to-end ack scenario #25571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| /.launch/ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # syntax=docker/dockerfile:1 | ||
|
|
||
| ARG SANCOV_RUSTFLAGS='["-Cpasses=sancov-module","-Cllvm-args=-sanitizer-coverage-level=3","-Cllvm-args=-sanitizer-coverage-trace-pc-guard","-Clink-args=-Wl,--build-id"]' | ||
|
|
||
| ############################ | ||
| # Vector SUT — build stage # | ||
| ############################ | ||
| FROM rust:1.92-bookworm AS vector-build | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| protobuf-compiler cmake perl build-essential pkg-config libssl-dev clang binutils \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| WORKDIR /src | ||
| COPY Cargo.toml Cargo.lock rust-toolchain.toml build.rs ./ | ||
| COPY lib ./lib | ||
| COPY src ./src | ||
| COPY proto ./proto | ||
| COPY benches ./benches | ||
| COPY tests ./tests | ||
| COPY vdev ./vdev | ||
| COPY scripts ./scripts | ||
| ARG SANCOV_RUSTFLAGS | ||
| # debug=true keeps DWARF for /symbols. lto=false keeps sancov instrumentation | ||
| # predictable and stops the optimizer from dropping the force-linked runtime. | ||
| RUN --mount=type=cache,target=/usr/local/cargo/registry \ | ||
| --mount=type=cache,target=/src/target \ | ||
| cargo build --release --no-default-features \ | ||
| --features "sources-http_server,sinks-http,sources-internal_metrics,sinks-prometheus,antithesis-scenario-memory" \ | ||
| --bin vector \ | ||
| --config 'build.target = "x86_64-unknown-linux-gnu"' \ | ||
| --config 'profile.release.debug = true' \ | ||
| --config 'profile.release.lto = false' \ | ||
| --config "target.x86_64-unknown-linux-gnu.rustflags = ${SANCOV_RUSTFLAGS}" \ | ||
| && cp target/x86_64-unknown-linux-gnu/release/vector /usr/local/bin/vector \ | ||
| && echo "validating instrumentation symbols..." \ | ||
| && nm /usr/local/bin/vector | grep -q __sanitizer_cov_trace_pc_guard \ | ||
| && nm /usr/local/bin/vector | grep -q antithesis_load_libvoidstar \ | ||
| && echo "instrumentation OK" | ||
|
|
||
| ################################## | ||
| # Harness (workload) — build stage | ||
| ################################## | ||
| # The workload binaries live in the shared `harness` crate, a member of Vector's | ||
| # workspace, so the build needs the workspace root and member manifests. `-p` | ||
| # compiles only the harness bins. | ||
| FROM rust:1.92-bookworm AS workload-build | ||
| RUN apt-get update && apt-get install -y --no-install-recommends binutils \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| WORKDIR /src | ||
| COPY Cargo.toml Cargo.lock rust-toolchain.toml build.rs ./ | ||
| COPY lib ./lib | ||
| COPY src ./src | ||
| COPY proto ./proto | ||
| COPY benches ./benches | ||
| COPY tests ./tests | ||
| COPY vdev ./vdev | ||
| COPY scripts ./scripts | ||
| ARG SANCOV_RUSTFLAGS | ||
| # debug=true keeps DWARF for /symbols. | ||
| RUN --mount=type=cache,target=/usr/local/cargo/registry \ | ||
| --mount=type=cache,target=/src/target \ | ||
| cargo build --release -p antithesis-harness \ | ||
| --config 'build.target = "x86_64-unknown-linux-gnu"' \ | ||
| --config 'profile.release.debug = true' \ | ||
| --config "target.x86_64-unknown-linux-gnu.rustflags = ${SANCOV_RUSTFLAGS}" \ | ||
| && D=target/x86_64-unknown-linux-gnu/release \ | ||
| && cp "$D/oracle" "$D/parallel_driver_produce" "$D/eventually_conservation" /usr/local/bin/ \ | ||
| && echo "validating instrumentation symbols..." \ | ||
| && nm /usr/local/bin/oracle | grep -q __sanitizer_cov_trace_pc_guard \ | ||
| && nm /usr/local/bin/oracle | grep -q antithesis_load_libvoidstar \ | ||
| && echo "instrumentation OK" | ||
|
|
||
| ##################################### | ||
| # Runtime: Vector SUT (the one node) # | ||
| ##################################### | ||
| FROM debian:stable-slim AS vector | ||
| RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| COPY --from=vector-build /usr/local/bin/vector /usr/bin/vector | ||
| # Bake the node config plus its benign alternate, which the reload fault swaps in | ||
| # to force a sink rebuild. | ||
| COPY tests/antithesis/scenarios/vector_e2e/vector.yaml /etc/vector/vector.yaml | ||
| COPY tests/antithesis/scenarios/vector_e2e/vector.b.yaml /etc/vector/vector.b.yaml | ||
| # The reload fault is an anytime_ test command that runs IN the node container. | ||
| # The node stays running because its entrypoint is Vector, not a test command. | ||
| COPY --chmod=755 tests/antithesis/scenarios/vector_e2e/anytime_reload.sh /opt/antithesis/test/v1/ve2e/anytime_reload | ||
| RUN mkdir -p /symbols && ln -s /usr/bin/vector /symbols/vector | ||
| ENV NO_COLOR=1 | ||
| EXPOSE 8080 9598 | ||
| ENTRYPOINT ["/usr/bin/vector"] | ||
|
|
||
| ################################### | ||
| # Runtime: workload (oracle + cmds) # | ||
| ################################### | ||
| FROM debian:stable-slim AS workload | ||
| RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| # The oracle is the entrypoint. The two test-command binaries are the test commands: | ||
| # drop them straight into the test template, named by their Antithesis prefix. | ||
| COPY --from=workload-build /usr/local/bin/oracle /usr/bin/oracle | ||
| COPY --from=workload-build /usr/local/bin/parallel_driver_produce /opt/antithesis/test/v1/ve2e/parallel_driver_produce | ||
| COPY --from=workload-build /usr/local/bin/eventually_conservation /opt/antithesis/test/v1/ve2e/eventually_conservation | ||
| # Symbolize all three instrumented binaries: the oracle entrypoint and both | ||
| # test-command bins. A crash in any of them must resolve against /symbols. | ||
| RUN mkdir -p /symbols \ | ||
| && ln -s /usr/bin/oracle /symbols/oracle \ | ||
| && ln -s /opt/antithesis/test/v1/ve2e/parallel_driver_produce /symbols/parallel_driver_produce \ | ||
| && ln -s /opt/antithesis/test/v1/ve2e/eventually_conservation /symbols/eventually_conservation | ||
| ENV NO_COLOR=1 | ||
| # The oracle waits for the SUT, emits setup_complete via the SDK, then serves so | ||
| # Antithesis runs the test commands in this container. | ||
| ENTRYPOINT ["/usr/bin/oracle"] | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| # vector_e2e | ||
|
|
||
| The no-disk counterpart of `vector_to_vector_e2e_disk`. Same two properties, one | ||
| Vector process, memory buffer instead of `disk_v2`. | ||
|
|
||
| **Conservation**: every event the oracle acked must eventually come back, across | ||
| arbitrary Antithesis faults. Duplicates are allowed because the contract is | ||
| at-least-once. A missing acked id is the bug. | ||
|
|
||
| **Liveness**: once faults stop, the node must still accept fresh writes. | ||
|
|
||
| ## Why a single process | ||
|
|
||
| Vector's end-to-end acknowledgements are in-process: a source holds the client's | ||
| ack until every sink that received the event has finished. So a single node is the | ||
| honest place to test what an e2e ack promises. The producer POSTs to the node's | ||
| `http_server` source; the 200 comes back only once the `http` sink has delivered | ||
| the event to the oracle and the oracle returned 2xx. That means an acked id has | ||
| **already** reached the oracle — which is why conservation can hold even though the | ||
| node has no disk buffer and a crash drops whatever is still in memory: those | ||
| in-flight events were never acked, so they were never an obligation. | ||
|
|
||
| ## How it works | ||
|
|
||
| One Vector node and one oracle container. | ||
|
|
||
| - **vector** takes an `http_server` source (`:8080`) and delivers over `http` to | ||
| the oracle through an in-memory buffer with `when_full: block` and e2e acks. It | ||
| also exposes Prometheus metrics (`:9598`) for the health gate, and runs the | ||
| reload fault: an `anytime_` command swaps `vector.yaml`/`vector.b.yaml` and sends | ||
| `SIGHUP`, forcing the sink to rebuild mid-run. | ||
| - **oracle** (`:8686`) is one container that injects unique event ids at the node | ||
| and runs the HTTP endpoint the node's sink delivers back to. | ||
|
|
||
| The oracle keeps its id sets in memory and Antithesis never terminates it, so the | ||
| faults under test cannot corrupt the judge. The workload binaries (`oracle`, | ||
| `parallel_driver_produce`, `eventually_conservation`) are the shared, buffer- | ||
| agnostic bins from `tests/antithesis/harness`, pointed at this topology by the | ||
| environment in `docker-compose.yaml`. | ||
|
|
||
| ## Run | ||
|
|
||
| Validate the config locally: | ||
|
|
||
| ```bash | ||
| cd tests/antithesis | ||
| docker compose -f scenarios/vector_e2e/docker-compose.yaml build | ||
| snouty validate scenarios/vector_e2e | ||
| ``` | ||
|
|
||
| Submit a run through the shared launcher, which pins the fault profile (see | ||
| `tests/antithesis/AGENTS.md`): | ||
|
|
||
| ```bash | ||
| cd tests/antithesis/scenarios | ||
| ./launch.sh vector_e2e | ||
|
blt marked this conversation as resolved.
|
||
| ``` | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| set -euo pipefail | ||
| [ -n "${VECTOR_CONFIG_ALT:-}" ] || exit 0 | ||
| cfg="${VECTOR_CONFIG:?}" | ||
| alt="${VECTOR_CONFIG_ALT:?}" | ||
|
|
||
| # Vector only ever reads $cfg, so reload alternates $cfg between two immutable | ||
| # sources rather than swapping two live files. The alternate $alt is never | ||
| # written, and the baseline (the original $cfg) is snapshotted once, so the only | ||
| # mutable file is $cfg and the only writes to it are a single rename of a fully | ||
| # written temp. The node-termination fault can therefore interrupt this script at | ||
| # any point and leave $cfg as one complete config or the other, never half-written | ||
| # and never collapsed so both sources hold the same content. Alternation always | ||
| # resumes on the next invocation. | ||
| base="$cfg.orig" | ||
| if [ ! -f "$base" ]; then | ||
| cp "$cfg" "$base.tmp" | ||
| mv "$base.tmp" "$base" | ||
| fi | ||
|
|
||
| # Pick whichever source is not currently live. cksum reads from stdin so its | ||
| # output is the checksum alone, with no filename to differ on. | ||
| if [ "$(cksum <"$cfg")" = "$(cksum <"$alt")" ]; then | ||
| next="$base" | ||
| else | ||
| next="$alt" | ||
| fi | ||
| cp "$next" "$cfg.tmp" | ||
| mv "$cfg.tmp" "$cfg" | ||
|
|
||
| # Vector is PID 1 in the node container. SIGHUP triggers reload-from-disk. | ||
| kill -HUP 1 | ||
| sleep 5 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| name: vector-e2e | ||
|
|
||
| x-vector-build: &vector-build | ||
| context: ../../../.. | ||
| dockerfile: tests/antithesis/scenarios/vector_e2e/Dockerfile | ||
| target: vector | ||
|
|
||
| x-node-health: &node-health | ||
| test: ["CMD", "curl", "-fsS", "http://localhost:9598/metrics"] | ||
| interval: 5s | ||
| timeout: 3s | ||
| retries: 30 | ||
| start_period: 10s | ||
|
|
||
| services: | ||
| vector: | ||
| container_name: vector | ||
| hostname: vector | ||
| platform: linux/amd64 | ||
| init: true | ||
| build: *vector-build | ||
| image: ve2e-vector:${ANTITHESIS_IMAGE_TAG:-dev} | ||
| entrypoint: ["/usr/bin/vector", "--config", "/etc/vector/vector.yaml"] | ||
| # vector runs the reload fault: VECTOR_CONFIG_ALT lets anytime_reload swap | ||
| # configs and SIGHUP, forcing the sink to rebuild. No disk buffer, so no volume. | ||
| environment: | ||
| NO_COLOR: "1" | ||
| VECTOR_CONFIG: "/etc/vector/vector.yaml" | ||
| VECTOR_CONFIG_ALT: "/etc/vector/vector.b.yaml" | ||
| healthcheck: *node-health | ||
|
|
||
| oracle: | ||
| container_name: oracle | ||
| hostname: oracle | ||
| platform: linux/amd64 | ||
| init: true | ||
| build: | ||
| context: ../../../.. | ||
| dockerfile: tests/antithesis/scenarios/vector_e2e/Dockerfile | ||
| target: workload | ||
| image: ve2e-oracle:${ANTITHESIS_IMAGE_TAG:-dev} | ||
| environment: | ||
| NO_COLOR: "1" | ||
| SCENARIO_NAME: "vector_e2e" | ||
| VECTOR_SOURCE_URL: "http://vector:8080/" | ||
| VECTOR_METRICS_URL: "http://vector:9598/metrics" | ||
| VECTOR_METRICS_URLS: "http://vector:9598/metrics" | ||
| # Test commands run in this container, so they reach the oracle locally. | ||
| ORACLE_URL: "http://127.0.0.1:8686" | ||
| depends_on: | ||
| vector: { condition: service_healthy } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| # Per-scenario settings for ../launch.sh. Only the bits that differ from every | ||
| # other scenario live here; the launcher supplies the rest. | ||
| SCENARIO_TEST_NAME=vector_e2e | ||
| SCENARIO_DESCRIPTION="memory-buffer e2e conservation under crash/hang/throttle of the node" | ||
| SCENARIO_FAULT_NODES="vector" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| sources: | ||
| in: | ||
| type: http_server | ||
| address: 0.0.0.0:8080 | ||
| decoding: | ||
| codec: json | ||
| acknowledgements: | ||
| enabled: true | ||
|
|
||
| metrics: | ||
| type: internal_metrics | ||
| scrape_interval_secs: 1 | ||
|
|
||
| sinks: | ||
| out: | ||
| type: http | ||
| inputs: [in] | ||
| uri: http://oracle:8686/ingest | ||
| method: post | ||
| encoding: | ||
| codec: json | ||
| # Benign alternate the reload fault swaps in. It differs from vector.yaml only | ||
| # by an explicit request timeout, enough to make the reload rebuild the sink. | ||
| request: | ||
| timeout_secs: 45 | ||
| buffer: | ||
| type: memory | ||
| max_events: 500 | ||
| when_full: block | ||
| acknowledgements: | ||
| enabled: true | ||
|
|
||
| prom: | ||
| type: prometheus_exporter | ||
| inputs: [metrics] | ||
| address: 0.0.0.0:9598 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| sources: | ||
| in: | ||
| type: http_server | ||
| address: 0.0.0.0:8080 | ||
| decoding: | ||
| codec: json | ||
| acknowledgements: | ||
| enabled: true | ||
|
|
||
| metrics: | ||
| type: internal_metrics | ||
| scrape_interval_secs: 1 | ||
|
|
||
| sinks: | ||
| out: | ||
| type: http | ||
| inputs: [in] | ||
| uri: http://oracle:8686/ingest | ||
| method: post | ||
| encoding: | ||
| codec: json | ||
| # Memory buffer, no disk: this is the no-disk counterpart of the disk scenario. | ||
| # when_full: block keeps the same backpressure so the source applies it to the | ||
| # client instead of dropping. With end-to-end acks the 200 to the producer | ||
| # fires only once this sink has delivered to the oracle, so an acked event has | ||
| # already arrived — that is why conservation can hold even though a crash loses | ||
| # whatever is still in this in-memory buffer. | ||
| buffer: | ||
| type: memory | ||
| max_events: 500 | ||
| when_full: block | ||
| acknowledgements: | ||
| enabled: true | ||
|
|
||
| prom: | ||
| type: prometheus_exporter | ||
| inputs: [metrics] | ||
| address: 0.0.0.0:9598 |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.