Real-time log compression for agents.
codag-drain adapts Drain3 into a streaming
log-templating engine. It collapses high-volume, repetitive log lines into a
compact set of template groups — each with a few raw samples and slot summaries —
so an agent can take in a large, noisy log window without spending its context
budget on near-duplicate lines.
That is the whole job. It is the compression layer behind codag wrap: turn a
raw log stream into a small, agent-readable artifact, in real time, as lines
arrive. What reads that artifact decides what it means.
drain3_rust/ Rust Drain3 implementation used as the base algorithm
codag-drain/ deterministic templating library and CLI
examples/server/ reference HTTP host for long-lived wrapping sessions
docs/ design and evaluation notes
The default GrouperKind::Drain is Drain-style positional similarity with one
codag adaptation:
- normal logs use Drain3-compatible whitespace tokenization and default masking;
- compact punctuation-heavy one-token logs, such as compact JSON, use a generic character-class tokenizer so Drain still has token positions to compare;
- output rendering is codag-specific: template count, samples, and slot summaries.
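The second bullet can be illustrated with a rough sketch. The character classes, the "punctuation-heavy" test, and the names `classify`, `char_class_tokens`, and `tokenize` are all assumptions made for this example; the actual rules in codag-drain may differ. The point is only that a whitespace-free line such as compact JSON still yields multiple token positions.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum Class {
    Word,  // letters, digits, underscore
    Space, // whitespace
    Punct, // everything else
}

fn classify(c: char) -> Class {
    if c.is_alphanumeric() || c == '_' {
        Class::Word
    } else if c.is_whitespace() {
        Class::Space
    } else {
        Class::Punct
    }
}

/// Split into runs of word characters, with each punctuation
/// character as its own token. Whitespace is dropped.
fn char_class_tokens(line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start = 0;
    let mut prev: Option<Class> = None;
    for (i, c) in line.char_indices() {
        let class = classify(c);
        // A token ends when the class changes, or before every punctuation character.
        let boundary = match prev {
            Some(p) => p != class || class == Class::Punct,
            None => false,
        };
        if boundary {
            if prev != Some(Class::Space) {
                tokens.push(&line[start..i]);
            }
            start = i;
        }
        prev = Some(class);
    }
    if start < line.len() && prev != Some(Class::Space) {
        tokens.push(&line[start..]);
    }
    tokens
}

/// Whitespace tokens for normal logs; character-class tokens when the
/// whole line is one token with a lot of punctuation.
/// The cutoff of 4 punctuation characters is an illustrative assumption.
fn tokenize(line: &str) -> Vec<&str> {
    let ws: Vec<&str> = line.split_whitespace().collect();
    let punct = line.chars().filter(|c| classify(*c) == Class::Punct).count();
    if ws.len() == 1 && punct >= 4 {
        char_class_tokens(ws[0])
    } else {
        ws
    }
}
```

With this sketch, `worker ready shard=1` keeps its three whitespace tokens, while `{"level":"info","n":3}` becomes fifteen tokens (`{`, `"`, `level`, `"`, `:`, and so on), so positional comparison has something to work with.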
Additional deterministic arms are available for evaluation:
- drain-stock
- drain-delimited
- drain-fullsearch
- statistical
See docs/PUBLIC_BENCHMARKS.md for reproducible LogHub parser benchmarks and docs/AGENT_SERVING_EVAL.md for the downstream blind-judge evidence.
echo 'worker ready shard=1
worker ready shard=2' | cargo run -p codag-drain

JSON output:
echo 'worker ready shard=1' \
| cargo run -p codag-drain -- --format json

Select a grouper:
cargo run -p codag-drain -- --grouper drain-stock

Print CLI compression stats on stderr:
cargo run -p codag-drain -- --stats

The examples/server crate is a thin host around TemplateIndex. It is useful
for local integration tests and as a deployment reference; production auth,
tenancy, persistence, and routing should live in the production service layer.
cargo run -p codag-drain-server

Routes:
GET /health
POST /v1/template
POST /v1/session/:id/ingest
GET /v1/session/:id/templates
Query parameters:
grouper=drain|drain-stock|drain-delimited|drain-fullsearch|statistical
samples=N
format=text|json
body=text|ndjson
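As a small illustration of how the routes and query parameters combine, the sketch below builds a request target for the session ingest route. It is a client-side helper written for this example only. `ingest_target` is a hypothetical name, the restriction on session id characters is an assumption made to avoid percent-encoding, and whether every route accepts all four parameters is not stated here.

```rust
/// Grouper names from the query-parameter list above.
const GROUPERS: [&str; 5] = [
    "drain",
    "drain-stock",
    "drain-delimited",
    "drain-fullsearch",
    "statistical",
];

/// Hypothetical helper: build the path and query for POST /v1/session/:id/ingest.
/// Returns an error message for values outside the documented options.
fn ingest_target(
    session_id: &str,
    grouper: &str,
    samples: usize,
    format: &str,
    body: &str,
) -> Result<String, String> {
    // Assumption: ids are limited to URL-safe characters so no escaping is needed.
    let id_ok = !session_id.is_empty()
        && session_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if !id_ok {
        return Err(format!("unsupported session id: {session_id:?}"));
    }
    if !GROUPERS.contains(&grouper) {
        return Err(format!("unknown grouper: {grouper}"));
    }
    if !["text", "json"].contains(&format) {
        return Err(format!("unknown format: {format}"));
    }
    if !["text", "ndjson"].contains(&body) {
        return Err(format!("unknown body type: {body}"));
    }
    Ok(format!(
        "/v1/session/{session_id}/ingest?grouper={grouper}&samples={samples}&format={format}&body={body}"
    ))
}
```

For example, `ingest_target("abc-1", "drain", 3, "json", "ndjson")` yields `/v1/session/abc-1/ingest?grouper=drain&samples=3&format=json&body=ndjson`, and an unlisted grouper name returns an error before any request is sent.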
The hosted production instance is a separate Railway service:
Railway project: codag-drain
Railway service: codag-drain
Production URL: https://codag-drain-production.up.railway.app
Backend env: CODAG_DRAIN_URL=https://codag-drain-production.up.railway.app
All /v1/* routes require Authorization: Bearer <token>. Configure the
same secret value on both services:
codag-drain: CODAG_DRAIN_AUTH_TOKEN=<random secret>
backend: CODAG_DRAIN_AUTH_TOKEN=<same random secret>
Deploy it from this repo root:
railway up --service codag-drain --environment production --detach

See CONTRIBUTING.md. Benchmark claims must be scoped and paired against raw logs and Drain3; do not claim a win from compression alone.
CARGO_TARGET_DIR=/private/tmp/codag-drain-target cargo test
CARGO_TARGET_DIR=/private/tmp/codag-drain-target cargo clippy --all-targets --all-features -- -D warnings

Held-out evals are ignored by default because they need local data:
LOGHUB_DIR=/path/to/loghub2 \
CARGO_TARGET_DIR=/private/tmp/codag-drain-target cargo test -p codag-drain --test eval_loghub grouping_loghub -- --ignored --nocapture
GITHUB_JSONL=/path/to/github.jsonl \
CARGO_TARGET_DIR=/private/tmp/codag-drain-target cargo test -p codag-drain --test eval_loghub grouping_github_lora -- --ignored --nocapture