72 changes: 30 additions & 42 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 4 additions & 0 deletions Cargo.toml
@@ -101,6 +101,7 @@ members = [
"lib/codecs",
"lib/dnsmsg-parser",
"lib/docs-renderer",
"lib/drain-log",
"lib/fakedata",
"lib/file-source",
"lib/file-source-common",
@@ -379,6 +380,7 @@ csv = { version = "1.3", default-features = false }
databend-client = { version = "0.28.0", default-features = false, features = ["rustls"], optional = true }
derivative.workspace = true
dirs-next = { version = "2.0.0", default-features = false, optional = true }
drain-log = { path = "lib/drain-log", optional = true }
dyn-clone = { version = "1.0.20", default-features = false }
encoding_rs = { version = "0.8.35", default-features = false, features = ["serde"] }
enum_dispatch = { version = "0.3.13", default-features = false }
@@ -791,6 +793,7 @@ transforms-logs = [
"transforms-aws_ec2_metadata",
"transforms-dedupe",
"transforms-delay",
"transforms-drain",
"transforms-filter",
"transforms-window",
"transforms-log_to_metric",
@@ -821,6 +824,7 @@ transforms-aggregate = []
transforms-aws_ec2_metadata = ["dep:arc-swap"]
transforms-dedupe = ["transforms-impl-dedupe"]
transforms-delay = []
transforms-drain = ["dep:drain-log"]
transforms-filter = []
transforms-incremental_to_absolute = []
transforms-window = []
9 changes: 9 additions & 0 deletions changelog.d/drain_transform.feature.md
@@ -0,0 +1,9 @@
Added a new `drain` transform that clusters log lines using the Drain log
parsing algorithm and annotates each event with a derived template string
(e.g. `user <*> logged in from <*>`). It mirrors the OpenTelemetry Collector
`drain` processor, including `seed_templates`, `seed_logs`, and
`warmup_min_clusters` for stable templates across deployments. Use the
emitted template field as input to a downstream `filter` or `route`
transform to act on classes of log patterns.

authors: srstrickland
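
For readers unfamiliar with Drain, the template in the changelog example can be illustrated with a minimal sketch of the merge step: two lines with the same token count are compared position by position, and any position that differs becomes the `<*>` wildcard. This is not the `drain-log` crate's API; the `merge_template` function and whitespace tokenization are assumptions made for illustration, and the real algorithm also uses a fixed-depth prefix tree and a similarity threshold to pick which cluster a line joins.

```rust
/// Wildcard token, matching the `<*>` shown in the changelog example.
const WILDCARD: &str = "<*>";

/// Hypothetical helper (not part of `drain-log`): merge a log line into an
/// existing template. Tokens that agree are kept; tokens that differ are
/// replaced by the wildcard. Returns `None` when the token counts differ,
/// because this sketch only merges lines of equal length.
fn merge_template(template: &str, line: &str) -> Option<String> {
    // Assumption: tokens are separated by whitespace.
    let template_tokens: Vec<&str> = template.split_whitespace().collect();
    let line_tokens: Vec<&str> = line.split_whitespace().collect();
    if template_tokens.len() != line_tokens.len() {
        return None;
    }
    let merged: Vec<&str> = template_tokens
        .iter()
        .zip(line_tokens.iter())
        .map(|(a, b)| if a == b { *a } else { WILDCARD })
        .collect();
    Some(merged.join(" "))
}
```

Merging `user alice logged in from 10.0.0.1` with `user bob logged in from 10.0.0.2` under this sketch yields the template from the changelog entry.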
15 changes: 15 additions & 0 deletions lib/drain-log/Cargo.toml
@@ -0,0 +1,15 @@
[package]
name = "drain-log"
version = "0.1.0"
edition = "2021"
authors = ["Vector Contributors <vector@datadoghq.com>"]
description = "Log template extraction via the Drain algorithm with LRU cluster eviction. Adapted from drain3 (akshatagarwl)."
license = "Apache-2.0"
publish = false

[dependencies]
bon = "3.9.1"
fastrand = "2.4.1"
snafu = "0.8"
string-interner = { version = "0.15", features = ["backends"] }
smallvec = "1.13"
17 changes: 17 additions & 0 deletions lib/drain-log/LICENSE
@@ -0,0 +1,17 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

Copyright 2026 Akshat Agarwal

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
24 changes: 24 additions & 0 deletions lib/drain-log/NOTICE
@@ -0,0 +1,24 @@
drain-log
Copyright 2026 Vector Contributors

This product includes software derived from drain3 (Apache License 2.0):

drain3 — Fast log template extraction via fixed-depth prefix trees
Copyright 2026 Akshat Agarwal
https://github.com/akshatagarwl/drain3

drain3 is itself a Rust port of logpai/Drain3:

Drain3 — Streaming log template miner with persistence and masking
https://github.com/logpai/Drain3
Released under the MIT License.

Local additions on top of the upstream drain3 sources:
  * True LRU eviction of clusters once `max_clusters` is reached, so the
    matcher can adapt to drifting log vocabularies on long-running streams
    without unbounded memory growth. The LRU is implemented as an intrusive
    doubly-linked list threaded through `Cluster`, giving O(1) touch on
    match and O(1) eviction on cap; freed cluster ids are recycled so the
    `clusters` slot vector stays bounded.
  * A `cluster_count` accessor on `Matcher` exposing the live tracked
    cluster count after eviction.
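
The LRU scheme described in the NOTICE can be sketched roughly as follows. This is an illustrative reconstruction, not the crate's code: apart from `Cluster`, `max_clusters`, and `cluster_count`, which the NOTICE names, every type, field, and method here (`LruClusters`, `slots`, `free`, `touch`, `insert`) is a hypothetical name. The links are stored as slot indices inside each `Cluster`, so moving a cluster to the front and evicting the tail both take constant time, and an evicted id is reused before the slot vector grows.

```rust
/// A tracked cluster carrying its own LRU links (indices into `slots`).
struct Cluster {
    template: String,
    prev: Option<usize>, // neighbor toward the most recently used end
    next: Option<usize>, // neighbor toward the least recently used end
}

/// Hypothetical container; the real `Matcher` layout may differ.
struct LruClusters {
    slots: Vec<Option<Cluster>>,
    free: Vec<usize>,    // recycled cluster ids
    head: Option<usize>, // most recently used
    tail: Option<usize>, // least recently used
    len: usize,
    max_clusters: usize,
}

impl LruClusters {
    fn new(max_clusters: usize) -> Self {
        assert!(max_clusters > 0, "cap must be positive");
        Self { slots: Vec::new(), free: Vec::new(), head: None, tail: None, len: 0, max_clusters }
    }

    /// Number of live clusters after any evictions.
    fn cluster_count(&self) -> usize {
        self.len
    }

    /// Detach `id` from the list, patching its neighbors (or head/tail).
    fn unlink(&mut self, id: usize) {
        let (prev, next) = {
            let c = self.slots[id].as_ref().expect("live cluster");
            (c.prev, c.next)
        };
        match prev {
            Some(p) => self.slots[p].as_mut().unwrap().next = next,
            None => self.head = next,
        }
        match next {
            Some(n) => self.slots[n].as_mut().unwrap().prev = prev,
            None => self.tail = prev,
        }
    }

    /// Attach `id` at the most recently used end.
    fn push_front(&mut self, id: usize) {
        let old_head = self.head;
        {
            let c = self.slots[id].as_mut().expect("live cluster");
            c.prev = None;
            c.next = old_head;
        }
        match old_head {
            Some(h) => self.slots[h].as_mut().unwrap().prev = Some(id),
            None => self.tail = Some(id),
        }
        self.head = Some(id);
    }

    /// O(1) touch on match: move the cluster to the front.
    fn touch(&mut self, id: usize) {
        if self.head != Some(id) {
            self.unlink(id);
            self.push_front(id);
        }
    }

    /// Insert a cluster, evicting the least recently used one at the cap.
    fn insert(&mut self, template: String) -> usize {
        if self.len == self.max_clusters {
            let victim = self.tail.expect("non-empty at cap");
            self.unlink(victim);
            self.slots[victim] = None;
            self.free.push(victim); // recycle the id
            self.len -= 1;
        }
        let cluster = Cluster { template, prev: None, next: None };
        let id = match self.free.pop() {
            Some(id) => {
                self.slots[id] = Some(cluster);
                id
            }
            None => {
                self.slots.push(Some(cluster));
                self.slots.len() - 1
            }
        };
        self.push_front(id);
        self.len += 1;
        id
    }

    /// Template of a live cluster, or `None` for an empty or unknown slot.
    fn template(&self, id: usize) -> Option<&str> {
        self.slots.get(id)?.as_ref().map(|c| c.template.as_str())
    }
}
```

With a cap of 2, inserting `a` and `b`, touching `a`, and then inserting `c` evicts `b` and hands its id to `c`, so the slot vector never grows past the cap in this sketch.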