Log-Analyzer/Drain3-rs

drain3-rs

A Rust-accelerated drop-in replacement for Drain3, the streaming log template miner. Exposes the exact same Python API so existing code works without modification.

Installation

Building from source requires a Rust toolchain (rustc + cargo); pre-built wheels do not. Install the latest stable toolchain with:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Then install drain3-rs directly from source:

pip install /path/to/drain3-rs

Or, if the package has been published to PyPI:

pip install drain3

For Kafka or Redis persistence, install the optional extras:

pip install 'drain3[kafka]'   # kafka-python
pip install 'drain3[redis]'   # redis

Usage

The API is identical to the original Drain3. Existing code requires no changes.

from drain3.template_miner import TemplateMiner
from drain3.template_miner_config import TemplateMinerConfig
from drain3.masking import MaskingInstruction

config = TemplateMinerConfig()
config.drain_sim_th = 0.4
config.drain_depth = 4
config.masking_instructions.append(MaskingInstruction(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', 'IP'))
config.masking_instructions.append(MaskingInstruction(r'[\-\+]?\b\d+\b', 'NUM'))

tm = TemplateMiner(None, config)

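# Any iterable of log lines works; this small sample is for illustration only
log_lines = [
    "connected to 10.0.0.1",
    "connected to 10.0.0.2",
]

for line in log_lines:
    result = tm.add_log_message(line)
    print(result['template_mined'])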

Or load configuration from a file:

tm = TemplateMiner()   # reads drain3.ini from the working directory

Output dictionary

add_log_message() returns a dict with:

| Key | Description |
| --- | --- |
| change_type | "cluster_created", "cluster_template_changed", or "none" |
| cluster_id | Sequential integer ID of the matched/created cluster |
| cluster_size | Number of messages matched to this cluster so far |
| cluster_count | Total number of clusters seen |
| template_mined | The current template string for this cluster |
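
As a minimal sketch of how the returned dict might be consumed, the helper below turns one result into a single log-friendly line. `describe_result` and the sample dict are hypothetical and not part of drain3; the code relies only on the keys listed above.

```python
def describe_result(result: dict) -> str:
    """Summarize one add_log_message() result (hypothetical helper)."""
    # Map the documented change_type values to a short tag.
    tag = {
        "cluster_created": "NEW",
        "cluster_template_changed": "UPDATED",
    }.get(result["change_type"], "SAME")
    return (
        f"[{tag}] cluster {result['cluster_id']} "
        f"(size {result['cluster_size']}, {result['cluster_count']} clusters total): "
        f"{result['template_mined']}"
    )


# Hand-written sample shaped like the table above, not real miner output.
sample = {
    "change_type": "cluster_created",
    "cluster_id": 1,
    "cluster_size": 1,
    "cluster_count": 1,
    "template_mined": "user <*> logged in",
}
print(describe_result(sample))
```

In real use the dict would come from tm.add_log_message(line) rather than being written by hand.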

Parameter extraction

log_line = "user johndoe logged in 11 minutes ago"
result = tm.add_log_message(log_line)
params = tm.extract_parameters(result["template_mined"], log_line)
# Assuming the miner has already seen similar lines with other user names:
# [ExtractedParameter(value='johndoe', mask_name='*'),
#  ExtractedParameter(value='11', mask_name='NUM')]

Masking patterns

Pattern style determines which regex engine is used internally:

| Pattern style | Rust engine | Speed |
| --- | --- | --- |
| \b-based word boundaries | regex crate (DFA) | Fastest |
| Lookahead/lookbehind (?<=...) | fancy-regex (backtracking) | Same as Python |

Use \b patterns for best performance. The drain3.ini included in this repo rewrites six of the seven common log patterns with \b; the seventh (CMD) keeps its lookbehind:

[MASKING]
masking = [
    {"regex_pattern":"\\b([0-9a-f]{2,}:){3,}[0-9a-f]{2,}\\b",          "mask_with": "ID"},
    {"regex_pattern":"\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b", "mask_with": "IP"},
    {"regex_pattern":"\\b[0-9a-f]{6,}(?: [0-9a-f]{6,}){2,}\\b",         "mask_with": "SEQ"},
    {"regex_pattern":"\\b[0-9A-F]{4}(?: [0-9A-F]{4}){3,}\\b",           "mask_with": "SEQ"},
    {"regex_pattern":"\\b0x[a-f0-9A-F]+\\b",                             "mask_with": "HEX"},
    {"regex_pattern":"[\\-\\+]?\\b\\d+\\b",                              "mask_with": "NUM"},
    {"regex_pattern":"(?<=executed cmd )(\".+?\")",                       "mask_with": "CMD"}
    ]

The CMD pattern uses a lookbehind (unavoidable for its semantics) and falls back to fancy-regex, which is still correct — just not DFA-speed.
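
To make the effect of these rules concrete, here is a rough sketch that applies a subset of them in order, using Python's standard `re` module as a stand-in for the Rust engines. The `mask` function and `RULES` list are illustrative names, and the `<` / `>` wrapping assumes the default mask_prefix / mask_suffix; this is not how drain3-rs applies masks internally.

```python
import re

# Subset of the rules above, kept in the same relative order.
# Python's `re` is used purely for illustration.
RULES = [
    (re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"), "IP"),
    (re.compile(r"\b0x[a-f0-9A-F]+\b"), "HEX"),
    (re.compile(r"[\-\+]?\b\d+\b"), "NUM"),
    (re.compile(r'(?<=executed cmd )(".+?")'), "CMD"),  # lookbehind rule
]


def mask(line: str, prefix: str = "<", suffix: str = ">") -> str:
    """Apply each rule in turn, replacing matches with prefix+name+suffix."""
    for pattern, name in RULES:
        line = pattern.sub(prefix + name + suffix, line)
    return line


print(mask("connected to 10.0.0.1 port 22"))
print(mask('executed cmd "ls -la" at 0x1f'))
```

Rule order matters: the IP rule runs before NUM, so the octets of an address are masked as one token instead of four separate numbers.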

Configuration

Configuration is identical to Drain3. The default config file is drain3.ini in the working directory.

Key parameters:

| Section | Key | Default | Description |
| --- | --- | --- | --- |
| [DRAIN] | sim_th | 0.4 | Similarity threshold for cluster matching |
| [DRAIN] | depth | 4 | Prefix tree depth (minimum 3) |
| [DRAIN] | max_children | 100 | Max children per internal node |
| [DRAIN] | max_clusters | unlimited | LRU cap on tracked clusters |
| [DRAIN] | extra_delimiters | [] | Additional token delimiters, e.g. ["_"] |
| [MASKING] | mask_prefix / mask_suffix | < / > | Wraps masked tokens |
| [SNAPSHOT] | snapshot_interval_minutes | 5 | Periodic save interval |
| [SNAPSHOT] | compress_state | true | Zlib-compress snapshots |
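
The sketch below shows how these keys could be laid out in a drain3.ini file and read back. The standard-library `configparser` is used here only to illustrate the file layout; how drain3-rs parses the file internally is not stated in this README, and the sample values simply repeat the defaults from the table.

```python
import ast
import configparser

# Sample file contents; max_clusters is omitted, which leaves it unlimited.
SAMPLE_INI = """
[DRAIN]
sim_th = 0.4
depth = 4
max_children = 100
extra_delimiters = ["_"]

[MASKING]
mask_prefix = <
mask_suffix = >

[SNAPSHOT]
snapshot_interval_minutes = 5
compress_state = true
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_INI)

sim_th = parser.getfloat("DRAIN", "sim_th")
depth = parser.getint("DRAIN", "depth")
# The delimiter list is written as a Python-style literal.
extra_delimiters = ast.literal_eval(parser.get("DRAIN", "extra_delimiters"))
compress_state = parser.getboolean("SNAPSHOT", "compress_state")

print(sim_th, depth, extra_delimiters, compress_state)
```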

Persistence

Snapshots save and restore the full cluster tree in a JSON format (zlib-compressed by default).

from drain3.file_persistence import FilePersistence
from drain3.memory_buffer_persistence import MemoryBufferPersistence

# File
tm = TemplateMiner(FilePersistence("drain3_state.bin"))

# In-memory (useful for tests)
buf = MemoryBufferPersistence()
tm = TemplateMiner(buf)

# Kafka / Redis — same API as original Drain3
from drain3.kafka_persistence import KafkaPersistence
tm = TemplateMiner(KafkaPersistence("drain3-snapshot", bootstrap_servers="localhost:9092"))

Custom persistence: subclass PersistenceHandler and implement save_state(bytes) / load_state() -> Optional[bytes].
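
As an illustration of the custom-persistence hook, the sketch below stores the latest snapshot in a single-row SQLite table. `SqlitePersistence` is a hypothetical class, and the import path `drain3.persistence_handler` is assumed from the original Drain3 layout; a stand-in base class is defined so the example still runs when drain3 is not installed.

```python
import sqlite3
from typing import Optional

try:
    # Assumed module path, following the original Drain3 package layout.
    from drain3.persistence_handler import PersistenceHandler
except ImportError:
    # Stand-in base class so this sketch runs without drain3 installed.
    class PersistenceHandler:
        def save_state(self, state: bytes) -> None:
            raise NotImplementedError

        def load_state(self) -> Optional[bytes]:
            raise NotImplementedError


class SqlitePersistence(PersistenceHandler):
    """Keeps only the most recent snapshot, in one SQLite row (illustrative)."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS snapshot "
            "(id INTEGER PRIMARY KEY CHECK (id = 1), state BLOB)"
        )
        self._conn.commit()

    def save_state(self, state: bytes) -> None:
        # Overwrite the single row with the new snapshot bytes.
        self._conn.execute(
            "INSERT OR REPLACE INTO snapshot (id, state) VALUES (1, ?)", (state,)
        )
        self._conn.commit()

    def load_state(self) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT state FROM snapshot WHERE id = 1"
        ).fetchone()
        return None if row is None else bytes(row[0])
```

An instance would then be passed to the miner in the same way as the built-in handlers, for example TemplateMiner(SqlitePersistence("drain3_state.db")).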

Training vs. inference

# Training — updates templates as new patterns are seen
result = tm.add_log_message(log_line)

# Inference — matches existing templates only, returns None if no match
cluster = tm.match(log_line)                            # exact match
cluster = tm.match(log_line, "fallback")                # try tree, then full scan
cluster = tm.match(log_line, "always")                  # always full scan
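
One way the inference call might be used is to collect lines that match no known template, for example as anomaly candidates. `find_unmatched` below is a hypothetical helper; it assumes only that tm.match(line, strategy) returns None when nothing matches, as described above.

```python
from typing import Iterable, List


def find_unmatched(tm, lines: Iterable[str], strategy: str = "fallback") -> List[str]:
    """Return the lines for which tm.match() finds no existing template."""
    # `tm` is expected to be an already-trained TemplateMiner.
    return [line for line in lines if tm.match(line, strategy) is None]
```

Because match() does not update templates, this can be run repeatedly over new data without changing the trained state.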

Building from source

git clone <this-repo>
cd drain3-rs

# Install maturin (build tool for PyO3)
pip install maturin

# Build and install in the current venv (development)
maturin develop --release

# Build a wheel
maturin build --release

Run the benchmark (requires the original Python drain3 installed alongside for comparison):

python benches/benchmark.py

Run the original Drain3 test suite against the Rust implementation:

pip install pytest
python -m pytest /path/to/Drain3/tests/ -v

Engines

Two engines are supported, selected via [DRAIN]/engine in the config file:

  • Drain (default) — standard fixed-depth prefix tree, keys on token count at the root level.
  • JaccardDrain — variant that uses Jaccard similarity and keys on the first token, allowing variable-length template merging.
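
The difference between the two engines' similarity measures can be sketched as follows. This is a simplified Python illustration based on the general descriptions above, not the Rust implementation; the function names and the `<*>` wildcard token are assumptions made for the example.

```python
from typing import Sequence


def positional_similarity(
    template: Sequence[str], tokens: Sequence[str], wildcard: str = "<*>"
) -> float:
    """Drain-style score: fraction of positions with identical, non-wildcard tokens."""
    if len(template) != len(tokens):
        raise ValueError("positional comparison needs equal token counts")
    if not template:
        return 1.0
    same = sum(1 for t, tok in zip(template, tokens) if t != wildcard and t == tok)
    return same / len(template)


def jaccard_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Plain Jaccard score: |A intersect B| / |A union B| over token sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


print(positional_similarity("user <*> logged in".split(), "user bob logged in".split()))
print(jaccard_similarity("a b c".split(), "b c d".split()))
```

The positional score only makes sense for sequences of equal length, which is why the default engine keys on token count first; the set-based score has no such restriction.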

Differences from Python Drain3

  • Snapshot format: uses a native Rust JSON format instead of jsonpickle. Snapshots from the Python version are not compatible and must be regenerated.
  • No profiling output: SimpleProfiler is implemented but profiling metrics are not printed to stdout in this version.
  • Rust toolchain required to build from source (pre-built wheels have no such requirement).

Performance

Benchmarked on 500 K synthetic SSH-style log lines across two scenarios: no masking (pure tree throughput) and production-style \b-based masking patterns from drain3.ini. The internal tree and cluster cache are pure Rust data structures — no GIL borrows during traversal or template matching.

Scaling benchmark

No masking

| Messages | Rust (msg/s) | Python (msg/s) | Speedup |
| --- | --- | --- | --- |
| 1,000 | 648,099 | 298,441 | 2.17x |
| 5,000 | 735,609 | 295,114 | 2.49x |
| 10,000 | 734,295 | 289,519 | 2.54x |
| 25,000 | 729,276 | 303,969 | 2.40x |
| 50,000 | 728,015 | 283,702 | 2.57x |
| 100,000 | 704,195 | 299,077 | 2.35x |
| 250,000 | 744,239 | 306,935 | 2.42x |
| 500,000 | 736,453 | 288,195 | 2.56x |

\b patterns (drain3.ini — 7 production masking rules)

| Messages | Rust (msg/s) | Python (msg/s) | Speedup |
| --- | --- | --- | --- |
| 1,000 | 250,375 | 76,576 | 3.27x |
| 5,000 | 237,816 | 70,298 | 3.38x |
| 10,000 | 237,489 | 77,384 | 3.07x |
| 25,000 | 252,220 | 79,155 | 3.19x |
| 50,000 | 246,566 | 79,606 | 3.10x |
| 100,000 | 254,946 | 80,038 | 3.19x |
| 250,000 | 244,813 | 79,978 | 3.06x |
| 500,000 | 247,813 | 67,977 | 3.65x |
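
The Speedup column is the ratio of the two throughput columns. The short sketch below recomputes it for the 500,000-message rows of both tables; `speedup` is an illustrative helper and is not taken from benches/benchmark.py.

```python
def speedup(rust_msgs_per_s: float, python_msgs_per_s: float) -> float:
    """Ratio of Rust throughput to Python throughput."""
    return rust_msgs_per_s / python_msgs_per_s


# (scenario, Rust msg/s, Python msg/s) copied from the 500,000-message rows.
rows = [
    ("no masking", 736_453, 288_195),
    ("7 masking rules", 247_813, 67_977),
]

for name, rust, python in rows:
    print(f"{name}: {speedup(rust, python):.2f}x")
```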
