A Rust-accelerated drop-in replacement for Drain3, the streaming log template miner. Exposes the exact same Python API so existing code works without modification.
Requires a Rust toolchain (rustc + cargo). Install the latest stable with:

```sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

Then install drain3-rs directly from source:

```sh
pip install /path/to/drain3-rs
```

Or if published to PyPI:

```sh
pip install drain3
```

For Kafka or Redis persistence, install the optional extras:

```sh
pip install 'drain3[kafka]'   # kafka-python
pip install 'drain3[redis]'   # redis
```

The API is identical to the original Drain3. Existing code requires no changes.
```python
from drain3.template_miner import TemplateMiner
from drain3.template_miner_config import TemplateMinerConfig
from drain3.masking import MaskingInstruction

config = TemplateMinerConfig()
config.drain_sim_th = 0.4
config.drain_depth = 4
config.masking_instructions.append(MaskingInstruction(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', 'IP'))
config.masking_instructions.append(MaskingInstruction(r'[\-\+]?\b\d+\b', 'NUM'))

tm = TemplateMiner(None, config)

# log_lines: any iterable of log message strings
for line in log_lines:
    result = tm.add_log_message(line)
    print(result['template_mined'])
```

Or load configuration from a file:

```python
tm = TemplateMiner()  # reads drain3.ini from the working directory
```

`add_log_message()` returns a dict with:
| Key | Description |
|---|---|
| `change_type` | `"cluster_created"`, `"cluster_template_changed"`, or `"none"` |
| `cluster_id` | Sequential integer ID of the matched/created cluster |
| `cluster_size` | Number of messages matched to this cluster so far |
| `cluster_count` | Total number of clusters seen |
| `template_mined` | The current template string for this cluster |
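The `change_type` field makes it easy to react only when the set of templates actually changes. The following is a minimal sketch that works on plain dicts shaped like the table above, so it runs without drain3 installed. The helper name `tally_changes` and the sample values are hypothetical, not miner output.

```python
from collections import Counter
from typing import Iterable


def tally_changes(results: Iterable[dict]) -> Counter:
    """Count how often each change_type appears in a stream of result dicts."""
    return Counter(r["change_type"] for r in results)


# Hand-written sample results (illustrative values, not real miner output).
sample_results = [
    {"change_type": "cluster_created", "cluster_id": 1, "cluster_size": 1,
     "cluster_count": 1, "template_mined": "user alice logged in"},
    {"change_type": "cluster_template_changed", "cluster_id": 1, "cluster_size": 2,
     "cluster_count": 1, "template_mined": "user <*> logged in"},
    {"change_type": "none", "cluster_id": 1, "cluster_size": 3,
     "cluster_count": 1, "template_mined": "user <*> logged in"},
]

counts = tally_changes(sample_results)

# Keep only the results where a template was created or rewritten.
new_or_changed = [
    r["template_mined"] for r in sample_results if r["change_type"] != "none"
]
```

In a real pipeline the list would be replaced by the dicts returned from `tm.add_log_message(line)`.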
```python
log_line = "user johndoe logged in 11 minutes ago"
result = tm.add_log_message(log_line)
params = tm.extract_parameters(result["template_mined"], log_line)
# [ExtractedParameter(value='johndoe', mask_name='*'),
#  ExtractedParameter(value='11', mask_name='NUM')]
```

Pattern style determines which regex engine is used internally:
| Pattern style | Rust engine | Speed |
|---|---|---|
| `\b`-based word boundaries | `regex` crate (DFA) | Fastest |
| Lookahead/lookbehind `(?<=...)` | `fancy-regex` (backtracking) | Same as Python |
Use \b patterns for best performance. The drain3.ini included in this repo has all seven common log patterns rewritten with \b:
```ini
[MASKING]
masking = [
    {"regex_pattern":"\\b([0-9a-f]{2,}:){3,}[0-9a-f]{2,}\\b", "mask_with": "ID"},
    {"regex_pattern":"\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b", "mask_with": "IP"},
    {"regex_pattern":"\\b[0-9a-f]{6,}(?: [0-9a-f]{6,}){2,}\\b", "mask_with": "SEQ"},
    {"regex_pattern":"\\b[0-9A-F]{4}(?: [0-9A-F]{4}){3,}\\b", "mask_with": "SEQ"},
    {"regex_pattern":"\\b0x[a-f0-9A-F]+\\b", "mask_with": "HEX"},
    {"regex_pattern":"[\\-\\+]?\\b\\d+\\b", "mask_with": "NUM"},
    {"regex_pattern":"(?<=executed cmd )(\".+?\")", "mask_with": "CMD"}
    ]
```

The CMD pattern uses a lookbehind (unavoidable for its semantics) and falls back to fancy-regex, which is still correct, just not DFA-speed.
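To see what the `\b`-based patterns do to a log line, here is a small sketch using Python's standard `re` module with the IP and NUM patterns from the config above. It assumes masks are applied one after another in list order and wrapped in `<` / `>`. The helper `apply_masks` is hypothetical and only illustrates the effect; it is not the Rust implementation.

```python
import re

# (pattern, mask name) pairs copied from the config above; order matters.
MASKS = [
    (re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"), "IP"),
    (re.compile(r"[\-\+]?\b\d+\b"), "NUM"),
]


def apply_masks(line: str, prefix: str = "<", suffix: str = ">") -> str:
    """Apply each mask in order, replacing matches with prefix + name + suffix."""
    for pattern, name in MASKS:
        line = pattern.sub(prefix + name + suffix, line)
    return line


masked = apply_masks("Failed password for root from 192.168.0.17 port 22 ssh2")
print(masked)  # Failed password for root from <IP> port <NUM> ssh2
```

The trailing `2` in `ssh2` is left alone because there is no word boundary between `h` and `2`. Listing IP before NUM matters: with the order reversed, the NUM pattern would consume each octet separately and the IP pattern would never match.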
Configuration is identical to Drain3. The default config file is drain3.ini in the working directory.
Key parameters:
| Section | Key | Default | Description |
|---|---|---|---|
| `[DRAIN]` | `sim_th` | `0.4` | Similarity threshold for cluster matching |
| `[DRAIN]` | `depth` | `4` | Prefix tree depth (minimum 3) |
| `[DRAIN]` | `max_children` | `100` | Max children per internal node |
| `[DRAIN]` | `max_clusters` | unlimited | LRU cap on tracked clusters |
| `[DRAIN]` | `extra_delimiters` | `[]` | Additional token delimiters, e.g. `["_"]` |
| `[MASKING]` | `mask_prefix` / `mask_suffix` | `<` / `>` | Wraps masked tokens |
| `[SNAPSHOT]` | `snapshot_interval_minutes` | `5` | Periodic save interval |
| `[SNAPSHOT]` | `compress_state` | `true` | Zlib-compress snapshots |
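As an illustration of how these keys are laid out in an INI file, the sketch below parses an example config with Python's standard `configparser`. The config text is an assumed example built from the table's sections, keys, and defaults, and the parsing shown here is not necessarily how the library itself reads the file.

```python
import configparser
import json

# Illustrative config text using the keys and defaults from the table above.
INI_TEXT = """
[DRAIN]
sim_th = 0.4
depth = 4
max_children = 100
extra_delimiters = ["_"]

[MASKING]
mask_prefix = <
mask_suffix = >

[SNAPSHOT]
snapshot_interval_minutes = 5
compress_state = true
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

sim_th = parser.getfloat("DRAIN", "sim_th")
depth = parser.getint("DRAIN", "depth")
# List-valued options are written as JSON-style literals.
extra_delimiters = json.loads(parser.get("DRAIN", "extra_delimiters"))
compress_state = parser.getboolean("SNAPSHOT", "compress_state")
# max_clusters is omitted above, which the table describes as "unlimited".
max_clusters = parser.getint("DRAIN", "max_clusters", fallback=None)
```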
Snapshots save and restore the full cluster tree in a JSON format (zlib-compressed by default).
```python
from drain3.file_persistence import FilePersistence
from drain3.memory_buffer_persistence import MemoryBufferPersistence

# File
tm = TemplateMiner(FilePersistence("drain3_state.bin"))

# In-memory (useful for tests)
buf = MemoryBufferPersistence()
tm = TemplateMiner(buf)

# Kafka / Redis: same API as original Drain3
from drain3.kafka_persistence import KafkaPersistence
tm = TemplateMiner(KafkaPersistence("drain3-snapshot", bootstrap_servers="localhost:9092"))
```

Custom persistence: subclass `PersistenceHandler` and implement `save_state(bytes)` / `load_state() -> Optional[bytes]`.
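A custom handler could look roughly like the sketch below, which stores the latest snapshot in a single SQLite row. `SQLitePersistence` is a hypothetical class, and the import path `drain3.persistence_handler` is assumed from the original Drain3 layout. A stand-in base class is defined when drain3 is not installed so the example runs on its own.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional

try:
    # Assumed import path, following the original Drain3 package layout.
    from drain3.persistence_handler import PersistenceHandler
except ImportError:
    # Stand-in so the sketch runs without drain3 installed (assumption:
    # the real base class exposes the same two methods).
    class PersistenceHandler(ABC):
        @abstractmethod
        def save_state(self, state: bytes) -> None: ...

        @abstractmethod
        def load_state(self) -> Optional[bytes]: ...


class SQLitePersistence(PersistenceHandler):
    """Hypothetical handler that keeps the latest snapshot in one SQLite row."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS snapshot (id INTEGER PRIMARY KEY, state BLOB)"
        )

    def save_state(self, state: bytes) -> None:
        # Overwrite the single snapshot row (id = 1) inside a transaction.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO snapshot (id, state) VALUES (1, ?)", (state,)
            )

    def load_state(self) -> Optional[bytes]:
        # Return None when no snapshot has been saved yet.
        row = self.conn.execute("SELECT state FROM snapshot WHERE id = 1").fetchone()
        return bytes(row[0]) if row else None


handler = SQLitePersistence(":memory:")
```

The handler would then be passed to the miner in the same way as the built-in ones, e.g. `TemplateMiner(SQLitePersistence("drain3_state.db"))`.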
```python
# Training: updates templates as new patterns are seen
result = tm.add_log_message(log_line)

# Inference: matches existing templates only, returns None if no match
cluster = tm.match(log_line)              # exact match
cluster = tm.match(log_line, "fallback")  # try tree, then full scan
cluster = tm.match(log_line, "always")    # always full scan
```

```sh
git clone <this-repo>
cd drain3-rs

# Install maturin (build tool for PyO3)
pip install maturin

# Build and install in the current venv (development)
maturin develop --release

# Build a wheel
maturin build --release
```

Run the benchmark (requires the original Python drain3 installed alongside for comparison):
```sh
python benches/benchmark.py
```

Run the original Drain3 test suite against the Rust implementation:

```sh
pip install pytest
python -m pytest /path/to/Drain3/tests/ -v
```

Two engines are supported, selected via `[DRAIN]/engine` in the config file:
- `Drain` (default): standard fixed-depth prefix tree, keys on token count at the root level.
- `JaccardDrain`: variant that uses Jaccard similarity and keys on the first token, allowing variable-length template merging.
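The difference between the two engines comes down to how a message is compared with a template. The sketch below contrasts a position-by-position similarity (which only makes sense for equal token counts) with Jaccard similarity over token sets (which tolerates different lengths). Both functions are simplified illustrations under that assumption, not the formulas used inside the Rust engines.

```python
from typing import Sequence

WILDCARD = "<*>"  # placeholder token used in templates


def positional_similarity(template: Sequence[str], tokens: Sequence[str]) -> float:
    """Fraction of positions where template and message tokens are equal.

    Only defined for equal-length sequences, which is why a positional
    comparison pairs naturally with grouping by token count.
    """
    if len(template) != len(tokens):
        raise ValueError("sequences must have the same length")
    if not template:
        return 1.0
    same = sum(1 for t, m in zip(template, tokens) if t != WILDCARD and t == m)
    return same / len(template)


def jaccard_similarity(template: Sequence[str], tokens: Sequence[str]) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    a, b = set(template), set(tokens)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


short = "connection closed by peer".split()
longer = "connection closed by remote peer".split()

print(positional_similarity("user <*> logged in".split(), "user bob logged in".split()))  # 0.75
print(jaccard_similarity(short, longer))  # 0.8
```

With a threshold of 0.4, the two "connection closed" messages would be similar enough to merge under the Jaccard measure even though their token counts differ, whereas the positional measure cannot compare them at all.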
- Snapshot format: uses a native Rust JSON format instead of `jsonpickle`. Snapshots from the Python version are not compatible and must be regenerated.
- No profiling output: `SimpleProfiler` is implemented but profiling metrics are not printed to stdout in this version.
- Rust toolchain required to build from source (pre-built wheels have no such requirement).
Benchmarked on 500 K synthetic SSH-style log lines across two scenarios: no masking (pure tree throughput) and production-style \b-based masking patterns from drain3.ini. The internal tree and cluster cache are pure Rust data structures — no GIL borrows during traversal or template matching.

No masking (pure tree throughput):

| Messages | Rust (msg/s) | Python (msg/s) | Speedup |
|---|---|---|---|
| 1,000 | 648,099 | 298,441 | 2.17x |
| 5,000 | 735,609 | 295,114 | 2.49x |
| 10,000 | 734,295 | 289,519 | 2.54x |
| 25,000 | 729,276 | 303,969 | 2.40x |
| 50,000 | 728,015 | 283,702 | 2.57x |
| 100,000 | 704,195 | 299,077 | 2.35x |
| 250,000 | 744,239 | 306,935 | 2.42x |
| 500,000 | 736,453 | 288,195 | 2.56x |

With `\b`-based masking patterns from drain3.ini:

| Messages | Rust (msg/s) | Python (msg/s) | Speedup |
|---|---|---|---|
| 1,000 | 250,375 | 76,576 | 3.27x |
| 5,000 | 237,816 | 70,298 | 3.38x |
| 10,000 | 237,489 | 77,384 | 3.07x |
| 25,000 | 252,220 | 79,155 | 3.19x |
| 50,000 | 246,566 | 79,606 | 3.10x |
| 100,000 | 254,946 | 80,038 | 3.19x |
| 250,000 | 244,813 | 79,978 | 3.06x |
| 500,000 | 247,813 | 67,977 | 3.65x |
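
For readers who want to reproduce numbers of this kind on their own logs, the following is a minimal timing sketch. It is not the code in `benches/benchmark.py`, which may be structured differently. The helper names `measure_throughput` and `speedup` are hypothetical.

```python
import time
from typing import Callable, Iterable


def measure_throughput(process: Callable[[str], object], lines: Iterable[str]) -> float:
    """Return messages per second for calling process(line) on every line."""
    lines = list(lines)  # materialize first so generation cost is not timed
    start = time.perf_counter()
    for line in lines:
        process(line)
    elapsed = time.perf_counter() - start
    return len(lines) / elapsed if elapsed > 0 else float("inf")


def speedup(fast_rate: float, slow_rate: float) -> float:
    """Ratio as shown in the Speedup column, rounded to two decimals."""
    return round(fast_rate / slow_rate, 2)


# Reproduces the first row of the no-masking table: 648,099 / 298,441.
print(speedup(648_099, 298_441))  # 2.17
```

In practice `process` would be `tm.add_log_message` from each implementation, measured over the same list of lines.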
