added attacklog_correlator by erik-graf · Pull Request #106 · ait-testbed/attackbed

erik-graf · 2026-05-22T12:39:22Z

added attacklog_correlator.py to scripts/ and adapted .gitignore for its working files

…data

whotwagner · 2026-05-26T14:19:59Z

Please add:

VERSION
README.md
AUTHORS
LICENSE
.pre-commit.yaml (see detectmateservice)
pytests

thorinaboenke · 2026-05-27T12:18:11Z

Apache/Nginx lines like:

27.0.0.1 - - [27/May/2026:13:01:57 +0200] "GET / HTTP/1.1" 200 612 "-" "curl/7.81.0"

are silently dropped: no LogEvent is created because all five timestamp parsers anchor with ^ and expect the timestamp at position 0.
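One possible way to cover such lines is a parser that searches for the bracketed timestamp anywhere in the line instead of anchoring at position 0. This is only a sketch: the names `CLF_TIMESTAMP_RE` and `parse_clf_timestamp` are hypothetical and not part of the PR, and `%b` assumes English month names (the default C locale).

```python
import re
from datetime import datetime
from typing import Optional

# Unanchored on purpose: in Apache/Nginx access logs the timestamp sits in
# square brackets after the client address, not at the start of the line.
CLF_TIMESTAMP_RE = re.compile(
    r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
)


def parse_clf_timestamp(line: str) -> Optional[datetime]:
    """Return the timezone-aware access-log timestamp, or None if absent."""
    match = CLF_TIMESTAMP_RE.search(line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        # Looked like a timestamp but was not a valid date.
        return None
```

Because it uses `search()` rather than `match()`, it could be tried after the existing anchored parsers without changing their behavior.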

parse_json_line_timestamp() only captures string values and misses numeric epochs.
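A rough sketch of how numeric epochs could be accepted alongside strings. Everything here is an assumption for illustration: the function names, the candidate key list in `TIMESTAMP_KEYS`, and the heuristic that values above 1e11 are milliseconds are not taken from the PR.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical list of keys to probe; the real script may use different ones.
TIMESTAMP_KEYS = ("timestamp", "@timestamp", "time", "ts")


def coerce_timestamp(value: Any) -> Optional[datetime]:
    """Turn an ISO 8601 string or a numeric epoch into an aware datetime."""
    if isinstance(value, bool):  # bool is a subclass of int, so exclude it
        return None
    if isinstance(value, (int, float)):
        # Heuristic (assumption): very large values are epoch milliseconds.
        seconds = value / 1000.0 if value > 1e11 else float(value)
        try:
            return datetime.fromtimestamp(seconds, tz=timezone.utc)
        except (OverflowError, OSError, ValueError):
            return None
    if isinstance(value, str):
        try:
            parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
        except ValueError:
            return None
        # Treat naive timestamps as UTC (assumption).
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    return None


def parse_json_timestamp_sketch(line: str) -> Optional[datetime]:
    """Parse one JSON log line and return the first usable timestamp."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    for key in TIMESTAMP_KEYS:
        if key in record:
            parsed = coerce_timestamp(record[key])
            if parsed is not None:
                return parsed
    return None
```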

To make bugs like this easier to find, I suggest counting unparsable lines and reporting something like "skipped N lines across M files due to unparseable timestamps". Is that feasible?
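The counting could look roughly like this. It is a minimal sketch under assumptions: `count_unparseable` and `skip_summary` are hypothetical helpers, and `parse` stands in for whatever chain of timestamp parsers the script actually uses (returning None when no timestamp is found).

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple


def count_unparseable(
    lines_by_file: Iterable[Tuple[str, Iterable[str]]],
    parse: Callable[[str], Optional[object]],
) -> Counter:
    """Count, per file, the non-empty lines for which parse() returns None."""
    skipped: Counter = Counter()
    for name, lines in lines_by_file:
        for line in lines:
            if line.strip() and parse(line) is None:
                skipped[name] += 1
    return skipped


def skip_summary(skipped: Counter) -> str:
    """Build the one-line summary, or an empty string if nothing was skipped."""
    total = sum(skipped.values())
    if total == 0:
        return ""
    return (
        f"skipped {total} lines across {len(skipped)} files "
        "due to unparseable timestamps"
    )
```

Keeping the per-file counts in a `Counter` would also allow a verbose mode that lists the worst offending files.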

thorinaboenke · 2026-05-27T12:45:43Z

_sanitize_preview_text and _ui_sanitize --> pretty much the same function?
Performance: all events are loaded into memory at once, which might become a problem for very large log directories (but probably a future-us problem).
Check whether journalctl is actually available before using it, and notify the user instead of raising a runtime error and silently skipping.
If ZoneInfo raises an exception and the fallback is used, log a warning (e.g. in case of a misspelled timezone in the config).

thorinaboenke · 2026-05-27T12:46:44Z

+def get_zone(name: str) -> ZoneInfo:
+    try:
+        return ZoneInfo(name)
+    except Exception:


should log a warning if that happens
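A minimal sketch of the warning. The fallback itself is not visible in the hunk above, so UTC is assumed here, and the return type is widened to `tzinfo` so that the fallback does not depend on system timezone data.

```python
import logging
from datetime import timezone, tzinfo
from zoneinfo import ZoneInfo

logger = logging.getLogger(__name__)


def get_zone(name: str) -> tzinfo:
    """Resolve a timezone name, warning loudly when falling back."""
    try:
        return ZoneInfo(name)
    except Exception as exc:
        # Assumption: UTC is the fallback; the PR's actual fallback is not shown.
        logger.warning(
            "Unknown timezone %r (%s); falling back to UTC. "
            "Check the timezone setting in the config.",
            name,
            exc,
        )
        return timezone.utc
```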

thorinaboenke · 2026-05-27T12:47:31Z

+    end = line_number + radius
+    out: list[str] = []
+
+    def _sanitize_preview_text(value: str) -> str:


this is the same as _ui_sanitize ?

thorinaboenke · 2026-05-27T12:49:58Z

+    return attacks
+
+
+def run_journalctl_export(path: Path) -> list[str]:


Better to ask for permission here (i.e. check whether journalctl is there)? Otherwise the error is silent, and it is unclear whether a journal file was skipped because of a format issue or because journalctl isn't installed.
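Such a check could be done with `shutil.which` before the subprocess call. The sketch below is not the PR's implementation: `export_journal_lines` is a hypothetical name, and the journalctl arguments (`--file`, `--no-pager`, `-o short-iso`) are an assumption about what the export needs.

```python
import logging
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def export_journal_lines(
    path: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Export a journal file as text lines, saying why when that fails."""
    exe = which("journalctl")
    if exe is None:
        # Distinguishes "tool missing" from "file unreadable" for the user.
        logger.warning("journalctl not found on PATH; skipping %s", path)
        return []
    result = subprocess.run(
        [exe, "--file", str(path), "--no-pager", "-o", "short-iso"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        logger.warning(
            "journalctl could not read %s (exit code %d): %s",
            path,
            result.returncode,
            result.stderr.strip(),
        )
        return []
    return result.stdout.splitlines()
```

The `which` parameter exists only so the missing-tool branch can be exercised in tests without touching PATH.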

thorinaboenke · 2026-05-27T12:55:39Z

+    lexical_terms = extract_attack_lexical_terms(attack)
+    exact_hits = [tok for tok in lexical_terms if tok not in LEXICAL_STOP_TOKENS and tok in event_text]
+    if exact_hits:
+        val = min(80.0, 10.0 * len(set(exact_hits)))


magic numbers

thorinaboenke · 2026-05-27T12:55:46Z

+
+    overlap_terms = sorted((set(lexical_terms) - set(exact_hits)) & event_terms)
+    if overlap_terms:
+        val = min(40.0, 6.0 * len(overlap_terms))


magic numbers

thorinaboenke · 2026-05-27T13:01:34Z

+
+    matched_total = len(set(binary_hits)) + len(set(exact_hits)) + len(set(overlap_terms))
+    if matched_total >= 3:
+        bonus = min(25.0, 5.0 * matched_total)


magic numbers
Give these meaningful names somewhere; then it's also easier to tinker with if one wants to change the scoring system, e.g.:

# Binary name match: highest weight signal, executable names are strong evidence
SCORE_BINARY_BASE = 40.0  # awarded for any binary match at all
SCORE_BINARY_PER_HIT = 25.0  # per additional distinct binary matched
SCORE_BINARY_CAP = 120.0  # caps at 4 binaries (base + 3 additional hits)

# Exact token match: attack tokens appearing literally in the log line
SCORE_EXACT_TOKEN_PER_HIT = 10.0  # per distinct token
SCORE_EXACT_TOKEN_CAP = 80.0  # caps at 8 tokens

# Token overlap: tokens present in both after tokenization
SCORE_OVERLAP_PER_TERM = 6.0  # per overlapping term
SCORE_OVERLAP_CAP = 40.0  # caps at ~7 terms

# Multi-signal bonus: rewards correlations where several independent signals agree
SCORE_MULTI_SIGNAL_THRESHOLD = 3  # minimum combined hits to qualify
SCORE_MULTI_SIGNAL_PER_HIT = 5.0  # per combined hit once the threshold is met
SCORE_MULTI_SIGNAL_CAP = 25.0  # caps at 5 combined hits
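To illustrate how the constant names suggested above might be wired into the quoted hunks, here is a small sketch. It assumes the partial values are simply summed, which the diff excerpts do not show, and it leaves out the binary score because that code is not quoted. `capped` and `lexical_score` are hypothetical helpers.

```python
from typing import Iterable

SCORE_EXACT_TOKEN_PER_HIT = 10.0
SCORE_EXACT_TOKEN_CAP = 80.0
SCORE_OVERLAP_PER_TERM = 6.0
SCORE_OVERLAP_CAP = 40.0
SCORE_MULTI_SIGNAL_THRESHOLD = 3
SCORE_MULTI_SIGNAL_PER_HIT = 5.0
SCORE_MULTI_SIGNAL_CAP = 25.0


def capped(per_hit: float, count: int, cap: float) -> float:
    """Linear score per hit, limited to cap."""
    return min(cap, per_hit * count)


def lexical_score(
    exact_hits: Iterable[str],
    overlap_terms: Iterable[str],
    binary_hits: Iterable[str] = (),
) -> float:
    """Sum the exact-token, overlap, and multi-signal parts (assumed additive)."""
    exact = set(exact_hits)
    overlap = set(overlap_terms)
    score = 0.0
    if exact:
        score += capped(SCORE_EXACT_TOKEN_PER_HIT, len(exact), SCORE_EXACT_TOKEN_CAP)
    if overlap:
        score += capped(SCORE_OVERLAP_PER_TERM, len(overlap), SCORE_OVERLAP_CAP)
    matched_total = len(set(binary_hits)) + len(exact) + len(overlap)
    if matched_total >= SCORE_MULTI_SIGNAL_THRESHOLD:
        score += capped(
            SCORE_MULTI_SIGNAL_PER_HIT, matched_total, SCORE_MULTI_SIGNAL_CAP
        )
    return score
```

With the weights in one place, changing the scoring system means editing the constants rather than hunting for literals inside the scoring function.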

thorinaboenke · 2026-05-27T13:06:50Z

+    cmd: str
+    source_file: str
+    line_number: int
+    raw: dict[str, Any]


Is this field ever used again? It is set on Attack(), but when an Attack is rebuilt from the raw output it is assigned raw={}.

thorinaboenke · 2026-05-27T13:08:09Z

+        cmd=str(row.get("cmd", row.get("attack_cmd", ""))),
+        source_file=str(row.get("source_file", row.get("attack_file", ""))),
+        line_number=int(row.get("line_number", row.get("attack_line", 0)) or 0),
+        raw={},


Intentional? The raw row is saved in Attack() earlier.
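If the field is meant to be kept, one option is to store a copy of the source row instead of an empty dict. This is only a sketch: the `Attack` dataclass below contains just the four fields visible in the hunks, and `attack_from_row` is a hypothetical name for the conversion shown above. If nothing reads `raw`, dropping the field would be the simpler fix.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Attack:
    # Only the fields visible in the review hunks; the real class has more.
    cmd: str
    source_file: str
    line_number: int
    raw: dict[str, Any] = field(default_factory=dict)


def attack_from_row(row: dict[str, Any]) -> Attack:
    """Rebuild an Attack from an exported row, preserving the original row."""
    return Attack(
        cmd=str(row.get("cmd", row.get("attack_cmd", ""))),
        source_file=str(row.get("source_file", row.get("attack_file", ""))),
        line_number=int(row.get("line_number", row.get("attack_line", 0)) or 0),
        raw=dict(row),  # shallow copy so later edits to row do not leak in
    )
```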

thorinaboenke · 2026-05-27T13:16:07Z

+            )
+        return top
+
+    def _main(self, stdscr) -> None:


somewhat long, maybe extract rendering of each pane to functions?

thorinaboenke · 2026-05-27T13:18:18Z

+        return None
+
+
+ndefault = object()


this is never used?

added attacklog_correlator.py and adapted .gitignore for its working …

537f6bc

…data

erik-graf requested a review from whotwagner May 22, 2026 12:39

thorinaboenke self-requested a review May 27, 2026 12:26

thorinaboenke reviewed May 27, 2026

