fix(log-surgeon): Allow header variables to contain a timestamp capture as timestamps are unused in subquery decomposition; Remove outdated delimiter check in search lexer.#1972
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughWrapped lexer utilities in namespace Changes
Sequence Diagram(s)sequenceDiagram
participant File as SchemaFile
participant Loader as Utils::load_lexer_from_file
participant Lexer as Lexer (symbol tables)
participant Run as clp::clp::run (validator)
File->>Loader: read schema file (rules, delimiters)
Loader->>Lexer: register rule names & symbol ids
Loader->>Loader: remove_delimiters_from_wildcard, build regex (newline literal)
alt rule is header with single capture named "timestamp"
Loader-->>Lexer: add rule but mark as header-timestamp skip
else
Loader-->>Lexer: add rule normally
end
Lexer->>Run: runtime validation sees rule with captures
alt header-timestamp single capture
Run-->>Lexer: allow (skip error)
else
Run-->>Lexer: error on capture groups
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@components/core/src/clp/clp/run.cpp`:
- Around line 67-83: The code treats any present optional_captures as indicating
a capture group and throws, but get_captures_from_rule_id() can return an empty
vector; update the logic in the block handling optional_captures (the variables
optional_captures, captures, rule_name, rule_id and schema_file_path) to check
captures.empty() (or captures.size() == 0) and skip/continue when there are zero
captures before the existing special-case for the "header" rule and before
throwing the runtime_error so that only rules with one or more capture groups
trigger the error.
In `@components/core/src/clp/Utils.cpp`:
- Around line 169-171: The call currently constructs a temporary
RegexASTLiteral<ByteNfaState> and then passes it to make_unique; replace that by
calling make_unique<RegexASTLiteral<ByteNfaState>> with the constructor argument
directly (e.g., pass '\n' directly) so the object is constructed in-place;
update the expression that currently wraps RegexASTLiteral<ByteNfaState>('\n')
inside make_unique to a direct make_unique<RegexASTLiteral<ByteNfaState>>('\n').
header variables to contain a timestamp capture timestamp is unused in subquery decomposition; Remove outdated delimiter check in search lexer.header variables to contain a timestamp capture as timestamps are unused in subquery decomposition; Remove outdated delimiter check in search lexer.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/config/schemas.txt (1)
19-19:⚠️ Potential issue | 🔴 CriticalNon-header rule
equalscontains a capture group that will cause a schema load error.The
equalspattern at line 19 uses a named capture group(?<var>...), but capture groups are only allowed inheaderrules. The schema validation will reject this rule with an error message during load.
🤖 Fix all issues with AI agents
In `@components/core/tests/test-ParserWithUserSchema.cpp`:
- Line 86: Add a short inline comment explaining why spdlog::drop_all() is
called here to avoid future removal: note that clp::clp::run registers spdlog
loggers which persist across Catch2 sections and test runs, so dropping all
loggers prevents re-registration conflicts and test flakiness; place the comment
on the line with spdlog::drop_all() near its current usage in
test-ParserWithUserSchema.cpp.
davidlion
left a comment
There was a problem hiding this comment.
Minor changes, otherwise lgtm.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/core/config/schemas.txt`:
- Line 8: Replace the inner capturing group in the header regex with a
non-capturing optional group and use the ? quantifier: change
header:(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d{3}){0,1}) to
header:(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{3})?) so the
schema uses only one capture (the named group); also verify in Utils.cpp and
run.cpp whether the capture-count validation inspects all capture groups (named
+ unnamed) or only named captures and update the validator or tests accordingly.
- Line 19: The regex for the "equals:" key was loosened incorrectly; replace the
current pattern equals:.*=.*[a-zA-Z0-9].* with one that preserves the original
key constraint but removes capture groups, e.g.
equals:[a-zA-Z0-9]+=.*[a-zA-Z0-9].*, so the left-hand key remains [a-zA-Z0-9]+
(preventing last-`=` greedy matches) while the value still requires an
alphanumeric character and no capture groups are used.
…apture as timestamps are unused in subquery decomposition; Remove outdated delimiter check in search lexer. (y-scope#1972) Co-authored-by: SharafMohamed <SharafMohamed@users.noreply.github.com> Co-authored-by: davidlion <david.lion@yscope.com>
Reference
headerkeyword withtimestampcaptures. This breaks multi-line parsing, and timestamp extraction with the current version ofLog Surgeon.Description
timestampcapture from aheaderis not stored in the variable dictionary, theheadervariable is treated specially:0 captures, it is treated as a normal variable in both compression and search.1 timestamp capture: it extracts timestamps + static-text in compression, thus it is not needed in the search lexer as timestamps aren't considered during log matching.1+ non-timestamp captureor2+ timestamp captures: Disabled as TNFA subquery decomposition is needed.Validation Performed
Summary by CodeRabbit
New Features
Refactor
Tests
Configuration