fix(log-surgeon): Allow header variables to contain a timestamp capture as timestamps are unused in subquery decomposition; Remove outdated delimiter check in search lexer.#1972

Merged
SharafMohamed merged 10 commits into y-scope:main from SharafMohamed:allow-headers-with-timestamp on Feb 19, 2026

@SharafMohamed SharafMohamed commented Feb 11, 2026

Contributor

Reference

Description

  • Because the timestamp capture from a header is not stored in the variable dictionary, the header variable is handled specially, depending on how many captures it contains:
    • 0 captures: it is treated as a normal variable in both compression and search.
    • 1 timestamp capture: compression extracts the timestamp and the static text, so the header is not needed in the search lexer, as timestamps aren't considered during log matching.
    • 1+ non-timestamp captures or 2+ timestamp captures: disabled, as these cases need TNFA subquery decomposition.
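
The three cases above can be read as a small classification over the names of a header rule's capture groups. The following is a minimal sketch of that reading, not the code from this PR: `HeaderHandling` and `classify_header` are hypothetical names, and it assumes the captures are available as a list of strings.

```cpp
#include <string>
#include <vector>

// Hypothetical names for illustration; not taken from the CLP code base.
enum class HeaderHandling {
    NormalVariable,  // 0 captures: ordinary variable in compression and search
    TimestampOnly,   // exactly 1 capture, named "timestamp": not needed by the search lexer
    Unsupported      // any other capture combination: disabled
};

// Classifies a header rule by the names of its capture groups.
inline HeaderHandling classify_header(std::vector<std::string> const& capture_names) {
    if (capture_names.empty()) {
        return HeaderHandling::NormalVariable;
    }
    if (1 == capture_names.size() && "timestamp" == capture_names.front()) {
        return HeaderHandling::TimestampOnly;
    }
    // 1+ non-timestamp captures, or 2+ timestamp captures.
    return HeaderHandling::Unsupported;
}
```

Under these assumptions, `classify_header({"timestamp", "int"})` and `classify_header({"int"})` both fall into the disabled case.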

Validation Performed

  • Unit tests still pass.

Summary by CodeRabbit

  • New Features

    • Accept header rules that capture a single timestamp without error.
  • Refactor

    • Simplified rule construction and registration flow; streamlined newline/header and delimiter handling; reduced delimiter error verbosity.
  • Tests

    • Expanded tests and added schema examples covering header/no-capture/timestamp/int/multi-capture scenarios.
  • Configuration

    • Updated schema patterns for header/timestamp, integer and float, and dictionary variable matching.

@SharafMohamed SharafMohamed requested a review from a team as a code owner February 11, 2026 10:47

coderabbitai Bot commented Feb 11, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Wrapped lexer utilities in namespace clp, added type aliases, simplified includes and delimiter handling, and introduced a schema-loader skip for header rules with a single timestamp capture; runtime validation now uses optional_captures and allows that header-timestamp pattern while rejecting other capture patterns.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Lexer utils & token init | components/core/src/clp/Utils.cpp | Moved code into clp namespace, added type aliases, removed unused includes, added cTokenHeader handling, simplified newline regex creation, ensured rule names are registered into symbol tables, applied remove_delimiters_from_wildcard, and removed verbose delimiter error reporting. |
| Runtime capture validation | components/core/src/clp/clp/run.cpp | Replaced unconditional throw on detected regex captures with optional_captures handling; permits header rules with exactly one capture named timestamp, otherwise preserves previous error behavior for capture groups. |
| Schema definitions | components/core/config/schemas.txt | Replaced timestamp section with header using a named timestamp capture, updated timestamp/int/float patterns to use \d and -? shorthand, and adjusted dictionary/equality patterns. |
| Unit tests (C++ changes) | components/core/tests/test-ParserWithUserSchema.cpp | Updated includes, added spdlog::drop_all() call, renamed and expanded tests to cover header/no-capture, header with timestamp capture, and various invalid capture combinations; updated expected error messages. |
| Test schema files | components/core/tests/test_schema_files/header_with_int.txt, components/core/tests/test_schema_files/header_with_no_capture.txt, components/core/tests/test_schema_files/header_with_timestamp.txt, components/core/tests/test_schema_files/header_with_timestamp_and_int.txt | Added new schema fixtures defining delimiters and various header: regex patterns (no capture, timestamp capture, int capture, timestamp+int captures) for test coverage of header capture handling. |
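
The summary for Utils.cpp mentions that rule names are registered into symbol tables. As a rough illustration of what such a registration can look like, here is a minimal sketch that assumes a pair of maps (name to id, id to name); `SymbolTables` and `register_symbol` are hypothetical and are not the lexer's actual interface.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a lexer's two symbol tables (name -> id and id -> name).
struct SymbolTables {
    std::unordered_map<std::string, std::uint32_t> symbol_id;
    std::unordered_map<std::uint32_t, std::string> id_symbol;

    // Returns the id for `name`, assigning the next free id on first use.
    std::uint32_t register_symbol(std::string const& name) {
        auto const it = symbol_id.find(name);
        if (symbol_id.end() != it) {
            return it->second;
        }
        auto const id = static_cast<std::uint32_t>(symbol_id.size());
        symbol_id.emplace(name, id);
        id_symbol.emplace(id, name);
        return id;
    }
};
```

Keeping both directions in sync inside one function means a rule name can never end up with an id that has no reverse mapping.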

Sequence Diagram(s)

sequenceDiagram
    participant File as SchemaFile
    participant Loader as Utils::load_lexer_from_file
    participant Lexer as Lexer (symbol tables)
    participant Run as clp::clp::run (validator)

    File->>Loader: read schema file (rules, delimiters)
    Loader->>Lexer: register rule names & symbol ids
    Loader->>Loader: remove_delimiters_from_wildcard, build regex (newline literal)
    alt rule is header with single capture named "timestamp"
        Loader-->>Lexer: add rule but mark as header-timestamp skip
    else
        Loader-->>Lexer: add rule normally
    end
    Lexer->>Run: runtime validation sees rule with captures
    alt header-timestamp single capture
        Run-->>Lexer: allow (skip error)
    else
        Run-->>Lexer: error on capture groups
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main changes: allowing header variables with timestamp captures and removing an outdated delimiter check in the search lexer, which aligns with the PR objectives and file modifications. |


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@components/core/src/clp/clp/run.cpp`:
- Around line 67-83: The code treats any present optional_captures as indicating
a capture group and throws, but get_captures_from_rule_id() can return an empty
vector; update the logic in the block handling optional_captures (the variables
optional_captures, captures, rule_name, rule_id and schema_file_path) to check
captures.empty() (or captures.size() == 0) and skip/continue when there are zero
captures before the existing special-case for the "header" rule and before
throwing the runtime_error so that only rules with one or more capture groups
trigger the error.

In `@components/core/src/clp/Utils.cpp`:
- Around line 169-171: The call currently constructs a temporary
RegexASTLiteral<ByteNfaState> and then passes it to make_unique; replace that by
calling make_unique<RegexASTLiteral<ByteNfaState>> with the constructor argument
directly (e.g., pass '\n' directly) so the object is constructed in-place;
update the expression that currently wraps RegexASTLiteral<ByteNfaState>('\n')
inside make_unique to a direct make_unique<RegexASTLiteral<ByteNfaState>>('\n').
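
The first finding above turns on the difference between "no optional value" and "an optional holding an empty vector": neither means the rule has capture groups. A minimal sketch of the suggested check order follows, assuming the captures arrive as an optional vector of capture names. `validate_rule_captures` and the error text are hypothetical, not the code in run.cpp.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical validator. Returns normally if the rule is acceptable, throws otherwise.
inline void validate_rule_captures(
        std::string const& rule_name,
        std::optional<std::vector<std::string>> const& optional_captures
) {
    // Both "no value" and "empty vector" mean the rule has zero capture groups.
    if (false == optional_captures.has_value() || optional_captures->empty()) {
        return;
    }
    auto const& captures = optional_captures.value();
    // Special case: a header rule with exactly one capture named "timestamp" is allowed.
    if ("header" == rule_name && 1 == captures.size() && "timestamp" == captures.front()) {
        return;
    }
    // Only rules with one or more unsupported capture groups reach this point.
    throw std::runtime_error("Rule `" + rule_name + "` contains an unsupported capture group.");
}
```

Checking for emptiness before the header special case keeps the throw reachable only when at least one capture group is actually present.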

Comment thread components/core/src/clp/clp/run.cpp
Comment thread components/core/src/clp/Utils.cpp Outdated
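
The Utils.cpp suggestion is about letting `std::make_unique` forward the constructor argument instead of handing it an already-built temporary. The sketch below uses a hypothetical `Literal` type as a stand-in for `RegexASTLiteral<ByteNfaState>` and counts constructor calls to make the difference visible; it assumes C++17 for the inline static members.

```cpp
#include <memory>

// Hypothetical stand-in for RegexASTLiteral<ByteNfaState>; counts constructor calls.
struct Literal {
    static inline int value_ctor_calls = 0;
    static inline int move_ctor_calls = 0;

    explicit Literal(char c) : character{c} { ++value_ctor_calls; }
    Literal(Literal&& other) noexcept : character{other.character} { ++move_ctor_calls; }

    char character;
};

// Builds a temporary first, then move-constructs the heap object from it.
inline std::unique_ptr<Literal> make_via_temporary() {
    return std::make_unique<Literal>(Literal('\n'));
}

// Forwards the argument so the heap object is constructed directly.
inline std::unique_ptr<Literal> make_in_place() {
    return std::make_unique<Literal>('\n');
}
```

Both functions produce an equivalent object; the in-place form simply skips the extra move construction, which is why the change is a cleanup rather than a behavioral fix.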
@SharafMohamed SharafMohamed changed the title from "fix(log-surgeon): Allow header variables to contain a timestamp capture timestamp is unused in subquery decomposition; Remove outdated delimiter check in search lexer." to "fix(log-surgeon): Allow header variables to contain a timestamp capture as timestamps are unused in subquery decomposition; Remove outdated delimiter check in search lexer." Feb 11, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/core/config/schemas.txt (1)

19-19: ⚠️ Potential issue | 🔴 Critical

Non-header rule equals contains a capture group that will cause a schema load error.

The equals pattern at line 19 uses a named capture group (?<var>...), but capture groups are only allowed in header rules. The schema validation will reject this rule with an error message during load.

🤖 Fix all issues with AI agents
In `@components/core/tests/test-ParserWithUserSchema.cpp`:
- Line 86: Add a short inline comment explaining why spdlog::drop_all() is
called here to avoid future removal: note that clp::clp::run registers spdlog
loggers which persist across Catch2 sections and test runs, so dropping all
loggers prevents re-registration conflicts and test flakiness; place the comment
on the line with spdlog::drop_all() near its current usage in
test-ParserWithUserSchema.cpp.

Comment thread components/core/tests/test-ParserWithUserSchema.cpp

@davidlion davidlion left a comment

Member

Minor changes, otherwise lgtm.

Comment thread components/core/config/schemas.txt Outdated
Comment thread components/core/tests/test-ParserWithUserSchema.cpp

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/config/schemas.txt`:
- Line 8: Replace the inner capturing group in the header regex with a
non-capturing optional group and use the ? quantifier: change
header:(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d{3}){0,1}) to
header:(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{3})?) so the
schema uses only one capture (the named group); also verify in Utils.cpp and
run.cpp whether the capture-count validation inspects all capture groups (named
+ unnamed) or only named captures and update the validator or tests accordingly.
- Line 19: The regex for the "equals:" key was loosened incorrectly; replace the
current pattern equals:.*=.*[a-zA-Z0-9].* with one that preserves the original
key constraint but removes capture groups, e.g.
equals:[a-zA-Z0-9]+=.*[a-zA-Z0-9].*, so the left-hand key remains [a-zA-Z0-9]+
(preventing last-`=` greedy matches) while the value still requires an
alphanumeric character and no capture groups are used.
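
The first inline comment hinges on how many capture groups each version of the header pattern contains. A minimal sketch of that count using the standard library follows. This is only an illustration: `std::regex` does not accept the `(?<name>...)` syntax, so the named `timestamp` group is written as a plain group here, and the schema's own regex dialect may count groups differently. `count_capture_groups`, `cBefore` and `cAfter` are hypothetical names.

```cpp
#include <regex>
#include <string>

// Counts the capture groups in an ECMAScript pattern using the standard library.
inline unsigned count_capture_groups(std::string const& pattern) {
    std::regex const re(pattern);
    return re.mark_count();
}

// The outer group stands in for the schema's named `timestamp` capture.
// Before: the optional milliseconds part is a second capturing group.
inline std::string const cBefore{R"((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d{3}){0,1}))"};
// After: the optional milliseconds part is non-capturing.
inline std::string const cAfter{R"((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{3})?))"};
```

With these definitions, `count_capture_groups(cBefore)` yields 2 and `count_capture_groups(cAfter)` yields 1, while both patterns accept the same timestamps with or without the milliseconds suffix.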

Comment thread components/core/config/schemas.txt
Comment thread components/core/config/schemas.txt
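
The remark about greedy matching on the last `=` in the `equals` pattern can be made concrete with a backtracking regex engine. The sketch below adds two capture groups purely to expose where each pattern splits the key from the value; the schema rule itself must not contain them. It uses `std::regex`, whose matching semantics are an assumption here and may differ from the lexer's. `split_key_value`, `cLoose` and `cStrict` are hypothetical names.

```cpp
#include <regex>
#include <string>
#include <utility>

// Splits `text` with `pattern`, which must contain exactly two groups: key and value.
// Returns a pair of empty strings if the whole text does not match.
inline std::pair<std::string, std::string>
split_key_value(std::string const& pattern, std::string const& text) {
    std::regex const re(pattern);
    std::smatch match;
    if (false == std::regex_match(text, match, re)) {
        return {};
    }
    return {match[1].str(), match[2].str()};
}

// Loosened pattern: the key may contain anything, including `=`.
inline std::string const cLoose{R"((.*)=(.*[a-zA-Z0-9].*))"};
// Suggested pattern: the key is restricted to alphanumeric characters.
inline std::string const cStrict{R"(([a-zA-Z0-9]+)=(.*[a-zA-Z0-9].*))"};
```

For the input `a=b=c`, the loose pattern splits at the last `=` (key `a=b`, value `c`), whereas the strict pattern splits at the first (key `a`, value `b=c`).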

@davidlion davidlion left a comment

Member

lgtm

@SharafMohamed SharafMohamed merged commit 2221f64 into y-scope:main Feb 19, 2026
27 checks passed
@junhaoliao junhaoliao added this to the February 2026 milestone Feb 26, 2026
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…apture as timestamps are unused in subquery decomposition; Remove outdated delimiter check in search lexer. (y-scope#1972)

Co-authored-by: SharafMohamed <SharafMohamed@users.noreply.github.com>
Co-authored-by: davidlion <david.lion@yscope.com>
3 participants