Skip to content

[Bug] KQL Validation Add Wildcard w/ Space token value#5753

Merged
imays11 merged 17 commits intomainfrom
bug_fix_kql_parser_spaces_with_wildcards
Mar 18, 2026
Merged

[Bug] KQL Validation Add Wildcard w/ Space token value#5753
imays11 merged 17 commits intomainfrom
bug_fix_kql_parser_spaces_with_wildcards

Conversation

@imays11
Copy link
Copy Markdown
Contributor

@imays11 imays11 commented Feb 20, 2026

Pull Request

Issue link(s):

Summary

Fixes KQL parser to support wildcard values containing spaces (e.g., *S3 Browser*), which work in Kibana but were rejected by our unit tests.

Changes

Grammar (lib/kql/kql/kql.g)

  • Added WILDCARD_LITERAL token with priority 3 to match wildcard patterns containing spaces
  • Uses negative lookahead to stop before or/and/not keywords
  • Added to value rule (not literal) so field names remain unaffected

Parser (lib/kql/kql/parser.py)

  • Handle new WILDCARD_LITERAL token type as wildcards
  • Quoted strings ("*text*") now treated as literals, matching Kibana behavior

Behavior

Query Before After
field: *S3 Browser* ❌ Parse error ✅ Wildcard
common.*: value ✅ Works ✅ Works
field: "*text*" Wildcard ✅ Literal (matches Kibana)

Test plan

Working Query in Kibana

Screenshot 2026-02-20 at 2 08 56 PM

Parser changes working with rule from PR #5694

Screenshot 2026-02-20 at 6 28 41 PM

All KQL unit tests passing w/ changes

Screenshot 2026-02-20 at 6 07 49 PM

## Summary
Fixes KQL parser to support wildcard values containing spaces (e.g., `*S3 Browser*`), which work in Kibana but were rejected by our unit tests.

**Issue:** #5750

## Changes

### Grammar (`lib/kql/kql/kql.g`)
- Added `WILDCARD_LITERAL` token with priority 3 to match wildcard patterns containing spaces
- Uses negative lookahead to stop before `or`/`and`/`not` keywords
- Added to `value` rule (not `literal`) so field names remain unaffected

### Parser (`lib/kql/kql/parser.py`)
- Handle new `WILDCARD_LITERAL` token type as wildcards
- Quoted strings (`"*text*"`) now treated as literals, matching Kibana behavior

## Behavior

| Query | Before | After |
|-------|--------|-------|
| `field: *S3 Browser*` | ❌ Parse error | ✅ Wildcard |
| `field: *test*` | ✅ Wildcard | ✅ Wildcard |
| `common.*: value` | ✅ Works | ✅ Works |
| `field: "*text*"` | Wildcard | ✅ Literal (matches Kibana) |

## Test plan
- [x] All 63 existing KQL unit tests pass
- [x] New wildcard-with-spaces patterns parse correctly
- [x] Wildcard field names (`common.*`) still work
- [x] Keywords (`or`, `and`, `not`) correctly recognized as separators
- [x] Tested against rule file from PR #5694
@imays11 imays11 self-assigned this Feb 20, 2026
@imays11 imays11 added bug Something isn't working python Internal python for the repository Team: TRADE labels Feb 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Bug - Guidelines

These guidelines serve as a reminder set of considerations when addressing a bug in the code.

Documentation and Context

  • Provide detailed documentation (description, screenshots, reproducing the bug, etc.) of the bug if not already documented in an issue.
  • Include additional context or details about the problem.
  • Ensure the fix includes necessary updates to the release documentation and versioning.

Code Standards and Practices

  • Code follows established design patterns within the repo and avoids duplication.
  • Ensure that the code is modular and reusable where applicable.

Testing

  • New unit tests have been added to cover the bug fix or edge cases.
  • Existing unit tests have been updated to reflect the changes.
  • Provide evidence of testing and detecting the bug fix (e.g., test logs, screenshots).
  • Validate that any rules affected by the bug are correctly updated.
  • Ensure that performance is not negatively impacted by the changes.
  • Verify that any release artifacts are properly generated and tested.
  • Conducted system testing, including fleet, import, and create APIs (e.g., run make test-cli, make test-remote-cli, make test-hunting-cli)

Additional Checks

  • Verify that the bug fix works across all relevant environments (e.g., different OS versions).
  • Confirm that the proper version label is applied to the PR patch, minor, major.

@imays11 imays11 added the patch label Feb 23, 2026
@eric-forte-elastic
Copy link
Copy Markdown
Contributor

I don't think this works as intended yet. If you look through the current rules on main with the parser update and have it log each time is sees a Wildcard literally, it will detect the value not /tmp/go-build* as a wildcard literal from rules/linux/execution_suspicious_executable_running_system_commands.toml which I do not think we want.

Reference Implementation for detection

parser_py.patch

Logfile result after installing via make deps and running unit tests:
(this is all one detection, just runs through multiple unit tests)

---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*


@eric-forte-elastic
Copy link
Copy Markdown
Contributor

I don't think this works as intended yet. If you look through the current rules on main with the parser update and have it log each time is sees a Wildcard literally, it will detect the value not /tmp/go-build* as a wildcard literal from rules/linux/execution_suspicious_executable_running_system_commands.toml which I do not think we want.

Reference Implementation for detection

Here is something that appears to fix the issue as a suggestion. There probably is a better way to do this, just an example.

Details

my_changes.patch

…ding keywords

Add Negative lookahead at start of Pattern 2 - uses (?!(?:or|and|not)\b) at the start to prevent matching values that begin with keywords like 'not /path*'
@imays11 imays11 marked this pull request as draft February 23, 2026 17:50
…aced phrase

# KQL Parser Changes - Wildcard Spaces and NOT Prefix Fix

## Overview

This update fixes two issues in the KQL parser:
1. **Wildcard values with spaces** - Values like `*S3 Browser*` now parse correctly
2. **NOT prefix false match** - Values like `not /tmp/go-build*` are no longer incorrectly consumed as a single wildcard literal

## Files Modified

### `lib/kql/kql/kql.g` (Grammar)

**Added `optional_not` rule** to handle `NOT` as an explicit grammar element:
```
?list_of_values: "(" or_list_of_values ")"
| optional_not value
?optional_not: NOT optional_not
|
```

**Expanded `WILDCARD_LITERAL`** with 4 patterns to support all wildcard-with-space cases:

| Pattern | Description | Example |
|---------|-------------|---------|
| 1 | Starts with `*` | `*S3 Browser`, `*S3 Browser*` |
| 2 | Ends with `*` (doesn't start with `*`) | `S3 Browser*` |
| 3a | `*` appears after a space | `S3 B*owser` |
| 3b | `*` appears before a space | `S3* Browser` |

### `lib/kql/kql/parser.py`

Added methods to handle the new grammar rules:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps values with `NotValue`

### `lib/kql/kql/kql2eql.py`

Added corresponding methods for EQL conversion:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps with `eql.ast.Not`

## Test Results

All 63 kuery tests pass. Verified wildcard cases:

| Input | Result |
|-------|--------|
| `field: *S3 Browser*` | `field:*S3\ Browser*` |
| `field: S3 Browser*` | `field:S3\ Browser*` |
| `field: *S3 Browser` | `field:*S3\ Browser` |
| `field: S3 B*owser` | `field:S3\ B*owser` |
| `field: S3* Browser` | `field:S3*\ Browser` |
| `field: foo* bar* baz` | `field:foo*\ bar*\ baz` |
| `process.executable: not /tmp/go-build*` | `not process.executable:/tmp/go-build*` |
| `field < value` | `field < value` (range expression, not wildcard) |

## Technical Notes

### Pattern 3a Fix
Pattern 3a requires at least one character AFTER the `*` (uses `[...]+` instead of `[...]*`). This prevents Pattern 2 from incorrectly matching shorter strings like `S3 B*` when the full value is `S3 B*owser`.

### NOT Keyword Handling
The `optional_not` grammar approach explicitly parses `NOT` as a keyword before the value, preventing it from being consumed as part of a wildcard literal. This is safer than regex-only approaches because:
- `NOT` token only matches the exact word "not" (case-insensitive)
- Values like `notafile*` are still parsed as `UNQUOTED_LITERAL`
- Edge case: literal value "not" must be quoted: `field: "not"`
@imays11
Copy link
Copy Markdown
Contributor Author

imays11 commented Feb 23, 2026

@eric-forte-elastic I implemented your optional_not token approach, and added support for phrases with a space and the wildcard is somewhere in the middle, previously the implementation only focused on wildcard being at the beginning or at the end.

KQL Parser Changes - Wildcard Spaces and NOT Prefix Fix

Overview

This update fixes two issues in the KQL parser:

  1. Wildcard values with spaces - Values like *S3 Browser* now parse correctly
  2. NOT prefix false match - Values like not /tmp/go-build* are no longer incorrectly consumed as a single wildcard literal

Files Modified

lib/kql/kql/kql.g (Grammar)

Added optional_not rule to handle NOT as an explicit grammar element:

?list_of_values: "(" or_list_of_values ")"
| optional_not value
?optional_not: NOT optional_not
|

Expanded WILDCARD_LITERAL with 4 patterns to support all wildcard-with-space cases:

Pattern Description Example
1 Starts with * *S3 Browser, *S3 Browser*
2 Ends with * (doesn't start with *) S3 Browser*
3a * appears after a space S3 B*owser
3b * appears before a space S3* Browser

lib/kql/kql/parser.py

Added methods to handle the new grammar rules:

  • list_of_values() - handles optional_not value structure
  • optional_not() - counts NOT occurrences and wraps values with NotValue

lib/kql/kql/kql2eql.py

Added corresponding methods for EQL conversion:

  • list_of_values() - handles optional_not value structure
  • optional_not() - counts NOT occurrences and wraps with eql.ast.Not

Test Results

All 63 kuery tests pass. Verified wildcard cases:

Input Result
field: *S3 Browser* field:*S3\ Browser*
field: S3 Browser* field:S3\ Browser*
field: *S3 Browser field:*S3\ Browser
field: S3 B*owser field:S3\ B*owser
field: S3* Browser field:S3*\ Browser
field: foo* bar* baz field:foo*\ bar*\ baz
process.executable: not /tmp/go-build* not process.executable:/tmp/go-build*, not wildcard literal
field < value field < value (range expression, not wildcard)

Technical Notes

Pattern 3a Fix

Pattern 3a requires at least one character AFTER the * (uses [...]+ instead of [...]*). This prevents Pattern 2 from incorrectly matching shorter strings like S3 B* when the full value is S3 B*owser.

NOT Keyword Handling

The optional_not grammar approach explicitly parses NOT as a keyword before the value, preventing it from being consumed as part of a wildcard literal. This is safer than regex-only approaches because:

  • NOT token only matches the exact word "not" (case-insensitive)
  • Values like notafile* are still parsed as UNQUOTED_LITERAL
  • Edge case: literal value "not" must be quoted: field: "not"
  • not process.executable:/tmp/go-build* parsed as UNQUOTED_LITERAL

@imays11 imays11 marked this pull request as ready for review February 23, 2026 18:24
@eric-forte-elastic
Copy link
Copy Markdown
Contributor

Just as a note from the testing output:

process.executable: not /tmp/go-build*

^ This is currently not detected as a wildcard literal, this is detected as an unquoted literal. In the current implementation wildcard literals will only be determined as such if there is a space with the field.

I think this is fine, just want to be sure that this is intentional.

Copy link
Copy Markdown
Contributor

@eric-forte-elastic eric-forte-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment otherwise looks good 👍

@imays11
Copy link
Copy Markdown
Contributor Author

imays11 commented Mar 4, 2026

Just as a note from the testing output:

process.executable: not /tmp/go-build*

^ This is currently not detected as a wildcard literal, this is detected as an unquoted literal. In the current implementation wildcard literals will only be determined as such if there is a space with the field.

I think this is fine, just want to be sure that this is intentional.

Yes that's expected, I updated the comment to clarify

Copy link
Copy Markdown
Contributor

@Mikaayenson Mikaayenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main thing is to add test cases.

Comment thread lib/kql/kql/kql.g
Comment thread lib/kql/kql/parser.py Outdated
Comment thread lib/kql/kql/kql2eql.py
### Changes to Addresses Review Comments @Mikaayenson

1. **Fixed regex patterns to prevent trailing whitespace capture** (`kql.g`)
   - Added `(?=\s|$|[()":{}])` lookahead to all WILDCARD_LITERAL patterns
   - This ensures patterns stop at boundaries without capturing trailing whitespace

2. **Removed `.rstrip()` workaround** (`parser.py`)
   - No longer needed since regex now handles boundaries correctly

3. **Added explicit WILDCARD_LITERAL handling** (`kql2eql.py`)
   - Now checks `token.type == "WILDCARD_LITERAL"` explicitly
   - Mirrors the approach used in `parser.py`

4. **Added unit tests** (`tests/kuery/test_parser.py`)
   - `test_wildcard_with_spaces` - all 4 WILDCARD_LITERAL patterns
   - `test_wildcard_with_spaces_and_keywords` - wildcards with `and`/`or` boundaries
   - `test_not_prefix_with_wildcard` - NOT keyword not consumed as wildcard
   - `test_quoted_wildcard_as_literal` - quoted wildcards are literal strings
   - `test_triple_not_optimization` - `not not not foo` → `not foo`
@imays11
Copy link
Copy Markdown
Contributor Author

imays11 commented Mar 17, 2026

Changes to Address Review Comments @Mikaayenson

  1. Fixed regex patterns to prevent trailing whitespace capture (kql.g)

    • Added (?=\s|$|[()":{}]) lookahead to all WILDCARD_LITERAL patterns
    • This ensures patterns stop at boundaries without capturing trailing whitespace
  2. Removed .rstrip() workaround (parser.py)

    • No longer needed since regex now handles boundaries correctly
  3. Added explicit WILDCARD_LITERAL handling (kql2eql.py)

    • Now checks token.type == "WILDCARD_LITERAL" explicitly
    • Mirrors the approach used in parser.py
  4. Added unit tests (tests/kuery/test_parser.py)

    • test_wildcard_with_spaces - all 4 WILDCARD_LITERAL patterns
    • test_wildcard_with_spaces_and_keywords - wildcards with and/or boundaries
    • test_not_prefix_with_wildcard - NOT keyword not consumed as wildcard
    • test_quoted_wildcard_as_literal - quoted wildcards are literal strings
    • test_triple_not_optimization - not not not foonot foo
Screenshot 2026-03-17 at 2 25 34 PM

@imays11
Copy link
Copy Markdown
Contributor Author

imays11 commented Mar 18, 2026

Backport Testing

I tested the changes against the branches we still release updates for, with the following steps for branches 9.3, 9.2, 9.1 and 8.19.

# Checkout the target release branch
git checkout <branch e.g. 8.19>

# Checkout the feature branch with code changes
git checkout <feature-branch>

# Checkout just the rules from release branches
git checkout 8.19 -- rules_building_block 
git checkout 8.19 -- rules

# Run pytest w/ updated lib/kql
pip install -e lib/kql && python -m detection_rules test

All branches have passed except 9.1, which is failing due to a missing updated_date on an Azure rule. The new tests pass though.

8.19 Passed

Screenshot 2026-03-18 at 3 39 11 PM

9.1 Failed due to Azure rule

Screenshot 2026-03-18 at 3 50 01 PM

9.2 Passed

Screenshot 2026-03-18 at 4 06 05 PM

9.3 Passed

Screenshot 2026-03-18 at 4 06 45 PM

@imays11
Copy link
Copy Markdown
Contributor Author

imays11 commented Mar 18, 2026

9.1 passing with Azure rule fixed locally

Screenshot 2026-03-18 at 5 03 09 PM

@imays11 imays11 merged commit 7ae2980 into main Mar 18, 2026
17 checks passed
@imays11 imays11 deleted the bug_fix_kql_parser_spaces_with_wildcards branch March 18, 2026 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport: auto bug Something isn't working patch python Internal python for the repository Team: TRADE

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants