fix(core): Unescape variable strings before dictionary lookup in EncodedVariableInterpreter::encode_and_search_dictionary (fixes #590).#1270
Conversation
…ises unescaping behaviour.
WalkthroughAdds clp::string_utils::unescape_string and uses it to unescape variable/query strings before dictionary lookups in EncodedVariableInterpreter and QueryRunner; implements and tests unescaping, extends tests to include backslash-escaped values, and updates test data and expectations for an escaped search case. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant QueryRunner
participant StringUtils as clp::string_utils::unescape_string
participant VarDict as VariableDictionary
User->>QueryRunner: run query with VarString
QueryRunner->>StringUtils: unescape(query_string)
StringUtils-->>QueryRunner: unescaped_query
QueryRunner->>VarDict: get_entry_matching_value(unescaped_query, ignore_case)
alt no match
VarDict-->>QueryRunner: []
QueryRunner-->>User: no results for this term
else matches
VarDict-->>QueryRunner: entries
QueryRunner-->>User: results (existing flow)
end
sequenceDiagram
autonumber
participant Encoder as EncodedVariableInterpreter
participant StringUtils as clp::string_utils::unescape_string
participant VarDict as VariableDictionary
Encoder->>Encoder: encode_and_search_dictionary(var_str)
alt numeric (int/float)
Encoder-->>Encoder: handle numeric path
else non-numeric
Encoder->>StringUtils: unescape(var_str)
StringUtils-->>Encoder: unescaped_var
Encoder->>VarDict: get_entry_matching_value(unescaped_var, ignore_case)
alt no entries
VarDict-->>Encoder: []
Encoder-->>Encoder: return false
else entries found
VarDict-->>Encoder: entries
Encoder-->>Encoder: add dict var and proceed
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (3)
🧰 Additional context used📓 Path-based instructions (1)**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}⚙️ CodeRabbit configuration file
Files:
🧠 Learnings (1)📚 Learning: 2025-01-30T19:26:33.869ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
🔇 Additional comments (3)
Pre-merge checks (2 passed, 1 warning)❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
components/core/src/clp/EncodedVariableInterpreter.hpp(2 hunks)components/core/src/clp/string_utils/string_utils.cpp(1 hunks)components/core/src/clp/string_utils/string_utils.hpp(1 hunks)components/core/src/clp_s/search/QueryRunner.cpp(1 hunks)components/core/tests/test-EncodedVariableInterpreter.cpp(4 hunks)components/core/tests/test-clp_s-search.cpp(1 hunks)components/core/tests/test_log_files/test_search.jsonl(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp_s/search/QueryRunner.cppcomponents/core/src/clp/string_utils/string_utils.hppcomponents/core/tests/test-clp_s-search.cppcomponents/core/src/clp/string_utils/string_utils.cppcomponents/core/tests/test-EncodedVariableInterpreter.cpp
🧠 Learnings (1)
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/tests/test-EncodedVariableInterpreter.cpp
🧬 Code graph analysis (2)
components/core/src/clp/EncodedVariableInterpreter.hpp (1)
components/core/src/glt/EncodedVariableInterpreter.hpp (1)
var_str(160-166)
components/core/tests/test-EncodedVariableInterpreter.cpp (1)
components/core/src/clp/EncodedVariableInterpreter.hpp (11)
logtype(60-62)logtype(60-60)logtype(68-70)logtype(68-68)logtype(76-78)logtype(76-76)logtype(84-86)logtype(84-84)encode_and_search_dictionary(199-205)encode_and_search_dictionary(440-486)encode_and_search_dictionary(440-446)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (4)
components/core/tests/test_log_files/test_search.jsonl (1)
5-5: LGTM: include backslash case in test data.The JSON string correctly uses a double backslash to represent a single literal '' in the payload; this unblocks the corresponding test.
components/core/tests/test-clp_s-search.cpp (1)
205-205: Re-enabled and expanded assertions look correct.
- The direct match for "Msg 4: \Abc123" and the wildcard expectation including index 4 align with the new unescaping flow.
Also applies to: 208-208
components/core/src/clp_s/search/QueryRunner.cpp (1)
888-893: Good centralization: use string_utils::unescape_string.Replacing the ad hoc unescape with the shared utility reduces drift and keeps behaviour consistent with EncodedVariableInterpreter.
If you want, I can scan for remaining manual backslash-unescaping to consolidate on the utility.
components/core/src/clp/EncodedVariableInterpreter.hpp (1)
9-10: Include addition is appropriate.Using the shared string utility here keeps interpretation in sync with query handling.
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
components/core/tests/test-EncodedVariableInterpreter.cpp(4 hunks)components/core/tests/test-string_utils.cpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/tests/test-string_utils.cppcomponents/core/tests/test-EncodedVariableInterpreter.cpp
🧬 Code graph analysis (2)
components/core/tests/test-string_utils.cpp (2)
components/core/src/clp/string_utils/string_utils.cpp (2)
unescape_string(191-205)unescape_string(191-191)components/core/src/clp/string_utils/string_utils.hpp (1)
unescape_string(99-99)
components/core/tests/test-EncodedVariableInterpreter.cpp (1)
components/core/src/clp/EncodedVariableInterpreter.hpp (3)
encode_and_search_dictionary(199-205)encode_and_search_dictionary(440-486)encode_and_search_dictionary(440-446)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
🔇 Additional comments (5)
components/core/tests/test-string_utils.cpp (2)
17-17: Import looks good.Bringing unescape_string into scope is appropriate for the new tests.
116-121: Good: locks down trailing backslash behaviour.These assertions clearly encode the contract that a trailing '' is dropped.
components/core/tests/test-EncodedVariableInterpreter.cpp (3)
3-6: Includes are appropriate.Adding , <string_view>, and matches the new usages below.
22-22: LGTM on std::string_view usage.Keeps the test code efficient and consistent with the APIs.
447-448: Backslash test value is fine for dictionary insertion.Storing the literal “\a1” in the dictionary is correct for this scenario.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
components/core/src/clp/string_utils/string_utils.hpp (1)
90-99: Clarify Doxygen wording and escape angle brackets to avoid HTML parsing.Use
</>and explicitly state uniform handling (including wildcards) and that C-style escapes aren’t interpreted. Also tighten @param/@return.-/** - * Unescapes a string according to the following rules: - * <ul> - * <li>Escape sequences `\<char>` are replaced by `<char>`</li> - * <li>Lone dangling `\` is removed from the end of the string</li> - * </ul> - * @param str - * @return An unescaped version of `str`. - */ +/** + * Unescapes a string with simple backslash removal: + * <ul> + * <li>Escape sequences `\\<char>` are replaced by `<char>` (applies to all characters, + * including '*' and '?'; no special-casing and no C-style decoding like `\\n` → newline).</li> + * <li>A trailing lone backslash is removed.</li> + * </ul> + * @param str Input string + * @return Unescaped copy of `str`. + */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
components/core/src/clp/string_utils/string_utils.hpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/string_utils/string_utils.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
components/core/src/clp/string_utils/string_utils.hpp (2)
7-7: Header now self-contained with std::string_view include — good catch.
This addresses the prior concern about relying on transitive includes.
91-101: API + docs look solid; add [[nodiscard]] and tighten 1st bullet (nit).
Function behaviour is clear and matches implementation. Marking the return as [[nodiscard]] prevents accidental discard; small wording tweak clarifies uniform handling./** * Unescapes a string according to the following rules: * <ul> - * <li>Escape sequences `\<char>` are replaced by `<char>`</li> + * <li>Escape sequences `\<char>` are replaced by `<char>` for all characters (no special-casing)</li> * <li>Lone dangling `\` is removed from the end of the string</li> * </ul> * @param str * @return An unescaped version of `str`. */ -auto unescape_string(std::string_view str) -> std::string; +[[nodiscard]] auto unescape_string(std::string_view str) -> std::string;components/core/src/clp/string_utils/string_utils.cpp (1)
193-207: Pre-reserve output to avoid reallocations (nit).
Behaviour aligns with docs (trailing '' dropped). Reserving improves the common case.auto unescape_string(std::string_view str) -> std::string { - std::string unescaped_str; + std::string unescaped_str; + unescaped_str.reserve(str.size()); bool escaped{false}; for (auto const c : str) { if (escaped) { unescaped_str.push_back(c); escaped = false; } else if ('\\' == c) { escaped = true; } else { unescaped_str.push_back(c); } } return unescaped_str; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
components/core/src/clp/string_utils/string_utils.cpp(2 hunks)components/core/src/clp/string_utils/string_utils.hpp(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/string_utils/string_utils.cppcomponents/core/src/clp/string_utils/string_utils.hpp
🧠 Learnings (1)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp/string_utils/string_utils.cppcomponents/core/src/clp/string_utils/string_utils.hpp
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/core/tests/test-string_utils.cpp (1)
122-136: Reserve capacity to avoid reallocations in the exhaustive sweep.This loop pushes 2N + N chars; reserving once avoids repeated growth on tight CI boxes.
SECTION("Exhaustive string unescaping") { std::string unescaped_string; std::string escaped_string; + // Reserve once to prevent repeated reallocations: 256 unescaped, 512 escaped (for 8-bit char). + escaped_string.reserve(2u * (unsigned{std::numeric_limits<unsigned char>::max()} + 1u)); + unescaped_string.reserve(unsigned{std::numeric_limits<unsigned char>::max()} + 1u); char c{std::numeric_limits<char>::min()}; while (true) { escaped_string.push_back('\\'); escaped_string.push_back(c); unescaped_string.push_back(c); if (c == std::numeric_limits<char>::max()) { break; } ++c; } REQUIRE(unescaped_string == unescape_string(escaped_string)); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
components/core/tests/test-string_utils.cpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/tests/test-string_utils.cpp
🧠 Learnings (1)
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/tests/test-string_utils.cpp
🧬 Code graph analysis (1)
components/core/tests/test-string_utils.cpp (2)
components/core/src/clp/string_utils/string_utils.hpp (1)
unescape_string(100-100)components/core/src/clp/string_utils/string_utils.cpp (2)
unescape_string(193-207)unescape_string(193-193)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
components/core/tests/test-string_utils.cpp (2)
2-2: Correct header for std::numeric_limits.Including is the right fix; this resolves portability/compile issues across toolchains.
17-17: Localusingfor unescape_string is fine.Keeps the tests concise without polluting broader scope.
LinZhihao-723
left a comment
There was a problem hiding this comment.
A small fix, otherwise lgtm.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
LinZhihao-723
left a comment
There was a problem hiding this comment.
For the PR title, how about:
fix(core): Unescape variable strings before dictionary lookup in `EncodedVariableInterpreter::encode_and_search_dictionary`.
EncodedVariableInterpreter::encode_and_search_dictionary unescape variable strings before dictionary lookup. (fixes #590)EncodedVariableInterpreter::encode_and_search_dictionary (fixes #590).
…odedVariableInterpreter::encode_and_search_dictionary` (fixes y-scope#590). (y-scope#1270) Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Description
This PR fixes a bug in
EncodedVariableInterpreter::encode_and_search_dictionarywhere variable strings were not unescaped before being looked up in the variable dictionary. Fixing this issue addresses the last outstanding failing case in #590 (note however that we should also finish updating our wildcard search implementation to solve similar unescaping bugs).The bug this PR solves could be encountered when issuing searches that target variable dictionary entries containing the
\character -- since\isn't a delimiter (unlike other characters that need to be escaped for literal search such as*,?, and the other variable placeholders) the heuristic parser will parse a string like"text \a123"as the logtext"text \var"and the variable"\a123". However, to query such a string the user needs to escape the\, i.e. the query string will be"text \\a123"; to properly perform the dictionary lookup the escaped variable"\\a123"must be unescaped as"\a123".This change is implemented by adding a simple unescaping utility to string utils which we use in
encode_and_search_dictionary, as well as one other place in clp-s that was previously unescaping query strings in line before doing dictionary lookups.We expand testing in
test-EncodedVariableInterpreter.cppto cover this edge case, and enable the failing case from #590 intest-clp_s-search.cpp.Checklist
breaking change.
Validation performed
test-EncodedVariableInterpreter.cpptest-clp_s-search.cppSummary by CodeRabbit
Bug Fixes
New Features
Tests