
fix(core): Unescape variable strings before dictionary lookup in EncodedVariableInterpreter::encode_and_search_dictionary (fixes #590). #1270

Merged
gibber9809 merged 16 commits into y-scope:main from gibber9809:fix-encode-and-search-dictionary on Sep 12, 2025

Conversation

@gibber9809 commented Aug 27, 2025

Contributor

Description

This PR fixes a bug in EncodedVariableInterpreter::encode_and_search_dictionary where variable strings were not unescaped before being looked up in the variable dictionary. This resolves the last outstanding failing case in #590. Note, however, that we should still finish updating our wildcard search implementation to fix similar unescaping bugs.

The bug could be triggered by searches that target variable dictionary entries containing the \ character. Since \ isn't a delimiter (unlike other characters that need to be escaped for literal search, such as *, ?, and the variable placeholders), the heuristic parser parses a string like "text \a123" as the logtype "text \var" and the variable "\a123". However, to query such a string the user needs to escape the \, i.e., the query string is "text \\a123". To perform the dictionary lookup correctly, the escaped variable "\\a123" must be unescaped to "\a123".

This change adds a simple unescaping utility to string utils, which we use in encode_and_search_dictionary as well as in one other place in clp-s that was previously unescaping query strings inline before doing dictionary lookups.
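As a rough illustration of the description above, the sketch below reproduces the "text \\a123" example in standalone C++. The `unescape_string` here is modelled on the behaviour described in this PR rather than copied from the clp headers, and `check_query_example` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Standalone sketch of the unescaping rule discussed in this PR: every `\<char>` becomes
// `<char>`, and a lone trailing `\` is dropped.
auto unescape_string(std::string_view str) -> std::string {
    std::string unescaped_str;
    unescaped_str.reserve(str.size());
    bool escaped{false};
    for (auto const c : str) {
        if (escaped) {
            unescaped_str.push_back(c);
            escaped = false;
        } else if ('\\' == c) {
            escaped = true;
        } else {
            unescaped_str.push_back(c);
        }
    }
    return unescaped_str;
}

// Hypothetical check: the variable token as typed in the query vs. the value stored in the
// variable dictionary.
void check_query_example() {
    std::string_view const query_var{R"(\\a123)"};  // what the user types
    std::string_view const stored_var{R"(\a123)"};  // what the dictionary holds
    assert(unescape_string(query_var) == stored_var);
}
```

Raw string literals keep the backslash counts readable: `R"(\\a123)"` is the six characters the user types, while the dictionary holds the five-character value `\a123`.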

We expand testing in test-EncodedVariableInterpreter.cpp to cover this edge case, and enable the failing case from #590 in test-clp_s-search.cpp.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Expanded test in test-EncodedVariableInterpreter.cpp
  • Enabled test in test-clp_s-search.cpp

Summary by CodeRabbit

  • Bug Fixes

    • Searches and dictionary lookups now unescape input so queries match values containing escaped characters (e.g., backslashes); wildcard results now include entries like “\Abc123”.
  • New Features

    • Centralized unescape handling for query strings to ensure consistent matching of escaped content across search operations.
  • Tests

    • Re-enabled a previously skipped search case and added extensive tests covering escaped-character scenarios and unescape behaviour.

@gibber9809 gibber9809 requested review from a team and wraymo as code owners August 27, 2025 21:41
@coderabbitai

coderabbitai Bot commented Aug 27, 2025

Contributor

Walkthrough

Adds clp::string_utils::unescape_string and uses it to unescape variable/query strings before dictionary lookups in EncodedVariableInterpreter and QueryRunner; implements and tests unescaping, extends tests to include backslash-escaped values, and updates test data and expectations for an escaped search case.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| String utilities API: components/core/src/clp/string_utils/string_utils.hpp, components/core/src/clp/string_utils/string_utils.cpp | Declares and implements clp::string_utils::unescape_string(std::string_view) that removes backslash escapes (treats \x as x, drops lone trailing \). Adds required includes. |
| Dictionary lookup logic: components/core/src/clp/EncodedVariableInterpreter.hpp | Includes string utils and uses unescape_string for non-numeric variable strings before calling var_dict.get_entry_matching_value(...). |
| Search query processing: components/core/src/clp_s/search/QueryRunner.cpp | Replaces manual inlined unescape logic with clp::string_utils::unescape_string(query_string) prior to dictionary lookup. |
| Tests: EncodedVariableInterpreter: components/core/tests/test-EncodedVariableInterpreter.cpp | Adds escaped-value test coverage (backslash-containing string), introduces an escape_handler to build escaped inputs, and calls encode_and_search_dictionary with escaped inputs. |
| Tests: clp-s search and test data: components/core/tests/test-clp_s-search.cpp, components/core/tests/test_log_files/test_search.jsonl | Re-enables a previously skipped search case containing \\Abc123, updates expected result indices to include index 4, and changes the 4th test record field from skip_msg to msg. |
| Tests: string utilities: components/core/tests/test-string_utils.cpp | Exposes and tests unescape_string (simple cases and exhaustive char-range case); adds <limits> include. |
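The tests summarised above build escaped inputs and expect them to unescape back to the stored value. A minimal sketch of that round trip follows; `escape_for_query` is a hypothetical helper invented for this illustration (it is not the PR's escape_handler), and `unescape_string` is a standalone stand-in for the utility described here.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Stand-in for the utility under discussion: drops the backslash of every `\<char>` pair
// and drops a lone trailing backslash.
auto unescape_string(std::string_view str) -> std::string {
    std::string out;
    out.reserve(str.size());
    bool escaped{false};
    for (auto const c : str) {
        if (escaped || '\\' != c) {
            out.push_back(c);
            escaped = false;
        } else {
            escaped = true;
        }
    }
    return out;
}

// Hypothetical helper: prefixes backslashes and wildcard characters with `\` so that a
// stored value can be written as a literal query token.
auto escape_for_query(std::string_view value) -> std::string {
    std::string escaped;
    escaped.reserve(2 * value.size());
    for (auto const c : value) {
        if ('\\' == c || '*' == c || '?' == c) {
            escaped.push_back('\\');
        }
        escaped.push_back(c);
    }
    return escaped;
}

// Round trip: escaping a stored value and unescaping it again yields the original value.
void check_round_trip() {
    std::string_view const stored{R"(\Abc123)"};
    assert(escape_for_query(stored) == R"(\\Abc123)");
    assert(unescape_string(escape_for_query(stored)) == stored);
}
```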

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant QueryRunner
  participant StringUtils as clp::string_utils::unescape_string
  participant VarDict as VariableDictionary

  User->>QueryRunner: run query with VarString
  QueryRunner->>StringUtils: unescape(query_string)
  StringUtils-->>QueryRunner: unescaped_query
  QueryRunner->>VarDict: get_entry_matching_value(unescaped_query, ignore_case)
  alt no match
    VarDict-->>QueryRunner: []
    QueryRunner-->>User: no results for this term
  else matches
    VarDict-->>QueryRunner: entries
    QueryRunner-->>User: results (existing flow)
  end
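The flow in this diagram can be sketched in a few lines of standalone C++. Everything here is an assumption for illustration: the dictionary is a plain `std::vector<std::string>`, and `find_matching_entries` and `run_exact_query` are hypothetical names, not the QueryRunner or VariableDictionary API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for the shared unescape utility.
auto unescape_string(std::string_view str) -> std::string {
    std::string out;
    out.reserve(str.size());
    bool escaped{false};
    for (auto const c : str) {
        if (escaped || '\\' != c) {
            out.push_back(c);
            escaped = false;
        } else {
            escaped = true;
        }
    }
    return out;
}

// Hypothetical dictionary lookup: returns the indices of all entries equal to `value`.
auto find_matching_entries(
        std::vector<std::string> const& dict,
        std::string_view value,
        bool ignore_case
) -> std::vector<std::size_t> {
    auto const chars_equal = [ignore_case](char a, char b) -> bool {
        if (ignore_case) {
            return std::tolower(static_cast<unsigned char>(a))
                   == std::tolower(static_cast<unsigned char>(b));
        }
        return a == b;
    };
    std::vector<std::size_t> matches;
    for (std::size_t i{0}; i < dict.size(); ++i) {
        if (std::equal(dict[i].begin(), dict[i].end(), value.begin(), value.end(), chars_equal)) {
            matches.push_back(i);
        }
    }
    return matches;
}

// Unescape first, then look up; an empty result means "no results for this term".
auto run_exact_query(
        std::vector<std::string> const& dict,
        std::string_view query_string,
        bool ignore_case
) -> std::vector<std::size_t> {
    auto const unescaped_query = unescape_string(query_string);
    return find_matching_entries(dict, unescaped_query, ignore_case);
}
```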
sequenceDiagram
  autonumber
  participant Encoder as EncodedVariableInterpreter
  participant StringUtils as clp::string_utils::unescape_string
  participant VarDict as VariableDictionary

  Encoder->>Encoder: encode_and_search_dictionary(var_str)
  alt numeric (int/float)
    Encoder-->>Encoder: handle numeric path
  else non-numeric
    Encoder->>StringUtils: unescape(var_str)
    StringUtils-->>Encoder: unescaped_var
    Encoder->>VarDict: get_entry_matching_value(unescaped_var, ignore_case)
    alt no entries
      VarDict-->>Encoder: []
      Encoder-->>Encoder: return false
    else entries found
      VarDict-->>Encoder: entries
      Encoder-->>Encoder: add dict var and proceed
    end
  end
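The branch structure in this second diagram might look roughly like the sketch below. It is heavily simplified and rests on stated assumptions: the numeric path is reduced to a plain decimal-integer check, the dictionary is a `std::unordered_set`, and `LookupResult`, `try_parse_integer`, and `encode_and_search` are hypothetical names rather than the EncodedVariableInterpreter interface.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <unordered_set>

enum class LookupResult { EncodedInline, FoundInDictionary, NotFound };

// Stand-in for the shared unescape utility.
auto unescape_string(std::string_view str) -> std::string {
    std::string out;
    out.reserve(str.size());
    bool escaped{false};
    for (auto const c : str) {
        if (escaped || '\\' != c) {
            out.push_back(c);
            escaped = false;
        } else {
            escaped = true;
        }
    }
    return out;
}

// Simplified numeric path: only strings that are entirely a decimal integer qualify.
auto try_parse_integer(std::string_view s) -> std::optional<std::int64_t> {
    std::int64_t value{};
    auto const* end = s.data() + s.size();
    auto const [ptr, ec] = std::from_chars(s.data(), end, value);
    if (std::errc{} != ec || ptr != end) {
        return std::nullopt;
    }
    return value;
}

// Numeric variables skip the dictionary; everything else is unescaped and then looked up.
auto encode_and_search(std::string_view var_str, std::unordered_set<std::string> const& var_dict)
        -> LookupResult {
    if (try_parse_integer(var_str).has_value()) {
        return LookupResult::EncodedInline;
    }
    if (var_dict.count(unescape_string(var_str)) > 0) {
        return LookupResult::FoundInDictionary;
    }
    return LookupResult::NotFound;
}
```

Note that passing the raw stored form `\a123` as a query token does not match in this sketch: it unescapes to `a123`, which is the behaviour the description says users must account for by escaping the backslash.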

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • wraymo
  • haiqi96
  • kirkrodrigues

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c748d16 and 3770aaa.

📒 Files selected for processing (3)
  • components/core/src/clp/string_utils/string_utils.hpp (2 hunks)
  • components/core/src/clp_s/search/QueryRunner.cpp (1 hunks)
  • components/core/tests/test-clp_s-search.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.
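For readers unfamiliar with this convention, a tiny sketch of what the rule asks for is shown below. The predicate `ends_with_backslash` is hypothetical and exists only to show the comparison style.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical predicate, used only to illustrate the comparison style.
auto ends_with_backslash(std::string_view s) -> bool {
    // Written as `false == s.empty()` rather than `!s.empty()`, per the rule above.
    return false == s.empty() && '\\' == s.back();
}

void check_style_example() {
    assert(ends_with_backslash(R"(abc\)"));
    assert(false == ends_with_backslash("abc"));
}
```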

Files:

  • components/core/src/clp/string_utils/string_utils.hpp
  • components/core/src/clp_s/search/QueryRunner.cpp
  • components/core/tests/test-clp_s-search.cpp
🧠 Learnings (1)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
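A small sketch of the pattern this learning describes, under the assumption of C++17: a `constexpr std::string_view` wraps a string literal without allocating. The constant `cCharsNeedingEscape` and the function `needs_escaping` are illustrative names, not taken from ArchiveMetadata.hpp or string_utils.hpp.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical constant: a view over a string literal, which has static storage duration,
// so no allocation or copy takes place.
constexpr std::string_view cCharsNeedingEscape{"\\*?"};

// Usable at compile time because std::string_view::find is constexpr in C++17.
constexpr auto needs_escaping(char c) -> bool {
    return std::string_view::npos != cCharsNeedingEscape.find(c);
}

static_assert(needs_escaping('\\'));
static_assert(false == needs_escaping('a'));

void check_constants() {
    assert(3 == cCharsNeedingEscape.size());
    assert(needs_escaping('*'));
}
```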

Applied to files:

  • components/core/src/clp/string_utils/string_utils.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (3)
components/core/tests/test-clp_s-search.cpp (2)

206-206: Re-enabling the escaped-backslash case is correct.

This validates the bugfix for unescaping prior to dictionary lookup. Good addition.


209-209: Adjusted expectation includes idx 4 — matches the new semantics.

Wildcard query now correctly matches the entry containing “\Abc123”. Looks right.

components/core/src/clp_s/search/QueryRunner.cpp (1)

889-893: Good: centralised unescaping before exact dictionary lookup.

Replacing ad‑hoc logic with clp::string_utils::unescape_string is the right fix and improves consistency.

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and specifically describes the main change: unescaping variable strings before dictionary lookup in EncodedVariableInterpreter::encode_and_search_dictionary, which matches the PR diffs (string_utils addition, updates to the interpreter and related tests). It also follows the repo's conventional prefix ("fix(core):") and includes the issue reference (#590), so a reviewer scanning history will understand the purpose. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between dd91cf9 and 2af7920.

📒 Files selected for processing (7)
  • components/core/src/clp/EncodedVariableInterpreter.hpp (2 hunks)
  • components/core/src/clp/string_utils/string_utils.cpp (1 hunks)
  • components/core/src/clp/string_utils/string_utils.hpp (1 hunks)
  • components/core/src/clp_s/search/QueryRunner.cpp (1 hunks)
  • components/core/tests/test-EncodedVariableInterpreter.cpp (4 hunks)
  • components/core/tests/test-clp_s-search.cpp (1 hunks)
  • components/core/tests/test_log_files/test_search.jsonl (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp_s/search/QueryRunner.cpp
  • components/core/src/clp/string_utils/string_utils.hpp
  • components/core/tests/test-clp_s-search.cpp
  • components/core/src/clp/string_utils/string_utils.cpp
  • components/core/tests/test-EncodedVariableInterpreter.cpp
🧠 Learnings (1)
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/tests/test-EncodedVariableInterpreter.cpp
🧬 Code graph analysis (2)
components/core/src/clp/EncodedVariableInterpreter.hpp (1)
components/core/src/glt/EncodedVariableInterpreter.hpp (1)
  • var_str (160-166)
components/core/tests/test-EncodedVariableInterpreter.cpp (1)
components/core/src/clp/EncodedVariableInterpreter.hpp (11)
  • logtype (60-62)
  • logtype (60-60)
  • logtype (68-70)
  • logtype (68-68)
  • logtype (76-78)
  • logtype (76-76)
  • logtype (84-86)
  • logtype (84-84)
  • encode_and_search_dictionary (199-205)
  • encode_and_search_dictionary (440-486)
  • encode_and_search_dictionary (440-446)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (4)
components/core/tests/test_log_files/test_search.jsonl (1)

5-5: LGTM: include backslash case in test data.

The JSON string correctly uses a double backslash to represent a single literal '\' in the payload; this unblocks the corresponding test.

components/core/tests/test-clp_s-search.cpp (1)

205-205: Re-enabled and expanded assertions look correct.

  • The direct match for "Msg 4: \Abc123" and the wildcard expectation including index 4 align with the new unescaping flow.

Also applies to: 208-208

components/core/src/clp_s/search/QueryRunner.cpp (1)

888-893: Good centralization: use string_utils::unescape_string.

Replacing the ad hoc unescape with the shared utility reduces drift and keeps behaviour consistent with EncodedVariableInterpreter.

If you want, I can scan for remaining manual backslash-unescaping to consolidate on the utility.

components/core/src/clp/EncodedVariableInterpreter.hpp (1)

9-10: Include addition is appropriate.

Using the shared string utility here keeps interpretation in sync with query handling.

Comment thread components/core/src/clp/EncodedVariableInterpreter.hpp
Comment thread components/core/src/clp/string_utils/string_utils.cpp Outdated
Comment thread components/core/src/clp/string_utils/string_utils.hpp
Comment thread components/core/tests/test-EncodedVariableInterpreter.cpp
Comment thread components/core/tests/test-EncodedVariableInterpreter.cpp Outdated
Comment thread components/core/tests/test-EncodedVariableInterpreter.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2af7920 and f55c532.

📒 Files selected for processing (2)
  • components/core/tests/test-EncodedVariableInterpreter.cpp (4 hunks)
  • components/core/tests/test-string_utils.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/tests/test-string_utils.cpp
  • components/core/tests/test-EncodedVariableInterpreter.cpp
🧬 Code graph analysis (2)
components/core/tests/test-string_utils.cpp (2)
components/core/src/clp/string_utils/string_utils.cpp (2)
  • unescape_string (191-205)
  • unescape_string (191-191)
components/core/src/clp/string_utils/string_utils.hpp (1)
  • unescape_string (99-99)
components/core/tests/test-EncodedVariableInterpreter.cpp (1)
components/core/src/clp/EncodedVariableInterpreter.hpp (3)
  • encode_and_search_dictionary (199-205)
  • encode_and_search_dictionary (440-486)
  • encode_and_search_dictionary (440-446)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
🔇 Additional comments (5)
components/core/tests/test-string_utils.cpp (2)

17-17: Import looks good.

Bringing unescape_string into scope is appropriate for the new tests.


116-121: Good: locks down trailing backslash behaviour.

These assertions clearly encode the contract that a trailing '\' is dropped.
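The trailing-backslash contract can be illustrated with a few standalone assertions. This is a sketch against a local stand-in for `unescape_string`, not the actual Catch2 test from test-string_utils.cpp; `check_trailing_backslash` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Stand-in for the utility under test.
auto unescape_string(std::string_view str) -> std::string {
    std::string out;
    out.reserve(str.size());
    bool escaped{false};
    for (auto const c : str) {
        if (escaped || '\\' != c) {
            out.push_back(c);
            escaped = false;
        } else {
            escaped = true;
        }
    }
    // If `escaped` is still true here, the input ended in a lone backslash, which is dropped.
    return out;
}

void check_trailing_backslash() {
    assert(unescape_string(R"(abc\)") == "abc");       // lone trailing backslash is dropped
    assert(unescape_string(R"(\)").empty());           // a single backslash unescapes to ""
    assert(unescape_string(R"(abc\\)") == R"(abc\)");  // an escaped backslash survives
}
```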

components/core/tests/test-EncodedVariableInterpreter.cpp (3)

3-6: Includes are appropriate.

Adding the new headers, including <string_view>, matches the new usages below.


22-22: LGTM on std::string_view usage.

Keeps the test code efficient and consistent with the APIs.


447-448: Backslash test value is fine for dictionary insertion.

Storing the literal “\a1” in the dictionary is correct for this scenario.

Comment thread components/core/tests/test-EncodedVariableInterpreter.cpp
Comment thread components/core/tests/test-EncodedVariableInterpreter.cpp
Comment thread components/core/tests/test-string_utils.cpp Outdated
Comment thread components/core/tests/test-string_utils.cpp

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

♻️ Duplicate comments (1)
components/core/src/clp/string_utils/string_utils.hpp (1)

90-99: Clarify Doxygen wording and escape angle brackets to avoid HTML parsing.

Use &lt;/&gt; and explicitly state uniform handling (including wildcards) and that C-style escapes aren’t interpreted. Also tighten @param/@return.

-/**
- * Unescapes a string according to the following rules:
- * <ul>
- *   <li>Escape sequences `\<char>` are replaced by `<char>`</li>
- *   <li>Lone dangling `\` is removed from the end of the string</li>
- * </ul>
- * @param str
- * @return An unescaped version of `str`.
- */
+/**
+ * Unescapes a string with simple backslash removal:
+ * <ul>
+ *   <li>Escape sequences `\\&lt;char&gt;` are replaced by `&lt;char&gt;` (applies to all characters,
+ *       including '*' and '?'; no special-casing and no C-style decoding like `\\n` → newline).</li>
+ *   <li>A trailing lone backslash is removed.</li>
+ * </ul>
+ * @param str Input string
+ * @return Unescaped copy of `str`.
+ */
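To make the "no C-style decoding" point concrete, the sketch below shows what the documented rule implies for `\n` and for wildcard characters. It uses a local stand-in for `unescape_string`; `check_uniform_handling` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Stand-in for the documented behaviour: the backslash is removed and the next character
// is copied verbatim, whatever it is.
auto unescape_string(std::string_view str) -> std::string {
    std::string out;
    out.reserve(str.size());
    bool escaped{false};
    for (auto const c : str) {
        if (escaped || '\\' != c) {
            out.push_back(c);
            escaped = false;
        } else {
            escaped = true;
        }
    }
    return out;
}

void check_uniform_handling() {
    // `\n` becomes the letter 'n', not a newline character.
    assert(unescape_string(R"(\n)") == "n");
    assert(unescape_string(R"(\n)") != "\n");
    // Wildcard characters get no special treatment either.
    assert(unescape_string(R"(\*\?)") == "*?");
}
```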
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f55c532 and 00ab42b.

📒 Files selected for processing (1)
  • components/core/src/clp/string_utils/string_utils.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/string_utils/string_utils.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)

Comment thread components/core/src/clp/string_utils/string_utils.hpp Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

♻️ Duplicate comments (3)
components/core/src/clp/string_utils/string_utils.hpp (2)

7-7: Header now self-contained with std::string_view include — good catch.
This addresses the prior concern about relying on transitive includes.


91-101: API + docs look solid; add [[nodiscard]] and tighten 1st bullet (nit).
Function behaviour is clear and matches implementation. Marking the return as [[nodiscard]] prevents accidental discard; small wording tweak clarifies uniform handling.

 /**
  * Unescapes a string according to the following rules:
  * <ul>
- *   <li>Escape sequences `\<char>` are replaced by `<char>`</li>
+ *   <li>Escape sequences `\<char>` are replaced by `<char>` for all characters (no special-casing)</li>
  *   <li>Lone dangling `\` is removed from the end of the string</li>
  * </ul>
  * @param str
  * @return An unescaped version of `str`.
  */
-auto unescape_string(std::string_view str) -> std::string;
+[[nodiscard]] auto unescape_string(std::string_view str) -> std::string;
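A brief sketch of what the suggested `[[nodiscard]]` buys: since the function returns a new string and leaves its input untouched, ignoring the result is almost certainly a mistake, and compilers typically warn about it. The definition below is a local stand-in, and `check_nodiscard_usage` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Declaration as suggested in the review.
[[nodiscard]] auto unescape_string(std::string_view str) -> std::string;

// Stand-in definition for this illustration.
auto unescape_string(std::string_view str) -> std::string {
    std::string out;
    out.reserve(str.size());
    bool escaped{false};
    for (auto const c : str) {
        if (escaped || '\\' != c) {
            out.push_back(c);
            escaped = false;
        } else {
            escaped = true;
        }
    }
    return out;
}

void check_nodiscard_usage() {
    std::string const query{R"(\\a1)"};
    // unescape_string(query);  // discarding the result here would typically draw a warning
    auto const unescaped = unescape_string(query);
    assert(R"(\a1)" == unescaped);
    assert(R"(\\a1)" == query);  // the input is left unmodified
}
```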
components/core/src/clp/string_utils/string_utils.cpp (1)

193-207: Pre-reserve output to avoid reallocations (nit).
Behaviour aligns with docs (trailing '\' dropped). Reserving improves the common case.

 auto unescape_string(std::string_view str) -> std::string {
-    std::string unescaped_str;
+    std::string unescaped_str;
+    unescaped_str.reserve(str.size());
     bool escaped{false};
     for (auto const c : str) {
         if (escaped) {
             unescaped_str.push_back(c);
             escaped = false;
         } else if ('\\' == c) {
             escaped = true;
         } else {
             unescaped_str.push_back(c);
         }
     }
     return unescaped_str;
 }
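The reasoning behind the reserve suggestion is that the unescaped output can never be longer than the input, so `str.size()` is a safe upper bound. The sketch below checks that the capacity does not change during the loop; `unescape_and_report` is a hypothetical instrumented variant written for this illustration, not part of the PR.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical instrumented variant: returns the unescaped string together with a flag
// saying whether the capacity stayed the same after the initial reserve.
auto unescape_and_report(std::string_view str) -> std::pair<std::string, bool> {
    std::string out;
    out.reserve(str.size());
    auto const capacity_after_reserve = out.capacity();
    bool escaped{false};
    for (auto const c : str) {
        if (escaped || '\\' != c) {
            out.push_back(c);
            escaped = false;
        } else {
            escaped = true;
        }
    }
    bool const capacity_unchanged{capacity_after_reserve == out.capacity()};
    return {std::move(out), capacity_unchanged};
}

void check_reserve_bound() {
    std::string const input(1000, 'a');
    auto const [unescaped, capacity_unchanged] = unescape_and_report(input);
    assert(unescaped.size() <= input.size());
    assert(capacity_unchanged);
}
```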
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00ab42b and 617231d.

📒 Files selected for processing (2)
  • components/core/src/clp/string_utils/string_utils.cpp (2 hunks)
  • components/core/src/clp/string_utils/string_utils.hpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/string_utils/string_utils.cpp
  • components/core/src/clp/string_utils/string_utils.hpp
🧠 Learnings (1)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp/string_utils/string_utils.cpp
  • components/core/src/clp/string_utils/string_utils.hpp

Comment thread components/core/src/clp/string_utils/string_utils.cpp

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/core/tests/test-string_utils.cpp (1)

122-136: Reserve capacity to avoid reallocations in the exhaustive sweep.

This loop pushes 2N + N chars; reserving once avoids repeated growth on tight CI boxes.

     SECTION("Exhaustive string unescaping") {
         std::string unescaped_string;
         std::string escaped_string;
+        // Reserve once to prevent repeated reallocations: 256 unescaped, 512 escaped (for 8-bit char).
+        escaped_string.reserve(2u * (unsigned{std::numeric_limits<unsigned char>::max()} + 1u));
+        unescaped_string.reserve(unsigned{std::numeric_limits<unsigned char>::max()} + 1u);
         char c{std::numeric_limits<char>::min()};
         while (true) {
             escaped_string.push_back('\\');
             escaped_string.push_back(c);
             unescaped_string.push_back(c);
             if (c == std::numeric_limits<char>::max()) {
                 break;
             }
             ++c;
         }
         REQUIRE(unescaped_string == unescape_string(escaped_string));
     }
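The loop shape in this test is worth a note: a conventional `for (char c = min; c <= max; ++c)` would never terminate, because incrementing past the maximum `char` value either wraps or overflows. Checking for the maximum before incrementing avoids that. Below is a standalone sketch of the same sweep, with a local stand-in for `unescape_string` and a hypothetical `build_exhaustive_case` helper.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <string_view>
#include <utility>

// Stand-in for the utility under test.
auto unescape_string(std::string_view str) -> std::string {
    std::string out;
    out.reserve(str.size());
    bool escaped{false};
    for (auto const c : str) {
        if (escaped || '\\' != c) {
            out.push_back(c);
            escaped = false;
        } else {
            escaped = true;
        }
    }
    return out;
}

// Hypothetical helper: returns {escaped, expected}, where `escaped` holds `\<c>` for every
// possible char value and `expected` holds each value once.
auto build_exhaustive_case() -> std::pair<std::string, std::string> {
    std::string escaped;
    std::string expected;
    char c{std::numeric_limits<char>::min()};
    while (true) {
        escaped.push_back('\\');
        escaped.push_back(c);
        expected.push_back(c);
        if (std::numeric_limits<char>::max() == c) {
            break;  // stop before `++c` would step past the largest value
        }
        ++c;
    }
    return {escaped, expected};
}

void check_exhaustive_sweep() {
    auto const [escaped, expected] = build_exhaustive_case();
    assert(unescape_string(escaped) == expected);
}
```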
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 617231d and c748d16.

📒 Files selected for processing (1)
  • components/core/tests/test-string_utils.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/tests/test-string_utils.cpp
🧠 Learnings (1)
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/tests/test-string_utils.cpp
🧬 Code graph analysis (1)
components/core/tests/test-string_utils.cpp (2)
components/core/src/clp/string_utils/string_utils.hpp (1)
  • unescape_string (100-100)
components/core/src/clp/string_utils/string_utils.cpp (2)
  • unescape_string (193-207)
  • unescape_string (193-193)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
components/core/tests/test-string_utils.cpp (2)

2-2: Correct header for std::numeric_limits.

Including <limits> is the right fix; this resolves portability/compile issues across toolchains.


17-17: Local using for unescape_string is fine.

Keeps the tests concise without polluting broader scope.

Comment thread components/core/tests/test-string_utils.cpp
@LinZhihao-723 LinZhihao-723 self-requested a review September 11, 2025 21:26

@LinZhihao-723 LinZhihao-723 left a comment

Member

A small fix, otherwise lgtm.

Comment thread components/core/src/clp/string_utils/string_utils.hpp Outdated
gibber9809 and others added 2 commits September 11, 2025 21:09

@LinZhihao-723 LinZhihao-723 left a comment

Member

For the PR title, how about:

fix(core): Unescape variable strings before dictionary lookup in `EncodedVariableInterpreter::encode_and_search_dictionary`.

@gibber9809 changed the title from "fix(core): Make EncodedVariableInterpreter::encode_and_search_dictionary unescape variable strings before dictionary lookup. (fixes #590)" to "fix(core): Unescape variable strings before dictionary lookup in EncodedVariableInterpreter::encode_and_search_dictionary (fixes #590)." on Sep 12, 2025
@gibber9809 gibber9809 merged commit 12ef392 into y-scope:main Sep 12, 2025
27 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…odedVariableInterpreter::encode_and_search_dictionary` (fixes y-scope#590). (y-scope#1270)

Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

clp-s: Several issues searching for logs that contain escaped characters.

2 participants