
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#150

Open
tomerqodo wants to merge 4 commits into qodo_action_req_1_base_fixapi_refactors_the_sql_like_pattern_escaping_logic_to_use_a_centralized__utility_function_ensuring_consistent_and_secure_handling_of_special_characters_across_all_database_que
from qodo_action_req_1_head_fixapi_refactors_the_sql_like_pattern_escaping_logic_to_use_a_centralized__utility_function_ensuring_consistent_and_secure_handling_of_special_characters_across_all_database_que

Conversation

@tomerqodo

Benchmark PR from agentic-review-benchmarks#12

NeatGuyCoding and others added 4 commits January 25, 2026 12:05
…ralized

utility function, ensuring consistent and secure handling of special characters
across all database queries.

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
…logic

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
@qodo-code-review

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (3) 📎 Requirement gaps (0)



Action required

1. escape_like_pattern returns None 📘 Rule violation ✓ Correctness
Description
• escape_like_pattern is annotated to accept pattern: str and return str, but it returns
  pattern directly when falsy, and tests explicitly pass None expecting None back.
• This violates the requirement that Python functions have correct type annotations for parameters
  and return values, and will be flagged by type checking (basedpyright) as an incompatible
  argument/return type.
Code

api/libs/helper.py[R35-65]

+def escape_like_pattern(pattern: str) -> str:
+    """
+    Escape special characters in a string for safe use in SQL LIKE patterns.
+
+    This function escapes the special characters used in SQL LIKE patterns:
+    - Backslash (\\) -> \\
+    - Percent (%) -> \\%
+    - Underscore (_) -> \\_
+
+    The escaped pattern can then be safely used in SQL LIKE queries with the
+    ESCAPE '\\' clause to prevent SQL injection via LIKE wildcards.
+
+    Args:
+        pattern: The string pattern to escape
+
+    Returns:
+        Escaped string safe for use in SQL LIKE queries
+
+    Examples:
+        >>> escape_like_pattern("50% discount")
+        '50\\% discount'
+        >>> escape_like_pattern("test_data")
+        'test\\_data'
+        >>> escape_like_pattern("path\\to\\file")
+        'path\\\\to\\\\file'
+    """
+    if not pattern:
+        return pattern
+    # Escape backslash first, then percent and underscore
+    escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+    return escapedPattern
Evidence
The compliance rule requires complete and correct type annotations for all function parameters and
return types. The implementation returns pattern when falsy (including None), and the unit test
calls the function with None, contradicting the declared str -> str signature.

AGENTS.md
api/libs/helper.py[35-65]
api/tests/unit_tests/libs/test_helper.py[96-101]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`escape_like_pattern` is typed as `str -> str` but it currently returns `None` when passed `None` (and tests rely on that). This violates the project's strict type-annotation compliance and will fail type checking.

## Issue Context
The function contains an early return `if not pattern: return pattern`, and tests pass `None` to confirm that behavior.

## Fix Focus Areas
- api/libs/helper.py[35-65]
- api/tests/unit_tests/libs/test_helper.py[96-101]



2. notlike missing escape 📘 Rule violation ⛨ Security
Description
• In process_metadata_filter_func, the not contains branch escapes % and _ using backslashes
  but does not pass an escape character to notlike.
• Without escape="\\", sequences like \% may still be treated as a wildcard % by the database,
  undermining the intended input sanitization and potentially causing incorrect filtering behavior.
Code

api/core/rag/retrieval/dataset_retrieval.py[R1205-1207]

            case "not contains":
-                filters.append(json_field.notlike(f"%{value}%"))
+                escaped_value = escape_like_pattern(str(value))
+                filters.append(json_field.notlike(f"%{escaped_value}%"))
Evidence
The security-first input handling rule requires consistent sanitization to prevent injection issues
and incorrect behavior. The code escapes the pattern but omits the required escape argument for
the notlike operator, unlike adjacent like cases that explicitly provide escape="\\".

Rule 6: Generic: Security-First Input Validation and Data Handling
api/core/rag/retrieval/dataset_retrieval.py[1200-1215]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The `not contains` metadata filter escapes LIKE wildcards but does not specify the escape character on `notlike`, which can cause `\%` / `\_` to be treated as wildcards instead of literals.

## Issue Context
Other branches (`contains`, `start with`, `end with`) correctly pass `escape="\\"`.

## Fix Focus Areas
- api/core/rag/retrieval/dataset_retrieval.py[1205-1207]
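
To illustrate why the escape character has to be declared alongside the escaped pattern, here is a small self-contained sketch that uses the standard-library `sqlite3` module as a stand-in database. This is an assumption made for demonstration only: some backends treat backslash as the default LIKE escape character, in which case the omission is masked rather than fixed. The table and rows are hypothetical.

```python
import sqlite3


def escape_like_pattern(pattern: str) -> str:
    # Same backslash-based escaping as the helper under review.
    return pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (value TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [("100% done",), ("all done",)])

# "not contains %": the escaped pattern is %\%%
pattern = f"%{escape_like_pattern('%')}%"

# Without ESCAPE, SQLite reads the backslash as a literal character.
without_escape = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM docs WHERE value NOT LIKE ? ORDER BY value", (pattern,)
    )
]

# With ESCAPE '\', the sequence \% matches a literal percent sign.
with_escape = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM docs WHERE value NOT LIKE ? ESCAPE '\\' ORDER BY value",
        (pattern,),
    )
]

print(without_escape)  # the row containing % is wrongly kept
print(with_escape)     # only the row without % remains
```

In the SQLAlchemy code under review, the corresponding change would be to pass `escape="\\"` to `notlike`, mirroring the adjacent `like` branches.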



3. New pytest tests lack types 📘 Rule violation ✓ Correctness
Description
• Newly added pytest test functions introduce untyped parameters and omit return type annotations
  (e.g., db_session_with_containers, mock_external_service_dependencies).
• This violates the requirement that all Python function definitions include parameter and return
  type annotations using modern Python 3.12+ syntax.
Code

api/tests/test_containers_integration_tests/services/test_annotation_service.py[R447-449]

+    def test_get_annotation_list_by_app_id_with_special_characters_in_keyword(
+        self, db_session_with_containers, mock_external_service_dependencies
+    ):
Evidence
The compliance rule requires explicit type hints for all function parameters and return types. The
added tests define new functions without annotating fixture parameters or -> None return types.

AGENTS.md
api/tests/test_containers_integration_tests/services/test_annotation_service.py[447-449]
api/tests/test_containers_integration_tests/services/test_app_service.py[1012-1014]
api/tests/test_containers_integration_tests/services/test_tag_service.py[316-318]
api/tests/test_containers_integration_tests/services/test_workflow_app_service.py[319-321]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
New pytest test functions were added without type annotations for parameters and return values, violating the repo’s strict type-annotation requirements.

## Issue Context
Pytest fixtures can be annotated using concrete types when known (e.g., SQLAlchemy `Session`) or `Any` when not convenient.

## Fix Focus Areas
- api/tests/test_containers_integration_tests/services/test_annotation_service.py[447-449]
- api/tests/test_containers_integration_tests/services/test_app_service.py[1012-1014]
- api/tests/test_containers_integration_tests/services/test_tag_service.py[316-318]
- api/tests/test_containers_integration_tests/services/test_workflow_app_service.py[319-321]



4. IRIS ESCAPE char mismatch 🐞 Bug ✓ Correctness
Description
• escape_like_pattern() escapes using backslash (e.g. % → \%), but the IRIS fallback SQL
  declares ESCAPE '|'.
• This mismatch means %/_ may still behave as wildcards (or the query may instead look for literal
  backslashes), causing incorrect/no results for keywords containing special characters.
• Impact: broken/incorrect LIKE fallback search behavior for the IRIS vector datasource.
Code

api/core/rag/datasource/vdb/iris/iris_vector.py[R290-300]

+                # Escape special characters for LIKE clause to prevent SQL injection
+                from libs.helper import escape_like_pattern
+
+                escaped_query = escape_like_pattern(query)
+                query_pattern = f"%{escaped_query}%"
                sql = f"""
                    SELECT TOP {top_k} id, text, meta
                    FROM {self.schema}.{self.table_name}
-                    WHERE text LIKE ?
+                    WHERE text LIKE ? ESCAPE '|'
                """
                cursor.execute(sql, (query_pattern,))
Evidence
escape_like_pattern hardcodes backslash escaping and its doc explicitly says to use ESCAPE '\\'.
IRIS fallback uses ESCAPE '|', so the escaping performed by escape_like_pattern is not
interpreted by the database as intended.

api/libs/helper.py[35-65]
api/core/rag/datasource/vdb/iris/iris_vector.py[289-300]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
IRIS vector fallback uses `ESCAPE '|'` while the new shared helper escapes patterns with backslash. This causes incorrect matching for `%`, `_`, and `\\`.

### Issue Context
`escape_like_pattern` is documented and implemented for backslash escaping. IRIS SQL currently expects a different escape character.

### Fix Focus Areas
- api/core/rag/datasource/vdb/iris/iris_vector.py[289-300]
- api/libs/helper.py[35-65]

### Suggested approach
- Prefer standardizing on backslash everywhere:
  - Update IRIS SQL to: `WHERE text LIKE ? ESCAPE '\\\\'` (so the SQL literal contains `\\` and the effective escape char is `\`).
  - Keep `escape_like_pattern` as-is.
- Alternatively, if IRIS requires `|`:
  - Add an optional `escape_char` parameter to `escape_like_pattern` and escape `%/_/<escape_char>` with that char.
  - Call it with `escape_char='|'` and adjust accordingly.
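
The second option can be sketched as follows. The `escape_char` parameter is the hypothetical addition named in the suggestion, and `sqlite3` is used purely as a stand-in to show that the pattern and the `ESCAPE` clause must agree. How IRIS itself handles either escape character is not verified here.

```python
import sqlite3


def escape_like_pattern(pattern: str, escape_char: str = "\\") -> str:
    """Escape LIKE wildcards using a caller-chosen escape character."""
    if len(escape_char) != 1 or escape_char in "%_":
        raise ValueError("escape_char must be a single character other than % or _")
    if not pattern:
        return pattern
    # Double the escape character first, then prefix each wildcard with it.
    escaped = pattern.replace(escape_char, escape_char * 2)
    return escaped.replace("%", escape_char + "%").replace("_", escape_char + "_")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (text TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?)", [("50% off",), ("50 dollars off",)]
)
sql = "SELECT text FROM chunks WHERE text LIKE ? ESCAPE '|'"

# Pattern escaped with '|' agrees with the ESCAPE clause.
pipe_pattern = "%" + escape_like_pattern("50%", escape_char="|") + "%"
matched = conn.execute(sql, (pipe_pattern,)).fetchall()

# Pattern escaped with backslash does not: the backslash is read literally.
backslash_pattern = "%" + escape_like_pattern("50%") + "%"
mismatched = conn.execute(sql, (backslash_pattern,)).fetchall()

print(matched)
print(mismatched)
```

Keeping backslash as the default leaves existing callers untouched, so only the IRIS call site would need to pass `escape_char="|"`.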



5. Workflow keyword unicode regression 🐞 Bug ✓ Correctness
Description
• Workflow app log keyword search no longer applies unicode-escape normalization before building the
  LIKE pattern, and now only escapes LIKE wildcards.
• WorkflowRun inputs/outputs are persisted via json.dumps(...) without ensure_ascii=False in
  task code, which commonly stores non-ASCII characters as \uXXXX sequences in the text column.
• Impact: searching workflow logs by non-ASCII keywords (e.g., CJK) may stop matching the stored
  JSON content.
Code

api/services/workflow_app_service.py[R88-103]

        if keyword:
-            keyword_like_val = f"%{keyword[:30].encode('unicode_escape').decode('utf-8')}%".replace(r"\u", r"\\u")
+            from libs.helper import escape_like_pattern
+
+            # Escape special characters in keyword to prevent SQL injection via LIKE wildcards
+            keyword_trimmed = keyword[:30]
+            escaped_keyword = escape_like_pattern(keyword_trimmed)
+            keyword_like_val = f"%{escaped_keyword}%"
            keyword_conditions = [
-                WorkflowRun.inputs.ilike(keyword_like_val),
-                WorkflowRun.outputs.ilike(keyword_like_val),
+                WorkflowRun.inputs.ilike(keyword_like_val, escape="\\"),
+                WorkflowRun.outputs.ilike(keyword_like_val, escape="\\"),
                # filter keyword by end user session id if created by end user role
-                and_(WorkflowRun.created_by_role == "end_user", EndUser.session_id.ilike(keyword_like_val)),
+                and_(
+                    WorkflowRun.created_by_role == "end_user",
+                    EndUser.session_id.ilike(keyword_like_val, escape="\\"),
+                ),
            ]
Evidence
The search path now uses the raw keyword[:30] (only applying LIKE escaping). Meanwhile workflow
run persistence uses json.dumps(...) without ensure_ascii=False, unlike other workflow
serialization that explicitly opts into ensure_ascii=False, indicating workflow run JSON text may
not be stored in literal unicode form. Without normalization, LIKE searches for non-ASCII may not
match the persisted representation.

api/services/workflow_app_service.py[88-103]
api/tasks/workflow_execution_tasks.py[104-107]
api/models/workflow.py[527-533]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Workflow log keyword search removed unicode-escape normalization and now may fail to match workflow run inputs/outputs that are stored as JSON-escaped text.

### Issue Context
WorkflowRun inputs/outputs are persisted using `json.dumps(...)` without `ensure_ascii=False` in task code. This often results in `\\uXXXX` sequences for non-ASCII, while the user’s keyword is literal unicode.

### Fix Focus Areas
- api/services/workflow_app_service.py[88-103]
- api/tasks/workflow_execution_tasks.py[104-107]

### Suggested approach
- Before calling `escape_like_pattern`, transform the keyword into the same textual form used by JSON serialization, e.g.:
```py
keyword_trimmed = keyword[:30]
keyword_for_json_text = keyword_trimmed.encode("unicode_escape").decode("utf-8")
escaped_keyword = escape_like_pattern(keyword_for_json_text)
keyword_like_val = f"%{escaped_keyword}%"
```
- Keep `escape="\\"` on the `ilike()` calls.
- Add/extend tests to cover a non-ASCII keyword search scenario (e.g. Chinese) to prevent regressions.
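
The snippet in the suggested approach relies on `unicode_escape`, which agrees with `json.dumps` for CJK text but not for every character: `é`, for example, becomes `\xe9` under `unicode_escape` and `\u00e9` under `json.dumps`. A hedged alternative is to derive the search text from `json.dumps` itself. The helper name `keyword_as_json_text` is hypothetical, and the sketch assumes the stored column holds default `json.dumps` output (`ensure_ascii=True`).

```python
import json


def escape_like_pattern(pattern: str) -> str:
    # Same backslash-based escaping as the helper under review.
    return pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def keyword_as_json_text(keyword: str) -> str:
    """Render the keyword as json.dumps would store it, without the quotes."""
    return json.dumps(keyword)[1:-1]


# What the persisted text looks like with the default ensure_ascii=True.
stored = json.dumps({"query": "中文搜索"})

keyword = "中文"
raw_found = keyword in stored                               # literal text is absent
normalized_found = keyword_as_json_text(keyword) in stored  # escaped form is present

# Normalize first, then escape for LIKE so each backslash matches literally.
like_pattern = f"%{escape_like_pattern(keyword_as_json_text(keyword[:30]))}%"
print(raw_found, normalized_found, like_pattern)
```

The resulting pattern is still meant to be used with `escape="\\"` on the `ilike()` calls, as noted above.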





Comment thread api/libs/helper.py
Comment on lines +35 to +65
+def escape_like_pattern(pattern: str) -> str:
+    """
+    Escape special characters in a string for safe use in SQL LIKE patterns.
+
+    This function escapes the special characters used in SQL LIKE patterns:
+    - Backslash (\\) -> \\
+    - Percent (%) -> \\%
+    - Underscore (_) -> \\_
+
+    The escaped pattern can then be safely used in SQL LIKE queries with the
+    ESCAPE '\\' clause to prevent SQL injection via LIKE wildcards.
+
+    Args:
+        pattern: The string pattern to escape
+
+    Returns:
+        Escaped string safe for use in SQL LIKE queries
+
+    Examples:
+        >>> escape_like_pattern("50% discount")
+        '50\\% discount'
+        >>> escape_like_pattern("test_data")
+        'test\\_data'
+        >>> escape_like_pattern("path\\to\\file")
+        'path\\\\to\\\\file'
+    """
+    if not pattern:
+        return pattern
+    # Escape backslash first, then percent and underscore
+    escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+    return escapedPattern

Action required

1. escape_like_pattern returns None 📘 Rule violation ✓ Correctness

• escape_like_pattern is annotated to accept pattern: str and return str, but it returns
  pattern directly when falsy, and tests explicitly pass None expecting None back.
• This violates the requirement that Python functions have correct type annotations for parameters
  and return values, and will be flagged by type checking (basedpyright) as an incompatible
  argument/return type.
Agent prompt
## Issue description
`escape_like_pattern` is typed as `str -> str` but it currently returns `None` when passed `None` (and tests rely on that). This violates the project's strict type-annotation compliance and will fail type checking.

## Issue Context
The function contains an early return `if not pattern: return pattern`, and tests pass `None` to confirm that behavior.

## Fix Focus Areas
- api/libs/helper.py[35-65]
- api/tests/unit_tests/libs/test_helper.py[96-101]


Comment on lines 1205 to +1207
             case "not contains":
-                filters.append(json_field.notlike(f"%{value}%"))
+                escaped_value = escape_like_pattern(str(value))
+                filters.append(json_field.notlike(f"%{escaped_value}%"))

Action required

2. notlike missing escape 📘 Rule violation ⛨ Security

• In process_metadata_filter_func, the not contains branch escapes % and _ using backslashes
  but does not pass an escape character to notlike.
• Without escape="\\", sequences like \% may still be treated as a wildcard % by the database,
  undermining the intended input sanitization and potentially causing incorrect filtering behavior.
Agent prompt
## Issue description
The `not contains` metadata filter escapes LIKE wildcards but does not specify the escape character on `notlike`, which can cause `\%` / `\_` to be treated as wildcards instead of literals.

## Issue Context
Other branches (`contains`, `start with`, `end with`) correctly pass `escape="\\"`.

## Fix Focus Areas
- api/core/rag/retrieval/dataset_retrieval.py[1205-1207]


Comment on lines +447 to +449
+    def test_get_annotation_list_by_app_id_with_special_characters_in_keyword(
+        self, db_session_with_containers, mock_external_service_dependencies
+    ):

Action required

3. New pytest tests lack types 📘 Rule violation ✓ Correctness

• Newly added pytest test functions introduce untyped parameters and omit return type annotations
  (e.g., db_session_with_containers, mock_external_service_dependencies).
• This violates the requirement that all Python function definitions include parameter and return
  type annotations using modern Python 3.12+ syntax.
Agent prompt
## Issue description
New pytest test functions were added without type annotations for parameters and return values, violating the repo’s strict type-annotation requirements.

## Issue Context
Pytest fixtures can be annotated using concrete types when known (e.g., SQLAlchemy `Session`) or `Any` when not convenient.

## Fix Focus Areas
- api/tests/test_containers_integration_tests/services/test_annotation_service.py[447-449]
- api/tests/test_containers_integration_tests/services/test_app_service.py[1012-1014]
- api/tests/test_containers_integration_tests/services/test_tag_service.py[316-318]
- api/tests/test_containers_integration_tests/services/test_workflow_app_service.py[319-321]


Comment on lines +290 to 300
+                # Escape special characters for LIKE clause to prevent SQL injection
+                from libs.helper import escape_like_pattern
+
+                escaped_query = escape_like_pattern(query)
+                query_pattern = f"%{escaped_query}%"
                 sql = f"""
                     SELECT TOP {top_k} id, text, meta
                     FROM {self.schema}.{self.table_name}
-                    WHERE text LIKE ?
+                    WHERE text LIKE ? ESCAPE '|'
                 """
                 cursor.execute(sql, (query_pattern,))

Action required

4. IRIS ESCAPE char mismatch 🐞 Bug ✓ Correctness

• escape_like_pattern() escapes using backslash (e.g. % → \%), but the IRIS fallback SQL
  declares ESCAPE '|'.
• This mismatch means %/_ may still behave as wildcards (or the query may instead look for literal
  backslashes), causing incorrect/no results for keywords containing special characters.
• Impact: broken/incorrect LIKE fallback search behavior for the IRIS vector datasource.
Agent prompt
### Issue description
IRIS vector fallback uses `ESCAPE '|'` while the new shared helper escapes patterns with backslash. This causes incorrect matching for `%`, `_`, and `\\`.

### Issue Context
`escape_like_pattern` is documented and implemented for backslash escaping. IRIS SQL currently expects a different escape character.

### Fix Focus Areas
- api/core/rag/datasource/vdb/iris/iris_vector.py[289-300]
- api/libs/helper.py[35-65]

### Suggested approach
- Prefer standardizing on backslash everywhere:
  - Update IRIS SQL to: `WHERE text LIKE ? ESCAPE '\\\\'` (so the SQL literal contains `\\` and the effective escape char is `\`).
  - Keep `escape_like_pattern` as-is.
- Alternatively, if IRIS requires `|`:
  - Add an optional `escape_char` parameter to `escape_like_pattern` and escape `%/_/<escape_char>` with that char.
  - Call it with `escape_char='|'` and adjust accordingly.


Comment on lines 88 to 103
         if keyword:
-            keyword_like_val = f"%{keyword[:30].encode('unicode_escape').decode('utf-8')}%".replace(r"\u", r"\\u")
+            from libs.helper import escape_like_pattern
+
+            # Escape special characters in keyword to prevent SQL injection via LIKE wildcards
+            keyword_trimmed = keyword[:30]
+            escaped_keyword = escape_like_pattern(keyword_trimmed)
+            keyword_like_val = f"%{escaped_keyword}%"
             keyword_conditions = [
-                WorkflowRun.inputs.ilike(keyword_like_val),
-                WorkflowRun.outputs.ilike(keyword_like_val),
+                WorkflowRun.inputs.ilike(keyword_like_val, escape="\\"),
+                WorkflowRun.outputs.ilike(keyword_like_val, escape="\\"),
                 # filter keyword by end user session id if created by end user role
-                and_(WorkflowRun.created_by_role == "end_user", EndUser.session_id.ilike(keyword_like_val)),
+                and_(
+                    WorkflowRun.created_by_role == "end_user",
+                    EndUser.session_id.ilike(keyword_like_val, escape="\\"),
+                ),
             ]

Action required

5. Workflow keyword unicode regression 🐞 Bug ✓ Correctness

• Workflow app log keyword search no longer applies unicode-escape normalization before building the
  LIKE pattern, and now only escapes LIKE wildcards.
• WorkflowRun inputs/outputs are persisted via json.dumps(...) without ensure_ascii=False in
  task code, which commonly stores non-ASCII characters as \uXXXX sequences in the text column.
• Impact: searching workflow logs by non-ASCII keywords (e.g., CJK) may stop matching the stored
  JSON content.
Agent prompt
### Issue description
Workflow log keyword search removed unicode-escape normalization and now may fail to match workflow run inputs/outputs that are stored as JSON-escaped text.

### Issue Context
WorkflowRun inputs/outputs are persisted using `json.dumps(...)` without `ensure_ascii=False` in task code. This often results in `\\uXXXX` sequences for non-ASCII, while the user’s keyword is literal unicode.

### Fix Focus Areas
- api/services/workflow_app_service.py[88-103]
- api/tasks/workflow_execution_tasks.py[104-107]

### Suggested approach
- Before calling `escape_like_pattern`, transform the keyword into the same textual form used by JSON serialization, e.g.:
```py
keyword_trimmed = keyword[:30]
keyword_for_json_text = keyword_trimmed.encode("unicode_escape").decode("utf-8")
escaped_keyword = escape_like_pattern(keyword_for_json_text)
keyword_like_val = f"%{escaped_keyword}%"
```
- Keep `escape="\\"` on the `ilike()` calls.
- Add/extend tests to cover a non-ASCII keyword search scenario (e.g. Chinese) to prevent regressions.

