fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries. by tomerqodo · Pull Request #150 · qodo-benchmark/dify-combined-qodo

tomerqodo · 2026-01-29T23:01:41Z

Benchmark PR from agentic-review-benchmarks#12

…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>

…logic Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>

qodo-code-review · 2026-01-29T23:08:50Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (3) 📎 Requirement gaps (0)

1. escape_like_pattern returns None 📘 Rule violation ✓ Correctness

Description

• escape_like_pattern is annotated to accept pattern: str and return str, but it returns
  pattern directly when falsy, and tests explicitly pass None expecting None back.
• This violates the requirement that Python functions have correct type annotations for parameters
  and return values, and will be flagged by type checking (basedpyright) as an incompatible
  argument/return type.

Code

api/libs/helper.py[R35-65]

+def escape_like_pattern(pattern: str) -> str:
+    """
+    Escape special characters in a string for safe use in SQL LIKE patterns.
+
+    This function escapes the special characters used in SQL LIKE patterns:
+    - Backslash (\\) -> \\
+    - Percent (%) -> \\%
+    - Underscore (_) -> \\_
+
+    The escaped pattern can then be safely used in SQL LIKE queries with the
+    ESCAPE '\\' clause to prevent SQL injection via LIKE wildcards.
+
+    Args:
+        pattern: The string pattern to escape
+
+    Returns:
+        Escaped string safe for use in SQL LIKE queries
+
+    Examples:
+        >>> escape_like_pattern("50% discount")
+        '50\\% discount'
+        >>> escape_like_pattern("test_data")
+        'test\\_data'
+        >>> escape_like_pattern("path\\to\\file")
+        'path\\\\to\\\\file'
+    """
+    if not pattern:
+        return pattern
+    # Escape backslash first, then percent and underscore
+    escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+    return escapedPattern

Evidence

The compliance rule requires complete and correct type annotations for all function parameters and
return types. The implementation returns pattern when falsy (including None), and the unit test
calls the function with None, contradicting the declared str -> str signature.

AGENTS.md
api/libs/helper.py[35-65]
api/tests/unit_tests/libs/test_helper.py[96-101]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`escape_like_pattern` is typed as `str -&gt; str` but it currently returns `None` when passed `None` (and tests rely on that). This violates the project&#x27;s strict type-annotation compliance and will fail type checking.

## Issue Context
The function contains an early return `if not pattern: return pattern`, and tests pass `None` to confirm that behavior.

## Fix Focus Areas
- api/libs/helper.py[35-65]
- api/tests/unit_tests/libs/test_helper.py[96-101]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. notlike missing escape 📘 Rule violation ⛨ Security

Description

• In process_metadata_filter_func, the not contains branch escapes % and _ using backslashes
  but does not pass an escape character to notlike.
• Without escape="\\", sequences like \% may still be treated as a wildcard % by the database,
  undermining the intended input sanitization and potentially causing incorrect filtering behavior.

Code

api/core/rag/retrieval/dataset_retrieval.py[R1205-1207]

            case "not contains":
-                filters.append(json_field.notlike(f"%{value}%"))
+                escaped_value = escape_like_pattern(str(value))
+                filters.append(json_field.notlike(f"%{escaped_value}%"))

Evidence

The security-first input handling rule requires consistent sanitization to prevent injection issues
and incorrect behavior. The code escapes the pattern but omits the required escape argument for
the notlike operator, unlike adjacent like cases that explicitly provide escape="\\".

Rule 6: Generic: Security-First Input Validation and Data Handling
api/core/rag/retrieval/dataset_retrieval.py[1200-1215]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The `not contains` metadata filter escapes LIKE wildcards but does not specify the escape character on `notlike`, which can cause `\%` / `\_` to be treated as wildcards instead of literals.

## Issue Context
Other branches (`contains`, `start with`, `end with`) correctly pass `escape=&quot;\\&quot;`.

## Fix Focus Areas
- api/core/rag/retrieval/dataset_retrieval.py[1205-1207]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. New pytest tests lack types 📘 Rule violation ✓ Correctness

Description

• Newly added pytest test functions introduce untyped parameters and omit return type annotations
  (e.g., db_session_with_containers, mock_external_service_dependencies).
• This violates the requirement that all Python function definitions include parameter and return
  type annotations using modern Python 3.12+ syntax.

Code

api/tests/test_containers_integration_tests/services/test_annotation_service.py[R447-449]
+    def test_get_annotation_list_by_app_id_with_special_characters_in_keyword(
+        self, db_session_with_containers, mock_external_service_dependencies
+    ):

Evidence
The compliance rule requires explicit type hints for all function parameters and return types. The
added tests define new functions without annotating fixture parameters or -> None return types.
AGENTS.md
api/tests/test_containers_integration_tests/services/test_annotation_service.py[447-449]
api/tests/test_containers_integration_tests/services/test_app_service.py[1012-1014]
api/tests/test_containers_integration_tests/services/test_tag_service.py[316-318]
api/tests/test_containers_integration_tests/services/test_workflow_app_service.py[319-321]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
New pytest test functions were added without type annotations for parameters and return values, violating the repo’s strict type-annotation requirements.

## Issue Context
Pytest fixtures can be annotated using concrete types when known (e.g., SQLAlchemy `Session`) or `Any` when not convenient.

## Fix Focus Areas
- api/tests/test_containers_integration_tests/services/test_annotation_service.py[447-449]
- api/tests/test_containers_integration_tests/services/test_app_service.py[1012-1014]
- api/tests/test_containers_integration_tests/services/test_tag_service.py[316-318]
- api/tests/test_containers_integration_tests/services/test_workflow_app_service.py[319-321]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

View more (2)

4. IRIS ESCAPE char mismatch 🐞 Bug ✓ Correctness

Description

• escape_like_pattern() escapes using backslash (e.g. % → \%), but the IRIS fallback SQL
  declares ESCAPE '|'.
• This mismatch means %/_ may still behave as wildcards (or the query may instead look for literal
  backslashes), causing incorrect/no results for keywords containing special characters.
• Impact: broken/incorrect LIKE fallback search behavior for the IRIS vector datasource.

Code

api/core/rag/datasource/vdb/iris/iris_vector.py[R290-300]

+                # Escape special characters for LIKE clause to prevent SQL injection
+                from libs.helper import escape_like_pattern
+
+                escaped_query = escape_like_pattern(query)
+                query_pattern = f"%{escaped_query}%"
                sql = f"""
                    SELECT TOP {top_k} id, text, meta
                    FROM {self.schema}.{self.table_name}
-                    WHERE text LIKE ?
+                    WHERE text LIKE ? ESCAPE '|'
                """
                cursor.execute(sql, (query_pattern,))

Evidence

escape_like_pattern hardcodes backslash escaping and its doc explicitly says to use ESCAPE '\\'.
IRIS fallback uses ESCAPE '|', so the escaping performed by escape_like_pattern is not
interpreted by the database as intended.

api/libs/helper.py[35-65]
api/core/rag/datasource/vdb/iris/iris_vector.py[289-300]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
IRIS vector fallback uses `ESCAPE &#x27;|&#x27;` while the new shared helper escapes patterns with backslash. This causes incorrect matching for `%`, `_`, and `\\`.

### Issue Context
`escape_like_pattern` is documented and implemented for backslash escaping. IRIS SQL currently expects a different escape character.

### Fix Focus Areas
- api/core/rag/datasource/vdb/iris/iris_vector.py[289-300]
- api/libs/helper.py[35-65]

### Suggested approach
- Prefer standardizing on backslash everywhere:
 - Update IRIS SQL to: `WHERE text LIKE ? ESCAPE &#x27;\\\\&#x27;` (so the SQL literal contains `\\` and the effective escape char is `\`).
 - Keep `escape_like_pattern` as-is.
- Alternatively, if IRIS requires `|`:
 - Add an optional `escape_char` parameter to `escape_like_pattern` and escape `%/_/&lt;escape_char&gt;` with that char.
 - Call it with `escape_char=&#x27;|&#x27;` and adjust accordingly.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

5. Workflow keyword unicode regression 🐞 Bug ✓ Correctness

Description

• Workflow app log keyword search no longer applies unicode-escape normalization before building the
  LIKE pattern, and now only escapes LIKE wildcards.
• WorkflowRun inputs/outputs are persisted via json.dumps(...) without ensure_ascii=False in
  task code, which commonly stores non-ASCII characters as \uXXXX sequences in the text column.
• Impact: searching workflow logs by non-ASCII keywords (e.g., CJK) may stop matching the stored
  JSON content.

Code

api/services/workflow_app_service.py[R88-103]

        if keyword:
-            keyword_like_val = f"%{keyword[:30].encode('unicode_escape').decode('utf-8')}%".replace(r"\u", r"\\u")
+            from libs.helper import escape_like_pattern
+
+            # Escape special characters in keyword to prevent SQL injection via LIKE wildcards
+            keyword_trimmed = keyword[:30]
+            escaped_keyword = escape_like_pattern(keyword_trimmed)
+            keyword_like_val = f"%{escaped_keyword}%"
            keyword_conditions = [
-                WorkflowRun.inputs.ilike(keyword_like_val),
-                WorkflowRun.outputs.ilike(keyword_like_val),
+                WorkflowRun.inputs.ilike(keyword_like_val, escape="\\"),
+                WorkflowRun.outputs.ilike(keyword_like_val, escape="\\"),
                # filter keyword by end user session id if created by end user role
-                and_(WorkflowRun.created_by_role == "end_user", EndUser.session_id.ilike(keyword_like_val)),
+                and_(
+                    WorkflowRun.created_by_role == "end_user",
+                    EndUser.session_id.ilike(keyword_like_val, escape="\\"),
+                ),
            ]

Evidence

The search path now uses the raw keyword[:30] (only applying LIKE escaping). Meanwhile workflow
run persistence uses json.dumps(...) without ensure_ascii=False, unlike other workflow
serialization that explicitly opts into ensure_ascii=False, indicating workflow run JSON text may
not be stored in literal unicode form. Without normalization, LIKE searches for non-ASCII may not
match the persisted representation.

api/services/workflow_app_service.py[88-103]
api/tasks/workflow_execution_tasks.py[104-107]
api/models/workflow.py[527-533]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Workflow log keyword search removed unicode-escape normalization and now may fail to match workflow run inputs/outputs that are stored as JSON-escaped text.

### Issue Context
WorkflowRun inputs/outputs are persisted using `json.dumps(...)` without `ensure_ascii=False` in task code. This often results in `\\uXXXX` sequences for non-ASCII, while the user’s keyword is literal unicode.

### Fix Focus Areas
- api/services/workflow_app_service.py[88-103]
- api/tasks/workflow_execution_tasks.py[104-107]

### Suggested approach
- Before calling `escape_like_pattern`, transform the keyword into the same textual form used by JSON serialization, e.g.:
```py
keyword_trimmed = keyword[:30]
keyword_for_json_text = keyword_trimmed.encode(&quot;unicode_escape&quot;).decode(&quot;utf-8&quot;)
escaped_keyword = escape_like_pattern(keyword_for_json_text)
keyword_like_val = f&quot;%{escaped_keyword}%&quot;
```
- Keep `escape=&quot;\\&quot;` on the `ilike()` calls.
- Add/extend tests to cover a non-ASCII keyword search scenario (e.g. Chinese) to prevent regressions.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ⓘ The new review experience is currently in Beta. Learn more

qodo-code-review · 2026-01-29T23:08:50Z

+def escape_like_pattern(pattern: str) -> str:
+    """
+    Escape special characters in a string for safe use in SQL LIKE patterns.
+
+    This function escapes the special characters used in SQL LIKE patterns:
+    - Backslash (\\) -> \\
+    - Percent (%) -> \\%
+    - Underscore (_) -> \\_
+
+    The escaped pattern can then be safely used in SQL LIKE queries with the
+    ESCAPE '\\' clause to prevent SQL injection via LIKE wildcards.
+
+    Args:
+        pattern: The string pattern to escape
+
+    Returns:
+        Escaped string safe for use in SQL LIKE queries
+
+    Examples:
+        >>> escape_like_pattern("50% discount")
+        '50\\% discount'
+        >>> escape_like_pattern("test_data")
+        'test\\_data'
+        >>> escape_like_pattern("path\\to\\file")
+        'path\\\\to\\\\file'
+    """
+    if not pattern:
+        return pattern
+    # Escape backslash first, then percent and underscore
+    escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+    return escapedPattern


1. escape_like_pattern returns none 📘 Rule violation ✓ Correctness

• escape_like_pattern is annotated to accept pattern: str and return str, but it returns pattern directly when falsy, and tests explicitly pass None expecting None back. • This violates the requirement that Python functions have correct type annotations for parameters and return values, and will be flagged by type checking (basedpyright) as an incompatible argument/return type.

Agent prompt

## Issue description `escape_like_pattern` is typed as `str -> str` but it currently returns `None` when passed `None` (and tests rely on that). This violates the project's strict type-annotation compliance and will fail type checking. ## Issue Context The function contains an early return `if not pattern: return pattern`, and tests pass `None` to confirm that behavior. ## Fix Focus Areas - api/libs/helper.py[35-65] - api/tests/unit_tests/libs/test_helper.py[96-101]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-01-29T23:08:51Z

            case "not contains":
-                filters.append(json_field.notlike(f"%{value}%"))
+                escaped_value = escape_like_pattern(str(value))
+                filters.append(json_field.notlike(f"%{escaped_value}%"))


2. notlike missing escape 📘 Rule violation ⛨ Security

• In process_metadata_filter_func, the not contains branch escapes % and _ using backslashes but does not pass an escape character to notlike. • Without escape="\\", sequences like \% may still be treated as a wildcard % by the database, undermining the intended input sanitization and potentially causing incorrect filtering behavior.

Agent prompt

## Issue description The `not contains` metadata filter escapes LIKE wildcards but does not specify the escape character on `notlike`, which can cause `\%` / `\_` to be treated as wildcards instead of literals. ## Issue Context Other branches (`contains`, `start with`, `end with`) correctly pass `escape="\\"`. ## Fix Focus Areas - api/core/rag/retrieval/dataset_retrieval.py[1205-1207]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-01-29T23:08:51Z

+    def test_get_annotation_list_by_app_id_with_special_characters_in_keyword(
+        self, db_session_with_containers, mock_external_service_dependencies
+    ):


3. New pytest tests lack types 📘 Rule violation ✓ Correctness

• Newly added pytest test functions introduce untyped parameters and omit return type annotations (e.g., db_session_with_containers, mock_external_service_dependencies). • This violates the requirement that all Python function definitions include parameter and return type annotations using modern Python 3.12+ syntax.

Agent prompt

## Issue description New pytest test functions were added without type annotations for parameters and return values, violating the repo’s strict type-annotation requirements. ## Issue Context Pytest fixtures can be annotated using concrete types when known (e.g., SQLAlchemy `Session`) or `Any` when not convenient. ## Fix Focus Areas - api/tests/test_containers_integration_tests/services/test_annotation_service.py[447-449] - api/tests/test_containers_integration_tests/services/test_app_service.py[1012-1014] - api/tests/test_containers_integration_tests/services/test_tag_service.py[316-318] - api/tests/test_containers_integration_tests/services/test_workflow_app_service.py[319-321]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-01-29T23:08:51Z

+                # Escape special characters for LIKE clause to prevent SQL injection
+                from libs.helper import escape_like_pattern
+
+                escaped_query = escape_like_pattern(query)
+                query_pattern = f"%{escaped_query}%"
                sql = f"""
                    SELECT TOP {top_k} id, text, meta
                    FROM {self.schema}.{self.table_name}
-                    WHERE text LIKE ?
+                    WHERE text LIKE ? ESCAPE '|'
                """
                cursor.execute(sql, (query_pattern,))


4. Iris escape char mismatch 🐞 Bug ✓ Correctness

• escape_like_pattern() escapes using backslash (e.g. % → \%), but the IRIS fallback SQL declares ESCAPE '|'. • This mismatch means %/_ may still behave as wildcards (or the query may instead look for literal backslashes), causing incorrect/no results for keywords containing special characters. • Impact: broken/incorrect LIKE fallback search behavior for the IRIS vector datasource.

Agent prompt

### Issue description IRIS vector fallback uses `ESCAPE '|'` while the new shared helper escapes patterns with backslash. This causes incorrect matching for `%`, `_`, and `\\`. ### Issue Context `escape_like_pattern` is documented and implemented for backslash escaping. IRIS SQL currently expects a different escape character. ### Fix Focus Areas - api/core/rag/datasource/vdb/iris/iris_vector.py[289-300] - api/libs/helper.py[35-65] ### Suggested approach - Prefer standardizing on backslash everywhere: - Update IRIS SQL to: `WHERE text LIKE ? ESCAPE '\\\\'` (so the SQL literal contains `\\` and the effective escape char is `\`). - Keep `escape_like_pattern` as-is. - Alternatively, if IRIS requires `|`: - Add an optional `escape_char` parameter to `escape_like_pattern` and escape `%/_/<escape_char>` with that char. - Call it with `escape_char='|'` and adjust accordingly.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-01-29T23:08:51Z

        if keyword:
-            keyword_like_val = f"%{keyword[:30].encode('unicode_escape').decode('utf-8')}%".replace(r"\u", r"\\u")
+            from libs.helper import escape_like_pattern
+
+            # Escape special characters in keyword to prevent SQL injection via LIKE wildcards
+            keyword_trimmed = keyword[:30]
+            escaped_keyword = escape_like_pattern(keyword_trimmed)
+            keyword_like_val = f"%{escaped_keyword}%"
            keyword_conditions = [
-                WorkflowRun.inputs.ilike(keyword_like_val),
-                WorkflowRun.outputs.ilike(keyword_like_val),
+                WorkflowRun.inputs.ilike(keyword_like_val, escape="\\"),
+                WorkflowRun.outputs.ilike(keyword_like_val, escape="\\"),
                # filter keyword by end user session id if created by end user role
-                and_(WorkflowRun.created_by_role == "end_user", EndUser.session_id.ilike(keyword_like_val)),
+                and_(
+                    WorkflowRun.created_by_role == "end_user",
+                    EndUser.session_id.ilike(keyword_like_val, escape="\\"),
+                ),
            ]


5. Workflow keyword unicode regression 🐞 Bug ✓ Correctness

• Workflow app log keyword search no longer applies unicode-escape normalization before building the LIKE pattern, and now only escapes LIKE wildcards. • WorkflowRun inputs/outputs are persisted via json.dumps(...) without ensure_ascii=False in task code, which commonly stores non-ASCII characters as \uXXXX sequences in the text column. • Impact: searching workflow logs by non-ASCII keywords (e.g., CJK) may stop matching the stored JSON content.

Agent prompt

### Issue description Workflow log keyword search removed unicode-escape normalization and now may fail to match workflow run inputs/outputs that are stored as JSON-escaped text. ### Issue Context WorkflowRun inputs/outputs are persisted using `json.dumps(...)` without `ensure_ascii=False` in task code. This often results in `\\uXXXX` sequences for non-ASCII, while the user’s keyword is literal unicode. ### Fix Focus Areas - api/services/workflow_app_service.py[88-103] - api/tasks/workflow_execution_tasks.py[104-107] ### Suggested approach - Before calling `escape_like_pattern`, transform the keyword into the same textual form used by JSON serialization, e.g.: ```py keyword_trimmed = keyword[:30] keyword_for_json_text = keyword_trimmed.encode("unicode_escape").decode("utf-8") escaped_keyword = escape_like_pattern(keyword_for_json_text) keyword_like_val = f"%{escaped_keyword}%" ``` - Keep `escape="\\"` on the `ilike()` calls. - Add/extend tests to cover a non-ASCII keyword search scenario (e.g. Chinese) to prevent regressions.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

NeatGuyCoding and others added 4 commits January 25, 2026 12:05

fix(api): refactors the SQL LIKE pattern escaping logic to use a cent…

9fcc127

…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>

fix(api): fix the tests of refactoring the SQL LIKE pattern escaping …

98c38c2

…logic Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>

[autofix.ci] apply automated fixes

8dbf5d4

update pr

fe99d63

qodo-code-review Bot reviewed Jan 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#150

tomerqodo commented Jan 29, 2026

Uh oh!

qodo-code-review Bot commented Jan 29, 2026

Uh oh!

qodo-code-review Bot Jan 29, 2026

Uh oh!

qodo-code-review Bot Jan 29, 2026

Uh oh!

qodo-code-review Bot Jan 29, 2026

Uh oh!

qodo-code-review Bot Jan 29, 2026

Uh oh!

qodo-code-review Bot Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tomerqodo commented Jan 29, 2026

Uh oh!

qodo-code-review Bot commented Jan 29, 2026

Code Review by Qodo

Uh oh!

qodo-code-review Bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review Bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review Bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review Bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review Bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants