fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#150
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#150
Conversation
…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
…logic Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Code Review by Qodo
1. escape_like_pattern returns None
|
| def escape_like_pattern(pattern: str) -> str: | ||
| """ | ||
| Escape special characters in a string for safe use in SQL LIKE patterns. | ||
|
|
||
| This function escapes the special characters used in SQL LIKE patterns: | ||
| - Backslash (\\) -> \\ | ||
| - Percent (%) -> \\% | ||
| - Underscore (_) -> \\_ | ||
|
|
||
| The escaped pattern can then be safely used in SQL LIKE queries with the | ||
| ESCAPE '\\' clause to prevent SQL injection via LIKE wildcards. | ||
|
|
||
| Args: | ||
| pattern: The string pattern to escape | ||
|
|
||
| Returns: | ||
| Escaped string safe for use in SQL LIKE queries | ||
|
|
||
| Examples: | ||
| >>> escape_like_pattern("50% discount") | ||
| '50\\% discount' | ||
| >>> escape_like_pattern("test_data") | ||
| 'test\\_data' | ||
| >>> escape_like_pattern("path\\to\\file") | ||
| 'path\\\\to\\\\file' | ||
| """ | ||
| if not pattern: | ||
| return pattern | ||
| # Escape backslash first, then percent and underscore | ||
| escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") | ||
| return escapedPattern |
There was a problem hiding this comment.
1. escape_like_pattern returns none 📘 Rule violation ✓ Correctness
• escape_like_pattern is annotated to accept pattern: str and return str, but it returns pattern directly when falsy, and tests explicitly pass None expecting None back. • This violates the requirement that Python functions have correct type annotations for parameters and return values, and will be flagged by type checking (basedpyright) as an incompatible argument/return type.
Agent prompt
## Issue description
`escape_like_pattern` is typed as `str -> str` but it currently returns `None` when passed `None` (and tests rely on that). This violates the project's strict type-annotation compliance and will fail type checking.
## Issue Context
The function contains an early return `if not pattern: return pattern`, and tests pass `None` to confirm that behavior.
## Fix Focus Areas
- api/libs/helper.py[35-65]
- api/tests/unit_tests/libs/test_helper.py[96-101]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| case "not contains": | ||
| filters.append(json_field.notlike(f"%{value}%")) | ||
| escaped_value = escape_like_pattern(str(value)) | ||
| filters.append(json_field.notlike(f"%{escaped_value}%")) |
There was a problem hiding this comment.
2. notlike missing escape 📘 Rule violation ⛨ Security
• In process_metadata_filter_func, the not contains branch escapes % and _ using backslashes but does not pass an escape character to notlike. • Without escape="\\", sequences like \% may still be treated as a wildcard % by the database, undermining the intended input sanitization and potentially causing incorrect filtering behavior.
Agent prompt
## Issue description
The `not contains` metadata filter escapes LIKE wildcards but does not specify the escape character on `notlike`, which can cause `\%` / `\_` to be treated as wildcards instead of literals.
## Issue Context
Other branches (`contains`, `start with`, `end with`) correctly pass `escape="\\"`.
## Fix Focus Areas
- api/core/rag/retrieval/dataset_retrieval.py[1205-1207]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| def test_get_annotation_list_by_app_id_with_special_characters_in_keyword( | ||
| self, db_session_with_containers, mock_external_service_dependencies | ||
| ): |
There was a problem hiding this comment.
3. New pytest tests lack types 📘 Rule violation ✓ Correctness
• Newly added pytest test functions introduce untyped parameters and omit return type annotations (e.g., db_session_with_containers, mock_external_service_dependencies). • This violates the requirement that all Python function definitions include parameter and return type annotations using modern Python 3.12+ syntax.
Agent prompt
## Issue description
New pytest test functions were added without type annotations for parameters and return values, violating the repo’s strict type-annotation requirements.
## Issue Context
Pytest fixtures can be annotated using concrete types when known (e.g., SQLAlchemy `Session`) or `Any` when not convenient.
## Fix Focus Areas
- api/tests/test_containers_integration_tests/services/test_annotation_service.py[447-449]
- api/tests/test_containers_integration_tests/services/test_app_service.py[1012-1014]
- api/tests/test_containers_integration_tests/services/test_tag_service.py[316-318]
- api/tests/test_containers_integration_tests/services/test_workflow_app_service.py[319-321]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| # Escape special characters for LIKE clause to prevent SQL injection | ||
| from libs.helper import escape_like_pattern | ||
|
|
||
| escaped_query = escape_like_pattern(query) | ||
| query_pattern = f"%{escaped_query}%" | ||
| sql = f""" | ||
| SELECT TOP {top_k} id, text, meta | ||
| FROM {self.schema}.{self.table_name} | ||
| WHERE text LIKE ? | ||
| WHERE text LIKE ? ESCAPE '|' | ||
| """ | ||
| cursor.execute(sql, (query_pattern,)) |
There was a problem hiding this comment.
4. Iris escape char mismatch 🐞 Bug ✓ Correctness
• escape_like_pattern() escapes using backslash (e.g. % → \%), but the IRIS fallback SQL declares ESCAPE '|'. • This mismatch means %/_ may still behave as wildcards (or the query may instead look for literal backslashes), causing incorrect/no results for keywords containing special characters. • Impact: broken/incorrect LIKE fallback search behavior for the IRIS vector datasource.
Agent prompt
### Issue description
IRIS vector fallback uses `ESCAPE '|'` while the new shared helper escapes patterns with backslash. This causes incorrect matching for `%`, `_`, and `\\`.
### Issue Context
`escape_like_pattern` is documented and implemented for backslash escaping. IRIS SQL currently expects a different escape character.
### Fix Focus Areas
- api/core/rag/datasource/vdb/iris/iris_vector.py[289-300]
- api/libs/helper.py[35-65]
### Suggested approach
- Prefer standardizing on backslash everywhere:
- Update IRIS SQL to: `WHERE text LIKE ? ESCAPE '\\\\'` (so the SQL literal contains `\\` and the effective escape char is `\`).
- Keep `escape_like_pattern` as-is.
- Alternatively, if IRIS requires `|`:
- Add an optional `escape_char` parameter to `escape_like_pattern` and escape `%/_/<escape_char>` with that char.
- Call it with `escape_char='|'` and adjust accordingly.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| if keyword: | ||
| keyword_like_val = f"%{keyword[:30].encode('unicode_escape').decode('utf-8')}%".replace(r"\u", r"\\u") | ||
| from libs.helper import escape_like_pattern | ||
|
|
||
| # Escape special characters in keyword to prevent SQL injection via LIKE wildcards | ||
| keyword_trimmed = keyword[:30] | ||
| escaped_keyword = escape_like_pattern(keyword_trimmed) | ||
| keyword_like_val = f"%{escaped_keyword}%" | ||
| keyword_conditions = [ | ||
| WorkflowRun.inputs.ilike(keyword_like_val), | ||
| WorkflowRun.outputs.ilike(keyword_like_val), | ||
| WorkflowRun.inputs.ilike(keyword_like_val, escape="\\"), | ||
| WorkflowRun.outputs.ilike(keyword_like_val, escape="\\"), | ||
| # filter keyword by end user session id if created by end user role | ||
| and_(WorkflowRun.created_by_role == "end_user", EndUser.session_id.ilike(keyword_like_val)), | ||
| and_( | ||
| WorkflowRun.created_by_role == "end_user", | ||
| EndUser.session_id.ilike(keyword_like_val, escape="\\"), | ||
| ), | ||
| ] |
There was a problem hiding this comment.
5. Workflow keyword unicode regression 🐞 Bug ✓ Correctness
• Workflow app log keyword search no longer applies unicode-escape normalization before building the LIKE pattern, and now only escapes LIKE wildcards. • WorkflowRun inputs/outputs are persisted via json.dumps(...) without ensure_ascii=False in task code, which commonly stores non-ASCII characters as \uXXXX sequences in the text column. • Impact: searching workflow logs by non-ASCII keywords (e.g., CJK) may stop matching the stored JSON content.
Agent prompt
### Issue description
Workflow log keyword search removed unicode-escape normalization and now may fail to match workflow run inputs/outputs that are stored as JSON-escaped text.
### Issue Context
WorkflowRun inputs/outputs are persisted using `json.dumps(...)` without `ensure_ascii=False` in task code. This often results in `\\uXXXX` sequences for non-ASCII, while the user’s keyword is literal unicode.
### Fix Focus Areas
- api/services/workflow_app_service.py[88-103]
- api/tasks/workflow_execution_tasks.py[104-107]
### Suggested approach
- Before calling `escape_like_pattern`, transform the keyword into the same textual form used by JSON serialization, e.g.:
```py
keyword_trimmed = keyword[:30]
keyword_for_json_text = keyword_trimmed.encode("unicode_escape").decode("utf-8")
escaped_keyword = escape_like_pattern(keyword_for_json_text)
keyword_like_val = f"%{escaped_keyword}%"
```
- Keep `escape="\\"` on the `ilike()` calls.
- Add/extend tests to cover a non-ASCII keyword search scenario (e.g. Chinese) to prevent regressions.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Benchmark PR from agentic-review-benchmarks#12