fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#61
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#61
Conversation
…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
…logic Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This pull request refactors SQL LIKE pattern escaping across the Dify API codebase by introducing a centralized utility function escape_like_pattern() to prevent SQL injection via LIKE wildcards. The changes ensure that special characters (%, _, ) in user-provided search terms are properly escaped before being used in database queries.
Changes:
- Added centralized
escape_like_pattern()utility function inapi/libs/helper.pywith comprehensive unit tests - Updated 10+ service and controller files to use the new utility function for consistent SQL LIKE pattern escaping
- Added integration tests to verify special character handling in search functionality across multiple services
- Fixed circular dependency issues in test files by moving AppService imports to function scope
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| api/libs/helper.py | Added escape_like_pattern() utility function to centrally handle SQL LIKE pattern escaping |
| api/tests/unit_tests/libs/test_helper.py | Added comprehensive unit tests for the new escape function covering edge cases |
| api/services/workflow_app_service.py | Refactored keyword search to use centralized escaping with proper ESCAPE clause |
| api/services/tag_service.py | Updated tag name filtering to use escaped patterns |
| api/services/external_knowledge_service.py | Updated search functionality with pattern escaping |
| api/services/dataset_service.py | Updated dataset name and content search with escaping |
| api/services/conversation_service.py | Replaced inline escaping logic with centralized utility |
| api/services/app_service.py | Updated app name filtering with escaped patterns |
| api/services/annotation_service.py | Updated annotation search with pattern escaping |
| api/core/rag/retrieval/dataset_retrieval.py | Updated metadata filtering with escaped patterns |
| api/core/rag/datasource/vdb/iris/iris_vector.py | Updated full-text search with pattern escaping |
| api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py | Updated LIKE-based search with pattern escaping |
| api/controllers/console/datasets/datasets_segments.py | Updated segment search with escaped keyword patterns |
| api/controllers/console/app/conversation.py | Updated conversation search with pattern escaping |
| api/tests/test_containers_integration_tests/services/*.py | Added integration tests for special character handling and fixed circular dependencies |
Comments suppressed due to low confidence (8)
api/tests/test_containers_integration_tests/services/test_tag_service.py:512
- This import of module uuid is redundant, as it was previously imported on line 1.
import uuid
api/tests/test_containers_integration_tests/services/test_tag_service.py:791
- This import of module uuid is redundant, as it was previously imported on line 1.
import uuid
api/tests/test_containers_integration_tests/services/test_tag_service.py:882
- This import of module uuid is redundant, as it was previously imported on line 1.
import uuid
api/tests/test_containers_integration_tests/services/test_tag_service.py:954
- This import of module uuid is redundant, as it was previously imported on line 1.
import uuid
api/tests/test_containers_integration_tests/services/test_tag_service.py:1063
- This import of module uuid is redundant, as it was previously imported on line 1.
import uuid
api/tests/test_containers_integration_tests/services/test_tag_service.py:1198
- This import of module uuid is redundant, as it was previously imported on line 1.
import uuid
api/tests/test_containers_integration_tests/services/test_tag_service.py:1246
- This import of module uuid is redundant, as it was previously imported on line 1.
import uuid
api/tests/test_containers_integration_tests/services/test_tag_service.py:1270
- This import of module uuid is redundant, as it was previously imported on line 1.
import uuid
| escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") | ||
| return escapedPattern |
There was a problem hiding this comment.
Variable name 'escapedPattern' does not follow Python naming conventions. It should use snake_case instead of camelCase. Consider renaming to 'escaped_pattern' for consistency with Python style guidelines (PEP 8).
| escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") | |
| return escapedPattern | |
| escaped_pattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") | |
| return escaped_pattern |
| SELECT TOP {top_k} id, text, meta | ||
| FROM {self.schema}.{self.table_name} | ||
| WHERE text LIKE ? | ||
| WHERE text LIKE ? ESCAPE '|' |
There was a problem hiding this comment.
The ESCAPE clause is inconsistent with other usages in the codebase. Throughout the PR, the escape parameter uses "\" (e.g., escape="\\" for SQLAlchemy methods), but here it uses '|'. This inconsistency could lead to confusion and should use '\' for consistency with the escape_like_pattern function which escapes using backslash.
| WHERE text LIKE ? ESCAPE '|' | |
| WHERE text LIKE ? ESCAPE '\\' |
| from libs.helper import escape_like_pattern | ||
|
|
||
| escaped_query = escape_like_pattern(query).replace("'", "''") | ||
| filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE '\\\\'") |
There was a problem hiding this comment.
The ESCAPE clause syntax is inconsistent with other usages in the codebase. The string literal uses four backslashes ('\\') which may be appropriate for raw SQL, but this differs from the SQLAlchemy escape parameter pattern used elsewhere (e.g., escape="\\"). Consider documenting why this database requires a different escape syntax, or ensure this is the correct escape character for ClickZetta.
| filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE '\\\\'") | |
| # Use '\' as the LIKE escape character to match escape_like_pattern and other usages | |
| filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE '\\'") |
| case "not contains": | ||
| filters.append(json_field.notlike(f"%{value}%")) | ||
| escaped_value = escape_like_pattern(str(value)) | ||
| filters.append(json_field.notlike(f"%{escaped_value}%")) |
There was a problem hiding this comment.
The 'not contains' case is missing the escape parameter. While the value is properly escaped using escape_like_pattern, the notlike() call should include escape="\\" parameter for consistency with the 'contains' case on line 1203. Without this, the escape characters won't be interpreted correctly by the database.
| filters.append(json_field.notlike(f"%{escaped_value}%")) | |
| filters.append(json_field.notlike(f"%{escaped_value}%", escape="\\")) |
| case "end with": | ||
| filters.append(json_field.like(f"%{value}")) | ||
| escaped_value = escape_like_pattern(str(value)) | ||
| filters.append(json_field.like(f"%{escaped_value}", escape="\\")) |
There was a problem hiding this comment.
The 'end with' case is missing the escape parameter. While the value is properly escaped using escape_like_pattern, the like() call should include escape="\\" parameter for consistency with the 'contains' and 'start with' cases. Without this, the escape characters won't be interpreted correctly by the database.
| account, | ||
| ) | ||
|
|
||
| app_with_underscore = app_service.create_app( |
There was a problem hiding this comment.
Variable app_with_underscore is not used.
| app_with_underscore = app_service.create_app( | |
| app_service.create_app( |
| account, | ||
| ) | ||
|
|
||
| app_with_backslash = app_service.create_app( |
There was a problem hiding this comment.
Variable app_with_backslash is not used.
| app_with_backslash = app_service.create_app( | |
| app_service.create_app( |
| ) | ||
|
|
||
| # Create app that should NOT match | ||
| app_no_match = app_service.create_app( |
There was a problem hiding this comment.
Variable app_no_match is not used.
| app_no_match = app_service.create_app( | |
| app_service.create_app( |
| - SQL injection via LIKE wildcards is prevented | ||
| """ | ||
| # Arrange: Create test data | ||
| fake = Faker() |
There was a problem hiding this comment.
Variable fake is not used.
| fake = Faker() |
| - SQL injection via LIKE wildcards is prevented | ||
| """ | ||
| # Arrange: Create test data | ||
| fake = Faker() |
There was a problem hiding this comment.
Variable fake is not used.
| fake = Faker() |
Benchmark PR from qodo-benchmark#431