Refactor to use DBMS_HYBRID_SEARCH.SEARCH directly by aopstudio · Pull Request #148 · oceanbase/pyseekdb

aopstudio · 2026-01-28T03:17:33Z

Summary

Refactor to use DBMS_HYBRID_SEARCH.SEARCH directly

Solution Description

Replaces the two-step process of calling GET_SQL and executing the returned SQL with a direct call to DBMS_HYBRID_SEARCH.SEARCH. This simplifies the search logic and parses the JSON result directly, improving efficiency and maintainability.

Notes

This PR is for the AI coding activity and addresses issue #118.

Summary by CodeRabbit

Refactor
- Optimized search query execution workflow. The internal search processing has been streamlined to reduce overhead and improve performance. These changes are internal only; all existing functionality, public APIs, and backward compatibility remain completely unchanged.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Replaces the two-step process of calling GET_SQL and executing the returned SQL with a direct call to DBMS_HYBRID_SEARCH.SEARCH. This simplifies the search logic and parses the JSON result directly, improving efficiency and maintainability.

CLAassistant · 2026-01-28T03:17:40Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

coderabbitai · 2026-01-28T03:17:54Z

📝 Walkthrough

Walkthrough

This change refactors the hybrid search workflow in the client base by replacing a two-step pattern (DBMS_HYBRID_SEARCH.GET_SQL followed by SQL execution) with a direct DBMS_HYBRID_SEARCH.SEARCH call. The code now processes search results directly instead of generating and executing SQL, updating result parsing, error handling, and logging accordingly.

Changes

Cohort / File(s)	Summary
Hybrid Search Flow Refactoring `src/pyseekdb/client/client_base.py`	Replaced GET_SQL + execution path with direct SEARCH call; updated result parsing to handle stringified JSON/dict from SEARCH instead of executing returned SQL; modified error handling to warn on missing search_result; adjusted logging to reflect new query flow and result reception

Possibly related issues

[Enhancement]: simplify hybrid_search instead of calling get_sql and then execute sql #118: Both changes modify the hybrid search flow in client_base.py by replacing the GET_SQL + execute pattern with a direct DBMS_HYBRID_SEARCH.SEARCH call

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit's ode to streamlined paths:
Once SQL strings danced in the night,
Through GET_SQL gates, so roundabout tight—
Now SEARCH hops directly, swift and lean,
One call replaces the in-between!
The result flows pure, no strings to bind,
A simpler way, with cleaner mind. 🌙

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: replacing the two-step GET_SQL + execution flow with a direct DBMS_HYBRID_SEARCH.SEARCH call, which is the primary focus of the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@src/pyseekdb/client/client_base.py`:
- Line 2570: Replace the redundant f-string in the logging call so it isn't
using an f-string with no placeholders; change the logger.debug call that
currently reads logger.debug(f"Search result received from
DBMS_HYBRID_SEARCH.SEARCH") to use a plain string logger.debug("Search result
received from DBMS_HYBRID_SEARCH.SEARCH") in the same function where
logger.debug is called.
- Around line 2564-2569: The code reads search_result_json and passes it to
_transform_sql_result but doesn't validate its schema; if search_result_json is
a dict (e.g., a nested response with a 'hits' key from
DBMS_HYBRID_SEARCH.SEARCH) _transform_sql_result will iterate keys and corrupt
output. Update the parsing in the block that sets search_result (inspect
variable search_result_json) to detect dict-shaped responses: if it's a dict,
either extract the list of rows (e.g., search_result_json["hits"] or similar) or
pass the dict to _transform_search_result to obtain a list[dict[str, Any]]
before calling _transform_sql_result; ensure the final search_result is
validated as a list of dicts before further processing and raise or log a clear
error if the schema is unexpected.
- Around line 2549-2553: The SQL currently injects table_name directly into
search_query; instead build the query with a parameter placeholder and pass
table_name as a bound parameter to _execute_query_with_cursor to prevent SQL
injection: change the constructed search_query to use a %s placeholder for the
DBMS_HYBRID_SEARCH.SEARCH first argument (e.g. DBMS_HYBRID_SEARCH.SEARCH(%s,
`@search_parm`) ...) and call self._execute_query_with_cursor(conn, search_query,
[table_name], use_context_manager) so the table name from
CollectionNames.table_name() is bound instead of interpolated.

coderabbitai · 2026-01-28T03:21:53Z

src/pyseekdb/client/client_base.py

+        # Call DBMS_HYBRID_SEARCH.SEARCH directly instead of GET_SQL + SQL execution
+        search_query = f"SELECT DBMS_HYBRID_SEARCH.SEARCH('{table_name}', @search_parm) as search_result FROM dual"
+        logger.debug(f"Executing search query: {search_query}")

-        rows = self._execute_query_with_cursor(conn, get_sql_query, [], use_context_manager)
-
-        if not rows or not rows[0].get("query_sql"):
-            logger.warning("No SQL query returned from GET_SQL")
+        rows = self._execute_query_with_cursor(conn, search_query, [], use_context_manager)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find the method containing lines 2549-2553 rg -n "_collection_hybrid_search|def.*hybrid" src/pyseekdb/client/client_base.py -B 5 -A 30 | head -100

Repository: oceanbase/pyseekdb

Length of output: 1876

🏁 Script executed:

# Check the signature of _execute_query_with_cursor rg -n "_execute_query_with_cursor" src/pyseekdb/client/client_base.py -B 2 -A 10 | head -60

Repository: oceanbase/pyseekdb

Length of output: 2494

🏁 Script executed:

# Understand how table_name is derived in the hybrid search context rg -n "table_name.*=" src/pyseekdb/client/client_base.py | grep -A 2 -B 2 "2[45]"

Repository: oceanbase/pyseekdb

Length of output: 929

🏁 Script executed:

# Check what _transform_sql_result expects rg -n "_transform_sql_result" src/pyseekdb/client/client_base.py -B 2 -A 5

Repository: oceanbase/pyseekdb

Length of output: 770

🏁 Script executed:

# Find CollectionNames class and table_name methods rg -n "class CollectionNames|def table_name" src/pyseekdb -B 2 -A 8

Repository: oceanbase/pyseekdb

Length of output: 1576

🏁 Script executed:

# Look for other uses of DBMS_HYBRID_SEARCH in the codebase rg -n "DBMS_HYBRID_SEARCH" src/pyseekdb --context 3

Repository: oceanbase/pyseekdb

Length of output: 1876

🏁 Script executed:

# Check how parameters are used in other execute_query_with_cursor calls rg -n "_execute_query_with_cursor.*\[" src/pyseekdb/client/client_base.py | head -20

Repository: oceanbase/pyseekdb

Length of output: 331

🏁 Script executed:

# Look at lines 2549-2570 more carefully to understand the full context sed -n '2540,2580p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 1873

🏁 Script executed:

# Find escape_string function rg -n "def escape_string|escape_string" src/pyseekdb --context 2 | head -40

Repository: oceanbase/pyseekdb

Length of output: 2960

🏁 Script executed:

# Check how other SQL queries handle table_name interpolation rg -n "SELECT.*\`.*table" src/pyseekdb/client/client_base.py | head -10

Repository: oceanbase/pyseekdb

Length of output: 460

🏁 Script executed:

# Look for backtick usage around table_name in other queries rg -n "\`.*table_name" src/pyseekdb/client/client_base.py | head -15

Repository: oceanbase/pyseekdb

Length of output: 1795

🏁 Script executed:

# Check UPDATE statement with backticks (from earlier search showing line 1525) sed -n '1520,1530p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 450

Escape or parameterize the table_name in the DBMS_HYBRID_SEARCH.SEARCH call to prevent SQL injection.

The table_name is derived from user-supplied collection_name via CollectionNames.table_name() and is interpolated directly into the SQL string without escaping or parameterization. Although the preceding SET statement uses escape_string() for the search parameters, this call lacks protection. Use parameter binding with %s to align with the parameterized approach already supported by _execute_query_with_cursor.

Proposed fix

- search_query = f"SELECT DBMS_HYBRID_SEARCH.SEARCH('{table_name}', `@search_parm`) as search_result FROM dual" - logger.debug(f"Executing search query: {search_query}") - - rows = self._execute_query_with_cursor(conn, search_query, [], use_context_manager) + search_query = "SELECT DBMS_HYBRID_SEARCH.SEARCH(%s, `@search_parm`) as search_result FROM dual" + logger.debug(f"Executing search query: {search_query}") + + rows = self._execute_query_with_cursor(conn, search_query, [table_name], use_context_manager)

🧰 Tools

🪛 Ruff (0.14.14)

2550-2550: Possible SQL injection vector through string-based query construction

(S608)

🤖 Prompt for AI Agents

In `@src/pyseekdb/client/client_base.py` around lines 2549 - 2553, The SQL currently injects table_name directly into search_query; instead build the query with a parameter placeholder and pass table_name as a bound parameter to _execute_query_with_cursor to prevent SQL injection: change the constructed search_query to use a %s placeholder for the DBMS_HYBRID_SEARCH.SEARCH first argument (e.g. DBMS_HYBRID_SEARCH.SEARCH(%s, `@search_parm`) ...) and call self._execute_query_with_cursor(conn, search_query, [table_name], use_context_manager) so the table name from CollectionNames.table_name() is bound instead of interpolated.

coderabbitai · 2026-01-28T03:21:53Z

src/pyseekdb/client/client_base.py

+        # Parse the search result JSON
+        search_result_json = rows[0]["search_result"]
+        if isinstance(search_result_json, str):
+            search_result = json.loads(search_result_json)
+        else:
+            search_result = search_result_json


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file structure and the specific lines mentioned wc -l src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 101

🏁 Script executed:

# Read the context around lines 2564-2573 sed -n '2540,2590p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 2233

🏁 Script executed:

# Find the _transform_sql_result method definition rg -n "_transform_sql_result" src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 182

🏁 Script executed:

# Find the _transform_search_result method definition rg -n "_transform_search_result" src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 182

🏁 Script executed:

# Search for DBMS_HYBRID_SEARCH.SEARCH usage rg -n "DBMS_HYBRID_SEARCH\.SEARCH" src/pyseekdb/client/client_base.py -A 5 -B 5

Repository: oceanbase/pyseekdb

Length of output: 1494

🏁 Script executed:

# Read _transform_sql_result implementation sed -n '3007,3060p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 2030

🏁 Script executed:

# Read _transform_search_result implementation sed -n '3107,3170p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 2271

🏁 Script executed:

# Check type hints for both methods sed -n '3007,3010p' src/pyseekdb/client/client_base.py sed -n '3107,3110p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 539

Validate SEARCH response schema before transforming.

_transform_sql_result expects list[dict[str, Any]], but will silently corrupt output if passed a dict with nested structure (it would iterate over dict keys instead of rows). DBMS_HYBRID_SEARCH.SEARCH may return either a flat list or a nested dict with hits key, but the code provides no schema validation. Use the existing _transform_search_result method for dict responses or extract rows accordingly.

Suggested validation

- # Transform search result to standard format - return self._transform_sql_result(search_result, include) + # Transform search result to standard format + if isinstance(search_result, dict) and "hits" in search_result: + return self._transform_search_result(search_result, include) + if isinstance(search_result, dict) and "rows" in search_result: + search_result = search_result["rows"] + if not isinstance(search_result, list): + raise ValueError(f"Unexpected SEARCH result schema: {type(search_result)}") + return self._transform_sql_result(search_result, include)

🤖 Prompt for AI Agents

In `@src/pyseekdb/client/client_base.py` around lines 2564 - 2569, The code reads search_result_json and passes it to _transform_sql_result but doesn't validate its schema; if search_result_json is a dict (e.g., a nested response with a 'hits' key from DBMS_HYBRID_SEARCH.SEARCH) _transform_sql_result will iterate keys and corrupt output. Update the parsing in the block that sets search_result (inspect variable search_result_json) to detect dict-shaped responses: if it's a dict, either extract the list of rows (e.g., search_result_json["hits"] or similar) or pass the dict to _transform_search_result to obtain a list[dict[str, Any]] before calling _transform_sql_result; ensure the final search_result is validated as a list of dicts before further processing and raise or log a clear error if the schema is unexpected.

coderabbitai · 2026-01-28T03:21:53Z

src/pyseekdb/client/client_base.py

+            search_result = json.loads(search_result_json)
+        else:
+            search_result = search_result_json
+        logger.debug(f"Search result received from DBMS_HYBRID_SEARCH.SEARCH")


⚠️ Potential issue | 🟡 Minor

Remove redundant f-string.

f"..." without placeholders triggers F541.

🧹 Proposed fix

- logger.debug(f"Search result received from DBMS_HYBRID_SEARCH.SEARCH") + logger.debug("Search result received from DBMS_HYBRID_SEARCH.SEARCH")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

logger.debug(f"Search result received from DBMS_HYBRID_SEARCH.SEARCH")

logger.debug("Search result received from DBMS_HYBRID_SEARCH.SEARCH")

🧰 Tools

🪛 Ruff (0.14.14)

2570-2570: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents

In `@src/pyseekdb/client/client_base.py` at line 2570, Replace the redundant f-string in the logging call so it isn't using an f-string with no placeholders; change the logger.debug call that currently reads logger.debug(f"Search result received from DBMS_HYBRID_SEARCH.SEARCH") to use a plain string logger.debug("Search result received from DBMS_HYBRID_SEARCH.SEARCH") in the same function where logger.debug is called.

hnwyllmm · 2026-01-31T07:01:57Z

Please sign the CLA and fix the CI/quality first.
You can execute 'make pre-commit' to fix CI/quality.

frostming · 2026-01-31T07:57:47Z

src/pyseekdb/client/client_base.py

+            search_result = json.loads(search_result_json)
+        else:
+            search_result = search_result_json
+        logger.debug(f"Search result received from DBMS_HYBRID_SEARCH.SEARCH")


There is no placeholders in this f-string, please check if this is intentional

coderabbitai bot reviewed Jan 28, 2026

View reviewed changes

zhanghuidinah mentioned this pull request Jan 30, 2026

Developer Activities: Call for Participation! oceanbase/seekdb#123

Open

3 tasks

frostming reviewed Jan 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor to use DBMS_HYBRID_SEARCH.SEARCH directly#148

Refactor to use DBMS_HYBRID_SEARCH.SEARCH directly#148
aopstudio wants to merge 1 commit intooceanbase:developfrom
aopstudio:simple_hybrid_search

aopstudio commented Jan 28, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

CLAassistant commented Jan 28, 2026

Uh oh!

coderabbitai bot commented Jan 28, 2026 •

edited

Loading

Walkthrough

Changes

Possibly related issues

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jan 28, 2026

Uh oh!

coderabbitai bot Jan 28, 2026

Uh oh!

coderabbitai bot Jan 28, 2026

Uh oh!

hnwyllmm commented Jan 31, 2026

Uh oh!

frostming Jan 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	logger.debug(f"Search result received from DBMS_HYBRID_SEARCH.SEARCH")
	logger.debug("Search result received from DBMS_HYBRID_SEARCH.SEARCH")

Conversation

aopstudio commented Jan 28, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Solution Description

Notes

Summary by CodeRabbit

Uh oh!

CLAassistant commented Jan 28, 2026

Uh oh!

coderabbitai bot commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related issues

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

hnwyllmm commented Jan 31, 2026

Uh oh!

frostming Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

aopstudio commented Jan 28, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 28, 2026 •

edited

Loading