add upgrade pyseekdb case by cms-cms · Pull Request #152 · oceanbase/pyseekdb

cms-cms · 2026-01-28T08:45:20Z

Summary

Solution Description

Summary by CodeRabbit

Tests
- Added comprehensive V1-to-V2 upgrade test suite validating database migration between versions 1.0.0b7 and 1.0.0b8.
- Tests verify data persistence, collection isolation, and functionality across both legacy and new collections throughout the upgrade process.


coderabbitai · 2026-01-28T08:45:32Z

Walkthrough

This pull request introduces a comprehensive V1-to-V2 upgrade test suite for pyseekdb. The addition includes a two-phase upgrade testing infrastructure with Phase1 creating v1-era collection data and Phase2 validating post-upgrade behavior, shell script orchestration for isolated virtual environment testing, pytest integration, documentation, and gitignore configuration.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration & Documentation: `.gitignore`, `tests/v1_upgrade_v2/README.md`, `tests/v1_upgrade_v2/__init__.py` | Added gitignore entry for upgrade state file. Created comprehensive README documenting test purpose, execution flow, environment setup, and expected assertions for v1 and v2 collection isolation. Added package initializer. |
| Phase 1: V1 Data Creation: `tests/v1_upgrade_v2/phase1_create_v1_data.py` | Script creates v1-style collection with dimension 3, inserts test embeddings and documents, validates shapes and counts, and persists upgrade state JSON recording collection metadata and test data IDs for Phase2. Handles multiple `create_collection` signatures for backward compatibility with older pyseekdb versions. |
| Phase 2: V2 Upgrade Validation: `tests/v1_upgrade_v2/phase2_verify_after_upgrade.py` | Script validates comprehensive V1-to-V2 upgrade behavior: confirms v1 collection persistence, tests v1 CRUD operations, creates v2 collection with HNSW, validates v1/v2 isolation, verifies `list_collections` and `sdk_collections` behavior, and tests collection deletion scenarios. Includes extensive assertion-driven checks. |
| Test Orchestration & Integration: `tests/v1_upgrade_v2/run_upgrade_test.sh`, `tests/v1_upgrade_v2/test_v1_upgrade_v2.py` | Shell script orchestrates isolated venv setup, installs pyseekdb==1.0.0b7 for Phase1 execution, upgrades to 1.0.0b8 for Phase2, and captures exit statuses. Pytest test module invokes shell script with 600-second timeout and provides diagnostic output on failure. |
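
The Phase1-to-Phase2 handoff described in the table can be pictured as a small JSON state file. The following is only a sketch of that idea: the key names (`collection_name`, `dimension`, `ids`) and the helper names are assumptions made for illustration and are not taken from the PR's scripts.

```python
import json
import tempfile
from pathlib import Path


def write_state(state_path: Path, collection_name: str, dimension: int, ids: list[str]) -> None:
    """Phase1 side: record what was created so Phase2 can look it up later."""
    # Hypothetical schema; the real upgrade state file may use different keys.
    state = {"collection_name": collection_name, "dimension": dimension, "ids": ids}
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def load_state(state_path: Path) -> dict:
    """Phase2 side: read the state back with an explicit encoding."""
    with open(state_path, encoding="utf-8") as f:
        return json.load(f)


# Round trip through a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "upgrade_test_state.json"
    write_state(path, "upgrade_test_v1", 3, ["id1", "id2"])
    print(load_state(path))
```

Keeping the handoff in a plain file means the two phases can run in different interpreter processes with different pyseekdb versions installed, which is the point of the two-phase design.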

Sequence Diagram

sequenceDiagram
    participant Pytest as pytest
    participant Script as run_upgrade_test.sh
    participant Venv as Python Venv
    participant P1 as phase1_create_v1_data.py
    participant DB1 as pyseekdb v1.0.0b7
    participant P2 as phase2_verify_after_upgrade.py
    participant DB2 as pyseekdb v1.0.0b8

    Pytest->>Script: Execute upgrade test
    Script->>Venv: Create isolated venv
    Venv->>Venv: Install pyseekdb==1.0.0b7
    Script->>P1: Run Phase1 with V1
    P1->>DB1: create_collection(upgrade_test_v1)
    P1->>DB1: insert test data (embeddings, docs)
    P1->>DB1: validate v1 collection
    P1->>P1: Write upgrade_test_state.json
    P1-->>Script: Phase1 complete
    Script->>Venv: Upgrade pyseekdb to 1.0.0b8
    Script->>P2: Run Phase2 with V2
    P2->>DB2: Load state from JSON
    P2->>DB2: get_collection(v1_name)
    P2->>DB2: Verify v1 data persists
    P2->>DB2: create_collection(v2, HNSW)
    P2->>DB2: Validate v1/v2 isolation
    P2->>DB2: list_collections
    P2->>DB2: Test v1 and v2 deletion
    P2-->>Script: Phase2 complete
    Script-->>Pytest: PASSED/FAILED
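
For readers who prefer code to a diagram, the same flow can be sketched as an ordered list of commands. This is an illustrative Python rendering, not the contents of `run_upgrade_test.sh`; the function names are hypothetical and a POSIX venv layout (`bin/python`) is assumed.

```python
import subprocess
import sys
from pathlib import Path


def build_upgrade_steps(
    venv_dir: Path,
    script_dir: Path,
    old_version: str = "1.0.0b7",
    new_version: str = "1.0.0b8",
) -> list[list[str]]:
    """Return the commands for the two-phase upgrade flow, in execution order."""
    # POSIX layout assumed; on Windows the interpreter lives under Scripts\.
    venv_python = str(venv_dir / "bin" / "python")
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [venv_python, "-m", "pip", "install", f"pyseekdb=={old_version}"],
        [venv_python, str(script_dir / "phase1_create_v1_data.py")],
        [venv_python, "-m", "pip", "install", "--upgrade", f"pyseekdb=={new_version}"],
        [venv_python, str(script_dir / "phase2_verify_after_upgrade.py")],
    ]


def run_steps(steps: list[list[str]]) -> None:
    """Run each command and stop at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)


# Print the plan without executing it (running it needs network access).
for step in build_upgrade_steps(Path("/tmp/venv_upgrade"), Path("tests/v1_upgrade_v2")):
    print(" ".join(step))
```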

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A venv hops from V1 to V2 with glee,
Phase one plants data seeds, phase two bees,
Isolation blooms, collections persist,
The upgrade test dances—no bytes are missed! 🌱✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'add upgrade pyseekdb case' directly describes the main change: adding a comprehensive V1-to-V2 upgrade test suite for pyseekdb, including test scripts, documentation, and orchestration logic. |


CLAassistant · 2026-01-28T10:07:44Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ cms-cms
❌ chenminsi.cms

chenminsi.cms does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@tests/v1_upgrade_v2/phase2_verify_after_upgrade.py`:
- Around line 265-270: The test uses "assert False" to signal failure after
calling client.get_collection(V2_COLLECTION_NAME); replace that with an explicit
raise AssertionError(...) to avoid being skipped under Python -O; update the
block around client.get_collection(V2_COLLECTION_NAME) so that instead of
"assert False, f...)" you raise AssertionError with the same message, keeping
the surrounding try/except for ValueError intact so the expected exception
handling (and the message check that looks for "not found" or "does not exist")
still runs.

In `@tests/v1_upgrade_v2/README.md`:
- Line 41: Update the README entry to match the new, non-user-specific default
used by run_upgrade_test.sh: replace the hardcoded
"/home/chenminsi.cms/.venv_upgrade" text with the portable default path now set
in the script (or note that VENV_UPGRADE_DIR can override it), and ensure the
description references the VENV_UPGRADE_DIR environment variable and the default
database path `tests/seekdb.db` exactly as implemented in run_upgrade_test.sh so
the docs and script stay consistent.

In `@tests/v1_upgrade_v2/run_upgrade_test.sh`:
- Line 21: The default VENV_DIR is hardcoded to a user-specific absolute path;
change the fallback for VENV_DIR (the VENV_DIR variable and VENV_UPGRADE_DIR
usage) to a portable location such as a temp directory or a repo-local test
directory (e.g., derive from REPO_ROOT or use mktemp -d when VENV_UPGRADE_DIR is
unset) so other developers can run the test; also add cleanup at the end of the
script to remove the created venv directory (rm -rf "${VENV_DIR}") after the
test finishes.

🧹 Nitpick comments (7)

tests/v1_upgrade_v2/test_v1_upgrade_v2.py (1)
30-39: Consider capturing stdout/stderr for better failure diagnostics.

When the test fails, the assertion message instructs users to "run it manually for full output." Capturing the output would provide immediate diagnostics without requiring a manual re-run.
♻️ Suggested improvement
     proc = subprocess.run(
         ["bash", str(RUN_SCRIPT)],
         cwd=SCRIPT_DIR,
         env=env,
         timeout=600,
+        capture_output=True,
+        text=True,
     )
     assert proc.returncode == 0, (
         f"run_upgrade_test.sh exited with {proc.returncode}; "
-        "run it manually for full output: ./tests/v1_upgrade_v2/run_upgrade_test.sh"
+        "run it manually for full output: ./tests/v1_upgrade_v2/run_upgrade_test.sh\n"
+        f"stdout:\n{proc.stdout}\nstderr:\n{proc.stderr}"
     )
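
As a self-contained illustration of the suggestion above, the sketch below runs a child process with `capture_output=True` and puts its output into the failure message. The helper name `run_and_report` is hypothetical, and a tiny inline Python command stands in for `run_upgrade_test.sh`.

```python
import subprocess
import sys


def run_and_report(cmd: list[str], timeout: float = 600) -> subprocess.CompletedProcess:
    """Run a command and raise with its captured output if it exits non-zero."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise AssertionError(
            f"{cmd[0]} exited with {proc.returncode}\n"
            f"stdout:\n{proc.stdout}\nstderr:\n{proc.stderr}"
        )
    return proc


# Stand-in for the shell script: prints a diagnostic line, then fails.
failing_cmd = [sys.executable, "-c", "import sys; print('phase2 failed'); sys.exit(3)"]
try:
    run_and_report(failing_cmd)
except AssertionError as exc:
    print(exc)
```

One trade-off worth noting: captured output is only shown when the assertion fires, so a long-running script prints nothing while it is in progress.
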
tests/v1_upgrade_v2/phase1_create_v1_data.py (2)
14-14: Remove unused noqa directive.

The E402 rule (module level import not at top of file) is not enabled in the linter configuration, making this directive unnecessary.
♻️ Suggested fix
-import pyseekdb  # noqa: E402
+import pyseekdb
82-82: Remove extraneous f prefix from string without placeholders.
♻️ Suggested fix
-    print(f"\n✅ Verifying inserted data")
+    print("\n✅ Verifying inserted data")
tests/v1_upgrade_v2/phase2_verify_after_upgrade.py (4)
32-33: Specify explicit encoding when opening files.

The file is opened without an explicit encoding, which defaults to the platform-dependent locale encoding. For JSON files, explicitly specifying encoding='utf-8' ensures consistent behavior across different systems.
Suggested fix
-    with open(state_path) as f:
+    with open(state_path, encoding="utf-8") as f:
         return json.load(f)
207-212: Consider logging cleanup failures more visibly or re-raising after all attempts.

Catching bare Exception during cleanup is reasonable to ensure all collections are attempted, but swallowing exceptions with only a warning could mask underlying issues. Consider collecting failures and optionally re-raising after cleanup completes, or at minimum ensure the warning is visible in test output.
Alternative: collect and report failures
     # Clean up additional collections
     print(f"   Cleaning up additional v2 collections")
+    cleanup_failures = []
     for extra_coll in additional_v2_collections:
         try:
             client.delete_collection(extra_coll.name)
             print(f"      ✅ Deleted '{extra_coll.name}'")
         except Exception as e:
             print(f"      ⚠️  Failed to delete '{extra_coll.name}': {e}")
+            cleanup_failures.append((extra_coll.name, e))
+    if cleanup_failures:
+        print(f"   ⚠️  {len(cleanup_failures)} cleanup failure(s) occurred")
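
The collect-then-report pattern can be tried in isolation with a stand-in client. In this sketch `FakeClient` is a hypothetical test double that only models `delete_collection`; it is not the pyseekdb client, and `delete_all` is an illustrative helper name.

```python
class FakeClient:
    """Stand-in for the real client; only delete_collection is modelled."""

    def __init__(self, existing):
        self._existing = set(existing)

    def delete_collection(self, name):
        if name not in self._existing:
            raise ValueError(f"collection '{name}' not found")
        self._existing.remove(name)


def delete_all(client, names):
    """Attempt every deletion and return the (name, exception) pairs that failed."""
    failures = []
    for name in names:
        try:
            client.delete_collection(name)
        except Exception as exc:  # keep going so every collection is attempted
            failures.append((name, exc))
    return failures


client = FakeClient(["extra_a", "extra_b"])
failures = delete_all(client, ["extra_a", "missing", "extra_b"])
for name, exc in failures:
    print(f"Failed to delete '{name}': {exc}")
```

Returning the failures leaves the caller free to decide whether a cleanup problem should fail the test or only be reported.
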
58-58: Multiple f-strings without placeholders detected.

Static analysis flagged several f-strings that contain no placeholders (e.g., lines 58, 81, 109, 166, 197, 206, 215, 226, 272-278). These should be regular strings. This is a minor style issue but cleaning them up would satisfy the linter.

Example fix pattern:
-    print(f"\n✅ Verifying v1 collection data from Phase1")
+    print("\n✅ Verifying v1 collection data from Phase1")
Also applies to: 81-81, 109-109, 166-166, 197-197, 206-206, 215-215, 226-226, 272-278

198-198: Consider using list unpacking for concatenation.

Per Ruff RUF005, list unpacking is slightly more idiomatic than concatenation.
Suggested fix
-    all_v2_names = [V2_COLLECTION_NAME] + additional_v2_names
+    all_v2_names = [V2_COLLECTION_NAME, *additional_v2_names]

coderabbitai · 2026-01-28T10:10:47Z

tests/v1_upgrade_v2/phase2_verify_after_upgrade.py

+    try:
+        deleted_coll = client.get_collection(V2_COLLECTION_NAME)
+        assert False, f"get_collection('{V2_COLLECTION_NAME}') should raise ValueError after deletion, but returned {deleted_coll}"
+    except ValueError as e:
+        assert "not found" in str(e).lower() or "does not exist" in str(e).lower(), f"Expected 'not found' or 'does not exist' in error, got: {e}"
+        print(f"   ✅ get_collection('{V2_COLLECTION_NAME}') correctly raises ValueError: {e}")


⚠️ Potential issue | 🟡 Minor

Replace assert False with raise AssertionError().

When Python is run with optimization (-O flag), assert statements are removed entirely. Using assert False to indicate test failure can silently pass in optimized mode.

Suggested fix

     try:
         deleted_coll = client.get_collection(V2_COLLECTION_NAME)
-        assert False, f"get_collection('{V2_COLLECTION_NAME}') should raise ValueError after deletion, but returned {deleted_coll}"
+        raise AssertionError(f"get_collection('{V2_COLLECTION_NAME}') should raise ValueError after deletion, but returned {deleted_coll}")
     except ValueError as e:
         assert "not found" in str(e).lower() or "does not exist" in str(e).lower(), f"Expected 'not found' or 'does not exist' in error, got: {e}"
         print(f"   ✅ get_collection('{V2_COLLECTION_NAME}') correctly raises ValueError: {e}")

🧰 Tools

🪛 Ruff (0.14.14)

267-267: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)
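
The behaviour behind B011 can be observed directly by running the two forms in a child interpreter with and without `-O`. This is a small demonstration sketch; the helper name `exit_code` is made up for the example.

```python
import os
import subprocess
import sys


def exit_code(snippet: str, optimized: bool) -> int:
    """Run `snippet` in a child interpreter and return its exit status."""
    # Drop PYTHONOPTIMIZE so the non-optimized run really keeps its asserts.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    cmd = [sys.executable] + (["-O"] if optimized else []) + ["-c", snippet]
    return subprocess.run(cmd, capture_output=True, env=env).returncode


snippets = {
    "assert False": "assert False, 'should fail'",
    "raise AssertionError": "raise AssertionError('should fail')",
}
for label, snippet in snippets.items():
    print(f"{label}: normal={exit_code(snippet, False)} optimized={exit_code(snippet, True)}")
```

Under `-O` the `assert False` child exits with status 0 because the statement is compiled away, while the explicit `raise` still fails in both modes.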


coderabbitai · 2026-01-28T10:10:47Z

tests/v1_upgrade_v2/README.md

+./tests/v1_upgrade_v2/run_upgrade_test.sh
+```
+
+By default the script uses `tests/seekdb.db` as the database (same as integration_tests) and creates/uses the virtual env at `/home/chenminsi.cms/.venv_upgrade`. Override with `VENV_UPGRADE_DIR` if needed.


⚠️ Potential issue | 🟡 Minor

Update documentation when fixing the hardcoded path in the script.

This line references the hardcoded user-specific path. When the path in run_upgrade_test.sh is updated to a portable default, this documentation should be updated to match.


coderabbitai · 2026-01-28T10:10:47Z

tests/v1_upgrade_v2/run_upgrade_test.sh

+SEEKDB_PATH="${SEEKDB_PATH:-${TESTS_DIR}/seekdb.db}"
+SEEKDB_DATABASE="${SEEKDB_DATABASE:-test}"
+# Virtual env moved to home to avoid large venv under repo
+VENV_DIR="${VENV_UPGRADE_DIR:-/home/chenminsi.cms/.venv_upgrade}"


⚠️ Potential issue | 🟠 Major

Hardcoded user-specific path will break for other developers.

The default VENV_DIR uses an absolute path specific to one user's home directory. This will fail for anyone else running the test.

🔧 Suggested fix using a portable temp directory

-VENV_DIR="${VENV_UPGRADE_DIR:-/home/chenminsi.cms/.venv_upgrade}"
+VENV_DIR="${VENV_UPGRADE_DIR:-${TMPDIR:-/tmp}/.venv_upgrade_test_$$}"

Alternatively, place it under the repo's test directory:

-VENV_DIR="${VENV_UPGRADE_DIR:-/home/chenminsi.cms/.venv_upgrade}"
+VENV_DIR="${VENV_UPGRADE_DIR:-${SCRIPT_DIR}/.venv_upgrade}"

If using a temp directory, consider adding cleanup at the end:

# At end of script (after "RESULT: PASSED")
rm -rf "${VENV_DIR}"
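
The same "override or portable default, then clean up" logic can be sketched in Python. The function names here are hypothetical. One design choice in this sketch goes beyond the shell suggestion: it only deletes the directory when it created it, so a directory supplied through `VENV_UPGRADE_DIR` is never removed.

```python
import os
import shutil
import tempfile
from pathlib import Path


def resolve_venv_dir(environ=os.environ):
    """Return (venv_dir, created_by_us), honouring VENV_UPGRADE_DIR when set."""
    override = environ.get("VENV_UPGRADE_DIR")
    if override:  # unset or empty falls through, like ${VAR:-default} in shell
        return Path(override), False
    return Path(tempfile.mkdtemp(prefix="venv_upgrade_test_")), True


def cleanup_venv_dir(path, created_by_us):
    """Remove the directory only if this run created it."""
    if created_by_us:
        shutil.rmtree(path, ignore_errors=True)


venv_dir, owned = resolve_venv_dir({})
print(venv_dir, owned)
cleanup_venv_dir(venv_dir, owned)
```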



updatecase

add upgrade pyseekdb case

b37af65

Update upgrade test scripts and state

522d6d7

coderabbitai bot reviewed Jan 28, 2026


change cases

1930c39

updatecase

cms-cms wants to merge 3 commits into oceanbase:develop from cms-cms:develop
