Skip to content

add upgrade pyseekdb case#152

Open
cms-cms wants to merge 3 commits intooceanbase:developfrom
cms-cms:develop
Open

add upgrade pyseekdb case#152
cms-cms wants to merge 3 commits intooceanbase:developfrom
cms-cms:develop

Conversation

@cms-cms
Copy link

@cms-cms cms-cms commented Jan 28, 2026

Summary

Solution Description

Summary by CodeRabbit

  • Tests
    • Added comprehensive V1-to-V2 upgrade test suite validating database migration between versions 1.0.0b7 and 1.0.0b8.
    • Tests verify data persistence, collection isolation, and functionality across both legacy and new collections throughout the upgrade process.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a comprehensive V1-to-V2 upgrade test suite for pyseekdb. The addition includes a two-phase upgrade testing infrastructure with Phase1 creating v1-era collection data and Phase2 validating post-upgrade behavior, shell script orchestration for isolated virtual environment testing, pytest integration, documentation, and gitignore configuration.

Changes

Cohort / File(s) Summary
Configuration & Documentation
.gitignore, tests/v1_upgrade_v2/README.md, tests/v1_upgrade_v2/__init__.py
Added gitignore entry for upgrade state file. Created comprehensive README documenting test purpose, execution flow, environment setup, and expected assertions for v1 and v2 collection isolation. Added package initializer.
Phase 1: V1 Data Creation
tests/v1_upgrade_v2/phase1_create_v1_data.py
Script creates v1-style collection with dimension 3, inserts test embeddings and documents, validates shapes and counts, and persists upgrade state JSON recording collection metadata and test data IDs for Phase2. Handles multiple create_collection signatures for backward compatibility with older pyseekdb versions.
Phase 2: V2 Upgrade Validation
tests/v1_upgrade_v2/phase2_verify_after_upgrade.py
Script validates comprehensive V1-to-V2 upgrade behavior: confirms v1 collection persistence, tests v1 CRUD operations, creates v2 collection with HNSW, validates v1/v2 isolation, verifies list_collections and sdk_collections behavior, and tests collection deletion scenarios. Includes extensive assertion-driven checks.
Test Orchestration & Integration
tests/v1_upgrade_v2/run_upgrade_test.sh, tests/v1_upgrade_v2/test_v1_upgrade_v2.py
Shell script orchestrates isolated venv setup, installs pyseekdb==1.0.0b7 for Phase1 execution, upgrades to 1.0.0b8 for Phase2, and captures exit statuses. Pytest test module invokes shell script with 600-second timeout and provides diagnostic output on failure.

Sequence Diagram

sequenceDiagram
    participant Pytest as pytest
    participant Script as run_upgrade_test.sh
    participant Venv as Python Venv
    participant P1 as phase1_create_v1_data.py
    participant DB1 as pyseekdb v1.0.0b7
    participant P2 as phase2_verify_after_upgrade.py
    participant DB2 as pyseekdb v1.0.0b8

    Pytest->>Script: Execute upgrade test
    Script->>Venv: Create isolated venv
    Venv->>Venv: Install pyseekdb==1.0.0b7
    Script->>P1: Run Phase1 with V1
    P1->>DB1: create_collection(upgrade_test_v1)
    P1->>DB1: insert test data (embeddings, docs)
    P1->>DB1: validate v1 collection
    P1->>P1: Write upgrade_test_state.json
    P1-->>Script: Phase1 complete
    Script->>Venv: Upgrade pyseekdb to 1.0.0b8
    Script->>P2: Run Phase2 with V2
    P2->>DB2: Load state from JSON
    P2->>DB2: get_collection(v1_name)
    P2->>DB2: Verify v1 data persists
    P2->>DB2: create_collection(v2, HNSW)
    P2->>DB2: Validate v1/v2 isolation
    P2->>DB2: list_collections
    P2->>DB2: Test v1 and v2 deletion
    P2-->>Script: Phase2 complete
    Script-->>Pytest: PASSED/FAILED
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A venv hops from V1 to V2 with glee,
Phase one plants data seeds, phase two bees,
Isolation blooms, collections persist,
The upgrade test dances—no bytes are missed! 🌱✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'add upgrade pyseekdb case' directly describes the main change: adding a comprehensive V1-to-V2 upgrade test suite for pyseekdb, including test scripts, documentation, and orchestration logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@CLAassistant
Copy link

CLAassistant commented Jan 28, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ cms-cms
❌ chenminsi.cms


chenminsi.cms seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@tests/v1_upgrade_v2/phase2_verify_after_upgrade.py`:
- Around line 265-270: The test uses "assert False" to signal failure after
calling client.get_collection(V2_COLLECTION_NAME); replace that with an explicit
raise AssertionError(...) to avoid being skipped under Python -O; update the
block around client.get_collection(V2_COLLECTION_NAME) so that instead of
"assert False, f...)" you raise AssertionError with the same message, keeping
the surrounding try/except for ValueError intact so the expected exception
handling (and the message check that looks for "not found" or "does not exist")
still runs.

In `@tests/v1_upgrade_v2/README.md`:
- Line 41: Update the README entry to match the new, non-user-specific default
used by run_upgrade_test.sh: replace the hardcoded
"/home/chenminsi.cms/.venv_upgrade" text with the portable default path now set
in the script (or note that VENV_UPGRADE_DIR can override it), and ensure the
description references the VENV_UPGRADE_DIR environment variable and the default
database path `tests/seekdb.db` exactly as implemented in run_upgrade_test.sh so
the docs and script stay consistent.

In `@tests/v1_upgrade_v2/run_upgrade_test.sh`:
- Line 21: The default VENV_DIR is hardcoded to a user-specific absolute path;
change the fallback for VENV_DIR (the VENV_DIR variable and VENV_UPGRADE_DIR
usage) to a portable location such as a temp directory or a repo-local test
directory (e.g., derive from REPO_ROOT or use mktemp -d when VENV_UPGRADE_DIR is
unset) so other developers can run the test; also add cleanup at the end of the
script to remove the created venv directory (rm -rf "${VENV_DIR}") after the
test finishes.
🧹 Nitpick comments (7)
tests/v1_upgrade_v2/test_v1_upgrade_v2.py (1)

30-39: Consider capturing stdout/stderr for better failure diagnostics.

When the test fails, the assertion message instructs users to "run it manually for full output." Capturing the output would provide immediate diagnostics without requiring a manual re-run.

♻️ Suggested improvement
     proc = subprocess.run(
         ["bash", str(RUN_SCRIPT)],
         cwd=SCRIPT_DIR,
         env=env,
         timeout=600,
+        capture_output=True,
+        text=True,
     )
     assert proc.returncode == 0, (
         f"run_upgrade_test.sh exited with {proc.returncode}; "
-        "run it manually for full output: ./tests/v1_upgrade_v2/run_upgrade_test.sh"
+        f"run it manually for full output: ./tests/v1_upgrade_v2/run_upgrade_test.sh\n"
+        f"stdout:\n{proc.stdout}\nstderr:\n{proc.stderr}"
     )
tests/v1_upgrade_v2/phase1_create_v1_data.py (2)

14-14: Remove unused noqa directive.

The E402 rule (module level import not at top of file) is not enabled in the linter configuration, making this directive unnecessary.

♻️ Suggested fix
-import pyseekdb  # noqa: E402
+import pyseekdb

82-82: Remove extraneous f prefix from string without placeholders.

♻️ Suggested fix
-    print(f"\n✅ Verifying inserted data")
+    print("\n✅ Verifying inserted data")
tests/v1_upgrade_v2/phase2_verify_after_upgrade.py (4)

32-33: Specify explicit encoding when opening files.

The file is opened without an explicit encoding, which defaults to the platform-dependent locale encoding. For JSON files, explicitly specifying encoding='utf-8' ensures consistent behavior across different systems.

Suggested fix
-    with open(state_path) as f:
+    with open(state_path, encoding="utf-8") as f:
         return json.load(f)

207-212: Consider logging cleanup failures more visibly or re-raising after all attempts.

Catching bare Exception during cleanup is reasonable to ensure all collections are attempted, but swallowing exceptions with only a warning could mask underlying issues. Consider collecting failures and optionally re-raising after cleanup completes, or at minimum ensure the warning is visible in test output.

Alternative: collect and report failures
     # Clean up additional collections
     print(f"   Cleaning up additional v2 collections")
+    cleanup_failures = []
     for extra_coll in additional_v2_collections:
         try:
             client.delete_collection(extra_coll.name)
             print(f"      ✅ Deleted '{extra_coll.name}'")
         except Exception as e:
             print(f"      ⚠️  Failed to delete '{extra_coll.name}': {e}")
+            cleanup_failures.append((extra_coll.name, e))
+    if cleanup_failures:
+        print(f"   ⚠️  {len(cleanup_failures)} cleanup failure(s) occurred")

58-58: Multiple f-strings without placeholders detected.

Static analysis flagged several f-strings that contain no placeholders (e.g., lines 58, 81, 109, 166, 197, 206, 215, 226, 272-278). These should be regular strings. This is a minor style issue but cleaning them up would satisfy the linter.

Example fix pattern:

-    print(f"\n✅ Verifying v1 collection data from Phase1")
+    print("\n✅ Verifying v1 collection data from Phase1")

Also applies to: 81-81, 109-109, 166-166, 197-197, 206-206, 215-215, 226-226, 272-278


198-198: Consider using list unpacking for concatenation.

Per Ruff RUF005, list unpacking is slightly more idiomatic than concatenation.

Suggested fix
-    all_v2_names = [V2_COLLECTION_NAME] + additional_v2_names
+    all_v2_names = [V2_COLLECTION_NAME, *additional_v2_names]

Comment on lines +265 to +270
try:
deleted_coll = client.get_collection(V2_COLLECTION_NAME)
assert False, f"get_collection('{V2_COLLECTION_NAME}') should raise ValueError after deletion, but returned {deleted_coll}"
except ValueError as e:
assert "not found" in str(e).lower() or "does not exist" in str(e).lower(), f"Expected 'not found' or 'does not exist' in error, got: {e}"
print(f" ✅ get_collection('{V2_COLLECTION_NAME}') correctly raises ValueError: {e}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace assert False with raise AssertionError().

When Python is run with optimization (-O flag), assert statements are removed entirely. Using assert False to indicate test failure can silently pass in optimized mode.

Suggested fix
     try:
         deleted_coll = client.get_collection(V2_COLLECTION_NAME)
-        assert False, f"get_collection('{V2_COLLECTION_NAME}') should raise ValueError after deletion, but returned {deleted_coll}"
+        raise AssertionError(f"get_collection('{V2_COLLECTION_NAME}') should raise ValueError after deletion, but returned {deleted_coll}")
     except ValueError as e:
         assert "not found" in str(e).lower() or "does not exist" in str(e).lower(), f"Expected 'not found' or 'does not exist' in error, got: {e}"
         print(f"   ✅ get_collection('{V2_COLLECTION_NAME}') correctly raises ValueError: {e}")
🧰 Tools
🪛 Ruff (0.14.14)

267-267: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

🤖 Prompt for AI Agents
In `@tests/v1_upgrade_v2/phase2_verify_after_upgrade.py` around lines 265 - 270,
The test uses "assert False" to signal failure after calling
client.get_collection(V2_COLLECTION_NAME); replace that with an explicit raise
AssertionError(...) to avoid being skipped under Python -O; update the block
around client.get_collection(V2_COLLECTION_NAME) so that instead of "assert
False, f...)" you raise AssertionError with the same message, keeping the
surrounding try/except for ValueError intact so the expected exception handling
(and the message check that looks for "not found" or "does not exist") still
runs.

./tests/v1_upgrade_v2/run_upgrade_test.sh
```

By default the script uses `tests/seekdb.db` as the database (same as integration_tests) and creates/uses the virtual env at `/home/chenminsi.cms/.venv_upgrade`. Override with `VENV_UPGRADE_DIR` if needed.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update documentation when fixing the hardcoded path in the script.

This line references the hardcoded user-specific path. When the path in run_upgrade_test.sh is updated to a portable default, this documentation should be updated to match.

🤖 Prompt for AI Agents
In `@tests/v1_upgrade_v2/README.md` at line 41, Update the README entry to match
the new, non-user-specific default used by run_upgrade_test.sh: replace the
hardcoded "/home/chenminsi.cms/.venv_upgrade" text with the portable default
path now set in the script (or note that VENV_UPGRADE_DIR can override it), and
ensure the description references the VENV_UPGRADE_DIR environment variable and
the default database path `tests/seekdb.db` exactly as implemented in
run_upgrade_test.sh so the docs and script stay consistent.

SEEKDB_PATH="${SEEKDB_PATH:-${TESTS_DIR}/seekdb.db}"
SEEKDB_DATABASE="${SEEKDB_DATABASE:-test}"
# Virtual env moved to home to avoid large venv under repo
VENV_DIR="${VENV_UPGRADE_DIR:-/home/chenminsi.cms/.venv_upgrade}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Hardcoded user-specific path will break for other developers.

The default VENV_DIR uses an absolute path specific to one user's home directory. This will fail for anyone else running the test.

🔧 Suggested fix using a portable temp directory
-VENV_DIR="${VENV_UPGRADE_DIR:-/home/chenminsi.cms/.venv_upgrade}"
+VENV_DIR="${VENV_UPGRADE_DIR:-${TMPDIR:-/tmp}/.venv_upgrade_test_$$}"

Alternatively, place it under the repo's test directory:

-VENV_DIR="${VENV_UPGRADE_DIR:-/home/chenminsi.cms/.venv_upgrade}"
+VENV_DIR="${VENV_UPGRADE_DIR:-${SCRIPT_DIR}/.venv_upgrade}"

If using a temp directory, consider adding cleanup at the end:

# At end of script (after "RESULT: PASSED")
rm -rf "${VENV_DIR}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
VENV_DIR="${VENV_UPGRADE_DIR:-/home/chenminsi.cms/.venv_upgrade}"
VENV_DIR="${VENV_UPGRADE_DIR:-${TMPDIR:-/tmp}/.venv_upgrade_test_$$}"
Suggested change
VENV_DIR="${VENV_UPGRADE_DIR:-/home/chenminsi.cms/.venv_upgrade}"
VENV_DIR="${VENV_UPGRADE_DIR:-${SCRIPT_DIR}/.venv_upgrade}"
🤖 Prompt for AI Agents
In `@tests/v1_upgrade_v2/run_upgrade_test.sh` at line 21, The default VENV_DIR is
hardcoded to a user-specific absolute path; change the fallback for VENV_DIR
(the VENV_DIR variable and VENV_UPGRADE_DIR usage) to a portable location such
as a temp directory or a repo-local test directory (e.g., derive from REPO_ROOT
or use mktemp -d when VENV_UPGRADE_DIR is unset) so other developers can run the
test; also add cleanup at the end of the script to remove the created venv
directory (rm -rf "${VENV_DIR}") after the test finishes.

updatecase
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants