fix(phenoData): surreal-bridge SurrealDB v3 API migration#47

Merged
KooshaPari merged 3 commits into main from fix/surreal-v3-migration
May 2, 2026

Conversation

@KooshaPari
Owner

@KooshaPari KooshaPari commented May 2, 2026

User description

Summary

SurrealDB v3 removed surrealdb::sql::Thing, surrealdb::sql::Value, the SurrealValue trait, and the typed .create().content() API.

Changes

  • RecordId is now a String type alias ("table:id" format) instead of a struct with surrealdb::sql::Thing
  • All record/Value operations replaced with serde_json::Value
  • Used raw SQL via db.query() for CREATE ... CONTENT $data RETURN id
  • select() returns Vec<serde_json::Value> directly deserializable to domain structs
  • Added extract_record_id() helper handling v3's id-object format ({"tb": "...", "id": "..."})
  • Added tempfile dev-dependency for test
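
The `extract_record_id()` helper itself is not shown in this description. A minimal sketch of the normalization it is described as doing, assuming only the two id shapes mentioned above, might look like the following. `RawId` and `normalize_id` are hypothetical names, and the enum is a std-only stand-in for matching on a `serde_json::Value`:

```rust
/// Hypothetical stand-in for the two id shapes a response might carry.
/// The real helper is described as working on `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
enum RawId {
    /// Already in "table:id" form.
    Plain(String),
    /// Object form, e.g. {"tb": "skill", "id": "abc"}.
    Object { tb: String, id: String },
}

/// Normalize either shape into a "table:id" string.
fn normalize_id(raw: &RawId) -> String {
    match raw {
        RawId::Plain(s) => s.clone(),
        RawId::Object { tb, id } => format!("{tb}:{id}"),
    }
}
```

Under these assumptions, `RawId::Plain("skill:abc".into())` and `RawId::Object { tb: "skill".into(), id: "abc".into() }` normalize to the same string, so callers only ever see one representation.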

Test

cargo test --package surreal-bridge → 1 pass, 0 fail

COAUTHORED_BY: Claude Opus 4.7 noreply@anthropic.com


Note

Medium Risk
Updates persistence/query code to use SurrealDB v3’s raw SQL + serde_json::Value flow and changes record ID representation, which can affect data serialization/deserialization and query results. Scope is contained to surreal-bridge plus small config additions.

Overview
Migrates crates/surreal-bridge to SurrealDB v3 by replacing typed record APIs with raw db.query() CREATE ... CONTENT $data RETURN id and serde_json::Value-based (de)serialization for Skill/Embedding and vector search results.

Changes record IDs to a String ("table:id") with a new extract_record_id() helper, updates tests accordingly (adds tempfile dev-dependency), and adds repo hygiene configs (FUNDING.yml update and new trufflehog.yml).

Reviewed by Cursor Bugbot for commit 0a425f0. Bugbot is set up for automated code reviews on this repo. Configure here.


CodeAnt-AI Description

Move SurrealDB storage to the v3-compatible format

What Changed

  • Skill and embedding records now save and load through SurrealDB v3’s JSON-based flow, so the bridge keeps working with the newer database API
  • Record IDs are now returned as plain table:id strings, and existing ID shapes from v3 are handled when records are created
  • Skill and search results are converted from raw database output into app data before use, so queries still return usable records
  • Added repo funding and secret-scan config files, and updated the bridge test setup for the new database path format

Impact

✅ SurrealDB v3 compatibility
✅ Fewer record creation failures
✅ Working skill and embedding lookups


Phenotype Agent added 2 commits May 2, 2026 13:54
SurrealDB v3 removed `surrealdb::sql::Thing`, `surrealdb::sql::Value`,
the `SurrealValue` trait, and the typed `.create().content()` API.
- Replaced all record/Value operations with serde_json::Value
- `RecordId` is now a String alias ("table:id" format)
- Used raw SQL via `db.query()` for CREATE + RETURN id
- `select()` returns Vec<serde_json::Value> directly
- Added `extract_record_id()` helper for v3's id-object format
- Added tempfile dev-dependency for test

COAUTHORED_BY: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 2, 2026 21:17
@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@codeant-ai

codeant-ai Bot commented May 2, 2026

CodeAnt AI is reviewing your PR.


@coderabbitai

coderabbitai Bot commented May 2, 2026

Warning

Rate limit exceeded

@KooshaPari has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 53 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3d54c106-621b-4b90-ba9b-5a0232b88eaa

📥 Commits

Reviewing files that changed from the base of the PR and between 94394dc and 0a425f0.

📒 Files selected for processing (4)
  • FUNDING.yml
  • crates/surreal-bridge/Cargo.toml
  • crates/surreal-bridge/src/lib.rs
  • trufflehog.yml

@KooshaPari KooshaPari merged commit f141624 into main May 2, 2026
6 of 9 checks passed
@KooshaPari KooshaPari deleted the fix/surreal-v3-migration branch May 2, 2026 21:18
@codeant-ai codeant-ai Bot added the size:L This PR changes 100-499 lines, ignoring generated files label May 2, 2026
Comment on lines +81 to +92
"SELECT *, vector::distance::cosine(embedding, $query) AS score \
FROM embedding ORDER BY score ASC LIMIT $limit",
)
.bind(("query", serde_json::json!(query)))
.bind(("limit", limit))
.await?
.take(0)?;

Ok(results)

let scored: Vec<ScoredEmbedding> = results
.into_iter()
.filter_map(|r| serde_json::from_value(r).ok())
.collect();

🟠 Architect Review — HIGH

search_similar() queries the embedding field in its SQL, but persisted records are written from Embedding { vector: ... }, while ScoredEmbedding also expects an embedding field. Because of this schema mismatch, deserialization into ScoredEmbedding fails and filter_map(... .ok()) silently drops all rows, so similarity searches return empty or incomplete results in normal usage.

Suggestion: Align the write/read/search schema on a single vector field name across Embedding, the SQL query, and ScoredEmbedding, and treat decode failures as errors rather than silently omitting rows. Add an end-to-end test that stores an embedding then successfully retrieves it via search_similar().
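
The second half of that suggestion can be sketched with the standard library alone. Collecting an iterator of `Result`s into `Result<Vec<_>, _>` stops at the first failure, whereas `filter_map(|r| ... .ok())` discards it. `decode_row` below is a hypothetical stand-in for `serde_json::from_value`, `Row` stands in for `ScoredEmbedding`, and the "id=score" row format is invented purely for illustration:

```rust
/// Hypothetical decoded row; stands in for `ScoredEmbedding`.
#[derive(Debug, PartialEq)]
struct Row {
    id: String,
    score: f32,
}

/// Stand-in for `serde_json::from_value`: parses "id=score".
fn decode_row(raw: &str) -> Result<Row, String> {
    let (id, score) = raw
        .split_once('=')
        .ok_or_else(|| format!("missing '=' in {raw:?}"))?;
    let score = score
        .parse::<f32>()
        .map_err(|e| format!("bad score in {raw:?}: {e}"))?;
    Ok(Row { id: id.to_string(), score })
}

/// Lossy: rows that fail to decode vanish without a trace.
fn decode_lossy(raws: &[&str]) -> Vec<Row> {
    raws.iter().filter_map(|r| decode_row(r).ok()).collect()
}

/// Strict: the first decode failure is returned to the caller.
fn decode_strict(raws: &[&str]) -> Result<Vec<Row>, String> {
    raws.iter().map(|r| decode_row(r)).collect()
}
```

Given one malformed row among three, `decode_lossy` returns two rows and no signal that anything went wrong, while `decode_strict` returns an `Err` naming the offending input.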



@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

Reviewed by Cursor Bugbot for commit 0a425f0. Configure here.

.bind(("query", query))
let results: Vec<serde_json::Value> = self.db
.query(
"SELECT *, vector::distance::cosine(embedding, $query) AS score \

Field name mismatch causes empty search results

High Severity

The Embedding struct stores its vector data in a field called vector, but the SQL query in search_similar references a field called embedding (vector::distance::cosine(embedding, $query)), and the ScoredEmbedding struct expects a field called embedding rather than vector. The cosine distance computation operates on a non-existent field, and every result fails deserialization — but filter_map with .ok() silently swallows all errors, so search_similar always returns an empty Vec instead of surfacing the failure.

let skills: Vec<Skill> = records
.into_iter()
.filter_map(|r| serde_json::from_value(r).ok())
.collect();

Silent deserialization drops mask data retrieval failures

Medium Severity

Both query_skills and search_similar use filter_map with .ok() to silently discard any records that fail to deserialize. If SurrealDB v3 returns the id field in object format ({"tb": "...", "id": "..."}) — the very format extract_record_id was added to handle — deserializing it into Option<String> will fail, and every record will be silently dropped, returning an empty Vec with no error.

use surrealdb::engine::local::{Db, RocksDb};
use surrealdb::Surreal;

pub type RecordId = String;

Suggestion: Using a plain String alias for record IDs is incompatible with SurrealDB v3 responses that can emit structured ID objects ({"tb": "...", "id": ...}), so direct struct deserialization will fail for those records. Replace this alias with a custom ID type/deserializer that accepts both string and object ID formats. [type error]

Severity Level: Critical 🚨
- ❌ RecordId = String mismatches SurrealDB v3 ID objects.
- ⚠️ Skills and embeddings fail deserialization when IDs are objects.
Steps of Reproduction ✅
1. Note that SurrealDB v3 can return record IDs in an object form; the helper
`extract_record_id` defined at `crates/surreal-bridge/src/lib.rs:97-116` explicitly
documents and handles the case where a response is `{"id": {"tb": "skill", "id": "..."}}`,
converting that SurrealDB ID object into a `"table:id"` string.

2. For read operations, `PhenoSurreal::query_skills` at
`crates/surreal-bridge/src/lib.rs:50-56` calls `self.db.select("skill")`, which returns
JSON rows including an `"id"` field representing the record ID; in v3, this `"id"` may
legitimately be an object (e.g., `{"tb": "skill", "id": "..."}`) rather than a plain
`"skill:..."` string.

3. The `Skill` struct at `crates/surreal-bridge/src/lib.rs:119-129` declares `pub id:
Option<RecordId>` with `RecordId` aliased to `String` at
`crates/surreal-bridge/src/lib.rs:15`, so `serde_json::from_value::<Skill>` used in
`query_skills` (lines 52-55) expects the JSON `"id"` field to be a string; when SurrealDB
returns the documented object form, deserialization fails with a type error because a map
is provided where a string is expected.

4. Because `query_skills` wraps `serde_json::from_value` in `.ok()` and `filter_map`
(crates/surreal-bridge/src/lib.rs:52-55), any row whose `"id"` is an object is silently
dropped, meaning that callers of `query_skills` (and any future reads using `Embedding` or
`ScoredEmbedding` IDs) lose access to valid records solely because `RecordId` is a plain
`String` alias instead of a custom type or deserializer capable of accepting both string
and structured ID-object formats used by SurrealDB v3.
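
One possible shape for such an ID type, sketched with the standard library only, is a newtype that validates the `"table:id"` form when parsed and can also be built from the two parts of the object form. `TypedRecordId` and its methods are hypothetical, the sketch assumes ids whose table part contains no colon, and wiring it into serde (for example through a custom `Deserialize` impl) is left out:

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical replacement for `pub type RecordId = String`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TypedRecordId {
    table: String,
    key: String,
}

impl TypedRecordId {
    /// Build from the parts of the object form ({"tb": ..., "id": ...}).
    fn from_parts(table: &str, key: &str) -> Self {
        Self { table: table.to_string(), key: key.to_string() }
    }
}

impl FromStr for TypedRecordId {
    type Err = String;

    /// Parse the string form, splitting on the first ':'.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.split_once(':') {
            Some((table, key)) if !table.is_empty() && !key.is_empty() => {
                Ok(Self::from_parts(table, key))
            }
            _ => Err(format!("expected \"table:id\", got {s:?}")),
        }
    }
}

impl fmt::Display for TypedRecordId {
    /// Render back to the "table:id" string form.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.table, self.key)
    }
}
```

Compared with a bare `String` alias, a type like this rejects malformed ids at the boundary instead of letting them travel through the rest of the crate.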

Comment on lines +81 to +82
"SELECT *, vector::distance::cosine(embedding, $query) AS score \
FROM embedding ORDER BY score ASC LIMIT $limit",

Suggestion: The vector similarity query references embedding, but stored records use the vector field, so the cosine expression will evaluate against a non-existent field and fail (or produce invalid scores). Use the stored vector field name in the query so scoring runs against real data. [logic error]

Severity Level: Critical 🚨
- ❌ search_similar fails due to querying nonexistent embedding column.
- ⚠️ Blocks vector similarity features using PhenoSurreal::search_similar API.
Steps of Reproduction ✅
1. Initialize an embedded SurrealDB instance using `PhenoSurreal::new` defined in
`crates/surreal-bridge/src/lib.rs:22-28`, which creates a `Surreal<Db>` and selects the
`pheno` namespace and `main` database.

2. From any caller (e.g., future API layer or test), invoke
`PhenoSurreal::search_similar(&[0.1_f32; 3], 10)` implemented at
`crates/surreal-bridge/src/lib.rs:78-94`.

3. Inside `search_similar`, SurrealDB executes the query string at
`crates/surreal-bridge/src/lib.rs:81-82`:

   `SELECT *, vector::distance::cosine(embedding, $query) AS score FROM embedding ORDER BY
   score ASC LIMIT $limit`,

   which computes cosine distance against field `embedding`, even though the stored
   records defined by `Embedding` at `crates/surreal-bridge/src/lib.rs:145-151` use the
   field `vector: Vec<f32>` as the embedding column.

4. Because the `embedding` column does not exist on the `embedding` table (the actual
numeric vector is in `vector`), SurrealDB v3 will evaluate
`vector::distance::cosine(embedding, $query)` against a non-existent field, causing the
query to fail or produce `NULL` scores, and the `.await?` at
`crates/surreal-bridge/src/lib.rs:86` will return an error instead of valid similarity
results for any caller of `search_similar`.
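
For reference, the `ORDER BY score ASC` in that query puts the smallest distance first. A std-only sketch of cosine distance under its conventional definition, 1 minus cosine similarity, follows. `cosine_distance` is a hypothetical helper, and whether `vector::distance::cosine` treats zero-norm or mismatched-length vectors the same way is an assumption this thread does not confirm:

```rust
/// Cosine distance: 1 - (a . b) / (|a| * |b|).
/// Returns None for mismatched lengths or zero-norm vectors,
/// where the distance is undefined.
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}
```

Vectors pointing in the same direction score close to 0 and orthogonal vectors score close to 1, which is why ascending order returns the nearest matches first.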

@codeant-ai

codeant-ai Bot commented May 2, 2026

CodeAnt AI finished reviewing your PR.

@KooshaPari KooshaPari review requested due to automatic review settings May 2, 2026 21:38

Labels

size:L This PR changes 100-499 lines, ignoring generated files

1 participant