fix(phenoData): surreal-bridge SurrealDB v3 API migration #47
SurrealDB v3 removed `surrealdb::sql::Thing`, `surrealdb::sql::Value`,
the `SurrealValue` trait, and the typed `.create().content()` API.
- Replaced all record/Value operations with serde_json::Value
- `RecordId` is now a String alias ("table:id" format)
- Used raw SQL via `db.query()` for CREATE + RETURN id
- `select()` returns Vec<serde_json::Value> directly
- Added `extract_record_id()` helper for v3's id-object format
- Added tempfile dev-dependency for test
COAUTHORED_BY: Claude Opus 4.7 <noreply@anthropic.com>
        "SELECT *, vector::distance::cosine(embedding, $query) AS score \
         FROM embedding ORDER BY score ASC LIMIT $limit",
    )
    .bind(("query", serde_json::json!(query)))
    .bind(("limit", limit))
    .await?
    .take(0)?;

Ok(results)

let scored: Vec<ScoredEmbedding> = results
    .into_iter()
    .filter_map(|r| serde_json::from_value(r).ok())
    .collect();
🟠 Architect Review — HIGH
search_similar() uses the embedding field in its SQL but persisted records are written from Embedding { vector: ... }, and ScoredEmbedding also expects an embedding field; this schema mismatch means deserialization into ScoredEmbedding fails and filter_map(... .ok()) silently drops all rows, so similarity searches return empty/incomplete results in normal usage.
Suggestion: Align the write/read/search schema on a single vector field name across Embedding, the SQL query, and ScoredEmbedding, and treat decode failures as errors rather than silently omitting rows. Add an end-to-end test that stores an embedding then successfully retrieves it via search_similar().
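The second half of this suggestion (treating decode failures as errors) can be illustrated without SurrealDB at all. The following is a minimal, standard-library-only sketch: `Row` and `parse_row` are hypothetical stand-ins for the crate's domain structs and `serde_json::from_value`, not code from this PR. It contrasts the lossy `filter_map(... .ok())` pattern with collecting into a `Result`, which stops at the first bad row.

```rust
/// Hypothetical stand-in for a decoded record (not a type from this PR).
#[derive(Debug, PartialEq)]
pub struct Row {
    pub score: f32,
}

/// Hypothetical stand-in for `serde_json::from_value`: parses one raw row.
pub fn parse_row(raw: &str) -> Result<Row, String> {
    raw.trim()
        .parse::<f32>()
        .map(|score| Row { score })
        .map_err(|e| format!("cannot decode row {:?}: {}", raw, e))
}

/// Lossy pattern flagged in the review: bad rows vanish without a trace.
pub fn decode_lossy(rows: &[&str]) -> Vec<Row> {
    rows.iter().filter_map(|r| parse_row(r).ok()).collect()
}

/// Strict pattern: the first decode failure is returned to the caller.
pub fn decode_strict(rows: &[&str]) -> Result<Vec<Row>, String> {
    rows.iter().map(|r| parse_row(r)).collect()
}
```

With the strict variant a schema mismatch surfaces as an `Err` at the call site instead of an empty `Vec`, which is the behavior both reviewers ask for in `search_similar` and `query_skills`.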
**Path:** crates/surreal-bridge/src/lib.rs
**Line:** 81:92
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.
Reviewed by Cursor Bugbot for commit 0a425f0.
.bind(("query", query))
let results: Vec<serde_json::Value> = self.db
    .query(
        "SELECT *, vector::distance::cosine(embedding, $query) AS score \
Field name mismatch causes empty search results
High Severity
The Embedding struct stores its vector data in a field called vector, but the SQL query in search_similar references a field called embedding (vector::distance::cosine(embedding, $query)), and the ScoredEmbedding struct expects a field called embedding rather than vector. The cosine distance computation operates on a non-existent field, and every result fails deserialization — but filter_map with .ok() silently swallows all errors, so search_similar always returns an empty Vec instead of surfacing the failure.
let skills: Vec<Skill> = records
    .into_iter()
    .filter_map(|r| serde_json::from_value(r).ok())
    .collect();
Silent deserialization drops mask data retrieval failures
Medium Severity
Both query_skills and search_similar use filter_map with .ok() to silently discard any records that fail to deserialize. If SurrealDB v3 returns the id field in object format ({"tb": "...", "id": "..."}) — the very format extract_record_id was added to handle — deserializing it into Option<String> will fail, and every record will be silently dropped, returning an empty Vec with no error.
use surrealdb::engine::local::{Db, RocksDb};
use surrealdb::Surreal;

pub type RecordId = String;
Suggestion: Using a plain String alias for record IDs is incompatible with SurrealDB v3 responses that can emit structured ID objects ({"tb": "...", "id": ...}), so direct struct deserialization will fail for those records. Replace this alias with a custom ID type/deserializer that accepts both string and object ID formats. [type error]
Severity Level: Critical 🚨
- ❌ RecordId = String mismatches SurrealDB v3 ID objects.
- ⚠️ Skills and embeddings fail deserialization when IDs are objects.
Steps of Reproduction ✅
1. Note that SurrealDB v3 can return record IDs in an object form; the helper
`extract_record_id` defined at `crates/surreal-bridge/src/lib.rs:97-116` explicitly
documents and handles the case where a response is `{"id": {"tb": "skill", "id": "..."}}`,
converting that SurrealDB ID object into a `"table:id"` string.
2. For read operations, `PhenoSurreal::query_skills` at
`crates/surreal-bridge/src/lib.rs:50-56` calls `self.db.select("skill")`, which returns
JSON rows including an `"id"` field representing the record ID; in v3, this `"id"` may
legitimately be an object (e.g., `{"tb": "skill", "id": "..."}`) rather than a plain
`"skill:..."` string.
3. The `Skill` struct at `crates/surreal-bridge/src/lib.rs:119-129` declares `pub id:
Option<RecordId>` with `RecordId` aliased to `String` at
`crates/surreal-bridge/src/lib.rs:15`, so `serde_json::from_value::<Skill>` used in
`query_skills` (lines 52-55) expects the JSON `"id"` field to be a string; when SurrealDB
returns the documented object form, deserialization fails with a type error because a map
is provided where a string is expected.
4. Because `query_skills` wraps `serde_json::from_value` in `.ok()` and `filter_map`
(crates/surreal-bridge/src/lib.rs:52-55), any row whose `"id"` is an object is silently
dropped, meaning that callers of `query_skills` (and any future reads using `Embedding` or
`ScoredEmbedding` IDs) lose access to valid records solely because `RecordId` is a plain
`String` alias instead of a custom type or deserializer capable of accepting both string
and structured ID-object formats used by SurrealDB v3.
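One way to picture the suggested fix is an ID type that accepts both shapes and normalizes them to the `"table:id"` string the PR already uses. The sketch below is standard-library-only and hypothetical: `RawId` and `normalize_id` are illustrative names, not part of the crate, and it assumes the inner `id` of the object form is a string. A real implementation would more likely attach this logic to a custom serde deserializer for the `id` field.

```rust
/// Hypothetical model of the two ID shapes described in the review.
#[derive(Debug, Clone, PartialEq)]
pub enum RawId {
    /// Already in "table:id" form, e.g. "skill:abc".
    Text(String),
    /// Structured form, e.g. {"tb": "skill", "id": "abc"}.
    /// Assumption: the inner id is a plain string.
    Object { tb: String, id: String },
}

/// Normalize either shape to a "table:id" string.
pub fn normalize_id(raw: &RawId) -> String {
    match raw {
        RawId::Text(s) => s.clone(),
        RawId::Object { tb, id } => format!("{}:{}", tb, id),
    }
}
```

Normalizing at the boundary would let `Skill`, `Embedding`, and `ScoredEmbedding` keep a plain string ID while still tolerating the object form on reads.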
**Path:** crates/surreal-bridge/src/lib.rs
**Line:** 15:15
"SELECT *, vector::distance::cosine(embedding, $query) AS score \
 FROM embedding ORDER BY score ASC LIMIT $limit",
Suggestion: The vector similarity query references embedding, but stored records use the vector field, so the cosine expression will evaluate against a non-existent field and fail (or produce invalid scores). Use the stored vector field name in the query so scoring runs against real data. [logic error]
Severity Level: Critical 🚨
- ❌ search_similar fails due to querying nonexistent embedding column.
- ⚠️ Blocks vector similarity features using PhenoSurreal::search_similar API.
Steps of Reproduction ✅
1. Initialize an embedded SurrealDB instance using `PhenoSurreal::new` defined in
`crates/surreal-bridge/src/lib.rs:22-28`, which creates a `Surreal<Db>` and selects the
`pheno` namespace and `main` database.
2. From any caller (e.g., future API layer or test), invoke
`PhenoSurreal::search_similar(&[0.1_f32; 3], 10)` implemented at
`crates/surreal-bridge/src/lib.rs:78-94`.
3. Inside `search_similar`, SurrealDB executes the query string at
`crates/surreal-bridge/src/lib.rs:81-82`:
`SELECT *, vector::distance::cosine(embedding, $query) AS score FROM embedding ORDER BY
score ASC LIMIT $limit`,
which computes cosine distance against field `embedding`, even though the stored
records defined by `Embedding` at `crates/surreal-bridge/src/lib.rs:145-151` use the
field `vector: Vec<f32>` as the embedding column.
4. Because the `embedding` column does not exist on the `embedding` table (the actual
numeric vector is in `vector`), SurrealDB v3 will evaluate
`vector::distance::cosine(embedding, $query)` against a non-existent field, causing the
query to fail or produce `NULL` scores, and the `.await?` at
`crates/surreal-bridge/src/lib.rs:86` will return an error instead of valid similarity
results for any caller of `search_similar`.
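A small sketch of the proposed fix: keep the vector field name in one place so the query cannot drift from the stored `Embedding` schema. `VECTOR_FIELD` and `search_sql` are hypothetical names introduced here. The SQL text, including the `vector::distance::cosine` call, is copied from the PR's query with only the field name changed, and has not been checked against the SurrealDB v3 function list.

```rust
/// Hypothetical single source of truth for the stored vector field name.
pub const VECTOR_FIELD: &str = "vector";

/// Build the similarity query against the field records are written with.
/// The SQL is taken from the PR; only the field name is substituted.
pub fn search_sql() -> String {
    format!(
        "SELECT *, vector::distance::cosine({field}, $query) AS score \
         FROM embedding ORDER BY score ASC LIMIT $limit",
        field = VECTOR_FIELD
    )
}
```

For the fix to hold end to end, `ScoredEmbedding` would need to read the same field name, since the other review comments note that it currently expects `embedding`.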
**Path:** crates/surreal-bridge/src/lib.rs
**Line:** 81:82


User description
Summary
SurrealDB v3 removed `surrealdb::sql::Thing`, `surrealdb::sql::Value`, the `SurrealValue` trait, and the typed `.create().content()` API.
Changes
- `RecordId` is now a `String` type alias (`"table:id"` format) instead of a struct with `surrealdb::sql::Thing`
- Replaced all record/Value operations with `serde_json::Value`
- Used raw SQL via `db.query()` for `CREATE ... CONTENT $data RETURN id`
- `select()` returns `Vec<serde_json::Value>` directly deserializable to domain structs
- Added `extract_record_id()` helper handling v3's id-object format (`{"tb": "...", "id": "..."}`)
- Added `tempfile` dev-dependency for test
Test
`cargo test --package surreal-bridge` → 1 pass, 0 fail
COAUTHORED_BY: Claude Opus 4.7 noreply@anthropic.com
Note
Medium Risk
Updates persistence/query code to use SurrealDB v3's raw SQL + `serde_json::Value` flow and changes record ID representation, which can affect data serialization/deserialization and query results. Scope is contained to `surreal-bridge` plus small config additions.
Overview
Migrates `crates/surreal-bridge` to SurrealDB v3 by replacing typed record APIs with raw `db.query()` `CREATE ... CONTENT $data RETURN id` and `serde_json::Value`-based (de)serialization for `Skill`/`Embedding` and vector search results.
Changes record IDs to a `String` (`"table:id"`) with a new `extract_record_id()` helper, updates tests accordingly (adds `tempfile` dev-dependency), and adds repo hygiene configs (`FUNDING.yml` update and new `trufflehog.yml`).
CodeAnt-AI Description
Move SurrealDB storage to the v3-compatible format
What Changed
`table:id` strings, and existing ID shapes from v3 are handled when records are created
Impact
✅ SurrealDB v3 compatibility
✅ Fewer record creation failures
✅ Working skill and embedding lookups