Add Obsidian vault import recipe #28

Closed
snapsynapse wants to merge 2 commits into NateBJones-Projects:main from snapsynapse:recipe/obsidian-vault-import

@snapsynapse
Contributor

Parses any Obsidian vault, chunks notes into atomic thoughts, generates embeddings via OpenRouter, and inserts into Supabase. Tested on 500+ note vault.

Closes #13

Summary

Adds a complete recipe for importing Obsidian vaults into Open Brain as searchable, embedded thoughts. Closes #13.

  • Parses any Obsidian vault (markdown, frontmatter, wikilinks, inline tags)
  • Hybrid chunking: heading-based splits + optional LLM distillation for long sections
  • Embeddings via OpenRouter, insert via Supabase REST API
  • Sync log prevents duplicates on re-runs
  • Tested on a 500+ note LifeHQ-pattern vault (993 thoughts, zero failures)
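The heading-based half of the hybrid chunking listed above can be illustrated with a short sketch. This is not the recipe's actual code: the function name `split_by_headings` and the exact pattern are assumptions, and the sketch ignores headings that appear inside fenced code blocks.

```python
import re


def split_by_headings(body: str) -> list:
    """Split a markdown note into chunks, one per heading section.

    Assumption: a heading is a line starting with 1-6 '#' characters
    followed by whitespace, so inline tags such as '#project' are not
    treated as headings.
    """
    # Zero-width split: cut just before each heading line and keep the
    # heading text attached to the section it introduces.
    parts = re.split(r"(?m)^(?=#{1,6}\s)", body)
    # Drop empty pieces (e.g. when the note starts with a heading).
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    note = "Intro text\n\n# One\nalpha\n\n## Two\nbeta\n"
    for chunk in split_by_headings(note):
        print(repr(chunk))
```

Sections that are still too long after this split would be the candidates for the optional LLM distillation step.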

Vault compatibility

Tested/documented for BASB/PARA, LYT/Ideaverse, LifeHQ, FLAP, Zettelkasten, and MOC-centric patterns. No special configuration needed.

Files

  • import-obsidian.py — standalone script (~480 lines)
  • README.md — step-by-step guide with credential tracker, options, filtering docs, troubleshooting
  • requirements.txt — python-frontmatter, requests
  • .env.example, .gitignore, metadata.json

Parses any Obsidian vault, chunks notes into atomic thoughts,
generates embeddings via OpenRouter, and inserts into Supabase.
Tested on 500+ note LifeHQ-pattern vault.

Closes NateBJones-Projects#13
@justfinethanku
Collaborator

@claude
run a review on this

@justfinethanku
Collaborator

@claude let's do this one more time... I believe in you! give this a proper review!

Collaborator

@justfinethanku justfinethanku left a comment

Admin Review

CI didn't run (Actions blocker), so I did a manual pass.

Security: Clean. No credentials, no dangerous operations, and external calls go only to the OpenRouter and Supabase APIs.

Code quality: Solid. Proper retry logic with exponential backoff, rate limiting, sync log for idempotent re-runs, graceful error handling for encoding issues. The hybrid chunking approach (headings + optional LLM fallback) is well thought out.

Documentation: Strong README with vault compatibility table, credential tracker, filtering docs, and 5 troubleshooting entries.

Verdict: Approved. Nice first contribution!

Welcome to Open Brain — come say hi in Discord: https://discord.gg/Cgh9WJEkeG

Collaborator

@justfinethanku justfinethanku left a comment

PR Review: Obsidian Vault Import

Nice work @snapsynapse — the core code is solid, the parsing logic is thorough, and the vault compatibility table is a great addition. A few things need addressing before merge.


Must Fix (blocking)

1. PR/branch/commit naming conventions

These will fail the automated review (ob1-review.yml):

  • PR title should be [recipes] Obsidian vault import (needs [recipes] prefix)
  • Branch should be contrib/snapsynapse/obsidian-vault-import (not recipe/obsidian-vault-import)
  • Commit messages should use [recipes] prefix

See CONTRIBUTING.md — PR Format for the expected pattern.

2. Dedup gap — not using content-fingerprint-dedup primitive

The sync log (obsidian-sync-log.json) prevents duplicates on same-machine re-runs, but it's local-only. If the log is deleted, or a second user imports an overlapping vault, you get full duplicates.

The repo has a content-fingerprint-dedup primitive that solves this at the database level. The other import recipes either use it or document it as a companion.

Options (in order of preference):

  1. Compute a content_fingerprint (SHA-256 of normalized content) and include it in the insert payload — the DB unique index handles the rest
  2. Call the upsert_thought RPC instead of raw INSERT
  3. At minimum, document the primitive as a recommended companion in the README's Prerequisites section

The sync log is still useful as a performance optimization (skip embedding API calls for unchanged notes), but shouldn't be the only dedup mechanism.
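As a rough illustration of option 1, a fingerprint could be computed as below. This is a minimal sketch under stated assumptions: the normalization rules (NFC, collapsed whitespace, lowercasing) and the payload field names are guesses, and would need to match whatever the content-fingerprint-dedup primitive actually expects.

```python
import hashlib
import re
import unicodedata
from typing import List, Optional


def normalize_content(text: str) -> str:
    """Normalize text so trivially different copies hash identically.

    Assumed rules (not taken from the primitive): Unicode NFC,
    whitespace runs collapsed to one space, ends trimmed, lowercased.
    """
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def content_fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of the normalized content."""
    normalized = normalize_content(text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_payload(content: str, embedding: Optional[List[float]] = None) -> dict:
    """Build a hypothetical insert payload; field names are assumptions."""
    payload = {
        "content": content,
        "content_fingerprint": content_fingerprint(content),
    }
    if embedding is not None:
        payload["embedding"] = embedding
    return payload


if __name__ == "__main__":
    print(content_fingerprint("Hello  World\n"))
```

With a unique index on the fingerprint column, a second insert of the same normalized content would be rejected by the database regardless of which machine or sync log produced it.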

3. --no-embed flag undocumented in README

import-obsidian.py line 619 defines --no-embed but it's missing from the Options table in the README. Add it — users who browse the README won't know it exists.


Should Fix (strong recommendation)

4. .env parser doesn't handle quoted values

The hand-rolled parser (lines 637–643) will include literal quote characters if a user writes:

SUPABASE_URL="https://foo.supabase.co"

Many users copy-paste from Supabase with quotes. Quick fix — strip surrounding quotes:

value = value.strip().strip('"').strip("'")

Or add python-dotenv to requirements.txt and use load_dotenv().
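For reference, a slightly stricter variant of the quote-stripping fix might look like the sketch below. The helper name `parse_env_line` is hypothetical and this is not the script's existing parser; unlike the one-liner above, it removes only a single matching pair of quotes, so a value with a stray quote on one side is left untouched.

```python
def parse_env_line(line: str):
    """Parse one KEY=VALUE line from a .env file.

    Returns (key, value), or None for blank lines, comments, and lines
    without '='.
    """
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    # Split on the first '=' only, so values may contain '='.
    key, _, value = line.partition("=")
    key = key.strip()
    value = value.strip()
    # Remove one matching pair of surrounding quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return key, value


if __name__ == "__main__":
    print(parse_env_line('SUPABASE_URL="https://foo.supabase.co"'))
```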

5. Dead code: HEADING_RE (line 297)

HEADING_RE is defined but never referenced — the actual heading splitting at line 389 uses re.split with an inline pattern. Remove it.

6. Add a cost estimate section to the README

The ChatGPT import recipe includes a cost table by export size. Since this recipe hits OpenRouter for both embeddings and optional LLM chunking, users would benefit from knowing approximate costs. Even a rough table like:

| Vault size | Embeddings | With LLM chunking |
|---|---|---|
| 100 notes | ~$0.02 | ~$0.15 |
| 500 notes | ~$0.10 | ~$0.75 |
| 1000+ notes | ~$0.20 | ~$1.50 |

Nice to Have

7. Rate limiting between embedding calls

Currently the script only pauses every 50 inserts (time.sleep(1) at line 871), but there's no delay between individual embedding API calls. For a 1000-thought import, that's 1000 rapid-fire calls to OpenRouter. A small delay (0.1–0.2s) between calls or a note in the README about rate limits would help.
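One way to space out the embedding calls is a small throttle object, sketched below. The class name `Throttle` and the 0.15s default are illustrative choices within the 0.1-0.2s range suggested above, not something taken from the script.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls."""

    def __init__(self, min_interval=0.15, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        # clock and sleep are injectable so the behavior can be tested
        # without real waiting.
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.15)
    for i in range(3):
        throttle.wait()  # call this right before each embedding request
        print("request", i)
```

Because the throttle measures elapsed time, it adds no delay when the work between calls already takes longer than the interval.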

8. Shields.io step badges

CONTRIBUTING.md recommends these for recipes. Not required, but would match the style of the extension guides and make the README more scannable.


What's Good

  • Obsidian parsing is excellent — frontmatter, wikilinks (including aliased), inline tags with false-positive stripping for code blocks/HTML. Well-engineered.
  • Hybrid chunking (whole-note → heading-split → LLM distill) is a smart approach.
  • Retry logic with exponential backoff on all API calls.
  • Vault detection (.obsidian/ check), template filtering, and the filtering transparency (--dry-run --verbose) are all thoughtful touches.
  • No security issues: REST API with JSON payloads (no SQL injection), resolved paths (no traversal), keys in env vars only.
  • README has all required sections, exceeds the troubleshooting minimum, and the vault compatibility table adds genuine value.
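For readers unfamiliar with the retry pattern mentioned in the list above, here is a minimal sketch of exponential backoff. The function name `with_retries` and its defaults are hypothetical; the script's own implementation may differ in which errors it retries and how long it waits.

```python
import time


def with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on any exception with exponential backoff.

    Waits base_delay, then 2x, then 4x, ... between attempts and
    re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to exercise the retry path.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("temporary failure")
        return "ok"

    print(with_retries(flaky, base_delay=0.01))
```

A real client would usually retry only on transient errors (timeouts, HTTP 429 or 5xx) rather than on every exception.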

TL;DR: Fix the naming conventions (item 1), add content fingerprinting or document the dedup primitive (item 2), and document --no-embed (item 3). One revision should do it. The underlying work is strong.

@snapsynapse
Contributor Author

Thanks for the thorough review @justfinethanku! All feedback addressed. Closing this PR in favor of a new one from the correctly-named branch (contrib/snapsynapse/obsidian-vault-import) as recommended (sorry).

The new PR also includes improvements discovered during live testing on an 800+ note vault:

  • Preflight connection check (catches bad credentials/missing table before any API spend)
  • Secret detection scanner (skips thoughts containing API keys, tokens, passwords)
  • Early abort on consecutive insert failures
  • Sync log only records successfully inserted notes
  • Line-buffered output for background runs
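The secret detection scanner in the list above is not shown in this thread. As a purely illustrative sketch of the idea, a regex-based check might look like the following; the patterns and function names are assumptions and will produce both false positives and false negatives.

```python
import re

# Illustrative patterns only; a real scanner would need a broader,
# tuned set. None of these are taken from the new PR.
SECRET_PATTERNS = [
    # API-key-like strings with an "sk-" prefix
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    # Three dot-separated base64url segments (JWT-shaped tokens)
    re.compile(r"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}"),
    # "password = ...", "api_key: ..." style assignments
    re.compile(r"(?i)\b(?:password|passwd|api[_-]?key|secret|token)\b\s*[:=]\s*\S{6,}"),
]


def looks_like_secret(text: str) -> bool:
    """Return True if any pattern matches somewhere in the text."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def filter_thoughts(thoughts):
    """Split thoughts into (kept, skipped) based on the secret check."""
    kept, skipped = [], []
    for thought in thoughts:
        (skipped if looks_like_secret(thought) else kept).append(thought)
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = filter_thoughts(
        ["Met with the team today", "password = hunter2hunter2"]
    )
    print(len(kept), "kept;", len(skipped), "skipped")
```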

New PR incoming that references this one for review context.

snapsynapse added a commit to snapsynapse/OB1 that referenced this pull request Mar 23, 2026
…n vault import

Addresses all feedback from PR NateBJones-Projects#28: branch/commit naming conventions,
content fingerprint dedup, --no-embed docs, .env quote handling, dead
code removal, cost estimates, and rate limiting.

Adds preflight connection check, secret detection scanner, early abort
on consecutive failures, per-note sync log timestamps, and line-buffered
output. Tested on 800+ note vault (3,743 thoughts, zero data loss).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@snapsynapse snapsynapse mentioned this pull request Mar 23, 2026

Development

Successfully merging this pull request may close these issues.

Recipe: Import Obsidian vault into Open Brain

2 participants