[READY FOR REVIEW] feature/ladybug-provider — add LadybugDB provider and optimize graph ingestion#63

Open
bharatsachya wants to merge 6 commits into feat/sqllite from feat/ladbug

@bharatsachya (Collaborator) commented May 22, 2026

1. Branch Naming

  • Branch: feat/ladbug (Note: the branch name contains a slight typo, "ladbug" for "ladybug", but it conforms to the prefix guidelines and lowercase structure).

2. PR Title

  • [DONE] feature/ladybug-provider — add LadybugDB provider and optimize graph ingestion

3. PR Description

What changed

  • LadybugDB Provider Integration: Added @bb/ladybug package implementing the IGraphDatabaseProvider contract for LadybugDB (@ladybugdb/core).
  • Connection & Query Cache: Added a lazily connecting client with a global registry of cached prepared statements, so the same Cypher query is not recompiled on every call.
  • Snapshot Optimization: Refactored snapshotFilesToVersion in fileVersions.ts to use CREATE statements instead of Neo4j-style MERGE clauses, preventing unnecessary full-column scans in LadybugDB's columnar layout.
  • Streaming & Bulk Ingestion Pipeline: Refactored files.ts to replace single-file loop ingestion with bulkUpsertFiles. It now accepts an AsyncIterable<UpsertFileNodeInput> stream, preventing OOM crashes by writing items to disk using ParquetWriter before executing a single bulk COPY FROM command in a transaction.
  • Polymorphic Edge Binding: Solved LadybugDB parser/binder exceptions when copying polymorphic edges (CONTAINS and HAS_KEYWORD) by passing explicit query routing options:
    • COPY CONTAINS FROM '...' (FROM='Folder', TO='File')
    • COPY HAS_KEYWORD FROM '...' (FROM='File', TO='Keyword')
  • Resource Sanitation: Guaranteed unlinking of temporary Parquet files in a finally block.
  • Project Structure Compliance: Created package-level README.md files at packages/ladybug/README.md and packages/ladybug/src/README.md to satisfy the monorepo's folder context contract rules.
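
For reviewers who want a picture of the connection and query cache item above, here is a minimal sketch of a lazily connecting client with a prepared-statement registry keyed by query text. The `Connection` and `PreparedStatement` shapes and the `LazyClient` class are hypothetical stand-ins, not the @ladybugdb/core API or the code in this PR, and the sketch scopes the registry to one client, whereas the PR describes a global one.

```typescript
// Hypothetical shapes: stand-ins for the sketch, NOT the @ladybugdb/core API.
interface PreparedStatement {
  readonly text: string;
}

interface Connection {
  prepare(text: string): PreparedStatement;
}

class LazyClient {
  private connection: Connection | undefined;
  private readonly connect: () => Connection;
  private readonly statements = new Map<string, PreparedStatement>();

  constructor(connect: () => Connection) {
    this.connect = connect;
  }

  // Open the connection on first use rather than at construction time.
  private getConnection(): Connection {
    if (this.connection === undefined) {
      this.connection = this.connect();
    }
    return this.connection;
  }

  // Return the cached statement for this query text, preparing it only once.
  prepared(text: string): PreparedStatement {
    let statement = this.statements.get(text);
    if (statement === undefined) {
      statement = this.getConnection().prepare(text);
      this.statements.set(text, statement);
    }
    return statement;
  }
}
```

With this shape, constructing the client does not touch the database, and repeated calls with the same query text reuse one prepared statement.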

Why

We are migrating our graph database from Neo4j (OLTP) to LadybugDB (OLAP). The previous implementation upserted records one at a time (resulting in thousands of individual COPY FROM commands) and relied on Neo4j-style query patterns, such as MERGE statements on append-only logs. These patterns caused high memory pressure, out-of-memory (OOM) risk for large repositories (50,000+ files), and database-level lock contention. Shifting to disk-backed Parquet streams and single-transaction bulk copy operations addresses these bottlenecks.
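
As a rough illustration of the disk-backed, single-COPY approach, the sketch below streams rows from an AsyncIterable to a temporary file, issues one COPY statement, and removes the file in a finally block. Everything here is an assumption for illustration: `FileRow`, `RowWriter`, `BulkDeps`, `copyStatement`, and `bulkLoad` are hypothetical names, the writer is injected rather than being a real Parquet writer, and transaction handling is left to whatever `execute` wraps. It is not the implementation of bulkUpsertFiles.

```typescript
// Hypothetical stand-ins; none of these are the real @ladybugdb/core or Parquet writer APIs.
interface FileRow {
  path: string;
  size: number;
}

interface RowWriter {
  append(row: FileRow): Promise<void>;
  close(): Promise<void>;
}

interface BulkDeps {
  openWriter(path: string): Promise<RowWriter>; // for example, a Parquet writer
  execute(statement: string): Promise<void>; // runs one statement on the database
  unlink(path: string): Promise<void>; // removes the temporary file
}

// Build a COPY statement; FROM/TO options are only passed for polymorphic edge tables.
function copyStatement(
  table: string,
  path: string,
  edge?: { from: string; to: string },
): string {
  const options = edge ? ` (FROM='${edge.from}', TO='${edge.to}')` : "";
  return `COPY ${table} FROM '${path}'${options}`;
}

async function bulkLoad(
  table: string,
  rows: AsyncIterable<FileRow>,
  tempPath: string,
  deps: BulkDeps,
): Promise<number> {
  let count = 0;
  try {
    const writer = await deps.openWriter(tempPath);
    try {
      // Rows go to disk as they arrive, so memory use does not grow with repository size.
      for await (const row of rows) {
        await writer.append(row);
        count++;
      }
    } finally {
      await writer.close();
    }
    // One bulk COPY for the whole batch instead of one command per record.
    await deps.execute(copyStatement(table, tempPath));
  } finally {
    // Runs on success and on failure; a real unlink should tolerate a missing file.
    await deps.unlink(tempPath);
  }
  return count;
}
```

Injecting the writer, executor, and unlink function keeps the sketch independent of any particular driver and makes the cleanup path easy to exercise with in-memory fakes.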


How to test

  1. Ensure the database is not locked: terminate any active backend servers that are accessing ladybug.lbug.
  2. Run the repository indexer CLI tool to process a remote repository using the LadybugDB provider:
    bun run bytebell index https://github.com/bharatsachya/WiseTrader
  3. Make sure your config contains these values:
    "graph_provider": "ladybug",
    "sqlite_path": "/.bytebell/data.sqlite",
    "ladybug_path": "/.bytebell/ladybug.lbug"
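
If it helps when testing, the config requirement in step 3 could be checked with a small helper like the one below. The three key names come from the steps above; the `configProblems` function and the way the config object is obtained are hypothetical and not part of the bytebell CLI.

```typescript
// Key names are taken from the testing steps; the checker itself is a hypothetical helper.
const REQUIRED_KEYS = ["graph_provider", "sqlite_path", "ladybug_path"] as const;

function configProblems(config: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_KEYS) {
    const value = config[key];
    // Each required key must be present as a non-empty string.
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`${key} is missing or empty`);
    }
  }
  // The test run in this PR expects the LadybugDB provider to be selected.
  if (config["graph_provider"] !== undefined && config["graph_provider"] !== "ladybug") {
    problems.push('graph_provider must be "ladybug" for this test');
  }
  return problems;
}
```

An empty result means all three values are set and the provider is ladybug; otherwise each entry names what to fix before running the indexer.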

Just In Case:

  1. If you do not see changes, restart the Docker container:
    docker run -p 8000:8000 \
      -v /Users/zeta/.bytebell:/database \
      -e LBUG_FILE=ladybug.lbug \
      --rm ghcr.io/ladybugdb/explorer:latest
  2. Run bytebell shutdown: sometimes lbug files remain empty.

@bharatsachya requested a review from Dead-Bytes May 22, 2026 13:11
@bharatsachya added the Ready for Review and atomic labels May 22, 2026
@bharatsachya (Collaborator, Author) commented

Against #60
