
feat(worlds): add worlds provider with neurosymbolic ingestion for MemoryBench #4

Open
EthanThatOneKid wants to merge 13 commits into main from feat/worlds-provider



@EthanThatOneKid

Summary

Add @worlds/client as a MemoryBench provider (-p worlds) with a neurosymbolic typed-graph ingestion pipeline.

Motivation

Enables @worlds/client to be evaluated on published memory/RAG benchmarks (LoCoMo, LongMemEval) using the MemoryBench framework, alongside Supermemory, Mem0, Zep, and others, without duplicating harness code or blurring the CI semantics of worlds-client-evals.

What changed

Provider bootstrap

  • Vendored supermemoryai/memorybench (MIT) as the baseline harness
  • Added @worlds/client (jsr:@jsr/worlds__client@^0.0.14), pinned to the same version family as worlds-client-evals
  • Added @comunica/query-sparql-rdfjs-lite (required by @worlds/client adapters)
  • New WorldsProvider (src/providers/worlds/):
    • In-memory LibSQL backend (self-contained, no external service)
    • Session messages ingested as RDF Turtle via client.import()
    • client.rebuildSearchIndex() after ingest for FTS/vector discoverability
    • client.search({ query }) for retrieval
    • Registered as worlds provider in CLI (-p worlds)
  • Added worlds to the ProviderName union and getProviderConfig() (uses OPENAI_API_KEY for the judge)
  • Added .env.example and src/providers/worlds/README.md documenting the phase mapping and smoke run

Neurosymbolic ingestion harness (Phase 1)

  • ontology.ts: Namespace constants for RDF/RDFS/OWL/XSD, schema.org, PROV-O, SKOS, and a worlds: custom namespace (https://worlds.wazoo.dev/). Exports a reusable TURTLE_PREFIXES block.
  • shapes.ts: SHACL shape definitions (SessionShape, MessageShape, ClaimShape stub) with a structural validateGraph() checker that runs during ingestion.
  • index.ts: Sessions are dual-typed as schema:Conversation + prov:Activity, messages as schema:Message + prov:Entity with schema:text, schema:position, schema:author, schema:hasPart, and prov:wasGeneratedBy provenance links. Replaces inline URI strings with ontology constants.
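The Phase 1 graph shape can be sketched as a small Turtle serializer. This is a minimal sketch, not the provider's code: the `Msg`/`Session` types, the IRI scheme under the `worlds:` namespace, the `https://schema.org/` namespace IRI, storing `schema:author` as a plain string literal, and the `sessionToTurtle` helper are all assumptions; only the class and property names come from the list above.

```typescript
// Hypothetical input shapes; the provider's real types may differ.
interface Msg {
  author: string;
  text: string;
}

interface Session {
  id: string;
  messages: Msg[];
}

// worlds: is the namespace from the description; schema: and prov: use the
// conventional schema.org and PROV-O namespace IRIs (assumed, not confirmed).
const WORLDS = "https://worlds.wazoo.dev/";
const PREFIXES = [
  "@prefix schema: <https://schema.org/> .",
  "@prefix prov: <http://www.w3.org/ns/prov#> .",
].join("\n");

// Escape a value for a double-quoted Turtle string literal.
function turtleString(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `"${escaped}"`;
}

// Hypothetical IRI scheme: one IRI per session, one per message position.
function sessionIri(id: string): string {
  return `<${WORLDS}session/${encodeURIComponent(id)}>`;
}

function messageIri(id: string, position: number): string {
  return `<${WORLDS}session/${encodeURIComponent(id)}/message/${position}>`;
}

// The session is dual-typed (schema:Conversation, prov:Activity); each message is a
// schema:Message and prov:Entity, linked from the session via schema:hasPart and
// back to it via prov:wasGeneratedBy.
function sessionToTurtle(session: Session): string {
  const s = sessionIri(session.id);
  const lines = [PREFIXES, "", `${s} a schema:Conversation, prov:Activity .`];
  session.messages.forEach((msg, position) => {
    const m = messageIri(session.id, position);
    lines.push(
      `${s} schema:hasPart ${m} .`,
      `${m} a schema:Message, prov:Entity ;`,
      `  schema:text ${turtleString(msg.text)} ;`,
      `  schema:author ${turtleString(msg.author)} ;`,
      `  schema:position ${position} ;`,
      `  prov:wasGeneratedBy ${s} .`,
    );
  });
  return lines.join("\n");
}
```

In the provider, a string like this would then be handed to `client.import()`; that call is omitted here because its exact signature is not given in this PR description.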

Design decisions

  • Turtle, not N3: Turtle is a strict subset of N3. The import pipeline parses Turtle natively; N3 extras (formulas, quantifiers) would be silently dropped.
  • PROV-O for provenance: Tracks where facts came from (which session produced which message). Scaffolding for Phase 2 claim extraction (prov:wasDerivedFrom).
  • schema:text as literal: Phase 1 reifies message content on typed nodes with provenance. Phase 2 adds claim decomposition on top.

LoCoMo smoke test

```bash
bun install
cp .env.example .env.local   # add API keys
bun run src/index.ts run -p worlds -b locomo -l 5 -j gpt-4o -m gemini-3.1-flash-lite
```

Relationship to worlds-client-evals

| Signal | Repo |
|---|---|
| sparql-handoff-valid, updates-blocked, step budget | worlds-client-evals |
| LoCoMo QA accuracy, search latency, context tokens (MemScore) | worlds-client-memorybench |

Closes wazootech/worlds-client-evals#34

GitHub Sync Agent and others added 13 commits May 25, 2026 20:28
- Vendor memorybench from supermemoryai/memorybench (MIT) as baseline
- Add @worlds/client (jsr:@jsr/worlds__client@^0.0.14) dependency, same version as worlds-client-evals
- Add @comunica/query-sparql-rdfjs-lite as required by @worlds/client adapters
- Implement WorldsProvider: in-memory LibSQL, RDF/Turtle ingest, rebuildSearchIndex, client.search()
- Add worlds to ProviderName union and register in providers map
- Add worlds case to getProviderConfig() (uses OPENAI_API_KEY for judge)
- Add .env.example with required API keys
- Add src/providers/worlds/README.md documenting phase mapping and smoke run
- Fix formatting across upstream files via prettier
feat(worlds): neurosymbolic ingestion harness
… prompt

- Accept questionDate parameter for temporal reasoning on LoCoMo questions
- Strip RDF metadata (subject, predicate, graph URIs) in prompt builder,
  showing only text + relevance score to reduce token waste
- Keep full search results in search() for show-failures debuggability
- Add step-by-step reasoning format matching supermemory prompt pattern
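The prompt-side change (full results kept in `search()`, only text and score shown to the answer model) can be sketched as a formatter that drops the RDF coordinates. The `SearchHit` shape, the `formatContext` helper, and the output layout are hypothetical stand-ins, not the provider's real types or prompt.

```typescript
// Hypothetical search hit: RDF coordinates plus the matched text and relevance score.
interface SearchHit {
  subject: string;
  predicate: string;
  graph: string;
  text: string;
  score: number;
}

// Build the context block for the answer prompt. Only text and score survive, so
// subject/predicate/graph URIs do not consume prompt tokens. The optional
// questionDate gives the model an anchor for relative-time reasoning.
function formatContext(hits: SearchHit[], questionDate?: string): string {
  const lines = hits.map(
    (hit, i) => `[${i + 1}] (score ${hit.score.toFixed(3)}) ${hit.text}`,
  );
  const header = questionDate ? `Question date: ${questionDate}\n` : "";
  return header + lines.join("\n");
}
```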
FTS5 uses AND between tokens after stopword removal, so long questions
like "When did Caroline go to the LGBTQ support group?" match nothing.
When the full query returns empty, fall back to per-term OR-style search
with best-score dedup. Also log rebuildSearchIndex quad/chunk counts for
indexing diagnostics.
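The fallback can be sketched as follows. It assumes a generic `search(query)` function returning hits keyed by an `id`, with scores normalized so that higher is better (raw FTS5 `bm25()` values are lower-is-better, so a real implementation would need to account for that). The stopword list and all names here are hypothetical, not the @worlds/client API.

```typescript
interface Hit {
  id: string;
  text: string;
  score: number; // assumed: higher is better
}

type SearchFn = (query: string) => Promise<Hit[]>;

// Hypothetical stopword list; the real harness may use a different one.
const STOPWORDS = new Set([
  "a", "an", "the", "to", "of", "in", "on", "at", "is", "was",
  "did", "do", "does", "when", "what", "who", "where", "how",
]);

// Lowercase, split on non-alphanumerics, and drop stopwords.
function keyTerms(question: string): string[] {
  return question
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t));
}

// Try the full query first (FTS5 ANDs its terms). If nothing matches, search each
// remaining term separately and keep each hit's best score across all terms.
async function searchWithFallback(search: SearchFn, question: string, limit = 10): Promise<Hit[]> {
  const full = await search(question);
  if (full.length > 0) return full.slice(0, limit);

  const best = new Map<string, Hit>();
  for (const term of keyTerms(question)) {
    for (const hit of await search(term)) {
      const prev = best.get(hit.id);
      if (!prev || hit.score > prev.score) best.set(hit.id, hit);
    }
  }
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Issuing one query per term costs more round trips than a single `OR` query, but it keeps the dedup and best-score logic independent of the backend's query syntax.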

LoCoMo -l 5 now completes: 20% accuracy, Hit@4=60%, MemScore 20%/35ms/2306tok.
Wire GeminiEmbeddingService into @worlds/client search index using
@ai-sdk/google embedMany(). 768-dim vectors fused with FTS5 keywords
via Reciprocal Rank Fusion. Graceful degradation to keyword-only mode
when GOOGLE_GENERATIVE_AI_API_KEY is absent.
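Reciprocal Rank Fusion can be sketched in a few lines: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in. The constant k = 60 is the value commonly used for RRF and is an assumption here, as are the `rrf`/`hybridRank` names and the id-only lists. When no vector ranking is available (no API key), fusing a single list preserves the keyword order, which is one way to get the graceful degradation described above.

```typescript
// Fuse ranked lists of document ids with Reciprocal Rank Fusion.
// score(d) = sum over lists of 1 / (k + rank), with rank starting at 1.
function rrf(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Keyword-only degradation: with no vector ranking, fusion keeps the keyword order.
function hybridRank(keyword: string[], vector: string[] | null): string[] {
  const lists = vector && vector.length > 0 ? [keyword, vector] : [keyword];
  return rrf(lists).map((r) => r.id);
}
```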
…ch chunks

- config.ts: worlds provider now uses googleApiKey (was openaiApiKey)
- gemini-embedding-service: switch from deprecated text-embedding-004 to
  gemini-embedding-2 with 768d output via outputDimensionality
- gemini-embedding-service: chunk embed() calls at 100 items to stay
  within BatchEmbedContents API limit
- index.ts: use searchIndexOnImport:false to defer all vectorization
  to rebuildSearchIndex() in the indexing phase
- README: document two-step ingest-once/iterate workflow with -f search
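Chunking the `embed()` calls can be sketched generically. The `EmbedBatch` type stands in for whatever actually calls the embedding API (in this PR, `embedMany()`); the batch size of 100 comes from the commit message, while the optional delay between batches is a hypothetical way to rate-limit requests.

```typescript
// Stand-in for the real embedding call; must return one vector per input string.
type EmbedBatch = (values: string[]) => Promise<number[][]>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Embed `values` in sequential batches of at most `batchSize`, preserving input order.
async function embedInBatches(
  values: string[],
  embedBatch: EmbedBatch,
  batchSize = 100,
  delayMs = 0,
): Promise<number[][]> {
  const out: number[][] = [];
  for (let start = 0; start < values.length; start += batchSize) {
    if (start > 0 && delayMs > 0) await sleep(delayMs); // optional rate limiting
    const batch = values.slice(start, start + batchSize);
    const vectors = await embedBatch(batch);
    if (vectors.length !== batch.length) {
      throw new Error(`expected ${batch.length} embeddings, got ${vectors.length}`);
    }
    out.push(...vectors);
  }
  return out;
}
```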

Smoke test: 0% -> 40% accuracy, Hit@10 60% -> 80%, MRR 0.18 -> 0.36
…eaker attribution

Enrich search results with session metadata via a batched SPARQL query
before passing context to the answer LLM. Stores speaker names
(schema:creator) and session participants (worlds:speakerA/B) during
ingestion, resolves them at search time, and surfaces them in the prompt
with improved temporal reasoning and speaker attribution instructions.
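Batching the metadata lookup into one round trip can be sketched with a SPARQL `VALUES` block over the matched message IRIs. The property names (`schema:hasPart`, `schema:creator`, `worlds:speakerA`/`worlds:speakerB`) come from the description; the `https://schema.org/` namespace IRI and the `buildSessionMetadataQuery` helper are assumptions.

```typescript
// Build one SPARQL query that resolves, for every matched message IRI, its speaker
// (schema:creator) and the participants of the session that contains it.
function buildSessionMetadataQuery(messageIris: string[]): string {
  // Reject characters that are illegal inside an IRIREF so values cannot break out.
  for (const iri of messageIris) {
    if (/[<>"{}|^`\\\s]/.test(iri)) throw new Error(`unsafe IRI: ${iri}`);
  }
  const values = messageIris.map((iri) => `<${iri}>`).join(" ");
  return [
    "PREFIX schema: <https://schema.org/>",
    "PREFIX worlds: <https://worlds.wazoo.dev/>",
    "SELECT ?message ?session ?creator ?speakerA ?speakerB WHERE {",
    `  VALUES ?message { ${values} }`,
    "  ?session schema:hasPart ?message .",
    "  OPTIONAL { ?message schema:creator ?creator }",
    "  OPTIONAL { ?session worlds:speakerA ?speakerA }",
    "  OPTIONAL { ?session worlds:speakerB ?speakerB }",
    "}",
  ].join("\n");
}
```

The query string would then be run through whatever SPARQL entry point the client exposes; that call is not shown because its signature is not given here.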

smoke-004: 80% accuracy (4/5), up from 40% in smoke-003.
…bility

Extract structured claims at ingest (retry, disk cache) and query them at
search time via entity-aware SPARQL. Interleave facts with hybrid search hits,
rate-limit embeddings, and use 3-pass majority voting plus equivalence rubric
for more consistent judge scores.
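The 3-pass majority vote can be sketched as below. `JudgeFn` is a hypothetical stand-in for one LLM judge call returning a boolean verdict; the passes run sequentially here for simplicity, and an odd pass count avoids ties.

```typescript
// One judge call: true if the hypothesis is judged correct for the question.
type JudgeFn = (question: string, gold: string, hypothesis: string) => Promise<boolean>;

// Run the judge `passes` times and return the majority verdict plus the raw votes.
async function majorityJudge(
  judge: JudgeFn,
  question: string,
  gold: string,
  hypothesis: string,
  passes = 3,
): Promise<{ correct: boolean; votes: boolean[] }> {
  if (passes < 1 || passes % 2 === 0) throw new Error("passes must be a positive odd number");
  const votes: boolean[] = [];
  for (let i = 0; i < passes; i++) {
    votes.push(await judge(question, gold, hypothesis));
  }
  const yes = votes.filter(Boolean).length;
  return { correct: yes * 2 > passes, votes };
}
```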
Add a 5-question agent runner with AI SDK tool calling, billing check, and JSONL trace logging for prompt/tool iteration via replay. Expose a WorldsProvider client getter for SPARQL tooling.
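JSONL trace logging for replay can be sketched with the `node:fs` module (which Bun also implements): append one JSON object per line, then read the file back line by line. The `TraceEvent` shape and the helper names are hypothetical.

```typescript
import { appendFileSync, existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical trace record for one agent step.
interface TraceEvent {
  question: string;
  step: number;
  tool?: string;
  input?: unknown;
  output?: unknown;
}

// Start a fresh trace file (truncates any previous run at the same path).
function startTrace(path: string): void {
  writeFileSync(path, "", "utf8");
}

// Append one event as a single line; JSON.stringify without indentation never
// emits raw newlines, so each event stays on exactly one line.
function appendTrace(path: string, event: TraceEvent): void {
  appendFileSync(path, JSON.stringify(event) + "\n", "utf8");
}

// Read all events back for replay, skipping blank lines.
function readTrace(path: string): TraceEvent[] {
  if (!existsSync(path)) return [];
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TraceEvent);
}
```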


Development

Successfully merging this pull request may close these issues.

Scaffold worlds-client-memorybench: Worlds MemoryBench provider + LoCoMo smoke (-l 5)
