feat: WASM language runtime + memory leak fixes #4

Open
HexaField wants to merge 27 commits into profiling/memory-investigation from feat/wasm-language-runtime

Conversation

@HexaField
Owner

Summary

Combined branch: memory leak investigation/fixes + WASM language runtime for AD4M.

Memory Leak Fixes (from profiling/memory-investigation)

  • Proper perspective teardown: Prolog pools, SurrealDB cache, link language, signal streams
  • Language removal via Rust LanguageController during neighbourhood teardown
  • Holochain signal callback cleanup on language removal
  • SurrealDB shutdown simplified (in-memory cache, not persistent store)
  • Finding: Root cause is Holochain conductor retaining ~140MB/neighbourhood after uninstall_app — see upstream PR #689

WASM Language Runtime

  • rust-executor/src/wasm_core/ — WASM loader, ABI v1, host functions, registry
  • wasm-language-sdk/ — Rust SDK crate for language authors (types, traits, ad4m_language! macro)
  • examples/wasm-languages/note-store/ — port of note-store to Rust WASM (121KB binary)
  • Feature-gated behind wasm-languages — zero impact when disabled
  • Wasmer 6.1.0 (matches Holochain's version)

Build & Test Results (Ubuntu 22.04, x86_64, 32GB RAM)

  • 365MB binary with feature enabled (+17MB for wasmer/cranelift)
  • Build requires mold linker on 32GB machines (GNU ld OOMs during link)
  • 5/13 unit tests pass; remaining 8 need the module name fix applied

What's Needed Next

  1. LanguageController integration — JS side needs to detect .wasm vs .js and route to Rust WASM runtime (part of Nico's JS→Rust refactor)
  2. Language installation flow — detect WASM binary → register with WasmLanguageRegistry
  3. SDK module naming: #[link(wasm_import_module = "ad4m")] in SDK
  4. Host function completeness — may need perspective queries, expression storage

Profiling Docs

  • docs/profiling/README.md — overview and reproduction steps
  • docs/profiling/leak-investigation.mjs — comprehensive profiler script

@HexaField HexaField changed the base branch from dev to profiling/memory-investigation February 22, 2026 05:21
HexaField pushed a commit that referenced this pull request Feb 23, 2026
…#652)

* Surreal files per perspective wip 1

* Avoid duplicate links w/ unique index and handle lock and write errors

* Fix new remove_link on surreal service

* Rename update_surreal_cache() to persist_link_diff()

* Temporary perspective data migration from rusqlite to surreal

* fmt

* fix: address CodeRabbit issues #2, #3, coasys#7

- MIGRATION_REMOVAL_GUIDE.md: Complete sentence in heading
- migration.rs: Only mark as migrated when error_count == 0 (prevents data loss)
- surreal_service/mod.rs: Remove overly broad 'index' error check (more precise error handling)

Addresses CodeRabbit actionable comments on PR coasys#652

* fix: preserve original link status instead of hardcoding Local (issue #4)

Instead of hardcoding LinkStatus::Local, now reads link.status and uses it
(falls back to Local if None). This preserves the original link status during
import operations.

Addresses CodeRabbit actionable comment on PR coasys#652

* fix: propagate SurrealDB write failures to prevent desync (issue #1)

- retry_surreal_op now returns Result and propagates errors
- persist_link_diff now returns Result instead of silently swallowing errors
- Updated all callsites:
  - Functions returning Result: use .await? to propagate
  - Functions returning (): use .await.expect() to fail-fast
- Critical synchronization operations now fail loudly instead of silently

Addresses CodeRabbit actionable comment #1 on PR coasys#652
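The shape of this change, a retry helper that hands the final error back to its caller instead of swallowing it, can be sketched roughly as follows. This is a minimal synchronous illustration, not the code from the PR: the real `retry_surreal_op` is async and its signature is not shown here, so `retry_op`, `max_attempts`, `persist`, and the `String` error type are all assumptions.

```rust
/// Hypothetical stand-in for `retry_surreal_op`: retries a fallible
/// operation and returns the last error instead of dropping it.
fn retry_op<T, F>(max_attempts: usize, mut op: F) -> Result<T, String>
where
    F: FnMut() -> Result<T, String>,
{
    let mut last_err = String::from("no attempts made");
    for _ in 0..max_attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

/// Hypothetical caller that itself returns `Result`, so the failure is
/// propagated with `?` rather than logged and forgotten.
fn persist(succeed_on_attempt: usize) -> Result<usize, String> {
    let mut calls = 0;
    let value = retry_op(3, || {
        calls += 1;
        if calls >= succeed_on_attempt {
            Ok(calls)
        } else {
            Err(format!("write failed (attempt {calls})"))
        }
    })?;
    Ok(value)
}
```

A caller that returns `()` has no way to use `?`, which is presumably why the commit falls back to `.expect()` at those call sites.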

* fix: honor full unique constraint in SurrealDB lookups (issue coasys#6)

Updated get_link to accept and use author and timestamp parameters:
- get_link now takes optional author and timestamp
- When provided, queries using all 5 unique fields (source, target, predicate, author, timestamp)
- When not provided, falls back to 3-field lookup for backward compatibility
- Updated all callsites to pass author and timestamp from LinkExpression

This prevents returning arbitrary links when multiple authors/timestamps exist
for the same source/predicate/target combination.

Addresses CodeRabbit actionable comment coasys#6 on PR coasys#652

* fix: prevent TOCTOU race in initialize_from_db (issue #5)

Added atomic check-and-insert before storing perspective:
- Initial read-lock check remains for quick filtering
- After async initialization completes, do final write-lock check
- Only insert if another task hasn't already initialized this perspective
- Discard duplicate work and don't start background tasks if race lost

This prevents multiple tasks from creating duplicate SurrealDB services
for the same perspective UUID.

Addresses CodeRabbit actionable comment #5 on PR coasys#652
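The check, initialize, re-check pattern described above could look something like this. It is a simplified sketch using `std::sync::RwLock` and a synchronous closure in place of the async initialization; `Registry`, `initialize`, and the `String` service placeholder are hypothetical names, not the types used in the PR.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical registry keyed by perspective UUID. The `String` value
/// stands in for whatever per-perspective service is being created.
struct Registry {
    perspectives: RwLock<HashMap<String, String>>,
}

impl Registry {
    fn new() -> Self {
        Self { perspectives: RwLock::new(HashMap::new()) }
    }

    /// Returns true if this call inserted the perspective, false if it was
    /// already present at the quick check or after losing the race.
    fn initialize(&self, uuid: &str, build: impl FnOnce() -> String) -> bool {
        // Quick filter under a read lock.
        if self.perspectives.read().unwrap().contains_key(uuid) {
            return false;
        }
        // Expensive setup runs without holding any lock.
        let service = build();
        // Final check and insert happen under one write lock, so no other
        // task can slip in between the two steps.
        let mut map = self.perspectives.write().unwrap();
        if map.contains_key(uuid) {
            return false; // another task won; `service` is dropped here
        }
        map.insert(uuid.to_string(), service);
        true
    }
}
```

The caller can use the boolean to decide whether to start background tasks, which matches the "don't start background tasks if race lost" behaviour described in the commit.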

* fix: clone link.status to avoid partial move

Compilation error: link.status.unwrap_or() moves the value, preventing
use of 'link' afterwards. Use clone() to avoid the partial move.

* fix: borrow links in migration loop to avoid partial move

Changed 'for (link_expr, status) in links' to '&links' and cloned status
to avoid moving values out of the vector.
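Both partial-move fixes follow the same idea: clone the small field instead of moving it out of a value that is still needed. A minimal sketch, with `Link`, `import`, and `statuses` as hypothetical stand-ins for the real types and functions:

```rust
#[derive(Clone, Debug, PartialEq)]
enum LinkStatus {
    Local,
    Shared,
}

#[derive(Debug)]
struct Link {
    source: String,
    status: Option<LinkStatus>,
}

/// `link.status.unwrap_or(..)` would move `status` out of `link`, so
/// returning `link` afterwards would not compile. Cloning leaves it intact.
fn import(link: Link) -> (Link, LinkStatus) {
    let status = link.status.clone().unwrap_or(LinkStatus::Local);
    (link, status)
}

/// Iterating over a borrowed slice and cloning the status leaves the
/// caller's collection usable after the loop.
fn statuses(links: &[(Link, LinkStatus)]) -> Vec<(String, LinkStatus)> {
    let mut out = Vec::new();
    for (link, status) in links {
        out.push((link.source.clone(), status.clone()));
    }
    out
}
```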

* chore: run cargo fmt and add PR fixes summary

* refactor: remove redundant variable reassignments in get_link

Addresses CodeRabbit feedback: simplified variable flow by directly
assigning query result to response instead of going through intermediate
query and results variables.

Co-authored-by: CodeRabbit AI <coderabbit@example.com>

* fix: address CodeRabbit feedback on PR coasys#652

1. Remove 'WHERE perspective = $perspective' from test queries
   - Each perspective has isolated database, no filtering needed
   - Fixed 11 test queries in surreal_service/mod.rs

2. Make status parsing case-insensitive in SurrealLink conversion
   - Now handles 'Shared'/'shared' and 'Local'/'local' correctly
   - Preserves migrated data regardless of case

3. Require author/timestamp in get_link() signature
   - Changed from Option<&str> to &str for both params
   - Removed fallback branch (always use full unique constraint)
   - Updated 4 callsites in perspective_instance.rs
   - Enforces UNIQUE index (in, out, predicate, author, timestamp)
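Item 2 above (case-insensitive status parsing) might be sketched as below. The function name `parse_status` and the choice to return `None` for unrecognised values are assumptions; the commit does not say how the real conversion treats unknown strings.

```rust
#[derive(Debug, PartialEq)]
enum LinkStatus {
    Shared,
    Local,
}

/// Hypothetical helper: accepts "Shared"/"shared" and "Local"/"local"
/// (or any other ASCII casing) and rejects everything else.
fn parse_status(raw: &str) -> Option<LinkStatus> {
    if raw.eq_ignore_ascii_case("shared") {
        Some(LinkStatus::Shared)
    } else if raw.eq_ignore_ascii_case("local") {
        Some(LinkStatus::Local)
    } else {
        None
    }
}
```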

Co-authored-by: CodeRabbit AI <coderabbit@example.com>

* reset bootstrapSeed.json

* Handle fallback sync read failure gracefully.

* Don’t proceed when migration fails.

* Fix inconsistent error handling: .expect() vs. map_err()?

* fix: handle SurrealDB service creation failure in initialize_from_db

Complements commit e57b8f8 which fixed the same issue in add_perspective().
This fix addresses the spawned task in initialize_from_db() which also had
a panicking .expect() call.

Changes:
- Replace .expect() with match expression
- Log error and return early on failure
- Prevents panic if SurrealDB creation fails (RocksDB lock, disk, permissions)

Addresses CodeRabbit feedback on PR coasys#652 (line 90 issue)

* don't fail silently if links from DB can't be parsed

* don't panic on DB write failures but log error

---------

Co-authored-by: Data <data.coasys@gmail.com>
Co-authored-by: CodeRabbit AI <coderabbit@example.com>
HexaField and others added 26 commits March 4, 2026 11:00
- Baseline profiling: 355 MB startup, 750 MB post-init, ~78 MB per neighbourhood
- Leak investigation: 0% memory recovery on neighbourhood teardown
- perspectiveRemove does not uninstall Holochain hApps or free WASM runtimes
- Bare perspectives leak ~2.4 MB each, language cloning leaks ~4.2 MB each
- Includes reproduction scripts (profiler, leak tester, publish-langs)
Detailed code-level analysis tracing all three categories of memory leaks:
1. CRITICAL: Neighbourhood teardown leaks 100% - perspectiveRemove only sets a flag,
   never uninstalls Holochain hApps, Prolog pools, SurrealDB, or JS languages
2. Bare perspectives leak ~2.4 MB each (Prolog pools + SurrealDB not freed)
3. Language cloning leaks ~4.2 MB per clone (permanent, no unload path)

Includes exact file/line references, proposed fixes ordered by priority,
and architecture recommendations (lifecycle contract, reference counting).
CRITICAL fixes:
- Fix 1: Proper teardown_background_tasks that cleans up Prolog pools,
  SurrealDB, link language, subscribed queries, and batch store
- Fix 2: Add language_remove method to Rust LanguageController to call
  JS languageController.languageRemove() during teardown
- Fix 3: Clean up Holochain signal callbacks on language removal (both
  JS #signalCallbacks and Rust signal stream StreamMap)
- Rename _remove_perspective_pool to remove_perspective_pool

MEDIUM fixes:
- Fix 4: Add reference counting for languages in LanguageController.ts
  (languageAddRef/languageReleaseRef)
- Fix 5: Add SurrealDB shutdown() method that drops all data and indexes
Baseline vs patched binary comparison confirms:
- AD4M-layer teardown works correctly (SurrealDB, signals, languages)
- Holochain conductor retains ~140MB/neighbourhood after uninstall_app
- 0% memory recovery on both original and patched binaries
- Root cause is conductor-level wasmer/LMDB memory management

Updated leak-investigation.mjs with v2 improvements:
- Fixed GQL schema for DecoratedLinkExpression
- Added detailed smaps breakdown per test phase
- Added large anon mapping tracking across lifecycle
- Added teardown log verification
- Remove languageAddRef/languageReleaseRef and #languageRefCounts from
  LanguageController.ts — these were never called from any code path
- Simplify SurrealDB shutdown() to just log — SurrealDB uses in-memory
  storage (Surreal::new::<Mem>), so explicit DELETE/REMOVE INDEX is
  unnecessary; memory is freed when the Arc<Surreal<Db>> is dropped
Adds a WASM language runtime that enables AD4M Language modules to be
compiled to WebAssembly and executed in the Wasmer runtime (same engine
Holochain uses). This eliminates the need for V8/Deno for languages that
target WASM, reducing per-language memory overhead.

Components:
- rust-executor/src/wasm_core/ — WASM loader, ABI, host functions, registry
- wasm-language-sdk/ — Rust SDK crate for language authors (types, traits, macros)
- examples/wasm-languages/note-store/ — port of note-store language to Rust/WASM

Key design:
- ABI versioned from day one (AD4M_LANGUAGE_ABI_VERSION = 1)
- Fat pointer encoding (u64) for passing data across WASM boundary
- JSON serialisation for structured data
- Per-language isolation (each gets own WASM instance + linear memory)
- Host functions mirror Deno ops: agent_did, agent_sign, hash, etc.
- Feature-gated: cargo check --features wasm-languages
- Does not break existing Deno/JS language path

The example note-store language compiles to a 119KB WASM binary with all
required exports (ad4m_alloc, ad4m_dealloc, ad4m_expression_get, etc.)
and imports only the host functions it actually uses.
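For readers unfamiliar with fat-pointer encoding, the idea is to pack a linear-memory offset and a byte length into a single `u64` so that one return value can describe a buffer. The sketch below is illustrative only: the bit layout (offset in the high 32 bits, length in the low 32) and the helper names are assumptions, and the actual ABI v1 layout may differ.

```rust
/// Pack a 32-bit offset and a 32-bit length into one u64.
/// Assumed layout: offset in the high half, length in the low half.
fn encode_fat_ptr(ptr: u32, len: u32) -> u64 {
    ((ptr as u64) << 32) | (len as u64)
}

/// Split a fat pointer back into (offset, length).
fn decode_fat_ptr(fat: u64) -> (u32, u32) {
    ((fat >> 32) as u32, fat as u32)
}

/// Read the bytes a fat pointer refers to. A plain byte slice stands in
/// for the guest's linear memory; out-of-bounds ranges yield `None`.
fn read_bytes(memory: &[u8], fat: u64) -> Option<&[u8]> {
    let (ptr, len) = decode_fat_ptr(fat);
    let start = ptr as usize;
    let end = start.checked_add(len as usize)?;
    memory.get(start..end)
}
```

With JSON as the payload format, the host would decode the fat pointer, copy the bytes out of guest memory, and deserialise them; the bounds check matters because the pointer comes from untrusted guest code.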
- Add LanguageBackend async trait in languages/language.rs abstracting
  sync, commit, current_revision, render, others, telepresence methods
- Implement LanguageBackend for existing JS Language (unchanged behavior)
- Add WasmLanguage backend (feature-gated behind wasm-languages) wrapping
  WasmLanguageInstance with sensible defaults for unimplemented methods
- Update LanguageController::language_by_address to check WASM registry
  first, falling back to JS
- Add install_wasm_language and is_wasm_bundle helpers (wasm-languages)
- Update language_remove to handle WASM languages
- Update perspective_instance.rs to use Arc<Mutex<dyn LanguageBackend>>
  instead of concrete Language type
- Add async-trait dependency, fix duplicate surrealdb dep in Cargo.toml
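A rough picture of the backend abstraction and the "WASM registry first, JS fallback" lookup is given below. It is deliberately simplified: the real `LanguageBackend` is an async trait with more methods, whereas this sketch is synchronous and uses only the standard library. `Controller`, `kind`, `shared`, and the method bodies are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified, synchronous stand-in for the backend trait.
trait LanguageBackend: Send {
    fn kind(&self) -> &'static str;
    fn commit(&mut self, diff: &str) -> Result<String, String>;
}

struct JsLanguage;
struct WasmLanguage {
    commits: u32,
}

impl LanguageBackend for JsLanguage {
    fn kind(&self) -> &'static str {
        "js"
    }
    fn commit(&mut self, _diff: &str) -> Result<String, String> {
        Err("not implemented in this sketch".to_string())
    }
}

impl LanguageBackend for WasmLanguage {
    fn kind(&self) -> &'static str {
        "wasm"
    }
    fn commit(&mut self, diff: &str) -> Result<String, String> {
        self.commits += 1;
        Ok(format!("rev-{}-{}", self.commits, diff.len()))
    }
}

type SharedBackend = Arc<Mutex<Box<dyn LanguageBackend>>>;

/// Wrap any backend in the shared, type-erased handle.
fn shared(backend: impl LanguageBackend + 'static) -> SharedBackend {
    let boxed: Box<dyn LanguageBackend> = Box::new(backend);
    Arc::new(Mutex::new(boxed))
}

/// Hypothetical controller holding both registries.
struct Controller {
    wasm: HashMap<String, SharedBackend>,
    js: HashMap<String, SharedBackend>,
}

impl Controller {
    /// Check the WASM registry first, then fall back to JS.
    fn language_by_address(&self, address: &str) -> Option<SharedBackend> {
        self.wasm.get(address).or_else(|| self.js.get(address)).cloned()
    }
}
```

Callers such as a perspective instance only see `SharedBackend`, so they do not need to know which runtime is behind a given language address.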
- Fix schema.gql symlink (core/lib/src -> tests/js)
- Fix AgentContext/did_for_context/sign_for_context -> agent::did()/sign()
- Fix create_signed_expression to use 1-arg API
- Remove conflicting From<WasmLanguageError> impl (blanket covers it)
- Fix perspective_instance to use Box<dyn LanguageBackend> in Arc<Mutex<>>
- Add set_app_data_path to perspectives/mod.rs (merge gap)
- Forward wasm-languages feature through cli/Cargo.toml
- Add LinksAdapter trait to wasm-language-sdk with sync/commit/render/current_revision/others
- Add ad4m_links_adapter! macro for optional WASM export generation
- Add has_links_adapter capability detection in host
- Add sync/commit/render/current_revision/others methods to WasmLanguageInstance
- Wire WasmLanguage backend to call through to WASM instance methods
- Add WASM bundle detection in JS LanguageController (magic bytes check)
- Add WASM install path in Rust LanguageController.install_language
…9 pass)

- New example: link-store WASM language with full LinksAdapter (sync, commit, render, current_revision, others)
- Fix HOST_MODULE_NAME: "ad4m" -> "env" to match extern "C" default imports
- Remove duplicate inline mod tests from wasm_core/mod.rs
- 7 new LinksAdapter tests + rebuilt WASM fixtures
…terface

- Update AbiHcCallRequest: replace dna_hash/agent_pubkey with dna_nick
- Add tokio_handle to HostEnv for sync->async bridging
- Implement host_hc_call using block_in_place + handle.block_on
- Use maybe_get_holochain_service() for defensive error handling
- Update SDK: new holochain_call(dna_nick, zome_name, fn_name, payload) API
- Deprecate old hc_call() in SDK
… and ad4m_init lifecycle hook

- Add hc_install_app, hc_remove_app, hc_get_agent_key host functions to wasm_core
- Register new host functions in WASM imports
- Add ad4m_init lifecycle hook: called after WASM instantiation for DNA setup
- Add SDK bindings: holochain_install_app, holochain_remove_app, holochain_get_agent_key
- Add LanguageInit trait with default no-op init() method
- Generate ad4m_init export in ad4m_language! macro
- New p-diff-sync-wasm example: real Holochain-backed link language
  - Embeds 1.1MB Perspective-Diff-Sync .happ bundle via include_bytes!
  - Implements full LinksAdapter (sync, commit, render, current_revision, others)
  - Uses rmp-serde for msgpack serialization to match zome ABI
  - DNA installed via ad4m_init lifecycle hook
  - All zome calls proxied through holochain_call host function
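The `LanguageInit` hook with a default no-op `init()` can be pictured as follows. This is a guess at the shape based only on the description above: the error type, the `&mut self` receiver, and the example structs are assumptions, and the placeholder body does not call any real host function.

```rust
/// Hypothetical shape of the lifecycle trait: languages that need no
/// setup inherit the default no-op.
trait LanguageInit {
    fn init(&mut self) -> Result<(), String> {
        Ok(())
    }
}

/// A language with nothing to set up only needs an empty impl.
struct NoteStore;
impl LanguageInit for NoteStore {}

/// A Holochain-backed language overrides `init` to install its DNA.
struct DiffSync {
    app_id: Option<String>,
}

impl LanguageInit for DiffSync {
    fn init(&mut self) -> Result<(), String> {
        // Real code would ask the host to install the embedded .happ
        // bundle; this placeholder only records a made-up app id.
        self.app_id = Some("hypothetical-app-id".to_string());
        Ok(())
    }
}
```

Storing the app id at init time gives teardown something concrete to pass back to the host when the app is removed.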
- Fix SDK macro: ad4m_teardown was missing closing brace (ad4m_init nested inside)
- Make tokio Handle optional in HostEnv (Handle::try_current)
  - Allows WASM tests to run without tokio runtime
  - Host functions gracefully return null when no runtime available
- 17/17 WASM tests passing (p-diff-sync correctly fails without conductor)
- Compiled WASM: 1.4MB (1.1MB DNA + ~300KB code)
- Fix snapshot not being re-embedded (add cargo:rerun-if-changed to build.rs)
- Restore is_initialized() guard in agent_load() to prevent crash on fresh data
- Add install_wasm_language op to languages extension (JS + Rust)
- Add languageInstallWasm GQL mutation for WASM language installation
- Route expressionCreate/expressionRaw through WASM backend when applicable
- Fix misleading comments about host module namespace (env, not ad4m)

All 21 WASM unit tests passing.
Integration test: agent gen, perspective CRUD, WASM install, expression ops all working.
- Add app_data_path to LanguageController for Rust-native path resolution
- Implement install_language WASM detection: checks local bundle.wasm,
  then fetches from language language and detects base64-encoded WASM
  (AGFzbQ magic prefix), then falls back to JS install
- Add install_wasm_from_base64: decodes, verifies WASM magic, saves to
  languages dir, registers in WASM runtime
- Add publish_wasm_language: base64-encodes WASM binary, adds
  bundleType:wasm to meta, publishes via language language
- Add languagePublishWasm GQL mutation
- language_source query returns base64 WASM for WASM languages
- Integration test v4: 10/10 tests passing (install, expressions,
  source query, perspective links, publish, base64 detection, memory)
- 21/21 WASM unit tests passing
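The detection order described in this commit (raw WASM, then base64-encoded WASM, then JS) relies on two fixed markers: a WebAssembly module starts with the bytes `\0asm`, and the base64 encoding of a version 1 module header therefore starts with `AGFzbQ`. A minimal sketch of that classification, with `BundleKind` and `classify` as hypothetical names that are not taken from the PR:

```rust
/// WebAssembly modules begin with these four bytes ("\0asm").
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];
/// Prefix quoted in the commit message for base64-encoded WASM.
const WASM_BASE64_PREFIX: &str = "AGFzbQ";

#[derive(Debug, PartialEq)]
enum BundleKind {
    Wasm,
    Base64Wasm,
    Js,
}

fn is_wasm_binary(bytes: &[u8]) -> bool {
    bytes.starts_with(&WASM_MAGIC)
}

fn looks_like_base64_wasm(text: &str) -> bool {
    text.trim_start().starts_with(WASM_BASE64_PREFIX)
}

/// Hypothetical classifier: raw WASM first, then base64 WASM, and
/// anything else is treated as a JS bundle.
fn classify(source: &[u8]) -> BundleKind {
    if is_wasm_binary(source) {
        return BundleKind::Wasm;
    }
    match std::str::from_utf8(source) {
        Ok(text) if looks_like_base64_wasm(text) => BundleKind::Base64Wasm,
        _ => BundleKind::Js,
    }
}
```

A prefix match is only a cheap first filter; the commit notes that the decoded bytes are verified against the WASM magic again before the bundle is saved and registered.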
- Add LanguageInit impl to note-store and link-store examples (macro requires it)
- Add rustup default stable to container-based CI jobs (coasys/ad4m-ci-linux
  container lacks default toolchain)
- Fix p-diff-sync teardown to use stored app_id instead of agent DID
- Error on invalid meta JSON in publish_wasm_language instead of silent fallback
- Delete bundle files on WASM language removal
- Fix CI workflow: use github.head_ref for PR branch detection
The coasys/ad4m-ci-linux container was timing out (1h35m) on GitHub
Actions runners. Switch to installing deps directly — matches what
the WASM SDK job already does successfully.
bundle.js and CUSTOM_DENO_SNAPSHOT.bin are embedded at compile time
but only built by the JS build step. Create placeholder files so
cargo check can pass without a full JS build.
surrealdb-rocksdb takes 50+ min to compile from source on free runners.
The container image has it pre-built. Keep WASM SDK on bare runner (fast).
Bump timeout to 120min for container pull + compile.
@HexaField HexaField force-pushed the feat/wasm-language-runtime branch from bd50f7d to 151eb46 March 4, 2026 00:02
HexaField pushed a commit that referenced this pull request Mar 6, 2026
…rdcoding

Address review comments #3 and #4 from Nico:
- subscriptions.rs: When no explicit predicate is provided, load the
  SHACL class definition and derive the SurrealQL query from its
  property predicates (IN clause). No more hardcoded flux:// types.
- shacl.rs: Enrich ShaclClass with shape_uri and all_predicates fields,
  providing enough metadata to construct targeted queries without
  type-specific knowledge.
- Add load_class_properties_with_uri() to return shape URI alongside
  properties for richer metadata.
HexaField added a commit that referenced this pull request Apr 21, 2026
- Remove duplicate main.rs from rust-executor (Nico #1)
- Rename SparqlService → SparqlStore, move to perspectives/ (Nico #2)
- Remove oxigraph feature flag, make it a regular dependency (Nico #4)
- Extract duplicate SPARQL query logic in ModelQueryBuilder (Nico coasys#7)
- Replace all SurrealDB syntax with native SPARQL in:
  - decorators.ts: buildConformanceFilter generates SPARQL getters (Nico #2, James coasys#9)
  - query-utils.ts: buildWhereCondition generates SPARQL patterns (James coasys#11)
  - shacl-gen.ts: getter generation uses SPARQL (James coasys#13)
  - hydration.ts: remove convertGetterToSPARQL, use prepareGetterQuery
    with native SPARQL support + legacy fallback (Nico coasys#8, James coasys#10)
  - relation-filtering.test.ts: update expectations (James coasys#12)
  - model-getters.test.ts: update to SPARQL syntax (James coasys#14)
  - prolog-and-literals.test.ts: update to SPARQL syntax (James coasys#15)
1 participant