27 commits
29d35c1
Add memory profiling & leak investigation results
HexaField Feb 21, 2026
e3b2224
Add memory leak analysis and refactoring plan
HexaField Feb 21, 2026
7ee9a03
fix: Implement memory leak fixes for perspective teardown
HexaField Feb 21, 2026
8eeddde
docs: Update profiling results with Holochain conductor leak findings
HexaField Feb 21, 2026
8b4208b
cleanup: Remove dead ref counting code, simplify SurrealDB shutdown
HexaField Feb 21, 2026
18c4273
feat: WASM-based language execution runtime
HexaField Feb 20, 2026
38199bf
ci: add exploration CI workflow for fork branches
HexaField Feb 20, 2026
a2caaf5
refactor: add LanguageBackend trait for dual JS/WASM language backends
HexaField Feb 22, 2026
d248a32
fix: build fixes for WASM language runtime + LanguageBackend trait
HexaField Feb 22, 2026
7c491c9
feat: implement LinksAdapter for WASM languages
HexaField Feb 22, 2026
38d054f
feat: link-store WASM language + LinksAdapter integration tests (19/1…
HexaField Feb 22, 2026
a83eb7c
feat: wire host_hc_call to Holochain conductor via HolochainServiceIn…
HexaField Feb 23, 2026
b3dbb30
feat(wasm): add Holochain DNA install/remove/agent_key host functions…
HexaField Feb 23, 2026
127767d
feat(wasm): p-diff-sync WASM link language with embedded Holochain DNA
HexaField Feb 23, 2026
6ff03d9
fix: WASM language runtime integration
HexaField Feb 23, 2026
ef873db
feat: full WASM language discovery/download flow
HexaField Feb 23, 2026
58ad3fb
fix: CI failures — add LanguageInit impl, fix rustup in container jobs
HexaField Feb 23, 2026
916cb1d
fix: address CodeRabbit review feedback
HexaField Feb 23, 2026
0df412f
ci: drop container image, use dtolnay/rust-toolchain + setup-go
HexaField Feb 23, 2026
20be4d6
ci: add libasound2-dev for alsa-sys
HexaField Feb 23, 2026
a867f3a
ci: increase timeout to 60min for cold compiles
HexaField Feb 23, 2026
c085a56
ci: bump timeout to 90min — cold compile needs it, cache will warm af…
HexaField Feb 23, 2026
15fb43a
ci: add JS build placeholders for include_str!/include_bytes!
HexaField Feb 23, 2026
efc297a
ci: restore libasound2-dev (lost in previous rewrite)
HexaField Feb 23, 2026
43c7097
ci: add dapp/dist placeholder dir for include_dir! macro
HexaField Feb 23, 2026
151eb46
ci: restore container image for Cargo Check + Rust Tests
HexaField Feb 23, 2026
bb0ec13
chore: trigger PR sync after rebase
HexaField Mar 4, 2026
101 changes: 101 additions & 0 deletions .github/workflows/exploration-ci.yml
@@ -0,0 +1,101 @@
name: Exploration CI

on:
  push:
    branches:
      - feat/wasm-language-runtime
      - feat/sqlite-link-storage
  pull_request:
    branches: [dev]

jobs:
  wasm-sdk:
    name: WASM SDK & Example
    if: contains(github.head_ref || github.ref, 'wasm-language-runtime')
    runs-on: ubuntu-22.04
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust + wasm32 target
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: wasm32-unknown-unknown

      - name: Build SDK
        run: cd wasm-language-sdk && cargo build --target wasm32-unknown-unknown

      - name: Build example note-store
        run: cd examples/wasm-languages/note-store && cargo build --release --target wasm32-unknown-unknown

      - name: Build example link-store
        run: cd examples/wasm-languages/link-store && cargo build --release --target wasm32-unknown-unknown

      - name: Verify WASM exports
        run: |
          sudo apt-get update && sudo apt-get install -y wabt || true
          for wasm in examples/wasm-languages/*/target/wasm32-unknown-unknown/release/*.wasm; do
            echo "=== $wasm ==="
            wasm-objdump -x "$wasm" 2>/dev/null | grep -E "ad4m_" || echo "(wabt not available)"
          done

  cargo-check:
    name: Cargo Check
    runs-on: ubuntu-22.04
    container:
      image: coasys/ad4m-ci-linux:latest@sha256:3d6e8b6357224d689345eebd5f9da49ee5fd494b3fd976273d0cf5528f6903ab
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        run: rustup default stable && rustc --version

      - name: Create JS build placeholders
        run: |
          mkdir -p executor/lib
          echo "// placeholder" > executor/lib/bundle.js
          dd if=/dev/zero bs=1 count=64 of=rust-executor/CUSTOM_DENO_SNAPSHOT.bin 2>/dev/null
          mkdir -p rust-executor/dapp/dist

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: rust-executor
          cache-on-failure: true

      - name: Check default features
        run: cd rust-executor && cargo check 2>&1

      - name: Check wasm-languages feature
        if: contains(github.head_ref || github.ref, 'wasm-language-runtime')
        run: cd rust-executor && cargo check --features wasm-languages 2>&1

  rust-tests:
    name: Rust Tests
    if: contains(github.head_ref || github.ref, 'wasm-language-runtime')
    runs-on: ubuntu-22.04
    container:
      image: coasys/ad4m-ci-linux:latest@sha256:3d6e8b6357224d689345eebd5f9da49ee5fd494b3fd976273d0cf5528f6903ab
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        run: rustup default stable && rustc --version

      - name: Create JS build placeholders
        run: |
          mkdir -p executor/lib
          echo "// placeholder" > executor/lib/bundle.js
          dd if=/dev/zero bs=1 count=64 of=rust-executor/CUSTOM_DENO_SNAPSHOT.bin 2>/dev/null
          mkdir -p rust-executor/dapp/dist

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: rust-executor
          cache-on-failure: true

      - name: Run wasm_core tests
        run: cd rust-executor && cargo test wasm_core --features wasm-languages -- --nocapture 2>&1
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -22,4 +22,4 @@ members = [
#kitsune2_transport_iroh = { git = "https://github.com/lucksus/kitsune2.git", branch = "debug-logs" }
#kitsune2_transport_iroh = { path = "../../kitsune2/crates/transport_iroh" }
#kitsune2_bootstrap_client = { git = "https://github.com/lucksus/kitsune2.git", branch = "debug-logs" }
#kitsune2_bootstrap_client = { path = "../../kitsune2/crates/bootstrap_client" }
#kitsune2_bootstrap_client = { path = "../../kitsune2/crates/bootstrap_client" }
1 change: 1 addition & 0 deletions cli/Cargo.toml
@@ -25,6 +25,7 @@ path = "src/ad4m_executor.rs"
# Pass metal and cuda features through to ad4m-executor
metal = ["ad4m-executor/metal"]
cuda = ["ad4m-executor/cuda"]
wasm-languages = ["ad4m-executor/wasm-languages"]

[dependencies]
ad4m-client = { path = "../rust-client", version="0.12.0-rc1-dev.2" }
69 changes: 69 additions & 0 deletions docs/profiling/README.md
@@ -0,0 +1,69 @@
# AD4M Memory Profiling & Leak Investigation

Profiling of the AD4M executor's memory usage during neighbourhood operations, and investigation of memory leaks during resource lifecycle (create/destroy cycles).

## Results

- **[Profiling Results](profiling-results-2026-02-21.md)** — Baseline memory measurements, per-neighbourhood growth (~140 MB each), scaling projections
- **[Leak Investigation](leak-investigation-2026-02-21.md)** — Memory recovery tests showing 0% memory freed on neighbourhood/perspective teardown

## Key Findings

### Root Cause: Holochain Conductor Memory Retention

When a neighbourhood is created, the executor clones a link language, installs it as a Holochain app, and allocates ~140 MB of anonymous mmap'd memory (Wasmer WASM pages + LMDB environments). When the neighbourhood is removed:

1. **AD4M-layer cleanup works correctly** — SurrealDB databases shut down, signal streams removed, languages cleaned up, Holochain apps uninstalled via `uninstall_app`
2. **Holochain conductor does not release memory** — anonymous mmap'd regions persist, large allocation count remains unchanged, RSS shows 0.0% recovery even after 60s settling

This was confirmed by comparing an unpatched binary (no cleanup) against a patched binary (full teardown): both show identical 0% memory recovery, which places the leak below the AD4M layer, in the Holochain conductor's Wasmer/LMDB memory management.

### Comparison: Original vs Patched Binary

| Metric | Original | Patched |
|--------|----------|---------|
| Post-init RSS | 747 MB | 768 MB |
| 3 NHs + 50 links each | 1201 MB (+428) | 1224 MB (+430) |
| After removing NHs (60s settle) | 1201 MB (0.0% recovery) | 1224 MB (0.0% recovery) |
| Large anon mappings: before/create/remove | 25/50/50 | 25/53/52 |
| Teardown logs firing | ❌ None | ✅ Full cleanup |
| Language cloning cost | 9.4 MB/clone | 4.6 MB/clone |

### Additional Findings

1. **Bare perspectives leak ~2.6 MB each** on create/remove cycle (both binaries).
2. **Language cloning cost halved** with the patch (9.4 → 4.6 MB/clone).
3. **Snapshot queries do not leak** — 100 queries add <1 MB.
4. **Link accumulation** — 300 links in a single neighbourhood adds ~30 MB.

## Reproduction

### Prerequisites
- Ubuntu 22.04 (tested on x86_64, 32 GB RAM)
- AD4M executor binary (v0.11.1 or from this branch)
- Node.js 18+
- Bootstrap languages published or available as seed

### Running the Leak Investigation

```bash
# From the ad4m/tests/js directory
node ../../docs/profiling/leak-investigation.mjs
```

The script:
1. Starts the executor with a prepared seed
2. Runs 5 test phases: bare perspective cycles, neighbourhood create/remove, language cloning, link accumulation, and snapshot query stress
3. Measures RSS via `/proc/<pid>/smaps_rollup` with detailed memory breakdowns
4. Outputs per-test deltas and recovery rates
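
The RSS measurement in step 3 can be sketched as follows. This is a minimal illustration in Python, not the code in `leak-investigation.mjs`: the function names are hypothetical, and the `SAMPLE` text is made-up input in the `smaps_rollup` layout, not a measurement from this investigation.

```python
from pathlib import Path


def parse_smaps_rollup(text: str) -> dict:
    """Return field name -> size in kB for each 'Name: <n> kB' line."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Field lines look like "Rss:   123456 kB". The first line of the
        # file (address range ... [rollup]) has more columns and is skipped.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    return fields


def read_rss_mb(pid: int) -> float:
    """Read the resident set size of a live process in MB (Linux only)."""
    text = Path(f"/proc/{pid}/smaps_rollup").read_text()
    return parse_smaps_rollup(text)["Rss"] / 1024


# Illustrative input only; the numbers are invented for the example.
SAMPLE = """\
55d0c0000000-7ffd1a3f2000 ---p 00000000 00:00 0                          [rollup]
Rss:              786432 kB
Pss:              780000 kB
Anonymous:        700416 kB
"""

if __name__ == "__main__":
    parsed = parse_smaps_rollup(SAMPLE)
    print(f"RSS: {parsed['Rss'] / 1024:.1f} MB")
    print(f"Anonymous: {parsed['Anonymous'] / 1024:.1f} MB")
```

Calling `read_rss_mb(pid)` before and after each test phase gives the per-test deltas; the `Anonymous` field is the one most relevant to the mmap'd regions discussed above.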

### Code Fixes (this branch)

The `fix: Implement memory leak fixes` commit adds:
- **Perspective teardown** — proper cleanup of Prolog pools, SurrealDB, link languages, subscribed queries, batch stores
- **Language removal** — Rust LanguageController calls JS `languageRemove()` during teardown
- **Signal stream cleanup** — removes Holochain signal callbacks on language removal
- **Language reference counting** — tracks usage to prevent premature removal
- **SurrealDB shutdown** — drops perspective databases on teardown

These fixes are necessary but not sufficient — the Holochain conductor memory retention remains an upstream issue.
133 changes: 133 additions & 0 deletions docs/profiling/leak-investigation-2026-02-21.md
@@ -0,0 +1,133 @@
# AD4M Executor Memory Leak Investigation — 2026-02-21

## Setup
- Ubuntu 22.04, x86_64, 32 GB RAM
- AD4M v0.11.1 executor, Holochain 0.7.0-dev.10-coasys
- Single agent, local bootstrap, no proxy
- Measurement: `/proc/<pid>/smaps` RSS/PSS + anonymous mapping counts

---

## Finding 1: Neighbourhood teardown releases ZERO memory

**This is the critical issue.**

Created 3 neighbourhoods (each with perspective-diff-sync clone + 50 links), then removed all 3 perspectives:

| State | RSS (MB) | Anonymous (MB) | Large anon mappings |
|-------|----------|----------------|---------------------|
| Baseline (post-init) | 797.1 | — | 26 |
| After 3 neighbourhoods + 50 links each | 1212.9 | 1037.5 | 51 |
| After removing all 3 perspectives (30s settle) | 1213.2 | 1037.7 | 51 |

**Recovery: -0.2 MB of 415.9 MB (0%)**
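
As a worked check, the recovery figure can be recomputed from the RSS column of the table. The helper below is a hypothetical sketch, not part of the investigation script. With the rounded table values it gives about -0.3 MB of 415.8 MB, which differs from the quoted figures by 0.1 MB, presumably because the quoted numbers were computed before rounding.

```python
def recovery(baseline_mb: float, peak_mb: float, after_mb: float):
    """Return (growth, recovered, recovered_percent) for one create/remove cycle.

    growth    = memory added between baseline and peak
    recovered = memory given back after teardown (negative means RSS grew)
    """
    growth = peak_mb - baseline_mb
    recovered = peak_mb - after_mb
    percent = 100.0 * recovered / growth if growth else 0.0
    return growth, recovered, percent


if __name__ == "__main__":
    # RSS values taken from the table: baseline, after 3 NHs, after removal.
    growth, recovered, percent = recovery(797.1, 1212.9, 1213.2)
    print(f"growth={growth:.1f} MB recovered={recovered:.1f} MB ({percent:.1f}%)")
```

A fully effective teardown would return a value near 100%; anything near zero or negative means the cycle's memory was retained.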

The anonymous mapping count stays at 51 after removal: neighbourhood operations created 25 new large (>10 MB) anonymous RW mappings and **none were released**. Disk usage is also unchanged (134 MB in `ad4m/h/`).

**Root cause:** `perspectiveRemove` removes the perspective from the AD4M layer but does NOT:
- Uninstall the cloned Holochain hApp
- Deallocate Wasmer WASM linear memory for the cloned language
- Clean up the language from the LanguageController
- Remove Holochain conductor cell state

Each neighbourhood creates a dedicated Holochain hApp instance with its own WASM runtime (~78 MB anonymous memory). Removing the perspective leaves these resources permanently allocated.

---

## Finding 2: Bare perspective lifecycle also leaks

Created and removed 10 plain perspectives (no neighbourhood, no link language):

| State | RSS (MB) |
|-------|----------|
| Baseline | 772.6 |
| After creating 10 perspectives | 796.3 |
| After removing all 10 perspectives | 797.1 |

**Leaked: 24.4 MB** — 2.4 MB per perspective that's never recovered. This is likely SurrealDB/Prolog state and JS runtime objects not being cleaned up on perspective removal.

---

## Finding 3: Language cloning accumulates permanently

Cloned perspective-diff-sync 10 times (template + publish) without creating any neighbourhoods:

| State | RSS (MB) |
|-------|----------|
| Baseline | 1213.2 |
| After 5 clones | 1238.1 |
| After 10 clones | 1255.4 |

**~4.2 MB per clone.** Each `languageApplyTemplateAndPublish` call:
- Unpacks/repacks hApp DNA
- Writes a new `bundle.js` to the data directory (8 language directories for 10 clones — some deduplication)
- Publishes the meta to the language-language
- Does NOT unload the cloned language even if it's never used for a perspective

Disk: 7.5 MB in `ad4m/languages/`, temp directory cleaned (4 KB).

---

## Finding 4: Link accumulation within a neighbourhood is modest

500 links added to a single neighbourhood in batches of 100:

| Links | RSS (MB) | Δ from 0 links |
|-------|----------|-----------------|
| 0 (neighbourhood just created) | 1252.8 | — |
| 100 | 1285.9 | +33.1 |
| 200 | 1288.5 | +35.7 |
| 300 | 1291.4 | +38.6 |
| 400 | 1312.8 | +60.0 |
| 500 | 1315.6 | +62.8 |

Growth rate: ~0.13 MB per link — sub-linear, with step jumps (likely page allocation boundaries). This is reasonable.
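
The step pattern is easier to see when the table is broken into per-batch increments. The sketch below uses the RSS values from the table above; the function names are illustrative only.

```python
# (link count, RSS in MB) pairs copied from the table above.
RSS_BY_LINKS = [
    (0, 1252.8),
    (100, 1285.9),
    (200, 1288.5),
    (300, 1291.4),
    (400, 1312.8),
    (500, 1315.6),
]


def marginal_growth(samples):
    """Return (link count, MB per link) for each consecutive batch."""
    result = []
    for (n0, rss0), (n1, rss1) in zip(samples, samples[1:]):
        result.append((n1, (rss1 - rss0) / (n1 - n0)))
    return result


def average_growth(samples):
    """Return the overall MB per link between the first and last sample."""
    (n0, rss0), (n1, rss1) = samples[0], samples[-1]
    return (rss1 - rss0) / (n1 - n0)


if __name__ == "__main__":
    for links, mb_per_link in marginal_growth(RSS_BY_LINKS):
        print(f"up to {links} links: {mb_per_link:.3f} MB/link")
    print(f"average: {average_growth(RSS_BY_LINKS):.3f} MB/link")
```

Two batches (0 to 100 and 300 to 400) account for most of the growth, while the other three add roughly 0.03 MB per link each, which is consistent with the step-jump reading.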

Querying all 500 links added negligible memory (+0.1 MB). Link removal via GQL mutations failed (schema issue with `perspectiveRemoveLink`) so we couldn't test link cleanup, but the add pattern itself isn't concerning.

---

## Finding 5: WASM virtual memory reservation is extreme

From `/proc/<pid>/maps` analysis:

| State | Large anon RW mappings (>10MB) | Total anon RW virtual |
|-------|-------------------------------|----------------------|
| Post-init | 26 | 1008 MB |
| 3 neighbourhoods | 51 | 1740 MB |
| After removing perspectives | 51 | 1738 MB |
| 5 neighbourhoods (test 4) | 52 | 1919 MB |

Each Holochain hApp instance creates approximately 1 large anonymous mapping. These are Wasmer WASM linear memory regions — they reserve large virtual address space and commit physical pages as the WASM module runs. They are **never unmapped**.
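
One way to count such mappings is sketched below. It assumes that "large anon RW" means a line in `/proc/<pid>/maps` with `rw` permissions, no pathname (so `[heap]` and `[stack]` are excluded), and a size above 10 MB; the exact filter used for the table is not stated here, so treat these criteria and the sample input as assumptions.

```python
LARGE_BYTES = 10 * 1024 * 1024  # the >10 MB threshold used in the table


def count_large_anon_rw(maps_text: str, min_bytes: int = LARGE_BYTES):
    """Return (count, total_bytes) of large anonymous read-write mappings."""
    count = 0
    total = 0
    for line in maps_text.splitlines():
        # Columns: address range, perms, offset, dev, inode, [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) < 5:
            continue
        addr, perms = parts[0], parts[1]
        pathname = parts[5].strip() if len(parts) > 5 else ""
        if pathname or not perms.startswith("rw"):
            continue  # file-backed, named, or not read-write
        start, end = (int(x, 16) for x in addr.split("-"))
        size = end - start
        if size > min_bytes:
            count += 1
            total += size
    return count, total


# Illustrative input only; these lines are invented for the example.
SAMPLE_MAPS = """\
7f0000000000-7f0004000000 rw-p 00000000 00:00 0
7f0010000000-7f0010100000 rw-p 00000000 00:00 0
7f0020000000-7f0024000000 r--p 00000000 00:00 0
7f0030000000-7f0031000000 rw-p 00000000 08:01 1234 /usr/lib/libfoo.so
55d0c0000000-55d0c2000000 rw-p 00000000 00:00 0 [heap]
"""

if __name__ == "__main__":
    count, total = count_large_anon_rw(SAMPLE_MAPS)
    print(f"{count} large anon RW mappings, {total / 2**20:.0f} MiB virtual")
```

Running the same count before creation, after creation, and after removal gives the three columns compared in the table. The total is virtual address space, not resident memory.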

---

## Summary of Leaks

| Source | Leaked per unit | Recoverable? | Severity |
|--------|----------------|---------------|----------|
| Neighbourhood create/remove cycle | ~138 MB per NH | ❌ No | **Critical** |
| Bare perspective create/remove | ~2.4 MB per perspective | ❌ No | Medium |
| Language cloning (template+publish) | ~4.2 MB per clone | ❌ No | Medium |
| Link accumulation | ~0.13 MB per link | N/A (grows, not a leak) | Low |

## Recommended Fixes

### Critical: Holochain hApp lifecycle management
When a perspective is removed (especially one backed by a neighbourhood):
1. **Uninstall the Holochain hApp** — call the conductor admin API to disable/uninstall the cell
2. **Unload the language** — remove the JS language module from the LanguageController
3. **Free WASM memory** — ensure Wasmer instances are dropped so anonymous mappings can be reclaimed
4. **Clean up disk** — remove the cloned language bundle and Holochain cell state

### Medium: Perspective cleanup
- Audit what SurrealDB/Prolog state is created per perspective and ensure it's cleaned up on removal
- Check for JS event listener leaks on perspective objects

### Medium: Language deduplication
- Consider caching compiled WASM modules across languages that share the same DNA
- Share Holochain conductor cells where the DNA hash is identical (template parameters permitting)

### Architecture consideration
- The current model where each neighbourhood = its own hApp instance with dedicated WASM runtime is fundamentally expensive (~78 MB per NH)
- Consider a shared-conductor approach where multiple neighbourhoods can share a single Holochain cell with namespace isolation, reducing the per-NH overhead from ~78 MB to potentially single-digit MB