MetaMask · MajorLift · Jun 5, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
+- Add `platform` domain (extension errors/lifecycle debugging + architecture knowledge), `testing/benchmark-design`, `performance/browser-extension-profiling`, and `coding/resilient-api-collection`
+
 ## [0.1.0]
 
 ### Added

diff --git a/domains/coding/skills/resilient-api-collection/skill.md b/domains/coding/skills/resilient-api-collection/skill.md
@@ -0,0 +1,188 @@
+---
+name: resilient-api-collection
+description: Build resilient data collection scripts that paginate APIs, handle rate limits, and retry transient errors. Use when writing scrapers, API collectors, data pipelines, or any script that fetches paginated data from external APIs (GitHub GraphQL, REST APIs, etc.).
+---
+
+# Resilient API Collection Scripts
+
+## Core Architecture
+
+Every collection script needs these layers:
+
+```
+run_query()        → single request with retry + error classification
+fetch_all_pages()  → pagination loop with adaptive page sizing
+main()             → orchestration, dedup, persistence
+```
+
+## 1. Error Classification
+
+Classify errors **before** choosing a recovery strategy. Different errors need different fixes.
+
+| Error Type | Signal | Recovery |
+|---|---|---|
+| **Resource/complexity limit** | Query too expensive for server | Reduce page size |
+| **Rate limit** (primary) | 429, `X-RateLimit-Remaining: 0` | Wait until reset time |
+| **Rate limit** (secondary) | 403 + "secondary rate limit" | Exponential backoff (start 60s) |
+| **Transient server error** | 502, 503, 504, stream reset | Retry with exponential backoff |
+| **Client error** | 400, 401, 404 | Don't retry — fix the request |
+
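When the caller sees the raw HTTP response rather than CLI output, the table above could be expressed as a small classifier. This is a minimal sketch, not part of the skill itself: `ErrorKind` and `classify_http_error` are hypothetical names, the resource-limit branch assumes the server describes the problem in the response body, and any status not listed in the table is treated as non-retryable.

```python
from enum import Enum


class ErrorKind(Enum):
    RESOURCE_LIMIT = "reduce page size"
    PRIMARY_RATE_LIMIT = "wait until reset time"
    SECONDARY_RATE_LIMIT = "exponential backoff (start 60s)"
    TRANSIENT = "retry with exponential backoff"
    CLIENT = "do not retry; fix the request"


def classify_http_error(status, headers, body):
    """Map a failed response to one row of the table (hypothetical helper)."""
    text = body.lower()
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    remaining = normalized.get("x-ratelimit-remaining")

    if "resource limit" in text:
        return ErrorKind.RESOURCE_LIMIT
    if status == 403 and "secondary rate limit" in text:
        return ErrorKind.SECONDARY_RATE_LIMIT
    if status == 429 or remaining == "0":
        return ErrorKind.PRIMARY_RATE_LIMIT
    if status in (502, 503, 504):
        return ErrorKind.TRANSIENT
    return ErrorKind.CLIENT  # assumption: everything else is non-retryable


kind = classify_http_error(503, {}, "")
print(kind.name, "->", kind.value)
```

Checking the resource-limit text first keeps that case from being mistaken for a rate limit, since the two need different recoveries.
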
+### CLI tools hide error details
+
+Tools like `gh`, `curl`, `httpie` surface errors differently than raw HTTP responses:
+
+- **`gh api graphql`**: "Resource limits exceeded" appears in `stderr` (usually with a non-zero exit code), NOT in the JSON response `errors` array. Check `stderr` before checking `returncode`.
+- Rate limit info may be in response headers (not visible via CLI) or in error messages.
+
+```python
+# Check stderr BEFORE returncode — some errors are in stderr even on exit 0
+stderr = result.stderr.strip()
+
+if "resource limit" in stderr.lower():  # case-insensitive match covers "Resource limits"
+    return RESOURCE_LIMIT_SIGNAL  # caller reduces page size
+
+if result.returncode == 0:
+    data = json.loads(result.stdout)
+    # Also check JSON errors (some APIs put limits here)
+    if data.get("errors"):  # skip a missing or empty errors list
+        msg = data["errors"][0].get("message", "").lower()
+        if "resource limits" in msg or "timeout" in msg:
+            return RESOURCE_LIMIT_SIGNAL
+    return data
+
+# Classify non-zero exit
+is_transient = any(s in stderr for s in [
+    "502", "503", "504", "429", "rate limit",
+    "secondary", "stream error", "CANCEL"
+])
+```
+
+## 2. Retry with Exponential Backoff
+
+```python
+MAX_RETRIES = 5
+INITIAL_BACKOFF = 5  # seconds
+
+for attempt in range(1, MAX_RETRIES + 1):
+    result = execute_request(...)
+
+    if success:
+        return result
+    if is_resource_limit(error):
+        return RESOURCE_LIMIT_SIGNAL  # don't retry, reduce page size
+    if not is_transient(error):
+        return None  # permanent failure
+    if attempt == MAX_RETRIES:
+        return None  # exhausted
+
+    wait = INITIAL_BACKOFF * (2 ** (attempt - 1))
+    log(f"Transient error (attempt {attempt}/{MAX_RETRIES}), retrying in {wait}s")
+    time.sleep(wait)
+```
+
+Key: resource-limit errors should NOT be retried — the same query will fail identically. Signal the caller to reduce page size instead.
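
The loop above is a fragment. One way it might be completed into a runnable function is sketched below. The `(data, error)` return shape of `execute_request`, the two predicate callbacks, and the injectable `sleep` parameter are assumptions made for illustration; they are not defined by the skill.

```python
import time

MAX_RETRIES = 5
INITIAL_BACKOFF = 5  # seconds
RESOURCE_LIMIT_SIGNAL = object()  # sentinel: caller should reduce page size


def run_with_retry(execute_request, is_resource_limit, is_transient, sleep=time.sleep):
    """Retry execute_request() on transient errors only.

    execute_request is assumed to return (data, error), with error None on success.
    """
    for attempt in range(1, MAX_RETRIES + 1):
        data, error = execute_request()
        if error is None:
            return data
        if is_resource_limit(error):
            return RESOURCE_LIMIT_SIGNAL  # the same query would fail again
        if not is_transient(error) or attempt == MAX_RETRIES:
            return None  # permanent failure, or retries exhausted
        wait = INITIAL_BACKOFF * (2 ** (attempt - 1))  # 5, 10, 20, 40 seconds
        print(f"Transient error (attempt {attempt}/{MAX_RETRIES}), retrying in {wait}s")
        sleep(wait)
    return None


# Usage: a fake request that fails twice with a 502, then succeeds.
responses = iter([(None, "502"), (None, "502"), ({"ok": True}, None)])
waits = []
result = run_with_retry(
    lambda: next(responses),
    is_resource_limit=lambda err: "resource limit" in err.lower(),
    is_transient=lambda err: err in {"502", "503", "504"},
    sleep=waits.append,  # record the waits instead of sleeping
)
```

Passing `sleep` as a parameter lets the backoff schedule be checked without real delays.
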
+
+## 3. Adaptive Page Sizing
+
+Start conservatively. Halve on resource-limit errors. Set a floor.
+
+```python
+MIN_PAGE_SIZE = 5
+MAX_REDUCTIONS = 4
+page_size = 50  # not 100 — nested sub-selections multiply complexity
+reductions, has_more_pages = 0, True  # loop state, initialized before the first page
+while has_more_pages:
+    data = run_query(..., page_size=page_size)
+
+    if data == RESOURCE_LIMIT_SIGNAL:
+        reductions += 1
+        if reductions > MAX_REDUCTIONS or page_size <= MIN_PAGE_SIZE:
+            break  # can't go smaller
+        page_size = max(MIN_PAGE_SIZE, page_size // 2)
+        time.sleep(10)  # cool down before retry
+        continue  # retry same page with smaller size
+
+    # process nodes, advance cursor...
+    time.sleep(2)  # inter-page delay to avoid secondary rate limits
+```
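
For a self-contained view of the halving behaviour, the sketch below runs the same loop against a stand-in server. Everything about `fake_run_query` is invented for illustration (it rejects any page larger than 20 items), and the `nodes` / `has_next` / `end_cursor` response shape is an assumption rather than a real API contract.

```python
import time

RESOURCE_LIMIT_SIGNAL = object()  # sentinel returned when a query is too expensive
MIN_PAGE_SIZE = 5
MAX_REDUCTIONS = 4


def fetch_all_pages(run_query, page_size=50, sleep=time.sleep):
    """Collect every node, halving page_size when run_query signals a resource limit."""
    nodes, cursor, reductions = [], None, 0
    while True:
        data = run_query(cursor, page_size)
        if data is RESOURCE_LIMIT_SIGNAL:
            reductions += 1
            if reductions > MAX_REDUCTIONS or page_size <= MIN_PAGE_SIZE:
                break  # can't go smaller; return what was collected so far
            page_size = max(MIN_PAGE_SIZE, page_size // 2)
            sleep(10)  # cool down before retrying the same page
            continue
        nodes.extend(data["nodes"])
        if not data["has_next"]:
            break
        cursor = data["end_cursor"]
        sleep(2)  # inter-page delay
    return nodes, page_size


def fake_run_query(cursor, page_size, total=60, limit=20):
    """Stand-in server: rejects pages larger than `limit`, serves integers 0..total-1."""
    if page_size > limit:
        return RESOURCE_LIMIT_SIGNAL
    start = cursor or 0
    end = min(start + page_size, total)
    return {"nodes": list(range(start, end)), "has_next": end < total, "end_cursor": end}


# No-op sleep keeps the demonstration instant; real runs would use time.sleep.
nodes, final_size = fetch_all_pages(fake_run_query, sleep=lambda seconds: None)
```

Here the page size drops from 50 to 25 to 12 before the stand-in server accepts a page, and the cursor is only advanced after a successful page, so no items are skipped.
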
+
+### Why 50, not 100?
+
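+GraphQL query cost grows with `nodes × sub-selections`. A query fetching 100 PRs with `reviews(first:50)`, `participants(first:30)`, `commits(first:1)` requests up to 100 + 100 × (50 + 30 + 1) = 8,200 nodes. That is well under GitHub's 500K node limit, yet queries of this shape can still fail with resource-limit errors in practice. Starting at 50 avoids most of them.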
+
+## 4. Deduplication and Incremental Collection
+
+Always dedup by natural key before writing. This lets re-runs extend existing data.
+
+```python
+def dedup(existing, new, key_fn):
+    by_key = {}
+    for item in existing:
+        by_key[key_fn(item)] = item
+    for item in new:
+        by_key[key_fn(item)] = item  # new overwrites old
+    return list(by_key.values())
+
+# On write:
+existing = load_json(path) if os.path.exists(path) else []
+final = dedup(existing, new_items, key_fn=lambda x: (x["repo"], x["number"]))
+save_json(path, final)
+```
+
+## 5. Observability
+
+### Force unbuffered output
+
+Python block-buffers stdout when output is captured (subprocess, pipe, file redirect), so progress lines appear only when the buffer fills or the script exits.
+
+```python
+import sys
+sys.stdout.reconfigure(line_buffering=True)
+# OR run with: python3 -u script.py
+```
+
+### Log structure for monitoring
+
+```
+=== repo-name (query-type) ===
+  Page 1: 50 nodes, hasNext=True (size=50)
+  Page 2: 50 nodes, hasNext=True (size=50)
+  Resource limit exceeded (page_size=50), signaling page-size reduction
+  Reducing page size to 25 and retrying page 3 (reduction 1/4)
+  Page 3: 25 nodes, hasNext=True (size=25)
+  ...
+  Total: 430 items, collected 430, 3169 sub-items
+```
+
+Every per-page log line should include: page number, items returned, whether there are more pages, and current page size.
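
A per-page line in the layout shown above could come from a one-line formatter. The function name `page_log_line` is hypothetical; the field order simply mirrors the sample log.

```python
def page_log_line(page, node_count, has_next, page_size):
    """Format one per-page progress line in the layout of the sample log."""
    return f"  Page {page}: {node_count} nodes, hasNext={has_next} (size={page_size})"


# flush=True makes the line visible immediately even when stdout is piped.
print(page_log_line(3, 25, True, 25), flush=True)
```
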
+
+## 6. Inter-Page Delays
+
+GitHub's secondary rate limit triggers on sustained request volume, not individual request cost. Add 2-3s between pages.
+
+```python
+PAGE_DELAY = 2  # seconds
+
+# After each successful page:
+time.sleep(PAGE_DELAY)
+
+# After a resource-limit reduction:
+time.sleep(INITIAL_BACKOFF * 2)  # longer cooldown
+```
+
+## Checklist
+
+When writing a collection script, verify:
+
+- [ ] Error classification distinguishes resource-limit from rate-limit from transient
+- [ ] Resource-limit errors reduce page size (not retry same query)
+- [ ] Transient errors retry with exponential backoff
+- [ ] Non-retryable errors fail fast
+- [ ] Page size starts at 50 or lower for nested queries
+- [ ] Page size has a floor (5-10) and max-reduction cap
+- [ ] Inter-page delay prevents secondary rate limits
+- [ ] Output is unbuffered (`-u` flag or `reconfigure`)
+- [ ] Each log line includes page number, count, hasNext, page size
+- [ ] Data is deduped by natural key before writing
+- [ ] Re-runs merge with existing data (incremental collection)
+- [ ] Collection log records run metadata (timestamps, repos, filters)
diff --git a/domains/performance/skills/browser-extension-profiling/skill.md b/domains/performance/skills/browser-extension-profiling/skill.md
@@ -0,0 +1,65 @@
+---
+maturity: experimental
+name: browser-extension-profiling
+description: Compare browser extension performance between branches using WDYR, React DevTools Profiler, and E2E benchmarks with statistical rigor.
+---
+
+# Browser Extension Profiling
+
+Methodology for profiling and comparing extension performance across branches or commits.
+
+## When To Use
+
+- Validating that a refactor reduces unnecessary re-renders (needs before/after comparison)
+- Establishing baseline metrics for a performance initiative
+- Investigating a reported UI slowdown in the extension
+
+## Do Not Use When
+
+- Only a single run per branch is available; a meaningful comparison needs ≥10 runs per scenario
+- The change touches only non-render paths (background scripts, network with no UI impact)
+- Target behavior is server-side latency, not UI rendering
+
+## Workflow
+
+1. **Build both branches** with `yarn build:test` on the same machine and Chrome version
+
+2. **WDYR profiling** (unnecessary re-render counts)
+   ```bash
+   ENABLE_WHY_DID_YOU_RENDER=true yarn start
+   ```
+   Flags to watch:
+   - `different objects that are equal by value` → object recreation
+   - `different functions with the same name` → callback recreation
+   - `props object itself changed but values equal` → parent cascade
+
+3. **React DevTools Profiler** for flame graphs and commit timings
+   ```bash
+   yarn devtools:react
+   ```
+
+4. **E2E benchmarks** for scenario durations
+   ```bash
+   yarn test:e2e:benchmark
+   ```
+
+5. **Collect ≥10 runs** per scenario. Discard top/bottom 10%. Report mean, median, stddev, p75, p95.
+
+6. **Effect-size threshold:** treat |Cohen's d| > 0.5 (a medium effect by Cohen's convention) as a meaningful difference.
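
Steps 5 and 6 can be sketched with the Python standard library alone. The helper names are hypothetical, the percentile uses linear interpolation between ranks (other definitions give slightly different values), Cohen's d uses the pooled sample standard deviation, and the durations in the usage lines are made-up numbers rather than measurements.

```python
import statistics


def trimmed(samples, fraction=0.10):
    """Drop the lowest and highest `fraction` of the samples."""
    ordered = sorted(samples)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k] if k else ordered


def percentile(samples, p):
    """Percentile by linear interpolation between the closest ranks."""
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(samples):
    """Trim outliers, then report the statistics listed in step 5."""
    kept = trimmed(samples)
    return {
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "stddev": statistics.stdev(kept),
        "p75": percentile(kept, 75),
        "p95": percentile(kept, 95),
    }


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


# Illustrative durations in ms (invented for this example).
baseline = [120, 118, 125, 122, 119, 121, 130, 117, 123, 124]
candidate = [110, 112, 108, 115, 111, 109, 118, 107, 113, 114]
d = cohens_d(trimmed(baseline), trimmed(candidate))
meaningful = abs(d) > 0.5
```

The sign of `d` only reflects argument order, which is why the threshold is applied to its absolute value.
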
+
+## Common Pitfalls
+
+| Mistake | Correct Approach |
+|---------|-----------------|
+| Running branches on different machines or Chrome versions | Same machine, same Chrome, no other apps running |
+| Pooling all runs including noisy late-session ones | Compute per-round stats first; report cleanest signal with explicit round attribution |
+| Reporting absolute re-render counts without scenario context | Normalize per-action; cascade fixes show multiplied impact at root |
+| Skipping cache and state reset between runs | Clear browser cache, reset extension state for each run |
+
+## Pre-Profiling Checklist
+
+- [ ] Both branches built with `yarn build:test`
+- [ ] Same machine, same Chrome version
+- [ ] No other tabs or applications running
+- [ ] WDYR enabled: `ENABLE_WHY_DID_YOU_RENDER=true`
+- [ ] Cache and extension state cleared between runs
diff --git a/domains/platform/knowledge/extension-architecture.md b/domains/platform/knowledge/extension-architecture.md
@@ -0,0 +1,89 @@
+---
+name: extension-architecture
+domain: platform
+description: MetaMask extension — background/UI boundary, state sync, build types, key directories
+---
+
+# Extension Architecture
+
+## Background / UI Boundary
+
+The extension runs two separate JavaScript contexts that cannot share memory.
+
+| Context | Entry | Access |
+|---------|-------|--------|
+| Background (Service Worker / background page) | `app/scripts/` | DOM-less; controllers, wallet logic |
+| UI (popup/tab) | `ui/` | React + Redux; rendering only |
+| Shared | `shared/` | Constants, utilities, type definitions |
+
+Communication is message-based (Chrome runtime messaging). Code in `app/scripts/` cannot `import` from `ui/` and vice versa.
+
+## State Sync Flow
+
+```
+Controller state changes (app/scripts/)
+    ↓
+metamask-controller.js batches via debounce (200ms)
+    ↓
+UI receives batched state via sendUpdate
+    ↓
+Redux dispatches UPDATE_METAMASK_STATE
+    ↓
+Immer applies patches (structural sharing — unchanged paths keep stable references)
+    ↓
+useSelector evaluates; components re-render if output changed
+```
+
+Key file: `app/scripts/metamask-controller.js` — aggregates all controller state.
+
+## Build Types
+
+| Build | Command | Bundler | Security Policy |
+|-------|---------|------------|-----------------|
+| Development | `yarn start` | Webpack, hot reload | No LavaMoat |
+| Production | `yarn dist` | Browserify | LavaMoat enforced |
+| Test | `yarn build:test` | Browserify | Partial LavaMoat |
+
+LavaMoat restricts package capabilities at runtime. After adding/updating dependencies, run `yarn lavamoat:auto` to regenerate policies.
+
+## Manifest Versions
+
+| Version | Background | Lifecycle |
+|---------|------------|-----------|
+| MV3 (Chrome) | Service Worker | Can terminate and restart |
+| MV2 (Firefox) | Background Page | Always running |
+
+When an error is concentrated in MV3 (99%+ of occurrences), the root cause is most likely the service worker lifecycle, not application logic.
+
+## Key Directories
+
+```
+app/scripts/
+├── controllers/              # Feature controllers (one per domain)
+├── lib/                      # Background utilities
+└── metamask-controller.js    # Main aggregator; 200ms debounce
+
+ui/
+├── components/               # Reusable React components
+├── pages/                    # Page-level components
+│   ├── routes/               # routes.component.tsx (high selector count)
+│   └── home/                 # home.container.js (legacy connect())
+├── ducks/                    # Redux slices
+├── selectors/                # All selectors
+│   ├── selectors.js          # Main file (~2500 lines)
+│   └── <feature>.ts          # Feature-specific selectors
+└── contexts/                 # React Context providers
+
+shared/
+├── constants/
+├── lib/
+└── modules/
+    └── selectors/
+        └── selector-creators.ts
+```
+
+## React Compiler Scope
+
+Enabled for `ui/components`, `ui/contexts`, `ui/hooks`, `ui/layouts`, `ui/pages`.
+
+The compiler does NOT optimize across file boundaries, so selector values from `useSelector` still require manual `useMemo`.