4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Add `platform` domain (extension errors/lifecycle debugging + architecture knowledge), `testing/benchmark-design`, `performance/browser-extension-profiling`, and `coding/resilient-api-collection`

## [0.1.0]

### Added
188 changes: 188 additions & 0 deletions domains/coding/skills/resilient-api-collection/skill.md
@@ -0,0 +1,188 @@
---
name: resilient-api-collection
description: Build resilient data collection scripts that paginate APIs, handle rate limits, and retry transient errors. Use when writing scrapers, API collectors, data pipelines, or any script that fetches paginated data from external APIs (GitHub GraphQL, REST APIs, etc.).
---

# Resilient API Collection Scripts

## Core Architecture

Every collection script needs these layers:

```
run_query() → single request with retry + error classification
fetch_all_pages() → pagination loop with adaptive page sizing
main() → orchestration, dedup, persistence
```

## 1. Error Classification

Classify errors **before** choosing a recovery strategy. Different errors need different fixes.

| Error Type | Signal | Recovery |
|---|---|---|
| **Resource/complexity limit** | Query too expensive for server | Reduce page size |
| **Rate limit** (primary) | 429, `X-RateLimit-Remaining: 0` | Wait until reset time |
| **Rate limit** (secondary) | 403 + "secondary rate limit" | Exponential backoff (start 60s) |
| **Transient server error** | 502, 503, 504, stream reset | Retry with exponential backoff |
| **Client error** | 400, 401, 404 | Don't retry — fix the request |
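
The table above can be encoded as a small classifier. The sketch below is illustrative only: the `Recovery` enum, the `classify_http_error` name, and the exact matching rules are assumptions, not part of any library, and real APIs may signal these conditions differently.

```python
from enum import Enum


class Recovery(Enum):
    REDUCE_PAGE_SIZE = "reduce page size"
    WAIT_FOR_RESET = "wait until reset time"
    BACKOFF_LONG = "exponential backoff starting at 60s"
    RETRY_BACKOFF = "retry with exponential backoff"
    FAIL_FAST = "do not retry"


def classify_http_error(status, headers=None, body=""):
    """Map one failed response to a recovery strategy (hypothetical helper)."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    text = body.lower()

    # Resource/complexity limit: the query itself is too expensive.
    if "resource limits" in text:
        return Recovery.REDUCE_PAGE_SIZE
    # Secondary rate limit: 403 plus a telltale message.
    if status == 403 and "secondary rate limit" in text:
        return Recovery.BACKOFF_LONG
    # Primary rate limit: 429 or an exhausted quota header.
    if status == 429 or headers.get("x-ratelimit-remaining") == "0":
        return Recovery.WAIT_FOR_RESET
    # Transient server errors.
    if status in (502, 503, 504):
        return Recovery.RETRY_BACKOFF
    # Everything else (400, 401, 404, ...) needs a fixed request.
    return Recovery.FAIL_FAST
```

The secondary-limit check runs before the primary one so that a 403 carrying both signals gets the longer backoff.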

### CLI tools hide error details

Tools like `gh`, `curl`, `httpie` surface errors differently than raw HTTP responses:

- **`gh api graphql`**: "Resource limits exceeded" appears in `stderr` with non-zero exit code, NOT in the JSON response `errors` array. Always check `stderr` first, before checking `returncode`.
- Rate limit info may be in response headers (not visible via CLI) or in error messages.

```python
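import json

# Sentinel values are illustrative; any distinct markers work.
RESOURCE_LIMIT_SIGNAL = "resource_limit"  # caller reduces page size
TRANSIENT_SIGNAL = "transient"            # caller retries with backoff

def classify_result(result):
    """Classify a finished subprocess.run(..., capture_output=True, text=True) call."""
    # Check stderr BEFORE returncode: some errors are in stderr even on exit 0
    stderr = result.stderr.strip()

    if "resource limit" in stderr.lower():
        return RESOURCE_LIMIT_SIGNAL  # caller reduces page size

    if result.returncode == 0:
        data = json.loads(result.stdout)
        # Also check JSON errors (some APIs put limits here)
        if data.get("errors"):
            msg = data["errors"][0].get("message", "")
            if "Resource limits" in msg or "timeout" in msg.lower():
                return RESOURCE_LIMIT_SIGNAL
        return data

    # Classify non-zero exit
    is_transient = any(s in stderr for s in [
        "502", "503", "504", "429", "rate limit",
        "secondary", "stream error", "CANCEL"
    ])
    return TRANSIENT_SIGNAL if is_transient else None  # None = permanent failure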
```

## 2. Retry with Exponential Backoff

```python
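import time

MAX_RETRIES = 5
INITIAL_BACKOFF = 5  # seconds

# is_resource_limit(), is_transient(), and RESOURCE_LIMIT_SIGNAL are assumed
# to be defined elsewhere, following the classification in section 1.
def run_query(execute_request, log=print):
    """Retry loop; execute_request() is assumed to return (data, error)."""
    for attempt in range(1, MAX_RETRIES + 1):
        data, error = execute_request()

        if error is None:
            return data
        if is_resource_limit(error):
            return RESOURCE_LIMIT_SIGNAL  # don't retry, reduce page size
        if not is_transient(error):
            return None  # permanent failure
        if attempt == MAX_RETRIES:
            return None  # exhausted

        wait = INITIAL_BACKOFF * (2 ** (attempt - 1))
        log(f"Transient error (attempt {attempt}/{MAX_RETRIES}), retrying in {wait}s")
        time.sleep(wait)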
```

Key: resource-limit errors should NOT be retried — the same query will fail identically. Signal the caller to reduce page size instead.
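
To make the wait times concrete, the sketch below computes the schedule implied by the backoff formula above, using the same constants. `backoff_schedule` is a hypothetical helper name. There is no wait after the final attempt, so five attempts produce four sleeps.

```python
MAX_RETRIES = 5
INITIAL_BACKOFF = 5  # seconds


def backoff_schedule(max_retries=MAX_RETRIES, initial=INITIAL_BACKOFF):
    """Return the sleep (in seconds) that follows each failed attempt.

    The last attempt is not followed by a sleep because the loop gives up.
    """
    return [initial * (2 ** (attempt - 1)) for attempt in range(1, max_retries)]


schedule = backoff_schedule()
print(schedule)       # [5, 10, 20, 40]
print(sum(schedule))  # 75 seconds of waiting in the worst case
```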

## 3. Adaptive Page Sizing

Start conservatively. Halve on resource-limit errors. Set a floor.

```python
import time

MIN_PAGE_SIZE = 5
MAX_REDUCTIONS = 4
page_size = 50  # not 100: nested sub-selections multiply complexity
reductions = 0
has_more_pages = True

while has_more_pages:
    data = run_query(..., page_size=page_size)

    if data == RESOURCE_LIMIT_SIGNAL:
        reductions += 1
        if reductions > MAX_REDUCTIONS or page_size <= MIN_PAGE_SIZE:
            break  # can't go smaller
        page_size = max(MIN_PAGE_SIZE, page_size // 2)
        time.sleep(10)  # cool down before retry
        continue  # retry same page with smaller size

    if data is None:
        break  # permanent failure or retries exhausted

    # process nodes, advance cursor, update has_more_pages...
    time.sleep(2)  # inter-page delay to avoid secondary rate limits
```
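
As a sanity check on the constants above, this sketch lists the page sizes the loop would try if every request hit a resource limit. `page_size_sequence` is a hypothetical helper, and it assumes the reduction counter starts at zero.

```python
MIN_PAGE_SIZE = 5
MAX_REDUCTIONS = 4


def page_size_sequence(start=50):
    """Page sizes tried when every attempt returns a resource-limit signal."""
    sizes = [start]
    page_size, reductions = start, 0
    while True:
        reductions += 1
        if reductions > MAX_REDUCTIONS or page_size <= MIN_PAGE_SIZE:
            break  # can't go smaller
        page_size = max(MIN_PAGE_SIZE, page_size // 2)
        sizes.append(page_size)
    return sizes


print(page_size_sequence())  # [50, 25, 12, 6, 5]
```

Starting from 50, the floor and the reduction cap are reached on the same step. A larger starting size such as 100 would hit the cap first and stop at 6.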

### Why 50, not 100?

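GraphQL query cost grows roughly as `nodes × sub-selections`. A query fetching 100 PRs with `reviews(first:50)`, `participants(first:30)`, and `commits(first:1)` can request up to 100 × (1 + 50 + 30 + 1) = 8,200 nodes per page. That is well below GitHub's documented 500,000-node limit, but queries of this shape can still fail with resource-limit errors when they are too expensive for the server to execute. Starting at 50 avoids most of these errors.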

## 4. Deduplication and Incremental Collection

Always dedup by natural key before writing. This lets re-runs extend existing data.

```python
import os

def dedup(existing, new, key_fn):
    by_key = {}
    for item in existing:
        by_key[key_fn(item)] = item
    for item in new:
        by_key[key_fn(item)] = item  # new overwrites old
    return list(by_key.values())

# On write (load_json / save_json are project helpers, not shown here):
existing = load_json(path) if os.path.exists(path) else []
final = dedup(existing, new_items, key_fn=lambda x: (x["repo"], x["number"]))
save_json(path, final)
```
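
The snippet above relies on `load_json` and `save_json`, which are not defined in this document. A minimal sketch of what they might look like follows, together with a re-run that merges into an existing file. The helper bodies, the `merge_into` wrapper, the temp-file path, and the sample records are all assumptions for illustration.

```python
import json
import os
import tempfile


def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_json(path, items):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, indent=2)


def dedup(existing, new, key_fn):
    by_key = {key_fn(item): item for item in existing}
    for item in new:
        by_key[key_fn(item)] = item  # new overwrites old
    return list(by_key.values())


def merge_into(path, new_items, key_fn):
    """Merge new_items into the JSON list stored at path and return the result."""
    existing = load_json(path) if os.path.exists(path) else []
    final = dedup(existing, new_items, key_fn)
    save_json(path, final)
    return final


def key(item):
    return (item["repo"], item["number"])


path = os.path.join(tempfile.mkdtemp(), "prs.json")

# First run writes one record; the second run updates it and adds another.
merge_into(path, [{"repo": "a/b", "number": 1, "state": "OPEN"}], key)
result = merge_into(path, [
    {"repo": "a/b", "number": 1, "state": "MERGED"},  # updates the old record
    {"repo": "a/b", "number": 2, "state": "OPEN"},    # new record
], key)
print([(r["number"], r["state"]) for r in result])  # [(1, 'MERGED'), (2, 'OPEN')]
```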

## 5. Observability

### Force unbuffered output

Python block-buffers stdout when output is not going to a terminal (subprocess, pipe, file redirect), so progress lines may not appear until the buffer fills or the process exits.

```python
import sys
sys.stdout.reconfigure(line_buffering=True)
# OR run with: python3 -u script.py
```

### Log structure for monitoring

```
=== repo-name (query-type) ===
Page 1: 50 nodes, hasNext=True (size=50)
Page 2: 50 nodes, hasNext=True (size=50)
Resource limit exceeded (page_size=50), signaling page-size reduction
Reducing page size to 25 and retrying page 3 (reduction 1/4)
Page 3: 25 nodes, hasNext=True (size=25)
...
Total: 430 items, collected 430, 3169 sub-items
```

Every per-page log line should include: page number, items returned, whether there are more pages, and current page size.
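
One way to keep the per-page lines consistent is to build them in a single place. This is a sketch only: `format_page_log` is a hypothetical helper whose output mirrors the sample log above.

```python
def format_page_log(page, node_count, has_next, page_size):
    """Build one per-page progress line (hypothetical helper)."""
    return f"Page {page}: {node_count} nodes, hasNext={has_next} (size={page_size})"


# flush=True makes the line appear immediately even when stdout is piped.
print(format_page_log(3, 25, True, 25), flush=True)
# Page 3: 25 nodes, hasNext=True (size=25)
```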

## 6. Inter-Page Delays

GitHub's secondary rate limit triggers on sustained request volume, not individual request cost. Add 2-3s between pages.

```python
PAGE_DELAY = 2 # seconds

# After each successful page:
time.sleep(PAGE_DELAY)

# After a resource-limit reduction:
time.sleep(INITIAL_BACKOFF * 2) # longer cooldown
```

## Checklist

When writing a collection script, verify:

- [ ] Error classification distinguishes resource-limit from rate-limit from transient
- [ ] Resource-limit errors reduce page size (not retry same query)
- [ ] Transient errors retry with exponential backoff
- [ ] Non-retryable errors fail fast
- [ ] Page size starts at 50 or lower for nested queries
- [ ] Page size has a floor (5-10) and max-reduction cap
- [ ] Inter-page delay prevents secondary rate limits
- [ ] Output is unbuffered (`-u` flag or `reconfigure`)
- [ ] Each log line includes page number, count, hasNext, page size
- [ ] Data is deduped by natural key before writing
- [ ] Re-runs merge with existing data (incremental collection)
- [ ] Collection log records run metadata (timestamps, repos, filters)
65 changes: 65 additions & 0 deletions domains/performance/skills/browser-extension-profiling/skill.md
@@ -0,0 +1,65 @@
---
maturity: experimental
name: browser-extension-profiling
description: Compare browser extension performance between branches using WDYR, React DevTools Profiler, and E2E benchmarks with statistical rigor.
---

# Browser Extension Profiling

Methodology for profiling and comparing extension performance across branches or commits.

## When To Use

- Validating that a refactor reduces unnecessary re-renders (needs before/after comparison)
- Establishing baseline metrics for a performance initiative
- Investigating a reported UI slowdown in the extension

## Do Not Use When

- Single-run comparisons — statistical significance requires ≥10 runs per scenario
- The change touches only non-render paths (background scripts, network with no UI impact)
- Target behavior is server-side latency, not UI rendering

## Workflow

1. **Build both branches** with `yarn build:test` on the same machine and Chrome version

2. **WDYR profiling** (unnecessary re-render counts)
   ```bash
   ENABLE_WHY_DID_YOU_RENDER=true yarn start
   ```
   Flags to watch:
   - `different objects that are equal by value` → object recreation
   - `different functions with the same name` → callback recreation
   - `props object itself changed but values equal` → parent cascade

3. **React DevTools Profiler** for flame graphs and commit timings
   ```bash
   yarn devtools:react
   ```

4. **E2E benchmarks** for scenario durations
   ```bash
   yarn test:e2e:benchmark
   ```

5. **Collect ≥10 runs** per scenario. Discard top/bottom 10%. Report mean, median, stddev, p75, p95.

6. **Statistical threshold:** Cohen's d > 0.5 for a meaningful difference.
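
Steps 5 and 6 can be sketched in a few lines of Python. The workflow does not say which variant of Cohen's d is intended, so the pooled-standard-deviation form is assumed here. The `trim` and `cohens_d` helpers and the sample durations are made up for illustration.

```python
import math
import statistics


def trim(samples, fraction=0.10):
    """Drop the lowest and highest `fraction` of samples."""
    ordered = sorted(samples)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k] if k else ordered


def cohens_d(a, b):
    """Cohen's d using a pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled


# Made-up scenario durations in milliseconds, 10 runs per branch.
baseline = [120, 118, 125, 130, 122, 119, 121, 160, 117, 123]
candidate = [110, 112, 108, 115, 111, 109, 113, 140, 107, 110]

d = cohens_d(trim(baseline), trim(candidate))
print(f"d = {d:.2f}, meaningful = {abs(d) > 0.5}")
```

With 10 runs, trimming 10% from each end removes exactly one run per side, which is enough to drop a single outlier such as the 160 ms and 140 ms samples above.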

## Common Pitfalls

| Mistake | Correct Approach |
|---------|-----------------|
| Running branches on different machines or Chrome versions | Same machine, same Chrome, no other apps running |
| Pooling all runs including noisy late-session ones | Compute per-round stats first; report cleanest signal with explicit round attribution |
| Reporting absolute re-render counts without scenario context | Normalize per-action; cascade fixes show multiplied impact at root |
| Skipping cache and state reset between runs | Clear browser cache, reset extension state for each run |

## Pre-Profiling Checklist

- [ ] Both branches built with `yarn build:test`
- [ ] Same machine, same Chrome version
- [ ] No other tabs or applications running
- [ ] WDYR enabled: `ENABLE_WHY_DID_YOU_RENDER=true`
- [ ] Cache and extension state cleared between runs
89 changes: 89 additions & 0 deletions domains/platform/knowledge/extension-architecture.md
@@ -0,0 +1,89 @@
---
name: extension-architecture
domain: platform
description: MetaMask extension — background/UI boundary, state sync, build types, key directories
---

# Extension Architecture

## Background / UI Boundary

The extension runs two separate JavaScript contexts that cannot share memory.

| Context | Entry | Access |
|---------|-------|--------|
| Background (Service Worker / background page) | `app/scripts/` | DOM-less; controllers, wallet logic |
| UI (popup/tab) | `ui/` | React + Redux; rendering only |
| Shared | `shared/` | Constants, utilities, type definitions |

Communication is message-based (Chrome runtime messaging). Code in `app/scripts/` cannot `import` from `ui/` and vice versa.

## State Sync Flow

```
Controller state changes (app/scripts/)
    ↓
metamask-controller.js batches via debounce (200ms)
    ↓
UI receives batched state via sendUpdate
    ↓
Redux dispatches UPDATE_METAMASK_STATE
    ↓
Immer applies patches (structural sharing: unchanged paths keep stable references)
    ↓
useSelector evaluates; components re-render if output changed
```

Key file: `app/scripts/metamask-controller.js` — aggregates all controller state.

## Build Types

| Build | Command | Background | Security Policy |
|-------|---------|------------|-----------------|
| Development | `yarn start` | Webpack, hot reload | No LavaMoat |
| Production | `yarn dist` | Browserify | LavaMoat enforced |
| Test | `yarn build:test` | Browserify | Partial LavaMoat |

LavaMoat restricts package capabilities at runtime. After adding/updating dependencies, run `yarn lavamoat:auto` to regenerate policies.

## Manifest Versions

| Version | Background | Lifecycle |
|---------|------------|-----------|
| MV3 (Chrome) | Service Worker | Can terminate and restart |
| MV2 (Firefox) | Background Page | Always running |

When errors are concentrated in MV3 (99%+), the root cause is most likely the service worker lifecycle, not application logic.

## Key Directories

```
app/scripts/
├── controllers/              # Feature controllers (one per domain)
├── lib/                      # Background utilities
└── metamask-controller.js    # Main aggregator; 200ms debounce

ui/
├── components/               # Reusable React components
├── pages/                    # Page-level components
│   ├── routes/               # routes.component.tsx (high selector count)
│   └── home/                 # home.container.js (legacy connect())
├── ducks/                    # Redux slices
├── selectors/                # All selectors
│   ├── selectors.js          # Main file (~2500 lines)
│   └── <feature>.ts          # Feature-specific selectors
└── contexts/                 # React Context providers

shared/
├── constants/
├── lib/
└── modules/
    └── selectors/
        └── selector-creators.ts
```

## React Compiler Scope

Enabled for `ui/components`, `ui/contexts`, `ui/hooks`, `ui/layouts`, `ui/pages`.

The compiler does NOT optimize across file boundaries, so selector values from `useSelector` still require manual `useMemo`.