4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Add `analytics` domain: Sentry span-quota guard, Sentry MCP query workflows (including longer-range/30D+ query fidelity and percentile-sample-size filtering), release-over-release performance attribution, instrumentation methodology, and supporting knowledge

## [0.1.0]

### Added
41 changes: 41 additions & 0 deletions domains/analytics/knowledge/metrametrics-identity.md
@@ -0,0 +1,41 @@
---
name: metrametrics-identity
domain: analytics
description: isOptIn:true without metaMetricsIdOverride strips user identity in MetaMetricsController and always sends the shared anonymous ID
---

# MetaMetrics Identity Stripping

## The Mechanism

In `MetaMetricsController` (`app/scripts/controllers/metametrics-controller.ts`):

```typescript
if (excludeMetaMetricsId || (isOptIn && !metaMetricsIdOverride)) {
  idType = 'anonymousId';
  idValue = METAMETRICS_ANONYMOUS_ID; // 0x0000000000000000
}
```

When `isOptIn: true` with no `metaMetricsIdOverride`:
- The user's real `metaMetricsId` is discarded
- ALL such events share a single anonymous ID (`0x0000000000000000`) in Segment
- User-level attribution is completely lost

This is **unconditional** with respect to the user's opt-in state: it applies to fully opted-in users with valid IDs, not just anonymous users. The only way around it is to pass `metaMetricsIdOverride`.
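To make the branch easier to reason about, here is a minimal stand-alone model of it. The function name `resolveIdentity`, the option and result types, the `'userId'` label, and the rule that an override replaces the stored ID are all assumptions for illustration; only the condition and the anonymous ID value come from the snippet above.

```typescript
// Hypothetical model of the identity branch; not the controller's real API.
const METAMETRICS_ANONYMOUS_ID = '0x0000000000000000';

type IdentityOptions = {
  isOptIn?: boolean;
  excludeMetaMetricsId?: boolean;
  metaMetricsIdOverride?: string;
};

type ResolvedIdentity = {
  idType: 'userId' | 'anonymousId';
  idValue: string;
};

function resolveIdentity(
  metaMetricsId: string,
  options: IdentityOptions = {},
): ResolvedIdentity {
  const { isOptIn, excludeMetaMetricsId, metaMetricsIdOverride } = options;
  // Same condition as the controller snippet: the stored ID is never consulted here.
  if (excludeMetaMetricsId || (isOptIn && !metaMetricsIdOverride)) {
    return { idType: 'anonymousId', idValue: METAMETRICS_ANONYMOUS_ID };
  }
  // Assumption: an override, when present, takes precedence over the stored ID.
  return { idType: 'userId', idValue: metaMetricsIdOverride ?? metaMetricsId };
}

// An opted-in user with a valid stored ID still ends up anonymous.
console.log(resolveIdentity('0xabc123', { isOptIn: true }));
```

Under these assumptions, the stored ID survives only when `isOptIn` is absent or an override is supplied.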

## Intended Use

The flag is intended for the onboarding opt-in flow (`creation-successful.tsx`), where the user has not yet committed to MetaMetrics and no `metaMetricsId` has been persisted. There the event must fire regardless of opt-in state, so it is sent under the anonymous ID.

## The Misuse Pattern

The misuse is a post-opt-in `trackEvent` call that passes `{ isOptIn: true }` without `metaMetricsIdOverride`. The event is then sent under the shared anonymous ID, which defeats the purpose of Segment user-level dimensions (account types, feature flags).

## Detection

```bash
grep -r "isOptIn: true" app/scripts/ ui/ --include="*.ts" --include="*.tsx"
```

Any occurrence outside `creation-successful.tsx` (or the onboarding flow) is suspect.
40 changes: 40 additions & 0 deletions domains/analytics/knowledge/segment-governance.md
@@ -0,0 +1,40 @@
---
name: segment-governance
domain: analytics
description: Segment event governance via segment-schema is advisory — no CI enforcement prevents unregistered events from shipping
---

# Segment Event Governance

## Architecture

| Component | Location |
|-----------|----------|
| Tracking plan | `Consensys/segment-schema` → `tracking-plans/metamask-extension.yaml` |
| Event registry | `shared/constants/metametrics.ts` → `MetaMetricsEventName` enum (300+ entries) |
| Review process | `CONTRIBUTING.md` in segment-schema; Data Council review |
| Governance channel | `#metamask-metametrics`, `@consensys/data-council` |

## The Gap

There is **no CI enforcement** in the extension repo. A developer can:

1. Add an entry to the `MetaMetricsEventName` enum
2. Call `trackEvent` with it
3. Merge and ship to production

...without registering in segment-schema or going through Data Council review.

## Implications

- Schema drift between tracking plan and production events
- No property schema validation for unregistered events
- Billing impact goes unreviewed
- Data Council review is bypassable by omission

## Recommended Fix

Add a CI check that:
1. Parses the `MetaMetricsEventName` entries
2. Validates each against `tracking-plans/metamask-extension.yaml`
3. Fails the build if an event is missing from the plan
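The comparison at the core of such a check could look like the sketch below. It is not an existing script: `checkEventsAgainstPlan` and its result type are hypothetical, the event names are made up, and loading the enum values and the tracking-plan YAML (which needs a parser) is deliberately left out. It assumes events are matched by their string value, not by enum key.

```typescript
type PlanCheckResult = { ok: boolean; missing: string[] };

// Hypothetical helper: both inputs are plain lists of event name strings.
function checkEventsAgainstPlan(
  enumEventNames: readonly string[],
  planEventNames: readonly string[],
): PlanCheckResult {
  const registered = new Set(planEventNames);
  const missing = enumEventNames.filter((eventName) => !registered.has(eventName));
  return { ok: missing.length === 0, missing };
}

// Made-up event names, for illustration only.
const planCheck = checkEventsAgainstPlan(
  ['Example Registered Event', 'Example Unregistered Event'],
  ['Example Registered Event'],
);

if (!planCheck.ok) {
  // In a CI job, this is the point where the process would exit non-zero.
  console.error(`Missing from tracking plan: ${planCheck.missing.join(', ')}`);
}
```

Returning the list of missing names, not just a boolean, lets the CI log tell the author exactly which events still need registration.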
72 changes: 72 additions & 0 deletions domains/analytics/knowledge/span-sub-sampling.md
@@ -0,0 +1,72 @@
---
name: span-sub-sampling
domain: analytics
description: Deterministic per-trace sub-sampling for high-frequency custom spans — global tracesSampleRate × span sub-rate, traceId-hash bucketed
---

# Span Sub-Sampling

Durable fix for a custom span that fans out and eats the span budget. Layer a per-trace sub-rate **under** the global `tracesSampleRate`, keyed on the trace id so every span in a trace is kept-or-dropped together. Source: [PR #39891](https://github.com/MetaMask/metamask-extension/pull/39891) (`shared/lib/wrapper-sampling.ts`).

## Rate Math

```
effective rate = global tracesSampleRate × span sub-rate
```

- Global `tracesSampleRate` is already small (extension prod: 0.75%).
- The sub-rate cuts the custom span on top: `0.75% × 1% = 0.0075%`.
- PR #39891 ships a sub-rate of 0.5% (`WRAPPER_SAMPLE_RATE = 0.005`) — a conservative pilot — and names 5% as the step-up once the denylist is confirmed effective in production.

Pick the sub-rate from how many sampled traces the metric needs to stay useful — not from the quota alone. Too low and the metric goes dark.
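As a rough way to check whether a sub-rate leaves enough data, the rate math can be turned into an expected trace count. This is a sketch: the function names are hypothetical and the monthly trace volume is an invented figure; only the two rates come from the text above.

```typescript
// Effective rate = global tracesSampleRate x span sub-rate (both as fractions).
function effectiveSpanRate(globalRate: number, subRate: number): number {
  return globalRate * subRate;
}

// Expected number of traces that keep the custom span for a given trace volume.
function expectedKeptTraces(
  totalTraces: number,
  globalRate: number,
  subRate: number,
): number {
  return totalTraces * effectiveSpanRate(globalRate, subRate);
}

// Invented volume of 10 million eligible traces per month.
const monthlyTraces = 10_000_000;
const keptPerMonth = expectedKeptTraces(monthlyTraces, 0.0075, 0.005);
console.log(`about ${Math.round(keptPerMonth)} traces per month keep the span`);
```

With those inputs the result is on the order of a few hundred traces per month, which is the kind of number to weigh against how many samples the metric's percentiles need.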

## Pattern

```ts
const WRAPPER_SAMPLE_RATE = 0.005;

// Deterministic: same answer for the same traceId, so all spans in a trace
// are kept or dropped together — clean waterfalls, no partial gaps.
export function shouldSampleWrappers(traceId: string | undefined): boolean {
  if (!traceId || traceId.length < 8) {
    return false;
  }
  const hashBucket = parseInt(traceId.slice(0, 8), 16) % 10000;
  return hashBucket < WRAPPER_SAMPLE_RATE * 10000;
}
```

**Why deterministic, not `Math.random()` per call:** independent per-span sampling shreds a trace into partial waterfalls (some spans present, siblings missing) — useless for attribution. Hashing the trace id makes keep/drop a property of the whole trace.
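A worked example of the bucketing may help. The function body is copied from the pattern above so the block runs on its own; the trace ids are made-up 32-character hex strings chosen to land on either side of the cutoff.

```typescript
const WRAPPER_SAMPLE_RATE = 0.005;

function shouldSampleWrappers(traceId: string | undefined): boolean {
  if (!traceId || traceId.length < 8) {
    return false;
  }
  // First 8 hex chars -> integer -> bucket in [0, 9999].
  const hashBucket = parseInt(traceId.slice(0, 8), 16) % 10000;
  // 0.005 * 10000 = 50, so buckets 0..49 are kept.
  return hashBucket < WRAPPER_SAMPLE_RATE * 10000;
}

// Made-up trace ids: 0x31 = 49 (last kept bucket), 0x32 = 50 (first dropped bucket).
const keptTraceId = '00000031' + '0'.repeat(24);
const droppedTraceId = '00000032' + '0'.repeat(24);

console.log(shouldSampleWrappers(keptTraceId)); // true
console.log(shouldSampleWrappers(keptTraceId)); // true again: same id, same answer
console.log(shouldSampleWrappers(droppedTraceId)); // false
console.log(shouldSampleWrappers(undefined)); // false: no active trace
```

Because the decision depends only on the trace id, every call site within one trace reaches the same verdict without sharing any state.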

## Gate Order (cheapest check first)

```ts
const traceId = sentryGetActiveSpan()?.spanContext().traceId;
if (!traceId || isReadOnlyAction(action) || !shouldSampleWrappers(traceId)) {
  return doWorkWithoutSpan();
}
return trace({ name, op, data }, doWorkWithSpan);
```

1. No active trace → no span.
2. Denylist → skip noise (below).
3. Sub-sample miss → skip this trace's spans.

## Denylist: cut before you sample

Drop spans with no timing/attribution signal before sub-sampling. In PR #39891, read-only verbs are ~90% of `messenger.call` volume:

```ts
const READ_ONLY_VERB = /^(?:get|has|find|is|peek)(?:[A-Z]|$)/u;
```

Removing ~90% of volume before the sample multiplies headroom — a higher sub-rate then yields the same span budget, so kept traces are denser and more useful.
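The sketch below shows which method names the regex treats as read-only and how much sub-rate the denylist buys back. `isReadOnlyAction` here is a hypothetical implementation that assumes action names have the form `Namespace:method`; `equivalentSubRate` and the controller names are likewise illustrative and not taken from the PR.

```typescript
const READ_ONLY_VERB = /^(?:get|has|find|is|peek)(?:[A-Z]|$)/u;

// Hypothetical helper: tests only the part after the last ':' in the action name.
function isReadOnlyAction(action: string): boolean {
  const method = action.slice(action.lastIndexOf(':') + 1);
  return READ_ONLY_VERB.test(method);
}

// Sub-rate that spends the same span budget once a fraction of volume is denylisted.
function equivalentSubRate(subRate: number, fractionRemoved: number): number {
  return subRate / (1 - fractionRemoved);
}

console.log(isReadOnlyAction('ExampleController:getState')); // true
console.log(isReadOnlyAction('ExampleController:isUnlocked')); // true
console.log(isReadOnlyAction('ExampleController:addItem')); // false
// false: "is" must be followed by a capital letter or the end of the name
console.log(isReadOnlyAction('ExampleController:issueToken'));

// With ~90% of volume removed, a 0.5% sub-rate budget stretches to roughly 5%.
console.log(equivalentSubRate(0.005, 0.9));
```

The `(?:[A-Z]|$)` tail is what keeps verbs such as `issue` or `getter` from being misread as read-only prefixes.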

## Where the Gate Goes

- **Consumer (extension):** spans go through `trace()`. Gate at the call site, or for a whole span family inside the wrapper. `traceId` from `sentryGetActiveSpan()?.spanContext().traceId`.
- **Controller package (core):** controllers call an injected `trace` callback. Gate in the package's trace util or the callback so every consumer inherits the cap. Pull the trace id from the controller's tracing context, not a fresh Sentry import.

## Kill Switch

Ship every always-on span family with an env disable flag (PR #39891: `SENTRY_DISTRIBUTED_TRACING_DISABLED` returns the messenger un-wrapped). It turns a future emergency cut into a config flip instead of a cherry-pick.
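A minimal sketch of such a kill switch follows. Only the flag name comes from the PR reference above; the `Messenger` type, the `maybeWrapMessenger` and `wrap` names, passing the environment in as a plain object, and treating `'true'` or `'1'` as "set" are all assumptions.

```typescript
type Messenger = { call: (action: string, ...args: unknown[]) => unknown };
type Env = Record<string, string | undefined>;

// Hypothetical gate: returns the messenger untouched when the flag is set.
function maybeWrapMessenger(
  messenger: Messenger,
  wrap: (inner: Messenger) => Messenger,
  env: Env,
): Messenger {
  const flag = env.SENTRY_DISTRIBUTED_TRACING_DISABLED;
  // Assumption: 'true' and '1' both count as disabled.
  const disabled = flag === 'true' || flag === '1';
  return disabled ? messenger : wrap(messenger);
}

// Stand-in messenger and wrapper for illustration.
const baseMessenger: Messenger = { call: (action) => `handled ${action}` };
const tracingWrap = (inner: Messenger): Messenger => ({
  call: (action, ...args) => inner.call(action, ...args), // span creation would go here
});

const active = maybeWrapMessenger(baseMessenger, tracingWrap, {
  SENTRY_DISTRIBUTED_TRACING_DISABLED: 'true',
});
console.log(active === baseMessenger); // true: wrapper skipped entirely
```

Checking the flag once at wrap time, not per call, keeps the disabled path free of any per-call overhead.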
@@ -0,0 +1,61 @@
---
repo: metamask-extension
parent: analytics-instrumentation
---

## Key Files

| Content | Path |
|---------|------|
| Sentry trace wrapper | `shared/lib/trace.ts` |
| Trace name enum | `shared/lib/trace.ts` → `TraceName` |
| MetaMetrics controller | `app/scripts/controllers/metametrics-controller.ts` |
| Event enum | `shared/constants/metametrics.ts` → `MetaMetricsEventName` |
| Sentry setup + sample rate | `app/scripts/lib/setupSentry.js` → `getTracesSampleRate()` |
| Segment tracking plan | `Consensys/segment-schema` → `tracking-plans/metamask-extension.yaml` |

## Cross-Process Context (UI → Background)

The extension has two Sentry hubs — one in the UI process and one in the background service worker. A trace starting in UI and continuing in background requires explicit context propagation across the RPC boundary:

```typescript
// Serialize at UI call site
const context: SerializedTraceContext = {
  _name: TraceName.MyOperation,
  _traceId: span.spanContext().traceId,
  _spanId: span.spanContext().spanId,
};

// Background receives context, creates child span
trace({ name: TraceName.MyOperation, parentContext: context }, async () => {
  // background work runs inside the child span
});
```

Without propagation: Sentry shows two disconnected operations. With propagation: complete tree from user action to RPC call.
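The point of the serialized context is that it is plain data that survives the RPC boundary. The sketch below models that with a JSON round trip; the three field names match the snippet above, while `SpanContextLike`, `serializeTraceContext`, `sendAcrossRpc`, and the id values are placeholders invented for illustration.

```typescript
type SerializedTraceContext = {
  _name: string;
  _traceId: string;
  _spanId: string;
};

// Minimal shape of what span.spanContext() is assumed to expose.
type SpanContextLike = { traceId: string; spanId: string };

function serializeTraceContext(
  traceName: string,
  spanContext: SpanContextLike,
): SerializedTraceContext {
  return {
    _name: traceName,
    _traceId: spanContext.traceId,
    _spanId: spanContext.spanId,
  };
}

// Stand-in for the UI -> background RPC hop: only JSON-safe data gets through.
function sendAcrossRpc(context: SerializedTraceContext): SerializedTraceContext {
  return JSON.parse(JSON.stringify(context)) as SerializedTraceContext;
}

// Placeholder ids, not real trace data.
const uiSpanContext: SpanContextLike = {
  traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
  spanId: '00f067aa0ba902b7',
};

const receivedContext = sendAcrossRpc(serializeTraceContext('MyOperation', uiSpanContext));
console.log(receivedContext._traceId === uiSpanContext.traceId); // true: same trace on both sides
```

A live span object would not survive this hop, which is why the ids are copied into a flat record before the call.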

## Sentry Sample Rate

```bash
grep -n "tracesSampleRate" app/scripts/lib/setupSentry.js
# Verify current value before calculating — it has changed between releases
```

## Sentry Traces Explorer Query (Volume Estimation)

```
Environment: production | Time range: 30 days | Mode: aggregate
Query: span.op:http.client span.description:*{endpoint}*
Group by: span.description, transaction
Sort: -count(span.duration)
```

## Detect `isOptIn` Misuse

```bash
grep -rn "isOptIn: true" app/scripts/ ui/ --include="*.ts" --include="*.tsx"
# Any occurrence outside the onboarding opt-in flow is suspect
```

## Data Council Contact

- Slack: `#metamask-metametrics`
- Team: `@consensys/data-council`
89 changes: 89 additions & 0 deletions domains/analytics/skills/analytics-instrumentation/skill.md
@@ -0,0 +1,89 @@
---
maturity: experimental
name: analytics-instrumentation
description: Create and update Sentry spans, MetaMetrics events, and Segment events — methodology, policies, common pitfalls
---

# Analytics Instrumentation

## When To Use

- Adding or modifying a MetaMetrics (Segment) event
- Adding or modifying a Sentry performance span
- Estimating event or span volume from production data
- Auditing existing instrumentation for correctness

---

## Do Not Use When

- Adding local debug logging with no telemetry destination
- Investigating an existing Sentry error report (use `sentry-mcp-queries`)
- Internal feature flag evaluation not surfaced as an analytics event

---

## Sentry Spans

### Creating a Span

1. **Register a named trace entry** in the repo's trace name enum before writing any span code. Unnamed spans are invisible in Sentry filters.
2. **Use the repo's `trace()` wrapper**, not raw `Sentry.startSpan()`. Wrappers handle cross-process context propagation, active-span inheritance, and consistent tag injection.
3. **Inherit parent automatically** — when no `parentContext` is provided, the wrapper inherits from `Sentry.getActiveSpan()`, making the new span a child of the active parent (e.g., a `pageload` span).

### Updating a Span

- Adding a tag: no governance required
- Renaming a trace name enum entry: grep all callsites; update enum and references atomically
- Changing an `op` value: breaks saved queries and dashboards — coordinate with whoever owns them

---

## MetaMetrics / Segment Events

### Creating an Event

1. **Check the event name enum** — event may already exist under a different phrasing.
2. **Check the segment tracking plan** — event may be registered under a different name than the enum key.
3. **Add to the enum**, then implement the `trackEvent` call.
4. **Do NOT use `isOptIn: true` outside the onboarding opt-in flow.** It strips user identity unconditionally for all users, not just non-opted-in ones (see Reference Knowledge: metrametrics-identity).
5. **Open a data governance review** before merging. There is usually no CI enforcement on schema registration — this step is easy to skip (see Reference Knowledge: segment-governance).
6. **Register in the team's segment tracking plan** before shipping.

### Updating an Event

- Adding a property: requires governance review and schema update
- Renaming an event: deprecate old + add new in tracking plan; coordinate on migration window
- Removing an event: confirm no active dashboards depend on it before removing

---

## Volume Estimation via Sentry

When direct Segment access is unavailable, estimate from Sentry production span data:

1. **Find a correlated HTTP endpoint** — one that fires 1:1 with the event.
2. **Query Sentry Traces Explorer** (aggregate mode):
```
span.op:http.client span.description:*{endpoint}*
```
3. **Extrapolate:**
```
estimated_actual = sampled_count × (1 / tracesSampleRate)
```
4. **Interpret as upper bound** — endpoint may have callers outside the event path.

Caveats: sample population is MetaMetrics opted-in users only; verify the current `tracesSampleRate` before calculating (it changes between releases). For longer-range (30D+) or release-over-release queries, the sampled count is **not** comparable at face value — older releases are downsampled / retention-truncated and `.0` releases are sample-thin; see `sentry-mcp-queries` (Longer-Range Queries and Percentile Fidelity) and the `performance-attribution` skill.
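The extrapolation step can be sketched as a small helper. The function name and the sampled count of 150 are hypothetical; the 0.75% rate is the production figure quoted elsewhere in this PR and should be re-verified before use.

```typescript
// Scales a sampled span count up to an estimated real count (upper bound).
function estimateActualCount(sampledCount: number, tracesSampleRate: number): number {
  if (!(tracesSampleRate > 0 && tracesSampleRate <= 1)) {
    throw new RangeError('tracesSampleRate must be in (0, 1]');
  }
  // Divide by the rate, i.e. multiply by its inverse; never multiply by the rate itself.
  return Math.round(sampledCount / tracesSampleRate);
}

// Hypothetical: 150 sampled spans observed at a 0.75% sample rate.
const estimatedEvents = estimateActualCount(150, 0.0075);
console.log(estimatedEvents); // 20000
```

The range check guards against the common slip of passing the rate as a percentage (0.75) or as zero, both of which would produce a misleading estimate.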

---

## Common Pitfalls

| Mistake | Correct Approach |
|---------|-----------------|
| `isOptIn: true` on post-onboarding events | Omit it outside the onboarding flow; it strips user identity for all users |
| Ship event without tracking-plan registration | No CI gate — add governance review explicitly to PR checklist |
| Raw `Sentry.startSpan()` instead of the repo's `trace()` wrapper | Use the wrapper — handles cross-process context and active-span inheritance |
| New span with no trace name enum entry | Register enum entry first; unnamed spans are invisible in Sentry filters |
| Multiply sampled count by `tracesSampleRate` | Multiply by inverse: `sampled × (1 / rate)` |
| Treat Sentry estimates as exact counts | Probabilistic sample — state sample size and confidence |