fix: triage blind holdout LinkedIn misses #10

Merged
ftchvs merged 1 commit into main from codex/and-62-blind-holdout-review
May 10, 2026

@ftchvs ftchvs commented May 10, 2026

Summary

  • Adds a scoped derived LinkedIn professional-claim review path for softer professional-outcome language.
  • Keeps existing hard LinkedIn promise signals (double your salary, guaranteed promotion, 10x productivity) high risk.
  • Documents the AND-62 blind-holdout follow-up and the remaining conservative telehealth overcall.
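The split between hard promise signals and softer professional-outcome language can be pictured with a minimal sketch. This is not the rule code from this PR: the function name `classify_linkedin_claim`, the soft-language patterns, and the `pass` fallback label are hypothetical and only illustrate the tiering. The three hard phrases and the `high_risk` / `needs_review` labels are the ones named in this description.

```python
import re

# Hard promise signals listed in the PR summary; these stay high risk.
HARD_PROMISES = (
    "double your salary",
    "guaranteed promotion",
    "10x productivity",
)

# Hypothetical examples of softer professional-outcome language.
# The real derived rule's patterns are not shown in this PR description.
SOFT_PATTERNS = (
    re.compile(r"\b(?:boost|advance|accelerate)\s+your\s+career\b"),
    re.compile(r"\bget\s+promoted\s+faster\b"),
)


def classify_linkedin_claim(text: str) -> str:
    """Return a decision label for a piece of ad copy (illustrative only)."""
    lowered = text.lower()
    # Hard promises are checked first so the softer path can never downgrade them.
    if any(phrase in lowered for phrase in HARD_PROMISES):
        return "high_risk"
    if any(pattern.search(lowered) for pattern in SOFT_PATTERNS):
        return "needs_review"
    return "pass"  # assumed label for copy that triggers neither tier
```

Checking the hard tier first is the design point: adding a review path for softer claims should not change the outcome for copy that already matches a hard promise.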

Blind holdout impact

  • Decision accuracy: 0.967 → 0.989
  • Decision mismatches: 3 → 1
  • Policy false negatives: 12 → 10
  • Policy false positives: unchanged at 7
  • Remaining decision miss: blind_telehealth_info_review (needs_review expected, high_risk actual), because the expected Google health policy fires with conservative high-risk scoring.
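The accuracy figures are consistent with the 90-row holdout size recorded in AND-62, with three decision mismatches before this change and one after. A small arithmetic check, assuming decision accuracy is simply the share of rows whose decision matches the expected label (the helper name is hypothetical):

```python
def decision_accuracy(total_rows: int, mismatches: int) -> float:
    """Fraction of holdout rows whose decision matches the expected label."""
    return (total_rows - mismatches) / total_rows


before = decision_accuracy(total_rows=90, mismatches=3)  # 87 / 90
after = decision_accuracy(total_rows=90, mismatches=1)   # 89 / 90

print(f"{before:.3f} -> {after:.3f}")  # prints "0.967 -> 0.989"
```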

Test plan

  • make real-world-blind-ci
  • make real-cases-ci
  • make test
  • make pr-preflight

Refs AND-62.

linear-code Bot commented May 10, 2026

AND-62 Review remaining blind holdout misses after PR #9

Why

PR #9 merged the AdLint reliability backlog and left the blind holdout in a known conservative state: 90 rows, rule-only decision accuracy 0.967, three documented decision mismatches, 12 policy false-negative notes, and 7 policy false-positive notes.

The next useful increment is to review the three remaining decision mismatches without broadening product scope or tuning blindly against the holdout.

Remaining decision mismatch row IDs:

  • blind_productivity_claim_review
  • blind_promotion_workshop_review
  • blind_telehealth_info_review

Scope

Classify each remaining miss as one of:

  • true rule gap
  • label/adjudication issue
  • acceptable limitation
  • future model/extraction issue

Apply only narrow, reviewed rule or label/reporting changes when the evidence supports them. Do not change source rows just to improve the metric.

Acceptance criteria

  • Start from main after the merge commit 83d0355 of PR #9 (feat: complete AdLint Linear backlog).
  • Reproduce the current blind baseline with make real-world-blind.
  • Produce a row-by-row triage note for the three remaining mismatches.
  • If a rule change is justified, keep it narrow and explain affected policy IDs.
  • Preserve existing gates: make pr-preflight, make test, make real-cases-ci, make real-world-blind-ci, make policy-coverage-validate.
  • Update docs with before/after metrics and the holdout-boundary decision.
  • Open a small PR; do not bundle unrelated Meta Ads, storage, scoring, or research-loop work.

Notes

Related shipped work: PR #9, AND-28, and AND-49. Deterministic rules remain the production baseline; local model quality remains manual/scheduled diagnostics.

@ftchvs ftchvs merged commit 5221198 into main May 10, 2026
2 checks passed
@ftchvs ftchvs deleted the codex/and-62-blind-holdout-review branch May 10, 2026 17:49