fix: triage blind holdout LinkedIn misses by ftchvs · Pull Request #10 · ftchvs/AdLint

ftchvs · 2026-05-10T05:44:29Z

Summary

Adds a scoped derived LinkedIn professional-claim review path for softer professional-outcome language.
Keeps existing hard LinkedIn promise signals (double your salary, guaranteed promotion, 10x productivity) high risk.
Documents the AND-62 blind-holdout follow-up and the remaining conservative telehealth overcall.
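The two-tier behavior described above can be sketched as follows. This is a minimal illustration, not AdLint's actual rule code: the hard-promise phrases are the ones quoted in this PR, while the softer patterns, the `no_signal` label, and the function names are hypothetical.

```python
import re

# Hard promise phrases quoted in this PR; these stay high risk.
HARD_PROMISES = [
    r"double your salary",
    r"guaranteed promotion",
    r"10x productivity",
]

# Hypothetical softer professional-outcome phrases (assumed for illustration,
# not taken from the AdLint rule set).
SOFT_OUTCOMES = [
    r"advance your career",
    r"get promoted",
    r"boost (?:your )?productivity",
]


def _matches_any(patterns, text):
    """Return True if any pattern occurs in text, ignoring case."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def classify_linkedin_claim(text: str) -> str:
    """Check hard promises first so the softer review path cannot downgrade them."""
    if _matches_any(HARD_PROMISES, text):
        return "high_risk"
    if _matches_any(SOFT_OUTCOMES, text):
        return "needs_review"
    return "no_signal"  # hypothetical label for "neither tier matched"
```

Checking the hard tier first is the point of the design: adding a softer review path should widen coverage without lowering the severity of copy that already triggers a hard promise signal.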

Blind holdout impact

Decision accuracy: 0.967 → 0.989
Decision mismatches: 3 → 1
Policy false negatives: 12 → 10
Policy false positives: unchanged at 7
Remaining decision miss: blind_telehealth_info_review (needs_review expected, high_risk actual), because the expected Google health policy fires with conservative high-risk scoring.
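As a sanity check, the accuracy figures follow directly from the mismatch counts, assuming the 90-row holdout size stated in AND-62. The function name below is illustrative only.

```python
TOTAL_ROWS = 90  # blind holdout size as stated in AND-62


def decision_accuracy(mismatches: int, total: int = TOTAL_ROWS) -> float:
    """Share of holdout rows whose decision matches the expected label."""
    return (total - mismatches) / total


before = decision_accuracy(3)  # 87 of 90 rows correct
after = decision_accuracy(1)   # 89 of 90 rows correct
print(f"{before:.3f} -> {after:.3f}")  # prints "0.967 -> 0.989"
```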

Test plan

make real-world-blind-ci
make real-cases-ci
make test
make pr-preflight

Refs AND-62.

linear-code · 2026-05-10T05:44:33Z

AND-62 Review remaining blind holdout misses after PR #9

Why

PR #9 merged the AdLint reliability backlog and left the blind holdout in a known conservative state: 90 rows, rule-only decision accuracy 0.967, three documented decision mismatches, 12 policy false-negative notes, and 7 policy false-positive notes.

The next useful increment is to review the three remaining decision mismatches without broadening product scope or tuning blindly against the holdout.

Remaining decision mismatch row IDs:

blind_productivity_claim_review
blind_promotion_workshop_review
blind_telehealth_info_review

Scope

Classify each remaining miss as one of:

true rule gap
label/adjudication issue
acceptable limitation
future model/extraction issue

Apply only narrow, reviewed rule or label/reporting changes when the evidence supports them. Do not change source rows just to improve the metric.
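One way to keep the row-by-row triage honest is to record each note against the fixed list of mismatch row IDs and check that none is skipped. The sketch below is hypothetical: the class and function names are not from the AdLint repository, and no classification is assigned to any row here.

```python
from dataclasses import dataclass
from enum import Enum


class MissClass(Enum):
    """The four triage categories listed in the scope above."""
    TRUE_RULE_GAP = "true rule gap"
    LABEL_ADJUDICATION_ISSUE = "label/adjudication issue"
    ACCEPTABLE_LIMITATION = "acceptable limitation"
    FUTURE_MODEL_EXTRACTION_ISSUE = "future model/extraction issue"


# Remaining decision mismatch row IDs from AND-62.
MISMATCH_ROW_IDS = (
    "blind_productivity_claim_review",
    "blind_promotion_workshop_review",
    "blind_telehealth_info_review",
)


@dataclass(frozen=True)
class TriageNote:
    row_id: str
    classification: MissClass
    rationale: str


def missing_notes(notes):
    """Return mismatch row IDs that do not yet have a triage note."""
    covered = {note.row_id for note in notes}
    return [row_id for row_id in MISMATCH_ROW_IDS if row_id not in covered]
```

Calling `missing_notes([])` returns all three row IDs, so a triage document can be treated as complete only when the result is an empty list.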

Acceptance criteria

Start from main after the merge commit 83d0355 of PR #9 (feat: complete AdLint Linear backlog).
Reproduce the current blind baseline with make real-world-blind.
Produce a row-by-row triage note for the three remaining mismatches.
If a rule change is justified, keep it narrow and explain affected policy IDs.
Preserve existing gates: make pr-preflight, make test, make real-cases-ci, make real-world-blind-ci, make policy-coverage-validate.
Update docs with before/after metrics and the holdout-boundary decision.
Open a small PR; do not bundle unrelated Meta Ads, storage, scoring, or research-loop work.

Notes

Related shipped work: PR #9, AND-28, and AND-49. Deterministic rules remain the production baseline; local model quality remains manual/scheduled diagnostics.


fix: triage blind holdout LinkedIn misses

0e18f47

ftchvs merged commit 5221198 into main May 10, 2026
2 checks passed

ftchvs deleted the codex/and-62-blind-holdout-review branch May 10, 2026 17:49

ftchvs mentioned this pull request May 10, 2026

docs: polish open source launch surface #11

Merged
