fix: triage blind holdout LinkedIn misses #10
AND-62 Review remaining blind holdout misses after PR #9
Why
PR #9 merged the AdLint reliability backlog and left the blind holdout in a known conservative state: 90 rows, rule-only decision accuracy 0.967, three documented decision mismatches, 12 policy false-negative notes, and 7 policy false-positive notes. The next useful increment is to review the three remaining decision mismatches without broadening product scope or tuning blindly against the holdout.
Remaining decision mismatch row IDs:
Scope
Classify each remaining miss as one of:
Apply only narrow, reviewed rule or label/reporting changes when the evidence supports them. Do not change source rows just to improve the metric.
Acceptance criteria
Notes
Related shipped work: PR #9, AND-28, and AND-49. Deterministic rules remain the production baseline; local model quality remains manual/scheduled diagnostics.
Summary
double your salary, guaranteed promotion, 10x productivity) high risk.
Blind holdout impact
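- Rule-only decision accuracy: 0.967 → 0.989
- Decision mismatches: 3 → 1
- Policy false-negative notes: 12 → 10
- Policy false-positive notes: 7
- Remaining decision mismatch: blind_telehealth_info_review (needs_review expected, high_risk actual), due to the expected Google health policy firing with conservative high-risk scoring.
Test plan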
- make real-world-blind-ci
- make real-cases-ci
- make test
- make pr-preflight
Refs AND-62.
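As a sanity check on the accuracy figures, the reported values are consistent with a 90-row holdout whose decision mismatches drop from three to one. The sketch below reproduces that arithmetic. It assumes accuracy is simply matching rows divided by total rows, and `decision_accuracy` is a hypothetical helper written for illustration, not part of AdLint.

```python
def decision_accuracy(total_rows: int, mismatches: int) -> float:
    """Fraction of rows whose decision matches the expected label.

    Assumption: accuracy = (rows - mismatches) / rows, with no weighting.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= mismatches <= total_rows:
        raise ValueError("mismatches must be between 0 and total_rows")
    return (total_rows - mismatches) / total_rows


HOLDOUT_ROWS = 90  # blind holdout size stated in the issue

before = decision_accuracy(HOLDOUT_ROWS, 3)  # state left by PR #9
after = decision_accuracy(HOLDOUT_ROWS, 1)   # one mismatch remaining

print(f"{before:.3f} -> {after:.3f}")  # prints "0.967 -> 0.989"
```

With only 90 rows, each resolved mismatch moves accuracy by about 0.011, so the metric alone says little about whether a rule change generalizes.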