Case Name Bleeding Fixes - Implementation Summary

Fixes Implemented

1. ✅ Validate Extracted Name Appears Before Citation

File: src/utils/unified_case_name_extractor.py

What it does:

After a case name is extracted, checks that the name actually appears in the text BEFORE the citation
Searches a 500-character window immediately preceding the citation
Rejects the name if it is not found in that window (prevents cross-contamination between citations)
Also rejects the name if it is found more than 400 characters before the citation

Impact: Prevents picking up case names from wrong citations
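
The check described above can be sketched as follows. This is a minimal illustration, not the code in unified_case_name_extractor.py: the function name and parameters are hypothetical, and it assumes the 400-character limit is measured from the end of the name to the start of the citation.

```python
def name_appears_before_citation(text, case_name, citation_start,
                                 window=500, max_distance=400):
    """Return True if case_name occurs shortly before the citation.

    Hypothetical helper: `window` and `max_distance` mirror the 500 and
    400 character limits described above.
    """
    window_start = max(0, citation_start - window)
    preceding = text[window_start:citation_start]

    # Take the occurrence closest to the citation.
    idx = preceding.rfind(case_name)
    if idx == -1:
        return False  # name never appears before this citation

    # Assumption: distance runs from the end of the name to the citation.
    distance = len(preceding) - (idx + len(case_name))
    return distance <= max_distance
```

A name that belongs to a later or unrelated citation fails the first test because it is absent from the preceding window. A name that is present but far away fails the distance test.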

2. ✅ Improve Citation Boundary Detection

File: src/utils/strict_context_isolator.py

What it does:

Uses the END position of each previous citation as the boundary (not its start)
Ensures the context contains no text from previous citations
Handles parallel citations (citations within 50 characters of each other) more carefully

Impact: Prevents case name bleeding from nearby citations
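
One way to picture the boundary logic is the sketch below. It is an assumption-laden illustration rather than the code in strict_context_isolator.py: the function name, the (start, end) span representation, and the choice to step over parallel citations so they share one context are all hypothetical. The max_lookback default of 500 is also a guess.

```python
def context_bounds(citation_start, previous_spans,
                   max_lookback=500, parallel_gap=50):
    """Return (start, end) of the text to search for this citation's name.

    previous_spans: (start, end) offsets of citations earlier in the text.
    Hypothetical helper; all names and defaults are illustrative.
    """
    context_end = citation_start
    lower_bound = None

    # Walk backwards through earlier citations, nearest first.
    for start, end in sorted(previous_spans, key=lambda s: s[1], reverse=True):
        if end > context_end:
            continue  # overlaps or follows the current position
        if context_end - end <= parallel_gap:
            # Assumed parallel citation: it shares the case name, so step
            # over it and keep looking further back.
            context_end = start
        else:
            # Unrelated earlier citation: its END is the hard boundary.
            lower_bound = end
            break

    context_start = max(0, context_end - max_lookback)
    if lower_bound is not None:
        context_start = max(context_start, lower_bound)
    return context_start, context_end
```

Because the boundary is the end of the earlier citation, none of its text can leak into the context. Note that a short case name sitting between two citations can itself be under 50 characters, so a gap threshold alone may misclassify it as a parallel citation; a real implementation probably needs an extra check on the gap's content.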

3. ✅ Reject Legal Analysis Text in Extracted Names

File: src/case_name_validator.py

What it does:

Checks that extracted names do not contain legal analysis phrases
Rejects names containing phrases such as "Frye rulings de novo", "WPLA claim", "ER 702", "We review", etc.
Rejects names that start with a legal analysis phrase

Impact: Prevents contamination from surrounding legal text
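
A minimal sketch of such a rejection rule is shown below. The phrase list contains only the examples quoted above, and the constant and function names are hypothetical, not taken from case_name_validator.py.

```python
import re

# Only the example phrases listed above; the real list is presumably longer.
LEGAL_ANALYSIS_PHRASES = [
    "Frye rulings de novo",
    "WPLA claim",
    "ER 702",
    "We review",
]

# Word boundaries keep "ER 702" from matching inside e.g. "OTHER 7020".
_PHRASE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in LEGAL_ANALYSIS_PHRASES) + r")\b",
    re.IGNORECASE,
)


def is_clean_case_name(name):
    """Return False if the name contains any legal analysis phrase."""
    return _PHRASE_RE.search(name) is None
```

Searching anywhere in the name also covers names that start with one of these phrases, so a separate prefix check is only needed if the prefix list differs from the containment list.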

4. ✅ Remove Legal Analysis Phrases from Context

File: src/utils/strict_context_isolator.py

What it does:

Removes legal analysis phrases from context BEFORE extraction
Patterns like: "Frye rulings de novo", "WPLA claim", "We review choice of law", etc.

Impact: Prevents legal text from contaminating extracted case names
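
The pre-extraction cleanup could look roughly like this. It is a sketch under assumptions: the pattern list holds only the examples named above, and the names are hypothetical rather than those used in strict_context_isolator.py.

```python
import re

# Example patterns only; an optional trailing period is removed as well.
_CONTEXT_NOISE = [
    re.compile(r"\bFrye rulings de novo\b\.?", re.IGNORECASE),
    re.compile(r"\bWPLA claim\b\.?", re.IGNORECASE),
    re.compile(r"\bWe review choice of law\b\.?", re.IGNORECASE),
]


def strip_legal_analysis(context):
    """Remove known legal analysis phrases, then normalize whitespace."""
    cleaned = context
    for pattern in _CONTEXT_NOISE:
        cleaned = pattern.sub(" ", cleaned)
    return re.sub(r"\s+", " ", cleaned).strip()
```

Running the cleanup before extraction means the extractor never sees the phrase, while the validator in fix 3 remains a second line of defense for phrases that are not yet in the list.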

Expected Results

After these fixes, you should see:

Fewer wrong extracted names - Names should match the citation they're extracted for
No legal text contamination - Names like "Frye rulings de novo. L.M. v. Hamilton" should be rejected
Better boundary detection - Case names from nearby citations shouldn't bleed through

Testing

Test with the problematic cases from your results:

Erickson v. Pharmacia LLC, 1980 - Should extract the correct name, not "Env't Def. Fund"
Rice v. Dow Chemical Co., 1994 - Should extract "Rice v. Dow Chemical Co.", not "Erickson v. Pharmacia"
State v. Copeland, 1996 - Should extract "State v. Copeland", not "Frye rulings de novo. L.M. v. Hamilton"
State v. Cauthron, 1993 - Should extract "State v. Cauthron", not "Frye hearing. State v. Copeland"
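
The cases above can be turned into a quick regression check. The sketch below assumes the extraction results are available as a dict mapping each citation label to its extracted name; that shape and the function name are hypothetical.

```python
# Known-bad names from the cases listed above, keyed by citation label.
KNOWN_BAD_NAMES = {
    "Erickson v. Pharmacia LLC, 1980": "Env't Def. Fund",
    "Rice v. Dow Chemical Co., 1994": "Erickson v. Pharmacia",
    "State v. Copeland, 1996": "Frye rulings de novo. L.M. v. Hamilton",
    "State v. Cauthron, 1993": "Frye hearing. State v. Copeland",
}


def find_regressions(results):
    """Return labels whose extracted name still contains the known-bad name.

    results: dict of citation label -> extracted case name (assumed shape).
    """
    return [
        label
        for label, bad_name in KNOWN_BAD_NAMES.items()
        if bad_name in results.get(label, "")
    ]
```

An empty list means none of the four known contamination cases has reappeared. It does not confirm that the extracted names are otherwise correct.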

Next Steps

If issues persist:

Check logs for [BOUNDARY-VALIDATION] messages to see why names are being rejected
Review context windows - May need to adjust max_lookback or boundary detection
Add more legal phrase patterns - If new contamination patterns are found
