BEFORE:
- Frontend calculated its own name_mismatch and date_mismatch flags
- Used simplified normalization that didn't match backend
- Complex logic to compare canonical vs extracted names
- 50+ lines of normalization and comparison code
AFTER:
- Frontend only reads name_mismatch and date_mismatch from backend
- Zero calculation logic on frontend
- Frontend is now a pure display layer
- Added comprehensive debug logging to track backend flags
Deleted ~50 lines:
// OLD: Complex normalization function
const norm = (s) => {
  // ... abbreviation expansion ...
  // ... date suffix removal ...
  // ... common word filtering ...
}
Replaced with:
// NEW: Just read backend flags
const hasNameMismatch = g.some(cit => cit?.name_mismatch === true)
const hasDateMismatch = g.some(cit => cit?.date_mismatch === true)
For fallback clusters:
if (hasNameMismatch || hasDateMismatch) {
  console.log(`🔍 Cluster ${idx+1} mismatch flags:`, {
    canonical_name: vname,
    extracted_name: sname,
    canonical_date: vdate,
    extracted_date: sdate,
    has_name_mismatch: hasNameMismatch,
    has_date_mismatch: hasDateMismatch,
    citation_flags: g.map(c => ({
      citation: c.citation,
      name_mismatch: c.name_mismatch,
      date_mismatch: c.date_mismatch
    }))
  })
}
For cluster display:
const hasNameMismatch = (cluster) => {
  const result = Boolean(cluster?.has_name_mismatch)
  if (result) {
    console.log('🔍 hasNameMismatch=true for cluster:', {
      cluster_id: cluster?.cluster_id,
      canonical_name: getClusterVerifyingName(cluster),
      extracted_name: getClusterSubmittedName(cluster),
      backend_flag: cluster?.has_name_mismatch
    })
  }
  return result
}
Changed to only show actual extracted names:
// Don't fall back to canonical_name - only show actually extracted names!
// If extraction failed, be honest and show 'N/A'
return 'N/A'
- All matching logic lives in backend
- Backend has sophisticated _names_equivalent() function
- Backend has _case_names_match() with 70% word overlap logic
- Backend properly expands abbreviations (Co., Inc., Dept., etc.)
- Backend strips date suffixes correctly
- All console logs show backend flags
- Can trace exactly what backend calculated
- No confusion about frontend vs backend results
- Debug logs show both canonical and extracted values
- Frontend and backend always agree
- No possibility of divergence
- Backend logic can be improved without touching frontend
- ~70 lines of code removed from frontend
- No duplicate logic to maintain
- Frontend is pure display layer
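The backend matching described above (abbreviation expansion, date suffix stripping, word overlap) is not reproduced in this summary. As a rough illustration only, here is a minimal sketch of that kind of comparison. The function names, the abbreviation table, the stop-word list, and the choice to divide the overlap by the smaller word set are all assumptions, not the actual `_case_names_match()` implementation.

```python
import re

# Hypothetical abbreviation table; the backend's real list is not shown here.
ABBREVIATIONS = {
    "co": "company",
    "inc": "incorporated",
    "dept": "department",
    "corp": "corporation",
}

# Connective words ignored when comparing names (an assumed list).
STOP_WORDS = {"v", "vs", "the", "of", "and", "in", "re"}


def normalize_words(name):
    """Strip a trailing year, lowercase, expand abbreviations, drop stop words."""
    # Remove a trailing date suffix such as "(2007)" or ", 2007".
    name = re.sub(r"[\s,]*\(?\b(?:1[89]|20)\d{2}\)?\s*$", "", name)
    words = re.findall(r"[a-z0-9]+", name.lower())
    expanded = (ABBREVIATIONS.get(w, w) for w in words)
    return {w for w in expanded if w not in STOP_WORDS}


def names_match(canonical, extracted, threshold=0.7):
    """Return True when enough significant words are shared by both names."""
    a = normalize_words(canonical)
    b = normalize_words(extracted)
    if not a or not b:
        return False  # one side has no usable words, so nothing can match
    # Assumed definition: shared words relative to the smaller word set.
    overlap = len(a & b) / min(len(a), len(b))
    return overlap >= threshold
```

With this sketch, `names_match("Acme Co. v. Smith", "Acme Company v. Smith (2007)")` is true, while comparing a canonical name against a failed extraction such as `"N/A"` is false, which would lead to a mismatch flag.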
Each citation has:
{
  "citation": "161 Wn.2d 676",
  "canonical_name": "Erwin v. Cotter Health Centers, Inc.",
  "extracted_case_name": "N/A",
  "name_mismatch": true, // ← Set by backend
  "date_mismatch": false, // ← Set by backend
  "possible_match": true
}
Each cluster aggregates citation flags:
{
  "cluster_id": "cluster_1",
  "has_name_mismatch": true, // ← true if ANY citation has name_mismatch
  "has_date_mismatch": false, // ← true if ANY citation has date_mismatch
  "mismatch_indices": [0, 2], // ← indices of citations with mismatches
  "citations": [...]
}
Annotation: citation_extraction_endpoint.py::_annotate_mismatch_flags()
- Sets name_mismatch and date_mismatch on each citation
- Uses _names_equivalent() for sophisticated matching
- Threshold: 0.4 (lowered from 0.6)
Clustering: unified_clustering_master.py
- Aggregates citation-level flags to cluster level
- Sets has_name_mismatch, has_date_mismatch, mismatch_indices
Pipeline: unified_processing_pipeline.py
- Re-annotates after clustering
- Ensures flags are consistent
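The cluster-level aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the code in unified_clustering_master.py: the function name `aggregate_mismatch_flags` is hypothetical, and citations are assumed to be plain dicts carrying the `name_mismatch` and `date_mismatch` keys shown earlier.

```python
def aggregate_mismatch_flags(cluster_id, citations):
    """Build cluster-level mismatch flags from citation-level flags.

    Hypothetical helper: a cluster is flagged if ANY of its citations is
    flagged, and mismatch_indices lists the positions of flagged citations.
    """
    mismatch_indices = [
        index
        for index, citation in enumerate(citations)
        if citation.get("name_mismatch") is True
        or citation.get("date_mismatch") is True
    ]
    return {
        "cluster_id": cluster_id,
        "has_name_mismatch": any(c.get("name_mismatch") is True for c in citations),
        "has_date_mismatch": any(c.get("date_mismatch") is True for c in citations),
        "mismatch_indices": mismatch_indices,
        "citations": citations,
    }


example = aggregate_mismatch_flags(
    "cluster_3",
    [
        {"citation": "161 Wn.2d 676", "name_mismatch": True, "date_mismatch": False},
        {"citation": "167 P.3d 1112", "name_mismatch": True, "date_mismatch": False},
    ],
)
```

Re-running a helper like this after clustering, as the pipeline step does with its own re-annotation, keeps the cluster flags derived from the citation flags rather than stored independently.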
When you open browser console, you'll now see:
🔍 Cluster 3 mismatch flags: {
  canonical_name: "Erwin v. Cotter Health Centers, Inc.",
  extracted_name: "N/A",
  canonical_date: "2007-09-20",
  extracted_date: "2007",
  has_name_mismatch: true,
  has_date_mismatch: false,
  citation_flags: [
    { citation: "161 Wn.2d 676", name_mismatch: true, date_mismatch: false },
    { citation: "167 P.3d 1112", name_mismatch: true, date_mismatch: false }
  ]
}
🔍 hasNameMismatch=true for cluster: {
  cluster_id: "cluster_3",
  canonical_name: "Erwin v. Cotter Health Centers, Inc.",
  extracted_name: "N/A",
  backend_flag: true
}
- "⚠️ Different name" warnings - Check console for details
- Backend flags - Verify they make sense given the names
- Extraction failures - "N/A" means extraction failed (correct to flag)
- Abbreviation matching - "Co." vs "Company" should NOT be flagged
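When verifying date flags, note that in the sample console output a canonical date of "2007-09-20" and an extracted date of "2007" produce `has_date_mismatch: false`. That is consistent with a comparison by year only. The sketch below shows one way such a check could work. It is an assumption drawn from that single example: the helper names are hypothetical, and treating a missing date as "no mismatch" is a guess, not confirmed backend behavior.

```python
import re


def extract_year(value):
    """Return the first four-digit year found in value, or None."""
    if not value:
        return None
    match = re.search(r"\b(?:1[6-9]|20)\d{2}\b", str(value))
    return match.group(0) if match else None


def dates_mismatch(canonical_date, extracted_date):
    """Compare two dates by year only (assumed behavior)."""
    canonical_year = extract_year(canonical_date)
    extracted_year = extract_year(extracted_date)
    if canonical_year is None or extracted_year is None:
        # Assumed: a missing or unparseable date is not flagged as a mismatch.
        return False
    return canonical_year != extracted_year
```

Under these assumptions, `dates_mismatch("2007-09-20", "2007")` returns `False`, matching the flag in the sample output, while `dates_mismatch("2007-09-20", "2005")` returns `True`.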
Frontend is now a pure display layer:
- ✅ No calculation logic
- ✅ Just displays backend data
- ✅ Comprehensive debug logging
- ✅ Easier to maintain
- ✅ Always consistent with backend
- ✅ All processing happens once (backend)
Backend is the single source of truth:
- ✅ Sophisticated name matching
- ✅ Abbreviation expansion
- ✅ Date suffix handling
- ✅ Word overlap calculation
- ✅ Sets all mismatch flags
Result:
- Frontend and backend always agree
- Easy to debug (check console logs)
- Easy to improve (just change backend)
- No duplicate logic