CaseStrainer Active Code Map

Last Updated: November 10, 2025
Purpose: Keep developers from modifying deprecated or unused files

🎯 Quick Reference: Where Is The Active Code?

| Feature | ACTIVE FILE | Deprecated/Fallback Files |
|---|---|---|
| Citation Extraction | `unified_case_extraction_master.py` | `clean_extraction_pipeline.py`, `unified_extraction_architecture.py`, `unified_case_name_extractor_v2.py` |
| Citation Clustering | `unified_clustering_master.py` | `unified_citation_clustering.py` |
| Citation Verification | `unified_verification_master.py` | (none) |
| Main Processing | `unified_processing_pipeline.py` | (none) |
| Citation Processor | `unified_citation_processor_v2.py` | (none) |

📊 Execution Flow (What Actually Runs)

1. User Uploads PDF

Frontend → Backend (app.py) → RQ Worker

2. Worker Processing

```text
# src/rq_worker.py
def process_citation_task_direct():
    ↓
# src/unified_processing_pipeline.py
class UnifiedProcessingPipeline:
    def process_citations():
        ↓
        self._extract_citations()  # Line 137
            ↓
# src/unified_citation_processor_v2.py
class UnifiedCitationProcessorV2:
    def process_text():  # Lines 4045-4124
        ↓
        # Method 1: Master extractor (PRIMARY)
        extract_case_name_and_date_unified_master()  # Line 4082
            ↓
# src/unified_case_extraction_master.py (⭐ ACTIVE CODE)
def extract_case_name_and_date_unified_master():  # Line 2491
    ↓
    extractor = get_master_extractor()
    ↓
class UnifiedCaseExtractionMaster:
    def extract_case_name_and_date():  # Line 234
        ↓
        # Line 259: DIAGNOSTIC LOG (ERROR level)
        logger.error(f"[MASTER_EXTRACT ENTRY] citation='{citation}'")

        # Lines 293-315: Strategy -0.5 (Special formats)
        self._extract_special_citation_formats()

        # Lines 320-329: Strategy 0 (Comma-anchored)
        # Lines 332-347: Strategy 1 (Position-aware)
        # Lines 350-356: Strategy 2 (Context-based)
        # Lines 359-361: Strategy 3 (Pattern-based)
        # Lines 365-393: Strategy 4 (Aggressive fallback)
```

3. Verification & Clustering

```text
# src/unified_verification_master.py
verify_citations_unified()
    ↓
# src/unified_clustering_master.py
cluster_citations_unified_master()
```

⚠️ DEPRECATED FILES (DO NOT MODIFY)

These files exist but are NOT in the main execution path:

`clean_extraction_pipeline.py`

Status: DEPRECATED (fallback only)
Why it exists: Used as fallback if master extractor fails
When it runs: Rarely - only in error recovery
How to identify: Docstring says "DEPRECATED"
What to do: Add features to unified_case_extraction_master.py instead

`unified_extraction_architecture.py`

Status: DEPRECATED
Why it exists: Old architecture, replaced by master
How to identify: Docstring says "superseded by UnifiedCaseExtractionMaster"

`unified_case_name_extractor_v2.py`

Status: DEPRECATED
Why it exists: One of 120+ duplicate extraction functions
How to identify: Delegates to extract_case_name_and_date_unified_master

🔍 How To Find Active Code (Developer Checklist)

Step 1: Identify Feature Area

Extraction? → Start with unified_case_extraction_master.py
Clustering? → Start with unified_clustering_master.py
Verification? → Start with unified_verification_master.py

Step 2: Verify It's Actually Called

```bash
# Search for imports from the module
grep -r "from.*unified_case_extraction_master import" src/

# Check for actual calls
grep -r "extract_case_name_and_date_unified_master" src/
```

Step 3: Check The Execution Path

```text
# Start from the entry point and trace forward
# Entry: src/rq_worker.py → unified_processing_pipeline.py → unified_citation_processor_v2.py
```
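To trace the path mechanically instead of by hand, a static import walk can list every `src/` module reachable from `rq_worker.py`. The sketch below is a hypothetical helper (`reachable_from` and `local_imports` are not part of CaseStrainer). It assumes modules sit directly in `src/` and are imported by name (`from src.x import ...`, `from .x import ...`, or `import x`). It only sees static `import` statements, so `importlib`-style dynamic imports are missed, and a reachable module may still be a fallback that never runs; confirm with a diagnostic log.

```python
from __future__ import annotations

import ast
import sys
from collections import deque
from pathlib import Path


def local_imports(path: Path, local_modules: set[str]) -> set[str]:
    """Return the src/ modules that `path` imports anywhere (top level, functions, try blocks)."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    found: set[str] = set()
    for node in ast.walk(tree):
        names: list[str] = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                names.append(node.module)
            # Also check imported names, to catch "from src import some_module".
            names += [alias.name for alias in node.names]
        for name in names:
            stem = name.rsplit(".", 1)[-1]  # "src.unified_x" -> "unified_x"
            if stem in local_modules:
                found.add(stem)
    return found


def reachable_from(entry: str, src_dir: str = "src") -> list[str]:
    """Breadth-first walk of static imports, starting from the entry module."""
    root = Path(src_dir)
    local_modules = {p.stem for p in root.glob("*.py")}
    seen = {entry}
    queue = deque([entry])
    while queue:
        module = queue.popleft()
        for dep in local_imports(root / f"{module}.py", local_modules):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


if __name__ == "__main__":
    src = Path(sys.argv[1] if len(sys.argv) > 1 else "src")
    if (src / "rq_worker.py").exists():
        for name in reachable_from("rq_worker", str(src)):
            print(name)
    else:
        print(f"no rq_worker.py under {src}/; pass the src directory as an argument")
```

A file you expected to be active that is missing from this list is not on the worker's import path at all.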

Step 4: Look For Deprecation Warnings

Check file docstring (first 20 lines)
Look for "DEPRECATED", "DO NOT MODIFY", "superseded by"
Check for warnings.warn()
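These checks can be run across the whole tree at once. The following is a rough, hypothetical helper (not part of the repository): it reads the first 20 lines of each `.py` file under `src/` for the marker phrases listed above and separately flags any `warnings.warn(` call. Matching is case-insensitive substring matching, so expect false positives (for example a comment saying "not deprecated", or an active file that warns for unrelated reasons); treat hits as files to inspect, not verdicts.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Marker phrases from Step 4; matched case-insensitively against the file header.
DEPRECATION_MARKERS = ("DEPRECATED", "DO NOT MODIFY", "superseded by")
HEADER_LINES = 20  # Step 4: the docstring lives in the first 20 lines


def deprecation_flags(path: Path) -> list[str]:
    """Return the deprecation signals found in one Python file."""
    text = path.read_text(encoding="utf-8", errors="replace")
    header = "\n".join(text.splitlines()[:HEADER_LINES]).lower()
    flags = [marker for marker in DEPRECATION_MARKERS if marker.lower() in header]
    if "warnings.warn(" in text:  # anywhere in the file, not just the header
        flags.append("warnings.warn()")
    return flags


def scan(src_dir: str = "src") -> dict[str, list[str]]:
    """Map each flagged file under src_dir to its deprecation signals."""
    root = Path(src_dir)
    if not root.is_dir():
        return {}
    results: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        flags = deprecation_flags(path)
        if flags:
            results[str(path)] = flags
    return results


if __name__ == "__main__":
    for name, flags in scan(sys.argv[1] if len(sys.argv) > 1 else "src").items():
        print(f"{name}: {', '.join(flags)}")
```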

🛠️ Before Making Changes

Checklist:

Read the file's docstring - does it say "DEPRECATED"?
Search for imports - is this function actually imported anywhere?
Trace execution - does the code path go through this file?
Check logs - do you see logs from this file when the feature runs?
MOST IMPORTANT: If unsure, add logger.error("TEST") and verify it appears in the worker logs

Red Flags (Don't Modify):

❌ Docstring says "DEPRECATED" or "DO NOT MODIFY"
❌ No other files import from it (except as fallback)
❌ File has a newer version (e.g., _v2, _master, _unified)
❌ Comments say "replaced by" or "superseded by"

Green Flags (Safe To Modify):

✅ Imported by main processing files
✅ Docstring says "THE SINGLE SOURCE OF TRUTH" or "AUTHORITATIVE"
✅ Contains recent fixes/updates
✅ Your test logs from this file appear in production

🔬 Testing Your Changes

1. Add Diagnostic Logging FIRST

```python
# At the entry point of your function:
logger.error(f"[YOUR-FEATURE] Function called with: {param}")
```

2. Rebuild & Test

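```bash
docker-compose -f docker-compose.prod.yml up -d --build rqworker1
# 2>&1 pipes the container's stderr too; Python's default logging handler writes there
docker logs -f casestrainer-rqworker1-prod 2>&1 | grep "YOUR-FEATURE"
```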

3. Upload Test Document

Upload 1031351.pdf (or your test case)
Watch logs in real-time

4. Verify Your Logs Appear

If you DON'T see your logs:

❌ You're modifying the wrong file
❌ Your code isn't being called
❌ Your message is below the worker's log level (use logger.error(), not logger.debug())

📝 Adding New Features

Where to add extraction improvements:

```python
# File: src/unified_case_extraction_master.py
class UnifiedCaseExtractionMaster:
    def extract_case_name_and_date(self, ...):
        # Lines 293-315: Add new special format patterns here
        # Lines 234-434: Main extraction logic
```

Where to add clustering improvements:

```python
# File: src/unified_clustering_master.py
class UnifiedClusteringMaster:
    def _normalize_case_name_for_clustering(self, name: str):
        # Lines 874-928: Add abbreviation expansions here
```

Where to add verification logic:

```python
# File: src/unified_verification_master.py
# Main verification API calls and result processing
```

🚨 Common Mistakes & Solutions

Mistake 1: Modified the wrong file for 4 hours

Symptoms:

Changes don't take effect
No diagnostic logs appear
Same issues persist after rebuild

Solution:

Check file docstring for "DEPRECATED"
Search for actual imports: grep -r "from.*yourfile import" src/
Add logger.error("TEST") and verify it appears

Mistake 2: Used logger.debug() instead of logger.error()

Symptoms:

Code changes work
But no logs appear

Solution:

Worker logging level is INFO, so logger.debug() messages are filtered out
Use logger.error() or logger.warning() for diagnostic logs
Or set the LOG_LEVEL environment variable to DEBUG
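To see why `logger.debug()` output vanishes, the sketch below reproduces the level filter in isolation. It assumes, hypothetically, that the worker reads its level from `LOG_LEVEL` with INFO as the default; the logger name `casestrainer.demo` and the `ListHandler` class are illustrative only. A record below the logger's level is dropped before any handler sees it.

```python
from __future__ import annotations

import logging
import os


class ListHandler(logging.Handler):
    """Stores each message that reaches the handler, so we can inspect what survived."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def make_worker_logger(level: str | None = None) -> tuple[logging.Logger, ListHandler]:
    """Build a logger whose level mimics the worker (hypothetical LOG_LEVEL, default INFO)."""
    level = level or os.environ.get("LOG_LEVEL", "INFO")
    logger = logging.getLogger("casestrainer.demo")  # hypothetical logger name
    logger.setLevel(level.upper())  # setLevel accepts names like "INFO" or "DEBUG"
    handler = ListHandler()
    logger.addHandler(handler)
    return logger, handler


if __name__ == "__main__":
    logger, handler = make_worker_logger("INFO")
    logger.debug("[YOUR-FEATURE] debug call")  # dropped: DEBUG is below INFO
    logger.error("[YOUR-FEATURE] error call")  # kept: ERROR is at or above INFO
    print(handler.messages)  # -> ['[YOUR-FEATURE] error call']
```

Passing `"DEBUG"` instead lets the debug record through, which is what lowering `LOG_LEVEL` achieves.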

Mistake 3: Assumed import means it's used

Symptoms:

File is imported somewhere
But changes still don't work

Solution:

Import might be for fallback only
Check if import is inside try/except
Check if there's a newer version that takes precedence
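Whether an import is a fallback can be checked mechanically as well. This hypothetical helper (`find_imports` is an illustration, not project code) parses source with Python's `ast` module and reports every import of a given module together with whether it is nested in a `try` statement, the usual shape of a fallback import. It does not decide which branch runs at runtime; it only tells you where to look.

```python
from __future__ import annotations

import ast
import sys
from pathlib import Path

# ast.TryStar (try/except*) only exists on Python 3.11+.
TRY_NODES = tuple(getattr(ast, name) for name in ("Try", "TryStar") if hasattr(ast, name))


def _matches(name: str | None, module: str) -> bool:
    """True for "module" itself or any dotted path ending in it, e.g. "src.module"."""
    return name is not None and (name == module or name.endswith("." + module))


def find_imports(source: str, module: str) -> list[tuple[int, bool]]:
    """Return (line number, inside_try) for every import of `module` in `source`."""
    hits: list[tuple[int, bool]] = []

    def visit(node: ast.AST, in_try: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.Import):
                if any(_matches(alias.name, module) for alias in child.names):
                    hits.append((child.lineno, in_try))
            elif isinstance(child, ast.ImportFrom):
                if _matches(child.module, module) or any(
                    _matches(alias.name, module) for alias in child.names
                ):
                    hits.append((child.lineno, in_try))
            # Everything nested in a try statement (body, handlers, else, finally)
            # counts as "inside try".
            visit(child, in_try or isinstance(child, TRY_NODES))

    visit(ast.parse(source), False)
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python <this script> <src_dir> <module_name>
    for path in sorted(Path(sys.argv[1]).rglob("*.py")):
        for line, in_try in find_imports(path.read_text(encoding="utf-8"), sys.argv[2]):
            note = "inside try/except (possible fallback)" if in_try else "unconditional"
            print(f"{path}:{line}: {note}")
```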

📂 File Organization Best Practices

Naming Convention:

unified_*_master.py = Current authoritative version
unified_*_v2.py = Older version (check if superseded; unified_citation_processor_v2.py, for example, is still active)
*_pipeline.py = Could be old or new (check docstring)

When Creating New Files:

Name it clearly: unified_<feature>_master.py
Add clear docstring: "THE SINGLE SOURCE OF TRUTH"
Deprecate old versions with warnings
Update this ACTIVE_CODE_MAP.md

When Deprecating Files:

Add a prominent deprecation notice to the docstring
Call warnings.warn() when the module is imported
Keep the file functional (don't delete it; deleting breaks existing imports)
Document the replacement in the docstring
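A minimal sketch of the first two steps, using a placeholder module name (`old_extractor`): the header string is what a deprecated file would start with, and the hypothetical `import_and_capture` helper imports it from a temporary directory to confirm the warning fires. Python's default warning filters hide `DeprecationWarning` unless it is triggered from `__main__`, so in the worker it may only appear when running with `-W default::DeprecationWarning` (or `PYTHONWARNINGS=default::DeprecationWarning`), and it reaches the logging system only if `logging.captureWarnings(True)` is enabled.

```python
from __future__ import annotations

import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Header a deprecated module would start with ("old_extractor" is a placeholder name).
DEPRECATED_HEADER = textwrap.dedent('''\
    """DEPRECATED - DO NOT MODIFY.

    Superseded by UnifiedCaseExtractionMaster in
    unified_case_extraction_master.py. This file stays importable only so that
    existing imports keep working; add new features to the master extractor.
    """
    import warnings

    warnings.warn(
        "old_extractor is deprecated; use unified_case_extraction_master instead",
        DeprecationWarning,
        stacklevel=2,
    )
''')


def import_and_capture(module_name: str, source: str) -> list[warnings.WarningMessage]:
    """Write `source` as a throwaway module, import it, and return the warnings it raised."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(source, encoding="utf-8")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the new directory is scanned
        try:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")  # record DeprecationWarning too
                importlib.import_module(module_name)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return list(caught)


if __name__ == "__main__":
    for w in import_and_capture("old_extractor", DEPRECATED_HEADER):
        print(f"{w.category.__name__}: {w.message}")
```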

🎓 Learning From This Session

What Went Wrong:

Spent 4+ hours modifying clean_extraction_pipeline.py
The actual code was in unified_case_extraction_master.py
No warnings indicated the file was deprecated
Code was deployed but never executed

What Would Have Helped:

✅ Clear deprecation warnings (now added)
✅ This architecture document (now created)
✅ Diagnostic logging to verify execution (already in master)
✅ Checklist before modifying files (now documented)

Prevention Strategy:

ALWAYS add logger.error("TEST") at entry point
ALWAYS verify logs appear before making real changes
ALWAYS check file docstring for deprecation
ALWAYS trace imports from entry point (rq_worker.py)

📞 Quick Commands

```bash
# Find active extraction code
grep -r "extract_case_name_and_date_unified_master" src/

# Find active clustering code
grep -r "cluster_citations_unified_master" src/

# Check what's actually running (2>&1 includes stderr, where Python logging writes by default)
docker logs casestrainer-rqworker1-prod --since 5m 2>&1 | grep "MASTER_EXTRACT"

# Verify your changes deployed
docker exec casestrainer-rqworker1-prod grep -n "YOUR_CODE" /app/src/your_file.py

# Test extraction directly
docker exec -it casestrainer-rqworker1-prod python /app/diagnostic_extraction_test.py
```

Remember: When in doubt, follow the imports from rq_worker.py → your feature!
