# Manual Test Suite

This directory contains manual verification and testing scripts that complement the automated test suite. These tests are designed to be run manually for validation, debugging, and quality assessment.

## Overview

Manual tests serve different purposes than automated tests:

- **Automated tests** (`test/unit/`, `test/integration/`, etc.): Run automatically in CI/CD
- **Manual tests** (`test/manual/`): Run manually for deep validation, debugging, or human assessment

## Test Files

### 1. `helper_function_verification.py`

**Purpose**: Verify low-level utility functions of RequirementsExtractor

**Tests**:
- `split_markdown_for_llm()` - Markdown chunking with overlap
- `parse_md_headings()` - Heading extraction and parsing
- `merge_section_lists()` - Section deduplication and merging
- `merge_requirement_lists()` - Requirement deduplication
- `extract_json_from_text()` - JSON extraction from LLM responses
- `normalize_and_validate()` - Data normalization and validation
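
As a rough illustration of what "chunking with overlap" means here, a character-based splitter might look like the sketch below. This is not the actual `split_markdown_for_llm()` implementation (which presumably splits on Markdown structure rather than raw characters); `split_with_overlap` and its parameters are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    with consecutive chunks sharing `overlap` characters of context."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final chunk reached the end of the text
        start += chunk_size - overlap
    return chunks

# 16 characters, 10-char chunks, 3-char overlap -> 2 chunks
print(split_with_overlap("abcdefghijklmnop", 10, 3))
```

The shared tail/head between adjacent chunks gives the LLM continuity across chunk boundaries.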

**When to use**:
- Debugging extraction issues
- Verifying helper function changes
- Understanding how utilities work
- Testing edge cases in isolation

**How to run**:
```bash
cd test/manual
python helper_function_verification.py
```

**Expected output**:
```
======================================================================
MANUAL VERIFICATION TESTS
Testing RequirementsExtractor Helper Functions
======================================================================

Test 1: split_markdown_for_llm()
======================================================================
Split into 2 chunks:
✓ PASSED

[... more tests ...]

======================================================================
✅ ALL TESTS PASSED
======================================================================

🎉 RequirementsExtractor helper functions are working correctly!
```

---

### 2. `quick_integration_test.py`

**Purpose**: End-to-end integration test with actual document files

**Tests**:
- Complete document extraction workflow
- Real PDF/DOCX/PPTX processing
- DocumentAgent functionality
- Performance measurement
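
Where the expected output below reports per-document timing (`✅ Completed in 8.3s`), a generic helper along these lines can produce it; `time_call` is a hypothetical name for illustration, not something from the codebase:

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with a stand-in workload instead of a real extraction call:
result, elapsed = time_call(sum, range(1000))
print(f"✅ Completed in {elapsed:.1f}s")
```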

**When to use**:
- Testing changes with real documents
- Performance benchmarking
- Quick validation before commits
- Reproducing user-reported issues

**How to run**:
```bash
cd test/manual
python quick_integration_test.py
```

**Requirements**:
- Test documents in `test/debug/samples/`:
  - `small_requirements.pdf`
  - `large_requirements.pdf`
  - `business_requirements.docx`
  - `architecture.pptx`

**Expected output**:
```
======================================================================
🧪 Quick Integration Test - Phase 2 Task 6
======================================================================

🤖 Initializing DocumentAgent...
   ✅ Agent ready

======================================================================
📄 Testing: small_requirements.pdf
======================================================================
📊 File size: 145.2 KB
🚀 Extracting requirements...

✅ Completed in 8.3s
📊 Results:
   • Sections: 5
   • Requirements: 23
     - functional: 15
     - non-functional: 8
```

---

### 3. `quality_validation.py`

**Purpose**: Human validation of AI extraction quality metrics

**Tests**:
- Confidence score accuracy
- Quality flag correctness
- Human agreement with AI assessments
- Interactive requirement review

**When to use**:
- Validating Task 7 quality metrics
- Tuning confidence thresholds
- Assessing extraction accuracy
- Gathering human validation data

**How to run**:
```bash
cd test/manual
python quality_validation.py
```

**Requirements**:
- Benchmark results in `test/test_results/benchmark_logs/benchmark_latest.json`
- Run benchmark first: `python test/benchmark/benchmark_performance.py`

**Expected output**:
```
======================================================================
Task 7 Quality Metrics - Manual Validation
======================================================================

This tool helps validate that Task 7 confidence scores and
quality flags are accurate through manual review.

📂 Loading benchmark results from: benchmark_latest.json

⚙️ Configuration:
   Sample size (default: 20): 10
   → Validating 10 requirements

[Interactive validation prompts...]

======================================================================
VALIDATION REPORT
======================================================================

📊 Overall Results (n=10):
   • Complete & well-formed: 9/10 (90.0%)
   • ID correct: 10/10 (100.0%)
   • Category correct: 9/10 (90.0%)
   • Would approve: 8/10 (80.0%)

🎯 Confidence Score Assessment:
   • Too high (overconfident): 1/10
   • About right (accurate): 8/10
   • Too low (underconfident): 1/10
```
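
The `k/n (pct%)` lines in the report above can be produced by a small formatter along these lines; `summarize` is an illustrative sketch, not the script's actual code:

```python
def summarize(tallies: dict[str, int], n: int) -> list[str]:
    """Format validation tallies as 'label: k/n (pct%)' report lines."""
    return [f"• {label}: {k}/{n} ({100.0 * k / n:.1f}%)"
            for label, k in tallies.items()]

for line in summarize({"ID correct": 10, "Would approve": 8}, n=10):
    print(f"  {line}")
```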

---

### 4. `unit_test_helpers.py`

**Purpose**: Shared utilities for manual tests

**Contents**:
- Common test fixtures
- Helper functions
- Shared mock data
- Utility classes

**When to use**:
- Import from other manual tests
- Avoid code duplication
- Share test utilities

**How to use**:
```python
from unit_test_helpers import create_test_document, mock_llm_response

# Use helper functions in your manual tests
doc = create_test_document("sample.pdf")
response = mock_llm_response()
```

---

## Comparison with Automated Tests

| Aspect | Automated Tests | Manual Tests |
|--------|----------------|--------------|
| **Location** | `test/unit/`, `test/integration/` | `test/manual/` |
| **Run by** | CI/CD (pytest) | Developer (python) |
| **Purpose** | Regression prevention | Deep validation |
| **Scope** | Unit/Integration | End-to-end + Human |
| **Speed** | Fast (< 1 min) | Slow (1-10 min) |
| **Coverage** | Comprehensive | Targeted |
| **Pass/Fail** | Automatic | Manual assessment |

## When to Use Manual Tests

### Use `helper_function_verification.py` when:
- ✅ Debugging helper function issues
- ✅ Testing edge cases in utilities
- ✅ Verifying refactoring changes
- ✅ Learning how utilities work

### Use `quick_integration_test.py` when:
- ✅ Testing with real documents
- ✅ Measuring performance
- ✅ Reproducing user issues
- ✅ Validating before major commits

### Use `quality_validation.py` when:
- ✅ Tuning confidence thresholds
- ✅ Validating AI quality metrics
- ✅ Assessing extraction accuracy
- ✅ Gathering validation data

## Running All Manual Tests

```bash
# Run helper function verification
python test/manual/helper_function_verification.py

# Run quick integration test (requires test documents)
python test/manual/quick_integration_test.py

# Run quality validation (requires benchmark results)
python test/manual/quality_validation.py
```

## Test Data Requirements

### For `quick_integration_test.py`:

Create test documents in `test/debug/samples/`:

```
test/debug/samples/
├── small_requirements.pdf        # Small requirements doc (100-200 KB)
├── large_requirements.pdf        # Large requirements doc (1-5 MB)
├── business_requirements.docx    # Business requirements
└── architecture.pptx             # Architecture diagrams
```

### For `quality_validation.py`:

Run the benchmark first to generate results:

```bash
python test/benchmark/benchmark_performance.py
# Creates: test/test_results/benchmark_logs/benchmark_latest.json
```

## Integration with Automated Tests

Manual tests complement automated tests:

```
test/
├── unit/           # Fast, isolated unit tests (pytest)
├── integration/    # API integration tests (pytest)
├── smoke/          # Basic functionality tests (pytest)
├── e2e/            # End-to-end workflows (pytest)
├── manual/         # Manual verification tests (python)
│   ├── helper_function_verification.py
│   ├── quick_integration_test.py
│   ├── quality_validation.py
│   └── unit_test_helpers.py
└── benchmark/      # Performance benchmarks (python)
    └── benchmark_performance.py
```

## Best Practices

1. **Run manual tests before major commits**
   - Especially when changing core extraction logic
   - Catches issues that automated tests may miss

2. **Use for debugging**
   - Manual tests provide more visibility
   - Easier to add print statements and inspect state

3. **Document findings**
   - If manual tests reveal issues, add automated tests
   - Convert manual test cases to automated ones when possible

4. **Keep them simple**
   - Manual tests should be easy to run
   - No complex setup or dependencies

5. **Update with code changes**
   - Keep manual tests in sync with API changes
   - Update when helper functions change

## Troubleshooting

### "Module not found" errors

Ensure `PYTHONPATH` is set:
```bash
export PYTHONPATH=.
python test/manual/helper_function_verification.py
```

Or run from the project root:
```bash
cd /path/to/unstructuredDataHandler
python test/manual/helper_function_verification.py
```

### "File not found" for test documents

Create the test documents or update the paths in the test files:
```python
# In quick_integration_test.py
samples_dir = Path(__file__).parent.parent / "debug" / "samples"
```

### Benchmark results not found

Run the benchmark first:
```bash
python test/benchmark/benchmark_performance.py
```

## Contributing

When adding new manual tests:

1. **Name clearly**: `*_verification.py` or `*_validation.py`
2. **Add to this README**: Document the purpose and usage
3. **Keep it focused**: Each test should have a clear purpose
4. **Make it executable**: Add a shebang and a `__main__` guard
5. **Document requirements**: List dependencies and required test data
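
A minimal skeleton satisfying points 4 and 5 might look like this; the placeholder check and names are illustrative, not project code:

```python
#!/usr/bin/env python3
"""Manual verification skeleton: run directly with `python`, not via pytest."""

def run_checks() -> bool:
    # Placeholder check -- replace with real assertions against the code under test.
    sample = "REQ-001: The system shall log all requests."
    passed = sample.startswith("REQ-")
    print("✓ PASSED" if passed else "✗ FAILED")
    return passed

if __name__ == "__main__":
    ok = run_checks()
    # A real script may signal failure to the shell, e.g. sys.exit(0 if ok else 1)
```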

## Summary

Manual tests in this directory provide:

- ✅ **Verification**: Helper function correctness
- ✅ **Integration**: End-to-end workflow validation
- ✅ **Quality**: Human assessment of AI metrics
- ✅ **Debugging**: Detailed visibility into operations
- ✅ **Performance**: Real-world timing data

Use them alongside automated tests for comprehensive validation!

---

**Last Updated**: October 7, 2025
**Maintainer**: Development Team