# Manual Test Suite

This directory contains manual verification and testing scripts that complement the automated test suite. These tests are designed to be run manually for validation, debugging, and quality assessment.

## Overview

Manual tests serve different purposes than automated tests:

- **Automated tests** (`test/unit/`, `test/integration/`, etc.): Run automatically in CI/CD
- **Manual tests** (`test/manual/`): Run manually for deep validation, debugging, or human assessment

## Test Files

### 1. `helper_function_verification.py`

**Purpose**: Verify low-level utility functions of RequirementsExtractor

**Tests**:
- `split_markdown_for_llm()` - Markdown chunking with overlap
- `parse_md_headings()` - Heading extraction and parsing
- `merge_section_lists()` - Section deduplication and merging
- `merge_requirement_lists()` - Requirement deduplication
- `extract_json_from_text()` - JSON extraction from LLM responses
- `normalize_and_validate()` - Data normalization and validation
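
As a rough illustration of what "chunking with overlap" means here, a character-based splitter might look like the sketch below. This is not the actual `split_markdown_for_llm()` implementation (which presumably splits on Markdown structure rather than raw characters); `split_with_overlap` and its parameters are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    with consecutive chunks sharing `overlap` characters of context."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final chunk reached the end of the text
        start += chunk_size - overlap
    return chunks

# 16 characters, 10-char chunks, 3-char overlap -> 2 chunks
print(split_with_overlap("abcdefghijklmnop", 10, 3))
```

The shared tail/head between adjacent chunks gives the LLM continuity across chunk boundaries.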

**When to use**:
- Debugging extraction issues
- Verifying helper function changes
- Understanding how utilities work
- Testing edge cases in isolation

**How to run**:
```bash
cd test/manual
python helper_function_verification.py
```

**Expected output**:
```
======================================================================
MANUAL VERIFICATION TESTS
Testing RequirementsExtractor Helper Functions
======================================================================

Test 1: split_markdown_for_llm()
======================================================================
Split into 2 chunks:
✓ PASSED

[... more tests ...]

======================================================================
✅ ALL TESTS PASSED
======================================================================

🎉 RequirementsExtractor helper functions are working correctly!
```

---

### 2. `quick_integration_test.py`

**Purpose**: End-to-end integration test with actual document files

**Tests**:
- Complete document extraction workflow
- Real PDF/DOCX/PPTX processing
- DocumentAgent functionality
- Performance measurement
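
Where the expected output below reports per-document timing (`✅ Completed in 8.3s`), a generic helper along these lines can produce it; `time_call` is a hypothetical name for illustration, not something from the codebase:

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with a stand-in workload instead of a real extraction call:
result, elapsed = time_call(sum, range(1000))
print(f"✅ Completed in {elapsed:.1f}s")
```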

**When to use**:
- Testing changes with real documents
- Performance benchmarking
- Quick validation before commits
- Reproducing user-reported issues

**How to run**:
```bash
cd test/manual
python quick_integration_test.py
```

**Requirements**:
- Test documents in `test/debug/samples/`:
  - `small_requirements.pdf`
  - `large_requirements.pdf`
  - `business_requirements.docx`
  - `architecture.pptx`

**Expected output**:
```
======================================================================
🧪 Quick Integration Test - Phase 2 Task 6
======================================================================

🤖 Initializing DocumentAgent...
   ✅ Agent ready

======================================================================
📄 Testing: small_requirements.pdf
======================================================================
📊 File size: 145.2 KB
🚀 Extracting requirements...

✅ Completed in 8.3s
📊 Results:
   • Sections: 5
   • Requirements: 23
     - functional: 15
     - non-functional: 8
```

---

### 3. `quality_validation.py`

**Purpose**: Human validation of AI extraction quality metrics

**Tests**:
- Confidence score accuracy
- Quality flag correctness
- Human agreement with AI assessments
- Interactive requirement review

**When to use**:
- Validating Task 7 quality metrics
- Tuning confidence thresholds
- Assessing extraction accuracy
- Gathering human validation data

**How to run**:
```bash
cd test/manual
python quality_validation.py
```

**Requirements**:
- Benchmark results in `test/test_results/benchmark_logs/benchmark_latest.json`
- Run benchmark first: `python test/benchmark/benchmark_performance.py`

**Expected output**:
```
======================================================================
Task 7 Quality Metrics - Manual Validation
======================================================================

This tool helps validate that Task 7 confidence scores and
quality flags are accurate through manual review.

📂 Loading benchmark results from: benchmark_latest.json

⚙️ Configuration:
   Sample size (default: 20): 10
   → Validating 10 requirements

[Interactive validation prompts...]

======================================================================
VALIDATION REPORT
======================================================================

📊 Overall Results (n=10):
   • Complete & well-formed: 9/10 (90.0%)
   • ID correct: 10/10 (100.0%)
   • Category correct: 9/10 (90.0%)
   • Would approve: 8/10 (80.0%)

🎯 Confidence Score Assessment:
   • Too high (overconfident): 1/10
   • About right (accurate): 8/10
   • Too low (underconfident): 1/10
```
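
The `k/n (pct%)` lines in the report above can be produced by a small formatter along these lines; `summarize` is an illustrative sketch, not the script's actual code:

```python
def summarize(tallies: dict[str, int], n: int) -> list[str]:
    """Format validation tallies as 'label: k/n (pct%)' report lines."""
    return [f"• {label}: {k}/{n} ({100.0 * k / n:.1f}%)"
            for label, k in tallies.items()]

for line in summarize({"ID correct": 10, "Would approve": 8}, n=10):
    print(f"  {line}")
```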

---

### 4. `unit_test_helpers.py`

**Purpose**: Shared utilities for manual tests

**Contents**:
- Common test fixtures
- Helper functions
- Shared mock data
- Utility classes

**When to use**:
- Import from other manual tests
- Avoid code duplication
- Share test utilities

**How to use**:
```python
from unit_test_helpers import create_test_document, mock_llm_response

# Use helper functions in your manual tests
doc = create_test_document("sample.pdf")
response = mock_llm_response()
```

---

## Comparison with Automated Tests

| Aspect | Automated Tests | Manual Tests |
|--------|----------------|--------------|
| **Location** | `test/unit/`, `test/integration/` | `test/manual/` |
| **Run by** | CI/CD (pytest) | Developer (python) |
| **Purpose** | Regression prevention | Deep validation |
| **Scope** | Unit/Integration | End-to-end + Human |
| **Speed** | Fast (< 1 min) | Slow (1-10 min) |
| **Coverage** | Comprehensive | Targeted |
| **Pass/Fail** | Automatic | Manual assessment |

## When to Use Manual Tests

### Use `helper_function_verification.py` when:
- ✅ Debugging helper function issues
- ✅ Testing edge cases in utilities
- ✅ Verifying refactoring changes
- ✅ Learning how utilities work

### Use `quick_integration_test.py` when:
- ✅ Testing with real documents
- ✅ Measuring performance
- ✅ Reproducing user issues
- ✅ Validating before major commits

### Use `quality_validation.py` when:
- ✅ Tuning confidence thresholds
- ✅ Validating AI quality metrics
- ✅ Assessing extraction accuracy
- ✅ Gathering validation data

## Running All Manual Tests

```bash
# Run helper function verification
python test/manual/helper_function_verification.py

# Run quick integration test (requires test documents)
python test/manual/quick_integration_test.py

# Run quality validation (requires benchmark results)
python test/manual/quality_validation.py
```

## Test Data Requirements

### For `quick_integration_test.py`:

Create test documents in `test/debug/samples/`:

```
test/debug/samples/
├── small_requirements.pdf        # Small requirements doc (100-200 KB)
├── large_requirements.pdf        # Large requirements doc (1-5 MB)
├── business_requirements.docx    # Business requirements
└── architecture.pptx             # Architecture diagrams
```

### For `quality_validation.py`:

Run the benchmark first to generate results:

```bash
python test/benchmark/benchmark_performance.py
# Creates: test/test_results/benchmark_logs/benchmark_latest.json
```

## Integration with Automated Tests

Manual tests complement automated tests:

```
test/
├── unit/           # Fast, isolated unit tests (pytest)
├── integration/    # API integration tests (pytest)
├── smoke/          # Basic functionality tests (pytest)
├── e2e/            # End-to-end workflows (pytest)
├── manual/         # Manual verification tests (python)
│   ├── helper_function_verification.py
│   ├── quick_integration_test.py
│   ├── quality_validation.py
│   └── unit_test_helpers.py
└── benchmark/      # Performance benchmarks (python)
    └── benchmark_performance.py
```

## Best Practices

1. **Run manual tests before major commits**
   - Especially when changing core extraction logic
   - Catches issues that automated tests may miss

2. **Use for debugging**
   - Manual tests provide more visibility
   - Easier to add print statements and inspect state

3. **Document findings**
   - If manual tests reveal issues, add automated tests
   - Convert manual test cases to automated ones when possible

4. **Keep them simple**
   - Manual tests should be easy to run
   - No complex setup or dependencies

5. **Update with code changes**
   - Keep manual tests in sync with API changes
   - Update when helper functions change

## Troubleshooting

### "Module not found" errors

Ensure `PYTHONPATH` is set:
```bash
export PYTHONPATH=.
python test/manual/helper_function_verification.py
```

Or run from the project root:
```bash
cd /path/to/unstructuredDataHandler
python test/manual/helper_function_verification.py
```

### "File not found" for test documents

Create the test documents or update the paths in the test files:
```python
# In quick_integration_test.py
samples_dir = Path(__file__).parent.parent / "debug" / "samples"
```

### Benchmark results not found

Run the benchmark first:
```bash
python test/benchmark/benchmark_performance.py
```

## Contributing

When adding new manual tests:

1. **Name clearly**: `*_verification.py` or `*_validation.py`
2. **Add to this README**: Document the purpose and usage
3. **Keep it focused**: Each test should have a clear purpose
4. **Make it executable**: Add a shebang and a `__main__` guard
5. **Document requirements**: List dependencies and required test data
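
A minimal skeleton satisfying points 4 and 5 might look like this; the placeholder check and names are illustrative, not project code:

```python
#!/usr/bin/env python3
"""Manual verification skeleton: run directly with `python`, not via pytest."""

def run_checks() -> bool:
    # Placeholder check -- replace with real assertions against the code under test.
    sample = "REQ-001: The system shall log all requests."
    passed = sample.startswith("REQ-")
    print("✓ PASSED" if passed else "✗ FAILED")
    return passed

if __name__ == "__main__":
    ok = run_checks()
    # A real script may signal failure to the shell, e.g. sys.exit(0 if ok else 1)
```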

## Summary

Manual tests in this directory provide:

- ✅ **Verification**: Helper function correctness
- ✅ **Integration**: End-to-end workflow validation
- ✅ **Quality**: Human assessment of AI metrics
- ✅ **Debugging**: Detailed visibility into operations
- ✅ **Performance**: Real-world timing data

Use them alongside automated tests for comprehensive validation!

---

**Last Updated**: October 7, 2025
**Maintainer**: Development Team