
Commit aeb4b34

docs: add comprehensive README for manual test suite
- Document all manual test files and their purposes
- Explain differences from automated tests
- Provide usage instructions and examples
- Add troubleshooting guide
- Include best practices for manual testing
- Clarify when to use each manual test
- Document test data requirements
1 parent e59c569 commit aeb4b34


test/manual/README.md

Lines changed: 367 additions & 0 deletions
# Manual Test Suite

This directory contains manual verification and testing scripts that complement the automated test suite. These tests are designed to be run manually for validation, debugging, and quality assessment.

## Overview

Manual tests serve different purposes than automated tests:

- **Automated tests** (`test/unit/`, `test/integration/`, etc.): Run automatically in CI/CD
- **Manual tests** (`test/manual/`): Run manually for deep validation, debugging, or human assessment

## Test Files

### 1. `helper_function_verification.py`

**Purpose**: Verify the low-level utility functions of `RequirementsExtractor`

**Tests** (see the sketch after this list):
- `split_markdown_for_llm()` - Markdown chunking with overlap
- `parse_md_headings()` - Heading extraction and parsing
- `merge_section_lists()` - Section deduplication and merging
- `merge_requirement_lists()` - Requirement deduplication
- `extract_json_from_text()` - JSON extraction from LLM responses
- `normalize_and_validate()` - Data normalization and validation

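For orientation, here is a minimal sketch of the style of checks the script performs. The import path and all parameter names are assumptions for illustration, not the actual API; the real calls live in `helper_function_verification.py`.

```python
# Sketch only: the module path and parameters are assumed, not the real API.
# See helper_function_verification.py for the actual calls and arguments.
from requirements_extractor import (  # hypothetical import path
    split_markdown_for_llm,
    extract_json_from_text,
)

markdown = "# Spec\n\nThe system shall authenticate users.\n" * 50
chunks = split_markdown_for_llm(markdown, max_chars=2000, overlap=200)  # assumed params
print(f"Split into {len(chunks)} chunks")

llm_reply = 'Sure! Here is the JSON: {"requirements": []} Let me know if you need more.'
data = extract_json_from_text(llm_reply)  # strips the chatter, returns the parsed object
print(data)
```
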
**When to use**:
- Debugging extraction issues
- Verifying helper function changes
- Understanding how utilities work
- Testing edge cases in isolation

**How to run**:
```bash
cd test/manual
python helper_function_verification.py
```

**Expected output**:
```
======================================================================
MANUAL VERIFICATION TESTS
Testing RequirementsExtractor Helper Functions
======================================================================

Test 1: split_markdown_for_llm()
======================================================================
Split into 2 chunks:
✓ PASSED

[... more tests ...]

======================================================================
✅ ALL TESTS PASSED
======================================================================

🎉 RequirementsExtractor helper functions are working correctly!
```

---

### 2. `quick_integration_test.py`

**Purpose**: End-to-end integration test with actual document files

**Tests** (see the sketch after this list):
- Complete document extraction workflow
- Real PDF/DOCX/PPTX processing
- DocumentAgent functionality
- Performance measurement

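To show the shape of the workflow, here is a hedged sketch. The import path, the `DocumentAgent` constructor, the `extract_requirements` method, and the result structure are all assumptions for illustration; consult `quick_integration_test.py` for the real API.

```python
# Sketch only: the import path, method name, and result shape are assumed.
# quick_integration_test.py contains the actual workflow.
import time
from pathlib import Path

from document_agent import DocumentAgent  # hypothetical import path

agent = DocumentAgent()
doc = Path("test/debug/samples/small_requirements.pdf")

start = time.perf_counter()
result = agent.extract_requirements(doc)  # assumed method name
elapsed = time.perf_counter() - start

print(f"Completed in {elapsed:.1f}s")
print(f"Sections: {len(result.get('sections', []))}")          # assumed result shape
print(f"Requirements: {len(result.get('requirements', []))}")
```
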
**When to use**:
- Testing changes with real documents
- Performance benchmarking
- Quick validation before commits
- Reproducing user-reported issues

**How to run**:
```bash
cd test/manual
python quick_integration_test.py
```

**Requirements**:
- Test documents in `test/debug/samples/`:
  - `small_requirements.pdf`
  - `large_requirements.pdf`
  - `business_requirements.docx`
  - `architecture.pptx`

**Expected output**:
```
======================================================================
🧪 Quick Integration Test - Phase 2 Task 6
======================================================================

🤖 Initializing DocumentAgent...
✅ Agent ready

======================================================================
📄 Testing: small_requirements.pdf
======================================================================
📊 File size: 145.2 KB
🚀 Extracting requirements...

✅ Completed in 8.3s
📊 Results:
  • Sections: 5
  • Requirements: 23
    - functional: 15
    - non-functional: 8
```

---

### 3. `quality_validation.py`

**Purpose**: Human validation of AI extraction quality metrics

**Tests** (see the sketch after this list):
- Confidence score accuracy
- Quality flag correctness
- Human agreement with AI assessments
- Interactive requirement review

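The closing report aggregates reviewer verdicts into the percentages shown in the expected output below. A minimal sketch of that arithmetic (the verdict structure here is a simplifying assumption, not the script's real data format):

```python
# Sketch of the agreement arithmetic behind the validation report.
# The verdict dicts are an assumed structure for illustration only.
verdicts = [
    {"well_formed": True, "id_correct": True, "would_approve": True},
    {"well_formed": False, "id_correct": True, "would_approve": False},
    # ... one entry per manually reviewed requirement
]

n = len(verdicts)
for field in ("well_formed", "id_correct", "would_approve"):
    hits = sum(1 for v in verdicts if v[field])
    print(f"{field}: {hits}/{n} ({100.0 * hits / n:.1f}%)")
```
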
**When to use**:
- Validating Task 7 quality metrics
- Tuning confidence thresholds
- Assessing extraction accuracy
- Gathering human validation data

**How to run**:
```bash
cd test/manual
python quality_validation.py
```

**Requirements**:
- Benchmark results in `test/test_results/benchmark_logs/benchmark_latest.json`
- Run benchmark first: `python test/benchmark/benchmark_performance.py`

**Expected output**:
```
======================================================================
Task 7 Quality Metrics - Manual Validation
======================================================================

This tool helps validate that Task 7 confidence scores and
quality flags are accurate through manual review.

📂 Loading benchmark results from: benchmark_latest.json

⚙️ Configuration:
  Sample size (default: 20): 10
  → Validating 10 requirements

[Interactive validation prompts...]

======================================================================
VALIDATION REPORT
======================================================================

📊 Overall Results (n=10):
  • Complete & well-formed: 9/10 (90.0%)
  • ID correct: 10/10 (100.0%)
  • Category correct: 9/10 (90.0%)
  • Would approve: 8/10 (80.0%)

🎯 Confidence Score Assessment:
  • Too high (overconfident): 1/10
  • About right (accurate): 8/10
  • Too low (underconfident): 1/10
```

---

### 4. `unit_test_helpers.py`

**Purpose**: Shared utilities for manual tests

**Contents**:
- Common test fixtures
- Helper functions
- Shared mock data
- Utility classes

**When to use**:
- Import from other manual tests
- Avoid code duplication
- Share test utilities

**How to use**:
```python
from unit_test_helpers import create_test_document, mock_llm_response

# Use helper functions in your manual tests
doc = create_test_document("sample.pdf")
response = mock_llm_response()
```

---

## Comparison with Automated Tests

| Aspect | Automated Tests | Manual Tests |
|--------|-----------------|--------------|
| **Location** | `test/unit/`, `test/integration/` | `test/manual/` |
| **Run by** | CI/CD (pytest) | Developer (python) |
| **Purpose** | Regression prevention | Deep validation |
| **Scope** | Unit/Integration | End-to-end + human review |
| **Speed** | Fast (< 1 min) | Slow (1-10 min) |
| **Coverage** | Comprehensive | Targeted |
| **Pass/Fail** | Automatic | Manual assessment |

## When to Use Manual Tests

### Use `helper_function_verification.py` when:
- ✅ Debugging helper function issues
- ✅ Testing edge cases in utilities
- ✅ Verifying refactoring changes
- ✅ Learning how utilities work

### Use `quick_integration_test.py` when:
- ✅ Testing with real documents
- ✅ Measuring performance
- ✅ Reproducing user issues
- ✅ Validating before major commits

### Use `quality_validation.py` when:
- ✅ Tuning confidence thresholds
- ✅ Validating AI quality metrics
- ✅ Assessing extraction accuracy
- ✅ Gathering validation data

## Running All Manual Tests

```bash
# Run helper function verification
python test/manual/helper_function_verification.py

# Run quick integration test (requires test documents)
python test/manual/quick_integration_test.py

# Run quality validation (requires benchmark results)
python test/manual/quality_validation.py
```

## Test Data Requirements

### For `quick_integration_test.py`:

Create test documents in `test/debug/samples/`:

```
test/debug/samples/
├── small_requirements.pdf        # Small requirements doc (100-200 KB)
├── large_requirements.pdf        # Large requirements doc (1-5 MB)
├── business_requirements.docx    # Business requirements
└── architecture.pptx             # Architecture diagrams
```

### For `quality_validation.py`:

Run the benchmark first to generate results:

```bash
python test/benchmark/benchmark_performance.py
# Creates: test/test_results/benchmark_logs/benchmark_latest.json
```

## Integration with Automated Tests

Manual tests complement automated tests:

```
test/
├── unit/          # Fast, isolated unit tests (pytest)
├── integration/   # API integration tests (pytest)
├── smoke/         # Basic functionality tests (pytest)
├── e2e/           # End-to-end workflows (pytest)
├── manual/        # Manual verification tests (python)
│   ├── helper_function_verification.py
│   ├── quick_integration_test.py
│   ├── quality_validation.py
│   └── unit_test_helpers.py
└── benchmark/     # Performance benchmarks (python)
    └── benchmark_performance.py
```

## Best Practices

1. **Run manual tests before major commits**
   - Especially when changing core extraction logic
   - Catches issues that the automated tests may miss

2. **Use for debugging**
   - Manual tests provide more visibility
   - Easier to add print statements and inspect state

3. **Document findings**
   - If manual tests reveal issues, add automated tests
   - Convert manual test cases to automated ones when possible

4. **Keep them simple**
   - Manual tests should be easy to run
   - No complex setup or dependencies

5. **Update with code changes**
   - Keep manual tests in sync with API changes
   - Update when helper functions change

## Troubleshooting

### "Module not found" errors

Ensure `PYTHONPATH` is set:
```bash
export PYTHONPATH=.
python test/manual/helper_function_verification.py
```

Or run from the project root:
```bash
cd /path/to/unstructuredDataHandler
python test/manual/helper_function_verification.py
```

### "File not found" for test documents

Create the test documents, or update the paths in the test files:
```python
# In quick_integration_test.py
from pathlib import Path  # required for the path handling below

samples_dir = Path(__file__).parent.parent / "debug" / "samples"
```

### Benchmark results not found

Run the benchmark first:
```bash
python test/benchmark/benchmark_performance.py
```

## Contributing

When adding new manual tests:

1. **Name clearly**: `*_verification.py` or `*_validation.py`
2. **Add to this README**: Document the purpose and usage
3. **Keep focused**: Each test should have a clear purpose
4. **Make executable**: Add a shebang and a main guard (see the skeleton below)
5. **Document requirements**: List dependencies and test data

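A minimal skeleton covering points 4 and 5; the filename and the check are placeholders, not an existing test:

```python
#!/usr/bin/env python3
"""my_feature_verification.py - placeholder example of a new manual test.

Requirements: none beyond the standard library.
"""


def main() -> int:
    # ... run your checks and print results here ...
    print("ALL TESTS PASSED")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
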
## Summary

Manual tests in this directory provide:

- **Verification**: Helper function correctness
- **Integration**: End-to-end workflow validation
- **Quality**: Human assessment of AI metrics
- **Debugging**: Detailed visibility into operations
- **Performance**: Real-world timing data

Use them alongside automated tests for comprehensive validation!

---

**Last Updated**: October 7, 2025
**Maintainer**: Development Team
