| 1 | +# Test and Example Files Cleanup Plan |
| 2 | + |
| 3 | +## Analysis of Redundancy |
| 4 | + |
| 5 | +### 1. Test Files Analysis |
| 6 | + |
| 7 | +#### Redundant/Similar Test Files: |
| 8 | +1. **`test_examples.py`** vs **`test_client.py`** + **`test_async_client.py`** |
| 9 | + - `test_examples.py`: Tests example scripts execution (integration-style) |
| 10 | + - `test_client.py`/`test_async_client.py`: Unit tests for client methods |
| 11 | + - **Status**: NOT redundant - different purposes (integration vs unit tests) |
| 12 | + |
| 13 | +2. **`test_tools.py`** vs **`test_tools_coverage.py`** |
| 14 | + - `test_tools.py`: Tests tool classes and serialization |
| 15 | + - `test_tools_coverage.py`: Tests all tool classes for contract compliance |
| 16 | + - **Status**: POTENTIALLY redundant - both test tool serialization |
| 17 | + - **Recommendation**: Merge into single comprehensive test file |
| 18 | + |
| 19 | +3. **`test_client_errors.py`** vs **`test_async_client_errors.py`** |
| 20 | + - Both test error handling |
| 21 | + - **Status**: NOT redundant - sync vs async versions needed |
| 22 | + |
| 23 | +4. **`test_task_status_and_wait.py`** |
| 24 | + - Tests task status and waiting logic |
| 25 | + - **Status**: Could be merged into `test_client.py` for better organization |
| 26 | + |
| 27 | +### 2. Example Files Analysis |
| 28 | + |
| 29 | +#### Redundant/Similar Example Files: |
| 30 | +1. **`quick_start.py`** vs **`full_acceptance_test.py`** |
| 31 | + - `quick_start.py`: Quick validation of core features |
| 32 | + - `full_acceptance_test.py`: Comprehensive acceptance test suite |
| 33 | + - **Status**: NOT redundant - different scopes (quick vs comprehensive) |
| 34 | + |
| 35 | +2. **`demo_web_scraper_api.py`** vs **`demo_web_scraper_multi_spider.py`** |
| 36 | + - `demo_web_scraper_api.py`: Single spider workflow demo |
| 37 | + - `demo_web_scraper_multi_spider.py`: Multi-spider test |
| 38 | + - **Status**: NOT redundant - different use cases |
| 39 | + |
| 40 | +3. **`demo_universal.py`** vs **`quick_start.py`** (Universal section) |
| 41 | + - Both demonstrate Universal API |
| 42 | + - **Status**: MINOR redundancy - `demo_universal.py` is more focused |
| 43 | + - **Recommendation**: Keep both (focused demo vs quick start) |
| 44 | + |
| 45 | +4. **`validate_env.py`** |
| 46 | + - Validates environment variables |
| 47 | + - **Status**: Unique utility, keep |
| 48 | + |
| 49 | +5. **`diagnose_network.py`** |
| 50 | + - Network diagnostics utility |
| 51 | + - **Status**: Unique utility, keep |
| 52 | + |
| 53 | +### 3. Integration Test Files |
| 54 | + |
| 55 | +#### Important Integration Tests: |
| 56 | +1. **`test_integration_proxy_protocols.py`** |
| 57 | + - Tests proxy protocol connectivity (HTTPS, SOCKS5h) |
| 58 | + - **Status**: IMPORTANT - validates proxy network functionality |
| 59 | + - **Requires**: `THORDATA_INTEGRATION=true`, proxy credentials |
| 60 | + |
| 61 | +2. **`test_tools_coverage.py`** (with integration flag) |
| 62 | + - Tests all tool classes with real API calls |
| 63 | + - **Status**: IMPORTANT - validates tool contracts |
| 64 | + - **Requires**: `THORDATA_INTEGRATION=true` |
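One way the `THORDATA_INTEGRATION=true` requirement could be enforced is a skip guard at the top of each integration test module. The sketch below uses only the standard library; `integration_enabled` and `ProxyProtocolSmokeTest` are hypothetical names, and the case-insensitive comparison is an assumption, since the plan only shows the literal value `true`.

```python
import os
import unittest


def integration_enabled(env=None):
    """Return True when the integration flag is set to "true".

    Assumption: the value is compared case-insensitively.
    """
    env = os.environ if env is None else env
    return env.get("THORDATA_INTEGRATION", "").strip().lower() == "true"


@unittest.skipUnless(integration_enabled(), "set THORDATA_INTEGRATION=true to run")
class ProxyProtocolSmokeTest(unittest.TestCase):  # hypothetical test case
    def test_flag_is_set(self):
        # A real integration test would make a network call here;
        # this placeholder only confirms the guard let the test run.
        self.assertTrue(integration_enabled())
```

With the flag unset, the whole class is reported as skipped rather than failed, which keeps the default unit test run free of credential requirements.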
| 65 | + |
| 66 | +## Cleanup Recommendations |
| 67 | + |
| 68 | +### High Priority (Remove/Merge) |
| 69 | + |
| 70 | +1. **Merge `test_tools.py` into `test_tools_coverage.py`** |
| 71 | + - Both test tool serialization |
| 72 | + - `test_tools_coverage.py` is more comprehensive |
| 73 | + - **Action**: Merge functionality, remove `test_tools.py` |
| 74 | + |
| 75 | +2. **Merge `test_task_status_and_wait.py` into `test_client.py`** |
| 76 | + - Task-related tests should be in main client test file |
| 77 | + - **Action**: Move tests, remove separate file |
| 78 | + |
| 79 | +### Medium Priority (Review/Consolidate) |
| 80 | + |
| 81 | +3. **Review example file organization** |
| 82 | + - Consider grouping by category: |
| 83 | + - `examples/basic/` - quick_start.py, validate_env.py |
| 84 | + - `examples/demos/` - demo_*.py files |
| 85 | + - `examples/tools/` - tool-specific examples (already exists) |
| 86 | + - **Action**: Reorganize for better discoverability |
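The grouping above can be expressed as a small lookup, which makes it easy to do a dry run before moving any files. This is a sketch only: `CATEGORY_RULES` and `category_for` are hypothetical names, and files the plan does not assign to a category (for example `diagnose_network.py`) are deliberately left unmapped rather than guessed.

```python
from fnmatch import fnmatchcase

# Category -> filename patterns, taken from the grouping proposed in the plan.
CATEGORY_RULES = [
    ("basic", ["quick_start.py", "validate_env.py"]),
    ("demos", ["demo_*.py"]),
]


def category_for(filename):
    """Return the proposed subdirectory for an example file, or None."""
    for category, patterns in CATEGORY_RULES:
        if any(fnmatchcase(filename, pattern) for pattern in patterns):
            return category
    return None  # not assigned by the plan; leave the file where it is


for name in ["quick_start.py", "demo_universal.py", "diagnose_network.py"]:
    print(name, "->", category_for(name))
```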
| 87 | + |
| 88 | +### Low Priority (Keep but Document) |
| 89 | + |
| 90 | +4. **Keep all demo files** |
| 91 | + - Each serves a specific purpose |
| 92 | + - **Action**: Add clear docstrings explaining when to use each |
| 93 | + |
| 94 | +## Integration Test Execution Plan |
| 95 | + |
| 96 | +### Required Environment Variables: |
| 97 | +```bash |
| 98 | +THORDATA_INTEGRATION=true |
| 99 | +THORDATA_SCRAPER_TOKEN=... |
| 100 | +THORDATA_PUBLIC_TOKEN=... |
| 101 | +THORDATA_PUBLIC_KEY=... |
| 102 | +THORDATA_PROXY_HOST=pr.thordata.net |
| 103 | +THORDATA_RESIDENTIAL_USERNAME=... |
| 104 | +THORDATA_RESIDENTIAL_PASSWORD=... |
| 105 | +THORDATA_INTEGRATION_HTTP=true # Optional: test HTTP protocol too |
| 106 | +THORDATA_INTEGRATION_STRICT=true # Optional: fail on any error |
| 107 | +``` |
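A short pre-flight check can report which of these variables are unset before any quota is consumed. The sketch below is an illustration, not the repository's `validate_env.py`; `missing_vars` is a hypothetical helper, and treating every variable not marked "Optional" as required is an assumption, since the plan does not say which tests need which credentials.

```python
import os

# Variables listed in the plan without an "Optional" comment.
REQUIRED_VARS = [
    "THORDATA_INTEGRATION",
    "THORDATA_SCRAPER_TOKEN",
    "THORDATA_PUBLIC_TOKEN",
    "THORDATA_PUBLIC_KEY",
    "THORDATA_PROXY_HOST",
    "THORDATA_RESIDENTIAL_USERNAME",
    "THORDATA_RESIDENTIAL_PASSWORD",
]

# Variables the plan marks as optional.
OPTIONAL_VARS = ["THORDATA_INTEGRATION_HTTP", "THORDATA_INTEGRATION_STRICT"]


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing required variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
```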
| 108 | + |
| 109 | +### Test Execution: |
| 110 | +1. Proxy Protocol Integration Test |
| 111 | +2. Tools Coverage Integration Test (if enabled) |
| 112 | +3. All other integration tests |
| 113 | + |
| 114 | +## File Structure After Cleanup |
| 115 | + |
| 116 | +``` |
| 117 | +tests/ |
| 118 | +├── test_client.py # Main sync client tests (includes task tests) |
| 119 | +├── test_async_client.py # Main async client tests |
| 120 | +├── test_client_errors.py # Sync error handling |
| 121 | +├── test_async_client_errors.py # Async error handling |
| 122 | +├── test_tools_coverage.py # All tool tests (merged from test_tools.py) |
| 123 | +├── test_integration_proxy_protocols.py # Proxy integration test |
| 124 | +├── test_examples.py # Example scripts integration test |
| 125 | +├── test_browser.py # Browser tests |
| 126 | +├── test_unlimited.py # Unlimited namespace tests |
| 127 | +├── test_batch_creation.py # Batch creation tests |
| 128 | +├── test_models.py # Model tests |
| 129 | +├── test_exceptions.py # Exception tests |
| 130 | +├── test_retry.py # Retry logic tests |
| 131 | +├── test_utils.py # Utility function tests |
| 132 | +├── test_enums.py # Enum tests |
| 133 | +├── test_env.py # Environment tests |
| 134 | +├── test_user_agent.py # User agent tests |
| 135 | +└── test_spec_parity.py # Spec parity tests |
| 136 | + |
| 137 | +examples/ |
| 138 | +├── quick_start.py # Quick validation |
| 139 | +├── full_acceptance_test.py # Comprehensive acceptance |
| 140 | +├── validate_env.py # Environment validation |
| 141 | +├── diagnose_network.py # Network diagnostics |
| 142 | +├── demo_serp_api.py # SERP demo |
| 143 | +├── demo_universal.py # Universal demo |
| 144 | +├── demo_web_scraper_api.py # Web Scraper single workflow |
| 145 | +├── demo_web_scraper_multi_spider.py # Multi-spider test |
| 146 | +├── demo_proxy_network.py # Proxy demo |
| 147 | +├── demo_browser_api.py # Browser demo |
| 148 | +├── demo_scraping_browser.py # Scraping browser demo |
| 149 | +├── demo_account_and_usage.py # Account demo |
| 150 | +├── async_high_concurrency.py # Async concurrency demo |
| 151 | +└── tools/ # Tool-specific examples |
| 152 | + ├── amazon_scraper.py |
| 153 | + ├── google_maps_scraper.py |
| 154 | + ├── social_media_scraper.py |
| 155 | + └── youtube_downloader.py |
| 156 | +``` |
| 157 | + |
| 158 | +## Implementation Steps |
| 159 | + |
| 160 | +1. **Phase 1: Merge Test Files** |
| 161 | + - Merge `test_tools.py` into `test_tools_coverage.py` |
| 162 | + - Merge `test_task_status_and_wait.py` into `test_client.py` |
| 163 | + - Run full test suite to ensure nothing breaks |
| 164 | + |
| 165 | +2. **Phase 2: Run Integration Tests** |
| 166 | + - Set up environment variables |
| 167 | + - Run `test_integration_proxy_protocols.py` |
| 168 | + - Run `test_tools_coverage.py` with integration flag |
| 169 | + - Verify 100% pass rate |
| 170 | + |
| 171 | +3. **Phase 3: Reorganize Examples (Optional)** |
| 172 | + - Group examples by category |
| 173 | + - Update documentation references |
| 174 | + |
| 175 | +4. **Phase 4: Documentation** |
| 176 | + - Add clear docstrings to all example files |
| 177 | + - Update README with file organization |
| 178 | + |
| 179 | +## Notes |
| 180 | + |
| 181 | +- All integration tests are IMPORTANT for validating real-world functionality |
| 182 | +- Integration tests require real credentials and may consume quota |
| 183 | +- Integration tests should be run before major releases |
| 184 | +- Consider adding CI/CD integration test runs (with proper credential management) |