This document provides comprehensive coverage of the FileUtils test suite, explaining what we test, why we test it, and how the tests validate key functionality.
- Test Philosophy
- Test Structure
- Unit Tests
- Integration Tests
- Test Data and Fixtures
- Key Functionality Coverage
- Running Tests
- Test Maintenance
- Comprehensive Coverage: Every public API method and major workflow is tested
- Real-world Scenarios: Tests use realistic data and business scenarios
- Error Handling: Both happy path and error conditions are validated
- Data Integrity: Round-trip operations preserve data accuracy
- Regression Prevention: Tests catch breaking changes in existing functionality
- Unit Tests: Test individual methods in isolation with controlled inputs
- Integration Tests: Test complete workflows with realistic data
- Edge Cases: Validate behavior with unusual inputs and conditions
- Error Scenarios: Ensure graceful handling of failures
tests/
├── conftest.py # Shared fixtures and test configuration
├── unit/
│ ├── test_file_utils.py # Core FileUtils functionality tests
│ └── test_document_types.py # Document format specific tests
└── integration/
├── test_azure_storage.py # Azure Blob Storage integration tests
└── test_excel_csv_conversion.py # Excel ↔ CSV round-trip tests
test_initialization: Validates FileUtils initialization with custom config- Purpose: Ensures proper setup of storage backends and configuration
- Key API:
FileUtils(project_root, config_file)
test_save_single_dataframe: Tests saving individual DataFramestest_save_multiple_dataframes: Tests saving multi-sheet workbooks- Purpose: Validates core data persistence functionality
- Key API:
save_data_to_storage(data, output_filetype, output_type, file_name)
test_load_single_file: Tests loading individual filestest_load_excel_sheets: Tests loading multi-sheet Excel filestest_load_multiple_files: Tests batch file loading- Purpose: Ensures data can be retrieved correctly
- Key API:
load_single_file(),load_excel_sheets(),load_multiple_files()
- What: Tests Excel to CSV conversion with structure preservation
- Why: Validates the core Excel → CSV workflow with metadata tracking
- Key API:
convert_excel_to_csv_with_structure(excel_file_path, preserve_structure=True) - Validates:
- CSV files are created for each Excel sheet
- Structure JSON contains workbook metadata
- Sheet dimensions, columns, and data types are preserved
- Data integrity through conversion process
- What: Tests Excel to CSV conversion without structure preservation
- Why: Ensures optional structure file creation works correctly
- Key API:
convert_excel_to_csv_with_structure(excel_file_path, preserve_structure=False) - Validates:
- CSV files are created normally
- No structure JSON file is generated
- Function returns empty string for structure file
- What: Tests CSV to Excel workbook reconstruction
- Why: Validates the CSV → Excel reconstruction workflow
- Key API:
convert_csv_to_excel_workbook(structure_json_path, file_name) - Validates:
- Excel workbook is reconstructed from CSV files
- Sheet names and structure are preserved
- Data content matches original
- Reconstruction metadata is created
- What: Tests complete Excel ↔ CSV round-trip workflow
- Why: Validates end-to-end data processing pipeline
- Key API: Both conversion methods in sequence
- Validates:
- Complete workflow: Excel → CSV → Modify → Excel
- Data modifications are preserved
- Multi-sheet workbooks maintain structure
- Data integrity through entire process
- What: Tests default directory configuration behavior
- Why: Validates backward compatibility and default settings
- Key API:
_get_directory_config(),get_data_path(data_type) - Validates:
- Default configuration uses "data" as main directory
- Subdirectories use standard names ("raw", "processed", "templates")
- Path generation works with default settings
- Directories are created automatically
- What: Tests custom directory configuration
- Why: Validates domain-specific directory naming
- Key API:
FileUtils(config_override=custom_config) - Validates:
- Custom main directory name is used
- Custom subdirectory names are applied
- Path resolution works with custom configuration
- Directory structure matches configuration
- What: Tests file operations with custom directory configuration
- Why: Validates that all file operations work with custom directories
- Key API:
save_data_to_storage(),load_single_file() - Validates:
- Files are saved to custom directories
- Files are loaded from custom directories
- Both raw and processed operations work correctly
- Data integrity is maintained
- What: Tests Excel ↔ CSV conversion with custom directory configuration
- Why: Validates that conversion workflows work with custom directories
- Key API:
convert_excel_to_csv_with_structure(),convert_csv_to_excel_workbook() - Validates:
- Excel files are read from custom raw directory
- CSV files are created in custom processed directory
- Structure JSON is saved to custom processed directory
- Reconstruction works with custom directory paths
- What: Tests directory creation with custom configuration
- Why: Validates directory creation with configurable parent directories
- Key API:
create_directory(directory_name, parent_dir=None) - Validates:
- Directory creation uses configured data directory by default
- Explicit parent directory specification works
- Directories are created in correct locations
- Legacy validation still works for explicit parents
- What: Tests backward compatibility with existing projects
- Why: Ensures existing projects continue working unchanged
- Key API:
FileUtils()(default initialization) - Validates:
- Default configuration uses "data" directory
- Existing file operations work unchanged
- No breaking changes for existing projects
- Path structure remains consistent
- What: Tests partial configuration (only data_directory specified)
- Why: Validates flexible configuration options
- Key API:
FileUtils(config_override=partial_config) - Validates:
- Custom main directory name is used
- Default subdirectory names are applied
- Configuration fallback behavior works correctly
- Mixed configuration scenarios work properly
test_save_single_dataframe_with_subpath: Tests saving with subdirectoriestest_load_single_file_with_subpath: Tests loading from subdirectories- Purpose: Validates hierarchical file organization
- Key API:
sub_pathparameter in save/load operations
test_save_json_as_document: Tests JSON document savingtest_save_yaml_as_document: Tests YAML document saving- Purpose: Validates document format handling
- Key API:
save_document_to_storage(content, output_filetype)
test_automatic_pandas_type_conversion: Tests pandas type conversiontest_intelligent_timestamp_handling: Tests datetime handling- Purpose: Ensures data types are preserved correctly
- Key API: Automatic type inference and conversion
test_file_not_found: Tests handling of missing filestest_invalid_file_type: Tests handling of unsupported formats- Purpose: Validates graceful error handling
- Key API: Exception raising and error messages
- DOCX Template System: Tests enhanced DOCX functionality
- Markdown Processing: Tests YAML frontmatter handling
- PDF Text Extraction: Tests PDF reading capabilities
- Purpose: Validates document format specific features
- What: Tests realistic business data workflow
- Why: Validates real-world usage scenarios
- Data: Multi-sheet business data (Sales, Customers, Products)
- Workflow:
- Create Excel workbook with business data
- Convert to CSV files with structure preservation
- Modify CSV data (add calculated fields)
- Reconstruct Excel workbook
- Verify data integrity and modifications
- Key Validations:
- Data modifications are preserved (
profit_margin,quarterfields) - Unmodified sheets remain unchanged
- Data type changes from CSV round-trip are handled
- Business calculations remain accurate
- Data modifications are preserved (
- What: Tests graceful handling of missing CSV files
- Why: Ensures robust error handling in production
- Scenario: Structure JSON references non-existent CSV files
- Key Validations:
- Appropriate exceptions are raised
- Error messages are informative
- System doesn't crash on missing files
- What: Tests Excel to CSV conversion with custom options
- Why: Validates parameter passing and customization
- Options: Custom delimiter, encoding settings
- Key Validations:
- Custom options are applied correctly
- Files are created successfully
- Content validation works
- Cloud Storage: Tests Azure Blob Storage operations
- Authentication: Tests credential handling
- Purpose: Validates cloud storage backend functionality
@pytest.fixture
def sample_df():
"""Standard test DataFrame with various data types."""
return pd.DataFrame({
'id': [1, 2, 3, 4, 5],
'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'value': [10.5, 20.0, 30.5, 40.0, 50.5],
'active': [True, False, True, True, False],
'date': pd.date_range('2024-01-01', periods=5, freq='D')
})# Realistic business data for integration testing
sales_data = pd.DataFrame({
'sale_id': [1, 2, 3, 4, 5],
'customer_name': ['Alice Corp', 'Bob Industries', ...],
'product': ['Widget A', 'Widget B', ...],
'quantity': [10, 5, 15, 8, 12],
'unit_price': [25.50, 45.00, ...],
'sale_date': pd.date_range('2024-01-01', periods=5, freq='D'),
'sales_rep': ['John', 'Jane', ...]
})- Temporary Directories: Each test gets isolated file system
- Clean State: Tests don't interfere with each other
- Realistic Paths: Tests use actual file system operations
| Method | Unit Tests | Integration Tests | Coverage |
|---|---|---|---|
save_data_to_storage |
Multiple | Workflow | Complete |
load_single_file |
Multiple | Workflow | Complete |
load_excel_sheets |
Multiple | Workflow | Complete |
convert_excel_to_csv_with_structure |
Basic + No Preserve | Complete Workflow | Complete |
convert_csv_to_excel_workbook |
Basic | Complete Workflow | Complete |
save_document_to_storage |
Multiple | Workflow | Complete |
| Format | Unit Tests | Integration Tests | Coverage |
|---|---|---|---|
| CSV | Multiple | Round-trip | Complete |
| Excel (.xlsx) | Multiple | Round-trip | Complete |
| JSON | Multiple | Workflow | Complete |
| YAML | Multiple | Workflow | Complete |
| DOCX | Template System | Enhanced Features | Complete |
| Markdown | YAML Frontmatter | Document Processing | Complete |
| Text Extraction | Document Reading | Complete |
| Scenario | Test Coverage | Validation |
|---|---|---|
| Missing Files | Unit + Integration | Proper exception handling |
| Invalid Formats | Unit | Graceful error messages |
| Corrupted Data | Unit | Data validation |
| Network Issues | Integration | Azure storage errors |
| Permission Errors | Unit | File system errors |
# Run all tests
pytest
# Run with verbose output
pytest -v
# Run specific test file
pytest tests/unit/test_file_utils.py
# Run specific test
pytest tests/unit/test_file_utils.py::test_convert_excel_to_csv_with_structure_basic
# Run Excel ↔ CSV tests only
pytest -k "convert_excel_to_csv or convert_csv_to_excel or excel_csv_roundtrip"# Unit tests only
pytest tests/unit/
# Integration tests only
pytest tests/integration/
# Excel ↔ CSV conversion tests
pytest -k "excel_csv"
# Error handling tests
pytest -k "error"# Show test output
pytest -s
# Stop on first failure
pytest -x
# Show local variables on failure
pytest --tb=long
# Run with coverage
pytest --cov=FileUtils- Unit Tests: Add to appropriate
test_*.pyfile - Integration Tests: Create new file in
tests/integration/ - Fixtures: Add to
conftest.pyif shared - Documentation: Update this file with new test coverage
- Unit Tests:
test_method_name_scenario - Integration Tests:
test_workflow_name - Error Tests:
test_method_name_error_condition
- Use Fixtures: For reusable test data
- Isolated Data: Each test should be independent
- Realistic Data: Use business-relevant examples
- Clean Up: Tests should not leave artifacts
- Fast Execution: All tests complete in < 1 second
- Minimal I/O: Use temporary directories
- Efficient Data: Use small but representative datasets
- Parallel Execution: Tests can run concurrently
- Total Tests: 50+ (47 unit + 3+ integration)
- API Coverage: 100% of public methods
- Format Coverage: All supported file formats
- Error Coverage: Major error scenarios
- Pass Rate: 100% ✅
- Deterministic: Tests produce consistent results
- Isolated: Tests don't depend on each other
- Fast: Complete test suite runs in < 1 minute
- Maintainable: Clear test structure and naming
- Unit Tests: 54 tests covering core functionality
- Integration Tests: 3 tests covering end-to-end workflows
- Total Coverage: 57 tests across all major features
- Configurable Directory Tests: 7 new tests for directory customization
- Excel ↔ CSV Tests: 7 tests for conversion workflows
The FileUtils test suite provides comprehensive validation of all core functionality, with particular emphasis on the Excel ↔ CSV conversion workflow and configurable directory features. The tests ensure data integrity, proper error handling, and real-world usability while maintaining fast execution and clear documentation.
The combination of unit tests (for individual method validation) and integration tests (for complete workflow validation) provides confidence that FileUtils is robust, reliable, and ready for production use. The configurable directory tests ensure that domain-specific workflows can be implemented seamlessly while maintaining backward compatibility.