This document explains how the CSV processing functionality works in the Dun project, including the flow of execution, key components, and logging details.
The CSV processing system consists of several key components:
- LLMAnalyzer - Analyzes natural language requests and returns the appropriate processor configuration
- CSV Processor - Handles the actual CSV file processing and merging
- ProcessorEngine - Executes the processor with the given configuration
- Dynamic Package Manager - Installs required Python packages on-demand
The processing flow has five steps:

Request Analysis
- The user submits a natural-language request (e.g., "Przeanalizuj pliki CSV", Polish for "Analyze the CSV files").
- `LLMAnalyzer.analyze_request()` processes the request.
- If the LLM is unavailable, it falls back to the default CSV processor.
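The request-analysis step and its fallback can be sketched as follows. This is a minimal illustration, not Dun's implementation: `ProcessorChoice`, the `llm_classify` callback, and the processor name `"csv_processor"` are hypothetical stand-ins. The real `LLMAnalyzer.analyze_request()` returns a full processor configuration rather than a name.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProcessorChoice:
    """Hypothetical result of request analysis."""
    name: str    # which processor to run
    source: str  # "llm" if the LLM chose it, "fallback" otherwise


def analyze_request(
    request: str, llm_classify: Optional[Callable[[str], str]] = None
) -> ProcessorChoice:
    """Choose a processor for a natural-language request.

    `llm_classify` stands in for the LLM call. If it is not configured or
    raises (network error, timeout, unparseable answer), the default CSV
    processor is returned instead.
    """
    if llm_classify is not None:
        try:
            return ProcessorChoice(name=llm_classify(request), source="llm")
        except Exception:
            pass  # fall through to the default processor
    return ProcessorChoice(name="csv_processor", source="fallback")


def unreachable_llm(request: str) -> str:
    """Simulates an LLM backend that is down."""
    raise ConnectionError("LLM service unreachable")


choice = analyze_request("Przeanalizuj pliki CSV", unreachable_llm)
print(choice)
```

Catching a broad `Exception` keeps the fallback working no matter how the LLM call fails. A production version would probably log the error before falling back.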
Processor Initialization
- `_get_csv_processor()` creates a `ProcessorConfig` with:
  - Required dependencies (pandas)
  - Default parameters (`input_dir`, `output_file`)
  - The actual Python code to execute
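Below is a sketch of a configuration object and a factory in the spirit of `_get_csv_processor()`. The field names follow the `ProcessorConfig` fields described in this document. The types, the default values `data/` and `combined.csv`, the placeholder code template, and reading `INPUT_DIR` / `OUTPUT_FILE` from the environment are all assumptions. The last one is suggested by the usage example, which exports those variables before running `dun`.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ProcessorConfig:
    # Field names as documented; types and defaults are assumptions.
    name: str
    description: str
    dependencies: List[str] = field(default_factory=list)
    parameters: Dict[str, Any] = field(default_factory=dict)
    code_template: str = ""


# Hypothetical placeholder for the processor's Python source.
CSV_CODE_TEMPLATE = "print('processing CSV files')"


def get_csv_processor() -> ProcessorConfig:
    """Build a CSV processor config (hypothetical stand-in for _get_csv_processor).

    INPUT_DIR / OUTPUT_FILE environment variables override the assumed defaults.
    """
    return ProcessorConfig(
        name="csv_processor",
        description="Recursively finds CSV files and merges them into one file",
        dependencies=["pandas"],
        parameters={
            "input_dir": os.environ.get("INPUT_DIR", "data/"),
            "output_file": os.environ.get("OUTPUT_FILE", "combined.csv"),
        },
        code_template=CSV_CODE_TEMPLATE,
    )
```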
Dependency Installation
- `ProcessorEngine` checks for required packages.
- Missing packages are installed automatically via pip.
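One way to implement the check-then-install step uses only the standard library. The name `install_package` matches the method listed later in this document, but its signature and body here are assumptions, as are `missing_packages` and `ensure_dependencies`. The sketch also assumes the pip package name equals the import name. That holds for pandas but not for every package (scikit-learn is imported as `sklearn`, for example).

```python
import importlib.util
import subprocess
import sys
from typing import Iterable, List


def missing_packages(dependencies: Iterable[str]) -> List[str]:
    """Return the dependencies that cannot currently be imported.

    Assumes each pip package name is also its top-level import name.
    """
    return [name for name in dependencies if importlib.util.find_spec(name) is None]


def install_package(name: str) -> None:
    """Install one package into the running interpreter's environment via pip."""
    subprocess.run([sys.executable, "-m", "pip", "install", name], check=True)


def ensure_dependencies(dependencies: Iterable[str]) -> List[str]:
    """Install whatever is missing and return the names that were installed."""
    missing = missing_packages(dependencies)
    for name in missing:
        install_package(name)
    importlib.invalidate_caches()  # let newly installed packages be imported
    return missing
```

Running pip as `sys.executable -m pip` installs into the same interpreter that will execute the processor. A bare `pip` command might belong to a different environment.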
CSV Processing
- The processor code is executed in a sandboxed environment.
- It performs the following steps:
  - Validates the input directory and its permissions
  - Recursively searches for CSV files
  - Reads and combines all CSV files found
  - Saves the result to the output file
  - Returns processing statistics
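The first two steps (validating the input directory and searching it recursively) might look like the sketch below. The function and logger names are hypothetical. Following the error-handling rules in this document, a missing directory produces only a warning. Extensions are matched case-insensitively so that both `.csv` and `.CSV` files are found.

```python
import logging
import os
from pathlib import Path
from typing import List

logger = logging.getLogger("dun.csv")  # hypothetical logger name


def find_csv_files(input_dir: str) -> List[Path]:
    """Validate `input_dir` and return all CSV files beneath it, sorted.

    A missing directory only logs a warning and yields no files; an
    unreadable one raises PermissionError.
    """
    root = Path(input_dir)
    if not root.is_dir():
        logger.warning("Input directory does not exist: %s", root)
        return []
    if not os.access(root, os.R_OK | os.X_OK):
        raise PermissionError(f"Input directory is not readable: {root}")
    # rglob("*") walks the whole tree; the extension filter is applied here.
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".csv"
    )
```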
Result Handling
- The combined CSV is saved to the specified location, or to a temporary directory if that location is not writable.
- Processing statistics are returned to the user.
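The "specified location or temporary directory" rule can be expressed as a small path-resolution helper. This is a sketch under assumptions: `resolve_output_path` is a hypothetical name, and creating missing parent directories is an assumed behavior. Any `OSError` while preparing the target directory triggers the fallback, which keeps the original file name inside a fresh directory from `tempfile.mkdtemp()`. That is consistent with the `/tmp/.../combined.csv` paths in the example logs.

```python
import os
import tempfile
from pathlib import Path


def resolve_output_path(output_file: str) -> Path:
    """Return `output_file` if its directory is usable, else a temp-dir path."""
    target = Path(output_file)
    try:
        target.parent.mkdir(parents=True, exist_ok=True)  # create results/ etc.
        if os.access(target.parent, os.W_OK):
            return target
    except OSError:
        pass  # e.g. permission denied, or a file is in the way
    # Fallback: same file name inside a new temporary directory.
    return Path(tempfile.mkdtemp()) / target.name
```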
Key request-analysis methods:
- `analyze_request()`: Entry point for processing natural-language requests
- `_get_csv_processor()`: Creates the CSV processor configuration
- `_get_default_imap_processor()`: Fallback processor selection
`ProcessorConfig` fields:
- `name`: Identifier for the processor
- `description`: Human-readable description
- `dependencies`: List of required Python packages
- `parameters`: Configuration parameters
- `code_template`: The actual Python code to execute
Key execution methods:
- `process_natural_request()`: Main entry point for processing requests
- `_execute_processor()`: Executes the processor code in a sandboxed environment
- `install_package()`: Handles dynamic package installation
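This document does not describe how `_execute_processor()` builds its sandbox. The sketch below shows one common approach: run the code in a separate Python process. `execute_processor` is a hypothetical name. Passing parameters as upper-cased environment variables is an assumption modeled on the `INPUT_DIR` / `OUTPUT_FILE` usage example. Note that a child process isolates crashes and hangs but is not a security sandbox, because the code still runs with the user's permissions.

```python
import os
import subprocess
import sys
from typing import Mapping


def execute_processor(code: str, parameters: Mapping[str, object], timeout: float = 300.0) -> str:
    """Run processor code in a child Python process and return its stdout.

    Each parameter is exposed as an upper-cased environment variable
    (input_dir -> INPUT_DIR). Raises CalledProcessError on a non-zero exit
    and TimeoutExpired if the code runs longer than `timeout` seconds.
    """
    env = {**os.environ, **{key.upper(): str(value) for key, value in parameters.items()}}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```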
The system uses the following log levels:
- DEBUG: Detailed debug information
- INFO: General processing information
- WARNING: Non-critical issues
- ERROR: Processing errors
- SUCCESS: Successful operations
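Python's standard `logging` module has no SUCCESS level. The loguru library does provide one, at severity 25, and this document does not say which library Dun uses. If the project relies on the standard library, one way to add the level and reproduce the `[LEVEL] message` format from the examples is shown below. The logger name `dun.csv`, the helper names, and the numeric value 25 are assumptions.

```python
import logging
from typing import TextIO

SUCCESS = 25  # assumed value, between INFO (20) and WARNING (30)
logging.addLevelName(SUCCESS, "SUCCESS")


def log_success(logger: logging.Logger, message: str, *args: object) -> None:
    """Log `message` at the custom SUCCESS level."""
    logger.log(SUCCESS, message, *args)


def make_logger(stream: TextIO) -> logging.Logger:
    """Logger that writes '[LEVEL] message' lines to `stream`."""
    logger = logging.getLogger("dun.csv")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    for old in list(logger.handlers):  # avoid duplicate output on re-runs
        logger.removeHandler(old)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```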
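Example log output (the messages are in Polish):

```text
[INFO] Szukam plików CSV w katalogu: data/
[INFO] Znaleziono pliki CSV: ['data/sample1.csv']
[INFO] Przetwarzanie pliku: data/sample1.csv
[INFO] Wczytano 2 wierszy i 3 kolumn
[INFO] Połączono dane: 2 wierszy i 3 kolumn
[SUCCESS] Zapisano połączony zbiór danych do: /tmp/.../combined.csv
[WARNING] Nie można zapisać w docelowej lokalizacji, używam katalogu tymczasowego
[ERROR] Błąd podczas przetwarzania pliku: [error details]
[ERROR] Nie udało się wczytać żadnych danych z plików CSV
```

English translations of these messages, in order:
- Searching for CSV files in directory: data/
- Found CSV files: ['data/sample1.csv']
- Processing file: data/sample1.csv
- Loaded 2 rows and 3 columns
- Combined data: 2 rows and 3 columns
- Saved the combined dataset to: /tmp/.../combined.csv
- Cannot write to the target location, using a temporary directory
- Error while processing file: [error details]
- Failed to load any data from the CSV files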
The system handles various error conditions:
Missing Input Files
- An error is reported if no CSV files are found in the input directory.
- A warning is logged if the input directory does not exist.
Permission Issues
- If the output directory is not writable, the system falls back to a temporary directory.
- Permission-related failures produce detailed error messages.
Data Processing Errors
- If one file fails, processing continues with the remaining files.
- Data-related issues produce detailed error messages.
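The "continue if one file fails" rule can be sketched as follows. To keep the example dependency-free, it uses the standard-library `csv` module instead of pandas, which the actual processor uses. `combine_csv_files` and the logger name are hypothetical. Columns are unioned in first-seen order, which roughly mirrors how `pandas.concat` aligns differing columns. If no file can be read at all, the sketch raises, matching the final error message in the example log.

```python
import csv
import logging
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

logger = logging.getLogger("dun.csv")  # hypothetical logger name


def combine_csv_files(paths: Iterable[Path]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read every CSV file, skipping (and logging) files that fail.

    Returns the union of column names and all rows; raises ValueError if
    no file could be read.
    """
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    loaded_any = False
    for path in paths:
        try:
            with open(path, newline="", encoding="utf-8") as fh:
                reader = csv.DictReader(fh)
                file_rows = list(reader)
                fieldnames = list(reader.fieldnames or [])
        except (OSError, UnicodeDecodeError, csv.Error) as exc:
            logger.error("Error while processing file %s: %s", path, exc)
            continue  # one bad file does not stop the run
        loaded_any = True
        for name in fieldnames:
            if name not in columns:  # union of columns, first-seen order
                columns.append(name)
        rows.extend(file_rows)
        logger.info("Loaded %d rows and %d columns from %s", len(file_rows), len(fieldnames), path)
    if not loaded_any:
        logger.error("Failed to load any data from the CSV files")
        raise ValueError("No data could be loaded from the CSV files")
    return columns, rows
```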
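Example usage:

```bash
poetry run dun "Przeanalizuj wszystkie pliki CSV w folderze data/"
```

The request is Polish for "Analyze all CSV files in the data/ folder". To use a custom input directory and output file, set environment variables before running the command ("Przetwórz pliki CSV" means "Process the CSV files"):

```bash
export INPUT_DIR=my_data
export OUTPUT_FILE=results/combined.csv
poetry run dun "Przetwórz pliki CSV"
```

Example output from a run over data/:

```text
[INFO] Processing CSV files in: data/
[INFO] Found 2 CSV files
[INFO] Processing file: data/file1.csv
[SUCCESS] Combined data saved to: /tmp/.../combined.csv
==================================================
Processed 2 CSV files
Total rows: 100
Columns: id, name, value
Output file: /tmp/.../combined.csv
==================================================
```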
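The end-of-run summary could be rendered by a helper like the one below. `format_summary` is a hypothetical name, and the 50-character rule width is an assumption based on the example. The field labels are taken from the example output.

```python
from typing import List


def format_summary(file_count: int, total_rows: int, columns: List[str], output_file: str) -> str:
    """Render the processing statistics as a framed, multi-line summary."""
    rule = "=" * 50  # assumed width of the separator lines
    return "\n".join([
        rule,
        f"Processed {file_count} CSV files",
        f"Total rows: {total_rows}",
        f"Columns: {', '.join(columns)}",
        f"Output file: {output_file}",
        rule,
    ])


print(format_summary(2, 100, ["id", "name", "value"], "/tmp/example/combined.csv"))
```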
Common issues:

Permission Denied
- Ensure the output directory is writable.
- If it is not, the system falls back to a temporary directory.

No CSV Files Found
- Verify that the input directory exists and contains CSV files.
- Check the file extensions (.csv or .CSV).

Missing Dependencies
- Required packages should be installed automatically.
- If installation fails, check your internet connection.
Enable debug logging for more detailed information:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

Check the application logs for detailed error messages and processing information.