AI-Powered Data Science Assistant
Features • Installation • Quick Start • CLI Usage • API Reference • Configuration • Contributing
EVA (Exploratory Visual Analyzer) is an intelligent data science assistant that automates the tedious parts of data analysis. Simply point EVA at your CSV file, and it will:
- Analyze your data structure and quality
- Generate comprehensive statistics and visualizations
- Suggest insights and data cleaning strategies using AI
- Recommend machine learning models suited for your data
- Export everything to a Jupyter notebook for further exploration
EVA uses an agent-based architecture where specialized agents collaborate to provide a complete data analysis pipeline.
- Smart encoding detection - Automatically handles UTF-8, Latin-1, and other encodings
- Type inference - Detects numeric, datetime, categorical, and boolean columns
- Validation - Comprehensive file validation with detailed error reporting
- Descriptive statistics - Mean, median, std, quartiles, and more
- Missing value analysis - Patterns and recommendations for handling
- Correlation analysis - Pearson, Spearman, and categorical correlations
- Outlier detection - IQR and Z-score based identification
- Distribution plots - Histograms and density plots
- Relationship plots - Scatter plots and pair plots
- Correlation heatmaps - Beautiful visual correlation matrices
- Interactive plots - Plotly-powered interactive visualizations
- OpenAI Integration - GPT-powered analysis suggestions
- Google Gemini Support - Alternative AI provider
- Smart suggestions - Data cleaning and feature engineering recommendations
- Fallback mode - Works offline with rule-based suggestions
- Problem type detection - Classification, regression, clustering
- Algorithm suggestions - Ranked list of suitable models
- Baseline pipelines - Ready-to-use sklearn pipeline code
- Jupyter notebooks - Complete analysis as executable notebooks
- Python scripts - Standalone .py file generation
- Documentation - Well-commented, reproducible code
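
The IQR and Z-score outlier checks listed above can be illustrated with a minimal, standard-library sketch. This is not EVA's implementation: the function names `iqr_outliers` and `zscore_outliers`, the 1.5 IQR multiplier, and the default Z-score threshold of 3 are conventional choices assumed here for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute population Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(data))                    # values flagged by the IQR rule
print(zscore_outliers(data, threshold=2.5))  # values flagged by the Z-score rule
```

With only nine values, no point can have a population Z-score above roughly 2.83 (the square root of n - 1), so the usage line lowers the threshold to 2.5. The IQR rule has no such limit, which is one reason to offer both methods.
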
- Python 3.8 or higher
- pip package manager
# Clone the repository
git clone https://github.com/yourusername/EVA.git
cd EVA
# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install -e .

# Basic analysis
python -m eva.cli analyze data.csv
# With AI suggestions enabled
python -m eva.cli analyze data.csv --enable-ai
# Export to notebook
python -m eva.cli analyze data.csv --export-notebook --output ./results

from eva.orchestrator import AnalysisOrchestrator
from eva.models.core import AnalysisContext, AnalysisConfig
from eva.agents.csv_ingestor import CSVIngestorAgent
from eva.agents.eda_generator import EDAGeneratorAgent
from eva.agents.visualizer import VisualizerAgent
# Create orchestrator
orchestrator = AnalysisOrchestrator(max_workers=3)
# Configure analysis
config = AnalysisConfig(
    processing_timeout_minutes=5,
    enable_ai_suggestions=True
)
# Create context
context = AnalysisContext(
    session_id="my_analysis",
    config=config
)
# Set file path
context.metadata = {'file_path': 'data.csv'}
# Create and run agents
agents = [
    CSVIngestorAgent(),
    EDAGeneratorAgent(),
    VisualizerAgent()
]
results = orchestrator.execute_pipeline(agents, context)
# Access results
print(f"Dataset shape: {context.dataset.shape}")
print(f"EDA completed: {results['EDAGeneratorAgent'].success}")

python -m eva.cli analyze <file_path> [OPTIONS]

| Option | Description |
|---|---|
| `--output`, `-o` | Output directory for results |
| `--config`, `-c` | Path to configuration file |
| `--enable-ai` | Enable AI-powered suggestions |
| `--export-notebook` | Generate Jupyter notebook |
| `--export-script` | Generate Python script |
| `--format` | Visualization format (`png`, `html`, `both`) |
| `--verbose`, `-v` | Verbose output |
| `--quiet`, `-q` | Suppress output |

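
To make the option table concrete, here is a minimal `argparse` sketch that accepts the same flags. It is a hypothetical illustration, not EVA's actual CLI code: the `build_parser` name, the `eva` program name, and the `None` defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented `analyze` options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="eva")
    subcommands = parser.add_subparsers(dest="command", required=True)

    analyze = subcommands.add_parser("analyze", help="Analyze a CSV file")
    analyze.add_argument("file_path", help="Path to the CSV file")
    analyze.add_argument("--output", "-o", default=None, help="Output directory for results")
    analyze.add_argument("--config", "-c", default=None, help="Path to configuration file")
    analyze.add_argument("--enable-ai", action="store_true", help="Enable AI-powered suggestions")
    analyze.add_argument("--export-notebook", action="store_true", help="Generate Jupyter notebook")
    analyze.add_argument("--export-script", action="store_true", help="Generate Python script")
    # The default of None is an assumption; the README does not state one.
    analyze.add_argument("--format", choices=["png", "html", "both"], default=None,
                         help="Visualization format")
    analyze.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
    analyze.add_argument("--quiet", "-q", action="store_true", help="Suppress output")
    return parser


args = build_parser().parse_args(
    ["analyze", "data.csv", "--enable-ai", "--format", "both", "-o", "./results"]
)
print(args.file_path, args.enable_ai, args.format, args.output)
```

Restricting `--format` with `choices` means an unsupported value is rejected at parse time rather than later in the pipeline.
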
# Full analysis with all exports
python -m eva.cli analyze sales_data.csv \
--output ./analysis_results \
--enable-ai \
--export-notebook \
--export-script \
--format both \
--verbose
# Quick analysis without AI
python -m eva.cli analyze data.csv --quiet
# Using custom configuration
python -m eva.cli analyze data.csv --config my_config.yaml

`AnalysisOrchestrator` manages the execution of analysis agents with dependency resolution and parallel processing.

from eva.orchestrator import AnalysisOrchestrator
orchestrator = AnalysisOrchestrator(
    max_workers=4,         # Parallel worker count
    system_limits=limits   # Resource limits (`limits` must be defined beforehand)
)
results = orchestrator.execute_pipeline(agents, context)

`AnalysisContext` is the shared context object passed between agents.

from eva.models.core import AnalysisContext, AnalysisConfig
context = AnalysisContext(
    dataset=None,              # Populated by CSVIngestorAgent
    metadata={},               # File and analysis metadata
    results={},                # Agent results storage
    config=AnalysisConfig(),   # Configuration
    session_id="unique_id"     # Session identifier
)

| Agent | Description | Dependencies |
|---|---|---|
| `CSVIngestorAgent` | Loads and validates CSV files | None |
| `EDAGeneratorAgent` | Statistical analysis | `CSVIngestorAgent` |
| `VisualizerAgent` | Creates visualizations | `EDAGeneratorAgent` |
| `InsightSuggesterAgent` | AI-powered insights | `EDAGeneratorAgent` |
| `ModelRecommenderAgent` | ML model suggestions | `EDAGeneratorAgent` |
| `NotebookExporterAgent` | Notebook generation | All others |

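
The dependency column above implies an execution order, and it also shows which agents could run in parallel. The sketch below derives that order with the standard-library `graphlib` module (Python 3.9+). It only illustrates dependency resolution and is not the orchestrator's actual algorithm: the `DEPENDENCIES` mapping is transcribed from the table, and `execution_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Agent -> set of agents it depends on, transcribed from the table above.
DEPENDENCIES = {
    "CSVIngestorAgent": set(),
    "EDAGeneratorAgent": {"CSVIngestorAgent"},
    "VisualizerAgent": {"EDAGeneratorAgent"},
    "InsightSuggesterAgent": {"EDAGeneratorAgent"},
    "ModelRecommenderAgent": {"EDAGeneratorAgent"},
    # "All others" expanded explicitly.
    "NotebookExporterAgent": {
        "CSVIngestorAgent",
        "EDAGeneratorAgent",
        "VisualizerAgent",
        "InsightSuggesterAgent",
        "ModelRecommenderAgent",
    },
}


def execution_batches(dependencies):
    """Group agents into batches; agents in one batch have no mutual dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


for step, batch in enumerate(execution_batches(DEPENDENCIES), start=1):
    print(step, batch)
```

Under these assumptions the three agents that depend only on `EDAGeneratorAgent` land in the same batch, so a worker pool could run them concurrently before the notebook export.
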
For detailed API documentation, see docs/api/README.md.
Create a config.yaml file:
# Analysis settings
analysis:
  max_file_size_mb: 100
  processing_timeout_minutes: 5
  memory_limit_gb: 2
  enable_ai_suggestions: true
  export_formats:
    - ipynb
    - py
  visualization_formats:
    - png
    - html
# Logging
log_level: INFO
log_file: null
# Storage
temp_dir: temp/eva
cache_dir: temp/eva/cache
# AI service
ai_service_provider: openai # openai, gemini, mock
ai_api_key: null # Use EVA_AI_API_KEY env var
ai_model: gpt-4
ai_timeout_seconds: 30
# Performance
max_workers: 4
chunk_size: 10000

| Variable | Description |
|---|---|
| `EVA_AI_API_KEY` | API key for AI service |
| `EVA_CONFIG_PATH` | Custom config file path |
| `EVA_LOG_LEVEL` | Logging level override |
| `EVA_OUTPUT_DIR` | Default output directory |

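
One plausible way the file settings and environment variables combine is sketched below. The precedence shown (environment over file over defaults) is an assumption rather than documented EVA behavior, and `resolve_settings`, `DEFAULTS`, `ENV_OVERRIDES`, and the `output_dir` key are hypothetical names used only for illustration.

```python
import os

# Hypothetical defaults; the real defaults are defined by EVA itself.
DEFAULTS = {"log_level": "INFO", "ai_api_key": None, "output_dir": None}

# Environment variable -> settings key (the key names are assumed).
ENV_OVERRIDES = {
    "EVA_AI_API_KEY": "ai_api_key",
    "EVA_LOG_LEVEL": "log_level",
    "EVA_OUTPUT_DIR": "output_dir",
}


def resolve_settings(file_settings, environ=None):
    """Merge defaults, file settings, and environment overrides, in that order."""
    environ = os.environ if environ is None else environ
    settings = {**DEFAULTS, **file_settings}
    for variable, key in ENV_OVERRIDES.items():
        value = environ.get(variable)
        if value:  # ignore unset or empty variables
            settings[key] = value
    return settings


settings = resolve_settings(
    {"log_level": "INFO", "ai_api_key": None},
    environ={"EVA_AI_API_KEY": "example-key", "EVA_LOG_LEVEL": "DEBUG"},
)
print(settings["log_level"], settings["ai_api_key"] is not None)
```

Keeping `ai_api_key: null` in the file and supplying the key through `EVA_AI_API_KEY` keeps the secret out of version control.
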
eva/
├── examples/           # Usage examples
├── scripts/            # Verification and utility scripts
├── tests/              # Test suite
│   ├── unit/           # Unit tests
│   └── integration/    # Integration tests
├── eva/                # Source code
│   ├── agents/         # Analysis agents
│   ├── models/         # Data models
│   ├── services/       # Business logic
│   └── utils/          # Utilities
└── docs/               # Documentation
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=eva --cov-report=html
# Run specific test file
pytest tests/test_orchestrator.py -v
# Run integration tests
python tests/run_integration_tests.py

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

# Clone and setup
git clone https://github.com/yourusername/EVA.git
cd EVA
python -m venv .venv
source .venv/bin/activate
# Install dev dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install

- Formatter: Black
- Linter: Flake8
- Type Checker: mypy
- Import Sorter: isort
This project is licensed under the MIT License - see the LICENSE file for details.
- pandas - Data manipulation
- scikit-learn - Machine learning
- Plotly - Interactive visualizations
- OpenAI - AI capabilities
- Google Gemini - Alternative AI
Made with ❤️ by the EVA Development Team