EVA - Exploratory Visual Analyzer

🤖 AI-Powered Data Science Assistant

Features • Installation • Quick Start • CLI Usage • API Reference • Configuration • Contributing

🌟 Overview

EVA (Exploratory Visual Analyzer) is an intelligent data science assistant that automates the tedious parts of data analysis. Simply point EVA at your CSV file, and it will:

📊 Analyze your data structure and quality
📈 Generate comprehensive statistics and visualizations
🧠 Suggest insights and data cleaning strategies using AI
🤖 Recommend machine learning models suited for your data
📓 Export everything to a Jupyter notebook for further exploration

EVA uses an agent-based architecture where specialized agents collaborate to provide a complete data analysis pipeline.

✨ Features

🔍 Intelligent Data Ingestion

Smart encoding detection - Automatically handles UTF-8, Latin-1, and other encodings
Type inference - Detects numeric, datetime, categorical, and boolean columns
Validation - Comprehensive file validation with detailed error reporting
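
The encoding handling listed above can be pictured as a decode-with-fallback loop. This is a minimal sketch using only the standard library, not EVA's actual implementation: the function name `decode_with_fallback` and the candidate order are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    raw: bytes,
    encodings: Sequence[str] = ("utf-8", "latin-1"),
) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: EVA's own detection logic may differ.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {list(encodings)} could decode the input")


sample = "name,city\nZoë,Zürich\n"
print(decode_with_fallback(sample.encode("utf-8"))[1])    # utf-8
print(decode_with_fallback(sample.encode("latin-1"))[1])  # latin-1
```

Latin-1 maps every byte to a character, so it never raises and works best as the last candidate; the trade-off is that it can silently mis-decode files written in other single-byte encodings.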

📊 Exploratory Data Analysis

Descriptive statistics - Mean, median, std, quartiles, and more
Missing value analysis - Patterns and recommendations for handling
Correlation analysis - Pearson, Spearman, and categorical correlations
Outlier detection - IQR and Z-score based identification
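
To make the two outlier rules named above concrete, here is a small standard-library sketch. The function names, the 1.5 IQR multiplier, and the z-score threshold of 3 are conventional choices assumed for illustration; EVA's own defaults are not stated in this README.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[float]:
    """Flag values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(data))                    # [95]
print(zscore_outliers(data, threshold=2.5))  # [95]
print(zscore_outliers(data))                 # []
```

The last call shows why the two rules are worth running side by side: in a sample this small, a single extreme value inflates the standard deviation enough that its own z-score stays below 3, while the IQR rule still flags it.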

📈 Automatic Visualization

Distribution plots - Histograms and density plots
Relationship plots - Scatter plots and pair plots
Correlation heatmaps - Color-coded correlation matrices
Interactive plots - Plotly-powered interactive visualizations

🧠 AI-Powered Insights

OpenAI Integration - GPT-powered analysis suggestions
Google Gemini Support - Alternative AI provider
Smart suggestions - Data cleaning and feature engineering recommendations
Fallback mode - Works offline with rule-based suggestions
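
The fallback behaviour described above could look roughly like the following: try the AI provider when one is configured, and drop back to fixed rules if it is missing or fails. Everything here is a hypothetical sketch; `suggest`, `rule_based_suggestions`, the `ai_client` callable, and the 50% threshold are assumptions, not EVA's API.

```python
from typing import Callable, List, Mapping, Optional

Suggester = Callable[[Mapping[str, float]], List[str]]


def rule_based_suggestions(missing_fraction: Mapping[str, float]) -> List[str]:
    """Offline rules keyed on the fraction of missing values per column."""
    suggestions = []
    for column, fraction in sorted(missing_fraction.items()):
        if fraction > 0.5:
            suggestions.append(f"{column}: over half missing; consider dropping the column")
        elif fraction > 0:
            suggestions.append(f"{column}: {fraction:.0%} missing; consider imputing")
    return suggestions


def suggest(
    missing_fraction: Mapping[str, float],
    ai_client: Optional[Suggester] = None,
) -> List[str]:
    """Prefer the AI client; fall back to rules if it is absent or raises."""
    if ai_client is not None:
        try:
            return ai_client(missing_fraction)
        except Exception:
            pass  # provider or network failure: use the offline rules instead
    return rule_based_suggestions(missing_fraction)


for line in suggest({"age": 0.1, "notes": 0.8, "id": 0.0}):
    print(line)
```

Passing the provider in as a plain callable keeps the fallback path testable without network access or an API key.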

🤖 Model Recommendations

Problem type detection - Classification, regression, clustering
Algorithm suggestions - Ranked list of suitable models
Baseline pipelines - Ready-to-use sklearn pipeline code
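
One plausible way to detect the problem type is a heuristic on the target column. The sketch below is an assumption about how such a rule might work, not a description of `ModelRecommenderAgent`; the function name and the `max_classes` cutoff are hypothetical.

```python
from typing import Optional, Sequence


def detect_problem_type(
    target: Optional[Sequence[object]],
    max_classes: int = 20,
) -> str:
    """Guess the task from the target column (None means no target was chosen)."""
    if target is None:
        return "clustering"  # nothing to predict, so treat it as unsupervised
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column has no usable values")
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        return "classification"  # labels such as strings or booleans
    integral = all(float(v).is_integer() for v in values)
    if integral and len(set(values)) <= max_classes:
        return "classification"  # a small set of integer codes
    return "regression"


print(detect_problem_type(None))                 # clustering
print(detect_problem_type(["spam", "ham"]))      # classification
print(detect_problem_type([0, 1, 1, 0]))         # classification
print(detect_problem_type([12.5, 7.25, 3.1]))    # regression
```

A heuristic like this misclassifies integer-valued regression targets with few distinct values (counts, ratings), so any real tool would likely let the user override the guess.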

📓 Notebook Export

Jupyter notebooks - Complete analysis as executable notebooks
Python scripts - Standalone .py file generation
Documentation - Well-commented, reproducible code

🚀 Installation

Prerequisites

Python 3.8 or higher
pip package manager

Install from Source

# Clone the repository
git clone https://github.com/yourusername/EVA.git
cd EVA

# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Install as Package

# Editable (development) install; run from the repository root
pip install -e .

🏃 Quick Start

Command Line

# Basic analysis
python -m eva.cli analyze data.csv

# With AI suggestions enabled
python -m eva.cli analyze data.csv --enable-ai

# Export to notebook
python -m eva.cli analyze data.csv --export-notebook --output ./results

Python API

from eva.orchestrator import AnalysisOrchestrator
from eva.models.core import AnalysisContext, AnalysisConfig
from eva.agents.csv_ingestor import CSVIngestorAgent
from eva.agents.eda_generator import EDAGeneratorAgent
from eva.agents.visualizer import VisualizerAgent

# Create orchestrator
orchestrator = AnalysisOrchestrator(max_workers=3)

# Configure analysis
config = AnalysisConfig(
    processing_timeout_minutes=5,
    enable_ai_suggestions=True
)

# Create context
context = AnalysisContext(
    session_id="my_analysis",
    config=config
)

# Set file path
context.metadata = {'file_path': 'data.csv'}

# Create and run agents
agents = [
    CSVIngestorAgent(),
    EDAGeneratorAgent(),
    VisualizerAgent()
]

results = orchestrator.execute_pipeline(agents, context)

# Access results
print(f"Dataset shape: {context.dataset.shape}")
print(f"EDA completed: {results['EDAGeneratorAgent'].success}")

💻 CLI Usage

Analyze Command

python -m eva.cli analyze <file_path> [OPTIONS]

Option	Description
`--output`, `-o`	Output directory for results
`--config`, `-c`	Path to configuration file
`--enable-ai`	Enable AI-powered suggestions
`--export-notebook`	Generate Jupyter notebook
`--export-script`	Generate Python script
`--format`	Visualization format (png, html, both)
`--verbose`, `-v`	Verbose output
`--quiet`, `-q`	Suppress output

Examples

# Full analysis with all exports
python -m eva.cli analyze sales_data.csv \
    --output ./analysis_results \
    --enable-ai \
    --export-notebook \
    --export-script \
    --format both \
    --verbose

# Quick analysis without AI
python -m eva.cli analyze data.csv --quiet

# Using custom configuration
python -m eva.cli analyze data.csv --config my_config.yaml

📚 API Reference

Core Classes

AnalysisOrchestrator

Manages the execution of analysis agents with dependency resolution and parallel processing.

from eva.orchestrator import AnalysisOrchestrator

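orchestrator = AnalysisOrchestrator(
    max_workers=4,           # Parallel worker count
    system_limits=limits     # Resource limits; `limits` must be defined beforehand
)

# `agents` and `context` are built as in the Quick Start example
results = orchestrator.execute_pipeline(agents, context)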

AnalysisContext

Shared context object passed between agents.

from eva.models.core import AnalysisContext, AnalysisConfig

context = AnalysisContext(
    dataset=None,            # Populated by CSVIngestorAgent
    metadata={},             # File and analysis metadata
    results={},              # Agent results storage
    config=AnalysisConfig(), # Configuration
    session_id="unique_id"   # Session identifier
)

Agents

Agent	Description	Dependencies
`CSVIngestorAgent`	Loads and validates CSV files	None
`EDAGeneratorAgent`	Statistical analysis	CSVIngestorAgent
`VisualizerAgent`	Creates visualizations	EDAGeneratorAgent
`InsightSuggesterAgent`	AI-powered insights	EDAGeneratorAgent
`ModelRecommenderAgent`	ML model suggestions	EDAGeneratorAgent
`NotebookExporterAgent`	Notebook generation	All others
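
The dependency column above implies an execution order. As an illustration only, the sketch below groups the agents into "waves" that could run in parallel, using a simple topological sort. The `execution_waves` helper and the `DEPENDENCIES` mapping are hypothetical; how `AnalysisOrchestrator` actually resolves dependencies is not shown in this README.

```python
from typing import Dict, List, Set

# Dependencies transcribed from the table above.
DEPENDENCIES: Dict[str, Set[str]] = {
    "CSVIngestorAgent": set(),
    "EDAGeneratorAgent": {"CSVIngestorAgent"},
    "VisualizerAgent": {"EDAGeneratorAgent"},
    "InsightSuggesterAgent": {"EDAGeneratorAgent"},
    "ModelRecommenderAgent": {"EDAGeneratorAgent"},
    "NotebookExporterAgent": {  # "All others"
        "CSVIngestorAgent",
        "EDAGeneratorAgent",
        "VisualizerAgent",
        "InsightSuggesterAgent",
        "ModelRecommenderAgent",
    },
}


def execution_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group agents so each wave depends only on agents in earlier waves."""
    remaining = {name: set(deps) for name, deps in dependencies.items()}
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(
                f"unresolvable dependencies (cycle or unknown agent): {sorted(remaining)}"
            )
        waves.append(ready)
        for name in ready:
            del remaining[name]
        for deps in remaining.values():
            deps.difference_update(ready)  # these dependencies are now satisfied
    return waves


for index, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
    print(index, wave)
```

With the table's dependencies, the three agents that only need `EDAGeneratorAgent` land in the same wave, which is where a `max_workers` setting above 1 would pay off.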

For detailed API documentation, see docs/api/README.md.

⚙️ Configuration

Configuration File

Create a config.yaml file:

# Analysis settings
analysis:
  max_file_size_mb: 100
  processing_timeout_minutes: 5
  memory_limit_gb: 2
  enable_ai_suggestions: true
  export_formats:
    - ipynb
    - py
  visualization_formats:
    - png
    - html

# Logging
log_level: INFO
log_file: null

# Storage
temp_dir: temp/eva
cache_dir: temp/eva/cache

# AI service
ai_service_provider: openai  # openai, gemini, mock
ai_api_key: null             # Use EVA_AI_API_KEY env var
ai_model: gpt-4
ai_timeout_seconds: 30

# Performance
max_workers: 4
chunk_size: 10000

Environment Variables

Variable	Description
`EVA_AI_API_KEY`	API key for AI service
`EVA_CONFIG_PATH`	Custom config file path
`EVA_LOG_LEVEL`	Logging level override
`EVA_OUTPUT_DIR`	Default output directory
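
The sketch below shows one way environment variables could be layered over values loaded from config.yaml. It assumes that environment variables take precedence over the file, which the "override" wording suggests but this README does not spell out. `apply_env_overrides`, `ENV_TO_SETTING`, and the `output_dir` key are hypothetical names; `EVA_CONFIG_PATH` is left out because it selects the file rather than overriding a setting in it.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed mapping from environment variable to setting key.
ENV_TO_SETTING = {
    "EVA_AI_API_KEY": "ai_api_key",
    "EVA_LOG_LEVEL": "log_level",
    "EVA_OUTPUT_DIR": "output_dir",  # hypothetical key, not in the sample config
}


def apply_env_overrides(
    file_settings: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, object]:
    """Return a copy of the file settings with non-empty env values applied."""
    environ = os.environ if environ is None else environ
    merged = dict(file_settings)
    for variable, key in ENV_TO_SETTING.items():
        value = environ.get(variable)
        if value:  # ignore unset or empty variables
            merged[key] = value
    return merged


settings = apply_env_overrides(
    {"log_level": "INFO", "ai_api_key": None},
    environ={"EVA_LOG_LEVEL": "DEBUG"},
)
print(settings)  # {'log_level': 'DEBUG', 'ai_api_key': None}
```

Accepting `environ` as a parameter keeps the function easy to test without mutating the real process environment.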

🏗️ Architecture

eva/
├── examples/            # Usage examples
├── scripts/             # Verification and utility scripts
├── tests/               # Test suite
│   ├── unit/           # Unit tests
│   └── integration/    # Integration tests
├── eva/                 # Source code
│   ├── agents/         # Analysis agents
│   ├── models/         # Data models
│   ├── services/       # Business logic
│   └── utils/          # Utilities
└── docs/                # Documentation

🧪 Testing

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=eva --cov-report=html

# Run specific test file
pytest tests/test_orchestrator.py -v

# Run integration tests
python tests/run_integration_tests.py

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone and setup
git clone https://github.com/yourusername/EVA.git
cd EVA
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dev dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

Code Style

Formatter: Black
Linter: Flake8
Type Checker: mypy
Import Sorter: isort

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

pandas - Data manipulation
scikit-learn - Machine learning
Plotly - Interactive visualizations
OpenAI - AI capabilities
Google Gemini - Alternative AI

Made with ❤️ by the EVA Development Team
