A comprehensive Python toolkit for satellite altimetry analysis and intelligent causal discovery with LLM-powered explanations.
This project combines oceanographic data analysis with AI-powered causal discovery:
- Satellite Altimetry - DOT, SLA, SSH analysis from Jason/CMEMS/AVISO
- Causal Discovery - PCMCI algorithm to find cause-effect relationships with time lags
- LLM Integration - Ollama (qwen3-coder) for automatic data interpretation
- Physics Validation - Validate patterns against physical laws (wind setup, inverse barometer)
- Pattern Detection - tsfresh features, association rules, anomaly detection
- Multi-Agent Audit - Parallel quality assurance system (8 specialized agents)
8 Specialized Agents for comprehensive quality monitoring:
# Run full parallel audit (all 8 agents)
python audit_agents/run_all.py
# Or test individual agent
python audit_agents/data_flow_auditor.py

Agents:
- DataFlowAuditor - CMEMS/ERA5/Cache (11/13 checks)
- InvestigationAuditor - Pipeline E2E
- KnowledgeAuditor - Services/Persistence
- APIAuditor - Security/Performance
- CausalAuditor - PCMCI/Tigramite
- QualityAuditor - Tests/Coverage
- FrontendAuditor - React/TypeScript
- OpsAuditor - Docker/Monitoring
Output: JSON + Markdown reports in audit_reports/
See audit_agents/README.md for details.
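The parallel fan-out that run_all.py performs can be pictured with the minimal sketch below. It assumes each auditor is a plain callable returning pass/total counts and that results are rendered to both JSON and Markdown; `AuditResult`, `run_all`, `to_reports` and `dummy_auditor` are hypothetical names, not the project's API. The DataFlowAuditor count mirrors the 11/13 figure quoted above, and DummyAuditor is a placeholder.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class AuditResult:
    agent: str
    passed: int
    total: int


# Hypothetical stand-ins for real auditors (placeholder counts, not real results).
def data_flow_auditor() -> AuditResult:
    return AuditResult("DataFlowAuditor", passed=11, total=13)


def dummy_auditor() -> AuditResult:
    return AuditResult("DummyAuditor", passed=1, total=1)


def run_all(agents: list[Callable[[], AuditResult]]) -> list[AuditResult]:
    """Run every agent concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        return list(pool.map(lambda agent: agent(), agents))


def to_reports(results: list[AuditResult]) -> tuple[str, str]:
    """Render the same results as a JSON document and a Markdown table."""
    as_json = json.dumps([asdict(r) for r in results], indent=2)
    rows = ["| Agent | Passed | Total |", "|---|---|---|"]
    rows += [f"| {r.agent} | {r.passed} | {r.total} |" for r in results]
    return as_json, "\n".join(rows)


if __name__ == "__main__":
    json_report, md_report = to_reports(run_all([data_flow_auditor, dummy_auditor]))
    print(md_report)
```

Threads are enough here because real audit checks are mostly I/O-bound (HTTP calls, file reads); CPU-heavy checks would suit a process pool instead.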
Dataset → LLM Interprets → Find Time Dimension → PCMCI Discovery → Physics Validation → LLM Explains
Example: Load flood data → LLM identifies "sea_level_anomaly" as target → PCMCI finds "precipitation → river_level (lag=2 days)" → Physics confirms wind setup mechanism → LLM explains the Atlantic storm track connection.
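PCMCI (via tigramite) runs conditional-independence tests at each lag to prune indirect links. As a much simpler illustration of what "lag=2 days" means, the sketch below scans plain lagged cross-correlations on synthetic data. It is not PCMCI and not the project's correlation fallback; `best_lag` is a hypothetical helper and the data are generated, not taken from the project.

```python
import numpy as np


def best_lag(source: np.ndarray, target: np.ndarray, max_lag: int) -> tuple[int, float]:
    """Return (lag, r) maximizing |corr(source[t], target[t + lag])| for lag in 1..max_lag."""
    best_lag_found, best_r = 0, 0.0
    for lag in range(1, max_lag + 1):
        r = float(np.corrcoef(source[:-lag], target[lag:])[0, 1])
        if abs(r) > abs(best_r):
            best_lag_found, best_r = lag, r
    return best_lag_found, best_r


# Synthetic daily data (an assumption, not project data): the river level
# responds to precipitation two days later, plus measurement noise.
rng = np.random.default_rng(0)
n_days = 365
precipitation = rng.gamma(shape=2.0, scale=5.0, size=n_days)
river_level = np.zeros(n_days)
river_level[2:] = 0.1 * precipitation[:-2]
river_level += rng.normal(0.0, 0.2, size=n_days)

lag, r = best_lag(precipitation, river_level, max_lag=7)
print(f"best lag = {lag} days, r = {r:.2f}")
```

A plain correlation scan like this cannot tell a direct link from one mediated by a third variable, which is the gap PCMCI's conditioning step is designed to close.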
# Clone repository
git clone <repo-url>
cd nico
# Use Python 3.12 (recommended - 3.14 has compatibility issues)
python3.12 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# For causal discovery features
pip install tigramite networkx fastapi uvicorn ollama
# Install Ollama: https://ollama.ai
ollama pull qwen3-coder:30b # or llama3.2 for faster inference
ollama serve
# Start FastAPI backend (in a separate terminal)
uvicorn api.main:app --reload --port 8000
python test_headless.py

Expected output:
✅ PASS: llm (Ollama connected, data interpreted)
✅ PASS: causal (Found precipitation→river_level, wind→surge)
✅ PASS: satellite (Loaded AVISO/CMEMS data)
✅ PASS: llm_explain (Physics validation: 0.95)
nico/
├── api/                      # FastAPI Backend (NEW)
│   ├── main.py               # REST endpoints
│   └── services/
│       ├── llm_service.py    # Ollama LLM integration
│       ├── causal_service.py # PCMCI causal discovery
│       └── data_service.py   # Dataset loading/preprocessing
│
├── src/                      # Core Analysis Modules
│   ├── analysis/             # DOT, slope, statistics
│   ├── core/                 # Config, coordinates, resolvers
│   ├── data/                 # Loaders, filters, geoid
│   ├── visualization/        # Plotly/Matplotlib charts
│   ├── pattern_engine/       # Pattern detection (tsfresh, mlxtend)
│   │   ├── core/             # Pattern dataclasses
│   │   ├── detection/        # ML detectors, association rules
│   │   ├── physics/          # Domain rules (flood, manufacturing)
│   │   └── output/           # Gray zone detector
│   └── surge_shazam/         # Physics-informed ML
│       ├── physics/          # Shallow water equations (PyTorch)
│       └── causal/           # PCMCI integration (stubs)
│
├── app/                      # Streamlit Dashboard
│   └── components/           # UI tabs (analysis, spatial, profiles)
│
├── data/                     # Satellite Data
│   ├── aviso/                # AVISO altimetry
│   ├── cmems/                # CMEMS L3/L4
│   ├── slcci/                # SLCCI Jason-1/2
│   └── geoid/                # TUM geoid model
│
├── gates/                    # Strait Shapefiles
│
├── test_headless.py          # Integration tests
└── gradio_app.py             # Alternative Gradio UI
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/health | GET | Check API + Ollama status |
| /api/v1/data/files | GET | List available data files |
| /api/v1/data/upload | POST | Upload CSV/NetCDF |
| /api/v1/data/load/{path} | GET | Load file from data/ |
| /api/v1/interpret | POST | LLM interprets dataset structure |
| /api/v1/discover | POST | Run PCMCI causal discovery |
| /api/v1/discover/correlations | POST | Cross-correlation analysis |
| /api/v1/chat | POST | Chat with LLM about data |
| /api/v1/hypotheses | POST | Generate causal hypotheses |
| /api/v1/ws/chat | WebSocket | Stream LLM responses |
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/knowledge/stats | GET | Knowledge base statistics |
| /api/v1/knowledge/papers | GET/POST | Scientific papers CRUD |
| /api/v1/knowledge/events | GET/POST | Historical events |
| /api/v1/knowledge/patterns | GET/POST | Causal patterns |
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/investigate/ws | WebSocket | Real-time investigation streaming |
| /api/v1/investigate/status | GET | Investigation components status |
Note: All endpoints now require /api/v1 prefix (v1.8.0+)
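A Python client for POST /api/v1/discover can be built with the standard library alone. This sketch assumes the backend runs on localhost:8000 as started by the uvicorn command earlier and sends only the request fields used in the curl example; the link fields it filters on (physics_valid, physics_score) follow the example response. The helper names, the 120 s timeout and the 0.8 score threshold are illustrative assumptions.

```python
import json
import urllib.request

# Assumed base URL: matches the uvicorn command (port 8000) and the /api/v1 prefix.
API_BASE = "http://localhost:8000/api/v1"


def build_discover_request(dataset_name: str, max_lag: int = 7,
                           alpha_level: float = 0.05, domain: str = "flood",
                           use_llm: bool = True) -> urllib.request.Request:
    """Build a POST /api/v1/discover request with the documented JSON fields."""
    payload = {
        "dataset_name": dataset_name,
        "max_lag": max_lag,
        "alpha_level": alpha_level,
        "domain": domain,
        "use_llm": use_llm,
    }
    return urllib.request.Request(
        f"{API_BASE}/discover",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def physics_valid_links(response: dict, min_score: float = 0.8) -> list[dict]:
    """Keep links flagged physics_valid whose physics_score reaches min_score."""
    return [
        link for link in response.get("links", [])
        if link.get("physics_valid") and link.get("physics_score", 0.0) >= min_score
    ]


# Usage (requires the backend, and Ollama when use_llm=True, to be running):
# with urllib.request.urlopen(build_discover_request("flood_data"), timeout=120) as resp:
#     for link in physics_valid_links(json.load(resp)):
#         print(link["source"], "->", link["target"], "lag", link["lag"])
```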
curl -X POST http://localhost:8000/api/v1/discover \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_name": "flood_data",
    "max_lag": 7,
    "alpha_level": 0.05,
    "domain": "flood",
    "use_llm": true
  }'

Response:
{
  "variables": ["precipitation", "wind_speed", "pressure", "river_level", "flood_index"],
  "links": [
    {
      "source": "precipitation",
      "target": "river_level",
      "lag": 2,
      "strength": 0.95,
      "p_value": 0.0001,
      "explanation": "Heavy precipitation causes river levels to rise with a 2-day lag...",
      "physics_valid": true,
      "physics_score": 0.92
    }
  ]
}

The OllamaLLMService provides:
# `llm` is an OllamaLLMService instance; these coroutines must be awaited inside an async function.
result = await llm.interpret_dataset(columns_info, filename)
# Returns: domain="flood", temporal_column="timestamp", suggested_targets=["sea_level"]

explanation = await llm.explain_causal_relationship(
    source="wind_speed", target="storm_surge", lag=1, strength=0.52
)
# Returns: "Wind speed causes storm surge through the wind setup mechanism (τ ∝ U²)..."

validation = await llm.validate_pattern_physics(
    pattern="wind → surge", domain="flood", confidence=0.99
)
# Returns: {"is_valid": True, "physics_score": 0.95, "supporting_evidence": ["wind stress formula"]}

hypotheses = await llm.generate_hypotheses(variables, domain="flood")
# Returns: [{"source": "NAO_index", "target": "storm_surge", "expected_lag": "3-5 days"}]

Built-in physics validation for multiple domains:
Flood domain:

| Rule | Formula | Typical Lag |
|---|---|---|
| Wind Setup | η ∝ U²·L/(g·h) | 6-24 hours |
| Inverse Barometer | Δη ≈ -1 cm/hPa | 12-48 hours |
| Pressure Effect | Low pressure → surge | 24-72 hours |
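The first two flood rules can be checked numerically. This sketch uses the simplified steady-state setup balance η = τ·L/(ρ_w·g·h) with wind stress τ = ρ_a·C_d·U², and the static inverse-barometer relation Δη = -Δp/(ρ_w·g). The constant drag coefficient and the densities are assumed textbook values, not the project's configuration, and `wind_setup` / `inverse_barometer` are hypothetical helpers.

```python
RHO_AIR = 1.225      # kg/m^3, near-surface air density (assumed)
RHO_WATER = 1025.0   # kg/m^3, typical seawater density (assumed)
G = 9.81             # m/s^2, gravitational acceleration
C_D = 1.3e-3         # assumed constant drag coefficient (in reality it grows with U)


def wind_setup(u: float, fetch: float, depth: float) -> float:
    """Steady-state setup (m): eta = tau * L / (rho_w * g * h), tau = rho_a * C_d * U^2."""
    tau = RHO_AIR * C_D * u ** 2  # wind stress in Pa
    return tau * fetch / (RHO_WATER * G * depth)


def inverse_barometer(delta_p_hpa: float) -> float:
    """Sea-level response (m) to a pressure change: delta_eta = -delta_p / (rho_w * g)."""
    return -delta_p_hpa * 100.0 / (RHO_WATER * G)  # 1 hPa = 100 Pa


if __name__ == "__main__":
    print(f"setup: {wind_setup(u=20.0, fetch=100e3, depth=20.0):.2f} m")
    print(f"IB per +1 hPa: {inverse_barometer(1.0) * 100:.2f} cm")
```

With these assumed constants, a 20 m/s wind over a 100 km fetch of 20 m-deep water gives roughly 0.32 m of setup, and a 1 hPa pressure rise lowers sea level by about 0.99 cm, consistent with the ≈ -1 cm/hPa rule. The U² dependence means doubling the wind speed quadruples the setup.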
Manufacturing domain:

| Rule | Effect |
|---|---|
| Temperature | Arrhenius: rate ×2 per 10°C |
| Viscosity | Decreases with temperature |
| Speed | Optimal range for quality |
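The "×2 per 10°C" entry is the Q10 ≈ 2 rule of thumb. The sketch compares it with the Arrhenius rate ratio k(T2)/k(T1) = exp[(Ea/R)·(1/T1 - 1/T2)], in which the pre-exponential factor cancels. The activation energy is an assumed input: about 53 kJ/mol reproduces a doubling between 25°C and 35°C, so the rule only holds for reactions in that range. `arrhenius_ratio` and `q10_ratio` are hypothetical helpers.

```python
import math

R = 8.314  # J/(mol*K), universal gas constant


def arrhenius_ratio(ea_j_per_mol: float, t1_c: float, t2_c: float) -> float:
    """k(T2)/k(T1) from k = A*exp(-Ea/(R*T)); the pre-factor A cancels."""
    t1, t2 = t1_c + 273.15, t2_c + 273.15
    return math.exp(ea_j_per_mol / R * (1.0 / t1 - 1.0 / t2))


def q10_ratio(t1_c: float, t2_c: float, q10: float = 2.0) -> float:
    """Rule-of-thumb ratio: the rate multiplies by q10 for every 10 °C of warming."""
    return q10 ** ((t2_c - t1_c) / 10.0)


if __name__ == "__main__":
    print(f"Q10 rule, 25 -> 35 °C:            x{q10_ratio(25.0, 35.0):.2f}")
    print(f"Arrhenius (Ea=53 kJ/mol), 25 -> 35: x{arrhenius_ratio(53e3, 25.0, 35.0):.2f}")
```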
# Headless integration test
python test_headless.py
# Unit tests
pytest tests/
# Multi-agent audit (NEW)
python audit_agents/run_all.py
# Format and lint
black src/ api/
ruff check src/ api/
# Check audit status
python audit_agents/data_flow_auditor.py

- FastAPI backend with REST endpoints + /api/v1 versioning
- Ollama LLM integration (qwen3-coder, llama3.2)
- PCMCI causal discovery with correlation fallback
- Physics validation rules (flood, manufacturing)
- Data interpretation and explanation generation
- NetCDF/CSV loading with auto-detection
- Headless test pipeline
- Pattern engine (tsfresh, mlxtend, pyod)
- Modular routers (7 routers, 75% code reduction)
- Production middleware (logging, security, rate limiting)
- Multi-Agent Audit System (8 agents, parallel execution)
- MinimalKnowledgeService (in-memory, production-ready)
- Investigation pipeline with WebSocket streaming
- Complete 7 remaining audit agents
- React frontend with PHI spacing layout (partial)
- Interactive causal graph visualization (D3.js)
- Real-time chat with WebSocket streaming (functional)
- Knowledge base persistence (SurrealDB/Neo4j)
- Test coverage 40% → 80%
- Neo4j for causal graph persistence
- RAG with scientific papers (ChromaDB)
- Multi-dataset correlation analysis
- Export to standard causal formats (TETRAD, DOT)
- Teleconnection patterns (NAO, ENSO)
- Automated report generation
- Full audit coverage (156+ checks)
- Docker Compose deployment
┌─────────────────────────────────────────────────────────────────┐
│                      React Frontend (TODO)                      │
│      ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│      │ Causal Graph│    │ Chat (LLM)  │    │ Time Series │      │
│      │   (D3.js)   │    │  Interface  │    │  Explorer   │      │
│      └─────────────┘    └─────────────┘    └─────────────┘      │
└─────────────────────────────────────────────────────────────────┘
                                 │ REST/WebSocket
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    FastAPI Backend (/api/v1)                    │
│      ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│      │ LLM Service │    │   Causal    │    │    Data     │      │
│      │  (Ollama)   │    │  Discovery  │    │   Service   │      │
│      └─────────────┘    └─────────────┘    └─────────────┘      │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Core Analysis + Pattern Engine                  │
│      (DOT analysis, tsfresh, mlxtend, physics validation)       │
└─────────────────────────────────────────────────────────────────┘
MIT License
Key areas for contribution:
- React Frontend - Build the PHI-spaced dashboard with D3.js graphs
- Physics Rules - Add domain-specific validation rules
- LLM Prompts - Improve scientific explanation quality
- Test Data - Contribute synthetic/real datasets