Lake is the data analytics platform for DoubleZero. It provides a web interface and API for querying network telemetry and Solana validator data stored in ClickHouse.
The API server (`api/`) is the HTTP server that powers the web UI. It provides endpoints for:
- SQL query execution against ClickHouse
- AI-powered natural language to SQL generation
- Conversational chat interface for data analysis
- MCP server for Claude Desktop and other MCP clients
- Schema catalog and visualization recommendations
In production, it also serves the built web UI as static files.
The web UI (`web/`) is a React/TypeScript single-page application. Features:
- SQL editor with syntax highlighting
- Natural language query interface
- Chat mode for conversational data exploration
- Query results with tables and charts
- Session history
The agent (`agent/`) is an LLM-powered workflow for answering natural language questions. It implements a multi-step process (classify → decompose → generate SQL → execute → synthesize answer) and includes evaluation tests for validating agent accuracy.
See `agent/README.md` for architecture details.
The indexer (`indexer/`) is a background service that continuously syncs data from external sources into ClickHouse:
- Network topology from Solana (DZ programs)
- Latency measurements from Solana (DZ programs)
- Device usage metrics from InfluxDB
- Solana validator data from mainnet
- GeoIP enrichment from MaxMind
See `indexer/README.md` for architecture details.
The Slack bot provides a chat interface for data queries: users ask questions in Slack and receive answers powered by the agent workflow.
The admin CLI (`admin/`) handles maintenance operations:
- Database reset
- Data backfills (latency, usage metrics)
- Schema migrations
ClickHouse schema migrations define the dimension and fact tables; the indexer applies them automatically on startup.
Shared Go packages provide logging, retry logic, and test helpers used across the lake services.
```
External Sources            Lake Services                Storage
────────────────            ─────────────                ───────
Solana (DZ) ───────────────► Indexer ──────────────────► ClickHouse
InfluxDB    ───────────────►    │
MaxMind     ───────────────►    │
                                │
                                ▼
                    ┌───────────────────────┐
                    │      API Server       │◄────── Web UI
                    │  • Query execution    │◄────── Slack Bot
                    │  • Agent workflow     │
                    │  • Chat interface     │
                    └───────────────────────┘
```
The recommended local development environment uses k3d (lightweight Kubernetes in Docker) and Tilt for orchestration with live reload.
Prerequisites: Docker, k3d, Tilt, and kubectl.

```bash
brew install k3d tilt-dev/tap/tilt kubectl
```

Start the environment:

```bash
./scripts/k8s.sh up
```

This will:
- Create a k3d cluster with an isolated kubeconfig
- Download GeoIP databases
- Sync secrets from `.env` into the cluster
- Start all services via Tilt (ClickHouse, PostgreSQL, Neo4j, Temporal, API, Indexer, Web)
- Set up remote proxy tables if `REMOTE_CH_*` vars are configured
The web app will be at http://localhost:5173 and the API at http://localhost:8080. If those ports are already in use, the script detects the conflict and shifts all ports by a fixed offset (e.g., +100).
Other commands:

```bash
./scripts/k8s.sh status        # Show cluster and pod status
./scripts/k8s.sh down          # Destroy cluster
./scripts/k8s.sh list          # List all lake clusters
./scripts/k8s.sh up feature-x  # Run an isolated cluster for a feature branch
```

Alternatively, run the services directly on the host and use Docker only for infrastructure:
Run the setup script to get started:

```bash
./scripts/dev-setup.sh
```

This will:
- Start Docker services (ClickHouse, PostgreSQL, Neo4j)
- Create `.env` from `.env.example`
- Download GeoIP databases
Then start the services in separate terminals:
```bash
# Terminal 1: Run the mainnet indexer (imports data into ClickHouse)
go run ./indexer/cmd/indexer/ --verbose --migrations-enable

# Optional: run additional environment indexers (each in its own terminal)
go run ./indexer/cmd/indexer/ --dz-env devnet --migrations-enable --create-database --listen-addr :3011
go run ./indexer/cmd/indexer/ --dz-env testnet --migrations-enable --create-database --listen-addr :3012

# Terminal 2: Run the API server
go run ./api/main.go

# Terminal 3: Run the web dev server
cd web
bun install
bun dev

# Optional: for non-localhost access (HTTPS is needed for WebGPU)
VITE_HTTPS=1 bun dev --host 0.0.0.0
```

The web app will be at http://localhost:5173 and the API at http://localhost:8080.
For testing the UI with real production data without running the full indexer, you can set up proxy tables that forward queries from your local ClickHouse to a remote ClickHouse Cloud instance.
Proxy tables are created in a separate `lake` database, isolated from the local data tables in the `default` database. Existing non-proxy tables are never overwritten unless you pass `--force`.
1. Add remote credentials to `.env`:

   ```bash
   REMOTE_CH_HOST=your-instance.us-east-1.aws.clickhouse.cloud
   REMOTE_CH_USER=lake_dev_reader
   REMOTE_CH_PASSWORD=your-password
   ```

2. Pass `--setup-remote-tables` when starting the indexer:

   ```bash
   go run ./indexer/cmd/indexer/ --verbose --migrations-enable --setup-remote-tables
   ```

3. Point the API server at the remote database:

   ```bash
   go run ./api/main.go --use-remote
   ```
Alternatively, use the admin CLI directly:

```bash
go run ./admin/cmd/admin/ --clickhouse-addr localhost:9100 --setup-remote-tables
```
The command discovers all tables in the remote `lake` database and creates a proxy for each one in the local `lake` database, plus proxies for external service tables (e.g., shredder).
Options:
- `--remote-clickhouse-database` / `REMOTE_CH_DATABASE`: remote database to discover tables from (default: `lake`)
- `--force`: overwrite existing non-proxy tables

To add proxies for additional external tables, add entries to `externalRemoteTables` in `admin/remotetables/setup.go`.
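A proxy is a local table whose reads are forwarded to the remote server. The sketch below builds such DDL with ClickHouse's `remoteSecure` table function and applies the `--force` rule described above. It is an illustration under assumptions: the DDL Lake actually generates may differ, the `proxyStatements` helper and `existing` type are hypothetical, `dz_links` is a made-up table name, 9440 is assumed as the secure native port, and inlining credentials into SQL is a simplification real code should avoid.

```go
package main

import (
	"fmt"
	"strings"
)

// existing describes a table already present in the local lake database (hypothetical).
type existing struct {
	isProxy bool // true if a previous proxy setup created the table
}

// quoteIdent wraps a ClickHouse identifier in backticks, doubling embedded backticks.
func quoteIdent(s string) string { return "`" + strings.ReplaceAll(s, "`", "``") + "`" }

// quoteString builds a single-quoted ClickHouse string literal.
func quoteString(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	return "'" + strings.ReplaceAll(s, "'", `\'`) + "'"
}

// proxyStatements returns the SQL needed to (re)create a local proxy for one remote
// table, or nil if an existing non-proxy table must be left alone.
func proxyStatements(localDB, table, remoteHost, remoteDB, user, password string,
	current *existing, force bool) []string {
	if current != nil && !current.isProxy && !force {
		return nil // never clobber real local data without --force
	}
	target := quoteIdent(localDB) + "." + quoteIdent(table)
	var stmts []string
	if current != nil {
		stmts = append(stmts, "DROP TABLE IF EXISTS "+target)
	}
	stmts = append(stmts, fmt.Sprintf(
		"CREATE TABLE %s AS remoteSecure(%s, %s, %s, %s, %s)",
		target, quoteString(remoteHost+":9440"), quoteString(remoteDB),
		quoteString(table), quoteString(user), quoteString(password)))
	return stmts
}

func main() {
	stmts := proxyStatements("lake", "dz_links", "your-instance.us-east-1.aws.clickhouse.cloud",
		"lake", "lake_dev_reader", "your-password", nil, false)
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

Because a proxy table holds no local data, dropping and recreating it is cheap, which is why only non-proxy tables need the `--force` guard.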
For local-only testing without remote access, seed scripts provide sample data:
```bash
# Publisher check test data (shred stats for ~6 sample publishers)
clickhouse-client --port 9100 --multiquery < scripts/seed-publisher-shred-stats.sql

# Edge scoreboard race data (requires SEED_CH_SHREDDER_PASSWORD in .env)
./scripts/seed-shredder-local.sh         # all recent data
./scripts/seed-shredder-local.sh 10000   # limit to 10k rows

# Validator data from validators.app (requires SEED_VALIDATORSAPP_API_KEY in .env)
./scripts/seed-validatorsapp-local.sh
```

`dev-setup.sh` also runs the shredder and validators.app seed scripts automatically when their credentials are configured in `.env`.
The publisher check seed is mock data covering various states (healthy, retransmitting, needs repair) for UI development.
The agent has evaluation tests that validate the natural language to SQL workflow. Run them with:
```bash
./scripts/run-evals.sh                   # Run all evals in parallel
./scripts/run-evals.sh --show-failures   # Show failure logs at the end
./scripts/run-evals.sh -s                # Short mode (code validation only, no API)
./scripts/run-evals.sh -r 2              # Retry failed tests up to 2 times
```

Output goes to `eval-runs/<timestamp>/`; check `failures.log` there for any failures.
Lake uses automated CI/CD via GitHub Actions and ArgoCD.
Pushes to staging branches automatically build and deploy:
- Build web assets and upload them to S3
- Build the Docker image and push it to `ghcr.io/malbeclabs/lake`
- Tag the image as `staging` (ArgoCD picks up changes automatically)

The current staging branches are configured in `.github/workflows/release.docker.lake.yml`.
Add the `preview-lake` label to a PR to trigger a preview build. Assets go to a branch-prefixed location in the preview bucket.
To promote a staging image to production:
Via GitHub Actions (recommended):
- Go to Actions → "promote.lake" workflow
- Run the workflow with `source_tag=staging` and `target_tag=prod`
Via CLI:
```bash
./scripts/promote-to-prod.sh             # staging → prod (prompts for confirmation)
./scripts/promote-to-prod.sh -n          # Dry run: show what would happen
./scripts/promote-to-prod.sh -y          # Skip confirmation
./scripts/promote-to-prod.sh main prod   # Promote a specific tag
```

ArgoCD will automatically sync the new image.
The API server fetches missing static assets from S3 to handle rolling deployments gracefully. When users have cached HTML referencing old JS/CSS bundles, the API fetches those assets from S3 instead of returning 404s.
Configure with:
```bash
ASSET_BUCKET_URL=https://my-bucket.s3.amazonaws.com/assets
```

Key dependencies:
- ClickHouse - Analytics database
- Anthropic API - LLM for natural language features
- InfluxDB (optional) - Device usage metrics source
- MaxMind GeoIP - IP geolocation databases
The API exposes an MCP (Model Context Protocol) server at `/api/mcp` for use with Claude Desktop and other MCP clients.

| Tool | Description |
|---|---|
| `execute_sql` | Run SQL queries against ClickHouse |
| `execute_cypher` | Run Cypher queries against Neo4j (topology, paths) |
| `get_schema` | Get database schema (tables, columns, types) |
| `read_docs` | Read DoubleZero documentation |
To connect from Claude Desktop:
- Open Settings → Manage Connectors
- Click "Add Custom Connector"
- Enter the URL: `https://data.malbeclabs.com/api/mcp`
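Alternatively, for clients that support project-level configuration (such as Claude Code), add a `.mcp.json` file to your project:

```json
{
  "mcpServers": {
    "doublezero": {
      "type": "http",
      "url": "https://data.malbeclabs.com/api/mcp"
    }
  }
}
```

Lake supports user authentication with daily usage limits.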
| Tier | Auth Method | Daily Limit |
|---|---|---|
| Domain users | Google OAuth (allowed domains) | Unlimited |
| Wallet users | Solana wallet (SIWS) | 50 questions |
| Anonymous | IP-based | 5 questions |
Configure with the `GOOGLE_CLIENT_ID`, `VITE_GOOGLE_CLIENT_ID`, and `AUTH_ALLOWED_DOMAINS` environment variables. See `.env.example` for details.