
GridSync — Claude Context

Project Overview

A data hub that downloads ERCOT public ESIID data (Report 203), archives it to Backblaze B2, loads it into Postgres, and exposes a fuzzy search API. Designed to be fully provider-agnostic — no AWS, GCP, or Azure dependencies.

Architecture Principles

  • No cloud vendor lock-in — every component runs on any Linux server, VPS, or container host
  • Primary language: Rust — use Rust for all components where viable
  • Fallback language: Go — only where Rust tooling or SDK support is insufficient
  • Storage: Backblaze B2 — S3-compatible object storage for raw file archives
  • Database: Supabase / Postgres — same schema either way; self-hostable
  • No managed schedulers — system cron or pg_cron only

Data Source

ERCOT Report 203 — Public TDSP ESIID Extract (no auth required)

  • Document list API: https://www.ercot.com/misapp/servlets/IceDocListJsonWS?reportTypeId=203
  • Download URL: https://www.ercot.com/misdownload/servlets/mirDownload?doclookupId={id}
  • File naming: ext.00000203.0000000000000000.YYYYMMDD.HHMMSSXXX.{TDSP}_{TYPE}.zip
  • File types: _FUL (full monthly snapshot) and _DAILY (incremental delta)
  • FUL publish cadence: first Tuesday following the first Monday of each month
  • DAILY publish cadence: every day
  • ERCOT retains ~31 days of documents (~380 files in the feed at any time)
  • Estimated data volume: 3–5 million rows across all TDSPs

TDSPs published: CENTERPOINT, ONCOR_ELEC, LUBBOCK, NUECES_ELEC, TNMP, AEP_CENTRAL, AEP_NORTH, ENTERGY_GULF, SHARYLAND, AEP_TEXAS_SP, SWEPCO_ENERG
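The filename convention above carries everything the pipeline needs (publish date, TDSP, file type) and can be parsed with plain string splitting. A minimal sketch — the real crate's parse_doc() may differ in shape:

```rust
/// Metadata recovered from an ERCOT Report 203 filename (sketch).
#[derive(Debug)]
struct DocMeta {
    publish_date: String, // YYYYMMDD segment
    tdsp: String,         // e.g. "ONCOR_ELEC"
    file_type: String,    // "FUL" or "DAILY"
}

/// Parse `ext.00000203.0000000000000000.YYYYMMDD.HHMMSSXXX.{TDSP}_{TYPE}.zip`.
fn parse_filename(name: &str) -> Option<DocMeta> {
    let parts: Vec<&str> = name.split('.').collect();
    if parts.len() != 7 || parts[0] != "ext" || parts[1] != "00000203" {
        return None;
    }
    // Split "{TDSP}_{TYPE}" from the right so TDSP names containing '_' survive.
    let (tdsp, file_type) = parts[5]
        .rsplit_once('_')
        .filter(|(_, t)| *t == "FUL" || *t == "DAILY")?;
    Some(DocMeta {
        publish_date: parts[3].to_string(),
        tdsp: tdsp.to_string(),
        file_type: file_type.to_string(),
    })
}
```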

CSV schema (24 fields):

esiid, address, address_overflow, city, state, zip_code, duns, read_cycle,
status, premise_type, power_region, utility, market_id, station_code,
station_name, metered, service_orders, polr_class, settlement_ams, tdsp_ams,
switch_hold, market_code, settlement_load_zone, publish_date

Rolling Data Strategy

  • FUL load — upsert all rows, then deactivate (is_active = FALSE) any ESIID for that TDSP not present in the file
  • DAILY load — upsert only; no deactivation
  • Lookback — 2 days for DAILY, 7 days for FUL (catches the monthly file automatically)
  • Downloader skips files already present in B2 (checksum/existence check), so reruns are safe
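The lookback rule reduces to a small predicate. A sketch, assuming age_days is the whole number of days between the file's publish date and today:

```rust
/// Should a document this old still be considered on this run?
/// DAILY files get a 2-day window; FUL files get 7 days, which is wide
/// enough to pick up the monthly snapshot on any daily cron run after it.
fn within_lookback(file_type: &str, age_days: u32) -> bool {
    match file_type {
        "DAILY" => age_days <= 2,
        "FUL" => age_days <= 7,
        _ => false,
    }
}
```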

Project Structure

gridsync/
├── Cargo.toml                    # Workspace — all dependency versions pinned here
├── rust-toolchain.toml           # Pins Rust stable channel
├── .mise.toml                    # Tool versions + .env autoloading
├── .env.example                  # Required environment variables
├── .gitignore
├── db/
│   └── migrations/
│       ├── 0001_extensions.sql   # pg_trgm, fuzzystrmatch, pg_cron
│       ├── 0002_create_esiids.sql
│       ├── 0003_create_pipeline_runs.sql
│       ├── 0004_indexes.sql
│       └── 0005_triggers.sql
├── .mise/tasks/                  # All runnable tasks (see Commands section)
└── crates/
    ├── common/                   # Shared: Config, B2Client, DB pool, models, errors
    ├── downloader/               # Fetches ERCOT doc list → downloads → uploads to B2
    ├── etl/                      # Downloads from B2 → parses ZIP/CSV → upserts Postgres
    └── api/                      # Axum REST API — fuzzy search endpoint

Crates

esiid-common

Shared library used by all other crates.

Modules:

  • config — Config::from_env() reads all env vars and returns a typed struct
  • db — connect() pool setup; migrate() runs embedded migrations
  • storage — B2Client: upload, download, exists, and the object_key() helper
  • models — Esiid, PipelineRun, UpsertStats structs with sqlx::FromRow
  • error — AppError enum wrapping sqlx, anyhow, and domain errors
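The config module's behavior can be sketched with an injectable lookup so it is testable without touching the process environment. The field set is abbreviated, and from_lookup is a hypothetical helper — the real Config::from_env() reads std::env directly:

```rust
/// Abbreviated sketch of the common crate's Config struct.
#[derive(Debug)]
struct Config {
    database_url: String,
    b2_bucket_name: String,
    api_port: u16,
}

impl Config {
    /// Build from any key -> value source; from_env() would pass a closure
    /// over std::env::var. Required vars error out; optional ones default.
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let req = |k: &str| get(k).ok_or_else(|| format!("missing env var {k}"));
        Ok(Config {
            database_url: req("DATABASE_URL")?,
            b2_bucket_name: req("B2_BUCKET_NAME")?,
            api_port: get("API_PORT")
                .unwrap_or_else(|| "3000".into())
                .parse()
                .map_err(|e| format!("API_PORT: {e}"))?,
        })
    }
}
```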

esiid-downloader

Binary: downloader

Polls the ERCOT document list API, compares against B2 to find new files, downloads and uploads them, records each run in pipeline_runs.

Key modules:

  • ercot — ErcotClient: list_documents(), download(doc_id), parse_doc() (extracts TDSP + FileType from the filename)
  • storage — re-exports esiid_common::storage::B2Client

CLI flags:

--full-only        Only download FUL files
--daily-only       Only download DAILY files
--tdsp <name>      Filter to a specific TDSP
--dry-run          List files that would be downloaded without fetching

esiid-etl

Binary: etl

Downloads a specific file from B2, parses the ZIP/CSV, upserts into Postgres. For FUL files, deactivates ESIIDs no longer in the extract.

Key modules:

  • parser — parse_zip() extracts the CSV from the ZIP; ErcotRecord maps the CSV headers exactly
  • loader — upsert_records() uses an UNNEST bulk upsert in 1,000-row batches; deactivate_missing() handles FUL cleanup
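The UNNEST approach binds one array per column instead of one statement per row. A sketch with a two-field stand-in record — the real statement covers all 24 fields, and this SQL is an assumed reconstruction, not the actual loader query:

```rust
/// Two-field stand-in for the real 24-field ErcotRecord (illustration only).
struct Rec {
    esiid: String,
    address: String,
}

/// Transpose a batch into per-column arrays; these become the $1, $2, ...
/// array binds for the UNNEST statement below.
fn to_columns(batch: &[Rec]) -> (Vec<&str>, Vec<&str>) {
    batch
        .iter()
        .map(|r| (r.esiid.as_str(), r.address.as_str()))
        .unzip()
}

/// Sketch of the per-batch upsert (column list abbreviated).
const UPSERT_SQL: &str = r#"
INSERT INTO esiids (esiid, address, last_seen_at, is_active)
SELECT t.esiid, t.address, NOW(), TRUE
FROM UNNEST($1::text[], $2::text[]) AS t(esiid, address)
ON CONFLICT (esiid) DO UPDATE
SET address = EXCLUDED.address,
    last_seen_at = NOW(),
    is_active = TRUE
"#;
```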

CLI flags:

--key <b2-key>         B2 object key to process
--tdsp <name>          TDSP name (required for FUL deactivation)
--file-type <FUL|DAILY>
--skip-deactivate      Skip deactivation step on FUL files

esiid-api

Binary: api

Axum HTTP server with two routes:

  • GET /search — fuzzy search (see Search section)
  • GET /health — returns "ok"

CLI flags:

--port <n>    Override API_PORT env var

Database Schema

esiids table

  • id — BIGSERIAL PRIMARY KEY (surrogate; integer for index efficiency at 3–5M rows)
  • esiid — TEXT NOT NULL UNIQUE (ERCOT stable identifier; upsert key)
  • All 24 ERCOT CSV fields stored as TEXT or DATE — no coercion at ingest time
  • Pipeline tracking: first_seen_at, last_seen_at, updated_at, is_active
  • search_vector TSVECTOR — maintained by trigger, weighted: ESIID=A, address=B, city/zip=C

pipeline_runs table

Tracks every downloader and ETL run. Fields: source_file, source_checksum (SHA-256), tdsp, file_type (enum: FUL/DAILY), publish_date, timing, row counts, status (enum: running/completed/failed), error_message.

Indexes

  • B-tree on: utility, zip_code, premise_type, settlement_load_zone, status, is_active, publish_date
  • GIN trigram (gin_trgm_ops) on: address, city
  • GIN on: search_vector

Triggers

  • trg_esiids_updated_at — sets updated_at = NOW() on every update
  • trg_esiids_search_vector — rebuilds search_vector on insert/update using 'simple' dictionary (no stemming — address data benefits from exact/prefix matching)
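The trigger body might look like the following. This is an assumed reconstruction based on the weights and dictionary described above, not the actual contents of 0005_triggers.sql:

```rust
/// Hypothetical core of the search_vector trigger function:
/// 'simple' dictionary throughout, weighted ESIID=A, address=B, city/zip=C.
const SEARCH_VECTOR_TRIGGER: &str = r#"
NEW.search_vector :=
    setweight(to_tsvector('simple', coalesce(NEW.esiid, '')), 'A') ||
    setweight(to_tsvector('simple', coalesce(NEW.address, '')), 'B') ||
    setweight(to_tsvector('simple',
        coalesce(NEW.city, '') || ' ' || coalesce(NEW.zip_code, '')), 'C');
"#;
```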

Search

GET /search query parameters:

  • q (string) — free-text fuzzy query, matched via FTS + trigram similarity
  • zip (string) — exact zip code filter
  • utility (string) — TDSP name filter (case-insensitive)
  • premise_type (string) — e.g. RES, COM (case-insensitive)
  • load_zone (string) — settlement load zone filter
  • active_only (bool) — default true; only return is_active = TRUE rows
  • limit (int) — max results, clamped 1–100, default 25
  • offset (int) — pagination offset, default 0
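The defaulting and clamping rules above pin down to two small helpers; a sketch:

```rust
/// limit: default 25, clamped to the 1..=100 range.
fn clamp_limit(requested: Option<i64>) -> i64 {
    requested.unwrap_or(25).clamp(1, 100)
}

/// active_only: defaults to true when the query param is absent.
fn active_only(requested: Option<bool>) -> bool {
    requested.unwrap_or(true)
}
```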

When q is provided, results are ordered by ts_rank (FTS) DESC then similarity(address, q) DESC. Without q, results are ordered by esiid.
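The ranking described can be expressed in a single query. A hedged sketch — the API's actual SQL and its choice of tsquery function may differ (websearch_to_tsquery here is an assumption):

```rust
/// Sketch of the ranked search when q is present: FTS match OR trigram
/// match, ordered by ts_rank then pg_trgm similarity on address.
const SEARCH_SQL: &str = r#"
SELECT e.*,
       ts_rank(search_vector, websearch_to_tsquery('simple', $1)) AS rank,
       similarity(address, $1) AS sim
FROM esiids e
WHERE is_active
  AND (search_vector @@ websearch_to_tsquery('simple', $1)
       OR address % $1)
ORDER BY rank DESC, sim DESC
LIMIT $2 OFFSET $3
"#;
```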

Response: { total, limit, offset, results: Esiid[] }

B2 Object Key Convention

raw/{source}/{dataset}/YYYY/MM/DD/{original-filename}.zip

The source and dataset segments allow the bucket to expand beyond ESIID data without restructuring.

  • source — current: ercot; future examples: oncor, centerpoint
  • dataset — current: esiid; future examples: settlement, outages, usage

Example: raw/ercot/esiid/2026/03/22/ext.00000203...ONCOR_ELEC_FUL.zip
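A key builder matching the convention can be sketched as follows; the common crate's object_key() helper presumably does something similar, but this exact signature is an assumption:

```rust
/// Build `raw/{source}/{dataset}/YYYY/MM/DD/{filename}` from a
/// YYYYMMDD publish date as found in the ERCOT filename.
/// Assumes publish_date is exactly 8 digits (panics otherwise).
fn object_key(source: &str, dataset: &str, publish_date: &str, filename: &str) -> String {
    let (y, rest) = publish_date.split_at(4);
    let (m, d) = rest.split_at(2);
    format!("raw/{source}/{dataset}/{y}/{m}/{d}/{filename}")
}
```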

Environment Variables

  • DATABASE_URL (required) — Postgres connection string
  • B2_KEY_ID (required) — Backblaze B2 application key ID
  • B2_APPLICATION_KEY (required) — Backblaze B2 application key
  • B2_BUCKET_NAME (required) — B2 bucket name
  • B2_ENDPOINT (required) — e.g. https://s3.us-east-005.backblazeb2.com
  • B2_REGION — optional; default us-east-005
  • ERCOT_API_BASE — optional; default https://www.ercot.com
  • API_HOST — optional; default 0.0.0.0
  • API_PORT — optional; default 3000
  • RUST_LOG — optional; tracing filter, e.g. info or esiid_api=debug

The .env file is autoloaded by mise when entering the project directory.

Commands

All tasks run via mise run <task>. The .env file is loaded automatically.

mise run install          # Install sqlx-cli and cargo-watch
mise run build            # cargo build --workspace (debug)
mise run release          # cargo build --workspace --release
mise run check            # cargo check + cargo clippy -D warnings
mise run fmt              # cargo fmt --all
mise run test             # cargo test --workspace
mise run api              # Run the search API server
mise run watch            # Live-reload API server (requires cargo-watch)
mise run download         # Run the downloader (pass flags with --)
mise run etl              # Run the ETL (pass flags with --)
mise run cron             # Run the full pipeline (download + ETL signal)
mise run db:migrate       # Apply all pending migrations
mise run db:status        # Show applied/pending migration status
mise run db:revert        # Revert the last applied migration
mise run db:prepare       # Generate sqlx offline query cache (.sqlx/)

First-time setup

cp .env.example .env
# fill in DATABASE_URL, B2_*, etc.

mise install              # install Rust stable
mise run install          # install sqlx-cli + cargo-watch
mise run db:migrate       # apply migrations
mise run build            # verify everything compiles
mise run api              # start the server

sqlx offline mode

After adding or changing any sqlx::query! / sqlx::query_as! calls, regenerate the cache so CI can build without a live database:

mise run db:prepare       # writes .sqlx/ directory — commit this

Key Design Decisions

  • Surrogate BIGSERIAL PK + UNIQUE(esiid) — integer PK is faster for B-tree ops and foreign keys at 3–5M rows; ESIID remains the upsert key
  • All ERCOT fields stored as TEXT — avoids ingestion failures from unexpected values; coercion happens in the application layer
  • UNNEST bulk upsert — one round-trip per 1,000-row batch instead of N individual queries
  • 'simple' FTS dictionary — address data doesn't benefit from stemming; exact/prefix matching is more useful
  • search_vector weights — ESIID exact hits (A) rank above address matches (B) which rank above city/zip (C)
  • B2Client in common — shared between downloader and ETL; no duplication
  • Migrations embedded in binaries — each binary runs sqlx::migrate! on startup so schema is always current
  • FUL deactivation — after a full-snapshot load, ESIIDs for that TDSP not present in the file are marked is_active = FALSE rather than deleted, preserving history