A data hub that downloads ERCOT public ESIID data (Report 203), archives it to Backblaze B2, loads it into Postgres, and exposes a fuzzy search API. Designed to be fully provider-agnostic — no AWS, GCP, or Azure dependencies.
- No cloud vendor lock-in — every component runs on any Linux server, VPS, or container host
- Primary language: Rust — use Rust for all components where viable
- Fallback language: Go — only where Rust tooling or SDK support is insufficient
- Storage: Backblaze B2 — S3-compatible object storage for raw file archives
- Database: Supabase / Postgres — same schema either way; self-hostable
- No managed schedulers — system `cron` or `pg_cron` only
ERCOT Report 203 — Public TDSP ESIID Extract (no auth required)
- Document list API: `https://www.ercot.com/misapp/servlets/IceDocListJsonWS?reportTypeId=203`
- Download URL: `https://www.ercot.com/misdownload/servlets/mirDownload?doclookupId={id}`
- File naming: `ext.00000203.0000000000000000.YYYYMMDD.HHMMSSXXX.{TDSP}_{TYPE}.zip`
- File types: `_FUL` (full monthly snapshot) and `_DAILY` (incremental delta)
- FUL publish cadence: first Tuesday following the first Monday of each month
- DAILY publish cadence: every day
- ERCOT retains ~31 days of documents (~380 files in the feed at any time)
- Estimated data volume: 3–5 million rows across all TDSPs
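The FUL cadence above is deterministic, so a scheduler can compute the expected publish day without polling. A minimal sketch, assuming Gregorian dates (`ful_day` and `zeller` are illustrative names, not part of the codebase):

```rust
/// Day of week via Zeller's congruence: 0=Sat, 1=Sun, 2=Mon, ... 6=Fri.
fn zeller(mut y: i32, mut m: i32, d: i32) -> i32 {
    if m < 3 {
        m += 12;
        y -= 1;
    }
    let k = y % 100;
    let j = y / 100;
    (d + 13 * (m + 1) / 5 + k + k / 4 + j / 4 + 5 * j) % 7
}

/// Day-of-month of the FUL publish date: the first Tuesday
/// following the first Monday of the month.
fn ful_day(year: i32, month: i32) -> i32 {
    // Convert Zeller's 0=Sat numbering to ISO (Mon=1 .. Sun=7).
    let dow1 = ((zeller(year, month, 1) + 5) % 7) + 1;
    let first_monday = 1 + (8 - dow1) % 7;
    first_monday + 1 // Tuesday immediately follows Monday
}

fn main() {
    // March 2026: the 1st is a Sunday, first Monday is the 2nd,
    // so the FUL snapshot lands Tuesday the 3rd.
    assert_eq!(ful_day(2026, 3), 3);
}
```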
TDSPs published: CENTERPOINT, ONCOR_ELEC, LUBBOCK, NUECES_ELEC, TNMP, AEP_CENTRAL, AEP_NORTH, ENTERGY_GULF, SHARYLAND, AEP_TEXAS_SP, SWEPCO_ENERG
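The `{TDSP}_{TYPE}` suffix makes each filename self-describing. A sketch of extracting both fields from it (the downloader's actual `parse_doc()` may differ in shape):

```rust
/// Extract (TDSP, file type) from an ERCOT Report 203 filename, e.g.
/// "ext.00000203.0000000000000000.YYYYMMDD.HHMMSSXXX.{TDSP}_{TYPE}.zip".
fn parse_doc(filename: &str) -> Option<(String, String)> {
    // Drop the extension, then take the final dot-separated segment,
    // which holds "{TDSP}_{TYPE}" (e.g. "ONCOR_ELEC_FUL").
    let stem = filename.strip_suffix(".zip")?;
    let tag = stem.rsplit('.').next()?;
    // The file type is the suffix after the last underscore; TDSP names
    // themselves contain underscores (AEP_TEXAS_SP), so split from the right.
    let (tdsp, ftype) = tag.rsplit_once('_')?;
    match ftype {
        "FUL" | "DAILY" => Some((tdsp.to_string(), ftype.to_string())),
        _ => None,
    }
}

fn main() {
    let name = "ext.00000203.0000000000000000.20260322.093000123.ONCOR_ELEC_FUL.zip";
    assert_eq!(
        parse_doc(name),
        Some(("ONCOR_ELEC".into(), "FUL".into()))
    );
}
```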
CSV schema (24 fields):

```
esiid, address, address_overflow, city, state, zip_code, duns, read_cycle,
status, premise_type, power_region, utility, market_id, station_code,
station_name, metered, service_orders, polr_class, settlement_ams, tdsp_ams,
switch_hold, market_code, settlement_load_zone, publish_date
```
- FUL load — upsert all rows, then deactivate (`is_active = FALSE`) any ESIID for that TDSP not present in the file
- DAILY load — upsert only; no deactivation
- Lookback — 2 days for DAILY, 7 days for FUL (catches the monthly file automatically)
- Downloader skips files already present in B2 (checksum/existence check), so reruns are safe
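One way the FUL deactivation could be keyed (an assumption; the ETL may detect absent rows differently): the upsert stamps each touched row's `last_seen_at`, so any row for that TDSP older than the run's start time was absent from the snapshot.

```rust
// Hypothetical shape of the FUL cleanup statement. Rows are flagged
// inactive, never deleted, so history is preserved. The column names
// (utility, last_seen_at, is_active) come from the schema described below;
// the exact predicate is an assumption about the implementation.
const DEACTIVATE_SQL: &str = r#"
UPDATE esiids
   SET is_active = FALSE
 WHERE utility = $1          -- the TDSP this FUL file covers
   AND last_seen_at < $2     -- not touched by this run's upsert
   AND is_active = TRUE
"#;

fn main() {
    // Sanity check: the statement deactivates rather than deletes.
    assert!(DEACTIVATE_SQL.contains("is_active = FALSE"));
    assert!(!DEACTIVATE_SQL.contains("DELETE"));
}
```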
```
gridsync/
├── Cargo.toml              # Workspace — all dependency versions pinned here
├── rust-toolchain.toml     # Pins Rust stable channel
├── .mise.toml              # Tool versions + .env autoloading
├── .env.example            # Required environment variables
├── .gitignore
├── db/
│   └── migrations/
│       ├── 0001_extensions.sql         # pg_trgm, fuzzystrmatch, pg_cron
│       ├── 0002_create_esiids.sql
│       ├── 0003_create_pipeline_runs.sql
│       ├── 0004_indexes.sql
│       └── 0005_triggers.sql
├── .mise/tasks/            # All runnable tasks (see Commands section)
└── crates/
    ├── common/             # Shared: Config, B2Client, DB pool, models, errors
    ├── downloader/         # Fetches ERCOT doc list → downloads → uploads to B2
    ├── etl/                # Downloads from B2 → parses ZIP/CSV → upserts Postgres
    └── api/                # Axum REST API — fuzzy search endpoint
```
Library: common
Shared library used by all other crates.
| Module | Purpose |
|---|---|
| `config` | `Config::from_env()` — reads all env vars, returns typed struct |
| `db` | `connect()` pool setup, `migrate()` runs embedded migrations |
| `storage` | `B2Client` — upload, download, exists, `object_key()` helper |
| `models` | `Esiid`, `PipelineRun`, `UpsertStats` structs with `sqlx::FromRow` |
| `error` | `AppError` enum wrapping sqlx, anyhow, and domain errors |
Binary: downloader
Polls the ERCOT document list API, compares against B2 to find new files, downloads and uploads them, records each run in pipeline_runs.
Key modules:
- `ercot` — `ErcotClient`: `list_documents()`, `download(doc_id)`, `parse_doc()` (extracts TDSP + FileType from filename)
- `storage` — re-exports `esiid_common::storage::B2Client`
CLI flags:
```
--full-only      Only download FUL files
--daily-only     Only download DAILY files
--tdsp <name>    Filter to a specific TDSP
--dry-run        List files that would be downloaded without fetching
```
Binary: etl
Downloads a specific file from B2, parses the ZIP/CSV, upserts into Postgres. For FUL files, deactivates ESIIDs no longer in the extract.
Key modules:
- `parser` — `parse_zip()` extracts CSV from ZIP; `ErcotRecord` maps CSV headers exactly
- `loader` — `upsert_records()` uses UNNEST bulk upsert in 1,000-row batches; `deactivate_missing()` for FUL cleanup
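The bulk upsert in `loader` might look roughly like this sketch (column list trimmed to four of the 24 fields, and the batching helper is illustrative):

```rust
// Illustrative shape of the UNNEST bulk upsert: the driver binds one array
// per column, and Postgres zips them into rows, so a whole 1,000-row batch
// is a single statement and a single round-trip.
const UPSERT_SQL: &str = r#"
INSERT INTO esiids (esiid, address, city, zip_code)
SELECT * FROM UNNEST($1::text[], $2::text[], $3::text[], $4::text[])
ON CONFLICT (esiid) DO UPDATE
   SET address      = EXCLUDED.address,
       city         = EXCLUDED.city,
       zip_code     = EXCLUDED.zip_code,
       last_seen_at = NOW()
"#;

/// How many round-trips a file of `total_rows` records costs at
/// 1,000 rows per batch.
fn batch_count(total_rows: usize) -> usize {
    total_rows.div_ceil(1_000)
}

fn main() {
    assert_eq!(batch_count(3_500), 4); // 3 full batches + 1 partial
    assert!(UPSERT_SQL.contains("ON CONFLICT (esiid)"));
}
```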
CLI flags:
```
--key <b2-key>          B2 object key to process
--tdsp <name>           TDSP name (required for FUL deactivation)
--file-type <FUL|DAILY>
--skip-deactivate       Skip deactivation step on FUL files
```
Binary: api
Axum HTTP server with two routes:
- `GET /search` — fuzzy search (see Search section)
- `GET /health` — returns `"ok"`
CLI flags:
```
--port <n>    Override API_PORT env var
```
Table: esiids
- `id` — `BIGSERIAL PRIMARY KEY` (surrogate; integer for index efficiency at 3–5M rows)
- `esiid` — `TEXT NOT NULL UNIQUE` (ERCOT stable identifier; upsert key)
- All 24 ERCOT CSV fields stored as `TEXT` or `DATE` — no coercion at ingest time
- Pipeline tracking: `first_seen_at`, `last_seen_at`, `updated_at`, `is_active`
- `search_vector TSVECTOR` — maintained by trigger, weighted: ESIID=A, address=B, city/zip=C
Table: pipeline_runs
Tracks every downloader and ETL run. Fields: source_file, source_checksum (SHA-256), tdsp, file_type (enum: FUL/DAILY), publish_date, timing, row counts, status (enum: running/completed/failed), error_message.
Indexes:
- B-tree on: `utility`, `zip_code`, `premise_type`, `settlement_load_zone`, `status`, `is_active`, `publish_date`
- GIN trigram (`gin_trgm_ops`) on: `address`, `city`
- GIN on: `search_vector`
Triggers:
- `trg_esiids_updated_at` — sets `updated_at = NOW()` on every update
- `trg_esiids_search_vector` — rebuilds `search_vector` on insert/update using the `'simple'` dictionary (no stemming — address data benefits from exact/prefix matching)
`GET /search` query parameters:

| Param | Type | Description |
|---|---|---|
| `q` | string | Free-text fuzzy query — matched via FTS + trigram similarity |
| `zip` | string | Exact zip code filter |
| `utility` | string | TDSP name filter (case-insensitive) |
| `premise_type` | string | e.g. `RES`, `COM` (case-insensitive) |
| `load_zone` | string | Settlement load zone filter |
| `active_only` | bool | Default `true` — only return `is_active = TRUE` rows |
| `limit` | int | Max results, clamped 1–100, default 25 |
| `offset` | int | Pagination offset, default 0 |
When `q` is provided, results are ordered by `ts_rank` (FTS) DESC, then `similarity(address, q)` DESC. Without `q`, results are ordered by `esiid`.
Response: `{ total, limit, offset, results: Esiid[] }`
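Under those ordering rules, the ranked query could take a shape like this (a sketch; the actual SQL also binds the zip/utility/premise_type/load_zone/active_only filters):

```rust
// Illustrative ranking clause for a `q` search. The FTS match uses the
// weighted search_vector; the `%` operator is pg_trgm's similarity match,
// which can use the GIN trigram index on address.
const SEARCH_SQL: &str = r#"
SELECT *
  FROM esiids
 WHERE search_vector @@ plainto_tsquery('simple', $1)
    OR address % $1
 ORDER BY ts_rank(search_vector, plainto_tsquery('simple', $1)) DESC,
          similarity(address, $1) DESC
 LIMIT $2 OFFSET $3
"#;

fn main() {
    // FTS rank comes first, trigram similarity breaks ties.
    assert!(SEARCH_SQL.contains("ts_rank"));
    assert!(SEARCH_SQL.contains("similarity(address, $1)"));
}
```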
B2 object key layout: `raw/{source}/{dataset}/YYYY/MM/DD/{original-filename}.zip`
The source and dataset segments allow the bucket to expand beyond ESIID data without restructuring.
| Segment | Current value | Future examples |
|---|---|---|
| `source` | `ercot` | `oncor`, `centerpoint` |
| `dataset` | `esiid` | `settlement`, `outages`, `usage` |

Example: `raw/ercot/esiid/2026/03/22/ext.00000203...ONCOR_ELEC_FUL.zip`
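A helper matching this layout might look like the following (the signature of `object_key()` in `common::storage` is assumed, and the filename is a shortened placeholder):

```rust
/// Build a B2 object key of the form
/// raw/{source}/{dataset}/YYYY/MM/DD/{filename}.
/// Zero-padding the month and day keeps keys lexicographically sortable
/// by date, which makes prefix listing by day/month cheap.
fn object_key(source: &str, dataset: &str, (y, m, d): (u16, u8, u8), filename: &str) -> String {
    format!("raw/{source}/{dataset}/{y:04}/{m:02}/{d:02}/{filename}")
}

fn main() {
    assert_eq!(
        object_key("ercot", "esiid", (2026, 3, 22), "example_ONCOR_ELEC_FUL.zip"),
        "raw/ercot/esiid/2026/03/22/example_ONCOR_ELEC_FUL.zip"
    );
}
```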
| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | Postgres connection string |
| `B2_KEY_ID` | Yes | Backblaze B2 application key ID |
| `B2_APPLICATION_KEY` | Yes | Backblaze B2 application key |
| `B2_BUCKET_NAME` | Yes | B2 bucket name |
| `B2_ENDPOINT` | Yes | e.g. `https://s3.us-east-005.backblazeb2.com` |
| `B2_REGION` | No | Default: `us-east-005` |
| `ERCOT_API_BASE` | No | Default: `https://www.ercot.com` |
| `API_HOST` | No | Default: `0.0.0.0` |
| `API_PORT` | No | Default: `3000` |
| `RUST_LOG` | No | Tracing filter, e.g. `info` or `esiid_api=debug` |
The .env file is autoloaded by mise when entering the project directory.
All tasks run via `mise run <task>`. The `.env` file is loaded automatically.
```sh
mise run install      # Install sqlx-cli and cargo-watch
mise run build        # cargo build --workspace (debug)
mise run release      # cargo build --workspace --release
mise run check        # cargo check + cargo clippy -D warnings
mise run fmt          # cargo fmt --all
mise run test         # cargo test --workspace

mise run api          # Run the search API server
mise run watch        # Live-reload API server (requires cargo-watch)
mise run download     # Run the downloader (pass flags with --)
mise run etl          # Run the ETL (pass flags with --)
mise run cron         # Run the full pipeline (download + ETL signal)

mise run db:migrate   # Apply all pending migrations
mise run db:status    # Show applied/pending migration status
mise run db:revert    # Revert the last applied migration
mise run db:prepare   # Generate sqlx offline query cache (.sqlx/)
```
```sh
cp .env.example .env
# fill in DATABASE_URL, B2_*, etc.
mise install          # install Rust stable
mise run install      # install sqlx-cli + cargo-watch
mise run db:migrate   # apply migrations
mise run build        # verify everything compiles
mise run api          # start the server
```

After adding or changing any `sqlx::query!` / `sqlx::query_as!` calls, regenerate the cache so CI can build without a live database:

```sh
mise run db:prepare   # writes .sqlx/ directory — commit this
```

Design decisions:
- Surrogate `BIGSERIAL` PK + `UNIQUE(esiid)` — integer PK is faster for B-tree ops and foreign keys at 3–5M rows; ESIID remains the upsert key
- All ERCOT fields stored as `TEXT` — avoids ingestion failures from unexpected values; coercion happens in the application layer
- UNNEST bulk upsert — one round-trip per 1,000-row batch instead of N individual queries
- `'simple'` FTS dictionary — address data doesn't benefit from stemming; exact/prefix matching is more useful
- `search_vector` weights — ESIID exact hits (A) rank above address matches (B), which rank above city/zip (C)
- `B2Client` in `common` — shared between downloader and ETL; no duplication
- Migrations embedded in binaries — each binary runs `sqlx::migrate!` on startup so schema is always current
- FUL deactivation — after a full-snapshot load, ESIIDs for that TDSP not present in the file are marked `is_active = FALSE` rather than deleted, preserving history