Conversation

@griffinsharps
Contributor

No description provided.

@griffinsharps griffinsharps linked an issue Dec 8, 2025 that may be closed by this pull request
Griffin Sharps added 2 commits December 10, 2025 20:04
- Add manifest-based metadata sampling:
  - smart_meter_analysis.manifests.ensure_account_manifest()
  - smart_meter_analysis.manifests.ensure_date_manifest()
  - Memory now scales with the number of accounts and days, not with the 300M+ interval rows.
- Update prepare_clustering_data_households.py to:
  - Use manifests for sampling accounts and dates.
  - Support chunked streaming profile construction by household.
  - Successfully build ~540k profiles from 10k files and 20k sampled households.
- Update euclidean_clustering.py to support large-N runs:
  - Add sampled silhouette evaluation and improved logging.
- Wire scripts/run_comed_pipeline.py to:
  - Default to streaming profile construction.
  - Expose sampling and chunk-size parameters at the CLI.
  - Treat Stage 1 as the high-volume streaming path.
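
To illustrate the manifest idea above, here is a minimal sketch using only the standard library. It is not the code in smart_meter_analysis.manifests: the function names `build_account_manifest`, `sample_accounts`, and `chunked`, the `account_id` column, and the plain-text manifest format are all assumptions made for the example. The point it shows is that only the set of unique account IDs is held in memory while the interval rows are streamed, and that sampled households can then be processed in fixed-size chunks.

```python
import csv
import random
from pathlib import Path
from typing import Iterable, Iterator


def build_account_manifest(csv_paths: Iterable[Path], manifest_path: Path) -> list[str]:
    """Return sorted unique account IDs, caching them in manifest_path.

    Hypothetical stand-in for ensure_account_manifest(); the real signature
    and on-disk format are not shown in this PR.
    """
    if manifest_path.exists():
        # Reuse the cached manifest instead of rescanning the input files.
        return manifest_path.read_text().splitlines()

    accounts: set[str] = set()
    for path in csv_paths:
        with path.open(newline="") as fh:
            # Stream row by row so only the set of IDs is kept in memory.
            for row in csv.DictReader(fh):
                accounts.add(row["account_id"])  # assumed column name

    ordered = sorted(accounts)
    manifest_path.write_text("\n".join(ordered))
    return ordered


def sample_accounts(manifest: list[str], n: int, seed: int = 42) -> list[str]:
    """Draw a reproducible sample of at most n accounts from the manifest."""
    rng = random.Random(seed)
    return rng.sample(manifest, min(n, len(manifest)))


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

In a pipeline of this shape, each chunk of sampled households would be filtered out of the raw files and turned into profiles before the next chunk is read, so peak memory is bounded by the chunk size rather than by the total row count.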

Tested:
- End-to-end run on 202308_10000:
  - ~335M interval rows
  - ~225k households
  - 20k sampled households x 31 days
  - 538,552 complete profiles clustered into 4 groups
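
With several hundred thousand profiles, an exact silhouette score needs all pairwise distances, which is why the evaluation is sampled. The sketch below shows one way to compute a sampled silhouette with the standard library only. It is an illustration and not the code in euclidean_clustering.py; the function name `sampled_silhouette` and its parameters are hypothetical. scikit-learn's `silhouette_score` offers the same idea through its `sample_size` and `random_state` arguments.

```python
import math
import random
from collections import defaultdict
from typing import Sequence


def sampled_silhouette(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    sample_size: int,
    seed: int = 42,
) -> float:
    """Mean silhouette coefficient over a random subset of the points."""
    rng = random.Random(seed)
    n = len(points)
    idx = rng.sample(range(n), min(sample_size, n))

    # Group the sampled points by cluster label.
    clusters = defaultdict(list)
    for i in idx:
        clusters[labels[i]].append(points[i])
    if len(clusters) < 2:
        raise ValueError("sample must contain at least two clusters")

    scores = []
    for i in idx:
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean Euclidean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len - 1).
        a = sum(math.dist(points[i], q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], q) for q in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

The cost is quadratic in `sample_size` instead of in the full number of profiles, at the price of some sampling noise in the score; fixing `seed` keeps runs comparable.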
@griffinsharps griffinsharps merged commit e88b683 into main Dec 10, 2025
7 checks passed
@griffinsharps griffinsharps deleted the 39-smart-meter-analysis-figure-out-high-volume-analysis branch December 10, 2025 20:45

Development

Successfully merging this pull request may close these issues.

Figure out high-volume analysis

2 participants