Data cleaning and calibration pipeline for SRML Data.
Base pipeline:
python -m pip install pandas pyarrow

Plotting stack:
python -m pip install holoviews hvplot notebook bokeh matplotlib

The pipeline currently scopes the STW dataset to rows from August 31, 2024 and earlier. Anything after that date is ignored during ingest.
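The date cutoff can be pictured with a minimal pandas sketch. The column name timestamp, the constant CUTOFF_DATE, and the helper apply_cutoff are illustrative assumptions, not names taken from the pipeline scripts. The sketch also assumes that "August 31, 2024 and earlier" keeps the whole of that day.

```python
import pandas as pd

# Assumed cutoff: keep everything up to and including August 31, 2024.
CUTOFF_DATE = pd.Timestamp("2024-08-31")

def apply_cutoff(df: pd.DataFrame, column: str = "timestamp") -> pd.DataFrame:
    """Return only rows whose timestamp falls on or before the cutoff day."""
    ts = pd.to_datetime(df[column])
    # Compare against the start of the following day so all of Aug 31 is kept.
    keep = ts < CUTOFF_DATE + pd.Timedelta(days=1)
    return df.loc[keep].reset_index(drop=True)

# Tiny synthetic frame: one row just before the cutoff, one just after.
raw = pd.DataFrame({
    "timestamp": ["2024-08-31 23:59:00", "2024-09-01 00:00:00"],
    "GHI": [0.0, 0.0],
})
cleaned = apply_cutoff(raw)
```

Comparing against the start of the next day, rather than against midnight on the cutoff date itself, avoids silently dropping the last day's minute-level rows.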
Run the full pipeline:
python run_stw_pipeline.py

Run individual steps for debugging:
python scripts/01_ingest_raw_to_yearly_cleaned.py
python scripts/02_fill_yearly_gaps.py
python scripts/03_build_combined_cleaned.py
python scripts/04_write_reports_and_recheck.py
python scripts/05_map_combined_columns.py
python scripts/06_convert_mapped_to_mv_irr.py
python scripts/07_detect_irr_irregularity_events.py

Generate metric plots from the notebooks:
python -m notebook plots/ghi_irr_plots.ipynb
python -m notebook plots/dni_irr_plots.ipynb
python -m notebook plots/dhi_irr_plots.ipynb
python -m notebook plots/temp_plots.ipynb

Each notebook:
- loads only one metric (GHI, DNI, DHI, or TEMP)
- loops across available years by default
- writes PNGs into plots/ghi_plots/, plots/dni_plots/, plots/dhi_plots/, or plots/temp_plots/
- lets you change WINDOW_YEARS from 1 to 2 if you want two-year windows instead
- can optionally render zoomable inline notebook plots for only the years in INTERACTIVE_YEARS
- can optionally lighten those interactive plots with INTERACTIVE_DOWNSAMPLE_RULE like "5min"
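
The windowing and downsampling options above can be illustrated with a short sketch. WINDOW_YEARS and INTERACTIVE_DOWNSAMPLE_RULE are the notebook settings named in the list; the helpers year_windows and downsample, the non-overlapping grouping of years, the use of a mean aggregation, and the synthetic data are all assumptions made for illustration, and the notebooks may do this differently.

```python
import numpy as np
import pandas as pd

WINDOW_YEARS = 2                      # 1 = one plot per year, 2 = two-year windows
INTERACTIVE_DOWNSAMPLE_RULE = "5min"  # pandas offset alias passed to resample()

def year_windows(years, window_years):
    """Group sorted years into consecutive, non-overlapping chunks (hypothetical helper)."""
    years = sorted(years)
    return [years[i:i + window_years] for i in range(0, len(years), window_years)]

def downsample(series, rule):
    """Average a minute-level series into coarser bins (mean is an assumption)."""
    return series.resample(rule).mean()

# Synthetic one-hour, one-minute series standing in for a single metric.
index = pd.date_range("2023-06-01 12:00", periods=60, freq="min")
ghi = pd.Series(np.arange(60, dtype=float), index=index, name="GHI")

windows = year_windows([2021, 2022, 2023], WINDOW_YEARS)
lighter = downsample(ghi, INTERACTIVE_DOWNSAMPLE_RULE)
```

With a "5min" rule, a minute-level series shrinks to one fifth of its points, which is usually enough to keep an interactive plot responsive without hiding the daily shape of the curve.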
Launch the legacy full-history notebook explorer:
python -m notebook plots/stw_mV_Irr_explorer.ipynb

Create contextual plots for the detected IRR irregularity events:
python -m notebook plots/irr_irregularity_events.ipynb

The detector now defaults to large, sustained irradiance windows that are useful for spotting bad mapping periods or other structural breaks. If you still want the older minute-spike detector, run:
python scripts/07_detect_irr_irregularity_events.py --mode first_difference

Outputs:
- final output/stw_mV_Irr.csv: final calibrated output.
- final output/stw_mV_Irr.parquet: Parquet copy for interactive notebook exploration.
- Detected outliers in the converted *_mV and *_Irr series are rewritten to NaN, and TEMP is also cleaned when it falls outside its allowed range, so plots keep the timestamps but show gaps instead of spikes.
- yearly cleaned/: yearly cleaned CSVs.
- reports/: missing-timestamp logs, processing summary, recheck report, mapped intermediate CSVs, outlier report, and pipeline state.
- reports/stw_irr_irregularity_events.csv: large IRR irregularity windows, with the default detector emphasizing sustained mapping/balance breaks and severe one-off failures instead of tiny minute-to-minute spikes.
- plots/ghi_plots/, plots/dni_plots/, plots/dhi_plots/, plots/temp_plots/: PNG exports generated by the metric notebooks.
- plots/ghi_irr_plots.ipynb, plots/dni_irr_plots.ipynb, plots/dhi_irr_plots.ipynb, plots/temp_plots.ipynb: metric-specific notebooks for yearly or two-year plot exports.
- plots/irr_irregularity_events.ipynb: notebook that exports contextual plots for each detected irregularity event.
- plots/stw_mV_Irr_explorer.ipynb: Datashader notebook for full-history interactive plots.
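
The NaN rewriting described above (out-of-range values become gaps while their timestamps stay in place) can be sketched as follows. The limits TEMP_MIN and TEMP_MAX are placeholders, since the allowed range is not stated here, and mask_out_of_range is a hypothetical helper rather than a function from the pipeline.

```python
import pandas as pd

# Placeholder limits; the pipeline's actual TEMP range is not stated here.
TEMP_MIN, TEMP_MAX = -40.0, 60.0

def mask_out_of_range(series, low, high):
    """Replace values outside [low, high] with NaN, keeping the index intact."""
    return series.where(series.between(low, high))

# Synthetic minute-level readings with two obviously bad values.
index = pd.date_range("2024-01-01 00:00", periods=4, freq="min")
temp = pd.Series([12.5, 999.0, 12.7, -80.0], index=index, name="TEMP")

cleaned = mask_out_of_range(temp, TEMP_MIN, TEMP_MAX)
```

Masking rather than dropping rows keeps the time index regular, so a line plot breaks at the bad samples instead of drawing a straight segment across them.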