finnfujimura/LEI-SRML-Intern-Project

LEI SRML Intern Project

Data cleaning and calibration pipeline for SRML data.

Dependencies

Base pipeline:

python -m pip install pandas pyarrow

Plotting stack:

python -m pip install holoviews hvplot notebook bokeh matplotlib

Usage

The pipeline currently restricts the STW dataset to rows timestamped on or before August 31, 2024. Rows after that date are ignored during ingest.
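
As a rough illustration of the date scoping described above, here is a minimal pandas sketch. The column name `timestamp`, the function name `scope_to_cutoff`, and the choice to keep every minute of August 31 are assumptions for the example, not details taken from the ingest script.

```python
import pandas as pd

# Assumption: the cutoff includes the whole day of August 31, 2024.
CUTOFF_DATE = pd.Timestamp("2024-08-31")


def scope_to_cutoff(df: pd.DataFrame, time_col: str = "timestamp") -> pd.DataFrame:
    """Keep rows whose timestamp falls on or before the cutoff date."""
    times = pd.to_datetime(df[time_col])
    # Compare against the start of the following day so that every
    # minute of August 31 is retained.
    keep = times < CUTOFF_DATE + pd.Timedelta(days=1)
    return df.loc[keep].reset_index(drop=True)


# Tiny made-up frame: one row just before the cutoff, one just after,
# and one well before it.
raw = pd.DataFrame(
    {
        "timestamp": ["2024-08-31 23:59", "2024-09-01 00:00", "2023-01-01 12:00"],
        "value": [1.0, 2.0, 3.0],
    }
)
scoped = scope_to_cutoff(raw)
```

Comparing against the start of September 1 rather than against midnight on August 31 avoids silently dropping the last day of the range.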

Run the full pipeline:

python run_stw_pipeline.py

Run individual steps for debugging:

python scripts/01_ingest_raw_to_yearly_cleaned.py
python scripts/02_fill_yearly_gaps.py
python scripts/03_build_combined_cleaned.py
python scripts/04_write_reports_and_recheck.py
python scripts/05_map_combined_columns.py
python scripts/06_convert_mapped_to_mv_irr.py
python scripts/07_detect_irr_irregularity_events.py
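
When debugging, it can be handy to rerun only a contiguous range of steps. The helper below is a hypothetical convenience sketch, not the contents of `run_stw_pipeline.py` (which may do more, such as tracking pipeline state). It assumes it is run from the repository root; `build_commands` and `run_steps` are names invented for this example.

```python
import subprocess
import sys

# Script paths copied from the step list above.
STEPS = [
    "scripts/01_ingest_raw_to_yearly_cleaned.py",
    "scripts/02_fill_yearly_gaps.py",
    "scripts/03_build_combined_cleaned.py",
    "scripts/04_write_reports_and_recheck.py",
    "scripts/05_map_combined_columns.py",
    "scripts/06_convert_mapped_to_mv_irr.py",
    "scripts/07_detect_irr_irregularity_events.py",
]


def build_commands(first: int = 1, last: int = len(STEPS)) -> list[list[str]]:
    """Return the argv lists for steps first..last (1-based, inclusive)."""
    if not 1 <= first <= last <= len(STEPS):
        raise ValueError(f"step range must lie within 1..{len(STEPS)}")
    # sys.executable reuses the interpreter that is running this helper.
    return [[sys.executable, path] for path in STEPS[first - 1:last]]


def run_steps(first: int = 1, last: int = len(STEPS)) -> None:
    """Run the selected steps in order, stopping at the first failure."""
    for command in build_commands(first, last):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

For example, `run_steps(5, 7)` would rerun only the mapping, conversion, and detection steps.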

Generate metric plots from the notebooks:

python -m notebook plots/ghi_irr_plots.ipynb
python -m notebook plots/dni_irr_plots.ipynb
python -m notebook plots/dhi_irr_plots.ipynb
python -m notebook plots/temp_plots.ipynb

Each notebook:

  • loads only one metric (GHI, DNI, DHI, or TEMP)
  • loops across available years by default
  • writes PNGs into plots/ghi_plots/, plots/dni_plots/, plots/dhi_plots/, or plots/temp_plots/
  • lets you change WINDOW_YEARS from 1 to 2 if you want two-year windows instead
  • can optionally render zoomable inline notebook plots for only the years in INTERACTIVE_YEARS
  • can optionally downsample those interactive plots by setting INTERACTIVE_DOWNSAMPLE_RULE to a resampling rule such as "5min"
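
The sketch below shows one way the `WINDOW_YEARS` and `INTERACTIVE_DOWNSAMPLE_RULE` settings could behave. Only the two setting names come from the notebooks. The helpers `year_windows` and `downsample`, the non-overlapping grouping of years, and the use of a mean when resampling are assumptions made for illustration.

```python
import pandas as pd

WINDOW_YEARS = 1  # set to 2 for two-year windows
INTERACTIVE_DOWNSAMPLE_RULE = "5min"  # pandas offset alias; None skips downsampling


def year_windows(years, window_years):
    """Group years into consecutive, non-overlapping windows."""
    ordered = sorted(years)
    return [
        tuple(ordered[i:i + window_years])
        for i in range(0, len(ordered), window_years)
    ]


def downsample(series, rule):
    """Average a time-indexed series into coarser bins to lighten a plot."""
    if rule is None:
        return series
    return series.resample(rule).mean()


# Ten made-up one-minute samples collapse into two five-minute means.
index = pd.date_range("2024-06-01 12:00", periods=10, freq="1min")
ghi = pd.Series(range(10), index=index, dtype="float64")
light = downsample(ghi, INTERACTIVE_DOWNSAMPLE_RULE)
```

With `window_years=2`, an odd number of years leaves a final one-year window. Whether the notebooks handle that case the same way is not stated here.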

Launch the legacy full-history notebook explorer:

python -m notebook plots/stw_mV_Irr_explorer.ipynb

Create contextual plots for the detected IRR irregularity events:

python -m notebook plots/irr_irregularity_events.ipynb

The detector now defaults to flagging large, sustained irradiance irregularity windows, which are useful for spotting bad mapping periods or other structural breaks. If you still want the older minute-spike detector, run:

python scripts/07_detect_irr_irregularity_events.py --mode first_difference
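
To make the difference between the two modes concrete, here is a toy pandas sketch. It is not the logic in `scripts/07_detect_irr_irregularity_events.py`. The input series, the thresholds, and the function names `first_difference_events` and `sustained_events` are all hypothetical. The first function flags minute-to-minute jumps, while the second reports only runs that stay above a threshold for several consecutive samples.

```python
import pandas as pd


def first_difference_events(series, jump):
    """Timestamps where the value changes by more than `jump` in one step."""
    return series.index[series.diff().abs() > jump]


def sustained_events(series, threshold, min_samples):
    """(start, end) pairs for runs of >= min_samples samples above threshold."""
    above = series > threshold
    # A new run id starts whenever the above/below state flips.
    run_id = (above != above.shift()).cumsum()
    events = []
    for _, run in series[above].groupby(run_id[above]):
        if len(run) >= min_samples:
            events.append((run.index[0], run.index[-1]))
    return events


# Made-up residual series: a one-minute spike at 12:02 and a
# three-minute plateau from 12:04 to 12:06.
index = pd.date_range("2024-01-01 12:00", periods=8, freq="1min")
residual = pd.Series([0.0, 0.0, 50.0, 0.0, 30.0, 30.0, 30.0, 0.0], index=index)

spikes = first_difference_events(residual, jump=40.0)
windows = sustained_events(residual, threshold=20.0, min_samples=3)
```

On this toy data the first-difference approach reacts to the isolated spike (both its rise and its fall), while the sustained approach ignores it and reports only the plateau.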

Outputs

  • final output/stw_mV_Irr.csv: final calibrated output.
  • final output/stw_mV_Irr.parquet: Parquet copy for interactive notebook exploration.
  • Detected outliers in the converted *_mV and *_Irr series are rewritten to NaN, and TEMP is also cleaned when it falls outside its allowed range, so plots keep the timestamps but show gaps instead of spikes.
  • yearly cleaned/: yearly cleaned CSVs.
  • reports/: missing-timestamp logs, processing summary, recheck report, mapped intermediate CSVs, outlier report, and pipeline state.
  • reports/stw_irr_irregularity_events.csv: large IRR irregularity windows, with the default detector emphasizing sustained mapping/balance breaks and severe one-off failures instead of tiny minute-to-minute spikes.
  • plots/ghi_plots/, plots/dni_plots/, plots/dhi_plots/, plots/temp_plots/: PNG exports generated by the metric notebooks.
  • plots/ghi_irr_plots.ipynb, plots/dni_irr_plots.ipynb, plots/dhi_irr_plots.ipynb, plots/temp_plots.ipynb: metric-specific notebooks for yearly or two-year plot exports.
  • plots/irr_irregularity_events.ipynb: notebook that exports contextual plots for each detected irregularity event.
  • plots/stw_mV_Irr_explorer.ipynb: Datashader notebook for full-history interactive plots.
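
The NaN rewriting mentioned in the outputs above can be pictured with a short pandas sketch. The range bounds and the function name `mask_out_of_range` are placeholders chosen for the example. The pipeline's actual limits and outlier rules are not documented here.

```python
import pandas as pd

# Hypothetical bounds in degrees Celsius, for illustration only.
TEMP_RANGE = (-40.0, 60.0)


def mask_out_of_range(df, column, low, high):
    """Return a copy where values outside [low, high] become NaN.

    Rows are kept, so the timestamp index stays intact and plots show
    a gap instead of a spike.
    """
    cleaned = df.copy()
    in_range = cleaned[column].between(low, high)
    # Series.where keeps values where the mask is True and inserts NaN elsewhere.
    cleaned[column] = cleaned[column].where(in_range)
    return cleaned


frame = pd.DataFrame(
    {"TEMP": [21.5, 999.0, 22.0]},
    index=pd.date_range("2024-07-01 12:00", periods=3, freq="1min"),
)
cleaned = mask_out_of_range(frame, "TEMP", *TEMP_RANGE)
```

Masking rather than dropping rows keeps the time axis regular, which matters for any later step that assumes one row per minute.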
