This repository is a self-contained in-class activity showing idiomatic pandas groupby + datetime patterns
on climate data, now powered by a cloud dataset (NOAA GHCN Daily, public S3).
- (Notebook 02) Fetch Illinois GHCN daily data from NOAA's public S3 for 4 stations with ≥30 years of record and write a local Parquet file.
- (Notebook 01) Load that Parquet (or read it via a raw GitHub HTTPS URL once you push the repo) and perform groupby + datetime analytics:
  - Monthly means (per station), annual totals & rankings
  - Station-by-month climatology
  - Warm-season aggregations
  - Rolling/windowed stats per station
  - groupby vs. resample
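The patterns above can be sketched compactly. This is a minimal illustration on toy data; the column names `station`, `date`, and `tmax` are assumptions about the Parquet schema, and the real exercise runs on the GHCN data:

```python
import numpy as np
import pandas as pd

# Toy daily frame in the shape Notebook 02 writes (schema is an assumption)
rng = pd.date_range("1990-01-01", "1991-12-31", freq="D")
df = pd.DataFrame({
    "station": np.repeat(["USC00110072", "USC00118740"], len(rng)),
    "date": np.tile(rng, 2),
    "tmax": np.random.default_rng(0).normal(15, 10, 2 * len(rng)),
})

# Monthly means per station: station key plus a calendar-aware monthly Grouper
monthly = df.groupby(["station", pd.Grouper(key="date", freq="MS")])["tmax"].mean()

# groupby vs. resample: resample is a calendar-aware groupby on a datetime axis;
# the per-station equivalent chains groupby and resample on a DatetimeIndex
monthly2 = (df.set_index("date")
              .groupby("station")["tmax"]
              .resample("MS").mean())

# Station-by-month climatology: average each calendar month across all years
climatology = df.groupby(["station", df["date"].dt.month])["tmax"].mean()

# Warm-season (May-Sep) aggregation per station and year
warm = df[df["date"].dt.month.between(5, 9)]
warm_means = warm.groupby(["station", warm["date"].dt.year])["tmax"].mean()

# Rolling 30-day mean per station (time-based window on the datetime index)
rolled = (df.set_index("date")
            .groupby("station")["tmax"]
            .rolling("30D").mean())
```

The `Grouper` and `groupby(...).resample(...)` routes produce the same monthly means; the notebook contrasts them to show when each reads more naturally.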
```bash
# conda/mamba
mamba env create -f environment.yml
mamba activate pandas-datetime-climate

# or pip
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Launch
jupyter lab
```

Open the notebooks in this order:
1. `notebooks/02_fetch_ghcn_il_to_parquet.ipynb` (creates `data/ghcn_il_top4_daily.parquet`)
2. `notebooks/01_groupby_datetime.ipynb` (loads that Parquet and runs the analytics)
After pushing this repo to GitHub, you can read the Parquet over HTTPS by setting

```python
CLOUD_PARQUET = "https://raw.githubusercontent.com/USER/REPO/main/data/ghcn_il_top4_daily.parquet"
```

in `01_groupby_datetime.ipynb`.
```text
grad-analytics-pandas-datetime-climate_v2/
├── data/
│   └── ghcn_il_top4_daily.parquet        # created by Notebook 02 (not committed by default)
├── notebooks/
│   ├── 01_groupby_datetime.ipynb         # the analysis exercise
│   └── 02_fetch_ghcn_il_to_parquet.ipynb # cloud → parquet tutorial
├── src/
│   └── io_helpers.py
├── environment.yml
├── requirements.txt
└── README.md
```
MIT