Add backup GSP data synthesis from national data when GSPs are missing#163
ram-from-tvl wants to merge 2 commits into openclimatefix:main
Conversation
When PVLive returns empty data for individual GSPs, this feature:

- Tracks missing GSPs and national-level data
- Uses `DB_URL` to query `LocationSQL` for GSP capacities
- Proportionally distributes national generation to missing GSPs
- Returns an empty DataFrame with the expected columns if no data is available at all

Addresses: openclimatefix#105
Pull request overview
Adds a fallback mechanism to synthesize per-GSP PV generation when PVLive returns empty data for some GSPs, using national PVLive data scaled by GSP installed capacities from the database.
Changes:
- Track missing GSPs while fetching PVLive historic data and capture national (GSP 0) data when available.
- When some GSPs are missing and national data exists, query `LocationSQL` capacities via `DB_URL` and generate proportional per-GSP backup rows.
- Return an empty DataFrame with the expected schema when no PVLive data is available at all (avoids a `pd.concat([])` failure).
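The proportional scaling behind the backfill can be sketched as follows. This is a minimal, self-contained illustration: the column names mirror the PR's schema, but the helper function name and the sample figures are hypothetical.

```python
import pandas as pd

def scale_national_to_gsp(national_df: pd.DataFrame, gsp_id: int,
                          gsp_capacity_mw: float) -> pd.DataFrame:
    """Distribute national generation to one GSP in proportion to its capacity."""
    # Each GSP's share is its installed capacity over the national capacity.
    factor = gsp_capacity_mw / national_df["installedcapacity_mwp"]
    backup = national_df.copy()
    backup["solar_generation_kw"] = national_df["solar_generation_kw"] * factor
    backup["gsp_id"] = gsp_id
    backup["installedcapacity_mwp"] = gsp_capacity_mw
    backup["capacity_mwp"] = gsp_capacity_mw
    return backup

national = pd.DataFrame({
    "target_datetime_utc": pd.to_datetime(["2024-06-01 12:00", "2024-06-01 12:30"]),
    "solar_generation_kw": [8_000_000.0, 9_000_000.0],
    "installedcapacity_mwp": [15_000.0, 15_000.0],
})

# A GSP with 150 MW installed (1% of the 15 GW national capacity)
# receives 1% of national generation at each timestamp.
rows = scale_national_to_gsp(national, gsp_id=42, gsp_capacity_mw=150.0)
print(rows["solar_generation_kw"].tolist())
# → [80000.0, 90000.0]
```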
```python
if missing_gsps and national_df is not None:
    logger.info(f"Creating backup data for {len(missing_gsps)} missing GSPs using national data")
    db_url = os.getenv("DB_URL")
    if not db_url:
        logger.warning("DB_URL not set, cannot create backup GSP data")
    else:
        try:
            connection = DatabaseConnection(url=db_url)
            with connection.get_session() as session:
                locations = session.query(LocationSQL).filter(LocationSQL.gsp_id.in_(missing_gsps)).all()
                backup_rows = []
                for _, national_row in national_df.iterrows():
                    national_capacity = national_row['installedcapacity_mwp']
                    if national_capacity == 0 or pd.isna(national_capacity):
                        continue
                    for location in locations:
                        if location.installed_capacity_mw is not None and location.installed_capacity_mw > 0:
                            # Scale national generation by this GSP's share of national capacity.
                            factor = location.installed_capacity_mw / national_capacity
                            new_row = national_row.copy()
                            new_row['solar_generation_kw'] *= factor
                            new_row['gsp_id'] = location.gsp_id
                            new_row['installedcapacity_mwp'] = location.installed_capacity_mw
                            new_row['capacity_mwp'] = location.installed_capacity_mw
                            backup_rows.append(new_row)
                if backup_rows:
                    backup_df = pd.DataFrame(backup_rows)
                    all_gsps_yields.append(backup_df)
                    logger.info(f"Created backup data for {len(backup_rows)} entries")
        except Exception as e:
            logger.error(f"Error creating backup GSP data: {e}")
```
```python
if not all_gsps_yields:
    return pd.DataFrame(
        columns=[
            "target_datetime_utc",
            "solar_generation_kw",
            "gsp_id",
            "installedcapacity_mwp",
            "capacity_mwp",
            "regime",
            "pvlive_updated_utc",
        ]
    )
```
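Returning a schema-only empty frame keeps downstream concatenation safe. A quick check of the motivating failure (the column list here is shortened for illustration):

```python
import pandas as pd

cols = ["target_datetime_utc", "solar_generation_kw", "gsp_id"]
empty = pd.DataFrame(columns=cols)

# pd.concat on an empty list raises ValueError ("No objects to concatenate"),
# whereas concatenating the schema-only frame works and preserves the columns.
try:
    pd.concat([])
except ValueError as e:
    print(f"raised: {e}")

out = pd.concat([empty])
print(list(out.columns))
# → ['target_datetime_utc', 'solar_generation_kw', 'gsp_id']
```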
This change introduces new behavior (synthesizing missing GSP series from national + DB capacities, and returning an empty DataFrame with a fixed schema when no data is available). Please add unit tests covering (1) at least one missing GSP being backfilled when DB_URL is set, and (2) the all-empty case returning the expected columns, to prevent regressions/flaky behavior.
@copilot open a new pull request to apply changes based on this feedback
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Following the comments in #105, we decided not to implement this for now. Thanks @ram-from-tvl for your work on this.