Add backup GSP data synthesis from national data when GSPs are missing #163

Closed
ram-from-tvl wants to merge 2 commits into openclimatefix:main from ram-from-tvl:feature/backup-gsp-data-from-national

Conversation

@ram-from-tvl
Contributor

Pull Request

Description

When PVLive returns empty data for individual GSPs, this feature:

  • Tracks missing GSPs and national-level data
  • Uses DB_URL to query LocationSQL for GSP capacities
  • Proportionally distributes national generation to missing GSPs
  • Returns empty DataFrame with expected columns if no data at all
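
The proportional step in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `synthesize_missing_gsps` and the `capacities_mw` mapping are hypothetical names, and the capacities are passed in directly instead of being queried from the database. The column names are taken from the diff in this PR.

```python
import pandas as pd


def synthesize_missing_gsps(national_df, capacities_mw):
    """Scale national rows to each missing GSP by its share of national capacity.

    capacities_mw is a hypothetical {gsp_id: installed capacity in MW} mapping.
    """
    # Keep only national rows with a positive capacity (this also drops NaN).
    valid = national_df[national_df["installedcapacity_mwp"] > 0]
    frames = []
    for gsp_id, capacity in capacities_mw.items():
        if capacity is None or capacity <= 0:
            continue  # a GSP without a usable capacity cannot be scaled
        gsp = valid.copy()
        factor = capacity / valid["installedcapacity_mwp"]
        gsp["solar_generation_kw"] = valid["solar_generation_kw"] * factor
        gsp["gsp_id"] = gsp_id
        gsp["installedcapacity_mwp"] = capacity
        gsp["capacity_mwp"] = capacity
        frames.append(gsp)
    if not frames:
        # Nothing to synthesize: return an empty frame with the same columns.
        return national_df.iloc[0:0].copy()
    return pd.concat(frames, ignore_index=True)


# Illustrative values only, not real PVLive data.
national = pd.DataFrame(
    {
        "target_datetime_utc": pd.to_datetime(
            ["2026-02-17 12:00", "2026-02-17 12:30"], utc=True
        ),
        "solar_generation_kw": [1_000_000.0, 2_000_000.0],
        "gsp_id": [0, 0],
        "installedcapacity_mwp": [20_000.0, 20_000.0],
        "capacity_mwp": [20_000.0, 20_000.0],
    }
)
backup = synthesize_missing_gsps(national, {7: 200.0, 9: 0.0})
print(backup[["gsp_id", "solar_generation_kw"]])
```

In this example GSP 7 holds 1% of national capacity, so it receives 1% of national generation at each timestamp, while GSP 9 is skipped because its capacity is zero.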

Fixes #

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

Addresses: openclimatefix#105
Copilot AI review requested due to automatic review settings February 17, 2026 06:20

Copilot AI left a comment

Pull request overview

Adds a fallback mechanism to synthesize per-GSP PV generation when PVLive returns empty data for some GSPs, using national PVLive data scaled by GSP installed capacities from the database.

Changes:

  • Track missing GSPs while fetching PVLive historic data and capture national (GSP 0) data when available.
  • When some GSPs are missing and national data exists, query LocationSQL capacities via DB_URL and generate proportional per-GSP backup rows.
  • Return an empty DataFrame with the expected schema when no PVLive data is available at all (avoids pd.concat([]) failure).
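
The empty-result guard in the last bullet can be sketched as below. `combine_gsp_frames` and `EXPECTED_COLUMNS` are hypothetical names introduced for illustration; the column list is copied from the diff in this PR.

```python
import pandas as pd

# Column names as they appear in the PR diff.
EXPECTED_COLUMNS = [
    "target_datetime_utc",
    "solar_generation_kw",
    "gsp_id",
    "installedcapacity_mwp",
    "capacity_mwp",
    "regime",
    "pvlive_updated_utc",
]


def combine_gsp_frames(frames):
    """Concatenate per-GSP frames, or return an empty frame with the expected columns."""
    if not frames:
        return pd.DataFrame(columns=EXPECTED_COLUMNS)
    return pd.concat(frames, ignore_index=True)


# Without the guard, pd.concat raises ValueError on an empty list.
try:
    pd.concat([])
except ValueError as err:
    print(f"unguarded concat failed: {err}")

empty = combine_gsp_frames([])
print(list(empty.columns), empty.empty)
```

Returning a frame with a fixed schema lets downstream code select columns without first checking whether any data arrived.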

Comment thread solar_consumer/data/fetch_gb_data.py Outdated
Comment on lines +179 to +221
if missing_gsps and national_df is not None:
    logger.info(f"Creating backup data for {len(missing_gsps)} missing GSPs using national data")
    db_url = os.getenv("DB_URL")
    if not db_url:
        logger.warning("DB_URL not set, cannot create backup GSP data")
    else:
        try:
            connection = DatabaseConnection(url=db_url)
            with connection.get_session() as session:
                locations = session.query(LocationSQL).filter(LocationSQL.gsp_id.in_(missing_gsps)).all()
                backup_rows = []
                for _, national_row in national_df.iterrows():
                    national_capacity = national_row['installedcapacity_mwp']
                    if national_capacity == 0 or pd.isna(national_capacity):
                        continue
                    for location in locations:
                        if location.installed_capacity_mw is not None and location.installed_capacity_mw > 0:
                            factor = location.installed_capacity_mw / national_capacity
                            new_row = national_row.copy()
                            new_row['solar_generation_kw'] *= factor
                            new_row['gsp_id'] = location.gsp_id
                            new_row['installedcapacity_mwp'] = location.installed_capacity_mw
                            new_row['capacity_mwp'] = location.installed_capacity_mw
                            backup_rows.append(new_row)
                if backup_rows:
                    backup_df = pd.DataFrame(backup_rows)
                    all_gsps_yields.append(backup_df)
                    logger.info(f"Created backup data for {len(backup_rows)} entries")
        except Exception as e:
            logger.error(f"Error creating backup GSP data: {e}")

if not all_gsps_yields:
    return pd.DataFrame(
        columns=[
            "target_datetime_utc",
            "solar_generation_kw",
            "gsp_id",
            "installedcapacity_mwp",
            "capacity_mwp",
            "regime",
            "pvlive_updated_utc",
        ]
    )

Copilot AI Feb 17, 2026

This change introduces new behavior (synthesizing missing GSP series from national + DB capacities, and returning an empty DataFrame with a fixed schema when no data is available). Please add unit tests covering (1) at least one missing GSP being backfilled when DB_URL is set, and (2) the all-empty case returning the expected columns, to prevent regressions/flaky behavior.
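
One way to write such tests without a live database is to stub the session with `unittest.mock`. The sketch below assumes the capacity lookup is factored into a small helper; `capacities_for` is a hypothetical name and does not exist in the PR, and the `model` argument stands in for `LocationSQL`.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def capacities_for(session, model, gsp_ids):
    """Hypothetical helper: map gsp_id to installed capacity via a SQLAlchemy-style session."""
    rows = session.query(model).filter(model.gsp_id.in_(gsp_ids)).all()
    # Skip locations whose capacity is missing or not positive, as the diff does.
    return {
        row.gsp_id: row.installed_capacity_mw
        for row in rows
        if row.installed_capacity_mw is not None and row.installed_capacity_mw > 0
    }


def test_capacities_skip_missing_or_zero():
    # The chained return values mimic session.query(...).filter(...).all().
    session = MagicMock()
    session.query.return_value.filter.return_value.all.return_value = [
        SimpleNamespace(gsp_id=1, installed_capacity_mw=100.0),
        SimpleNamespace(gsp_id=2, installed_capacity_mw=None),
        SimpleNamespace(gsp_id=3, installed_capacity_mw=0.0),
    ]
    model = MagicMock()  # stand-in for the ORM model class
    assert capacities_for(session, model, [1, 2, 3]) == {1: 100.0}
    session.query.assert_called_once_with(model)
```

A test of the full backfill path would additionally need `DB_URL` set in the environment, for example through pytest's `monkeypatch.setenv`, and the database connection replaced with a stub along the same lines.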

Contributor Author

@copilot open a new pull request to apply changes based on this feedback

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@peterdudfield
Contributor

Following the comments in #105, we decided not to implement this now. Thanks @ram-from-tvl for your work on this.

Development

Successfully merging this pull request may close these issues.

PVLive: if not national, sum up gsp

3 participants