Commit 02e4a61

Merge pull request #344 from PowerGridModel/feature/optional-extra
Logic to handle optional_extra in Vision Excel converter input
2 parents d3fd230 + 734345b commit 02e4a61

15 files changed (+1113, -26 lines changed)

docs/converters/vision_converter.md

Lines changed: 51 additions & 0 deletions
@@ -89,6 +89,57 @@ fields of interest.

 An examplery usage can be found in the example notebook as well as in the test cases.

+## Optional extra columns
+
+When working with Vision Excel exports, some metadata columns (like `GUID` or `StationID`) may not always be present,
+especially in partial exports.
+The `optional_extra` feature allows you to specify columns that should be included in `extra_info` if present,
+but won't cause conversion failure if missing.
+
+**Syntax:**
+
+```yaml
+grid:
+  Transformers:
+    transformer:
+      id:
+        auto_id:
+          key: Number
+      # ... other fields ...
+      extra:
+        - ID # Required - fails if missing
+        - Name # Required - fails if missing
+        - optional_extra:
+            - GUID # Optional - skipped if missing
+            - StationID # Optional - skipped if missing
+```
+
+**Behavior:**
+
+- Required columns (listed directly under `extra`) will cause a KeyError if missing
+- Optional columns (nested under `optional_extra`) are silently skipped if not found
+- If some optional columns are present and others missing, only the present ones are included in `extra_info`
+- This feature is particularly useful for handling different Vision export configurations or versions
+
+**Duplicate handling:**
+When a column appears in both the regular `extra` list and within `optional_extra`,
+the regular `extra` entry takes precedence and duplicates are automatically eliminated from `optional_extra`:
+
+```yaml
+extra:
+  - ID # Regular column - always processed
+  - Name # Regular column - always processed
+  - optional_extra:
+      - ID # Duplicate - automatically removed
+      - GUID # Unique optional - processed if present
+      - StationID # Unique optional - processed if present
+```
+
+In this example, `ID` will only be processed once (from the regular `extra` list),
+while `GUID` and `StationID` are processed as optional columns.
+This prevents duplicate data in the resulting `extra_info`
+and ensures consistent behavior regardless of column ordering.
+
 ## Common/Known issues related to Vision

 So far we have the following issue known to us related to Vision exported spread sheets.
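
As a rough illustration of the behaviour documented in this file (not the converter's actual code path), the sketch below mimics the required/optional semantics in plain Python. The helper `collect_extra` and the sample `nodes` table are hypothetical names introduced only for this example; the real converter works on pandas data through `_parse_col_def`.

```python
def collect_extra(table: dict[str, list], required: list[str], optional: list[str]) -> list[dict]:
    """Hypothetical helper mimicking the documented `extra` / `optional_extra` semantics."""
    missing = [col for col in required if col not in table]
    if missing:
        # Required columns must exist, mirroring the KeyError described in the docs
        raise KeyError(f"Could not find column(s) {missing}")
    # Optional columns are kept only if present; duplicates of required columns are dropped
    present_optional = [col for col in optional if col in table and col not in required]
    columns = required + present_optional
    n_rows = len(next(iter(table.values()))) if table else 0
    return [{col: table[col][row] for col in columns} for row in range(n_rows)]


# A partial export: GUID is present, StationID is not
nodes = {"ID": ["N1", "N2"], "Name": ["Node 1", "Node 2"], "GUID": ["g-1", "g-2"]}
records = collect_extra(nodes, required=["ID", "Name"], optional=["ID", "GUID", "StationID"])
```

Here `StationID` is skipped without an error and the duplicate `ID` entry is ignored, whereas a missing required column would raise `KeyError`.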

src/power_grid_model_io/converters/tabular_converter.py

Lines changed: 104 additions & 15 deletions
@@ -337,11 +337,14 @@ def _handle_extra_info( # pylint: disable = too-many-arguments,too-many-positio
         if extra_info is None:
             return

+        # Normalize col_def to handle deduplication when optional_extra contains columns also in regular extra
+        normalized_col_def = self._normalize_extra_col_def(col_def)
+
         extra = self._parse_col_def(
             data=data,
             table=table,
             table_mask=table_mask,
-            col_def=col_def,
+            col_def=normalized_col_def,
             extra_info=None,
         ).to_dict(orient="records")
         for i, xtr in zip(uuids, extra):
@@ -356,6 +359,57 @@ def _handle_extra_info( # pylint: disable = too-many-arguments,too-many-positio
                 else:
                     extra_info[i] = xtr

+    def _normalize_extra_col_def(self, col_def: Any) -> Any:
+        """
+        Normalize extra column definition to eliminate duplicates between regular columns and optional_extra.
+        Regular columns take precedence over optional_extra columns.
+        Additionally, ensure no duplicates within optional_extra.
+
+        Args:
+            col_def: Column definition for extra info that may contain optional_extra sections
+
+        Returns:
+            Normalized column definition with duplicates removed from optional_extra
+        """
+        if not isinstance(col_def, list):
+            return col_def
+
+        # Collect all non-optional_extra column names
+        regular_columns = set()
+
+        for item in col_def:
+            if isinstance(item, dict) and len(item) == 1 and "optional_extra" in item:
+                # This is an optional_extra section - we'll process it later
+                pass
+            else:
+                # This is a regular column
+                if isinstance(item, str):
+                    regular_columns.add(item)
+
+        # Now process optional_extra sections and remove duplicates
+        final_list = []
+        for item in col_def:
+            if isinstance(item, dict) and len(item) == 1 and "optional_extra" in item:
+                optional_cols = item["optional_extra"]
+                if isinstance(optional_cols, list):
+                    # Filter out columns that are already in regular columns
+                    filtered_optional_cols = []
+                    for col in optional_cols:
+                        if isinstance(col, str) and col in regular_columns:
+                            continue
+                        if col not in filtered_optional_cols:
+                            filtered_optional_cols.append(col)
+                    # Only include the optional_extra section if it has remaining columns
+                    if filtered_optional_cols:
+                        final_list.append({"optional_extra": filtered_optional_cols})
+                else:
+                    # Keep non-list optional_extra as-is (shouldn't happen but be safe)
+                    final_list.append(item)
+            else:
+                final_list.append(item)
+
+        return final_list
+
     @staticmethod
     def _merge_pgm_data(data: Dict[ComponentType, List[np.ndarray]]) -> Dict[ComponentType, np.ndarray]:
         """During the conversion, multiple numpy arrays can be produced for the same type of component. These arrays
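
To make the effect of the normalization step concrete, here is a simplified standalone variant applied to an `extra` list that places `optional_extra` first and repeats `GUID` as a regular column. The function name `normalize_extra` is hypothetical; it is a condensed sketch of the logic in `_normalize_extra_col_def`, not a drop-in replacement.

```python
from typing import Any


def normalize_extra(col_def: list[Any]) -> list[Any]:
    """Condensed, hypothetical sketch of the deduplication performed by the converter."""
    # First pass: every plain string entry is a regular (required) column
    regular = {item for item in col_def if isinstance(item, str)}
    result: list[Any] = []
    # Second pass: filter optional_extra sections against the regular columns
    for item in col_def:
        is_optional = isinstance(item, dict) and len(item) == 1 and "optional_extra" in item
        if is_optional and isinstance(item["optional_extra"], list):
            kept: list[Any] = []
            for col in item["optional_extra"]:
                if isinstance(col, str) and col in regular:
                    continue  # the regular entry takes precedence
                if col not in kept:
                    kept.append(col)  # also removes repeats within optional_extra
            if kept:
                result.append({"optional_extra": kept})
        else:
            result.append(item)
    return result


col_def = [{"optional_extra": ["GUID", "StationID"]}, "ID", "Name", "GUID"]
normalized = normalize_extra(col_def)
```

Because the regular columns are collected in a separate first pass, the result does not depend on whether `optional_extra` appears before or after the regular entries.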
@@ -394,6 +448,8 @@ def _parse_col_def( # pylint: disable = too-many-arguments,too-many-positional-
         col_def: Any,
         table_mask: Optional[np.ndarray],
         extra_info: Optional[ExtraInfo],
+        *,
+        allow_missing: bool = False,
     ) -> pd.DataFrame:
         """Interpret the column definition and extract/convert/create the data as a pandas DataFrame.

@@ -402,15 +458,27 @@ def _parse_col_def( # pylint: disable = too-many-arguments,too-many-positional-
             table: str:
             col_def: Any:
             extra_info: Optional[ExtraInfo]:
+            allow_missing: bool: If True, missing columns will return empty DataFrame instead of raising KeyError

         Returns:

         """
         if isinstance(col_def, (int, float)):
             return self._parse_col_def_const(data=data, table=table, col_def=col_def, table_mask=table_mask)
         if isinstance(col_def, str):
-            return self._parse_col_def_column_name(data=data, table=table, col_def=col_def, table_mask=table_mask)
+            return self._parse_col_def_column_name(
+                data=data, table=table, col_def=col_def, table_mask=table_mask, allow_missing=allow_missing
+            )
         if isinstance(col_def, dict):
+            # Check if this is an optional_extra wrapper
+            if len(col_def) == 1 and "optional_extra" in col_def:
+                # Extract the list of optional columns and parse as composite with allow_missing=True
+                optional_cols = col_def["optional_extra"]
+                if not isinstance(optional_cols, list):
+                    raise TypeError(f"optional_extra value must be a list, got {type(optional_cols).__name__}")
+                return self._parse_col_def_composite(
+                    data=data, table=table, col_def=optional_cols, table_mask=table_mask, allow_missing=True
+                )
             return self._parse_col_def_filter(
                 data=data,
                 table=table,
@@ -419,7 +487,9 @@ def _parse_col_def( # pylint: disable = too-many-arguments,too-many-positional-
                 extra_info=extra_info,
             )
         if isinstance(col_def, list):
-            return self._parse_col_def_composite(data=data, table=table, col_def=col_def, table_mask=table_mask)
+            return self._parse_col_def_composite(
+                data=data, table=table, col_def=col_def, table_mask=table_mask, allow_missing=allow_missing
+            )
         raise TypeError(f"Invalid column definition: {col_def}")

     @staticmethod
@@ -452,6 +522,7 @@ def _parse_col_def_column_name(
         table: str,
         col_def: str,
         table_mask: Optional[np.ndarray] = None,
+        allow_missing: bool = False,
     ) -> pd.DataFrame:
         """Extract a column from the data. If the column doesn't exist, check if the col_def is a special float value,
         like 'inf'. If that's the case, create a single column pandas DataFrame containing the const value.
@@ -460,6 +531,7 @@ def _parse_col_def_column_name(
             data: TabularData:
             table: str:
             col_def: str:
+            allow_missing: bool: If True, return empty DataFrame when column is missing instead of raising KeyError

         Returns:

@@ -480,18 +552,23 @@ def _parse_col_def_column_name(
                 col_data = self._apply_multiplier(table=table, column=col_name, data=col_data)
                 return pd.DataFrame(col_data)

-        def _get_float(value: str) -> Optional[float]:
-            try:
-                return float(value)
-            except ValueError:
-                return None
-
-        # Maybe it is not a column name, but a float value like 'inf', let's try to convert the string to a float
-        if (const_value := _get_float(col_def)) is not None:
-            return self._parse_col_def_const(data=data, table=table, col_def=const_value, table_mask=table_mask)
+        try: # Maybe it is not a column name, but a float value like 'inf', let's try to convert the string to a float
+            const_value = float(col_def)
+        except ValueError as e:
+            if allow_missing:
+                # Return empty DataFrame with correct number of rows when column is optional and missing
+                self._log.debug(
+                    "Optional column not found",
+                    table=table,
+                    columns=" or ".join(f"'{col_name}'" for col_name in columns),
+                )
+                index = table_data.index if isinstance(table_data, pd.DataFrame) else pd.RangeIndex(len(table_data))
+                return pd.DataFrame(index=index)
+            # pylint: disable=raise-missing-from
+            columns_str = " and ".join(f"'{col_name}'" for col_name in columns)
+            raise KeyError(f"Could not find column {columns_str} on table '{table}'") from e

-        columns_str = " and ".join(f"'{col_name}'" for col_name in columns)
-        raise KeyError(f"Could not find column {columns_str} on table '{table}'")
+        return self._parse_col_def_const(data=data, table=table, col_def=const_value, table_mask=table_mask)

     def _apply_multiplier(self, table: str, column: str, data: pd.Series) -> pd.Series:
         if self._multipliers is None:
@@ -780,13 +857,15 @@ def _parse_col_def_composite(
         table: str,
         col_def: list,
         table_mask: Optional[np.ndarray],
+        allow_missing: bool = False,
     ) -> pd.DataFrame:
         """Select multiple columns (each is created from a column definition) and return them as a new DataFrame.

         Args:
             data: TabularData:
             table: str:
             col_def: list:
+            allow_missing: bool: If True, skip missing columns instead of raising errors

         Returns:

@@ -799,10 +878,20 @@ def _parse_col_def_composite(
                 col_def=sub_def,
                 table_mask=table_mask,
                 extra_info=None,
+                allow_missing=allow_missing,
             )
             for sub_def in col_def
         ]
-        return pd.concat(columns, axis=1)
+        # Filter out DataFrames with no columns (from missing optional columns)
+        non_empty_columns = [col for col in columns if len(col.columns) > 0]
+        if not non_empty_columns:
+            # If all columns are missing, return an empty DataFrame with the correct number of rows
+            table_data = data[table]
+            if table_mask is not None:
+                table_data = table_data[table_mask]
+            index = table_data.index if isinstance(table_data, pd.DataFrame) else pd.RangeIndex(len(table_data))
+            return pd.DataFrame(index=index)
+        return pd.concat(non_empty_columns, axis=1)

     def _get_id(self, table: str, key: Mapping[str, int], name: Optional[str]) -> int:
         """
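
The composite handling in this file represents a missing optional column as a DataFrame with rows but no columns, and filters such frames out before concatenation. A minimal pandas sketch of that idea follows; the helper `concat_present` and the sample frames are hypothetical and only approximate what `_parse_col_def_composite` does.

```python
import pandas as pd


def concat_present(columns: list[pd.DataFrame], index: pd.Index) -> pd.DataFrame:
    """Hypothetical helper: concatenate only the frames that have at least one column."""
    non_empty = [col for col in columns if len(col.columns) > 0]
    if not non_empty:
        # Nothing was found: keep the row index so the result still has one row per record
        return pd.DataFrame(index=index)
    return pd.concat(non_empty, axis=1)


index = pd.RangeIndex(2)
id_col = pd.DataFrame({"ID": ["N1", "N2"]}, index=index)
missing_col = pd.DataFrame(index=index)  # placeholder for a missing optional column

combined = concat_present([id_col, missing_col], index)
all_missing = concat_present([missing_col, missing_col], index)
```

Returning an indexed but column-less frame in the all-missing case keeps the row count aligned with the table, which appears to be the intent of the `pd.DataFrame(index=index)` fallback in the diff.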
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# SPDX-FileCopyrightText: Contributors to the Power Grid Model project <powergridmodel@lfenergy.org>
+#
+# SPDX-License-Identifier: MPL-2.0
+---
+# Test mapping file for optional_extra feature
+grid:
+  nodes:
+    node:
+      id:
+        auto_id:
+          key: node_id
+      u_rated: voltage
+      extra:
+        - ID
+        - Name
+        - optional_extra:
+            - GUID
+            - StationID
+
+units:
+  V:
+    kV: 1000.0
+
+substitutions: {}
4.9 KB
Binary file not shown.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: Contributors to the Power Grid Model project <powergridmodel@lfenergy.org>
+
+SPDX-License-Identifier: MPL-2.0
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# SPDX-FileCopyrightText: Contributors to the Power Grid Model project <powergridmodel@lfenergy.org>
+#
+# SPDX-License-Identifier: MPL-2.0
+---
+# Test mapping file for optional_extra feature with Vision Excel format
+id_reference:
+  nodes_table: Nodes
+  number: Number
+  node_number: Node.Number
+  sub_number: Subnumber
+
+grid:
+  Nodes:
+    node:
+      id:
+        auto_id:
+          key: Number
+      u_rated: Unom
+      extra:
+        - ID
+        - Name
+        - optional_extra:
+            - GUID
+            - StationID
+
+units:
+  V:
+    kV: 1000.0
+
+substitutions: {}
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: Contributors to the Power Grid Model project <powergridmodel@lfenergy.org>
+
+SPDX-License-Identifier: MPL-2.0
4.84 KB
Binary file not shown.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: Contributors to the Power Grid Model project <powergridmodel@lfenergy.org>
+
+SPDX-License-Identifier: MPL-2.0
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+id_reference:
+  nodes_table: Nodes
+  number: Number
+  node_number: Node.Number
+  sub_number: Subnumber
+
+grid:
+  Nodes:
+    node:
+      id:
+        auto_id:
+          key: Number
+      u_rated: Unom
+      extra:
+        - optional_extra:
+            - GUID
+            - StationID
+        - ID
+        - Name
+        - GUID
+
+units:
+  V:
+    kV: 1000.0
+
+substitutions: {}
