3 changes: 3 additions & 0 deletions .jules/bolt.md
@@ -5,3 +5,6 @@
## 2024-05-19 - Caching YAML Load for Framework Registry
**Learning:** `yaml.safe_load` on `frameworks.yml` within `load_framework_registry()` was taking ~2-3 ms per call and it was repeatedly called for every framework entry via `get_framework_config()`. This was a micro-bottleneck, especially when dealing with lists or multiple frameworks.
**Action:** Applied the `@lru_cache` and `deepcopy` pattern successfully again to `load_framework_registry()` and `get_framework_config()` to avoid caching a mutable dictionary directly and avoid repeated YAML I/O parsing.
## 2024-05-19 - Replace iterrows with faster iteration
**Learning:** Pandas `iterrows()` is slow for looping over rows because it builds a `Series` object for every row. Using `itertuples()` for attribute or positional access, or `to_dict('records')` when column names are dynamic or not valid Python identifiers, yields a roughly 10-50x speedup with minimal effort.
**Action:** Prefer `itertuples()` or `to_dict('records')` over `iterrows()`. If positional access is needed (e.g. `row[0]`), use `itertuples(index=False, name=None)`. If column names are guaranteed to be valid Python identifiers, use `itertuples(index=False)` and access via `row.ColName`. If column names are dynamic or not valid identifiers, use `to_dict('records')`.
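The three access patterns described in the note can be sketched side by side. This is a minimal illustration rather than code from the PR: the DataFrame and its column names (`label`, `energy`, `ref value`) are hypothetical.

```python
import pandas as pd

# Hypothetical data; "ref value" is deliberately not a valid Python identifier.
df = pd.DataFrame(
    {
        "label": ["a", "b"],
        "energy": [1.0, 2.5],
        "ref value": [10, 20],
    }
)

# Positional access: plain tuples with no index column.
by_position = [(row[0], row[1]) for row in df.itertuples(index=False, name=None)]

# Attribute access: namedtuples, suitable when column names are valid identifiers.
by_attribute = [row.energy for row in df[["label", "energy"]].itertuples(index=False)]

# Key access: one dict per row, works for any column name.
by_key = [row["ref value"] for row in df.to_dict("records")]
```

Note that `to_dict("records")` materialises a dict for every row up front, so for very large frames `itertuples()` is likely the lighter choice when the column names allow it.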
2 changes: 1 addition & 1 deletion ml_peg/calcs/bulk_crystal/elasticity/calc_elasticity.py
@@ -289,7 +289,7 @@ def run_elasticity_benchmark(

     # Save relaxed structures to extxyz for visualisation
     atoms_list = []
-    for _, row in results.iterrows():
+    for row in results.to_dict("records"):
         struct = row.get("final_structure")
         if not isinstance(struct, Structure):
             continue
6 changes: 3 additions & 3 deletions ml_peg/calcs/conformers/MPCONF196/calc_MPCONF196.py
@@ -86,9 +86,9 @@ def get_ref_energies(data_path: Path) -> dict[str, float]:
     )
     ref_energies = {}

-    for row in df.iterrows():
-        label = row[1][0]
-        ref_energies[label] = float(row[1][2]) * KCAL_TO_EV
+    for row in df.itertuples(index=False, name=None):
+        label = row[0]
+        ref_energies[label] = float(row[2]) * KCAL_TO_EV

     return ref_energies

6 changes: 3 additions & 3 deletions ml_peg/calcs/conformers/solvMPCONF196/calc_solvMPCONF196.py
@@ -84,9 +84,9 @@ def get_ref_energies(data_path: Path) -> dict[str, float]:
     )
     ref_energies = {}

-    for row in df.iterrows():
-        label = row[1][0]
-        e_ref = float(row[1][1]) * units.Hartree
+    for row in df.itertuples(index=False, name=None):
+        label = row[0]
+        e_ref = float(row[1]) * units.Hartree
         ref_energies[label] = e_ref

     return ref_energies
10 changes: 6 additions & 4 deletions ml_peg/calcs/utils/gscdb138.py
@@ -106,11 +106,13 @@ def run_gscdb138(
     df_refs["Reference"] *= units.Hartree

     # Calculate relative energy for each entry.
-    for _, row in tqdm(df_refs.iterrows(), dataset, total=df_refs.shape[0]):
+    for row in tqdm(
+        df_refs.itertuples(index=False), dataset, total=df_refs.shape[0]
+    ):
         atoms_list = []
-        identifier = row["Reaction"]
-        reactions = row["Stoichiometry"].split(",")  # Parse stoichiometry string.
-        e_rel_ref = row["Reference"]
+        identifier = row.Reaction
+        reactions = row.Stoichiometry.split(",")  # Parse stoichiometry string.
+        e_rel_ref = row.Reference
         num_species = len(reactions) // 2  # Each species has coefficient and name.

         e_rel_model = 0
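Attribute access in the hunk above works because `Reaction`, `Stoichiometry`, and `Reference` are valid Python identifiers. As a hedged illustration of what would happen otherwise, the sketch below uses a hypothetical column name (`Ref (eV)`, not present in the PR) to show that `itertuples()` silently renames such fields to positional names, so `row.ColName` access would no longer apply.

```python
import pandas as pd

# Hypothetical frame: "Ref (eV)" cannot be used as an attribute name.
df = pd.DataFrame({"Reaction": ["r1"], "Ref (eV)": [0.5]})

row = next(df.itertuples(index=False))

# The invalid name is replaced by an underscore plus its position in the tuple.
fields = row._fields

# Positional access still works regardless of the renaming.
value = row[1]
```

If a column could ever carry such a name, `to_dict("records")` keeps the original keys and avoids the renaming altogether.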