
Conversation

@Mikolaj-A-Kowalski Mikolaj-A-Kowalski commented Jun 24, 2025

Builds on #34, related to #22

There was an implicit conversion to float64 taking place due to a cast to the Python's list (and hence Python's float). This resulted in a serious (factor ~10) degradation in performance, which should now be fixed.

The problem was ultimately related to how pandas handles partial assignments to DataFrames and Series. If data is assigned from a higher-precision expression, the dtype of the Series may be implicitly changed, which triggers a temporary memory allocation. For example:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"f32": np.ones(2, dtype="float32")})

# Here the dtype of df will change to float64;
# a temporary memory allocation takes place.
df.iloc[1:2] = np.float64(1/3)
```

Please note that the behaviour is a bit peculiar and perhaps unintuitive. Pandas will not change the dtype if it is "not necessary", that is, if the new value can be represented exactly in the prior dtype. For example:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"f32": np.ones(2, dtype="float32")})

# Here the dtype of df will remain float32,
# since 2.0 can be represented exactly.
df.iloc[1:2] = np.float64(2.0)
```
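Putting both cases together, a minimal sanity check (a sketch; only the dtypes from the snippets above are assumed):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"f32": np.ones(2, dtype="float32")})

# Exactly representable value: the dtype stays float32.
frame.iloc[1:2] = np.float64(2.0)
assert frame["f32"].dtype == np.float32

# Casting the scalar to float32 up front keeps the dtype stable
# even when the value (1/3) is not exactly representable.
frame.iloc[1:2] = np.float32(1 / 3)
assert frame["f32"].dtype == np.float32
```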

The implicit conversion to float64 was causing some DataFrames to change their dtype to "float64", which in turn caused Max_Uptake (in the grid.uptake method) to be repeatedly changed from "float32" to "float64"; this appears to have been the main performance bottleneck.

Tagging @bioatmosphere explicitly since you were interested during the meeting yesterday ;-)

sjavis and others added 4 commits June 6, 2025 18:07
There was an implicit conversion to float64 taking place due to a cast
to a Python list (and hence Python floats). This resulted in a
serious (factor ~10) performance degradation, which should now be
fixed.

The performance degradation was the result of a temporary allocation
performed by pandas when the dtype of a frame was implicitly changed
by updates of the form:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"f32": np.ones(2, dtype="float32")})
df.iloc[1:2] = np.float64(1/3)
```
Since pandas 2.1, such operations raise a FutureWarning. All occurrences
of that warning in DEMENTpy are resolved in this commit.
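One way to locate such assignments (a sketch, assuming pandas 2.1+ where the incompatible-dtype assignment emits a FutureWarning) is to escalate the warning to an error:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"f32": np.ones(2, dtype="float32")})

caught = False
with warnings.catch_warnings():
    # Turn the upcasting FutureWarning into an error so the offending
    # line fails loudly instead of silently allocating a float64 copy.
    warnings.simplefilter("error", FutureWarning)
    try:
        df.iloc[1:2] = np.float64(1 / 3)
    except FutureWarning:
        caught = True
```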
```diff
 choose_taxa = np.zeros((self.n_taxa,self.gridsize), dtype='int8')
 for i in range(self.n_taxa):
-    choose_taxa[i,:] = np.random.choice([1,0], self.gridsize, replace=True, p=[frequencies[i], 1-frequencies[i]])
+    choose_taxa[i,:] = np.random.binomial(1, frequencies[i], self.gridsize)
```
When I changed the numbers back to "float32", this sampling started failing, saying that the probabilities do not sum to 1.0. I presume they must be converted to "float64" before being summed inside np.random.choice.

I have changed the sampling to binomial, which I believe is equivalent and avoids the summation errors.
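For what it's worth, the replacement is easy to spot-check (a sketch with a hypothetical float32 probability; the Generator API is used here only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.float32(0.25)  # hypothetical float32 frequency
n = 100_000

# With float32 inputs, [p, 1 - p] may not sum to exactly 1.0 after
# rounding, which can trip np.random.choice's probability validation.
# Drawing Bernoulli samples via the binomial distribution avoids that
# check while producing the same 0/1 distribution.
samples = rng.binomial(1, p, n)
assert set(np.unique(samples)) <= {0, 1}
assert abs(samples.mean() - 0.25) < 0.01
```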

```diff
 [Enzyme_Loss,
-Enzyme_Loss.mul(self.Enz_Attrib['N_cost'].tolist()*self.gridsize,axis=0),
-Enzyme_Loss.mul(self.Enz_Attrib['P_cost'].tolist()*self.gridsize,axis=0)],
+Enzyme_Loss.mul(np.repeat(self.Enz_Attrib['N_cost'].values, self.gridsize), axis=0),
```
@Mikolaj-A-Kowalski Mikolaj-A-Kowalski Jun 24, 2025

The 'float64' was appearing here.

The tolist method returns a list of Python floats, which are double precision.
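A minimal illustration (the Series contents here are made up; only the dtype behaviour is the point):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.full(3, 0.5, dtype="float32"))

# tolist() yields built-in Python floats, i.e. 64-bit doubles, so
# arithmetic on the result silently promotes to float64.
as_list = s.tolist()
assert all(isinstance(x, float) for x in as_list)

# .values exposes the underlying float32 buffer unchanged, and
# np.repeat preserves that dtype.
assert s.values.dtype == np.float32
assert np.repeat(s.values, 4).dtype == np.float32
```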

@jgwalkup jgwalkup marked this pull request as ready for review September 2, 2025 14:49
@jgwalkup jgwalkup self-requested a review September 30, 2025 18:30
@jgwalkup jgwalkup merged commit a5f8711 into main Nov 12, 2025
3 checks passed
@jgwalkup jgwalkup deleted the iss22-remove-f64 branch November 12, 2025 14:08