Add FileContainer search_intersection method #169
veni-vidi-vici-dormivi wants to merge 6 commits into
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Would it make sense to require two

    import filefisher

    test_paths = [
        "historical/tas",
        "historical/hfds",
        "ssp585/hfds",
    ]

    ff = filefisher.FileFinder("{scen}", "{variable}", test_paths=test_paths)

    fc_tas = ff.find_files(variable="tas")
    fc_hfds = ff.find_files(variable="hfds")

    fc_tas.intersect(fc_hfds, on="variable")

where the

The implementation could be along the lines of

    import pandas as pd

    def intersect(df_l: pd.DataFrame, df_r: pd.DataFrame, on: str):
        # both frames need the same columns and exactly one value of `on`
        assert (df_l.columns == df_r.columns).all()
        assert len(df_l[on].unique()) == 1
        assert len(df_r[on].unique()) == 1

        # compare the rows on all columns except `on`
        columns = df_l.columns.drop(on)
        mi_l = pd.MultiIndex.from_frame(df_l[columns])
        mi_r = pd.MultiIndex.from_frame(df_r[columns])

        sel = mi_l.intersection(mi_r)

        # boolean masks of the rows whose remaining keys occur in both frames
        sel_l = mi_l.isin(sel)
        sel_r = mi_r.isin(sel)

        l = df_l[sel_l]
        r = df_r[sel_r]

        return pd.concat([l, r])

    intersect(fc_tas.df, fc_hfds.df, on="variable")
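To make the intended result of the two-container variant concrete, here is a minimal pandas-free sketch of the same idea on plain dictionaries. The helper name `intersect_entries` and the list-of-dicts representation are assumptions for illustration only; they are not part of filefisher.

```python
def intersect_entries(left, right, on):
    """Keep entries of both lists whose keys other than `on` match."""

    def rest(entry):
        # hashable view of an entry without the `on` key
        return tuple(sorted((k, v) for k, v in entry.items() if k != on))

    # key combinations (ignoring `on`) that occur in both inputs
    common = {rest(e) for e in left} & {rest(e) for e in right}
    return [e for e in left + right if rest(e) in common]


# metadata of the three test paths from the comment above
tas = [{"scen": "historical", "variable": "tas"}]
hfds = [
    {"scen": "historical", "variable": "hfds"},
    {"scen": "ssp585", "variable": "hfds"},
]

result = intersect_entries(tas, hfds, on="variable")
print(result)
```

Only the two `historical` entries survive, because `ssp585` has no `tas` counterpart.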
Right. That is also nice. My application was one where I did
Yours is easier to understand. I am thinking about possible advantages of my implementation... With mine only the entries of the
Yet another idea would be to combine this as a

    filefisher.align(fc.groupby(on="variable"), except="variable")

but then we have to pass the
close/reopen to test something
Here is a method that enables searching for intersecting values of a certain key along all values of another key. The specific use case here is, for example: "Which scenarios and members are available for both variables tas and hfds?"
The usage would be to find all available scenarios and members for both variables and then search the resulting FileContainer for intersecting values along scenario_member. In this case search_key = variable and intersect_key = scenario_member. I chose this approach because I found it relatively straightforward, more so than implementing it in FileFinder.
I am not very happy with the names and my explanation in the docstring, but at least the example should make it quite clear.
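The behaviour described above can be sketched on plain dictionaries as follows. This is only an illustration of the idea, not the implementation in this PR: the function name `search_intersection_sketch`, the list-of-dicts input, and the example values such as `historical_r1` are all made up for the example.

```python
from collections import defaultdict


def search_intersection_sketch(entries, search_key, intersect_key):
    """Keep entries whose `intersect_key` value occurs for every value of `search_key`."""
    # collect the intersect_key values seen for each value of search_key
    seen = defaultdict(set)
    for entry in entries:
        seen[entry[search_key]].add(entry[intersect_key])

    if not seen:
        return []

    # values of intersect_key that are present for all values of search_key
    common = set.intersection(*seen.values())
    return [entry for entry in entries if entry[intersect_key] in common]


# hypothetical metadata: one entry per file
entries = [
    {"variable": "tas", "scenario_member": "historical_r1"},
    {"variable": "tas", "scenario_member": "ssp585_r1"},
    {"variable": "hfds", "scenario_member": "historical_r1"},
    {"variable": "hfds", "scenario_member": "ssp126_r1"},
]

both = search_intersection_sketch(entries, "variable", "scenario_member")
print(both)
```

Here only `historical_r1` is available for both `tas` and `hfds`, so the two entries with that value are kept and the others are dropped.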