Parallel I/O output module enhancements #655
base: parallelio-playground
4b2f171 to
8abda2c
Compare
|
Hi @Whyborn, I've opened this PR to highlight the output module specific diffs (includes the aggregators and the list data structure for output variables). The implementation still contains a bunch of TODOs, but it would be awesome if I could get your feedback at this stage. I've been testing it only in serial (testing in parallel will only be possible once the met data can be read in via parallel I/O) and it currently gives the same answers as before for Qh. However, I did have to change the NetCDF format in the old output module from classic to NetCDF4/HDF5 to easily compare outputs between new and old. This was a hack, as the new NetCDF API has the NetCDF4 format hardcoded whenever creating or opening new files. I'll make that configurable so that both classic and NetCDF4 are supported.
|
First pass comments:
class(aggregator_t), dimension(:), allocatable :: Qh_aggregators
...
Qh_aggregators = create_aggregators(canopy%fh, ["mean", "max"])
call cable_output_add_variable(
...
aggregator=Qh_aggregators
...
)
This way it makes the aggregators available for use where necessary (e.g. the
cable_output_shape_land_patch = create_decomposition(["land", "patch"])
cable_output_shape_land_patch_soil = create_decomposition(["land", "patch", "soil"])
where the dimensions are defined earlier in initialisation with something like:
call define_cable_dimension("soil", 6)
call define_cable_dimension("rad", 2)
I think this makes the design more easily extensible? This makes it trivial for someone developing new code which may have new dimensions to output their own stuff.
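A rough sketch of the dimension-registry idea described above, under stated assumptions: only `define_cable_dimension` is named in this thread, while the module name, `dimension_size` and `shape_from_names` are hypothetical helpers made up for illustration. A real `create_decomposition` would also need the PIO setup, which is not shown here.

```fortran
module cable_dimension_registry_sketch
  ! Hypothetical sketch only: this is not the actual CABLE API.
  implicit none
  private
  public :: define_cable_dimension, dimension_size, shape_from_names

  integer, parameter :: max_dims = 16, name_len = 16

  character(len=name_len) :: dim_names(max_dims)
  integer :: dim_sizes(max_dims)
  integer :: num_dims = 0

contains

  ! Register a named dimension once, during initialisation.
  subroutine define_cable_dimension(name, length)
    character(len=*), intent(in) :: name
    integer, intent(in) :: length
    if (num_dims >= max_dims) error stop "too many dimensions"
    num_dims = num_dims + 1
    dim_names(num_dims) = name
    dim_sizes(num_dims) = length
  end subroutine

  ! Look up a dimension length by name; returns -1 if it was never defined.
  function dimension_size(name) result(n)
    character(len=*), intent(in) :: name
    integer :: n, i
    n = -1
    do i = 1, num_dims
      if (trim(dim_names(i)) == trim(name)) then
        n = dim_sizes(i)
        return
      end if
    end do
  end function

  ! Turn a list of dimension names into an array shape.
  function shape_from_names(names) result(shp)
    character(len=*), intent(in) :: names(:)
    integer :: shp(size(names))
    integer :: i
    do i = 1, size(names)
      shp(i) = dimension_size(names(i))
      if (shp(i) < 0) error stop "unknown dimension"
    end do
  end function

end module
```

Note that standard Fortran requires all strings in an array constructor to have the same length, so a call would look like `shape_from_names([character(len=5) :: "land", "patch", "soil"])` rather than a bare `["land", "patch", "soil"]`.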
type aggregator_store_t
class(aggregator_t), allocatable :: aggregators(:)
integer :: num_aggregators
end type aggregator_store_t
where
|
|
Thanks @Whyborn for your thoughts!
I agree that it would be nice if time related control were handled at a higher level (for example with the relevant science output module as you say). However, it wasn't obvious to me how this approach could allow for driving the intermediate aggregators required for
For two reasons:
Oh yep I remember you mentioning this in our discussions. I think this is something we could definitely add in the future - I was hesitant to introduce this now as I want to avoid diverging in functionality from the current output module which assumes a single aggregation method per variable. As for a list of objects being targets, I got around that problem by working with arrays of type aggregator_handle_t which contain a reference to an aggregator rather than the actual aggregator instance.
I like this suggestion, I agree it's definitely not that obvious what one needs to do to create a new
I agree, a timing module for wider use in the code would be great. The |
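The handle approach mentioned above (arrays of `aggregator_handle_t` holding a reference to an aggregator) could look roughly like the following. This is only a sketch: `aggregator_t` and `aggregator_handle_t` are names from this thread, but the components, the `accumulate` binding and the two concrete types are assumptions made for illustration and are not taken from the PR.

```fortran
module aggregator_handle_sketch
  ! Hypothetical sketch: components and bindings are illustrative assumptions.
  implicit none
  private
  public :: aggregator_t, mean_aggregator_t, max_aggregator_t, aggregator_handle_t

  type, abstract :: aggregator_t
    real :: value = 0.0
  contains
    procedure(accumulate_if), deferred :: accumulate
  end type

  abstract interface
    subroutine accumulate_if(this, sample)
      import :: aggregator_t
      class(aggregator_t), intent(inout) :: this
      real, intent(in) :: sample
    end subroutine
  end interface

  type, extends(aggregator_t) :: mean_aggregator_t
    integer :: count = 0
  contains
    procedure :: accumulate => mean_accumulate
  end type

  type, extends(aggregator_t) :: max_aggregator_t
    logical :: empty = .true.
  contains
    procedure :: accumulate => max_accumulate
  end type

  ! An array of handles can mix dynamic types, because each element
  ! carries its own polymorphic pointer.
  type :: aggregator_handle_t
    class(aggregator_t), pointer :: aggregator => null()
  end type

contains

  subroutine mean_accumulate(this, sample)
    class(mean_aggregator_t), intent(inout) :: this
    real, intent(in) :: sample
    this%count = this%count + 1
    ! Incremental update of the running mean.
    this%value = this%value + (sample - this%value) / real(this%count)
  end subroutine

  subroutine max_accumulate(this, sample)
    class(max_aggregator_t), intent(inout) :: this
    real, intent(in) :: sample
    if (this%empty) then
      this%value = sample
      this%empty = .false.
    else
      this%value = max(this%value, sample)
    end if
  end subroutine

end module
```

The aggregators themselves would be declared with the `target` attribute and attached with pointer assignment, e.g. `handles(1)%aggregator => qh_mean`, after which `call handles(i)%aggregator%accumulate(x)` dispatches on the dynamic type of each element.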
Does this allow aggregators to be driven independently? It seems to me like specifically doesn't allow this, because every aggregator which accumulates e.g. There has to be a way to trigger accumulation at specific times, for instances like this. This is why I think the aggregators have to be available to work with as standalone objects.
Yea I can see that, it might be more easily readable to have every call contain a instead of a construction like Although I don't see why the latter couldn't also be used to create the same documentation in the same way.
Yea that's what I meant, whether the
Are you sure? I thought arrays of polymorphic classes were part of the standard, I used them in some of my aggregator testing. You just need
There's the datetime-fortran library, which we already have a spack package for? I haven't actually looked at its features yet though.
This change allows the `cable_abort_module` to be used in modules where `cable_def_types_mod` or `cable_IO_vars_module` are a dependee of that module as the removal of `range_abort` avoids introducing cyclic module dependencies. The impact of this change is minimal as `range_abort` is only called from the `cable_abort_module` in the code base.
…regator rather than abstract aggregator type This is done so that new_aggregator can instantiate non-polymorphic aggregator instances as well as polymorphic aggregator instances.
|
Hi @Whyborn, I've made the updates we discussed last week. I think it's looking much better now with your suggestions on how aggregators are organised in the output module, thanks as always for the comments! Please let me know if there is anything else that catches your eye on the design. I'm going to try to add support for specifying non-time varying data in the output and restart files. For this I'm thinking we could introduce a
|
Just a couple of comments:
if (output%patch) then
decomp = decomp_land_real32
reducer = "patch"
else
decomp = decomp_land_patch_real32
reducer = "none"
end if
call cable_output_add_variable(
name="Qh",
...
decomp=decomp,
reducer=reducer
)
And then in the Then you wouldn't have to have all the buffers. The
|
I'm definitely happy to add the frequency limit to
That sounds good to me, I will rename the module containing the grid cell averaging stuff to be more general (and maybe put this under
The distinction of types for decompositions is required by PIO via the
I'm happy to introduce a string argument for reducer instead of the
I did consider using a function interface instead of a subroutine interface, however I opted for the subroutine approach with the temporary buffer as this avoids introducing a potentially large allocation when computing the grid cell average. I want to limit the number of unnecessary allocations and copy operations in the write procedures where possible. It might be possible to return a preallocated array from a function. I can look into this a bit more. Thank you again for the feedback on this! |
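To illustrate the subroutine-with-buffer approach described above, here is a minimal sketch of a grid cell average that writes into a caller-supplied array. The routine name and argument list are assumptions for illustration (the index arrays are passed in as plain arrays rather than via CABLE's derived types), and this is not the implementation in the PR.

```fortran
module grid_cell_average_sketch
  ! Hypothetical sketch: names and argument list are illustrative assumptions.
  implicit none
  private
  public :: grid_cell_average

contains

  ! Reduce a per-patch field to a per-land-point field by weighting each
  ! patch by its area fraction. The result goes into a buffer owned by the
  ! caller, so no result array is allocated on each call.
  subroutine grid_cell_average(field_patch, patch_frac, cstart, cend, field_land)
    real, intent(in) :: field_patch(:)    ! values per patch
    real, intent(in) :: patch_frac(:)     ! area fraction of each patch
    integer, intent(in) :: cstart(:)      ! first patch index of each land point
    integer, intent(in) :: cend(:)        ! last patch index of each land point
    real, intent(out) :: field_land(:)    ! preallocated, one value per land point
    integer :: i

    do i = 1, size(field_land)
      field_land(i) = sum(field_patch(cstart(i):cend(i)) * patch_frac(cstart(i):cend(i)))
    end do
  end subroutine

end module
```

A function interface returning `field_land` would read more naturally at the call site, at the cost of a temporary for the function result each time it is called, which is the trade-off discussed above.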
end do
end if

if (time_step_matches(dels, time_index, output%averaging, leaps, start_year)) then
Note, we need to change "output%averaging" to "output%aggregating" since it's not always an average on that time window.
Totally agree, I probably won't change the name as part of the parallel I/O changes as it's likely a breaking change to the CABLE namelist.
Do we have an issue for documenting namelist improvements? I know Jhan proposed some improvements to the organisation of namelists in #588. And there is also:
CABLE/src/offline/cable_driver_common.F90
Lines 166 to 179 in 8e20bac
! TODO(Sean): we should not be setting namelist parameters in the following if
! block - all options are all configurable via the namelist file and is
! unclear that these options are being overwritten. A better approach would be
! to error for bad combinations of namelist parameters.
IF (icycle > CASAONLY_ICYCLE_MIN) THEN
  icycle = icycle - CASAONLY_ICYCLE_MIN
  CASAONLY = .TRUE.
  CABLE_USER%CASA_DUMP_READ = .TRUE.
  CABLE_USER%CASA_DUMP_WRITE = .FALSE.
ELSE IF (icycle == 0) THEN
  CABLE_USER%CASA_DUMP_READ = .FALSE.
  spincasa = .FALSE.
  CABLE_USER%CALL_POP = .FALSE.
END IF
And
CABLE/src/offline/cable_driver_common.F90
Lines 181 to 183 in 8e20bac
! TODO(Sean): overwriting l_casacnp defeats the purpose of it being a namelist
! option - we should either deprecate the l_casacnp option or not overwrite
! its value.
> Note, we need to change "output%averaging" to "output%aggregating" since it's not always an average on that time window.
Just thought of this now, a way to make this a bit nicer is to have an aggregation_frequency component in the cable_output_profile_t type which is assigned to output%averaging on initialisation. I think this makes sense as the sampling frequency is intended to be set at the profile level.
! TODO(Sean): this is a hack for determining if the current time step
! is the last of the month. Better way to do this?
IF(ktau == 1) THEN
  !MC - use met%year(1) instead of CABLE_USER%YearStart for non-GSWP forcing and leap years
  IF ( TRIM(cable_user%MetType) .EQ. '' ) THEN
    start_year = met%year(1)
  ELSE
    start_year = CABLE_USER%YearStart
  ENDIF
END IF
If we read a met file during the initialisation (outside the time loop), we could then initialise "CABLE_USER%YearStart = met%year(1)" for the MetType = ''?
It would remove the need for this if condition and it makes sure that cable_user%YearStart is always defined and always has the same meaning.
Not sure that's what you mean in your TODO. It seems to apply more to the code in cable_timing_utils.F90 than here.
Yep I definitely want to get back to this, I'll test out the solution you propose. I mean to try out the site case (which I believe corresponds to MetType == ' ') to confirm I haven't broken things here for this case.
> Not sure that's what you mean in your TODO. It seems to apply more to the code in cable_timing_utils.F90 than here.
You've interpreted that correctly, it was a reminder to myself to look into an alternative algorithm for monthly timings that doesn't require the start year of the simulation.
As an alternate, we could consider an additional "ktau"-like variable that is reset every year. (maybe there is one in the code, I don't know). The time loop tells us when we change year (to check if this is true with the site simulations).
What's the reason not to have cable_user%yearstart be something that is unconditionally required in the namelist, and have that be the final truth about the start point of the simulation? Having that able to be overwritten is bound to lead to confusion in the future.
> As an alternate, we could consider an additional "ktau"-like variable that is reset every year. (maybe there is one in the code, I don't know). The time loop tells us when we change year (to check if this is true with the site simulations).
I've seen this pattern used in other models (see #656). I agree that tracking information like this throughout the run makes sense
> What's the reason not to have cable_user%yearstart be something that is unconditionally required in the namelist, and have that be the final truth about the start point of the simulation? Having that able to be overwritten is bound to lead to confusion in the future.
That's a great point, it might make sense to instead error if met%year(1) does not match cable_user%yearstart, and rely on cable_user%yearstart for timing logic.
I don't really understand why met%year(1) is required for this case, I'll see what happens when we rely on cable_user%yearstart alone
Isn't ktau already that? The timestep within the current year, with ktau_tot being the total timestep in the simulation?
I believe ktau can extend beyond a year for site configurations (dependent on kend). For the MetType = 'gswp3' case that I'm testing at the moment, ktau is reset every year. This is probably a symptom of the many met forcing formats / configurations which are supported by the driver.
It looks like ktau loops with do ktau = kstart, kend here and kend is set to be the number of steps in a year in almost every instance, except for gswp which determines it based off the length of the time dimension in the met file (this case block)
…_daily aggregators to canopy_type
Specific changes include:
1. Remove `output_aggregator_t` and instead assign each aggregator handle to either the `accumulations_time_step` or the `accumulations_daily` list based on the accumulation frequency.
2. When updating outputs at each time step, accumulate aggregators according to the assigned accumulation frequency. If writing output variables, then perform the normalisation and resetting of each aggregator before and after the write operation. Intermediate aggregators (such as tscrn_max_daily) are assumed to be driven correctly outside the output module, i.e. they are treated the same as other working variables.
3. Add range check functionality
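The accumulate, normalise, write, reset sequence described in this commit message can be sketched as follows. This is a hedged illustration only: the type name, its components and its bindings are assumptions and are not the aggregator implementation in the PR.

```fortran
module aggregator_lifecycle_sketch
  ! Hypothetical sketch: type, components and bindings are illustrative.
  implicit none
  private
  public :: mean_aggregator_t

  type :: mean_aggregator_t
    real, allocatable :: storage(:)
    integer :: counter = 0
  contains
    procedure :: init
    procedure :: accumulate
    procedure :: normalise
    procedure :: reset
  end type

contains

  subroutine init(this, n)
    class(mean_aggregator_t), intent(inout) :: this
    integer, intent(in) :: n
    if (allocated(this%storage)) deallocate(this%storage)
    allocate(this%storage(n), source=0.0)
    this%counter = 0
  end subroutine

  ! Called at the accumulation frequency (e.g. every time step or daily).
  subroutine accumulate(this, sample)
    class(mean_aggregator_t), intent(inout) :: this
    real, intent(in) :: sample(:)
    this%storage = this%storage + sample
    this%counter = this%counter + 1
  end subroutine

  ! Called immediately before the write: turn the running sum into a mean.
  subroutine normalise(this)
    class(mean_aggregator_t), intent(inout) :: this
    if (this%counter > 0) this%storage = this%storage / real(this%counter)
  end subroutine

  ! Called immediately after the write: start a new aggregation window.
  subroutine reset(this)
    class(mean_aggregator_t), intent(inout) :: this
    this%storage = 0.0
    this%counter = 0
  end subroutine

end module
```

In this sketch the write step would read `storage` between the `normalise` and `reset` calls; an aggregator driven outside the output module would simply have the same calls made by the code that owns it.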
…al for land dimension
|
Thanks for the comments @ccarouge, I'll add in those suggestions! |
end do
end if

if (time_step_matches(dels, time_index, output%averaging, leaps, start_year)) then
I would split this subroutine in two (or more) to split out where we accumulate the aggregators and where we write out the data. I think we should have an update and a write procedure. I've seen you discussed this with Lachlan as well. I understand the idea of replicating the current behaviour. I'm just worried this is going to push us towards a design we would have to rethink to split to get the accumulation step handled by the science modules.
Lachlan and I sort of came to the agreement that any aggregators used as part of non-output related computations would be driven outside the output module (similar to the canopy%tscrn_max_daily aggregator), and that aggregators for writing output would be driven by the output module. Is there a specific use case you have in mind where it makes sense to have the accumulation step for the aggregators for writing output handled by the science modules?
It doesn't have to go to one of the science modules. I think it still makes sense to accumulate and write in different routines.
end subroutine write_variable

subroutine write_variable_grid_cell_average(output_variable, output_file, time_index, patch, landpt)
If I follow correctly, the reason the grid averaging is done within the write procedure is to save on memory allocation. But then, I don't understand why we need to save the averaging within output_variable%, why not use local variables to store the data for the write?
Personally I would like to move this functionality out of the output module, and have a reducer defined in the output variable initialisation, that would recognise various reductions i.e. grid cell average, dominant patch. But I think this is quite difficult to do with the tools Fortran offers
@Whyborn, my first comment was to say I'd prefer the grid averaging to be done separately from the write. But then, I thought Sean must have had a reason to do it this way. I thought hard (and not that long) and thought the memory thing could be it.
> I don't understand why we need to save the averaging within output_variable%, why not use local variables to store the data for the write?
Just to confirm, are you asking whether the time aggregated quantities should be the grid cell averaged values rather than the values per patch?
@ccarouge I think it's actually quite difficult to decouple them completely, because the grid cell averages don't have working variables to be associated with.
…omputing mean aggregations
03eaa9b to
fb4febe
Compare
…bles in the same list
Restore pack functionality, move if (active) block from cable_output_add_variable to cable_output_commit.
4724830 to
9263f1e
Compare
📚 Documentation preview 📚: https://cable--655.org.readthedocs.build/en/655/