Conversation
gonuke
left a comment
There was a problem hiding this comment.
I started reviewing this a while ago and never got around to finishing it.
Here are some specific suggestions.
Overall, I think this PR is mixing things based on what your task was at the time and should maybe be broken down somehwat, aka separation of concerns.
Fundamentally, if you are going to rely on the ADF structure that we have developed for holding ALARA output data, then I think this PR should focus on methods that modify the ADF and then write it to SQL. Everything else is for a specific use case that can come in a future PR that focuses on how to process a variety of data that you have already generated using these new capabilities.
| lib = aop.DataLibrary() | ||
| adf = lib.make_entries(run_dicts[run_dict]) | ||
| adf_data.append(adf) | ||
| adf = pd.concat(adf_data) |
There was a problem hiding this comment.
Or maybe it's indented too much and should be outside the loop?
|
|
||
| def modify_adf(adf, norm_flux_arr, t_irr_arr, inputs): | ||
| #Remove some columns: | ||
| adf.drop(columns=['time', 'time_unit', 'variable', 'var_unit', 'block', 'block_num'], inplace=True) |
There was a problem hiding this comment.
Before you do this, will it be safer to first filter to get only the number density info. I guess you are running these to only generate number density, but just in case...
|
What if we limit |
|
To repeat the comments from #35:
Notably, this particular approach for generating the ADF makes sense for the current "campaign" of problems, but may not always make sense as we consider different variations on how we set up problems. |
Co-authored-by: Paul Wilson <paul.wilson@wisc.edu>
gonuke
left a comment
There was a problem hiding this comment.
I think it's important to separate the method for knowing the values of t_irr_flat from the modifying of the ADF. More generally, parsing a run_label may become the least common way to know this information.
There was a problem hiding this comment.
It looks like you only need this file to get the number of groups. Could that just be an input/command-line parameter? That is, in stead of providing a filename for the group bounds, you can just provide an integer number of groups.
| - 64 | ||
| flux_file : /filespace/a/asrajendra/research/activationDB/ref_flux_files/iter_dt_flux_2.0986E14 | ||
| flux_file : ../calc_dwell_dir/ref_flux_files/iter_dt_flux_2.0986E14 | ||
| vit_j_file : ../data/vit-j-175-bins.txt No newline at end of file |
There was a problem hiding this comment.
See previous comment - since this file only used to determine the number of groups - maybe this entry is just the number of groups
| vit_j_file : ../data/vit-j-175-bins.txt | |
| n_groups : 175 |
| inputs = yaml.safe_load(yaml_file) | ||
| return inputs | ||
|
|
||
| def modify_adf(adf, norm_flux_arr, t_irr_arr, inputs): |
There was a problem hiding this comment.
Adding docstrings early can help the reviewer know your intent for the data structures.
| num_pulses = inputs['num_pulses'] | ||
| duty_cycles = inputs['duty_cycles'] | ||
|
|
||
| # Extract the number of pulses and duty cycles from the run label | ||
| pulse_num_dc = adf['run_lbl'].str.extract(r'_(\d+)p_(\d+)_').astype(int) | ||
|
|
||
| # Map num_pulses and duty_cycles to an index | ||
| pulse_idx = pulse_num_dc[0].map({pulse_num: i for i, pulse_num in enumerate(num_pulses)}) | ||
| duty_cycle_idx = (pulse_num_dc[1]/100).map({duty_cycle: i for i, duty_cycle in enumerate(duty_cycles)}) | ||
|
|
||
| # Add flattened irradiation time: | ||
| adf['t_irr_flat'] = t_irr_arr.T[pulse_idx.to_numpy(), duty_cycle_idx.to_numpy()] | ||
| adf['t_irr_flat'] = aop.convert_times(adf['t_irr_flat'], from_unit='y', to_unit='s') |
There was a problem hiding this comment.
Separation of concerns: All of this seems like it's in the wrong place. All the data for the new columns in the ADF should be computed before you make the call the modify the ADF. Remember, there may be many ways to compute the data for the new columns depending on what the original source of the data is, and you want flexibility to mix and match.
Specifically, I thought we discussed that you shouldn't rely on decoding information from the run_label to get this information, anyway. If you still want to have that as one pathway to determine this data, that should be it's own method with only that purpose.
This PR introduces a structure to the sqlite activation results database and writes some preliminary results to it.
The
run_lblcolumn is used to keep track of the ALARA runs from which results were produced (using the number of pulses, duty cycle %, and active burn time).The
block_namecolumn has the effect of tracking the parent nuclide. The (flattened) irradiation time in seconds, flux spectrum shape, and number density all have separate columns.This database should be expanded to include the average flux magnitude (pending #14)
fixes #35