Conversation
|
Hi @imedan, thanks for adding this datamodel, and for addressing the bugs that appeared for this case. Would you confirm that the datamodel renders as you'd expect in the DSI, here: https://sas.sdss.org/dsi/file/select-mdwarf-contin/mdwarf_contin_vac ? I request 2 changes prior to the merge in order that the VAC datamodel be geared toward the data release. First would you please do a global search and replace of DR19 -> DR20? Second, VAC_ROOT should be replaced with the most closely related, existing environment variable: MWM_MDWARF, so I can include the VAC in a subfolder in the dr20 volume. So would you do a search and replace of: $VAC_ROOT/mwm/mdwarf-contin/mdwarf-contin-standardization-[V_VAC].h5 (Not needed for this PR, but is the h5 file ready on the SAS so I can stage it in DR20?) |
|
Yes, the render looks good to me! I just pushed the two changes you requested. Do things look correct to you? For the actual file, no. I just tried to copy the file: and I got a permission denied error. |
|
Thanks @imedan, this looks good to me. I ran h5check and the h5 file validated, created the checksum file, and mirrored the VAC to JHU for sciserver compute access via the Ceph-fs disk (useful if a tutorial jupyter notebook is planned). I updated your VAC announcement with the datamodel and SAS path info, and updated Anne-Marie's VAC spreadsheet. I also symlinked the dr20 version into IPL-4, and if you need to make changes to the file prior to its release, you can use /uufs/chpc.utah.edu/common/home/sdss52/ipl-4/vac/mwm/m-dwarf/mdwarf-contin to stage new versions (u6033276 added to the FACL write permissions). |
havok2063
left a comment
There was a problem hiding this comment.
If this is a new VAC for DR20, how was this first generated for DR19? If this was released in both, there should be an entry for both DR19 and DR20. If this was new for DR20, the datamodel should be generated using the WORK release, then migrated to DR20.
Also, I don't recommend the find+replace method, as there is DR logic used to populate fields and a DR is not simply an entry in the datamodel. This could introduce inconsistencies. If you must do it, you should do it once in the yaml, check all the sub-fields are correct for the new DR, and re-generate the other files automatically. This will ensure those files are correct everywhere.
| base = os.path.splitext(self.output)[0] | ||
| yaml_dir = os.path.dirname(base).replace(f'/{self.format}', '/yaml') | ||
| cached_file = os.path.join(yaml_dir, os.path.basename(base) + '.yaml') | ||
|
|
There was a problem hiding this comment.
Can you explain what these changes are for? it looks like this tries to replace both the suffix and the path part, but the original code already does that.
| @@ -232,7 +232,7 @@ def design_content(self, name: str = '/', description: str = None, hdftype: str | |||
| member['attrs'] = self._design_attrs(attrs) or [] | |||
| if hdftype == 'dataset': | |||
| member['shape'] = ds_shape or (10,) | |||
| member['size'] = np.prod(member['shape']).item() | |||
| member['size'] = np.prod(member['shape']) | |||
| member['dtype'] = ds_dtype or 'S10' | |||
There was a problem hiding this comment.
All of these .items were necessary to dump the hdf5 variable into a valid format when serializing into yaml. Any particular reason these got changed? There are tests that are now failing, see https://github.com/sdss/datamodel/actions/runs/23303538048. This change may be the cause. Can you investigate?
| design: false | ||
| vac: false | ||
| recommended_science_product: false | ||
| changelog: |
There was a problem hiding this comment.
If this is a VAC and recommended for science, these should be set to true, and the datamodels re-validated.
This is for the data model files for the DR20 VAC: https://soji.sdss.utah.edu/collaboration/announcements/vacs/browse/select/sdss-5.20. Will also tag @zachway here as he in one of the other main authors of this VAC. There are three commits to the branch summarized below.
1f65e1e includes the new data model files for the VAC. As a note, dr20.cfg is not available yet in tree, so this was setup with DR19. Can do a replace easily with DR20 if that makes sense?
d78e81d deals with a bug I found in creating the YAML from HDF5 files. Specifically, I found that
h5obj.sizealready returns anint, so this have an error when trying to do.item(). I am unsure if this is a difference inh5pyor not.376d161 deals with a string replacement bug. My file name included
md(e.g./uufs/chpc.utah.edu/common/home/u6033276/.modulefiles/miniconda/datamodel/datamodel/products/yaml/mdwarf-contin.yaml). So, when trying to make the markdown path, it would look for a yaml file that looked like/uufs/chpc.utah.edu/common/home/u6033276/.modulefiles/miniconda/datamodel/datamodel/products/yaml/yamlwarf-contin.yaml. I made the string replacement more explicit to only replace the directory and.yaml.Would be good to review the above and see what should be kept or not.