✨ add files and assay model #21
RobertJCarroll
left a comment
Hi Christina. It looks like you haven't been able to test this yet. I think the GitHub Action that runs the build may not have triggered because the PR was converted from a draft, but I'm not sure. Happy to help get that set up. You won't need copier, but the other prerequisites here should get you started:
https://github.com/dalito/linkml-project-copier?tab=readme-ov-file#prerequisites
I tested locally, and the build is failing. One part that stood out is that the slot definitions should all happen in the slots section (i.e., line 637 and beyond in your version). Looking at your YAML, none of the information about descriptions should appear there; for example, this description is already present (broadly) in the slot definition here. We also want to use slot_usage as infrequently as possible, i.e., where the names, titles, types, and descriptions can reasonably be the same, we want to preserve that. It looks like you might be trying to define everything within the classes, which is understandable, but not the "LinkML way". slot_usage would mainly be used to specify something as the unique key of a class/table or to set something as required in that specific class/table's context.
To get this building, I would recommend starting by moving all of these "slot definition details" (e.g., the descriptions nested in the class slots section and the other definitions in the class slot_usage section) into the high-level slots. You'd then only have the names of those slots in the class, e.g., like DOI and its slots do_id and bibliographic_reference.
We can dig a bit deeper on the field-level specifics once it's building. I see some duplicate slots in here, and it'll be easier to review when they are cleaned up. Please let me know if you need more guidance on the "how".
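A rough sketch of that layout, in case it helps: slot details live once in the top-level slots section, the class lists slot names only, and slot_usage carries only class-specific constraints. The class DOI and the slots do_id and bibliographic_reference come from the existing schema; the description text and the choice of do_id as the identifier are assumptions for illustration, not what the schema actually says.

```yaml
# Sketch only: description text and constraints are illustrative assumptions.
classes:
  DOI:
    slots:                 # names only; no descriptions or ranges here
      - do_id
      - bibliographic_reference
    slot_usage:            # reserved for class-specific constraints
      do_id:
        identifier: true   # assumed: unique key of this class/table
        required: true     # assumed: required in this class's context

slots:                     # every slot detail is defined once, here
  do_id:
    description: Digital object identifier.                # assumed wording
    range: string
  bibliographic_reference:
    description: Citation text for the referenced work.    # assumed wording
    range: string
```

With this shape, a second class that reuses do_id picks up the same description and range without repeating them.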
torstees
left a comment
There are some string types that might be better suited as enums to ensure we are consistent across studies/programs. @RobertJCarroll can say for sure whether these might be better suited as strings, but I thought it was worth flagging.
    description: Identifier for a specific version of the object
    range: string
    required: true
  access_type:
Should this not be an enumeration?
Yeah @torstees, I agree. I set them as strings for now and was hoping to add some initial enums I had in mind this week. I have an idea of the KF enums we typically use, but wasn't sure about the INCLUDE ones we'd need. Would love to discuss further.
    range: string
    required: true
  experimental_strategy:
    title: Experimental Strategy
Some of these look like suitable candidates for Enumerations: Exp Strategy, Assay Center, Platform.
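For illustration, switching one of these slots from a string to an enumeration could look roughly like the following. The slot experimental_strategy and its title are from the diff above; the enum name EnumExperimentalStrategy and both permissible values are hypothetical placeholders, not an agreed value set.

```yaml
# Sketch only: enum name and permissible values are hypothetical.
slots:
  experimental_strategy:
    title: Experimental Strategy
    range: EnumExperimentalStrategy   # previously: string
    required: true

enums:
  EnumExperimentalStrategy:
    description: Controlled list of experimental strategies.
    permissible_values:
      wgs:
        description: Whole genome sequencing.   # placeholder value
      rna_seq:
        description: RNA sequencing.            # placeholder value
```

The same pattern would apply to Assay Center and Platform once their value sets are agreed.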
@torstees @RobertJCarroll - I added some initial ideas for Enums. I'd like to discuss further enums for
We should have a deeper conversation about the enumerations. Some of these are complex enough that they likely require a separate management process if we can't use external sources. E.g., EnumPlatform has two different types of Illumina entries. One is more specific, but they aren't nested. We also likely want to avoid "other" terms if possible. Up front, I'd prefer permitting nulls and adding the missing terms ASAP (or keeping the field required, with support for rapid-response changes). Untangling "Did we pick other before we added that one?" is pernicious.
With regard to FileAdmin, we probably want it to either just reference a File or inherit from File. Every FileAdmin should have a File, too, right? Similarly with FileAssay: if we want it to extend File, that's probably OK, but we should think about where things belong. Would it make more sense for it to just be "Assay" and point to the files generated? I do worry a bit about having too many places to link files to individuals or samples, but maybe that's OK?
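To make the two options concrete, here is a minimal sketch with FileAdmin inheriting from File via is_a, and an Assay class that points to the files it generated. The class names File, FileAdmin, and Assay and the slot access_type come from this thread; file_id, assay_id, and files are hypothetical slot names introduced only for the example.

```yaml
# Sketch only: file_id, assay_id, and files are hypothetical slot names.
classes:
  File:
    slots:
      - file_id
  FileAdmin:
    is_a: File             # inheritance: every FileAdmin is also a File
    slots:
      - access_type
  Assay:
    slots:
      - assay_id
      - files              # reference: an Assay points to its Files

slots:
  file_id:
    identifier: true
    range: string
  assay_id:
    identifier: true
    range: string
  access_type:
    range: string
  files:
    range: File            # links to File records rather than extending File
    multivalued: true
```

Keeping the file link on one class (here, Assay.files) is one way to limit the number of places where files get tied to samples or individuals.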
Agree on having a larger conversation about the enums. We have a lot of standardization on our end for file enums, so I'm curious how that fits into this model and whether others have initial feedback too. I don't think I added every single enum (we have a ton in a master file). @calkinsh @chris-s-friedman @awarkow @allisonheath
Hm, yeah, that's a good point. I took another look at it, and I agree. I think in my head I mixed up some of the logic with inheritance there (since File is a subset of FileAdmin, I got the inheritance direction confused 😅).
For FileAssay I was imagining a similar scenario to File / FileAdmin, in the sense that there is some universal/operational table with all the relevant FileAssay information we track, and then FileAssay is a subset of that table. I think we can remove the subject_id from that model and have it point to the files generated and to samples. I think it's okay in this case, so we don't have to join this file assay table to a files table just to see that file --> sample mapping. But yeah, curious to get more feedback on this one.
Linting requires a specific EFO URI:
> warning Schema maps prefix 'EFO' to namespace 'http://www.ebi.ac.uk/efo' instead of namespace 'http://identifiers.org/efo/' (canonical_prefixes)
Somewhat hierarchical platforms with meanings pointing to EFO
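A minimal sketch of what these two commits describe, assuming the namespace the linter warning names as canonical and using is_a to nest a more specific platform under a general one. The permissible value names are hypothetical, and the EFO term IDs are deliberately left as commented placeholders rather than guessed.

```yaml
# Sketch only: permissible values are hypothetical; EFO term IDs are not filled in.
prefixes:
  EFO: http://identifiers.org/efo/   # namespace the linter warning asks for

enums:
  EnumPlatform:
    permissible_values:
      illumina:
        description: Any Illumina platform.
        # meaning: EFO:<term id>     # placeholder; look up the real EFO term
      illumina_specific_model:       # hypothetical, more specific entry
        description: A specific Illumina instrument model.
        is_a: illumina               # nests the specific value under the general one
        # meaning: EFO:<term id>     # placeholder; look up the real EFO term
```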
…lude-dcc/include-access-model into d3b-2559-add-files-and-assay
Looking over this PR, it's currently up to date with main, but we still need some changes to the file metadata organization. @Christina-J-Diaz are you working on those updates? There are also some changes on the enum side, but those can come in a later pass after we have the structure aligned.
Adding modeling for a files access model and an assay access model. Still WIP.