update HACCP_term regex to require FOODON, add multivalue example #802
| multivalued: true | ||
| range: string | ||
| pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$ | ||
| pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$ |
This regex requires FOODON ontology. Is this what we want?
Thanks! Please use the pattern ^(\S[^\r\n]*) [FOODON:\d{7,8}]$ instead of ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$ or see my notes on dynamic enumerations.
Did you intentionally remove the white-space between the label and the term id? I don't think that's consistent with other ontology term patterns in MIxS
@turbomam
For the white space, do you mean whether it should be "lead poisoning [FOODON:03530243]" vs "lead poisoning[FOODON:03530243]"?
So, the white space is supposed to be there? I thought I had it set to be valid with or without it... does it matter? If so, I'll make sure I correct it. Just tell me which is correct.
Looking at the submission schema, the white space should be there. So I can make that update to the regex.
From your comment here: #802 (comment)
^(\S[^\r\n]*) [FOODON:\d{7,8}]$
If we want to use pattern-only validation, I suggest we go with that.
That regex ^(\S[^\r\n]*) [FOODON:\d{7,8}]$
is showing me that "lead poisoning [FOODON:03530243]" is invalid... :(
... are you sure that's right?? Or am I missing something about the formatting of the value for "lead poisoning [FOODON:03530243]" ?
I think it needs to be ^.+\s*\[FOODON:\d{7,8}]$
decision, the regex in the 2nd image is good.
I'll test this and confirm then finish this PR.
Discussed 12/03
pattern vs structured_pattern : we have this for some of the more generic term label and term IDs.
Look for "settings" section in schema.
I forgot to escape the square brackets around FOODON with backslashes \[F... etc
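A minimal sketch of why the unescaped pattern rejected "lead poisoning [FOODON:03530243]": without backslashes, `[FOODON:\d{7,8}]` is read as a single character class rather than a literal bracketed ID. This uses Python's `re` module as a stand-in; the assumption is that the validator applying the LinkML `pattern` behaves the same way for these constructs.

```python
import re

# Pattern as first posted: the square brackets are unescaped, so
# [FOODON:\d{7,8}] is parsed as ONE character class (F, O, D, N, ':',
# digits, '{', ',', '}'), not as a literal "[FOODON:...]" token.
UNESCAPED = re.compile(r"^(\S[^\r\n]*) [FOODON:\d{7,8}]$")

# Same pattern with the brackets escaped so they match literally.
ESCAPED = re.compile(r"^(\S[^\r\n]*) \[FOODON:\d{7,8}\]$")

value = "lead poisoning [FOODON:03530243]"
no_space = "lead poisoning[FOODON:03530243]"

print(bool(UNESCAPED.match(value)))   # False: the string ends in ']', which is not in the class
print(bool(ESCAPED.match(value)))     # True
print(bool(ESCAPED.match(no_space)))  # False: the space before '[' is required
```

The escaped form also enforces the space between label and ID discussed above.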
|
I didn't include an example. I am not at all familiar with the FoodAnimalAndAnimalFeed extension. Before I committed time to getting familiar and making an example, I wanted to check that this was a good change. |
|
Thanks @mslarae13. This is good progress. We can refine it a little. First of all, how long are the numeric portions of FOODON URIs? I used ChatGPT 4 to help me with that SPARQL query. The answer is 7 or 8, after subtracting the 38 characters in the base portion of the URIs, "http://purl.obolibrary.org/obo/FOODON_". Next I asked ChatGPT 4
After a little testing with regexr, we came up with
If we want to use pattern-only validation, I suggest we go with that. |
|
That doesn't check that the label and id portion match, etc., and it doesn't limit the choices to sub-classes of haccp guide food safety term. A better LinkML validation strategy for this might be a dynamic enumeration. They are expressed with logic, but can be expanded to an enumeration with explicit permissible values. A limitation right now is that the permissible values won't include the label and the id won't be enclosed in square brackets. But I would like to use this case to motivate improvements to LinkML dynamic enumerations in support of MIxS. |
|
Use vskit expand -s schema.yaml -o schema_expanded.yaml to expand this:

    enums:
      HaccpTerm:
        reachable_from:
          source_ontology: bioregistry:foodon
          source_nodes:
            - FOODON:03530221 ## haccp guide food safety term
          is_direct: false
          relationship_types:
            - rdfs:subClassOf

into this:

    enums:
      HaccpTerm:
        reachable_from:
          source_ontology: bioregistry:foodon
          source_nodes:
            - FOODON:03530221 ## haccp guide food safety term
          is_direct: false
          relationship_types:
            - rdfs:subClassOf
        permissible_values:
          FOODON:03530231:
            text: FOODON:03530231
            meaning: FOODON:03530231
            title: hazard 3
          FOODON:03530244:
            text: FOODON:03530244
            meaning: FOODON:03530244
            title: sodium tripolyphosphate
          FOODON:03530237:
            text: FOODON:03530237
            meaning: FOODON:03530237
            title: hazard 9 |
|
If using this mechanism sounds promising to you, and you want the OAK code to be modified to emit "sodium tripolyphosphate [FOODON:03530244]" instead of "FOODON:03530244", please up-vote this |
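As a rough illustration of the requested output format, the sketch below post-processes expanded permissible values into "label [ID]" strings. It is not OAK or vskit code: the `permissible_values` dict is copied by hand from the expanded enum in this comment, and `to_label_id` is a hypothetical helper name.

```python
# Hand-copied subset of the expanded enum; not loaded from a real schema file.
permissible_values = {
    "FOODON:03530231": {"meaning": "FOODON:03530231", "title": "hazard 3"},
    "FOODON:03530244": {"meaning": "FOODON:03530244", "title": "sodium tripolyphosphate"},
    "FOODON:03530237": {"meaning": "FOODON:03530237", "title": "hazard 9"},
}


def to_label_id(pv: dict) -> str:
    """Format one permissible value as 'term label [TERM:ID]' (hypothetical helper)."""
    return f"{pv['title']} [{pv['meaning']}]"


labels = [to_label_id(pv) for pv in permissible_values.values()]
for label in labels:
    print(label)
```

Strings produced this way have the same "termLabel [FOODON:id]" shape that the pattern-based approach checks.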
|
I agree that the change is suitable. As for the actual pattern being used, I bow to @turbomam's greater expertise on that! |
only1chunts
left a comment
The changes seem reasonable to me, and I trust the combination of @turbomam and @mslarae13 to get the patterns correct (I don't have the expertise to know what's right).
Pull request overview
Updates the MIxS LinkML schema to better reflect HACCP term expectations (FOODON-referenced terms) and refreshes documentation/examples.
Changes:
- Updated HACCP_term to use a FOODON-specific regex and added a multi-value example.
- Reflowed several slot descriptions into single-line strings (no semantic change).
| multivalued: true | ||
| range: string | ||
| pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$ | ||
| pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$ |
This regex is effectively satisfied by any string that ends with a bracketed FOODON ID, even if earlier semicolon-separated terms don't have their own IDs (e.g., term1; term2 [FOODON:...] would match). If the intent is to validate a list of HACCP terms, each entry should include its own termLabel [FOODON:digits] (and if you keep ; as the separator, the repeating group should include the label as well). Tightening the pattern to require the termLabel + [FOODON:id] structure per item will prevent partially-specified multi-term values from being accepted.
| pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$ | |
| pattern: ^[^;]+\s*\[FOODON:\d+\](;\s*[^;]+\s*\[FOODON:\d+\])*$ |
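The difference between the two patterns in this suggestion can be checked with a short sketch. It uses Python's `re` as a stand-in for the validator (an assumption), and the test strings are illustrative: the IDs are borrowed from the examples in this PR, and the labels "term1" and "term2" come from the comment above.

```python
import re

# Pattern currently in the PR: only the first item needs a label, and '.+'
# can swallow earlier ';'-separated entries that have no ID of their own.
LOOSE = re.compile(r"^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$")

# Suggested pattern: every ';'-separated item must be 'label [FOODON:digits]'.
PER_ITEM = re.compile(r"^[^;]+\s*\[FOODON:\d+\](;\s*[^;]+\s*\[FOODON:\d+\])*$")

partial = "term1; term2 [FOODON:03530249]"
full = ("tetrodotoxic poisoning [FOODON:03530249]; "
        "neurotoxic shellfish poisoning [FOODON:03530246]")

print(bool(LOOSE.match(partial)))     # True: partially specified value is accepted
print(bool(PER_ITEM.match(partial)))  # False: 'term1' has no ID
print(bool(PER_ITEM.match(full)))     # True: each item carries its own ID
```

Note that both patterns still allow the label to run straight into the bracket with no space, because of `\s*`.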
| title: Hazard Analysis Critical Control Points (HACCP) guide food safety term | ||
| examples: | ||
| - value: tetrodotoxic poisoning [FOODON:03530249] | ||
| - value: tetrodotoxic poisoning[FOODON:03530249]; neurotoxic shellfish poisoning[FOODON:03530246] |
The new multi-value example uses ; as a delimiter and omits the space before [ (e.g., poisoning[FOODON:...]). In this schema, multi-term free-text fields typically document/illustrate pipe-separated values (|), e.g. dietary_claim_use description (around mixs.yaml:5761-5763) and animal_feed_equip example ...|... (around mixs.yaml:4216). Consider updating this example to use | and the canonical termLabel [TERM:ID] spacing, or explicitly document that ; and optional spacing are intended for HACCP_term.
|
We've requested a GitHub Copilot review on this PR as part of a pass across all open MIxS PRs. Copilot catches things like unused imports, resource leaks, and naming inconsistencies; it's a lightweight first pass, not a substitute for human review. No action needed from you unless Copilot flags something you agree with. |


Address syntax match to examples
Update regexs for MIxS
Based on the description for HACCP this requires the FOODON ontology.
description: Hazard Analysis Critical Control Points (HACCP) food safety terms;
This field accepts terms listed under HACCP guide food safety term (http://purl.obolibrary.org/obo/FOODON_03530221)
While this doesn't validate that the entered term is really in FOODON, it does perform a basic string-format check.