Skip to content

update HACCP_term regex to required FOODON, add multivalue example#802

Open
mslarae13 wants to merge 3 commits intomainfrom
514-example-syntax-check
Open

update HACCP_term regex to required FOODON, add multivalue example#802
mslarae13 wants to merge 3 commits intomainfrom
514-example-syntax-check

Conversation

@mslarae13
Copy link
Collaborator

@mslarae13 mslarae13 commented Jun 14, 2024

Address syntax match to examples
Update regexs for MIxS

Based on the description for HACCP this requires the FOODON ontology.
description: Hazard Analysis Critical Control Points (HACCP) food safety terms;
This field accepts terms listed under HACCP guide food safety term (http://purl.obolibrary.org/obo/FOODON_03530221)

While this doesn't perform any validation to check if what's been entered is really in FOODON, it does some string check.

multivalued: true
range: string
pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$
pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex requires FOODON ontology. Is this what we want?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please use the pattern ^(\S[^\r\n]*) [FOODON:\d{7,8}]$ instead of ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$ or see my notes on dynamic enumerations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intentionally remove the white-space between the label and the term id? I don't think that's consistent with other ontology term patterns in MIxS

Copy link
Collaborator Author

@mslarae13 mslarae13 Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turbomam
For the white space, do you mean if it should be "lead poisoning [FOODON:03530243]" vs "lead poisoning[FOODON:03530243]"

So, the white space is supposed to be there? I thought I had it set to be valid with or without it... does it matter? If so, I'll make sure I correct it. Just tell me which is correct.

Looking at the submission schema the white space should be there. So I can make that update to the regex.

Copy link
Collaborator Author

@mslarae13 mslarae13 Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your comment here : #802 (comment)

^(\S[^\r\n]*) [FOODON:\d{7,8}]$

I f we want to use pattern-only validation, I suggest we go with that.

That regex ^(\S[^\r\n]*) [FOODON:\d{7,8}]$
is showing me that "lead poisoning [FOODON:03530243]" is invalid... :(

... are you sure that's right?? Or am I missing something about the formatting of the value for "lead poisoning [FOODON:03530243]" ?

I think it needs to be ^.+\s*\[FOODON:\d{7,8}]$

(https://www.ebi.ac.uk/ols4/ontologies/foodon/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FFOODON_03530221)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-07-09 at 4 53 27 PM Screenshot 2024-07-09 at 4 53 40 PM

Copy link
Collaborator Author

@mslarae13 mslarae13 Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decision, the regex in the 2nd image is good.

I'll test this and confirm then finish this PR.

Discussed 12/03
pattern vs structured_pattern : we have this for some of the more generic term label and term IDs.
Look for "settings" section in schema.

Copy link
Member

@turbomam turbomam Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to escape the square brackets around FOODON with backslashes \[F... etc

@mslarae13 mslarae13 marked this pull request as ready for review June 28, 2024 17:48
@mslarae13
Copy link
Collaborator Author

I didn't include an example. I am not at all familiar with the FoodAnimalAndAnimalFeed extension. Before I committed time to getting familair and making an example, I wanted to check that this was a good change.

@turbomam
Copy link
Member

turbomam commented Jul 9, 2024

Thanks @mslarae13. This is good progress. We can refine it a little:

First of all, how long are the numeric portions of FOODON URIs?

I used ChatGPT 4 to help me with that SAPRQL query

7 or 8, after subtracting the 38 characters in the base portion or the URIs, "http://purl.obolibrary.org/obo/FOODON_"

Next I asked ChatGPT 4

I want to write a regular expression for a FOODON label followed by one white-space and then a FOODON CURIe. The CURIes should be enclosed in square brackets. They start with "FOODON:" and are followed by 7 or 8 digits. The label must start with a non-white-space character but can have any number of any characters after that, as long as they aren't carriage returns, line feeds, etc.

after a little testing with regexr, we came up with

^(\S[^\r\n]*) [FOODON:\d{7,8}]$

I f we want to use pattern-only validation, I suggest we go with that.

@turbomam
Copy link
Member

turbomam commented Jul 9, 2024

That doesn't check that the label and id portion match, etc., and it doesn't limit the choices to sub-classes of haccp guide food safety term

A better LinkML validation strategy for this might be a dynamic enumeration. They are expressed with logic, but can be expanded to an enumeration with explicit permissible values. A limitation right now is that be that the permissible values won't include the label and the id won't be enclosed in square brackets. But I would like to use this case to motivate improvements to LinkML dynamic enumerations in support of MIxS.

@turbomam
Copy link
Member

turbomam commented Jul 9, 2024

The vskit command from the Ontology Access Kit can be used like this

vskit expand -s schema.yaml -o schema_expanded.yaml

to expand this

enums:
  HaccpTerm:
    reachable_from:
      source_ontology: bioregistry:foodon
      source_nodes:
      - FOODON:03530221   ## haccp guide food safety term
      is_direct: false
      relationship_types:
      - rdfs:subClassOf

into this

enums:
  HaccpTerm:
    reachable_from:
      source_ontology: bioregistry:foodon
      source_nodes:
      - FOODON:03530221   ## haccp guide food safety term
      is_direct: false
      relationship_types:
      - rdfs:subClassOf
    permissible_values:
      FOODON:03530231:
        text: FOODON:03530231
        meaning: FOODON:03530231
        title: hazard 3
      FOODON:03530244:
        text: FOODON:03530244
        meaning: FOODON:03530244
        title: sodium tripolyphosphate
      FOODON:03530237:
        text: FOODON:03530237
        meaning: FOODON:03530237
        title: hazard 9

@turbomam
Copy link
Member

turbomam commented Jul 9, 2024

If using this mechanism sounds promising to you, and you want the OAK code to be modified to emit "sodium tripolyphosphate [FOODON:03530244]" instead of "FOODON:03530244", please up-vote this

multivalued: true
range: string
pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$
pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please use the pattern ^(\S[^\r\n]*) [FOODON:\d{7,8}]$ instead of ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$ or see my notes on dynamic enumerations.

@only1chunts
Copy link
Member

I agree that the change is suitable. As for the actual patturn being used, I bow to @turbomam's greater expertise on that!
The additional idea of using some sort of automated expansion thingy sounds like a good idea to me, so I have thumbs-up'd that ticket in the ontology access toolkit repo.

Copy link
Member

@only1chunts only1chunts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes seem reasonable to me, and I trust the combination of @turbomam and @mslarae13 to get the patturns correct (I dont have the expertise to know whats right).

@mslarae13 mslarae13 self-assigned this Sep 3, 2024
@mslarae13 mslarae13 added the 1-TermUpdate Update suggestion for existing term, including bugs. Issues from "cig-bug" label moved here. label Sep 3, 2024
@turbomam turbomam requested a review from Copilot March 11, 2026 00:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the MIxS LinkML schema to better reflect HACCP term expectations (FOODON-referenced terms) and refreshes documentation/examples.

Changes:

  • Updated HACCP_term to use a FOODON-specific regex and added a multi-value example.
  • Reflowed several slot descriptions into single-line strings (no semantic change).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

multivalued: true
range: string
pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$
pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is effectively satisfied by any string that ends with a bracketed FOODON ID, even if earlier semicolon-separated terms don't have their own IDs (e.g., term1; term2 [FOODON:...] would match). If the intent is to validate a list of HACCP terms, each entry should include its own termLabel [FOODON:digits] (and if you keep ; as the separator, the repeating group should include the label as well). Tightening the pattern to require the termLabel + [FOODON:id] structure per item will prevent partially-specified multi-term values from being accepted.

Suggested change
pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$
pattern: ^[^;]+\s*\[FOODON:\d+\](;\s*[^;]+\s*\[FOODON:\d+\])*$

Copilot uses AI. Check for mistakes.
title: Hazard Analysis Critical Control Points (HACCP) guide food safety term
examples:
- value: tetrodotoxic poisoning [FOODON:03530249]
- value: tetrodotoxic poisoning[FOODON:03530249]; neurotoxic shellfish poisoning[FOODON:03530246]
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new multi-value example uses ; as a delimiter and omits the space before [ (e.g., poisoning[FOODON:...]). In this schema, multi-term free-text fields typically document/illustrate pipe-separated values (|), e.g. dietary_claim_use description (around mixs.yaml:5761-5763) and animal_feed_equip example ...|... (around mixs.yaml:4216). Consider updating this example to use | and the canonical termLabel [TERM:ID] spacing, or explicitly document that ; and optional spacing are intended for HACCP_term.

Copilot uses AI. Check for mistakes.
@turbomam
Copy link
Member

We've requested a GitHub Copilot review on this PR as part of a pass across all open MixS PRs. Copilot catches things like unused imports, resource leaks, and naming inconsistencies — it's a lightweight first pass, not a substitute for human review. No action needed from you unless Copilot flags something you agree with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1-TermUpdate Update suggestion for existing term, including bugs. Issues from "cig-bug" label moved here.

Projects

Status: Backlog

4 participants