1 change: 1 addition & 0 deletions submit/samples.rst
@@ -88,3 +88,4 @@ Find specific advice on registering studies using your preferred method below:

samples/interactive
samples/programmatic
samples/sample_checklist
Empty file.
81 changes: 43 additions & 38 deletions submit/samples/missing-values.rst
@@ -20,52 +20,57 @@ INSDC Missing Value Reporting Terms

+----------------------------+------------------------------+-----------------------------------------------+----------------------------------+---------------------------------------------------+
| **INSDC term (top level)** | **INSDC term (lower level)** | **Definition** | **INSDC term (reporting level)** | **Definition** |
+============================+==============================+===============================================+==================================+===================================================+
| not applicable | | information is inappropriate to report, can | control sample | Information is not applicable as the sample |
| | | indicate that the standard itself fails to | | represents a negative control sample |
| | | model or represent the information | | collected in a lab |
| | | appropriately +----------------------------------+---------------------------------------------------+
| | | | sample group | Information is not applicable as the sample |
| | | | | represents a group of samples that do not |
| | | | | have a single origin. E.g. for co-assembly or |
| | | | | transcriptome assembly. |
+----------------------------+------------------------------+-----------------------------------------------+----------------------------------+---------------------------------------------------+
| not applicable | | | information is inappropriate to report, can | control sample | | Information is not applicable as the sample |
| | | | indicate that the standard itself fails to | | | represents a negative control sample |
| | | | model or represent the information | | | collected in a lab |
| | | | appropriately +----------------------------------+---------------------------------------------------+
| | | | | sample group | | Information is not applicable as the sample |
| | | | | | | represents a group of samples that do not |
| | | | | | | have a single origin. E.g. for co-assembly or |
| | | | | | | transcriptome assembly. |
| missing | not collected | information of an expected format was not | synthetic construct | Information does not exist as the sample |
| | | given because it has not been collected | | represents an ab-initio synthetic construct. |
| | | +----------------------------------+---------------------------------------------------+
| | | | lab stock | Information was not collected as the sample |
| | | | | represents a cultured cell line or model |
| | | | | organism under long-term lab control. |
| | | +----------------------------------+---------------------------------------------------+
| | | | third party data | Information does not exist as the metadata |
| | | | | was not collected or reported in records |
| | | | | predating the 2023 agreement. For use in |
| | | | | Third Party data submissions. |
+----------------------------+------------------------------+-----------------------------------------------+----------------------------------+---------------------------------------------------+
| missing | not collected | | information of an expected format was not | synthetic construct | | Information does not exist as the sample |
| | | | given because it has not been collected | | | represents an ab-initio synthetic construct. |
| | | | +----------------------------------+---------------------------------------------------+
| | | | | lab stock | | Information was not collected as the sample |
| | | | | | | represents a cultured cell line or model |
| | | | | | | organism under long-term lab control. |
| | | | +----------------------------------+---------------------------------------------------+
| | | | | third party data | | Information does not exist as the metadata |
| | | | | | | was not collected or reported in records |
| | | | | | | predating the 2023 agreement. For use in |
| | | | | | | Third Party data submissions. |
| +------------------------------+-----------------------------------------------+----------------------------------+---------------------------------------------------+
| | not provided | | information of an expected format was not | data agreement established | | Data agreements were established before the |
| | | | given, a value may be given at the later | pre-2023 | | 2023 INSDC standard and metadata can not be |
| | | | stage | | | provided. A value may be given at a later stage |
| +------------------------------+-----------------------------------------------+----------------------------------+---------------------------------------------------+
| | restricted access | | information exists but can not be released | endangered species | | Information can not be reported as the target |
| | | | openly because of privacy concerns | | | organism is endangered e.g. on the IUCN red- |
| | | | | | | list |
| | | | +----------------------------------+---------------------------------------------------+
| | | | | human-identifiable | | Information can not be reported as the |
| | | | | | | metadata would make the sample human- |
| | | | | | | identifiable. |
| missing | not provided | information of an expected format was not | data agreement established | Data agreements were established before the |
| | | given, a value may be given at the later | pre-2023 | 2023 INSDC standard and metadata can not be |
| | | stage | | provided. A value may be given at a later stage |
+----------------------------+------------------------------+-----------------------------------------------+----------------------------------+---------------------------------------------------+
| missing | restricted access | information exists but can not be released | endangered species | Information can not be reported as the target |
| | | openly because of privacy concerns | | organism is endangered e.g. on the IUCN red- |
| | | | | list |
| | | +----------------------------------+---------------------------------------------------+
| | | | human-identifiable | Information can not be reported as the |
| | | | | metadata would make the sample human- |
| | | | | identifiable. |
+----------------------------+------------------------------+-----------------------------------------------+----------------------------------+---------------------------------------------------+


Usage of INSDC Missing Value Reporting Terms
============================================

Please use the above standardised missing value vocabulary **only if a true value of an expected format for a mandatory field is missing**. If a true value is missing for a **recommended** or an **optional** field, then that field should not be reported at all. When reporting a missing mandatory field, the eight granular **‘reporting level’** terms need to be preceded with the term *missing:* to declare both the absence of a true value and the reason for it.
*not applicable* is only ever used as a top level term; its reporting level terms should also be prefixed with *missing:*.

Example of usage:
-----------------

**geographic location (country and/or sea)**: *missing: data agreement-established pre-2023*

**collection date**: *missing: control sample*
Examples of Usage:
------------------

**geographic location (country and/or sea)**: *missing: human-identifiable*
+---------------------------+----------------------------------------------+------------------------------------------------+
| **Short Field Name**      | **Long Field Name**                          | **Missing Value Example**                      |
+===========================+==============================================+================================================+
| **geo_loc_name**          | **geographic location (country and/or sea)** | *missing: data agreement established pre-2023* |
+---------------------------+----------------------------------------------+------------------------------------------------+
| **collection_date**       | **collection date**                          | *missing: control sample*                      |
+---------------------------+----------------------------------------------+------------------------------------------------+
| **geo_loc_name**          | **geographic location (country and/or sea)** | *missing: human-identifiable*                  |
+---------------------------+----------------------------------------------+------------------------------------------------+
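
The prefixing rule can also be applied programmatically when preparing sample metadata. The following is a minimal sketch, not an ENA tool: the function names ``format_missing_value`` and ``is_missing_value`` are hypothetical, and the set of terms is simply copied from the reporting level column of the table above. It only covers reporting level terms and makes no claim about which other values the submission service accepts.

```python
# Reporting level terms, copied from the INSDC table above.
REPORTING_TERMS = frozenset({
    "control sample",
    "sample group",
    "synthetic construct",
    "lab stock",
    "third party data",
    "data agreement established pre-2023",
    "endangered species",
    "human-identifiable",
})

PREFIX = "missing: "


def format_missing_value(reporting_term: str) -> str:
    """Return the value to report for a missing mandatory field."""
    term = reporting_term.strip()
    if term not in REPORTING_TERMS:
        raise ValueError(f"unknown reporting level term: {term!r}")
    return PREFIX + term


def is_missing_value(value: str) -> bool:
    """Check whether a value is 'missing: ' followed by a known term."""
    return value.startswith(PREFIX) and value[len(PREFIX):] in REPORTING_TERMS


print(format_missing_value("control sample"))
```

A check such as ``is_missing_value`` rejects near-misses like *missing: data agreement-established pre-2023*, where a stray hyphen makes the term differ from the vocabulary.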
10 changes: 10 additions & 0 deletions submit/samples/sample_checklist.rst
@@ -0,0 +1,10 @@
========================
Sample Checklist Related
========================

Information related to sample checklists.

.. toctree::
   :maxdepth: 1

   sample_checklist/sample_checklist_introduction.rst
   sample_checklist/updates.rst
@@ -0,0 +1,36 @@
===============================
Sample Checklist Infrastructure
===============================


DRAFT!

------------
Introduction
------------


In late 2024/early 2025, ENA is implementing a modernisation of the underlying ENA checklist system architecture.

---------------------------------------------------
Why Did We Move to Versioned Sample Checklists?
---------------------------------------------------
Versioning allows ENA to update checklists more rapidly (e.g. when a new GSC MIxS release is published) and to use ontologies for terms.

In practice, every checklist now has a version, and you will need to pull down the latest one after submission.
A change that requires a new version can be as simple as a change to the required pattern.

----------------------------------
High Level Infrastructural Changes
----------------------------------
It makes the system easier to maintain.

--------------------------------------------------------
Technical Endpoints to Computationally Access Checklists
--------------------------------------------------------

The following endpoints can be used (TO BE UPDATED TO PROD instances):

- Paginate over all versioned schemas.
- Schema and metadata (the JSON Schema is embedded here).
- Just a summary of the schema.
- Paginate over the latest schemas (same as the first endpoint, with the ``latest=true`` query parameter).
- Get the JSON Schema of the latest version of a given checklist (e.g. ERC000022).
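
The ``latest=true`` query parameter can be added to a request URL programmatically. The following is a minimal sketch using only the Python standard library. The URL is a placeholder (the production endpoints are not yet listed here), and ``with_latest`` is a hypothetical helper name rather than part of any ENA client.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_latest(url: str) -> str:
    """Return the URL with the 'latest=true' query parameter set."""
    parts = urlsplit(url)
    # Keep existing query parameters (e.g. paging) and set or overwrite 'latest'.
    query = dict(parse_qsl(parts.query))
    query["latest"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only; substitute the real endpoint.
print(with_latest("https://example.org/schemas?page=0"))
```

Building the query with ``urllib.parse`` rather than string concatenation avoids duplicate parameters and handles URLs that already carry a query string.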