Entity and attribute names and formats for sample and diffraction plan shipment/upload #4

@KarlLevik

As a starting point, below is documentation for the CSV format we currently use for this at Diamond.

I imagine we would want to agree on a standard for attribute names as well as a JSON format to replace this.

These are the CSV column names:

oscillationRange,proteinAcronym,proteinName,spaceGroup,sampleBarcode,sampleName,samplePosition,sampleComments,
cell_a,cell_b,cell_c,cell_alpha,cell_beta,cell_gamma,subLocation,loopType,requiredResolution,centringMethod,experimentKind,
radiationSensitivity,energy,userPath,screenAndCollectRecipe,screenAndCollectNValue,sampleGroup

In our actual CSV files, the first line is a header that defines which columns are present and in what order. Each file can therefore have different columns in a different order, as long as every column name is one we know about and the mandatory columns are included.

Here is an example showing only the first three lines of data. Note that empty columns are ignored:

#proposalCode,proposalNumber,visitNumber,shippingName,dewarCode,containerCode,preObsResolution,neededResolution,oscillationRange,proteinAcronym,proteinName,spaceGroup,sampleBarcode,sampleName,samplePosition,sampleComments,cell_a,cell_b,cell_c,cell_alpha,cell_beta,cell_gamma,subLocation,loopType,requiredResolution,centringMethod,experimentKind,radiationSensitivity,energy,userPath,screenAndCollectRecipe,screenAndCollectNValue,sampleGroup
mx,32101,21,mx32101-23,DLS-MX-0079,MEP-005,,,,GPP91,GPP91,,,GPP91-2059-263C10A,1,,,,,,,,,Litho Loop,,,,,,,,,
mx,32101,21,mx32101-23,DLS-MX-0079,MEP-005,,,,GPP91,GPP91,,,GPP91-2059-263C10B,2,,,,,,,,,Litho Loop,,,,,,,,,
mx,32101,21,mx32101-23,DLS-MX-0079,MEP-005,,,,GPP91,GPP91,,,GPP91-2059-263C10C,3,,,,,,,,,Litho Loop,,,,,,,,,
...
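
To make the header handling concrete, here is a minimal Python sketch of how such a file could be read. It is not the code of our uploader: the function name read_samples and the shortened example header are made up for illustration, and it assumes the leading # on the header line can simply be stripped.

```python
import csv
import io


def read_samples(text):
    """Parse CSV text into one dict per data row, dropping empty columns."""
    lines = text.strip().splitlines()
    # Assumption: the leading '#' on the header line is only a marker.
    lines[0] = lines[0].lstrip("#")
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    # The header decides which columns exist and in what order;
    # empty values are ignored, as described above.
    return [{k: v for k, v in row.items() if k and v} for row in reader]


# Shortened header for illustration; real files may carry more columns.
example = """#proposalCode,proposalNumber,shippingName,dewarCode,containerCode,proteinAcronym,sampleName,samplePosition,loopType
mx,32101,mx32101-23,DLS-MX-0079,MEP-005,GPP91,GPP91-2059-263C10A,1,Litho Loop
mx,32101,mx32101-23,DLS-MX-0079,MEP-005,GPP91,GPP91-2059-263C10B,2,
"""
samples = read_samples(example)
```

Because each row becomes a dict keyed by column name, the same structure could later be serialised to JSON without depending on column order.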

I assume many of the attribute/column names are familiar and self-explanatory, but here is some extra info:

  • subLocation is an index referring to a position within a multipin sample.
  • userPath describes one or two levels of folders (folder1/folder2) that will be created inside the visit directory and into which the acquisition system will write diffraction images for the given sample.
  • screenAndCollectRecipe: can be "best", "all" or "none" (or empty). If using "best", then set screenAndCollectNValue to an integer, e.g. 3 if you want data collected on the best 3 samples in the group. The value must be in the range 1 to 5.
  • sampleGroup: should be the name of a new group. If you want an existing group, use the group id.
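
The screenAndCollectRecipe rules could be checked roughly as follows. This is only a sketch under my reading of the rules: check_recipe is a hypothetical helper, and the requirement that "best" needs a sampleGroup is taken from the error messages further down rather than from the bullet itself.

```python
def check_recipe(row):
    """Return a list of problems with the screen-and-collect fields of one row."""
    problems = []
    recipe = row.get("screenAndCollectRecipe") or ""
    if recipe not in ("", "best", "all", "none"):
        problems.append(f"'{recipe}' not a valid screenAndCollectRecipe")
    if recipe == "best":
        try:
            n = int(row.get("screenAndCollectNValue") or "")
        except ValueError:
            problems.append("screenAndCollectNValue is not an integer")
        else:
            if not 1 <= n <= 5:
                problems.append("screenAndCollectNValue must be from 1 to 5")
        # Assumption based on the error messages: 'best' needs a group.
        if not row.get("sampleGroup"):
            problems.append("a sampleGroup is required for recipe 'best'")
    return problems
```

An empty list means the row passed these particular checks; the resolution requirements for "all" and "none" are deliberately left out here.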

The following fields are mandatory:

  • In the first line only: proposalCode, proposalNumber, shippingName
  • In all lines: dewarCode (i.e. dewar name) + containerCode + proteinAcronym + proteinName + sampleName + sampleBarcode
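
As an illustration, the mandatory-field rule could be expressed like this in Python. The names FIRST_ROW_ONLY, EVERY_ROW and missing_mandatory are invented for this sketch, and the field lists are copied as-is from the two bullets above. Note that the example rows earlier leave sampleBarcode empty, so the real uploader may be more lenient than this sketch.

```python
FIRST_ROW_ONLY = ("proposalCode", "proposalNumber", "shippingName")
EVERY_ROW = (
    "dewarCode", "containerCode", "proteinAcronym",
    "proteinName", "sampleName", "sampleBarcode",
)


def missing_mandatory(rows):
    """Return (row_index, field) pairs for every empty mandatory field."""
    missing = []
    for i, row in enumerate(rows):
        # The proposal and shipping fields are only required in the first row.
        required = EVERY_ROW + (FIRST_ROW_ONLY if i == 0 else ())
        missing += [(i, field) for field in required if not row.get(field)]
    return missing
```

Here rows is assumed to be a list of dicts keyed by column name, one per data line.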

Additionally, you can specify flags when you upload the file:

  • --queuecontainer so that the container is queued for Unattended Data Collection (UDC)
  • --highpriority|mediumpriority|lowpriority so that the container is moved in the UDC queue (DLS staff only)
  • --allowanyregcontainer to use any puck, not just the ones associated with a proposal
  • --allowmissingfacilitycode so you don't need to specify a dewar facility code

Validation

If validation fails, the uploader aborts with an error message. If there is only a minor problem, it completes but prints a warning message.

The warning messages are:

Unable to calculate unit cell volume for sample %s with cell params %s.
Unit cell volume must be positive. Got %s for sample %s with cell params %s
Not setting lab contacts for shipment as the csv file owner %s is not a lab contact for proposal %s.
The csv file owner %s is not in the ISPyB database.
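
For the unit cell messages (the two volume warnings above, plus the errors about incomplete parameters and angles of 180 degrees or more), a check along these lines would be one way to do it. This is a sketch, not our implementation: unit_cell_volume and check_cell are hypothetical names, and the volume uses the standard formula for a general (triclinic) cell, V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ).

```python
import math

CELL_KEYS = ("cell_a", "cell_b", "cell_c", "cell_alpha", "cell_beta", "cell_gamma")


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general unit cell; lengths in any unit, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    radicand = 1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg
    if radicand <= 0:
        raise ValueError("unable to calculate unit cell volume")
    return a * b * c * math.sqrt(radicand)


def check_cell(row):
    """Return the cell volume, or None if no cell parameters are given."""
    given = [row.get(k) for k in CELL_KEYS]
    if not any(given):
        return None
    if not all(given):
        raise ValueError("if any unit cell parameter is defined, all must be")
    a, b, c, alpha, beta, gamma = (float(v) for v in given)
    if max(alpha, beta, gamma) >= 180:
        raise ValueError("all unit cell angles must be < 180 degrees")
    return unit_cell_volume(a, b, c, alpha, beta, gamma)
```

In this sketch every problem raises ValueError; the real uploader distinguishes between warnings and errors, as listed in this section.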

The error messages are:

client is required.

inputcsvfile is required.

file %s not found.

The csv file owner %s is not in the ISPyB database.

If either of the unit cell parameters are defined, then all must be defined. Got %s for sample %s

All unit cell angles must be < 180 degrees. Got %s for sample %s

User-defined field list is missing the following mandatory fields: %s

If uploading the csv file from a visit dir, then the visit's proposal (%s) must match that given in the file (%s).

Authorisation failure - the time delta is too large.

The csv file owner %s is not a member of any sessions/visits in the ISPyB database.

If not uploading the csv file from a visit dir, then you must be a member of a session on the proposal you're trying to upload to (%s).

Illegal characters in sampleGroup %s. Legal characters: alpha-numeric, hyphen and underscore.

The sample group ID %d does not exist

The proposalId of sample group ID %d is different from the proposalId of sample %s

There is already a sample group for proposal %s with name %s

screenAndCollectNValue is not an integer - problem with sampleName %s

screenAndCollectRecipe 'none' requires a value for requiredResolution - sampleName %s

For screenAndCollectRecipe 'best' the screenAndCollectNValue must be from 1 to 5 - problem with sampleName %s

For screenAndCollectRecipe 'best' a sampleGroup is required - problem with sampleName %s

screenAndCollectRecipe 'all' requires a value for neededResolution - problem with sampleName %s

'%s' not a valid screenAndCollectRecipe - problem with sampleName %s

Mandatory field %s not filled in. (Only mandatory for first row.) Required format is: %s

Mandatory field %s not filled in. Required format is: %s

Field %s must be max 45 characters long, this value is longer: %s

Illegal characters in sampleName %s. Legal characters: alpha-numeric, hyphen and underscore.

Space group must be at least 2 characters long or be a positive integer: %s

Space group number must be in the range [1, 230]: %s

The dewar code %s is not a registered facility code for proposal %s

The container code %s is not a registered container code

The userPath can be max 100 characters long, this one is longer: %s

The proteins must have been approved - this one isn't: acronym: %s

The proteins must already exist in ISPyB - this one doesn't: acronym: %s

Sample with name %s already exists for protein with acronym %s in this proposal.

Value required for experimentKind when UDC/queueContainer option specified. No value found for sampleName %s

Sample %s in container %s is in an invalid location %s. Valid locations are 1 to 16.

Sample %s in container %s has an invalid non-integer location %s

Sample %s in container %s is in an invalid sub-location %s. Valid locations are 0 to 7.

Sample %s in container %s has location %s, sub-location %s which is already taken.

Project %s does not exist

There are %d occurrences of sample with name %s and protein acronym %s in this CSV file.
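
The location and duplicate checks behind the last few messages could look something like the sketch below. The helper names check_locations and duplicate_samples are hypothetical, the ranges (1 to 16 and 0 to 7) are taken from the message texts above, and treating an empty subLocation as "no sub-location" and an empty samplePosition as a problem are both assumptions on my part.

```python
from collections import Counter


def check_locations(rows):
    """Return problem strings for out-of-range or doubly used sample locations."""
    problems, taken = [], set()
    for row in rows:
        name, container = row.get("sampleName"), row.get("containerCode")
        try:
            position = int(row.get("samplePosition") or "")
            # Assumption: an empty subLocation means "no sub-location".
            sub = int(row["subLocation"]) if row.get("subLocation") else None
        except ValueError:
            problems.append(f"{name}: non-integer location")
            continue
        if not 1 <= position <= 16:
            problems.append(f"{name}: location {position} outside 1 to 16")
        if sub is not None and not 0 <= sub <= 7:
            problems.append(f"{name}: sub-location {sub} outside 0 to 7")
        key = (container, position, sub)
        if key in taken:
            problems.append(f"{name}: location {position} already taken")
        taken.add(key)
    return problems


def duplicate_samples(rows):
    """Return {(sampleName, proteinAcronym): count} for repeated samples."""
    counts = Counter((r.get("sampleName"), r.get("proteinAcronym")) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}
```

Both functions only look within one file; the checks against what already exists in ISPyB would need database access and are not sketched here.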
