Comments on plan for AGR/MGI in the README #21

@cmungall

I see the README now has details on how Alliance data is to be encoded in DATS. Thanks for adding this.

Are there any example JSON files?

The README says:

AGR/MGI encoding
The preliminary encoding for the MGI mouse reference genome annotation is quite simple

This is a bit confusing. The AGR (preferred name: Alliance) is more than MGI. Is the plan to get data directly from MGI, to get mouse data from the Alliance (which may temporarily be less complete than what is obtained from MGI), or to get data for all species from the Alliance?

I think it should be data for all species; I'm not sure why MGI is highlighted specifically.

The HomoloGene ids and HomoloGene-derived human gene ids in relatedIdentifiers...

Human homologs should be obtained from the Alliance; this will be more accurate than HomoloGene.

Overall comments:

The KC7 products Google doc says that expression data will be captured from the Alliance (or at least from MGI), but the example in the README covers only the basic gene information. The Alliance is also producing gene-to-phenotype data that is of broad interest. How should this be resolved?

It looks like the data model used is a generic one in which arbitrary Dimensions and CategoryValuePairs can be attached to arbitrary molecular entities. I think there are some advantages to such a generic model, but I question whether this is the best way of representing what is in knowledge bases like the Alliance. It feels like an impedance mismatch. In the diagram:

[image: diagram]

This just seems like a slightly awkward way of expressing what can be expressed more accurately in a line of GFF3 or in the Alliance's own native JSON format. It's not clear how well the dimension model will adapt to richer data from the Alliance, e.g. expression or phenotype.
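To make the comparison concrete, here is a rough sketch of the point about GFF3. It parses a single GFF3 line into a typed record and then flattens the same facts into generic category/value pairs. Everything in it is illustrative: the example line, the identifier `EXAMPLE:0001`, and the `category`/`value` keys are assumptions made up for this sketch, not the actual DATS schema, the README encoding, or real Alliance data.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class Gff3Feature:
    """Typed view of the GFF3 columns used in this sketch."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    strand: str
    attributes: dict


def parse_gff3_line(line: str) -> Gff3Feature:
    """Parse one tab-separated GFF3 feature line (9 columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
    attributes = {}
    # Column 9 holds semicolon-separated, URL-escaped key=value pairs.
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = unquote(value)
    return Gff3Feature(seqid, source, ftype, int(start), int(end), strand, attributes)


def to_category_value_pairs(feature: Gff3Feature) -> list:
    """Flatten the record into generic pairs (hypothetical shape, not DATS)."""
    pairs = [
        {"category": "seqid", "value": feature.seqid},
        {"category": "type", "value": feature.type},
        {"category": "start", "value": feature.start},
        {"category": "end", "value": feature.end},
        {"category": "strand", "value": feature.strand},
    ]
    for key, value in feature.attributes.items():
        pairs.append({"category": key, "value": value})
    return pairs


# Made-up example line; the ID and name are placeholders, not real gene data.
EXAMPLE_LINE = (
    "chr1\texample_source\tgene\t1000\t5000\t.\t+\t.\t"
    "ID=EXAMPLE:0001;Name=ExampleGene"
)

feature = parse_gff3_line(EXAMPLE_LINE)
pairs = to_category_value_pairs(feature)
```

In the typed record, the coordinates and strand have fixed meanings and types. In the flattened form, a consumer has to know by convention which categories to expect and how to interpret them, which is the kind of mismatch I am worried about for richer data.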

I propose that we simultaneously evaluate the Biolink Model for knowledge resources such as the Alliance. This would incur additional cost for the full stacks if they want to support both, but it would be interesting to compare.
