Decide whether we should include certain node properties as "optional desirable"

In the current aggregation function, we integrate the following node properties:

```python
from typing import List

import pyspark.sql as ps
from pyspark.sql import functions as F


def union_and_deduplicate_nodes(retrieve_most_specific_category: bool, *nodes, cols: List[str]) -> ps.DataFrame:
    """Function to unify nodes datasets."""
    # fmt: off
    unioned_datasets = (
        _union_datasets(*nodes)
        # first we group the dataset by id to deduplicate
        .groupBy("id")
        .agg(
            F.first("name", ignorenulls=True).alias("name"),
            F.first("category", ignorenulls=True).alias("category"),
            F.first("description", ignorenulls=True).alias("description"),
            F.first("international_resource_identifier", ignorenulls=True).alias("international_resource_identifier"),
            F.flatten(F.collect_set("equivalent_identifiers")).alias("equivalent_identifiers"),
            F.flatten(F.collect_set("all_categories")).alias("all_categories"),
            F.flatten(F.collect_set("labels")).alias("labels"),
            F.flatten(F.collect_set("publications")).alias("publications"),
            F.flatten(F.collect_set("upstream_data_source")).alias("upstream_data_source"),
        )
    )
    # next we need to apply a number of transformations to the nodes to ensure grouping by id did not select wrong information
    # this is especially important if we integrate multiple KGs

    if retrieve_most_specific_category:
        unioned_datasets = unioned_datasets.transform(determine_most_specific_category)

    return unioned_datasets.select(*cols)
    # fmt: on
```

**Used by EC pipeline and already required**

- id
- category

**Used by EC pipeline but not required**

- publications
- description
- name

**Useful but not in Biolink**

- international_resource_identifier
- equivalent_identifiers
- all_categories
- labels (should be `synonyms`)
- upstream_data_source

## Action items

- [ ] Determine whether the properties not in Biolink should be added or already have a corresponding correct attribute.
- [ ] Decide which ones to make required, if any.
- [ ] Decide how to check "optional columns of interest" in the validator. For example, ROBOKOP has a column `name:string`; if `name` were an "optional desirable" attribute, that column should have been called `name`.
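
To make the last item concrete, here is a minimal sketch of how a validator might treat "optional desirable" columns. It is not the existing validator: the constant names, the `check_columns` function, and the report keys are all hypothetical, and it assumes that a column such as `name:string` is the expected name followed by a `:`-separated suffix. The tiers are copied from the lists above.

```python
from typing import Dict, Iterable, List

# Tiers taken from the lists in this issue; the constant names are hypothetical.
REQUIRED_COLUMNS = ["id", "category"]
OPTIONAL_DESIRABLE_COLUMNS = ["publications", "description", "name"]


def _base_name(column: str) -> str:
    """Drop anything after the first ':' (assumed suffix), e.g. 'name:string' -> 'name'."""
    return column.split(":", 1)[0]


def check_columns(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Classify expected columns as missing or likely misnamed."""
    columns = list(columns)
    present = set(columns)

    # Map each base name to the first suffixed column that carries it.
    suffixed: Dict[str, str] = {}
    for column in columns:
        if ":" in column:
            suffixed.setdefault(_base_name(column), column)

    report: Dict[str, List[str]] = {
        "missing_required": [],
        "missing_optional": [],
        "misnamed": [],
    }

    # Required columns must be present under their exact name.
    for column in REQUIRED_COLUMNS:
        if column not in present:
            report["missing_required"].append(column)

    # Optional desirable columns only produce hints, never hard failures.
    for column in OPTIONAL_DESIRABLE_COLUMNS:
        if column in present:
            continue
        if column in suffixed:
            report["misnamed"].append(f"{suffixed[column]} -> {column}")
        else:
            report["missing_optional"].append(column)

    return report


report = check_columns(["id", "category", "name:string", "description"])
```

With this input, `name:string` is reported as a probable misnaming of `name` instead of being silently ignored, and `publications` is listed as a missing optional column. Whether such findings should be warnings or errors is exactly what this issue needs to decide.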
