Downstream use of records #467

@conorfitch

I’ve been reading through some of the issues here and wanted to confirm my understanding of the intended design philosophy for the OSV schema. Do the following points accurately describe the approach taken?

  • One record should represent one data source.

    • e.g. Alpine and Debian were recently decoupled from the CVE record into their own records (see google/osv.dev#2465, “Improve CVE entry management in OSV”).
    • However, there are counterexamples such as the OpenSSF malicious packages repo, which often merges multiple data sources into a single record. (The versions from each source are preserved under database_specific, and each source gets its own newline-separated subtitle in description.) That said, this repo could simply be viewed as a single data source in its own right.
  • There should be no potentially redundant fields.

    • e.g. a numerical CVSS field isn’t present because the score can be derived from the vector. Another example that was discussed is whether a “severity source” is needed (#248, “Proposal: add new severity[].source field”): because a record represents a single source, the severity source can be inferred from the id.

      It was mentioned, though, that such a field would still be useful in cases where a record contains third-party severity sources.

A discussion about EPSS (#144) brought up the possibility of downstream consumers merging the scores into existing records.

I imagine there are other metrics (e.g. other severity scores) that a downstream consumer might hold themselves. So, what is the best thing for them to do if they have supplementary data they want to incorporate?

  • Should they follow the philosophy that a record only contains data from a single data source, and therefore create a new record for their own data (linked as an alias, or as a "downstream" that specifies the upstream record)?
  • Or is it ok for them to merge it into existing records? Or even to combine existing records under a new id, e.g. to make some data easier to query?
    • EPSS: could they add it to the database_specific field in the top-level CVE record? Or is that field for NVD data only?
    • Other severity scores: as mentioned above, a “source” field doesn’t currently exist, so maybe one would be useful for this case too.
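
To make the two options concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under assumptions, not something taken from the spec: the record ID, the `EXAMPLE-0001` downstream ID, the `example_consumer` namespace key, the EPSS numbers, and both helper functions are made up, and I am assuming the `upstream` field is the right way to express the "downstream" link.

```python
import copy

# Hypothetical upstream record, trimmed to the fields relevant here.
# The ID and timestamps are placeholders, not real data.
upstream_record = {
    "id": "CVE-0000-0000",
    "modified": "2025-01-01T00:00:00Z",
    "severity": [
        {
            "type": "CVSS_V3",
            "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
        }
    ],
}

# Made-up supplementary data held by a downstream consumer.
epss = {"score": 0.42, "percentile": 0.9}


def make_downstream_record(upstream, new_id, modified, extra):
    """Option 1: keep one source per record.

    Build a separate record that only points at the upstream one and
    carries the consumer's own data under database_specific.
    """
    return {
        "id": new_id,
        "modified": modified,
        "upstream": [upstream["id"]],  # assumed linking field
        "database_specific": dict(extra),
    }


def merge_into_record(upstream, modified, namespace, extra):
    """Option 2: merge into a copy of the existing record.

    The extra data goes under a namespaced key in database_specific so
    it stays distinguishable from whatever the original source put there.
    """
    merged = copy.deepcopy(upstream)  # never mutate the upstream record
    merged["modified"] = modified
    merged.setdefault("database_specific", {})[namespace] = dict(extra)
    return merged


downstream = make_downstream_record(
    upstream_record, "EXAMPLE-0001", "2025-01-02T00:00:00Z", {"epss": epss}
)
merged = merge_into_record(
    upstream_record, "2025-01-02T00:00:00Z", "example_consumer", {"epss": epss}
)
```

In the first option the upstream record stays untouched and the provenance of the EPSS data is clear from the new id; in the second, everything is queryable from one record, but the consumer's data now sits alongside the original source's fields.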

Thanks!
