Downstream use of records

I’ve been reading through some of the issues here and wanted to confirm my understanding of the intended design philosophy for the OSV schema. Does the following accurately describe the approach taken?

- **One record should represent one data source.**
    - e.g. Alpine and Debian were recently decoupled from the CVE record into their own records: https://github.com/google/osv.dev/issues/2465
    - However, there are counterexamples such as the OpenSSF malicious packages repo, which often merges multiple data sources into a single record. (The versions from each source are preserved under `database_specific`, and each source gets its own subtitle on a new line in `description`.) That said, this repo could itself be viewed as a single data source.

- **There should be no potentially redundant fields.**
    - e.g. a numerical CVSS field isn’t present because it can be inferred from the vector. Another example that was discussed is whether a “severity source” field is needed (https://github.com/ossf/osv-schema/issues/248): since a record comes from a single source, the severity source can be inferred from the id.
        
        However, it was mentioned that such a field would still be useful when a record contains third-party severity sources, e.g.:
        
        - A severity in a CVE record may be [a newer CVSS version that is not NVD’s](https://github.com/google/osv.dev/blob/ff13718c6bcd1e67cba5c29768c4945d643078ba/vulnfeeds/vulns/vulns.go#L324) own.
        - The conversion for Debian [adds severity from NVD](https://github.com/google/osv.dev/blob/ff13718c6bcd1e67cba5c29768c4945d643078ba/vulnfeeds/cmd/debian/main.go#L126) (which seems to go against the idea that a single database is the source for an entire record).
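
To illustrate the point about the numerical score being derivable from the vector, here is a minimal sketch of the CVSS v3.1 base score calculation as I understand it from the FIRST specification. The function names (`roundup`, `base_score`) are my own, it only handles v3.1 base metrics, and it is not taken from osv.dev or any OSV tooling.

```python
import math

# CVSS v3.1 base metric weights (FIRST specification).
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": CIA,
    "I": CIA,
    "A": CIA,
}
# Privileges Required depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, using the integer-based v3.1 rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("this sketch only handles CVSS:3.1 vectors")
    m = dict(part.split(":", 1) for part in parts)

    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]] * pr * WEIGHTS["UI"][m["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

A consumer that wants a sortable number can compute it like this on read, which is presumably why storing it alongside the vector would be redundant.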

A discussion about EPSS (https://github.com/ossf/osv-schema/issues/144) brought up the possibility of downstream consumers merging the scores into existing records.

I imagine there are other metrics (e.g. other severity scores) that a downstream consumer might have themselves. So, what’s best for them to do if they have supplementary data they want to incorporate?

- Should they follow the philosophy of a record *only* containing data from a single data source, and therefore create a new record for their own data (linked via `aliases`, or marked as downstream by listing the original record in `upstream`)?
- Or is it ok for them to merge it into existing records? Or even combine existing ones together under a new id e.g. to make some data easier to query?
    - EPSS - could they add it into the `database_specific` field in the top-level CVE record? Or is this for NVD data only?
    - Other severity scores - as mentioned above, a “source” field doesn’t currently exist, so maybe such a field would also be useful for this case.
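
To make the two options concrete, here is a rough sketch of what each could look like for a consumer working with records as plain dicts. Everything in it is an assumption for illustration: the helper names, the `EXAMPLE-` id prefix, the placeholder CVE id, the `epss` key, and the layout under `database_specific` are invented, and the EPSS numbers are made up. Only `id`, `modified`, `upstream`, and `database_specific` are existing schema fields.

```python
import copy
from datetime import datetime, timezone


def now_rfc3339():
    """Current UTC time in the RFC 3339 form used by `modified`."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def make_downstream_record(new_id, upstream_id, extra):
    """Option 1: a separate record that only holds the consumer's own data."""
    return {
        "id": new_id,
        "modified": now_rfc3339(),
        "upstream": [upstream_id],  # points back at the original record
        "database_specific": copy.deepcopy(extra),
    }


def merge_into_record(record, namespace, extra):
    """Option 2: copy the record and add the data under one namespaced key."""
    merged = copy.deepcopy(record)  # leave the caller's record untouched
    merged.setdefault("database_specific", {})[namespace] = copy.deepcopy(extra)
    return merged


# Hypothetical inputs: placeholder id and invented EPSS values.
cve_record = {"id": "CVE-0000-0000", "modified": "2024-01-01T00:00:00Z"}
epss = {"score": 0.5, "percentile": 0.9}

separate = make_downstream_record("EXAMPLE-0000-0000", cve_record["id"], {"epss": epss})
merged = merge_into_record(cve_record, "epss", epss)
```

With option 1 the original record stays byte-for-byte what the source published, at the cost of a join at query time. With option 2 everything is in one place, but the record no longer reflects a single data source.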

Thanks!

Downstream use of records #467

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Downstream use of records #467

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions