Description
I’ve been reading through some of the issues here and wanted to confirm my understanding of the intended design philosophy for the OSV schema. Do the following points accurately describe the approach taken?
- One record should represent one data source.
  - e.g. Alpine and Debian were recently decoupled from the CVE record into their own records: "Improve CVE entry management in OSV" (google/osv.dev#2465).
  - However, there are examples like the OpenSSF malicious packages repo, which often merges multiple data sources into a single record. (The versions from each source are preserved under `database_specific`, and each source has a newline and subtitle in `description`.) That said, this repo could itself be viewed as a single data source.
- There should be no potentially redundant fields.
  - e.g. a numerical CVSS field isn’t present because it can be inferred from the vector. Another example that was discussed is whether a “severity source” is needed ("Proposal: add new severity[].source field", #248) because, as a record is for a single source, the severity source can be inferred from the `id`.

    However, it was mentioned that this would still be useful for cases where a record contains third-party severity sources, e.g.
    - A severity in a CVE record may be a newer CVSS version that is not NVD’s own.
    - The conversion for Debian adds severity from NVD (which seems not to follow the idea that only one database is the source for an entire record).
- A discussion about EPSS (#144) brought up the possibility of downstream consumers merging the scores into existing records.

  I imagine there are other metrics (e.g. other severity scores) that a downstream consumer might have themselves. So, what’s best for them to do if they have supplementary data they want to incorporate?
  - Should they follow the philosophy of a record only containing data from a single data source, and therefore create a new record for their own data (linked as an `alias`, or as a "downstream" by specifying the `upstream` record)?
  - Or is it ok for them to merge it into existing records? Or even combine existing ones under a new id, e.g. to make some data easier to query?
    - EPSS: could they add it to the `database_specific` field in the top-level CVE record? Or is this for NVD data only?
    - Other severity scores: as mentioned above, a “source” field doesn’t currently exist. So maybe it would also be useful for this.
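To make the two options concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions: the `EXAMPLE-` id prefix, the `epss` key inside `database_specific`, and the EPSS numbers are all made up, and the records omit fields a real OSV record would need (such as `modified`).

```python
import copy


def as_downstream_record(upstream_id: str, new_id: str, epss: dict) -> dict:
    """Option 1: keep the consumer's data in its own record that points
    at the original via `upstream`."""
    return {
        "id": new_id,                         # hypothetical consumer-owned id
        "upstream": [upstream_id],            # link back to the source record
        "database_specific": {"epss": epss},  # hypothetical key name
    }


def merged_into_existing(record: dict, epss: dict) -> dict:
    """Option 2: return a copy of the existing record with the extra data
    added under `database_specific` (the input is left unchanged)."""
    merged = copy.deepcopy(record)
    merged.setdefault("database_specific", {})["epss"] = epss
    return merged


# Placeholder data, not real values.
cve_record = {"id": "CVE-0000-0000", "database_specific": {"existing": True}}
epss = {"score": 0.5, "percentile": 0.9}

downstream = as_downstream_record(cve_record["id"], "EXAMPLE-0000-0000", epss)
merged = merged_into_existing(cve_record, epss)
```

With option 1 the CVE record stays single-source and the consumer's data is clearly attributed by the new `id`. With option 2 everything is in one place, but nothing in the record says where the `epss` entry came from.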
Thanks!