Now that we have basic functionality working, it would be good to optimize it to reduce disk space consumption and download volume.
Please see the full context in my comments on #4956. Relevant excerpts below for convenience.
(one)
W.r.t. my previous comment about disk space/inodes consumed - it would be better to do this unpacking and CVE import per-file (i.e., per-ecosystem), and delete both the ZIP (as you do here already) and the unpacked files (as I propose for the process_data_from_disk()) as soon as a given ecosystem is done. That way you won't have more than one ZIP/set of JSONs on disk at any given time, decreasing the disk space usage significantly.
Originally posted by @alex-ter in #4956 (comment)
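The per-ecosystem flow described above could look roughly like the sketch below. `process_data_from_disk()` is the existing import step; `import_one_ecosystem` and the exact cleanup strategy are illustrative names, not the actual implementation. The point is that the ZIP is deleted as soon as it is unpacked, and the unpacked JSONs are deleted as soon as the ecosystem is imported, so at most one ecosystem's worth of data sits on disk at any time:

```python
import shutil
import zipfile
from pathlib import Path
from typing import Callable

def import_one_ecosystem(zip_path: Path, process: Callable[[Path], None]) -> None:
    """Unpack one ecosystem ZIP, run the import step, then clean up.

    `process` would be process_data_from_disk() in the real code.
    """
    unpack_dir = zip_path.with_suffix("")  # e.g. PyPI.zip -> PyPI/
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(unpack_dir)
    zip_path.unlink()              # the ZIP is no longer needed once unpacked
    try:
        process(unpack_dir)        # import this ecosystem's CVEs
    finally:
        shutil.rmtree(unpack_dir)  # free the unpacked JSONs before the next ecosystem
```

Looping `import_one_ecosystem` over the ecosystems (downloading each ZIP just before its iteration) keeps peak disk usage at one ZIP plus one unpacked tree, instead of all of them at once.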
(two)
Aside from the other comment about correctness, this will fetch full ecosystem files every time, which AFAICS is about a gigabyte of data (judging by the undocumented all.zip at the root of the hierarchy, I presume it has all of them). It will then unpack and process all of them as well, which is quite inefficient.
EDIT: while testing, the unpacked size is 7.1 GB and there are about 600K files in the cache. [...]
Originally posted by @alex-ter in #4956 (comment)
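For the repeated full downloads in excerpt (two), one option is conditional HTTP requests, so an ecosystem archive is only re-fetched when it has actually changed. This is a sketch under assumptions: `fetch_if_changed` and the ETag cache file are hypothetical names, and it presumes the server returns `ETag`/`304 Not Modified` (a `Last-Modified`/`If-Modified-Since` pair would work the same way):

```python
import urllib.error
import urllib.request
from pathlib import Path

def fetch_if_changed(url: str, dest: Path, etag_file: Path) -> bool:
    """Download `url` to `dest` only when its ETag differs from the cached one.

    Returns True when a fresh copy was written, False on 304 Not Modified.
    """
    request = urllib.request.Request(url)
    if etag_file.exists():
        # Ask the server to skip the body if nothing changed since last time.
        request.add_header("If-None-Match", etag_file.read_text())
    try:
        with urllib.request.urlopen(request) as resp:
            dest.write_bytes(resp.read())
            new_etag = resp.headers.get("ETag")
            if new_etag:
                etag_file.write_text(new_etag)
            return True
    except urllib.error.HTTPError as err:
        if err.code == 304:  # not modified; keep the cached archive
            return False
        raise
```

Combined with the per-ecosystem processing above, an unchanged ecosystem costs one conditional request instead of a full archive download and unpack.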