We are very interested in keeping our wfcatalog in sync with the files present in the archive. In particular, we were looking into removing and updating documents after files are removed, which occasionally happens due to data curation. So we were pleasantly surprised to see that a delete operation was recently added to the `WFCollector`.
However, after auditing the code, I suspect that the logic of these operations might be flawed and would not ensure consistency between the waveform archive and the wfcatalog. I might be wrong, and this may just be my lack of understanding of the details.
In particular, the `delete` operation does not seem to update all potentially affected documents:
- `fileID` only returns one document;
  https://github.com/EIDA/wfcatalog/blob/master/collector/WFCatalogCollector.py#L226
  https://github.com/EIDA/wfcatalog/blob/master/collector/WFCatalogCollector.py#L1246-L1251
  https://github.com/EIDA/wfcatalog/blob/master/collector/WFCatalogCollector.py#L229
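
To illustrate the concern, here is a minimal sketch of a lookup that returns every document depending on a removed file rather than a single one. It assumes each catalog document records the files it was computed from; the `files` field, the `day` field, the file names, and the in-memory list are all hypothetical and are not taken from the collector's actual schema.

```python
def documents_referencing(documents, filename):
    """Return every document whose (hypothetical) 'files' list names `filename`."""
    return [doc for doc in documents if filename in doc.get("files", [])]


# Hypothetical catalog: each daily document lists its own file plus the
# preceding day's file, mirroring the dependency described in this issue.
catalog = [
    {"day": "2020-01-01", "files": ["day0.mseed", "day1.mseed"]},
    {"day": "2020-01-02", "files": ["day1.mseed", "day2.mseed"]},
    {"day": "2020-01-03", "files": ["day2.mseed", "day3.mseed"]},
]

# Removing day1.mseed makes two documents stale, not just one.
stale = documents_referencing(catalog, "day1.mseed")
print([doc["day"] for doc in stale])  # ['2020-01-01', '2020-01-02']
```

If the real documents carry a comparable list of source files, a delete could remove or reprocess all matches instead of only the first.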
For the `update` operation the effect seems to be somewhat minor:
- also in this case, only the document for the nominal day is found and touched;
  https://github.com/EIDA/wfcatalog/blob/master/collector/WFCatalogCollector.py#L510
- the document for the nominal day is updated and includes data from the preceding waveform file, so it remains consistent with the waveform archive;
- however, the document for the day after the nominal day does not seem to be updated, even though the underlying waveform data might have changed.
- I also doubt that the update should use information from the previous processing. One just needs to identify all documents that require an update and reprocess them without relying on their existing content;
  https://github.com/EIDA/wfcatalog/blob/master/collector/WFCatalogCollector.py#L529
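
As a rough sketch of what "identify all documents for update" could mean, the helper below lists the days whose documents may need reprocessing when one daily file changes. It assumes, as described above, that a daily document is computed from its own day's file plus the preceding day's file; `affected_days` is a hypothetical name and not part of the collector.

```python
from datetime import date, timedelta


def affected_days(nominal_day):
    """Days whose documents may depend on the waveform file of `nominal_day`."""
    # Assumption: a document uses its own file and the preceding day's file,
    # so a changed file affects its own document and the next day's document.
    return [nominal_day, nominal_day + timedelta(days=1)]


# A file change on the last day of the year also touches 1 January.
print(affected_days(date(2020, 12, 31)))
```

Each returned day would then be reprocessed from the waveform files alone, without reusing values stored in the existing document.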
I understand that, especially for high sampling rates, the effect of these details might be minor, but for low sampling rates it can have an important impact.
Am I missing something?