handle virtuoso dump for initial sync #39
Conversation
```js
});
const distribution = await resultDistribution.json();
return new DumpFile(distributionMetaData, distribution.data[0].relationships.subject.data);
const fileResponse = await fetcher(`${GET_FILE_ENDPOINT.replace(':id', distribution.data[0].relationships.subject.data.id)}`, {
```
this is just taking the first distribution found, it might be better to search for a specific one, similar to what is done here:
https://github.com/lblod/app-burgernabije-besluitendatabank/blob/master/scripts/import-dumps/run.rb#L130
this should now be fixed
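A minimal sketch of the suggested fix, assuming a JSON:API-style response with a `format` attribute on each distribution (the helper name and the attribute checked are illustrative, not the PR's actual code):

```javascript
// Hypothetical helper: select a distribution matching a wanted format
// instead of blindly taking the first element of the response.
function findDistribution(distribution, wantedFormat = 'application/n-triples') {
  const match = distribution.data.find(
    (d) => d.attributes && d.attributes.format === wantedFormat
  );
  if (!match) {
    throw new Error(`No distribution found with format ${wantedFormat}`);
  }
  return match;
}
```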
Force-pushed from b48f357 to 7d5a2f2
```js
export const SYNC_FILES_PATH = process.env.DCR_SYNC_FILES_PATH || '/sync/files';
export const DOWNLOAD_FILE_PATH = process.env.DCR_DOWNLOAD_FILE_PATH || '/files/:id/download';
export const GET_FILE_PATH = process.env.DCR_GET_FILE_PATH || '/files/:id';
export const DOWNLOAD_FILE_PATH = process.env.DCR_DOWNLOAD_FILE_PATH || GET_FILE_PATH + '/download';
```
could you clarify why this split was necessary? it seems this would be a breaking change?
we need to fetch metadata about the file, hence the new GET_FILE_PATH env var. I think the two could even be static; probably no app overrides them (they're not documented in the readme). It's only the relative path of the file endpoint. The previous variable was too specific (it pointed to the download endpoint), so we couldn't fetch metadata about the file.
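To make the relationship between the two paths concrete, here is a small sketch using the defaults from the diff above; the `filePath` helper is illustrative, not part of the PR:

```javascript
// Defaults as they appear in the diff: the download path now derives
// from the generic file path, so metadata and contents share a base URL.
const GET_FILE_PATH = process.env.DCR_GET_FILE_PATH || '/files/:id';
const DOWNLOAD_FILE_PATH =
  process.env.DCR_DOWNLOAD_FILE_PATH || GET_FILE_PATH + '/download';

// Hypothetical helper to expand the :id placeholder.
function filePath(template, id) {
  return template.replace(':id', id);
}
```

With this split, `GET /files/:id` serves file metadata while `GET /files/:id/download` serves the contents, which is what the new metadata fetch relies on.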
```js
try {
  await this.download();
  const changeSets = await fs.readJson(this.filePath, { encoding: 'utf-8' });
  let fileStream = fs.createReadStream(this.filePath);
```
I'm always a bit confused about streams and their events in an async method. I assume that the final await json will throw any pipeline errors as they cascade to the final consumer in the stream pipeline. If that's the case this looks fine.
yup, it's a special function from the stream/consumers module provided by Node. I also get confused by the behavior of streams; that's why I used the json helper to convert this to simple async/await (and the error will cascade, indeed)
```json
"homepage": "https://github.com/lblod/delta-consumer#readme",
"dependencies": {
  "@lblod/mu-auth-sudo": "0.6.1",
  "@lblod/mu-auth-sudo": "0.6.2",
```
consider bumping the template to 1.9.1 and just using auth-sudo from the template
bumped to 1.9.1, but didn't switch to the sudo support of the JS template yet, as it doesn't provide the ability to override the SPARQL endpoint or the retry mechanism
nvdk left a comment
Some minor remarks but changes seem sound. I didn't test it yet though
related to lblod/app-decide#8.
This PR should make the delta consumer a bit more resilient when consuming large dumps generated with the graph dump service. It adds the `updateWithRecover` utility function and the `HTTP_MAX_QUERY_SIZE_BYTES` environment variable, which should be used in custom dispatching: it splits the query according to the size of the query itself (I could be wrong, but in my tests a query over 100kb fails), reducing the number of retry attempts. Gzipped delta files and dumps are also now supported.
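The size-based splitting described above could look roughly like the following sketch. This is a hypothetical illustration of the idea, assuming the input is an array of serialized triples; the function name and the exact strategy are not taken from the PR:

```javascript
// Illustrative sketch: group serialized triples into batches so that the
// combined payload of each batch stays under a byte budget, in the spirit
// of the HTTP_MAX_QUERY_SIZE_BYTES variable. A single oversized triple
// still gets its own batch rather than being dropped.
function splitByQuerySize(triples, maxBytes) {
  const batches = [];
  let current = [];
  let currentSize = 0;
  for (const triple of triples) {
    const size = Buffer.byteLength(triple, 'utf-8');
    if (current.length > 0 && currentSize + size > maxBytes) {
      batches.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(triple);
    currentSize += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Measuring by `Buffer.byteLength` rather than string length matters here, since multi-byte UTF-8 characters in literals would otherwise make a batch exceed the endpoint's limit.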