handle virtuoso dump for initial sync by nbittich · Pull Request #39 · lblod/delta-consumer

nbittich · 2025-10-08T14:18:06Z

This PR should make the delta consumer a bit more resilient when consuming large dumps generated with the Graph dump service. It adds the updateWithRecover utility function and the HTTP_MAX_QUERY_SIZE_BYTES environment variable, which should be used in custom dispatching, as it splits each query according to its size in bytes (I could be wrong, but in my tests a query over 100 kB fails), reducing the number of retry attempts. Gzipped delta files and dumps are now also supported.
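To illustrate the size-based splitting described above, here is a minimal sketch, not the PR's actual implementation. The helper names buildInsertQuery and splitStatementsBySize are hypothetical, and the 100000-byte fallback is an assumption based only on the author's observation that queries over roughly 100 kB failed in testing.

```javascript
// Assumption: the limit is read from HTTP_MAX_QUERY_SIZE_BYTES; the fallback
// value is illustrative and not taken from the PR.
const MAX_QUERY_SIZE_BYTES = Number(process.env.HTTP_MAX_QUERY_SIZE_BYTES) || 100000;

const SEPARATOR = '\n    ';

// Hypothetical helper: wrap pre-serialized triples in a SPARQL INSERT DATA query.
function buildInsertQuery(graph, statements) {
  return `INSERT DATA {\n  GRAPH <${graph}> {\n    ${statements.join(SEPARATOR)}\n  }\n}`;
}

// Hypothetical helper: group statements so that each generated query stays
// under maxBytes. A single statement larger than the limit still gets its
// own batch, since it cannot be split further here.
function splitStatementsBySize(graph, statements, maxBytes = MAX_QUERY_SIZE_BYTES) {
  const overhead = Buffer.byteLength(buildInsertQuery(graph, []), 'utf8');
  const separatorBytes = Buffer.byteLength(SEPARATOR, 'utf8');
  const batches = [];
  let current = [];
  let currentBytes = overhead;

  for (const statement of statements) {
    // Counting one separator per statement slightly overestimates the size,
    // which keeps the estimate on the safe side of the limit.
    const statementBytes = Buffer.byteLength(statement, 'utf8') + separatorBytes;
    if (current.length > 0 && currentBytes + statementBytes > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = overhead;
    }
    current.push(statement);
    currentBytes += statementBytes;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Measuring bytes rather than counting triples matters because literal values vary widely in length, so a fixed triple count per batch can still produce an oversized query.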

nvdk · 2025-10-08T14:35:22Z

      });
      const distribution = await resultDistribution.json();
-      return new DumpFile(distributionMetaData, distribution.data[0].relationships.subject.data);
+      const fileResponse = await fetcher(`${GET_FILE_ENDPOINT.replace(':id', distribution.data[0].relationships.subject.data.id)}`, {


This just takes the first distribution found; it might be better to search for a specific one, similar to what is done here:
https://github.com/lblod/app-burgernabije-besluitendatabank/blob/master/scripts/import-dumps/run.rb#L130

nbittich · 2025-10-21

This should now be fixed.
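As a rough sketch of the suggestion above (selecting a specific distribution instead of data[0]), something along these lines could work. The selectDistribution helper and the attributes.format field are hypothetical; only the relationships.subject.data shape comes from the diff, and the actual fix in the PR may use different criteria.

```javascript
// Hypothetical helper: pick the first distribution that satisfies a
// caller-supplied predicate, and fail loudly instead of silently falling
// back to whatever happens to be first in the response.
function selectDistribution(distributions, matches) {
  const found = distributions.find(matches);
  if (!found) {
    throw new Error('No matching distribution found in the response');
  }
  return found;
}

// Example response in the JSON:API shape used in the diff. The
// attributes.format field is an assumption made for illustration.
const exampleResponse = {
  data: [
    {
      attributes: { format: 'text/html' },
      relationships: { subject: { data: { id: 'file-1', type: 'files' } } },
    },
    {
      attributes: { format: 'text/turtle' },
      relationships: { subject: { data: { id: 'file-2', type: 'files' } } },
    },
  ],
};

const turtleDistribution = selectDistribution(
  exampleResponse.data,
  (d) => d.attributes.format === 'text/turtle'
);
const fileId = turtleDistribution.relationships.subject.data.id;
```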

nvdk · 2025-10-20T08:46:02Z

 export const SYNC_FILES_PATH = process.env.DCR_SYNC_FILES_PATH || '/sync/files';
-export const DOWNLOAD_FILE_PATH = process.env.DCR_DOWNLOAD_FILE_PATH || '/files/:id/download';
+export const GET_FILE_PATH = process.env.DCR_GET_FILE_PATH || '/files/:id';
+export const DOWNLOAD_FILE_PATH = process.env.DCR_DOWNLOAD_FILE_PATH || GET_FILE_PATH + '/download';


Could you clarify why this split was necessary? It seems this would be a breaking change.

nbittich · 2025-10-21

We need to fetch metadata about the file, hence the new GET_FILE_PATH env var. I think the two could even be static, since probably no app overrides them (they are not documented in the README). It is only the relative path of the file endpoint; the previous one was too specific (it pointed to the download endpoint), so we couldn't fetch metadata about the file.
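The fallback chain in the diff can be sketched as a small function to show how the two variables interact. The resolveFilePaths helper is hypothetical; the environment variable names and default values are taken from the diff above.

```javascript
// Hypothetical helper mirroring the defaults in the diff: the download path
// is derived from the metadata path unless it is overridden explicitly.
function resolveFilePaths(env = process.env) {
  const getFilePath = env.DCR_GET_FILE_PATH || '/files/:id';
  const downloadFilePath = env.DCR_DOWNLOAD_FILE_PATH || getFilePath + '/download';
  return { getFilePath, downloadFilePath };
}

// With nothing set, the download path matches the previous default.
const defaults = resolveFilePaths({});

// An app that already overrides DCR_DOWNLOAD_FILE_PATH keeps its value.
const overridden = resolveFilePaths({ DCR_DOWNLOAD_FILE_PATH: '/custom/:id/dl' });
```

Under these defaults, a deployment that sets neither variable, or only DCR_DOWNLOAD_FILE_PATH, resolves the same download path as before. The metadata path does not follow an overridden download path, though, so a producer serving files from a non-default location would also need DCR_GET_FILE_PATH.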

nvdk · 2025-10-20T08:58:52Z

    try {
      await this.download();
-      const changeSets = await fs.readJson(this.filePath, { encoding: 'utf-8' });
+      let fileStream = fs.createReadStream(this.filePath);


I'm always a bit confused about streams and their events in an async method. I assume that the final await json will throw any pipeline errors as they cascade to the final consumer in the stream pipeline. If that's the case this looks fine.

nbittich · 2025-10-21

Yup, it's a helper from the stream/consumers module provided by Node. I also get confused by the behavior of streams, which is why I used the json helper to turn this into a simple async/await call (and the error will indeed cascade).
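For readers unfamiliar with the pattern discussed above, here is a minimal sketch of reading a possibly gzipped JSON file with the json helper from node:stream/consumers. The readJsonFile function is hypothetical, and detecting gzip by the .gz file extension is an assumption; the PR may detect compression differently.

```javascript
const fs = require('node:fs');
const zlib = require('node:zlib');
const { pipeline } = require('node:stream');
const { json } = require('node:stream/consumers');

// Hypothetical helper: parse a JSON file, transparently gunzipping it when
// the file name ends in .gz (an assumption made for this sketch).
async function readJsonFile(filePath) {
  const fileStream = fs.createReadStream(filePath);
  if (!filePath.endsWith('.gz')) {
    // json() consumes the whole stream and rejects if the stream errors.
    return json(fileStream);
  }
  const gunzip = zlib.createGunzip();
  // pipeline() destroys the remaining streams with the error when one of
  // them fails, so a read error on fileStream surfaces on gunzip and makes
  // the json() promise below reject. The callback is intentionally empty
  // because the error is handled through that rejection.
  pipeline(fileStream, gunzip, () => {});
  return json(gunzip);
}
```

A caller can then wrap `await readJsonFile(path)` in an ordinary try/catch, which is the simplification the reply describes.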

nvdk · 2025-10-20T09:02:27Z

  "homepage": "https://github.com/lblod/delta-consumer#readme",
  "dependencies": {
-    "@lblod/mu-auth-sudo": "0.6.1",
+    "@lblod/mu-auth-sudo": "0.6.2",


consider bumping the template to 1.9.1 and just using auth-sudo from the template

nbittich · 2025-10-21

Bumped to 1.9.1, but I didn't switch to the sudo support of the JS template yet, as it doesn't provide the ability to override the SPARQL endpoint and the retry mechanism.

nvdk

Some minor remarks but changes seem sound. I didn't test it yet though

nbittich requested review from cecemel, claire-lovisa and nvdk October 8, 2025 14:18

nvdk reviewed Oct 8, 2025

nbittich marked this pull request as ready for review October 20, 2025 08:38

gzip support, smarter update with recover

7d5a2f2

nbittich force-pushed the feature/initial-sync-with-virtuoso-dump branch from b48f357 to 7d5a2f2 Compare October 20, 2025 08:43

nbittich requested a review from nvdk October 20, 2025 08:44

nvdk reviewed Oct 20, 2025

nvdk approved these changes Oct 20, 2025

nbittich and others added 14 commits October 21, 2025 15:12

bump js template

9ef76f3

allow mu auth sudo by default

25d1a3c

extra changes

22a2d23

trigger build

da81c31

put the config folder back as it's necessary for custom dispatch

28c9153

export everything from config

9b85019

export helpers insertIntoGraph and deleteFromGraph

297ac79

extraheaders is optional

d619528

use a more realistic example

7a5c0e7

regression: some producers don't expose file endpoint

8d15dc4

fix regression

2c92dfd

fix incorrect distribution selection

59bc1a8

Merge branch 'master' into feature/initial-sync-with-virtuoso-dump

f9b7025

make new helpers available on finish ingest callback & delta sync

af35349

nbittich merged commit 7db7e99 into master Jan 12, 2026
1 check passed

nbittich deleted the feature/initial-sync-with-virtuoso-dump branch January 12, 2026 10:51

nbittich restored the feature/initial-sync-with-virtuoso-dump branch January 22, 2026 09:34

nbittich added a commit that referenced this pull request Jan 22, 2026

Revert "handle virtuoso dump for initial sync (#39)"

c7c29cc

This reverts commit 7db7e99.

nbittich added a commit that referenced this pull request Jan 22, 2026

Revert "handle virtuoso dump for initial sync (#39)"

4e60611

This reverts commit 7db7e99.

nbittich merged 15 commits into master from feature/initial-sync-with-virtuoso-dump

2 participants
