Feature description
When building a custom viewer or handling documents with a large number of pages, it is often necessary to retrieve the dimensions and rotation of all pages upfront (e.g., to calculate the total scroll height or render a skeleton layout).
Currently, the only way to achieve this is by calling doc.getPage(i) for every page index. This requires N separate round-trips between the main thread and the PDF worker. For documents with hundreds of pages, the cumulative overhead of postMessage communication becomes a significant performance bottleneck: it often takes several seconds just to retrieve the basic metadata (view, rotate, userUnit) that the UI needs to lay out the document placeholders.
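For context, the per-page workaround described above might look like the following sketch. It assumes `doc` behaves like a `PDFDocumentProxy` (exposing `numPages` and a 1-based `getPage(pageNumber)`), and that each page exposes `view`, `rotate`, and `userUnit`. The helper name `collectPagesInfoOneByOne` is hypothetical and only used for illustration.

```javascript
// Hypothetical helper: gathers layout metadata with one getPage() call per page.
// `doc` is assumed to expose `numPages` and `getPage(pageNumber)` (1-based),
// as PDFDocumentProxy does.
async function collectPagesInfoOneByOne(doc) {
  const infos = [];
  for (let pageNumber = 1; pageNumber <= doc.numPages; pageNumber++) {
    // Each call is a separate main-thread <-> worker round-trip.
    const page = await doc.getPage(pageNumber);
    infos.push({
      view: page.view,
      rotate: page.rotate,
      userUnit: page.userUnit,
    });
  }
  return infos;
}
```

The loop issues one request per page, so the number of messages grows linearly with the page count.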
I propose adding a new method, getPagesInfo(), to the PDFDocumentProxy API. This method would send a single message to the worker (GetPagesInfo), which iterates the page tree on the worker side and returns the layout data for all pages in a single response.
This replaces N message round-trips with a single one, reducing the operation time from seconds to milliseconds for large documents.
I have implemented and tested this locally with the following changes:
- src/core/worker.js (or pdf.worker.mjs build): Added a GetPagesInfo handler that resolves view, rotate, and userUnit for all pages efficiently.
handler.on("GetPagesInfo", function (data) {
  return pdfManager.ensureDoc("numPages").then(function (numPages) {
    const promises = [];
    // Page indices are 0-based on the worker side.
    for (let i = 0; i < numPages; i++) {
      promises.push(
        pdfManager.getPage(i).then(function (page) {
          // Resolve only the properties needed for layout.
          return Promise.all([
            pdfManager.ensure(page, "view"),
            pdfManager.ensure(page, "rotate"),
            pdfManager.ensure(page, "userUnit"),
          ]).then(function ([view, rotate, userUnit]) {
            return { view, rotate, userUnit };
          });
        })
      );
    }
    // One response containing the layout data for every page, in page order.
    return Promise.all(promises);
  });
});
// In PDFDocumentProxy
getPagesInfo() {
  return this._transport.getPagesInfo();
}

// In WorkerTransport
getPagesInfo() {
  return this.messageHandler.sendWithPromise("GetPagesInfo", null);
}
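To illustrate how a viewer might consume this data, here is a minimal sketch that turns an array of `{ view, rotate, userUnit }` entries into placeholder boxes and a total scroll height. The helper name `computeSkeletonLayout` and its `scale` and `gap` parameters are hypothetical. It also assumes that `view` is `[x1, y1, x2, y2]`, that `rotate` is a multiple of 90, and that `userUnit` simply multiplies the page size. A real viewer may prefer to derive sizes from its own viewport logic.

```javascript
// Hypothetical helper: turns per-page metadata into placeholder boxes.
// Each entry is assumed to be { view: [x1, y1, x2, y2], rotate, userUnit }.
function computeSkeletonLayout(pagesInfo, scale = 1, gap = 10) {
  const pages = [];
  let offsetTop = 0;
  for (const { view, rotate, userUnit } of pagesInfo) {
    const [x1, y1, x2, y2] = view;
    // Assumption: userUnit scales the page dimensions uniformly.
    let width = (x2 - x1) * userUnit * scale;
    let height = (y2 - y1) * userUnit * scale;
    // A 90 or 270 degree rotation swaps width and height.
    if (rotate % 180 !== 0) {
      [width, height] = [height, width];
    }
    pages.push({ top: offsetTop, width, height });
    offsetTop += height + gap;
  }
  // No trailing gap after the last page.
  const totalHeight = pages.length > 0 ? offsetTop - gap : 0;
  return { pages, totalHeight };
}
```

With the proposed method in place, a viewer could call something like `computeSkeletonLayout(await doc.getPagesInfo())` once after loading the document and size its placeholders from the result.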
This simple addition drastically improves the user experience for custom viewers dealing with long documents, allowing for instant "skeleton" rendering without blocking the UI or waiting for N asynchronous getPage calls.