Update test examples and document virtualized windowed fetching

glin · glin · commit 703d51aaeffb · 2026-04-13T22:14:46.000-05:00
- Make DuckDB test Rmd chunks self-contained (inline data generation)
- Add Parquet + virtual scrolling test example
- Update virtual scrolling test to use backend API
- Document pre-render jank with virtual + pagination=FALSE
- Add virtualized windowed fetching as future Phase 5
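
To make the Phase 5 item above concrete, here is a minimal sketch of the windowed fetching that this commit's design notes describe: a visible row range is mapped to a `LIMIT`/`OFFSET` query for a roughly 500-row window, fetched rows go into a sparse store, and unfetched rows return placeholders. All function names and the `__placeholder` flag are hypothetical. The sketch uses a `Map` rather than the sparse array mentioned in the notes, and it is not the package's implementation.

```javascript
// Sketch only: function names and the row-window math are illustrative assumptions.
// From the design notes: LIMIT/OFFSET queries, a ~500-row window, placeholders for unfetched rows.

// Pick the window of rows to fetch: `bufferSize` rows centered on the visible
// range (firstVisible..lastVisible, inclusive row indexes), clamped to the table.
function computeWindow(firstVisible, lastVisible, totalRowCount, bufferSize = 500) {
  const visibleCount = lastVisible - firstVisible + 1
  const limit = Math.min(totalRowCount, Math.max(bufferSize, visibleCount))
  const center = Math.floor((firstVisible + lastVisible) / 2)
  const maxOffset = totalRowCount - limit
  const offset = Math.min(Math.max(0, center - Math.floor(limit / 2)), maxOffset)
  return { offset, limit }
}

// Build the SQL for one window. OFFSET counts rows, so it takes a row index.
// `tableName` and `orderBy` are assumed to be trusted identifiers here.
function buildWindowQuery(tableName, orderBy, { offset, limit }) {
  return `SELECT * FROM ${tableName} ORDER BY ${orderBy} LIMIT ${limit} OFFSET ${offset}`
}

// Sparse row store: key = absolute row position, missing keys = not fetched yet.
function createRowBuffer(totalRowCount) {
  return { totalRowCount, rows: new Map() }
}

function storeWindow(buffer, offset, fetchedRows) {
  fetchedRows.forEach((row, i) => buffer.rows.set(offset + i, row))
}

// Return the row at an index, or a placeholder the renderer can show as a skeleton.
function getRow(buffer, index) {
  return buffer.rows.has(index) ? buffer.rows.get(index) : { __placeholder: true }
}

// A sort/filter/search change makes every cached row stale.
function invalidate(buffer, newTotalRowCount) {
  buffer.rows.clear()
  buffer.totalRowCount = newTotalRowCount
}
```

In practice, the visible range would come from the virtualizer and the call to `computeWindow()` would be debounced, as the Phase 5 notes suggest. The sketch leaves out cancellation of in-flight queries during fast scrolling.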
diff --git a/design/duckdb-wasm-engine/duckdb-wasm-engine-test.Rmd b/design/duckdb-wasm-engine/duckdb-wasm-engine-test.Rmd
@@ -131,6 +131,14 @@ reactable(
 ### Large dataset: 1M rows (virtualized)
 
 ```{r}
+n <- 1000000
+big_data <- data.frame(
+  id = seq_len(n),
+  value = rnorm(n),
+  category = sample(LETTERS, n, replace = TRUE),
+  score = round(runif(n, 0, 100), 2)
+)
+
 reactable(
   big_data,
   backend = backendDuckDB(),
@@ -547,6 +555,15 @@ reactable(
 ### Large multi-level grouping: 100k rows grouped by category and region
 
 ```{r}
+n <- 100000
+large_data <- data.frame(
+  id = seq_len(n),
+  category = sample(LETTERS, n, replace = TRUE),
+  region = sample(c("East", "West", "North", "South"), n, replace = TRUE),
+  value = round(rnorm(n, mean = 100, sd = 50), 2),
+  score = round(runif(n, 0, 100), 2)
+)
+
 reactable(
   large_data,
   backend = backendDuckDB(),
@@ -890,6 +907,15 @@ reactable(
 Force embedded Arrow IPC even for large data.
 
 ```{r}
+n <- 1000000
+parquet_data <- data.frame(
+  id = seq_len(n),
+  value = rnorm(n),
+  category = sample(LETTERS, n, replace = TRUE),
+  score = round(runif(n, 0, 100), 2),
+  label = paste0("item-", seq_len(n))
+)
+
 reactable(
   parquet_data,
   backend = backendDuckDB(format = "arrow"),
@@ -899,6 +925,33 @@ reactable(
 )
 ```
 
+### Parquet with virtual scrolling
+
+Unpaginated virtual scrolling with a Parquet sidecar file. DuckDB returns all rows from a single
+query (`pageSize = null`), and virtual scrolling handles rendering.
+
+```{r}
+n <- 1000000
+parquet_data <- data.frame(
+  id = seq_len(n),
+  value = rnorm(n),
+  category = sample(LETTERS, n, replace = TRUE),
+  score = round(runif(n, 0, 100), 2),
+  label = paste0("item-", seq_len(n))
+)
+
+reactable(
+  parquet_data,
+  backend = backendDuckDB(format = "parquet"),
+  pagination = FALSE,
+  virtual = TRUE,
+  height = 500,
+  sortable = TRUE,
+  filterable = TRUE,
+  searchable = TRUE
+)
+```
+
 ### Parquet in Shiny (client mode)
 
 Test that Parquet sidecar files work in a Shiny app. In Shiny, `backendDuckDB()` defaults
diff --git a/design/duckdb-wasm-engine/duckdb-wasm-engine.md b/design/duckdb-wasm-engine/duckdb-wasm-engine.md
@@ -1099,6 +1099,10 @@ and memory than it saves on queries.
 - **Debounce tuning:** The POC debounces search input at 300ms. For large datasets, increasing the debounce or
   requiring a minimum query length (3+ chars) would reduce perceived lag.
 
+### Future: virtualized windowed fetching
+
+With `virtual = TRUE, pagination = FALSE`, DuckDB currently fetches all rows at once (`pageSize: null` omits `LIMIT`/`OFFSET` from the query). For Parquet, this means the entire file is downloaded over HTTP before the table renders. A future enhancement would use scroll-position-driven queries to fetch only a sliding window of rows around the viewport, leveraging Parquet HTTP range requests for efficient partial reads. See Phase 5 in `design/server-side-data/server-side-data.md` for the full plan.
+
 ## End-to-end benchmark: DuckDB vs default backend
 
 Measured in Chrome (Windows), serving rendered R Markdown documents over HTTP. Both documents use the same dataset
diff --git a/design/server-side-data/server-side-data.md b/design/server-side-data/server-side-data.md
@@ -133,37 +133,22 @@ For multi-level grouping, nested data frames contain their own `.subRows`.
 ### 1. Bug Fixes and Polish
 
 #### 1.1 Documentation Typo
-**File:** `man/reactable-server.Rd` line 52-53
 
-Current text incorrectly says:
-```
-- `reactableServerData()` should return a `resolvedData()` object.
-- `reactableServerData()` should not return any value.
-```
+~~**File:** `man/reactable-server.Rd` line 52-53~~
 
-Should be:
-```
-- `reactableServerData()` should return a `resolvedData()` object.
-- `reactableServerInit()` should not return any value.
-```
+**Done.** The Rd already correctly says `reactableServerInit()` should not return any value.
 
 #### 1.2 df Backend groupBy Bug
-**File:** `R/server-df.R`
 
-When `Reactable.toggleGroupBy()` is called via JavaScript API, the df backend returns grouped rows without the `__state` property needed for proper row identification. The V8 backend handles this correctly.
+~~**File:** `R/server-df.R`~~
 
-**Fix:** In `dfGroupBy()`, add state information to grouped rows:
-```r
-df[["__state"]] <- listSafeDataFrame(
-  id = sapply(df[[groupedColumnId]], function(x) sprintf("%s:%s", groupedColumnId, x)),
-  grouped = rep(TRUE, nrow(df))
-)
-```
+**Done.** `dfGroupBy()` now adds `__state` with `id`, `grouped`, and `subRowCount` to grouped rows.
 
 #### 1.3 Pagination Display with Empty Results
-**File:** `srcjs/Reactable.js`
 
-When server-side search returns zero results, pagination shows "1-10 of 0 rows" instead of "0-0 of 0 rows".
+~~**File:** `srcjs/Reactable.js`~~
+
+**Done.** `Pagination.js` clamps the start of the displayed range with `Math.min(page * pageSize + 1, rowCount)`, so the table correctly shows "0-0 of 0 rows" when `rowCount = 0`.
 
 #### 1.4 Stop Sending Unused State
 **File:** `srcjs/Reactable.js`
@@ -197,6 +182,8 @@ When server-side search returns zero results, pagination shows "1-10 of 0 rows"
 
 **Future simplification: consider removing pre-rendered first page.** The R-side pre-rendering of the first page (to avoid a blank flash while WASM loads) adds significant JS complexity: `canSkipInitialDuckDBQuery`, `duckdbQueryCount`, `stateMatchesPrerender` comparison against `defaultSorted`, the groupBy special case (pre-rendered data is flat so we must query immediately), and race conditions when users interact before DuckDB is ready. Without pre-rendering, the query effect fires unconditionally after init and the entire skip optimization disappears. The tradeoff is showing a loading/empty state during WASM init instead of instant first-page display.
 
+Pre-rendering is also problematic with **virtual scrolling + `pagination = FALSE`**: the pre-rendered `defaultPageSize` rows (e.g., 10) display immediately, then several seconds later the full dataset loads from DuckDB and the table jumps to show all rows. This creates a jarring partial-load effect. Deferring table readiness until all client-side data is fetched (showing a loading indicator instead of the partial pre-render) would give a smoother experience for this combination.
+
 Another issue: **floating point precision mismatch** between the two data paths. The pre-rendered page goes through `jsonlite::toJSON(digits = NA)` which uses C's `%.15g` format (15 significant digits), while DuckDB query results come through Arrow's `row.toJSON()` which uses JavaScript's `Number.toString()` (up to 17 significant digits for exact float64 round-trip). Since 15 significant digits isn't always enough to recover the exact float64 value, numbers with many decimal places can visibly change when the user first interacts and DuckDB takes over from the pre-rendered data. This is unsolvable without either (a) increasing jsonlite's digits to 17 for exact round-trip, (b) rounding DuckDB results to 15 significant digits to match jsonlite, or (c) removing pre-rendering so there's only one data path.
 
 **Option B: Full server-side implementation (future)**
@@ -419,6 +406,16 @@ reactableServerData.duckdb_backend <- function(
    - Document current limitation first
    - Full implementation if user demand warrants
 
+5. **Phase 5: Virtualized windowed fetching** (future)
+   - Enable `virtual = TRUE, pagination = FALSE` with DuckDB/Parquet without loading all rows at once
+   - Watch `virtualizer.range` (debounced) to detect when visible rows change
+   - Fire DuckDB queries with `LIMIT bufferSize OFFSET scrollPosition` for a sliding window (~500 rows centered on the viewport; `scrollPosition` here is a row index, not a pixel offset)
+   - Maintain a sparse data array of length `totalRowCount` with placeholder objects for unfetched rows
+   - Show loading skeleton/shimmer for placeholder rows while data is in-flight
+   - Invalidate entire buffer on sort/filter/search and re-fetch from current scroll position
+   - Key benefit for Parquet: HTTP range requests mean DuckDB reads only the byte ranges needed, not the full file
+   - This is bidirectional infinite scroll -- the main complexity is buffer management and debouncing queries during fast scrolling
+
 ## Verification
 
 ### Manual Testing
diff --git a/design/virtual-scrolling/virtual-scrolling-test.Rmd b/design/virtual-scrolling/virtual-scrolling-test.Rmd
@@ -228,7 +228,7 @@ server <- function(input, output) {
 
     reactable(
       data,
-      server = TRUE,
+      backend = backendV8(),
       virtual = TRUE,
       height = 500,
       defaultPageSize = 1000,
@@ -276,7 +276,7 @@ Hypothetical API:
 reactable(
   # Column schema only, no data
   data.frame(id = integer(), value = numeric(), category = character()),
-  server = TRUE,
+  backend = backendV8(),
   virtual = TRUE,
   pagination = FALSE,  # No pagination - seamless scrolling
   height = 500,
diff --git a/pkgdown/_pkgdown.yml b/pkgdown/_pkgdown.yml
@@ -90,7 +90,9 @@ reference:
   contents:
   - reactable-server
   - resolvedData
+  - backendDf
   - backendDuckDB
+  - backendV8
 
 # Exclude duplicate no-static examples from search index
 search:
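
As a small illustration of the pagination fix marked done in section 1.3 of the `server-side-data.md` hunk above, the sketch below shows how clamping the range start with `Math.min(page * pageSize + 1, rowCount)` gives "0-0 of 0 rows" for an empty result. The function name `pageRangeText` is hypothetical, `page` is assumed to be zero-based, and clamping the range end the same way is an assumption rather than something the source quotes.

```javascript
// Sketch: `pageRangeText` is a hypothetical name; `page` is assumed to be zero-based.
function pageRangeText(page, pageSize, rowCount) {
  // Clamp the start so an empty result yields 0 instead of 1.
  const rangeStart = Math.min(page * pageSize + 1, rowCount)
  // Clamping the end the same way is an assumption, not quoted from the source.
  const rangeEnd = Math.min((page + 1) * pageSize, rowCount)
  return `${rangeStart}-${rangeEnd} of ${rowCount} rows`
}
```

Without the clamp on the start, a zero-row search result on the first page would begin its range at 1, which is the "1-10 of 0 rows" symptom the design note describes.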