Column jank on cleaning toggle: split df_data_dict + df_display_args publish causes half-update flash #740

@paddymul

Summary

Toggling cleaning_method (and likely other state that bumps dataframe_id) shows a transient mid-state where the widget renders new data against a stale or empty column_config, then re-renders with the correct config. Visible as a brief column flash / "half update."

Currently mitigated by a JS-side bandaid (last-known-good column_config fallback in BuckarooInfiniteWidget), but the two underlying issues should be fixed at the root.

Root cause

buckaroo/dataflow/dataflow.py:481-525 (_handle_widget_change) publishes two traitlets in sequence:

  • L499 / L503: self.df_data_dict = {...} — first comm sync
  • L525: self.df_display_args = temp_display_args — second comm sync

Each self.X = ... triggers an immediate traitlet/comm sync. anywidget's useModelState subscribes per-key (change:df_data_dict, change:df_display_args), so the JS gets two separate React renders. Between them, the active view's df_viewer_config.column_config can mismatch the data, or be empty (EMPTY_DFVIEWER_CONFIG in styling_core.py:186-189).

Compounded by BuckarooWidgetInfinite.tsx:277-286: effectiveDataframeId (the AG-Grid remount key) is derived from optimistic local state (buckaroo_state.cleaning_method), so the grid remounts the instant the user clicks the toggle — before Python has sent any reply — mounting fresh against the still-stale df_display_args.
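
The half-update described above can be reproduced in miniature without the widget stack. This is a toy model, not Buckaroo or anywidget code: `ToyFrontend` and the `cols` field are hypothetical stand-ins, and the sketch assumes the front end renders once per received comm message.

```python
class ToyFrontend:
    """Hypothetical stand-in for the JS side; renders once per comm message."""

    def __init__(self):
        # Data and config start out in agreement.
        self.state = {
            "df_data_dict": {"cols": ["a", "b"]},
            "df_display_args": {"cols": ["a", "b"]},
        }
        self.renders = []  # one bool per render: did the config match the data?

    def receive(self, payload):
        """Apply one comm payload, then render against the merged state."""
        self.state.update(payload)
        data_cols = self.state["df_data_dict"]["cols"]
        config_cols = self.state["df_display_args"]["cols"]
        self.renders.append(data_cols == config_cols)


def publish_sequentially(frontend, new_cols):
    """Mimic two back-to-back trait assignments: two payloads, two renders."""
    frontend.receive({"df_data_dict": {"cols": new_cols}})
    frontend.receive({"df_display_args": {"cols": new_cols}})


frontend = ToyFrontend()
publish_sequentially(frontend, ["a", "b_cleaned"])
print(frontend.renders)  # [False, True]: the first render is the flash
```

The first render pairs new data with the old config, which is the mid-state this issue is about.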

Fix 1 — Atomic publish on the Python side

Batch the two traitlet writes so the JS receives one comm payload, fires both change: events synchronously, and React 18 auto-batches into one render.

with self.hold_sync():
    self.df_data_dict = {...}
    self.df_display_args = temp_display_args

hold_sync comes from ipywidgets.Widget. DataFlow doesn't inherit from it directly, so this needs a quick check: either the call has to be hoisted to the outer BuckarooWidget (which does inherit from DOMWidget), or hold_trait_notifications on HasTraits turns out to be sufficient. The two combined fields are small, so sending them together in one custom message (model.send) is an option if the trait-level batching doesn't compose cleanly.
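
To make the intended batching concrete, here is a minimal pure-Python sketch of the pattern. It is not the ipywidgets implementation: `ToySyncedModel`, its `sent` list, and this `hold_sync` are hypothetical, and the only assumption is that un-held writes each produce their own payload while held writes are flushed together on exit.

```python
from contextlib import contextmanager


class ToySyncedModel:
    """Hypothetical model: one payload per synced assignment unless held."""

    SYNCED = ("df_data_dict", "df_display_args")

    def __init__(self):
        self.sent = []        # payloads "sent" to the front end, in order
        self._holding = False
        self._pending = {}    # synced writes buffered while holding

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name not in self.SYNCED:
            return
        if self._holding:
            self._pending[name] = value      # buffer; flushed on exit
        else:
            self.sent.append({name: value})  # immediate, separate sync

    @contextmanager
    def hold_sync(self):
        """Buffer synced writes and flush them as a single payload on exit."""
        if self._holding:  # nested use: the outermost block does the flush
            yield
            return
        self._holding = True
        try:
            yield
        finally:
            self._holding = False
            if self._pending:
                self.sent.append(dict(self._pending))
                self._pending.clear()


model = ToySyncedModel()
model.df_data_dict = {"cols": ["a"]}
model.df_display_args = {"cols": ["a"]}
unheld_payloads = len(model.sent)  # two separate payloads

model.sent.clear()
with model.hold_sync():
    model.df_data_dict = {"cols": ["a_cleaned"]}
    model.df_display_args = {"cols": ["a_cleaned"]}
held_payloads = len(model.sent)  # one combined payload
```

Whether the real call site can get the same effect depends on where the synced traits live, which is the open question above.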

Fix 2 — Server-confirmed remount key on the JS side

Drop the optimistic-state bundling from effectiveDataframeId. Use only dataframe_id (the server-confirmed id, bumped by Python on operations / cleaning / post-processing / quick-command changes). The grid then stays mounted until Python's reply arrives, and remounts cleanly against new data + new config in the same tick.

Currently:

const effectiveDataframeId = JSON.stringify([
    dataframe_id, operations, post_processing, cleaning_method, quick_command_args
]);

Target:

const effectiveDataframeId = dataframe_id;

Trade-off: a brief perceived latency between clicking the toggle and the grid changing, versus today's instant-but-half-flash. With Fix 1 in place this latency is just the Python round-trip; without Fix 1 the in-between state is still visible.

Requires verifying that Python actually bumps dataframe_id on every row-content-changing state change (operations, post_processing, cleaning_method, quick_command_args). If any of those don't bump it today, that needs adding before this fix is safe.
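
One way to run that verification is a small check that applies each state change to a fresh widget and reports the keys that leave dataframe_id untouched. This is a sketch under assumptions: `keys_that_do_not_bump`, `ToyWidget`, and the placeholder values are hypothetical, and the default `setattr` would need to be swapped for whatever actually updates buckaroo_state on the real widget.

```python
def keys_that_do_not_bump(make_widget, changes, apply_change=setattr):
    """Return the state keys whose change leaves dataframe_id unchanged.

    make_widget: zero-argument factory returning a fresh widget-like object.
    changes: mapping of state key -> new value to apply.
    apply_change: callable(widget, key, value); an assumption, replace with
        the real state-update path.
    """
    stale = []
    for key, value in changes.items():
        widget = make_widget()  # fresh instance so changes don't interact
        before = widget.dataframe_id
        apply_change(widget, key, value)
        if widget.dataframe_id == before:
            stale.append(key)
    return stale


class ToyWidget:
    """Hypothetical widget that forgets to bump the id for quick_command_args."""

    BUMPING = {"operations", "post_processing", "cleaning_method"}

    def __init__(self):
        self.dataframe_id = 0

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name in self.BUMPING:
            object.__setattr__(self, "dataframe_id", self.dataframe_id + 1)


# Placeholder values; only the keys come from the list above.
changes = {
    "operations": "placeholder",
    "post_processing": "placeholder",
    "cleaning_method": "placeholder",
    "quick_command_args": "placeholder",
}
missing = keys_that_do_not_bump(ToyWidget, changes)
print(missing)  # ['quick_command_args'] for this toy widget
```

An empty result against the real widget would be the precondition for dropping the optimistic fields from the remount key.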

Current bandaid (to be removed once 1+2 land)

BuckarooInfiniteWidget holds a ref lastGoodDfvcRef keyed by view name. If incoming column_config is empty for a view we've previously seen non-empty, the prior df_viewer_config is substituted. Logged via [bk-flash] effectiveDisplayArgs falling back to last-known-good column_config.

This handles the empty case but doesn't help the column-name-mismatch case (new data vs old config with overlapping but not identical column sets). Fixes 1 and 2 do.
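
The gap can be illustrated with a Python paraphrase of the fallback logic. The real code is TypeScript; `effective_config`, `config_matches`, and the `col_name` field used here are assumptions made for illustration only.

```python
last_good = {}  # view name -> last non-empty config (mirrors lastGoodDfvcRef)


def effective_config(view, incoming):
    """Substitute the last non-empty config when the incoming one is empty."""
    if incoming["column_config"]:
        last_good[view] = incoming
        return incoming
    return last_good.get(view, incoming)


def config_matches(data_cols, config):
    """True when the config lists exactly the columns present in the data."""
    return [c["col_name"] for c in config["column_config"]] == data_cols


old = {"column_config": [{"col_name": "a"}, {"col_name": "b"}]}
empty = {"column_config": []}

effective_config("main", old)

# Empty case: the fallback restores the previously seen config.
restored = effective_config("main", empty)

# Mismatch case: the stale config is non-empty, so it passes straight
# through even though the new data has a different column set.
stale = effective_config("main", old)
mismatch = not config_matches(["a", "b_cleaned"], stale)
```

Here `restored` is the old config, while `mismatch` is true: the fallback has no way to notice that a non-empty config belongs to the previous data.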

Acceptance

  • Toggle cleaning_method back and forth ~5 times on a notebook df. No bk-flash log line about the last-known-good fallback firing. No visible column flash between toggles.
  • Toggle post_processing and quick_command_args (sort/search) similarly.
  • Confirm the fallback ref is no longer needed and remove it.
