diff --git a/docs/content-plan.md b/docs/content-plan.md new file mode 100644 index 00000000..5d52a702 --- /dev/null +++ b/docs/content-plan.md @@ -0,0 +1,62 @@ + +# Content Plan + +## Published (merged or ready to merge) + +### Dastardly DataFrame Dataset (PR #641) +Published at `docs/source/articles/dastardly-dataframe-dataset.rst`. Covers DDD with static embeds, full dtype coverage table, weird types for pandas and polars. Includes Polars DDD (issue #622). + +### How types and data move from engine to browser +Published at `docs/source/articles/types-to-display.rst`. Column renaming (a,b,c..z,aa,ab), type coercion before parquet, fastparquet encoding, base64 transport, hyparquet decode in browser, displayer/formatter dispatch. Full pipeline trace for a single cell value. + +### So you want to write a DataFrame viewer +Published at `docs/source/articles/so-you-want-to-write-a-dataframe-viewer.rst`. Comparison of open source DataFrame viewers (Buckaroo, Perspective, iTables, Great Tables, DTale, Mito, Marimo, ipydatagrid, quak). Research in `~/personal/buckaroo-writing/research/`. + +### Why Buckaroo uses Depot for CI +Draft at `docs/source/articles/why-depot.rst`. Depot sponsorship story. Honest benchmarking: Depot isn't measurably faster than GitHub runners (I/O-bound workload), but consistent provisioning + no minute quotas gave confidence to grow from 3 to 23 CI jobs. Pending: email to Depot CTO for input before publishing. + +## Planned + +### Static embedding improvements +- Publish JS to CDN → reduced embed size. 
Talk about the journey: Jupyter → Marimo/Pyodide → static embedding → smaller static embedding
- Page weight comparison: dbt (501 KB compressed, 28 MB total, 1.41 s DCL), Snowflake (128 KB/1.28 MB/22.51 MB/445 ms), Databricks (127 KB/797 KB/313 ms)
- Customizing buckaroo via API for embeds — show styling, link to styling docs
- Static search — maybe, take a crack at it
- Link to the static embedding guide

### Styling buckaroo chrome
Based on https://github.com/buckaroo-data/buckaroo/pull/583

### Buckaroo embedding guide
- Why to embed buckaroo
- Which config makes sense for you — along with data sizes reasoning
- Customizing appearance
- Customizing buckaroo

### Embedding buckaroo for bigger data
Parquet range queries on S3/R2 buckets. Sponsored by Cloudflare?

### How I made Buckaroo fast
The philosophy: do the right things fast, but mostly just do less. Not a performance optimization article — it's about architecture decisions that avoid work entirely.
- Column renaming to a,b,c means shorter keys everywhere, no escaping
- Parquet instead of JSON: moved from Python JSON serialization (the slowest part of the original render) to binary Parquet. Faster encoding, smaller payloads, type preservation for free
- Sampling: don't process the whole DataFrame. Sample first, compute stats on the sample, display the sample. The user sees 500 rows, not 500,000
- Summary stats: compute once, cache.
Don't recompute on every view switch +- hyparquet decodes in the browser — no round-trip to the server for data +- LRU cache on decoded Parquet so switching between main/stats views doesn't re-decode +- AG-Grid does the hard rendering work (virtual scrolling, column virtualization) — don't fight it, feed it clean data +- The lesson: most "performance work" was removing unnecessary work, not optimizing hot paths + +### Testing Buckaroo: unit tests, integration tests, and everything in between +How a solo developer tests a project that spans Python + TypeScript across 8 deployment environments. +- **Python unit tests** (pytest): serialization, stats computation, type coercion, column renaming. Fast, reliable, the foundation. ~60s for the full suite +- **JS unit tests** (vitest): component logic, displayer/formatter functions, parquet decoding. Run in Node, no browser needed +- **Playwright integration tests** (6 suites): Storybook (component rendering), JupyterLab (full widget lifecycle), Marimo, WASM Marimo, Server (MCP/standalone), Static Embed. These catch "it works in Jupyter but is blank in Marimo" — the bugs you can't find any other way +- **Styling screenshot comparisons**: before/after captures on every PR using Storybook + Playwright. Catches visual regressions (column width changes, color map shifts) that no unit test can detect +- **Smoke tests**: install the wheel with each optional extras group (`[mcp]`, `[notebook]`, etc.) and verify imports work. Catches dependency conflicts +- **MCP integration tests**: install the wheel, start the MCP server, make a `tools/call` request, verify the response includes static assets +- **Dual dependency strategy**: run all Python tests twice — once with minimum pinned versions, once with `--resolution=highest`. 
Catches pandas/polars/pyarrow compatibility issues before users do +- **The DDD as a test suite**: the Dastardly DataFrame Dataset isn't just documentation — each weird DataFrame exercises edge cases through the full serialization → display pipeline +- What I don't test: VSCode, Google Colab (no headless automation), visual pixel-perfect matching (too brittle) +- The lesson: integration tests are worth the CI investment. Most real bugs are at boundaries (Python→Parquet→JS→AG-Grid), not inside any one layer + diff --git a/docs/source/articles/buckaroo-compare.rst b/docs/source/articles/buckaroo-compare.rst new file mode 100644 index 00000000..04874b47 --- /dev/null +++ b/docs/source/articles/buckaroo-compare.rst @@ -0,0 +1,208 @@ +BuckarooCompare — Diff Your DataFrames +======================================= + +When you change a pipeline, how do you know what changed in the output? When +you migrate a table from one database to another, how do you verify the data +matches? When two teams produce different versions of the same report, where +are the differences? + +You diff them. But ``df1.equals(df2)`` returns a single boolean, and +``df1.compare(df2)`` only works if the DataFrames have identical shapes and +indexes. Real-world comparisons are messier: rows may be reordered, columns +may be added or removed, and the join key might not be the index. + +Buckaroo's ``col_join_dfs`` function handles all of this and renders the +result as a color-coded interactive table where differences jump out +visually. + + +Quick start +----------- + +.. 
code-block:: python + + from buckaroo.compare import col_join_dfs + import pandas as pd + + df1 = pd.DataFrame({ + 'id': [1, 2, 3, 4], + 'name': ['Alice', 'Bob', 'Charlie', 'Diana'], + 'score': [88.5, 92.1, 75.3, 96.7], + }) + + df2 = pd.DataFrame({ + 'id': [1, 2, 3, 5], + 'name': ['Alice', 'Robert', 'Charlie', 'Eve'], + 'score': [88.5, 92.1, 80.0, 81.0], + }) + + merged_df, column_config_overrides, eqs = col_join_dfs( + df1, df2, + join_columns=['id'], + how='outer' + ) + +The function returns three things: + +1. **merged_df**: The joined DataFrame with all rows from both inputs, + plus hidden metadata columns for diff state +2. **column_config_overrides**: A dict of buckaroo styling config that + color-codes each cell based on whether it matches, differs, or is + missing from one side +3. **eqs**: A summary dict showing the diff count per column — how many + rows differ for each column + + +How the diff works +------------------ + +``col_join_dfs`` performs a ``pd.merge`` on the join columns, then for each +data column: + +- Creates a hidden ``{col}|df2`` column with the df2 value +- Creates a hidden ``{col}|eq`` column encoding the combined state: + is the row in df1 only, df2 only, both-and-matching, or both-and-different? +- Generates a ``color_map_config`` that maps these states to colors + +The color scheme: + +.. list-table:: + :header-rows: 1 + + * - State + - Color + - Meaning + * - df1 only + - Pink + - Row exists in df1 but not df2 + * - df2 only + - Green + - Row exists in df2 but not df1 + * - Match + - Light blue + - Row in both, values identical + * - Diff + - Dark blue + - Row in both, values differ + +Join key columns are highlighted in purple so you can immediately see what +was used for matching. + + +The eqs summary +--------------- + +The third return value tells you at a glance where the differences are: + +.. 
code-block:: python

   >>> eqs
   {
       'id': {'diff_count': 'join_key'},
       'name': {'diff_count': 1},   # 1 row differs (Bob vs Robert)
       'score': {'diff_count': 1},  # 1 row differs (75.3 vs 80.0)
   }

Special values:

- ``"join_key"`` — this column was used for matching, not compared
- ``"df_1"`` — column only exists in df1
- ``"df_2"`` — column only exists in df2
- An integer — the number of rows present in both DataFrames where the
  values differ


Using it with the server
------------------------

The buckaroo server exposes a ``/load_compare`` endpoint that loads two
files, runs the diff, and pushes the styled result to any connected browser:

.. code-block:: bash

   curl -X POST http://localhost:8888/load_compare \
     -H "Content-Type: application/json" \
     -d '{
       "session": "my-session",
       "path1": "/data/report_v1.csv",
       "path2": "/data/report_v2.csv",
       "join_columns": ["id"],
       "how": "outer"
     }'

The response includes the diff summary:

.. code-block:: json

   {
     "session": "my-session",
     "rows": 5,
     "columns": ["id", "name", "score"],
     "eqs": {
       "id": {"diff_count": "join_key"},
       "name": {"diff_count": 2},
       "score": {"diff_count": 1}
     }
   }

The browser view updates immediately with the color-coded merged table.
Hover over any differing cell to see the df2 value in a tooltip.


Multi-column joins
------------------

.. code-block:: python

   merged_df, overrides, eqs = col_join_dfs(
       df1, df2,
       join_columns=['region', 'date'],
       how='inner'
   )

Composite join keys work naturally. Both ``region`` and ``date`` will be
highlighted in purple.


Use cases
---------

**Data migration validation**
   Migrating from Postgres to Snowflake? Export both tables, diff them.
   The color coding immediately shows which rows are missing and which
   values changed.

**Pipeline output comparison**
   Changed a transform? Diff the before and after. The ``eqs`` summary
   tells you exactly which columns were affected and by how many rows.
+ +**A/B test result inspection** + Compare experiment vs control DataFrames on a user ID join key. See + which metrics actually differ. + +**Schema evolution** + When df2 has columns that df1 doesn't (or vice versa), those columns + are marked as ``"df_1"`` or ``"df_2"`` in the eqs summary, so you + can see schema changes alongside data changes. + + +Integration with datacompy +-------------------------- + +The ``docs/example-notebooks/datacompy_app.py`` example shows how to use +`datacompy `_ for metadata-rich +comparison (column matching stats, row-level match rates) while using +buckaroo for the visual rendering. + +This gives you the best of both: datacompy's statistical summary plus +buckaroo's interactive, color-coded table view. + + +Limitations +----------- + +- Join columns must be unique in each DataFrame (no many-to-many joins). + If duplicates are detected, ``col_join_dfs`` raises a ``ValueError``. +- Column names cannot contain ``|df2`` or ``__buckaroo_merge`` (these are + used internally). +- Very large DataFrames (>100K rows) will work but the browser may be slow + to render the full color-coded table. diff --git a/docs/source/articles/embedding-guide.rst b/docs/source/articles/embedding-guide.rst new file mode 100644 index 00000000..2cba2d0d --- /dev/null +++ b/docs/source/articles/embedding-guide.rst @@ -0,0 +1,259 @@ +Buckaroo Embedding Guide +======================== + +This guide covers everything you need to embed interactive buckaroo tables +in your own applications, documentation, and reports. + + +Why embed +--------- + +- **Share DataFrames without Jupyter**: Send a colleague an HTML file they + can open in any browser. No Python install required. +- **Build data apps**: Integrate the buckaroo viewer into React dashboards, + internal tools, or customer-facing data products. +- **Static reports**: Generate HTML reports from your pipeline that include + interactive, sortable tables with summary statistics. 
+- **Documentation**: Embed live data tables in your docs site (Sphinx, + MkDocs, or plain HTML). + + +Choose your embedding mode +-------------------------- + +Buckaroo offers two static embed modes and one live widget mode: + +``embed_type="DFViewer"`` — Lightweight table + Just the data grid with sortable columns, summary stats pinned at the + bottom, histograms, and type-aware formatting. Smaller payload. Best + for documentation, reports, and sharing. + +``embed_type="Buckaroo"`` — Full experience + Everything in DFViewer plus the display switcher bar, multiple computed + views, and the interactive analysis pipeline. Larger payload. Best for + data exploration and internal tools. + +**anywidget** — Live in notebooks + The ``BuckarooWidget`` runs inside Jupyter, Marimo, VS Code notebooks, + and Google Colab via anywidget. Full interactivity including the command + UI for data cleaning operations. Requires a running Python kernel. + +For most embedding use cases, start with ``DFViewer``. + + +Data size guidelines +~~~~~~~~~~~~~~~~~~~~ + +.. list-table:: + :header-rows: 1 + + * - Row count + - Recommended approach + * - < 1,000 rows + - Inline static embed. JSON payload is small (~10-50 KB). + * - 1,000 - 100,000 rows + - Static embed still works. Parquet encoding keeps payload + compact (50-500 KB). Consider sampling for faster page load. + * - > 100,000 rows + - Host data separately. Use Parquet range queries on S3/R2 to + fetch only the visible rows and columns. + + +Generate a static embed +----------------------- + +.. code-block:: python + + from buckaroo.artifact import to_html + import pandas as pd + + df = pd.read_csv('my_data.csv') + html = to_html(df, title="My Data", embed_type="DFViewer") + + with open('my-data.html', 'w') as f: + f.write(html) + +The HTML file references ``static-embed.js`` and ``static-embed.css``. +These are shipped in the buckaroo wheel under ``buckaroo/static/``. +Copy them alongside your generated HTML: + +.. 
code-block:: bash + + STATIC=$(python -c "from pathlib import Path; import buckaroo; print(Path(buckaroo.__file__).parent / 'static')") + cp "$STATIC/static-embed.js" "$STATIC/static-embed.css" ./ + +**With polars:** + +.. code-block:: python + + import polars as pl + from buckaroo.artifact import to_html + + df = pl.read_parquet('my_data.parquet') + html = to_html(df, title="Polars Data") + +``to_html()`` auto-detects polars DataFrames and uses the polars analysis +pipeline. + +**From a file path:** + +.. code-block:: python + + from buckaroo.artifact import to_html + + # Reads CSV, Parquet, JSON, or JSONL automatically + html = to_html('/path/to/data.parquet', title="Direct from file") + + +Customizing appearance +---------------------- + +Column config overrides +~~~~~~~~~~~~~~~~~~~~~~~ + +Pass ``column_config_overrides`` to control per-column display: + +.. code-block:: python + + html = to_html(df, column_config_overrides={ + 'revenue': { + 'color_map_config': { + 'color_rule': 'color_from_column', + 'map_name': 'RdYlGn', + } + }, + 'join_key': { + 'color_map_config': { + 'color_rule': 'color_static', + 'color': '#6c5fc7', + } + } + }) + +Available color rules: + +- ``color_from_column``: Color cells based on their value using a named + colormap (e.g., ``RdYlGn``, ``Blues``, ``Viridis``) +- ``color_categorical``: Map categorical values to a list of colors +- ``color_static``: Constant background color for every cell in the column + +Tooltips +~~~~~~~~ + +Show the value of another column on hover: + +.. code-block:: python + + column_config_overrides={ + 'name': { + 'tooltip_config': { + 'tooltip_type': 'simple', + 'val_column': 'full_name', + } + } + } + + +Analysis classes +~~~~~~~~~~~~~~~~ + +Control which summary statistics are computed: + +.. 
code-block:: python

   from buckaroo.artifact import to_html
   from buckaroo.pluggable_analysis_framework.analysis_management import (
       ColAnalysis,
   )

   class MyCustomAnalysis(ColAnalysis):
       """Placeholder for your own stat; see :doc:`pluggable`."""

   # Use extra_analysis_klasses to add custom stats
   # Use analysis_klasses to replace the default set
   html = to_html(df,
                  extra_analysis_klasses=[MyCustomAnalysis],
                  embed_type="Buckaroo")

See :doc:`pluggable` for details on writing custom analysis classes.


Pinned rows
~~~~~~~~~~~

Add custom pinned rows (shown at the bottom of the table):

.. code-block:: python

   html = to_html(df,
       extra_pinned_rows=[
           {'index': 'target', 'a': 100, 'b': 200},
       ])


Integration patterns
--------------------

Static HTML file
~~~~~~~~~~~~~~~~

The simplest approach. Generate the HTML, copy ``static-embed.js`` and
``static-embed.css`` next to it, and open in a browser or serve from any
static file host.

.. code-block:: bash

   cp $(python -c "import buckaroo; print(buckaroo.__path__[0])")/static/static-embed.* ./
   open my-data.html

React component
~~~~~~~~~~~~~~~

For deeper integration, import the React components directly from
``buckaroo-js-core`` (check the package's TypeScript definitions for the
exact ``DFViewer`` prop names in your version):

.. code-block:: bash

   npm install buckaroo-js-core

.. code-block:: typescript

   import { DFViewer } from 'buckaroo-js-core';

   function MyTable({ data, config, summaryStats }) {
       return (
           <DFViewer
               df_data={data}
               df_viewer_config={config}
               summary_stats_data={summaryStats}
           />
       );
   }

Sphinx / ReadTheDocs
~~~~~~~~~~~~~~~~~~~~

Use a ``raw`` directive to embed an iframe pointing to a pre-generated
static HTML file:

.. code-block:: rst

   .. raw:: html

      <iframe src="../_static/my-data.html"
              width="100%" height="500"></iframe>

Generate the HTML with the ``to_html()`` function and place it in your
Sphinx ``_static`` directory.
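For the iframe to resolve, Sphinx just needs to copy the generated file (and the ``static-embed.*`` assets next to it) into the build output. With the default project layout no extra configuration is required, since a generated ``conf.py`` already contains:

```python
# Everything under the source _static directory is copied into the
# built site, so the generated embed page and its JS/CSS ride along.
html_static_path = ['_static']
```

If you keep embeds in a different directory, add that directory to the list instead.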
+ + +What's included in the bundle +----------------------------- + +The ``static-embed.js`` bundle (1.3 MB minified) includes: + +- React 18 + ReactDOM +- AG-Grid Community v33 (table rendering) +- hyparquet (Parquet decoding in the browser) +- recharts (histogram rendering) +- lodash-es (utility functions, tree-shaken) + +The bundle is built with esbuild and shipped as an ES module. diff --git a/docs/source/articles/so-you-want-to-write-a-dataframe-viewer.rst b/docs/source/articles/so-you-want-to-write-a-dataframe-viewer.rst new file mode 100644 index 00000000..a7cda8bd --- /dev/null +++ b/docs/source/articles/so-you-want-to-write-a-dataframe-viewer.rst @@ -0,0 +1,325 @@ +So You Want to Write a DataFrame Viewer +======================================== + +You want to write a better viewer for tabular data. That's great, the +world needs better interfaces in this space, and there is so much that +can be improved on. Here are some of the biggest design decisions and +their potential side effects, along with projects that chose different +routes. There are many closed source data table viewers with various +levels of capability. It seems like every new notebook hosting +environment feels compelled to build their own dataframe viewer. In this +article I will draw on my own experience creating +`Buckaroo `__, as well as +observations from looking at popular open source table viewers like +`Perspective `__, +`Great Tables `__, +`DTale `__, +`Marimo `__, +`iTables `__, +`ipydatagrid `__, +`Panel Tabulator `__, +and Streamlit's +`st.dataframe `__. + +I have run into each one of these issues while building buckaroo. + + +Use-case questions +------------------- + +Before starting, think about what use case you are looking to solve for. +Are you trying to build tables for relatively static display (PDF to +Huggingface data browser)? Do you want to serve dashboards (a limited +set of interactions with users willing to customize heavily and +specifically for styling)? 
Do you want to facilitate interactive use in
an IDE-like environment (VSCode notebooks, some internal data bench)? Do
you want to work in notebook environments? What size datasets do you
expect your users to work with? What performance expectations do your
users have? Do you want users to be able to customize the experience?
Without writing JS? Do you want to deal with streaming data? Do you want
to allow editing of data?


Processing: server-side or browser-based
-----------------------------------------

The biggest decision to make when building a table viewer is what to do
with the data. Do you want the entire dataset to reside in the browser,
or do you want to leave it on the server and page the currently viewed
section back and forth to the browser? Both approaches have their place.

Browser-based approaches are much cheaper to serve at scale. Browsers
have improved significantly in the past decade, and there are many
applications that put over a gigabyte of data into the browser with no
ill effects. Further, with HTTP range requests, the full dataset doesn't
even have to be loaded at once. Apache Arrow and Parquet make this
approach more performant and attractive. This approach scales with little
cost because S3 and Cloudflare are incredibly performant and inexpensive
compared to spinning up server infrastructure.

Browser-based approaches fall down with datasets over 1 GB. Additionally,
1 GB is about the total limit of memory use that you want a single page
to have, so if you have multiple dataframes that you want to display
simultaneously, keep that in mind. Finally, browser-based solutions
require using browser-based analytics engines instead of familiar tools
like pandas and polars. Apache Arrow is packageable into a WebAssembly
module, but packaging it into a JS build is tricky.

Server-based solutions are more familiar as traditional web apps,
sometimes with some twists.
Server-based solutions excel for very large
datasets that are backed by analytics engines. If your 10 GB table is
already in a relational database, let the database do the sorting, and
only send over the limited rows that are being displayed. Server-based
solutions with persistent connections also allow many more tables to be
displayed simultaneously while limiting browser memory usage. If you have
infrastructure built around analytics pipelines in traditional
environments, server-side solutions are often the better way to go.
Sorting and histograms in particular can be hard to implement identically
in different numerical engines.

The downsides of a server-based approach are that you always need to have
the server running to make the table work. At the small end this means
you can't simply host an artifact with your table in it. You can't serve
a Jupyter notebook statically in a GitHub repo. If you intend to host an
analytics system with your table, you now need server infrastructure to
back it. Server infrastructure connected to a relational database or
data warehouse is one level of expense — it is even more expensive (in
terms of memory and CPU) to host Python-based analytics server-side.


Serializing data
-----------------

For buckaroo, serializing data to JSON was the slowest part of the
initial render (no longer true, thanks to better lazy fetching).
Serializing dataframes is hard. There are multiple numerical Python and
Arrow concepts that don't have direct equivalents in JS or JSON. Notably,
infinity and NaN aren't valid in JSON. Furthermore, datetime handling
across JSON requires a processing layer: you either encode strings or
millisecond offsets, and either way you need a metadata layer that tells
the decoder how to interpret the values. Then there are common Python
datatypes like timedelta that have no native JS equivalent.

Next we get to the difficulty of serializing pandas data structures.
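Before getting to pandas-specific structure, the JSON gap is easy to demonstrate with nothing but the standard library (plain Python, no table viewer required):

```python
import json

# Python's json module emits NaN/Infinity tokens by default...
print(json.dumps({"a": float("nan"), "b": float("inf")}))
# {"a": NaN, "b": Infinity}   <- not valid JSON; JSON.parse rejects it

# Strict mode refuses instead, so a serializer must pick a policy up
# front: null the values out, encode them as strings, or carry a
# metadata layer that tells the browser how to decode them.
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError as exc:
    print("strict mode:", exc)
```

Either way, the decision can't be deferred: some encoding policy has to exist before the first float leaves Python.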
Pandas indexes, which apply to both rows and columns, come in a variety
of formats. Multi-level indexes can be challenging for display — they
have to be special-cased in your display code regardless of how they are
serialized. Pandas columns can also be named in a variety of ways,
including as numerics or strings.

These different dataframe configurations are challenging because they are
hard to completely anticipate. In my experience, when a user constructed
a dataframe with an unexpected structure, it was one of the most likely
things to blow up buckaroo with a JS typing error. Unexpected structures
also surfaced as exceptions thrown throughout the pandas processing code.

Polars is a bit easier in this regard: it eschews having an index.

Many of these issues exist when serializing to a binary format like
Feather or Parquet, but they take a slightly different shape. With
Feather/Parquet, make sure Python objects and lists serialize properly.
Also, if you want a single-file static HTML export to work, you will need
to base64-encode the binary data. True binary-to-binary transfer requires
a network connection.


The table viewer component
---------------------------

There are many table components, so many that there is a site dedicated
to tracking their popularity. Increasing in complexity, you have
everything from static HTML, to jQuery-based libraries, to modern table
grids, to AG-Grid, to fully custom-coded frontend libraries. HTML-based
tables allow simple customizability along with a great story for static
export to the widest list of targets. jQuery-based libraries (limited
table rows, pagination) are relatively simple to use and limit
complexity — previously they were much easier to package into the Jupyter
frontend environment than full JS build chains.

Then there are modern table libraries that aren't AG-Grid:
`React-data-grid `_, angular-grid,
`tanstack-table `_, and
`Handsontable `_.
These libraries might be familiar.
They have a straightforward licensing +story. They also tend to have rough edges, limited adoption, and they +tend to be abandoned. I haven't investigated these packages as much. + +Next up is `AG-Grid `_. AG-Grid is +the reliable gold standard for tables, under active development for over +a decade. AG-Grid has a full commercial company behind it, along with a +permissively licensed community edition. From my experience they haven't +kneecapped the community edition in favor of the commercial edition, and +aim to have the community edition as the best free table widget on the +market. The tool is extensively documented with working examples. The +company is completely unresponsive to bug reports from non-paying users +in my experience. I chose AG-Grid after listening to +`an interview with their founder +`_ +on the JS Jabber podcast. + +Then there are custom table widgets like +`Perspective `__, +`glide-data-grid `__, +and whatever you cooked up yourself. Perspective has a very impressive +table, and I suspect it has better performance than AG-Grid. It is +minimally documented and doesn't have the wide community adoption that +generates Stack Overflow guidance. glide-data-grid is an impressive +piece of software, rendering to canvas. It is solo-maintained by its +creator at Glide Apps — actively developed but quietly, with Streamlit +as its biggest downstream consumer. + +If you are writing your own table, congrats. You will have ultimate +control over your user experience. You won't have to worry about +dependencies on ``isEven`` or other npm trash. You will have a very +complex core piece to maintain. At a minimum I'd recommend thoroughly +investigating other widgets to see how they approached problems. + + +The notebook environment +------------------------- + +There are many different notebook environments. Jupyter Notebook, Google +Colab, VSCode notebooks, classic notebooks (before Notebook 7.0), +Marimo, Jupyter running on WASM (JupyterLite). 
All have slight +differences that become especially significant for frontend code. +Styling works differently, loading JavaScript is a bit different. +`Anywidget `_ was developed to make all of this +easier, and it does. Before anywidget, this section would have been much +longer. + +Even determining what environment you are running in is challenging. +This will come up when users file bugs. `widget_utils.py +`_ +is my function for determining which Jupyter environment I'm running in. + + +Other questions +---------------- + +**Do you want to enable editing tables?** It isn't too challenging to +enable frontend edits to modify the core dataframe of a table. But then +what? For a full fledged application, you have a bunch of options. In the +Jupyter notebook, you don't have many good options. Accessing widget +state in a Jupyter notebook is possible, but it isn't obvious. Jupyter +notebooks also make it easy to inadvertently rerun a cell — which would +cause your user to lose all edits — a very frustrating experience. + +**What about events and callbacks?** Adding click handling events plumbed +through to Python is an attractive option. But now your users have to +make sure they don't have cycles in the event handlers. This is another +place where building a tool for a Jupyter widget is different than +building a tool for a framework or dashboard. + + +Conclusion +----------- + +I'm not suggesting that you avoid creating a table for the Jupyter +environment. I am suggesting that you understand how broad a task it is, +and the ways it could fail. + + +Comparison of open source DataFrame viewers +--------------------------------------------- + +.. list-table:: + :header-rows: 1 + :widths: 15 12 10 10 12 10 15 12 + + * - Name + - Server / Browser + - JSON / Numeric + - Static Export + - Jupyter Compatible + - Dynamic + - Table Viewer + - Built on Anywidget? 
+ * - `Buckaroo `_ + - Server + - Numeric + - Yes + - Yes + - Yes + - AG-Grid + - Yes + * - `ipydatagrid `_ + - Server + - JSON + - No + - Yes + - Yes + - Lumino DataGrid (canvas) + - No + * - `Perspective `_ + - Both + - Arrow + - Yes + - Yes + - Yes + - Custom + - No + * - `iTables `_ + - Browser + - JSON + - Yes + - Yes + - No + - datatables (jQuery based) + - Optional + * - `Great Tables `_ + - Browser + - HTML + - Yes + - Yes + - No + - HTML + - No + * - `DTale `_ + - Server + - JSON + - No + - Yes + - Yes + - react-virtualized + - No + * - `Mito `_ + - Server + - JSON + - No + - Yes + - Yes + - Endo (custom) + - No + * - `Marimo `_ + - Server + - JSON + - Yes + - No + - Yes + - tanstack-table + - No + * - `Panel Tabulator `_ + - Both + - JSON + - Yes + - Yes + - Yes + - Tabulator.js + - No + * - `Streamlit `_ + - Server + - Arrow + - No + - No + - Yes + - glide-data-grid (canvas) + - No + * - `quak `_ + - Server + - Arrow + - No + - Yes + - Yes + - Custom HTML + - Yes diff --git a/docs/source/articles/static-embedding.rst b/docs/source/articles/static-embedding.rst new file mode 100644 index 00000000..5a95bc82 --- /dev/null +++ b/docs/source/articles/static-embedding.rst @@ -0,0 +1,180 @@ +Static Embedding & the Incredible Shrinking Widget +==================================================== + +Buckaroo started as a Jupyter widget. You had to install Python, install +Jupyter, install buckaroo, start a kernel, and run a cell — just to see a +table. Then came Marimo and Pyodide, which cut out the kernel but still +needed a Python runtime in the browser. + +Now there's a third option: **static embedding**. A single HTML file that +renders a fully interactive buckaroo table with no server, no kernel, no +Python runtime. Just a browser. + +How it works +------------ + +.. 
code-block:: python + + from buckaroo.artifact import to_html + import pandas as pd + + df = pd.read_csv('sales.csv') + html = to_html(df, title="Sales Data", embed_type="DFViewer") + + with open('sales.html', 'w') as f: + f.write(html) + +That's it. ``to_html()`` does the following: + +1. Runs the buckaroo analysis pipeline on the DataFrame — computing dtypes, + summary stats, histograms, column configs +2. Serializes the data to **base64-encoded Parquet** (much more compact than + JSON, especially for numeric columns) +3. Wraps everything in an HTML template that references ``static-embed.js`` + and ``static-embed.css`` + +The resulting HTML is self-describing. The JS bundle reads the embedded JSON, +decodes the Parquet payload using `hyparquet `_, +and renders the table with AG-Grid — all client-side. + +Two embedding modes +------------------- + +``embed_type="DFViewer"`` (default) + Lightweight table viewer with summary stats pinned at the bottom. + Includes dtypes, histograms, and basic statistics. Smaller payload. + +``embed_type="Buckaroo"`` + The full buckaroo experience: display switcher bar, multiple computed + views (main data, summary stats, other analysis outputs), and the + interactive analysis pipeline UI. Larger payload but more powerful. + +For most documentation and sharing use cases, ``DFViewer`` is the right +choice. + + +Bundle size +----------- + +The ``static-embed.js`` bundle is currently **1.3 MB** (minified). This +includes React, AG-Grid, hyparquet, recharts (for histograms), and lodash-es. + +How does this compare to the data industry? 
+ +========================== ================== +Site Total page weight +========================== ================== +MongoDB 11.5 MB +Confluent 10.7 MB +Snowflake 8.4 MB +Elastic 6.1 MB +dbt Labs 5.0 MB +Fivetran 3.4 MB +Datadog 2.3 MB +Palantir 2.0 MB +Databricks 1.6 MB +**Buckaroo static embed** **~1.3 MB + data** +========================== ================== + +Confluent ships 9.2 MB of JavaScript to show you a marketing page. MongoDB +loads a 1.7 MB Optimizely tracking script before you see a single word of +content. Buckaroo delivers an interactive data viewer — with histograms, +sortable columns, summary stats, and type-aware formatting — in less than +Palantir's homepage JavaScript alone. + +And that 1.3 MB includes the *viewer itself*. Your data is on top of that, +but Parquet-encoded data is compact: a 10,000-row DataFrame with 10 columns +typically adds 50-200 KB depending on column types. + + +What we did to get here +----------------------- + +Recent releases shipped several size optimizations: + +**lodash → lodash-es** (`#624 `_) + Migrated from the CommonJS lodash bundle (which includes every function) + to lodash-es, which is tree-shakeable. Only the functions actually used + end up in the bundle. + +**AG Grid v32 → v33** (`#625 `_) + AG Grid v33 unified its package structure. Instead of importing from + multiple packages (``@ag-grid-community/core``, ``@ag-grid-community/client-side-row-model``, + etc.), there's now a single ``ag-grid-community`` package with module + registration. This lets the bundler do a single pass of tree-shaking + instead of trying to deduplicate across packages. + +**Minification** (`#624 `_) + The ``widget.js`` and ``static-embed.js`` bundles are now minified with + esbuild. Previously they shipped unminified. + +**Parquet encoding** + Switching from JSON arrays to Parquet for the data payload was itself + a size win. A DataFrame with 1000 rows of integers takes ~4 KB in + Parquet vs ~12 KB in JSON. 
The savings compound with row count.


What's next: CDN-hosted viewer
------------------------------

Today, every static embed includes the full 1.3 MB viewer bundle. If you
generate 10 pages, you serve 13 MB of identical JavaScript.

The next step is publishing ``static-embed.js`` to a CDN (e.g., jsDelivr or
a Cloudflare R2 bucket). Each embed page would reference the CDN URL instead
of a local file. The per-page payload drops to just the data — typically
under 200 KB.

This also opens the door to embedding buckaroo tables directly in
GitHub READMEs, documentation sites, and email reports.


For larger data: Parquet range queries
--------------------------------------

Static embeds work great for data that fits in a single HTML file — up to
about 100K rows before the file gets unwieldy. Beyond that, the data should
live separately.

Parquet files are designed for partial reads. The file footer contains a
directory of column chunks with byte offsets. A client can fetch just the
columns and row groups it needs using HTTP range requests — no server
required, just a file on object storage (S3, Cloudflare R2, GCS).

This is the subject of a future post, but the architecture looks like:

1. Parquet file on a private R2 bucket
2. Cloudflare Worker generates a time-limited presigned URL
3. Browser-side buckaroo fetches column chunks via ``Range`` headers
4. Data never flows through your server

See the content plan for details.


Try it
------

.. code-block:: bash

   pip install buckaroo

..
code-block:: python + + from buckaroo.artifact import to_html + import pandas as pd + + # Any DataFrame works + df = pd.read_csv('your_data.csv') + html = to_html(df, title="My Data") + + with open('my-data.html', 'w') as f: + f.write(html) + + # Full buckaroo experience (larger bundle, more features) + html_full = to_html(df, title="My Data", embed_type="Buckaroo") + +The generated HTML references ``static-embed.js`` and ``static-embed.css`` +which are included in the ``buckaroo`` Python package under +``buckaroo/static/``. Copy those files alongside your HTML, or serve them +from a web server. diff --git a/docs/source/articles/types-to-display.rst b/docs/source/articles/types-to-display.rst new file mode 100644 index 00000000..2925c171 --- /dev/null +++ b/docs/source/articles/types-to-display.rst @@ -0,0 +1,334 @@ +How Types and Data Move from Engine to Browser +================================================ + +You have a DataFrame in Python. Moments later it's rendered in a +browser — scrollable, formatted, with histograms in the summary row. +What happened in between? + +This article traces the full path: column renaming, type coercion, +Parquet encoding, base64 transport, hyparquet decoding, and finally the +displayer/formatter system that turns raw values into what you see on +screen. + + +Column renaming: why everything becomes ``a, b, c`` +----------------------------------------------------- + +The very first thing buckaroo does when serializing a DataFrame is +rename every column. The original column ``"revenue"`` becomes ``a``. +``"cost"`` becomes ``b``. The 27th column becomes ``aa``, then ``ab``, +``ac``, and so on — base-26 using lowercase ASCII. + +Why? Two reasons: + +1. **Column names can be anything.** Tuples (from MultiIndex), integers, + strings with spaces and special characters, even a column literally + called ``"index"``. Parquet column names must be strings. AG-Grid + field names should be simple identifiers. 
Renaming to ``a, b, c`` + sidesteps every edge case at once. + +2. **Collision avoidance.** When a DataFrame has a column named + ``"index"`` and we need to serialize the actual index as a column + too, there's a name collision. Renaming to short opaque names means + the index columns (``index``, ``index_a``, ``index_b`` for + MultiIndex levels) never collide with data columns. + +The original name is preserved in the ``column_config`` that travels +alongside the data. On the JS side, each column's ``header_name`` +(or ``col_path`` for MultiIndex) tells AG-Grid what to display in the +header. The user never sees ``a, b, c`` — they see the real names. + +.. code-block:: python + + # In styling_core.py — fix_column_config maps col→header_name + base_cc['col_name'] = col # "a" + base_cc['header_name'] = str(orig_col_name) # "revenue" + + +Cleaning before serialization +------------------------------ + +Python's type system is richer than what Parquet (or JSON) can express +directly. Before writing to Parquet, buckaroo coerces the awkward types: + +.. list-table:: + :header-rows: 1 + :widths: 30 30 40 + + * - Python type + - Becomes + - Why + * - ``pd.Period`` (e.g. "2021-01") + - ``str`` + - Parquet has no period type + * - ``pd.Interval`` (e.g. ``(0, 1]``) + - ``str`` + - Parquet has no interval type + * - ``pd.Timedelta`` + - ``str`` (e.g. "1 days 02:03:04") + - fastparquet can't encode timedeltas + * - ``bytes`` (e.g. from ``pl.Binary``) + - hex string (e.g. ``"68656c6c6f"``) + - Parquet object columns need strings + * - PyArrow-backed strings + - ``object`` dtype + - fastparquet needs object, not ArrowDtype + * - Timezone-naive datetimes + - UTC datetimes + - Avoids ambiguous serialization + +For the main DataFrame, this happens in ``to_parquet()`` +(``serialization_utils.py``). The function also calls +``prepare_df_for_serialization()`` which does the column rename and +flattens MultiIndex levels into regular columns (``index_a``, +``index_b``, etc.). 
+ +Summary stats have an additional wrinkle: each column's stats dict +contains mixed types (strings like ``"int64"`` for dtype, floats for +mean, lists for histogram bins). fastparquet can't handle mixed-type +columns, so ``sd_to_parquet_b64()`` JSON-encodes every cell value first, +making each column a pure string column. The JS side knows to +``JSON.parse`` each cell back. + +.. code-block:: python + + # Every cell becomes a JSON string before parquet encoding + def _json_encode_cell(val): + return json.dumps(_make_json_safe(val), default=str) + + +Parquet encoding and base64 transport +-------------------------------------- + +buckaroo uses **fastparquet** with a custom JSON codec to write the +DataFrame to an in-memory Parquet file. Categorical and object columns +get JSON-encoded within the Parquet file (fastparquet's ``object_encoding='json'``). + +The raw Parquet bytes are then base64-encoded into an ASCII string: + +.. code-block:: python + + def to_parquet_b64(df): + raw_bytes = to_parquet(df) + return base64.b64encode(raw_bytes).decode('ascii') + +The result is a tagged payload: + +.. code-block:: json + + {"format": "parquet_b64", "data": "UEFSMQ..."} + +This travels over the wire — via Jupyter's comm protocol, a WebSocket, +or embedded directly in an HTML ``