Content plan#638

Open
paddymul wants to merge 29 commits into main from content-plan

Conversation

@paddymul
Collaborator

No description provided.

@github-actions
Contributor

github-actions bot commented Mar 20, 2026

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.2.dev23491757659

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.2.dev23491757659

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.13.2.dev23491757659" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38a51e33e6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +76 to +78
The HTML file references ``static-embed.js`` and ``static-embed.css``.
These are included in the buckaroo package under ``buckaroo/static/`` —
copy them alongside your HTML or serve them from a web server.

P1: Stop telling users to copy assets that the wheel doesn't ship

Anyone following this section on a clean install will generate HTML that never renders: to_html() hardcodes static-embed.js/css, but the package build still only ships widget.js, compiled.css, standalone.js, and standalone.css (see pyproject.toml's tool.hatch.build.artifacts). The same missing files also break the new DDD pages on ReadTheDocs, because scripts/generate_ddd_static_html.py copies from buckaroo/static/ and only warns when static-embed.* is absent.
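One way to catch this before publishing is a small pre-flight check on the static directory. The sketch below is only an illustration: the helper name `missing_embed_assets` is hypothetical and not part of buckaroo, and the two file names are taken from the comment above. It treats a zero-byte file as missing, on the assumption that an empty placeholder cannot render anything.

```python
from pathlib import Path

# File names come from the review comment; the helper itself is hypothetical.
REQUIRED_EMBED_ASSETS = ("static-embed.js", "static-embed.css")


def missing_embed_assets(static_dir, required=REQUIRED_EMBED_ASSETS):
    """Return the required asset names that are absent or empty in static_dir."""
    static_dir = Path(static_dir)
    missing = []
    for name in required:
        candidate = static_dir / name
        # Assumption: an empty file is as bad as no file for rendering purposes.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing
```

A docs generator could call this and fail the build when the returned list is non-empty, instead of only warning.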

Comment on lines +71 to +72
('weird-types-polars', 'Weird Types (Polars → Pandas)',
pl_df_with_weird_types_as_pandas(),

P2: Generate the DDD Polars page from a real Polars DataFrame

This entry bypasses the Polars static-embed path by converting the sample to pandas before calling to_html(). buckaroo.artifact.prepare_buckaroo_artifact() only switches to PolarsBuckarooWidget for an actual pl.DataFrame, so the published weird-types-polars.html page won't exercise the Polars serializer/analysis at all and can miss regressions like Duration/Decimal formatting while the docs still claim Polars embedding works.
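A guard in the generator could flag this kind of mismatch. The following is a rough sketch under stated assumptions: both function names are hypothetical, the `(slug, title, dataframe)` tuple shape mirrors the entry quoted above, and the "-polars" slug suffix is used as the signal that a page is meant to exercise the Polars path. The type check inspects the class name and module rather than importing polars, so the sketch runs without it installed.

```python
def is_polars_frame(df):
    """Best-effort check that df is a Polars DataFrame without importing polars."""
    cls = type(df)
    # Assumption: Polars frames are classes named DataFrame under the polars package.
    return cls.__name__ == "DataFrame" and cls.__module__.split(".")[0] == "polars"


def check_page_engines(pages):
    """Return slugs that promise Polars but whose data is not a Polars frame.

    pages is an iterable of (slug, title, dataframe) tuples.
    """
    return [
        slug
        for slug, _title, df in pages
        if slug.endswith("-polars") and not is_polars_frame(df)
    ]
```

With a check like this, an entry that converts to pandas before rendering would show up in the returned list.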

paddymul added a commit that referenced this pull request Mar 21, 2026
This file belongs on the content-plan branch (PR #638), not here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 21, 2026
* docs: Dastardly DataFrame Dataset post with static embeds

- DDD article with inline iframe embeds for 10 tricky DataFrames
- Generator script to produce static embed HTML at RTD build time
- RTD config: build JS bundle, generate DDD pages, copy to output
- Fix: coerce Period/Interval/Timedelta to strings in parquet b64
  (hyparquet can't decode pandas Arrow extension types)
- Tests for weird-type parquet roundtrip
- Ship static-embed.js/css in wheel (pyproject.toml artifacts)
- Docs preview link in TestPyPI PR comment
- BuckarooCompare article
- Content plan outline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add voice/personality to DDD post, add type coverage research

Incorporate colorful wording from buckaroo_writing notes into the DDD blog
post intro ("wonderfully variant splendor", "weirdest DataFrames in the
wild", "hard fought experience"). Add Buckaroo's displayability philosophy.

New research doc tests every pandas (classic, extension, arrow-backed) and
polars dtype through both JSON and parquet serialization paths, documenting
which are covered by the DDD and test suite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add full dtype coverage table to DDD post

Cross-engine table showing Yes/not tested/fail for every dtype across
pandas-classic, pandas-arrow, and polars. Footnote noting that serialization
complexity warrants its own follow-up post.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add parquet, JS type, and display columns to dtype table

Expand the dtype coverage table with three new columns showing the full
pipeline: what Parquet physical type each dtype maps to, what JS type
hyparquet produces after decode, and how buckaroo's display formatter
renders it. Footnotes explain BigInt handling, coercion for types without
native Parquet equivalents, and the two serialization failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: credit Cecil Curry / beartype for DDD inspiration

The naming and early shape of the DDD was influenced by an exchange
with leycec on beartype#529. Added a shout-out in the intro.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update copyright years to include 2026

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: link first DDD mention to source on GitHub main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: show actual rendered values in dtype table Buckaroo display column

Replace generic type names with example values buckaroo actually displays:
1,234 for integers, 1d 2h 3m 4s for durations, 68656c6c6f for binary, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix dtype table — actual display values, footnotes, no blank cells

- Replace generic type names with actual buckaroo display values
- Add footnotes for BigInt (stringified, no commas) and Period (time span)
- Fill all blank cells with — for dtypes that don't exist in that engine
- Clarify Period label as "Period (time span)"
- Fix Nullable int/float/bool row with proper values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* WIP

* docs: remove content-plan.md from ddd-post branch

This file belongs on the content-plan branch (PR #638), not here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address inline review comments on DDD article

- Shorten Cecil Curry praise (remove "makes open source worth doing")
- Add LLM-generated DataFrames and inherited pipelines to "why this matters"
- Fix "three values" → "three non-numeric values" for infinity/NaN section
- Remove duplicate None mention from MultiIndex rows section
- Add note about planned MultiIndex-both-axes improvements
- Update footnote 1: "putting together this table exposed areas that need work"
- Fix comma→period typo in footnote 1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 21, 2026
* docs: Dastardly DataFrame Dataset post with static embeds

- DDD article with inline iframe embeds for 10 tricky DataFrames
- Generator script to produce static embed HTML at RTD build time
- RTD config: build JS bundle, generate DDD pages, copy to output
- Fix: coerce Period/Interval/Timedelta to strings in parquet b64
  (hyparquet can't decode pandas Arrow extension types)
- Tests for weird-type parquet roundtrip
- Ship static-embed.js/css in wheel (pyproject.toml artifacts)
- Docs preview link in TestPyPI PR comment
- BuckarooCompare article
- Content plan outline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add voice/personality to DDD post, add type coverage research

Incorporate colorful wording from buckaroo_writing notes into the DDD blog
post intro ("wonderfully variant splendor", "weirdest DataFrames in the
wild", "hard fought experience"). Add Buckaroo's displayability philosophy.

New research doc tests every pandas (classic, extension, arrow-backed) and
polars dtype through both JSON and parquet serialization paths, documenting
which are covered by the DDD and test suite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add full dtype coverage table to DDD post

Cross-engine table showing Yes/not tested/fail for every dtype across
pandas-classic, pandas-arrow, and polars. Footnote noting that serialization
complexity warrants its own follow-up post.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add parquet, JS type, and display columns to dtype table

Expand the dtype coverage table with three new columns showing the full
pipeline: what Parquet physical type each dtype maps to, what JS type
hyparquet produces after decode, and how buckaroo's display formatter
renders it. Footnotes explain BigInt handling, coercion for types without
native Parquet equivalents, and the two serialization failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: credit Cecil Curry / beartype for DDD inspiration

The naming and early shape of the DDD was influenced by an exchange
with leycec on beartype#529. Added a shout-out in the intro.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update copyright years to include 2026

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: link first DDD mention to source on GitHub main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: show actual rendered values in dtype table Buckaroo display column

Replace generic type names with example values buckaroo actually displays:
1,234 for integers, 1d 2h 3m 4s for durations, 68656c6c6f for binary, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix dtype table — actual display values, footnotes, no blank cells

- Replace generic type names with actual buckaroo display values
- Add footnotes for BigInt (stringified, no commas) and Period (time span)
- Fill all blank cells with — for dtypes that don't exist in that engine
- Clarify Period label as "Period (time span)"
- Fix Nullable int/float/bool row with proper values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* WIP

* docs: remove content-plan.md from ddd-post branch

This file belongs on the content-plan branch (PR #638), not here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address inline review comments on DDD article

- Shorten Cecil Curry praise (remove "makes open source worth doing")
- Add LLM-generated DataFrames and inherited pipelines to "why this matters"
- Fix "three values" → "three non-numeric values" for infinity/NaN section
- Remove duplicate None mention from MultiIndex rows section
- Add note about planned MultiIndex-both-axes improvements
- Update footnote 1: "putting together this table exposed areas that need work"
- Fix comma→period typo in footnote 1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* build: auto-create static stubs in hatch hook, remove manual touch hacks

The hatch build hook (initialize2 → initialize) now auto-creates empty
stub files in buckaroo/static/ when the real JS artifacts haven't been
built. This eliminates the need for manual mkdir/touch blocks in CI
and ReadTheDocs.

- scripts/hatch_build.py: rename initialize2 → initialize (was dead
  code), create stubs for editable installs, build JS for wheel builds
- pyproject.toml: use glob for artifacts (buckaroo/static/*.js|*.css)
  instead of incomplete file list
- checks.yml: remove 4 separate mkdir/touch stub blocks
- .readthedocs.yaml: remove pnpm install for js-core (stubs handle it)
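The stub-creation step described above can be sketched roughly as follows. This is not the code in scripts/hatch_build.py: the function name is hypothetical, and the list of stub names is an assumption pieced together from the file names mentioned in this PR. Existing files are left untouched so that real build output is never clobbered.

```python
from pathlib import Path

# Assumed list of artifacts, based on file names mentioned in this PR.
STUB_NAMES = (
    "widget.js",
    "widget.css",
    "compiled.css",
    "standalone.js",
    "standalone.css",
    "static-embed.js",
    "static-embed.css",
)


def ensure_static_stubs(static_dir, names=STUB_NAMES):
    """Create empty placeholder files for any missing artifact; return the names created."""
    static_dir = Path(static_dir)
    static_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in names:
        target = static_dir / name
        if not target.exists():
            target.touch()  # empty stub, only so imports and packaging succeed
            created.append(name)
    return created
```

In a hatch build hook, a helper like this would presumably run from the hook's `initialize` method for editable installs, while wheel builds would run the real JS build instead.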

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* build: stop tracking standalone.js/css in git

These are build artifacts produced by full_build.sh. They're already
in .gitignore but were force-tracked. The glob artifact pattern in
pyproject.toml ensures they're still included in the wheel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: build all static assets in hatch hook pnpm fallback

The pnpm fallback only built buckaroo-widget (widget.js/css) but not
compiled.css, standalone.js/css, or static-embed.js/css. A bare
`uv build` would produce an incomplete wheel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
paddymul and others added 5 commits March 21, 2026 13:00
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Post 1: Dastardly DataFrame Dataset with inline iframe embeds
- Post 3: Static Embedding & the Incredible Shrinking Widget
- Post 5: Buckaroo Embedding Guide
- Post 8: BuckarooCompare — Diff Your DataFrames
- Script to generate DDD static HTML pages at docs build time
- RTD config runs generate_ddd_static_html.py before copying extra-html
- Fleshed out content-plan.md with 9-post publishing sequence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Touch empty static files (compiled.css, widget.js, etc.) before
  running generate_ddd_static_html.py so anywidget import succeeds
  without a full JS build
- Widen RST table columns to fit "Buckaroo static embed" row

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a step to the CheckDocs job that comments on PRs with the
ReadTheDocs preview URL and links to key article pages.
Uses the same create-or-update pattern as the TestPyPI comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
paddymul and others added 4 commits March 21, 2026 13:00
Converts so-you-want-to-write-a-dataframe-viewer to RST with a proper
list-table for the comparison of open source DataFrame viewers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates based on research into each project:
- Buckaroo: Static Export → Yes
- Perspective: Both server/browser, Arrow serialization, Jupyter compatible
- DTale: JSON confirmed, No static export, uses react-virtualized
- Marimo: JSON confirmed, not Jupyter compatible, not anywidget
- ipydatagrid: No static export (confirmed broken), Lumino DataGrid
- Mito: Endo (custom) table viewer
- iTables: anywidget optional
- Add quak (manzt): DuckDB-backed, Arrow, anywidget
- Add hyperlinks to all project GitHub repos

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mparison as published

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DDD article, pyproject.toml, and readthedocs.yaml were modified by
earlier commits on this branch but the canonical versions were merged
to main via PRs 641 and 642. Reset to main to avoid regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove Python code block for column renaming (not needed)
- Drop "smaller payloads" reason — not true for Parquet (column names
  stored once in metadata, not per-row)
- Clarify caching is buckaroo's LRU in resolveDFData.ts, not hyparquet
- Add TypeScript type flow diagram (string → ArrayBuffer → rows → cells)
- Add "why it ended up this way" section: evolution from default AG-Grid
  + pandas JSON (slow) to Parquet
- Explain BigInt flow: fast INT64 on Python side, hyparquet decodes as
  BigInt, buckaroo stringifies only when > MAX_SAFE_INTEGER
- Explain duration flow: whole column coerced to string in Python,
  parsed back to human-readable on JS side
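The BigInt rule in the list above can be illustrated in a few lines. This is a Python sketch of logic that actually lives on the JS side, so the function name is hypothetical. Applying the threshold to the absolute value (so large negative integers are also stringified) is an assumption; the commit message only mentions values above MAX_SAFE_INTEGER.

```python
# JavaScript's Number.MAX_SAFE_INTEGER is 2**53 - 1.
MAX_SAFE_INTEGER = 2**53 - 1


def to_display_value(value):
    """Keep safe integers numeric; stringify only those outside the safe range."""
    # Assumption: the check is symmetric for negative values.
    if abs(value) > MAX_SAFE_INTEGER:
        return str(value)
    return value
```

The point of the rule is that most INT64 columns stay numeric (and so keep number formatting), while only the values a JS `number` cannot represent exactly fall back to strings.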

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop nonexistent "Hytable" from intro
- Add Panel Tabulator and Streamlit st.dataframe to comparison table
- Add hyperlinks throughout prose: AG-Grid, Perspective, glide-data-grid,
  tanstack-table, react-data-grid, anywidget, JS Jabber podcast
- Update glide-data-grid description (actively maintained, not abandoned)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ting

- Add before/after comparison: 4 jobs in 6 min → 23 jobs in 7 min
- Add section on dual dependency testing strategy (min pinned + max versions)
- Explain why pandas/pyarrow/polars compatibility testing needs fast CI
- Link to Depot open source sponsorship program

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dec 24 (pre-Depot): 3 jobs. Now: 23 jobs (20 added since sponsorship).
Itemized breakdown of what was added: 6 Playwright suites, 8 Python
matrix jobs, MCP integration, smoke tests, screenshots, docs, TestPyPI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…speed

Same-pipeline benchmarks show Depot isn't measurably faster than GitHub
runners (I/O-bound workload). The real value is consistent provisioning
(no Monday afternoon queue delays) and no minute quotas, which gave
confidence to invest in CI optimization and grow from 3 to 23 jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- "How I made Buckaroo fast" — do less, not optimize more
- "Testing Buckaroo" — unit, integration, Playwright, screenshots,
  smoke, MCP, dual deps, DDD as test suite
- Mark Depot article as draft with pending CTO input

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Article now includes a 4-scenario comparison table with real data:
- GitHub Sunday night (3m15s), Monday sequential (5m19s), Monday
  parallel (5m58s), Depot Monday parallel (4m18s)
- Key finding: per-job Depot is slightly slower, but consistent
  provisioning (19s stagger vs 114s) wins on critical path
- GitHub ranges 3m–6m+, Depot is 4m–4m30s regardless
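The two quantities compared above, provisioning stagger and end-to-end time, can be computed from per-job timestamps. The sketch below is hypothetical and is not the logic of the scripts in this PR: the function name, the `started_at` and `completed_at` field names, and the sample timestamps are all made up for illustration. It also uses first-start-to-last-finish wall-clock time as a simple stand-in for the critical path, which ignores job dependencies.

```python
from datetime import datetime


def summarize_run(jobs):
    """Return provisioning stagger and wall-clock duration, both in seconds.

    jobs is a list of dicts with ISO 8601 'started_at' and 'completed_at' strings.
    """
    starts = [datetime.fromisoformat(job["started_at"]) for job in jobs]
    ends = [datetime.fromisoformat(job["completed_at"]) for job in jobs]
    first_start = min(starts)
    return {
        # How long after the first job the last job began running.
        "stagger_s": (max(starts) - first_start).total_seconds(),
        # Time from the first job starting to the last job finishing.
        "wall_clock_s": (max(ends) - first_start).total_seconds(),
    }


# Made-up timestamps, for illustration only.
example_jobs = [
    {"started_at": "2026-01-05T10:00:00+00:00", "completed_at": "2026-01-05T10:03:00+00:00"},
    {"started_at": "2026-01-05T10:00:30+00:00", "completed_at": "2026-01-05T10:05:00+00:00"},
]
print(summarize_run(example_jobs))
```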

New scripts:
- ci_critical_path.sh: critical path for a single run
- ci_list_runs.sh: list runs for a PR or branch
- ci_all_timings.sh: JSON output of all timing data per run
- ci_timing_table.py: formatted comparison table from JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete rewrite with controlled benchmark results:
- 21 runs across 6 scenarios (cold/warm, parallel/sequential, Sunday/Monday)
- GitHub Monday: 7m46s ±143s. Depot Monday: 4m03s ±20s.
- Key finding: per-job Depot is slightly slower, but consistent
  provisioning (all jobs start in 20s vs 1-7 min stagger) wins
- Includes reproduction steps with scripts
- Restructured: benchmark → results → analysis → before/after → advice

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix "GitHub Actions is" grammar
- Explain per-job slowness: Depot has per-runner provisioning overhead,
  but provisions all in parallel; GitHub provisions sequentially from pool
- Verify Monday cache write speed (GH 0.8s vs Depot 2.1s, confirmed)
- Fix "Monday morning" typo
- Remove "no minute quotas" point — reads as "Depot is great if free"
- Renumber value prop list (now 2 items: consistency + confidence)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The scariest part was transferring paddymul/buckaroo to the buckaroo-data
org (required for Depot's open source program). Feared losing GitHub
stars — turns out GitHub's transfer preserves everything (stars, issues,
PRs, forks, redirects).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>