Releases · EleutherAI/bergson

HuggingFace Dataset column access (ds["length"]) returns a PyArrow Column, not a Python list. Iterating over it element-by-element (via sorted(), random indexing) is ~1000x slower than on a native list. For 10M items this caused allocate_batches to hang for 13+ hours instead of completing in ~17 seconds.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Convert PyArrow columns to list at callsites of allocate_batches (5d734dc)

Move the list conversion out of allocate_batches (which types doc_lengths as list[int]) to the callsites that pass HF Dataset columns. Use ds["length"][:] which returns a plain list[int].

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
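
The callsite conversion described above can be sketched as follows. This is a minimal illustration, not bergson's code: SlowColumn is a hypothetical stand-in for a dataset column object, and allocate_batches_sketch is a hypothetical greedy batcher that only shares the doc_lengths: list[int] parameter type with the real allocate_batches. The point is that the bulk slice happens once at the callsite, so the batching function only ever touches a native list.

```python
class SlowColumn:
    """Hypothetical stand-in for a column object with costly per-element access."""

    def __init__(self, values):
        self._values = list(values)
        self.element_reads = 0  # counts single-element lookups

    def __len__(self):
        return len(self._values)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Bulk slice: one conversion that returns a plain Python list.
            return self._values[key]
        self.element_reads += 1
        return self._values[key]


def allocate_batches_sketch(doc_lengths: list[int], max_tokens: int) -> list[list[int]]:
    """Hypothetical greedy batcher (assumption: not bergson's actual algorithm)."""
    # Visit documents from longest to shortest.
    order = sorted(range(len(doc_lengths)), key=lambda i: doc_lengths[i], reverse=True)
    batches, current, used = [], [], 0
    for i in order:
        # Start a new batch when adding this document would exceed the budget.
        if current and used + doc_lengths[i] > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(i)
        used += doc_lengths[i]
    if current:
        batches.append(current)
    return batches


column = SlowColumn([5, 3, 8, 2])
lengths = column[:]  # convert once at the callsite, mirroring ds["length"][:]
batches = allocate_batches_sketch(lengths, max_tokens=10)
```

Keeping the conversion at the callsite leaves the function's list[int] annotation truthful and avoids copying when a caller already holds a list.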

Remove redundant zero-fill loop in MemmapSequenceScoreWriter (558829f)

np.memmap w+ mode already creates a zero-filled file, making the per-field written flag initialization loop unnecessary. For large datasets (10M+ items) with many query scores, the strided writes through the structured dtype caused multi-hour hangs.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
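
A small sketch of the behaviour this fix relies on, assuming NumPy is installed and the platform zero-fills newly created files (as mainstream operating systems do). The structured dtype and its field names (score, written) are hypothetical and are not taken from MemmapSequenceScoreWriter.

```python
import os
import tempfile

import numpy as np

# Hypothetical record layout: one score plus a "written" flag per item.
dtype = np.dtype([("score", np.float32), ("written", np.bool_)])
path = os.path.join(tempfile.mkdtemp(), "scores.bin")

# mode="w+" creates (or overwrites) the file at its full size.
scores = np.memmap(path, dtype=dtype, mode="w+", shape=(1000,))

# No explicit zero-fill loop has run, yet every flag already reads False.
all_unwritten = not scores["written"].any()

# Mark a single record as written.
scores["score"][3] = 0.5
scores["written"][3] = True
scores.flush()
```

Because the flags start out as zero bytes, an initialization pass over each field would only repeat work the file creation has already done.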

Use [:] instead of list() for consistency (c76d131)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Detailed Changes: v0.6.1...v0.6.2

v0.6.1 (2026-03-02)

Released 02 Mar 00:53 by github-actions · commit b983984

This release is published under the MIT License.

Bug Fixes

Unpin transformers by explicitly setting float32 dtype in tests (0b6c226)

Transformers 4.56+ changed from_config() to honor the config's torch_dtype field, causing test models (tiny-GPTNeoX, tiny-Phi3) to be created in float16 instead of float32. This caused gradient comparison tests to fail from reduced precision, not from any actual change in gradient collection logic.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
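
To illustrate why a dtype change alone can break a tight numerical comparison, here is a standard-library sketch that rounds one value through half and single precision. The reference value and the tolerance are arbitrary assumptions chosen for illustration; they are not taken from bergson's tests, and no model or gradient is involved.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given binary float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


reference = 0.1234567  # arbitrary stand-in for a single gradient entry
tolerance = 1e-6       # hypothetical comparison tolerance

# "<e" is IEEE 754 half precision (float16), "<f" is single precision (float32).
err_half = abs(round_trip(reference, "<e") - reference)
err_single = abs(round_trip(reference, "<f") - reference)

half_passes = err_half <= tolerance
single_passes = err_single <= tolerance
```

The same value passes the check in float32 and fails it in float16, even though nothing about how the value was computed has changed.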

Detailed Changes: v0.6.0...v0.6.1

v0.6.0 (2026-02-17)

Released 17 Feb 04:54 by github-actions · commit bcb862b

This release is published under the MIT License.

Bug Fixes

Use _csv._writer type for csv_recorder annotation (6e6289c)

csv.writer is a function, not a class, so it cannot be used as a type annotation. Import the private _writer type from _csv and use it for the Generator yield type. Also fix the None check to use if not path since QueryConfig.record uses empty string as the sentinel value.

Co-authored-by: Lucia Quirke luciaquirke@users.noreply.github.com
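
A sketch of the pattern this fix describes, under stated assumptions: the function name csv_recorder comes from the commit title, but its signature and body here are guesses, not bergson's implementation. The _writer name is assumed to be available in the type stubs only, so it is imported under TYPE_CHECKING and never evaluated at runtime.

```python
from __future__ import annotations

import csv
from contextlib import contextmanager
from typing import TYPE_CHECKING, Generator

if TYPE_CHECKING:
    # Assumption: this name exists for type checkers, not at runtime.
    from _csv import _writer


@contextmanager
def csv_recorder(path: str) -> Generator[_writer | None, None, None]:
    """Yield a CSV writer for path, or None when recording is disabled."""
    # An empty string is the "no recording" sentinel, so test truthiness
    # rather than comparing against None.
    if not path:
        yield None
        return
    with open(path, "w", newline="") as f:
        yield csv.writer(f)
```

A caller would write rows inside a with csv_recorder(path) as writer: block and skip writing when writer is None. The file is closed when the block exits, which is what a try/finally block would otherwise have to handle.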

Continuous Integration

Pin pyright version and fix faiss type error (b9f54cf)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Use Python 3.11 for typechecking (9ef4122)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Use Python 3.11 for typechecking (ea50dd8)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

Add --record flag to query CLI for saving results to CSV (59770ff)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Refactoring

Replace try/finally CSV block with context manager (6431320)

Co-authored-by: Lucia Quirke luciaquirke@users.noreply.github.com

Detailed Changes: v0.5.2...v0.6.0

Releases: EleutherAI/bergson

v0.9.1 (2026-04-10): Bug Fixes

v0.9.0 (2026-03-18): Bug Fixes, Features

v0.8.1 (2026-03-18): Bug Fixes

v0.8.0 (2026-03-08): Features

v0.7.2: What's Changed

v0.7.1 (2026-03-03): Bug Fixes

v0.7.0 (2026-03-03): Bug Fixes, Features

v0.6.2 (2026-03-02): Bug Fixes

v0.6.1 (2026-03-02): Bug Fixes

v0.6.0 (2026-02-17): Bug Fixes, Continuous Integration, Features, Refactoring