
Prepare v0.2.0 release #744

Merged: Cyan4973 merged 577 commits into release from dev on May 7, 2026

Cyan4973 commented May 7, 2026

No description provided.

Cyan4973 and others added 30 commits March 7, 2026 18:43
Summary:
This new test is designed to detect issues like #282

In the process, it was discovered that #282 was not enough to fix compatibility issues,
so this PR bundles an additional `cmake` fix.

Pull Request resolved: #423

Reviewed By: terrelln

Differential Revision: D94418093

Pulled By: Cyan4973

fbshipit-source-id: a1064c2affff30b5c0e05d42d17454c9e43bef5a
Summary:
`zli` already supports `le-u64`, `le-u32` and `le-u16`,
but strangely offers nothing for 8-bit numeric data.

This PR fixes that: `u8` and `i8` are now selectable profiles,
and there are tests to check the new support.

_note: there is no `le-` prefix, since 8-bit numeric data has no endianness_.

Among the desirable properties,
it's possible to train a `u8` profile, which then starts with a `num8` conversion.

It will also be possible to pass such data to `num8`-specific Selectors in the future.

Pull Request resolved: #483

Reviewed By: terrelln

Differential Revision: D95691133

Pulled By: Cyan4973

fbshipit-source-id: 08d7835087e5dfcca1208be76c5bf5e4d99b9b39
Summary:
Pull Request resolved: #457

This diff splits `CompilerTest.cpp` into separate test files per compiler stage. This makes it easier to test the functionality of each component in isolation, e.g., making sure the parser produces the expected AST before optimizations.

Reviewed By: Cyan4973

Differential Revision: D95084293

fbshipit-source-id: 60dc1e8a63fccdfcbcf0b6f082584a83ca7539d7
Summary: Pull Request resolved: #481

Differential Revision: D95033895

fbshipit-source-id: ed3249c02f357e7de9815ada711b39d525c94087
Summary:
Pull Request resolved: #475

Two fixes in dev & release ROLZ

Reviewed By: kevinjzhang

Differential Revision: D95604468

fbshipit-source-id: 377023da42cebf135dce4099b118ad10ea5b4d9a
Summary: Pull Request resolved: #480

Differential Revision: D95043302

fbshipit-source-id: 81bd9061eb6f0a9f5731fd40dab3bfd341aa692b
Summary:
Pull Request resolved: #467

# Stack
The goal of this stack is to add support for variable assignments and assume operations to SDDL2.

# Diff
This diff implements variable assignment and reference support in the code generator. This enables SDDL2 programs to use global variables for parameterizing field sizes and reusing type definitions.

- Implemented `Op::ASSIGN` codegen: evaluates the RHS expression, allocates a VM register via `getOrAssignRegister()`, and emits `var.store`.
- Implemented `ASTVar` reference codegen: looks up the variable's register and emits `var.load`.  If this is the last reference to a particular variable, returns the register to the free pool.
- Added a register allocator (`var_registry_`, `free_registers_`, `next_register_`) that reuses freed registers and enforces the `SDDL2_VAR_REGISTER_COUNT` (256) limit at compile time.
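The allocator described above can be sketched in C as follows. This is a minimal illustration under assumptions: `RegAlloc` and its functions are hypothetical names, the variable-name-to-register map (`var_registry_`) is omitted, and only the free-pool reuse and the 256-register limit mirror the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit mirroring SDDL2_VAR_REGISTER_COUNT (256). */
#define REG_COUNT 256

typedef struct {
    int free_regs[REG_COUNT]; /* stack of registers returned to the pool */
    size_t num_free;
    int next_reg;             /* next register index never handed out yet */
} RegAlloc;

static void regalloc_init(RegAlloc* ra) {
    ra->num_free = 0;
    ra->next_reg = 0;
}

/* Returns a register index, or -1 when all REG_COUNT registers are live
 * (the point where a compiler would report a compile-time error). */
static int regalloc_acquire(RegAlloc* ra) {
    if (ra->num_free > 0) {
        return ra->free_regs[--ra->num_free]; /* reuse most recently freed */
    }
    if (ra->next_reg >= REG_COUNT) {
        return -1;
    }
    return ra->next_reg++;
}

/* Called after the last reference to a variable. */
static void regalloc_release(RegAlloc* ra, int reg) {
    assert(reg >= 0 && reg < ra->next_reg);
    assert(ra->num_free < REG_COUNT);
    ra->free_regs[ra->num_free++] = reg;
}
```

Releasing a register after a variable's last reference makes it the next one handed out, so programs with many short-lived variables can stay under the limit.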

Reviewed By: Cyan4973

Differential Revision: D94947414

fbshipit-source-id: 2626f437bd16fb7891a689131e5ef4f3b927de1a
Summary:
Pull Request resolved: #468

# Stack
The goal of this stack is to add support for variable assignments and assume operations to SDDL2.

# Diff
This diff adds a `MEMBER` op to the SDDL grammar, which can be used to access fields of consumed records. This operator (`.`) was already supported in the syntax.

Reviewed By: Cyan4973

Differential Revision: D95396139

fbshipit-source-id: b1ba4f6c16ff9bd7fd633b01611206ab0b1f4c98
Summary:
Pull Request resolved: #471

# Stack
The goal of this stack is to add support for variable assignments and assume operations to SDDL2.

# Diff
This diff handles the `MEMBER` op in the dead var optimizer.

Reviewed By: Cyan4973

Differential Revision: D95396507

fbshipit-source-id: 21cd50847087d86b8e17a4258f0116e47c988f0e
Summary:
Pull Request resolved: #431

Add bitsplit_fp codec support to OpenZL for floating-point data compression. This extends the existing bitsplit codec family to handle floating-point types, enabling more efficient compression of floating-point data streams in the OpenZL framework.

Reviewed By: Cyan4973

Differential Revision: D94410436

fbshipit-source-id: e42fa9ec08ec3971eaa22c1b6381c91036ef6441
Summary:
Pull Request resolved: #470

- Adds `testCodecVO` and `AssertEqVOFunctionGraph` to the codec
  test framework, enabling tests to verify that a codec produces the expected output
  streams from a single input
- Adds `FP32_OutputOrder`, `FP64_OutputOrder`, and `FP16_OutputOrder` tests that verify
  bitsplit_fp splits IEEE 754 floats into the correct mantissa, exponent, and sign streams
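To illustrate what splitting into sign, exponent, and mantissa streams means for fp32, here is a small sketch. It assumes `float` is IEEE 754 binary32 and stores each field in its own array; the array widths and the names `split_fp32`/`join_fp32` are hypothetical, not the codec's actual output layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Splits binary32 values into sign (1 bit), exponent (8 bits), and
 * mantissa (23 bits) streams, one array per field. */
static void split_fp32(const float* in, size_t n,
                       uint8_t* sign, uint8_t* exponent, uint32_t* mantissa) {
    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &in[i], sizeof bits); /* reinterpret bits without UB */
        sign[i] = (uint8_t)(bits >> 31);
        exponent[i] = (uint8_t)((bits >> 23) & 0xFF);
        mantissa[i] = bits & 0x7FFFFF;
    }
}

/* Inverse of split_fp32 for a single value. */
static float join_fp32(uint8_t sign, uint8_t exponent, uint32_t mantissa) {
    uint32_t bits = ((uint32_t)sign << 31) | ((uint32_t)exponent << 23) | mantissa;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Grouping fields this way tends to help compression because exponents of similar-magnitude values repeat, while mantissas are closer to random.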

Reviewed By: Cyan4973

Differential Revision: D95400978

fbshipit-source-id: 292c3cf6010009dd9db8e5e81a5a0df9a087f0e2
…itBench benchmarks (#478)

Summary:
Pull Request resolved: #478

Add specialized bitSplit encoder fast paths for IEEE 754 floating-point types
(fp16, fp32, fp64) and unitBench benchmark scenarios to measure their performance.

Kernel changes (encode_bitSplit_kernel.c):
- Wire up the fp16, fp32, and fp64 fast-path dispatches in ZL_bitSplitEncode

Benchmark results (10MB random data, before -> after fast path):

  | Scenario            | Generic (MB/s) | Fast path (MB/s) | Speedup |
  |---------------------|----------------|------------------|---------|
  | bitSplitEncode_fp16 | 544            | 17,286           | 31.8x   |
  | bitSplitEncode_fp32 | 1,130          | 17,890           | 15.8x   |
  | bitSplitEncode_fp64 | 2,045          | 21,037           | 10.3x   |

New unitBench scenarios:
- bitSplitEncode_fp16: kernel-level fp16 encode benchmark
- bitSplitEncode_fp64: kernel-level fp64 encode benchmark

Also adds a unitbench-scenarios Claude skill for creating future benchmarks.

Reviewed By: Cyan4973

Differential Revision: D95618687

fbshipit-source-id: cf68e2109a57bea263de5e84474df2e3cea55fcd
Summary:
Pull Request resolved: #436

Add a helper to save all input streams to a codec with a given name. This is used to extract the offsets that are being passed into the bucketing codec for benchmarking.

Also, print a summary of the results at the end of benchmarking.

Reviewed By: Cyan4973

Differential Revision: D94680942

fbshipit-source-id: 0adaa56f32920ac256adfabb2ddd3d8d79775520
Summary:
Pull Request resolved: #435

1. Improve algorithm to find partitions
2. Improve encoding speed
3. Other misc changes to bucketing
4. Print 3 decimals in compression ratio

Reviewed By: Cyan4973

Differential Revision: D94680941

fbshipit-source-id: 0bd1cc7635a7b56c81cf58f8c950ec58127547aa
Summary:
Pull Request resolved: #490

Fixes fuzzing issues found with the Bitpack OpenZLComponent: when generating inputs during fuzzing, it did not check whether the head node was illegal.

Reviewed By: daniellerozenblit

Differential Revision: D95835522

fbshipit-source-id: 9e3d8215ae53fbcda82bb0d5617521f821f05530
Summary:
Pull Request resolved: #472

# Stack
The goal of this stack is to add support for variable assignments and assume operations to SDDL2.

# Diff
The diff updates the semantic analyzer to support the `MEMBER` operation. This mainly validates that operands are semantically correct (e.g., LHS is a consumed record, RHS is a field).

Reviewed By: Cyan4973

Differential Revision: D95430690

fbshipit-source-id: 981dccc6529a5500bf293d39790159bb66935914
Summary:
Pull Request resolved: #476

1. Fix an invalid shift in the conversion decoder
2. Fix the segmenter with `ZL_Type_string` input, which was broken in 2 places in `stream.c`, and add a unit test
3. Catch exceptions in the `FuzzCompress` fuzzer. I initially hoped that permissive mode would be enough, but when we run into limits such as too many streams or too many nodes, permissive mode also fails.

Reviewed By: Cyan4973

Differential Revision: D95604467

fbshipit-source-id: 9d39ae7abc5c6ed8932a4d9aeb900028f2a0dcea
Summary:
Pull Request resolved: #489

Fixes fuzzer-health failures in the no-generator variants of the fuzzers.

Reviewed By: terrelln

Differential Revision: D95816404

fbshipit-source-id: b8c1a64256e852a8ad5322ef20e9deb48ea25d17
Summary:
Pull Request resolved: #366

Added new stream preview feature to visualization (appears for both normal and proxy nodes).
- preview shows up in popover UI
    - popover closes when user clicks on another item
    - preview is limited to just 6 lines (4 lines of data + header/optional footer) and character length is restricted to 45 characters
- Old CBOR files without a stream preview still work, since the stream preview button only appears when a stream preview is defined

Reviewed By: Victor-C-Zhang

Differential Revision: D91596594

fbshipit-source-id: 58c82ae8b585ae981fde19d8d21f782d381bdf9a
Summary:
to compress single-byte repeated patterns within numeric streams.

This saves 1-2 bytes per generated numeric stream that features a single-byte repeat pattern.

Pull Request resolved: #482

Reviewed By: terrelln

Differential Revision: D95689234

Pulled By: Cyan4973

fbshipit-source-id: d84e82dd37053719207f40a5f1f2a199b9500fcc
Summary:
Pull Request resolved: #474

# Stack
The goal of this stack is to add support for variable assignments and assume operations to SDDL2.

# Diff
This diff extracts register allocation logic from `CodeGenerator` into a dedicated `RegisterAllocator` utility class. This improves code organization.

Reviewed By: Cyan4973

Differential Revision: D95582393

fbshipit-source-id: 8a6ec53b60d55286a0204fe2a2fb3d6c811264b5
Summary:
Pull Request resolved: #485

This diff expands the SDDL semantic analyzer to prevent variable reassignment. This makes implementation of codegen much simpler in a lot of cases.
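A minimal version of such a check, assuming the analyzer can list assignment targets in program order (the function name is hypothetical). Once every variable is assigned exactly once, codegen can treat each variable as a single stored value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first assignment whose target name was already
 * assigned earlier, or -1 if every variable is assigned exactly once.
 * Quadratic in the number of assignments, which keeps the sketch simple. */
static long find_reassignment(const char* const* targets, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (strcmp(targets[i], targets[j]) == 0) {
                return (long)i;
            }
        }
    }
    return -1;
}
```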

Reviewed By: Cyan4973

Differential Revision: D95802815

fbshipit-source-id: da542c8b8e3e67e7e74a0e13130b22bcceab235b
Summary:
Pull Request resolved: #484

# Stack
The goal of this stack is to add support for variable assignments and assume operations to SDDL2.

# Diff
This diff adds assume support for builtin fields and record types in SDDL2, enabling the compiler to handle assume expressions that constrain field values and record type constraints.

## Key Changes
- **Refactored CodeGenerator:** Replaced monolithic generateNode() with specialized generators (generateOp(), generateValue(), generateType()) for cleaner separation of concerns.

- **Assume codegen:** Saves buffer position before consuming, then stores either the loaded value (builtin types) or position (records) in a register for later access.

- **Member access:** Added generateMember() to handle nested field access (e.g., foo.bar.x) by walking record fields and accumulating byte offsets.

- **AssemblyOutput class:** New helper using std::list for efficient instruction concatenation via splice().

- **State tracking:** Added type_aliases_ and assumed_types_ maps to track type information for member access codegen.
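The member-access walk (e.g. `foo.bar.x`) can be sketched as a loop that descends one record per path component while summing field offsets. The `Record`/`Field` model and `member_offset` are hypothetical and assume packed fields with fixed sizes; the real `generateMember()` works on the compiler's AST and type maps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Record Record;
typedef struct {
    const char* name;
    size_t size;          /* byte size of the field */
    const Record* nested; /* NULL for builtin (non-record) fields */
} Field;
struct Record {
    const Field* fields;
    size_t num_fields;
};

/* Resolves a path such as {"bar", "x"} relative to `rec`, accumulating the
 * byte offset of each field. Returns 0 on success, -1 if a name is unknown
 * or a member of a builtin field is requested. */
static int member_offset(const Record* rec, const char* const* path,
                         size_t path_len, size_t* offset_out) {
    assert(offset_out != NULL);
    size_t offset = 0;
    for (size_t p = 0; p < path_len; p++) {
        if (rec == NULL) return -1; /* previous component was not a record */
        size_t field_offset = 0;
        const Field* found = NULL;
        for (size_t f = 0; f < rec->num_fields; f++) {
            if (strcmp(rec->fields[f].name, path[p]) == 0) {
                found = &rec->fields[f];
                break;
            }
            field_offset += rec->fields[f].size; /* fields are packed */
        }
        if (found == NULL) return -1;
        offset += field_offset;
        rec = found->nested;
    }
    *offset_out = offset;
    return 0;
}
```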

Reviewed By: Cyan4973

Differential Revision: D95685706

fbshipit-source-id: 20f42b5d9624e1bde1a3a2efc2e0bac0d7f8df61
Summary: Pull Request resolved: #496

Reviewed By: ahilger

Differential Revision: D95998798

fbshipit-source-id: 3898f05b82f9803e1e1921ea8039b5ea517f8c91
Summary:
Pull Request resolved: #459

Move streamdump to chunktrace instead of trace
Add testcase for streamdump

Reviewed By: Victor-C-Zhang

Differential Revision: D95257237

fbshipit-source-id: 8bb656877c94a14d7527fccbdf4c003db03e44a2
Summary:
Pull Request resolved: #497

Fix `os.getenv` boolean handling for `OPENZL_SKIP_VISUALIZER_BUILD` and `OPENZL_USE_SYSTEM_PYTHON_EXTENSION`.

Previously, `False` was used as the default, so the result was a boolean when the variable was unset but a string when it was set. [OPENZL_SKIP_VISUALIZER_BUILD](https://fburl.com/code/ffh2g9s7) and [OPENZL_USE_SYSTEM_PYTHON_EXTENSION](https://fburl.com/code/8y08ameq) are defined as "1", so the code now checks directly for equality with that string.

Reviewed By: daniellerozenblit

Differential Revision: D96188097

fbshipit-source-id: 7a2237de473f31f1a2ff969876c52a17033d2085
Summary:
Pull Request resolved: #464

Add a partition codec that:
* Can replace `ZL_NODE_QUANTIZE_OFFSETS` (preset present, but implementation not optimized for this yet)
* Can replace `ZL_NODE_QUANTIZE_LENGTHS` (preset present, but implementation not optimized for this yet)
* Can implement the `VarByte16` and `Bucket16` codecs from `contrib/lz-research`

See the [design doc](https://docs.google.com/document/d/15ojKVhaCa1wwecbt3mRCgiy59pveJVQBASKbC2zuzEI/edit?tab=t.6ggqt4jnzgcj#heading=h.wvrp8vwdxgp8) for details.

Reviewed By: Cyan4973

Differential Revision: D95218214

fbshipit-source-id: 29084633943639260e90e09a0ad990a2a0f3c479
Summary:
Pull Request resolved: #463

As title

Reviewed By: mmandina

Differential Revision: D95302280

fbshipit-source-id: b6585e29d41b59d38441bd4b71a26d8f6361118a
Summary:
Pull Request resolved: #491

* Add a segmenter to the PyTorch model parser
* Allow segmenters for older format versions so long as they only emit a single segment

Reviewed By: Cyan4973

Differential Revision: D95605317

fbshipit-source-id: 6d3c48f3f1915a760760f28216d1ae739bee9a83
Summary:
Pull Request resolved: #493

There was a name conflict with `zl_reflection.h`

Reviewed By: daniellerozenblit

Differential Revision: D95872479

fbshipit-source-id: 04951e8d9f866ab8663c01196be50970fbf6c47f
daniellerozenblit and others added 24 commits May 4, 2026 09:04
Summary:
Pull Request resolved: #717

Per the parquet spec, definition/repetition level blocks are omitted from data pages when the column has no `OPTIONAL`/`REPEATED` ancestor. We did not previously handle this case, causing parsing of Parquet files with required columns to fail.

This diff adds tracking of `hasDefinitionLevels` / `hasRepetitionLevels` per leaf in the schema metadata, and the lexer now consumes the repetition and definition level blocks independently, only when each is present.
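Per the Parquet format, a leaf's maximum definition level counts the `OPTIONAL` or `REPEATED` fields on its path, and its maximum repetition level counts the `REPEATED` ones; common implementations write a level block only when the corresponding maximum is non-zero. A sketch of deriving the two flags from a leaf's path follows; the struct, enum, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values follow FieldRepetitionType in parquet.thrift. */
typedef enum { REP_REQUIRED = 0, REP_OPTIONAL = 1, REP_REPEATED = 2 } Repetition;

typedef struct {
    int max_def_level; /* OPTIONAL or REPEATED fields on the path */
    int max_rep_level; /* REPEATED fields on the path */
    bool has_definition_levels;
    bool has_repetition_levels;
} LeafLevels;

/* `path` lists the repetition type of every field from the top-level column
 * down to and including the leaf (the schema root is excluded). */
static LeafLevels leaf_levels(const Repetition* path, size_t depth) {
    LeafLevels l = {0, 0, false, false};
    for (size_t i = 0; i < depth; i++) {
        if (path[i] != REP_REQUIRED) l.max_def_level++;
        if (path[i] == REP_REPEATED) l.max_rep_level++;
    }
    /* A level block is present in the data page only if its max level > 0. */
    l.has_definition_levels = l.max_def_level > 0;
    l.has_repetition_levels = l.max_rep_level > 0;
    return l;
}
```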

Reviewed By: terrelln

Differential Revision: D103245483

fbshipit-source-id: 7b7b9f51c9507cab5f078f311fb8f09b760939ee
Summary:
Pull Request resolved: #699

Extract `parseSize` / `suffixToMultiplier` from `ProfileArgs` into
reusable `checkedstoi`, `checkedstol`, `checkedstoul`, and `checkedstoll`
functions in `cli/utils/util.{h,cpp}`. These accept an optional trailing
size suffix (K, MB, GiB, etc.) and validate the input, replacing the
raw `std::stoi` / `std::stoul` calls that silently ignore trailing
characters.

Migrate all 9 call sites in CLI arg headers (BenchmarkArgs,
CompressArgs, GlobalArgs, TrainArgs) to the new checked variants.
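A C sketch of the same idea follows (the real helpers are C++ in `cli/utils/util.cpp`). `parse_size` is a hypothetical name, and the suffix table is an assumption: every unit here is a power of 1024, whereas the real helpers may treat decimal and binary units differently.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed suffix table; "" means no suffix. */
typedef struct { const char* suffix; unsigned long long mult; } Suffix;
static const Suffix kSuffixes[] = {
    {"", 1ULL},
    {"K", 1ULL << 10}, {"KB", 1ULL << 10}, {"KiB", 1ULL << 10},
    {"M", 1ULL << 20}, {"MB", 1ULL << 20}, {"MiB", 1ULL << 20},
    {"G", 1ULL << 30}, {"GB", 1ULL << 30}, {"GiB", 1ULL << 30},
};

/* Parses "<digits>[suffix]". Unlike a bare strtoull() call, it rejects empty
 * input, signs, unknown trailing characters, and overflow. */
static bool parse_size(const char* s, unsigned long long* out) {
    if (s == NULL || *s < '0' || *s > '9') return false;
    errno = 0;
    char* end = NULL;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno == ERANGE) return false;
    for (size_t i = 0; i < sizeof kSuffixes / sizeof kSuffixes[0]; i++) {
        if (strcmp(end, kSuffixes[i].suffix) == 0) {
            if (v > ULLONG_MAX / kSuffixes[i].mult) return false; /* overflow */
            *out = v * kSuffixes[i].mult;
            return true;
        }
    }
    return false; /* trailing characters that are not a known suffix */
}
```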

Note that moving this to util.h created a circular dependency between `compress_profiles.h` and `util.h`. To break it, this diff also moves `ProfileArgs` method bodies from the header into
`compress_profiles.cpp`.

Reviewed By: kevinjzhang

Differential Revision: D103111345

fbshipit-source-id: f3734d07bf78f22b4c12ba76d54d2215352e60f1
Summary:
Pull Request resolved: #722

Add a minimal VS Code extension that provides TextMate syntax highlighting for OpenZL SDDL2 (`.sddl`) files.

This makes `.sddl` files easier to read and write by colorizing keywords, builtin types, builtin functions, operators, numeric literals, and comments. The extension is intended to be installed locally via a symlink (see README); it is not published.

Reviewed By: Cyan4973

Differential Revision: D103065790

fbshipit-source-id: 5181ec0a99dfad3b543da5d4b4354a9488d6d24d
Summary:
Pull Request resolved: #723

Adds a brief section documenting syntax highlighting support for SDDL files.

Reviewed By: mmandina

Differential Revision: D103700217

fbshipit-source-id: 7003a6ed4ee91274ef269c72a234c9faa860db7f
Summary:
Pull Request resolved: #719

I also just saw some timeouts in non-version-test fuzzers, so opt into the 10-minute initialization timeout globally. Hopefully some of the other speedups will also help, but I don't want to be spammed by this issue.

Reviewed By: Cyan4973

Differential Revision: D103449503

fbshipit-source-id: c1089d78e64ebcba34473b807f88410e4d7ee09b
Summary:
Pull Request resolved: #726

Separate SAO example files by SDDL language version: SDDL1 examples live in `examples/sddl/`, and SDDL2 examples live in `examples/sddl2/`.

Also adds an SDDL1-syntax port of the SAO parser.

### Changes

- Move `sao_full.sddl` and `sao_silesia.sddl` from `examples/sddl/` to `examples/sddl2/` (these are SDDL2 sources)
- Add `examples/sddl/sao_full.oldv1.sddl`: an SDDL1 implementation of the SAO star catalog parser

Reviewed By: Cyan4973

Differential Revision: D103718414

fbshipit-source-id: d7281be5f6118864651a5b2f76731d570471b6eb
Summary:
Pull Request resolved: #720

Fuzzers were failing because we were consuming too much data past the end of the input. There were two causes, plus one related proactive fix:

1. Many fuzzers use more than 1 byte of entropy for each byte of input data produced. So divide `num_remaining_bytes()` by 4.
2. One component, `MuxLengths`, was treating `maxInputSize` as a number of elements rather than a number of bytes. Fix that.
3. Claude also found two other components, `SentinelByte` and `CompressSmallLengths`, with the same issue, so fix them proactively.

Reviewed By: daniellerozenblit

Differential Revision: D103685305

fbshipit-source-id: ff6c7c01b279277774ad61d7357f5644f3744957
Summary:
Pull Request resolved: #721

If allocation fails, set the arena pointer to `NULL` when freeing it, so it isn't freed twice.
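The fix follows a common C pattern: null out a pointer right after freeing it, so a second cleanup pass calls `free(NULL)`, which the C standard defines as a no-op. The `Arena` type and its functions below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical arena: one heap buffer plus its capacity. */
typedef struct {
    void* buffer;
    size_t capacity;
} Arena;

/* Returns 0 on success, -1 if allocation fails (the arena is left empty). */
static int arena_init(Arena* a, size_t capacity) {
    a->buffer = malloc(capacity);
    a->capacity = (a->buffer != NULL) ? capacity : 0;
    return (a->buffer != NULL) ? 0 : -1;
}

/* Safe to call more than once: after the first call the pointer is NULL. */
static void arena_free(Arena* a) {
    free(a->buffer);
    a->buffer = NULL;
    a->capacity = 0;
}
```

For example, an error path can call `arena_free()` and the caller's cleanup can call it again without a double free.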

Reviewed By: Victor-C-Zhang

Differential Revision: D103685306

fbshipit-source-id: 321481299933384265d19fd099eabc7530a440e3
Summary:
Pull Request resolved: #725

Replace out-parameter error handling with Result types that bundle values with error codes. This prevents bugs from uninitialized outputs and uses ZL_NODISCARD to catch ignored errors at compile time.

New SDDL2_RESULT_OF(type) macro creates Result structs with accessor macros (SDDL2_isError, SDDL2_error, SDDL2_value) and constructors (SDDL2_success, SDDL2_failure).

API changes:
- SDDL2_kind_size(), SDDL2_Type_size() now return SDDL2_RESULT_OF(size_t)
- SDDL2_Stack_pop() now returns SDDL2_RESULT_OF(SDDL2_Value)
- Internal pop_* helpers now return Result types
- Added SDDL2_TRY_LET for concise error propagation

Updated all call sites in sddl2.c, sddl2_vm.c, and tests.
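A minimal sketch of the Result-type pattern, assuming hypothetical macro names (`DECLARE_RESULT`, `RESULT_OF`, `RES_isError`, `RES_value`) in place of the real `SDDL2_*` macros, and using GCC/Clang's `warn_unused_result` attribute as a stand-in for `ZL_NODISCARD`.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
#define NODISCARD __attribute__((warn_unused_result))
#else
#define NODISCARD
#endif

/* RESULT_OF(T) names a struct bundling a T with an error code (0 = success). */
#define DECLARE_RESULT(T) typedef struct { T value; int error; } Result_##T
#define RESULT_OF(T) Result_##T
#define RES_isError(r) ((r).error != 0)
#define RES_value(r) ((r).value)

DECLARE_RESULT(size_t);

/* Hypothetical kind table: 0 -> 1 byte, 1 -> 4 bytes, anything else fails. */
static NODISCARD RESULT_OF(size_t) kind_size(int kind) {
    RESULT_OF(size_t) r = {0, 0};
    if (kind == 0) {
        r.value = 1;
    } else if (kind == 1) {
        r.value = 4;
    } else {
        r.error = 1;
    }
    return r;
}

/* Propagates kind_size() errors instead of reading an uninitialized output. */
static NODISCARD RESULT_OF(size_t) array_size(int kind, size_t count) {
    RESULT_OF(size_t) elt = kind_size(kind);
    if (RES_isError(elt)) return elt;
    RESULT_OF(size_t) r = {RES_value(elt) * count, 0};
    return r;
}
```

Because the value and the error travel together, a caller cannot read an output the callee never wrote, and with the attribute enabled the compiler warns when a result is ignored.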

Reviewed By: Cyan4973

Differential Revision: D103698980

fbshipit-source-id: e23c80aa876d658dc707173332401bc8f826089b
Summary:
Bumps [postcss](https://github.com/postcss/postcss) from 8.5.9 to 8.5.12.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/postcss/postcss/releases">postcss's releases</a>.</em></p>
<blockquote>
<h2>8.5.12</h2>
<ul>
<li>Fixed reading any file via user-generated CSS.</li>
<li>Added <code>opts.unsafeMap</code> to disable checks.</li>
</ul>
<h2>8.5.11</h2>
<ul>
<li>Fixed nested brackets parsing performance (by <a href="https://github.com/offset"><code>@​offset</code></a>).</li>
</ul>
<h2>8.5.10</h2>
<ul>
<li>Fixed XSS via unescaped <code>&lt;/style&gt;</code> in non-bundler cases (by <a href="https://github.com/TharVid"><code>@​TharVid</code></a>).</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/postcss/postcss/commit/9bc81c48f054a630c9a2e3868263b7ad4fc15013"><code>9bc81c4</code></a> Release 8.5.12 version</li>
<li><a href="https://github.com/postcss/postcss/commit/85c4d7dab830be366f8a96047f9e5b7944e101d8"><code>85c4d7d</code></a> Another try to fix coverage</li>
<li><a href="https://github.com/postcss/postcss/commit/94484cae6d4308167939f2ac888d166bd80dff01"><code>94484ca</code></a> Try to fix coverage</li>
<li><a href="https://github.com/postcss/postcss/commit/c64b7488d2731dfa16213739b42c34faf5a9eba3"><code>c64b748</code></a> Load only .map source maps</li>
<li><a href="https://github.com/postcss/postcss/commit/aaec7b78b3ce2792585b4b300ef1bd5dd5b3e8ad"><code>aaec7b7</code></a> Avoid throwing JSON parsing errors for non-JSON source maps</li>
<li><a href="https://github.com/postcss/postcss/commit/233fb264ea4c37f9e2d7b64b2726e6d23fd02327"><code>233fb26</code></a> Mention original author of the solution</li>
<li><a href="https://github.com/postcss/postcss/commit/2502f750307acde733a39f9dfd4ef3cf6c6b734d"><code>2502f75</code></a> Release 8.5.11 version</li>
<li><a href="https://github.com/postcss/postcss/commit/5ca19019495b3fa08205f5fd2eeed57892f9fa3d"><code>5ca1901</code></a> Speed up parsing many nested brackets</li>
<li><a href="https://github.com/postcss/postcss/commit/42b5337dd7e2fa9a03566495cfad2737eb19e712"><code>42b5337</code></a> Update dependencies</li>
<li><a href="https://github.com/postcss/postcss/commit/7e36e153d075ef56ebc352f298b65f646c700a06"><code>7e36e15</code></a> Cache node.raws locally in Stringifier hot methods</li>
<li>Additional commits viewable in <a href="https://github.com/postcss/postcss/compare/8.5.9...8.5.12">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=postcss&package-manager=npm_and_yarn&previous-version=8.5.9&new-version=8.5.12)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Pull Request resolved: #682

Reviewed By: terrelln

Differential Revision: D103783596

Pulled By: Victor-C-Zhang

fbshipit-source-id: dd09cad8cd89dd98fb4bde33d11ea23cd60efb26
Summary:
Pull Request resolved: #718

The CSV segmenter crashed (fatal assertion) when processing input with high token density (e.g., many consecutive separators). The root cause was twofold:

1. `maxNumTokens` was calculated as `min(byteSize, chunkByteSizeMax)`, assuming at most 1 token per byte. But consecutive separators produce 2 tokens per byte (empty field + separator), causing the lexer to exhaust the token buffer before consuming enough bytes.

2. When the token buffer was exhausted mid-chunk, the segmenter's internal consistency checks fired with `logicError`, which triggers a fatal assertion in `ZL_E_create_va` ("Logic errors should never actually be generated").

Fix:
- Increase `maxNumTokens` to `2 * chunkSize + 1` to handle worst-case token density
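Under a simplified token model (assumed here, not taken from the segmenter's code), every separator byte yields two tokens and the input ends with one more field. A chunk made entirely of separators then needs `2 * chunkSize + 1` tokens, more than the old `min(byteSize, chunkByteSizeMax)` bound allowed. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Each separator yields the field before it (possibly empty) plus the
 * separator token; the input ends with one final field token. */
static size_t count_tokens(const char* buf, size_t n, char sep) {
    size_t tokens = 1; /* the trailing field, possibly empty */
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == sep) tokens += 2;
    }
    return tokens;
}

/* Worst case: every byte is a separator. */
static size_t max_num_tokens(size_t chunk_size) {
    return 2 * chunk_size + 1;
}
```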

Reviewed By: Victor-C-Zhang

Differential Revision: D103001388

fbshipit-source-id: 8fe4b418b25b551ea76a68a846f9150c59e09f44
Summary:
Pull Request resolved: #732

Add a bounds check immediately after the chunk-advance so that a malformed input which lands `chunkIdx` past the end of the column-chunk vector is rejected with `node_invalid_input` instead of aborting.

Reviewed By: terrelln

Differential Revision: D103844843

fbshipit-source-id: 270fe012bb9f87d9fe756e2d50ea63206877d7d5
Summary:
Pull Request resolved: #733

Adds a `--save-ace-state` flag to `zli` that saves the ACE state into the serialized compressor it produces.

Implements this by preserving the ACE state local parameter after graph replacement is done.

Reviewed By: daniellerozenblit

Differential Revision: D102877314

fbshipit-source-id: f0455f2e21aeba3b034bc4fd5eba79a72702275b
Summary:
Pull Request resolved: #724

Add a banner in the visualizer to notify users of the new keyboard navigation mode.
It will be removed at some future date.

Reviewed By: Victor-C-Zhang

Differential Revision: D103706533

fbshipit-source-id: 7eb3e4bcbfec09ecf2faa01534c92b902c298d43
Summary:
Pull Request resolved: #734

The segmenter callbacks already treat a missing chunk-size local int param as "use the built-in default" (`SEGM_NUM_FROM_SERIAL_DEFAULT_CHUNK_SIZE`, currently `ZL_DEFAULT_SEGMENTER_CHUNK_BYTE_SIZE` = 16 MiB).

The matching builder helper `ZL_Compressor_buildNumFromSerialSegmenter[2]`, however, rejected `chunkByteSize == 0`. The only way to get the segmenter's default through the builder API today was to pass `ZL_DEFAULT_SEGMENTER_CHUNK_BYTE_SIZE` explicitly.

This aligns the builder with the library convention: `chunkByteSize == 0` means "use `ZL_DEFAULT_SEGMENTER_CHUNK_BYTE_SIZE`". The substitution happens before the `>= eltByteWidth` validation, so explicit small values are still rejected as before. Doxygen on the public header is updated to document the new sentinel.
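The ordering described above (substitute the default first, then validate) can be sketched like this. `resolve_chunk_size` and `DEFAULT_CHUNK_BYTE_SIZE` are hypothetical; only the 16 MiB default and the `>= eltByteWidth` check come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ZL_DEFAULT_SEGMENTER_CHUNK_BYTE_SIZE (16 MiB). */
#define DEFAULT_CHUNK_BYTE_SIZE ((size_t)16 << 20)

/* Returns the effective chunk size, or 0 if the request is invalid.
 * 0 is replaced by the default before the width check, so it is accepted,
 * while explicit values smaller than one element are still rejected. */
static size_t resolve_chunk_size(size_t requested, size_t elt_byte_width) {
    size_t chunk = (requested == 0) ? DEFAULT_CHUNK_BYTE_SIZE : requested;
    if (chunk < elt_byte_width) return 0;
    return chunk;
}
```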

Reviewed By: kevinjzhang

Differential Revision: D103898790

fbshipit-source-id: 4f9c2fc37d437ac43cc6539af4ea59aa1f39eac7
Summary:
Pull Request resolved: #730

The `serial` profile can no longer ingest inputs larger than 4 GiB, likely due to the internal LZ engine being updated from Zstandard to the new implementation.

This change applies the same approach used by numeric profiles: large inputs are automatically split into smaller, more manageable chunks. The default chunk size is 16 MiB, and users can override it with `--chunk-size`.

A new standard segmenter, `SEGM_serial`, is added and modeled directly on `SEGM_numFromSerial`. It splits serial input by byte size, using a default of 16 MiB. The chunk size can also be configured through the new public local integer parameter `ZL_SEGMENT_SERIAL_CHUNK_BYTE_SIZE_PARAM`.

Each chunk is forwarded to a successor graph. If no successor is provided, the segmenter falls back to `ZL_GRAPH_COMPRESS_GENERIC`.

As with the other segmenters, when `formatVersion < ZL_CHUNK_VERSION_MIN`, the segmenter emits a single chunk so the resulting frame remains decodable by older clients.
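The chunking behavior can be sketched as a pure function over byte counts. Function and parameter names are hypothetical, and the minimum chunk-capable format version is passed in because the real value of `ZL_CHUNK_VERSION_MIN` isn't given here.

```c
#include <assert.h>
#include <stddef.h>

/* Fills `sizes` (capacity `cap`) with the byte length of each chunk and
 * returns the chunk count. */
static size_t split_serial(size_t total, size_t chunk_byte_size,
                           unsigned format_version, unsigned chunk_version_min,
                           size_t* sizes, size_t cap) {
    assert(chunk_byte_size > 0);
    if (format_version < chunk_version_min) {
        /* Older decoders cannot handle multiple chunks: emit exactly one. */
        if (cap > 0) sizes[0] = total;
        return 1;
    }
    size_t n = 0;
    size_t remaining = total;
    while (remaining > 0) {
        size_t len = remaining < chunk_byte_size ? remaining : chunk_byte_size;
        if (n < cap) sizes[n] = len;
        n++;
        remaining -= len;
    }
    return n;
}
```

With a 16 MiB chunk size, a 40 MiB input would become chunks of 16, 16, and 8 MiB, each handed to the successor graph.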

This also adds:

  - A new public macro, `ZL_SEGMENT_SERIAL`
  - Two builder helpers, `ZL_Compressor_buildSerialSegmenter[2]`, in `include/openzl/codecs/zl_segmenters.h`
  - A typed C++ wrapper, `graphs::SegmentSerial`, modeled on `graphs::SDDL2`

The CLI `serial` profile now reads `--chunk-size` and wraps its existing `ACE+LZ` graph with the new segmenter. As a result, it now sets `supportsChunkSize_ = true`.

Finally, the new standard graph ID, `ZL_StandardGraphID_segment_serial`, is appended to the public enum before `_public_end`, preserving all existing wire-format IDs.

Reviewed By: kevinjzhang

Differential Revision: D103759746

fbshipit-source-id: 13225c2be665bfa5529f3bb0dc1a40c02b930ad9
Summary:
Pull Request resolved: #727

Fix `make -j1` build on MSYS2/MinGW by specifying the "Unix Makefiles" CMake generator (otherwise CMake may pick Ninja or NMake on Windows)

Reviewed By: Cyan4973

Differential Revision: D103741888

fbshipit-source-id: 6084366ee09817a3eca9d2194955f82d59d13206
Summary:
Pull Request resolved: #737

Ensure the benchmark command reproducers are committed in the repo for our release

Reviewed By: daniellerozenblit, Victor-C-Zhang

Differential Revision: D104059044

fbshipit-source-id: c1d12069f1a652c3b548306c57aed41cecbbe47e
Summary:
Pull Request resolved: #586

ClangCL disables C++ exception handling by default, so enable it by adding the `/EHsc` flag when MSVC/ClangCL is used.

Added CI and fixed other small issues with ClangCL

Github issue: #357

Reviewed By: Cyan4973

Differential Revision: D99160390

fbshipit-source-id: 1639336dd452247d1f677e6b87c8f0d2387c6cbf
Summary:
Pull Request resolved: #736

The test checks that two trained compressors with different chunk sizes produce different `.zlc` files.
The previous version compared file sizes, but `ACE` training is non-deterministic, and two runs can produce different artifacts but with identical size, which trips the assertion as a false positive. This is rare but can happen in CI, making CI signal flaky.

Switch to a bytewise comparison: the two artifacts must always differ byte-wise, even when their sizes match.

Reviewed By: terrelln

Differential Revision: D103963402

fbshipit-source-id: 4e57acdd54818e4ed55719d716083b8fbb026941
Summary:
Pull Request resolved: #740

As titled.

The `Zstrong_ParquetLexerTest_FuzzLexerValidInput` lionhead harness was previously crashing when generating Parquet data.

Reviewed By: terrelln

Differential Revision: D104079523

fbshipit-source-id: 9f088ef4758fb5916f4a5558b367b65832bcb685
Summary:
Pull Request resolved: #739

We don't test on 32-bit platforms yet, so add a static assert in one known place that doesn't support 32-bit builds.
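One common way to write such a guard in C11 is `_Static_assert` on `SIZE_MAX`. Where OpenZL places its guard and which condition it checks are not stated here, so this is only an illustration; `offset_to_size` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fails the build on targets where size_t cannot hold every 64-bit value. */
_Static_assert(SIZE_MAX >= UINT64_MAX, "32-bit platforms are not supported");

/* Hypothetical helper that is only lossless because of the guard above. */
static size_t offset_to_size(uint64_t offset) {
    return (size_t)offset;
}
```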

Reviewed By: daniellerozenblit

Differential Revision: D104076319

fbshipit-source-id: 9ba32fba441c3ba84e79c129a64996c6c07baeb9
Summary:
Pull Request resolved: #741

Adds the SDDL2 announcement to the SDDL docs.

Reviewed By: Cyan4973

Differential Revision: D104098404

fbshipit-source-id: d44b5e195c7447c9f6e0b23e16e8c9e61f8b8d02
Summary:
Pull Request resolved: #742

The CSV parser's `maxNumTokens` is incorrect when the input cannot be chunked properly (no newlines present to chunk at). Raise this limit.

Reviewed By: terrelln

Differential Revision: D104082821

fbshipit-source-id: fb2b2d8dcf10cead7f6427bc9619d1a0950feefb
Comment on lines +136 to +148
```yaml
    name: CMake Tarball Build Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Test tarball build scenario
        run: |
          # Simulate tarball environment by removing git repo and deps
          rm -rf .git deps/zstd
          # Configure and build -- should download deps
          cmake -B cmakebuild -DOPENZL_BUILD_TESTS=ON -DOPENZL_ALLOW_INTROSPECTION=OFF
          cmake --build cmakebuild
          # quick runtime check, to verify it works
          cmakebuild/cli/zli --version
```
Comment on lines +193 to +201
```yaml
    name: C99 Compatibility
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Verify public headers compile under strict C99
        run: CC=clang make c99_compat

  # CLI integration testing
  test-cli:
```
Comment on lines +16 to +35
```yaml
    name: Build SDist
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true

      - name: Build SDist
        run: pipx run build --sdist contrib/openzl-demo

      - name: Check metadata
        run: pipx run twine check contrib/openzl-demo/dist/*

      - uses: actions/upload-artifact@v4
        with:
          name: dist-sdist
          path: contrib/openzl-demo/dist/*.tar.gz


  build_wheels:
```
Comment on lines +36 to +64
```yaml
    name: Wheels on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        # Once MSVC is supported we can build Windows packages as well by adding windows-latest
        # Once we're public we can build on ubuntu-24.04-arm & windows-11-arm.
        # They are not supported for private repos.
        # TODO: Remove filesystem for macos-13
        os: [ubuntu-latest, macos-latest]

    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true

      - uses: pypa/cibuildwheel@v3.0.0
        with:
          package-dir: contrib/openzl-demo

      - name: Verify clean directory
        run: git diff --exit-code
        shell: bash

      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          path: wheelhouse/*.whl
          name: dist-${{ matrix.os }}
```
Comment on lines +83 to +99
```yaml
    name: Upload if release
    needs: [build_wheels, build_sdist]
    runs-on: ubuntu-latest
    if: github.event_name == 'release' && github.event.action == 'published'

    steps:
      - uses: actions/setup-python@v5
      - uses: actions/download-artifact@v4
        with:
          path: dist
          pattern: dist-*
          merge-multiple: true

      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.pypi_password }}
```
meta-cla bot added the cla signed label May 7, 2026

Cyan4973 commented May 7, 2026

CodeQL flagged issues are interesting, but not release-blocking.
We'll take care of them after the release.

Summary:
one last thing to do before the release...

Pull Request resolved: #745

Reviewed By: jlee303

Differential Revision: D104265941

Pulled By: Cyan4973

fbshipit-source-id: 8ed4b925e09120f8588e639ef83c5ffe7f0013b7
Cyan4973 merged commit 3dceb64 into release on May 7, 2026
59 checks passed