Summary: This new test is designed to detect issues like #282. In the process, it was discovered that #282 was not enough to fix compatibility issues, so this PR bundles an additional `cmake` fix. Pull Request resolved: #423 Reviewed By: terrelln Differential Revision: D94418093 Pulled By: Cyan4973 fbshipit-source-id: a1064c2affff30b5c0e05d42d17454c9e43bef5a
Summary: `zli` already supports `le-u64`, `le-u32` and `le-u16`, but strangely does not offer anything for 8-bit numerics. This PR fixes that: `u8` and `i8` are now selectable profiles, and there are tests to check the new support. _Note: there is no `le-` prefix, since 8-bit numerics have no endianness._ Among the desirable properties, it's possible to train a `u8` profile, thus starting with a `num8` conversion. It will also be possible to pass such data to `num8` specific Selectors in the future. Pull Request resolved: #483 Reviewed By: terrelln Differential Revision: D95691133 Pulled By: Cyan4973 fbshipit-source-id: 08d7835087e5dfcca1208be76c5bf5e4d99b9b39
Summary: Pull Request resolved: #457 This diff splits `CompilerTest.cpp` into separate test files per compiler stage. This makes it easier to test the functionality of each component in isolation; e.g making sure the parser produces the expected AST before optimizations. Reviewed By: Cyan4973 Differential Revision: D95084293 fbshipit-source-id: 60dc1e8a63fccdfcbcf0b6f082584a83ca7539d7
Summary: Pull Request resolved: #481 Differential Revision: D95033895 fbshipit-source-id: ed3249c02f357e7de9815ada711b39d525c94087
Summary: Pull Request resolved: #475 Two fixes in dev & release ROLZ Reviewed By: kevinjzhang Differential Revision: D95604468 fbshipit-source-id: 377023da42cebf135dce4099b118ad10ea5b4d9a
Summary: Pull Request resolved: #480 Differential Revision: D95043302 fbshipit-source-id: 81bd9061eb6f0a9f5731fd40dab3bfd341aa692b
Summary: Pull Request resolved: #467 # Stack The goal of this stack is to add support for variable assignments and assume operations to SDDL2. # Diff This diff implements variable assignment and reference support in the code generator. This enables SDDL2 programs to use global variables for parameterizing field sizes and reusing type definitions. - Implemented `Op::ASSIGN` codegen: evaluates the RHS expression, allocates a VM register via `getOrAssignRegister()`, and emits `var.store`. - Implemented `ASTVar` reference codegen: looks up the variable's register and emits `var.load`. If this is the last reference to a particular variable, returns the register to the free pool. - Added a register allocator (`var_registry_`, `free_registers_`, `next_register_`) that reuses freed registers and enforces the `SDDL2_VAR_REGISTER_COUNT` (256) limit at compile time. Reviewed By: Cyan4973 Differential Revision: D94947414 fbshipit-source-id: 2626f437bd16fb7891a689131e5ef4f3b927de1a
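The register allocation scheme described above (reuse freed registers, enforce a fixed limit) can be sketched roughly as follows. This is a minimal Python illustration, not the SDDL2 code generator: the class `ToyRegisterAllocator` and its method names are hypothetical, and only the field names and the 256 limit are taken from the summary.

```python
class ToyRegisterAllocator:
    """Hypothetical sketch: reuses freed registers and enforces a register limit."""

    def __init__(self, limit=256):  # 256 mirrors SDDL2_VAR_REGISTER_COUNT in the summary
        self.limit = limit
        self.var_registry = {}    # variable name -> register index
        self.free_registers = []  # registers returned to the free pool
        self.next_register = 0    # lowest register index never handed out

    def get_or_assign(self, name):
        """Return the register for `name`, allocating one if needed."""
        if name in self.var_registry:
            return self.var_registry[name]
        if self.free_registers:
            reg = self.free_registers.pop()  # prefer a previously freed register
        elif self.next_register < self.limit:
            reg = self.next_register
            self.next_register += 1
        else:
            raise RuntimeError(f"out of registers (limit {self.limit})")
        self.var_registry[name] = reg
        return reg

    def release(self, name):
        """Return a variable's register to the free pool, e.g. after its last reference."""
        self.free_registers.append(self.var_registry.pop(name))
```

With a limit of 2, assigning `a` and `b`, releasing `a`, and then assigning `c` gives `c` the register that `a` used to hold; a further allocation raises an error.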
Summary: Pull Request resolved: #468 # Stack The goal of this stack is to add support for variable assignments and assume operations to SDDL2. # Diff This diff adds a `MEMBER` op to the SDDL grammar, which can be used to access fields of consumed records. This operator (`.`) was already supported in the syntax. Reviewed By: Cyan4973 Differential Revision: D95396139 fbshipit-source-id: b1ba4f6c16ff9bd7fd633b01611206ab0b1f4c98
Summary: Pull Request resolved: #471 # Stack The goal of this stack is to add support for variable assignments and assume operations to SDDL2. # Diff This diff handles the `MEMBER` op in the dead var optimizer. Reviewed By: Cyan4973 Differential Revision: D95396507 fbshipit-source-id: 21cd50847087d86b8e17a4258f0116e47c988f0e
Summary: Pull Request resolved: #431 Add bitsplit_fp codec support to OpenZL for floating-point data compression. This extends the existing bitsplit codec family to handle floating-point types, enabling more efficient compression of floating-point data streams in the OpenZL framework. Reviewed By: Cyan4973 Differential Revision: D94410436 fbshipit-source-id: e42fa9ec08ec3971eaa22c1b6381c91036ef6441
Summary: Pull Request resolved: #470 - Adds `testCodecVO` and `AssertEqVOFunctionGraph` to the codec test framework, enabling tests to verify that a codec produces the expected output streams from a single input - Adds `FP32_OutputOrder`, `FP64_OutputOrder`, and `FP16_OutputOrder` tests that verify bitsplit_fp splits IEEE 754 floats into the correct mantissa, exponent, and sign streams Reviewed By: Cyan4973 Differential Revision: D95400978 fbshipit-source-id: 292c3cf6010009dd9db8e5e81a5a0df9a087f0e2
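To make the sign / exponent / mantissa split concrete, here is a small Python sketch for a single fp32 value. It only shows where the IEEE 754 field boundaries lie (1 sign bit, 8 exponent bits, 23 mantissa bits); the function name `split_fp32` is hypothetical, and the actual stream layout and output order used by bitsplit_fp are not assumed here.

```python
import struct


def split_fp32(value):
    """Split one IEEE 754 binary32 value into (sign, exponent, mantissa) bit fields."""
    # Reinterpret the float's 4 bytes as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31               # top bit
    exponent = (bits >> 23) & 0xFF  # next 8 bits (biased by 127)
    mantissa = bits & 0x7FFFFF      # low 23 bits
    return sign, exponent, mantissa
```

For example, `split_fp32(1.0)` yields a zero sign, the biased exponent 127, and a zero mantissa.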
…itBench benchmarks (#478) Summary: Pull Request resolved: #478 Add specialized bitSplit encoder fast paths for IEEE 754 floating-point types (fp16, fp32, fp64) and unitBench benchmark scenarios to measure their performance.

Kernel changes (encode_bitSplit_kernel.c):
- Wire up fp16, fp32, and fp64 fast path dispatches in ZL_bitSplitEncode

Benchmark results (10MB random data, before -> after fast path):

┌─────────────────────┬────────────────┬──────────────────┬─────────┐
│ Scenario            │ Generic (MB/s) │ Fast path (MB/s) │ Speedup │
├─────────────────────┼────────────────┼──────────────────┼─────────┤
│ bitSplitEncode_fp16 │ 544            │ 17,286           │ 31.8x   │
├─────────────────────┼────────────────┼──────────────────┼─────────┤
│ bitSplitEncode_fp32 │ 1,130          │ 17,890           │ 15.8x   │
├─────────────────────┼────────────────┼──────────────────┼─────────┤
│ bitSplitEncode_fp64 │ 2,045          │ 21,037           │ 10.3x   │
└─────────────────────┴────────────────┴──────────────────┴─────────┘

New unitBench scenarios:
- bitSplitEncode_fp16: kernel-level fp16 encode benchmark
- bitSplitEncode_fp64: kernel-level fp64 encode benchmark

Also adds a unitbench-scenarios Claude skill for creating future benchmarks. Reviewed By: Cyan4973 Differential Revision: D95618687 fbshipit-source-id: cf68e2109a57bea263de5e84474df2e3cea55fcd
Summary: Pull Request resolved: #436 Add a helper to save all input streams to a codec with a given name. This is used to extract the offsets that are being passed into the bucketing codec for benchmarking. Also, print a summary of the results at the end of benchmarking. Reviewed By: Cyan4973 Differential Revision: D94680942 fbshipit-source-id: 0adaa56f32920ac256adfabb2ddd3d8d79775520
Summary: Pull Request resolved: #435 1. Improve algorithm to find partitions 2. Improve encoding speed 3. Other misc changes to bucketing 4. Print 3 decimals in compression ratio Reviewed By: Cyan4973 Differential Revision: D94680941 fbshipit-source-id: 0bd1cc7635a7b56c81cf58f8c950ec58127547aa
Summary: Pull Request resolved: #490 Fixes fuzzing issues found with Bitpack OpenZLComponent. When generating inputs during fuzzing, does not check whether head node is illegal or not. Reviewed By: daniellerozenblit Differential Revision: D95835522 fbshipit-source-id: 9e3d8215ae53fbcda82bb0d5617521f821f05530
Summary: Pull Request resolved: #472 # Stack The goal of this stack is to add support for variable assignments and assume operations to SDDL2. # Diff The diff updates the semantic analyzer to support the `MEMBER` operation. This mainly validates that operands are semantically correct (e.g., LHS is a consumed record, RHS is a field). Reviewed By: Cyan4973 Differential Revision: D95430690 fbshipit-source-id: 981dccc6529a5500bf293d39790159bb66935914
Summary: Pull Request resolved: #476 1. Fix invalid shift in conversion decoder 2. Fix segmenter with `ZL_Type_string` input. It was broken in 2 places in `stream.c` & add a unit test 3. Catch exceptions in the `FuzzCompress` fuzzer. Initially, I was hoping that permissive mode would be enough. But when we run into limits like too many streams or too many nodes, permissive mode also fails. Reviewed By: Cyan4973 Differential Revision: D95604467 fbshipit-source-id: 9d39ae7abc5c6ed8932a4d9aeb900028f2a0dcea
Summary: Pull Request resolved: #489 Fixes failures for fuzzer health in the no generator variants of the fuzzers. Reviewed By: terrelln Differential Revision: D95816404 fbshipit-source-id: b8c1a64256e852a8ad5322ef20e9deb48ea25d17
Summary: Pull Request resolved: #366 Added new stream preview feature to visualization (appears for both normal and proxy nodes). - preview shows up in popover UI - popover closes when user clicks on another item - preview is limited to just 6 lines (4 lines of data + header/optional footer) and character length is restricted to 45 characters - Old cbor files without stream preview still work, as the stream preview button only appears if a stream preview is defined Reviewed By: Victor-C-Zhang Differential Revision: D91596594 fbshipit-source-id: 58c82ae8b585ae981fde19d8d21f782d381bdf9a
Summary: to compress single-byte repeated patterns within numeric streams. This saves 1-2 bytes per generated numeric stream featuring a single-byte repeat pattern. Pull Request resolved: #482 Reviewed By: terrelln Differential Revision: D95689234 Pulled By: Cyan4973 fbshipit-source-id: d84e82dd37053719207f40a5f1f2a199b9500fcc
Summary: Pull Request resolved: #474 # Stack The goal of this stack is to add support for variable assignments and assume operations to SDDL2. # Diff This diff extracts register allocation logic from `CodeGenerator` into a dedicated `RegisterAllocator` utility class. This improves code organization. Reviewed By: Cyan4973 Differential Revision: D95582393 fbshipit-source-id: 8a6ec53b60d55286a0204fe2a2fb3d6c811264b5
Summary: Pull Request resolved: #485 This diff expands the SDDL semantic analyzer to prevent variable reassignment. This makes implementation of codegen much simpler in a lot of cases. Reviewed By: Cyan4973 Differential Revision: D95802815 fbshipit-source-id: da542c8b8e3e67e7e74a0e13130b22bcceab235b
Summary: Pull Request resolved: #484 # Stack The goal of this stack is to add support for variable assignments and assume operations to SDDL2. # Diff This diff adds assume support for builtin fields and record types in SDDL2, enabling the compiler to handle assume expressions that constrain field values and record type constraints. ## Key Changes - **Refactored CodeGenerator:** Replaced monolithic generateNode() with specialized generators (generateOp(), generateValue(), generateType()) for cleaner separation of concerns. - **Assume codegen:** Saves buffer position before consuming, then stores either the loaded value (builtin types) or position (records) in a register for later access. - **Member access:** Added generateMember() to handle nested field access (e.g., foo.bar.x) by walking record fields and accumulating byte offsets. - **AssemblyOutput class:** New helper using std::list for efficient instruction concatenation via splice(). - **State tracking:** Added type_aliases_ and assumed_types_ maps to track type information for member access codegen. Reviewed By: Cyan4973 Differential Revision: D95685706 fbshipit-source-id: 20f42b5d9624e1bde1a3a2efc2e0bac0d7f8df61
Summary: Pull Request resolved: #496 Reviewed By: ahilger Differential Revision: D95998798 fbshipit-source-id: 3898f05b82f9803e1e1921ea8039b5ea517f8c91
Summary: Pull Request resolved: #459 Move streamdump to chunktrace instead of trace. Add a test case for streamdump. Reviewed By: Victor-C-Zhang Differential Revision: D95257237 fbshipit-source-id: 8bb656877c94a14d7527fccbdf4c003db03e44a2
Summary: Pull Request resolved: #497 Fix `os.getenv` boolean handling for `OPENZL_SKIP_VISUALIZER_BUILD` and `OPENZL_USE_SYSTEM_PYTHON_EXTENSION`. Previously used `False` as the default, which returns a boolean when unset but a string when set. [OPENZL_SKIP_VISUALIZER_BUILD](https://fburl.com/code/ffh2g9s7) and [OPENZL_USE_SYSTEM_PYTHON_EXTENSION](https://fburl.com/code/8y08ameq) are defined with "1", directly check if equal to string now. Reviewed By: daniellerozenblit Differential Revision: D96188097 fbshipit-source-id: 7a2237de473f31f1a2ff969876c52a17033d2085
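The pitfall described above can be illustrated with a short Python sketch. The helper name `env_flag` is hypothetical and not taken from the build scripts; the environment variable name is the one mentioned in the summary.

```python
import os


def env_flag(name):
    """Return True only when the variable is set to the exact string "1"."""
    return os.getenv(name) == "1"


# With a non-string default, os.getenv returns the boolean False when the
# variable is unset, but a string when it is set. Any non-empty string is
# truthy, so even a value of "0" would enable the flag by accident.
os.environ["OPENZL_SKIP_VISUALIZER_BUILD"] = "0"
legacy_value = os.getenv("OPENZL_SKIP_VISUALIZER_BUILD", False)
print(repr(legacy_value), bool(legacy_value))    # the string '0' is truthy
print(env_flag("OPENZL_SKIP_VISUALIZER_BUILD"))  # explicit comparison is False
```

Comparing against the string keeps the result a plain boolean in both the set and unset cases.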
Summary: Pull Request resolved: #464 Add a partition codec that: * Can replace `ZL_NODE_QUANTIZE_OFFSETS` (preset present, but implementation not optimized for this yet) * Can replace `ZL_NODE_QUANTIZE_LENGTHS` (preset present, but implementation not optimized for this yet) * Can implement the `VarByte16` and `Bucket16` codecs from `contrib/lz-research` See the [design doc](https://docs.google.com/document/d/15ojKVhaCa1wwecbt3mRCgiy59pveJVQBASKbC2zuzEI/edit?tab=t.6ggqt4jnzgcj#heading=h.wvrp8vwdxgp8) for details. Reviewed By: Cyan4973 Differential Revision: D95218214 fbshipit-source-id: 29084633943639260e90e09a0ad990a2a0f3c479
Summary: Pull Request resolved: #463 As title Reviewed By: mmandina Differential Revision: D95302280 fbshipit-source-id: b6585e29d41b59d38441bd4b71a26d8f6361118a
Summary: Pull Request resolved: #491 * Add a segmenter to the PyTorch model parser * Allow segmenters for older format versions so long as they only emit a single segment Reviewed By: Cyan4973 Differential Revision: D95605317 fbshipit-source-id: 6d3c48f3f1915a760760f28216d1ae739bee9a83
Summary: Pull Request resolved: #493 There was a name conflict with `zl_reflection.h` Reviewed By: daniellerozenblit Differential Revision: D95872479 fbshipit-source-id: 04951e8d9f866ab8663c01196be50970fbf6c47f
Summary: Pull Request resolved: #717 Per the parquet spec, definition/repetition level blocks are omitted from data pages when the column has no `OPTIONAL`/`REPEATED` ancestor. We did not previously handle this case, causing parsing of Parquet files with required columns to fail. This diff adds tracking of `hasDefinitionLevels` / `hasRepetitionLevels` per leaf in the schema metadata, and consumes the repetition/definition blocks independently in the lexer only when present. Reviewed By: terrelln Differential Revision: D103245483 fbshipit-source-id: 7b7b9f51c9507cab5f078f311fb8f09b760939ee
Summary: Pull Request resolved: #699 Extract `parseSize` / `suffixToMultiplier` from `ProfileArgs` into reusable `checkedstoi`, `checkedstol`, `checkedstoul`, and `checkedstoll` functions in `cli/utils/util.{h,cpp}`. These accept an optional trailing size suffix (K, MB, GiB, etc.) and validate the input, replacing the raw `std::stoi` / `std::stoul` calls that silently ignore trailing characters. Migrate all 9 call sites in CLI arg headers (BenchmarkArgs, CompressArgs, GlobalArgs, TrainArgs) to the new checked variants. Note that by moving this to util.h, there was a circular dependency that had to be broken between `compress_profiles.h` and `util.h`. So this diff also moves `ProfileArgs` method bodies from the header into `compress_profiles.cpp` to break that circular dependency. Reviewed By: kevinjzhang Differential Revision: D103111345 fbshipit-source-id: f3734d07bf78f22b4c12ba76d54d2215352e60f1
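The idea of a checked size parser (accept an optional size suffix, reject anything else trailing the number) might look like the following Python sketch. It is not the `checkedstoi` family from `cli/utils/util.{h,cpp}`: the function name `parse_size`, the accepted suffix set, and the choice of 1024-based multipliers for every suffix are all assumptions made for illustration.

```python
import re

# Assumption: every suffix is treated as a power of 1024; the real CLI may differ.
_MULTIPLIERS = {"K": 1024, "M": 1024**2, "G": 1024**3}

# Digits, then optionally either a bare "B" or K/M/G with an optional "B"/"iB" tail.
_SIZE_PATTERN = re.compile(r"(\d+)\s*(?:([KMG])(?:i?B)?|B)?", re.IGNORECASE)


def parse_size(text):
    """Parse strings like "123", "4K", "16MB" or "2GiB"; raise on trailing garbage."""
    match = _SIZE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    multiplier = _MULTIPLIERS[unit.upper()] if unit else 1
    return int(number) * multiplier
```

The key point is the full match: an input such as `12abc` is rejected instead of being silently read as 12.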
Summary: Pull Request resolved: #722 Add a minimal VS Code extension that provides TextMate syntax highlighting for OpenZL SDDL2 (`.sddl`) files. This makes `.sddl` files easier to read and write by colorizing keywords, builtin types, builtin functions, operators, numeric literals, and comments. The extension is intended to be installed locally via a symlink (see README); it is not published. Reviewed By: Cyan4973 Differential Revision: D103065790 fbshipit-source-id: 5181ec0a99dfad3b543da5d4b4354a9488d6d24d
Summary: Pull Request resolved: #723 Adds a brief section documenting syntax highlighting support for SDDL files. Reviewed By: mmandina Differential Revision: D103700217 fbshipit-source-id: 7003a6ed4ee91274ef269c72a234c9faa860db7f
Summary: Pull Request resolved: #719 I also just saw some timeouts in non-version-test fuzzers, so opt into the 10 minute initialization timeout globally. Hopefully, some of the other speedups will also help, but I don't want to be spammed by this issue. Reviewed By: Cyan4973 Differential Revision: D103449503 fbshipit-source-id: c1089d78e64ebcba34473b807f88410e4d7ee09b
Summary: Pull Request resolved: #726 Separate SAO example files by SDDL language version: SDDL1 examples live in `examples/sddl/`, and SDDL2 examples live in `examples/sddl2/`. Also adds an SDDL1-syntax port of the SAO parser. ### Changes - Move `sao_full.sddl` and `sao_silesia.sddl` from `examples/sddl/` to `examples/sddl2/` (these are SDDL2 sources) - Add `examples/sddl/sao_full.oldv1.sddl`: an SDDL1 implementation of the SAO star catalog parser Reviewed By: Cyan4973 Differential Revision: D103718414 fbshipit-source-id: d7281be5f6118864651a5b2f76731d570471b6eb
Summary: Pull Request resolved: #720 Fuzzers were failing because we were consuming too much data after the end of the input. This was for two reasons: 1. Many fuzzers use more than 1 byte of entropy for each byte of input data produced. So divide `num_remaining_bytes()` by 4. 2. One component, `MuxLengths`, was treating `maxInputSize` as a number of elements rather than a number of bytes. Fix that. Additionally, Claude found two other components, `SentinelByte` and `CompressSmallLengths`, that had the same issue, so proactively fix them too. Reviewed By: daniellerozenblit Differential Revision: D103685305 fbshipit-source-id: ff6c7c01b279277774ad61d7357f5644f3744957
Summary: Pull Request resolved: #721 If allocation fails, when freeing the arena mark it as `NULL` so it isn't double freed. Reviewed By: Victor-C-Zhang Differential Revision: D103685306 fbshipit-source-id: 321481299933384265d19fd099eabc7530a440e3
Summary: Pull Request resolved: #725 Replace out-parameter error handling with Result types that bundle values with error codes. This prevents bugs from uninitialized outputs and uses ZL_NODISCARD to catch ignored errors at compile time. New SDDL2_RESULT_OF(type) macro creates Result structs with accessor macros (SDDL2_isError, SDDL2_error, SDDL2_value) and constructors (SDDL2_success, SDDL2_failure). API changes: - SDDL2_kind_size(), SDDL2_Type_size() now return SDDL2_RESULT_OF(size_t) - SDDL2_Stack_pop() now returns SDDL2_RESULT_OF(SDDL2_Value) - Internal pop_* helpers now return Result types - Added SDDL2_TRY_LET for concise error propagation Updated all call sites in sddl2.c, sddl2_vm.c, and tests. Reviewed By: Cyan4973 Differential Revision: D103698980 fbshipit-source-id: e23c80aa876d658dc707173332401bc8f826089b
Summary: Bumps [postcss](https://github.com/postcss/postcss) from 8.5.9 to 8.5.12.

Release notes (sourced from [postcss's releases](https://github.com/postcss/postcss/releases)):

8.5.12
- Fixed reading any file via user-generated CSS.
- Added `opts.unsafeMap` to disable checks.

8.5.11
- Fixed nested brackets parsing performance (by [`@offset`](https://github.com/offset)).

8.5.10
- Fixed XSS via unescaped `</style>` in non-bundler cases (by [`@TharVid`](https://github.com/TharVid)).

Commits:
- [`9bc81c4`](https://github.com/postcss/postcss/commit/9bc81c48f054a630c9a2e3868263b7ad4fc15013) Release 8.5.12 version
- [`85c4d7d`](https://github.com/postcss/postcss/commit/85c4d7dab830be366f8a96047f9e5b7944e101d8) Another try to fix coverage
- [`94484ca`](https://github.com/postcss/postcss/commit/94484cae6d4308167939f2ac888d166bd80dff01) Try to fix coverage
- [`c64b748`](https://github.com/postcss/postcss/commit/c64b7488d2731dfa16213739b42c34faf5a9eba3) Load only .map source maps
- [`aaec7b7`](https://github.com/postcss/postcss/commit/aaec7b78b3ce2792585b4b300ef1bd5dd5b3e8ad) Avoid throwing JSON parsing errors for non-JSON source maps
- [`233fb26`](https://github.com/postcss/postcss/commit/233fb264ea4c37f9e2d7b64b2726e6d23fd02327) Mention original author of the solution
- [`2502f75`](https://github.com/postcss/postcss/commit/2502f750307acde733a39f9dfd4ef3cf6c6b734d) Release 8.5.11 version
- [`5ca1901`](https://github.com/postcss/postcss/commit/5ca19019495b3fa08205f5fd2eeed57892f9fa3d) Speed up parsing many nested brackets
- [`42b5337`](https://github.com/postcss/postcss/commit/42b5337dd7e2fa9a03566495cfad2737eb19e712) Update dependencies
- [`7e36e15`](https://github.com/postcss/postcss/commit/7e36e153d075ef56ebc352f298b65f646c700a06) Cache node.raws locally in Stringifier hot methods
- Additional commits viewable in [compare view](https://github.com/postcss/postcss/compare/8.5.9...8.5.12)

Pull Request resolved: #682 Reviewed By: terrelln Differential Revision: D103783596 Pulled By: Victor-C-Zhang fbshipit-source-id: dd09cad8cd89dd98fb4bde33d11ea23cd60efb26
Summary: Pull Request resolved: #718 The CSV segmenter crashed (fatal assertion) when processing input with high token density (e.g., many consecutive separators). The root cause was twofold: 1. `maxNumTokens` was calculated as `min(byteSize, chunkByteSizeMax)`, assuming at most 1 token per byte. But consecutive separators produce 2 tokens per byte (empty field + separator), causing the lexer to exhaust the token buffer before consuming enough bytes. 2. When the token buffer was exhausted mid-chunk, the segmenter's internal consistency checks fired with `logicError`, which triggers a fatal assertion in `ZL_E_create_va` ("Logic errors should never actually be generated"). Fix: - Increase `maxNumTokens` to `2 * chunkSize + 1` to handle worst-case token density Reviewed By: Victor-C-Zhang Differential Revision: D103001388 fbshipit-source-id: 8fe4b418b25b551ea76a68a846f9150c59e09f44
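The `2 * chunkSize + 1` bound can be checked with a toy token counter. The Python sketch below is not the OpenZL CSV lexer; it assumes a simplified model in which every separator byte yields a (possibly empty) field token followed by a separator token, plus one trailing field at the end. The function name `count_tokens` is hypothetical.

```python
def count_tokens(data, sep=","):
    """Count tokens under a toy model: field token + separator token per separator."""
    tokens = 0
    for ch in data:
        if ch == sep:
            tokens += 2  # the (possibly empty) field before it, then the separator
    tokens += 1          # the trailing field after the last separator
    return tokens


# Worst case: input made only of separators gives 2 tokens per byte, plus one.
worst = "," * 8
print(count_tokens(worst), 2 * len(worst) + 1)
```

Under this model, an input of `n` bytes never produces more than `2 * n + 1` tokens, while the earlier one-token-per-byte assumption is already violated by two consecutive separators.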
Summary: Pull Request resolved: #732 Add a bounds check immediately after the chunk-advance so that a malformed input which lands `chunkIdx` past the end of the column-chunk vector is rejected with `node_invalid_input` instead of aborting. Reviewed By: terrelln Differential Revision: D103844843 fbshipit-source-id: 270fe012bb9f87d9fe756e2d50ea63206877d7d5
Summary: Pull Request resolved: #733 Adds a flag `--save-ace-state` to `zli` that saves the ACE state to the serialized compressor it produces. This is implemented by preserving the ACE state local parameter after graph replacement is done. Reviewed By: daniellerozenblit Differential Revision: D102877314 fbshipit-source-id: f0455f2e21aeba3b034bc4fd5eba79a72702275b
Summary: Pull Request resolved: #724 Adds a banner in the visualizer to notify users of the new keyboard navigation mode. It will be removed at some future date. Reviewed By: Victor-C-Zhang Differential Revision: D103706533 fbshipit-source-id: 7eb3e4bcbfec09ecf2faa01534c92b902c298d43
Summary: Pull Request resolved: #734 The segmenter callbacks already treat a missing chunk-size local int param as "use the built-in default" (`SEGM_NUM_FROM_SERIAL_DEFAULT_CHUNK_SIZE`, currently `ZL_DEFAULT_SEGMENTER_CHUNK_BYTE_SIZE` = 16 MiB). The matching builder helper `ZL_Compressor_buildNumFromSerialSegmenter[2]`, however, rejected `chunkByteSize == 0`. The only way to get the segmenter's default through the builder API today was to pass `ZL_DEFAULT_SEGMENTER_CHUNK_BYTE_SIZE` explicitly. This aligns the builder with the library convention: `chunkByteSize == 0` means "use `ZL_DEFAULT_SEGMENTER_CHUNK_BYTE_SIZE`". The substitution happens before the `>= eltByteWidth` validation, so explicit small values are still rejected as before. Doxygen on the public header is updated to document the new sentinel. Reviewed By: kevinjzhang Differential Revision: D103898790 fbshipit-source-id: 4f9c2fc37d437ac43cc6539af4ea59aa1f39eac7
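The ordering described above (substitute the default first, validate afterwards) can be sketched in a few lines of Python. The function name `resolve_chunk_byte_size` and the error type are hypothetical; the 16 MiB default and the `>= eltByteWidth` check come from the summary.

```python
DEFAULT_CHUNK_BYTE_SIZE = 16 * 1024 * 1024  # 16 MiB, per the summary


def resolve_chunk_byte_size(chunk_byte_size, elt_byte_width):
    """Treat 0 as "use the default", then validate the resulting size."""
    if chunk_byte_size == 0:
        chunk_byte_size = DEFAULT_CHUNK_BYTE_SIZE  # sentinel substitution comes first
    if chunk_byte_size < elt_byte_width:
        # Explicit values smaller than one element are still rejected.
        raise ValueError("chunk size must hold at least one element")
    return chunk_byte_size
```

Because the substitution runs before the check, `0` can never trip the validation, while a small explicit value such as 4 bytes for 8-byte elements still fails.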
Summary: Pull Request resolved: #730 The `serial` profile can no longer ingest inputs larger than 4 GiB, likely due to the internal LZ engine being updated from Zstandard to the new implementation. This change applies the same approach used by numeric profiles: large inputs are automatically split into smaller, more manageable chunks. The default chunk size is 16 MiB, and users can override it with `--chunk-size`. A new standard segmenter, `SEGM_serial`, is added and modeled directly on `SEGM_numFromSerial`. It splits serial input by byte size, using a default of 16 MiB. The chunk size can also be configured through the new public local integer parameter `ZL_SEGMENT_SERIAL_CHUNK_BYTE_SIZE_PARAM`. Each chunk is forwarded to a successor graph. If no successor is provided, the segmenter falls back to `ZL_GRAPH_COMPRESS_GENERIC`. As with the other segmenters, when `formatVersion < ZL_CHUNK_VERSION_MIN`, the segmenter emits a single chunk so the resulting frame remains decodable by older clients. This also adds: - A new public macro, `ZL_SEGMENT_SERIAL` - Two builder helpers, `ZL_Compressor_buildSerialSegmenter[2]`, in `include/openzl/codecs/zl_segmenters.h` - A typed C++ wrapper, `graphs::SegmentSerial`, modeled on `graphs::SDDL2` The CLI `serial` profile now reads `--chunk-size` and wraps its existing `ACE+LZ` graph with the new segmenter. As a result, it now sets `supportsChunkSize_ = true`. Finally, the new standard graph ID, `ZL_StandardGraphID_segment_serial`, is appended to the public enum before `_public_end`, preserving all existing wire-format IDs. Reviewed By: kevinjzhang Differential Revision: D103759746 fbshipit-source-id: 13225c2be665bfa5529f3bb0dc1a40c02b930ad9
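As a rough illustration of splitting serial input by byte size, consider the Python sketch below. It is not `SEGM_serial`: the function name `split_serial` and the boolean `supports_chunking` (standing in for the `formatVersion >= ZL_CHUNK_VERSION_MIN` check) are hypothetical, and successor graphs are left out entirely.

```python
DEFAULT_CHUNK_BYTE_SIZE = 16 * 1024 * 1024  # 16 MiB default, per the summary


def split_serial(data, chunk_byte_size=DEFAULT_CHUNK_BYTE_SIZE, supports_chunking=True):
    """Split a bytes object into consecutive chunks of at most chunk_byte_size bytes."""
    if not supports_chunking:
        # Older format versions: emit a single chunk so older clients can decode it.
        return [data]
    return [data[i:i + chunk_byte_size] for i in range(0, len(data), chunk_byte_size)]
```

Concatenating the returned chunks always reproduces the original input, and only the last chunk may be shorter than the configured size.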
Summary: Pull Request resolved: #727 Fix `make -j1` build on MSYS2/MinGW by specifying the "Unix Makefiles" CMake generator (otherwise CMake may pick Ninja or NMake on Windows) Reviewed By: Cyan4973 Differential Revision: D103741888 fbshipit-source-id: 6084366ee09817a3eca9d2194955f82d59d13206
Summary: Pull Request resolved: #737 Ensure the benchmark command reproducers are committed in the repo for our release Reviewed By: daniellerozenblit, Victor-C-Zhang Differential Revision: D104059044 fbshipit-source-id: c1d12069f1a652c3b548306c57aed41cecbbe47e
Summary: Pull Request resolved: #586 ClangCL disables C++ exception handling by default. Enable exception handling by adding the `/EHsc` flag when MSVC/ClangCL is used. Added CI and fixed other small issues with ClangCL. GitHub issue: #357 Reviewed By: Cyan4973 Differential Revision: D99160390 fbshipit-source-id: 1639336dd452247d1f677e6b87c8f0d2387c6cbf
Summary: Pull Request resolved: #736 The test checks that two trained compressors with different chunk sizes produce different `.zlc` files. The previous version compared file sizes, but `ACE` training is non-deterministic, and two runs can produce different artifacts but with identical size, which trips the assertion as a false positive. This is rare but can happen in CI, making CI signal flaky. Switch to a bytewise comparison, which must always be true. Reviewed By: terrelln Differential Revision: D103963402 fbshipit-source-id: 4e57acdd54818e4ed55719d716083b8fbb026941
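The difference between the two checks is easy to demonstrate: two artifacts can have the same size and still differ in content. The Python sketch below uses a hypothetical helper, `artifacts_differ`, rather than the actual test code.

```python
from pathlib import Path


def artifacts_differ(path_a, path_b):
    """Compare two files byte for byte instead of comparing their sizes."""
    return Path(path_a).read_bytes() != Path(path_b).read_bytes()


# Two payloads of identical length but different content: a size comparison
# would call them equal, while a bytewise comparison tells them apart.
first = bytes([1, 2, 3, 4])
second = bytes([4, 3, 2, 1])
print(len(first) == len(second), first != second)
```

A size-based assertion fails whenever the two artifacts happen to have the same length, which is the flakiness the summary describes.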
Summary: Pull Request resolved: #740 As titled. The `Zstrong_ParquetLexerTest_FuzzLexerValidInput` lionhead harness was previously crashing when generating Parquet data. Reviewed By: terrelln Differential Revision: D104079523 fbshipit-source-id: 9f088ef4758fb5916f4a5558b367b65832bcb685
Summary: Pull Request resolved: #739 We don't test on 32-bit platforms yet, so static assert that it isn't supported in one known place where we don't support 32-bits. Reviewed By: daniellerozenblit Differential Revision: D104076319 fbshipit-source-id: 9ba32fba441c3ba84e79c129a64996c6c07baeb9
Summary: Pull Request resolved: #741 Adds the SDDL2 announcement to the SDDL docs. Reviewed By: Cyan4973 Differential Revision: D104098404 fbshipit-source-id: d44b5e195c7447c9f6e0b23e16e8c9e61f8b8d02
Summary: Pull Request resolved: #742 The CSV parser `maxNumTokens` is incorrect in cases where the input cannot be chunked properly (no newlines present to chunk at). Raise the limit of this value. Reviewed By: terrelln Differential Revision: D104082821 fbshipit-source-id: fb2b2d8dcf10cead7f6427bc9619d1a0950feefb
Comment on lines +136 to +148
  name: CMake Tarball Build Test
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Test tarball build scenario
      run: |
        # Simulate tarball environment by removing git repo and deps
        rm -rf .git deps/zstd
        # Configure and build -- should download deps
        cmake -B cmakebuild -DOPENZL_BUILD_TESTS=ON -DOPENZL_ALLOW_INTROSPECTION=OFF
        cmake --build cmakebuild
        # quick runtime check, to verify it works
        cmakebuild/cli/zli --version
Comment on lines +193 to +201
  name: C99 Compatibility
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Verify public headers compile under strict C99
      run: CC=clang make c99_compat

# CLI integration testing
test-cli:
Comment on lines +16 to +35
  name: Build SDist
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        submodules: true

    - name: Build SDist
      run: pipx run build --sdist contrib/openzl-demo

    - name: Check metadata
      run: pipx run twine check contrib/openzl-demo/dist/*

    - uses: actions/upload-artifact@v4
      with:
        name: dist-sdist
        path: contrib/openzl-demo/dist/*.tar.gz

build_wheels:
Comment on lines +36 to +64
  name: Wheels on ${{ matrix.os }}
  runs-on: ${{ matrix.os }}
  strategy:
    fail-fast: false
    matrix:
      # Once MSVC is supported we can build Windows packages as well by adding windows-latest
      # Once we're public we can build on ubuntu-24.04-arm & windows-11-arm.
      # They are not supported for private repos.
      # TODO: Remove filesystem for macos-13
      os: [ubuntu-latest, macos-latest]

  steps:
    - uses: actions/checkout@v4
      with:
        submodules: true

    - uses: pypa/cibuildwheel@v3.0.0
      with:
        package-dir: contrib/openzl-demo

    - name: Verify clean directory
      run: git diff --exit-code
      shell: bash

    - name: Upload wheels
      uses: actions/upload-artifact@v4
      with:
        path: wheelhouse/*.whl
        name: dist-${{ matrix.os }}
Comment on lines +83 to +99
  name: Upload if release
  needs: [build_wheels, build_sdist]
  runs-on: ubuntu-latest
  if: github.event_name == 'release' && github.event.action == 'published'

  steps:
    - uses: actions/setup-python@v5
    - uses: actions/download-artifact@v4
      with:
        path: dist
        pattern: dist-*
        merge-multiple: true

    - uses: pypa/gh-action-pypi-publish@release/v1
      with:
        user: __token__
        password: ${{ secrets.pypi_password }}
Contributor · Author
CodeQL flagged issues are interesting, but not release-blocking.
Summary: one last thing to do before the release... Pull Request resolved: #745 Reviewed By: jlee303 Differential Revision: D104265941 Pulled By: Cyan4973 fbshipit-source-id: 8ed4b925e09120f8588e639ef83c5ffe7f0013b7