diff --git a/.gitignore b/.gitignore index 9347919..d90cbdc 100644 --- a/.gitignore +++ b/.gitignore @@ -18,3 +18,4 @@ pom.xml.asc .DS_Store /.cpcache /AGENTS.md +/CLAUDE.md diff --git a/CHANGES.md b/CHANGES.md index 1444b58..9f7958e 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -1,5 +1,251 @@ # Changelog +## [0.2.1] - 2026-04-17 + +### New Collection Types + +- **StringRope** (`string-rope`) — persistent chunked text sequence backed by + `java.lang.String` chunks. Implements `java.lang.CharSequence` so it drops + into `re-find`/`re-seq`/`re-matches`, `clojure.string`, and any Java API + expecting text. Equality with `String` is content-based and hash-compatible. + `#string/rope "…"` tagged literal with EDN round-trip. Constructor: + `string-rope` / `string-rope-concat`. At 100K+ characters, up to + ~38x faster than `String` on repeated structural edits, growing to + ~130x at 500K. +- **ByteRope** (`byte-rope`) — persistent chunked binary sequence backed by + `byte[]` chunks. Unsigned byte semantics (0–255 as long). Unsigned + lexicographic `Comparable` via `Arrays.compareUnsigned`. `#byte/rope "hex"` + tagged literal. Constructor: `byte-rope` / `byte-rope-concat`. Extras: + `byte-rope-bytes`, `byte-rope-hex`, `byte-rope-write`, + `byte-rope-input-stream`, `byte-rope-get-byte`/`-short`/`-int`/`-long` + (plus `-le` variants), `byte-rope-index-of`, and a streaming + `byte-rope-digest` that feeds chunks through `java.security.MessageDigest` + without materialization. + +### Rope Family Improvements + +- **Flat-mode optimization** for all three rope variants (`rope`, + `string-rope`, `byte-rope`). When a rope's element count is at or below + the per-variant flat threshold (1024 elements, characters, or bytes), + the rope stores its content as a bare concrete collection + (`PersistentVector`, `java.lang.String`, or `byte[]`) directly in the + root field, skipping the tree wrapper entirely. 
Reads dispatch straight + to the native type with zero indirection overhead; edits that grow the + rope past the threshold transparently promote to chunked tree form; + transients demote back to flat form at `persistent!` time when the + result fits. Memory for small ropes is essentially identical to the + natural baseline (1.00x vs `PersistentVector` / `String` / `byte[]`). + StringRope and ByteRope had this from day one; the generic Rope + gained it late in the 0.2.1 cycle so all three variants now share the + same optimization pattern. +- **Per-variant Chunk Size Invariant (CSI)** — each rope variant now + declares its own `+target-chunk-size+` / `+min-chunk-size+` constants + and binds them via its `with-tree` macro into the kernel's new + `*target-chunk-size*` / `*min-chunk-size*` dynamic vars. Tuned via + `lein bench-rope-tuning`: all three variants default to 1024/512 + (up from the historical 256/128). At 500K elements, generic Rope + gains +41% nth, +38% split, and 5x concat; StringRope and ByteRope + improve on every measured operation. +- **`kernel/chunk.clj`** — extracted from `kernel/rope.clj`. Holds the + `PRopeChunk` protocol extensions for the three chunk backends + (`APersistentVector`, `String`, `byte[]`) as a standalone kernel + submodule. `kernel/rope.clj` drops from 1237 to 1155 lines and is now + purely the rope tree algebra. +- **StringRope internals refactor** — `with-tree` macro replaces 16+ + copies of the `(binding [*t-join* alloc] ...)` form; `->StringRope*` + helper replaces 35+ copies of the 6-arg constructor; `coll->str` and + `coll->tree-root` coercion helpers deduplicate scattered dispatch + logic in the PRope method bodies. +- **Monomorphic hot paths for `nth` and `reduce`** on all three rope + variants. 
Each variant's deftype now inlines the tree walk directly, + replacing the generic kernel's protocol-dispatched `rope-nth` / + `rope-chunk-at` / `rope-reduce` with concrete chunk-type calls + (`alength`/`aget` for byte[], `.length`/`.charAt` for String, + `.count`/`.nth` for vector). Eliminates per-tree-level `PRopeChunk` + protocol dispatch (~9 dispatches per `nth` at N=500K), the + `[chunk offset]` tuple allocation that `rope-chunk-at` returned on + every call, and per-chunk `chunk-reduce-init` dispatch on every leaf + during `reduce`. + Measured at N=500K (1000 random nth, full reduce): + - Rope `nth`: 106 → 58 µs (**1.8x faster**, 0.09x → 0.16x vs vector) + - StringRope `nth`: 120 → 50 µs (**2.4x faster**, 0.013x → 0.030x vs String) + - ByteRope `nth`: 145 → 62 µs (**2.3x faster**, 0.003x → 0.015x vs byte[]) + - StringRope `reduce`: 1.81 → 1.07 ms (**1.7x faster**, 0.31x → 0.52x vs String) + - ByteRope `reduce`: 3.53 → 1.91 ms (**1.8x faster**) + - No structural-op regression: splice, concat, insert, remove, and + repeated-edits all within ±3% of prior run. +- **Removed cursor cache from StringRope and ByteRope.** The volatile-mutable + `_cc_chunk`/`_cc_start`/`_cc_end` fields introduced torn-read races + under concurrent access (three volatile writes are not atomic as a group) + and caused cache thrashing when two threads did sequential access on + the same rope instance — violating the thread-safety guarantees + expected of persistent data structures. The monomorphic tree walk is + fast enough (~50–70 ns per `nth` at N=500K) that the cache's benefit + on sequential access was not worth the correctness cost. If sequential + `charAt` throughput becomes a bottleneck for regex-heavy workloads, an + explicit cursor wrapper (opt-in, not shared) may be added in a future + release. +- **`rope-splice-inplace`** fused single-chunk splice path avoids an + intermediate `chunk-splice` allocation on the overflow path via + `chunk-splice-split`. 
+ +### Performance Improvements + +- **Primitive rank for `long-ordered-set` / `string-ordered-set` / + `long-ordered-map` / `string-ordered-map`.** `rank-of` and `indexOf` + now dispatch to `node-rank-long` / `node-rank-string` on primitive- + specialized collections, bypassing the generic `Comparator` dispatch. + Matches the existing primitive fast-path pattern already used for + `contains` / `find` / `find-val`. At N=1K, rank on a + `long-ordered-set` is ~4x faster than on a `data.avl/sorted-set`; + `string-ordered-set` rank is ~3.4x faster. +- **Range-map bulk construction.** `(range-map coll)` with sorted + disjoint input now takes an O(n) balanced-build path (`node-build- + sorted`) instead of per-entry `assoc` with carve-out. Input with + overlapping ranges falls through to the general carving path, + preserving "later wins" semantics. ~10x faster than the previous + per-insert path; at N=1K the bulk path is already ~2.2x faster than + Guava's `TreeRangeMap` construction. +- **Non-allocating `java.util.Iterator` for `OrderedSet` / + `OrderedMap`.** A new `tree/NodeIterator` deftype advances the tree + enumerator in place via an unsynchronized-mutable field, avoiding + the per-step seq-cell allocation of `clojure.lang.SeqIterator` over + a lazy seq. Thread-safety contract is unchanged: the iterator is + per-call fresh (no shared state on the collection), matching the + memory model of `SeqIterator`. Java-style iteration on + `OrderedSet` is now ~2x faster than on `sorted-set` and ~3.6x + faster than on `data.avl` at N=1K. + +### Refactoring + +- **`RopeSeq` / `RopeSeqReverse` moved from `kernel/rope.clj` into + `types/rope.clj`.** These generic-Rope-specific seq types were + only used by the generic `Rope` deftype — `StringRope` and + `ByteRope` carry their own monomorphic seq types. Relocating them + makes `kernel/rope.clj` honestly chunk-protocol-agnostic and cuts + ~220 lines from the kernel (now 1001 lines). No user-visible + change. 
+ +### Bug Fixes + +- **Primitive node specialization preserved across mutations.** + `conj`/`disjoin` on `OrderedSet` and `assoc`/`without` on + `OrderedMap` were passing the generic `SimpleNode` constructor + instead of the collection's stored allocator. After a single + `conj` on a `long-ordered-set`, the root silently degraded from + `LongKeyNode` to `SimpleNode`, losing unboxed-key performance. + Fixed by threading `alloc` through all `node-add`/`node-remove` + call sites. `ordered-merge-with` also propagated nil alloc/stitch + into the result; fixed. +- **PriorityQueue and OrderedMultiset** `getAllocator` and `getStitch` + returned nil instead of the generic constructor, violating the + `INodeCollection`/`IBalancedCollection` contract. +- **Empty StringRope** `charAt` and `nth` dereferenced nil root + instead of throwing bounds exceptions. +- **StringRope `valAt`** threw `ClassCastException` on non-integer + keys (e.g., `(get sr :x)`). Added `integer?` guard. +- **Empty StringRope/ByteRope `r/fold`** crashed instead of returning + `(combinef)`. +- **ByteRope `InputStream.read(buf, off, 0)`** returned -1 at EOF + instead of 0 per `InputStream` contract. +- **Auto-boxing in `str->root` and `bytes->root`.** The loop variable + `pos` was inferred as primitive `long` but the recur argument came + from `clojure.core/min` (Object) and `unchecked-dec-int` (int), + forcing auto-boxing per iteration. Threaded as primitive `long` + throughout using `unchecked-add` / `unchecked-dec` / `unchecked-int` + consistently. Pre-existing latent warning exposed when compiling + under `*warn-on-reflection*`. + +### Benchmarks and Tooling + +- **`lein bench-rope-tuning`** fully rewritten to sweep chunk sizes + across all three rope variants (`Rope` vs `Vector`, `StringRope` vs + `String`, `ByteRope` vs `byte[]`). Reports per-operation speedups and + a geomean score for ranking. Supports + `--variant rope|string-rope|byte-rope`. 
+- **`lein bench`** (`bench_runner.clj`) full suite gains N=1000 and + N=5000 cardinalities alongside the existing 10K/100K/500K. The 1K + column exercises flat-mode for all three rope variants; the 5K + column exercises the smallest tree-mode regime. +- **`lein bench-simple`** gains a `:rope` category (alongside the + existing `:string-rope` and new `:byte-rope` categories) and adds + N=5000 to the shared size defaults. +- **Memory test** (`memory_test.clj`) gains `string-rope-memory` and + `byte-rope-memory` deftests plus a new rope family section in the + summary report table, showing all three variants against their + natural baselines. The `specialized-collection-memory` deftest + extends to cover range-map, segment-tree, and fuzzy-map (previously + only interval-set/-map, multiset, priority-queue, and fuzzy-set). +- **`lein bench-report`** gains three new sections: *Performance by + Category* (aggregated wins/parity/losses per category with geomean + speedup and best/worst case), *Rope Family at Scale* (side-by-side + speedups for all three rope variants on structural ops), and + *Significant Wins* (parallel to the existing Significant Losses + section — the significant-wins analyzer was always computed but + previously not rendered). All existing sections — Headline + Performance, Parity, Significant Losses, Full Scorecard, + Regressions, Improvements — render identically. +- **`lein bench` auto-compare** — after writing a fresh + `bench-results/.edn`, the runner looks for the + most-recent prior EDN in the same directory, flat-walks both files, + matches leaf measurements by `(size, group, variant)`, and prints a + compact Regressions / Improvements section with timing deltas. + Self-contained (no dependency on the `bb` report tool); suggests + `lein bench-report --baseline` for the full comparison. 
+- **Main bench suite coverage parity** — `bench_runner.clj` now + benchmarks range-map, segment-tree, priority-queue, ordered-multiset, + fuzzy-set, and fuzzy-map alongside the existing set / map / rope + coverage. Previously these types were only exercised by specialized + scripts (`lein bench-range-map`) or not at all, which meant the main + `lein bench --full` pipeline and `bench-report` had no visibility + into their performance. +- **`lein bench-charts`** generates 7 PNG charts in `doc/charts/` from + the latest benchmark EDN via XChart. Charts: set-algebra scaling, + rope editing scaling, collection winners (dot plot), rope operations + profile (win/loss), rope vs vector absolute time (diverging lines), + StringRope crossover, ByteRope crossover. +- **`lein bench-report` auto-baseline** — when `--baseline` is not + specified, the report automatically selects the prior timestamped + EDN, so Regressions and Improvements sections render by default. + Headline sections now include ordered-set, ordered-map, + long-specialized, and string-specialized vs their competitors. +- **Rope tuner scoring** — `lein bench-rope-tuning` now uses + structural-editing geomean (splice, split, concat) as the primary + score, with the equal-weight geomean shown as a secondary `all` + column. The old equal-weight geomean was misleadingly driven by + concat scaling. +- **`lein bench-report --publish`** suppresses the Full Scorecard, + Regressions, and Improvements sections. These are useful for + interactive A/B review during development but are noise for outside + readers of the committed `doc/report.txt` snapshot. The default + (no flag) still shows everything. Recommended snapshot workflow: + `lein bench-report --publish > doc/report.txt`. 
+- **New bench cases exercising the primitive-rank / range-map-bulk / + iterator optimizations.** `bench-long-rank-lookup` and + `bench-string-rank-lookup` hit the primitive `node-rank-long` / + `node-rank-string` paths that the generic `bench-rank-lookup` + missed. `bench-range-map-bulk-construction` uses the single-argument + `(core/range-map coll)` constructor to exercise the new O(n) + balanced-build path alongside the existing per-insert + `bench-range-map-construction`. `bench-set-iteration-iterator` + traverses via `.iterator()` to exercise `NodeIterator` (the + existing `bench-set-iteration` goes through `reduce`). + +### Documentation + +- [Cookbook](doc/cookbook.md) restructured with six rope recipes at the + front (text editor, regex on StringRope, bulk sequence assembly, binary + protocol, streaming digest, undo history). Duplicate section + numbering cleaned up; existing collection recipes renumbered. +- [Ropes](doc/ropes.md) gains a "Chunk Abstraction: One Kernel, Many + Backends" section explaining `PRopeChunk` and `kernel/chunk.clj`, a + "Specialized Ropes" section with per-variant design and examples, + and a variant-picker table. API section now covers all three + variants with the shared `PRope` surface up front. +- [Collections API](doc/collections-api.md) gains full StringRope and + ByteRope sections with constructors, interfaces, and per-variant + operations. + ## [0.2.0] - 2026-04-08 ### New Collection Types @@ -52,7 +298,7 @@ ### EDN Tagged Literals -Round-trip serialization via `data_readers.clj`: `#ordered/set`, `#ordered/map`, `#ordered/interval-set`, `#ordered/interval-map`, `#ordered/range-map`, `#ordered/priority-queue`, `#ordered/multiset`, `#ordered/rope`. Collections with custom comparators (including `general-compare`) print in opaque `#` form to avoid non-round-trippable tagged literals. 
+Round-trip serialization via `data_readers.clj`: `#ordered/set`, `#ordered/map`, `#interval/set`, `#interval/map`, `#range/map`, `#priority/queue`, `#multi/set`, `#vec/rope`. Collections with custom comparators (including `general-compare`) print in opaque `#` form to avoid non-round-trippable tagged literals. ### Performance diff --git a/README.md b/README.md index 6724769..cde1f02 100644 --- a/README.md +++ b/README.md @@ -14,8 +14,8 @@ you could solve efficiently: -- **Ropes** — concat, split, splice, insert: **10–5000x** faster than Clojure vector at scale -- **Sets and Maps** work exactly as you're used to, but do more, up to **50x** faster +- **Ropes** — concat, split, splice, insert: **10–1300x** faster than native baselines at scale, in three flavors: generic, `StringRope` (CharSequence), and `ByteRope` (unsigned bytes) +- **Sets and Maps** work exactly as you're used to, but do more, up to **60x** faster - **Interval maps** for overlap queries ("what's scheduled at 3pm?") - **Range maps** for non-overlapping regions ("which subnet owns this IP?") - **Segment trees** for range aggregation ("total sales from day 10 to 50?") @@ -34,7 +34,8 @@ foundation for splitting, joining, and parallel operations. 
- [Cookbook](doc/cookbook.md) — Practical patterns: leaderboards, time-series, scheduling, queues, multisets, and more - [Collections API](doc/collections-api.md) — Collection-by-collection constructor and operation reference - [Ropes](doc/ropes.md) — Rope tutorial, use cases, and design -- [Benchmarks](doc/benchmarks.md) — Detailed performance measurements +- [Benchmark Report](doc/report.txt) — Current performance numbers (auto-generated) +- [Benchmark Methodology](doc/benchmarks.md) — Infrastructure, interpretation, and how to run - [Competitive Analysis](doc/competitive-analysis.md) — Comparison with other libraries - [vs clojure.data.avl](doc/vs-clojure-data-avl.md) — For data.avl users considering a switch - [Algorithms](doc/algorithms.md) — Tree structure, rotations, split/join, interval augmentation @@ -55,12 +56,12 @@ hood — and in the new things you can do. ;; Ropes — O(log n) split, splice, concat -(def r (oc/rope [:a :b :c :d :e])) ;=> #ordered/rope [:a :b :c :d:e] +(def r (oc/rope [:a :b :c :d :e])) ;=> #vec/rope [:a :b :c :d :e] (apply oc/rope-concat - (reverse (oc/rope-split r 2))) ;=> #ordered/rope [:c :d :e :a :b] + (reverse (oc/rope-split r 2))) ;=> #vec/rope [:c :d :e :a :b] -(oc/rope-insert r 2 [:x :y]) ;=> #ordered/rope [:a :b :x :y :c :d :e] +(oc/rope-insert r 2 [:x :y]) ;=> #vec/rope [:a :b :x :y :c :d :e] ;; Sets @@ -84,27 +85,9 @@ hood — and in the new things you can do. Across the measured workloads, `ordered-collections` is faster than both `clojure.core/sorted-set` and `clojure.data.avl` at every -cardinality Set algebra is the standout, with 28-57x wins at 500K. -Even against unordered `clojure.core/set`the benchmarks still show -roughly 4-19x wins. 
- -### Rope vs PersistentVector - -| Workload | N=10K | N=100K | N=500K | -|---|---:|---:|---:| -| 200 random edits | **43x** | **498x** | **1968x** | -| Single splice | **6x** | **116x** | **584x** | -| Concat many pieces | **3.4x** | **5.4x** | **9.5x** | -| Chunk iteration | **58x** | **83x** | **117x** | -| Fold (sum) | **5.6x** | **1.5x** | **1.3x** | -| Reduce (sum) | 0.4x | **1.7x** | **1.3x** | -| Random nth (1000) | 0.7x | 0.5x | 0.4x | - -The rope wins on 6 of 7 workloads at scale and the advantage grows with -collection size. Concat improves with N because the rope collects chunks in -O(k) while the vector copies O(n) elements. Reduce beats vectors at N ≥ 100K -thanks to 256-element chunk locality. Random access is slower (O(log n) vs -O(1)) but bounded. See [Ropes](doc/ropes.md) for the full tutorial. +cardinality. Set algebra is the standout, with 18-60x wins at 500K. +Even against unordered `clojure.core/set` the benchmarks still show +roughly 4-28x wins. ### Set algebra @@ -112,43 +95,62 @@ O(1)) but bounded. See [Ropes](doc/ropes.md) for the full tutorial. 
| Operation | N=10K | N=100K | N=500K | |-----------|------:|-------:|-------:| -| Union | **15.4x** | **26.4x** | **56.6x** | -| Intersection | **9.0x** | **17.0x** | **36.2x** | -| Difference | **9.6x** | **22.1x** | **50.2x** | +| Union | **12.9x** | **23.4x** | **59.6x** | +| Intersection | **9.3x** | **15.5x** | **34.6x** | +| Difference | **10.8x** | **24.3x** | **54.7x** | #### vs clojure.data.avl | Operation | N=10K | N=100K | N=500K | |-----------|------:|-------:|-------:| -| Union | **10.9x** | **20.5x** | **42.1x** | -| Intersection | **7.2x** | **13.0x** | **28.1x** | -| Difference | **7.2x** | **12.7x** | **32.0x** | +| Union | **9.7x** | **20.0x** | **51.3x** | +| Intersection | **6.9x** | **13.1x** | **29.9x** | +| Difference | **7.9x** | **15.0x** | **37.2x** | #### vs clojure.core/set | Operation | N=10K | N=100K | N=500K | |-----------|------:|-------:|-------:| -| Union | **4.2x** | **7.2x** | **16.3x** | -| Intersection | **3.8x** | **6.1x** | **12.9x** | -| Difference | **4.4x** | **7.6x** | **18.6x** | - -### Set equality +| Union | **3.7x** | **7.2x** | **20.5x** | +| Intersection | **3.5x** | **6.6x** | **17.8x** | +| Difference | **4.6x** | **9.8x** | **28.4x** | -| | vs hash-set | vs sorted-set | vs data.avl | -|--|------------:|--------------:|------------:| -| N=10K | **2.8x** | **12.0x** | **14.1x** | -| N=100K | **2.3x** | **9.3x** | **9.5x** | - -### Other operations +#### Other operations | Operation | vs sorted-set | vs data.avl | |-----------|---------------|-------------| -| Construction | **3.0x / 2.8x / 3.1x** | **1.5x / 1.3x / 1.7x** | -| Lookup | 1.1x / 1.1x / 1.1x | 1.0x / 1.0x / 1.0x | -| Split | — | **6.8x / 7.2x / 7.8x** | -| Fold | **2.5x / 4.1x / 4.1x** | **5.8x / 5.9x / 8.8x** | +| Construction | **2.1x / 2.4x / 2.8x** | **1.1x / 1.2x / 1.5x** | +| Lookup | 1.1x / 1.0x / 0.9x | 1.0x / 0.9x / 0.9x | +| Split | — | **4.9x / 6.1x / 6.8x** | +| Fold | **2.5x / 4.0x / 3.5x** | **3.1x / 5.4x / 4.8x** | + +### Rope vs 
PersistentVector -*Benchmarked on a 2023 MacBook Pro (M2). See [Benchmarks](doc/benchmarks.md) for full results.* +| Workload | N=1K | N=5K | N=10K | N=100K | N=500K | +|---|---:|---:|---:|---:|---:| +| 200 random edits | **4.7x** | **14x** | **26x** | **261x** | **1237x** | +| Single splice | **4.8x** | **13x** | **106x** | **762x** | **863x** | +| Concat pieces | **169x** | **22x** | **29x** | **39x** | **36x** | +| Reduce (sum) | **1.0x** | **1.7x** | **1.4x** | **1.5x** | **1.3x** | +| Fold (sum) | **2.9x** | **1.4x** | **1.2x** | **1.3x** | **1.6x** | +| Random nth (1000) | 0.5x | 0.2x | 0.2x | 0.2x | 0.2x | + +### StringRope vs String + +| Workload | N=1K | N=5K | N=10K | N=100K | N=500K | +|---|---:|---:|---:|---:|---:| +| 200 random edits | 0.6x | **2.6x** | **5.7x** | **38x** | **130x** | +| Single splice | 0.4x | **3.2x** | **5.9x** | **42x** | **349x** | +| Single insert | 0.4x | **2.7x** | **6.2x** | **40x** | **154x** | +| Single remove | **1.5x** | **3.6x** | **7.1x** | **44x** | **412x** | +| Concat halves | 0.9x | 0.5x | **2.5x** | **20x** | **29x** | +| Reduce (sum chars) | 0.4x | 0.5x | 0.5x | 0.5x | 0.5x | +| `re-find` / `re-seq` | 0.6-1.3x | 0.1-0.2x | 0.1-0.2x | 0.1-0.2x | 0.1-0.2x | + +The rope family wins decisively on structural editing at scale; the advantage +grows with collection size. See [Ropes](doc/ropes.md) for the full tutorial. + +*Benchmarked on Apple M2 (aarch64), OpenJDK 25.0.2, Clojure 1.12.4. See [report.txt](doc/report.txt) for full results and [benchmarks.md](doc/benchmarks.md) for methodology.* --- @@ -215,6 +217,12 @@ and `fuzzy-map`. 
| `(oc/rope coll)` | Persistent sequence for structural editing | | `(oc/rope-concat a b)` | Concatenate two ropes — O(log n) | | `(oc/rope-splice r start end items)` | Replace a range — O(log n) | +| **StringRope** | | +| `(oc/string-rope s)` | Persistent text sequence (implements `CharSequence`) | +| `(oc/string-rope-concat a b)` | Concatenate two string ropes — O(log n) | +| **ByteRope** | | +| `(oc/byte-rope bs)` | Persistent memory — structural editing, zero-cost snapshots, structure sharing | +| `(oc/byte-rope-concat a b)` | Concatenate two byte ropes — O(log n) | | **Fuzzy Collections** | | | `(oc/fuzzy-set coll)` | Returns closest element to query | | `(oc/fuzzy-map coll)` | Returns value for closest key to query | @@ -226,7 +234,11 @@ and `fuzzy-map`. A rope is a persistent sequence optimized for **structural editing** — concatenation, splitting, splicing, and insertion in the middle of large sequences. Where `PersistentVector` is O(n) for mid-sequence edits, -the rope is O(log n). +the rope is O(log n). Three variants share the same kernel: + +- **`rope`** — arbitrary Clojure values (vector-compatible) +- **`string-rope`** — UTF-16 text, implements `CharSequence` for `re-find`/`clojure.string` +- **`byte-rope`** — persistent memory with structural editing, zero-cost snapshots, and structure sharing. Think of it as a byte buffer with the safety properties of a persistent data structure: splice at any offset in O(log n), keep old versions for free, let the GC reclaim what's unreachable ```clojure (def r (oc/rope (range 100000))) @@ -238,10 +250,15 @@ the rope is O(log n). 
(let [[left right] (oc/rope-split r 50000)] [(count left) (count right)]) ;=> [50000 50000] -;; Concatenate — O(log n) -(oc/rope-concat (oc/rope [1 2 3]) (oc/rope [4 5 6])) -;=> #ordered/rope [1 2 3 4 5 6] +;; StringRope — drops into regex and clojure.string +(def text (oc/string-rope "hello world")) +(re-find #"wor" text) ;=> "wor" + +;; ByteRope — binary protocols, streaming digest +(def packet (oc/byte-rope [0x48 0x45 0x4C 0x4C 0x4F])) +(oc/byte-rope-get-int packet 0) ;=> 1212501068 ``` + See [Ropes](doc/ropes.md) for the full tutorial. --- @@ -399,7 +416,7 @@ All collection types implement `CollFold` for efficient `r/fold`: ``` $ lein test -Ran 454 tests containing 466,000+ assertions. +Ran 690 tests containing 471,000+ assertions. 0 failures, 0 errors. ``` @@ -417,7 +434,7 @@ $ lein stats # Print project statistics ``` $ lein bench # Criterium, N=100K (~5 min) -$ lein bench --full # Criterium, N=10K,100K,500K (~40 min) +$ lein bench --full # Criterium, N=1K,5K,10K,100K,500K (~60 min) $ lein bench --readme --full # README tables only (~10 min) $ lein bench --sizes 50000 # Custom sizes diff --git a/deps.edn b/deps.edn index f75baa0..d32cf36 100644 --- a/deps.edn +++ b/deps.edn @@ -60,4 +60,22 @@ :extra-deps {org.clojure/data.avl {:mvn/version "0.2.0"} criterium/criterium {:mvn/version "0.4.6"}} :jvm-opts ["-Xmx8g"] - :main-opts ["-m" "ordered-collections.parallel-threshold-bench"]}}} + :main-opts ["-m" "ordered-collections.parallel-threshold-bench"]} + + :bench-rope-fold + {:extra-paths ["test"] + :extra-deps {criterium/criterium {:mvn/version "0.4.6"}} + :jvm-opts ["-Xmx8g"] + :main-opts ["-m" "ordered-collections.rope-fold-bench"]} + + :bench-transient-rope + {:extra-paths ["test"] + :extra-deps {criterium/criterium {:mvn/version "0.4.6"}} + :jvm-opts ["-Xmx8g"] + :main-opts ["-m" "ordered-collections.transient-rope-bench"]} + + :bench-rope-tuning + {:extra-paths ["test"] + :extra-deps {criterium/criterium {:mvn/version "0.4.6"}} + :jvm-opts ["-Xmx8g"] + 
:main-opts ["-m" "ordered-collections.rope-tuning-bench"]}}} diff --git a/doc/algorithms.md b/doc/algorithms.md index 1b148a5..f0e34c8 100644 --- a/doc/algorithms.md +++ b/doc/algorithms.md @@ -597,13 +597,19 @@ by element count. Chunks are bounded by a formal invariant analogous to B-tree minimum fill: ``` -target = 256 min = 256 +target = 1024 min = 512 (per-variant, kernel-wide default) Every chunk has size in [min, target] except: - If the rope has ≤ 1 chunk, it may be any size in [1, target] - Otherwise, only the rightmost chunk (the "runt") may be [1, target] ``` +Each rope variant (`rope`, `string-rope`, `byte-rope`) carries its own +`+target-chunk-size+` / `+min-chunk-size+` constants and binds them into +the kernel's `*target-chunk-size*` / `*min-chunk-size*` dynamic vars via +its `with-tree` macro. The values above are the current tuned defaults; +`lein bench-rope-tuning` sweeps candidate sizes for each variant. + CSI is enforced locally at each mutation site: | Operation | Enforcement | @@ -618,6 +624,31 @@ CSI is enforced locally at each mutation site: size, but when the last full chunk would leave a remainder below min, it splits the final two pieces evenly so both halves are ≥ min. +### Flat Mode + +Below the target chunk size, a rope would consist of exactly one chunk +wrapped in a single tree node. That wrapper adds no information, so +every rope variant applies a **flat-mode** optimization: when the +element count is at or below the flat threshold (= `+target-chunk-size+`, +currently 1024), the rope stores its content as a bare concrete +collection in its `root` field — a `PersistentVector` for the generic +rope, a `java.lang.String` for the string rope, a `byte[]` for the byte +rope — and skips the tree wrapper entirely. + +Reads dispatch directly to the underlying type's native operations with +zero indirection (`.nth` on the vector, `.charAt` on the string, +`aget` on the byte array). 
Structural edits use the native type's +own efficient operations (`subvec`+`into`, `StringBuilder`, +`System.arraycopy`) and promote to the chunked tree form only when +the result would exceed the threshold. Transient construction always +builds a tree internally but demotes back to flat form at +`persistent!` time if the final result fits. + +The flat threshold is `+target-chunk-size+` — "small enough to live in +one chunk" and "small enough to stay flat" are the same regime. Memory +overhead for a flat-mode rope is essentially identical to the raw +underlying type. + ### Indexed Access `rope-nth` descends by subtree element counts: @@ -751,11 +782,17 @@ Rope loses (bounded, inherent O(log n) vs O(1)): └────────────────────┴───────┴────────┴────────┘ ``` -Reduce beats vectors at N ≥ 100K because the rope's 256-element chunk size +Reduce beats vectors at N ≥ 100K because the rope's 1024-element chunk size gives better cache locality per reduction step than PersistentVector's 32-wide trie nodes. The direct recursive tree walk (no enumerator frames) and native vector `.reduce` delegation keep per-element overhead minimal. +At N ≤ 1024 the rope is in **flat mode** — the root holds a bare +`PersistentVector` directly rather than a one-chunk tree, so every read +dispatches straight to the underlying vector with zero indirection. At +that size, the rope's read performance is essentially identical to +`PersistentVector` itself. + Every remaining loss is structural — there are no non-inherent performance gaps: ``` diff --git a/doc/benchmarks.md b/doc/benchmarks.md index d6c67ca..b87a1a0 100644 --- a/doc/benchmarks.md +++ b/doc/benchmarks.md @@ -1,14 +1,144 @@ # Benchmarks -This is the canonical performance document for the project: benchmark -methodology, current numbers, and the implementation details that most directly -explain those numbers. +## Current Numbers + +Run `lein bench-report` to generate the full performance report from the +latest benchmark data. 
The output is also committed as +[report.txt](report.txt) for quick reference. + +The report includes headline speedup tables for every collection type, +per-category geomean aggregation, rope family cross-variant comparison, +significant wins/losses, full scorecard, and regressions/improvements +vs the prior run. + +## Infrastructure + +### Design + +Performance work on this project is driven by a structured, versioned +artifact pipeline rather than ad-hoc timing. Every `lein bench` run +produces a self-describing EDN file that records what was measured, on +what hardware, at what commit, with full Criterium statistics per cell. +These artifacts accumulate in `bench-results/` and form a database of +performance history going back to 0.2.0, enabling detailed A/B +comparison against any branch or point in time. + +The pipeline has three stages: + +1. **Capture** (`bench_runner.clj`) — runs Criterium benchmarks across + all collection types and cardinalities, normalizes the raw Criterium + output, and writes a timestamped EDN artifact. +2. **Analyze** (`bench_analyze.clj`) — flattens and classifies the + artifact, computes the ordered scorecard (OC variant vs best peer), + detects regressions/improvements against a baseline, and aggregates + by category. +3. **Render** (`bench_render.clj` + `bench_report.bb`) — produces the + formatted terminal report with scaling tables, ranked sections, and + delta annotations. + +This separation means the analysis logic is reusable — the same +scorecard and regression functions work whether you're comparing two +runs on the same branch, an optimization branch against master, or +the latest nightly against a release tag. + +### The EDN Artifact + +Each benchmark run writes a single file to `bench-results/` with the +naming convention `YYYY-MM-DD_HH-mm-ss.edn`. The artifact contains: + +- **`:artifact-version`** — schema version (currently 3). 
Incremented + when the field structure changes so downstream tools can detect + incompatibility. +- **`:system`** — full environment snapshot: git rev, branch, dirty + flag, `git describe`, Java version/vendor/VM, OS, processor count, + heap sizes, JVM args, Clojure version, Leiningen version, hostname, + timezone. +- **`:sizes`** — the cardinalities measured (e.g., `[1000 5000 10000 + 100000 500000]`). +- **`:benchmarks`** — nested map of `{size → {group → {variant → + stats}}}`. Each leaf holds the full Criterium result: mean, standard + deviation, confidence intervals, sample count, execution count, + outlier classification. +- **`:started-at`**, **`:timestamp`**, **`:duration-ms`** — wall-clock + bookkeeping. +- **`:mode`**, **`:opts`**, **`:argv`** — reproducibility metadata. + +The system snapshot makes it possible to attribute performance changes +to hardware/JVM differences vs code changes. The artifact version +ensures tools fail cleanly rather than silently misinterpreting a +changed schema. + +### A/B Comparison + +Both `lein bench` and `lein bench-report` automatically compare against +the prior run: + +1. **File discovery** — scan `bench-results/` for timestamped EDN + files, sort lexically (which is chronological), and select the + most recent file before the current one. +2. **Cell matching** — flatten both artifacts into rows keyed by + `[size, group, variant]`. Match by composite key. +3. **Delta computation** — for each matched cell, compute + `new-ns / old-ns` and classify: + - **Major regression**: >25% slower + - **Regression**: >10% slower + - **Improvement**: >10% faster + - **Major improvement**: >25% faster + - **Unchanged**: within 10% + +This runs automatically at the end of every `lein bench` invocation +(self-contained, no dependency on the bb report tool) and as the +Regressions/Improvements sections in `lein bench-report`. 
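The cell-matching and delta-classification steps above can be sketched in a few lines of Clojure. This is a minimal illustration under stated assumptions, not the code in `bench_analyze.clj`: the names `classify-delta` and `compare-runs`, the `:mean-ns` key, and the flattened `{[size group variant] stats}` shape are hypothetical. Only the 10% and 25% thresholds come from the list above, and "N% faster" is read here as the time ratio dropping below 1 - N/100.

```clojure
;; Hypothetical sketch of the A/B delta step. Function names, the :mean-ns
;; key, and the flattened cell shape are assumptions for illustration only.

(defn classify-delta
  "Classify a ratio of new-ns / old-ns. A ratio above 1.0 means the new
  run is slower than the baseline."
  [ratio]
  (cond
    (> ratio 1.25) :major-regression   ; more than 25% slower
    (> ratio 1.10) :regression         ; more than 10% slower
    (< ratio 0.75) :major-improvement  ; more than 25% faster (assumed reading)
    (< ratio 0.90) :improvement        ; more than 10% faster (assumed reading)
    :else          :unchanged))        ; within 10% either way

(defn compare-runs
  "Match cells of two flattened runs by their [size group variant] key and
  return one row per key that appears in both runs."
  [old-cells new-cells]
  (for [[k new-stats] new-cells
        :let  [old-stats (get old-cells k)]
        :when old-stats
        :let  [ratio (/ (double (:mean-ns new-stats))
                        (double (:mean-ns old-stats)))]]
    {:key k :ratio ratio :status (classify-delta ratio)}))

;; Made-up numbers, purely to exercise the functions.
(def old-run {[100000 :union :ordered-set] {:mean-ns 2000.0}
              [100000 :union :sorted-set]  {:mean-ns 40000.0}})

(def new-run {[100000 :union :ordered-set] {:mean-ns 1400.0}
              [100000 :union :sorted-set]  {:mean-ns 41000.0}
              [100000 :fold  :ordered-set] {:mean-ns 700.0}})

(compare-runs old-run new-run)
```

In this sketch a cell that exists in only one of the two runs (the `:fold` row here) is silently skipped; whether the real tool reports such cells separately is not stated in the text above.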
+ +For targeted A/B testing, specify any two artifacts explicitly: + +``` +$ lein bench-report \ + --file bench-results/2026-04-12_05-22-59.edn \ + --baseline bench-results/2026-04-09_09-11-13.edn +``` + +### Analysis Functions + +The analyze layer (`bench_analyze.clj`) provides: + +| Function | Purpose | +|----------|---------| +| `ordered-scorecard` | For each `[size, group]`, find best OC variant vs best peer, compute speedup, classify as win/parity/loss (thresholds: 1.05x / 0.95x) | +| `regression-report` | Match cells across two artifacts, compute deltas, classify severity | +| `headline-wins` | Extract curated comparisons (from a hand-maintained spec) pivoted by size for the scaling tables | +| `category-summary` | Per-category aggregates: win/parity/loss counts, geometric mean speedup, best win, worst loss with group name | +| `rope-family-summary` | Cross-variant side-by-side at the largest measured size | +| `significant-wins` / `significant-losses` | Filter scorecard by magnitude thresholds | +| `parity-cases` | Filter scorecard for near-1.0x cases | +| `executive-summary` | One-paragraph overview: case count, win/loss tally, best/worst, regression count | + +### The Report Sections + +`lein bench-report` renders these sections in order: + +1. **Run** / **Platform** / **Baseline Run** — metadata headers +2. **Summary** — one-paragraph executive summary +3. **Headline Performance** — scaling tables grouped by section + (set algebra, ordered set, ordered map, long-specialized, + string-specialized, rope, string-rope, byte-rope, range map, + segment tree, priority queue, multiset, fuzzy set/map) +4. **Performance by Category** — geomean aggregation across all cases + in each category +5. **Rope Family at Scale** — cross-variant structural ops at N=500K +6. **Significant Wins** — ranked wins above 1.2x +7. **At Parity** — cases within 5% of 1.0x +8. **Significant Losses** — ranked losses below 0.83x +9. 
**Full Scorecard** — all measured comparisons with times and status *(omitted under `--publish`)* +10. **Regressions** — A/B deltas flagged as slower *(omitted under `--publish`)* +11. **Improvements** — A/B deltas flagged as faster *(omitted under `--publish`)* + ## Running ``` $ lein bench # Criterium, N=100K (~5 min) -$ lein bench --full # Criterium, N=10K,100K,500K (~30 min) +$ lein bench --full # Criterium, N=1K,5K,10K,100K,500K (~60 min) $ lein bench --readme --full # README tables only (~10 min) $ lein bench --sizes 50000 # Custom sizes @@ -17,422 +147,145 @@ $ lein bench-simple --full # Full suite (100 to 1M) $ lein bench-range-map # Range-map vs Guava TreeRangeMap $ lein bench-parallel # Parallel threshold crossover analysis $ lein bench-rope-tuning # Rope chunk-size sweep -``` - -Results are written to `bench-results/.edn` with system info, sizes, and per-operation Criterium statistics. - -## Environment - -| | | -|--|--| -| JVM | OpenJDK 25.0.2, 64-bit Server VM | -| Clojure | 1.12.4 | -| OS / Arch | Mac OS X 26.3.1 / aarch64 | -| Available processors | 12 | -| Heap | 8192 MB | -| Method | Criterium quick-benchmark (6 samples, JIT warmup, outlier detection) | - -Relative ratios are more meaningful than absolute times. - -## Set Operations - -Two sets of size N with 50% overlap. Adams' divide-and-conquer with optional -fork-join parallelism. Set algebra uses operation-specific root thresholds: - -- union `131,072` -- intersection `65,536` -- difference `131,072` -- ordered-map merge `65,536` - -Recursive re-forking currently uses `65,536` for all four operations, plus a -`65,536` minimum-branch guard and a `64` sequential cutoff for tiny subtrees. - -This is the library's dominant advantage. The split/join structure gives -work-optimal set algebra, and fork-join parallelism helps extend that advantage -to larger multicore workloads. 
- -### vs sorted-set (speedup) - -| Operation | N=10K | N=100K | N=500K | -|-----------|------:|-------:|-------:| -| Union | **15.4x** | **26.4x** | **56.6x** | -| Intersection | **9.0x** | **17.0x** | **36.2x** | -| Difference | **9.6x** | **22.1x** | **50.2x** | - -### vs data.avl (speedup) - -| Operation | N=10K | N=100K | N=500K | -|-----------|------:|-------:|-------:| -| Union | **10.9x** | **20.5x** | **42.1x** | -| Intersection | **7.2x** | **13.0x** | **28.1x** | -| Difference | **7.2x** | **12.7x** | **32.0x** | - -Interpretation: -- against `sorted-set`, the gap is mainly algorithmic: generic `clojure.set` - paths over built-in sorted collections do not exploit a native split/join - algebra -- against `data.avl`, both libraries benefit from ordered trees, but this - library's split/join constant factors are lower and the set operations also - parallelize - -### vs clojure.set on hash-set (exploratory, unfair baseline) - -This is not an ordered-collection comparison, so it should be read as an -exploratory stress test rather than as the main benchmark story. Even so, the -current split/join implementation still wins decisively. 
- -| Operation | N=10K | N=100K | N=500K | -|-----------|------:|-------:|-------:| -| Union | **4.2x** | **7.2x** | **16.3x** | -| Intersection | **3.8x** | **6.1x** | **12.9x** | -| Difference | **4.4x** | **7.6x** | **18.6x** | - -### Raw times (ms) - -| Operation | N | sorted-set | data.avl | ordered-set | -|-----------|---|----------:|----------:|----------:| -| Union | 10K | 3.63 | 2.56 | **0.24** | -| | 100K | 42.71 | 33.27 | **1.62** | -| | 500K | 246.60 | 183.40 | **4.36** | -| Intersection | 10K | 2.32 | 1.85 | **0.26** | -| | 100K | 30.36 | 23.25 | **1.79** | -| | 500K | 170.02 | 131.96 | **4.70** | -| Difference | 10K | 2.02 | 1.51 | **0.21** | -| | 100K | 30.70 | 17.71 | **1.39** | -| | 500K | 156.97 | 100.24 | **3.13** | - -### Raw times vs clojure.set on hash-set (ms) - -| Operation | N | clojure.set/hash-set | ordered-set | -|-----------|---|---------------------:|------------:| -| Union | 10K | 1.00 | **0.24** | -| | 100K | 11.65 | **1.62** | -| | 500K | 70.93 | **4.36** | -| Intersection | 10K | 0.98 | **0.26** | -| | 100K | 10.85 | **1.79** | -| | 500K | 60.70 | **4.70** | -| Difference | 10K | 0.91 | **0.21** | -| | 100K | 10.51 | **1.39** | -| | 500K | 58.12 | **3.13** | - -## Fold (r/fold) - -Parallel fold via `r/fold`. The tree is split into equal subtrees and folded in parallel. sorted-set and data.avl fall back to sequential reduce. - -Implementation note: `CollFold` is not just delegated blindly to `r/fold`. -`node-fold` splits the tree eagerly in the caller thread and then folds chunk -indices in parallel. That -keeps split overhead under control and avoids depending on dynamic bindings -inside worker tasks. 
- -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| sorted-set | 0.33ms | 3.02ms | 16.37ms | -| data.avl | 0.76ms | 4.34ms | 35.04ms | -| **ordered-set** | **0.13ms** | **0.73ms** | **3.97ms** | -| vs sorted-set | **2.5x** | **4.1x** | **4.1x** | -| vs data.avl | **5.8x** | **5.9x** | **8.8x** | - -## Construction - -Batch from collection (parallel fold + union). - -This is why constructor-based bulk loading is the right path to benchmark and -the right path to use. Sequential `conj` is a different workload and is covered -separately below. - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| sorted-set | 4.65ms | 75.02ms | 542.18ms | -| data.avl | 2.33ms | 35.16ms | 291.11ms | -| **ordered-set** | **1.55ms** | **26.34ms** | **172.21ms** | -| vs sorted-set | **3.0x** | **2.8x** | **3.1x** | -| vs data.avl | **1.5x** | **1.3x** | **1.7x** | - -## Split - -100 splits at random keys. Weight-balanced trees have lower constant factors — no height recomputation. - -This is one of the cleanest demonstrations of the representation choice. Weight -composes trivially after join; AVL trees must recompute heights bottom-up. - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| data.avl | 0.45ms | 0.68ms | 0.96ms | -| **ordered-set** | **0.07ms** | **0.09ms** | **0.12ms** | -| vs data.avl | **6.8x** | **7.2x** | **7.8x** | - -## Lookup - -10K random lookups. These are all in the same practical performance tier; small -differences are not especially meaningful compared with the much larger wins in -set algebra, split/join-derived operations, construction, and fold. - -Treat these as near-parity numbers, not as a headline differentiator. 
- -### Set (ms) - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| sorted-set | 1.47 | 2.14 | 2.89 | -| data.avl | 1.51 | 2.13 | 2.85 | -| ordered-set | 1.42 | 2.58 | 2.94 | - -### Map (ms) - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| sorted-map | 1.55 | 2.21 | 3.12 | -| data.avl | 1.45 | 2.11 | 2.85 | -| ordered-map | 1.21 | 1.87 | 2.71 | - -## Set Equality - -Randomized integer sets, measured in isolation with `lein bench-simple`. - -This is not one of the library's headline capabilities, but it is a useful -sanity check for ordered-set's direct tree comparison path. The most meaningful -cases are: - -- equal sets of the same size -- same-size sets differing in one element - -The cardinality-mismatch case is effectively a count check and is not very -interesting. - -### Equal sets - -| | N=1K | N=10K | N=100K | -|--|-----:|------:|-------:| -| hash-set | 0.03ms | 0.28ms | 3.66ms | -| sorted-set | 0.10ms | 1.19ms | 14.80ms | -| data.avl | 0.13ms | 1.40ms | 15.10ms | -| ordered-set | 0.01ms | 0.10ms | 1.59ms | -| vs hash-set | **3.4x** | **2.8x** | **2.3x** | -| vs sorted-set | **11.3x** | **12.0x** | **9.3x** | -| vs data.avl | **14.0x** | **14.1x** | **9.5x** | - -Interpretation: - -- by `1K`, ordered-set is already competitive to clearly faster on equal-set comparison -- by `10K`, ordered-set's direct ordered comparison path is clearly better -- by `100K`, ordered-set is still several times faster than hash-set and about - an order of magnitude faster than sorted-set and data.avl - -### Same size, one different element - -| | N=1K | N=10K | N=100K | -|--|-----:|------:|-------:| -| hash-set | 0.01ms | 0.10ms | 0.33ms | -| sorted-set | 0.10ms | 1.20ms | 14.93ms | -| data.avl | 0.11ms | 1.36ms | 15.07ms | -| ordered-set | 0.01ms | 0.10ms | 1.09ms | - -Interpretation: - -- hash-set still wins the unequal cases at larger sizes -- ordered-set still substantially beats sorted-set and data.avl -- ordered-set is still in the same 
practical tier at `10K`, then well ahead of - the ordered competitors by `100K` - -This is a good example of where direct ordered traversal helps beyond set -algebra itself: once sets are large enough, the library can compare two -compatible ordered sets very efficiently without falling back to generic -membership-oriented equality work. - -## Last Element - -1000 calls. O(log n) via `java.util.SortedSet.last()` vs O(n) seq traversal. - -This is endpoint access, not a claim about `clojure.core/last`. - -| | N=10K | N=100K | -|--|------:|-------:| -| sorted-set | 424ms | 5,085ms | -| data.avl | 538ms | 5,376ms | -| **ordered-set** | **0.12ms** | **0.13ms** | - -~40,000x faster at N=100K. Gap grows linearly with N. - -## Iteration (reduce) - -Both ordered-collections and data.avl implement direct tree reduction paths, -which is why both are much faster than seq-driven reduction over `sorted-set`. - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| sorted-set | 0.31ms | 3.43ms | 17.81ms | -| data.avl | 0.06ms | 0.95ms | 5.31ms | -| ordered-set | 0.07ms | 0.98ms | 4.84ms | -4–5x faster than sorted-set at larger sizes. On par with data.avl. - -## Insert (sequential conj, not batch) - -This section is deliberately separate from construction. It measures repeated -single-element mutation, not bulk loading. - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| sorted-set | 6.37ms | 71.62ms | 1,220ms | -| data.avl | 5.25ms | 49.26ms | 983ms | -| ordered-set | 4.61ms | 44.81ms | 906ms | - -## Delete - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| sorted-set | 3.30ms | 36.43ms | 621.05ms | -| data.avl | 1.91ms | 26.13ms | 482.01ms | -| ordered-set | 1.82ms | 22.90ms | 461.38ms | - -## Positional Access - -Both O(log n). sorted-set has no nth/rank. - -These are enabled by subtree sizes. For this library, size is part of the core -tree invariant rather than an add-on feature. 
- -### nth (10K accesses by index) - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| data.avl | 0.85ms | 1.73ms | 2.83ms | -| ordered-set | 1.40ms | 2.00ms | 2.80ms | - -data.avl is usually faster, but the gap narrows substantially at 500K. - -### rank-of (10K lookups) - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| data.avl | 2.39ms | 3.49ms | 6.03ms | -| ordered-set | 1.15ms | 2.62ms | 4.99ms | - -ordered-set is ~1.2-2.1x faster in these measurements. - -## Interval Collections - -### Construction (ms) - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| interval-set | 9.3 | 191.9 | 1,367.3 | -| interval-map | 10.4 | 233.0 | 1,667.2 | - -### Lookup (1K overlap queries, ms) - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| interval-map | 61.4 | 80.3 | 94.9 | - -Sub-linear growth — O(log n + k) means query time depends more on result count than collection size. - -## Map Construction - -| | N=10K | N=100K | N=500K | -|--|------:|-------:|-------:| -| sorted-map | 3.14ms | 55.25ms | 436.80ms | -| data.avl | 2.37ms | 37.41ms | 335.96ms | -| ordered-map | **1.45ms** | **28.81ms** | **191.14ms** | - -2.3x faster than sorted-map and 1.8x faster than data.avl at N=500K in these measurements. - -## Rope vs PersistentVector - -The rope is a persistent sequence optimized for structural editing. Where -PersistentVector is O(n) for mid-sequence splice/insert/remove, the rope is -O(log n). The advantage is unbounded and grows linearly with collection size. 
- -### Structural Editing - -| Workload | N=10K | N=100K | N=500K | -|---|---:|---:|---:| -| 200 random edits — rope | 2.6ms | 2.2ms | 2.9ms | -| 200 random edits — vector | 97ms | 1.04s | 5.4s | -| **Speedup** | **38x** | **473x** | **1862x** | - -| Workload | N=10K | N=100K | N=500K | -|---|---:|---:|---:| -| Single splice — rope | 128µs | 54µs | 49µs | -| Single splice — vector | 842µs | 6.0ms | 27ms | -| **Speedup** | **7x** | **111x** | **551x** | - -At 500K elements, 200 random splice operations take ~3ms on the rope vs ~5.4 -seconds on the vector. The rope's time is nearly constant across sizes because -each operation is O(log n). - -### Concatenation - -| Workload | N=10K | N=100K | N=500K | -|---|---:|---:|---:| -| Concat many pieces — rope | 36µs | 123µs | 364µs | -| Concat many pieces — vector | 102µs | 800µs | 3.4ms | -| **Speedup** | **3x** | **7x** | **9x** | - -Bulk concatenation collects chunks in O(total chunks) and builds the tree -directly, avoiding pairwise tree operations. - -### Reduce - -| Workload | N=10K | N=100K | N=500K | -|---|---:|---:|---:| -| Reduce sum — rope | 145µs | 601µs | 2.8ms | -| Reduce sum — vector | 93µs | 617µs | 3.6ms | -| **Ratio** | 0.6x | ~1x | **1.3x** | - -The rope beats vectors on reduce at N >= 100K because 256-element chunks give -better cache locality per reduction step than PersistentVector's 32-wide trie -nodes. The rope uses a direct in-order tree walk (no enumerator frames) and -delegates to the vector's native `.reduce` for chunk-internal iteration. - -### Parallel Fold - -| Workload | N=10K | N=100K | N=500K | -|---|---:|---:|---:| -| Fold sum — rope | 0.22ms | 0.31ms | 0.85ms | -| Fold sum — vector | 1.23ms | 0.47ms | 1.13ms | -| **Speedup** | **5.6x** | **1.5x** | **1.3x** | - -The rope's `r/fold` uses tree-based fork-join decomposition — split at the -midpoint, fork left, compute right inline, join. This maps directly onto the -`ForkJoinPool` work-stealing model without a separate chunking pass. 
At small N -the speedup is largest because the rope's tree structure provides immediate -parallelism while the vector's fold has higher setup overhead. +$ lein bench-report # Analyze latest results (auto-selects baseline) +$ lein bench-report --all # Show all rows instead of top 30 +$ lein bench-report --publish # Omit scorecard, regressions, improvements +``` -### Rope vs String (text workload) +Results are written to `bench-results/.edn`. The report tool +reads the EDN and produces formatted output. To commit a snapshot: -For text-editing workloads, the rope also beats `java.lang.String` at scale: +``` +$ lein bench-report --publish > doc/report.txt +``` -| Workload (N=100K chars) | Rope | String | Speedup | -|---|---:|---:|---:| -| Splice 32 chars at midpoint | 3.6µs | 13.4µs | **3.8x** | -| Split at midpoint | 425ns | 2.6µs | **6.1x** | +`--publish` suppresses the Full Scorecard, Regressions, and Improvements +sections — those are useful for interactive A/B review but are noise for +outside readers of the committed snapshot. The full report remains the +default for `lein bench-report` so local inspection still sees everything. -String splice and split copy the entire string (O(n)); the rope does O(log n) -tree work. +## Methodology + +- **Criterium** for statistical benchmarking: JIT warmup, outlier + detection, confidence intervals. Quick-benchmark mode (6 samples) for + the full suite; full benchmark (60 samples) available via `--readme`. +- **Relative ratios** are more meaningful than absolute times. The report + presents speedup factors (>1x = we win) throughout. +- **Geometric mean** per category: the right average for ratios because + a 2x win and a 0.5x loss cancel to 1.0x, not 1.25x. +- **Auto-compare**: both `lein bench` and `lein bench-report` + automatically compare against the prior run and flag regressions + (>10% slower) and improvements (>10% faster). 
+- **Artifact versioning**: schema changes increment the artifact + version so tools fail cleanly on incompatible data rather than + silently producing wrong results. -### Where the Rope Loses -| Workload | Ratio | Why | -|---|---|---| -| Split / slice | ~20x slower | O(log n) vs O(1) `subvec` wrapper | -| Random nth | 0.4-0.7x | O(log n) vs O(1) trie lookup | -| Build via transient | 2-3x slower | Periodic O(log n) tree flush vs O(1) array append | +## How to Read the Results -These losses are inherent to the tree-backed design, not to the -implementation. The absolute times are microseconds. +### Set Algebra -## Specialized Node Types +Two sets of size N with 50% overlap. Adams' divide-and-conquer with +fork-join parallelism. This is the library's dominant advantage — the +split/join structure gives work-optimal set algebra that parallelizes +naturally. + +Fork-join thresholds (tuned empirically): +- union: root 131,072 / recursive 65,536 / sequential cutoff 64 +- intersection: root 65,536 / recursive 65,536 / sequential cutoff 64 +- difference: root 131,072 / recursive 65,536 / sequential cutoff 64 + +Against `sorted-set`, the gap is mainly algorithmic: `clojure.set` paths +over built-in sorted collections do not exploit a native split/join +algebra. Against `data.avl`, both libraries benefit from ordered trees, +but our split/join constant factors are lower and the operations also +parallelize. + +Against `clojure.set` on `hash-set`, this is not an ordered-collection +comparison; it is an exploratory stress test. Even so, the current +implementation wins decisively because hash-set operations are sequential +O(n) membership scans while ours divide their work across p cores. + +### Construction + +Batch from collection via parallel fold + union. This is the right path +to benchmark and to use. Sequential `conj` is a different workload +(covered as "insert" in the bench runner).
+ +### Split + +Weight-balanced trees have lower constant factors than AVL for split — +no height recomputation needed. Weight composes trivially after join; +AVL trees must recompute heights bottom-up. + +### Fold (r/fold) + +`CollFold` is not just delegated blindly to `r/fold`. `node-fold` splits +the tree eagerly in the caller thread and folds subtrees in parallel. +This keeps split overhead under control and avoids depending on dynamic +bindings inside ForkJoinPool worker tasks. + +### Lookup + +All ordered-tree libraries are in the same practical tier for point +lookup: O(log n) with similar constants. Small differences are not +meaningful compared with the much larger wins in set algebra and +split/join-derived operations. + +### Iteration / Reduce + +Both ordered-collections and data.avl implement direct tree reduction +paths. The monomorphic reduce paths (added in 0.2.1) bypass the +PRopeChunk protocol for rope variants, yielding ~1.7-1.8x improvement +on byte-rope and string-rope reduce. + +### Rope Family + +The rope family (generic `rope`, `string-rope`, `byte-rope`) is +optimized for structural editing — concat, split, splice, insert, and +remove are O(log n) vs O(n) for the native baselines (PersistentVector, +String, byte[]). The advantage is unbounded and grows linearly with +collection size. + +**Where the ropes lose:** + +| Area | Ratio vs baseline | Why | +|------|-------------------|-----| +| Random nth | 0.01-0.2x | O(log n) tree descent vs O(1) array/trie lookup | +| Reduce | 0.1-0.5x (string/byte) | Per-chunk overhead vs bare array loop | +| Split | 0.1-0.3x at small N | O(log n) vs O(1) `subvec` / memcpy; crosses over at ~50K | +| Construction | 0.2-0.5x at large N | Tree building vs array allocation | +| Regex (re-seq) | 0.2x | CharSequence dispatch overhead vs String fast path in Pattern | + +These losses are architectural — inherent to the tree-backed design, not +the implementation. 
At scale, the structural editing wins (100-1300x) +dominate any mixed workload. + +**Monomorphic hot paths** (0.2.1): each variant inlines the tree walk +for `nth` and `reduce`, using direct chunk-type calls (`alength`/`aget`, +`.length`/`.charAt`, `.count`/`.nth`) instead of dispatching through the +`PRopeChunk` protocol at every tree level. This cuts `nth` cost by +~2-2.5x and `reduce` by ~1.7-3.3x vs the generic kernel path. + +### Specialized Node Types + +Primitive-specialized node types (`LongKeyNode`, `DoubleKeyNode`) +reduce boxing and comparison overhead for homogeneous-key cases. +`string-ordered-set` uses `String.compareTo` directly. These are +constant-factor optimizations on top of the same tree algebra. -Primitive-specialized node types reduce boxing and comparison overhead for -common homogeneous-key cases: +### Specialized Collections -```clojure -(long-ordered-set data) ;; unboxed long keys -(double-ordered-set data) ;; unboxed double keys -(string-ordered-set data) ;; String.compareTo -``` +Range maps, segment trees, priority queues, multisets, and fuzzy +collections are each benchmarked against their natural competitor: -These are constant-factor optimizations on top of the same tree algebra, not a -separate implementation strategy. +- **Segment tree** vs sorted-map subseq for range queries — the standout, + with O(log n) vs O(k) giving 2800x at 500K. +- **Priority queue** vs sorted-set-by of tuples — 2-4x on push/pop due + to avoiding tuple allocation. +- **Range map** vs Guava TreeRangeMap — comparable on construction and + lookup, 2-2.5x on carve-out insert and iteration due to persistent + structure sharing. +- **Fuzzy set/map** vs sorted-set with manual floor/ceiling — comparable. + The value is API ergonomics, not raw speed. 
diff --git a/doc/charts/byte-rope-crossover.png b/doc/charts/byte-rope-crossover.png new file mode 100644 index 0000000..d429442 Binary files /dev/null and b/doc/charts/byte-rope-crossover.png differ diff --git a/doc/charts/collection-winners.png b/doc/charts/collection-winners.png new file mode 100644 index 0000000..6f18953 Binary files /dev/null and b/doc/charts/collection-winners.png differ diff --git a/doc/charts/rope-editing-scaling.png b/doc/charts/rope-editing-scaling.png new file mode 100644 index 0000000..1e786bf Binary files /dev/null and b/doc/charts/rope-editing-scaling.png differ diff --git a/doc/charts/rope-operations-profile.png b/doc/charts/rope-operations-profile.png new file mode 100644 index 0000000..19c79c8 Binary files /dev/null and b/doc/charts/rope-operations-profile.png differ diff --git a/doc/charts/rope-vs-vector-absolute.png b/doc/charts/rope-vs-vector-absolute.png new file mode 100644 index 0000000..431b7f6 Binary files /dev/null and b/doc/charts/rope-vs-vector-absolute.png differ diff --git a/doc/charts/set-algebra-scaling.png b/doc/charts/set-algebra-scaling.png new file mode 100644 index 0000000..20c6e20 Binary files /dev/null and b/doc/charts/set-algebra-scaling.png differ diff --git a/doc/charts/string-rope-crossover.png b/doc/charts/string-rope-crossover.png new file mode 100644 index 0000000..aceb8bf Binary files /dev/null and b/doc/charts/string-rope-crossover.png differ diff --git a/doc/collections-api.md b/doc/collections-api.md index ef1fefc..18be906 100644 --- a/doc/collections-api.md +++ b/doc/collections-api.md @@ -556,6 +556,11 @@ split, splice, insert, and remove. Backed by a weight-balanced tree of chunk vectors. Implements `IPersistentVector` (`(vector? rope)` is true), `java.util.List`, `java.util.RandomAccess`, `Comparable`, and `r/fold`. +Small ropes (≤ 1024 elements) are stored as a raw `PersistentVector` +internally, skipping the tree wrapper entirely. 
Reads dispatch straight +to the vector with zero indirection; edits that grow past the threshold +transparently promote to chunked tree form. + ### Constructors - `rope` @@ -583,13 +588,13 @@ vectors. Implements `IPersistentVector` (`(vector? rope)` is true), |---|---|---| | `conj` | `[rope x]` | Append to end. | | `assoc` | `[rope i x]` | Replace element at index (or append if `i = count`). | -| `nth` | `[rope i]` `[rope i not-found]` | Positional access. O(log n). | +| `nth` | `[rope i]` `[rope i not-found]` | Positional access. O(log n) on tree mode; direct `PersistentVector.nth` on flat mode. | | `get` | `[rope i]` `[rope i not-found]` | Same as `nth`. | | `peek` | `[rope]` | Last element. | | `pop` | `[rope]` | Remove last element. | | `seq` / `rseq` | `[rope]` | Forward / reverse traversal. | | `reduce` | `[f rope]` `[f init rope]` | Chunk-aware reduction with `reduced` support. | -| `r/fold` | `[n combinef reducef rope]` | Parallel fork-join fold. | +| `r/fold` | `[n combinef reducef rope]` | Parallel fork-join fold. (Sequential reduce on flat-mode ropes since they are already small.) | | `compare` | `[rope1 rope2]` | Lexicographic comparison. | | `count` | `[rope]` | O(1). | @@ -601,3 +606,222 @@ supporting all rope operations including `assoc`, `conj`, `peek`, and `pop`. Since `Rope` implements `IPersistentVector`, `(vec rope)` returns the rope itself. Use `(into [] rope)` to materialize a `PersistentVector`. + +--- + +## String Rope + +Persistent chunked text sequence optimized for structural text editing: +O(log n) concat, split, splice, insert, and remove. Backed by a +weight-balanced tree of `java.lang.String` chunks. Implements +`java.lang.CharSequence` for seamless Java text interop, so it drops +into `java.util.regex`, `clojure.string`, and any API expecting text. + +Small strings (≤ 1024 characters) are stored as a raw `String` +internally with zero tree overhead. Edits that grow past the threshold +transparently promote to the chunked form. 
+ +### Constructors + +- `string-rope` + - `(string-rope)` + - `(string-rope s)` — accepts a `String`, another `StringRope`, or + anything `str` can coerce. +- `string-rope-concat` + - `(string-rope-concat x)` — coerce one argument + - `(string-rope-concat a b)` — O(log n) binary join + - `(string-rope-concat a b & more)` — O(total chunks) bulk + +### Collection-specific operations + +All operations from the shared `PRope` protocol work on StringRope via +the same public functions documented for Rope: + +| Function | Signature(s) | Notes | +|---|---|---| +| `rope-concat` | `[x]` `[a b]` `[a b & more]` | Prefer `string-rope-concat` for type-preserving concat. | +| `rope-split` | `[sr i]` | Split at index, returns `[left right]`. O(log n). | +| `rope-sub` | `[sr start end]` | Structure-sharing subrange. O(log n). | +| `rope-insert` | `[sr i coll]` | Insert text at index. `coll` may be a `String`, another `StringRope`, or anything `(str coll)` can coerce. | +| `rope-remove` | `[sr start end]` | Remove range `[start, end)`. O(log n). | +| `rope-splice` | `[sr start end coll]` | Replace range with new text. O(log n). | +| `rope-chunks` | `[sr]` | Seq of internal `String` chunks. | +| `rope-str` | `[sr]` | Materialize to `java.lang.String` (same as `(str sr)`). | + +### Standard collection operations + +| Operation | Signature(s) | Notes | +|---|---|---| +| `count` | `[sr]` | O(1). | +| `nth` | `[sr i]` `[sr i not-found]` | Returns a `Character`. O(log n) on tree mode, O(1) on flat mode. | +| `get` / IFn | `[sr i]` `[sr i not-found]` | Same as `nth`. | +| `conj` | `[sr c]` | Append a single character. | +| `assoc` | `[sr i c]` | Replace character at index (or append if `i = count`). | +| `peek` | `[sr]` | Last character. | +| `pop` | `[sr]` | Remove last character. | +| `seq` / `rseq` | `[sr]` | Forward / reverse `Character` seq. | +| `reduce` | `[f sr]` `[f init sr]` | Chunk-aware reduction with `reduced` support. 
| +| `r/fold` | `[n combinef reducef sr]` | Parallel fork-join fold. | +| `str` | `[sr]` | Materialize content to a `java.lang.String`. | +| `compare` | `[a b]` | Lexicographic, matches `String.compareTo`. | + +### Java interop + +Because `StringRope` implements `java.lang.CharSequence`, it works +directly with: + +- `java.util.regex.Pattern` / `Matcher` — `re-find`, `re-seq`, + `re-matches`, `re-matcher` +- All `clojure.string` functions (they accept `CharSequence`) +- `java.io.Writer.append(CharSequence)` and friends + +```clojure +(def doc (oc/string-rope "the quick brown fox")) + +(re-find #"\w+" doc) ;=> "the" +(clojure.string/upper-case (str doc)) ;=> "THE QUICK BROWN FOX" +(count doc) ;=> 19 +(.charAt ^CharSequence doc 4) ;=> \q +``` + +### Equality and hashing + +- `(= (string-rope "x") "x")` is true — StringRope is equal to any + `CharSequence` with the same content. +- `(hash (string-rope "x"))` matches `(hash "x")`, so StringRope and + String can be used interchangeably as hash-map keys. +- `(= (string-rope "x") (oc/rope [\x]))` is false — the generic rope + and the string rope have different identity. + +### Printed form + +```clojure +#string/rope "hello world" +``` + +Round-trips through EDN via the `#string/rope` tagged literal. + +### Transient + +`(transient string-rope)` returns a `TransientStringRope` backed by a +`StringBuilder` tail buffer. Call `conj!` with characters (or +single-character strings) and `persistent!` to finalize. Useful for +batch construction of large strings. + +--- + +## Byte Rope + +Persistent chunked binary sequence: O(log n) concat, split, splice, +insert, and remove. Backed by a weight-balanced tree of `byte[]` +chunks. Bytes are exposed as unsigned longs in `[0, 255]` throughout +the API — storage is signed Java bytes (same bits), avoiding the usual +signed-byte pitfalls. + +Small byte sequences (≤ 1024 bytes) are stored as a raw `byte[]` +internally. 
Edits that grow past the threshold transparently promote +to chunked form. + +ByteRope is the immutable persistent counterpart to `java.nio.ByteBuffer` +/ protobuf `ByteString` / Okio `ByteString` — same conventions +(unsigned bytes, big-endian default, lexicographic compare via +`Arrays/compareUnsigned`), different semantics (persistent snapshots, +structural sharing, O(log n) edits). + +### Constructors + +- `byte-rope` + - `(byte-rope)` + - `(byte-rope x)` — accepts any of: + - `byte[]` (defensively copied) + - another `ByteRope` + - `String` (UTF-8 encoded) + - `java.io.InputStream` (fully consumed) + - sequential of unsigned integers in `[0, 255]` +- `byte-rope-concat` + - `(byte-rope-concat x)` — coerce one argument + - `(byte-rope-concat a b)` — O(log n) binary join + - `(byte-rope-concat a b & more)` — O(total chunks) bulk + +### Collection-specific operations + +From the shared `PRope` protocol: + +| Function | Signature(s) | Notes | +|---|---|---| +| `rope-split` | `[br i]` | Split at index. O(log n). | +| `rope-sub` | `[br start end]` | Structure-sharing subrange. O(log n). | +| `rope-insert` | `[br i coll]` | Insert bytes at index. `coll` may be a `byte[]`, another `ByteRope`, or a sequential of unsigned integers. | +| `rope-remove` | `[br start end]` | Remove range `[start, end)`. O(log n). | +| `rope-splice` | `[br start end coll]` | Replace range with new bytes. O(log n). | +| `rope-chunks` | `[br]` | Seq of internal `byte[]` chunks. | +| `rope-str` | `[br]` | Materialize to a defensively-copied `byte[]`. | + +### Byte-specific operations + +| Function | Signature(s) | Notes | +|---|---|---| +| `byte-rope-bytes` | `[br]` | Defensive-copy `byte[]` materialization. Same as `rope-str` but with a more precise name. | +| `byte-rope-hex` | `[br]` | Return a lowercase hex string. | +| `byte-rope-write` | `[br out]` | Stream chunks to a `java.io.OutputStream`. | +| `byte-rope-input-stream` | `[br]` | Return a fresh `java.io.InputStream` over the contents. 
| +| `byte-rope-get-byte` | `[br offset]` | Unsigned byte value (long in `[0, 255]`). | +| `byte-rope-get-short` | `[br offset]` | Big-endian unsigned 16-bit integer. | +| `byte-rope-get-short-le` | `[br offset]` | Little-endian unsigned 16-bit integer. | +| `byte-rope-get-int` | `[br offset]` | Big-endian signed 32-bit integer. | +| `byte-rope-get-int-le` | `[br offset]` | Little-endian signed 32-bit integer. | +| `byte-rope-get-long` | `[br offset]` | Big-endian signed 64-bit integer. | +| `byte-rope-get-long-le` | `[br offset]` | Little-endian signed 64-bit integer. | +| `byte-rope-index-of` | `[br b]` `[br b from]` | First index of the unsigned byte value, or -1. | +| `byte-rope-digest` | `[br algorithm]` | Compute a cryptographic digest (`"SHA-256"`, `"MD5"`, etc.) by streaming chunks through `java.security.MessageDigest`. Returns a ByteRope of the digest. | + +### Standard collection operations + +| Operation | Signature(s) | Notes | +|---|---|---| +| `count` | `[br]` | O(1). | +| `nth` | `[br i]` `[br i not-found]` | Returns an unsigned long in `[0, 255]`. O(log n) on tree mode, O(1) on flat mode. | +| `get` / IFn | `[br i]` `[br i not-found]` | Same as `nth`. | +| `conj` | `[br b]` | Append a single byte (accepts an integer in `[0, 255]`). | +| `assoc` | `[br i b]` | Replace byte at index (or append if `i = count`). | +| `peek` | `[br]` | Last byte as an unsigned long. | +| `pop` | `[br]` | Remove last byte. | +| `seq` / `rseq` | `[br]` | Forward / reverse seq of unsigned longs. | +| `reduce` | `[f br]` `[f init br]` | Chunk-aware reduction with `reduced` support. | +| `r/fold` | `[n combinef reducef br]` | Parallel fork-join fold. | +| `compare` | `[a b]` | Unsigned lexicographic via `Arrays/compareUnsigned`. | + +### Equality and hashing + +- `(= (byte-rope (byte-array [1 2 3])) (byte-array [1 2 3]))` is true — + ByteRope is equal to a `byte[]` with the same content. 
+- `(= (byte-rope [1 2 3]) [1 2 3])` is **false** — intentionally not equal + to a Clojure vector, to avoid signed vs unsigned confusion. +- `(hash (byte-rope [1 2 3]))` is a content-based Murmur3 hash over the + unsigned byte values. (Clojure's default `hash` on a raw `byte[]` is + identity-based, not content-based, so ByteRope hash and byte[] hash + are not comparable — use ByteRope instances as hash-map keys.) + +### Printed form + +```clojure +#byte/rope "48656c6c6f" +``` + +Round-trips through EDN via the `#byte/rope` tagged literal. The literal +content is a lowercase hex string. + +### Transient + +`(transient byte-rope)` returns a `TransientByteRope` backed by a +`ByteArrayOutputStream` tail buffer. Call `conj!` with unsigned integer +values and `persistent!` to finalize. + +### Does NOT implement + +- `java.lang.CharSequence` — ByteRope is not text. Convert explicitly + via `(String. (byte-rope-bytes br) "UTF-8")` or similar. +- `IPersistentVector` — ByteRope is a specialized byte sequence, not + a general vector. +- `java.util.List` — too many mutable method stubs to implement + meaningfully for an unsigned byte domain. diff --git a/doc/cookbook.md b/doc/cookbook.md index f317337..9c268e3 100644 --- a/doc/cookbook.md +++ b/doc/cookbook.md @@ -7,11 +7,299 @@ Practical examples showing where ordered-collections shines. ```clojure (require '[ordered-collections.core :as oc]) (require '[clojure.core.reducers :as r]) +(require '[clojure.string :as str]) ``` --- -## 1. Leaderboard with Rank Queries +## Ropes + +The library provides three rope variants that share one tree kernel: a +generic `rope` for arbitrary Clojure values, a `string-rope` specialized +for text, and a `byte-rope` specialized for binary data. All three support +O(log n) concat, split, splice, insert, and remove; the specialized +variants add type-appropriate Java interop (CharSequence, byte[]) and +faster materialization. + +--- + +## 1. 
Text Editor Buffer (StringRope) + +**Problem:** An editor needs to insert, delete, and replace characters +anywhere in a document with low latency, regardless of document size. +Plain strings are O(n) per edit because every character after the cut +must be shifted. A StringRope makes every edit O(log n) and keeps old +versions available for free. + +```clojure +(def doc (oc/string-rope "The quick brown fox jumps over the lazy dog.")) + +(count doc) ;; => 44 +(nth doc 10) ;; => \b +(str doc) ;; => "The quick brown fox jumps over the lazy dog." + +;; Insert at cursor — O(log n) +(def v1 (oc/rope-insert doc 10 "dark ")) +(str v1) +;; => "The quick dark brown fox jumps over the lazy dog." + +;; Delete a range — O(log n) +(def v2 (oc/rope-remove v1 10 15)) +(str v2) +;; => "The quick brown fox jumps over the lazy dog." + +;; Find-and-replace is just splice — O(log n) +(def v3 (oc/rope-splice doc 16 19 "cat")) +(str v3) +;; => "The quick brown cat jumps over the lazy dog." + +;; Extract the visible window — shares structure, no copying +(str (oc/rope-sub v3 4 19)) +;; => "quick brown cat" + +;; Undo history is free — every version is a persistent snapshot +;; sharing structure with its parent. +(def history [doc v1 v2 v3]) +(mapv count history) ;; => [44 49 44 44] +``` + +**Why StringRope?** String edits in the middle are O(n) because every +character after the cut must be shifted. A StringRope does each edit in +O(log n). At 100K characters with 200 random edits, StringRope is **~38x +faster than plain String**; at 500K characters the gap grows to **~130x**. +`StringRope` implements `java.lang.CharSequence`, so it drops +into any Java API expecting text, and `(str sr)` materializes back to a +regular Java `String` whenever you need one. + +--- + +## 2. 
Regex and clojure.string on Large Text (StringRope) + +**Problem:** Run regex matching, `clojure.string` helpers, and ad-hoc +`java.util.regex.Matcher` work on a multi-megabyte log file that you +also want to edit in place. + +```clojure +(def log-text + (oc/string-rope + (str "2026-04-10 09:14 INFO started user=alice\n" + "2026-04-10 09:14 INFO request path=/home user=alice\n" + "2026-04-10 09:15 ERROR auth failed token=xyz123 user=bob\n" + "2026-04-10 09:16 INFO request path=/login user=bob\n" + "2026-04-10 09:17 ERROR db timeout password=s3cret user=bob\n"))) + +;; StringRope implements CharSequence, so all of java.util.regex works directly. +(re-seq #"ERROR.*" log-text) +;; => ("ERROR auth failed token=xyz123 user=bob" +;; "ERROR db timeout password=s3cret user=bob") + +;; clojure.string functions accept CharSequence +(count (str/split-lines log-text)) +;; => 5 + +;; Redact sensitive fields. str/replace scans the whole CharSequence and +;; returns a plain String, so each call is O(n), the same as on a String. +(def sanitized + (-> log-text + (str/replace #"password=\S+" "password=") + (str/replace #"token=\S+" "token="))) + +(str/includes? (str sanitized) "") ;; => true + +;; java.util.regex.Matcher works on the rope directly +(let [m (re-matcher #"user=(\w+)" log-text)] + (loop [users #{}] + (if (.find m) + (recur (conj users (.group m 1))) + users))) +;; => #{"alice" "bob"} +``` + +**Why StringRope?** The `CharSequence` contract means every Java text +API works without conversion. You can hold a multi-megabyte log as a +rope, run regex and `clojure.string` over it, and splice in edits with +`rope-splice` in O(log n) each. `str/replace` itself still scans the +whole text and returns a plain `String`, so it costs O(n) either way. + +--- + +## 3. Assembling Large Sequences from Many Parts (Rope) + +**Problem:** Collect data from many sources and merge the pieces into +one indexable sequence without paying O(n²) for repeated `into`.
+ +```clojure +;; Imagine sensor batches arriving from many collectors +(def batches + (for [i (range 20)] + (vec (range (* i 1000) (* (inc i) 1000))))) + +;; Naive vector concat — each `into` copies everything accumulated. +;; Fine for 4 batches, collapses to O(n²) for many. +(def naive (reduce into [] batches)) + +;; Rope concat — O(k log n). Each piece is joined structurally; +;; nothing is copied. +(def combined (apply oc/rope-concat (map oc/rope batches))) + +(count combined) ;; => 20000 +(nth combined 12345) +;; => 12345 + +;; The combined rope stays fully efficient for downstream work +(reduce + 0 combined) +;; => 199990000 + +;; Parallel fold splits along the natural tree structure +(r/fold + combined) +;; => 199990000 +``` + +**Why Rope?** When you assemble a sequence from many sources, vector +`into` is O(*total accumulated*) at each step and degrades to O(n²). +Rope concat is O(log n) per join, and the combined rope remains +efficient for random access, reduce, parallel fold, and further +splicing. + +--- + +## 4. Binary Protocol Assembly (ByteRope) + +**Problem:** Build a framed network message with a length header plus +payload, then insert a checksum, then slice out just the payload. With +`byte[]` every edit forces an `arraycopy` of the entire buffer; with +ByteRope each edit is O(log n). + +```clojure +;; Framed message format: [u32 big-endian length] [payload bytes] +(defn pack-message [^bytes payload] + (let [len (alength payload) + header (byte-array 4)] + (aset header 0 (unchecked-byte (bit-shift-right len 24))) + (aset header 1 (unchecked-byte (bit-shift-right len 16))) + (aset header 2 (unchecked-byte (bit-shift-right len 8))) + (aset header 3 (unchecked-byte len)) + (oc/byte-rope-concat (oc/byte-rope header) (oc/byte-rope payload)))) + +(def msg (pack-message (.getBytes "Hello, World!" 
"UTF-8"))) + +(count msg) ;; => 17 +(oc/byte-rope-get-int msg 0) ;; => 13 (4-byte BE length) +(oc/byte-rope-get-byte msg 4) ;; => 72 (unsigned 'H') +(oc/byte-rope-hex msg) +;; => "0000000d48656c6c6f2c20576f726c6421" + +;; Splice a checksum between header and payload — O(log n) +(def with-csum + (oc/rope-insert msg 4 (byte-array [(unchecked-byte 0xde) + (unchecked-byte 0xad) + (unchecked-byte 0xbe) + (unchecked-byte 0xef)]))) + +(oc/byte-rope-hex with-csum) +;; => "0000000ddeadbeef48656c6c6f2c20576f726c6421" + +;; Extract just the payload — shares structure with the original +(def payload (oc/rope-sub with-csum 8 (count with-csum))) +(String. (oc/byte-rope-bytes payload) "UTF-8") +;; => "Hello, World!" +``` + +**Why ByteRope?** Bytes are exposed as unsigned longs (0–255), avoiding +signed-byte pitfalls. Big-endian and little-endian multi-byte reads +(`byte-rope-get-short/int/long` with `-le` variants) are built in. You +can splice, insert, or remove byte ranges in O(log n) instead of copying +the whole buffer, which is exactly what protocol assembly and packet +editing want. + +--- + +## 5. Streaming Cryptographic Digest (ByteRope) + +**Problem:** Compute a SHA-256 (or any `MessageDigest` algorithm) over +a large binary value without materializing the whole thing as one +`byte[]`. + +```clojure +;; Build a ~2 MB rope from 1-KB chunks — no intermediate copies +(def large-data + (apply oc/byte-rope-concat + (for [i (range 2048)] + (byte-array 1024 (unchecked-byte i))))) + +(count large-data) ;; => 2097152 + +;; Compute SHA-256 by streaming chunks through MessageDigest. +;; The rope never materializes the whole thing — each chunk is fed +;; directly into the digest in its natural block size. 
+(oc/byte-rope-hex (oc/byte-rope-digest large-data "SHA-256")) +;; => "…64 hex chars…" + +;; Any algorithm the JVM supports +(oc/byte-rope-hex (oc/byte-rope-digest large-data "MD5")) +(oc/byte-rope-hex (oc/byte-rope-digest large-data "SHA-512")) + +;; Digests are themselves byte-ropes, so you can splice them into +;; other messages without conversion +(def stamped + (oc/byte-rope-concat + large-data + (oc/byte-rope-digest large-data "SHA-256"))) +``` + +**Why ByteRope?** `byte-rope-digest` iterates the rope chunk-by-chunk +through `java.security.MessageDigest` without building an intermediate +`byte[]`. The same pattern applies to streaming compression, encryption, +and any block-oriented consumer. For multi-gigabyte ropes this is the +difference between working and OOM. + +--- + +## 6. Persistent Undo History (any rope variant) + +**Problem:** Keep an arbitrarily long edit history for a document +without paying the memory cost of a full copy per version. + +```clojure +(def v0 (oc/string-rope "initial document")) +(def v1 (oc/rope-insert v0 0 "The ")) +(def v2 (oc/rope-splice v1 4 11 "my")) +(def v3 (oc/rope-splice v2 0 3 "this is")) +(def v4 (oc/rope-insert v3 (count v3) "!")) + +(mapv str [v0 v1 v2 v3 v4]) +;; => ["initial document" +;; "The initial document" +;; "The my document" +;; "this is my document" +;; "this is my document!"] + +;; All five versions coexist. Each is a persistent snapshot that shares +;; structure with its neighbours — most of the internal tree nodes are +;; reused across versions. + +;; Undo is just picking an older reference +(def current v4) +(def after-undo (nth [v0 v1 v2 v3] 2)) +(str after-undo) +;; => "The my document" + +;; Compare two versions without materializing either +(= v1 v3) ;; => false +(count v1) ;; => 20 +(count v3) ;; => 19 +``` + +**Why Rope?** Persistent ropes make undo trivial — every edit returns +a new value whose internal tree mostly overlaps the previous one.
You +can keep hundreds of historical versions of a megabyte document for +the cost of tens of kilobytes. The same pattern works for StringRope +(text editors), ByteRope (binary patch editors), and generic Rope +(any sequential data with a cursor). + +--- + +## 7. Leaderboard with Rank Queries **Problem:** Maintain a leaderboard where you need to: - Add player scores @@ -70,7 +358,7 @@ Practical examples showing where ordered-collections shines. --- -## 2. Time-Series Windowing +## 8. Time-Series Windowing **Problem:** Store timestamped events and efficiently query time ranges. @@ -118,7 +406,7 @@ Practical examples showing where ordered-collections shines. --- -## 3. Meeting Room Scheduler +## 9. Meeting Room Scheduler **Problem:** Track meeting room bookings and find conflicts or free slots. @@ -161,7 +449,7 @@ Practical examples showing where ordered-collections shines. --- -## 4. Persistent Work Queue +## 10. Persistent Work Queue **Problem:** Schedule work by priority, while keeping stable ordering among equal priorities. @@ -201,7 +489,7 @@ Practical examples showing where ordered-collections shines. --- -## 5. Parallel Aggregation +## 11. Parallel Aggregation **Problem:** Aggregate large datasets efficiently using multiple cores. @@ -240,7 +528,7 @@ Practical examples showing where ordered-collections shines. --- -## 6. Efficient Set Algebra +## 12. Efficient Set Algebra **Problem:** Compute intersections/unions/differences on large sorted sets. @@ -272,7 +560,7 @@ Practical examples showing where ordered-collections shines. --- -## 7. Sliding Window Statistics +## 13. Sliding Window Statistics **Problem:** Maintain statistics over a sliding time window. @@ -314,7 +602,7 @@ Practical examples showing where ordered-collections shines. --- -## 8. Range Aggregate Queries (Segment Tree) +## 14. Range Aggregate Queries (Segment Tree) **Problem:** Answer "what is the sum/max/min of values from key a to key b?" with efficient updates. 
@@ -355,7 +643,7 @@ Practical examples showing where ordered-collections shines. --- -## 9. Database Index Simulation +## 15. Database Index Simulation **Problem:** Build a secondary index supporting range queries. @@ -401,7 +689,7 @@ Practical examples showing where ordered-collections shines. --- -## 10. Ordered Multiset +## 16. Ordered Multiset **Problem:** Track duplicate values while keeping them sorted. @@ -431,7 +719,7 @@ Practical examples showing where ordered-collections shines. --- -## 11. Fuzzy Lookup / Nearest Neighbor +## 17. Fuzzy Lookup / Nearest Neighbor **Problem:** Find the closest matching value when exact match doesn't exist. @@ -470,7 +758,7 @@ Practical examples showing where ordered-collections shines. --- -## 12. Splitting Collections +## 18. Splitting Collections **Problem:** Partition a collection at a key or index for divide-and-conquer algorithms. @@ -510,7 +798,7 @@ Practical examples showing where ordered-collections shines. --- -## 13. Subrange Extraction +## 19. Subrange Extraction **Problem:** Extract a contiguous range of elements by key bounds. @@ -544,7 +832,7 @@ Practical examples showing where ordered-collections shines. --- -## 14. Floor/Ceiling Queries +## 20. Floor/Ceiling Queries **Problem:** Find the nearest element at or above/below a target. @@ -585,6 +873,136 @@ Practical examples showing where ordered-collections shines. --- +## 21. Non-Overlapping Ranges (Range Map) + +**Problem:** Map non-overlapping half-open ranges `[lo, hi)` to values and +query by point. Inserting a new range should automatically carve out +whatever it overlaps with existing ranges. Classic use cases: pricing +tiers, version-gated feature flags, IP subnet ownership, memory-mapped +region tables. 
+ +```clojure +;; Customer tier based on cumulative purchase amount +(def tiers + (-> (oc/range-map) + (assoc [0 100] :bronze) + (assoc [100 500] :silver) + (assoc [500 5000] :gold) + (assoc [5000 25000] :platinum))) + +(tiers 75) ;; => :bronze (point lookup) +(tiers 250) ;; => :silver +(tiers 5000) ;; => :platinum +(tiers 30000) ;; => nil (outside all ranges) + +;; Which range does a point belong to? +(oc/get-entry tiers 250) +;; => [[100 500] :silver] + +;; Insert a flash-sale range — bronze and silver are automatically +;; carved out so the new range sits cleanly in the middle. +(def with-sale (assoc tiers [50 200] :flash)) + +(oc/ranges with-sale) +;; => ([[0 50] :bronze] ← auto-trimmed +;; [[50 200] :flash] ← inserted +;; [[200 500] :silver] ← auto-trimmed +;; [[500 5000] :gold] +;; [[5000 25000] :platinum]) + +(with-sale 75) ;; => :flash (used to be :bronze) +(with-sale 600) ;; => :gold (unchanged) + +;; What ranges are unallocated? +(oc/gaps tiers) +;; => () (contiguous from 0 to 25000) + +(oc/gaps (-> (oc/range-map) + (assoc [0 100] :a) + (assoc [500 1000] :b))) +;; => ([100 500]) + +;; Coalescing: adjacent ranges with the same value merge automatically. +(-> (oc/range-map) + (oc/assoc-coalescing [0 50] :a) + (oc/assoc-coalescing [50 100] :a) ; adjacent & same value → merged + (oc/assoc-coalescing [100 150] :b) + oc/ranges) +;; => ([[0 100] :a] [[100 150] :b]) + +;; Range removal clears a region entirely, leaving a gap. +(oc/ranges (oc/range-map (-> tiers (oc/range-remove [100 500])))) +;; => ([[0 100] :bronze] [[500 5000] :gold] [[5000 25000] :platinum]) +``` + +**Why ordered-collections?** Range-map is a persistent version of +Guava's `TreeRangeMap` — overlap detection is O(log n + k), point +lookup is O(log n), and the carve-out-on-insert semantics mean you +never have to manually split overlapping ranges. The persistent +structure gives you free snapshots of pricing history or feature +rollouts. + +--- + +## 22. 
Availability Windows (Interval Set) + +**Problem:** Track a set of half-open time intervals (maintenance +windows, busy periods, quiet hours, event schedules) and answer "is +the current time inside any of them?". Unlike an interval-map, you +don't have per-interval values — you just want overlap membership. + +```clojure +;; Service maintenance windows — a set of intervals (no values). +;; Scalar values are treated as point intervals. +(def maintenance + (oc/interval-set [[100 130] ; Mon 01:00-01:30 + [700 730] ; Mon 07:00-07:30 + [1200 1215] ; Mon 12:00-12:15 (deploy) + [2200 2230]])) ; Mon 22:00-22:30 + +;; Point query — is time t inside any maintenance window? +(defn maintenance-now? [t] + (boolean (seq (oc/overlapping maintenance t)))) + +(maintenance-now? 105) ;; => true (inside [100 130]) +(maintenance-now? 600) ;; => false +(maintenance-now? 1210) ;; => true (inside [1200 1215]) + +;; Range query — which windows overlap a time range? +(oc/overlapping maintenance [115 720]) +;; => ([100 130] [700 730]) + +;; Set algebra on intervals +(def planned + (oc/interval-set [[100 130] [700 730] [1200 1215]])) + +(def ad-hoc + (oc/interval-set [[800 815] [1205 1210] [1800 1830]])) + +(oc/union planned ad-hoc) +;; => all planned + ad-hoc windows + +(oc/intersection planned ad-hoc) +;; => #{[1200 1215]} ← the one overlap (structurally: 1205-1210 is a subset) + +;; When does the next maintenance end? +(defn next-free-slot [iset now] + (when-let [[_ hi] (first (oc/overlapping iset now))] + hi)) + +(next-free-slot maintenance 1205) ;; => 1215 +(next-free-slot maintenance 1500) ;; => nil (not in a window) +``` + +**Why ordered-collections?** Interval-set gives you O(log n + k) +overlap queries via the interval-tree augmentation, plus set algebra +(`union`, `intersection`, `difference`) over intervals as elements. 
+When you care about "what's happening" and not "what values are +attached", interval-set is lighter than interval-map and supports +the same spatial-query operations. + +--- + ## Performance Tips 1. **Use `reduce` over `seq`** - Direct reduce uses optimized IReduceInit path @@ -645,82 +1063,14 @@ Practical examples showing where ordered-collections shines. (oc/string-ordered-set ["alice" "bob" "carol"]) ``` ---- - -## 11. Document Editor Buffer - -**Problem:** Implement an editor buffer that supports: -- Efficient insert and delete at any position -- Undo/redo via persistent snapshots -- Extracting a visible window without copying the whole document -- Fast bulk assembly from parts (e.g., loading a file in chunks) - -Vectors are O(n) for mid-document edits — every insertion shifts all subsequent -elements. A rope does these in O(log n), and persistent snapshots are -nearly free because edits share structure with previous versions. - -```clojure -;; Build a document from paragraphs -(def doc - (apply oc/rope-concat - (map oc/rope - [["T" "h" "e" " " "q" "u" "i" "c" "k" " "] - ["b" "r" "o" "w" "n" " " "f" "o" "x" " "] - ["j" "u" "m" "p" "s" "."]]))) - -(count doc) ;; => 26 -(nth doc 10) ;; => "b" -(apply str doc) ;; => "The quick brown fox jumps." - -;; Insert at cursor position — O(log n) -(def after-insert - (oc/rope-insert doc 10 ["dark " ])) - -;; after-insert is not a copy — it shares almost all structure with doc -(apply str after-insert) -;; => "The quick dark brown fox jumps." - -;; Delete a range — O(log n) -(def after-delete - (oc/rope-remove after-insert 10 15)) - -(apply str after-delete) -;; => "The quick brown fox jumps." - -;; Replace (find and replace) — O(log n) -(def after-replace - (oc/rope-splice doc 4 9 (seq "slow"))) - -(apply str after-replace) -;; => "The slow brown fox jumps." 
- -;; Undo: just keep the old version — structural sharing makes this cheap -(def history [doc after-insert after-delete after-replace]) -(apply str (nth history 0)) ;; => "The quick brown fox jumps." -(apply str (nth history 1)) ;; => "The quick dark brown fox jumps." +8. **Pick the right rope variant** + ```clojure + ;; Text editing — StringRope beats plain String on edits past a few thousand chars + (oc/string-rope "…") -;; Extract visible window — O(log n), shares structure -(def visible (oc/rope-sub doc 4 15)) -(apply str visible) ;; => "quick brown" + ;; Binary data — ByteRope beats byte[] once edits get expensive + (oc/byte-rope #_…) -;; Split document at a section break -(let [[before after] (oc/rope-split doc 10)] - [(apply str before) (apply str after)]) -;; => ["The quick " "brown fox jumps."] -``` - -**Why ordered-collections?** Every edit is O(log n) regardless of document -size. A 100K-character document with 200 random edits takes ~3ms on a rope vs -~5 seconds on a vector — a 1,968x advantage. The persistent structure means -undo history is just a list of references, not copies.
- -**Scaling:** - -| Operation | Rope | Vector | -|---|---|---| -| Insert at position | O(log n) | O(n) | -| Delete range | O(log n) | O(n) | -| Concatenate documents | O(log n) | O(n) | -| Extract visible window | O(log n) | O(1) | -| Undo (keep old version) | free | O(n) copy | -| Reduce over full document | O(n) | O(n) | + ;; Anything else sequential — the generic rope + (oc/rope [1 2 3 …]) + ``` diff --git a/doc/report.txt b/doc/report.txt new file mode 100644 index 0000000..6cbc454 --- /dev/null +++ b/doc/report.txt @@ -0,0 +1,320 @@ + _ _ _ +| |__ ___ _ _ __| |_ _ _ ___ _ __ ___ _ _| |_ +| '_ \/ -_) ' \/ _| ' \ | '_/ -_) '_ \/ _ \ '_| _| +|_.__/\___|_||_\__|_||_| |_| \___| .__/\___/_| \__| + |_| + + +──────────────────────────────────────────────────────────────────────── + Run +──────────────────────────────────────────────────────────────────────── +File /Users/dan/src/ordered-collections/bench-results/2026-04-17_11-03-53.edn +Baseline /Users/dan/src/ordered-collections/bench-results/2026-04-12_16-48-22.edn +Timestamp 2026-04-17T17:32:10.303781Z +Mode :full +Artifact version 3 +Git branch 021-specialized-ropes +Git rev 990b9a5114 +Sizes [1000 5000 10000 100000 500000] +Benchmark cases 1254 +Benchmark groups 96 + +──────────────────────────────────────────────────────────────────────── + Platform +──────────────────────────────────────────────────────────────────────── +Host kiwi +OS Mac OS X 26.3.1 aarch64 +Processors 12 +Java 25.0.2 (Homebrew) +VM OpenJDK 64-Bit Server VM +Max memory (MB) 8192 +Heap max (MB) 8192 +Heap committed (MB) 376 +Heap used (MB) 107 + +──────────────────────────────────────────────────────────────────────── + Baseline Run +──────────────────────────────────────────────────────────────────────── +File /Users/dan/src/ordered-collections/bench-results/2026-04-12_16-48-22.edn +Timestamp 2026-04-12T23:16:26.435562Z +Mode :full +Git branch 021-specialized-ropes +Git rev 6acf560a90 + 
+──────────────────────────────────────────────────────────────────────── + Summary +──────────────────────────────────────────────────────────────────────── +1254 benchmarks across 96 groups at N=1000, 5000, 10000, 100000, 500000. 154 +wins, 18 at parity, 88 losses. Best win: 1236.6x on rope-repeated-edits. +Worst loss: 9.1x slower on string-rope-re-seq. 92 regressions, 98 +improvements vs baseline. + +──────────────────────────────────────────────────────────────────────── + Headline Performance +──────────────────────────────────────────────────────────────────────── + + Set Algebra vs sorted-set + N=1000 N=5000 N=10000 N=100000 N=500000 +Union 11.4x 10.6x 12.9x 23.4x 59.6x +Intersection 7.6x 7.7x 9.3x 15.5x 34.6x +Difference 9.4x 10.5x 10.8x 24.3x 54.7x + + Set Algebra vs data.avl + N=1000 N=5000 N=10000 N=100000 N=500000 +Union 7.9x 7.8x 9.7x 20.0x 51.3x +Intersection 6.5x 5.8x 6.9x 13.1x 29.9x +Difference 5.9x 6.2x 7.9x 15.0x 37.2x + + Set Algebra vs clojure.core/set + N=1000 N=5000 N=10000 N=100000 N=500000 +Union 3.5x 2.9x 3.7x 7.2x 20.5x +Intersection 3.4x 3.0x 3.5x 6.6x 17.8x +Difference 3.6x 3.7x 4.6x 9.8x 28.4x + + Ordered Set vs sorted-set + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 2.2x 1.9x 2.1x 2.4x 2.8x +Lookup 1.3x 1.1x 1.1x 1.0x 0.9x +Iteration 1.7x 1.0x 1.1x 1.0x 1.0x +Fold 1.1x 2.0x 2.5x 4.0x 3.5x +Split 5.1x 4.4x 4.9x 6.1x 6.8x + + Ordered Set vs data.avl + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 1.1x 1.0x 1.1x 1.2x 1.5x +Lookup 1.2x 0.9x 1.0x 0.9x 0.9x +Iteration 0.2x 0.3x 0.4x 0.3x 0.3x + + Ordered Map vs sorted-map + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 1.3x 1.2x 1.4x 1.6x 2.3x +Lookup 1.2x 1.3x 1.2x 1.2x 1.1x +Iteration 2.8x 1.1x 1.1x 0.9x 1.0x +Reduce 3.1x 1.1x 1.2x 1.0x 1.0x + + Ordered Map vs data.avl + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 1.1x 1.0x 1.2x 1.2x 1.6x +Lookup 1.4x 1.2x 1.2x 1.1x 1.0x +Iteration 0.5x 0.8x 0.8x 0.7x 0.6x + + Long-Specialized vs sorted-set + N=1000 
N=5000 N=10000 N=100000 N=500000 +Construction 0.6x 1.5x 1.8x 1.9x 2.6x +Lookup 1.9x 1.4x 1.2x 1.3x 1.2x +Union 6.6x 7.4x 9.6x 16.4x 41.3x +Intersection 4.7x 5.2x 6.7x 11.1x 27.3x +Difference 6.4x 7.2x 8.3x 16.9x 38.0x + + String-Specialized vs sorted-set-by + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 1.1x 2.2x 2.8x 2.8x 2.8x +Lookup 1.7x 1.6x 1.5x 1.3x 1.0x +Union 1.4x 1.9x 2.2x 2.1x 5.8x +Intersection 1.4x 2.0x 2.2x 2.1x 5.4x +Difference 2.4x 3.6x 4.2x 8.8x 13.7x + + Rope vs PersistentVector + N=1000 N=5000 N=10000 N=100000 N=500000 +200 Random Edits 4.7x 13.7x 25.7x 261.4x 1236.6x +Single Splice 4.8x 12.7x 105.6x 761.8x 863.1x +Concat Pieces 169.3x 22.4x 29.0x 39.1x 35.9x +Chunk Iteration 1.0x 1.0x 1.0x 1.0x 1.0x +Reduce (sum) 1.0x 1.7x 1.4x 1.5x 1.4x +Fold (sum) 2.9x 1.4x 1.2x 1.4x 1.6x +Random nth (1000) 0.5x 0.2x 0.2x 0.2x 0.2x +Fold (freq map) 1.0x 0.7x 0.7x 1.0x 1.2x + + StringRope vs String + N=1000 N=5000 N=10000 N=100000 N=500000 +Single Splice 0.4x 3.2x 5.9x 41.7x 349.2x +Single Insert 0.4x 2.7x 6.2x 39.7x 153.5x +Single Remove 1.5x 3.6x 7.1x 43.8x 412.2x +Concat Halves 0.9x 0.5x 2.5x 20.0x 28.9x +Split at Midpoint 0.9x 0.1x 0.3x 1.7x 6.6x +200 Random Edits 0.6x 2.6x 5.7x 38.2x 129.6x +Random nth (1000) 0.2x 0.1x 0.1x 0.0x 0.0x +Reduce (sum chars) 0.5x 0.5x 0.5x 0.5x 0.5x +re-find 1.3x 0.1x 0.1x 0.1x 0.1x +re-seq 0.6x 0.2x 0.2x 0.1x 0.1x +Materialization (str) 0.7x 0.0x 0.0x 0.0x 0.0x +str/replace (regex) 1.0x 1.0x 1.0x 1.0x 1.0x + + StringRope vs StringBuilder + N=1000 N=5000 N=10000 N=100000 N=500000 +Single Splice 0.2x 2.1x 3.5x 22.7x 195.4x +Single Insert 0.2x 1.7x 3.5x 22.5x 80.7x +Single Remove 0.8x 2.2x 4.2x 24.5x 236.5x +Concat Halves 0.7x 0.4x 2.2x 16.4x 22.4x +Split at Midpoint 0.8x 0.1x 0.3x 1.7x 6.6x +200 Random Edits 0.2x 1.4x 3.3x 22.4x 74.8x +Construction 10.0x 0.5x 1.0x 1.0x 0.9x + + ByteRope vs byte[] + N=1000 N=5000 N=10000 N=100000 N=500000 +Single Splice 0.1x 1.4x 2.7x 11.3x 109.6x +Single Insert 0.1x 1.0x 2.4x 9.7x 43.4x 
+Single Remove 0.8x 1.5x 2.8x 10.4x 128.3x +Concat 4 Pieces 0.1x 0.3x 0.3x 0.4x 0.4x +Split at Midpoint 0.8x 0.1x 0.3x 1.8x 7.1x +200 Random Edits 0.2x 1.0x 2.0x 13.6x 46.4x +Random nth (1000) 0.1x 0.0x 0.0x 0.0x 0.0x +Reduce (sum bytes) 0.1x 0.2x 0.1x 0.1x 0.1x +Fold (sum bytes) 0.1x 0.0x 0.1x 0.4x 0.6x +Construction 74.2x 2.9x 1.8x 0.5x 0.3x +Materialization 73.7x 12.2x 6.5x 1.3x 0.7x +SHA-256 0.9x 0.9x 0.9x 1.0x 1.0x + + Range Map vs Guava TreeRangeMap + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 0.2x 0.2x 0.2x 0.2x 0.2x +Point Lookup 0.2x 0.3x 0.3x 0.4x 0.5x +Carve-out Insert 2.4x 1.6x 1.8x 2.0x 2.1x +Iteration 1.7x 1.5x 1.5x 1.6x 2.0x + + Segment Tree vs sorted-map + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 0.3x 0.3x 0.4x 0.4x 0.4x +Range Query 9.2x 30.3x 65.6x 518.7x 3043.6x +Point Update 0.4x 0.4x 0.5x 0.5x 0.5x + + Priority Queue vs sorted-set-by + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 1.7x 1.9x 1.8x 2.7x 3.2x +Push 2.1x 2.2x 2.2x 3.0x 3.7x +Pop-min 2.1x 2.1x 1.6x 1.9x 2.3x + + Ordered Multiset vs sorted-map counts + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 0.9x 0.7x 0.7x 0.7x 0.6x +Multiplicity 1.2x 1.1x 1.1x 1.2x 1.1x +Iteration 1.5x 1.3x 1.8x 1.6x 1.9x + + Fuzzy Set vs sorted-set + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 1.0x 1.7x 2.1x 2.3x 3.0x +Nearest Lookup 1.0x 0.8x 0.8x 0.7x 0.8x + + Fuzzy Map vs sorted-map + N=1000 N=5000 N=10000 N=100000 N=500000 +Construction 0.6x 0.6x 0.6x 0.7x 0.7x +Nearest Lookup 1.0x 0.7x 0.8x 0.6x 0.7x + +──────────────────────────────────────────────────────────────────────── + Performance by Category +──────────────────────────────────────────────────────────────────────── +Category Wins Parity Losses Geomean Best Worst case (group) +set-algebra 45 0 0 4.0x 28.4x - +construction 16 2 7 0.8x 1.7x 7.9x slower (long-construction) +lookup 18 4 8 0.9x 1.5x 4.6x slower (long-lookup) +iteration 2 4 16 0.6x 1.2x 3.4x slower (set-iteration) +fold 31 2 4 1.9x 4.9x 
1.5x slower (rope-fold-freq) +split 10 0 0 4.7x 6.8x - +equality 5 2 0 1.9x 6.1x - +rank 0 0 5 0.7x - 1.5x slower (rank-access) +range-map 9 0 10 0.7x 2.4x 5.5x slower (range-map-construction) +rope 10 0 0 45.6x 1236.6x - +string-rope 4 4 8 0.9x 74.8x 9.1x slower (string-rope-re-seq) +other 4 0 30 0.6x 1.3x 2.7x slower (long-insert) + +──────────────────────────────────────────────────────────────────────── + Rope Family at Scale +──────────────────────────────────────────────────────────────────────── + Each cell is 'variant vs natural baseline' speedup at N=500000. + rope vs PersistentVector · string-rope vs String · byte-rope vs byte[] + +Operation rope string-rope byte-rope +Concat 36x 29x 0.40x +Split — 6.6x 7.1x +Splice 863x 349x 110x +Insert — 154x 43x +Remove — 412x 128x +200 Random Edits 1237x 130x 46x +Random nth 0.17x 0.034x 0.017x +Reduce 1.4x 0.47x 0.11x + +──────────────────────────────────────────────────────────────────────── + Significant Wins +──────────────────────────────────────────────────────────────────────── +Size Group OC Variant Peer Speedup +500000 rope-repeated-edits rope vector 1236.6x +500000 rope-splice rope vector 863.1x +100000 rope-repeated-edits rope vector 261.4x +500000 string-rope-repeated-edits string-rope string-builder 74.8x +100000 rope-concat rope vector 39.1x +500000 rope-concat rope vector 35.9x +500000 set-difference ordered-set clojure-set 28.4x +10000 rope-repeated-edits rope vector 25.7x +100000 string-rope-repeated-edits string-rope string-builder 22.4x +500000 set-union ordered-set clojure-set 20.5x +500000 set-intersection ordered-set clojure-set 17.8x +500000 long-difference long-ordered clojure-set 16.1x +500000 long-union long-ordered clojure-set 14.1x +5000 rope-repeated-edits rope vector 13.7x +5000 rope-splice rope vector 12.7x +500000 long-intersection long-ordered clojure-set 11.6x +500000 string-set-difference string-ordered data-avl 10.6x +100000 set-difference ordered-set clojure-set 9.8x +100000 
set-union ordered-set clojure-set 7.2x +100000 long-difference long-ordered clojure-set 7.0x +500000 split ordered-set data-avl 6.8x +100000 set-intersection ordered-set clojure-set 6.6x +500000 different ordered-set hash-set 6.1x +100000 split ordered-set data-avl 6.1x +500000 long-split long-ordered data-avl 5.9x +100000 string-set-difference string-ordered data-avl 5.8x +500000 string-set-union string-ordered sorted-set-by 5.8x +100000 long-union long-ordered clojure-set 5.2x +500000 string-set-intersection string-ordered data-avl 5.1x +1000 split ordered-set data-avl 5.1x + +──────────────────────────────────────────────────────────────────────── + At Parity +──────────────────────────────────────────────────────────────────────── +Group OC Variant Peer Ratio +map-construction ordered-map data-avl ~1.01x +map-lookup ordered-map data-avl ~1.03x +rank-lookup ordered-set data-avl ~1.02x +rope-chunk-iteration rope vector ~1.00x +rope-fold-freq rope vector ~1.04x +set-construction ordered-set data-avl ~0.97x +set-lookup ordered-set data-avl ~1.00x +string-rope-re-replace string-rope string ~0.98x +string-set-lookup string-ordered sorted-set-by ~1.03x +equal ordered-set hash-set ~0.99x + +──────────────────────────────────────────────────────────────────────── + Significant Losses +──────────────────────────────────────────────────────────────────────── +Size Group OC Variant Peer Slowdown Context +500000 string-rope-re-seq string-rope string 9.1x slower +100000 string-rope-re-seq string-rope string 9.0x slower +1000 long-construction long-ordered hash-set 7.9x slower +10000 string-rope-re-seq string-rope string 6.3x slower +5000 range-map-construction range-map guava-range-map 5.5x slower +1000 range-map-construction range-map guava-range-map 4.9x slower +5000 string-rope-re-seq string-rope string 4.9x slower +10000 range-map-construction range-map guava-range-map 4.9x slower +5000 long-construction long-ordered hash-set 4.9x slower +100000 range-map-construction 
range-map guava-range-map 4.7x slower +100000 long-lookup long-ordered hash-set 4.6x slower +100000 long-construction long-ordered hash-set 4.6x slower +500000 long-lookup long-ordered hash-set 4.5x slower +1000 string-rope-repeated-edits string-rope string-builder 4.5x slower +500000 range-map-construction range-map guava-range-map 4.5x slower +1000 range-map-lookup range-map guava-range-map 4.3x slower +10000 long-construction long-ordered hash-set 4.0x slower +5000 long-lookup long-ordered hash-set 3.8x slower +10000 long-lookup long-ordered hash-set 3.7x slower +500000 long-construction long-ordered hash-set 3.6x slower +500000 set-iteration ordered-set data-avl 3.4x slower Enumerator-based seq allocates per step +5000 range-map-lookup range-map guava-range-map 3.4x slower +100000 set-iteration ordered-set data-avl 3.3x slower Enumerator-based seq allocates per step +10000 range-map-lookup range-map guava-range-map 3.1x slower +5000 set-iteration ordered-set data-avl 3.0x slower Enumerator-based seq allocates per step +100000 long-iteration long-ordered data-avl 2.9x slower +1000 long-insert long-ordered data-avl 2.7x slower +10000 set-iteration ordered-set data-avl 2.7x slower Enumerator-based seq allocates per step +1000 long-lookup long-ordered hash-set 2.5x slower +500000 long-iteration long-ordered data-avl 2.4x slower + diff --git a/doc/ropes.md b/doc/ropes.md index 01cbdb3..e08a344 100644 --- a/doc/ropes.md +++ b/doc/ropes.md @@ -2,18 +2,35 @@ ## Status -The rope is a **public** collection type in `ordered-collections.core`. +The library provides three public rope variants in +`ordered-collections.core`, all backed by the same weight-balanced tree +kernel: + +- **`rope`** — generic persistent sequence for arbitrary Clojure values, + backed by `PersistentVector` chunks. +- **`string-rope`** — specialized text rope backed by `java.lang.String` + chunks. Implements `CharSequence` for Java interop. 
+- **`byte-rope`** — specialized binary rope backed by `byte[]` chunks. + Unsigned 0–255 element semantics. ```clojure (require '[ordered-collections.core :as oc]) -(oc/rope [1 2 3 4 5]) +(oc/rope [1 2 3 4 5]) ; generic +(oc/string-rope "the quick brown fox") ; text +(oc/byte-rope (byte-array [1 2 3])) ; binary ``` Implementation namespaces: -- `ordered-collections.types.rope` — `Rope` deftype -- `ordered-collections.kernel.rope` — low-level chunked tree operations +- `ordered-collections.kernel.rope` — shared chunked-tree operations + (concat, split, splice, reduce, fold). Dispatches to chunk primitives + via the `PRopeChunk` protocol. +- `ordered-collections.kernel.chunk` — `PRopeChunk` extensions for the + three chunk backends (`APersistentVector`, `String`, `byte[]`). +- `ordered-collections.types.rope` — generic `Rope` deftype. +- `ordered-collections.types.string-rope` — `StringRope` deftype. +- `ordered-collections.types.byte-rope` — `ByteRope` deftype. ## What a Rope Is @@ -89,18 +106,25 @@ The better question is: | Workload | N=10K | N=100K | N=500K | |---|---:|---:|---:| -| 200 random edits | **43x** | **498x** | **1968x** | -| Single splice | **6x** | **116x** | **584x** | -| Concat many pieces | **3.4x** | **5.4x** | **9.5x** | -| Chunk iteration | **58x** | **83x** | **117x** | -| Fold (sum) | **5.6x** | **1.5x** | **1.3x** | -| Reduce (sum) | 0.4x | **1.7x** | **1.3x** | -| Random nth (1000) | 0.7x | 0.5x | 0.4x | - -The rope wins on 6 of 7 workloads at scale. The advantage grows with collection -size because structural editing is O(log n) vs O(n). Parallel fold beats vectors -via tree-based fork-join decomposition. Random nth is slower (O(log n) vs O(1)) -— an inherent tradeoff of tree-backed indexing. 
+| 200 random edits | **26x** | **261x** | **1237x** | +| Single splice | **106x** | **762x** | **863x** | +| Concat many pieces | **29x** | **39x** | **36x** | +| Chunk iteration | 1.0x | 1.0x | 1.0x | +| Fold (sum) | **1.2x** | **1.4x** | **1.6x** | +| Reduce (sum) | **1.4x** | **1.5x** | **1.4x** | +| Random nth (1000) | 0.2x | 0.2x | 0.2x | + +The rope wins decisively on structural editing at scale — the advantage grows +with collection size because structural editing is O(log n) vs O(n). Parallel +fold beats vectors via tree-based fork-join decomposition. Random nth is +slower (O(log n) tree descent vs the vector's O(log₃₂ n) trie lookup), a tradeoff inherent to tree-backed indexing. + +> These numbers are all for tree-mode ropes (above the 1024-element +> flat threshold). Below that threshold the rope stores its content as +> a bare `PersistentVector` directly and every read dispatches to +> vector operations with zero indirection — read performance is +> essentially identical to a raw vector. See **Flat Mode: Zero-Overhead +> Small Ropes** below. ## Rope Design in This Library @@ -137,6 +161,64 @@ That is why the rope can support `nth`, `assoc`, split, and slicing without pretending that element positions are stored as explicit keys. +## The Chunk Abstraction: One Kernel, Many Backends + +The rope kernel is written once and works over any chunk type that +satisfies the `PRopeChunk` protocol. This is what lets the same tree +algebra back the generic `Rope`, the `StringRope`, and the `ByteRope` +without the kernel needing to know which backend it is operating on. 
+ +`PRopeChunk` is a small internal protocol (13 methods) that captures +every primitive operation the kernel needs on a chunk: + +``` +chunk-length — element count +chunk-nth — element at index +chunk-slice — subrange [start, end) +chunk-merge — concatenate two chunks +chunk-append — append a single element +chunk-last — last element +chunk-butlast — all but last element +chunk-update — replace element at index +chunk-of — build a single-element chunk +chunk-reduce-init — reduce over the chunk with an initial value +chunk-append-sb — append to a StringBuilder (for materialization) +chunk-splice — replace a range inside the chunk +chunk-splice-split — same but returning a split pair (overflow path) +``` + +The extensions live in `ordered-collections.kernel.chunk`: + +| Backend | Variant | Primary operations | +|-------------------------------|---------------|---------------------------------| +| `clojure.lang.APersistentVector` | `rope` | `subvec`, `into`, `.nth`, `conj` | +| `java.lang.String` | `string-rope` | `.substring`, `.charAt`, `StringBuilder` | +| `byte[]` | `byte-rope` | `Arrays/copyOfRange`, `System/arraycopy`, `aget`/`aset` | + +Each backend is self-contained — it only touches the underlying JVM +type and has no dependency on the rest of the rope kernel. Structural +algorithms (`rope-concat`, `rope-split-at`, `rope-splice-root`, +`rope-reduce`, `rope-fold`, CSI repair) are written exactly once and +dispatch through the protocol. + +This is internal dispatch, not user-facing interop. User code never +calls `chunk-length` or `chunk-slice` directly — those are the kernel's +own adapter layer for "here's how to manipulate a chunk of type X". If +you want to add a new rope variant (say, a `LongRope` over `long[]`), +the recipe is: + +1. Extend `PRopeChunk` to the new chunk type in `kernel/chunk.clj`. +2. Add a `-rope-node-create` allocator in `kernel/rope.clj`. +3. Add a `->root` chunking builder. +4. 
Create a `types/_rope.clj` deftype that wraps the tree root + and binds `*t-join*` to the new allocator in each mutating operation. +5. Expose the public API in `core.clj`. + +The shared-kernel approach means every optimization the kernel gains — +`rope-splice-inplace`, monomorphic hot paths, parallel fold, transient +construction — is inherited by all variants at once. + + ## Chunked Leaves The current rope implementation is chunked rather than storing one element per @@ -163,23 +245,85 @@ slice, and splice can move a former right-edge runt into the interior, where it must be merged and rechunked back into a valid shape. +## Flat Mode: Zero-Overhead Small Ropes + +All three rope variants apply the same **flat-mode optimization**: when a +rope's element count is at or below the per-variant flat threshold (1024 +by default), the rope skips the tree wrapper entirely and stores its +content as a bare concrete collection in its `root` field — a +`PersistentVector` for the generic rope, a `java.lang.String` for the +string rope, a `byte[]` for the byte rope. + +This is just good engineering rather than clever algorithms: a +1024-element rope does not need an outer tree node, an augmented +subtree element count, and a chunk-protocol dispatch layer just to +read one element. Below the threshold every operation dispatches +straight to the native type: + +| Variant | Below threshold | Above threshold | +|--- |--- |--- | +| `rope` | `PersistentVector` | chunked WBT of vectors | +| `string-rope`| `java.lang.String` | chunked WBT of strings | +| `byte-rope` | `byte[]` | chunked WBT of byte arrays | + +Concretely: + +- **Reads** (`nth`, `seq`, `reduce`, `charAt`, indexed access) go + directly to the underlying type. No tree descent, no chunk protocol + call, no per-op indirection. +- **Structural edits** (`rope-insert`, `rope-splice`, `rope-cat`, etc.) 
+ use the native type's own efficient operations (`subvec` + `into` for + vectors, `StringBuilder` for strings, `System.arraycopy` for byte + arrays). If the result would exceed the threshold, the rope + transparently **promotes** to the chunked tree form. +- **Transients** always build in tree form internally, but at + `persistent!` time the final result is **demoted** back to flat form + if it fits under the threshold. +- **Memory overhead** for a small rope is essentially identical to the + underlying type — the flat-mode rope is just the bare collection + plus the deftype field headers (alloc, meta). + Measured with `clj-memory-meter`, a 1024-element rope is within 0.1 + bytes/element of a raw `PersistentVector`. + +The flat threshold for each variant is tuned independently and can +diverge as each variant's performance profile demands, but currently +all three use 1024 — that happens to coincide with the +`+target-chunk-size+`, so "small enough to live in a single chunk" +and "small enough to stay flat" are the same regime. + +The asymmetry worth noting: on the generic rope, flat-mode reads are +*not* O(1) — `PersistentVector.nth` is O(log₃₂ n), a trie-level +lookup. They are, however, as fast as reads get on the JVM for +arbitrary Clojure values, and they skip the outer rope-tree +indirection entirely. On the specialized variants, flat-mode +`charAt` and unsigned-byte indexing are genuinely O(1) because +`String` and `byte[]` are contiguous. + + ## API -From `ordered-collections.core`: +### Shared rope API (works on all three variants) + +The `PRope` protocol is implemented by `Rope`, `StringRope`, and +`ByteRope`. 
Every function below dispatches on the protocol, so the +same call works regardless of which rope variant you pass in: -- `rope` -- `rope-concat` — two args: O(log n) tree join; three or more: O(total chunks) bulk -- `rope-split` -- `rope-sub` -- `rope-insert` -- `rope-remove` -- `rope-splice` -- `rope-chunks` -- `rope-chunks-reverse` -- `rope-chunk-count` -- `rope-str` +- `rope-split` — split at index, returns `[left right]` +- `rope-sub` — extract subrange `[start, end)` +- `rope-insert` — insert content at index +- `rope-remove` — remove range +- `rope-splice` — replace range with new content +- `rope-chunks` — seq of internal chunk values +- `rope-str` — materialize to the natural backing type (String / byte[] / etc.) -And the `Rope` type itself supports: +### Generic Rope + +- `rope` — constructor (any seqable) +- `rope-concat` — 1-arg coerce, 2-arg O(log n) join, 3+-arg bulk +- `rope-chunks-reverse` — reverse chunk seq +- `rope-chunk-count` — number of chunks + +The `Rope` type supports: - `count`, `nth`, `get`, `assoc`, `conj`, `peek`, `pop` - vector-style function lookup @@ -189,8 +333,72 @@ And the `Rope` type itself supports: - `compare` (lexicographic) - `java.util.List`: `get`, `indexOf`, `lastIndexOf`, `contains`, `subList` - `java.util.Collection`: `size`, `isEmpty`, `toArray`, `containsAll` +- `IPersistentVector` — `(vector? 
r)` is true - metadata, sequential equality, ordered hashing +### StringRope + +- `string-rope` — constructor (from `String`, another StringRope, or anything `str` can coerce) +- `string-rope-concat` — variadic, same 1/2/3+ semantics as `rope-concat` + +The `StringRope` type implements `java.lang.CharSequence` and most of +the same Clojure interfaces as the generic Rope, with text-appropriate +semantics: + +- `nth` / `charAt` return a `Character` +- `(str sr)` materializes to a `java.lang.String` +- Equality with `java.lang.String` is content-based +- Hash matches `String`'s so ropes and strings can co-exist as map keys +- `Comparable` — lexicographic compare matches `String.compareTo` +- `IEditableCollection` — `TransientStringRope` with a `StringBuilder` tail +- Works with `re-find` / `re-seq` / `re-matches` / `java.util.regex.Matcher` +- Works with `clojure.string` (all of its functions accept `CharSequence`) +- Works with `java.io.*` APIs that accept `CharSequence` + +### ByteRope + +ByteRope is essentially a **persistent, structure-sharing memory** — a +model of a contiguous byte region that supports O(log n) structural +editing (splice, insert, remove), zero-cost immutable snapshots (every +version persists via path-copying), automatic coalescing of adjacent +chunks (CSI repair merges undersized neighbors), and garbage +collection of unreachable versions by the JVM. You get the mental model +of a mutable byte buffer with the safety properties of a persistent data +structure. 
+ +This makes ByteRope useful far beyond simple binary blobs: +- **Binary protocol construction** — build packets by splicing fields + at offsets, roll back on error by keeping the prior version +- **Undo/redo** — each edit produces a new ByteRope; old versions are + retained as long as referenced, discarded by GC when not +- **Diffing and patching** — apply a patch to a snapshot without + copying the unmodified regions (structure sharing) +- **Streaming** — `byte-rope-input-stream` and `byte-rope-write` let + you feed chunks through Java I/O without materializing the full + contents + +API surface: + +- `byte-rope` — constructor (from `byte[]`, another ByteRope, `String` (UTF-8), `InputStream`, seq of unsigned longs) +- `byte-rope-concat` — variadic, same 1/2/3+ semantics +- `byte-rope-bytes` — defensively-copied `byte[]` materialization +- `byte-rope-hex` — lowercase hex string +- `byte-rope-write` — stream chunks to an `OutputStream` +- `byte-rope-input-stream` — adapter returning a fresh `java.io.InputStream` +- `byte-rope-get-byte` / `-short` / `-int` / `-long` — big-endian multi-byte reads +- `byte-rope-get-short-le` / `-int-le` / `-long-le` — little-endian variants +- `byte-rope-index-of` — first index of a given unsigned byte value +- `byte-rope-digest` — streaming `java.security.MessageDigest`; returns a ByteRope + +The `ByteRope` type exposes bytes as unsigned longs in `[0, 255]`: + +- `nth` / `reduce` / `seq` yield longs in that range +- Equality with `byte[]` is content-based +- Not equal to Clojure vectors (intentional — avoids signed/unsigned confusion) +- `Comparable` — unsigned lexicographic via `Arrays/compareUnsigned` +- `IEditableCollection` — `TransientByteRope` with a `ByteArrayOutputStream` tail +- Does **not** implement `CharSequence` or `IPersistentVector` — bytes are their own domain + ## Examples @@ -219,12 +427,12 @@ And the `Rope` type itself supports: (def a (oc/rope [0 1 2])) (def b (oc/rope [3 4 5])) -(oc/rope-concat a b) ;; => 
#ordered/rope [0 1 2 3 4 5] +(oc/rope-concat a b) ;; => #vec/rope [0 1 2 3 4 5] ;; Variadic — bulk concatenation in O(total chunks) (oc/rope-concat (oc/rope [1 2]) (oc/rope [3 4]) (oc/rope [5 6])) -;; => #ordered/rope [1 2 3 4 5 6] +;; => #vec/rope [1 2 3 4 5 6] ``` ### Split and Slice @@ -524,10 +732,10 @@ coerce their arguments automatically: ```clojure (oc/rope-concat (oc/rope [1 2]) [3 4]) -;; => #ordered/rope [1 2 3 4] +;; => #vec/rope [1 2 3 4] (oc/rope-insert (oc/rope [1 2 3]) 1 [:a :b]) -;; => #ordered/rope [1 :a :b 2 3] +;; => #vec/rope [1 :a :b 2 3] ``` **Interop summary:** @@ -540,7 +748,7 @@ coerce their arguments automatically: | Rope | PersistentVector | `(vec r)` | | Rope | lazy seq | `(seq r)` | | Rope | Java array | `(.toArray r)` | -| Rope | EDN round-trip | `#ordered/rope` tagged literal | +| Rope | EDN round-trip | `#vec/rope` tagged literal | ### Scenario 9: Java Interop @@ -555,7 +763,7 @@ with Java APIs that expect these interfaces: (.size r) ;; => 5 (.contains r 30) ;; => true (.indexOf r 40) ;; => 3 -(.subList r 1 4) ;; => #ordered/rope [20 30 40] +(.subList r 1 4) ;; => #vec/rope [20 30 40] (.toArray r) ;; => Object[5] {10, 20, 30, 40, 50} ``` @@ -579,6 +787,137 @@ Use a **rope** when: - you assemble a sequence from many parts and then process it +## Specialized Ropes + +The generic `Rope` stores any Clojure value in `PersistentVector` +chunks. That is the right choice for heterogeneous sequential data +but pays for boxing and per-element object headers on workloads where +the element type is uniform. The library provides two specialized +variants backed by native JVM types: + +### StringRope — text + +`StringRope` backs each chunk with a `java.lang.String`. The JEP 254 +compact-string optimization means a 256-character ASCII chunk occupies +about the same space as a 256-byte `byte[]`, plus object headers. 
+`StringRope` implements `java.lang.CharSequence`, so Java and +Clojure text APIs that take a `CharSequence` accept it directly: + +```clojure +(require '[clojure.string :as str]) + +(def doc (oc/string-rope "The quick brown fox jumps over the lazy dog.")) + +(count doc) ;; => 44 +(str doc) ;; => "The quick brown fox jumps over the lazy dog." +;; (subs doc 4 9) is NOT supported; use rope-sub instead + +(str (oc/rope-sub doc 4 9)) ;; => "quick" +(str (oc/rope-splice doc 4 9 "slow")) ;; => "The slow brown fox jumps over the lazy dog." + +;; Regex and clojure.string work directly because doc implements CharSequence +(re-seq #"\w+" doc) +(str/upper-case (str doc)) +(str/replace doc #"\w+" str/upper-case) +``` + +Key properties: + +- `(= (string-rope "hello") "hello")` is true +- `(hash (string-rope "hello"))` matches `(hash "hello")` +- `#string/rope "…"` tagged literal round-trips through EDN +- Transient support via `StringBuilder` tail buffer for fast batch construction +- Small strings (≤ 1024 chars) are stored as a raw `String` internally + with zero tree overhead; edits that grow past the threshold transparently + promote to chunked form + +Performance vs `java.lang.String` (structural editing workloads): + +| Workload | N=10K | N=100K | N=500K | +|---|---:|---:|---:| +| 200 random edits | **5.7x** | **38x** | **130x** | +| Single splice | **5.9x** | **42x** | **349x** | +| Single remove | **7.1x** | **44x** | **412x** | +| Random nth | 0.1x | 0.0x | 0.0x | + +Random-access reads are slower than `String` (O(log n) vs O(1)) but +bounded; structural edits stay O(log n) as the text grows, while `String` edits are +always O(n). + +### ByteRope — binary data + +`ByteRope` backs each chunk with a primitive `byte[]`. This matches +the mutable-world defaults of `java.nio.ByteBuffer`, protobuf +`ByteString`, Okio, and Netty — but with persistent semantics and +O(log n) structural edits. + +```clojure +(def msg (oc/byte-rope (.getBytes "Hello, World!" 
"UTF-8"))) + +(count msg) ;; => 13 +(oc/byte-rope-hex msg) ;; => "48656c6c6f2c20576f726c6421" +(nth msg 0) ;; => 72 (unsigned 'H') +(oc/byte-rope-get-int msg 0) ;; => 1214606444 (big-endian u32) + +(def with-prefix + (oc/byte-rope-concat + (oc/byte-rope (byte-array [0xde 0xad 0xbe 0xef])) + msg)) + +(oc/byte-rope-hex with-prefix) +;; => "deadbeef48656c6c6f2c20576f726c6421" + +;; Cryptographic digest streams chunks through MessageDigest +(oc/byte-rope-hex (oc/byte-rope-digest msg "SHA-256")) +;; => "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f" +``` + +Key properties: + +- Unsigned byte semantics — `nth` / `reduce` / `seq` yield longs in `[0, 255]` +- Equality with `byte[]` is content-based; intentionally not equal to Clojure vectors +- Unsigned lexicographic `Comparable` (consistent with protobuf/Okio/Netty) +- Big-endian multi-byte reads with `-le` variants for little-endian +- `#byte/rope "hex"` tagged literal round-trips through EDN +- Defensive copy on construction — never shares mutable `byte[]` with the caller +- Transient support via `ByteArrayOutputStream` tail buffer +- Small byte sequences (≤ 1024 bytes) are stored as a raw `byte[]` with zero + tree overhead; edits that grow past the threshold promote transparently +- Streaming primitives: `byte-rope-write` for `OutputStream`, + `byte-rope-input-stream` for `InputStream`, `byte-rope-digest` for + `MessageDigest` — no full materialization needed + +Performance vs `byte[]` (structural editing workloads): + +| Workload | N=10K | N=100K | N=500K | +|---|---:|---:|---:| +| 200 random edits | **2.0x** | **14x** | **46x** | +| Single splice at midpoint | **2.7x** | **11x** | **110x** | +| Single remove | **2.8x** | **10x** | **128x** | +| Split at midpoint | 0.3x | **1.8x** | **7.1x** | +| Random nth | 0.0x | 0.0x | 0.0x | + +Small-scale byte-array operations win because `System/arraycopy` is +absurdly fast. 
The rope takes over at the scale where persistent edits +matter — roughly 10K+ bytes with repeated splicing. For single reads +or small buffers, stay with `byte[]`. For binary-protocol assembly, +streaming digests, or any workload with many edits on a large buffer, +use `ByteRope`. + +### Choosing a variant + +| You want... | Use | +|---|---| +| Text editing, regex, clojure.string, Java interop | `string-rope` | +| Binary protocol assembly, streaming digest, patch editing | `byte-rope` | +| Anything else sequential (vectors of arbitrary values) | `rope` | + +All three share the same public API through the `PRope` protocol — +`rope-split`, `rope-sub`, `rope-insert`, `rope-remove`, `rope-splice`, +`rope-chunks`, `rope-str` — so code that works on one variant often +works on the others with minimal changes. + + ## Conceptual Tradeoffs ### Strengths diff --git a/doc/vs-clojure-data-avl.md b/doc/vs-clojure-data-avl.md index c2efcc3..e14e2e3 100644 --- a/doc/vs-clojure-data-avl.md +++ b/doc/vs-clojure-data-avl.md @@ -58,8 +58,8 @@ The core operations are the same, with one syntax difference: ordered-collection ## What you gain -- **Parallel set operations** — about 7-42x faster union/intersection/difference in the current Criterium run -- **Parallel fold** — `r/fold` uses tree-based fork-join and is about 6-9x faster than data.avl on large reductions +- **Parallel set operations** — about 7-51x faster union/intersection/difference in the current Criterium run +- **Parallel fold** — `r/fold` uses tree-based fork-join and is about 3-5x faster than data.avl on large reductions - **Fast endpoint access** — `java.util.SortedSet.last()` is O(log n); data.avl falls back to seq traversal for `last` - **`median`, `percentile`, `slice`** — positional operations beyond nth/rank - **Specialized collections** — interval trees, segment trees, range maps, fuzzy lookup, priority queues, multisets diff --git a/etc/bench_report.bb b/etc/bench_report.bb index 8eb87c9..7f1c7b0 100644 --- 
a/etc/bench_report.bb +++ b/etc/bench_report.bb @@ -23,7 +23,10 @@ sizes (sort (set (map :size target-rows))) headlines (analyze/headline-wins target-rows sizes) parity (analyze/parity-cases scorecard) + wins (analyze/significant-wins scorecard) losses (analyze/significant-losses scorecard) + categories (analyze/category-summary scorecard) + rope-family (analyze/rope-family-summary target-rows sizes) regressions (->> (analyze/regression-report target-rows baseline-rows) (filter #(#{:regression :major-regression} (:status %))) (sort-by :ratio >)) @@ -34,15 +37,22 @@ (render/render-header opts target-summary baseline-summary) (render/render-executive-summary summary) (render/render-headline-wins headlines sizes) + (render/render-category-summary categories) + (render/render-rope-family rope-family) + (render/render-significant-wins wins opts) (render/render-parity parity opts) (render/render-significant-losses losses opts) - (render/render-scorecard - (sort-by (juxt (comp #(.indexOf analyze/category-order %) :category) - #(- (:speedup %))) - scorecard) - opts) - (render/render-regressions "Regressions" regressions opts) - (render/render-regressions "Improvements" improvements opts) + ;; Full Scorecard, Regressions, and Improvements are for interactive + ;; A/B review. They are noisy for an outside reader of the committed + ;; doc/report.txt snapshot, so --publish suppresses them. + (when-not (:publish opts) + (render/render-scorecard + (sort-by (juxt (comp #(.indexOf analyze/category-order %) :category) + #(- (:speedup %))) + scorecard) + opts) + (render/render-regressions "Regressions" regressions opts) + (render/render-regressions "Improvements" improvements opts)) (println))))) (apply -main *command-line-args*) diff --git a/etc/gen_chart.clj b/etc/gen_chart.clj new file mode 100644 index 0000000..0a0b5f0 --- /dev/null +++ b/etc/gen_chart.clj @@ -0,0 +1,267 @@ +(ns gen-chart + "Generate a PNG benchmark chart using Java 2D Graphics." 
+ (:import [java.awt Color Font Graphics2D BasicStroke RenderingHints] + [java.awt.font TextLayout FontRenderContext] + [java.awt.geom Line2D$Double Ellipse2D$Double Path2D$Double + Rectangle2D$Double AffineTransform] + [java.awt.image BufferedImage] + [javax.imageio ImageIO] + [java.io File])) + +;; ── Benchmark data ────────────────────────────────────────────────────────── + +(def data + ;; [size string-ns stringbuilder-ns stringrope-ns] + {:repeated-edits + {:label "200 Random Edits (splice + insert + remove)" + :rows [[1000 23500 11200 31300] + [4000 73200 42400 27000] + [10000 205700 130000 28100] + [100000 1620000 968400 38100]]} + :single-splice + {:label "Single Splice (replace 10 chars at midpoint)" + :rows [[1000 110 55 146] + [4000 334 189 95] + [10000 884 635 109] + [100000 8200 4800 51]]}}) + +;; ── Geometry helpers ──────────────────────────────────────────────────────── + +(def W 1600) +(def H 900) +(def margin-left 140) +(def margin-right 40) +(def margin-top 70) +(def margin-bottom 80) +(def gap 100) ;; between left and right panels + +(defn panel-bounds [panel-idx] + (let [pw (/ (- W margin-left margin-right gap) 2) + x0 (+ margin-left (* panel-idx (+ pw gap))) + y0 margin-top + x1 (+ x0 pw) + y1 (- H margin-bottom)] + {:x0 x0 :y0 y0 :x1 x1 :y1 y1 :pw pw :ph (- y1 y0)})) + +(defn log10 [x] (Math/log10 (double x))) + +(defn map-range [v vmin vmax pmin pmax] + (+ pmin (* (- pmax pmin) (/ (- v vmin) (- vmax vmin))))) + +;; ── Colors & styling ──────────────────────────────────────────────────────── + +(def color-string (Color. 220 60 60)) ;; red +(def color-stringbuilder (Color. 50 120 220)) ;; blue +(def color-stringrope (Color. 30 170 70)) ;; green +(def color-grid (Color. 220 220 220)) +(def color-axis (Color. 80 80 80)) +(def color-bg (Color. 252 252 252)) +(def color-panel-bg Color/WHITE) +(def color-title (Color. 30 30 30)) +(def color-subtitle (Color. 100 100 100)) + +(def font-title (Font. 
"Helvetica" Font/BOLD 28)) +(def font-subtitle (Font. "Helvetica" Font/PLAIN 16)) +(def font-label (Font. "Helvetica" Font/PLAIN 14)) +(def font-tick (Font. "Helvetica" Font/PLAIN 12)) +(def font-legend (Font. "Helvetica" Font/PLAIN 14)) +(def font-annot (Font. "Helvetica" Font/BOLD 13)) + +(def stroke-line (BasicStroke. 3.0 BasicStroke/CAP_ROUND BasicStroke/JOIN_ROUND)) +(def stroke-grid (BasicStroke. 1.0)) +(def stroke-axis (BasicStroke. 1.5)) + +;; ── Drawing helpers ───────────────────────────────────────────────────────── + +(defn draw-line [^Graphics2D g x1 y1 x2 y2] + (.draw g (Line2D$Double. x1 y1 x2 y2))) + +(defn draw-circle [^Graphics2D g cx cy r] + (.fill g (Ellipse2D$Double. (- cx r) (- cy r) (* 2 r) (* 2 r)))) + +(defn draw-string-centered [^Graphics2D g ^String s x y] + (let [fm (.getFontMetrics g) + sw (.stringWidth fm s)] + (.drawString g s (int (- x (/ sw 2))) (int y)))) + +(defn draw-string-right [^Graphics2D g ^String s x y] + (let [fm (.getFontMetrics g) + sw (.stringWidth fm s)] + (.drawString g s (int (- x sw)) (int y)))) + +(defn format-ns [ns] + (cond + (>= ns 1000000) (format "%.1fms" (/ ns 1000000.0)) + (>= ns 1000) (format "%.0fus" (/ ns 1000.0)) + :else (format "%.0fns" (double ns)))) + +(defn format-size [n] + (cond + (>= n 1000000) (str (/ n 1000000) "M") + (>= n 1000) (str (/ n 1000) "k") + :else (str n))) + +;; ── Panel rendering ───────────────────────────────────────────────────────── + +(defn draw-panel [^Graphics2D g {:keys [label rows]} panel-idx] + (let [{:keys [x0 y0 x1 y1 pw ph]} (panel-bounds panel-idx) + ;; Compute data ranges (log scale) + sizes (mapv first rows) + all-vals (mapcat (fn [[_ a b c]] [a b c]) rows) + log-xmin (log10 (apply min sizes)) + log-xmax (log10 (apply max sizes)) + log-ymin (Math/floor (log10 (apply min (filter pos? 
all-vals)))) + log-ymax (Math/ceil (log10 (apply max all-vals))) + ;; Add padding + log-xmin (- log-xmin 0.1) + log-xmax (+ log-xmax 0.1) + log-ymin (- log-ymin 0.2) + log-ymax (+ log-ymax 0.2) + ;; Map functions + mx (fn [v] (map-range (log10 v) log-xmin log-xmax x0 x1)) + my (fn [v] (map-range (log10 v) log-ymin log-ymax y1 y0))] + + ;; Panel background + (.setColor g color-panel-bg) + (.fill g (Rectangle2D$Double. x0 y0 pw ph)) + + ;; Grid lines (Y axis - powers of 10) + (.setStroke g stroke-grid) + (.setColor g color-grid) + (doseq [exp (range (int log-ymin) (inc (int log-ymax)))] + (let [yy (my (Math/pow 10 exp))] + (when (and (>= yy y0) (<= yy y1)) + (draw-line g x0 yy x1 yy)))) + + ;; Grid lines (X axis - at data points) + (doseq [s sizes] + (let [xx (mx s)] + (draw-line g xx y0 xx y1))) + + ;; Axes + (.setStroke g stroke-axis) + (.setColor g color-axis) + (draw-line g x0 y1 x1 y1) ;; bottom + (draw-line g x0 y0 x0 y1) ;; left + + ;; Y tick labels + (.setFont g font-tick) + (.setColor g color-axis) + (doseq [exp (range (int log-ymin) (inc (int log-ymax)))] + (let [yy (my (Math/pow 10 exp)) + ns-val (long (Math/pow 10 exp))] + (when (and (>= yy y0) (<= yy y1)) + (draw-string-right g (format-ns ns-val) (- x0 8) (+ yy 4))))) + + ;; X tick labels + (doseq [s sizes] + (draw-string-centered g (format-size s) (mx s) (+ y1 20))) + + ;; X axis label + (.setFont g font-label) + (.setColor g color-subtitle) + (draw-string-centered g "String Length (chars)" (/ (+ x0 x1) 2) (+ y1 45)) + + ;; Y axis label (only for left panel) + (when (zero? panel-idx) + (let [at (AffineTransform.) + cy (/ (+ y0 y1) 2)] + (.rotate at (- (/ Math/PI 2))) + (let [old-transform (.getTransform g)] + (.setTransform g (doto (AffineTransform. 
old-transform) + (.translate 30 cy) + (.rotate (- (/ Math/PI 2))))) + (.setFont g font-label) + (.setColor g color-subtitle) + (draw-string-centered g "Time (log scale)" 0 0) + (.setTransform g old-transform)))) + + ;; Panel title + (.setFont g font-label) + (.setColor g color-title) + (draw-string-centered g label (/ (+ x0 x1) 2) (- y0 12)) + + ;; Plot lines and points + (doseq [[color-val series-idx series-label] + [[color-string 1 "String (str+subs)"] + [color-stringbuilder 2 "StringBuilder"] + [color-stringrope 3 "StringRope"]]] + (.setColor g color-val) + (.setStroke g stroke-line) + ;; Lines + (let [pts (mapv (fn [row] + [(mx (nth row 0)) (my (nth row series-idx))]) + rows)] + (doseq [i (range (dec (count pts)))] + (let [[px1 py1] (nth pts i) + [px2 py2] (nth pts (inc i))] + (draw-line g px1 py1 px2 py2))) + ;; Points + (doseq [[px py] pts] + (draw-circle g px py 5)))) + + ;; Speedup annotations at 100k + (let [last-row (last rows) + string-ns (nth last-row 1) + sb-ns (nth last-row 2) + rope-ns (nth last-row 3) + xx (+ (mx (first last-row)) 8) + yr (my rope-ns)] + (.setFont g font-annot) + (.setColor g color-stringrope) + (let [vs-str (format "%.0fx vs String" (/ (double string-ns) rope-ns)) + vs-sb (format "%.0fx vs StringBuilder" (/ (double sb-ns) rope-ns))] + (.drawString g vs-str (int (- xx 100)) (int (- yr 18))) + (.drawString g vs-sb (int (- xx 100)) (int (- yr 4))))))) + +;; ── Main rendering ────────────────────────────────────────────────────────── + +(defn render-chart [^String path] + (let [img (BufferedImage. 
W H BufferedImage/TYPE_INT_ARGB) + g (.createGraphics img)] + ;; Anti-aliasing + (.setRenderingHint g RenderingHints/KEY_ANTIALIASING RenderingHints/VALUE_ANTIALIAS_ON) + (.setRenderingHint g RenderingHints/KEY_TEXT_ANTIALIASING RenderingHints/VALUE_TEXT_ANTIALIAS_ON) + + ;; Background + (.setColor g color-bg) + (.fillRect g 0 0 W H) + + ;; Title + (.setFont g font-title) + (.setColor g color-title) + (draw-string-centered g "StringRope: Persistent O(log n) String Editing" (/ W 2) 35) + + (.setFont g font-subtitle) + (.setColor g color-subtitle) + (draw-string-centered g "vs String (idiomatic Clojure) and StringBuilder (optimal mutable)" + (/ W 2) 58) + + ;; Panels + (draw-panel g (:single-splice data) 0) + (draw-panel g (:repeated-edits data) 1) + + ;; Legend (centered below panels) + (let [ly (- H 18) + lx (- (/ W 2) 200) + items [[color-string "String (str + subs)"] + [color-stringbuilder "StringBuilder"] + [color-stringrope "StringRope (persistent, O(log n))"]]] + (loop [items items, x lx] + (when (seq items) + (let [[color label] (first items)] + (.setColor g color) + (.fillRect g (int x) (int (- ly 8)) 20 3) + (draw-circle g (+ x 10) (- ly 7) 4) + (.setFont g font-legend) + (.setColor g color-axis) + (.drawString g ^String label (int (+ x 28)) (int ly)) + (let [fm (.getFontMetrics g) + sw (.stringWidth fm ^String label)] + (recur (rest items) (+ x sw 60))))))) + + (.dispose g) + (ImageIO/write img "png" (File. path)) + (println (str "Chart written to: " path)))) + +(render-chart "bench-results/string-rope-benchmark.png") diff --git a/etc/lib/bench_analyze.clj b/etc/lib/bench_analyze.clj index 718a7e1..d921fdc 100644 --- a/etc/lib/bench_analyze.clj +++ b/etc/lib/bench_analyze.clj @@ -9,7 +9,7 @@ (def ^:private oc-prefixes ["ordered-set" "ordered-map" "long-ordered" "string-ordered" - "interval-set" "interval-map" "range-map" "rope"]) + "interval-set" "interval-map" "range-map" "string-rope" "rope"]) (defn- oc-variant? 
[variant] @@ -23,20 +23,30 @@ (def category-order [:set-algebra :construction :lookup :iteration :fold :split - :equality :rank :string :interval :rope :other]) + :equality :rank :string :interval + :range-map :segment-tree :priority-queue :multiset :fuzzy + :rope :string-rope :byte-rope :other]) (def ^:private category-patterns - [[:set-algebra #"union|intersection|difference"] - [:construction #"construction"] - [:lookup #"lookup"] - [:iteration #"iteration"] - [:fold #"fold|reduce"] - [:split #"split"] - [:equality #"equal|different|size-different"] - [:rank #"rank"] - [:string #"string"] - [:interval #"interval"] - [:rope #"rope"]]) + ;; Specific patterns first — classify-group returns the first match. + [[:set-algebra #"union|intersection|difference"] + [:range-map #"^range-map"] + [:segment-tree #"^segment-tree"] + [:priority-queue #"^priority-queue"] + [:multiset #"^multiset"] + [:fuzzy #"^fuzzy-"] + [:string-rope #"^string-rope"] + [:byte-rope #"^byte-rope"] + [:construction #"construction"] + [:lookup #"lookup"] + [:iteration #"iteration"] + [:fold #"fold|reduce"] + [:split #"split"] + [:equality #"equal|different|size-different"] + [:rank #"rank"] + [:string #"string"] + [:interval #"interval"] + [:rope #"rope"]]) (defn classify-group [group] @@ -171,11 +181,39 @@ {:pattern #"^set-union$" :oc :ordered-set :peer :clojure-set :label "Union" :section "Set Algebra vs clojure.core/set"} {:pattern #"^set-intersection$" :oc :ordered-set :peer :clojure-set :label "Intersection" :section "Set Algebra vs clojure.core/set"} {:pattern #"^set-difference$" :oc :ordered-set :peer :clojure-set :label "Difference" :section "Set Algebra vs clojure.core/set"} - ;; Other operations - {:pattern #"^set-construction$" :oc :ordered-set :peer :sorted-set :label "Construction" :section "Other Operations"} - {:pattern #"^set-lookup$" :oc :ordered-set :peer :sorted-set :label "Lookup" :section "Other Operations"} - {:pattern #"^split$" :oc :ordered-set :peer :data-avl :label "Split" 
:section "Other Operations"} - {:pattern #"^set-fold$" :oc :ordered-set-fold :peer :sorted-set-fold :label "Fold" :section "Other Operations"} + ;; Ordered Set vs competitors + {:pattern #"^set-construction$" :oc :ordered-set :peer :sorted-set :label "Construction" :section "Ordered Set vs sorted-set"} + {:pattern #"^set-lookup$" :oc :ordered-set :peer :sorted-set :label "Lookup" :section "Ordered Set vs sorted-set"} + {:pattern #"^set-iteration$" :oc :ordered-set :peer :sorted-set :label "Iteration" :section "Ordered Set vs sorted-set"} + {:pattern #"^set-fold$" :oc :ordered-set-fold :peer :sorted-set-fold :label "Fold" :section "Ordered Set vs sorted-set"} + {:pattern #"^split$" :oc :ordered-set :peer :data-avl :label "Split" :section "Ordered Set vs sorted-set"} + {:pattern #"^set-iteration-iterator$" :oc :ordered-set :peer :sorted-set :label "Iteration (Iterator)" :section "Ordered Set vs sorted-set"} + {:pattern #"^set-construction$" :oc :ordered-set :peer :data-avl :label "Construction" :section "Ordered Set vs data.avl"} + {:pattern #"^set-lookup$" :oc :ordered-set :peer :data-avl :label "Lookup" :section "Ordered Set vs data.avl"} + {:pattern #"^set-iteration$" :oc :ordered-set :peer :data-avl :label "Iteration" :section "Ordered Set vs data.avl"} + {:pattern #"^set-iteration-iterator$" :oc :ordered-set :peer :data-avl :label "Iteration (Iterator)" :section "Ordered Set vs data.avl"} + ;; Ordered Map vs competitors + {:pattern #"^map-construction$" :oc :ordered-map :peer :sorted-map :label "Construction" :section "Ordered Map vs sorted-map"} + {:pattern #"^map-lookup$" :oc :ordered-map :peer :sorted-map :label "Lookup" :section "Ordered Map vs sorted-map"} + {:pattern #"^map-iteration$" :oc :ordered-map :peer :sorted-map :label "Iteration" :section "Ordered Map vs sorted-map"} + {:pattern #"^map-fold$" :oc :ordered-map-reduce :peer :sorted-map-reduce :label "Reduce" :section "Ordered Map vs sorted-map"} + {:pattern #"^map-construction$" :oc :ordered-map 
:peer :data-avl :label "Construction" :section "Ordered Map vs data.avl"} + {:pattern #"^map-lookup$" :oc :ordered-map :peer :data-avl :label "Lookup" :section "Ordered Map vs data.avl"} + {:pattern #"^map-iteration$" :oc :ordered-map :peer :data-avl :label "Iteration" :section "Ordered Map vs data.avl"} + ;; Long-specialized + {:pattern #"^long-construction$" :oc :long-ordered :peer :sorted-set :label "Construction" :section "Long-Specialized vs sorted-set"} + {:pattern #"^long-lookup$" :oc :long-ordered :peer :sorted-set :label "Lookup" :section "Long-Specialized vs sorted-set"} + {:pattern #"^long-union$" :oc :long-ordered :peer :sorted-set :label "Union" :section "Long-Specialized vs sorted-set"} + {:pattern #"^long-intersection$" :oc :long-ordered :peer :sorted-set :label "Intersection" :section "Long-Specialized vs sorted-set"} + {:pattern #"^long-difference$" :oc :long-ordered :peer :sorted-set :label "Difference" :section "Long-Specialized vs sorted-set"} + {:pattern #"^long-rank-lookup$" :oc :long-ordered :peer :data-avl :label "Rank lookup" :section "Long-Specialized vs data.avl"} + ;; String-specialized + {:pattern #"^string-set-construction$" :oc :string-ordered :peer :sorted-set-by :label "Construction" :section "String-Specialized vs sorted-set-by"} + {:pattern #"^string-set-lookup$" :oc :string-ordered :peer :sorted-set-by :label "Lookup" :section "String-Specialized vs sorted-set-by"} + {:pattern #"^string-set-union$" :oc :string-ordered :peer :sorted-set-by :label "Union" :section "String-Specialized vs sorted-set-by"} + {:pattern #"^string-set-intersection$" :oc :string-ordered :peer :sorted-set-by :label "Intersection" :section "String-Specialized vs sorted-set-by"} + {:pattern #"^string-set-difference$" :oc :string-ordered :peer :sorted-set-by :label "Difference" :section "String-Specialized vs sorted-set-by"} + {:pattern #"^string-rank-lookup$" :oc :string-ordered :peer :data-avl :label "Rank lookup" :section "String-Specialized vs data.avl"} 
;; Rope vs PersistentVector (matches bench_runner groups) {:pattern #"^rope-repeated-edits$" :oc :rope :peer :vector :label "200 Random Edits" :section "Rope vs PersistentVector"} {:pattern #"^rope-splice$" :oc :rope :peer :vector :label "Single Splice" :section "Rope vs PersistentVector"} @@ -183,7 +221,66 @@ {:pattern #"^rope-chunk-iteration$" :oc :rope :peer :vector :label "Chunk Iteration" :section "Rope vs PersistentVector"} {:pattern #"^rope-reduce$" :oc :rope :peer :vector :label "Reduce (sum)" :section "Rope vs PersistentVector"} {:pattern #"^rope-fold-sum$" :oc :rope :peer :vector :label "Fold (sum)" :section "Rope vs PersistentVector"} - {:pattern #"^rope-nth$" :oc :rope :peer :vector :label "Random nth (1000)" :section "Rope vs PersistentVector"}]) + {:pattern #"^rope-nth$" :oc :rope :peer :vector :label "Random nth (1000)" :section "Rope vs PersistentVector"} + {:pattern #"^rope-fold-freq$" :oc :rope :peer :vector :label "Fold (freq map)" :section "Rope vs PersistentVector"} + ;; StringRope vs String (idiomatic str+subs) + {:pattern #"^string-rope-splice$" :oc :string-rope :peer :string :label "Single Splice" :section "StringRope vs String"} + {:pattern #"^string-rope-insert$" :oc :string-rope :peer :string :label "Single Insert" :section "StringRope vs String"} + {:pattern #"^string-rope-remove$" :oc :string-rope :peer :string :label "Single Remove" :section "StringRope vs String"} + {:pattern #"^string-rope-concat$" :oc :string-rope :peer :string :label "Concat Halves" :section "StringRope vs String"} + {:pattern #"^string-rope-split$" :oc :string-rope :peer :string :label "Split at Midpoint" :section "StringRope vs String"} + {:pattern #"^string-rope-repeated-edits$" :oc :string-rope :peer :string :label "200 Random Edits" :section "StringRope vs String"} + {:pattern #"^string-rope-nth$" :oc :string-rope :peer :string :label "Random nth (1000)" :section "StringRope vs String"} + {:pattern #"^string-rope-reduce$" :oc :string-rope :peer :string :label 
"Reduce (sum chars)" :section "StringRope vs String"} + {:pattern #"^string-rope-re-find$" :oc :string-rope :peer :string :label "re-find" :section "StringRope vs String"} + {:pattern #"^string-rope-re-seq$" :oc :string-rope :peer :string :label "re-seq" :section "StringRope vs String"} + {:pattern #"^string-rope-str$" :oc :string-rope :peer :string :label "Materialization (str)" :section "StringRope vs String"} + {:pattern #"^string-rope-re-replace$" :oc :string-rope :peer :string :label "str/replace (regex)" :section "StringRope vs String"} + ;; StringRope vs StringBuilder (optimal mutable baseline) + {:pattern #"^string-rope-splice$" :oc :string-rope :peer :string-builder :label "Single Splice" :section "StringRope vs StringBuilder"} + {:pattern #"^string-rope-insert$" :oc :string-rope :peer :string-builder :label "Single Insert" :section "StringRope vs StringBuilder"} + {:pattern #"^string-rope-remove$" :oc :string-rope :peer :string-builder :label "Single Remove" :section "StringRope vs StringBuilder"} + {:pattern #"^string-rope-concat$" :oc :string-rope :peer :string-builder :label "Concat Halves" :section "StringRope vs StringBuilder"} + {:pattern #"^string-rope-split$" :oc :string-rope :peer :string-builder :label "Split at Midpoint" :section "StringRope vs StringBuilder"} + {:pattern #"^string-rope-repeated-edits$" :oc :string-rope :peer :string-builder :label "200 Random Edits" :section "StringRope vs StringBuilder"} + {:pattern #"^string-rope-construction$" :oc :string-rope :peer :string-builder :label "Construction" :section "StringRope vs StringBuilder"} + ;; ByteRope vs byte[] (arraycopy baseline) + {:pattern #"^byte-rope-splice$" :oc :byte-rope :peer :byte-array :label "Single Splice" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-insert$" :oc :byte-rope :peer :byte-array :label "Single Insert" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-remove$" :oc :byte-rope :peer :byte-array :label "Single Remove" :section "ByteRope vs 
byte[]"} + {:pattern #"^byte-rope-concat$" :oc :byte-rope :peer :byte-array :label "Concat 4 Pieces" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-split$" :oc :byte-rope :peer :byte-array :label "Split at Midpoint" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-repeated-edits$" :oc :byte-rope :peer :byte-array :label "200 Random Edits" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-nth$" :oc :byte-rope :peer :byte-array :label "Random nth (1000)" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-reduce$" :oc :byte-rope :peer :byte-array :label "Reduce (sum bytes)" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-fold$" :oc :byte-rope :peer :byte-array :label "Fold (sum bytes)" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-construction$" :oc :byte-rope :peer :byte-array :label "Construction" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-bytes$" :oc :byte-rope :peer :byte-array :label "Materialization" :section "ByteRope vs byte[]"} + {:pattern #"^byte-rope-digest$" :oc :byte-rope :peer :byte-array :label "SHA-256" :section "ByteRope vs byte[]"} + ;; Range Map vs Guava TreeRangeMap + {:pattern #"^range-map-construction$" :oc :range-map :peer :guava-range-map :label "Construction" :section "Range Map vs Guava TreeRangeMap"} + {:pattern #"^range-map-bulk-construction$" :oc :range-map :peer :guava-range-map :label "Bulk Construction" :section "Range Map vs Guava TreeRangeMap"} + {:pattern #"^range-map-lookup$" :oc :range-map :peer :guava-range-map :label "Point Lookup" :section "Range Map vs Guava TreeRangeMap"} + {:pattern #"^range-map-carve-out$" :oc :range-map :peer :guava-range-map :label "Carve-out Insert" :section "Range Map vs Guava TreeRangeMap"} + {:pattern #"^range-map-iteration$" :oc :range-map :peer :guava-range-map :label "Iteration" :section "Range Map vs Guava TreeRangeMap"} + ;; Segment Tree vs sorted-map range reduction + {:pattern #"^segment-tree-construction$" :oc :segment-tree :peer 
:sorted-map :label "Construction" :section "Segment Tree vs sorted-map"} + {:pattern #"^segment-tree-query$" :oc :segment-tree :peer :sorted-map :label "Range Query" :section "Segment Tree vs sorted-map"} + {:pattern #"^segment-tree-update$" :oc :segment-tree :peer :sorted-map :label "Point Update" :section "Segment Tree vs sorted-map"} + ;; Priority Queue vs sorted-set-by of [priority seqnum value] tuples + {:pattern #"^priority-queue-construction$" :oc :priority-queue :peer :sorted-set-by :label "Construction" :section "Priority Queue vs sorted-set-by"} + {:pattern #"^priority-queue-push$" :oc :priority-queue :peer :sorted-set-by :label "Push" :section "Priority Queue vs sorted-set-by"} + {:pattern #"^priority-queue-pop-min$" :oc :priority-queue :peer :sorted-set-by :label "Pop-min" :section "Priority Queue vs sorted-set-by"} + ;; Ordered Multiset vs sorted-map counts + {:pattern #"^multiset-construction$" :oc :ordered-multiset :peer :sorted-map-counts :label "Construction" :section "Ordered Multiset vs sorted-map counts"} + {:pattern #"^multiset-multiplicity$" :oc :ordered-multiset :peer :sorted-map-counts :label "Multiplicity" :section "Ordered Multiset vs sorted-map counts"} + {:pattern #"^multiset-iteration$" :oc :ordered-multiset :peer :sorted-map-counts :label "Iteration" :section "Ordered Multiset vs sorted-map counts"} + ;; Fuzzy Set vs sorted-set + manual floor/ceiling + {:pattern #"^fuzzy-set-construction$" :oc :fuzzy-set :peer :sorted-set :label "Construction" :section "Fuzzy Set vs sorted-set"} + {:pattern #"^fuzzy-set-nearest$" :oc :fuzzy-set :peer :sorted-set :label "Nearest Lookup" :section "Fuzzy Set vs sorted-set"} + ;; Fuzzy Map vs sorted-map + manual floor/ceiling + {:pattern #"^fuzzy-map-construction$" :oc :fuzzy-map :peer :sorted-map :label "Construction" :section "Fuzzy Map vs sorted-map"} + {:pattern #"^fuzzy-map-nearest$" :oc :fuzzy-map :peer :sorted-map :label "Nearest Lookup" :section "Fuzzy Map vs sorted-map"}]) (defn headline-wins 
"Extract headline speedups pivoted by size, with explicit OC variant and peer. @@ -255,6 +352,111 @@ vec)) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Category Summary — per-category aggregated stats +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- median + [xs] + (let [sorted (sort xs) + n (count sorted)] + (when (pos? n) + (nth sorted (quot n 2))))) + +(defn- geomean + "Geometric mean — the right average for speedup ratios because + the arithmetic mean over-weights the large wins." + [xs] + (let [clean (remove (fn [x] (or (nil? x) (not (pos? x)))) xs) + n (count clean)] + (when (pos? n) + (Math/exp (/ (reduce + (map #(Math/log (double %)) clean)) n))))) + +(defn category-summary + "Aggregate the scorecard by category and return per-category stats. + Each entry reports case counts, median/geomean speedup, best win, + and worst loss within the category. Useful for a top-level + 'Performance by Category' section in the report." 
+ [scorecard] + (->> scorecard + (group-by :category) + (map (fn [[category rows]] + (let [speedups (map :speedup rows) + wins (filter #(= :win (:status %)) rows) + losses (filter #(= :loss (:status %)) rows) + parity (filter #(= :parity (:status %)) rows) + best (when (seq wins) (apply max-key :speedup wins)) + worst (when (seq losses) (apply min-key :speedup losses))] + {:category category + :total (count rows) + :wins (count wins) + :parity (count parity) + :losses (count losses) + :median (median speedups) + :geomean (geomean speedups) + :best-win (:speedup best) + :best-win-group (:group best) + :worst-loss (:speedup worst) + :worst-loss-group (:group worst)}))) + (sort-by (juxt (fn [{:keys [category]}] + (.indexOf category-order category)))) + vec)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Rope Family Summary — cross-variant comparison for structural ops +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:private rope-family-ops + "Operations where all three rope variants (rope, string-rope, + byte-rope) have matching benchmark groups, laid out side-by-side + for a quick cross-variant glance." 
+ [{:label "Concat" :rope "rope-concat" :string "string-rope-concat" :byte "byte-rope-concat"} + {:label "Split" :rope nil :string "string-rope-split" :byte "byte-rope-split"} + {:label "Splice" :rope "rope-splice" :string "string-rope-splice" :byte "byte-rope-splice"} + {:label "Insert" :rope nil :string "string-rope-insert" :byte "byte-rope-insert"} + {:label "Remove" :rope nil :string "string-rope-remove" :byte "byte-rope-remove"} + {:label "200 Random Edits" :rope "rope-repeated-edits" :string "string-rope-repeated-edits" :byte "byte-rope-repeated-edits"} + {:label "Random nth" :rope "rope-nth" :string "string-rope-nth" :byte "byte-rope-nth"} + {:label "Reduce" :rope "rope-reduce" :string "string-rope-reduce" :byte "byte-rope-reduce"}]) + +(defn- rope-family-variant-baseline + [variant group-name] + (case variant + :rope {:oc :rope :peer :vector} + :string-rope {:oc :string-rope :peer :string} + :byte-rope {:oc :byte-rope :peer :byte-array})) + +(defn rope-family-summary + "For each structural op, collect per-variant speedup vs the natural + baseline at the largest benchmarked size. Returns rows with keys + :label, :rope-speedup, :string-rope-speedup, :byte-rope-speedup. + Missing cells are nil (e.g. generic rope has no 'insert' group)." + [rows sizes] + (let [by-key (group-by (juxt :size :group :variant) rows) + max-n (apply max sizes) + lookup (fn [size group variant] + (:mean-ns (first (by-key [size group (keyword variant)])))) + speedup (fn [variant group-name] + (when group-name + (let [{:keys [oc peer]} (rope-family-variant-baseline variant group-name) + oc-ns (lookup max-n (keyword group-name) oc) + peer-ns (lookup max-n (keyword group-name) peer)] + (when (and oc-ns peer-ns (pos? 
(double oc-ns))) + (/ (double peer-ns) (double oc-ns))))))] + (->> rope-family-ops + (map (fn [{:keys [label rope string byte]}] + {:label label + :size max-n + :rope-speedup (speedup :rope rope) + :string-rope-speedup (speedup :string-rope string) + :byte-rope-speedup (speedup :byte-rope byte)})) + ;; Only include rows that have at least one variant speedup + (filter (fn [{:keys [rope-speedup string-rope-speedup byte-rope-speedup]}] + (some some? [rope-speedup string-rope-speedup byte-rope-speedup]))) + vec))) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Regressions (baseline comparison) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; diff --git a/etc/lib/bench_files.clj b/etc/lib/bench_files.clj index fac34bd..9b45077 100644 --- a/etc/lib/bench_files.clj +++ b/etc/lib/bench_files.clj @@ -9,9 +9,12 @@ (common/path "bench-results")) (defn benchmark-files + "Return timestamped benchmark EDN files in chronological order. + Excludes non-timestamped files like string-rope-full.edn." [] (->> (fs/glob (bench-results-dir) "*.edn") (map str) + (filter #(re-find #"/\d{4}-\d{2}-\d{2}_" %)) sort)) (defn latest-benchmark-file @@ -27,11 +30,12 @@ (defn parse-args [args] (loop [[arg & more] args - opts {:top 12}] + opts {:top 30}] (cond (nil? arg) opts (#{"--help" "-h"} arg) (recur more (assoc opts :help true)) (= "--all" arg) (recur more (assoc opts :all true)) + (= "--publish" arg) (recur more (assoc opts :publish true)) (= "--file" arg) (recur (rest more) (assoc opts :file (first more))) (= "--baseline" arg) (recur (rest more) (assoc opts :baseline (first more))) (= "--top" arg) (recur (rest more) (assoc opts :top (Long/parseLong (first more)))) @@ -43,9 +47,18 @@ (let [target (or file (latest-benchmark-file))] (when-not target (throw (ex-info "No benchmark result files found." {:dir (bench-results-dir)}))) - (assoc opts - :file (ensure-file! 
target) - :baseline (some-> baseline ensure-file!)))) + ;; Auto-select the second-most-recent EDN as baseline when none is + ;; specified, matching the auto-compare behavior in bench-runner. + (let [auto-baseline + (when-not baseline + (let [all (benchmark-files) + target-str (str target)] + (->> all + (remove #(= % target-str)) + last)))] + (assoc opts + :file (ensure-file! target) + :baseline (some-> (or baseline auto-baseline) ensure-file!))))) (defn usage [] (str/join @@ -57,4 +70,8 @@ " --baseline PATH Compare against a baseline result file" - " --top N Number of ranked rows to show (default 12)" + " --top N Number of ranked rows to show (default 30)" " --all Show all ranked rows" + " --publish Publish mode: omit the Full Scorecard, Regressions," + " and Improvements sections. Use when redirecting to" + " doc/report.txt — those sections are intended for" + " interactive A/B review, not the committed snapshot." " --help Show this help"])) diff --git a/etc/lib/bench_render.clj b/etc/lib/bench_render.clj index 22af70a..4c15b42 100644 --- a/etc/lib/bench_render.clj +++ b/etc/lib/bench_render.clj @@ -16,6 +16,9 @@ (def loss-cols [{:width 8} {:width 22} {:width 18} {:width 18} {:width 12 :align :right} {:width 0}]) +(def win-cols + [{:width 8} {:width 22} {:width 18} {:width 18} {:width 12 :align :right}]) + (def delta-cols [{:width 8} {:width 22} {:width 20} {:width 10 :align :right} {:width 10 :align :right} {:width 10 :align :right} {:width 14}]) @@ -27,6 +30,15 @@ (def parity-cols [{:width 24} {:width 18} {:width 18} {:width 10 :align :right}]) +(def category-cols + [{:width 18} {:width 7 :align :right} {:width 7 :align :right} + {:width 8 :align :right} {:width 9 :align :right} {:width 9 :align :right} + {:width 28}]) + +(def rope-family-cols + [{:width 22} {:width 10 :align :right} {:width 14 :align :right} + {:width 13 :align :right}]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Helpers @@ -137,10 +149,7 @@ (doseq [{:keys [label sizes]} section-rows] (let [cells (mapv (fn 
[entry] (if entry - (let [s (:speedup entry)] - (if (>= s 1.0) - (format "**%.1fx**" (double s)) - (format "%.1fx" (double s)))) + (format "%.1fx" (double (:speedup entry))) "—")) sizes)] (apply report/table-row cols (into [label] cells))))))))) @@ -164,6 +173,27 @@ (format "~%.2fx" (double speedup))]))))) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Significant Wins +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn render-significant-wins + "Scorecard entries where OC wins by more than ~1.2x, sorted by speedup." + [wins opts] + (when (seq wins) + (common/section "Significant Wins") + (apply report/table-row win-cols + ["Size" "Group" "OC Variant" "Peer" "Speedup"]) + (doseq [{:keys [size group ordered-variant peer-variant speedup]} + (limit wins opts)] + (apply report/table-row win-cols + [(str size) + (name group) + (name ordered-variant) + (name peer-variant) + (format "%.1fx" (double speedup))])))) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Significant Losses ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -184,6 +214,73 @@ (or context "")])))) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Category Summary +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- fmt-ratio-right [r] + (when r (format "%.1fx" (double r)))) + +(defn render-category-summary + "Per-category aggregated stats. Each row shows how many wins/parity/losses + the category contributes, the geometric mean speedup across all of the + category's cases, and the single best and worst cases with their groups." 
+ [summary] + (when (seq summary) + (common/section "Performance by Category") + (apply report/table-row category-cols + ["Category" "Wins" "Parity" "Losses" "Geomean" "Best" "Worst case (group)"]) + (doseq [{:keys [category wins parity losses geomean best-win + worst-loss worst-loss-group]} summary] + (apply report/table-row category-cols + [(name category) + (str wins) + (str parity) + (str losses) + (or (fmt-ratio-right geomean) "-") + (or (fmt-ratio-right best-win) "-") + (if worst-loss + (format "%.1fx slower (%s)" + (/ 1.0 (double worst-loss)) + (name worst-loss-group)) + "-")])))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Rope Family Summary +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- fmt-speedup-cell [s] + (cond + (nil? s) "—" + (>= s 10.0) (format "%.0fx" (double s)) + (>= s 1.0) (format "%.1fx" (double s)) + (>= s 0.1) (format "%.2fx" (double s)) + (>= s 0.01) (format "%.3fx" (double s)) + :else (format "%.4fx" (double s)))) + +(defn render-rope-family + "Side-by-side speedups for the three rope variants on structural + operations, each compared against its natural baseline (vector / + String / byte[]) at the largest benchmarked size." 
+ [rows] + (when (seq rows) + (let [size (:size (first rows))] + (common/section "Rope Family at Scale") + (println (str " Each cell is 'variant vs natural baseline' speedup at N=" + size ".")) + (println (str " rope vs PersistentVector · string-rope vs String · byte-rope vs byte[]")) + (println) + (apply report/table-row rope-family-cols + ["Operation" "rope" "string-rope" "byte-rope"]) + (doseq [{:keys [label rope-speedup string-rope-speedup byte-rope-speedup]} rows] + (apply report/table-row rope-family-cols + [label + (fmt-speedup-cell rope-speedup) + (fmt-speedup-cell string-rope-speedup) + (fmt-speedup-cell byte-rope-speedup)]))))) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Full Scorecard ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; diff --git a/project.clj b/project.clj index 4431f4a..e498435 100644 --- a/project.clj +++ b/project.clj @@ -1,4 +1,4 @@ -(defproject com.dean/ordered-collections "0.2.0" +(defproject com.dean/ordered-collections "0.2.1" :description "Fast, modern, _ropes_ and ordered collections that do more than sort." 
:author "Dan Lentz " :url "http://github.com/dco-dev/ordered-collections" @@ -12,7 +12,8 @@ [org.clojure/math.combinatorics "0.3.2"] [criterium "0.4.6"] [com.clojure-goes-fast/clj-memory-meter "0.3.0"] - [com.google.guava/guava "33.0.0-jre"]] + [com.google.guava/guava "33.0.0-jre"] + [org.knowm.xchart/xchart "3.8.8"]] :jvm-opts ["-Djdk.attach.allowAttachSelf" "-XX:+EnableDynamicAgentLoading"]}} @@ -39,6 +40,8 @@ "bench-rope-fold" ["run" "-m" "ordered-collections.rope-fold-bench"] "bench-transient-rope" ["run" "-m" "ordered-collections.transient-rope-bench"] "bench-rope-tuning" ["run" "-m" "ordered-collections.rope-tuning-bench"] + "bench-string-rope" ["run" "-m" "ordered-collections.string-rope-bench"] "stats" ["shell" "bb" "stats"] "bench-report" ["shell" "bb" "bench-report"] + "bench-charts" ["run" "-m" "ordered-collections.bench-charts"] "paper" ["shell" "bb" "paper"]}) diff --git a/resources/.#data_readers.clj b/resources/.#data_readers.clj new file mode 120000 index 0000000..24dd359 --- /dev/null +++ b/resources/.#data_readers.clj @@ -0,0 +1 @@ +dan.lentz@lentz-mbpro-14233.830 \ No newline at end of file diff --git a/resources/data_readers.clj b/resources/data_readers.clj index 19e9be3..87dd611 100644 --- a/resources/data_readers.clj +++ b/resources/data_readers.clj @@ -1,8 +1,10 @@ -{ordered/set ordered-collections.readers/ordered-set - ordered/map ordered-collections.readers/ordered-map - ordered/interval-set ordered-collections.readers/interval-set - ordered/interval-map ordered-collections.readers/interval-map - ordered/range-map ordered-collections.readers/range-map - ordered/priority-queue ordered-collections.readers/priority-queue - ordered/multiset ordered-collections.readers/ordered-multiset - ordered/rope ordered-collections.readers/rope} +{ordered/set ordered-collections.readers/ordered-set + ordered/map ordered-collections.readers/ordered-map + interval/set ordered-collections.readers/interval-set + interval/map 
ordered-collections.readers/interval-map + range/map ordered-collections.readers/range-map + priority/queue ordered-collections.readers/priority-queue + multi/set ordered-collections.readers/ordered-multiset + vec/rope ordered-collections.readers/rope + string/rope ordered-collections.readers/string-rope + byte/rope ordered-collections.readers/byte-rope} diff --git a/src/ordered_collections/core.clj b/src/ordered_collections/core.clj index ac2e421..3cb477c 100644 --- a/src/ordered_collections/core.clj +++ b/src/ordered_collections/core.clj @@ -18,6 +18,8 @@ [ordered-collections.types.priority-queue :as pq] [ordered-collections.types.range-map :as rmap] [ordered-collections.types.rope :as rope] + [ordered-collections.types.string-rope :as string-rope] + [ordered-collections.types.byte-rope :as byte-rope] [ordered-collections.types.segment-tree :as segtree] [ordered-collections.protocol :as proto] [ordered-collections.util :refer [defalias]]) @@ -1012,9 +1014,9 @@ Examples: (rope-concat (rope [1 2 3]) (rope [4 5 6])) - ;=> #ordered/rope [1 2 3 4 5 6] + ;=> #vec/rope [1 2 3 4 5 6] (rope-concat (rope [1 2]) (rope [3 4]) (rope [5 6])) - ;=> #ordered/rope [1 2 3 4 5 6]" + ;=> #vec/rope [1 2 3 4 5 6]" rope/rope-concat) (defalias rope-split @@ -1022,7 +1024,7 @@ Examples: (rope-split (rope (range 10)) 4) - ;=> [#ordered/rope [0 1 2 3] #ordered/rope [4 5 6 7 8 9]]" + ;=> [#vec/rope [0 1 2 3] #vec/rope [4 5 6 7 8 9]]" proto/rope-split) (defalias rope-sub @@ -1031,7 +1033,7 @@ Examples: (rope-sub (rope (range 100)) 20 30) - ;=> #ordered/rope [20 21 22 23 24 25 26 27 28 29]" + ;=> #vec/rope [20 21 22 23 24 25 26 27 28 29]" proto/rope-sub) (defalias rope-insert @@ -1039,7 +1041,7 @@ Examples: (rope-insert (rope [0 1 2 3]) 2 [:a :b]) - ;=> #ordered/rope [0 1 :a :b 2 3]" + ;=> #vec/rope [0 1 :a :b 2 3]" proto/rope-insert) (defalias rope-remove @@ -1047,7 +1049,7 @@ Examples: (rope-remove (rope (range 10)) 3 7) - ;=> #ordered/rope [0 1 2 7 8 9]" + ;=> #vec/rope [0 1 2 7 8 9]" 
proto/rope-remove) (defalias rope-splice @@ -1055,7 +1057,7 @@ Examples: (rope-splice (rope (range 10)) 2 5 [:x :y]) - ;=> #ordered/rope [0 1 :x :y 5 6 7 8 9]" + ;=> #vec/rope [0 1 :x :y 5 6 7 8 9]" proto/rope-splice) (defalias rope-chunks @@ -1078,3 +1080,113 @@ (rope-str (rope (seq \"hello world\"))) ;=> \"hello world\"" proto/rope-str) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; String Rope +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defalias string-rope + "Create a persistent string rope for structural text editing. + Backed by a chunked weight-balanced tree with String chunks: + O(log n) concat, split, splice, insert, and remove. Competitive at + all sizes, dominant at scale. + + Implements CharSequence for seamless Java interop (regex, clojure.string, + java.io, etc.). String equality: (= (string-rope \"x\") \"x\") is true. + + Examples: + (string-rope \"hello world\") + (string-rope (slurp \"big-file.txt\")) + (str (string-rope \"hello\")) ;=> \"hello\"" + string-rope/string-rope) + +(defalias string-rope-concat + "Concatenate string ropes or strings. + Two arguments: O(log n) binary tree join. + Three or more: O(total chunks) bulk construction. + + Examples: + (string-rope-concat (string-rope \"hello \") (string-rope \"world\")) + ;=> #string/rope \"hello world\"" + string-rope/string-rope-concat) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Byte Rope +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defalias byte-rope + "Create a persistent byte rope for structural editing of binary data. + Backed by a chunked weight-balanced tree with byte[] chunks: + O(log n) concat, split, splice, insert, and remove. Unsigned byte + semantics — elements are longs in [0, 255]. 
+ + Accepts nil, byte[] (defensively copied), String (UTF-8 encoded), + InputStream (fully consumed), another ByteRope, or any sequential + collection of unsigned byte values. + + Examples: + (byte-rope) ;=> #byte/rope \"\" + (byte-rope (byte-array [1 2 3])) ;=> #byte/rope \"010203\" + (byte-rope [0 127 255]) ;=> #byte/rope \"007fff\" + (byte-rope \"hello\") ;=> #byte/rope \"68656c6c6f\"" + byte-rope/byte-rope) + +(defalias byte-rope-concat + "Concatenate byte ropes or byte arrays. + Two arguments: O(log n) binary tree join. + Three or more: O(total chunks) bulk construction." + byte-rope/byte-rope-concat) + +(defalias byte-rope-bytes + "Materialize a byte rope to a defensively-copied byte[]." + byte-rope/byte-rope-bytes) + +(defalias byte-rope-hex + "Return the byte rope's contents as a lowercase hex string." + byte-rope/byte-rope-hex) + +(defalias byte-rope-write + "Stream a byte rope's contents to an OutputStream, chunk by chunk." + byte-rope/byte-rope-write) + +(defalias byte-rope-input-stream + "Return a java.io.InputStream that reads over the byte rope's contents." + byte-rope/byte-rope-input-stream) + +(defalias byte-rope-get-byte + "Return the unsigned byte value (long in [0, 255]) at offset." + byte-rope/byte-rope-get-byte) + +(defalias byte-rope-get-short + "Return a big-endian unsigned 16-bit integer at offset." + byte-rope/byte-rope-get-short) + +(defalias byte-rope-get-short-le + "Return a little-endian unsigned 16-bit integer at offset." + byte-rope/byte-rope-get-short-le) + +(defalias byte-rope-get-int + "Return a big-endian signed 32-bit integer at offset." + byte-rope/byte-rope-get-int) + +(defalias byte-rope-get-int-le + "Return a little-endian signed 32-bit integer at offset." + byte-rope/byte-rope-get-int-le) + +(defalias byte-rope-get-long + "Return a big-endian signed 64-bit integer at offset." + byte-rope/byte-rope-get-long) + +(defalias byte-rope-get-long-le + "Return a little-endian signed 64-bit integer at offset." 
+ byte-rope/byte-rope-get-long-le) + +(defalias byte-rope-index-of + "Return the first index of the given unsigned byte value, or -1." + byte-rope/byte-rope-index-of) + +(defalias byte-rope-digest + "Compute a cryptographic digest (SHA-256, SHA-1, MD5, etc.) of the byte + rope's contents by streaming chunks through java.security.MessageDigest. + Returns a byte rope of the digest." + byte-rope/byte-rope-digest) diff --git a/src/ordered_collections/kernel/chunk.clj b/src/ordered_collections/kernel/chunk.clj new file mode 100644 index 0000000..bf5b375 --- /dev/null +++ b/src/ordered_collections/kernel/chunk.clj @@ -0,0 +1,265 @@ +(ns ordered-collections.kernel.chunk + "PRopeChunk protocol extensions for the built-in chunk backends used by + the rope kernel. + + The rope kernel operates on opaque 'chunks' whose concrete type varies by + rope variant: + + Rope — APersistentVector chunks (arbitrary Clojure values) + StringRope — java.lang.String chunks (UTF-16 code units) + ByteRope — byte[] chunks (raw bytes, unsigned 0–255 semantics) + + Each backend here provides the 13 primitive operations the kernel needs + (length, slice, merge, nth, append, last, butlast, update, of, + reduce-init, append-sb, splice, splice-split). The rope kernel dispatches + through the PRopeChunk protocol so that rope-concat/split/splice/etc. + are written once and work for every backend. + + PRopeChunk is strictly an internal dispatch table — nothing outside the + kernel should depend on these methods directly. User-facing interop with + Clojure built-in collections lives in `ordered-collections.types.interop`." 
+ (:require [ordered-collections.protocol :as proto])) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; APersistentVector — generic Rope backend +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(extend-type clojure.lang.APersistentVector + proto/PRopeChunk + (chunk-length [c] (.count ^clojure.lang.Counted c)) + (chunk-slice [c start end] (subvec c (int start) (int end))) + (chunk-merge [c other] (into c other)) + (chunk-nth [c i] (.nth ^clojure.lang.Indexed c (unchecked-int i))) + (chunk-append [c x] (conj c x)) + (chunk-last [c] (peek c)) + (chunk-butlast [c] (pop c)) + (chunk-update [c i x] (assoc c (int i) x)) + (chunk-of [_ x] [x]) + (chunk-reduce-init [c f init] + (if (instance? clojure.lang.IReduceInit c) + (.reduce ^clojure.lang.IReduceInit c f init) + ;; SubVector fallback — indexed iteration + (let [^clojure.lang.Indexed c c + n (.count ^clojure.lang.Counted c)] + (loop [i (int 0) acc init] + (if (< i n) + (let [ret (f acc (.nth c i))] + (if (reduced? 
ret) @ret (recur (unchecked-inc-int i) ret))) + acc))))) + (chunk-append-sb [c ^java.lang.StringBuilder sb] + (let [n (.count ^clojure.lang.Counted c)] + (dotimes [i n] + (.append sb (.nth ^clojure.lang.Indexed c (unchecked-int i)))))) + (chunk-splice [c start end replacement] + (let [prefix (subvec c 0 (int start)) + suffix (subvec c (int end))] + (if replacement + (into (into prefix replacement) suffix) + (into prefix suffix)))) + (chunk-splice-split [c start end replacement half] + (let [si (int start) + ei (int end) + r (or replacement []) + h (int half) + rlen (count r)] + (cond + (<= h si) + [(subvec c 0 h) + (into (into (subvec c h si) r) (subvec c ei))] + + (<= h (+ si rlen)) + (let [roff (- h si) + rv (vec r)] + [(into (subvec c 0 si) (subvec rv 0 roff)) + (into (subvec rv roff) (subvec c ei))]) + + :else + (let [soff (+ ei (- h si rlen))] + [(into (into (subvec c 0 si) r) (subvec c ei soff)) + (subvec c soff)]))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; String — StringRope backend +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(extend-type String + proto/PRopeChunk + (chunk-length [c] (.length ^String c)) + (chunk-slice [c start end] (.substring ^String c (int start) (int end))) + (chunk-merge [c other] (str c other)) + (chunk-nth [c i] (.charAt ^String c (unchecked-int i))) + (chunk-append [c x] (str c (char x))) + (chunk-last [c] (.charAt ^String c (unchecked-dec-int (.length ^String c)))) + (chunk-butlast [c] (.substring ^String c 0 (unchecked-dec-int (.length ^String c)))) + (chunk-update [c i x] + (let [sb (StringBuilder. ^String c)] + (.setCharAt sb (unchecked-int i) (char x)) + (.toString sb))) + (chunk-of [_ x] (String/valueOf (char x))) + (chunk-reduce-init [c f init] + (let [^String s c + n (.length s)] + (loop [i (int 0) acc init] + (if (< i n) + (let [ret (f acc (.charAt s i))] + (if (reduced? 
ret) @ret (recur (unchecked-inc-int i) ret))) + acc)))) + (chunk-append-sb [c ^java.lang.StringBuilder sb] + (.append sb ^String c)) + (chunk-splice [c start end replacement] + (let [^String s c + ^String r (or replacement "") + si (int start) + ei (int end) + sb (StringBuilder. (+ (- (.length s) (- ei si)) (.length r)))] + (.append sb s 0 si) + (.append sb r) + (.append sb s ei (.length s)) + (.toString sb))) + (chunk-splice-split [c start end replacement half] + (let [^String s c + ^String r (or replacement "") + si (int start) + ei (int end) + rl (.length r) + sl (.length s) + h (int half) + rhs-len (- (+ sl rl) (- ei si) h)] + (cond + ;; Split falls in prefix — left is a substring, right needs StringBuilder + (<= h si) + [(.substring s 0 h) + (let [sb (StringBuilder. rhs-len)] + (.append sb s h si) + (.append sb r) + (.append sb s ei sl) + (.toString sb))] + + ;; Split falls in replacement — both need StringBuilder + (<= h (+ si rl)) + (let [roff (- h si)] + [(let [sb (StringBuilder. h)] + (.append sb s 0 si) + (.append sb r 0 roff) + (.toString sb)) + (let [sb (StringBuilder. rhs-len)] + (.append sb r roff rl) + (.append sb s ei sl) + (.toString sb))]) + + ;; Split falls in suffix — left needs StringBuilder, right is a substring + :else + (let [soff (+ ei (- h si rl))] + [(let [sb (StringBuilder. 
h)] + (.append sb s 0 si) + (.append sb r) + (.append sb s ei soff) + (.toString sb)) + (.substring s soff sl)]))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; byte[] — ByteRope backend +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(extend (Class/forName "[B") + proto/PRopeChunk + {:chunk-length (fn [c] (alength ^bytes c)) + :chunk-slice (fn [c start end] + (java.util.Arrays/copyOfRange ^bytes c (int start) (int end))) + :chunk-merge (fn [c other] + (let [^bytes a c, ^bytes b other + al (alength a), bl (alength b) + result (byte-array (+ al bl))] + (System/arraycopy a 0 result 0 al) + (System/arraycopy b 0 result al bl) + result)) + :chunk-nth (fn [c i] + (bit-and (long (aget ^bytes c (unchecked-int i))) 0xFF)) + :chunk-append (fn [c x] + (let [^bytes a c + n (alength a) + result (java.util.Arrays/copyOf a (unchecked-inc-int n))] + (aset result n (unchecked-byte (long x))) + result)) + :chunk-last (fn [c] + (let [^bytes a c + n (alength a)] + (bit-and (long (aget a (unchecked-dec-int n))) 0xFF))) + :chunk-butlast (fn [c] + (let [^bytes a c] + (java.util.Arrays/copyOf a (unchecked-dec-int (alength a))))) + :chunk-update (fn [c i x] + (let [result (aclone ^bytes c)] + (aset result (unchecked-int i) (unchecked-byte (long x))) + result)) + :chunk-of (fn [_ x] + (let [a (byte-array 1)] + (aset a 0 (unchecked-byte (long x))) + a)) + :chunk-reduce-init (fn [c f init] + (let [^bytes a c, n (alength a)] + (loop [i (int 0), acc init] + (if (< i n) + (let [ret (f acc (bit-and (long (aget a i)) 0xFF))] + (if (reduced? ret) @ret (recur (unchecked-inc-int i) ret))) + acc)))) + :chunk-append-sb (fn [c ^StringBuilder sb] + ;; Hex-encode bytes for textual display (bytes are not chars). 
+ (let [^bytes a c, n (alength a)] + (dotimes [i n] + (let [b (bit-and (long (aget a i)) 0xFF)] + (.append sb (Character/forDigit (int (bit-shift-right b 4)) 16)) + (.append sb (Character/forDigit (int (bit-and b 0xF)) 16)))))) + :chunk-splice (fn [c start end replacement] + (let [^bytes s c + ^bytes r (if replacement replacement (byte-array 0)) + si (int start), ei (int end) + sl (alength s), rl (alength r) + result (byte-array (+ (- sl (- ei si)) rl))] + (System/arraycopy s 0 result 0 si) + (when (pos? rl) + (System/arraycopy r 0 result si rl)) + (System/arraycopy s ei result (+ si rl) (- sl ei)) + result)) + :chunk-splice-split (fn [c start end replacement half] + (let [^bytes s c + ^bytes r (if replacement replacement (byte-array 0)) + si (int start), ei (int end) + sl (alength s), rl (alength r) + h (int half) + rhs-len (- (+ sl rl) (- ei si) h)] + (cond + ;; Split falls in prefix — left is a copyOfRange, right built from pieces + (<= h si) + [(java.util.Arrays/copyOfRange s 0 h) + (let [rhs (byte-array rhs-len)] + (System/arraycopy s h rhs 0 (- si h)) + (when (pos? rl) + (System/arraycopy r 0 rhs (- si h) rl)) + (System/arraycopy s ei rhs (+ (- si h) rl) (- sl ei)) + rhs)] + + ;; Split falls within the replacement + (<= h (+ si rl)) + (let [roff (- h si) + lhs (byte-array h) + rhs (byte-array rhs-len)] + (System/arraycopy s 0 lhs 0 si) + (when (pos? roff) + (System/arraycopy r 0 lhs si roff)) + (when (< roff rl) + (System/arraycopy r roff rhs 0 (- rl roff))) + (System/arraycopy s ei rhs (- rl roff) (- sl ei)) + [lhs rhs]) + + ;; Split falls in suffix + :else + (let [soff (+ ei (- h si rl)) + lhs (byte-array h)] + (System/arraycopy s 0 lhs 0 si) + (when (pos? 
rl) + (System/arraycopy r 0 lhs si rl)) + (System/arraycopy s ei lhs (+ si rl) (- soff ei)) + [lhs (java.util.Arrays/copyOfRange s soff sl)])))) }) diff --git a/src/ordered_collections/kernel/rope.clj b/src/ordered_collections/kernel/rope.clj index b144ed2..f1123a0 100644 --- a/src/ordered_collections/kernel/rope.clj +++ b/src/ordered_collections/kernel/rope.clj @@ -28,20 +28,51 @@ - rope-conj-right: fills rightmost chunk, overflows to new node - rope-pop-right: shrinks rightmost chunk, removes if empty - coll->root: partition-all target produces valid chunks" - (:require [ordered-collections.kernel.node :as node + (:refer-clojure :exclude [chunk-append]) + (:require [ordered-collections.protocol :as proto + :refer [chunk-length chunk-slice chunk-merge chunk-nth + chunk-append chunk-last chunk-butlast chunk-update + chunk-of chunk-reduce-init chunk-append-sb + chunk-splice chunk-splice-split]] + ;; Force-load PRopeChunk extensions for the built-in chunk + ;; backends (APersistentVector, String, byte[]). These must be + ;; loaded before any rope-kernel function dispatches on a chunk. + [ordered-collections.kernel.chunk] + [ordered-collections.kernel.node :as node :refer [leaf leaf? -k -v -l -r]] [ordered-collections.kernel.tree :as tree] - [ordered-collections.parallel :as par]) - (:import [clojure.lang Murmur3 SeqIterator Util])) + [ordered-collections.parallel :as par])) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Rope Invariants +;; +;; Library-wide default CSI values. Kept as plain defs so external code +;; (tests, benchmarks) can reference them. Internal kernel code reads +;; from the dynamic vars below, which each rope variant binds inside +;; its `with-tree` macro to its own per-variant constants. This lets +;; the generic rope, string-rope, and byte-rope each carry a chunk size +;; that is appropriate for their underlying storage. +;; +;; Defaults are 1024/512 after tuning via `lein bench-rope-tuning`. 
+;; The generic rope, string rope, and byte rope all benefit from larger +;; chunks at 100K+ element counts. See each types/*_rope.clj file for +;; the per-variant rationale. +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(def ^:const +target-chunk-size+ 256) -(def ^:const +min-chunk-size+ 128) +(def +target-chunk-size+ 1024) +(def +min-chunk-size+ 512) + +(def ^:dynamic *target-chunk-size* +target-chunk-size+) +(def ^:dynamic *min-chunk-size* +min-chunk-size+) (declare raw-rope-concat) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Rope Node Basics +;; +;; The chunk-backend protocol extensions this kernel dispatches on +;; (APersistentVector, String, byte[]) live in +;; `ordered-collections.kernel.chunk`, loaded via the ns require above. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defn rope-size @@ -54,19 +85,38 @@ ^long [root] (tree/node-size root)) -(defn- rope-node-create - "Create a rope node rooted at chunk. The node value stores subtree element - count; the balance metric remains ordinary node count." +(defn rope-node-create + "Create a rope node for generic (vector) chunks. The node value stores + subtree element count; the balance metric remains ordinary node count. + This is the *t-join* binding for the generic Rope type." [chunk _ l r] (node/->SimpleNode chunk - (+ (count chunk) (rope-size l) (rope-size r)) + (+ (chunk-length chunk) (rope-size l) (rope-size r)) + l r + (+ 1 (tree/node-size l) (tree/node-size r)))) + +(defn string-rope-node-create + "Create a rope node for String chunks. The node value stores subtree + character count. This is the *t-join* binding for StringRope." + [^String chunk _ l r] + (node/->SimpleNode chunk + (+ (.length chunk) (rope-size l) (rope-size r)) + l r + (+ 1 (tree/node-size l) (tree/node-size r)))) + +(defn byte-rope-node-create + "Create a rope node for byte[] chunks. The node value stores subtree + byte count. 
This is the *t-join* binding for ByteRope." + [^bytes chunk _ l r] + (node/->SimpleNode chunk + (+ (alength chunk) (rope-size l) (rope-size r)) l r (+ 1 (tree/node-size l) (tree/node-size r)))) (defn- chunk-node [chunk] - (when (seq chunk) - (rope-node-create (vec chunk) nil (leaf) (leaf)))) + (when (pos? (chunk-length chunk)) + (tree/*t-join* chunk nil (leaf) (leaf)))) (defn- node-chunk [n] @@ -74,48 +124,120 @@ (defn- build-root [chunks] - (let [n (count chunks)] - (when (pos? n) - (let [mid (quot n 2) - chunk (nth chunks mid)] - (rope-node-create chunk nil - (build-root (subvec chunks 0 mid)) - (build-root (subvec chunks (inc mid) n))))))) + (let [create tree/*t-join*] + (letfn [(build [chunks] + (let [n (count chunks)] + (when (pos? n) + (let [mid (quot n 2) + chunk (nth chunks mid)] + (create chunk nil + (build (subvec chunks 0 mid)) + (build (subvec chunks (inc mid) n)))))))] + (build chunks)))) (defn chunks->root + "Build a balanced rope tree from a sequence of chunks. + Empty chunks are filtered out." [chunks] - (build-root (vec (remove empty? chunks)))) + (build-root (into [] (remove #(zero? (chunk-length %))) chunks))) (defn root->chunks + "Extract all chunks from a rope tree in order." [root] (if (leaf? root) [] (tree/node-reduce-keys conj [] root))) (defn coll->root + "Build a rope tree from a sequential collection, partitioned into chunks. + Uses the currently bound `*target-chunk-size*`." [coll] - (chunks->root (mapv vec (partition-all +target-chunk-size+ coll)))) + (chunks->root (mapv vec (partition-all (long *target-chunk-size*) coll)))) + +(defn str->root + "Build a rope tree from a String, partitioned into substring chunks. + Uses the currently bound `*target-chunk-size*`. Split points are + adjusted to avoid breaking UTF-16 surrogate pairs at chunk boundaries." + [^String s] + (let [n (long (.length s)) + target (long *target-chunk-size*)] + (when (pos? n) + (build-root + (loop [pos (long 0), acc (transient [])] + (if (>= pos n) + (persistent! 
acc) + (let [raw-end (unchecked-add pos target) + end (if (< raw-end n) raw-end n) + ;; Don't split between a high and low surrogate. + end (if (and (< end n) + (pos? end) + (Character/isHighSurrogate + (.charAt s (unchecked-int (unchecked-dec end))))) + (unchecked-dec end) + end)] + (recur end (conj! acc (.substring s (unchecked-int pos) (unchecked-int end))))))))))) + +(defn bytes->root + "Build a rope tree from a byte array, partitioned into target-sized byte[] chunks. + Always copies the input so the tree never shares mutable state with the caller. + Uses the currently bound `*target-chunk-size*`." + [^bytes data] + (let [n (long (alength data)) + target (long *target-chunk-size*)] + (when (pos? n) + (build-root + (loop [pos (long 0), acc (transient [])] + (if (>= pos n) + (persistent! acc) + (let [raw-end (unchecked-add pos target) + end (if (< raw-end n) raw-end n)] + (recur end (conj! acc + (java.util.Arrays/copyOfRange data + (unchecked-int pos) (unchecked-int end))))))))))) + +(defn byte-rope->bytes + "Materialize a byte rope tree to a single byte array via bulk arraycopy. O(n)." + ^bytes [root] + (if (leaf? root) + (byte-array 0) + (let [n (long (rope-size root)) + result (byte-array n) + offset (long-array 1)] + (letfn [(walk [n] + (when-not (leaf? n) + (walk (-l n)) + (let [^bytes chunk (-k n) + clen (alength chunk) + off (aget offset 0)] + (System/arraycopy chunk 0 result (int off) clen) + (aset offset 0 (unchecked-add off clen))) + (walk (-r n))))] + (walk root)) + result))) (defn chunks->root-csi "Build a rope tree from a sequence of chunks, ensuring CSI. Scans left to right, accumulating a current chunk. When the current chunk reaches [min, target] it is emitted; when it would exceed target - it is split evenly. The last chunk is emitted at any size (the runt)." + it is split evenly. The last chunk is emitted at any size (the runt). + Uses the currently bound `*target-chunk-size*` and `*min-chunk-size*`." [chunks] - (let [chunks (vec (remove empty? 
chunks)) - n (count chunks)] + (let [chunks (into [] (remove #(zero? (chunk-length %))) chunks) + n (count chunks) + target (long *target-chunk-size*) + minsz (long *min-chunk-size*)] (if (<= n 1) (build-root chunks) (let [fixed (loop [i (int 0), cur nil, acc (transient [])] (if (>= i n) (persistent! (if cur (conj! acc cur) acc)) (let [chunk (nth chunks i) - merged (if cur (into cur chunk) chunk) - mn (count merged)] + merged (if cur (chunk-merge cur chunk) chunk) + mn (long (chunk-length merged))] (cond ;; Merged chunk fits in target — keep accumulating - (<= mn +target-chunk-size+) - (if (and (>= mn +min-chunk-size+) + (<= mn target) + (if (and (>= mn minsz) (< (unchecked-inc-int i) n)) ;; Large enough and not last — emit and reset (recur (unchecked-inc-int i) nil (conj! acc merged)) @@ -126,12 +248,12 @@ :else (let [half (quot mn 2)] (recur (unchecked-inc-int i) - (subvec merged half) - (conj! acc (subvec merged 0 half))))))))] + (chunk-slice merged half mn) + (conj! acc (chunk-slice merged 0 half))))))))] (build-root fixed))))) (defn- rechunk-balanced - "Partition a flat vector of elements into chunks satisfying CSI: + "Partition a merged chunk into sub-chunks satisfying CSI: every chunk in [min, target] except possibly the last which may be smaller. When the last full-sized chunk would leave a remainder below min, the final two pieces are split evenly instead. @@ -139,58 +261,69 @@ For the small inputs produced by boundary normalization (~512 elements) this is constant-time work." [elems] - (let [n (count elems)] + (let [n (long (chunk-length elems)) + target (long *target-chunk-size*) + minsz (long *min-chunk-size*)] (cond - (<= n 0) [] - (<= n +target-chunk-size+) [elems] + (<= n 0) [] + (<= n target) [elems] :else (loop [pos 0, acc (transient [])] (let [rem (- n pos)] (cond - (<= rem +target-chunk-size+) - (persistent! (conj! acc (subvec elems pos n))) + (<= rem target) + (persistent! (conj! 
acc (chunk-slice elems pos n))) - (< (- rem +target-chunk-size+) +min-chunk-size+) + (< (- rem target) minsz) ;; Taking a full target chunk would leave a runt below min. ;; Split the remaining portion evenly so both halves >= min. (let [half (quot rem 2)] (persistent! - (conj! (conj! acc (subvec elems pos (+ pos half))) - (subvec elems (+ pos half) n)))) + (conj! (conj! acc (chunk-slice elems pos (+ pos half))) + (chunk-slice elems (+ pos half) n)))) :else - (recur (+ pos +target-chunk-size+) - (conj! acc (subvec elems pos (+ pos +target-chunk-size+)))))))))) + (recur (+ pos target) + (conj! acc (chunk-slice elems pos (+ pos target)))))))))) (defn- rechunk-root - "Build a rope root from a small flat vector by rechunking to CSI. + "Build a rope root from a merged chunk by rechunking to CSI. Boundary repair only produces a handful of chunks, so it is worth constructing those shapes directly instead of routing through the general chunks->root builder." [elems] - (let [chunks (rechunk-balanced elems)] + (let [chunks (rechunk-balanced elems) + create tree/*t-join*] (case (count chunks) 0 nil 1 (chunk-node (nth chunks 0)) 2 (raw-rope-concat (chunk-node (nth chunks 0)) (chunk-node (nth chunks 1))) - 3 (rope-node-create (nth chunks 1) nil + 3 (create (nth chunks 1) nil (chunk-node (nth chunks 0)) (chunk-node (nth chunks 2))) (build-root chunks)))) (defn normalize-root - "Full O(n) rechunk of a rope tree so every chunk satisfies CSI." + "Full O(n) rechunk of a rope tree so every chunk satisfies CSI. + Note: materializes all chunks to elements and reassembles as vectors. + For string ropes, use chunks->root-csi on root->chunks instead. + + Accepts a flat-mode vector root as well, in which case it is rebuilt + into a balanced tree via `coll->root`." [root] - (if (leaf? root) - nil + (cond + (leaf? root) nil + (instance? 
clojure.lang.APersistentVector root) + (coll->root root) + :else (chunks->root (rechunk-balanced (vec (mapcat seq (root->chunks root))))))) (defn- rope-remove-greatest [n] - (let [create rope-node-create] + (let [create tree/*t-join*] (letfn [(rm-greatest [n] (cond (leaf? n) (throw (ex-info "remove-greatest: empty rope" {:node n})) @@ -200,41 +333,41 @@ (rm-greatest n)))) (defn- raw-rope-concat - [l r] - (let [create rope-node-create] - (letfn [(cat [l r] - (cond - (leaf? l) r - (leaf? r) l - :else - (let [lw (tree/node-weight l) - rw (tree/node-weight r)] - (cond - (< (* tree/+delta+ lw) rw) - (let [rk (node-chunk r) - rl (-l r) - rr (-r r)] - (tree/node-stitch rk nil (cat l rl) rr create)) - - (< (* tree/+delta+ rw) lw) - (let [lk (node-chunk l) - ll (-l l) - lr (-r l)] - (tree/node-stitch lk nil ll (cat lr r) create)) - - :else - (let [[chunk _] (tree/node-least-kv r)] - (tree/node-stitch chunk nil l - (tree/node-remove-least r create) - create))))))] - (cat l r)))) + ([l r] (raw-rope-concat l r tree/*t-join*)) + ([l r create] + (letfn [(cat [l r] + (cond + (leaf? l) r + (leaf? r) l + :else + (let [lw (tree/node-weight l) + rw (tree/node-weight r)] + (cond + (< (* tree/+delta+ lw) rw) + (let [rk (node-chunk r) + rl (-l r) + rr (-r r)] + (tree/node-stitch rk nil (cat l rl) rr create)) + + (< (* tree/+delta+ rw) lw) + (let [lk (node-chunk l) + ll (-l l) + lr (-r l)] + (tree/node-stitch lk nil ll (cat lr r) create)) + + :else + (let [[chunk _] (tree/node-least-kv r)] + (tree/node-stitch chunk nil l + (tree/node-remove-least r create) + create))))))] + (cat l r)))) (defn- rope-join "Balanced join: elements of l, then chunk, then elements of r. Analogous to node-concat3 but positional rather than comparator-ordered. Cost is O(|height(l) - height(r)|) when both are non-leaf." [chunk l r] - (let [create rope-node-create] + (let [create tree/*t-join*] (cond (and (leaf? l) (leaf? r)) (chunk-node chunk) @@ -272,16 +405,17 @@ [root] (if (leaf? 
root) root - (if (or (>= (count (node-chunk (tree/node-greatest root))) +min-chunk-size+) - (<= (tree/node-size root) 1)) - root - (let [last-chunk (node-chunk (tree/node-greatest root)) - rest (rope-remove-greatest root) - prev-chunk (node-chunk (tree/node-greatest rest)) - rest2 (rope-remove-greatest rest) - combined (into prev-chunk last-chunk) - mid (rechunk-root combined)] - (raw-rope-concat rest2 mid))))) + (let [minsz (long *min-chunk-size*)] + (if (or (>= (chunk-length (node-chunk (tree/node-greatest root))) minsz) + (<= (tree/node-size root) 1)) + root + (let [last-chunk (node-chunk (tree/node-greatest root)) + rest (rope-remove-greatest root) + prev-chunk (node-chunk (tree/node-greatest rest)) + rest2 (rope-remove-greatest rest) + combined (chunk-merge prev-chunk last-chunk) + mid (rechunk-root combined)] + (raw-rope-concat rest2 mid)))))) (defn- ensure-left-fringe "Restore CSI when the leftmost chunk may be undersized. @@ -292,23 +426,25 @@ [root] (if (leaf? root) root - (let [create rope-node-create] - (if (or (>= (count (node-chunk (tree/node-least root))) +min-chunk-size+) + (let [create tree/*t-join* + target (long *target-chunk-size*) + minsz (long *min-chunk-size*)] + (if (or (>= (chunk-length (node-chunk (tree/node-least root))) minsz) (<= (tree/node-size root) 1)) root (let [first-chunk (node-chunk (tree/node-least root)) rest (tree/node-remove-least root create) next-chunk (node-chunk (tree/node-least rest)) rest2 (tree/node-remove-least rest create) - combined (into first-chunk next-chunk) - cn (count combined)] - (if (<= cn +target-chunk-size+) + combined (chunk-merge first-chunk next-chunk) + cn (long (chunk-length combined))] + (if (<= cn target) (raw-rope-concat (chunk-node combined) rest2) ;; Split at min: left piece = min (valid internal), ;; right piece = cn - min which is in [min+1, target-1] ;; because cn is in (target, 2*target) and min = target/2. 
- (let [c1 (subvec combined 0 +min-chunk-size+) - c2 (subvec combined +min-chunk-size+)] + (let [c1 (chunk-slice combined 0 minsz) + c2 (chunk-slice combined minsz cn)] (raw-rope-concat (raw-rope-concat (chunk-node c1) (chunk-node c2)) rest2)))))))) @@ -324,12 +460,13 @@ internal, so it must be >= min. If the combined boundary chunks are still below min, pull one more neighbor chunk." [l r lchunk rchunk] - (let [create rope-node-create + (let [create tree/*t-join* + minsz (long *min-chunk-size*) l' (rope-remove-greatest l) r' (tree/node-remove-least r create) - combined (into lchunk rchunk) - cn (count combined)] - (if (or (>= cn +min-chunk-size+) + combined (chunk-merge lchunk rchunk) + cn (long (chunk-length combined))] + (if (or (>= cn minsz) (and (leaf? l') (leaf? r'))) ;; Combined is large enough or it is the only content (let [mid (rechunk-root combined)] @@ -338,13 +475,13 @@ (if-not (leaf? l') (let [prev (node-chunk (tree/node-greatest l')) l'' (rope-remove-greatest l') - all (into prev combined) + all (chunk-merge prev combined) mid (rechunk-root all)] (raw-rope-concat (raw-rope-concat l'' mid) r')) ;; l' is empty so r' must be non-empty (both-empty handled above) (let [nxt (node-chunk (tree/node-least r')) r'' (tree/node-remove-least r' create) - all (into combined nxt) + all (chunk-merge combined nxt) mid (rechunk-root all)] (raw-rope-concat (raw-rope-concat l' mid) r'')))))) @@ -362,13 +499,14 @@ (leaf? l) r (leaf? 
r) l :else - (let [lchunk (node-chunk (tree/node-greatest l)) + (let [minsz (long *min-chunk-size*) + lchunk (node-chunk (tree/node-greatest l)) rchunk (node-chunk (tree/node-least r)) ;; l's rightmost becomes internal after concat - l-ok (>= (count lchunk) +min-chunk-size+) + l-ok (>= (chunk-length lchunk) minsz) ;; r's leftmost stays as rightmost (any size ok) when r has 1 chunk r-ok (or (<= (tree/node-size r) 1) - (>= (count rchunk) +min-chunk-size+))] + (>= (chunk-length rchunk) minsz))] (if (and l-ok r-ok) (raw-rope-concat l r) (merge-boundary l r lchunk rchunk))))) @@ -384,29 +522,44 @@ (let [l (-l n) ls (if (leaf? l) 0 (long (-v l))) ck (-k n) - cs (long (.count ^clojure.lang.Counted ck)) + cs (long (chunk-length ck)) rs (+ ls cs)] (cond (< i ls) (recur l i) - (< i rs) (.nth ^clojure.lang.Indexed ck (unchecked-int (- i ls))) + (< i rs) (chunk-nth ck (- i ls)) :else (recur (-r n) (- i rs)))))) +(defn rope-chunk-at + "Find the chunk containing index i and its global start offset. + Returns [chunk chunk-start-offset]. Index must be in bounds." + [root ^long i] + (loop [n root, i i, offset (long 0)] + (let [l (-l n) + ls (if (leaf? l) 0 (long (-v l))) + ck (-k n) + cs (long (chunk-length ck)) + rs (+ ls cs)] + (cond + (< i ls) (recur l i offset) + (< i rs) [ck (+ offset ls)] + :else (recur (-r n) (- i rs) (+ offset rs)))))) + (defn rope-assoc [root ^long i x] - (let [create rope-node-create] + (let [create tree/*t-join*] (letfn [(assoc* [n ^long i] (let [ck (-k n) l (-l n) r (-r n) ls (if (leaf? l) 0 (long (-v l))) - cs (long (.count ^clojure.lang.Counted ck)) + cs (long (chunk-length ck)) rs (+ ls cs)] (cond (< i ls) (tree/node-stitch ck nil (assoc* l i) r create) (< i rs) - (create (assoc ck (- i ls) x) nil l r) + (create (chunk-update ck (- i ls) x) nil l r) :else (tree/node-stitch ck nil l @@ -434,7 +587,7 @@ l (-l n) r (-r n) ls (if (leaf? 
l) 0 (long (-v l))) - cs (long (.count ^clojure.lang.Counted chunk)) + cs (long (chunk-length chunk)) rs (+ ls cs)] (cond (< i ls) @@ -446,12 +599,12 @@ (< i rs) (let [offset (- i ls) - lc (subvec chunk 0 offset) - rc (subvec chunk offset cs)] - [(if (pos? (count lc)) + lc (chunk-slice chunk 0 offset) + rc (chunk-slice chunk offset cs)] + [(if (pos? (chunk-length lc)) (raw-rope-concat l (chunk-node lc)) l) - (if (pos? (count rc)) + (if (pos? (chunk-length rc)) (raw-rope-concat (chunk-node rc) r) r)]) @@ -492,7 +645,7 @@ l (-l n) r (-r n) ls (if (leaf? l) 0 (long (-v l))) - cs (long (.count ^clojure.lang.Counted chunk)) + cs (long (chunk-length chunk)) rs (+ ls cs)] (cond (<= end ls) @@ -502,7 +655,7 @@ (slice* r (- start rs) (- end rs)) (and (>= start ls) (<= end rs)) - (chunk-node (subvec chunk (- start ls) (- end ls))) + (chunk-node (chunk-slice chunk (- start ls) (- end ls))) :else (let [left (when (< start ls) @@ -510,7 +663,7 @@ c0 (max 0 (- start ls)) c1 (min cs (- end ls)) mid-chunk (when (< c0 c1) - (subvec chunk c0 c1)) + (chunk-slice chunk c0 c1)) right (when (> end rs) (slice* r 0 (- end rs)))] (slice-3 left mid-chunk right))))))))] @@ -551,6 +704,77 @@ left+mid (rope-concat left+mid rr)))))) +(defn rope-splice-inplace + "Fused single-chunk splice: replace [start, end) with replacement-chunk in + a single tree traversal. Returns new root when [start, end) falls entirely + within one chunk and the result chunk is in [1, target], or nil to signal + fallback to the multi-traversal path. + + replacement-chunk may be nil for pure removal. The chunk type must match + the tree's chunk type (String for StringRope, vector for generic Rope)." + [root start end replacement-chunk create] + (when-not (leaf? root) + (let [start (long start) + end (long end) + target (long *target-chunk-size*) + minsz (long *min-chunk-size*) + ;; Single-chunk trees allow any result in [1, target]. + ;; Multi-chunk trees require >= min to preserve CSI. + single-chunk? (and (leaf? 
(-l root)) (leaf? (-r root))) + min-len (if single-chunk? 1 minsz)] + (letfn [(splice* [n ^long start ^long end] + (when-not (leaf? n) + (let [ck (-k n) + l (-l n) + r (-r n) + ls (if (leaf? l) 0 (long (-v l))) + cs (long (chunk-length ck)) + rs (+ ls cs)] + (cond + ;; Range starts in left subtree + (< start ls) + (when (<= end ls) + (when-let [new-l (splice* l start end)] + (tree/node-stitch ck nil new-l r create))) + + ;; Range starts in (or at end of) this chunk + (<= start rs) + (when (<= end rs) + (let [c-start (- start ls) + c-end (- end ls) + rep-len (if replacement-chunk + (long (chunk-length replacement-chunk)) + 0) + new-len (long (+ cs (- rep-len (- c-end c-start))))] + (cond + ;; Result fits — simple replacement + (and (>= new-len min-len) + (<= new-len target)) + (create (chunk-splice ck c-start c-end + replacement-chunk) + nil l r) + + ;; Overflow — build two halves directly, no intermediate + (> new-len target) + (let [half (quot new-len 2) + [c1 c2] (chunk-splice-split ck c-start c-end + replacement-chunk half) + c2-node (create c2 nil (leaf) (leaf))] + (tree/node-stitch c1 nil l + (if (leaf? r) + c2-node + (raw-rope-concat c2-node r create)) + create)) + + ;; Too small — fall back + :else nil))) + + ;; Range starts in right subtree + :else + (when-let [new-r (splice* r (- start rs) (- end rs))] + (tree/node-stitch ck nil l new-r create))))))] + (splice* root start end))))) + (defn rope-insert-root "Insert mid-root at start in root." [root ^long start mid-root] @@ -565,20 +789,20 @@ [root] (when-not (leaf? root) (let [chunk (node-chunk (tree/node-greatest root))] - (peek chunk)))) + (chunk-last chunk)))) (defn rope-pop-right [root] (if (leaf? root) (throw (IllegalStateException. "Can't pop empty vector")) - (let [create rope-node-create] + (let [create tree/*t-join*] (letfn [(pop* [n] (let [ck (-k n) l (-l n) r (-r n)] (if (leaf? 
r) - (if (> (.count ^clojure.lang.Counted ck) 1) - (create (pop ck) nil l r) + (if (> (chunk-length ck) 1) + (create (chunk-butlast ck) nil l r) l) (tree/node-stitch ck nil l (pop* r) create))))] (pop* root))))) @@ -587,15 +811,17 @@ [root x] (if (leaf? root) (chunk-node [x]) - (let [create rope-node-create] + (let [create tree/*t-join* + target (long *target-chunk-size*)] (letfn [(conj* [n] (let [ck (-k n) l (-l n) r (-r n)] (if (leaf? r) - (if (< (.count ^clojure.lang.Counted ck) +target-chunk-size+) - (create (conj ck x) nil l r) - (tree/node-stitch ck nil l (chunk-node [x]) create)) + (if (< (chunk-length ck) target) + (create (chunk-append ck x) nil l r) + (tree/node-stitch ck nil l + (chunk-node (chunk-of ck x)) create)) (tree/node-stitch ck nil l (conj* r) create))))] (conj* root))))) @@ -614,224 +840,8 @@ (when-not (leaf? root) (tree/node-key-seq-reverse root (tree/node-size root)))) -(defn- seq-equiv - "Element-wise sequential equivalence." - [s1 o] - (if-not (or (instance? clojure.lang.Sequential o) - (instance? java.util.List o)) - false - (loop [s1 (seq s1) s2 (seq o)] - (cond - (nil? s1) (nil? s2) - (nil? s2) false - (not (Util/equiv (first s1) (first s2))) false - :else (recur (next s1) (next s2)))))) - -(deftype RopeSeq [enum chunk ^long i cnt _meta] - clojure.lang.ISeq - (first [_] - (.nth ^clojure.lang.Indexed chunk (unchecked-int i))) - (next [_] - (let [next-cnt (when cnt (unchecked-dec-int cnt)) - next-i (unchecked-inc i)] - (if (< next-i (count chunk)) - (RopeSeq. enum chunk next-i next-cnt nil) - (when-let [e (tree/node-enum-rest enum)] - (let [chunk' (-k (tree/node-enum-first e))] - (RopeSeq. e chunk' 0 next-cnt nil)))))) - (more [this] - (or (.next this) ())) - (cons [this o] - (clojure.lang.Cons. o this)) - - clojure.lang.Seqable - (seq [this] this) - - clojure.lang.Sequential - - java.lang.Iterable - (iterator [this] - (SeqIterator. 
this)) - - clojure.lang.Counted - (count [_] - (or cnt - (loop [e enum - chunk chunk - i i - n 0] - (let [n (+ n (- (count chunk) i))] - (if-let [e' (tree/node-enum-rest e)] - (let [chunk' (-k (tree/node-enum-first e'))] - (recur e' chunk' 0 n)) - n))))) - - clojure.lang.IReduceInit - (reduce [_ f init] - (loop [e enum - chunk chunk - i i - acc init] - (let [acc (loop [idx i - acc acc] - (if (< idx (count chunk)) - (let [ret (f acc (.nth ^clojure.lang.Indexed chunk (unchecked-int idx)))] - (if (reduced? ret) - ret - (recur (unchecked-inc idx) ret))) - acc))] - (if (reduced? acc) - @acc - (if-let [e' (tree/node-enum-rest e)] - (let [chunk' (-k (tree/node-enum-first e'))] - (recur e' chunk' 0 acc)) - acc))))) - - clojure.lang.IReduce - (reduce [this f] - (if enum - (let [acc (.nth ^clojure.lang.Indexed chunk (unchecked-int i)) - next-i (unchecked-inc i)] - (if (< next-i (count chunk)) - (.reduce ^clojure.lang.IReduceInit - (RopeSeq. enum chunk next-i nil nil) f acc) - (if-let [e' (tree/node-enum-rest enum)] - (let [chunk' (-k (tree/node-enum-first e'))] - (.reduce ^clojure.lang.IReduceInit - (RopeSeq. e' chunk' 0 nil nil) f acc)) - acc))) - (f))) - - clojure.lang.IHashEq - (hasheq [this] - (Murmur3/hashOrdered this)) - - clojure.lang.IPersistentCollection - (empty [_] ()) - (equiv [this o] - (seq-equiv this o)) - - Object - (hashCode [this] - (Util/hash this)) - (equals [this o] - (Util/equals this o)) - - clojure.lang.IMeta - (meta [_] _meta) - - clojure.lang.IObj - (withMeta [_ m] - (RopeSeq. enum chunk i cnt m))) - -(deftype RopeSeqReverse [enum chunk ^long i cnt _meta] - clojure.lang.ISeq - (first [_] - (.nth ^clojure.lang.Indexed chunk (unchecked-int i))) - (next [_] - (let [next-cnt (when cnt (unchecked-dec-int cnt))] - (if (pos? i) - (RopeSeqReverse. enum chunk (unchecked-dec i) next-cnt nil) - (when-let [e (tree/node-enum-prior enum)] - (let [chunk' (-k (tree/node-enum-first e))] - (RopeSeqReverse. 
e chunk' (dec (count chunk')) next-cnt nil)))))) - (more [this] - (or (.next this) ())) - (cons [this o] - (clojure.lang.Cons. o this)) - - clojure.lang.Seqable - (seq [this] this) - - clojure.lang.Sequential - - java.lang.Iterable - (iterator [this] - (SeqIterator. this)) - - clojure.lang.Counted - (count [_] - (or cnt - (loop [e enum - chunk chunk - i i - n 0] - (let [n (+ n (inc i))] - (if-let [e' (tree/node-enum-prior e)] - (let [chunk' (-k (tree/node-enum-first e'))] - (recur e' chunk' (dec (count chunk')) n)) - n))))) - - clojure.lang.IReduceInit - (reduce [_ f init] - (loop [e enum - chunk chunk - i i - acc init] - (let [acc (loop [idx i - acc acc] - (if (neg? idx) - acc - (let [ret (f acc (.nth ^clojure.lang.Indexed chunk (unchecked-int idx)))] - (if (reduced? ret) - ret - (recur (unchecked-dec idx) ret)))))] - (if (reduced? acc) - @acc - (if-let [e' (tree/node-enum-prior e)] - (let [chunk' (-k (tree/node-enum-first e'))] - (recur e' chunk' (dec (count chunk')) acc)) - acc))))) - - clojure.lang.IReduce - (reduce [this f] - (if enum - (let [acc (.nth ^clojure.lang.Indexed chunk (unchecked-int i))] - (if (pos? i) - (.reduce ^clojure.lang.IReduceInit - (RopeSeqReverse. enum chunk (unchecked-dec i) nil nil) f acc) - (if-let [e' (tree/node-enum-prior enum)] - (let [chunk' (-k (tree/node-enum-first e'))] - (.reduce ^clojure.lang.IReduceInit - (RopeSeqReverse. e' chunk' (dec (count chunk')) nil nil) f acc)) - acc))) - (f))) - - clojure.lang.IHashEq - (hasheq [this] - (Murmur3/hashOrdered this)) - - clojure.lang.IPersistentCollection - (empty [_] ()) - (equiv [this o] - (seq-equiv this o)) - - Object - (hashCode [this] - (Util/hash this)) - (equals [this o] - (Util/equals this o)) - - clojure.lang.IMeta - (meta [_] _meta) - - clojure.lang.IObj - (withMeta [_ m] - (RopeSeqReverse. enum chunk i cnt m))) - -(defn rope-seq - [root] - (when-let [enum (tree/node-enumerator root)] - (let [chunk (-k (tree/node-enum-first enum))] - (RopeSeq. 
enum chunk 0 (rope-size root) nil)))) - -(defn rope-rseq - [root] - (when-let [enum (tree/node-enumerator-reverse root)] - (let [chunk (-k (tree/node-enum-first enum))] - (RopeSeqReverse. enum chunk (dec (count chunk)) (rope-size root) nil)))) - (defn rope-chunks-reduce + "Reduce over chunks (not individual elements) of a rope tree." ([f init root] (if (leaf? root) init @@ -841,94 +851,66 @@ (f) (tree/node-reduce-keys f root)))) -(defn- reduce-chunk-indexed - "Fallback reduce for SubVector and other non-IReduceInit chunks." - [f acc chunk] - (let [^clojure.lang.Indexed chunk chunk - n (.count ^clojure.lang.Counted chunk)] - (loop [i (int 0) acc acc] - (if (< i n) - (let [ret (f acc (.nth chunk i))] - (if (reduced? ret) - ret - (recur (unchecked-inc-int i) ret))) - acc)))) + +(defn- rope-tree-walk + "In-order tree walk reducing with wf (a wrapper around f that tracks + early termination via the stopped volatile). Left subtree recurses on + the stack (bounded by height O(log n)); right subtree uses recur for + zero stack growth. Returns result, possibly (reduced ...)." + [wf stopped init root] + (letfn [(reduce-chunk [acc chunk] + (let [result (chunk-reduce-init chunk wf acc)] + (if @stopped (reduced result) result))) + (walk [acc n] + (if (leaf? n) + acc + (let [acc (walk acc (-l n))] + (if (reduced? acc) + acc + (let [acc (reduce-chunk acc (-k n))] + (if (reduced? acc) + acc + (recur acc (-r n))))))))] + (walk init root))) + +(defn- wrap-reduce-fn + "Create the stopped volatile + wrapper fn pair for rope-reduce. + chunk-reduce-init derefs reduced values internally, so the stopped + volatile provides a side-channel for the walk to detect early stop." + [f] + (let [stopped (volatile! false) + wf (fn [acc x] + (let [ret (f acc x)] + (if (reduced? ret) + (do (vreset! stopped true) (reduced @ret)) + ret)))] + [stopped wf])) (defn rope-reduce "Direct in-order tree walk: left subtree, chunk, right subtree. 
Bypasses the enumerator infrastructure to eliminate EnumFrame allocation - and per-chunk lambda overhead. The right-subtree continuation uses recur - for zero stack growth on that branch; left-subtree recursion depth is - bounded by tree height (O(log n)). - - For IReduceInit chunks (PersistentVector), delegates to the vector's - native reduce for tight array iteration. A single volatile + wrapper - closure is allocated once per reduce call (not per chunk) to detect - early termination from reduced." + and per-chunk lambda overhead." ([f init root] (if (leaf? root) init - (let [stopped (volatile! false) - wf (fn [acc x] - (let [ret (f acc x)] - (if (reduced? ret) - (do (vreset! stopped true) (reduced @ret)) - ret)))] - (letfn [(reduce-chunk [acc chunk] - (if (instance? clojure.lang.IReduceInit chunk) - (let [result (.reduce ^clojure.lang.IReduceInit chunk wf acc)] - (if @stopped (reduced result) result)) - (reduce-chunk-indexed f acc chunk))) - (walk [acc n] - (if (leaf? n) - acc - (let [acc (walk acc (-l n))] - (if (reduced? acc) - acc - (let [acc (reduce-chunk acc (-k n))] - (if (reduced? acc) - acc - (recur acc (-r n))))))))] - (let [result (walk init root)] - (if (reduced? result) @result result)))))) - ;; 1-arity duplicates the walk/reduce-chunk letfn because both arities - ;; close over the same volatile + wrapper; factoring it out would require - ;; passing them as arguments through the recursive walk for no readability gain. + (let [[stopped wf] (wrap-reduce-fn f) + result (rope-tree-walk wf stopped init root)] + (if (reduced? result) @result result)))) ([f root] (if (leaf? root) (f) - (let [stopped (volatile! false) - wf (fn [acc x] - (let [ret (f acc x)] - (if (reduced? ret) - (do (vreset! stopped true) (reduced @ret)) - ret)))] - (letfn [(reduce-chunk [acc chunk] - (if (instance? 
clojure.lang.IReduceInit chunk) - (let [result (.reduce ^clojure.lang.IReduceInit chunk wf acc)] - (if @stopped (reduced result) result)) - (reduce-chunk-indexed f acc chunk))) - (walk [acc n] - (if (leaf? n) - acc - (let [acc (walk acc (-l n))] - (if (reduced? acc) - acc - (let [acc (reduce-chunk acc (-k n))] - (if (reduced? acc) - acc - (recur acc (-r n))))))))] - ;; Bootstrap: first element as init, then reduce the rest - (let [least (tree/node-least root) - chunk0 (-k least) - init (.nth ^clojure.lang.Indexed chunk0 0) - acc0 (reduce-chunk init - (subvec chunk0 1 (.count ^clojure.lang.Counted chunk0)))] - (if (reduced? acc0) - @acc0 - (let [rest-root (tree/node-remove-least root rope-node-create) - result (walk acc0 rest-root)] - (if (reduced? result) @result result))))))))) + (let [[stopped wf] (wrap-reduce-fn f) + least (tree/node-least root) + chunk0 (-k least) + init (chunk-nth chunk0 0) + rest0 (chunk-slice chunk0 1 (chunk-length chunk0)) + acc0 (let [result (chunk-reduce-init rest0 wf init)] + (if @stopped (reduced result) result))] + (if (reduced? acc0) + @acc0 + (let [rest-root (tree/node-remove-least root) + result (rope-tree-walk wf stopped acc0 rest-root)] + (if (reduced? result) @result result))))))) (defn rope-fold "Parallel fold over the rope's existing tree shape. @@ -938,9 +920,7 @@ left-to-right order, using subtree sizes to decide when to stop splitting." [root ^long n combinef reducef] (letfn [(reduce-chunk [acc chunk] - (if (instance? clojure.lang.IReduceInit chunk) - (.reduce ^clojure.lang.IReduceInit chunk reducef acc) - (reduce-chunk-indexed reducef acc chunk))) + (chunk-reduce-init chunk reducef acc)) (fold-node [node] (cond (leaf? node) @@ -982,9 +962,7 @@ (letfn [(walk [n] (when-not (leaf? 
n) (walk (-l n)) - (let [chunk (-k n)] - (dotimes [i (count chunk)] - (.append sb (nth chunk i)))) + (chunk-append-sb (-k n) sb) (walk (-r n))))] (walk root)) (.toString sb)))) @@ -997,13 +975,31 @@ "Check that a rope root satisfies the Chunk Size Invariant: - every chunk has size in [1, target] - every chunk except the rightmost has size >= min - Returns true if CSI holds, false otherwise." - [root] - (if (leaf? root) - true - (let [chunks (root->chunks root) - sizes (mapv count chunks)] - (and - (every? pos? sizes) - (every? #(<= % +target-chunk-size+) sizes) - (every? #(>= % +min-chunk-size+) (butlast sizes)))))) + Returns true if CSI holds, false otherwise. + + A flat-mode root (a raw PersistentVector that some rope variants use + as a single-chunk optimization) is trivially valid when its size fits + in `[0, target]`. + + Reads the currently bound `*target-chunk-size*` and `*min-chunk-size*`, + so callers testing a particular rope variant should invoke this from + inside that variant's `with-tree` binding (or the 3-arity form that + takes explicit sizes)." + ([root] + (invariant-valid? root (long *target-chunk-size*) (long *min-chunk-size*))) + ([root ^long target ^long minsz] + (cond + (leaf? root) true + + ;; Flat-mode root: generic Rope stores small ropes as a bare + ;; APersistentVector. Always valid as long as size ≤ target. + (instance? clojure.lang.APersistentVector root) + (<= (.count ^clojure.lang.Counted root) target) + + :else + (let [chunks (root->chunks root) + sizes (mapv #(long (chunk-length %)) chunks)] + (and + (every? pos? sizes) + (every? #(<= (long %) target) sizes) + (every? #(>= (long %) minsz) (butlast sizes))))))) diff --git a/src/ordered_collections/kernel/tree.clj b/src/ordered_collections/kernel/tree.clj index d6bb4e7..c186560 100644 --- a/src/ordered_collections/kernel/tree.clj +++ b/src/ordered_collections/kernel/tree.clj @@ -1802,6 +1802,34 @@ (neg? 
c) (recur (-l n) k rank) :else (recur (-r n) k (+ 1 rank (node-size (-l n)))))))))) +(defn node-rank-long + "Primitive-specialized rank for Long keys. Bypasses Comparator dispatch + by using Long/compare directly. Returns the 0-based rank of k, or nil + if absent." + [n ^long k] + (loop [n n rank (long 0)] + (if (leaf? n) + nil + (let [nk (long (-k n)) + c (Long/compare k nk)] + (cond + (zero? c) (+ rank (node-size (-l n))) + (neg? c) (recur (-l n) rank) + :else (recur (-r n) (+ 1 rank (node-size (-l n))))))))) + +(defn node-rank-string + "String-specialized rank. Uses String.compareTo directly. Returns the + 0-based rank of k, or nil if absent." + [n ^String k] + (loop [n n rank (long 0)] + (if (leaf? n) + nil + (let [c (.compareTo k ^String (-k n))] + (cond + (zero? c) (+ rank (node-size (-l n))) + (neg? c) (recur (-l n) rank) + :else (recur (-r n) (+ 1 rank (node-size (-l n))))))))) + ;; MAYBE: other splits? <= < > ? (defn node-split-nth @@ -1825,6 +1853,28 @@ (fold #(conj! %1 (nval %2)) acc n) (persistent! acc))) +(defn node-build-sorted + "Build a balanced tree from pre-sorted [k v] pairs in O(n). The caller + must guarantee the input is sorted by key under the current ordering + and contains no duplicate keys. Uses the currently bound `*t-join*`. + + Because halves differ by at most 1 in count, the result is within the + weight-balance bound without any rotation, so `node-stitch` is not + needed on the build path." + [sorted-kvs] + (let [v (vec sorted-kvs) + n (count v) + create *t-join*] + (letfn [(build [^long lo ^long hi] + (if (>= lo hi) + (leaf) + (let [mid (quot (+ lo hi) 2) + kv (nth v mid)] + (create (nth kv 0) (nth kv 1) + (build lo mid) + (build (unchecked-inc mid) hi)))))] + (build 0 n)))) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Lazy Seq ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -1984,3 +2034,38 @@ (leaf? n) nil (not (pos? 
cnt)) nil true (->> from (node-split-nth n) node-seq (take cnt)))))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Direct Iterator +;; +;; Non-allocating `java.util.Iterator` over a tree. Avoids the per-step +;; seq-cell allocation incurred by `SeqIterator` over a lazy seq of entries +;; — advances the underlying enumerator in place via an unsynchronized +;; mutable field. + +(deftype NodeIterator [^:unsynchronized-mutable enum + ^clojure.lang.IFn advance + ^clojure.lang.IFn project] + java.util.Iterator + (hasNext [_] (some? enum)) + (next [_] + (when (nil? enum) + (throw (java.util.NoSuchElementException.))) + (let [node (node-enum-first enum) + result (project node)] + (set! enum (advance enum)) + result)) + (remove [_] + (throw (UnsupportedOperationException.)))) + +(defn node-iterator + "Non-allocating forward `java.util.Iterator` over tree nodes. Applies + `project` to each visited node before yielding (typically `-k`, `-v`, + `-kv`, or `identity`)." + [root project] + (NodeIterator. (node-enumerator root) node-enum-rest project)) + +(defn node-iterator-reverse + "Non-allocating reverse `java.util.Iterator` over tree nodes." + [root project] + (NodeIterator. 
(node-enumerator-reverse root) node-enum-prior project)) diff --git a/src/ordered_collections/protocol.clj b/src/ordered_collections/protocol.clj index 7418177..cd5d2af 100644 --- a/src/ordered_collections/protocol.clj +++ b/src/ordered_collections/protocol.clj @@ -1,5 +1,5 @@ (ns ordered-collections.protocol - (:refer-clojure :exclude [split-at subrange])) + (:refer-clojure :exclude [chunk-append split-at subrange])) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -165,6 +165,34 @@ "Get value for exactly k (maps only, no fuzzy matching).")) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Rope Chunk Protocol +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defprotocol PRopeChunk + "Chunk abstraction allowing the rope kernel to be generic over chunk type. + Extended to APersistentVector (generic rope) and String (string rope). + Protocol dispatch cost (~2-3ns) is comparable to the type-hinted interface + calls it replaces." + (chunk-length [c] "Number of elements/characters in chunk") + (chunk-slice [c start end] "Subrange [start, end)") + (chunk-merge [c other] "Concatenate two chunks") + (chunk-nth [c i] "Element/character at index i") + (chunk-append [c x] "Append a single element/character") + (chunk-last [c] "Last element/character") + (chunk-butlast [c] "All but last element/character") + (chunk-update [c i x] "Replace element/character at index i") + (chunk-of [c x] "Create a new single-element chunk of the same type") + (chunk-reduce-init [c f init] "Reduce over elements/characters with init value. 
+ When f returns (reduced x), stops and returns @(reduced x).") + (chunk-append-sb [c sb] "Append chunk contents to StringBuilder") + (chunk-splice [c start end replacement] "Replace [start, end) with replacement chunk (may be nil for deletion)") + (chunk-splice-split [c start end replacement half] + "Like chunk-splice but returns [left-chunk right-chunk] split at position + half in the spliced result. Avoids building the full intermediate chunk. + Used by the fused splice overflow path.")) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Rope Protocol ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; diff --git a/src/ordered_collections/readers.clj b/src/ordered_collections/readers.clj index aaca481..5e5b070 100644 --- a/src/ordered_collections/readers.clj +++ b/src/ordered_collections/readers.clj @@ -5,7 +5,8 @@ the Clojure reader when it encounters tagged literals like #ordered/set. For use with clojure.edn/read-string, pass `readers` as the :readers option." - (:require [ordered-collections.core :as oc])) + (:require [ordered-collections.core :as oc] + [ordered-collections.types.byte-rope :as byte-rope])) (def ordered-set oc/ordered-set) (def ordered-map oc/ordered-map) @@ -15,17 +16,23 @@ (def priority-queue oc/priority-queue) (def ordered-multiset oc/ordered-multiset) (def rope oc/rope) +(def string-rope oc/string-rope) +;; Byte rope reader accepts a hex string (distinct from the byte-rope +;; constructor, which UTF-8-encodes strings). +(def byte-rope byte-rope/read-byte-rope) (def readers "Map of tag symbols to reader functions. 
Pass to clojure.edn/read-string as the :readers option: (clojure.edn/read-string {:readers readers} s)" - {'ordered/set ordered-set - 'ordered/map ordered-map - 'ordered/interval-set interval-set - 'ordered/interval-map interval-map - 'ordered/range-map range-map - 'ordered/priority-queue priority-queue - 'ordered/multiset ordered-multiset - 'ordered/rope rope}) + {'ordered/set ordered-set + 'ordered/map ordered-map + 'interval/set interval-set + 'interval/map interval-map + 'range/map range-map + 'priority/queue priority-queue + 'multi/set ordered-multiset + 'vec/rope rope + 'string/rope string-rope + 'byte/rope byte-rope}) diff --git a/src/ordered_collections/types/byte_rope.clj b/src/ordered_collections/types/byte_rope.clj new file mode 100644 index 0000000..5e8509c --- /dev/null +++ b/src/ordered_collections/types/byte_rope.clj @@ -0,0 +1,1442 @@ +(ns ordered-collections.types.byte-rope + "Persistent byte rope optimized for structural editing of binary data. + Backed by a chunked weight-balanced tree with byte[] chunks. + O(log n) concat, split, splice, insert, and remove. + + Bytes are exposed as unsigned longs in [0, 255]. Storage is signed Java + bytes (same bits). Equality with byte[] is content-based; equality with + Clojure vectors is always false to avoid signed/unsigned confusion. + + Small byte sequences (≤ +flat-threshold+ bytes) are stored as a raw + byte[] internally, giving byte[]-equivalent performance on reads. When + edits grow past the threshold, the representation is transparently + promoted to chunked tree form." + (:require [clojure.core.protocols :as cp] + [clojure.core.reducers :as r] + [ordered-collections.protocol :as proto] + [ordered-collections.kernel.node :as node + :refer [leaf leaf? 
-k -v -l -r]] + [ordered-collections.kernel.root] + [ordered-collections.kernel.tree :as tree] + [ordered-collections.kernel.rope :as ropetree]) + (:import [clojure.lang RT Murmur3 Util SeqIterator + IReduce IReduceInit + IEditableCollection ITransientCollection] + [ordered_collections.kernel.root INodeCollection] + [java.util ArrayList Arrays] + [java.io ByteArrayOutputStream InputStream OutputStream])) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Constants +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:const ^:private +flat-threshold+ + "Maximum byte length stored in flat (raw byte[]) representation. + Above this the rope promotes to chunked tree form. Matches the + StringRope threshold — small binary data stays on the fast path." + 1024) + +(def ^:const ^:private +target-chunk-size+ + "ByteRope target chunk size in bytes. Bound into the kernel's + `*target-chunk-size*` dynamic var via `with-tree`. Tuned via + `lein bench-rope-tuning`: at 500K bytes, 1024 beats 256 on every + operation — construction (~3x), nth (+50%), split (+47%), + splice (+21%), concat (~2.4x) — because byte[] System.arraycopy + throughput is so high that the win is almost entirely in reducing + per-chunk tree overhead." + 1024) + +(def ^:const ^:private +min-chunk-size+ + "ByteRope minimum internal chunk size (= target/2)." + 512) + +(def ^:private byte-array-class (Class/forName "[B")) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Tree binding macro +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defmacro ^:private with-tree + "Bind the kernel's dynamic rope context for ByteRope operations: + `tree/*t-join*` to the allocator, and the CSI target/min to the + ByteRope-specific constants. Every tree-mutating operation must + execute inside this binding." 
+ [alloc & body] + `(binding [tree/*t-join* ~alloc + ropetree/*target-chunk-size* +target-chunk-size+ + ropetree/*min-chunk-size* +min-chunk-size+] + ~@body)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Flat-mode helpers +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- byte-array? + [x] + (instance? byte-array-class x)) + +(defn- flat? + "True when root is a raw byte[] (flat representation)." + [root] + (byte-array? root)) + +(defn- flat-size + "Size of a flat or tree root. Handles nil, byte[], and tree nodes." + ^long [root] + (cond + (nil? root) 0 + (byte-array? root) (alength ^bytes root) + :else (long (-v root)))) + +(defn- defensive-copy + ^bytes [^bytes b] + (Arrays/copyOf b (alength b))) + +(defn- ensure-tree-root + "Promote a flat byte[] root to a tree root. Returns tree nodes unchanged. + Caller must bind tree/*t-join*." + [root] + (cond + (nil? root) nil + (byte-array? root) (ropetree/bytes->root ^bytes root) + :else root)) + +(defn- flat-splice + "Bulk-arraycopy splice on a flat byte[]. Returns a byte[]." + ^bytes [^bytes s ^long start ^long end ^bytes rep] + (let [si (int start) + ei (int end) + rl (int (if rep (alength rep) 0)) + sl (alength s) + result (byte-array (+ (- sl (- ei si)) rl))] + (System/arraycopy s 0 result 0 si) + (when (pos? rl) + (System/arraycopy rep 0 result si rl)) + (System/arraycopy s ei result (+ si rl) (- sl ei)) + result)) + +(defn- make-root + "Create a ByteRope root from a byte[] result. Stays flat if ≤ threshold, + otherwise promotes to tree. Caller must bind tree/*t-join* for promotion." + [^bytes b] + (cond + (zero? (alength b)) nil + (<= (alength b) +flat-threshold+) b + :else (ropetree/bytes->root b))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Index / range checks +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- valid-index? 
+ [^long n ^long k] + (and (<= 0 k) (< k n))) + +(defn- insert-index? + [^long n ^long k] + (and (<= 0 k) (<= k n))) + +(defn- check-insert-index! + [^long n ^long k] + (when-not (insert-index? n k) + (throw (IndexOutOfBoundsException.)))) + +(defn- check-range! + [^long start ^long end ^long n] + (when (or (neg? start) (neg? end) (> start end) (> end n)) + (throw (IndexOutOfBoundsException.)))) + + +(declare ->byte-rope ->TransientByteRope ->ByteRope* + coll->bytes coll->tree-root) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Monomorphic tree reduce +;; +;; The generic kernel `rope-reduce` dispatches through `PRopeChunk/chunk-reduce-init` +;; for every leaf, which dispatches through `PRopeChunk/chunk-nth` for every +;; element. For a 500K-byte rope that's ~489 outer dispatches + 500K inner +;; dispatches. Replacing these with a direct byte[] loop is the single biggest +;; reduce speedup we can make — the inner loop is then a tight `aget` chain +;; the JIT can vectorize. +;; +;; Walks the tree in order: left subtree, chunk, right subtree. Returns acc +;; (possibly wrapped in Reduced). Caller unwraps. +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- byte-rope-tree-reduce + "Reduce `f` over every byte in the rope tree rooted at `n` with starting + accumulator `acc`. Monomorphic — assumes chunks are byte[]. Returns either + the final acc or a `clojure.lang.Reduced` wrapping an early-exit value." + [f acc n] + (if (leaf? n) + acc + (let [l (-l n) + acc-left (if (leaf? l) acc (byte-rope-tree-reduce f acc l))] + (if (reduced? acc-left) + acc-left + (let [^bytes ck (-k n) + len (alength ck) + acc-chunk + (loop [i (int 0), acc acc-left] + (if (< i len) + (let [ret (f acc (bit-and (long (aget ck i)) 0xFF))] + (if (reduced? ret) + ret + (recur (unchecked-inc-int i) ret))) + acc))] + (if (reduced? 
acc-chunk) + acc-chunk + (byte-rope-tree-reduce f acc-chunk (-r n)))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ByteRopeSeq — forward seq over byte[] chunks, yielding unsigned longs +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- seq-equiv + [s1 o] + (if-not (or (instance? clojure.lang.Sequential o) + (instance? java.util.List o)) + false + (loop [s1 (seq s1) s2 (seq o)] + (cond + (nil? s1) (nil? s2) + (nil? s2) false + (not= (first s1) (first s2)) false + :else (recur (next s1) (next s2)))))) + +(deftype ByteRopeSeq [enum ^bytes chunk ^long i cnt _meta] + clojure.lang.ISeq + (first [_] + (bit-and (long (aget chunk (unchecked-int i))) 0xFF)) + (next [_] + (let [next-cnt (when cnt (unchecked-dec-int cnt)) + next-i (unchecked-inc i)] + (if (< next-i (alength chunk)) + (ByteRopeSeq. enum chunk next-i next-cnt nil) + (when-let [e (tree/node-enum-rest enum)] + (let [chunk' ^bytes (-k (tree/node-enum-first e))] + (ByteRopeSeq. e chunk' 0 next-cnt nil)))))) + (more [this] + (or (.next this) ())) + (cons [this o] + (clojure.lang.Cons. o this)) + + clojure.lang.Seqable + (seq [this] this) + + clojure.lang.Sequential + + java.lang.Iterable + (iterator [this] + (SeqIterator. this)) + + clojure.lang.Counted + (count [_] + (or cnt + (loop [e enum + ^bytes chunk chunk + i i + n 0] + (let [n (+ n (- (alength chunk) i))] + (if-let [e' (tree/node-enum-rest e)] + (let [chunk' ^bytes (-k (tree/node-enum-first e'))] + (recur e' chunk' 0 n)) + n))))) + + IReduceInit + (reduce [_ f init] + (loop [e enum + ^bytes chunk chunk + i (int i) + acc init] + (let [acc (loop [idx i acc acc] + (if (< idx (alength chunk)) + (let [ret (f acc (bit-and (long (aget chunk (unchecked-int idx))) 0xFF))] + (if (reduced? ret) + ret + (recur (unchecked-inc-int idx) ret))) + acc))] + (if (reduced? 
acc) + @acc + (if-let [e' (tree/node-enum-rest e)] + (let [chunk' ^bytes (-k (tree/node-enum-first e'))] + (recur e' chunk' (int 0) acc)) + acc))))) + + IReduce + (reduce [this f] + (let [acc (bit-and (long (aget chunk (unchecked-int i))) 0xFF) + next-i (unchecked-inc i)] + (if (< next-i (alength chunk)) + (.reduce ^IReduceInit + (ByteRopeSeq. enum chunk next-i nil nil) f acc) + (if-let [e' (and enum (tree/node-enum-rest enum))] + (let [chunk' ^bytes (-k (tree/node-enum-first e'))] + (.reduce ^IReduceInit + (ByteRopeSeq. e' chunk' 0 nil nil) f acc)) + acc)))) + + clojure.lang.IHashEq + (hasheq [this] + (Murmur3/hashOrdered this)) + + clojure.lang.IPersistentCollection + (empty [_] ()) + (equiv [this o] + (seq-equiv this o)) + + Object + ;; Util/hash and Util/equals call back into .hashCode/.equals on their + ;; argument, so invoking them on `this` recurses without bound. Hash the + ;; elements the way java.util.List does and compare element-wise instead. + (hashCode [this] + (loop [h (int 1), s (seq this)] + (if s + (recur (unchecked-add-int (unchecked-multiply-int 31 h) + (Util/hash (first s))) + (next s)) + h))) + (equals [this o] + (seq-equiv this o)) + + clojure.lang.IMeta + (meta [_] _meta) + + clojure.lang.IObj + (withMeta [_ m] + (ByteRopeSeq. enum chunk i cnt m))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ByteRopeSeqReverse — reverse seq over byte[] chunks +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftype ByteRopeSeqReverse [enum ^bytes chunk ^long i cnt _meta] + clojure.lang.ISeq + (first [_] + (bit-and (long (aget chunk (unchecked-int i))) 0xFF)) + (next [_] + (let [next-cnt (when cnt (unchecked-dec-int cnt))] + (if (pos? i) + (ByteRopeSeqReverse. enum chunk (unchecked-dec i) next-cnt nil) + (when-let [e (tree/node-enum-prior enum)] + (let [chunk' ^bytes (-k (tree/node-enum-first e))] + (ByteRopeSeqReverse. e chunk' (dec (alength chunk')) next-cnt nil)))))) + (more [this] + (or (.next this) ())) + (cons [this o] + (clojure.lang.Cons. o this)) + + clojure.lang.Seqable + (seq [this] this) + + clojure.lang.Sequential + + java.lang.Iterable + (iterator [this] + (SeqIterator. 
this)) + + clojure.lang.Counted + (count [_] + (or cnt + (loop [e enum + ^bytes chunk chunk + i i + n 0] + (let [n (+ n (inc i))] + (if-let [e' (tree/node-enum-prior e)] + (let [chunk' ^bytes (-k (tree/node-enum-first e'))] + (recur e' chunk' (dec (alength chunk')) n)) + n))))) + + IReduceInit + (reduce [_ f init] + (loop [e enum + ^bytes chunk chunk + i (int i) + acc init] + (let [acc (loop [idx i acc acc] + (if (neg? idx) + acc + (let [ret (f acc (bit-and (long (aget chunk (unchecked-int idx))) 0xFF))] + (if (reduced? ret) + ret + (recur (unchecked-dec-int idx) ret)))))] + (if (reduced? acc) + @acc + (if-let [e' (tree/node-enum-prior e)] + (let [chunk' ^bytes (-k (tree/node-enum-first e'))] + (recur e' chunk' (int (dec (alength chunk'))) acc)) + acc))))) + + IReduce + (reduce [this f] + (let [acc (bit-and (long (aget chunk (unchecked-int i))) 0xFF)] + (if (pos? i) + (.reduce ^IReduceInit + (ByteRopeSeqReverse. enum chunk (unchecked-dec i) nil nil) f acc) + (if-let [e' (and enum (tree/node-enum-prior enum))] + (let [chunk' ^bytes (-k (tree/node-enum-first e'))] + (.reduce ^IReduceInit + (ByteRopeSeqReverse. e' chunk' (dec (alength chunk')) nil nil) f acc)) + acc)))) + + clojure.lang.IHashEq + (hasheq [this] + (Murmur3/hashOrdered this)) + + clojure.lang.IPersistentCollection + (empty [_] ()) + (equiv [this o] + (seq-equiv this o)) + + Object + ;; Util/hash and Util/equals call back into .hashCode/.equals on their + ;; argument, so invoking them on `this` recurses without bound. Hash the + ;; elements the way java.util.List does and compare element-wise instead. + (hashCode [this] + (loop [h (int 1), s (seq this)] + (if s + (recur (unchecked-add-int (unchecked-multiply-int 31 h) + (Util/hash (first s))) + (next s)) + h))) + (equals [this o] + (seq-equiv this o)) + + clojure.lang.IMeta + (meta [_] _meta) + + clojure.lang.IObj + (withMeta [_ m] + (ByteRopeSeqReverse. enum chunk i cnt m))) + + +(defn- byte-rope-seq + [root] + (cond + (nil? root) nil + (byte-array? root) + (let [^bytes b root] + (when (pos? (alength b)) + ;; Flat mode: single-chunk seq with nil enum (node-enum-rest tolerates nil). + (ByteRopeSeq. nil b 0 (alength b) nil))) + :else + (when-let [enum (tree/node-enumerator root)] + (let [chunk ^bytes (-k (tree/node-enum-first enum))] + (ByteRopeSeq. 
enum chunk 0 (ropetree/rope-size root) nil))))) + +(defn- byte-rope-rseq + [root] + (cond + (nil? root) nil + (byte-array? root) + (let [^bytes b root] + (when (pos? (alength b)) + (ByteRopeSeqReverse. nil b (dec (alength b)) (alength b) nil))) + :else + (when-let [enum (tree/node-enumerator-reverse root)] + (let [chunk ^bytes (-k (tree/node-enum-first enum))] + (ByteRopeSeqReverse. enum chunk (dec (alength chunk)) + (ropetree/rope-size root) nil))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Equality / Hashing / Compare +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- byte-rope-equiv + "Content equality. A ByteRope equals: + - another ByteRope with identical byte content + - a byte[] with identical byte content + Does not equal Clojure vectors, strings, or generic ropes — the signed + vs unsigned domain mismatch is too easy to get wrong." + [this o] + (cond + (identical? this o) true + + (instance? (class this) o) + (let [n (long (count this))] + (and (= n (long (count o))) + (Arrays/equals ^bytes (proto/rope-str this) + ^bytes (proto/rope-str o)))) + + (byte-array? o) + (Arrays/equals ^bytes (proto/rope-str this) ^bytes o) + + :else false)) + +(defn- byte-rope-hasheq + "Murmur3 hash over the sequence of unsigned byte values. + Consistent with seq-based equality of byte ropes; not compatible with + Clojure's default (identity) hash of a raw byte[]." + [root] + (if (nil? root) + (Murmur3/hashOrdered ()) + (Murmur3/hashOrdered (byte-rope-seq root)))) + +(defn- byte-rope-compare + "Unsigned lexicographic comparison. Consistent with Arrays.compareUnsigned + on byte[] and with protobuf/Okio/Netty ordering conventions." + ^long [this o] + (let [^bytes a (proto/rope-str this) + ^bytes b (cond + (byte-array? 
o) o + :else (proto/rope-str o))] + (Arrays/compareUnsigned a b))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ByteRope +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftype ByteRope [root alloc _meta] + + java.io.Serializable + + INodeCollection + (getAllocator [_] alloc) + + clojure.lang.IMeta + (meta [_] _meta) + + clojure.lang.IObj + (withMeta [_ m] (->ByteRope* root alloc m)) + + java.lang.Comparable + (compareTo [this o] + (if (identical? this o) + 0 + (byte-rope-compare this o))) + + clojure.lang.Counted + (count [_] + (flat-size root)) + + clojure.lang.Indexed + ;; Monomorphic nth — inlines the tree walk with direct `alength`/`aget` + ;; on the known byte[] chunks, bypassing the PRopeChunk protocol dispatch + ;; that the generic kernel `rope-nth` incurs at every tree level. + (nth [_ i] + (let [ii (long i) + n (flat-size root)] + (when-not (valid-index? n ii) + (throw (IndexOutOfBoundsException.))) + (if (byte-array? root) + (bit-and (long (aget ^bytes root (unchecked-int ii))) 0xFF) + (loop [nd root, j ii] + (let [l (-l nd) + ls (if (leaf? l) 0 (long (-v l))) + ^bytes ck (-k nd) + cs (long (alength ck)) + rs (+ ls cs)] + (cond + (< j ls) (recur l j) + (< j rs) (bit-and (long (aget ck (unchecked-int (- j ls)))) 0xFF) + :else (recur (-r nd) (- j rs)))))))) + (nth [this i not-found] + (let [ii (long i) + n (flat-size root)] + (if (and (integer? i) (valid-index? n ii)) + (.nth this (int ii)) + not-found))) + + clojure.lang.ILookup + (valAt [this k] + (if (integer? k) + (.nth this (int k) nil) + nil)) + (valAt [this k not-found] + (if (integer? k) + (.nth this (int k) not-found) + not-found)) + + clojure.lang.IFn + (invoke [this k] + (.valAt this k)) + (invoke [this k not-found] + (.valAt this k not-found)) + (applyTo [this args] + (let [n (RT/boundedLength args 2)] + (case n + 0 (throw (clojure.lang.ArityException. n (.. 
this getClass getSimpleName))) + 1 (.invoke this (first args)) + 2 (.invoke this (first args) (second args)) + (throw (clojure.lang.ArityException. n (.. this getClass getSimpleName)))))) + + clojure.lang.IPersistentCollection + (cons [_ o] + (let [b (unchecked-byte (long o))] + (if (byte-array? root) + (let [^bytes s root + n (alength s)] + (if (< n +flat-threshold+) + (let [result (Arrays/copyOf s (unchecked-inc-int n))] + (aset result n b) + (->ByteRope* result alloc _meta)) + (with-tree alloc + (->ByteRope* (ropetree/rope-conj-right (ropetree/bytes->root s) b) + alloc _meta)))) + (if (nil? root) + (let [a (byte-array 1)] + (aset a 0 b) + (->ByteRope* a alloc _meta)) + (with-tree alloc + (->ByteRope* (ropetree/rope-conj-right root b) alloc _meta)))))) + (empty [_] + (->ByteRope* nil alloc _meta)) + (equiv [this o] + (byte-rope-equiv this o)) + + clojure.lang.IPersistentStack + (peek [_] + (cond + (nil? root) nil + (byte-array? root) + (let [^bytes s root + n (alength s)] + (when (pos? n) + (bit-and (long (aget s (unchecked-dec-int n))) 0xFF))) + :else + (ropetree/rope-peek-right root))) + (pop [_] + (cond + (nil? root) + (throw (IllegalStateException. "Can't pop empty byte-rope")) + + (byte-array? root) + (let [^bytes s root + n (alength s)] + (cond + (<= n 1) (->ByteRope* nil alloc _meta) + :else (->ByteRope* + (Arrays/copyOf s (unchecked-dec-int n)) + alloc _meta))) + + :else + (with-tree alloc + (->ByteRope* (ropetree/rope-pop-right root) alloc _meta)))) + + clojure.lang.Seqable + (seq [_] + (byte-rope-seq root)) + + clojure.lang.Reversible + (rseq [_] + (byte-rope-rseq root)) + + clojure.lang.Sequential + + java.lang.Iterable + (iterator [this] + (SeqIterator. (seq this))) + + IReduceInit + (reduce [_ f init] + (cond + (nil? root) init + + (byte-array? root) + (let [^bytes s root + len (alength s)] + (loop [i (int 0), acc init] + (if (< i len) + (let [ret (f acc (bit-and (long (aget s i)) 0xFF))] + (if (reduced? 
ret) + @ret + (recur (unchecked-inc-int i) ret))) + acc))) + + :else + (let [result (byte-rope-tree-reduce f init root)] + (if (reduced? result) @result result)))) + + IReduce + (reduce [this f] + (cond + (nil? root) (f) + + (byte-array? root) + (let [^bytes s root + len (alength s)] + (if (zero? len) + (f) + (loop [i (int 1) + acc (Long/valueOf (bit-and (long (aget s 0)) 0xFF))] + (if (< i len) + (let [ret (f acc (bit-and (long (aget s (unchecked-int i))) 0xFF))] + (if (reduced? ret) + @ret + (recur (unchecked-inc-int i) ret))) + acc)))) + + :else + ;; Seed the accumulator with the first byte, then reduce the rest. + (let [^ordered_collections.kernel.node.INode least (tree/node-least root) + ^bytes first-chunk (-k least) + first-byte (Long/valueOf (bit-and (long (aget first-chunk 0)) 0xFF)) + rest-root (tree/node-remove-least root) + ;; Pre-consume the rest of the first chunk before recursing. + rest-of-first-chunk-acc + (let [len (alength first-chunk)] + (loop [i (int 1), acc first-byte] + (if (< i len) + (let [ret (f acc (bit-and (long (aget first-chunk i)) 0xFF))] + (if (reduced? ret) + ret + (recur (unchecked-inc-int i) ret))) + acc)))] + (if (reduced? rest-of-first-chunk-acc) + @rest-of-first-chunk-acc + (let [result (if (leaf? rest-root) + rest-of-first-chunk-acc + (byte-rope-tree-reduce f rest-of-first-chunk-acc rest-root))] + (if (reduced? result) @result result)))))) + + cp/CollReduce + (coll-reduce [this f] + (.reduce ^IReduce this f)) + (coll-reduce [this f init] + (.reduce ^IReduceInit this f init)) + + r/CollFold + (coll-fold [this n combinef reducef] + (cond + (nil? root) (combinef) + (byte-array? root) (.reduce ^IReduceInit this reducef (combinef)) + :else (ropetree/rope-fold root (long n) combinef reducef))) + + clojure.lang.IHashEq + (hasheq [_] + (byte-rope-hasheq root)) + + clojure.lang.Associative + (containsKey [_ k] + (and (integer? k) (valid-index? 
(flat-size root) (long k)))) + (entryAt [this k] + (when (.containsKey this k) + (clojure.lang.MapEntry. k (.nth ^clojure.lang.Indexed this (int k))))) + (assoc [this k v] + (let [i (long k) + n (flat-size root) + b (unchecked-byte (long v))] + (cond + (not (insert-index? n i)) + (throw (IndexOutOfBoundsException.)) + + (= i n) + (.cons this v) + + :else + (if (byte-array? root) + (let [^bytes s root + result (Arrays/copyOf s (alength s))] + (aset result (int i) b) + (->ByteRope* result alloc _meta)) + (with-tree alloc + (->ByteRope* (ropetree/rope-assoc root i b) alloc _meta)))))) + + java.util.Collection + (toArray [_] + (let [n (flat-size root) + arr (object-array n)] + (cond + (zero? n) arr + + (byte-array? root) + (let [^bytes s root] + (dotimes [i n] + (aset arr i (Long/valueOf (bit-and (long (aget s i)) 0xFF)))) + arr) + + :else + (do + (ropetree/rope-reduce + (fn [^long i x] + (aset arr i (Long/valueOf (long x))) + (unchecked-inc i)) + (long 0) + root) + arr)))) + (isEmpty [_] + (zero? (flat-size root))) + (^boolean contains [_ x] + (if (integer? x) + (let [target (bit-and (long x) 0xFF)] + (cond + (nil? root) false + + (byte-array? root) + (let [^bytes s root, n (alength s)] + (loop [i 0] + (cond + (>= i n) false + (= (bit-and (long (aget s i)) 0xFF) target) true + :else (recur (unchecked-inc i))))) + + :else + (true? + (ropetree/rope-reduce + (fn [_ v] + (if (= (long v) target) (reduced true) false)) + false + root)))) + false)) + (containsAll [this c] + (every? #(.contains this %) c)) + (size [_] + (int (flat-size root))) + (add [_ _] + (throw (UnsupportedOperationException.))) + (addAll [_ _] + (throw (UnsupportedOperationException.))) + (^boolean remove [_ _] + (throw (UnsupportedOperationException.))) + (removeAll [_ _] + (throw (UnsupportedOperationException.))) + (retainAll [_ _] + (throw (UnsupportedOperationException.))) + (clear [_] + (throw (UnsupportedOperationException.))) + + proto/PRope + (rope-cat [this other] + (when-not (or (byte-array? 
other) (instance? ByteRope other)) + (throw (IllegalArgumentException. + "ByteRope rope-cat requires a ByteRope or byte[]"))) + (let [^bytes b1 (cond (nil? root) (byte-array 0) + (byte-array? root) root + :else nil) + ^bytes b2 (if (byte-array? other) + other + (let [r (.-root ^ByteRope other)] + (cond (nil? r) (byte-array 0) + (byte-array? r) r + :else nil)))] + ;; Fast path: both flat, combined fits threshold + (if (and b1 b2 (<= (+ (alength b1) (alength b2)) +flat-threshold+)) + (let [n1 (alength b1) + n2 (alength b2) + result (byte-array (+ n1 n2))] + (when (pos? n1) (System/arraycopy b1 0 result 0 n1)) + (when (pos? n2) (System/arraycopy b2 0 result n1 n2)) + (->ByteRope* (when (pos? (alength result)) result) alloc _meta)) + ;; Tree path + (with-tree alloc + (let [l (ensure-tree-root root) + r (if (byte-array? other) + (ropetree/bytes->root ^bytes other) + (ensure-tree-root (.-root ^ByteRope other)))] + (->ByteRope* (ropetree/rope-concat l r) alloc _meta)))))) + (rope-split [_ i] + (let [n (flat-size root) + ii (long i)] + (check-insert-index! n ii) + (if (byte-array? root) + (let [^bytes s root + li (int ii)] + [(->ByteRope* (when (pos? li) (Arrays/copyOfRange s 0 li)) alloc _meta) + (->ByteRope* (when (< li (alength s)) + (Arrays/copyOfRange s li (alength s))) + alloc _meta)]) + (with-tree alloc + (let [[l r] (ropetree/ensure-split-parts + (ropetree/rope-split-at root ii))] + [(->ByteRope* l alloc _meta) (->ByteRope* r alloc _meta)]))))) + (rope-sub [_ start end] + (let [n (flat-size root) + si (long start) + ei (long end)] + (check-range! si ei n) + (if (byte-array? root) + (let [^bytes s root] + (->ByteRope* + (when (> ei si) (Arrays/copyOfRange s (int si) (int ei))) + alloc _meta)) + (with-tree alloc + (->ByteRope* (ropetree/rope-subvec-root root si ei) alloc _meta))))) + (rope-insert [this i coll] + (let [n (flat-size root) + ii (long i)] + (check-insert-index! n ii) + (if (byte-array? 
root) + (let [^bytes ins (coll->bytes coll)] + (with-tree alloc + (->ByteRope* + (make-root (flat-splice ^bytes root ii ii ins)) + alloc _meta))) + (let [^bytes ins (coll->bytes coll) + ins-len (alength ins)] + (or (when (<= ins-len +target-chunk-size+) + (when-let [new-root (ropetree/rope-splice-inplace + root ii ii + (when (pos? ins-len) ins) + alloc)] + (->ByteRope* new-root alloc _meta))) + (with-tree alloc + (->ByteRope* (ropetree/rope-insert-root root ii + (coll->tree-root coll)) + alloc _meta))))))) + (rope-remove [this start end] + (let [n (flat-size root) + si (long start) + ei (long end)] + (check-range! si ei n) + (if (byte-array? root) + (let [^bytes s root] + (->ByteRope* + (let [^bytes result (flat-splice s si ei nil)] + (when (pos? (alength result)) result)) + alloc _meta)) + (or (when-let [new-root (ropetree/rope-splice-inplace + root si ei nil alloc)] + (->ByteRope* new-root alloc _meta)) + (with-tree alloc + (->ByteRope* (ropetree/rope-remove-root root si ei) + alloc _meta)))))) + (rope-splice [this start end coll] + (let [n (flat-size root) + si (long start) + ei (long end)] + (check-range! si ei n) + (if (byte-array? root) + (let [^bytes rep (coll->bytes coll)] + (with-tree alloc + (->ByteRope* + (make-root (flat-splice ^bytes root si ei rep)) + alloc _meta))) + (let [^bytes rep (coll->bytes coll) + rep-len (alength rep)] + (or (when (<= rep-len +target-chunk-size+) + (when-let [new-root (ropetree/rope-splice-inplace + root si ei + (when (pos? rep-len) rep) + alloc)] + (->ByteRope* new-root alloc _meta))) + (with-tree alloc + (let [mid-root (when (pos? rep-len) + (coll->tree-root coll))] + (->ByteRope* (ropetree/rope-splice-root root si ei mid-root) + alloc _meta)))))))) + (rope-chunks [_] + (cond + (nil? root) nil + (byte-array? root) (list root) + :else (ropetree/rope-chunks-seq root))) + (rope-str [_] + ;; rope-str on a ByteRope returns a byte[], not a String. 
+ ;; The protocol name is a legacy of the StringRope-first design; for a + ;; ByteRope, materialization produces bytes. + (cond + (nil? root) (byte-array 0) + (byte-array? root) (defensive-copy ^bytes root) + :else (ropetree/byte-rope->bytes root))) + + IEditableCollection + (asTransient [_] + ;; ensure-tree-root may promote a flat byte[] to a tree, which requires + ;; tree/*t-join* to be bound, so do the promotion inside with-tree. + (->TransientByteRope + (with-tree alloc (ensure-tree-root root)) + alloc (ArrayList.) (ByteArrayOutputStream.) 0 true _meta)) + + Object + ;; Util/hash would call back into .hashCode on `this` and recurse without + ;; bound; hash the unsigned byte content instead, which agrees with equals + ;; between two ByteRopes. + (hashCode [_] + (byte-rope-hasheq root)) + (equals [this o] + (byte-rope-equiv this o)) + (toString [this] + ;; Human-readable hex representation. For programmatic access to the + ;; raw bytes, use rope-str (which returns byte[]). + (let [^bytes b (proto/rope-str this) + n (alength b) + sb (StringBuilder. (* 2 n))] + (dotimes [i n] + (let [byte-val (bit-and (long (aget b i)) 0xFF)] + (.append sb (Character/forDigit (int (bit-shift-right byte-val 4)) 16)) + (.append sb (Character/forDigit (int (bit-and byte-val 0xF)) 16)))) + (.toString sb)))) + + +(defn- ->ByteRope* + "Construct a ByteRope." + [root alloc meta] + (ByteRope. root alloc meta)) + +(defn- coll->bytes + "Coerce a splice/insert argument to a byte[]." + ^bytes [coll] + (cond + (nil? coll) (byte-array 0) + (byte-array? coll) coll + (instance? ByteRope coll) (proto/rope-str ^ByteRope coll) + (sequential? coll) + (let [n (count coll) + a (byte-array n)] + (loop [s (seq coll), i 0] + (if s + (do (aset a i (unchecked-byte (long (first s)))) + (recur (next s) (unchecked-inc i))) + a))) + :else + (throw (IllegalArgumentException. + (str "ByteRope cannot coerce " (class coll) " to byte[]"))))) + +(defn- coll->tree-root + "Coerce a splice/insert argument to a tree root for the multi-traversal + fallback. Preserves existing ByteRope tree structure when possible." + [coll] + (cond + (instance? ByteRope coll) + (ensure-tree-root (.-root ^ByteRope coll)) + (byte-array? 
coll) (ropetree/bytes->root ^bytes coll) + :else (ropetree/bytes->root (coll->bytes coll)))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; TransientByteRope +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- transient-byte-appended-root + "Build a rope tree from flushed chunks + tail. Caller must bind *t-join*." + [^ArrayList chunks ^ByteArrayOutputStream tail] + (let [chunk-count (.size chunks) + tail-empty? (zero? (.size tail))] + (cond + (and (zero? chunk-count) tail-empty?) + nil + + (zero? chunk-count) + (ropetree/chunks->root [(.toByteArray tail)]) + + tail-empty? + (ropetree/chunks->root (vec chunks)) + + :else + (ropetree/chunks->root-csi + (conj (vec chunks) (.toByteArray tail)))))) + +(def ^:private ^:const +transient-rebuild-threshold+ 4) + +(defn- transient-byte-final-root + "Merge original root with appended chunks/tail. Caller must bind *t-join*." + [root ^ArrayList chunks ^ByteArrayOutputStream tail] + (cond + (and (zero? (.size chunks)) (zero? (.size tail))) + root + + ;; Fast path: small tail, no flushed chunks — conj into root directly + (and (zero? (.size chunks)) (<= (.size tail) 32)) + (let [^bytes a (.toByteArray tail) + n (alength a)] + (if (nil? root) + (tree/*t-join* a nil (leaf) (leaf)) + (loop [i 0, r root] + (if (< i n) + (recur (unchecked-inc i) + (ropetree/rope-conj-right r (aget a i))) + r)))) + + :else + (let [appended-root (transient-byte-appended-root chunks tail) + appended-chunks (+ (.size chunks) (if (zero? (.size tail)) 0 1))] + (cond + (nil? root) + appended-root + + (<= appended-chunks +transient-rebuild-threshold+) + (ropetree/rope-concat root appended-root) + + :else + (ropetree/chunks->root-csi + (cond-> (vec (ropetree/root->chunks root)) + (pos? (.size chunks)) (into (vec chunks)) + (pos? 
(.size tail)) (conj (.toByteArray tail)))))))) + +(deftype TransientByteRope [^:unsynchronized-mutable root + alloc + ^ArrayList chunks + ^ByteArrayOutputStream tail + ^:unsynchronized-mutable chunk-bytes + ^:unsynchronized-mutable edit + _meta] + ITransientCollection + (conj [this x] + (when-not edit (throw (IllegalAccessError. "Transient used after persistent! call"))) + (.write tail (unchecked-int (long x))) + (when (>= (.size tail) +target-chunk-size+) + (.add chunks (.toByteArray tail)) + (set! chunk-bytes (+ chunk-bytes (.size tail))) + (.reset tail)) + this) + + (persistent [_] + (when-not edit (throw (IllegalAccessError. "Transient used after persistent! call"))) + (set! edit false) + (with-tree alloc + (let [tree-root (transient-byte-final-root root chunks tail) + ;; Demote to flat if the result is small enough + final-root (if (and tree-root + (<= (ropetree/rope-size tree-root) +flat-threshold+)) + (ropetree/byte-rope->bytes tree-root) + tree-root)] + (->ByteRope* final-root alloc _meta)))) + + clojure.lang.Counted + (count [_] + (+ (ropetree/rope-size root) chunk-bytes (.size tail))) + + clojure.lang.Indexed + (nth [this i] + (let [rs (ropetree/rope-size root) + j (- (long i) rs)] + (cond + (and (>= i 0) (< i rs)) + (ropetree/rope-nth root (long i)) + + (and (>= j 0) (< j chunk-bytes)) + (let [chunk-idx (quot j +target-chunk-size+) + offset (rem j +target-chunk-size+) + ^bytes chunk (.get chunks (int chunk-idx))] + (bit-and (long (aget chunk (unchecked-int offset))) 0xFF)) + + (< (- j chunk-bytes) (.size tail)) + (let [^bytes tb (.toByteArray tail)] + (bit-and (long (aget tb (unchecked-int (- j chunk-bytes)))) 0xFF)) + + :else + (throw (IndexOutOfBoundsException.))))) + (nth [this i not-found] + (let [rs (ropetree/rope-size root) + n (+ rs chunk-bytes (.size tail))] + (if (and (>= (long i) 0) (< (long i) n)) + (.nth this (int i)) + not-found)))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Constructors 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn byte-rope + "Create a persistent byte rope for structural editing of binary data. + Backed by a chunked weight-balanced tree with byte[] chunks: + O(log n) concat, split, splice, insert, and remove. + + Bytes are exposed as unsigned longs in [0, 255]; storage is signed Java + bytes. Accepted inputs: + + (byte-rope) ;=> empty rope + (byte-rope (byte-array [1 2 3])) ;=> from byte[] (defensively copied) + (byte-rope [0 128 255]) ;=> from a sequential coll of unsigned longs + (byte-rope \"hello\") ;=> UTF-8 encoding of the string + (byte-rope in) ;=> from an InputStream, read to EOF and closed + + nil yields an empty rope, and an existing ByteRope is copied. + + Small byte sequences (≤ 1024 bytes) are stored as a raw byte[] internally, + so reads go straight to the array with no tree indirection. Edits that + grow past the threshold are transparently promoted to chunked tree form." + ([] + (->ByteRope* nil ropetree/byte-rope-node-create {})) + ([x] + (let [^bytes b (cond + (nil? x) (byte-array 0) + (byte-array? x) (defensive-copy x) + (instance? ByteRope x) + (proto/rope-str ^ByteRope x) + (string? x) (.getBytes ^String x "UTF-8") + (instance? InputStream x) + (with-open [in ^InputStream x + out (ByteArrayOutputStream.)] + (let [buf (byte-array 4096)] + (loop [] + (let [n (.read in buf)] + (when (pos? n) + (.write out buf 0 n) + (recur)))) + (.toByteArray out))) + (sequential? x) + (let [n (count x) + a (byte-array n)] + (loop [s (seq x), i 0] + (if s + (do (aset a i (unchecked-byte (long (first s)))) + (recur (next s) (unchecked-inc i))) + a))) + :else + (throw (IllegalArgumentException. + (str "byte-rope cannot coerce " (class x)))))] + (cond + (zero? (alength b)) + (->ByteRope* nil ropetree/byte-rope-node-create {}) + + (<= (alength b) +flat-threshold+) + (->ByteRope* b ropetree/byte-rope-node-create {}) + + :else + (with-tree ropetree/byte-rope-node-create + (->ByteRope* (ropetree/bytes->root b) + ropetree/byte-rope-node-create {})))))) + +(defn byte-rope-concat + "Concatenate byte ropes or byte arrays. + One argument: returns it as a byte rope. 
+ Two arguments: O(log n) binary tree join. + Three or more: O(total chunks) bulk construction." + ([x] + (->byte-rope x)) + ([left right] + (proto/rope-cat (->byte-rope left) (->byte-rope right))) + ([left right & more] + (with-tree ropetree/byte-rope-node-create + (let [alloc ropetree/byte-rope-node-create + all (list* left right more) + chunks (into [] + (mapcat (fn [x] + (let [br (->byte-rope x) + rt (.-root ^ByteRope br)] + (cond + (nil? rt) [] + (byte-array? rt) [rt] + :else (ropetree/root->chunks rt))))) + all)] + (->ByteRope* (ropetree/chunks->root-csi chunks) + alloc (or (meta left) {})))))) + +(defn- ->byte-rope + "Coerce x to a ByteRope." + [x] + (cond + (instance? ByteRope x) x + :else (byte-rope x))) + +(defn read-byte-rope + "Reader function for #byte/rope tagged literals. Input is a hex string." + [^String hex] + (let [n (.length hex) + _ (when (odd? n) + (throw (IllegalArgumentException. + "byte/rope hex literal must have an even number of characters"))) + b (byte-array (quot n 2))] + (dotimes [i (quot n 2)] + (let [hi (Character/digit (.charAt hex (* 2 i)) 16) + lo (Character/digit (.charAt hex (unchecked-inc-int (* 2 i))) 16)] + (when (or (neg? hi) (neg? lo)) + (throw (IllegalArgumentException. + (str "byte/rope: invalid hex character at position " (* 2 i))))) + (aset b i (unchecked-byte (bit-or (bit-shift-left hi 4) lo))))) + (byte-rope b))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Literal Representation +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defmethod print-method ByteRope [^ByteRope r ^java.io.Writer w] + (.write w "#byte/rope ") + (print-method (.toString r) w)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Materialization +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn byte-rope-bytes + "Materialize a byte rope to a byte[]. 
Defensive copy — the caller may + mutate the returned array without affecting the rope." + ^bytes [^ByteRope br] + (proto/rope-str br)) + +(defn byte-rope-hex + "Return the byte rope's contents as a lowercase hex string." + ^String [^ByteRope br] + (.toString br)) + +(defn byte-rope-write + "Stream a byte rope's contents to an OutputStream, chunk by chunk. + Writes each chunk via one OutputStream.write call so large ropes don't + materialize the whole content as a single byte[]." + [^ByteRope br ^OutputStream out] + (doseq [chunk (proto/rope-chunks br)] + (let [^bytes c chunk] + (.write out c 0 (alength c)))) + nil) + +(defn byte-rope-input-stream + "Return a java.io.InputStream that reads over the byte rope's contents. + Stateful — each call returns a fresh stream." + ^InputStream [^ByteRope br] + (let [^bytes data (proto/rope-str br) + n (alength data) + pos (int-array 1)] + (proxy [InputStream] [] + (available [] + (unchecked-subtract-int n (aget pos 0))) + (read + ([] + (let [p (aget pos 0)] + (if (>= p n) + -1 + (do (aset pos 0 (unchecked-inc-int p)) + (bit-and (long (aget data p)) 0xFF))))) + ([buf] + (let [^bytes buf buf] + (.read ^InputStream this buf 0 (alength buf)))) + ([buf off len] + (let [^bytes buf buf + off (int off) + len (int len)] + ;; InputStream contract: validate bounds before reading. + (when (or (neg? off) (neg? len) (> (+ off len) (alength buf))) + (throw (IndexOutOfBoundsException.))) + (if (zero? 
len) + 0 + (let [p (aget pos 0)] + (if (>= p n) + -1 + (let [remaining (unchecked-subtract-int n p) + want (min len remaining)] + (System/arraycopy data p buf off want) + (aset pos 0 (unchecked-add-int p want)) + want)))))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Multi-Byte Reads +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- read-byte* + ^long [^ByteRope br ^long offset] + (long (.nth ^clojure.lang.Indexed br (int offset)))) + +(defn byte-rope-get-byte + "Return the unsigned byte value (long in [0, 255]) at offset." + ^long [^ByteRope br offset] + (read-byte* br (long offset))) + +(defn byte-rope-get-short + "Return a big-endian unsigned 16-bit integer (long in [0, 65535]) at offset. + Reads two bytes starting at offset." + ^long [^ByteRope br offset] + (let [off (long offset) + hi (read-byte* br off) + lo (read-byte* br (unchecked-inc off))] + (bit-or (bit-shift-left hi 8) lo))) + +(defn byte-rope-get-short-le + "Return a little-endian unsigned 16-bit integer (long in [0, 65535]) at offset." + ^long [^ByteRope br offset] + (let [off (long offset) + lo (read-byte* br off) + hi (read-byte* br (unchecked-inc off))] + (bit-or (bit-shift-left hi 8) lo))) + +(defn byte-rope-get-int + "Return a big-endian signed 32-bit integer (long with int sign extension) + at offset. Reads four bytes." + ^long [^ByteRope br offset] + (let [off (long offset) + b0 (read-byte* br off) + b1 (read-byte* br (unchecked-add off 1)) + b2 (read-byte* br (unchecked-add off 2)) + b3 (read-byte* br (unchecked-add off 3)) + u32 (bit-or (bit-shift-left b0 24) + (bit-shift-left b1 16) + (bit-shift-left b2 8) + b3)] + ;; Truncate to 32 bits then sign-extend to 64 + (long (unchecked-int u32)))) + +(defn byte-rope-get-int-le + "Return a little-endian signed 32-bit integer (long with int sign + extension) at offset." 
+ ^long [^ByteRope br offset] + (let [off (long offset) + b0 (read-byte* br off) + b1 (read-byte* br (unchecked-add off 1)) + b2 (read-byte* br (unchecked-add off 2)) + b3 (read-byte* br (unchecked-add off 3)) + u32 (bit-or (bit-shift-left b3 24) + (bit-shift-left b2 16) + (bit-shift-left b1 8) + b0)] + (long (unchecked-int u32)))) + +(defn byte-rope-get-long + "Return a big-endian signed 64-bit integer at offset. Reads eight bytes." + ^long [^ByteRope br offset] + (let [off (long offset) + b0 (read-byte* br off) + b1 (read-byte* br (unchecked-add off 1)) + b2 (read-byte* br (unchecked-add off 2)) + b3 (read-byte* br (unchecked-add off 3)) + b4 (read-byte* br (unchecked-add off 4)) + b5 (read-byte* br (unchecked-add off 5)) + b6 (read-byte* br (unchecked-add off 6)) + b7 (read-byte* br (unchecked-add off 7))] + (bit-or (bit-shift-left b0 56) + (bit-shift-left b1 48) + (bit-shift-left b2 40) + (bit-shift-left b3 32) + (bit-shift-left b4 24) + (bit-shift-left b5 16) + (bit-shift-left b6 8) + b7))) + +(defn byte-rope-get-long-le + "Return a little-endian signed 64-bit integer at offset." 
+ ^long [^ByteRope br offset] + (let [off (long offset) + b0 (read-byte* br off) + b1 (read-byte* br (unchecked-add off 1)) + b2 (read-byte* br (unchecked-add off 2)) + b3 (read-byte* br (unchecked-add off 3)) + b4 (read-byte* br (unchecked-add off 4)) + b5 (read-byte* br (unchecked-add off 5)) + b6 (read-byte* br (unchecked-add off 6)) + b7 (read-byte* br (unchecked-add off 7))] + (bit-or (bit-shift-left b7 56) + (bit-shift-left b6 48) + (bit-shift-left b5 40) + (bit-shift-left b4 32) + (bit-shift-left b3 24) + (bit-shift-left b2 16) + (bit-shift-left b1 8) + b0))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Search +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn byte-rope-index-of + "Return the index of the first occurrence of the unsigned byte value + (0–255) in the byte rope, or -1 if not found. Optional `from` position." + (^long [^ByteRope br byte-val] + (byte-rope-index-of br byte-val 0)) + (^long [^ByteRope br byte-val from] + (let [target (bit-and (long byte-val) 0xFF) + n (long (count br)) + from (long from) + start (max 0 from)] + (if (>= start n) + -1 + (let [result (ropetree/rope-reduce + (fn [^long i v] + (cond + (< i start) (unchecked-inc i) + (= (long v) target) (reduced i) + :else (unchecked-inc i))) + (long 0) + (let [root (.-root ^ByteRope br)] + (cond + (nil? root) nil + (byte-array? root) (ropetree/bytes->root ^bytes root) + :else root)))] + (if (and (integer? result) (< (long result) n)) + (long result) + -1)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Digest +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn byte-rope-digest + "Compute a cryptographic digest of the byte rope's contents using the + named algorithm (\"SHA-256\", \"SHA-1\", \"MD5\", etc.). Streams chunks + through java.security.MessageDigest without materializing the whole + rope. 
Returns a byte rope of the digest." + [^ByteRope br ^String algorithm] + (let [md (java.security.MessageDigest/getInstance algorithm)] + (doseq [chunk (proto/rope-chunks br)] + (let [^bytes c chunk] + (.update md c 0 (alength c)))) + (byte-rope (.digest md)))) diff --git a/src/ordered_collections/types/interval_map.clj b/src/ordered_collections/types/interval_map.clj index 3002107..746b90f 100644 --- a/src/ordered_collections/types/interval_map.clj +++ b/src/ordered_collections/types/interval_map.clj @@ -226,7 +226,7 @@ (defmethod print-method IntervalMap [^IntervalMap m ^java.io.Writer w] (if (order/default-comparator? (.getCmp ^IOrderedCollection m)) - (do (.write w "#ordered/interval-map [") + (do (.write w "#interval/map [") (let [s (seq m)] (when s (let [[k v] (first s)] diff --git a/src/ordered_collections/types/interval_set.clj b/src/ordered_collections/types/interval_set.clj index 9d96715..23956fc 100644 --- a/src/ordered_collections/types/interval_set.clj +++ b/src/ordered_collections/types/interval_set.clj @@ -302,7 +302,7 @@ (defmethod print-method IntervalSet [^IntervalSet s ^java.io.Writer w] (if (order/default-comparator? (.getCmp ^IOrderedCollection s)) - (do (.write w "#ordered/interval-set ") + (do (.write w "#interval/set ") (print-method (vec s) w)) (do (.write w "#= (+ (tree/node-size root1) (tree/node-size root2)) tree/+parallel-merge-root-threshold+)] - (binding [order/*compare* cmp] + (binding [order/*compare* cmp + tree/*t-join* alloc] (->OrderedMap (if use-parallel? 
(tree/node-map-merge-parallel root1 root2 merge-fn) (tree/node-map-merge root1 root2 merge-fn)) - cmp nil nil {}))) + cmp alloc stitch {}))) ;; Fallback: use sequential assoc (reduce-kv (fn [m k v] (if-let [existing (get m k)] diff --git a/src/ordered_collections/types/ordered_multiset.clj b/src/ordered_collections/types/ordered_multiset.clj index 33dbc9d..c523e83 100644 --- a/src/ordered_collections/types/ordered_multiset.clj +++ b/src/ordered_collections/types/ordered_multiset.clj @@ -252,7 +252,7 @@ java.io.Serializable INodeCollection - (getAllocator [_] nil) + (getAllocator [_] tree/node-create-weight-balanced) (getRoot [_] root) IOrderedCollection @@ -263,7 +263,7 @@ (isSimilar [_ _] false) IBalancedCollection - (getStitch [_] nil) + (getStitch [_] tree/node-stitch) clojure.lang.IMeta (meta [_] _meta) @@ -522,7 +522,7 @@ (defmethod print-method OrderedMultiset [^OrderedMultiset ms ^java.io.Writer w] (if (order/default-comparator? (.cmp ms)) - (do (.write w "#ordered/multiset [") + (do (.write w "#multi/set [") (when-let [s (seq ms)] (print-method (first s) w) (doseq [x (rest s)] diff --git a/src/ordered_collections/types/ordered_set.clj b/src/ordered_collections/types/ordered_set.clj index 3987920..0a3887c 100644 --- a/src/ordered_collections/types/ordered_set.clj +++ b/src/ordered_collections/types/ordered_set.clj @@ -197,7 +197,11 @@ java.util.List (indexOf [_ x] - (or (tree/node-rank root x cmp) -1)) + (or (cond + (identical? cmp order/long-compare) (tree/node-rank-long root (long x)) + (identical? cmp order/string-compare) (tree/node-rank-string root x) + :else (tree/node-rank root x cmp)) + -1)) (lastIndexOf [this x] (.indexOf this x)) @@ -205,7 +209,7 @@ (size [_] (tree/node-size root)) (iterator [this] - (clojure.lang.SeqIterator. (seq this))) + (tree/node-iterator root node/-k)) (containsAll [this s] (with-compare this (cond @@ -307,9 +311,9 @@ (identical? cmp order/string-compare) (tree/node-contains-string? root k) :else (tree/node-contains? 
root k cmp))) (disjoin [this k] - (new OrderedSet (tree/node-remove root k cmp tree/node-create-weight-balanced) cmp alloc stitch _meta)) + (new OrderedSet (tree/node-remove root k cmp alloc) cmp alloc stitch _meta)) (cons [this k] - (new OrderedSet (tree/node-add root k k cmp tree/node-create-weight-balanced) cmp alloc stitch _meta)) + (new OrderedSet (tree/node-add root k k cmp alloc) cmp alloc stitch _meta)) Object (toString [this] @@ -376,7 +380,11 @@ PRanked (rank-of [_ x] - (or (tree/node-rank root x cmp) -1)) + (or (cond + (identical? cmp order/long-compare) (tree/node-rank-long root (long x)) + (identical? cmp order/string-compare) (tree/node-rank-string root x) + :else (tree/node-rank root x cmp)) + -1)) (slice [_ start end] (let [n (tree/node-size root) start (max 0 (long start)) diff --git a/src/ordered_collections/types/priority_queue.clj b/src/ordered_collections/types/priority_queue.clj index 8c20444..05c0121 100644 --- a/src/ordered_collections/types/priority_queue.clj +++ b/src/ordered_collections/types/priority_queue.clj @@ -262,7 +262,7 @@ java.io.Serializable INodeCollection - (getAllocator [_] nil) + (getAllocator [_] tree/node-create-weight-balanced) (getRoot [_] root) IOrderedCollection @@ -273,7 +273,7 @@ (isSimilar [_ _] false) IBalancedCollection - (getStitch [_] nil) + (getStitch [_] tree/node-stitch) clojure.lang.IMeta (meta [_] _meta) @@ -533,7 +533,7 @@ (defmethod print-method PriorityQueue [^PriorityQueue pq ^java.io.Writer w] (if (order/default-comparator? 
(.cmp pq)) - (do (.write w "#ordered/priority-queue [") + (do (.write w "#priority/queue [") (when-let [s (seq pq)] (print-method (first s) w) (doseq [x (rest s)] diff --git a/src/ordered_collections/types/range_map.clj b/src/ordered_collections/types/range_map.clj index e25dde9..74ce311 100644 --- a/src/ordered_collections/types/range_map.clj +++ b/src/ordered_collections/types/range_map.clj @@ -376,11 +376,40 @@ ;; Constructor ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +(defn- sorted-disjoint? + "True iff every range in the sorted vector satisfies `lo < hi` AND the + ranges are non-overlapping. Ranges are half-open `[lo, hi)`, so + disjointness requires `prev-hi <= cur-lo`. Input with invalid or + overlapping ranges returns false so the caller falls through to the + general path, which throws on the invalid input (matching the + single-insert semantics of `assoc`)." + [sorted-entries] + (let [n (count sorted-entries)] + (loop [i (long 0) prev-hi nil] + (if (>= i n) + true + (let [[[lo hi] _] (nth sorted-entries i)] + (cond + ;; Invalid range: lo >= hi. + (not (neg? (clojure.core/compare lo hi))) + false + + ;; Overlap with predecessor: prev-hi > lo. + (and prev-hi (pos? (clojure.core/compare prev-hi lo))) + false + + :else (recur (unchecked-inc i) hi))))))) + (defn range-map "Create a range map from a collection of [range value] pairs. Ranges are [lo hi) (half-open, hi exclusive). + When the input is disjoint (no overlapping ranges), the tree is built + directly in O(n) via a balanced bottom-up construction, which is + substantially faster than carving per insert. Overlapping input falls + through to the general carving path, preserving 'later wins' semantics. + Example: (range-map {[0 10] :a [20 30] :b}) (range-map [[[0 10] :a] [[20 30] :b]])" @@ -388,10 +417,13 @@ (RangeMap. 
(node/leaf) range-compare {})) ([coll] (binding [order/*compare* range-compare] - (reduce - (fn [rm [rng v]] (range-map-assoc rm rng v false)) - (RangeMap. (node/leaf) range-compare {}) - coll)))) + (let [sorted (vec (sort-by (comp first first) coll))] + (if (sorted-disjoint? sorted) + (RangeMap. (tree/node-build-sorted sorted) range-compare {}) + (reduce + (fn [rm [rng v]] (range-map-assoc rm rng v false)) + (RangeMap. (node/leaf) range-compare {}) + coll)))))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Public API (delegates to protocol) @@ -449,7 +481,7 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defmethod print-method RangeMap [^RangeMap m ^java.io.Writer w] - (.write w "#ordered/range-map [") + (.write w "#range/map [") (let [s (seq m)] (when s (let [[k v] (first s)] diff --git a/src/ordered_collections/types/rope.clj b/src/ordered_collections/types/rope.clj index 278aba7..17e6637 100644 --- a/src/ordered_collections/types/rope.clj +++ b/src/ordered_collections/types/rope.clj @@ -4,15 +4,97 @@ (:require [clojure.core.protocols :as cp] [clojure.core.reducers :as r] [ordered-collections.protocol :as proto] - [ordered-collections.kernel.node :as node] + [ordered-collections.kernel.node :as node + :refer [leaf? -k -v -l -r]] + [ordered-collections.kernel.tree :as tree] [ordered-collections.kernel.rope :as ropetree]) (:import [clojure.lang RT Murmur3 MapEntry Indexed Util IPersistentCollection IPersistentStack IPersistentVector IEditableCollection ITransientCollection IReduce IReduceInit SeqIterator] + [ordered_collections.kernel.root INodeCollection] [java.util ArrayList])) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Constants & tree binding macro +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:const ^:private +target-chunk-size+ + "Generic Rope target chunk size (element count). 
Bound into the + kernel's `*target-chunk-size*` dynamic var via `with-tree`. Tuned + via `lein bench-rope-tuning`: at 100K+ elements, 1024 gives better + nth/reduce/concat than the historical 256 with only a small splice + regression (which is still ~6000x faster than PersistentVector)." + 1024) + +(def ^:const ^:private +min-chunk-size+ + "Generic Rope minimum internal chunk size (= target/2)." + 512) + +(def ^:const ^:private +flat-threshold+ + "Maximum element count stored in flat (raw PersistentVector) + representation. Below this, the rope skips the tree wrapper entirely + and holds the vector directly, eliminating the per-rope SimpleNode + header and one layer of indirection on every operation. Above the + threshold, the representation is transparently promoted to the + chunked tree form. Matches `+target-chunk-size+` — a rope small + enough to live in one chunk goes flat." + 1024) + +(defmacro ^:private with-tree + "Bind the kernel's dynamic rope context for generic Rope operations: + `tree/*t-join*` to the allocator, and the CSI target/min to the + generic-rope constants. Every tree-mutating operation must execute + inside this binding." + [alloc & body] + `(binding [tree/*t-join* ~alloc + ropetree/*target-chunk-size* +target-chunk-size+ + ropetree/*min-chunk-size* +min-chunk-size+] + ~@body)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Flat-mode helpers +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- flat? + "True when root is a raw PersistentVector (flat representation). + Also matches APersistentVector$SubVector, which rope-sub returns." + [root] + (instance? clojure.lang.APersistentVector root)) + +(defn- flat-size + "Size of a flat or tree root. Handles nil, flat vector, and tree nodes." + ^long [root] + (cond + (nil? root) 0 + (flat? 
root) (.count ^clojure.lang.Counted root) + :else (long (node/-v root)))) + +(defn- ensure-tree-root + "Promote a flat vector root to a tree root. Tree nodes are returned + unchanged. Caller must bind tree/*t-join* (and CSI vars)." + [root] + (cond + (nil? root) nil + (flat? root) (if (zero? (.count ^clojure.lang.Counted root)) + nil + (ropetree/chunks->root [root])) + :else root)) + +(defn- make-root + "Create a Rope root from a (flat) vector. Stays flat if ≤ threshold, + otherwise promotes to tree via str-like rechunking so internal + chunks satisfy CSI. Caller must bind tree/*t-join* for promotion." + [^clojure.lang.IPersistentVector v] + (let [n (.length v)] + (cond + (zero? n) nil + (<= n +flat-threshold+) v + :else (ropetree/coll->root v)))) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Helpers ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -103,39 +185,279 @@ (defn- linear-index-of "Linear scan for first index of x, or -1." [root x] - (let [n (ropetree/rope-size root) - result (ropetree/rope-reduce - (fn [^long i elem] - (if (Util/equiv elem x) - (reduced i) - (unchecked-inc i))) - (long 0) - root)] - (if (< (long result) n) (long result) -1))) + (cond + (nil? root) -1 + (flat? root) (.indexOf ^java.util.List root x) + :else + (let [n (ropetree/rope-size root) + result (ropetree/rope-reduce + (fn [^long i elem] + (if (Util/equiv elem x) + (reduced i) + (unchecked-inc i))) + (long 0) + root)] + (if (< (long result) n) (long result) -1)))) (defn- linear-last-index-of "Forward linear scan tracking the last matching index. O(n)." [root x] - (let [found (volatile! (long -1))] - (ropetree/rope-reduce - (fn [^long i elem] - (when (Util/equiv elem x) (vreset! found i)) - (unchecked-inc i)) - (long 0) - root) - (long @found))) + (cond + (nil? root) -1 + (flat? root) (.lastIndexOf ^java.util.List root x) + :else + (let [found (volatile! 
(long -1))] + (ropetree/rope-reduce + (fn [^long i elem] + (when (Util/equiv elem x) (vreset! found i)) + (unchecked-inc i)) + (long 0) + root) + (long @found)))) (defn- rope-to-array ^objects [root] - (let [n (ropetree/rope-size root) - arr (object-array n)] - (ropetree/rope-reduce - (fn [^long i x] - (aset arr i x) - (unchecked-inc i)) - (long 0) - root) - arr)) + (cond + (nil? root) (object-array 0) + (flat? root) (.toArray ^java.util.Collection root) + :else + (let [n (ropetree/rope-size root) + arr (object-array n)] + (ropetree/rope-reduce + (fn [^long i x] + (aset arr i x) + (unchecked-inc i)) + (long 0) + root) + arr))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; RopeSeq / RopeSeqReverse — direct seq types over vector chunks +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; These monomorphic seq types walk a rope tree's vector chunks using +;; `.count ^Counted` and `.nth ^Indexed`. They are specific to the generic +;; Rope (whose chunks are APersistentVector); the StringRope and ByteRope +;; carry their own specialized seq types with .charAt and aget fast paths. + +(defn- seq-equiv + "Element-wise sequential equivalence." + [s1 o] + (if-not (or (instance? clojure.lang.Sequential o) + (instance? java.util.List o)) + false + (loop [s1 (seq s1) s2 (seq o)] + (cond + (nil? s1) (nil? s2) + (nil? s2) false + (not (Util/equiv (first s1) (first s2))) false + :else (recur (next s1) (next s2)))))) + +(deftype RopeSeq [enum chunk ^long i cnt _meta] + clojure.lang.ISeq + (first [_] + (.nth ^clojure.lang.Indexed chunk (unchecked-int i))) + (next [_] + (let [next-cnt (when cnt (unchecked-dec-int cnt)) + next-i (unchecked-inc i)] + (if (< next-i (count chunk)) + (RopeSeq. enum chunk next-i next-cnt nil) + (when-let [e (tree/node-enum-rest enum)] + (let [chunk' (-k (tree/node-enum-first e))] + (RopeSeq. 
e chunk' 0 next-cnt nil)))))) + (more [this] + (or (.next this) ())) + (cons [this o] + (clojure.lang.Cons. o this)) + + clojure.lang.Seqable + (seq [this] this) + + clojure.lang.Sequential + + java.lang.Iterable + (iterator [this] + (SeqIterator. this)) + + clojure.lang.Counted + (count [_] + (or cnt + (loop [e enum + chunk chunk + i i + n 0] + (let [n (+ n (- (count chunk) i))] + (if-let [e' (tree/node-enum-rest e)] + (let [chunk' (-k (tree/node-enum-first e'))] + (recur e' chunk' 0 n)) + n))))) + + clojure.lang.IReduceInit + (reduce [_ f init] + (loop [e enum + chunk chunk + i i + acc init] + (let [acc (loop [idx i + acc acc] + (if (< idx (count chunk)) + (let [ret (f acc (.nth ^clojure.lang.Indexed chunk (unchecked-int idx)))] + (if (reduced? ret) + ret + (recur (unchecked-inc idx) ret))) + acc))] + (if (reduced? acc) + @acc + (if-let [e' (tree/node-enum-rest e)] + (let [chunk' (-k (tree/node-enum-first e'))] + (recur e' chunk' 0 acc)) + acc))))) + + clojure.lang.IReduce + (reduce [this f] + (if enum + (let [acc (.nth ^clojure.lang.Indexed chunk (unchecked-int i)) + next-i (unchecked-inc i)] + (if (< next-i (count chunk)) + (.reduce ^clojure.lang.IReduceInit + (RopeSeq. enum chunk next-i nil nil) f acc) + (if-let [e' (tree/node-enum-rest enum)] + (let [chunk' (-k (tree/node-enum-first e'))] + (.reduce ^clojure.lang.IReduceInit + (RopeSeq. e' chunk' 0 nil nil) f acc)) + acc))) + (f))) + + clojure.lang.IHashEq + (hasheq [this] + (Murmur3/hashOrdered this)) + + clojure.lang.IPersistentCollection + (empty [_] ()) + (equiv [this o] + (seq-equiv this o)) + + Object + (hashCode [this] + (Util/hash this)) + (equals [this o] + (Util/equals this o)) + + clojure.lang.IMeta + (meta [_] _meta) + + clojure.lang.IObj + (withMeta [_ m] + (RopeSeq. 
enum chunk i cnt m))) + +(deftype RopeSeqReverse [enum chunk ^long i cnt _meta] + clojure.lang.ISeq + (first [_] + (.nth ^clojure.lang.Indexed chunk (unchecked-int i))) + (next [_] + (let [next-cnt (when cnt (unchecked-dec-int cnt))] + (if (pos? i) + (RopeSeqReverse. enum chunk (unchecked-dec i) next-cnt nil) + (when-let [e (tree/node-enum-prior enum)] + (let [chunk' (-k (tree/node-enum-first e))] + (RopeSeqReverse. e chunk' (dec (count chunk')) next-cnt nil)))))) + (more [this] + (or (.next this) ())) + (cons [this o] + (clojure.lang.Cons. o this)) + + clojure.lang.Seqable + (seq [this] this) + + clojure.lang.Sequential + + java.lang.Iterable + (iterator [this] + (SeqIterator. this)) + + clojure.lang.Counted + (count [_] + (or cnt + (loop [e enum + chunk chunk + i i + n 0] + (let [n (+ n (inc i))] + (if-let [e' (tree/node-enum-prior e)] + (let [chunk' (-k (tree/node-enum-first e'))] + (recur e' chunk' (dec (count chunk')) n)) + n))))) + + clojure.lang.IReduceInit + (reduce [_ f init] + (loop [e enum + chunk chunk + i i + acc init] + (let [acc (loop [idx i + acc acc] + (if (neg? idx) + acc + (let [ret (f acc (.nth ^clojure.lang.Indexed chunk (unchecked-int idx)))] + (if (reduced? ret) + ret + (recur (unchecked-dec idx) ret)))))] + (if (reduced? acc) + @acc + (if-let [e' (tree/node-enum-prior e)] + (let [chunk' (-k (tree/node-enum-first e'))] + (recur e' chunk' (dec (count chunk')) acc)) + acc))))) + + clojure.lang.IReduce + (reduce [this f] + (if enum + (let [acc (.nth ^clojure.lang.Indexed chunk (unchecked-int i))] + (if (pos? i) + (.reduce ^clojure.lang.IReduceInit + (RopeSeqReverse. enum chunk (unchecked-dec i) nil nil) f acc) + (if-let [e' (tree/node-enum-prior enum)] + (let [chunk' (-k (tree/node-enum-first e'))] + (.reduce ^clojure.lang.IReduceInit + (RopeSeqReverse. 
e' chunk' (dec (count chunk')) nil nil) f acc)) + acc))) + (f))) + + clojure.lang.IHashEq + (hasheq [this] + (Murmur3/hashOrdered this)) + + clojure.lang.IPersistentCollection + (empty [_] ()) + (equiv [this o] + (seq-equiv this o)) + + Object + (hashCode [this] + (Util/hash this)) + (equals [this o] + (Util/equals this o)) + + clojure.lang.IMeta + (meta [_] _meta) + + clojure.lang.IObj + (withMeta [_ m] + (RopeSeqReverse. enum chunk i cnt m))) + +(defn- rope-seq + "Forward seq over a rope tree's elements." + [root] + (when-let [enum (tree/node-enumerator root)] + (let [chunk (-k (tree/node-enum-first enum))] + (RopeSeq. enum chunk 0 (ropetree/rope-size root) nil)))) + +(defn- rope-rseq + "Reverse seq over a rope tree's elements." + [root] + (when-let [enum (tree/node-enumerator-reverse root)] + (let [chunk (-k (tree/node-enum-first enum))] + (RopeSeqReverse. enum chunk (dec (count chunk)) (ropetree/rope-size root) nil)))) (declare ->TransientRope) @@ -152,29 +474,51 @@ ;; IPersistentVector for full vector-contract compatibility, including ;; indexed access, assoc, conj-to-end, peek/pop-right. -(deftype Rope [root _meta] +(deftype Rope [root alloc _meta] java.io.Serializable java.util.RandomAccess + INodeCollection + (getAllocator [_] alloc) + clojure.lang.IMeta (meta [_] _meta) clojure.lang.IObj - (withMeta [_ m] (Rope. root m)) + (withMeta [_ m] (Rope. root alloc m)) clojure.lang.Counted (count [_] - (ropetree/rope-size root)) + (flat-size root)) Indexed + ;; Monomorphic nth — inlines the tree walk so we can replace the kernel's + ;; protocol-dispatched `chunk-length` and `chunk-nth` with direct calls + ;; on `Counted` and `Indexed` interfaces. APersistentVector is the only + ;; chunk type for generic Rope, so the type is known statically. (nth [_ i] - (if (valid-index? (ropetree/rope-size root) i) - (ropetree/rope-nth root (long i)) - (throw (IndexOutOfBoundsException.)))) - (nth [_ i not-found] - (if (valid-index? 
(ropetree/rope-size root) i) - (ropetree/rope-nth root (long i)) + (let [ii (long i) + n (flat-size root)] + (when-not (valid-index? n ii) + (throw (IndexOutOfBoundsException.))) + (if (flat? root) + ;; Flat mode: direct .nth on the backing PersistentVector/SubVector. + (.nth ^clojure.lang.Indexed root (unchecked-int ii)) + ;; Tree mode: inline monomorphic walk using Counted + Indexed. + (loop [nd root, j ii] + (let [l (-l nd) + ls (if (leaf? l) 0 (long (-v l))) + ck (-k nd) + cs (long (.count ^clojure.lang.Counted ck)) + rs (+ ls cs)] + (cond + (< j ls) (recur l j) + (< j rs) (.nth ^clojure.lang.Indexed ck (unchecked-int (- j ls))) + :else (recur (-r nd) (- j rs)))))))) + (nth [this i not-found] + (if (and (integer? i) (valid-index? (flat-size root) (long i))) + (.nth this (int i)) not-found)) clojure.lang.ILookup @@ -198,49 +542,98 @@ clojure.lang.Associative (containsKey [_ k] - (valid-index? (ropetree/rope-size root) k)) + (valid-index? (flat-size root) k)) (entryAt [this k] (when (.containsKey this k) (MapEntry. k (.nth this k)))) (assoc [this k v] - (let [n (ropetree/rope-size root)] + (let [n (flat-size root) + i (long k)] (cond - (not (insert-index? n k)) + (not (insert-index? n i)) (throw (IndexOutOfBoundsException.)) - (= (long k) n) - (Rope. (ropetree/rope-conj-right root v) _meta) + (flat? root) + (let [^clojure.lang.IPersistentVector fv root] + (if (= i n) + ;; append + (if (< n +flat-threshold+) + (Rope. (.cons fv v) alloc _meta) + (with-tree alloc + (Rope. (ropetree/rope-conj-right (ensure-tree-root root) v) alloc _meta))) + ;; replace + (Rope. (.assocN fv (int i) v) alloc _meta))) :else - (Rope. (ropetree/rope-assoc root (long k) v) _meta)))) + (with-tree alloc + (if (= i n) + (Rope. (ropetree/rope-conj-right root v) alloc _meta) + (Rope. 
(ropetree/rope-assoc root (long i) v) alloc _meta)))))) IPersistentVector (assocN [this i v] (.assoc this i v)) (length [_] - (ropetree/rope-size root)) + (flat-size root)) IPersistentCollection (cons [_ o] - (Rope. (ropetree/rope-conj-right root o) _meta)) + (cond + (nil? root) + (Rope. [o] alloc _meta) + + (flat? root) + (let [^clojure.lang.IPersistentVector fv root + n (.length fv)] + (if (< n +flat-threshold+) + (Rope. (.cons fv o) alloc _meta) + (with-tree alloc + (Rope. (ropetree/rope-conj-right (ensure-tree-root root) o) alloc _meta)))) + + :else + (with-tree alloc + (Rope. (ropetree/rope-conj-right root o) alloc _meta)))) (empty [_] - (Rope. nil _meta)) + (Rope. nil alloc _meta)) (equiv [this o] (rope-equiv this o)) IPersistentStack (peek [_] - (ropetree/rope-peek-right root)) + (cond + (nil? root) nil + (flat? root) (let [^clojure.lang.IPersistentStack fv root] + (.peek fv)) + :else (ropetree/rope-peek-right root))) (pop [_] - (Rope. (ropetree/rope-pop-right root) _meta)) + (cond + (nil? root) + (throw (IllegalStateException. "Can't pop empty vector")) + + (flat? root) + (let [^clojure.lang.IPersistentVector fv root + n (.length fv)] + (if (<= n 1) + (Rope. nil alloc _meta) + (Rope. (.pop ^clojure.lang.IPersistentStack fv) alloc _meta))) + + :else + (with-tree alloc + (Rope. (ropetree/rope-pop-right root) alloc _meta)))) clojure.lang.Seqable (seq [_] - (ropetree/rope-seq root)) + (cond + (nil? root) nil + (flat? root) (seq root) + :else (rope-seq root))) clojure.lang.Reversible (rseq [_] - (ropetree/rope-rseq root)) + (cond + (nil? root) nil + (flat? root) (rseq ^clojure.lang.Reversible root) + :else (rope-rseq root))) clojure.lang.Sequential @@ -256,11 +649,19 @@ IReduceInit (reduce [_ f init] - (ropetree/rope-reduce f init root)) + (cond + (nil? root) init + ;; clojure.core/reduce dispatches via CollReduce/IReduceInit as + ;; appropriate — works for PersistentVector and SubVector alike. + (flat? 
root) (clojure.core/reduce f init root) + :else (ropetree/rope-reduce f init root))) IReduce (reduce [_ f] - (ropetree/rope-reduce f root)) + (cond + (nil? root) (f) + (flat? root) (clojure.core/reduce f root) + :else (ropetree/rope-reduce f root))) cp/CollReduce (coll-reduce [this f] @@ -270,7 +671,10 @@ r/CollFold (coll-fold [this n combinef reducef] - (ropetree/rope-fold root (long n) combinef reducef)) + (cond + (nil? root) (combinef) + (flat? root) (clojure.core/reduce reducef (combinef) root) + :else (ropetree/rope-fold root (long n) combinef reducef))) clojure.lang.IHashEq (hasheq [this] @@ -280,13 +684,13 @@ (toArray [_] (rope-to-array root)) (isEmpty [_] - (nil? root)) + (zero? (flat-size root))) (^boolean contains [_ x] (not (neg? (linear-index-of root x)))) (containsAll [this c] (every? #(.contains this %) c)) (size [_] - (ropetree/rope-size root)) + (int (flat-size root))) (add [_ _] (throw (UnsupportedOperationException.))) (addAll [_ _] @@ -301,10 +705,8 @@ (throw (UnsupportedOperationException.))) java.util.List - (get [_ i] - (if (valid-index? (ropetree/rope-size root) i) - (ropetree/rope-nth root (long i)) - (throw (IndexOutOfBoundsException.)))) + (get [this i] + (.nth this (int i))) (indexOf [_ x] (linear-index-of root x)) (lastIndexOf [_ x] @@ -316,40 +718,182 @@ proto/PRope (rope-cat [this other] - (Rope. (ropetree/rope-concat root (.-root ^Rope other)) _meta)) + (when-not (instance? Rope other) + (throw (IllegalArgumentException. "Rope rope-cat requires a Rope"))) + (let [other-root (.-root ^Rope other) + fv1 (when (flat? root) root) + fv2 (when (flat? other-root) other-root)] + (cond + ;; Both empty + (and (nil? root) (nil? other-root)) + (Rope. nil alloc _meta) + + ;; Other empty + (nil? other-root) this + + ;; This empty + (nil? root) (Rope. other-root alloc _meta) + + ;; Fast path: both flat and combined fits + (and fv1 fv2 + (<= (+ (count fv1) (count fv2)) +flat-threshold+)) + (Rope. 
(into fv1 fv2) alloc _meta) + + :else + (with-tree alloc + (Rope. (ropetree/rope-concat (ensure-tree-root root) + (ensure-tree-root other-root)) + alloc _meta))))) (rope-split [_ i] - (check-insert-index! (ropetree/rope-size root) i) - (let [[l r] (ropetree/ensure-split-parts - (ropetree/rope-split-at root (long i)))] - [(Rope. l _meta) (Rope. r _meta)])) + (let [n (flat-size root)] + (check-insert-index! n i) + (let [ii (long i)] + (cond + (nil? root) + [(Rope. nil alloc _meta) (Rope. nil alloc _meta)] + + (flat? root) + (let [^clojure.lang.IPersistentVector fv root] + [(Rope. (when (pos? ii) (subvec fv 0 (int ii))) alloc _meta) + (Rope. (when (< ii n) (subvec fv (int ii) (int n))) alloc _meta)]) + + :else + (with-tree alloc + (let [[l r] (ropetree/ensure-split-parts + (ropetree/rope-split-at root ii))] + [(Rope. l alloc _meta) (Rope. r alloc _meta)])))))) (rope-sub [_ start end] - (let [n (ropetree/rope-size root)] + (let [n (flat-size root)] (check-range! start end n) - (Rope. (ropetree/rope-subvec-root root (long start) (long end)) _meta))) + (let [si (long start) + ei (long end)] + (cond + (or (nil? root) (= si ei)) + (Rope. nil alloc _meta) + + (flat? root) + (let [^clojure.lang.IPersistentVector fv root] + (Rope. (subvec fv (int si) (int ei)) alloc _meta)) + + :else + (with-tree alloc + (Rope. (ropetree/rope-subvec-root root si ei) alloc _meta)))))) (rope-insert [this i coll] - (let [n (ropetree/rope-size root)] + (let [n (flat-size root)] (check-insert-index! n i) - (let [mid (->rope coll)] - (Rope. (ropetree/rope-insert-root root (long i) (.-root ^Rope mid)) _meta)))) + (let [ii (long i) + ins (if (instance? clojure.lang.APersistentVector coll) coll (vec coll)) + ins-n (.count ^clojure.lang.Counted ins)] + (cond + (zero? ins-n) this + + ;; Flat fast path + (flat? 
root) + (let [^clojure.lang.IPersistentVector fv root + total (+ n ins-n)] + (if (<= total +flat-threshold+) + ;; Build new flat vector directly + (let [new-v (-> [] + (into (subvec fv 0 (int ii))) + (into ins) + (into (subvec fv (int ii) (int n))))] + (Rope. new-v alloc _meta)) + ;; Promote to tree + (with-tree alloc + (Rope. (ropetree/rope-insert-root + (ensure-tree-root root) ii + (ensure-tree-root ins)) + alloc _meta)))) + + :else + (or (when (and (<= ins-n +target-chunk-size+) (pos? ins-n)) + (when-let [new-root (ropetree/rope-splice-inplace + root ii ii ins alloc)] + (Rope. new-root alloc _meta))) + (with-tree alloc + (Rope. (ropetree/rope-insert-root root ii + (ensure-tree-root ins)) + alloc _meta))))))) (rope-remove [this start end] - (check-range! start end (ropetree/rope-size root)) - (let [start (long start) - end (long end)] - (Rope. (ropetree/rope-remove-root root start end) _meta))) + (let [n (flat-size root)] + (check-range! start end n) + (let [si (long start) + ei (long end)] + (cond + (= si ei) this + + (flat? root) + (let [^clojure.lang.IPersistentVector fv root + new-v (into (subvec fv 0 (int si)) (subvec fv (int ei) (int n)))] + (Rope. (when (pos? (count new-v)) new-v) alloc _meta)) + + :else + (or (when-let [new-root (ropetree/rope-splice-inplace + root si ei nil alloc)] + (Rope. new-root alloc _meta)) + (with-tree alloc + (Rope. (ropetree/rope-remove-root root si ei) alloc _meta))))))) (rope-splice [this start end coll] - (check-range! start end (ropetree/rope-size root)) - (let [start (long start) - end (long end) - mid (->rope coll)] - (Rope. (ropetree/rope-splice-root root start end (.-root ^Rope mid)) _meta))) + (let [n (flat-size root)] + (check-range! start end n) + (let [si (long start) + ei (long end) + ins (if (instance? clojure.lang.APersistentVector coll) coll (vec coll)) + ins-n (.count ^clojure.lang.Counted ins)] + (cond + (flat? 
root) + (let [^clojure.lang.IPersistentVector fv root + total (+ (- n (- ei si)) ins-n)] + (if (<= total +flat-threshold+) + (let [new-v (-> [] + (into (subvec fv 0 (int si))) + (into ins) + (into (subvec fv (int ei) (int n))))] + (Rope. (when (pos? (count new-v)) new-v) alloc _meta)) + (with-tree alloc + (Rope. (ropetree/rope-splice-root + (ensure-tree-root root) si ei + (ensure-tree-root ins)) + alloc _meta)))) + + (nil? root) + (if (zero? ins-n) + this + (with-tree alloc + (Rope. (make-root ins) alloc _meta))) + + :else + (or (when (<= ins-n +target-chunk-size+) + (let [rep-chunk (when (pos? ins-n) ins)] + (when-let [new-root (ropetree/rope-splice-inplace + root si ei rep-chunk alloc)] + (Rope. new-root alloc _meta)))) + (with-tree alloc + (Rope. (ropetree/rope-splice-root root si ei + (ensure-tree-root ins)) + alloc _meta))))))) (rope-chunks [_] - (ropetree/rope-chunks-seq root)) + (cond + (nil? root) nil + (flat? root) (list root) + :else (ropetree/rope-chunks-seq root))) (rope-str [_] - (ropetree/rope->str root)) + (cond + (nil? root) "" + (flat? root) + (let [sb (StringBuilder.)] + (run! #(.append sb %) root) + (.toString sb)) + :else + (ropetree/rope->str root))) IEditableCollection (asTransient [_] - (->TransientRope root (ArrayList.) (ArrayList.) 0 true _meta)) + ;; TransientRope's internal machinery (rope-size, rope-nth, etc.) + ;; expects a tree root, so promote any flat root on the way in. + (->TransientRope + (with-tree alloc (ensure-tree-root root)) + alloc (ArrayList.) (ArrayList.) 0 true _meta)) Object (hashCode [this] @@ -361,11 +905,48 @@ (defn- ->rope - "Coerce x to a Rope, returning x if already a Rope." + "Coerce x to a Rope, returning x if already a Rope. Small inputs + stay in flat form (raw vector); larger inputs build a tree." [x] (if (instance? Rope x) x - (Rope. (ropetree/coll->root x) {}))) + (let [v (if (instance? clojure.lang.APersistentVector x) x (vec x)) + n (.count ^clojure.lang.Counted v)] + (cond + (zero? n) + (Rope. 
nil ropetree/rope-node-create {}) + + (<= n +flat-threshold+) + (Rope. v ropetree/rope-node-create {}) + + :else + (with-tree ropetree/rope-node-create + (Rope. (ropetree/coll->root v) ropetree/rope-node-create {})))))) + +(defn- ->tree-root + "Coerce `x` to a tree root suitable for kernel operations. + Caller must bind tree/*t-join* (and CSI vars). + - `nil` → `nil` + - `Rope` with flat → single-node tree from the flat vector + - `Rope` with tree → that tree + - anything else → promoted via coll->root" + [x] + (cond + (nil? x) nil + + (instance? Rope x) + (let [rt (.-root ^Rope x)] + (cond + (nil? rt) nil + (flat? rt) (if (zero? (count rt)) + nil + (ropetree/chunks->root [rt])) + :else rt)) + + :else + (ropetree/coll->root (if (instance? clojure.lang.APersistentVector x) + x + (vec x))))) (defn rope-concat "Concatenate ropes or rope-coercible collections. @@ -377,12 +958,20 @@ ([left right] (proto/rope-cat (->rope left) (->rope right))) ([left right & more] - (Rope. (ropetree/chunks->root-csi - (into [] (mapcat (comp ropetree/root->chunks rope-root)) - (list* left right more))) - (or (meta left) {})))) + (with-tree ropetree/rope-node-create + (Rope. (ropetree/chunks->root-csi + (into [] + (mapcat (fn [x] + (let [rt (->tree-root x)] + (if rt + (ropetree/root->chunks rt) + [])))) + (list* left right more))) + ropetree/rope-node-create + (or (meta left) {}))))) (defn- transient-appended-root + "Build a rope tree from flushed chunks + tail. Caller must bind *t-join*." [^ArrayList chunks ^ArrayList tail] (let [chunk-count (.size chunks) tail-empty? (.isEmpty tail)] @@ -404,6 +993,7 @@ 4) (defn- transient-final-root + "Merge original root with appended chunks/tail. Caller must bind *t-join*." [root ^ArrayList chunks ^ArrayList tail] (cond ;; Fast path: nothing appended — return original root unchanged @@ -465,6 +1055,7 @@ ;; rope API. 
(deftype TransientRope [^:unsynchronized-mutable root + alloc ^ArrayList chunks ^ArrayList tail ^:unsynchronized-mutable chunk-elems @@ -474,16 +1065,24 @@ (conj [this x] (when-not edit (throw (IllegalAccessError. "Transient used after persistent! call"))) (.add tail x) - (when (>= (.size tail) ropetree/+target-chunk-size+) + (when (>= (.size tail) +target-chunk-size+) (.add chunks (vec tail)) - (set! chunk-elems (+ chunk-elems ropetree/+target-chunk-size+)) + (set! chunk-elems (+ chunk-elems +target-chunk-size+)) (.clear tail)) this) (persistent [_] (when-not edit (throw (IllegalAccessError. "Transient used after persistent! call"))) (set! edit false) - (Rope. (transient-final-root root chunks tail) _meta)) + (with-tree alloc + (let [tree-root (transient-final-root root chunks tail) + ;; Demote to flat if the result is small enough + final-root (if (and tree-root + (not (flat? tree-root)) + (<= (ropetree/rope-size tree-root) +flat-threshold+)) + (vec (mapcat identity (ropetree/root->chunks tree-root))) + tree-root)] + (Rope. final-root alloc _meta)))) clojure.lang.Counted (count [_] @@ -497,8 +1096,8 @@ (cond (and (>= i 0) (< i rs)) (ropetree/rope-nth root i) (and (>= j 0) (< j chunk-elems)) - (let [chunk-idx (quot j ropetree/+target-chunk-size+) - offset (rem j ropetree/+target-chunk-size+) + (let [chunk-idx (quot j +target-chunk-size+) + offset (rem j +target-chunk-size+) chunk (.get chunks chunk-idx)] (.nth ^clojure.lang.Indexed chunk offset)) @@ -517,35 +1116,49 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defn rope - ([] (Rope. nil {})) + "Create a persistent rope from a collection. + Small inputs (≤ 1024 elements) are stored as a raw PersistentVector + root with zero tree overhead. Larger inputs build the chunked tree." + ([] (Rope. nil ropetree/rope-node-create {})) ([coll] - (Rope. (ropetree/coll->root coll) {}))) - -(defn- rope-root - [x] - (if (instance? 
Rope x) - (.-root ^Rope x) - (ropetree/coll->root x))) + (->rope coll))) (defn rope-concat-all "Bulk concatenation of rope values or rope-coercible collections. Collects all chunks and builds the tree directly in O(total chunks), avoiding pairwise tree operations." [& xs] - (Rope. (ropetree/chunks->root-csi - (into [] (mapcat (comp ropetree/root->chunks rope-root)) xs)) - (or (meta (first xs)) {}))) + (with-tree ropetree/rope-node-create + (Rope. (ropetree/chunks->root-csi + (into [] + (mapcat (fn [x] + (let [rt (->tree-root x)] + (if rt + (ropetree/root->chunks rt) + [])))) + xs)) + ropetree/rope-node-create + (or (meta (first xs)) {})))) (defn rope-chunks-reverse "Reverse seq of internal chunk vectors." [v] - (ropetree/rope-chunks-rseq (rope-root v))) + (let [^Rope r (->rope v) + root (.-root r)] + (cond + (nil? root) nil + (flat? root) (list root) + :else (ropetree/rope-chunks-rseq root)))) (defn rope-chunk-count "Number of internal chunks. O(1)." [v] - (let [root (rope-root v)] - (if (nil? root) 0 (ropetree/chunk-count root)))) + (let [^Rope r (->rope v) + root (.-root r)] + (cond + (nil? root) 0 + (flat? root) 1 + :else (ropetree/chunk-count root)))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -553,5 +1166,5 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defmethod print-method Rope [^Rope r ^java.io.Writer w] - (.write w "#ordered/rope ") + (.write w "#vec/rope ") (print-method (into [] r) w)) diff --git a/src/ordered_collections/types/string_rope.clj b/src/ordered_collections/types/string_rope.clj new file mode 100644 index 0000000..3b98b08 --- /dev/null +++ b/src/ordered_collections/types/string_rope.clj @@ -0,0 +1,1132 @@ +(ns ordered-collections.types.string-rope + "Persistent string rope optimized for structural text editing. + Backed by a chunked weight-balanced tree with String chunks. + O(log n) concat, split, splice, insert, and remove. 
+ Implements CharSequence for seamless Java interop. + + Small strings (≤ +flat-threshold+ chars) are stored as a raw String + internally, giving String-equivalent performance on read operations. + When edits grow the content past the threshold, the representation + is transparently promoted to the chunked tree form." + (:require [clojure.core.protocols :as cp] + [clojure.core.reducers :as r] + [ordered-collections.protocol :as proto] + [ordered-collections.kernel.node :as node + :refer [leaf leaf? -k -v -l -r]] + [ordered-collections.kernel.tree :as tree] + [ordered-collections.kernel.rope :as ropetree]) + (:import [clojure.lang RT Murmur3 Util SeqIterator + IReduce IReduceInit + IEditableCollection ITransientCollection] + [ordered_collections.kernel.root INodeCollection] + [java.util ArrayList])) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Constants +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:const ^:private +flat-threshold+ + "Maximum string length stored in flat (raw String) representation. + Above this, the rope promotes to chunked tree form. Set to match + the crossover point where tree edits outperform StringBuilder." + 1024) + +(def ^:const ^:private +target-chunk-size+ + "StringRope target chunk size in characters. Bound into the kernel's + `*target-chunk-size*` dynamic var via `with-tree`. Tuned via + `lein bench-rope-tuning`: 1024 wins on every operation at N=500K + vs the historical 256, because JEP 254 compact strings make larger + String chunks proportionally cheaper than vector chunks." + 1024) + +(def ^:const ^:private +min-chunk-size+ + "StringRope minimum internal chunk size (= target/2)." 
+ 512) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Tree binding macro +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defmacro ^:private with-tree + "Bind the kernel's dynamic rope context for StringRope operations: + `tree/*t-join*` to the allocator, and the CSI target/min to the + StringRope-specific constants. Every tree-mutating operation must + execute inside this binding." + [alloc & body] + `(binding [tree/*t-join* ~alloc + ropetree/*target-chunk-size* +target-chunk-size+ + ropetree/*min-chunk-size* +min-chunk-size+] + ~@body)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Flat-mode helpers +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- flat? + "True when root is a raw String (flat representation)." + [root] + (string? root)) + +(defn- flat-size + "Size of a flat or tree root. Handles nil, String, and tree nodes." + ^long [root] + (cond + (nil? root) 0 + (string? root) (.length ^String root) + :else (long (-v root)))) + +(defn- ensure-tree-root + "Promote a flat String root to a tree root. Returns tree nodes unchanged. + Caller must bind tree/*t-join*." + [root alloc] + (cond + (nil? root) nil + (string? root) (ropetree/str->root ^String root) + :else root)) + +(defn- flat-splice + "StringBuilder-based splice on a flat String. Returns a String." + ^String [^String s ^long start ^long end ^String rep] + (let [si (int start) + ei (int end) + rlen (int (if rep (.length rep) 0)) + sb (StringBuilder. (+ (.length s) rlen si (- ei)))] + (.append sb s 0 si) + (when rep (.append sb rep)) + (.append sb s ei (.length s)) + (.toString sb))) + +(defn- make-root + "Create a StringRope root from a String result. Stays flat if ≤ threshold, + otherwise promotes to tree. Caller must bind tree/*t-join* for promotion." + [^String s alloc] + (cond + (zero?
(.length s)) nil + (<= (.length s) +flat-threshold+) s + :else (ropetree/str->root s))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Helpers +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- valid-index? + [^long n ^long k] + (and (<= 0 k) (< k n))) + +(defn- insert-index? + [^long n ^long k] + (and (<= 0 k) (<= k n))) + +(defn- check-index! + [^long n ^long k] + (when-not (valid-index? n k) + (throw (IndexOutOfBoundsException.)))) + +(defn- check-insert-index! + [^long n ^long k] + (when-not (insert-index? n k) + (throw (IndexOutOfBoundsException.)))) + +(defn- check-range! + [^long start ^long end ^long n] + (when (or (neg? start) (neg? end) (> start end) (> end n)) + (throw (IndexOutOfBoundsException.)))) + +(defn- safe-split-index + "Adjust split index to avoid splitting a UTF-16 surrogate pair. + If i lands between a high and low surrogate, moves back one position." + ^long [^String s ^long i] + (if (and (pos? i) (< i (.length s)) + (Character/isHighSurrogate (.charAt s (unchecked-int (dec i))))) + (dec i) + i)) + +(declare ->string-rope ->TransientStringRope ->StringRope* + coll->str coll->tree-root) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Monomorphic tree reduce (same rationale as byte_rope.clj) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- string-rope-tree-reduce + "Reduce `f` over every character in the string-rope tree rooted at `n`. + Monomorphic — assumes chunks are String. Returns acc or Reduced." + [f acc n] + (if (leaf? n) + acc + (let [l (-l n) + acc-left (if (leaf? l) acc (string-rope-tree-reduce f acc l))] + (if (reduced? acc-left) + acc-left + (let [^String ck (-k n) + len (.length ck) + acc-chunk + (loop [i (int 0), acc acc-left] + (if (< i len) + (let [ret (f acc (Character/valueOf (.charAt ck (unchecked-int i))))] + (if (reduced? 
ret) + ret + (recur (unchecked-inc-int i) ret))) + acc))] + (if (reduced? acc-chunk) + acc-chunk + (string-rope-tree-reduce f acc-chunk (-r n)))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; StringRopeSeq — forward seq over String chunks using .charAt +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- seq-equiv + [s1 o] + (if-not (or (instance? clojure.lang.Sequential o) + (instance? java.util.List o)) + false + (loop [s1 (seq s1) s2 (seq o)] + (cond + (nil? s1) (nil? s2) + (nil? s2) false + (not= (first s1) (first s2)) false + :else (recur (next s1) (next s2)))))) + +(deftype StringRopeSeq [enum ^String chunk ^long i cnt _meta] + clojure.lang.ISeq + (first [_] + (Character/valueOf (.charAt chunk (unchecked-int i)))) + (next [_] + (let [next-cnt (when cnt (unchecked-dec-int cnt)) + next-i (unchecked-inc i)] + (if (< next-i (.length chunk)) + (StringRopeSeq. enum chunk next-i next-cnt nil) + (when-let [e (tree/node-enum-rest enum)] + (let [chunk' ^String (-k (tree/node-enum-first e))] + (StringRopeSeq. e chunk' 0 next-cnt nil)))))) + (more [this] + (or (.next this) ())) + (cons [this o] + (clojure.lang.Cons. o this)) + + clojure.lang.Seqable + (seq [this] this) + + clojure.lang.Sequential + + java.lang.Iterable + (iterator [this] + (SeqIterator. this)) + + clojure.lang.Counted + (count [_] + (or cnt + (loop [e enum + ^String chunk chunk + i i + n 0] + (let [n (+ n (- (.length chunk) i))] + (if-let [e' (tree/node-enum-rest e)] + (let [chunk' ^String (-k (tree/node-enum-first e'))] + (recur e' chunk' 0 n)) + n))))) + + IReduceInit + (reduce [_ f init] + (loop [e enum + ^String chunk chunk + i (int i) + acc init] + (let [acc (loop [idx i acc acc] + (if (< idx (.length chunk)) + (let [ret (f acc (Character/valueOf (.charAt chunk (unchecked-int idx))))] + (if (reduced? ret) + ret + (recur (unchecked-inc-int idx) ret))) + acc))] + (if (reduced? 
acc) + @acc + (if-let [e' (tree/node-enum-rest e)] + (let [chunk' ^String (-k (tree/node-enum-first e'))] + (recur e' chunk' (int 0) acc)) + acc))))) + + IReduce + (reduce [this f] + (let [acc (Character/valueOf (.charAt chunk (unchecked-int i))) + next-i (unchecked-inc i)] + (if (< next-i (.length chunk)) + (.reduce ^IReduceInit + (StringRopeSeq. enum chunk next-i nil nil) f acc) + (if-let [e' (and enum (tree/node-enum-rest enum))] + (let [chunk' ^String (-k (tree/node-enum-first e'))] + (.reduce ^IReduceInit + (StringRopeSeq. e' chunk' 0 nil nil) f acc)) + acc)))) + + clojure.lang.IHashEq + (hasheq [this] + (Murmur3/hashOrdered this)) + + clojure.lang.IPersistentCollection + (empty [_] ()) + (equiv [this o] + (seq-equiv this o)) + + Object + (hashCode [this] + (Util/hash this)) + (equals [this o] + (Util/equals this o)) + + clojure.lang.IMeta + (meta [_] _meta) + + clojure.lang.IObj + (withMeta [_ m] + (StringRopeSeq. enum chunk i cnt m))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; StringRopeSeqReverse — reverse seq over String chunks using .charAt +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftype StringRopeSeqReverse [enum ^String chunk ^long i cnt _meta] + clojure.lang.ISeq + (first [_] + (Character/valueOf (.charAt chunk (unchecked-int i)))) + (next [_] + (let [next-cnt (when cnt (unchecked-dec-int cnt))] + (if (pos? i) + (StringRopeSeqReverse. enum chunk (unchecked-dec i) next-cnt nil) + (when-let [e (tree/node-enum-prior enum)] + (let [chunk' ^String (-k (tree/node-enum-first e))] + (StringRopeSeqReverse. e chunk' (dec (.length chunk')) next-cnt nil)))))) + (more [this] + (or (.next this) ())) + (cons [this o] + (clojure.lang.Cons. o this)) + + clojure.lang.Seqable + (seq [this] this) + + clojure.lang.Sequential + + java.lang.Iterable + (iterator [this] + (SeqIterator. 
this)) + + clojure.lang.Counted + (count [_] + (or cnt + (loop [e enum + ^String chunk chunk + i i + n 0] + (let [n (+ n (inc i))] + (if-let [e' (tree/node-enum-prior e)] + (let [chunk' ^String (-k (tree/node-enum-first e'))] + (recur e' chunk' (dec (.length chunk')) n)) + n))))) + + IReduceInit + (reduce [_ f init] + (loop [e enum + ^String chunk chunk + i (int i) + acc init] + (let [acc (loop [idx i acc acc] + (if (neg? idx) + acc + (let [ret (f acc (Character/valueOf (.charAt chunk (unchecked-int idx))))] + (if (reduced? ret) + ret + (recur (unchecked-dec-int idx) ret)))))] + (if (reduced? acc) + @acc + (if-let [e' (tree/node-enum-prior e)] + (let [chunk' ^String (-k (tree/node-enum-first e'))] + (recur e' chunk' (int (dec (.length chunk'))) acc)) + acc))))) + + IReduce + (reduce [this f] + (let [acc (Character/valueOf (.charAt chunk (unchecked-int i)))] + (if (pos? i) + (.reduce ^IReduceInit + (StringRopeSeqReverse. enum chunk (unchecked-dec i) nil nil) f acc) + (if-let [e' (and enum (tree/node-enum-prior enum))] + (let [chunk' ^String (-k (tree/node-enum-first e'))] + (.reduce ^IReduceInit + (StringRopeSeqReverse. e' chunk' (dec (.length chunk')) nil nil) f acc)) + acc)))) + + clojure.lang.IHashEq + (hasheq [this] + (Murmur3/hashOrdered this)) + + clojure.lang.IPersistentCollection + (empty [_] ()) + (equiv [this o] + (seq-equiv this o)) + + Object + (hashCode [this] + (Util/hash this)) + (equals [this o] + (Util/equals this o)) + + clojure.lang.IMeta + (meta [_] _meta) + + clojure.lang.IObj + (withMeta [_ m] + (StringRopeSeqReverse. enum chunk i cnt m))) + + +(defn- string-rope-seq + [root] + (cond + (nil? root) nil + (string? root) + (let [^String s root] + (when (pos? (.length s)) + ;; Flat mode: use the whole string as a single chunk with nil enum. + ;; node-enum-rest handles nil safely, returning nil at end of chunk. + (StringRopeSeq. 
nil s 0 (.length s) nil))) + :else + (when-let [enum (tree/node-enumerator root)] + (let [chunk ^String (-k (tree/node-enum-first enum))] + (StringRopeSeq. enum chunk 0 (ropetree/rope-size root) nil))))) + +(defn- string-rope-rseq + [root] + (cond + (nil? root) nil + (string? root) + (let [^String s root] + (when (pos? (.length s)) + (StringRopeSeqReverse. nil s (dec (.length s)) (.length s) nil))) + :else + (when-let [enum (tree/node-enumerator-reverse root)] + (let [chunk ^String (-k (tree/node-enum-first enum))] + (StringRopeSeqReverse. enum chunk (dec (.length chunk)) + (ropetree/rope-size root) nil))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Equality / Hashing +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- string-rope-equiv + "CharSequence-based equality. StringRope equals: + - another StringRope with same content + - a java.lang.String with same content + - any CharSequence with same content + Does NOT equal a generic Rope of Characters." + [this o] + (cond + (identical? this o) true + + (instance? CharSequence o) + (let [^CharSequence cs1 this + ^CharSequence cs2 o + n (.length cs1)] + (and (= n (.length cs2)) + (loop [i 0] + (if (= i n) + true + (if (= (.charAt cs1 i) (.charAt cs2 i)) + (recur (unchecked-inc i)) + false))))) + + :else false)) + +(defn- string-rope-hasheq + "Hash compatible with String's hasheq for value equality. + Clojure strings use Murmur3/hashInt on String.hashCode." + [root] + (if (string? 
root) + (Murmur3/hashInt (.hashCode ^String root)) + (Murmur3/hashInt (.hashCode (ropetree/rope->str root))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; StringRope +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftype StringRope [root alloc _meta] + + java.io.Serializable + + INodeCollection + (getAllocator [_] alloc) + + clojure.lang.IMeta + (meta [_] _meta) + + clojure.lang.IObj + (withMeta [_ m] (->StringRope* root alloc m)) + + java.lang.CharSequence + ;; Monomorphic charAt — inlines the tree walk with direct `.length`/`.charAt` + ;; on the known String chunks, bypassing PRopeChunk protocol dispatch. + (charAt [_ i] + (let [ii (long i) + n (flat-size root)] + (when-not (valid-index? n ii) + (throw (StringIndexOutOfBoundsException. (int i)))) + (if (string? root) + (.charAt ^String root (int i)) + (loop [nd root, j ii] + (let [l (-l nd) + ls (if (leaf? l) 0 (long (-v l))) + ^String ck (-k nd) + cs (long (.length ck)) + rs (+ ls cs)] + (cond + (< j ls) (recur l j) + (< j rs) (.charAt ck (unchecked-int (- j ls))) + :else (recur (-r nd) (- j rs)))))))) + (length [_] + (flat-size root)) + (subSequence [_ start end] + (if (string? root) + (let [^String s root + n (.length s)] + (check-range! (long start) (long end) n) + (let [sub (.substring s (int start) (int end))] + (->StringRope* (when (pos? (.length sub)) sub) alloc _meta))) + (let [n (ropetree/rope-size root)] + (check-range! (long start) (long end) n) + (with-tree alloc + (->StringRope* (ropetree/rope-subvec-root root (long start) (long end)) + alloc _meta))))) + (toString [_] + (cond + (nil? root) "" + (string? root) root + :else (ropetree/rope->str root))) + + java.lang.Comparable + (compareTo [this o] + (if (identical? this o) + 0 + (let [^CharSequence cs (if (instance? 
CharSequence o) + o + (str o)) + n1 (.length ^CharSequence this) + n2 (.length cs)] + (loop [i 0] + (cond + (and (= i n1) (= i n2)) 0 + (= i n1) -1 + (= i n2) 1 + :else + (let [c (compare (.charAt ^CharSequence this i) (.charAt cs i))] + (if (zero? c) + (recur (unchecked-inc i)) + c))))))) + + clojure.lang.Counted + (count [_] + (flat-size root)) + + clojure.lang.Indexed + ;; Monomorphic nth — same tree walk as charAt, returns Character (boxed) + ;; per the Indexed contract. + (nth [_ i] + (let [ii (long i) + n (flat-size root)] + (when-not (valid-index? n ii) + (throw (IndexOutOfBoundsException.))) + (if (string? root) + (Character/valueOf (.charAt ^String root (unchecked-int ii))) + (loop [nd root, j ii] + (let [l (-l nd) + ls (if (leaf? l) 0 (long (-v l))) + ^String ck (-k nd) + cs (long (.length ck)) + rs (+ ls cs)] + (cond + (< j ls) (recur l j) + (< j rs) (Character/valueOf (.charAt ck (unchecked-int (- j ls)))) + :else (recur (-r nd) (- j rs)))))))) + (nth [this i not-found] + (if (and (integer? i) (valid-index? (flat-size root) (long i))) + (.nth this (int i)) + not-found)) + + clojure.lang.ILookup + (valAt [this k] + (if (integer? k) + (.nth this (int k) nil) + nil)) + (valAt [this k not-found] + (if (integer? k) + (.nth this (int k) not-found) + not-found)) + + clojure.lang.IFn + (invoke [this k] + (.valAt this k)) + (invoke [this k not-found] + (.valAt this k not-found)) + (applyTo [this args] + (let [n (RT/boundedLength args 2)] + (case n + 0 (throw (clojure.lang.ArityException. n (.. this getClass getSimpleName))) + 1 (.invoke this (first args)) + 2 (.invoke this (first args) (second args)) + (throw (clojure.lang.ArityException. n (.. this getClass getSimpleName)))))) + + clojure.lang.IPersistentCollection + (cons [_ o] + (if (string? root) + (let [^String s root + c (char o)] + (if (< (.length s) +flat-threshold+) + (let [sb (StringBuilder. 
(unchecked-inc-int (.length s)))] + (.append sb s) + (.append sb c) + (->StringRope* (.toString sb) alloc _meta)) + (with-tree alloc + (->StringRope* (ropetree/rope-conj-right (ropetree/str->root s) c) + alloc _meta)))) + (if (nil? root) + (->StringRope* (String/valueOf (char o)) alloc _meta) + (with-tree alloc + (->StringRope* (ropetree/rope-conj-right root (char o)) alloc _meta))))) + (empty [_] + (->StringRope* nil alloc _meta)) + (equiv [this o] + (string-rope-equiv this o)) + + clojure.lang.IPersistentStack + (peek [_] + (cond + (nil? root) nil + (string? root) (let [^String s root] + (when (pos? (.length s)) + (Character/valueOf (.charAt s (unchecked-dec-int (.length s)))))) + :else (let [c (ropetree/rope-peek-right root)] + (if (instance? Character c) c (Character/valueOf (char c)))))) + (pop [_] + (if (string? root) + (let [^String s root + n (.length s)] + (cond + (<= n 1) (->StringRope* nil alloc _meta) + :else (->StringRope* (.substring s 0 (unchecked-dec-int n)) alloc _meta))) + (with-tree alloc + (->StringRope* (ropetree/rope-pop-right root) alloc _meta)))) + + clojure.lang.Seqable + (seq [_] + (string-rope-seq root)) + + clojure.lang.Reversible + (rseq [_] + (string-rope-rseq root)) + + clojure.lang.Sequential + + java.lang.Iterable + (iterator [this] + (SeqIterator. (seq this))) + + IReduceInit + (reduce [_ f init] + (if (string? root) + (let [^String s root + len (.length s)] + (loop [i (int 0), acc init] + (if (< i len) + (let [ret (f acc (Character/valueOf (.charAt s (unchecked-int i))))] + (if (reduced? ret) + @ret + (recur (unchecked-inc-int i) ret))) + acc))) + (let [result (string-rope-tree-reduce f init root)] + (if (reduced? result) @result result)))) + + IReduce + (reduce [_ f] + (cond + (nil? root) (f) + (string? root) + (let [^String s root + len (.length s)] + (if (zero? 
len) + (f) + (loop [i (int 1), acc (Character/valueOf (.charAt s 0))] + (if (< i len) + (let [ret (f acc (Character/valueOf (.charAt s (unchecked-int i))))] + (if (reduced? ret) + @ret + (recur (unchecked-inc-int i) ret))) + acc)))) + :else + ;; Tree mode: seed from first char of leftmost chunk, then walk the rest. + (let [^ordered_collections.kernel.node.INode least (tree/node-least root) + ^String first-chunk (-k least) + first-char (Character/valueOf (.charAt first-chunk 0)) + rest-root (tree/node-remove-least root) + rest-of-first-chunk-acc + (let [len (.length first-chunk)] + (loop [i (int 1), acc first-char] + (if (< i len) + (let [ret (f acc (Character/valueOf (.charAt first-chunk (unchecked-int i))))] + (if (reduced? ret) ret (recur (unchecked-inc-int i) ret))) + acc)))] + (if (reduced? rest-of-first-chunk-acc) + @rest-of-first-chunk-acc + (let [result (if (leaf? rest-root) + rest-of-first-chunk-acc + (string-rope-tree-reduce f rest-of-first-chunk-acc rest-root))] + (if (reduced? result) @result result)))))) + + cp/CollReduce + (coll-reduce [this f] + (.reduce ^IReduce this f)) + (coll-reduce [this f init] + (.reduce ^IReduceInit this f init)) + + r/CollFold + (coll-fold [this n combinef reducef] + (cond + (nil? root) (combinef) + (string? root) (.reduce ^IReduceInit this reducef (combinef)) + :else (ropetree/rope-fold root (long n) combinef reducef))) + + clojure.lang.IHashEq + (hasheq [_] + (string-rope-hasheq root)) + + clojure.lang.Associative + (containsKey [_ k] + (and (integer? k) (valid-index? (flat-size root) (long k)))) + (entryAt [this k] + (when (.containsKey this k) + (clojure.lang.MapEntry. k (.nth ^clojure.lang.Indexed this (int k))))) + (assoc [this k v] + (let [i (long k) + n (flat-size root)] + (cond + (not (insert-index? n i)) + (throw (IndexOutOfBoundsException.)) + + (= i n) + (.cons this v) + + :else + (if (string? root) + (let [^String s root + sb (StringBuilder. 
(.length s))] + (.append sb s 0 (int i)) + (.append sb (char v)) + (.append sb s (unchecked-inc-int (int i)) (.length s)) + (->StringRope* (.toString sb) alloc _meta)) + (with-tree alloc + (->StringRope* (ropetree/rope-assoc root i (char v)) alloc _meta)))))) + + java.util.Collection + (toArray [this] + (if (string? root) + (let [^String s root + n (.length s) + arr (object-array n)] + (dotimes [i n] + (aset arr i (Character/valueOf (.charAt s i)))) + arr) + (let [n (ropetree/rope-size root) + arr (object-array n)] + (ropetree/rope-reduce + (fn [^long i x] + (aset arr i (if (instance? Character x) x (Character/valueOf (char x)))) + (unchecked-inc i)) + (long 0) + root) + arr))) + (isEmpty [_] + (nil? root)) + (^boolean contains [_ x] + (if (instance? Character x) + (let [^Character ch x + target (.charValue ch)] + (if (string? root) + (let [^String s root] + (not= -1 (.indexOf s (int target)))) + (true? + (ropetree/rope-reduce + (fn [_ c] + (if (= (char c) target) (reduced true) false)) + false + root)))) + false)) + (containsAll [this c] + (every? #(.contains this %) c)) + (size [_] + (flat-size root)) + (add [_ _] + (throw (UnsupportedOperationException.))) + (addAll [_ _] + (throw (UnsupportedOperationException.))) + (^boolean remove [_ _] + (throw (UnsupportedOperationException.))) + (removeAll [_ _] + (throw (UnsupportedOperationException.))) + (retainAll [_ _] + (throw (UnsupportedOperationException.))) + (clear [_] + (throw (UnsupportedOperationException.))) + + proto/PRope + (rope-cat [this other] + (when-not (or (string? other) (instance? StringRope other)) + (throw (IllegalArgumentException. + "StringRope rope-cat requires a StringRope or String"))) + (let [^String s1 (cond (nil? root) "" + (string? root) root + :else nil) + ^String s2 (if (string? other) + other + (let [r (.-root ^StringRope other)] + (cond (nil? r) "" + (string? 
r) r + :else nil)))] + ;; Fast path: both sides are flat strings and combined fits threshold + (if (and s1 s2 (<= (+ (.length s1) (.length s2)) +flat-threshold+)) + (let [result (str s1 s2)] + (->StringRope* (when (pos? (.length result)) result) alloc _meta)) + ;; Tree path + (with-tree alloc + (let [l (ensure-tree-root root alloc) + r (if (string? other) + (ropetree/str->root ^String other) + (ensure-tree-root (.-root ^StringRope other) alloc))] + (->StringRope* (ropetree/rope-concat l r) alloc _meta)))))) + (rope-split [_ i] + (let [n (flat-size root)] + (check-insert-index! n (long i)) + (if (string? root) + (let [^String s root + ii (int i)] + [(->StringRope* (when (pos? ii) (.substring s 0 ii)) alloc _meta) + (->StringRope* (when (< ii (.length s)) (.substring s ii)) alloc _meta)]) + (with-tree alloc + (let [[l r] (ropetree/ensure-split-parts + (ropetree/rope-split-at root (long i)))] + [(->StringRope* l alloc _meta) (->StringRope* r alloc _meta)]))))) + (rope-sub [_ start end] + (let [n (flat-size root)] + (check-range! (long start) (long end) n) + (if (string? root) + (let [^String s root + result (.substring s (int start) (int end))] + (->StringRope* (when (pos? (.length result)) result) alloc _meta)) + (with-tree alloc + (->StringRope* (ropetree/rope-subvec-root root (long start) (long end)) + alloc _meta))))) + (rope-insert [this i coll] + (let [n (flat-size root) + ii (long i)] + (check-insert-index! n ii) + (if (string? root) + (let [result (flat-splice (or ^String root "") ii ii (coll->str coll))] + (with-tree alloc + (->StringRope* (make-root result alloc) alloc _meta))) + (let [^String ins (coll->str coll) + ins-len (.length ins)] + (or (when (<= ins-len +target-chunk-size+) + (when-let [new-root (ropetree/rope-splice-inplace + root ii ii (when (pos? 
ins-len) ins) alloc)] + (->StringRope* new-root alloc _meta))) + (with-tree alloc + (->StringRope* (ropetree/rope-insert-root root ii + (coll->tree-root coll alloc)) + alloc _meta))))))) + (rope-remove [this start end] + (let [n (flat-size root)] + (check-range! (long start) (long end) n) + (if (string? root) + (let [^String s root + ^String result (flat-splice s (long start) (long end) nil)] + (->StringRope* (when (pos? (.length result)) result) alloc _meta)) + (or (when-let [new-root (ropetree/rope-splice-inplace + root (long start) (long end) nil alloc)] + (->StringRope* new-root alloc _meta)) + (with-tree alloc + (->StringRope* (ropetree/rope-remove-root root (long start) (long end)) + alloc _meta)))))) + (rope-splice [this start end coll] + (let [n (flat-size root) + si (long start) + ei (long end)] + (check-range! si ei n) + (if (string? root) + (let [result (flat-splice (or ^String root "") si ei (coll->str coll))] + (with-tree alloc + (->StringRope* (make-root result alloc) alloc _meta))) + (let [^String rep (coll->str coll) + rep-len (.length rep)] + (or (when (<= rep-len +target-chunk-size+) + (when-let [new-root (ropetree/rope-splice-inplace + root si ei (when (pos? rep-len) rep) alloc)] + (->StringRope* new-root alloc _meta))) + (with-tree alloc + (let [mid-root (when (pos? rep-len) + (coll->tree-root coll alloc))] + (->StringRope* (ropetree/rope-splice-root root si ei mid-root) + alloc _meta)))))))) + (rope-chunks [_] + (cond + (nil? root) nil + (string? root) (list root) + :else (ropetree/rope-chunks-seq root))) + (rope-str [_] + (cond + (nil? root) "" + (string? root) root + :else (ropetree/rope->str root))) + + IEditableCollection + (asTransient [_] + (->TransientStringRope + (ensure-tree-root root alloc) + alloc (ArrayList.) (StringBuilder.) 0 true _meta)) + + Object + (hashCode [this] + (.hashCode (.toString this))) + (equals [this o] + (string-rope-equiv this o))) + + +(defn- ->StringRope* + "Construct a StringRope." 
+ [root alloc meta] + (StringRope. root alloc meta)) + +(defn- coll->str + "Coerce a splice/insert argument to a String." + ^String [coll] + (cond + (string? coll) coll + (instance? StringRope coll) (.toString ^StringRope coll) + :else (str coll))) + +(defn- coll->tree-root + "Coerce a splice/insert argument to a tree root for the multi-traversal + fallback. Preserves existing StringRope tree structure when possible." + [coll alloc] + (cond + (instance? StringRope coll) + (ensure-tree-root (.-root ^StringRope coll) alloc) + (string? coll) (ropetree/str->root ^String coll) + :else (ropetree/str->root (str coll)))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; TransientStringRope +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- transient-string-appended-root + "Build a rope tree from flushed chunks + tail. Caller must bind *t-join*." + [^ArrayList chunks ^StringBuilder tail] + (let [chunk-count (.size chunks) + tail-empty? (zero? (.length tail))] + (cond + (and (zero? chunk-count) tail-empty?) + nil + + (zero? chunk-count) + (ropetree/chunks->root [(.toString tail)]) + + tail-empty? + (ropetree/chunks->root (vec chunks)) + + :else + (ropetree/chunks->root-csi + (conj (vec chunks) (.toString tail)))))) + +(def ^:private ^:const +transient-rebuild-threshold+ 4) + +(defn- transient-string-final-root + "Merge original root with appended chunks/tail. Caller must bind *t-join*." + [root ^ArrayList chunks ^StringBuilder tail] + (cond + (and (zero? (.size chunks)) (zero? (.length tail))) + root + + ;; Fast path: small tail, no flushed chunks — build from tail string + (and (zero? (.size chunks)) (<= (.length tail) 32)) + (let [s (.toString tail)] + (if (nil? 
root) + ;; Empty root — just make a single chunk node from the tail + (tree/*t-join* s nil (leaf) (leaf)) + ;; Non-empty root — conj chars individually + (let [n (.length s)] + (loop [i 0, r root] + (if (< i n) + (recur (unchecked-inc i) (ropetree/rope-conj-right r (.charAt s i))) + r))))) + + :else + (let [appended-root (transient-string-appended-root chunks tail) + appended-chunks (+ (.size chunks) (if (zero? (.length tail)) 0 1))] + (cond + (nil? root) + appended-root + + (<= appended-chunks +transient-rebuild-threshold+) + (ropetree/rope-concat root appended-root) + + :else + (ropetree/chunks->root-csi + (cond-> (vec (ropetree/root->chunks root)) + (pos? (.size chunks)) (into (vec chunks)) + (pos? (.length tail)) (conj (.toString tail)))))))) + +(deftype TransientStringRope [^:unsynchronized-mutable root + alloc + ^ArrayList chunks + ^StringBuilder tail + ^:unsynchronized-mutable chunk-chars + ^:unsynchronized-mutable edit + _meta] + ITransientCollection + (conj [this x] + (when-not edit (throw (IllegalAccessError. "Transient used after persistent! call"))) + (.append tail (char x)) + (when (>= (.length tail) +target-chunk-size+) + (.add chunks (.toString tail)) + (set! chunk-chars (+ chunk-chars (.length tail))) + (.setLength tail 0)) + this) + + (persistent [_] + (when-not edit (throw (IllegalAccessError. "Transient used after persistent! call"))) + (set! 
edit false) + (with-tree alloc + (let [tree-root (transient-string-final-root root chunks tail) + ;; Demote to flat if the result is small enough + final-root (if (and tree-root + (<= (ropetree/rope-size tree-root) +flat-threshold+)) + (ropetree/rope->str tree-root) + tree-root)] + (->StringRope* final-root alloc _meta)))) + + clojure.lang.Counted + (count [_] + (+ (ropetree/rope-size root) chunk-chars (.length tail))) + + clojure.lang.Indexed + (nth [this i] + (let [rs (ropetree/rope-size root) + j (- (long i) rs)] + (cond + (and (>= i 0) (< i rs)) + (let [c (ropetree/rope-nth root (long i))] + (if (instance? Character c) c (Character/valueOf (char c)))) + + (and (>= j 0) (< j chunk-chars)) + (let [chunk-idx (quot j +target-chunk-size+) + offset (rem j +target-chunk-size+) + ^String chunk (.get chunks (int chunk-idx))] + (Character/valueOf (.charAt chunk (unchecked-int offset)))) + + (< (- j chunk-chars) (.length tail)) + (Character/valueOf (.charAt tail (unchecked-int (- j chunk-chars)))) + + :else + (throw (IndexOutOfBoundsException.))))) + (nth [this i not-found] + (let [rs (ropetree/rope-size root) + n (+ rs chunk-chars (.length tail))] + (if (and (>= (long i) 0) (< (long i) n)) + (.nth this (int i)) + not-found)))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Constructors +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn string-rope + "Create a persistent string rope for structural text editing. + Backed by a chunked weight-balanced tree: O(log n) concat, split, + splice, insert, and remove. Competitive at all sizes, dominant at scale. + + Small strings (≤ 1024 chars) are stored in flat String representation + for zero-overhead reads. Edits that grow past the threshold are + transparently promoted to the chunked tree form. + + Implements CharSequence for seamless Java interop. 
+ + Examples: + (string-rope) ;=> #string/rope \"\" + (string-rope \"hello world\") ;=> #string/rope \"hello world\" + (string-rope (slurp \"big-file.txt\")) ;=> efficient chunked representation" + ([] (->StringRope* nil ropetree/string-rope-node-create {})) + ([s] (let [^String text (str s)] + (cond + (zero? (.length text)) + (->StringRope* nil ropetree/string-rope-node-create {}) + + (<= (.length text) +flat-threshold+) + (->StringRope* text ropetree/string-rope-node-create {}) + + :else + (with-tree ropetree/string-rope-node-create + (->StringRope* (ropetree/str->root text) + ropetree/string-rope-node-create {})))))) + +(defn string-rope-concat + "Concatenate string ropes or strings. + One argument: returns it as a string rope. + Two arguments: O(log n) binary tree join. + Three or more: O(total chunks) bulk construction." + ([x] + (->string-rope x)) + ([left right] + (proto/rope-cat (->string-rope left) (->string-rope right))) + ([left right & more] + (with-tree ropetree/string-rope-node-create + (let [alloc ropetree/string-rope-node-create + all (list* left right more) + chunks (into [] + (mapcat (fn [x] + (let [r (->string-rope x) + rt (.-root ^StringRope r)] + (cond + (nil? rt) [] + (string? rt) [rt] + :else (ropetree/root->chunks rt))))) + all)] + (->StringRope* (ropetree/chunks->root-csi chunks) + alloc (or (meta left) {})))))) + +(defn- ->string-rope + "Coerce x to a StringRope." + [x] + (cond + (instance? StringRope x) x + (string? x) (string-rope x) + :else (string-rope (str x)))) + +(defn read-string-rope + "Reader function for #string/rope tagged literals." 
+ [s] + (string-rope s)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Literal Representation +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defmethod print-method StringRope [^StringRope r ^java.io.Writer w] + (.write w "#string/rope ") + (print-method (.toString r) w)) diff --git a/test/ordered_collections/bench_charts.clj b/test/ordered_collections/bench_charts.clj new file mode 100644 index 0000000..fe984a6 --- /dev/null +++ b/test/ordered_collections/bench_charts.clj @@ -0,0 +1,409 @@ +(ns ordered-collections.bench-charts + "Generate benchmark charts from EDN artifacts via XChart. + + Usage: + lein bench-charts ; latest EDN → doc/charts/ + lein bench-charts --file path.edn ; specific EDN + + Produces PNG files in doc/charts/ showing headline performance + across collection types and cardinalities." + (:require [clojure.edn :as edn] + [clojure.java.io :as io] + [clojure.string :as str] + [ordered-collections.bench-utils :as bu]) + (:import [org.knowm.xchart + XYChart XYChartBuilder XYSeries XYSeries$XYSeriesRenderStyle + CategoryChart CategoryChartBuilder CategorySeries CategorySeries$CategorySeriesRenderStyle + BitmapEncoder BitmapEncoder$BitmapFormat] + [org.knowm.xchart.style Styler$LegendPosition + XYStyler CategoryStyler] + [java.awt Color Font BasicStroke] + [java.io File]) + (:gen-class)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Style constants +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:private +width+ 1400) +(def ^:private +height+ 750) + +(def ^:private +blue+ (Color. 37 99 235)) +(def ^:private +green+ (Color. 5 150 105)) +(def ^:private +purple+ (Color. 124 58 237)) +(def ^:private +red+ (Color. 220 38 38)) +(def ^:private +orange+ (Color. 234 88 12)) +(def ^:private +gray+ (Color. 
156 163 175)) + +(def ^:private +series-colors+ [+blue+ +green+ +purple+ +orange+ +red+]) +(def ^:private +line-width+ (BasicStroke. 2.5)) + +(def ^:private +label-font+ (Font. "SansSerif" Font/PLAIN 13)) +(def ^:private +title-font+ (Font. "SansSerif" Font/BOLD 16)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Data extraction helpers +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- load-bench-edn [path] + (edn/read-string (slurp path))) + +(defn- bench-files [] + (->> (file-seq (io/file "bench-results")) + (filter #(.isFile ^File %)) + (filter #(re-find #"\d{4}-\d{2}-\d{2}_.*\.edn$" (.getName ^File %))) + (sort-by #(.getName ^File %)))) + +(defn- latest-bench-file [] + (last (bench-files))) + +(defn- lookup + "Find mean-ns for a specific [size group variant] in the EDN." + [edn size group variant] + (get-in edn [:benchmarks size group variant :mean-ns])) + +(defn- speedup + "Compute speedup: baseline-ns / oc-ns. Returns nil if either is missing." + [edn size group oc-variant peer-variant] + (let [oc (lookup edn size group oc-variant) + peer (lookup edn size group peer-variant)] + (when (and oc peer (pos? oc)) + (/ (double peer) (double oc))))) + +(defn- sizes-vec [edn] + (sort (keys (:benchmarks edn)))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; XChart helpers +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- style-xy! [^XYChart chart & {:keys [log-x log-y y-label title]}] + (let [^XYStyler styler (.getStyler chart)] + (.setChartTitleFont styler +title-font+) + (.setAxisTickLabelsFont styler +label-font+) + (.setLegendFont styler +label-font+) + (.setLegendPosition styler Styler$LegendPosition/InsideNW) + (.setPlotGridLinesVisible styler true) + (.setPlotGridLinesColor styler (Color. 
230 230 230)) + (.setChartBackgroundColor styler Color/WHITE) + (.setPlotBackgroundColor styler Color/WHITE) + (.setPlotBorderVisible styler false) + (when log-x (.setXAxisLogarithmic styler true)) + (when log-y (.setYAxisLogarithmic styler true)) + (when title (.setTitle chart title)) + (when y-label (.setYAxisTitle chart y-label)) + (.setXAxisTitle chart "Collection size (N)")) + chart) + +(defn- add-xy-series! + [^XYChart chart series-name x-vals y-vals color] + (let [^XYSeries series (.addSeries chart (str series-name) + (double-array x-vals) + (double-array y-vals))] + (.setLineColor series color) + (.setLineStyle series +line-width+) + (.setMarkerColor series color) + (.setXYSeriesRenderStyle series XYSeries$XYSeriesRenderStyle/Line) + series)) + +(defn- add-parity-line! + "Add a dashed 1.0x reference line." + [^XYChart chart sizes] + (let [^XYSeries series (.addSeries chart "1.0x (parity)" + (double-array sizes) + (double-array (repeat (count sizes) 1.0)))] + (.setLineColor series +gray+) + (.setLineStyle series (BasicStroke. 1.5 + BasicStroke/CAP_BUTT + BasicStroke/JOIN_MITER + 10.0 + (float-array [6.0 4.0]) + 0.0)) + (.setMarkerColor series (Color. 0 0 0 0)) + (.setShowInLegend series false) + series)) + +(defn- style-category! [^CategoryChart chart & {:keys [log-y title y-label legend?]}] + (let [^CategoryStyler styler (.getStyler chart)] + (.setChartTitleFont styler +title-font+) + (.setAxisTickLabelsFont styler +label-font+) + (.setLegendFont styler +label-font+) + (.setLegendVisible styler (boolean legend?)) + (when legend? (.setLegendPosition styler Styler$LegendPosition/InsideNE)) + (.setPlotGridLinesVisible styler true) + (.setPlotGridLinesColor styler (Color. 
230 230 230)) + (.setChartBackgroundColor styler Color/WHITE) + (.setPlotBackgroundColor styler Color/WHITE) + (.setPlotBorderVisible styler false) + (.setDefaultSeriesRenderStyle styler CategorySeries$CategorySeriesRenderStyle/Bar) + (.setAvailableSpaceFill styler 0.7) + (.setOverlapped styler true) + (when log-y (.setYAxisLogarithmic styler true)) + (when title (.setTitle chart title)) + (.setXAxisTitle chart "") + (when y-label (.setYAxisTitle chart y-label))) + chart) + +(defn- save-chart! [chart path] + (let [f (io/file path)] + (io/make-parents f) + (BitmapEncoder/saveBitmapWithDPI chart path + BitmapEncoder$BitmapFormat/PNG 150) + (println (str " wrote " path)))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Chart 1: Set Algebra Scaling +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- chart-set-algebra [edn sizes out-dir] + (let [chart (-> (XYChartBuilder.) + (.width +width+) (.height +height+) + (.build)) + ops [["Union" :set-union :ordered-set :sorted-set +blue+] + ["Intersection" :set-intersection :ordered-set :sorted-set +green+] + ["Difference" :set-difference :ordered-set :sorted-set +purple+]]] + (style-xy! chart + :log-x true :log-y true + :title "Set Algebra: ordered-set vs sorted-set" + :y-label "Speedup (×)") + (add-parity-line! chart sizes) + (doseq [[label group oc peer color] ops] + (let [ys (mapv #(or (speedup edn % group oc peer) 1.0) sizes)] + (add-xy-series! chart label (mapv double sizes) ys color))) + (save-chart! chart (str out-dir "/set-algebra-scaling.png")))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Chart 2: Rope Structural Editing Scaling +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- chart-rope-editing [edn sizes out-dir] + (let [chart (-> (XYChartBuilder.) 
+ (.width +width+) (.height +height+) + (.build)) + ops [["Rope vs Vector" :rope-repeated-edits :rope :vector +blue+] + ["StringRope vs String" :string-rope-repeated-edits :string-rope :string +green+] + ["ByteRope vs byte[]" :byte-rope-repeated-edits :byte-rope :byte-array +purple+]]] + (style-xy! chart + :log-x true :log-y true + :title "Rope Structural Editing: 200 Random Splices" + :y-label "Speedup (×)") + (add-parity-line! chart sizes) + (doseq [[label group oc peer color] ops] + (let [ys (mapv #(or (speedup edn % group oc peer) 0.1) sizes)] + (add-xy-series! chart label (mapv double sizes) ys color))) + (save-chart! chart (str out-dir "/rope-editing-scaling.png")))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Chart 3: Collection Winners (horizontal bar) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- chart-collection-winners [edn sizes out-dir] + (let [max-size (last sizes) + entries [["Range Map (carve-out)" :range-map-carve-out :range-map :guava-range-map] + ["Priority Queue (push)" :priority-queue-push :priority-queue :sorted-set-by] + ["Set Algebra (union)" :set-union :ordered-set :sorted-set] + ["ByteRope (remove)" :byte-rope-remove :byte-rope :byte-array] + ["StringRope (splice)" :string-rope-splice :string-rope :string] + ["Rope (repeated edits)" :rope-repeated-edits :rope :vector] + ["Segment Tree (range query)" :segment-tree-query :segment-tree :sorted-map]] + values (mapv (fn [[_ group oc peer]] + (or (speedup edn max-size group oc peer) 1.0)) + entries) + ;; Use XY chart as a dot plot on a log-Y axis. Each collection type + ;; becomes a separate named series (one point each) so the legend + ;; provides the labels. More effective than a bar chart when values + ;; span 3 orders of magnitude. + ^XYChart chart (-> (XYChartBuilder.) + (.width +width+) (.height +height+) + (.build)) + n (count entries)] + (style-xy! 
chart + :log-y true + :title (str "Best Headline Win per Collection Type (N=" max-size ")") + :y-label "Speedup (x)") + (.setXAxisTitle chart "") + (let [^XYStyler styler (.getStyler chart)] + (.setLegendPosition styler Styler$LegendPosition/InsideNW) + (.setXAxisTicksVisible styler false) + (.setXAxisMin styler -0.5) + (.setXAxisMax styler (double (- n 0.5)))) + ;; One point per collection type, each in its own color + (doseq [[i [label _ _ _]] (map-indexed vector entries)] + (let [color (nth +series-colors+ (mod i (count +series-colors+))) + ^XYSeries series (.addSeries chart + (str label " (" (format "%.0fx" (double (nth values i))) ")") + (double-array [(double i)]) + (double-array [(double (nth values i))]))] + (.setMarkerColor series color) + (.setLineColor series (Color. 0 0 0 0)) + (.setXYSeriesRenderStyle series XYSeries$XYSeriesRenderStyle/Scatter))) + (add-parity-line! chart [0.0 (double (dec n))]) + (save-chart! chart (str out-dir "/collection-winners.png")))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Chart 4: Rope Operations Profile (diverging bar) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- chart-rope-profile [edn sizes out-dir] + (let [max-size (last sizes) + ops [["construction" :rope-construction :rope :vector] + ["nth (random)" :rope-nth :rope :vector] + ["reduce (sum)" :rope-reduce :rope :vector] + ["fold (sum)" :rope-fold-sum :rope :vector] + ["concat" :rope-concat :rope :vector] + ["splice" :rope-splice :rope :vector] + ["repeated-edits" :rope-repeated-edits :rope :vector]] + labels (mapv first ops) + values (mapv (fn [[_ group oc peer]] + (or (speedup edn max-size group oc peer) 1.0)) + ops) + ;; Split into wins and losses for color coding + win-vals (mapv #(if (>= % 1.0) % nil) values) + loss-vals (mapv #(if (< % 1.0) % nil) values) + ^CategoryChart chart (-> (CategoryChartBuilder.) 
+ (.width +width+) (.height +height+) + (.build))] + ;; Switch to XY scatter — log-Y on CategoryChart doesn't work with + ;; values spanning 0.1x-1300x. + (let [^XYChart xychart (-> (XYChartBuilder.) + (.width +width+) (.height +height+) + (.build)) + n (count ops)] + (style-xy! xychart + :log-y true + :title (str "Rope vs PersistentVector — Full Profile (N=" max-size ")") + :y-label "Speedup (x, log scale)") + (.setXAxisTitle xychart "") + (let [^XYStyler styler (.getStyler xychart)] + (.setXAxisTicksVisible styler false) + (.setXAxisMin styler -0.5) + (.setXAxisMax styler (double (- n 0.5)))) + ;; One point per operation, colored by win/loss + (doseq [[i [label _ _ _]] (map-indexed vector ops)] + (let [v (double (nth values i)) + color (if (>= v 1.0) +blue+ +red+) + ^XYSeries series (.addSeries xychart + (str label " (" (format "%.1fx" v) ")") + (double-array [(double i)]) + (double-array [v]))] + (.setMarkerColor series color) + (.setLineColor series (Color. 0 0 0 0)) + (.setXYSeriesRenderStyle series XYSeries$XYSeriesRenderStyle/Scatter))) + (add-parity-line! xychart [0.0 (double (dec n))]) + (save-chart! xychart (str out-dir "/rope-operations-profile.png"))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Chart 5: Absolute Time — Rope vs Vector +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- chart-rope-absolute [edn sizes out-dir] + (let [chart (-> (XYChartBuilder.) + (.width +width+) (.height +height+) + (.build)) + rope-ns (mapv #(let [v (lookup edn % :rope-repeated-edits :rope)] + (if v (/ v 1e6) 0.01)) + sizes) + vec-ns (mapv #(let [v (lookup edn % :rope-repeated-edits :vector)] + (if v (/ v 1e6) 0.01)) + sizes)] + (style-xy! chart + :log-x true :log-y true + :title "200 Random Edits: Absolute Time" + :y-label "Time (ms)") + (add-xy-series! chart "PersistentVector" (mapv double sizes) vec-ns +red+) + (add-xy-series! 
chart "Rope" (mapv double sizes) rope-ns +blue+) + (save-chart! chart (str out-dir "/rope-vs-vector-absolute.png")))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Chart 6: StringRope Crossover +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- chart-string-rope-crossover [edn sizes out-dir] + (let [chart (-> (XYChartBuilder.) + (.width +width+) (.height +height+) + (.build)) + ops [["splice" :string-rope-splice :string-rope :string +blue+] + ["insert" :string-rope-insert :string-rope :string +green+] + ["remove" :string-rope-remove :string-rope :string +purple+] + ["repeated edits" :string-rope-repeated-edits :string-rope :string +orange+] + ["concat" :string-rope-concat :string-rope :string (Color. 100 100 100)]]] + (style-xy! chart + :log-x true :log-y true + :title "StringRope vs String: Crossover by Operation" + :y-label "Speedup (×)") + (add-parity-line! chart sizes) + (doseq [[label group oc peer color] ops] + (let [ys (mapv #(or (speedup edn % group oc peer) 0.01) sizes)] + (add-xy-series! chart label (mapv double sizes) ys color))) + (save-chart! chart (str out-dir "/string-rope-crossover.png")))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Chart 7: ByteRope Crossover +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- chart-byte-rope-crossover [edn sizes out-dir] + (let [chart (-> (XYChartBuilder.) + (.width +width+) (.height +height+) + (.build)) + ops [["splice" :byte-rope-splice :byte-rope :byte-array +blue+] + ["insert" :byte-rope-insert :byte-rope :byte-array +green+] + ["remove" :byte-rope-remove :byte-rope :byte-array +purple+] + ["repeated edits" :byte-rope-repeated-edits :byte-rope :byte-array +orange+] + ["split" :byte-rope-split :byte-rope :byte-array (Color. 100 100 100)]]] + (style-xy! 
chart + :log-x true :log-y true + :title "ByteRope vs byte[]: Crossover by Operation" + :y-label "Speedup (×)") + (add-parity-line! chart sizes) + (doseq [[label group oc peer color] ops] + (let [ys (mapv #(or (speedup edn % group oc peer) 0.01) sizes)] + (add-xy-series! chart label (mapv double sizes) ys color))) + (save-chart! chart (str out-dir "/byte-rope-crossover.png")))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Runner +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn generate-charts + [edn-path out-dir] + (println "Generating benchmark charts...") + (println (str " source: " edn-path)) + (println (str " output: " out-dir)) + (println) + (let [edn (load-bench-edn (str edn-path)) + sizes (sizes-vec edn)] + (chart-set-algebra edn sizes out-dir) + (chart-rope-editing edn sizes out-dir) + (chart-collection-winners edn sizes out-dir) + (chart-rope-profile edn sizes out-dir) + (chart-rope-absolute edn sizes out-dir) + (chart-string-rope-crossover edn sizes out-dir) + (chart-byte-rope-crossover edn sizes out-dir) + (println) + (println "Done. 
7 charts written."))) + +(defn -main [& args] + (let [file-arg (some->> (partition-all 2 1 args) + (filter (fn [[a _]] (= "--file" a))) + first second) + edn-path (or file-arg (latest-bench-file)) + out-dir (or (some->> (partition-all 2 1 args) + (filter (fn [[a _]] (= "--output" a))) + first second) + "doc/charts")] + (if edn-path + (generate-charts edn-path out-dir) + (do (println "No benchmark EDN found in bench-results/") + (System/exit 1))) + (shutdown-agents))) diff --git a/test/ordered_collections/bench_runner.clj b/test/ordered_collections/bench_runner.clj index 846593f..bd09bc0 100644 --- a/test/ordered_collections/bench_runner.clj +++ b/test/ordered_collections/bench_runner.clj @@ -3,13 +3,14 @@ Usage: lein bench # Default: N=100K (~5 min) - lein bench --full # N=10K,100K,500K (~20-30 min) + lein bench --full # N=1K,5K,10K,100K,500K (~60 min) lein bench --sizes 50000 # Custom sizes Output is written to bench-results/.edn" (:require [criterium.core :as crit] [clojure.core.reducers :as r] [clojure.data.avl :as avl] + [clojure.edn :as edn] [clojure.set :as cset] [clojure.string :as str] [clojure.java.io :as io] @@ -27,7 +28,8 @@ (:import [java.lang.management ManagementFactory] [java.net InetAddress] [java.time Duration Instant LocalDateTime] - [java.time.format DateTimeFormatter]) + [java.time.format DateTimeFormatter] + [com.google.common.collect TreeRangeMap Range]) (:gen-class)) @@ -36,7 +38,7 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (def sizes-default [100000]) -(def sizes-full [10000 100000 500000]) +(def sizes-full [1000 5000 10000 100000 500000]) (defn timestamp [] (.format (LocalDateTime/now) @@ -229,6 +231,26 @@ :data-avl #(reduce (fn [^long acc x] (+ acc (long x))) 0 as) :ordered-set #(reduce (fn [^long acc x] (+ acc (long x))) 0 os)}))) +(defn- iterate-sum-longs + "Sum a Java Iterable of Long values via its .iterator(). Exercises the + collection's iterator path rather than its CollReduce path." 
+ ^long [^Iterable coll] + (let [it (.iterator coll)] + (loop [acc (long 0)] + (if (.hasNext it) + (recur (unchecked-add acc (long (.next it)))) + acc)))) + +(defn bench-set-iteration-iterator [n] + (let [elems (generate-elements n) + ss (into (sorted-set) elems) + as (into (avl/sorted-set) elems) + os (core/ordered-set elems)] + (run-cases + {:sorted-set #(iterate-sum-longs ss) + :data-avl #(iterate-sum-longs as) + :ordered-set #(iterate-sum-longs os)}))) + (defn bench-set-equality [n] (let [{equal :equal different :different @@ -420,6 +442,13 @@ :data-avl #(reduce (fn [^long acc x] (+ acc (long x))) 0 data-avl) :long-ordered #(reduce (fn [^long acc x] (+ acc (long x))) 0 long-ordered)}))) +(defn bench-long-rank-lookup [n & {:keys [num-lookups] :or {num-lookups 10000}}] + (let [{:keys [data-avl long-ordered]} (make-long-sets n) + ^longs ks (long-array (repeatedly num-lookups #(rand-int n)))] + (run-cases + {:data-avl #(dotimes [i num-lookups] (avl/rank-of data-avl (aget ks i))) + :long-ordered #(dotimes [i num-lookups] (core/rank long-ordered (aget ks i)))}))) + (defn bench-long-fold [n] (let [{:keys [sorted-set data-avl long-ordered]} (make-long-sets n) sum-fn (fn [^long acc x] (+ acc (long x)))] @@ -534,6 +563,16 @@ :data-avl #(dotimes [i num-lookups] (contains? as (aget look i))) :string-ordered #(dotimes [i num-lookups] (contains? 
os (aget look i)))}))) +(defn bench-string-rank-lookup [n & {:keys [num-lookups] :or {num-lookups 10000}}] + (let [ks (generate-string-keys n) + cmp #(compare (str %1) (str %2)) + as (into (avl/sorted-set-by cmp) ks) + os (core/string-ordered-set ks) + ^objects look (object-array (repeatedly num-lookups #(nth ks (rand-int n))))] + (run-cases + {:data-avl #(dotimes [i num-lookups] (avl/rank-of as (aget look i))) + :string-ordered #(dotimes [i num-lookups] (core/rank os (aget look i)))}))) + (defn bench-string-set-union [n] (let [{left :left right :right} (make-string-set-pair n)] (run-cases @@ -583,6 +622,303 @@ {:interval-set-reduce #(reduce sum-intervals 0 is) :interval-set-fold #(r/fold + sum-intervals is)}))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Range Map Benchmarks — vs Guava TreeRangeMap +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- gen-range-entries [n] + (let [spacing 100] + (vec (for [i (range n)] + [(* i spacing) (+ (* i spacing) (quot spacing 2)) (keyword (str "v" i))])))) + +(defn- build-oc-range-map [entries] + (reduce (fn [rm [lo hi v]] (assoc rm [lo hi] v)) (core/range-map) entries)) + +(defn- ^TreeRangeMap build-guava-range-map [entries] + (let [m (TreeRangeMap/create)] + (doseq [[lo hi v] entries] + (.put m (Range/closedOpen (long lo) (long hi)) v)) + m)) + +(defn bench-range-map-construction [n] + (let [entries (gen-range-entries n)] + (run-cases + {:guava-range-map #(build-guava-range-map entries) + :range-map #(build-oc-range-map entries)}))) + +(defn bench-range-map-bulk-construction + "Exercises the single-argument (core/range-map coll) constructor's O(n) + balanced-build path for pre-sorted disjoint input. Distinct from + bench-range-map-construction, which measures per-entry assoc." 
+ [n] + (let [entries (gen-range-entries n) + pairs (mapv (fn [[lo hi v]] [[lo hi] v]) entries)] + (run-cases + {:guava-range-map #(build-guava-range-map entries) + :range-map #(core/range-map pairs)}))) + +(defn bench-range-map-lookup [n & {:keys [num-lookups] :or {num-lookups 10000}}] + (let [entries (gen-range-entries n) + rm (build-oc-range-map entries) + grm (build-guava-range-map entries) + max-p (long (* n 100)) + ^ints pts (int-array (repeatedly num-lookups #(rand-int max-p)))] + (run-cases + {:guava-range-map #(dotimes [i num-lookups] (.get grm (long (aget pts i)))) + :range-map #(dotimes [i num-lookups] (rm (aget pts i)))}))) + +(defn bench-range-map-carve-out [n] + (let [entries (gen-range-entries n) + rm (build-oc-range-map entries) + grm (build-guava-range-map entries) + mid-lo (long (* (quot n 4) 100)) + mid-hi (long (* (* 3 (quot n 4)) 100))] + (run-cases + {:guava-range-map #(let [copy (TreeRangeMap/create)] + (.putAll copy grm) + (.put copy (Range/closedOpen mid-lo mid-hi) :carved) + copy) + :range-map #(assoc rm [mid-lo mid-hi] :carved)}))) + +(defn bench-range-map-iteration [n] + (let [entries (gen-range-entries n) + rm (build-oc-range-map entries) + grm (build-guava-range-map entries)] + (run-cases + {:guava-range-map #(reduce (fn [^long c _] (unchecked-inc c)) 0 + (.asMapOfRanges grm)) + :range-map #(reduce (fn [^long c _] (unchecked-inc c)) 0 + (core/ranges rm))}))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Segment Tree Benchmarks — vs sorted-map subseq for range queries +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn bench-segment-tree-construction [n] + (let [data (zipmap (shuffle (range n)) (map long (range n)))] + (run-cases + {:sorted-map #(into (sorted-map) data) + :segment-tree #(core/sum-tree data)}))) + +(defn bench-segment-tree-query + [n & {:keys [num-queries window-frac] :or {num-queries 1000 window-frac 10}}] + (let [data (zipmap (range n) 
(map long (range n))) + st (core/sum-tree data) + sm (into (sorted-map) data) + window (max 1 (quot n window-frac)) + rng (java.util.Random. 42) + ^ints los (int-array (repeatedly num-queries + #(.nextInt rng (max 1 (- n window)))))] + (run-cases + ;; sorted-map baseline: walk the subseq and sum values + {:sorted-map #(dotimes [i num-queries] + (let [lo (aget los i) + hi (+ lo window)] + (reduce-kv (fn [^long acc _ v] (+ acc (long v))) + 0 + (into {} (subseq sm >= lo <= hi))))) + :segment-tree #(dotimes [i num-queries] + (let [lo (aget los i) + hi (+ lo window)] + (core/query st lo hi)))}))) + +(defn bench-segment-tree-update + [n & {:keys [num-updates] :or {num-updates 1000}}] + (let [data (zipmap (range n) (map long (range n))) + st (core/sum-tree data) + sm (into (sorted-map) data) + rng (java.util.Random. 42) + ^ints ks (int-array (repeatedly num-updates #(.nextInt rng n))) + ^longs vs (long-array (repeatedly num-updates #(.nextInt rng 1000)))] + (run-cases + {:sorted-map #(loop [m sm, i 0] + (if (< i num-updates) + (recur (assoc m (aget ks i) (aget vs i)) (unchecked-inc i)) + m)) + :segment-tree #(loop [m st, i 0] + (if (< i num-updates) + (recur (assoc m (aget ks i) (aget vs i)) (unchecked-inc i)) + m))}))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Priority Queue Benchmarks — vs sorted-set-by [priority seqnum value] +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- gen-priority-pairs [n] + (let [rng (java.util.Random. 42)] + (vec (for [i (range n)] + [(.nextInt rng 1000) (keyword (str "task-" i))])))) + +(def ^:private pq-tuple-cmp + (fn [[p1 s1] [p2 s2]] + (let [c (compare p1 p2)] + (if (zero? 
c) (compare s1 s2) c)))) + +(defn- build-sorted-pq-baseline [pairs] + (into (sorted-set-by pq-tuple-cmp) + (map-indexed (fn [i [p v]] [p i v]) pairs))) + +(defn bench-priority-queue-construction [n] + (let [pairs (gen-priority-pairs n)] + (run-cases + {:sorted-set-by #(build-sorted-pq-baseline pairs) + :priority-queue #(core/priority-queue pairs)}))) + +(defn bench-priority-queue-push + [n & {:keys [num-ops] :or {num-ops 1000}}] + (let [pairs (gen-priority-pairs n) + base-pq (core/priority-queue pairs) + base-ss (build-sorted-pq-baseline pairs) + rng (java.util.Random. 42) + extra (vec (for [i (range num-ops)] + [(.nextInt rng 1000) (keyword (str "new-" i))]))] + (run-cases + {:sorted-set-by #(loop [s base-ss, i 0] + (if (< i num-ops) + (let [[p v] (nth extra i)] + (recur (conj s [p (+ n i) v]) (unchecked-inc i))) + s)) + :priority-queue #(loop [q base-pq, i 0] + (if (< i num-ops) + (let [[p v] (nth extra i)] + (recur (core/push q p v) (unchecked-inc i))) + q))}))) + +(defn bench-priority-queue-pop-min + [n & {:keys [num-ops] :or {num-ops 1000}}] + (let [num-ops (min num-ops n) + pairs (gen-priority-pairs n) + base-pq (core/priority-queue pairs) + base-ss (build-sorted-pq-baseline pairs)] + (run-cases + {:sorted-set-by #(loop [s base-ss, i 0] + (if (< i num-ops) + (recur (disj s (first s)) (unchecked-inc i)) + s)) + :priority-queue #(loop [q base-pq, i 0] + (if (< i num-ops) + (recur (core/pop-min q) (unchecked-inc i)) + q))}))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Ordered Multiset Benchmarks — vs sorted-map counts +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- gen-multiset-elements [n] + ;; Repeat each element 3x to exercise the multiplicity path + (vec (mapcat #(repeat 3 %) (range (quot n 3))))) + +(defn bench-multiset-construction [n] + (let [elems (gen-multiset-elements n)] + (run-cases + {:sorted-map-counts #(reduce (fn [m x] (update m x (fnil inc 0))) + 
(sorted-map) elems)
+       :ordered-multiset #(core/ordered-multiset elems)})))
+
+(defn bench-multiset-multiplicity
+  [n & {:keys [num-ops] :or {num-ops 10000}}]
+  (let [elems (gen-multiset-elements n)
+        mset (core/ordered-multiset elems)
+        counts (reduce (fn [m x] (update m x (fnil inc 0))) (sorted-map) elems)
+        max-key (quot n 3)
+        rng (java.util.Random. 42)
+        ^ints ks (int-array (repeatedly num-ops #(.nextInt rng (max 1 max-key))))]
+    (run-cases
+      {:sorted-map-counts #(dotimes [i num-ops]
+                             (get counts (aget ks i) 0))
+       :ordered-multiset #(dotimes [i num-ops]
+                            (core/multiplicity mset (aget ks i)))})))
+
+(defn bench-multiset-iteration [n]
+  (let [elems (gen-multiset-elements n)
+        mset (core/ordered-multiset elems)
+        counts (reduce (fn [m x] (update m x (fnil inc 0))) (sorted-map) elems)]
+    (run-cases
+      ;; Baseline: walk the sorted-map and weight each key by its count,
+      ;; which gives the same sum as reducing over the expanded elements.
+      {:sorted-map-counts #(reduce (fn [^long acc [k c]]
+                                     (+ acc (* (long k) (long c))))
+                                   0 counts)
+       :ordered-multiset #(reduce + 0 mset)})))
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Fuzzy Set / Fuzzy Map Benchmarks — vs sorted-set / sorted-map nearest
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(defn- sorted-set-nearest
+  "Find the element in sorted-set `s` closest to `query` by absolute
+  difference. O(log n) floor + ceiling via `subseq`/`rsubseq`."
+ [s query] + (let [floor (first (rsubseq s <= query)) + ceiling (first (subseq s >= query))] + (cond + (and floor ceiling) (if (<= (Math/abs ^long (- (long query) (long floor))) + (Math/abs ^long (- (long query) (long ceiling)))) + floor + ceiling) + floor floor + ceiling ceiling + :else nil))) + +(defn- sorted-map-nearest + [m query] + (let [floor (first (rsubseq m <= query)) + ceiling (first (subseq m >= query))] + (cond + (and floor ceiling) (if (<= (Math/abs ^long (- (long query) (long (key floor)))) + (Math/abs ^long (- (long query) (long (key ceiling))))) + floor + ceiling) + floor floor + ceiling ceiling + :else nil))) + +(defn bench-fuzzy-set-construction [n] + (let [elems (vec (shuffle (range n)))] + (run-cases + {:sorted-set #(into (sorted-set) elems) + :fuzzy-set #(core/fuzzy-set elems)}))) + +(defn bench-fuzzy-set-nearest + [n & {:keys [num-ops] :or {num-ops 10000}}] + (let [elems (vec (range n)) + ss (into (sorted-set) elems) + fs (core/fuzzy-set elems) + max-q (long (* 2 n)) + rng (java.util.Random. 42) + ^ints qs (int-array (repeatedly num-ops #(.nextInt rng (max 1 max-q))))] + (run-cases + {:sorted-set #(dotimes [i num-ops] + (sorted-set-nearest ss (aget qs i))) + :fuzzy-set #(dotimes [i num-ops] + (fs (aget qs i)))}))) + +(defn bench-fuzzy-map-construction [n] + (let [pairs (map (fn [i] [i (str "v-" i)]) (shuffle (range n)))] + (run-cases + {:sorted-map #(into (sorted-map) pairs) + :fuzzy-map #(core/fuzzy-map pairs)}))) + +(defn bench-fuzzy-map-nearest + [n & {:keys [num-ops] :or {num-ops 10000}}] + (let [pairs (map (fn [i] [i (str "v-" i)]) (range n)) + sm (into (sorted-map) pairs) + fm (core/fuzzy-map pairs) + max-q (long (* 2 n)) + rng (java.util.Random. 
42)
+        ^ints qs (int-array (repeatedly num-ops #(.nextInt rng (max 1 max-q))))]
+    (run-cases
+      {:sorted-map #(dotimes [i num-ops]
+                      (sorted-map-nearest sm (aget qs i)))
+       :fuzzy-map #(dotimes [i num-ops]
+                     (fm (aget qs i)))})))
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;; Rope Benchmarks
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -670,6 +1006,370 @@
       {:rope #(r/fold combinef reducef r)
        :vector #(r/fold combinef reducef v)})))
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; String Rope vs String vs StringBuilder Benchmarks
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+;; Three competitors:
+;;   :string — idiomatic Clojure: (str (subs ...) ...) — what most people write
+;;   :string-builder — optimal mutable: pre-sized StringBuilder with append(CharSequence,int,int)
+;;   :string-rope — persistent rope: O(log n) structural edits with structural sharing
+;;
+;; StringRope's architectural advantages over StringBuilder:
+;;   • Persistent — old versions survive edits (free undo, no defensive copying)
+;;   • Structural sharing — concat/split are O(log n), no bulk copying
+;;   • Thread-safe by construction — immutable, no locking needed
+;;   • O(log n) splice/insert/remove vs StringBuilder's O(n) arraycopy
+
+(defn- sb-splice
+  "Optimal String splice via StringBuilder."
+  ^String [^String s ^long start ^long end ^String rep]
+  (let [si (int start)
+        ei (int end)
+        sb (StringBuilder. (- (+ (.length s) (.length rep)) (- ei si)))]
+    (.append sb s 0 si)
+    (.append sb rep)
+    (.append sb s ei (.length s))
+    (.toString sb)))
+
+(defn- sb-insert
+  "Optimal String insert via StringBuilder."
+  ^String [^String s ^long i ^String ins]
+  (sb-splice s i i ins))
+
+(defn- sb-remove
+  "Optimal String remove via StringBuilder."
+  ^String [^String s ^long start ^long end]
+  (let [si (int start)
+        ei (int end)
+        sb (StringBuilder. 
(- (.length s) (- ei si)))] + (.append sb s 0 si) + (.append sb s ei (.length s)) + (.toString sb))) + +(defn- random-text + "Generate a random ASCII text of length n." + ^String [^long n] + (let [sb (StringBuilder. n) + ^String chars "abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ\n0123456789" + nchars (.length chars)] + (dotimes [_ n] + (.append sb (.charAt chars (rand-int nchars)))) + (.toString sb))) + +(defn bench-string-rope-construction [n] + (let [^String text (random-text n)] + (run-cases + {:string #(String. text) + :string-builder #(let [sb (StringBuilder. (count text))] + (.append sb text) + (.toString sb)) + :string-rope #(core/string-rope text)}))) + +(defn bench-string-rope-concat [n] + (let [^String text (random-text n) + half (int (quot n 2)) + ^String s1 (.substring text 0 half) + ^String s2 (.substring text half) + sr1 (core/string-rope s1) + sr2 (core/string-rope s2)] + (run-cases + {:string #(str s1 s2) + :string-builder #(let [sb (StringBuilder. (.length text))] + (.append sb s1) + (.append sb s2) + (.toString sb)) + :string-rope #(core/string-rope-concat sr1 sr2)}))) + +(defn bench-string-rope-split [n] + (let [^String text (random-text n) + sr (core/string-rope text) + mid (int (quot n 2))] + (run-cases + {:string #(vector (subs text 0 mid) (subs text mid)) + :string-builder #(vector (.substring text 0 mid) (.substring text mid)) + :string-rope #(core/rope-split sr mid)}))) + +(defn bench-string-rope-splice [n] + (let [^String text (random-text n) + sr (core/string-rope text) + mid (quot n 2) + lo (max 0 (- mid 5)) + hi (min n (+ mid 5)) + ^String rep "YYYYYYYYYY"] + (run-cases + {:string #(str (subs text 0 lo) rep (subs text hi)) + :string-builder #(sb-splice text lo hi rep) + :string-rope #(core/rope-splice sr lo hi rep)}))) + +(defn bench-string-rope-insert [n] + (let [^String text (random-text n) + sr (core/string-rope text) + mid (quot n 2) + ^String ins "Y"] + (run-cases + {:string #(str (subs text 0 mid) ins (subs text mid)) + 
:string-builder #(sb-insert text mid ins) + :string-rope #(core/rope-insert sr mid ins)}))) + +(defn bench-string-rope-remove [n] + (let [^String text (random-text n) + sr (core/string-rope text) + mid (quot n 2) + lo (max 0 (- mid 5)) + hi (min n (+ mid 5))] + (run-cases + {:string #(str (subs text 0 lo) (subs text hi)) + :string-builder #(sb-remove text lo hi) + :string-rope #(core/rope-remove sr lo hi)}))) + +(defn bench-string-rope-nth [n & {:keys [num-ops] :or {num-ops 1000}}] + (let [^String text (random-text n) + sr (core/string-rope text) + rng (java.util.Random. 42) + idxs (int-array (repeatedly num-ops #(.nextInt rng (max 1 n))))] + (run-cases + {:string #(areduce idxs i acc (char 0) (.charAt text (aget idxs i))) + :string-rope #(areduce idxs i acc nil (nth sr (aget idxs i)))}))) + +(defn bench-string-rope-reduce [n] + (let [^String text (random-text n) + sr (core/string-rope text) + f (fn [^long acc c] (+ acc (long (int (char c)))))] + (run-cases + {:string #(let [len (.length text)] + (loop [i (int 0), acc (long 0)] + (if (< i len) + (recur (unchecked-inc-int i) (+ acc (long (int (.charAt text i))))) + acc))) + :string-rope #(reduce f 0 sr)}))) + +(defn bench-string-rope-str [n] + (let [^String text (random-text n) + sr (core/string-rope text)] + (run-cases + {:string #(String. text) + :string-rope #(str sr)}))) + +(defn bench-string-rope-repeated-edits [n] + (let [^String text (random-text n) + sr (core/string-rope text) + rng (java.util.Random. 
42) + nops 200 + idxs (vec (repeatedly nops #(.nextInt rng (max 1 n)))) + ^String ins "XXXXX"] + (run-cases + {:string #(loop [^String s text, i 0] + (if (< i nops) + (let [pos (rem (long (nth idxs i)) (long (.length s))) + end (min (.length s) (+ pos 5))] + (recur (str (subs s 0 pos) ins (subs s end)) (unchecked-inc i))) + s)) + :string-builder #(loop [^String s text, i 0] + (if (< i nops) + (let [pos (rem (long (nth idxs i)) (long (.length s))) + end (min (.length s) (+ pos 5))] + (recur (sb-splice s pos end ins) (unchecked-inc i))) + s)) + :string-rope #(loop [r sr, i 0] + (if (< i nops) + (let [pos (rem (long (nth idxs i)) (long (count r))) + end (min (count r) (+ pos 5))] + (recur (core/rope-splice r pos end ins) (unchecked-inc i))) + r))}))) + +(defn bench-string-rope-re-find [n] + (let [^String text (random-text n) + sr (core/string-rope text) + pat #"[A-Z]{2,}"] + (run-cases + {:string #(re-find pat text) + :string-rope #(re-find pat sr)}))) + +(defn bench-string-rope-re-seq [n] + (let [^String text (random-text n) + sr (core/string-rope text) + pat #"\w+"] + (run-cases + {:string #(doall (re-seq pat text)) + :string-rope #(doall (re-seq pat sr))}))) + +(defn bench-string-rope-re-replace [n] + (let [^String text (random-text n) + sr (core/string-rope text) + pat #"[aeiou]"] + (run-cases + {:string #(clojure.string/replace text pat "*") + :string-rope #(clojure.string/replace sr pat "*")}))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Byte Rope vs byte[] Benchmarks +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; ByteRope is a persistent, immutable byte sequence backed by a chunked +;; weight-balanced tree. The mutable baseline is the raw byte[] with +;; System.arraycopy — the fastest possible same-operation comparison on JVM. +;; ByteRope's architectural advantages: O(log n) structural edits, +;; persistent snapshots, structural sharing, thread safety by construction. 
+ +(defn- ba-splice + "Optimal byte[] splice via System.arraycopy." + ^bytes [^bytes s ^long start ^long end ^bytes rep] + (let [si (int start) + ei (int end) + sl (alength s) + rl (int (if rep (alength rep) 0)) + result (byte-array (+ (- sl (- ei si)) rl))] + (System/arraycopy s 0 result 0 si) + (when (pos? rl) + (System/arraycopy rep 0 result si rl)) + (System/arraycopy s ei result (+ si rl) (- sl ei)) + result)) + +(defn- ba-concat + ^bytes [^bytes a ^bytes b] + (let [al (alength a) + bl (alength b) + result (byte-array (+ al bl))] + (System/arraycopy a 0 result 0 al) + (System/arraycopy b 0 result al bl) + result)) + +(defn- random-bytes + "Generate a random byte array of length n." + ^bytes [^long n] + (let [rng (java.util.Random. 42) + b (byte-array n)] + (.nextBytes rng b) + b)) + +(defn bench-byte-rope-construction [n] + (let [^bytes data (random-bytes n)] + (run-cases + {:byte-array #(java.util.Arrays/copyOf data n) + :byte-rope #(core/byte-rope data)}))) + +(defn bench-byte-rope-concat [n] + (let [^bytes b1 (random-bytes (quot n 4)) + ^bytes b2 (random-bytes (quot n 4)) + ^bytes b3 (random-bytes (quot n 4)) + ^bytes b4 (random-bytes (- n (* 3 (quot n 4)))) + br1 (core/byte-rope b1) br2 (core/byte-rope b2) + br3 (core/byte-rope b3) br4 (core/byte-rope b4)] + (run-cases + {:byte-array #(ba-concat (ba-concat (ba-concat b1 b2) b3) b4) + :byte-rope #(core/byte-rope-concat br1 br2 br3 br4)}))) + +(defn bench-byte-rope-splice [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data) + mid (quot n 2) + lo (max 0 (- mid 16)) + hi (min n (+ mid 16)) + ^bytes rep (random-bytes 32)] + (run-cases + {:byte-array #(ba-splice data lo hi rep) + :byte-rope #(core/rope-splice br lo hi rep)}))) + +(defn bench-byte-rope-insert [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data) + mid (quot n 2) + ^bytes ins (random-bytes 16)] + (run-cases + {:byte-array #(ba-splice data mid mid ins) + :byte-rope #(core/rope-insert br mid ins)}))) + +(defn 
bench-byte-rope-remove [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data) + mid (quot n 2) + lo (max 0 (- mid 16)) + hi (min n (+ mid 16))] + (run-cases + {:byte-array #(ba-splice data lo hi nil) + :byte-rope #(core/rope-remove br lo hi)}))) + +(defn bench-byte-rope-split [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data) + mid (int (quot n 2))] + (run-cases + {:byte-array #(vector (java.util.Arrays/copyOfRange data 0 mid) + (java.util.Arrays/copyOfRange data mid (alength data))) + :byte-rope #(core/rope-split br mid)}))) + +(defn bench-byte-rope-repeated-edits [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data) + rng (java.util.Random. 42) + nops 200 + idxs (vec (repeatedly nops #(.nextInt rng (max 1 n)))) + ^bytes ins (random-bytes 5)] + (run-cases + {:byte-array + #(loop [^bytes s data, i 0] + (if (< i nops) + (let [pos (rem (long (nth idxs i)) (long (alength s))) + end (min (alength s) (+ pos 5))] + (recur (ba-splice s pos end ins) (unchecked-inc i))) + s)) + :byte-rope + #(loop [r br, i 0] + (if (< i nops) + (let [pos (rem (long (nth idxs i)) (long (count r))) + end (min (count r) (+ pos 5))] + (recur (core/rope-splice r pos end ins) (unchecked-inc i))) + r))}))) + +(defn bench-byte-rope-nth [n & {:keys [num-ops] :or {num-ops 1000}}] + (let [^bytes data (random-bytes n) + br (core/byte-rope data) + rng (java.util.Random. 
42) + idxs (int-array (repeatedly num-ops #(.nextInt rng (max 1 n))))] + (run-cases + {:byte-array #(areduce idxs i acc (long 0) (+ acc (long (aget data (aget idxs i))))) + :byte-rope #(areduce idxs i acc (long 0) (+ acc (long (nth br (aget idxs i)))))}))) + +(defn bench-byte-rope-reduce [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data)] + (run-cases + {:byte-array #(let [len (alength data)] + (loop [i (int 0), acc (long 0)] + (if (< i len) + (recur (unchecked-inc-int i) + (+ acc (bit-and (long (aget data i)) 0xff))) + acc))) + :byte-rope #(reduce + 0 br)}))) + +(defn bench-byte-rope-fold [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data)] + (run-cases + {:byte-array #(let [len (alength data)] + (loop [i (int 0), acc (long 0)] + (if (< i len) + (recur (unchecked-inc-int i) + (+ acc (bit-and (long (aget data i)) 0xff))) + acc))) + :byte-rope #(r/fold + br)}))) + +(defn bench-byte-rope-bytes [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data)] + (run-cases + {:byte-array #(java.util.Arrays/copyOf data n) + :byte-rope #(core/byte-rope-bytes br)}))) + +(defn bench-byte-rope-digest [n] + (let [^bytes data (random-bytes n) + br (core/byte-rope data)] + (run-cases + {:byte-array #(let [md (java.security.MessageDigest/getInstance "SHA-256")] + (.digest md data)) + :byte-rope #(core/byte-rope-bytes (core/byte-rope-digest br "SHA-256"))}))) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Suite Runners ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -699,7 +1399,15 @@ [:long-lookup bench-long-lookup] [:long-union bench-long-union] [:long-intersection bench-long-intersection] - [:long-difference bench-long-difference]]) + [:long-difference bench-long-difference] + [:string-rope-splice bench-string-rope-splice] + [:string-rope-insert bench-string-rope-insert] + [:string-rope-remove bench-string-rope-remove] + [:string-rope-repeated-edits 
bench-string-rope-repeated-edits] + [:byte-rope-splice bench-byte-rope-splice] + [:byte-rope-insert bench-byte-rope-insert] + [:byte-rope-remove bench-byte-rope-remove] + [:byte-rope-repeated-edits bench-byte-rope-repeated-edits]]) (def ^:private readme-benchmark-specs [[:set-construction bench-set-construction] @@ -724,6 +1432,7 @@ [:set-delete bench-set-delete] [:set-lookup bench-set-lookup] [:set-iteration bench-set-iteration] + [:set-iteration-iterator bench-set-iteration-iterator] [:set-equality bench-set-equality] [:set-fold bench-set-fold] [:set-fold-freq bench-set-fold-freq] @@ -742,6 +1451,24 @@ [:interval-map-construction bench-interval-map-construction] [:interval-lookup bench-interval-lookup] [:interval-fold bench-interval-fold] + [:range-map-construction bench-range-map-construction] + [:range-map-bulk-construction bench-range-map-bulk-construction] + [:range-map-lookup bench-range-map-lookup] + [:range-map-carve-out bench-range-map-carve-out] + [:range-map-iteration bench-range-map-iteration] + [:segment-tree-construction bench-segment-tree-construction] + [:segment-tree-query bench-segment-tree-query] + [:segment-tree-update bench-segment-tree-update] + [:priority-queue-construction bench-priority-queue-construction] + [:priority-queue-push bench-priority-queue-push] + [:priority-queue-pop-min bench-priority-queue-pop-min] + [:multiset-construction bench-multiset-construction] + [:multiset-multiplicity bench-multiset-multiplicity] + [:multiset-iteration bench-multiset-iteration] + [:fuzzy-set-construction bench-fuzzy-set-construction] + [:fuzzy-set-nearest bench-fuzzy-set-nearest] + [:fuzzy-map-construction bench-fuzzy-map-construction] + [:fuzzy-map-nearest bench-fuzzy-map-nearest] [:rope-concat bench-rope-concat] [:rope-splice bench-rope-splice] [:rope-repeated-edits bench-rope-repeated-edits] @@ -755,6 +1482,7 @@ [:long-insert bench-long-insert] [:long-delete bench-long-delete] [:long-iteration bench-long-iteration] + [:long-rank-lookup 
bench-long-rank-lookup] [:long-fold bench-long-fold] [:long-union bench-long-union] [:long-intersection bench-long-intersection] @@ -762,9 +1490,35 @@ [:long-split bench-long-split] [:string-set-construction bench-string-set-construction] [:string-set-lookup bench-string-set-lookup] + [:string-rank-lookup bench-string-rank-lookup] [:string-set-union bench-string-set-union] [:string-set-intersection bench-string-set-intersection] - [:string-set-difference bench-string-set-difference]]) + [:string-set-difference bench-string-set-difference] + [:string-rope-construction bench-string-rope-construction] + [:string-rope-concat bench-string-rope-concat] + [:string-rope-split bench-string-rope-split] + [:string-rope-splice bench-string-rope-splice] + [:string-rope-insert bench-string-rope-insert] + [:string-rope-remove bench-string-rope-remove] + {:key :string-rope-nth :fn bench-string-rope-nth :when #(<= % 500000)} + [:string-rope-reduce bench-string-rope-reduce] + [:string-rope-str bench-string-rope-str] + [:string-rope-repeated-edits bench-string-rope-repeated-edits] + [:string-rope-re-find bench-string-rope-re-find] + [:string-rope-re-seq bench-string-rope-re-seq] + [:string-rope-re-replace bench-string-rope-re-replace] + [:byte-rope-construction bench-byte-rope-construction] + [:byte-rope-concat bench-byte-rope-concat] + [:byte-rope-split bench-byte-rope-split] + [:byte-rope-splice bench-byte-rope-splice] + [:byte-rope-insert bench-byte-rope-insert] + [:byte-rope-remove bench-byte-rope-remove] + {:key :byte-rope-nth :fn bench-byte-rope-nth :when #(<= % 500000)} + [:byte-rope-reduce bench-byte-rope-reduce] + [:byte-rope-fold bench-byte-rope-fold] + [:byte-rope-repeated-edits bench-byte-rope-repeated-edits] + [:byte-rope-bytes bench-byte-rope-bytes] + [:byte-rope-digest bench-byte-rope-digest]]) (defn run-wins-benchmarks "Run benchmarks focused on where ordered-collections wins." 
@@ -776,7 +1530,7 @@ @results)) (defn run-readme-benchmarks - "Run only the benchmarks used in README tables (~5 min for 3 sizes)." + "Run only the benchmarks used in README tables (~10 min for 5 sizes)." [sizes] (let [results (atom {})] (doseq [n sizes] @@ -892,6 +1646,148 @@ (print-summary-node " " bench-results)) (println))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Auto-Compare Against Prior Run +;; +;; After writing the fresh EDN to disk, look for the most-recent +;; existing result file in bench-results/ that predates this run. If +;; one exists, flat-walk both and diff matching (size, group, variant) +;; tuples, printing a compact section with any notable regressions +;; and improvements. Self-contained: no dependency on the bb report +;; tool — runs from the same JVM process that just finished the bench. +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:private ^:const +regression-threshold+ 1.10) +(def ^:private ^:const +improvement-threshold+ 0.90) +(def ^:private ^:const +compare-top-n+ 15) + +(defn- prior-edn-file + "Return the absolute path of the most-recent `.edn` file in + bench-results/ whose filename sorts before the given `current` file + (which must also live in bench-results/), or nil if none exists. + Filenames are timestamped, so lexical sort == chronological sort." + [current] + (let [dir (io/file "bench-results") + current-name (.getName ^java.io.File (io/file current))] + (when (.isDirectory dir) + (let [prior (->> (.listFiles dir) + (map (fn [^java.io.File f] (.getName f))) + (filter #(.endsWith ^String % ".edn")) + (filter #(neg? (compare % current-name))) + sort + last)] + (when prior + (.getAbsolutePath ^java.io.File (io/file dir prior))))))) + +(defn- walk-leaves + "Walk a nested benchmark-result map and yield a flat seq of + {:size :group :variant :mean-ns} entries, one per leaf measurement." 
+ [benchmarks] + (letfn [(walk [size path node] + (cond + (and (map? node) (contains? node :mean-ns)) + [{:size size + :group (first path) + :variant (last path) + :mean-ns (double (:mean-ns node))}] + + (map? node) + (mapcat (fn [[k v]] (walk size (conj path k) v)) node) + + :else nil))] + (mapcat (fn [[size groups]] + (mapcat (fn [[group node]] (walk size [group] node)) groups)) + benchmarks))) + +(defn- load-bench-edn [path] + (try + (edn/read-string (slurp path)) + (catch Exception _ nil))) + +(defn- delta-row [{:keys [size group variant mean-ns]} prior-map] + (when-let [old (get prior-map [size group variant])] + (let [new-ns (double mean-ns) + old-ns (double old) + ratio (/ new-ns old-ns)] + {:size size + :group group + :variant variant + :old-ns old-ns + :new-ns new-ns + :ratio ratio + :delta-pct (* 100.0 (dec ratio))}))) + +(defn- fmt-time [^double ns] + (cond + (>= ns 1e9) (format "%.2fs" (/ ns 1e9)) + (>= ns 1e6) (format "%.2fms" (/ ns 1e6)) + (>= ns 1e3) (format "%.2fµs" (/ ns 1e3)) + :else (format "%.0fns" ns))) + +(defn- print-delta-table [title rows] + (println title) + (println " -------------------------------------------------------------------------------------------------") + (println (format " %6s %-32s %-22s %12s %12s %10s" + "N" "group" "variant" "old" "new" "delta")) + (println " -------------------------------------------------------------------------------------------------") + (doseq [{:keys [size group variant old-ns new-ns delta-pct]} rows] + (println (format " %6d %-32s %-22s %12s %12s %+9.1f%%" + size (name group) (name variant) + (fmt-time old-ns) (fmt-time new-ns) delta-pct))) + (println)) + +(defn print-comparison-vs-prior + "If a prior bench-results EDN exists for comparison, walk both files, + match leaf measurements by (size, group, variant), and print + regressions (slower ≥10%) and improvements (faster ≥10%) side-by-side." 
+ [current-file] + (if-let [prior-file (prior-edn-file current-file)] + (let [current (load-bench-edn current-file) + prior (load-bench-edn prior-file)] + (if-not (and current prior (:benchmarks current) (:benchmarks prior)) + (do (println) + (println "(No prior EDN loaded — skipping comparison.)")) + (let [current-rows (walk-leaves (:benchmarks current)) + prior-rows (walk-leaves (:benchmarks prior)) + prior-map (into {} (map (juxt (juxt :size :group :variant) :mean-ns)) + prior-rows) + deltas (keep #(delta-row % prior-map) current-rows) + regressions (->> deltas + (filter #(>= (:ratio %) +regression-threshold+)) + (sort-by :ratio >) + (take +compare-top-n+)) + improvements (->> deltas + (filter #(<= (:ratio %) +improvement-threshold+)) + (sort-by :ratio) + (take +compare-top-n+))] + (println) + (println "========================================================================") + (println " Comparison vs Previous Run") + (println "========================================================================") + (println) + (println (str " Baseline: " prior-file)) + (println (str " Compared: " (count deltas) " matching benchmark cells.")) + (println) + (if (seq regressions) + (print-delta-table + (format " Regressions (≥ %.0f%% slower), top %d by magnitude:" + (* 100.0 (dec +regression-threshold+)) +compare-top-n+) + regressions) + (println " No significant regressions (≥10% slower).\n")) + (if (seq improvements) + (print-delta-table + (format " Improvements (≥ %.0f%% faster), top %d by magnitude:" + (* 100.0 (- 1.0 +improvement-threshold+)) +compare-top-n+) + improvements) + (println " No significant improvements (≥10% faster).\n")) + (println (str " Use `lein bench-report --file " current-file + " --baseline " prior-file "` for the full")) + (println (str " comparison (category breakdown, headline tables, etc.).")) + (println)))) + (do (println) + (println "(No prior bench-results EDN found — skipping comparison.)")))) + 
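The regression/improvement classification used by `print-comparison-vs-prior` can be tried in isolation. The following is a minimal sketch, not code from this change: `classify-delta` is a hypothetical helper name, and it hard-codes the same 1.10 and 0.90 ratios that `+regression-threshold+` and `+improvement-threshold+` define above. It depends only on `clojure.core`.

```clojure
;; Hypothetical helper (not part of the benchmark namespace): classifies one
;; old/new timing pair with the same ratio thresholds as the comparison code.
(defn classify-delta
  [old-ns new-ns]
  (let [ratio (/ (double new-ns) (double old-ns))]
    {:ratio     ratio
     ;; Same formula as delta-row: percentage change relative to the old time.
     :delta-pct (* 100.0 (dec ratio))
     :verdict   (cond
                  (>= ratio 1.10) :regression   ; at least 10% slower
                  (<= ratio 0.90) :improvement  ; at least 10% faster
                  :else           :unchanged)}))

(:verdict (classify-delta 1000.0 1250.0)) ; ratio 1.25 => :regression
(:verdict (classify-delta 1000.0 800.0))  ; ratio 0.8  => :improvement
(:verdict (classify-delta 1000.0 1040.0)) ; ratio 1.04 => :unchanged
```

Cells that fall between the two thresholds are counted as matching but are not printed in either table, which is why the "Compared" count can be much larger than the number of rows shown.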
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Main Entry Point ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -905,12 +1801,12 @@ (println "Usage: lein bench [options]") (println) (println "Options:") - (println " --readme README table benchmarks only (~5 min for --full)") - (println " --full N=10K,100K,500K") + (println " --readme README table benchmarks only (~10 min for --full)") + (println " --full N=1K,5K,10K,100K,500K (~60 min)") (println " --sizes N,N,... Custom sizes (comma-separated)") (println " --help Show this help") (println) - (println "Default: N=100K (~3 min)") + (println "Default: N=100K (~5 min)") (println "Output is written to bench-results/.edn")) (defn -main [& args] @@ -938,7 +1834,8 @@ (let [results (runner (:sizes opts))] (print-summary results) - (write-results results output-file opts started-at)) + (write-results results output-file opts started-at) + (print-comparison-vs-prior output-file)) (println) (println "Benchmark suite complete."))) diff --git a/test/ordered_collections/byte_rope_test.clj b/test/ordered_collections/byte_rope_test.clj new file mode 100644 index 0000000..0a1d468 --- /dev/null +++ b/test/ordered_collections/byte_rope_test.clj @@ -0,0 +1,596 @@ +(ns ordered-collections.byte-rope-test + (:require [clojure.test :refer :all] + [clojure.core.reducers :as r] + [clojure.test.check.clojure-test :refer [defspec]] + [clojure.test.check.generators :as gen] + [clojure.test.check.properties :as prop] + [ordered-collections.core :as oc] + [ordered-collections.kernel.rope :as ropetree] + [ordered-collections.protocol :as proto] + ;; Force-load reader registrations so #byte/rope literals resolve. 
+ [ordered-collections.readers]) + (:import [ordered_collections.types.byte_rope ByteRope] + [java.io ByteArrayInputStream ByteArrayOutputStream])) + + +(defn- ba [& xs] + (byte-array (map #(unchecked-byte (long %)) xs))) + +(defn- ba= [^bytes a ^bytes b] + (java.util.Arrays/equals a b)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Construction +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-empty-contract + (let [br (oc/byte-rope)] + (testing "empty nth throws IOOB, not NPE" + (is (thrown? IndexOutOfBoundsException (nth br 0))) + (is (= :nope (nth br 0 :nope)))) + (testing "empty fold returns (combinef)" + (is (= 0 (r/fold + br)))) + (testing "empty reduce with init returns init" + (is (= 42 (reduce + 42 br)))))) + +(deftest byte-rope-input-stream-zero-read + (let [br (oc/byte-rope [1 2 3]) + ins (oc/byte-rope-input-stream br)] + (testing "read(buf, off, 0) returns 0 mid-stream" + (is (= 0 (.read ins (byte-array 10) 0 0)))) + (testing "exhaust stream" + (is (= 3 (.read ins (byte-array 10) 0 10)))) + (testing "read(buf, off, 0) returns 0 at EOF" + (is (= 0 (.read ins (byte-array 10) 0 0)))) + (testing "read(buf, off, n) returns -1 at EOF" + (is (= -1 (.read ins (byte-array 10) 0 5)))))) + +(deftest byte-rope-construction + (testing "empty" + (let [br (oc/byte-rope)] + (is (zero? (count br))) + (is (nil? 
(seq br))) + (is (= "" (oc/byte-rope-hex br))))) + + (testing "from byte-array" + (let [data (ba 1 2 3) + br (oc/byte-rope data)] + (is (= 3 (count br))) + (is (= [1 2 3] (vec (seq br)))) + (testing "defensive copy — mutating input does not affect rope" + (aset data 0 (unchecked-byte 99)) + (is (= 1 (nth br 0)))))) + + (testing "from sequential collection" + (let [br (oc/byte-rope [0 127 128 255])] + (is (= 4 (count br))) + (is (= [0 127 128 255] (vec (seq br)))))) + + (testing "from string (UTF-8)" + (let [br (oc/byte-rope "Hello")] + (is (= 5 (count br))) + (is (ba= (ba 0x48 0x65 0x6c 0x6c 0x6f) (oc/byte-rope-bytes br))))) + + (testing "UTF-8 encodes multi-byte characters" + (let [br (oc/byte-rope "é")] ;; U+00E9 → C3 A9 + (is (= 2 (count br))) + (is (= [0xc3 0xa9] (vec (seq br)))))) + + (testing "from InputStream" + (let [in (ByteArrayInputStream. (ba 10 20 30 40)) + br (oc/byte-rope in)] + (is (= 4 (count br))) + (is (= [10 20 30 40] (vec (seq br)))))) + + (testing "from another byte rope" + (let [br1 (oc/byte-rope [1 2 3]) + br2 (oc/byte-rope br1)] + (is (= br1 br2)))) + + (testing "large input promotes to tree" + (let [data (byte-array (map #(mod % 256) (range 2000))) + br (oc/byte-rope data)] + (is (= 2000 (count br))) + (is (ba= data (oc/byte-rope-bytes br)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Unsigned Semantics +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest unsigned-semantics + (testing "nth returns unsigned long 0–255" + (let [br (oc/byte-rope [0 1 127 128 200 255])] + (is (= 0 (nth br 0))) + (is (= 255 (nth br 5))) + (is (every? 
#(and (<= 0 %) (<= % 255)) (map #(nth br %) (range 6)))))) + + (testing "reduce yields unsigned values" + (let [br (oc/byte-rope [0 127 128 255])] + (is (= (+ 0 127 128 255) (reduce + br))) + (is (= (+ 100 0 127 128 255) (reduce + 100 br))))) + + (testing "seq yields unsigned values" + (is (= [0 127 128 255] (vec (seq (oc/byte-rope [0 127 128 255]))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Indexed / ILookup / IFn +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-indexed + (let [br (oc/byte-rope [10 20 30 40 50])] + (testing "nth" + (is (= 10 (nth br 0))) + (is (= 50 (nth br 4))) + (is (thrown? IndexOutOfBoundsException (nth br 5))) + (is (= :not-found (nth br 5 :not-found)))) + (testing "get / ILookup" + (is (= 30 (get br 2))) + (is (nil? (get br 10))) + (is (= :def (get br 10 :def)))) + (testing "IFn" + (is (= 20 (br 1))) + (is (= :def (br 100 :def)))) + (testing "peek / pop" + (is (= 50 (peek br))) + (is (= [10 20 30 40] (vec (seq (pop br))))) + (is (= 10 (peek (oc/byte-rope [10]))))))) + +(deftest byte-rope-assoc + (let [br (oc/byte-rope [1 2 3 4 5])] + (is (= [1 99 3 4 5] (vec (seq (assoc br 1 99))))) + (is (= [1 2 3 4 5 99] (vec (seq (assoc br 5 99))))) ;; append via assoc + (is (thrown? 
IndexOutOfBoundsException (assoc br 10 1))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Structural Operations +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-cat + (testing "flat + flat" + (let [a (oc/byte-rope [1 2 3]) + b (oc/byte-rope [4 5 6])] + (is (= [1 2 3 4 5 6] (vec (seq (proto/rope-cat a b))))))) + + (testing "with raw byte[] as RHS" + (let [a (oc/byte-rope [1 2]) + b (ba 3 4)] + (is (= [1 2 3 4] (vec (seq (proto/rope-cat a b))))))) + + (testing "combine exceeds flat threshold — promotes to tree" + (let [a (oc/byte-rope (byte-array (repeat 800 1))) + b (oc/byte-rope (byte-array (repeat 800 2))) + c (proto/rope-cat a b)] + (is (= 1600 (count c))) + (is (= 1 (nth c 0))) + (is (= 2 (nth c 1599)))))) + +(deftest byte-rope-split + (let [br (oc/byte-rope (byte-array (range 100)))] + (testing "split at midpoint" + (let [[l r] (proto/rope-split br 50)] + (is (= 50 (count l))) + (is (= 50 (count r))) + (is (= 0 (nth l 0))) + (is (= 50 (nth r 0))))) + (testing "split at 0" + (let [[l r] (proto/rope-split br 0)] + (is (zero? (count l))) + (is (= 100 (count r))))) + (testing "split at end" + (let [[l r] (proto/rope-split br 100)] + (is (= 100 (count l))) + (is (zero? (count r))))) + (testing "out of range" + (is (thrown? IndexOutOfBoundsException (proto/rope-split br 101)))))) + +(deftest byte-rope-sub + (let [br (oc/byte-rope (byte-array (range 100)))] + (testing "subrange" + (let [s (proto/rope-sub br 10 20)] + (is (= 10 (count s))) + (is (= 10 (nth s 0))) + (is (= 19 (nth s 9))))) + (testing "empty range" + (is (zero? 
(count (proto/rope-sub br 50 50))))))) + +(deftest byte-rope-insert + (let [br (oc/byte-rope [1 2 3 4 5])] + (testing "insert in middle" + (is (= [1 2 99 100 3 4 5] + (vec (seq (proto/rope-insert br 2 [99 100])))))) + (testing "insert at front" + (is (= [99 1 2 3 4 5] + (vec (seq (proto/rope-insert br 0 [99])))))) + (testing "insert at end" + (is (= [1 2 3 4 5 99] + (vec (seq (proto/rope-insert br 5 [99])))))) + (testing "insert byte-array" + (is (= [1 2 99 100 3 4 5] + (vec (seq (proto/rope-insert br 2 (ba 99 100))))))) + (testing "insert byte-rope" + (is (= [1 2 99 100 3 4 5] + (vec (seq (proto/rope-insert br 2 (oc/byte-rope [99 100]))))))) + (testing "insert nothing" + (is (= [1 2 3 4 5] + (vec (seq (proto/rope-insert br 2 [])))))))) + +(deftest byte-rope-remove + (let [br (oc/byte-rope [1 2 3 4 5 6 7])] + (is (= [1 2 6 7] (vec (seq (proto/rope-remove br 2 5))))) + (is (= [] (vec (seq (proto/rope-remove br 0 7))))) + (is (= [1 2 3 4 5 6 7] (vec (seq (proto/rope-remove br 3 3))))))) + +(deftest byte-rope-splice + (let [br (oc/byte-rope [1 2 3 4 5])] + (testing "replace range with new content" + (is (= [1 2 99 100 101 5] + (vec (seq (proto/rope-splice br 2 4 [99 100 101])))))) + (testing "replace with nothing = remove" + (is (= [1 2 5] (vec (seq (proto/rope-splice br 2 4 [])))))) + (testing "splice with byte-rope replacement" + (is (= [1 2 99 5] + (vec (seq (proto/rope-splice br 2 4 (oc/byte-rope [99]))))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Equality / Hashing / Comparison +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-equality + (testing "byte-rope to byte-rope" + (is (= (oc/byte-rope [1 2 3]) (oc/byte-rope [1 2 3]))) + (is (not= (oc/byte-rope [1 2 3]) (oc/byte-rope [1 2 4])))) + + (testing "byte-rope to byte[]" + (is (= (oc/byte-rope [1 2 3]) (ba 1 2 3))) + (is (not= (oc/byte-rope [1 2 3]) (ba 1 2 4)))) + + (testing "byte-rope NOT equal to vector" + 
(is (not= (oc/byte-rope [1 2 3]) [1 2 3]))) + + (testing "byte-rope NOT equal to string rope or string" + (is (not= (oc/byte-rope "hello") (oc/string-rope "hello"))) + (is (not= (oc/byte-rope "hello") "hello"))) + + (testing "tree-mode equality" + (let [a (oc/byte-rope (byte-array (map #(mod % 256) (range 3000)))) + b (oc/byte-rope (byte-array (map #(mod % 256) (range 3000))))] + (is (= a b))))) + +(deftest byte-rope-hash + (testing "equal byte ropes hash the same" + (is (= (hash (oc/byte-rope [1 2 3])) (hash (oc/byte-rope [1 2 3]))))) + (testing "tree-mode and flat-mode with same content hash the same" + (let [flat (oc/byte-rope [1 2 3]) + tree (let [b (oc/byte-rope (byte-array (repeat 2000 99)))] + (proto/rope-sub b 0 3))] + (is (= 3 (count tree)))))) + +(deftest byte-rope-comparable + (testing "unsigned lexicographic ordering" + (is (neg? (compare (oc/byte-rope [0 127]) (oc/byte-rope [0 128])))) + (is (pos? (compare (oc/byte-rope [255]) (oc/byte-rope [127 0])))) ;; unsigned 255 > 127 + (is (zero? (compare (oc/byte-rope [1 2 3]) (oc/byte-rope [1 2 3]))))) + (testing "prefix is less than longer" + (is (neg? 
(compare (oc/byte-rope [1 2]) (oc/byte-rope [1 2 3])))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Reduce / Fold / Seq +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-reduce + (testing "reduce with init" + (is (= 15 (reduce + 0 (oc/byte-rope [1 2 3 4 5]))))) + (testing "reduce without init" + (is (= 15 (reduce + (oc/byte-rope [1 2 3 4 5]))))) + (testing "reduced early termination" + (is (= :stop (reduce (fn [_ x] (if (= x 3) (reduced :stop) x)) + (oc/byte-rope [1 2 3 4 5]))))) + (testing "tree-mode reduce matches expected sum" + (let [br (oc/byte-rope (byte-array (map #(mod % 256) (range 10000)))) + expected (reduce + (map #(mod % 256) (range 10000)))] + (is (= expected (reduce + br)))))) + +(deftest byte-rope-fold + (testing "r/fold matches sequential reduce" + (let [br (oc/byte-rope (byte-array (map #(mod % 256) (range 10000)))) + sequential (reduce + br) + parallel (r/fold + br)] + (is (= sequential parallel))))) + +(deftest byte-rope-seq + (testing "forward seq" + (is (= [1 2 3] (seq (oc/byte-rope [1 2 3]))))) + (testing "reverse seq" + (is (= [3 2 1] (vec (rseq (oc/byte-rope [1 2 3])))))) + (testing "multi-chunk seq" + (let [br (oc/byte-rope (byte-array (map #(mod % 256) (range 2000))))] + (is (= 2000 (count (seq br)))) + (is (= (map #(mod % 256) (range 2000)) (seq br)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Multi-Byte Reads +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest multi-byte-reads-flat + (let [br (oc/byte-rope (ba 0x12 0x34 0x56 0x78 0x9a 0xbc 0xde 0xf0))] + (testing "big-endian" + (is (= 0x12 (oc/byte-rope-get-byte br 0))) + (is (= 0x1234 (oc/byte-rope-get-short br 0))) + (is (= 0x12345678 (oc/byte-rope-get-int br 0))) + (is (= 0x123456789abcdef0 (oc/byte-rope-get-long br 0)))) + (testing "little-endian" + (is (= 0x3412 (oc/byte-rope-get-short-le br 
0))) + (is (= 0x78563412 (oc/byte-rope-get-int-le br 0))) + ;; 0xf0debc9a78563412 as signed long is negative + (is (= (unchecked-long 0xf0debc9a78563412) + (oc/byte-rope-get-long-le br 0)))) + (testing "sign extension of int" + (let [b (oc/byte-rope (ba 0xff 0xff 0xff 0xff))] + (is (= -1 (oc/byte-rope-get-int b 0))) + (is (= -1 (oc/byte-rope-get-int-le b 0))))))) + +(deftest multi-byte-reads-tree + (testing "cross-chunk reads work correctly" + (let [data (byte-array (map #(mod % 256) (range 2000))) + br (oc/byte-rope data)] + ;; Tree-backed — pick a position well inside the tree + (is (= (bit-and (aget data 500) 0xff) (oc/byte-rope-get-byte br 500))) + ;; Two-byte read that might straddle a chunk boundary + (let [expected (bit-or (bit-shift-left (bit-and (long (aget data 500)) 0xff) 8) + (bit-and (long (aget data 501)) 0xff))] + (is (= expected (oc/byte-rope-get-short br 500))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Search +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-search + (testing "index-of" + (let [br (oc/byte-rope [1 2 3 4 5 3 2 1])] + (is (= 2 (oc/byte-rope-index-of br 3))) + (is (= 5 (oc/byte-rope-index-of br 3 3))) + (is (= -1 (oc/byte-rope-index-of br 99))) + (is (= -1 (oc/byte-rope-index-of br 3 99))))) + + (testing "index-of unsigned semantics" + (let [br (oc/byte-rope [0 128 255])] + (is (= 0 (oc/byte-rope-index-of br 0))) + (is (= 1 (oc/byte-rope-index-of br 128))) + (is (= 2 (oc/byte-rope-index-of br 255)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Materialization +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-materialization + (let [br (oc/byte-rope (ba 0x48 0x65 0x6c 0x6c 0x6f))] + (testing "byte-rope-bytes defensive copy" + (let [out (oc/byte-rope-bytes br)] + (is (ba= out (ba 0x48 0x65 0x6c 0x6c 0x6f))) + (aset ^bytes out 0 
(byte 0)) + (is (= 0x48 (nth br 0))))) ;; rope unchanged + (testing "byte-rope-hex" + (is (= "48656c6c6f" (oc/byte-rope-hex br)))) + (testing "byte-rope-write" + (let [out (ByteArrayOutputStream.)] + (oc/byte-rope-write br out) + (is (ba= (.toByteArray out) (ba 0x48 0x65 0x6c 0x6c 0x6f))))) + (testing "byte-rope-input-stream" + (let [in-stream (oc/byte-rope-input-stream br) + buf (byte-array 10) + n (.read in-stream buf 0 10)] + (is (= 5 n)) + (is (ba= (java.util.Arrays/copyOf buf 5) (ba 0x48 0x65 0x6c 0x6c 0x6f))) + (is (= -1 (.read in-stream))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Digest +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-digest-sha256 + (testing "matches reference for \"hello\"" + (let [br (oc/byte-rope "hello") + digest (oc/byte-rope-digest br "SHA-256")] + (is (= "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824" + (oc/byte-rope-hex digest))) + (is (= 32 (count digest))))) + (testing "streamed digest matches materialized digest for large input" + (let [data (byte-array (map #(mod % 256) (range 10000))) + br (oc/byte-rope data) + streamed (oc/byte-rope-digest br "SHA-256") + reference (let [md (java.security.MessageDigest/getInstance "SHA-256")] + (.digest md data))] + (is (ba= (oc/byte-rope-bytes streamed) reference))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Transient +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-transient + (testing "batch construction via conj!" + (let [xs (mapv #(mod % 256) (range 500)) + t (transient (oc/byte-rope)) + t (reduce conj! t xs) + final (persistent! t)] + (is (= 500 (count final))) + (is (= (first xs) (nth final 0))) + (is (= (last xs) (nth final 499))))) + (testing "demotes to flat when small enough at persistent! time" + (let [t (transient (oc/byte-rope)) + t (reduce conj! 
t (range 100)) + final (persistent! t)] + ;; 100 bytes is well under the flat threshold, expect flat root + (is (instance? ByteRope final)) + (is (bytes? (.-root ^ByteRope final))))) + (testing "cannot conj after persistent!" + (let [t (transient (oc/byte-rope)) + _ (persistent! t)] + (is (thrown? IllegalAccessError (conj! t 1)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Flat / Tree Boundary +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest flat-tree-boundary + (testing "1024 bytes stays flat" + (let [br (oc/byte-rope (byte-array (repeat 1024 0x41)))] + (is (= 1024 (count br))) + (let [root (.-root ^ByteRope br)] + (is (some? root)) + (is (bytes? root))))) + (testing "1025 bytes promotes to tree" + (let [br (oc/byte-rope (byte-array (repeat 1025 0x41)))] + (is (= 1025 (count br))) + (let [root (.-root ^ByteRope br)] + (is (not (bytes? root)))))) + (testing "edits that grow past threshold promote" + (let [br (oc/byte-rope (byte-array (repeat 1020 1)))] + (is (bytes? (.-root ^ByteRope br))) + (let [br2 (proto/rope-insert br 500 (byte-array (repeat 10 2)))] + (is (= 1030 (count br2))) + (is (not (bytes? (.-root ^ByteRope br2)))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; EDN Round-Trip +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-edn-roundtrip + (testing "print-method emits #byte/rope hex literal" + (is (= "#byte/rope \"68656c6c6f\"" + (pr-str (oc/byte-rope "hello"))))) + (testing "round-trips through EDN reader" + (let [source "#byte/rope \"deadbeef\"" + parsed (read-string source)] + (is (instance? ByteRope parsed)) + (is (= 4 (count parsed))) + (is (= [0xde 0xad 0xbe 0xef] (vec (seq parsed)))))) + (testing "invalid hex rejected" + (is (thrown? IllegalArgumentException (read-string "#byte/rope \"xy\""))) + (is (thrown? 
IllegalArgumentException (read-string "#byte/rope \"abc\""))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Metadata +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest byte-rope-metadata + (let [br (-> (oc/byte-rope [1 2 3]) (with-meta {:source :test}))] + (is (= {:source :test} (meta br))) + (testing "meta preserved through edits" + (is (= {:source :test} (meta (proto/rope-insert br 0 [0])))) + (is (= {:source :test} (meta (assoc br 0 99)))) + (is (= {:source :test} (meta (pop br))))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Property-Based Tests +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- clamp ^long [^long n ^long i] + (min n (max 0 i))) + +(def gen-byte + (gen/fmap #(bit-and (long %) 0xff) gen/nat)) + +(def gen-byte-seq + (gen/vector gen-byte 0 200)) + +(defspec prop-byte-rope-roundtrip 100 + (prop/for-all [xs gen-byte-seq] + (let [br (oc/byte-rope xs)] + (= (vec xs) (vec (seq br)))))) + +(defspec prop-split-roundtrip 100 + (prop/for-all [xs (gen/such-that seq gen-byte-seq) + i gen/nat] + (let [n (count xs) + i' (rem i (inc n)) + br (oc/byte-rope xs) + [l r] (proto/rope-split br i')] + (and (= (take i' xs) (vec (seq l))) + (= (drop i' xs) (vec (seq r))))))) + +(defspec prop-splice-oracle 100 + (prop/for-all [xs (gen/such-that seq gen-byte-seq) + a gen/nat + b gen/nat + ins gen-byte-seq] + (let [n (count xs) + lo (clamp n (min a b)) + hi (clamp n (max a b)) + expected (concat (take lo xs) ins (drop hi xs)) + br (oc/byte-rope xs) + result (proto/rope-splice br lo hi ins)] + (= (vec expected) (vec (seq result)))))) + +(defspec prop-concat-roundtrip 100 + (prop/for-all [xs gen-byte-seq + ys gen-byte-seq] + (let [a (oc/byte-rope xs) + b (oc/byte-rope ys)] + (= (concat xs ys) (vec (seq (proto/rope-cat a b))))))) + +(defspec prop-equality 100 + (prop/for-all [xs gen-byte-seq] + (let 
[a (oc/byte-rope xs) + b (oc/byte-rope xs)] + (and (= a b) + (= (hash a) (hash b)))))) + +(defspec prop-byte-array-equality 100 + (prop/for-all [xs gen-byte-seq] + (let [br (oc/byte-rope xs) + ba (byte-array (map unchecked-byte xs))] + (= br ba)))) + +(defspec prop-csi-after-edits 50 + (prop/for-all [xs (gen/vector gen-byte 10 500) + ops (gen/vector + (gen/one-of + [(gen/fmap (fn [[a b]] [:split (min a b)]) + (gen/tuple gen/nat gen/nat)) + (gen/fmap (fn [[a b ins]] [:splice (min a b) (max a b) ins]) + (gen/tuple gen/nat gen/nat gen-byte-seq))]) + 1 8)] + (let [br (oc/byte-rope xs) + result (reduce + (fn [r [op a b ins]] + (let [n (count r)] + (case op + :split (first (proto/rope-split r (clamp n a))) + :splice (proto/rope-splice r (clamp n (min a b)) + (clamp n (max a b)) + (or ins []))))) + br ops) + root (.-root ^ByteRope result)] + (or (nil? root) (bytes? root) (ropetree/invariant-valid? root))))) + +(defspec prop-nth-matches-seq 100 + (prop/for-all [xs (gen/such-that seq gen-byte-seq)] + (let [br (oc/byte-rope xs)] + (every? 
identity + (map #(= (nth (vec xs) %) (nth br %)) + (range (count xs))))))) + +(defspec prop-multi-byte-reads 50 + (prop/for-all [xs (gen/vector gen-byte 8 200) + i gen/nat] + (let [n (count xs) + br (oc/byte-rope xs) + off (rem i (max 1 (- n 7))) + expected-short (bit-or (bit-shift-left (long (nth xs off)) 8) + (long (nth xs (inc off))))] + (= expected-short (oc/byte-rope-get-short br off))))) diff --git a/test/ordered_collections/coverage_test.clj b/test/ordered_collections/coverage_test.clj index 5cc07e8..f892207 100644 --- a/test/ordered_collections/coverage_test.clj +++ b/test/ordered_collections/coverage_test.clj @@ -446,3 +446,35 @@ (is (= [[:a 1] [:b 2] [:c 3]] (vec m))) (is (= 2 (m :b))) (is (= [[:a 1] [:b 2] [:c 3] [:d 4]] (vec (assoc m :d 4)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Primitive specialization preservation +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest long-specialization-preserved-through-mutations + (let [s (long-ordered-set [1 2 3]) + s+ (conj s 4) + s- (disj s 2)] + (testing "conj preserves LongKeyNode" + (is (instance? ordered_collections.kernel.node.LongKeyNode + (.-root ^ordered_collections.types.ordered_set.OrderedSet s+)))) + (testing "disj preserves LongKeyNode" + (is (instance? ordered_collections.kernel.node.LongKeyNode + (.-root ^ordered_collections.types.ordered_set.OrderedSet s-)))) + (testing "values are correct" + (is (= #{1 2 3 4} (set s+))) + (is (= #{1 3} (set s-))))) + + (let [m (long-ordered-map {1 :a 2 :b 3 :c}) + m+ (assoc m 4 :d) + m- (dissoc m 2)] + (testing "assoc preserves LongKeyNode" + (is (instance? ordered_collections.kernel.node.LongKeyNode + (.-root ^ordered_collections.types.ordered_map.OrderedMap m+)))) + (testing "dissoc preserves LongKeyNode" + (is (instance? 
ordered_collections.kernel.node.LongKeyNode + (.-root ^ordered_collections.types.ordered_map.OrderedMap m-)))) + (testing "values are correct" + (is (= :d (m+ 4))) + (is (nil? (m- 2)))))) diff --git a/test/ordered_collections/edn_test.clj b/test/ordered_collections/edn_test.clj index 385fb64..1b75b9a 100644 --- a/test/ordered_collections/edn_test.clj +++ b/test/ordered_collections/edn_test.clj @@ -111,9 +111,9 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (deftest interval-set-tagged-literal-format - (testing "pr-str produces #ordered/interval-set tag" + (testing "pr-str produces #interval/set tag" (let [s (oc/interval-set [[1 5] [10 20]])] - (is (clojure.string/starts-with? (pr-str s) "#ordered/interval-set "))))) + (is (clojure.string/starts-with? (pr-str s) "#interval/set "))))) (deftest interval-set-round-trip (testing "read-string round-trip" @@ -138,9 +138,9 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (deftest interval-map-tagged-literal-format - (testing "pr-str produces #ordered/interval-map tag" + (testing "pr-str produces #interval/map tag" (let [m (oc/interval-map {[1 5] :a [10 20] :b})] - (is (clojure.string/starts-with? (pr-str m) "#ordered/interval-map "))))) + (is (clojure.string/starts-with? (pr-str m) "#interval/map "))))) (deftest interval-map-round-trip (testing "read-string round-trip" @@ -163,9 +163,9 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (deftest range-map-tagged-literal-format - (testing "pr-str produces #ordered/range-map tag" + (testing "pr-str produces #range/map tag" (let [rm (oc/range-map [[[0 10] :a] [[20 30] :b]])] - (is (clojure.string/starts-with? (pr-str rm) "#ordered/range-map "))))) + (is (clojure.string/starts-with? 
(pr-str rm) "#range/map "))))) (deftest range-map-round-trip (testing "read-string round-trip" @@ -184,9 +184,9 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (deftest priority-queue-tagged-literal-format - (testing "pr-str produces #ordered/priority-queue tag" + (testing "pr-str produces #priority/queue tag" (let [pq (oc/priority-queue [[1 :a] [3 :c] [2 :b]])] - (is (clojure.string/starts-with? (pr-str pq) "#ordered/priority-queue "))))) + (is (clojure.string/starts-with? (pr-str pq) "#priority/queue "))))) (deftest priority-queue-round-trip (testing "read-string round-trip" @@ -205,9 +205,9 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (deftest ordered-multiset-tagged-literal-format - (testing "pr-str produces #ordered/multiset tag" + (testing "pr-str produces #multi/set tag" (let [ms (oc/ordered-multiset [3 1 2 1])] - (is (clojure.string/starts-with? (pr-str ms) "#ordered/multiset "))))) + (is (clojure.string/starts-with? 
(pr-str ms) "#multi/set "))))) (deftest ordered-multiset-round-trip (testing "read-string round-trip" @@ -285,12 +285,12 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (deftest rope-tagged-literal-format - (testing "pr-str produces #ordered/rope tag" - (is (= "#ordered/rope [1 2 3]" (pr-str (oc/rope [1 2 3]))))) + (testing "pr-str produces #vec/rope tag" + (is (= "#vec/rope [1 2 3]" (pr-str (oc/rope [1 2 3]))))) (testing "empty rope" - (is (= "#ordered/rope []" (pr-str (oc/rope))))) + (is (= "#vec/rope []" (pr-str (oc/rope))))) (testing "slice materializes with same tag" - (is (= "#ordered/rope [2 3 4]" + (is (= "#vec/rope [2 3 4]" (pr-str (oc/rope-sub (oc/rope (range 6)) 2 5)))))) (deftest rope-round-trip diff --git a/test/ordered_collections/memory_test.clj b/test/ordered_collections/memory_test.clj index 2bbaf98..37b4b03 100644 --- a/test/ordered_collections/memory_test.clj +++ b/test/ordered_collections/memory_test.clj @@ -115,20 +115,27 @@ (let [n 10000 data (vec (shuffle (range n))) intervals (vec (for [i (range n)] [(* i 2) (+ (* i 2) (rand-int 10))])) + range-entries (vec (for [i (range n)] [[(* i 10) (+ (* i 10) 10)] (keyword (str "v" i))])) ;; Build collections interval-set (oc/interval-set intervals) interval-map (oc/interval-map (map #(vector % :val) intervals)) + range-map (oc/range-map range-entries) + segment-tree (oc/sum-tree (into {} (map #(vector % (long %)) data))) multiset (oc/ordered-multiset (concat data data)) ; duplicates priority-q (oc/priority-queue (map #(vector % %) data)) - fuzzy (oc/fuzzy-set data) + fuzzy-set (oc/fuzzy-set data) + fuzzy-map (oc/fuzzy-map (zipmap data (map str data))) ;; Measure iset-bpe (bytes-per-element interval-set n) imap-bpe (bytes-per-element interval-map n) + rmap-bpe (bytes-per-element range-map n) + segt-bpe (bytes-per-element segment-tree n) mset-bpe (bytes-per-element multiset (* 2 n)) pq-bpe (bytes-per-element priority-q n) - fuzz-bpe (bytes-per-element fuzzy n)] + 
fset-bpe (bytes-per-element fuzzy-set n) + fmap-bpe (bytes-per-element fuzzy-map n)] (println) (println (format "=== Specialized Collections at N=%,d ===" n)) @@ -136,15 +143,24 @@ iset-bpe (format-bytes (measure-bytes interval-set)))) (println (format " interval-map: %5.1f bytes/interval (total: %s)" imap-bpe (format-bytes (measure-bytes interval-map)))) - (println (format " ordered-multiset:%5.1f bytes/elem (total: %s)" + (println (format " range-map: %5.1f bytes/range (total: %s)" + rmap-bpe (format-bytes (measure-bytes range-map)))) + (println (format " segment-tree: %5.1f bytes/entry (total: %s)" + segt-bpe (format-bytes (measure-bytes segment-tree)))) + (println (format " ordered-multiset:%5.1f bytes/elem (total: %s)" mset-bpe (format-bytes (measure-bytes multiset)))) - (println (format " priority-queue: %5.1f bytes/elem (total: %s)" + (println (format " priority-queue: %5.1f bytes/elem (total: %s)" pq-bpe (format-bytes (measure-bytes priority-q)))) - (println (format " fuzzy-set: %5.1f bytes/elem (total: %s)" - fuzz-bpe (format-bytes (measure-bytes fuzzy)))) + (println (format " fuzzy-set: %5.1f bytes/elem (total: %s)" + fset-bpe (format-bytes (measure-bytes fuzzy-set)))) + (println (format " fuzzy-map: %5.1f bytes/entry (total: %s)" + fmap-bpe (format-bytes (measure-bytes fuzzy-map)))) ;; Sanity checks - (is (< iset-bpe 200) "interval-set should use < 200 bytes/interval")))) + (is (< iset-bpe 200) "interval-set should use < 200 bytes/interval") + (is (< rmap-bpe 300) "range-map should use < 300 bytes/range") + (is (< segt-bpe 300) "segment-tree should use < 300 bytes/entry") + (is (< fmap-bpe 300) "fuzzy-map should use < 300 bytes/entry")))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Node Structure Analysis @@ -222,6 +238,73 @@ (is (< ratio 1.10) (format "Rope overhead should be < 10%% at N=%d (was %.1f%%)" n (* 100 (- ratio 1.0)))))))) 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; StringRope Memory +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- random-ascii ^String [^long n] + (let [sb (StringBuilder. (int n)) + chars "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789" + nchars (.length chars)] + (dotimes [_ n] + (.append sb (.charAt chars (rand-int nchars)))) + (.toString sb))) + +(deftest string-rope-memory + (testing "StringRope memory vs String (ASCII content, latin-1 compact storage)" + (doseq [n [1000 10000 100000]] + (let [^String text (random-ascii n) + sr (oc/string-rope text) + sr-bpe (bytes-per-element sr n) + str-bpe (bytes-per-element text n) + ratio (/ sr-bpe str-bpe)] + + (println) + (println (format "=== StringRope Memory at N=%,d ===" n)) + (println (format " string-rope: %5.1f bytes/char (total: %s)" + sr-bpe (format-bytes (measure-bytes sr)))) + (println (format " string: %5.1f bytes/char (total: %s)" + str-bpe (format-bytes (measure-bytes text)))) + (println (format " ratio: %.2fx" ratio)) + + ;; StringRope overhead should be bounded — flat mode is + ;; ~string, tree mode adds per-chunk String + SimpleNode overhead + (is (< ratio 3.0) + (format "StringRope overhead should be < 3x at N=%d (was %.2fx)" n ratio)))))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ByteRope Memory +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- random-bytes ^bytes [^long n] + (let [rng (java.util.Random. 
42) + b (byte-array (int n))] + (.nextBytes rng b) + b)) + +(deftest byte-rope-memory + (testing "ByteRope memory vs byte[]" + (doseq [n [1000 10000 100000]] + (let [^bytes data (random-bytes n) + br (oc/byte-rope data) + br-bpe (bytes-per-element br n) + ba-bpe (bytes-per-element data n) + ratio (/ br-bpe ba-bpe)] + + (println) + (println (format "=== ByteRope Memory at N=%,d ===" n)) + (println (format " byte-rope: %5.1f bytes/byte (total: %s)" + br-bpe (format-bytes (measure-bytes br)))) + (println (format " byte[]: %5.1f bytes/byte (total: %s)" + ba-bpe (format-bytes (measure-bytes data)))) + (println (format " ratio: %.2fx" ratio)) + + ;; byte[] packs at 1 byte/byte plus a tiny object header, so the + ;; tree-mode rope's per-chunk + per-node overhead is proportionally + ;; bigger than for StringRope. Bound it loosely. + (is (< ratio 5.0) + (format "ByteRope overhead should be < 5x at N=%d (was %.2fx)" n ratio)))))) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Summary Report ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -242,7 +325,18 @@ core-map (into (sorted-map) map-data) avl-map (into (avl/sorted-map) map-data) ordered-map (oc/ordered-map map-data) - long-map (oc/long-ordered-map map-data)] + long-map (oc/long-ordered-map map-data) + + ;; Ropes — compared against their natural baselines + rope-data (range n) + rope (oc/rope rope-data) + vector (vec rope-data) + + text (random-ascii n) + string-rope (oc/string-rope text) + + ^bytes byte-data (random-bytes n) + byte-rope (oc/byte-rope byte-data)] (println) (println "╔══════════════════════════════════════════════════════════════╗") @@ -269,6 +363,39 @@ ratio (/ bpe core-bpe)] (println (format "║ %-20s │ %10.1f │ %12s │ %8.2fx ║" name bpe (format-bytes (measure-bytes coll)) ratio))))) + (println "╠══════════════════════════════════════════════════════════════╣") + (println "║ Rope family │ Bytes/Elem │ Total Memory │ vs base ║") 
+ (println "╠══════════════════════════════════════════════════════════════╣") + (let [vec-bpe (bytes-per-element vector n) + rope-bpe (bytes-per-element rope n)] + (println (format "║ %-20s │ %10.1f │ %12s │ %8s ║" + "vector (baseline)" vec-bpe + (format-bytes (measure-bytes vector)) + "1.00x")) + (println (format "║ %-20s │ %10.1f │ %12s │ %8.2fx ║" + "rope" rope-bpe + (format-bytes (measure-bytes rope)) + (/ rope-bpe vec-bpe)))) + (let [str-bpe (bytes-per-element text n) + sr-bpe (bytes-per-element string-rope n)] + (println (format "║ %-20s │ %10.1f │ %12s │ %8s ║" + "string (baseline)" str-bpe + (format-bytes (measure-bytes text)) + "1.00x")) + (println (format "║ %-20s │ %10.1f │ %12s │ %8.2fx ║" + "string-rope" sr-bpe + (format-bytes (measure-bytes string-rope)) + (/ sr-bpe str-bpe)))) + (let [ba-bpe (bytes-per-element byte-data n) + br-bpe (bytes-per-element byte-rope n)] + (println (format "║ %-20s │ %10.1f │ %12s │ %8s ║" + "byte[] (baseline)" ba-bpe + (format-bytes (measure-bytes byte-data)) + "1.00x")) + (println (format "║ %-20s │ %10.1f │ %12s │ %8.2fx ║" + "byte-rope" br-bpe + (format-bytes (measure-bytes byte-rope)) + (/ br-bpe ba-bpe)))) (println "╚══════════════════════════════════════════════════════════════╝") ;; Assertions for documentation accuracy diff --git a/test/ordered_collections/ordered_map_test.clj b/test/ordered_collections/ordered_map_test.clj index 981026a..f3a5ddd 100644 --- a/test/ordered_collections/ordered_map_test.clj +++ b/test/ordered_collections/ordered_map_test.clj @@ -7,6 +7,7 @@ [ordered-collections.core :refer [general-compare ordered-map ordered-map-by ordered-map-with + long-ordered-map ordered-merge-with assoc-new]] [ordered-collections.test-utils :as tu]) (:import [java.util UUID])) @@ -171,7 +172,16 @@ (testing "single map" (let [m (ordered-map [[1 :a] [2 :b]])] - (is (= m (ordered-merge-with (fn [k a b] b) m)))))) + (is (= m (ordered-merge-with (fn [k a b] b) m))))) + + (testing "long-ordered-map preserves 
specialization through merge" + (let [m1 (long-ordered-map {1 :a 2 :b}) + m2 (long-ordered-map {2 :B 3 :c}) + merged (ordered-merge-with (fn [k a b] b) m1 m2) + root (.-root ^ordered_collections.types.ordered_map.OrderedMap merged)] + (is (= {1 :a 2 :B 3 :c} merged)) + (is (instance? ordered_collections.kernel.node.LongKeyNode root) + "merge result should preserve LongKeyNode specialization")))) (defspec prop-merge-with-addition-is-commutative 100 (prop/for-all [kvs1 tu/gen-int-map-entries diff --git a/test/ordered_collections/rope_test.clj b/test/ordered_collections/rope_test.clj index 02ef6a8..33c1555 100644 --- a/test/ordered_collections/rope_test.clj +++ b/test/ordered_collections/rope_test.clj @@ -458,9 +458,9 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (deftest rope-print-method - (is (= "#ordered/rope [1 2 3]" (pr-str (oc/rope [1 2 3])))) - (is (= "#ordered/rope []" (pr-str (oc/rope)))) - (is (= "#ordered/rope [5 6 7]" + (is (= "#vec/rope [1 2 3]" (pr-str (oc/rope [1 2 3])))) + (is (= "#vec/rope []" (pr-str (oc/rope)))) + (is (= "#vec/rope [5 6 7]" (pr-str (oc/rope-sub (oc/rope (range 10)) 5 8))))) @@ -743,10 +743,12 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defn- rope-tree-healthy? - "Check that the rope root satisfies both CSI and WBT balance." + "Check that the rope root satisfies both CSI and WBT balance. + Flat-mode roots (bare APersistentVector) are trivially valid." [root] (and (ropetree/invariant-valid? root) (or (nil? root) + (instance? clojure.lang.APersistentVector root) (ordered-collections.kernel.tree/node-healthy? 
root)))) (defspec prop-multi-chunk-edit-sequences 50 @@ -902,7 +904,7 @@ (prop/for-all [xs (gen/vector gen/small-integer 0 500)] (let [r (oc/rope xs) rt (clojure.edn/read-string - {:readers {'ordered/rope oc/rope}} + {:readers {'vec/rope oc/rope}} (pr-str r))] (and (= (vec r) (vec rt)) (= (count r) (count rt)))))) diff --git a/test/ordered_collections/rope_tuning_bench.clj b/test/ordered_collections/rope_tuning_bench.clj index d71287b..ef54713 100644 --- a/test/ordered_collections/rope_tuning_bench.clj +++ b/test/ordered_collections/rope_tuning_bench.clj @@ -1,18 +1,44 @@ (ns ordered-collections.rope-tuning-bench - "Systematic benchmark for rope chunk-size tuning. + "Systematic chunk-size tuning benchmark for all three rope variants. - Measures every rope operation across candidate chunk sizes to find the - optimal target/min pair. Each candidate rebuilds the test data at that - chunk size, then benchmarks all key operations against PersistentVector. + Sweeps a set of candidate chunk sizes for each variant and measures + every major operation, reporting the per-operation speedup relative + to the natural baseline (Vector / String / byte[]). Each candidate + is run with the kernel's `*target-chunk-size*` dynamic var bound to + the candidate value so the ENTIRE pipeline — construction, concat, + split, splice, reduce — operates at that chunk size. - Usage: - lein bench-rope-tuning ; Default sizes - lein bench-rope-tuning --quick ; Fast feedback + Used to identify optimal per-variant CSI constants, which are then + set in the corresponding `types/*_rope.clj` variant file. - This is the rope analogue of bench-parallel: it finds the chunk size - that gives the best overall profile rather than guessing." 
+ Usage: + lein bench-rope-tuning ; full sweep + lein bench-rope-tuning --quick ; smaller, faster + lein bench-rope-tuning --variant rope ; one variant only + lein bench-rope-tuning --variant string-rope + lein bench-rope-tuning --variant byte-rope + + The report shows two scores per configuration: + + 'score' — geometric mean over splice, split, and concat only. + These are the structural-editing operations that define + the rope's value proposition. Splice and split are + chunk-size-insensitive at scale (tree walk dominates), + so this score is mainly driven by concat. + + 'all' — geometric mean over all six measured operations + (construct, nth, reduce, split, splice, concat). + Useful as a sanity check but weights operations like + nth and construct that are not the reason users choose + a rope over a vector. + + Note: the tuner does NOT measure repeated-edits (200 random + splices), insert, or remove — the headline workloads. From the + single-splice data, these are expected to be chunk-size-insensitive + at scale, but this has not been verified for targets > 1024." (:require [ordered-collections.bench-utils :as bu :refer [has-flag?]] [ordered-collections.kernel.rope :as ropetree] + [ordered-collections.kernel.chunk] [ordered-collections.kernel.node :as node :refer [leaf leaf?]] [ordered-collections.kernel.tree :as tree])) @@ -21,15 +47,15 @@ ;; Benchmark Infrastructure ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(defn bench-op - "Benchmark f, returning mean time in microseconds." +(defn- bench-op + "Benchmark thunk f, returning mean time in microseconds." [f warmup-iters bench-iters] (dotimes [_ warmup-iters] (f)) (let [start (System/nanoTime)] (dotimes [_ bench-iters] (f)) (/ (- (System/nanoTime) start) (* bench-iters 1000.0)))) -(defn bench-op-samples +(defn- bench-op-samples "Benchmark f multiple times and return summary stats." 
[f warmup-iters bench-iters sample-runs] (let [samples (vec (repeatedly sample-runs #(bench-op f warmup-iters bench-iters))) @@ -38,234 +64,472 @@ {:median (nth sorted (quot n 2)) :mean (/ (reduce + sorted) n) :low (first sorted) - :high (last sorted) - :samples samples})) + :high (last sorted)})) + +(defn- geomean [xs] + (let [xs (remove (fn [x] (or (nil? x) (zero? x))) xs) + n (count xs)] + (if (zero? n) + 0.0 + (Math/exp (/ (reduce + (map #(Math/log (double %)) xs)) n))))) + +(defmacro with-chunk-size + "Bind kernel CSI vars to target/minsz for the duration of body." + [target minsz & body] + `(binding [ropetree/*target-chunk-size* (long ~target) + ropetree/*min-chunk-size* (long ~minsz)] + ~@body)) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -;; Rope Construction at Arbitrary Chunk Size +;; Generic Rope (vector chunks) vs PersistentVector ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(defn- rope-node-create-fn - "Build a rope-node-create function for a given chunk size." - [] - (fn [chunk _ l r] - (node/->SimpleNode chunk - (+ (count chunk) (ropetree/rope-size l) (ropetree/rope-size r)) - l r - (+ 1 (tree/node-size l) (tree/node-size r))))) - -(defn build-rope-root - "Build a rope root from a collection using a specific target chunk size." - [coll ^long target] - (let [chunks (mapv vec (partition-all target coll))] - (ropetree/chunks->root chunks))) - -(defn build-vector +(defn- build-generic-rope + "Build a rope tree at the currently bound *target-chunk-size*. + Caller must bind *t-join* and CSI vars." 
[coll] - (vec coll)) + (ropetree/coll->root coll)) + +(defn- bench-generic-rope-at [^long n ^long target opts] + (let [minsz (quot target 2)] + (with-chunk-size target minsz + (binding [tree/*t-join* ropetree/rope-node-create] + (let [data (range n) + root (build-generic-rope data) + v (vec data) + mid (quot n 2) + span (min 16 (quot n 4)) + lo (max 0 (- mid span)) + hi (min n (+ mid span)) + ins (vec (range (* 2 span))) + ins-rt (ropetree/coll->root ins) + rng (java.util.Random. 42) + idxs (int-array (repeatedly 1000 #(.nextInt rng (max 1 n)))) + chunk-pieces (vec (partition-all target data)) + piece-roots (mapv #(ropetree/coll->root %) chunk-pieces)] + {:target target + :n n + :chunks (count chunk-pieces) + + ;; Construction + :construct + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/rope-node-create] + (ropetree/coll->root data))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(vec data) + (:warmup opts) (:iters opts) (:samples opts))} + + ;; nth + :nth + {:rope (bench-op-samples + #(dotimes [i 1000] + (ropetree/rope-nth root (aget idxs i))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(dotimes [i 1000] + (.nth ^clojure.lang.Indexed v (aget idxs i))) + (:warmup opts) (:iters opts) (:samples opts))} + + ;; reduce + :reduce + {:rope (bench-op-samples + #(ropetree/rope-reduce + 0 root) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(reduce + 0 v) + (:warmup opts) (:iters opts) (:samples opts))} + + ;; split at midpoint + :split + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/rope-node-create] + (ropetree/rope-split-at root mid))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(do [(subvec v 0 mid) (subvec v mid)] nil) + (:warmup opts) (:iters opts) (:samples opts))} + + ;; splice small replacement at midpoint + :splice + {:rope 
(bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/rope-node-create] + (ropetree/rope-splice-root root lo hi ins-rt))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(vec (concat (subvec v 0 lo) ins (subvec v hi))) + (:warmup opts) (:iters opts) (:samples opts))} + + ;; bulk concat of pre-built pieces + :concat + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/rope-node-create] + (reduce ropetree/rope-concat nil piece-roots))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(reduce into [] chunk-pieces) + (:warmup opts) (:iters opts) (:samples opts))}}))))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -;; Individual Operation Benchmarks +;; StringRope (String chunks) vs String ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(defn bench-nth-for-chunk-size - "Benchmark 1000 random nth lookups." - [root v n opts] - (let [rng (java.util.Random. 42) - idxs (vec (repeatedly 1000 #(.nextInt rng (int n))))] - {:rope (bench-op-samples - #(dotimes [i 1000] - (ropetree/rope-nth root (nth idxs i))) - (:warmup opts) (:iters opts) (:samples opts)) - :vector (bench-op-samples - #(dotimes [i 1000] - (nth v (nth idxs i))) - (:warmup opts) (:iters opts) (:samples opts))})) - -(defn bench-reduce-for-chunk-size - "Benchmark reduce + over all elements." - [root v opts] - {:rope (bench-op-samples - #(ropetree/rope-reduce + 0 root) - (:warmup opts) (:iters opts) (:samples opts)) - :vector (bench-op-samples - #(reduce + 0 v) - (:warmup opts) (:iters opts) (:samples opts))}) - -(defn bench-split-for-chunk-size - "Benchmark split at midpoint." 
- [root v n opts] - (let [mid (quot n 2)] - {:rope (bench-op-samples - #(ropetree/rope-split-at root mid) - (:warmup opts) (:iters opts) (:samples opts)) - :vector (bench-op-samples - #(do [(subvec v 0 mid) (subvec v mid)] nil) - (:warmup opts) (:iters opts) (:samples opts))})) - -(defn bench-splice-for-chunk-size - "Benchmark splice at midpoint." - [root v n opts] - (let [mid (quot n 2) - start (max 0 (- mid 16)) - end (min n (+ mid 16)) - ins (vec (range 32)) - ins-root (build-rope-root ins ropetree/+target-chunk-size+)] - {:rope (bench-op-samples - #(ropetree/rope-splice-root root start end ins-root) - (:warmup opts) (:iters opts) (:samples opts)) - :vector (bench-op-samples - #(vec (concat (subvec v 0 start) ins (subvec v end))) - (:warmup opts) (:iters opts) (:samples opts))})) - -(defn bench-concat-for-chunk-size - "Benchmark bulk concat of pre-built pieces." - [pieces-root vec-pieces opts] - {:rope (bench-op-samples - #(reduce ropetree/rope-concat nil pieces-root) - (:warmup opts) (:iters opts) (:samples opts)) - :vector (bench-op-samples - #(reduce into [] vec-pieces) - (:warmup opts) (:iters opts) (:samples opts))}) - -(defn bench-text-splice-for-chunk-size - "Benchmark splice of character data: rope vs String." - [root ^String s n opts] - (let [mid (quot n 2) - start (max 0 (- mid 16)) - end (min n (+ mid 16)) - ins (vec (seq "REPLACED-CONTENT!!!!!!!!!!!!!!!!")) - ins-s "REPLACED-CONTENT!!!!!!!!!!!!!!!!" - ins-root (build-rope-root ins ropetree/+target-chunk-size+)] - {:rope (bench-op-samples - #(ropetree/rope-splice-root root start end ins-root) - (:warmup opts) (:iters opts) (:samples opts)) - :string (bench-op-samples - #(str (subs s 0 start) ins-s (subs s end)) - (:warmup opts) (:iters opts) (:samples opts))})) - -(defn bench-text-split-for-chunk-size - "Benchmark split of character data: rope vs String." 
- [root ^String s n opts] - (let [mid (quot n 2)] - {:rope (bench-op-samples - #(ropetree/rope-split-at root mid) - (:warmup opts) (:iters opts) (:samples opts)) - :string (bench-op-samples - #(do [(subs s 0 mid) (subs s mid)] nil) - (:warmup opts) (:iters opts) (:samples opts))})) +(defn- random-text ^String [^long n] + (let [sb (StringBuilder. (int n)) + chars "abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789" + nchars (.length chars)] + (dotimes [_ n] + (.append sb (.charAt chars (rand-int nchars)))) + (.toString sb))) + +(defn- bench-string-rope-at [^long n ^long target opts] + (let [minsz (quot target 2)] + (with-chunk-size target minsz + (binding [tree/*t-join* ropetree/string-rope-node-create] + (let [^String text (random-text n) + root (ropetree/str->root text) + mid (quot n 2) + span (min 16 (quot n 4)) + lo (max 0 (- mid span)) + hi (min n (+ mid span)) + ^String ins (.toString (StringBuilder. (apply str (repeat (* 2 span) "X")))) + ins-rt (ropetree/str->root ins) + rng (java.util.Random. 42) + idxs (int-array (repeatedly 1000 #(.nextInt rng (max 1 n))))] + {:target target + :n n + :chunks (count (ropetree/root->chunks root)) + + :construct + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/string-rope-node-create] + (ropetree/str->root text))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(String. 
text) + (:warmup opts) (:iters opts) (:samples opts))} + + :nth + {:rope (bench-op-samples + #(dotimes [i 1000] + (ropetree/rope-nth root (aget idxs i))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(dotimes [i 1000] + (.charAt text (aget idxs i))) + (:warmup opts) (:iters opts) (:samples opts))} + + :reduce + {:rope (bench-op-samples + #(ropetree/rope-reduce + (fn [^long acc c] (+ acc (long (int c)))) + (long 0) root) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(let [len (.length text)] + (loop [i (int 0), acc (long 0)] + (if (< i len) + (recur (unchecked-inc-int i) + (+ acc (long (int (.charAt text i))))) + acc))) + (:warmup opts) (:iters opts) (:samples opts))} + + :split + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/string-rope-node-create] + (ropetree/rope-split-at root mid))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(do [(.substring text 0 mid) (.substring text mid)] nil) + (:warmup opts) (:iters opts) (:samples opts))} + + :splice + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/string-rope-node-create] + (ropetree/rope-splice-root root lo hi ins-rt))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(let [sb (StringBuilder. (+ (.length text) (.length ins) (- lo) hi))] + (.append sb text 0 lo) + (.append sb ins) + (.append sb text hi (.length text)) + (.toString sb)) + (:warmup opts) (:iters opts) (:samples opts))} + + :concat + (let [chunks (ropetree/root->chunks root) + chunk-roots (mapv #(ropetree/str->root %) chunks) + chunk-strs (vec chunks)] + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/string-rope-node-create] + (reduce ropetree/rope-concat nil chunk-roots))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(let [sb (StringBuilder. 
(int n))] + (doseq [^String s chunk-strs] + (.append sb s)) + (.toString sb)) + (:warmup opts) (:iters opts) (:samples opts))})}))))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -;; Chunk Size Sweep +;; ByteRope (byte[] chunks) vs byte[] ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(def default-chunk-candidates [64 128 256 512 1024]) -(def default-sizes [10000 100000 500000]) +(defn- random-bytes ^bytes [^long n] + (let [rng (java.util.Random. 42) + b (byte-array (int n))] + (.nextBytes rng b) + b)) + +(defn- ba-splice + ^bytes [^bytes s ^long start ^long end ^bytes rep] + (let [si (int start) + ei (int end) + sl (alength s) + rl (int (if rep (alength rep) 0)) + result (byte-array (+ (- sl (- ei si)) rl))] + (System/arraycopy s 0 result 0 si) + (when (pos? rl) + (System/arraycopy rep 0 result si rl)) + (System/arraycopy s ei result (+ si rl) (- sl ei)) + result)) + +(defn- bench-byte-rope-at [^long n ^long target opts] + (let [minsz (quot target 2)] + (with-chunk-size target minsz + (binding [tree/*t-join* ropetree/byte-rope-node-create] + (let [^bytes data (random-bytes n) + root (ropetree/bytes->root data) + mid (quot n 2) + span (min 16 (quot n 4)) + lo (max 0 (- mid span)) + hi (min n (+ mid span)) + ^bytes ins (random-bytes (* 2 span)) + ins-rt (ropetree/bytes->root ins) + rng (java.util.Random. 
42) + idxs (int-array (repeatedly 1000 #(.nextInt rng (max 1 n))))] + {:target target + :n n + :chunks (count (ropetree/root->chunks root)) + + :construct + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/byte-rope-node-create] + (ropetree/bytes->root data))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(java.util.Arrays/copyOf data n) + (:warmup opts) (:iters opts) (:samples opts))} + + :nth + {:rope (bench-op-samples + #(dotimes [i 1000] + (ropetree/rope-nth root (aget idxs i))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(dotimes [i 1000] + (bit-and (long (aget data (aget idxs i))) 0xff)) + (:warmup opts) (:iters opts) (:samples opts))} + + :reduce + {:rope (bench-op-samples + #(ropetree/rope-reduce + (fn [^long acc x] (+ acc (long x))) + (long 0) root) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(let [len (alength data)] + (loop [i (int 0), acc (long 0)] + (if (< i len) + (recur (unchecked-inc-int i) + (+ acc (bit-and (long (aget data i)) 0xff))) + acc))) + (:warmup opts) (:iters opts) (:samples opts))} + + :split + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/byte-rope-node-create] + (ropetree/rope-split-at root mid))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(do [(java.util.Arrays/copyOfRange data 0 mid) + (java.util.Arrays/copyOfRange data mid n)] + nil) + (:warmup opts) (:iters opts) (:samples opts))} + + :splice + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/byte-rope-node-create] + (ropetree/rope-splice-root root lo hi ins-rt))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(ba-splice data lo hi ins) + (:warmup opts) (:iters opts) (:samples opts))} + + :concat + (let [chunks (ropetree/root->chunks root) + chunk-roots (mapv 
#(ropetree/bytes->root %) chunks) + chunk-bas (vec chunks)] + {:rope (bench-op-samples + #(with-chunk-size target minsz + (binding [tree/*t-join* ropetree/byte-rope-node-create] + (reduce ropetree/rope-concat nil chunk-roots))) + (:warmup opts) (:iters opts) (:samples opts)) + :baseline (bench-op-samples + #(let [result (byte-array (int n))] + (loop [i 0, off 0] + (if (< i (count chunk-bas)) + (let [^bytes c (nth chunk-bas i) + cl (alength c)] + (System/arraycopy c 0 result off cl) + (recur (unchecked-inc i) (+ off cl))) + result))) + (:warmup opts) (:iters opts) (:samples opts))})}))))) -(defn- random-char-string ^String [^long n] - (let [alphabet "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" - sb (StringBuilder. n)] - (dotimes [_ n] - (.append sb (.charAt alphabet (rand-int (.length alphabet))))) - (.toString sb))) -(defn bench-chunk-size-at-n - "Run all operations for a given chunk size and collection size." - [^long n ^long target opts] - (let [data (range n) - root (build-rope-root data target) - v (build-vector data) - ;; Build pieces for concat bench - pieces (->> data (partition-all target) (mapv vec)) - piece-roots (mapv #(build-rope-root % target) pieces) - ;; Text data for string comparison - text (random-char-string n) - text-root (build-rope-root (vec (seq text)) target)] - (print ".") - (flush) - {:chunk-size target - :n n - :chunks (count pieces) - :nth (bench-nth-for-chunk-size root v n opts) - :reduce (bench-reduce-for-chunk-size root v opts) - :split (bench-split-for-chunk-size root v n opts) - :splice (bench-splice-for-chunk-size root v n opts) - :concat (bench-concat-for-chunk-size piece-roots pieces opts) - :text-splice (bench-text-splice-for-chunk-size text-root text n opts) - :text-split (bench-text-split-for-chunk-size text-root text n opts)})) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Sweep Runner +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 
+ +(def default-targets [64 128 256 512 1024]) +(def default-sizes [1000 5000 10000 100000 500000]) + +(defn- run-variant [variant n target opts] + (print (format " %-12s N=%7d target=%4d " (name variant) n target)) + (flush) + (let [result (case variant + :rope (bench-generic-rope-at n target opts) + :string-rope (bench-string-rope-at n target opts) + :byte-rope (bench-byte-rope-at n target opts))] + (println) + result)) + +(defn run-sweep + "Sweep all chunk-size candidates for a given variant across all sizes." + [variant sizes targets opts] + (vec (for [n sizes, target targets] + (assoc (run-variant variant n target opts) + :variant variant)))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Reporting ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(defn- ratio [rope-stats vec-stats] - (/ (:median vec-stats) (:median rope-stats))) - -(defn- fmt-ratio [r] - (if (>= r 1.0) - (format "%5.1fx" r) - (format "%5.2fx" r))) +(defn- ratio + "Baseline / rope median. >1 means rope is faster, <1 means baseline is faster." + [op-result] + (when (and op-result (get-in op-result [:rope :median]) (get-in op-result [:baseline :median])) + (/ (get-in op-result [:baseline :median]) + (get-in op-result [:rope :median])))) + +(defn- fmt-ratio [r] + (cond + (nil? r) " — " + (>= r 100) (format "%5.0fx" r) + (>= r 10) (format "%5.1fx" r) + (>= r 1) (format "%5.2fx" r) + :else (format "%5.3fx" r))) (defn- fmt-us [stats] - (bu/format-ns (* 1000 (:median stats)))) - -(defn print-chunk-size-results - [results] + (when stats (bu/format-ns (* 1000 (:median stats))))) + +(defn- score + "Geometric mean of the structural-editing speedup ratios (splice, split, + concat). These are the operations that define the rope's value proposition. + Higher is better. Used to rank chunk-size candidates."
+ [result] + (geomean + (keep #(ratio (get result %)) [:splice :split :concat]))) + +(defn- score-all + "Geometric mean over ALL measured operations. Useful as a sanity check + but weights ops like nth and construct that are not why users choose + a rope. See docstring for the two-score rationale." + [result] + (geomean + (keep #(ratio (get result %)) [:construct :nth :reduce :split :splice :concat]))) + +(defn- baseline-name [variant] + (case variant + :rope "vector" + :string-rope "String" + :byte-rope "byte[]")) + +(defn- print-variant-header [variant] (println) (println "═══════════════════════════════════════════════════════════════════════════════════════════") - (println " ROPE CHUNK SIZE TUNING — vs Vector") - (println " Each cell shows: rope median / ratio vs vector (>1x = rope wins)") - (println "═══════════════════════════════════════════════════════════════════════════════════════════") - (doseq [[n size-results] (sort-by key (group-by :n results))] - (println) - (printf " N = %,d%n" n) - (println " ┌────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┐") - (println " │ chunk size │ nth 1000 │ reduce │ split │ splice │ concat │") - (println " ├────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┤") - (doseq [{:keys [chunk-size nth reduce split splice concat]} (sort-by :chunk-size size-results)] - (printf " │ %4d │ %8s %5s │ %8s %5s │ %8s %5s │ %8s %5s │ %8s %5s │%n" - chunk-size - (fmt-us (:rope nth)) (fmt-ratio (ratio (:rope nth) (:vector nth))) - (fmt-us (:rope reduce)) (fmt-ratio (ratio (:rope reduce) (:vector reduce))) - (fmt-us (:rope split)) (fmt-ratio (ratio (:rope split) (:vector split))) - (fmt-us (:rope splice)) (fmt-ratio (ratio (:rope splice) (:vector splice))) - (fmt-us (:rope concat)) (fmt-ratio (ratio (:rope concat) (:vector concat))))) - (println " └────────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┘") 
- (println (str " vector baseline: nth=" (fmt-us (:vector (:nth (first size-results)))) - " reduce=" (fmt-us (:vector (:reduce (first size-results)))) - " split=" (fmt-us (:vector (:split (first size-results)))) - " splice=" (fmt-us (:vector (:splice (first size-results)))) - " concat=" (fmt-us (:vector (:concat (first size-results))))))) + (printf " %-12s chunk-size tuning — vs %s (>1x = rope wins)%n" + (name variant) (baseline-name variant)) + (println "═══════════════════════════════════════════════════════════════════════════════════════════")) - (println) - (println "═══════════════════════════════════════════════════════════════════════════════════════════") - (println " ROPE CHUNK SIZE TUNING — vs String (text workload)") - (println " Each cell shows: rope median / ratio vs string (>1x = rope wins)") - (println "═══════════════════════════════════════════════════════════════════════════════════════════") - (doseq [[n size-results] (sort-by key (group-by :n results))] +(defn- print-size-section [size-results variant] + (let [{:keys [n]} (first size-results)] (println) - (printf " N = %,d chars%n" n) - (println " ┌────────────┬────────────────┬────────────────┐") - (println " │ chunk size │ text splice │ text split │") - (println " ├────────────┼────────────────┼────────────────┤") - (doseq [{:keys [chunk-size text-splice text-split]} (sort-by :chunk-size size-results)] - (printf " │ %4d │ %8s %5s │ %8s %5s │%n" - chunk-size - (fmt-us (:rope text-splice)) (fmt-ratio (ratio (:rope text-splice) (:string text-splice))) - (fmt-us (:rope text-split)) (fmt-ratio (ratio (:rope text-split) (:string text-split))))) - (println " └────────────┴────────────────┴────────────────┘") - (println (str " string baseline: splice=" (fmt-us (:string (:text-splice (first size-results)))) - " split=" (fmt-us (:string (:text-split (first size-results))))))) - + (printf " N = %,d (baseline: %s)%n" n (baseline-name variant)) + (println " 
┌────────┬────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬────────┬────────┐") + (println " │ target │ chunks │ construct│ nth │ reduce │ split │ splice │ concat │ score │ all │") + (println " ├────────┼────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼────────┼────────┤") + (doseq [r (sort-by :target size-results)] + (printf " │ %4d │ %4d │ %7s │ %7s │ %7s │ %7s │ %7s │ %7s │ %6.2f │ %6.2f │%n" + (:target r) + (:chunks r) + (fmt-ratio (ratio (:construct r))) + (fmt-ratio (ratio (:nth r))) + (fmt-ratio (ratio (:reduce r))) + (fmt-ratio (ratio (:split r))) + (fmt-ratio (ratio (:splice r))) + (fmt-ratio (ratio (:concat r))) + (double (score r)) + (double (score-all r)))) + (println " └────────┴────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴────────┴────────┘") + (println (str " baselines: " + "construct=" (fmt-us (:baseline (:construct (first size-results)))) + " nth=" (fmt-us (:baseline (:nth (first size-results)))) + " reduce=" (fmt-us (:baseline (:reduce (first size-results)))) + " split=" (fmt-us (:baseline (:split (first size-results)))) + " splice=" (fmt-us (:baseline (:splice (first size-results)))) + " concat=" (fmt-us (:baseline (:concat (first size-results)))))))) + +(defn- print-best + "Print the best (highest-score) chunk size for each N across the sweep." 
+ [results] (println) - (println "═══════════════════════════════════════════════════════════════════════════════════════════") - (printf " Current chunk size: target=%d min=%d%n" - ropetree/+target-chunk-size+ ropetree/+min-chunk-size+) - (println "═══════════════════════════════════════════════════════════════════════════════════════════")) + (println " ── Best chunk size per N ──") + (println " 'score' = geomean of structural ops (splice, split, concat)") + (println " 'all' = geomean of all ops (construct, nth, reduce, split, splice, concat)") + (println) + (doseq [[n size-results] (sort-by key (group-by :n results))] + (let [by-score (first (sort-by score > size-results)) + by-all (first (sort-by score-all > size-results))] + (printf " N=%,7d → by score: target=%4d (%.2f) by all: target=%4d (%.2f)%n" + n + (:target by-score) (double (score by-score)) + (:target by-all) (double (score-all by-all)))))) + +(defn print-sweep-results + [results] + (let [by-variant (group-by :variant results)] + (doseq [[variant variant-results] (sort-by key by-variant)] + (print-variant-header variant) + (doseq [[_n size-results] (sort-by key (group-by :n variant-results))] + (print-size-section size-results variant)) + (print-best variant-results)))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -273,35 +537,52 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defn run-benchmark - [& {:keys [sizes chunk-candidates warmup iters samples] + [& {:keys [sizes targets variants warmup iters samples] :or {sizes default-sizes - chunk-candidates default-chunk-candidates + targets default-targets + variants [:rope :string-rope :byte-rope] warmup 10 iters 30 samples 7}}] (let [opts {:warmup warmup :iters iters :samples samples}] (println "Rope chunk-size tuning benchmark") - (println "Chunk candidates:" chunk-candidates) - (println "Collection sizes:" sizes) - (println (format "Per-measurement: %d warmup, %d iters, %d 
samples" warmup iters samples)) - (println) - (let [results (vec (for [n sizes, target chunk-candidates] - (do - (print (format " chunk=%d N=%,d " target n)) - (flush) - (let [r (bench-chunk-size-at-n n target opts)] - (println) - r))))] - (print-chunk-size-results results) - results))) + (println "Variants: " variants) + (println "Targets: " targets) + (println "Sizes: " sizes) + (println (format "Per-measurement: %d warmup, %d iters, %d samples%n" warmup iters samples)) + (let [results (vec (for [variant variants + n sizes + target targets] + (run-variant variant n target opts))) + tagged (mapv #(assoc %1 :variant %2) + results + (for [variant variants + _n sizes + _target targets] + variant))] + (print-sweep-results tagged) + (println) + (println " Current defaults in implementation:") + (println (format " kernel/rope.clj: +target-chunk-size+ = %d +min-chunk-size+ = %d" + ropetree/+target-chunk-size+ ropetree/+min-chunk-size+)) + (println " Per-variant defaults live in each types/*_rope.clj file.") + tagged))) (defn quick-bench [] (run-benchmark :sizes [10000 100000] - :chunk-candidates [128 256 512] + :targets [128 256 512] :warmup 5 :iters 15 :samples 5)) (defn -main [& args] - (if (has-flag? args "--quick" "-q") - (quick-bench) - (run-benchmark)) + (let [quick? (has-flag? args "--quick" "-q") + variant-arg (some->> (partition-all 2 1 args) + (filter (fn [[a _]] (#{"--variant"} a))) + first second keyword) + variants (if variant-arg [variant-arg] [:rope :string-rope :byte-rope])] + (if quick? 
+ (run-benchmark :sizes [10000 100000] + :targets [128 256 512] + :variants variants + :warmup 5 :iters 15 :samples 5) + (run-benchmark :variants variants))) (shutdown-agents)) diff --git a/test/ordered_collections/simple_bench.clj b/test/ordered_collections/simple_bench.clj index 94d68e1..1b0ba29 100644 --- a/test/ordered_collections/simple_bench.clj +++ b/test/ordered_collections/simple_bench.clj @@ -14,7 +14,8 @@ lein bench-simple --only maps,sets ; Run maps and sets Categories for --only: - maps, sets, set-ops, intervals, specialty, strings, parallel, memory" + maps, sets, set-ops, intervals, specialty, strings, parallel, memory, + rope, string-rope, byte-rope" (:require [clojure.core.reducers :as r] [clojure.data.avl :as avl] [clojure.string :as str] @@ -657,13 +658,545 @@ (bench-string-map-lookup sizes) (bench-string-map-iteration sizes)) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Rope (Vector) Benchmarks +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:private rope-sizes [1000 5000 10000 100000]) + +(defn bench-rope-concat + "Benchmark concatenating rope pieces vs into vector." + [sizes] + (print-header "ROPE CONCAT: Concat 4 pieces of N/4 elements" + ["vector" "rope"]) + (doseq [n sizes] + (let [quarter (quot n 4) + v1 (vec (range 0 quarter)) + v2 (vec (range quarter (* 2 quarter))) + v3 (vec (range (* 2 quarter) (* 3 quarter))) + v4 (vec (range (* 3 quarter) n)) + r1 (core/rope v1) r2 (core/rope v2) + r3 (core/rope v3) r4 (core/rope v4)] + (print-row n + [(bench 5 10 (vec (concat v1 v2 v3 v4))) + (bench 5 10 (core/rope-concat r1 r2 r3 r4))])))) + +(defn bench-rope-splice + "Benchmark replacing 32 elements at midpoint." 
+ [sizes] + (print-header "ROPE SPLICE: Replace 32 elements at midpoint" + ["vector" "rope"]) + (doseq [n sizes] + (let [v (vec (range n)) + r (core/rope (range n)) + mid (quot n 2) + lo (max 0 (- mid 16)) + hi (min n (+ mid 16)) + ins (vec (range 32))] + (print-row n + [(bench 5 10 (vec (concat (subvec v 0 lo) ins (subvec v hi)))) + (bench 5 10 (core/rope-splice r lo hi ins))])))) + +(defn bench-rope-repeated-edits + "Benchmark 200 random splice edits." + [sizes] + (print-header "ROPE REPEATED EDITS: 200 random splice edits" + ["vector" "rope"]) + (doseq [n sizes] + (let [v (vec (range n)) + r (core/rope (range n)) + rng (java.util.Random. 42) + nops 200 + idxs (vec (repeatedly nops #(.nextInt rng (max 1 n)))) + ins (vec (range nops))] + (print-row n + [(bench 2 5 + (loop [v v, i 0] + (if (< i nops) + (let [pos (rem (nth idxs i) (count v))] + (recur (vec (concat (subvec v 0 pos) [(nth ins i)] + (subvec v (min (+ pos 5) (count v))))) + (inc i))) + v))) + (bench 2 5 + (loop [r r, i 0] + (if (< i nops) + (let [pos (rem (nth idxs i) (count r))] + (recur (core/rope-splice r pos (min (+ pos 5) (count r)) [(nth ins i)]) + (inc i))) + r)))])))) + +(defn bench-rope-reduce + "Benchmark reduce over all elements." + [sizes] + (print-header "ROPE REDUCE: reduce + over all N elements" + ["vector" "rope"]) + (doseq [n sizes] + (let [v (vec (range n)) + r (core/rope (range n))] + (print-row n + [(bench 20 10 (reduce + 0 v)) + (bench 20 10 (reduce + 0 r))])))) + +(defn bench-rope-nth + "Benchmark 1000 random nth lookups." + [sizes] + (print-header "ROPE NTH: 1,000 random nth lookups" + ["vector" "rope"]) + (doseq [n sizes] + (let [v (vec (range n)) + r (core/rope (range n)) + idxs (int-array (repeatedly 1000 #(rand-int n)))] + (print-row n + [(bench 20 10 (areduce idxs i acc nil (nth v (aget idxs i)))) + (bench 20 10 (areduce idxs i acc nil (nth r (aget idxs i))))])))) + +(defn bench-rope-fold + "Benchmark r/fold parallel sum." 
+ [sizes] + (print-header "ROPE FOLD: r/fold parallel sum" + ["vector" "rope"]) + (doseq [n sizes] + (let [v (vec (range n)) + r (core/rope (range n))] + (print-row n + [(bench 20 10 (r/fold + v)) + (bench 20 10 (r/fold + r))])))) + +(defn run-rope-benchmarks + "Run all generic rope vs vector benchmarks." + [sizes] + (let [sizes (or (seq (filter #(<= % 100000) sizes)) rope-sizes)] + (bench-rope-concat sizes) + (bench-rope-splice sizes) + (bench-rope-repeated-edits sizes) + (bench-rope-reduce sizes) + (bench-rope-nth sizes) + (bench-rope-fold sizes))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Byte Rope Benchmarks +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:private byte-rope-sizes [1000 5000 10000 100000]) + +(defn- bench-ba-random + ^bytes [^long n] + (let [rng (java.util.Random. 42) + b (byte-array n)] + (.nextBytes rng b) + b)) + +(defn- bench-ba-splice + ^bytes [^bytes s ^long start ^long end ^bytes rep] + (let [si (int start) + ei (int end) + sl (alength s) + rl (int (if rep (alength rep) 0)) + result (byte-array (+ (- sl (- ei si)) rl))] + (System/arraycopy s 0 result 0 si) + (when (pos? rl) + (System/arraycopy rep 0 result si rl)) + (System/arraycopy s ei result (+ si rl) (- sl ei)) + result)) + +(defn bench-byte-rope-concat + "Benchmark concatenating byte rope pieces vs byte[] arraycopy." 
+ [sizes] + (print-header "BYTE ROPE CONCAT: Concat 4 pieces of N/4 bytes" + ["byte[]" "byte-rope"]) + (doseq [n sizes] + (let [quarter (quot n 4) + ^bytes b1 (bench-ba-random quarter) + ^bytes b2 (bench-ba-random quarter) + ^bytes b3 (bench-ba-random quarter) + ^bytes b4 (bench-ba-random (- n (* 3 quarter))) + r1 (core/byte-rope b1) r2 (core/byte-rope b2) + r3 (core/byte-rope b3) r4 (core/byte-rope b4)] + (print-row n + [(bench 5 10 (let [a (byte-array (+ (alength b1) (alength b2) + (alength b3) (alength b4)))] + (System/arraycopy b1 0 a 0 (alength b1)) + (System/arraycopy b2 0 a (alength b1) (alength b2)) + (System/arraycopy b3 0 a (+ (alength b1) (alength b2)) + (alength b3)) + (System/arraycopy b4 0 a (+ (alength b1) (alength b2) + (alength b3)) + (alength b4)) + a)) + (bench 5 10 (core/byte-rope-concat r1 r2 r3 r4))])))) + +(defn bench-byte-rope-splice + "Benchmark replacing 32 bytes at midpoint." + [sizes] + (print-header "BYTE ROPE SPLICE: Replace 32 bytes at midpoint" + ["byte[]" "byte-rope"]) + (doseq [n sizes] + (let [^bytes data (bench-ba-random n) + r (core/byte-rope data) + mid (quot n 2) + lo (max 0 (- mid 16)) + hi (min n (+ mid 16)) + ^bytes rep (bench-ba-random 32)] + (print-row n + [(bench 5 10 (bench-ba-splice data lo hi rep)) + (bench 5 10 (core/rope-splice r lo hi rep))])))) + +(defn bench-byte-rope-repeated-edits + "Benchmark 200 random splice edits." + [sizes] + (print-header "BYTE ROPE REPEATED EDITS: 200 random splice edits" + ["byte[]" "byte-rope"]) + (doseq [n sizes] + (let [^bytes data (bench-ba-random n) + r (core/byte-rope data) + rng (java.util.Random. 
42) + nops 200 + idxs (vec (repeatedly nops #(.nextInt rng (max 1 n)))) + ^bytes ins (bench-ba-random 5)] + (print-row n + [(bench 2 5 + (loop [^bytes s data, i 0] + (if (< i nops) + (let [pos (rem (long (nth idxs i)) (long (alength s))) + end (min (alength s) (+ pos 5))] + (recur (bench-ba-splice s pos end ins) (inc i))) + s))) + (bench 2 5 + (loop [r r, i 0] + (if (< i nops) + (let [pos (rem (long (nth idxs i)) (long (count r))) + end (min (count r) (+ pos 5))] + (recur (core/rope-splice r pos end ins) (inc i))) + r)))])))) + +(defn bench-byte-rope-reduce + "Benchmark reduce over all bytes." + [sizes] + (print-header "BYTE ROPE REDUCE: sum all N bytes" + ["byte[]" "byte-rope"]) + (doseq [n sizes] + (let [^bytes data (bench-ba-random n) + r (core/byte-rope data)] + (print-row n + [(bench 20 10 (let [len (alength data)] + (loop [i (int 0), acc (long 0)] + (if (< i len) + (recur (unchecked-inc-int i) + (+ acc (bit-and (long (aget data i)) 0xff))) + acc)))) + (bench 20 10 (reduce + 0 r))])))) + +(defn bench-byte-rope-digest + "Benchmark SHA-256 digest." + [sizes] + (print-header "BYTE ROPE DIGEST: SHA-256 over N bytes" + ["byte[]" "byte-rope"]) + (doseq [n sizes] + (let [^bytes data (bench-ba-random n) + r (core/byte-rope data)] + (print-row n + [(bench 5 10 (let [md (java.security.MessageDigest/getInstance "SHA-256")] + (.digest md data))) + (bench 5 10 (core/byte-rope-digest r "SHA-256"))])))) + +(defn run-byte-rope-benchmarks + "Run all byte rope vs byte[] benchmarks." 
+ [sizes] + (let [sizes (or (seq (filter #(<= % 100000) sizes)) byte-rope-sizes)] + (bench-byte-rope-concat sizes) + (bench-byte-rope-splice sizes) + (bench-byte-rope-repeated-edits sizes) + (bench-byte-rope-reduce sizes) + (bench-byte-rope-digest sizes))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; String Rope vs String Structural Editing +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:private string-rope-sizes [1000 5000 10000]) + +(defn- random-text + "Generate a random ASCII text of length n." + ^String [^long n] + (let [sb (StringBuilder. n) + chars "abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ\n0123456789" + nchars (count chars)] + (dotimes [_ n] + (.append sb (.charAt chars (rand-int nchars)))) + (.toString sb))) + +(defn- string-splice + "String equivalent of rope-splice via StringBuilder — fair baseline." + ^String [^String s ^long start ^long end ^String ins] + (let [si (int start) + ei (int end) + sb (StringBuilder. (- (+ (.length s) (.length ins)) (- ei si)))] + (.append sb s 0 si) + (.append sb ins) + (.append sb s ei (.length s)) + (.toString sb))) + +(defn- string-insert + "String equivalent of rope-insert via StringBuilder — fair baseline." + ^String [^String s ^long i ^String ins] + (string-splice s i i ins)) + +(defn- string-remove + "String equivalent of rope-remove via StringBuilder — fair baseline." + ^String [^String s ^long start ^long end] + (let [si (int start) + ei (int end) + sb (StringBuilder. (- (.length s) (- ei si)))] + (.append sb s 0 si) + (.append sb s ei (.length s)) + (.toString sb))) + +(defn bench-string-rope-construction + "Benchmark building a string-rope from a String of length N."
+ [sizes] + (print-header "STRING ROPE CONSTRUCTION: Build from String of length N" + ["String (id)" "string-rope"]) + (doseq [n sizes] + (let [text (random-text n)] + (print-row n + [(bench 20 10 (identity text)) + (bench 20 10 (core/string-rope text))])))) + +(defn bench-string-rope-concat + "Benchmark concatenating two equal halves." + [sizes] + (print-header "STRING ROPE CONCAT: Join two halves of length N/2" + ["String SB" "string-rope-concat"]) + (doseq [n sizes] + (let [^String text (random-text n) + half (int (quot n 2)) + ^String s1 (.substring text 0 half) + ^String s2 (.substring text half) + sr1 (core/string-rope s1) + sr2 (core/string-rope s2)] + (print-row n + [(bench 20 10 + (let [sb (StringBuilder. (.length text))] + (.append sb s1) + (.append sb s2) + (.toString sb))) + (bench 20 10 (core/string-rope-concat sr1 sr2))])))) + +(defn bench-string-rope-split + "Benchmark splitting at midpoint." + [sizes] + (print-header "STRING ROPE SPLIT: Split at midpoint" + ["String subs" "rope-split"]) + (doseq [n sizes] + (let [text (random-text n) + sr (core/string-rope text) + mid (quot n 2)] + (print-row n + [(bench 20 10 [(subs text 0 mid) (subs text mid)]) + (bench 20 10 (core/rope-split sr mid))])))) + +(defn bench-string-rope-insert + "Benchmark inserting a short string at the midpoint, 100 times." + [sizes] + (print-header "STRING ROPE INSERT: 100 inserts of 10-char string at midpoint" + ["String SB" "rope-insert"]) + (doseq [n sizes] + (let [text (random-text n) + sr (core/string-rope text) + ins "XXXXXXXXXX"] + (print-row n + [(bench 5 10 + (loop [^String s text, i 0] + (if (< i 100) + (recur (string-insert s (quot (count s) 2) ins) + (unchecked-inc i)) + s))) + (bench 5 10 + (loop [r sr, i 0] + (if (< i 100) + (recur (core/rope-insert r (quot (count r) 2) ins) + (unchecked-inc i)) + r)))])))) + +(defn bench-string-rope-remove + "Benchmark removing 10 chars from the midpoint, 100 times." 
+ [sizes] + (print-header "STRING ROPE REMOVE: 100 removals of 10 chars at midpoint" + ["String SB" "rope-remove"]) + (doseq [n sizes] + (let [text (random-text (+ n 1000)) ;; extra room for 100 removals + sr (core/string-rope text)] + (print-row n + [(bench 5 10 + (loop [^String s text, i 0] + (if (and (< i 100) (>= (count s) 10)) + (let [mid (quot (count s) 2) + lo (max 0 (- mid 5))] + (recur (string-remove s lo (+ lo 10)) + (unchecked-inc i))) + s))) + (bench 5 10 + (loop [r sr, i 0] + (if (and (< i 100) (>= (count r) 10)) + (let [mid (quot (count r) 2) + lo (max 0 (- mid 5))] + (recur (core/rope-remove r lo (+ lo 10)) + (unchecked-inc i))) + r)))])))) + +(defn bench-string-rope-splice + "Benchmark replacing 10 chars with 10 new chars at midpoint, 100 times." + [sizes] + (print-header "STRING ROPE SPLICE: 100 replace-10 ops at midpoint" + ["String SB" "rope-splice"]) + (doseq [n sizes] + (let [text (random-text n) + sr (core/string-rope text) + rep "YYYYYYYYYY"] + (print-row n + [(bench 5 10 + (loop [^String s text, i 0] + (if (< i 100) + (let [mid (quot (count s) 2) + lo (max 0 (- mid 5)) + hi (min (count s) (+ lo 10))] + (recur (string-splice s lo hi rep) (unchecked-inc i))) + s))) + (bench 5 10 + (loop [r sr, i 0] + (if (< i 100) + (let [mid (quot (count r) 2) + lo (max 0 (- mid 5)) + hi (min (count r) (+ lo 10))] + (recur (core/rope-splice r lo hi rep) (unchecked-inc i))) + r)))])))) + +(defn bench-string-rope-random-access + "Benchmark 10,000 random charAt lookups." + [sizes] + (print-header "STRING ROPE RANDOM ACCESS: 10,000 random charAt" + ["String .charAt" "string-rope nth"]) + (doseq [n sizes] + (let [^String text (random-text n) + sr (core/string-rope text) + idxs (int-array (repeatedly 10000 #(rand-int n)))] + (print-row n + [(bench 20 10 (dotimes [i 10000] (.charAt text (aget idxs i)))) + (bench 20 10 (dotimes [i 10000] (nth sr (aget idxs i))))])))) + +(defn bench-string-rope-iteration + "Benchmark reduce over all characters." 
+ [sizes] + (print-header "STRING ROPE ITERATION: reduce over all N chars" + ["String reduce" "string-rope reduce"]) + (doseq [n sizes] + (let [text (random-text n) + sr (core/string-rope text)] + (print-row n + [(bench 20 10 (reduce (fn [^long acc c] (+ acc (long (char c)))) 0 text)) + (bench 20 10 (reduce (fn [^long acc c] (+ acc (long (char c)))) 0 sr))])))) + +(defn bench-string-rope-materialization + "Benchmark materializing back to String (toString)." + [sizes] + (print-header "STRING ROPE MATERIALIZE: toString" + ["String (id)" "string-rope str"]) + (doseq [n sizes] + (let [text (random-text n) + sr (core/string-rope text)] + (print-row n + [(bench 20 10 (identity text)) + (bench 20 10 (str sr))])))) + +(defn bench-string-rope-editor-simulation + "Simulate a text editor session: interleaved inserts, deletes, and replacements. + 50 edits of mixed operations at random positions." + [sizes] + (print-header "STRING ROPE EDITOR SIM: 50 mixed edits at random positions" + ["String" "string-rope"]) + (doseq [n sizes] + (let [text (random-text n) + sr (core/string-rope text) + ;; Pre-generate a sequence of edit operations + edit-ops (vec (repeatedly 50 + (fn [] + (let [op (rand-int 3)] + {:op op :ins (random-text (+ 1 (rand-int 20)))}))))] + (print-row n + [(bench 5 10 + (reduce + (fn [^String s {:keys [op ^String ins]}] + (let [len (count s)] + (if (< len 5) s + (let [pos (rand-int (max 1 (- len 3)))] + (case (int op) + 0 (string-insert s pos ins) + 1 (string-remove s pos (min len (+ pos (min 20 (rand-int len))))) + 2 (string-splice s pos (min len (+ pos 10)) ins)))))) + text edit-ops)) + (bench 5 10 + (reduce + (fn [r {:keys [op ins]}] + (let [len (count r)] + (if (< len 5) r + (let [pos (rand-int (max 1 (- len 3)))] + (case (int op) + 0 (core/rope-insert r pos ins) + 1 (core/rope-remove r pos (min len (+ pos (min 20 (rand-int len))))) + 2 (core/rope-splice r pos (min len (+ pos 10)) ins)))))) + sr edit-ops))])))) + +(defn bench-string-rope-repeated-concat + 
"Benchmark building a large string by repeatedly concatenating small pieces." + [sizes] + (print-header "STRING ROPE REPEATED CONCAT: Append 100 x 10-char chunks" + ["String SB" "string-rope-concat" "string-rope transient"]) + (doseq [n sizes] + (let [^String base-text (random-text n) + sr (core/string-rope base-text) + chunks (vec (repeatedly 100 #(random-text 10)))] + (print-row n + [(bench 5 10 + (reduce (fn [^String acc ^String c] + (let [sb (StringBuilder. (+ (.length acc) (.length c)))] + (.append sb acc) + (.append sb c) + (.toString sb))) + base-text chunks)) + (bench 5 10 + (reduce (fn [acc c] (core/string-rope-concat acc c)) sr chunks)) + (bench 5 10 + (let [t (transient sr)] + (persistent! + (reduce (fn [t c] + (reduce conj! t c)) + t chunks))))])))) + +(defn run-string-rope-benchmarks + "Run all string-rope vs String structural editing benchmarks." + [sizes] + (let [sizes (or (seq (filter #(<= % 8000) sizes)) string-rope-sizes)] + (bench-string-rope-construction sizes) + (bench-string-rope-concat sizes) + (bench-string-rope-split sizes) + (bench-string-rope-insert sizes) + (bench-string-rope-remove sizes) + (bench-string-rope-splice sizes) + (bench-string-rope-random-access sizes) + (bench-string-rope-iteration sizes) + (bench-string-rope-materialization sizes) + (bench-string-rope-repeated-concat sizes) + (bench-string-rope-editor-simulation sizes))) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Size Presets ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(def sizes-quick [100 1000 10000]) -(def sizes-default [100 1000 10000 100000]) -(def sizes-full [100 1000 10000 100000 1000000]) +(def sizes-quick [100 1000 5000 10000]) +(def sizes-default [100 1000 5000 10000 100000]) +(def sizes-full [100 1000 5000 10000 100000 1000000]) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Benchmark Categories @@ -687,9 +1220,15 @@ :parallel {:title "PARALLEL 
FOLD (r/fold)" :fn run-parallel-benchmarks} :memory {:title "MEMORY FOOTPRINT" - :fn estimate-memory-footprint}}) + :fn estimate-memory-footprint} + :rope {:title "ROPE vs VECTOR (Structural Editing)" + :fn run-rope-benchmarks} + :string-rope {:title "STRING ROPE vs STRING (Structural Editing)" + :fn run-string-rope-benchmarks} + :byte-rope {:title "BYTE ROPE vs byte[] (Structural Editing)" + :fn run-byte-rope-benchmarks}}) -(def category-order [:maps :sets :set-ops :intervals :specialty :strings :parallel :memory]) +(def category-order [:maps :sets :set-ops :intervals :specialty :strings :parallel :memory :rope :string-rope :byte-rope]) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Main Entry Points @@ -751,7 +1290,8 @@ (println " --help Show this help") (println) (println "Categories for --only:") - (println " maps, sets, set-ops, intervals, specialty, strings, parallel, memory") + (println " maps, sets, set-ops, intervals, specialty, strings, parallel, memory,") + (println " rope, string-rope, byte-rope") (println) (println "Examples:") (println " lein bench-simple --quick --only sets") diff --git a/test/ordered_collections/string_rope_bench.clj b/test/ordered_collections/string_rope_bench.clj new file mode 100644 index 0000000..dacf14a --- /dev/null +++ b/test/ordered_collections/string_rope_bench.clj @@ -0,0 +1,412 @@ +(ns ordered-collections.string-rope-bench + "StringRope benchmark suite with chart generation. 
+ + Usage: + lein bench-string-rope ; Run benchmarks + generate charts + lein bench-string-rope --chart-only ; Charts from existing EDN (no bench) + lein bench-string-rope --sizes 1000,10000 ; Custom sizes + + Outputs: + bench-results/string-rope-full.edn ; Raw benchmark data + bench-results/string-rope-benchmark.png ; Log-scale speedup chart + bench-results/string-rope-benchmark-linear.png ; Linear-scale speedup chart" + (:require [ordered-collections.bench-runner :as br] + [ordered-collections.bench-utils :refer [format-ns parse-sizes]] + [clojure.edn :as edn] + [clojure.java.io :as io]) + (:import [java.awt Color Font Graphics2D BasicStroke RenderingHints] + [java.awt.geom Line2D$Double Ellipse2D$Double Rectangle2D$Double] + [java.awt.image BufferedImage] + [javax.imageio ImageIO] + [java.io File] + [java.time Instant]) + (:gen-class)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Benchmark Runner +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def default-sizes [1000 4000 10000 100000]) + +(def bench-specs + [[:string-rope-construction br/bench-string-rope-construction] + [:string-rope-concat br/bench-string-rope-concat] + [:string-rope-split br/bench-string-rope-split] + [:string-rope-splice br/bench-string-rope-splice] + [:string-rope-insert br/bench-string-rope-insert] + [:string-rope-remove br/bench-string-rope-remove] + [:string-rope-nth br/bench-string-rope-nth] + [:string-rope-reduce br/bench-string-rope-reduce] + [:string-rope-repeated-edits br/bench-string-rope-repeated-edits] + [:string-rope-re-find br/bench-string-rope-re-find] + [:string-rope-re-seq br/bench-string-rope-re-seq] + [:string-rope-re-replace br/bench-string-rope-re-replace]]) + +(def ^:private edn-path "bench-results/string-rope-full.edn") + +(defn run-benchmarks [sizes] + (let [results (atom {})] + (doseq [n sizes] + (println) + (println (str "===== N = " n " =====")) + (doseq [[k f] bench-specs] + (print 
(str " " (name k))) + (flush) + (swap! results assoc-in [n k] (f n)) + (println))) + (br/write-results @results edn-path + {:sizes (vec sizes) :args []} (Instant/now)) + (br/print-summary @results) + @results)) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; EDN → Speedup Data +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:private chart-benchmarks + ;; [edn-key chart-label & competitor-keys] + [[:string-rope-splice "Single Splice" :string :string-builder] + [:string-rope-insert "Single Insert" :string :string-builder] + [:string-rope-remove "Single Remove" :string :string-builder] + [:string-rope-repeated-edits "200 Random Edits" :string :string-builder] + [:string-rope-concat "Concat Halves" :string :string-builder]]) + +(defn- speedup + "Compute speedup = competitor-ns / rope-ns. Returns nil if data missing." + [bench-results bench-key competitor-key] + (let [rope-ns (get-in bench-results [bench-key :string-rope :mean-ns]) + comp-ns (get-in bench-results [bench-key competitor-key :mean-ns])] + (when (and rope-ns comp-ns (pos? rope-ns)) + (double (/ comp-ns rope-ns))))) + +(defn- extract-speedups + "Extract speedup series from EDN results for one competitor. + Returns [[label [[size speedup] ...]] ...]" + [results sizes competitor-key] + (vec + (for [[bench-key label _ _] chart-benchmarks + :let [pts (vec (for [n sizes + :let [s (speedup (get results n) bench-key competitor-key)] + :when s] + [n s]))] + :when (seq pts)] + [label pts]))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Chart Rendering +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(def ^:private W 1400) +(def ^:private H 750) +(def ^:private margin-left 90) +(def ^:private margin-right 30) +(def ^:private margin-top 100) +(def ^:private margin-bottom 100) +(def ^:private panel-gap 80) + +(def ^:private line-colors + [(Color.
30 160 60) ;; green - splice + (Color. 50 120 210) ;; blue - insert + (Color. 200 80 40) ;; orange - remove + (Color. 140 50 180) ;; purple - 200 edits + (Color. 180 150 30)]) ;; gold - concat + +(def ^:private color-axis (Color. 80 80 80)) +(def ^:private color-bg (Color. 250 250 252)) +(def ^:private color-grid (Color. 235 235 235)) +(def ^:private color-parity (Color. 180 50 50)) +(def ^:private color-win-bg (Color. 230 245 230)) +(def ^:private color-loss-bg (Color. 248 235 235)) +(def ^:private color-title (Color. 30 30 30)) +(def ^:private color-subtitle (Color. 100 100 100)) + +(def ^:private font-title (Font. "SansSerif" Font/BOLD 26)) +(def ^:private font-subtitle (Font. "SansSerif" Font/PLAIN 14)) +(def ^:private font-panel (Font. "SansSerif" Font/BOLD 16)) +(def ^:private font-tick (Font. "SansSerif" Font/PLAIN 11)) +(def ^:private font-label (Font. "SansSerif" Font/PLAIN 12)) +(def ^:private font-legend (Font. "SansSerif" Font/PLAIN 12)) +(def ^:private font-parity (Font. "SansSerif" Font/ITALIC 11)) +(def ^:private font-zone (Font. "SansSerif" Font/BOLD 11)) +(def ^:private font-annot (Font. "SansSerif" Font/BOLD 14)) + +(def ^:private stroke-line (BasicStroke. 2.5 BasicStroke/CAP_ROUND BasicStroke/JOIN_ROUND)) +(def ^:private stroke-grid (BasicStroke. 1.0)) +(def ^:private stroke-parity (BasicStroke. 2.0 BasicStroke/CAP_BUTT BasicStroke/JOIN_MITER + 10.0 (float-array [6 4]) 0.0)) +(def ^:private stroke-axis (BasicStroke. 1.5)) +(def ^:private stroke-border (BasicStroke. 
1.0)) + +(defn- log10 ^double [x] (Math/log10 (double x))) + +(defn- map-range [v vmin vmax pmin pmax] + (+ pmin (* (- pmax pmin) (/ (- v vmin) (- vmax vmin))))) + +(defn- panel-bounds [panel-idx] + (let [pw (/ (- W margin-left margin-right panel-gap) 2) + x0 (+ margin-left (* panel-idx (+ pw panel-gap))) + y0 margin-top + x1 (+ x0 pw) + y1 (- H margin-bottom)] + {:x0 (double x0) :y0 (double y0) :x1 (double x1) :y1 (double y1) + :pw (double pw) :ph (double (- y1 y0))})) + +(defn- draw-line [^Graphics2D g x1 y1 x2 y2] + (.draw g (Line2D$Double. (double x1) (double y1) (double x2) (double y2)))) + +(defn- draw-circle [^Graphics2D g cx cy r] + (.fill g (Ellipse2D$Double. (- (double cx) r) (- (double cy) r) (* 2.0 r) (* 2.0 r)))) + +(defn- draw-string-centered [^Graphics2D g ^String s x y] + (let [fm (.getFontMetrics g) + sw (.stringWidth fm s)] + (.drawString g s (int (- x (/ sw 2))) (int y)))) + +(defn- draw-string-right [^Graphics2D g ^String s x y] + (let [fm (.getFontMetrics g) + sw (.stringWidth fm s)] + (.drawString g s (int (- x sw)) (int y)))) + +(defn- format-speedup ^String [v] + (if (>= v 10) + (format "%.0f\u00d7" (double v)) + (format "%.1f\u00d7" (double v)))) + +(defn- format-size ^String [n] + (cond + (>= n 1000000) (str (/ n 1000000) "M") + (>= n 1000) (str (/ n 1000) "k") + :else (str n))) + +(defn- nice-ticks + "Generate nice round tick values for a linear axis from 0 to ymax." + [ymax] + (let [raw-step (/ ymax 6.0) + mag (Math/pow 10 (Math/floor (Math/log10 raw-step))) + norm (/ raw-step mag) + step (* mag (cond (<= norm 1.5) 1 (<= norm 3.5) 2 (<= norm 7.5) 5 :else 10))] + (loop [v step, acc []] + (if (> v (* ymax 1.01)) acc (recur (+ v step) (conj acc v)))))) + + +;; ── Panel drawing ─────────────────────────────────────────────────────────── + +(defn- draw-panel + [^Graphics2D g bench-data panel-idx panel-title + & {:keys [log-y? 
shared-ymax]}] + (let [{:keys [x0 y0 x1 y1 pw ph]} (panel-bounds panel-idx) + all-speedups (mapcat (fn [[_ pts]] (map second pts)) bench-data) + sizes (mapv first (second (first bench-data))) + log-xmin (- (log10 (apply min sizes)) 0.15) + log-xmax (+ (log10 (apply max sizes)) 0.15) + mx (fn [v] (map-range (log10 (double v)) log-xmin log-xmax x0 x1)) + ;; Y axis setup + max-sp (double (apply max all-speedups)) + min-sp (double (apply min all-speedups)) + [my parity-y y-ticks] + (if log-y? + (let [log-ymin (min -0.6 (- (log10 (max 0.1 min-sp)) 0.2)) + log-ymax (+ (log10 max-sp) 0.3) + my (fn [v] (map-range (log10 (double v)) log-ymin log-ymax y1 y0)) + ticks (filter #(and (>= (log10 %) log-ymin) (<= (log10 %) log-ymax)) + [0.1 0.5 2 5 10 20 50 100 200 500])] + [my (my 1.0) ticks]) + (let [ymax (or shared-ymax (* max-sp 1.12)) + my (fn [v] (map-range (double v) 0.0 ymax y1 y0)) + ticks (nice-ticks ymax)] + [my (my 1.0) ticks]))] + + ;; Win/loss background zones + (.setColor g color-win-bg) + (.fillRect g (int x0) (int y0) (int pw) (int (- parity-y y0))) + (.setColor g color-loss-bg) + (.fillRect g (int x0) (int parity-y) (int pw) (int (- y1 parity-y))) + + ;; Panel border + (.setColor g (Color. 200 200 200)) + (.setStroke g stroke-border) + (.draw g (Rectangle2D$Double. x0 y0 pw ph)) + + ;; Grid lines (Y) + (.setStroke g stroke-grid) + (.setColor g color-grid) + (doseq [v y-ticks] + (let [yy (my v)] + (when (and (>= yy y0) (<= yy y1)) + (draw-line g x0 yy x1 yy)))) + + ;; Grid lines (X) + (doseq [s sizes] + (.setColor g color-grid) + (.setStroke g stroke-grid) + (draw-line g (mx s) y0 (mx s) y1)) + + ;; Parity line — dashed red + (.setStroke g stroke-parity) + (.setColor g color-parity) + (draw-line g x0 parity-y x1 parity-y) + + ;; Zone labels + (.setFont g font-zone) + (.setColor g (Color. 40 130 50)) + (.drawString g "StringRope faster \u2191" (int (+ x0 6)) (int (+ y0 16))) + (.setColor g (Color. 
170 50 50)) + (.drawString g "StringRope slower \u2193" (int (+ x0 6)) (int (- y1 6))) + + ;; Axes + (.setStroke g stroke-axis) + (.setColor g color-axis) + (draw-line g x0 y1 x1 y1) + (draw-line g x0 y0 x0 y1) + + ;; Y tick labels + (.setFont g font-tick) + (.setColor g color-axis) + ;; Parity label + (when (and (>= parity-y y0) (<= parity-y y1)) + (.setColor g color-parity) + (.setFont g font-parity) + (draw-string-right g "1\u00d7 parity" (- x0 6) (+ parity-y 4)) + (.setFont g font-tick) + (.setColor g color-axis)) + (doseq [v y-ticks] + (let [yy (my v)] + (when (and (>= yy y0) (<= yy y1)) + (draw-string-right g (format-speedup v) (- x0 6) (+ yy 4))))) + + ;; X tick labels + (doseq [s sizes] + (draw-string-centered g (format-size s) (mx s) (+ y1 16))) + (.setFont g font-label) + (.setColor g color-subtitle) + (draw-string-centered g "String Length" (/ (+ x0 x1) 2) (+ y1 35)) + + ;; Panel title + (.setFont g font-panel) + (.setColor g color-title) + (draw-string-centered g panel-title (/ (+ x0 x1) 2) (- y0 10)) + + ;; Plot lines and points + (doseq [[idx [_label pts]] (map-indexed vector bench-data)] + (let [color (nth line-colors idx)] + (.setColor g color) + (.setStroke g stroke-line) + (let [mapped-pts (mapv (fn [[size sp]] [(mx size) (my sp)]) pts)] + (doseq [i (range (dec (count mapped-pts)))] + (let [[px1 py1] (nth mapped-pts i) + [px2 py2] (nth mapped-pts (inc i))] + (draw-line g px1 py1 px2 py2))) + (doseq [[px py] mapped-pts] + (.setColor g Color/WHITE) + (draw-circle g px py 5.5) + (.setColor g color) + (draw-circle g px py 4)) + (let [[px py] (last mapped-pts) + sp (second (last pts))] + (.setFont g font-annot) + (.setColor g color) + (.drawString g ^String (format-speedup sp) (int (+ px 8)) (int (+ py 5))))))))) + + +;; ── Full chart rendering ──────────────────────────────────────────────────── + +(defn- draw-legend [^Graphics2D g bench-data-left] + (let [ly (- H 28) + labels (mapv first bench-data-left) + _ (.setFont g font-legend) + fm 
(.getFontMetrics g) + spacing 32 + total-w (reduce + (map-indexed + (fn [_i label] (+ 26 (.stringWidth fm ^String label) spacing)) + labels)) + start-x (- (/ W 2) (/ total-w 2))] + (loop [i 0, x start-x] + (when (< i (count labels)) + (let [color (nth line-colors i) + label (nth labels i)] + (.setColor g color) + (.setStroke g stroke-line) + (draw-line g x (- ly 4) (+ x 18) (- ly 4)) + (draw-circle g (+ x 9) (- ly 4) 3.5) + (.setFont g font-legend) + (.setColor g color-axis) + (.drawString g ^String label (int (+ x 24)) (int ly)) + (let [sw (.stringWidth fm ^String label)] + (recur (inc i) (+ x 24 sw spacing)))))))) + +(defn- render-chart + [^String path data-vs-string data-vs-sb & {:keys [log-y?]}] + (let [img (BufferedImage. W H BufferedImage/TYPE_INT_ARGB) + g (.createGraphics img)] + (.setRenderingHint g RenderingHints/KEY_ANTIALIASING RenderingHints/VALUE_ANTIALIAS_ON) + (.setRenderingHint g RenderingHints/KEY_TEXT_ANTIALIASING RenderingHints/VALUE_TEXT_ANTIALIAS_ON) + (.setRenderingHint g RenderingHints/KEY_RENDERING RenderingHints/VALUE_RENDER_QUALITY) + + ;; Background + (.setColor g color-bg) + (.fillRect g 0 0 W H) + + ;; Title + (.setFont g font-title) + (.setColor g color-title) + (draw-string-centered g "StringRope: Speedup vs String and StringBuilder" (/ W 2) 32) + (.setFont g font-subtitle) + (.setColor g color-subtitle) + (draw-string-centered g + "Persistent, immutable, O(log n) \u2022 Thread-safe \u2022 Free undo via structural sharing" + (/ W 2) 52) + (draw-string-centered g + (str "Higher = faster. Dashed red line = break-even. " + (if log-y? "Log" "Linear") " scale on Y-axis.") + (/ W 2) 70) + + ;; Panels + (if log-y? + (do (draw-panel g data-vs-string 0 "vs String (str + subs)" :log-y? true) + (draw-panel g data-vs-sb 1 "vs StringBuilder (optimal mutable)" :log-y? 
true)) + (let [all-sp (concat (mapcat (fn [[_ pts]] (map second pts)) data-vs-string) + (mapcat (fn [[_ pts]] (map second pts)) data-vs-sb)) + ymax (* (double (apply max all-sp)) 1.12)] + (draw-panel g data-vs-string 0 "vs String (str + subs)" :shared-ymax ymax) + (draw-panel g data-vs-sb 1 "vs StringBuilder (optimal mutable)" :shared-ymax ymax))) + + ;; Legend + (draw-legend g data-vs-string) + + (.dispose g) + (ImageIO/write img "png" (File. path)) + (println (str " Chart written to: " path)))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Main +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn generate-charts + "Generate both log and linear speedup charts from EDN results." + [results sizes] + (System/setProperty "java.awt.headless" "true") + (let [vs-string (extract-speedups results sizes :string) + vs-sb (extract-speedups results sizes :string-builder)] + (println) + (println "Generating charts...") + (render-chart "bench-results/string-rope-benchmark.png" + vs-string vs-sb :log-y? true) + (render-chart "bench-results/string-rope-benchmark-linear.png" + vs-string vs-sb :log-y? false))) + +(defn -main [& args] + (let [chart-only? (some #{"--chart-only"} args) + sizes (if-let [s (some (fn [[a b]] (when (= a "--sizes") b)) + (partition 2 1 args))] + (parse-sizes s) + default-sizes)] + (if chart-only? 
+ (let [data (edn/read-string (slurp edn-path))] + (generate-charts (:benchmarks data) (:sizes data))) + (let [results (run-benchmarks sizes)] + (generate-charts results sizes)))) + (shutdown-agents)) diff --git a/test/ordered_collections/string_rope_test.clj b/test/ordered_collections/string_rope_test.clj new file mode 100644 index 0000000..0fb51a3 --- /dev/null +++ b/test/ordered_collections/string_rope_test.clj @@ -0,0 +1,564 @@ +(ns ordered-collections.string-rope-test + (:require [clojure.test :refer :all] + [clojure.core.reducers :as r] + [clojure.string :as str] + [clojure.test.check.clojure-test :refer [defspec]] + [clojure.test.check.generators :as gen] + [clojure.test.check.properties :as prop] + [ordered-collections.core :as oc] + [ordered-collections.kernel.rope :as ropetree] + [ordered-collections.test-utils :as tu])) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Basic CharSequence Semantics +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-charsequence + (let [sr (oc/string-rope "hello world")] + (is (instance? CharSequence sr)) + (is (= 11 (.length ^CharSequence sr))) + (is (= \h (.charAt ^CharSequence sr 0))) + (is (= \d (.charAt ^CharSequence sr 10))) + (is (= "ello" (str (.subSequence ^CharSequence sr 1 5)))) + (is (= "hello world" (str sr))) + (is (= "hello world" (.toString sr))))) + +(deftest string-rope-empty + (let [sr (oc/string-rope)] + (is (= 0 (count sr))) + (is (= "" (str sr))) + (is (nil? (seq sr))) + (is (nil? (rseq sr))) + (is (nil? (peek sr))) + (is (thrown? IllegalStateException (pop sr))) + ;; Regression: empty charAt/nth must throw, not NPE on nil root + (is (thrown? StringIndexOutOfBoundsException (.charAt ^CharSequence sr 0))) + (is (thrown? 
IndexOutOfBoundsException (nth sr 0))) + (is (= :nope (nth sr 0 :nope))) + ;; Regression: empty fold must return (combinef), not crash + (is (= 0 (r/fold + sr))))) + +(deftest string-rope-non-integer-keys + (let [sr (oc/string-rope "abc")] + ;; Regression: get/valAt must return nil for non-integer keys, not throw + (is (nil? (get sr :x))) + (is (nil? (get sr nil))) + (is (= :nf (get sr :x :nf))) + (is (= :nf (get sr "hello" :nf))))) + +(deftest string-rope-single-char + (let [sr (oc/string-rope "x")] + (is (= 1 (count sr))) + (is (= \x (nth sr 0))) + (is (= "x" (str sr))) + (is (= \x (peek sr))) + (is (= "" (str (pop sr)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Equality +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-equality-with-strings + (is (= (oc/string-rope "hello") "hello")) + (is (= "hello" (oc/string-rope "hello"))) + (is (= (oc/string-rope "hello") (oc/string-rope "hello"))) + (is (= (oc/string-rope "") "")) + (is (= "" (oc/string-rope ""))) + (is (not= (oc/string-rope "hello") "world")) + (is (not= (oc/string-rope "hello") (oc/string-rope "world")))) + +(deftest string-rope-not-equal-to-generic-rope + (is (not= (oc/string-rope "hello") (oc/rope (seq "hello"))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Hashing +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-hash-consistency + (is (= (hash (oc/string-rope "hello")) (hash "hello"))) + (is (= (hash (oc/string-rope "")) (hash ""))) + (is (= (hash (oc/string-rope "the quick brown fox")) + (hash "the quick brown fox")))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Indexed / ILookup / IFn +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-nth + (let [sr (oc/string-rope "abcde")] + (is (= \a (nth 
sr 0))) + (is (= \e (nth sr 4))) + (is (= :nope (nth sr 10 :nope))) + (is (thrown? IndexOutOfBoundsException (nth sr 5))))) + +(deftest string-rope-get + (let [sr (oc/string-rope "abc")] + (is (= \b (get sr 1))) + (is (nil? (get sr 5))) + (is (= :nope (get sr 5 :nope))))) + +(deftest string-rope-ifn + (let [sr (oc/string-rope "abc")] + (is (= \a (sr 0))) + (is (= \c (sr 2))) + (is (= :nope (sr 10 :nope))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Seq / Rseq +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-seq + (let [sr (oc/string-rope "hello")] + (is (= [\h \e \l \l \o] (seq sr))) + (is (= [\o \l \l \e \h] (rseq sr))) + (is (= 5 (count (seq sr)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Reduce +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-reduce + (let [sr (oc/string-rope "abc")] + (is (= "abc" (reduce str "" sr))) + (is (= "abc" (reduce str sr))))) + +(deftest string-rope-reduce-early-termination + (let [sr (oc/string-rope "abcdefghij")] + (is (= "abc" + (reduce (fn [acc c] + (if (>= (count acc) 3) + (reduced acc) + (str acc c))) + "" sr))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Conj / Assoc / Peek / Pop +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-conj + (let [sr (oc/string-rope "abc")] + (is (= "abcd" (str (conj sr \d)))) + (is (= "a" (str (conj (oc/string-rope) \a)))))) + +(deftest string-rope-assoc + (let [sr (oc/string-rope "abc")] + (is (= "axc" (str (assoc sr 1 \x)))) + (is (= "abcd" (str (assoc sr 3 \d)))) + (is (thrown? 
IndexOutOfBoundsException (assoc sr 4 \x))))) + +(deftest string-rope-peek-pop + (let [sr (oc/string-rope "abc")] + (is (= \c (peek sr))) + (is (= "ab" (str (pop sr)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Structural Operations (PRope) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-cat + (is (= "hello world" + (str (oc/string-rope-concat (oc/string-rope "hello ") (oc/string-rope "world"))))) + (is (= "hello world" + (str (oc/string-rope-concat (oc/string-rope "hello ") "world"))))) + +(deftest string-rope-split + (let [[l r] (oc/rope-split (oc/string-rope "hello world") 5)] + (is (= "hello" (str l))) + (is (= " world" (str r))))) + +(deftest string-rope-sub + (is (= "quick" (str (oc/rope-sub (oc/string-rope "the quick brown") 4 9))))) + +(deftest string-rope-insert + (is (= "hello cruel world" + (str (oc/rope-insert (oc/string-rope "hello world") 5 " cruel"))))) + +(deftest string-rope-remove + (is (= "helloworld" + (str (oc/rope-remove (oc/string-rope "hello world") 5 6))))) + +(deftest string-rope-splice + (is (= "hello cruel world" + (str (oc/rope-splice (oc/string-rope "hello world") 5 6 " cruel "))))) + +(deftest string-rope-str + (is (= "hello world" (oc/rope-str (oc/string-rope "hello world")))) + (is (= "" (oc/rope-str (oc/string-rope))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Metadata +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-metadata + (let [sr (with-meta (oc/string-rope "hello") {:tag :test})] + (is (= {:tag :test} (meta sr))) + (is (= {:tag :test} (meta (empty sr)))) + (let [[l r] (oc/rope-split sr 3)] + (is (= {:tag :test} (meta l))) + (is (= {:tag :test} (meta r)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Comparable 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-comparable + (is (zero? (compare (oc/string-rope "abc") (oc/string-rope "abc")))) + (is (neg? (compare (oc/string-rope "abc") (oc/string-rope "abd")))) + (is (pos? (compare (oc/string-rope "abd") (oc/string-rope "abc")))) + (is (neg? (compare (oc/string-rope "ab") (oc/string-rope "abc")))) + (is (pos? (compare (oc/string-rope "abc") (oc/string-rope "ab"))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Parallel Fold +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-fold + (let [sr (oc/string-rope (apply str (repeat 10000 "x")))] + (is (= 10000 + (r/fold + (r/map (constantly 1) sr)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Transient +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-transient-basic + (let [sr (persistent! (reduce conj! (transient (oc/string-rope)) "hello"))] + (is (= "hello" (str sr))) + (is (= 5 (count sr))))) + +(deftest string-rope-transient-from-existing + (let [base (oc/string-rope "hello") + sr (persistent! (reduce conj! (transient base) " world"))] + (is (= "hello world" (str sr))))) + +(deftest string-rope-transient-empty + (let [sr (persistent! (transient (oc/string-rope)))] + (is (= "" (str sr))) + (is (= 0 (count sr))))) + +(deftest string-rope-transient-invalidation + (let [t (transient (oc/string-rope))] + (persistent! t) + (is (thrown? IllegalAccessError (conj! t \a))) + (is (thrown? IllegalAccessError (persistent! t))))) + +(deftest string-rope-transient-large + (let [text (apply str (range 1000)) + sr (persistent! (reduce conj! 
(transient (oc/string-rope)) text))] + (is (= text (str sr))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Print / EDN +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-print-method + (is (= "#string/rope \"hello\"" (pr-str (oc/string-rope "hello")))) + (is (= "#string/rope \"\"" (pr-str (oc/string-rope))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Java Interop +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-regex + (testing "re-find — direct, no str conversion" + (is (= "hello" (re-find #"\w+" (oc/string-rope "hello world")))) + (is (= ["hello" "hello"] (re-find #"(\w+)" (oc/string-rope "hello world")))) + (is (nil? (re-find #"\d+" (oc/string-rope "no digits here"))))) + + (testing "re-matches — full-string match" + (is (= "hello" (re-matches #"\w+" (oc/string-rope "hello")))) + (is (nil? (re-matches #"\w+" (oc/string-rope "hello world")))) + (is (= ["hello world" "hello" "world"] + (re-matches #"(\w+)\s(\w+)" (oc/string-rope "hello world"))))) + + (testing "re-seq — all matches" + (is (= ["hello" "world" "foo"] + (re-seq #"\w+" (oc/string-rope "hello world foo")))) + (is (= ["123" "456"] + (re-seq #"\d+" (oc/string-rope "abc123def456"))))) + + (testing "re-matcher — produces working Matcher" + (let [m (re-matcher #"\w+" (oc/string-rope "hello world"))] + (is (= "hello" (re-find m))) + (is (= "world" (re-find m))) + (is (nil? (re-find m))))) + + (testing "re-find on multi-chunk rope" + (let [sr (reduce (fn [r _] (oc/rope-splice r (quot (count r) 2) + (quot (count r) 2) "XYZ")) + (oc/string-rope (apply str (repeat 500 "a"))) + (range 10))] + (is (string? (re-find #"XYZ" sr))) + (is (= (count (re-seq #"XYZ" sr)) + (count (re-seq #"XYZ" (str sr))))))) + + (testing "empty rope" + (is (nil? 
(re-find #"\w+" (oc/string-rope "")))) + (is (= "" (re-matches #"" (oc/string-rope "")))))) + +(deftest string-rope-clojure-string + (testing "str/replace and str/replace-first accept CharSequence" + (is (= "hell0 w0rld" (str/replace (oc/string-rope "hello world") #"o" "0"))) + (is (= "hell0 world" (str/replace-first (oc/string-rope "hello world") #"o" "0")))) + + (testing "str functions via (str ...) conversion" + (is (= "HELLO" (str/upper-case (str (oc/string-rope "hello"))))) + (is (= "hello" (str/lower-case (str (oc/string-rope "HELLO"))))))) + +(deftest string-rope-charsequence-streams + (testing ".chars() returns correct IntStream" + (let [sr (oc/string-rope "abc") + cs (vec (.toArray (.chars ^CharSequence sr)))] + (is (= [(int \a) (int \b) (int \c)] cs)))) + + (testing ".codePoints() returns correct IntStream" + (let [sr (oc/string-rope "abc") + cps (vec (.toArray (.codePoints ^CharSequence sr)))] + (is (= [(int \a) (int \b) (int \c)] cps)))) + + (testing "streams on multi-chunk rope" + (let [text (apply str (repeat 500 "ab")) + sr (oc/string-rope text)] + (is (= (.count (.chars ^CharSequence text)) + (.count (.chars ^CharSequence sr))))))) + +(deftest string-rope-collection-interface + (let [^java.util.Collection sr (oc/string-rope "abc")] + (is (= 3 (.size sr))) + (is (not (.isEmpty sr))) + (is (.isEmpty ^java.util.Collection (oc/string-rope))) + (is (.contains sr \b)) + (is (not (.contains sr \z))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Edge Cases +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest string-rope-chunk-boundaries + (let [text (apply str (repeat 300 "abcdefghij")) + sr (oc/string-rope text)] + (is (= text (str sr))) + (is (= 3000 (count sr))) + (is (= \a (nth sr 0))) + (is (= \j (nth sr 2999))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Property-Based Tests 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defn- clamp ^long [^long n ^long i] + (min n (max 0 i))) + +(defspec prop-string-rope-roundtrip 100 + (prop/for-all [s gen/string] + (= s (str (oc/string-rope s))))) + +(defspec prop-string-rope-split-roundtrip 100 + (prop/for-all [s (gen/such-that #(pos? (count %)) gen/string) + i gen/nat] + (let [n (count s) + i' (rem i (inc n)) + sr (oc/string-rope s) + [l r] (oc/rope-split sr i')] + (= s (str (str l) (str r)))))) + +(defspec prop-string-rope-splice-oracle 100 + (prop/for-all [s (gen/such-that #(pos? (count %)) gen/string) + a gen/nat + b gen/nat + ins gen/string] + (let [n (count s) + lo (clamp n (min a b)) + hi (clamp n (max a b)) + expected (str (subs s 0 lo) ins (subs s hi)) + sr (oc/string-rope s) + result (str (oc/rope-splice sr lo hi ins))] + (= expected result)))) + +(defspec prop-string-rope-equality 100 + (prop/for-all [s gen/string] + (and (= (oc/string-rope s) s) + (= s (oc/string-rope s)) + (= (hash (oc/string-rope s)) (hash s))))) + +(defspec prop-string-rope-csi-after-edits 50 + (prop/for-all [s (gen/such-that #(>= (count %) 10) gen/string 100) + ops (gen/vector + (gen/one-of + [(gen/fmap (fn [[a b]] [:split (min a b) (max a b)]) + (gen/tuple gen/nat gen/nat)) + (gen/fmap (fn [[a b ins]] [:splice (min a b) (max a b) ins]) + (gen/tuple gen/nat gen/nat gen/string))]) + 1 10)] + (let [sr (oc/string-rope s) + result (reduce + (fn [r [op a b ins]] + (let [n (count r)] + (case op + :split (first (oc/rope-split r (clamp n a))) + :splice (oc/rope-splice r (clamp n (min a b)) + (clamp n (max a b)) + (or ins ""))))) + sr ops) + root (.-root ^ordered_collections.types.string_rope.StringRope result)] + (or (nil? root) (string? root) (ropetree/invariant-valid? root))))) + +(defspec prop-string-rope-regex-oracle 100 + (prop/for-all [s (gen/such-that #(pos? 
(count %)) gen/string-alphanumeric)] + (let [sr (oc/string-rope s)] + (and (= (re-seq #"\w+" s) (re-seq #"\w+" sr)) + (= (re-find #"\w" s) (re-find #"\w" sr)) + (= (re-matches #"\w+" s) (re-matches #"\w+" sr)) + (= (str/replace s #"[aeiou]" "*") + (str/replace sr #"[aeiou]" "*")))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Surrogate Pair Safety +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest surrogate-pair-chunk-boundary + (testing "surrogate pair at chunk boundary is not split" + ;; Place a supplementary character (2 code units) right at the 1024 + ;; char boundary. Without the fix, the high surrogate ends up as the + ;; last char of chunk 0 and the low surrogate as the first of chunk 1. + (let [prefix (apply str (repeat 1023 "a")) + emoji "\uD83C\uDF89" ;; U+1F389 (🎉) + suffix "bbb" + s (str prefix emoji suffix) + sr (oc/string-rope s) + root (.-root ^ordered_collections.types.string_rope.StringRope sr) + chunks (ropetree/root->chunks root)] + (is (= s (str sr)) "toString round-trip") + (is (= (count s) (count sr)) "length preserved") + ;; No chunk should end with a lone high surrogate + (doseq [^String c chunks] + (when (pos? (.length c)) + (is (not (Character/isHighSurrogate (.charAt c (dec (.length c))))) + (str "chunk ends with lone high surrogate: " (subs c (max 0 (- (.length c) 3))))))) + ;; No chunk should start with a lone low surrogate + (doseq [^String c chunks] + (when (pos? 
(.length c)) + (is (not (Character/isLowSurrogate (.charAt c 0))) + "chunk starts with lone low surrogate"))))) + (testing "string of all supplementary characters" + (let [emojis (apply str (repeat 600 "\uD83D\uDE00")) ;; 600 × 😀 = 1200 code units + sr (oc/string-rope emojis)] + (is (= emojis (str sr))) + (is (= 1200 (count sr)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Flat→Tree Boundary +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest flat-tree-boundary + (testing "1024 chars stays flat" + (let [s (apply str (repeat 1024 "x")) + sr (oc/string-rope s)] + (is (= 1024 (count sr))) + (is (string? (.-root ^ordered_collections.types.string_rope.StringRope sr))) + (is (= s (str sr))))) + (testing "1025 chars promotes to tree" + (let [s (apply str (repeat 1025 "x")) + sr (oc/string-rope s)] + (is (= 1025 (count sr))) + (is (not (string? (.-root ^ordered_collections.types.string_rope.StringRope sr)))) + (is (= s (str sr))))) + (testing "flat→tree promotion via edits" + (let [sr (oc/string-rope (apply str (repeat 1020 "a")))] + (is (string? (.-root ^ordered_collections.types.string_rope.StringRope sr))) + ;; Insert 10 chars: 1020 + 10 = 1030 > 1024, so the result must promote + (let [sr2 (oc/rope-insert sr 500 "bbbbbbbbbb")] + (is (= 1030 (count sr2))) + ;; Should have promoted to tree since 1030 > 1024 + (is (not (string? + (.-root ^ordered_collections.types.string_rope.StringRope sr2)))) + (is (= (str (oc/string-rope (apply str (repeat 1020 "a")))) + (str (apply str (repeat 1020 "a")))))))) + (testing "tree→flat demotion via transient persistent!" + ;; Round-trip a large rope through a transient without editing it + (let [sr (oc/string-rope (apply str (repeat 2000 "x"))) + ;; persistent! demotes only if the result is ≤ threshold; 2000 is not + t (transient sr) + sr2 (persistent!
t)] + ;; Still 2000 chars, stays tree + (is (= 2000 (count sr2)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; HashMap Key Compatibility +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest hashmap-key-compatibility + (testing "StringRope hasheq matches String hasheq" + (let [s "hello world" + sr (oc/string-rope s)] + (is (= (hash s) (hash sr))))) + (testing "StringRope keys looked up by another StringRope" + (let [sr1 (oc/string-rope "hello world") + sr2 (oc/string-rope "hello world") + m {sr1 :found}] + (is (= :found (get m sr2))))) + (testing "String-keyed map looked up by StringRope" + (let [s "hello world" + sr (oc/string-rope s) + m {s :found}] + ;; StringRope.equals(String) works because we control it + (is (= :found (get m sr))))) + (testing "Tree-mode rope hasheq matches" + (let [s (apply str (repeat 2000 "x")) + sr (oc/string-rope s)] + (is (= (hash s) (hash sr)))))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; charAt Stress (sequential + random access across chunk boundaries) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(deftest char-at-stress + (testing "sequential charAt across chunk boundaries" + (let [s (apply str (map #(char (+ (int \a) (mod % 26))) (range 2000))) + sr (oc/string-rope s)] + ;; Forward sequential scan over the same rope instance + (dotimes [i 2000] + (is (= (.charAt ^CharSequence sr i) + (.charAt s i)) + (str "mismatch at index " i))))) + (testing "random access charAt" + (let [s (apply str (map #(char (+ (int \a) (mod % 26))) (range 4000))) + sr (oc/string-rope s) + indices (shuffle (range 4000))] + ;; Random order: every access walks the tree + (doseq [i (take 500 indices)] + (is (= (.charAt ^CharSequence sr i) (.charAt s i)) + (str "random access mismatch at " i))))) + (testing "charAt correct after structural edits" + (let [sr1 (oc/string-rope (apply str (repeat 2000
"a")))] + ;; Splice creates a NEW StringRope + (let [sr2 (oc/rope-splice sr1 500 1500 "bbb")] + ;; 2000 - 1000 + 3 = 1003 + (is (= 1003 (count sr2))) + ;; 0-499 = a, 500-502 = b, 503-1002 = a + (is (= \a (.charAt ^CharSequence sr2 0))) + (is (= \a (.charAt ^CharSequence sr2 499))) + (is (= \b (.charAt ^CharSequence sr2 500))) + (is (= \b (.charAt ^CharSequence sr2 502))) + (is (= \a (.charAt ^CharSequence sr2 503))) + (is (= \a (.charAt ^CharSequence sr2 1002)))))))