Also use compare small Uint8Array by content logic for joins #899
Open
KyleAMathews wants to merge 3 commits into main from claude/fix-bug-018BgfHCaNdqb8awa3LefSfu
+195
−57
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| --- | ||
| "@tanstack/db-ivm": patch | ||
| "@tanstack/db": patch | ||
| --- | ||
|
|
||
| Fix joining collections by small Uint8Array keys to use content-based comparison instead of reference equality. | ||
|
|
||
| Previously, when joining collections using Uint8Array keys (like ULIDs or UUIDs), the Index class would compare keys by reference rather than by content. This caused joins to fail when the same byte array data existed in separate instances. | ||
|
|
||
| This fix introduces shared Uint8Array normalization utilities that convert small Uint8Arrays (≤128 bytes) to string representations for content-based equality checking. The normalization logic is now shared between `db-ivm` and `db` packages, eliminating code duplication. | ||
|
|
||
| Fixes #896 |
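
The failure mode described in the changeset can be reproduced with a plain `Map`, which compares object keys by reference. The sketch below is illustrative only: `toContentKey` is a hypothetical helper written for this example, not the function added by the PR, although it uses the same prefixed-string idea.

```typescript
// Two distinct Uint8Array instances holding the same 16 bytes (e.g. one UUID).
const idA = new Uint8Array(16).fill(7)
const idB = new Uint8Array(16).fill(7)

// A Map compares object keys by reference, so a lookup with idB misses
// even though idB has the same bytes as idA.
const byReference = new Map<Uint8Array, string>([[idA, `left row`]])
const missByReference = byReference.get(idB)

// Keying by a string derived from the bytes makes the lookup content-based.
// `toContentKey` is a hypothetical helper for this sketch.
function toContentKey(bytes: Uint8Array): string {
  return `__u8__${Array.from(bytes).join(`,`)}`
}

const byContent = new Map<string, string>([[toContentKey(idA), `left row`]])
const hitByContent = byContent.get(toContentKey(idB))
```

Here `missByReference` is `undefined` while `hitByContent` is `left row`, which is the difference between the old and new join behaviour for byte-array keys.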
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| /** | ||
| * Threshold for normalizing Uint8Arrays to string representations. | ||
| * Arrays larger than this will use reference equality to avoid memory overhead. | ||
| * 128 bytes is enough for common ID formats (ULIDs are 16 bytes, UUIDs are 16 bytes) | ||
| * while avoiding excessive string allocation for large binary data. | ||
| */ | ||
| export const UINT8ARRAY_NORMALIZE_THRESHOLD = 128 | ||
|
|
||
| /** | ||
| * Check if a value is a Uint8Array or Buffer | ||
| */ | ||
| export function isUint8Array(value: unknown): value is Uint8Array { | ||
| return ( | ||
| (typeof Buffer !== `undefined` && value instanceof Buffer) || | ||
| value instanceof Uint8Array | ||
| ) | ||
| } | ||
|
|
||
| /** | ||
| * Normalize a Uint8Array to a string representation for content-based comparison. | ||
| * This enables Uint8Arrays with the same byte content to be treated as equal, | ||
| * even if they are different object instances. | ||
| * | ||
| * @param value - The Uint8Array or Buffer to normalize | ||
| * @returns A string representation of the byte array | ||
| */ | ||
| export function normalizeUint8Array(value: Uint8Array): string { | ||
| // Convert to a string representation that can be used as a Map key | ||
| // Use a special prefix to avoid collisions with user strings | ||
| return `__u8__${Array.from(value).join(`,`)}` | ||
| } | ||
|
|
||
| /** | ||
| * Normalize a value for Map key or comparison usage. | ||
| * Converts small Uint8Arrays/Buffers to string representations for content-based equality. | ||
| * This enables proper comparison and Map key usage for binary data like ULIDs. | ||
| * | ||
| * @param value - The value to normalize | ||
| * @returns The normalized value (string for small Uint8Arrays, original value otherwise) | ||
| */ | ||
| export function normalizeValue<T>(value: T): T | string { | ||
| if (isUint8Array(value)) { | ||
| // Only normalize small arrays to avoid memory overhead for large binary data | ||
| if (value.byteLength <= UINT8ARRAY_NORMALIZE_THRESHOLD) { | ||
| return normalizeUint8Array(value) | ||
| } | ||
| // For large arrays, fall back to reference equality | ||
| // Users working with large binary data should use a derived key if needed | ||
| } | ||
|
|
||
| return value | ||
| } |
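
To make the threshold behaviour concrete, here is a minimal standalone sketch. `normalizeKey` and `THRESHOLD` are condensed, renamed copies of the helpers in the diff above so that the snippet runs on its own, and the `index` map is a hypothetical stand-in for a join-side index rather than the actual `Index` class.

```typescript
// Condensed copy of the diff's helpers (renamed) so this snippet is standalone.
const THRESHOLD = 128

function normalizeKey<T>(value: T): T | string {
  const candidate: unknown = value
  if (candidate instanceof Uint8Array && candidate.byteLength <= THRESHOLD) {
    return `__u8__${Array.from(candidate).join(`,`)}`
  }
  return value
}

// Small arrays become strings, so equal bytes produce equal keys.
const small = normalizeKey(new Uint8Array([1, 2, 3, 4]))

// Arrays above the threshold are returned as-is and still compare by reference.
const large = new Uint8Array(THRESHOLD + 1)
const largeResult = normalizeKey(large)

// Non-binary values pass through unchanged.
const text = normalizeKey(`user-42`)

// Hypothetical join-side index keyed by the normalized value.
const index = new Map<unknown, Array<string>>()
index.set(normalizeKey(new Uint8Array([1, 2, 3, 4])), [`a`])
const match = index.get(normalizeKey(new Uint8Array([1, 2, 3, 4])))
```

One consequence of this design is that a user-supplied string that happens to equal a normalized key, such as `__u8__1,2,3,4`, would compare equal to the byte array `[1, 2, 3, 4]`. The prefix makes this unlikely rather than impossible.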
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,6 +12,7 @@ import { | |
| describe(`Operators`, () => { | ||
| describe(`Join operation`, () => { | ||
| testJoin() | ||
| testUint8ArrayKeyJoin() | ||
|
Collaborator
If we decide to keep this implementation, can we incorporate this into the main testJoin function rather than adding a new one with just this single test? |
||
| }) | ||
| }) | ||
|
|
||
|
|
@@ -342,3 +343,70 @@ function testJoin() { | |
| assertKeyedResults(`join batch processing`, batchResult, expectedResults, 6) | ||
| }) | ||
| } | ||
|
|
||
| function testUint8ArrayKeyJoin() { | ||
| test(`join with Uint8Array keys compared by content`, () => { | ||
| const graph = new D2() | ||
| const inputA = graph.newInput<[Uint8Array, string]>() | ||
| const inputB = graph.newInput<[Uint8Array, string]>() | ||
| const results: Array<[Uint8Array, [string, string]]> = [] | ||
|
|
||
| inputA.pipe( | ||
| join(inputB), | ||
| output((message) => { | ||
| for (const [item] of message.getInner()) { | ||
| results.push(item) | ||
| } | ||
| }) | ||
| ) | ||
|
|
||
| graph.finalize() | ||
|
|
||
| // Create separate Uint8Array instances with the same content | ||
| const key1A = new Uint8Array([1, 2, 3, 4]) | ||
| const key1B = new Uint8Array([1, 2, 3, 4]) | ||
| const key2A = new Uint8Array([5, 6, 7, 8]) | ||
| const key2B = new Uint8Array([5, 6, 7, 8]) | ||
|
|
||
| // Verify that the arrays are different objects | ||
| expect(key1A).not.toBe(key1B) | ||
| expect(key2A).not.toBe(key2B) | ||
|
|
||
| // Send data with different Uint8Array instances but same content | ||
| inputA.sendData( | ||
| new MultiSet([ | ||
| [[key1A, `a`], 1], | ||
| [[key2A, `b`], 1], | ||
| ]) | ||
| ) | ||
|
|
||
| inputB.sendData( | ||
| new MultiSet([ | ||
| [[key1B, `x`], 1], | ||
| [[key2B, `y`], 1], | ||
| ]) | ||
| ) | ||
|
|
||
| graph.run() | ||
|
|
||
| // Should join successfully based on content, not reference | ||
| expect(results).toHaveLength(2) | ||
|
|
||
| // Verify the joined data is correct | ||
| const sortedResults = results.sort((a, b) => { | ||
| const [keyA] = a | ||
| const [keyB] = b | ||
| return keyA[0]! - keyB[0]! | ||
| }) | ||
|
|
||
| const [key1, [val1A, val1B]] = sortedResults[0]! | ||
| expect(Array.from(key1)).toEqual([1, 2, 3, 4]) | ||
| expect(val1A).toBe(`a`) | ||
| expect(val1B).toBe(`x`) | ||
|
|
||
| const [key2, [val2A, val2B]] = sortedResults[1]! | ||
| expect(Array.from(key2)).toEqual([5, 6, 7, 8]) | ||
| expect(val2A).toBe(`b`) | ||
| expect(val2B).toBe(`y`) | ||
| }) | ||
| } | ||
I'm a little nervous of the overhead of adding this additional map and the lookups. We are making joins slower for all cases in order to support an edge case.
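
One possible way to limit the cost for the common case, sketched here as an assumption and not as something proposed in the PR, is to return early for primitive keys before any `instanceof` check. `normalizeValueFast` and `FAST_PATH_THRESHOLD` are hypothetical names. Whether this measurably helps, and whether it addresses the extra map and lookups, would need benchmarking.

```typescript
// Hypothetical variant of normalizeValue with an early exit for non-object keys.
const FAST_PATH_THRESHOLD = 128 // mirrors UINT8ARRAY_NORMALIZE_THRESHOLD

function normalizeValueFast<T>(value: T): T | string {
  // Strings, numbers, bigints, booleans, symbols, undefined and null can never
  // be Uint8Arrays, so they return after a single typeof/null check.
  if (typeof value !== `object` || value === null) {
    return value
  }
  // Node's Buffer is a subclass of Uint8Array, so this check covers both.
  const candidate: unknown = value
  if (
    candidate instanceof Uint8Array &&
    candidate.byteLength <= FAST_PATH_THRESHOLD
  ) {
    return `__u8__${Array.from(candidate).join(`,`)}`
  }
  return value
}

const primitiveKey = normalizeValueFast(42)
const binaryKey = normalizeValueFast(new Uint8Array([1, 2]))
```

With this variant, string and number join keys skip the binary checks entirely, and only object keys pay for the `instanceof` test.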