Summary
Long Distance Matching enables finding matches beyond the normal window size, critical for highly repetitive data with distant patterns (log files, database dumps, backup archives).
C reference implementation (zstd_ldm.c)
Gear hash rolling
- Rolling hash:
hash = (hash << 1) + ZSTD_ldm_gearTab[byte]
- Gear table: 256-entry lookup table for byte-to-hash-value mapping
- Stop mask determines split points (average every
2^hashRateLog bytes)
LDM hash table
- Bucket-based:
ldmEntry_t with position + checksum
LDM_BUCKET_SIZE_LOG = 4 (16 entries per bucket)
- Memory-efficient: only stores position + small checksum, not full content
Parameters
ZSTD_c_enableLongDistanceMatching: on/off
ldmHashLog: Hash table size
ldmMinMatch: Minimum match length (default 64 bytes)
ldmBucketSizeLog: Collision bucket size
ldmHashRateLog: Insertion frequency
Integration
- LDM matches are merged with regular matches in sequence generation
- Auto-enables large window (128MB) for effective long-range matching
What needs to be implemented
- Gear hash implementation — rolling hash with lookup table
- LDM hash table — bucket-based with position + checksum
- LDM match finding — search LDM table, verify matches, extend
- Sequence merging — combine LDM matches with regular matches
- Parameter API — enable/configure LDM via compression parameters
- Window extension — auto-increase window when LDM enabled
Acceptance criteria
Time estimate
3d
Summary
Long Distance Matching enables finding matches beyond the normal window size, critical for highly repetitive data with distant patterns (log files, database dumps, backup archives).
C reference implementation (zstd_ldm.c)
Gear hash rolling
hash = (hash << 1) + ZSTD_ldm_gearTab[byte]2^hashRateLogbytes)LDM hash table
ldmEntry_twith position + checksumLDM_BUCKET_SIZE_LOG = 4(16 entries per bucket)Parameters
ZSTD_c_enableLongDistanceMatching: on/offldmHashLog: Hash table sizeldmMinMatch: Minimum match length (default 64 bytes)ldmBucketSizeLog: Collision bucket sizeldmHashRateLog: Insertion frequencyIntegration
What needs to be implemented
Acceptance criteria
Time estimate
3d