weight-inspect provides deterministic structural identity for GGUF and safetensors model files.
Scope:
- Parse GGUF and safetensors headers/metadata
- Extract tensor descriptors (name, dtype, shape, byte_length)
- Compute structural hash
- Diff two files structurally (built on identity)
Out of Scope:
- Loading or hashing weight data
- Runtime compatibility prediction
- Model execution
structural_hash = SHA256(canonical_json(artifact))
| Field | Included | Notes |
|---|---|---|
| format | ✓ | "gguf" or "safetensors" |
| gguf_version | ✓ | Only for GGUF files |
| metadata keys | ✓ | Sorted lexicographically |
| metadata values | ✓ | Normalized (see below) |
| tensor names | ✓ | Sorted lexicographically |
| tensor dtype | ✓ | |
| tensor shape | ✓ | Dimension order preserved (not sorted) |
| tensor byte_length | ✓ | |

Excluded from the hash:
- File offsets
- Padding
- Physical tensor ordering
- Header formatting
- Weight/tensor data bytes
All maps MUST be sorted by key:
- metadata: BTreeMap → lexicographically sorted
- tensors: BTreeMap → lexicographically sorted by tensor name
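As a minimal sketch of why a `BTreeMap` gives this guarantee, the helper below (a hypothetical name, not part of weight-inspect) collects key/value pairs in whatever order they arrive and returns the keys in iteration order. The keys in the usage note are illustrative only. Note that Rust orders `String` keys by their UTF-8 bytes.

```rust
use std::collections::BTreeMap;

/// Collect key/value pairs into a BTreeMap and return the keys in
/// iteration order. A BTreeMap always iterates in ascending key order,
/// so the result does not depend on the order of `pairs`.
/// (Hypothetical helper for illustration only.)
fn sorted_keys(pairs: &[(&str, &str)]) -> Vec<String> {
    let map: BTreeMap<String, String> = pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    map.keys().cloned().collect()
}
```

Calling `sorted_keys` with `general.name`, `tokenizer.ggml.model`, and `general.architecture` in any order yields the same key sequence, which is what makes the physical ordering in the file irrelevant to the hash.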
Integers: Serialize as JSON integers (no trailing decimals).
Floats: Serialize as raw bit patterns (as JSON numbers) to ensure reproducibility:
- Float: bit pattern as u64 (e.g., 4611686018427387904, the bits of 2.0)
- Float32: prefix with "f32:" then bit pattern as u32 (e.g., f32:1077936128, the bits of 3.0)
This prevents floating-point representation differences across platforms.
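The bit-pattern encoding can be sketched with the standard `to_bits` methods on `f64` and `f32`. The function names here are hypothetical, and how the resulting text is embedded in the canonical JSON (in particular the prefixed 32-bit form) is not specified by this sketch.

```rust
/// Encode a 64-bit float as the decimal form of its IEEE 754 bit pattern.
/// (Hypothetical helper name.)
fn encode_f64(value: f64) -> String {
    value.to_bits().to_string()
}

/// Encode a 32-bit float as "f32:" followed by the decimal form of its
/// IEEE 754 bit pattern. (Hypothetical helper name.)
fn encode_f32(value: f32) -> String {
    format!("f32:{}", value.to_bits())
}
```

Because the encoding works on bits rather than decimal text, values that print alike but differ in representation stay distinct: `0.0` and `-0.0`, for example, encode to different integers.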
Strings are escaped for JSON serialization. No Unicode normalization is performed.
- Metadata arrays: preserve declared order
- Tensor shapes: preserve dimension order (already ordered by spec)
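The two ordering rules (maps sorted, arrays preserved) can be illustrated together. The sketch below assumes a hypothetical `TensorInfo` struct and a hand-written serializer; the field layout of the real canonical JSON is not given in this document, so the output shape here is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical tensor descriptor. The field names follow the descriptor
/// list in this document; the struct itself is an assumption.
struct TensorInfo {
    dtype: String,
    shape: Vec<u64>,
    byte_length: u64,
}

/// Render tensors as a JSON object. Tensor names come out sorted because
/// the map is a BTreeMap; each shape keeps its declared dimension order.
/// Assumes names and dtypes are plain ASCII that needs no JSON escaping.
fn canonical_tensors(tensors: &BTreeMap<String, TensorInfo>) -> String {
    let entries: Vec<String> = tensors
        .iter()
        .map(|(name, t)| {
            // Dimensions are written in the order they were declared.
            let dims: Vec<String> = t.shape.iter().map(|d| d.to_string()).collect();
            format!(
                "\"{}\":{{\"byte_length\":{},\"dtype\":\"{}\",\"shape\":[{}]}}",
                name,
                t.byte_length,
                t.dtype,
                dims.join(",")
            )
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}
```

A shape of `[768, 3072]` is emitted as `[768,3072]` and never reordered, since swapping dimensions would describe a different tensor.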
Given identical input files:
- Parsing produces an identical Artifact structure
- Canonical JSON is byte-for-byte identical
- SHA256 hash is identical
- Diff output order is stable
Cross-machine: The same file is guaranteed to produce the same hash on any machine or OS.
format: gguf
gguf_version: 3
tensor_count: 242
metadata_count: 22
structural_hash: abc123...
First 5 tensors:
1: blk.0.attn_norm.weight [768] (f32)
...
format: gguf
structural_hash: abc123...
tensor_count: 242
metadata_count: 22
Structural Identity:
format equal: true
hash equal: false
tensor count equal: true
metadata count equal: false
Metadata:
+ key_added
- key_removed
~ key_changed: old -> new
Tensors:
+ new_tensor
- removed_tensor
~ tensor_name:
    dtype: f16 -> q4_k
    shape: [4096] -> [4096, 4096]
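The metadata part of the diff output above could be produced along these lines. This is a sketch under assumptions: `diff_metadata` is a hypothetical function, and metadata values are simplified to strings. Iterating over sorted maps is what keeps the output order stable.

```rust
use std::collections::BTreeMap;

/// Compare two sorted metadata maps and return diff lines that use the
/// "+", "-", and "~" markers from the sample output.
/// (Hypothetical helper; values are simplified to strings.)
fn diff_metadata(
    old: &BTreeMap<String, String>,
    new: &BTreeMap<String, String>,
) -> Vec<String> {
    let mut lines = Vec::new();
    // Keys present only in `new` are additions.
    for key in new.keys() {
        if !old.contains_key(key) {
            lines.push(format!("+ {}", key));
        }
    }
    // Keys present only in `old` are removals.
    for key in old.keys() {
        if !new.contains_key(key) {
            lines.push(format!("- {}", key));
        }
    }
    // Keys present in both maps with different values are changes.
    for (key, old_value) in old {
        if let Some(new_value) = new.get(key) {
            if new_value != old_value {
                lines.push(format!("~ {}: {} -> {}", key, old_value, new_value));
            }
        }
    }
    lines
}
```

Each pass walks a `BTreeMap` in key order, so two runs over the same pair of files list additions, removals, and changes in the same sequence.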
All commands support --json:
{
"schema": 1,
"format": "gguf",
"structural_hash": "abc123...",
...
}
Schema version allows downstream tools to handle format evolution.
- Invalid magic bytes → "unable to parse GGUF header"
- Invalid JSON in safetensors → "invalid safetensors JSON header"
- File not found → standard IO error
No recovery logic - fail fast.
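A fail-fast check might look like the following sketch. `InspectError` and `check_gguf_magic` are hypothetical names; only the error messages come from the list above. The check relies on the fact that GGUF files begin with the four ASCII bytes `GGUF`.

```rust
use std::fmt;

/// Hypothetical error type; the messages are the ones listed above.
#[derive(Debug, PartialEq)]
enum InspectError {
    InvalidGgufHeader,
    InvalidSafetensorsHeader,
}

impl fmt::Display for InspectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InspectError::InvalidGgufHeader => write!(f, "unable to parse GGUF header"),
            InspectError::InvalidSafetensorsHeader => {
                write!(f, "invalid safetensors JSON header")
            }
        }
    }
}

/// Return an error immediately if the buffer does not start with the
/// GGUF magic bytes. No attempt is made to recover or guess the format.
fn check_gguf_magic(bytes: &[u8]) -> Result<(), InspectError> {
    if bytes.starts_with(b"GGUF") {
        Ok(())
    } else {
        Err(InspectError::InvalidGgufHeader)
    }
}
```

Returning a `Result` and propagating it with `?` keeps the fail-fast behavior explicit: the first invalid header ends the run with one of the messages above.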
- v0.1.0: Initial release