Releases: shloktech/keyedstablehash
keyedstablehash 0.0.5
Release Notes: keyedstablehash v0.0.5
🎉 Initial Release
We are excited to announce the release of keyedstablehash v0.0.5! This library provides a robust solution for generating stable, deterministic hashes for Python objects (dictionaries, lists, primitives) with the added security of a secret key.
Unlike Python's built-in hash(), whose output for str and bytes is randomized per process by default, keyedstablehash produces the same hash for the same data structure and key on every run, which makes it well suited to caching, data verification, and distributed systems.
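The per-process randomization of the built-in hash() can be observed directly. The sketch below uses only the standard library and does not call keyedstablehash; the helper name builtin_hash_in_fresh_process is hypothetical. It starts fresh interpreters with different PYTHONHASHSEED values and compares the results.

```python
import os
import subprocess
import sys


def builtin_hash_in_fresh_process(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new interpreter with the given PYTHONHASHSEED."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


a = builtin_hash_in_fresh_process("user-42", seed="1")
b = builtin_hash_in_fresh_process("user-42", seed="2")
c = builtin_hash_in_fresh_process("user-42", seed="1")

print(a == b)  # False: a different seed gives a different hash for the same string
print(a == c)  # True: the value is only reproducible when the seed is pinned
```

In normal use the seed is chosen at random at interpreter start, so values from hash() should not be persisted or shared between processes.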
🚀 Key Features
- Stable Hashing: Guarantees deterministic output across different Python sessions and environments.
- Keyed Support: Mixes a secret key into the hash, in a manner similar to HMAC, so that anyone without the key can neither engineer hash collisions nor verify hashes.
- Deep Traversal: Recursively handles nested dictionaries and lists to ensure the entire object structure is hashed.
- Type Agnostic: Supports standard Python primitives (str, int, float, bool, None) and collections.
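One way to picture how deep traversal, type handling, and keying fit together is the sketch below. It is an illustration under stated assumptions, not the library's actual encoding or algorithm: the tag bytes and the names encode and sketch_hash are hypothetical, and the standard library's keyed BLAKE2b stands in for the library's keyed hash.

```python
import hashlib
import struct


def encode(obj) -> bytes:
    """Canonical, type-tagged byte encoding (hypothetical scheme for illustration)."""
    if obj is None:
        return b"N"
    if isinstance(obj, bool):  # must come before int: bool is a subclass of int
        return b"B" + (b"\x01" if obj else b"\x00")
    if isinstance(obj, int):
        body = str(obj).encode("ascii")
        return b"I" + struct.pack("<Q", len(body)) + body
    if isinstance(obj, float):
        return b"F" + struct.pack("<d", obj)
    if isinstance(obj, str):
        body = obj.encode("utf-8")
        return b"S" + struct.pack("<Q", len(body)) + body
    if isinstance(obj, (list, tuple)):  # order matters; lists and tuples encode alike here
        parts = [encode(item) for item in obj]
        return b"L" + struct.pack("<Q", len(parts)) + b"".join(parts)
    if isinstance(obj, (set, frozenset)):  # sort encoded members so iteration order is irrelevant
        parts = sorted(encode(item) for item in obj)
        return b"T" + struct.pack("<Q", len(parts)) + b"".join(parts)
    if isinstance(obj, dict):  # sort encoded items so insertion order is irrelevant
        items = sorted((encode(k), encode(v)) for k, v in obj.items())
        return b"D" + struct.pack("<Q", len(items)) + b"".join(k + v for k, v in items)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def sketch_hash(obj, key: bytes) -> str:
    """Keyed 64-bit digest of the canonical encoding (BLAKE2b used as a stand-in)."""
    return hashlib.blake2b(encode(obj), key=key, digest_size=8).hexdigest()


key = b"\x01" * 16
left = sketch_hash({"id": 101, "tags": {"python", "data"}, "meta": {"active": True}}, key)
right = sketch_hash({"meta": {"active": True}, "tags": {"data", "python"}, "id": 101}, key)
print(left == right)  # True: key order and set order do not affect the digest
```

The type tags keep values such as 1, True, and "1" from colliding, and the length prefixes keep adjacent elements from running together.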
📦 Installation
You can install the package directly from PyPI:
pip install keyedstablehash==0.0.5
💻 Usage Example
The examples below show how to hash Python objects, byte streams, and dataframe columns:
1. Hashing Python Objects
Generate stable hashes for complex, nested structures.
from keyedstablehash import stable_keyed_hash
# Your secret key (must be 16 bytes)
secret_key = b"\x01" * 16
# A complex, messy object
data = {
"id": 101,
"tags": {"python", "data", "secure"}, # Sets are auto-sorted
"meta": {"created_at": 167888, "active": True}
}
h = stable_keyed_hash(data, key=secret_key)
print(f"Hex: {h.hexdigest()}")
# -> Hex: 4a1b... (Deterministic across runs)
print(f"Int: {h.intdigest()}")
# -> Int: 8392... (uint64)
2. Streaming API
Mirrors the standard hashlib interface for data streams.
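For reference, this is the hashlib pattern being mirrored, shown with the standard library's keyed BLAKE2b rather than with keyedstablehash. In hashlib, feeding data in chunks gives the same digest as hashing the concatenated bytes in one call; that the library's streaming object behaves the same way is an assumption based on the description above.

```python
import hashlib

key = b"\x01" * 16

# Streaming: feed the data in two chunks.
streamed = hashlib.blake2b(key=key, digest_size=8)
streamed.update(b"chunk_one")
streamed.update(b"chunk_two")

# One-shot: hash the concatenated bytes in a single call.
one_shot = hashlib.blake2b(b"chunk_one" + b"chunk_two", key=key, digest_size=8)

print(streamed.hexdigest() == one_shot.hexdigest())  # True: chunk boundaries do not matter
```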
from keyedstablehash import siphash24
secret_key = b"\x01" * 16
s = siphash24(key=secret_key)
s.update(b"chunk_one")
s.update(b"chunk_two")
print(s.hexdigest())
3. Dataframe Vectorization (The Power Feature)
Hash entire columns in Pandas, Polars, or Arrow. This is useful for data de-duplication, shuffling, and anonymization pipelines.
import pandas as pd
import pyarrow as pa
from keyedstablehash import hash_pandas_series, hash_arrow_array
secret_key = b"\x01" * 16
# --- Pandas ---
df = pd.DataFrame({"user_id": ["u1", "u2", "u1"]})
df["hash"] = hash_pandas_series(df["user_id"], key=secret_key)
# Result: A Series of uint64 hashes
print(df["hash"])
# --- PyArrow ---
arr = pa.array(["alpha", "beta", "gamma"])
hashes = hash_arrow_array(arr, key=secret_key)
# Result: A pyarrow.Array(uint64)
print(hashes)
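The column-hashing idea can be illustrated without the library or a dataframe package. The sketch below is a plain-Python stand-in, not the library's implementation: keyed_uint64 is a hypothetical helper that uses the standard library's keyed BLAKE2b to map each value to an unsigned 64-bit integer, which is then used to drop repeated values.

```python
import hashlib


def keyed_uint64(value: str, key: bytes) -> int:
    """Map a string to an unsigned 64-bit integer with a keyed 8-byte digest (stand-in)."""
    digest = hashlib.blake2b(value.encode("utf-8"), key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little")


key = b"\x01" * 16
user_ids = ["u1", "u2", "u1"]

# Hash every element of the "column".
hashes = [keyed_uint64(uid, key) for uid in user_ids]

# De-duplicate by hash, keeping the first occurrence of each value.
seen = set()
unique_ids = []
for uid, h in zip(user_ids, hashes):
    if h not in seen:
        seen.add(h)
        unique_ids.append(uid)

print(unique_ids)  # ['u1', 'u2']
```

Because the digest depends on the key, the same column hashed under a different key yields unrelated values, which is what makes keyed hashes usable as pseudonyms in anonymization pipelines.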