
Releases: shloktech/keyedstablehash

keyedstablehash 0.0.5

27 Dec 12:53


Release Notes: keyedstablehash v0.0.5

🎉 Initial Release

We are excited to announce the release of keyedstablehash v0.0.5! This library generates stable, deterministic hashes of Python objects (dictionaries, lists, sets, and primitives) with the added security of a secret key: without the key, the hash values cannot be reproduced.

Unlike Python's built-in hash(), whose results for str and bytes are randomized per process (see PYTHONHASHSEED), keyedstablehash produces the same hash for the same data structure and key on every run. This makes it well suited to caching, data verification, and distributed systems.
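Python salts the hashes of str and bytes objects per interpreter process unless the PYTHONHASHSEED environment variable fixes the seed. The standard-library sketch below makes that visible by running hash() in two fresh interpreters with different seeds; the helper name and the string "user-42" are arbitrary choices for illustration.

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_new_process(text: str, seed: str) -> int:
    """Evaluate hash(text) in a fresh interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Two processes with different hash seeds disagree about the same string...
seed_1 = builtin_hash_in_new_process("user-42", "1")
seed_2 = builtin_hash_in_new_process("user-42", "2")
print(seed_1 == seed_2)  # False

# ...while a digest computed from the bytes alone is identical in every process.
print(hashlib.sha256(b"user-42").hexdigest())
```

Even with a fixed PYTHONHASHSEED, hash() values are not a safe persistent identifier: CPython 3.11 switched string hashing from SipHash-2-4 to SipHash-1-3, and results also differ between 32- and 64-bit builds.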

🚀 Key Features

  • Stable Hashing: Guarantees deterministic output across different Python sessions and environments.
  • Keyed Support: A secret key is mixed into every hash, similar in spirit to HMAC. Without the key, an attacker cannot predict hash values, deliberately craft collisions (hash-flooding attacks), or recompute hashes to verify data.
  • Deep Traversal: Recursively handles nested dictionaries and lists to ensure the entire object structure is hashed.
  • Broad Type Support: Handles standard Python primitives (str, int, float, bool, None) and collections such as dicts, lists, and sets.
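The release notes do not document the library's internal encoding. A common way to build a stable keyed hash of nested objects is to serialize the object into a canonical, type-tagged byte string (sorting dict entries and set elements so that insertion and iteration order do not matter) and then feed those bytes to a keyed hash. The sketch below follows that recipe under stated assumptions: the encoding and the function `sketch_keyed_hash` are hypothetical, and because the standard library exposes no public SipHash, it swaps in keyed BLAKE2b truncated to 8 bytes. Its outputs will therefore not match keyedstablehash's.

```python
import hashlib
import struct


def _frame(tag: bytes, payload: bytes) -> bytes:
    """Type tag + 8-byte length + payload, so adjacent encodings never run together."""
    return tag + struct.pack(">Q", len(payload)) + payload


def _encode(obj) -> bytes:
    """Hypothetical canonical encoding of a nested Python object."""
    if obj is None:
        return _frame(b"N", b"")
    if isinstance(obj, bool):  # test bool before int, since bool subclasses int
        return _frame(b"B", b"\x01" if obj else b"\x00")
    if isinstance(obj, int):
        return _frame(b"I", str(obj).encode("ascii"))
    if isinstance(obj, float):
        return _frame(b"F", struct.pack(">d", obj))
    if isinstance(obj, str):
        return _frame(b"S", obj.encode("utf-8"))
    if isinstance(obj, bytes):
        return _frame(b"Y", obj)
    if isinstance(obj, (list, tuple)):  # this sketch treats lists and tuples alike
        return _frame(b"L", b"".join(_encode(x) for x in obj))
    if isinstance(obj, (set, frozenset)):  # sort so element order is irrelevant
        return _frame(b"T", b"".join(sorted(_encode(x) for x in obj)))
    if isinstance(obj, dict):  # sort entries by their encoded bytes
        entries = sorted(_encode(k) + _encode(v) for k, v in obj.items())
        return _frame(b"D", b"".join(entries))
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def sketch_keyed_hash(obj, key: bytes) -> int:
    """Keyed 64-bit hash of the canonical encoding (BLAKE2b stands in for SipHash)."""
    if len(key) != 16:
        raise ValueError("key must be 16 bytes")
    digest = hashlib.blake2b(_encode(obj), key=key, digest_size=8).digest()
    return int.from_bytes(digest, "big")


key = b"\x01" * 16
a = {"id": 101, "tags": {"python", "data", "secure"}, "meta": {"created_at": 167888, "active": True}}
b = {"meta": {"active": True, "created_at": 167888}, "tags": {"secure", "data", "python"}, "id": 101}
print(sketch_keyed_hash(a, key) == sketch_keyed_hash(b, key))  # True: order-independent
print(sketch_keyed_hash(a, key) == sketch_keyed_hash(a, b"\x02" * 16))  # False: key-dependent
```

Type tags keep True, 1, and 1.0 distinct, and length prefixes keep ["ab", "c"] and ["a", "bc"] from encoding to the same bytes.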

📦 Installation

You can install the package directly from PyPI:

pip install keyedstablehash==0.0.5

💻 Usage Example

Here are quick examples of the library's three main use cases, starting with a stable hash for a dictionary:

1. Hashing Python Objects

Generate stable hashes for complex, nested structures.

from keyedstablehash import stable_keyed_hash

# Your secret key (must be 16 bytes)
secret_key = b"\x01" * 16

# A complex, messy object
data = {
    "id": 101,
    "tags": {"python", "data", "secure"},  # Sets are auto-sorted
    "meta": {"created_at": 167888, "active": True}
}

h = stable_keyed_hash(data, key=secret_key)

print(f"Hex: {h.hexdigest()}")
# -> Hex: 4a1b... (Deterministic across runs)
print(f"Int: {h.intdigest()}")
# -> Int: 8392... (uint64)
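The examples use b"\x01" * 16 for readability. A real key should be random, and because every hash depends on it, it should be generated once and reused: a new key produces new hashes for the same data. A minimal standard-library sketch follows; the environment-variable name and both helper functions are hypothetical, not part of keyedstablehash.

```python
import os
import secrets

KEY_ENV_VAR = "KEYEDSTABLEHASH_KEY"  # hypothetical name; choose your own


def generate_key_hex() -> str:
    """Run once; store the result in a secrets manager or environment variable."""
    return secrets.token_bytes(16).hex()  # 16 random bytes -> 32 hex characters


def load_key(env_var: str = KEY_ENV_VAR) -> bytes:
    """Load the persisted 16-byte key; fail loudly if it is missing or malformed."""
    try:
        key = bytes.fromhex(os.environ[env_var])
    except KeyError:
        raise RuntimeError(f"set {env_var} to a 32-character hex key") from None
    if len(key) != 16:
        raise ValueError(f"{env_var} must decode to exactly 16 bytes")
    return key
```

The loaded key would then be passed as key=load_key() wherever the examples pass secret_key.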

2. Streaming API

Mirrors the standard hashlib interface (update(), hexdigest()), so data can be hashed in chunks as it streams in.

from keyedstablehash import siphash24

secret_key = b"\x01" * 16

s = siphash24(key=secret_key)
s.update(b"chunk_one")
s.update(b"chunk_two")

print(s.hexdigest())
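The streaming hasher's name, siphash24, presumably refers to SipHash-2-4: a 128-bit key, 2 compression rounds per 8-byte word, 4 finalization rounds, and a 64-bit output. To show what a hashlib-style streaming object does internally, here is a pure-Python SipHash-2-4 sketch written from the published algorithm. The class name SipHash24Sketch is hypothetical and this is not the library's implementation; its digest() uses the reference code's little-endian byte order, which may or may not match keyedstablehash's hexdigest().

```python
import struct

MASK64 = (1 << 64) - 1


def _rotl(x: int, b: int) -> int:
    """Rotate a 64-bit integer left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK64


def _sipround(v0, v1, v2, v3):
    """One SipRound of add-rotate-xor mixing."""
    v0 = (v0 + v1) & MASK64
    v1 = _rotl(v1, 13) ^ v0
    v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK64
    v3 = _rotl(v3, 16) ^ v2
    v0 = (v0 + v3) & MASK64
    v3 = _rotl(v3, 21) ^ v0
    v2 = (v2 + v1) & MASK64
    v1 = _rotl(v1, 17) ^ v2
    v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def _compress(state, m: int):
    """Absorb one 64-bit little-endian word with c = 2 SipRounds."""
    v0, v1, v2, v3 = state
    v3 ^= m
    for _ in range(2):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return (v0 ^ m, v1, v2, v3)


class SipHash24Sketch:
    """Hypothetical pure-Python SipHash-2-4 with a hashlib-style interface."""

    def __init__(self, key: bytes, data: bytes = b"") -> None:
        if len(key) != 16:
            raise ValueError("SipHash-2-4 needs a 16-byte key")
        k0, k1 = struct.unpack("<QQ", key)
        self._state = (
            k0 ^ 0x736F6D6570736575,
            k1 ^ 0x646F72616E646F6D,
            k0 ^ 0x6C7967656E657261,
            k1 ^ 0x7465646279746573,
        )
        self._tail = b""  # up to 7 bytes not yet absorbed
        self._length = 0  # total bytes seen; its low byte enters the final word
        self.update(data)

    def update(self, data: bytes) -> None:
        self._length += len(data)
        buf = self._tail + data
        full = len(buf) - len(buf) % 8
        for offset in range(0, full, 8):
            (word,) = struct.unpack_from("<Q", buf, offset)
            self._state = _compress(self._state, word)
        self._tail = buf[full:]

    def intdigest(self) -> int:
        # The state is an immutable tuple, so finalizing never blocks later update() calls.
        last = int.from_bytes(self._tail, "little") | ((self._length & 0xFF) << 56)
        v0, v1, v2, v3 = _compress(self._state, last)
        v2 ^= 0xFF
        for _ in range(4):  # d = 4 finalization rounds
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        return v0 ^ v1 ^ v2 ^ v3

    def digest(self) -> bytes:
        return self.intdigest().to_bytes(8, "little")

    def hexdigest(self) -> str:
        return self.digest().hex()


key = bytes(range(16))
s = SipHash24Sketch(key)
s.update(b"chunk_one")
s.update(b"chunk_two")
print(s.hexdigest() == SipHash24Sketch(key, b"chunk_onechunk_two").hexdigest())  # True
```

Because update() only buffers the sub-8-byte tail, splitting the input across calls never changes the result, which is what makes a streaming API safe for inputs too large to hold in memory.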

3. Dataframe Vectorization (The Power Feature)

Hash entire columns in Pandas, Polars, or Arrow in one call. This is useful for de-duplication, shuffling, and anonymization (more precisely, pseudonymization) pipelines.

import pandas as pd
import pyarrow as pa
from keyedstablehash import hash_pandas_series, hash_arrow_array

secret_key = b"\x01" * 16

# --- Pandas ---
df = pd.DataFrame({"user_id": ["u1", "u2", "u1"]})
df["hash"] = hash_pandas_series(df["user_id"], key=secret_key)
# Result: A Series of uint64 hashes
print(df["hash"])

# --- PyArrow ---
arr = pa.array(["alpha", "beta", "gamma"])
hashes = hash_arrow_array(arr, key=secret_key)
# Result: A pyarrow.Array(uint64)
print(hashes)
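To make the three pipeline uses concrete without depending on pandas or pyarrow, here is a slow, row-by-row standard-library stand-in. It is an assumption-laden sketch, not how keyedstablehash vectorizes: keyed BLAKE2b replaces the library's hash, and hash_column_sketch, dedupe_by_hash, and keyed_shuffle_order are hypothetical helpers. Equal inputs get equal uint64 hashes (de-duplication), sorting by hash gives a key-dependent but repeatable order (shuffling), and the hash can replace the raw ID (pseudonymization).

```python
import hashlib
from array import array


def hash_column_sketch(values, key: bytes) -> array:
    """Hypothetical stand-in: keyed 64-bit hash of each string in a column."""
    out = array("Q")  # unsigned 64-bit integers, like a uint64 column
    for v in values:
        digest = hashlib.blake2b(v.encode("utf-8"), key=key, digest_size=8).digest()
        out.append(int.from_bytes(digest, "little"))
    return out


def dedupe_by_hash(values, hashes):
    """Keep the first row for each hash (a 64-bit collision could drop a distinct row)."""
    seen, kept = set(), []
    for value, h in zip(values, hashes):
        if h not in seen:
            seen.add(h)
            kept.append(value)
    return kept


def keyed_shuffle_order(hashes):
    """Deterministic, key-dependent row order: sort row indices by hash."""
    return sorted(range(len(hashes)), key=hashes.__getitem__)


key = b"\x01" * 16
user_ids = ["u1", "u2", "u1"]
hashes = hash_column_sketch(user_ids, key)
print(list(hashes))  # positions 0 and 2 hold the same value
print(dedupe_by_hash(user_ids, hashes))  # ['u1', 'u2']
print(keyed_shuffle_order(hashes))
```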

🔗 Links