Together with @marcelamelara and Zahra Ghodsi, we would like to propose a new SIG focused on GPU-based model integrity. We are seeking feedback and interested participants.
GPU-Based Model Integrity SIG
Creation of a new Special Interest Group (SIG) at Sandbox stage
Proposed focus, intent, goals, and/or deliverables
Focus / Mission
As ML models grow in size and complexity, ensuring their integrity throughout the supply chain becomes increasingly critical. Traditional CPU-based integrity verification approaches face significant challenges:
- Scale: Modern foundation models can exceed hundreds of gigabytes, making CPU-based hashing prohibitively slow. For example, hashing a 100GB model on CPU can take 10+ minutes, creating bottlenecks in CI/CD pipelines and deployment workflows.
- Provenance: Organizations need to verify not only that a model is unchanged, but also its complete lineage from training through deployment.
- Verification granularity: Different use cases require different levels of verification, from full-model validation to selective layer verification.
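As a rough sanity check on the scale example (assuming decimal units, 1 GB = 10^9 bytes), hashing 100 GB in 10 minutes corresponds to an effective throughput of about:

$$
\frac{100 \times 10^{9}\ \text{bytes}}{10 \times 60\ \text{s}} \approx 1.67 \times 10^{8}\ \text{bytes/s} \approx 167\ \text{MB/s}
$$

Since the example says 10+ minutes, this is an upper bound on the effective rate. By the same arithmetic, a pipeline sustaining 2 GB/s would hash the same model in about 50 s.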
This SIG addresses these challenges by leveraging GPU acceleration for model integrity operations (hashing, signing, attestation). Model integrity is one component of comprehensive model provenance; this SIG's work will integrate with and enable broader provenance frameworks such as Model Transparency and Atlas.
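A single SHA-256 stream is inherently sequential (each compression step depends on the previous one), so GPU speedups typically come from hashing many independent chunks at once and then combining the chunk digests. The sketch below illustrates that idea on the CPU with a thread pool standing in for GPU threads; `chunked_digest` and the 4 MiB chunk size are hypothetical choices, not part of any proposed API.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunk size


def chunked_digest(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE, workers: int = 8) -> str:
    """Hash fixed-size chunks independently, then hash the ordered chunk digests.

    The chunk hashes have no data dependency on each other, so they can run in
    parallel (threads here; GPU threads/blocks in a real backend).
    """
    view = memoryview(data)
    # Zero-copy slices; an empty input still yields one (empty) chunk.
    chunks = [view[i:i + chunk_size] for i in range(0, len(view), chunk_size)] or [view]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # CPython's hashlib releases the GIL for large buffers, so these threads overlap.
        chunk_digests = list(pool.map(lambda c: hashlib.sha256(c).digest(), chunks))
    outer = hashlib.sha256()
    # Commit to the chunk size and count so the digest is unambiguous about how the input was split.
    outer.update(chunk_size.to_bytes(8, "big"))
    outer.update(len(chunk_digests).to_bytes(8, "big"))
    for d in chunk_digests:
        outer.update(d)
    return outer.hexdigest()
```

Note that the result is not equal to `hashlib.sha256(data).hexdigest()`: a chunked scheme produces a different digest, which is one reason producers and consumers need to agree on a common specification.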
Goals
- Establish a hardware-agnostic API and workflow for GPU-based ML model hashing and signing, with reference implementations for major GPU vendors.
- Enable ML model producers to generate trustworthy GPU-based model hashes and signatures, and model consumers to verify GPU-signed models.
- Evaluate, standardize, and implement GPU-accelerated versions of the following integrity algorithm families:
| Algorithm | Properties | GPU parallelization |
|---|---|---|
| SHA-256/512 | Widely adopted, FIPS-compliant | Moderate |
| SHA-3 (Keccak) | Quantum-resistant design, NIST standard | Good |
| Lattice Hash | Efficient incremental update | Good |
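One reading of the lattice-hash "efficient update" property is an additive (homomorphic) multiset hash in the style of LtHash. The following is a minimal sketch under stated assumptions, not the SIG's chosen construction: SHAKE-256 is swapped in as the extendable-output function, the 1024 x 16-bit lane parameters are illustrative rather than a security recommendation, and `LatticeHash` and `layer_element` are hypothetical names. Each layer maps to a vector, and the digest is the lane-wise sum mod 2^16, so replacing one layer costs one subtraction and one addition instead of rehashing the whole model.

```python
import hashlib

N_LANES = 1024        # illustrative vector length
MODULUS = 1 << 16     # each lane is a 16-bit integer


def _expand(element: bytes) -> list[int]:
    """Map an element to N_LANES 16-bit integers using SHAKE-256 as an XOF."""
    raw = hashlib.shake_256(element).digest(2 * N_LANES)
    return [int.from_bytes(raw[2 * i:2 * i + 2], "little") for i in range(N_LANES)]


def layer_element(name: str, tensor_bytes: bytes) -> bytes:
    """Length-prefix the layer name so identical tensors under different names differ."""
    name_b = name.encode()
    return len(name_b).to_bytes(8, "big") + name_b + tensor_bytes


class LatticeHash:
    """Hypothetical additive multiset hash: digest = sum of expanded elements mod 2^16."""

    def __init__(self) -> None:
        self.state = [0] * N_LANES

    def add(self, element: bytes) -> None:
        for i, v in enumerate(_expand(element)):
            self.state[i] = (self.state[i] + v) % MODULUS

    def remove(self, element: bytes) -> None:
        for i, v in enumerate(_expand(element)):
            self.state[i] = (self.state[i] - v) % MODULUS

    def update(self, old: bytes, new: bytes) -> None:
        # Incremental update: cost depends on the changed element only.
        self.remove(old)
        self.add(new)

    def digest(self) -> bytes:
        return b"".join(x.to_bytes(2, "little") for x in self.state)
```

Because addition is commutative, the digest is order-independent, which is why the layer name is folded into each element.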
Deliverables
- API specification for GPU-based ML model hashing and signing across GPU vendors
- Libraries implementing the API and workflow for common GPU hardware
- GPU-optimized Merkle tree implementations enabling selective layer verification
- Talk at industry conference (target: Open Source Summit or Open Source SecurityCon, 2026)
- Stretch goal: Peer-reviewed academic paper documenting algorithm performance and security analysis (target venue: USENIX Security, IEEE S&P, or equivalent)
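A Merkle tree over per-layer hashes is one way to provide selective layer verification: a consumer holding only the signed root can check a single layer with a proof of logarithmic size. The sketch below rests on several assumptions: the function names are hypothetical, SHA-256 runs on the CPU, the 0x00/0x01 leaf/node prefixes are borrowed from RFC 6962 (Certificate Transparency) for domain separation, and an unpaired node is promoted to the next level unchanged (a simpler tree shape than RFC 6962's).

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(name: str, tensor_bytes: bytes) -> bytes:
    """Leaf = H(0x00 || len(name) || name || tensor bytes)."""
    name_b = name.encode()
    return _h(b"\x00" + len(name_b).to_bytes(8, "big") + name_b + tensor_bytes)


def node_hash(left: bytes, right: bytes) -> bytes:
    """Internal node = H(0x01 || left || right)."""
    return _h(b"\x01" + left + right)


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first; root is levels[-1][0]. Requires >= 1 leaf."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2:
            nxt.append(cur[-1])  # promote the unpaired node unchanged
        levels.append(nxt)
    return levels


def inclusion_proof(levels: list[list[bytes]], index: int) -> list[tuple[bool, bytes]]:
    """Sibling hashes from leaf `index` up to the root; True means the sibling is on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof


def verify_leaf(leaf: bytes, proof: list[tuple[bool, bytes]], root: bytes) -> bool:
    """Recompute the root from one leaf and its proof."""
    acc = leaf
    for sibling_is_left, sibling in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root
```

Leaves would need a canonical order (for example, sorted layer names) so producer and consumer build the same tree.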
Success Metrics
- API specification adopted by at least 2 downstream projects or frameworks
- Demonstrated speedup over CPU-based hashing for models >10GB
- Reference implementations available for at least 2 GPU vendors
- Stretch goal: Peer-reviewed publication accepted at a recognized venue
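For the speedup metric, a harness along these lines could report throughput for a baseline and a candidate hash function on the same input. This is a hypothetical sketch: `throughput` and `speedup` are illustrative names, and the example compares two CPU hashes only. A GPU backend would also need to count host-to-device transfers and synchronize the device before the timer stops; otherwise asynchronous kernel launches make it look faster than it is.

```python
import hashlib
import time
from typing import Callable

HashFn = Callable[[bytes], object]


def throughput(hash_fn: HashFn, data: bytes, repeats: int = 3) -> float:
    """Best-of-`repeats` throughput of `hash_fn` on `data`, in bytes per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hash_fn(data)  # a GPU backend must block until the digest is ready
        best = min(best, time.perf_counter() - start)
    return len(data) / max(best, 1e-9)  # guard against a zero timer reading


def speedup(baseline: HashFn, candidate: HashFn, data: bytes, repeats: int = 3) -> float:
    """A ratio above 1 means `candidate` hashed `data` faster than `baseline`."""
    return throughput(candidate, data, repeats) / throughput(baseline, data, repeats)


if __name__ == "__main__":
    payload = bytes(32 * 1024 * 1024)  # 32 MiB of zeros as a stand-in for model weights
    cpu_sha256 = lambda b: hashlib.sha256(b).digest()
    cpu_blake2b = lambda b: hashlib.blake2b(b).digest()
    print(f"SHA-256: {throughput(cpu_sha256, payload) / 1e6:.0f} MB/s")
    print(f"BLAKE2b vs SHA-256 speedup: {speedup(cpu_sha256, cpu_blake2b, payload):.2f}x")
```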
2026 Roadmap
| Quarter | Milestone |
|---|---|
| Q1 2026 | API specification v0.1; Merkle tree structure proposal |
| Q2 2026 | Reference implementation for NVIDIA/Intel GPUs; Provenance integration spec draft; Algorithm benchmarking results published |
| Q3 2026 | API specification v1.0 incorporating community feedback |
| Q4 2026 | Academic paper submission; Conference talk delivery; Integration testing with Model Transparency framework |
Future Directions
While the initial focus is on model integrity, the techniques and infrastructure developed by this SIG are directly applicable to dataset integrity. Training datasets face similar challenges:
- Scale: Large-scale datasets (e.g., LAION, Common Crawl derivatives) can reach terabytes, making integrity verification even more demanding than for models.
- Provenance: Tracking dataset lineage, including filtering, deduplication, and augmentation steps, is essential for reproducibility and compliance.
- Tamper detection: Data poisoning attacks target training data; efficient integrity verification can help detect unauthorized modifications.
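For dataset provenance, one simple way to make a lineage tamper-evident is to hash-chain one record per transformation step, each committing to the previous record and to the digest of that step's output. The record fields, `record_step`, and `verify_chain` are hypothetical; a production design would more likely use signed attestations (for example in-toto or SLSA provenance) than this ad hoc JSON format.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical sentinel predecessor for the first record
FIELDS = ("prev", "step", "params", "output")


def _record_hash(body: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the hash is reproducible.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def record_step(prev_hash: str, step: str, params: dict, output_digest: str) -> dict:
    """Create a lineage record that commits to its predecessor and the step's output."""
    body = {"prev": prev_hash, "step": step, "params": params, "output": output_digest}
    return {**body, "hash": _record_hash(body)}


def verify_chain(records: list, genesis: str = GENESIS) -> bool:
    """Check that each record links to the previous one and that its hash is intact."""
    prev = genesis
    for rec in records:
        body = {k: rec[k] for k in FIELDS}
        if rec["prev"] != prev or _record_hash(body) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

Reordering, dropping, or editing any step (for example, changing a filtering threshold after the fact) breaks verification from that record onward.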
Pending successful delivery of model integrity milestones, the SIG may expand its scope to include GPU-accelerated dataset hashing, signing, and attestation in 2027 and beyond.
List SIG Lead(s)
The SIG must have a minimum of 1 Lead
List of interested individuals
The SIG must have a minimum of 3 members with at least 2 different organizational affiliations.
- Andrew Gan, Purdue University, @Andrew-Gan
- Marcela Melara, Intel, @marcelamelara
Governing Body
SIGs may report to an existing OpenSSF Working Group or directly to the TAC as their governing body. The SIG commits to providing the governing body quarterly updates on progress.
- "AI/ML Security WG"
SIG References
| Reference | URL |
|---|---|
| Repo | https://github.com/Andrew-Gan/sentry |
| Meeting Agenda | During AI/ML Security WG meetings |
| OSSF Calendar Entry | To be added upon SIG approval |
| Security.md | In progress |
| Roadmap | See 2026 Roadmap above |
| code-of-conduct.md | https://github.com/ossf/ai-ml-security/blob/main/code-of-conduct.md |
| Demos | Planned for Q2 2026 |
| Papers | Sentry: Authenticating Machine Learning Artifacts on the Fly; Scalable GPU-Based Integrity Verification for Large Machine Learning Models |
| Other | |