Reliable Inference Runtime for Autonomous Systems
Quick Start • Usage • Contributing
Zipy is a constraint-aware LLM inference runtime built in Rust and wgpu for low-latency decision-making in autonomous systems. It is designed to run directly on-device under strict memory, compute, and power constraints, such as on rovers, drones, and edge-based agents, by enforcing bounded execution, fallback modes, and structured outputs.
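The "bounded execution with fallback" idea above can be sketched in plain Rust. This is a hypothetical illustration, not Zipy's actual API: the function name `decode_with_budget`, the closure-based "model", and the fallback token are all invented here for clarity.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sketch: run an iterative decode loop under a hard wall-clock
/// budget. If the budget is exhausted mid-generation, the partial output is
/// discarded and a precomputed fallback action is returned instead, so the
/// control loop always gets an answer in bounded time.
/// `step` yields Some(token) while decoding and None when generation is done.
fn decode_with_budget<F>(mut step: F, budget: Duration, fallback: Vec<u32>) -> Vec<u32>
where
    F: FnMut() -> Option<u32>,
{
    let start = Instant::now();
    let mut out = Vec::new();
    while let Some(tok) = step() {
        out.push(tok);
        if start.elapsed() >= budget {
            // Deadline missed: hand back the safe fallback instead of a
            // partial, possibly ill-formed plan.
            return fallback;
        }
    }
    out
}

fn main() {
    // A toy "model" that emits three tokens and stops.
    let mut remaining = vec![3u32, 2, 1];
    let result = decode_with_budget(
        || remaining.pop(),
        Duration::from_secs(5),
        vec![0], // fallback token meaning e.g. "hold position"
    );
    println!("{:?}", result);
}
```

A real runtime would also bound memory and enforce an output schema, but the same shape applies: every decision path terminates with either a completed result or an explicit fallback.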
- Native `safetensors` loading with efficient GPU buffer management via `wgpu`
- Custom WGSL kernels for Transformer operations (MatMul, RoPE, RMSNorm)
- Quantized inference for reduced memory footprint on edge devices
- PagedAttention for efficient KV-cache management under fragmented VRAM
- Low-latency inference optimized for real-time decision loops
- Multi-modal input support (camera frames, IMU data, sensor streams)
- Persistent KV-cache offloading to NVMe to reduce recomputation overhead
- Continuous batching for fine-grained request scheduling
- Model distillation support for deployment in constrained environments
- FP16 / BF16 precision support
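To make the quantized-inference bullet concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, one common scheme for cutting weight memory roughly 4x versus FP32. The function names and the per-tensor granularity are assumptions for illustration, not Zipy's actual quantization scheme.

```rust
/// Symmetric per-tensor int8 quantization sketch: scale is chosen so the
/// largest-magnitude weight maps to +/-127.
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Dequantize back to f32 for accumulation in higher precision.
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.0, 0.25, 1.0];
    let (q, s) = quantize_i8(&w);
    let back = dequantize_i8(&q, s);
    // Round-trip error is bounded by roughly half the scale per element.
    for (a, b) in w.iter().zip(&back) {
        assert!((a - b).abs() <= s / 2.0 + 1e-6);
    }
    println!("scale = {s}, quantized = {:?}", q);
}
```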
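The PagedAttention bullet rests on one piece of bookkeeping: the KV-cache is carved into fixed-size blocks, and each sequence holds a block table mapping logical token positions to physical block slots, so cache memory need not be contiguous in VRAM. The sketch below shows that mapping in plain Rust; the block size, type names, and free-list allocator are assumptions for illustration, not Zipy's implementation.

```rust
const BLOCK_SIZE: usize = 16; // tokens per KV block (assumed value)

/// Free-list allocator over a fixed pool of physical KV blocks.
struct BlockAllocator {
    free: Vec<usize>,
}

impl BlockAllocator {
    fn new(num_blocks: usize) -> Self {
        Self { free: (0..num_blocks).rev().collect() }
    }
    fn alloc(&mut self) -> Option<usize> {
        self.free.pop()
    }
    fn release(&mut self, blocks: &[usize]) {
        // Blocks freed by finished sequences become reusable immediately.
        self.free.extend_from_slice(blocks);
    }
}

/// Per-sequence block table: grows one block at a time as tokens arrive.
struct Sequence {
    len: usize,
    block_table: Vec<usize>,
}

impl Sequence {
    /// Returns the (physical block, offset) slot where the next token's
    /// KV entries should be written, allocating a fresh block on demand.
    fn append_token(&mut self, alloc: &mut BlockAllocator) -> Option<(usize, usize)> {
        if self.len % BLOCK_SIZE == 0 {
            self.block_table.push(alloc.alloc()?);
        }
        let slot = (self.block_table[self.len / BLOCK_SIZE], self.len % BLOCK_SIZE);
        self.len += 1;
        Some(slot)
    }
}

fn main() {
    let mut alloc = BlockAllocator::new(4);
    let mut seq = Sequence { len: 0, block_table: Vec::new() };
    for _ in 0..BLOCK_SIZE + 1 {
        seq.append_token(&mut alloc).expect("out of KV blocks");
    }
    // 17 tokens span two blocks that need not be adjacent in VRAM.
    println!("block table: {:?}", seq.block_table);
    alloc.release(&seq.block_table);
}
```

Because allocation happens one block at a time, a fragmented pool of free blocks is as usable as a contiguous one, which is the property that helps under fragmented VRAM.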
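Continuous batching, the scheduling idea in the list above, can also be sketched: rather than waiting for an entire batch to finish, the scheduler re-forms the batch at every decode step, admitting queued requests the moment running ones complete. Everything below (the `Request` struct, token counts standing in for decode work) is a simplified illustration, not Zipy's scheduler.

```rust
use std::collections::VecDeque;

/// A queued generation request; `tokens_left` stands in for remaining
/// decode steps in this toy model.
struct Request {
    id: u32,
    tokens_left: u32,
}

/// Step-level scheduler sketch: returns request ids in completion order.
fn run_scheduler(mut queue: VecDeque<Request>, max_batch: usize) -> Vec<u32> {
    let mut running: Vec<Request> = Vec::new();
    let mut finished = Vec::new();
    while !queue.is_empty() || !running.is_empty() {
        // Admit waiting requests into any free batch slots.
        while running.len() < max_batch {
            match queue.pop_front() {
                Some(r) => running.push(r),
                None => break,
            }
        }
        // One decode step for the whole batch.
        for r in running.iter_mut() {
            r.tokens_left -= 1;
        }
        // Finished requests leave immediately, freeing slots for the queue.
        running.retain(|r| {
            if r.tokens_left == 0 {
                finished.push(r.id);
                false
            } else {
                true
            }
        });
    }
    finished
}

fn main() {
    let queue = VecDeque::from(vec![
        Request { id: 1, tokens_left: 2 },
        Request { id: 2, tokens_left: 1 },
        Request { id: 3, tokens_left: 3 },
    ]);
    // With a batch size of 2, request 3 is admitted as soon as 2 finishes,
    // rather than waiting for the whole initial batch to drain.
    println!("completion order: {:?}", run_scheduler(queue, 2));
}
```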
Quick Start

TBD.

Usage

TBD.

Contributing

TBD.
Built by @ashworks1706 for real-time autonomous systems operating in constrained environments.
