ashworks1706/zipy


Reliable Inference Runtime for Autonomous Systems

crates.io npm PyPI

Quick Start · Usage · Contributing


Zipy is a constraint-aware LLM inference runtime built in Rust and wgpu for low-latency decision-making in autonomous systems. It is designed to run directly on-device on platforms with strict memory, compute, and power budgets, such as rovers, drones, and edge-based agents, by enforcing bounded execution, fallback modes, and structured outputs.
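Zipy's public API is still TBD, so as an illustration only, here is a minimal, self-contained Rust sketch of what "bounded execution with a fallback mode" means in a decision loop: decoding runs under a hard latency budget, and if the deadline fires, the caller receives a safe default action instead of blocking. The names (`bounded_decode`, `decode_step`, `"HOLD_POSITION"`) are hypothetical and not part of Zipy.

```rust
use std::time::{Duration, Instant};

// Stand-in for one step of model inference (hypothetical; Zipy's real API is TBD).
fn decode_step(step: usize) -> Option<String> {
    Some(format!("tok{step} "))
}

// Decode under a hard latency budget; on deadline, return a safe fallback
// action instead of blocking the caller's control loop.
fn bounded_decode(budget: Duration, fallback: &str) -> String {
    let deadline = Instant::now() + budget;
    let mut out = String::new();
    for step in 0..8 {
        if Instant::now() >= deadline {
            return fallback.to_string(); // budget exhausted: degrade gracefully
        }
        match decode_step(step) {
            Some(tok) => out.push_str(&tok),
            None => break, // end of sequence
        }
    }
    out.trim_end().to_string()
}

fn main() {
    // Generous budget: normal decoding completes.
    let full = bounded_decode(Duration::from_secs(1), "HOLD_POSITION");
    assert!(full.starts_with("tok0"));

    // Zero budget: the deadline fires immediately and the fallback is returned.
    let degraded = bounded_decode(Duration::ZERO, "HOLD_POSITION");
    assert_eq!(degraded, "HOLD_POSITION");
}
```

The key design point is that the fallback path returns in bounded time regardless of model behavior, which is what makes the runtime usable inside a real-time control loop.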

  • Native safetensors loading with efficient GPU buffer management via wgpu
  • Custom WGSL kernels for Transformer operations (MatMul, RoPE, RMSNorm)
  • Quantized inference for reduced memory footprint on edge devices
  • PagedAttention for efficient KV-cache management under fragmented VRAM
  • Low-latency inference optimized for real-time decision loops
  • Multi-modal input support (camera frames, IMU data, sensor streams)
  • Persistent KV-cache offloading to NVMe to reduce recomputation overhead
  • Continuous batching for fine-grained request scheduling
  • Model distillation support for deployment in constrained environments
  • FP16 / BF16 precision support
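To make the PagedAttention bullet concrete: the KV-cache is divided into fixed-size pages, and each sequence holds a page table of indices into the pool, so free pages can be reused even when VRAM is fragmented. The following is a minimal CPU-side sketch of that bookkeeping, not Zipy's actual implementation; `PagedKvCache` and its methods are illustrative names.

```rust
// Sketch of PagedAttention-style KV-cache paging (illustrative, not Zipy's code).
// The cache pool is split into fixed-size pages; each sequence owns a page
// table of pool indices, so fragmented free pages remain usable.
struct PagedKvCache {
    page_size: usize,             // tokens per page
    free_pages: Vec<usize>,       // indices of unallocated pages in the pool
    page_tables: Vec<Vec<usize>>, // per-sequence list of owned pages
    seq_lens: Vec<usize>,         // tokens written per sequence
}

impl PagedKvCache {
    fn new(num_pages: usize, page_size: usize) -> Self {
        PagedKvCache {
            page_size,
            free_pages: (0..num_pages).rev().collect(),
            page_tables: Vec::new(),
            seq_lens: Vec::new(),
        }
    }

    /// Register a new sequence; returns its id.
    fn new_seq(&mut self) -> usize {
        self.page_tables.push(Vec::new());
        self.seq_lens.push(0);
        self.page_tables.len() - 1
    }

    /// Reserve space for one more token; allocates a page on each page boundary.
    fn append_token(&mut self, seq: usize) -> Result<(), &'static str> {
        if self.seq_lens[seq] % self.page_size == 0 {
            let page = self.free_pages.pop().ok_or("out of KV-cache pages")?;
            self.page_tables[seq].push(page);
        }
        self.seq_lens[seq] += 1;
        Ok(())
    }

    /// Return all pages owned by a finished sequence to the pool.
    fn release(&mut self, seq: usize) {
        self.free_pages.extend(self.page_tables[seq].drain(..));
        self.seq_lens[seq] = 0;
    }
}

fn main() {
    let mut cache = PagedKvCache::new(4, 16); // 4 pages of 16 tokens each
    let s = cache.new_seq();
    for _ in 0..17 {
        cache.append_token(s).unwrap(); // 17 tokens span 2 pages
    }
    assert_eq!(cache.page_tables[s].len(), 2);
    cache.release(s);
    assert_eq!(cache.free_pages.len(), 4); // all pages back in the pool
}
```

Because allocation happens page by page rather than as one contiguous slab per sequence, finished sequences return capacity immediately, which is what enables continuous batching under fragmented memory.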

Quick Start

TBD.

Usage

TBD.

Contributing

TBD.

License

Apache 2.0 License

Acknowledgments

Built by @ashworks1706 for real-time autonomous systems operating in constrained environments.
