rasptorch is an experimental deep learning library inspired by PyTorch, built with a singular focus: making complex neural networks practical and efficient to run on resource-constrained hardware like the Raspberry Pi 5, by leveraging its GPU capabilities via Vulkan.
The library operates on a multi-layered architecture to maximize hardware utilization:
- CPU Backend (Software): Uses a pure NumPy-backed autograd engine and `nn` module for reliable computation when GPU acceleration is unavailable.
- GPU Backend (Hardware): Features an experimental Vulkan backend for high-speed tensor operations (elementwise math, matmul, reductions) directly on the Pi 5's GPU.
- Interface: Provides a streamlined CLI/Streamlit UI for interactive model building, training, persistence, and inspection.
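The CPU backend's approach can be pictured with a minimal reverse-mode autograd sketch over NumPy arrays. This is illustrative only; the class and method names here are hypothetical, not rasptorch's actual API:

```python
import numpy as np

class Tensor:
    """Minimal autograd node (illustrative, not rasptorch's Tensor)."""
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self._parents = parents
        self._backward_fn = None

    def matmul(self, other):
        out = Tensor(self.data @ other.data, parents=(self, other))
        def backward_fn():
            # d(A@B)/dA = dL/dC @ B^T ; d(A@B)/dB = A^T @ dL/dC
            self.grad += out.grad @ other.data.T
            other.grad += self.data.T @ out.grad
        out._backward_fn = backward_fn
        return out

    def sum(self):
        out = Tensor(self.data.sum(), parents=(self,))
        def backward_fn():
            self.grad += out.grad * np.ones_like(self.data)
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topological order, then chain-rule sweep from the output.
        order, seen = [], set()
        def visit(t):
            if id(t) not in seen:
                seen.add(id(t))
                for p in t._parents:
                    visit(p)
                order.append(t)
        visit(self)
        self.grad = np.ones_like(self.data)
        for t in reversed(order):
            if t._backward_fn:
                t._backward_fn()

a = Tensor([[1.0, 2.0], [3.0, 4.0]])
b = Tensor([[5.0, 6.0], [7.0, 8.0]])
loss = a.matmul(b).sum()
loss.backward()
print(a.grad)  # d(sum(a@b))/da = ones @ b^T
```

The GPU backends keep the same define-then-differentiate structure but dispatch the forward ops to device kernels.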
The Vulkan path relies on real compute shaders compiled to SPIR-V, giving deep control over the underlying hardware.
rasptorch now exposes a backend abstraction API so compute backends can be registered and connected at runtime:
```python
import rasptorch

# Inspect availability
print(rasptorch.available_backends())  # {'cpu': True, 'vulkan': ..., 'opencl': ..., 'cuda': ...}

# Try to connect a backend (falls back to CPU in non-strict mode)
active = rasptorch.connect_backend("vulkan", strict=False)
print(active.name)
```

Built-in backend adapters:

- `numpy` (NumPy adapter; internal key: `cpu`) - Pure NumPy autograd
- `vulkan` (rasptorch Vulkan kernels, with optional CPU fallback) - Optimized for Raspberry Pi 4/5 ⚡
- `opencl` (pyopencl when available, optional CPU fallback)
- `cuda` (CuPy when available, with PyTorch CUDA fallback, optional CPU fallback)
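The adapter/fallback behavior can be pictured with a minimal registry sketch. The classes and the registry itself are hypothetical, not rasptorch's internals; only the `available_backends` / `connect_backend` names mirror the public API above:

```python
# Hypothetical backend registry illustrating strict vs. fallback connect
# semantics; not rasptorch's actual implementation.

class Backend:
    def __init__(self, name, available):
        self.name = name
        self.available = available

_REGISTRY = {
    "cpu": Backend("numpy", available=True),       # NumPy path is always present
    "vulkan": Backend("vulkan", available=False),  # depends on drivers + glslc
}

def available_backends():
    return {key: b.available for key, b in _REGISTRY.items()}

def connect_backend(name, strict=False):
    key = "cpu" if name == "numpy" else name  # user-facing label -> internal key
    backend = _REGISTRY.get(key)
    if backend is not None and backend.available:
        return backend
    if strict:
        raise RuntimeError(f"backend {name!r} unavailable")
    return _REGISTRY["cpu"]  # non-strict mode falls back to the CPU path

active = connect_backend("vulkan", strict=False)
print(active.name)  # falls back to "numpy" because vulkan is marked unavailable
```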
CLI helpers:

```bash
rasptorch backend list
rasptorch backend connect numpy
rasptorch backend connect vulkan --strict

# Benchmark with auto-tuned Vulkan kernel and submission batching
rasptorch --json backend benchmark --backends numpy,vulkan,cuda --size 2048 --iterations 100 --warmup 20 --vulkan-kernel auto --vulkan-autotune-submit --seed 42
```

Note: the user-facing CLI/UI labels the CPU backend as `numpy`. Vulkan benchmark mode uses resident buffers (upload once, repeated on-device matmul, download once).

Performance (optimized): Vulkan achieves ~564 GFLOPS (78% of NumPy) on `matmul_vec4` with auto-tuning.

- `--vulkan-kernel auto` probes `matmul`, `matmul_vec4`, `matmul_a_bt`, and `matmul_a_bt_tiled` (when available) and keeps the fastest path.
- If Vulkan hits `VkErrorDeviceLost`, lower `--vulkan-submit-every` (for example, `4` or `1`) or use auto-tuning.
- Recommended: use `--vulkan-autotune-submit` to jointly probe kernel + submit chunk and pick the fastest stable combination.
- Optimizations: command buffer batching, memory-mapped buffers, automatic kernel selection.
- Tensor Operations: Support for elementwise math, matrix multiplication (`matmul`), reductions, indexing, reshaping, stacking, and broadcasting.
- Layers: Includes standard neural network blocks: `Linear`, `MLP`, `CNN`, `GRU`, `Transformer`, normalization layers, activations, pooling, embeddings, and attention.
- Training Tools: Full suite including optimizers (`SGD`), learning-rate schedulers, gradient clipping, and regularization helpers.
- Persistence: Save and load checkpoint weights without needing the full `torch` dependency.
- Interfaces: CLI (`rasptorch chat`) and Streamlit UI (`rasptorch ui`).
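The torch-free persistence above can be sketched with plain NumPy serialization. This is an illustrative format with made-up parameter names, not necessarily what rasptorch writes on disk:

```python
import io
import numpy as np

# Illustrative torch-free checkpointing: treat model parameters as a dict of
# NumPy arrays and round-trip them through np.savez. Parameter names are
# hypothetical; this is not necessarily rasptorch's on-disk format.
state = {
    "linear.weight": np.random.randn(16, 64).astype(np.float32),
    "linear.bias": np.zeros(16, dtype=np.float32),
}

buf = io.BytesIO()  # stands in for a checkpoint file on disk
np.savez(buf, **state)
buf.seek(0)

with np.load(buf) as ckpt:
    restored = {k: ckpt[k] for k in ckpt.files}

assert all(np.array_equal(state[k], restored[k]) for k in state)
```

The key point is that a checkpoint reduces to named arrays, so no `torch` import is needed to read or write one.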
A. Basic Install (CPU Only): To get the core library components running on the CPU:

```bash
pip install rasptorch
```

B. Development Install (Full Capability): For local development and access to all potential backends:

```bash
pip install -e ".[dev]"
```

C. GPU Mode Prerequisites: To utilize the GPU backend, you must meet these prerequisites:

- Raspberry Pi 5 with working Vulkan drivers.
- The `glslc` shader compiler must be available on your system `PATH`.
- When running, specify the device: `--device gpu` or `--device auto`.
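The `glslc` prerequisite can be preflighted from Python with the standard library. This is a sketch of such a check, not a rasptorch command, and the fallback choice only mirrors what an auto mode might decide:

```python
import shutil

# Preflight for the GPU path: glslc must be on PATH to compile the GLSL
# compute shaders to SPIR-V. (Standalone sketch, not a rasptorch API.)
glslc = shutil.which("glslc")
device = "gpu" if glslc else "cpu"  # assumed approximation of --device auto
print(f"glslc: {glslc or 'not found'} -> suggesting --device {device}")
```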
Start the Interactive Shell:

```bash
uv run rasptorch chat
```

Launch the Web UI:

```bash
uv run rasptorch ui
```

(This will usually open at http://localhost:8501)

Viewing Help: To see all available CLI subcommands:

```bash
uv run rasptorch --help
```

The main.py script controls the operational mode:

- `cpu`: Pure NumPy autograd execution on the CPU.
- `gpu`: Executes the training loop explicitly using the Vulkan backend kernels.
- `gpu-autograd`: An experimental mode for tracing gradients across the GPU pipeline.

Example Training Command:

```bash
uv run main.py --device gpu --epochs 50 --batch-size 32 --lr 0.01
```

rasptorch provides a built-in benchmark tool for comparing backend performance on matrix multiplication:
Quick Benchmark (Single Size):

```bash
# Benchmark with default settings (2048x2048 matmul, 100 iterations)
uv run rasptorch backend benchmark

# Benchmark with custom size and multiple backends
uv run rasptorch --json backend benchmark --backends numpy,vulkan,cuda --size 2048 --iterations 100 --warmup 20 --seed 42
```

Performance Results (Raspberry Pi 5, 2048x2048 matmul, optimized):
| Backend | Time (s) | Iterations/s | GFLOPS | Status |
|---|---|---|---|---|
| NumPy | 2.25 | 44.4 | 763 | Reference |
| Vulkan (auto-tuned) | 3.15 | 31.8 | 546 | ⚡ GPU |
| CUDA (when available) | 0.56 | 178 | 3059 | Best |
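The GFLOPS column follows from the standard cost model of dense matmul: an N x N product needs about 2·N³ floating-point operations. A quick sanity check against the NumPy row above:

```python
# Sanity-check the table's GFLOPS arithmetic: an N x N matmul costs roughly
# 2 * N**3 floating-point operations (one multiply and one add per term).
n = 2048
iterations = 100
elapsed_s = 2.25  # NumPy row from the table above

iters_per_second = iterations / elapsed_s
gflops = 2 * n**3 * iters_per_second / 1e9
print(f"{iters_per_second:.1f} it/s, {gflops:.0f} GFLOPS")
```

This reproduces the table's ~44.4 iterations/s and ~763 GFLOPS for the NumPy backend.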
Vulkan Kernel Selection:
The --vulkan-kernel auto flag intelligently probes available kernels:
- `matmul` - Basic single-threaded implementation
- `matmul_vec4` - SIMD-style vec4 operations
- `matmul_a_bt` - Matrix transpose optimization (for A @ B.T)
- `matmul_a_bt_tiled` - Tiled transpose optimization (fastest when applicable)
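A quick NumPy check of why the transposed variants exist: in the A @ B.T formulation every output element is a dot product of two matrix *rows*, so both operands stream through memory contiguously in the inner loop. This is illustrative, not the shader code:

```python
import numpy as np

# Why an A @ B^T kernel layout helps: C[i, j] = dot(a[i, :], b[j, :]), so both
# operands are read as contiguous rows in the inner loop.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 32)).astype(np.float32)
b = rng.standard_normal((48, 32)).astype(np.float32)

c = a @ b.T                            # what a matmul_a_bt-style kernel computes
c_rows = np.einsum("ik,jk->ij", a, b)  # row-by-row formulation of the same product

assert np.allclose(c, c_rows, atol=1e-4)
```

The tiled variant additionally blocks the row traversals so reused sub-rows stay in fast on-chip memory.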
Advanced Tuning:

```bash
# Auto-tune both kernel AND submission batching strategy
uv run rasptorch --json backend benchmark --backends vulkan --size 2048 \
  --iterations 100 --warmup 20 \
  --vulkan-kernel auto \
  --vulkan-autotune-submit \
  --seed 42

# Manual kernel selection with custom batch submission
uv run rasptorch --json backend benchmark --backends vulkan \
  --vulkan-kernel matmul_a_bt_tiled \
  --vulkan-submit-every 4 \
  --size 2048 --iterations 100
```

Output Format:

Results are provided in JSON format (with the --json flag) including:

- `status`: "ok" or "unavailable"
- `elapsed_seconds`: Total benchmark time
- `iterations_per_second`: Throughput metric
- `estimated_gflops`: Floating-point performance
- `checksum`: Verification result
- `kernel`: Selected kernel name (for auto mode)
- `submit_every`: Submission batch size (for Vulkan)
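A sketch of consuming such a record downstream. The sample values echo the Vulkan table row above, while `kernel` and `submit_every` here are made-up examples; the exact JSON shape emitted by `--json` may differ:

```python
import json

# Parse a benchmark record using the fields listed above. The concrete
# values in this sample are illustrative, not fresh measurements.
sample = json.loads("""
{
  "status": "ok",
  "elapsed_seconds": 3.15,
  "iterations_per_second": 31.8,
  "estimated_gflops": 546.0,
  "checksum": 1.0,
  "kernel": "matmul_vec4",
  "submit_every": 8
}
""")

if sample["status"] == "ok":
    print(f"{sample['kernel']}: {sample['estimated_gflops']:.0f} GFLOPS "
          f"(submit_every={sample['submit_every']})")
```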
Optimization Tips:

- Use `--vulkan-autotune-submit` for best results (probes kernel + batch combinations)
- If you see `VkErrorDeviceLost`, reduce `--vulkan-submit-every` (try `4` or `1`)
- Larger problem sizes better amortize GPU setup overhead
- Command buffer batching (`--vulkan-submit-every`) balances latency and throughput
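The batching trade-off in the last tip can be sketched with a back-of-envelope cost model. All constants here are assumed for illustration, not measured:

```python
# Model of how --vulkan-submit-every amortizes per-submission overhead across
# a chunk of recorded commands. SUBMIT_OVERHEAD_MS and KERNEL_MS are
# hypothetical numbers, not measurements.
SUBMIT_OVERHEAD_MS = 0.5   # fixed cost per queue submission (assumed)
KERNEL_MS = 30.0           # on-device time per matmul (assumed)
ITERATIONS = 100

totals = {}
for submit_every in (1, 4, 16):
    submits = -(-ITERATIONS // submit_every)  # ceil(ITERATIONS / submit_every)
    totals[submit_every] = ITERATIONS * KERNEL_MS + submits * SUBMIT_OVERHEAD_MS

for n, ms in totals.items():
    print(f"submit_every={n:2d}: {ms:.1f} ms total")
```

The model only captures amortization; in practice very large submit chunks can trigger `VkErrorDeviceLost`, which is why auto-tuning searches for the fastest *stable* combination.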
For detailed optimization guide, see VULKAN_OPTIMIZATION.md.
Basic tensor math is performed via:
```bash
# Create tensors
uv run rasptorch tensor random --shape 2,3,4
uv run rasptorch tensor ones --shape 5,10
```

The results show the low-level tensor capabilities.
Models are defined using structured commands:
```bash
# Simple MLP
uv run rasptorch model mlp --layers "64,32,16,2"

# Complex CNN
uv run rasptorch model cnn --in-channels 3 --out-channels "32,64,128"
```

Managing the lifecycle:

```bash
uv run rasptorch model list
uv run rasptorch model save --model-id <id> --path model.pth
```

- Performance: The fastest paths keep the computation entirely on the GPU and minimize data transfers between CPU and GPU memory.
- Fallback: If GPU operations fail due to driver issues, the system gracefully falls back to the CPU NumPy path, but performance will suffer.
- Advanced Use: For a deep dive into custom kernel optimization, refer to the source code in `rasptorch/gpu_demo.py` and `rasptorch/main.py`.