qr-sampler

Plug any randomness source into LLM token sampling.

qr-sampler is an engine-agnostic framework that replaces standard pseudorandom token sampling with entropy from external sources — quantum random number generators (QRNGs), processor timing jitter, hardware noise, or any source you connect via gRPC or Python plugin. The core sampling pipeline has zero inference-engine dependencies; thin engine adapters integrate it with vLLM, vLLM-Metal, or any engine that supports logits processing.

pip install qr-sampler

Why qr-sampler?

Standard LLM inference uses pseudorandom number generators (PRNGs) for token sampling. PRNGs are deterministic — given the same seed, they produce the same output every time. qr-sampler replaces this with true randomness from physical processes:

Quantum RNGs — photon detectors, vacuum fluctuation devices, or any hardware QRNG over gRPC
Hardware noise — 63 thermal, timing, microarch, and GPU noise sources via OpenEntropy
Processor timing jitter — CPU clock variations as an entropy source (experimental)
Your own source — implement the EntropySource ABC or connect any hardware via the gRPC protocol
OS entropy — os.urandom() as a fallback or baseline

Consciousness-research context

qr-sampler provides infrastructure for studying whether conscious intent can influence quantum-random processes in LLM token selection. The signal amplification system converts thousands of random bytes into a single token choice, designed so that even a tiny statistical bias (e.g., 0.1% shift in byte means) produces a measurable effect on which token gets selected. All entropy is generated just-in-time — the quantum measurement happens after logits are computed, never before.

This is a research tool. It makes no claims about consciousness or quantum mechanics — it provides the infrastructure to run rigorous experiments.

Architecture

                          qr-sampler
  ┌──────────────────────────────────────────────────────┐
  │                                                      │
  │  ┌─────────────────────────────────────────────┐     │
  │  │           core/ (engine-agnostic)            │     │
  │  │  SamplingPipeline: numpy-only, no torch     │     │
  │  │                                              │     │
  │  │  entropy/ ──► amplification/ ──► selection/  │     │
  │  │      │            │                  │       │     │
  │  │  get_random   amplify(bytes)    CDF search  │     │
  │  │  _bytes(n)    → u ∈ (0,1)      → token_id  │     │
  │  │                                              │     │
  │  │  temperature/ ─── compute_temperature()      │     │
  │  │  logging/ ─────── per-token diagnostics      │     │
  │  └──────────────────────┬──────────────────────┘     │
  │                         │                            │
  │  ┌──────────────────────┴──────────────────────┐     │
  │  │         engines/ (thin adapters)             │     │
  │  │  VLLMAdapter: torch ↔ numpy, one-hot force  │     │
  │  │  (future: MLXAdapter, TRTLLMAdapter, ...)   │     │
  │  └─────────────────────────────────────────────┘     │
  │                                                      │
  │  profiles/ ─── declarative YAML metadata             │
  │  cli/ ──────── validate, build, list, info           │
  │  templates/ ── Jinja2 for Docker Compose generation  │
  └──────────────────────────────────────────────────────┘

Per-token sampling pipeline

Engine adapter calls pipeline.sample_token(logits_1d)
  │
  ├─ Temperature strategy ─────── Compute per-token temperature
  │   (fixed or entropy-dependent)    from the logit distribution
  │
  ├─ Entropy source ───────────── Fetch fresh random bytes
  │   (gRPC / system / timing /       just-in-time, after logits exist
  │    openentropy / custom)
  │
  ├─ Signal amplification ─────── Convert 20,480 bytes → one float u ∈ (0,1)
  │   (z-score or ECDF)               via statistical aggregation
  │
  ├─ Token selector ───────────── top-k → softmax → top-p → CDF → select
  │   (CDF binary search with u)      token from probability distribution
  │
  └─ Force one-hot logits ─────── Set selected token to 0.0, all others to -inf
      (engine picks exactly              (returned as numpy; adapter converts
       this token)                        to engine-native tensor)

The core pipeline is importable and functional without vLLM, torch, or any engine package. Engine adapters convert between engine-native tensors and numpy, delegate to SamplingPipeline.sample_token(), and write the result back.

Quick start

1. Validate your stack (optional)

The CLI checks compatibility of engines, models, entropy sources, and amplifiers before you deploy:

pip install qr-sampler[cli]

# Check a specific combination
qr-sampler validate --engine vllm --model Qwen/Qwen2.5-1.5B-Instruct --entropy quantum_grpc

# Exit codes: 0 = all known-working, 1 = untested, 2 = incompatible

2. Generate deployment files

# Generate Docker Compose for vLLM + quantum gRPC entropy
qr-sampler build --engine vllm --entropy quantum_grpc --output ./deploy

# Preview without writing files
qr-sampler build --engine vllm --entropy system --dry-run

This renders a docker-compose.yml and .env from built-in Jinja2 templates, configured for your chosen stack.

3. Launch

cd deploy
# Edit .env — set HF_TOKEN if using a gated model
docker compose up --build

4. Send a request

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "prompt": "The nature of consciousness is",
    "max_tokens": 100
  }'

Bare-metal install (without Docker)

pip install qr-sampler

# Start vLLM — qr-sampler registers automatically via entry points
vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80

Configure the entropy source via environment variables:

export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
export QR_GRPC_SERVER_ADDRESS=localhost:50051
vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80

Apple Silicon (macOS)

qr-sampler works on Apple Silicon via vllm-metal, a community-maintained vLLM plugin under the official vllm-project GitHub org. It uses MLX under the hood but exposes the same vLLM API and plugin system — same entry points, same endpoints, same curl commands.

vllm-metal works with MLX-format models from the mlx-community collection on Hugging Face. These are pre-converted and quantized for Apple Silicon — pick one that fits your available memory.

Prerequisite: vllm-metal currently does not load custom logits processors registered via entry points — it creates an empty LogitsProcessors() instead of calling build_logitsprocs(). PR #124 fixes this with a 9-line patch that mirrors GPUModelRunner's pattern. Until it is merged, you will need to apply the patch manually or install from the PR branch. Without it, qr-sampler's plugin will be silently skipped.

1. Install vllm-metal

curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh | bash

This creates a virtual environment at ~/.venv-vllm-metal with vLLM and all dependencies. Requires Python 3.12+.

2. Install qr-sampler

source ~/.venv-vllm-metal/bin/activate
pip install qr-sampler

3. Start the server

source ~/.venv-vllm-metal/bin/activate
vllm serve mlx-community/Qwen3-0.6B-4bit

qr-sampler registers automatically via the same vllm.logits_processors entry point. Look for this line in the server logs to confirm the plugin is active:

QRSamplerLogitsProcessor initialized: vocab_size=..., entropy_source=system+system, amplifier=zscore_mean, temperature=fixed

4. Send a request

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen3-0.6B-4bit",
    "prompt": "The nature of consciousness is",
    "max_tokens": 100
  }'

All configuration (entropy sources, temperature strategies, per-request overrides) works identically to the NVIDIA setup. The only difference is how vLLM itself is installed.

Note: Docker deployment profiles are not compatible with Apple Silicon — Docker on macOS runs a Linux VM with no Metal GPU passthrough. vllm-metal must run natively.

CLI reference

The CLI requires the [cli] extra: pip install qr-sampler[cli]

`qr-sampler list`

Browse available components:

qr-sampler list engines              # Engine profiles (vllm, vllm_metal)
qr-sampler list models --engine vllm # Known-working models for an engine
qr-sampler list entropy-sources      # All entropy source profiles
qr-sampler list amplifiers           # Signal amplification algorithms
qr-sampler list samplers             # Temperature strategies

`qr-sampler info`

Detailed information about a specific component:

qr-sampler info engine vllm
qr-sampler info entropy quantum_grpc
qr-sampler info amplifier zscore_mean
qr-sampler info sampler edt

`qr-sampler validate`

Check stack compatibility before deploying:

# Check specific combination
qr-sampler validate --engine vllm --model Qwen/Qwen2.5-1.5B-Instruct

# With a config file
qr-sampler validate --config stack.yaml

Exit codes: 0 = all known-working, 1 = untested combinations (warnings), 2 = incompatible or missing.

`qr-sampler build`

Generate Docker Compose deployment files:

# Generate files
qr-sampler build --engine vllm --entropy quantum_grpc --output ./deploy

# Preview without writing
qr-sampler build --engine vllm --entropy system --dry-run

# Bypass compatibility warnings
qr-sampler build --engine vllm --entropy timing_noise --force --output ./deploy

# From a config file
qr-sampler build --config stack.yaml --output ./deploy

Entropy sources

Built-in sources

Source	Identifier	Transport	Description
System	`system`	Local	`os.urandom()` — OS cryptographic RNG. Available everywhere. Default.
Quantum gRPC	`quantum_grpc`	gRPC	Remote entropy server via gRPC. Supports unary, server streaming, and bidi streaming.
OpenEntropy	`openentropy`	Local	63 hardware noise sources (thermal, timing, microarch, GPU). No network needed.
Timing noise	`timing_noise`	Local	CPU timing jitter (experimental).
Mock uniform	`mock_uniform`	Local	Configurable test source with seed/bias. For testing only.

Quantum gRPC

Connect any entropy server that speaks gRPC. qr-sampler ships example servers and an annotated template:

# Run the built-in urandom-over-gRPC example server
pip install qr-sampler
python examples/servers/simple_urandom_server.py --address 0.0.0.0:50051

# Point qr-sampler at it
export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
export QR_GRPC_SERVER_ADDRESS=localhost:50051

Three transport modes:

Mode	`QR_GRPC_MODE`	Latency	Best for
Unary	`unary`	~1-2ms	Simplicity, debugging
Server streaming	`server_streaming`	~0.5-1ms	Middle ground
Bidirectional	`bidi_streaming`	~50-100us (same machine)	Production, lowest latency

For co-located hardware, use Unix domain sockets:

python my_qrng_server.py --address unix:///var/run/qrng.sock
export QR_GRPC_SERVER_ADDRESS=unix:///var/run/qrng.sock
export QR_GRPC_MODE=bidi_streaming

The gRPC client is protocol-agnostic. It uses configurable method paths and generic protobuf wire-format encoding. The only requirement is that your proto puts the byte count as field 1 in the request and the random bytes as field 1 in the response. Configure custom protos via QR_GRPC_METHOD_PATH and QR_GRPC_STREAM_METHOD_PATH.

Circuit breaker

The gRPC client includes an adaptive circuit breaker:

Tracks rolling P99 latency over the last 100 requests
Sets timeout to max(5ms, P99 * 1.5) (configurable via QR_CB_* env vars)
Opens after 3 consecutive failures, enters half-open state after 10s
Falls back to QR_FALLBACK_MODE when the circuit is open

OpenEntropy

OpenEntropy harvests entropy from 63 hardware noise sources on the local machine — thermal sensors, CPU timing jitter, memory timing, GPU scheduling, and more. No network, no API keys, no gRPC server needed.

pip install openentropy
export QR_ENTROPY_SOURCE_TYPE=openentropy
export QR_OE_CONDITIONING=raw   # raw (research default) | vonneumann | sha256

List available sources on your machine:

from openentropy import detect_available_sources
print([s["name"] for s in detect_available_sources()])

Fallback behavior

The FallbackEntropySource wraps a primary source with an automatic fallback:

Only catches EntropyUnavailableError — other exceptions propagate
Logs a warning when fallback is used
Exposes last_source_used for diagnostics

Configure with QR_FALLBACK_MODE:

system — fall back to os.urandom() (default)
mock_uniform — fall back to the mock source
error — raise immediately, no fallback

Third-party entropy sources

Any Python package can register entropy sources via entry points:

# In your package's pyproject.toml
[project.entry-points."qr_sampler.entropy_sources"]
lava_lamp = "my_package:LavaLampEntropySource"

The source is auto-discovered when qr-sampler starts. See Setting up your own entropy source below.

Signal amplification

The signal amplification system converts raw entropy bytes into a single uniform float u in (0, 1) that drives token selection from the CDF.

Z-score mean (`zscore_mean`) — default

Interprets raw bytes as uint8 values
Computes the sample mean M
Derives SEM = population_std / sqrt(N) (never stored — always computed)
Computes z-score: z = (M - population_mean) / SEM
Maps to uniform via normal CDF: u = 0.5 * (1 + erf(z / sqrt(2)))
Clamps to (epsilon, 1-epsilon)

Under the null hypothesis (no bias), u is uniformly distributed on (0, 1). A small per-byte bias accumulates over thousands of samples, producing a detectable shift:

20,480 bytes with +0.003 mean bias per byte:
  M ~ 127.56, SEM ~ 0.514, z ~ 0.12, u ~ 0.548

ECDF (`ecdf`)

Empirical CDF amplifier with online calibration. Maps raw bytes to uniform via a calibrated empirical distribution function. Does not assume a specific input distribution — it calibrates from observed data.

Temperature strategies

Fixed temperature (`fixed`)

Returns a constant temperature for every token. Set via QR_FIXED_TEMPERATURE (default: 0.7).

Entropy-dependent temperature (`edt`)

Dynamically adjusts temperature based on the Shannon entropy of the logit distribution:

H_norm = H / ln(vocab_size)         # Normalized entropy [0, 1]
T = base_temp * H_norm^exponent     # Power-law scaling
T = clamp(T, min_temp, max_temp)    # Bounds enforcement

High-entropy (uncertain) distributions get higher temperatures; low-entropy (confident) distributions get lower temperatures. This creates a feedback loop where the model's own uncertainty calibrates the randomness of selection.

Per-request parameter overrides

Override sampling parameters on individual requests via extra_args:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "prompt": "The nature of consciousness is",
    "max_tokens": 100,
    "extra_args": {
      "qr_temperature_strategy": "edt",
      "qr_top_k": 100,
      "qr_top_p": 0.95,
      "qr_sample_count": 40960,
      "qr_diagnostic_mode": true
    }
  }'

Infrastructure fields (QR_GRPC_SERVER_ADDRESS, QR_FALLBACK_MODE, etc.) cannot be overridden per-request — they are set at server startup.

Web UI

qr-sampler works with Open WebUI, a self-hosted ChatGPT-style interface that connects to vLLM's OpenAI-compatible API.

Docker (NVIDIA / Linux): The qr-sampler build command includes Open WebUI in the generated Compose file:

qr-sampler build --engine vllm --entropy quantum_grpc --output ./deploy
cd deploy
docker compose up --build
# Open http://localhost:3000

Apple Silicon: Run Open WebUI standalone in Docker, pointing at your vllm-metal server:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1 \
  -e OPENAI_API_KEY=not-needed \
  -e WEBUI_AUTH=false \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Controlling qr-sampler from the UI

A pre-built filter function injects qr-sampler per-request parameters into every chat message via the Open WebUI Valves system. This lets you adjust temperature, top-k, top-p, sample count, and other sampling parameters from the admin panel.

To set it up:

Go to Admin Panel > Functions in Open WebUI.
Click Import and select examples/open-webui/qr_sampler_filter.json.
Toggle the function to Global.
Click the gear icon to adjust parameters.

See examples/open-webui/README.md for the full guide.

Open WebUI is entirely optional. qr-sampler works the same way with direct API calls, curl, Python clients, or any OpenAI-compatible tool.

Configuration reference

All configuration is done via environment variables with the QR_ prefix. Per-request overrides use the qr_ prefix in extra_args.

Infrastructure fields (NOT per-request overridable)

Environment variable	Default	Description
`QR_ENTROPY_SOURCE_TYPE`	`system`	Primary entropy source identifier
`QR_GRPC_SERVER_ADDRESS`	`localhost:50051`	gRPC entropy server address (`host:port` or `unix:///path`)
`QR_GRPC_TIMEOUT_MS`	`5000`	gRPC call timeout in milliseconds
`QR_GRPC_RETRY_COUNT`	`2`	Retry attempts after gRPC failure
`QR_GRPC_MODE`	`unary`	Transport mode: `unary`, `server_streaming`, `bidi_streaming`
`QR_GRPC_METHOD_PATH`	`/qr_entropy.EntropyService/GetEntropy`	gRPC method path for unary RPC
`QR_GRPC_STREAM_METHOD_PATH`	`/qr_entropy.EntropyService/StreamEntropy`	gRPC method path for streaming RPC (empty disables streaming)
`QR_GRPC_API_KEY`	(empty)	API key sent via gRPC metadata (empty = no auth)
`QR_GRPC_API_KEY_HEADER`	`api-key`	gRPC metadata header name for the API key
`QR_FALLBACK_MODE`	`system`	Fallback when primary fails: `error`, `system`, `mock_uniform`
`QR_CB_WINDOW_SIZE`	`100`	Rolling latency window size for P99 computation
`QR_CB_MIN_TIMEOUT_MS`	`5.0`	Minimum adaptive timeout in milliseconds
`QR_CB_TIMEOUT_MULTIPLIER`	`1.5`	Multiplier applied to P99 latency for adaptive timeout
`QR_CB_RECOVERY_WINDOW_S`	`10.0`	Seconds before half-open retry after circuit opens
`QR_CB_MAX_CONSECUTIVE_FAILURES`	`3`	Consecutive failures before circuit breaker opens

Sampling parameters (per-request overridable)

Environment variable	extra_args key	Default	Description
`QR_SIGNAL_AMPLIFIER_TYPE`	`qr_signal_amplifier_type`	`zscore_mean`	Signal amplification algorithm
`QR_SAMPLE_COUNT`	`qr_sample_count`	`20480`	Entropy bytes fetched per token
`QR_POPULATION_MEAN`	`qr_population_mean`	`127.5`	Null-hypothesis mean for byte values
`QR_POPULATION_STD`	`qr_population_std`	`73.612...`	Population std for uniform [0, 255]
`QR_UNIFORM_CLAMP_EPSILON`	`qr_uniform_clamp_epsilon`	`1e-10`	Clamp u to avoid degenerate CDF
`QR_TEMPERATURE_STRATEGY`	`qr_temperature_strategy`	`fixed`	Strategy: `fixed` or `edt`
`QR_FIXED_TEMPERATURE`	`qr_fixed_temperature`	`0.7`	Constant temperature (fixed strategy)
`QR_EDT_BASE_TEMP`	`qr_edt_base_temp`	`0.8`	Base coefficient for EDT
`QR_EDT_EXPONENT`	`qr_edt_exponent`	`0.5`	Power-law exponent for EDT
`QR_EDT_MIN_TEMP`	`qr_edt_min_temp`	`0.1`	EDT temperature floor
`QR_EDT_MAX_TEMP`	`qr_edt_max_temp`	`2.0`	EDT temperature ceiling
`QR_TOP_K`	`qr_top_k`	`50`	Top-k filtering (`<=0` disables)
`QR_TOP_P`	`qr_top_p`	`0.9`	Nucleus sampling threshold (`1.0` disables)
`QR_LOG_LEVEL`	`qr_log_level`	`summary`	Logging: `none`, `summary`, `full`
`QR_DIAGNOSTIC_MODE`	`qr_diagnostic_mode`	`false`	Store all token records in memory

You can also use a .env file — pydantic-settings loads it automatically.

Setting up your own entropy source

qr-sampler is designed to connect any randomness source to LLM token sampling. There are two approaches.

Approach A: gRPC server (recommended)

Implement a gRPC server. You can use the built-in qr_entropy.EntropyService protocol (example servers provided), or your own proto as long as field 1 carries the byte count (request) and random bytes (response).

5-minute walkthrough

Copy the template:

cp examples/servers/qrng_template_server.py my_qrng_server.py

Implement three methods in the QRNGHardware class:

class QRNGHardware:
    def __init__(self, device_path="/dev/qrng0"):
        self._device = open(device_path, "rb")

    def generate(self, n_bytes: int) -> bytes:
        # CRITICAL: Generate entropy NOW, not from a buffer.
        # The quantum measurement must happen during this call.
        return self._device.read(n_bytes)

    def close(self):
        self._device.close()

Run it:

pip install qr-sampler
python my_qrng_server.py --port 50051

Deploy with Docker:

# Generate deployment files
qr-sampler build --engine vllm --entropy quantum_grpc --output ./deploy

# Configure the gRPC address
cd deploy
# Edit .env: QR_GRPC_SERVER_ADDRESS=<your-server>:50051
docker compose up --build

Or configure directly via environment variables (bare-metal):

export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
export QR_GRPC_SERVER_ADDRESS=localhost:50051
vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80

The template handles all gRPC boilerplate (unary + bidirectional streaming, health checks, graceful shutdown). You only write the hardware-specific code.

The gRPC protocol

service EntropyService {
  rpc GetEntropy (EntropyRequest) returns (EntropyResponse);
  rpc StreamEntropy (stream EntropyRequest) returns (stream EntropyResponse);
}

message EntropyRequest {
  int32 bytes_needed = 1;
  int64 sequence_id = 2;
}

message EntropyResponse {
  bytes data = 1;
  int64 sequence_id = 2;
  int64 generation_timestamp_ns = 3;
  string device_id = 4;
}

Any language that supports gRPC can implement this server — Python, C++, Rust, Go, etc.

Just-in-time constraint

The entropy must be generated after the client sends the request, not from a pre-generated pool:

No buffering or caching of previously generated bytes
The physical measurement happens during the generate() call
generation_timestamp_ns in the response proves freshness

This is critical for consciousness-research applications where the timing relationship between logit computation and entropy generation matters.

Deployment options

systemd (Linux):

sudo cp examples/systemd/qr-entropy-server.service /etc/systemd/system/
sudo cp examples/systemd/qr-entropy-server.env /etc/default/qr-entropy-server
sudo systemctl enable --now qr-entropy-server

Unix domain sockets (lowest latency for co-located hardware):

python my_qrng_server.py --address unix:///var/run/qrng.sock
export QR_GRPC_SERVER_ADDRESS=unix:///var/run/qrng.sock

Approach B: Python plugin (in-process)

For entropy sources that don't need a separate server, implement the EntropySource ABC directly:

from qr_sampler.entropy.base import EntropySource
from qr_sampler.entropy.registry import register_entropy_source

@register_entropy_source("my_source")
class MyEntropySource(EntropySource):
    @property
    def name(self) -> str:
        return "my_source"

    @property
    def is_available(self) -> bool:
        return True

    def get_random_bytes(self, n: int) -> bytes:
        return my_hardware.read(n)

    def close(self) -> None:
        my_hardware.disconnect()

Register via entry points in your package's pyproject.toml:

[project.entry-points."qr_sampler.entropy_sources"]
my_source = "my_package.entropy:MyEntropySource"

Then set QR_ENTROPY_SOURCE_TYPE=my_source.

Validation

Test your entropy source:

from qr_sampler.entropy.quantum import QuantumGrpcSource
from qr_sampler.config import QRSamplerConfig

config = QRSamplerConfig(
    entropy_source_type="quantum_grpc",
    grpc_server_address="localhost:50051",
)
source = QuantumGrpcSource(config)

# Basic connectivity
data = source.get_random_bytes(1024)
assert len(data) == 1024

# Health check
status = source.health_check()
print(status)  # {'source': 'quantum_grpc', 'healthy': True, ...}

source.close()

For statistical validation, check that your source produces uniform byte distributions:

import numpy as np
from scipy import stats

data = source.get_random_bytes(100_000)
samples = np.frombuffer(data, dtype=np.uint8)

# KS test against uniform distribution
stat, p_value = stats.kstest(samples / 255.0, 'uniform')
print(f"KS statistic: {stat:.6f}, p-value: {p_value:.6f}")
# p-value should be > 0.05 for a good entropy source

Plugin architecture

qr-sampler uses a registry + entry-points pattern for extensibility:

qr_sampler.entropy_sources          Third-party entropy sources
qr_sampler.engine_adapters          Third-party engine adapters
vllm.logits_processors              vLLM plugin registration

Each subsystem (entropy, amplification, temperature, engines) has its own registry with decorator-based registration for built-in implementations and entry-point discovery for third-party extensions. The pipeline never instantiates strategy classes directly — it always goes through the registry.

Adding new components

New engine adapter: Subclass EngineAdapter, implement get_pipeline(). Register with @EngineAdapterRegistry.register("name"). Add entry point under qr_sampler.engine_adapters.

New entropy source: Subclass EntropySource, implement name, is_available, get_random_bytes(), close(). Register with @register_entropy_source("name").

New signal amplifier: Subclass SignalAmplifier, implement amplify(raw_bytes) -> AmplificationResult. Register with @AmplifierRegistry.register("name").

New temperature strategy: Subclass TemperatureStrategy, implement compute_temperature(logits, config) -> TemperatureResult. Always compute Shannon entropy. Register with @TemperatureStrategyRegistry.register("name").

See CONTRIBUTING.md for detailed development instructions.

Project structure

src/qr_sampler/
├── __init__.py                    # Package version, re-exports
├── __main__.py                    # CLI entry: python -m qr_sampler
├── config.py                      # Pydantic-settings configuration
├── exceptions.py                  # Exception hierarchy
├── processor.py                   # Backward compat: re-exports VLLMAdapter
├── py.typed                       # PEP 561 type hint marker
├── core/                          # Engine-agnostic pipeline (NO torch)
│   ├── pipeline.py                # SamplingPipeline + factory functions
│   └── types.py                   # SamplingResult frozen dataclass
├── engines/                       # Engine adapter layer
│   ├── base.py                    # EngineAdapter ABC
│   ├── registry.py                # EngineAdapterRegistry
│   └── vllm.py                    # VLLMAdapter: vLLM V1 LogitsProcessor
├── profiles/                      # Declarative YAML metadata
│   ├── schema.py                  # Pydantic validation models
│   ├── loader.py                  # Profile discovery + loading
│   ├── compatibility.py           # Tri-state compatibility checker
│   ├── engines/                   # Engine profiles (vllm, vllm_metal)
│   ├── entropy/                   # Entropy profiles (system, quantum_grpc, ...)
│   ├── amplifiers/                # Amplifier profiles (zscore_mean, ecdf)
│   └── samplers/                  # Sampler profiles (fixed, edt)
├── cli/                           # CLI commands (requires [cli] extra)
│   ├── main.py                    # Click group
│   ├── validate_cmd.py            # qr-sampler validate
│   ├── build_cmd.py               # qr-sampler build
│   ├── list_cmd.py                # qr-sampler list
│   └── info_cmd.py                # qr-sampler info
├── templates/                     # Jinja2 templates for qr-sampler build
│   ├── docker-compose.yml.j2
│   └── env.j2
├── amplification/
│   ├── base.py                    # SignalAmplifier ABC, AmplificationResult
│   ├── registry.py                # AmplifierRegistry
│   ├── zscore.py                  # Z-score mean amplifier
│   └── ecdf.py                    # ECDF amplifier
├── entropy/
│   ├── base.py                    # EntropySource ABC
│   ├── registry.py                # Auto-discovery registry + entry points
│   ├── quantum.py                 # gRPC QRNG (3 transport modes, circuit breaker)
│   ├── system.py                  # os.urandom()
│   ├── timing.py                  # CPU timing jitter (experimental)
│   ├── mock.py                    # Configurable test source
│   ├── fallback.py                # Fallback composition wrapper
│   └── openentropy.py             # OpenEntropy integration
├── logging/
│   ├── types.py                   # TokenSamplingRecord (frozen, __slots__)
│   └── logger.py                  # SamplingLogger (none/summary/full)
├── proto/
│   ├── entropy_service.proto      # gRPC protocol definition
│   ├── entropy_service_pb2.py     # Hand-written protobuf stubs
│   └── entropy_service_pb2_grpc.py
├── selection/
│   ├── types.py                   # SelectionResult dataclass
│   └── selector.py                # CDF-based token selector
└── temperature/
    ├── base.py                    # TemperatureStrategy ABC, Shannon entropy
    ├── registry.py                # TemperatureStrategyRegistry
    ├── fixed.py                   # Fixed temperature
    └── edt.py                     # Entropy-dependent temperature

examples/
├── servers/                       # Example entropy servers
│   ├── simple_urandom_server.py   # Minimal reference (~50 lines)
│   ├── timing_noise_server.py     # CPU timing jitter
│   └── qrng_template_server.py    # Annotated template for custom QRNGs
├── open-webui/                    # Open WebUI filter function
├── docker/                        # Dockerfiles
└── systemd/                       # systemd unit files

Statistical analysis

qr-sampler includes statistical tests (in tests/test_statistical_properties.py, requires scipy) that validate the mathematical properties of the sampling pipeline:

KS-test for u-value uniformity: Under the null hypothesis (no bias), amplified u values should be uniformly distributed on (0, 1).
Bias detection: Verifies that a small per-byte mean shift produces a statistically detectable shift in the u distribution — confirming the amplification system is sensitive enough for consciousness-research experiments.
EDT monotonicity: Validates that the entropy-dependent temperature strategy produces higher temperatures for higher-entropy logit distributions.

pytest tests/test_statistical_properties.py -v

Development

git clone https://github.com/alchemystack/qr-sampler.git
cd qr-sampler
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Lint and format
ruff check src/ tests/
ruff format src/ tests/

# Type check
mypy --strict src/

# Pre-commit hooks
pre-commit install
pre-commit run --all-files

See CONTRIBUTING.md for the full development guide.

License

Apache 2.0 — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
.github		.github
deployments		deployments
examples		examples
src/qr_sampler		src/qr_sampler
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

qr-sampler

Why qr-sampler?

Consciousness-research context

Architecture

Per-token sampling pipeline

Quick start

1. Validate your stack (optional)

2. Generate deployment files

3. Launch

4. Send a request

Bare-metal install (without Docker)

Apple Silicon (macOS)

1. Install vllm-metal

2. Install qr-sampler

3. Start the server

4. Send a request

CLI reference

qr-sampler list

qr-sampler info

qr-sampler validate

qr-sampler build

Entropy sources

Built-in sources

Quantum gRPC

Circuit breaker

OpenEntropy

Fallback behavior

Third-party entropy sources

Signal amplification

Z-score mean (zscore_mean) — default

ECDF (ecdf)

Temperature strategies

Fixed temperature (fixed)

Entropy-dependent temperature (edt)

Per-request parameter overrides

Web UI

Controlling qr-sampler from the UI

Configuration reference

Infrastructure fields (NOT per-request overridable)

Sampling parameters (per-request overridable)

Setting up your own entropy source

Approach A: gRPC server (recommended)

5-minute walkthrough

The gRPC protocol

Just-in-time constraint

Deployment options

Approach B: Python plugin (in-process)

Validation

Plugin architecture

Adding new components

Project structure

Statistical analysis

Development

License

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`qr-sampler list`

`qr-sampler info`

`qr-sampler validate`

`qr-sampler build`

Z-score mean (`zscore_mean`) — default

ECDF (`ecdf`)

Fixed temperature (`fixed`)

Entropy-dependent temperature (`edt`)

Packages