Sphere

Sphere is a research program focused on the invariant infrastructure needed to make it safe for any type of stateless intelligence to take autonomous stateful action.

The name Sphere comes from a geometric property: a sphere has a finite, fully known surface area yet contains uncountably many radii.

Sphere is currently focused on engineering with LLMs and is investigating two claims:

Agentic action can be bounded, observable, and verifiable without trusting the agent or forfeiting its expected utility.

Sphere hypothesizes that agents operating within its engineering produce lower out-of-bounds action rates and higher verifiability than the same agent without it, with no drop in work quality. This is tested by ablating components across fixed agent tasks, isolating the engineering as the variable. The hypothesis is weakened if ablation produces no measurable change, if the engineering reduces work quality, if prompts alone produce the same bounds, or if a placebo version produces comparable results.
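The comparison described above can be sketched as a pooled out-of-bounds rate per experimental condition. This is a minimal illustration, not Sphere's harness: the `Run` record, its field names, and the condition labels (`full`, `ablated`, `prompt_only`, `placebo`) are assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One agent run on a fixed task (hypothetical record shape)."""
    condition: str      # assumed labels: "full", "ablated", "prompt_only", "placebo"
    task_id: str
    actions: int        # total actions the agent took in this run
    out_of_bounds: int  # actions that fell outside the authorized bounds


def out_of_bounds_rate(runs):
    """Return {condition: out_of_bounds / actions}, pooled over all runs per condition."""
    totals = defaultdict(lambda: [0, 0])
    for run in runs:
        totals[run.condition][0] += run.out_of_bounds
        totals[run.condition][1] += run.actions
    # A condition with zero recorded actions gets a rate of 0.0 rather than dividing by zero.
    return {cond: (oob / n if n else 0.0) for cond, (oob, n) in totals.items()}
```

Under this sketch the hypothesis would be weakened if the `ablated`, `prompt_only`, or `placebo` rates came out no higher than the `full` rate on the same tasks. A work-quality metric would need the same per-condition treatment.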

Agent performance can be stabilized across open and state-of-the-art (SOTA) models by using a closed codebase vocabulary, deterministic tools, cross-model evaluations, and interpretability-informed training.

Sphere hypothesizes that an open model within its engineering performs comparably to state-of-the-art models from one or two generations ago, on the same work, with less variance across runs. This is tested by running the same work items on the open model and prior-generation SOTA models with engineering active and ablated, isolating the engineering as the variable. The hypothesis is weakened if the open model does not approach prior-generation SOTA performance, if one component does all the work, if results don't generalize across models, or if a placebo version produces the same results.
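Because this claim concerns both comparable performance and lower variance across runs, each model and condition needs at least a mean and a spread over repeated runs of the same work item. The sketch below assumes a single numeric score per run. The function name, the input shape, and the use of the sample standard deviation are illustrative choices, not part of Sphere.

```python
from statistics import mean, stdev


def summarize(scores_by_model):
    """Return per-model mean, sample standard deviation, and run count.

    scores_by_model maps a label such as "open+engineering" (hypothetical)
    to the list of scores from repeated runs of the same work item.
    """
    summary = {}
    for model, scores in scores_by_model.items():
        summary[model] = {
            "mean": mean(scores),
            # stdev needs at least two points; report 0.0 for a single run.
            "stdev": stdev(scores) if len(scores) > 1 else 0.0,
            "runs": len(scores),
        }
    return summary
```

Comparing the `stdev` entries for the same model with engineering active and ablated is one way to read "less variance across runs". A real analysis would also need enough runs for the difference to be meaningful.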

Research is executed across ten tracks: Axis, Inclination, Tangent, Radii, Curve, Arc, Surface, Shell, Sector, Limit. Each track enables, produces, or furthers Sphere's work.

Orientation:

FINDINGS.md — what Sphere has learned, with links to sources.
INCLINATION.json — the live operator working surface; intents, hypotheses, experiments.
experiments/ — per-experiment design, runs, artifacts, papers.
architecture/ — design documents spanning experiments.

Axis

Axis is the infrastructure the research relies on.

A single Mac mini was chosen to host the research due to the unified memory it offers and its wide consumer availability. Where possible, infrastructure choices favor open source and enterprise-grade solutions to strengthen scalability and reproducibility of research findings in both enterprise and independent contexts.

Term	Description
Identity	Domain, Entra IAM, Microsoft Exchange, YubiKey
Vault	Bitwarden
Compute	Mac mini (M4 Pro, 64GB RAM, 512GB SSD), GCP, AWS
Network	Tailscale, Spectrum
Container	Podman
Engine	MLX LM, Ollama, llama.cpp

Inclination

Inclination is the intent, context, expected outputs, and desired outcomes provided to the agent via the Work directory.

The human originates, the agent executes, and then the human reviews and approves. Evaluation, verification, and enforcement are carried out independently of both.

Term	Description
Human	Operator; originates intent, reviews and approves outputs
Work	Work directory; intent, context, outputs, outcomes

Tangent

Tangent provides the human-computer interface.

A custom CLI provides a single surface for both commands and observability. The Mac mini and CLI are configured to enable remote access via SSH over Tailscale using Blink Shell.

Term	Description
CLI	Sphere CLI
Remote	Blink Shell, herdr (open-source terminal multiplexer for agents)

Radii

Runtime environment for the main builder agent.

The agent's attention is focused on executing work while deterministic code enforces authorization boundaries and workflow requirements.

Term	Description
Runtime	Vendor/OSS
Sandbox	macOS
Codebase	Closed code vocabulary, documentation
Stop Hooks	Per-turn batch commits
Tools	just-bash (Vercel Labs)
LLM1	Builder model
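To illustrate what "deterministic code enforces authorization boundaries" can mean in practice, the sketch below checks a requested command against an allowlist and a working-directory boundary before anything runs. The allowlist contents, the `/work` root, and the `authorize` function are assumptions made for this example; the source does not specify Sphere's actual rules.

```python
import posixpath
import shlex
from pathlib import PurePosixPath

# Hypothetical policy values, chosen only for illustration.
ALLOWED_COMMANDS = frozenset({"ls", "cat", "git"})
WORK_ROOT = PurePosixPath("/work")


def authorize(command_line: str, cwd: str) -> tuple[bool, str]:
    """Decide, without consulting the model, whether a command may run.

    The same input always yields the same (allowed, reason) pair.
    """
    try:
        argv = shlex.split(command_line)
    except ValueError:
        return False, "unparseable command"
    if not argv:
        return False, "empty command"
    if argv[0] not in ALLOWED_COMMANDS:
        return False, f"command not allowed: {argv[0]}"
    # Collapse ".." segments lexically; symlinks are NOT resolved in this sketch.
    normalized = PurePosixPath(posixpath.normpath(cwd))
    if not normalized.is_relative_to(WORK_ROOT):
        return False, f"outside work root: {normalized}"
    return True, "ok"
```

The agent never sees or influences this check. It only receives the allow or deny result, which keeps its attention on the work itself.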

Curve

Curve is the study of agent trajectories as new capabilities are established.

Where Arc records the execution that occurred, Curve examines the executions that could occur: edge cases, adversarial chains, and failure paths.

Term	Description
Red-teaming	Adversarial probing of reachable agent trajectories
Capability analysis	Mapping the paths a new Radii capability makes reachable
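One way to picture capability analysis is as graph reachability: states are nodes, and each edge is labelled with the capability needed to traverse it. The sketch below computes which states become reachable only after a new capability is added. The graph encoding, the function names, and the example state and capability names are hypothetical, not Sphere's model.

```python
from collections import deque


def reachable(edges, start, capabilities):
    """Breadth-first search over edges (src, capability, dst), using only allowed capabilities."""
    adjacency = {}
    for src, cap, dst in edges:
        if cap in capabilities:
            adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in adjacency.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def newly_reachable(edges, start, existing, new_capability):
    """Return the states reachable with the new capability that were not reachable before."""
    before = reachable(edges, start, set(existing))
    after = reachable(edges, start, set(existing) | {new_capability})
    return after - before
```

The returned set is a candidate list for red-teaming: these are the trajectories that did not exist before the capability was granted.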

Arc

Mediation and evidence layer for agent invocation and machine execution.

The agent invokes tools via a deterministic intermediary that initiates execution, returns outputs, and writes each step to an append-only log; tools requiring network connectivity route through an APISIX gateway.
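The mediation pattern above can be sketched in a few lines. Sphere's intermediary is written in Rust; the Python below only illustrates the idea, and the class name, record fields, and JSON Lines format are assumptions rather than the actual implementation.

```python
import json
import time
from pathlib import Path


class Intermediary:
    """Runs tools on the agent's behalf and records every step (illustrative sketch)."""

    def __init__(self, log_path, tools):
        self.log_path = Path(log_path)
        self.tools = dict(tools)  # tool name -> callable

    def _append(self, record):
        # Mode "a" only ever adds lines; existing entries are never rewritten here.
        with self.log_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record, sort_keys=True, default=str) + "\n")

    def invoke(self, tool, **kwargs):
        self._append({"event": "invoke", "tool": tool, "args": kwargs, "ts": time.time()})
        try:
            output = self.tools[tool](**kwargs)
        except Exception as exc:
            # Failures, including unknown tool names, are logged before re-raising.
            self._append({"event": "error", "tool": tool, "error": repr(exc), "ts": time.time()})
            raise
        self._append({"event": "result", "tool": tool, "output": output, "ts": time.time()})
        return output
```

Opening the file in append mode does not by itself make the log tamper-proof. Real append-only guarantees would need enforcement outside the process, such as file-system or storage-level controls.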

Each work cycle is captured in Git, enforced against policy as code (OPA, AST walkers), and bundled into the Debrief, which is an evidence packet in HTML format that the operator reads to approve or reject the work for push.
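As an illustration of the AST-walker half of policy as code, the sketch below walks Python source and reports constructs that a policy forbids. The forbidden sets and the choice of Python as the target language are assumptions for this example; the source does not say which languages or rules Sphere's walkers cover.

```python
import ast

# Hypothetical policy: these names are examples, not Sphere's rules.
FORBIDDEN_MODULES = frozenset({"socket", "subprocess"})
FORBIDDEN_CALLS = frozenset({"eval", "exec"})


class PolicyWalker(ast.NodeVisitor):
    """Collects (line, message) pairs for each policy violation found."""

    def __init__(self):
        self.violations = []

    def visit_Import(self, node):
        for alias in node.names:
            if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                self.violations.append((node.lineno, f"import of {alias.name}"))
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        if node.module and node.module.split(".")[0] in FORBIDDEN_MODULES:
            self.violations.append((node.lineno, f"import from {node.module}"))
        self.generic_visit(node)

    def visit_Call(self, node):
        # Only direct calls by bare name are caught; aliased calls would need more analysis.
        if isinstance(node.func, ast.Name) and node.func.id in FORBIDDEN_CALLS:
            self.violations.append((node.lineno, f"call to {node.func.id}()"))
        self.generic_visit(node)


def check_source(source):
    """Parse source and return a sorted list of violations (empty means compliant)."""
    walker = PolicyWalker()
    walker.visit(ast.parse(source))
    return sorted(walker.violations)
```

A pre-push hook could run a check like this over changed files and block the push when the list is non-empty. The result is deterministic, so it can be recorded in the Debrief alongside the commit.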

The append-only log and APISIX gateway provide real-time observability via the Sphere CLI.

Term	Description
Intermediary	Deterministic (Rust)
Log	Append-only
Gateway	APISIX
Git	Version control, commit signing, pre-push hooks
Policy	OPA, AST walkers
Debrief	Evidence packet in HTML format

Surface

Surface is the machine-visible record of execution: it makes agent action visible through OS execution artifacts.

Term	Description
Witness node	A Surface Laptop 5 running Linux, observing execution at the machine layer
Telemetry	Process, file, and network traces (osquery, auditd, journald, OpenTelemetry)

Shell

Enhancement layer that adds evaluation and adversarial cross-model review to Arc.

The builder model's (LLM1) outputs are scored by the evaluation harness, which is served via the intermediary like any other tool. A reviewer model (LLM2), a different model with different weights than the builder (LLM1), reviews the work and produces a justification that is folded into the Debrief.

Term	Description
Evals	Output performance and behavior measurements
LLM2	Adversarial cross-model reviewer

Sector

Interpretability research and experimentation inside LLM1 and LLM2.

Everything to this point has been engineering around the model. Sector goes inside the model, applying interpretability tooling to open instrumentable models and using what's learned to inform training pipelines.

Sector will run experiments on how combinations of variables from Radii, Arc, and Shell affect agent performance.
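Exploring combinations of variables from Radii, Arc, and Shell amounts to enumerating a grid of configurations. The sketch below builds a full factorial grid. The factor names are hypothetical stand-ins for components named in this document, and a real design might sample the grid rather than run every cell.

```python
from itertools import product


def factorial_grid(factors):
    """Return every combination of factor levels as a list of dicts.

    factors maps a factor name to its list of levels, for example
    {"closed_vocabulary": [False, True]} (names are illustrative).
    """
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in product(*(factors[name] for name in names))
    ]


# Example: three on/off factors yield 2 * 2 * 2 = 8 configurations.
grid = factorial_grid({
    "closed_vocabulary": [False, True],   # Radii (assumed toggle)
    "policy_enforcement": [False, True],  # Arc (assumed toggle)
    "cross_model_review": [False, True],  # Shell (assumed toggle)
})
```

Each dict in `grid` identifies one experimental cell. The grid grows multiplicatively with every added factor, which is the main cost of a full factorial design.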

Term	Description
Interpretability	TransformerLens, Circuit Tracer, SAELens, Neuronpedia, HeadVis
Weights	Open instrumentable models (Qwen, Gemma, GPT-OSS), Hugging Face
Training	Interpretability-informed training, RLHF (reinforcement learning from human feedback)

Limit

Empirical limit of open-model performance against state-of-the-art.

Limit takes what's been built across Radii, Arc, and Shell, combines it with what's been learned in Sector, and then measures how close open models can come to state-of-the-art models on the same work, and where gaps remain.

Term	Description
Benchmarks	Open vs state-of-the-art performance measurements

