Barkhausen stability monitor for AI agent loops.
Replace max_iterations=5 with a real-time trajectory classifier. It reads four features off the loop's error series and routes the loop into one of five named states, so you know whether your agent loop is converging, stalling, oscillating, or diverging, and what to do in each case.
Home: loopgain.ai
Works for any iterative AI workflow with a measurable error signal: verify-revise loops, refinement passes, tool-use retry chains, RAG with self-correction, code-gen with linter feedback, and multi-step reasoning loops. Pre-built adapters cover LangGraph, CrewAI, AutoGen, LangChain, OpenAI Agents SDK, and Claude Agent SDK; any custom stack can call the raw API directly. Pure Python, no runtime dependencies.
Keywords: AI agent loops · agentic AI · infinite loop detection · divergence detection · early stopping · convergence · agent orchestration · LLM stability · generator-verifier-reviser · feedback-loop control.
Production agent loops almost universally use max_iterations=N as their termination policy. It's the embarrassing default of agentic AI: set N too high and you waste compute after the loop has stopped improving; set it too low and you ship bad output. LoopGain replaces it with a control-theoretic stability monitor based on the Barkhausen criterion, a foundational result from electrical-engineering feedback-oscillator analysis (1921).
```bash
pip install loopgain
```

Pure Python, no runtime dependencies, supports Python 3.10+.
Three lines of code wrap any iterative loop with a measurable error signal:
```python
from loopgain import LoopGain

lg = LoopGain(target_error=0.1)

# `verifier`, `reviser`, and the initial `output` come from your own stack.
while lg.should_continue():
    errors = verifier.verify(output)
    lg.observe(errors, output=output)
    output = reviser.revise(output, errors)

result = lg.result
print(result.outcome)               # "converged" | "oscillating" | "diverged" | "stalled" | "max_iterations"
print(result.best_output)           # the lowest-error iteration's output
print(result.iterations_used)
print(result.gain_margin)           # 1 / max(Aβ_smooth)
print(result.savings_vs_fixed_cap)
```

observe() accepts either a numeric error magnitude or any sequence (whose length becomes the magnitude). Pass output=... to enable the best-so-far buffer.
LoopGain measures empirical loop gain (Aβ = E(n) / E(n-1)) at every iteration and exposes it as a smoothed time series for visualization. The decision engine, however, classifies the full error trajectory using four features:
```
E_ratio   = E_current / E_first        # cumulative reduction
slope_log = OLS slope of log10(E)      # geometric trend direction
slope_p   = t-test p-value of slope    # statistical significance
osc_std   = std of detrended log10(E)  # oscillation magnitude
```
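A from-scratch sketch of the four features, to make the arithmetic concrete. It is not LoopGain's internal code. It assumes every error is strictly positive (log10(0) is undefined), reads "detrended" as the residuals of the OLS fit, uses the population standard deviation for osc_std, and gets the two-sided t-test p-value by numerically integrating Student's t density with only the standard library. The names trajectory_features and t_two_sided_p are illustrative.

```python
import math
from statistics import fmean, pstdev


def t_two_sided_p(t: float, df: int, steps: int = 1000) -> float:
    """Two-sided p-value for Student's t with `df` degrees of freedom.

    Substituting x = sqrt(df) * tan(theta) turns the t density into
    k * cos(theta) ** (df - 1) on [0, pi/2), which Simpson's rule
    integrates accurately even when |t| is huge.
    """
    theta_max = math.atan(abs(t) / math.sqrt(df))
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta_max / steps  # `steps` must be even for Simpson's rule
    total = 1.0 + math.cos(theta_max) ** (df - 1)  # endpoints; cos(0) ** (df - 1) == 1
    total += sum((4 if i % 2 else 2) * math.cos(i * h) ** (df - 1) for i in range(1, steps))
    area = k * total * h / 3  # P(0 <= T <= |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * area))


def trajectory_features(errors: list[float]) -> dict[str, float]:
    """E_ratio, slope_log, slope_p, osc_std for a strictly positive error series."""
    if len(errors) < 3:
        raise ValueError("need at least 3 observations so that df = n - 2 >= 1")
    y = [math.log10(e) for e in errors]  # raises ValueError on a zero error
    x = list(range(len(y)))
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = len(y) - 2
    se = math.sqrt(sum(r * r for r in resid) / df / sxx)  # standard error of the slope
    if se == 0.0:  # perfect fit: any nonzero slope counts as significant
        p = 0.0 if slope != 0.0 else 1.0
    else:
        p = t_two_sided_p(slope / se, df)
    return {
        "E_ratio": errors[-1] / errors[0],  # cumulative reduction
        "slope_log": slope,  # geometric trend, log10 units per iteration
        "slope_p": p,  # significance of the trend
        "osc_std": pstdev(resid),  # spread around the trend line
    }
```

A clean geometric decay such as [8, 4, 2, 1, 0.5, 0.25] yields slope_log = log10(0.5), a p-value near zero, and osc_std near zero; an alternating [1, 2, 1, 2, 1, 2] yields an insignificant slope with osc_std around 0.14.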
It routes the trajectory into one of five named states:
| State | Condition | Action |
|---|---|---|
| FAST_CONVERGE | cumulative reduction to ≤ 10% of E_first | Continue, predict ETA |
| CONVERGING | negative slope with p < 0.05, OR cumulative ≤ 50% | Continue, watch for upward drift |
| STALLING | no significant slope, no detectable oscillation | Stop after 2 consecutive readings; return best-so-far |
| OSCILLATING | high residual variance with flat trend | Stop; return best-so-far |
| DIVERGING | positive slope with p < 0.05 AND cumulative > 110% | Abort; roll back to best-so-far |
Plus a short-circuit: if the observed error drops to or below target_error, the loop stops immediately with state TARGET_MET. The default target_error=0.0 short-circuits only on exactly zero error, the natural completion signal for verifier-driven loops. Pass target_error=None to disable the short-circuit and rely on stability detection alone.
The decision is conservative by design: requiring both statistical significance and meaningful cumulative motion before terminating prevents false-positive aborts on noisy real-LLM error series. Validated at 98.8% macro-averaged accuracy across 5 regimes on N=1000 deterministic-mock trajectories (see RESULTS_v2_classifier.md). The STALLING ceiling of ~94% is the t-test's irreducible 5% type-I error rate, not a classifier weakness.
Recommended minimum: 6 iterations for reliable trend significance. At n ≤ 4 (df = n − 2 ≤ 2) the t-test is severely underpowered (df=2 requires |t| > 4.3 for p < 0.05), so the classifier conservatively falls back to STALLING when evidence is thin. The thresholds are derived analytically (control theory plus statistical convention), not fitted; tune them per domain via the trajectory_thresholds argument (a TrajectoryThresholds instance) once you have production traces.
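Where the 4.3 comes from: the slope test has ν = n − 2 degrees of freedom, so n = 4 gives ν = 2, and at ν = 2 Student's t has a closed-form two-sided p-value. Solving it for p = 0.05:

```latex
p(t) = 1 - \frac{|t|}{\sqrt{2 + t^{2}}}, \qquad
p = 0.05 \;\Longrightarrow\; t^{2} = \frac{2\,(0.95)^{2}}{1 - (0.95)^{2}} = \frac{1.805}{0.0975} \approx 18.51
\;\Longrightarrow\; |t| \approx 4.30
```

At 6 iterations (ν = 4) the two-sided critical value drops to about 2.78, one reason 6 is a more comfortable minimum.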
Legacy single-feature classifier: the original v0.1 single-Aβ-band classifier (thresholds 0.3 / 0.85 / 0.95 / 1.05) is still available via LoopGain(classifier='legacy_bands') for callers that have empirically tuned the bands to a specific workload.
When the loop is converging (Aβ_smooth < 1), LoopGain produces a closed-form prediction of iterations remaining:
n_remaining = log(E_target / E_current) / log(Aβ_smooth)
Available as lg.eta mid-loop. Returns None when the prediction isn't well-defined (no Aβ yet, target zero, or non-converging gain).
LoopGain keeps a buffer of all observed outputs paired with their error scores. On termination it returns argmin(error), not the last iteration:
| Terminal state | Returned output |
|---|---|
| TARGET_MET | Current output (by definition, the best) |
| STALLING | Lowest-error iteration in the buffer |
| OSCILLATING | Lowest-error iteration in the buffer |
| DIVERGING | Lowest-error iteration (which is not the last one) |
This transforms divergence detection from "abort with garbage" into "abort with the best you've seen so far" — a free quality floor.
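The buffer itself is simple. A minimal sketch, assuming outputs are kept in memory alongside their errors; BestSoFar and its methods are illustrative names, not LoopGain's API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BestSoFar:
    """Every observed (error, output) pair; best() is argmin(error), not the last entry."""

    errors: list[float] = field(default_factory=list)
    outputs: list[Any] = field(default_factory=list)

    def add(self, error: float, output: Any) -> None:
        self.errors.append(error)
        self.outputs.append(output)

    def best(self) -> tuple[int, Any, float]:
        if not self.errors:
            raise ValueError("no observations yet")
        # min() returns the first minimal index, so earlier iterations win ties.
        i = min(range(len(self.errors)), key=self.errors.__getitem__)
        return i, self.outputs[i], self.errors[i]
```

On a diverging run with errors 5, 2, 3, 2, 9, best() returns index 1 (error 2), not the final error-9 output.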
LoopGain(target_error=0.0, max_iterations=None, thresholds=None, trajectory_thresholds=None, classifier='trajectory', smoothing_window=3, assumed_fixed_cap=10)
Construct the monitor.
- target_error: Stop when an observed error drops to or below this value. The default 0.0 short-circuits on exactly zero error (the natural completion signal for verifier-driven loops). Pass None to disable the short-circuit entirely.
- max_iterations: Hard safety cap. Default None (rely on stability detection). Recommended ~20–50 for production.
- thresholds: Custom ThresholdBands for the legacy single-Aβ-band classifier. Ignored when classifier='trajectory'.
- trajectory_thresholds: Custom TrajectoryThresholds for the multi-feature classifier (the default). Override only with workload-specific evidence.
- classifier: 'trajectory' (default, v0.2 multi-feature classifier) or 'legacy_bands' (v0.1 single-Aβ-band classifier).
- smoothing_window: EMA window for the smoothed Aβ series (always maintained for visualization, regardless of classifier choice). Default 3.
- assumed_fixed_cap: Used to compute savings_vs_fixed_cap. Default 10.
- lg.observe(errors, output=...): Record this iteration's errors and optional output. Returns the current state name. errors accepts a number (used directly) or any sequence (its length is used as the magnitude).
- lg.should_continue(): Returns False once a terminal state fires.
- Current state: One of INIT, FAST_CONVERGE, CONVERGING, STALLING, OSCILLATING, DIVERGING, TARGET_MET, MAX_ITERATIONS. The matching result.outcome is one of converged, oscillating, diverged, stalled (v0.2 trajectory mode only: STALLING that persists for 2 consecutive readings), max_iterations, or in_progress while the loop is still running.
- lg.eta: Predicted iterations to reach the target. None when not well-defined.
- Gain margin: 1 / max(Aβ_smooth). A value > 1 means stable headroom across the entire run.
- lg.result: Terminal result with outcome, iterations_used, best_index, best_output, best_error, convergence_profile, error_history, gain_margin, savings_vs_fixed_cap. Safe to call mid-loop.
lg.send_telemetry(endpoint, token, workload_id=None, timeout=2.0, allow_insecure=False, framework=None, loop_type=None, team=None, include_per_iteration=True) -> bool
Opt-in. Send a single anonymized telemetry POST after the loop terminates. Best-effort — never raises, returns True on 2xx, False otherwise. Adapters auto-stamp framework; loop_type and team are free-form labels that surface as filters in the dashboard. Pass include_per_iteration=False to send aggregate summary only.
```python
import os

from loopgain import LoopGain

lg = LoopGain(target_error=0.1)
# ... run the loop ...
lg.send_telemetry(
    endpoint=os.environ["LOOPGAIN_TELEMETRY_ENDPOINT"],  # or hardcode
    token=os.environ["LOOPGAIN_TELEMETRY_TOKEN"],        # never hardcode
    workload_id="my-rag-pipeline",                       # opaque label
)
```

Recommended setup: store the token outside source. Two clean options:
```bash
# Option A: environment variable (simplest)
export LOOPGAIN_TELEMETRY_ENDPOINT="https://telemetry.loopgain.ai/v1/aggregate"
export LOOPGAIN_TELEMETRY_TOKEN="lgk_..."  # add to ~/.zshrc or ~/.bashrc

# Option B: macOS Keychain via keyring (more secure); getpass keeps the token off the screen
pip install keyring
python3 -c "import getpass, keyring; keyring.set_password('loopgain', 'telemetry', getpass.getpass('Token: '))"
# Then in code: keyring.get_password('loopgain', 'telemetry')
```

What is sent: state transitions, Aβ summary (min/max/median), gain margin, rollback flag, iterations used, savings, library version, optional opaque workload_id, threshold config, hour-bucketed timestamp.
What is NEVER sent: prompts, completions, error contents, output buffer, individual Aβ values, or any customer identity beyond the bearer token. The privacy contract is enforced by the payload-shape unit tests in tests/test_telemetry.py.
The hosted endpoint at telemetry.loopgain.ai is one acceptable destination. The receiver and dashboard are both open-source — self-host to keep telemetry fully under your control.
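The best-effort contract of send_telemetry (never raises, True only on a 2xx) is easy to reproduce with the standard library. In this sketch the function name post_best_effort, the JSON body, the Bearer header, and reading allow_insecure as "permit plain http://" are all assumptions; it is not the library's implementation.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import urlparse


def post_best_effort(
    endpoint: str,
    token: str,
    payload: dict[str, Any],
    timeout: float = 2.0,
    allow_insecure: bool = False,
) -> bool:
    """POST `payload` as JSON once. True on a 2xx response, False otherwise; never raises."""
    try:
        scheme = urlparse(endpoint).scheme
        if scheme != "https" and not (allow_insecure and scheme == "http"):
            return False  # assumption: plain http:// only when explicitly allowed
        request = urllib.request.Request(
            endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",  # assumption: bearer-token auth
            },
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except Exception:  # HTTPError (non-2xx), URLError, timeouts, unserializable payloads
        return False
```

Catching Exception rather than BaseException keeps Ctrl-C working while still guaranteeing that a telemetry failure can never break the loop that called it.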
Thin wrappers under loopgain.integrations drive each major agent framework's iteration with a LoopGain monitor and auto-stamp framework="<name>" on telemetry. The frameworks themselves are optional dependencies — install the extra you need:
```bash
pip install 'loopgain[langgraph]'         # LangGraph
pip install 'loopgain[crewai]'            # CrewAI
pip install 'loopgain[autogen]'           # AutoGen v0.4+
pip install 'loopgain[langchain]'         # LangChain (create_agent / AgentExecutor)
pip install 'loopgain[openai-agents]'     # OpenAI Agents SDK
pip install 'loopgain[claude-agent-sdk]'  # Anthropic Claude Agent SDK
pip install 'loopgain[all]'               # all six
```

All adapters take a LoopGain instance plus an error_fn you provide. The framework doesn't know what your error signal is, so the adapter doesn't either. error_fn returns a non-negative number (or None to skip an iteration).
Drives graph.stream(input, stream_mode="updates"). Each update is one iteration.
```python
import os

from loopgain import LoopGain
from loopgain.integrations import LangGraphAdapter

# build_my_verify_revise_graph() and `initial` are your own graph builder and first draft.
graph = build_my_verify_revise_graph().compile()
lg = LoopGain(target_error=0.1, max_iterations=20)

adapter = LangGraphAdapter(
    lg=lg,
    # stream_mode="updates" yields {node_name: state_update}; count the verifier's errors.
    error_fn=lambda update: len(update.get("verifier", {}).get("errors", [])),
)
final_state = adapter.run(graph, {"draft": initial})

lg.send_telemetry(
    endpoint=os.environ["LOOPGAIN_TELEMETRY_ENDPOINT"],
    token=os.environ["LOOPGAIN_TELEMETRY_TOKEN"],
    workload_id="rag-rewrite",
    framework=adapter.framework_name,  # "langgraph", auto-stamped
)
```

adapter.stream(...) yields each item if you want the full trace; adapter.arun(...) / adapter.astream(...) are the async counterparts and accept an async error_fn.
Installs step_callback and/or task_callback on a Crew. Pick whichever granularity matches your loop — step_error_fn for refinement within a Task, task_error_fn for refinement across Tasks.
```python
from crewai import Crew

from loopgain import LoopGain
from loopgain.integrations import CrewAIAdapter

lg = LoopGain(target_error=0.1, max_iterations=20)
adapter = CrewAIAdapter(
    lg=lg,
    # count_failed_checks is your own scorer for a Task's raw output.
    task_error_fn=lambda task_output: count_failed_checks(task_output.raw),
)

crew = Crew(agents=[...], tasks=[...])
adapter.install(crew)
result = crew.kickoff()
adapter.uninstall()  # or use `with CrewAIAdapter(...) as a:` context

lg.send_telemetry(
    endpoint=...,
    token=...,
    framework=adapter.framework_name,  # "crewai"
)
```

The adapter chains with any callback you already had installed, so your existing instrumentation isn't overwritten.
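One generic way to chain with a previously installed callback is to wrap the old hook and the new one in a single callable. This is a sketch of the pattern, assuming each hook is a single-argument callable slot; chain_callbacks is a hypothetical helper, not the adapter's code.

```python
from typing import Any, Callable, Optional

Callback = Callable[[Any], Any]


def chain_callbacks(existing: Optional[Callback], ours: Callback) -> Callback:
    """Combine a previously installed hook with a new one without dropping either."""

    def chained(payload: Any) -> None:
        if existing is not None:
            existing(payload)  # the caller's instrumentation runs first, unchanged
        ours(payload)  # then the monitor sees the same payload

    return chained
```

Uninstalling then only requires restoring the saved original hook, which is why keeping a reference to it matters.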
Wraps team.run_stream(task=...). In a verify-revise rotation, filter to the verifier's messages with observe_sources={"verifier"} so only it drives observe().
```python
import asyncio

from autogen_agentchat.teams import RoundRobinGroupChat

from loopgain import LoopGain
from loopgain.integrations import AutoGenAdapter


async def main() -> None:
    # `generator` and `verifier` are your own AutoGen agents;
    # parse_verifier_score is your own parser for the verifier's reply.
    team = RoundRobinGroupChat(participants=[generator, verifier])
    lg = LoopGain(target_error=0.1, max_iterations=20)
    adapter = AutoGenAdapter(
        lg=lg,
        error_fn=lambda msg: parse_verifier_score(msg.content),
        observe_sources={"verifier"},
    )
    result = await adapter.run(team, task="...")
    lg.send_telemetry(
        endpoint=...,
        token=...,
        framework=adapter.framework_name,  # "autogen"
    )


asyncio.run(main())
```

Pass a cancellation_token to adapter.run(...) and the adapter will cancel it when LoopGain reaches a terminal state (target met, oscillation, divergence). The legacy v0.2 ConversableAgent.initiate_chat API is not supported; use the v0.4 event-driven runtime.
Duck-types against any LangChain agent that exposes .stream(input, **kwargs) / .astream(input, **kwargs) — both the current langchain.agents.create_agent() (v1+) and the legacy AgentExecutor. The adapter forwards **stream_kwargs verbatim, so the chunk shape your error_fn sees is the one your agent emits.
```python
from langchain.agents import create_agent

from loopgain import LoopGain
from loopgain.integrations import LangChainAdapter

agent = create_agent(model="gpt-5-nano", tools=[get_weather])
lg = LoopGain(target_error=0.0, max_iterations=20)


def error_fn(chunk):
    if chunk.get("type") != "updates":
        return None
    # Count nodes whose latest message still carries tool calls;
    # drops to 0 once the agent stops calling tools.
    count = 0
    for update in chunk["data"].values():
        messages = (update or {}).get("messages") or [None]  # tolerate empty updates
        if getattr(messages[-1], "tool_calls", None):
            count += 1
    return count


adapter = LangChainAdapter(lg=lg, error_fn=error_fn)
final = adapter.run(
    agent,
    {"messages": [{"role": "user", "content": "What's the weather?"}]},
    stream_mode="updates",
    version="v2",
)

lg.send_telemetry(
    endpoint=...,
    token=...,
    framework=adapter.framework_name,  # "langchain"
)
```

For legacy AgentExecutor, drop the stream_mode / version kwargs; each yielded chunk is an AddableDict per step (parse intermediate_steps or the terminal output key in your error_fn).
Wraps Runner.run_streamed(agent, input).stream_events(). The SDK is async-first; the adapter mirrors that. A run_sync helper wraps the async path with asyncio.run for synchronous callers.
```python
import asyncio

from agents import Agent, function_tool

from loopgain import LoopGain
from loopgain.integrations import OpenAIAgentsAdapter

agent = Agent(name="Reviser", instructions="...", tools=[...])
lg = LoopGain(target_error=0.0, max_iterations=20)


def error_fn(event):
    # By default only run_item_stream_event reaches error_fn; pull the
    # verifier tool's reported failure count off its output.
    if event.item.type == "tool_call_output_item":
        output = event.item.output
        # Assumes your verifier tool returns a dict such as {"failures": 3}.
        if isinstance(output, dict):
            return float(output.get("failures", 0))
    return None


async def main() -> None:
    adapter = OpenAIAgentsAdapter(lg=lg, error_fn=error_fn)
    result = await adapter.run(agent, input="Fix the bug.")
    print(result.final_output)
    lg.send_telemetry(
        endpoint=...,
        token=...,
        framework=adapter.framework_name,  # "openai-agents"
    )


asyncio.run(main())
```

By default the adapter only forwards run_item_stream_event to error_fn; pass observe_event_types=None to see every event (including raw token deltas and agent-handoff notifications), in which case your error_fn must also handle events that have no .item. When LoopGain reaches a terminal state, the adapter makes a best-effort call to .cancel() on the underlying RunResultStreaming.
Wraps Anthropic's claude_agent_sdk.query(prompt=..., options=...) async iterator. By default observes only AssistantMessage (skips UserMessage / SystemMessage / ResultMessage); override with observe_message_types=None or a custom tuple.
```python
import asyncio

from claude_agent_sdk import ClaudeAgentOptions, TextBlock

from loopgain import LoopGain
from loopgain.integrations import ClaudeAgentSDKAdapter


def error_fn(message):
    # Count `FAIL:` markers a self-verifying persona emits, across every text block.
    texts = [b.text for b in getattr(message, "content", []) if isinstance(b, TextBlock)]
    if not texts:
        return None  # no text in this message; skip it
    return float(sum(t.count("FAIL:") for t in texts))


async def main() -> None:
    lg = LoopGain(target_error=0.0, max_iterations=20)
    adapter = ClaudeAgentSDKAdapter(lg=lg, error_fn=error_fn)
    options = ClaudeAgentOptions(system_prompt="Self-verify each draft.")
    result = await adapter.run(
        prompt="Write a haiku about feedback loops.",
        options=options,
    )
    lg.send_telemetry(
        endpoint=...,
        token=...,
        framework=adapter.framework_name,  # "claude-agent-sdk"
    )


asyncio.run(main())
```

For the bidirectional ClaudeSDKClient use case, pass message_iterator=client.receive_messages() instead of prompt=....
For frameworks without an adapter, the raw LoopGain.observe() API works against any iterable. The adapters are 100-200 lines each — copy one of loopgain/integrations/{langgraph,crewai,autogen,langchain,openai_agents,claude_agent_sdk}.py as a starting point.
Initial public release. Core library shipped (for the current version, see loopgain on PyPI). Framework adapters (LangGraph, CrewAI, AutoGen, LangChain, OpenAI Agents SDK, Claude Agent SDK) are installable as optional extras. The cloud-aggregator telemetry receiver and dashboard are live as separate open-source repos. The math and the API surface are stable.
This is alpha software. The API may break before 1.0 if production usage surfaces design issues; pin the version.
LoopGain applies the Barkhausen stability criterion (Heinrich Barkhausen, 1921 — the foundational result on when feedback amplifiers oscillate) to AI agent feedback loops. The criterion was originally a way to predict whether an electronic oscillator would sustain oscillation; it turns out to map cleanly onto any feedback loop you can attach an error signal to.
The cleanest summary: an iterative AI loop with a measurable error signal is a feedback system. The ratio E(n) / E(n-1) is its empirical loop gain. The Barkhausen result tells you that a loop gain below 1 converges, exactly 1 oscillates, and above 1 diverges. LoopGain operationalizes this: it classifies the loop's current state, decides what to do, and predicts when you'll converge.
Loop types this applies to in practice:
- Verify-revise loops (GVR pattern) — generator produces, verifier finds issues, reviser fixes. Error = issue count or severity-weighted score.
- Refinement loops — initial output, iterate to improve. Error = distance from target spec / rubric score.
- Tool-use retry chains — agent calls tool, gets back error/success, retries. Error = consecutive failure count or aggregate score.
- RAG with self-correction — retrieve, generate, critique, re-retrieve. Error = critique severity or hallucination score.
- Code generation with linter/test feedback — generate, run tests/linter, fix, repeat. Error = failing test count or linter violation count.
- Multi-step reasoning loops — ReAct-style think/act/observe iterations. Error = whatever the agent's quality assessor returns.
- Custom feedback loops — anything where you can produce a number that should drop toward zero as the loop succeeds.