AI Agents Playground — Runtime & Infrastructure Reasoning

A minimal, end‑to‑end demonstration of modern agent architectures for runtime and infrastructure diagnostics, built around real system logs, OpenStack control‑plane events, and swappable LLM backends.

This repository is a reference implementation showing how agents can reason over operational evidence.

Setup

cd Agents
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

If you plan to use local models, install Ollama first and confirm the CLI is available:

ollama --version

Configuration

Create a .env file in the repo root:

OPENAI_API_KEY=your_openai_api_key_here

Notes:

OPENAI_API_KEY is required when --llm openai is used.
Ollama backends (tiny, mistral, llama3, phi3, qwen14) require the ollama CLI.
Optional: set HF_TOKEN to improve Hugging Face download reliability/rate limits.
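
The key requirement above can be checked before any request is sent. The sketch below is illustrative only and is not the repository's code: parse_env_text and require_key are hypothetical helpers, and the parser handles nothing beyond simple KEY=VALUE lines.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(env: Dict[str, str], llm: str) -> None:
    """Fail early if the OpenAI backend is selected without a key."""
    if llm == "openai" and not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required when --llm openai is used")


env = parse_env_text("OPENAI_API_KEY=your_openai_api_key_here\n# HF_TOKEN is optional\n")
require_key(env, "openai")   # passes: the key is present
require_key({}, "mistral")   # passes: local backends need no key
```

Failing at startup keeps a missing key from surfacing later as an opaque authentication error.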

Why

Many “AI troubleshooting” tools produce confident answers even when the evidence is missing.

This project does the opposite:

Enforces evidence scope
Detects evidence gaps
Separates baseline vs incident behavior
Guides the next diagnostic step
Refuses to guess

The result is an agent that behaves like a senior infrastructure engineer, not a chatbot.

What this repo demonstrates

Agent architectures

Single Agent – plain LLM reasoning (no tools)
ReAct Agent – reasoning + tool use
RAG Agent – retrieval‑augmented reasoning
Multi‑Agent – orchestration patterns

Runtime reasoning (core focus)

Linux boot and system logs
OpenStack control‑plane and service logs
Baseline vs abnormal comparison
Subsystem‑aware diagnostics (API, compute, MQ, DB)
Cross‑layer reasoning (host ↔ control plane)

Knowledge reasoning (secondary)

arXiv research corpus
Isolated from runtime evidence
Explicit domain selection

Key design principles

1. Evidence‑first reasoning

Agents reason only from retrieved runtime evidence. If data is missing, the agent says so.

2. Explicit scope & gaps

Every answer declares:

what evidence was used
what evidence is missing
what would be needed next
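
One way to make these three declarations hard to skip is to carry them in the answer type itself. This is a minimal sketch under assumptions: DiagnosticAnswer and its field names are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiagnosticAnswer:
    """An answer that always carries its evidence scope (hypothetical type)."""
    conclusion: str
    evidence_used: List[str] = field(default_factory=list)
    evidence_missing: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)

    def render(self) -> str:
        def section(title: str, items: List[str]) -> str:
            # An empty section is printed explicitly rather than omitted.
            body = "\n".join(f"  - {item}" for item in items) or "  - (none)"
            return f"{title}:\n{body}"

        return "\n".join([
            f"Conclusion: {self.conclusion}",
            section("Evidence used", self.evidence_used),
            section("Evidence missing", self.evidence_missing),
            section("Needed next", self.next_steps),
        ])


answer = DiagnosticAnswer(
    conclusion="No conclusion: the supplied logs contain no cloud-init output.",
    evidence_used=["data/incidents/current.log"],
    evidence_missing=["cloud-init output"],
    next_steps=["Collect cloud-init logs and rerun"],
)
report = answer.render()
```

Because every section is always rendered, a reader can tell "nothing missing" apart from "not reported".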

3. No hallucination paths

If logs do not support a conclusion, no conclusion is drawn.

4. Runtime ≠ Knowledge

Operational logs and research papers are treated as separate epistemic domains.
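
The separation can be pictured as a routing table in which each domain resolves to its own indexes and nothing else. The sketch below is an assumption about how such routing could look, not the repository's implementation. Domain and indexes_for are hypothetical names, and the two index paths are the only ones shown in the repository structure.

```python
from enum import Enum
from typing import Dict, Tuple


class Domain(Enum):
    RUNTIME = "runtime"
    KNOWLEDGE = "knowledge"


# Assumed mapping; a real runtime domain may cover more indexes than this.
INDEXES: Dict[Domain, Tuple[str, ...]] = {
    Domain.RUNTIME: ("data/linux_index",),
    Domain.KNOWLEDGE: ("data/arxiv_index",),
}


def indexes_for(domain_flag: str) -> Tuple[str, ...]:
    """Resolve a --domain value to its indexes; unknown values raise ValueError."""
    return INDEXES[Domain(domain_flag)]


runtime_indexes = indexes_for("runtime")
knowledge_indexes = indexes_for("knowledge")
```

With no index shared between the two domains, a runtime query cannot silently pull in research text.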

Repository structure

Agents/
├── core/                # Agent kernel, ReAct logic, tool routing
├── rag/                 # Retrieval layers (Linux, OpenStack, arXiv, comparison)
├── sources/             # Ingestion & normalization code (Linux / OpenStack)
│   ├── linux/
│   └── openstack/
├── data/                # Runtime artifacts & evidence (gitignored)
│   ├── arxiv_index/     # FAISS index for research knowledge
│   ├── linux_index/     # FAISS index for Linux runtime logs
│   └── sources/
│       ├── linux/       # Raw Linux logs
│       └── openstack/   # Raw OpenStack logs (normal / abnormal)
├── execution/           # LLM runtimes (local / OpenAI)
├── examples/            # Runnable agent demos (single / ReAct / RAG / multi)
├── orchestration/       # Multi-agent / graph experiments
├── tests/               # Kernel and routing tests
├── run.py               # Unified CLI entrypoint
└── README.md

Agent Modes

Single Agent

Direct question → LLM answer.

python run.py --mode single --query "What is OpenStack?"

Used only as a baseline.

ReAct Agent (Core of the Project)

A reasoning loop that:

thinks
retrieves evidence
observes
reasons again

python run.py --mode react --domain runtime \
  --query "Why is OpenStack unstable?"

This is where most of the interesting work happens.
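
The think, retrieve, observe, reason cycle can be sketched in a few lines. This is a simplified illustration rather than the kernel in core/: react_loop, the "ANSWER:" convention, and both stub functions are hypothetical, and the observation string is made up for the example.

```python
from typing import Callable, List


def react_loop(
    query: str,
    think: Callable[[str, List[str]], str],
    retrieve: Callable[[str], List[str]],
    max_steps: int = 3,
) -> str:
    """Alternate thinking and retrieval until the model commits to an answer."""
    observations: List[str] = []
    for _ in range(max_steps):
        thought = think(query, observations)        # think / reason again
        if thought.startswith("ANSWER:"):
            return thought[len("ANSWER:"):].strip()
        observations.extend(retrieve(thought))      # retrieve, then observe
    return f"Insufficient evidence after {max_steps} steps"


def fake_think(query: str, observations: List[str]) -> str:
    # Stub for an LLM call: answer as soon as any evidence has been observed.
    if observations:
        return "ANSWER: " + observations[-1]
    return "search: message queue errors"


def fake_retrieve(thought: str) -> List[str]:
    # Stub for a retrieval tool; the returned line is illustrative only.
    return ["nova-compute lost its AMQP connection"]


result = react_loop("Why is OpenStack unstable?", fake_think, fake_retrieve)
```

The step limit makes the loop end with an explicit "insufficient evidence" result instead of running indefinitely or guessing.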

RAG Agent

Classic retrieval-augmented generation over a document corpus (e.g. arXiv).

python run.py --mode rag --domain knowledge \
  --query "Why do distributed systems fail?"

Used to explain why a pattern is known — never to invent runtime facts.

Multi-Agent (Optional)

Combines runtime + knowledge agents.

Runtime answers what is happening. Knowledge explains why this pattern is known.

Runtime Diagnostic Mode (The Interesting Part)

Baseline vs Abnormal Comparison

python run.py --mode react --domain runtime \
  --compare normal abnormal \
  --query "Why is OpenStack unstable?"

This forces the agent to:

retrieve normal OpenStack behavior
retrieve abnormal OpenStack behavior
reason only over differences

Environmental facts present in both the normal and the abnormal baseline cannot be reported as causes.
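
Reasoning only over differences amounts to a set difference on normalized log lines. The sketch below assumes a simple leading-timestamp format and exact matching after normalization. candidate_causes and normalize are hypothetical helpers, and the log lines are made up for illustration.

```python
import re
from typing import Iterable, Set

# Assumed format: lines start with "YYYY-MM-DD HH:MM:SS" (optionally fractional).
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+")


def normalize(line: str) -> str:
    """Drop the leading timestamp so identical events compare equal."""
    return _TIMESTAMP.sub("", line.strip())


def candidate_causes(normal: Iterable[str], abnormal: Iterable[str]) -> Set[str]:
    """Return events seen only in the abnormal run."""
    baseline = {normalize(line) for line in normal}
    return {normalize(line) for line in abnormal} - baseline


normal_run = [
    "2024-01-01 10:00:00 WARNING low memory",
    "2024-01-01 10:00:01 INFO nova-api started",
]
abnormal_run = [
    "2024-01-02 11:00:00 WARNING low memory",
    "2024-01-02 11:00:05 ERROR amqp connection refused",
]
causes = candidate_causes(normal_run, abnormal_run)
```

The low-memory warning appears in both runs, so it is dropped from the candidate set even though it looks suspicious on its own.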

Adding Live Incident Logs

python run.py --mode react --domain runtime \
  --compare normal abnormal \
  --current-logs data/incidents/current.log \
  --query "What is wrong with my cloud-init?"

Key design rule:

Current logs are ephemeral context, not indexed knowledge.

They are injected once, reasoned over, and discarded.
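
The rule can be illustrated by building the model's context from two sources while writing to only one of them. This is a hypothetical sketch: EvidenceIndex is a stand-in for a persistent index, build_context is an invented helper, and the sample lines are made up.

```python
from typing import Iterable, List


class EvidenceIndex:
    """Stand-in for a persistent index (the real project uses FAISS indexes)."""

    def __init__(self, docs: Iterable[str]) -> None:
        self._docs = list(docs)

    def search(self, term: str) -> List[str]:
        return [doc for doc in self._docs if term in doc]

    def __len__(self) -> int:
        return len(self._docs)


def build_context(index: EvidenceIndex, term: str, current_logs: List[str]) -> List[str]:
    """Combine indexed evidence with current logs for one query only."""
    indexed = index.search(term)
    # Current logs join the context but are never written back to the index.
    return indexed + [f"[current] {line}" for line in current_logs]


index = EvidenceIndex(["cloud-init finished normally", "nova-api started"])
context = build_context(index, "cloud-init", ["cloud-init: datasource not found"])
size_after = len(index)
```

Tagging the injected lines keeps their origin visible in the prompt, and the index is the same size before and after the query.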

Evidence Hygiene (Why This Is Different)

This project enforces several non-negotiable rules:

Evidence-Bounded Answers

If the logs don’t show it, the agent won’t invent it.

Absence Is a Signal

If a subsystem should emit logs but doesn’t, the agent can conclude:

“This likely never ran.”
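
Detecting this kind of silence needs a list of subsystems that are expected to log. The sketch below assumes plain substring matching on subsystem names. silent_subsystems is a hypothetical helper and the log line is made up.

```python
from typing import List


def silent_subsystems(expected: List[str], log_lines: List[str]) -> List[str]:
    """Return expected subsystems that never appear in the logs."""
    return [
        name for name in expected
        if not any(name in line for line in log_lines)
    ]


expected = ["cloud-init", "systemd"]
lines = ["systemd[1]: Started Network Service."]
missing = silent_subsystems(expected, lines)
```

An empty result says only that each subsystem logged something, not that it ran correctly.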

Causal Isolation

In comparison mode:

Only differences between normal and abnormal baselines may be causal.

Old hardware, low RAM, or kernel quirks shared by both baselines are suppressed.

Explicit Evidence Gaps

When evidence is missing, the agent says so and lists what would be needed next.

This is the opposite of hallucination.

LLM Backends (User-Friendly by Design)

The CLI exposes capability tiers, not model internals:

python run.py --llm tiny     # very fast smoke tests
python run.py --llm mistral  # log summarisation
python run.py --llm llama3   # decent local reasoning
python run.py --llm phi3     # strong local diagnostics
python run.py --llm qwen14   # stronger local diagnostics
python run.py --llm openai   # strongest reasoning

Under the hood (via Ollama):

CLI flag    Actual model
tiny        phi3:mini
mistral     mistral
llama3      llama3:8b
phi3        phi3:medium
qwen14      qwen2.5:14b

Missing local models are downloaded automatically by Ollama.
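
The flag-to-model mapping can be expressed as a lookup table. The model tags below are copied from the table above. resolve_backend and the (provider, model) return shape are assumptions made for illustration, and the hosted model name is left unset because this README does not state it.

```python
from typing import Dict, Optional, Tuple

# Tags taken from the README's flag-to-model table.
OLLAMA_MODELS: Dict[str, str] = {
    "tiny": "phi3:mini",
    "mistral": "mistral",
    "llama3": "llama3:8b",
    "phi3": "phi3:medium",
    "qwen14": "qwen2.5:14b",
}


def resolve_backend(flag: str) -> Tuple[str, Optional[str]]:
    """Map an --llm value to (provider, model tag)."""
    if flag == "openai":
        return ("openai", None)  # hosted model name is not given in the README
    try:
        return ("ollama", OLLAMA_MODELS[flag])
    except KeyError:
        raise ValueError(f"unknown --llm value: {flag!r}") from None


backend = resolve_backend("tiny")
```

Keeping the mapping in one table means a model tag can change without touching agent logic.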

Troubleshooting

ModuleNotFoundError: `llama_index.embeddings.huggingface`

Install all requirements (includes plugin packages):

source venv/bin/activate
pip install -r requirements.txt

If needed, install missing plugins directly:

pip install llama-index-embeddings-huggingface llama-index-vector-stores-faiss

`OPENAI_API_KEY` errors or auth failures

Verify .env exists in repo root.
Verify OPENAI_API_KEY is set correctly.
Re-run with --llm openai.

Ollama errors (`command not found`, model not available, empty output)

Confirm Ollama is installed and running.
Pull/run a model manually:

ollama run mistral

Cloud-init diagnosis returns "no evidence found"

This usually means the current evidence set does not include cloud-init output. Collect and pass richer incident logs (for example cloud-init, systemd, and relevant OpenStack service logs), then rerun:

python run.py --mode react --domain runtime \
  --compare normal abnormal \
  --current-logs data/incidents/current.log \
  --query "What is wrong with my cloud-init?"

Why Multiple Models Matter

Small local models are great for:

speed
summaries
iteration

They are not good at:

causal reasoning
absence-of-evidence inference
epistemic constraints

This project is designed to expose those limits, not hide them.

You can route serious diagnostic questions to stronger models without changing agent logic.

Reproducibility & Determinism

No hidden state
No silent retries
No background learning
Every decision is inspectable

If the agent gives a bad answer, you can trace why.

License

MIT
