miharcan/ai_agents

AI Agents Playground — Runtime & Infrastructure Reasoning

A minimal, end‑to‑end demonstration of modern agent architectures for runtime and infrastructure diagnostics, built around real system logs, OpenStack control‑plane events, and swappable LLM backends.

This repository is a reference implementation showing how agents can reason over operational evidence.


Setup

cd Agents
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

If you plan to use local models, install Ollama first and confirm the CLI is available:

ollama --version

Configuration

Create a .env file in the repo root:

OPENAI_API_KEY=your_openai_api_key_here

Notes:

  • OPENAI_API_KEY is required when --llm openai is used.
  • Ollama backends (tiny, mistral, llama3, phi3, qwen14) require the ollama CLI.
  • Optional: set HF_TOKEN to improve Hugging Face download reliability/rate limits.
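
As a rough illustration of the first note, the key check can be done before any agent runs. This is a minimal sketch, not the repo's actual loader: `load_env_file` and `require_openai_key` are hypothetical names, and the parser only handles plain KEY=VALUE lines.

```python
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_openai_key(llm: str, env: dict[str, str]) -> None:
    """Fail early if the OpenAI backend is selected without a key (hypothetical helper)."""
    if llm == "openai" and not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required when --llm openai is used")
```

Calling `require_openai_key("openai", load_env_file())` before building the agent would surface a missing key as one clear error instead of a later auth failure.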

Why

Many “AI troubleshooting” tools produce confident answers even when the evidence needed to support them is missing.

This project does the opposite:

  • Enforces evidence scope
  • Detects evidence gaps
  • Separates baseline vs incident behavior
  • Guides the next diagnostic step
  • Refuses to guess

The result is an agent that behaves like a senior infrastructure engineer, not a chatbot.


What this repo demonstrates

Agent architectures

  • Single Agent – plain LLM reasoning (no tools)
  • ReAct Agent – reasoning + tool use
  • RAG Agent – retrieval‑augmented reasoning
  • Multi‑Agent – orchestration patterns

Runtime reasoning (core focus)

  • Linux boot and system logs
  • OpenStack control‑plane and service logs
  • Baseline vs abnormal comparison
  • Subsystem‑aware diagnostics (API, compute, MQ, DB)
  • Cross‑layer reasoning (host ↔ control plane)

Knowledge reasoning (secondary)

  • arXiv research corpus
  • Isolated from runtime evidence
  • Explicit domain selection

Key design principles

1. Evidence‑first reasoning

Agents reason only from retrieved runtime evidence. If data is missing, the agent says so.

2. Explicit scope & gaps

Every answer declares:

  • what evidence was used
  • what evidence is missing
  • what would be needed next
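
One way to picture such an answer is as a small record with those three fields. The sketch below is an assumption about shape only: `ScopedAnswer` and its field names are hypothetical and are not taken from the repo's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScopedAnswer:
    """An answer that always carries its evidence scope (hypothetical structure)."""

    conclusion: Optional[str]
    evidence_used: list[str] = field(default_factory=list)
    evidence_missing: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def render(self) -> str:
        # A missing conclusion is stated explicitly rather than left blank.
        lines = [f"Conclusion: {self.conclusion or 'none (insufficient evidence)'}"]
        for title, items in (
            ("Evidence used", self.evidence_used),
            ("Evidence missing", self.evidence_missing),
            ("Needed next", self.next_steps),
        ):
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in (items or ["(none)"]))
        return "\n".join(lines)
```

Making `conclusion` optional keeps "no conclusion" a first-class outcome, so a caller cannot render an answer without also rendering its gaps.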

3. No hallucination paths

If logs do not support a conclusion, no conclusion is drawn.

4. Runtime ≠ Knowledge

Operational logs and research papers are treated as separate epistemic domains.


Repository structure

Agents/
├── core/                # Agent kernel, ReAct logic, tool routing
├── rag/                 # Retrieval layers (Linux, OpenStack, arXiv, comparison)
├── sources/             # Ingestion & normalization code (Linux / OpenStack)
│   ├── linux/
│   └── openstack/
├── data/                # Runtime artifacts & evidence (gitignored)
│   ├── arxiv_index/     # FAISS index for research knowledge
│   ├── linux_index/     # FAISS index for Linux runtime logs
│   └── sources/
│       ├── linux/       # Raw Linux logs
│       └── openstack/   # Raw OpenStack logs (normal / abnormal)
├── execution/           # LLM runtimes (local / OpenAI)
├── examples/            # Runnable agent demos (single / ReAct / RAG / multi)
├── orchestration/       # Multi-agent / graph experiments
├── tests/               # Kernel and routing tests
├── run.py               # Unified CLI entrypoint
└── README.md

Agent Modes

Single Agent

Direct question → LLM answer.

python run.py --mode single --query "What is OpenStack?"

Used only as a baseline.


ReAct Agent (Core of the Project)

A reasoning loop that:

  • thinks
  • retrieves evidence
  • observes
  • reasons again

python run.py --mode react --domain runtime \
  --query "Why is OpenStack unstable?"

This is where most of the interesting work happens.
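
The think / retrieve / observe / reason cycle can be sketched in a few lines. This is a generic ReAct-style loop under stated assumptions, not the kernel in core/: `react_loop` is a hypothetical name, and the "SEARCH:" / "ANSWER:" string protocol between the model and the loop is invented for the sketch.

```python
from typing import Callable


def react_loop(
    query: str,
    think: Callable[[str, list[str]], str],
    retrieve: Callable[[str], list[str]],
    max_steps: int = 3,
) -> tuple[str, list[str]]:
    """Alternate between thinking and retrieving until the model answers.

    `think` is expected to return "SEARCH: <terms>" or "ANSWER: <text>".
    """
    observations: list[str] = []
    for _ in range(max_steps):
        thought = think(query, observations)
        if thought.startswith("ANSWER:"):
            return thought[len("ANSWER:"):].strip(), observations
        if thought.startswith("SEARCH:"):
            terms = thought[len("SEARCH:"):].strip()
            # Observations accumulate and are fed back into the next thought.
            observations.extend(retrieve(terms))
    # The step budget ran out without an answer: report that, do not guess.
    return "No conclusion: step budget exhausted.", observations
```

Passing `think` and `retrieve` as plain callables means the same loop can be driven by any backend, or by stubs in a test.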


RAG Agent

Classic retrieval-augmented generation over a document corpus (e.g. arXiv).

python run.py --mode rag --domain knowledge \
  --query "Why do distributed systems fail?"

Used to explain why a pattern is known — never to invent runtime facts.


Multi-Agent (Optional)

Combines runtime + knowledge agents.

Runtime answers what is happening. Knowledge explains why this pattern is known.


Runtime Diagnostic Mode (The Interesting Part)

Baseline vs Abnormal Comparison

python run.py --mode react --domain runtime \
  --compare normal abnormal \
  --query "Why is OpenStack unstable?"

This forces the agent to:

  • retrieve normal OpenStack behavior
  • retrieve abnormal OpenStack behavior
  • reason only over differences

Environmental facts present in both the normal and abnormal log sets are not allowed as causes.
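
The "differences only" rule can be approximated with a set difference over normalized log lines. This is a minimal sketch rather than the repo's comparison layer: `normalize` and `abnormal_only` are hypothetical helpers, and the timestamp format they strip is an assumption.

```python
import re
from collections import Counter

# Leading ISO-like timestamp such as "2024-01-02 11:00:05.123 " (assumed format).
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?\s*")
_DIGITS = re.compile(r"\d+")


def normalize(line: str) -> str:
    """Strip the timestamp and mask digit runs so recurring messages compare equal."""
    line = _TIMESTAMP.sub("", line.strip())
    return _DIGITS.sub("<n>", line)


def abnormal_only(normal: list[str], abnormal: list[str]) -> list[tuple[str, int]]:
    """Return (pattern, count) for patterns seen in abnormal logs but never in normal ones."""
    baseline = {normalize(line) for line in normal}
    counts = Counter(normalize(line) for line in abnormal)
    return [(pattern, n) for pattern, n in counts.most_common() if pattern not in baseline]
```

Anything that survives `abnormal_only` is a candidate cause. A message that appears in both sets, such as a low-memory warning on old hardware, is filtered out before the model sees it.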


Adding Live Incident Logs

python run.py --mode react --domain runtime \
  --compare normal abnormal \
  --current-logs data/incidents/current.log \
  --query "What is wrong with my cloud-init?"

Key design rule:

Current logs are ephemeral context, not indexed knowledge.

They are injected once, reasoned over, and discarded.
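
A sketch of what "injected once" could look like: the live log is read at call time and merged into the prompt context, with nothing written back to an index. `build_context`, the section headings, and the `max_lines` cap are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def build_context(
    retrieved: list[str],
    current_log_path: Optional[str] = None,
    max_lines: int = 200,
) -> str:
    """Combine indexed evidence with the tail of a live log file.

    The live log is read for this call only; nothing is persisted.
    """
    sections = ["## Indexed evidence", *retrieved]
    if current_log_path is not None:
        lines = Path(current_log_path).read_text(errors="replace").splitlines()
        sections.append("## Current incident logs (ephemeral)")
        # Keep only the most recent lines so the prompt stays bounded.
        sections.extend(lines[-max_lines:])
    return "\n".join(sections)
```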


Evidence Hygiene (Why This Is Different)

This project enforces several non-negotiable rules:

Evidence-Bounded Answers

If the logs don’t show it, the agent won’t invent it.

Absence Is a Signal

If a subsystem should emit logs but doesn’t, the agent can conclude:

“This likely never ran.”
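
A simple way to detect this kind of silence is to check each expected subsystem for at least one matching line. The sketch below assumes a caller-supplied mapping from subsystem name to a marker substring; `silent_subsystems` and that mapping are hypothetical, not part of the repo.

```python
def silent_subsystems(expected: dict[str, str], log_lines: list[str]) -> list[str]:
    """Return the names of expected subsystems whose marker never appears in the logs."""
    return [
        name
        for name, marker in expected.items()
        if not any(marker in line for line in log_lines)
    ]
```

A non-empty result is not proof of failure. It is an evidence gap to report: the subsystem either never ran or its logs were not collected.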

Causal Isolation

In comparison mode:

Only differences between the normal baseline and the abnormal logs may be causal.

Old hardware, low RAM, or kernel quirks present in both are suppressed.

Explicit Evidence Gaps

When evidence is missing, the agent says so and lists what would be needed next.

This is the opposite of hallucination.


LLM Backends (User-Friendly by Design)

The CLI exposes capability tiers, not model internals:

python run.py --llm tiny     # very fast smoke tests
python run.py --llm mistral  # log summarisation
python run.py --llm llama3   # decent local reasoning
python run.py --llm phi3     # strong local diagnostics
python run.py --llm qwen14   # stronger local diagnostics
python run.py --llm openai   # strongest reasoning

Under the hood (via Ollama):

CLI flag   Actual model
tiny       phi3:mini
mistral    mistral
llama3     llama3:8b
phi3       phi3:medium
qwen14     qwen2.5:14b

Missing local models are downloaded automatically by Ollama.
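
The flag-to-model mapping above amounts to a lookup table. The sketch below restates it in code; the model tags are the ones listed in this README, while `OLLAMA_MODELS`, `resolve_backend`, and the returned tuple shape are hypothetical. The OpenAI model name is not stated here, so it is left as `None`.

```python
from typing import Optional

# CLI flag -> Ollama model tag, as listed in this README.
OLLAMA_MODELS = {
    "tiny": "phi3:mini",
    "mistral": "mistral",
    "llama3": "llama3:8b",
    "phi3": "phi3:medium",
    "qwen14": "qwen2.5:14b",
}


def resolve_backend(flag: str) -> tuple[str, Optional[str]]:
    """Map a --llm flag to (provider, model tag); hypothetical helper."""
    if flag == "openai":
        return ("openai", None)  # model name is not specified in this document
    if flag in OLLAMA_MODELS:
        return ("ollama", OLLAMA_MODELS[flag])
    valid = ", ".join([*OLLAMA_MODELS, "openai"])
    raise ValueError(f"Unknown --llm value {flag!r}; expected one of: {valid}")
```

Keeping the mapping in one table means a model tag can be changed without touching agent logic.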


Troubleshooting

ModuleNotFoundError: llama_index.embeddings.huggingface

Install all requirements (includes plugin packages):

source venv/bin/activate
pip install -r requirements.txt

If needed, install missing plugins directly:

pip install llama-index-embeddings-huggingface llama-index-vector-stores-faiss

OPENAI_API_KEY errors or auth failures

  • Verify .env exists in repo root.
  • Verify OPENAI_API_KEY is set correctly.
  • Re-run with --llm openai.

Ollama errors (command not found, model not available, empty output)

  • Confirm Ollama is installed and running.
  • Pull/run a model manually:

ollama run mistral

Cloud-init diagnosis returns "no evidence found"

This usually means the current evidence set does not include cloud-init output. Collect and pass richer incident logs (for example cloud-init, systemd, and relevant OpenStack service logs), then rerun:

python run.py --mode react --domain runtime \
  --compare normal abnormal \
  --current-logs data/incidents/current.log \
  --query "What is wrong with my cloud-init?"

Why Multiple Models Matter

Small local models are great for:

  • speed
  • summaries
  • iteration

They are not good at:

  • causal reasoning
  • absence-of-evidence inference
  • epistemic constraints

This project is designed to expose those limits, not hide them.

You can route serious diagnostic questions to stronger models without changing agent logic.


Reproducibility & Determinism

  • No hidden state
  • No silent retries
  • No background learning
  • Every decision is inspectable

If the agent gives a bad answer, you can trace why.
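
Inspectability of this kind usually comes down to recording every step in order. The sketch below shows one possible shape for such a trace; `Trace`, `TraceStep`, and the JSON layout are assumptions, not the repo's actual format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TraceStep:
    step: int
    action: str  # e.g. "think", "retrieve", "answer"
    detail: str


@dataclass
class Trace:
    """Append-only record of what the agent did, in order (hypothetical structure)."""

    steps: list[TraceStep] = field(default_factory=list)

    def record(self, action: str, detail: str) -> None:
        self.steps.append(TraceStep(len(self.steps) + 1, action, detail))

    def dump(self) -> str:
        # Plain JSON so a trace can be diffed between runs.
        return json.dumps([asdict(s) for s in self.steps], indent=2)
```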


License

MIT

About

A modular agent framework for OpenStack-centric runtime analytics, reasoning over Linux and OpenStack logs using modern agent architectures.
