This repository contains the source code for the Eternium Agency: a modular, multi-agent system designed to automate, monitor, and manage your homelab infrastructure through natural language commands.
The Eternium Agency is an AI-powered crew of specialist agents orchestrated by a central coordinator. Instead of writing complex scripts or navigating multiple dashboards, you can simply ask the agency to perform tasks. It understands your intent and delegates the work to the right specialist, whether it's checking your Kubernetes cluster, managing your container registry, or querying performance metrics.
The system is built using Python and the Google Agent Development Kit (ADK), and it is designed to be highly modular and configurable.
The agency operates on a hierarchical model. A central Coordinator agent acts as a manager, interpreting user requests and delegating them to the appropriate specialist sub-agent.
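The delegation pattern described above can be illustrated with a small, self-contained sketch. It is not the project's implementation: the real Coordinator relies on an LLM to interpret intent, whereas this toy version matches keywords. The `Specialist` and `Coordinator` classes, the keyword lists, and the handler strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Specialist:
    """Stand-in for a specialist sub-agent (hypothetical structure)."""
    name: str
    keywords: List[str]
    handle: Callable[[str], str]


class Coordinator:
    """Toy coordinator: picks the specialist whose keywords best match a request."""

    def __init__(self, specialists: List[Specialist]) -> None:
        self.specialists = specialists

    def route(self, request: str) -> Optional[Specialist]:
        # Crude tokenisation; the real system would use an LLM instead.
        words = set(request.lower().replace("?", " ").replace(".", " ").split())
        best, best_score = None, 0
        for specialist in self.specialists:
            score = len(words & set(specialist.keywords))
            if score > best_score:
                best, best_score = specialist, score
        return best

    def ask(self, request: str) -> str:
        specialist = self.route(request)
        if specialist is None:
            return "coordinator: no specialist matched this request"
        # Delegate, then prefix the answer with the specialist that produced it.
        return f"{specialist.name}: {specialist.handle(request)}"


coordinator = Coordinator([
    Specialist("kubernetes_expert",
               ["pod", "pods", "namespace", "namespaces", "deployment"],
               lambda request: "would query the Kubernetes API"),
    Specialist("prometheus_analyser",
               ["cpu", "memory", "metrics"],
               lambda request: "would run a PromQL query"),
])

print(coordinator.ask("List all the namespaces in my cluster."))
```

The point of the sketch is the shape of the flow (interpret, pick a specialist, delegate, return the answer), not the matching logic itself.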
- Modular Agent Architecture: Easily enable or disable agents for different services via configuration.
- Natural Language Interface: Interact with your entire homelab through a simple chat interface.
- Multi-Step Reasoning: Capable of complex, multi-step, multi-agent workflows to answer questions or perform tasks.
- Persistent Memory: The Memory Agent uses a Milvus vector database to learn and recall information across restarts.
- Service Integrations: Out-of-the-box support for Kubernetes, Docker, Helm, Prometheus, Harbor, MySQL, and more.
- Containerised & Deployable: Comes with a `Dockerfile` and a Helm chart for easy deployment into any Kubernetes cluster.
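As a rough illustration of what enabling or disabling agents via configuration could look like, the sketch below reads one flag per agent from the environment. The `ENABLE_<AGENT>` variable names are an assumption made for this example and may not match the keys the project actually uses; check `.env.example` for the real ones.

```python
import os
from typing import List, Mapping

# Specialist agent names used by the agency.
# The ENABLE_* variable names below are hypothetical.
KNOWN_AGENTS = [
    "kubernetes_expert", "helm_operator", "docker_agent", "registry_inspector",
    "prometheus_analyser", "mysql_dba", "memory_agent",
]

TRUTHY = {"1", "true", "yes", "on"}


def enabled_agents(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return agents whose ENABLE_<NAME> flag is truthy (enabled when unset)."""
    selected = []
    for name in KNOWN_AGENTS:
        flag = env.get(f"ENABLE_{name.upper()}", "true")
        if flag.strip().lower() in TRUTHY:
            selected.append(name)
    return selected


# Example: switch off the MySQL agent, leave everything else at the default.
print(enabled_agents({"ENABLE_MYSQL_DBA": "false"}))
```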
The Eternium Agency is composed of a Coordinator and a team of specialist sub-agents, each with a distinct role and set of capabilities.
| Agent Name | Description | Key Responsibilities & Tools |
|---|---|---|
| `eternium_coordinator` | The Manager & Dispatcher. It interprets all user requests and delegates tasks to the appropriate specialist. | Natural language understanding; intent classification; task delegation to sub-agents; final response synthesis |
| `kubernetes_expert` | The Infrastructure Engineer. It interacts directly with the Kubernetes API to manage and diagnose cluster resources. | List/describe pods, deployments, etc.; get container logs; scale deployments; patch image versions |
| `helm_operator` | The Application Lifecycle Manager. It manages applications deployed via Helm charts. | List installed applications (releases); get release status and history; upgrade applications to new versions |
| `docker_agent` | The Image Logistics Operator. It manipulates local container images on the host machine. | `docker pull` from public registries; `docker tag` images for the local registry; `docker push` images to the local registry |
| `registry_inspector` | The Quality Assurance & Security Officer. It reads data from the Harbor container registry. | List projects and repositories; list image tags (versions); get vulnerability reports; trigger new image scans |
| `prometheus_analyser` | The Monitoring Analyst. It translates natural language questions into precise PromQL queries. | Formulate and execute PromQL queries; retrieve metrics for CPU, memory, etc.; check the status of monitored services |
| `mysql_dba` | The Database Administrator. It performs administrative tasks on a MySQL server. | Create/drop databases and users; grant/revoke permissions; execute raw SQL queries; perform database backups |
| `memory_agent` | The Librarian & Archivist. It manages the agency's persistent, long-term memory. | Store new facts (`add_to_memory`); retrieve relevant context (`query_memory`); find/delete old memories |
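To make the Memory Agent's store-and-recall behaviour concrete, here is a toy, in-memory sketch. Only the function names `add_to_memory` and `query_memory` come from the table above; their signatures here are guesses. The real agent uses a Milvus vector database, which this example swaps for a plain list and a word-count "embedding" with cosine similarity, so treat it as an illustration of the idea rather than the project's code.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# In-memory stand-in for the vector database (assumption: not Milvus).
_memories: List[Tuple[str, Counter]] = []


def _embed(text: str) -> Counter:
    """Toy embedding: word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def add_to_memory(fact: str) -> None:
    """Store a fact together with its vector."""
    _memories.append((fact, _embed(fact)))


def query_memory(question: str, top_k: int = 1) -> List[str]:
    """Return the top_k stored facts most similar to the question."""
    query = _embed(question)
    ranked = sorted(_memories, key=lambda m: _cosine(query, m[1]), reverse=True)
    return [fact for fact, _ in ranked[:top_k]]


# Illustrative facts only.
add_to_memory("The on-call engineer for the database team is Alice.")
add_to_memory("The media namespace hosts the prowlarr pod.")
print(query_memory("Who is the on-call for the database team?"))
```

Unlike this sketch, a persistent vector store keeps its contents across restarts, which is what allows the agency to recall facts later.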
Before running the agency, ensure the following services are running and accessible:
- A Kubernetes cluster.
- A container runtime on your local machine (Docker) for image manipulation tools.
- A Harbor container registry.
- A Prometheus instance for metrics.
- A Milvus vector database for memory.
- An LLM provider (e.g., a local Ollama server or an OpenAI API key).
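One quick way to confirm that these services are reachable before starting the agency is a plain TCP connection test, sketched below. The hosts and ports in `SERVICES` are placeholders (they are commonly used defaults for those tools, but your setup may differ), and an open port only shows that something is listening, not that the service is healthy.

```python
import socket
from typing import Dict, Tuple


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical endpoints: replace with the hosts and ports of your own services.
SERVICES: Dict[str, Tuple[str, int]] = {
    "prometheus": ("localhost", 9090),
    "milvus": ("localhost", 19530),
    "ollama": ("localhost", 11434),
}


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its TCP port accepted a connection."""
    return {name: is_reachable(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```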
All configuration is managed through a single .env file in the project root.
- Copy the example file: `cp .env.example .env`
- Edit the `.env` file with the URLs and credentials for your specific services.
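After editing the file, it can help to check that nothing required was left blank. The sketch below is a minimal `KEY=VALUE` parser written for illustration; it is not how the project loads its configuration, and the key names in the sample are hypothetical, so consult `.env.example` for the real ones.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical keys, for illustration only.
sample = """
# Service endpoints
PROMETHEUS_URL=http://prometheus.local:9090
HARBOR_URL="https://harbor.local"
HARBOR_PASSWORD=
"""

config = parse_env(sample)
print(missing_keys(config, ["PROMETHEUS_URL", "HARBOR_PASSWORD", "MILVUS_URI"]))
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env`.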
It is recommended to use a Python virtual environment.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
The following command starts the Uvicorn web server:
python main.py
You can now access the ADK Web UI at http://localhost:8000/dev-ui/?app=eternium
# Build the container image
docker build -t eternium-agent .
# Run the container, passing in your .env file for configuration
docker run --rm -it --env-file .env -p 8000:8000 eternium-agent
The most robust way to deploy the agency is with the provided Helm chart located in the /charts/eternium-agent directory.
- Customise Values: Create a `my-values.yaml` file and override any default settings from the chart's `values.yaml`, especially the `config` and `secrets` sections.
- Install the Chart:
helm install eternium ./charts/eternium-agent --namespace my-agents --create-namespace -f my-values.yaml
Once the agent is running, you can interact with it via the ADK Web UI. Here are some example queries you can try:
- Simple Query: `List all the namespaces in my cluster.`
- Memory Query: `Please remember that the on-call engineer for the database team is 'Alice'.`
- Follow-up Memory Query: `Who is the on-call for the database team?`
- Multi-Step Diagnostic: `My 'prowlarr' pod in the 'media' namespace is crashing. Can you find out why?`
- Multi-Agent, Cross-Domain Query: `What version of 'nginx' is running in the 'ingress' namespace, and does that image have any critical vulnerabilities in Harbor?`
- Store all sensitive credentials in your `.env` file locally or as Kubernetes Secrets when deploying via Helm. Never commit your `.env` file to Git.
- The application runs as a non-root user (`appuser`) inside the container for enhanced security.
Feel free to open issues or submit pull requests for new agents or improvements!
I plan to extend this in the future, potentially with (but not limited to) the following agents:
- Alertmanager Agent – View, group, and acknowledge active alerts from Prometheus Alertmanager
- Loki Agent – Search logs, summarise errors, correlate with services
- Terraform Agent – Inspect infrastructure state, run `plan` or `validate` via APIs
- Ansible Agent – Trigger playbooks, inspect inventories or host variables
- Falco Agent – Display runtime security alerts and suspicious behaviour
- Trivy Agent – Report on container CVEs, Git repo scans, and SBOMs
- Self-Healing Agent – Detect problems and propose (or trigger) remediations
- Knowledge Agent – Provide RAG-based Q&A from internal docs and wikis
- Planner Agent – Multi-step LLM planner for coordinated DevOps actions
- etc...
Possibilities are endless...
This project is licensed under the MIT License with a Non-Commercial Addendum.
- Personal, educational, and homelab use: Free and unrestricted, in accordance with the MIT License.
- Commercial use: Not permitted without explicit, written permission from the copyright holder.
If you wish to use this project or its code in a commercial setting, please contact GizzmoShifu to discuss licensing terms.
Please see the LICENSE file for full details.
