diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/1_introduce.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/1_introduce.md new file mode 100644 index 0000000000..ccf3810e49 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/1_introduce.md @@ -0,0 +1,259 @@ +--- +title: Understand Persistent AI Runtime Architecture +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Understand Persistent AI Runtime Architecture + +In this Learning Path, you will build a ***persistent local AI runtime*** on NVIDIA [DGX Spark](https://www.nvidia.com/en-gb/products/workstations/dgx-spark/). The implementation is validated on DGX Spark, but the architecture also applies to other ***Arm Cortex-A platforms*** that can run containerized services and local AI runtimes. + +The final system is not a single chatbot process. It is a set of local services that run continuously, share a workspace, react to file events, generate summaries, create embeddings, store vector memory, retrieve context, and periodically reason about the state of the workspace. + +The core idea is: ***AI systems are orchestration systems, not just inference systems.*** + +DGX Spark is well suited to this type of workload because it combines ***Arm CPU orchestration*** with local GPU acceleration. In the [Grace Blackwell architecture](https://learn.arm.com/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/1_gb10_introduction/), the Arm Grace CPU coordinates background services, filesystem events, scheduling, document processing, metadata handling, and service-to-service communication. The Blackwell GPU accelerates ***local LLM inference***, token generation, summarization, and embedding generation. 
+ +By the end of this Learning Path, you will have a local runtime with these capabilities: + +| Capability | Runtime component | +|---|---| +| Local LLM inference | [Ollama](https://ollama.com/) | +| Persistent vector memory | [Qdrant](https://qdrant.tech/) | +| Workspace orchestration | Hermes Agent | +| Browser-based interaction | [Open WebUI](https://github.com/open-webui/open-webui) | +| Semantic retrieval | Hermes Agent + Qdrant + Ollama | +| Autonomous workspace cognition | Hermes Agent + Ollama | + +## Runtime Architecture Overview + +The runtime uses four containerized services: + +- Hermes Agent +- Ollama +- Qdrant +- Open WebUI + +These services communicate over a local Docker network and share a persistent workspace on the host. + +```text ++------------------+ HTTP API +------------------------------+ +| Open WebUI | -----------------> | Ollama Container | +| User interface | | Local inference runtime | ++------------------+ +--------------^---------------+ + | + | inference + | embeddings + | + +----------------+---------------+ + | Hermes Container | + | CPU-side orchestration | + +----------+-------------+-------+ + | | + | files | vectors + | events | metadata + v v + +----------+----+ +----+----------+ + | Shared | | Qdrant | + | workspace | | vector memory | + +---------------+ +---------------+ +``` + +The important architectural pattern is ***separation of responsibilities***. Each service has a narrow role, and Hermes coordinates the overall workflow. + +| Layer | Service | Purpose | +|---|---|---| +| Interaction layer | Open WebUI | Provides browser-based access to local models | +| Inference layer | Ollama | Runs local language and embedding models | +| Memory layer | Qdrant | Stores and searches vector memory | +| Orchestration layer | Hermes Agent | Watches files, schedules work, coordinates services | + +## Runtime Components + +### Hermes Runtime + +Hermes is the ***orchestration runtime*** you will build in this Learning Path. 
+ +It runs as a persistent Python service inside a container. It watches the shared workspace, detects new files, reads documents, sends requests to Ollama, stores memory in Qdrant, performs semantic retrieval, and later generates autonomous workspace summaries. + +Hermes is responsible for: + +- Filesystem monitoring +- Workflow orchestration +- Runtime scheduling +- Document parsing +- Prompt preparation +- Inference coordination +- Memory coordination +- Autonomous cognition + +Hermes does not run the language model itself. Instead, it coordinates AI workflows across local services. + +This is the main CPU-side workload in the system. The Arm CPU keeps the runtime alive, schedules background loops, tracks file events, moves data between services, and manages runtime state. + +### Ollama Runtime + +Ollama provides the local inference runtime in this Learning Path. It is used because it is a convenient way to run local models and expose a simple API, but the architecture is not limited to Ollama. + +Conceptually, Ollama is one possible ***inference backend***. Hermes can orchestrate any local or remote inference service that exposes a compatible API, such as llama.cpp server, vLLM, a custom PyTorch service, or another model runtime. + +In this Learning Path, Hermes uses Ollama for two types of model calls: + +- Chat completion, using [`qwen2.5:7b`](https://huggingface.co/Qwen/Qwen2.5-7B) +- Embedding generation, using [`nomic-embed-text`](https://ollama.com/library/nomic-embed-text) + + +The chat model is used to summarize files, answer questions over retrieved memory, and generate workspace-level insights. The embedding model converts text into vectors so Qdrant can store and search semantic memory. + +Ollama is responsible for: + +- Local LLM inference +- Token generation +- AI summarization +- Embedding generation + +Ollama does not watch files, manage memory, or decide when work should happen. 
It provides model execution, and Hermes calls it when the workflow requires inference. + +### Qdrant Memory Service + +Qdrant provides ***persistent vector memory***. + +Hermes stores document embeddings in a Qdrant collection named `workspace_memory`. Each stored point includes a vector and payload metadata, such as the document path, generated summary, and source content excerpt. + +Qdrant is responsible for: + +- Vector storage +- Semantic indexing +- Similarity search +- Long-term memory persistence +- Contextual retrieval + +Qdrant does not perform LLM inference. It stores vectors and returns semantically similar memories when Hermes performs a retrieval query. + +### Open WebUI + +Open WebUI provides a local browser interface for interacting with the Ollama runtime. + +It is useful for validating that local models are available, testing prompts, and giving users a simple interface to local inference. In this Learning Path, Open WebUI is not the orchestration layer and it is not the memory system. + +Open WebUI is responsible for: + +- Browser-based access +- Local chat interaction +- Model testing and exploration + +The persistent AI runtime is still coordinated by Hermes. + +## Shared Workspace + +The services use a ***shared workspace*** mounted into the containers. + +The workspace structure is: + +```text +workspace/ +|-- inbox/ +|-- memory/ +|-- logs/ +|-- processed/ +`-- config/ +``` + +Each directory has a specific purpose: + +| Directory | Purpose | +|---|---| +| `workspace/inbox/` | Input files monitored by Hermes | +| `workspace/memory/` | Generated memory artifacts and workspace summaries | +| `workspace/logs/` | Runtime logs and diagnostics | +| `workspace/processed/` | Optional location for processed files | +| `workspace/config/` | Runtime policy configuration | + +The shared workspace is what turns isolated containers into a coordinated local AI runtime. 
Hermes can observe files created on the host, use Ollama to process them, store memory in Qdrant, and write results back to persistent storage. + +## Event-driven AI Workflows + +Persistent AI systems are long-running systems. They do not wait for a single prompt and then exit. They monitor runtime state and react when something changes. + +In this Learning Path, Hermes starts with a filesystem watcher: + +```text +[New document] -> [Filesystem event] -> [Hermes orchestration] -> [Document processing] +``` + +As you add capabilities, the workflow grows: + +```text +[New document] + -> [CPU watcher] + -> [Document parsing] + -> [GPU summarization] + -> [GPU embedding] + -> [Qdrant memory] +``` + +This event-driven design is important because it shows how AI systems become continuous local runtimes. The model is only one part of the system. The surrounding runtime decides when to call the model, what context to provide, where to store results, and how later workflows can reuse those results. + +## Semantic Memory and Retrieval + +***Semantic memory*** gives the runtime a way to retain information over time. + +| Flow | Runtime path | +|---|---| +| Store memory | `[Document] -> [Summary] -> [Embedding] -> [Qdrant vector storage]` | +| Retrieve memory | `[Question] -> [Query embedding] -> [Similarity search] -> [Contextual response]` | + +This is different from storing plain text files and searching for keywords. Vector search allows the runtime to retrieve content based on semantic similarity. For example, a question about "CPU scheduling" can retrieve a document that discusses "runtime orchestration" even if the exact words are different. + +## Autonomous Workspace Cognition + +The final stage of this Learning Path adds autonomous workspace cognition. + +Instead of responding only when a new file appears or when a query is submitted, Hermes periodically reviews the accumulated semantic memory and generates a workspace-level summary. 
+ +The cognition workflow is: + +```text +[Semantic memory] -> [Scheduled analysis] -> [Workspace summary] -> [Runtime insights] +``` + +Runtime behavior is controlled by a configuration file: + +```text +/workspace/config/runtime.json +``` + +This allows the runtime to adjust settings such as supported file extensions, retrieval depth, summary interval, and summary output path without hardcoding every behavior into the agent. + +## CPU and GPU Responsibilities + +This Learning Path highlights heterogeneous AI computing. The CPU and GPU both matter, but they perform different roles. + +The Arm Grace CPU coordinates persistent runtime work: + +- Filesystem monitoring +- Event scheduling +- Runtime orchestration +- Background service coordination +- Document parsing +- Metadata management +- Vector database coordination +- Runtime policy loading +- Long-running process lifecycle management + +The Blackwell GPU accelerates model execution: + +- Local LLM inference +- Token generation +- AI summarization +- Embedding generation +- Contextual reasoning +- Workspace summary generation + +This separation is central to the architecture. The GPU accelerates model-heavy operations, while the CPU keeps the distributed AI runtime organized and continuously operating. + +## Next Step + +Next, you will build the DGX Spark runtime foundation: Docker, GPU-enabled containers, the shared workspace, and the initial Ollama, Qdrant, and Open WebUI services. 
diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/2_build.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/2_build.md new file mode 100644 index 0000000000..b973fa27bb --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/2_build.md @@ -0,0 +1,585 @@ +--- +title: Build the DGX Spark AI Runtime Foundation +weight: 3 +layout: "learningpathall" +--- + +## Build the DGX Spark AI Runtime Foundation + +In this section, you will prepare the ***base runtime*** used by the rest of the Learning Path. + +You will install ***Docker***, configure ***GPU-enabled containers***, create a ***persistent workspace***, and start the initial runtime service stack: + +- Ollama for local inference +- Qdrant for vector memory +- Open WebUI for browser-based model access + +***Hermes Agent*** is added in the next section. This section builds the local infrastructure it depends on. + +## Verify the DGX Spark Environment + +Start by verifying that your DGX Spark system exposes the expected Arm CPU and NVIDIA GPU environment. + +Check the CPU architecture: + +```bash +uname -m +``` + +The expected output is: + +```text +aarch64 +``` + +This confirms that you are running on an Arm64 environment. + +Check the Linux distribution: + +```bash +lsb_release -a +``` + +Check that the NVIDIA GPU and CUDA driver stack are visible: + +```bash +nvidia-smi +``` + +Confirm that the command shows the GPU, driver version, and CUDA version. Later, you will run the same command from inside a container to verify GPU passthrough. + +Example output: + +```text +nvidia-smi +Wed May 20 18:12:05 2026 ++-----------------------------------------------------------------------------------------+ +| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 | ++-----------------------------------------+------------------------+----------------------+ +| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC | +| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | +| | | MIG M. | +|=========================================+========================+======================| +| 0 NVIDIA GB10 On | 0000000F:01:00.0 Off | N/A | +| N/A 36C P8 4W / N/A | Not Supported | 0% Default | +| | | N/A | ++-----------------------------------------+------------------------+----------------------+ + ++-----------------------------------------------------------------------------------------+ +| Processes: | +| GPU GI CI PID Type Process name GPU Memory | +| ID ID Usage | +|=========================================================================================| +| 0 N/A N/A 3565 G /usr/lib/xorg/Xorg 137MiB | +| 0 N/A N/A 3776 G /usr/bin/gnome-shell 164MiB | +| 0 N/A N/A 5115 G .../8305/usr/lib/firefox/firefox 239MiB | +| 0 N/A N/A 85940 G ...m Performix/arm-performix-gui 54MiB | ++-----------------------------------------------------------------------------------------+ +``` + +## Install Docker + +Install the packages needed to add the Docker repository: + +```bash +sudo apt update +sudo apt install -y \ + ca-certificates \ + curl \ + gnupg \ + lsb-release +``` + +Add the Docker GPG key: + +```bash +curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \ +sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg +``` + +Add the Docker repository: + +```bash +echo \ +"deb [arch=$(dpkg --print-architecture) \ +signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \ +https://download.docker.com/linux/ubuntu \ +$(lsb_release -cs) stable" | \ +sudo tee /etc/apt/sources.list.d/docker.list > /dev/null +``` + +Install Docker Engine and Docker Compose: + +```bash +sudo apt update + +sudo apt install -y \ + docker-ce \ + docker-ce-cli \ + containerd.io \ + docker-buildx-plugin \ + docker-compose-plugin +``` + +Allow your user to run Docker commands: + +```bash +sudo usermod -aG docker $USER +``` + +Apply the new group membership in the current shell: + 
+```bash +newgrp docker +``` + +Verify Docker: + +```bash +docker run hello-world +``` + +You should see a message confirming that Docker is installed and working. + +## Install NVIDIA Container Toolkit + +Install NVIDIA Container Toolkit so Docker containers can access the GPU. + +Add the NVIDIA Container Toolkit GPG key: + +```bash +curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \ +sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg +``` + +Add the NVIDIA Container Toolkit repository: + +```bash +curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \ +sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \ +sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list +``` + +Install the toolkit: + +```bash +sudo apt update +sudo apt install -y nvidia-container-toolkit +``` + +Configure the Docker runtime: + +```bash +sudo nvidia-ctk runtime configure --runtime=docker +``` + +Restart Docker: + +```bash +sudo systemctl restart docker +``` + +## Verify GPU-enabled Containers + +Run a CUDA validation container: + +```bash +docker run --rm --gpus all \ +nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \ +nvidia-smi +``` + +If you have not pulled this image before, Docker downloads it before running `nvidia-smi`. This can take a few minutes depending on your network connection. 
+ +```text +Unable to find image 'nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04' locally +13.0.1-devel-ubuntu24.04: Pulling from nvidia/cuda +03f66a4525ea: Pull complete +c03b8ec8dd33: Pull complete +cae1e96ffa7d: Pull complete +2cb956a72162: Pull complete +817eab9d3c52: Pull complete +cc43ec4c1381: Pull complete +30fc8198a31e: Pull complete +c88eadd06616: Pull complete +c7ba38867e8d: Pull complete +fd2e70db7702: Pull complete +85eb6b47da08: Pull complete +Digest: sha256:7d2f6a8c2071d911524f95061a0db363e24d27aa51ec831fcccf9e76eb72bc92 +Status: Downloaded newer image for nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 + +========== +== CUDA == +========== + +CUDA Version 13.0.1 + +Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + +This container image and its contents are governed by the NVIDIA Deep Learning Container License. +By pulling and using the container, you accept the terms and conditions of this license: +https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license + +A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience. + +Sun May 24 10:13:04 2026 ++-----------------------------------------------------------------------------------------+ +| NVIDIA-SMI 580.159.03 Driver Version: 580.159.03 CUDA Version: 13.0 | ++-----------------------------------------+------------------------+----------------------+ +| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | +| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | +| | | MIG M. 
| +|=========================================+========================+======================| +| 0 NVIDIA GB10 On | 0000000F:01:00.0 Off | N/A | +| N/A 44C P0 10W / N/A | Not Supported | 0% Default | +| | | N/A | ++-----------------------------------------+------------------------+----------------------+ + ++-----------------------------------------------------------------------------------------+ +| Processes: | +| GPU GI CI PID Type Process name GPU Memory | +| ID ID Usage | +|=========================================================================================| +| No running processes found | ++-----------------------------------------------------------------------------------------+ +``` + +If the command prints GPU information from inside the container, Docker GPU passthrough is working. + +At this point, your DGX Spark system can run GPU-enabled AI containers. + +## Create the Persistent Workspace + +Create the project directory: + +```bash +mkdir -p ~/dgx-hermes-agent +cd ~/dgx-hermes-agent +``` + +Create the directory structure used by the runtime: + +```bash +mkdir -p \ +workspace/inbox \ +workspace/memory \ +workspace/logs \ +workspace/processed \ +workspace/config \ +models \ +compose \ +qdrant +``` + +The workspace should now look like this: + +```text +dgx-hermes-agent/ +|-- compose/ +|-- models/ +|-- qdrant/ +|-- workspace/ +| |-- config/ +| |-- inbox/ +| |-- logs/ +| |-- memory/ +| `-- processed/ +``` + +The `workspace/` directory is shared across runtime services. Hermes will later monitor `workspace/inbox/`, write generated artifacts to `workspace/memory/`, and read runtime policies from `workspace/config/`. + +## Build the Runtime Service Stack + +Create and edit the file `~/dgx-hermes-agent/compose/docker-compose.yml`. 
+ +Add the following content: + +```yaml +services: + + ollama: + image: ollama/ollama:latest + container_name: ollama + + ports: + - "11434:11434" + + dns: + - 8.8.8.8 + - 1.1.1.1 + + volumes: + - ../models:/root/.ollama + - ../workspace:/workspace + + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] + + environment: + - NVIDIA_VISIBLE_DEVICES=all + + restart: unless-stopped + + qdrant: + image: qdrant/qdrant:latest + container_name: qdrant + + ports: + - "6333:6333" + - "6334:6334" + + volumes: + - ../qdrant:/qdrant/storage + + restart: unless-stopped + + open-webui: + image: ghcr.io/open-webui/open-webui:main + container_name: open-webui + + ports: + - "3000:8080" + + environment: + - OLLAMA_BASE_URL=http://ollama:11434 + + volumes: + - open-webui:/app/backend/data + + depends_on: + - ollama + + restart: unless-stopped + +volumes: + open-webui: +``` + +This Compose stack creates the first three runtime services. Hermes will be added as a fourth service later. + +## Runtime Service Roles + +The initial stack separates model execution, memory storage, and user interaction. + +| Service | Role | +|---|---| +| Ollama | Runs local language and embedding models | +| Qdrant | Stores persistent vector memory | +| Open WebUI | Provides a local browser interface to Ollama | + +The `models/` directory persists Ollama models on the host. The `qdrant/` directory persists vector database storage. The `workspace/` directory is mounted into Ollama now and will also be mounted into Hermes later. + +Ollama does not orchestrate workspace files by itself. The mount verification below confirms shared storage access; Hermes will become the service that reads workspace files and decides when to call Ollama. 
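When a bind-mount source directory is missing, Docker creates it automatically for the short `host:container` volume syntax, and the new directory is typically owned by root. A quick preflight check before starting the stack can catch a missing or mistyped path. The following is a minimal sketch and not part of the Learning Path code: the `missing_bind_mounts()` helper name is hypothetical, and the directory list is taken from the Compose file in this section.

```python
from pathlib import Path

# Host directories bind-mounted by the Compose file, relative to the project root.
BIND_MOUNT_DIRS = ["models", "qdrant", "workspace"]


def missing_bind_mounts(project_root):
    """Return the bind-mount source directories that do not exist yet."""
    root = Path(project_root).expanduser()
    return [name for name in BIND_MOUNT_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_bind_mounts("~/dgx-hermes-agent")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All bind-mount directories exist.")
```

If the check reports a missing directory, create it with `mkdir -p` as your own user before running `docker compose up -d`.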
+ +## Start the Runtime Stack + +If Ollama is already installed as a host service, stop it to avoid port conflicts: + +```bash +sudo systemctl stop ollama +sudo systemctl disable ollama +``` + +Start the container stack: + +```bash +cd ~/dgx-hermes-agent/compose +docker compose up -d +``` + +{{% notice Note %}} +The first `docker compose up -d` run can take several minutes because Docker needs to pull the service images. The time depends on your network speed. +{{% /notice %}} + +Verify that the containers are running. Run this command from the `compose` directory: + +```bash +docker compose ps +``` + +You should see output similar to: + +```text +NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS +ollama ollama/ollama:latest "/bin/ollama serve" ollama 5 seconds ago Up 4 seconds 0.0.0.0:11434->11434/tcp, [::]:11434->11434/tcp +open-webui ghcr.io/open-webui/open-webui:main "bash start.sh" open-webui 4 seconds ago Up 4 seconds (health: starting) 0.0.0.0:3000->8080/tcp, [::]:3000->8080/tcp +qdrant qdrant/qdrant:latest "./entrypoint.sh" qdrant 5 seconds ago Up 4 seconds 0.0.0.0:6333-6334->6333-6334/tcp, [::]:6333-6334->6333-6334/tcp +``` + +## Verify Container Networking + +Open a shell in the Ollama container: + +```bash +docker exec -it ollama bash +``` + +Verify DNS resolution: + +```bash +getent hosts registry.ollama.ai +``` + +Example output: + +```text +root@367b013fd34c:/# getent hosts registry.ollama.ai +2606:4700:3036::6815:4be3 registry.ollama.ai +2606:4700:3034::ac43:b6e5 registry.ollama.ai +``` + +Exit the container shell: + +```bash +exit +``` + +The DNS settings in the Compose file help the container reach the Ollama model registry reliably. 
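The steps above check networking from inside the Ollama container. From the host, you can also confirm that the ports published by the Compose file accept TCP connections. The following is a minimal sketch that uses only the Python standard library. The `port_is_open()` helper is a hypothetical name, and the port numbers are the ones published in the Compose file in this section.

```python
import socket

# Host ports published by the Compose file in this section.
SERVICE_PORTS = {
    "ollama": 11434,
    "qdrant": 6333,
    "open-webui": 3000,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name:<12} localhost:{port:<6} {state}")
```

An open port only shows that the container is listening. It does not confirm that the service inside has finished starting, so Open WebUI can still report `health: starting` for a short time.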
+ +## Pull Local Models + +Open a shell in the Ollama container: + +```bash +docker exec -it ollama bash +``` + +Pull the language model used in this Learning Path: + +```bash +ollama pull qwen2.5:7b +``` + +Pull the embedding model: + +```bash +ollama pull nomic-embed-text +``` + +Exit the container: + +```bash +exit +``` + +The Learning Path uses fixed models so that the later code and validation steps remain consistent. The architecture can use other suitable models, but keep these names while following the examples in this Learning Path. + +| Model | Purpose | +|---|---| +| `qwen2.5:7b` | Local chat, summarization, reasoning | +| `nomic-embed-text` | Embedding generation for semantic memory | + +## Verify Local Inference + +Open a shell in the Ollama container: + +```bash +docker exec -it ollama bash +``` + +Run the local model: + +```bash +ollama run qwen2.5:7b +``` + +Enter a short prompt, such as: + +```text +Summarize the role of CPU orchestration for an AI agent in one sentence. +``` + +After the model responds, enter `/bye` to exit the interactive model session, then run `exit` to leave the container shell. + +You can also monitor GPU activity from another terminal while the model is running: + +```bash +nvtop +``` + +If `nvtop` is not installed, `watch -n 1 nvidia-smi` shows similar information. Completing this step confirms that local inference is available before Hermes begins calling Ollama programmatically. + +## Verify Open WebUI + +Open a browser and navigate to: + +```text +http://localhost:3000 +``` + +Open WebUI should connect to the Ollama service at: + +```text +http://ollama:11434 +``` + +Use Open WebUI to confirm that the local model is available. + +## Verify Qdrant + +Open the Qdrant dashboard: + +```text +http://localhost:6333/dashboard +``` + +![Qdrant dashboard running locally before the workspace_memory collection is created#center](qdrant_dashboard.png "Qdrant Dashboard") + +Qdrant is running, but it does not contain the `workspace_memory` collection yet. Hermes creates that collection later when you add persistent memory. 
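The same checks can be scripted against the HTTP APIs that the two services publish on the host. The following is a minimal sketch, assuming Ollama's `GET /api/tags` endpoint (lists local models) and Qdrant's `GET /collections` endpoint (lists collections). The helper names `get_json()`, `missing_models()`, and `collection_names()` are hypothetical and are not part of the Learning Path code.

```python
import json
import urllib.request

# Models pulled earlier in this section.
REQUIRED_MODELS = ["qwen2.5:7b", "nomic-embed-text"]


def get_json(url, timeout=5.0):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def missing_models(tags, required):
    """Return required models absent from an Ollama /api/tags response."""
    installed = {model["name"] for model in tags.get("models", [])}
    # Ollama lists models pulled without a tag with an explicit ":latest" suffix.
    return [
        name for name in required
        if name not in installed and f"{name}:latest" not in installed
    ]


def collection_names(collections):
    """Return collection names from a Qdrant GET /collections response."""
    return [item["name"] for item in collections["result"]["collections"]]


if __name__ == "__main__":
    try:
        tags = get_json("http://localhost:11434/api/tags")
        print("Missing models:", missing_models(tags, REQUIRED_MODELS) or "none")
        names = collection_names(get_json("http://localhost:6333/collections"))
        print("Qdrant collections:", names or "none yet")
    except OSError as error:
        # urllib.error.URLError is a subclass of OSError.
        print(f"Service not reachable: {error}")
```

At this stage, an empty collection list is the expected result, because `workspace_memory` has not been created yet.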
+ +## Verify the Shared Workspace Mount + +Open another terminal on your DGX Spark system and create a test file on the host. Do not run this command inside a container. + +```bash +echo "Arm CPUs orchestrate persistent AI workflows." \ +> ~/dgx-hermes-agent/workspace/inbox/test.txt +``` + +Verify that the shared mount is visible by opening a shell in the Ollama container: + +```bash +docker exec -it ollama bash +``` + +Inside the container, run: + +```bash +ls -l /workspace +cat /workspace/inbox/test.txt +``` + +You should see a directory listing similar to: + +```text +drwxrwxr-x 2 1001 1001 4096 May 20 18:16 config +drwxrwxr-x 2 1001 1001 4096 May 20 18:37 inbox +drwxrwxr-x 2 1001 1001 4096 May 20 18:16 logs +drwxrwxr-x 2 1001 1001 4096 May 20 18:16 memory +drwxrwxr-x 2 1001 1001 4096 May 20 18:16 processed +``` + +And the file content: + +```text +Arm CPUs orchestrate persistent AI workflows. +``` + +Exit the container: + +```bash +exit +``` + +## Summary + +You have built the ***runtime foundation*** for the persistent local AI system. The DGX Spark environment now has Docker, Docker Compose, NVIDIA Container Toolkit, GPU-enabled containers, persistent workspace storage, and the initial Ollama, Qdrant, and Open WebUI services. + +You also verified shared workspace access, local inference, and the fixed model setup used by the later sections. + +Next, you will add Hermes Agent as the persistent orchestration runtime. diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/3_deploy_orch_runtime.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/3_deploy_orch_runtime.md new file mode 100644 index 0000000000..9abf2a0cf0 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/3_deploy_orch_runtime.md @@ -0,0 +1,372 @@ +--- +title: Deploy Hermes Orchestration Runtime +weight: 4 +layout: "learningpathall" +--- + +## Deploy Hermes Orchestration Runtime + +In this section, you will add ***Hermes Agent*** to the runtime stack. 
+ +The purpose of Hermes Agent is to act as the ***orchestration layer*** for the local AI runtime. It watches the workspace, detects runtime events, and coordinates the next action without requiring a user to manually run each step. + +Hermes is the ***CPU-side orchestration runtime***. It runs continuously, watches the shared workspace, and reacts when new files are created. This is the first step toward a ***persistent local AI agent***. + +In this section, Hermes does not call a language model yet. You will first build the event-driven runtime foundation: + +```text +workspace/inbox + -> Filesystem event + -> Hermes event handler + -> Content preview +``` + +Later sections add local inference, persistent memory, semantic retrieval, and autonomous cognition. + +## Create the Hermes Runtime Directory + +Return to the project root: + +```bash +cd ~/dgx-hermes-agent +``` + +Create the Hermes source directory: + +```bash +mkdir -p hermes +``` + +The project structure now includes a source directory for the orchestration runtime: + +```text +dgx-hermes-agent/ +|-- compose/ +|-- hermes/ +|-- models/ +|-- qdrant/ +`-- workspace/ +``` + +## Create the Hermes Container Image + +Create and edit the file `~/dgx-hermes-agent/hermes/Dockerfile`. + +Add the following content: + +```dockerfile +FROM python:3.11-slim + +WORKDIR /app + +RUN apt-get update && apt-get install -y \ + git \ + curl \ + build-essential \ + && rm -rf /var/lib/apt/lists/* + +RUN pip install --no-cache-dir \ + ollama \ + qdrant-client \ + watchdog \ + sentence-transformers \ + pypdf \ + python-dotenv + +COPY . /app + +CMD ["python", "-u", "agent.py"] +``` + +This image installs the dependencies used throughout the Learning Path. Some packages, such as `ollama` and `qdrant-client`, are used in later sections. Installing them now keeps the Hermes container image consistent as the runtime gains capabilities. 
+ +The command uses `python -u`: + +```dockerfile +CMD ["python", "-u", "agent.py"] +``` + +The `-u` option enables unbuffered output. This is important for a persistent service because log messages appear immediately when you run the following command later. + +```bash +docker logs -f hermes +``` + +## Create the Hermes Runtime Service + +Create and edit the file `~/dgx-hermes-agent/hermes/agent.py`. + +Add the following content: + +```python +import os +import time +from watchdog.observers import Observer +from watchdog.events import FileSystemEventHandler + +WATCH_DIR = "/workspace/inbox" + +class WorkspaceHandler(FileSystemEventHandler): + + def on_created(self, event): + + if event.is_directory: + return + + print(f"\n[Agent] New file detected:") + print(event.src_path) + + summarize_file(event.src_path) + +def summarize_file(path): + + try: + + with open(path, "r") as f: + content = f.read() + + print("\n[Agent] File content preview:") + print(content[:500]) + + except Exception as e: + + print(f"[Agent] Error reading file: {e}") + +if __name__ == "__main__": + + print("\n[Hermes Agent] Starting workspace watcher...") + print(f"[Hermes Agent] Monitoring: {WATCH_DIR}") + + observer = Observer() + + observer.schedule( + WorkspaceHandler(), + WATCH_DIR, + recursive=False + ) + + observer.start() + + try: + + while True: + time.sleep(1) + + except KeyboardInterrupt: + + observer.stop() + + observer.join() +``` + +This first agent performs three important orchestration tasks: + +- Starts a long-running runtime process +- Watches `/workspace/inbox` +- Handles file creation events + +The `summarize_file()` function does not use an LLM yet. For now, it reads and prints the first 500 characters of the file. This validates the filesystem event pipeline before adding model inference. 
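A creation event can fire as soon as a file appears, which may be before the writer has finished. One common mitigation is to wait until the file size stops changing before reading it. The following is a minimal sketch of that idea and is not part of the Learning Path's `agent.py`. The `wait_until_stable()` helper and its default values are assumptions chosen for illustration.

```python
import os
import time


def wait_until_stable(path, interval=0.2, checks=3, timeout=10.0):
    """Return True once the file size is unchanged for `checks` consecutive polls."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # The file may have been removed or renamed since the event fired.
            return False
        if size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            # The size changed, so restart the stability count.
            stable = 0
            last_size = size
        time.sleep(interval)
    return False
```

One way to use this is to call `wait_until_stable(event.src_path)` in `on_created()` and only call `summarize_file()` when it returns `True`. Size polling is a heuristic: a writer that pauses for longer than `interval * checks` can still be mid-write. Moving a completed file into the inbox remains the more robust approach.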
+ +## Code Trace + +The runtime starts by defining the watched directory: + +```python +WATCH_DIR = "/workspace/inbox" +``` + +The event handler receives filesystem events: + +```python +class WorkspaceHandler(FileSystemEventHandler): + + def on_created(self, event): +``` + +Directory events are ignored: + +```python +if event.is_directory: + return +``` + +New file events are passed into the document processing function: + +```python +summarize_file(event.src_path) +``` + +The observer delivers events from a background thread, so the main loop only needs to keep the process alive: + +```python +while True: + time.sleep(1) +``` + +This is the core pattern for persistent AI orchestration. The CPU keeps the service running, watches for events, and triggers work when the runtime state changes. + +## Update Docker Compose + +Open and edit the file `~/dgx-hermes-agent/compose/docker-compose.yml`. + +Add the Hermes service under `services:`, before the top-level `volumes:` key: + +```yaml + hermes: + build: + context: ../hermes + + container_name: hermes + + volumes: + - ../workspace:/workspace + + environment: + - OLLAMA_HOST=http://ollama:11434 + - QDRANT_HOST=qdrant + + depends_on: + - ollama + - qdrant + + restart: unless-stopped +``` + +The final service structure should be: + +```text +services: + ollama: + qdrant: + open-webui: + hermes: +``` + +Hermes mounts the same shared workspace that the Ollama container mounts. It also receives environment variables for Ollama and Qdrant, which are used in later sections. + +## Build the Hermes Runtime + +Build the Hermes container: + +```bash +cd ~/dgx-hermes-agent/compose +docker compose build hermes +``` + +The first build installs the Python dependencies listed in the Dockerfile. 
+ +Start the stack: + +```bash +docker compose up -d +``` + +Verify that the Hermes container is running: + +```bash +docker ps +``` + +You should see ***hermes*** alongside the existing runtime services: + +```text +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +8439b1e36b6c compose-hermes "python -u agent.py" 2 seconds ago Up 2 seconds hermes +8cb62495cb7b ghcr.io/open-webui/open-webui:main "bash start.sh" 3 hours ago Up 3 hours (healthy) 0.0.0.0:3000->8080/tcp, [::]:3000->8080/tcp open-webui +367b013fd34c ollama/ollama:latest "/bin/ollama serve" 3 hours ago Up 3 hours 0.0.0.0:11434->11434/tcp, [::]:11434->11434/tcp ollama +e770401a4a0f qdrant/qdrant:latest "./entrypoint.sh" 3 hours ago Up 3 hours 0.0.0.0:6333-6334->6333-6334/tcp, [::]:6333-6334->6333-6334/tcp qdrant +``` + +## Verify Hermes Runtime Logs + +Follow the Hermes logs: + +```bash +docker logs -f hermes +``` + +Expected output: + +```text +[Hermes Agent] Starting workspace watcher... +[Hermes Agent] Monitoring: /workspace/inbox +``` + +This confirms that Hermes started and is watching the shared inbox directory. + +## Validate Event-driven Processing + +Open a second terminal on the host and create a new test file. Use a filename that does not already exist so the `on_created()` event is triggered. + +Create the file outside the inbox first, then move the completed file into `workspace/inbox/`. This avoids triggering the filesystem event before the file content has finished writing. When `/tmp` and the workspace are on the same filesystem, `mv` is an atomic rename, so Hermes sees the file only after its content is complete. Across filesystems, `mv` copies the data instead, so for large files the event can fire before the copy finishes. + +```bash +echo "Hermes watches the workspace and reacts to new files." \ +> /tmp/runtime-test.txt + +mv /tmp/runtime-test.txt \ +~/dgx-hermes-agent/workspace/inbox/runtime-test.txt +``` + +Return to the terminal that is following Hermes logs. You should see output similar to: + +```text +[Agent] New file detected: +/workspace/inbox/runtime-test.txt + +[Agent] File content preview: +Hermes watches the workspace and reacts to new files. 
+``` + +This validates the event-driven pipeline: + +```text +Host file write + -> Container receives filesystem event + -> Hermes on_created() handler + -> File content preview +``` + +## Verify Shared Workspace Access + +Hermes sees the host file path through the mounted container path: + +| Host path | Container path | +|---|---| +| `~/dgx-hermes-agent/workspace/inbox` | `/workspace/inbox` | + +This shared mount is what allows the host, Hermes, Ollama, and later memory workflows to operate on the same persistent runtime state. + +## Runtime Responsibilities + +Hermes is now responsible for: + +- Filesystem monitoring +- Runtime lifecycle management +- Event handling +- File reading +- Workflow triggering + +At this stage, Hermes is not performing inference, generating embeddings, or storing vectors. Those capabilities are added incrementally so you can validate each layer of the runtime. + +## CPU Orchestration Responsibilities + +This section demonstrates the CPU-side work required by persistent AI systems. + +The Arm CPU is coordinating: + +- A long-running service process +- Filesystem event monitoring +- Runtime scheduling +- File processing +- Containerized service lifecycle + +This is the foundation for the rest of the Learning Path. The GPU becomes important when model inference is added, but the persistent runtime itself is coordinated by CPU-side orchestration. + +## Summary + +You added ***Hermes Agent*** to the DGX Spark runtime stack as a persistent Python service. The runtime now has a Hermes container, a filesystem watcher, and a Docker Compose service that mounts the shared workspace. + +You also verified that creating a new file in `workspace/inbox/` triggers Hermes logs, which confirms that the ***event-driven orchestration*** path is working. + +Next, you will connect Hermes to Ollama for local LLM summarization. 
diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/4_local_llm.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/4_local_llm.md new file mode 100644 index 0000000000..0267034488 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/4_local_llm.md @@ -0,0 +1,347 @@ +--- +title: Add Local LLM Inference +weight: 5 +layout: "learningpathall" +--- + +## Add Local LLM Inference + +In this section, you will connect ***Hermes Agent*** to ***Ollama***. + +The purpose of this step is to turn Hermes from a file watcher into an ***inference orchestrator***. Hermes still controls the workflow, but it now sends document content to Ollama and uses the model response as part of the runtime output. + +The runtime already watches `workspace/inbox/` and reacts when a file is created. You will now extend that workflow so Hermes sends file content to a ***local language model*** and prints an AI-generated summary. + +The workflow becomes: + +```text +workspace/inbox document + -> Hermes on_created() handler + -> Hermes calls Ollama + -> Local LLM summary +``` + +This introduces the first GPU-accelerated step in the persistent runtime. + +## Configure Ollama Runtime Access + +Hermes reaches Ollama through the Docker Compose network. + +In the Hermes Compose service, this environment variable was added earlier: + +```yaml +environment: + - OLLAMA_HOST=http://ollama:11434 +``` + +Inside the Docker network, the service name `ollama` resolves to the Ollama container. Hermes uses this URL when it creates the Ollama Python client. + +Verify that the Ollama container is running: + +```bash +cd ~/dgx-hermes-agent/compose +docker ps +``` + +You should see both ***ollama*** and ***hermes*** running. + + +## Verify the Local Language Model + +You pulled `qwen2.5:7b` when you built the runtime foundation. 
In this section, run a quick inference test to confirm that the model is still available inside the Ollama container: + +```bash +docker exec -it ollama ollama run qwen2.5:7b +``` + +Enter a short prompt: + +```text +Summarize persistent AI runtimes in one sentence. +``` + +Exit the model session when finished. + +## Add Inference Support to Hermes + +Open and edit the file `~/dgx-hermes-agent/hermes/agent.py`. + +Replace the file with the following version: + +```python +import os +import time +import ollama + +from watchdog.observers import Observer +from watchdog.events import FileSystemEventHandler + +WATCH_DIR = "/workspace/inbox" + +OLLAMA_HOST = os.getenv( + "OLLAMA_HOST", + "http://ollama:11434" +) + +client = ollama.Client(host=OLLAMA_HOST) + +class WorkspaceHandler(FileSystemEventHandler): + + def on_created(self, event): + + if event.is_directory: + return + + print(f"\n[Agent] New file detected:") + print(event.src_path) + + summarize_file(event.src_path) + +def summarize_file(path): + + try: + + with open(path, "r") as f: + content = f.read() + + print("\n[Agent] Running local inference...") + + response = client.chat( + model="qwen2.5:7b", + messages=[ + { + "role": "system", + "content": ( + "You are a local AI workspace assistant. " + "Summarize the document in 3 concise bullet points." 
+ ) + }, + { + "role": "user", + "content": content[:4000] + } + ] + ) + + summary = response["message"]["content"] + + print("\n[Agent] AI Summary:") + print(summary) + + except Exception as e: + + print(f"[Agent] Error: {e}") + +if __name__ == "__main__": + + print("\n[Hermes Agent] Starting workspace watcher...") + print(f"[Hermes Agent] Monitoring: {WATCH_DIR}") + + observer = Observer() + + observer.schedule( + WorkspaceHandler(), + WATCH_DIR, + recursive=False + ) + + observer.start() + + try: + + while True: + time.sleep(1) + + except KeyboardInterrupt: + + observer.stop() + + observer.join() +``` + +## Code Trace + +This version adds the Ollama Python SDK: + +```python +import ollama +``` + +Hermes reads the Ollama endpoint from the runtime environment: + +```python +OLLAMA_HOST = os.getenv( + "OLLAMA_HOST", + "http://ollama:11434" +) +``` + +The client connects to the Ollama service: + +```python +client = ollama.Client(host=OLLAMA_HOST) +``` + +The file content is sent to the local model: + +```python +response = client.chat( + model="qwen2.5:7b", + messages=[ + { + "role": "system", + "content": ( + "You are a local AI workspace assistant. " + "Summarize the document in 3 concise bullet points." + ) + }, + { + "role": "user", + "content": content[:4000] + } + ] +) +``` + +The runtime limits the input to the first 4000 characters: + +```python +"content": content[:4000] +``` + +This keeps the initial workflow simple and avoids sending very large files to the model. + +## Rebuild Hermes + +Rebuild the Hermes container: + +```bash +cd ~/dgx-hermes-agent/compose +docker compose build hermes +``` + +Restart the runtime: + +```bash +docker compose up -d +``` + +Follow the Hermes logs: + +```bash +docker logs -f hermes +``` + +Expected startup output: + +```text +[Hermes Agent] Starting workspace watcher... +[Hermes Agent] Monitoring: /workspace/inbox +``` + +## Validate AI Summarization + +Create a new file in another terminal. 
Write the file outside the inbox first, then move it into `workspace/inbox/` so Hermes sees a completed file. + +```bash +cat > /tmp/ai-runtime-note.txt <<'EOF' +Persistent AI systems are not only prompt-response applications. +They run as long-lived local services that monitor events, coordinate +runtime workflows, store memory, and use GPU acceleration when model +inference is required. +EOF + +mv /tmp/ai-runtime-note.txt \ +~/dgx-hermes-agent/workspace/inbox/ai-runtime-note.txt +``` + +Return to the Hermes logs. You should see output similar to: + +```text +[Agent] New file detected: +/workspace/inbox/ai-runtime-note.txt + +[Agent] Running local inference... + +[Agent] AI Summary: +- Persistent AI systems function beyond simple prompt-response interactions, operating as ongoing local services. +- These systems monitor events, manage workflows, and maintain stored memory for extended periods. +- They utilize GPU acceleration during model inference to enhance performance. +``` + +The generated summary text will vary because it is produced by the local model. + +## Verify GPU-accelerated Inference + +To observe GPU activity during inference, use two terminals. + +In terminal 1, follow Hermes logs: + +```bash +docker logs -f hermes +``` + +In terminal 2, schedule a new file to be created after a short delay, then start `nvtop` immediately: + +```bash +( +sleep 5 +cat > /tmp/gpu-inference-test.txt <<'EOF' +DGX Spark combines Arm CPU orchestration with NVIDIA GPU acceleration. +The CPU coordinates persistent services, while the GPU accelerates local +language model inference and summarization workloads. +EOF + +mv /tmp/gpu-inference-test.txt \ +~/dgx-hermes-agent/workspace/inbox/gpu-inference-test.txt +) & +nvtop +``` + +The background command creates the file after five seconds, giving `nvtop` time to start before Ollama begins inference. During summarization, `nvtop` should show activity from the Ollama container or model runtime. 
This confirms that the GPU is accelerating local inference while Hermes coordinates the workflow. + +## Runtime Responsibilities + +The runtime now has a clear separation of responsibilities. + +Hermes is responsible for: + +- Filesystem monitoring +- Reading workspace files +- Preparing prompts +- Calling the Ollama API +- Printing runtime logs +- Coordinating the workflow lifecycle + +Ollama is responsible for: + +- Loading the local model +- Running token generation +- Returning the generated summary + +## CPU and GPU Responsibilities + +The Arm Grace CPU coordinates the workflow: + +- Watches the workspace +- Receives filesystem events +- Reads file content +- Prepares model requests +- Sends API calls to Ollama +- Logs runtime progress + +The Blackwell GPU accelerates the model workload: + +- Local LLM inference +- Token generation +- AI summarization + +This pattern is repeated throughout the Learning Path. Hermes orchestrates; Ollama executes model inference. + +## Summary + +You extended Hermes with ***local LLM inference*** through the Ollama Python SDK and the `OLLAMA_HOST` runtime setting. New files in the workspace can now trigger summarization with `qwen2.5:7b`, and GPU activity can be validated with `nvtop`. + +The runtime has moved from simple file detection to ***event-driven AI summarization***. + +Next, you will add persistent semantic memory with embeddings and Qdrant. diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/5_persistent_memory.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/5_persistent_memory.md new file mode 100644 index 0000000000..c62992d04b --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/5_persistent_memory.md @@ -0,0 +1,503 @@ +--- +title: Build Persistent Semantic Memory +weight: 6 +layout: "learningpathall" +--- + +## Build Persistent Semantic Memory + +In this section, you will add ***persistent semantic memory*** to Hermes Agent. 
+ +In the previous section, Hermes became an ***inference orchestrator***: it watched the workspace, sent document content to Ollama, and printed an AI summary. This section extends that workflow so the summary and source content are no longer just log output. Hermes will encode the document as an embedding and store it in Qdrant as reusable memory. + +The runtime can already monitor files and generate summaries with a local language model. You will now generate ***embeddings*** for workspace content and store them in ***Qdrant***. + +The workflow becomes: + +```text +workspace/inbox document + -> Hermes summarizes with Ollama + -> Hermes generates embedding + -> Hermes stores vector + payload in Qdrant + -> persistent semantic memory +``` + +This turns Hermes from an event-driven summarizer into a local AI runtime with long-term memory. + +## Persistent Memory Architecture + +Semantic memory uses vector embeddings to represent document meaning. + +In this Learning Path, the memory pipeline has three services: + +| Component | Responsibility | +|---|---| +| Hermes Agent | Orchestrates ingestion, summaries, embeddings, and storage | +| Ollama | Generates summaries and embeddings | +| Qdrant | Stores vectors and metadata as persistent memory | + +The fixed embedding configuration is: + +| Component | Value | +|---|---| +| Embedding model | `nomic-embed-text` | +| Vector dimension | `768` | +| Qdrant collection | `workspace_memory` | +| Distance metric | Cosine | + +The vector dimension must match the output size of the embedding model. For `nomic-embed-text`, the collection is created with a vector size of `768`. + +For example, a document about CPU orchestration is first summarized by `qwen2.5:7b`. Hermes then sends the same document text to `nomic-embed-text`, receives a 768-dimensional embedding, and stores that vector in Qdrant with metadata such as the file path, generated summary, and source content excerpt. 
Later, a query about "runtime scheduling" can retrieve this memory even if the document does not contain the exact same words. + +## Pull the Embedding Model + +Open a shell in the Ollama container: + +```bash +docker exec -it ollama bash +``` + +Pull the embedding model: + +```bash +ollama pull nomic-embed-text +``` + +Exit the container: + +```bash +exit +``` + +The embedding model converts text into vectors. Qdrant stores those vectors and uses them later for semantic retrieval. + +## Add Persistent Memory to Hermes + +Open and edit the file `~/dgx-hermes-agent/hermes/agent.py`. + +Replace the file with the following version: + +```python +import os +import uuid +import time +import ollama + +from qdrant_client import QdrantClient +from qdrant_client.models import Distance, VectorParams, PointStruct + +from watchdog.observers import Observer +from watchdog.events import FileSystemEventHandler + +WATCH_DIR = "/workspace/inbox" + +SUPPORTED_EXTENSIONS = [ + ".txt", + ".md", + ".log" +] + +OLLAMA_HOST = os.getenv( + "OLLAMA_HOST", + "http://ollama:11434" +) + +QDRANT_HOST = os.getenv( + "QDRANT_HOST", + "qdrant" +) + +COLLECTION_NAME = "workspace_memory" + +client = ollama.Client(host=OLLAMA_HOST) + +qdrant = QdrantClient( + host=QDRANT_HOST, + port=6333 +) + +def ensure_collection(): + collections = qdrant.get_collections().collections + names = [c.name for c in collections] + if COLLECTION_NAME not in names: + + qdrant.create_collection( + collection_name=COLLECTION_NAME, + vectors_config=VectorParams( + size=768, + distance=Distance.COSINE + ) + ) + print(f"[Memory] Created collection: {COLLECTION_NAME}") + +class WorkspaceHandler(FileSystemEventHandler): + def on_created(self, event): + if event.is_directory: + return + + filename = os.path.basename(event.src_path) + if filename.startswith("."): + return + + ext = os.path.splitext(filename)[1] + if ext not in SUPPORTED_EXTENSIONS: + return + + print(f"\n[Agent] New file detected:") + print(event.src_path) + 
process_file(event.src_path) + +def generate_summary(content): + + response = client.chat( + model="qwen2.5:7b", + messages=[ + { + "role": "system", + "content": ( + "You are a local AI workspace assistant. " + "Summarize the document in 3 concise bullet points." + ) + }, + { + "role": "user", + "content": content[:4000] + } + ] + ) + return response["message"]["content"] + +def generate_embedding(content): + + response = client.embed( + model="nomic-embed-text", + input=content[:4000] + ) + return response["embeddings"][0] + +def store_memory(path, content, summary, embedding): + + point_id = str(uuid.uuid4()) + qdrant.upsert( + collection_name=COLLECTION_NAME, + points=[ + PointStruct( + id=point_id, + vector=embedding, + payload={ + "path": path, + "summary": summary, + "content": content[:4000] + } + ) + ] + ) + print(f"[Memory] Stored document: {path}") + +def process_file(path): + + try: + with open(path, "r", encoding="utf-8") as f: + content = f.read() + print("\n[Agent] Running summarization inference...") + summary = generate_summary(content) + print("\n[Agent] AI Summary:") + print(summary) + print("\n[Agent] Generating embeddings...") + + embedding = generate_embedding(content) + store_memory( + path, + content, + summary, + embedding + ) + except Exception as e: + print(f"[Agent] Error: {e}") + +if __name__ == "__main__": + + print("\n[Hermes Agent] Starting workspace watcher...") + print(f"[Hermes Agent] Monitoring: {WATCH_DIR}") + + ensure_collection() + observer = Observer() + observer.schedule( + WorkspaceHandler(), + WATCH_DIR, + recursive=False + ) + observer.start() + try: + while True: + time.sleep(1) + except KeyboardInterrupt: + observer.stop() + observer.join() +``` + +## Code Trace + +This version adds the Qdrant client: + +```python +from qdrant_client import QdrantClient +from qdrant_client.models import Distance, VectorParams, PointStruct +``` + +The runtime connects to Qdrant using the Docker service name: + +```python +QDRANT_HOST = 
os.getenv( + "QDRANT_HOST", + "qdrant" +) +``` + +Hermes creates a persistent memory collection if it does not exist: + +```python +qdrant.create_collection( + collection_name=COLLECTION_NAME, + vectors_config=VectorParams( + size=768, + distance=Distance.COSINE + ) +) +``` + +The embedding API uses `client.embed(...)`: + +```python +response = client.embed( + model="nomic-embed-text", + input=content[:4000] +) + +return response["embeddings"][0] +``` + +This is the current Ollama Python SDK embedding interface. The returned embedding is stored as the first item in `response["embeddings"]`. + +Hermes stores each document as a Qdrant point: + +```python +PointStruct( + id=point_id, + vector=embedding, + payload={ + "path": path, + "summary": summary, + "content": content[:4000] + } +) +``` + +The payload stores metadata alongside the vector so future retrieval results can include document context. + +## Runtime Filtering + +This version also adds basic runtime hygiene. + +Supported file extensions are defined in: + +```python +SUPPORTED_EXTENSIONS = [ + ".txt", + ".md", + ".log" +] +``` + +Hidden files are ignored: + +```python +if filename.startswith("."): + return +``` + +Unsupported extensions are ignored: + +```python +if ext not in SUPPORTED_EXTENSIONS: + return +``` + +This avoids ingesting temporary files, hidden files, and unsupported file formats. + +## Rebuild Hermes + +Rebuild the Hermes container: + +```bash +cd ~/dgx-hermes-agent/compose +docker compose build hermes +``` + +Restart the runtime: + +```bash +docker compose up -d +``` + +Follow the Hermes logs: + +```bash +docker logs -f hermes +``` + +On first startup, if the collection does not already exist, you should see: + +```text +[Memory] Created collection: workspace_memory +``` + +You should also see: + +```text +[Hermes Agent] Starting workspace watcher... +[Hermes Agent] Monitoring: /workspace/inbox +``` + +## Validate Memory Ingestion + +Create a new document. 
Write it outside the inbox first, then move the completed file into `workspace/inbox/` so Hermes processes a fully written document. + +```bash +cat > /tmp/memory-test.txt <<'EOF' +Persistent AI runtimes need memory so that previous workspace activity +can influence future reasoning. Semantic memory stores embeddings and +metadata so the runtime can retrieve relevant context later. +EOF + +mv /tmp/memory-test.txt \ +~/dgx-hermes-agent/workspace/inbox/memory-test.txt +``` + +Watch the Hermes logs. Expected output includes: + +```text +[Agent] New file detected: +/workspace/inbox/memory-test.txt + +[Agent] Running summarization inference... + +[Agent] AI Summary: +- Persistent AI runtimes require memory to incorporate past workspace activities into future reasoning. +- Semantic memory in AI systems retains embeddings and metadata to store relevant context. +- This stored information allows for retrieval of pertinent context, enhancing the runtime's ability to reason effectively. +``` + +Then: + +```text +[Agent] Generating embeddings... +[Memory] Stored document: /workspace/inbox/memory-test.txt +``` + +The summary text will vary because it is generated by the local model. + +## Verify Qdrant Memory + +Open the Qdrant dashboard: + +```text +http://localhost:6333/dashboard +``` + +Confirm that the `workspace_memory` collection exists: + +![Qdrant dashboard showing the workspace_memory collection#center](qdrant_dashboard_2.png "Qdrant Dashboard") + +The dashboard should show the `workspace_memory` collection after Hermes starts and runs `ensure_collection()`. If the collection does not appear, check the Hermes logs for Qdrant connection errors and confirm that the `qdrant` container is running. + +Open the collection and verify that points are being stored. 
Each point represents one ingested workspace document and should contain: + +- A 768-dimensional vector +- A `path` payload field +- A `summary` payload field +- A `content` payload field + +![Qdrant workspace_memory collection showing stored vectors and payload fields#center](qdrant_dashboard_3.png "Qdrant workspace_memory") + +Use this view to confirm that Qdrant has stored both the vector and payload metadata. The payload fields are important because later retrieval steps need the path and summary to assemble useful context for the LLM. + +You can also inspect collection storage and memory usage: + +![Qdrant collection storage view showing persistent memory usage#center](qdrant_dashboard_4.png "Qdrant workspace_memory") + +The memory usage view confirms that Qdrant is maintaining persistent collection state on disk. This matters because the vector memory should survive container restarts as long as the `../qdrant:/qdrant/storage` volume remains mounted. + +You can also inspect collections from the host: + +```bash +curl http://localhost:6333/collections +``` + +The response should include: + +```text +workspace_memory +``` + +## CPU and GPU Responsibilities + +The Arm Grace CPU coordinates the memory pipeline: + +- Detects new files +- Filters supported file types +- Reads file content +- Calls Ollama for summaries and embeddings +- Creates Qdrant collections +- Upserts vector points and metadata +- Keeps the long-running runtime active + +The Blackwell GPU accelerates the model workloads: + +- Summary generation with `qwen2.5:7b` +- Embedding generation with `nomic-embed-text` + +Qdrant stores the results as persistent memory. + +## Runtime Compatibility Notes + +The following compatibility notes apply to the code you added in `~/dgx-hermes-agent/hermes/agent.py`. + +Use the current Ollama embedding API inside the `generate_embedding()` function: + +```python +client.embed(...) 
+``` + +Read the embedding from the `embeddings` list returned by Ollama: + +```python +response["embeddings"][0] +``` + +Do not use older examples that call: + +```python +client.embeddings(...) +``` + +The Qdrant vector dimension must match the embedding model output size. For this Learning Path, use a vector size of ***768*** with ***nomic-embed-text***. + +This dimension is configured in `ensure_collection()`: + +```python +vectors_config=VectorParams( + size=768, + distance=Distance.COSINE +) +``` + +If you change the embedding model later, the Qdrant collection dimension must match the new model output. Because `ensure_collection()` only creates the collection when it is missing, changing `size` in the code does not alter an existing collection, so delete and recreate `workspace_memory` with the new size. If the dimensions do not match, Qdrant will reject the inserted vectors. + +## Summary + +You added ***persistent semantic memory*** to Hermes Agent by connecting it to Qdrant, creating the `workspace_memory` collection, generating local embeddings with Ollama, and storing vectors with document metadata. + +The runtime can now ingest documents, summarize them, generate embeddings, and preserve that context as ***persistent vector memory***. + +Next, you will add semantic retrieval and contextual question answering. diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/6_semantic_retrieval.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/6_semantic_retrieval.md new file mode 100644 index 0000000000..f31fc72e19 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/6_semantic_retrieval.md @@ -0,0 +1,630 @@ +--- +title: Add Semantic Retrieval and Contextual Reasoning +weight: 7 +layout: "learningpathall" +--- + +## Add Semantic Retrieval and Contextual Reasoning + +In this section, you will add ***semantic retrieval*** to Hermes Agent. + +In the previous section, Hermes stored document summaries, embeddings, and metadata in Qdrant. 
This section turns that stored memory into an active reasoning source: Hermes will accept a question, retrieve relevant memories, assemble them as context, and send that context to the local model. + +The runtime can already ingest documents, summarize them, generate embeddings, and store semantic memory in Qdrant. You will now add a ***query workflow*** so Hermes can search memory and use retrieved context to answer questions. + +The workflow becomes: + +```text +/workspace/query.txt + -> Hermes embeds the question + -> Qdrant returns relevant memories + -> Hermes assembles context + -> Ollama generates a grounded answer +``` + +This is the first stage where Hermes uses memory as reasoning context instead of only storing it. + +## Contextual Retrieval Architecture + +The retrieval pipeline uses all three runtime services: + +| Component | Responsibility | +|---|---| +| Hermes Agent | Detects queries, generates query embeddings, assembles context | +| Ollama | Generates embeddings and contextual answers | +| Qdrant | Searches stored semantic memory | + +The runtime keeps the same fixed memory configuration: + +| Component | Value | +|---|---| +| Embedding model | `nomic-embed-text` | +| Vector dimension | `768` | +| Qdrant collection | `workspace_memory` | +| Retrieval limit | `3` | + +Hermes will watch for a query file at: + +```text +/workspace/query.txt +``` + +When the file exists, Hermes reads the question, deletes the query file, searches memory, and prints the answer in the container logs. + +## Add Retrieval Functions to Hermes + +Open and edit the file `~/dgx-hermes-agent/hermes/agent.py`. 
+ +Replace the file with the following version: + +```python +import os +import uuid +import time +import ollama + +from qdrant_client import QdrantClient +from qdrant_client.models import ( + Distance, + VectorParams, + PointStruct +) + +from watchdog.observers import Observer +from watchdog.events import FileSystemEventHandler + +WATCH_DIR = "/workspace/inbox" + +SUPPORTED_EXTENSIONS = [ + ".txt", + ".md", + ".log" +] + +OLLAMA_HOST = os.getenv( + "OLLAMA_HOST", + "http://ollama:11434" +) + +QDRANT_HOST = os.getenv( + "QDRANT_HOST", + "qdrant" +) + +COLLECTION_NAME = "workspace_memory" + +client = ollama.Client(host=OLLAMA_HOST) + +qdrant = QdrantClient( + host=QDRANT_HOST, + port=6333 +) + +def ensure_collection(): + collections = qdrant.get_collections().collections + names = [c.name for c in collections] + if COLLECTION_NAME not in names: + qdrant.create_collection( + collection_name=COLLECTION_NAME, + vectors_config=VectorParams( + size=768, + distance=Distance.COSINE + ) + ) + print(f"[Memory] Created collection: {COLLECTION_NAME}") + +class WorkspaceHandler(FileSystemEventHandler): + def on_created(self, event): + if event.is_directory: + return + filename = os.path.basename(event.src_path) + if filename.startswith("."): + return + ext = os.path.splitext(filename)[1] + if ext not in SUPPORTED_EXTENSIONS: + return + print(f"\n[Agent] New file detected:") + print(event.src_path) + process_file(event.src_path) + +def generate_summary(content): + response = client.chat( + model="qwen2.5:7b", + messages=[ + { + "role": "system", + "content": ( + "You are a local AI workspace assistant. " + "Summarize the document in 3 concise bullet points." 
+ ) + }, + { + "role": "user", + "content": content[:4000] + } + ] + ) + return response["message"]["content"] + +def generate_embedding(content): + response = client.embed( + model="nomic-embed-text", + input=content[:4000] + ) + return response["embeddings"][0] + +def store_memory(path, content, summary, embedding): + point_id = str(uuid.uuid4()) + qdrant.upsert( + collection_name=COLLECTION_NAME, + points=[ + PointStruct( + id=point_id, + vector=embedding, + payload={ + "path": path, + "summary": summary, + "content": content[:4000] + } + ) + ] + ) + print(f"[Memory] Stored document: {path}") + +def search_memory(query): + print("\n[Memory] Searching semantic memory...") + embedding = generate_embedding(query) + results = qdrant.query_points( + collection_name=COLLECTION_NAME, + query=embedding, + limit=3 + ).points + memories = [] + for result in results: + payload = result.payload + memories.append({ + "path": payload.get("path"), + "summary": payload.get("summary") + }) + return memories + +def query_workspace(question): + memories = search_memory(question) + context = "\n\n".join([ + f"Document: {m['path']}\nSummary:\n{m['summary']}" + for m in memories + ]) + response = client.chat( + model="qwen2.5:7b", + messages=[ + { + "role": "system", + "content": ( + "You are a persistent AI workspace assistant. " + "Answer questions using the retrieved workspace memory." 
+ ) + }, + { + "role": "user", + "content": ( + f"Question:\n{question}\n\n" + f"Relevant workspace memory:\n{context}" + ) + } + ] + ) + answer = response["message"]["content"] + print("\n[Workspace Query]") + print(question) + print("\n[Retrieved Memories]") + print(context) + print("\n[AI Response]") + print(answer) + +def process_file(path): + try: + with open(path, "r", encoding="utf-8") as f: + content = f.read() + print("\n[Agent] Running summarization inference...") + summary = generate_summary(content) + print("\n[Agent] AI Summary:") + print(summary) + print("\n[Agent] Generating embeddings...") + embedding = generate_embedding(content) + store_memory( + path, + content, + summary, + embedding + ) + except Exception as e: + print(f"[Agent] Error: {e}") + +if __name__ == "__main__": + print("\n[Hermes Agent] Starting workspace watcher...") + print(f"[Hermes Agent] Monitoring: {WATCH_DIR}") + ensure_collection() + observer = Observer() + observer.schedule( + WorkspaceHandler(), + WATCH_DIR, + recursive=False + ) + observer.start() + try: + while True: + time.sleep(1) + if os.path.exists("/workspace/query.txt"): + with open("/workspace/query.txt", "r") as f: + question = f.read().strip() + os.remove("/workspace/query.txt") + query_workspace(question) + + except KeyboardInterrupt: + observer.stop() + observer.join() +``` + +## Code Trace + +The `search_memory()` function converts the user question into an embedding: + +```python +embedding = generate_embedding(query) +``` + +It searches Qdrant using the current Qdrant Python client API: + +```python +results = qdrant.query_points( + collection_name=COLLECTION_NAME, + query=embedding, + limit=3 +).points +``` + +The current API uses `query=embedding`. Do not use older examples that pass `query_vector=embedding`. + +Qdrant returns scored point objects. 
Hermes reads the payload from each result: + +```python +payload = result.payload +``` + +Only the path and summary are assembled into the retrieval context: + +```python +memories.append({ + "path": payload.get("path"), + "summary": payload.get("summary") +}) +``` + +The context is converted into a prompt: + +```python +context = "\n\n".join([ + f"Document: {m['path']}\nSummary:\n{m['summary']}" + for m in memories +]) +``` + +Hermes sends the question and retrieved memory to the local model: + +```python +f"Question:\n{question}\n\n" +f"Relevant workspace memory:\n{context}" +``` + +The runtime loop checks for a query file: + +```python +if os.path.exists("/workspace/query.txt"): +``` + +After reading the question, Hermes removes the query file: + +```python +os.remove("/workspace/query.txt") +``` + +This makes query processing event-like while keeping the runtime simple and local. + +## Runtime Compatibility Notes + +The compatibility details in this section apply to the retrieval code you added in `search_memory()` and `query_workspace()`. The embedding API and vector dimension were covered in the previous section, so this section focuses on the Qdrant retrieval call and result parsing. + +Use the current Qdrant semantic retrieval API inside `search_memory()`: + +```python +results = qdrant.query_points( + collection_name=COLLECTION_NAME, + query=embedding, + limit=3 +).points +``` + +The current Qdrant client expects the query vector in the `query` argument. Do not use older examples that pass `query_vector=embedding`. + +Qdrant returns scored point objects. Read each payload from `result.payload` before assembling the context: + +```python +for result in results: + payload = result.payload +``` + +In this section, `limit=3` is intentionally hardcoded in `search_memory()` so the retrieval behavior is easy to inspect. Later, runtime policy can make this value configurable. 
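
The payload-parsing step described above can be sketched in isolation. This is a minimal illustration, not the Hermes implementation: `build_context` and the placeholder strings are hypothetical names, and `SimpleNamespace` objects stand in for the scored points that `query_points(...).points` returns, on the assumption that each one exposes a `.payload` dictionary.

```python
from types import SimpleNamespace


def build_context(results):
    """Assemble a prompt context string from scored-point-like objects.

    Each item only needs a `.payload` dict, which is the attribute
    Hermes reads from the Qdrant results.
    """
    blocks = []
    for result in results:
        payload = result.payload or {}
        # Fall back to placeholders so a sparse payload does not put
        # the literal string "None" into the prompt.
        path = payload.get("path", "(unknown path)")
        summary = payload.get("summary", "(no summary stored)")
        blocks.append(f"Document: {path}\nSummary:\n{summary}")
    return "\n\n".join(blocks)


# Hypothetical stand-ins for the objects in query_points(...).points
fake_results = [
    SimpleNamespace(payload={"path": "/workspace/inbox/a.txt", "summary": "- First note"}),
    SimpleNamespace(payload={"path": "/workspace/inbox/b.txt"}),
]

context = build_context(fake_results)
print(context)
```

Keeping context assembly in a pure function like this makes the prompt format testable without a running Qdrant or Ollama container.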
+ +## Rebuild Hermes + +Rebuild the Hermes container: + +```bash +cd ~/dgx-hermes-agent/compose +docker compose build hermes +``` + +Restart the runtime: + +```bash +docker compose up -d +``` + +Follow the Hermes logs: + +```bash +docker logs -f hermes +``` + +Expected startup output: + +```text +[Hermes Agent] Starting workspace watcher... +[Hermes Agent] Monitoring: /workspace/inbox +``` + +## Create Test Memory + +Before testing retrieval, create a few new documents so Qdrant contains useful semantic memory. + +For each document, write the file in `/tmp` first and then move it into `workspace/inbox/`. This gives Hermes a completed file when the `on_created()` event fires. + +Create a document about CPU orchestration: + +```bash +cat > /tmp/cpu-orchestration-note.txt <<'EOF' +Arm CPUs are responsible for orchestration in persistent AI runtimes. +They coordinate filesystem events, runtime scheduling, container services, +document parsing, metadata handling, and vector database operations. +EOF + +mv /tmp/cpu-orchestration-note.txt \ +~/dgx-hermes-agent/workspace/inbox/cpu-orchestration-note.txt +``` + +Create a document about GPU inference: + +```bash +cat > /tmp/gpu-inference-note.txt <<'EOF' +NVIDIA GPUs accelerate local model inference, token generation, +summarization, embedding generation, and contextual reasoning workloads. +EOF + +mv /tmp/gpu-inference-note.txt \ +~/dgx-hermes-agent/workspace/inbox/gpu-inference-note.txt +``` + +Create a document about semantic memory: + +```bash +cat > /tmp/semantic-memory-note.txt <<'EOF' +Semantic memory stores embeddings and metadata in a vector database. +This allows persistent AI systems to retrieve relevant prior context +based on meaning instead of exact keyword matching. +EOF + +mv /tmp/semantic-memory-note.txt \ +~/dgx-hermes-agent/workspace/inbox/semantic-memory-note.txt +``` + +Watch the Hermes logs until each document is summarized, embedded, and stored. 
+ +Expected log lines include: + +```text +[Agent] New file detected: +/workspace/inbox/cpu-orchestration-note.txt + +[Agent] Running summarization inference... + +[Agent] AI Summary: +- Arm CPUs handle orchestration in persistent AI runtimes. +- They manage filesystem events, runtime scheduling, and container services. +- Tasks also include document parsing, metadata handling, and vector database operations. + +[Agent] Generating embeddings... +[Memory] Stored document: /workspace/inbox/cpu-orchestration-note.txt + +[Agent] New file detected: +/workspace/inbox/gpu-inference-note.txt + +[Agent] Running summarization inference... + +[Agent] AI Summary: +- NVIDIA GPUs speed up local model inference processes. +- These GPUs enhance token generation, summarization, and embedding generation tasks. +- They also improve contextual reasoning workloads locally. + +[Agent] Generating embeddings... +[Memory] Stored document: /workspace/inbox/gpu-inference-note.txt + +[Agent] New file detected: +/workspace/inbox/semantic-memory-note.txt + +[Agent] Running summarization inference... + +[Agent] AI Summary: +- Semantic memory uses a vector database to store embeddings and metadata. +- This enables persistent AI systems to recall relevant past contexts based on meaning. +- The system avoids relying solely on exact keyword matching for retrieval. + +[Agent] Generating embeddings... +[Memory] Stored document: /workspace/inbox/semantic-memory-note.txt +``` + +## Test Semantic Retrieval + +Create a query file: + +```bash +echo "How do CPUs help persistent AI systems?" \ +> ~/dgx-hermes-agent/workspace/query.txt +``` + +Hermes checks for `/workspace/query.txt` in the runtime loop. When it sees the file, it reads the question, removes the file, embeds the question, searches Qdrant, and sends the retrieved context to Ollama. + +In the Hermes logs, first confirm that semantic search started: + +```text +[Memory] Searching semantic memory... 
+``` + +Next, confirm that Hermes printed the question and the retrieved memory context: + +```text +[Workspace Query] +How do CPUs help persistent AI systems? + +[Retrieved Memories] +Document: /workspace/inbox/cpu-orchestration-note.txt +Summary: +- Arm CPUs manage orchestration in persistent AI runtimes. +- They handle filesystem events, runtime scheduling, and container services. +- Additionally, they process document parsing, metadata handling, and vector database operations. + +Document: /workspace/inbox/cpu-orchestration-note.txt +Summary: +- Arm CPUs handle orchestration in persistent AI runtimes. +- They manage filesystem events, runtime scheduling, and container services. +- Tasks also include document parsing, metadata handling, and vector database operations. + +Document: /workspace/inbox/semantic-memory-note.txt +Summary: +- Semantic memory uses a vector database to store embeddings and metadata. +- This enables persistent AI systems to recall relevant past context based on meaning. +- Retrieval is done without relying solely on exact keyword matching. +``` + +Finally, confirm that Ollama generated a response from the retrieved context: + +```text +[AI Response] +In persistent AI systems, CPUs play a crucial role in managing orchestration tasks that are essential for the system's operation. Specifically, Arm CPUs handle several key responsibilities: + +1. **Orchestration Management**: They manage the overall orchestration of the runtime environment. +2. **Filesystem Events Handling**: CPU processes and responds to events related to file systems within the AI system. +3. **Runtime Scheduling**: They schedule tasks and processes based on current needs and resource availability. +4. **Container Services**: CPUs handle services running in containers, ensuring that these components operate efficiently. 
+ +Furthermore, Arm CPUs are involved in processing various data-related operations: +- **Document Parsing**: This involves breaking down documents into manageable chunks for further analysis or storage. +- **Metadata Handling**: They manage the creation and manipulation of metadata associated with data entities. +- **Vector Database Operations**: These include storing embeddings and metadata in vector databases, which is critical for semantic memory systems. Vector databases allow persistent AI systems to recall relevant past context based on meaning rather than exact keyword matching. + +These functions collectively ensure that CPUs are central to maintaining the functionality and efficiency of persistent AI systems by managing both operational tasks and data processing needs. +``` + +The exact answer will vary, but it should refer to retrieved memory about CPU orchestration, filesystem events, scheduling, and runtime coordination. + +## Verify Contextual Reasoning + +Ask a second question: + +```bash +echo "Why does the runtime need semantic memory?" \ +> ~/dgx-hermes-agent/workspace/query.txt +``` + +Expected behavior: + +- Hermes embeds the question +- Qdrant retrieves relevant summaries +- Hermes assembles the retrieved summaries into context +- Ollama generates an answer grounded in that context + +The logs should include a retrieved memory from the semantic memory document you created earlier. + +Example output: + +```text +[Memory] Searching semantic memory... + +[Workspace Query] +Why does the runtime need semantic memory? + +[Retrieved Memories] +Document: /workspace/inbox/memory-test.txt +Summary: +- Persistent AI runtimes require memory to incorporate past workspace activities into future reasoning. +- Semantic memory in AI systems retains embeddings and metadata to store relevant context. +- This stored information allows for retrieval of pertinent context, enhancing the runtime's ability to reason effectively. 
+ +Document: /workspace/inbox/semantic-memory-note.txt +Summary: +- Semantic memory uses a vector database to store embeddings and metadata. +- This enables persistent AI systems to recall relevant past context based on meaning. +- Retrieval is done without relying solely on exact keyword matching. + +Document: /workspace/inbox/semantic-memory-note.txt +Summary: +- Semantic memory uses a vector database to store embeddings and metadata. +- This enables persistent AI systems to recall relevant past contexts based on meaning. +- The system avoids relying solely on exact keyword matching for retrieval. + +[AI Response] +The runtime needs semantic memory because it retains embeddings and metadata that store relevant context from past workspace activities. This allows the system to effectively reason by retrieving pertinent information, enhancing its ability to understand and respond to new inputs more intelligently. Unlike simple keyword matching, semantic memory uses a vector database which captures the meaning of words or concepts, enabling more accurate and contextualized recall of past events or knowledge. +``` + +## Retrieval Workflow + +The full retrieval workflow is: + +```text +query.txt question + -> Ollama query embedding + -> Qdrant workspace_memory search + -> retrieved summaries + -> Hermes context prompt + -> Ollama contextual response + -> Hermes log output +``` + +This creates a local contextual reasoning loop using persistent memory. 
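
To make the "Qdrant workspace_memory search" step in the workflow above more concrete, here is a toy sketch of cosine-similarity ranking. It assumes the collection uses cosine distance, as configured in `ensure_collection()`. The three-dimensional vectors, the `toy_memory` dictionary, and the `top_k` helper are illustrative assumptions; the real embeddings have 768 dimensions, and Qdrant uses its own indexed search rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=3):
    """Rank stored vectors by similarity to the query and keep the best k."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in stored.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values for illustration)
toy_memory = {
    "cpu-note": [1.0, 0.0, 0.0],
    "gpu-note": [0.0, 1.0, 0.0],
    "memory-note": [0.6, 0.0, 0.8],
}

ranked = top_k([1.0, 0.0, 0.1], toy_memory, k=2)
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```

The `k` argument plays the same role as `limit=3` in `search_memory()`: it caps how many of the closest memories are returned to Hermes.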
+ +## CPU and GPU Responsibilities + +The Arm Grace CPU coordinates retrieval: + +- Watches for `query.txt` +- Reads and deletes the query file +- Calls Ollama for query embeddings +- Calls Qdrant for vector search +- Parses Qdrant result payloads +- Assembles retrieved context +- Calls Ollama for contextual reasoning + +The Blackwell GPU accelerates: + +- Query embedding generation +- Contextual LLM inference +- Response generation + +Qdrant performs the vector similarity search and returns the most relevant memory payloads. + +## Summary + +You added ***semantic retrieval*** and ***contextual reasoning*** to Hermes Agent. The runtime now turns a question into an embedding, searches Qdrant with `query_points(...)`, assembles retrieved memory, and sends that context to `qwen2.5:7b`. + +The runtime can now store memory and reason over it through the local `/workspace/query.txt` workflow. + +Next, you will add autonomous workspace cognition. diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/7_autonomous_workspace.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/7_autonomous_workspace.md new file mode 100644 index 0000000000..995d86fc29 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/7_autonomous_workspace.md @@ -0,0 +1,841 @@ +--- +title: Add Autonomous Workspace Cognition +weight: 8 +layout: "learningpathall" +--- + +## Add Autonomous Workspace Cognition + +In this section, you will add ***autonomous workspace cognition*** to Hermes Agent. + +In the previous section, Hermes could answer a question by retrieving relevant memory on demand. This section adds proactive behavior: Hermes will periodically review stored workspace memory, identify recurring themes, and write a summary without waiting for a user query. 
+ +For example, if the workspace contains notes about CPU orchestration, GPU inference, and semantic memory, Hermes can generate a scheduled workspace summary that explains those themes and how they relate to the current local AI runtime. + +The runtime can already ingest documents, build semantic memory, and answer questions using retrieved context. You will now add a ***periodic cognition workflow*** that reviews stored memory and generates a workspace-level summary. + +The workflow becomes: + +```text +stored workspace memory + -> scheduled cognition loop + -> Hermes aggregates summaries + -> Ollama analyzes recurring themes + -> workspace-summary.txt +``` + +This is the final stage of the Learning Path. Hermes becomes a persistent autonomous local AI runtime that can monitor, remember, retrieve, and periodically reason about workspace state. + +## Autonomous Cognition Overview + +Autonomous cognition means the runtime performs useful reasoning without waiting for a new document or explicit query. + +Hermes will: + +- Load runtime policy from `/workspace/config/runtime.json` +- Continue watching `workspace/inbox/` +- Continue ingesting supported files into semantic memory +- Continue answering questions from `/workspace/query.txt` +- Periodically summarize the stored workspace memory +- Write the summary to `/workspace/memory/workspace-summary.txt` + +The runtime remains local-first. Files, models, vector memory, and summaries stay on the DGX Spark system. + +## Create the Runtime Config Directory + +Create the configuration directory if it does not already exist: + +```bash +mkdir -p ~/dgx-hermes-agent/workspace/config +``` + +Create and edit the file `~/dgx-hermes-agent/workspace/config/runtime.json`. 
+ +Add the following content: + +```json +{ + "summary_interval_hours": 8, + "supported_extensions": [ + ".txt", + ".md", + ".log" + ], + "retrieval_limit": 3, + "summary_output": "/workspace/memory/workspace-summary.txt" +} +``` + +The policy file controls runtime behavior without rebuilding the container. + +| Policy | Purpose | +|---|---| +| `summary_interval_hours` | Controls how often Hermes generates a workspace summary | +| `supported_extensions` | Controls which file types Hermes ingests | +| `retrieval_limit` | Records the intended semantic retrieval depth for the runtime policy | +| `summary_output` | Defines where Hermes writes the workspace summary | + +The verified code in this section keeps semantic retrieval at `limit=3`, matching the policy value shown above. The policy file makes this setting visible for later hardening, where the retrieval function can load it dynamically. + +## Add Autonomous Cognition to Hermes + +Open and edit the file `~/dgx-hermes-agent/hermes/agent.py`. 
+ +Replace the file with the following version: + +```python +import os +import json +import uuid +import time +import ollama + +from datetime import datetime +from qdrant_client import QdrantClient +from qdrant_client.models import ( + Distance, + VectorParams, + PointStruct +) + +from watchdog.observers import Observer +from watchdog.events import FileSystemEventHandler + +WATCH_DIR = "/workspace/inbox" +CONFIG_PATH = "/workspace/config/runtime.json" +COLLECTION_NAME = "workspace_memory" + +OLLAMA_HOST = os.getenv( + "OLLAMA_HOST", + "http://ollama:11434" +) +QDRANT_HOST = os.getenv( + "QDRANT_HOST", + "qdrant" +) + +client = ollama.Client(host=OLLAMA_HOST) +qdrant = QdrantClient( + host=QDRANT_HOST, + port=6333 +) + +def ensure_collection(): + collections = qdrant.get_collections().collections + names = [c.name for c in collections] + if COLLECTION_NAME not in names: + qdrant.create_collection( + collection_name=COLLECTION_NAME, + vectors_config=VectorParams( + size=768, + distance=Distance.COSINE + ) + ) + print(f"[Memory] Created collection: {COLLECTION_NAME}") + +def load_runtime_config(): + with open(CONFIG_PATH, "r") as f: + return json.load(f) + +class WorkspaceHandler(FileSystemEventHandler): + def on_created(self, event): + if event.is_directory: + return + filename = os.path.basename(event.src_path) + # Ignore hidden files + if filename.startswith("."): + return + ext = os.path.splitext(filename)[1] + config = load_runtime_config() + supported_extensions = config.get( + "supported_extensions", + [".txt"] + ) + if ext not in supported_extensions: + return + + print(f"\n[Agent] New file detected:") + print(event.src_path) + process_file(event.src_path) + +def generate_summary(content): + response = client.chat( + model="qwen2.5:7b", + messages=[ + { + "role": "system", + "content": ( + "You are a local AI workspace assistant. " + "Summarize the document in 3 concise bullet points." 
+ ) + }, + { + "role": "user", + "content": content[:4000] + } + ] + ) + return response["message"]["content"] + +def generate_embedding(content): + response = client.embed( + model="nomic-embed-text", + input=content[:4000] + ) + return response["embeddings"][0] + +def store_memory(path, content, summary, embedding): + point_id = str(uuid.uuid4()) + qdrant.upsert( + collection_name=COLLECTION_NAME, + points=[ + PointStruct( + id=point_id, + vector=embedding, + payload={ + "path": path, + "summary": summary, + "content": content[:4000] + } + ) + ] + ) + print(f"[Memory] Stored document: {path}") + +def search_memory(query): + print("\n[Memory] Searching semantic memory...") + embedding = generate_embedding(query) + results = qdrant.query_points( + collection_name=COLLECTION_NAME, + query=embedding, + limit=3 + ).points + + memories = [] + for result in results: + payload = result.payload + memories.append({ + "path": payload.get("path"), + "summary": payload.get("summary") + }) + return memories + +def query_workspace(question): + memories = search_memory(question) + context = "\n\n".join([ + f"Document: {m['path']}\nSummary:\n{m['summary']}" + for m in memories + ]) + response = client.chat( + model="qwen2.5:7b", + messages=[ + { + "role": "system", + "content": ( + "You are a persistent AI workspace assistant. " + "Answer questions using the retrieved workspace memory." 
+ ) + }, + { + "role": "user", + "content": ( + f"Question:\n{question}\n\n" + f"Relevant workspace memory:\n{context}" + ) + } + ] + ) + answer = response["message"]["content"] + print("\n[Workspace Query]") + print(question) + + print("\n[Retrieved Memories]") + print(context) + + print("\n[AI Response]") + print(answer) + +def generate_workspace_summary(): + print("\n[Cognition] Generating workspace summary...") + results = qdrant.scroll( + collection_name=COLLECTION_NAME, + limit=10, + with_payload=True + )[0] + summaries = [] + for result in results: + payload = result.payload + summaries.append( + payload.get("summary", "") + ) + combined = "\n\n".join(summaries) + response = client.chat( + model="qwen2.5:7b", + messages=[ + { + "role": "system", + "content": ( + "You are an autonomous workspace cognition agent. " + "Analyze the workspace summaries and identify " + "important recurring themes and insights." + ) + }, + { + "role": "user", + "content": combined[:6000] + } + ] + ) + workspace_summary = response["message"]["content"] + config = load_runtime_config() + output_path = config.get( + "summary_output", + "/workspace/memory/workspace-summary.txt" + ) + with open(output_path, "w") as f: + f.write( + f"Workspace Summary\n" + f"Generated: {datetime.now()}\n\n" + ) + f.write(workspace_summary) + print("\n[Cognition] Workspace summary updated:") + print(output_path) + +def process_file(path): + try: + with open(path, "r", encoding="utf-8") as f: + content = f.read() + print("\n[Agent] Running summarization inference...") + summary = generate_summary(content) + + print("\n[Agent] AI Summary:") + print(summary) + + print("\n[Agent] Generating embeddings...") + embedding = generate_embedding(content) + store_memory( + path, + content, + summary, + embedding + ) + except Exception as e: + print(f"[Agent] Error: {e}") + +if __name__ == "__main__": + print("\n[Hermes Agent] Starting workspace watcher...") + print(f"[Hermes Agent] Monitoring: {WATCH_DIR}") + + 
ensure_collection() + observer = Observer() + observer.schedule( + WorkspaceHandler(), + WATCH_DIR, + recursive=False + ) + observer.start() + last_summary_time = 0 + try: + while True: + time.sleep(5) + config = load_runtime_config() + summary_interval_hours = config.get( + "summary_interval_hours", + 8 + ) + interval_seconds = ( + summary_interval_hours * 3600 + ) + current_time = time.time() + + # Periodic autonomous cognition + if ( + current_time - last_summary_time + > interval_seconds + ): + generate_workspace_summary() + last_summary_time = current_time + + # Interactive semantic retrieval + if os.path.exists("/workspace/query.txt"): + with open("/workspace/query.txt", "r") as f: + question = f.read().strip() + os.remove("/workspace/query.txt") + query_workspace(question) + + except KeyboardInterrupt: + observer.stop() + observer.join() +``` + +## Code Trace + +This version adds JSON configuration loading: + +```python +CONFIG_PATH = "/workspace/config/runtime.json" +``` + +```python +def load_runtime_config(): + with open(CONFIG_PATH, "r") as f: + return json.load(f) +``` + +File filtering now comes from runtime policy: + +```python +config = load_runtime_config() + +supported_extensions = config.get( + "supported_extensions", + [".txt"] +) +``` + +The cognition function reads stored memory from Qdrant: + +```python +results = qdrant.scroll( + collection_name=COLLECTION_NAME, + limit=10, + with_payload=True +)[0] +``` + +It extracts stored summaries: + +```python +summaries.append( + payload.get("summary", "") +) +``` + +It asks the local model to analyze recurring themes: + +```python +"You are an autonomous workspace cognition agent. " +"Analyze the workspace summaries and identify " +"important recurring themes and insights." 
+``` + +It writes the result to the configured summary output path: + +```python +output_path = config.get( + "summary_output", + "/workspace/memory/workspace-summary.txt" +) +``` + +The main loop reloads runtime policy every cycle: + +```python +config = load_runtime_config() +``` + +This allows changes to `runtime.json` to affect runtime behavior without rebuilding the container. + +## Rebuild Hermes + +Rebuild the Hermes container: + +```bash +cd ~/dgx-hermes-agent/compose +docker compose build hermes +``` + +Restart the runtime: + +```bash +docker compose up -d +``` + +Follow the logs: + +```bash +docker logs -f hermes +``` + +On startup, the first cognition cycle runs immediately because `last_summary_time` starts at `0`. + +Expected output: + +```text +[Cognition] Generating workspace summary... +[Cognition] Workspace summary updated: +/workspace/memory/workspace-summary.txt +``` + +This startup behavior is expected and validates that the cognition pipeline can read memory, call Ollama, and write the summary file. + +## Verify Workspace Summary Output + +View the generated summary on the host: + +```bash +cat ~/dgx-hermes-agent/workspace/memory/workspace-summary.txt +``` + +Expected structure: + +```text +Workspace Summary +Generated: 2026-05-20 22:53:29.539079 + +### Recurring Themes and Insights: + +1. **Semantic Memory in Persistent AI Systems:** + - Semantic memory utilizes a vector database to store embeddings and metadata. + - This approach allows for context-based retrieval rather than relying solely on exact keyword matching. + - The system can recall relevant past contexts based on meaning, enhancing its reasoning capabilities. + +2. **GPU Utilization:** + - NVIDIA GPUs are crucial for speeding up local model inference processes. + - They enhance tasks such as token generation, summarization, and embedding generation. + - These GPUs also improve the performance of contextual reasoning workloads locally. + +3. 
**Arm CPUs in Persistent AI Runtimes:** + - Arm CPUs handle orchestration by managing various operational tasks including: + - Filesystem events + - Runtime scheduling + - Container services + - They also process document parsing, metadata handling, and vector database operations. + - These tasks are essential for maintaining the overall functionality and efficiency of the persistent AI runtime. + +### Summary: +The key insights from the workspace summaries revolve around how semantic memory enables context-based recall in AI systems, the role of NVIDIA GPUs in accelerating model inference tasks, and the multifaceted responsibilities of Arm CPUs in orchestrating various operational aspects of a persistent AI environment. These themes highlight the interdependence of different hardware components and their specific roles in enhancing the performance and effectiveness of AI systems. +``` + +The summary content will vary because it is generated by the local model from stored memory. + +## Validate Event-Driven Ingestion + +Create a new file. As in the previous sections, write it outside the inbox first and then move it into `workspace/inbox/` so Hermes reads a completed file. + +```bash +cat > /tmp/autonomous-runtime-note.txt <<'EOF' +Autonomous workspace cognition allows a persistent AI runtime to analyze +stored memory on a schedule. This helps the system identify recurring +themes, summarize activity, and maintain awareness of workspace state. +EOF + +mv /tmp/autonomous-runtime-note.txt \ +~/dgx-hermes-agent/workspace/inbox/autonomous-runtime-note.txt +``` + +Expected Hermes logs: + +```text +[Agent] New file detected: +/workspace/inbox/autonomous-runtime-note.txt + +[Agent] Running summarization inference... + +[Agent] AI Summary: +- Autonomous workspace cognition enables an ongoing AI analysis of stored data at scheduled intervals. +- The system uses this analysis to detect recurring themes and summarize activities within the workspace. 
+- It maintains scheduled awareness of workspace state from stored memory. + +[Agent] Generating embeddings... +[Memory] Stored document: /workspace/inbox/autonomous-runtime-note.txt +``` + +The new document is added to semantic memory and can be included in future workspace summaries. + +## Validate Semantic Retrieval Still Works + +Autonomous cognition adds scheduling and workspace-level summaries, but it should not break the interactive retrieval workflow from the previous section. Validate semantic retrieval again to confirm that Hermes can still process `/workspace/query.txt` while the cognition loop is enabled. + +Create a query: + +```bash +echo "What is autonomous workspace cognition?" \ +> ~/dgx-hermes-agent/workspace/query.txt +``` + +Expected logs: + +```text +[Memory] Searching semantic memory... + +[Workspace Query] +What is autonomous workspace cognition? + +[Retrieved Memories] +Document: /workspace/inbox/autonomous-runtime-note.txt +Summary: +- Autonomous workspace cognition enables an ongoing AI analysis of stored data at scheduled intervals. +- The system uses this analysis to detect recurring themes and summarize activities within the workspace. +- It maintains scheduled awareness of workspace state from stored memory. + +Document: /workspace/inbox/memory-test.txt +Summary: +- Persistent AI runtimes require memory to incorporate past workspace activities into future reasoning. +- Semantic memory in AI systems retains embeddings and metadata to store relevant context. +- This stored information allows for retrieval of pertinent context, enhancing the runtime's ability to reason effectively. + +Document: /workspace/inbox/cpu-orchestration-note.txt +Summary: +- Arm CPUs manage orchestration in persistent AI runtimes. +- They handle filesystem events, runtime scheduling, and container services. +- Additionally, they process document parsing, metadata handling, and vector database operations. 
+ +[AI Response] +Autonomous workspace cognition is a feature that enables ongoing AI analysis of stored data at scheduled intervals. This system uses the analysis to detect recurring themes and summarize activities within the workspace. It maintains scheduled awareness from stored memory, allowing the runtime to reason over the information it has already ingested. +``` + +This confirms that autonomous cognition was added without removing the query workflow from the previous section. + +## Validate Runtime Policy Reload + +Runtime policy reload is important because persistent AI systems should be configurable without rebuilding containers or restarting the full stack. In this validation, you temporarily change the supported file extensions and confirm that Hermes applies the new policy during its normal runtime loop. + +Open and edit the file `~/dgx-hermes-agent/workspace/config/runtime.json`. + +Change the supported extensions so Hermes only ingests Markdown files: + +```json +{ + "summary_interval_hours": 8, + "supported_extensions": [ + ".md" + ], + "retrieval_limit": 3, + "summary_output": "/workspace/memory/workspace-summary.txt" +} +``` + +Wait 5 to 10 seconds for the runtime loop to reload the policy. + +Create a `.txt` file: + +```bash +echo "This text file should be ignored by the current policy." \ +> /tmp/ignored-policy-test.txt + +mv /tmp/ignored-policy-test.txt \ +~/dgx-hermes-agent/workspace/inbox/ignored-policy-test.txt +``` + +Hermes should not ingest it because `.txt` is no longer in `supported_extensions`. + +Now create a Markdown file: + +```bash +cat > /tmp/accepted-policy-test.md <<'EOF' +# Policy Test + +This Markdown file should be ingested because the runtime policy allows +files with the .md extension. 
+EOF + +mv /tmp/accepted-policy-test.md \ +~/dgx-hermes-agent/workspace/inbox/accepted-policy-test.md +``` + +Expected logs: + +```text +[Agent] New file detected: +/workspace/inbox/accepted-policy-test.md + +[Agent] Running summarization inference... + +[Agent] AI Summary: +- The document discusses key strategies for enhancing local business support through AI technologies. +- It highlights the importance of personalized customer experiences as enabled by advanced data analysis and machine learning techniques. +- Recommendations include integrating chatbots and virtual assistants to improve communication efficiency and customer service. +``` + +The summary text will vary between runs. The signal to check is that Hermes logs the `.md` file and prints no `[Agent] New file detected:` line for `ignored-policy-test.txt`. This validates that Hermes reloads runtime configuration dynamically. + +Restore the original policy when you are done: + +```json +{ + "summary_interval_hours": 8, + "supported_extensions": [ + ".txt", + ".md", + ".log" + ], + "retrieval_limit": 3, + "summary_output": "/workspace/memory/workspace-summary.txt" +} +``` + +## Trigger a Faster Cognition Cycle + +For validation, you can temporarily reduce the summary interval. + +Open and edit the file `~/dgx-hermes-agent/workspace/config/runtime.json`. + +Set a very small interval: + +```json +{ + "summary_interval_hours": 0.001, + "supported_extensions": [ + ".txt", + ".md", + ".log" + ], + "retrieval_limit": 3, + "summary_output": "/workspace/memory/workspace-summary.txt" +} +``` + +This is approximately 3.6 seconds (0.001 × 3600), which is shorter than the 5-second sleep in the main loop. With this setting, Hermes runs the cognition function on every loop iteration. In the logs, you should see `[Cognition] Generating workspace summary...` and `[Cognition] Workspace summary updated:` appear again and again while the runtime is active. + +This fast interval is useful for validation, but it is intentionally aggressive. Leave it enabled only long enough to confirm that scheduling works, then restore the interval to a larger value.
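
The scheduling arithmetic behind this setting can be checked in isolation. The sketch below mirrors the comparison in the Hermes main loop. The function names `interval_seconds` and `is_summary_due` are hypothetical helpers introduced for illustration, not part of `agent.py`.

```python
def interval_seconds(summary_interval_hours):
    """Convert the policy value from hours to seconds."""
    return summary_interval_hours * 3600


def is_summary_due(current_time, last_summary_time, summary_interval_hours):
    """Return True when more than one interval has elapsed since the last summary."""
    elapsed = current_time - last_summary_time
    return elapsed > interval_seconds(summary_interval_hours)


# 0.001 hours is about 3.6 seconds
print(interval_seconds(0.001))

# With last_summary_time = 0, any realistic epoch timestamp is already "due",
# which is why the first cognition cycle runs at startup.
print(is_summary_due(1_700_000_000.0, 0, 8))

# Five seconds after the previous summary, the fast interval is due
# but the 8-hour interval is not.
print(is_summary_due(105.0, 100.0, 0.001))
print(is_summary_due(105.0, 100.0, 8))
```

Because the main loop sleeps for 5 seconds between checks, any interval shorter than that behaves the same way: the summary is regenerated on every iteration.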
+ +Follow the logs: + +```bash +docker logs -f hermes +``` + +Expected output: + +```text +[Cognition] Generating workspace summary... +[Cognition] Workspace summary updated: +``` + +Restore the interval to `8` after validation to avoid continuous summary generation: + +```json +{ + "summary_interval_hours": 8, + "supported_extensions": [ + ".txt", + ".md", + ".log" + ], + "retrieval_limit": 3, + "summary_output": "/workspace/memory/workspace-summary.txt" +} +``` + +## Validate Persistent Runtime Lifecycle + +Restart the Hermes service: + +```bash +cd ~/dgx-hermes-agent/compose +docker compose restart hermes +``` + +Follow the logs: + +```bash +docker logs -f hermes +``` + +Expected output: + +```text +[Hermes Agent] Starting workspace watcher... +[Hermes Agent] Monitoring: /workspace/inbox +``` + +The `workspace_memory` collection remains in Qdrant because the Qdrant storage directory is persisted on the host. + +Verify that the summary file still exists: + +```bash +ls ~/dgx-hermes-agent/workspace/memory/ +``` + +You should see `workspace-summary.txt`. + +This confirms that the runtime state persists across container restarts. 
+ +## Runtime Validation Summary + +At this point, the local runtime supports: + +| Capability | Status | +|---|---| +| Workspace monitoring | Complete | +| Local summarization | Complete | +| Embedding generation | Complete | +| Persistent vector memory | Complete | +| Semantic retrieval | Complete | +| Contextual reasoning | Complete | +| Autonomous workspace cognition | Complete | +| Dynamic runtime policy reload | Complete | + +## CPU and GPU Responsibilities + +The Arm Grace CPU coordinates the autonomous runtime: + +- Filesystem monitoring +- Runtime policy loading +- Dynamic configuration reload +- Background scheduling +- Semantic memory aggregation +- Query workflow coordination +- Workspace summary lifecycle + +The Blackwell GPU accelerates: + +- Summarization +- Embedding generation +- Contextual reasoning +- Autonomous workspace analysis + +The result is a heterogeneous local AI system where the CPU coordinates persistent workflows and the GPU accelerates model execution. + +## Runtime Behavior Notes + +The final runtime still uses the Ollama and Qdrant APIs introduced in the previous sections. The notes below focus on runtime behavior that is specific to autonomous cognition and policy-driven orchestration. + +Runtime configuration is reloaded inside the main loop: + +```python +config = load_runtime_config() +``` + +This means changes to `/workspace/config/runtime.json` can affect behavior without rebuilding the Hermes container. If the JSON file is malformed, Hermes will fail when it tries to reload the policy, so validate the file syntax after editing. + +Workspace cognition reads stored memory using Qdrant `scroll(...)`: + +```python +results = qdrant.scroll( + collection_name=COLLECTION_NAME, + limit=10, + with_payload=True +)[0] +``` + +This is different from semantic search. `scroll(...)` is used here to collect recent stored summaries for workspace-level analysis, while `query_points(...)` is still used for question-driven semantic retrieval. 
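The earlier note about a malformed `runtime.json` suggests one possible hardening step: keep the last valid policy when a reload fails. The sketch below is not the Hermes implementation. The function name `load_policy_or_keep` and its fallback behavior are assumptions for illustration, and it uses only the Python standard library.

```python
import json


def load_policy_or_keep(path, previous):
    """Return the policy parsed from `path`, or `previous` if the reload fails.

    Hypothetical helper: the runtime described here calls
    load_runtime_config() directly and fails on malformed JSON.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            candidate = json.load(handle)
    except (OSError, ValueError):
        # OSError covers a missing or unreadable file. ValueError covers
        # json.JSONDecodeError and text decoding errors.
        return previous

    # A policy is expected to be a JSON object, not a list or a scalar.
    if not isinstance(candidate, dict):
        return previous
    return candidate
```

With a guard like this, a half-saved edit to the policy file would leave the previous `supported_extensions` and `summary_interval_hours` values in effect until the file parses again. Validating the JSON syntax after editing remains the simpler safeguard for the runtime as written.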
+ +On startup, the first cognition cycle runs immediately because `last_summary_time` starts at `0`: + +```python +last_summary_time = 0 +``` + +This behavior is expected. It validates that Hermes can read memory, call Ollama, and write the configured summary output path. + +The current implementation primarily handles file creation events through: + +```python +on_created() +``` + +For validation, use new filenames. Existing files or file modifications may not trigger ingestion. File modification handling is a natural next improvement for a hardened runtime. + +The `retrieval_limit` value is present in `runtime.json`, but the verified retrieval code in this section still uses `limit=3` inside `search_memory()`. Treat the policy value as a visible configuration placeholder for later hardening. + +## Summary + +You completed the ***persistent autonomous local AI runtime*** on DGX Spark. The finished system demonstrates how an Arm CPU can coordinate long-running AI workflows while a GPU accelerates summarization, embedding generation, contextual reasoning, and workspace-level cognition. + +This Learning Path uses DGX Spark as the reference platform, but the architecture is reusable beyond this specific system. The same pattern can be adapted to other Arm platforms that can run containerized services, local inference backends, vector memory, and a CPU-side orchestration runtime. + +The key idea is that persistent AI systems are ***distributed orchestration systems***, not just single inference calls. Hermes coordinates workspace ingestion, semantic memory, retrieval, autonomous summaries, and runtime policy, while the inference and memory services remain replaceable implementation choices. + +This implementation is intentionally a minimal MVP. It validates the end-to-end architecture, but it does not yet handle production concerns such as repeated updates to the same file, deduplication, re-indexing, versioned memory records, or file modification events. 
Those hardening steps are natural extensions once the core runtime pattern is working. diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/_index.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/_index.md new file mode 100644 index 0000000000..d1a2cab53e --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/_index.md @@ -0,0 +1,57 @@ +--- +title: Orchestrate a Persistent Local AI Agent with Hermes on DGX Spark + +description: Learn how to build a persistent local AI agent on NVIDIA DGX Spark using event-driven orchestration, semantic memory, and heterogeneous Arm CPU + GPU workloads. You'll combine Hermes Agent, Ollama, and Qdrant to create a continuously running local AI runtime capable of event-driven document ingestion, contextual retrieval, and scheduled workspace cognition. + +minutes_to_complete: 90 + +who_is_this_for: This is an advanced topic for developers interested in persistent local AI agent systems, semantic memory architectures, and heterogeneous AI computing on NVIDIA DGX Spark. You'll learn how Arm-based Grace CPUs orchestrate long-running AI workflows including filesystem monitoring, semantic retrieval, runtime scheduling, and autonomous cognition, while Blackwell GPUs accelerate local language model inference and embeddings generation using Ollama. This Learning Path is a great fit if you want to understand how persistent AI runtimes operate continuously using coordinated CPU and GPU workloads. 
+ +learning_objectives: + - Describe how persistent AI runtimes combine orchestration, semantic memory, and local inference + - Build a continuously running local AI agent using Hermes Agent, Ollama, and Qdrant + - Use Arm Grace CPUs to orchestrate event-driven AI workflows on NVIDIA DGX Spark + - Deploy semantic memory and contextual retrieval pipelines using vector embeddings and Qdrant + +prerequisites: + - An NVIDIA DGX Spark system with at least 15 GB of available disk space + +author: Odin Shen + +### Tags +skilllevels: Advanced +subjects: ML +armips: + - Cortex-A +operatingsystems: + - Linux +tools_software_languages: + - Python + - Docker + - Ollama + +further_reading: + - resource: + title: NVIDIA DGX Spark + link: https://www.nvidia.com/en-gb/products/workstations/dgx-spark/ + type: website + - resource: + title: RAG Learning Path + link: https://learn.arm.com/learning-paths/laptops-and-desktops/dgx_spark_rag/ + type: website + - resource: + title: Offline Voice Chatbot Learning Path + link: https://learn.arm.com/learning-paths/laptops-and-desktops/dgx_spark_voicechatbot/ + type: documentation + - resource: + title: Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark + link: /learning-paths/laptops-and-desktops/dgx_spark_llamacpp/ + type: Learning Path + + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. 
+--- diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/_next-steps.md b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/_next-steps.md new file mode 100644 index 0000000000..c3db0de5a2 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# ================================================================================ +weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. +--- diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard.png b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard.png new file mode 100644 index 0000000000..56b004c80d Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard.png differ diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_2.png b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_2.png new file mode 100644 index 0000000000..7991d3f126 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_2.png differ diff --git a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_3.png b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_3.png new file mode 100644 index 0000000000..6a167d0588 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_3.png differ diff --git 
a/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_4.png b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_4.png new file mode 100644 index 0000000000..ab5d26ca9a Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/dgx_persistent_agent/qdrant_dashboard_4.png differ