This repository documents how I build and learn from a security-focused HomeLab based on Proxmox VE, a pfSense firewall, network segmentation, and virtualization.
The objective is to gain hands-on experience for a career as a System Administrator with a focus on Networking and IT Security.
- Project Objectives
- Hardware & Initial Setup
- Architecture Overview
- Network Design
- Proxmox Installation
- pfSense Firewall
- Virtual Machines
- Testing & Validation
- Security Measures
- Current Status
- Roadmap / Next Steps
- Build a Proxmox HomeLab on Bare Metal.
- Implement pfSense as the central Firewall & Router.
- Network isolation using VLANs & Subnets.
- Implementation of a DMZ for public-facing services.
- Operation of Linux and Windows VMs.
- Systematic documentation of the learning process.
| Component | Description |
|---|---|
| Server | AOOSTAR WTR PRO – AMD Ryzen 7 5825U, 64 GB RAM |
| Router | TP-Link Archer AX18 |
| ISP | Magenta Gateway |
| Client | Linux Mint / Laptop |
This project includes the setup of an isolated laboratory environment (VLAN 30) behind a pfSense firewall on a Proxmox host.
- VM Name: `ai-ops-01`
- OS: Ubuntu Server
- IP Address: `192.168.30.20`
- VLAN ID: 30
- Gateway: `192.168.30.1` (pfSense)
To establish internet connectivity for the VM in VLAN 30, the following potential issues were systematically excluded using the OSI Model approach:
- Issue: The VM could not reach the gateway.
- Solution: Ensured the Proxmox bridge (`vmbr1`) is "VLAN-aware" and the VM is assigned the correct VLAN tag (30).
- Validation: `ip neigh` showed the status `REACHABLE` for the gateway's MAC address.
- Issue: Packets were blocked by pfSense (`Default deny rule`).
- Solution:
  - Created a Pass Rule on the `AIOPS` interface.
  - Changed the source from a specific IP to `Any` to rule out subnet misconfigurations.
- Validation: Successful ping to gateway `192.168.30.1`.
- Issue: Packets reached pfSense but did not exit correctly to the internet.
- Solution:
  - Configured Outbound NAT on the `WAN` interface for the `192.168.30.0/24` network.
  - Disabled Static Port to ensure compatibility with the upstream primary router.
- Validation: `ping -n 1.1.1.1` was successful.
- Issue: Internet access via IP was functional, but `google.com` could not be resolved.
- Solution: Added the `192.168.30.0/24` subnet to the DNS Resolver Access Lists in pfSense.
- Validation: `ping google.com` was successful.
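
The bottom-up order of the checks above can be sketched in Python. This is only an illustration: the layer labels and the `first_failing_layer` helper are assumptions made for this example and are not part of the lab tooling. The results would be filled in by hand (or by a script) from `ip neigh`, `ping`, and DNS lookups.

```python
from typing import Optional

# Checks ordered bottom-up, mirroring the troubleshooting sequence above.
# The labels are illustrative names chosen for this sketch.
LAYERS = [
    "L2 gateway neighbor",   # ip neigh shows REACHABLE
    "L3 firewall pass",      # ping to 192.168.30.1 succeeds
    "L3 outbound NAT",       # ping to 1.1.1.1 succeeds
    "L7 DNS",                # ping google.com resolves
]


def first_failing_layer(results: dict[str, bool]) -> Optional[str]:
    """Return the lowest layer whose check failed, or None if all passed.

    A missing entry is treated as a failure, so an unfinished check
    is never silently skipped.
    """
    for layer in LAYERS:
        if not results.get(layer, False):
            return layer
    return None
```

Fixing the lowest failing layer first avoids chasing DNS symptoms that are really caused by a missing VLAN tag or firewall rule.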
In accordance with the project policy (Jan 28, 2026), all sensitive variables and the final network topology are managed within the vault_passwords.yml file.
After establishing basic connectivity, the VM was prepared for professional operation within Proxmox and for automation.
To improve communication between the Proxmox host and the VM (e.g., for graceful shutdowns and IP display), the Guest Agent was installed.
- Command: `sudo apt install qemu-guest-agent`
- Status: Service successfully activated, even though Ubuntu manages it as a static unit.
- Result: The VM's IP address is now directly visible in the Proxmox summary.
To ensure the VM remains reachable at a consistent address for automation, it was transitioned from DHCP to a static configuration.
- File: `/etc/netplan/50-cloud-init.yaml`
- Configuration:
  - IP: `192.168.30.20/24`
  - Gateway: `192.168.30.1`
  - DNS: `192.168.30.1` (pfSense) & `8.8.8.8`
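
Before applying a static configuration like this, the values can be sanity-checked with Python's standard `ipaddress` module. The helper below is a minimal sketch; `check_static_config` is a hypothetical name and the checks are a small, non-exhaustive selection.

```python
import ipaddress


def check_static_config(address: str, gateway: str, dns: list[str]) -> list[str]:
    """Return a list of problems found in a static IPv4 configuration."""
    problems = []
    iface = ipaddress.ip_interface(address)
    gw = ipaddress.ip_address(gateway)

    # The gateway must be on the same subnet as the host.
    if gw not in iface.network:
        problems.append(f"gateway {gw} is outside {iface.network}")

    # Network and broadcast addresses cannot be assigned to a host.
    reserved = (iface.network.network_address, iface.network.broadcast_address)
    if iface.ip in reserved:
        problems.append(f"{iface.ip} is a reserved address in {iface.network}")

    if gw == iface.ip:
        problems.append("gateway and host address are identical")

    for server in dns:
        ipaddress.ip_address(server)  # raises ValueError on a malformed entry

    return problems


print(check_static_config("192.168.30.20/24", "192.168.30.1",
                          ["192.168.30.1", "8.8.8.8"]))  # prints []
```

An empty list means the address, gateway, and DNS entries are at least mutually consistent; it does not prove the gateway is reachable.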
Laboratory access is now managed centrally from the Management PC (Linux Mint) via Ansible.
- SSH Keys: The public key from the Management PC was deployed (`ssh-copy-id`) to enable passwordless logins.
- Ansible Vault: Sensitive data, such as the `sudo` password for user `angel`, is stored encrypted in `group_vars/all.yml`.
- Vault Automation: Optimized the workflow using a local `.vault_pass.txt` and `ansible.cfg` to eliminate manual password prompts during playbook execution.
Insert Screenshot here: (Successful Ansible Ping or `whoami` test run)
Implemented workflows:
- `update_system.yml`: Performs a full `apt upgrade`.
- `check_reboot.yml`: Checks for the existence of `/var/run/reboot-required` and securely reboots the VM if necessary.
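
The decision made by `check_reboot.yml` boils down to a file-existence test. The following Python sketch shows that logic in isolation; it is not the playbook itself, and `next_action` is a hypothetical helper that only reports what would happen.

```python
from pathlib import Path


def reboot_required(flag_path: str = "/var/run/reboot-required") -> bool:
    """True if the Debian/Ubuntu reboot flag file exists."""
    return Path(flag_path).exists()


def next_action(flag_path: str = "/var/run/reboot-required") -> str:
    """Report the action a maintenance run would take; nothing is rebooted here."""
    return "reboot" if reboot_required(flag_path) else "skipping"
```

On an up-to-date system the flag file is absent, so the reboot step is skipped.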
Insert Screenshot here: (Ansible Playbook Run without errors)
- Isolated network environment (VLAN 30) active.
- Internet access & DNS via pfSense stable.
- VM management via Proxmox Guest Agent active.
- Full control established via Ansible automation.
Following successful network integration, the VM ai-ops-01 was fully provisioned as a Docker Host via automation.
The entire setup process is managed via Ansible, including:
- System Maintenance: Intelligent reboot checks (`check_reboot.yml`).
- Docker Stack: Automated installation of Docker Engine & Docker Compose (`setup_docker.yml`).
- Security: Secure management of database passwords and API keys using Ansible Vault.
Proof 1: Intelligent System Maintenance. Ansible independently detects if a reboot is required and skips the task (`skipping`) if the system is up to date.
The VM is now ready for the deployment of n8n and AI tools. The user angel has been authorized to manage Docker containers without sudo, improving security and workflow.
Proof 2: Successful Docker Provisioning. The final playbook report shows the successful setup of all components and user permissions.
- Static IP & Guest Agent configured.
- Ansible Vault & SSH Key authentication active.
- Docker & Docker Compose fully operational.
- Next Step: Start the n8n stack (n8n + Postgres).
The heart of the automation, n8n, was deployed as a container stack alongside a Postgres database.
- Orchestration: Docker Compose (via Ansible `community.docker` collection).
- Persistence: Docker Volumes for n8n data and database content.
- Security: Dynamic injection of DB credentials via Ansible Vault during deployment.
Proof: Successful Stack Deployment. All tasks were executed without errors; the stack is production-ready.
During deployment, specific adjustments were made to make the stack operational in a local development environment:
- YAML Validation: Corrected indentation in the Docker Compose template to resolve the `additional properties not allowed` error.
- Security Override: Set `N8N_SECURE_COOKIE=false` to allow local HTTP access without SSL.
- Vault Integration: Used `vault_passwords.yml` for secure injection of the `POSTGRES_PASSWORD` variable.
After deployment, system reachability was successfully verified.
| Component | Status | URL / Port |
|---|---|---|
| n8n Frontend | ✅ Online | http://192.168.30.20:5678 |
| Postgres DB | ✅ Connected | Internal Port 5432 |
To ensure privacy-compliant and cost-free AI processing, the system was expanded with a local LLM interface.
Installed via a dedicated Ansible playbook (install_ollama.yml), handling:
- Download and installation of the Ollama binary.
- Systemd service configuration for auto-start.
- Initial pull of the Llama3 model (approx. 4.7 GB).
An intelligent workflow was created in n8n, acting as the bridge between the automation server and the local AI.
Configuration Details:
- Node Structure: An `AI Agent` acts as the brain, supported by an `Ollama Chat Model`.
- Connectivity: Connected via VM IP on port `11434`.
- Model: Utilizing `llama3:latest`.
Proof: Successful AI Execution. The image shows the validated workflow. The green indicators confirm that the AI Agent successfully communicated with Llama3 and generated a response.
Following extensive testing, the architecture migrated from OpenClaw to Open WebUI for robust Model Context Protocol (MCP) integration and stable connection to local inference engines.
Roles are distributed to optimize resources and isolate the management layer:
- Management Hub (VM 102 - Mint): Hosts the frontend (Open WebUI) and automation engine (n8n).
- AI Service Node (ai-ops-01): Hosts the inference engine (Ollama) and interface logic (MCP Server & mcpo Bridge).
Implemented a FastMCP-based server to grant the AI direct access to Proxmox infrastructure.
- The Challenge: Open WebUI requires a standard OpenAPI/REST interface, while MCP communicates natively via SSE (Server-Sent Events).
- The Solution: Implementation of the `mcpo` Bridge. This runs the MCP Python script as a stdio subprocess and translates tools into a dynamic `openapi.json` on port `5002`.
- Security Patch: Implemented a Python monkey-patch to disable `dns_rebinding_protection` in FastMCP, allowing cross-VLAN access from the Management VM to the Docker host.
Provisioned entirely via Ansible (deploy_ai_brain.yml) using encrypted secrets from vault_passwords.yml.
| Service | Port | Host | Role |
|---|---|---|---|
| Open WebUI | 8080 | Mint-VM (102) | Primary Chat Interface & Tool Hub |
| n8n | 5678 | Mint-VM (102) | Event-Handling & Workflow Automation |
| mcpo (Bridge) | 5002 | ai-ops-01 | REST Translator for Proxmox Tools |
| Ollama | 11434 | ai-ops-01 | Local LLM Inference (Llama3) |
| Proxmox API | 8006 | Host (WTR Pro) | Target Infrastructure for AI Control |
Verified via the generated OpenAPI specification:
curl http://192.168.1.10:5002/openapi.json | python3 -m json.tool
Result: The AI now has "hands" within the HomeLab. It can autonomously:
- Fetch VM lists from the Proxmox host.
- Analyze resource utilization (CPU/RAM) across nodes.
- Start/Stop VMs based on natural language commands.
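
To see which tools the bridge actually exposes, the operations in the generated `openapi.json` can be listed. The sketch below works on an already-parsed document; the `sample` spec is a hand-written stand-in (only the `/list_vms` POST route is taken from this lab), and `list_tools` is a hypothetical helper.

```python
def list_tools(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs for every operation in an OpenAPI document."""
    http_methods = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
    tools = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in sorted(item):
            # Path items may also hold non-operation keys such as "parameters".
            if method in http_methods:
                tools.append((method.upper(), path))
    return tools


# Stand-in for the parsed JSON returned by the bridge (assumed shape).
sample = {"paths": {"/list_vms": {"post": {"summary": "List VMs"}}}}
print(list_tools(sample))  # prints [('POST', '/list_vms')]
```

In practice the dictionary would come from `json.loads()` applied to the `curl` output shown above.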
The automation layer has reached full maturity. The system now autonomously monitors the Proxmox infrastructure and reports status changes via Telegram using local AI.
The integration was verified by a successful end-to-end execution, transforming raw JSON data into human-readable intelligence.
- Endpoint: `http://192.168.30.20:5002/list_vms`
- Method: `POST` (Utilizing Raw Body for custom data mapping)
- AI Engine: Ollama / Llama3 (Local Inference)
- Result: n8n retrieves the VM list, feeds it into the LLM, and delivers a German status report via Telegram.
During implementation, we identified that n8n’s default JSON serializer has limitations with complex mappings.
The Solution: We switched the HTTP Request to Body Content Type: Raw with application/json headers. This allowed us to use a JavaScript .map() expression to pre-format the VM data for the AI.
Optimized Prompt Logic:
{
"model": "llama3",
"stream": false,
"prompt": "Analysiere diese Proxmox VMs auf Deutsch und gib einen kurzen Statusbericht: {{ $input.all().map(i => i.json.node + ' VM ' + i.json.name + ' ist ' + i.json.status).join(', ') }}"
}

To ensure professional readability in the Telegram client, a regex-based string replacement was implemented to convert the escaped newline sequences (`\n`) in the AI's response into real line breaks.
Telegram Expression:
{{ $json.response.replace(/\\n/g, '\n') }}
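
For readers less familiar with n8n expressions, the two transformations above can be mirrored in plain Python. This is an equivalent sketch, not code that runs inside n8n; the function names are hypothetical and the field names (`node`, `name`, `status`) follow the expression above.

```python
def build_prompt(vms: list[dict]) -> str:
    """Mirror the n8n .map().join() expression: one clause per VM."""
    parts = [f"{vm['node']} VM {vm['name']} ist {vm['status']}" for vm in vms]
    return ("Analysiere diese Proxmox VMs auf Deutsch und gib einen kurzen "
            "Statusbericht: " + ", ".join(parts))


def unescape_newlines(text: str) -> str:
    """Turn the two-character sequence backslash + n into a real line break."""
    return text.replace("\\n", "\n")
```

Pre-formatting the list into one flat sentence keeps the request body a simple JSON string, which avoids nesting objects inside the prompt field.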
Proof: Successful Proxmox Analysis via Telegram. The screenshot confirms that Llama3 correctly identifies "stopped" VMs (e.g., `DC-01`, `CL-01-WIN11`) and delivers a structured report to the "Homelab Monitor" bot.
After establishing infrastructure monitoring, the system was expanded to include a security-focused workflow that analyzes firewall logs in real-time using Llama3.
To make pfSense logs accessible for the AI, a remote logging pipeline was established.
- Configuration: pfSense was configured (via WebUI: Status -> System Logs -> Settings) to send logs to the central AI node.
- Remote Log Server: `192.168.30.20:5141` (ai-ops-01).
- Receiver: A `syslog-ng` Docker container was deployed on `ai-ops-01` to capture and store the incoming stream.
A second n8n workflow was implemented to perform automated threat hunting:
- Read Logs: n8n accesses the log file `/var/log/syslog-ng/pfsense.log` via the `Execute Command` node.
- AI Filtering: The logs are forwarded to the local Llama3 instance.
- Prompt Logic: The AI is tasked to identify suspicious patterns, blocked connection attempts from unusual IPs, or potential security breaches.
- Instant Alerting: If security concerns are found, a detailed summary is sent via Telegram.
Proof: AI Security Analysis in n8n. The screenshot shows the successful parsing of firewall logs. Llama3 identifies patterns in the `pfsense.log` and summarizes them for Telegram.
To ensure n8n can read the logs generated by the syslog-ng container, the log directory was mounted as a shared volume in both containers. n8n therefore reads security events directly from disk, with no additional transfer step.
The laboratory infrastructure is now fully "AI-Aware" and consists of the following service matrix on ai-ops-01 (192.168.30.20):
| Service | Port | Role |
|---|---|---|
| n8n | 5678 | Central Automation Engine & Workflow Orchestrator |
| Open WebUI | 8080 | Human-to-AI Interface (Chat) |
| Ollama | 11434 | Local LLM Inference (Llama3) |
| mcpo | 5002 | REST Bridge for Proxmox API Control |
| syslog-ng | 5141 | Central Security Log Collector for pfSense |
To ensure 100% reproducibility and disaster recovery, the entire AI-Ops stack has been migrated to a centralized Ansible configuration.
The manual container setups were refactored into an idempotent Ansible playbook. This allows for a one-click redeployment of the complete infrastructure on ai-ops-01.
Key Tasks Automated:
- Directory Structure: Creation of persistent data paths for logs and workflows.
- Component Deployment:
- Open WebUI: Central AI interface.
- Proxmox MCP & mcpo Bridge: Infrastructure API connectors.
- Syslog-ng: Centralized log collector for pfSense security audits.
- n8n Stack: The automation engine orchestrating the workflows.
- Security: Integrated `ansible-vault` using `vault_passwords.yml` for sensitive credential injection.
Proof: Successful Infrastructure Orchestration. The terminal output confirms that all 8 tasks executed successfully (ok=8), verifying that the AI-Ops environment is now fully managed via Infrastructure as Code (IaC).
After the successful deployment via Ansible, the network layer was hardened to prevent unauthorized access to the AI services, as all containers were binding to 0.0.0.0.
A system audit revealed several exposed services on ai-ops-01 (192.168.30.20):
- Port 8080: Open WebUI (Cleartext)
- Port 5678: n8n (Cleartext)
- Port 11434: Ollama API
- Port 5141: Syslog-ng (UDP/TCP)
To secure these services, a strict "Default Block" policy was implemented on the pfSense firewall for the AIOPS VLAN.
| Order | Action | Source | Destination | Ports | Description |
|---|---|---|---|---|---|
| 1 | PASS | `10.0.10.52` (Mint) | `192.168.30.20` | 5678, 8080 | Mgmt Access |
| 2 | PASS | `192.168.30.20` | Any | Any | Outbound Updates |
| 3 | BLOCK | Any | `192.168.30.20` | Any | Security Baseline |
- Success: Access via Management VM (`10.0.10.52`) remains fully functional.
- Security: Unauthorized access attempts from other VLANs are successfully dropped by Rule 3.
Note: The order of rules is critical. The "Allow" rule for the Management VM must precede the "Block All" rule to ensure administrative access is not severed.
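
Why the order matters can be shown with a small first-match evaluator. This is a simplified model written for illustration, assuming top-down, first-match-wins evaluation; it ignores interfaces, protocols, and state tracking, and the `Rule` class and `evaluate` function are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str             # "pass" or "block"
    source: str             # CIDR string, or "any"
    destination: str        # CIDR string, or "any"
    ports: Optional[set]    # None means any port


def _matches(cidr: str, ip: str) -> bool:
    return cidr == "any" or ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def evaluate(rules: list, src: str, dst: str, port: int) -> str:
    """Return the action of the first matching rule; deny if nothing matches."""
    for rule in rules:
        if (_matches(rule.source, src)
                and _matches(rule.destination, dst)
                and (rule.ports is None or port in rule.ports)):
            return rule.action
    return "block"


# The three rules from the table above, in the same order.
RULES = [
    Rule("pass", "10.0.10.52/32", "192.168.30.20/32", {5678, 8080}),
    Rule("pass", "192.168.30.20/32", "any", None),
    Rule("block", "any", "192.168.30.20/32", None),
]
```

With the rules in this order, `evaluate(RULES, "10.0.10.52", "192.168.30.20", 8080)` yields `"pass"`. Moving the block rule to the top makes the same call return `"block"`, which is exactly the lockout the note warns about.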
To protect the "AI-Ops Brain" from hardware failure or data corruption, a multi-tier backup strategy has been implemented, focusing on Version Control (Git) and encrypted off-site storage.
The following critical components are prioritized for the backup routine:
- Ansible Repository: All playbooks (`deploy_ai_brain.yml`), roles, and inventory files.
- n8n Workflows: Exported JSON files of the Proxmox, pfSense, and Trading pipelines.
- Configuration Files: Docker-compose files, `syslog-ng` configs, and the `vault_passwords.yml` (encrypted).
- Node Data: Persistent volumes of the `mcpo` bridge and custom Python scripts.
The infrastructure is now managed as Infrastructure as Code (IaC).
- Repository: Private GitHub repository for configuration files.
- Automation: An n8n workflow or a cron-based Ansible task periodically pushes the latest verified configurations to GitHub.
- Security: Sensitive data is strictly handled via `ansible-vault` to ensure no plain-text passwords ever reach the remote repository.
A dedicated "Backup-Agent" workflow has been designed:
- Trigger: Daily at 03:00 AM.
- Action: Exports all active n8n workflows via the n8n API.
- Storage: Commits the exports to the local Git directory and performs a `git push` to the encrypted off-site target.
Reliability Note: Hardware is temporary, but the "Brain" (the logic) is now permanent and recoverable on any new Linux node within minutes using the Ansible Playbook.
While GitHub stores the configuration files (Infrastructure as Code), the actual state of the AI-Ops node—including n8n database, user credentials, and active workflow executions—resides within Docker Volumes. A hardware failure or container corruption would lead to a total loss of these operational data points if not backed up separately.
To ensure full recovery, the persistent data must be extracted from the Docker environment:
- Target Volumes: Focus on `/var/lib/docker/volumes/` (specifically for `n8n_data`, `syslog_data`, and `ollama_configs`).
- Mechanism: Automated "Snapshot-to-Archive" process using the `docker run --rm --volumes-from` method to create compressed `.tar.gz` snapshots.
- Storage Location: Backups are first stored locally in `/home/angel/backups/` and then synced to an off-site target (S3, NAS, or encrypted Cloud) to follow the 3-2-1 Backup Rule.
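
The `--volumes-from` mechanism can be sketched as a small command builder. Everything not named above is an assumption for illustration: the container name `n8n`, the in-container data path `/home/node/.n8n`, and the `alpine` helper image. The function only assembles the argument list; running it is left to `subprocess.run` or a shell.

```python
def backup_command(container: str, data_path: str,
                   backup_dir: str, archive_name: str) -> list[str]:
    """Build a docker command that archives one container's volume data.

    A throwaway helper container mounts the volumes of `container`,
    bind-mounts `backup_dir` at /backup, and writes a gzip-compressed tarball there.
    """
    return [
        "docker", "run", "--rm",
        "--volumes-from", container,          # reuse the target's volumes
        "-v", f"{backup_dir}:/backup",        # where the archive lands on the host
        "alpine",                             # assumed helper image providing tar
        "tar", "czf", f"/backup/{archive_name}",
        "-C", data_path, ".",                 # archive contents relative to data_path
    ]


cmd = backup_command("n8n", "/home/node/.n8n",
                     "/home/angel/backups", "n8n_data.tar.gz")
print(" ".join(cmd))
```

Because `--rm` removes the helper container afterwards, only the archive in the backup directory remains.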
The recovery is integrated into the Ansible lifecycle:
- Re-deploy Stack: Ansible recreates the containers and empty volumes.
- Data Restoration: The latest `.tar.gz` snapshot is extracted back into the volumes before the services start.
- Verification: System integrity check to ensure n8n recognizes all previous workflows and API keys.
Security Warning: Docker volume backups contain unencrypted secrets (such as Binance API keys or Telegram tokens). These archives MUST be encrypted (e.g., via `gpg` or `ansible-vault`) before being moved to any cloud-based or external storage.
To ensure the stability and performance of the AI-Ops infrastructure, a professional monitoring stack was deployed. This allows for real-time tracking of system resources and service health.
The stack is deployed via Docker Compose and consists of three core services:
- Prometheus: The time-series database that collects metrics.
- Grafana: The visualization platform for dashboards.
- Node-Exporter: A helper service that exports hardware metrics (CPU, RAM, Disk) from the host.
To allow the Management VM (Mint) access to the new monitoring services while maintaining a strict firewall policy, a Port Alias was created in pfSense.
- Alias Name: `AI_Stack_Ports`
- Included Ports: `3000` (Grafana), `5678` (n8n), `8080` (Open WebUI), `5001/5002` (MCP/Bridge).
- Rule Logic: The existing "Pass" rule for the Mint VM was updated to use this alias as the destination, ensuring all AI services are reachable through a single, manageable rule.
Instead of building views from scratch, the Node Exporter Full (ID: 1860) dashboard was imported.
- Data Source: Prometheus (connected via `http://127.0.0.1:9090`).
- Metrics Tracked: CPU Load, Memory Usage, Disk I/O, and Network Traffic.
The monitoring stack was added to the main Ansible lifecycle to ensure it is part of the automated "AI Brain" deployment.
# Adding the monitoring stack to the Ansible Playbook
# (the four-space task indentation is an assumption; match it to the existing task list)
cat >> deploy_ai_brain.yml << 'EOF'
    - name: Deploy Monitoring Stack
      community.docker.docker_compose_v2:
        project_name: monitoring
        project_src: /home/angel/ai-stack/monitoring
        state: present
EOF
To complete the "AI-Ops" security perimeter, the AIOPS interface (VLAN 30) has been integrated into Suricata to prevent lateral movement within the lab environment.
- Mode: Legacy Mode (selected for maximum stability with virtualized network drivers).
- Coverage: Multi-layered defense combining WAN (perimeter) and AIOPS (internal stack).
- Blocking: "Block Offenders" is enabled to immediately drop malicious IPs at the firewall level.
Integration into the n8n AI-Security Analyst is achieved through enhanced logging parameters:
- Syslog Forwarding: Enabled (`LOCAL1/NOTICE`) for the n8n/syslog-ng pipeline.
- HTTP Inspection: Active for granular tracking of web-based attack vectors.
- IaC Validation: The entire monitoring stack deployment was verified via Ansible (ok=8).
- Initialization: The interface was manually bound via `+ Add` to ensure visibility in Alert/Block dropdown menus.
- Status Check: Both WAN and AIOPS interfaces are verified active (green checkmark) and operational.
- AIOPS Monitoring: Interface successfully initialized and started.
- Alert Visibility: AIOPS is selectable in the Alert and Block dropdown menus.
- Log Stream: Security events are flowing from the internal VLAN to the central syslog-ng collector.
To automate threat detection and reduce alert fatigue, an autonomous AI Security Analyst was deployed using n8n and Llama3. This workflow fulfills key requirements for Security+ Domain 4.3 (Incident Response) and 2.1 (Threat Intelligence).
The system follows a 5-minute polling cycle to analyze network telemetry:
- Ingestion (Schedule): Triggers every 5 minutes to maintain near real-time visibility.
- Extraction (SSH): Filters `/home/angel/ai-stack/syslog/pfsense.log` for Suricata-specific security events while ignoring administrative noise (e.g., successful logins).
- Intelligence (Ollama/Llama3): Processes the raw log strings. The AI acts as a Tier-1 Analyst, categorizing events by risk level and suggesting mitigation strategies.
- Filtering (IF-Node): Ensures only actionable intelligence is forwarded to prevent notification spam.
- Alerting (Telegram): Sends a formatted Markdown report directly to the administrator's mobile device.
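
The extraction and filtering stages of this pipeline can be approximated in a few lines of Python. This is a sketch of the idea only: the real workflow uses n8n nodes, and both the substring match on "suricata" and the alert keywords are assumptions chosen for illustration.

```python
def suricata_events(lines) -> list[str]:
    """Extraction stage: keep only lines that mention Suricata (case-insensitive)."""
    return [line.rstrip("\n") for line in lines if "suricata" in line.lower()]


def should_alert(ai_summary: str, keywords=("high", "critical")) -> bool:
    """IF-node stand-in: forward only summaries that name an elevated risk level.

    The keyword list is a placeholder; a real filter would depend on how
    the prompt instructs the model to label its findings.
    """
    text = ai_summary.lower()
    return any(word in text for word in keywords)
```

Filtering before the LLM call keeps the prompt short, and filtering after it keeps routine summaries out of Telegram.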
During initial testing, the AI successfully categorized the following internal events:
- Configuration Changes: Identified as Low Risk ✅
- Log Rotations: Correctly identified as Normal/Maintenance ✅
- Session Timeouts: Identified as standard security behavior ✅
| Workflow Name | Function | Frequency | Target |
|---|---|---|---|
| Proxmox VM Monitor | Resource Health | 5 Min | Telegram |
| pfSense Log Analysis | Firewall Telemetry | 5 Min | Telegram |
| Suricata Security | IDS/IPS Intelligence | 5 Min | Telegram |
Operational Insight: By offloading initial log review to Llama3, the "Mean Time to Detect" (MTTD) is significantly reduced without requiring human intervention for routine logs.
To enable secure access to the AI-Stack Dashboard (Open WebUI) and n8n workflows from external networks, a WireGuard VPN was implemented on pfSense. The troubleshooting of this setup covers critical aspects of Security+ Domain 3.2 (Remote Access Solutions) and 4.4 (Network Troubleshooting).
The connection utilizes a dedicated VPN subnet to isolate administrative traffic:
- VPN Subnet: `10.0.50.0/24`
- Mobile Peer (iPhone): `10.0.50.3`
- Laptop Peer (ROG): `10.0.50.2`
- Target Resource: `192.168.30.20` (Ubuntu AI Server), Port `8080`.
- Problem: Packets were dropped by the target server despite "Pass" rules in the firewall.
- Root Cause: Virtualized network interfaces (VirtIO) often miscalculate checksums, causing the target OS to discard packets.
- Resolution: Navigated to `System > Advanced > Networking` and enabled Disable hardware checksum offload.
- Problem: `tcpdump` on the Ubuntu server showed incoming requests from the invalid IP `192.168.30.0` instead of the VPN IP `10.0.50.3`.
- Root Cause: Automatic Outbound NAT was incorrectly masking VPN traffic with the network ID of the destination interface.
- Resolution: Switched to Hybrid Outbound NAT and created a "No NAT" rule for traffic from `10.0.50.0/24` to `192.168.30.0/24`.
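
Two facts from this fix can be checked with Python's `ipaddress` module: `192.168.30.0` is the network address of the `/24` and therefore not a valid host source, and the "No NAT" rule covers VPN-to-AIOPS traffic only. The helper names below are hypothetical.

```python
import ipaddress

VPN_NET = ipaddress.ip_network("10.0.50.0/24")
AIOPS_NET = ipaddress.ip_network("192.168.30.0/24")


def is_usable_host(ip: str, network) -> bool:
    """True if `ip` lies in `network` and is neither its network nor broadcast address."""
    addr = ipaddress.ip_address(ip)
    reserved = (network.network_address, network.broadcast_address)
    return addr in network and addr not in reserved


def no_nat_applies(src: str, dst: str) -> bool:
    """True if a packet matches the No-NAT rule (VPN subnet to AIOPS subnet)."""
    return (ipaddress.ip_address(src) in VPN_NET
            and ipaddress.ip_address(dst) in AIOPS_NET)


print(is_usable_host("192.168.30.0", AIOPS_NET))       # prints False
print(no_nat_applies("10.0.50.3", "192.168.30.20"))    # prints True
```

Seeing a network address as a packet source in `tcpdump` is therefore a strong hint that a NAT rule is rewriting traffic incorrectly.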
- Problem: Laptop connection resulted in "0 B Received".
- Root Cause: Peer identity conflict. WireGuard requires a unique Public/Private key pair for every individual device.
- Resolution:
  1. Generated a unique key pair on the ROG Laptop.
  2. Updated the pfSense Peer configuration with the Laptop's specific Public Key.
  3. Correctly mapped the pfSense Tunnel Public Key to the Laptop's Peer configuration.
| Device | Transport | Authentication | Status |
|---|---|---|---|
| iPhone (5G) | WireGuard App | PubKey + PSK | SUCCESS (Dashboard OK) |
| ASUS ROG Laptop | WireGuard Windows | PubKey + PSK | Pending (Handshake Fix) |
The remote access for the ASUS ROG Laptop was finalized after resolving a critical "Identity Mismatch" in the cryptographic exchange.
The primary reason for the initial "Handshake: Never" status was an incorrect Public Key assignment. The following mapping was implemented to establish the secure tunnel:
- Peer Validation: The Public Key generated by the Laptop App (`MgWxz7...`) was copied into the pfSense `Peers` configuration.
- Endpoint Authentication: The Public Key of the pfSense Tunnel (`8R5JVh...`) was copied into the Laptop App under the `[Peer]` section.
- IP Alignment: The Laptop's internal VPN address was standardized to `10.0.50.2/32` to match the pfSense "Allowed IPs" definition.
- Handshake Status: Verified via `Status > WireGuard` (Handshake active < 1 min).
- Service Access: Successfully accessed the Ubuntu Dashboard via `http://192.168.30.20:8080` from a remote network.
- Data Flow: Bi-directional traffic confirmed (Received/Sent bytes increasing).
After establishing full connectivity, the following hardening steps were performed to ensure long-term stability and security for the remote access solution (Security+ Domain 3.2 & 4.4).
To maintain the integrity of Outbound NAT mappings and routing persistence, a dual-tab interface structure was implemented in pfSense:
- Layer 1: WireGuard (Group Tab): Serves as the central instance for administrative rules and global logging (Default Deny).
- Layer 2: VPN (Member Tab/`tun_wg0`): A manually assigned interface dedicated to the data-plane traffic for the AI-Stack. This ensures that NAT fixes and MSS clamping remain persistent across system reboots.
  - IPv4 Config: Set to Static IPv4.
  - IPv4 Address: `10.0.50.1/24` (Acts as the local gateway for the VPN subnet).
  - MSS Clamping: Set to `1240` to prevent packet fragmentation across diverse mobile networks.
- MTU Tuning: Configured `MTU = 1280` in the Laptop client configuration so that the encapsulated packet (payload plus WireGuard headers) stays below the standard 1500-byte Ethernet MTU.
- DNS Gateway Logic: Clients are configured to use `DNS = 10.0.50.1`. This ensures all queries are processed by the local Unbound instance over the encrypted tunnel.
- DNS Rebinding Protection: Hardened the system against rebinding attacks while allowing internal resolution by adding the `private-domain: "ai.local"` directive to the Unbound Custom Options.
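
The relationship between the two numbers can be worked out with simple arithmetic. The sketch below assumes IPv4 inside the tunnel with option-free TCP headers, and uses the commonly cited WireGuard encapsulation overhead of 60 bytes over IPv4 transport and 80 bytes over IPv6; treat it as one way to read the values, not as a statement of how pfSense applies its MSS field internally.

```python
IPV4_HEADER = 20
TCP_HEADER = 20
WG_OVERHEAD_IPV4 = 20 + 8 + 32   # outer IPv4 + UDP + WireGuard framing
WG_OVERHEAD_IPV6 = 40 + 8 + 32   # outer IPv6 + UDP + WireGuard framing


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU (IPv4, no options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


def outer_packet_size(tunnel_mtu: int, ipv6_transport: bool = False) -> int:
    """Size of the encrypted packet on the wire for a full-size tunnel packet."""
    overhead = WG_OVERHEAD_IPV6 if ipv6_transport else WG_OVERHEAD_IPV4
    return tunnel_mtu + overhead


print(tcp_mss(1280))                   # prints 1240
print(outer_packet_size(1280, True))   # prints 1360
```

Even in the IPv6 worst case, a 1280-byte tunnel packet leaves 140 bytes of headroom below 1500, which helps on mobile carriers that add their own encapsulation.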
The rule logic uses Static Aliases to resolve pfSense macro errors and ensures strict identity-based access. Filtering is primarily handled on the WireGuard (Group) tab, while the VPN (Member) tab handles technical stability.
`Trusted_VPN_Clients`: Contains static VPN IPs for the Laptop (`10.0.50.2`) and Mobile device (`10.0.50.3`).
| Tab | Priority | Action | Source | Destination | Port | Purpose |
|---|---|---|---|---|---|---|
| WG | 1 | PASS | `10.0.50.2` | `10.0.20.1` | `8443` | Restricted pfSense Admin GUI |
| WG | 2 | PASS | `10.0.50.2` | `192.168.30.20` | `22` | Secure CLI Management (SSH) |
| WG | 3 | PASS | `Trusted_Clients` | `192.168.30.20` | `AI_Ports` | WebUIs (n8n, OpenWebUI) |
| VPN | 4 | PASS | `Trusted_Clients` | `10.0.50.1` | `53` (UDP) | Internal DNS (Unbound) |
| VPN | 5 | PASS | `Trusted_Clients` | `192.168.30.0/24` | `*` | Sloppy Access (Stability Fix) |
| WG | 6 | BLOCK/LOG | Any | Any | Any | Default Deny (SOC Data Source) |
To resolve connection drops in the virtualized environment, the Allow_Trusted_VPN_Sloppy_Access rule was implemented on the VPN Interface tab:
- Sloppy State & Any Flags: Allows TCP data flow even during incomplete handshakes or packet reordering within the tunnel.
- NAT Persistence: Verified Hybrid Outbound NAT rule (Interface: `AIOPS`, Source: `10.0.50.0/24`, Destination: `192.168.30.0/24`, Option: `NO NAT`) to preserve original source IPs for auditing.
The Unbound DNS Resolver provides local resolution for the AI-Stack ecosystem via Host Overrides:
- `ai-brain.ai.local` -> `192.168.30.20`
- `n8n.ai.local` -> `192.168.30.20`
- `grafana.ai.local` -> `192.168.30.20`
Rule 6 (Block & Log) acts as a sensor for the AIOps monitoring stack:
- SOC Trigger: Blocked packets generate log entries forwarded to the `syslog-ng` container.
- Alerting: The `Suricata_Analyst` n8n workflow monitors logs and triggers immediate Telegram notifications for unauthorized access attempts.
Status [2026-03-06]: The WireGuard environment is verified as Stable, Hardened, and Production Ready. Internal DNS resolution is operational without compromising Rebinding Protection.
To ensure high availability and proactive capacity management of the AI-Stack, a centralized monitoring and alerting system was implemented using Grafana, Prometheus, and Telegram.
A dedicated Telegram Bot was integrated as the primary contact point for critical infrastructure alerts.
- Integration: Grafana Alerting → Contact Points
- Bot Name: `AI_Stack_Alert_Bot`
- Configuration:
  - Integration Type: Telegram
  - Chat ID: `1930764418`
- Status: Verified via successful test notification.
The following alerts were configured within the Homelab folder to monitor the health of the ai-ops-01 node.
During implementation, it was discovered that node-exporter running in Docker mapped the root filesystem differently. The query was adjusted to target the specific physical device.
- Metric Analysis: `node_filesystem_avail_bytes` verified via `curl http://localhost:9100/metrics`.
- Final Query:
  ```promql
  100 - (node_filesystem_avail_bytes{device="/dev/sda2"} / node_filesystem_size_bytes{device="/dev/sda2"} * 100)
  ```
- Threshold: `IS ABOVE 80%`
- Current State: Verified at 77.19% (Warning level).
- Group: `disk-alerts`
Monitors the percentage of used memory to prevent OOM (Out of Memory) kills of AI models (Ollama).
- Query:
  ```promql
  (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
  ```
- Threshold: `IS ABOVE 90%`
- Group: `ram-alerts`
Monitors the reachability of the exporters and container status.
- Query:
  ```promql
  up{job="node"} == 0
  ```
- Threshold: `IS BELOW 1`
- Logic: If the result table is empty, all systems are UP. An alert is only triggered if a target returns `0` (Down).
- Group: `container-alerts`
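
The three alert conditions reduce to simple arithmetic on the raw metrics. The Python sketch below reproduces the same formulas and thresholds so the logic can be checked by hand; it does not talk to Prometheus or Grafana, and the function names are hypothetical.

```python
def disk_used_percent(avail_bytes: float, size_bytes: float) -> float:
    """Same formula as the disk query: 100 - (avail / size * 100)."""
    return 100 - (avail_bytes / size_bytes * 100)


def ram_used_percent(mem_available: float, mem_total: float) -> float:
    """Same formula as the RAM query: (1 - MemAvailable / MemTotal) * 100."""
    return (1 - mem_available / mem_total) * 100


def firing_alerts(disk_pct: float, ram_pct: float, up: int) -> list[str]:
    """Return the alert groups whose threshold is violated."""
    alerts = []
    if disk_pct > 80:
        alerts.append("disk-alerts")
    if ram_pct > 90:
        alerts.append("ram-alerts")
    if up == 0:            # the exporter target is down
        alerts.append("container-alerts")
    return alerts


print(firing_alerts(77.19, 50.0, 1))  # prints []
```

At the documented 77.19% disk usage nothing fires yet, but roughly three more percentage points would trip the disk alert.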
If metrics return No Data, the following diagnostic steps were established:
- Check Prometheus Exporter: Verify raw data via `curl http://localhost:9100/metrics`.
- Verify Device Mapping: Use `Explore` in Grafana to find correct labels (e.g., `device="/dev/sda2"` instead of `mountpoint="/"`).
- Prometheus Direct Query: Confirm metrics are indexed at `http://192.168.30.20:9090`.
| Component | Function | Status |
|---|---|---|
| node-exporter | Collects hardware metrics | Active ✅ |
| Prometheus | Time-series database | Active ✅ |
| Grafana | Visualization & Alert Logic | Active ✅ |
| Telegram Bot | Instant Alert Delivery | Active ✅ |
Status [2026-03-09]: Monitoring is fully operational. The system currently monitors Disk, RAM, and Container availability. Alerts are silenced during normal operation (Empty Table logic for "Container Down") and will fire immediately upon threshold violation.
A dedicated Wazuh SIEM VM was deployed to centralize security event management.
| Component | Details |
|---|---|
| VM | wazuh-siem (VM 107) |
| OS | Ubuntu Server 24.04 |
| IP | 192.168.30.30 |
| RAM | 8GB |
| Disk | 50GB |
| Agent | Host | Status |
|---|---|---|
| 001 | ai-ops-01 | Active ✅ |
| 002 | Mint-Management | Active ✅ |
- pfSense logs forwarded via syslog-ng → Wazuh Agent on ai-ops-01
- Suricata IDS alerts visible in Wazuh Dashboard
- 82+ security events indexed
```
Network Traffic
      ↓
Suricata IDS (pfSense)
      ↓
syslog-ng (ai-ops-01)
      ↓
Wazuh Agent → Wazuh Manager
      ↓
Wazuh Dashboard (https://192.168.30.30)
      ↓
Telegram Alerts (n8n)
```
A unified AI gateway providing automatic model fallback.
```
Request → LiteLLM (:4000)
    ├── Gemini 2.0 Flash (Cloud)
    ├── qwen2.5:14b (Local)
    └── llama3 (Fallback)
```
```bash
ai-local   # Start Aider with local Ollama
ai         # Start Aider with Gemini
```
Features:
- Direct file editing in terminal
- Git integration
- Auto-fallback between models
- MCP context awareness
| VM | Name | IP | RAM | Status |
|---|---|---|---|---|
| 101 | pfSense-CE | 192.168.1.136 | - | Running ✅ |
| 102 | Mint-Management | 10.0.10.52 | 4GB | Running ✅ |
| 103 | Webserver-01 | - | - | Running ✅ |
| 104 | DC-01 | - | - | Stopped ⏸ |
| 105 | CL-01-WIN11 | - | - | Stopped ⏸ |
| 106 | ai-ops-01 | 192.168.30.20 | 32GB | Running ✅ |
| 107 | wazuh-siem | 192.168.30.30 | 8GB | Running ✅ |
- Proxmox + pfSense + VLANs
- AI Stack (Ollama + Open WebUI)
- n8n Automation + Telegram Bot
- Grafana + Prometheus Monitoring
- WireGuard VPN
- Wazuh SIEM
- LiteLLM + Aider Terminal Agent
- Active Directory (DC-01)
- Kali Linux Attacker VM
- CI/CD Pipeline
- Terraform-Proxmox












