NetMon is a production-ready, high-performance network monitoring solution for Kubernetes that leverages eBPF (Extended Berkeley Packet Filter) technology to provide deep visibility into network connections with minimal overhead. It captures comprehensive network flow data including successful connections, failed attempts, protocol details, and precise latency measurements.
- Features
- Architecture
- Strengths
- Limitations
- Requirements
- Installation
- Building
- Running
- Testing
- Debugging
- Benchmarking
- Configuration
- Metrics
- Troubleshooting
- Contributing
- Kernel-Level Packet Inspection: Direct packet capture using eBPF TC (Traffic Control) programs
- Zero-Copy Performance: Minimal CPU overhead (<2% baseline) with efficient kernel-to-userspace communication
- Comprehensive Flow Tracking: Captures all TCP/UDP/ICMP flows including failed connections
- Kubernetes-Aware: Automatic enrichment with pod, service, namespace, and node metadata
- Multi-Architecture Support: Supports both x86_64 and arm64 architectures
- Production Ready: Battle-tested design with graceful degradation and comprehensive error handling
- Prometheus Integration: Native metrics export for easy integration with existing monitoring stacks
- Real-Time Latency Tracking: Precise RTT measurements and connection timing
- Protocol Detection: Application-layer protocol identification (HTTP, gRPC, TLS)
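To make the feature set concrete, here is one possible shape for the enriched flow record these features describe, combining the kernel-captured tuple with Kubernetes metadata. This is an illustrative sketch; the field names are assumptions, not NetMon's actual API.

```go
// Illustrative only: a possible enriched flow record. Field names are
// assumptions for the example, not NetMon's actual types.
package flow

import "time"

type FlowRecord struct {
	// Captured in kernel space by the eBPF TC programs.
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
	Protocol         string        // "tcp", "udp", "icmp"
	Direction        string        // "ingress" or "egress"
	State            string        // "established", "failed", "reset", "timeout"
	Bytes, Packets   uint64
	RTT              time.Duration // microsecond-precision round-trip time

	// Added in user space by the Kubernetes enricher.
	SrcPod, DstPod             string
	SrcNamespace, DstNamespace string
	DstService, Node           string

	// Application-layer protocol, if detected ("http", "grpc", "tls", ...).
	AppProtocol string
}
```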
┌────────────────────────────────────────────────────────────┐
│                      Kubernetes Node                       │
│                                                            │
│  ┌───────────────┐   ┌───────────────┐   ┌──────────────┐  │
│  │     Pod A     │   │     Pod B     │   │    Pod C     │  │
│  └───────────────┘   └───────────────┘   └──────────────┘  │
│          │                   │                  │          │
│          └───────────────────┴──────────────────┘          │
│                              │                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                     Kernel Space                     │  │
│  │  ┌────────────────────────────────────────────────┐  │  │
│  │  │                eBPF TC Programs                │  │  │
│  │  │  ┌────────────┐  ┌────────────┐  ┌──────────┐  │  │  │
│  │  │  │ TC Ingress │  │ TC Egress  │  │ Flow Map │  │  │  │
│  │  │  │ Classifier │  │ Classifier │  │ Tracking │  │  │  │
│  │  │  └────────────┘  └────────────┘  └──────────┘  │  │  │
│  │  └────────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────────┘  │
│                              │                             │
│                      Perf Event Buffer                     │
│                              │                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                      User Space                      │  │
│  │  ┌────────────────────────────────────────────────┐  │  │
│  │  │            NetMon Agent (DaemonSet)            │  │  │
│  │  │  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │  │  │
│  │  │  │   Flow   │  │   K8s    │  │  Prometheus  │  │  │  │
│  │  │  │ Tracker  │  │ Enricher │  │   Exporter   │  │  │  │
│  │  │  └──────────┘  └──────────┘  └──────────────┘  │  │  │
│  │  └────────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
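The link between the two halves of the diagram is the perf event buffer. As a minimal sketch of the user-space side (not NetMon's source), an agent built on github.com/cilium/ebpf could consume flow events like this; the pinned map path and the event layout are assumptions made for the example.

```go
// Sketch of a perf-buffer consumer using github.com/cilium/ebpf.
// The pinned map path and rawFlowEvent layout are hypothetical.
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"log"
	"os"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/perf"
)

// rawFlowEvent mirrors a hypothetical struct emitted by the TC classifiers.
type rawFlowEvent struct {
	SrcIP, DstIP     uint32
	SrcPort, DstPort uint16
	Protocol         uint8
	_                [3]byte // padding to match the kernel-side layout
	Bytes            uint64
}

func main() {
	// The TC programs write into a BPF_MAP_TYPE_PERF_EVENT_ARRAY, assumed
	// here to be pinned by the loader under /sys/fs/bpf.
	events, err := ebpf.LoadPinnedMap("/sys/fs/bpf/netmon/flow_events", nil)
	if err != nil {
		log.Fatalf("opening pinned map: %v", err)
	}
	defer events.Close()

	rd, err := perf.NewReader(events, 8*os.Getpagesize())
	if err != nil {
		log.Fatalf("creating perf reader: %v", err)
	}
	defer rd.Close()

	for {
		rec, err := rd.Read()
		if errors.Is(err, perf.ErrClosed) {
			return
		}
		if err != nil {
			log.Printf("perf read: %v", err)
			continue
		}
		if rec.LostSamples > 0 {
			log.Printf("perf buffer overrun: %d samples dropped", rec.LostSamples)
			continue
		}
		var ev rawFlowEvent
		if err := binary.Read(bytes.NewReader(rec.RawSample), binary.LittleEndian, &ev); err != nil {
			log.Printf("decoding event: %v", err)
			continue
		}
		// Hand the decoded event to the flow tracker / K8s enricher / exporter.
		_ = ev
	}
}
```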
- Ultra-Low Overhead: <2% CPU usage under normal conditions, compared to 5-10% for traditional packet capture
- Efficient Memory Usage: 50-200MB per node with intelligent flow aggregation
- Scalable: Handles millions of concurrent connections per node
- Zero-Copy Architecture: Direct kernel-to-userspace data transfer via perf buffers
- Complete Flow Coverage: Captures all connections including failed attempts, timeouts, and resets
- Bidirectional Tracking: Independent tracking of ingress and egress traffic
- Protocol Awareness: Deep packet inspection for application protocol detection
- Latency Metrics: Microsecond-precision RTT and connection timing
- Native Kubernetes Integration: Seamless deployment as DaemonSet with automatic RBAC
- Multi-Architecture: Single binary supports both x86_64 and arm64
- Production Hardened: Graceful degradation, comprehensive error handling, resource limits
- Observable: Self-monitoring metrics, detailed logging, health endpoints
- Simple API: Clean Go interfaces with comprehensive documentation
- Extensive Testing: Unit, integration, and benchmark test suites
- Easy Debugging: Built-in debug mode, trace logging, diagnostic tools
- Flexible Configuration: YAML-based config with hot-reload support
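The hot-reload behavior mentioned in the last point is typically implemented by watching the config file for writes. The sketch below shows that general pattern with fsnotify and yaml.v3; the config path and struct fields are assumptions, not NetMon's actual loader.

```go
// General pattern for YAML config hot-reload; the path and Config fields
// are illustrative assumptions, not NetMon's configuration schema.
package main

import (
	"log"
	"os"

	"github.com/fsnotify/fsnotify"
	"gopkg.in/yaml.v3"
)

type Config struct {
	Capture struct {
		Interfaces   []string `yaml:"interfaces"`
		SamplingRate float64  `yaml:"sampling_rate"`
	} `yaml:"capture"`
}

func load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	const path = "/etc/netmon/config.yaml" // assumed mount point

	cfg, err := load(path)
	if err != nil {
		log.Fatalf("initial config load: %v", err)
	}
	log.Printf("loaded config: %+v", cfg)

	w, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatalf("watcher: %v", err)
	}
	defer w.Close()
	if err := w.Add(path); err != nil {
		log.Fatalf("watch %s: %v", path, err)
	}

	for ev := range w.Events {
		// ConfigMap updates show up as writes (or a create after an atomic rename).
		if ev.Op&(fsnotify.Write|fsnotify.Create) == 0 {
			continue
		}
		if next, err := load(path); err == nil {
			cfg = next
			log.Printf("config reloaded: %+v", cfg)
		} else {
			log.Printf("reload failed, keeping previous config: %v", err)
		}
	}
}
```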
- Minimum Kernel: 4.15+ required, 5.4+ recommended for full features
- BTF Support: Required for CO-RE (Compile Once, Run Everywhere) - kernel 5.2+
- eBPF Features: Some advanced features require newer kernels
- Privileged Access: Requires privileged containers or specific capabilities (NET_ADMIN, SYS_ADMIN)
- Platform Support: Linux-only (no Windows/macOS support for production)
- Resource Usage: Memory scales with connection count (~300 bytes per flow; for example, 500,000 concurrent flows consume roughly 150 MB of flow state)
- No Packet Payload: Does not capture or inspect packet contents (privacy by design)
- Sampling at Scale: May require sampling at >100k connections/sec per node
- NAT Complexity: Limited visibility through complex NAT configurations
- Encrypted Traffic: Cannot decode encrypted payloads (TLS/HTTPS)
- Linux kernel 4.15+ (5.4+ recommended)
- Kubernetes 1.19+
- 2 CPU cores, 512MB RAM minimum per node
- Privileged container permissions or specific capabilities
- Go 1.21+
- Clang/LLVM 12+ (for eBPF compilation)
- Docker or Podman (for container builds)
- Make (for build automation)
- Zig compiler (for simplified cross-compilation)
- KIND (for local testing)
- Prometheus (for metrics collection)
# Clone the repository
git clone https://github.com/netmon/netmon.git
cd netmon
# Deploy to Kubernetes
kubectl apply -f deployments/kubernetes/namespace.yaml
kubectl apply -f deployments/kubernetes/rbac.yaml
kubectl apply -f deployments/kubernetes/configmap.yaml
kubectl apply -f deployments/kubernetes/daemonset.yaml
# Check status
kubectl -n netmon-system get pods
kubectl -n netmon-system logs -l app=netmon-agent

helm repo add netmon https://netmon.github.io/charts
helm install netmon netmon/netmon --namespace netmon-system

# Install dependencies
make dev-setup
# Build eBPF programs and Go binary
make build
# Build for specific architecture
make build-x86
make build-arm64
# Build with Zig (recommended for cross-compilation)
make build-zig ARCH=arm64

# Build container image
make docker-build
# Build multi-arch image
docker buildx build --platform linux/amd64,linux/arm64 -t netmon:latest .
# Build for KIND
make build-kind

# Debug build with symbols
make build-debug
# Static binary for minimal containers
make build-static
# Custom version
make build VERSION=1.2.3

# Run locally (requires root)
sudo ./build/netmon --debug --log-level=trace
# Run with custom config
sudo ./build/netmon --config=configs/dev-config.yaml
# Run with specific interface
sudo ./build/netmon --interface=eth0

# Deploy to Kubernetes
kubectl apply -f deployments/kubernetes/
# Run with resource limits
kubectl set resources daemonset/netmon-agent \
--limits=cpu=1,memory=512Mi \
--requests=cpu=100m,memory=128Mi
# Enable specific features
kubectl set env daemonset/netmon-agent \
NETMON_FEATURES_PROTOCOL_DETECTION=true \
NETMON_FEATURES_LATENCY_TRACKING=true

# Run with Docker
docker run --privileged --network=host \
-v /sys/fs/bpf:/sys/fs/bpf \
-v /sys/kernel/debug:/sys/kernel/debug \
netmon:latest
# Run with Podman
podman run --privileged --network=host \
--mount type=bind,source=/sys/fs/bpf,target=/sys/fs/bpf \
netmon:latest

# Run all unit tests
make test
# Run with coverage
make test-coverage
# Run specific package
go test -v ./pkg/ebpf/...
# Run with race detector
go test -race ./...

# Run integration tests (requires root)
sudo make test-integration
# Run with KIND cluster
make test-kind
# Run specific integration test
sudo go test -tags=integration -run TestFlowTracking ./tests/integration/

# Setup test environment
make setup-test-env
# Run E2E tests
make test-e2e
# Run load tests
make test-load CONNECTIONS=10000 DURATION=60s

# Generate coverage report
make coverage
# View coverage in browser
make coverage-html
open coverage.html

# Enable debug logging
export NETMON_DEBUG=true
export NETMON_LOG_LEVEL=trace
# Run with debug server
./netmon --debug-server=:6060
# Access debug endpoints
curl localhost:6060/debug/pprof/
curl localhost:6060/debug/flows
curl localhost:6060/debug/config

# List loaded BPF programs
sudo bpftool prog list | grep netmon
# Show BPF map contents
sudo bpftool map dump id <map_id>
# Trace BPF program execution
sudo bpftool prog tracelog
# Check BPF verifier logs
sudo cat /sys/kernel/debug/tracing/trace_pipe

# Check eBPF program status
kubectl exec -n netmon-system daemonset/netmon-agent -- netmon-cli status
# Dump flow table
kubectl exec -n netmon-system daemonset/netmon-agent -- netmon-cli flows dump
# Enable trace logging
kubectl exec -n netmon-system daemonset/netmon-agent -- netmon-cli debug trace on
# Collect diagnostic bundle
kubectl exec -n netmon-system daemonset/netmon-agent -- netmon-cli support bundle

- BPF Program Load Failures
# Check kernel support
grep CONFIG_BPF /boot/config-$(uname -r)
# Verify permissions
capsh --print | grep cap_sys_admin

- High Memory Usage
# Check flow count
kubectl exec -n netmon-system daemonset/netmon-agent -- netmon-cli stats flows
# Enable flow aging
kubectl exec -n netmon-system daemonset/netmon-agent -- netmon-cli config set flow.max_age=300s

- Missing Flows
# Check interface attachment
kubectl exec -n netmon-system daemonset/netmon-agent -- tc filter show dev <interface>
# Verify packet flow
kubectl exec -n netmon-system daemonset/netmon-agent -- netmon-cli debug packet-trace on

# Run standard benchmarks
make benchmark
# Run specific benchmark
go test -bench=BenchmarkFlowProcessing -benchtime=10s ./pkg/ebpf/
# Run with memory profiling
go test -bench=. -benchmem -memprofile=mem.prof ./pkg/ebpf/

# Generate test load
./scripts/loadtest.sh --connections=10000 --duration=300s
# Monitor performance during load
./scripts/monitor-performance.sh
# Generate performance report
./scripts/performance-report.sh > perf-report.md

# Deploy test workload
kubectl apply -f test/workloads/realistic-traffic.yaml
# Run comprehensive benchmark
./scripts/comprehensive-benchmark.sh
# Results location
cat results/benchmark-$(date +%Y%m%d).json

Key metrics to monitor:
- CPU Usage: Should stay <2% at 1000 flows/sec
- Memory Usage: ~300 bytes per active flow
- Latency: Event processing <100μs p99
- Drop Rate: Should be 0% under normal load
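As a rough illustration of how a micro-benchmark against targets like these can be written (not one of NetMon's actual benchmarks), the following testing.B benchmark measures per-event decode cost; the event layout and names are hypothetical.

```go
// Hypothetical micro-benchmark for per-event decode cost; the event layout
// and names are illustrative, not NetMon's actual benchmark suite.
package flow_test

import (
	"bytes"
	"encoding/binary"
	"testing"
)

type rawFlowEvent struct {
	SrcIP, DstIP     uint32
	SrcPort, DstPort uint16
	Protocol         uint8
	_                [3]byte
	Bytes            uint64
}

func BenchmarkFlowEventDecode(b *testing.B) {
	// Build one encoded sample up front so the loop measures only decoding.
	var buf bytes.Buffer
	sample := rawFlowEvent{SrcIP: 0x0a000001, DstIP: 0x0a000002, SrcPort: 443, DstPort: 51234, Protocol: 6, Bytes: 1500}
	if err := binary.Write(&buf, binary.LittleEndian, &sample); err != nil {
		b.Fatal(err)
	}
	raw := buf.Bytes()

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var ev rawFlowEvent
		if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev); err != nil {
			b.Fatal(err)
		}
	}
}
```

A benchmark like this would be run with the same `go test -bench ... -benchmem` commands shown above.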
# configs/netmon.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: netmon-config
data:
  config.yaml: |
    # Capture settings
    capture:
      interfaces: ["eth0", "docker0"]
      sampling_rate: 1.0

    # Flow settings
    flow:
      max_flows: 1000000
      flow_timeout: 300s

    # Export settings
    export:
      prometheus:
        enabled: true
        port: 9090

    # Performance tuning
    performance:
      workers: 4
      buffer_size: 10000

# Feature flags
features:
  protocol_detection:
    enabled: true
    protocols: ["http", "grpc", "mysql", "redis"]
  latency_tracking:
    enabled: true
    histogram_buckets: [0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000]
  security_monitoring:
    enabled: true
    detect_port_scans: true
    detect_suspicious_patterns: true

# Resource limits
resources:
  memory_limit: "1Gi"
  cpu_limit: "2"
  maps:
    flow_map_size: 1000000
    event_buffer_size: 100000

# Flow metrics
netmon_flows_active{protocol="tcp",direction="egress"} 1523
netmon_flows_total{protocol="udp",direction="ingress"} 45231
netmon_bytes_total{src_ip="10.0.0.1",dst_ip="10.0.0.2"} 5234521
netmon_packets_total{protocol="tcp"} 1234567
# Performance metrics
netmon_ebpf_events_processed_total 5234521
netmon_ebpf_events_dropped_total 0
netmon_processing_duration_seconds{quantile="0.99"} 0.000095
# Resource metrics
netmon_memory_usage_bytes 52345678
netmon_cpu_usage_percentage 1.5
netmon_goroutines_count 42
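Metric families like these can be exported with the standard prometheus/client_golang pattern. The snippet below is a minimal sketch of how a gauge such as netmon_flows_active might be registered and served, not the agent's actual exporter code.

```go
// Minimal sketch of exporting a netmon_flows_active-style gauge with
// prometheus/client_golang; not NetMon's actual exporter implementation.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var flowsActive = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "netmon_flows_active",
		Help: "Number of currently tracked flows.",
	},
	[]string{"protocol", "direction"},
)

func main() {
	prometheus.MustRegister(flowsActive)

	// The flow tracker would update this as flows are created and expired.
	flowsActive.WithLabelValues("tcp", "egress").Set(1523)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil)) // matches the prometheus port in the config above
}
```

With a sketch like this, `curl localhost:9090/metrics` would return the netmon_* families shown above.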
Import the provided Grafana dashboard:
# Import dashboard
kubectl create configmap netmon-dashboard \
--from-file=dashboards/grafana/netmon-dashboard.json
# Dashboard ID: 12345 (when published to Grafana.com)

# Top talkers by bytes
topk(10, sum by (src_ip, dst_ip) (
rate(netmon_bytes_total[5m])
))
# Connection failure rate
sum(rate(netmon_flows_total{status="failed"}[5m])) /
sum(rate(netmon_flows_total[5m]))
# P95 latency by service
histogram_quantile(0.95,
sum by (dst_service, le) (
rate(netmon_latency_seconds_bucket[5m])
)
)
# Built-in diagnostics
netmon-cli doctor
# System compatibility check
netmon-cli check-system
# Generate support bundle
netmon-cli support-bundle --output=/tmp/netmon-support.tar.gz

# Reduce overhead for high-traffic environments
export NETMON_SAMPLING_RATE=0.1
export NETMON_FLOW_AGGREGATION=true
export NETMON_BATCH_SIZE=1000
# Optimize for latency measurement
export NETMON_LATENCY_PRECISION=high
export NETMON_TIMESTAMP_SOURCE=hardware

- Reduce Memory Usage
  - Enable sampling: sampling_rate: 0.1
  - Reduce flow timeout: flow_timeout: 60s
  - Limit tracked protocols
- Improve Accuracy
  - Increase buffer sizes
  - Add more worker threads
  - Enable kernel timestamps
- Debug Connection Issues
  - Check tc filter attachment
  - Verify eBPF program loading
  - Monitor drop counters
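One common way to apply a sampling_rate like the one in the tuning tips above is hash-based flow sampling: hash the canonical 5-tuple and keep the flow when the hash falls below the rate, so both directions of a flow are kept or dropped together. The sketch below illustrates that technique; it is not NetMon's sampler.

```go
// Illustrative hash-based flow sampler: a flow is kept when the hash of its
// canonical 5-tuple falls below sampling_rate; both directions hash the same.
package main

import (
	"fmt"
	"hash/fnv"
)

type flowKey struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
	Protocol         uint8
}

// canonical orders the endpoints so A->B and B->A map to the same key.
func canonical(k flowKey) flowKey {
	if k.SrcIP > k.DstIP || (k.SrcIP == k.DstIP && k.SrcPort > k.DstPort) {
		k.SrcIP, k.DstIP = k.DstIP, k.SrcIP
		k.SrcPort, k.DstPort = k.DstPort, k.SrcPort
	}
	return k
}

// sampled reports whether the flow should be kept at the given rate (0..1).
func sampled(k flowKey, rate float64) bool {
	c := canonical(k)
	h := fnv.New32a()
	fmt.Fprintf(h, "%s:%d-%s:%d-%d", c.SrcIP, c.SrcPort, c.DstIP, c.DstPort, c.Protocol)
	return float64(h.Sum32())/float64(^uint32(0)) < rate
}

func main() {
	k := flowKey{SrcIP: "10.0.0.1", DstIP: "10.0.0.2", SrcPort: 51234, DstPort: 443, Protocol: 6}
	fmt.Println(sampled(k, 0.1)) // deterministic per flow: ~10% of flows are kept
}
```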
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Fork and clone
git clone https://github.com/YOUR-USERNAME/netmon.git
cd netmon
# Install development tools
make dev-setup
# Create feature branch
git checkout -b feature/your-feature
# Run tests before submitting
make test
make lint

# Run full test suite
make test-all
# Test in KIND
make test-kind-integration
# Benchmark your changes
make benchmark-compare BASE=main

This project is licensed under the Apache License 2.0 - see LICENSE for details.
- The Cilium project for eBPF libraries and inspiration
- The Kubernetes community for excellent client libraries
- The Linux kernel community for eBPF development
For more information, visit our documentation site or join our community Slack.