🚧 WORK IN PROGRESS: BREAKING CHANGES BY DESIGN
Current Status: Persistent storage is now MANDATORY. All fallback to /tmp has been completely removed. Containers will NOT START without proper persistent volumes. This is an intentional breaking change.

Development Philosophy: Breaking changes are introduced frequently without migration paths by design. This approach prevents technical debt accumulation and enables rapid innovation toward the optimal Oracle Cloud VM protection solution.
Latest Breaking Change: Mandatory Persistent Storage (Current Version)
As of the current version, persistent storage is mandatory. The container will fail to start without proper volume configuration:
```yaml
# Docker Compose
volumes:
  - /var/lib/loadshaper:/var/lib/loadshaper
```

```yaml
# Kubernetes
volumeMounts:
  - name: loadshaper-data
    mountPath: /var/lib/loadshaper
```

No longer supported:

- Running without persistent volumes
- Fallback to /tmp storage (completely removed)
- Containers starting without writable /var/lib/loadshaper

LoadShaper requires proper volume permissions BEFORE starting - no automatic fixes are provided for security reasons. Otherwise startup fails with:

Error: "Cannot write to /var/lib/loadshaper - check volume permissions"
REQUIRED: Pre-deployment Volume Setup
For Docker named volumes (recommended):
```bash
# One-time setup: Create volume with correct permissions
docker run --rm -v loadshaper-metrics:/var/lib/loadshaper alpine:latest chown -R 1000:1000 /var/lib/loadshaper

# Then start LoadShaper
docker compose up -d
```

For bind mounts:

```bash
# Create and set permissions on host directory
sudo mkdir -p /var/lib/loadshaper
sudo chown -R 1000:1000 /var/lib/loadshaper
sudo chmod -R 755 /var/lib/loadshaper
```

```yaml
# Update compose.yaml to use bind mount
volumes:
  - /var/lib/loadshaper:/var/lib/loadshaper
```

For Kubernetes/OpenShift:
```yaml
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000  # Ensures volume has correct group ownership
```

Verification:

```bash
# Check container logs - should show success
docker logs loadshaper

# Verify volume ownership
docker run --rm -v loadshaper-metrics:/test alpine:latest ls -la /test
```

Why persistent storage is mandatory:

- Oracle Compliance: 7-day P95 CPU calculations require persistent metrics database
- Data Integrity: Prevents silent failures that could cause VM reclamation
- Performance: Eliminates temporary storage overhead and reliability issues
Next Breaking Changes: Additional Oracle compliance improvements planned. Always check CHANGELOG.md before updating.
LoadShaper follows strict rootless container principles for maximum security:
- Never runs as root - Container always executes as user loadshaper (UID/GID 1000)
- No privilege escalation - No automatic permission fixing or root operations
- User responsibility - Volume permissions must be configured correctly before deployment
- Security first - Prevents container breakout and follows container security best practices
Why Rootless?
- Eliminates container security vulnerabilities
- Follows least-privilege principle
- Compatible with security-conscious environments (Kubernetes, OpenShift)
- Prevents accidental host system modifications
Modern native network generator implementation - Uses Python sockets instead of external dependencies for maximum efficiency and control. Requires Linux 3.14+ (March 2014) with kernel MemAvailable support.
Intelligent baseline load generator that prevents Oracle Cloud Always Free compute instances from being reclaimed due to underutilization.
Oracle Cloud Always Free compute instances are automatically reclaimed if they remain underutilized for 7 consecutive days. An instance is considered idle when ALL of the following conditions are met over a 7-day window:
- The 95th percentile of CPU utilization is below 20%
- Network utilization is below 20% (simple threshold, not P95)
- Memory utilization is below 20% (A1.Flex shapes only, simple threshold, not P95)
Source: Oracle Cloud Always Free Resources - Idle Compute Instances
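The idle test above can be sketched as a simple predicate (a hypothetical illustration with assumed names, not Oracle's or LoadShaper's actual code):

```python
# Hypothetical sketch of Oracle's idle test: a VM is reclaimable only when
# EVERY applicable metric stays below 20% for the whole 7-day window.
def is_reclaimable(cpu_p95_pct, net_pct, mem_pct=None, is_a1_flex=False):
    cpu_idle = cpu_p95_pct < 20.0
    net_idle = net_pct < 20.0
    # The memory rule applies to A1.Flex shapes only.
    mem_idle = mem_pct < 20.0 if (is_a1_flex and mem_pct is not None) else True
    return cpu_idle and net_idle and mem_idle

# Keeping any single metric above 20% is enough to prevent reclamation:
print(is_reclaimable(25.0, 5.0))                          # False
print(is_reclaimable(15.0, 5.0, 10.0, is_a1_flex=True))   # True
```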
loadshaper prevents VM reclamation by intelligently maintaining resource utilization above Oracle's thresholds while remaining completely unobtrusive to legitimate workloads. It:
✅ Keeps at least one metric above 20% to prevent reclamation
✅ Runs at lowest OS priority (nice 19) with minimal system impact
✅ Automatically pauses when real workloads need resources
✅ Tracks CPU 95th percentile over 7-day rolling windows (matches Oracle's measurement)
✅ Works on both x86-64 and ARM64 Oracle Free Tier shapes
📋 Prerequisites:
- Docker and Docker Compose installed
- Persistent storage required - LoadShaper needs persistent volume for 7-day P95 metrics
- Single instance only - Run only one LoadShaper process per system to avoid conflicts
- External network peers required - For network protection, configure NET_PEERS with external servers you control
- Rootless container setup - LoadShaper follows security best practices (non-root user)
1. Clone and setup:

```bash
git clone https://github.com/senomorf/loadshaper.git
cd loadshaper
```

2. REQUIRED: Setup volume permissions (one-time):

```bash
# Create volume with correct permissions for rootless container
docker run --rm -v loadshaper-metrics:/var/lib/loadshaper alpine:latest chown -R 1000:1000 /var/lib/loadshaper
```

3. Deploy:

```bash
docker compose up -d --build
```

⚠️ Important: LoadShaper follows rootless container security principles. Volume permissions MUST be configured correctly before starting the container - no automatic fixes are provided.

4. Monitor activity:

```bash
docker logs -f loadshaper
```

5. See current metrics:

```bash
# Look for telemetry lines showing current, average, and 95th percentile values
docker logs loadshaper | grep "\[loadshaper\]" | tail -5
```

That's it! loadshaper will automatically detect your Oracle Cloud shape and start maintaining appropriate resource utilization.
For Kubernetes deployments, Helm charts are available in the helm/ directory:
```bash
# Install with default values
helm install loadshaper ./helm/loadshaper

# Or with custom configuration
helm install loadshaper ./helm/loadshaper -f custom-values.yaml
```

Key Kubernetes considerations:
- Persistent Volume required for 7-day P95 metrics storage
- Resource limits included - Default CPU/memory limits configured for Oracle Free Tier
- Security hardened - Read-only root filesystem and non-root user configured
- Single replica only - LoadShaper must not run multiple instances per node
- Multiple configurations - Production, security-hardened, and shape-specific value files included
- Node affinity recommended to ensure consistent VM assignment
For security-conscious environments, you can add additional hardening to your Docker Compose deployment:

```yaml
services:
  loadshaper:
    # ... existing configuration ...
    read_only: true        # Read-only root filesystem
    tmpfs:
      - /tmp               # Allow temporary files in memory
    cap_drop:
      - ALL                # Drop all Linux capabilities
    security_opt:
      - no-new-privileges:true  # Prevent privilege escalation
```

📖 More Information:
- Configuration Reference - Detailed environment variable options
- CONTRIBUTING.md - Development setup and contribution guidelines
- CHANGELOG.md - Version history and breaking changes
Oracle's Always Free Tier compute shapes have the following specifications and reclamation thresholds:
VM.Standard.E2.1.Micro:
- CPU: 1/8 OCPU (burstable)
- Memory: 1 GB RAM (no memory reclamation rule)
- Network: 480 Mbps internal, 50 Mbps external (internet)
- 20% threshold = ~10 Mbps external traffic required
⚠️ See VCN Routing Warning below
A1.Flex (ARM-based):
- CPU: Up to 4 OCPUs
- Memory: Up to 24 GB RAM (20% threshold applies)
- Network: 1 Gbps per vCPU
- 20% threshold = ~0.2 Gbps per vCPU required
Important: Network monitoring typically measures internet-bound traffic (external bandwidth), not internal VM-to-VM traffic. For E2 shapes, focus on the 50 Mbps external limit.
IPv6 Global Addresses Can Route Internally: In Oracle Cloud VCNs, IPv6 global addresses (2001::/3) may route internally between VMs in the same subnet/VCN rather than via the internet gateway. This creates a critical compliance risk:
- Problem: Traffic between VMs with IPv6 global addresses stays within the VCN
- Result: Oracle's monitoring sees no external network activity → VM reclamation
- Solution: Ensure NET_PEERS targets are external servers you control or have permission to use
Example External Peers (Replace with your own servers):

```bash
NET_PEERS=192.0.2.1,198.51.100.1  # TEST-NET addresses (example only - replace with real servers)
NET_PEERS=2001:db8::1             # Documentation IPv6 (example only - replace with real servers)
```

Avoid VCN-Internal Peers:

```bash
# DON'T USE - These may route internally:
NET_PEERS=10.0.0.5                # RFC1918 private (definitely internal)
NET_PEERS=2603:1234::5            # IPv6 global but same VCN (may be internal)
NET_PEERS=your-other-vm-hostname  # Another VM in same VCN
```

loadshaper drives CPU usage and can emit network traffic. It tracks CPU utilization over a 7-day rolling window and calculates the 95th percentile to match Oracle's exact reclamation criteria. For memory and network, it uses simple threshold monitoring (no P95), matching Oracle's actual measurement method. The telemetry output shows current values, averages, and CPU P95.
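Peer entries can be pre-checked against the "definitely internal" cases with Python's standard `ipaddress` module. This is only a sketch: as the VCN routing warning above notes, a globally-addressed IPv6 peer in the same VCN still passes this check and must be ruled out manually.

```python
import ipaddress

def looks_external(peer: str) -> bool:
    """Reject addresses that definitely route internally (RFC1918,
    loopback, link-local). A global IPv6 address in the same VCN still
    passes this check and must be ruled out manually."""
    try:
        addr = ipaddress.ip_address(peer)
    except ValueError:
        return True  # hostname: resolve it first, then re-check
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

print(looks_external("10.0.0.5"))   # False - RFC1918, stays inside the VCN
print(looks_external("8.8.8.8"))    # True  - globally routable
```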
┌─────────────────────────────────────────────────────────────────────┐
│ LoadShaper │
├─────────────────┬─────────────────┬─────────────────┬───────────────┤
│ Metrics │ P95 CPU │ Load │ Health │
│ Collector │ Controller │ Generators │ Server │
│ │ │ │ │
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐ │ ┌───────────┐ │
│ │ /proc/stat │ │ │ Ring Buffer │ │ │ CPU Workers │ │ │ /health │ │
│ │ /proc/mem │─┼─│ SQLite DB │─┼─│ Mem Alloc │ │ │ /metrics │ │
│ │ /proc/net │ │ │ State Mach │ │ │ Net Traffic │ │ │ :8080 │ │
│ │ /loadavg │ │ │ Slot Timing │ │ │ │ │ │ │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────┘ │ └───────────┘ │
└─────────────────┴─────────────────┴─────────────────┴───────────────┘
│ │ │ │
│ │ │ │
▼ ▼ ▼ ▼
5s samples 7-day P95 Oracle VM Docker/K8s
EMA averages calculations protection monitoring
loadshaper operates as a lightweight monitoring and control system with four main components:
- CPU utilization: Read from /proc/stat (system-wide percentage)
- Memory utilization: Read from /proc/meminfo using industry-standard calculation (see Memory Calculation)
- Network utilization: Read from /proc/net/dev with automatic speed detection
- Load average: Monitored from /proc/loadavg to detect CPU contention
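CPU utilization from /proc/stat is a delta between two snapshots rather than a single read. A simplified Linux-only sketch (not LoadShaper's actual implementation):

```python
# Sketch: system-wide CPU utilization from two /proc/stat snapshots.
import time

def cpu_times():
    with open("/proc/stat") as f:
        vals = list(map(int, f.readline().split()[1:]))  # aggregate "cpu" line
    idle = vals[3] + vals[4]                             # idle + iowait
    return idle, sum(vals)

def cpu_utilization_pct(interval: float = 0.5) -> float:
    idle1, total1 = cpu_times()
    time.sleep(interval)
    idle2, total2 = cpu_times()
    busy = (total2 - total1) - (idle2 - idle1)
    return 100.0 * busy / max(1, total2 - total1)

print(round(cpu_utilization_pct(), 1))
```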
- SQLite database: Stores samples every 5 seconds for rolling 7-day analysis in persistent storage (/var/lib/loadshaper/metrics.db)
- 95th percentile calculation: CPU only (mirrors Oracle's measurement method)
- Automatic cleanup: Removes data older than 7 days
- Persistent storage requirement: Database must be stored at /var/lib/loadshaper/metrics.db for 7-day history preservation
- CPU stress: Low-priority workers (nice 19) with arithmetic operations
- Memory occupation: Gradual allocation with periodic page touching for A1.Flex shapes
- Network traffic: Native Python network bursts to peer instances when needed
- Load balancing: Automatic pausing when real workloads need resources
- HTTP endpoints: /health and /metrics on port 8080 (configurable)
- Docker integration: Provides health checks for container orchestration
- Monitoring support: Real-time metrics for external monitoring systems
- Security: Binds to localhost by default, configurable for external access
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Collect │───▶│ Analyze │───▶│ Adjust │
│ Metrics │ │ CPU P95 │ │ Load Level │
│ Every 5s │ │ vs Thresholds │ │ Accordingly │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Store in │ │ Check Load │ │ Yield to │
│ SQLite DB │ │ Average for │ │ Real Work │
│ (7 days) │ │ Contention │ │ When Needed │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Design Priority: Minimal Impact on System Responsiveness
CPU stress runs at the absolute lowest OS priority (nice 19) and is designed to have minimal impact on system responsiveness for other processes. Key characteristics:
- Lowest priority: Both controller and CPU workers run at nice 19, immediately yielding to any real workloads
- Transient bursts: Short, jittered activity periods with frequent sleep intervals (5ms yielding slices)
- Baseline operation: Designed to be lightweight background activity, not sustained high-intensity load
- Immediate yielding: Automatically pauses when system load average indicates CPU contention from legitimate processes
Workload Selection Criteria: When choosing between stress methods that produce similar CPU utilization metrics, always prioritize the approach with the least impact on system responsiveness and latency for other processes. The current implementation uses simple arithmetic operations that minimize context switching overhead and avoid cache pollution.
loadshaper implements smart network fallback to provide additional protection when CPU-based protection alone is insufficient. This feature generates network traffic only when Oracle's reclamation thresholds are at risk.
Activation Logic:
- E2 shapes: Activates when CPU P95 AND network both approach Oracle's 20% threshold
- A1 shapes: Activates when CPU P95, network, AND memory all approach Oracle's 20% threshold
Oracle Compliance:
- Uses simple threshold monitoring for network (not P95) matching Oracle's measurement method
- Generates traffic only when needed as a fallback to CPU-based protection
- Smart debouncing prevents oscillation and reduces system impact
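A hedged sketch of this activation decision (names are assumptions; thresholds use the documented defaults NET_FALLBACK_START_PCT=19 and NET_FALLBACK_RISK_THRESHOLD_PCT=22):

```python
# Hypothetical sketch of adaptive fallback activation, not actual code.
RISK_PCT = 22.0   # NET_FALLBACK_RISK_THRESHOLD_PCT
START_PCT = 19.0  # NET_FALLBACK_START_PCT

def fallback_should_start(cpu_p95, net_pct, mem_pct=None, is_a1_flex=False):
    cpu_at_risk = cpu_p95 < RISK_PCT
    net_low = net_pct < START_PCT
    if is_a1_flex:
        # A1: CPU P95 AND network AND memory must all approach the threshold
        return cpu_at_risk and net_low and (mem_pct is not None and mem_pct < RISK_PCT)
    # E2: CPU P95 AND network
    return cpu_at_risk and net_low

print(fallback_should_start(21.0, 10.0))   # True:  CPU P95 and network both at risk
print(fallback_should_start(26.0, 10.0))   # False: CPU P95 alone keeps the VM safe
```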
| Variable | Default | Description |
|---|---|---|
| `NET_ACTIVATION` | `adaptive` | Fallback mode: `adaptive`, `always`, `off` |
| `NET_FALLBACK_START_PCT` | `19.0` | Activate fallback below this network utilization threshold |
| `NET_FALLBACK_STOP_PCT` | `23.0` | Deactivate fallback above this network utilization threshold |
| `NET_FALLBACK_RISK_THRESHOLD_PCT` | `22.0` | CPU P95 and memory risk threshold for activation |
| `NET_FALLBACK_DEBOUNCE_SEC` | `30` | Minimum time between activation changes |
| `NET_FALLBACK_MIN_ON_SEC` | `60` | Minimum time to stay active once triggered |
| `NET_FALLBACK_MIN_OFF_SEC` | `30` | Minimum time to stay inactive once stopped |
| `NET_FALLBACK_RAMP_SEC` | `10` | Ramp-up time for gradual rate adjustment |
adaptive (recommended): Smart activation based on Oracle reclamation rules
- Monitors CPU P95, network, and memory utilization
- Activates only when multiple metrics approach danger thresholds
- Provides maximum protection with minimal system impact
always: Continuous network generation
- Useful for testing or environments with strict network requirements
- Higher resource usage but maximum network protection
off: Disables network fallback entirely
- CPU-only protection mode
- Recommended only when network generation is not desired
Activates network fallback only in extreme risk scenarios:

```bash
NET_ACTIVATION=adaptive
NET_FALLBACK_START_PCT=15.0           # Very low threshold
NET_FALLBACK_STOP_PCT=25.0            # Higher deactivation threshold
NET_FALLBACK_RISK_THRESHOLD_PCT=19.0  # More conservative risk level
NET_FALLBACK_DEBOUNCE_SEC=60          # Longer debounce to avoid rapid changes
NET_FALLBACK_MIN_ON_SEC=120           # Stay active longer once triggered
```

Use case: Environments where network activity should be minimized but Oracle compliance is critical.
Maximizes protection against VM reclamation with active network generation:

```bash
NET_ACTIVATION=adaptive
NET_FALLBACK_START_PCT=22.0           # Higher activation threshold
NET_FALLBACK_STOP_PCT=28.0            # Higher deactivation threshold
NET_FALLBACK_RISK_THRESHOLD_PCT=24.0  # Proactive risk management
NET_FALLBACK_DEBOUNCE_SEC=15          # Quick response to changes
NET_FALLBACK_MIN_ON_SEC=60            # Standard minimum on time
```

Use case: Critical workloads where VM reclamation must be avoided at all costs.
Always-on network generation for testing network bandwidth and validation:

```bash
NET_ACTIVATION=always
NET_TARGET_PCT=25.0                # Consistent 25% network utilization
NET_MODE=client                    # Client mode for outbound traffic
NET_PEERS=your-server.example.com  # External servers you control (replace with actual servers)
```

Use case: Development environments, network performance testing, bandwidth validation.
Disables network fallback completely, relying only on CPU P95 control:

```bash
NET_ACTIVATION=off
NET_TARGET_PCT=0         # No network generation
CPU_P95_TARGET_MIN=25.0  # Higher CPU target to compensate
CPU_P95_TARGET_MAX=30.0  # Adjusted range for CPU-only protection
```

Use case: Environments where network generation is prohibited or impossible.
Optimal balance between resource usage and Oracle compliance:

```bash
NET_ACTIVATION=adaptive
NET_FALLBACK_START_PCT=20.0           # Oracle threshold-based
NET_FALLBACK_STOP_PCT=25.0            # Safe deactivation level
NET_FALLBACK_RISK_THRESHOLD_PCT=22.0  # Standard risk threshold
NET_FALLBACK_DEBOUNCE_SEC=30          # Balanced response time
NET_FALLBACK_MIN_ON_SEC=60            # Standard minimum active period
NET_FALLBACK_RAMP_SEC=15              # Smooth transitions
```

Use case: Production environments requiring reliable Oracle compliance with reasonable resource usage.
loadshaper monitors system load average to detect CPU contention from other
processes. When the 1-minute load average per core exceeds the configured
threshold (default 0.6), CPU workers are automatically paused to yield resources
to legitimate workloads. Workers resume when load drops below the resume
threshold (default 0.4). This ensures loadshaper remains unobtrusive and
immediately steps aside when real work needs the CPU.
The default thresholds (0.6/0.4) provide a good balance between responsiveness to legitimate workloads and stability. The hysteresis gap prevents oscillation when load hovers near the threshold.
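The pause/resume hysteresis can be sketched as a tiny state machine (assumed structure, using the 0.6/0.4 defaults described above):

```python
# Sketch of load-average hysteresis: pause above 0.6, resume below 0.4,
# and hold state while the load sits inside the gap.
class LoadGate:
    def __init__(self, pause_at=0.6, resume_at=0.4):
        self.pause_at, self.resume_at = pause_at, resume_at
        self.paused = False

    def update(self, load_per_core: float) -> bool:
        """Return True when workers may run."""
        if not self.paused and load_per_core > self.pause_at:
            self.paused = True       # real workload detected: yield
        elif self.paused and load_per_core < self.resume_at:
            self.paused = False      # load dropped: resume
        return not self.paused

gate = LoadGate()
print(gate.update(0.5))   # True  - below pause threshold
print(gate.update(0.7))   # False - paused
print(gate.update(0.5))   # False - still paused (hysteresis gap)
print(gate.update(0.3))   # True  - resumed
```

The third call shows why the gap matters: at 0.5 load the gate keeps its previous state instead of oscillating.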
loadshaper automatically stores CPU, memory, and network utilization samples
in a lightweight SQLite database for 7-day rolling analysis. Metrics are stored
at each control period (default 5 seconds) and automatically cleaned up after
7 days.
Storage location:
- Required: /var/lib/loadshaper/metrics.db (persistent storage required)
Telemetry output format:
```
[loadshaper] cpu now=45.2% avg=42.1% p95=48.3% | mem(excl-cache) now=55.1% avg=52.8% | nic(...) now=12.50% avg=11.25% | load now=0.45 avg=0.42 | ... | samples_7d=98547
```
Where:
- now: Current sample value
- avg: 5-minute exponential moving average (memory/network only; CPU uses P95 control)
- p95: 95th percentile over the past 7 days (CPU only, matches Oracle's measurement)
- samples_7d: Number of samples stored in the 7-day window
Storage characteristics:
- Approximately 120,960 samples per week (one every 5 seconds)
- Estimated database size: 10-20 MB for 7 days of data
- Thread-safe for concurrent access
- Gracefully handles storage failures (continues with existing behavior)
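A minimal sketch of this rolling store (assumed schema, in-memory here for illustration; the real database lives at /var/lib/loadshaper/metrics.db):

```python
# Sketch: insert a sample each control period, prune anything older
# than the 7-day window.
import sqlite3
import time

SEVEN_DAYS = 7 * 24 * 3600
db = sqlite3.connect(":memory:")  # real path: /var/lib/loadshaper/metrics.db
db.execute("CREATE TABLE IF NOT EXISTS metrics (ts REAL, cpu REAL, mem REAL, net REAL)")

def store_sample(cpu, mem, net, now=None):
    now = now if now is not None else time.time()
    db.execute("INSERT INTO metrics VALUES (?, ?, ?, ?)", (now, cpu, mem, net))
    db.execute("DELETE FROM metrics WHERE ts < ?", (now - SEVEN_DAYS,))
    db.commit()

store_sample(42.0, 30.0, 12.0, now=1_000_000)                   # old sample
store_sample(45.0, 31.0, 13.0, now=1_000_000 + SEVEN_DAYS + 5)  # prunes the first
print(db.execute("SELECT COUNT(*) FROM metrics").fetchone()[0])  # 1
```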
For local runs outside of Docker, you must first create the persistent storage directory:
```bash
# Create persistent storage directory with correct permissions
sudo mkdir -p /var/lib/loadshaper
sudo chown $USER:$USER /var/lib/loadshaper
```

Note: LoadShaper requires persistent storage at /var/lib/loadshaper to maintain the 7-day P95 CPU history needed for Oracle compliance. Without this directory, the application will fail to start.
Environment variables can override shape detection and contention limits:
```bash
# Override detected NIC speed and adjust network caps
NET_SENSE_MODE=container NET_LINK_MBIT=10000 NET_STOP_PCT=20 python -u loadshaper.py

# Raise CPU target while lowering the safety stop
CPU_P95_SETPOINT=50.0 CPU_STOP_PCT=70 MEM_TARGET_PCT=25 MEM_STOP_PCT=80 python -u loadshaper.py

# Configure load average monitoring thresholds (more aggressive example)
LOAD_THRESHOLD=1.0 LOAD_RESUME_THRESHOLD=0.6 LOAD_CHECK_ENABLED=true python -u loadshaper.py

# Conservative load monitoring (earlier pause, safer for shared systems)
LOAD_THRESHOLD=0.4 LOAD_RESUME_THRESHOLD=0.2 python -u loadshaper.py
```

loadshaper automatically detects your Oracle Cloud shape and applies optimized configuration templates:
| Shape | CPU | RAM | Network | Template |
|---|---|---|---|---|
| VM.Standard.E2.1.Micro | 1/8 OCPU | 1GB | 50 Mbps | e2-1-micro.env |
| VM.Standard.E2.2.Micro | 2/8 OCPU | 2GB | 50 Mbps | e2-2-micro.env |
| VM.Standard.A1.Flex (1 vCPU) | 1 vCPU | 6GB | 1 Gbps | a1-flex-1.env |
| VM.Standard.A1.Flex (2 vCPU) | 2 vCPU | 12GB | 2 Gbps | a1-flex-2.env |
| VM.Standard.A1.Flex (3 vCPU) | 3 vCPU | 18GB | 3 Gbps | a1-flex-3.env |
| VM.Standard.A1.Flex (4 vCPU) | 4 vCPU | 24GB | 4 Gbps | a1-flex-4.env |
The system uses a three-tier configuration priority:
- Environment Variables (highest priority)
- Shape-specific Template (automatic detection)
- Built-in Defaults (conservative fallback)
This means you can override any template value with environment variables while still benefiting from automatic shape-optimized defaults.
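The resolution order can be sketched in a few lines (assumed structure, not the actual implementation):

```python
# Sketch of the three-tier lookup: environment variable wins over the
# shape template, which wins over the built-in default.
import os

DEFAULTS = {"CPU_P95_SETPOINT": "25.0"}   # built-in conservative fallback
template = {"CPU_P95_SETPOINT": "23.5"}   # e.g. loaded from e2-1-micro.env

def get_conf(key: str) -> str:
    return os.environ.get(key) or template.get(key) or DEFAULTS[key]

print(get_conf("CPU_P95_SETPOINT"))   # "23.5" (template wins over default)
os.environ["CPU_P95_SETPOINT"] = "27.0"
print(get_conf("CPU_P95_SETPOINT"))   # "27.0" (env var wins)
```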
Critical: LoadShaper requires persistent storage for 7-day P95 CPU calculations. The storage directory is configurable via the PERSISTENCE_DIR environment variable.
Default Configuration:
```bash
PERSISTENCE_DIR="/var/lib/loadshaper"  # Default persistence directory
```

Storage Contents:

- metrics.db - SQLite database with 7-day rolling metrics (10-20MB)
- p95_ring_buffer.json - Ring buffer state for fast P95 calculations (few KB)
Custom Storage Examples:
For Docker bind mounts:
```yaml
# docker-compose.yml
environment:
  PERSISTENCE_DIR: "/app/data"
volumes:
  - /host/path/loadshaper:/app/data
```

For Kubernetes PersistentVolumes:

```yaml
# deployment.yaml
env:
  - name: PERSISTENCE_DIR
    value: "/data/loadshaper"
volumeMounts:
  - name: loadshaper-storage
    mountPath: /data/loadshaper
```

For local development:

```bash
# Create custom directory
mkdir -p ~/loadshaper-data
export PERSISTENCE_DIR="$HOME/loadshaper-data"
python loadshaper.py
```

Important Notes:
- Directory must be writable by UID 1000 (rootless container security)
- Without persistent storage, container will fail to start (mandatory requirement)
- Custom path must be absolute, not relative
- Same PERSISTENCE_DIR value must be used consistently across container restarts
For non-Oracle Cloud environments, loadshaper safely falls back to conservative E2.1.Micro-like defaults, making it safe to run anywhere.
Critical for Oracle compliance: loadshaper calculates memory utilization by excluding cache/buffers, which aligns with industry standards and Oracle's likely implementation for VM reclamation criteria.
The Problem with Including Cache/Buffers:
- Linux aggressively uses free RAM for disk caching (often 50-80% of total memory)
- This cache is instantly reclaimable when applications need memory
- Including cache would make almost every Linux VM appear "active" even when idle
- This would defeat Oracle's reclamation policy of finding truly unused VMs
Industry Standard Approach:
- AWS CloudWatch: mem_used_percent excludes cache/buffers
- Azure Monitor: Uses "available memory" metrics (cache-aware)
- Kubernetes: Uses "working set" memory (excludes reclaimable cache)
- Google Cloud: Uses similar cache-aware calculations
Preferred Method (Linux 3.14+):
memory_utilization = 100 × (1 - MemAvailable/MemTotal)
Fallback Method (older kernels):
cache_buffers = Buffers + max(0, Cached + SReclaimable - Shmem)
memory_utilization = 100 × (MemTotal - MemFree - cache_buffers) / MemTotal
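Both formulas can be applied to parsed /proc/meminfo contents; a simplified sketch (values are in kB, parsing reduced to the essentials):

```python
# Both methods above against parsed /proc/meminfo contents.
def parse_meminfo(text):
    out = {}
    for line in text.splitlines():
        key, rest = line.split(":", 1)
        out[key] = int(rest.split()[0])   # values in kB
    return out

def mem_used_pct(mi):
    if "MemAvailable" in mi:   # preferred method (Linux 3.14+)
        return 100.0 * (1 - mi["MemAvailable"] / mi["MemTotal"])
    # fallback method for older kernels
    cache = mi["Buffers"] + max(0, mi["Cached"] + mi["SReclaimable"] - mi["Shmem"])
    return 100.0 * (mi["MemTotal"] - mi["MemFree"] - cache) / mi["MemTotal"]

sample = "MemTotal: 1000000 kB\nMemAvailable: 750000 kB"
print(mem_used_pct(parse_meminfo(sample)))   # 25.0
```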
Set DEBUG_MEM_METRICS=true to see both calculations in telemetry:

```bash
DEBUG_MEM_METRICS=true docker compose up -d
```

Output example:

```
mem(excl-cache) now=25.3% avg=24.1% p95=28.7% [incl-cache=78.2%]
```

This shows the huge difference: 25% (real app usage) vs 78% (including cache).
⚠️ CRITICAL: For Oracle Free Tier VM protection, ensure at least one metric target is above 20%. Oracle reclaims a VM only when ALL metrics are below 20%, so setting every target below 20% will cause reclamation.

🚨 CRITICAL: SINGLE INSTANCE ONLY: Only run ONE LoadShaper instance per system. Multiple instances create race conditions in:
- P95 ring buffer state - Concurrent writes corrupt slot history tracking
- Metrics database - SQLite locks and data corruption
- Resource calculations - Conflicting load measurements
Result: Oracle VM reclamation due to broken P95 calculations. Each LoadShaper instance requires exclusive access to the /var/lib/loadshaper/ persistent storage.
| Variable | Description | E2.1.Micro | E2.2.Micro | A1.Flex-1 | A1.Flex-2 | A1.Flex-3 | A1.Flex-4 |
|---|---|---|---|---|---|---|---|
| `CPU_P95_SETPOINT` | Target CPU P95 (7-day window) | 23.5% | 25.0% | 25.0% | 25.0% | 25.0% | 30.0% |
| `MEM_TARGET_PCT` | Target memory utilization (%) | 0% (disabled) | 0% (disabled) | 30% (above 20% rule) | 30% (above 20% rule) | 30% (above 20% rule) | 30% (above 20% rule) |
| `NET_TARGET_PCT` | Target network utilization (%) | 15% (50 Mbps) | 25% (50 Mbps) | 25% (1 Gbps) | 25% (2 Gbps) | 25% (3 Gbps) | 30% (4 Gbps) |
| Variable | Description | E2 Shapes | A1 Shapes |
|---|---|---|---|
| `CPU_STOP_PCT` | CPU % to pause load generation | 45-50% (shared tenancy) | 85% (dedicated) |
| `MEM_STOP_PCT` | Memory % to pause allocation | 80-85% (conservative) | 90% (with 20% rule) |
| `NET_STOP_PCT` | Network % to pause traffic | 40% (50 Mbps limit) | 60% (higher capacity) |
| Variable | Default | Description |
|---|---|---|
| `CONTROL_PERIOD_SEC` | `5` | Seconds between control decisions |
| `AVG_WINDOW_SEC` | `300` | Exponential moving average window for memory/network (5 min) |
| `HYSTERESIS_PCT` | `5` | Percentage hysteresis to prevent oscillation |
| `JITTER_PCT` | `10` | Random jitter in load generation (%) |
| `JITTER_PERIOD_SEC` | `5` | Seconds between jitter adjustments |
| Variable | Default | Description |
|---|---|---|
| `CPU_P95_TARGET_MIN` | `22.0` | Minimum target for 7-day CPU P95 (must stay >20%) |
| `CPU_P95_TARGET_MAX` | `28.0` | Maximum target for 7-day CPU P95 (efficiency ceiling) |
| `CPU_P95_SETPOINT` | `25.0` | Optimal P95 target (center of safe range 22-28%) |
| `CPU_P95_EXCEEDANCE_TARGET` | `6.5` | Target percentage of high-intensity slots (%) |
| `CPU_P95_SLOT_DURATION_SEC` | `60.0` | Duration of each control slot (seconds) |
| `CPU_P95_HIGH_INTENSITY` | `35.0` | CPU utilization during high-intensity slots (%) |
| `CPU_P95_BASELINE_INTENSITY` | `20.0` | CPU utilization during normal slots (minimum for Oracle compliance) |
| `CPU_P95_RING_BUFFER_BATCH_SIZE` | `10` | Number of slots between ring buffer state saves (performance optimization) |
| Variable | Default | Description |
|---|---|---|
| `LOAD_THRESHOLD` | `0.6` | Load average per core to pause workers |
| `LOAD_RESUME_THRESHOLD` | `0.4` | Load average per core to resume workers |
| `LOAD_CHECK_ENABLED` | `true` | Enable/disable load average monitoring |
| Variable | Default | Description |
|---|---|---|
| `MEM_MIN_FREE_MB` | `512` | Minimum free memory to maintain (MB) |
| `MEM_STEP_MB` | `64` | Memory allocation step size (MB) |
| `MEM_TOUCH_INTERVAL_SEC` | `1.0` | Memory page touching frequency (0.5-10.0 seconds) |
| Variable | Default | Description |
|---|---|---|
| `NET_MODE` | `client` | Network mode: `off`, `client` |
| `NET_PROTOCOL` | `udp` | Protocol: `udp` (lower CPU), `tcp` |
| `NET_PEERS` | (empty) | Comma-separated peer IP addresses - must be external servers you control |
| `NET_PORT` | `15201` | TCP port for network communication |
| `NET_BURST_SEC` | `10` | Duration of traffic bursts (seconds) |
| `NET_IDLE_SEC` | `10` | Idle time between bursts (seconds) |
| `NET_TTL` | `1` | IP TTL for generated packets |
| `NET_PACKET_SIZE` | `8900` | Packet size (bytes) optimized for Oracle Cloud MTU 9000 |
| Variable | Default | Description |
|---|---|---|
| `NET_VALIDATE_STARTUP` | `true` | Validate peer connectivity at startup |
| `NET_REQUIRE_EXTERNAL` | `true` | Require external (non-RFC1918) addresses for E2 shapes |
| `NET_VALIDATION_TIMEOUT_MS` | `200` | Timeout for peer validation (milliseconds) |
| `NET_STATE_DEBOUNCE_SEC` | `5.0` | State transition debounce time (seconds) |
| `NET_STATE_MIN_ON_SEC` | `15.0` | Minimum time in active state (seconds) |
| `NET_STATE_MIN_OFF_SEC` | `20.0` | Minimum time in inactive state (seconds) |
| `NET_IPV6` | `auto` | IPv6 mode: `auto`, `on`, `off` |
| Variable | Default | Description |
|---|---|---|
| `NET_SENSE_MODE` | `container` | Detection mode: `container`, `host` |
| `NET_IFACE` | `ens3` | Host interface name (required when `NET_SENSE_MODE=host`) |
| `NET_IFACE_INNER` | `eth0` | Container interface name |
| `NET_LINK_MBIT` | `1000` | Fallback link speed (Mbps) |
| `NET_MIN_RATE_MBIT` | `1` | Minimum traffic generation rate |
| `NET_MAX_RATE_MBIT` | `800` | Maximum traffic generation rate |
LoadShaper implements proportional safety scaling to prevent Oracle VM reclamation while maintaining system responsiveness. This advanced feature dynamically adjusts CPU intensity based on system load and P95 positioning.
The P95 controller uses an "exceedance budget" approach:
| Variable | Default | Description |
|---|---|---|
| `CPU_P95_EXCEEDANCE_TARGET` | `6.5` | Target percentage of high-intensity slots (0-100%) |
How it works:
- 6.5% exceedance means ~6.5% of configurable time slots (default 60s) run above the setpoint
- 93.5% of slots run at or below the setpoint, achieving the target P95
- Dynamic scaling: High system load reduces intensity proportionally
- Automatic adaptation: Controller adjusts to maintain the exceedance budget
```python
# Example: At 80% system load, intensity scales down proportionally
if load_average > LOAD_THRESHOLD:
    intensity_factor = max(0.1, 1.0 - (load_average - LOAD_THRESHOLD) / 2.0)
    actual_intensity = base_intensity * intensity_factor
```

This ensures CPU stress never competes with legitimate workloads while maintaining Oracle compliance.
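The slot history itself can be sketched as a bounded ring buffer with a nearest-rank percentile (assumed structure, illustration only):

```python
# Sketch: nearest-rank P95 over a rolling window of 60-second slots.
from collections import deque

WINDOW_SLOTS = 7 * 24 * 60          # 7 days of 60-second slots
ring = deque(maxlen=WINDOW_SLOTS)   # oldest slots fall off automatically

def p95() -> float:
    ordered = sorted(ring)
    return ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]

# 4% of slots high-intensity, i.e. within the ~5% exceedance budget:
ring.extend([20.0] * 960 + [35.0] * 40)
print(p95())   # 20.0 - P95 stays at baseline while under the budget
```

Pushing the high-intensity share past 5% of the window flips the P95 up to the high-intensity value, which is why the exceedance budget is the controller's key knob.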
| Variable | Default | Description |
|---|---|---|
| `NET_ACTIVATION` | `adaptive` | Network mode: `adaptive` (fallback), `always`, `off` |
| `NET_FALLBACK_START_PCT` | `19` | Start network generation below this % |
| `NET_FALLBACK_STOP_PCT` | `23` | Stop network generation above this % |
| `NET_FALLBACK_RISK_THRESHOLD_PCT` | `22` | Oracle reclamation risk threshold for CPU/memory |
| `NET_FALLBACK_DEBOUNCE_SEC` | `30` | Seconds to wait before changing state |
| `NET_FALLBACK_MIN_ON_SEC` | `60` | Minimum seconds to stay active |
| `NET_FALLBACK_MIN_OFF_SEC` | `30` | Minimum seconds to stay inactive |
| `NET_FALLBACK_RAMP_SEC` | `10` | Rate ramp time for smooth transitions (seconds) |
The native network generator provides advanced performance features:
TCP Connection Pooling:
- Persistent connections per target reduce connection overhead
- Automatic reconnection with exponential backoff on failures
- Significant performance improvement for sustained TCP traffic
IPv6 Support:
- Protocol-agnostic networking using socket.getaddrinfo() with AF_UNSPEC
- Supports IPv4, IPv6, and hostname targets (resolved via DNS)
- ⚠️ VCN Routing Risk: IPv6 global addresses may route internally within Oracle VCN
- Proper TTL/hop limit handling for both address families
DNS Resolution Caching:
- Hostnames resolved once per session and cached
- Reduces DNS query overhead for repeated connections
- Supports both A and AAAA record resolution
Protocol-Specific Optimizations:
- UDP: Non-blocking sockets with optimized send buffers (1MB+)
- TCP: Connection pooling with TCP_NODELAY for low latency
- Jumbo frame support (MTU 9000) with 8900-byte packets for 30-50% CPU reduction
Performance Considerations:
- Rate limiting accuracy: 5ms token bucket precision
- Packet generation: Pre-allocated buffers for zero-copy operation
- Resource usage: Automatic cleanup with context manager support
- Thread safety: Designed for single-threaded operation per generator
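The rate limiting mentioned above follows the classic token-bucket pattern; a self-contained sketch (assumed structure, not LoadShaper's actual code; the "5ms precision" refers to how often a caller would retry after a refusal):

```python
# Sketch of a token-bucket pacer: tokens refill continuously at the
# target byte rate; a send is allowed only when enough tokens exist.
import time

class TokenBucket:
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate, self.capacity = rate_bytes_per_s, burst_bytes
        self.tokens, self.last = burst_bytes, time.monotonic()

    def try_send(self, nbytes: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False   # caller sleeps ~5ms and retries

bucket = TokenBucket(rate_bytes_per_s=1_000_000, burst_bytes=8900)
print(bucket.try_send(8900))   # True  - full burst available
print(bucket.try_send(8900))   # False - must wait for refill
```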
The network generator includes a comprehensive reliability system to prevent silent failures and ensure Oracle-compliant traffic generation (requires external peers):
State Machine Architecture:
`OFF → INITIALIZING → VALIDATING → ACTIVE_UDP → ACTIVE_TCP → ERROR`

- Automatic state transitions based on validation results and peer health
- Hysteresis and debouncing prevent oscillation between states
- Network is supplementary protection - system continues on CPU/memory even without network
Peer Validation & Reputation:
- External address validation: Automatically rejects RFC1918, loopback, and link-local addresses (⚠️ VCN-internal IPv6 not detected)
- EMA-based reputation scoring: Tracks peer reliability over time (0-100 score)
- Runtime tx_bytes monitoring: Validates actual network traffic generation (⚠️ Cannot detect VCN-internal routing)
- Automatic peer switching: Detects failed peers and switches to healthy alternatives
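The external-address filter can be approximated with Python's `ipaddress` module (a sketch of the rule, not LoadShaper's code; as noted above, VCN-internal IPv6 globals would still pass this check):

```python
import ipaddress
import socket

def is_external_address(target):
    """Sketch of the documented peer filter: reject RFC1918, loopback,
    and link-local targets. Illustrative only."""
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        # Hostname: resolve A/AAAA records; every resolved address must qualify
        infos = socket.getaddrinfo(target, None, family=socket.AF_UNSPEC)
        return all(is_external_address(info[4][0]) for info in infos)
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

Note that `is_private` is slightly stricter than plain RFC1918 (it also covers documentation ranges such as 192.0.2.0/24), which is a safe over-rejection for this purpose.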
Fallback Chain:
- Primary: UDP traffic to configured peers
- Secondary: TCP traffic to same peers
- Final: If no valid peers remain, network generation deactivates (fail-fast) rather than falling back to loopback, since local traffic would not count toward Oracle's external-traffic threshold
Network Health Scoring (0-100):
- State health: 40% weight (ACTIVE states = 100, ERROR = 0)
- Peer reputation: 30% weight (EMA of validation success rates)
- Validation success: 20% weight (recent tx_bytes confirmation)
- Error rates: 10% weight (inverse of recent failures)
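The weighting above maps to a simple weighted sum; the sub-score names below are illustrative assumptions, not LoadShaper's actual field names:

```python
# Documented weights for the overall network health score (sum to 1.0)
WEIGHTS = {
    "state": 0.40,            # ACTIVE states = 100, ERROR = 0
    "peer_reputation": 0.30,  # EMA of validation success rates
    "validation": 0.20,       # recent tx_bytes confirmation
    "error": 0.10,            # inverse of recent failures (100 = no failures)
}

def health_score(subscores):
    """Combine 0-100 sub-scores into a 0-100 overall score (illustrative)."""
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)
```

For example, a generator in an ERROR state with otherwise perfect peers scores 0.30 × 100 + 0.20 × 100 = 50 out of 100.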
Oracle E2 Compliance Features:
- Ensures all traffic targets external addresses (non-RFC1918)
- Requires user-configured external peers for Oracle compliance
- Validates actual external traffic generation required for Oracle's 20% network threshold
VM.Standard.E2.1.Micro (x86-64):

# Conservative settings for shared 1/8 OCPU
CPU_P95_SETPOINT=23.5 CPU_P95_EXCEEDANCE_TARGET=6.5 MEM_TARGET_PCT=0 NET_TARGET_PCT=15
NET_LINK_MBIT=50 LOAD_THRESHOLD=0.6

A1.Flex (ARM64):

# Higher targets for dedicated resources
CPU_P95_SETPOINT=28.5 CPU_P95_EXCEEDANCE_TARGET=6.5 MEM_TARGET_PCT=25 NET_TARGET_PCT=25
NET_LINK_MBIT=1000 LOAD_THRESHOLD=0.8

| Variable | Default | Description |
|---|---|---|
| HEALTH_ENABLED | true | Enable/disable HTTP health check server |
| HEALTH_PORT | 8080 | Port for health check endpoints |
| HEALTH_HOST | 127.0.0.1 | Host interface to bind (localhost only by default) |
| LOADSHAPER_TEMPLATE_DIR | config-templates/ | Directory containing Oracle shape configuration templates |
| ORACLE_METADATA_PROBE | 0 | Enable Oracle-specific metadata service probe (0=disabled, 1=enabled) |
Oracle shape detection results are cached for 5 minutes (300 seconds) to avoid repeated system calls. The cache includes:
- Detected shape name and template file
- Oracle environment detection result
- System specifications (CPU count, memory size)
Note: In containerized environments, memory detection reflects the host system, not container limits.
loadshaper provides HTTP endpoints for health monitoring and metrics retrieval, primarily designed for Docker container health checks and monitoring systems.
GET /health - Health check status
{
"status": "healthy",
"uptime_seconds": 3245.1,
"timestamp": 1705234567.89,
"checks": ["all_systems_operational"],
"metrics_storage": "available",
"persistence_storage": "available",
"load_generation": "active",
"storage_status": {
"disk_usage_mb": 45.2,
"oldest_sample": "2024-01-01T12:00:00Z",
"sample_count": 120960
}
}

GET /metrics - Detailed metrics and configuration
{
"timestamp": 1705234567.89,
"current": {
"cpu_percent": 45.2,
"cpu_avg": 42.1,
"memory_percent": 55.1,
"memory_avg": 52.8,
"network_percent": 12.5,
"network_avg": 11.25,
"load_average": 0.42,
"duty_cycle": 0.65,
"network_rate_mbit": 15.2,
"paused": false
},
"targets": {
"cpu_p95_setpoint": 25.0,
"memory_target": 0.0,
"network_target": 15.0
},
"configuration": {
"cpu_stop_threshold": 95.0,
"memory_stop_threshold": 95.0,
"network_stop_threshold": 95.0,
"load_threshold": 0.6,
"worker_count": 4,
"control_period": 10.0,
"averaging_window": 60.0
},
"percentiles_7d": {
"cpu_p95": 48.3,
"memory_p95": 52.1,
"network_p95": 14.8,
"load_p95": 0.35,
"sample_count_7d": 120960
}
}

Add health check to your Docker compose or Dockerfile:
# docker-compose.yml
services:
  loadshaper:
    build: .
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

# Dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

By default, the health server binds to localhost only (127.0.0.1) for security. To enable external access:
# Allow access from any interface (Docker containers)
HEALTH_HOST=0.0.0.0 docker run ...
# Bind to specific interface
HEALTH_HOST=10.0.0.1 docker run ...
# Disable health server entirely
HEALTH_ENABLED=false docker run ...

Security Note: Only bind to external interfaces (0.0.0.0) in trusted environments or behind proper network security controls.
For production deployments, proper monitoring is essential to ensure LoadShaper effectively prevents Oracle VM reclamation while maintaining system stability.
Monitor these critical metrics via the GET /metrics endpoint:
| Metric | Path | Critical Threshold | Description |
|---|---|---|---|
| CPU P95 | percentiles_7d.cpu_p95 | Must stay > 20% | Primary Oracle reclamation metric |
| Health Status | status | Must be "healthy" | Overall LoadShaper health |
| Controller State | p95_controller.state | Watch for stability | P95 controller operational state |
| Exceedance % | p95_controller.exceedance_pct | Target ~6.5% | Percentage of high-intensity slots |
| Load Average | current.load_average | Monitor spikes | System load impact tracking |
# LoadShaper Down - Immediate reclamation risk
curl -f http://localhost:8080/health || ALERT "LoadShaper unreachable"
# CPU P95 Below Oracle Threshold
if cpu_p95 < 20% for > 60 minutes:
ALERT "CPU P95 approaching Oracle reclamation threshold"

# Low CPU Activity Warning
if cpu_p95 < 25% for > 30 minutes:
WARN "CPU P95 trending toward danger zone"
# High Exceedance - Overshooting target
if exceedance_pct > 15% for > 30 minutes:
WARN "P95 controller exceedance above optimal range"
# Controller Instability
if p95_controller.state in ["BUILDING","REDUCING"] for > 20 minutes:
WARN "P95 controller unable to reach stable state"

# Process Restart Tracking
if uptime_seconds < 60:
INFO "LoadShaper restarted - monitoring for stability"

# prometheus.yml
scrape_configs:
  - job_name: 'loadshaper'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics
    scrape_interval: 30s

#!/bin/bash
# monitor-loadshaper.sh
METRICS=$(curl -s http://localhost:8080/metrics)
CPU_P95=$(echo "$METRICS" | jq -r '.percentiles_7d.cpu_p95 // 0')
if (( $(echo "$CPU_P95 < 20" | bc -l) )); then
  echo "CRITICAL: CPU P95 ($CPU_P95%) below Oracle threshold"
  # Send notification (email, Slack, PagerDuty, etc.)
fi

# Verify LoadShaper activity is visible to Oracle
# Check OCI Console > Compute > Instance Details > Metrics
# CPU Utilization should show consistent activity pattern

✅ Confirm LoadShaper is Working:
- CPU P95 stabilizes in target range (23-28%)
- Controller state reaches "MAINTAINING" after warmup
- Exceedance percentage near 6.5%
- OCI Console shows consistent CPU activity

⚠️ Warning Signs:
- CPU P95 trending downward toward 20%
- Controller stuck in "BUILDING" or "REDUCING" states
- Volatile CPU patterns instead of stable control
- Gaps in monitoring data indicating downtime
🔧 Troubleshooting:
- Check Docker logs: docker logs loadshaper
- Verify database: docker exec loadshaper ls -la /var/lib/loadshaper/
- Test health endpoint: curl http://localhost:8080/health
- Monitor system load: Ensure load average isn't constantly high
- Prometheus + Grafana: Best for comprehensive monitoring and alerting
- Datadog/New Relic: Good for managed environments with agent-based monitoring
- Simple cron + curl: Sufficient for basic deployments with shell script alerting
- OCI Monitoring: Use as secondary validation of LoadShaper effectiveness
Remember: The ultimate success metric is your VM remaining active. Monitor consistently, but trust the system's design to maintain Oracle compliance automatically.
Q: Will this impact the performance of my applications?
A: No. loadshaper runs at the lowest OS priority (nice 19) and automatically pauses when real workloads need resources. It's designed to be completely invisible to legitimate applications.
Q: How much system resources does loadshaper use?
A: Very minimal - typically <1% CPU when idle, 10-20MB memory for metrics storage, and network traffic only when needed as fallback.
Q: Does this work on both x86-64 and ARM64?
A: Yes, it automatically detects and adapts to both VM.Standard.E2.1.Micro (x86-64) and A1.Flex (ARM64) shapes.
Q: How does this prevent my Always Free instance from being reclaimed?
A: Oracle reclaims instances when ALL metrics are below 20% for 7 days. loadshaper ensures at least one metric stays above 20% by tracking CPU 95th percentile (matching Oracle's measurement) and simple averages for memory/network.
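For reference, a 95th percentile over a window of samples can be computed with the nearest-rank method (a sketch; LoadShaper's exact percentile computation may differ):

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile over a window of samples (sketch)."""
    if not samples:
        return None
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]
```

With a 7-day window sampled every 5 seconds (~120,960 samples, matching the `sample_count_7d` shown by `/metrics`), the P95 is the value exceeded by only the top 5% of samples, which is why brief bursts alone cannot lift it.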
Q: What if Oracle changes their reclamation policy?
A: The thresholds are easily configurable via environment variables. Simply adjust CPU_P95_SETPOINT, MEM_TARGET_PCT, or NET_TARGET_PCT as needed.
Q: Will Oracle consider this usage "legitimate"?
A: The tool generates actual resource utilization that would be visible to Oracle's monitoring. However, you should review Oracle's terms of service to ensure compliance with your use case.
Q: Why does memory targeting default to 0% on E2.1.Micro?
A: E2 shapes only have 1GB RAM and memory isn't counted in Oracle's reclamation criteria for these instances. Memory targeting is only enabled by default on A1.Flex shapes.
Q: How can I tell if it's working?
A: Watch the telemetry output: docker logs -f loadshaper. You'll see current, average, and CPU 95th percentile values.
Q: What happens if I restart the container?
A: Metrics history is preserved in the SQLite database (stored in /var/lib/loadshaper/ on persistent storage). The 7-day rolling window continues from where it left off.
Q: Can I run this alongside other applications?
A: Absolutely. That's the primary use case. loadshaper is designed to coexist peacefully with any workload.
Q: CPU load isn't reaching the target percentage
A: Check if LOAD_THRESHOLD is too low (causing frequent pauses) or if CPU_STOP_PCT is being triggered. Try increasing LOAD_THRESHOLD to 0.8 or 1.0.
Q: Network traffic isn't being generated
A: Check network state in telemetry output. Configure NET_PEERS with external servers you control. Monitor network health scores and state transitions. If state shows ERROR, check connectivity to your configured peers.
Q: Memory usage isn't increasing on A1.Flex
A: Check available free memory and ensure MEM_TARGET_PCT is set above current usage. Verify the container has adequate memory limits.
By default, LoadShaper uses /var/lib/loadshaper as its persistent storage directory. You can customize this location using the PERSISTENCE_DIR environment variable if needed.
To use a custom storage path with Docker Compose:
# compose.override.yaml
services:
  loadshaper:
    environment:
      - PERSISTENCE_DIR=/data/loadshaper  # Custom path inside container
    volumes:
      - loadshaper-metrics:/data/loadshaper  # Mount volume to custom path

For Kubernetes deployments with custom paths:
apiVersion: v1
kind: ConfigMap
metadata:
  name: loadshaper-config
data:
  PERSISTENCE_DIR: "/data/loadshaper"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loadshaper-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loadshaper
spec:
  selector:
    matchLabels:
      app: loadshaper
  template:
    metadata:
      labels:
        app: loadshaper
    spec:
      containers:
        - name: loadshaper
          envFrom:
            - configMapRef:
                name: loadshaper-config
          volumeMounts:
            - name: storage
              mountPath: /data/loadshaper  # Must match PERSISTENCE_DIR
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: loadshaper-storage

- Path consistency: The PERSISTENCE_DIR environment variable must match the volume mount path
- Volume permissions: The directory must be owned by UID/GID 1000 (the LoadShaper user)
- Single instance: Only one LoadShaper instance can use a given storage path
- Migration: When changing storage paths, historical metrics are not migrated automatically
Interested in improving loadshaper? Check out our Contributing Guide for:
- Development environment setup
- Testing requirements
- Code style guidelines
- How to submit improvements
Loadshaper includes comprehensive test coverage to ensure reliability:
# Run all tests
python -m pytest -q
# Run specific test modules
python -m pytest tests/test_cpu_p95_controller.py -v
python -m pytest tests/test_health_endpoints.py -v
# Run with coverage
python -m pytest --cov=loadshaper

CPUP95Controller Test Suite (tests/test_cpu_p95_controller.py):
- 46 comprehensive tests covering all controller functionality
- Initialization: Ring buffer sizing, cache behavior, state setup
- State Machine: BUILDING/MAINTAINING/REDUCING transitions with adaptive hysteresis
- Intensity Calculations: Target intensity algorithms for different states and P95 distances
- Exceedance Targets: Adaptive exceedance budget control based on state and P95 deviation
- Slot Engine: Time-based slot rollover, exceedance budget control, safety gating
- Status Reporting: Telemetry data structure and accuracy
- Edge Cases: Extreme configurations, error conditions, boundary conditions
Health Endpoints (tests/test_health_endpoints.py):
- HTTP endpoint functionality and response validation
- Telemetry data accuracy and formatting
Shape Detection (tests/test_shape_detection.py):
- Oracle Cloud instance shape detection and configuration
- Mocked Storage: Tests use MockMetricsStorage to simulate database interactions
- Time Control: Tests use patch('time.time') for deterministic slot timing
- Cache Management: Tests properly handle P95 caching to ensure fresh data
- State Isolation: Each test starts with clean controller state
- Algorithm Verification: Tests validate the exact mathematical behavior of P95-driven control
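The time-control technique above can be illustrated with a minimal example; `current_slot` is a hypothetical helper, not the real controller API:

```python
import time
from unittest.mock import patch

def current_slot(slot_sec=60):
    """Hypothetical helper: index of the current control slot."""
    return int(time.time() // slot_sec)

def test_slot_rollover_is_deterministic():
    # Freezing time.time() makes slot boundaries exact and repeatable,
    # so rollover behavior can be asserted without real waiting
    with patch("time.time", return_value=119.9):
        assert current_slot() == 1
    with patch("time.time", return_value=120.0):
        assert current_slot() == 2
```

This is the same pattern the suite uses to drive slot rollover deterministically instead of sleeping through real slot durations.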
- Smart network activation: Trigger network load only when CPU/memory metrics trend below thresholds (predictive activation)
- Additional Oracle shapes: Support for newer compute shapes and specialized instances
- Multi-cloud support: Extend to AWS Free Tier, Google Cloud Free Tier
- Advanced monitoring: Integration with Oracle Cloud monitoring APIs for validation
- Resource optimization: Dynamic adjustment based on actual Oracle reclamation patterns
The controller and CPU load workers run with low operating system priority using os.nice(19).
On Linux/Unix systems this lowers their scheduling priority; on other platforms
the call is ignored. Tight loops include small sleep slices (≈5 ms) so the
scheduler can run other workloads without noticeable latency impact.
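In outline, the pattern looks like this (a sketch with illustrative names, not the actual worker loop):

```python
import os
import time

def lower_priority():
    """Drop to the lowest scheduling priority where supported; no-op elsewhere."""
    try:
        os.nice(19)
    except (AttributeError, OSError):
        pass  # non-Unix platform or priority change not permitted

def burn(duration_sec, sleep_slice_sec=0.005):
    """Busy-work interleaved with ~5 ms sleeps so other workloads stay responsive."""
    deadline = time.monotonic() + duration_sec
    iterations = 0
    while time.monotonic() < deadline:
        for _ in range(50_000):     # short CPU burst
            iterations += 1
        time.sleep(sleep_slice_sec)  # yield to the scheduler
    return iterations
```

Note that for unprivileged processes `os.nice(19)` is one-way: the priority cannot be raised back afterwards.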
Check CPU load is working:
# CPU percentage should be near your target
docker logs -f loadshaper | grep "cpu now="

Check memory allocation:
# Memory usage should increase over time if MEM_TARGET_PCT > current usage
docker logs -f loadshaper | grep "mem(excl-cache)"

Check network traffic:
# Network percentage should show activity when NET_MODE=client and peers are configured
docker logs -f loadshaper | grep "nic("

Container startup failures:
# Check container logs for entrypoint errors
docker logs loadshaper
# Common entrypoint issues:
# 1. "NOT a mount point" - NEW: Container now verifies persistent volume is actually mounted
docker exec loadshaper mount | grep /var/lib/loadshaper || echo "No volume mounted - container will not start"
# Fix: Ensure docker-compose.yaml has the volume configured:
# volumes:
# - loadshaper-metrics:/var/lib/loadshaper
# 2. "Permission denied" - persistent storage mount point permissions
docker exec loadshaper ls -ld /var/lib/loadshaper || echo "Mount point not accessible"
# Fix for named volumes:
docker run --rm -v loadshaper-metrics:/var/lib/loadshaper alpine:latest chown -R 1000:1000 /var/lib/loadshaper
# Fix for bind mounts:
sudo chown -R 1000:1000 ./persistent-storage/
# 3. "Write test failed" - storage not writable (uses mktemp for security)
docker exec loadshaper touch /var/lib/loadshaper/test && docker exec loadshaper rm /var/lib/loadshaper/test || echo "Storage not writable"
# 4. "Database migration failed" - corrupted or incompatible database
docker exec loadshaper rm -f /var/lib/loadshaper/metrics.db && docker restart loadshaper
# 5. Verify compose configuration includes persistent volume
docker compose config | grep -A5 volumes || echo "No volumes configured - add persistent storage"

Mount Point Verification (NEW in v2.0.0):
# Container now requires /var/lib/loadshaper to be a mount point
# This prevents accidental use of container's filesystem which loses data on restart
# Verify mount inside container:
docker exec loadshaper sh -c 'stat -c "%d" /var/lib/loadshaper; stat -c "%d" /var/lib'
# Different device IDs = properly mounted; Same IDs = NOT mounted (container won't start)
# For Kubernetes users - avoid emptyDir:
# emptyDir volumes are NOT persistent across pod restarts
# Use PersistentVolumeClaim (PVC) instead for true persistence
# Debug mount issues:
docker inspect loadshaper | jq '.[0].Mounts' # Show all mounts
docker volume ls # List volumes
docker volume inspect loadshaper-metrics   # Check volume details

CPU not reaching target percentage:
- Check if LOAD_THRESHOLD is too low (workers pause when system load is high)
- Verify CPU_STOP_PCT isn't triggering premature shutdown
- Increase CPU_P95_SETPOINT if needed
Memory not increasing:
- Ensure sufficient free memory exists (respects MEM_MIN_FREE_MB)
- Check if MEM_STOP_PCT is being triggered
- Verify container has access to enough memory
Network traffic not generating:
- Check telemetry for network state (should show ACTIVE_UDP or ACTIVE_TCP)
- Configure NET_PEERS with external servers you control (required for network protection)
- Monitor the network health score (0-100); low scores indicate connectivity issues
- Network generator automatically falls back: UDP → TCP → next peer
- For E2 shapes, ensure external internet access (not just internal VM connectivity)
Database storage issues:
# Check if persistent metrics storage is working
docker exec loadshaper ls -la /var/lib/loadshaper/ 2>/dev/null || echo "Persistent storage not mounted - container will fail"
# If database corrupted, remove and restart (7-day P95 history will reset)
docker exec loadshaper rm -f /var/lib/loadshaper/metrics.db && docker restart loadshaper
# Check disk space and verify write permissions
docker exec loadshaper df -h /var/lib/loadshaper && docker exec loadshaper touch /var/lib/loadshaper/test && docker exec loadshaper rm /var/lib/loadshaper/test

Load average causing frequent pauses:
# If workers pause too often, adjust thresholds
LOAD_THRESHOLD=1.0 LOAD_RESUME_THRESHOLD=0.6 docker compose up -d

Network connectivity issues:
# Check firewall rules allow traffic on NET_PORT (default 15201)
# Verify NET_PEERS addresses are reachable
# For E2 shapes, ensure peers are external (not internal/localhost)

Peer connectivity failures:
# Check connectivity to your configured peers
docker exec loadshaper ping -c 3 your-server.example.com
# Replace with your actual peer addresses
# If unreachable, check your network connectivity or firewall rules

Network generation issues:
# Check container logs for network errors
docker logs loadshaper | grep -i network
# Look for peer connectivity issues
docker logs loadshaper | grep -i peer

High network CPU usage:
# Switch to UDP if TCP overhead is too high
NET_PROTOCOL=udp docker compose up -d --build

Network interface detection problems:
# Check available network interfaces
docker exec loadshaper cat /proc/net/dev
# Manual interface specification if auto-detection fails
NET_INTERFACE=eth0 docker compose up -d --build

For resource-constrained VMs (1 vCPU/1GB):
# Optimize for minimal overhead with P95 control
CPU_P95_SETPOINT=22.0
MEM_TARGET_PCT=0
NET_TARGET_PCT=22
LOAD_THRESHOLD=0.4
docker compose up -d --build

For better responsiveness:
# More responsive to system load
CONTROL_PERIOD_SEC=1
AVG_WINDOW_SEC=5
HYSTERESIS_PCT=2
docker compose up -d --build

Comprehensive system state:
# Collect all relevant information for troubleshooting
echo "=== System Info ==="
docker exec loadshaper uname -a
docker exec loadshaper cat /proc/meminfo | head -5
docker exec loadshaper cat /proc/loadavg
echo "=== Loadshaper Config ==="
docker exec loadshaper env | grep -E "(CPU_|MEM_|NET_|LOAD_)" | sort
echo "=== Recent Logs ==="
docker logs --tail 50 loadshaper
echo "=== Network Connectivity ==="
docker exec loadshaper ping -c 3 your-server.example.com 2>/dev/null || echo "External peer unreachable"
echo "=== Resource Usage ==="
docker stats --no-stream loadshaper

Testing network generation manually:
# Test native network generator directly
docker exec -it loadshaper python3 -c "
import sys
sys.path.append('/app')
import loadshaper
# Test UDP generation
gen = loadshaper.NetworkGenerator(rate_mbps=10, protocol='udp')
print('Starting UDP test...')
gen.start()
import time; time.sleep(5)
gen.stop()
print('UDP test complete')
# Test TCP generation
gen = loadshaper.NetworkGenerator(rate_mbps=10, protocol='tcp')
print('Starting TCP test...')
gen.start()
time.sleep(5)
gen.stop()
print('TCP test complete')
"

Verify health endpoints:
# Test health endpoint if enabled
curl -f http://localhost:8080/health || echo "Health check failed"
# Test metrics endpoint
curl http://localhost:8080/metrics 2>/dev/null | head -10
# Check health server logs
docker logs loadshaper | grep -i health

Check 7-day compliance:
# Verify metrics meet Oracle thresholds
docker exec loadshaper python3 -c "
import sys, sqlite3, os
sys.path.append('/app')
import loadshaper
# Check database exists
db_path = '/var/lib/loadshaper/metrics.db'
if not os.path.exists(db_path):
print('Persistent metrics database not found - volume not mounted correctly')
exit(1)
# Check recent metrics
storage = loadshaper.MetricsStorage()
cpu_p95 = storage.get_percentile('cpu')
print(f'CPU 95th percentile: {cpu_p95:.1f}% (need >20%)' if cpu_p95 else 'CPU P95 not available')
try:
print(f'Network samples available: {storage.get_sample_count()}')
print('Check /metrics endpoint for current utilization levels')
except Exception:
print('Network metrics not available')
if loadshaper.is_e2_shape():
print('E2 shape: CPU and network must be >20%')
else:
try:
print('Memory tracking active for A1 shapes')
print('Check /metrics endpoint for current memory utilization')
except Exception:
print('Memory metrics not available')
print('A1 shape: CPU, network, AND memory must all be >20%')
"