A minimal, no-Kubernetes ops kit for keeping long-running trading bots (or any Python daemons) alive 24/7 — on a Windows box and on a cheap Linux VPS.
No Docker, no orchestrator, no SaaS. Two battle-tested paths, both extracted from a bot that ran unattended for months:
- Windows — a config-driven PowerShell watchdog + a logon Scheduled Task.
- Linux — a one-shot VPS bootstrap + a templated systemd unit that supervises any number of bots.
Both give you: auto-restart on crash, stall detection, log rotation, hourly state backups, and Telegram alerts.
For one or a few bots on one box, Kubernetes, Docker, or a hosted supervisor is heavier than the problem. What you actually need is: restart when it dies, restart when it silently freezes, don't fill the disk with logs, back up the state file, and ping you when something happens. That's ~250 lines. This is those lines, cleaned up and reusable.
The stall detection is the part most setups miss: a bot process can stay
alive while its event loop is wedged (dead WebSocket, deadlock). Restart=always
won't catch that — the process never exits. Here, if the stderr log goes quiet
for longer than a threshold, the bot is killed and restarted.
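The stall check described above can be sketched in a few lines of Python. This is an illustration of the idea, not the kit's actual code (the Windows watchdog is PowerShell); the names `is_stalled` and `next_action` and the action strings are hypothetical.

```python
import os
import time
from typing import Optional


def is_stalled(log_path: str, stale_after_s: float, now: Optional[float] = None) -> bool:
    """Return True if the log file has not been written for more than stale_after_s seconds."""
    now = time.time() if now is None else now
    try:
        last_write = os.path.getmtime(log_path)
    except OSError:
        # No log file yet (assumption): do not call it a stall; the
        # process-alive check is expected to cover startup failures.
        return False
    return (now - last_write) > stale_after_s


def next_action(process_alive: bool, log_path: str, stale_after_s: float,
                now: Optional[float] = None) -> str:
    """Decide what a supervisor loop should do for one bot on one tick."""
    if not process_alive:
        return "restart"            # crashed: the ordinary auto-restart case
    if is_stalled(log_path, stale_after_s, now):
        return "kill_and_restart"   # alive but silent: the case Restart=always misses
    return "ok"
```

A check like this only works if the bot writes to the watched log more often than the threshold, so a quiet bot may need a periodic heartbeat line.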
cd windows
Copy-Item ..\bots.example.json .\bots.json # then edit paths + bots
powershell -ExecutionPolicy Bypass -File install_watchdog.ps1
Start-ScheduledTask -TaskName trading-bot-ops-watchdog
- watchdog.ps1: reads bots.json, supervises every entry. ASCII-only so it runs under stock Windows PowerShell 5.1.
- install_watchdog.ps1: registers it as a userland Scheduled Task (no admin) that starts at logon and self-restarts.
- Config knobs (bots.example.json): check interval, stale threshold, log rotation size/retention, state-backup interval/retention.
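To make the rotation and backup knobs concrete, here is a rough Python sketch of size-based log rotation and timestamped state backups with retention. It is an assumption about how such knobs typically behave, not a port of watchdog.ps1; the function names, the `.1`/`.2` suffix scheme, and the `.bak` naming are all hypothetical.

```python
import os
import shutil
import time
from typing import Optional


def rotate_log(path: str, max_bytes: int, keep: int) -> bool:
    """Rotate path -> path.1 -> path.2 ... once it exceeds max_bytes; keep `keep` old files."""
    if keep < 1 or not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return False
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)                      # drop the file that falls off the end
    for i in range(keep - 1, 0, -1):           # shift path.(i) to path.(i+1), newest last
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    os.replace(path, f"{path}.1")
    return True


def backup_state(state_path: str, backup_dir: str, keep: int,
                 now: Optional[float] = None) -> str:
    """Copy the state file to a UTC-timestamped backup and prune all but the newest `keep`."""
    keep = max(keep, 1)
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(time.time() if now is None else now))
    base = os.path.basename(state_path)
    dest = os.path.join(backup_dir, f"{base}.{stamp}.bak")
    shutil.copy2(state_path, dest)
    # Timestamped names sort chronologically, so the oldest backups come first.
    backups = sorted(f for f in os.listdir(backup_dir)
                     if f.startswith(base + ".") and f.endswith(".bak"))
    for old in backups[:-keep]:
        os.remove(os.path.join(backup_dir, old))
    return dest
```

On Windows, renaming a log that the bot still holds open can fail, so a real watchdog may rotate only around a restart or copy and truncate the file instead.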
# on a fresh VM, as root:
APP_DIR=/opt/app APP_USER=botuser bash linux/install.sh
# from your machine:
bash linux/sync.sh <SERVER_IP>
# on the server -- one templated unit handles every bot:
systemctl enable --now bot@run_my_bot # runs runtime/run_my_bot.py
journalctl -u bot@run_my_bot -f
systemctl status 'bot@*'
install.sh is idempotent and sets up: a dedicated unprivileged user, a venv,
the templated bot@.service, a UFW firewall (SSH only), and journald capped at
7 days / 500 MB so logs never fill the disk. sync.sh rsyncs your code
without copying your local .env (secrets stay on the server) and restarts
the running units.
One file, any number of bots — the instance name is the script:
systemctl start bot@run_goat # -> /opt/app/venv/bin/python runtime/run_goat.py
systemctl start bot@run_ftmo # -> runtime/run_ftmo.py
It already includes Restart=always, NoNewPrivileges=true, and MemoryMax /
CPUQuota resource guards.
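The instance-name mapping can be spelled out in Python. This sketch only mirrors the mapping shown in the comments above (`bot@run_goat` to `/opt/app/venv/bin/python runtime/run_goat.py`); the real unit presumably does this with systemd's `%i` instance specifier, and the helper names and the name validation here are hypothetical.

```python
from typing import List


def instance_of(unit: str) -> str:
    """Extract the instance name from a templated unit name such as 'bot@run_goat.service'."""
    name = unit[:-len(".service")] if unit.endswith(".service") else unit
    prefix, sep, instance = name.partition("@")
    if prefix != "bot" or not sep or not instance:
        raise ValueError(f"not an instance of the bot@ template: {unit!r}")
    return instance


def exec_command(instance: str, app_dir: str = "/opt/app") -> List[str]:
    """Build the command the template would run for one instance (the instance is the script name)."""
    if "/" in instance or instance.startswith("."):
        # Assumption: scripts live directly under runtime/, so reject path tricks.
        raise ValueError(f"unexpected instance name: {instance!r}")
    return [f"{app_dir}/venv/bin/python", f"runtime/{instance}.py"]
```

For example, `exec_command(instance_of("bot@run_ftmo"))` yields the interpreter path followed by `runtime/run_ftmo.py`, which is why adding a bot needs no new unit file.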
Put these in <bot_root>/.env and you get a ping on every restart / stall /
failure:
TELEGRAM_TOKEN=123456:ABC...
TELEGRAM_CHAT_ID=987654321
Leave them out and alerts are simply skipped.
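As an illustration of the "skip when unconfigured" behavior, here is a minimal Python sketch that reads the two keys from a .env file and posts to the Telegram Bot API `sendMessage` method. It is not the kit's own alert code; `load_env`, `build_alert`, and `send_alert` are hypothetical names, and the .env parsing is deliberately simplistic.

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def load_env(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; a missing file yields an empty dict."""
    env: Dict[str, str] = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip().strip('"').strip("'")
    except FileNotFoundError:
        pass
    return env


def build_alert(env: Dict[str, str], text: str) -> Optional[urllib.request.Request]:
    """Return a ready-to-send request, or None when either credential is absent."""
    token = env.get("TELEGRAM_TOKEN")
    chat_id = env.get("TELEGRAM_CHAT_ID")
    if not token or not chat_id:
        return None
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=data, method="POST")


def send_alert(env: Dict[str, str], text: str, timeout: float = 10.0) -> bool:
    """Send the alert; never raise, because alerting must not take the watchdog down."""
    req = build_alert(env, text)
    if req is None:
        return False  # not configured: skip without error
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False
```

Swallowing network errors is a design choice for this sketch: a Telegram outage should not prevent a restart from happening.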
trading-bot-ops/
├── bots.example.json            # the watchdog config (copy -> bots.json)
├── windows/
│   ├── watchdog.ps1             # supervises N processes from the config
│   └── install_watchdog.ps1     # register as a logon Scheduled Task (no admin)
└── linux/
    ├── install.sh               # one-shot VPS bootstrap (idempotent)
    ├── sync.sh                  # rsync deploy, secret-safe, restarts units
    └── systemd/bot@.service     # templated unit: one file, many bots
This keeps a process running; it does not make a losing strategy profitable. Use a separate risk guard in your bot. MIT licensed — adapt freely.