This document describes the high-level architecture, design decisions, and data flows of the py_rt Algorithm Trading System.
The py_rt system implements a hybrid Python-Rust architecture that separates offline research/backtesting tasks (Python) from online low-latency trading execution (Rust). This design maximizes Python's productivity for research while leveraging Rust's performance for production trading.
- Architecture Overview
- Python-Rust Separation
- Component Descriptions
- Data Flow
- Communication Patterns
- Design Decisions
- Scalability
- Limitations
- Future Improvements
For comprehensive architecture documentation, see:
- /docs/architecture/python-rust-separation.md - Complete system design with Python offline and Rust online components
- /docs/architecture/component-diagram.md - C4 model diagrams showing component interactions
- /docs/architecture/integration-layer.md - PyO3 bindings, ZeroMQ messaging, and shared memory IPC
The system is designed with a clear separation of concerns:
- Backtesting: Historical data replay, strategy validation
- Optimization: Parameter tuning with Optuna, genetic algorithms
- Machine Learning: Model training with XGBoost, PyTorch
- Analysis: Statistical analysis, visualization, performance attribution
- Research: Jupyter notebooks, hypothesis testing
- Market Data: Low-latency WebSocket ingestion, order book management
- Execution: Order routing, retry logic, slippage protection
- Risk Management: Pre-trade checks, position tracking, P&L monitoring
- Signal Processing: Real-time indicators, ML inference (ONNX)
- PyO3: Python-Rust bindings for performance-critical functions
- ZeroMQ: Pub/sub messaging for event-driven communication
- Shared Memory: Lock-free ring buffers for ultra-low-latency data transfer
- Protocol Buffers: Binary serialization for efficient data exchange
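The shared-memory leg of the integration layer is built on single-producer/single-consumer ring buffers. As a rough illustration of the head/tail index discipline such a buffer uses (a pure-Python sketch, not the Rust implementation; the class and method names are invented):

```python
# Toy single-producer/single-consumer ring buffer illustrating the
# head/tail index discipline; the real system uses lock-free Rust
# ring buffers over shared memory. All names here are invented.

class RingBuffer:
    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0  # next write slot (producer-owned, only increases)
        self.tail = 0  # next read slot (consumer-owned, only increases)

    def try_push(self, item) -> bool:
        if self.head - self.tail == self.capacity:
            return False  # full: producer must drop or back off
        self.buf[self.head % self.capacity] = item
        self.head += 1  # "publish" the slot after writing it
        return True

    def try_pop(self):
        if self.tail == self.head:
            return None  # empty
        item = self.buf[self.tail % self.capacity]
        self.tail += 1
        return item

rb = RingBuffer(capacity=4)
rb.try_push({"symbol": "AAPL", "price": 189.25})
tick = rb.try_pop()
```

In a lock-free implementation the head and tail indices are atomics and the publish step is a release store, which is what makes the transfer safe across processes without locks.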
┌─────────────────────────────────────────────────────────────────────┐
│ PYTHON OFFLINE │
│ Research → Backtesting → Optimization → ML Training → ONNX Export │
└─────────────────────────────┬───────────────────────────────────────┘
│
┌─────────▼─────────┐
│ PyO3 / ZeroMQ │
│ Protocol Buffers │
└─────────┬─────────┘
│
┌─────────────────────────────▼───────────────────────────────────────┐
│ RUST ONLINE │
│ Market Data → Signal Processing → Risk Checks → Order Execution │
└─────────────────────────────────────────────────────────────────────┘
The original system follows a microservices architecture with four independent components communicating via ZeroMQ (ZMQ) messaging. Each component is a separate Rust binary that can be deployed and scaled independently.
graph TD
A[Alpaca Markets API] -->|WebSocket| B[Market Data Service]
B -->|ZMQ PUB| C[Signal Bridge Python ML]
B -->|ZMQ PUB| D[Risk Manager]
C -->|ZMQ PUB| D
D -->|ZMQ PUB| E[Execution Engine]
E -->|REST API| A
B -.->|Prometheus| F[Monitoring]
C -.->|Prometheus| F
D -.->|Prometheus| F
E -.->|Prometheus| F
┌─────────────────────────────────────────────────────────────┐
│ Alpaca Markets Cloud │
│ (WebSocket + REST API v2) │
└───────────────┬─────────────────────────────┬───────────────┘
│ │
│ WebSocket │ REST API
│ (Market Data) │ (Orders)
▼ ▲
┌─────────────────────┐ ┌─────────────────────┐
│ Market Data │ │ Execution Engine │
│ Service │ │ - Order Router │
│ - WebSocket Client │ │ - Retry Logic │
│ - Order Book │ │ - Slippage Check │
│ - Aggregation │ │ - Rate Limiting │
│ - ZMQ Publisher │ │ │
└──────────┬──────────┘ └──────────▲──────────┘
│ │
│ ZMQ PUB (port 5555) │ ZMQ SUB
│ Topics: market.* │ Topic: order.*
│ │
▼ │
┌─────────────────────┐ ┌─────────────────────┐
│ Signal Bridge │ │ Risk Manager │
│ - PyO3 Bindings │────────▶│ - Position Track │
│ - Python ML │ ZMQ PUB │ - Risk Limits │
│ - Feature Eng. │ (5556) │ - Pre-trade Check │
│ │ │ - P&L Tracking │
└─────────────────────┘ └─────────────────────┘
│
│ ZMQ PUB (5557)
│ Topic: risk.*
└─────────────────┘
Responsibility: Real-time market data ingestion and distribution
Key Features:
- WebSocket connection to Alpaca Markets data stream (v2)
- Maintains live order books for subscribed symbols
- Aggregates trades into OHLCV bars (multiple timeframes)
- Publishes market data via ZMQ PUB socket
Implementation Details:
- websocket.rs: Tokio-tungstenite WebSocket client with automatic reconnection
- orderbook.rs: Binary heap-based order book with O(log n) updates
- aggregation.rs: Time-based windowing for bar generation
- publisher.rs: ZMQ PUB socket wrapper with topic-based routing
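The time-based windowing used for bar generation can be sketched in Python: each trade is floored to its bar's start time and folded into an OHLCV record (the function name and trade tuple layout are illustrative, not aggregation.rs's actual API):

```python
# Sketch of time-based windowing: bucket (timestamp, price, size) trades
# into OHLCV bars keyed by window start time. Names are illustrative.
from dataclasses import dataclass

@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float

def aggregate_bars(trades, window_secs: int = 60) -> dict:
    bars = {}
    for ts, price, size in trades:          # trades assumed time-ordered
        start = ts - (ts % window_secs)     # floor to the bar boundary
        bar = bars.get(start)
        if bar is None:
            bars[start] = Bar(price, price, price, price, size)
        else:
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price               # last trade in window wins
            bar.volume += size
    return bars

bars = aggregate_bars([(0, 100.0, 10), (30, 101.0, 5), (61, 99.5, 7)])
# bars[0] covers [0, 60): open=100.0, close=101.0, volume=15
```

Running the same fold per timeframe (1s, 1m, 5m, ...) yields the multiple-timeframe bars the service publishes.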
Performance:
- Processes 50,000+ messages/second
- Order book updates in <50μs (p99)
- Memory: ~50MB for 100 symbols
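The binary-heap order book gives O(1) best-bid/ask peeks and O(log n) inserts. A toy Python sketch of the structure (a production book like orderbook.rs also needs cancels and per-level sizes, usually via lazy deletion; this shows only the heaps):

```python
# Toy heap-backed top-of-book: O(log n) inserts, O(1) best-price peeks.
import heapq

class TopOfBook:
    def __init__(self):
        self._bids = []  # max-heap via negated prices
        self._asks = []  # min-heap

    def add_bid(self, price: float):
        heapq.heappush(self._bids, -price)

    def add_ask(self, price: float):
        heapq.heappush(self._asks, price)

    def best_bid(self):
        return -self._bids[0] if self._bids else None

    def best_ask(self):
        return self._asks[0] if self._asks else None

book = TopOfBook()
book.add_bid(99.9)
book.add_bid(100.1)
book.add_ask(100.3)
book.add_ask(100.2)
# book.best_bid() == 100.1, book.best_ask() == 100.2
```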
Configuration:
{
"alpaca_api_key": "string",
"alpaca_secret_key": "string",
"zmq_pub_address": "tcp://*:5555",
"symbols": ["AAPL", "MSFT"],
"reconnect_delay_secs": 5
}

Responsibility: Python integration for ML-based signal generation
Key Features:
- PyO3 bindings expose Rust types to Python
- Subscribes to market data via ZMQ SUB
- Processes signals through Python ML models
- Publishes trading signals via ZMQ PUB
Implementation Details:
- lib.rs: PyO3 module with the #[pymodule] attribute
- Exposes Signal, Bar, Trade types to Python
- Async runtime integration with Python's asyncio
- Technical indicators via the ta crate
Use Cases:
- Feature Engineering: Calculate technical indicators in Python
- ML Models: scikit-learn, TensorFlow, PyTorch integration
- Custom Strategies: Implement complex logic in Python
Example Python Integration:
import signal_bridge
# Receive market data
bar = signal_bridge.receive_bar()
# Generate signal
if bar.close > bar.open * 1.01:
signal = signal_bridge.Signal(
symbol="AAPL",
action="Buy",
confidence=0.85
)
    signal_bridge.publish_signal(signal)

Performance:
- Python GIL can limit throughput to ~1,000 signals/second
- Consider multi-process architecture for higher throughput
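The multi-process workaround can be sketched with the standard library: each worker process holds its own interpreter and GIL, so inference throughput scales with cores (the infer function is a placeholder for a real model call, e.g. an ONNX session):

```python
# Sketch: sidestep the GIL by running inference in worker processes.
# infer() is a placeholder for a real model call.
from multiprocessing import Pool

def infer(bar):
    close, open_ = bar
    return "Buy" if close > open_ * 1.01 else "Hold"

if __name__ == "__main__":
    bars = [(102.0, 100.0), (100.1, 100.0)]
    with Pool(processes=2) as pool:
        # each worker process has its own interpreter and GIL
        signals = pool.map(infer, bars)
    # signals == ["Buy", "Hold"]
```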
Responsibility: Pre-trade risk checks and position tracking
Key Features:
- Validates orders against configurable risk limits
- Tracks open positions and realized/unrealized P&L
- Enforces maximum position size, order size, daily loss limits
- Publishes approved/rejected orders
Implementation Details:
- lib.rs: Core risk engine with position tracking
- In-memory position store (HashMap)
- Atomic risk checks with async/await
Risk Checks:
- Position Size Limit: Maximum position value per symbol
- Order Size Limit: Maximum single order value
- Daily Loss Limit: Maximum unrealized loss in trading day
- Account Balance: Sufficient buying power
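The four checks compose into a single pre-trade gate; a Python sketch of the same logic, with limit values mirroring the sample configuration (function and parameter names are illustrative):

```python
# Sketch of the pre-trade gate; limits mirror the sample configuration.
def pre_trade_check(order_value: float, position_value: float,
                    daily_pnl: float, buying_power: float,
                    max_order: float = 1_000.0,
                    max_position: float = 10_000.0,
                    max_daily_loss: float = 5_000.0):
    """Run the four checks in order; return (approved, reason)."""
    if position_value + order_value > max_position:
        return False, "position size limit"
    if order_value > max_order:
        return False, "order size limit"
    if daily_pnl < -max_daily_loss:
        return False, "daily loss limit"
    if order_value > buying_power:
        return False, "insufficient buying power"
    return True, "approved"

ok, reason = pre_trade_check(order_value=500.0, position_value=9_000.0,
                             daily_pnl=-200.0, buying_power=20_000.0)
# ok == True: a 9,500 position, 500 order, and -200 P&L are all within limits
```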
Configuration:
{
"max_position_size": 10000.0,
"max_order_size": 1000.0,
"max_daily_loss": 5000.0,
"zmq_sub_address": "tcp://localhost:5555",
"zmq_pub_address": "tcp://*:5557"
}

Example Risk Check Flow:
Signal → Risk Manager → Check Limits → Approve/Reject → Publish
│
├─ Position size OK?
├─ Order size OK?
├─ Daily loss OK?
└─ Account balance OK?
Responsibility: Order execution through Alpaca Markets API
Key Features:
- Smart order routing with retry logic
- Slippage protection (rejects orders exceeding threshold)
- Rate limiting (200 requests/minute via the governor crate)
- Order status tracking and updates
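The 200 requests/minute budget is token-bucket style rate limiting (the governor crate uses a related, GCRA-based algorithm); a simplified single-threaded Python sketch:

```python
# Simplified token bucket for a requests-per-minute budget; the clock is
# injectable so the refill logic can be exercised deterministically.
import time

class TokenBucket:
    def __init__(self, rate_per_min: int = 200, clock=time.monotonic):
        self.capacity = float(rate_per_min)
        self.rate = rate_per_min / 60.0      # tokens refilled per second
        self.tokens = float(rate_per_min)    # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should queue the order or back off

bucket = TokenBucket(rate_per_min=200)
allowed = bucket.try_acquire()  # True until the budget is exhausted
```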
Implementation Details:
- router.rs: Order routing logic with async HTTP client (reqwest)
- retry.rs: Exponential backoff retry strategy (max 3 retries)
- slippage.rs: Compares limit order price to current market price
Retry Strategy:
// Exponential backoff: 1s, 2s, 4s
async fn retry_order(order: Order) -> Result<OrderResponse> {
let mut delay = Duration::from_secs(1);
for attempt in 0..3 {
match execute_order(&order).await {
Ok(response) => return Ok(response),
Err(e) if is_retryable(&e) => {
tokio::time::sleep(delay).await;
delay *= 2;
}
Err(e) => return Err(e),
}
}
Err(Error::MaxRetriesExceeded)
}

Slippage Protection:
// Returns true when slippage is acceptable; limit orders deviating more
// than max_slippage_bps (default 50) from the market price are rejected.
fn check_slippage(order: &Order, market_price: Price, max_slippage_bps: f64) -> bool {
    let slippage_bps = ((order.price - market_price) / market_price * 10000.0).abs();
    slippage_bps <= max_slippage_bps
}

Configuration:
{
"alpaca_api_key": "string",
"alpaca_secret_key": "string",
"max_retries": 3,
"max_slippage_bps": 50,
"rate_limit_per_minute": 200
}

sequenceDiagram
participant Alpaca as Alpaca Markets
participant MD as Market Data
participant SB as Signal Bridge
participant RM as Risk Manager
participant EE as Execution Engine
Alpaca->>MD: WebSocket: Trade Update
MD->>MD: Update Order Book
MD->>SB: ZMQ PUB: market.trade
MD->>RM: ZMQ PUB: market.trade
SB->>SB: ML Model Inference
SB->>RM: ZMQ PUB: signal.generated
RM->>RM: Risk Check
alt Risk Check Passed
RM->>EE: ZMQ PUB: risk.approved
EE->>Alpaca: REST: POST /v2/orders
Alpaca->>EE: Response: Order Accepted
EE->>RM: ZMQ PUB: order.filled
else Risk Check Failed
RM->>SB: ZMQ PUB: risk.rejected
end
Time: t0
Alpaca Markets ─────────────▶ Market Data Service
(WebSocket: Trade Update)
Time: t0 + 50μs
Market Data Service ─────────▶ Signal Bridge
Market Data Service ─────────▶ Risk Manager
(ZMQ PUB: market.trade)
Time: t0 + 5ms
Signal Bridge ───────────────▶ Risk Manager
(ZMQ PUB: signal.generated)
Time: t0 + 6ms
Risk Manager ────────────────▶ Execution Engine
(ZMQ PUB: risk.approved)
Time: t0 + 10ms
Execution Engine ────────────▶ Alpaca Markets
(REST: POST /v2/orders)
Total Latency: ~10ms (signal to execution)
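The stage deltas above should sum to the quoted end-to-end budget; a quick arithmetic check:

```python
# Per-stage deltas from the timeline above (seconds)
stages = {
    "order_book_update": 50e-6,          # t0        -> t0 + 50us
    "ml_inference":      5e-3 - 50e-6,   # t0 + 50us -> t0 + 5ms
    "risk_check":        1e-3,           # t0 + 5ms  -> t0 + 6ms
    "order_submission":  4e-3,           # t0 + 6ms  -> t0 + 10ms
}
total_ms = sum(stages.values()) * 1_000
# total_ms ≈ 10.0, matching the ~10ms signal-to-execution figure
```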
Advantages:
- Decoupling: Publishers don't know about subscribers
- Scalability: Add subscribers without modifying publishers
- Performance: Zero-copy messaging, <10μs latency
- Reliability: Automatic reconnection and buffering
Topic-Based Routing:
// Publisher (Market Data)
pub async fn publish_trade(&self, trade: Trade) -> Result<()> {
    let topic = "market.trade";
    let message = Message::TradeUpdate(trade);
    let data = serde_json::to_vec(&message)?;
    self.socket.send_multipart([topic.as_bytes(), &data], 0)?;
    Ok(())
}
// Subscriber (Risk Manager)
pub async fn subscribe(&mut self) -> Result<()> {
    self.socket.set_subscribe(b"market.")?;
    self.socket.set_subscribe(b"signal.")?;
    loop {
        let msg = self.socket.recv_multipart(0)?;
        let topic = String::from_utf8(msg[0].clone())?;
        let data = &msg[1];
        self.handle_message(&topic, data).await?;
    }
}

Message Topics:
- market.trade - Trade updates
- market.quote - Quote updates
- market.bar - OHLCV bar updates
- signal.generated - Trading signals
- risk.approved - Risk-approved orders
- risk.rejected - Risk-rejected orders
- order.submitted - Orders sent to exchange
- order.filled - Order fill updates
- system.heartbeat - Component health checks
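On the wire, a SUB socket filters by byte-prefix match on the topic frame; the subscriber-side dispatch reduces to the following (a pure-Python sketch of the matching rule, no ZeroMQ required):

```python
# ZMQ SUB sockets filter by byte-prefix on the topic frame; this sketch
# reproduces that matching rule in plain Python.
SUBSCRIPTIONS = (b"market.", b"signal.")   # as passed to set_subscribe

def matches(topic: bytes) -> bool:
    """A message is delivered if any subscription is a prefix of its topic."""
    return any(topic.startswith(p) for p in SUBSCRIPTIONS)

def dispatch(topic: bytes, payload: bytes, handlers: dict):
    """Route a [topic, payload] multipart message to the first matching handler."""
    for prefix, handler in handlers.items():
        if topic.startswith(prefix):
            return handler(payload)
    return None  # would have been filtered out by the SUB socket anyway

assert matches(b"market.trade")
assert not matches(b"order.filled")  # Risk Manager does not subscribe to order.*
```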
For synchronous operations (e.g., account balance queries):
// Client (Risk Manager)
let request = Message::BalanceRequest;
socket.send(serde_json::to_vec(&request)?)?;
let response: BalanceResponse = socket.recv_json()?;
// Server (Execution Engine)
let request: Message = socket.recv_json()?;
let balance = alpaca.get_account_balance().await?;
let response = Message::BalanceResponse(balance);
socket.send(serde_json::to_vec(&response)?)?;

Performance:
- Zero-cost abstractions enable C++ performance without manual memory management
- No garbage collection pauses (critical for low-latency trading)
- LLVM-optimized machine code
Safety:
- Ownership system prevents data races at compile time
- No null pointer dereferences or use-after-free bugs
- Strong type system catches errors before production
Concurrency:
- Tokio async runtime provides efficient task scheduling
- Zero-cost async/await syntax
- Fearless concurrency via Send/Sync traits
Trade-offs:
- Steeper learning curve compared to Python/Go
- Longer compile times (mitigated with incremental compilation)
- Smaller ecosystem compared to Java/C++
Performance:
- Zero-copy messaging via shared memory
- <10μs message latency
- Millions of messages/second throughput
Flexibility:
- Multiple transport protocols (TCP, IPC, inproc)
- Various messaging patterns (PUB/SUB, REQ/REP, PUSH/PULL)
- Language-agnostic (Rust, Python, C++, Go, etc.)
Reliability:
- Automatic reconnection
- Configurable buffering and backpressure
- Fair queuing for subscribers
Alternatives Considered:
- gRPC: Higher latency (~100μs), requires code generation
- NATS: Requires external broker, added operational complexity
- Kafka: Too heavy for low-latency use case (ms latency)
- Redis PUB/SUB: Requires external server, no persistence
Benefits:
- Shared Dependencies: Single Cargo.lock for reproducible builds
- Common Types: common crate shared across all components
- Consistent Versioning: Workspace-level version management
- Efficient Builds: Cargo caches shared dependencies
Structure:
rust/
├── Cargo.toml # Workspace root
├── common/ # Shared types and utilities
├── market-data/ # Independent binary
├── signal-bridge/ # Independent binary
├── risk-manager/ # Independent binary
└── execution-engine/ # Independent binary
Performance:
- Near-native speed (10-100x faster than pure Python)
- Zero-copy data transfer between Rust and Python
- GIL management for concurrent access
Flexibility:
- Gradual migration path (start with Python, optimize critical paths in Rust)
- Leverage Python's ML ecosystem (scikit-learn, TensorFlow, PyTorch)
- Maintain type safety with pyo3 type annotations
Alternative Considered:
- Native Python: Too slow for high-frequency signal generation
- Cython: Requires compilation, less type safety
- RESTful API: Higher latency, serialization overhead
Current Design:
- Positions stored in-memory (HashMap)
- No persistence across restarts
Rationale:
- Simplicity: No database setup/maintenance
- Performance: In-memory access is 100-1000x faster
- MVP Focus: Sufficient for initial development
Future Improvement:
- Add PostgreSQL for position persistence
- Use TimescaleDB for time-series metrics
- Redis for shared state across instances
Each component can scale independently:
┌─────────────────────┐
│ Market Data 1 │────┐
└─────────────────────┘ │
┌─────────────────────┐ │ ┌─────────────────────┐
│ Market Data 2 │────┼───▶│ Load Balancer │
└─────────────────────┘ │ │ (HAProxy/NGINX) │
┌─────────────────────┐ │ └─────────────────────┘
│ Market Data N │────┘ │
└─────────────────────┘ ▼
ZMQ PUB socket
Scaling Strategies:
- Market Data: Shard by symbol (e.g., instance 1 handles AAPL-MSFT, instance 2 handles GOOGL-AMZN)
- Signal Bridge: Run multiple Python processes (avoid GIL contention)
- Risk Manager: Shard by symbol or use distributed locking (Redis)
- Execution Engine: Round-robin order distribution
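Sharding by symbol only needs a stable symbol-to-instance mapping; a sketch (crc32 is used because Python's built-in hash() is salted per process; the symbols and instance count are examples):

```python
# Stable symbol -> instance mapping for sharding; crc32 gives the same
# answer across restarts and machines, unlike Python's salted hash().
import zlib

def shard_for(symbol: str, num_instances: int) -> int:
    return zlib.crc32(symbol.encode()) % num_instances

symbols = ["AAPL", "MSFT", "GOOGL", "AMZN"]
assignment = {s: shard_for(s, 2) for s in symbols}
# every symbol always lands on the same instance
```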
CPU:
- Multi-threaded Tokio runtime scales to available cores
- CPU affinity can reduce context switching
Memory:
- Order books grow linearly with number of symbols (~500KB per symbol)
- Estimated memory: 100 symbols × 500KB = 50MB
Network:
- ZMQ supports 10Gbps+ throughput with proper tuning
- TCP_NODELAY disables Nagle's algorithm for lower latency
Rust Optimizations:
[profile.release]
opt-level = 3 # Maximum optimization
lto = true # Link-time optimization
codegen-units = 1 # Single codegen unit (slower compile, faster runtime)
strip = true          # Remove debug symbols

Tokio Tuning:
// Increase worker threads
tokio::runtime::Builder::new_multi_thread()
.worker_threads(8)
.enable_all()
    .build()?;

ZMQ Tuning:
// Increase socket buffers
socket.set_sndhwm(10000)?; // Send high-water mark
socket.set_rcvhwm(10000)?; // Receive high-water mark
socket.set_tcp_keepalive(1)?;

- Single Exchange Support
  - Only Alpaca Markets integration
  - Cannot aggregate liquidity across exchanges
- No Position Persistence
  - Positions lost on restart
  - Risk exposure during recovery
- Simple Risk Model
  - Position-based limits only
  - No portfolio-level VaR or Greeks
- No Backtesting
  - Live trading only
  - Cannot validate strategies on historical data
- Limited Observability
  - Basic Prometheus metrics
  - No distributed tracing (Jaeger/Zipkin)
- No High Availability
  - Single point of failure for each component
  - No leader election or failover
- WebSocket Reconnection: 5-second delay on disconnect (Alpaca limitation)
- Python GIL: Signal generation limited to ~1,000/second
- Rate Limiting: Alpaca API limits to 200 requests/minute
- Position Persistence
  - PostgreSQL integration
  - Automatic recovery on restart
- Enhanced Risk Management
  - Portfolio-level risk (VaR, Expected Shortfall)
  - Greeks calculation for options
  - Correlation-based position limits
- Multi-Exchange Support
  - Binance integration (crypto)
  - Interactive Brokers (stocks, futures, options)
- Backtesting Framework
  - Historical data replay
  - Strategy performance metrics
  - Parameter optimization
- Web Dashboard
  - React-based monitoring UI
  - Real-time position tracking
  - P&L visualization
- Distributed Tracing
  - Jaeger integration
  - Request ID propagation
  - Performance profiling
- High Availability
  - Raft consensus for leader election
  - Automatic failover
  - State replication
- Machine Learning Pipeline
  - Feature store (Feast)
  - Model serving (Seldon Core)
  - A/B testing framework
- Options Trading
  - Greeks calculation
  - Volatility surface modeling
  - Multi-leg strategies
Last Updated: 2024-10-14 | Version: 0.1.0