Feat/enhanced risk resilience telemetry v2 #205

Open
VirilePeak wants to merge 24 commits into Polymarket:main from VirilePeak:feat/enhanced-risk-resilience-telemetry_v2

@VirilePeak
No description provided.

AlphaClaw added 15 commits February 17, 2026 18:22
- RiskConfig: ENV-based configuration with validation
- PortfolioState: Equity, exposure, PnL tracking
- Position: Simplified position representation
- RiskManager: Core risk checks (sizing, limits, stops)
- RiskBlockReason: Enum for telemetry
- Comprehensive unit tests for all components

Risk Rules Implemented:
- max_risk_pct_per_trade (default 2%)
- max_total_exposure_pct (default 15%)
- daily_loss_limit_pct (default 5%)
- max_concurrent_positions (default 5)
- max_slippage_bps for stop loss (default 100)
- max_spread_bps for entry filter (default 200)

Feature flags:
- RISK_ENABLED=1/0 to toggle all checks

Tests: 20+ test cases covering sizing, blocking, exits, telemetry
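
The limits listed above could be loaded and applied roughly as follows. This is a minimal sketch, not the PR's actual code: the environment variable names other than RISK_ENABLED, the choice to store percentages as fractions, the default of RISK_ENABLED=1, and the `size_position` helper are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskConfig:
    """Risk limits as fractions of equity (assumed representation)."""
    enabled: bool = True
    max_risk_pct_per_trade: float = 0.02   # 2% default from the commit message
    max_total_exposure_pct: float = 0.15   # 15%
    daily_loss_limit_pct: float = 0.05     # 5%
    max_concurrent_positions: int = 5

    def __post_init__(self):
        # Validate on construction so a bad ENV value fails fast.
        for name in ("max_risk_pct_per_trade", "max_total_exposure_pct",
                     "daily_loss_limit_pct"):
            value = getattr(self, name)
            if not 0.0 < value <= 1.0:
                raise ValueError(f"{name} must be in (0, 1], got {value}")
        if self.max_concurrent_positions < 1:
            raise ValueError("max_concurrent_positions must be >= 1")

    @classmethod
    def from_env(cls, env=None):
        # Variable names below are hypothetical, except RISK_ENABLED.
        env = os.environ if env is None else env
        return cls(
            enabled=env.get("RISK_ENABLED", "1") == "1",
            max_risk_pct_per_trade=float(env.get("MAX_RISK_PCT_PER_TRADE", 0.02)),
            max_total_exposure_pct=float(env.get("MAX_TOTAL_EXPOSURE_PCT", 0.15)),
            daily_loss_limit_pct=float(env.get("DAILY_LOSS_LIMIT_PCT", 0.05)),
            max_concurrent_positions=int(env.get("MAX_CONCURRENT_POSITIONS", 5)),
        )


def size_position(cfg, equity, current_exposure, requested):
    """Cap a requested notional by the per-trade and total-exposure limits."""
    if equity <= 0:
        return 0.0  # no equity: block rather than divide or size against zero
    per_trade_cap = cfg.max_risk_pct_per_trade * equity
    exposure_room = max(0.0, cfg.max_total_exposure_pct * equity - current_exposure)
    return max(0.0, min(requested, per_trade_cap, exposure_room))
```

With the defaults, a 500 request against 1000 of equity is capped at 20 by the per-trade limit, and shrinks further once existing exposure approaches the 15% ceiling.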
… Step 2)

- Enhanced Trader class with risk integration
- Portfolio state tracking (equity, exposure, positions)
- Risk-based position sizing before execution
- Pre-trade risk checks (spread, exposure, daily loss)
- Position maintenance with exit signals
- Daily stats reset for tracking

Key changes:
- _get_portfolio_state(): Builds portfolio snapshot
- _check_new_trading_day(): Resets daily limits
- Risk checks before every trade execution
- Exit signal detection in maintain_positions()

Note: Actual execution commented out (TOS compliance)
Requires: get_open_positions() in Polymarket class

Circuit Breaker:
- 3 states: CLOSED, OPEN, HALF_OPEN
- Configurable failure thresholds per service
- Automatic recovery with half-open testing
- Metrics tracking (state changes, blocked calls)
- Per-service configs: Polymarket (fast), Gamma (medium), OpenAI (slow)
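
The three-state behavior described above can be illustrated with a small state machine. This is a sketch under assumptions: the class and parameter names (`failure_threshold`, `recovery_timeout_s`, `clock`) are hypothetical, and locking and metrics are omitted for brevity even though the commit describes a thread-safe implementation.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_s = recovery_timeout_s
        self._clock = clock  # injectable so tests can control time
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.recovery_timeout_s:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: allow one trial call through.
            self.state = State.HALF_OPEN
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self.state = State.CLOSED
        return result

    def _on_failure(self):
        self._failures += 1
        # A failed trial call, or too many failures while closed, opens it.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
```

A "fast" service would use a lower threshold and shorter timeout than a "slow" one; the specific per-service values are not given in the commit message.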

Retry Handler:
- Exponential backoff with jitter
- Configurable retryable exceptions
- Decorator for easy function wrapping
- Pre-configured handlers for each API
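
A decorator with exponential backoff and jitter might look like the sketch below. The name `retry`, its parameters, and the use of "full jitter" (a uniform draw between zero and the capped delay) are assumptions; the commit does not say which jitter scheme it uses.

```python
import functools
import random
import time


def retry(max_attempts=3, base_delay_s=0.5, max_delay_s=8.0,
          retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Retry the wrapped function on retryable exceptions only."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except retryable:
                    if attempt == max_attempts:
                        raise  # retries exhausted: surface the last error
                    # Delay ceiling doubles per attempt, capped at max_delay_s.
                    cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
                    sleep(random.uniform(0, cap))  # full jitter
        return wrapper
    return decorator
```

Exceptions outside `retryable` propagate on the first attempt, which matches the "no retry on non-retryable exceptions" case listed in the tests later in this PR.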

Features:
- Thread-safe implementation
- Global registry for circuit breakers
- Detailed metrics for observability
- Force reset for manual recovery

Metrics Collection:
- TradeMetrics: Per-trade tracking (status, latency, PnL)
- CycleMetrics: End-to-end cycle timing
- Counter system with labels
- Latency histograms (Prometheus-style)
- Block reason tracking

HTTP Server:
- /metrics - Prometheus text format
- /metrics/json - JSON format
- /health - Health check
- Background thread, non-blocking

Features:
- Thread-safe implementation
- Configurable history limits
- Stage timing context manager
- Global singleton for easy access
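
To make the "counter system with labels" and the Prometheus text output concrete, here is a minimal sketch. `CounterRegistry` and the metric names are hypothetical, and label-value escaping and `# TYPE` lines are left out.

```python
import threading
from collections import defaultdict


class CounterRegistry:
    """Thread-safe labeled counters rendered as Prometheus-style text."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = defaultdict(float)

    def inc(self, name, amount=1.0, **labels):
        # Sort labels so the same label set always maps to the same key.
        key = (name, tuple(sorted(labels.items())))
        with self._lock:
            self._values[key] += amount

    def render(self):
        with self._lock:
            items = sorted(self._values.items())
        lines = []
        for (name, labels), value in items:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{name}{suffix} {value:g}")
        return "\n".join(lines) + "\n"
```

A /metrics handler would return `render()` as the response body; block reasons fit naturally as a `reason` label on a single counter.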

Model Registry:
- ENV-based configuration (DEFAULT_MODEL, FALLBACK_MODEL)
- Pre-configured models: GPT-4, GPT-3.5, Claude-3
- Per-model timeouts and retry policies
- Rate limit tracking

LLM Client:
- Automatic fallback on 429/5xx errors
- Exponential backoff per model
- Provider abstraction (OpenAI, Anthropic)
- Detailed response metadata (latency, model used)

Usage:
- llm_call(messages) - simple API
- client.call() with full control
- Fallback chain: DEFAULT_MODEL -> FALLBACK_MODEL

Environment:
- DEFAULT_MODEL=gpt-4
- FALLBACK_MODEL=gpt-3.5-turbo
- OPENAI_API_KEY / ANTHROPIC_API_KEY
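
The fallback chain can be sketched independently of any provider SDK. Everything here is illustrative: `ProviderError`, `call_with_fallback`, and the injected `send` callable are hypothetical stand-ins, not the PR's `llm_call` or `client.call()`.

```python
class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status code."""

    def __init__(self, status_code, message=""):
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def is_fallback_worthy(status_code):
    # Rate limits (429) and server errors (5xx) trigger the next model.
    return status_code == 429 or 500 <= status_code < 600


def call_with_fallback(messages, models, send):
    """Try each model in order; `send(model, messages)` does the real call."""
    last_error = None
    for model in models:
        try:
            return {"model": model, "content": send(model, messages)}
        except ProviderError as exc:
            if not is_fallback_worthy(exc.status_code):
                raise  # e.g. 401/400: another model will not help
            last_error = exc
    if last_error is None:
        raise ValueError("no models configured")
    raise last_error
```

In practice `models` would be built from DEFAULT_MODEL and FALLBACK_MODEL, and the returned dict corresponds loosely to the "model used" metadata mentioned above.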

EnhancedExecutor:
- Retry/Backoff for all external calls (Polymarket, Gamma, OpenAI)
- Circuit breaker protection per service
- Model fallback via llm_call()
- Telemetry collection at critical points
- Risk gate before order submission

IntegratedTrader:
- Full A-D integration in one_best_trade()
- Stage-by-stage latency tracking
- Risk checks with telemetry
- Metrics server auto-start
- Backwards compatible interface

Features:
- All feature-flagged via ENV
- Detailed block reason logging
- Cycle metrics recording
- Position maintenance with exit signals

Circuit Breaker Tests:
- closed -> open transition on failures
- open rejects calls immediately
- open -> half_open after timeout
- half_open -> closed on success
- half_open -> open on failure

Retry Handler Tests:
- success without retry
- retry then success
- exhaust retries
- no retry on non-retryable exceptions

Risk Manager Tests:
- position sizing caps
- max exposure block
- daily loss limit block
- spread too wide block

Model Registry Tests:
- ENV loading
- model config retrieval

Integration Smoke Tests:
- metrics counter increments
- circuit breaker metrics
- Quick start guide
- All ENV variables with defaults
- How to run (3 modes)
- How to verify (tests, metrics, circuit breaker)
- Architecture diagram
- File changes summary
- Monitoring guide
- Troubleshooting section
- Fixes NameError in test_metrics_collector_increment
- All 18 tests now passing

.gitignore:
- Python artifacts (__pycache__, *.pyc)
- Environment files (.env)
- Credentials (gdrive_credentials.json, oauth_credentials.json, etc.)
- Logs and local databases
- IDE files

CI Workflow:
- Run on push/PR to main and feat/* branches
- Python 3.12 setup
- Install dependencies
- Run pytest on test_integration.py
- Secret scanning check

PolymarketWSClient:
- Connects to wss://ws-subscriptions-clob.polymarket.com/ws/market
- Normalizes events: Quote, Trade, Orderbook
- In-memory state: latest_quote per market
- Auto-reconnect with exponential backoff
- Thread-safe implementation

Health Endpoint:
- /market-data/health - Overall health status
- /market-data/status - Detailed status + quotes
- Tracks: connected, last_message_age_s, subscriptions

Features:
- Feature-flagged: WS_ENABLED
- Falls back to HTTP if WS disabled
- Configurable reconnect interval
- Singleton for easy access

OrderBook:
- Level-2 bids/asks with PriceLevel
- Best bid/ask, spread, mid, microprice
- Depth within X bps (1bp, 5bp)
- Imbalance calculation (-1 to 1)
- Volatility proxy from book shape
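
The computed book features can be illustrated with the common textbook definitions below. Whether the PR uses exactly these formulas (for example, top-of-book versus full-book imbalance) is not stated, so treat the function as an assumption-laden sketch; only the `PriceLevel` name comes from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceLevel:
    price: float
    size: float


def book_features(bids, asks):
    """Return mid, spread in bps, microprice, and imbalance in [-1, 1]."""
    best_bid = max(bids, key=lambda lvl: lvl.price)
    best_ask = min(asks, key=lambda lvl: lvl.price)
    mid = (best_bid.price + best_ask.price) / 2
    spread_bps = (best_ask.price - best_bid.price) / mid * 1e4
    # Microprice weights each side's price by the opposite side's size,
    # so a heavy bid pulls the estimate toward the ask.
    microprice = (
        best_bid.price * best_ask.size + best_ask.price * best_bid.size
    ) / (best_bid.size + best_ask.size)
    bid_vol = sum(lvl.size for lvl in bids)
    ask_vol = sum(lvl.size for lvl in asks)
    imbalance = (bid_vol - ask_vol) / (bid_vol + ask_vol)
    return {
        "mid": mid,
        "spread_bps": spread_bps,
        "microprice": microprice,
        "imbalance": imbalance,
    }
```

An imbalance near +1 means resting size is concentrated on the bid side, and near -1 on the ask side.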

LiquidityGate:
- max_spread_bps check
- min_depth_1bp check
- min_depth_5bp check
- max_book_age_s staleness check
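
The four checks above compose into a single gate that reports the first failing reason. In this sketch the default thresholds, the reason strings, and the check order are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LiquidityLimits:
    # All default values here are hypothetical.
    max_spread_bps: float = 200.0
    min_depth_1bp: float = 100.0
    min_depth_5bp: float = 500.0
    max_book_age_s: float = 5.0


def check_liquidity(limits, spread_bps, depth_1bp, depth_5bp,
                    book_age_s) -> Optional[str]:
    """Return a block reason, or None if the market passes every check."""
    # Staleness first: the other numbers are meaningless on an old book.
    if book_age_s > limits.max_book_age_s:
        return "book_stale"
    if spread_bps > limits.max_spread_bps:
        return "spread_too_wide"
    if depth_1bp < limits.min_depth_1bp:
        return "depth_1bp_too_thin"
    if depth_5bp < limits.min_depth_5bp:
        return "depth_5bp_too_thin"
    return None
```

Returning a reason string rather than a boolean keeps the result usable as a telemetry label.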

OrderBookManager:
- Multi-market orderbook storage
- Snapshot + delta updates
- Trade history tracking
- VWAP calculation
- Liquidity check per market

Features:
- Feature-flagged: L2_ENABLED
- Thread-safe implementation
- Singleton for easy access
…ck G)

ExecutionEngine:
- Pre-trade checks: risk gate + liquidity gate + staleness gate
- Order types: maker, taker, smart (maker -> taker fallback)
- Iceberg splitting for large orders
- Post-trade: slippage tracking, fill verification
- Retry logic with max_retries config

ExecutionConfig:
- order_type: maker/taker/smart
- max_slippage_bps, max_order_age_s
- iceberg_threshold, iceberg_parts
- Feature flags: enabled, verify_fills, retry_on_fail

ExecutionResult:
- success, filled_size, avg_price
- slippage_bps, fees, latency_ms
- retries, error tracking

Features:
- Feature-flagged: EXECUTION_ENABLED
- VWAP calculation for multi-part fills
- Slippage statistics tracking
- Singleton for easy access
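
Iceberg splitting, VWAP over multi-part fills, and slippage in basis points reduce to a few lines of arithmetic. The helpers below are a sketch: equal-sized child orders and the sign convention for slippage (positive means a worse price than expected) are assumptions.

```python
def split_iceberg(size, threshold, parts):
    """Split an order into equal child orders once it exceeds the threshold."""
    if size <= threshold or parts <= 1:
        return [size]
    return [size / parts] * parts


def vwap(fills):
    """Volume-weighted average price over (price, quantity) fills."""
    total_qty = sum(qty for _, qty in fills)
    if total_qty <= 0:
        raise ValueError("no filled quantity")
    return sum(price * qty for price, qty in fills) / total_qty


def slippage_bps(expected_price, avg_price, side):
    """Positive result = paid more (buy) or received less (sell) than expected."""
    sign = 1 if side == "buy" else -1
    return sign * (avg_price - expected_price) / expected_price * 1e4
```

For example, a buy expected at 0.50 that fills half at 0.50 and half at 0.52 has a VWAP of 0.51, which is 200 bps of adverse slippage and would exceed a `max_slippage_bps` of 100.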

Trader Integration:
- MarketDataEnhancedTrader with feature flags
- WS_ENABLED, L2_ENABLED, EXECUTION_ENABLED (default OFF)
- Automatic fallback to HTTP if WS unavailable
- Pre-trade gates: risk + liquidity + staleness
- Smart execution when EXECUTION_ENABLED=1

Tests (test_market_data.py):
- Block E: WS event parsing, health state
- Block F: Orderbook snapshot, delta, computed features
- Block G: Execution decisions (stale, spread, depth)

README Update:
- 5m-ready mode section
- ENV flags with conservative defaults
- How to run + verify
- Fallback behavior docs
- Add pytest.approx for floating point comparisons
- Fix LiquidityGate test spreads (tight vs wide)
- Fix RiskManager tests with relaxed spread limits
- Fix Execution tests with proper gate ordering
- Add missing import in test_market_data.py
- Fix health_server.py import path
VirilePeak marked this pull request as ready for review February 17, 2026 11:48
AlphaClaw added 9 commits February 17, 2026 20:14
- Prevents catching SystemExit, KeyboardInterrupt
- Cleaner error handling
- Best practice for production code

- one_best_trade() called itself on exception
- Could cause stack overflow
- Now re-raises exception instead
- Proper retry logic should be in caller

- get_execution_engine() was not thread-safe
- Could create multiple instances in race condition
- Now uses double-checked locking pattern
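
The double-checked locking pattern named above looks roughly like this in Python. The `ExecutionEngine` body is a placeholder and the module-level variable names are assumptions; only `get_execution_engine()` is named in the commit.

```python
import threading


class ExecutionEngine:
    """Placeholder for the real engine; construction is assumed to be costly."""


_engine = None
_engine_lock = threading.Lock()


def get_execution_engine():
    global _engine
    # First check without the lock keeps the common path cheap.
    if _engine is None:
        with _engine_lock:
            # Second check: another thread may have created the instance
            # while this one was waiting for the lock.
            if _engine is None:
                _engine = ExecutionEngine()
    return _engine
```

Without the inner check, two threads that both pass the outer check would each construct an engine, which is the race the commit describes.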

- _latest_quotes could grow unbounded with many subscriptions
- Now limited to 1000 entries
- Auto-cleanup of unsubscribed markets when limit reached
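
One way to bound the quote map is sketched below. The `QuoteCache` class and its oldest-first fallback eviction are assumptions; the commit only states the 1000-entry limit and the cleanup of unsubscribed markets. Locking is omitted for brevity.

```python
class QuoteCache:
    """Latest quote per market, bounded to max_entries."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self.subscribed = set()
        self._quotes = {}

    def update(self, market_id, quote):
        self._quotes[market_id] = quote
        if len(self._quotes) > self.max_entries:
            self._evict()

    def _evict(self):
        # First pass: drop quotes for markets that are no longer subscribed.
        stale = [m for m in self._quotes if m not in self.subscribed]
        for market_id in stale:
            del self._quotes[market_id]
        # Fallback (assumed): if still over the limit, drop the entries that
        # were inserted earliest; dicts preserve insertion order.
        while len(self._quotes) > self.max_entries:
            del self._quotes[next(iter(self._quotes))]

    def markets(self):
        return list(self._quotes)
```

Cleanup runs only when the limit is exceeded, so the steady-state cost of `update` stays a single dict write.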

- portfolio.equity could be 0 or negative
- Would cause ZeroDivisionError
- Now checks equity before division and blocks trade

- Add ping_interval and ping_timeout to prevent hanging
- Close event loop on thread exit to prevent resource leak

- _connected was set without lock (race with subscribe)
- _connected not reset on disconnect
- Now properly synchronized with _lock

1 participant