Aarav Jain improvements#8
Conversation
|
@aaravjj2 is attempting to deploy a commit to the Victor's projects Team on Vercel. A member of the Team first needs to authorize it. |
There was a problem hiding this comment.
Pull request overview
This PR expands Musashi’s operational + research loop: adds risk/session controls, performance tracking & resolution ingestion for signal_outcomes, introduces ML signal scoring scaffolding, and strengthens testing/docs/scripts for CI/backtesting and infra operations.
Changes:
- Add new API endpoints/routes (risk session, performance metrics, internal resolve-market) plus rate limiting and cache/WS enhancements.
- Introduce ML scoring module + schema to log outcomes for training, along with scripts to collect resolutions and run backtests.
- Expand unit/smoke/testing ladder and add extensive deployment/environment documentation.
Reviewed changes
Copilot reviewed 81 out of 85 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| vercel.json | Adds new API routes + CORS header for X-API-Key. |
| tests/unit/cache-utils.test.mjs | Adds unit tests for rate limit + cache/KV helpers. |
| tests/helpers/test-helpers.mjs | Adds shared response/fetch/KV mock helpers. |
| supabase/migrations/20260418000000_signal_outcomes.sql | Creates signal_outcomes table + indexes for ML/metrics. |
| src/types/market.ts | Extends market/arbitrage types; adds Kelly sizing types. |
| src/ml/signal-scorer-model.ts | Adds ML model loading/inference + heuristic fallback. |
| src/ml/models/README.md | Documents model-weights directory. |
| src/ml/models/.gitkeep | Keeps weights directory in repo. |
| src/ml/index.ts | Exposes ML public API exports. |
| src/ml/example-usage.ts | Demonstrates end-to-end ML scoring workflow. |
| src/ml/README.md | Full ML scorer documentation. |
| src/ml/QUICKSTART.md | Quickstart guide for ML scoring. |
| src/db/signal-outcomes.example.ts | Example usage for outcome logging + resolution + metrics. |
| src/db/README.md | Documents outcome tracking system. |
| src/db/ARCHITECTURE.md | Describes signal→DB→training/metrics architecture. |
| src/api/supabase-client.ts | Adds typed signal_outcomes + connection test. |
| src/api/polymarket-price-poller.ts | Adds REST order book depth fetcher. |
| src/analysis/sentiment-analyzer.ts | Adds weighted multi-tweet sentiment aggregation. |
| src/analysis/semantic-matcher.ts | Adds transformer-based semantic matching with caching. |
| src/analysis/semantic-matcher-example.ts | Example usage for semantic matching. |
| src/analysis/kelly-sizing.ts | Adds Kelly sizing + volatility/anomaly detectors. |
| src/analysis/README.md | Documents semantic matcher design/perf. |
| scripts/test-smoke-imports.ts | Adds import smoke tests for critical modules. |
| scripts/test-performance-endpoints.ts | Adds integration script to validate performance endpoints. |
| scripts/test-agent-api.ts | Increases default timeout to 30s. |
| scripts/ml/collect-resolutions.ts | Batch job to backfill outcomes from venue resolutions. |
| scripts/lib/is-main-module.ts | ESM-safe entrypoint detection helper. |
| scripts/interview-ready.ts | Runs CI ladder + prints pitch prompts. |
| scripts/backtest/pnl-calculator.ts | Adds P&L calculator utilities for backtests. |
| scripts/backtest/historical-data-fetcher.ts | Fetches KV price snapshots for backtesting. |
| scripts/backtest/example-usage.ts | Shows programmatic backtest usage. |
| package.json | Adds test ladder scripts + ML/backtest/collector commands; deps for transformers/ws. |
| docs/WS_STRATEGY.md | WS operations guidance. |
| docs/TESTING.md | Documents testing ladder and scripts. |
| docs/SUBMISSION.md | Submission guidance and commands. |
| docs/SLO.md | Adds SLO ideas and metrics targets. |
| docs/QUICK_START_PERFORMANCE.md | Setup guide for performance tracking endpoints. |
| docs/PORTFOLIO_RISK.md | Notes on portfolio-level risk beyond session. |
| docs/PHASE_D_GATE.md | Gate criteria for execution-layer work. |
| docs/NATIVE_DEPS.md | Notes on transformers/sharp/native deps. |
| docs/ML_CALIBRATION.md | Roadmap for calibration improvements. |
| docs/ENVIRONMENT.md | Central env var reference (flags/keys/limits). |
| docs/DEPLOYMENT.md | Deployment checklist (migrations/env/crons/workflows). |
| docs/ARBITRAGE_REALISM.md | Notes on mid-price vs executable edge and scaling. |
| api/risk/session.ts | New session-level circuit breaker endpoint. |
| api/metrics/performance.ts | New endpoint for aggregated signal performance metrics. |
| api/markets/arbitrage.ts | Adds rate limiting + new filtering params (net spread/opposed). |
| api/lib/rate-limit.ts | Adds per-instance sliding window rate limiter helpers. |
| api/lib/market-cache.ts | Integrates Polymarket WS price refresh + order book fetcher; makes arbitrage async. |
| api/internal/resolve-market.ts | New internal endpoint to resolve markets & compute P&L. |
| api/health.ts | Adds operational readiness flags + updated endpoint catalog/version. |
| api/analyze-text.ts | Adds rate limiting + vol regime hint + optional ML scoring fields. |
| REAL_TIME_IMPLEMENTATION.md | Real-time infra documentation. |
| README.md | Updates repo pointers + adds env/testing/deploy docs links. |
| QUICKSTART_OUTCOME_TRACKING.md | Quickstart for outcome tracking workflow. |
| IMPLEMENTATION_SUMMARY.md | Summary documentation for real-time infra implementation. |
| DEPLOYMENT_READY.md | Deployment readiness narrative for performance system. |
| .gitignore | Ignores reports + ML weights JSON outputs. |
| .github/workflows/collect-resolutions.yml | Adds workflow_dispatch job for resolution collection. |
| .github/workflows/ci.yml | Adds CI workflow (typecheck + test ladder). |
| .github/workflows/backtest-report.yml | Adds workflow_dispatch backtest report artifact job. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| function isAuthorized(req: VercelRequest): boolean { | ||
| const apiKey = req.headers['x-api-key'] || req.headers['authorization']?.replace('Bearer ', ''); | ||
| const expectedKey = process.env.INTERNAL_API_KEY; | ||
|
|
||
| if (!expectedKey) { | ||
| // If no key is configured, check if request is from internal network | ||
| const allowedIps = (process.env.INTERNAL_IPS || '').split(','); | ||
| const clientIp = req.headers['x-forwarded-for'] || req.socket?.remoteAddress || ''; | ||
| return allowedIps.some(ip => clientIp.toString().includes(ip)); | ||
| } | ||
|
|
||
| return apiKey === expectedKey; | ||
| } |
There was a problem hiding this comment.
isAuthorized() falls back to an IP allowlist when INTERNAL_API_KEY is unset, but (process.env.INTERNAL_IPS || '').split(',') yields [''] when the env var is missing/empty; clientIp.includes('') will then return true for every request, effectively disabling auth. Default to deny when no API key is configured (or require a non-empty allowlist and perform exact/CIDR matching on the first x-forwarded-for IP).
| const supabaseUrl = process.env.NEXT_PUBLIC_SUPABASE_URL; | ||
| const supabaseKey = process.env.SUPABASE_SERVICE_KEY || process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY; | ||
|
|
||
| if (!supabaseUrl || !supabaseKey) { | ||
| res.status(500).json({ | ||
| success: false, | ||
| error: 'Supabase configuration missing', | ||
| }); | ||
| return; | ||
| } | ||
|
|
||
| const supabase = createSupabaseBrowserClient(supabaseUrl, supabaseKey); | ||
|
|
There was a problem hiding this comment.
This endpoint only reads NEXT_PUBLIC_SUPABASE_URL (and NEXT_PUBLIC_SUPABASE_ANON_KEY), but the rest of the codebase/docs also use SUPABASE_URL / SUPABASE_ANON_KEY. As written, /api/internal/resolve-market will 500 in environments that set only the non-NEXT_PUBLIC_* vars. Consider using the same fallback chain as api/metrics/performance.ts (SUPABASE_URL || NEXT_PUBLIC_SUPABASE_URL, and prefer SUPABASE_SERVICE_KEY on the server).
| // Fetch all signal outcomes | ||
| const { data: allSignals, error: allSignalsError } = await supabase | ||
| .from('signal_outcomes') | ||
| .select('*'); |
There was a problem hiding this comment.
This endpoint pulls the entire signal_outcomes table into memory (select('*')) and then filters in-process for 24h/7d/30d. As the table grows, this will become slow and can exceed serverless memory/time limits. Prefer querying only the needed time window (e.g., gte('created_at', day30Ago)) and selecting only required columns; consider separate lightweight queries/counts for totals and unresolved backlog.
| // Recency decay — half-life of 15 minutes | ||
| const ageMins = Math.max(0, (now - t.timestamp) / 60_000); | ||
| const recencyDecay = Math.exp(-ageMins / 15); | ||
|
|
There was a problem hiding this comment.
The comment says “half-life of 15 minutes”, but Math.exp(-ageMins / 15) uses 15 minutes as the decay constant (time to e^-1), not a half-life. If you intend a true 15-minute half-life, the exponent should be scaled by ln(2); otherwise update the comment to match the implemented decay.
| // Filter out anomalous markets from arbitrage consideration | ||
| const arbitrageOpportunities = await getArbitrage(0.03); |
There was a problem hiding this comment.
This comment says “Filter out anomalous markets from arbitrage consideration”, but the code only filters is_directionally_opposed. Either update the comment to reflect what’s actually being filtered, or implement the anomalous-market filter here (e.g., based on market.is_anomalous).
| function calculatePnL( | ||
| edge: number, | ||
| predictedProb: number, | ||
| wasCorrect: boolean, | ||
| bankroll: number = 1000 // Default bankroll | ||
| ): number { | ||
| // Kelly Criterion: f* = (bp - q) / b | ||
| // where b = decimal odds - 1, p = win probability, q = 1 - p | ||
| // Simplified: bet size = edge * bankroll (fraction Kelly) | ||
|
|
||
| const kellyFraction = Math.abs(edge) * 0.25; // Quarter Kelly for safety | ||
| const betSize = kellyFraction * bankroll; | ||
|
|
||
| if (wasCorrect) { | ||
| // Return at fair odds based on predicted probability | ||
| // Profit = betSize / predictedProb - betSize | ||
| return betSize * (1 / predictedProb - 1); | ||
| } else { | ||
| // Loss: we lose the entire bet | ||
| return -betSize; | ||
| } |
There was a problem hiding this comment.
calculatePnL() can divide by zero (or near-zero) when predictedProb is 0/very small, producing Infinity/huge P&L values. Clamp predictedProb to a safe epsilon range (and validate it’s finite) before using 1 / predictedProb so a single bad row can’t corrupt metrics.
| The file will be created after running the training script. | ||
|
|
There was a problem hiding this comment.
Fix the extra space in “The file …” (double space) to avoid a typo in docs.
| // Run if executed directly | ||
| if (require.main === module) { | ||
| main().catch(console.error); | ||
| } |
There was a problem hiding this comment.
This script uses require.main === module as its entrypoint guard. If it’s executed under ESM (which is common with node --import tsx), require is undefined and the script will crash. Consider using the repo’s isMainModule() helper (scripts/lib/is-main-module.ts) or an import.meta.url-based guard for consistency.
| try { | ||
| const modelPath = path.join(__dirname, 'models', 'signal-scorer-v1.json'); | ||
|
|
||
| if (!fs.existsSync(modelPath)) { | ||
| modelLoadError = 'Model file not found'; | ||
| console.warn('[signal-scorer-model] Model file not found, using heuristic fallback'); | ||
| return null; | ||
| } |
There was a problem hiding this comment.
loadModel() builds the weights path using __dirname. If this module is ever executed as ESM (or bundled), __dirname is not defined and model loading will fail. Prefer deriving the directory from import.meta.url (via fileURLToPath) or otherwise ensure this module is CJS-only and documented as such.
| // Run examples if this file is executed directly | ||
| if (require.main === module) { | ||
| runAllExamples(); | ||
| } |
There was a problem hiding this comment.
This example uses require.main === module to detect direct execution. If it’s run under ESM (common with node --import tsx), require is undefined. Consider using isMainModule() (scripts/lib/is-main-module.ts) or an import.meta.url-based guard for ESM compatibility.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1287f5e0e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const allowedIps = (process.env.INTERNAL_IPS || '').split(','); | ||
| const clientIp = req.headers['x-forwarded-for'] || req.socket?.remoteAddress || ''; | ||
| return allowedIps.some(ip => clientIp.toString().includes(ip)); |
There was a problem hiding this comment.
Reject unauthenticated calls when INTERNAL_IPS is empty
The fallback auth path becomes effectively open when INTERNAL_API_KEY is unset and INTERNAL_IPS is missing/empty: (process.env.INTERNAL_IPS || '').split(',') yields [''], and clientIp.includes('') is always true, so any client is treated as authorized. In that deployment state, POST /api/internal/resolve-market can be called publicly and mutate signal outcomes, which is a production-impacting security regression.
Useful? React with 👍 / 👎.
| const exitPrice = direction === 'YES' | ||
| ? exitSnapshot.yesPrice | ||
| : (1 - exitSnapshot.yesPrice); |
There was a problem hiding this comment.
Recompute exit price after stop-loss/take-profit updates
exitPrice is captured before the stop-loss/take-profit branch, but that branch can replace exitSnapshot and exitReason. Because exitPrice is never recomputed, downstream P&L, correctness, and price-change metrics still use the old expiry/resolution price instead of the triggered stop/take-profit price, making backtest results incorrect whenever those guards fire.
Useful? React with 👍 / 👎.
| const { data: signals, error: fetchError } = await supabase | ||
| .from('signal_outcomes') | ||
| .select('*') | ||
| .eq('market_id', market.ticker) |
There was a problem hiding this comment.
Match Kalshi resolutions against stored market_id format
This lookup uses raw market.ticker, but signals are logged with market_id = topMarket.id (the normalized internal ID, e.g. kalshi-<ticker>), so Kalshi rows won't be found and remain unresolved. In practice the collector silently skips Kalshi outcome updates, which breaks win-rate/P&L tracking and ML labels for that platform.
Useful? React with 👍 / 👎.
…dress review findings - Remove Access-Control-Allow-Origin: * from risk/session endpoint - Add API key auth + per-IP rate limiting to /api/risk/session - Replace SELECT * with explicit column list in getRecentPerformance() - Truncate raw tweet text to 280 chars before DB storage - Replace 32-bit hash event IDs with crypto.randomUUID() - Add ML circular-priors warning and MUSASHI_ML_ENABLED runtime gate - Document fair-odds P&L assumption in calculatePnL and backtest README - Update DEPLOYMENT.md with new env vars Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cktest disclaimer 1. Event ID: replace crypto.randomUUID() with deterministic SHA-256 hash of (marketId + signalType + 5-min time bucket + text) for stable dedup 2. .gitignore: add *.docx, *.xlsx, *.pptx to prevent Office binaries in git 3. Dependency audit: all deps verified in-use, no removals needed 4. ML floor: add getMinRealSignals() clamped to min 50 with warning log 5. Retry-After: add 60-second header on 429 rate-limit responses 6. CORS: read ALLOWED_ORIGIN env var, reject wildcard in production 7. Backtest: prepend fair-odds disclaimer to all generated reports 8. Tests: add event ID determinism + ML floor clamp tests, fix risk/session tests to provide auth headers (81/81 passing) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r format 1. Export generateEventId from signal-generator, rewrite unit tests to import the real function instead of inline re-implementations; add bucket-boundary test documenting 5-minute window dedup behavior 2. .gitignore: add *.doc, *.xls, *.ppt alongside existing Office patterns; add reference materials note in README 3. Rewrite getMinRealSignals test to import and call the actual exported function from signal-scorer-model.ts 4. Add Retry-After: 60 header test using pre-filled rate limit bucket 5. CORS: return 500 in production when ALLOWED_ORIGIN is unset instead of just logging; update ENVIRONMENT.md docs 6. Backtest disclaimer: switch to box-drawing character format matching spec 7. Dependency audit: all 4 new deps verified in-use, all MIT/Apache licensed 83/83 tests pass (35 unit + 39 API + 5 wallet + 4 smoke), typecheck clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bb602aa to
abc5afc
Compare
Summary\n- Expand test coverage across unit and API handler layers\n- Add shared test helpers and improve test scripts/docs\n- Include infrastructure/docs updates for performance, ML, risk, and backtesting\n\n## Validation\n- pnpm test:all (all suites passing)\n