Aarav Jain improvements by aaravjj2 · Pull Request #8 · MusashiBot/musashi-api

aaravjj2 · 2026-04-20T22:32:41Z

Summary\n- Expand test coverage across unit and API handler layers\n- Add shared test helpers and improve test scripts/docs\n- Include infrastructure/docs updates for performance, ML, risk, and backtesting\n\n## Validation\n- pnpm test:all (all suites passing)\n

vercel · 2026-04-20T22:32:45Z

@aaravjj2 is attempting to deploy a commit to the Victor's projects Team on Vercel.

A member of the Team first needs to authorize it.

Copilot

Pull request overview

This PR expands Musashi’s operational + research loop: adds risk/session controls, performance tracking & resolution ingestion for signal_outcomes, introduces ML signal scoring scaffolding, and strengthens testing/docs/scripts for CI/backtesting and infra operations.

Changes:

Add new API endpoints/routes (risk session, performance metrics, internal resolve-market) plus rate limiting and cache/WS enhancements.
Introduce ML scoring module + schema to log outcomes for training, along with scripts to collect resolutions and run backtests.
Expand unit/smoke/testing ladder and add extensive deployment/environment documentation.

Reviewed changes

Copilot reviewed 81 out of 85 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
vercel.json	Adds new API routes + CORS header for `X-API-Key`.
tests/unit/cache-utils.test.mjs	Adds unit tests for rate limit + cache/KV helpers.
tests/helpers/test-helpers.mjs	Adds shared response/fetch/KV mock helpers.
supabase/migrations/20260418000000_signal_outcomes.sql	Creates `signal_outcomes` table + indexes for ML/metrics.
src/types/market.ts	Extends market/arbitrage types; adds Kelly sizing types.
src/ml/signal-scorer-model.ts	Adds ML model loading/inference + heuristic fallback.
src/ml/models/README.md	Documents model-weights directory.
src/ml/models/.gitkeep	Keeps weights directory in repo.
src/ml/index.ts	Exposes ML public API exports.
src/ml/example-usage.ts	Demonstrates end-to-end ML scoring workflow.
src/ml/README.md	Full ML scorer documentation.
src/ml/QUICKSTART.md	Quickstart guide for ML scoring.
src/db/signal-outcomes.example.ts	Example usage for outcome logging + resolution + metrics.
src/db/README.md	Documents outcome tracking system.
src/db/ARCHITECTURE.md	Describes signal→DB→training/metrics architecture.
src/api/supabase-client.ts	Adds typed `signal_outcomes` + connection test.
src/api/polymarket-price-poller.ts	Adds REST order book depth fetcher.
src/analysis/sentiment-analyzer.ts	Adds weighted multi-tweet sentiment aggregation.
src/analysis/semantic-matcher.ts	Adds transformer-based semantic matching with caching.
src/analysis/semantic-matcher-example.ts	Example usage for semantic matching.
src/analysis/kelly-sizing.ts	Adds Kelly sizing + volatility/anomaly detectors.
src/analysis/README.md	Documents semantic matcher design/perf.
scripts/test-smoke-imports.ts	Adds import smoke tests for critical modules.
scripts/test-performance-endpoints.ts	Adds integration script to validate performance endpoints.
scripts/test-agent-api.ts	Increases default timeout to 30s.
scripts/ml/collect-resolutions.ts	Batch job to backfill outcomes from venue resolutions.
scripts/lib/is-main-module.ts	ESM-safe entrypoint detection helper.
scripts/interview-ready.ts	Runs CI ladder + prints pitch prompts.
scripts/backtest/pnl-calculator.ts	Adds P&L calculator utilities for backtests.
scripts/backtest/historical-data-fetcher.ts	Fetches KV price snapshots for backtesting.
scripts/backtest/example-usage.ts	Shows programmatic backtest usage.
package.json	Adds test ladder scripts + ML/backtest/collector commands; deps for transformers/ws.
docs/WS_STRATEGY.md	WS operations guidance.
docs/TESTING.md	Documents testing ladder and scripts.
docs/SUBMISSION.md	Submission guidance and commands.
docs/SLO.md	Adds SLO ideas and metrics targets.
docs/QUICK_START_PERFORMANCE.md	Setup guide for performance tracking endpoints.
docs/PORTFOLIO_RISK.md	Notes on portfolio-level risk beyond session.
docs/PHASE_D_GATE.md	Gate criteria for execution-layer work.
docs/NATIVE_DEPS.md	Notes on transformers/sharp/native deps.
docs/ML_CALIBRATION.md	Roadmap for calibration improvements.
docs/ENVIRONMENT.md	Central env var reference (flags/keys/limits).
docs/DEPLOYMENT.md	Deployment checklist (migrations/env/crons/workflows).
docs/ARBITRAGE_REALISM.md	Notes on mid-price vs executable edge and scaling.
api/risk/session.ts	New session-level circuit breaker endpoint.
api/metrics/performance.ts	New endpoint for aggregated signal performance metrics.
api/markets/arbitrage.ts	Adds rate limiting + new filtering params (net spread/opposed).
api/lib/rate-limit.ts	Adds per-instance sliding window rate limiter helpers.
api/lib/market-cache.ts	Integrates Polymarket WS price refresh + order book fetcher; makes arbitrage async.
api/internal/resolve-market.ts	New internal endpoint to resolve markets & compute P&L.
api/health.ts	Adds operational readiness flags + updated endpoint catalog/version.
api/analyze-text.ts	Adds rate limiting + vol regime hint + optional ML scoring fields.
REAL_TIME_IMPLEMENTATION.md	Real-time infra documentation.
README.md	Updates repo pointers + adds env/testing/deploy docs links.
QUICKSTART_OUTCOME_TRACKING.md	Quickstart for outcome tracking workflow.
IMPLEMENTATION_SUMMARY.md	Summary documentation for real-time infra implementation.
DEPLOYMENT_READY.md	Deployment readiness narrative for performance system.
.gitignore	Ignores reports + ML weights JSON outputs.
.github/workflows/collect-resolutions.yml	Adds workflow_dispatch job for resolution collection.
.github/workflows/ci.yml	Adds CI workflow (typecheck + test ladder).
.github/workflows/backtest-report.yml	Adds workflow_dispatch backtest report artifact job.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-20T22:38:08Z

+function isAuthorized(req: VercelRequest): boolean {
+  const apiKey = req.headers['x-api-key'] || req.headers['authorization']?.replace('Bearer ', '');
+  const expectedKey = process.env.INTERNAL_API_KEY;
+
+  if (!expectedKey) {
+    // If no key is configured, check if request is from internal network
+    const allowedIps = (process.env.INTERNAL_IPS || '').split(',');
+    const clientIp = req.headers['x-forwarded-for'] || req.socket?.remoteAddress || '';
+    return allowedIps.some(ip => clientIp.toString().includes(ip));
+  }
+
+  return apiKey === expectedKey;
+}


isAuthorized() falls back to an IP allowlist when INTERNAL_API_KEY is unset, but (process.env.INTERNAL_IPS || '').split(',') yields [''] when the env var is missing/empty; clientIp.includes('') will then return true for every request, effectively disabling auth. Default to deny when no API key is configured (or require a non-empty allowlist and perform exact/CIDR matching on the first x-forwarded-for IP).

Copilot · 2026-04-20T22:38:08Z

+    const supabaseUrl = process.env.NEXT_PUBLIC_SUPABASE_URL;
+    const supabaseKey = process.env.SUPABASE_SERVICE_KEY || process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY;
+
+    if (!supabaseUrl || !supabaseKey) {
+      res.status(500).json({
+        success: false,
+        error: 'Supabase configuration missing',
+      });
+      return;
+    }
+
+    const supabase = createSupabaseBrowserClient(supabaseUrl, supabaseKey);
+


This endpoint only reads NEXT_PUBLIC_SUPABASE_URL (and NEXT_PUBLIC_SUPABASE_ANON_KEY), but the rest of the codebase/docs also use SUPABASE_URL / SUPABASE_ANON_KEY. As written, /api/internal/resolve-market will 500 in environments that set only the non-NEXT_PUBLIC_* vars. Consider using the same fallback chain as api/metrics/performance.ts (SUPABASE_URL || NEXT_PUBLIC_SUPABASE_URL, and prefer SUPABASE_SERVICE_KEY on the server).

Copilot · 2026-04-20T22:38:08Z

+    // Fetch all signal outcomes
+    const { data: allSignals, error: allSignalsError } = await supabase
+      .from('signal_outcomes')
+      .select('*');


This endpoint pulls the entire signal_outcomes table into memory (select('*')) and then filters in-process for 24h/7d/30d. As the table grows, this will become slow and can exceed serverless memory/time limits. Prefer querying only the needed time window (e.g., gte('created_at', day30Ago)) and selecting only required columns; consider separate lightweight queries/counts for totals and unresolved backlog.

Copilot · 2026-04-20T22:38:09Z

+    // Recency decay — half-life of 15 minutes
+    const ageMins = Math.max(0, (now - t.timestamp) / 60_000);
+    const recencyDecay = Math.exp(-ageMins / 15);
+


The comment says “half-life of 15 minutes”, but Math.exp(-ageMins / 15) uses 15 minutes as the decay constant (time to e^-1), not a half-life. If you intend a true 15-minute half-life, the exponent should be scaled by ln(2); otherwise update the comment to match the implemented decay.

Copilot · 2026-04-20T22:38:09Z

+    // Filter out anomalous markets from arbitrage consideration
    const arbitrageOpportunities = await getArbitrage(0.03);


This comment says “Filter out anomalous markets from arbitrage consideration”, but the code only filters is_directionally_opposed. Either update the comment to reflect what’s actually being filtered, or implement the anomalous-market filter here (e.g., based on market.is_anomalous).

Copilot · 2026-04-20T22:38:09Z

+function calculatePnL(
+  edge: number,
+  predictedProb: number,
+  wasCorrect: boolean,
+  bankroll: number = 1000 // Default bankroll
+): number {
+  // Kelly Criterion: f* = (bp - q) / b
+  // where b = decimal odds - 1, p = win probability, q = 1 - p
+  // Simplified: bet size = edge * bankroll (fraction Kelly)
+
+  const kellyFraction = Math.abs(edge) * 0.25; // Quarter Kelly for safety
+  const betSize = kellyFraction * bankroll;
+
+  if (wasCorrect) {
+    // Return at fair odds based on predicted probability
+    // Profit = betSize / predictedProb - betSize
+    return betSize * (1 / predictedProb - 1);
+  } else {
+    // Loss: we lose the entire bet
+    return -betSize;
+  }


calculatePnL() can divide by zero (or near-zero) when predictedProb is 0/very small, producing Infinity/huge P&L values. Clamp predictedProb to a safe epsilon range (and validate it’s finite) before using 1 / predictedProb so a single bad row can’t corrupt metrics.

Copilot · 2026-04-20T22:38:10Z

+The  file will be created after running the training script.
+


Fix the extra space in “The file …” (double space) to avoid a typo in docs.

Copilot · 2026-04-20T22:38:10Z

+// Run if executed directly
+if (require.main === module) {
+  main().catch(console.error);
+}


This script uses require.main === module as its entrypoint guard. If it’s executed under ESM (which is common with node --import tsx), require is undefined and the script will crash. Consider using the repo’s isMainModule() helper (scripts/lib/is-main-module.ts) or an import.meta.url-based guard for consistency.

Copilot · 2026-04-20T22:38:10Z

+  try {
+    const modelPath = path.join(__dirname, 'models', 'signal-scorer-v1.json');
+
+    if (!fs.existsSync(modelPath)) {
+      modelLoadError = 'Model file not found';
+      console.warn('[signal-scorer-model] Model file not found, using heuristic fallback');
+      return null;
+    }


loadModel() builds the weights path using __dirname. If this module is ever executed as ESM (or bundled), __dirname is not defined and model loading will fail. Prefer deriving the directory from import.meta.url (via fileURLToPath) or otherwise ensure this module is CJS-only and documented as such.

Copilot · 2026-04-20T22:38:10Z

+// Run examples if this file is executed directly
+if (require.main === module) {
+  runAllExamples();
+}


This example uses require.main === module to detect direct execution. If it’s run under ESM (common with node --import tsx), require is undefined. Consider using isMainModule() (scripts/lib/is-main-module.ts) or an import.meta.url-based guard for ESM compatibility.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1287f5e0e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-20T22:39:20Z

+    const allowedIps = (process.env.INTERNAL_IPS || '').split(',');
+    const clientIp = req.headers['x-forwarded-for'] || req.socket?.remoteAddress || '';
+    return allowedIps.some(ip => clientIp.toString().includes(ip));


Reject unauthenticated calls when INTERNAL_IPS is empty

The fallback auth path becomes effectively open when INTERNAL_API_KEY is unset and INTERNAL_IPS is missing/empty: (process.env.INTERNAL_IPS || '').split(',') yields [''], and clientIp.includes('') is always true, so any client is treated as authorized. In that deployment state, POST /api/internal/resolve-market can be called publicly and mutate signal outcomes, which is a production-impacting security regression.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-20T22:39:20Z

+    const exitPrice = direction === 'YES' 
+      ? exitSnapshot.yesPrice 
+      : (1 - exitSnapshot.yesPrice);


Recompute exit price after stop-loss/take-profit updates

exitPrice is captured before the stop-loss/take-profit branch, but that branch can replace exitSnapshot and exitReason. Because exitPrice is never recomputed, downstream P&L, correctness, and price-change metrics still use the old expiry/resolution price instead of the triggered stop/take-profit price, making backtest results incorrect whenever those guards fire.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-20T22:39:20Z

+      const { data: signals, error: fetchError } = await supabase
+        .from('signal_outcomes')
+        .select('*')
+        .eq('market_id', market.ticker)


Match Kalshi resolutions against stored market_id format

This lookup uses raw market.ticker, but signals are logged with market_id = topMarket.id (the normalized internal ID, e.g. kalshi-<ticker>), so Kalshi rows won't be found and remain unresolved. In practice the collector silently skips Kalshi outcome updates, which breaks win-rate/P&L tracking and ML labels for that platform.

Useful? React with 👍 / 👎.

…ifacts

…s safety

…dress review findings - Remove Access-Control-Allow-Origin: * from risk/session endpoint - Add API key auth + per-IP rate limiting to /api/risk/session - Replace SELECT * with explicit column list in getRecentPerformance() - Truncate raw tweet text to 280 chars before DB storage - Replace 32-bit hash event IDs with crypto.randomUUID() - Add ML circular-priors warning and MUSASHI_ML_ENABLED runtime gate - Document fair-odds P&L assumption in calculatePnL and backtest README - Update DEPLOYMENT.md with new env vars Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…cktest disclaimer 1. Event ID: replace crypto.randomUUID() with deterministic SHA-256 hash of (marketId + signalType + 5-min time bucket + text) for stable dedup 2. .gitignore: add *.docx, *.xlsx, *.pptx to prevent Office binaries in git 3. Dependency audit: all deps verified in-use, no removals needed 4. ML floor: add getMinRealSignals() clamped to min 50 with warning log 5. Retry-After: add 60-second header on 429 rate-limit responses 6. CORS: read ALLOWED_ORIGIN env var, reject wildcard in production 7. Backtest: prepend fair-odds disclaimer to all generated reports 8. Tests: add event ID determinism + ML floor clamp tests, fix risk/session tests to provide auth headers (81/81 passing) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…r format 1. Export generateEventId from signal-generator, rewrite unit tests to import the real function instead of inline re-implementations; add bucket-boundary test documenting 5-minute window dedup behavior 2. .gitignore: add *.doc, *.xls, *.ppt alongside existing Office patterns; add reference materials note in README 3. Rewrite getMinRealSignals test to import and call the actual exported function from signal-scorer-model.ts 4. Add Retry-After: 60 header test using pre-filled rate limit bucket 5. CORS: return 500 in production when ALLOWED_ORIGIN is unset instead of just logging; update ENVIRONMENT.md docs 6. Backtest disclaimer: switch to box-drawing character format matching spec 7. Dependency audit: all 4 new deps verified in-use, all MIT/Apache licensed 83/83 tests pass (35 unit + 39 API + 5 wallet + 4 smoke), typecheck clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Aarav Jain improvements

0290a46

Copilot AI review requested due to automatic review settings April 20, 2026 22:32

Copilot started reviewing on behalf of aaravjj2 April 20, 2026 22:33 View session

Copilot AI reviewed Apr 20, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 20, 2026

View reviewed changes

aaravjj2 and others added 4 commits April 20, 2026 18:48

Address review: secure internal endpoints, scale reads, and clean art…

ccbc364

…ifacts

Address deep review: semantic prewarm, synthetic provenance, and stat…

0456b8b

…s safety

aaravjj2 requested a review from Copilot April 20, 2026 23:55

Copilot started reviewing on behalf of aaravjj2 April 20, 2026 23:56 View session

aaravjj2 force-pushed the aarav-jain-improvements branch from bb602aa to abc5afc Compare April 21, 2026 00:17

Copilot AI reviewed Apr 21, 2026

View reviewed changes

		// Filter out anomalous markets from arbitrage consideration
		const arbitrageOpportunities = await getArbitrage(0.03);

Conversation

aaravjj2 commented Apr 20, 2026

Summary\n- Expand test coverage across unit and API handler layers\n- Add shared test helpers and improve test scripts/docs\n- Include infrastructure/docs updates for performance, ML, risk, and backtesting\n\n## Validation\n- pnpm test:all (all suites passing)\n

Uh oh!

vercel Bot commented Apr 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants