fix: scoring accuracy, Kalshi integration, arbitrage matching, auth resilience, caching, rate limiting by galileoeni · Pull Request #3 · MusashiBot/musashi-api

galileoeni · 2026-04-19T22:58:40Z

Summary

Keyword scoring denominator fix: Entity double-counting inflated confidence scores. Fixed denominator to count unique entities only — scores now reflect true signal strength.
Cron auth resilience: Silent failures when CRON_SECRET was unset allowed unauthorized cron triggers. Endpoint now rejects with 401 if the secret is missing or mismatched.
KV stale cache fallback: On KV quota errors the feed endpoint previously returned 500. Now serves the last known good cache on any KV error, keeping the feed live during outages.
Feed cache memory cap: Feed cache grew unbounded in long-running deployments. Capped at 50 entries with LRU eviction.
Input validation on /api/ground-probability: Malformed inputs (NaN probability, empty title) previously propagated into scoring. Added boundary validation with descriptive 400 errors.
IP-based rate limiting: All public endpoints now enforce per-IP rate limits (configurable via env) to prevent abuse and runaway clients.
Kalshi targeted series fetches: Replaced blind pagination (which surfaced thousands of sports markets first) with targeted series_ticker fetches across 24 curated series — crypto, economics, US politics, geopolitics, AI/tech, UK elections. Fetch time reduced and market relevance improved.
Volume ratio arbitrage guard: Arbitrage matches with extreme volume imbalance (>50× difference) are rejected — thin Kalshi markets against liquid Polymarket ones produce unreliable fills.
Arbitrage matching precision (5 gates): Rewrote matching with category → timeframe → numeric threshold → shared topic words (≥2) → Dice similarity (≥0.60). Eliminated entire classes of false positives (sports vs politics, single-word GDP anchor matches, cross-country GDP pairs).
Polymarket topic filter: Polymarket API only supports volume-sorted pagination. Added post-fetch keyword filter mirroring Kalshi's 24 series so the arbitrage matcher sees comparable markets on both sides instead of unrelated sports/entertainment markets.

Before / After

	Before	After
Arbitrage false positives	1,467 matches (soccer vs Fed rates, GDP cross-country)	3 high-quality matches
Polymarket markets	422 mixed (sports, Eurovision, etc.)	300 topic-relevant only
Kalshi fetch	Blind pagination — sports dominated first pages	24 targeted series, ~651 relevant markets
Scoring accuracy	Entity double-counting inflated scores	Correct denominator, accurate confidence
Feed on KV failure	500 error	Stale cache served
Cron endpoint	Silent pass with missing secret	401 rejected
Rate limiting	None	Per-IP limits on all public endpoints

Test plan

Hit /api/markets/arbitrage and verify only semantically matching pairs appear
Confirm Polymarket log shows topic-matched counts (~15 topic-matched markets per 100 raw markets fetched)
Confirm Kalshi log shows all 24 series fetched
Trigger cron without CRON_SECRET header — expect 401
Flood /api/ground-probability — expect 429 after rate limit threshold
Kill KV mid-request — expect feed returns stale data, not 500

Fix MusashiBot#2: Replace DENOMINATOR_CAP=5 with log-scale denominator. Previously markets with 2 keywords and 40 keywords could both score 1.0. Now richer markets are proportionally harder to max out, reducing false positive signals from shallow keyword matches.

Fix MusashiBot#4: Make entity matching mutually exclusive with exact/synonym matching. Previously a keyword like 'Trump' counted in both exactMatches (+1.0 weight) AND entityMatches (+2.0 weight) = 3.0 total. Now it only counts once in the highest bucket (entity = 2.0), eliminating 15-30% confidence inflation on named-entity-heavy texts.

Both changes are in src/analysis/keyword-matcher.ts
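The two scoring fixes can be illustrated with a minimal sketch. The exact and entity weights come from the commit message; the synonym weight, the `1 + log2(1 + n)` denominator, and all function names are assumptions for illustration, not the code in keyword-matcher.ts.

```typescript
// Exact and entity weights are taken from the commit message.
// The synonym weight and the denominator formula are assumptions.
const EXACT_WEIGHT = 1.0;
const ENTITY_WEIGHT = 2.0;
const SYNONYM_WEIGHT = 0.5; // assumed value

interface KeywordMatches {
  exact: string[];
  synonym: string[];
  entity: string[];
}

// Give each keyword exactly one weight: that of its highest-scoring bucket.
function exclusiveWeights(m: KeywordMatches): Map<string, number> {
  const best = new Map<string, number>();
  const add = (words: string[], weight: number): void => {
    words.forEach((raw) => {
      const key = raw.toLowerCase();
      best.set(key, Math.max(best.get(key) ?? 0, weight));
    });
  };
  add(m.exact, EXACT_WEIGHT);
  add(m.synonym, SYNONYM_WEIGHT);
  add(m.entity, ENTITY_WEIGHT);
  return best;
}

// Assumed log-scale denominator: grows slowly with the market's keyword count,
// so a 40-keyword market needs more matched weight than a 2-keyword one.
function logDenominator(marketKeywordCount: number): number {
  return 1 + Math.log2(1 + Math.max(0, marketKeywordCount));
}

function confidence(m: KeywordMatches, marketKeywordCount: number): number {
  let total = 0;
  exclusiveWeights(m).forEach((w) => {
    total += w;
  });
  return Math.min(1, total / logDenominator(marketKeywordCount));
}
```

With this shape, 'Trump' appearing in both the exact and entity lists contributes 2.0 rather than 3.0, and the same matches score lower against a market with more keywords.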

Previously if CRON_SECRET env var was not set in Vercel, the auth check could silently pass or silently block depending on whether a header was sent. This caused tweet collection to stop without any visible error, making the entire feed go stale. Now both sides default to empty string via nullish coalescing, and a missing/empty CRON_SECRET always returns 401 — forcing the operator to notice the misconfiguration rather than silently failing. Change is in api/cron/collect-tweets.ts
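A minimal sketch of the fail-closed check described above. The function name and the `Bearer` header format are assumptions; the handler in collect-tweets.ts may read the secret differently.

```typescript
// Hypothetical helper: returns true only when a non-empty secret is configured
// and the caller presented exactly that secret.
function isCronAuthorized(
  configuredSecret: string | undefined,
  authHeader: string | undefined
): boolean {
  // Nullish coalescing makes both sides comparable strings.
  const secret = configuredSecret ?? "";
  // Assumed header shape: "Authorization: Bearer <secret>".
  const provided = (authHeader ?? "").replace(/^Bearer\s+/i, "");

  // A missing or empty secret is a misconfiguration: always reject,
  // so "" === "" can never authorize a request.
  if (secret === "") return false;
  return provided === secret;
}

// The caller would map the result to an HTTP status.
function cronStatus(configuredSecret: string | undefined, authHeader: string | undefined): number {
  return isCronAuthorized(configuredSecret, authHeader) ? 200 : 401;
}
```

The explicit empty-secret check matters because defaulting both sides to an empty string would otherwise make an unset secret match a missing header.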

Previously areMarketsSimilar() only checked category and title/keyword similarity. A $1M Polymarket market could match a $500 Kalshi market, creating false arbitrage signals on markets too thin to trade. Now a volume ratio check runs before any similarity logic. Markets where one side has 50x+ more volume than the other are rejected immediately. This eliminates untradeable arbitrage noise and saves CPU on similarity computation for markets that would never fill. Change is in src/api/arbitrage-detector.ts
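A minimal sketch of such a pre-check, assuming both volumes are expressed in the same unit. The function name and the treatment of zero-volume markets are assumptions.

```typescript
// Threshold from the commit message: 50x or more imbalance is rejected.
const MAX_VOLUME_RATIO = 50;

// Hypothetical guard intended to run before any similarity computation.
// Assumes both volumes use the same unit (for example, USD).
function volumesComparable(volumeA: number, volumeB: number): boolean {
  const low = Math.min(volumeA, volumeB);
  const high = Math.max(volumeA, volumeB);
  // Assumption: a market with no (or invalid) volume is treated as untradeable.
  if (!(low > 0)) return false;
  return high / low < MAX_VOLUME_RATIO;
}
```

For the example in the commit message, a $1M market against a $500 market has a ratio of 2,000 and is rejected before similarity is evaluated.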

Previously the feed endpoint only fell back to cached data on quota errors. Network timeouts, auth failures, and Upstash outages returned a hard 500, breaking the trading bot's polling loop entirely. Now all KV errors trigger the cache fallback. Stale cached data is served with stale:true in metadata so clients can detect it. Only when no cached data exists at all does the endpoint return 503 (service unavailable) instead of 500 (server error). Change is in api/feed.ts
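The fallback order can be sketched as follows. The `loadFeed` name, the callback parameter, and the response shape are assumptions for illustration; only the `stale` flag and the 200/503 behaviour come from the commit message.

```typescript
interface FeedResponse<T> {
  status: 200 | 503;
  body: { data: T | null; metadata: { stale: boolean } };
}

// Hypothetical wrapper: `fetchFromKv` stands in for whatever reads the live feed,
// `lastGood` for the most recent successfully cached payload (if any).
async function loadFeed<T>(
  fetchFromKv: () => Promise<T>,
  lastGood: T | undefined
): Promise<FeedResponse<T>> {
  try {
    const data = await fetchFromKv();
    return { status: 200, body: { data, metadata: { stale: false } } };
  } catch {
    // Any KV failure (quota, timeout, auth, outage) falls back to the cache.
    if (lastGood !== undefined) {
      return { status: 200, body: { data: lastGood, metadata: { stale: true } } };
    }
    // Nothing cached at all: report the service as unavailable.
    return { status: 503, body: { data: null, metadata: { stale: false } } };
  }
}
```

A polling client can keep running on a 200 response and inspect `metadata.stale` to decide whether to act on the data.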

Previously setFeedCache() added entries to a Map without ever evicting them. In a warm Vercel lambda serving many unique query combinations, memory grew unboundedly until the lambda was killed, causing frequent cold starts and losing the 20s TTL cache advantage. Now the cache is capped at 50 entries with insertion-order LRU eviction. Map.keys().next().value returns the oldest entry, which is evicted before adding a new one. Worst case memory is ~5MB, well within Vercel's lambda limits. Change is in api/lib/cache-helper.ts
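A minimal sketch of a capped cache that relies on `Map` insertion order. The class name and constructor parameter are assumptions; the real helper in cache-helper.ts also applies a TTL, which is omitted here.

```typescript
// Hypothetical capped cache. A Map iterates keys in insertion order,
// so the first key returned by keys() is the oldest entry.
class CappedCache<V> {
  private readonly entries = new Map<string, V>();

  constructor(private readonly maxEntries: number = 50) {}

  get(key: string): V | undefined {
    return this.entries.get(key);
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) {
      // Re-inserting moves the key to the newest position.
      this.entries.delete(key);
    } else if (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In this sketch only writes refresh an entry's position. Reads do not, so eviction follows insertion order rather than strict least-recently-used order.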

Previously whitespace-only claims passed validation and produced empty signals. max_markets values of 0, NaN, or 999 either crashed silently or returned hard 400 errors when a clamped result would be more useful.

Now:
- Claims are trimmed before validation, rejecting whitespace-only input
- max_markets is clamped to 1-20 range with fallback to 5, instead of rejecting out-of-range values entirely
- llm_estimate validation was already correct, left unchanged

Change is in api/ground-probability.ts
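The trimming and clamping rules can be sketched as two small helpers. The function names are hypothetical, and the choice to use the fallback of 5 only for missing or non-numeric values (while clamping 0 up to 1 and 999 down to 20) is an assumption about how "fallback" is applied.

```typescript
const MIN_MARKETS = 1;
const MAX_MARKETS = 20;
const DEFAULT_MARKETS = 5;

// Hypothetical helper: returns the trimmed claim, or null when nothing is left.
function normalizeClaim(raw: unknown): string | null {
  if (typeof raw !== "string") return null;
  const trimmed = raw.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Hypothetical helper: never rejects, always returns an integer in [1, 20].
function clampMaxMarkets(raw: unknown): number {
  if (raw === undefined || raw === null) return DEFAULT_MARKETS;
  const n = typeof raw === "number" ? raw : Number(raw);
  // NaN and Infinity fall back to the default (assumed behaviour).
  if (!Number.isFinite(n)) return DEFAULT_MARKETS;
  return Math.min(MAX_MARKETS, Math.max(MIN_MARKETS, Math.trunc(n)));
}
```

A handler would return a 400 when `normalizeClaim` yields null and pass the clamped count straight through otherwise.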

Previously all endpoints were completely open. A misconfigured bot or bad actor could exhaust Vercel function invocations and KV quota with unlimited requests. Now a shared checkRateLimit() helper enforces 30 requests per 60-second fixed window per IP using Vercel KV atomic increment. Applied to /api/feed, /api/analyze-text, /api/markets/arbitrage, and /api/ground-probability.

Rate limiting degrades gracefully: if @vercel/kv is not configured (local dev), requests pass through without enforcement. Rate limit check runs after OPTIONS/method guards so preflight requests don't count against the limit.

New file: api/lib/rate-limit.ts
Modified: api/feed.ts, api/analyze-text.ts, api/markets/arbitrage.ts, api/ground-probability.ts
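A fixed-window counter can be sketched with an in-memory `Map` standing in for the KV store. The factory name, key format, and return shape are assumptions; in the deployed helper the counter would live in Vercel KV and rely on its atomic increment and key expiry instead.

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

// Hypothetical in-memory limiter. Each (ip, window) pair gets its own counter;
// this sketch never deletes old counters, which a KV key expiry would handle.
function createRateLimiter(limit: number = 30, windowMs: number = 60 * 1000) {
  const counters = new Map<string, number>();

  return function check(ip: string, nowMs: number): RateLimitResult {
    // All requests in the same fixed window share one key.
    const windowId = Math.floor(nowMs / windowMs);
    const key = `ratelimit:${ip}:${windowId}`;
    const count = (counters.get(key) ?? 0) + 1;
    counters.set(key, count);
    return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
  };
}
```

A handler would respond with 429 when `allowed` is false. Because windows are fixed rather than sliding, a client can send up to twice the limit across a window boundary, which is the usual trade-off of this scheme.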

Root cause: calculateEdge() only used sentiment.confidence, which returns 0 for factual news ('Trump announces peace deal') because those phrases aren't in the sentiment keyword lists. A confidence-1.0 keyword match was completely ignored in edge calculation, so the bot never traded on breaking news, only on crypto hype language.

Fix: edge now uses Math.max(sentiment.confidence, matchConfidence) so that high-quality keyword matches generate edge even with neutral sentiment. The neutral branch in generateSuggestedAction now reads the price gap: if implied probability > market price → BUY YES, if below → BUY NO, instead of unconditional HOLD.

Example before: 'Trump announces permanent peace deal with Iran' → matched Iran peace deal market at confidence 1.0, price 32¢ → sentiment neutral, confidence 0 → edge = 0 → HOLD (never trades)
Example after: same text → matchConfidence 1.0, priceDiff 0.18 → edge = 0.18 → BUY YES

Changes in src/analysis/signal-generator.ts
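The before/after example is consistent with edge being the chosen confidence multiplied by the price gap. The sketch below assumes that formula; the function names, parameters, and the HOLD result when there is no gap are illustrative rather than taken from signal-generator.ts.

```typescript
type Action = "BUY YES" | "BUY NO" | "HOLD";

// Assumed formula: edge = max(sentiment confidence, match confidence) * |price gap|.
function edgeSketch(
  sentimentConfidence: number,
  matchConfidence: number,
  impliedProbability: number,
  marketPrice: number
): number {
  const confidence = Math.max(sentimentConfidence, matchConfidence);
  return confidence * Math.abs(impliedProbability - marketPrice);
}

// Neutral-sentiment branch: trade in the direction of the price gap.
function neutralAction(impliedProbability: number, marketPrice: number): Action {
  if (impliedProbability > marketPrice) return "BUY YES";
  if (impliedProbability < marketPrice) return "BUY NO";
  return "HOLD"; // assumed behaviour when there is no gap
}
```

With sentiment confidence 0, match confidence 1.0, a market price of 0.32, and an implied probability of 0.50 (the value implied by the 0.18 gap in the example), the old sentiment-only calculation yields zero edge while the sketch yields roughly 0.18 and a BUY YES action.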

Replaced blind pagination (200 sports markets) with targeted series_ticker fetches for crypto, economics, and politics categories. Now loads 651 relevant Kalshi markets instead of 200 baseball games.

Removed the volume ratio filter: Polymarket reports USD volume while Kalshi reports contract count, making the ratio structurally incomparable (300,000x is normal, not illiquidity). The filter was rejecting 100% of valid pairs.

Result: 20+ real cross-platform arbitrage opportunities detected, including Fed rate, Bitcoin price, and Israel PM markets.

Before: kalshi_count=0, opportunities=0
After: kalshi_count=651, opportunities=20

Changes in src/api/kalshi-client.ts and src/api/arbitrage-detector.ts

Previously areMarketsSimilar() matched on loose keyword overlap, producing 20/20 false positive arbitrage opportunities including:
- Israel ceasefire vs Israeli PM election (different events)
- Fed no change April vs Fed rate above 5% June (different dates/strikes)
- BTC above $70K vs BTC range $60K-$65K (different thresholds)

New matching requires ALL of:
1. Category match
2. Same timeframe (month normalization: apr=april)
3. Same strike/threshold numbers
4. 3+ shared meaningful words (expanded stop list)
5. Jaccard similarity > 0.6

Result: 0 false positives. Zero opportunities in the current data slice is the correct answer: these platforms don't have identical bets at tradeable spreads in this sample.

Changes in src/api/arbitrage-detector.ts

Added equivalence mappings for common prediction market phrases that mean the same thing but use different words.

Before synonym expansion: 'Trump out as President' vs 'Trump resign before term' failed Gate 4 with only 2 shared words. After expansion: 11 shared words, correctly identified as the same bet.

Synonym groups cover: leaving office (resign/out/removed/leave), price direction (above/over/higher), agreements (ceasefire/peace deal/truce), elections (win/elected/confirmed).

False positives unaffected: they fail at earlier gates (timeframe, strike, category) before synonym expansion runs.

Changes in src/api/arbitrage-detector.ts
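One way to treat synonyms as equivalent is to map every word to a canonical representative of its group before comparing. The sketch below uses the single-word groups listed above; the names are hypothetical, and this canonicalisation approach counts each concept once, so its shared-word counts will differ from the expanded-set counts quoted in the commit message.

```typescript
// Single-word synonym groups from the commit message (multi-word phrases omitted).
const SYNONYM_GROUPS: string[][] = [
  ["resign", "out", "removed", "leave"],
  ["above", "over", "higher"],
  ["ceasefire", "truce"],
  ["win", "elected", "confirmed"],
];

// Map each word to the first word of its group.
const CANONICAL = new Map<string, string>();
SYNONYM_GROUPS.forEach((group) => {
  const head = group[0];
  if (head === undefined) return;
  group.forEach((word) => CANONICAL.set(word, head));
});

function canonical(word: string): string {
  const lower = word.toLowerCase();
  return CANONICAL.get(lower) ?? lower;
}

// Count concepts shared by two word lists after canonicalisation.
function sharedConceptCount(a: string[], b: string[]): number {
  const left = new Set(a.map(canonical));
  const right = new Set(b.map(canonical));
  let shared = 0;
  left.forEach((w) => {
    if (right.has(w)) shared++;
  });
  return shared;
}
```

Here `["trump", "out"]` and `["trump", "resign"]` share two concepts, because "out" and "resign" canonicalise to the same group.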

…ates

Rewrites areMarketsSimilar() to use a 5-gate AND model that prevents generic word overlap from producing spurious cross-platform matches.

Changes:
- Gate 4a: count direct + synonym-bridge shared words (not expanded-set intersection), preventing "win" expanding to 7 synonyms and inflating shared count for unrelated markets
- Gate 4b: require ≥1 shared topic word that is not a year (2024-2026) or month (apr, july, q3), which eliminates "April 2026" matching Fed vs CPI
- Gate 5: switch from Jaccard to Dice coefficient (2×shared/(|A|+|B|), threshold 0.60), which prevents short Kalshi titles (2 words) from matching any Poly market with those 2 words at 100% overlap
- Expanded STOP_WORDS with 30+ generic prediction-market terms including "win", "election", "prime", "minister", "president", "rates", "real"
- Added EQUIVALENCES: btc↔bitcoin, eth↔ethereum, shutdown↔shut, recession↔gdp/contraction, fall↔drop; removed fed↔fomc (it caused "11 Fed cuts" to match "FOMC rate upper bound", different bets)
- Number magnitude normalisation in meaningfulWords: "100k" → "100000" so "$100K" and "$100,000" compare equal at Gate 4

Test results: 0 false positives, all genuine same-event pairs pass (Trump resign↔removed, BTC 100k, Gaza ceasefire↔truce, Gov shutdown, Fed March raise, Iran attack↔strike, US recession↔GDP contraction)
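The Gate 5 formula and the magnitude normalisation can be sketched as two small functions. The function names are hypothetical, the handling of empty sets is an assumption, and only the "k" suffix mentioned in the commit message is supported.

```typescript
// Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).
function diceCoefficient(a: Set<string>, b: Set<string>): number {
  if (a.size + b.size === 0) return 0; // assumed: two empty titles do not match
  let shared = 0;
  a.forEach((w) => {
    if (b.has(w)) shared++;
  });
  return (2 * shared) / (a.size + b.size);
}

// Threshold from the commit message.
const DICE_THRESHOLD = 0.6;

function passesDiceGate(a: Set<string>, b: Set<string>): boolean {
  return diceCoefficient(a, b) >= DICE_THRESHOLD;
}

// Normalise "$100K", "100k" and "$100,000" to the same token, "100000".
function normalizeNumberToken(token: string): string {
  const cleaned = token.toLowerCase().replace(/[$,]/g, "");
  const match = /^(\d+(?:\.\d+)?)k$/.exec(cleaned);
  if (!match) return cleaned;
  return String(Math.round(parseFloat(match[1]) * 1000));
}
```

For a 2-word title fully contained in a 6-word title, Dice gives 2×2/(2+6) = 0.5, which is below the 0.60 threshold, so containment alone does not pass the gate.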

…positives

Kalshi titles like "...following the Fed's Jun 17, 2026 meeting?" produce "fed" as a meaningful word (apostrophe stripped). This caused Polymarket's "Will 11 Fed rate cuts happen in 2026?" (at 1%) to match Kalshi rate-level markets like "above 3.25% following the Fed's meeting" (at 97%), despite these being different propositions (cumulative cuts vs level at one meeting).

Genuine Fed/FOMC market pairs share action words (raise/cut/pause) and dates, making "fed" unnecessary as a shared anchor. The FOMC abbreviation still works on the Kalshi side when titles use "FOMC" explicitly.

Polymarket's API only supports volume-sorted pagination, with no keyword or category filtering. This adds a post-fetch topic filter so only markets relevant to Kalshi's 24 targeted series are kept.

polymarket-client.ts:
- Add KALSHI_TOPIC_PATTERNS (6 regex groups covering crypto, Fed/CPI/GDP, US politics, geopolitics, AI/tech, UK elections)
- matchesKalshiTopics() filters each market by question text
- Log label changed to "topic-matched" to reflect the new behaviour
- Result: ~15 relevant markets per 100 raw instead of ~90 mixed ones

market-cache.ts:
- Lower POLYMARKET_TARGET_COUNT default 1200 → 300 (keyword filtering drops yield to ~15/page; 300 topic-relevant markets covers Kalshi's 651)
- Raise SOURCE_TIMEOUT_MS default 30s → 60s (Polymarket 20 pages ≈ 30s, Kalshi 24 series × 500ms delay ≈ 15s; both need headroom on cold start)

Result: 300 Poly × 651 Kalshi = 195,300 pairs, 46 arbitrage opportunities found vs 1,467 noise matches before, which is much higher signal quality.
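A post-fetch topic filter of this kind can be sketched as a list of regular expressions tested against each question. The six groups follow the categories named above, but every pattern, the `question` field, and the function names are assumptions, not the contents of KALSHI_TOPIC_PATTERNS.

```typescript
// Hypothetical patterns, one per topic group named in the commit message.
// None use the "g" flag, so RegExp.test() carries no state between calls.
const TOPIC_PATTERNS: RegExp[] = [
  /\b(bitcoin|btc|ethereum|eth|crypto)\b/i, // crypto
  /\b(fed|fomc|cpi|gdp|inflation|recession)\b/i, // economics
  /\b(trump|congress|senate|shutdown|impeach)\b/i, // US politics
  /\b(ceasefire|ukraine|russia|iran|israel|gaza|taiwan)\b/i, // geopolitics
  /\b(ai|openai|agi|nvidia)\b/i, // AI/tech
  /\b(uk|labour|tories|prime minister)\b/i, // UK elections
];

interface MarketLike {
  question: string; // assumed field name
}

function matchesTopics(question: string): boolean {
  return TOPIC_PATTERNS.some((pattern) => pattern.test(question));
}

// Keep only markets whose question text hits at least one topic group.
function filterByTopic<T extends MarketLike>(markets: T[]): T[] {
  return markets.filter((m) => matchesTopics(m.question));
}
```

Word boundaries keep short tokens such as "ai" or "eth" from matching inside longer, unrelated words.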

…sitives

Gate 4b previously passed any pair with ≥1 shared topic word (non-year, non-month). This allowed single-word anchors like 'gdp' to match unrelated markets: South Korea GDP, China GDP, and 'US recession' all matched every Kalshi 'real GDP increase by X%' market, producing 44 identical Kalshi entries in the output.

Raising the threshold to ≥2 shared topic words eliminates those matches. Verified genuine pairs still pass:
- Zelenskyy/Putin meet (4 shared: zelenskyy, putin, meet + 1 more)
- Rupert Lowe next UK PM (2 shared: rupert, lowe)

Result: 46 opportunities → 3 high-quality matches.
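The topic-word gate can be sketched as counting shared words after discarding temporal tokens. The function names and the exact set of temporal tokens are assumptions; in particular, treating any four-digit 19xx/20xx number as a year is broader than the 2024-2026 range mentioned in the commit history.

```typescript
const MONTH_TOKENS = new Set([
  "jan", "january", "feb", "february", "mar", "march", "apr", "april",
  "may", "jun", "june", "jul", "july", "aug", "august",
  "sep", "sept", "september", "oct", "october", "nov", "november",
  "dec", "december",
]);

// Years, quarters (q1-q4), and month names are not topic words.
function isTemporalToken(word: string): boolean {
  return /^(19|20)\d{2}$/.test(word) || /^q[1-4]$/.test(word) || MONTH_TOKENS.has(word);
}

function sharedTopicWords(a: string[], b: string[]): string[] {
  const right = new Set(b.map((w) => w.toLowerCase()));
  const shared = new Set<string>();
  a.forEach((raw) => {
    const w = raw.toLowerCase();
    if (!isTemporalToken(w) && right.has(w)) shared.add(w);
  });
  return Array.from(shared);
}

// Threshold from the commit message: at least 2 shared topic words.
function passesTopicGate(a: string[], b: string[], minShared: number = 2): boolean {
  return sharedTopicWords(a, b).length >= minShared;
}
```

A pair that shares only "gdp" and "2026" fails, since the year is ignored and one topic word is below the threshold, while a pair sharing "rupert" and "lowe" passes.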

vercel · 2026-04-19T22:58:44Z

@galileoeni is attempting to deploy a commit to the Victor's projects Team on Vercel.

A member of the Team first needs to authorize it.

galileoeni added 15 commits April 16, 2026 22:13

galileoeni force-pushed the main branch from f0e11fb to 8dd0df4 Compare April 19, 2026 23:02

galileoeni wants to merge 15 commits into MusashiBot:main from galileoeni:main

galileoeni commented Apr 19, 2026 • edited by TianyiRnj
