fix: scoring accuracy, Kalshi integration, arbitrage matching, auth resilience, caching, rate limiting #3
Open
galileoeni wants to merge 15 commits into
Fix MusashiBot#2: Replace DENOMINATOR_CAP=5 with a log-scale denominator. Previously, markets with 2 keywords and markets with 40 keywords could both score 1.0. Now richer markets are proportionally harder to max out, reducing false-positive signals from shallow keyword matches.
Fix MusashiBot#4: Make entity matching mutually exclusive with exact/synonym matching. Previously a keyword like 'Trump' counted in both exactMatches (+1.0 weight) and entityMatches (+2.0 weight), for 3.0 total. Now it counts only once, in the highest bucket (entity = 2.0), eliminating 15-30% confidence inflation on named-entity-heavy texts.
Both changes are in src/analysis/keyword-matcher.ts
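The two scoring fixes above can be sketched roughly as follows. This is a minimal illustration, not the code in keyword-matcher.ts: the function names, the synonym weight, and the exact form of the log-scale denominator (`log2(keywordCount + 1)`) are assumptions; only the exact (1.0) and entity (2.0) weights come from the commit message.

```typescript
// Hypothetical bucket weights. exact = 1.0 and entity = 2.0 are from the commit
// message; the synonym weight is an assumption for illustration.
const WEIGHTS = { entity: 2.0, exact: 1.0, synonym: 0.5 } as const;
type Bucket = keyof typeof WEIGHTS;

// A keyword that qualifies for several buckets is credited once,
// with the weight of the heaviest bucket it belongs to.
function exclusiveWeight(buckets: Bucket[]): number {
  return buckets.reduce((best, b) => Math.max(best, WEIGHTS[b]), 0);
}

// Assumed log-scale denominator: it grows slowly with the number of keywords
// a market has, so keyword-rich markets need more matches to reach 1.0.
function scoreMarket(matches: Map<string, Bucket[]>, marketKeywordCount: number): number {
  let total = 0;
  matches.forEach((buckets) => {
    total += exclusiveWeight(buckets);
  });
  const denominator = Math.max(1, Math.log2(marketKeywordCount + 1));
  return Math.min(1, total / denominator);
}
```

Under these assumptions a single 'Trump' match contributes 2.0 rather than 3.0, and the same match saturates a 2-keyword market but not a 40-keyword one.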
Previously, if the CRON_SECRET env var was not set in Vercel, the auth check could silently pass or silently block depending on whether a header was sent. This caused tweet collection to stop without any visible error, making the entire feed go stale.
Now both sides default to an empty string via nullish coalescing, and a missing/empty CRON_SECRET always returns 401, forcing the operator to notice the misconfiguration rather than failing silently.
Change is in api/cron/collect-tweets.ts
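A fail-closed check along these lines might look like the sketch below. The function name and the `Bearer <secret>` header format are assumptions for illustration, not taken from collect-tweets.ts.

```typescript
// Returns the HTTP status the cron endpoint would respond with.
// configuredSecret stands in for process.env.CRON_SECRET and authHeader for the
// incoming Authorization header (both may be absent).
function cronAuthStatus(
  configuredSecret: string | undefined,
  authHeader: string | undefined
): 200 | 401 {
  const secret = configuredSecret ?? "";
  const header = authHeader ?? "";
  // A missing or empty secret is treated as a misconfiguration: always reject.
  if (secret === "") return 401;
  // Assumed header format: "Bearer <secret>".
  return header === `Bearer ${secret}` ? 200 : 401;
}
```

Rejecting on an empty secret before comparing anything means an unset variable can never match an empty or absent header.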
Previously areMarketsSimilar() only checked category and title/keyword similarity. A $1M Polymarket market could match a $500 Kalshi market, creating false arbitrage signals on markets too thin to trade.
Now a volume ratio check runs before any similarity logic. Markets where one side has 50x+ more volume than the other are rejected immediately. This eliminates untradeable arbitrage noise and saves CPU on similarity computation for markets that would never fill.
Change is in src/api/arbitrage-detector.ts
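As a rough sketch of such a pre-check, assuming both volumes are expressed in the same unit (the function and constant names are hypothetical):

```typescript
// Ratio at or above which a pair is rejected (50x, per the commit message).
const MAX_VOLUME_RATIO = 50;

// Returns true when the two volumes are close enough to be worth comparing.
// Assumes both numbers are in the same unit.
function volumesComparable(volumeA: number, volumeB: number): boolean {
  const low = Math.min(volumeA, volumeB);
  const high = Math.max(volumeA, volumeB);
  // Assumption: a side with zero (or negative) volume is treated as untradeable.
  if (low <= 0) return false;
  return high / low < MAX_VOLUME_RATIO;
}
```

With the figures from the message, a $1M market against a $500 market has a ratio of 2000 and is rejected before any similarity work runs.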
Previously the feed endpoint only fell back to cached data on quota errors. Network timeouts, auth failures, and Upstash outages returned a hard 500, breaking the trading bot's polling loop entirely.
Now all KV errors trigger the cache fallback. Stale cached data is served with stale:true in metadata so clients can detect it. Only when no cached data exists at all does the endpoint return 503 (service unavailable) instead of 500 (server error).
Change is in api/feed.ts
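The fallback decision described above can be sketched as a pure function. The type names, the response shape, and the use of status 200 for stale responses are assumptions; only stale:true and the 503 case come from the commit message.

```typescript
// Outcome of the KV read: either data, or any kind of error (quota, timeout, auth, outage).
type KvOutcome<T> = { ok: true; data: T } | { ok: false; error: unknown };

interface FeedResponse<T> {
  status: 200 | 503;
  body: { data: T; metadata: { stale: boolean } } | { error: string };
}

function resolveFeed<T>(kv: KvOutcome<T>, cached: T | undefined): FeedResponse<T> {
  if (kv.ok) {
    return { status: 200, body: { data: kv.data, metadata: { stale: false } } };
  }
  // Any KV error, not just quota errors, falls back to the cache when one exists.
  if (cached !== undefined) {
    return { status: 200, body: { data: cached, metadata: { stale: true } } };
  }
  // Nothing to serve at all: report the service as unavailable rather than a generic 500.
  return { status: 503, body: { error: "Feed temporarily unavailable" } };
}
```

A polling client can keep running on stale data and check `metadata.stale` to decide whether to trade on it.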
Previously setFeedCache() added entries to a Map without ever evicting them. In a warm Vercel lambda serving many unique query combinations, memory grew unboundedly until the lambda was killed, causing frequent cold starts and losing the 20s TTL cache advantage.
Now the cache is capped at 50 entries with insertion-order eviction (FIFO; this behaves as LRU only if entries are re-inserted on read). Map.keys().next().value returns the oldest key, whose entry is evicted before a new one is added. Worst-case memory is ~5MB, well within Vercel's lambda limits.
Change is in api/lib/cache-helper.ts
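A minimal sketch of the bounded cache, relying on the fact that a JavaScript Map iterates keys in insertion order. The TTL handling is omitted, and the `maxEntries` parameter and the re-insert-on-overwrite behaviour are assumptions of this sketch rather than details from cache-helper.ts.

```typescript
const MAX_ENTRIES = 50;
const feedCache = new Map<string, unknown>();

function setFeedCache(key: string, value: unknown, maxEntries: number = MAX_ENTRIES): void {
  if (feedCache.has(key)) {
    // Overwriting an existing key: delete first so it moves to the newest position
    // and nothing else needs to be evicted.
    feedCache.delete(key);
  } else if (feedCache.size >= maxEntries) {
    // Map preserves insertion order, so the first key is the oldest entry.
    const oldestKey = feedCache.keys().next().value;
    if (oldestKey !== undefined) feedCache.delete(oldestKey);
  }
  feedCache.set(key, value);
}
```

Eviction here depends only on when a key was written; reads do not refresh an entry's position.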
Previously whitespace-only claims passed validation and produced empty signals. max_markets values of 0, NaN, or 999 either crashed silently or returned hard 400 errors when a clamped result would be more useful.
Now:
- Claims are trimmed before validation, rejecting whitespace-only input
- max_markets is clamped to the 1-20 range with a fallback to 5, instead of rejecting out-of-range values entirely
- llm_estimate validation was already correct, left unchanged
Change is in api/ground-probability.ts
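The trimming and clamping rules might be sketched like this. The helper names are hypothetical, and the split between "fallback to 5" (missing or non-numeric input) and "clamp" (numeric but out of range) is an assumption about how the two rules combine.

```typescript
// Returns the trimmed claim, or null when the input is missing or whitespace-only.
function normalizeClaim(raw: unknown): string | null {
  if (typeof raw !== "string") return null;
  const trimmed = raw.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Clamps max_markets into 1..20; missing or non-numeric input falls back to 5.
function clampMaxMarkets(raw: unknown): number {
  if (raw === undefined || raw === null) return 5;
  const n = typeof raw === "number" ? raw : Number(raw);
  if (!Number.isFinite(n)) return 5;
  return Math.min(20, Math.max(1, Math.trunc(n)));
}
```

Under this reading, 0 becomes 1, 999 becomes 20, and NaN or an absent value becomes 5.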
Previously all endpoints were completely open. A misconfigured bot or bad actor could exhaust Vercel function invocations and KV quota with unlimited requests.
Now a shared checkRateLimit() helper enforces 30 requests per 60-second fixed window per IP using Vercel KV atomic increment. Applied to /api/feed, /api/analyze-text, /api/markets/arbitrage, and /api/ground-probability.
Rate limiting degrades gracefully: if @vercel/kv is not configured (local dev), requests pass through without enforcement. The rate limit check runs after OPTIONS/method guards so preflight requests don't count against the limit.
New file: api/lib/rate-limit.ts
Modified: api/feed.ts, api/analyze-text.ts, api/markets/arbitrage.ts, api/ground-probability.ts
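The fixed-window scheme can be illustrated with an in-memory counter standing in for the KV store. This is a sketch only: the function name, key format, and return shape are assumptions, and a Map is used in place of the atomic increment so the example runs without any external service.

```typescript
const LIMIT = 30;          // requests allowed per window
const WINDOW_MS = 60_000;  // 60-second fixed window

// In-memory stand-in for the KV counter. In a KV store the key would also carry
// an expiry so old windows clean themselves up; this sketch never evicts them.
const counters = new Map<string, number>();

function checkRateLimitSketch(ip: string, nowMs: number): { allowed: boolean; remaining: number } {
  // All requests in the same 60s bucket share one counter key.
  const windowId = Math.floor(nowMs / WINDOW_MS);
  const key = `ratelimit:${ip}:${windowId}`;
  const count = (counters.get(key) ?? 0) + 1; // would be an atomic increment in KV
  counters.set(key, count);
  return { allowed: count <= LIMIT, remaining: Math.max(0, LIMIT - count) };
}
```

A fixed window is simple and cheap (one increment per request), at the cost of allowing a short burst across a window boundary.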
Root cause: calculateEdge() only used sentiment.confidence, which returns
0 for factual news ('Trump announces peace deal') because those phrases
aren't in the sentiment keyword lists. A confidence-1.0 keyword match
was completely ignored in edge calculation, so the bot never traded on
breaking news — only on crypto hype language.
Fix: edge now uses Math.max(sentiment.confidence, matchConfidence) so
that high-quality keyword matches generate edge even with neutral
sentiment. The neutral branch in generateSuggestedAction now reads the
price gap: if implied probability > market price → BUY YES, if below →
BUY NO, instead of unconditional HOLD.
Example before: 'Trump announces permanent peace deal with Iran'
→ matched Iran peace deal market at confidence 1.0, price 32¢
→ sentiment neutral, confidence 0 → edge = 0 → HOLD (never trades)
Example after: same text
→ matchConfidence 1.0, priceDiff 0.18 → edge = 0.18 → BUY YES
Changes in src/analysis/signal-generator.ts
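The before/after example can be reproduced with a small sketch. The function names and the formula edge = |implied probability - price| × confidence are assumptions chosen to be consistent with the two examples above, and the implied probability of 0.50 is picked only so that the price gap at 32¢ comes out to 0.18.

```typescript
type Action = "BUY YES" | "BUY NO" | "HOLD";

// Assumed edge formula: price gap scaled by the stronger of the two confidences.
function calculateEdgeSketch(
  sentimentConfidence: number,
  matchConfidence: number,
  impliedProbability: number,
  marketPrice: number
): number {
  const confidence = Math.max(sentimentConfidence, matchConfidence);
  return Math.abs(impliedProbability - marketPrice) * confidence;
}

// Neutral-sentiment branch: trade in the direction of the price gap.
function neutralAction(impliedProbability: number, marketPrice: number): Action {
  if (impliedProbability > marketPrice) return "BUY YES";
  if (impliedProbability < marketPrice) return "BUY NO";
  return "HOLD";
}

// Example values: sentiment confidence 0, match confidence 1.0, price 0.32,
// and a hypothetical implied probability of 0.50.
const edge = calculateEdgeSketch(0, 1.0, 0.5, 0.32);
const action = neutralAction(0.5, 0.32);
```

With sentiment confidence alone the same inputs would give an edge of 0, which is the old HOLD behaviour.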
Replaced blind pagination (200 sports markets) with targeted series_ticker fetches for crypto, economics, and politics categories. Now loads 651 relevant Kalshi markets instead of 200 baseball games.
Removed volume ratio filter: Polymarket reports USD volume while Kalshi reports contract count, making the ratio structurally incomparable (300,000x is normal, not illiquidity). The filter was rejecting 100% of valid pairs.
Result: 20+ real cross-platform arbitrage opportunities detected, including Fed rate, Bitcoin price, and Israel PM markets.
Before: kalshi_count=0, opportunities=0
After: kalshi_count=651, opportunities=20
Changes in src/api/kalshi-client.ts and src/api/arbitrage-detector.ts
Previously areMarketsSimilar() matched on loose keyword overlap, producing 20/20 false positive arbitrage opportunities including:
- Israel ceasefire vs Israeli PM election (different events)
- Fed no change April vs Fed rate above 5% June (different dates/strikes)
- BTC above $70K vs BTC range $60K-$65K (different thresholds)
New matching requires ALL of:
1. Category match
2. Same timeframe (month normalization: apr=april)
3. Same strike/threshold numbers
4. 3+ shared meaningful words (expanded stop list)
5. Jaccard similarity > 0.6
Result: 0 false positives. Zero opportunities in the current data slice is the correct answer: these platforms don't have identical bets at tradeable spreads in this sample.
Changes in src/api/arbitrage-detector.ts
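Gates 2 and 3 (same timeframe, same strike numbers) could be approximated as below. This is a sketch under stated assumptions: all function names are hypothetical, months are matched by full name or three-letter abbreviation only, and magnitude suffixes such as "K" are not expanded here.

```typescript
const MONTH_NAMES = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Maps "apr" or "april" to "april"; returns undefined for non-month words.
function canonicalMonth(word: string): string | undefined {
  return MONTH_NAMES.find((m) => word === m || word === m.slice(0, 3));
}

function extractMonths(title: string): Set<string> {
  const months = new Set<string>();
  for (const word of title.toLowerCase().match(/[a-z]+/g) ?? []) {
    const month = canonicalMonth(word);
    if (month !== undefined) months.add(month);
  }
  return months;
}

// Collects numeric tokens, ignoring "$", "%" and thousands separators.
function extractNumbers(title: string): Set<string> {
  const withoutSeparators = title.replace(/,(?=\d{3})/g, "");
  return new Set(withoutSeparators.match(/\d+(?:\.\d+)?/g) ?? []);
}

function sameSet(a: Set<string>, b: Set<string>): boolean {
  return a.size === b.size && Array.from(a).every((x) => b.has(x));
}

function passesTimeframeAndStrike(titleA: string, titleB: string): boolean {
  return (
    sameSet(extractMonths(titleA), extractMonths(titleB)) &&
    sameSet(extractNumbers(titleA), extractNumbers(titleB))
  );
}
```

Because these gates compare exact sets, a pair that differs in any month or any number is rejected before word overlap is considered.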
Added equivalence mappings for common prediction market phrases that mean the same thing but use different words.
Before synonym expansion: 'Trump out as President' vs 'Trump resign before term' failed Gate 4 with only 2 shared words. After expansion: 11 shared words, correctly identified as the same bet.
Synonym groups cover: leaving office (resign/out/removed/leave), price direction (above/over/higher), agreements (ceasefire/peace deal/truce), elections (win/elected/confirmed).
False positives are unaffected: they fail at earlier gates (timeframe, strike, category) before synonym expansion runs.
Changes in src/api/arbitrage-detector.ts
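One way to treat synonyms as equal when counting shared words is sketched below. The group contents are taken from the categories listed above (single words only, so "peace deal" is left out), while the function names and the rule that each word can be matched at most once are assumptions of this sketch; it is not meant to reproduce the shared-word counts quoted in the message.

```typescript
// Hypothetical synonym groups based on the categories named in the commit message.
const SYNONYM_GROUPS: string[][] = [
  ["resign", "out", "removed", "leave"],
  ["above", "over", "higher"],
  ["ceasefire", "truce"],
  ["win", "elected", "confirmed"],
];

// Index from word to the id of the group it belongs to.
const GROUP_OF = new Map<string, number>();
SYNONYM_GROUPS.forEach((group, id) => {
  group.forEach((word) => GROUP_OF.set(word, id));
});

function equivalent(a: string, b: string): boolean {
  if (a === b) return true;
  const groupA = GROUP_OF.get(a);
  return groupA !== undefined && groupA === GROUP_OF.get(b);
}

// Counts words of A that have an identical or synonymous partner in B.
// Each word in B can be used as a partner only once.
function countShared(wordsA: string[], wordsB: string[]): number {
  const unusedB = wordsB.slice();
  let shared = 0;
  for (const a of wordsA) {
    const i = unusedB.findIndex((b) => equivalent(a, b));
    if (i >= 0) {
      shared++;
      unusedB.splice(i, 1);
    }
  }
  return shared;
}
```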
…ates
Rewrites areMarketsSimilar() to use a 5-gate AND model that prevents generic word overlap from producing spurious cross-platform matches.
Changes:
- Gate 4a: count direct + synonym-bridge shared words (not expanded-set intersection), preventing "win" expanding to 7 synonyms and inflating shared count for unrelated markets
- Gate 4b: require ≥1 shared topic word that is not a year (2024-2026) or month (apr, july, q3); eliminates "April 2026" matching Fed vs CPI
- Gate 5: switch from Jaccard to Dice coefficient (2×shared/(|A|+|B|), threshold 0.60); prevents short Kalshi titles (2 words) from matching any Poly market with those 2 words at 100% overlap
- Expanded STOP_WORDS with 30+ generic prediction-market terms including "win", "election", "prime", "minister", "president", "rates", "real"
- Added EQUIVALENCES: btc↔bitcoin, eth↔ethereum, shutdown↔shut, recession↔gdp/contraction, fall↔drop; removed fed↔fomc (it caused "11 Fed cuts" to match "FOMC rate upper bound", different bets)
- Number magnitude normalisation in meaningfulWords: "100k" → "100000" so "$100K" and "$100,000" compare equal at Gate 4
Test results: 0 false positives, all genuine same-event pairs pass (Trump resign↔removed, BTC 100k, Gaza ceasefire↔truce, Gov shutdown, Fed March raise, Iran attack↔strike, US recession↔GDP contraction)
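Two of the pieces above lend themselves to a short sketch: the Dice coefficient from Gate 5 and the number magnitude normalisation. The function names and the supported suffixes (k, m, b) are assumptions; the Dice formula 2×shared/(|A|+|B|) and the 0.60 threshold are from the commit message.

```typescript
// Expands "100k" or "$100,000" to "100000". Tokens that are not numbers are
// returned unchanged; plain decimals such as "3.25" keep their value.
function normalizeMagnitude(token: string): string {
  const m = /^\$?(\d[\d,]*(?:\.\d+)?)([kmb])?$/i.exec(token);
  if (!m) return token;
  const digits = m[1].replace(/,/g, "");
  const suffix = (m[2] ?? "").toLowerCase();
  if (suffix === "") return digits;
  const multiplier = suffix === "k" ? 1e3 : suffix === "m" ? 1e6 : 1e9;
  return String(Math.round(Number(digits) * multiplier));
}

// Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).
function diceCoefficient(a: Set<string>, b: Set<string>): number {
  if (a.size + b.size === 0) return 0;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared++;
  });
  return (2 * shared) / (a.size + b.size);
}

// A 2-word title fully contained in a 6-word title scores 2*2/(2+6) = 0.5,
// which is below the 0.60 threshold.
const shortTitle = new Set(["btc", normalizeMagnitude("100k")]);
const longTitle = new Set(["btc", normalizeMagnitude("$100,000"), "reach", "december", "close", "price"]);
const similarity = diceCoefficient(shortTitle, longTitle);
```

Because the denominator includes both set sizes, a very short title cannot reach the threshold just by being a subset of a much longer one.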
…positives
Kalshi titles like "...following the Fed's Jun 17, 2026 meeting?" produce "fed" as a meaningful word (apostrophe stripped). This caused Polymarket's "Will 11 Fed rate cuts happen in 2026?" (at 1%) to match Kalshi rate-level markets like "above 3.25% following the Fed's meeting" (at 97%), despite these being different propositions (cumulative cuts vs level at one meeting).
Genuine Fed/FOMC market pairs share action words (raise/cut/pause) and dates, making "fed" unnecessary as a shared anchor. The FOMC abbreviation still works on the Kalshi side when titles use "FOMC" explicitly.
Polymarket's API only supports volume-sorted pagination, with no keyword or category filtering. This adds a post-fetch topic filter so only markets relevant to Kalshi's 24 targeted series are kept.
polymarket-client.ts:
- Add KALSHI_TOPIC_PATTERNS (6 regex groups covering crypto, Fed/CPI/GDP, US politics, geopolitics, AI/tech, UK elections)
- matchesKalshiTopics() filters each market by question text
- Log label changed to "topic-matched" to reflect the new behaviour
- Result: ~15 relevant markets per 100 raw instead of ~90 mixed ones
market-cache.ts:
- Lower POLYMARKET_TARGET_COUNT default 1200 → 300 (keyword filtering drops yield to ~15/page; 300 topic-relevant markets covers Kalshi's 651)
- Raise SOURCE_TIMEOUT_MS default 30s → 60s (Polymarket 20 pages ≈ 30s, Kalshi 24 series × 500ms delay ≈ 15s; both need headroom on cold start)
Result: 300 Poly × 651 Kalshi = 195,300 pairs, 46 arbitrage opportunities found vs 1,467 noise matches before, which is much higher signal quality.
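A post-fetch topic filter of this kind might be sketched as follows. The names KALSHI_TOPIC_PATTERNS and matchesKalshiTopics come from the commit message, but the regexes themselves are illustrative guesses at the six groups, not the patterns in polymarket-client.ts.

```typescript
// Illustrative patterns only: one regex per topic group named in the commit message.
const KALSHI_TOPIC_PATTERNS: RegExp[] = [
  /\b(bitcoin|btc|ethereum|eth|crypto)\b/i,                 // crypto
  /\b(fed|fomc|cpi|gdp|inflation|recession)\b/i,            // Fed/CPI/GDP
  /\b(trump|congress|senate|government shutdown)\b/i,       // US politics
  /\b(ceasefire|ukraine|russia|iran|israel|gaza)\b/i,       // geopolitics
  /\b(openai|ai|nvidia)\b/i,                                // AI/tech
  /\b(uk|prime minister|labour|tory)\b/i,                   // UK elections
];

// True when the market's question text matches at least one topic group.
function matchesKalshiTopics(question: string): boolean {
  return KALSHI_TOPIC_PATTERNS.some((pattern) => pattern.test(question));
}

// Example: filtering one hypothetical page of raw question texts.
const rawQuestions = [
  "Will Bitcoin close above $100k in 2026?",
  "Will the Yankees win the World Series?",
  "Will the Fed cut rates in March?",
];
const topicMatched = rawQuestions.filter(matchesKalshiTopics);
```

The patterns deliberately omit the `g` flag, since a global regex keeps state between `test()` calls and would give inconsistent results when reused.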
…sitives
Gate 4b previously passed any pair with ≥1 shared topic word (non-year, non-month). This allowed single-word anchors like 'gdp' to match unrelated markets: South Korea GDP, China GDP, and 'US recession' all matched every Kalshi 'real GDP increase by X%' market, producing 44 identical Kalshi entries in the output.
Raising the threshold to ≥2 shared topic words eliminates those matches.
Verified genuine pairs still pass:
- Zelenskyy/Putin meet (4 shared: zelenskyy, putin, meet + 1 more)
- Rupert Lowe next UK PM (2 shared: rupert, lowe)
Result: 46 opportunities → 3 high-quality matches.
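The Gate 4b rule can be sketched as a filter over the shared words. The function names and the exact lists of month and quarter tokens are assumptions, and the year check is generalised to any four-digit 19xx/20xx token rather than only 2024-2026.

```typescript
// Month and quarter tokens that do not count as topic words (illustrative list).
const TIME_WORDS = new Set([
  "jan", "january", "feb", "february", "mar", "march", "apr", "april", "may",
  "jun", "june", "jul", "july", "aug", "august", "sep", "september",
  "oct", "october", "nov", "november", "dec", "december",
  "q1", "q2", "q3", "q4",
]);

// A topic word is any shared word that is neither a year nor a month/quarter.
function isTopicWord(word: string): boolean {
  if (/^(19|20)\d{2}$/.test(word)) return false;
  return !TIME_WORDS.has(word);
}

// Gate 4b: require at least minTopicWords shared topic words (2 after this change).
// Assumes sharedWords contains no duplicates.
function passesGate4b(sharedWords: string[], minTopicWords: number = 2): boolean {
  return sharedWords.filter(isTopicWord).length >= minTopicWords;
}
```

With the threshold at 2, a pair sharing only 'gdp' (plus a year) is rejected, while a pair sharing 'rupert' and 'lowe' still passes.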
@galileoeni is attempting to deploy a commit to the Victor's projects Team on Vercel. A member of the Team first needs to authorize it.
Summary
- CRON_SECRET was unset allowed unauthorized cron triggers. Endpoint now rejects with 401 if the secret is missing or mismatched.
- /api/ground-probability: Malformed inputs (NaN probability, empty title) previously propagated into scoring. Added boundary validation with descriptive 400 errors.
- series_ticker fetches across 24 curated series: crypto, economics, US politics, geopolitics, AI/tech, UK elections. Fetch time reduced and market relevance improved.
Before / After
Test plan
- /api/markets/arbitrage and verify only semantically matching pairs appear
- topic-matched counts (~15/100 raw pages)
- CRON_SECRET header: expect 401
- /api/ground-probability: expect 429 after rate limit threshold