fix: scoring accuracy, Kalshi integration, arbitrage matching, auth resilience, caching, rate limiting #3

Open
galileoeni wants to merge 15 commits into MusashiBot:main from galileoeni:main

@galileoeni galileoeni commented Apr 19, 2026

Summary

  • Keyword scoring denominator fix: Entity double-counting inflated confidence scores. Fixed denominator to count unique entities only — scores now reflect true signal strength.
  • Cron auth resilience: Silent failures when CRON_SECRET was unset allowed unauthorized cron triggers. Endpoint now rejects with 401 if the secret is missing or mismatched.
  • KV stale cache fallback: Previously the feed endpoint fell back to cache only on KV quota errors; other KV failures (network timeouts, auth failures, outages) returned 500. It now serves the last known good cache on any KV error, keeping the feed live during outages.
  • Feed cache memory cap: Feed cache grew unbounded in long-running deployments. Capped at 50 entries with LRU eviction.
  • Input validation on /api/ground-probability: Malformed inputs (NaN probability, empty title) previously propagated into scoring. Added boundary validation with descriptive 400 errors.
  • IP-based rate limiting: All public endpoints now enforce per-IP rate limits (configurable via env) to prevent abuse and runaway clients.
  • Kalshi targeted series fetches: Replaced blind pagination (which surfaced thousands of sports markets first) with targeted series_ticker fetches across 24 curated series — crypto, economics, US politics, geopolitics, AI/tech, UK elections. Fetch time reduced and market relevance improved.
  • Volume ratio arbitrage guard: Arbitrage matches with extreme volume imbalance (>50× difference) are rejected — thin Kalshi markets against liquid Polymarket ones produce unreliable fills.
  • Arbitrage matching precision (5 gates): Rewrote matching with category → timeframe → numeric threshold → shared topic words (≥2) → Dice similarity (≥0.60). Eliminated entire classes of false positives (sports vs politics, single-word GDP anchor matches, cross-country GDP pairs).
  • Polymarket topic filter: Polymarket API only supports volume-sorted pagination. Added post-fetch keyword filter mirroring Kalshi's 24 series so the arbitrage matcher sees comparable markets on both sides instead of unrelated sports/entertainment markets.

Before / After

| | Before | After |
|---|---|---|
| Arbitrage false positives | 1,467 matches (soccer vs Fed rates, GDP cross-country) | 3 high-quality matches |
| Polymarket markets | 422 mixed (sports, Eurovision, etc.) | 300 topic-relevant only |
| Kalshi fetch | Blind pagination; sports dominated first pages | 24 targeted series, ~651 relevant markets |
| Scoring accuracy | Entity double-counting inflated scores | Correct denominator, accurate confidence |
| Feed on KV failure | 500 error | Stale cache served |
| Cron endpoint | Silent pass with missing secret | 401 rejected |
| Rate limiting | None | Per-IP limits on all public endpoints |

Test plan

  • Hit /api/markets/arbitrage and verify only semantically matching pairs appear
  • Confirm Polymarket log shows topic-matched counts (~15 topic-matched markets per 100 raw)
  • Confirm Kalshi log shows all 24 series fetched
  • Trigger cron without CRON_SECRET header — expect 401
  • Flood /api/ground-probability — expect 429 after rate limit threshold
  • Kill KV mid-request — expect feed returns stale data, not 500

Fix MusashiBot#2: Replace DENOMINATOR_CAP=5 with log-scale denominator.
Previously markets with 2 keywords and 40 keywords could both score 1.0.
Now richer markets are proportionally harder to max out, reducing false
positive signals from shallow keyword matches.

Fix MusashiBot#4: Make entity matching mutually exclusive with exact/synonym matching.
Previously a keyword like 'Trump' counted in both exactMatches (+1.0 weight)
AND entityMatches (+2.0 weight) = 3.0 total. Now it only counts once in the
highest bucket (entity = 2.0), eliminating 15-30% confidence inflation on
named-entity-heavy texts.

Both changes are in src/analysis/keyword-matcher.ts
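
A minimal sketch of the two scoring changes above, not the code in keyword-matcher.ts. The entity (2.0) and exact (1.0) weights come from the commit message; the synonym weight, the helper names, and the specific log formula `1 + log2(1 + n)` are assumptions chosen only to illustrate a denominator that grows slowly with keyword count.

```typescript
type Bucket = "entity" | "exact" | "synonym";

// Entity and exact weights are quoted in the commit message;
// the synonym weight is an assumed placeholder.
const BUCKET_WEIGHTS: Record<Bucket, number> = { entity: 2.0, exact: 1.0, synonym: 0.5 };

interface MatchSets {
  entities: Set<string>;
  exact: Set<string>;
  synonyms: Set<string>;
}

// Each keyword lands in exactly one bucket: the highest-weighted one it qualifies for.
function bucketOf(keyword: string, sets: MatchSets): Bucket | null {
  if (sets.entities.has(keyword)) return "entity";
  if (sets.exact.has(keyword)) return "exact";
  if (sets.synonyms.has(keyword)) return "synonym";
  return null;
}

function weightedMatchSum(keywords: string[], sets: MatchSets): number {
  let sum = 0;
  const seen = new Set<string>();
  for (const keyword of keywords) {
    if (seen.has(keyword)) continue; // count each distinct keyword once
    seen.add(keyword);
    const bucket = bucketOf(keyword, sets);
    if (bucket !== null) sum += BUCKET_WEIGHTS[bucket];
  }
  return sum;
}

// Assumed log-scale denominator: grows slowly with the market's keyword count,
// so keyword-rich markets need more matched weight to reach 1.0.
function logScaleDenominator(marketKeywordCount: number): number {
  return 1 + Math.log2(1 + Math.max(0, marketKeywordCount));
}

function keywordConfidence(matchSum: number, marketKeywordCount: number): number {
  return Math.min(1, matchSum / logScaleDenominator(marketKeywordCount));
}
```

With these assumptions, a keyword such as 'trump' that appears in both the entity and exact sets contributes 2.0 rather than 3.0, and the same matched weight scores lower against a 40-keyword market than against a 2-keyword one.
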

Previously, if the CRON_SECRET env var was not set in Vercel, the auth check
could silently pass or silently block depending on whether a header was
sent. This caused tweet collection to stop without any visible error,
making the entire feed go stale.

Now both sides default to empty string via nullish coalescing, and a
missing/empty CRON_SECRET always returns 401 — forcing the operator to
notice the misconfiguration rather than silently failing.

Change is in api/cron/collect-tweets.ts
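
The fail-closed check described above can be sketched as follows. This is an illustration, not the handler in collect-tweets.ts: the function name, the `Bearer` header format, and returning a bare status code are assumptions.

```typescript
// Returns the HTTP status a cron endpoint would answer with.
// Assumes the caller sends "Authorization: Bearer <secret>".
function cronAuthStatus(
  authHeader: string | undefined,
  env: { CRON_SECRET?: string },
): 200 | 401 {
  // Both sides default to "" so undefined never compares equal to undefined.
  const expected = env.CRON_SECRET ?? "";
  const provided = (authHeader ?? "").replace(/^Bearer\s+/i, "");

  // A missing or empty secret is a misconfiguration: reject instead of passing.
  if (expected === "" || provided !== expected) return 401;
  return 200;
}
```

The explicit `expected === ""` test is what keeps an unset secret from matching an absent header.
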

Previously areMarketsSimilar() only checked category and title/keyword
similarity. A $1M Polymarket market could match a $500 Kalshi market,
creating false arbitrage signals on markets too thin to trade.

Now a volume ratio check runs before any similarity logic. Markets where
one side has 50x+ more volume than the other are rejected immediately.
This eliminates untradeable arbitrage noise and saves CPU on similarity
computation for markets that would never fill.

Change is in src/api/arbitrage-detector.ts
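
A rough sketch of such a guard, assuming both volumes are expressed in the same unit (a later commit in this PR notes that Polymarket and Kalshi report volume in different units). The function name and the treatment of zero or non-finite volume are assumptions.

```typescript
const MAX_VOLUME_RATIO = 50;

// True when the two markets are close enough in volume to be worth comparing.
function passesVolumeRatio(
  volumeA: number,
  volumeB: number,
  maxRatio: number = MAX_VOLUME_RATIO,
): boolean {
  const larger = Math.max(volumeA, volumeB);
  const smaller = Math.min(volumeA, volumeB);
  // Zero, negative, or NaN volume cannot form a meaningful ratio: treat as too thin.
  if (!(smaller > 0)) return false;
  // "50x+" in the commit message is read here as ratio >= 50 being rejected.
  return larger / smaller < maxRatio;
}
```

Running this before any similarity computation lets a caller skip the more expensive text comparison for pairs that fail the ratio test.
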

Previously the feed endpoint only fell back to cached data on quota
errors. Network timeouts, auth failures, and Upstash outages returned
a hard 500, breaking the trading bot's polling loop entirely.

Now all KV errors trigger the cache fallback. Stale cached data is
served with stale:true in metadata so clients can detect it. Only when
no cached data exists at all does the endpoint return 503 (service
unavailable) instead of 500 (server error).

Change is in api/feed.ts
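
The fallback order above can be sketched like this. It is a simplified stand-in for api/feed.ts: `readFromKv`, `FeedCache`, and the response shape are hypothetical, and the real endpoint may key its cache differently.

```typescript
interface FeedResult<T> {
  status: 200 | 503;
  data: T | null;
  metadata: { stale: boolean };
}

// Hypothetical holder for the last successful payload.
interface FeedCache<T> {
  lastGood?: T;
}

async function serveFeed<T>(
  readFromKv: () => Promise<T>, // hypothetical KV read; may reject for any reason
  cache: FeedCache<T>,
): Promise<FeedResult<T>> {
  try {
    const fresh = await readFromKv();
    cache.lastGood = fresh; // remember the last good payload
    return { status: 200, data: fresh, metadata: { stale: false } };
  } catch {
    // Quota errors, timeouts, auth failures and outages all land here.
    if (cache.lastGood !== undefined) {
      return { status: 200, data: cache.lastGood, metadata: { stale: true } };
    }
    // Nothing cached yet: signal temporary unavailability rather than a server bug.
    return { status: 503, data: null, metadata: { stale: false } };
  }
}
```

A polling client can keep running on `stale: true` responses and back off only when it sees 503.
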

Previously setFeedCache() added entries to a Map without ever evicting
them. In a warm Vercel lambda serving many unique query combinations,
memory grew unboundedly until the lambda was killed, causing frequent
cold starts and losing the 20s TTL cache advantage.

Now the cache is capped at 50 entries with insertion-order LRU eviction.
Map.keys().next().value returns the key of the oldest entry, which is
evicted before a new one is added. Worst case memory is ~5MB, well within
Vercel's lambda limits.

Change is in api/lib/cache-helper.ts
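
A sketch of a Map-backed capped cache, assuming a hypothetical `CappedCache` class rather than the actual setFeedCache() helper, and omitting the 20s TTL. A Map iterates in insertion order, so evicting the first key removes the oldest-inserted entry; the sketch also re-inserts a key on read, which is one way to make that order track recency of use. Whether the real helper refreshes on read is not stated in the commit.

```typescript
const MAX_ENTRIES = 50;

class CappedCache<V> {
  private readonly entries = new Map<string, V>();

  constructor(private readonly maxEntries: number = MAX_ENTRIES) {}

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert so this key moves to the back of the iteration order.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key); // overwriting also refreshes position
    } else if (this.entries.size >= this.maxEntries) {
      // First key in iteration order is the oldest one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Without the re-insert in `get`, the same structure still bounds memory but evicts by write order only.
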

Previously whitespace-only claims passed validation and produced empty
signals. max_markets values of 0, NaN, or 999 either crashed silently
or returned hard 400 errors when a clamped result would be more useful.

Now:
- Claims are trimmed before validation, rejecting whitespace-only input
- max_markets is clamped to 1-20 range with fallback to 5, instead of
  rejecting out-of-range values entirely
- llm_estimate validation was already correct, left unchanged

Change is in api/ground-probability.ts
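
One way to express the trimming and clamping rules above. The function names are hypothetical, and the split between "clamp" and "fall back" is an assumption: here finite numbers are clamped into 1-20, while missing or non-numeric input falls back to 5.

```typescript
// Clamp a user-supplied max_markets value into [min, max]; use the fallback
// when the input is missing or not a finite number.
function clampMaxMarkets(raw: unknown, min = 1, max = 20, fallback = 5): number {
  let n: number;
  if (typeof raw === "number") {
    n = raw;
  } else if (typeof raw === "string" && raw.trim() !== "") {
    n = Number(raw);
  } else {
    return fallback;
  }
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(n)));
}

// Trim a claim and reject it (null) when nothing is left.
function normalizeClaim(raw: unknown): string | null {
  if (typeof raw !== "string") return null;
  const trimmed = raw.trim();
  return trimmed.length > 0 ? trimmed : null;
}
```

Under these assumptions, 0 becomes 1, 999 becomes 20, and NaN becomes 5, so a slightly wrong request still gets a usable answer.
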

Previously all endpoints were completely open. A misconfigured bot or
bad actor could exhaust Vercel function invocations and KV quota with
unlimited requests.

Now a shared checkRateLimit() helper enforces 30 requests per 60-second
fixed window per IP using Vercel KV atomic increment. Applied to
/api/feed, /api/analyze-text, /api/markets/arbitrage, and
/api/ground-probability.

Rate limiting degrades gracefully — if @vercel/kv is not configured
(local dev), requests pass through without enforcement. Rate limit
check runs after OPTIONS/method guards so preflight requests don't
count against the limit.

New file: api/lib/rate-limit.ts
Modified: api/feed.ts, api/analyze-text.ts, api/markets/arbitrage.ts,
api/ground-probability.ts
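
A fixed-window limiter along these lines can be sketched as below. To keep the example self-contained it swaps Vercel KV for an in-memory counter: `CounterStore` and `MemoryCounterStore` are hypothetical stand-ins for an atomic increment, and the `checkRateLimit` signature shown here is an assumption, not the one in api/lib/rate-limit.ts.

```typescript
// Hypothetical stand-in for an atomic "increment and return new value" operation.
interface CounterStore {
  incr(key: string): number;
}

class MemoryCounterStore implements CounterStore {
  private readonly counts = new Map<string, number>();

  incr(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

const LIMIT = 30;
const WINDOW_MS = 60000;

function checkRateLimit(
  store: CounterStore | null, // null models "KV not configured" (local dev)
  ip: string,
  nowMs: number,
  limit: number = LIMIT,
  windowMs: number = WINDOW_MS,
): { allowed: boolean; remaining: number } {
  // Degrade gracefully: without a store, let every request through.
  if (store === null) return { allowed: true, remaining: limit };

  // All requests inside the same 60-second slot share one counter key.
  const windowId = Math.floor(nowMs / windowMs);
  const count = store.incr(`ratelimit:${ip}:${windowId}`);
  return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
}
```

A handler would call this after its OPTIONS and method guards and answer 429 when `allowed` is false. In a KV-backed version the per-window key would also need an expiry so old windows do not accumulate.
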

Root cause: calculateEdge() only used sentiment.confidence, which returns
0 for factual news ('Trump announces peace deal') because those phrases
aren't in the sentiment keyword lists. A confidence-1.0 keyword match
was completely ignored in edge calculation, so the bot never traded on
breaking news — only on crypto hype language.

Fix: edge now uses Math.max(sentiment.confidence, matchConfidence) so
that high-quality keyword matches generate edge even with neutral
sentiment. The neutral branch in generateSuggestedAction now reads the
price gap: if implied probability > market price → BUY YES, if below →
BUY NO, instead of unconditional HOLD.

Example before: 'Trump announces permanent peace deal with Iran'
→ matched Iran peace deal market at confidence 1.0, price 32¢
→ sentiment neutral, confidence 0 → edge = 0 → HOLD (never trades)

Example after: same text
→ matchConfidence 1.0, priceDiff 0.18 → edge = 0.18 → BUY YES

Changes in src/analysis/signal-generator.ts
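
The before/after example can be reproduced with a small sketch. The commit does not show the full formula, so this assumes edge = confidence × |implied probability − market price|, and an implied probability of 0.50 chosen only so that the 32¢ price yields the quoted priceDiff of 0.18. Function names are hypothetical.

```typescript
type Action = "BUY YES" | "BUY NO" | "HOLD";

// Assumed shape of the edge calculation: the stronger of the two confidence
// signals scales the gap between implied probability and market price.
function edgeFor(
  sentimentConfidence: number,
  matchConfidence: number,
  impliedProbability: number,
  marketPrice: number,
): number {
  const confidence = Math.max(sentimentConfidence, matchConfidence);
  return confidence * Math.abs(impliedProbability - marketPrice);
}

// Neutral-sentiment branch: follow the direction of the price gap.
function neutralAction(impliedProbability: number, marketPrice: number): Action {
  if (impliedProbability > marketPrice) return "BUY YES";
  if (impliedProbability < marketPrice) return "BUY NO";
  return "HOLD";
}
```

With sentiment confidence 0 and match confidence 1.0, the old sentiment-only input gives an edge of 0, while taking the maximum gives roughly 0.18 and a BUY YES suggestion.
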

Replaced blind pagination (200 sports markets) with targeted
series_ticker fetches for crypto, economics, and politics categories.
Now loads 651 relevant Kalshi markets instead of 200 baseball games.

Removed volume ratio filter — Polymarket reports USD volume while
Kalshi reports contract count, making the ratio structurally
incomparable (300,000x is normal, not illiquidity). The filter was
rejecting 100% of valid pairs.

Result: 20+ real cross-platform arbitrage opportunities detected,
including Fed rate, Bitcoin price, and Israel PM markets.

Before: kalshi_count=0, opportunities=0
After: kalshi_count=651, opportunities=20

Changes in src/api/kalshi-client.ts and src/api/arbitrage-detector.ts

Previously areMarketsSimilar() matched on loose keyword overlap,
producing 20/20 false positive arbitrage opportunities including:
- Israel ceasefire vs Israeli PM election (different events)
- Fed no change April vs Fed rate above 5% June (different dates/strikes)
- BTC above $70K vs BTC range $60K-$65K (different thresholds)

New matching requires ALL of:
1. Category match
2. Same timeframe (month normalization: apr=april)
3. Same strike/threshold numbers
4. 3+ shared meaningful words (expanded stop list)
5. Jaccard similarity > 0.6

Result: 0 false positives. Zero opportunities in current data slice
is the correct answer — these platforms don't have identical bets at
tradeable spreads in this sample.

Changes in src/api/arbitrage-detector.ts
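
Gates 2 and 3 can be illustrated with a small sketch. The helper names, the tokenisation, and the k/m/b suffix handling are assumptions, not the code in arbitrage-detector.ts; in particular this version also picks up years and day numbers, which a real matcher would likely need to separate from strikes.

```typescript
const MONTH_NAMES = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Map "apr" or "april" to "april"; anything else to null.
function canonicalMonth(token: string): string | null {
  for (const name of MONTH_NAMES) {
    if (token === name || token === name.slice(0, 3)) return name;
  }
  return null;
}

function extractMonths(title: string): string[] {
  const found = new Set<string>();
  for (const token of title.toLowerCase().split(/[^a-z]+/)) {
    const month = canonicalMonth(token);
    if (month !== null) found.add(month);
  }
  return Array.from(found).sort();
}

const MULTIPLIERS: Record<string, number> = { k: 1e3, m: 1e6, b: 1e9 };

// Pull out numbers, normalising "70K" and "70,000" to the same value.
function extractNumbers(title: string): number[] {
  const pattern = /(\d[\d,]*(?:\.\d+)?)([kmb])?\b/gi;
  const values: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(title)) !== null) {
    const base = parseFloat(match[1].replace(/,/g, ""));
    const suffix = (match[2] ?? "").toLowerCase();
    values.push(base * (MULTIPLIERS[suffix] ?? 1));
  }
  return values.sort((a, b) => a - b);
}

// Gate 2: both titles mention the same set of months (possibly none).
function sameTimeframe(a: string, b: string): boolean {
  return extractMonths(a).join(",") === extractMonths(b).join(",");
}

// Gate 3: both titles carry the same set of numeric thresholds.
function sameStrikes(a: string, b: string): boolean {
  return extractNumbers(a).join(",") === extractNumbers(b).join(",");
}
```

On the false positives listed above, 'Fed no change April' vs 'Fed rate above 5% June' fails the month comparison, and 'BTC above $70K' vs 'BTC range $60K-$65K' fails the strike comparison.
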

Added equivalence mappings for common prediction market phrases that
mean the same thing but use different words. Before synonym expansion:
'Trump out as President' vs 'Trump resign before term' failed Gate 4
with only 2 shared words. After expansion: 11 shared words, correctly
identified as the same bet.

Synonym groups cover: leaving office (resign/out/removed/leave),
price direction (above/over/higher), agreements (ceasefire/peace deal/
truce), elections (win/elected/confirmed).

False positives unaffected — they fail at earlier gates (timeframe,
strike, category) before synonym expansion runs.

Changes in src/api/arbitrage-detector.ts

…ates

Rewrites areMarketsSimilar() to use a 5-gate AND model that prevents
generic word overlap from producing spurious cross-platform matches.

Changes:
- Gate 4a: count direct + synonym-bridge shared words (not expanded-set
  intersection), preventing "win" expanding to 7 synonyms and inflating
  shared count for unrelated markets
- Gate 4b: require ≥1 shared topic word that is not a year (2024-2026)
  or a month/quarter token (apr, july, q3), which eliminates "April 2026"
  matching Fed vs CPI
- Gate 5: switch from Jaccard to Dice coefficient (2×shared/(|A|+|B|),
  threshold 0.60) — prevents short Kalshi titles (2 words) from matching
  any Poly market with those 2 words at 100% overlap
- Expanded STOP_WORDS with 30+ generic prediction-market terms including
  "win", "election", "prime", "minister", "president", "rates", "real"
- Added EQUIVALENCES: btc↔bitcoin, eth↔ethereum, shutdown↔shut,
  recession↔gdp/contraction, fall↔drop; removed fed↔fomc (it caused
  "11 Fed cuts" to match "FOMC rate upper bound", different bets)
- Number magnitude normalisation in meaningfulWords: "100k" → "100000"
  so "$100K" and "$100,000" compare equal at Gate 4

Test results: 0 false positives, all genuine same-event pairs pass
(Trump resign↔removed, BTC 100k, Gaza ceasefire↔truce, Gov shutdown,
Fed March raise, Iran attack↔strike, US recession↔GDP contraction)
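
For reference, the set-similarity measures involved in Gate 5 can be compared directly. The sketch below is generic set arithmetic, not the project's implementation. Two facts are worth keeping in mind: Dice relates to Jaccard by D = 2J / (1 + J), so Dice is never smaller than Jaccard for the same pair, and a "100% overlap" for a two-word title contained in a longer one is what a min-normalised overlap score produces. Which measure the earlier code actually computed is not shown in the commit.

```typescript
function sharedCount(a: Set<string>, b: Set<string>): number {
  let n = 0;
  a.forEach((word) => {
    if (b.has(word)) n += 1;
  });
  return n;
}

// |A ∩ B| / |A ∪ B|
function jaccard(a: Set<string>, b: Set<string>): number {
  const shared = sharedCount(a, b);
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// 2 × |A ∩ B| / (|A| + |B|)
function dice(a: Set<string>, b: Set<string>): number {
  const total = a.size + b.size;
  return total === 0 ? 0 : (2 * sharedCount(a, b)) / total;
}

// |A ∩ B| / min(|A|, |B|): reaches 1.0 whenever the shorter title is a subset.
function overlapCoefficient(a: Set<string>, b: Set<string>): number {
  const smaller = Math.min(a.size, b.size);
  return smaller === 0 ? 0 : sharedCount(a, b) / smaller;
}

// Illustrative titles: a 2-word set fully contained in a 6-word set.
const shortTitle = new Set(["gaza", "ceasefire"]);
const longTitle = new Set(["gaza", "ceasefire", "israel", "hamas", "before", "july"]);

const overlapScore = overlapCoefficient(shortTitle, longTitle); // 2 / 2 = 1
const jaccardScore = jaccard(shortTitle, longTitle);            // 2 / 6 ≈ 0.33
const diceScore = dice(shortTitle, longTitle);                  // 4 / 8 = 0.5
```

In this example the pair scores 1.0 on the overlap coefficient but 0.5 on Dice, so it falls below the 0.60 threshold.
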

…positives

Kalshi titles like "...following the Fed's Jun 17, 2026 meeting?" produce
"fed" as a meaningful word (apostrophe stripped). This caused Polymarket's
"Will 11 Fed rate cuts happen in 2026?" (at 1%) to match Kalshi rate-level
markets like "above 3.25% following the Fed's meeting" (at 97%), despite
these being different propositions (cumulative cuts vs level at one meeting).

Genuine Fed/FOMC market pairs share action words (raise/cut/pause) and
dates, making "fed" unnecessary as a shared anchor. The FOMC abbreviation
still works on the Kalshi side when titles use "FOMC" explicitly.

Polymarket's API only supports volume-sorted pagination, with no keyword
or category filtering. This adds a post-fetch topic filter so only
markets relevant to Kalshi's 24 targeted series are kept.

polymarket-client.ts:
- Add KALSHI_TOPIC_PATTERNS (6 regex groups covering crypto, Fed/CPI/GDP,
  US politics, geopolitics, AI/tech, UK elections)
- matchesKalshiTopics() filters each market by question text
- Log label changed to "topic-matched" to reflect the new behaviour
- Result: ~15 relevant markets per 100 raw instead of ~90 mixed ones

market-cache.ts:
- Lower POLYMARKET_TARGET_COUNT default 1200 → 300 (keyword filtering
  drops yield to ~15/page; 300 topic-relevant markets covers Kalshi's 651)
- Raise SOURCE_TIMEOUT_MS default 30s → 60s (Polymarket 20 pages ≈ 30s,
  Kalshi 24 series × 500ms delay ≈ 15s; both need headroom on cold start)

Result: 300 Poly × 651 Kalshi = 195,300 pairs, 46 arbitrage opportunities
found vs 1,467 noise matches before — much higher signal quality.
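
A post-fetch topic filter of this kind can be sketched as follows. The commit names KALSHI_TOPIC_PATTERNS and matchesKalshiTopics() but does not show their contents, so the patterns, the `EXAMPLE_` prefix, and the `Market` shape below are illustrative assumptions only.

```typescript
// Illustrative patterns, one per topic group named in the commit message.
const EXAMPLE_TOPIC_PATTERNS: RegExp[] = [
  /\b(bitcoin|btc|ethereum|eth|crypto)\b/i,               // crypto
  /\b(fed|fomc|cpi|gdp|inflation|recession)\b/i,          // economics
  /\b(trump|congress|senate|white house)\b/i,             // US politics
  /\b(ceasefire|ukraine|russia|iran|israel|gaza)\b/i,     // geopolitics
  /\b(openai|nvidia|agi)\b/i,                             // AI/tech
  /\b(uk|labour|tory|downing street)\b/i,                 // UK elections
];

interface Market {
  question: string;
}

// True when the question text matches at least one topic group.
function exampleMatchesTopics(question: string): boolean {
  return EXAMPLE_TOPIC_PATTERNS.some((pattern) => pattern.test(question));
}

// Keep only topic-relevant markets from a raw, volume-sorted page.
function filterByTopic<M extends Market>(markets: M[]): M[] {
  return markets.filter((market) => exampleMatchesTopics(market.question));
}
```

The patterns deliberately omit the `g` flag, so `test()` carries no state between calls. Word boundaries keep short tokens such as "eth" or "uk" from matching inside longer words.
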

…sitives

Gate 4b previously passed any pair with ≥1 shared topic word (non-year,
non-month). This allowed single-word anchors like 'gdp' to match unrelated
markets: South Korea GDP, China GDP, and 'US recession' all matched every
Kalshi 'real GDP increase by X%' market, producing 44 identical Kalshi
entries in the output.

Raising the threshold to ≥2 shared topic words eliminates those matches.
Verified genuine pairs still pass:
- Zelenskyy/Putin meet (4 shared: zelenskyy, putin, meet + 1 more)
- Rupert Lowe next UK PM (2 shared: rupert, lowe)

Result: 46 opportunities → 3 high-quality matches.

vercel Bot commented Apr 19, 2026

@galileoeni is attempting to deploy a commit to the Victor's projects Team on Vercel.

A member of the Team first needs to authorize it.
