fix: reduce false positives in movers and arbitrage detection #12

Open
vaisd wants to merge 1 commit into MusashiBot:main from vaisd:fix/movers-arbitrage-false-positives

Conversation


vaisd commented May 15, 2026

This PR reduces false positives in the movers and arbitrage detection systems by tightening the similarity criteria and making price change calculations more accurate.

Changes Made

arbitrage-detector.ts:

  • Added stop word filtering before the keyword overlap calculation, so common filler words no longer count toward the overlap
  • Removed 'other' category bypass, requiring exact category matches
  • Raised keyword overlap threshold from 3 to 4 shared keywords
  • Tightened entity matching to require 3+ shared entities and 45%+ title similarity
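
A minimal sketch of the stop word filtering and the raised overlap threshold, for illustration only. The stop word list and the function names (`extractKeywords`, `sharedKeywordCount`, `passesKeywordOverlap`) are assumptions and are not taken from arbitrage-detector.ts; only the threshold of 4 comes from the list above.

```typescript
// Hypothetical stop word list; the list used in arbitrage-detector.ts is not shown in this PR.
const STOP_WORDS = new Set<string>([
  "a", "an", "and", "be", "by", "for", "in", "is", "of", "on", "or", "the", "to", "will",
]);

// Threshold stated in the PR description (raised from 3 to 4).
const MIN_SHARED_KEYWORDS = 4;

// Lowercase the title, split on non-alphanumerics, de-duplicate, and drop stop words.
function extractKeywords(title: string): string[] {
  const tokens: string[] = title.toLowerCase().match(/[a-z0-9]+/g) || [];
  const unique = Array.from(new Set(tokens));
  return unique.filter((token) => !STOP_WORDS.has(token));
}

// Count keywords that appear in both titles after stop word removal.
function sharedKeywordCount(titleA: string, titleB: string): number {
  const keywordsB = new Set(extractKeywords(titleB));
  return extractKeywords(titleA).filter((k) => keywordsB.has(k)).length;
}

function passesKeywordOverlap(titleA: string, titleB: string): boolean {
  return sharedKeywordCount(titleA, titleB) >= MIN_SHARED_KEYWORDS;
}
```

With this list, "Will the Fed cut rates in June?" and "Will the Lakers win in June?" share only "june" after filtering. Their raw tokens share four words (will, the, in, june), which is the kind of filler-word match the filtering is meant to remove.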

arbitrage.ts:

  • Added a 0.5 floor clamp on minConfidence to stop callers from bypassing the similarity filter
  • Updated the keyword confidence formula so that 4 shared keywords map to 0.5 confidence, with incremental boosts for each additional shared keyword
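
A rough sketch of how the floor clamp and the keyword confidence mapping could fit together. The names `clampMinConfidence` and `keywordConfidence`, the 0.05 step per extra keyword, and the cap at 1.0 are assumptions made for illustration; the PR only states the 0.5 floor and that 4 shared keywords map to 0.5.

```typescript
// Floor stated in the PR description.
const MIN_CONFIDENCE_FLOOR = 0.5;

// Assumed values: the PR mentions "incremental boosts" but not their size.
const BOOST_PER_EXTRA_KEYWORD = 0.05;
const BASE_KEYWORD_COUNT = 4;

// Clamp a caller-supplied minConfidence into [0.5, 1]. Missing or NaN input falls back to the floor.
function clampMinConfidence(requested?: number): number {
  if (requested === undefined || Number.isNaN(requested)) {
    return MIN_CONFIDENCE_FLOOR;
  }
  return Math.min(1, Math.max(MIN_CONFIDENCE_FLOOR, requested));
}

// Map a shared keyword count to a confidence score: 4 shared -> 0.5, then a small boost per extra keyword.
function keywordConfidence(sharedKeywords: number): number {
  if (sharedKeywords < BASE_KEYWORD_COUNT) {
    return 0;
  }
  const boost = BOOST_PER_EXTRA_KEYWORD * (sharedKeywords - BASE_KEYWORD_COUNT);
  return Math.min(1, MIN_CONFIDENCE_FLOOR + boost);
}
```

Under these assumptions a request with `minConfidence=0.1` is treated as 0.5, so a pair with fewer than 4 shared keywords cannot be surfaced by lowering the parameter.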

price-snapshots.ts:

  • Modified computePriceChange() to normalize changes by actual elapsed time. Raw deltas are scaled down when snapshots aren't exactly 1 hour apart, which prevents sparse or unevenly timed snapshots from overstating movement
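
One plausible reading of the normalization, sketched under stated assumptions: when two snapshots are further apart than the 1 hour target window, the raw delta is scaled by `window / elapsed`, and it is never scaled up. The `PriceSnapshot` shape and the name `normalizedPriceChange` are hypothetical; the actual computePriceChange() in price-snapshots.ts may differ.

```typescript
// Hypothetical snapshot shape: epoch milliseconds and a price in [0, 1].
interface PriceSnapshot {
  timestamp: number;
  price: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Scale the raw delta down when the snapshots span more than the target window.
function normalizedPriceChange(
  older: PriceSnapshot,
  newer: PriceSnapshot,
  windowMs: number = HOUR_MS,
): number {
  const elapsedMs = newer.timestamp - older.timestamp;
  if (elapsedMs <= 0) {
    return 0; // Out-of-order or duplicate snapshots carry no usable signal.
  }
  const rawDelta = newer.price - older.price;
  const scale = Math.min(1, windowMs / elapsedMs); // Never scale a delta up.
  return rawDelta * scale;
}
```

For example, a move from 0.40 to 0.52 across a 3 hour gap is reported as roughly 0.04 per hour instead of 0.12.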

movers.ts:

  • Now rejects minChange values below 0.02 (the smallest precomputed bucket) with a 400 error
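
A framework-agnostic sketch of the minChange check. The result shape, the function name `validateMinChange`, the error wording, and the default of 0.05 used when the parameter is omitted are all assumptions; only the 0.02 minimum and the 400 status come from the description above.

```typescript
// Smallest precomputed bucket, per the PR description.
const MIN_ALLOWED_CHANGE = 0.02;

// Hypothetical default when the query parameter is omitted.
const DEFAULT_MIN_CHANGE = 0.05;

interface MinChangeResult {
  ok: boolean;
  status: number;
  minChange?: number;
  error?: string;
}

// Parse the raw query-string value and reject anything non-numeric or below the minimum.
function validateMinChange(raw?: string): MinChangeResult {
  if (raw === undefined) {
    return { ok: true, status: 200, minChange: DEFAULT_MIN_CHANGE };
  }
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    return { ok: false, status: 400, error: "minChange must be a number" };
  }
  if (value < MIN_ALLOWED_CHANGE) {
    return {
      ok: false,
      status: 400,
      error: `minChange must be at least ${MIN_ALLOWED_CHANGE}`,
    };
  }
  return { ok: true, status: 200, minChange: value };
}
```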

arbitrage-detector.test.ts:

  • Regression tests for confidence clamping, price normalization, and input validation
  • Verification that known false positives are rejected and true matches still pass

Tradeoff

The price change normalization in computePriceChange() can understate real moves that happen quickly within a short window. This tradeoff is intentional: it reduces false positives from sparse snapshot timing, where uneven intervals would otherwise inflate apparent movement.


vercel Bot commented May 15, 2026

@vaisd is attempting to deploy a commit to the Victor's projects Team on Vercel.

A member of the Team first needs to authorize it.

@TianyiRnj
Contributor

Thanks for all the work on this. Tightening the current heuristics definitely moves things in the right direction, and I appreciate the effort to reduce false positives in both movers and arbitrage detection.

One thought for a follow-up PR: instead of continuing to tune keyword overlap thresholds, we may want to try a more semantic matching approach for cross-platform market pairing. Happy to collaborate on that whenever you have bandwidth.

A few concrete options worth exploring:

Option A: BM25 / lexical ranking

  • Replace raw keyword-count thresholding with BM25 scoring.
  • Common words get downweighted automatically, reducing filler-word matches without manual threshold tuning.
  • Cheap, deterministic, and Vercel-friendly — a stronger lexical baseline than the current heuristic.
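
To make Option A concrete, here is a small self-contained BM25 scorer over market titles. It is a sketch and not a proposal for the final code: the tokenizer, the function names, and the parameter defaults (`k1 = 1.2`, `b = 0.75`) are assumptions, and the IDF uses the non-negative `ln(1 + (N - n + 0.5) / (n + 0.5))` variant.

```typescript
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Score every document in `docs` against `query` with Okapi BM25.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const totalLen = tokenized.reduce((sum, doc) => sum + doc.length, 0);
  const avgLen = docs.length > 0 ? totalLen / docs.length : 0;

  // Document frequency: number of documents containing each term.
  const docFreq = new Map<string, number>();
  tokenized.forEach((doc) => {
    Array.from(new Set(doc)).forEach((term) => {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    });
  });

  const queryTerms = Array.from(new Set(tokenize(query)));
  return tokenized.map((doc) => {
    const lengthNorm = avgLen > 0 ? doc.length / avgLen : 0;
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = docFreq.get(term) || 0;
      // Terms that appear in most documents get an IDF close to zero.
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + b * lengthNorm));
    }
    return score;
  });
}
```

A term like "will" that appears in nearly every title contributes very little, while a rarer term like "fed" dominates the score, which is the automatic downweighting described above.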

Option B: Small embedding model

  • Use embeddings for semantic similarity between market titles.
  • BGE-small or MiniLM are good self-contained options — store embeddings in KV and compare with cosine similarity.
  • Best kept in a cron/precompute job rather than the request path.
  • Strongest no-training option if we want to address semantic matching directly.
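
For Option B, the comparison step is just cosine similarity over stored vectors. The sketch below assumes embeddings have already been computed elsewhere (for example in the cron job) and are available as plain number arrays; it deliberately does not call any embedding model or KV API, and the function name is illustrative.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // A zero vector has no direction, so treat it as unrelated.
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

If the embeddings are L2-normalized before being stored, the dot product alone gives the same ranking and the square roots can be skipped.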

Option C: Hybrid

  • BM25 for candidate generation, embeddings for reranking.
  • Probably the best quality-to-complexity tradeoff for cost control and matching quality together.
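
The two-stage flow in Option C could look roughly like this. Everything here is hypothetical: the `ScoredMarket` shape, the `hybridMatch` name, and the `topK` and `minCosine` defaults are placeholders, and the lexical score is assumed to come from whatever first-stage ranker we pick.

```typescript
// Hypothetical candidate shape: a first-stage lexical score plus a precomputed embedding.
interface ScoredMarket {
  id: string;
  lexicalScore: number;
  embedding: number[];
}

interface RerankedMarket {
  id: string;
  similarity: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stage 1: keep the topK candidates by lexical score.
// Stage 2: rerank those by embedding similarity and drop anything below minCosine.
function hybridMatch(
  queryEmbedding: number[],
  markets: ScoredMarket[],
  topK = 10,
  minCosine = 0.8,
): RerankedMarket[] {
  const candidates = markets
    .slice() // Copy so the caller's array is not reordered.
    .sort((x, y) => y.lexicalScore - x.lexicalScore)
    .slice(0, topK);
  return candidates
    .map((m) => ({ id: m.id, similarity: cosine(queryEmbedding, m.embedding) }))
    .filter((m) => m.similarity >= minCosine)
    .sort((x, y) => y.similarity - x.similarity);
}
```

The cost control comes from stage 1: only `topK` embeddings are compared per query, no matter how many markets are in the pool.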

A couple of related improvements that would help regardless of direction:

  • Align category taxonomy between Polymarket and Kalshi rather than relying on exact string matches
  • Add hard guards for conflicting entities, numbers, and dates
  • For movers, prefer a windowed median over a single closest snapshot, and require at least 2 points in the lookback window before emitting a mover
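
As a sketch of the movers suggestion, the baseline price could be the median of all snapshots inside the lookback window, with no mover emitted when fewer than 2 points are available. The `PriceSnapshot` shape and the function names are assumptions for illustration.

```typescript
// Hypothetical snapshot shape: epoch milliseconds and a price in [0, 1].
interface PriceSnapshot {
  timestamp: number;
  price: number;
}

function median(values: number[]): number {
  const sorted = values.slice().sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Return the median price inside [nowMs - lookbackMs, nowMs], or null when
// there are too few points to trust the baseline.
function windowedBaseline(
  snapshots: PriceSnapshot[],
  nowMs: number,
  lookbackMs: number,
  minPoints = 2,
): number | null {
  const prices = snapshots
    .filter((s) => s.timestamp >= nowMs - lookbackMs && s.timestamp <= nowMs)
    .map((s) => s.price);
  if (prices.length < minPoints) {
    return null;
  }
  return median(prices);
}
```

A single outlier snapshot then shifts the baseline far less than it would under closest-snapshot selection, and markets with only one data point in the window are skipped.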

My suggestion would be to track the semantic matching work as a separate follow-up, so we can experiment against a batch of real false positives and false negatives before committing to a specific approach.
