Benchmark (unlabeled) Data Collection MVP #18

Open
Ryan-Siglag wants to merge 5 commits into MusashiBot:main from Ryan-Siglag:benchmark

@Ryan-Siglag

Benchmark (unlabeled) Data Collection MVP

Adds an automated pipeline that snapshots tweets and prediction market data into a single timestamped JSON file.

What it does

Fetches from three sources in a single run:

  • Twitter — latest tweets from high-priority accounts — batchFetchTimelines()
  • Polymarket — current prediction markets — fetchPolymarkets()
  • Kalshi — current prediction markets — fetchKalshiMarkets()

Rate-limited APIs are retried with exponential backoff. If a source fails after all retries, it is skipped, and the rest of the snapshot is still saved.
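
The retry-then-skip behavior could look roughly like the sketch below. The diff itself is not shown in this conversation, so `withRetry`, `collectOrSkip`, `backoffDelay`, and the `RetryOptions` fields are hypothetical names, not the PR's actual code. For brevity the sketch retries on any error, whereas the description only mentions rate-limited requests.

```typescript
// Hypothetical helpers: a sketch of retry with exponential backoff,
// not the implementation in this PR.
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

interface RetryOptions {
  retries: number;     // number of retries after the first attempt
  baseDelayMs: number; // delay before the first retry
  sleep?: Sleep;       // injectable so tests do not have to wait
}

// The delay doubles on each retry: base, 2 * base, 4 * base, ...
function backoffDelay(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

// Calls fn until it succeeds or the retry budget is used up,
// then rethrows the last error.
async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? realSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.retries) {
        await sleep(backoffDelay(attempt, opts.baseDelayMs));
      }
    }
  }
  throw lastError;
}

// Wraps one source: if it still fails after all retries, log a warning
// and return an empty list so the rest of the snapshot can be saved.
async function collectOrSkip<T>(
  name: string,
  fn: () => Promise<T[]>,
  opts: RetryOptions,
): Promise<T[]> {
  try {
    return await withRetry(fn, opts);
  } catch (err) {
    console.warn(`${name} failed after ${opts.retries} retries; skipping`, err);
    return [];
  }
}
```

Under these assumptions, each source would be wrapped separately, for example `collectOrSkip("kalshi", fetchKalshiMarkets, opts)`, so that one failing API cannot abort the whole run.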

Output format

Each run writes a file to src/benchmark/unlabeled_data/results_<timestamp>.json:

{
  "meta": {
    "collected_at": "2026-05-28T20:09:04.997Z",
    "window_start": "2026-05-28T19:54:04.997Z",
    "window_end": "2026-05-28T20:09:04.997Z",
    "window_minutes": 15,
    "twitter_accounts_queried": ["Reuters", "AP", ...],
    "tweet_count": 42,
    "market_count": 300,
    "polymarket_count": 150,
    "kalshi_count": 150
  },
  "tweets": [ ... ],
  "markets": [ ... ]
}
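
For readers consuming these files, the shape above could be typed roughly as follows. The field names come from the sample; the `Snapshot` and `SnapshotMeta` type names and the `buildSnapshot` helper are hypothetical. The relationships it encodes (`window_end` equal to `collected_at`, `window_start` equal to `window_end` minus `window_minutes`, and `market_count` equal to the sum of the two per-source counts) are inferred from the sample values, not confirmed by the code.

```typescript
// Hypothetical types mirroring the sample output; not taken from the PR diff.
interface SnapshotMeta {
  collected_at: string; // ISO 8601 UTC timestamp
  window_start: string;
  window_end: string;
  window_minutes: number;
  twitter_accounts_queried: string[];
  tweet_count: number;
  market_count: number;
  polymarket_count: number;
  kalshi_count: number;
}

interface Snapshot<Tweet = unknown, Market = unknown> {
  meta: SnapshotMeta;
  tweets: Tweet[];
  markets: Market[];
}

// Assembles a snapshot, assuming the window ends at collection time and
// that markets from both sources are concatenated into one list.
function buildSnapshot<Tweet, Market>(
  collectedAt: Date,
  windowMinutes: number,
  accounts: string[],
  tweets: Tweet[],
  polymarkets: Market[],
  kalshiMarkets: Market[],
): Snapshot<Tweet, Market> {
  const windowStart = new Date(
    collectedAt.getTime() - windowMinutes * 60 * 1000,
  );
  return {
    meta: {
      collected_at: collectedAt.toISOString(),
      window_start: windowStart.toISOString(),
      window_end: collectedAt.toISOString(),
      window_minutes: windowMinutes,
      twitter_accounts_queried: accounts,
      tweet_count: tweets.length,
      market_count: polymarkets.length + kalshiMarkets.length,
      polymarket_count: polymarkets.length,
      kalshi_count: kalshiMarkets.length,
    },
    tweets,
    markets: [...polymarkets, ...kalshiMarkets],
  };
}
```

Deriving the counts from the arrays rather than tracking them separately keeps `meta` consistent with the payload, including when a source was skipped and contributed an empty list.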

Running

pnpm collect

@vercel

vercel Bot commented May 29, 2026

@Ryan-Siglag is attempting to deploy a commit to the Victor's projects Team on Vercel.

A member of the Team first needs to authorize it.
