Auto-fetch ELO scores instead of hardcoding in generate_model_catalog.py #550

## Problem

`generate_model_catalog.py` hardcodes ELO scores in a static `ELO_SCORES` dict. This causes several issues:

- **Scores go stale** — ELO scores drift as new votes come in on the Arena leaderboard, but our dict stays frozen
- **New models are invisible** — models not in the dict get no score and may be filtered out
- **Multiple leaderboards** — there are different coding leaderboards (Code Arena WebDev, Text Arena Coding) with very different scores for the same model (e.g., Grok scores ~1450 on Text Arena Coding but ~1200 on Code Arena WebDev)
- **Manual errors** — we recently shipped gpt-5.2-codex at 1471 (the score for gpt-5.2-high, a different model) when its actual score is 1336

## Proposed Solution

Fetch ELO scores automatically at generation time instead of maintaining a static dict:

1. **Scrape or API-fetch** the Arena leaderboard (e.g., `arena.ai/leaderboard/code`) to get current scores
2. **Fuzzy-match** Arena model names to litellm model IDs (e.g., `gpt-5-medium` → `gpt-5`; names that already agree, such as `claude-opus-4-6`, map to themselves)
3. **Fall back** to the static dict only for models not on any leaderboard (local models like `lm_studio/`, niche providers)
4. **Cache** fetched scores with a TTL so we don't hit the leaderboard on every run
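
Steps 2 and 3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script's current code: `normalize`, `resolve_scores`, the suffix list, and the `0.9` cutoff are all hypothetical, and `arena_scores` is assumed to be a plain dict of Arena model name to score produced by step 1. The cutoff is kept strict on purpose so that a near-miss such as `gpt-5.2-codex` vs. `gpt-5.2-high` is not treated as the same model.

```python
import difflib
import re

# Hypothetical list of variant suffixes that Arena appends to a base model name.
_VARIANT_SUFFIX = re.compile(r"-(low|medium|high)$")


def normalize(name: str) -> str:
    """Lowercase, drop any provider prefix, and strip one variant suffix."""
    base = name.lower().split("/")[-1]
    return _VARIANT_SUFFIX.sub("", base)


def resolve_scores(model_ids, arena_scores, static_scores, cutoff=0.9):
    """Map each litellm model ID to a (score, source) pair.

    source is "arena" for a leaderboard match and "static" for the fallback
    dict. Models found in neither are left out of the result.
    """
    # If several Arena variants collapse to the same key, the last one wins
    # here; a real implementation would need an explicit rule for that case.
    by_norm = {normalize(name): score for name, score in arena_scores.items()}
    candidates = list(by_norm)

    resolved = {}
    for model_id in model_ids:
        key = normalize(model_id)
        match = difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff)
        if match:
            resolved[model_id] = (by_norm[match[0]], "arena")
        elif model_id in static_scores:
            resolved[model_id] = (static_scores[model_id], "static")
    return resolved
```

Returning the source alongside the score keeps the provenance that the inline comments track today, without relying on comments.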

## Context

The current `ELO_SCORES` dict has ~40 entries with inline comments like `# [CODE] #22` and `# [EST]` to track provenance. Many `[EST]` entries are rough guesses. Fresh leaderboard data would replace these guesses for every model that appears on a leaderboard; estimates would remain only for models that appear on none.

## Acceptance Criteria

- [ ] `generate_model_catalog.py` fetches current ELO scores from at least one Arena leaderboard
- [ ] Graceful fallback to static scores if fetch fails (network issues, rate limits)
- [ ] Model name fuzzy matching handles Arena ↔ litellm naming differences
- [ ] Generated `llm_model.csv` has accurate, up-to-date scores without manual dict maintenance
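
The caching and graceful-fallback criteria could look something like the sketch below. Everything here is an assumption for illustration: `load_scores`, the cache file layout, and the one-day default TTL are hypothetical, and `fetch` stands in for whatever scraper or API client ends up being used (it is only assumed to return a dict of model name to score, or raise on failure).

```python
import json
import time
from pathlib import Path


def load_scores(fetch, cache_path: Path, static_scores, ttl_seconds=86400):
    """Return (scores, source), where source is "cache", "fetched" or "static"."""
    # 1. Use the cache if it exists, parses, and is younger than the TTL.
    cached = None
    if cache_path.exists():
        try:
            cached = json.loads(cache_path.read_text())
        except (OSError, ValueError):
            cached = None  # unreadable or corrupt cache: treat as missing
    if (
        isinstance(cached, dict)
        and "fetched_at" in cached
        and "scores" in cached
        and time.time() - cached["fetched_at"] < ttl_seconds
    ):
        return cached["scores"], "cache"

    # 2. Otherwise try a fresh fetch; any failure falls back to the static dict.
    try:
        scores = fetch()
    except Exception:
        return dict(static_scores), "static"

    # 3. Persist the fresh scores with a timestamp for the next run.
    cache_path.write_text(json.dumps({"fetched_at": time.time(), "scores": scores}))
    return scores, "fetched"
```

Passing `fetch` in as a callable keeps the network code separate, so the fallback path can be exercised in tests by handing in a function that raises.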
