@ammar-agent ammar-agent commented Jan 18, 2026

Summary

  • streamline upload-tbench-results.py with shared run metadata, deterministic ordering, and safer parsing
  • simplify leaderboard submission flow with centralized command/JSON handling and clearer artifact grouping
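
For readers without the diff at hand, "deterministic ordering" and "safer parsing" could look roughly like the sketch below. It is illustrative only and is not the code in upload-tbench-results.py: the helper names `load_json_or_none` and `sort_rows`, and the `run_id` / `task_id` keys, are assumptions made for this example.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_json_or_none(path: Path) -> Optional[dict[str, Any]]:
    """Return the JSON object stored at `path`, or None if it is missing or malformed.

    Hypothetical helper: tolerates unreadable files and invalid JSON instead of raising.
    """
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError covers missing/unreadable files; ValueError covers
        # json.JSONDecodeError and UnicodeDecodeError.
        return None
    # Only a top-level JSON object is accepted; lists and scalars are rejected.
    return data if isinstance(data, dict) else None


def sort_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return rows ordered by (run_id, task_id) so repeated runs emit the same order.

    The key names are assumed for illustration; missing keys sort as empty strings.
    """
    return sorted(
        rows,
        key=lambda row: (str(row.get("run_id", "")), str(row.get("task_id", ""))),
    )
```

Sorting on an explicit key tuple, rather than relying on directory listing or dict iteration order, keeps uploaded rows stable between runs, which makes diffs of the output meaningful.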

Testing

  • make static-check (fails: the zizmor known-vulnerable-actions audit received a 403 from the GitHub API)

Generated with mux • Model: openai:gpt-5.2-codex • Thinking: high • Cost: $20.48

- Remove unused extract_model_from_config() and extract_thinking_from_config()
  functions from upload-tbench-results.py (model/thinking are read inline)
- Remove unused n_total_trials variable from build_rows()
- Remove unused _last_environment field from MuxAgent class

Net -15 LoC