**NEVER change the backend due to missing credentials or CI configuration issues.**
+
+- Research track: always uses SkyPilot (cloud VMs)
+- Algorithmic track: always uses Docker (local)
+
+If CI fails due to credentials/permissions, fix the credentials - do NOT change the code to use a different backend. The backend choice is intentional for each track's evaluation requirements.
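The backend rule in this hunk could also be enforced in code rather than only in prose. The following is a minimal sketch, not the project's implementation: `Backend`, `TRACK_BACKENDS`, `backend_for`, and `require_credentials` are hypothetical names, and the `docker` identifier is an assumption (only `skypilot` appears as a backend value elsewhere in this diff).

```python
from enum import Enum


class Backend(Enum):
    SKYPILOT = "skypilot"  # cloud VMs
    DOCKER = "docker"      # local containers (identifier assumed)


# Fixed mapping taken from the rule above; the names are hypothetical.
TRACK_BACKENDS = {
    "research": Backend.SKYPILOT,
    "algorithmic": Backend.DOCKER,
}


def backend_for(track: str) -> Backend:
    """Return the fixed backend for a track; there is no fallback backend."""
    try:
        return TRACK_BACKENDS[track]
    except KeyError:
        raise ValueError(f"unknown track: {track!r}") from None


def require_credentials(track: str, have_credentials: bool) -> Backend:
    """Fail loudly on missing credentials instead of switching backends."""
    backend = backend_for(track)
    if not have_credentials:
        raise RuntimeError(
            f"{backend.value} credentials missing for the {track} track; "
            "fix the credentials rather than changing the backend"
        )
    return backend
```

The point of the design is that a credentials failure surfaces as an error to be fixed in CI configuration, never as a silent switch to the other backend.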
CONTRIBUTING.md: 10 additions & 6 deletions
@@ -1,6 +1,8 @@
 # Contributing to Frontier-CS
 
-Frontier-CS is currently an **invitation-only** project for new problems.
+> **For Problem Contributors**: Guidelines for creating and submitting new problems to Frontier-CS.
+
+Frontier-CS is currently an **invitation-only** project for new problems.
 Please create a GitHub pull request (PR) with your proposed problem following the guidelines below. After your PR is reviewed and merged, please send any hidden test data and reference solutions to the contact email provided at the end of this document.
 
 
@@ -130,11 +132,11 @@ research/{problem_name}/
 ├── evaluate.sh # Evaluation entry point
 ├── evaluator.py # Scoring logic
 ├── readme # Problem description
-├── reference.py # Reference solution (required for CI validation)
+├── reference.{py,cpp} # Reference solution (required for CI, extension per language)
 └── resources/ # Problem-specific code/data
 ```
 
-> **Note**: The `reference.py` is required for CI validation. When you submit a PR, the CI will automatically run your reference solution and verify it achieves score > 0.
+> **Note**: A reference solution is required for CI validation. Use `reference.py` for Python problems or `reference.cpp` if `language: cpp` in config.yaml. The CI will automatically run your reference solution and verify it achieves score > 0.
 
 ### Solution Interface
 
@@ -331,10 +333,12 @@ When you submit a PR that adds or modifies problems, CI will automatically valid
-| Research |`reference.py`|`research/problems/{name}/reference.py`|
+| Research |`reference.{py,cpp}`|`research/problems/{name}/reference.{ext}` (extension per `language` in config.yaml)|
 
 If the reference solution is missing or scores 0, the PR will be blocked from merging.
 
+> **Important**: The reference solution must achieve score > 0. This is a design choice to ensure the evaluator is working correctly - a score > 0 proves that the evaluation pipeline can successfully compile/run the solution and produce a valid score. If the reference only scores 0, we cannot distinguish between "evaluator error" and "valid solution with no improvement". For problems that measure speedup against a baseline, the reference must be **faster than the baseline**, not just a copy of it.
+
 ### Local Testing
 
 Before submitting a PR, test your reference solution locally:
@@ -343,8 +347,8 @@ Before submitting a PR, test your reference solution locally:
-frontier-eval batch research --workers 20 --clusters 4
+frontier batch research --workers 20 --clusters 4
 ```
 
 ### Result Storage
 
 ```bash
 # Local (default): results saved to ./results/batch/{track}/
-frontier-eval batch research
+frontier batch research
 
 # Cloud bucket (requires --backend skypilot): results written directly to S3/GCS
-frontier-eval batch research --bucket-url s3://my-bucket/results
+frontier batch research --bucket-url s3://my-bucket/results
 
 # Sync from bucket to local
-frontier-eval batch research --bucket-url s3://my-bucket/results --sync-bucket
+frontier batch research --bucket-url s3://my-bucket/results --sync-bucket
 ```
 
 ### Control Options
 
 ```bash
-frontier-eval batch research --status # Check status
-frontier-eval batch research --no-resume # Force re-evaluate all
-frontier-eval batch research --retry-failed # Retry failed (including score=0)
+frontier batch research --status # Check status
+frontier batch research --no-resume # Force re-evaluate all
+frontier batch research --retry-failed # Retry failed (including score=0)
 ```
 
 - Incremental evaluation with hash-based caching (solution/problem changes trigger re-evaluation)
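The hash-based caching mentioned in the last context line can be illustrated as follows. This is a sketch under stated assumptions, not the tool's actual cache: the helper names (`content_hash`, `needs_evaluation`, `record_evaluation`), the in-memory `dict` cache, and the choice of SHA-256 are all hypothetical.

```python
import hashlib


def content_hash(*blobs: bytes) -> str:
    """Hash several byte strings together.

    Each blob is length-prefixed so that (b"ab", b"c") and (b"a", b"bc")
    produce different digests.
    """
    h = hashlib.sha256()
    for blob in blobs:
        h.update(len(blob).to_bytes(8, "big"))
        h.update(blob)
    return h.hexdigest()


def needs_evaluation(cache: dict, key: str, solution: bytes, problem: bytes) -> bool:
    """True if the (solution, problem) pair differs from the cached run."""
    return cache.get(key) != content_hash(solution, problem)


def record_evaluation(cache: dict, key: str, solution: bytes, problem: bytes) -> None:
    """Remember the digest of the pair that was just evaluated."""
    cache[key] = content_hash(solution, problem)
```

With this scheme, an unchanged pair is skipped on the next run, while editing either the solution or the problem changes the digest and triggers re-evaluation.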
@@ -114,7 +114,7 @@ We welcome submissions from all models and agent frameworks. To have your result
 
 ### Algorithmic Problems
 
-We currently release **1 -- 3 public test case** per problem for local testing and debugging. Full evaluation (with all test cases) is performed on our servers.
+We currently release **1-3 public test cases** per problem for local testing and debugging. Full evaluation (with all test cases) is performed on our servers.
 
 #### What to Submit
 
@@ -174,7 +174,7 @@ Problem (e.g., gemm_optimization, poc_generation)
 
 Each variant has a unique **Problem ID** based on its path under `research/`.
 
-The full list of all evaluatable variants is in [`research/problems.txt`](research/problems.txt) (109 variants total, aggregated into ~50 categories for reporting).
+The full list of all evaluatable variants is in [`research/scripts/problems.txt`](research/scripts/problems.txt).
 
 | Type | Example Path | Problem ID |
 |------|-------------|------------|
@@ -309,7 +309,9 @@ export GOOGLE_API_KEY=...
 
 ### Generate Solutions
 
-#### Research Track (Python)
+#### Research Track
+
+Most research problems are Python, but some (e.g., `nbody_simulation`) require C++. The language is configured per-problem via `language` field in `config.yaml`.
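The per-problem language setting determines which reference file CI expects. A minimal sketch of that lookup, assuming hypothetical helper names (`REFERENCE_EXTENSIONS`, `reference_path`), assuming `python` as the value used for Python problems, and assuming Python is the default when no language is given; only `language: cpp` and the `research/problems/{name}/reference.{ext}` layout come from this diff:

```python
from pathlib import PurePosixPath

# Maps the `language` value from config.yaml to a file extension.
# "cpp" is taken from the diff; "python" is an assumed value.
REFERENCE_EXTENSIONS = {"python": "py", "cpp": "cpp"}


def reference_path(problem_name: str, language: str = "python") -> PurePosixPath:
    """Build the expected reference-solution path for a research problem."""
    ext = REFERENCE_EXTENSIONS.get(language)
    if ext is None:
        raise ValueError(f"unsupported language: {language!r}")
    return PurePosixPath("research/problems") / problem_name / f"reference.{ext}"
```

For example, `reference_path("nbody_simulation", "cpp")` yields `research/problems/nbody_simulation/reference.cpp`.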