Commit 208fd03
committed
Fix batch-and-skip benchmark exploit via per-call timing and correctness checks
The current eval times all 15 custom_kernel() calls as a single batch and
divides by 15. A malicious submission can exploit this by deferring all work
to one call (batching 15 problems into a single kernel launch) and making the
other 14 calls no-ops, reporting ~1/15th of the real per-call cost.
Cloning data alone (as proposed in #102) does not fully prevent this -- a
shape-matching fallback path can still collect new data objects and batch them.
This fix:
- Clones data each timing iteration (prevents object-identity caching)
- Times each call individually with its own CUDA events and GPU sync
(prevents amortization across calls)
- Checks correctness after each individual call in recheck/leaderboard mode
(catches deferred-computation exploits that return uncomputed tensors)
- Uses a local seed variable instead of mutating test.args
- Fixes the recheck indentation bug where only the last call was checked1 parent 04c0b02 commit 208fd03
1 file changed
+32
-23
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
240 | 240 | | |
241 | 241 | | |
242 | 242 | | |
243 | | - | |
| 243 | + | |
244 | 244 | | |
| 245 | + | |
245 | 246 | | |
246 | | - | |
247 | | - | |
248 | | - | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
249 | 253 | | |
250 | 254 | | |
251 | 255 | | |
| |||
263 | 267 | | |
264 | 268 | | |
265 | 269 | | |
266 | | - | |
267 | | - | |
268 | | - | |
269 | | - | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
270 | 274 | | |
271 | 275 | | |
272 | 276 | | |
| 277 | + | |
273 | 278 | | |
| 279 | + | |
274 | 280 | | |
| 281 | + | |
275 | 282 | | |
276 | | - | |
277 | | - | |
278 | | - | |
279 | | - | |
280 | | - | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
281 | 287 | | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
282 | 293 | | |
283 | | - | |
284 | | - | |
285 | | - | |
286 | | - | |
287 | | - | |
288 | 294 | | |
289 | | - | |
290 | | - | |
291 | | - | |
292 | | - | |
293 | | - | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
294 | 302 | | |
| 303 | + | |
295 | 304 | | |
296 | 305 | | |
297 | 306 | | |
| |||
0 commit comments