[cuda backend] optimized L_kv threshold for sdpa implementation selection. #13858
| Job | Run time |
|---|---|
| | 5s |
| | 32s |
| | 46m 25s |
| | 45m 47s |
| | 18m 17s |
| | 17m 44s |
| | 31m 16s |
| | 31m 29s |
| | 22m 7s |
| | 26m 59s |
| | 21m 25s |
| | 17m 33s |
| | 20m 58s |
| | 34m 5s |
| | 30m 40s |
| | 0s |
| | 28m 33s |
| | 33m 25s |
| | 28m 1s |
| | 34m 48s |
| | 21m 33s |
| | 0s |
| | 23m 49s |
| | 3s |
| Total | 8h 55m 34s |