[cuda backend] optimized L_kv threshold for sdpa implementation selection. #6893
| Job | Run time |
|---|---|
| | 32s |
| | 2s |
| | 29m 34s |
| | 29m 33s |
| | 25m 52s |
| | 16m 16s |
| | 30m 13s |
| | 26m 2s |
| | 36m 3s |
| | 35m 20s |
| | 36m 15s |
| | 35m 42s |
| | 35m 26s |
| | 36m 36s |
| Total | 6h 13m 26s |