Commit 64b087c
Ralf Waldukat
Fix flash_attn default to match upstream AUTO behavior
Critical fixes from code review:
- server/settings.py: Change flash_attn default from False to None (AUTO)
Upstream llama.cpp defaults to LLAMA_FLASH_ATTN_TYPE_AUTO; the server was
incorrectly forcing DISABLED, blocking this optimization for models that need it
- llama_cpp.py: Consistent stub style (pass -> ...) for llama_max_tensor_buft_overrides
- CMakeLists.txt: Document version workaround for mtmd build

1 parent 77b13a4 commit 64b087c
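The first bullet describes turning a boolean flag into a tri-state one: `None` defers to upstream AUTO, while `True`/`False` force the behavior. A minimal sketch of that idea, assuming an `Optional[bool]` setting; the names `ModelSettings` and `flash_attn_arg` are illustrative here, not the actual server code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSettings:
    # None = defer to llama.cpp's LLAMA_FLASH_ATTN_TYPE_AUTO;
    # True/False explicitly enable or disable flash attention.
    flash_attn: Optional[bool] = None


def flash_attn_arg(s: ModelSettings) -> dict:
    """Only forward the flag when the user set it explicitly,
    so the library's AUTO heuristic applies by default."""
    return {} if s.flash_attn is None else {"flash_attn": s.flash_attn}


# Default: nothing is forwarded, so upstream AUTO decides.
assert flash_attn_arg(ModelSettings()) == {}
# Explicit False still forces DISABLED when the user asks for it.
assert flash_attn_arg(ModelSettings(flash_attn=False)) == {"flash_attn": False}
```

The point of the fix is that a `False` default silently forced DISABLED for every model, whereas `None` lets llama.cpp pick per model.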
3 files changed, 5 additions and 2 deletions

[diff hunk 1: three lines added at new lines 157–159; diff content not captured in this extract]
[diff hunk 2: line 1403 changed; diff content not captured in this extract]
[diff hunk 3: line 107 changed; diff content not captured in this extract]