Commit 781d1ed
Project Team
Fix streaming timeout: use httpx.Timeout to separate connect from read
llama3.2-vision encodes the image before emitting any tokens, so
first-token latency on a T4 can be 30-90 s under VRAM pressure.
Passing a plain integer to ollama.Client made httpx use that value for
every timeout, including the read timeout that bounds the wait for each
individual chunk. That read timeout fired during the image-encoding
phase, before the first token arrived, even though Ollama was working
correctly.
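To make the failure mode concrete, here is a small pure-Python model of a per-chunk read timeout. It is a hypothetical illustration, not code from this commit: the function name `first_read_timeout` and the gap values are made up, with the 60 s first gap picked from the 30-90 s range quoted above. The point it shows is that a read timeout measures the longest single wait between chunks, so the pre-first-token wait is the one that trips it.

```python
# Hypothetical model: a read timeout limits how long we wait for EACH chunk,
# not the total duration of the streamed response.

def first_read_timeout(chunk_gaps_s, read_timeout_s):
    """Return the index of the first chunk whose wait exceeds the read
    timeout, or None if every wait fits within it."""
    for i, gap in enumerate(chunk_gaps_s):
        if gap > read_timeout_s:
            return i
    return None


# Illustrative (made-up) timings: a long wait while the image is encoded,
# then fast token-by-token streaming.
gaps = [60.0] + [0.05] * 200

if __name__ == "__main__":
    # A 30 s read timeout fails on chunk 0, before any token is streamed.
    print(first_read_timeout(gaps, read_timeout_s=30))
    # A read timeout longer than the first-token wait lets the stream finish.
    print(first_read_timeout(gaps, read_timeout_s=120))
```

Note that the many short gaps after the first token never matter; only the worst single wait has to fit under the read timeout.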
Use httpx.Timeout(timeout=<configured>, connect=10) so the read
timeout takes the configured value and can cover the full wait for the
first token, while the connect timeout still fails fast (10 s) if
Ollama is unreachable.
1 parent d4a71f6
1 file changed
Lines changed: 17 additions & 4 deletions
Diff hunk @@ -83,13 +83,26 @@: removes original lines 86-88 and 92; adds new lines 86-97, 99, and 102-105.