Description
OS
Linux
GPU Library
CUDA 12.x
Python version
3.12
Describe the bug
You guys are awesome, thank you so much for your work!
When using the docker compose file provided in the repo, loading a model with tensor parallelism fails with an `[Errno 28] No space left on device` error (see logs below). I think I've pinpointed it: the error refers to the space available at /dev/shm.
Increasing the shm_size of the compose service allowed me to load the model successfully. If my research is correct, Docker's default is 64 MB; setting it to 2 GB did not help, but 16 GB did work. I assume it has to be large enough to fit a whole model layer.
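For reference, here is a minimal sketch of the workaround I used. I'm assuming the service is named `tabbyapi` (as the log prefix below suggests); everything else in the repo's docker-compose.yml stays unchanged:

```yaml
services:
  tabbyapi:
    # ... existing image/ports/volumes from the repo's docker-compose.yml ...
    # Raise the shared memory available at /dev/shm inside the container.
    # Docker's default is 64 MB; 2gb was still too small for GLM-4.5-Air
    # on my setup, 16gb worked.
    shm_size: 16gb
```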
Reproduction steps
- Spin up the API using the docker-compose.yml provided in this repo
- Load a model via the API with tensor parallelism enabled (I've tried with Doctor-Shotgun/GLM-4.5-Air-exl3_3.14bpw-h6)
Expected behavior
I'd expect the default docker-compose.yml to work out of the box, so I suggest adding a shm_size that works for most setups. Maybe you have some insight into how much space is required. I'd be willing to create a PR to adjust the docker-compose file, and perhaps add a note to the wiki, if you'd like.
Logs
tabbyapi-1 | 2025-10-11 09:52:28.381 INFO: 127.0.0.1:51336 - "POST /v1/model/load HTTP/1.1" 200
tabbyapi-1 | 2025-10-11 09:52:28.811 INFO: Using backend exllamav3
tabbyapi-1 | 2025-10-11 09:52:28.815 INFO: exllamav3 version: 0.0.7
tabbyapi-1 | 2025-10-11 09:52:28.816 WARNING: ExllamaV3 is currently in an alpha state. Please note that all config options may not work.
tabbyapi-1 | 2025-10-11 09:52:31.175 WARNING: The provided model does not have vision capabilities that are supported by ExllamaV3. Vision input is disabled.
tabbyapi-1 | 2025-10-11 09:52:31.176 WARNING: Draft model is disabled because a model name wasn't provided. Please check your config.yml!
tabbyapi-1 | 2025-10-11 09:52:31.176 WARNING: The given cache size (86000) is not a multiple of 256.
tabbyapi-1 | 2025-10-11 09:52:31.176 WARNING: Overriding cache_size with an overestimated value of 86016 tokens.
tabbyapi-1 | 2025-10-11 09:52:31.177 WARNING: The given cache_size (86016) is less than 2 * max_seq_len and may be too small for requests using CFG.
tabbyapi-1 | 2025-10-11 09:52:31.177 WARNING: Ignore this warning if you do not plan on using CFG.
tabbyapi-1 | 2025-10-11 09:52:31.185 INFO: Attempting to load a prompt template if present.
tabbyapi-1 | 2025-10-11 09:52:31.211 INFO: Using template "chat_template" for chat completions.
tabbyapi-1 | 2025-10-11 09:52:31.213 INFO: Loading model: /app/models/GLM-4.5-Air-exl3_3.14bpw-h6
tabbyapi-1 | 2025-10-11 09:52:31.213 INFO: Loading with tensor parallel
tabbyapi-1 | /opt/venv/lib/python3.12/site-packages/joblib/_multiprocessing_helpers.py:44: UserWarning: [Errno 28] No space left on device. joblib will operate in serial mode
tabbyapi-1 | warnings.warn("%s. joblib will operate in serial mode" % (e,))
tabbyapi-1 | /opt/venv/lib/python3.12/site-packages/joblib/_multiprocessing_helpers.py:44: UserWarning: [Errno 28] No space left on device. joblib will operate in serial mode
tabbyapi-1 | warnings.warn("%s. joblib will operate in serial mode" % (e,))
Additional context
No response
Acknowledgements
- I have looked for similar issues before submitting this one.
- I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.