Description
OS
Linux
GPU Library
CUDA 12.x
Python version
3.12
Describe the bug
You guys are awesome, thank you so much for your work!
When using the docker compose file provided in the repo, loading a model with tensor parallelism fails with an `[Errno 28] No space left on device` error (see logs below). I think I've pinpointed it: the error refers to the space available at /dev/shm.
Increasing the shm_size of the compose service allowed me to load the model successfully. If my research is correct, Docker's default is 64 MB; setting it to 2 GB did not help, but 16 GB did work. I assume it has to be large enough to fit a whole model layer.
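For reference, here is a minimal sketch of the workaround I used. I'm assuming the service is named `tabbyapi` (as the log prefix below suggests); everything else in the repo's docker-compose.yml stays unchanged:

```yaml
services:
  tabbyapi:
    # ... existing image/ports/volumes from the repo's docker-compose.yml ...
    # Raise the shared memory available at /dev/shm inside the container.
    # Docker's default is 64 MB; 2gb was still too small for GLM-4.5-Air
    # on my setup, 16gb worked.
    shm_size: 16gb
```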
Reproduction steps
- Spin up the API using the docker-compose.yml provided in this repo
- Load a model via the API with tensor parallelism enabled (I've tried with Doctor-Shotgun/GLM-4.5-Air-exl3_3.14bpw-h6)
Expected behavior
I'd expect the default docker-compose.yml to work out of the box, so I suggest adding a shm_size that works for most setups. Maybe you have some insight into how much space is required. I'd be willing to create a PR to adjust the docker-compose file, and perhaps add a note to the wiki, if you'd like.
Logs
tabbyapi-1 | 2025-10-11 09:52:28.381 INFO: 127.0.0.1:51336 - "POST /v1/model/load HTTP/1.1" 200
tabbyapi-1 | 2025-10-11 09:52:28.811 INFO: Using backend exllamav3
tabbyapi-1 | 2025-10-11 09:52:28.815 INFO: exllamav3 version: 0.0.7
tabbyapi-1 | 2025-10-11 09:52:28.816 WARNING: ExllamaV3 is currently in an alpha state. Please note that all config options may not work.
tabbyapi-1 | 2025-10-11 09:52:31.175 WARNING: The provided model does not have vision capabilities that are supported by ExllamaV3. Vision input is disabled.
tabbyapi-1 | 2025-10-11 09:52:31.176 WARNING: Draft model is disabled because a model name wasn't provided. Please check your config.yml!
tabbyapi-1 | 2025-10-11 09:52:31.176 WARNING: The given cache size (86000) is not a multiple of 256.
tabbyapi-1 | 2025-10-11 09:52:31.176 WARNING: Overriding cache_size with an overestimated value of 86016 tokens.
tabbyapi-1 | 2025-10-11 09:52:31.177 WARNING: The given cache_size (86016) is less than 2 * max_seq_len and may be too small for requests using CFG.
tabbyapi-1 | 2025-10-11 09:52:31.177 WARNING: Ignore this warning if you do not plan on using CFG.
tabbyapi-1 | 2025-10-11 09:52:31.185 INFO: Attempting to load a prompt template if present.
tabbyapi-1 | 2025-10-11 09:52:31.211 INFO: Using template "chat_template" for chat completions.
tabbyapi-1 | 2025-10-11 09:52:31.213 INFO: Loading model: /app/models/GLM-4.5-Air-exl3_3.14bpw-h6
tabbyapi-1 | 2025-10-11 09:52:31.213 INFO: Loading with tensor parallel
tabbyapi-1 | /opt/venv/lib/python3.12/site-packages/joblib/_multiprocessing_helpers.py:44: UserWarning: [Errno 28] No space left on device. joblib will operate in serial mode
tabbyapi-1 | warnings.warn("%s. joblib will operate in serial mode" % (e,))
tabbyapi-1 | /opt/venv/lib/python3.12/site-packages/joblib/_multiprocessing_helpers.py:44: UserWarning: [Errno 28] No space left on device. joblib will operate in serial mode
tabbyapi-1 | warnings.warn("%s. joblib will operate in serial mode" % (e,))
Additional context
No response
Acknowledgements
- I have looked for similar issues before submitting this one.
- I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.