@kitaekatt

Summary

When using --parallel N, the --ctx-size value is the total context divided among all slots, not the per-slot context. This is a common source of confusion (see #11681, #5732).

Changes

Added clarification to two flags in tools/server/README.md:

--ctx-size: Added note explaining that when using --parallel N, this is the total context divided among all slots. Each slot gets ctx-size / parallel tokens. To allocate X tokens per slot with N parallel slots, set --ctx-size to X * N.

--parallel: Added note that the total context is divided equally among these slots.

Example

  • --ctx-size 4096 --parallel 4 → each slot gets 1024 tokens
  • To get 4096 tokens per slot with 4 parallel slots, use --ctx-size 16384 --parallel 4
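The arithmetic behind these two examples can be sketched in a few lines of Python. This is only an illustration of the equal split described above, not code from llama.cpp: the function names `per_slot_ctx` and `total_ctx_for` are hypothetical, and the use of integer division when the total is not an exact multiple of the slot count is an assumption.

```python
def per_slot_ctx(ctx_size: int, parallel: int) -> int:
    """Tokens available to each slot when ctx_size is split equally.

    Hypothetical helper; integer division is an assumption for totals
    that are not an exact multiple of the slot count.
    """
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    return ctx_size // parallel


def total_ctx_for(per_slot: int, parallel: int) -> int:
    """Value to pass as --ctx-size so that each slot gets per_slot tokens."""
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    return per_slot * parallel


# --ctx-size 4096 --parallel 4
print(per_slot_ctx(4096, 4))   # 1024

# 4096 tokens per slot with 4 parallel slots
print(total_ctx_for(4096, 4))  # 16384
```

The second helper is the one to reach for when sizing a deployment: decide the per-slot budget first, then multiply by the slot count to get the value for --ctx-size.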

Related Issues

@ngxson
Collaborator

ngxson commented Dec 4, 2025

This documentation is auto-generated; modify its source in arg.cpp instead.

@taronaeo
Collaborator

taronaeo commented Dec 5, 2025

This is a common source of confusion (see #11681, #5732).

#17671 as well

When using --parallel N, the --ctx-size value is the TOTAL context
divided among all slots, not the per-slot context. This is a common
source of confusion (see ggml-org#11681, ggml-org#5732).

Examples:
- --ctx-size 4096 --parallel 4 → each slot gets 1024 tokens
- To get 4096 tokens per slot with 4 parallel slots, use --ctx-size 16384

Updated the help text in arg.cpp (the source for auto-generated docs)
for both --ctx-size and --parallel flags to clarify this behavior.

Fixes ggml-org#11681

@kitaekatt force-pushed the docs/clarify-ctx-size-parallel branch from b43b61d to dc0d886 on December 6, 2025 15:34

@kitaekatt
Author

I have moved the changes to arg.cpp!

Development

Successfully merging this pull request may close these issues.

Misc. bug: llama-server --ctx-size is divided by --parallel and cannot be increased?
