docs(server): clarify that --ctx-size is total context divided among parallel slots #17767

kitaekatt · 2025-12-04T20:22:07Z

Summary

When using --parallel N, the --ctx-size value is the total context divided among all slots, not the per-slot context. This is a common source of confusion (see #11681, #5732).

Changes

Added clarification to two flags in tools/server/README.md:

--ctx-size: Added note explaining that when using --parallel N, this is the total context divided among all slots. Each slot gets ctx-size / parallel tokens. To allocate X tokens per slot with N parallel slots, set --ctx-size to X * N.

--parallel: Added note that the total context is divided equally among these slots.

Example

--ctx-size 4096 --parallel 4 → each slot gets 1024 tokens
To get 4096 tokens per slot with 4 parallel slots, use --ctx-size 16384 --parallel 4
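
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the division described in this PR, not llama-server code: the function names `per_slot_context` and `required_ctx_size` are hypothetical, and the use of floor division when the total is not an exact multiple of the slot count is an assumption.

```python
def per_slot_context(ctx_size: int, parallel: int) -> int:
    """Tokens each slot receives when ctx_size is split evenly among slots.

    Assumption: an uneven split is rounded down (floor division).
    """
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    return ctx_size // parallel


def required_ctx_size(tokens_per_slot: int, parallel: int) -> int:
    """Value to pass as --ctx-size so each slot gets tokens_per_slot tokens."""
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    return tokens_per_slot * parallel


if __name__ == "__main__":
    # --ctx-size 4096 --parallel 4
    print(per_slot_context(4096, 4))
    # 4096 tokens per slot with 4 parallel slots
    print(required_ctx_size(4096, 4))
```

Running the script prints the per-slot context for the first example and the total to pass as --ctx-size for the second.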

Related Issues

Fixes #11681 (Misc. bug: llama-server --ctx-size is divided by --parallel and cannot be increased?)
Related to #5732 (Context length documentation confusion)

ngxson · 2025-12-04T20:37:16Z

This documentation is auto-generated; modify its source in arg.cpp instead.

taronaeo · 2025-12-05T07:08:41Z

This is a common source of confusion (see #11681, #5732).

#17671 as well

When using --parallel N, the --ctx-size value is the TOTAL context divided among all slots, not the per-slot context. This is a common source of confusion (see ggml-org#11681, ggml-org#5732).

Examples:
- --ctx-size 4096 --parallel 4 → each slot gets 1024 tokens
- To get 4096 tokens per slot with 4 parallel slots, use --ctx-size 16384

Updated the help text in arg.cpp (the source for auto-generated docs) for both --ctx-size and --parallel flags to clarify this behavior.

Fixes ggml-org#11681

kitaekatt · 2025-12-06T15:37:31Z

I have moved the changes to arg.cpp!

kitaekatt requested review from ggerganov and ngxson as code owners December 4, 2025 20:22

loci-dev mentioned this pull request Dec 4, 2025

UPSTREAM PR #17767: docs(server): clarify that --ctx-size is total context divided among parallel slots auroralabs-loci/llama.cpp#439

Open

github-actions bot added examples server labels Dec 4, 2025

kitaekatt force-pushed the docs/clarify-ctx-size-parallel branch from b43b61d to dc0d886 Compare December 6, 2025 15:34
