
Commit 98dde6f

Merge pull request #10 from XyLearningProgramming/bugfix/oom
Allow more memory usage to avoid OOM
2 parents 7ea9394 + de73260 commit 98dde6f

2 files changed (+3, -4 lines)

deploy/helm/values.yaml

Lines changed: 2 additions & 3 deletions

```diff
@@ -79,12 +79,11 @@ env: {}
 
 # Resource requests and limits for the container.
 # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
-# Tuned for Qwen3-0.6B-Q4_K_M (484 MB) on 1-CPU / 1 GB VPS nodes.
-# Previous values for Q8_0 (805 MB): limits cpu=3/mem=800Mi, requests cpu=50m/mem=32Mi
+# Tuned for Qwen3-0.6B-Q4_K_M (484 MB) + n_ctx=8192 KV cache (~448 MB) on 1-CPU / 1 GB VPS nodes.
 resources:
   limits:
     cpu: 1
-    memory: 700Mi
+    memory: 1Gi
   requests:
     cpu: 200m
     memory: 600Mi
```
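The arithmetic behind the bump is straightforward: the old 700Mi limit cannot hold both the model weights and the enlarged KV cache. A quick sanity check, using only the figures quoted in the values.yaml comments and ignoring overhead from llama.cpp compute buffers and the Python runtime:

```python
# Rough memory budget behind the new 1Gi limit (figures from the diff comments).
model_mib = 484      # Qwen3-0.6B-Q4_K_M weights
kv_cache_mib = 448   # estimated KV cache at n_ctx=8192
total_mib = model_mib + kv_cache_mib

print(total_mib)         # 932: already over the old 700Mi limit
print(total_mib < 1024)  # True: fits under the new 1Gi (1024 MiB) limit
```

With runtime overhead on top, 932 MiB leaves little headroom, which is consistent with keeping the request at 600Mi and raising only the limit on a 1 GB node.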

slm_server/config.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -62,7 +62,7 @@ class Settings(BaseSettings):
         description="Owner label for /models list. Set SLM_MODEL_OWNER to override.",
     )
     n_ctx: int = Field(
-        4096, description="Maximum context window (input + generated tokens)."
+        8192, description="Maximum context window (input + generated tokens)."
     )
     n_threads: int = Field(
         2, description="Number of OpenMP threads llama‑cpp will spawn."
```
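Where does the ~448 MB KV-cache figure come from? A back-of-envelope sketch: the KV cache stores keys and values for every layer, KV head, and context position. Note the model geometry below (28 layers, 8 KV heads, head_dim 128) is an assumption about Qwen3-0.6B, not something stated in the diff, and 1 byte per element corresponds to a quantized (e.g. q8_0) KV cache; an f16 cache would be twice as large.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int) -> int:
    # Keys and values (factor 2) for every layer, KV head, head dim, and token.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Assumed Qwen3-0.6B geometry; 1 byte/elem models a q8_0-style KV cache.
size = kv_cache_bytes(n_layers=28, n_kv_heads=8, head_dim=128,
                      n_ctx=8192, bytes_per_elem=1)
print(size / 2**20)  # 448.0 MiB, matching the "~448 MB" comment
```

Doubling n_ctx from 4096 to 8192 doubles this term, which is why the context bump and the memory-limit bump land in the same commit.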

0 commit comments