local-inference

This repo documents a mostly working, works-on-my-machine configuration that delivers a "useful" local inference setup on an M4 Max with 128 GB of memory. Contents may evolve over time.

This configuration may or may not already be folded into tools such as LM Studio, but at the time of creation (early May 2026) the author found no off-the-shelf working configuration.

power throttling

sudo powermetrics -i 100 -s gpu_power

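If the SW requested state is higher than the SW state, some form of throttling is occurring, most likely thermal. In the sample below, P10 is requested 100% of the time, but the GPU actually runs at P8 (48%) and P9 (52%):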

GPU SW requested state: (P1 :   0% P2 :   0% P3 :   0% P4 :   0% P5 :   0% P6 :   0% P7 :   0% P8 :   0% P9 :   0% P10 : 100% P11 :   0% P12 :   0% P13 :   0% P14 :   0% P15 :   0%)
GPU SW state: (SW_P1 :   0% SW_P2 :   0% SW_P3 :   0% SW_P4 :   0% SW_P5 :   0% SW_P6 :   0% SW_P7 :   0% SW_P8 :  48% SW_P9 :  52% SW_P10 :   0% SW_P11 :   0% SW_P12 :   0% SW_P13 :   0% SW_P14 :   0% SW_P15 :   0%)
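
The comparison above can be automated. The following is a minimal sketch, not part of the original setup: it assumes the two lines keep the `Pn : xx%` layout shown above, and the helper names (`mean_pstate`, `is_throttled`) and the `tolerance` value are hypothetical. The sample strings are abbreviated copies of the output above.

```python
import re

# Matches entries such as "P10 : 100%" or "SW_P9 :  52%".
_ENTRY = re.compile(r"(?:SW_)?P(\d+)\s*:\s*(\d+)%")


def mean_pstate(line: str) -> float:
    """Return the residency-weighted average P-state index of one output line."""
    entries = [(int(p), int(pct)) for p, pct in _ENTRY.findall(line)]
    total = sum(pct for _, pct in entries)
    if total == 0:
        raise ValueError("no P-state residency found in line")
    return sum(p * pct for p, pct in entries) / total


def is_throttled(requested: str, actual: str, tolerance: float = 0.25) -> bool:
    """True when the requested state is clearly higher than the actual state.

    `tolerance` is an arbitrary illustrative margin, not a documented value.
    """
    return mean_pstate(requested) - mean_pstate(actual) > tolerance


# Abbreviated sample lines (zero-residency entries mostly omitted).
REQUESTED = "GPU SW requested state: (P9 :   0% P10 : 100% P11 :   0%)"
ACTUAL = "GPU SW state: (SW_P8 :  48% SW_P9 :  52% SW_P10 :   0%)"

if __name__ == "__main__":
    print("requested:", mean_pstate(REQUESTED))
    print("actual:   ", mean_pstate(ACTUAL))
    print("throttled:", is_throttled(REQUESTED, ACTUAL))
```

Averaging the residency percentages reduces each line to a single number, which makes it easy to watch for a sustained gap while a long prefill is running.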

opencode

The goal is to minimise the circumstances in which opencode performs compaction, since each compaction is essentially a fresh prefill.

compaction snippet:

leave some space for the compact summary
bump max history messages to a value that we should be unlikely to reach

{
  "$schema": "https://opencode.ai/config.json",
  "compaction": {
    "auto": true,
    "threshold": 0.98,
    "max_history_messages": 4000,
    "prune": false,
    "reserved": 4096
  },
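
To get a feel for the numbers, here is a rough arithmetic sketch. The exact semantics of `threshold` and `reserved` are not spelled out here, so this assumes (and it is only an assumption) that compaction triggers once usage reaches `threshold` times the model's context limit, and that `reserved` tokens should still fit in what remains. The function name `compaction_budget` is hypothetical, and the context limit of 260000 is taken from the model snippet.

```python
def compaction_budget(context: int, threshold: float, reserved: int) -> dict:
    """Estimate where compaction would trigger under the assumed semantics."""
    trigger = round(context * threshold)  # tokens used when compaction starts
    headroom = context - trigger          # tokens left at that point
    return {
        "trigger_tokens": trigger,
        "headroom_tokens": headroom,
        # Assumption: the summary needs `reserved` tokens of remaining space.
        "covers_reserved": headroom >= reserved,
    }


if __name__ == "__main__":
    print(compaction_budget(context=260000, threshold=0.98, reserved=4096))
```

Under these assumptions compaction would start at 254800 tokens, leaving 5200 tokens, which is slightly more than the 4096 reserved for the summary.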

model snippet:

leave some headroom between the stated max context and the actual context size to avoid failures
set a max output that matches the llama.cpp setting, as this seems to help contain thinking loops

        "Qwen3.6-27B-Q4_0.gguf": {
          "name": "Qwen3.6-27B-Q4_0.gguf",
          "limit": { "context": 260000, "input": 260000, "output": 8192 }
        }
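
The two notes above can be expressed as a small consistency check. This is a sketch only: the server-side values are hypothetical placeholders for whatever the llama.cpp server is actually started with (presumably via `--ctx-size` and `--n-predict`), and `check_limits` is an illustrative helper, not part of opencode or llama.cpp.

```python
def check_limits(limit: dict, server_ctx: int, server_n_predict: int) -> list[str]:
    """Return a list of problems found when comparing opencode limits to the server."""
    problems = []
    if limit["context"] >= server_ctx:
        problems.append("no headroom: context limit is not below the server context size")
    if limit["input"] > limit["context"]:
        problems.append("input limit exceeds context limit")
    if limit["output"] != server_n_predict:
        problems.append("output limit does not match the server max output")
    return problems


# Values from the model snippet above.
OPENCODE_LIMIT = {"context": 260000, "input": 260000, "output": 8192}

# Hypothetical server settings, for illustration only.
SERVER_CTX = 262144
SERVER_N_PREDICT = 8192

if __name__ == "__main__":
    issues = check_limits(OPENCODE_LIMIT, SERVER_CTX, SERVER_N_PREDICT)
    print(issues or "limits look consistent")
```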

