Sentra CLI is a local-first terminal assistant with explicit local model operations, runtime fallbacks, and persistent sessions.
```
cmake -S . -B build && cmake --build build
./build/sentra
```

Optional:

```
./build/sentra --config sentra.conf --session session-123
```

Model commands:

- `/model list`
- `/model current`
- `/model use <id|num>`
- `/model add <id> <hf-repo> <hf-file> [local-path]`
- `/model download <id|num>`
- `/model validate`
- `/model remove <id|num>` (asks for confirmation)
Model presets are defined in models.tsv:
```
id<TAB>name<TAB>hf_repo<TAB>hf_file<TAB>local_path
```
Active model selection is persisted across runs via state_file in config.
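For illustration, a preset row for a Qwen2.5 7B quant might look like this (columns are tab-separated; the `local_path` shown is an example, not a required location):

```
qwen25_7b_q4km	Qwen2.5 7B Q4_K_M	Qwen/Qwen2.5-7B-Instruct-GGUF	qwen2.5-7b-instruct-q4_k_m.gguf	models/qwen2.5-7b-instruct-q4_k_m.gguf
```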
Example for adding a new Hugging Face GGUF and running it:

```
/model add qwen25_7b_q4km Qwen/Qwen2.5-7B-Instruct-GGUF qwen2.5-7b-instruct-q4_k_m.gguf
/model download qwen25_7b_q4km
/model use qwen25_7b_q4km
/model validate
```
Core commands:

- `/help`
- `/status`
- `/clear`
- `/menu`
- `/menu run <n>`
- `/code list`
- `/code copy [n]`
- `/code shell`
- `/code shell run [n]`
Menu mode:

- Run `/menu` to enter menu mode (`menu>` prompt).
- Type menu numbers directly (`1`, `2`, `3`, ...).
- Type `q`/`quit`/`exit` to leave via menu action `0`.
- Any slash command exits menu mode and runs normally.
Quick aliases (no leading slash needed in the normal prompt):

- `help`, `h`, `?` -> `/help`
- `menu`, `m` -> `/menu`
- `status`, `s` -> `/status`
- `clear`, `cls` -> `/clear`
- `models` -> `/model list`
- `use <id|num>` -> `/model use <id|num>`
- `download <id|num>` -> `/model download <id|num>`
- `remove <id|num>` -> `/model remove <id|num>`
- `q`, `quit`, `exit` -> `/exit`
Configuration keys in `sentra.conf`:

```
runtime_preference=llama-inproc|local-binary|mock
local_command_template=llama-cli -m {model_path} -n {max_tokens} --no-display-prompt -p {prompt}
max_tokens=...
context_window_tokens=...
profile=fast|balanced|quality
llama_n_threads=...
llama_n_threads_batch=...
llama_n_batch=...
llama_offload_kqv=true|false
llama_op_offload=true|false
```
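A minimal `sentra.conf` putting these keys together (the values here are illustrative, not tuned recommendations):

```
runtime_preference=llama-inproc
profile=balanced
max_tokens=512
context_window_tokens=4096
llama_n_threads=8
llama_offload_kqv=true
```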
`llama-inproc` runs GGUF models directly through the linked libllama inside Sentra (no `llama-cli` subprocess).

`local-binary` requires the placeholders `{model_path}`, `{prompt}`, and `{max_tokens}` in the command template, plus a resolvable executable on `PATH`. If the preferred runtime is unavailable, Sentra falls back deterministically to the first available runtime and prints a startup note.
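To make the template mechanics concrete, here is a rough shell sketch of the placeholder substitution the local-binary runtime performs (a sketch only — Sentra's actual substitution happens internally, and the model path and prompt below are examples):

```shell
# Hypothetical expansion of local_command_template.
template='llama-cli -m {model_path} -n {max_tokens} --no-display-prompt -p {prompt}'
model_path='models/qwen2.5-7b-instruct-q4_k_m.gguf'
max_tokens=256
prompt='Hello'

cmd=$(printf '%s' "$template" | sed \
  -e "s|{model_path}|$model_path|" \
  -e "s|{max_tokens}|$max_tokens|" \
  -e "s|{prompt}|$prompt|")
printf '%s\n' "$cmd"
# -> llama-cli -m models/qwen2.5-7b-instruct-q4_k_m.gguf -n 256 --no-display-prompt -p Hello
```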
Runtime commands:

- `/profile fast|balanced|quality`
- `/set max_tokens <n>`
- `/set context <n>`
- `/set stream raw|render`
- `/status`
Notes:

- `fast` lowers the context and output token budgets and defaults to raw streaming.
- `raw` streaming improves perceived latency (first visible output sooner).
- Each turn prints a perf line: `first_token=...ms total=...ms tokens=... tps=...`
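Because each turn emits this perf line, throughput can be pulled out of a saved transcript with standard tools; a small example (the concrete numbers are made up):

```shell
# Extract the tps value from a Sentra perf line.
perf_line='first_token=142ms total=2310ms tokens=87 tps=37.7'
tps=$(printf '%s\n' "$perf_line" | sed -n 's/.*tps=\([0-9.]*\).*/\1/p')
echo "$tps"
# -> 37.7
```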
| Symptom | Likely Cause | Action |
|---|---|---|
| `runtime 'X' unavailable; using 'Y'` | Preferred runtime unavailable on this build/machine | Install the required dependencies or choose an available runtime |
| `llama-inproc failed to create context` | Local backend/device initialization issue | Ensure libllama and libggml are installed and rebuild; verify the model is valid and memory is sufficient |
| `active model path is missing` | Model file not downloaded, or removed | Run `/model download <id>`, then `/model validate` |
| `local-binary runtime failed with exit code ...` | Runtime process error | Inspect the printed stderr, verify the model path, and run the command template manually |
| Download 401/403 | Hugging Face auth/license not satisfied | Run `huggingface-cli login`, accept the model license, and retry |
| `hf_transfer not installed` message | Optional acceleration package missing | Continue with the fallback download path or install `hf_transfer` |
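For the `active model path is missing` case, the on-disk state can also be checked directly from a shell before launching Sentra (the path below is an example):

```shell
# Check whether the active model file exists on disk.
model='models/qwen2.5-7b-instruct-q4_k_m.gguf'
status=$([ -f "$model" ] && echo present || echo missing)
echo "model file: $status"
```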
- 3B-4B quantized models: lower memory footprint, fastest startup, good for laptops.
- 7B-8B quantized models: stronger quality, moderate memory/latency tradeoff.
- Higher parameter models: require substantially more RAM/VRAM; prefer desktop-class hardware.
- If latency grows in long chats, reduce `max_tokens` and/or `context_window_tokens`.
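As a back-of-envelope aid for the sizing notes above (an approximation only: Q4_K_M averages roughly 4.6 bits per weight, and real GGUF files vary with per-tensor quant choices and exclude KV-cache memory):

```shell
# Rough GGUF file-size estimate for a quantized model.
params_b=7           # parameters, in billions
bits_per_weight=4.6  # assumed average for Q4_K_M
est=$(awk -v p="$params_b" -v b="$bits_per_weight" \
  'BEGIN { printf "%.1f", p * 1e9 * b / 8 / (1024 ^ 3) }')
echo "~${est} GiB"
# -> ~3.7 GiB
```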
Session commands:

- `/session`
- `/session info`
- `/session list`
Session logs are append-only in `.sentra/sessions/<session-id>.log` using a structured v1 line format. Metadata is stored in `.sentra/sessions/<session-id>.meta` with the created time, active model id, and runtime name.
Tests:

```
./build/sentra_tests
./tests/smoke_repl.sh
```

Repository layout:

- `include/sentra/`: public interfaces and types
- `src/core/`: orchestration, registry, state, sessions, context windowing
- `src/runtime/`: runtime adapters
- `src/cli/`: REPL loop and command handling
- `scripts/`: operational helpers (downloads)
- `docs/`: architecture and operations notes
- Future features are put here...
To contribute, pick up any open issue or take on one of the roadmap features. Feel free to reach out to any of the contributors with questions.