Local AI Server for Linux

Run GGUF language models locally using llama.cpp (on CPU or with GPU acceleration) and llama-swap. The server exposes an OpenAI-compatible API and discovers models placed in the models folder of the install directory. The install directory defaults to ~/ai, so models live in ~/ai/models.

What it provides

OpenAI-compatible chat and completion endpoints
CPU mode plus optional Vulkan, ROCm, OpenVINO, or SYCL llama.cpp backends
Automatic discovery of .gguf model files
On-demand model loading and switching through llama-swap
A systemd user service
A localai command for service, model, update, and uninstall tasks

Requirements

Ubuntu, Debian, Fedora, RHEL, or another compatible x86-64 Linux system
A CPU-only setup, or a supported GPU and runtime for the backend you select
sudo access during installation
Enough RAM and VRAM for the model and quantization you choose

The installer uses the known compatible releases llama.cpp b9672 and llama-swap v226. The separate update script checks for newer releases. The default llama.cpp backend is vulkan. For CPU-only machines or simple VM testing, use LLAMA_CPP_BACKEND=cpu; CPU installs use smaller defaults and no GPU offload.

The installer can install required packages with apt-get, dnf, or yum.

Install

One-line install:

curl -fsSL https://hossbit.github.io/localai/install.sh | bash

Custom install directory:

curl -fsSL https://hossbit.github.io/localai/install.sh | LOCALAI_DIR="$HOME/my-ai" bash

Manual install:

git clone https://github.com/hossbit/local-ai-server.git
cd local-ai-server
chmod +x ./*.sh
./install-local-ai.sh

The installer asks where to install LocalAI:

LocalAI install directory [~/ai]:

Press Enter to use the default ~/ai. To choose the path without a prompt, set LOCALAI_DIR:

LOCALAI_DIR=~/my-ai ./install-local-ai.sh

Or pass --dir:

./install-local-ai.sh --dir ~/my-ai

Choose a llama.cpp backend with LLAMA_CPP_BACKEND. The default is vulkan.

LLAMA_CPP_BACKEND=cpu ./install-local-ai.sh
LLAMA_CPP_BACKEND=vulkan ./install-local-ai.sh
LLAMA_CPP_BACKEND=rocm ./install-local-ai.sh
LLAMA_CPP_BACKEND=openvino ./install-local-ai.sh
LLAMA_CPP_BACKEND=sycl-fp16 ./install-local-ai.sh
LLAMA_CPP_BACKEND=sycl-fp32 ./install-local-ai.sh

The installer does not start the server automatically. If no .gguf files are found in the models directory, it prints a warning because chat requests need a model. Add at least one model, then use the service commands below to start and check LocalAI.

To start it now and again automatically each time you log in:

systemctl --user enable --now localai

Add a model

Place one or more .gguf files in:

~/ai/models

If you installed somewhere else, use that directory's models folder instead.

For example, with the Hugging Face CLI:

python3 -m pip install --user huggingface_hub
hf auth login

hf download bartowski/Qwen2.5-Coder-7B-Instruct-GGUF \
  Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf \
  --local-dir ~/ai/models

Some model repositories require a Hugging Face account and a read token. See the Hugging Face documentation on user access tokens for details.

The model ID exposed by the API is the filename without .gguf. For example:

Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf

becomes:

Qwen2.5-Coder-7B-Instruct-Q4_K_M
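The naming rule above can be sketched in Python. This is only an illustration of the mapping, not the project's discovery code. The function names are made up for this example, and it assumes that only .gguf files directly inside the models folder are considered.

```python
from pathlib import Path


def model_id(filename):
    """Return the API model ID for a .gguf filename (name without the suffix)."""
    # Path.stem strips only the final suffix, so dots inside the name survive.
    return Path(filename).stem


def list_model_ids(models_dir="~/ai/models"):
    """List model IDs for .gguf files directly inside models_dir (assumption: no recursion)."""
    directory = Path(models_dir).expanduser()
    return sorted(p.stem for p in directory.glob("*.gguf") if p.is_file())
```

For example, `model_id("Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf")` gives the ID shown above.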

Use the server

Read the selected port:

PORT=$(cat ~/ai/conf/port)

For a custom install directory:

PORT=$(cat ~/my-ai/conf/port)
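If you script against the server from Python, the same lookup can be wrapped in a small helper. This is a minimal sketch. The function name `base_url` is hypothetical, and it assumes the port file contains only the port number, as the `cat` commands above suggest.

```python
from pathlib import Path


def base_url(install_dir="~/ai"):
    """Build the local API base URL from <install_dir>/conf/port."""
    port_file = Path(install_dir).expanduser() / "conf" / "port"
    # Assumption: the file holds just the port number, possibly with a trailing newline.
    port = int(port_file.read_text().strip())
    return f"http://127.0.0.1:{port}/v1"
```

Call `base_url()` for the default install or `base_url("~/my-ai")` for a custom directory.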

List available models:

curl "http://127.0.0.1:${PORT}/v1/models"
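The same request works from Python with only the standard library. The sketch below assumes the response follows the usual OpenAI list shape, a JSON object whose `data` array holds entries with an `id` field. The function names are illustrative.

```python
import json
from urllib.request import urlopen


def parse_model_ids(payload):
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(port, timeout=5.0):
    """Query the local server and return the model IDs it reports."""
    url = f"http://127.0.0.1:{port}/v1/models"
    with urlopen(url, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

With the service running, `fetch_model_ids(port)` should return the filenames in your models folder without the .gguf suffix.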

Send a chat request:

MODEL="Qwen2.5-Coder-7B-Instruct-Q4_K_M"

curl "http://127.0.0.1:${PORT}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"${MODEL}\",
    \"messages\": [
      {\"role\": \"user\", \"content\": \"What is Linux?\"}
    ]
  }"

Python with the OpenAI SDK:

from pathlib import Path
from openai import OpenAI

port = Path.home().joinpath("ai/conf/port").read_text().strip()
client = OpenAI(base_url=f"http://127.0.0.1:{port}/v1", api_key="local")

response = client.chat.completions.create(
    model="Qwen2.5-Coder-7B-Instruct-Q4_K_M",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

The local server does not validate api_key, but OpenAI client libraries usually require a non-empty value.

Service and helper commands

Most users only need these:

Command	Purpose
`localai start`	Start the service.
`localai stop`	Unload loaded models, then stop the service.
`localai restart`	Restart the service.
`localai status`	Show service, process, API, and port status.
`localai check`	Check the API and model list.
`localai logs`	Follow LocalAI logs.
`localai models`	List installed `.gguf` models and show loaded state when the API is reachable.
`localai load MODEL`	Warm one model, for example `localai load Qwen2.5-Coder-7B-Instruct-Q4_K_M`.
`localai unload MODEL`	Release one loaded model.
`localai unload all`	Release all loaded models.
`localai update`	Update installed components.
`localai version`	Show component versions.
`localai uninstall`	Remove helper files; models are kept by default.

Advanced forms:

Command	Purpose
`localai check --chat`	Also send a tiny chat request.
`localai load all`	Warm every model; use only when you have enough memory.
`localai update --no-start`	Update and leave the service stopped.
`LLAMA_CPP_BACKEND=cpu localai update`	Switch backend during update.
`LOCALAI_CTX_SIZE=8192 LOCALAI_N_GPU_LAYERS=20 localai start`	Override runtime settings for one start.
`LOCALAI_FLASH_ATTN=1 LOCALAI_PARALLEL=2 localai start`	Enable optional llama-server tuning for one start.
`localai uninstall --remove-models`	Also remove downloaded models.
`localai uninstall --dir ~/my-ai`	Uninstall from a custom directory.
`localai uninstall --remove-llama-swap`	Also remove the per-user `llama-swap` binary.

Configuration

Shared defaults live in:

localai.conf          # source default
~/ai/conf/localai.conf # installed copy

This file contains install paths, service names, port settings, and llama.cpp runtime defaults. Environment variables still override the config for one command.

bin/rebuild-config.sh creates conf/config.yaml from every .gguf file in the install directory's models folder. It runs automatically whenever the server starts.
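To give a rough idea of what such a generated file might contain, here is a Python sketch that renders one llama-swap entry per model. It is not the actual script, and the README does not show the real output. The layout assumes llama-swap's `models:` map with `cmd` and `ttl` keys and its `${PORT}` macro. The `render_config` name and the exact llama-server arguments are illustrative only.

```python
from pathlib import Path


def render_config(models_dir, llama_server, ttl=900):
    """Render a llama-swap style config with one entry per .gguf file.

    Assumption: paths contain no spaces or YAML-special characters.
    """
    lines = ["models:"]
    for gguf in sorted(Path(models_dir).expanduser().glob("*.gguf")):
        # The model ID is the filename without the .gguf suffix.
        lines.append(f'  "{gguf.stem}":')
        # ${PORT} is left literal so llama-swap can substitute its own port.
        lines.append(f"    cmd: {llama_server} --port ${{PORT}} -m {gguf}")
        # ttl is the idle timeout in seconds before the model is unloaded.
        lines.append(f"    ttl: {ttl}")
    return "\n".join(lines) + "\n"
```

Regenerating the file on every start, as described above, means a newly added model shows up after a restart without manual editing.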

Default runtime settings are:

Vulkan and other GPU-capable backends: context size 16384, GPU layers 8
CPU backend: context size 4096, GPU layers 0
Threads: 6
KV cache: q4_0
Jinja chat templates: enabled
Flash attention, mlock, no-mmap, parallel, batch size, and ubatch size: disabled unless configured
Idle model timeout: 900 seconds
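The backend-dependent defaults can be summarized as a small lookup. This sketch only restates the list above. The function and key names are hypothetical, and it assumes the q4_0 cache type applies to both the K and V caches.

```python
def default_runtime(backend):
    """Return the documented default runtime settings for a llama.cpp backend."""
    gpu_capable = backend != "cpu"
    return {
        "ctx_size": 16384 if gpu_capable else 4096,
        "n_gpu_layers": 8 if gpu_capable else 0,
        "threads": 6,
        # Assumption: the single "KV cache: q4_0" default covers both caches.
        "cache_type_k": "q4_0",
        "cache_type_v": "q4_0",
        "jinja": True,
        "idle_timeout_seconds": 900,
    }
```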

Useful llama-server tuning variables:

Variable	Effect
`LOCALAI_CTX_SIZE`	Sets `--ctx-size`.
`LOCALAI_N_GPU_LAYERS`	Sets `--n-gpu-layers`.
`LOCALAI_THREADS`	Sets `-t`.
`LOCALAI_CACHE_TYPE_K` / `LOCALAI_CACHE_TYPE_V`	Set KV cache quantization.
`LOCALAI_PARALLEL`	Adds `--parallel` when set.
`LOCALAI_BATCH_SIZE`	Adds `--batch-size` when set.
`LOCALAI_UBATCH_SIZE`	Adds `--ubatch-size` when set.
`LOCALAI_FLASH_ATTN`	Adds `--flash-attn` when set to `1`.
`LOCALAI_JINJA`	Adds `--jinja` when set to `1`; default is `1`.
`LOCALAI_MLOCK`	Adds `--mlock` when set to `1`.
`LOCALAI_NO_MMAP`	Adds `--no-mmap` when set to `1`.
`LOCALAI_EXTRA_LLAMA_ARGS`	Appends extra single-line llama-server flags.

Override any of these for one start with the start command form shown in the service command table, or edit ~/ai/conf/localai.conf to make the setting persistent.
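To make the variable-to-flag mapping concrete, the following Python sketch turns a dictionary of these variables into a llama-server argument list. It mirrors the table rather than the project's start script. The `--cache-type-k` and `--cache-type-v` flag names are an assumption based on llama.cpp's options, since the table does not name them. Defaults such as `LOCALAI_JINJA=1` are not applied here, so only the values you pass in are translated.

```python
import shlex

# Variables whose value is passed through after the flag.
VALUE_FLAGS = {
    "LOCALAI_CTX_SIZE": "--ctx-size",
    "LOCALAI_N_GPU_LAYERS": "--n-gpu-layers",
    "LOCALAI_THREADS": "-t",
    "LOCALAI_CACHE_TYPE_K": "--cache-type-k",  # assumed flag name
    "LOCALAI_CACHE_TYPE_V": "--cache-type-v",  # assumed flag name
    "LOCALAI_PARALLEL": "--parallel",
    "LOCALAI_BATCH_SIZE": "--batch-size",
    "LOCALAI_UBATCH_SIZE": "--ubatch-size",
}

# Variables that add a bare flag only when set to "1".
SWITCH_FLAGS = {
    "LOCALAI_FLASH_ATTN": "--flash-attn",
    "LOCALAI_JINJA": "--jinja",
    "LOCALAI_MLOCK": "--mlock",
    "LOCALAI_NO_MMAP": "--no-mmap",
}


def llama_args(env):
    """Translate LOCALAI_* settings into a list of llama-server arguments."""
    args = []
    for name, flag in VALUE_FLAGS.items():
        value = env.get(name, "")
        if value != "":
            args += [flag, str(value)]
    for name, flag in SWITCH_FLAGS.items():
        if env.get(name) == "1":
            args.append(flag)
    # Extra flags are split like a shell would split a single line.
    args += shlex.split(env.get("LOCALAI_EXTRA_LLAMA_ARGS", ""))
    return args
```

For instance, `llama_args({"LOCALAI_CTX_SIZE": "8192", "LOCALAI_N_GPU_LAYERS": "20"})` yields the arguments `--ctx-size 8192 --n-gpu-layers 20`.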

Troubleshooting

Check the configured port and models:

cat ~/ai/conf/port
curl "http://127.0.0.1:$(cat ~/ai/conf/port)/v1/models"

Replace ~/ai with your selected install directory if needed.

Check GPU detection:

~/ai/bin/llama-server --list-devices

Check logs:

tail -n 100 ~/ai/logs/llama-swap.log

If a Hugging Face download returns 401 Unauthorized:

hf auth logout
hf auth login
hf auth whoami

Security

The helper scripts bind llama-swap to 127.0.0.1, so the API is available only on the local machine by default. Do not expose it to a network without adding authentication, TLS, and appropriate firewall rules.
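If you wrap the server in your own tooling and let users choose a listen address, a guard like the one below can refuse anything that is not loopback. This is a minimal sketch with a hypothetical function name. It is not part of the helper scripts.

```python
import ipaddress


def is_loopback(host):
    """Return True when host is a loopback address such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; treat unknown hostnames as non-loopback.
        return False
```

An address such as 0.0.0.0 fails this check because it listens on every interface.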

Credits

This project is built on top of llama.cpp and llama-swap.

Special thanks to the maintainers and contributors of these projects.

LocalAI focuses on simplifying installation, configuration, model management, and service deployment for local LLM environments.
