
cli: add option to connect to server via http(s) #21674

Draft
pwilkin wants to merge 2 commits into ggml-org:master from pwilkin:llama-cli-remote

Conversation

@pwilkin
Member

@pwilkin pwilkin commented Apr 9, 2026

Overview

Adds an --endpoint option to connect to an existing server instance.

Additional information

In many cases, people want to run a llama-server for various uses but also might want a quick test UI in cases where they cannot access the WebUI (i.e. pure console / terminal environments). Since llama-cli spawns a separate server instance, you cannot run both in VRAM-constrained environments, so having the option to run llama-cli with a llama-server endpoint seems desirable.
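The intended workflow, sketched below with assumed invocation syntax (the `--endpoint` flag is the one this PR adds; model path and port are placeholders):

```shell
# terminal 1: run a single server instance that owns the VRAM
llama-server -m model.gguf --port 8080

# terminal 2: attach the CLI to it instead of spawning a second server
llama-cli --endpoint http://localhost:8080
```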

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, although GLM 5.1 generated code with a goto in it, so I had to double-check.

@pwilkin pwilkin requested review from a team and ngxson as code owners April 9, 2026 11:53
Contributor

@ngxson ngxson left a comment


IMO I'm not comfortable with this change. It adds too much for a feature that no one has ever asked for (via an issue)

If you really need this, you could write your own CLI in a higher-level language like Python or Node.js

Comment on lines +12 to +37
struct cli_backend {
    virtual ~cli_backend() = default;

    // model / server info
    virtual std::string get_model_name() const = 0;
    virtual bool has_vision() const = 0;
    virtual bool has_audio() const = 0;
    virtual std::string get_build_info() const = 0;

    // chat completion (streaming), returns assistant content text
    virtual std::string generate_completion(
        const json & messages,
        const common_params & params,
        bool verbose_prompt,
        result_timings & out_timings) = 0;

    // load a local text file, return its contents (empty string on failure)
    virtual std::string load_text_file(const std::string & fname) = 0;

    // load a local media file, return the OAI content part JSON for it
    // returns empty JSON object on failure
    virtual json load_media_file(const std::string & fname) = 0;

    // cleanup
    virtual void terminate() = 0;
};
Contributor


I imagine this will double the effort each time someone adds a new feature to the CLI

Not a wise choice for long-term maintenance. The CLI should support either the native API or the remote API, but not both

Member Author


To be honest, I feel that having the remote API as the only one would be the better option: it would add interoperability, make it simpler to implement the MCP / command-execution features, and remove the need to maintain a separate code path for accessing the server. And all it would take to retain the current behavior of launching the client and the server together is a simple wrapper.

Putting this up for consideration and converting this to draft for now.

@pwilkin pwilkin marked this pull request as draft April 9, 2026 16:29
@pwilkin
Member Author

pwilkin commented Apr 9, 2026

@ngxson since we don't want to maintain two API paths, what do you think of a prototype that does the following:

  • removes the cli-specific path
  • migrates everything to the http path
  • if run without --endpoint, launches a server on a random port that is shut down as soon as the CLI exits, mimicking the previous CLI behavior?

@ngxson
Contributor

ngxson commented Apr 9, 2026

Honestly I don't have a strong opinion on whether the CLI should use the native API, the HTTP API, or another IPC mechanism like a unix socket. However, since most LLM CLIs use the HTTP API under the hood, I agree that it may be better in the long term to go with that for llama-cli.

I do have 2 concerns though:

  1. Currently, the CLI acts as an example of how easy it is to use llama.cpp as an external library (via bindings, without being an HTTP server). If we move the CLI away from this, we still need to add an example of doing so (though it can be much more basic than the CLI)
  2. If we use HTTP for the CLI, we should no longer link the CLI against libserver. The consequence is that the CLI must either spawn a llama-server instance, or llama-server must run as a daemon.

For point (1), no action is needed from your side; I will eventually implement it (which goes back to the idea of the llamax library), since many people are already asking for an easy-to-use native API that accepts multimodal input. However, for point (2), I think we need to consider it more carefully.

@pwilkin
Member Author

pwilkin commented Apr 9, 2026

@ngxson cross-platform daemon management can get really tricky, so I'd prefer not to go that route. I'd say spawning a llama-server instance that gets shut down when the CLI exits would be the preferred way to go.
