cli: add option to connect to server via http(s)#21674
pwilkin wants to merge 2 commits into ggml-org:master from
Conversation
```cpp
struct cli_backend {
    virtual ~cli_backend() = default;

    // model / server info
    virtual std::string get_model_name() const = 0;
    virtual bool has_vision() const = 0;
    virtual bool has_audio() const = 0;
    virtual std::string get_build_info() const = 0;

    // chat completion (streaming), returns assistant content text
    virtual std::string generate_completion(
        const json & messages,
        const common_params & params,
        bool verbose_prompt,
        result_timings & out_timings) = 0;

    // load a local text file, return its contents (empty string on failure)
    virtual std::string load_text_file(const std::string & fname) = 0;

    // load a local media file, return the OAI content part JSON for it
    // returns empty JSON object on failure
    virtual json load_media_file(const std::string & fname) = 0;

    // cleanup
    virtual void terminate() = 0;
};
```
I imagine this will add double the effort each time someone adds a new feature to the CLI. Not a wise choice for long-term maintenance. The CLI should support either the native API or the remote API, but not both.
To be honest, I really do feel like having the remote API as the only one would be the better option. As in: it would add interoperability, it would make it simpler to implement the MCP / command execution stuff and it would remove the need to keep a separate track for accessing the server. And all it would take to retain the current functionality of launching the client and the server at the same time would be a simple wrapper.
Putting this up for consideration and converting this to draft for now.
@ngxson since we don't want double APIs, what do you think of a prototype here that does the following:
Honestly I don't have a strong opinion on whether the CLI should use the native API, the HTTP API, or another IPC mechanism like a unix socket. However, since most LLM CLIs use the HTTP API under the hood, I agree that it may be better in the longer term to go with that for llama-cli. I do have 2 concerns though:
For point (1), no action is needed from your side, I will eventually implement it (which goes back to the idea of
@ngxson cross-platform daemon management can get really tricky, so I'd prefer not to go that route. I'd say spawning a
Overview
Adds an `--endpoint` option to connect to an existing server instance.

Additional information
In many cases, people want to run a `llama-server` for various uses but also might want a quick test UI in cases where they cannot access the WebUI (i.e. pure console / terminal environments). Since `llama-cli` spawns a separate server instance, you cannot run both in VRAM-constrained environments, so having the option to run `llama-cli` with a `llama-server` endpoint seems desirable.

Requirements
`goto` in it, so I had to double-check.
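For context, the workflow this option enables can be sketched as follows (the `--endpoint` flag is as described in the Overview; the port, model path, and `-m`/`--port` server flags shown are illustrative):

```shell
# terminal 1: serve a model once (model path is illustrative)
llama-server -m ./models/model.gguf --port 8080

# terminal 2: attach the CLI to the already-running server
# instead of spawning a second instance in VRAM
llama-cli --endpoint http://localhost:8080
```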