<details open>
<summary>ChatGLM3-6B</summary>

ChatGLM3-6B further supports function call and code interpreter in addition to chat mode.

Chat mode:
```sh
python3 chatglm_cpp/convert.py -i THUDM/chatglm3-6b -t q4_0 -o chatglm3-ggml.bin
./build/bin/main -m chatglm3-ggml.bin -p 你好 --top_p 0.8 --temp 0.8
# 你好👋!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。
```

Setting system prompt:
```sh
./build/bin/main -m chatglm3-ggml.bin -p 你好 -s "You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown."
# 你好👋!我是 ChatGLM3,有什么问题可以帮您解答吗?
```

Function call:
~~~
$ ./build/bin/main -m chatglm3-ggml.bin --top_p 0.8 --temp 0.8 --sp examples/system/function_call.txt -i
System > Answer the following questions as best as you can. You have access to the following tools: ...
Prompt > 生成一个随机数
ChatGLM3 > random_number_generator
```python
tool_call(seed=42, range=(0, 100))
```
Tool Call > Please manually call function `random_number_generator` with args `tool_call(seed=42, range=(0, 100))` and provide the results below.
Observation > 23
ChatGLM3 > 根据您的要求,我使用随机数生成器API生成了一个随机数。根据API返回结果,生成的随机数为23。
~~~
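Note that the `Observation` value above comes from running the requested tool yourself; chatglm.cpp does not execute tools for you. A minimal stand-in for the `random_number_generator` tool could look like the sketch below (illustrative only; the function name and argument shape simply follow the transcript):

```python
import random

def random_number_generator(seed, range):
    # Deterministic "tool": the same seed and range always yield the same value.
    # The parameter is named `range` (shadowing the builtin) only to match the
    # keyword arguments in the model's emitted tool call.
    lo, hi = range
    return random.Random(seed).randint(lo, hi)

# Paste the returned value back to the model as the Observation.
print(random_number_generator(seed=42, range=(0, 100)))
```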

Code interpreter:
~~~
$ ./build/bin/main -m chatglm3-ggml.bin --top_p 0.8 --temp 0.8 --sp examples/system/code_interpreter.txt -i
System > 你是一位智能AI助手,你叫ChatGLM,你连接着一台电脑,但请注意不能联网。在使用Python解决任务时,你可以运行代码并得到结果,如果运行结果有错误,你需要尽可能对代码进行改进。你可以处理用户上传到电脑上的文件,文件默认存储路径是/mnt/data/。
Prompt > 列出100以内的所有质数
ChatGLM3 > 好的,我会为您列出100以内的所有质数。
```python
def is_prime(n):
    """Check if a number is prime."""
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

primes_upto_100 = [i for i in range(2, 101) if is_prime(i)]
primes_upto_100
```

Code Interpreter > Please manually run the code and provide the results below.
Observation > [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
ChatGLM3 > 100以内的所有质数为:

$$
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
$$
~~~

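The `Observation` line in this transcript is likewise produced by hand: you run the Python block the model emits and paste the value of its final expression back. That manual step can be sketched as follows (the emitted code is taken verbatim from the transcript above):

```python
# Run the Python block emitted by ChatGLM3 and capture the value of its
# final expression, which is then fed back to the model as the Observation.
emitted = """
def is_prime(n):
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

primes_upto_100 = [i for i in range(2, 101) if is_prime(i)]
"""

namespace = {}
exec(emitted, namespace)
observation = namespace["primes_upto_100"]
print(observation)  # paste this list back as the Observation
```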
</details>

<details>

Pre-built wheels for the CPU backend on Linux / macOS / Windows are published on the [release](https://github.com/li-plus/chatglm.cpp/releases) page. For CUDA / Metal backends, please compile from source or from the source distribution.

**Using Pre-converted GGML Models**

Here is a simple demo that uses `chatglm_cpp.Pipeline` to load the GGML model and chat with it. First enter the examples folder (`cd examples`) and launch a Python interactive shell:

To chat in stream mode, run the Python example below:
```sh
python3 cli_demo.py -m ../chatglm-ggml.bin -i
```

Launch a web demo to chat in your browser:

For other models:
<details>
<summary>ChatGLM2-6B</summary>

```sh
python3 cli_demo.py -m ../chatglm2-ggml.bin -p 你好 --temp 0.8 --top_p 0.8  # CLI demo
python3 web_demo.py -m ../chatglm2-ggml.bin --temp 0.8 --top_p 0.8  # web demo
```
</details>

<details open>
<summary>ChatGLM3-6B</summary>

**CLI Demo**

Chat mode:
```sh
python3 cli_demo.py -m ../chatglm3-ggml.bin -p 你好 --temp 0.8 --top_p 0.8
```

Function call:
```sh
python3 cli_demo.py -m ../chatglm3-ggml.bin --temp 0.8 --top_p 0.8 --sp system/function_call.txt -i
```

Code interpreter:
```sh
python3 cli_demo.py -m ../chatglm3-ggml.bin --temp 0.8 --top_p 0.8 --sp system/code_interpreter.txt -i
```

**Web Demo**

Install Python dependencies and the IPython kernel for the code interpreter:
```sh
pip install streamlit jupyter_client ipython ipykernel
ipython kernel install --name chatglm3-demo --user
```

Launch the web demo:
```sh
streamlit run chatglm3_demo.py
```

| Function Call               | Code Interpreter               |
|-----------------------------|--------------------------------|
| ![](docs/function_call.png) | ![](docs/code_interpreter.png) |

</details>

<details>
<summary>CodeGeeX2</summary>

```sh
# CLI demo
python3 cli_demo.py -m ../codegeex2-ggml.bin --temp 0 --mode generate -p "\
# language: Python
# write a bubble sort function
"
```
</details>

<details>
<summary>Baichuan-13B-Chat</summary>

```sh
python3 cli_demo.py -m ../baichuan-13b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.1  # CLI demo
python3 web_demo.py -m ../baichuan-13b-chat-ggml.bin --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.1  # web demo
```
</details>

<details>
<summary>Baichuan2-7B-Chat</summary>

```sh
python3 cli_demo.py -m ../baichuan2-7b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05  # CLI demo
python3 web_demo.py -m ../baichuan2-7b-chat-ggml.bin --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05  # web demo
```
</details>

<details>
<summary>Baichuan2-13B-Chat</summary>

```sh
python3 cli_demo.py -m ../baichuan2-13b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05  # CLI demo
python3 web_demo.py -m ../baichuan2-13b-chat-ggml.bin --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05  # web demo
```
</details>

<details>
<summary>InternLM-Chat-7B</summary>

```sh
python3 cli_demo.py -m ../internlm-chat-7b-ggml.bin -p 你好 --top_p 0.8 --temp 0.8  # CLI demo
python3 web_demo.py -m ../internlm-chat-7b-ggml.bin --top_p 0.8 --temp 0.8  # web demo
```
</details>

<details>
<summary>InternLM-Chat-20B</summary>

```sh
python3 cli_demo.py -m ../internlm-chat-20b-ggml.bin -p 你好 --top_p 0.8 --temp 0.8  # CLI demo
python3 web_demo.py -m ../internlm-chat-20b-ggml.bin --top_p 0.8 --temp 0.8  # web demo
```
</details>

**Converting Hugging Face LLMs at Runtime**

Sometimes it might be inconvenient to convert and save the intermediate GGML models beforehand. Here is an option to load the original Hugging Face model directly, quantize it into a GGML model within a minute, and start serving. All you need to do is replace the GGML model path with the Hugging Face model name or path.

Likewise, replace the GGML model path with the Hugging Face model name or path in any example script, and it just works. For example:
```sh
python3 cli_demo.py -m THUDM/chatglm-6b -p 你好 -i
```

## API Server
```sh
docker build . --network=host -t chatglm.cpp
# cpp demo
docker run -it --rm -v $PWD:/opt chatglm.cpp ./build/bin/main -m /opt/chatglm-ggml.bin -p "你好"
# python demo
docker run -it --rm -v $PWD:/opt chatglm.cpp python3 examples/cli_demo.py -m /opt/chatglm-ggml.bin -p "你好"
# langchain api server
docker run -it --rm -v $PWD:/opt -p 8000:8000 -e MODEL=/opt/chatglm-ggml.bin chatglm.cpp \
    uvicorn chatglm_cpp.langchain_api:app --host 0.0.0.0 --port 8000
```