
feat: batch forward for prompt #25

Open
rain1201 wants to merge 1 commit into RightNow-AI:main from rain1201:pr

Conversation

@rain1201 commented Mar 24, 2026

What does this PR do?

Add a batch forward pass that processes the full prompt in a single pass, reading the model weights only once instead of once per prompt token.

Type of change

  • Performance improvement

Testing

  • Tested on x86-64 (Linux/macOS/Windows)
  • Tested with TinyLlama 1.1B Q4_K_M

Test command:

./picolm model.gguf -p "The capital of France is" -n 20 -t 0

Output:

Loading model: ../../tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
  n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
  n_layers=22, vocab_size=32000, max_seq=2048
  head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 6 tokens, generating up to 20 (temp=0.00, top_p=0.90, threads=4)
---
Paris.

2. B.C. The capital of ancient Rome was Rome.

3
---

Checklist

  • Code compiles without warnings (make native)
  • No new dependencies added
  • Memory usage not increased (check stderr output)

