@philip-essential

This adds support for Rnj-1, which is an 8B model we just released. We've been using llama.cpp to play around with the model internally, and we released a GGUF checkpoint for the instruction-tuned version.

The model architecture is similar enough to Gemma3 that in Transformers/vLLM/SGLang we can reuse the same model file. In llama.cpp, however, we need some small changes, so I've added a new implementation based closely on the Gemma3 one. The changes are:

  • All layers use global attention.
  • Long-context support is via YaRN (a sketch of the usual config shape follows this list).
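
For context, YaRN in a Hugging Face config.json is normally declared through a rope_scaling block. The shape below follows the standard Transformers convention; the values are placeholders for illustration, not the actual Rnj-1 settings.

```python
# Typical Transformers-style YaRN declaration, as it would appear under
# the "rope_scaling" key in config.json (placeholder values, not the
# actual Rnj-1 configuration):
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                             # context-length multiplier
    "original_max_position_embeddings": 8192,  # pre-scaling training context
}
```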

Because our Hugging Face config.json uses "Gemma3ForCausalLM" as the architecture, convert_hf_to_gguf.py cannot tell that these configs are for Rnj-1. The solution I came up with is to manually change the architecture to Rnj1ForCausalLM before converting the checkpoint, and I added a note in convert_hf_to_gguf.py about this. But perhaps there's a better solution?
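
Concretely, the manual workaround amounts to something like this minimal sketch (the checkpoint path is a placeholder for your local copy):

```python
import json

# Rewrite the architecture field so convert_hf_to_gguf.py picks the
# Rnj-1 implementation instead of Gemma3. Run this before conversion.
cfg_path = "path/to/rnj-1/config.json"  # placeholder path
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["architectures"] = ["Rnj1ForCausalLM"]  # was ["Gemma3ForCausalLM"]

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```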

@CISC
Collaborator

CISC commented Dec 6, 2025

Because our Hugging Face config.json uses "Gemma3ForCausalLM" as the architecture, convert_hf_to_gguf.py cannot tell that these configs are for Rnj-1. The solution I came up with is to manually change the architecture to Rnj1ForCausalLM before converting the checkpoint, and I added a note in convert_hf_to_gguf.py about this. But perhaps there's a better solution?

Instead, change llm_build_gemma3_iswa into a templated llm_build_gemma3 (as is done for smallthinker, for example) and add support for YaRN and non-SWA in the Gemma3Model conversion.
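
On the conversion side, that could look roughly like the sketch below: read the YaRN parameters out of rope_scaling in Gemma3Model.set_gguf_parameters and emit the corresponding GGUF metadata via the gguf-py writer. This is a sketch, not the actual patch, and it assumes Rnj-1 uses the standard Transformers rope_scaling keys.

```python
# Hypothetical addition to Gemma3Model.set_gguf_parameters() in
# convert_hf_to_gguf.py (a sketch, not the actual patch):
def set_gguf_parameters(self):
    super().set_gguf_parameters()

    rope_scaling = self.hparams.get("rope_scaling") or {}
    if rope_scaling.get("rope_type", rope_scaling.get("type")) == "yarn":
        # Emit YaRN metadata so llama.cpp can reconstruct the scaled RoPE.
        self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
        self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])
        self.gguf_writer.add_rope_scaling_orig_ctx_len(
            rope_scaling["original_max_position_embeddings"]
        )
    # A checkpoint with all-global attention would also need to skip the
    # sliding-window metadata that the Gemma3 path normally writes.
```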

@faisal-fida

faisal-fida commented Dec 7, 2025

@philip-essential Just following up on PR #17811 (Rnj-1 support).

Currently hitting the error "unknown model architecture: 'rnj1'" when trying to load the GGUF. Any chance we can prioritize merging this so the community can use Rnj-1?

@sirmo

sirmo commented Dec 7, 2025

I tested the current fork of this PR and it works pretty well with the published GGUF Q4 quants. The model follows OpenCode (TUI coding agent) instructions well in my brief testing. Neat model!

This might be a great agentic model for efficient execution, though the 32K context size is a bit limiting for local coding agents. Thank you for all your work!

Hardware tested on: 7900 XTX with the ROCm backend.

@philip-essential
Author

Instead, change llm_build_gemma3_iswa into a templated llm_build_gemma3 (as is done for smallthinker, for example) and add support for YaRN and non-SWA in the Gemma3Model conversion.

That makes sense. I can try to do that soon.


Labels

model (Model specific), python (python script changes)
