# Usage
Eric Apgar edited this page Feb 13, 2026
When using a new model for the first time, you must specify `remote=True` so that the model can be pulled from the online Hugging Face repositories. After that, the model can be used offline. A Hugging Face token may or may not be required, depending on the model.

The `location` parameter specifies where the model will be downloaded on first use, and where it will be loaded from on subsequent uses.
```python
import llm

model = llm.model(name='openai/gpt-oss-20b')
model.load(location='<path to model cache dir>', remote=True)
response = model.ask(prompt='Tell me a joke.')
print(response)
```
Once the model has been downloaded on first use, it can be loaded without specifying `remote`:
```python
import llm

model = llm.model(name='openai/gpt-oss-20b')
model.load(location='<path to model cache dir>')
response = model.ask(prompt='Tell me a joke.')
print(response)
```
Parameters for `load()`:

| Parameter | Type | Description |
|---|---|---|
| location | str | Directory path where models are stored. |
| remote | bool | Look online for models. |
| commit | str | Git commit hash pinning a specific model version. |
| quantization | str | Model quantization. Not yet supported. |
Parameters for `ask()`:

| Parameter | Type | Description |
|---|---|---|
| prompt | str | What to ask of the LLM. |
| images | list | List of `PIL.Image` images (`from PIL import Image`). |
| max_tokens | int | Max number of tokens to generate in the response. |
| temperature | float | LLM temperature. |
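As a sketch of how the `images`, `max_tokens`, and `temperature` parameters above might be combined in one call: the image below is generated in memory purely for illustration, and the `ask()` call itself is commented out because it assumes a model already loaded as shown earlier.

```python
from PIL import Image

# Build a small in-memory image for illustration; a real call would use
# Image.open('photo.png') on an actual file.
img = Image.new('RGB', (64, 64), color='white')

# Hypothetical multimodal request (assumes `model` was loaded as shown above):
# response = model.ask(
#     prompt='Describe this image.',
#     images=[img],
#     max_tokens=128,
#     temperature=0.7,
# )
```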