Eric Apgar edited this page Feb 13, 2026 · 2 revisions

First Time Use

When using a model for the first time, you must specify remote=True so that the model can be pulled from the online Hugging Face repositories. After that, the model can be used offline. Depending on the model, a Hugging Face token may also be required.

The location parameter specifies where the model is downloaded on first use, and thereafter where it is loaded from.

import llm


model = llm.model(name='openai/gpt-oss-20b')
model.load(location='<path to model cache dir>', remote=True)
response = model.ask(prompt='Tell me a joke.')

print(response)

Typical Use

Once the model has been downloaded, remote=True is no longer needed; the model is loaded from the local cache.

import llm


model = llm.model(name='openai/gpt-oss-20b')
model.load(location='<path to model cache dir>')
response = model.ask(prompt='Tell me a joke.')

print(response)

Parameters

llm.model.load()

| Parameter | Type | Description |
| --- | --- | --- |
| location | str | Directory path where models are stored. |
| remote | bool | Look online for models. |
| commit | str | Git commit for a specific model version. |
| quantization | str | Model quantization. Not yet supported. |

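As a sketch of how these parameters might combine, a version-pinned first-time load could look like the following. This assumes the commit value is a Hugging Face revision hash; the exact format is not specified on this page, and the hash shown is a placeholder.

```python
import llm

# Pin the model to a specific commit so results stay reproducible
# even if the upstream repository is later updated.
model = llm.model(name='openai/gpt-oss-20b')
model.load(
    location='<path to model cache dir>',
    remote=True,               # allow a download if not cached yet
    commit='<revision hash>',  # placeholder: pin a specific model version
)
```

The quantization parameter is omitted here because the table above marks it as not yet supported.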
llm.model.ask()

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | str | What to ask of the LLM. |
| images | list | List of PIL.Image images (from PIL import Image). |
| max_tokens | int | Max number of tokens to generate in the response. |
| temperature | float | LLM temperature. |
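Putting the ask() parameters together, a minimal sketch might look like the following. This assumes the model accepts image input and that the values shown are reasonable defaults; neither is confirmed by this page.

```python
import llm
from PIL import Image

model = llm.model(name='openai/gpt-oss-20b')
model.load(location='<path to model cache dir>')

# Images are passed as a list of PIL.Image objects.
photo = Image.open('photo.png')

response = model.ask(
    prompt='Describe this image in one sentence.',
    images=[photo],    # optional image input
    max_tokens=128,    # cap the response length
    temperature=0.7,   # sampling temperature
)
print(response)
```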
