[Bug]: Incorrect detection order for model_dir type causes IN_MEMORY loading failure #11666

@capyun007

Description

System Info

not important.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

nothing

Expected behavior

nothing

Actual behavior

nothing

Additional notes

In tensorrt_llm/models/model_weights_loader.py, detect_format() and preload() check the type of model_dir in the wrong order.
When model_dir is an in-memory object (a dict or a PreTrainedModel), the current implementation runs the filesystem checks first. os.path.isfile() typically raises a TypeError for an argument that is not path-like, such as a dict, so the call fails before the IN_MEMORY branch is reached.

In the original implementation:

    def detect_format(self):
        if os.path.isfile(self.model_dir):
            ...
        elif os.path.isdir(self.model_dir):
            ...
        elif isinstance(self.model_dir, dict) or isinstance(self.model_dir, PreTrainedModel):
            self.format = ModelWeightsFormat.IN_MEMORY

os.path.isfile() and os.path.isdir() are evaluated before checking whether model_dir is:

  • a dict
  • a PreTrainedModel

This causes issues when model_dir is:

  • an in-memory object
  • a wrapper that does not behave like a filesystem path

A similar inconsistency exists in preload(), where the branching order does not strictly match detect_format().
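
The failure mode described above can be reproduced without TensorRT-LLM. The following is a minimal sketch, assuming CPython; probe() and in_memory_weights are hypothetical names used only for illustration and are not part of model_weights_loader.py.

```python
import os


def probe(model_dir):
    """Return the name of the exception os.path.isfile() raises, or None."""
    try:
        os.path.isfile(model_dir)
    except TypeError as exc:
        # A dict is not str, bytes, os.PathLike, or int, so the
        # underlying stat call rejects it before any format check runs.
        return type(exc).__name__
    return None


# Hypothetical in-memory weights, standing in for a real state dict.
in_memory_weights = {"layer.weight": [0.0, 1.0]}

dict_result = probe(in_memory_weights)   # filesystem check fails on a dict
path_result = probe("/no/such/path")     # a missing path is simply "not a file"
```

A string path that does not exist is handled quietly, while the dict raises, which is why the isinstance() check has to come first.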

Fix:

    def detect_format(self):
        if isinstance(self.model_dir, dict) or isinstance(self.model_dir, PreTrainedModel):
            self.format = ModelWeightsFormat.IN_MEMORY
        elif os.path.isfile(self.model_dir):
            ......

    def preload(self):
        # Initialize shards and load_func
        if isinstance(self.model_dir, PreTrainedModel):
            shard_files = [dict(self.model_dir.named_parameters())]
        elif os.path.isdir(self.model_dir):
            shard_files = glob.glob(self.model_dir + "/*." + self.format.value)
        ......
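
The proposed ordering can be exercised in isolation. This is a sketch under stated assumptions, not the actual loader: WeightsFormat and FakeModel are hypothetical stand-ins for ModelWeightsFormat and PreTrainedModel, the enum values are invented, and the function is written as a free function rather than a method.

```python
import os
from enum import Enum


class WeightsFormat(Enum):
    """Hypothetical stand-in for ModelWeightsFormat; values are invented."""
    IN_MEMORY = "in_memory"
    SINGLE_FILE = "single_file"
    DIRECTORY = "directory"


class FakeModel:
    """Hypothetical stand-in for PreTrainedModel."""


def detect_format(model_dir, model_types=(FakeModel,)):
    # Type checks come first so in-memory objects never reach os.path.*.
    if isinstance(model_dir, (dict, *model_types)):
        return WeightsFormat.IN_MEMORY
    # From here on model_dir is expected to be a filesystem path.
    if os.path.isfile(model_dir):
        return WeightsFormat.SINGLE_FILE
    if os.path.isdir(model_dir):
        return WeightsFormat.DIRECTORY
    raise ValueError(f"Cannot detect weights format for {model_dir!r}")
```

With this ordering, detect_format({}) and detect_format(FakeModel()) both resolve to the in-memory format without touching the filesystem, and path arguments behave as before.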

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

No one assigned

    Labels

    Inference runtime<NV> (General operational aspects of TRTLLM execution not in other categories.), bug (Something isn't working), waiting for feedback
