I have a short question. I built an Android app deploying Llama 3 and uploaded it here: https://bwsyncandshare.kit.edu/s/t3898Ge7AZ6SWBn.
However, I couldn't get the model to continue from the last conversation turns.
I assume that the KV cache is stored internally in module_, and that here
https://github.com/pytorch/executorch/blob/main/examples/models/llama2/runner/runner.cpp#L175
only the last decoded token and its position index are given to the model. Is that correct?
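For reference, here is a minimal sketch of how I understand the decode loop; `forward_one_token` is a hypothetical stand-in for the actual module call, not the real API:

```cpp
#include <cstdint>

// Hypothetical stand-in for the actual module call in runner.cpp:
// feeds one token at the given position, updates the KV cache held
// inside module_, and returns the sampled next token.
uint64_t forward_one_token(uint64_t token, int64_t start_pos);

// Decode loop as I understand it: per step, only the newest token and
// its position index are passed in; attention over all earlier tokens
// goes through the KV cache stored internally.
void decode_from(uint64_t cur_token, int64_t pos, int64_t seq_len) {
  while (pos < seq_len) {
    cur_token = forward_one_token(cur_token, pos);
    ++pos;
  }
}
```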
To reuse the last conversation turns within the next prompt, I tried to start here
https://github.com/pytorch/executorch/blob/main/examples/models/llama2/runner/runner.cpp#L277
not with 0 as the start position index, but with the number of tokens that were processed during the previous conversation turns (see the sketch below). However, that didn't work: the model didn't remember the previous turns (I tried e.g. "My name is Christian" -> answer -> "What is my name?"). Is my approach wrong?
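Concretely, this is the change I attempted, expressed with the same hypothetical `forward_one_token` helper as above; `pos_`, `tokenize`, and `MultiTurnRunner` are made-up names for illustration, not the actual runner code:

```cpp
#include <cstdint>
#include <string>
#include <vector>

uint64_t forward_one_token(uint64_t token, int64_t start_pos); // as above
std::vector<uint64_t> tokenize(const std::string& prompt);     // hypothetical

class MultiTurnRunner {
 public:
  // Instead of resetting the start position to 0 on every call, continue
  // from where the previous turn ended, so the new prompt is prefilled on
  // top of the KV cache entries from earlier turns.
  void generate(const std::string& prompt, int64_t max_new_tokens) {
    int64_t pos = pos_;  // runner.cpp currently starts at 0 here
    for (uint64_t tok : tokenize(prompt)) {
      last_token_ = forward_one_token(tok, pos++);  // prefill the new turn
    }
    for (int64_t i = 0; i < max_new_tokens; ++i) {
      last_token_ = forward_one_token(last_token_, pos++);  // decode
    }
    pos_ = pos;  // remember the total number of cached tokens
  }

 private:
  int64_t pos_ = 0;        // tokens already in the KV cache across turns
  uint64_t last_token_ = 0;
};
```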
For performance reasons I don't want to feed the whole conversation history to the model again on every turn.
Best,
Christian