Multiple conversation turns in android app #3864

@chuber11

Description

I have a short question. I built an Android app deploying Llama 3 and uploaded it here (https://bwsyncandshare.kit.edu/s/t3898Ge7AZ6SWBn).
However, I couldn't get the model to continue from the previous conversation turns.

I assume that the KV cache is stored internally in module_, and that here
https://github.com/pytorch/executorch/blob/main/examples/models/llama2/runner/runner.cpp#L175
only the last decoded token and its position index are given to the model. Is that correct?
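
To make sure I describe my understanding correctly, here is a rough sketch of the loop as I picture it (toy C++ with made-up names like ToyModule and step; this is not the actual runner code):

```cpp
// Toy illustration (not the real ExecuTorch API): after prefill, only the
// last sampled token and its absolute position index are passed forward,
// and the KV cache for all earlier positions is assumed to live inside the
// module. All names here are hypothetical.
#include <algorithm>
#include <cstdint>
#include <vector>

struct ToyModule {
    // Stand-in for module_->forward(): pretends to attend over an internal
    // KV cache and returns dummy logits for the token at position `pos`.
    std::vector<float> forward(int64_t token, int64_t pos) {
        cached_positions = pos + 1;  // the cache now covers [0, pos]
        return std::vector<float>(vocab_size, 0.0f);
    }
    int64_t vocab_size = 32000;
    int64_t cached_positions = 0;
};

static int64_t argmax(const std::vector<float>& logits) {
    return std::max_element(logits.begin(), logits.end()) - logits.begin();
}

// One generation step: feed only (last_token, pos), not the whole history.
static int64_t step(ToyModule& module_, int64_t last_token, int64_t pos) {
    return argmax(module_.forward(last_token, pos));
}

int main() {
    ToyModule module_;
    std::vector<int64_t> prompt = {1, 2, 3};

    int64_t pos = 0;
    int64_t cur = 0;
    for (int64_t tok : prompt) cur = step(module_, tok, pos++);   // prefill
    for (int i = 0; i < 4; ++i) cur = step(module_, cur, pos++);  // decode
    return 0;
}
```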

To make the next prompt use the previous conversation turns, I tried to start here
https://github.com/pytorch/executorch/blob/main/examples/models/llama2/runner/runner.cpp#L277
not with 0 as the start position index, but with the number of tokens that were decoded during the previous conversation turns. However, that didn't work: the model didn't remember the earlier conversation (I tried e.g. "My name is Christian" -> answer -> "What is my name?"). Is my approach wrong?
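
Concretely, what I tried corresponds roughly to the following (again only a sketch with hypothetical names such as ChatSession, tokenize and generate; not the actual runner API):

```cpp
// Sketch of the approach described above: keep a running token counter
// across conversation turns and pass it as the start position of the next
// generate() call, so the new prompt is prefilled on top of the existing
// KV cache instead of starting again at position 0.
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the tokenizer and a single-turn generate()
// (which would prefill `prompt` at `start_pos` and decode until EOS).
std::vector<int64_t> tokenize(const std::string& text) {
    return std::vector<int64_t>(text.size(), 0);  // dummy tokens
}
std::vector<int64_t> generate(const std::vector<int64_t>& prompt,
                              int64_t start_pos) {
    (void)prompt; (void)start_pos;
    return {42};  // dummy reply
}

struct ChatSession {
    int64_t pos = 0;  // total tokens (prompt + generated) consumed so far

    std::vector<int64_t> turn(const std::string& user_text) {
        std::vector<int64_t> prompt = tokenize(user_text);
        // Later turns do NOT restart at 0: they continue from the position
        // right after the previous turn, so the cached keys/values of the
        // earlier turns should stay visible to attention.
        std::vector<int64_t> reply = generate(prompt, /*start_pos=*/pos);
        pos += static_cast<int64_t>(prompt.size() + reply.size());
        return reply;
    }
};

int main() {
    ChatSession chat;
    chat.turn("My name is Christian");
    chat.turn("What is my name?");
    return 0;
}
```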

For performance reasons I don't want to feed the whole conversation history to the model again on every turn.

Best,
Christian
