I tested the current implementation, and prefix caching does not appear to be supported yet. When I submit the same image and prompt multiple times, the inference time stays the same on every run, so neither the image nor the prompt appears to be reused.
Do you have plans to add prefix/encoder caching (similar to what vLLM offers) in the future?
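
For reference, a check like the one described above could be sketched roughly as follows. This is only a minimal timing harness, not this project's API: `run_inference` is a hypothetical callable standing in for "send the same image and prompt once", and the `speedup` threshold is an arbitrary assumption.

```python
import time
from statistics import mean
from typing import Callable, List


def time_repeated_calls(run_inference: Callable[[], object], repeats: int = 5) -> List[float]:
    """Issue the same request `repeats` times; return each wall-clock latency in seconds.

    `run_inference` is a hypothetical placeholder for one identical image + prompt request.
    """
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference()
        latencies.append(time.perf_counter() - start)
    return latencies


def looks_cached(latencies: List[float], speedup: float = 1.5) -> bool:
    """Heuristic (assumed threshold): later calls are at least `speedup`x faster than the first."""
    if len(latencies) < 2:
        return False
    warm = mean(latencies[1:])  # average latency of all calls after the first
    return warm > 0 and latencies[0] / warm >= speedup
```

With a working prefix or encoder cache, I would expect the calls after the first to be noticeably faster; in my runs they were not.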