Does the current implementation support prefix/encoder cache?

I tested the current implementation, and prefix caching does not appear to be supported yet. When I submit the same image and prompt multiple times, the inference time stays the same on every run, so neither the image encoding nor the prompt prefix appears to be reused.
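
For anyone who wants to reproduce this, here is a minimal sketch of the kind of timing check described above. It is not tied to this project's API: `run_inference` is a hypothetical placeholder for whatever call sends the same image and prompt to the model, and the 1.5x speedup threshold is an arbitrary assumption.

```python
import time
from typing import Callable, List


def time_repeated_calls(run_inference: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call `run_inference` `repeats` times and return each wall-clock latency in seconds.

    `run_inference` is a hypothetical zero-argument callable that wraps one
    identical request (same image, same prompt) to the model under test.
    """
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference()
        latencies.append(time.perf_counter() - start)
    return latencies


def looks_cached(latencies: List[float], speedup: float = 1.5) -> bool:
    """Heuristic: report True if repeat calls average at least `speedup` times faster than the first.

    The threshold is an assumption for illustration, not a measured value.
    """
    first, rest = latencies[0], latencies[1:]
    if not rest:
        return False
    mean_rest = sum(rest) / len(rest)
    return mean_rest > 0 and first / mean_rest >= speedup
```

If a prefix or encoder cache were active, the first call would be expected to be noticeably slower than the repeats. Flat latencies across all runs, as in my test, suggest nothing is being reused.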

Do you have plans to add prefix/encoder cache functionality (similar to what vLLM offers) in the future?