Currently, we use `set_langchain_cache()` to cache completions, which does not work with vLLM for some reason and offers little control.
I would propose supporting a caching mechanism for judge completions, e.g. when calling `annotate_battles` (https://github.com/OpenEuroLLM/JudgeArena/blob/main/judgearena/evaluate.py#L261), which would cache judge annotations in files like `{judge_arena_dir}/cache/db/{benchmark}/{judge}.db` (this naming would let the user easily delete some entries manually).
For the storage, we could use SQLite, which ships with the Python standard library and would not require any additional dependency.
For the schema, we could use "benchmark", "instruction_id", "model_a", "model_b", "judge" as keys, return the entry from the db on a hit, and generate otherwise.
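To make the key structure concrete, here is a minimal sketch of the DDL; the table name `annotations` and the column set (matching the entry proposed below) are illustrative assumptions, not a final schema:

```python
# Illustrative sketch only: "annotations" and the column names are
# assumptions, not a final schema. The composite primary key mirrors
# the proposed lookup keys.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS annotations (
    benchmark        TEXT NOT NULL,
    instruction_id   TEXT NOT NULL,
    model_a          TEXT NOT NULL,
    model_b          TEXT NOT NULL,
    judge            TEXT NOT NULL,
    judge_input      TEXT,
    judge_completion TEXT,
    date             TEXT,
    PRIMARY KEY (benchmark, instruction_id, model_a, model_b, judge)
)
"""
```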
We could store the following, which would allow returning the completion from the cache and performing small analyses:
```python
from dataclasses import dataclass


@dataclass
class AnnotationEntry:
    # Lookup keys
    benchmark: str
    instruction_id: str
    model_a: str
    model_b: str
    judge: str
    # Cached payload
    judge_input: str
    judge_completion: str
    date: str
```
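As a sketch of how the hit-or-generate path could look, reusing `CREATE_TABLE` from the schema sketch and `AnnotationEntry` above; `get_or_annotate` and its `generate` callback are hypothetical names, not existing JudgeArena API:

```python
# Hypothetical cache wrapper, not the actual JudgeArena API.
import sqlite3
from dataclasses import astuple
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def get_or_annotate(
    judge_arena_dir: str,
    benchmark: str,
    instruction_id: str,
    model_a: str,
    model_b: str,
    judge: str,
    judge_input: str,
    generate: Callable[[str], str],  # calls the judge model on a cache miss
) -> AnnotationEntry:
    # One db file per (benchmark, judge), so users can delete entries by hand.
    path = Path(judge_arena_dir) / "cache" / "db" / benchmark / f"{judge}.db"
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute(CREATE_TABLE)
    row = conn.execute(
        "SELECT judge_input, judge_completion, date FROM annotations"
        " WHERE benchmark=? AND instruction_id=? AND model_a=?"
        " AND model_b=? AND judge=?",
        (benchmark, instruction_id, model_a, model_b, judge),
    ).fetchone()
    if row is not None:
        # Cache hit: return the stored completion without calling the judge.
        entry = AnnotationEntry(benchmark, instruction_id, model_a, model_b,
                                judge, *row)
    else:
        # Cache miss: generate the completion and persist it.
        entry = AnnotationEntry(
            benchmark, instruction_id, model_a, model_b, judge,
            judge_input, generate(judge_input),
            datetime.now(timezone.utc).isoformat(),
        )
        conn.execute("INSERT INTO annotations VALUES (?,?,?,?,?,?,?,?)",
                     astuple(entry))
        conn.commit()
    conn.close()
    return entry
```

One file per (benchmark, judge) also means stale entries can be pruned with a plain `rm`, without touching other judges' caches.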
Any thoughts @ErlisLushtaku @kargibora ?