Caching judge completion #36

@geoalgo

Description

Currently, we use set_langchain_cache() to cache completions, but it does not work with vLLM for some reason and offers little control.

I propose supporting a caching mechanism for judge completions, e.g. when calling annotate_battles
https://github.com/OpenEuroLLM/JudgeArena/blob/main/judgearena/evaluate.py#L261, which would cache judge annotations in files like {judge_arena_dir}/cache/db/{benchmark}/{judge}.db (this naming would let the user easily delete some entries manually).

For storage, we could use SQLite, which would not require any additional dependency.

For the schema, we could use "benchmark", "instruction_id", "model_a", "model_b", and "judge" as keys, retrieve the entry from the db on a hit, and generate otherwise.

We could store the following, which would allow returning completions from the cache and performing small analyses:

from dataclasses import dataclass

@dataclass
class AnnotationEntry:
    benchmark: str
    instruction_id: str
    model_a: str
    model_b: str
    judge: str
    judge_input: str
    judge_completion: str
    date: str
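To make the idea concrete, here is a minimal sketch of what the SQLite-backed cache could look like. The class name CompletionCache and its get/put methods are assumptions for illustration, not a final API; the composite primary key matches the proposed lookup keys above.

```python
# Hypothetical sketch of the proposed cache; CompletionCache and its
# method names are assumptions, not an agreed-upon API.
import sqlite3
from dataclasses import dataclass, astuple
from pathlib import Path

@dataclass
class AnnotationEntry:
    benchmark: str
    instruction_id: str
    model_a: str
    model_b: str
    judge: str
    judge_input: str
    judge_completion: str
    date: str

class CompletionCache:
    def __init__(self, db_path: Path):
        # e.g. {judge_arena_dir}/cache/db/{benchmark}/{judge}.db
        db_path.parent.mkdir(parents=True, exist_ok=True)
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS annotations ("
            "benchmark TEXT, instruction_id TEXT, model_a TEXT, "
            "model_b TEXT, judge TEXT, judge_input TEXT, "
            "judge_completion TEXT, date TEXT, "
            "PRIMARY KEY (benchmark, instruction_id, model_a, model_b, judge))"
        )

    def get(self, benchmark, instruction_id, model_a, model_b, judge):
        # Return the cached entry on a hit, None otherwise (caller generates).
        row = self.conn.execute(
            "SELECT * FROM annotations WHERE benchmark=? AND instruction_id=? "
            "AND model_a=? AND model_b=? AND judge=?",
            (benchmark, instruction_id, model_a, model_b, judge),
        ).fetchone()
        return AnnotationEntry(*row) if row else None

    def put(self, entry: AnnotationEntry):
        # INSERT OR REPLACE keeps at most one row per key tuple.
        self.conn.execute(
            "INSERT OR REPLACE INTO annotations VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            astuple(entry),
        )
        self.conn.commit()
```

Since each (benchmark, judge) pair gets its own .db file, deleting a judge's cache for one benchmark is just removing that file, as proposed above.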

Any thoughts @ErlisLushtaku @kargibora ?
