
TTBench: LLM Benchmark for Test-Time-Compute

This is a repository for TTBench, the Test-Time Compute Benchmark.

This benchmark features chain-of-thought (CoT) responses queried from multiple LLMs on a variety of mathematical and reasoning datasets. The few-shot query process and answer extraction are standardised across every dataset, saving researchers both time and API costs.

Installation

Please install this benchmark from source:

pip install .

It requires an api_responses.zip file (download from Google Drive) containing the response database. For the following example, assume this file is in your code directory.

Example

from ttbench import load, DatasetType, LLMType

dataset, [llm1, llm2] = load(DatasetType.SVAMP, [LLMType.LLaMA3B32, LLMType.Qwen72B25], api_path="api_responses.zip")

for question_id, dataentry in dataset:
    print("Question: ", dataentry.question)
    print("True answer: ", dataentry.answer)
    llm1_response = llm1(question_id, N=20)
    print("Cost: $", llm1_response.cost)
    print("1st CoT answer: ", llm1_response.cots[0].answer)
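With N sampled CoTs per question, a common test-time-compute baseline is self-consistency: aggregate the N extracted answers by majority vote. A minimal, self-contained sketch (the `majority_vote` helper is hypothetical and not part of TTBench; in practice its input would be something like `[cot.answer for cot in llm1_response.cots]`):

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Aggregate N sampled CoT answers by majority vote (self-consistency).

    Hypothetical helper: returns the most frequent answer string.
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][0]
```

Because the benchmark caches responses, sweeping N to trade accuracy against cost requires no new API calls.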

Refer to the examples folder for more examples of benchmark evaluation.

Cost modelling

We also provide a procedure to model the dollar cost of each query. This enables fair comparison between test-time-compute methods.

from ttbench import load, DatasetType, LLMType

dataset, [llm] = load(DatasetType.CommonsenseQA, [LLMType.Mixtral8x7B], api_path="api_responses.zip")

question_id = 42
response = llm(question_id, N=2)

print(f"Request processing cost: ${response.request.cost:0.9f}")
print(f"First CoT response cost: ${response.cots[0].metadata.cost:0.9f}")
print(f"Total LLM query cost: ${response.cost:0.9f}")
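The exact pricing data TTBench uses is not documented in this README; conceptually, such a cost model multiplies token counts by the provider's per-token prices. A sketch under that assumption (the function name and pricing parameters are illustrative, not the benchmark's API):

```python
def query_cost(prompt_tokens: int, completion_tokens: int,
               price_in_per_1m: float, price_out_per_1m: float) -> float:
    """Dollar cost of one LLM query under simple per-token pricing.

    Hypothetical sketch: prices are given in dollars per 1M tokens,
    with separate input (prompt) and output (completion) rates.
    """
    return (prompt_tokens * price_in_per_1m
            + completion_tokens * price_out_per_1m) / 1_000_000
```

Summing this quantity over the request and every sampled CoT would yield a total like `response.cost` above.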