
Commit 7dac466

adding expected threshold for Tau Retail
1 parent: d957771

1 file changed (+1, -0 lines)

eval_protocol/benchmarks/test_tau_bench_retail.py

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ def tau_bench_retail_to_evaluation_row(data: List[Dict[str, Any]]) -> List[Evalu
 ],
 rollout_processor=MCPGymRolloutProcessor(),
 rollout_processor_kwargs={"domain": "retail"},
+passed_threshold={"success": 0.65, "standard_error": 0.02},
 num_runs=8,
 mode="pointwise",
 max_concurrent_rollouts=50,
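This commit sets `passed_threshold={"success": 0.65, "standard_error": 0.02}` for the Tau Retail benchmark, which runs with `num_runs=8`. The diff does not show how eval_protocol evaluates this threshold. The sketch below is a hypothetical illustration only. It assumes the threshold means "the mean success across runs must be at least 0.65, and the standard error of that mean must be at most 0.02". The function `check_passed_threshold` is an invented name, not part of eval_protocol's API.

```python
import math
import statistics
from typing import Dict, List


def check_passed_threshold(run_scores: List[float], threshold: Dict[str, float]) -> bool:
    """Hypothetical threshold check (not eval_protocol's actual implementation).

    Assumes: mean score >= threshold["success"], and, if present,
    standard error of the mean <= threshold["standard_error"].
    """
    mean = statistics.fmean(run_scores)

    # Standard error of the mean: sample standard deviation / sqrt(n).
    # With a single run there is no spread to measure, so treat it as 0.
    if len(run_scores) > 1:
        se = statistics.stdev(run_scores) / math.sqrt(len(run_scores))
    else:
        se = 0.0

    if mean < threshold["success"]:
        return False

    # The standard-error bound is optional in this sketch.
    max_se = threshold.get("standard_error")
    return max_se is None or se <= max_se


# Example: made-up per-run success rates from 8 runs (illustrative, not real results).
scores = [0.66, 0.67, 0.65, 0.68, 0.66, 0.67, 0.66, 0.65]
print(check_passed_threshold(scores, {"success": 0.65, "standard_error": 0.02}))
```

Under this reading, the standard-error bound guards against a run set whose mean clears 0.65 only because of high run-to-run variance.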