
How to run AgentFlow Benchmark with a locally trained Flow-GRPO checkpoint #33

@arghavan-kpm

Hi,

I trained a model using the Flow-GRPO training pipeline following the README setup:

  • Base model: Qwen/Qwen2.5-7B-Instruct (served via vLLM)
  • MODEL_ENGINE in config.yaml: ["trainable", "gpt-4o-mini", "gpt-4o-mini", "gpt-4o-mini"]
    (i.e., I used gpt-4o-mini for the non-trainable model engines, while the trainable engine is the local vLLM-served model)

Now I’m trying to evaluate the trained checkpoint using the “AgentFlow Benchmark” section in the README. My understanding is:

  1. Serve the model checkpoint with vLLM
  2. Run the benchmark run.sh script

Question: Is that the correct evaluation flow?
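For concreteness, here is a minimal sketch of step 1 as I understand it, assuming the checkpoint directory is in Hugging Face format and that the server is launched with the standard `vllm serve` CLI. The helper names, the served model name, and the port are hypothetical placeholders, not values taken from the repository's scripts.

```python
import time
import urllib.error
import urllib.request


def build_serve_command(model_path, served_name, port=8000):
    """Return the argv for serving a local model directory with vLLM.

    model_path:  local directory (assumed to be Hugging Face format).
    served_name: name clients will pass as `model` (hypothetical placeholder).
    """
    return [
        "vllm", "serve", model_path,
        "--served-model-name", served_name,
        "--port", str(port),
    ]


def wait_until_ready(base_url, timeout_s=300.0, poll_s=2.0):
    """Poll the OpenAI-compatible /v1/models endpoint until it responds.

    Returns True once the server answers with HTTP 200, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short sleep
        time.sleep(poll_s)
    return False
```

The command could be launched with `subprocess.Popen(build_serve_command(...))`, followed by `wait_until_ready("http://localhost:8000")` before starting the benchmark. How run.sh should be told about the served model name and URL is exactly the part I am unsure about.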

If so, I’m unsure how to use my local training checkpoint with the provided benchmark scripts. The example scripts seem to be written for the published model on Hugging Face (AgentFlow-7B/agentflow-planner-7b), and I’m not sure what to change in:

  • serve_vllm.sh
  • run.sh

…so they load my checkpoint saved under:
checkpoints/AgentFlow_pro/AgentFlow_pro/global_step_*
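
To resolve the `global_step_*` wildcard to a single directory, I would use something like the sketch below. It picks the subdirectory with the highest step number and checks for a `config.json`, which vLLM needs when loading a model from a local path. Both function names are hypothetical, and whether the training run saves Hugging Face-format weights directly in that directory (rather than a sharded state that must be converted first) is an assumption I have not been able to confirm.

```python
import re
from pathlib import Path

# Matches directory names such as "global_step_120" and captures the step.
_STEP_RE = re.compile(r"^global_step_(\d+)$")


def latest_global_step(root):
    """Return the global_step_* subdirectory with the highest step, or None."""
    best = None
    best_step = -1
    for child in Path(root).iterdir():
        match = _STEP_RE.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best, best_step = child, int(match.group(1))
    return best


def looks_like_hf_model_dir(path):
    """Check for config.json, the minimum needed for a local model directory."""
    return (Path(path) / "config.json").is_file()
```

Comparing the step numerically matters here: a plain string sort would rank `global_step_9` above `global_step_10`.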

Could you please clarify what edits are needed and the recommended way to point the benchmark scripts to a locally trained checkpoint?

Thanks!
