docs: clarify PAIBench-C reproduction seed and prompt format by Muneerali199 · Pull Request #211 · NVIDIA/cosmos

Muneerali199 · 2026-06-13T15:50:30Z

Adds a clarifying note to the transfer cookbook README addressing the two remaining questions from bhack on the PAIBench-C reproducibility issue:

Seed: All clips use --seed 2026 as the canonical reference seed
Prompt format: Prompts follow the structured prompt.json format shown in assets/*/

Remaining reproducibility gaps (prompt conversion, SAM2 determinism, evaluation) are tracked at #219.

Signed-off-by: Muneerali199 <alimuneerali245@gmail.com>

bhack · 2026-06-13T16:21:23Z

But how are the structured prompts generated for the dataset?

Also, the problem is not strictly about reproducibility: if you compare against the official PAIBench-C precomputed dataset seg GT, the result is not reproducible. Have you recomputed the source segmentation for your paper/model card?

lfengad · 2026-06-15T02:58:20Z

@trungtpham for review? THX!

Muneerali199 · 2026-06-15T18:19:39Z

Thanks for looking at this. The structured prompts follow the format in assets/*/prompt.json — basically load that template and fill in the scene params per clip. The generation code is in cookbooks/cosmos3/generator/transfer/.
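As a rough illustration of the "load the template and fill in the scene params per clip" step, here is a minimal sketch. It is not the code in cookbooks/cosmos3/generator/transfer/; the helper name `fill_prompt` and the keys `scene`, `weather`, and `camera` are placeholders invented for the example, and the real keys are whatever assets/*/prompt.json defines.

```python
import copy
import json


def fill_prompt(template: dict, params: dict) -> dict:
    """Return a copy of `template` with top-level keys overridden by `params`.

    Keys that are not already in the template are rejected, so a typo in a
    per-clip parameter fails loudly instead of silently extending the schema.
    """
    unknown = set(params) - set(template)
    if unknown:
        raise KeyError(f"params not present in template: {sorted(unknown)}")
    prompt = copy.deepcopy(template)  # leave the shared template untouched
    prompt.update(params)
    return prompt


# Hypothetical template; the real keys live in assets/*/prompt.json.
template = {"scene": "", "weather": "", "camera": "front"}
clip_prompt = fill_prompt(template, {"scene": "urban intersection", "weather": "rain"})
print(json.dumps(clip_prompt, indent=2))
```

In practice the template would be read with `json.load` from one of the prompt.json files under assets/*/ rather than defined inline.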

About the source segmentation — I haven't compared against the official PAIBench-C precomputed GT yet. I'll add a note in the cookbook saying that's still pending and link to the non-determinism tracker (#7) for now. Will follow up once I've done the validation.

bhack · 2026-06-15T20:09:43Z

I think we are quite far from reproducibility of the model card.

The remaining blocker for PAIBench-C reproduction is the prompt artifact.

PAIBench-C public prompts are natural-language captions in metadata.csv / captions/*.json, while the Cosmos3 cookbook uses a structured prompt.json schema.

Could you clarify exactly how the PAIBench-C captions were converted into Cosmos3 structured prompt.json files for Table 16?

In particular:

Were the public PAIBench-C captions used directly, or converted into structured Cosmos3 prompt.json?
If converted, was the input metadata.csv caption_text, captions/{task_id}.json, the source video, or some combination?
Is the conversion script / system prompt / model available?
Are the per-clip structured prompt.json files used for the 600 PAIBench-C examples available?
Did the reported Table 16 segmentation result use those structured prompts plus official HF sam2_vids/sam2_pkls, or were source segmentations recomputed?

Without those prompt files or the conversion recipe, the released specs/seed/control settings define the inference shape, but not an exact reproduction of the PAIBench-C table, because the prompt conditioning differs from the public PAIBench-C dataset.
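One way to make the prompt-conditioning question checkable, once per-clip files exist, is to publish a digest manifest of the prompt.json files used for the table. The following is a minimal sketch under assumptions: the directory layout (one prompt.json per clip folder) and the helper names `canonical_digest`, `build_manifest`, and `diff_manifests` are hypothetical and not part of the Cosmos or PAIBench-C code.

```python
import hashlib
import json
from pathlib import Path


def canonical_digest(prompt: dict) -> str:
    """SHA-256 of the prompt serialized with sorted keys and fixed separators."""
    payload = json.dumps(
        prompt, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_manifest(prompt_dir: Path) -> dict:
    """Map each prompt.json path (relative to prompt_dir) to its digest."""
    manifest = {}
    for path in sorted(prompt_dir.rglob("prompt.json")):
        with path.open(encoding="utf-8") as f:
            key = path.relative_to(prompt_dir).as_posix()
            manifest[key] = canonical_digest(json.load(f))
    return manifest


def diff_manifests(a: dict, b: dict) -> list:
    """Return sorted keys whose digests differ or that exist in only one manifest."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Two parties can then compare manifests with `diff_manifests` to confirm they ran with identical prompts. Hashing the parsed JSON rather than the raw bytes means whitespace and key order do not cause false mismatches.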

Muneerali199 · 2026-06-16T17:48:54Z

Fair questions — though this PR is intentionally limited in scope. It just documents the seed (2026) and the prompt.json format that's already in the repo (under assets/*/), since those were the two open items from the original issue that could be clarified right away.

The deeper pipeline questions — how PAIBench-C captions were converted into structured prompts, what the per-clip files look like, whether the conversion script exists — are outside what this PR covers. Those would need a separate follow-up with the right context.

For now, this PR gets the basic recipe documented so someone can at least run with the same seed and prompt schema. The rest (conversion script, per-clip files, segmentation validation) is still TBD. Happy to open a tracking issue for that if it helps.

bhack · 2026-06-16T18:42:30Z

OK. Since you know the internal process that produced Table 16, can you please open a tracking issue? Even after this PR is merged we really have zero reproducibility for the conditioned metrics, especially if you used the structured prompts from the example to compute the table in the model card instead of the official "flat" benchmark prompts.

Also, your seed doesn't fix the SAM2 random point sampling, only the WAN/DIT derived Cosmos generator.
See https://github.com/SHI-Labs/physical-ai-bench/blob/main/conditional_generation/models/grounded_sam_v2.py#L44C1-L83C18

So we need to know whether you wrote your own custom evaluation/metrics code, as we don't have any public evidence of this.

Because if you used the original official benchmark code, it is not deterministic, so your model card results were affected as well.
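To illustrate why a generator seed alone does not pin down point sampling, here is a minimal sketch of seeded point selection from a binary mask. It does not reproduce the linked grounded_sam_v2.py; the function `sample_points` and the toy mask are hypothetical, and it uses only the standard library `random` module, whereas the actual evaluation code may draw from a different RNG that would need its own seed.

```python
import random


def sample_points(mask, num_points, seed):
    """Pick up to `num_points` (row, col) foreground pixels from a binary mask.

    A dedicated random.Random(seed) instance is used, so the result does not
    depend on global RNG state or on any seed given to the video generator.
    """
    rng = random.Random(seed)
    foreground = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    k = min(num_points, len(foreground))
    return rng.sample(foreground, k)


# Toy 3x3 mask with five foreground pixels (illustrative only).
mask = [[0, 1, 1], [1, 1, 0], [0, 0, 1]]
first = sample_points(mask, 3, seed=2026)
second = sample_points(mask, 3, seed=2026)
print(first == second)  # True: same seed and mask give the same points
```

This only covers the sampling RNG. Any remaining nondeterminism in the segmentation model itself would have to be addressed separately.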

Muneerali199 · 2026-06-16T19:19:34Z

Good points, especially about SAM2 — you're right that the seed only covers the generator side. I've opened #219 to track the remaining gaps (prompt conversion, per-clip files, SAM2 determinism, evaluation pipeline). That way we can close this docs PR and continue the conversation there.

docs: clarify PAIBench-C reproduction seed and prompt format

19708dc

Signed-off-by: Muneerali199 <alimuneerali245@gmail.com>

Muneerali199 mentioned this pull request Jun 13, 2026

Need released-code recipe to reproduce Cosmos3 PAIBench-C transfer results NVIDIA/cosmos-framework#14

Open

Muneerali199 mentioned this pull request Jun 16, 2026

Track remaining PAIBench-C reproducibility gaps #219

Open

5 tasks


Muneerali199 wants to merge 1 commit into NVIDIA:main from Muneerali199:patch-transfer-readme
