Dear authors, thank you for releasing TTT-Discover.
I am reimplementing the method and ran into some ambiguity around the generation path for Qwen3-8B.
From reading the released code, the rollout code appears to instantiate the same `TwoPhaseTokenCompleter` regardless of model choice (`discover/ttt_discover/rl/train.py`, lines 340 to 345 at commit 5df1a0e):
```python
policy = TwoPhaseTokenCompleter(
    sampling_client=sampling_client,
    tokenizer=tokenizer,
    phase1_max_tokens=phase1_max_tokens,
    temperature=temperature,
)
```
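However, that completer appears to be written specifically for GPT-OSS (`discover/ttt_discover/tinker_utils/completers.py`, lines 60 to 64 and line 120 at commit 5df1a0e):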
```python
PHASE2_PREFILL = "\n\n... okay, I am out of thinking tokens. I need to send my final message now."
# Full marker to transition from analysis to final channel
GPTOSS_FINAL_MARKER = "<|end|><|start|>assistant<|channel|>final<|message|>"
# Marker that indicates we're already in the final channel
GPTOSS_FINAL_CHANNEL_INDICATOR = "<|channel|>final<|message|>"

# ...

if self._contains_subsequence(phase1_tokens, self.GPTOSS_FINAL_CHANNEL_INDICATOR):
    ...
```
Meanwhile, the Qwen3 renderer uses a different chat format and stops on `<|im_end|>`, so its output does not use the Harmony channel tags that the completer looks for. I'm therefore not sure whether two-phase decoding is silently skipped when Qwen3-8B is used for the math and circle packing tasks.
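To make the concern concrete, here is a minimal sketch of the check as I understand it. It assumes that `_contains_subsequence` amounts to a substring test on the decoded phase-1 text (the real helper may operate on token IDs instead), and the two sample completions are hand-written illustrations, not actual model outputs. `contains_final_channel` is a hypothetical name:

```python
# Marker copied from completers.py; everything else here is illustrative.
GPTOSS_FINAL_CHANNEL_INDICATOR = "<|channel|>final<|message|>"


def contains_final_channel(decoded_phase1: str) -> bool:
    """Hypothetical stand-in for TwoPhaseTokenCompleter._contains_subsequence,
    assumed here to be a plain substring test on decoded text."""
    return GPTOSS_FINAL_CHANNEL_INDICATOR in decoded_phase1


# Hand-written example in GPT-OSS Harmony format (analysis channel, then final channel).
gptoss_sample = (
    "<|channel|>analysis<|message|>Let me think...<|end|>"
    "<|start|>assistant<|channel|>final<|message|>The answer is 42."
)

# Hand-written example in Qwen3 chat format (think block, answer, then <|im_end|>).
qwen3_sample = "<think>\nLet me think...\n</think>\n\nThe answer is 42.<|im_end|>"

print(contains_final_channel(gptoss_sample))  # True
print(contains_final_channel(qwen3_sample))   # False: no Harmony tags in Qwen3 output
```

If this reading is right, the check cannot succeed on Qwen3-formatted text, so any branch that depends on it would behave differently from the GPT-OSS case.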
Could you clarify which of the following is the intended behavior?
- There is true two-phase decoding, with a Qwen-specific string used for the token forcing at token 26,000.
- The two-phase completer runs as-is, so it tries to match GPT-OSS-specific Harmony channel tags against text that follows the Qwen chat template, and those conditions silently fail.
- Qwen3-8B simply uses ordinary single-phase decoding.
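For reference, this is roughly what I would expect the first option to look like. It is purely a hypothetical sketch: the model-name dispatch, the Qwen3 transition strings (closing the `<think>` block with `</think>`), and the names `transition_strings` and `phase2_prompt_suffix` are my own assumptions, not anything from the released code:

```python
# Strings copied from completers.py.
PHASE2_PREFILL = "\n\n... okay, I am out of thinking tokens. I need to send my final message now."
GPTOSS_FINAL_MARKER = "<|end|><|start|>assistant<|channel|>final<|message|>"
GPTOSS_FINAL_CHANNEL_INDICATOR = "<|channel|>final<|message|>"

# Hypothetical Qwen3 equivalents: Qwen3's thinking mode ends its reasoning with </think>.
QWEN3_FINAL_MARKER = "\n</think>\n\n"
QWEN3_FINAL_INDICATOR = "</think>"


def transition_strings(model_name: str) -> tuple[str, str]:
    """Return (marker appended before phase 2, indicator that reasoning already ended)."""
    name = model_name.lower()
    if "gpt-oss" in name:
        return GPTOSS_FINAL_MARKER, GPTOSS_FINAL_CHANNEL_INDICATOR
    if "qwen3" in name:
        return QWEN3_FINAL_MARKER, QWEN3_FINAL_INDICATOR
    raise ValueError(f"no two-phase transition defined for {model_name!r}")


def phase2_prompt_suffix(model_name: str, phase1_text: str) -> str:
    """Text to append before phase 2; empty if phase 1 already left the reasoning section."""
    marker, indicator = transition_strings(model_name)
    if indicator in phase1_text:
        return ""
    return PHASE2_PREFILL + marker
```

For example, `phase2_prompt_suffix("Qwen/Qwen3-8B", "<think>\nstill thinking")` would return the prefill followed by `"\n</think>\n\n"`, forcing the model out of its reasoning block, while a phase-1 output that already contains `</think>` would get no suffix.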
Thank you!