
Was two phase decoding actually used for Qwen3-8B? #11

@cheongalc


Dear authors, thank you for releasing TTT-Discover.

I am reimplementing the method and ran into some ambiguity around the generation path for Qwen3-8B.

From reading the released code, I see that the rollout code appears to instantiate the same TwoPhaseTokenCompleter regardless of model choice:

policy = TwoPhaseTokenCompleter(
    sampling_client=sampling_client,
    tokenizer=tokenizer,
    phase1_max_tokens=phase1_max_tokens,
    temperature=temperature,
)

However, that completer appears to be written specifically for GPT-OSS:

PHASE2_PREFILL = "\n\n... okay, I am out of thinking tokens. I need to send my final message now."
# Full marker to transition from analysis to final channel
GPTOSS_FINAL_MARKER = "<|end|><|start|>assistant<|channel|>final<|message|>"
# Marker that indicates we're already in the final channel
GPTOSS_FINAL_CHANNEL_INDICATOR = "<|channel|>final<|message|>"

if self._contains_subsequence(phase1_tokens, self.GPTOSS_FINAL_CHANNEL_INDICATOR):

Meanwhile, the Qwen3 renderer uses a different chat format and stops on <|im_end|>, so I'm not sure whether two-phase decoding is silently skipped when Qwen3-8B is used for the math and circle-packing tasks.
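For concreteness, here is a string-level illustration (my own, not the repo's code) of why I suspect the Harmony marker check can never match a Qwen-formatted rollout. The sample transcript is hypothetical, written to follow the Qwen3 <|im_start|>/<|im_end|> template:

```python
# The GPT-OSS completer searches for Harmony channel markers, but Qwen3's
# chat template never emits them, so a subsequence/substring match should
# always come up empty.

GPTOSS_FINAL_CHANNEL_INDICATOR = "<|channel|>final<|message|>"

# Hypothetical Qwen3-style rollout, for comparison only.
qwen_rollout = (
    "<|im_start|>user\nPack 7 circles in a unit square.<|im_end|>\n"
    "<|im_start|>assistant\n<think>\nLet me reason about this...\n</think>\n"
    "Here is a packing.<|im_end|>"
)

print(GPTOSS_FINAL_CHANNEL_INDICATOR in qwen_rollout)  # False
```

If the token-level `_contains_subsequence` check behaves like this substring check, the phase-2 branch would be dead code for Qwen3.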

Could you confirm which of the following is the intended behavior:

  • there is true two-phase decoding for Qwen3, with a Qwen-specific forcing string injected at the 26,000-token budget, or
  • the two-phase token completer runs as-is, but because it matches GPT-OSS-specific Harmony channel tags against a string rendered with the Qwen chat template, those conditions silently fail, or
  • Qwen3-8B simply uses ordinary single-phase decoding.
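In case the first interpretation is the intended one, here is a sketch of what a Qwen-specific transition might look like. All names and strings below are my guesses (modeled on the PHASE2_PREFILL quoted above), not anything from the released code:

```python
# Hypothetical Qwen3 analogue of the GPT-OSS two-phase transition.
# Qwen3 marks reasoning with <think>...</think>, so the forcing string
# would need to close the thinking block rather than switch Harmony channels.

PHASE1_MAX_TOKENS = 26_000  # budget mentioned in the paper/code
QWEN_THINK_CLOSE = "</think>"
QWEN_PHASE2_PREFILL = (
    "\n\n... okay, I am out of thinking tokens. "
    "I need to send my final message now.\n</think>\n"
)

def force_final_phase(phase1_text: str) -> str:
    """If the model has not closed its <think> block within the phase-1
    budget, append a forcing string so phase 2 decodes the final answer."""
    if QWEN_THINK_CLOSE in phase1_text:
        # Model transitioned on its own; no forcing needed.
        return phase1_text
    return phase1_text + QWEN_PHASE2_PREFILL
```

Is something along these lines what was actually run for the Qwen3-8B experiments?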

Thank you!
