
Was two phase decoding actually used for Qwen3-8B? #11

@cheongalc


Dear authors, thank you for releasing TTT-Discover.

I am reimplementing the method and ran into some ambiguity around the generation path for Qwen3-8B.

From reading the released code, I see that the rollout code appears to instantiate the same TwoPhaseTokenCompleter regardless of model choice:

policy = TwoPhaseTokenCompleter(
    sampling_client=sampling_client,
    tokenizer=tokenizer,
    phase1_max_tokens=phase1_max_tokens,
    temperature=temperature,
)

However, that completer appears to be written specifically for GPT-OSS:

PHASE2_PREFILL = "\n\n... okay, I am out of thinking tokens. I need to send my final message now."
# Full marker to transition from analysis to final channel
GPTOSS_FINAL_MARKER = "<|end|><|start|>assistant<|channel|>final<|message|>"
# Marker that indicates we're already in the final channel
GPTOSS_FINAL_CHANNEL_INDICATOR = "<|channel|>final<|message|>"

if self._contains_subsequence(phase1_tokens, self.GPTOSS_FINAL_CHANNEL_INDICATOR):

Meanwhile, the Qwen3 renderer uses a different chat format and stops on <|im_end|>, so I'm not sure whether two-phase decoding is silently skipped when Qwen3-8B is used for the math and circle-packing tasks.
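For concreteness, here is a string-level illustration (my own, not the repo's code) of why I suspect the Harmony marker check can never match a Qwen-formatted rollout. The sample transcript is hypothetical, written to follow the Qwen3 <|im_start|>/<|im_end|> template:

```python
# The GPT-OSS completer searches for Harmony channel markers, but Qwen3's
# chat template never emits them, so a subsequence/substring match should
# always come up empty.

GPTOSS_FINAL_CHANNEL_INDICATOR = "<|channel|>final<|message|>"

# Hypothetical Qwen3-style rollout, for comparison only.
qwen_rollout = (
    "<|im_start|>user\nPack 7 circles in a unit square.<|im_end|>\n"
    "<|im_start|>assistant\n<think>\nLet me reason about this...\n</think>\n"
    "Here is a packing.<|im_end|>"
)

print(GPTOSS_FINAL_CHANNEL_INDICATOR in qwen_rollout)  # False
```

If the token-level `_contains_subsequence` check behaves like this substring check, the phase-2 branch would be dead code for Qwen3.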

Could you confirm which of the following is the intended behavior:

  • there is true two-phase decoding for Qwen3, with a Qwen-specific forcing string injected at the 26,000-token budget, or
  • the two-phase token completer runs as-is, but because it matches GPT-OSS-specific Harmony channel tags against a string rendered with the Qwen chat template, those conditions silently fail, or
  • Qwen3-8B simply uses ordinary single-phase decoding.
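In case the first interpretation is the intended one, here is a sketch of what a Qwen-specific transition might look like. All names and strings below are my guesses (modeled on the PHASE2_PREFILL quoted above), not anything from the released code:

```python
# Hypothetical Qwen3 analogue of the GPT-OSS two-phase transition.
# Qwen3 marks reasoning with <think>...</think>, so the forcing string
# would need to close the thinking block rather than switch Harmony channels.

PHASE1_MAX_TOKENS = 26_000  # budget mentioned in the paper/code
QWEN_THINK_CLOSE = "</think>"
QWEN_PHASE2_PREFILL = (
    "\n\n... okay, I am out of thinking tokens. "
    "I need to send my final message now.\n</think>\n"
)

def force_final_phase(phase1_text: str) -> str:
    """If the model has not closed its <think> block within the phase-1
    budget, append a forcing string so phase 2 decodes the final answer."""
    if QWEN_THINK_CLOSE in phase1_text:
        # Model transitioned on its own; no forcing needed.
        return phase1_text
    return phase1_text + QWEN_PHASE2_PREFILL
```

Is something along these lines what was actually run for the Qwen3-8B experiments?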

Thank you!
