training data

Hello author, could you please tell me how the training data here is constructed? Is it generated directly by the deepseek-r1 model using the prompts in the appendix? Another question: apart from the long-form text writing part, what is the difference between this paper and search-o1? This paper seems to be a specialized extension of search-o1 to cultural and creative fields. Is that the case?