Hello Dr. Li, Dr. Miao, and Dr. Li,
First, thank you for your excellent ICLR 2025 paper, "SIMPLE IS EFFECTIVE: THE ROLES OF GRAPHS AND LARGE LANGUAGE MODELS IN KNOWLEDGE-GRAPH-BASED RETRIEVAL-AUGMENTED GENERATION." The SubgraphRAG framework you proposed is very impressive and insightful.
I have been exploring your official GitHub repository to gain a deeper understanding of the experimental setup. The provided code and processed data (like entity_identifiers.txt and gpt_triples.pth) have been incredibly helpful.
I am currently trying to follow the full data generation pipeline, but I could not locate the scripts for the initial preprocessing steps. I would be very grateful if you could clarify where these can be found.
Specifically, I'm looking for:
- The script or methodology used to process the original WebQSP and CWQ datasets to create the WebQSP-sub and CWQ-sub variants mentioned in the paper.
- The script used to interact with the GPT-4o API (using the prompt from Appendix E) to generate the labeled triples that are stored in gpt_triples.pth.
Access to these scripts, or any pointers you could provide, would be immensely helpful for my understanding and for reproducing your results.
Thank you again for your time and for this significant contribution to the field.
Best regards,
Martin Wang