'Evidence fragments' are the text elements of a scientific paper that describe the evidence presented in figures and relate it to statements about its scientific meaning. Here we describe a corpus of fragments derived from papers concerned with molecular interaction experiments (curated into Pathway Logic or INTACT). All literature-based data are derived from open access papers.
- Only curate paragraphs from the results sections of primary experimental papers (i.e., papers where original experimental research was performed).
- Tag each high-level clause with one of 7 codes: `none`, `hypothesis`, `problem`, `goal`, `method`, `result`, `implication` (derived from http://arxiv.org/abs/1702.05398).
- Associate each clause with a figure or a subfigure.
  - Use codes like `f1a`, `f1b`, etc. to link to subfigures.
  - Use whole numbers to link to figures without subfigures or to all aggregated subfigures in a given panel.
  - Do not link information intended as background (i.e., that cites other papers).
  - If a clause cites (data not shown), link it to `uX` (where X is a number).
  - If a clause refers to more than one figure, use a pipe separator, e.g., `f1a|f1b`.
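The figure-linking conventions above can be sketched as a small parser. This is only an illustration, not part of the corpus tooling: the function name `parse_figure_links` and the returned tuple layout are hypothetical, and it assumes a whole-number figure code may be written either as a bare number or with an `f` prefix.

```python
import re

# The seven discourse codes listed in the guidelines.
DISCOURSE_TAGS = {
    "none", "hypothesis", "problem", "goal",
    "method", "result", "implication",
}

# Assumed code shapes (hypothetical patterns, inferred from the examples):
_SUBFIGURE = re.compile(r"^f(\d+)([a-z]+)$")   # e.g. f1a
_FIGURE = re.compile(r"^f?(\d+)$")             # e.g. 1 (or f1, by assumption)
_NOT_SHOWN = re.compile(r"^u(\d+)$")           # e.g. u2 for "data not shown"


def parse_figure_links(field):
    """Split a pipe-separated figure field into (kind, number, panel) tuples."""
    links = []
    for code in field.split("|"):
        code = code.strip().lower()
        if not code:
            continue  # an empty field means the clause has no figure link
        match = _SUBFIGURE.match(code)
        if match:
            links.append(("subfigure", int(match.group(1)), match.group(2)))
            continue
        match = _FIGURE.match(code)
        if match:
            links.append(("figure", int(match.group(1)), None))
            continue
        match = _NOT_SHOWN.match(code)
        if match:
            links.append(("data_not_shown", int(match.group(1)), None))
            continue
        raise ValueError(f"Unrecognized figure code: {code!r}")
    return links
```

For example, `parse_figure_links("f1a|f1b")` yields two subfigure links for figure 1, and a code that fits none of the assumed patterns raises a `ValueError` so that annotation typos surface early.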
The subdirectories of this directory are organized as follows.
- 00_uncurated - TSV files to be processed
- 01_discourse_tags_complete - TSV files where all discourse tags are included
- 02_expt_spans_complete - TSV files where all discourse tags and figure columns are included
- 03_primary_claims_complete - TSV files where all discourse tags, figure columns, and checks for primary statements are included
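
As a rough sketch of how the staged layout might be inspected, the snippet below counts the TSV files in each subdirectory. The helper name `count_tsv_by_stage` is hypothetical, and the `.tsv` file extension is an assumption; only the directory names come from the list above.

```python
from pathlib import Path

# Curation stages, in processing order (names taken from the list above).
STAGES = [
    "00_uncurated",
    "01_discourse_tags_complete",
    "02_expt_spans_complete",
    "03_primary_claims_complete",
]


def count_tsv_by_stage(root):
    """Return {stage_name: number of *.tsv files} for each stage directory."""
    root = Path(root)
    counts = {}
    for stage in STAGES:
        stage_dir = root / stage
        # A missing stage directory is reported as zero files (assumed behavior).
        if stage_dir.is_dir():
            counts[stage] = len(list(stage_dir.glob("*.tsv")))
        else:
            counts[stage] = 0
    return counts
```

Calling `count_tsv_by_stage(".")` from this directory would give a quick view of how many files have reached each curation stage.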