Structured Semantic Modeling of Citation Intents
CiTelling is a radically new model of the fine-grained semantic structures that lie behind citational sentences.
We provide a dataset of 1380 citation-intent instances for use in machine learning scenarios.
The project follows an incremental approach: the first step is the construction of a hand-annotated dataset containing the information needed for the following classification phase.
After a careful analysis of citational intents and meaning, we ended up with five categories (or classes): Analyze, Compare, Extend, Propose and Use.
To the semantic model we add the concepts of Object and Context: respectively, the topic covered by the citation and the context that disambiguates it. In addition, we distinguish the role played in the citation: A-subj if the source paper plays the role of subject in presenting the object, B-subj if the cited paper is the only one related to the object. Finally, we subcategorize some classes: Analyze with the subclass 'Critique', Use with 'Use-data' and Compare with 'Contrast'. Subcategories are indicated through the flags attribute.
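The semantic model described above could be represented in code roughly as follows. This is only an illustrative sketch: the names `Intent`, `Role`, `Citation` and `SUBCLASSES` are hypothetical and not part of the project, and the subclass strings are taken from the description above, so the literal flag values stored in the files may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Intent(Enum):
    """The five citation classes."""
    ANALYZE = "Analyze"
    COMPARE = "Compare"
    EXTEND = "Extend"
    PROPOSE = "Propose"
    USE = "Use"


class Role(Enum):
    """Which paper acts as subject with respect to the object."""
    A_SUBJ = "A-subj"  # the source paper presents the object
    B_SUBJ = "B-subj"  # only the cited paper is related to the object


# Subclass names as listed in the description; assumed, not read from the data.
SUBCLASSES = {
    Intent.ANALYZE: "Critique",
    Intent.USE: "Use-data",
    Intent.COMPARE: "Contrast",
}


@dataclass(frozen=True)
class Citation:
    text: str
    intent: Intent
    obj: str        # the Object: topic covered by the citation
    context: str    # the Context that disambiguates the object
    role: Role
    subclass: Optional[str] = None

    def __post_init__(self):
        # A subclass is only meaningful for the class it refines.
        allowed = SUBCLASSES.get(self.intent)
        if self.subclass is not None and self.subclass != allowed:
            raise ValueError(
                f"{self.subclass!r} is not a subclass of {self.intent.value}"
            )
```

Keeping the class-to-subclass mapping in one dictionary means an inconsistent pair, such as Extend with 'Critique', is rejected when the record is built.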
The dataset consists of five files in csv format, one for each citation class, with values separated by tabs. The first line of each file is the header. An example record:
text: (Peter et al 2019) uses the pm10-2019 dataset for analyze air pollution.
object: pm10-2018
context: for analyze air pollution
role: B-subj
flags: Data
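A minimal sketch of how such a tab-separated file might be read with Python's standard `csv` module. The column names are assumed from the example record above, the helper `read_citations` is hypothetical, and treating the sample as a Use citation is an assumption. An in-memory sample stands in for a real file so that the snippet is self-contained.

```python
import csv
import io

# In-memory stand-in for one of the five files; header names are assumed
# from the example record, and the row is copied from it verbatim.
SAMPLE = (
    "text\tobject\tcontext\trole\tflags\n"
    "(Peter et al 2019) uses the pm10-2019 dataset for analyze air pollution."
    "\tpm10-2018\tfor analyze air pollution\tB-subj\tData\n"
)


def read_citations(handle, intent):
    """Read one tab-separated file and tag every row with its class."""
    # QUOTE_NONE keeps quotation marks inside citation text as literal
    # characters; this assumes the files do not use CSV quoting.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        row["intent"] = intent  # the class comes from the file, not a column
        rows.append(row)
    return rows


rows = read_citations(io.StringIO(SAMPLE), "Use")
for row in rows:
    print(row["intent"], row["role"], row["object"], row["flags"])
```

With real files, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, once per class; the file names are not stated here, so the paths are left to the reader.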
- Roger Ferrod (student) - roger.ferrod@unito.it
- Luigi Di Caro - luigi.dicaro@unito.it
- Claudio Schifanella - claudio.schifanella@unito.it
This project is licensed under the MIT License. See the LICENSE.md file for details.
Supervisors:
- Luigi Di Caro
- Claudio Schifanella