Because
(I'm using the term "dual annotation" to mean manual annotation done redundantly by two or more annotators.)
So far, all the annotation projects we've worked on used single annotation. Based on that, we designed the workflow for processing annotation data (raw >> gold, organization under batches and dates, etc.) without considering
- IAA measurement
- adjudication/curation for merging dual annotation
However, in the latest annotation effort, RFB, we started dual annotation, at least for a subset of the whole dataset. I think it's now time to discuss how we want to host the dual annotations and the adjudicated single set of "raw" data in this public repo. Concretely,
- We need fixed terms to indicate
  - raw manual annotation (currently called `raw`, hereinafter "raw")
  - adjudicated "gold" annotation (currently no such thing, hereinafter "gold")
  - machine-ready "public" annotation (currently called `gold`, hereinafter "release")
- Do we want to host both "raw" and "gold", or "gold" only?
- How do we publish the adjudication process, if any? I can imagine all-manual adjudication and code-assisted adjudication. In the latter case, should we consider special handling of the adjudication code, just like `process.py`?
- Where should the IAA calculation results be reported? In the `README`, or in a separate file/directory?
And maybe more questions.
Starting this issue to discuss the details. Any input is welcome!
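To make the IAA and code-assisted adjudication questions a bit more concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes exactly two annotators who labeled the same items in the same order, and it uses Cohen's kappa as the agreement measure; both the choice of measure and the function names (`cohen_kappa`, `split_for_adjudication`) are placeholders for discussion, and nothing here reflects how `process.py` currently works.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    all_labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in all_labels) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined here, so this sketch reports full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def split_for_adjudication(labels_a, labels_b):
    """Hypothetical helper: auto-accept agreed items, queue the rest for a human."""
    agreed, disputed = {}, {}
    for index, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a == b:
            agreed[index] = a
        else:
            disputed[index] = (a, b)
    return agreed, disputed
```

With something like this, the kappa value would be what goes into the IAA report, the `agreed` items could go straight into "gold", and only the `disputed` items would need manual adjudication. Whether a script of this kind should live next to `process.py` is exactly the open question above.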
Done when
We set a guideline or template for handling
- dual "raw" annotation files
- IAA reports
- documentation of adjudication process
Additional context
No response