Challenges on text selection

CarolinOdebrecht edited this page Sep 30, 2019 · 2 revisions

We collect challenges on text selection and corpus composition for ELTeC. This is a report on the experiences in the Action / WG1. Please provide information in the following structure, which will help to compare the challenges across language collections. For each report, copy the following phrases:

Report for language collection XYZ

  • Challenges in finding / identifying texts (e.g. due to lack of catalogue data) ca. 100 words
  • Challenges in meeting the sampling criteria (e.g. finding texts of a certain length or from a certain period) ca. 100 words
  • Challenges in meeting the balancing criteria (e.g. balancing texts with respect to the length criterion) ca. 100 words

What's here

E5C-discussion-paper ELTeC Corpus Composition Criteria Compliance Calculations : draft for discussion

Challenges-on-text-selection Reports on challenges regarding text selection and balancing

Workflow Step-by-step introduction for contributing texts to ELTeC.

Uploading-files-on-GitHub-Step-by-Step How to upload texts on GitHub

textFeatures Table of textual features and their encodings

teiHeaders Instructions for compiling an ELTeC Header

choosingTitles Suggestions on how to select texts for ELTeC

Versioning-Guidelines-for-ELTeC Draft for defining our versioning guidelines.

Filenames and identifiers: A proposal

Please feel free to add ideas and discussion notes

Call-for-Contributions What texts can you contribute?

Example-Texts Add an example here!

ELTeC-List-of-Candidates Draft table for text candidates

Online-Text-Collections Some links to less well known collections
