Labeling queue library for managing human labeling workflows
Updated Apr 4, 2026 - Elixir
An open multi-rater benchmark for characterizing architectural fingerprints in identity-scaffolded LLMs.
Open evaluation harness for mental health LLM responses. 5 clinically-grounded rubrics, LLM-as-judge with bias controls, crisis-detection routing to 988 protocols.
Statistical validation of labeling consistency across three independent raters for a handwritten digit classification dataset.
Multi-axis taxonomy of Australian coronial recommendations 1998-2026 — code, codebook, and pre-registered analysis outputs. Pre-registered at OSF DOI 10.17605/OSF.IO/NEX85 under CC-BY 4.0.
The R scripts for my MSc dissertation "Developmental Trends in Children’s Understanding of COVID-19: A Draw, Write & Tell Study"
Statistical analysis of inter-rater reliability and quality patterns in LLM evaluation systems using R
TRIP(U): Emotional travel recommendation system using NLP. Features a custom Gold Standard corpus of 500 labeled texts and supervised classification (SVM/Naive Bayes) for emotionally adaptive itineraries. Master's Thesis (TFM) - Distinction.