inter-rater-reliability

Here are 8 public repositories matching this topic...

North-Shore-AI / anvil

Labeling queue library for managing human labeling workflows

machine-learning elixir otp annotation functional-programming beam labeling cohens-kappa human-in-the-loop data-labeling hitl nshkr-ingot inter-rater-reliability

Updated Apr 4, 2026
Elixir

devmance / SECI

An open multi-rater benchmark for characterizing architectural fingerprints in identity-scaffolded LLMs.

benchmark open-science large-language-models llm multi-rater ai-benchmark ai-identity inter-rater-reliability simulated-emergence

Updated May 23, 2026
Python

KarmaEnchanter / mental-health-llm-eval

Open evaluation harness for mental health LLM responses. 5 clinically-grounded rubrics, LLM-as-judge with bias controls, crisis-detection routing to 988 protocols.

psychology cbt ai-safety conversational-ai clinical-ai cohen-kappa ollama llm-evaluation llm-as-judge mental-health-ai ai-eval inter-rater-reliability eval-harness lifeline-988 open-source-eval

Updated May 23, 2026
Python

ZPD-Numbers-2025-2026 / Inter-observer-reliability

Statistical validation of labeling consistency across three independent raters for a handwritten digit classification dataset.

python statistics jupyter-notebook image-classification handwritten-digits cohens-kappa fleiss-kappa inter-rater-reliability

Updated Apr 26, 2026
HTML

hayden-farquhar / Coronial-Recommendation-Taxonomy

Multi-axis taxonomy of Australian coronial recommendations 1998-2026 — code, codebook, and pre-registered analysis outputs. Pre-registered at OSF DOI 10.17605/OSF.IO/NEX85 under CC-BY 4.0.

taxonomy text-classification australia reproducibility large-language-models injury-prevention inter-rater-reliability pre-registered coronial-recommendations

Updated May 23, 2026
Python

freyamurray / MSc-Dissertation

The R scripts for my MSc dissertation "Developmental Trends in Children’s Understanding of COVID-19: A Draw, Write & Tell Study"

data-science r psychology r-markdown demographics content-analysis msc-project r-stats r-studio inter-rater-reliability child-psychology

Updated Feb 1, 2026
R

jgresswright / portfolio-llm-eval-quality

Statistical analysis of inter-rater reliability and quality patterns in LLM evaluation systems using R

data-science statistics r-stats portfolio-project llm-evaluation inter-rater-reliability

Updated May 13, 2026
R

jc-datarchitect / trip-u

TRIP(U): Emotional travel recommendation system using NLP. Features a custom Gold Standard corpus of 500 labeled texts and supervised classification (SVM/Naive Bayes) for emotionally adaptive itineraries. Master's Thesis (TFM) - Distinction.

nlp data-science machine-learning big-data sentiment-analysis naive-bayes-classifier mit-license svm-classifier inclusive-design double-blind gold-standard responsible-ai cohen-kappa user-safety travel-recommendation inter-rater-reliability accesible-tourism

Updated May 19, 2026
Jupyter Notebook
