CEDD — Conversational Emotional Drift Detection

Hackathon Report / Rapport de hackathon

Mila x Bell x Kids Help Phone / Jeunesse, J'ecoute March 16--23, 2026

Team 404HarmNotFound

Name	Role	Background
Shuchita Singh	ML / NLP Lead	AI Researcher, LLM fine-tuning, RAG, NLP, Responsible AI
Amanda Wu	UX / Presentation	Principal Product Designer (TD), 10 yrs FinTech UX, HCI Master (UCL)
Priyanka Naga	Clinical / Strategy	Business Leader (Ottawa Hospital), Innovation Lead (CHEO), MSc Data Science, PMP
Dominic D'Apice	Infra / Code / ML	Dev II AI Infra Azure, 25+ yrs Linux/DevOps, Data Science cert (TELUQ), Kaggle

1. Executive Summary (Resume executif)

CEDD (Conversational Emotional Drift Detection) is a real-time safety layer designed to sit alongside a youth mental health chatbot serving Canadians aged 16--22. Unlike systems that evaluate each message in isolation, CEDD monitors the trajectory of emotional deterioration across an entire conversation---detecting when messages grow shorter, more negative, lose questions and hope words, or drift semantically toward crisis language.

Key results:

90.0% cross-validated accuracy (+-1.6%, stratified k=4), up from a 66.7% baseline
67 engineered features combining lexical analysis, multilingual sentence embeddings, and behavioral coherence signals
36/36 adversarial test cases passing across 20 attack categories, with 0 critical misses
600 bilingual synthetic conversations (French + English) for training, including 120 adversarial scenarios
5-step warm handoff to a simulated Kids Help Phone counselor at crisis level
Fully bilingual (Quebecois French + Canadian English) with culturally sensitive detection for somatization, identity conflict, and withdrawal patterns

CEDD processes emotional drift in approximately 0ms (feature extraction + ML classification), calling an LLM only to generate the adaptive response. This makes it fundamentally more efficient and explainable than multi-agent approaches that require multiple GPT-4-class calls per message.

2. Problem Statement (Enonce du probleme)

The gap in youth mental health AI (La lacune dans l'IA en sante mentale jeunesse)

Youth mental health chatbots face a critical safety challenge: detecting when a conversation is drifting toward crisis. Current approaches have significant blind spots:

Per-message analysis misses the trajectory. A message saying "I'm fine" after six increasingly distressed messages is not fine---but a single-message classifier cannot know this.
Keyword-only detection is trivially defeated by circumlocution, sarcasm, code-switching, or cultural expression patterns that do not use Western English crisis vocabulary.
LLM-based detection (e.g., EmoAgent) requires multiple expensive API calls per message, creating latency, cost, and availability concerns for a 24/7 crisis service.
Monolingual systems systematically fail Francophone, Indigenous, and culturally diverse Canadian youth who express distress differently.

Kids Help Phone statistics (Statistiques de Jeunesse, J'ecoute)

46% of youth supported identify as 2SLGBTQ+
10% identify as Indigenous (2x population proportion)
75% share something they have never told anyone else
Suicide contacts among youth 13 and under doubled in 4 years
71% of youth prefer non-verbal communication (JMIR, May 2025)
44% of 988 callers abandon before connecting (GSA/OES)
20% seek suicide help via text vs 5% via phone in Ontario (CBC)

These numbers demand a system that can detect distress across languages, cultures, and communication styles---and that hands off to human support seamlessly rather than displaying a phone number and ending the conversation.

3. Solution Overview (Apercu de la solution)

CEDD operates as a lightweight, explainable safety layer with two phases:

Phase 1: Offline training (Entrainement hors ligne)

generate_synthetic_data.py  -->  600 bilingual conversations (Claude Haiku API)
                                        |
                                  train.py (cross-validate, fit, save)
                                        |
                                  models/cedd_model.joblib

Phase 2: Live detection (Detection en temps reel)

User message
     |
     v
Feature Extractor --> 10 features/message --> 67 trajectory features
     |
     v
Classifier (6-gate safety logic + GradientBoosting)
     |
     v
Alert Level: Green / Yellow / Orange / Red
     |
     v
Response Modulator --> Adaptive system prompt --> LLM response
     |                        \
     v                         --> Warm handoff at Red (simulated KHP counselor "Alex")
Session Tracker --> SQLite (cross-session longitudinal tracking)

Design philosophy: Asymmetric errors. Over-alerting (false positives) is always preferable to missing a crisis (false negatives). The 6-gate safety logic enforces this: safety keyword rules override ML predictions by design, and low-confidence predictions default to elevated alert levels.

4. Track 1: Adversarial Stress-Testing (Tests adversariaux)

Overview (Apercu)

We built a comprehensive adversarial test suite (tests/adversarial_suite.py) with 36 test cases across 20 attack categories, designed to probe CEDD's robustness against the types of inputs that would cause a deployed youth mental health system to fail dangerously.

Test categories (Categories de tests)

#	Category	What it tests	Language
1	`false_positive_physical`	Physical pain without emotional distress (back pain, nausea)	FR
2	`sarcasm`	Sarcastic crisis language ("dying of laughter")	FR
3	`negation`	Negated distress ("I'm NOT suicidal")	EN
4	`code_switching`	Mid-conversation language switching (FR/EN)	Mixed
5	`quebecois_slang`	Quebecois expressions ("je suis brule", "j'ai la misere")	FR
6	`gradual_drift_no_keywords`	Slow deterioration without explicit crisis words	EN
7	`direct_crisis`	Explicit suicidal ideation	FR/EN
8	`hidden_intent`	Crisis masked in casual language	EN
9	`manipulation_downplay`	"I was just joking about wanting to die"	EN
10	`somatization`	Emotional distress expressed as physical symptoms	FR
11	`identity_conflict`	2SLGBTQ+ rejection and identity distress	EN
12	`sudden_escalation`	Rapid shift from casual to crisis	FR
13	`active_bypass`	Deliberate attempts to bypass safety detection	EN
14	`rapid_recovery_manipulation`	Fake recovery after crisis signals	EN
15	`cultural_false_positive`	Cultural expressions that look like distress	FR
16	`neurodivergent_pattern`	Literal/flat communication that is not distress	EN
17	`emoji_only`	Emoji-only messages (no text to analyze)	--
18	`repeated_word`	Repeated single word ("ok ok ok ok")	EN
19	`short_recovery`	Brief message after recovery	EN
20	`long_message`	Very long messages with buried crisis signals	EN
21	`neutral_personne_fr`	French "personne" used as noun, not crisis keyword	FR
22	`emoji_crisis`	Crisis expressed through emoji sequences	--

Progress (Progression)

Milestone	Tests	Passed	Key Fix
Baseline (March 10)	10	7	--
Post data expansion	10	9	320 training conversations
Post keyword fix	10	10	Crisis keyword expansion
Post features (67D)	13	13	Embeddings + coherence
Post 480 convos	13	13	Data augmentation
Post 600 convos	30	30	Adversarial archetypes
Current (word-boundary fix)	36	36	Regex `\b`, context-aware "personne", feminine forms

Safety regression gate (Porte de regression de securite)

The test suite enforces a strict safety contract:

Exit code 0: All tests pass
Exit code 1: Non-critical failures (over-alerting within tolerance)
Exit code 2: Critical miss --- a crisis conversation was classified as Green or Yellow. This blocks any merge to main.

A critical miss means a youth in crisis would receive a generic supportive response instead of crisis intervention and resources. This is the worst possible outcome for a safety-critical application.

Word-boundary fix (Correction des limites de mots)

A significant Track 1 finding was that substring-based keyword matching caused both false positives and false negatives:

"morte de rire" (dying of laughter) matched "mort" (death) --- false positive
"une personne formidable" (a wonderful person) matched "personne" (nobody) --- false positive
"mortel" (awesome, slang) matched "mort" (death) --- false positive

We replaced all if word in text checks with regex \b word-boundary matching and added context-aware handling for ambiguous words like "personne" (which means "nobody" in isolation but "a person" when preceded by an article). We also added feminine forms to all lexicons (e.g., "epuisee", "abandonnee", "desesperee").

5. Track 2: Logic Hardening (Durcissement logique)

6-gate safety logic (Logique de securite a 6 portes)

The classifier uses a 6-gate decision pipeline that layers ML predictions with deterministic safety rules:

Gate 1: < 3 user messages?         --> Return Green (insufficient data)
Gate 2: Safety keyword floor       --> Crisis words force Orange/Red minimum
Gate 3: ML prediction              --> GradientBoosting on 67D feature vector
Gate 4: Low confidence (< 45%)?    --> Default to Yellow (cautious)
Gate 5: Short conversation (< 6)?  --> Cap at Orange maximum
Gate 6: Safety floor enforcement   --> ML cannot go below keyword level

Rationale: Gates 2 and 6 ensure that explicit crisis language always triggers an appropriate response, regardless of what the ML model predicts. Gate 4 ensures that uncertain predictions err on the side of caution. Gate 5 prevents premature Red alerts when there is insufficient conversational context.

Feature engineering (Ingenierie des features)

We evolved from 7 basic features to 67 clinically motivated features across four layers:

Layer 1: Per-message lexical features (10 features)

Feature	Clinical rationale
`word_count`	Shortening messages indicate withdrawal and disengagement
`punctuation_ratio`	Changes in punctuation reflect emotional state shifts
`question_presence`	Loss of questions indicates loss of engagement and curiosity
`negative_score`	Rising negativity across bilingual (FR+EN) lexicon
`finality_score`	Finality/ending language ("plus rien", "end it all")
`hope_score`	Declining hope words signal deterioration
`length_delta`	Message-to-message length changes track behavioral shifts
`negation_score`	Negated positive states ("can't cope", "ne suis pas bien")
`identity_conflict_score`	2SLGBTQ+/cultural identity distress signals
`somatization_score`	Physical + emotional word co-occurrence (cultural expression)

Layer 2: Trajectory statistics (60 features)

Each of the 10 per-message features is aggregated over the full conversation using 6 statistics: mean, standard deviation, slope, last value, maximum, and minimum. This captures the trajectory --- not just the current state, but the direction and volatility of emotional drift.

The slope statistic is particularly important: finality_score_slope captures whether finality language is increasing over time, while hope_score_slope captures whether hope is declining.

Layer 3: Semantic embedding features (4 features)

Using paraphrase-multilingual-MiniLM-L12-v2 (a multilingual sentence transformer), we compute:

Feature	What it measures
`embedding_drift`	Mean cosine distance between consecutive messages --- semantic wandering
`crisis_similarity`	Cosine similarity of the last message to a crisis language centroid
`embedding_slope`	PCA-reduced 1D slope --- directional semantic drift over time
`embedding_variance`	Mean pairwise cosine distance --- overall conversation coherence

These features catch distress expressed through synonyms, paraphrases, and indirect language that lexical features alone would miss. The multilingual model handles both French and English natively.

Layer 4: Behavioral coherence features (3 features)

Feature	What it measures
`short_response_ratio`	Fraction of user messages with fewer than 5 words (disengagement)
`min_topic_coherence`	Minimum cosine similarity between consecutive messages (topic jumping)
`question_response_ratio`	Fraction of assistant questions followed by a responsive reply

These features detect behavioral withdrawal --- the pattern where a distressed user stops engaging meaningfully with the chatbot, giving short or tangential replies.

Negation handling (Gestion des negations)

A critical logic hardening improvement: detecting negated positive states. "I'm fine" and "I'm not fine" have opposite meanings, but both contain positive words. We added regex-based negation patterns for both French and English:

French: ne ... pas bien, ne ... plus capable, ne ... jamais
English: can't cope, not okay, never going to be

Word-boundary precision (Precision des limites de mots)

Replaced all substring matching (if word in text) with regex word-boundary matching (\b), plus context-aware rules:

"personne" preceded by a French article ("une", "la", "cette") is not a finality word
Feminine forms added to all lexicons for gender-inclusive detection
Prevents false positives from compound words ("mortel", "immortel")

6. Track 3: Synthetic Data Augmentation (Augmentation de donnees synthetiques)

Data generation pipeline (Pipeline de generation de donnees)

All training data is 100% synthetic --- no real PII, per hackathon rules. We used the Claude Haiku API (generate_synthetic_data.py) to create realistic youth mental health conversations.

Standard archetypes (Archetypes standards): 480 conversations

4 classes: Green (normal), Yellow (concerning), Orange (significant distress), Red (crisis)
60 conversations per class per language = 60 x 4 x 2 = 480
Bilingual: authentic Quebecois French + Canadian English
12 user messages + 12 assistant messages per conversation

Adversarial archetypes (Archetypes adversariaux): 120 conversations

We identified 6 adversarial patterns that standard training data fails to cover:

Archetype	What it captures	Why standard data misses it
`physical_only`	Distress expressed purely through physical symptoms	Standard data uses emotional vocabulary
`sarcasm_distress`	Sarcastic or ironic crisis language	Standard data is direct
`adversarial_bypass`	Deliberate attempts to evade detection	Standard data is cooperative
`identity_distress`	2SLGBTQ+ and cultural identity crisis	Standard data uses generic scenarios
`neurodivergent_flat`	Flat affect that is not distress	Standard data has clear emotional arcs
`crisis_with_deflection`	Crisis signals immediately followed by deflection	Standard data is consistent

Adding these 120 adversarial conversations:

Improved CV variance from +-4.4% to +-1.5% (more stable model)
Improved sample:feature ratio from 7.2:1 to 9.0:1
Enabled 30/30 adversarial test pass rate (up from 13/13 on fewer tests)

Quality annotation (Annotation qualite)

We built a Claude-based quality annotation pipeline that scored each synthetic conversation on realism, label accuracy, and clinical plausibility. While filtering low-quality conversations hurt accuracy (removing edge cases the model needed to learn from), the annotation insights informed which adversarial archetypes to add. The annotation scripts and intermediate data files were removed after serving their purpose.

Bilingual integrity (Integrite bilingue)

Every conversation is generated in its target language with authentic regional expressions:

French: Quebecois vocabulary, informal register (tutoiement), regional expressions
English: Canadian context (university, CEGEP references, Canadian cultural elements)
No translation artifacts: conversations are generated natively in each language, not translated

7. Architecture

System architecture (Architecture du systeme)

+---------------------------------------------------------------------+
|                        CEDD Safety Layer                             |
|                                                                      |
|  +------------------+    +------------------+    +----------------+  |
|  | Feature Extractor|    |    Classifier    |    |   Response     |  |
|  |                  |--->|                  |--->|   Modulator    |  |
|  | 10 features/msg  |    | 6-gate safety    |    |                |  |
|  | 67 trajectory    |    | GradientBoosting |    | Adaptive       |  |
|  | Bilingual NLP    |    | StandardScaler   |    | system prompts |  |
|  | Embeddings       |    |                  |    | LLM fallback   |  |
|  +------------------+    +------------------+    +----------------+  |
|                                                         |            |
|  +------------------+                           +-------v--------+   |
|  | Session Tracker  |                           | LLM Chain      |   |
|  | SQLite           |                           | Cohere         |   |
|  | Cross-session    |                           | Groq           |   |
|  | Withdrawal       |                           | Gemini         |   |
|  | detection        |                           | Claude         |   |
|  +------------------+                           | Static text    |   |
|                                                  +----------------+  |
+---------------------------------------------------------------------+
                              |
                    +---------v---------+
                    |   Streamlit UI    |
                    |   Bilingual       |
                    |   FR / EN toggle  |
                    +-------------------+

ML pipeline (Pipeline ML)

Scaler: StandardScaler --- normalizes 67 features to mean=0, std=1
Model: GradientBoostingClassifier --- 200 trees, max_depth=3, learning_rate=0.1
Validation: StratifiedKFold with k=4
Serialization: joblib format (models/cedd_model.joblib)

LLM fallback chain (Chaine de repli LLM)

To ensure 24/7 availability for a crisis service, CEDD uses a 5-step fallback chain:

Cohere --- primary
Groq (Llama 3.3 70B Versatile) --- fastest inference
Google Gemini (2.5 Flash) --- tertiary
Anthropic Claude (Haiku) --- quaternary
Static emergency text --- guaranteed availability with crisis resources

If all LLM providers are down, the static fallback still provides Kids Help Phone contact information. No youth in crisis should ever see a blank screen.

Alert levels (Niveaux d'alerte)

Level	Color	Description	LLM behavior
0	Green	Normal conversation	Supportive, open-ended questions
1	Yellow	Concerning signs (fatigue, loneliness)	Enhanced emotional validation
2	Orange	Significant distress	Active support + crisis resources mentioned
3	Red	Potential crisis	Urgent referral, KHP resources prominently displayed

Warm handoff (Transfert accompagne)

At Red alert, CEDD replaces the industry-standard "cold referral" (display a phone number, end conversation) with a 5-step accompanied transition:

Empathetic acknowledgment --- validate feelings, no hotline number yet
Permission-based transition --- ask consent, frame as "upgrade" not rejection
Context bridge --- generate anonymized summary for KHP responder
Seamless connection --- text-based (686868), same modality, no story repetition
Background monitoring + follow-up --- CEDD stays active, acknowledges returning users

The simulated counselor "Alex" uses an ASIST-trained (Applied Suicide Intervention Skills Training) persona with short responses (2-4 sentences), one question at a time, and consistent emotional validation before any questions.

Cross-session tracking (Suivi inter-sessions)

SQLite-based longitudinal tracking records:

Session metadata (user, timestamps, max alert level, message count)
Alert events (level, confidence, trigger message)
Handoff events (step reached, user response)
Last activity timestamps (for withdrawal detection)

Users returning after >24 hours without closing their session trigger a withdrawal risk check and receive a welcome-back message.

8. Results and Metrics (Resultats et metriques)

Improvement trajectory (Trajectoire d'amelioration)

Stage	CV Accuracy	Features	Training Data	Adversarial Tests
Baseline (March 10)	66.7% +-26.4%	42 (7x6)	24 conversations	7/10
Data expansion	91.2% +-1.5%	48 (8x6)	320 conversations	9/10
+Negation +Embeddings	92.2% +-1.8%	52	320 conversations	9/10
+Identity +Somatization +Coherence	92.5% +-1.5%	67	320 conversations	13/13
Data expansion to 480	91.7% +-4.4%	67	480 conversations	13/13
Adversarial augmentation to 600	90.5% +-1.5%	67	600 conversations	30/30
Word-boundary fix (current)	90.0% +-1.6%	67	600 conversations	36/36

Current metrics (Metriques actuelles)

Metric	Value
Cross-validated accuracy (k=4, stratified)	90.0% +-1.6%
Feature count	67 (60 trajectory + 4 embedding + 3 coherence)
Training conversations	600 (480 standard + 120 adversarial)
Languages	2 (French + English, 300 each)
Adversarial tests	36/36 passing, 0 critical misses
Adversarial test categories	20
Sample:feature ratio	9.0:1 (target: 10:1)
Train accuracy	~100% (expected with 200 trees on 600 samples)

Top features by model importance (Features dominantes)

Rank	Feature	Importance	What it measures
1	`word_count_max`	0.192	Longest message in conversation
2	`word_count_slope`	0.179	Whether messages are getting shorter over time
3	`word_count_last`	0.138	Length of the most recent message
4	`length_delta_mean`	0.075	Average change in message length

The dominance of word-count-related features aligns with clinical research: progressive shortening of messages is one of the strongest behavioral signals of emotional withdrawal and disengagement. This is a signal that per-message classifiers cannot detect --- only trajectory analysis reveals it.

Key insight (Constat cle)

The accuracy dropped slightly from 92.5% to 90.0% as we added adversarial training data. This is expected and desirable: the model traded a small amount of overall accuracy for significantly better robustness on adversarial edge cases. The CV variance also stabilized dramatically (from +-26.4% to +-1.6%), indicating a much more reliable model.

9. User Experience (Experience utilisateur)

Bilingual interface (Interface bilingue)

The Streamlit application supports full French/English toggle, with all UI elements, system prompts, feature labels, and crisis resources available in both languages.

Key UX features (Fonctionnalites UX principales)

Feature	Description
Welcome card	Branded HTML card with brain emoji, bilingual title/description, call to action
Demo profiles	5 selectable profiles (team members + Guest) with distinct longitudinal trajectories: stable green, gradual improvement, fluctuating, escalating, and fresh start
Feature importance chart	Collapsible Plotly horizontal bar chart showing top 5 features by composite score (model importance x scaled value), with 6 color categories and bilingual labels
Feature radar chart	Plotly Scatterpolar showing 10 per-message features normalized 0-1, with current message in alert-level color and first message as green ghost overlay
Side-by-side compare mode	Split view: raw LLM response (no system prompt) vs CEDD-modulated response. Demonstrates the value of adaptive prompting on crisis messages
Demo autopilot	Auto-plays a 9-message scenario (Felix in FR, Alex in EN) showing Green-to-Yellow-to-Orange drift. Judges can observe the full trajectory hands-free
Alert transition animation	CSS-animated toast notification when alert level increases
Chat timestamps	HH:MM timestamps below each message bubble
LLM source badge	Colored badge on each assistant message showing which LLM generated it
Alert level badge	Colored dot on each assistant message showing the CEDD classification
Export transcript	Download conversation + alert history as JSON
Simulated counselor handoff	At Red level, interactive handoff to "Alex" (KHP counselor persona) with blue gradient UI, distinct avatar, and ASIST-trained responses
About panel	Collapsible bilingual explanation of CEDD's operation

Competitive UX audit findings (Resultats de l'audit UX concurrentiel)

Amanda Wu audited 6 platforms (ChatGPT, Gemini, Character.AI, Wysa, Woebot) across desirability, usability, and accessibility. No platform currently offers all of:

Canadian-specific crisis resources (Kids Help Phone) --- CEDD does
French-language crisis detection --- CEDD does
Subtle/coded distress detection --- CEDD's trajectory analysis does
Warm handoff to human responder --- CEDD does (5-step flow)
Cross-session memory --- CEDD does (SQLite longitudinal tracking)

10. Canadian Multicultural Context (Contexte multiculturel canadien)

Canada is the most diverse G7 country. Crisis detection trained only on Western English expressions will systematically fail the most vulnerable youth. CEDD addresses this with culturally informed feature engineering:

Cultural Group	Expression Pattern	CEDD Detection
Indigenous	Storytelling, substance references, holistic/spiritual framing	Trajectory features detect drift patterns regardless of vocabulary
South Asian	Somatization --- "my chest hurts" = emotional pain	`somatization_score` detects physical + emotional word co-occurrence
East Asian	Withdrawal, silence, minimizing ("I'm fine")	`short_response_ratio` + `question_response_ratio` catch disengagement
Francophone	French-language distress, Quebecois dialect	Native FR lexicons + bilingual embeddings
2SLGBTQ+	Identity conflict, coded rejection language	`identity_conflict_score` with dedicated phrase lists
Neurodivergent	Literal language, shutdown, emotional bursts	Temporal trajectory distinguishes from sustained crisis

Why this matters (Pourquoi c'est important)

With 46% of Kids Help Phone youth identifying as 2SLGBTQ+ and 10% identifying as Indigenous, culturally insensitive detection is not just a technical limitation --- it is a safety failure that disproportionately affects the most at-risk populations.

11. Competitive Analysis (Analyse concurrentielle)

CEDD vs EmoAgent (Princeton/Michigan, arXiv:2504.09689)

EmoAgent is the closest academic reference point. It uses a multi-agent architecture with GPT-4o for both detection and response.

Dimension	CEDD	EmoAgent
Detection speed	~0ms (feature extraction + ML)	Multiple GPT-4o calls per message
Detection cost	$0 (local computation)	~$0.04-0.10 per message
Explainability	Full (67 named features, composite scores, importance chart)	Black box (LLM reasoning)
Bilingual	Native FR + EN (lexicons, embeddings, prompts)	English only
Cross-session	SQLite longitudinal tracking + withdrawal detection	Per-conversation only
Cultural sensitivity	Somatization, identity conflict, coherence features	None
Clinical tools	4-level alerts + 6-gate safety logic + warm handoff	PHQ-9, PDI, PANSS (validated scales)
Availability	5-model LLM fallback chain + static emergency text	Single provider dependency

Our positioning: "EmoAgent needs 4 GPT-4o calls per message. CEDD detects in ~0ms with 67 features and GradientBoosting, and only calls the LLM to modulate the response. 5-model fallback chain ensures availability. It is the difference between an IDS that deep-inspects all traffic and a lightweight edge firewall --- and ours works in both French and English."

What EmoAgent does better (Ce qu'EmoAgent fait mieux)

EmoAgent uses clinically validated instruments (PHQ-9, PDI, PANSS) as evaluation metrics. CEDD's thresholds are engineering-derived, not clinically validated. Borrowing simplified validated scales as complementary metrics is a planned improvement.

12. Limitations and Future Work (Limites et travaux futurs)

Current limitations (Limites actuelles)

Limitation	Impact	Mitigation
No clinical validation	Alert thresholds are engineering-derived	6-gate safety logic provides deterministic safety floor
ML unreliable for short conversations (< 6 messages)	Cannot build trajectory from insufficient data	Gate 5 caps at Orange maximum
Sarcasm and periphrasis remain challenging	"je pese sur tout le monde" not reliably detected	Embeddings complement lexical features but do not fully solve this
Sample:feature ratio at 9.0:1 (ideal is 10:1)	Slight risk of overfitting	Adversarial augmentation improved from 7.2:1
Identity conflict detection is phrase-based	May miss coded or indirect identity distress	Embeddings provide partial coverage
Over-prediction on short/ambiguous green conversations	Physical-only or neurodivergent patterns may trigger Orange	Documented as acceptable (over-alerting preferred to under-alerting)
Train accuracy ~100%	Indicates model memorization	CV accuracy (90.0%) shows generalization is reasonable

Future work (Travaux futurs)

Improvement	Expected impact	Effort
LSTM/Transformer sequence model	+10-15% accuracy (learns message ordering natively)	3-4 hrs
Minimization detection	Cross-reference "I'm fine" with behavioral signals	1-2 hrs
Burst vs sustained temporal patterns	Better neurodivergent/ADHD handling	2-3 hrs
Clinical validation study	Validated thresholds from mental health professionals	Weeks
Intra-session timing analysis	Detect pauses and response delays as signals	1-2 hrs

13. Conclusion

CEDD demonstrates that effective youth mental health crisis detection does not require expensive multi-agent LLM architectures. By combining trajectory-based feature engineering (67 features across lexical, embedding, and behavioral layers), deterministic safety gates, and adaptive LLM response modulation, CEDD achieves:

Reliable detection: 90.0% +-1.6% cross-validated accuracy on 600 bilingual conversations
Adversarial robustness: 36/36 test cases passing across 20 attack categories, 0 critical misses
Explainability: every alert is traceable to specific named features and their scores
Cultural sensitivity: somatization, identity conflict, and withdrawal detection for Canada's diverse youth population
Operational resilience: 5-model LLM fallback chain with static emergency text guarantee
Humane crisis response: 5-step warm handoff that accompanies youth to help rather than abandoning them with a phone number

The improvement trajectory tells the story: from 66.7% accuracy with 42 features and 24 conversations, to 90.0% accuracy with 67 features and 600 conversations. From 7/10 adversarial tests to 36/36. Every iteration was guided by the principle that in a youth mental health application, missing a crisis is the one failure mode that is never acceptable.

CEDD is not a replacement for human counselors. It is the safety layer that ensures no youth in crisis falls through the cracks of an AI conversation --- and that when crisis is detected, the transition to human support is seamless, respectful, and accompanied.

14. References and Resources (References et ressources)

Emergency resources (Ressources d'urgence)

Kids Help Phone / Jeunesse, J'ecoute: 1-800-668-6868 (24/7, free, confidential) --- text: 686868
Suicide Crisis Helpline / Ligne de crise suicide: 9-8-8 (988.ca)
Multi-Ecoute: 514-378-3430 (multiecoute.org)
Tracom: 514-483-3033 (tracom.ca)
Emergency services / Services d'urgence: 911

Academic references (References academiques)

EmoAgent: Multi-Agent Framework for Emotional Support (Princeton/Michigan, arXiv:2504.09689)
ASIST (Applied Suicide Intervention Skills Training) --- LivingWorks
Kids Help Phone Annual Report 2024 --- Service statistics and demographics
JMIR (May 2025) --- Youth communication preferences in mental health contexts

Technical references (References techniques)

paraphrase-multilingual-MiniLM-L12-v2 --- Sentence-Transformers (Reimers & Gurevych, 2019)
GradientBoostingClassifier --- scikit-learn (Pedregosa et al., 2011)
Streamlit --- open-source app framework for ML

Report prepared by Team 404HarmNotFound for the Mila x Bell x Kids Help Phone Hackathon, March 2026. All training data is 100% synthetic --- no real PII was used at any stage.

FilesExpand file tree

report.md

Latest commit

History