Ethics — Fairness, Consent, Scientific Integrity

WELLab AI-Enabled Research & Impact Platform Washington University in St. Louis

This document defines the ethical framework governing the WELLab platform, including IRB compliance, informed consent for AI-driven insights, cross-cultural fairness auditing, reproducibility standards, data safeguards, participant rights, and model transparency.

IRB Compliance Framework
Informed Consent for AI-Driven Insights
Cross-Cultural Fairness Auditing
Reproducibility Standards
Individual vs Population Data Safeguards
Participant Data Rights
Model Transparency and Confidence Interval Reporting

1. IRB Compliance Framework

Governing Principles

All data collection, processing, and analysis on the WELLab platform operates under Institutional Review Board (IRB) approval from Washington University in St. Louis. The platform is designed to make IRB compliance structural — built into the system architecture rather than relying on individual researcher adherence.

IRB Protocol Requirements

Requirement	Implementation
Active protocol number	Stored in every Participant record (`consent.irb_protocol_id`). API rejects data submission without valid protocol.
Protocol expiration tracking	Automated alert 90 days before protocol expiration. Data collection halted if protocol lapses.
Amendments	Protocol amendments are version-tracked. System logs which consent version each participant signed.
Continuing review	Annual review checklists generated automatically from platform data (enrollment counts, adverse events, protocol deviations).
Adverse event reporting	Dedicated endpoint for logging adverse events. Automated notification to PI within 24 hours.
Protocol deviation logging	All deviations (e.g., data accessed outside protocol scope) are logged and flagged for PI review.

Data Collection Boundaries

The platform will not collect data types not specified in the active IRB protocol.
API validation enforces field-level restrictions — if a data field is not in the approved protocol, the API rejects it.
Cross-cultural datasets imported from external sources (HRS, SHARE, ELSA) must have separate data use agreements on file. The system tracks which datasets are authorized for which analyses.

Human Subjects Protections

Vulnerable populations: If the study enrolls participants with cognitive impairment, the platform supports proxy consent workflows and simplified UI modes.
Withdrawal: Participants can withdraw at any time via the app (Settings > Consent > Withdraw). Withdrawal triggers a cascade that halts data collection, flags existing data per protocol (retain de-identified or delete), and sends confirmation.
Compensation tracking: If participants are compensated, the platform tracks EMA completion milestones and generates compensation reports (without linking to financial systems).

2. Informed Consent for AI-Driven Insights

Consent Architecture

AI-generated insights represent a novel element of research participation that requires specific, informed consent. The platform separates AI consent from general study consent.

AI-Specific Consent Elements

Participants are informed about and consent to each of the following:

Element	Plain Language Description Provided to Participant
AI-generated summaries	"We use AI to create brief written summaries of your wellbeing patterns. These summaries are designed to be encouraging and informative, not diagnostic."
Coupling classification	"Our system identifies patterns in how your daily emotions relate to your overall satisfaction. This is a research classification, not a clinical assessment."
Risk scoring	"For participants in our cognitive health study, we compute a research-grade risk score. This score is for research purposes only and is not a medical diagnosis or prediction."
Data used for AI	"Your EMA responses, survey answers, and (if applicable) health information are processed by our AI system. No data is shared with external AI companies — all processing happens within our secure research infrastructure."
Claude API usage	"We use a commercial AI service (Anthropic Claude) to generate written summaries. Only aggregated, non-identifying data patterns are sent to this service. Your name, ID, and personal details are never included."
Right to opt out	"You can turn off AI-generated insights at any time in your app settings. This does not affect your participation in the study or your access to your own data."

Consent Granularity

The consent.ai_insights_opt_in field is a boolean, but the consent form provides granular detail. The platform also supports per-feature opt-outs:

{
  "ai_insights_opt_in": true,
  "ai_preferences": {
    "show_trend_summaries": true,
    "show_strength_badges": true,
    "show_trajectory_info": false,
    "allow_claude_api_processing": true
  }
}

Re-Consent Triggers

The platform triggers re-consent when:

A new AI capability is added that was not described in the original consent.
The AI model changes in a way that materially affects output (e.g., switching from Claude Sonnet to a different model family).
The data types used for AI processing expand beyond original scope.
IRB protocol amendment requires updated consent language.

Re-consent is delivered via in-app notification with the updated consent form. Data collection continues under the original consent terms until the participant re-consents or opts out.

3. Cross-Cultural Fairness Auditing

Fairness Framework

The platform audits all AI models for demographic fairness before deployment and on a monthly cadence. The goal is to ensure that model performance and outputs do not systematically disadvantage any demographic group.

Protected Attributes

Attribute	Categories Audited
Age	18-34, 35-49, 50-64, 65-79, 80+
Sex	Male, Female, Intersex
Race/Ethnicity	White, Black/African American, Hispanic/Latino, Asian, American Indian/Alaska Native, Native Hawaiian/Pacific Islander, Multiracial, Other
Education	Less than HS, HS diploma, Some college, Bachelor's, Graduate/Professional
Country	All countries represented in dataset
Income bracket	Quintiles Q1-Q5

Fairness Metrics

Demographic Parity

For classification outputs (e.g., coupling type, risk category):

Demographic Parity Ratio = P(positive outcome | group A) / P(positive outcome | group B)

Acceptable range: 0.80 - 1.25 (80% rule)

The platform computes this for every protected attribute × every classification output.

Disparate Impact

Disparate Impact Ratio = (Selection rate for protected group) / (Selection rate for reference group)

Threshold: > 0.80 (per EEOC guidelines, adapted for research context)

Equalized Odds

For predictive models (e.g., ADRD risk):

True Positive Rate difference across groups < 0.05
False Positive Rate difference across groups < 0.05

Calibration

Risk scores should be equally calibrated across groups:

For each risk decile, observed event rate should be similar across demographic groups.
Calibration slope per group: acceptable range 0.85 - 1.15

Fairness Audit Pipeline

The scripts/fairness_audit.py script implements the following pipeline:

1. Load model outputs and demographic data
2. For each protected attribute:
   a. Compute demographic parity ratio
   b. Compute disparate impact ratio
   c. Compute equalized odds (for predictive models)
   d. Compute calibration by group (for risk scores)
3. Generate report:
   a. Pass/fail for each metric × attribute combination
   b. Visualizations (grouped bar charts, calibration plots)
   c. Recommendations for remediation
4. Gate deployment:
   a. If any critical metric fails → block deployment, notify PI
   b. If any warning metric fails → flag for review, allow deployment with PI approval

Audit Schedule

Trigger	Scope
Pre-deployment (CI/CD)	All classification and prediction outputs
Monthly (scheduled)	Full audit across all models and demographic groups
On-demand (researcher request)	Specific model or subgroup analysis
Post-data-import	Fairness check on newly imported cross-cultural datasets

Remediation Actions

When a fairness metric fails:

Investigate: Determine whether the disparity reflects model bias or genuine population differences.
Document: Log the finding, investigation, and decision in the fairness audit trail.
Remediate (if model bias):
- Re-weight training data
- Add demographic covariates
- Apply post-processing calibration
- Re-audit after remediation
Accept (if genuine population difference):
- Document scientific justification
- Add contextual note to model outputs
- PI sign-off required

4. Reproducibility Standards

Principle

Every AI pipeline output on the WELLab platform must be fully reproducible. Given the same input data and code version, the system must produce identical results.

Implementation

Pinned Dependencies

All dependencies are version-pinned with lock files:

Language	Lock File	Tool
Python	`requirements.txt` + `requirements-lock.txt`	pip-compile
Node.js	`package-lock.json`	npm
CDK	`package-lock.json`	npm
R (via rpy2)	`renv.lock`	renv

No floating version ranges (e.g., ^1.0.0) in production dependencies. All versions are exact (e.g., ==1.0.0).

Deterministic Seeds

All stochastic operations use explicit random seeds:

# Every ML pipeline sets seeds at the top
import random
import numpy as np
import torch

RANDOM_SEED = 42  # Configurable per pipeline run

random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed_all(RANDOM_SEED)

# For scikit-learn
from sklearn.cluster import KMeans
km = KMeans(n_clusters=4, random_state=RANDOM_SEED)

The random seed is stored in the model artifact metadata and logged with every output.

Version-Controlled Pipelines

Every model output includes:
{
  "model_version": "idels-coupling-v2.1.0",     # Semantic version
  "code_commit": "abc123def456",                  # Git commit hash
  "pipeline_run_id": "run-20260405-001",          # Unique run ID
  "random_seed": 42,                              # Seed used
  "input_data_hash": "sha256:abc123...",          # Hash of input data
  "dependency_hash": "sha256:def456...",          # Hash of lock file
  "training_started_at": "2026-04-05T02:00:00Z",
  "training_completed_at": "2026-04-05T02:45:00Z"
}

Model Registry

All model artifacts are stored in S3 with the following structure:

s3://wellab-models-{env}/
├── emotional-dynamics/
│   ├── idels-coupling-v2.1.0/
│   │   ├── model.pkl
│   │   ├── metadata.json
│   │   ├── training_data_manifest.json
│   │   ├── fairness_audit.json
│   │   └── validation_results.json
│   └── temporal-dynamics-v1.2.0/
│       └── ...
├── health-engine/
│   └── ...
├── lifespan-trajectory/
│   └── ...
└── cognitive-health/
    └── ...

Reproducibility Verification

Monthly automated jobs re-run a subset of analyses with stored seeds and verify output matches. Discrepancies trigger an alert and investigation.

5. Individual vs Population Data Safeguards

Principle

Individual-level data (participant-identifiable) and population-level data (aggregated, de-identified) are treated as fundamentally different and subject to different access rules, display rules, and storage rules.

Access Rules

Data Level	Who Can Access	How
Individual (own data)	The participant themselves	Participant Experience UI (authenticated, own data only)
Individual (research)	Authorized researchers on the IRB protocol	Researcher Dashboard (authenticated, audit-logged)
Individual (admin)	Platform administrators	Admin tools (authenticated, audit-logged, justification required)
Population (research)	Authorized researchers	Researcher Dashboard (aggregated views, no export of individual records)
Population (policy)	Policy viewers	Policy Dashboard (k-anonymized, no individual access possible)

k-Anonymity Enforcement

All population-level displays and exports enforce k-anonymity with k >= 10:

Any demographic cell with fewer than 10 individuals is suppressed (displayed as "< 10" or omitted).
Geographic aggregations are rolled up to the next level if a region has fewer than 10 participants.
Cross-tabulations that would create small cells are automatically simplified.

Implementation:

def enforce_k_anonymity(data: pd.DataFrame, group_cols: list, k: int = 10) -> pd.DataFrame:
    """Suppress groups with fewer than k individuals."""
    group_sizes = data.groupby(group_cols).size()
    valid_groups = group_sizes[group_sizes >= k].index
    return data[data.set_index(group_cols).index.isin(valid_groups)]

Display Rules

Rule	Participant UI	Researcher Dashboard	Policy Dashboard
Show individual scores	Own only	Yes (de-identified)	Never
Show individual trajectories	Own only (simplified)	Yes	Never
Show risk scores	Never (strength-framed narrative only)	Yes	Aggregated distribution only
Show comparisons to others	Never	Yes (group-level)	Yes (population-level)
Allow data export	Own data only	Aggregated datasets, with DUA	k-anonymized aggregates

Linkage Prevention

Participant IDs are pseudonymized differently for each dashboard.
The Policy Dashboard has no access to participant IDs whatsoever — it receives only pre-aggregated data from a dedicated Lambda function that enforces k-anonymity before returning results.
Researcher Dashboard shows study IDs, never real names or contact information.

6. Participant Data Rights

Rights Overview

Participants have the following rights over their data, accessible via the Participant Experience UI (Settings > My Data) or by contacting the research team:

Right	Description	Implementation
View	See all data the platform holds about them	"My Data" screen: browsable, searchable view of all observations, assessments, computed metrics, and AI-generated insights
Export	Download a complete copy of their data	"Export My Data" button → generates JSON + CSV archive → download link (24-hour expiry)
Correct	Request correction of inaccurate demographic data	"Edit Profile" for demographics. Observational data is immutable (corrections noted as amendments).
Delete	Request deletion of all personal data	"Delete My Data" button → confirmation flow → cascading deletion across DynamoDB + S3 + backups
Restrict	Limit processing of their data	Opt-out of specific AI features while remaining in the study
Object	Object to specific processing	Request review of how their data is being used; escalated to PI
Withdraw	Withdraw from the study entirely	"Withdraw" button → halts all data collection, triggers deletion or de-identification per protocol

Deletion Process

When a participant requests data deletion:

1. Participant clicks "Delete My Data" → confirmation screen explaining consequences
2. Participant confirms with re-authentication (password or biometric)
3. System sets participant status to "deletion_pending"
4. Immediate: Halt all data collection and AI processing for this participant
5. Within 24 hours:
   a. Delete all DynamoDB items with PK = PARTICIPANT#{id}
   b. Delete all S3 objects tagged with participant_id
   c. Remove participant from any cached model inputs
   d. Purge from CloudWatch logs (where technically feasible)
6. Within 7 days:
   a. Verify deletion across all storage tiers
   b. Send confirmation to participant
   c. Log deletion event in audit trail (without personal data)
7. Exception: De-identified data already included in published analyses
   may be retained per IRB protocol — participant is informed of this
   in the consent form.

Export Format

The data export package includes:

wellab-export-P-20250401-0042-20260405/
├── participant_profile.json
├── observations/
│   ├── observations.csv
│   └── observations.json
├── health_records/
│   ├── health_records.csv
│   └── health_records.json
├── lifespan_assessments/
│   ├── assessments.csv
│   └── assessments.json
├── cognitive_assessments/
│   ├── assessments.csv
│   └── assessments.json
├── interventions/
│   ├── interventions.csv
│   └── interventions.json
├── computed_metrics/
│   ├── coupling_classification.json
│   ├── volatility_scores.json
│   ├── trajectory_parameters.json
│   └── risk_scores.json
├── ai_insights/
│   └── generated_insights.json
└── export_metadata.json

7. Model Transparency and Confidence Interval Reporting

Transparency Principles

Every AI model output on the WELLab platform must be accompanied by sufficient information for a researcher to understand what the model did, how confident it is, and what its limitations are.

Required Metadata for Every Model Output

Field	Description	Example
`model_version`	Semantic version of the model	`idels-coupling-v2.1.0`
`model_type`	Human-readable model description	`Gaussian Mixture Model for coupling classification`
`n_observations`	Number of data points used	`284`
`confidence_interval`	95% CI for primary estimate	`[0.36, 0.48]`
`confidence_level`	Qualitative confidence label	`high` (>0.85 posterior), `moderate` (0.70-0.85), `low` (<0.70)
`assumptions`	Key model assumptions	`Linearity, normally distributed residuals, stationarity`
`limitations`	Known limitations for this output	`Small sample size for this demographic subgroup`
`training_data_description`	Description of training data	`N=2,450 adults aged 40-85, 65% female, 78% White, US sample`
`generalizability_note`	Who this model does and does not generalize to	`Validated on US and European samples. Not validated for East Asian populations.`
`last_fairness_audit`	Date of most recent fairness audit	`2026-04-01`

Confidence Interval Standards

Output Type	CI Method	Reporting
Regression coefficients	Profile likelihood or Wald	95% CI reported with every coefficient
Coupling classification	Posterior probability	Posterior probability for each class; minimum 0.70 for assignment
Risk scores	Bootstrap (1000 iterations)	95% CI around composite risk score
Survival curves	Greenwood's formula	95% confidence bands on all survival curves
Causal effects (ATE)	Bootstrap or influence function	95% CI + multiple estimator comparison
Trajectory cluster prevalence	Bootstrap	95% CI on prevalence estimates
Cross-cultural comparisons	Measurement invariance tests	Metric invariance required before comparison; reported with fit indices

Researcher-Facing Transparency

The Researcher Dashboard includes a "Model Card" panel for every visualization:

┌─────────────────────────────────────────────────┐
│ Model Card: IDELS Coupling Classification       │
│                                                 │
│ Version: idels-coupling-v2.1.0                  │
│ Type: Gaussian Mixture Model (4 components)     │
│ Training data: N=2,450 (US, 2023-2025)         │
│ Fairness audit: Passed (2026-04-01)             │
│ Assumptions: Linear within-person associations, │
│   normally distributed random effects            │
│ Limitations: Not validated for samples with     │
│   < 20 observations per participant              │
│ Reproduce: [Copy Parameters] [Download Artifact]│
└─────────────────────────────────────────────────┘

Participant-Facing Transparency

Participants see simplified transparency information:

Insights include a "How we calculated this" expandable section.
Risk-related information is never presented as certainty — always framed as "research estimates" with context.
The word "prediction" is avoided in participant-facing text. Instead: "pattern", "tendency", "association".

Policy-Facing Transparency

Policy dashboard outputs include:

Methodology summary (2-3 sentences, plain language).
Uncertainty ranges displayed as error bars or shaded confidence bands.
Sample size and population description for every statistic.
"What this does NOT tell us" section for key findings.
Citation to peer-reviewed methodology papers where applicable.

Model Deprecation

When a model version is superseded:

New version is deployed alongside old version for a validation period (30 days).
Outputs from both versions are compared for consistency.
Old version is marked as deprecated and removed from production endpoints.
Historical outputs retain their original model_version tag — they are never retroactively relabeled.
Deprecation is logged and communicated to researchers via dashboard notification.

FilesExpand file tree

ethics.md

Latest commit

History

ethics.md

File metadata and controls

Ethics — Fairness, Consent, Scientific Integrity

Table of Contents

1. IRB Compliance Framework

Governing Principles

IRB Protocol Requirements

Data Collection Boundaries

Human Subjects Protections

2. Informed Consent for AI-Driven Insights

Consent Architecture

AI-Specific Consent Elements

Consent Granularity

Re-Consent Triggers

3. Cross-Cultural Fairness Auditing

Fairness Framework

Protected Attributes

Fairness Metrics

Demographic Parity

Disparate Impact

Equalized Odds

Calibration

Fairness Audit Pipeline

Audit Schedule

Remediation Actions

4. Reproducibility Standards

Principle

Implementation

Pinned Dependencies

Deterministic Seeds

Version-Controlled Pipelines

Model Registry

Reproducibility Verification

5. Individual vs Population Data Safeguards

Principle

Access Rules

k-Anonymity Enforcement

Display Rules

Linkage Prevention

6. Participant Data Rights

Rights Overview

Deletion Process

Export Format

7. Model Transparency and Confidence Interval Reporting

Transparency Principles

Required Metadata for Every Model Output

Confidence Interval Standards

Researcher-Facing Transparency

Participant-Facing Transparency

Policy-Facing Transparency

Model Deprecation