Add TupleTimeTextProcessor for Temporal Text Data #829

Open
Rian354 wants to merge 27 commits into sunlabuiuc:master from Rian354:master
@Rian354 Rian354 commented Feb 8, 2026

Add TupleTimeTextProcessor for Temporal Text Data

Summary/TLDR

Adds TupleTimeTextProcessor, a processor for clinical text paired with temporal information (time differences). The processor attaches a type tag to each sample so that multimodal pipelines can route it to the matching encoder.

Key Features

  • Temporal text handling: Processes (List[str], List[float]) tuples
  • Automatic modality routing: type_tag enables automatic encoder selection
  • Clean interface: Returns (texts, time_tensor, modality_tag)
  • Registered processor: Available via string key "tuple_time_text"
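
To make the routing idea concrete, here is a minimal sketch of how a downstream pipeline might dispatch on the returned tag. It is not part of this PR: the `ENCODERS` table, the `route` helper, and both encoder functions are hypothetical, and a plain list stands in for the time tensor so the sketch has no torch dependency.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in encoders; a real pipeline would call model modules here.
def encode_note(texts: List[str], times: List[float]) -> str:
    return f"note-encoder({len(texts)} texts)"


def encode_report(texts: List[str], times: List[float]) -> str:
    return f"report-encoder({len(texts)} texts)"


# Hypothetical routing table: type tag -> encoder callable.
ENCODERS: Dict[str, Callable[[List[str], List[float]], str]] = {
    "note": encode_note,
    "radiology_report": encode_report,
}


def route(sample: Tuple[List[str], List[float], str]) -> str:
    """Pick an encoder from the sample's tag and apply it."""
    texts, times, tag = sample
    try:
        encoder = ENCODERS[tag]
    except KeyError:
        raise ValueError(f"no encoder registered for type tag {tag!r}") from None
    return encoder(texts, times)


print(route((["Admission note", "Progress note"], [0.0, 24.0], "note")))
```

Failing loudly on an unknown tag is one possible design choice; a pipeline could equally fall back to a default encoder.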

I/O

Input:  Tuple[List[str], List[float]]
        - List[str]: Clinical text entries
        - List[float]: Time differences between entries

Output: Tuple[List[str], torch.Tensor, str]
        - List[str]: Same text entries (unmodified)
        - torch.Tensor: 1D float tensor of time differences
        - str: Type tag for modality routing (default: "note")
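
The input contract above implies that the two lists line up one-to-one. Whether the processor enforces this itself is not stated in this PR, so the following is only a sketch of a caller-side pre-check; `check_time_text_pair` is a hypothetical helper, not a PyHealth function.

```python
from typing import Any, List, Tuple


def check_time_text_pair(value: Any) -> Tuple[List[str], List[float]]:
    """Validate a (texts, time_diffs) tuple and normalise the times to float."""
    if not isinstance(value, tuple) or len(value) != 2:
        raise TypeError("expected a (texts, time_diffs) tuple")
    texts, time_diffs = value
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every text entry must be a str")
    if len(texts) != len(time_diffs):
        raise ValueError(
            f"got {len(texts)} texts but {len(time_diffs)} time differences"
        )
    # Integers such as 24 are accepted and converted to 24.0.
    return list(texts), [float(d) for d in time_diffs]


texts, time_diffs = check_time_text_pair((["Admission note", "Progress note"], [0, 24]))
print(texts, time_diffs)
```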

Files Added

  • pyhealth/processors/tuple_time_text_processor.py (107 loc)
  • tests/test_tuple_time_text_processor.py (1 test)
  • docs/api/processors/pyhealth.processors.TupleTimeTextProcessor.rst (docs)

Files Modified

  • pyhealth/processors/__init__.py (export added)
  • docs/api/processors.rst (documentation index)
  • examples/text_embedding_tutorial.ipynb (added example section)

Example Usage

from pyhealth.processors import TupleTimeTextProcessor

# Initialize processor with modality tag
processor = TupleTimeTextProcessor(type_tag="clinical_note")

# Process temporal text data
texts = ["Admission note", "Progress note", "Discharge summary"]
time_diffs = [0.0, 24.0, 72.0]  # hours since admission

texts_out, time_tensor, tag = processor.process((texts, time_diffs))
# texts_out: ["Admission note", "Progress note", "Discharge summary"]
# time_tensor: tensor([0., 24., 72.])
# tag: "clinical_note"
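
The example above starts from time differences that are already computed. As a sketch of one way to derive them, assuming notes arrive as (timestamp, text) pairs and that times are measured in hours from the earliest note, a small helper could look like this. `hours_since_first` is a hypothetical name and is not part of this PR.

```python
from datetime import datetime
from typing import List, Tuple


def hours_since_first(
    notes: List[Tuple[datetime, str]],
) -> Tuple[List[str], List[float]]:
    """Sort notes by timestamp and return (texts, hours since the first note)."""
    if not notes:
        return [], []
    ordered = sorted(notes, key=lambda note: note[0])
    start = ordered[0][0]
    texts = [text for _, text in ordered]
    time_diffs = [(ts - start).total_seconds() / 3600.0 for ts, _ in ordered]
    return texts, time_diffs


notes = [
    (datetime(2025, 1, 2, 8, 0), "Progress note"),
    (datetime(2025, 1, 1, 8, 0), "Admission note"),
    (datetime(2025, 1, 4, 8, 0), "Discharge summary"),
]
texts, time_diffs = hours_since_first(notes)
# texts: ["Admission note", "Progress note", "Discharge summary"]
# time_diffs: [0.0, 24.0, 72.0]
```

The resulting tuple `(texts, time_diffs)` has the shape that `processor.process` takes in the example above.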

Testing

pytest tests/test_tuple_time_text_processor.py -v
# 1 passed

Rian354 and others added 27 commits December 8, 2025 03:08
- Created TupleTimeTextProcessor for (text, time_diff) tuples
- Handles temporal clinical text with automatic modality routing
- Comprehensive test suite (16 tests, all passing)
- Added processor documentation in docs/api/processors/
- Updated tutorial notebook with multimodal fusion examples
- Registered processor with 'tuple_time_text' string key