Merged

TpT #80

docs/API/estimators/trees/tpt_decision_tree_classifier.md (32 additions, 0 deletions)
@@ -0,0 +1,32 @@
# Time-penalised Trees Decision Tree Classifier

??? tip "What is the Time-penalised Trees (TpT) Decision Tree Classifier?"
The `TpTDecisionTreeClassifier` is a longitudinal-aware decision tree that extends the standard CART algorithm with
a **time-penalised split gain**. At a parent node observed at time $t_p$, the information gain $\Delta I$ of a
candidate split evaluated at time $t_c$ is scaled by an exponential penalty, giving the penalised gain
$\Delta I \, e^{-\gamma\,(t_c - t_p)}$. The larger $\gamma$ is, the more strongly later splits are penalised. The
splitter therefore prefers earlier waves unless later observations carry a substantially stronger signal, which
yields trees that are sparse in time and easy to interpret.
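
To make the penalty concrete, here is a minimal sketch of how candidate splits could be scored with the penalised gain and compared. The helpers `penalised_gain` and `best_split`, the feature names, and the gains are hypothetical. This is not the `TpTDecisionTreeClassifier` splitter, which must also enumerate thresholds and compute $\Delta I$ itself. We also assume $t_c \ge t_p$, so the penalty factor lies in $(0, 1]$ for $\gamma \ge 0$.

```python
import math


def penalised_gain(delta_i: float, t_parent: float, t_candidate: float, gamma: float) -> float:
    """Scale an information gain by the time penalty exp(-gamma * (t_c - t_p))."""
    # Assumption: t_candidate >= t_parent, so the factor is in (0, 1] for gamma >= 0.
    return delta_i * math.exp(-gamma * (t_candidate - t_parent))


def best_split(candidates, t_parent: float, gamma: float) -> str:
    """Return the name of the candidate with the highest penalised gain.

    `candidates` is an iterable of (name, delta_i, t_candidate) tuples.
    """
    name, _, _ = max(
        candidates,
        key=lambda c: penalised_gain(c[1], t_parent, c[2], gamma),
    )
    return name


# Hypothetical candidates at a parent node observed at wave 0.
candidates = [
    ("bmi_w0", 0.10, 0),
    ("bmi_w1", 0.15, 1),
    ("bmi_w2", 0.30, 2),
]

for gamma in (0.0, 0.5, 1.0):
    print(gamma, best_split(candidates, t_parent=0, gamma=gamma))
```

With these toy numbers, $\gamma = 0$ and $\gamma = 0.5$ both pick `bmi_w2`, whose raw gain is strong enough to survive the penalty. At $\gamma = 1$ the splitter falls back to the earlier `bmi_w0`.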

TpT accepts both wide longitudinal matrices (paired with `features_group`) and LONG-format DataFrames (one row per
`(subject, time)` observation). To use the LONG format, set `assume_long_format=True` and provide `id_col`,
`time_col`, and `duration_col`.
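
The two layouts are easier to compare side by side. The sketch below pivots a few LONG-format records into a wide matrix and builds a matching `features_group`. It assumes the usual Sklong convention that `features_group` lists each feature's column indices in wave order. The helper `long_to_wide`, the `_w{t}` column naming, and the data are hypothetical. This is not Sklong's internal LONG→wide converter, and it does not model `duration_col`, whose role is not described on this page.

```python
from collections import defaultdict


def long_to_wide(rows, id_col, time_col, features):
    """Pivot LONG records (one per subject and time) into a wide matrix.

    Returns (subjects, columns, X, features_group). Missing waves become None.
    """
    waves = sorted({row[time_col] for row in rows})
    columns = [f"{f}_w{t}" for f in features for t in waves]

    # Collect each subject's observations keyed by wide column name.
    by_subject = defaultdict(dict)
    for row in rows:
        for f in features:
            by_subject[row[id_col]][f"{f}_w{row[time_col]}"] = row[f]

    subjects = sorted(by_subject)
    X = [[by_subject[s].get(c) for c in columns] for s in subjects]

    # features_group: one list per feature, holding its column indices in wave order.
    n_waves = len(waves)
    features_group = [
        [k * n_waves + i for i in range(n_waves)] for k in range(len(features))
    ]
    return subjects, columns, X, features_group


# Hypothetical LONG-format records: one row per (subject, time) observation.
long_rows = [
    {"id": 1, "time": 0, "bmi": 24.0, "smoker": 1},
    {"id": 1, "time": 1, "bmi": 25.5, "smoker": 1},
    {"id": 2, "time": 0, "bmi": 30.1, "smoker": 0},
    {"id": 2, "time": 1, "bmi": 29.4, "smoker": 0},
]

subjects, columns, X, features_group = long_to_wide(long_rows, "id", "time", ["bmi", "smoker"])
print(columns)         # ['bmi_w0', 'bmi_w1', 'smoker_w0', 'smoker_w1']
print(features_group)  # [[0, 1], [2, 3]]
```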

We highly recommend reading the `Temporal Dependency` page before exploring the TpT API.

[See The Temporal Dependency Guide ](../../../tutorials/temporal_dependency.md){ .md-button }

## ::: scikit_longitudinal.estimators.trees.TpT.TpT_decision_tree.TpTDecisionTreeClassifier
options:
heading: "TpTDecisionTreeClassifier"
members:
- fit
- predict
- predict_proba

!!! note "Where do `predict` and `predict_proba` come from?"
`TpTDecisionTreeClassifier` builds on scikit-learn's `DecisionTreeClassifier`. It overrides both methods only to
handle the optional LONG→wide conversion; otherwise, the standard scikit-learn behaviour applies.
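
The "override only to reshape, then delegate" design can be illustrated with a toy class hierarchy. `ToyTree` is a hypothetical stand-in for a fitted scikit-learn tree. The `to_wide` callable is a hypothetical placeholder for the LONG→wide step. Neither reflects the actual `TpTDecisionTreeClassifier` source.

```python
class ToyTree:
    """Hypothetical stand-in for a fitted scikit-learn tree."""

    def predict(self, X):
        # Toy decision rule on the first wide column.
        return [1 if row[0] > 25.0 else 0 for row in X]


class LongAwareToyTree(ToyTree):
    """Overrides predict only to reshape LONG input, then delegates to the parent."""

    def __init__(self, assume_long_format=False, to_wide=None):
        self.assume_long_format = assume_long_format
        self.to_wide = to_wide  # hypothetical callable: LONG rows -> wide matrix

    def predict(self, X):
        if self.assume_long_format:
            X = self.to_wide(X)
        # Everything else is the parent's unchanged behaviour.
        return super().predict(X)


print(LongAwareToyTree().predict([[30.0], [20.0]]))  # [1, 0]
```

Keeping the override this thin means wide input follows exactly the parent's code path.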

!!! warning "`gamma` vs. `threshold_gain`"
`threshold_gain` is kept as a backward-compatible alias for `gamma` (both control the time-penalty rate
$\gamma$). Prefer the explicit `gamma` keyword in new code; if both are provided, `gamma` takes precedence.
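
The precedence rule can be sketched as a small resolver. The helper `resolve_gamma` is hypothetical, and so is the use of `None` as the "not provided" sentinel. The library's actual default value for $\gamma$ is not stated here, so the sketch takes it as a `default` argument.

```python
def resolve_gamma(gamma=None, threshold_gain=None, default=None):
    """Return the effective time-penalty rate.

    `gamma` takes precedence; `threshold_gain` is only used as a fallback alias.
    """
    if gamma is not None:
        return gamma
    if threshold_gain is not None:
        return threshold_gain
    return default


print(resolve_gamma(gamma=0.5, threshold_gain=2.0))  # 0.5
print(resolve_gamma(threshold_gain=2.0))             # 2.0
```
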
docs/publications.md (8 additions, 7 deletions)
@@ -61,20 +61,21 @@ This section keeps track of the research papers behind the implemented primitive
??? abstract "Abstract"
We propose a new variant of the Correlation-based Feature Selection (CFS) method for coping with longitudinal data - where variables are repeatedly measured across different time points. The proposed CFS variant is evaluated on ten datasets created using data from the English Longitudinal Study of Ageing (ELSA), with different age-related diseases used as the class variables to be predicted. The results show that, overall, the proposed CFS variant leads to better predictive performance than the standard CFS and the baseline approach of no feature selection, when using Naïve Bayes and J48 decision tree induction as classification algorithms (although the difference in performance is very small in the results for J4.8). We also report the most relevant features selected by J48 across the datasets.

### In Active Development

???+ warning "Time-penalised Trees (TpT)"
**Paper:** *Time-penalised trees (TpT): introducing a new tree-based data mining algorithm for time-varying covariates*. [Read the paper (DOI)](https://doi.org/10.1007/s10472-024-09950-w)
???+ note "Time-penalised Trees (TpT)"
**Papers:**

**Status:** Integration planned for a future `Sklong` release.
- Valla, M. (2024). *Time-penalised trees (TpT): introducing a new tree-based data mining algorithm for time-varying covariates*. *Annals of Mathematics and Artificial Intelligence* 92, 1609–1661. [Read the paper (DOI)](https://doi.org/10.1007/s10472-024-09950-w)
- Valla, M., Milhaud, X. (2026). *Consistent Time-Aware Trees for Longitudinal Data: The Time-Penalized Tree*. ⟨hal-05022929v2⟩. [Read the preprint](https://cnrs.hal.science/hal-05022929)

**Discussion:** [Issue #65](https://github.com/simonprovost/scikit-longitudinal/issues/65)
**API Reference:** [TpT Decision Tree Classifier](API/estimators/trees/tpt_decision_tree_classifier.md)

**Credits:** Original author: Mathias Valla. Implementation: under active development by [Mathias Valla](https://github.com/MathiasValla).
**Credits:** Original author: Mathias Valla. Implementation: [Mathias Valla](https://github.com/MathiasValla), Esteban Mauboussin, Alae Khidour, Berkehan Kocak, and Sonny Mupfuni, with the `Sklong` team.

??? abstract "Abstract"
This article introduces a new decision tree algorithm that accounts for time-varying covariates in the decision-making process. Traditional decision tree algorithms assume that the covariates are static and do not change over time, which can lead to inaccurate predictions in dynamic environments. Other existing methods suggest workaround solutions such as the pseudo-subject approach. The proposed algorithm utilises a different structure and a time-penalised splitting criterion that allows a recursive partitioning of both the covariates space and time. Relevant historical trends are then inherently involved in the construction of a tree, and are visible and interpretable once it is fit. This approach allows for innovative and highly interpretable analysis in settings where the covariates are subject to change over time. The effectiveness of the algorithm is demonstrated through a real-world data application in life insurance. The results presented in this article can be seen as an introduction or proof-of-concept of the time-penalised approach, and the algorithm’s theoretical properties and comparison against existing approaches on datasets from various fields will be explored in forthcoming work.

### In Active Development

???+ warning "Clustering-based KNN Regression for Longitudinal Data (CKNNRLD)"
**Paper:** *Boosting K-nearest neighbor regression performance for longitudinal data through a novel learning approach*. [Read the paper (DOI)](https://doi.org/10.1186/s12859-025-06205-1)

pyproject.toml (1 addition, 0 deletions)
@@ -34,6 +34,7 @@ dependencies = [
"deep-forest-py310>=0.1.9",
"scikit-lexicographical-trees==0.0.5",
"pytest==9.0.1",
"setuptools<81",
]

[dependency-groups]
scikit_longitudinal/discovery.py (4 additions, 0 deletions)
@@ -110,6 +110,8 @@ def is_abstract(c):
"LexicoDecisionTreeClassifier",
"LexicoDecisionTreeRegressor",
"LexicoDeepForestClassifier",
"TpTDecisionTreeClassifier",
"TpTDecisionTreeRegressor",
]
)
)
@@ -147,6 +149,8 @@ def is_abstract(c):
"LexicoDecisionTreeClassifier",
"LexicoDecisionTreeRegressor",
"LexicoDeepForestClassifier",
"TpTDecisionTreeClassifier",
"TpTDecisionTreeRegressor",
]
)
)