added trainFraction to the initialization of the splitSettings #34

Open
Volpym wants to merge 1 commit into OHDSI:main from Volpym:fix-train-fraction-in-backwards-comp
Volpym commented Jan 15, 2026

When ATLAS’ export function is used for predictions, every generated prediction workflow passes through BackwardsComp.R via execute() in Main.R.
Currently, splitSettings from predictionAnalysis.json contains only testFraction, and createDefaultSplitSettings() is called with this single parameter. As a result, trainFraction defaults to 75% regardless of the provided testFraction.

This results in incorrect and inconsistent dataset splits.

Problem Description

Because trainFraction is not derived from the provided testFraction, the following issues occur:
- testFraction < 25%
  - The training set defaults to 75%
  - The test set uses the provided fraction (e.g. 10%)
  - The remaining subjects (e.g. 15%) are unused
- testFraction > 25%
  - Subjects can appear in both the training and test datasets
  - This can lead to data leakage and incorrect performance evaluation
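
The two cases above come down to simple arithmetic, which the following minimal sketch illustrates. It is written in Python purely for illustration and is not the package's R code. The function name `split_coverage` and the constant are hypothetical, and the 75% default is taken from the description in this PR.

```python
# Illustrative arithmetic only; not the package's actual R implementation.
DEFAULT_TRAIN_FRACTION = 0.75  # default train fraction as described in this PR


def split_coverage(test_fraction, train_fraction=DEFAULT_TRAIN_FRACTION):
    """Return (unused, overlap) as fractions of the whole population.

    unused:  share of subjects assigned to neither train nor test
    overlap: share by which train + test exceed the population
    """
    total = train_fraction + test_fraction
    # Round to suppress floating-point noise such as 0.15000000000000002.
    unused = round(max(0.0, 1.0 - total), 10)
    overlap = round(max(0.0, total - 1.0), 10)
    return unused, overlap


print(split_coverage(0.10))  # testFraction < 25%: (0.15, 0.0)
print(split_coverage(0.40))  # testFraction > 25%: (0.0, 0.15)
print(split_coverage(0.25))  # only the default 25% is consistent: (0.0, 0.0)
```

Only a testFraction of exactly 25% yields a split that is both exhaustive and non-overlapping under the fixed 75% default.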

Proposed Fix

This PR calculates trainFraction dynamically as trainFraction = 1 - testFraction and passes it explicitly to createDefaultSplitSettings().
This ensures that the train and test sets are mutually exclusive and fully account for the population, matching the intent of predictionAnalysis.json.
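
As a rough sketch of the derivation described here (again in Python for illustration; the actual change lives in the R file BackwardsComp.R), the complement can be computed as below. The helper name `derive_train_fraction` and the range check are assumptions added for the example and are not part of this PR.

```python
# Illustrative only; the real fix is made in R inside BackwardsComp.R.
def derive_train_fraction(test_fraction):
    """Return the train fraction that complements the given test fraction."""
    # Assumed sanity check (not part of the PR): the test fraction must
    # leave room for a non-empty training set.
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    return 1.0 - test_fraction


for test_fraction in (0.10, 0.25, 0.40):
    train_fraction = derive_train_fraction(test_fraction)
    # By construction the two fractions always sum to the full population.
    assert abs(train_fraction + test_fraction - 1.0) < 1e-12
```

With the train fraction derived this way, the unused remainder and the overlap from the problem description are both zero for any valid testFraction.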


Although this package has not been updated since 2022, it is still actively used by ATLAS exports.
Fixing this behavior prevents incorrect model evaluation and aligns the exported R code with the split configuration defined in ATLAS.
