
Fix shuffle misalignment #58

Open
basakbahcivanci wants to merge 2 commits into IBM:main from basakbahcivanci:fix-shuffle-misalignment

@basakbahcivanci

This pull request fixes an index misalignment bug in which the target column in transformed_model_target_data.csv could fall out of sync with condition_binary (the original target) in the metadata after shuffling and splitting.
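
A minimal sketch of how this kind of misalignment can arise, and of one way to avoid it. The data, the column names, and the `matches_original` helper are hypothetical and are not taken from the autoxai4omics code; a fixed permutation stands in for the shuffle so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: features and target share a SampleID index.
sample_ids = pd.Index([f"S{i}" for i in range(6)], name="SampleID")
x = pd.DataFrame({"feature": np.arange(6, dtype=float)}, index=sample_ids)
y = pd.Series([0, 1, 0, 1, 0, 1], index=sample_ids, name="target")

# Stand-in for a shuffle: reorder the rows with a fixed permutation.
order = [1, 0, 3, 2, 5, 4]
x_shuffled = x.iloc[order]

# Problematic pattern: attaching the target positionally ignores the new row order.
misaligned = x_shuffled.copy()
misaligned["target"] = y.to_numpy()

# Safer pattern: select the target by label so each row keeps its own value.
aligned = x_shuffled.copy()
aligned["target"] = y.loc[x_shuffled.index]


def matches_original(frame: pd.DataFrame) -> bool:
    """Return True if every row's target equals the original label for its SampleID."""
    return bool((frame["target"] == y.reindex(frame.index)).all())


print(matches_original(misaligned))
print(matches_original(aligned))
```

Checking saved targets against the original labels by SampleID, as `matches_original` does here, is one way to catch this class of bug before the files are written.
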
Changes:

  • autoxai4omics/utils/ml/data_split.py
    Reset indices after train/test split to ensure targets remain correctly aligned with features and metadata.
  • autoxai4omics/utils/ml/preprocessing.py
    Updated to preserve index consistency.
  • autoxai4omics/utils/ml/class_balancing.py
    Replaced direct imports (from numpy import ndarray, from pandas.core.frame import DataFrame) with import numpy as np and import pandas as pd due to an error. Improved error messages to show the actual received types for easier debugging. Allowed y_train to be a pd.Series (for labels).
  • autoxai4omics/omics/tabular.py
    Allowed y to be kept as a Series with a SampleID index.
  • autoxai4omics/utils/save.py
    Updated to preserve consistent indices.
  • autoxai4omics/models/tabauto/keras_model.py, autoxai4omics/models/tabauto/lgbm_model.py, autoxai4omics/models/tabauto/xgboost_model.py
    Updated model wrappers to correctly handle input/target arrays with consistent indices, preventing downstream errors and misaligned labels.
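
The type checks described for class_balancing.py could look roughly like the sketch below. The function name `check_balancing_inputs`, the set of accepted types for `x_train`, and the length check are assumptions made for illustration; they are not the actual code in this PR.

```python
import numpy as np
import pandas as pd


def check_balancing_inputs(x_train, y_train) -> None:
    """Hypothetical validator: raise if the inputs have unexpected types or lengths."""
    # Assumed accepted types for features: numpy array or pandas DataFrame.
    if not isinstance(x_train, (np.ndarray, pd.DataFrame)):
        raise TypeError(
            "x_train must be a numpy array or pandas DataFrame, "
            f"got {type(x_train).__name__}"
        )
    # Labels may be a numpy array or a pandas Series.
    if not isinstance(y_train, (np.ndarray, pd.Series)):
        raise TypeError(
            "y_train must be a numpy array or pandas Series, "
            f"got {type(y_train).__name__}"
        )
    # Assumed extra check: features and labels must have the same number of rows.
    if len(x_train) != len(y_train):
        raise ValueError(
            f"x_train has {len(x_train)} rows but y_train has {len(y_train)} entries"
        )


# A DataFrame of features with a Series of labels passes silently.
check_balancing_inputs(pd.DataFrame({"a": [1, 2]}), pd.Series([0, 1]))
```

Including the received type name in the message tells the caller what was actually passed, which is the debugging aid the description mentions.
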

… and splitting

Signed-off-by: Basak Bahcivanci <basakbahcivanci@gmail.com>
…essing and splitting

Signed-off-by: Basak Bahcivanci <basakbahcivanci@gmail.com>
