Add multiclass classification (OVO, OVA) #52

@eprifti

Context

gpredomics currently supports binary classification only (2 classes: 0 vs 1). Many clinical and biological problems involve multiple classes (e.g., disease subtypes, treatment response categories, multiple conditions).

Proposed approaches

One-vs-All (OVA / OVR)

  • Train K binary classifiers, each separating one class from all others
  • At prediction time, assign the class with the highest score/confidence
  • Pros: simple, only K models needed, each model is a standard gpredomics binary model (fully interpretable)
  • Cons: class imbalance (one class vs all others); models are not calibrated against each other; may produce ambiguous regions where multiple classifiers predict positive
  • Implementation: can be orchestrated externally (run gpredomics K times with relabeled y), or integrated into the engine for convenience
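The external orchestration described above can be sketched as follows. This is a minimal illustration, not gpredomics code: `relabel_one_vs_all`, `predict_ova`, and the toy scorers are hypothetical names, and each scorer stands in for whatever score a trained binary model would return.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def relabel_one_vs_all(y: Sequence[Hashable], positive: Hashable) -> List[int]:
    """Return binary labels: 1 for `positive`, 0 for every other class."""
    return [1 if label == positive else 0 for label in y]


def predict_ova(
    scorers: Dict[Hashable, Callable[[Sequence[float]], float]],
    x: Sequence[float],
) -> Hashable:
    """Assign the class whose binary model gives the highest score for x."""
    return max(scorers, key=lambda cls: scorers[cls](x))


# Toy labels with K = 3 classes (illustrative only).
y = ["healthy", "subtype_a", "subtype_b", "subtype_a"]

# One relabeled target vector per class: K binary problems in total.
binary_targets = {cls: relabel_one_vs_all(y, cls) for cls in sorted(set(y))}

# Hypothetical stand-ins for the K trained binary models.
toy_scorers = {
    "healthy": lambda x: 0.2 * x[0],
    "subtype_a": lambda x: 0.9 * x[0],
    "subtype_b": lambda x: 0.5 * x[0],
}
predicted = predict_ova(toy_scorers, [1.0])
```

Because the K scores come from independently trained models, taking the maximum assumes they are comparable, which is the calibration caveat listed under Cons.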

One-vs-One (OVO)

  • Train K×(K-1)/2 binary classifiers, one for each pair of classes
  • At prediction time, each classifier votes; assign the class with the most votes
  • Pros: each pairwise classifier is trained only on the samples of its two classes, so sub-problems are usually better balanced than in OVA; often better separation
  • Cons: quadratic number of models; voting ties possible; harder to interpret the ensemble
  • Implementation: more complex orchestration; needs a voting/aggregation layer
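A possible shape for the voting/aggregation layer is sketched below. The function names are hypothetical, and the tie-break rule (smallest class label among the top-voted classes) is an assumption chosen only to make the result deterministic; the issue does not specify one.

```python
from collections import Counter
from itertools import combinations


def class_pairs(classes):
    """Enumerate the K*(K-1)/2 unordered class pairs."""
    return list(combinations(sorted(classes), 2))


def subset_for_pair(X, y, pair):
    """Keep only samples of the two classes; label pair[1] as 1, pair[0] as 0."""
    a, b = pair
    rows = [(x, 1 if label == b else 0) for x, label in zip(X, y) if label in pair]
    return [x for x, _ in rows], [target for _, target in rows]


def vote_ovo(pairwise_winners):
    """Majority vote over the class predicted by each pairwise model.

    Ties are broken by taking the smallest class label (an assumption).
    """
    votes = Counter(pairwise_winners)
    top = max(votes.values())
    return min(cls for cls, n in votes.items() if n == top)


pairs = class_pairs(["a", "b", "c", "d"])  # K = 4 gives 6 pairwise models

X = [[0.1], [0.2], [0.3], [0.4]]
y = ["a", "b", "c", "b"]
X_ab, y_ab = subset_for_pair(X, y, ("a", "b"))  # drops the "c" sample

clear_winner = vote_ovo(["a", "b", "a"])
tie_broken = vote_ovo(["a", "b", "c"])
```

A score-weighted vote would be an alternative to plain counting and would reduce the number of ties, at the cost of assuming pairwise scores are comparable.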

Comparison

| Approach | # Models | Balance | Interpretability | Complexity |
|---|---|---|---|---|
| OVA | K | Imbalanced | High (each model is standalone) | Low |
| OVO | K(K-1)/2 | Balanced | Medium (ensemble of pairwise models) | Medium |

Design considerations

  • Jury integration: The existing voting/jury system could potentially be reused for combining OVO classifiers
  • Feature importance: How to aggregate feature importance across multiple binary models
  • Cross-validation: CV should maintain class proportions across all classes (stratified k-fold, where k is the number of folds, not the number of classes K)
  • param.yaml: Need a new parameter for multiclass strategy (multiclass: ova or multiclass: ovo)
  • Output format: Results should show per-class metrics (sensitivity, specificity, etc.) and a confusion matrix
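Two of the considerations above, stratified fold assignment and per-class metrics, can be illustrated with a short standard-library sketch. All function names are hypothetical and nothing here reflects the existing gpredomics CV or reporting code. The fold assignment is a plain round-robin within each class with no shuffling, and per-class sensitivity and specificity are computed by treating each class as "positive" against all others.

```python
from collections import defaultdict


def stratified_folds(y, n_folds):
    """Assign each sample index to a fold, spreading every class across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    fold_of = [0] * len(y)
    for label in sorted(by_class):
        # Round-robin within the class keeps its proportion similar in each fold.
        for position, idx in enumerate(by_class[label]):
            fold_of[idx] = position % n_folds
    return fold_of


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix, classes):
    """Sensitivity and specificity for each class versus all other classes."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, cls in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        tn = total - tp - fn - fp
        metrics[cls] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return metrics


classes = ["a", "b", "c"]
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

folds = stratified_folds(y_true, n_folds=2)
matrix = confusion_matrix(y_true, y_pred, classes)
metrics = per_class_metrics(matrix, classes)
```

A real implementation would shuffle within each class before assigning folds and would likely report further metrics alongside these two.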

Related work

  • predomicsmc — existing multiclass extension via Predomics (R implementation)
  • scikit-learn's OneVsRestClassifier / OneVsOneClassifier as design reference

Suggested implementation path

  1. Start with OVA — simpler, each sub-model is a standard gpredomics run
  2. Add OVO as an option later
  3. Consider whether multiclass should be a core engine feature or an orchestration layer (wrapper script / predomicsapp-web feature)

Metadata

Labels: enhancement (New feature or request)
