Clarify QSPR/ChemProp scaling for “window” vs modifiers: do model outputs need pre-scaling? #16

@mmahmedgithub

Context

I’m using DrugEx Graph RL with multiple objectives:

  • QSPR/ChemProp regressors: absolute LogP, absolute LogD7, log‑scaled Caco2
  • RDKit properties: QED, MW, TPSA, logP

DrugEx applies modifiers (e.g., ClippedScore, SmoothHump) to map raw objective values to [0,1], then combines them via WS or Pareto.
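
To make the setup concrete, here is a minimal plain-Python sketch of the modifier step as I understand it. `clipped_score` and `weighted_sum` are hypothetical stand-ins written for this issue, not the DrugEx implementations, and the thresholds are illustrative only.

```python
def clipped_score(x, lower_x, upper_x):
    """Linear ramp from 0 at lower_x to 1 at upper_x, clipped to [0, 1].

    Hypothetical helper: passing lower_x > upper_x reverses the ramp,
    which gives a "low is better" mapping.
    """
    t = (x - lower_x) / (upper_x - lower_x)
    return min(1.0, max(0.0, t))


def weighted_sum(scores, weights):
    """Combine per-objective scores in [0, 1] into one normalized reward."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Illustrative values only.
mid_ramp = clipped_score(3.0, 2.0, 4.0)   # 0.5, halfway up the ramp
reversed_ramp = clipped_score(4.0, 5.0, 3.0)  # 0.5, ramp runs from 5 down to 3
reward = weighted_sum([mid_ramp, 1.0], [1.0, 1.0])  # 0.75
```

Because every objective is mapped to [0,1] before aggregation, the modifier thresholds, not the raw units, decide how much each objective contributes.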


Observed Behavior

For QSPR “window” targets, the modifier appears to require zero‑centered values (roughly ±1.5 around 0, apparently hard‑coded).
This works when the model output (from my QSPRpred/ChemProp models) is a residual or zero‑centered value, but fails for absolute outputs such as LogP ≈ 2–5: scores collapse toward 0 unless the values are pre‑centered.

RDKit windows accept explicit low/high thresholds and work fine for absolute ranges.
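
The collapse can be reproduced with a small stand-alone sketch. `smooth_hump` below is a hypothetical re-implementation of a hump-shaped modifier (plateau of 1 inside the window, Gaussian decay outside), not the DrugEx class, and `sigma=0.5` is an assumed value. It also shows that shifting the input by a constant is equivalent to shifting the window, which is why I would expect retraining to be unnecessary if the window bounds were configurable.

```python
import math


def smooth_hump(x, lower_x, upper_x, sigma):
    """Return 1.0 inside [lower_x, upper_x], with Gaussian decay outside.

    Hypothetical helper for illustration; sigma controls how quickly the
    score falls off beyond the window edges.
    """
    if lower_x <= x <= upper_x:
        return 1.0
    edge = lower_x if x < lower_x else upper_x
    return math.exp(-((x - edge) ** 2) / (2.0 * sigma ** 2))


logp = 3.5  # an absolute LogP value in the 2-5 range mentioned above

# Zero-centered window (about +/-1.5 around 0): the score is close to 0.
collapsed = smooth_hump(logp, -1.5, 1.5, sigma=0.5)

# Option A: subtract a centre from the prediction before scoring.
centre = 3.5  # assumed midpoint of the desired 2-5 range
shifted_input = smooth_hump(logp - centre, -1.5, 1.5, sigma=0.5)  # 1.0

# Option B: keep the prediction as-is and move the window instead.
shifted_window = smooth_hump(logp, 2.0, 5.0, sigma=0.5)  # 1.0
```

Options A and B give the same score for any input, so a thin wrapper around the predictor (or explicit low/high bounds on the modifier) would be enough; the model itself would not need to change.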


Questions

  1. For QSPR/ChemProp regressors on absolute scales (LogP, LogD7, log‑Caco2),
    do outputs need to be pre‑standardized/zero‑centered to use “window”? If so, I would have to retrain the models after rescaling the endpoints to zero‑centered residuals. Is that correct?
    Or can modifiers map absolute values directly to [0,1] without retraining?

  2. Is there a supported way to specify an arbitrary window (low, high, sharpness) for QSPR objectives on unscaled, real‑valued outputs (such as LogP, pIC50, pKi, and so on), similar to RDKit windows?

  3. If not, is this the recommended pattern?

    • Use QSPR active/inactive with thresholds for absolute regressors. Even here, it seems I need to specify a single cut‑off that is applied to all endpoints, and I am not sure this works for scaled outputs. If you are familiar with REINVENT 4, you may see where the confusion comes from.
    • Keep range constraints via RDKit property windows
  4. For WS vs Pareto:

    • Are per‑objective pass thresholds (e.g., ~0.99) the only condition for Desired=1?
    • Do modifiers ensure scale invariance across heterogeneous objectives?
  5. Are there minimal example configs combining:

    • QSPR absolute outputs (LogP/LogD)
    • QSPR “low‑is‑better” (Caco2); this is what I need for my specific problem: low Caco2
    • RDKit windows
      under a single reward scheme without pre‑transforming model outputs?
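
As a sketch of the behaviour I am asking about in questions 2, 4 and 5, the following stand-alone Python shows an explicit (low, high, sharpness) window, a "low is better" mapping, and a per-objective pass threshold. All function names, the Caco2 cut-off of -5.0, the sharpness of 10, and the raw values are hypothetical placeholders, not DrugEx API and not real predictions. Whether the 0.99 threshold is the only condition for Desired=1 is exactly what I would like confirmed.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def window(x, low, high, sharpness):
    """Soft window: near 1 between low and high, near 0 outside.

    Built as the product of a rising and a falling sigmoid; a larger
    sharpness gives steeper edges. The score is about 0.5 at each edge
    when the window is wide relative to 1/sharpness.
    """
    return _sigmoid(sharpness * (x - low)) * _sigmoid(sharpness * (high - x))


def lower_is_better(x, cutoff, sharpness):
    """Near 1 below the cutoff, near 0 above it."""
    return _sigmoid(sharpness * (cutoff - x))


def is_desired(scores, threshold=0.99):
    """True only if every modified objective score passes the threshold."""
    return all(s >= threshold for s in scores.values())


# Placeholder raw model outputs on their native (absolute) scales.
raw = {"LogP": 3.5, "log_Caco2": -6.0}

scores = {
    "LogP": window(raw["LogP"], low=2.0, high=5.0, sharpness=10.0),
    "log_Caco2": lower_is_better(raw["log_Caco2"], cutoff=-5.0, sharpness=10.0),
}
desired = is_desired(scores)  # True for these placeholder values
```

With modifiers like these, each endpoint carries its own thresholds, so no single global cut-off and no pre-transformation of model outputs would be needed.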

Environment

  • DrugEx: master (containerized)
  • CUDA: RTX 4070S
  • ChemProp models: standard regression (absolute) and log‑scaled Caco2

Goal

Confirm whether QSPR “window” is intentionally zero‑centered only, and whether there could be an option to set (low, high, shape) directly for QSPR window objectives to avoid retraining models to residuals.
