Clarify QSPR/ChemProp scaling for “window” vs modifiers: do model outputs need pre-scaling?

## Context

I’m using **DrugEx Graph RL** with multiple objectives:

- **QSPR/ChemProp regressors:** absolute LogP, absolute LogD7, log‑scaled Caco2  
- **RDKit properties:** QED, MW, TPSA, logP  

DrugEx applies *modifiers* (e.g., `ClippedScore`, `SmoothHump`) to map raw objective values to `[0,1]`, then combines them via a weighted sum (WS) or Pareto ranking.
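
To make sure we mean the same thing by "modifier", here is a minimal sketch of my understanding in plain NumPy. The function names `clipped_score` and `smooth_hump` and their parameters are my own illustration and are not meant to reproduce DrugEx's actual signatures.

```python
import numpy as np

def clipped_score(x, lower_x, upper_x):
    """Linear ramp: 0 at lower_x, 1 at upper_x, clipped outside that range.

    Passing lower_x > upper_x flips the ramp into a "low is better" score.
    (Hypothetical helper, not the DrugEx implementation.)
    """
    x = np.asarray(x, dtype=float)
    return np.clip((x - lower_x) / (upper_x - lower_x), 0.0, 1.0)

def smooth_hump(x, lower_x, upper_x, sigma):
    """1 inside [lower_x, upper_x], Gaussian decay of width sigma outside.

    (Hypothetical helper, not the DrugEx implementation.)
    """
    x = np.asarray(x, dtype=float)
    # Distance to the nearest edge of the plateau; 0 when x is inside it.
    dist = np.maximum(np.maximum(lower_x - x, x - upper_x), 0.0)
    return np.exp(-0.5 * (dist / sigma) ** 2)

# Made-up raw values on an absolute scale, mapped to [0, 1].
raw = np.array([1.0, 3.0, 6.0])
print(clipped_score(raw, 0.0, 5.0))
print(smooth_hump(raw, 2.0, 5.0, sigma=1.0))
```

Either function takes explicit bounds on the raw scale, which is the behavior I would like to have for QSPR objectives as well.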

---

## Observed Behavior

For QSPR **“window”** targets, the modifier appears to require zero‑centered values (roughly ±1.5 around 0, which seems to be hard‑coded).  
This works if the model output (from my QSPRpred/ChemProp models) is a residual or otherwise zero‑centered, but fails for absolute outputs such as LogP ≈ 2–5: scores collapse toward 0 unless the values are pre‑centered.

RDKit windows accept explicit **low/high thresholds** and work fine for absolute ranges.
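
The difference can be reproduced outside DrugEx with a toy window function. This is only a sketch of the effect I am describing: `window_score`, the Gaussian falloff, `sigma=0.25`, the example predictions, and the center of 3.5 are all assumptions of mine, not values taken from the DrugEx code.

```python
import numpy as np

def window_score(x, low, high, sigma=0.25):
    """1 inside [low, high], Gaussian falloff outside (illustrative shape only)."""
    x = np.asarray(x, dtype=float)
    dist = np.maximum(np.maximum(low - x, x - high), 0.0)
    return np.exp(-0.5 * (dist / sigma) ** 2)

# Made-up absolute LogP predictions, all inside the range I actually want.
logp_pred = np.array([2.5, 3.0, 4.0, 4.5])

# Zero-centered window of +/-1.5 applied to absolute values: scores collapse.
zero_centred = window_score(logp_pred, -1.5, 1.5)

# Same window after shifting the predictions by an assumed center of 3.5.
shifted = window_score(logp_pred - 3.5, -1.5, 1.5)

# Explicit window on the absolute scale, as the RDKit windows allow.
explicit = window_score(logp_pred, 2.0, 5.0)

print(zero_centred)
print(shifted)
print(explicit)
```

In this toy setup the shifted and explicit variants give identical scores, which suggests that a fixed offset applied to the predictions (or explicit `low`/`high` bounds) would be enough, and retraining on residuals would not be needed.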

---

## Questions

1. For QSPR/ChemProp regressors on absolute scales (LogP, LogD7, log‑Caco2),  
   do outputs need to be pre‑standardized/zero‑centered to use “window”? If so, I would eventually have to retrain the models after rescaling the endpoints to zero‑centered residuals. Am I understanding this correctly?  
   Or can modifiers map absolute values directly to `[0,1]` without retraining?

2. Is there a supported way to specify an **arbitrary window** (`low`, `high`, `sharpness`) for QSPR objectives on unscaled, real‑valued endpoints (LogP, pIC50, pKi, and so on), similar to RDKit windows?

3. If not, is this the recommended pattern?  
   - Use QSPR **active/inactive** with thresholds for absolute regressors. Even here, though, it seems I need to specify a single cut‑off/threshold that is applied to all endpoints, and I am not sure this works for scaled outputs. If you are familiar with REINVENT4, you may see where my confusion comes from.  
   - Keep range constraints via **RDKit property windows**

4. For WS vs Pareto:  
   - Are per‑objective pass thresholds (e.g., ~0.99) the only condition for `Desired=1`?  
   - Do modifiers ensure **scale invariance** across heterogeneous objectives?

5. Are there minimal example configs combining:  
   - QSPR absolute outputs (LogP/LogD)  
   - QSPR “low‑is‑better” (Caco2), which is what I need for my specific problem  
   - RDKit windows  
   under a single reward scheme **without pre‑transforming** model outputs?
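
To illustrate what I am after in questions 4 and 5, here is a self‑contained sketch of the combined scheme in plain NumPy. Everything in it is hypothetical: the helper names, the bounds, the weights, the example predictions, and the reading of `Desired=1` as "every per‑objective score is at least 0.99" are my assumptions and not DrugEx behavior that I have confirmed.

```python
import numpy as np

def window_score(x, low, high, sigma):
    """1 inside [low, high], Gaussian falloff of width sigma outside."""
    x = np.asarray(x, dtype=float)
    dist = np.maximum(np.maximum(low - x, x - high), 0.0)
    return np.exp(-0.5 * (dist / sigma) ** 2)

def low_is_better(x, good, bad):
    """1 at or below `good`, 0 at or above `bad`, linear in between."""
    x = np.asarray(x, dtype=float)
    return np.clip((bad - x) / (bad - good), 0.0, 1.0)

# Made-up predictions for three molecules, all on their native scales.
logp = np.array([3.0, 6.0, 2.5])          # QSPR/ChemProp, absolute LogP
log_caco2 = np.array([-6.5, -6.2, -4.8])  # QSPR/ChemProp, log-scaled Caco2
tpsa = np.array([80.0, 95.0, 150.0])      # RDKit property

# One column per objective, every column already in [0, 1].
scores = np.column_stack([
    window_score(logp, 2.0, 5.0, sigma=0.5),
    low_is_better(log_caco2, good=-6.0, bad=-5.0),
    window_score(tpsa, 60.0, 120.0, sigma=10.0),
])

# Weighted-sum reward with equal (assumed) weights.
weights = np.array([1.0, 1.0, 1.0])
weighted_sum = scores @ weights / weights.sum()

# My reading of "Desired": every objective passes its threshold.
desired = (scores >= 0.99).all(axis=1)

print(scores.round(3))
print(weighted_sum.round(3))
print(desired)
```

Because every modifier output lies in `[0,1]`, the objectives are comparable regardless of their native units, but the relative influence of each objective still depends on how steep each modifier is. That is the part of "scale invariance" I would like to have confirmed.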

## Environment
- **DrugEx:** master (containerized)
- **GPU (CUDA):** RTX 4070S
- **ChemProp models:** standard regression (absolute) and log‑scaled Caco2

---

## Goal
Confirm whether QSPR “window” is intentionally zero‑centered only, and whether there could be an option to set `(low, high, shape)` directly for QSPR window objectives to avoid retraining models to residuals.

