From an email; could be useful in case of changes between the current expression estimation script and the development one:
CPM Normalization Strategy:
The manuscript (Page 19, Methods) mentions: "instead of performing CPM normalization of simulated bulks, we normalize each test bulk sample to sum to the mean of sums of simulated samples," which suggests CPM normalization might be avoided. However, in the run_dissect_expr function, it appears that standard CPM normalization is applied (sc.pp.normalize_total(target_sum=1e6)). Could you kindly confirm which normalization strategy is the intended one for the final results reported in the paper?
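To make the difference between the two strategies concrete, here is a minimal sketch contrasting standard CPM with the manuscript's described normalization (scaling each test bulk to the mean library size of the simulated bulks). The variable names (`bulk_sim`, `bulk_test`) are illustrative, not taken from the codebase, and the data is random.

```python
import numpy as np

rng = np.random.default_rng(0)
bulk_sim = rng.poisson(5.0, size=(100, 2000)).astype(float)   # simulated bulk samples
bulk_test = rng.poisson(7.0, size=(10, 2000)).astype(float)   # real test bulk samples

# Strategy A: standard CPM, equivalent to sc.pp.normalize_total(target_sum=1e6):
# every sample is scaled to sum to one million.
cpm = bulk_test / bulk_test.sum(axis=1, keepdims=True) * 1e6

# Strategy B (as described in the Methods): scale each test sample so its sum
# matches the mean of the sums (library sizes) of the simulated samples.
mean_sim_sum = bulk_sim.sum(axis=1).mean()
matched = bulk_test / bulk_test.sum(axis=1, keepdims=True) * mean_sim_sum
```

The two differ only by a global constant (`1e6` vs. `mean_sim_sum`), but that constant determines whether test bulks live on the same scale as the simulated training data, which is presumably why the distinction matters for the trained model.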
Consistency Loss and Data Mixing:
The paper describes generating mixed samples ($B^{mix}$) using a mixing parameter $\beta$. In my reading of the run_dissect_expr function, it wasn't immediately clear where this explicit mixing occurs, as the iterators for real and simulated data seem to be processed somewhat separately. I might be missing the specific lines where the mixing happens—could you point me to that logic?
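For reference, my reading of the paper's mixing step is a convex combination of a real and a simulated bulk, along the lines of the sketch below. Sampling $\beta$ uniformly is an assumption on my part; the paper may draw it from a different distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
b_real = rng.poisson(5.0, size=2000).astype(float)  # one real bulk sample
b_sim = rng.poisson(5.0, size=2000).astype(float)   # one simulated bulk sample

# B^mix = beta * B^real + (1 - beta) * B^sim, with beta in [0, 1]
# (uniform sampling of beta is assumed here for illustration).
beta = rng.uniform(0.0, 1.0)
b_mix = beta * b_real + (1.0 - beta) * b_sim
```

If the mixing in `run_dissect_expr` is implemented differently (e.g. inside the batch iterators rather than as an explicit line like this), a pointer to that location would resolve the question.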
Loss Function Scaling:
I noticed in the code that the input is divided by the number of cell types (/ labels.shape[1]) prior to the loss computation. As I couldn't find a mention of this scaling factor in the manuscript, could you explain the motivation behind this step?
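To illustrate what effect such a scaling has, here is a hypothetical sketch: if both arguments of a squared-error loss are divided by the number of cell types, the loss shrinks by that factor squared, which effectively rescales gradient magnitudes with the number of classes. All names here are made up; the actual loss in `run_dissect_expr` may be different.

```python
import numpy as np

n_cell_types = 8
labels = np.full((32, n_cell_types), 1.0 / n_cell_types)  # dummy ground-truth fractions
preds = labels + 0.01                                     # dummy predictions

# Dividing the inputs by labels.shape[1] before a squared loss scales the
# loss value by 1 / n_cell_types**2 relative to the unscaled version.
scaled_loss = np.mean(((preds - labels) / labels.shape[1]) ** 2)
unscaled_loss = np.mean((preds - labels) ** 2)
```

One plausible motivation would be keeping loss magnitudes comparable across datasets with different numbers of cell types, but confirmation from the authors would clarify whether that is the intent.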