
Commit 7884b2e

Refine Lego analogy in statistical modeling section
Clarified the explanation of statistical modeling using Lego analogy and emphasized the importance of understanding design requirements in genetic analysis.
1 parent 7d313ad commit 7884b2e

1 file changed

Lines changed: 1 addition & 1 deletion


README.md

@@ -4,7 +4,7 @@ These notes are for trainees with quantitative backgrounds but without formal tr

These notes are not organized by method, by paper, or by software tool. Instead, we organize by scientific question. For each question, we focus on what problem we are trying to solve, what assumptions we are making, and what generative model most naturally describes how the data arise. Once these foundations are clear, existing methods become natural solutions, and their limitations become obvious.

-Think of it like building a Lego model to represent something in the real world. The statistical building blocks (likelihoods, priors, latent variables, hierarchical structures) are the pieces available. Our goal is to focus on designing the blueprint that captures the essential features of the biological reality, while keeping the available blocks in mind. The details of assembling specific kits will inevitably be discussed, but they are not the focus. When one understands what the design requires and what connections matter, one will know how to select and combine blocks to satisfy those requirements. With this foundation, one can read new methods papers and recognize the same underlying ideas, and feel comfortable adapting or extending existing approaches for new problems.
+Think of it like building a Lego model to represent something in the real world. The statistical building blocks (likelihoods, priors, latent variables, hierarchical structures) are the pieces available. Our goal is to focus on designing the blueprint that captures the essential features of the biological reality, while keeping the available blocks in mind. The details of assembling specific kits will inevitably be discussed, but they are not the focus. When one understands what the design requires and what connections matter, one will know how to select and combine blocks to satisfy those requirements. With this foundation, one can read new methods papers and recognize the same underlying ideas, and feel comfortable adapting or extending existing approaches for new problems (and only worry about various tricky details at this point).

As an example, consider allele-specific expression (ASE) QTL analysis. Total expression reflects the sum of transcripts from both haplotypes; ASE measures their difference within heterozygotes. The same genetic effect parameter underlies both, appearing as dosage effect $(0, 1, 2)$ in total expression and haplotype difference $(-1, 0, +1)$ in ASE. Because sum and difference are conditionally independent, ASE adds information about genetic effects beyond total expression from the same samples, effectively increasing sample size. The within-individual comparison also cancels individual-level confounders (which affect both haplotypes equally), and haplotype difference in ASE provides different correlation (LD) patterns than conventional genotype dosage, thus improving fine-mapping resolution. These advantages motivate incorporating ASE into QTL analysis. [RASQUAL](https://www.nature.com/articles/ng.3467) implemented a rigorous generative model with Negative Binomial total counts and Beta-Binomial allele-specific counts sharing genetic effect parameters; [mixQTL](https://www.nature.com/articles/s41467-021-21592-8) later achieved scalability through Gaussian approximations and [WASP](https://github.com/bmvdgeijn/WASP) preprocessing, trading some modeling rigor for computational efficiency suitable for large-scale analysis. One can extend this framework further by adding local ancestry modeling and fine-mapping under different likelihoods, following the same approach to motivation and generative modeling.
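
The ASE context paragraph above makes two claims that are easy to check numerically: the within-individual haplotype difference cancels confounders shared by both haplotypes, and the sum and difference carry separate information about the same effect. What follows is a minimal simulation sketch under a toy Gaussian model with equal noise variance on both haplotypes. That model, the function names, and the parameter values are illustrative assumptions; this is not the likelihood used by RASQUAL or mixQTL.

```python
import random
import statistics


def simulate_heterozygotes(n, beta=0.5, confounder_sd=1.0, noise_sd=0.3, seed=0):
    """Simulate haplotype-level expression for n heterozygous individuals.

    Toy model (an assumption for illustration only):
        hap_ref = mu + c_i + e_ref
        hap_alt = mu + beta + c_i + e_alt
    where c_i is an individual-level confounder shared by both haplotypes
    and e_ref, e_alt are independent Gaussian noise terms.
    """
    rng = random.Random(seed)
    mu = 1.0
    totals, diffs, confounders = [], [], []
    for _ in range(n):
        c = rng.gauss(0.0, confounder_sd)
        hap_ref = mu + c + rng.gauss(0.0, noise_sd)
        hap_alt = mu + beta + c + rng.gauss(0.0, noise_sd)
        totals.append(hap_alt + hap_ref)  # analogue of total expression
        diffs.append(hap_alt - hap_ref)   # analogue of the ASE contrast
        confounders.append(c)
    return totals, diffs, confounders


def correlation(x, y):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    totals, diffs, confounders = simulate_heterozygotes(20000)
    print("corr(total, confounder):", round(correlation(totals, confounders), 3))
    print("corr(diff, confounder): ", round(correlation(diffs, confounders), 3))
    print("corr(total, diff):      ", round(correlation(totals, diffs), 3))
    print("mean diff (estimates beta):", round(statistics.fmean(diffs), 3))
```

In this sketch the total is strongly correlated with the shared confounder, while the difference is not, and its mean recovers the simulated effect. The near-zero correlation between sum and difference depends on the equal-variance assumption; with count data and overdispersion the relationship is only approximate.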

0 commit comments
