Use pseudogenes to estimate the 'sampling distribution' (?) of dN/dS under the null hypothesis: these genes should be under no selection, so their dN/dS should have mean = 1 plus sampling variation. This enables comparison of a real effect vs chance.
Problems:
- Pseudogene sequences not available for GRCh37?
- Annotations in the ICGC VCF are simply "exon_variant" for pseudogenes, rather than nonsynonymous or synonymous
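
A rough sketch of how the comparison against the pseudogene null could look, assuming per-gene dN/dS values have already been computed (which the problems above currently block). The function name `empirical_p_value` and the numbers are made up for illustration; they are not real data or an existing tool's API.

```python
from statistics import mean


def empirical_p_value(observed, null_values):
    """Two-sided empirical p-value of an observed dN/dS against a null sample.

    Counts how many null values deviate from the null mean at least as much
    as the observed value does, with a +1 correction so p is never zero.
    """
    center = mean(null_values)
    observed_deviation = abs(observed - center)
    n_extreme = sum(
        1 for value in null_values if abs(value - center) >= observed_deviation
    )
    return (n_extreme + 1) / (len(null_values) + 1)


# Hypothetical pseudogene dN/dS values (illustrative only, not real data).
pseudogene_dnds = [0.8, 0.9, 1.0, 1.1, 1.2]

# Hypothetical dN/dS for a gene of interest.
p = empirical_p_value(2.0, pseudogene_dnds)
```

With only a handful of pseudogenes the smallest achievable p-value is 1 / (n + 1), so a large pseudogene set would be needed for this to be informative.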