-
Notifications
You must be signed in to change notification settings - Fork 3
Description
@LynxJinyangii @hannesbecher We can import tree sequence into R or simulate it via reticulate. For AlphaSimR, we however need base population haplotypes/genomes. Usually we get these via:
founderGenomes = quickHaplo(nInd = 10, nChr = 2, segSites = 3)
founderGenomes = runMacs(nInd = 10, nChr = 2, segSites = 3)
key point here is that we preset the number of sites we will work with segSites (we limit this to a manageable number to avoid working with large haplotype matrix that has quadratic storage and compute complexity) and that the resulting object is of type:
> is(founderGenomes)
[1] "MapPop" "RawPop"
To swap quickHaplo or runMacs with imported tree sequence, we should create a function that:
- converts a
tsobject to haplotypes (but only for requested number of sites) - creates a
MapPopobject
Re 1) We should be mindful of sub-sampling sites to make the haplotype matrix manageable. Looking at https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.genotype_matrix I don't see a way to subsample, but https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.variants seems to be a viable alternative
Re 2) look at AlphaSimR::quickHaplo() or AlphaSimR:runMacs() on how to construct a MapPop object.
How shall we call this function? Suggestions: asMapPop() would follow the usual casting function names in R, so the call would look like founderGenomes = asMapPop(ts). An alternative would be ts2MapPop(), so the call would look like founderGenomes = ts2MapPop(ts). Since we already have isMapPop(), then asMapPop() feels like a natural choice. Keen to hear suggestions!
Hmm, we should think about handling one vs multiple ts objects for multiple chromosomes! How will we present these on R side? As a vector/list of multiple ts objects? How would one do this on the Python side? As [ts1, ts2, ...]?
Edit: Once we have the asMapPop() functionality, we could then run:
# founderTs is obtained in an appropriate way
founderGenomes = asMapPop(founderTs, segSites = 3)
# asMapPop should recognise how many chromosomes (how many separate tree sequences we have in founderTs)
SP = SimParam$new(founderGenomes)
# Continue with AlphaSimR as usually with the "box of haplotype" approach and
# ignoring tree sequence entirely from this point onwards
# (it was used only to get base population haplotypes stored in founderGenomes)