-
Notifications
You must be signed in to change notification settings - Fork 7
Updating error messages and metafounder rules #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
| else: | ||
| # expect more infomative error message | ||
| print(f"ERROR: Invalid metafounder input format.\nExiting...") | ||
| print(f"ERROR: The same metafounder is expected for both parents. For individual {idx} the metafounders were {sireID} and {damID}.\nExiting...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RosCraddock what if we have an F1 crossbred without pedigree and we want to give them different metafounders for the different parents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was thinking the same. This could still be considered with the above condition, but would require the user to create two dummy individuals, one for each parent of the F1 individual, and then assign the metafounders for the different breeds to each of the dummy individuals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this makes sense (to create two metafounders for such F1 individuals). So, we do cater for this edge case and we have tests for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can handle this, but it requires the user to add dummy individuals to the pedigree and assign them the two different metafounders for the F1 cross. We do not do this within AlphaPeel, although we could do so instead of the error message. Do you think this should be handled by AlphaPeel or returned to the user via the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RosCraddock I am a bit too removed from the use cases here. Can you please share how the input looks like for this? ideally this could also serve as an example in the exapmle(s) folder and a clarification/mention in the docs - so we are super clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RosCraddock thanks this is very helpful! The crux of the limitation is that we don't allow for different metafounders. I suggest we lift that limitation, and then I think the problem will go away? To show this strictness in another way, we would also not allow for half-founders having one metafounder, because the metafounder by definition differs with the known parent.
I appreciate that have now hijacked this issue on editing the error message - but I have only now realised that we are likely too strict with enforcing that metafounders must match. So, let's just try to fix this as part of this PR. Will rename it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gregorgorjanc Yes, I agree. I can see how it is too restrictive, particularly in the context of Jana's data and questions.
The code will need some review because with AlphaPeel, we do not treat metafounders the same way we treat individuals. Will have a look and outline a plan tomorrow or next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To allow for different metafounders is a fairly easy change:
- collect both metafounders in the
ind.metafounderthroughind.metafounder = [sireID, damID]in tinyhouse.pedigree.py - Adjust args in alphapeel.tinypeel.py and alphapeel.peelingupdates.py for this change. Assume that for founders with different metafounders, the anterior term is calculated through HWE, using the mean allele frequency of both metafounders.
For half-metafounders, it's relevant to give some context. Ignoring metafounders, AlphaPeel handles any half-founder by allocating a dummy individual labelled as "mother of .." or "father of ..". For consistency, we could handle any half-metafounders in the same way: (1) allocate a dummy individual, (2) then assign the metafounder to the dummy individual. This is essentially what AlphaPeel currently does, except we require the user to do this. I can add code so this is completed internally instead. However, this may introduce many dummy individuals. Instead, we could allocate the same dummy individual for each metafounder (rather than a unique dummy individual for each half-founder). However, is it preferable to support half-metafounders without dummy individuals? If so, this is a more complex modification. @gregorgorjanc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented code to support including multiple metafounders for the same individual and, as part of that, conducted some quick testing. One of the key changes in the code was in the -update_allele_prob function, which uses the mean of the respective founders to estimate the alternative allele frequency for each metafounder. For individuals with different metafounders as sires and dams, I used this code:
MF = list(pedigree.AAP.keys())
for mfx in MF:
AAP = pedigree.AAP[mfx]
sumOfGenotypes = np.full((4, peelingInfo.nLoci), 0, dtype=np.float32)
n = 0
for ind in pedigree:
if mfx in ind.MetaFounder and ind.isFounder():
ind_genotype = peelingInfo.getGenoProbs(ind.idn)
if len(ind.MetaFounder) == 2:
sumOfGenotypes += ind_genotype / len(ind.MetaFounder)
n += 1 / len(ind.MetaFounder)
else:
sumOfGenotypes += ind_genotype
n += 1
for i in range(peelingInfo.nLoci):
marker_Aa = sumOfGenotypes[1, i]
marker_aA = sumOfGenotypes[2, i]
marker_AA = sumOfGenotypes[3, i]
AAP[i] = (0.5 * (marker_Aa + marker_aA) + marker_AA) / n
if AAP[i] < 0.01:
AAP[i] = 0.01
elif AAP[i] > 0.99:
AAP[i] = 0.99
pedigree.AAP[mfx] = AAP.astype(np.float32)
for ind in pedigree:
if ind.MetaFounder is not None and ind.isFounder():
AAP = np.zeros(peelingInfo.nLoci, dtype=np.float32)
for mfx in ind.MetaFounder:
AAP += pedigree.AAP[mfx]
if len(ind.MetaFounder) > 1:
AAP /= len(ind.MetaFounder)
mafGeno = ProbMath.getGenotypesFromMaf(AAP)
peelingInfo.anterior[ind.idn, :, :] = mafGeno
For testing, I compared this new approach with using dummy individuals, so the ped files look like this, where D0 is the multiple metafounder individual.
Without dummy individuals
| id | fid | mid |
|---|---|---|
| M0 | MF_1 | MF_1 |
| F0 | MF_2 | MF_2 |
| D0 | MF_1 | MF_2 |
With dummy individuals (for D0)
| id | fid | mid |
|---|---|---|
| M0 | MF_1 | MF_1 |
| F0 | MF_2 | MF_2 |
| D0 | FD0 | MD0 |
| FD0 | MF_1 | MF_1 |
| MD0 | MF_2 | MF_2 |
When using homozygous geno inputs (see below), whether using dummy individuals or not, we get identical results in the alternative allele frequency estimates for the two metafounders.
geno_file (all homozygous)
| id | dosage |
|---|---|
| M0 | 0 0 0 0 0 |
| F0 | 2 2 2 2 2 |
| D0 | 2 2 2 2 2 |
The resulting alt_allele_prob
| MF_1 | MF_2 |
|---|---|
| 0.33 | 0.99 |
| 0.33 | 0.99 |
| 0.33 | 0.99 |
| 0.33 | 0.99 |
| 0.33 | 0.99 |
When D0 is heterozygous at each of the 5 loci, we get different results using dummy individuals compared to without.
geno_file inputted
| id | dosage |
|---|---|
| M0 | 0 0 0 0 0 |
| F0 | 2 2 2 2 2 |
| D0 | 1 1 1 1 1 |
alt_allele_prob without dummy individuals
| MF_1 | MF_2 |
|---|---|
| 0.17 | 0.83 |
| 0.17 | 0.83 |
| 0.17 | 0.83 |
| 0.17 | 0.83 |
| 0.17 | 0.83 |
alt_allele_prob with dummy individuals
| MF_1 | MF_2 |
|---|---|
| 0.01 | 0.99 |
| 0.01 | 0.99 |
| 0.33 | 0.67 |
| 0.01 | 0.99 |
| 0.01 | 0.99 |
The results for alt_allele_prob without dummy individuals are logical. When we input the allele dosage, there is an equal chance that the alternative allele came from MF_1 or MF_2 for individual D0. This will increase the MF_1 estimate and reduce the MF_2 estimate compared to just including individuals M0 or F0 only.
The result for alt_allele_prob with dummy individuals is odd. The 0.99 and 0.01 values in MF_1 and MF_2, respectively, although possible, cannot be known (i.e as certain) from the little information provided. However, the different results at the third locus (0.33 MF_1 and 0.67 MF_2) are more plausible. I do not understand why a different result was estimated at this locus only.
Referring to the phased genotype probabilities of D0 only, there are some strange results.
My expectation for the phased genotype probabilities of D0 is:
| phase | probs |
|---|---|
| aa | 0 0 0 0 0 |
| aA | 0.5 0.5 0.5 0.5 0.5 |
| Aa | 0.5 0.5 0.5 0.5 0.5 |
| AA | 0 0 0 0 0 |
Phased probs without dummy individuals
| phase | probs |
|---|---|
| aa | 0.0000 0.0000 0.0000 0.0000 0.0000 |
| aA | 0.5000 0.5000 0.0000 0.5000 0.5000 |
| Aa | 0.5000 0.5000 0.9999 0.5000 0.5000 |
| AA | 0.0000 0.0000 0.0000 0.0000 0.0000 |
Phased probs with dummy individuals
| phase | probs |
|---|---|
| aa | 0.0000 0.0000 0.0001 0.0000 0.0000 |
| aA | 0.9999 0.9999 0.0001 0.9999 0.9999 |
| Aa | 0.0001 0.0001 0.9997 0.0001 0.0001 |
| AA | 0.0000 0.0000 0.0001 0.0000 0.0000 |
So it seems something is up with the phasing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, issue solved. It is not an error, but a relevant note. Where multiple loci are heterozygous in genotyped founders, the default is to phase the middle locus so the alternative allele is paternal, but keep all other loci the same (50:50). This can be suppressed with the flag -no_phase_founder. For individual D0, this gives the phased probs:
Without dummy individuals
| phase | prob |
|---|---|
| aa | 0.0000 0.0000 0.0000 0.0000 0.0000 |
| aA | 0.5000 0.5000 0.5000 0.5000 0.5000 |
| Aa | 0.5000 0.5000 0.5000 0.5000 0.5000 |
| AA | 0.0000 0.0000 0.0000 0.0000 0.0000 |
With dummy individuals
| phase | prob |
|---|---|
| aa | 0.0000 0.0000 0.0000 0.0000 0.0000 |
| aA | 0.9999 0.9999 0.9999 0.9999 0.9999 |
| Aa | 0.0001 0.0001 0.0001 0.0001 0.0001 |
| AA | 0.0000 0.0000 0.0000 0.0000 0.0000 |
The phasing without dummy individuals ignores the additional information from the metafounders, where it is more likely that the alternative allele is maternal, as MF_2 has the highest alternative allele frequency. This is correct to ignore where individual D0 is known to come from MF_1 and MF_2, but not necessarily know which one represents the dam or sire. However, if it is known that individual D0's sire was MF_1 and its dam was MF_2, it would be valuable to include this additional information in the phased probabilities. When using dummy individuals, this is accounted for and can explain the difference in the estimates.
To summarise, when considering multiple metafounder individuals, there are two possible scenarios:
- It is not known which metafounders represent the dam or the sire.
- It is known which metafounders represent the dam or the sire.
I can modify how the anterior probability is updated so scenario (2) can be accounted for without the need for dummy individuals, but how would we distinguish between the two scenarios?
Will come back to this later.
|
All looks great except one condition, which likely should not be in there - we should be able to support different metafounders. Let's address this as part of this PR so we don't make the process to complicated. Can you suggest edits to this effect? We might need more/different tests for this. |
XingerTang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good to me! Just a reminder: Don't forget to remove the "# expect more informative error message" comments
Per issue #26. Updating the error statements to be more informative for the user.