Skip to content

Conversation

@RosCraddock
Copy link

Per issue #26. Updating the error statements to be more informative for the user.

else:
# expect more infomative error message
print(f"ERROR: Invalid metafounder input format.\nExiting...")
print(f"ERROR: The same metafounder is expected for both parents. For individual {idx} the metafounders were {sireID} and {damID}.\nExiting...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RosCraddock what if we have an F1 crossbred without pedigree and we want to give them different metafounders for the different parents?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking the same. This could still be considered with the above condition, but would require the user to create two dummy individuals, one for each parent of the F1 individual, and then assign the metafounders for the different breeds to each of the dummy individuals.

Copy link
Member

@gregorgorjanc gregorgorjanc Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this makes sense (to create two metafounders for such F1 individuals). So, we do cater for this edge case and we have tests for that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can handle this, but it requires the user to add dummy individuals to the pedigree and assign them the two different metafounders for the F1 cross. We do not do this within AlphaPeel, although we could do so instead of the error message. Do you think this should be handled by AlphaPeel or returned to the user via the error message?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RosCraddock I am a bit too removed from the use cases here. Can you please share how the input looks like for this? ideally this could also serve as an example in the exapmle(s) folder and a clarification/mention in the docs - so we are super clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RosCraddock thanks this is very helpful! The crux of the limitation is that we don't allow for different metafounders. I suggest we lift that limitation, and then I think the problem will go away? To show this strictness in another way, we would also not allow for half-founders having one metafounder, because the metafounder by definition differs with the known parent.

I appreciate that have now hijacked this issue on editing the error message - but I have only now realised that we are likely too strict with enforcing that metafounders must match. So, let's just try to fix this as part of this PR. Will rename it though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregorgorjanc Yes, I agree. I can see how it is too restrictive, particularly in the context of Jana's data and questions.

The code will need some review because with AlphaPeel, we do not treat metafounders the same way we treat individuals. Will have a look and outline a plan tomorrow or next week.

Copy link
Author

@RosCraddock RosCraddock Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To allow for different metafounders is a fairly easy change:

  • collect both metafounders in the ind.metafounder through ind.metafounder = [sireID, damID] in tinyhouse.pedigree.py
  • Adjust args in alphapeel.tinypeel.py and alphapeel.peelingupdates.py for this change. Assume that for founders with different metafounders, the anterior term is calculated through HWE, using the mean allele frequency of both metafounders.

For half-metafounders, it's relevant to give some context. Ignoring metafounders, AlphaPeel handles any half-founder by allocating a dummy individual labelled as "mother of .." or "father of ..". For consistency, we could handle any half-metafounders in the same way: (1) allocate a dummy individual, (2) then assign the metafounder to the dummy individual. This is essentially what AlphaPeel currently does, except we require the user to do this. I can add code so this is completed internally instead. However, this may introduce many dummy individuals. Instead, we could allocate the same dummy individual for each metafounder (rather than a unique dummy individual for each half-founder). However, is it preferable to support half-metafounders without dummy individuals? If so, this is a more complex modification. @gregorgorjanc

Copy link
Author

@RosCraddock RosCraddock Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented code to support including multiple metafounders for the same individual and, as part of that, conducted some quick testing. One of the key changes in the code was in the -update_allele_prob function, which uses the mean of the respective founders to estimate the alternative allele frequency for each metafounder. For individuals with different metafounders as sires and dams, I used this code:

MF = list(pedigree.AAP.keys())
    for mfx in MF:
        AAP = pedigree.AAP[mfx]
        sumOfGenotypes = np.full((4, peelingInfo.nLoci), 0, dtype=np.float32)
        n = 0
        for ind in pedigree:
            if mfx in ind.MetaFounder and ind.isFounder():
                ind_genotype = peelingInfo.getGenoProbs(ind.idn)
                if len(ind.MetaFounder) == 2:
                    sumOfGenotypes += ind_genotype / len(ind.MetaFounder)
                    n += 1 / len(ind.MetaFounder)
                else:
                    sumOfGenotypes += ind_genotype
                    n += 1
        for i in range(peelingInfo.nLoci):
            marker_Aa = sumOfGenotypes[1, i]
            marker_aA = sumOfGenotypes[2, i]
            marker_AA = sumOfGenotypes[3, i]
            AAP[i] = (0.5 * (marker_Aa + marker_aA) + marker_AA) / n
            if AAP[i] < 0.01:
                AAP[i] = 0.01
            elif AAP[i] > 0.99:
                AAP[i] = 0.99
        pedigree.AAP[mfx] = AAP.astype(np.float32)
    for ind in pedigree:
        if ind.MetaFounder is not None and ind.isFounder():
            AAP = np.zeros(peelingInfo.nLoci, dtype=np.float32)
            for mfx in ind.MetaFounder:
                AAP += pedigree.AAP[mfx]
            if len(ind.MetaFounder) > 1:
                AAP /= len(ind.MetaFounder)
            mafGeno = ProbMath.getGenotypesFromMaf(AAP)
            peelingInfo.anterior[ind.idn, :, :] = mafGeno

For testing, I compared this new approach with using dummy individuals, so the ped files look like this, where D0 is the multiple metafounder individual.

Without dummy individuals

id fid mid
M0 MF_1 MF_1
F0 MF_2 MF_2
D0 MF_1 MF_2

With dummy individuals (for D0)

id fid mid
M0 MF_1 MF_1
F0 MF_2 MF_2
D0 FD0 MD0
FD0 MF_1 MF_1
MD0 MF_2 MF_2

When using homozygous geno inputs (see below), whether using dummy individuals or not, we get identical results in the alternative allele frequency estimates for the two metafounders.

geno_file (all homozygous)

id dosage
M0 0 0 0 0 0
F0 2 2 2 2 2
D0 2 2 2 2 2

The resulting alt_allele_prob

MF_1 MF_2
0.33 0.99
0.33 0.99
0.33 0.99
0.33 0.99
0.33 0.99

When D0 is heterozygous at each of the 5 loci, we get different results using dummy individuals compared to without.

geno_file inputted

id dosage
M0 0 0 0 0 0
F0 2 2 2 2 2
D0 1 1 1 1 1

alt_allele_prob without dummy individuals

MF_1 MF_2
0.17 0.83
0.17 0.83
0.17 0.83
0.17 0.83
0.17 0.83

alt_allele_prob with dummy individuals

MF_1 MF_2
0.01 0.99
0.01 0.99
0.33 0.67
0.01 0.99
0.01 0.99

The results for alt_allele_prob without dummy individuals are logical. When we input the allele dosage, there is an equal chance that the alternative allele came from MF_1 or MF_2 for individual D0. This will increase the MF_1 estimate and reduce the MF_2 estimate compared to just including individuals M0 or F0 only.

The result for alt_allele_prob with dummy individuals is odd. The 0.99 and 0.01 values in MF_1 and MF_2, respectively, although possible, cannot be known (i.e as certain) from the little information provided. However, the different results at the third locus (0.33 MF_1 and 0.67 MF_2) are more plausible. I do not understand why a different result was estimated at this locus only.

Referring to the phased genotype probabilities of D0 only, there are some strange results.
My expectation for the phased genotype probabilities of D0 is:

phase probs
aa 0 0 0 0 0
aA 0.5 0.5 0.5 0.5 0.5
Aa 0.5 0.5 0.5 0.5 0.5
AA 0 0 0 0 0

Phased probs without dummy individuals

phase probs
aa 0.0000 0.0000 0.0000 0.0000 0.0000
aA 0.5000 0.5000 0.0000 0.5000 0.5000
Aa 0.5000 0.5000 0.9999 0.5000 0.5000
AA 0.0000 0.0000 0.0000 0.0000 0.0000

Phased probs with dummy individuals

phase probs
aa 0.0000 0.0000 0.0001 0.0000 0.0000
aA 0.9999 0.9999 0.0001 0.9999 0.9999
Aa 0.0001 0.0001 0.9997 0.0001 0.0001
AA 0.0000 0.0000 0.0001 0.0000 0.0000

So it seems something is up with the phasing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, issue solved. It is not an error, but a relevant note. Where multiple loci are heterozygous in genotyped founders, the default is to phase the middle locus so the alternative allele is paternal, but keep all other loci the same (50:50). This can be suppressed with the flag -no_phase_founder. For individual D0, this gives the phased probs:

Without dummy individuals

phase prob
aa 0.0000 0.0000 0.0000 0.0000 0.0000
aA 0.5000 0.5000 0.5000 0.5000 0.5000
Aa 0.5000 0.5000 0.5000 0.5000 0.5000
AA 0.0000 0.0000 0.0000 0.0000 0.0000

With dummy individuals

phase prob
aa 0.0000 0.0000 0.0000 0.0000 0.0000
aA 0.9999 0.9999 0.9999 0.9999 0.9999
Aa 0.0001 0.0001 0.0001 0.0001 0.0001
AA 0.0000 0.0000 0.0000 0.0000 0.0000

The phasing without dummy individuals ignores the additional information from the metafounders, where it is more likely that the alternative allele is maternal, as MF_2 has the highest alternative allele frequency. This is correct to ignore where individual D0 is known to come from MF_1 and MF_2, but not necessarily know which one represents the dam or sire. However, if it is known that individual D0's sire was MF_1 and its dam was MF_2, it would be valuable to include this additional information in the phased probabilities. When using dummy individuals, this is accounted for and can explain the difference in the estimates.

To summarise, when considering multiple metafounder individuals, there are two possible scenarios:

  1. It is not known which metafounders represent the dam or the sire.
  2. It is known which metafounders represent the dam or the sire.

I can modify how the anterior probability is updated so scenario (2) can be accounted for without the need for dummy individuals, but how would we distinguish between the two scenarios?

Will come back to this later.

@gregorgorjanc
Copy link
Member

All looks great except one condition, which likely should not be in there - we should be able to support different metafounders. Let's address this as part of this PR so we don't make the process to complicated. Can you suggest edits to this effect? We might need more/different tests for this.

@gregorgorjanc gregorgorjanc changed the title Updating error messages Updating error messages and metafounder rules Dec 4, 2025
Copy link
Contributor

@XingerTang XingerTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me! Just a reminder: Don't forget to remove the "# expect more informative error message" comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants