Updating error messages and metafounder rules #27

RosCraddock · 2025-12-02T17:16:31Z

Per issue #26. Updating the error statements to be more informative for the user.

gregorgorjanc · 2025-12-02T21:02:03Z

Pedigree.py

                        else:
                            # expect more infomative error message
-                            print(f"ERROR: Invalid metafounder input format.\nExiting...")
+                            print(f"ERROR: The same metafounder is expected for both parents. For individual {idx} the metafounders were {sireID} and {damID}.\nExiting...")


@RosCraddock what if we have an F1 crossbred without pedigree and we want to give them different metafounders for the different parents?

Yes, I was thinking the same. This could still be considered with the above condition, but would require the user to create two dummy individuals, one for each parent of the F1 individual, and then assign the metafounders for the different breeds to each of the dummy individuals.

OK, this makes sense (to create two metafounders for such F1 individuals). So, we do cater for this edge case and we have tests for that?

We can handle this, but it requires the user to add dummy individuals to the pedigree and assign them the two different metafounders for the F1 cross. We do not do this within AlphaPeel, although we could do so instead of the error message. Do you think this should be handled by AlphaPeel or returned to the user via the error message?

@RosCraddock I am a bit too removed from the use cases here. Can you please share how the input looks like for this? ideally this could also serve as an example in the exapmle(s) folder and a clarification/mention in the docs - so we are super clear.

@RosCraddock thanks this is very helpful! The crux of the limitation is that we don't allow for different metafounders. I suggest we lift that limitation, and then I think the problem will go away? To show this strictness in another way, we would also not allow for half-founders having one metafounder, because the metafounder by definition differs with the known parent.

I appreciate that have now hijacked this issue on editing the error message - but I have only now realised that we are likely too strict with enforcing that metafounders must match. So, let's just try to fix this as part of this PR. Will rename it though.

@gregorgorjanc Yes, I agree. I can see how it is too restrictive, particularly in the context of Jana's data and questions.

The code will need some review because with AlphaPeel, we do not treat metafounders the same way we treat individuals. Will have a look and outline a plan tomorrow or next week.

To allow for different metafounders is a fairly easy change:

collect both metafounders in the ind.metafounder through ind.metafounder = [sireID, damID] in tinyhouse.pedigree.py

Adjust args in alphapeel.tinypeel.py and alphapeel.peelingupdates.py for this change. Assume that for founders with different metafounders, the anterior term is calculated through HWE, using the mean allele frequency of both metafounders.

For half-metafounders, it's relevant to give some context. Ignoring metafounders, AlphaPeel handles any half-founder by allocating a dummy individual labelled as "mother of .." or "father of ..". For consistency, we could handle any half-metafounders in the same way: (1) allocate a dummy individual, (2) then assign the metafounder to the dummy individual. This is essentially what AlphaPeel currently does, except we require the user to do this. I can add code so this is completed internally instead. However, this may introduce many dummy individuals. Instead, we could allocate the same dummy individual for each metafounder (rather than a unique dummy individual for each half-founder). However, is it preferable to support half-metafounders without dummy individuals? If so, this is a more complex modification. @gregorgorjanc

I have implemented code to support including multiple metafounders for the same individual and, as part of that, conducted some quick testing. One of the key changes in the code was in the -update_allele_prob function, which uses the mean of the respective founders to estimate the alternative allele frequency for each metafounder. For individuals with different metafounders as sires and dams, I used this code:

MF = list(pedigree.AAP.keys()) for mfx in MF: AAP = pedigree.AAP[mfx] sumOfGenotypes = np.full((4, peelingInfo.nLoci), 0, dtype=np.float32) n = 0 for ind in pedigree: if mfx in ind.MetaFounder and ind.isFounder(): ind_genotype = peelingInfo.getGenoProbs(ind.idn) if len(ind.MetaFounder) == 2: sumOfGenotypes += ind_genotype / len(ind.MetaFounder) n += 1 / len(ind.MetaFounder) else: sumOfGenotypes += ind_genotype n += 1 for i in range(peelingInfo.nLoci): marker_Aa = sumOfGenotypes[1, i] marker_aA = sumOfGenotypes[2, i] marker_AA = sumOfGenotypes[3, i] AAP[i] = (0.5 * (marker_Aa + marker_aA) + marker_AA) / n if AAP[i] < 0.01: AAP[i] = 0.01 elif AAP[i] > 0.99: AAP[i] = 0.99 pedigree.AAP[mfx] = AAP.astype(np.float32) for ind in pedigree: if ind.MetaFounder is not None and ind.isFounder(): AAP = np.zeros(peelingInfo.nLoci, dtype=np.float32) for mfx in ind.MetaFounder: AAP += pedigree.AAP[mfx] if len(ind.MetaFounder) > 1: AAP /= len(ind.MetaFounder) mafGeno = ProbMath.getGenotypesFromMaf(AAP) peelingInfo.anterior[ind.idn, :, :] = mafGeno

For testing, I compared this new approach with using dummy individuals, so the ped files look like this, where D0 is the multiple metafounder individual.

Without dummy individuals

id fid mid

M0 MF_1 MF_1

F0 MF_2 MF_2

D0 MF_1 MF_2

With dummy individuals (for D0)

id fid mid

M0 MF_1 MF_1

F0 MF_2 MF_2

D0 FD0 MD0

FD0 MF_1 MF_1

MD0 MF_2 MF_2

When using homozygous geno inputs (see below), whether using dummy individuals or not, we get identical results in the alternative allele frequency estimates for the two metafounders.

geno_file (all homozygous)

id dosage

M0 0 0 0 0 0

F0 2 2 2 2 2

D0 2 2 2 2 2

The resulting alt_allele_prob

MF_1 MF_2

0.33 0.99

0.33 0.99

0.33 0.99

0.33 0.99

0.33 0.99

When D0 is heterozygous at each of the 5 loci, we get different results using dummy individuals compared to without.

geno_file inputted

id dosage

M0 0 0 0 0 0

F0 2 2 2 2 2

D0 1 1 1 1 1

alt_allele_prob without dummy individuals

MF_1 MF_2

0.17 0.83

0.17 0.83

0.17 0.83

0.17 0.83

0.17 0.83

alt_allele_prob with dummy individuals

MF_1 MF_2

0.01 0.99

0.01 0.99

0.33 0.67

0.01 0.99

0.01 0.99

The results for alt_allele_prob without dummy individuals are logical. When we input the allele dosage, there is an equal chance that the alternative allele came from MF_1 or MF_2 for individual D0. This will increase the MF_1 estimate and reduce the MF_2 estimate compared to just including individuals M0 or F0 only.

The result for alt_allele_prob with dummy individuals is odd. The 0.99 and 0.01 values in MF_1 and MF_2, respectively, although possible, cannot be known (i.e as certain) from the little information provided. However, the different results at the third locus (0.33 MF_1 and 0.67 MF_2) are more plausible. I do not understand why a different result was estimated at this locus only.

Referring to the phased genotype probabilities of D0 only, there are some strange results.
My expectation for the phased genotype probabilities of D0 is:

phase probs

aa 0 0 0 0 0

aA 0.5 0.5 0.5 0.5 0.5

Aa 0.5 0.5 0.5 0.5 0.5

AA 0 0 0 0 0

Phased probs without dummy individuals

phase probs

aa 0.0000 0.0000 0.0000 0.0000 0.0000

aA 0.5000 0.5000 0.0000 0.5000 0.5000

Aa 0.5000 0.5000 0.9999 0.5000 0.5000

AA 0.0000 0.0000 0.0000 0.0000 0.0000

Phased probs with dummy individuals

phase probs

aa 0.0000 0.0000 0.0001 0.0000 0.0000

aA 0.9999 0.9999 0.0001 0.9999 0.9999

Aa 0.0001 0.0001 0.9997 0.0001 0.0001

AA 0.0000 0.0000 0.0001 0.0000 0.0000

So it seems something is up with the phasing?

Ok, issue solved. It is not an error, but a relevant note. Where multiple loci are heterozygous in genotyped founders, the default is to phase the middle locus so the alternative allele is paternal, but keep all other loci the same (50:50). This can be suppressed with the flag -no_phase_founder. For individual D0, this gives the phased probs:

Without dummy individuals

phase prob

aa 0.0000 0.0000 0.0000 0.0000 0.0000

aA 0.5000 0.5000 0.5000 0.5000 0.5000

Aa 0.5000 0.5000 0.5000 0.5000 0.5000

AA 0.0000 0.0000 0.0000 0.0000 0.0000

With dummy individuals

phase prob

aa 0.0000 0.0000 0.0000 0.0000 0.0000

aA 0.9999 0.9999 0.9999 0.9999 0.9999

Aa 0.0001 0.0001 0.0001 0.0001 0.0001

AA 0.0000 0.0000 0.0000 0.0000 0.0000

The phasing without dummy individuals ignores the additional information from the metafounders, where it is more likely that the alternative allele is maternal, as MF_2 has the highest alternative allele frequency. This is correct to ignore where individual D0 is known to come from MF_1 and MF_2, but not necessarily know which one represents the dam or sire. However, if it is known that individual D0's sire was MF_1 and its dam was MF_2, it would be valuable to include this additional information in the phased probabilities. When using dummy individuals, this is accounted for and can explain the difference in the estimates.

To summarise, when considering multiple metafounder individuals, there are two possible scenarios:

It is not known which metafounders represent the dam or the sire.

It is known which metafounders represent the dam or the sire.

I can modify how the anterior probability is updated so scenario (2) can be accounted for without the need for dummy individuals, but how would we distinguish between the two scenarios?

Will come back to this later.

gregorgorjanc · 2025-12-02T21:04:35Z

All looks great except one condition, which likely should not be in there - we should be able to support different metafounders. Let's address this as part of this PR so we don't make the process to complicated. Can you suggest edits to this effect? We might need more/different tests for this.

XingerTang

All looks good to me! Just a reminder: Don't forget to remove the "# expect more informative error message" comments

Updating errors

83a27f4

gregorgorjanc reviewed Dec 2, 2025

View reviewed changes

gregorgorjanc changed the title ~~Updating error messages~~ Updating error messages and metafounder rules Dec 4, 2025

XingerTang reviewed Dec 5, 2025

View reviewed changes

phase	probs
aa	0 0 0 0 0
aA	0.5 0.5 0.5 0.5 0.5
Aa	0.5 0.5 0.5 0.5 0.5
AA	0 0 0 0 0

phase	probs
aa	0.0000 0.0000 0.0000 0.0000 0.0000
aA	0.5000 0.5000 0.0000 0.5000 0.5000
Aa	0.5000 0.5000 0.9999 0.5000 0.5000
AA	0.0000 0.0000 0.0000 0.0000 0.0000

phase	probs
aa	0.0000 0.0000 0.0001 0.0000 0.0000
aA	0.9999 0.9999 0.0001 0.9999 0.9999
Aa	0.0001 0.0001 0.9997 0.0001 0.0001
AA	0.0000 0.0000 0.0001 0.0000 0.0000

phase	prob
aa	0.0000 0.0000 0.0000 0.0000 0.0000
aA	0.5000 0.5000 0.5000 0.5000 0.5000
Aa	0.5000 0.5000 0.5000 0.5000 0.5000
AA	0.0000 0.0000 0.0000 0.0000 0.0000

phase	prob
aa	0.0000 0.0000 0.0000 0.0000 0.0000
aA	0.9999 0.9999 0.9999 0.9999 0.9999
Aa	0.0001 0.0001 0.0001 0.0001 0.0001
AA	0.0000 0.0000 0.0000 0.0000 0.0000

Updating error messages and metafounder rules #27

Are you sure you want to change the base?

Updating error messages and metafounder rules #27

Uh oh!

Conversation

RosCraddock commented Dec 2, 2025

Uh oh!

gregorgorjanc Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

RosCraddock Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

gregorgorjanc Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RosCraddock Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

gregorgorjanc Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

gregorgorjanc Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

RosCraddock Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

RosCraddock Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RosCraddock Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RosCraddock Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

gregorgorjanc commented Dec 2, 2025

Uh oh!

XingerTang left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gregorgorjanc Dec 3, 2025 •

edited

Loading

RosCraddock Dec 11, 2025 •

edited

Loading

RosCraddock Dec 15, 2025 •

edited

Loading