Building on #107, consolidate several issues (e.g., `duplicate_rsid`, `discrepant_XY`) into one dataframe with the following columns / dtypes:

| Column | pandas dtype |
| --- | --- |
| `rsid` | `pd.StringDtype()` |
| `chrom` | `pd.CategoricalDtype()` |
| `pos` | `pd.UInt32Dtype()` |
| `genotype` | `pd.CategoricalDtype()` |
| `duplicate_rsid` | `pd.BooleanDtype()` |
| `discrepant_loci` | `pd.BooleanDtype()` |
| `discrepant_XY` | `pd.BooleanDtype()` |
| `heterozygous_MT` | `pd.BooleanDtype()` |
| `discrepant_vcf_position` | `pd.BooleanDtype()` |
| `discrepant_merge_position` | `pd.BooleanDtype()` |
| `discrepant_merge_genotype` | `pd.BooleanDtype()` |

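
As a rough sketch of the schema, the columns and dtypes listed above could be captured in a single mapping and used to build an empty issues dataframe. The names `ISSUE_COLUMNS`, `ISSUES_DTYPES`, and `empty_issues_frame` are hypothetical and only for illustration; nothing here is existing library API.

```python
import pandas as pd

# Issue flag columns from the proposal; each one is a nullable boolean.
ISSUE_COLUMNS = [
    "duplicate_rsid",
    "discrepant_loci",
    "discrepant_XY",
    "heterozygous_MT",
    "discrepant_vcf_position",
    "discrepant_merge_position",
    "discrepant_merge_genotype",
]

# Hypothetical name: column -> pandas dtype, in the proposed column order.
ISSUES_DTYPES = {
    "rsid": pd.StringDtype(),
    "chrom": pd.CategoricalDtype(),
    "pos": pd.UInt32Dtype(),
    "genotype": pd.CategoricalDtype(),
    **{col: pd.BooleanDtype() for col in ISSUE_COLUMNS},
}


def empty_issues_frame() -> pd.DataFrame:
    """Return an empty dataframe with the proposed columns and dtypes."""
    return pd.DataFrame(
        {col: pd.Series([], dtype=dtype) for col, dtype in ISSUES_DTYPES.items()}
    )
```

Keeping the dtypes in one mapping would make it easy to check that any dataframe of issues produced elsewhere matches the proposed schema.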
More than one issue column could be `True` for the same row, and SNPs with a given issue (e.g., `discrepant_XY`) could be retrieved by filtering the issues dataframe on that column.
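
The filtering could look something like the following minimal sketch. The helper `snps_with_issue` and the sample rows are made up for illustration, and treating an unset flag (`pd.NA`) as "no issue" is an assumption, not something the proposal specifies.

```python
import pandas as pd

# Made-up sample rows; only two of the issue columns are shown.
issues = pd.DataFrame(
    {
        "rsid": pd.array(["rs1", "rs2", "rs3"], dtype=pd.StringDtype()),
        "discrepant_XY": pd.array([True, False, pd.NA], dtype=pd.BooleanDtype()),
        "heterozygous_MT": pd.array([True, True, False], dtype=pd.BooleanDtype()),
    }
)


def snps_with_issue(issues: pd.DataFrame, issue: str) -> pd.DataFrame:
    """Return the rows flagged with the given issue column (hypothetical helper)."""
    # Assumption: a missing flag (pd.NA) is treated as "no issue".
    mask = issues[issue].fillna(False)
    return issues[mask]


discrepant_xy = snps_with_issue(issues, "discrepant_XY")

# Rows flagged with both issues at once, by combining the two masks.
both = issues[
    issues["discrepant_XY"].fillna(False) & issues["heterozygous_MT"].fillna(False)
]
```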
rsids could appear more than once in this dataframe. However, if an `rsid` has two or more rows that are equivalent (same values for `chrom`, `pos`, and `genotype`), their issues should be consolidated into one row, with the issue columns flagging the issue(s).
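
One possible way to do that consolidation is to group on the identifying columns and OR the flags together, as in the sketch below. The function `consolidate_issues`, the constant `KEY_COLUMNS`, and the sample rows are hypothetical, and counting a missing flag as `False` is an assumption.

```python
import pandas as pd

# Rows are considered equivalent when all of these columns match.
KEY_COLUMNS = ["rsid", "chrom", "pos", "genotype"]


def consolidate_issues(issues: pd.DataFrame) -> pd.DataFrame:
    """Merge equivalent rows, flagging an issue if any merged row had it."""
    flag_columns = [c for c in issues.columns if c not in KEY_COLUMNS]
    # Assumption: pd.NA counts as False; plain bools keep the aggregation simple.
    work = issues.assign(
        **{c: issues[c].fillna(False).astype(bool) for c in flag_columns}
    )
    merged = (
        # observed=True avoids emitting unused category combinations.
        work.groupby(KEY_COLUMNS, observed=True, sort=False)[flag_columns]
        .any()
        .reset_index()
    )
    # Restore the nullable boolean dtype for the issue columns.
    return merged.astype({c: pd.BooleanDtype() for c in flag_columns})


# Made-up sample: the first two rows are equivalent and carry different flags.
issues = pd.DataFrame(
    {
        "rsid": pd.array(["rs1", "rs1", "rs1", "rs2"], dtype=pd.StringDtype()),
        "chrom": pd.Categorical(["1", "1", "1", "X"]),
        "pos": pd.array([100, 100, 200, 300], dtype=pd.UInt32Dtype()),
        "genotype": pd.Categorical(["AA", "AA", "AA", "AG"]),
        "duplicate_rsid": pd.array([True, False, True, False], dtype=pd.BooleanDtype()),
        "discrepant_loci": pd.array([False, True, False, pd.NA], dtype=pd.BooleanDtype()),
    }
)

consolidated = consolidate_issues(issues)
```

Here `rs1` still appears twice in `consolidated` because its two positions differ, while the two equivalent rows at position 100 collapse into one row with both flags set. Note that `groupby` drops rows whose key contains a missing value by default, so rows with, say, a missing `genotype` would need separate handling.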