Skip to content

Constraints violation when loading a merged VCF #55

@lbeltrame

Description

@lbeltrame

I'm hitting this repeatedly when trying to load VCFs that have been merged via bcftools merge then normalized and decomposed via vt and then annotated via vcfanno. Note that each of these had been annotated with VEP prior to merging.

It's absolutely unclear what causes it.

Traceback (most recent call last):
  File "/opt/bcbio/anaconda/bin/vcf2db.py", line 924, in <module>
    impacts_extras=a.impacts_field, aok=a.a_ok)
  File "/opt/bcbio/anaconda/bin/vcf2db.py", line 234, in __init__
    self.load()
  File "/opt/bcbio/anaconda/bin/vcf2db.py", line 319, in load
    i = self._load(self.cache, create=True, start=1)
  File "/opt/bcbio/anaconda/bin/vcf2db.py", line 312, in _load
    self.insert(variants, expanded, keys, i, create=create)
  File "/opt/bcbio/anaconda/bin/vcf2db.py", line 374, in insert
    vilengths, variant_impacts)
  File "/opt/bcbio/anaconda/bin/vcf2db.py", line 402, in _insert
    self.__insert(v_objs, self.metadata.tables['variants'].insert())
  File "/opt/bcbio/anaconda/bin/vcf2db.py", line 444, in __insert
    trans.execute(stmt, o)
  File "/opt/bcbio/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/opt/bcbio/anaconda/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/bcbio/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/opt/bcbio/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/opt/bcbio/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/opt/bcbio/anaconda/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/opt/bcbio/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/opt/bcbio/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) CHECK constraint failed: variants [SQL: u'INSERT INTO variants (variant_id, chrom, start, "end", vcf_id, ref, alt, qual, filter, type, sub_type, call_rate, num_hom_ref, num_het, num_hom_alt, num_unknown, aaf, gene, ensembl_gene_id, transcript, is_exonic, is_coding, is_lof, is_splicing, is_canonical, exon, codon_change, aa_change, aa_length, biotype, impact, impact_so, impact_severity, polyphen_pred, polyphen_score, sift_pred, sift_score, aa, af, altaineandertal, ancestral_allele, cadd_phred, cadd_raw, cadd_raw_rankscore, caf, cds, clnsig, cnt, dann_rankscore, dann_score, db, dp, denisova, eigen_pc_phred, eigen_pc_raw, eigen_pc_raw_rankscore, eigen_phred, eigen_raw, eigen_coding_or_noncoding, "ensembl?", ensembl_functional_consequence, ensembl_gene, ensembl_geneid, ensembl_id_c_change_p_change, ensembl_proteinid, ensembl_region, ensembl_transcriptid, fathmm_converted_rankscore, fathmm_pred, fathmm_score, geneinfo, "gerp++_nr", "gerp++_rs", "gerp++_rs_rankscore", gm12878_confidence_value, gm12878_fitcons_score, gm12878_fitcons_score_rankscore, gtex_v6p_gene, gtex_v6p_tissue, genocanyon_score, genocanyon_score_rankscore, h1_hesc_confidence_value, h1_hesc_fitcons_score, h1_hesc_fitcons_score_rankscore, huvec_confidence_value, huvec_fitcons_score, huvec_fitcons_score_rankscore, interpro_domain, lof, lrt_omega, lrt_converted_rankscore, lrt_pred, lrt_score, lseq, m_cap_pred, m_cap_rankscore, m_cap_score, msi, msilen, metalr_pred, metalr_rankscore, metalr_score, metasvm_pred, metasvm_rankscore, metasvm_score, mutpred_aachange, mutpred_top5features, mutpred_protid, mutpred_rankscore, mutpred_score, mutationassessor_uniprotid, mutationassessor_pred, mutationassessor_score, mutationassessor_score_rankscore, mutationassessor_variant, mutationtaster_aae, mutationtaster_converted_rankscore, mutationtaster_model, mutationtaster_pred, mutationtaster_score, nmd, provean_converted_rankscore, provean_pred, provean_score, revel_rankscore, revel_score, rseq, "refseq?", refseq_functional_consequence, refseq_gene, refseq_id_c_change_p_change, refseq_region, reliability_index, sample, shift3, somatic, sor, ssf, status, siphy_29way_logodds, siphy_29way_logodds_rankscore, siphy_29way_pi, transcript_id_vest3, transcript_var_vest3, vd, vest3_rankscore, vest3_score, ac_adj_exac_afr, ac_adj_exac_amr, ac_adj_exac_eas, ac_adj_exac_fin, ac_adj_exac_nfe, ac_adj_exac_oth, ac_adj_exac_sas, ac_exac_all, ada_score, af_1kg_afr, af_1kg_all, af_1kg_amr, af_1kg_eas, af_1kg_eur, af_1kg_sas, af_adj_exac_afr, af_adj_exac_amr, af_adj_exac_eas, af_adj_exac_fin, af_adj_exac_nfe, af_adj_exac_oth, af_adj_exac_sas, af_esp_aa, af_esp_all, af_esp_ea, af_exac_all, an_adj_exac_afr, an_adj_exac_amr, an_adj_exac_eas, an_adj_exac_fin, an_adj_exac_nfe, an_adj_exac_oth, an_adj_exac_sas, an_exac_all, cds_strand, clinvar_disease_name, clinvar_pathogenic, clinvar_sig, codon_degeneracy, codonpos, common_pathogenic, cosmic_ids, cpg_island, cse_hiseq, dgv, encode_consensus_gm12878, encode_consensus_h1hesc, encode_consensus_helas3, encode_consensus_hepg2, encode_consensus_huvec, encode_consensus_k562, fathmm_mkl_coding_group, fathmm_mkl_coding_pred, fathmm_mkl_coding_rankscore, fathmm_mkl_coding_score, fitcons, gerp_elements, gnomad_ac, gnomad_ac_amr, gnomad_ac_asj, gnomad_ac_eas, gnomad_ac_fin, gnomad_ac_nfe, gnomad_ac_oth, gnomad_ac_popmax, gnomad_ac_sas, gnomad_af, gnomad_af_afr, gnomad_af_amr, gnomad_af_asj, gnomad_af_eas, gnomad_af_fin, gnomad_af_nfe, gnomad_af_oth, gnomad_af_popmax, gnomad_af_sas, gnomad_an, gnomad_an_amr, gnomad_an_asj, gnomad_an_eas, gnomad_an_fin, gnomad_an_nfe, gnomad_an_oth, gnomad_an_popmax, gnomad_an_sas, gnomad_gc, gnomad_gc_female, gnomad_gc_male, gnomad_hom, gnomad_hom_female, gnomad_hom_male, gnomad_popmax, gnomad_exomes_ac, gnomad_exomes_af, gnomad_exomes_an, gnomad_genomes_ac, gnomad_genomes_af, gnomad_genomes_an, hapmap1, hapmap2, integrated_confidence_value, integrated_fitcons_score, integrated_fitcons_score_rankscore, max_aaf_all, num_exac_het, num_exac_hom, phastcons100way_vertebrate, phastcons100way_vertebrate_rankscore, phastcons20way_mammalian, phastcons20way_mammalian_rankscore, phylop100way_vertebrate, phylop100way_vertebrate_rankscore, phylop20way_mammalian, phylop20way_mammalian_rankscore, refcodon, rf_score, rmsk, rs_ids, stam_mean, stam_names, tfbs, allele, feature_type, intron, hgvsc, hgvsp, cdna_position, cds_position, existing_variation, allele_num, distance, strand, flags, variant_class, symbol_source, hgnc_id, tsl, appris, ccds, ensp, swissprot, trembl, uniparc, refseq_match, source, given_ref, used_ref, bam_edit, gene_pheno, domains, hgvs_offset, afr_af, amr_af, eas_af, eur_af, sas_af, aa_af, ea_af, gnomad_afr_af, gnomad_amr_af, gnomad_asj_af, gnomad_eas_af, gnomad_fin_af, gnomad_nfe_af, gnomad_oth_af, gnomad_sas_af, max_af, max_af_pops, clin_sig, pheno, pubmed, motif_name, motif_pos, high_inf_pos, motif_score_change, lof_filter, lof_flags, lof_info, gts, gt_types, gt_phases, gt_depths, gt_ref_depths, gt_alt_depths, gt_quals, gt_alt_freqs) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, u'chr1', 11166712, 11166713, u'rs2536', u'T', u'C', 266.0, u'REJECT', 'snp', 'ts', 0.07317073170731707, 1, 11, 0, 152, 0.4583333333333333, u'MTOR', None, u'ENST00000361445', 1, 0, 0, 0, 1, u'58/58', '', '', u'', u'protein_coding', u'3_prime_UTR_variant', u'3_prime_UTR_variant', 'LOW', u'', None, u'', None, None, u'0.0970', None, None, None, None, None, u'0.903,0.09704', None, 'None', None, None, None, 1, 11178, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, u'', None, None, None, None, u'TATTTGTTCTGCTCATAATT', None, None, None, 2.0, 1.0, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'None', None, None, None, None, None, u'CCAATATGTACCAGACCTTC', None, None, None, None, None, None, u'mito_mango_63', 0, u'', inf, 0.0, u'StrongSomatic', None, None, None, None, None, 326, None, None, None, None, None, None, None, None, None, None, None, 0.1316000074148178, 0.09700000286102295, 0.07199999690055847, 0.08529999852180481, 0.024900000542402267, 0.15440000593662262, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, None, None, None, None, None, None, 0, None, 0, 0, None, u'T', u'T', u'T', u'T', u'T', u'T', None, None, None, None, 0.29510000348091125, None, 1797, 63, 20, 151, 213, 359, 54, 937, None, -1.0, 0.10740000009536743, 0.07519999891519547, 0.06620000302791595, 0.093299999833107, 0.061000000685453415, 0.023900000378489494, 0.054999999701976776, 0.10740000009536743, -1.0, 30974, 838, 302, 1618, 3494, 15018, 982, 8722, None, 15487.0, 6928.0, 8559.0, None, None, None, None, None, -1.0, None, None, -1.0, None, 0.07020000368356705, 23.648700714111328, None, None, None, 0.15440000593662262, None, None, None, None, None, None, None, None, None, None, None, None, None, u'rs2536', None, None, None, u'C', u'Transcript', u'', u'ENST00000361445.4:c.*829A>G', u'', u'8556/8677', u'', u'rs2536', u'1', u'', u'-1', u'', u'SNV', u'HGNC', u'3942', u'', u'', u'CCDS127.1', u'ENSP00000354558', u'P42345', u'Q96QW8&B1AKQ2&B1AKP8', u'UPI000012ABD3', u'', u'Ensembl', u'T', u'T', u'', u'1', u'', u'', u'0.1316', u'0.072', u'0.0853', u'0.0249', u'0.1544', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'0.1544', u'SAS', u'', u'', u'21973240&22815832&23209702&23423739&23524405&24816861&27462867&27533457&28280736&15720714&18545701', u'', u'', u'', u'', u'', u'', u'', <read-only buffer for 0x7fa9a86b2b28, size -1, offset 0 at 0x7fa99f6445f0>, <read-only buffer for 0x7fa9a86a3738, size -1, offset 0 at 0x7fa99f644630>, <read-only buffer for 0x7fa9a86c4ed8, size -1, offset 0 at 0x7fa99f644670>, <read-only buffer for 0x7fa9a86cb330, size -1, offset 0 at 0x7fa99f6446b0>, <read-only buffer for 0x7fa9a86a37b0, size -1, offset 0 at 0x7fa99f6446f0>, <read-only buffer for 0x7fa9a86a3828, size -1, offset 0 at 0x7fa99f644730>, <read-only buffer for 0x7fa9aebf42b0, size -1, offset 0 at 0x7fa99f644770>, <read-only buffer for 0x7fa9a98afb28, size -1, offset 0 at 0x7fa99f6447b0>)] (Background on this error at: http://sqlalche.me/e/gkpj)

The only other thing of note was that the PED was "generated" (I don't need these information) like this:

bcftools query -l data.annot.vcf.gz | awk 'BEGIN{FS="\t"}{print NR,$1,0,0,2,-9 }' > fake.ped

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions