Hi there, congrats on this tool, it's impressive :D!
I recently started using NPBDetect to analyze a set of bacterial genomes (analyzed with DeepBGC). While running the prediction module, I encountered two blocking errors related to missing qualifiers in the GenBank files. I know the tool is intended to work with antiSMASH output files, so this may be expected.
I managed to patch the code to get it running, but I would like to double-check with you whether my "fixes" might be negatively impacting the prediction accuracy.
1. The Issues Encountered
Error A: Missing Translations
When running predict, the script crashed with a KeyError: 'translation' in bin/gbk_to_fa.py. It seems some CDS features in my input files do not have a translation qualifier.
Traceback:
  File ".../NPBDetect/bin/gbk_to_fa.py", line 20, in extract_proteins
    mseq = SeqRecord(Seq(P.qualifiers['translation'][0]), id = pname, name= '',
KeyError: 'translation'
Error B: Missing PFAM Scores
After fixing the first error, I hit a second one: KeyError: 'score' in bin/PFAM_feats.py. It appears some annotated PFAM_domain features lack a score qualifier.
Traceback:
  File ".../NPBDetect/bin/PFAM_feats.py", line 30, in get_PFAM_domains
    score = float(F.qualifiers["score"][0])
KeyError: 'score'
2. The Workaround I Applied
To bypass these errors and allow the pipeline to finish, I modified the scripts to simply skip any feature that lacks these specific tags, rather than crashing.
Modification in bin/gbk_to_fa.py: I added a check to skip CDS features without translations:
# In extract_proteins function
for P in features:
    if P.type == 'CDS':
        if 'translation' not in P.qualifiers: continue # <--- Added this line
        # ... rest of the code
Modification in bin/PFAM_feats.py: I added a check to skip PFAM domains without scores:
# In get_PFAM_domains function
for F in feat:
    if(F.type == 'PFAM_domain'):
        if 'score' not in F.qualifiers: continue # <--- Added this line
        score = float(F.qualifiers["score"][0])
        # ... rest of the code
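For the missing translations specifically, an alternative to skipping that I have not tested with NPBDetect would be to derive the protein sequence from the nucleotide sequence and the CDS coordinates. The sketch below is a plain-Python illustration of that idea, not code from the tool: `translate_cds` and its arguments are hypothetical names, the codon table is the standard one (which bacterial table 11 shares for amino-acid assignments), and treating only ATG/GTG/TTG as start codons is a simplifying assumption.

```python
from itertools import product

# Standard genetic code in NCBI order (first base varies slowest, bases T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: only the most common bacterial start codons are read as Met.
STARTS = {"ATG", "GTG", "TTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_cds(dna, start, end, complement=False):
    """Translate the 1-based inclusive range [start, end] of `dna`.

    `complement=True` corresponds to a GenBank `complement(start..end)` location.
    Translation stops at the first stop codon; unknown codons become 'X'.
    """
    sub = dna[start - 1:end].upper()
    if complement:
        sub = sub.translate(COMPLEMENT)[::-1]  # reverse complement
    protein = []
    for i in range(0, len(sub) - len(sub) % 3, 3):
        codon = sub[i:i + 3]
        if i == 0 and codon in STARTS:
            aa = "M"  # alternative start codons still encode Met at position 1
        else:
            aa = CODONS.get(codon, "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequence (not taken from my genomes): a GTG-initiated ORF at positions 3..14.
forward = translate_cds("CCGTGAAGGCCTGAGG", 3, 14)
reverse = translate_cds("TCAGGCCTTCAC", 1, 12, complement=True)
print(forward, reverse)
```

With Biopython already loaded in gbk_to_fa.py, the same thing could presumably be done with `P.extract(record.seq).translate(table=11, to_stop=True)`, where `record` stands for the parent SeqRecord; whether proteins produced this way match what the model saw during training is exactly what I am unsure about.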
3. My Question
With these modifications, I was able to successfully run the tool on my 31 genomes and obtained probabilities ranging from 0.68 to 0.70 for cytotoxic activity.
However, I am concerned about the validity of these results. Does skipping these specific features (incomplete CDS or PFAM entries) significantly compromise the model's feature vectors?
Does the model treat missing data as "neutral," or does the absence of these specific features (due to skipping) artificially lower or skew the probability scores?
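To put a number on how much my patches actually drop, something like the following count could be run over each record. This is only a sketch: `count_skipped` and `REQUIRED` are hypothetical names of mine, and the demo objects are stand-ins that mimic the `.type` and `.qualifiers` attributes of Biopython features, with qualifiers copied from the first record pasted below. In the excerpts I pasted, no CDS carries a translation qualifier and no PFAM_domain carries a score qualifier, so I suspect the skip removes all of them rather than a few.

```python
from collections import Counter
from types import SimpleNamespace

# Qualifier each feature type must carry to survive my patched scripts.
REQUIRED = {"CDS": "translation", "PFAM_domain": "score"}


def count_skipped(features, required=REQUIRED):
    """Return {feature_type: (total, missing_required_qualifier)}."""
    total, missing = Counter(), Counter()
    for feat in features:
        key = required.get(feat.type)
        if key is None:
            continue  # feature types the patches never look at
        total[feat.type] += 1
        if key not in feat.qualifiers:
            missing[feat.type] += 1
    return {ftype: (total[ftype], missing[ftype]) for ftype in total}


# Stand-ins for parsed features (hypothetical objects, real qualifier names).
demo = [
    SimpleNamespace(type="CDS", qualifiers={"deepbgc_score": ["0.50514"]}),
    SimpleNamespace(
        type="PFAM_domain",
        qualifiers={"db_xref": ["PF00717.22"], "evalue": ["8.6e-14"]},
    ),
    SimpleNamespace(type="region", qualifiers={"product": ["unknown"]}),
]
report = count_skipped(demo)
print(report)
```

On real files the features would come from `SeqIO.parse(path, "genbank")`, passing each record's `features` list to the function. If the missing count equals the total for both types, the model would be receiving no protein or PFAM input at all from these files.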
Also, DeepBGC predicted all the sequences to have a high probability of being a cytotoxic region. I'll attach the content of some of the GenBank files to give you better context:
LOCUS GCA_039889235_1 2266 bp DNA UNK 01-JAN-1980
DEFINITION .
ACCESSION GCA_039889235_1_1942816-1945082
VERSION GCA_039889235_1_1942816-1945082.1
KEYWORDS .
SOURCE GCA_039889235
ORGANISM GCA_039889235
.
COMMENT ##antiSMASH-Data-START##
Version :: 6.1.1
##antiSMASH-Data-END##
FEATURES Location/Qualifiers
CDS complement(1..432)
/locus_tag="GCA_039889235_1_GCA_039889235_1_1706"
/deepbgc_score="0.50514"
PFAM_domain complement(61..258)
/db_xref="PF00717.22"
/evalue="8.6e-14"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1706"
/database="31.0"
/description="Peptidase S24-like"
/deepbgc_score="0.50514"
CDS complement(543..1271)
/locus_tag="GCA_039889235_1_GCA_039889235_1_1707"
CDS 1502..2266
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/deepbgc_score="0.65581"
PFAM_domain 1577..1951
/db_xref="PF01209.17"
/evalue="3.6e-14"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="ubiE/COQ5 methyltransferase family"
/deepbgc_score="0.61972"
PFAM_domain 1610..1918
/db_xref="PF13489.5"
/evalue="4.7e-16"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.60990"
PFAM_domain 1610..1915
/db_xref="PF08003.10"
/evalue="0.0025"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Protein of unknown function (DUF1698)"
/deepbgc_score="0.62242"
PFAM_domain 1616..1921
/db_xref="PF05175.13"
/evalue="0.00065"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase small domain"
/deepbgc_score="0.59407"
PFAM_domain 1616..1915
/db_xref="PF06325.12"
/evalue="0.00044"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Ribosomal protein L11 methyltransferase
(PrmA)"
/deepbgc_score="0.62270"
PFAM_domain 1619..1915
/db_xref="PF13847.5"
/evalue="1.8e-17"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.68168"
PFAM_domain 1619..1885
/db_xref="PF07021.11"
/evalue="0.0043"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methionine biosynthesis protein MetW"
/deepbgc_score="0.66653"
PFAM_domain 1622..1924
/db_xref="PF03141.15"
/evalue="3.7e-05"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Putative S-adenosyl-L-methionine-dependent
methyltransferase"
/deepbgc_score="0.65640"
PFAM_domain 1622..1915
/db_xref="PF05401.10"
/evalue="0.00035"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Nodulation protein S (NodS)"
/deepbgc_score="0.67902"
PFAM_domain 1628..1900
/db_xref="PF13649.5"
/evalue="4.1e-21"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.72780"
PFAM_domain 1631..1912
/db_xref="PF08241.11"
/evalue="5.8e-24"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.72197"
PFAM_domain 1631..1906
/db_xref="PF08242.11"
/evalue="1.5e-14"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.70264"
PFAM_domain 1751..1912
/db_xref="PF05148.14"
/evalue="0.00015"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Hypothetical methyltransferase"
/deepbgc_score="0.62065"
region 1..2266
/region_number="1"
/candidate_cluster_numbers="1"
/product="unknown"
/contig_edge="False"
/tool="DeepBGC"
/activity="cytotoxic"
/note="DeepBGC_Score: 0.58048"
cand_cluster 1..2266
/candidate_cluster_number="1"
/protoclusters="1"
/kind="single"
/product="unknown"
/tool="DeepBGC"
protocluster 1..2266
/protocluster_number="1"
/product="unknown"
/tool="DeepBGC"
proto_core 1..2266
/protocluster_number="1"
/product="unknown"
/tool="DeepBGC"
ORIGIN
1 ctagacatgg tggaggcagc gggtcgcgac gccccagatg acgaggtccg acaacgctgc
61 gaccctgatg tccgggtagc gtgggttgtc cgcctggagg accacgccgg aagtcgtgag
121 gcgcagcctc ttgacggtga gttcgccgtc gaggaccgcg acgacgacac agccgtcgcg
181 gggctctagc gcccggttga cgatcagctc gtcgccgtcg ctgatcccgg cgccctccat
241 ggagtccccg gacacccgga cgacataggt gctggtgatg tccttgatga ggtgttcgtt
301 aaggtcgatg cggccgtcaa agtagtcctg ggccggggaa gggtagcctg cggcgaccgg
361 caccggcgag atcaggaccg acagaagaga agtgcccgcg tctatcacgc ggggcccgat
421 gattacgccc acaacacacc tttattcgaa tatatgttcg atacatccag tgtagctccg
481 ggtactgaca ttgggcagac tttgcccagt gccagctgtc ccaagctgcg gggccggcta
541 tttcagtttg cggacaggct gacgaagtac tcctgctgga ccggccgccg gcccagcttc
601 agggccgtgc gctgcgaggg caggttccct gcgtggatgt ggcccgagac ggtgtccgcg
661 cccgagcgtt ggccggtcag gaaagttgac tgcactgctg gagccaagcc cttgccacgc
721 aggcgttccg tcagaaacac gtccagcatg cagactgcac gtcggccaaa gatcggtgag
781 atggtggcag ccaccaacac cgcaaagccg tggtcgtccc ggagcgtcat caacgagccc
841 tggtccgccg cgtcctggag cccctcctcg tgggaccctg aaacaaacgg ggcgagcacc
901 ggggcatcgg accgccacgc ttgatgttcg tgctggtagt cggcgaacac ttcagtcgcc
961 gcttccggca cgcataggga cccggccacg cccaccccgc ggtacgcctc ggccacttgg
1021 gcggccagtg tatccgcaga ttcggtgtcg tttattcggg cagtagtctt gaccgccacg
1081 aacgggaatg ccggatcgag gttgcggtat ctgaggcccg aaatgacggt gctcccatcg
1141 cttgcgtcaa gtagccgctc tgtcatggtc ccggtgctga cgtcgactcc cagccggccg
1201 atcctggtcg ccacgacgtc cggtaacgca tgagcggcga gggaccctac gtcacctatc
1261 ccggtttgca ctgcgtcctc agcgtcgtac ccggcaggaa ggtaatccag gcccggacgg
1321 gcgaaggcgg cgagttggtg gtccggccgt ccaaaggagt tcgaaatgga gggcgggaag
1381 ggacgggatc acaaccaccc ttggagaagg cataaaagac accaaggggt gtcccgtaag
1441 tggtaaagga gggtcgaccc tattgtcccg cccgtgagga ttcgcctgaa agggaagaga
1501 agtgaaggcc aacccctatg acgctttcgc cgagaactac tcggctgaaa atgagtccag
1561 cctcctcaac gcgtactaca agcggccggc aatgattggc cttgccggcg acgtgaccgg
1621 tcaccgtgtc ctggacgcag ggtgcggttc cgggcccctg tccgcggcac tgagcgcgaa
1681 aggcgcgatc atgatcggct tcgattccag tcctgcgatg ctcgaattgg caagacagcg
1741 attgggcgcg accgcggacc tgtacgtggc cgacctcagc aaaccgctcc ctttcgccga
1801 tgactccttc gatgacgttg tctcgtcctt ggtcctgcac tatctggagg actggtcagc
1861 accacttgcc gaactgcggc gcgtcctgaa gccgggcggt cgcctgatcc tgtcggtcaa
1921 ccaccccacg gtcagcgttg tcacccaacc aacggaggac gacttcgcca tccggcagta
1981 ctcggaggat tacgagttca atggcgagcc tgcggtcctg gccttctggc accgaccact
2041 gcaagagatg atcagcgcct tcacgtcggc gggataccgc atagccaccg tgcgcgaacc
2101 aaagccatct ccagacacac cgcccgaact ccttcccccg cgcatcgtca acggcgagag
2161 gacagcgttc ctgtccttca tcttcttcgt cctcgaagca aacaaaaccg ccatgcccgt
2221 ttcggccgat gaggaactgg cacagggggt tgttggacgc cggtga
//
LOCUS GCA_039636795 2539 bp DNA UNK 01-JAN-1980
DEFINITION .
ACCESSION GCA_039636795_4840764-4843303
VERSION GCA_039636795_4840764-4843303.1
KEYWORDS .
SOURCE GCA_039636795
ORGANISM GCA_039636795
.
COMMENT ##antiSMASH-Data-START##
Version :: 6.1.1
##antiSMASH-Data-END##
FEATURES Location/Qualifiers
CDS 1..816
/locus_tag="GCA_039636795_GCA_039636795_4443"
/deepbgc_score="0.65311"
PFAM_domain 79..588
/db_xref="PF13489.5"
/evalue="1.4e-15"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.34629"
PFAM_domain 88..336
/db_xref="PF05175.13"
/evalue="2.9e-06"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase small domain"
/deepbgc_score="0.27928"
PFAM_domain 100..450
/db_xref="PF08003.10"
/evalue="5e-06"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Protein of unknown function (DUF1698)"
/deepbgc_score="0.38819"
PFAM_domain 103..435
/db_xref="PF07021.11"
/evalue="2e-08"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methionine biosynthesis protein MetW"
/deepbgc_score="0.42224"
PFAM_domain 106..363
/db_xref="PF02353.19"
/evalue="0.00036"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Mycolic acid cyclopropane synthetase"
/deepbgc_score="0.53808"
PFAM_domain 109..294
/db_xref="PF01135.18"
/evalue="2.9e-07"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Protein-L-isoaspartate(D-aspartate)
O-methyltransferase (PCMT)"
/deepbgc_score="0.59243"
PFAM_domain 112..417
/db_xref="PF03848.13"
/evalue="0.0076"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Tellurite resistance protein TehB"
/deepbgc_score="0.67284"
PFAM_domain 118..507
/db_xref="PF13847.5"
/evalue="8.7e-31"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.76857"
PFAM_domain 118..429
/db_xref="PF01209.17"
/evalue="2.4e-17"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="ubiE/COQ5 methyltransferase family"
/deepbgc_score="0.81779"
PFAM_domain 127..423
/db_xref="PF05219.11"
/evalue="0.0096"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="DREV methyltransferase"
/deepbgc_score="0.82529"
PFAM_domain 127..309
/db_xref="PF02390.16"
/evalue="1.6e-05"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Putative methyltransferase "
/deepbgc_score="0.84340"
PFAM_domain 130..414
/db_xref="PF13649.5"
/evalue="1.6e-23"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.88636"
PFAM_domain 133..426
/db_xref="PF08241.11"
/evalue="3.6e-24"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.88448"
PFAM_domain 133..420
/db_xref="PF08242.11"
/evalue="4.4e-16"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.87832"
CDS 934..1293
/locus_tag="GCA_039636795_GCA_039636795_4444"
/deepbgc_score="0.71925"
PFAM_domain 934..1245
/db_xref="PF03795.13"
/evalue="4.1e-08"
/locus_tag="GCA_039636795_GCA_039636795_4444"
/database="31.0"
/description="YCII-related domain"
/deepbgc_score="0.71925"
CDS 1295..2539
/locus_tag="GCA_039636795_GCA_039636795_4445"
/deepbgc_score="0.56171"
PFAM_domain 1340..1537
/db_xref="PF04542.13"
/evalue="3e-06"
/locus_tag="GCA_039636795_GCA_039636795_4445"
/database="31.0"
/description="Sigma-70 region 2 "
/deepbgc_score="0.53787"
PFAM_domain 1631..1777
/db_xref="PF08281.11"
/evalue="2.3e-07"
/locus_tag="GCA_039636795_GCA_039636795_4445"
/database="31.0"
/description="Sigma-70, region 4"
/deepbgc_score="0.58554"
region 1..2539
/region_number="1"
/candidate_cluster_numbers="1"
/product="unknown"
/contig_edge="False"
/tool="DeepBGC"
/activity="cytotoxic"
/note="DeepBGC_Score: 0.64469"
cand_cluster 1..2539
/candidate_cluster_number="1"
/protoclusters="1"
/kind="single"
/product="unknown"
/tool="DeepBGC"
protocluster 1..2539
/protocluster_number="1"
/product="unknown"
/tool="DeepBGC"
proto_core 1..2539
/protocluster_number="1"
/product="unknown"
/tool="DeepBGC"
ORIGIN
1 atgaacgcgc agcagcctga agatgtttac acccacgggc accacgagtc ggttgtccgg
61 gcccatgcct cgcggacggc cgagaattcg gccgcgtttg tcattccgca tctcaccccg
121 gggacgtcgg tcctcgacgt cgggtgcgga ccaggcagca tcacgtgcga tttcgcgggg
181 ctggttgcac ccgcgcaggt catcggcttg gatcgctcgg cggacatcgt cgcccaggca
241 acggaactgg caaaggaccg cggcgtagac aacgtggagt tccgaaccgg caacatctac
301 gatctcgagt ttgaggacga gaccttcgac ctcgtccacg cccatcaagt cctccagcac
361 ctcaccgatc ccgtcgccgc gctgcgtgag atgcgccgcg tggcaaaacc gggcgcgatc
421 gtggccgtcc gcgacgccga tttccacggc atgagctggt acccggaagt ccccgagctc
481 gatgactgga tggagctcta ccagaagatc gcacggcgaa acggtgccga accggatgcc
541 ggccgtcgat tggtctcgtg ggcacagcag gcaggctttg cccaggtggc gcccagcagc
601 agcaactggc tctacgccac agcccaacaa cgggcatggc agtcccgcgt gtggagcgaa
661 cgtgtcctcc actccgcttt tgccgagcaa gccctcgaat acgggttcgc caatgaggcc
721 gacctcgccc ggatcgctgc gggctggcac cgctggggag ccacggacga tggctacttc
781 ctcattccca acggcgaggt gatcgcgcgg gcctaggttg ggcgtgaagg aactttccca
841 aaaaactttg cgaaccatgt agaaaagcgc cccgcagttc cgacccatga gtgaaagcac
901 ccaaacaggg tgccacttca taaggagttt gagatgaaat acatgatcat gatgttcggg
961 tccgccgagg gcatgatgga aaccgccgat ccggagtggg tcaaggaaat gatcgggttc
1021 atgatccaga tcgacaagga cctccgcgat tccggtgaac tcgtctttaa cgcagggctg
1081 gctgatggca gcaccgcgaa gctcgtcaag cagaccccgg acggcgtcat caccacggac
1141 ggcccatacg ccgagtcgaa ggagtcgctg atcgggtact gggtggtgga tgtggccagc
1201 gaggaacgcg ctgtggaaat ctgctcgagc atcgtcaagt actcgcaagt ggttgagctc
1261 cgccctatcc cggacggtcc cccggaggtc tagtttggcc gttgcaccac gcgaggttcc
1321 cccacgcgac attgaggacc tgctgcgcac cttggcgccg caggtcctct ccgtgcttgc
1381 acgcaaccac ggacagttcg acgcctgcga ggatgcagtg caagaagccc tcattgaggc
1441 cgccctgcaa tggccatccg ggctgccaac aaatccgaag ggatggctgc tggctgtcgc
1501 atcgcgccgg ctggttgatg tgtggcgcag tgaaagcgcg cgccgcgccc gggaggaacg
1561 cgtcgccgca atggaggtca atttccagga cggcgcggcg tctgaagccg acgacaccct
1621 gaccctgatg ttcctgtgtt gccacccctc gatcagcgca ccatcgcaac tcgccttgac
1681 gctacgggcc gtaggaggac tcacgacggc ggaaatcgcc tctgctttcc tcgtccccga
1741 ggcgaccatg ggccagcgca tcagccgtgc gaaacaaggg atccagaagg caggcgcccg
1801 gttcgacatg ccgccggagt ccgagcggaa ggcgaggctc ggcgtcgttc ttcacgtcct
1861 gtacctgatc ttcaacgaag gctacgccgc aagctcgggc gactcgctgc aacgcgaaga
1921 cctcaccacc gaagcgatcc ggttggcacg cctgctggtg agcgccgcgc cggcagagct
1981 cgaggccacc ggtttgctcg ctctgatgct gctgacggac tcccgtcgtg ccgcgcgcac
2041 cctggctgac gggatgcccg tgccgctgtc cgaacaggac cggactctat ggaaccgcgg
2101 gcagatagag gaaggcatcg cactcctgtc ttccgtgctt ggacgtggcg cggctgggcc
2161 gtaccaactc caggccgcga tcgctgctgt gcatgcggag gcgccgtctg atgccgagac
2221 ggactggccg cagattctcg ccttgtacac agtccttgaa gccgtagcgc ccagtcccgt
2281 ggtgacgctg aatcgtgctg tagccgtggc catggtgaat gggccagccg ctgggcttga
2341 gctgctggcc cggctggatt cggcaattgg tcggtcgcat cgcctggatg cggtacgagg
2401 ccatctgtat gaaatggcag gctcgtatgg ggaagcgcgt gccgcctacc tcgccgccgc
2461 caagaaaacg ggcagcctgc aggagcgacg gtacttgatg gggaaagtgg cgcgaatgga
2521 ctcgtcaggg ggttcctga
//
LOCUS GCA_005860785 8747 bp DNA UNK 01-JAN-1980
DEFINITION .
ACCESSION GCA_005860785_8650686-8659433
VERSION GCA_005860785_8650686-8659433.1
KEYWORDS .
SOURCE GCA_005860785
ORGANISM GCA_005860785
.
COMMENT ##antiSMASH-Data-START##
Version :: 6.1.1
##antiSMASH-Data-END##
FEATURES Location/Qualifiers
CDS 1..642
/locus_tag="GCA_005860785_GCA_005860785_7712"
/deepbgc_score="0.78045"
PFAM_domain 31..249
/db_xref="PF05724.10"
/evalue="1.4e-08"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Thiopurine S-methyltransferase (TPMT)"
/deepbgc_score="0.25303"
PFAM_domain 103..432
/db_xref="PF01209.17"
/evalue="6.5e-07"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="ubiE/COQ5 methyltransferase family"
/deepbgc_score="0.41177"
PFAM_domain 115..228
/db_xref="PF03848.13"
/evalue="3.1e-06"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Tellurite resistance protein TehB"
/deepbgc_score="0.52720"
PFAM_domain 118..447
/db_xref="PF13489.5"
/evalue="3.4e-12"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.70708"
PFAM_domain 121..438
/db_xref="PF05175.13"
/evalue="3.2e-09"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase small domain"
/deepbgc_score="0.75342"
PFAM_domain 124..333
/db_xref="PF05401.10"
/evalue="0.00041"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Nodulation protein S (NodS)"
/deepbgc_score="0.85638"
PFAM_domain 127..435
/db_xref="PF06325.12"
/evalue="0.0057"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Ribosomal protein L11 methyltransferase
(PrmA)"
/deepbgc_score="0.88533"
PFAM_domain 130..288
/db_xref="PF12847.6"
/evalue="0.0094"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.91676"
PFAM_domain 133..441
/db_xref="PF13847.5"
/evalue="6e-10"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.95727"
PFAM_domain 133..303
/db_xref="PF09445.9"
/evalue="0.00017"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="RNA cap guanine-N2 methyltransferase"
/deepbgc_score="0.95776"
PFAM_domain 136..429
/db_xref="PF08241.11"
/evalue="4.3e-14"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.97305"
PFAM_domain 136..423
/db_xref="PF08242.11"
/evalue="7.1e-13"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.97477"
PFAM_domain 136..417
/db_xref="PF13649.5"
/evalue="8e-17"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.97198"
CDS complement(907..1473)
/locus_tag="GCA_005860785_GCA_005860785_7713"
/deepbgc_score="0.93716"
PFAM_domain complement(928..1017)
/db_xref="PF01923.17"
/evalue="0.0031"
/locus_tag="GCA_005860785_GCA_005860785_7713"
/database="31.0"
/description="Cobalamin adenosyltransferase"
/deepbgc_score="0.93115"
PFAM_domain complement(1156..1434)
/db_xref="PF09836.8"
/evalue="0.0022"
/locus_tag="GCA_005860785_GCA_005860785_7713"
/database="31.0"
/description="Putative DNA-binding domain"
/deepbgc_score="0.94318"
CDS complement(1478..2053)
/locus_tag="GCA_005860785_GCA_005860785_7714"
/deepbgc_score="0.96149"
PFAM_domain complement(1505..1843)
/db_xref="PF16859.4"
/evalue="6.6e-26"
/locus_tag="GCA_005860785_GCA_005860785_7714"
/database="31.0"
/description="Bacterial transcriptional repressor
C-terminal"
/deepbgc_score="0.95843"
PFAM_domain complement(1880..1990)
/db_xref="PF00440.22"
/evalue="1.7e-10"
/locus_tag="GCA_005860785_GCA_005860785_7714"
/database="31.0"
/description="Bacterial regulatory proteins, tetR family"
/deepbgc_score="0.96455"
CDS complement(2127..2987)
/locus_tag="GCA_005860785_GCA_005860785_7715"
/deepbgc_score="0.95681"
PFAM_domain complement(2130..2783)
/db_xref="PF12679.6"
/evalue="0.0069"
/locus_tag="GCA_005860785_GCA_005860785_7715"
/database="31.0"
/description="ABC-2 family transporter protein"
/deepbgc_score="0.95412"
PFAM_domain complement(2154..2738)
/db_xref="PF12698.6"
/evalue="0.00033"
/locus_tag="GCA_005860785_GCA_005860785_7715"
/database="31.0"
/description="ABC-2 family transporter protein"
/deepbgc_score="0.95056"
PFAM_domain complement(2373..2915)
/db_xref="PF12730.6"
/evalue="5e-05"
/locus_tag="GCA_005860785_GCA_005860785_7715"
/database="31.0"
/description="ABC-2 family transporter protein"
/deepbgc_score="0.96575"
CDS complement(2984..3895)
/locus_tag="GCA_005860785_GCA_005860785_7716"
/deepbgc_score="0.95974"
PFAM_domain complement(3341..3529)
/db_xref="PF13304.5"
/evalue="7.8e-09"
/locus_tag="GCA_005860785_GCA_005860785_7716"
/database="31.0"
/description="AAA domain, putative AbiEii toxin, Type IV TA
system"
/deepbgc_score="0.95292"
PFAM_domain complement(3428..3844)
/db_xref="PF00005.26"
/evalue="3.7e-25"
/locus_tag="GCA_005860785_GCA_005860785_7716"
/database="31.0"
/description="ABC transporter"
/deepbgc_score="0.96400"
PFAM_domain complement(3764..3838)
/db_xref="PF13555.5"
/evalue="0.0049"
/locus_tag="GCA_005860785_GCA_005860785_7716"
/database="31.0"
/description="P-loop containing region of AAA domain"
/deepbgc_score="0.96230"
CDS 4012..5214
/locus_tag="GCA_005860785_GCA_005860785_7717"
/deepbgc_score="0.95110"
PFAM_domain 4585..4782
/db_xref="PF07730.12"
/evalue="1.7e-17"
/locus_tag="GCA_005860785_GCA_005860785_7717"
/database="31.0"
/description="Histidine kinase"
/deepbgc_score="0.95964"
PFAM_domain 4930..5052
/db_xref="PF02518.25"
/evalue="0.0056"
/locus_tag="GCA_005860785_GCA_005860785_7717"
/database="31.0"
/description="Histidine kinase-, DNA gyrase B-, and
HSP90-like ATPase"
/deepbgc_score="0.94257"
CDS 5211..5870
/locus_tag="GCA_005860785_GCA_005860785_7718"
/deepbgc_score="0.96030"
PFAM_domain 5220..5552
/db_xref="PF00072.23"
/evalue="1.2e-23"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Response regulator receiver domain"
/deepbgc_score="0.94182"
PFAM_domain 5673..5801
/db_xref="PF04545.15"
/evalue="0.00012"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Sigma-70, region 4"
/deepbgc_score="0.95638"
PFAM_domain 5676..5837
/db_xref="PF00196.18"
/evalue="3e-15"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Bacterial regulatory proteins, luxR family"
/deepbgc_score="0.97019"
PFAM_domain 5679..5798
/db_xref="PF08281.11"
/evalue="1.2e-05"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Sigma-70, region 4"
/deepbgc_score="0.97018"
PFAM_domain 5715..5789
/db_xref="PF13412.5"
/evalue="0.0016"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Winged helix-turn-helix DNA-binding"
/deepbgc_score="0.96292"
CDS complement(5905..6498)
/locus_tag="GCA_005860785_GCA_005860785_7719"
/deepbgc_score="0.93610"
PFAM_domain complement(6106..6474)
/db_xref="PF11716.7"
/evalue="4.1e-17"
/locus_tag="GCA_005860785_GCA_005860785_7719"
/database="31.0"
/description="Mycothiol maleylpyruvate isomerase N-terminal
domain"
/deepbgc_score="0.94886"
PFAM_domain complement(6211..6468)
/db_xref="PF12867.6"
/evalue="0.0017"
/locus_tag="GCA_005860785_GCA_005860785_7719"
/database="31.0"
/description="DinB superfamily"
/deepbgc_score="0.92334"
CDS complement(6663..7181)
/locus_tag="GCA_005860785_GCA_005860785_7720"
CDS complement(7414..7854)
/locus_tag="GCA_005860785_GCA_005860785_7721"
/deepbgc_score="0.82746"
PFAM_domain complement(7603..7746)
/db_xref="PF03779.13"
/evalue="1.1e-16"
/locus_tag="GCA_005860785_GCA_005860785_7721"
/database="31.0"
/description="SPW repeat"
/deepbgc_score="0.82746"
CDS 8100..8747
/locus_tag="GCA_005860785_GCA_005860785_7722"
/deepbgc_score="0.78500"
PFAM_domain 8208..8303
/db_xref="PF00440.22"
/evalue="7.5e-06"
/locus_tag="GCA_005860785_GCA_005860785_7722"
/database="31.0"
/description="Bacterial regulatory proteins, tetR family"
/deepbgc_score="0.78500"
region 1..8747
/region_number="1"
/candidate_cluster_numbers="1"
/product="RiPP"
/contig_edge="False"
/tool="DeepBGC"
/activity="antibacterial-cytotoxic"
/note="DeepBGC_Score: 0.90556"
cand_cluster 1..8747
/candidate_cluster_number="1"
/protoclusters="1"
/kind="single"
/product="RiPP"
/tool="DeepBGC"
protocluster 1..8747
/protocluster_number="1"
/product="RiPP"
/tool="DeepBGC"
proto_core 1..8747
/protocluster_number="1"
/product="RiPP"
/tool="DeepBGC"
ORIGIN
1 atgacggatc acctcgacca cgcctccgcc gaagagttct gggacgcccg ctacggttcc
61 agcgaccgca tctggagcgg caaccccaat gccgccctgg tccgcgaaac ggccggactg
121 accccgggaa gcgccctgga cctcggatgc ggggagggag ccgacgcgct gtggctggcg
181 cagcagaact ggaaggtcac cgccgtcgac atctcccgga aggcactgga gcgaggggcc
241 gagcatgcgg ctgccgcggg tgtggccgac cggatcgact ggcagcggtg cgatctggcg
301 gtgtcgttcc ccaccggcgc cttcgacctg gtctccgccc acttcctgca ctcgcccgtc
361 acgatgccga gggagcagat cctgcgcagc gccgcggcgg ccgtcgcccc gggcggtgtg
421 ctgctggtcg tcgggcacgc cggatcccct tcctgggtgg ccgacaaggg tcacgccccg
481 cacttcccca catctgaaga ggtcttcgcg ggcctcgatc tcccggccgg gcagtgggag
541 gtgctcctcg cggacgtcca cgagagcggg atgaccggtc ccgacgggca gcccgcaacc
601 cgcacggaca acaccctgaa gctccggcgc ttggcgcagt gatcccctgc ccgtaggact
661 cgcctcgccc ccgtcggtag gtgccgacgg gccgggtcac ctggacagcg ggaagcggcg
721 gcagcaggac ggtcagcgcg atcagcaccg cgatgctcag gacacccatc acccctgttc
781 ggtggtgttc gttcgaaagg ccatgagccc atggaacagt cgcagcccgc tgcgatccgg
841 cgggttttcg acctcaccct gagcgttcgc gagcgagccg tcacgtcagc gatgccgatg
901 agccgttcag ccggaggctc cgtcggcatc ggcggcctcg gcggcggtac gctcggcccg
961 gcgcgcgacg gcgcgcagca cctcgagcct gatgaacgcg gcaccgccca gccggatgaa
1021 gagccagtag gggcggaaga tccgtcgcga gcgggcgtcg gtggcccgca ccctggtctc
1081 gaaggagacg acggtggcgt caccgtcggg caccacccgc agactcatgg ccgccttcgc
1141 ccaccccggc tccgcgaatc gggcgaaggc atcggcgtcg cccggctcga tcggtgcgga
1201 cgggggccga agccgccaga acttggcgac catgcccttg accagctccg ctccctccac
1261 ggtgctgagg gtcggcgtcg ggaacgtgtc gagaaacggg cctttgagcc ggttccggcc
1321 ggtcgtccgc agcagcatca gccggcgcga gacggggagt tcctcggcgg tgacgtcgag
1381 cagggcccgc catacggtgg ccgggtccgc cgcgatgtgc cgggtgaaac aggagcggaa
1441 gtcgtggacc ggaagcaggc ggtcgagttc catcgggtca ctccggagtg tcgagcccgg
1501 cggccaccgc gtcggccagc cgccggggca ggtccgacgg cggtgtgccg ccgagcatga
1561 acagatggac gaagacggtg ccgagcagtt gcgcgtgcgc cagtaccgga tcggacccgc
1621 tgcggatctc accccgctgg gtggcccgtt cgaggatccg ggccagtacg ttctgctgac
1681 ggtcgatgaa cgtggcggcg aaccgggtcg ccaactccgg gtcggcggcg atgtcggcga
1741 tcagccccgg cagcgccttc cgcgcgtgcg gcgcggacag caggacgatg atctgctcga
1801 cgagagcggt cagatcggcc cgcagaccgc cgaggtcggc gggcggggcg atctccgtcc
1861 cgtgcaccac ggccacgaac accatctcgg ccttggaccg ccaccgccgg tagatccccg
1921 ccttgccgac ccccgcccgg gccgcgacgc ggtcgatgct caactgctgg tacccgacct
1981 cgtccaggat cccccgtacg gcgtcactga ccgccgcgtc gaccttgccg tcccgcgggc
2041 gtcctcggtt catggcctca gcataaacga aacggatcga tccgtaaatg aaccggcgag
2101 gacaacgggc ccgatgcgcc cggggctcac acgtcccggc gccgcacgac aacgatggcg
2161 acgaccaccg acaccaacgc ccagcagccg agggcgatcc aggacccggt gatcgttgcc
2221 gggtacttac ccgggtcgaa gtacgggttg tccgggttct ccttcagccg gtccaccgcg
2281 gcgatgggca ggcagtggcc gatctccttg aggagcttgt accggtcacc gccgaacatc
2341 aggggagcga tgaacagcag ggacaccagg ccgacgatcg tggccgtggc gtgccggatg
2401 acggcggcga aggcgatgcc gatgagggcc gacaccggga cgatcagcgc gtacgccgtg
2461 acggcgcgca gacagccggg gtcgttgatc gagaggccga catggcgtga ggccagcatg
2521 gcgttggtgc cgaagaagga cgccgtcgag atgatcacgc ccatgacgag ggtgaccgcg
2581 gtgacgacga ccaccttggc cgccaccacc gctcgccggt cgggtacggc ggcgaacgtc
2641 gtacggacca ttccggtggc gtactcgccg aagatcgaga tggcgccgac gctggccgcc
2701 gcgagcatca tgaggtagac ggcgatgctg ttgagaccgt ggaagagcgg gtcgtaccgg
2761 tacgggggca tccccggcct cgcggactgg tcgatgtagg tcaggtcggt gtggacggcg
2821 ttgagattga ccgcgatggc gaccaacgcg ctgaccacca gcacccagta ggtcgaccgg
2881 agcgaccgca ttttgatcca ctcggccgcg agcaggtcca cgaaccgcac cgcgggatcg
2941 gtcattcgcg ccggcggcgt tacggcggtc gaggtcatcg tgctcatcgg ggatctcctg
3001 cgaggtattc aacgctgtcg gcggtgagtt ccatgaacgc ggtctccagc gaagaggtct
3061 gggtcgcgag ttcgtgcagc atgatgcggt gcaggtgggc gagttcgccg atccgggccg
3121 cactcagtcc ggccacccgc cccaccgtgc cgctctcatg acgcaccgac gcaccctcgg
3181 cggtcagtac ggttgccagc gcgtcgagtt cgggcgtctg cacggtcacg cccgtgccgg
3241 tgctgcgggc cgagaagtcc tccaggctct ggtcggcgat gagccggccc tgaccgatca
3301 cgatcagatg atcggcggtg tgctccatct cgcccatcat gtggctggag acgaacacgg
3361 tgcggccctc ggacgccagc cgtcggaaca ggccgcgcac ccacagcacg ccttccgggt
3421 ccaggccgtt gagcggttcg tcgaacagca gcaccggagg atcaccgagc agagcgcccg
3481 cgatgcccag ccgctgcttc atcccgaggg agaacccgcc gatgcggcgc cgcgccgcct
3541 tggccagccc gacctccgcc aggacatcgt ccacgcggtg cagcgggatg cggttgctgc
3601 gcgccagggc ggacagatgc gccgcggcgc tgcgcccgcc gtgcacgtcc tgcgcgtcca
3661 gcagcgcgcc gacatggcgc agtccgcgcg gccggtccgc gaacgggacc ccgtcgacgg
3721 tgacggaccc gccggtcggg gtgttcagtc ccaggatcat ccgcagagtg gtggacttgc
3781 ccgcaccgtt cgggccgagg aatccggtca cctgtccagg tcgcacggtg aaggtcaggg
3841 tgtcgacggc gagggtgtcg ccgtagctct tggtcagtcg gtcggcttca atcacgatgc
3901 caacggtgcc ggtccggcaa ccggtgccgc atcgggccgg cggccacctt ccgcaagtgc
3961 cgaggtggcc ccggggacta catccacggg ccaatgcgca ggtggaggcc ggtgacctac
4021 gatcgggcca tgaccgtcac ccaccgtcct ccattgctga agcgcttgcc catgggcgtg
4081 tgggtcggcg ccttctggtc gacgctcatc cttgtgcgct ccttccaacg gccggacgag
4141 ttccgtcatc tgaccgaact cgacggcaac atcgagggtg gcccgctgct catcgtcgcg
4201 gtcgtcacca ccttcggcgc cctgctgctg ttccgcgcgc cgctggcgtc actcggcctg
4261 gcgctcgcgg gcgtcgtcgt cgccctcgga tcgcgtgtgg tggaggcgcc gatcgtggtc
4321 ttcctgctgg ccgacggagt ggtgggctac atcgcggcca cccgctcgcg ccggatctcg
4381 atcctcgccg ccatgctgcc ggtcgttctg gtgacggctt tcacggtcac ccggctgata
4441 cgcgaagggg acgccgggat cgcggcggag gccgccgtcg cgtcgaccgc ggtcatcgcc
4501 tggctgatcg gcaacacgat ccaccagagc cgcgcctaca ccgagacgct gcgctcccgg
4561 gccactcaac aggcggtcac cgccgagcgg ctgcggatcg cccgcgaact gcacgacatg
4621 gtcgcgcaca gcatcggcat catcgccatc caggcgggtg tggccagccg cgtcatggac
4681 agtcagcccg acgagacccg caaagcactc gacgccatcg aggccaccag ccgcgagacc
4741 ctgtccgggc tccggcggac gctgcgcgcg ctgcgccagt cggatgcgga ctcggcgccc
4801 ctcgacccgg cgccgggact ggccgacgtc ggacaactgg tcgcgaagac cagggaagcc
4861 ggtgtgcgcg tcgacgtccg gtggcgcggc gaacgcagac ccgtgcccgc cgacatcgac
4921 ctctccgcct tccgcatcat ccaggaggcg gtcaccaacg tcgtgcggca ctccggcacc
4981 cgggactgca gcgtgagcgt cgactaccgg gacgaggagt tgtccatcga ggtcgtcgac
5041 ctcggctgcg gcggcgaggg gggagcgggg tacggcatcg tcggtatgcg cgagcgggtc
5101 agcctgttgc acggcgaatt cagcgccggc ccgcggcccg aaggcggctt ccgcgttgcc
5161 gcacggctcc cggtgcccgc cggggccggg atggccgggg tgaccgcccg atgatccgcg
5221 tcatcctcac cgacgaccag cccctggtcc gcaccggcct gcgcgtcctg atcgccgaca
5281 cccccgacat cgaagtcgtg ggtgaggccg cgaacggcgc cgaggcggtg tccctggcca
5341 cggaactgcg ccccgacgtc gccgtcatgg acatccgcat gcccgtgctg gacggcatcg
5401 gagccgcgcg actcatcacc gaggacccgc agttgccgac gcgcgtcctc gtcctcacca
5461 cgttcgacga cgacgagtac gtctacgccg cgctgcgcgc cggagcgagc ggattcctcg
5521 tcaaggacat gccgctggag tccattctcg acgggatccg cgtcgtcggc gccggtgacg
5581 ccctgatcgc ccccagcgtc acccgccgcc tcatcgcgga gttcgccggc cgccccgagc
5641 ccaccacccc gcaccccgcc ccggtggacg gcgtcaccaa ccgcgagcgc gaagtcctga
5701 cactcgtggg ccgcggtctg tccaacaccg agatcgccga ggaactcgtc atcagcgtct
5761 ccaccgcaaa agcccacgtc gcccgcctgt tcaccaaact cgccgcccgc gaccgcgtcc
5821 acctcgtcat catcgcgtac gaactcggcc tcgtatcccc gccccgctga cgcctactgc
5881 cagggcttcc cgcctttccc tgtatcaggc ggcggcccat gccgggtcgc gtccggtgag
5941 cgcgacgatg cggtcgagga gcggggcgtc ggaggggatg gcgacggcgg ggccgaagat
6001 gccctggcgg gcgggggact gtcccggggc ggccaggggc tccacgaact cgcggcagcc
6061 ttcgagcgtt tggacgtcgg cggagaacgg ctgcccggtg gcgcgggcga tgtcccagcc
6121 gtggatggtc agttcgttga ggcccaccct gccgatcatg ccggcgggca tcgtgacacc
6181 gccgacctcc gtcatcccct cccaggcggc cggatcgcgc caggcctcgg cgagggtggt
6241 gagctgctcg gggatgcggg tgcgccagtc cgcgccgagc cgggaggcgt cgggggaggg
6301 cggttgggaa ccctcgctga gcggcgtttt ccgggcggcg atgaggaagg cgtgggagag
6361 gccgtcgaca tggtcgagga ggtcgccgag ggtgtacttg gcacacggtg tcggcgcggt
6421 gagctgttcg tcggggatgg cgctcagcag gtccgccagt cgctgggccg cgggaccgag
6481 gtcgagcatc gctgtcatgg gagtgactcc ttcggcgcgt gttcgcagta tgcgtggcaa
6541 cgaaggggta gacggcatgg cgccgcggaa ctcatcggtc gcgccgatgg tcgcgccgat
6601 attgtttcgc gcggcagcct ggaggccccc ggacggccgc gtccgacctg gtggagcccg
6661 gcctagtgga gactgcgctg catccacggc acgaggtcgc gctcctcgtg caggaggtac
6721 tggtcgatct cgccgagcac gagctgcgtg gtgagcgtca tgccctcgac ggccaccttg
6781 ccggcgatga cgtcgtccag acggctcagc agctgagcac cgtcctcgcg atcgcgcacc
6841 accgccacgg aggccttgtc gtgccgctcc agaagcgggt ggaccgtcgc ttgtttcgcc
6901 gcgtagtgcg cccggaacgc ctcccggaac gcgggcaggc cctccggtgc gcccgtggga
6961 accctgtgca ggtcctgcag ggtggcccgc agggcgcggc tgcgggcgat gagagcgtcc
7021 gtcgcgacgg tgtgaatctc gtgcgcctgc gcctcgtcgt gccggcgctt ctcggcgttg
7081 gcgttgaatt ccgccccgtg ctggttgatc tcgtgcggac cctgggggtg gtcctgcgac
7141 gggcggttct gcgacgcgtg gtcctgatcc gaagcggtca tggtgggcga cacctcctgg
7201 ttgcatccac tcacgctgtc ctggtacccg caatccagga gcggacgccg ccgcggacgc
7261 cacgtcccca cgaatggacg cccggccggg gccgggcgtg tccagggtga tcgggccggc
7321 ttccgccgcg cgcacggatg gccgtccccg gcattcgccg gcccctccag gacgaccatc
7381 agtactttgt ccgcgtcgga tgtggtcccg gcgtcaggac tctctcttgt tcctcatcat
7441 cattcctgcc gccgcgaggc cgagcaggca gatgacggcg ccggtgatga tgttgttcca
7501 catcgcgccg gtctcggggt gacggatgac cacccacggg gagacgatca gccagacgcc
7561 gatcgccgcg cacgccaaac tcatgccctg catgcgatcc ggggccatgg tcagacacag
7621 cgccaacacg gcgaccgcga tgcccatgat caggttgttc tgtgccaact gcggggacgt
7681 gggcgcgaag tgcaccgtcc agggcgagat ggcggcgtac agcccggcca ggagcagcag
7741 tccgtcgact cctacgatcc cccggcctcc gagcacgcgg gcgtagcgat cacgcatttc
7801 cgatacgtcc ggatgtgctg cgatatcccc cgggcggtgt gacacgtcgg ccatacgact
7861 cgcctccttc ggcttatgca agtctgaccg cgtttgcggt atctaccgcg agtccattgt
7921 gcgcctaagt tgccgttatg tgtaggttct gccgtgccgt gctggcggcc tccgccggac
7981 ggggtgtccg acatgatgtt ggacaccatg tccgactgcc attaacttag gaaatgtccg
8041 atctggaggg agtgcgattc ggaaacgcac ctcaccggaa ggaccggaag gaggaaggaa
8101 tggcctcgat cacccgtccc cgttcgcagc agtcccagcg gcgcgccggc accgagcgcg
8161 tggtgttcgc cgcggtgcag cgcctcctcg acgcggggga gtgcttcacc gaactgggtg
8221 ttcggcgcat cgccagcgaa gcggggatcg cgcgctccac cttctatctc tgcttccagg
8281 acaagaccga ggtcctgatc cggctgaccg ccacgatgaa ggacgaactc ttcagcaggg
8341 gcgcggcctg gcgccccacc ggtccgggcg gcggacccga ggcgctggcc gccgtctacg
8401 cgggccggct cgcgtactgc cgggagcggg cgccactgct ggccgccgtc gcggaggtcg
8461 ccgcgtacga tccggtcgtt cgcgaggcca gggcgcagga gatcgagcgc ttcgcgcacc
8521 acatcgcgtc gctgctggag gaggagcaac gcgagggccg gctctccgcg gacgtcgacc
8581 cggtgaccgc cggacaggtg ctcgcctggg gcggggagca ggtcatcgcc cgtcaggtga
8641 cgaccggcgg cgcggaggac gacgccaagg tcgcccgcga gatggcgtac ggccaatggt
8701 tcggcaccta ccgccgccgg cctggccctg acgggacgcc gacctga
//
Any insight into how sensitive the model is to these missing inputs would be greatly appreciated!
Thanks for your very cool work on this tool!!!
Hi there, congrats on this tool, it's impressive :D!
I recently started using NPBDetect to analyze a set of bacterial genomes (analyzed with DeepBGC). While running the prediction module, I encountered two blocking errors related to missing qualifiers in the GenBank files. I know this tool is intended to work with the output files of antiSMASH. So...
I managed to patch the code to get it running, but I would like to double-check with you whether my "fixes" might be negatively impacting the prediction accuracy.
Error A: Missing Translations. When running predict, the script crashed with a KeyError: 'translation' in bin/gbk_to_fa.py. It seems some CDS features in my input files do not have a translation qualifier.
Traceback:
  File ".../NPBDetect/bin/gbk_to_fa.py", line 20, in extract_proteins
    mseq = SeqRecord(Seq(P.qualifiers['translation'][0]), id = pname, name= '',
KeyError: 'translation'
Error B: Missing PFAM Scores. After fixing the first error, I hit a second one: KeyError: 'score' in bin/PFAM_feats.py. It appears some annotated PFAM_domain features lack a score qualifier.
Traceback:
  File ".../NPBDetect/bin/PFAM_feats.py", line 30, in get_PFAM_domains
    score = float(F.qualifiers["score"][0])
KeyError: 'score'
To bypass these errors and allow the pipeline to finish, I modified the scripts to simply skip any feature that lacks these specific tags, rather than crashing.
Modification in bin/gbk_to_fa.py: I added a check to skip CDS features without translations:
# In extract_proteins function
for P in features:
    if P.type == 'CDS':
        if 'translation' not in P.qualifiers: continue  # <--- Added this line
        # ... rest of the code
Modification in bin/PFAM_feats.py: I added a check to skip PFAM domains without scores:
# In get_PFAM_domains function
for F in feat:
    if(F.type == 'PFAM_domain'):
        if 'score' not in F.qualifiers: continue  # <--- Added this line
        score = float(F.qualifiers["score"][0])
        # ... rest of the code
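To put a number on how much the two `continue` guards discard, a tally like the following could be run on each record before prediction. This is only a minimal sketch and not part of NPBDetect: `count_skipped` is a hypothetical helper, and it assumes nothing more than feature objects exposing `.type` and `.qualifiers`, as Biopython's `SeqFeature` does. The `SimpleNamespace` stand-ins are made-up demo data that mimic the qualifier layout in my files (CDS with only a `locus_tag`, PFAM_domain with an `evalue` but no `score`).

```python
from collections import Counter
from types import SimpleNamespace

# Qualifiers the two patched scripts require: (feature type, qualifier key).
REQUIRED = (("CDS", "translation"), ("PFAM_domain", "score"))


def count_skipped(features, required=REQUIRED):
    """Return (kept, skipped) Counters keyed by feature type.

    A feature counts as skipped when it has a required type but lacks the
    matching qualifier, i.e. when the added `continue` guard would fire.
    """
    kept, skipped = Counter(), Counter()
    needed = dict(required)
    for feat in features:
        key = needed.get(feat.type)
        if key is None:
            continue  # feature types the patched loops never inspect
        if key in feat.qualifiers:
            kept[feat.type] += 1
        else:
            skipped[feat.type] += 1
    return kept, skipped


# Stand-ins for Biopython SeqFeature objects (hypothetical demo data).
demo = [
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["x_1706"]}),
    SimpleNamespace(type="PFAM_domain", qualifiers={"evalue": ["8.6e-14"]}),
    SimpleNamespace(type="CDS", qualifiers={"translation": ["MK"]}),
    SimpleNamespace(type="region", qualifiers={"tool": ["DeepBGC"]}),
]
kept, skipped = count_skipped(demo)
print("kept:", dict(kept), "skipped:", dict(skipped))
```

With real input, `features` would be `record.features` for each record returned by `Bio.SeqIO.parse(path, "genbank")`. If `skipped` accounts for most of the CDS or PFAM_domain features in a cluster, the model is presumably seeing close to empty input for it.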
3. My Question
With these modifications, I was able to successfully run the tool on my 31 genomes and obtained probabilities ranging from 0.68 to 0.70 for cytotoxic activity.
However, I am concerned about the validity of these results. Does skipping these specific features (incomplete CDS or PFAM entries) significantly compromise the model's feature vectors?
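One alternative to skipping, offered only as a sketch: regenerate the missing protein sequence from the record's own nucleotide sequence, so that a CDS without `/translation` still contributes a protein. Everything here is an assumption rather than NPBDetect behaviour. `translate_cds` is a hypothetical helper, it handles only simple `start..end` and `complement(start..end)` locations, and it uses the standard codon assignments (bacterial table 11 shares them and differs only in the permitted start codons). With Biopython the equivalent would presumably be `feature.extract(record.seq).translate(table=11)`.

```python
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_cds(dna, start, end, complement=False):
    """Translate the 1-based inclusive span start..end of `dna`.

    Returns the protein without trailing stops; codons containing
    ambiguous bases become 'X'. Any trailing partial codon is ignored.
    """
    sub = dna[start - 1:end].upper()
    if complement:
        sub = sub.translate(_COMPLEMENT)[::-1]  # reverse complement
    protein = "".join(
        CODONS.get(sub[i:i + 3], "X") for i in range(0, len(sub) - 2, 3)
    )
    return protein.rstrip("*")


# First 30 bases of the GCA_039636795 record, whose first CDS starts at 1.
head = "atgaacgcgcagcagcctgaagatgtttac"
print(translate_cds(head, 1, 30))  # MNAQQPEDVY
```

Whether regenerated translations are acceptable input for the model is a question for the maintainers. The PFAM `score` has no comparable fallback, since my files carry only `/evalue` for each PFAM_domain.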
Also, DeepBGC predicted all the sequences to have a high probability of being a cytotoxic region. I'll attach the content of some of the GBK files to give you better context:
LOCUS GCA_039889235_1 2266 bp DNA UNK 01-JAN-1980
DEFINITION .
ACCESSION GCA_039889235_1_1942816-1945082
VERSION GCA_039889235_1_1942816-1945082.1
KEYWORDS .
SOURCE GCA_039889235
ORGANISM GCA_039889235
.
COMMENT ##antiSMASH-Data-START##
Version :: 6.1.1
##antiSMASH-Data-END##
FEATURES Location/Qualifiers
CDS complement(1..432)
/locus_tag="GCA_039889235_1_GCA_039889235_1_1706"
/deepbgc_score="0.50514"
PFAM_domain complement(61..258)
/db_xref="PF00717.22"
/evalue="8.6e-14"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1706"
/database="31.0"
/description="Peptidase S24-like"
/deepbgc_score="0.50514"
CDS complement(543..1271)
/locus_tag="GCA_039889235_1_GCA_039889235_1_1707"
CDS 1502..2266
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/deepbgc_score="0.65581"
PFAM_domain 1577..1951
/db_xref="PF01209.17"
/evalue="3.6e-14"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="ubiE/COQ5 methyltransferase family"
/deepbgc_score="0.61972"
PFAM_domain 1610..1918
/db_xref="PF13489.5"
/evalue="4.7e-16"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.60990"
PFAM_domain 1610..1915
/db_xref="PF08003.10"
/evalue="0.0025"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Protein of unknown function (DUF1698)"
/deepbgc_score="0.62242"
PFAM_domain 1616..1921
/db_xref="PF05175.13"
/evalue="0.00065"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase small domain"
/deepbgc_score="0.59407"
PFAM_domain 1616..1915
/db_xref="PF06325.12"
/evalue="0.00044"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Ribosomal protein L11 methyltransferase
(PrmA)"
/deepbgc_score="0.62270"
PFAM_domain 1619..1915
/db_xref="PF13847.5"
/evalue="1.8e-17"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.68168"
PFAM_domain 1619..1885
/db_xref="PF07021.11"
/evalue="0.0043"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methionine biosynthesis protein MetW"
/deepbgc_score="0.66653"
PFAM_domain 1622..1924
/db_xref="PF03141.15"
/evalue="3.7e-05"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Putative S-adenosyl-L-methionine-dependent
methyltransferase"
/deepbgc_score="0.65640"
PFAM_domain 1622..1915
/db_xref="PF05401.10"
/evalue="0.00035"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Nodulation protein S (NodS)"
/deepbgc_score="0.67902"
PFAM_domain 1628..1900
/db_xref="PF13649.5"
/evalue="4.1e-21"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.72780"
PFAM_domain 1631..1912
/db_xref="PF08241.11"
/evalue="5.8e-24"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.72197"
PFAM_domain 1631..1906
/db_xref="PF08242.11"
/evalue="1.5e-14"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.70264"
PFAM_domain 1751..1912
/db_xref="PF05148.14"
/evalue="0.00015"
/locus_tag="GCA_039889235_1_GCA_039889235_1_1708"
/database="31.0"
/description="Hypothetical methyltransferase"
/deepbgc_score="0.62065"
region 1..2266
/region_number="1"
/candidate_cluster_numbers="1"
/product="unknown"
/contig_edge="False"
/tool="DeepBGC"
/activity="cytotoxic"
/note="DeepBGC_Score: 0.58048"
cand_cluster 1..2266
/candidate_cluster_number="1"
/protoclusters="1"
/kind="single"
/product="unknown"
/tool="DeepBGC"
protocluster 1..2266
/protocluster_number="1"
/product="unknown"
/tool="DeepBGC"
proto_core 1..2266
/protocluster_number="1"
/product="unknown"
/tool="DeepBGC"
ORIGIN
1 ctagacatgg tggaggcagc gggtcgcgac gccccagatg acgaggtccg acaacgctgc
61 gaccctgatg tccgggtagc gtgggttgtc cgcctggagg accacgccgg aagtcgtgag
121 gcgcagcctc ttgacggtga gttcgccgtc gaggaccgcg acgacgacac agccgtcgcg
181 gggctctagc gcccggttga cgatcagctc gtcgccgtcg ctgatcccgg cgccctccat
241 ggagtccccg gacacccgga cgacataggt gctggtgatg tccttgatga ggtgttcgtt
301 aaggtcgatg cggccgtcaa agtagtcctg ggccggggaa gggtagcctg cggcgaccgg
361 caccggcgag atcaggaccg acagaagaga agtgcccgcg tctatcacgc ggggcccgat
421 gattacgccc acaacacacc tttattcgaa tatatgttcg atacatccag tgtagctccg
481 ggtactgaca ttgggcagac tttgcccagt gccagctgtc ccaagctgcg gggccggcta
541 tttcagtttg cggacaggct gacgaagtac tcctgctgga ccggccgccg gcccagcttc
601 agggccgtgc gctgcgaggg caggttccct gcgtggatgt ggcccgagac ggtgtccgcg
661 cccgagcgtt ggccggtcag gaaagttgac tgcactgctg gagccaagcc cttgccacgc
721 aggcgttccg tcagaaacac gtccagcatg cagactgcac gtcggccaaa gatcggtgag
781 atggtggcag ccaccaacac cgcaaagccg tggtcgtccc ggagcgtcat caacgagccc
841 tggtccgccg cgtcctggag cccctcctcg tgggaccctg aaacaaacgg ggcgagcacc
901 ggggcatcgg accgccacgc ttgatgttcg tgctggtagt cggcgaacac ttcagtcgcc
961 gcttccggca cgcataggga cccggccacg cccaccccgc ggtacgcctc ggccacttgg
1021 gcggccagtg tatccgcaga ttcggtgtcg tttattcggg cagtagtctt gaccgccacg
1081 aacgggaatg ccggatcgag gttgcggtat ctgaggcccg aaatgacggt gctcccatcg
1141 cttgcgtcaa gtagccgctc tgtcatggtc ccggtgctga cgtcgactcc cagccggccg
1201 atcctggtcg ccacgacgtc cggtaacgca tgagcggcga gggaccctac gtcacctatc
1261 ccggtttgca ctgcgtcctc agcgtcgtac ccggcaggaa ggtaatccag gcccggacgg
1321 gcgaaggcgg cgagttggtg gtccggccgt ccaaaggagt tcgaaatgga gggcgggaag
1381 ggacgggatc acaaccaccc ttggagaagg cataaaagac accaaggggt gtcccgtaag
1441 tggtaaagga gggtcgaccc tattgtcccg cccgtgagga ttcgcctgaa agggaagaga
1501 agtgaaggcc aacccctatg acgctttcgc cgagaactac tcggctgaaa atgagtccag
1561 cctcctcaac gcgtactaca agcggccggc aatgattggc cttgccggcg acgtgaccgg
1621 tcaccgtgtc ctggacgcag ggtgcggttc cgggcccctg tccgcggcac tgagcgcgaa
1681 aggcgcgatc atgatcggct tcgattccag tcctgcgatg ctcgaattgg caagacagcg
1741 attgggcgcg accgcggacc tgtacgtggc cgacctcagc aaaccgctcc ctttcgccga
1801 tgactccttc gatgacgttg tctcgtcctt ggtcctgcac tatctggagg actggtcagc
1861 accacttgcc gaactgcggc gcgtcctgaa gccgggcggt cgcctgatcc tgtcggtcaa
1921 ccaccccacg gtcagcgttg tcacccaacc aacggaggac gacttcgcca tccggcagta
1981 ctcggaggat tacgagttca atggcgagcc tgcggtcctg gccttctggc accgaccact
2041 gcaagagatg atcagcgcct tcacgtcggc gggataccgc atagccaccg tgcgcgaacc
2101 aaagccatct ccagacacac cgcccgaact ccttcccccg cgcatcgtca acggcgagag
2161 gacagcgttc ctgtccttca tcttcttcgt cctcgaagca aacaaaaccg ccatgcccgt
2221 ttcggccgat gaggaactgg cacagggggt tgttggacgc cggtga
//
LOCUS GCA_039636795 2539 bp DNA UNK 01-JAN-1980
DEFINITION .
ACCESSION GCA_039636795_4840764-4843303
VERSION GCA_039636795_4840764-4843303.1
KEYWORDS .
SOURCE GCA_039636795
ORGANISM GCA_039636795
.
COMMENT ##antiSMASH-Data-START##
Version :: 6.1.1
##antiSMASH-Data-END##
FEATURES Location/Qualifiers
CDS 1..816
/locus_tag="GCA_039636795_GCA_039636795_4443"
/deepbgc_score="0.65311"
PFAM_domain 79..588
/db_xref="PF13489.5"
/evalue="1.4e-15"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.34629"
PFAM_domain 88..336
/db_xref="PF05175.13"
/evalue="2.9e-06"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase small domain"
/deepbgc_score="0.27928"
PFAM_domain 100..450
/db_xref="PF08003.10"
/evalue="5e-06"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Protein of unknown function (DUF1698)"
/deepbgc_score="0.38819"
PFAM_domain 103..435
/db_xref="PF07021.11"
/evalue="2e-08"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methionine biosynthesis protein MetW"
/deepbgc_score="0.42224"
PFAM_domain 106..363
/db_xref="PF02353.19"
/evalue="0.00036"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Mycolic acid cyclopropane synthetase"
/deepbgc_score="0.53808"
PFAM_domain 109..294
/db_xref="PF01135.18"
/evalue="2.9e-07"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Protein-L-isoaspartate(D-aspartate)
O-methyltransferase (PCMT)"
/deepbgc_score="0.59243"
PFAM_domain 112..417
/db_xref="PF03848.13"
/evalue="0.0076"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Tellurite resistance protein TehB"
/deepbgc_score="0.67284"
PFAM_domain 118..507
/db_xref="PF13847.5"
/evalue="8.7e-31"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.76857"
PFAM_domain 118..429
/db_xref="PF01209.17"
/evalue="2.4e-17"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="ubiE/COQ5 methyltransferase family"
/deepbgc_score="0.81779"
PFAM_domain 127..423
/db_xref="PF05219.11"
/evalue="0.0096"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="DREV methyltransferase"
/deepbgc_score="0.82529"
PFAM_domain 127..309
/db_xref="PF02390.16"
/evalue="1.6e-05"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Putative methyltransferase "
/deepbgc_score="0.84340"
PFAM_domain 130..414
/db_xref="PF13649.5"
/evalue="1.6e-23"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.88636"
PFAM_domain 133..426
/db_xref="PF08241.11"
/evalue="3.6e-24"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.88448"
PFAM_domain 133..420
/db_xref="PF08242.11"
/evalue="4.4e-16"
/locus_tag="GCA_039636795_GCA_039636795_4443"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.87832"
CDS 934..1293
/locus_tag="GCA_039636795_GCA_039636795_4444"
/deepbgc_score="0.71925"
PFAM_domain 934..1245
/db_xref="PF03795.13"
/evalue="4.1e-08"
/locus_tag="GCA_039636795_GCA_039636795_4444"
/database="31.0"
/description="YCII-related domain"
/deepbgc_score="0.71925"
CDS 1295..2539
/locus_tag="GCA_039636795_GCA_039636795_4445"
/deepbgc_score="0.56171"
PFAM_domain 1340..1537
/db_xref="PF04542.13"
/evalue="3e-06"
/locus_tag="GCA_039636795_GCA_039636795_4445"
/database="31.0"
/description="Sigma-70 region 2 "
/deepbgc_score="0.53787"
PFAM_domain 1631..1777
/db_xref="PF08281.11"
/evalue="2.3e-07"
/locus_tag="GCA_039636795_GCA_039636795_4445"
/database="31.0"
/description="Sigma-70, region 4"
/deepbgc_score="0.58554"
region 1..2539
/region_number="1"
/candidate_cluster_numbers="1"
/product="unknown"
/contig_edge="False"
/tool="DeepBGC"
/activity="cytotoxic"
/note="DeepBGC_Score: 0.64469"
cand_cluster 1..2539
/candidate_cluster_number="1"
/protoclusters="1"
/kind="single"
/product="unknown"
/tool="DeepBGC"
protocluster 1..2539
/protocluster_number="1"
/product="unknown"
/tool="DeepBGC"
proto_core 1..2539
/protocluster_number="1"
/product="unknown"
/tool="DeepBGC"
ORIGIN
1 atgaacgcgc agcagcctga agatgtttac acccacgggc accacgagtc ggttgtccgg
61 gcccatgcct cgcggacggc cgagaattcg gccgcgtttg tcattccgca tctcaccccg
121 gggacgtcgg tcctcgacgt cgggtgcgga ccaggcagca tcacgtgcga tttcgcgggg
181 ctggttgcac ccgcgcaggt catcggcttg gatcgctcgg cggacatcgt cgcccaggca
241 acggaactgg caaaggaccg cggcgtagac aacgtggagt tccgaaccgg caacatctac
301 gatctcgagt ttgaggacga gaccttcgac ctcgtccacg cccatcaagt cctccagcac
361 ctcaccgatc ccgtcgccgc gctgcgtgag atgcgccgcg tggcaaaacc gggcgcgatc
421 gtggccgtcc gcgacgccga tttccacggc atgagctggt acccggaagt ccccgagctc
481 gatgactgga tggagctcta ccagaagatc gcacggcgaa acggtgccga accggatgcc
541 ggccgtcgat tggtctcgtg ggcacagcag gcaggctttg cccaggtggc gcccagcagc
601 agcaactggc tctacgccac agcccaacaa cgggcatggc agtcccgcgt gtggagcgaa
661 cgtgtcctcc actccgcttt tgccgagcaa gccctcgaat acgggttcgc caatgaggcc
721 gacctcgccc ggatcgctgc gggctggcac cgctggggag ccacggacga tggctacttc
781 ctcattccca acggcgaggt gatcgcgcgg gcctaggttg ggcgtgaagg aactttccca
841 aaaaactttg cgaaccatgt agaaaagcgc cccgcagttc cgacccatga gtgaaagcac
901 ccaaacaggg tgccacttca taaggagttt gagatgaaat acatgatcat gatgttcggg
961 tccgccgagg gcatgatgga aaccgccgat ccggagtggg tcaaggaaat gatcgggttc
1021 atgatccaga tcgacaagga cctccgcgat tccggtgaac tcgtctttaa cgcagggctg
1081 gctgatggca gcaccgcgaa gctcgtcaag cagaccccgg acggcgtcat caccacggac
1141 ggcccatacg ccgagtcgaa ggagtcgctg atcgggtact gggtggtgga tgtggccagc
1201 gaggaacgcg ctgtggaaat ctgctcgagc atcgtcaagt actcgcaagt ggttgagctc
1261 cgccctatcc cggacggtcc cccggaggtc tagtttggcc gttgcaccac gcgaggttcc
1321 cccacgcgac attgaggacc tgctgcgcac cttggcgccg caggtcctct ccgtgcttgc
1381 acgcaaccac ggacagttcg acgcctgcga ggatgcagtg caagaagccc tcattgaggc
1441 cgccctgcaa tggccatccg ggctgccaac aaatccgaag ggatggctgc tggctgtcgc
1501 atcgcgccgg ctggttgatg tgtggcgcag tgaaagcgcg cgccgcgccc gggaggaacg
1561 cgtcgccgca atggaggtca atttccagga cggcgcggcg tctgaagccg acgacaccct
1621 gaccctgatg ttcctgtgtt gccacccctc gatcagcgca ccatcgcaac tcgccttgac
1681 gctacgggcc gtaggaggac tcacgacggc ggaaatcgcc tctgctttcc tcgtccccga
1741 ggcgaccatg ggccagcgca tcagccgtgc gaaacaaggg atccagaagg caggcgcccg
1801 gttcgacatg ccgccggagt ccgagcggaa ggcgaggctc ggcgtcgttc ttcacgtcct
1861 gtacctgatc ttcaacgaag gctacgccgc aagctcgggc gactcgctgc aacgcgaaga
1921 cctcaccacc gaagcgatcc ggttggcacg cctgctggtg agcgccgcgc cggcagagct
1981 cgaggccacc ggtttgctcg ctctgatgct gctgacggac tcccgtcgtg ccgcgcgcac
2041 cctggctgac gggatgcccg tgccgctgtc cgaacaggac cggactctat ggaaccgcgg
2101 gcagatagag gaaggcatcg cactcctgtc ttccgtgctt ggacgtggcg cggctgggcc
2161 gtaccaactc caggccgcga tcgctgctgt gcatgcggag gcgccgtctg atgccgagac
2221 ggactggccg cagattctcg ccttgtacac agtccttgaa gccgtagcgc ccagtcccgt
2281 ggtgacgctg aatcgtgctg tagccgtggc catggtgaat gggccagccg ctgggcttga
2341 gctgctggcc cggctggatt cggcaattgg tcggtcgcat cgcctggatg cggtacgagg
2401 ccatctgtat gaaatggcag gctcgtatgg ggaagcgcgt gccgcctacc tcgccgccgc
2461 caagaaaacg ggcagcctgc aggagcgacg gtacttgatg gggaaagtgg cgcgaatgga
2521 ctcgtcaggg ggttcctga
//
LOCUS GCA_005860785 8747 bp DNA UNK 01-JAN-1980
DEFINITION .
ACCESSION GCA_005860785_8650686-8659433
VERSION GCA_005860785_8650686-8659433.1
KEYWORDS .
SOURCE GCA_005860785
ORGANISM GCA_005860785
.
COMMENT ##antiSMASH-Data-START##
Version :: 6.1.1
##antiSMASH-Data-END##
FEATURES Location/Qualifiers
CDS 1..642
/locus_tag="GCA_005860785_GCA_005860785_7712"
/deepbgc_score="0.78045"
PFAM_domain 31..249
/db_xref="PF05724.10"
/evalue="1.4e-08"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Thiopurine S-methyltransferase (TPMT)"
/deepbgc_score="0.25303"
PFAM_domain 103..432
/db_xref="PF01209.17"
/evalue="6.5e-07"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="ubiE/COQ5 methyltransferase family"
/deepbgc_score="0.41177"
PFAM_domain 115..228
/db_xref="PF03848.13"
/evalue="3.1e-06"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Tellurite resistance protein TehB"
/deepbgc_score="0.52720"
PFAM_domain 118..447
/db_xref="PF13489.5"
/evalue="3.4e-12"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.70708"
PFAM_domain 121..438
/db_xref="PF05175.13"
/evalue="3.2e-09"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase small domain"
/deepbgc_score="0.75342"
PFAM_domain 124..333
/db_xref="PF05401.10"
/evalue="0.00041"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Nodulation protein S (NodS)"
/deepbgc_score="0.85638"
PFAM_domain 127..435
/db_xref="PF06325.12"
/evalue="0.0057"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Ribosomal protein L11 methyltransferase
(PrmA)"
/deepbgc_score="0.88533"
PFAM_domain 130..288
/db_xref="PF12847.6"
/evalue="0.0094"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.91676"
PFAM_domain 133..441
/db_xref="PF13847.5"
/evalue="6e-10"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.95727"
PFAM_domain 133..303
/db_xref="PF09445.9"
/evalue="0.00017"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="RNA cap guanine-N2 methyltransferase"
/deepbgc_score="0.95776"
PFAM_domain 136..429
/db_xref="PF08241.11"
/evalue="4.3e-14"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.97305"
PFAM_domain 136..423
/db_xref="PF08242.11"
/evalue="7.1e-13"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.97477"
PFAM_domain 136..417
/db_xref="PF13649.5"
/evalue="8e-17"
/locus_tag="GCA_005860785_GCA_005860785_7712"
/database="31.0"
/description="Methyltransferase domain"
/deepbgc_score="0.97198"
CDS complement(907..1473)
/locus_tag="GCA_005860785_GCA_005860785_7713"
/deepbgc_score="0.93716"
PFAM_domain complement(928..1017)
/db_xref="PF01923.17"
/evalue="0.0031"
/locus_tag="GCA_005860785_GCA_005860785_7713"
/database="31.0"
/description="Cobalamin adenosyltransferase"
/deepbgc_score="0.93115"
PFAM_domain complement(1156..1434)
/db_xref="PF09836.8"
/evalue="0.0022"
/locus_tag="GCA_005860785_GCA_005860785_7713"
/database="31.0"
/description="Putative DNA-binding domain"
/deepbgc_score="0.94318"
CDS complement(1478..2053)
/locus_tag="GCA_005860785_GCA_005860785_7714"
/deepbgc_score="0.96149"
PFAM_domain complement(1505..1843)
/db_xref="PF16859.4"
/evalue="6.6e-26"
/locus_tag="GCA_005860785_GCA_005860785_7714"
/database="31.0"
/description="Bacterial transcriptional repressor
C-terminal"
/deepbgc_score="0.95843"
PFAM_domain complement(1880..1990)
/db_xref="PF00440.22"
/evalue="1.7e-10"
/locus_tag="GCA_005860785_GCA_005860785_7714"
/database="31.0"
/description="Bacterial regulatory proteins, tetR family"
/deepbgc_score="0.96455"
CDS complement(2127..2987)
/locus_tag="GCA_005860785_GCA_005860785_7715"
/deepbgc_score="0.95681"
PFAM_domain complement(2130..2783)
/db_xref="PF12679.6"
/evalue="0.0069"
/locus_tag="GCA_005860785_GCA_005860785_7715"
/database="31.0"
/description="ABC-2 family transporter protein"
/deepbgc_score="0.95412"
PFAM_domain complement(2154..2738)
/db_xref="PF12698.6"
/evalue="0.00033"
/locus_tag="GCA_005860785_GCA_005860785_7715"
/database="31.0"
/description="ABC-2 family transporter protein"
/deepbgc_score="0.95056"
PFAM_domain complement(2373..2915)
/db_xref="PF12730.6"
/evalue="5e-05"
/locus_tag="GCA_005860785_GCA_005860785_7715"
/database="31.0"
/description="ABC-2 family transporter protein"
/deepbgc_score="0.96575"
CDS complement(2984..3895)
/locus_tag="GCA_005860785_GCA_005860785_7716"
/deepbgc_score="0.95974"
PFAM_domain complement(3341..3529)
/db_xref="PF13304.5"
/evalue="7.8e-09"
/locus_tag="GCA_005860785_GCA_005860785_7716"
/database="31.0"
/description="AAA domain, putative AbiEii toxin, Type IV TA
system"
/deepbgc_score="0.95292"
PFAM_domain complement(3428..3844)
/db_xref="PF00005.26"
/evalue="3.7e-25"
/locus_tag="GCA_005860785_GCA_005860785_7716"
/database="31.0"
/description="ABC transporter"
/deepbgc_score="0.96400"
PFAM_domain complement(3764..3838)
/db_xref="PF13555.5"
/evalue="0.0049"
/locus_tag="GCA_005860785_GCA_005860785_7716"
/database="31.0"
/description="P-loop containing region of AAA domain"
/deepbgc_score="0.96230"
CDS 4012..5214
/locus_tag="GCA_005860785_GCA_005860785_7717"
/deepbgc_score="0.95110"
PFAM_domain 4585..4782
/db_xref="PF07730.12"
/evalue="1.7e-17"
/locus_tag="GCA_005860785_GCA_005860785_7717"
/database="31.0"
/description="Histidine kinase"
/deepbgc_score="0.95964"
PFAM_domain 4930..5052
/db_xref="PF02518.25"
/evalue="0.0056"
/locus_tag="GCA_005860785_GCA_005860785_7717"
/database="31.0"
/description="Histidine kinase-, DNA gyrase B-, and
HSP90-like ATPase"
/deepbgc_score="0.94257"
CDS 5211..5870
/locus_tag="GCA_005860785_GCA_005860785_7718"
/deepbgc_score="0.96030"
PFAM_domain 5220..5552
/db_xref="PF00072.23"
/evalue="1.2e-23"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Response regulator receiver domain"
/deepbgc_score="0.94182"
PFAM_domain 5673..5801
/db_xref="PF04545.15"
/evalue="0.00012"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Sigma-70, region 4"
/deepbgc_score="0.95638"
PFAM_domain 5676..5837
/db_xref="PF00196.18"
/evalue="3e-15"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Bacterial regulatory proteins, luxR family"
/deepbgc_score="0.97019"
PFAM_domain 5679..5798
/db_xref="PF08281.11"
/evalue="1.2e-05"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Sigma-70, region 4"
/deepbgc_score="0.97018"
PFAM_domain 5715..5789
/db_xref="PF13412.5"
/evalue="0.0016"
/locus_tag="GCA_005860785_GCA_005860785_7718"
/database="31.0"
/description="Winged helix-turn-helix DNA-binding"
/deepbgc_score="0.96292"
CDS complement(5905..6498)
/locus_tag="GCA_005860785_GCA_005860785_7719"
/deepbgc_score="0.93610"
PFAM_domain complement(6106..6474)
/db_xref="PF11716.7"
/evalue="4.1e-17"
/locus_tag="GCA_005860785_GCA_005860785_7719"
/database="31.0"
/description="Mycothiol maleylpyruvate isomerase N-terminal
domain"
/deepbgc_score="0.94886"
PFAM_domain complement(6211..6468)
/db_xref="PF12867.6"
/evalue="0.0017"
/locus_tag="GCA_005860785_GCA_005860785_7719"
/database="31.0"
/description="DinB superfamily"
/deepbgc_score="0.92334"
CDS complement(6663..7181)
/locus_tag="GCA_005860785_GCA_005860785_7720"
CDS complement(7414..7854)
/locus_tag="GCA_005860785_GCA_005860785_7721"
/deepbgc_score="0.82746"
PFAM_domain complement(7603..7746)
/db_xref="PF03779.13"
/evalue="1.1e-16"
/locus_tag="GCA_005860785_GCA_005860785_7721"
/database="31.0"
/description="SPW repeat"
/deepbgc_score="0.82746"
CDS 8100..8747
/locus_tag="GCA_005860785_GCA_005860785_7722"
/deepbgc_score="0.78500"
PFAM_domain 8208..8303
/db_xref="PF00440.22"
/evalue="7.5e-06"
/locus_tag="GCA_005860785_GCA_005860785_7722"
/database="31.0"
/description="Bacterial regulatory proteins, tetR family"
/deepbgc_score="0.78500"
region 1..8747
/region_number="1"
/candidate_cluster_numbers="1"
/product="RiPP"
/contig_edge="False"
/tool="DeepBGC"
/activity="antibacterial-cytotoxic"
/note="DeepBGC_Score: 0.90556"
cand_cluster 1..8747
/candidate_cluster_number="1"
/protoclusters="1"
/kind="single"
/product="RiPP"
/tool="DeepBGC"
protocluster 1..8747
/protocluster_number="1"
/product="RiPP"
/tool="DeepBGC"
proto_core 1..8747
/protocluster_number="1"
/product="RiPP"
/tool="DeepBGC"
ORIGIN
1 atgacggatc acctcgacca cgcctccgcc gaagagttct gggacgcccg ctacggttcc
61 agcgaccgca tctggagcgg caaccccaat gccgccctgg tccgcgaaac ggccggactg
121 accccgggaa gcgccctgga cctcggatgc ggggagggag ccgacgcgct gtggctggcg
181 cagcagaact ggaaggtcac cgccgtcgac atctcccgga aggcactgga gcgaggggcc
241 gagcatgcgg ctgccgcggg tgtggccgac cggatcgact ggcagcggtg cgatctggcg
301 gtgtcgttcc ccaccggcgc cttcgacctg gtctccgccc acttcctgca ctcgcccgtc
361 acgatgccga gggagcagat cctgcgcagc gccgcggcgg ccgtcgcccc gggcggtgtg
421 ctgctggtcg tcgggcacgc cggatcccct tcctgggtgg ccgacaaggg tcacgccccg
481 cacttcccca catctgaaga ggtcttcgcg ggcctcgatc tcccggccgg gcagtgggag
541 gtgctcctcg cggacgtcca cgagagcggg atgaccggtc ccgacgggca gcccgcaacc
601 cgcacggaca acaccctgaa gctccggcgc ttggcgcagt gatcccctgc ccgtaggact
661 cgcctcgccc ccgtcggtag gtgccgacgg gccgggtcac ctggacagcg ggaagcggcg
721 gcagcaggac ggtcagcgcg atcagcaccg cgatgctcag gacacccatc acccctgttc
781 ggtggtgttc gttcgaaagg ccatgagccc atggaacagt cgcagcccgc tgcgatccgg
841 cgggttttcg acctcaccct gagcgttcgc gagcgagccg tcacgtcagc gatgccgatg
901 agccgttcag ccggaggctc cgtcggcatc ggcggcctcg gcggcggtac gctcggcccg
961 gcgcgcgacg gcgcgcagca cctcgagcct gatgaacgcg gcaccgccca gccggatgaa
1021 gagccagtag gggcggaaga tccgtcgcga gcgggcgtcg gtggcccgca ccctggtctc
1081 gaaggagacg acggtggcgt caccgtcggg caccacccgc agactcatgg ccgccttcgc
1141 ccaccccggc tccgcgaatc gggcgaaggc atcggcgtcg cccggctcga tcggtgcgga
1201 cgggggccga agccgccaga acttggcgac catgcccttg accagctccg ctccctccac
1261 ggtgctgagg gtcggcgtcg ggaacgtgtc gagaaacggg cctttgagcc ggttccggcc
1321 ggtcgtccgc agcagcatca gccggcgcga gacggggagt tcctcggcgg tgacgtcgag
1381 cagggcccgc catacggtgg ccgggtccgc cgcgatgtgc cgggtgaaac aggagcggaa
1441 gtcgtggacc ggaagcaggc ggtcgagttc catcgggtca ctccggagtg tcgagcccgg
1501 cggccaccgc gtcggccagc cgccggggca ggtccgacgg cggtgtgccg ccgagcatga
1561 acagatggac gaagacggtg ccgagcagtt gcgcgtgcgc cagtaccgga tcggacccgc
1621 tgcggatctc accccgctgg gtggcccgtt cgaggatccg ggccagtacg ttctgctgac
1681 ggtcgatgaa cgtggcggcg aaccgggtcg ccaactccgg gtcggcggcg atgtcggcga
1741 tcagccccgg cagcgccttc cgcgcgtgcg gcgcggacag caggacgatg atctgctcga
1801 cgagagcggt cagatcggcc cgcagaccgc cgaggtcggc gggcggggcg atctccgtcc
1861 cgtgcaccac ggccacgaac accatctcgg ccttggaccg ccaccgccgg tagatccccg
1921 ccttgccgac ccccgcccgg gccgcgacgc ggtcgatgct caactgctgg tacccgacct
1981 cgtccaggat cccccgtacg gcgtcactga ccgccgcgtc gaccttgccg tcccgcgggc
2041 gtcctcggtt catggcctca gcataaacga aacggatcga tccgtaaatg aaccggcgag
2101 gacaacgggc ccgatgcgcc cggggctcac acgtcccggc gccgcacgac aacgatggcg
2161 acgaccaccg acaccaacgc ccagcagccg agggcgatcc aggacccggt gatcgttgcc
2221 gggtacttac ccgggtcgaa gtacgggttg tccgggttct ccttcagccg gtccaccgcg
2281 gcgatgggca ggcagtggcc gatctccttg aggagcttgt accggtcacc gccgaacatc
2341 aggggagcga tgaacagcag ggacaccagg ccgacgatcg tggccgtggc gtgccggatg
2401 acggcggcga aggcgatgcc gatgagggcc gacaccggga cgatcagcgc gtacgccgtg
2461 acggcgcgca gacagccggg gtcgttgatc gagaggccga catggcgtga ggccagcatg
2521 gcgttggtgc cgaagaagga cgccgtcgag atgatcacgc ccatgacgag ggtgaccgcg
2581 gtgacgacga ccaccttggc cgccaccacc gctcgccggt cgggtacggc ggcgaacgtc
2641 gtacggacca ttccggtggc gtactcgccg aagatcgaga tggcgccgac gctggccgcc
2701 gcgagcatca tgaggtagac ggcgatgctg ttgagaccgt ggaagagcgg gtcgtaccgg
2761 tacgggggca tccccggcct cgcggactgg tcgatgtagg tcaggtcggt gtggacggcg
2821 ttgagattga ccgcgatggc gaccaacgcg ctgaccacca gcacccagta ggtcgaccgg
2881 agcgaccgca ttttgatcca ctcggccgcg agcaggtcca cgaaccgcac cgcgggatcg
2941 gtcattcgcg ccggcggcgt tacggcggtc gaggtcatcg tgctcatcgg ggatctcctg
3001 cgaggtattc aacgctgtcg gcggtgagtt ccatgaacgc ggtctccagc gaagaggtct
3061 gggtcgcgag ttcgtgcagc atgatgcggt gcaggtgggc gagttcgccg atccgggccg
3121 cactcagtcc ggccacccgc cccaccgtgc cgctctcatg acgcaccgac gcaccctcgg
3181 cggtcagtac ggttgccagc gcgtcgagtt cgggcgtctg cacggtcacg cccgtgccgg
3241 tgctgcgggc cgagaagtcc tccaggctct ggtcggcgat gagccggccc tgaccgatca
3301 cgatcagatg atcggcggtg tgctccatct cgcccatcat gtggctggag acgaacacgg
3361 tgcggccctc ggacgccagc cgtcggaaca ggccgcgcac ccacagcacg ccttccgggt
3421 ccaggccgtt gagcggttcg tcgaacagca gcaccggagg atcaccgagc agagcgcccg
3481 cgatgcccag ccgctgcttc atcccgaggg agaacccgcc gatgcggcgc cgcgccgcct
3541 tggccagccc gacctccgcc aggacatcgt ccacgcggtg cagcgggatg cggttgctgc
3601 gcgccagggc ggacagatgc gccgcggcgc tgcgcccgcc gtgcacgtcc tgcgcgtcca
3661 gcagcgcgcc gacatggcgc agtccgcgcg gccggtccgc gaacgggacc ccgtcgacgg
3721 tgacggaccc gccggtcggg gtgttcagtc ccaggatcat ccgcagagtg gtggacttgc
3781 ccgcaccgtt cgggccgagg aatccggtca cctgtccagg tcgcacggtg aaggtcaggg
3841 tgtcgacggc gagggtgtcg ccgtagctct tggtcagtcg gtcggcttca atcacgatgc
3901 caacggtgcc ggtccggcaa ccggtgccgc atcgggccgg cggccacctt ccgcaagtgc
3961 cgaggtggcc ccggggacta catccacggg ccaatgcgca ggtggaggcc ggtgacctac
4021 gatcgggcca tgaccgtcac ccaccgtcct ccattgctga agcgcttgcc catgggcgtg
4081 tgggtcggcg ccttctggtc gacgctcatc cttgtgcgct ccttccaacg gccggacgag
4141 ttccgtcatc tgaccgaact cgacggcaac atcgagggtg gcccgctgct catcgtcgcg
4201 gtcgtcacca ccttcggcgc cctgctgctg ttccgcgcgc cgctggcgtc actcggcctg
4261 gcgctcgcgg gcgtcgtcgt cgccctcgga tcgcgtgtgg tggaggcgcc gatcgtggtc
4321 ttcctgctgg ccgacggagt ggtgggctac atcgcggcca cccgctcgcg ccggatctcg
4381 atcctcgccg ccatgctgcc ggtcgttctg gtgacggctt tcacggtcac ccggctgata
4441 cgcgaagggg acgccgggat cgcggcggag gccgccgtcg cgtcgaccgc ggtcatcgcc
4501 tggctgatcg gcaacacgat ccaccagagc cgcgcctaca ccgagacgct gcgctcccgg
4561 gccactcaac aggcggtcac cgccgagcgg ctgcggatcg cccgcgaact gcacgacatg
4621 gtcgcgcaca gcatcggcat catcgccatc caggcgggtg tggccagccg cgtcatggac
4681 agtcagcccg acgagacccg caaagcactc gacgccatcg aggccaccag ccgcgagacc
4741 ctgtccgggc tccggcggac gctgcgcgcg ctgcgccagt cggatgcgga ctcggcgccc
4801 ctcgacccgg cgccgggact ggccgacgtc ggacaactgg tcgcgaagac cagggaagcc
4861 ggtgtgcgcg tcgacgtccg gtggcgcggc gaacgcagac ccgtgcccgc cgacatcgac
4921 ctctccgcct tccgcatcat ccaggaggcg gtcaccaacg tcgtgcggca ctccggcacc
4981 cgggactgca gcgtgagcgt cgactaccgg gacgaggagt tgtccatcga ggtcgtcgac
5041 ctcggctgcg gcggcgaggg gggagcgggg tacggcatcg tcggtatgcg cgagcgggtc
5101 agcctgttgc acggcgaatt cagcgccggc ccgcggcccg aaggcggctt ccgcgttgcc
5161 gcacggctcc cggtgcccgc cggggccggg atggccgggg tgaccgcccg atgatccgcg
5221 tcatcctcac cgacgaccag cccctggtcc gcaccggcct gcgcgtcctg atcgccgaca
5281 cccccgacat cgaagtcgtg ggtgaggccg cgaacggcgc cgaggcggtg tccctggcca
5341 cggaactgcg ccccgacgtc gccgtcatgg acatccgcat gcccgtgctg gacggcatcg
5401 gagccgcgcg actcatcacc gaggacccgc agttgccgac gcgcgtcctc gtcctcacca
5461 cgttcgacga cgacgagtac gtctacgccg cgctgcgcgc cggagcgagc ggattcctcg
5521 tcaaggacat gccgctggag tccattctcg acgggatccg cgtcgtcggc gccggtgacg
5581 ccctgatcgc ccccagcgtc acccgccgcc tcatcgcgga gttcgccggc cgccccgagc
5641 ccaccacccc gcaccccgcc ccggtggacg gcgtcaccaa ccgcgagcgc gaagtcctga
5701 cactcgtggg ccgcggtctg tccaacaccg agatcgccga ggaactcgtc atcagcgtct
5761 ccaccgcaaa agcccacgtc gcccgcctgt tcaccaaact cgccgcccgc gaccgcgtcc
5821 acctcgtcat catcgcgtac gaactcggcc tcgtatcccc gccccgctga cgcctactgc
5881 cagggcttcc cgcctttccc tgtatcaggc ggcggcccat gccgggtcgc gtccggtgag
5941 cgcgacgatg cggtcgagga gcggggcgtc ggaggggatg gcgacggcgg ggccgaagat
6001 gccctggcgg gcgggggact gtcccggggc ggccaggggc tccacgaact cgcggcagcc
6061 ttcgagcgtt tggacgtcgg cggagaacgg ctgcccggtg gcgcgggcga tgtcccagcc
6121 gtggatggtc agttcgttga ggcccaccct gccgatcatg ccggcgggca tcgtgacacc
6181 gccgacctcc gtcatcccct cccaggcggc cggatcgcgc caggcctcgg cgagggtggt
6241 gagctgctcg gggatgcggg tgcgccagtc cgcgccgagc cgggaggcgt cgggggaggg
6301 cggttgggaa ccctcgctga gcggcgtttt ccgggcggcg atgaggaagg cgtgggagag
6361 gccgtcgaca tggtcgagga ggtcgccgag ggtgtacttg gcacacggtg tcggcgcggt
6421 gagctgttcg tcggggatgg cgctcagcag gtccgccagt cgctgggccg cgggaccgag
6481 gtcgagcatc gctgtcatgg gagtgactcc ttcggcgcgt gttcgcagta tgcgtggcaa
6541 cgaaggggta gacggcatgg cgccgcggaa ctcatcggtc gcgccgatgg tcgcgccgat
6601 attgtttcgc gcggcagcct ggaggccccc ggacggccgc gtccgacctg gtggagcccg
6661 gcctagtgga gactgcgctg catccacggc acgaggtcgc gctcctcgtg caggaggtac
6721 tggtcgatct cgccgagcac gagctgcgtg gtgagcgtca tgccctcgac ggccaccttg
6781 ccggcgatga cgtcgtccag acggctcagc agctgagcac cgtcctcgcg atcgcgcacc
6841 accgccacgg aggccttgtc gtgccgctcc agaagcgggt ggaccgtcgc ttgtttcgcc
6901 gcgtagtgcg cccggaacgc ctcccggaac gcgggcaggc cctccggtgc gcccgtggga
6961 accctgtgca ggtcctgcag ggtggcccgc agggcgcggc tgcgggcgat gagagcgtcc
7021 gtcgcgacgg tgtgaatctc gtgcgcctgc gcctcgtcgt gccggcgctt ctcggcgttg
7081 gcgttgaatt ccgccccgtg ctggttgatc tcgtgcggac cctgggggtg gtcctgcgac
7141 gggcggttct gcgacgcgtg gtcctgatcc gaagcggtca tggtgggcga cacctcctgg
7201 ttgcatccac tcacgctgtc ctggtacccg caatccagga gcggacgccg ccgcggacgc
7261 cacgtcccca cgaatggacg cccggccggg gccgggcgtg tccagggtga tcgggccggc
7321 ttccgccgcg cgcacggatg gccgtccccg gcattcgccg gcccctccag gacgaccatc
7381 agtactttgt ccgcgtcgga tgtggtcccg gcgtcaggac tctctcttgt tcctcatcat
7441 cattcctgcc gccgcgaggc cgagcaggca gatgacggcg ccggtgatga tgttgttcca
7501 catcgcgccg gtctcggggt gacggatgac cacccacggg gagacgatca gccagacgcc
7561 gatcgccgcg cacgccaaac tcatgccctg catgcgatcc ggggccatgg tcagacacag
7621 cgccaacacg gcgaccgcga tgcccatgat caggttgttc tgtgccaact gcggggacgt
7681 gggcgcgaag tgcaccgtcc agggcgagat ggcggcgtac agcccggcca ggagcagcag
7741 tccgtcgact cctacgatcc cccggcctcc gagcacgcgg gcgtagcgat cacgcatttc
7801 cgatacgtcc ggatgtgctg cgatatcccc cgggcggtgt gacacgtcgg ccatacgact
7861 cgcctccttc ggcttatgca agtctgaccg cgtttgcggt atctaccgcg agtccattgt
7921 gcgcctaagt tgccgttatg tgtaggttct gccgtgccgt gctggcggcc tccgccggac
7981 ggggtgtccg acatgatgtt ggacaccatg tccgactgcc attaacttag gaaatgtccg
8041 atctggaggg agtgcgattc ggaaacgcac ctcaccggaa ggaccggaag gaggaaggaa
8101 tggcctcgat cacccgtccc cgttcgcagc agtcccagcg gcgcgccggc accgagcgcg
8161 tggtgttcgc cgcggtgcag cgcctcctcg acgcggggga gtgcttcacc gaactgggtg
8221 ttcggcgcat cgccagcgaa gcggggatcg cgcgctccac cttctatctc tgcttccagg
8281 acaagaccga ggtcctgatc cggctgaccg ccacgatgaa ggacgaactc ttcagcaggg
8341 gcgcggcctg gcgccccacc ggtccgggcg gcggacccga ggcgctggcc gccgtctacg
8401 cgggccggct cgcgtactgc cgggagcggg cgccactgct ggccgccgtc gcggaggtcg
8461 ccgcgtacga tccggtcgtt cgcgaggcca gggcgcagga gatcgagcgc ttcgcgcacc
8521 acatcgcgtc gctgctggag gaggagcaac gcgagggccg gctctccgcg gacgtcgacc
8581 cggtgaccgc cggacaggtg ctcgcctggg gcggggagca ggtcatcgcc cgtcaggtga
8641 cgaccggcgg cgcggaggac gacgccaagg tcgcccgcga gatggcgtac ggccaatggt
8701 tcggcaccta ccgccgccgg cctggccctg acgggacgcc gacctga
//
Any insight into how sensitive the model is to these missing inputs would be greatly appreciated!
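To put a number on "missing inputs", one option is to count how many features the two added `continue` checks actually skip. The following is only a minimal sketch, not part of NPBDetect: `count_skipped` is a hypothetical helper, and it assumes feature objects that expose `.type` and `.qualifiers` the way Biopython's `SeqFeature` does. The toy features below are stand-ins, not real data from the 31 genomes.

```python
from collections import Counter
from types import SimpleNamespace


def count_skipped(features):
    """Count CDS / PFAM_domain features that the two patched checks would skip.

    Hypothetical helper: `features` is any iterable of objects with a `.type`
    string and a `.qualifiers` mapping (as on Biopython's SeqFeature).
    """
    counts = Counter()
    for feat in features:
        if feat.type == "CDS":
            counts["cds_total"] += 1
            # Same condition as the check added in bin/gbk_to_fa.py
            if "translation" not in feat.qualifiers:
                counts["cds_skipped"] += 1
        elif feat.type == "PFAM_domain":
            counts["pfam_total"] += 1
            # Same condition as the check added in bin/PFAM_feats.py
            if "score" not in feat.qualifiers:
                counts["pfam_skipped"] += 1
    return counts


# Toy stand-ins for parsed features (illustrative only, not real data).
toy_features = [
    SimpleNamespace(type="CDS", qualifiers={"translation": ["MK"]}),
    SimpleNamespace(type="CDS", qualifiers={}),
    SimpleNamespace(type="PFAM_domain", qualifiers={"score": ["12.5"]}),
    SimpleNamespace(type="PFAM_domain", qualifiers={}),
    SimpleNamespace(type="PFAM_domain", qualifiers={}),
]

result = count_skipped(toy_features)
print(dict(result))
```

On real files the same function could be fed `record.features` for each record returned by Biopython's `SeqIO.parse(path, "genbank")`. If the skipped fraction turns out to be large, that would suggest the predictions rest on far fewer features than the model expects.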
Thanks for your very cool work on this tool!!!