Error with features coming from NCBI #7

@eggrandio

Hello,

I am using snapgene_reader to extract the plasmid name and length from a list of files to populate a database. I have been using it for a while without problems. Here is the script:

import csv
import glob
import os
import re

from snapgene_reader import snapgene_file_to_dict

#Batch import
plasmid_path = 'D:/Plasmid_maps/*.dna'
plasmidfiles = glob.glob(plasmid_path)

#Generate output file
output = []
for x in plasmidfiles:
    plasmidname = re.sub(r'^.*? - ', '', os.path.basename(x).removesuffix(".dna")) #Filename format is "pGG0001 - pDGB3a2+p35S_RUBY_NtEUt.dna"
    print(plasmidname)
    if plasmidname[0:4] == "000_":
        continue #Skip maps that start with "000_".
    y = snapgene_file_to_dict(x)
    seqlen = len(y["seq"])
    output.append((plasmidname, seqlen))

#Write file
with open("0000_plasmid_info.txt", "w", newline='') as the_file:
    writer = csv.writer(the_file, delimiter='\t')
    for x in output:
        writer.writerow(x)

However, for one of the files, I get the following error:

ValueError                                Traceback (most recent call last)
Cell In[26], line 20
     18 if plasmidname[0:4] == "000_":
     19     continue #I made an "extra file" to quickly access the most used plasmids. The filename starts by "000_".
---> 20 y = snapgene_file_to_dict(x)
     21 seqlen = len(y["seq"])
     22 output.append((plasmidname, seqlen))

File d:\miniconda3\lib\site-packages\snapgene_reader\snapgene_reader.py:165, in snapgene_file_to_dict(filepath, fileobject)
    163 parsed_qualifiers[qualifier["@name"]] = d_v = {}
    164 for e_v in qualifier["V"]:
--> 165     (fmt1, value1), (_, value2) = e_v.items()
    166     fmt = format_dict.get(fmt1, parse)
    167     d_v[value2] = fmt(value1)

ValueError: too many values to unpack (expected 2)
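The failing line unpacks `e_v.items()` into exactly two `(key, value)` pairs, so it raises `ValueError` whenever one `<V>` entry of a qualifier has more than two attributes. The sketch below reproduces that with plain dictionaries standing in for the parsed entries. The keys `@int`, `@text` and `@extra` are hypothetical, and whether the NCBI-derived features in the attached files really carry a third attribute is an assumption that has not been checked here. `parse_value_entries` is likewise a hypothetical tolerant variant, not snapgene_reader code.

```python
# Minimal reproduction of the unpacking done at snapgene_reader.py line 165.
# The dictionaries stand in for parsed <V> entries; their keys are hypothetical.

two_keys = {"@int": "1", "@text": "gene"}
three_keys = {"@int": "1", "@text": "CDS", "@extra": "ncbi"}


def unpack_like_library(e_v):
    # Same pattern as the library: expects exactly two (key, value) pairs.
    (fmt1, value1), (_, value2) = e_v.items()
    return value2, (fmt1, value1)


def parse_value_entries(entries):
    """Hypothetical tolerant variant: pair the first attribute (value) with
    the second attribute (key) and ignore any further attributes."""
    parsed = {}
    for e_v in entries:
        items = list(e_v.items())
        if len(items) < 2:
            continue  # nothing to pair up, so skip instead of crashing
        (fmt1, value1), (_, value2) = items[:2]
        parsed[value2] = (fmt1, value1)
    return parsed


print(unpack_like_library(two_keys))  # ('gene', ('@int', '1'))
try:
    unpack_like_library(three_keys)
except ValueError as err:
    print(err)  # the same unpacking error as in the traceback

print(parse_value_entries([two_keys, three_keys]))
# {'gene': ('@int', '1'), 'CDS': ('@int', '1')}
```

Ignoring extra attributes avoids the crash but silently drops data, so it is a stopgap at best; a proper fix would belong in snapgene_reader itself.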

Could it be that this file has two names?
I have tracked down the files that cause the problem, but I cannot find the cause (I am attaching them). Both plasmid files contain a gene sequence that was downloaded from NCBI, so that might be the issue.
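Until the parser handles these files, one way to keep the batch running is to catch the `ValueError` per file and report the failing files at the end. This is a sketch, not part of snapgene_reader: `collect_plasmid_lengths` and its `parse` parameter are hypothetical names, `parse` exists only so the function can be tried with a stand-in parser, and like the original script it assumes Python 3.9+ for `str.removesuffix`.

```python
import os
import re


def collect_plasmid_lengths(paths, parse=None):
    """Return (rows, failures) for a list of .dna paths.

    rows holds (plasmid name, sequence length) tuples; failures holds
    (path, error message) for files that raised ValueError while parsing,
    so one bad file no longer stops the whole batch.
    """
    if parse is None:
        # Imported lazily so the function can be tested with a stand-in parser.
        from snapgene_reader import snapgene_file_to_dict as parse
    rows, failures = [], []
    for path in paths:
        stem = os.path.basename(path).removesuffix(".dna")
        name = re.sub(r"^.*? - ", "", stem)  # keep the part after "pGG0001 - "
        if name.startswith("000_"):
            continue  # skip the "extra" shortcut maps
        try:
            record = parse(path)
        except ValueError as err:
            failures.append((path, str(err)))
            continue
        rows.append((name, len(record["seq"])))
    return rows, failures
```

Called as `rows, failures = collect_plasmid_lengths(glob.glob(plasmid_path))`, the rows can be written with the same `csv.writer` loop as in the script above, and `failures` lists the files that still need a fix in the parser.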
plasmids.zip
