Hello,
I am using snapgene_reader to extract the plasmid name and length from a list of files to populate a database. I have been using it for a while without problems. Here is the script:
import csv
import glob
import os
import re

from snapgene_reader import snapgene_file_to_dict

# Batch import
plasmid_path = 'D:/Plasmid_maps/*.dna'
plasmidfiles = glob.glob(plasmid_path)

# Generate output file
output = []
for x in plasmidfiles:
    # Filename format is "pGG0001 - pDGB3a2+p35S_RUBY_NtEUt.dna"
    plasmidname = re.sub(r'^.*? - ', '', os.path.basename(x).removesuffix(".dna"))
    print(plasmidname)
    if plasmidname[0:4] == "000_":
        continue  # Skip maps that start with "000_".
    y = snapgene_file_to_dict(x)
    seqlen = len(y["seq"])
    output.append((plasmidname, seqlen))

# Write file
with open("0000_plasmid_info.txt", "w", newline='') as the_file:
    writer = csv.writer(the_file, delimiter='\t')
    for x in output:
        writer.writerow(x)
However, for one of the files, I get the following error:
ValueError                                Traceback (most recent call last)
Cell In[26], line 20
     18 if plasmidname[0:4] == "000_":
     19     continue #I made an "extra file" to quickly access the most used plasmids. The filename starts by "000_".
---> 20 y = snapgene_file_to_dict(x)
     21 seqlen = len(y["seq"])
     22 output.append((plasmidname, seqlen))

File d:\miniconda3\lib\site-packages\snapgene_reader\snapgene_reader.py:165, in snapgene_file_to_dict(filepath, fileobject)
    163 parsed_qualifiers[qualifier["@name"]] = d_v = {}
    164 for e_v in qualifier["V"]:
--> 165     (fmt1, value1), (_, value2) = e_v.items()
    166     fmt = format_dict.get(fmt1, parse)
    167     d_v[value2] = fmt(value1)

ValueError: too many values to unpack (expected 2)
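For reference, the failing statement at line 165 unpacks `e_v.items()` into exactly two key/value pairs, so it raises this error whenever `e_v` has more than two entries. Below is a minimal sketch that reproduces the same error outside the library. The dictionary keys are hypothetical placeholders, not values taken from the attached files, and `unpack_two` is an illustrative helper, not part of snapgene_reader.

```python
def unpack_two(e_v):
    """Mimic the two-pair unpacking used at line 165 of the traceback."""
    (fmt1, value1), (_, value2) = e_v.items()
    return fmt1, value1, value2


# Hypothetical qualifier values; the real keys in the failing files are unknown.
two_entries = {"@text": "GFP", "@name": "label"}
three_entries = {"@text": "GFP", "@name": "label", "@extra": "1"}

# A dict with exactly two entries unpacks without error.
ok_result = unpack_two(two_entries)

# A dict with three entries raises the ValueError seen in the traceback.
try:
    unpack_two(three_entries)
    message = None
except ValueError as err:
    message = str(err)

print(ok_result)
print(message)
```

If this is what happens, the failing files probably contain a qualifier value with a third attribute, but that is an assumption until the files are inspected.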
It seems that this file might have two names?
I have tracked down the files that cause the problem, but I cannot find any obvious cause (I am attaching them). It seems these two plasmid files contain a gene sequence that was downloaded from NCBI, so that might be the issue.
plasmids.zip
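Until the cause is found, one possible workaround for the batch script is to catch the `ValueError` per file so that the remaining plasmids are still processed and the failing ones are listed. This is only a sketch: `collect_lengths` and its `parse_file` argument are hypothetical names introduced here, and it assumes the parser returns a dict with a `"seq"` key, as `snapgene_file_to_dict` does in the script above.

```python
def collect_lengths(paths, parse_file):
    """Parse each path and return (results, failures).

    results  -- list of (path, sequence_length) for files that parsed
    failures -- list of (path, error_message) for files that raised ValueError
    """
    results, failures = [], []
    for path in paths:
        try:
            record = parse_file(path)
        except ValueError as err:
            # Record the failing file and keep going with the rest of the batch.
            failures.append((path, str(err)))
            continue
        results.append((path, len(record["seq"])))
    return results, failures


# Intended usage with the script above (requires snapgene_reader):
#   results, failures = collect_lengths(plasmidfiles, snapgene_file_to_dict)
#   for path, error in failures:
#       print(f"Could not parse {path}: {error}")
```

This does not fix the parsing problem itself. It only keeps the database import running and makes it clear which files need attention.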