
Commit bed6412

v3.2.0
1 parent 48679f9


48 files changed (+743 / -1085 lines)

doc/Release.html

Lines changed: 44 additions & 4 deletions
@@ -7,9 +7,49 @@
 <!--#include virtual="./ssi/start1.html" -->
 
 
-<h3>TCW Version 3.1</h3>
-Version 3.1 releases are about tidying things up, without any major new features
+<h3>TCW Version 3.2</h3>
+Release v3.1.8 switched from the GO file <tt>go_&lt;date&gt;-termdb-tables.tar.gz</tt> to the GO file <tt>go-basic.obo</tt>,
+which is used to build the <tt>GOdb</tt>. This turned out to be a fairly big change,
+so this release is numbered <u>v3.2</u> to indicate the new GO source.
 
+<h4>v3.2.0 29-Mar-21</h4>
+<p><i>Existing GOdb and sTCWdb</i>:
+If you use GO evidence codes or EC (enzyme codes), you will want to recreate the GOdb (i.e. rerun <ttp>runAS</ttp>)
+and re-run <ttc>GO Only</ttc> from <ttp>runSingleTCW</ttp>.
+<ul>
+<li><ttp>runAS</ttp>
+<ul>
+<li>Parsing go-basic.obo:
+<ul>
+<li>As a sanity check, all UniProt GO IDs are checked for existence in the <tt>go-basic.obo</tt> file.
+<li>The last GO was not being saved.
+</ul>
+<li>Parsing UniProt:
+<ul>
+<li>Only the last EC (enzyme code) was being saved; now all ECs under "RecName" are saved.
+<br>Also, the text after the EC code
+is removed, e.g. "<tt>2.4.1.- {ECO:0000256|RuleBase:RU362057}</tt>" becomes
+"<tt>2.4.1.-</tt>".
+<li>The GO evidence code "IC" was being stored as "UNK".
+<li>The number of obsolete GOs in each UniProt file is printed to the terminal.
+<br>Obsolete GOs are still loaded into the GOdb. In the sTCWdb, an obsolete GO has no neighborhood and its description is prefixed with "obsolete".
+</ul>
+</ul>
+<li><ttp>runSingleTCW</ttp>
+<ul>
+<li><ttc>Evidence Codes</ttc>: <font color=red>Bug</font>: The evidence codes were wrong (I don't know which release broke this).
+<br>This has been fixed; now only the evidence codes from the UniProt hits that have the GO assigned are shown.
+</ul>
+<li><ttp>viewSingleTCW</ttp> - <ttc>Basic GO</ttc>
+<ul>
+<li>The interface has been updated to indicate that only assigned evidence codes are used.
+<li>Slight changes to <ttc>Show...</ttc> to make the 'alt_id' (replacements) more obvious.
+<li>Changed some terminology to be compatible with AmiGO, e.g. "GO term" changed to "GO ID".
+</ul>
+<li>Updated the <a href="stcw/AnnoDBs.html#obo" class=ext target="_blank">runAS</a> documentation to describe parsing the OBO file.
+</ul>
+
+<h3>TCW Version 3.1</h3>
 <h4>v3.1.9 25-Mar-21</h4>
 This release fixes a few tiny bugs created by the last release.
 <ul>
@@ -29,9 +69,9 @@ <h4>v3.1.9 25-Mar-21</h4>
 <li>Added item under <ttc>Table...</ttc> called <u>Each GO's parents with relation</u> which produces
 a popup or file with output like:
 <pre>
----------- GO:0000019 bio regulation of mitotic recombination
+---------> GO:0000019 bio regulation of mitotic recombination
 is_a GO:0000018 bio regulation of DNA recombination
----------- GO:0000027 bio ribosomal large subunit assembly
+---------> GO:0000027 bio ribosomal large subunit assembly
 is_a GO:0022618 bio ribonucleoprotein complex assembly
 part_of GO:0042255 bio ribosome assembly
 part_of GO:0042273 bio ribosomal large subunit biogenesis
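The UniProt parsing fix listed for v3.2.0 (keep every EC under "RecName" and drop the text after the code) can be illustrated with a short Python sketch. It is not TCW's implementation. The function name rec_name_ecs and the sample entry are hypothetical. The sketch assumes the standard UniProt flat-file layout, where each EC number sits on its own "DE ... EC=" line below the RecName, AltName or SubName it belongs to.

```python
import re

# A DE subsection header such as "RecName:", "AltName:", "SubName:",
# "Contains:" or "Includes:" at the start of the line content.
SECTION = re.compile(r"^(\w+):")

def rec_name_ecs(entry_lines):
    """Return every EC number listed under a RecName in one UniProt entry."""
    ecs = []
    section = None
    for line in entry_lines:
        if not line.startswith("DE"):
            continue
        content = line[2:].strip()
        header = SECTION.match(content)
        if header:
            section = header.group(1)          # remember which DE block we are in
        elif content.startswith("EC=") and section == "RecName":
            # "2.4.1.- {ECO:0000256|RuleBase:RU362057};" -> "2.4.1.-"
            ecs.append(content[3:].split()[0].rstrip(";"))
    return ecs

# Hypothetical entry fragment in UniProt flat-file layout.
entry = [
    "ID   EXAMPLE_ARATH           Reviewed;         100 AA.",
    "DE   RecName: Full=Example glycosyltransferase;",
    "DE            EC=2.4.1.- {ECO:0000256|RuleBase:RU362057};",
    "DE            EC=2.4.1.12;",
    "DE   AltName: Full=Example alternative name;",
    "DE            EC=9.9.9.9;",
]
print(rec_name_ecs(entry))   # ['2.4.1.-', '2.4.1.12']
```

The sketch tracks the current DE subsection and appends each EC instead of overwriting a single variable. This lets several EC lines under one RecName survive, rather than only the last one.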

doc/stcw/AnnoDBs.html

Lines changed: 77 additions & 35 deletions
@@ -5,14 +5,10 @@
 <body onload="init('set')">
 <a id="top"></a>
 <!--#include virtual="../ssi/start2.html" -->
-<b><font color=blue>19-Mar-2021 GO Update</font></b> -
 The <ttp>runAS</ttp> program creates a GO MySQL database (called <ttx>GOdb</ttx>) to be used by <ttp>runSingleTCW</ttp>.
 The <ttp>runAS</ttp> program has been updated to use the "GO Basic OBO file" in place of the "old GO MySQL tar file".
-Though TCW v3.1.8 will work with old <ttx>GOdb</ttx> databases, it is <i>strongly recommended</i> that you create a new <tt>GOdb</tt>
-and update any <ttx>sTCWdbs</ttx>. Even if the <ttx>GOdb</ttx> is newer than the downloaded <ttx>UniProt</ttx> files, the OBO file
-is more up-to-date than the latest GO MySQL tar file.
-<p><font color=blue>Note:</font> For the next release v3.1.9, I will be making improvements to the <ttp>viewSingleTCW</ttp> GO
-features to improve how the GO relations are shown.
+<i>If you used v3.1.8 or an earlier version, <b>it is strongly recommended that you update to v3.2.0 or later</b></i> (run <ttc>Build GO</ttc> in <ttp>runAS</ttp>,
+then <ttc>GO Only</ttc> in <ttp>runSingleTCW</ttp>).
 
 <p>To prepare for annotation with <ttp>runSingleTCW</ttp>, it is necessary to download the databases to be compared against.
 TCW provides support for downloading the taxonomic and full UniProts along with mapping from the UniProt IDs to GO, KEGG, Pfam, EC, and InterPro.
@@ -29,15 +25,16 @@
 <li><a href="#run">Using Java graphical interface -- <ttp>runAS</ttp></a>
 <li><a href="#details">Details and file structure</a>
 <li><a href="#clean">Cleanup</a>
+<li><a href="#mem">Memory and time</a>
 </ul>
 </td>
 <td class="top">
 <ul>
-<li><a href="#mem">Memory and time</a>
 <li><a href="#best">What AnnoDBs to use</a>
 <li><a href="#other">Creating AnnoDBs from other databases (e.g. NCBI-nr)</a>
 <li><a href="#runstcw">Entering AnnoDBs and GOs into <ttp>runSingleTCW</ttp></a>
 <li><a href="#tax">Why use taxonomic databases</a>
+<li><a href="#obo">Parsing go-basic.obo</a>
 <li><a href="#link">Links to important databases</a>
 </ul>
 </td>
@@ -68,19 +65,7 @@ <h2>Overview </h2>
 alt="curl" style="border: 1px solid black; width: 300px"></a> </td>
 </tr>
 </table>
-<p><b>Pre-v3.1.8 only</b>
-<p><a href="../img/bullet2.gif"><img src="../img/bullet2.gif" alt="bullet" style="border: 0px; width: 20px"></a>
-For mySQL, the command <b><ttx>mysqladmin</ttx></b> is used, so you may need to define its path, e.g. on Mac, </p>
-<pre>
-alias mysqladmin '/usr/local/mysql/bin/mysqladmin' #tcsh
-alias mysqladmin='/usr/local/mysql/bin/mysqladmin' #bash
-</pre>
-
-<a href="../img/bullet2.gif"><img src="../img/bullet2.gif" alt="bullet" style="border: 0px; width: 20px"></a>
-With MySQL and MariaDB, you may need to set following MySQL variable in order to add the GO database:
-<pre>
-SET GLOBAL local_infile = 1;
-</pre>
+
 <p><b><i>Processing steps</i></b>: The TCW <ttp>runAS</ttp> will perform the following: </p>
 <ol>
 <li>Create the directory under <ttx>projects/DBfasta</ttx> for the downloads and generated FASTA files.
@@ -145,7 +130,7 @@ <h4>
 which contains the following:
 <pre>
 GO_obodemo:
-go_obo
+go_basic.obo
 
 UniProt_demo:
 sp_bacteria/ sp_fungi/ sp_plants/ tr_plants/
@@ -187,8 +172,8 @@ <h2>Java graphical interface -- <ttp>runAS</ttp> </h2>
 (see <a href="#runstcw">Import AnnoDBs</a>). <br>
 &nbsp;
 <li><ttc>Check</ttc>: Selecting this button highlights everything that has been done.
-For example, the figure on the upper right shows that the directory <ttx>UniProt_Jan2021</ttx> has been created and
-only Archaea and Virus SwissProt have been downloaded and processed.
+For example, the figure on the right shows that the directory <ttx>UniProt_Mar2021</ttx> has been created and
+only Plant SwissProt has been downloaded and processed.
 The <ttc>Check</ttc> automatically runs on startup.
 </ol>
 <td><a href="img/runAS.png"><img src="img/runAS.png" alt="runAS" style="border: 1px solid black; width: 350px"></a>
@@ -227,8 +212,8 @@ <h2>Details and file structure </h2>
 <li>At the top:
 <ul>
 <li>If the <ttl>UniProt</ttl> directory label is highlighted in blue, it exists.
-<li>If the <ttl>GO</ttl> directory label is highlighted in pink, it exists but the GO tables have not been downloaded.
-<br>If the <ttl>GO</ttl> directory label is highlighted in blue, the GO tables have been downloaded.
+<li>If the <ttl>GO</ttl> directory label is highlighted in pink, it exists but the GO OBO file has not been downloaded.
+<br>If the <ttl>GO</ttl> directory label is highlighted in blue, the GO OBO file has been downloaded.
 </ul>
 <li>On the middle right:
 <ul>
@@ -237,16 +222,15 @@ <h2>Details and file structure </h2>
 </ul>
 <b><i>Taxonomic and Full UniProt Highlights</i></b>
 <p>
-<u>Clear checkbox</u>: If a taxonomic or full checkbox is clear, then neither the .dat file or .fasta file exists for it.
-When you check the box followed by "Build Tax", you will need to confirm a popup that states "Download SP - xxx",
-where xxx will be the list of files to download. The download is always automatically followed by creating the .fasta file.
+<u>Clear checkbox</u>: If a <ttl>Taxonomic</ttl> checkbox is clear, then neither the .dat file nor the .fasta file exists for it.
+When you check the box followed by <ttc>Build Tax</ttc>, you will need to confirm a popup that states "Download SP - xxx",
+where xxx will be the list of files to download. The download is always automatically followed by creating the .fasta files.
+The same applies to the <ttl>Full</ttl> checkboxes.
 </p>
 <p>
 <u>Pink checkbox</u>: If the .dat file exists, but the .fasta file does not, the checkbox will be highlighted pink.
-Check the pink box(s) and run "Build Tax" in order to create the .fasta file. You will need to confirm a pop-up that
-states "Create SP Fasta - xxx", where xxx is the taxonomic groups that will be created.
-Since the .fasta file is automatically created after download, this will not happen unless
-there is a problem such as the disk being full.
+Check the pink box(es) and run <ttc>Build Tax</ttc> in order to create only the .fasta file.
+The same applies to the <ttl>Full</ttl> checkboxes.
 </p>
 <p>
 <u>Blue checkbox</u>: If both the .dat file and the .fasta file exist, the checkbox will be highlighted blue.
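The checkbox colours described above reduce to a small decision rule based on which files exist. A minimal Python sketch follows. The function name checkbox_color and the .fasta file name are assumptions, not TCW's code. The text does not cover a .fasta file without a .dat file, so that case returns None.

```python
import tempfile
from pathlib import Path

def checkbox_color(dat_file, fasta_file):
    """Hypothetical helper: highlight colour of a Taxonomic or Full checkbox."""
    has_dat, has_fasta = Path(dat_file).exists(), Path(fasta_file).exists()
    if has_dat and has_fasta:
        return "blue"    # download and FASTA conversion are both done
    if has_dat:
        return "pink"    # .dat exists, the .fasta still needs to be created
    if not has_fasta:
        return "clear"   # neither file exists yet
    return None          # .fasta without .dat is not covered by the text

states = []
with tempfile.TemporaryDirectory() as d:
    dat = Path(d, "uniprot_sprot_plants.dat.gz")
    fasta = Path(d, "uniprot_sprot_plants.fasta")   # assumed file name
    states.append(checkbox_color(dat, fasta))
    dat.touch()
    states.append(checkbox_color(dat, fasta))
    fasta.touch()
    states.append(checkbox_color(dat, fasta))
print(states)   # ['clear', 'pink', 'blue']
```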
@@ -255,12 +239,12 @@ <h2>Details and file structure </h2>
 <tr>
 <td class="top">If you downloaded some taxonomic databases, created the SP Full database,
 then downloaded another taxonomic database, you will need to recreate the SP Full database. Check the SwissProt box,
-and run "Build Full" again. You will get a pop-up to confirm as shown on the right.
+and run <ttc>Build Full</ttc> again. You will get a pop-up to confirm as shown on the right.
 <td><a href="img/runASnoDL.png"><img src="img/runASnoDL.png" alt="" style="border: 1px solid black; width: 300px"> </a>
 </tr>
 </table>
 <p><b><i>File Structure</i></b> </p>
-<p>For each taxonomic and full UniProt that you downloaded, a directory will be created under the "UniProt" directory.
+<p>For each taxonomic and full UniProt that you downloaded, a directory will be created under the <ttl>UniProt</ttl> directory.
 For example, </p>
 <pre>
 ./TCW/projects/DBfasta/UniProt_Jan2021%&gt; ls *
@@ -284,7 +268,7 @@ <h2>Details and file structure </h2>
 gzip */*.fasta
 </pre>
 <p><b><i>GO (Gene Ontology)</i></b></p>
-<p>The <tt>go.obo</tt> file containing the schema and data is downloaded from
+<p>The <tt>go-basic.obo</tt> file is downloaded from
 <a href="http://current.geneontology.org/ontology/" class="ext" target="_blank"><ttx>http://current.geneontology.org/ontology/</ttx></a>.</p>
 <p><u>Database</u>: This text entry on the <ttp>runAS</ttp> interface is the name of the GO MySQL database that
 will be created; you will enter this name in <ttp>runSingleTCW</ttp>. </p>
@@ -639,6 +623,64 @@ <h2>Why use taxonomic databases instead of the full UniProt </h2>
 <p>The following shows the details of a specific sequence: </p>
 <a href="img/runASview.png"><img src="img/runASview3.png" alt="" style="border: 1px solid black;"></a>
 
+
+<!-- ============================================= -->
+<a id="obo"></a>
+<table style="width: 100%"><tr><td style="text-align: left">
+<h2>Parsing go-basic.obo</h2>
+<td style="text-align: right"><a href="#top">Go to top</a></td></tr></table>
+
+<i>The following is an example record in the OBO file:</i>
+<pre>
+[Term]
+id: GO:0000785
+name: chromatin
+namespace: cellular_component
+alt_id: GO:0000789
+alt_id: GO:0000790
+alt_id: GO:0005717
+def: "The ordered and organized complex of DNA, protein, ....
+comment: Chromosomes include parts that are not part of ....
+synonym: "chromosome scaffold" RELATED []
+synonym: "cytoplasmic chromatin" NARROW []
+synonym: "nuclear chromatin" NARROW []
+xref: NIF_Subcellular:sao1615953555
+is_a: GO:0110165 ! cellular anatomical entity
+relationship: part_of GO:0005694 ! chromosome
+</pre>
+
+<i>TCW parses the following keywords:</i>
+<table class=tabley>
+<tr><th>Keyword <th>AmiGO term<th>TCW term<th>Example
+<tr><td>id <td>Accession <td>GO ID <td>GO:0000785
+<tr><td>name <td>Name <td>Description <td>chromatin
+<tr><td>namespace <td>Ontology <td>Domain <td>cellular_component
+<tr><td>is_a <td>is_a <td>is_a <td>GO:0110165
+<tr><td>relationship: part_of<td>? <td>part_of <td>GO:0005694
+<tr><td>alt_id <td>Alternate ID<td>Alternate ID <td>GO:0000790
+<tr><td>&nbsp; <td>replaced by <td>Replaced by<td>GO:0000785
+<tr><td>is_obsolete: true<td>Name: obsolete<td>Description: obsolete<td>obsolete replicative cell aging
+</table>
+
+<p><i>Views in AmiGO and TCW:</i></p>
+<table class=tablex>
+<tr>
+<th>AmiGO<th>TCW
+<tr>
+<td valign=top><a href="img/runASgo785.png"><img src="img/runASgo785.png" alt="" style="border: 1px solid black; width: 450px"></a>
+<td valign=top><a href="img/runASgo785t.png"><img src="img/runASgo785t.png" alt="" style="border: 1px solid black; width: 350px"></a>
+<tr>
+<td><a href="img/runASgo790.png"><img src="img/runASgo790.png" alt="" style="border: 1px solid black; width: 250px"></a>
+<td><a href="img/runASgo790t.png"><img src="img/runASgo790t.png" alt="" style="border: 1px solid black; width: 350px"></a>
+</tr>
+</table>
+
+<i><u>NOTES:</u></i>
+<ol>
+<li>UniProt occasionally uses Alternate IDs and has a few obsolete GO terms.
+<li>I cannot guarantee that AmiGO always treats "alt_id" as specified here.
+</ol>
+
 <!-- ============================================= -->
 <a id="link"></a>
 <table style="width: 100%"><tr><td style="text-align: left">
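The keyword table in the new "Parsing go-basic.obo" section can be read as a specification for a small parser. The following Python sketch illustrates it under stated assumptions and is not TCW's code. The names parse_obo_terms, SAMPLE and alt_to_primary are hypothetical, and only the tags in that table are kept. The sample reuses the chromatin record shown in the section, without its truncated def and comment lines.

```python
def parse_obo_terms(lines):
    """Parse the [Term] stanzas of an OBO file into a dict keyed by GO ID."""
    terms = {}
    term = None  # the [Term] stanza currently being read, or None

    def save(t):
        if t is not None and "id" in t:
            terms[t["id"]] = t

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins, so the previous [Term] is complete.
            save(term)
            term = None
            if line == "[Term]":
                term = {"is_a": [], "part_of": [], "alt_id": [], "obsolete": False}
            continue
        if term is None or ":" not in line:
            continue  # outside a [Term] (e.g. [Typedef]) or a blank line
        key, _, value = line.partition(":")
        value = value.strip()
        if key in ("id", "name", "namespace"):
            term[key] = value
        elif key in ("is_a", "alt_id"):
            # ID-valued tags may carry a trailing "! label" comment.
            term[key].append(value.split("!")[0].strip())
        elif key == "relationship" and value.startswith("part_of "):
            term["part_of"].append(value.split()[1])
        elif key == "is_obsolete":
            term["obsolete"] = (value == "true")
    # No "[" line follows the final stanza, so save it explicitly;
    # skipping this step would silently drop the last GO in the file.
    save(term)
    return terms

SAMPLE = """\
[Term]
id: GO:0000785
name: chromatin
namespace: cellular_component
alt_id: GO:0000789
alt_id: GO:0000790
alt_id: GO:0005717
synonym: "chromosome scaffold" RELATED []
xref: NIF_Subcellular:sao1615953555
is_a: GO:0110165 ! cellular anatomical entity
relationship: part_of GO:0005694 ! chromosome
"""

terms = parse_obo_terms(SAMPLE.splitlines())
# Map each Alternate ID to the GO ID whose stanza lists it ("Replaced by").
alt_to_primary = {alt: go for go, t in terms.items() for alt in t["alt_id"]}
print(terms["GO:0000785"]["is_a"], terms["GO:0000785"]["part_of"])  # ['GO:0110165'] ['GO:0005694']
print(alt_to_primary["GO:0000790"])                                  # GO:0000785
```

The explicit save at the end of input matters. Without it, a parser of this shape silently loses the final term, which is one plausible way a last-GO bug like the one in the v3.2.0 release notes could arise.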

doc/stcw/img/runAS.png

566 Bytes

doc/stcw/img/runASdemo.png

172 Bytes

doc/stcw/img/runASgo785.png

21.2 KB

doc/stcw/img/runASgo785t.png

22.1 KB

doc/stcw/img/runASgo790.png

18.8 KB

doc/stcw/img/runASgo790t.png

15.9 KB

doc/stcw/runAS.log.html

Lines changed: 25 additions & 27 deletions
@@ -116,53 +116,51 @@
 Complete Full UniProt 5h:33m:22s
 
 ## Author's note: typically, the GOdb should be created at the same time as the UniProts.
-## However, the following illustrates the update with v3.1.8 to use the go-basic.obo file.
+## In this log, however, the UniProts were downloaded two months earlier than the GOs.
+## That said, there seem to be some obsolete GOs even when both are downloaded on the same day.
 
-Start GO processing go_Mar2021 18-Mar-21 11:02:30
+Start GO processing go_Mar2021 28-Mar-21 14:17:15
 UniProt directory: ./projects/DBfasta/UniProt_Jan2021
-GO temporary directory: projects/DBfasta/GO_oboMar2021
+GO temporary directory: ./projects/DBfasta/GO_oboMar2021
 Delete mySQL database go_Mar2021
-Use existing GO file
-50,514 Total GOs 3,305 Alt GOs
+URL: http://current.geneontology.org/ontology/
+50,514 Total GOs 3,305 Alt GOs 3,125 Obsolete
 32,542 Biological 56,609 is_a 5,533 part_of
 13,289 Molecular 15,018 is_a 11 part_of
 4,683 Cellular 5,127 is_a 2,097 part_of
 14 Slims 208 GOs in Slim
-Complete Load OBO file 0m:6s (4Mb)
+Complete Load OBO file 0m:6s (12Mb)
 Loading ./projects/DBfasta/UniProt_Jan2021 to go_Mar2021
 Processing sp_bacteria/uniprot_sprot_bacteria.dat.gz
-334,772 UniProts 1m:38s
-Processing sp_fungi/uniprot_sprot_fungi.dat.gz
-35,073 UniProts 0m:13s
-Processing sp_invertebrates/uniprot_sprot_invertebrates.dat.gz
-28,129 UniProts 0m:9s
-Processing sp_plants/uniprot_sprot_plants.dat.gz
-43,403 UniProts 0m:15s
-Processing sp_viruses/uniprot_sprot_viruses.dat.gz
-17,008 UniProts 0m:4s
-Processing tr_invertebrates/uniprot_trembl_invertebrates.dat.gz
-12,275,877 UniProts 35m:32s
-Processing tr_plants/uniprot_trembl_plants.dat.gz
-20,516,158 UniProts 47m:34s
+334,772 UniProts 5 Obsolete GOs 1m:16s (12Mb)
+Processing sp_fungi/uniprot_sprot_fungi.dat.gz
+35,073 UniProts 18 Obsolete GOs 0m:9s (12Mb)
+Processing sp_invertebrates/uniprot_sprot_invertebrates.dat.gz
+28,129 UniProts 8 Obsolete GOs 0m:7s (12Mb)
+Processing sp_plants/uniprot_sprot_plants.dat.gz
+43,403 UniProts 2 Obsolete GOs 0m:12s (12Mb)
+Processing sp_viruses/uniprot_sprot_viruses.dat.gz
+17,008 UniProts 1 Obsolete GOs 0m:4s (12Mb)
+Processing tr_invertebrates/uniprot_trembl_invertebrates.dat.gz
+12,275,877 UniProts 12 Obsolete GOs 28m:41s (12Mb)
+Processing tr_plants/uniprot_trembl_plants.dat.gz
+20,516,158 UniProts 4 Obsolete GOs 39m:14s (12Mb)
 Processing sp_fullSubset/uniprot_sprot_fullSubset.dat.gz
-105,587 UniProts 0m:51s
+105,587 UniProts 46 Obsolete GOs 0m:43s (12Mb)
 Totals:
-GO: 49,907,782 Pfam: 21,175,254 KEGG: 2,806,531 EC: 3,866,797 InterPro: 56,704,777
+GO: 49,907,782 Pfam: 21,175,254 KEGG: 2,806,531 EC: 3,736,965 InterPro: 56,704,777
 Compute levels
 19,633 Parent-child
 924,561 edges for biological_process; max level 18
 37,471 edges for cellular_component; max level 14
 28,245 edges for molecular_function; max level 13
 Add GO level numbers to term table
-Complete GO Levels 0m:14s (8Mb)
+Complete GO Levels 0m:14s (16Mb)
 Compute ancestors
 50,514 GOs to process ancestors
 676,026 Ancestor paths
-Complete ancestors 2m:52s (86Mb)
-Complete creating GO database go_Mar2021 1h:29m:36s (86Mb)
-
-Write ./projects/AnnoDBs_UniProt_Jan2021.cfg
-12 entries written
+Complete ancestors 2m:57s (94Mb)
+Complete creating GO database go_Mar2021 1h:13m:48s (94Mb)
 </pre>
 </body>
 </html>
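The per-file "Obsolete GOs" counts in the log, together with the sanity check that every UniProt GO exists in go-basic.obo, could be approximated as follows. This Python sketch is hypothetical and rests on several assumptions:

- UniProt GO cross-references have the form "DR   GO; GO:0005737; C:cytoplasm; IC:UniProtKB.".
- An Alternate ID is resolved to its primary GO ID before the check.
- Distinct IDs, not occurrences, are counted. TCW's counting rule is not stated.

The function name scan_uniprot_go is an assumption, and GO:9999991 and GO:9999992 are made-up placeholder IDs.

```python
import re

# "DR   GO; GO:0005737; C:cytoplasm; IC:UniProtKB." -> ("GO:0005737", "IC")
GO_DR = re.compile(r"^DR\s+GO;\s*(GO:\d{7});\s*[^;]*;\s*([A-Z]+):")

def scan_uniprot_go(dat_lines, obo_terms, alt_to_primary):
    """Return (GO ID, evidence code) pairs plus the obsolete and unknown GO IDs."""
    pairs, obsolete, missing = [], set(), set()
    for line in dat_lines:
        m = GO_DR.match(line)
        if not m:
            continue
        go_id = alt_to_primary.get(m.group(1), m.group(1))  # resolve an Alternate ID
        pairs.append((go_id, m.group(2)))  # keep the real code, e.g. "IC", not "UNK"
        term = obo_terms.get(go_id)
        if term is None:
            missing.add(go_id)             # sanity check: not in go-basic.obo
        elif term["obsolete"]:
            obsolete.add(go_id)
    return pairs, obsolete, missing

# Hypothetical subset of a parsed go-basic.obo; GO:9999991/2 are placeholders.
terms = {
    "GO:0005737": {"obsolete": False},
    "GO:0000785": {"obsolete": False},
    "GO:9999991": {"obsolete": True},
}
alts = {"GO:0000790": "GO:0000785"}
dat = [
    "DR   GO; GO:0005737; C:cytoplasm; IC:UniProtKB.",
    "DR   GO; GO:0000790; C:chromatin; IEA:InterPro.",
    "DR   GO; GO:9999991; P:placeholder obsolete process; IEA:InterPro.",
    "DR   GO; GO:9999992; F:placeholder unknown function; IEA:InterPro.",
    "DR   Pfam; PF00001; 7tm_1; 1.",
]
pairs, obsolete, missing = scan_uniprot_go(dat, terms, alts)
print(pairs[0], len(obsolete), "Obsolete GOs,", len(missing), "not in OBO")
# ('GO:0005737', 'IC') 1 Obsolete GOs, 1 not in OBO
```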

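The "Compute ancestors" step in the log is not described in detail. As an illustration only, the ancestors of a GO term can be collected by following its is_a and part_of parents transitively, as in this Python sketch. The function make_ancestor_fn and the toy IDs are hypothetical. The sketch does not attempt to reproduce TCW's level numbers or its "Ancestor paths" count.

```python
from functools import lru_cache

def make_ancestor_fn(parents):
    """Return a memoised function giving all ancestors of a GO ID.

    parents maps each GO ID to the IDs it points to via is_a or part_of.
    """
    @lru_cache(maxsize=None)
    def ancestors(go_id):
        found = set()
        for parent in parents.get(go_id, ()):
            found.add(parent)
            found |= ancestors(parent)   # reuse the cached result for the parent
        return frozenset(found)
    return ancestors

# Toy graph with hypothetical IDs: leaf is_a mid, leaf part_of side,
# and both mid and side are is_a root.
parents = {"leaf": ["mid", "side"], "mid": ["root"], "side": ["root"]}
ancestors = make_ancestor_fn(parents)
print(sorted(ancestors("leaf")))   # ['mid', 'root', 'side']
print(sorted(ancestors("root")))   # []
```

Memoisation means shared upper parts of the graph, such as "root" here, are computed once no matter how many children reach them.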