55< body onload ="init('set') ">
66< a id ="top "> </ a >
77<!--#include virtual="../ssi/start2.html" -->
8- < b > < font color =blue > 19-Mar-2021 GO Update</ font > </ b > -
98The < ttp > runAS</ ttp > program creates a GO MySQL database (called < ttx > GOdb</ ttx > ) to be used by < ttp > runSingleTCW</ ttp > .
109The < ttp > runAS</ ttp > program has been updated to use the "GO Basic OBO file" in place of the "old GO MySQL tar file".
11- Though TCW v3.1.8 will work with old < ttx > GOdb</ ttx > databases, it is < i > strongly recommended</ i > that you create a new < tt > GOdb</ tt >
12- and update any < ttx > sTCWdbs</ ttx > . Even if the < ttx > GOdb</ ttx > is newer than the downloaded < ttx > UniProt</ ttx > files, the OBO file
13- is more up-to-date than the latest GO MySQL tar file.
14- < p > < font color =blue > Note:</ font > For the next release v3.1.9, I will be making improvements to the < ttp > viewSingleTCW</ ttp > GO
15- features to improve how the GO relations are shown.
10+ < i > If you used a v3.1.8 or earlier version, < b > it is strongly recommended you update to v3.2.0 or later</ b > </ i > (run < ttc > Build GO</ ttc > in < ttp > runAS</ ttp > ,
11+ then < ttc > GO Only</ ttc > in < ttp > runSingleTCW</ ttp > ).
1612
1713< p > To prepare for annotation with < ttp > runSingleTCW</ ttp > , it is necessary to download the databases to compare against.
1814TCW provides support for downloading the taxonomic and full UniProts along with mapping from the UniProt IDs to GO, KEGG, Pfam, EC, and InterPro.
2925 < li > < a href ="#run "> Using Java graphical interface -- < ttp > runAS</ ttp > </ a >
3026 < li > < a href ="#details "> Details and file structure</ a >
3127 < li > < a href ="#clean "> Cleanup</ a >
28+ < li > < a href ="#mem "> Memory and time</ a >
3229 </ ul >
3330 </ td >
3431 < td class ="top ">
3532 < ul >
36- < li > < a href ="#mem "> Memory and time</ a >
3733 < li > < a href ="#best "> What AnnoDBs to use</ a >
3834 < li > < a href ="#other "> Creating AnnoDBs from other databases (e.g. NCBI-nr)</ a >
3935 < li > < a href ="#runstcw "> Entering AnnoDBs and GOs into < ttp > runSingleTCW</ ttp > </ a >
4036 < li > < a href ="#tax "> Why use taxonomic databases</ a >
37+ < li > < a href ="#obo "> Parsing go-basic.obo</ a >
4138 < li > < a href ="#link "> Links to important databases</ a >
4239 </ ul >
4340 </ td >
@@ -68,19 +65,7 @@ <h2>Overview </h2>
6865 alt ="curl " style ="border: 1px solid black; width: 300px "> </ a > </ td >
6966 </ tr >
7067</ table >
71- < p > < b > Pre-v3.1.8 only</ b >
72- < p > < a href ="../img/bullet2.gif "> < img src ="../img/bullet2.gif " alt ="bullet " style ="border: 0px; width: 20px "> </ a >
73- For mySQL, the command < b > < ttx > mysqladmin</ ttx > </ b > is used, so you may need to define its path, e.g. on Mac, </ p >
74- < pre >
75- alias mysqladmin '/usr/local/mysql/bin/mysqladmin' #tcsh
76- alias mysqladmin='/usr/local/mysql/bin/mysqladmin' #bash
77- </ pre >
78-
79- < a href ="../img/bullet2.gif "> < img src ="../img/bullet2.gif " alt ="bullet " style ="border: 0px; width: 20px "> </ a >
80- With MySQL and MariaDB, you may need to set following MySQL variable in order to add the GO database:
81- < pre >
82- SET GLOBAL local_infile = 1;
83- </ pre >
68+
8469< p > < b > < i > Processing steps</ i > </ b > : The TCW < ttp > runAS</ ttp > will perform the following: </ p >
8570< ol >
8671 < li > Create the directory under < ttx > projects/DBfasta</ ttx > for the downloads and generated FASTA files.
145130which contains the following:
146131< pre >
147132 GO_obodemo:
148- go_obo
133+ go_basic.obo
149134
150135 UniProt_demo:
151136 sp_bacteria/ sp_fungi/ sp_plants/ tr_plants/
@@ -187,8 +172,8 @@ <h2>Java graphical interface -- <ttp>runAS</ttp> </h2>
187172 (see < a href ="#runstcw "> Import AnnoDBs</ a > ). < br >
188173
189174 < li > < ttc > Check</ ttc > : Selecting this button highlights everything that has been done.
190- For example, the figure on the upper right shows that the directory < ttx > UniProt_Jan2021 </ ttx > has been created and
191- only Archaea and Virus SwissProt have been downloaded and processed.
175+ For example, the figure on the right shows that the directory < ttx > UniProt_Mar2021 </ ttx > has been created and
176+ only Plant SwissProt has been downloaded and processed.
192177 The < ttc > Check</ ttc > automatically runs on startup.
193178 </ ol >
194179 < td > < a href ="img/runAS.png "> < img src ="img/runAS.png " alt ="runAS " style ="border: 1px solid black; width: 350px "> </ a >
@@ -227,8 +212,8 @@ <h2>Details and file structure </h2>
227212 < li > At the top:
228213 < ul >
229214 < li > If the < ttl > UniProt</ ttl > directory label is highlighted in blue, it exists.
230- < li > If the < ttl > GO</ ttl > directory label is highlighted in pink, it exists but the GO tables have not been downloaded.
231- < br > If the < ttl > GO</ ttl > directory label is highlighted in blue, the GO tables have been downloaded.
215+ < li > If the < ttl > GO</ ttl > directory label is highlighted in pink, it exists but the GO OBO file has not been downloaded.
216+ < br > If the < ttl > GO</ ttl > directory label is highlighted in blue, the GO OBO file has been downloaded.
232217 </ ul >
233218 < li > On the middle right:
234219 < ul >
@@ -237,16 +222,15 @@ <h2>Details and file structure </h2>
237222</ ul >
238223< b > < i > Taxonomic and Full UniProt Highlights</ i > </ b >
239224< p >
240- < u > Clear checkbox</ u > : If a taxonomic or full checkbox is clear, then neither the .dat file or .fasta file exists for it.
241- When you check the box followed by "Build Tax", you will need to confirm a popup that states "Download SP - xxx",
242- where xxx will be the list of files to download. The download is always automatically followed by creating the .fasta file.
225+ < u > Clear checkbox</ u > : If a < ttl > Taxonomic</ ttl > checkbox is clear, then neither the .dat file nor the .fasta file exists for it.
226+ When you check the box followed by < ttc > Build Tax</ ttc > , you will need to confirm a popup that states "Download SP - xxx",
227+ where xxx will be the list of files to download. The download is always automatically followed by creating the .fasta files.
228+ The same applies to the < ttl > Full</ ttl > checkboxes.
243229</ p >
244230< p >
245231 < u > Pink checkbox</ u > : If the .dat file exists, but the .fasta file does not, the checkbox will be highlighted pink.
246- Check the pink box(s) and run "Build Tax" in order to create the .fasta file. You will need to confirm a pop-up that
247- states "Create SP Fasta - xxx", where xxx is the taxonomic groups that will be created.
248- Since the .fasta file is automatically created after download, this will not happen unless
249- there is a problem such as the disk being full.
232+ Check the pink box(es) and run < ttc > Build Tax</ ttc > in order to create the .fasta file only.
233+ The same applies to the < ttl > Full</ ttl > checkboxes.
250234</ p >
251235< p >
252236 < u > Blue checkbox</ u > : If both the .dat file and the .fasta file exist, the checkbox will be highlighted blue.
@@ -255,12 +239,12 @@ <h2>Details and file structure </h2>
255239 < tr >
256240 < td class ="top "> If you downloaded some taxonomic databases, created the SP Full database,
257241 then downloaded another taxonomic database, you will need to recreate the SP Full database. Check the SwissProt box,
258- and run " Build Full" again. You will get a pop-up to confirm as shown on the right.
242+ and run < ttc > Build Full</ ttc > again. You will get a pop-up to confirm as shown on the right.
259243 < td > < a href ="img/runASnoDL.png "> < img src ="img/runASnoDL.png " alt ="" style ="border: 1px solid black; width: 300px "> </ a >
260244 </ tr >
261245</ table >
262246< p > < b > < i > File Structure</ i > </ b > </ p >
263- < p > For each taxonomic and full UniProt that you downloaded, a directory will be created under the " UniProt" directory.
247+ < p > For each taxonomic and full UniProt that you downloaded, a directory will be created under the < ttl > UniProt</ ttl > directory.
264248For example, </ p >
265249< pre >
266250 ./TCW/projects/DBfasta/UniProt_Jan2021%> ls *
@@ -284,7 +268,7 @@ <h2>Details and file structure </h2>
284268 gzip */*.fasta
285269</ pre >
286270< p > < b > < i > GO (Gene Ontology)</ i > </ b > </ p >
287- < p > The < tt > go.obo</ tt > file containing the schema and data is downloaded from
271+ < p > The < tt > go-basic.obo</ tt > file is downloaded from
288272< a href ="http://current.geneontology.org/ontology/ " class ="ext " target ="_blank "> < ttx > http://current.geneontology.org/ontology/</ ttx > </ a > .</ p >
289273< p > < u > Database</ u > : This text entry on the < ttp > runAS</ ttp > interface is the name of the GO MySQL database that
290274will be created; you will enter this name in < ttp > runSingleTCW</ ttp > . </ p >
@@ -639,6 +623,64 @@ <h2>Why use taxonomic databases instead of the full UniProt </h2>
639623< p > The following shows the details of a specific sequence: </ p >
640624< a href ="img/runASview.png "> < img src ="img/runASview3.png " alt ="" style ="border: 1px solid black; "> </ a >
641625
626+
627+ <!-- ============================================= -->
628+ < a id ="obo "> </ a >
629+ < table style ="width: 100% "> < tr > < td style ="text-align: left ">
630+ < h2 > Parsing go-basic.obo</ h2 >
631+ < td style ="text-align: right "> < a href ="#top "> Go to top</ a > </ td > </ tr > </ table >
632+
633+ < i > The following is an example record in the OBO file:</ i >
634+ < pre >
635+ [Term]
636+ id: GO:0000785
637+ name: chromatin
638+ namespace: cellular_component
639+ alt_id: GO:0000789
640+ alt_id: GO:0000790
641+ alt_id: GO:0005717
642+ def: "The ordered and organized complex of DNA, protein, ....
643+ comment: Chromosomes include parts that are not part of ....
644+ synonym: "chromosome scaffold" RELATED []
645+ synonym: "cytoplasmic chromatin" NARROW []
646+ synonym: "nuclear chromatin" NARROW []
647+ xref: NIF_Subcellular:sao1615953555
648+ is_a: GO:0110165 ! cellular anatomical entity
649+ relationship: part_of GO:0005694 ! chromosome
650+ </ pre >
651+
652+ < i > TCW parses for the following keywords:</ i >
653+ < table class =tabley >
654+ < tr > < th > Keyword < th > AmiGO term< th > TCW term< th > Example
655+ < tr > < td > id < td > Accession < td > GO ID < td > GO:0000785
656+ < tr > < td > name < td > Name < td > Description < td > chromatin
657+ < tr > < td > namespace < td > Ontology < td > Domain < td > cellular_component
658+ < tr > < td > is_a < td > is_a < td > is_a < td > GO:0110165
659+ < tr > < td > relationship: part_of< td > ? < td > part_of < td > GO:0005694
660+ < tr > < td > alt_id < td > Alternate ID< td > Alternate ID < td > GO:0000790
661+ < tr > < td > < td > replaced by < td > Replaced by< td > GO:0000785
662+ < tr > < td > is_obsolete: true< td > Name: obsolete< td > Description: obsolete< td > obsolete replicative cell aging
663+ </ table >
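
< p > < i > Illustrative sketch (not TCW code):</ i > The keyword table above can be read as a small line-oriented parser.
The following Python sketch is only an assumption about how such parsing might look; it is not the Java code in < ttp > runAS</ ttp > .
The function name < tt > parse_terms</ tt > and the dictionary keys are hypothetical, and only the keywords listed in the table are kept. </ p >

```python
# Minimal sketch: read [Term] stanzas and keep only the keywords listed in the table.
# parse_terms and the dictionary layout are hypothetical names for illustration.

SAMPLE = """format-version: 1.2

[Term]
id: GO:0000785
name: chromatin
namespace: cellular_component
alt_id: GO:0000789
alt_id: GO:0000790
alt_id: GO:0005717
synonym: "nuclear chromatin" NARROW []
xref: NIF_Subcellular:sao1615953555
is_a: GO:0110165 ! cellular anatomical entity
relationship: part_of GO:0005694 ! chromosome
"""


def parse_terms(lines):
    """Yield one dict per [Term] stanza from an iterable of OBO text lines."""
    term = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts: emit the previous term, skip non-Term stanzas.
            if term is not None:
                yield term
            if line == "[Term]":
                term = {"alt_id": [], "is_a": [], "part_of": [], "obsolete": False}
            else:
                term = None
            continue
        if term is None or ":" not in line:
            continue  # file header, blank line, or a stanza we ignore
        key, value = line.split(":", 1)
        value = value.strip()
        if key in ("id", "name", "namespace"):
            term[key] = value
        elif key == "alt_id":
            term["alt_id"].append(value)
        elif key == "is_a":
            # Drop the trailing "! description" comment.
            term["is_a"].append(value.split("!")[0].strip())
        elif key == "relationship":
            parts = value.split("!")[0].split()
            if len(parts) == 2 and parts[0] == "part_of":
                term["part_of"].append(parts[1])
        elif key == "is_obsolete":
            term["obsolete"] = (value == "true")
    if term is not None:
        yield term


terms = list(parse_terms(SAMPLE.splitlines()))
print(terms[0]["id"], terms[0]["name"], terms[0]["part_of"])
```

< p > Keywords such as < tt > def</ tt > , < tt > synonym</ tt > and < tt > xref</ tt > are skipped in this sketch because they are not in the table above. </ p >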
664+
665+ < p > < i > Views in AmiGO and TCW:</ i > </ p >
666+ < table class =tablex >
667+ < tr >
668+ < th > AmiGO< th > TCW
669+ < tr >
670+ < td valign =top > < a href ="img/runASgo785.png "> < img src ="img/runASgo785.png " alt ="" style ="border: 1px solid black; width: 450px "> </ a >
671+ < td valign =top > < a href ="img/runASgo785t.png "> < img src ="img/runASgo785t.png " alt ="" style ="border: 1px solid black; width: 350px "> </ a >
672+ < tr >
673+ < td > < a href ="img/runASgo790.png "> < img src ="img/runASgo790.png " alt ="" style ="border: 1px solid black; width: 250px "> </ a >
674+ < td > < a href ="img/runASgo790t.png "> < img src ="img/runASgo790t.png " alt ="" style ="border: 1px solid black; width: 350px "> </ a >
675+ </ tr >
676+ </ table >
677+
678+ < i > < u > NOTES:</ u > </ i >
679+ < ol >
680+ < li > UniProt occasionally uses the Alternate IDs and has a few Obsolete GO terms.
681+ < li > I cannot guarantee that AmiGO always treats "alt_id" as specified here.
682+ </ ol >
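
< p > < i > Illustrative sketch (not TCW code):</ i > Because UniProt occasionally assigns an Alternate ID, a lookup from each
< tt > alt_id</ tt > to its primary GO ID is one way to obtain the "Replaced by" value shown in the table.
The Python below is a sketch under that assumption; the function names and the input layout (a list of dictionaries with
< tt > id</ tt > and < tt > alt_id</ tt > keys) are hypothetical and do not describe how TCW stores the data. </ p >

```python
# Minimal sketch: map Alternate IDs to the primary GO ID that lists them.
# build_replaced_by, resolve, and the input layout are hypothetical.

def build_replaced_by(terms):
    """Return {alternate GO ID: primary GO ID} for every alt_id in terms."""
    replaced_by = {}
    for term in terms:
        for alt in term.get("alt_id", []):
            replaced_by[alt] = term["id"]
    return replaced_by


def resolve(go_id, replaced_by):
    """Return the primary ID for an Alternate ID, or the ID itself if it is primary."""
    return replaced_by.get(go_id, go_id)


# Values taken from the example record for GO:0000785.
example_terms = [
    {"id": "GO:0000785", "alt_id": ["GO:0000789", "GO:0000790", "GO:0005717"]},
]
lookup = build_replaced_by(example_terms)
print(resolve("GO:0000790", lookup))
```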
683+
642684<!-- ============================================= -->
643685< a id ="link "> </ a >
644686< table style ="width: 100% "> < tr > < td style ="text-align: left ">