This document is a handoff from a Claude Code session running on AWS to a Claude Code
session running on hoag (Lowe Lab, /projects/lowelab/). The goal is to find and use
Brian Lin's original data-preparation code to generate genomes.tsv and trnas.tsv
files for a test species, so we can validate the add_species management command in
the tRNAviz Django web application.
tRNAviz is a Django web application for visualizing tRNA sequence features across the tree of life. It stores tRNA genes with per-position base identity (using Sprinzl numbering), computes consensus and frequency summaries at every taxonomic rank, and displays them as interactive heatmaps and cloverleaf diagrams.
- GitHub repo: UCSC-LoweLab/tRNAviz
- Live site: trnaviz.ucsc.edu
- Stack: Django 5.2 / MySQL / Python 3
Recent commits on master (most recent first):
- d6a9ea8 - Fix iTOL tree visualization: client-side upload and root node crash
- f691a37 - Migrate from PostgreSQL/SQLite to MySQL backend
- 4a16557 - Add database indexes for MySQL performance
- ceb03d9 - Add load_pgdump management command
- fb452ff - Fix species distribution bug and add incremental add_species command
- Earlier: Django 5.2 upgrade, security fixes, cloverleaf coordinates
The key new feature is python manage.py add_species, which can incrementally add or
remove species without reloading the entire database. It needs two input files.
Required columns for genomes.tsv:
dbname taxid name domain kingdom subkingdom phylum subphylum class subclass order family genus species assembly
- dbname: short identifier for the genome (e.g. "hg38", "sacCer3")
- taxid: NCBI Taxonomy ID for the assembly (note: this is often a strain-level taxid)
- name: organism/assembly display name
- domain through assembly: NCBI taxids for each taxonomic rank in the lineage (numeric NCBI taxids stored as strings, not names; e.g. "2759" for Eukaryota)
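The column rules above can be checked before handing the file to add_species. The following is a minimal sketch, not code from the tRNAviz repo: the function name `check_genomes_tsv` is hypothetical, and it assumes the file is tab-separated with a header row and that every lineage rank must hold a numeric taxid. If the loader accepts blank ranks, relax the check accordingly.

```python
import csv

GENOME_COLUMNS = [
    "dbname", "taxid", "name", "domain", "kingdom", "subkingdom", "phylum",
    "subphylum", "class", "subclass", "order", "family", "genus", "species",
    "assembly",
]
# Lineage columns run from "domain" through "assembly".
LINEAGE_COLUMNS = GENOME_COLUMNS[GENOME_COLUMNS.index("domain"):]


def check_genomes_tsv(handle):
    """Return a list of problems found in a tab-separated genomes file."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in GENOME_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for column in ["taxid"] + LINEAGE_COLUMNS:
            value = row[column] or ""
            # Assumption: blank or non-numeric taxids are errors.
            if not value.isdigit():
                problems.append(
                    f"line {line_number}: {column}={value!r} is not a numeric taxid"
                )
    return problems
```

Opening genomes.tsv with `newline=""` and passing the handle to `check_genomes_tsv` returns an empty list when the file passes these checks.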
Columns for trnas.tsv:
seqname isotype anticodon score primary best_model isoscore isoscore_ac
dbname domain kingdom subkingdom phylum subphylum class subclass
order family genus species assembly
GCcontent insertions deletions intron_length dloop acloop tpcloop varm
1:72 1 2:71 2 3:70 3 4:69 4 5:68 5 6:67 6 7:66 7
8 8:14 9 9:23 10:25 10 10:45 11:24 11 12:23 12 13:22 13
14 15 15:48 16 17 17a 18 18:55 19 19:56 20 20a 20b 21
22 22:46 23 24 25 26 26:44 27:43 27 28:42 28 29:41 29
30:40 30 31:39 31 32 33 34 35 36 37 38 39 40 41 42 43
44 45 V11:V21 V12:V22 V13:V23 V14:V24 V15:V25 V16:V26 V17:V27
V1 V2 V3 V4 V5 V11 V12 V13 V14 V15 V16 V17
V21 V22 V23 V24 V25 V26 V27
46 47 48 49:65 49 50:64 50 51:63 51 52:62 52 53:61 53
54 54:58 55 56 57 58 59 60 61 62 63 64 65 66 67 68
69 70 71 72 73 74 75 76
Key details:
- seqname: unique tRNA identifier (primary key), e.g. "hg38-tRNA-Ala-AGC-1-1"
- isotype: amino acid (Ala, Arg, ..., Val, fMet, iMet)
- primary: "True" or "False" (whether this is the primary/best prediction)
- Lineage columns: same numeric NCBI taxids as genomes.tsv
- Position columns use Sprinzl numbering. Single positions have values: A, C, G, U, or -
- Paired positions (e.g. "1:72") have values like "G:C", "A:U", "-:-", etc.
- GCcontent: float; insertions, deletions, intron_length, dloop, acloop, tpcloop, varm: integers
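The position value rules can be expressed as two patterns, chosen by whether the column name contains a colon. This is a minimal sketch with hypothetical helper names (`is_paired_column`, `is_valid_position_value`). It assumes the alphabet is exactly A, C, G, U, and -, and the original pipeline may emit other symbols.

```python
import re

# Assumption: only the four RNA bases and "-" (gap) are allowed.
SINGLE_VALUE = re.compile(r"[ACGU-]")
PAIRED_VALUE = re.compile(r"[ACGU-]:[ACGU-]")


def is_paired_column(column: str) -> bool:
    """Paired position columns contain a colon, e.g. "1:72" or "V11:V21"."""
    return ":" in column


def is_valid_position_value(column: str, value: str) -> bool:
    """Check a value against the single or paired pattern for its column."""
    pattern = PAIRED_VALUE if is_paired_column(column) else SINGLE_VALUE
    return pattern.fullmatch(value) is not None
```

For example, `is_valid_position_value("1:72", "G:C")` is accepted, while a DNA-style `"T"` in column `"8"` is rejected.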
Brian Lin was the original developer of tRNAviz. His code for generating the TSV files
from tRNAscan-SE output is likely somewhere under /projects/lowelab/. Look for:
- A directory named something like tRNAviz, trnaviz, tRNA-viz, or trna_viz in Brian's home directory or project space
- Python scripts that read tRNAscan-SE output (.out files, .ss secondary structure files, or .iso isotype-specific files) and produce TSV/CSV with the column structure described above
- Look for files containing keywords like: sprinzl, alignment, position, isotype, genomes.tsv, trnas.tsv, trnascan, GtRNAdb
- His username may be blin or brianlin or similar
Likely locations to search:
/projects/lowelab/users/blin/
/projects/lowelab/users/brianlin/
/home/blin/
/projects/lowelab/tRNAviz/
/projects/lowelab/GtRNAdb/
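One way to run this search is a short script that walks the candidate directories and reports Python files that mention the more distinctive keywords. This is only a sketch: `find_candidate_scripts` is a hypothetical name, the directory list is copied from above and any of these paths may not exist on hoag, and generic keywords such as "alignment" or "position" are left out because they would likely match too many files.

```python
import os

# Lowercase, because file contents are lowercased before matching.
KEYWORDS = ("sprinzl", "genomes.tsv", "trnas.tsv", "trnascan", "gtrnadb")

SEARCH_ROOTS = [
    "/projects/lowelab/users/blin/",
    "/projects/lowelab/users/brianlin/",
    "/home/blin/",
    "/projects/lowelab/tRNAviz/",
    "/projects/lowelab/GtRNAdb/",
]


def find_candidate_scripts(roots, keywords=KEYWORDS, suffix=".py"):
    """Yield (path, matched_keywords) for files that mention any keyword."""
    for root in roots:
        if not os.path.isdir(root):
            continue  # guessed locations may not exist
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith(suffix):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors="ignore") as handle:
                        text = handle.read().lower()
                except OSError:
                    continue  # unreadable file, skip it
                hits = [k for k in keywords if k in text]
                if hits:
                    yield path, hits


if __name__ == "__main__":
    for path, hits in find_candidate_scripts(SEARCH_ROOTS):
        print(path, ",".join(hits))
```

Files matching several keywords at once (for example both "sprinzl" and "trnas.tsv") are the most likely candidates for the original pipeline.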
Once you find the code, map out the pipeline:
- What input does it take? (tRNAscan-SE output files, genome metadata, NCBI taxonomy)
- How does it map tRNA sequences to Sprinzl positions? (likely via covariance model alignment or structure-based mapping)
- How does it construct the lineage taxid columns?
- How does it compute GCcontent, loop sizes (dloop, acloop, tpcloop), and variable arm length (varm)?
Pick a small, well-characterized genome that is NOT already in the tRNAviz database. Good candidates:
- A newly sequenced yeast (check what's already in tRNAviz first)
- A small bacterial genome
- Any organism with tRNAscan-SE results readily available on the lab disk
Generate the two TSV files (genomes.tsv and trnas.tsv) using Brian's pipeline,
or by adapting it.
Quick sanity checks:
- genomes.tsv should have all 15 required columns, with numeric taxids in the lineage columns
- trnas.tsv seqname values must be unique (it's the primary key)
- Paired position values should be in "X:Y" format (e.g., "G:C", "-:-")
- Single position values should be single characters (A, C, G, U, or -)
- The assembly taxid in trnas.tsv must match a row in genomes.tsv
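The two cross-row checks in this list (unique seqname values and matching assembly taxids) can be scripted as follows. This is a hedged sketch, not part of add_species: `check_trnas_against_genomes` is a hypothetical name, and it assumes both files are tab-separated with header rows and that the match is made on the `assembly` column of each file.

```python
import csv
from collections import Counter


def check_trnas_against_genomes(genomes_handle, trnas_handle):
    """Return problems: duplicate seqnames and unmatched assembly taxids."""
    genomes = csv.DictReader(genomes_handle, delimiter="\t")
    # Assumption: trnas.tsv rows link to genomes.tsv via the assembly column.
    known_assemblies = {row["assembly"] for row in genomes}

    trnas = list(csv.DictReader(trnas_handle, delimiter="\t"))
    problems = []

    # seqname is the primary key, so every value may appear only once.
    counts = Counter(row["seqname"] for row in trnas)
    for seqname, count in counts.items():
        if count > 1:
            problems.append(f"duplicate seqname {seqname!r} ({count} rows)")

    # Every assembly taxid used by a tRNA needs a genomes.tsv row.
    unknown = sorted({row["assembly"] for row in trnas} - known_assemblies)
    for assembly in unknown:
        problems.append(f"assembly taxid {assembly!r} has no row in genomes.tsv")
    return problems
```

An empty list means both checks passed; it says nothing about the position columns, which need their own validation.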
If you have access to the tRNAviz repo and a running dev instance:
# Dry run first
python manage.py add_species genomes.tsv trnas.tsv --dry-run
# If that passes, add for real
python manage.py add_species genomes.tsv trnas.tsv --skip-ncbi
# Test removal round-trip
python manage.py add_species --remove <assembly_taxid> --dry-run
The tRNA model has ~130 position fields using Sprinzl numbering. Field naming convention:
- Single positions: p8, p9, p14, p17a, pV1, etc.
- Paired positions: p1_72, p10_25, pV11_V21, etc.
The TSV column-to-field mapping is in explorer/load_utils.py:tsv_col_to_field():
- TSV column class maps to Django field taxclass (reserved word)
- Position columns like 1:72 map to p1_72 (prefix p, colon becomes underscore)
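The two rules above can be illustrated with a small stand-alone function. This is a sketch of the described behavior, not the actual `tsv_col_to_field()` from explorer/load_utils.py: the name `sketch_col_to_field` and the regular expression are hypothetical, and it assumes all other columns (seqname, isotype, GCcontent, and so on) keep their names unchanged, which should be confirmed against the repo.

```python
import re

# Matches Sprinzl position columns such as "8", "17a", "V1", "1:72", "V11:V21".
POSITION_COLUMN = re.compile(r"V?\d+[a-z]?(:V?\d+[a-z]?)?")


def sketch_col_to_field(column: str) -> str:
    """Map a TSV column name to a model field name (illustrative only)."""
    if column == "class":
        return "taxclass"  # "class" is a Python reserved word
    if POSITION_COLUMN.fullmatch(column):
        # Prefix with "p" and turn the pair separator into an underscore.
        return "p" + column.replace(":", "_")
    return column  # assumption: other columns map to themselves
```

With these rules, `sketch_col_to_field("V11:V21")` gives `pV11_V21` and `sketch_col_to_field("17a")` gives `p17a`, matching the field naming convention listed earlier.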
GitHub is flagging a moderate vulnerability in Biopython. This is not urgent and we are waiting for an upstream fix. No action needed.
This handoff was prepared by Todd Lowe's Claude Code session on AWS. The tRNAviz GitHub repo is at: github.com/UCSC-LoweLab/tRNAviz