Classify intron sequences as U12-type (minor spliceosome) or U2-type (major spliceosome). A 42-model RBF SVM ensemble scores each intron against position-weight matrices and outputs a calibrated probability (0-100%).
pip install intronIC
# Classify introns (loads default model automatically)
intronIC -g genome.fa.gz -a annotation.gff3.gz -n species_name -p 8
# Extract sequences without classification
intronIC extract -g genome.fa.gz -a annotation.gff3.gz -n species_name -p 8
# Verify installation with bundled test data
intronIC test -p 4
- 42-model RBF SVM ensemble on a streamlined 6D feature set
- Bayesian score adjustment suppresses false positives in species lacking a distinct U12-type intron population, using a species-level valley prior and per-intron ensemble agreement
- Species-specific U2-type background correction for cross-species composition bias
- Default threshold raised to 95% for higher-confidence calls
- See CHANGELOG.md for full release history
- Probability scores (0-100%) from a 42-model ensemble with isotonic calibration
- Pretrained model loaded automatically for cross-species analysis
- Streaming mode (default) reduces memory ~85% on large genomes
- Parallel scoring via -p N for linear speedup
- Comprehensive metadata: phase, position, parent gene/transcript
Most eukaryotic introns (~99.5%) use the major (U2-type) spliceosome. A small fraction (~0.5%) use the minor (U12-type) spliceosome, characterized by a conserved TCCTTAAC branch point motif and either AT-AC (~25%) or GT-AG (~75%) terminal dinucleotides.
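As a rough illustration of the sequence features described above, the sketch below reads the terminal dinucleotides of an intron and looks for an exact copy of the TCCTTAAC motif near the 3' end. The function names, the 40-nt search window, and the exact-match test are assumptions made for this example only; intronIC itself scores these regions with position-weight matrices, and terminal dinucleotides alone cannot separate the two classes because most U2-type introns are also GT-AG.

```python
def terminal_dinucleotides(intron_seq: str) -> str:
    """Return the first and last two bases of an intron, e.g. 'GT-AG'."""
    seq = intron_seq.upper().replace("U", "T")  # accept RNA or DNA input
    if len(seq) < 4:
        raise ValueError("intron sequence must be at least 4 nt long")
    return f"{seq[:2]}-{seq[-2:]}"


def has_branch_point_motif(
    intron_seq: str,
    window: int = 40,          # hypothetical search window, not an intronIC setting
    motif: str = "TCCTTAAC",   # consensus quoted in the text above
) -> bool:
    """Check for an exact motif match within the last `window` bases."""
    seq = intron_seq.upper().replace("U", "T")
    return motif in seq[-window:]


# Toy intron built for illustration; not real genomic sequence
toy_intron = "ATATCC" + "G" * 20 + "TCCTTAAC" + "TTTTTTTTAC"
print(terminal_dinucleotides(toy_intron), has_branch_point_motif(toy_intron))
```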
intronIC identifies U12-type introns in five stages:
- PWM scoring — score the 5' splice site, branch point, and 3' splice site against position-weight matrices
- Background correction — blend species-specific nucleotide frequencies into U2-type PWMs to correct composition bias
- Normalization — convert raw log-odds to z-scores via robust scaling
- SVM classification — 42-model RBF SVM ensemble produces per-intron probabilities and ensemble agreement (sigma)
- Score adjustment — adjust probabilities using a species-level valley prior and an ensemble disagreement penalty
See Technical Details for the full algorithm description.
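The first three stages listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not intronIC's implementation: the function names, the 0.5 blend weight, the pseudocount, the toy matrices, and the choice of median and interquartile range for robust scaling are all hypothetical.

```python
import math
from statistics import median, quantiles

BASES = "ACGT"


def blend_background(pwm, species_freqs, weight=0.5):
    """Mix species-level base frequencies into every PWM column.

    weight=0 leaves the PWM unchanged; weight=1 replaces each column
    with the background. The default of 0.5 is an arbitrary assumption.
    """
    return [
        {b: (1 - weight) * col[b] + weight * species_freqs[b] for b in BASES}
        for col in pwm
    ]


def pwm_log_prob(seq, pwm, pseudocount=1e-4):
    """Sum the per-position log probabilities of seq under a PWM."""
    if len(seq) != len(pwm):
        raise ValueError("sequence and PWM must have the same length")
    return sum(math.log(col[base] + pseudocount) for base, col in zip(seq.upper(), pwm))


def log_odds(seq, u12_pwm, u2_pwm):
    """Raw log-odds: positive values favor the U12-type matrix."""
    return pwm_log_prob(seq, u12_pwm) - pwm_log_prob(seq, u2_pwm)


def robust_z(values):
    """Center on the median and scale by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    center = median(values)
    return [(v - center) / iqr for v in values]


# Toy two-position matrices and frequencies, invented for this example
u12_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
u2_pwm = [{b: 0.25 for b in BASES} for _ in range(2)]
species = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

u2_corrected = blend_background(u2_pwm, species)
raw = [log_odds(s, u12_pwm, u2_corrected) for s in ("AT", "CG", "AA", "TT", "GC")]
z = robust_z(raw)
```

Median and interquartile range are used here because they are less sensitive to outliers than mean and standard deviation, which is the usual motivation for robust scaling. The resulting z-scores would then feed a classifier; the SVM and score-adjustment stages are not sketched because their details are in the Technical Details page.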
Full documentation lives in the intronIC Wiki:
- Quick Start — Installation, dependencies, resource usage
- Overview — Classification approach and scientific background
- Output Files — File formats and score interpretation
- Technical Details — Algorithm, features, score adjustment
- Usage Info — Complete CLI reference
- Example Usage — Common workflows
- Changelog — Release notes and version history
If you use intronIC in your research, please cite:
Moyer DC, Larue GE, Hershberger CE, Roy SW, Padgett RA. (2020) Comprehensive database and evolutionary dynamics of U12-type introns. Nucleic Acids Research 48(13):7066-7078. doi:10.1093/nar/gkaa464
- intronIC Wiki — Documentation
- GitHub Issues — Bug reports
- GitHub Discussions — Questions and ideas
See CONTRIBUTING.md for guidelines.
git clone https://github.com/glarue/intronIC.git
cd intronIC
make install # Set up development environment
make test # Run tests