Releases: exomiser/Exomiser
Bigger on the inside
This release is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's
a long way down the road to the chemist's, but that's just peanuts to this. -- Douglas Adams
Apologies for misquoting Douglas Adams, but this is a big release. For starters, we've fittingly managed to
coincide with Rare Disease Day this year, which is a first.
This release has touched literally every part of the Exomiser CLI and core libraries. These changes should be very apparent as the CLI has subtly changed and improved, the output files have either been replaced or enhanced, and a new parquet format has been added. The documentation has been reworked to improve the user installation experience. The analysis scripts and CLI presets have been updated to improve their performance in various scenarios. The biggest change of all is probably the logistic regression model, which has been updated to take into account the automated ACMG assignments and re-trained using the solved cases from the UK's 100,000 Genomes Project.
Given all this change, we urge you to review the changelog below and the documentation, and to open an issue if you have any questions or problems. Existing pipelines will need minor changes to use this release, but the effort should be worth the gains.
CLI Changes
- Minimum Java version is now Java 21
- The CLI is now handled by picocli and has new `analyse` and `batch` commands.
  - The `analyse` command works with the same options as before, but will fail before loading resources if no samples have been provided in the command input.
  - The `batch` command replaces the `--batch` option and now has a `--dry-run` option to check the input commands and samples before running and will write out an error file.

  Run `exomiser --help` for details or see the docs about how to migrate your scripts. However, the snippet below should be enough to get you started:

      # Running the `analyse` command:
      ## Exomiser < 15.0.0
      java -jar exomiser-cli-14.1.0.jar --analysis examples/exome-analysis.yml --output-directory exomiser-results/exome-analysis --output-format HTML
      ## Exomiser 15.0.0
      java -jar exomiser-cli-15.0.0.jar analyse --analysis examples/exome-analysis.yml --output-directory exomiser-results/exome-analysis --output-format HTML

      # Running the `batch` command:
      ## Exomiser < 15.0.0
      java -jar exomiser-cli-14.1.0.jar --batch examples/test-analysis-batch-commands.txt
      ## Exomiser 15.0.0
      java -jar exomiser-cli-15.0.0.jar batch examples/test-analysis-batch-commands.txt
- Updated logistic regression model to take into account the ACMG assignment data, which improves the accuracy of the results. !!! WARNING - THIS SIGNIFICANTLY CHANGES THE EXOMISER COMBINED SCORES, SO IF YOU USE ANY CUTOFFS TO FILTER YOUR RESULTS IN YOUR PIPELINE, YOU WILL NEED TO RE-CALIBRATE THEM !!!
- New `alleleBalanceFilter: {}` analysis step to filter variants based on allele balance (see docs for details).
- Updated `examples/preset-exome-analysis.yml` and `examples/preset-genome-analysis.yml` to use new defaults. UPDATE YOUR SCRIPTS TO USE THESE FOR IMPROVED ACCURACY.
- Added `examples/preset-exome-analysis-human-only.yml`
- Added `examples/preset-exome-analysis-with-introns.yml`
- Added `examples/preset-phenotype-only-analysis.yml`
- New `PARQUET` output file format. This is a much more efficient format for storing results. It is an amalgamation of the `TSV_VARIANT` and `TSV_GENE` data with added fields and should be considered as a replacement for the JSON output.
- `JSON` output has been replaced with `JSONL` output which is a line-delimited JSON format (https://jsonlines.org/). Note that the file suffix is now `.jsonl` rather than `.json`.
- New `HTML` output format. This is a much more compact and readable format for displaying results.
- Fix for issue #621 in `VCF` output where ACMG categories were being concatenated with `,` which broke parsers. These are now separated with `&`.
- Removed use of BS4 category in ACMG assignments as this was being applied too stringently, leading to lost diagnoses in the DDD cohort.
- Fixed PM4 assignment to include disruptive_inframe_deletion/insertion variants
- Updated Exomiser CLI startup configuration to not write the `results` directory to the installation directory by default.
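For pipelines that read the ACMG categories from the `VCF` output, the #621 change means the value is now split on `&` rather than `,` (a comma separates list values within a VCF INFO field, which is why the old form confused parsers). A minimal sketch, assuming the evidence string has already been extracted from the record; the function name and the sample value are illustrative and not taken from real Exomiser output:

```shell
#!/usr/bin/env bash
# Split an '&'-separated ACMG evidence string into one category per line.
# split_acmg_evidence is a hypothetical helper, not part of Exomiser.
split_acmg_evidence() {
  printf '%s\n' "$1" | tr '&' '\n'
}

# Made-up example value; prints PVS1, PM2_Supporting and PP3 on separate lines.
split_acmg_evidence 'PVS1&PM2_Supporting&PP3'
```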
Under the hood changes
New Java record classes have been added to the core module to represent the immutable data structures used in the analysis.
These have led to a much less 'getty' API as the traditional Java bean conventions have been replaced with a terser API.
Data Release
This update also includes a new 2512 data release. See the data-release discussions for links.
What's Changed
- Add OpenAPI /swagger-ui/index.html and /api-docs by @julesjacobsen in #582
- Enhancement/html report by @iimpulse in #584
- cli-v2 by @julesjacobsen in #596
- !refactor: separate ACMG_EVIDENCE by '&' in VCF by @znorgaard in #621
New Contributors
- @julesjacobsen made their first contribution in #582
- @iimpulse made their first contribution in #584
- @znorgaard made their first contribution in #621
Full Changelog: 14.1.0...15.0.0
14.1.0 Nice and Splicy
The main changes in this release focus on further updates to ACMG assignment categories, including the addition of the PS1, PM1, PM5, BS1 and BS2 categories. This release also implements the ClinGen recommendations for splicing variants.
- New AcmgMissenseInFrameIndelEvidenceAssigner class to assign PS1, PM1, PM5, PP2, BP1, PP3, BP4 to missense and inframe indels.
- New AcmgSpliceEvidenceAssigner class which applies PS1, PP3, BP4, BP7 to splice region variants according to ClinGen recommendations for splicing variants published in Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup.
- New AcmgPVS1EvidenceAssigner handles assignment of PVS1 to loss of function variants according to Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion
- Downgraded BP4 to have a maximum of `Supporting` evidence for REVEL scores under 0.290
- Updated gene constraints to use gnomAD v4.1 data
Under the hood changes
- Deprecate out of date Acmg2015Classifier and Acgs2020Classifier
- Update JannovarSmallVariantAnnotator to remove `MNV` annotations from effects as these were overriding more damaging functional effects such as STOP_LOSS, STOP_GAIN, SPLICE_DONOR, SPLICE_ACCEPTOR which prevented potential assignment of PVS1.
- Update Acmg2015EvidenceAssigner to include BS1, BS2 assignments.
- Refactor Acmg2015EvidenceAssigner missense assignment methods into new AcmgMissenseInFrameIndelEvidenceAssigner class.
- Add PP2/BP1 assignments to AcmgMissenseInFrameIndelEvidenceAssigner using GeneStatistics
- Update ClinVarDao with new getGeneStatistics() method.
- Add new GeneStatistics class for handling aggregated ClinVar gene-level variant effect counts.
- Add new AcmgEvidence.parseAcmgEvidence() method.
- Changes to enable SpliceAI PP3 and other splicing-related ACMG assignments.
- Add new AcmgPVS1EvidenceAssigner class to assign PVS1 to loss of function variants
- Add new AcmgMissenseInFrameIndelEvidenceAssigner class to assign PS1, PM1, PM5, PP2, BP1, PP3, BP4 to missense and inframe indels
- Add new AcmgSpliceEvidenceAssigner class to assign PS1, PP3, BP4, BP7 to splice region variants
- Add new AcmgEvidence.Builder.containsWithEvidence method
- Add @Nullable to PathogenicityData.pathogenicityScore method
This update also includes a new 2410 data release. See the data-release discussions for links.
Full Changelog: 14.0.2...14.1.0
Fix for issue #571
Fix for issue #571. This is a bug-fix release to prevent erroneous assignment of PVS1 to recessive-compatible variants in LOF-tolerant genes.
We strongly recommend updating to this release if you rely on the ACMG assignments from Exomiser.
Fix for Issue #565
This is a patch release to prevent a possible ArrayIndexOutOfBoundsException being thrown when outputting the variants TSV file. There are no other changes.
New Java version, new database format, smaller data downloads, more ACMG categories, better reporting...
- Minimum Java version is now Java 17
- Updated database format: REQUIRES DATABASE VERSION 2406. These databases are significantly smaller than the previous versions (~50-60% of previous size). See the GitHub discussions section for details.
- Added new GeneBlacklistFilter #457
- Add new ClinVar conflicting evidence counts in HTML output #535
- Added PS1, PM1, PM5 categories to ACMG assignments
- Altered reporting of InheritanceModeFilter to state that the number shown refers to variants rather than genes.
- Updated gene constraints to use gnomAD v4.0 data.
- TSV genes, TSV variants and VCF outputs are now each written to a single file, with the possible modes of inheritance shown together rather than split across separate files.
- Fix for issue #531 where the `priorityScoreFilter` and `regulatoryFeatureFilter` pass/fail counts were not displayed in the HTML.
- Fix for issue #534 where variant frequency and/or pathogenicity annotations are missing in certain run configurations.
- Fix for issue #541 where logging to /tmp/spring.log causes clashes in shared user environments.
- TSV output column `CLINVAR_ALLELE_ID` has been changed to `CLINVAR_VARIANT_ID` to allow easier reference to ClinVar variants.
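Scripts that select the ClinVar column by name need to know which header they are reading after the rename. The sketch below checks a tab-separated header line for either name. It assumes the two columns hold different kinds of ClinVar identifier, so it reports which one is present rather than treating them as interchangeable. The function name is hypothetical, and every column name other than `CLINVAR_ALLELE_ID` and `CLINVAR_VARIANT_ID` is made up for the example:

```shell
#!/usr/bin/env bash
# Report which ClinVar ID column a TSV header line contains.
# Prints "variant", "allele" or "none". clinvar_id_kind is a hypothetical helper.
clinvar_id_kind() {
  printf '%s\n' "$1" | awk -F'\t' '
    {
      kind = "none"
      for (i = 1; i <= NF; i++) {
        if ($i == "CLINVAR_VARIANT_ID") { kind = "variant"; break }
        if ($i == "CLINVAR_ALLELE_ID")  { kind = "allele";  break }
      }
      print kind
    }'
}

# Made-up header lines standing in for a newer and an older variants file.
new_header=$(printf 'GENE_SYMBOL\tCLINVAR_VARIANT_ID\tSCORE')
old_header=$(printf 'GENE_SYMBOL\tCLINVAR_ALLELE_ID\tSCORE')

clinvar_id_kind "$new_header"   # prints: variant
clinvar_id_kind "$old_header"   # prints: allele
```

With a real results file, the header could be passed in with something like `clinvar_id_kind "$(head -n 1 your-results.tsv)"`, where the file name is a placeholder.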
Full Changelog: 13.3.0...14.0.0
MT codon tables and Bayesian ACMG
- Updated Jannovar version to 0.41 to fix incorrect MT codon table usage #521
- Downgraded PM2 to PM2_Supporting for variants lacking frequency information #502.
- Updated Acgs2020Classifier and Acmg2015Classifier to allow for PVS1 and PM2_Supporting to be sufficient to trigger LIKELY_PATHOGENIC
- Updated AcmgEvidence to fit a Bayesian points-based system #514
- Removed ASJ, FIN, OTH ExAC and gnomAD populations from presets and examples #513.
- Fix for regression causing `<INV>` variants to be incorrectly down-ranked
- Fix for issue #486 where VCF output includes whitespace in INFO field.
- Logs will now display elapsed time correctly if an analysis runs over an hour (!).
Full Changelog: 13.2.1...13.3.0
SV `<INS>` bugfix
This is a bugfix release to address the blanket scoring of <INS> variants with a variant score of 1.0. The fix should increase the accuracy of SV call prioritisation.
- Fix for bug where all `<INS>` structural variants were given a maximal variant score of 1.0 regardless of their position on a transcript.
- Added partial implementation of SVanna scoring for coding and splice site symbolic variants.
- Fix for issue #481 where TSV and VCF results files would contain no data when the analysis `inheritanceModes` was empty.
IMPORTANT! This will be the last major release to run on Java 11. Subsequent major releases (i.e. 14+) will require Java 17.
Sometimes it's the little things...
This release adds a couple of minor quality of life features to the CLI and fixes a few bugs.
- New multi-architecture docker images with and without bash #471. These images can be found on https://hub.docker.com/repositories/exomiser
- Deprecated `output-prefix` CLI option (will be removed in next major version) #469
- Added `output-directory` and `output-filename` CLI options to replace `output-prefix` #469
- Added `output-format` CLI option #471
- Fixed excessive CPU usage and application hang after variant prioritisation with large number of results #479
- Fixed issue #478 where gene.tsv output files are empty when running a phenotype only prioritisation.
- Fixed broken links to OMIM in the phenotypic similarity section of the HTML output #465
- Added gene symbol as HTML id tag in gene panel HTML results #422
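When moving a script from `output-prefix` to `output-directory` and `output-filename`, the old prefix value can usually be split into its directory and file name parts. This sketch assumes the prefix was a path such as `results/sample-1` (a made-up value) and that the options are passed with a leading `--`. The helper name is hypothetical, and it only prints the options rather than running Exomiser:

```shell
#!/usr/bin/env bash
# Turn a legacy output prefix into the equivalent directory and file name options.
# split_output_prefix is a hypothetical helper; paths containing spaces would
# need extra quoting before being passed on to a real command line.
split_output_prefix() {
  printf -- '--output-directory %s --output-filename %s\n' \
    "$(dirname "$1")" "$(basename "$1")"
}

split_output_prefix 'results/sample-1'
# prints: --output-directory results --output-filename sample-1
```

A prefix with no directory part, such as `sample-1`, yields `.` as the directory.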
Automated ACMG, p-values, simpler output, documentation!
The three new features in this release are the automated ACMG classification of small sequence variants, p-values for
the combined scores, and new, more interpretable TSV and VCF output files.
- Added new automated ACMG annotations for top-scoring variants in known disease-causing genes.
- Added new combined score p-value
- Added new TSV_GENE, TSV_VARIANT and VCF output files containing ranked genes/variants for all the assessed modes of inheritance. Note that these new file formats will supersede the existing individual MOI-specific TSV/VCF files which will be removed in the next major release. See the online documentation for details.
- New updated online documentation! See https://exomiser.readthedocs.io/en/latest/
- New Docker hub images for CLI and web on https://hub.docker.com/u/exomiser
- Added checks to ensure user specifies genome assembly if user specifies VCF path outside of phenopacket/analysis
- Added `--output-prefix` option to set the output prefix directly on the command line
- Updated examples to use the latest recommended settings as per preset derived from the 100,000 Genomes Project

For the latest data, please follow the discussions for announcements: #424
hg38 only configuration bugfix
Bug-fix release. No external changes.
CLI changes
- Bug fix for issue #410 where application fails to start when only specifying hg38 data in `application.properties`