Commit 498ac2e

Merge pull request #13 from Integrative-Transcriptomics/dev
Merge dev into main.
2 parents: 927e60c + d1331f3

28 files changed: 975 additions & 490 deletions

README.md

Lines changed: 46 additions & 36 deletions
@@ -4,92 +4,102 @@
 
 # MUSIAL
 
-**MUSIAL** (MUlti Sample varIant AnaLysis) is a Java command-line tool designed to analyze and summarize single nucleotide variants (SNVs) and insertions/deletions (indels) across multiple prokaryotic samples.
-The software aggregates and analyzes variant calls from multiple samples of a prokaryotic species and provides an interface to generate comprehensive statistics and alignments at the genome, gene and protein level.
-MUSIAL enables a comprehensive assessment of variability within a species at the genome, gene and protein level, providing insights into, for example, conserved and variable regions, diversity at the gene level and common proteoforms among samples.
-
-## ✨ Features
+**MUSIAL** (MUlti Sample varIant AnaLysis) is a Java command-line tool to analyze large sets of VCF files with prokaryotic single nucleotide variants (SNVs) and insertions/deletions (indels). It provides an interface for generating comprehensive statistics and alignments, as well as assessing variability at genome, gene and protein levels.
 
 - **Integrates SnpEff and other Sequence Ontology compliant annotations** to help interpret variants.
 - **Projection to genomic features (genes) facilitates allele- and proteoform-specific information** that supports the characterization of individual samples.
 - **VCF based sequence reconstruction** at nucleotide and protein sequence level and tabular reports on sample, feature and variant statistics.
 
-## 📖 Usage
+### 📖 Usage
 
 An executable `jar` file (`Java 21`) is available from the [Releases](https://github.com/Integrative-Transcriptomics/MUSIAL/releases) section.
 MUSIAL operates on a modular, task-based architecture that is primarily initiated by the `build` task, which creates a JSON file (_storage_) as its primary output; this is then used as input for all other tasks.
 
-The general CLI usage is `java -jar MUSIAL-v2.4.0.jar <task>`, whereby the following tasks are available:
+Details on the use of the software and tutorials can be found in the repository [Wiki](https://github.com/Integrative-Transcriptomics/MUSIAL/wiki). The general CLI usage is `java -jar MUSIAL-v2.4.2.jar <task>`, whereby the following tasks are available:
 
 <details>
 <summary><code>build</code> - Build a local database file (storage) in JSON format from variant calls; the mandatory input for other tasks.</summary>
 
 ```
 Command line arguments of task build
 
--C,--configuration <arg> Path to a JSON file specifying the build task parameter configuration for MUSIAL.
+-C,--configuration <arg> Path to a JSON file specifying the build task parameter configuration for MUSIAL. Visit the documentation for details.
 ```
 </details>
 
 <details>
-<summary><code>expand</code> - Expand an existing storage file from variant call format (VCF) files.</summary>
+<summary><code>expand</code> - Expand an existing storage file from variant call files and/or meta data.</summary>
 
 ```
 Command line arguments of task expand
 
+-d,--dry-run Only report on novel entries without writing the updated storage.
 -I,--storage <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
 -m,--vcfMeta <arg> Path to a .tsv or .csv file specifying sample annotations.
 -o,--output <arg> Path to write the output file (default: overwrite input file).
--p,--preview Only report on novel entries without writing the updated storage.
--V,--vcfInput <arg> List of file or directory paths. All files must be in VCF format.
+-V,--vcfFiles <arg> List of file or directory paths. All files must be in VCF format.
 ```
 </details>
 
 <details>
-<summary><code>view</code> - View the content (features, samples or variants) and their attributes, of a MUSIAL storage file.</summary>
+<summary><code>view</code> - View the content (features, samples or variants; and their attributes) of a MUSIAL storage file.</summary>
 
 ```
 Command line arguments of task view
 
--C,--content <arg> One of sample, allele, call, variant, type, feature.
--f,--filter <arg> List of feature-, sample names, and/or positions for which the output is to be filtered (default: no filters). Entries may be
-ignored depending on the content.
+-C,--content <arg> The content to view. One of FEATURES, SAMPLES, VARIANTS (case-insensitive).
 -I,--storage <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
--o,--output <arg> Path to directory or file to write the output to (default: stdout).
+-o,--output <arg> Path to write the output file. If not provided, a default file will be created based on the input file (default). If `print` or
+`stdout` is specified, the output will be printed to the console.
+-q,--query <arg> One or multiple identifiers or genomic ranges (contig:start-end) to query.
 ```
 </details>
 
 <details>
-<summary><code>sequence</code> - Export FASTA format sequences of features from a MUSIAL storage file.</summary>
+<summary><code>profile</code> - Profile samples with respect to variants, alleles, or proteoforms.</summary>
 
 ```
-Command line arguments of task sequence
+Command line arguments of task profile
 
--c,--content <arg> One of `nt` or `aa` (default: `nt`).
--F,--features <arg> List of feature names to export data for. Non-coding features are skipped if `content` is `aa`.
--I,--input <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
--k,--conserved Export conserved sites.
--m,--merge Export sequences per allele or proteoform instead of per sample.
--o,--output <arg> Path to a directory to write the output files to (default: parent of input).
--r,--reference Include the reference sequence within the export.
--s,--samples <arg> List of sample names to restrict the sequence export to.
--x,--strip Strip all gap characters from the exported sequences.
+-C,--content <arg> The content to view. One of VARIANTS, ALLELES, PROTEOFORMS (case-insensitive).
+-I,--storage <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
+-o,--output <arg> Path to write the output file. If not provided, a default file will be created based on the input file (default). If `print` or
+`stdout` is specified, the output will be printed to the console.
+-q,--query <arg> One or multiple identifiers or genomic ranges (contig:start-end) to consider.
+-x,--reduced Represent entries in a reduced format, i.e., sequence types as numbers with 0 as the reference or synonymous sequence and
+variants without detailed call information.
 ```
 </details>
 
----
+<details>
+<summary><code>sequence</code> - Generate and write sequence data.</summary>
 
-Further details on the use of the software and internal workflows can be found in the repository [Wiki](https://github.com/Integrative-Transcriptomics/MUSIAL/wiki).
+```
+Command line arguments of task sequence
+
+-a,--align Whether to align sequences (optional, default: false).
+-c,--content <arg> Whether to generate NUCLEOTIDE or AMINOACID sequences (optional, case-insensitive, default: NUCLEOTIDE).
+-f,--split <arg> Whether to split output files by FEATURE, SAMPLE, BOTH, or NONE (optional, case-insensitive, default: FEATURE).
+-I,--storage <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
+-l,--locations <arg> One or multiple feature identifiers or genomic ranges (contig:start-end) to generate sequence data of. If none are provided,
+all features or full contig ranges will be considered.
+-m,--merge Whether to merge identical sequences (optional, default: false).
+-o,--output <arg> Path to write the output. If not provided, the directory of the input storage is used. If a directory is provided, files are
+created there. If a file is provided, its parent directory is used.
+-s,--samples <arg> One or multiple sample identifiers to retrieve sequences for (optional).
+-v,--variable Whether to only consider variable positions (optional, default: false).
+```
+</details>
 
-## 🌐 Web Interface
+### 🌐 Web Interface
 
-To provide user-friendly access to its functionalities, MUSIAL is available via a web interface at https://musial-tuevis.cs.uni-tuebingen.de/ currently running version `v2.3.10`. The code is deposited in the `web` branch.
+MUSIAL is also available via a web interface at https://musial-tuevis.cs.uni-tuebingen.de/ currently running version `v2.3.10`.
 
-## 🔨 Build
+### Build
 
-MUSIAL `v2.4` is built with `JDK 21.0.6` and `Gradle 8.2.1`. If you want to compile the source code, run `gradle clean build` in the root directory of the project. The JavaDoc of the software is available at [https://integrative-transcriptomics.github.io/MUSIAL/javadoc/](https://integrative-transcriptomics.github.io/MUSIAL/javadoc/).
+MUSIAL `v2.4` is built with `JDK 21.0.6` and `Gradle 9.1.0`. If you want to compile the source code, run `gradle clean build` in the root directory of the project. The JavaDoc of the software is available at [https://integrative-transcriptomics.github.io/MUSIAL/javadoc/](https://integrative-transcriptomics.github.io/MUSIAL/javadoc/).
 
-## 🙋 Need Help?
+### Need Help?
 
-- 🎓 Detailed information about the software can be found in the repository's [Wiki](https://github.com/Integrative-Transcriptomics/MUSIAL/wiki).
-- 🐛 Found an issue or have a feature request? Feel free to [Open a GitHub issue](https://github.com/Integrative-Transcriptomics/MUSIAL/issues/new).
+- Detailed information about the software can be found in the repository's [Wiki](https://github.com/Integrative-Transcriptomics/MUSIAL/wiki).
+- Found an issue or have a feature request? Feel free to [Open a GitHub issue](https://github.com/Integrative-Transcriptomics/MUSIAL/issues/new).
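
The updated `view` and `profile` tasks accept `-q,--query`, and `sequence` accepts `-l,--locations`. Each takes either identifiers or genomic ranges written as `contig:start-end`. The Java sketch below shows one way to tell such an argument apart from a plain identifier. It is not MUSIAL's implementation. The `RangeQuery` class and its record are hypothetical, and the sketch makes two assumptions the README does not specify: coordinates are 1-based and inclusive, and the last colon separates the contig name from the coordinates.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of MUSIAL: parses "contig:start-end" query arguments. */
final class RangeQuery {

    /** A genomic range; coordinates are assumed to be 1-based and inclusive. */
    record GenomicRange(String contig, int start, int end) {
        GenomicRange {
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("Invalid range: " + start + "-" + end);
            }
        }

        /** True if the given position on the given contig lies within this range. */
        boolean contains(String otherContig, int position) {
            return contig.equals(otherContig) && position >= start && position <= end;
        }
    }

    // The greedy "(.+)" backtracks to the LAST colon, so contig names containing ':' still parse.
    private static final Pattern RANGE = Pattern.compile("(.+):(\\d+)-(\\d+)");

    private RangeQuery() {
    }

    /** Returns the parsed range, or an empty Optional if the argument should be treated as an identifier. */
    static Optional<GenomicRange> parse(String argument) {
        Matcher matcher = RANGE.matcher(argument.trim());
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(new GenomicRange(
                matcher.group(1),
                Integer.parseInt(matcher.group(2)),
                Integer.parseInt(matcher.group(3))));
    }

    public static void main(String[] args) {
        for (String query : new String[] {"NC_000962.3:1000-2000", "katG"}) {
            String parsed = parse(query).map(GenomicRange::toString).orElse("identifier");
            System.out.println(query + " -> " + parsed);
        }
    }
}
```

A string that does not match the pattern, such as a gene name, yields an empty `Optional` and would be handled as an identifier. A range with `end < start` is rejected with an `IllegalArgumentException`.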

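With `-x,--reduced`, the `profile` task reports sequence types as numbers, with 0 for the reference or synonymous sequence. The sketch below illustrates that kind of encoding under stated assumptions. The class and method are hypothetical, numbers are assigned in order of first appearance, and sequences are compared as plain strings. As a result, a "synonymous" sequence only maps to 0 if the inputs are already translated protein sequences, because a nucleotide variant that leaves the protein unchanged then yields the reference proteoform.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of an integer-coded ("reduced") sequence-type profile. */
final class ReducedProfile {

    private ReducedProfile() {
    }

    /**
     * Assigns each sample a sequence-type number: 0 if its sequence equals the reference,
     * otherwise 1, 2, ... in the order in which new distinct sequences are first encountered.
     */
    static Map<String, Integer> encode(String reference, Map<String, String> sequencesBySample) {
        Map<String, Integer> typeBySequence = new HashMap<>();
        typeBySequence.put(reference, 0);
        Map<String, Integer> typeBySample = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sequencesBySample.entrySet()) {
            Integer type = typeBySequence.get(entry.getValue());
            if (type == null) {
                // Next free number: the map already holds the reference (0) and all earlier types.
                type = typeBySequence.size();
                typeBySequence.put(entry.getValue(), type);
            }
            typeBySample.put(entry.getKey(), type);
        }
        return typeBySample;
    }

    public static void main(String[] args) {
        Map<String, String> proteoforms = new LinkedHashMap<>();
        proteoforms.put("sampleA", "MKTAYIAK");
        proteoforms.put("sampleB", "MKTVYIAK");
        proteoforms.put("sampleC", "MKTAYIAK");
        proteoforms.put("sampleD", "MKTAYIQK");
        System.out.println(encode("MKTAYIAK", proteoforms)); // {sampleA=0, sampleB=1, sampleC=0, sampleD=2}
    }
}
```

A `LinkedHashMap` keeps samples in input order, so the reduced profile lines up with the order in which samples were supplied.
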
build.gradle

Lines changed: 56 additions & 101 deletions
@@ -1,88 +1,23 @@
-/*
-Set meta-information for the project build.
-*/
-version 'v2.4.2'
-group 'de.tue.cs.ibmi.it'
-
-println "Name: $name"
-println "Project directory: $projectDir"
-println "Build directory: $buildDir"
-println "Version: $version"
-println "Group: $project.group"
-println "AntBuilder: $ant"
-
-/*
-The `buildscript` block defines properties (repositories, plugins, ...)
-used within the Gradle build process.
-*/
-buildscript {
-    repositories {
-        mavenCentral()
-    }
-}
-
-/*
-Set JUnit platform for testing.
-*/
-tasks.withType(Test) {
-    useJUnitPlatform()
+/* -------------------- Plugins and Toolchain -------------------- */
+plugins {
+    id 'java'
 }
 
-/*
-Import plugins used during the build process.
-*/
-/*
-To prevent duplicate access to output directories a dependency-hierarchy is
-established on different tasks.
-*/
-tasks.whenTaskAdded { task ->
-    if (task.name == 'jar' || task.name == 'processResources') {
-        task.dependsOn unpackSnpEff
+java {
+    toolchain {
+        languageVersion = JavaLanguageVersion.of(21)
     }
 }
 
-/*
-This task unzips SnpEff.
-*/
-task unpackSnpEff(type: Copy) {
-    from zipTree('src/main/resources/snpEff.zip')
-    into 'src/main/resources/'
-}
-
-/*
-After building, the Jar is copied into the `releases` directory.
-*/
-task copyJarToReleases(type: Copy) {
-    mkdir 'releases'
-    def jarName = "build/libs/" + rootProject.name + "-" + version + ".jar"
-    from jarName
-    into "releases"
-}
-
-/*
-Defines the repositories to look up dependencies.
-*/
+/* -------------------- Repositories -------------------- */
 repositories {
     mavenCentral()
-    maven {
-        url "https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/"
-    }
-    maven {
-        url "https://bio.informatik.uni-jena.de/repository/libs-release-oss/"
-    }
-    maven {
-        url "https://artifactory.cronapp.io/public-release/"
-    }
+    maven { url "https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/" }
+    maven { url "https://bio.informatik.uni-jena.de/repository/libs-release-oss/" }
+    maven { url "https://artifactory.cronapp.io/public-release/" }
 }
 
-/*
-The following plugins are used during the gradle build process.
-*/
-apply plugin: 'java'
-
-/*
-All project dependencies are defined in the following block.
-*/
+/* -------------------- Dependencies -------------------- */
 dependencies {
     implementation 'commons-cli:commons-cli:1.5.0'
     implementation 'commons-io:commons-io:2.14.0'
@@ -99,24 +34,52 @@ dependencies {
     implementation 'org.ehcache:ehcache:3.11.1'
     implementation 'uk.co.omega-prime:btreemap:1.2.0'
     implementation 'tech.tablesaw:tablesaw-core:0.44.4'
-    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.8.1'
-    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.8.1'
+    testImplementation 'org.junit.jupiter:junit-jupiter-api:6.0.0'
+    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:6.0.0'
+    testRuntimeOnly 'org.junit.platform:junit-platform-launcher:6.0.0'
+}
+
+dependencyLocking {
+    lockAllConfigurations()
+}
+
+/* -------------------- Project Metadata -------------------- */
+version 'v2.4.2'
+group 'de.tue.cs.ibmi.it'
+
+println "Name: $name"
+println "Project directory: $projectDir"
+println "Build directory: $buildDir"
+println "Version: $version"
+println "Group: $project.group"
+println "AntBuilder: $ant"
+
+/* -------------------- Task Definitions -------------------- */
+tasks.withType(Test) {
+    useJUnitPlatform()
+}
+
+tasks.named('processResources') {
+    dependsOn tasks.named('unpackSnpEff')
+}
+
+tasks.named('jar') {
+    dependsOn tasks.named('unpackSnpEff')
 }
 
-/*
-In order to build a final FatJar, all entries from configurations.implementation (the above defined
-dependencies) are copied into configurations.includeJars.
-*/
-configurations {
-    includeJars.extendsFrom implementationjava
+tasks.register('unpackSnpEff', Copy) {
+    from zipTree('src/main/resources/snpEff.zip')
+    into 'src/main/resources/'
 }
 
-sourceCompatibility = JavaVersion.VERSION_21
-targetCompatibility = JavaVersion.VERSION_21
+tasks.register('copyJarToReleases', Copy) {
+    mkdir 'releases'
+    def jarName = "build/libs/" + rootProject.name + "-" + version + ".jar"
+    from jarName
+    into "releases"
+}
 
-/*
-Defines the source directories for the tasks executed by the `java` plugin.
-*/
+/* -------------------- Source Sets and Resource Processing -------------------- */
 sourceSets {
     main {
         java {
@@ -128,11 +91,8 @@ sourceSets {
     }
 }
 
-/*
-Copies the title and version of the project into resources files.
-*/
 processResources {
-    duplicatesStrategy 'exclude' // If duplicated files exist, they are excluded.
+    duplicatesStrategy 'exclude'
     exclude('*.zip')
     filesMatching('version.properties') {
         expand projectVersion: version
@@ -142,29 +102,24 @@ processResources {
     }
 }
 
+/* -------------------- Jar Configuration -------------------- */
 jar {
-    duplicatesStrategy 'exclude' // If duplicated files exist, they are excluded.
+    duplicatesStrategy 'exclude'
     manifest {
         attributes("Implementation-Title": rootProject.name,
                    "Implementation-Version": archiveVersion,
                    "Main-Class": "main.Musial")
     }
     doFirst {
         from { configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) } }
-        // Collects all Jars from dependencies and builds a FatJar.
     }
 }
 
-/*
-Defines steps executed by calling `gradle clean`.
-*/
+/* -------------------- Build Finalization and Cleanup -------------------- */
 clean.doFirst {
     delete "${rootDir}/releases/"
     delete "${rootDir}/build/"
     delete "${rootDir}/src/main/resources/snpEff"
 }
 
-/*
-After building the FatJar, it is copied to the releases directory.
-*/
 build.finalizedBy(copyJarToReleases)
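
The `processResources` block passes the project `version` into `version.properties` via `expand projectVersion: version`, so the packaged jar carries the version declared in `build.gradle`. The sketch below reads that resource back at runtime. The file's content is not part of this diff, so two things are assumptions: the key `version` (that is, a template line such as `version=${projectVersion}`) and the hypothetical `VersionInfo` class. The method takes an `InputStream` so it can be exercised without a built jar.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical reader for the version.properties resource filled in by processResources. */
final class VersionInfo {

    private VersionInfo() {
    }

    /** Returns the (assumed) "version" property, or "unknown" if the stream or key is missing. */
    static String readVersion(InputStream in) {
        if (in == null) {
            return "unknown"; // getResourceAsStream returns null if the resource is absent
        }
        try (InputStream stream = in) {
            Properties properties = new Properties();
            properties.load(stream); // Properties.load(InputStream) decodes ISO 8859-1
            return properties.getProperty("version", "unknown");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Inside the built jar: readVersion(VersionInfo.class.getResourceAsStream("/version.properties"))
        String expanded = "version=v2.4.2\n"; // assumed result of expanding version=${projectVersion}
        InputStream demo = new ByteArrayInputStream(expanded.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readVersion(demo)); // v2.4.2
    }
}
```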

settings.gradle

Lines changed: 0 additions & 18 deletions
@@ -1,19 +1 @@
-/*
- * This settings file was auto generated by the Gradle buildInit task
- * by 'seitza' at '9/23/16 8:26 AM' with Gradle 2.13
- *
- * The settings file is used to specify which projects to include in your build.
- * In a single project build this file can be empty or even removed.
- *
- * Detailed information about configuring a multi-project build in Gradle can be found
- * in the user guide at https://docs.gradle.org/2.13/userguide/multi_project_builds.html
- */
-
-/*
-// To declare projects as part of a multi-project build use the 'include' method
-include 'shared'
-include 'api'
-include 'services:webservice'
-*/
-
 rootProject.name = 'MUSIAL'

src/main/java/cli/CLIProfile.java

Lines changed: 9 additions & 3 deletions
@@ -74,11 +74,17 @@ public static Options options() {
      * The content type to profile.
      */
     public enum Content {
-        // Per sample variants.
+        /**
+         * Per sample variants.
+         */
         VARIANTS,
-        // Per sample alleles of features.
+        /**
+         * Per sample alleles.
+         */
         ALLELES,
-        // Per sample proteoforms of features.
+        /**
+         * Per sample proteoforms.
+         */
         PROTEOFORMS
     }
 
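
The README now documents `-C,--content` of `profile` as one of `VARIANTS`, `ALLELES`, `PROTEOFORMS` (case-insensitive). These match the constants of `CLIProfile.Content` above. The sketch below shows a case-insensitive lookup for such an enum. It uses a stand-alone copy named `ProfileContent`, and the `parse` helper and its error message are hypothetical rather than taken from `CLIProfile`.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

/** Stand-alone copy of the profile content types; the parse helper is hypothetical. */
enum ProfileContent {
    /** Per sample variants. */
    VARIANTS,
    /** Per sample alleles. */
    ALLELES,
    /** Per sample proteoforms. */
    PROTEOFORMS;

    /** Case-insensitive lookup of a command-line value; throws for unknown values. */
    static ProfileContent parse(String argument) {
        String normalized = argument.trim().toUpperCase(Locale.ROOT);
        for (ProfileContent content : values()) {
            if (content.name().equals(normalized)) {
                return content;
            }
        }
        String allowed = Arrays.stream(values()).map(ProfileContent::name).collect(Collectors.joining(", "));
        throw new IllegalArgumentException(
                "Unknown content '" + argument + "'; expected one of " + allowed + " (case-insensitive).");
    }

    public static void main(String[] args) {
        System.out.println(parse("alleles")); // ALLELES
    }
}
```

Upper-casing with `Locale.ROOT` rather than the default locale avoids problems such as the Turkish dotted capital İ, which would otherwise stop `variants` from matching `VARIANTS`.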