MUSIAL: Multi Sample Variant Analysis

MUSIAL is a web platform that computes prokaryotic genome, gene and protein sequence alignments from variant call datasets (using both SNVs and Indels) derived from multiple samples of one species. Scientists can assess the variability of a species on the genome, gene, and protein sequence level, e.g., study which positions in the species exhibit large or no variability, identify genes with large variability, which samples agree on which proteoforms, etc. The platform is designed to support the detection of features of interest for downstream analyses. It provides multiple data export options for FASTA and tabular formats as well as diverse visual analytics features.


Usage Instructions

Start a new session by providing the required input or re-upload a previously saved session using the input forms on this page. Alternatively, you can start an example session to familiarize yourself with the functions of MUSIAL.

After successful processing, you will be redirected to the results page. Choose between sample-, feature-, or variant-oriented views to explore your data interactively.

Download your session data to continue later or share it with your colleagues. MUSIAL also offers downloads of (optionally aligned) sequence data and of tabular data.

Input Forms: New Session

You can start a new session by providing the necessary input files in the input forms below. Mandatory inputs are highlighted. For more information on the required formats, click on the labels above the input fields. After filling in the necessary information, you can submit the request to the server by clicking the submit button at the end of the form. The server will process the input data and redirect you to the results page.


Reference Sequence

Reference Sequence .fasta, .fas, .fa, .fna, .ffn, .faa, .mpfa, .frn (Mandatory)

FASTA format file containing one or more entries to use as the reference sequence for the analysis. Ensure that the sequence identifier (SeqID) is unique for each entry and matches the identifiers in the used variant call (.vcf) and generic feature format (.gff) files. Format example:

>NC_000962.3 [organism=Mycobacterium tuberculosis] [strain=H37Rv]
TTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCG...
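The SeqID requirement above can be checked locally before uploading. The following is a minimal sketch, not part of MUSIAL: it assumes the SeqID is the first whitespace-delimited token of each FASTA header line, and the function names fasta_seq_ids and duplicated_ids are hypothetical.

```python
def fasta_seq_ids(fasta_text):
    """Return the SeqID of every entry, assuming it is the first token after '>'."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def duplicated_ids(ids):
    """Return each SeqID that occurs more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for seq_id in ids:
        if seq_id in seen and seq_id not in duplicates:
            duplicates.append(seq_id)
        seen.add(seq_id)
    return duplicates


example = (
    ">NC_000962.3 [organism=Mycobacterium tuberculosis] [strain=H37Rv]\n"
    "TTGACCGATGACCCCGGTTCAGGCTTCACC\n"
)
ids = fasta_seq_ids(example)
print(ids, duplicated_ids(ids))
```

An empty list from duplicated_ids means every entry has a unique identifier; the same list of SeqIDs can then be compared against the identifiers used in the .vcf and .gff files.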

Excluded Positions

Positions to be excluded from the analysis. Entries can be added in the input field by pressing enter. Format example:

NC_000962.3 1473246,12468-13016
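To illustrate how an entry in this format can be read, here is a small sketch. It assumes that the SeqID and the position list are separated by whitespace, that positions are comma-separated, and that a range such as 12468-13016 includes both end points; the inclusiveness of ranges is an assumption, and parse_excluded_positions is a hypothetical helper, not a MUSIAL function.

```python
def parse_excluded_positions(entry):
    """Split 'SeqID pos,start-end,...' into the SeqID and a list of (start, end) intervals."""
    seq_id, spec = entry.split(maxsplit=1)
    intervals = []
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            start, end = (int(part) for part in token.split("-", 1))
        else:
            start = end = int(token)  # a single position is a one-base interval
        if start > end:
            raise ValueError(f"range start exceeds end: {token}")
        intervals.append((start, end))
    return seq_id, intervals


print(parse_excluded_positions("NC_000962.3 1473246,12468-13016"))
```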

Excluded Variants

Variants at positions to be excluded from the analysis. Entries can be added in the input field by pressing enter. The variant is matched with the ALT field in the sample VCF files. Format example:

NC_000962.3 1473246:A,1473246:GT
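The matching against the ALT field can be pictured roughly as follows. This is only a sketch under assumptions: each token is read as position:ALT, and a variant call counts as excluded when its position and ALT allele both match an entry exactly. The names parse_excluded_variants and is_excluded are hypothetical.

```python
def parse_excluded_variants(entry):
    """Split 'SeqID pos:ALT,pos:ALT,...' into the SeqID and a {position: {ALT, ...}} mapping."""
    seq_id, spec = entry.split(maxsplit=1)
    excluded = {}
    for token in spec.split(","):
        position, alt = token.strip().split(":", 1)
        excluded.setdefault(int(position), set()).add(alt)
    return seq_id, excluded


def is_excluded(excluded, position, alt):
    """True if this exact ALT allele is listed for this position."""
    return alt in excluded.get(position, set())


seq_id, excluded = parse_excluded_variants("NC_000962.3 1473246:A,1473246:GT")
print(seq_id, excluded)
```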


Genomic Features

Reference Sequence Annotation .gff3 (Mandatory)

GFF format file, describing features of the provided reference sequence. Ensure that the seqname cell entries match the SeqID values of the provided reference sequence. Format example:

CP004010.2 Genbank CDS 4 1398 . + 0 ID=cds-AGN75228.1;Parent=gene-TPANIC_0001;Name=AGN75228.1;gbkey=CDS;gene=dnaA;locus_tag=TPANIC_0001;product=DNA-directed DNA replication initiator protein
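A quick local check that the annotation and the reference sequence use the same identifiers might look like the sketch below. It assumes a standard tab-separated GFF3 file in which the first column holds the sequence identifier; gff_seqids and unmatched_seqids are hypothetical helper names.

```python
def gff_seqids(gff_text):
    """Collect the first-column identifier of every feature line in GFF3 text."""
    seqids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(gff_text, reference_ids):
    """Return annotation identifiers that do not occur among the reference SeqIDs."""
    return sorted(gff_seqids(gff_text) - set(reference_ids))


gff = (
    "##gff-version 3\n"
    "CP004010.2\tGenbank\tCDS\t4\t1398\t.\t+\t0\tID=cds-AGN75228.1;gene=dnaA\n"
)
print(unmatched_seqids(gff, {"CP004010.2"}))
print(unmatched_seqids(gff, {"NC_000962.3"}))
```

Any identifier returned by unmatched_seqids points to a feature that cannot be placed on the uploaded reference sequence.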

Selection Table

The selection table is filled after the reference sequence annotation file has been uploaded. MUSIAL is primarily designed to analyze a subset of the features, i.e., genes, on the provided reference sequence. Select the features to be analyzed by clicking on them. The cells in the first column, coding sequence (CDS), are interactive and can be clicked to mark or unmark the corresponding feature for proteoform analysis. The text entered in the Search Keyword input is applied as a filter to all columns using OR logic, i.e., a row is shown if any of its columns matches.

Samples

Variant Calls .vcf (Mandatory)

File(s) in VCF format, each containing variant calls for exactly one biological sample with ploidy one. Ensure that every value in the CHROM column matches a SeqID of the provided reference sequence file. Filenames will be used as internal sample names. Format example:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ERR017782
NC_000962.3 1534 . T C 1000 . AC=1;AF=1.00;AN=1;DP=38;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=34.79;SOR=1.542 GT:AD:DP:GQ:PL 1:0,33:33:99:1162,0

Sample Meta Information .csv, .tsv

File in TSV-like format containing meta information to be added to the samples. Column names and cell values can be arbitrary, but the first column must contain sample names, each matching the file name of a provided variant call file. Format example:

name strain date country region lineage
ERR970409 G04126 1970-01-02 South Africa Africa D
ERR234106 G00883 1970-01-02 Germany Europe L6
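Whether every row of the meta table refers to an uploaded sample can be checked with a short script. The sketch below rests on two assumptions: the table is tab-separated with a header row, and a sample name equals the variant call file name without its extension (e.g. ERR970409.vcf gives ERR970409). The function unmatched_meta_samples is a hypothetical helper.

```python
import csv
import io
from pathlib import Path


def unmatched_meta_samples(meta_text, vcf_filenames, delimiter="\t"):
    """Return first-column sample names that have no corresponding VCF file."""
    # Assumption: the internal sample name is the file name without extension.
    sample_names = {Path(name).stem for name in vcf_filenames}
    reader = csv.reader(io.StringIO(meta_text), delimiter=delimiter)
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if row and row[0] not in sample_names]


meta = (
    "name\tstrain\tdate\tcountry\tregion\tlineage\n"
    "ERR970409\tG04126\t1970-01-02\tSouth Africa\tAfrica\tD\n"
    "ERR234106\tG00883\t1970-01-02\tGermany\tEurope\tL6\n"
)
print(unmatched_meta_samples(meta, ["ERR970409.vcf"]))
```

For a comma-separated table, pass delimiter="," instead.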


Variant Call Filter Parameters

Minimal Variant Call Position Coverage: Minimum total number of reads, regardless of the allele they support, that must cover a variant call position for the call to be accepted. Requires a positive integer.
Minimal Variant Call Frequency: Minimum frequency (in %, with respect to the reads at the position) that an allele must reach at a variant call position for the call to be accepted. Requires an integer between 0 and 100.
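
The effect of the two thresholds can be illustrated with a short sketch. Which VCF fields MUSIAL actually evaluates is not stated here, so the following is an assumption: coverage is taken from the per-sample DP value, and the allele frequency is computed from the per-sample AD read counts of the called haploid allele. The function passes_filters and the shortened example record are hypothetical.

```python
def passes_filters(vcf_line, min_coverage, min_frequency):
    """Apply coverage and frequency thresholds to one single-sample, haploid VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Pair the FORMAT keys (column 9) with the sample values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    genotype = sample["GT"]
    if not genotype.isdigit():
        return False  # no call ('.') or a non-haploid genotype
    depth = int(sample["DP"])  # assumption: total reads at the position
    allele_depths = [int(count) for count in sample["AD"].split(",")]
    total = sum(allele_depths)
    # Assumption: frequency = reads supporting the called allele / all AD reads.
    frequency = 100.0 * allele_depths[int(genotype)] / total if total else 0.0
    return depth >= min_coverage and frequency >= min_frequency


record = "NC_000962.3\t1534\t.\tT\tC\t1000\t.\tDP=38\tGT:AD:DP:GQ:PL\t1:0,33:33:99:1162,0"
print(passes_filters(record, min_coverage=5, min_frequency=90))
print(passes_filters(record, min_coverage=40, min_frequency=90))
```

In this record the sample has 33 reads at the position, all supporting the called allele, so it is accepted with a minimal coverage of 5 and rejected with a minimal coverage of 40.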

*Mandatory fields must be filled in and at least one feature must be selected in the Selection Table.

Re-Upload Session

You can re-upload a previously saved session by selecting the corresponding .zlib file below. This will redirect you to the MUSIAL results page. Note that the exact state of the results page is not saved, only the processed data. The session data can be downloaded from the results page once a submission has been successfully processed.


MUSIAL Session .zlib