Building a whole genome SNV Phylogeny with SNVPhyl
This is a quick tutorial on how to construct a whole genome SNV phylogeny with SNVPhyl through IRIDA.
- Initial Data
- Adding Samples to the Cart
- Selecting a Pipeline
- Selecting Parameters
- Monitoring Pipeline Status
- Viewing the Results
- Viewing Provenance Information
- Advanced SNVPhyl Visualizations
Initial Data
The data for this tutorial comes from https://irida.corefacility.ca/downloads/data/irida-sample-data.zip. It is assumed the sequence files in
miseq-run/ have been uploaded into appropriate samples as described in the Web Upload Tutorial. Before starting this tutorial you should have a project with samples that appear as:
SNVPhyl requires a reference genome to be used for mapping sequencing reads and calling variants. This must be uploaded to the project containing the samples to use. A number of example reference files are provided under the
references/ folder in the sample data package. Please upload the file
08-5578.fasta using the following steps.
Select the appropriate project from Projects > Your Projects.
Select the Reference Files link from within the project and then select Upload Reference File.
Select and upload the appropriate reference file. This should be in nucleotide FASTA format and contain no ambiguous base-pair characters. Once uploaded you should see a page like the following.
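The requirement that the reference contain no ambiguous characters can be checked locally before uploading. The following is a minimal sketch, not part of IRIDA or SNVPhyl: the function name find_ambiguous_bases is hypothetical, and it assumes that only A, C, G and T (in either case) count as unambiguous.

```python
import io

# Assumption: only these four characters are treated as unambiguous bases.
VALID_BASES = set("ACGT")


def find_ambiguous_bases(fasta_handle):
    """Return (record_id, position, character) for every non-ACGT character.

    Positions are 1-based and counted per FASTA record.
    """
    problems = []
    record_id = None
    position = 0
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: keep the first word of the header as its id.
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            position = 0
            continue
        for char in line.upper():
            position += 1
            if char not in VALID_BASES:
                problems.append((record_id, position, char))
    return problems


# Made-up example input, for illustration only.
example = io.StringIO(">contig1\nACGTN\nACGT\n>contig2\nACRT\n")
print(find_ambiguous_bases(example))
# prints [('contig1', 5, 'N'), ('contig2', 3, 'R')]
```

To check a real file, open it and pass the handle, for example with open("08-5578.fasta") as handle: print(find_ambiguous_bases(handle)). An empty list suggests the file meets the requirement under the assumption above.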
Adding Samples to the Cart
Before a pipeline can be run, a set of samples and sequence read data must be selected and added to the cart. For this tutorial please select all three samples and click the Add to Cart button.
Once the samples have been added to the cart, the samples can be reviewed by clicking on the Cart button at the top.
Selecting a Pipeline
Once inside the cart, you can select an available pipeline from the pipelines grid:
For this tutorial, we will select the Phylogenomics Pipeline.
Once the pipeline is selected, the next page provides an overview of all the input files, as well as the option to modify parameters.
Selecting Customize brings up a page where parameters can be customized.
The default parameters will often be appropriate but we will modify the Minimum read coverage to
10 for this tutorial. When finished please select Use these Parameters.
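As a rough illustration of what a coverage threshold of this kind does, the sketch below keeps only positions where every sample has at least the minimum read depth. This is an assumed, simplified model and not SNVPhyl's actual implementation; the function name and the depth values are hypothetical.

```python
def positions_passing_coverage(depths_by_sample, min_coverage=10):
    """Return 0-based positions where every sample has depth >= min_coverage.

    depths_by_sample maps a sample name to a list of per-position read depths.
    """
    depth_lists = list(depths_by_sample.values())
    if not depth_lists:
        return []
    # Only compare positions that are present for every sample.
    length = min(len(depths) for depths in depth_lists)
    return [
        i for i in range(length)
        if all(depths[i] >= min_coverage for depths in depth_lists)
    ]


# Made-up depths for two hypothetical samples over five positions.
depths = {
    "sampleA": [12, 9, 30, 10, 4],
    "sampleB": [15, 40, 11, 10, 25],
}
print(positions_passing_coverage(depths, min_coverage=10))
# prints [0, 2, 3]
```

Under this model, raising the threshold trades the number of positions retained for confidence in the reads supporting each one.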
Once a set of parameters has been chosen, the Ready to Launch? button may be used to start the pipeline.
Once the button is selected you should see a screen showing that your pipeline has been launched.
Monitoring Pipeline Status
To monitor the status of the launched pipeline, please select the Analysis > Your Analyses menu.
This will bring you to a page where you can monitor the status of each launched workflow.
Clicking the pipeline name SNVPhyl_20151117 will bring you to a page for that analysis pipeline.
This page will continue to refresh as the pipeline progresses through each stage. It will take a while for the SNVPhyl analysis pipeline to complete.
Viewing the Results
Once the pipeline is complete, you will see the generated phylogenetic tree within your browser and you will be given the option to download the results of the analysis. Please click Download to download these results now.
A number of files are provided within the download package. These are described below:
vcf2core.tsv: This defines the number of core positions evaluated for constructing the phylogeny.
phylogeneticTreeStats.txt: This contains additional information about the constructed tree.
phylogeneticTree.newick: This contains the constructed phylogenetic tree in newick format.
mappingQuality.txt: This defines the percent of the reference covered by each genome.
snvAlignment.phy: This defines a multiple sequence alignment of SNVs used to generate the phylogeny.
snvMatrix.tsv: This contains a pair-wise SNV distance matrix.
snvTable.tsv: This is a table of the individual variants detected.
filterStats.txt: This defines information about the SNVs removed due to poor quality.
More information about interpreting these files can be found in the SNVPhyl Output Guide.
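The downloaded tables can also be inspected with a short script. The sketch below reads a pair-wise distance matrix such as snvMatrix.tsv and lists sample pairs from closest to most distant. The layout is an assumption: a header row of sample names, one row per sample whose first column is the sample name, and integer distances. The function names and the example values are hypothetical, so check them against your own file.

```python
import csv
import io


def read_snv_matrix(handle):
    """Parse a tab-separated distance matrix into {(row, column): distance}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    column_names = header[1:]  # the first cell is a corner label, not a sample
    distances = {}
    for row in reader:
        if not row:
            continue
        row_name, values = row[0], row[1:]
        for column_name, value in zip(column_names, values):
            distances[(row_name, column_name)] = int(value)
    return distances


def closest_pairs(distances):
    """Return distinct sample pairs sorted by distance, smallest first."""
    pairs = {}
    for (first, second), distance in distances.items():
        if first != second:
            # Sort the names so (A, B) and (B, A) collapse into one entry.
            pairs[tuple(sorted((first, second)))] = distance
    return sorted(pairs.items(), key=lambda item: (item[1], item[0]))


# Made-up matrix for three hypothetical samples.
example = io.StringIO(
    "strain\tsampleA\tsampleB\tsampleC\n"
    "sampleA\t0\t2\t7\n"
    "sampleB\t2\t0\t5\n"
    "sampleC\t7\t5\t0\n"
)
for pair, distance in closest_pairs(read_snv_matrix(example)):
    print(pair, distance)
```

For a real run, replace the io.StringIO example with open("snvMatrix.tsv", newline="").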
Viewing Provenance Information
To view the pipeline provenance information, please select the Provenance tab.
This will display the individual steps of this pipeline and the parameters used at each step. For more details on the pipeline please see the SNVPhyl documentation.
Advanced SNVPhyl Visualizations
SNVPhyl analyses can be combined with metadata from the samples they were run on to give a more complete picture. For more information see Advanced Visualizations.