Building a whole genome SNV Phylogeny with SNVPhyl

This is a quick tutorial on how to construct a whole genome SNV phylogeny with SNVPhyl through IRIDA.

Initial Data

The data for this tutorial comes from https://irida.corefacility.ca/downloads/data/irida-sample-data.zip. It is assumed the sequence files in miseq-run/ have been uploaded into appropriate samples as described in the Web Upload Tutorial. Before starting this tutorial you should have a project with samples that appear as:

tutorial-pipeline-samples.png
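
Before uploading, it can help to confirm what the sample data package contains. The sketch below lists the archive entries that sit under a given folder such as miseq-run/ or references/. The function names download_sample_data and files_under are illustrative helpers, not part of IRIDA, and the internal layout of the zip file is not assumed beyond the folder names mentioned in this tutorial.

```python
import io
import urllib.request
import zipfile

# URL of the sample data package, as given in this tutorial.
DATA_URL = "https://irida.corefacility.ca/downloads/data/irida-sample-data.zip"


def download_sample_data(url=DATA_URL):
    """Fetch the sample data archive and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def files_under(zip_bytes, folder):
    """Return archive entries that have `folder` as one of their parent directories."""
    wanted = folder.strip("/")
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Skip directory entries; match the folder at any depth of the path.
            if not name.endswith("/") and wanted in name.split("/")[:-1]
        )


# Example usage (requires network access, so it is left commented out):
# data = download_sample_data()
# print(files_under(data, "miseq-run"))
# print(files_under(data, "references"))
```

Matching the folder at any depth means the helper works whether or not the archive wraps its contents in a top-level directory.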

Reference Genome

SNVPhyl requires a reference genome for mapping sequencing reads and calling variants. The reference must be uploaded to the project that contains the samples to be analyzed. A number of example reference files are provided under the references/ folder in the sample data package. Please upload the file 08-5578.fasta using the following steps.

  1. Select the appropriate project from Projects > Your Projects.

    your-projects-menu.png

  2. Select the Reference Files link from within the project and then select Upload Reference File.

    reference-files-page.png

  3. Select and upload the appropriate reference file. This should be in nucleotide FASTA format and contain no ambiguous base-pair characters. Once uploaded you should see a page like the following.

    reference-file-uploaded.png
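
The format requirement in step 3 can be checked locally before uploading. The following is a minimal sketch, assuming that "ambiguous" means any character other than A, C, G, or T (for example N or the other IUPAC codes). The function name find_ambiguous_bases is illustrative and not part of IRIDA or SNVPhyl.

```python
import io

# Unambiguous nucleotide characters; anything else (N, R, Y, gaps, ...) is flagged.
UNAMBIGUOUS = set("ACGT")


def find_ambiguous_bases(handle):
    """Return {sequence_id: {character: count}} for characters outside A/C/G/T."""
    problems = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the ID is the first whitespace-delimited token.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            continue
        if current_id is None:
            raise ValueError("sequence data found before the first FASTA header")
        for char in line.upper():
            if char not in UNAMBIGUOUS:
                counts = problems.setdefault(current_id, {})
                counts[char] = counts.get(char, 0) + 1
    return problems


# Small in-memory example; for a real file use: open("08-5578.fasta")
example = io.StringIO(">contig1 demo\nACGTNNAC\n>contig2\nACGT\n")
print(find_ambiguous_bases(example))  # {'contig1': {'N': 2}}
```

An empty result means no ambiguous characters were found under this assumption.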

Adding Samples to the Cart

Before a pipeline can be run, a set of samples and their sequence read data must be selected and added to the cart. For this tutorial please select all three samples and click the Add to Cart button.

select-samples.png

Once the samples have been added to the cart, the samples can be reviewed by clicking on the Cart button at the top.

cart-button.png

Selecting a Pipeline

Once inside the cart, the Select a Pipeline button can be used to select a pipeline to run on the selected samples.

select-a-pipeline.png

From the Select a Pipeline view a number of different pipelines are available.

select-a-pipeline-view.png

For this tutorial, we will select the Phylogenomics Pipeline.

Selecting Parameters

Once the pipeline is selected, the next page provides an overview of all the input files, as well as the option to modify parameters.

snvphyl-pipeline-page.png

Selecting Customize brings up a page where parameters can be customized.

snvphyl-parameters.png

The default parameters will often be appropriate, but for this tutorial we will set the Minimum read coverage to 10. When finished please select Use these Parameters.

Once a set of parameters has been chosen, the Ready to Launch? button may be used to start the pipeline.

ready-to-launch-button.png

Once the button is selected you should see a screen showing that your pipeline has been launched.

pipeline-launch.png

Monitoring Pipeline Status

To monitor the status of the launched pipeline, please select the Analysis > Your Analyses menu.

your-analyses-menu.png

This will bring you to a page where you can monitor the status of each launched workflow.

snvphyl-analysis-status.png

Clicking the pipeline name SNVPhyl_20151117 will bring you to a page for that analysis pipeline.

snvphyl-analysis-status-details.png

This page will continue to refresh as the pipeline progresses through each stage. It will take a while for the SNVPhyl analysis pipeline to complete.

Viewing the Results

Once the pipeline is complete, you will see the generated phylogenetic tree within your browser and you will be given the option to download the results of the analysis. Please click Download to download these results now.

snvphyl-results.png

A number of files are provided within the download package. These are described below:

  1. vcf2core.tsv: This defines the number of core positions evaluated for constructing the phylogeny.
  2. phylogeneticTreeStats.txt: This contains additional information about the constructed tree.
  3. phylogeneticTree.newick: This contains the constructed phylogenetic tree in newick format.
  4. mappingQuality.txt: This defines the percent of the reference covered by each genome.
  5. snvAlignment.phy: This defines a multiple sequence alignment of SNVs used to generate the phylogeny.
  6. snvMatrix.tsv: This contains a pair-wise SNV distance matrix.
  7. snvTable.tsv: This is a table of the individual variants detected.
  8. filterStats.txt: This defines information about the SNVs removed due to poor quality.
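
As an illustration of working with these files, the sketch below reads a pair-wise SNV distance matrix such as snvMatrix.tsv and reports the closest pair of samples. It assumes a tab-separated square matrix whose first row and first column hold the sample names; please confirm the actual layout against the SNVPhyl Output Guide. The function names and the example values are invented for illustration and are not real results.

```python
import csv
import io


def read_snv_matrix(handle):
    """Read a tab-separated square matrix into {(row_name, column_name): distance}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]  # first header cell is a label for the name column
    distances = {}
    for row in reader:
        if not row:
            continue
        name = row[0]
        for column, value in zip(columns, row[1:]):
            distances[(name, column)] = int(value)
    return distances


def closest_pairs(distances):
    """Return the distinct sample pairs that share the smallest SNV distance."""
    # Drop self-comparisons and collapse (A, B) / (B, A) into one entry.
    pairs = {tuple(sorted(key)): value
             for key, value in distances.items() if key[0] != key[1]}
    if not pairs:
        return []
    smallest = min(pairs.values())
    return sorted((a, b, d) for (a, b), d in pairs.items() if d == smallest)


# Invented example data; for a real file use: open("snvMatrix.tsv", newline="")
example = io.StringIO("strain\tA\tB\tC\nA\t0\t2\t5\nB\t2\t0\t4\nC\t5\t4\t0\n")
print(closest_pairs(read_snv_matrix(example)))  # [('A', 'B', 2)]
```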

More information about interpreting these files can be found in the SNVPhyl Output Guide.

Viewing Provenance Information

To view the pipeline provenance information, please select the Provenance tab.

snvphyl-provenance.png

This will display the individual steps of this pipeline and the parameters used at each step. For more details on the pipeline please see the SNVPhyl documentation.

Advanced SNVPhyl Visualizations

SNVPhyl analyses can be combined with metadata from the samples they were run on to get a more complete picture. For more information see Advanced Visualizations.