Performing de novo assemblies with IRIDA
This is a quick tutorial on how to assemble a set of genomes through IRIDA.
- Pipeline Overview
- Initial Data
- Adding Samples to the Cart
- Selecting a Pipeline
- Selecting Parameters
- Monitoring Pipeline Status
- Viewing the Results
- Viewing Provenance Information
Pipeline Overview
The assembly and annotation pipeline built into IRIDA proceeds through the following steps:
- Paired-end reads are merged using FLASh.
- The merged paired-end reads as well as the unmerged reads are passed to SPAdes to perform a de novo assembly.
- The contigs returned by SPAdes are filtered to remove small and low coverage contigs.
- The filtered contigs are passed to Prokka for genome annotation.
- A set of summary statistics is generated for the assembled genome.
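The contig filtering step can be illustrated with a minimal Python sketch. It assumes SPAdes-style contig headers of the form NODE_<n>_length_<len>_cov_<cov>. The thresholds `min_length=1000` and `min_coverage=5.0` are hypothetical values chosen for illustration. IRIDA's own filtering tool and its default parameters may differ.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed SPAdes header format, e.g. "NODE_1_length_52341_cov_23.41"
HEADER_RE = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_contigs(records, min_length=1000, min_coverage=5.0) -> List[Tuple[str, str]]:
    """Keep contigs whose header-reported length and coverage meet both thresholds.

    The default thresholds are hypothetical, not IRIDA's actual settings.
    """
    kept = []
    for header, seq in records:
        match = HEADER_RE.search(header)
        if match is None:
            continue  # skip records without SPAdes-style length/coverage fields
        length, coverage = int(match.group(1)), float(match.group(2))
        if length >= min_length and coverage >= min_coverage:
            kept.append((header, seq))
    return kept
```

For example, with the defaults a contig named NODE_2_length_400_cov_30.0 is dropped for being too short, and NODE_3_length_2000_cov_1.2 is dropped for low coverage. Reading length and coverage from the header avoids a second pass over the reads, at the cost of depending on the SPAdes naming convention.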
Initial Data
The data for this tutorial comes from https://irida.corefacility.ca/downloads/data/irida-sample-data.zip. It is assumed that the sequence files in miseq-run-assembly-small/ have been uploaded into appropriate samples as described in the Web Upload Tutorial. Before starting this tutorial, you should have a project containing these samples.
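If you want to check which read files the archive provides before uploading them, a small Python sketch can download it and list the FASTQ entries under miseq-run-assembly-small/. It assumes the reads are stored as `.fastq` or `.fastq.gz` files somewhere under a path containing that directory name. The function names are hypothetical helpers, not part of IRIDA.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

SAMPLE_DATA_URL = "https://irida.corefacility.ca/downloads/data/irida-sample-data.zip"


def download_sample_data(dest: Path) -> Path:
    """Download the tutorial archive to dest (requires network access)."""
    urllib.request.urlretrieve(SAMPLE_DATA_URL, dest)
    return dest


def list_assembly_reads(zip_path: Path, subdir: str = "miseq-run-assembly-small/") -> List[str]:
    """Return the sorted FASTQ entries in the archive whose path contains subdir."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if subdir in name and name.endswith((".fastq", ".fastq.gz"))
        )
```

Matching on a substring of the entry path, rather than an exact prefix, keeps the helper working whether or not the archive wraps its contents in a top-level folder.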
Adding Samples to the Cart
Before a pipeline can be run, a set of samples and their sequence read data must be selected and added to the cart. For this tutorial, please select all three samples and click the Add to Cart button.
Once the samples have been added to the cart, they can be reviewed by clicking the Cart button at the top of the page.
Selecting a Pipeline
Once inside the cart, the Select a Pipeline button can be used to select a pipeline to run on the selected samples.
The Select a Pipeline view lists a number of different pipelines.
There are two different types of assembly pipelines available:
- Assembly and Annotation Pipeline: This is used for assembling and annotating a single genome.
- Assembly and Annotation Collection Pipeline: This is used for assembling and annotating a collection of genomes and compiling the results into a single downloadable package.
For this tutorial, we will select the Assembly and Annotation Collection Pipeline.
Once the pipeline is selected, the next page provides an overview of all the input files, as well as the option to modify parameters.
We will use the default parameters. Please select the Ready to Launch? button to continue.
Once the button is selected, you should see a screen confirming that your pipeline has been launched.
Monitoring Pipeline Status
To monitor the status of the launched pipeline, please select the Analysis > Your Analyses menu.
This will bring you to a page where you can monitor the status of each launched workflow.
Clicking the pipeline name AssemblyAnnotationCollection_… will bring you to a page for that analysis pipeline.
This page will continue to refresh as the pipeline progresses through each stage. It will take a few minutes for the assembly and annotation pipeline to complete.
Viewing the Results
Once the pipeline is complete, you will be given the option to download the results of the analysis. Please click Download to download these results now.
A number of files are provided in the downloaded results. These are described below.
- contigs-with-repeats-combined.fasta.zip: The assembled contigs, after repeats were identified and low coverage/small contigs removed.
- assembly-stats-with-repeats-combined.tsv: A table of assembly statistics for the assembled contigs.
- genome-combined.gbk.zip: The annotated contigs, in GenBank format.
- prokka_stats-combined.txt.zip: The stats output from Prokka.
- prokka-combined.log.zip: The log files from Prokka.
- prokka-combined.err.zip: The error files from Prokka.
- filter-spades-combined.txt.zip: Information on the contigs removed and the filtering parameters used.
- contigs-without-repeats-combined.fasta.zip: The assembled and filtered contigs, minus any repeat regions.
- contigs-all-combined.fasta.zip: All contigs output from SPAdes, without any filtering.
- spades-combined.log.zip: The log files from SPAdes.
- flash-combined.log.zip: The log files from FLASh used to merge paired-end reads.
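To inspect these files outside IRIDA, the Python sketch below unpacks each `.zip` file and computes basic statistics (contig count, total length, N50) from a contigs FASTA file. It assumes the `.zip` files listed above have been placed together in one directory and that each contains plain FASTA or text files (for example one per sample). The helper names are hypothetical, and N50 is computed with its standard definition, which may not match every column IRIDA reports in assembly-stats-with-repeats-combined.tsv.

```python
import zipfile
from pathlib import Path
from typing import Dict, List


def extract_combined_archives(results_dir: Path, out_dir: Path) -> List[Path]:
    """Extract every .zip in results_dir into out_dir/<archive stem>/."""
    extracted = []
    for archive_path in sorted(results_dir.glob("*.zip")):
        target = out_dir / archive_path.stem  # e.g. "contigs-all-combined.fasta"
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target)
        extracted.append(target)
    return extracted


def contig_lengths(fasta_path: Path) -> List[int]:
    """Return the length of each sequence in a FASTA file."""
    lengths: List[int] = []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append(0)  # start a new contig
            elif line and lengths:
                lengths[-1] += len(line)
    return lengths


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Compute contig count, total length, and N50 from contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        # N50: length of the contig at which the cumulative length reaches half the total
        if 2 * running >= total:
            n50 = length
            break
    return {"contigs": len(ordered), "total_length": total, "n50": n50}
```

For contig lengths of 100, 200, 300 and 400 bp, the cumulative length first reaches half of the 1000 bp total at the 300 bp contig, so `assembly_stats([100, 200, 300, 400])` returns an N50 of 300.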
Viewing Provenance Information
To view the pipeline provenance information, please select the Provenance tab.
This will display the individual steps of this pipeline and the parameters used at each step.