Commit 4059c4fa authored by Peter Kruczkiewicz

Update docs for installing updated pipelines

parent 2f57b202
Pipeline #7593 passed with stage in 80 minutes and 58 seconds
@@ -7,31 +7,54 @@ description: "Install guide for the assembly and annotation collection pipeline.
Assembly and Annotation Collection
==================================
This workflow can assemble and annotate multiple genomes in one submission. The results from one submission will be packaged together into a single file. The workflow uses the [shovill] and [Prokka][] software for assembly and annotation of genomes, respectively, as well as [QUAST] for assembly quality assessment. The specific Galaxy tools are listed in the table below.
| Tool Name | Owner | Tool Revision | Toolshed Installable Revision | Toolshed |
|:--------------------------:|:--------:|:--------------:|:-----------------------------:|:--------------------:|
| **bundle_collections** | irida | [7bc329e1ada4] | 0 (2015-05-20) | [IRIDA Toolshed][] |
| **shovill** | iuc | [57d5928f456e] | 1 (2018-03-07) | [Galaxy Main Shed][] |
| **prokka** | crs4 | [eaee459f3d69] | 14 (2018-03-28) | [Galaxy Main Shed][] |
| **quast** | iuc | [0834c823d4b9] | 4 (2018-02-12) | [Galaxy Main Shed][] |
To install these tools please proceed through the following steps.
## Step 1: Galaxy Conda Setup
Galaxy makes use of [Conda][conda] to automatically install some dependencies for this workflow. Please verify that your Galaxy version is >= v16.01 and that Galaxy has been set up to use Conda (by modifying the appropriate configuration settings; see [here][galaxy-config] for additional details). A method to get this workflow working with a Galaxy version < v16.01 is available in [FAQ/Conda dependencies][].
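Both requirements can be checked from the command line before going further. This is a minimal sketch, not part of Galaxy: the helper names are hypothetical, it relies on GNU `sort -V`, and it assumes your Galaxy configuration enables Conda through the `conda_auto_install` and `conda_auto_init` settings (check the configuration documentation for your Galaxy release).

```bash
# Hypothetical helper: succeed if the given Galaxy version is >= 16.01.
# Relies on GNU `sort -V` (version sort).
galaxy_version_ok() {
  local required="16.01" version="$1"
  [ "$(printf '%s\n' "$required" "$version" | sort -V | head -n 1)" = "$required" ]
}

# Hypothetical helper: succeed if the Galaxy config file turns on both
# conda_auto_install and conda_auto_init (assumed setting names; accepts
# "key = True" as in galaxy.ini or "key: true" as in galaxy.yml).
galaxy_conda_enabled() {
  local config="$1" key
  for key in conda_auto_install conda_auto_init; do
    grep -Eiq "^[[:space:]]*${key}[[:space:]]*[=:][[:space:]]*true" "$config" || return 1
  done
}

# Example usage (paths are placeholders for your own install):
# galaxy_version_ok 18.01 && galaxy_conda_enabled galaxy/config/galaxy.ini
```

Commented-out lines such as `#conda_auto_install = True` are deliberately not counted as enabled.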
### Address shovill related issues
#### Error 256 from running `kmc`/`samtools`
After installing `shovill`, you will need to install the correct versions of some `kmc`/`samtools` dependencies into its conda environment:
```bash
# activate the Galaxy shovill conda env
source galaxy/deps/_conda/bin/activate galaxy/deps/_conda/envs/__shovill@0.9.0
# install ncurses and bzip2 from conda-forge channel
conda install -c conda-forge ncurses bzip2
```
#### [PILON] Java/JVM heap allocation issues
[PILON] is a Java application and may require the JVM heap size to be set (e.g. `_JAVA_OPTIONS=-Xmx4g`).
If Galaxy submits [shovill] jobs to a [SLURM] workload manager, it may be necessary to allot about 4G more memory to the SLURM job than to [shovill] through `--ram` (the default is `${SHOVILL_RAM:-4}`, i.e. 4G, as of tool revision [57d5928f456e]). For example, if you give [shovill] 4G, give the SLURM job 8G.
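The rule of thumb above is simple arithmetic. This is a minimal sketch that assumes whole-gigabyte values, a fixed 4G of headroom, and a hypothetical helper name `slurm_mem_gb`:

```bash
# Hypothetical helper: memory (GB) to request from SLURM for a shovill job,
# following the rule of thumb of ~4 GB more than shovill's --ram.
# Without an argument it uses SHOVILL_RAM, defaulting to 4 like ${SHOVILL_RAM:-4}.
slurm_mem_gb() {
  local shovill_ram="${1:-${SHOVILL_RAM:-4}}"
  local headroom_gb=4   # assumed to cover JVM (Pilon) overhead
  echo $(( shovill_ram + headroom_gb ))
}

# Example: an sbatch-style memory request for shovill --ram 4
printf '%s\n' "--mem=$(slurm_mem_gb 4)G"   # prints --mem=8G
```

Where exactly this value goes (for example, the SLURM native specification of a Galaxy job destination) depends on your job configuration.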
One way to adjust the `$SHOVILL_RAM` environment variable is through the [conda environment][]: locate the conda environment containing `shovill` and add scripts under its `etc/conda/activate.d` and `etc/conda/deactivate.d` directories that set the variable on activation and restore it on deactivation.
```bash
# change into the Galaxy-managed conda environment that contains shovill
cd galaxy/deps/_conda/envs/__shovill@0.9.0
mkdir -p etc/conda/activate.d
mkdir -p etc/conda/deactivate.d
# on activation: remember the previous value and set SHOVILL_RAM to 8 (GB)
echo -e "export _OLD_SHOVILL_RAM=\$SHOVILL_RAM\nexport SHOVILL_RAM=8" >> etc/conda/activate.d/shovill-ram.sh
# on deactivation: restore the previous value
echo -e "export SHOVILL_RAM=\$_OLD_SHOVILL_RAM" >> etc/conda/deactivate.d/shovill-ram.sh
```
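If you manage more than one shovill environment (for example, if the environment on your system is not named `__shovill@0.9.0`), the same idea can be wrapped in a reusable function. This is a minimal sketch with a hypothetical helper, `install_shovill_ram_hooks`. It takes the environment directory and a RAM value in GB, overwrites the hook files rather than appending to them, and unsets `SHOVILL_RAM` on deactivation if no value was set before.

```bash
# Hypothetical helper: (re)write conda activation hooks that set SHOVILL_RAM
# (in GB) for one conda environment directory. Overwrites with '>' so that
# running it twice does not duplicate lines.
install_shovill_ram_hooks() {
  local env_dir="$1" ram_gb="${2:-8}"
  mkdir -p "$env_dir/etc/conda/activate.d" "$env_dir/etc/conda/deactivate.d"
  # activate.d: remember any previous value, then set the new one
  cat > "$env_dir/etc/conda/activate.d/shovill-ram.sh" <<EOF
export _OLD_SHOVILL_RAM="\${SHOVILL_RAM:-}"
export SHOVILL_RAM=$ram_gb
EOF
  # deactivate.d: restore the previous value, or unset it if there was none
  cat > "$env_dir/etc/conda/deactivate.d/shovill-ram.sh" <<'EOF'
if [ -n "${_OLD_SHOVILL_RAM:-}" ]; then
  export SHOVILL_RAM="$_OLD_SHOVILL_RAM"
else
  unset SHOVILL_RAM
fi
unset _OLD_SHOVILL_RAM
EOF
}

# Example (path as used above; adjust to your Galaxy install):
# install_shovill_ram_hooks galaxy/deps/_conda/envs/__shovill@0.9.0 8
```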
You could also set `SHOVILL_RAM` dynamically from [GALAXY_MEMORY_MB][], which Galaxy assigns based on your job configuration and resource requirements. For example, set `SHOVILL_RAM=$((GALAXY_MEMORY_MB / 1024))`, which uses shell arithmetic expansion to convert MB to whole GB.
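The `GALAXY_MEMORY_MB` idea can be made more defensive. This is a minimal sketch under three assumptions: `GALAXY_MEMORY_MB` may be unset (in which case it falls back to 8 GB), about 4 GB of the job's memory is held back for the JVM as in the SLURM note, and the result never drops below 1 GB. The helper name and these defaults are hypothetical.

```bash
# Hypothetical helper: choose SHOVILL_RAM (GB) from GALAXY_MEMORY_MB (MB).
# Assumes GALAXY_MEMORY_MB is only set when the job configuration defines a
# memory limit; otherwise falls back to a fixed value.
shovill_ram_from_galaxy() {
  local fallback_gb=8 headroom_gb=4 min_gb=1
  if [ -z "${GALAXY_MEMORY_MB:-}" ]; then
    echo "$fallback_gb"
    return
  fi
  local total_gb=$(( GALAXY_MEMORY_MB / 1024 ))  # integer division: MB -> whole GB
  local ram_gb=$(( total_gb - headroom_gb ))     # leave room for JVM overhead
  if [ "$ram_gb" -lt "$min_gb" ]; then
    ram_gb=$min_gb
  fi
  echo "$ram_gb"
}

# In an activate.d hook (with this function defined in the same file):
#   export SHOVILL_RAM="$(shovill_ram_from_galaxy)"
```

With `GALAXY_MEMORY_MB=8192` (an 8G job), this gives 4. That matches the 4G/8G example above.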
## Step 2: Install Galaxy Tools
Please install all the Galaxy tools in the table above by logging into Galaxy, navigating to **Admin > Search and browse tool sheds**, searching for the appropriate **Tool Name** and installing the appropriate **Toolshed Installable Revision**.
@@ -58,23 +81,29 @@ A Galaxy workflow and some test data has been included with this documentation t
![dataset-pair-screen][]
4. This should have properly paired your data and named the sample **a**. Enter the name of this paired dataset collection at the bottom and click **Create list**.
5. Run the uploaded workflow by clicking on **Workflow**, clicking on the name of the workflow **AssemblyAnnotationCollection-shovill-prokka-paired_reads-v0.4 (imported from uploaded file)** and clicking **Run**. This should automatically fill in the dataset collection. At the very bottom of the screen, click **Run workflow**.
6. If everything was installed correctly, you should see each of the tools run successfully (turn green). On completion, this should look like:
![workflow-success][]
If you see any tool turn red, you can click on the view details icon ![view-details-icon][] for more information.
If everything was successful then all dependencies for this pipeline have been properly installed.
[7bc329e1ada4]: http://irida.corefacility.ca/galaxy-shed/view/irida/bundle_collections/7bc329e1ada4
[57d5928f456e]: https://toolshed.g2.bx.psu.edu/repos/iuc/shovill/rev/57d5928f456e
[eaee459f3d69]: https://toolshed.g2.bx.psu.edu/view/crs4/prokka/eaee459f3d69
[0834c823d4b9]: https://toolshed.g2.bx.psu.edu/view/iuc/quast/0834c823d4b9
[galaxy-config]: ../../setup#step-4-modify-configuration-file
[SLURM]: https://slurm.schedmd.com
[PILON]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4237348/
[SPAdes]: http://bioinf.spbau.ru/spades
[shovill]: https://github.com/tseemann/shovill/
[Prokka]: http://www.vicbioinformatics.com/software.prokka.shtml
[QUAST]: http://quast.sourceforge.net/quast.html
[tbl2asn]: http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/
[Galaxy Main Shed]: http://toolshed.g2.bx.psu.edu/
[IRIDA Toolshed]: https://irida.corefacility.ca/galaxy-shed
[Java]: http://www.oracle.com/technetwork/java/javase/downloads/index.html
[gnuplot]: http://www.gnuplot.info/
[BioPerl]: http://www.bioperl.org/wiki/Main_Page
[Assembly Annotation Galaxy Workflow]: ../test/assembly-annotation-collection/assembly-annotation-collection.ga
[upload-icon]: ../test/snvphyl/images/upload-icon.jpg
[test/reads]: ../test/assembly-annotation/reads
@@ -84,3 +113,8 @@ If everything was successfull then all dependencies for this pipeline have been
[workflow-success]: ../test/assembly-annotation/images/workflow-success.png
[view-details-icon]: ../test/snvphyl/images/view-details-icon.jpg
[FAQ]: ../../../faq/#tbl2asn-out-of-date
[conda]: https://conda.io/docs/intro.html
[bioconda]: https://bioconda.github.io/
[FAQ/Conda dependencies]: ../../../faq#installing-conda-dependencies-in-galaxy-versions--v1601
[conda environment]: https://conda.io/docs/user-guide/tasks/manage-environments.html#saving-environment-variables
[GALAXY_MEMORY_MB]: https://planemo.readthedocs.io/en/latest/writing_advanced.html#developing-for-clusters-galaxy-slots-galaxy-memory-mb-and-galaxy-memory-mb-per-slot
@@ -7,29 +7,53 @@ description: "Install guide for the assembly and annotation pipeline."
Assembly and Annotation
=======================
This workflow uses the [shovill] and [Prokka][] software for assembly and annotation of genomes, respectively, as well as [QUAST] for assembly quality assessment. The specific Galaxy tools are listed in the table below.
| Tool Name | Owner | Tool Revision | Toolshed Installable Revision | Toolshed |
|:--------------------------:|:--------:|:--------------:|:-----------------------------:|:--------------------:|
| **shovill** | iuc | [57d5928f456e] | 1 (2018-03-07) | [Galaxy Main Shed][] |
| **prokka** | crs4 | [eaee459f3d69] | 14 (2018-03-28) | [Galaxy Main Shed][] |
| **quast** | iuc | [0834c823d4b9] | 4 (2018-02-12) | [Galaxy Main Shed][] |
To install these tools please proceed through the following steps.
## Step 1: Galaxy Conda Setup
Galaxy makes use of [Conda][conda] to automatically install some dependencies for this workflow. Please verify that your Galaxy version is >= v16.01 and that Galaxy has been set up to use Conda (by modifying the appropriate configuration settings; see [here][galaxy-config] for additional details). A method to get this workflow working with a Galaxy version < v16.01 is available in [FAQ/Conda dependencies][].
### Address shovill related issues
#### Error 256 from running `kmc`/`samtools`
After installing `shovill`, you will need to install the correct versions of some `kmc`/`samtools` dependencies into its conda environment:
```bash
# activate the Galaxy shovill conda env
source galaxy/deps/_conda/bin/activate galaxy/deps/_conda/envs/__shovill@0.9.0
# install ncurses and bzip2 from conda-forge channel
conda install -c conda-forge ncurses bzip2
```
#### [PILON] Java/JVM heap allocation issues
[PILON] is a Java application and may require the JVM heap size to be set (e.g. `_JAVA_OPTIONS=-Xmx4g`).
If Galaxy submits [shovill] jobs to a [SLURM] workload manager, it may be necessary to allot about 4G more memory to the SLURM job than to [shovill] through `--ram` (the default is `${SHOVILL_RAM:-4}`, i.e. 4G, as of tool revision [57d5928f456e]). For example, if you give [shovill] 4G, give the SLURM job 8G.
One way to adjust the `$SHOVILL_RAM` environment variable is through the [conda environment][]: locate the conda environment containing `shovill` and add scripts under its `etc/conda/activate.d` and `etc/conda/deactivate.d` directories that set the variable on activation and restore it on deactivation.
```bash
# change into the Galaxy-managed conda environment that contains shovill
cd galaxy/deps/_conda/envs/__shovill@0.9.0
mkdir -p etc/conda/activate.d
mkdir -p etc/conda/deactivate.d
# on activation: remember the previous value and set SHOVILL_RAM to 8 (GB)
echo -e "export _OLD_SHOVILL_RAM=\$SHOVILL_RAM\nexport SHOVILL_RAM=8" >> etc/conda/activate.d/shovill-ram.sh
# on deactivation: restore the previous value
echo -e "export SHOVILL_RAM=\$_OLD_SHOVILL_RAM" >> etc/conda/deactivate.d/shovill-ram.sh
```
You could also set `SHOVILL_RAM` dynamically from [GALAXY_MEMORY_MB][], which Galaxy assigns based on your job configuration and resource requirements. For example, set `SHOVILL_RAM=$((GALAXY_MEMORY_MB / 1024))`, which uses shell arithmetic expansion to convert MB to whole GB.
## Step 2: Install Galaxy Tools
Please install all the Galaxy tools in the table above by logging into Galaxy, navigating to **Admin > Search and browse tool sheds**, searching for the appropriate **Tool Name** and installing the appropriate **Toolshed Installable Revision**.
@@ -42,6 +66,7 @@ The install progress can be checked by monitoring the Galaxy log files `galaxy/*
The assembly workflow uses [Prokka][] for genome annotation. Prokka makes use of [tbl2asn][], which is programmed to stop working one year after it was built, so the version of `tbl2asn` installed by default may need to be updated. Please see our [FAQ][] for more details.
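Because the expiry is tied to the build date, checking whether a given `tbl2asn` build is past its one-year window comes down to date arithmetic. This is a minimal sketch with several assumptions: GNU `date` (for `-d`), a build date you already know in `YYYY-MM-DD` form, and one year approximated as 365 days. `tbl2asn_expired` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if a tbl2asn build is more than a year old,
# approximating one year as 365 days. Requires GNU date (-d); dates are YYYY-MM-DD.
# Usage: tbl2asn_expired BUILD_DATE [REFERENCE_DATE]   (reference defaults to today, UTC)
tbl2asn_expired() {
  local build_date="$1" today="${2:-$(date -u +%F)}"
  local build_s today_s age_days
  build_s=$(date -u -d "$build_date" +%s) || return 2   # unparseable date
  today_s=$(date -u -d "$today" +%s) || return 2
  age_days=$(( (today_s - build_s) / 86400 ))
  [ "$age_days" -gt 365 ]
}

# Example: tbl2asn_expired 2017-01-01 2018-03-28 && echo "tbl2asn likely expired"
```

Working in UTC keeps daylight-saving shifts from changing the day count.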
## Step 3: Testing Pipeline
A Galaxy workflow and some test data have been included with this documentation to verify that all tools are installed correctly. To test this pipeline, please proceed through the following steps.
@@ -56,23 +81,28 @@ A Galaxy workflow and some test data has been included with this documentation t
![dataset-pair-screen][]
4. This should have properly paired your data and named the sample **a**. Enter the name of this paired dataset collection at the bottom and click **Create list**.
5. Run the uploaded workflow by clicking on **Workflow**, clicking on the name of the workflow **AssemblyAnnotation-shovill-prokka-quast-paired_reads-v0.5 (imported from uploaded file)** and clicking **Run**. This should automatically fill in the dataset collection. At the very bottom of the screen, click **Run workflow**.
6. If everything was installed correctly, you should see each of the tools run successfully (turn green). On completion, this should look like:
![workflow-success][]
If you see any tool turn red, you can click on the view details icon ![view-details-icon][] for more information.
If everything was successful then all dependencies for this pipeline have been properly installed.
[57d5928f456e]: https://toolshed.g2.bx.psu.edu/repos/iuc/shovill/rev/57d5928f456e
[eaee459f3d69]: https://toolshed.g2.bx.psu.edu/view/crs4/prokka/eaee459f3d69
[0834c823d4b9]: https://toolshed.g2.bx.psu.edu/view/iuc/quast/0834c823d4b9
[galaxy-config]: ../../setup#step-4-modify-configuration-file
[SLURM]: https://slurm.schedmd.com
[PILON]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4237348/
[SPAdes]: http://bioinf.spbau.ru/spades
[shovill]: https://github.com/tseemann/shovill/
[Prokka]: http://www.vicbioinformatics.com/software.prokka.shtml
[QUAST]: http://quast.sourceforge.net/quast.html
[tbl2asn]: http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/
[Galaxy Main Shed]: http://toolshed.g2.bx.psu.edu/
[IRIDA Toolshed]: https://irida.corefacility.ca/galaxy-shed
[Java]: http://www.oracle.com/technetwork/java/javase/downloads/index.html
[gnuplot]: http://www.gnuplot.info/
[BioPerl]: http://www.bioperl.org/wiki/Main_Page
[Assembly Annotation Galaxy Workflow]: ../test/assembly-annotation/assembly-annotation.ga
[upload-icon]: ../test/snvphyl/images/upload-icon.jpg
[test/reads]: ../test/assembly-annotation/reads
@@ -82,3 +112,8 @@ If everything was successfull then all dependencies for this pipeline have been
[workflow-success]: ../test/assembly-annotation/images/workflow-success.png
[view-details-icon]: ../test/snvphyl/images/view-details-icon.jpg
[FAQ]: ../../../faq/#tbl2asn-out-of-date
[conda]: https://conda.io/docs/intro.html
[bioconda]: https://bioconda.github.io/
[FAQ/Conda dependencies]: ../../../faq#installing-conda-dependencies-in-galaxy-versions--v1601
[conda environment]: https://conda.io/docs/user-guide/tasks/manage-environments.html#saving-environment-variables
[GALAXY_MEMORY_MB]: https://planemo.readthedocs.io/en/latest/writing_advanced.html#developing-for-clusters-galaxy-slots-galaxy-memory-mb-and-galaxy-memory-mb-per-slot