Commit 342d1ae0 authored by Aaron Petkau

Started adding additional benchmarking details

parent 6dfca86f
@@ -2,8 +2,32 @@
From <>.
Reference genome <>.
190 Illumina reads in `1-pneumo.tsv`.
189 Illumina reads in `1-pneumo.tsv`.
perl fastq-1-pneumo 12 < 1-pneumo.tsv
# Rename _3.fastq to _2.fastq
prename 's/_3\.fastq$/_2.fastq/' *_3.fastq
# Estimate coverages
(for i in fastq-1-pneumo/*_1.fastq; do
    name=$(basename "$i" _1.fastq)
    # Sequence lines are every 4th line starting at line 2; count their bases
    forward=$(sed -n 2~4p "fastq-1-pneumo/${name}_1.fastq" | tr -d '\n' | wc -c)
    reverse=$(sed -n 2~4p "fastq-1-pneumo/${name}_2.fastq" | tr -d '\n' | wc -c)
    # Reference length in bp
    ref=$(bp_seq_length references/CP002176.fasta | cut -d ' ' -f 2 | tr -d '\n')
    # Mean coverage = total bases sequenced / reference length
    cov=$(echo "($forward+$reverse)/$ref" | bc -l)
    echo -e "$name\t$forward\t$reverse\t$ref\t$cov"
done) | sort -k 5,5n | tee coverages.txt
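The coverage estimate above is the total number of sequenced bases (forward plus reverse) divided by the reference length. A minimal sketch of the same arithmetic as reusable functions follows. The names `count_bases` and `estimate_coverage` are hypothetical and not part of SNVPhyl, and the reference length is passed in directly instead of being read with `bp_seq_length`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the sequence bases in a FASTQ file.
# Assumes standard 4-line FASTQ records (sequence on line 2 of each record).
count_bases() {
    awk 'NR % 4 == 2 { total += length($0) } END { print total + 0 }' "$1"
}

# Hypothetical helper: mean coverage for one paired-end sample.
# $1 = forward FASTQ, $2 = reverse FASTQ, $3 = reference length in bp
estimate_coverage() {
    local forward reverse
    forward=$(count_bases "$1")
    reverse=$(count_bases "$2")
    awk -v f="$forward" -v r="$reverse" -v g="$3" \
        'BEGIN { printf "%.2f\n", (f + r) / g }'
}
```

For example, `estimate_coverage sample_1.fastq sample_2.fastq "$ref"` prints the coverage rounded to two decimals, where `sample_1.fastq`, `sample_2.fastq`, and `$ref` stand in for a real read pair and reference length.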
# Run SNVPhyl
## Case 2 SNVs in 500 bp
### Docker --deploy-docker --fastq-dir fastq-1-pneumo/ --reference-file references/CP002176.fasta --output-dir output-1-pneumo-docker-3 --min-coverage 20
### Cluster --galaxy-url [URL] --galaxy-api-key [KEY] --fastq-history-name fastq-1-pneumo --reference-file references/CP002176.fasta --output-dir output-1-pneumo-waffles-no-filter --min-coverage 20
## Case no filtering
### Cluster --galaxy-url [URL] --galaxy-api-key [KEY] --fastq-history-name fastq-1-pneumo --reference-file references/CP002176.fasta --output-dir output-1-pneumo-waffles-no-filter --min-coverage 20 --filter-density-window 5 --filter-density-threshold 10
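To illustrate what a window/threshold pair such as "2 SNVs in 500 bp" means, here is a rough sketch of a density check over sorted SNV positions. This is an assumption about the general idea, not SNVPhyl's actual implementation: the function name `dense_positions` is hypothetical, a single chromosome is assumed, and "within the window" is taken to mean a span strictly smaller than the window size.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print SNV positions that fall in high-density regions.
# Reads one integer position per line on stdin (single chromosome assumed).
# $1 = window size in bp, $2 = SNV count threshold
dense_positions() {
    sort -n | awk -v window="$1" -v threshold="$2" '
        { pos[NR] = $1 }
        END {
            # Check every run of `threshold` consecutive SNVs
            for (i = 1; i + threshold - 1 <= NR; i++) {
                j = i + threshold - 1
                # Flag the whole run if it spans less than the window
                if (pos[j] - pos[i] < window)
                    for (k = i; k <= j; k++) dense[k] = 1
            }
            for (i = 1; i <= NR; i++)
                if (i in dense) print pos[i]
        }'
}
```

Under this sketch's definition, `printf '100\n300\n1000\n' | dense_positions 500 2` flags positions 100 and 300 but not 1000. A window of 5 with a threshold of 10 can never flag anything, because at most 5 distinct positions fit in 5 bp, which is consistent with those settings being used for the no-filtering case.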
# Swap 'Reference' for '670-6B' for comparison with microreact tree
sed -i -e 's/reference/670-6B /' [tree]
@@ -14,8 +14,23 @@ The datasets from the [SNVPhyl manuscript][] were run on a single machine using
The **Simulated data** case was run using a set of simulated reads through SNVPhyl, based on *E. coli* str. Sakai (NC_002695) and two plasmids (NC_002128 and NC_002127). The other two cases were run with real-world data. The **SNV density filtering** case was run using a set of 11 *Streptococcus pneumoniae* genomes through SNVPhyl; the runtime presented was recorded with no SNV density filtering applied. The **_Salmonella_ Heidelberg** case was run using a set of 59 *Salmonella* Heidelberg genomes; the runtime presented corresponds to a minimum coverage threshold of 10X with all other parameters kept at their default values.
The machine used to run each of these cases had an Intel Xeon CPU @ 3.33 GHz, 16 cores, and 24 GB of memory. More details on the methods can be found in the [SNVPhyl manuscript][] or in the [snvphyl-validations][] GitHub project.
## 189 *Streptococcus pneumoniae* genomes
For this scenario, we ran 189 *Streptococcus pneumoniae* genomes, published in <>, under a number of different parameter settings. We also provide a comparison of the SNVPhyl-produced phylogenetic tree with the tree available on [Microreact][] for this same dataset <>.
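| Case | SNVs used for phylogenetic tree | Runtime with Docker (hours) | Runtime on cluster (hours) | Phylogenetic tree comparison |
|------|---------------------------------|-----------------------------|----------------------------|------------------------------|
| SNVPhyl (2 SNVs in 500 bp) | 800 | 8.04 | 2.33 | <> |
| SNVPhyl (no density filtering) | 20,185 | | 5.09 | <> |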
We also extracted the SNVs identified by SNVPhyl from the `snvTable.tsv` file into an alignment containing both polymorphic and monomorphic SNVs, which was then run through Gubbins (with default parameters). We note that this gives the phylogenetic trees that most closely match the tree available on Microreact.
| Case | Gubbins runtime (minutes) | Phylogenetic tree comparison |
|------|---------------------------|------------------------------|
| SNVs without density filtering | | <> |
[docker version of SNVPhyl]: ../install/docker
[SNVPhyl manuscript]: