Commit 2827b28a authored by Aaron Petkau

Better benchmarking info

parent 89ddf6bd
#!/bin/sh
# Benchmark one SNVPhyl run in Docker, logging disk usage and cgroup memory statistics.
if [ $# -ne 4 ]
then
	echo "Usage: $0 [fastq_dir] [reference] [output_dir] [snvphyl_log]"
	exit 1
fi
fastq_dir=$1
reference=$2
output_dir=$3
snvphyl_log=$4
echo "Disk before docker - $(date)"
du -sh /var/lib/docker/
echo "Start docker"
# `docker run -d` prints the full container ID, used below to locate the container's cgroup directory.
docker_id=$(docker run -d -p 48888:80 -v "$fastq_dir:$fastq_dir" phacnml/snvphyl-galaxy-1.0.1:1.0.1b)
# printf rather than `echo -n`, which is not portable under /bin/sh.
printf "Waiting 60s for docker to start..."
sleep 60
echo "started"
echo "Disk before SNVPhyl - $(date)"
du -sh /var/lib/docker/
echo "Memory before SNVPhyl - $(date)"
cat "/sys/fs/cgroup/memory/docker/$docker_id/memory.stat"
# Run the pipeline in the background so memory can be sampled while it runs.
/home/aaron/software/snvphyl-galaxy-cli/bin/snvphyl.py --galaxy-url http://localhost:48888 --galaxy-api-key admin --fastq-dir "$fastq_dir" --reference-file "$reference" --fastq-files-as-links --output-dir "$output_dir" > "$snvphyl_log" &
snvphyl_pid=$!
# Sample memory every 5 seconds until this particular pipeline process exits.
while kill -0 "$snvphyl_pid" 2>/dev/null
do
	echo "Memory during SNVPhyl - $(date)"
	cat "/sys/fs/cgroup/memory/docker/$docker_id/memory.stat"
	sleep 5
done
echo "Peak memory usage after SNVPhyl - $(date)"
cat "/sys/fs/cgroup/memory/docker/$docker_id/memory.max_usage_in_bytes"
echo "Disk usage after SNVPhyl - $(date)"
du -sh /var/lib/docker/
# Remove the container and its volumes.
docker rm -f -v "$docker_id"
@@ -6,11 +6,14 @@ A number of datasets have been used to benchmark the runtime of SNVPhyl across a
The datasets from the [SNVPhyl manuscript][] were run on a single machine using the Docker version of the pipeline. The following table presents the run times (to go from sequence reads to a phylogeny) and data sizes of each case.
| Case | Number of genomes | Total size of reads (GB) | Runtime (min) |
|:-----------------------:|:-----------------:|:------------------------:|:-------------:|
| Simulated data | 4 | 1.4 | 15.6 |
| SNV density filtering | 11 | 13 | 25.25 |
| *Salmonella* Heidelberg | 59 | 40 | 159 |
| Case | Number of genomes | Total size of reads (GB) | Runtime (hours) | Peak RSS (GB) | Peak Memory (GB) | Temporary Disk Space (GB) |
|:--------------------------:|:-----------------:|:------------------------:|:---------------:|:-------------:|:----------------:|:-------------------------:|
| Docker no data | - | - | - | 0.662 | | 2.4 |
| Simulated data | 4 | 1.4 | 0.261 | 3.04 | 9.90 | 6.8 |
| SNV density filtering | 11 | 13 | 0.439 | 4.18 | 14.1 | 9.6 |
| *Salmonella* Heidelberg | 59 | 40 | 3.04 | 4.07 | 21.4 | 66.6 |
| *Streptococcus pneumoniae* | 189               | 169                      | 8.04            |               |                  |                           |
The **Simulated data** case used a set of simulated reads based on *E. coli* str. Sakai (NC_002695) and two plasmids (NC_002128 and NC_002127). The other two cases were run with real-world data. The **SNV density filtering** case used a set of 11 *Streptococcus pneumoniae* genomes; the runtime presented was recorded with no SNV density filtering applied. The **_Salmonella_ Heidelberg** case used a set of 59 *Salmonella* Heidelberg genomes; the runtime presented corresponds to a minimum coverage threshold of 10X, with all other parameters at their default values.
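
The memory columns can be derived from the benchmarking script's output. The following is a minimal sketch, not necessarily how the figures in the table were produced. It assumes the script's stdout was saved to a log file, that "Peak RSS" is the largest `rss` value among the cgroup v1 `memory.stat` samples, and that 1 GB means 10^9 bytes. The function names `peak_rss_gb` and `bytes_to_gb` are hypothetical.

```shell
#!/bin/sh
# Sketch: summarise memory figures from a saved log of the benchmarking script.
# Assumption: the log holds cgroup v1 memory.stat dumps with lines of the form "rss <bytes>".
# Assumption: GB here means 10^9 bytes.

# peak_rss_gb LOGFILE: print the largest "rss" sample in the log, in GB.
peak_rss_gb() {
    awk '$1 == "rss" && $2 + 0 > max { max = $2 + 0 }
         END { printf "%.2f\n", max / 1e9 }' "$1"
}

# bytes_to_gb BYTES: convert a byte count, such as the value read from
# memory.max_usage_in_bytes, to GB.
bytes_to_gb() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1e9 }'
}
```

For example, `peak_rss_gb benchmark.log` would print the peak sampled RSS, where `benchmark.log` is a placeholder for wherever the script's output was redirected. Because the script samples every 5 seconds, short spikes between samples are not captured; `memory.max_usage_in_bytes` is tracked by the kernel and does not have that limitation.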