Commit 1c4a7908 authored by Thomas Matthews

Initial commit of irida-linker

# NGS Archive Linker
## Overview
The NGS Archive Linker is a Perl script used to generate a structure of links for files stored in the NGS archive. You are able to get links to the files in an entire project, or to specific samples within a project.
## Install
* *Note*: The install script requires cpanm for Perl module installation and pip for Python library installation.

Run the script install/install.pl. The script prompts before each step and can install the required Perl modules to the lib/ directory, install the required Python modules to the python-deps/ directory, and write a configuration file to $HOME/.irida/ngs-archive-linker.conf.
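
Before running the installer it can help to confirm that both prerequisites are on the PATH. The following is a minimal sketch, not part of the repository; the `have_tool` helper is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME resolves to a command on PATH.
# (Hypothetical helper, not part of the linker or its installer.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# cpanm and pip are the two tools the install note names.
for tool in cpanm pip; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If both tools are reported as found, the installer can then be started from the repository root, for example with `perl install/install.pl`.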
## Link Structure
Projects and samples in the NGS Archive are stored with the assumption that a sample resides within a project. To represent this structure on the filesystem, links are generated in the following fashion:

    [output_directory]/[project_name]/[sample_name]/[file_link.fastq]

Example: A project (Project 5) containing multiple samples (Sample 1, Sample 2, Sample 3) and 2 files per sample would be represented as follows:

    output/
        Project 5/
            Sample 1/
                f1_1.fastq
                f1_2.fastq
            Sample 2/
                f2_1.fastq
                f2_2.fastq
            Sample 3/
                f3_1.fastq
                f3_2.fastq

The same output directory can be used for multiple projects. Each new project directory is created directly under the root output directory.
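
The layout described in this section can be reproduced by hand with `mkdir -p` and `ln -s`. The sketch below only illustrates the directory structure; it is not how the linker itself is implemented. The scratch directories and the `link_file` helper are assumptions made for the example, and the real source paths would come from the REST API.

```shell
#!/bin/sh
set -eu

# Scratch directories standing in for the archive and the output directory
# (both hypothetical; real archive paths come from the REST API).
work=$(mktemp -d)
archive="$work/archive"
output="$work/output"
mkdir -p "$archive"

# link_file OUTPUT PROJECT SAMPLE SOURCE
# Creates OUTPUT/PROJECT/SAMPLE/<basename of SOURCE> as a symbolic link.
link_file() {
    dest_dir="$1/$2/$3"
    mkdir -p "$dest_dir"
    ln -s "$4" "$dest_dir/$(basename "$4")"
}

# Three samples with two files each, as in the Project 5 example.
for n in 1 2 3; do
    for r in 1 2; do
        src="$archive/f${n}_${r}.fastq"
        : > "$src"    # empty placeholder for a sequence file
        link_file "$output" "Project 5" "Sample $n" "$src"
    done
done

# List the resulting links relative to the output directory.
(cd "$output" && find . -type l | sort)
```

Because every path is quoted, project and sample names containing spaces (such as "Project 5") are handled without extra escaping.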
## Running NGS Archive Linker
### Arguments
* -p, --projectId [ARG]
> The ID of the project to get data from. (required)
* -o, --output [ARG]
> A directory to output the collection of links. (Default: Current working directory)
* -c, --config [ARG]
> The location of the config file. Not required if the --baseURL option is used. (Default: $HOME/.irida/ngs-archive-linker.conf, /etc/irida/ngs-archive.conf)
* -b, --baseURL [ARG]
> The base URL for the NGS Archive REST API. Overrides config file setting.
* -s, --sample [ARG]
> A sample id to get sequence files for. Not required. Multiple samples may be listed as -s 1 -s 2 -s 3...
* -i, --ignore
> Skip creating links for files that already exist.
* -r, --rename
> Rename existing files with _<number> suffix. Useful for topup runs with similar filenames. NOTE: This option overrides the --ignore option.
* --flat
> Create links or files in a flat directory under the project name rather than in sample directories.
* --username
> The username to use for API requests. Note: if this option is omitted, the script prompts for it at run time.
* --password
> The password to use for API requests. Note: if this option is omitted, the script prompts for it at run time.
* --download
> Option to download files from the REST API instead of softlinking. Note: Files may be quite large. This option is not recommended if you have access to the sequencing filesystem.
* -v, --verbose
> Print verbose messages.
* -h, --help
> Display a help message.
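
The file given to --config is a small INI-style file. The install script in this repository writes it with an [apiurls] section containing a BASEURL key. A minimal sketch, assuming that layout, writes such a file to a scratch location and reads the key back. The `read_baseurl` helper and the temporary path are illustrative only and are not part of the linker.

```shell
#!/bin/sh
set -eu

# Scratch file standing in for $HOME/.irida/ngs-archive-linker.conf.
conf=$(mktemp)

# Same layout the install script writes: an [apiurls] section with BASEURL.
cat > "$conf" <<'EOF'
[apiurls]
BASEURL=http://localhost:8080
EOF

# read_baseurl FILE: print the value of the first BASEURL= line.
# (Hypothetical helper for inspection; the linker parses the file itself.)
read_baseurl() {
    sed -n 's/^BASEURL=//p' "$1" | head -n 1
}

echo "Base URL from config: $(read_baseurl "$conf")"
```

Passing -b or --baseURL on the command line overrides whatever value this file holds.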
### Usage Examples
#### Linking all files in a project
To get links for all files within a project, you only need to provide the project ID to the NGS Archive Linker. The linker requests the list of samples from the REST API to determine which samples it must retrieve.
Example -- Linking all samples for project *4* to directory *files*:

    $ ngsArchiveLinker.pl --baseURL http://irida.ca/api --project 4 --output files
    Enter username: test
    Enter password:
    Listing all samples from project 4
    Created 18 files for 9 samples in files/4

#### Linking selected samples within a project
To get links for particular samples within a project, you must provide the project ID and the sample IDs you would like to get links for.
Example -- Linking samples 44, 45, and 46 for project *4* to directory *files*:

    $ ngsArchiveLinker.pl -b http://irida.ca/api --project 4 --sample 44 --sample 45 --sample 46 --output files
    Enter username: test
    Enter password:
    Reading samples 44,45,46 from project 4
    Created 6 files for 3 samples in files/4

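
When the sample IDs come from a list, the repeated --sample flags can be generated rather than typed out. This is a small sketch; the `sample_args` helper is a hypothetical name, and the command is printed rather than executed because it needs a reachable API.

```shell
#!/bin/sh
# sample_args ID...: print "--sample ID" for each id, separated by spaces.
# (Hypothetical helper, not part of the linker.)
sample_args() {
    out=""
    for id in "$@"; do
        out="${out:+$out }--sample $id"
    done
    printf '%s\n' "$out"
}

# Print the command instead of running it, since it needs a live API.
echo "ngsArchiveLinker.pl -b http://irida.ca/api --project 4 $(sample_args 44 45 46) --output files"
```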
#### Getting new links for an already existing project
To get links for a project that already exists on the filesystem, you can use the **--ignore** option. This will skip over files and samples that have already been linked and only create links for the new samples.
Example -- 7 samples already exist. Retrieve the remaining new samples from project 4:

    $ ngsArchiveLinker.pl -b http://irida.ca/api --project 4 --output files --ignore
    Enter username: test
    Enter password:
    Listing all samples from project 4
    Created 4 files for 9 samples in files/4
    Skipped 14 files as they already exist

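
The skip-if-present behaviour described for --ignore can be pictured as a check before each link is created. The sketch below shows that idea under stated assumptions and is not the linker's actual code; the `link_unless_exists` helper, the file names, and the scratch directories are all made up for the example.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
created=0
skipped=0

# link_unless_exists SOURCE DEST: link SOURCE at DEST unless DEST is already there.
# (Hypothetical helper; it updates the created/skipped counters.)
link_unless_exists() {
    if [ -e "$2" ] || [ -L "$2" ]; then
        skipped=$((skipped + 1))
    else
        ln -s "$1" "$2"
        created=$((created + 1))
    fi
}

# Placeholder sequence files.
for f in a.fastq b.fastq c.fastq; do
    : > "$work/src/$f"
done

# First run: only a.fastq and b.fastq are linked.
link_unless_exists "$work/src/a.fastq" "$work/out/a.fastq"
link_unless_exists "$work/src/b.fastq" "$work/out/b.fastq"

# Second run over all three files: only c.fastq is new.
created=0
skipped=0
for f in a.fastq b.fastq c.fastq; do
    link_unless_exists "$work/src/$f" "$work/out/$f"
done

echo "Created $created files"
echo "Skipped $skipped files as they already exist"
```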
#### Downloading files
Downloading files rather than linking can be achieved by using the **--download** option. Arguments for other usages remain the same.
Example -- Download samples 43 and 51 from project *4* to directory *files*:

    $ ngsArchiveLinker.pl -b http://irida.ca/api --project 4 --sample 43 --sample 51 --output files --download
    Enter username: test
    Enter password:
    Reading samples 43,51 from project 4
    ** GET http://irida.ca/api/projects/4/samples/51/sequenceFiles/32 ==> 200 OK (11s)
    ** GET http://irida.ca/api/projects/4/samples/51/sequenceFiles/37 ==> 200 OK (10s)
    ** GET http://irida.ca/api/projects/4/samples/43/sequenceFiles/31 ==> 200 OK (11s)
    ** GET http://irida.ca/api/projects/4/samples/43/sequenceFiles/43 ==> 200 OK (11s)
    Created 4 files for 2 samples in files/4

Note: Downloading files is not recommended if your computer has access to the NGS Archive filesystem as sequence files can be large.
## Errors
* Error: File files/4/46/f1_1.fastq already exists
> A file that the linker is trying to create already exists on your local filesystem. It must be removed to be re-linked. If you would like to ignore existing files and only link new files, use the **--ignore** option.
* Error: Server returned internal server error. You may have used an incorrect URL for the API.
> The server returned an HTTP 500 status message. This may mean that you mistyped the NGS Archive REST API base URL (-b or --baseURL option). Check the address and try again.
* Error: This user does not have access to the resource at http://irida.ca/api/...
> The user you used in the application doesn't have access to the files in the NGS Archive REST API. Talk to the project manager to see if you can be added to the requested project.
* Error: Requested resource wasn't found at http://irida.ca/api/...
> The sample or project that you requested does not exist in the NGS Archive REST API. Check your options for the project id (-p or --project) and sample id (-s or --sample) and try again.
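
The remedies listed above can be summarised as a lookup from HTTP status to next step. Only the 500 case is tied to a status code in the text. Mapping the access error to 403 and the not-found error to 404 is an assumption made for this sketch, and `explain_status` is a hypothetical helper rather than part of the linker.

```shell
#!/bin/sh
# explain_status CODE: print a troubleshooting hint for an HTTP status code.
# Only 500 is named in the error list; 403 and 404 are assumed here.
explain_status() {
    case "$1" in
        500) echo "Check the API base URL (-b or --baseURL)" ;;
        403) echo "Ask the project manager for access to the project" ;;
        404) echo "Check the project and sample ids" ;;
        *)   echo "No hint for status $1" ;;
    esac
}

for code in 500 403 404; do
    printf '%s: %s\n' "$code" "$(explain_status "$code")"
done
```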
#!/usr/bin/perl
use FindBin;
use strict;
use warnings;
use constant DEFAULT_REST_URL=>"http://ngs-archive.corefacility.ca/irida-api";
use constant DEFAULT_GALAXY_URL=>"http://galaxy.corefacility.ca/";
use constant DEFAULT_GALAXY_APIKEY=>"None";
# Install Perl packages
my @requiredPackages = ("LWP::UserAgent","LWP::Simple","MIME::Base64","JSON",
    "Getopt::Long","Pod::Usage","File::Path","File::Basename",
    "Term::ReadKey","HTTP::Status","Config::Simple","OAuth::Lite2::Client::UsernameAndPassword");

my $binLoc = $FindBin::Bin;
my $defaultLib = "$binLoc/../lib/";
my $libDir = textOption("Perl library location?",$defaultLib);

if(!-d $libDir){
    die "Library path $libDir is not a valid directory!";
}

print "Using $libDir as library\n";

foreach my $pack(@requiredPackages){
    # Try to load each required package; offer to install it if loading fails
    eval("use $pack");
    if($@){
        my $ret = option("Package $pack is not installed. Would you like to try to install it using cpanm?","y");
        if($ret eq "y"){
            my $cmd = "cpanm -L $libDir $pack";
            print "Running command: $cmd\n";
            system($cmd) == 0 or warn "Command failed: $cmd\n";
        }
    }
}
## Install Python libs
my $pythonDepsDir = "$binLoc/../python-deps/";
my $ret = option("Install required python libs into $pythonDepsDir?", "y");
if($ret eq "y"){
    print "Installing required python libs into $pythonDepsDir\n";
    my $cmd = "export PYTHONUSERBASE=$pythonDepsDir; pip install --user -r $binLoc/requirements.txt";
    print "Running command: $cmd\n";
    system($cmd);
}
## Install config file
my $iridaDir = $ENV{HOME}."/.irida/";
my $confFile = $iridaDir."ngs-archive-linker.conf";
$ret = option("Install config file to $confFile?","y");
if($ret eq "y"){
    if(!-d $iridaDir){
        print "Creating directory $iridaDir\n";
        mkdir($iridaDir) or die "Couldn't create directory $iridaDir: $!";
    }

    open(my $out, ">", $confFile) or die "Couldn't open file $confFile: $!";

    my $ngsloc = textOption("REST API location?",DEFAULT_REST_URL);
    print "Setting base URL as $ngsloc in $confFile\n";
    print $out "[apiurls]\n";
    print $out "BASEURL=$ngsloc\n";

    my $galaxyurl = textOption("Galaxy URL location?",DEFAULT_GALAXY_URL);
    print "Setting Galaxy URL as $galaxyurl in $confFile\n";
    print $out "GALAXYURL=$galaxyurl\n";

    my $galaxyapikey = textOption("Galaxy api key?",DEFAULT_GALAXY_APIKEY);
    print "Setting Galaxy API key as $galaxyapikey in $confFile\n";
    print $out "[credentials]\n";
    # Write a commented-out placeholder when no API key was given
    if($galaxyapikey eq 'None'){
        print $out "#GALAXYAPIKEY=VALUE\n";
    }
    else{
        print $out "GALAXYAPIKEY=$galaxyapikey\n";
    }
    close $out;
}
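# Prompt for a free-text value; return the default when the user enters nothing
sub textOption{
    my $message = shift;
    my $default = shift;

    print "$message [$default] ";
    # Read from STDIN explicitly; a bare <> would read from files named in @ARGV
    my $input = <STDIN>;
    $input = "" unless defined $input;
    chomp $input;

    my $return = $default;
    if($input ne ""){
        $return = $input;
    }
    return $return;
}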
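# Prompt for a yes/no answer; repeat until the user enters y, n, or nothing (default)
sub option{
    my $msg = shift;
    my $default = shift;

    if(lc($default) eq "y"){
        $msg .= " [Y,n] ";
    }
    else{
        $msg .= " [y,N] ";
    }
    print $msg;

    my $input;
    my $valid = 0;
    do{
        # Read from STDIN explicitly; a bare <> would read from files named in @ARGV
        $input = <STDIN>;
        $input = "" unless defined $input;
        chomp $input;
        $input = lc($input);
        if($input eq ""){
            $input = lc($default);
        }
        if($input eq "y" or $input eq "n"){
            $valid = 1;
        }
        else{
            print "Invalid entry: $input\n";
        }
    }
    while(!$valid);

    return $input;
}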
[apiurls]
BASEURL=http://localhost:8080
# Ignore everything in this directory
*
# Except this file
!.gitignore
.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.16)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. nr % 0
. rr F
.\}
.el \{\
. de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "NGSARCHIVELINKER 1"
.TH NGSARCHIVELINKER 1 "2013-11-01" "perl v5.14.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
ngsArchiveLinker.pl \- Get links for files stored in the NGS archive
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
ngsArchiveLinker.pl \-b <\s-1API\s0 \s-1URL\s0> \-p <projectId> \-o <outputDirectory> [\-s <sampleId> ...]
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
\&\fBngsArchiveLinker.pl\fR allows users to work with their files from the \s-1NGS\s0 Archive without having to copy the large sequencing files to their machines. It is a Perl script used to generate a structure of links for files stored in the \s-1NGS\s0 archive. You are able to get links to the files in an entire project, or to specific samples within a project.
.SH "OPTIONS"
.IX Header "OPTIONS"
.IP "\fB\-p, \-\-projectId [\s-1ARG\s0]\fR" 8
.IX Item "-p, --projectId [ARG]"
The \s-1ID\s0 of the project to get data from. (required)
.IP "\fB\-o, \-\-output [\s-1ARG\s0]\fR" 8
.IX Item "-o, --output [ARG]"
A directory to output the collection of links. (required)
.IP "\fB\-c, \-\-config [\s-1ARG\s0]\fR" 8
.IX Item "-c, --config [ARG]"
The location of the config file. Not required if \-\-baseURL option is used. (Default: \f(CW$HOME\fR/.irida/ngs\-archive\-linker.conf, /etc/irida/ngs\-archive\-linker.conf)
.IP "\fB\-b, \-\-baseURL [\s-1ARG\s0]\fR" 8
.IX Item "-b, --baseURL [ARG]"
The base \s-1URL\s0 for the \s-1NGS\s0 Archive \s-1REST\s0 \s-1API\s0. Overrides config file setting.
.IP "\fB\-s, \-\-sample [\s-1ARG\s0]\fR" 8
.IX Item "-s, --sample [ARG]"
A sample id to get sequence files for. Not required. Multiple samples may be listed as \-s 1 \-s 2 \-s 3...
.IP "\fB\-i, \-\-ignore\fR" 8
.IX Item "-i, --ignore"
Ignore creating links for files that already exist.
.IP "\fB\-r, \-\-rename\fR" 8
.IX Item "-r, --rename"
Rename existing files with _# suffix. Useful for topup runs with similar filenames. \s-1NOTE:\s0 This option overrides the \-\-ignore option.
.IP "\fB\-\-username\fR" 8
.IX Item "--username"
The username to use for \s-1API\s0 requests.
Note: if this option is not entered it will be requested during running of the script.
.IP "\fB\-\-password\fR" 8
.IX Item "--password"
The password to use for \s-1API\s0 requests.
Note: if this option is not entered it will be requested during running of the script.
.IP "\fB\-\-download\fR" 8
.IX Item "--download"
Option to download files from the \s-1REST\s0 \s-1API\s0 instead of softlinking. Note: Files may be quite large. This option is not recommended if you have access to the sequencing filesystem.
.IP "\fB\-v, \-\-verbose\fR" 8
.IX Item "-v, --verbose"
Print verbose messages.
.IP "\fB\-h, \-\-help\fR" 8
.IX Item "-h, --help"
Display a help message.