......@@ -10,5 +10,6 @@ build/
.virtualenv
.cache
.idea/
site/
.python-version
.pytest_cache/
1.7.0 to 2.0.0
==============
* The UI is completely rewritten to better show what's happened with samples as they're uploaded, and to facilitate quality control implementation later.
* Add an experimental auto-upload feature to automatically upload new runs as the sequencer finishes a run (monitoring run directories for `CompletedJobInfo.xml`).
* Use the sample name and ID columns correctly in `SampleSheet.csv`: the sample name column is used for naming `.fastq.gz` files, the sample ID column is used for sending data to IRIDA.
* Discard the cache of projects whenever sending data to the server.
1.5.0 to 1.6.0
==============
* Add an about dialog to show the version number in the UI.
* Changed the way that update notifications are shown: don't have an open in browser button, just create a link.
* Changed the installer to uninstall the previous version before installing the new version (user settings are preserved).
* Use the `pubsub` built into wxPython rather than pulling in an external version.
1.4.0 to 1.5.0
==============
* Changed the `Makefile` so that the uploader can be hacked on in other Linux distros, specifically Arch (thanks to @eric.enns)
* Fixed a UI issue where upload labels were overlapping in some cases (thanks to @eric.enns)
* Fixed an issue where the uploader found files that had `SampleSheet.csv` *anywhere* in the filename (i.e., `old_SampleSheet.csv`), now it matches **exactly** `SampleSheet.csv`.
* Changed the label `Upload speed` to `Average upload speed` to more accurately reflect the value we're reporting.
* Added support for resuming uploads on failure. If the uploader fails when uploading a run, it will now skip any files that were already uploaded.
* Added better error reporting when the uploader can't find the files for the sample by including the directory to help find issues.
1.4.0 to 1.4.1
==============
* Fix a performance regression when scanning directories for fastq files (thanks to @eric.enns)
1.3.0 to 1.4.0
==============
* Use [pytest](https://www.pytest.org) for testing instead of the custom testing framework built around `unittest`.
* Add a feature to check the GitHub repository for new releases.
* Add a File > Exit menu item for familiarity.
* Use a wx widget for directory selection instead of building our own.
* Sample sheet parsing happens with a state machine instead of guessing sections based on number of columns.
* Extensive refactoring to move application functionality out of the GUI layer.
1.2.1 to 1.3.0
==============
* Added caching for some responses from the server so that we don't have to do a round-trip for every sample.
* Added default directory setting to settings panel.
* Limiting depth of directory scanning to 2 levels.
1.2.0 to 1.2.1
==============
* Fixed a bug where empty fields in the header metadata section were being incorrectly parsed as read lengths.
1.1.0 to 1.2.0
==============
* Simplify searching for `SampleSheet.csv` using `os.walk`. Fixes a bug where Windows 7 junction points were being followed, and the OS was reporting permission denied errors on the junction points. This came from auto-scanning the default directory, and the default directory defaulting to the user's home directory.
* Add a license!
* Fix an issue where samples that have a common prefix would upload the wrong files for each sample (i.e., `Sample1`, `Sample11`, `Sample111` would all upload files for `Sample111`).
1.0.1 to 1.1.0
==============
* Fixed some bugs with trailing Windows newlines and trailing commas that get added to header information by Excel.
* Fixed a bug with the settings dialog where when focus was lost on a field the display would get messed up.
* Added a feature to automatically scan the default directory when the app starts up.
* Upgraded to wxWidgets 3 series, and fixed a bug where selecting directories included the drive label.
1.0.0 to 1.0.1
==============
* Changed the installer to use `pynsist` instead of by-hand construction in Windows with NSIS.
......@@ -26,4 +26,9 @@ test: clean requirements
source .virtualenv/bin/activate
xvfb-run --auto-servernum --server-num=1 py.test --integration --irida-version=$(IRIDA_VERSION)
docs: requirements
source .virtualenv/bin/activate
mkdocs build
deactivate
.ONESHELL:
......@@ -2,72 +2,50 @@ IRIDA Uploader
==============
Windows Installation
Download / Installation
--------------------
Download the installer from https://github.com/phac-nml/irida-miseq-uploader/releases
Installation instructions can be found in our documentation.
Running in Linux
----------------
[ReadTheDocs](todo:link to read the docs needs to go here)
Install pip and wxpython:
$ sudo apt-get install python-pip python-wxgtk3.0
### virtualenv usage
Install virtualenv and setuptools
Creating the Windows installer from source code
------------------------------
$ pip install virtualenv
$ pip install setuptools
A new Windows installer can be built on Linux, so first see the instructions for installing on Linux in our documentation.
If you already have these packages installed, ensure they are up to date
You will also need `nsis` installed to create the windows installer.
$ pip install virtualenv -U
$ pip install setuptools -U
$ sudo apt install nsis
Build a virtualenv and install the dependencies:
Then run the command:
$ git clone https://github.com/phac-nml/irida-miseq-uploader
$ cd irida-miseq-uploader
$ make requirements
$ source .virtualenv/bin/activate
$ make windows
This will create a new installer in the folder `build/nsis/` with a name similar to `IRIDA_Uploader_1.0.exe`
You can then run the uploader by running:
Running Tests
-------------
$ ./run_IRIDA_Uploader.py
You can verify PEP8 conformity by running:
Deactivate when finished:
$ ./scripts/verifyPEP8.sh
$ deactivate
Note: No output is produced (other than `pip`-related output) if the PEP8 verification succeeds.
Creating the Windows installer
Documentation
------------------------------
Documentation is built by `mkdocs`.
### Requirements
You must install several packages to build the Windows installer:
sudo apt-get install innoextract nsis python-pip python-virtualenv
It can be built locally with:
### Building the Windows installer
$ make docs
From inside the `irida-miseq-uploader` directory, you can simply run:
Or you can install mkdocs to your system:
make windows
$ sudo apt install mkdocs
$ mkdocs build
This will build a Windows installer inside the `build/nsis/` directory, named something like `IRIDA_Uploader_1.0.0.exe`.
HTML docs will be generated to `site/` for local browsing
Running Tests
-------------
You can run all tests (unit and integration) by running:
$ echo "grant all privileges on irida_uploader_test.* to 'test'@'localhost' identified by 'test';" | mysql -u mysql_user -p
$ make test
You can verify PEP8 conformity by running:
$ ./scripts/verifyPEP8.sh
Note: No output is produced (other than `pip`-related output) if the PEP8 verification succeeds.
TODO: Alternatively ReadTheDocs here. (once we have it hosted)
......@@ -735,8 +735,7 @@ class ApiCalls(object):
def get_seq_runs(self):
"""
Get list of pair files SequencingRuns
/api/sequencingRuns returns all SequencingRuns so this method
Get list of all SequencingRun objects
return list of SequencingRuns
"""
......@@ -840,7 +839,7 @@ class ApiCalls(object):
def project_exists(self, project_id):
"""
Check if a sample exists on a project
Check if a project exists
:param project_id: project that we are checking for existence
:return: True or False
......
## Configuration
To use the IRIDA uploader, you will need to create a client in IRIDA for the uploader.
Please refer to the ["Configure the Uploader" section on the IRIDA tutorials](https://irida.corefacility.ca/documentation/user/tutorials/uploader-tool/) for more information.
###File Location
You can create the config file yourself, or simply run the uploader for the first time to create a new config file with empty/default values.
You can find this file:
Linux: `~/.config/irida-uploader/config.conf`
Windows: `C:\Users\<Username>\AppData\Local\irida-uploader\config.conf`
###Options
The config file has the following fields:
* `client_id` : The id from the IRIDA client you created
* `client_secret` : The secret from the IRIDA client you created
* `username` : The user that will be accessing projects/samples. This user needs the `Sequencer` or `Administrator` role.
* `password` : Corresponding password for above user.
* `base_url` : The server URL is the location that the uploader should upload data to. If you navigate to your instance of IRIDA in your web browser, the URL (after you’ve logged in) will often look like: `https://irida.corefacility.ca/irida/`. The URL you should enter into the Server URL field is that URL, with `api/` at the end. So in the case of `https://irida.corefacility.ca/irida/`, you should enter the URL `https://irida.corefacility.ca/irida/api/`
* `parser` : Pick the parser that matches the file structure of your sequence files. We currently support [miseq](parsers/miseq.md) and [directory](parsers/directory.md).
###Example
```
[Settings]
client_id = uploader
client_secret = ZK1z6H165y4IZF2ckqNQES315OyKQU8CsrpHNdQr16
username = admin
password = password1
base_url = http://localhost:8080/irida-latest/api/
parser = miseq
```
This can also be found in the file `example_config.conf`
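As a rough illustration of how a file in this format can be read, here is a minimal sketch using Python's standard `configparser` module. The `load_settings` helper, the `REQUIRED_FIELDS` tuple and the placeholder secret are assumptions made for this example and are not part of the uploader's code.

```python
import configparser

# Example config text in the same format as the file shown above.
# The secret is a placeholder, not a real credential.
EXAMPLE_CONFIG = """
[Settings]
client_id = uploader
client_secret = example-secret
username = admin
password = password1
base_url = http://localhost:8080/irida-latest/api/
parser = miseq
"""

# Assumption: every documented field is treated as required in this sketch.
REQUIRED_FIELDS = ("client_id", "client_secret", "username", "password", "base_url", "parser")


def load_settings(config_text):
    """Parse config text and return the [Settings] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    settings = dict(parser["Settings"])
    # Report every empty or missing field at once instead of failing on the first.
    missing = [field for field in REQUIRED_FIELDS if not settings.get(field)]
    if missing:
        raise ValueError("Missing config fields: " + ", ".join(missing))
    return settings


settings = load_settings(EXAMPLE_CONFIG)
print(settings["base_url"])
```

Checking for empty values up front gives a clearer message than a failed connection attempt later on.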
## Specify other config file
Alternatively, you can pass a config file to the command line uploader as an optional argument.
Use `-c` or `--config` and specify the path to your config file.
Example:
```
# Linux
$ ./irida-uploader.sh --config /path/to/config.conf /path/to/the/sequencing/run/
# Windows
C:\Users\username> iridauploader --config \path\to\config.conf \path\to\my\samples\
```
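For readers who want to script around the command line, the option handling described above could be modelled roughly as follows with the standard `argparse` module. This is an illustrative sketch only: `build_argument_parser` is a hypothetical helper, and the uploader's real argument handling may differ.

```python
import argparse


def build_argument_parser():
    """Build a parser with one positional directory and an optional config path."""
    parser = argparse.ArgumentParser(description="Upload a sequencing run (illustrative sketch).")
    parser.add_argument("directory", help="Path to the sequencing run directory")
    parser.add_argument("-c", "--config", default=None, help="Path to an alternative config file")
    return parser


args = build_argument_parser().parse_args(
    ["--config", "/path/to/config.conf", "/path/to/the/sequencing/run/"]
)
print(args.config, args.directory)
```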
### Linux:
Use the `irida-uploader.sh` script included with the source code to upload.
# The `api` Module
The `api` module is essentially a python wrapper for the [IRIDA REST API](https://irida.corefacility.ca/documentation/developer/rest/).
It is used in the IRIDA uploader to handle interaction between the uploader logic and IRIDA, but the module can be used on its own to interact with IRIDA.
## Setup
The module can be used as follows:
```python
# import the module
import api
# Create an api instance by initializing an ApiCalls object
api_instance = api.ApiCalls(client_id, client_secret, base_url, username, password)
```
For more information on the arguments passed to `ApiCalls`, please see the [configuration documentation](../configuration.md)
## Use
### Getting Data from IRIDA
#### get_projects(self)
API call to api/projects to get list of projects
**returns:**
List containing projects. Each project is a Project object.
#### get_samples(self, project_id)
API call to api/projects/project_id/samples
**arguments:**
project_id -- project identifier from irida
**returns:**
list of samples for the given project.
Each sample is a Sample object.
#### get_sequence_files(self, project_id, sample_name)
API call to api/projects/project_id/sample_id/sequenceFiles
We fetch the sample file through the project id on this route
**arguments:**
sample_name -- the sample id to get from irida, relative to a project
project_id -- the id of the project the sample is on
**returns:**
list of sequence file dictionaries for the given sample
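
The two listing calls above can be combined to walk every project and its samples. The sketch below assumes that `Project` objects expose an `id` attribute and `Sample` objects expose a `name` attribute, as described in the object model documentation. `list_sample_names` and `FakeApi` are hypothetical names used only so the example runs without an IRIDA server.

```python
from types import SimpleNamespace


def list_sample_names(api_instance):
    """Return {project id: [sample names]} for every project the API returns."""
    names_by_project = {}
    for project in api_instance.get_projects():
        samples = api_instance.get_samples(project.id)
        names_by_project[project.id] = [sample.name for sample in samples]
    return names_by_project


class FakeApi:
    """Stand-in for api.ApiCalls that returns canned data."""

    def get_projects(self):
        return [SimpleNamespace(id=1), SimpleNamespace(id=2)]

    def get_samples(self, project_id):
        return [SimpleNamespace(name="sample%d" % project_id)]


print(list_sample_names(FakeApi()))
```

With a real `ApiCalls` instance, pass it in place of `FakeApi()`.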
### Sending Data to IRIDA
#### send_project(self, project, clear_cache=True)
post request to send a project to IRIDA via API
the project being sent requires a name that is at least
5 characters long
**arguments:**
project -- a Project object to be sent.
**returns:**
A dictionary containing the result of post request.
when post is successful the dictionary it returns will contain the same
name and projectDescription that was originally sent as well as
additional keys like createdDate and identifier.
when post fails then an error will be raised so return statement is
not even reached.
#### send_sample(self, sample, project_id)
Post request to send a sample to a project
**arguments:**
sample -- Sample object to send
project_id -- id of project to send sample to
**returns:**
Unmodified json response from server
#### send_sequence_files(self, sequence_file, sample_name, project_id, upload_id)
Post request to send sequence files found in given sample argument
raises error if either project ID or sample ID found in Sample object
doesn't exist in irida
**arguments:**
sample -- Sample object
upload_id -- the run to upload the files to
**returns:**
unmodified json response from server.
### Getting / Creating / Modifying Sequencing Runs
#### get_seq_runs(self)
Get list of all SequencingRun objects
**returns:**
list of SequencingRuns
#### create_seq_run(self, metadata)
Create a sequencing run.
uploadStatus "UPLOADING"
There are some parsed metadata keys from the SampleSheet.csv that are
currently not accepted/used by the API so they are discarded.
Everything not in the acceptable_properties list below is discarded.
**arguments:**
metadata -- SequencingRun's metadata
**returns:**
the sequencing run identifier for the sequencing run that was created
#### set_seq_run_complete(self, identifier)
Update a sequencing run's upload status to "COMPLETE"
**arguments:**
identifier -- the id of the sequencing run to be updated
**returns:**
unmodified json response from server
#### set_seq_run_uploading(self, identifier)
Update a sequencing run's upload status to "UPLOADING"
**arguments:**
identifier -- the id of the sequencing run to be updated
**returns:**
unmodified json response from server
#### set_seq_run_error(self, identifier)
Update a sequencing run's upload status to "ERROR"
**arguments:**
identifier -- the id of the sequencing run to be updated
**returns:**
unmodified json response from server
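
Taken together, the sequencing run calls suggest a simple lifecycle: create the run (which starts in the "UPLOADING" state), send the files, then mark the run "COMPLETE", or "ERROR" if anything fails. The following is a minimal sketch of that flow under those assumptions. `upload_run` and `RecordingApi` are hypothetical names, and the uploader's own control flow may differ.

```python
def upload_run(api_instance, metadata, files_to_send):
    """Create a run, send each file, and record the final status.

    files_to_send is an iterable of (sequence_file, sample_name, project_id) tuples.
    """
    run_id = api_instance.create_seq_run(metadata)
    try:
        for sequence_file, sample_name, project_id in files_to_send:
            api_instance.send_sequence_files(sequence_file, sample_name, project_id, run_id)
    except Exception:
        # Mark the run as failed, then let the caller see the original error.
        api_instance.set_seq_run_error(run_id)
        raise
    api_instance.set_seq_run_complete(run_id)
    return run_id


class RecordingApi:
    """Stand-in for api.ApiCalls that records which calls were made."""

    def __init__(self, fail=False):
        self.calls = []
        self.fail = fail

    def create_seq_run(self, metadata):
        self.calls.append("create")
        return 7

    def send_sequence_files(self, sequence_file, sample_name, project_id, upload_id):
        if self.fail:
            raise IOError("simulated network failure")
        self.calls.append("send:" + sample_name)

    def set_seq_run_complete(self, identifier):
        self.calls.append("complete")

    def set_seq_run_error(self, identifier):
        self.calls.append("error")


ok_api = RecordingApi()
upload_run(ok_api, {"layoutType": "PAIRED_END"}, [(object(), "sample1", 1)])
print(ok_api.calls)
```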
### Querying IRIDA
#### project_exists(self, project_id)
Check if a project exists
**arguments:**
project_id -- project that we are checking for existence
**returns:**
True or False
#### sample_exists(self, sample_name, project_id)
Check if a sample exists on a project
**arguments:**
sample_name -- sample to confirm existence of
project_id -- project that we think the sample is on
**returns:**
True or False
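
One way to use the two existence checks is to guard sample creation, so a sample is only sent when its project exists and the sample is not already there. This is a hedged sketch: `ensure_sample` and `FakeApi` are hypothetical names, and the `name` attribute on the sample follows the object model documentation.

```python
from types import SimpleNamespace


def ensure_sample(api_instance, sample, project_id):
    """Send the sample only if it is missing. Returns True when it was created."""
    if not api_instance.project_exists(project_id):
        raise ValueError("Project %s does not exist on IRIDA" % project_id)
    if api_instance.sample_exists(sample.name, project_id):
        return False
    api_instance.send_sample(sample, project_id)
    return True


class FakeApi:
    """Stand-in for api.ApiCalls backed by in-memory sets."""

    def __init__(self, projects, samples):
        self.projects = set(projects)
        self.samples = set(samples)  # (sample_name, project_id) pairs

    def project_exists(self, project_id):
        return project_id in self.projects

    def sample_exists(self, sample_name, project_id):
        return (sample_name, project_id) in self.samples

    def send_sample(self, sample, project_id):
        self.samples.add((sample.name, project_id))


fake_api = FakeApi(projects=[1], samples=[])
sample = SimpleNamespace(name="sample1")
first_call = ensure_sample(fake_api, sample, 1)
second_call = ensure_sample(fake_api, sample, 1)
print(first_call, second_call)
```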
# Object Model
## Sequencing / IRIDA Objects
These objects are used to store the data of a sequencing run before uploading to IRIDA.
They each include a `uploadable_schema` which uses `cerberus` to define valid objects. Object validity is checked in `core/model_validator.py`, along with some extra edge case tests to ensure the built object model is ready for upload.
### SequencingRun `model/sequencing_run.py`
Each upload needs a single `SequencingRun` object that acts as the root for the tree of data.
It contains a `project_list`, whose entries relate to the IRIDA projects that samples will be uploaded to.
The `metadata` dict is mostly unused, but must include `layoutType` as either `PAIRED_END` or `SINGLE_END`. This determines if the samples within the sequencing run are paired end or single end reads.
### Project `model/project.py`
The `Project` object relates to a project on IRIDA.
The `id` field identifies what project number on IRIDA the samples are going to.
The `sample_list` list contains the `Sample` objects that will be uploaded.
When creating a new project on IRIDA using the API, a `name` at least 5 characters long must be given.
### Sample `model/sample.py`
The `Sample` object includes:
* `name` : How the sample is identified on IRIDA
* `description` : the description of the sample on IRIDA
* `sequence_file` : A `SequenceFile` object that holds the files to upload.
* `sample_dict` : a metadata dictionary
When using the API to get samples from IRIDA, the `get_irida_id` method can be used to get the sample's numerical identifier.
### SequenceFile `model/sequence_file.py`
The `SequenceFile` holds the file paths to the sequence data to be uploaded.
It has a `file_list` list that can hold multiple files. Currently IRIDA only supports single end and paired end files (1 or 2 files) for upload.
It also includes a `properties_dict` that is used to store meta data, and is filled with required values by the API or Parsers before uploading.
## Other Objects
### DirectoryStatus `model/directory_status.py`
`DirectoryStatus` objects contain a `directory`, a `status` and a `message`
The `progress` module makes use of them.
These are used when deciding if a run has been uploaded, partially uploaded, is a new run, or is invalid.
The `directory` field holds the path to the directory of some potential sequencing run.
`"new"`, `"partial"`, `"complete"`, and `"invalid"` are valid in the `status` field.
The `message` field is only filled if a run is `"invalid"`, and contains information on why a run is invalid.
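
To make the shape of these objects concrete, here is a small stand-in written for illustration. `DirectoryStatusSketch` and `group_by_status` are hypothetical and are not the code in `model/directory_status.py`. They only mirror the three fields and four status values described above.

```python
VALID_STATUSES = ("new", "partial", "complete", "invalid")


class DirectoryStatusSketch:
    """Hypothetical stand-in with the documented fields, for illustration only."""

    def __init__(self, directory, status, message=None):
        if status not in VALID_STATUSES:
            raise ValueError("Unknown status: %r" % (status,))
        self.directory = directory
        self.status = status
        # The message is only kept for invalid runs.
        self.message = message if status == "invalid" else None


def group_by_status(statuses):
    """Return {status: [directories]} so each kind of run can be handled separately."""
    grouped = {status: [] for status in VALID_STATUSES}
    for item in statuses:
        grouped[item.status].append(item.directory)
    return grouped


runs = [
    DirectoryStatusSketch("/runs/a", "new"),
    DirectoryStatusSketch("/runs/b", "invalid", "no sample sheet found"),
    DirectoryStatusSketch("/runs/c", "complete"),
]
grouped = group_by_status(runs)
print(grouped["new"], grouped["invalid"])
```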
### ValidationResult `model/validation_result.py`
`ValidationResult` objects contain an `error_list` with multiple errors. These are used to collect multiple `ModelValidationErrors` together so that all the validation issues can be seen at once.
## Exceptions
###ModelValidationError `model/exceptions/model_validation_result.py`
When validating the object model, a `ModelValidationError` will be thrown. It contains a `message` with information on the issue, and an `object` that holds the invalid object.
\ No newline at end of file
## Creating a new parser
Browsing through the `directory` and `miseq` parsers is the easiest way to get a feel for what is needed in a new parser, but below are explanations and information on what the hard requirements are.
### Create required files
Start by creating a new folder in `parser/`, for example `my_parser`. Use an underscore rather than a hyphen, because the folder name must be a valid Python module name to be imported.
In `parser/my_parser`, create your main Python parser file `parser.py`.
Create an `__init__.py` in `parser/my_parser` with the line:
```
from .parser import Parser
```
This lets the `parser` module import your parser.
### Required functions for `parser.py`
####`find_runs(directory)` :
Given a directory, returns a list of `DirectoryStatus` objects for each directory in it.
This function should make use of `progress.get_directory_status(...)` to generate the DirectoryStatus objects.
This function should raise `exceptions.DirectoryError` if the directory is inaccessible.
####`find_single_run(directory)` :
Finds a run in the given directory. Returns a single `DirectoryStatus` object.
This function should raise `exceptions.DirectoryError` if the directory is inaccessible.
####`get_sample_sheet(directory)` :
Given a directory, returns the path to the sample sheet or equivalent file.
This function should raise `exceptions.DirectoryError` when a directory is inaccessible, or if the sample sheet file is missing.
####`get_sequencing_run(sample_sheet)` :
Given the `sample_sheet` path from the `get_sample_sheet` function, creates a `SequencingRun` object with the correct structure. [See Object Model Documentation](objects.md)
This function should make use of `exceptions.ValidationError` when errors occur so the user can be informed of problems with their sample sheet / samples.
`ValidationError` includes a `ValidationResult` object that can hold multiple errors. Include all errors encountered during parsing and building of the sequence run to give the user as much information as possible.
### Allow your parser to be grabbed by the uploader
Edit the file `parser/parser.py`
Edit the line
```
from . import directory, miseq
```
and add your new parser
```
from . import directory, miseq, my_parser
```
In the `factory(parser_type)` method, add your new parser to the `if` statements.
Now your parser will be able to be used when selected from the config file.
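
The required functions can be pictured with a skeleton like the one below. It is a sketch under several assumptions, not the uploader's implementation: `DirectoryError` and `get_directory_status` are local stand-ins for `exceptions.DirectoryError` and `progress.get_directory_status(...)` (whose real signatures may differ), statuses are plain dicts rather than `DirectoryStatus` objects, the functions are written as static methods, and the file name `SampleSheet.csv` is borrowed from the MiSeq format.

```python
import os
import tempfile

SAMPLE_SHEET_NAME = "SampleSheet.csv"  # assumption: same file name the MiSeq format uses


class DirectoryError(Exception):
    """Stand-in for exceptions.DirectoryError."""

    def __init__(self, message, directory):
        super().__init__(message)
        self.directory = directory


def get_directory_status(directory):
    """Stand-in for progress.get_directory_status(...); returns a plain dict."""
    try:
        Parser.get_sample_sheet(directory)
    except DirectoryError as error:
        return {"directory": directory, "status": "invalid", "message": str(error)}
    return {"directory": directory, "status": "new", "message": None}


class Parser:
    @staticmethod
    def get_sample_sheet(directory):
        """Return the sample sheet path, or raise if the directory or file is unusable."""
        if not os.access(directory, os.R_OK):
            raise DirectoryError("Directory is not readable", directory)
        path = os.path.join(directory, SAMPLE_SHEET_NAME)
        if not os.path.isfile(path):
            raise DirectoryError("No sample sheet found", directory)
        return path

    @staticmethod
    def find_single_run(directory):
        """Return the status of a single candidate run directory."""
        if not os.access(directory, os.R_OK):
            raise DirectoryError("Directory is not readable", directory)
        return get_directory_status(directory)

    @staticmethod
    def find_runs(directory):
        """Return one status per subdirectory of the given directory."""
        if not os.access(directory, os.R_OK):
            raise DirectoryError("Directory is not readable", directory)
        subdirs = sorted(entry.path for entry in os.scandir(directory) if entry.is_dir())
        return [get_directory_status(path) for path in subdirs]

    @staticmethod
    def get_sequencing_run(sample_sheet):
        """Building the SequencingRun depends on your file format, so it is left out here."""
        raise NotImplementedError


# Demo: one directory with a sample sheet, one without.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "run1"))
    os.mkdir(os.path.join(root, "run2"))
    open(os.path.join(root, "run1", SAMPLE_SHEET_NAME), "w").close()
    statuses = [item["status"] for item in Parser.find_runs(root)]
print(statuses)
```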
Unfortunately, uploads don't always follow the happy-path. Some errors can be expected to occur, and this part of the guide will help you deal with those errors.
#Common Errors
Connectivity issues
===================
Connectivity errors are usually caused by a couple of different issues:
Here are some common error messages you might see when trying to upload.
1. The server is not running or is not connectable,
2. The client credentials are not correct or are expired,
3. The user credentials are not correct or are expired.
##Parsing
The server is not running
-------------------------
No runs could be found in a directory: Double check that this is the directory with your sample sheet, and that you are using the right parser.
![Server is down.](images/connectivity-server-down.png)
Alternatively, you may be using the wrong parser. Check that the parser in your configuration matches the type of run/directory you are trying to upload.
If the server is down or is not correctly configured, the uploader will not be able to upload data to the server.
ERROR ERROR! An error occurred with directory 'tests/parsers/miseq/no_dirs/', with message: The directory tests/parsers/miseq/no_dirs/ has no sample sheet file in the MiSeq format with the name SampleSheet.csv
<br>
Invalid sample sheet. Problems were found when trying to parse your sample sheet.
You can verify that the server is down by opening the settings dialog and copying the **Server URL** that you have entered, excluding the `/api/` portion at the end, and paste the URL into your web browser. If no web site appears (or a web site other than the IRIDA login or dashboard page appears), then the server is probably down.
Pay close attention to the error list
**Resolution**: You should get in touch with the administrator of your IRIDA instance to help identify the problem.
ERROR Errors occurred while getting sample sheet
ERROR ERROR! Errors occurred during validation with message: Errors occurred while getting sample sheet
During an upload
----------------
ERROR Error list: [SampleSheetError('[Header] section not found in SampleSheet', 'tests/parsers/miseq/invalid_sample_sheet/SampleSheet.csv'),
SampleSheetError('[Data] section not found in SampleSheet', 'tests/parsers/miseq/invalid_sample_sheet/SampleSheet.csv'),
SampleSheetError('[Reads] section not found in SampleSheet', 'tests/parsers/miseq/invalid_sample_sheet/SampleSheet.csv'),
SampleSheetError('Missing required data header(s): Sample_ID, Sample_Name, Sample_Project, Description', 'tests/parsers/miseq/invalid_sample_sheet/SampleSheet.csv')]
<br>
Files matching the sample name could not be found, check your sample sheet and file names
![Error during upload.](images/connectivity-during-upload.png)
ERROR Errors occurred while building sequence run from sample sheet
ERROR ERROR! Errors occurred during validation with message: Errors occurred while building sequence run from sample sheet
ERROR Error list: [SequenceFileError('The uploader was unable to find an files with a file name that ends with .fastq.gz for the sample in your sample sheet with name 011111 in the directory tests/parsers/miseq/ngs_not_pf_list/Data/Intensities/BaseCalls. This usually happens when the Illumina MiSeq Reporter tool does not generate any FastQ data.',)]
<br>
Too many, or not enough, files found with names matching the sample name.
If the upload fails during an upload, the uploader will stop uploading data, and tell you which sample failed to be uploaded. Upload failures during an upload can happen for several reasons, most likely either internet connectivity issues or server errors. You can try uploading the data again by clicking the `Try again` button. Clicking the `Try again` button will skip all samples that were successfully uploaded and start with the sample where the failure happened.
ERROR Errors occurred while building sequence run from sample sheet
ERROR ERROR! Errors occurred during validation with message: Errors occurred while building sequence run from sample sheet
ERROR Error list: [SequenceFileError("The following file list ['01-1111_S1_L001_R1_001.fastq.gz', '01-1111_S1_L001_R3_001.fastq.gz', '01-1111_S1_L001_R2_001.fastq.gz'] found in the directory tests/parsers/miseq/ngs_not_valid_pf_list/Data/Intensities/BaseCalls is invalid. Please verify the folder containing the sequence files matches the SampleSheet file",)]
<br>
Invalid character in sample name (such as a space). This is not allowed on IRIDA.
**Resolution**: If you still have problems uploading data, you should get in touch with the administrator of your IRIDA instance. The uploader generates detailed logs about what has happened during the upload. The logs are stored at `C:\Users\${YOURUSER}\AppData\Local\iridaUploader\iridaUploader\Logs\irida-uploader.log`. You should include the log file in your communication with the administrator to help figure out what the problem is.
ERROR Did not create sample on server. Response code is '400' and error message is '{"sampleName":["The name you have supplied contains a space character. Names must NOT include the space character and the following: ? ( ) [ ] / = + < > : ; \" ' , * ^ | & ."]}'
##Uploading
<br>
Invalid credentials in configuration file, double check that your configuration file is correct
Missing project(s)
==================
ERROR Can not get access token from IRIDA
ERROR ERROR! Could not initialize irida api.
ERROR Errors: ('Could not get access token from IRIDA. Credentials may be incorrect. IRIDA '
"returned with error message: ('Decoder failed to handle access_token with "
'data as returned by provider. A different decoder may be needed. Provider '
'returned: b\\\'{"error":"invalid_grant","error_description":"Bad '
'credentials"}\\\'\',)',)
<br>
Cannot connect to IRIDA, it is likely the base url in the configuration file is incorrect.
![Missing project ID.](images/missing-project-id.png)
Also check that the IRIDA instance you are trying to connect to is running.
The uploader requires a valid IRIDA project ID as a target for where the data should be sent. The project ID belongs in the `Sample_Project` column of `SampleSheet.csv`.
Alternatively, the client_id / client_secret may be incorrect
When the uploader encounters a project ID entered in `SampleSheet.csv` that does not exist on the IRIDA server, the uploader will report that the project ID does not exist.
ERROR Can not connect to IRIDA
ERROR ERROR! Could not initialize irida api.
ERROR Errors: ('Could not connect to the IRIDA server. URL may be incorrect. IRIDA returned '
'with error message: (MaxRetryError("HTTPConnectionPool(host=\'lcalhost\', '
'port=8080): Max retries exceeded with url: /irida-latest/api/oauth/token '
"(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at "
'0x7f2d3b6759b0>: Failed to establish a new connection: [Errno -2] Name or '
'service not known\',))",),)',)
<br>
An invalid parser name was given. Double check that your config file has a valid parser in the `parser` field.
**Resolution**: Find the sample(s) marked in the uploader interface and edit the corresponding rows in `SampleSheet.csv` to use valid project IDs in IRIDA.
Missing sequencing data
=======================
![Missing data.](images/missing-data.png)
Sometimes the uploader will not be able to find files that correspond to samples in the sample sheet. The uploader searches for files under the folder `Data\Intensities\BaseCalls\`. The prefix of each `.fastq.gz` **must** be one of the `Sample_ID` or (if specified) the `Sample_Name`.
Illumina software will sometimes translate the prefix from the sample sheet if your sample name or ID includes special characters or underscores.
**Resolution**: The names or IDs of samples in `SampleSheet.csv` must match exactly with files in `Data\Intensities\BaseCalls`. You should either rename the samples in `SampleSheet.csv` *without* the special characters and regenerate the `.fastq.gz` files, or you should rename the files under `Data\Intensities\BaseCalls` so that they match the names specified in `SampleSheet.csv`.
AssertionError: Bad parser creation, invalid parser_type given: miseqa
# IRIDA MiSeq Uploader
# IRIDA Uploader
The IRIDA MiSeq Uploader is a tool built to upload data from the [Illumina MiSeq](http://www.illumina.com/systems/miseq.html) instrument to an [IRIDA](http://irida.ca) server.
![Uploader main window.](images/uploader-main-window.png)
##Features
* Command Line interface for Linux and Windows
* Single Directory Upload
* MiSeq sequencing run parser
## Features
* Resumable uploads -- if the uploader fails due to a connection or server-related issue, the upload can be resumed later without re-uploading data.
* Automated uploads -- the uploader can optionally be configured to monitor an analysis directory for new runs and upload them immediately upon completion.
* Post-processing tasks -- the uploader can optionally be configured to execute post-processing tasks after uploading data, like backing up data to an external location.
* Straightforward user interface -- uploading a new run to an IRIDA server is as simple as clicking the `Upload` button!
## Upcoming Features
* Automated uploads
* File upload checksum validation
* Post-processing tasks
* GUI
* Pause and resume uploads
## Getting the Uploader
The IRIDA MiSeq Uploader can be run on Microsoft Windows, or any other operating system that supports Python.
# Getting Started
You can download pre-built packages for Windows from our [GitHub releases page](https://github.com/phac-nml/irida-miseq-uploader/releases/latest).
## Download / Install / Setup
You may also run the uploader on Linux or Mac, provided you have access to `pip` and [wxPython](https://wxpython.org/). Instructions for running the uploader on Linux or Mac can be found in our [`README.md` file](https://github.com/phac-nml/irida-miseq-uploader/blob/master/README.md).
### Download
## Preparing your sample sheet
Before using the uploader, you must prepare your sequencing run with IRIDA-specific project IDs. You can either enter the project IDs when you're creating your sample sheet using the Illumina Experiment Manager or after creating the sample sheet by editing `SampleSheet.csv` with Microsoft Excel or Windows Notepad.
The IRIDA MiSeq Uploader can be run on any operating system that supports Python.
An example, completed `SampleSheet.csv` with the `Sample_Project` column filled in looks like:
You can download the source code on [GitHub]().
```
[Header]
IEMFileVersion,4
Investigator Name,Investigator 1
Experiment Name,Experiment
Date,2015-05-14
Workflow,GenerateFASTQ
Application,FASTQ Only
Assay,Nextera XT
Description,
Chemistry,Amplicon
TODO: You can download pre-built packages for Windows from our [GitHub releases page](https://github.com/phac-nml/irida-miseq-uploader/releases/latest).
[Reads]
251
251
### Installation
[Settings]
ReverseComplement,0
Adapter,ATCGATCGATCG
#### Windows
[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
sample1,,plate1,A01,N801,ATCGAAA,S801,ATCGAAA,1,
sample2,,plate1,A01,N801,ATCGAAA,S801,ATCGAAA,1,
sample3,,plate1,A01,N801,ATCGAAA,S801,ATCGAAA,1,
```
Run an installer (links above) and follow along with the install wizard.
## Running the Uploader
Once you've installed the uploader and added project IDs to your sample sheet, you can run the uploader by clicking on the entry created in the Start menu:
You will need to configure your uploader before running. See [Configuration](configuration.md) for details
![Start menu entry.](images/start-menu-entry.png)
#### Linux
## Configuring the Uploader
Make sure Python 3 is installed
The first time you run the uploader, you'll be prompted to fill in some configuration settings:
$ python3 --version
![Default settings dialog.](images/default-settings-dialog.png)
If python3 is not installed, install with
* The **Server URL** is the API address of your IRIDA server. This address is almost always `http://example.com/irida/api`, where `example.com` is your domain.
* The **Client authorization** section is where you will fill in the OAuth client ID for the uploader. You must create a client in your IRIDA instance using the `password` grant type, and your client must be given both `read` and `write` scopes. You can find out about creating clients in IRIDA in the [Creating a new system client](http://irida.corefacility.ca/documentation/user/administrator/#creating-a-new-system-client) documentation.
* The **User authorization** section is where you will fill in the username and password of the user account that has permission to upload data to IRIDA. The user account **must** be created having the *Sequencer* role.
* The **Task to run on successful upload** field gives you the opportunity to run post-processing scripts on your MiSeq data, after the uploader has finished uploading the data to the IRIDA server. This field can be used, for example, to back up the Illumina data to an external storage area using the Windows utility `robocopy`.
* The **Default directory** field should be used to specify the directory that the uploader will scan when it first starts. If you're running this tool on the MiSeq instrument, you should choose the folder `D:\Illumina\MiSeqAnalysis\`.
* The option **Monitor directory for new runs?** can be used to automate uploading newly completed runs to the IRIDA server. **WARNING**: For this feature to work, you must leave the IRIDA MiSeq Uploader tool running *during* the sequencing process. While this feature has been tested extensively, Illumina discourages running any non-Illumina software on the machine while the sequencing process is running.
$ sudo apt-get install python3
The settings dialog will verify the information you're entering as you type it in. If you've entered everything correctly, your settings dialog should have green checkmark icons <span style='color: green'>✓</span> beside each of the authorization fields:
Install pip:
![Valid settings.](images/valid-settings.png)
$ sudo apt-get install python3-pip
If anything is incorrectly entered, you'll see a red cross icon <span style='color: red'>✘</span>.
### virtualenv usage
# Uploader Interfaces
After you've finished configuring the IRIDA MiSeq Uploader, the first screen you will see will tell you that no new data has been found to upload:
Install virtualenv and setuptools
![No new runs found.](images/no-new-runs-found.png)
$ pip install virtualenv
$ pip install setuptools
If you launch the uploader when the directory `D:\Illumina\MiSeqAnalysis\` has sequencing data, then you'll see a list of the runs that the uploader discovered, along with a list of the samples in each run and the name of the project that each sample is going to be uploaded to on the IRIDA server:
If you already have these packages installed, ensure they are up to date
![New run found.](images/new-run-found.png)
$ pip install virtualenv -U
$ pip install setuptools -U
When a new run is found, click the `Upload` button!
Download the source code
![Uploading.](images/uploading.png)
$ git clone https://github.com/phac-nml/irida-miseq-uploader
$ cd irida-miseq-uploader
As each sample completes uploading to the server, it's moved to the bottom of the list of samples with a green checkmark <span style='color: green'>✓</span> indicating that the upload succeeded.
Build a virtualenv and install the dependencies automatically with `make`:
When the run is complete, all samples will have a green checkmark <span style='color: green'>✓</span> beside them, and the overall run progress bar will show 100% progress:
$ make
You will need to configure your uploader before running.
![Completed upload.](images/completed-upload.png)
### Configuration
You will need to configure IRIDA and the uploader to upload files.
[How to configure](configuration.md)
If you do not create a configuration file, the IRIDA uploader will create one for you with default values the first time it tries to upload.
You will need to edit this file with your IRIDA credentials, and the parser that matches your data.
#### Choose a Parser
The config file has a `parser` field that selects how the uploader reads your directory structure.
We currently support the following:
`directory` : [Generic Directory](parsers/directory.md)
`miseq` : [MiSeq](parsers/miseq.md)
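
As an illustration of the `parser` field, here is a minimal sketch that reads it with Python's standard `configparser` module. It assumes the config file is INI-style with a `[Settings]` section; the function name `read_parser_name` and the `SUPPORTED_PARSERS` set are hypothetical and are not part of the uploader's own code.

```python
import configparser

# Parser names listed in this documentation (assumed to be the full set).
SUPPORTED_PARSERS = {"directory", "miseq"}


def read_parser_name(config_text):
    """Return the `parser` value from the [Settings] section of config text."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    parser_name = config.get("Settings", "parser")
    if parser_name not in SUPPORTED_PARSERS:
        raise ValueError(f"Unsupported parser: {parser_name!r}")
    return parser_name


# Example config text; the values here are placeholders.
example = """
[Settings]
client_id = uploader
base_url = http://localhost:8080/irida-latest/api/
parser = miseq
"""
print(read_parser_name(example))
```

Checking the value up front like this reports a mistyped parser name before any upload is attempted.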
## Starting an upload
You can upload with the following commands:
### Windows:
Open a Command Prompt terminal and use the `iridauploader` command to upload
`C:\Users\username> iridauploader \path\to\my\samples\`
### Linux:
Use the `irida-uploader.sh` script included with the source code to upload.
`./irida-uploader.sh /path/to/the/sequencing/run/`
# Problems?
### Problems uploading?
Check the [Errors Page](errors.md) for help with common errors.
### Found a bug or have an idea for a feature?
Create an issue on our [GitHub](todo: link to github)
# Developers
Want to create a parser for a sequencer that we don't yet support, or have an idea for an IRIDA-related project?
[Requirements for new parsers](developers/parsers.md)
[Information on the IRIDA python API](developers/api.md)
[Object Model Reference](developers/objects.md)
## File Structure
To upload using the directory parser, organize your files according to the following structure:
```
.
├── file_1.fastq.gz
├── file_2.fastq.gz
├── samp_F.fastq.gz
├── samp_R.fastq.gz
├── germ_f.fastq.gz
├── germ_r.fastq.gz
└── SampleList.csv
```
## File Names
When uploading paired-end reads, your file names must indicate forward/reverse.
Give both files of a pair the same name, differing only in the `1` / `2`, `F` / `R`, or `f` / `r` marker, as shown above.
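
The naming rule can be checked with a short script before uploading. The sketch below is one strict reading of the rule, not the uploader's own logic: it assumes the two names differ in exactly one character and that this character is one of the marker pairs above. The function name `is_forward_reverse_pair` is hypothetical.

```python
# Forward/reverse marker pairs named in this section.
PAIR_MARKERS = {("1", "2"), ("F", "R"), ("f", "r")}


def is_forward_reverse_pair(forward_name, reverse_name):
    """Return True if the names differ in exactly one character,
    and that character is a known forward/reverse marker pair."""
    if len(forward_name) != len(reverse_name):
        return False
    differences = [
        (a, b) for a, b in zip(forward_name, reverse_name) if a != b
    ]
    return len(differences) == 1 and differences[0] in PAIR_MARKERS


print(is_forward_reverse_pair("file_1.fastq.gz", "file_2.fastq.gz"))
print(is_forward_reverse_pair("file_1.fastq.gz", "samp_F.fastq.gz"))
```

The argument order matters in this sketch: the forward file must come first.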
## Preparing your sample list file
Before using the uploader, you must create a `SampleList.csv` file.
It must contain the following fields:
`Sample_Name` : This is what the sample will be identified as on IRIDA after upload.
`Project_ID`: The IRIDA project the sample will be uploaded to.
`File_Forward`: Always needed, the forward read file.
`File_Reverse`: Needed for paired end reads, the reverse read file.
An example, completed `SampleList.csv` with all the columns filled in looks like:
```
[Data]
Sample_Name,Project_ID,File_Forward,File_Reverse
my-sample-1,75,file_1.fastq.gz,file_2.fastq.gz
my-sample-2,75,samp_F.fastq.gz,samp_R.fastq.gz
my-sample-3,76,germ_f.fastq.gz,germ_r.fastq.gz
```
Another example, but with only single end reads:
```
[Data]
Sample_Name,Project_ID,File_Forward,File_Reverse
my-sample-1,75,file_1.fastq.gz
my-sample-2,75,samp_F.fastq.gz
my-sample-3,76,germ_f.fastq.gz
```
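
To sanity-check a sample list before uploading, the file can be read with Python's standard `csv` module. This is a minimal sketch assuming the layout shown in the examples above (a `[Data]` line followed by a header row); the function name `read_sample_list` and the output dictionary keys are hypothetical and do not reflect the uploader's internal model.

```python
import csv


def read_sample_list(text):
    """Parse SampleList.csv text into a list of dicts, one per sample."""
    lines = text.strip().splitlines()
    # Drop the leading [Data] marker shown in the examples, if present.
    if lines and lines[0].strip() == "[Data]":
        lines = lines[1:]
    samples = []
    for row in csv.DictReader(lines):
        # Short rows (single-end reads) leave File_Reverse as None.
        reverse = (row.get("File_Reverse") or "").strip()
        samples.append({
            "sample_name": row["Sample_Name"].strip(),
            "project_id": row["Project_ID"].strip(),
            "forward": row["File_Forward"].strip(),
            "reverse": reverse or None,  # None marks a single-end sample
        })
    return samples
```

A row with only three columns, as in the single-end example, yields `reverse` set to `None`.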
## File Structure
The file structure for a MiSeq run should be correct by default, but if you are having problems uploading, please verify that your file structure matches the following:
```
.
├── CompletedJobInfo.xml
├── Data
│   └── Intensities
│       └── BaseCalls
│           ├── sample1_S1_L001_R1_001.fastq.gz
│           ├── sample1_S1_L001_R2_001.fastq.gz
│           ├── sample2_S1_L001_R1_001.fastq.gz
│           ├── sample2_S1_L001_R2_001.fastq.gz
│           ├── sample3_S1_L001_R1_001.fastq.gz
│           └── sample3_S1_L001_R2_001.fastq.gz
└── SampleSheet.csv
```
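
If an upload fails, a quick script can confirm that the run directory has this layout. The following is a minimal sketch using `pathlib`; it checks only the files and directories named in the tree above (with the `.fastq.gz` files under `Data/Intensities/BaseCalls`), and the function name `find_layout_problems` is hypothetical rather than part of the uploader.

```python
from pathlib import Path


def find_layout_problems(run_dir):
    """Return a list of human-readable problems with a MiSeq run directory."""
    run_dir = Path(run_dir)
    problems = []
    # Files expected at the top level of the run directory.
    for required in ("CompletedJobInfo.xml", "SampleSheet.csv"):
        if not (run_dir / required).is_file():
            problems.append(f"missing file: {required}")
    # Directory expected to hold the .fastq.gz read files.
    base_calls = run_dir / "Data" / "Intensities" / "BaseCalls"
    if not base_calls.is_dir():
        problems.append("missing directory: Data/Intensities/BaseCalls")
    elif not any(base_calls.glob("*.fastq.gz")):
        problems.append("no .fastq.gz files in Data/Intensities/BaseCalls")
    return problems
```

An empty list means the directory matches the expected structure; it does not verify that the file names match the sample sheet.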
## Preparing your MiSeq sample sheet
Before using the uploader, you must prepare your sequencing run with IRIDA-specific project IDs. You can either enter the project IDs when you're creating your sample sheet using the Illumina Experiment Manager or after creating the sample sheet by editing `SampleSheet.csv` with Microsoft Excel or Windows Notepad.
An example, completed `SampleSheet.csv` with the `Sample_Project` column filled in looks like:
```
[Header]
IEMFileVersion,4
Investigator Name,Investigator 1
Experiment Name,Experiment
Date,2015-05-14
Workflow,GenerateFASTQ
Application,FASTQ Only
Assay,Nextera XT
Description,
Chemistry,Amplicon
[Reads]
251
251
[Settings]
ReverseComplement,0
Adapter,ATCGATCGATCG
[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
sample1,,plate1,A01,N801,ATCGAAA,S801,ATCGAAA,1,
sample2,,plate1,A01,N801,ATCGAAA,S801,ATCGAAA,1,
sample3,,plate1,A01,N801,ATCGAAA,S801,ATCGAAA,1,
```
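
Before uploading, it can help to confirm that every sample has a project ID. The sketch below assumes that the `[Data]` marker sits alone on its line and that the `[Data]` section runs to the end of the file, as in the example above; the function name `samples_missing_project` is hypothetical.

```python
import csv


def samples_missing_project(sheet_text):
    """Return Sample_IDs from the [Data] section whose Sample_Project is empty."""
    lines = [line.strip() for line in sheet_text.splitlines()]
    # Assumption: [Data] is the last section, so it runs to the end of the file.
    data_start = lines.index("[Data]") + 1
    data_lines = [line for line in lines[data_start:] if line]
    missing = []
    for row in csv.DictReader(data_lines):
        if not (row.get("Sample_Project") or "").strip():
            missing.append(row["Sample_ID"])
    return missing
```

If the sheet has no `[Data]` line, `lines.index` raises `ValueError`, which is a reasonable signal that the file is not a sample sheet in this layout.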
[Settings]
client_id = uploader
client_secret = ZK1z6H165y4IZF2ckqNQES315OyKQU8CsrpHNdQr16
username = admin
password = password1
base_url = http://localhost:8080/irida-latest/api/
parser = miseq
[Data]
Sample_Name,Project_ID,File_Forward,File_Reverse
my-sample-1,75,file_1.fastq.gz,file_2.fastq.gz
my-sample-2,75,samp_F.fastq.gz,samp_R.fastq.gz
my-sample-3,76,germ_f.fastq.gz,germ_r.fastq.gz
[Header]
IEMFileVersion,4
Investigator Name,Some Guy
Experiment Name,1
Date,10/15/2013
Workflow,GenerateFASTQ
Application,FASTQ Only
Assay,Nextera XT
Description,Superbug
Chemistry,Amplicon
[Reads]
251
250
[Settings]
ReverseComplement,0
Adapter,AAAAGGGGAAAAGGGGAAA
[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
01-1111,01-1111,1,01,N01,AAAAAAAA,S01,TTTTTTTT,6,Super bug
02-2222,02-2222,2,02,N02,GGGGGGGG,S02,CCCCCCCC,6,Scary bug
03-3333,03-3333,3,03,N03,CCCCCCCC,S03,GGGGGGGG,6,Deadly bug
site_name: IRIDA MiSeq Uploader
site_name: IRIDA Uploader
#repo_url: todo
site_description: Documentation for the IRIDA Uploader
theme: readthedocs
#forces use of .html on docs pages so they can be browsed locally
use_directory_urls: false
nav:
  - Overview: index.md
  - configuration.md
  - errors.md
  - Parsers:
    - parsers/directory.md
    - parsers/miseq.md
  - Developers:
    - developers/api.md
    - developers/parsers.md
    - developers/objects.md