Clarify build process
veghp committed Sep 26, 2023
1 parent bc8bd45 commit 5f79e6f
Showing 1 changed file with 36 additions and 9 deletions.
45 changes: 36 additions & 9 deletions README.md
@@ -20,52 +20,78 @@ Pull the Nextflow pipeline:
nextflow pull edinburgh-genome-foundry/Sequeduct -r v0.3.1
```

Pull the Docker image that contains the required software (requires access to EGF's container repo):
#### Docker image

Build the image that contains the software required for running the pipeline. First, obtain the code (Dockerfile) either by downloading or cloning:

##### Download

Download the repository...

* Click the "<> Code" button at the top of this page, then "Download ZIP".
* Open a terminal in the directory where the file was downloaded.
* Unzip the file (e.g. `unzip Sequeduct-main.zip`).
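
The download steps can also be done from a terminal. This is a minimal sketch, assuming that the default branch is named `main` (inferred from the `Sequeduct-main.zip` file name above) and that `curl` and `unzip` are installed. The helper name `repo_zip_url` is hypothetical and not part of the repository:

```bash
# Build the URL of GitHub's ZIP archive for a repository branch.
# Assumption: the default branch is "main", as the Sequeduct-main.zip name suggests.
repo_zip_url() {
    local repo="$1" branch="${2:-main}"
    printf 'https://github.com/%s/archive/refs/heads/%s.zip\n' "$repo" "$branch"
}

# Usage (needs network access):
#   curl -L -o Sequeduct-main.zip "$(repo_zip_url Edinburgh-Genome-Foundry/Sequeduct)"
#   unzip Sequeduct-main.zip
```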

##### Clone

... or clone the repository:

```bash
docker pull ghcr.io/edinburgh-genome-foundry/sequeduct:v0.3.1
git clone https://github.com/Edinburgh-Genome-Foundry/Sequeduct.git
```

Alternatively, build the image locally from the cloned repo:
#### Build

Change to the downloaded directory (e.g. `cd Sequeduct-main/`), then run:

```bash
docker build . -f containers/Dockerfile --tag sequeduct_local
```

Here `sequeduct_local` is a custom tag of your choice; use the same tag in the run commands below.

Alternatively, pull the Docker image if you have access to EGF's container repo (e.g. EGF staff members):

```bash
docker pull ghcr.io/edinburgh-genome-foundry/sequeduct:v0.3.1
```

To use this image, pass `-profile docker` in the commands below instead of `-with-docker sequeduct_local`.


### Run

Create a directory for your project and copy (or link) the FASTQ directories from your Nanopore run (e.g. into `fastq`). Specify this together with a sample sheet in your commands:
Create a directory for your project and copy (or link) the FASTQ directories from your Nanopore run (e.g. `fastq_pass`). Specify this together with a sample sheet in your commands:
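
The project setup described here could be scripted as follows. This is only a sketch: the function name `setup_project` and the example paths are illustrative assumptions, and it links the FASTQ directory rather than copying it.

```bash
# Create a project directory and link a Nanopore FASTQ directory into it.
# Assumption: the function name and both arguments are illustrative only.
setup_project() {
    local project_dir="$1" fastq_source="$2"
    if [ ! -d "$fastq_source" ]; then
        echo "error: $fastq_source is not a directory" >&2
        return 1
    fi
    mkdir -p "$project_dir" || return 1
    # Link with an absolute path so the link stays valid from inside the project.
    ln -s "$(cd "$fastq_source" && pwd)" "$project_dir/$(basename "$fastq_source")"
}

# Usage:
#   setup_project my_project /path/to/run/fastq_pass
#   cd my_project   # then pass --fastq_dir='fastq_pass' to nextflow
```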

```bash
# Preview
nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry preview --fastq_dir='fastq_pass' \
--reference_dir='genbank' \
--sample_sheet='sample_sheet.csv' \
-profile docker
-with-docker sequeduct_local
# Analysis
nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry analysis --fastq_dir='fastq_pass' \
--reference_dir='genbank' \
--sample_sheet='sample_sheet.csv' \
--projectname='EGF project' \
-profile docker
-with-docker sequeduct_local
# Review
nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry review --reference_dir='genbank' \
--results_csv='results_sheet.csv' \
--projectname='EGF project review' \
--all_parts='parts_fasta/part_sequences.fasta' \
--assembly_plan='assembly_plan.csv' \
-profile docker
-with-docker sequeduct_local
# De novo assembly
nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry assembly --fastq_dir='fastq_pass' \
--assembly_sheet='assembly_sheet.csv' \
-profile docker
-with-docker sequeduct_local
```

The above commands each write an output directory inside a newly created `results` directory. Similarly, Nextflow creates and uses a directory named `work`, so ensure that your project directory doesn't already contain a directory with the same name. Specify the revision of the project with `-r` (a git branch or tag), and choose a configuration profile with `-profile`. Profiles are defined in the Nextflow config files. The Review pipeline uses the output files of the Analysis pipeline, but otherwise the pipelines are independent. Example sheets are in the `examples` directory.
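
Before the first run in a fresh project directory, the name clash mentioned above can be checked with a few lines of shell. This is a sketch with a hypothetical helper name (`check_project_dir`). After a first run, `work` and `results` are expected to exist, so the check is only meaningful beforehand.

```bash
# Warn if the project directory already has entries named like the
# directories that the pipeline run creates (`work` and `results`).
# Returns 0 when neither exists, 1 otherwise.
check_project_dir() {
    local project_dir="${1:-.}" name status=0
    for name in work results; do
        if [ -e "$project_dir/$name" ]; then
            echo "warning: $project_dir/$name already exists" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_project_dir my_project && echo "ready for a first run"
```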

A more detailed example and demonstration data are available at the [Sequeduct demo](https://github.com/Edinburgh-Genome-Foundry/Sequeduct_demo) site.

Use `-with-docker sequeduct_local` (where sequeduct_local is the tag you specified during build) to use a locally built Docker image (instead of `-profile docker`).

### Details

@@ -79,6 +105,7 @@ For convenience, a script is included to collect plot files from the result dire

The pipeline was designed to work with data from one or more barcodes (FASTQ subdirectories). It has been tested on a desktop machine running Ubuntu 20.04.6 LTS (Memory: 15.5 GiB; CPU: Intel® Core™ i5-6500 CPU @ 3.20GHz × 4), and confirmed to work with up to 96 barcodes. The largest tested dataset was 1.5 GB of Nanopore FASTQ data, resulting in 1.1 GB of filtered data (100k filtered reads) with individual filtered FASTQ files (i.e. per sample) of up to 55 MB. If the dataset is much larger, the pipeline may return an error at the variant calling step or at another step. A recommended solution is to increase the quality cutoff (with parameter `--quality_cutoff`), and optionally the minimum length cutoff (`--min_length`), to work with fewer but better reads.
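
To judge how a dataset compares with the tested sizes above, it can help to look at its on-disk size and read counts before running the pipeline. The helper below is a rough sketch, not part of Sequeduct: the name `count_reads` is hypothetical, and it assumes the usual four-lines-per-read FASTQ layout with a trailing newline. It counts raw reads, so the result is an upper bound on the number of filtered reads.

```bash
# Count reads in one FASTQ file (plain or gzip-compressed).
# Assumes four lines per read, with no wrapped sequence lines.
count_reads() {
    local file="$1" lines
    case "$file" in
        *.gz) lines=$(gzip -dc "$file" | wc -l) ;;
        *)    lines=$(wc -l < "$file") ;;
    esac
    echo $((lines / 4))
}

# Usage: total on-disk size, then reads per file
#   du -sh fastq_pass
#   for f in fastq_pass/*/*.fastq*; do echo "$f: $(count_reads "$f")"; done
```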


## License = GPLv3+

Copyright 2021 Edinburgh Genome Foundry, University of Edinburgh