Clarify build process

Edinburgh-Genome-Foundry · Sep 26, 2023 · 5f79e6f
1 parent bc8bd45
commit 5f79e6f
Showing 1 changed file with 36 additions and 9 deletions.
diff --git a/README.md b/README.md
@@ -20,52 +20,78 @@ Pull the Nextflow pipeline:
 nextflow pull edinburgh-genome-foundry/Sequeduct -r v0.3.1
 ```
 
-Pull the Docker image that contains the required software (requires access to EGF's container repo):
+#### Docker image
+
+Build the image that contains the software required for running the pipeline. First, obtain the code (Dockerfile) either by downloading or cloning:
+
+##### Download
+
+Download the repository...
+
+* Click the "<> Code" button at the top of this page, then "Download ZIP".
+* Open a terminal in the directory where the file was downloaded.
+* Unzip the file (e.g. `unzip Sequeduct-main.zip`).
+
+##### Clone
+
+... or clone the repository:
 
 ```bash
-docker pull ghcr.io/edinburgh-genome-foundry/sequeduct:v0.3.1
+git clone https://github.com/Edinburgh-Genome-Foundry/Sequeduct.git
 ```
 
-Alternatively, build the image locally from the cloned repo:
+#### Build
+
+Change to the repository directory (e.g. `cd Sequeduct-main/` if you downloaded the ZIP, or `cd Sequeduct/` if you cloned), then run:
 
 ```bash
 docker build . -f containers/Dockerfile --tag sequeduct_local
 ```
 
+Here, `sequeduct_local` is a tag of your choice; use the same tag in the run commands below.
+
+Alternatively, pull the Docker image if you have access to EGF's container repo (e.g. EGF staff members):
+
+```bash
+docker pull ghcr.io/edinburgh-genome-foundry/sequeduct:v0.3.1
+```
+
+To use this pulled image, replace `-with-docker sequeduct_local` with `-profile docker` in the commands below.
+
+
 ### Run
 
-Create a directory for your project and copy (or link) the FASTQ directories from your Nanopore run (e.g. into `fastq`). Specify this together with a sample sheet in your commands:
+Create a directory for your project and copy (or link) the FASTQ directories from your Nanopore run (e.g. `fastq_pass`). Specify this together with a sample sheet in your commands:
 
 ```bash
 # Preview
 nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry preview --fastq_dir='fastq_pass' \
  --reference_dir='genbank' \
  --sample_sheet='sample_sheet.csv' \
- -profile docker
+ -with-docker sequeduct_local
 # Analysis
 nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry analysis --fastq_dir='fastq_pass' \
  --reference_dir='genbank' \
  --sample_sheet='sample_sheet.csv' \
  --projectname='EGF project' \
- -profile docker
+ -with-docker sequeduct_local
 # Review
 nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry review --reference_dir='genbank' \
  --results_csv='results_sheet.csv' \
  --projectname='EGF project review' \
  --all_parts='parts_fasta/part_sequences.fasta' \
  --assembly_plan='assembly_plan.csv' \
- -profile docker
+ -with-docker sequeduct_local
 # De novo assembly
 nextflow run edinburgh-genome-foundry/Sequeduct -r v0.3.1 -entry assembly --fastq_dir='fastq_pass' \
  --assembly_sheet='assembly_sheet.csv' \
- -profile docker 
+ -with-docker sequeduct_local
 ```
 
 Each of the above commands writes its output to a directory inside a newly created `results` directory. Nextflow also creates and uses a directory named `work`, so ensure that your project directory doesn't already contain a directory with either name. Specify the revision of the project with `-r` (a git branch or tag), and choose a configuration profile with `-profile`. Profiles are specified in the Nextflow config files. The Review pipeline uses the output files of the Analysis pipeline, but otherwise the pipelines are independent. Example sheets are provided in the `examples` directory.
 
 A more detailed example and demonstration data are available at the [Sequeduct demo](https://github.com/Edinburgh-Genome-Foundry/Sequeduct_demo) site.
 
-Use `-with-docker sequeduct_local` (where sequeduct_local is the tag you specified during build) to use a locally built Docker image (instead of `-profile docker`).
 
 ### Details
 
@@ -79,6 +105,7 @@ For convenience, a script is included to collect plot files from the result dire
 
 The pipeline was designed to work with data from one or more barcodes (FASTQ subdirectories). It has been tested on a desktop machine running Ubuntu 20.04.6 LTS (Memory: 15.5 GiB; CPU: Intel® Core™ i5-6500 CPU @ 3.20GHz × 4), and confirmed to work with up to 96 barcodes. The largest tested dataset was 1.5 GB Nanopore FASTQ data, resulting in 1.1 GB filtered data (100k filtered reads) with up to 55 MB individual filtered FASTQ files (i.e. per sample). If the dataset is much larger, then it may return an error at the variant call or another step. A recommended solution is to increase the quality cutoff (with parameter `--quality_cutoff`), and optionally the minimum length cutoff (`--min_length`), to work with fewer but better reads.
 
+
 ## License = GPLv3+
 
 Copyright 2021 Edinburgh Genome Foundry, University of Edinburgh