Plant-Food-Research-Open/assemblyqc

GitHub Actions CI Status | GitHub Actions Linting Status | Cite Article | nf-test

Nextflow | run with conda ❌ | run with docker | run with singularity | Launch on Seqera Platform

Introduction

plant-food-research-open/assemblyqc is a Nextflow pipeline that evaluates assembly quality with multiple QC tools and presents the results in a unified HTML report. The tools are shown in the Pipeline Flowchart and their references are listed in CITATIONS.md.

Pipeline Flowchart

%%{init: {
    'theme': 'base',
    'themeVariables': {
    'fontSize': '52px',
    'primaryColor': '#9A6421',
    'primaryTextColor': '#ffffff',
    'primaryBorderColor': '#9A6421',
    'lineColor': '#B180A8',
    'secondaryColor': '#455C58',
    'tertiaryColor': '#ffffff'
  }
}}%%
flowchart LR
  forEachTag(Assembly) ==> VALIDATE_FORMAT[VALIDATE FORMAT]

  VALIDATE_FORMAT ==> ncbiFCS[NCBI FCS\nADAPTOR]
  ncbiFCS ==> Check{Check}

  VALIDATE_FORMAT ==> ncbiGX[NCBI FCS GX]
  ncbiGX ==> Check
  Check ==> |Clean|Run(Run)

  Check ==> |Contamination|Skip(Skip All)
  Skip ==> REPORT

  VALIDATE_FORMAT ==> GFF_STATS[GENOMETOOLS GT STAT]

  Run ==> ASS_STATS[ASSEMBLATHON STATS]
  Run ==> BUSCO
  Run ==> TIDK
  Run ==> LAI
  Run ==> KRAKEN2
  Run ==> HIC_CONTACT_MAP[HIC CONTACT MAP]
  Run ==> MUMMER
  Run ==> MINIMAP2
  Run ==> MERQURY

  MUMMER ==> CIRCOS
  MUMMER ==> DOTPLOT

  MINIMAP2 ==> PLOTSR

  ASS_STATS ==> REPORT
  GFF_STATS ==> REPORT
  BUSCO ==> REPORT
  TIDK ==> REPORT
  LAI ==> REPORT
  KRAKEN2 ==> REPORT
  HIC_CONTACT_MAP ==> REPORT
  CIRCOS ==> REPORT
  DOTPLOT ==> REPORT
  PLOTSR ==> REPORT
  MERQURY ==> REPORT

Usage

Refer to usage, parameters and output documents for details.

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

Prepare an assemblysheet.csv file with the following columns representing target assemblies and associated metadata.

  • tag: A unique tag which represents the target assembly throughout the pipeline and in the final report
  • fasta: FASTA file
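As a minimal sketch of such a file, the snippet below writes an assemblysheet.csv with the two columns listed above and then checks that every tag is unique. The tags (sampleA, sampleB) and FASTA paths are hypothetical placeholders, and the header row is an assumption based on the column names; see the usage document for the authoritative format.

```shell
# Write an example assembly sheet.
# Tags and paths are hypothetical placeholders; replace them with your own.
cat > assemblysheet.csv <<'EOF'
tag,fasta
sampleA,/path/to/sampleA.fasta
sampleB,/path/to/sampleB.fasta
EOF

# Collect any tag that appears more than once (the header row is skipped).
duplicate_tags=$(awk -F',' 'NR > 1 { seen[$1]++ } END { for (t in seen) if (seen[t] > 1) print t }' assemblysheet.csv)

if [ -n "$duplicate_tags" ]; then
  echo "Duplicate tags found: $duplicate_tags" >&2
else
  echo "All tags are unique"
fi
```

The uniqueness check is only a convenience here, since each tag is meant to identify one assembly throughout the pipeline and in the final report.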

Now, you can run the pipeline using:

nextflow run plant-food-research-open/assemblyqc \
   -profile <docker/singularity/.../institute> \
   --input assemblysheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
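As an illustration of the -params-file route, the sketch below writes a small JSON file holding the same two parameters used in the command above. The file name params.json and the ./results output directory are assumptions chosen for this example.

```shell
# Write pipeline parameters to a JSON file.
# File name and output directory are illustrative assumptions.
cat > params.json <<'EOF'
{
  "input": "assemblysheet.csv",
  "outdir": "./results"
}
EOF

# To launch with this file, uncomment the command below.
# It requires Nextflow and a container engine; "docker" is just an example profile.
# nextflow run plant-food-research-open/assemblyqc -profile docker -params-file params.json
```

Keeping parameters in a file like this leaves custom config files passed with -c free for non-parameter settings such as resource limits.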

Plant&Food Users

Download the pipeline to your /workspace/$USER folder. Change the parameters defined in the pfr/params.json file. Then submit the pipeline to SLURM for execution:

sbatch ./pfr_assemblyqc

Credits

plant-food-research-open/assemblyqc was originally written by Usman Rashid (@gallvp) and Ken Smith (@hzlnutspread).

Ross Crowhurst (@rosscrowhurst), Chen Wu (@christinawu2008) and Marcus Davy (@mdavy86) generously contributed their QC scripts.

Mahesh Binzer-Panchal (@mahesh-panchal) and Simon Pearce (@SPPearce) helped port the pipeline modules and sub-workflows to nf-core schema.

We thank the following people for their extensive assistance in the development of this pipeline:

The pipeline uses nf-core modules contributed by the following authors:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use plant-food-research-open/assemblyqc for your analysis, please cite it as:

AssemblyQC: A Nextflow pipeline for reproducible reporting of assembly quality.

Usman Rashid, Chen Wu, Jason Shiller, Ken Smith, Ross Crowhurst, Marcus Davy, Ting-Hsuan Chen, Ignacio Carvajal, Sarah Bailey, Susan Thomson & Cecilia H Deng.

Bioinformatics. 2024 July 30. doi: 10.1093/bioinformatics/btae477.

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.