Assembly utility scripts used to assemble bacterial isolates sequenced on Illumina paired-end sequencers

bioforensics/asm_tools

Assembly Tools

Snakemake workflows used to assemble bacterial isolates.

These workflows were used to assemble five historical Bacillus anthracis isolates, soon to be published in Microbiology Resource Announcements.

The Bacillus anthracis assemblies have been deposited in DDBJ/ENA/GenBank under BioSample accession numbers SAMN12620928, SAMN12620929, SAMN12620930, SAMN12620931, and SAMN12620932. The raw Illumina paired-end sequencing reads have been deposited in the Sequence Read Archive under accession numbers SRR10019497, SRR10019498, SRR10019499, SRR10019500, and SRR10019501.

Installation

Read preprocessing workflow installation

  1. Install Anaconda (the Miniconda installer is used here)
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
  2. Download asm_tools
git clone https://github.com/bioforensics/asm_tools

OR

Download a Release

  3. Set up the Python environment and use conda to install the required packages (mash, fastp, etc.).
   cd asm_tools/preprocess
   conda env create -f preprocess_env.yml
   conda activate bmap_preprocess
  4. (Optional) Download databases for "mash screen" to check for contaminants.
    Mash Sketch databases for RefSeq release 88:
  5. Edit preprocess/config.yml with the path to the mash database
mashdb: path/to/mashdb
  6. Run the read preprocessing workflow
path/to/asm_tools/preprocess/bmap_preprocess -r1 test/seq/test_R1.fastq.gz -r2 test/seq/test_R2.fastq.gz -s sample_name

Singularity Container installation

singularity pull bmap_preprocess.sif library://dsommer/default/bmap/bmap_preprocess
singularity exec bmap_preprocess.sif -r1 test/seq/test_R1.fastq.gz -r2 test/seq/test_R2.fastq.gz -s test1

Preprocessing Paired-End Reads

Snakemake DAG

[Figure: Snakemake DAG of the preprocessing workflow]

Workflow outline

The preprocessing.smk Snakemake workflow prepares Illumina reads to be assembled.

  1. Run fastp to trim adapter sequences, low-quality bases, and very short reads. By default, bases below Q20 at the ends of reads are trimmed. Any reads shorter than 75 bases and/or containing Ns are removed.
  2. Run "mash screen" against RefSeq to check for contaminants.
  3. Estimate the genome size by building a k-mer profile of the reads.
  4. Randomly downsample the reads to 150× coverage of the estimated genome size using the sample-reads program.
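
The arithmetic behind steps 3 and 4 can be sketched in Python. This is an illustration only, not the code used by the workflow or by sample-reads: the function names, the `min_depth` cutoff, and the peak-depth heuristic for genome size are all assumptions made for the example.

```python
import random


def estimate_genome_size(kmer_histogram, min_depth=3):
    """Estimate genome size from a k-mer depth histogram.

    kmer_histogram maps depth -> number of distinct k-mers seen at that depth.
    Assumption: k-mers below min_depth are mostly sequencing errors and are ignored.
    """
    solid = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # The depth with the most distinct k-mers approximates the per-base coverage.
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return int(round(total_kmers / peak_depth))


def downsample_fraction(total_bases, genome_size, target_coverage=150):
    """Fraction of read pairs to keep so coverage is about target_coverage."""
    coverage = total_bases / genome_size
    # Never "upsample": keep everything if coverage is already below target.
    return min(1.0, target_coverage / coverage)


def downsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair with probability `fraction`, so mates stay together."""
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


# Toy histogram: an error peak at depth 1 and a genomic peak near depth 50.
histogram = {1: 1000, 2: 50, 48: 100, 50: 1000, 52: 100}
genome_size = estimate_genome_size(histogram)            # 1200
fraction = downsample_fraction(360_000, genome_size)     # 0.5 (300x down to 150x)
kept = downsample_pairs([("r1", "r2")] * 10, fraction, seed=1)
print(genome_size, fraction, len(kept))
```

Sampling whole pairs rather than individual reads keeps the R1 and R2 files in sync, which paired-end assemblers require.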
