This pipeline merges and filters paired-end Illumina 16S rRNA sequences and then executes QIIME to cluster OTUs using the open reference OTU picking algorithm.
The steps are:
- Merge and QC the sequences using USEARCH and cutadapt
- Convert fastq files to QIIME-formatted fasta files
- Pick OTUs using QIIME's open reference algorithm
- Remove chimeras
The output of the pipeline is a BIOM file containing the OTU table and taxonomic assignments (using Greengenes), plus a phylogenetic tree of the PyNAST-aligned representative sequences for each OTU.
To run the pipeline, a file named "params.txt" that sets the parameters and paths is required in the directory where the scripts are executed. See the documentation for the relevant programs for details on the parameters. A qiime_params.txt file with QIIME-specific parameters is also required. To use QIIME's default parameters, create a blank text file named qiime_params.txt in the directory where the scripts are executed.
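The configuration requirement above can be checked before running anything. The following is a minimal sketch, not part of the pipeline itself: the function name `check_config` is hypothetical, and it assumes it is called from the directory where the scripts will be executed.

```shell
# Hypothetical helper (not part of the pipeline): verify the two
# configuration files in the directory the scripts will be run from.
check_config() {
    # params.txt must be written by hand; its contents are not guessed here.
    if [ ! -f params.txt ]; then
        echo "params.txt not found in $(pwd)" >&2
        return 1
    fi
    # A blank qiime_params.txt makes QIIME use its default parameters.
    if [ ! -f qiime_params.txt ]; then
        touch qiime_params.txt
    fi
}
```

Calling `check_config` leaves an existing qiime_params.txt untouched, so custom QIIME parameters are never overwritten.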
The pipeline also generates some useful diagnostics in addition to those generated by QIIME:
- merge_QC_stats.csv: breakdown of how many reads were merged and passed QC for each sample
- chimera_stats.csv: breakdown of chimeric reads for each sample
To run the pipeline:
- Edit the params.txt and qiime_params.txt files to set your paths and parameters.
- Execute 01.merge_QC.sh. Check merge_QC_stats.csv to ensure reads are merging and QC is acceptable.
- Execute 02.fastq_to_fasta.sh. Check the script output to ensure all your samples were collated and, if the subsampling parameter was set, subsampled correctly.
- Execute 03.pick_open_otus.sh. Check the log file generated by QIIME for errors.
- Execute 04.chimera_remove.sh. Check chimera_stats.csv and the log file for possible chimeras. If chimeras are reported, use BLAST to verify that they are true chimeras.
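The run order above can be sketched as a small wrapper. This is an illustration only: the function name `run_pipeline` is hypothetical, and it assumes the four scripts are in the current directory, can be run with `bash`, and exit with a non-zero status on failure.

```shell
# Hypothetical wrapper (not part of the pipeline): run the four scripts
# in order from the current directory and stop at the first failure.
run_pipeline() {
    for step in 01.merge_QC.sh 02.fastq_to_fasta.sh \
                03.pick_open_otus.sh 04.chimera_remove.sh; do
        echo "Running $step"
        # Assumes each script exits non-zero when it fails.
        bash "$step" || {
            echo "$step failed; stopping" >&2
            return 1
        }
    done
}
```

A wrapper like this does not replace the manual checks listed for each step. Running the scripts one at a time remains the safer choice when merge_QC_stats.csv, the QIIME log, or chimera_stats.csv need to be inspected before continuing.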
Caporaso JG, Kuczynski J, Stombaugh J et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336.
Edgar RC, Flyvbjerg H (2015) Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics, 31, 3476–3482.
Sodergren E et al. (2011) Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research, 21.