AMPSEQ_ANALYSIS

Pipeline for read quality check and amplicon seq analysis/visualization based on Fastqc and CRISPResso2:

fastqc quality check (alone or integrated with pipeline)
CRISPResso2 in normal or batch mode for single or pair-end data
Read the whole README file before running (it's not long)

installation and requirements

miniconda (check installation in https://docs.conda.io/en/latest/miniconda.html)
fastqc (needs java runtime installed, check https://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc)
multiqc (installation below)
CRISPResso2 (installation below)

Note 1: using conda is not mandatory. You may install multiqc via pip and CRISPResso via docker.

create new environment and activate it

$ conda create -n ampseq -c bioconda -c conda-forge python=3.9 crispresso2 multiqc
$ conda activate ampseq

get ampseq_analysis

Clone OR download ampseq_analysis repository to your computer:

# clone repo using git (needs git to be installed)
$ git clone https://github.com/miriukaLab/ampseq_analysis.git
$ cd ampseq_analysis
$ chmod a+x amplugged

# download zip to your local device
$ curl -L https://github.com/miriukaLab/ampseq_analysis/archive/refs/heads/main.zip -o ampseq_analysis.zip
$ unzip ampseq_analysis.zip
$ rm ampseq_analysis.zip
$ cd ampseq_analysis
$ chmod a+x amplugged

prepare raw data and files

Place raw data (.fq|.fastq|.fastq.gz) in known directory, you will need to provide full path to samples when running the pipeline
Complete "ampseq_analysis/parameters.txt" file. Note the double quotes ("") next to the variable names (e.g. FASTQR1=""); you have to fill the variable's value within the quotes, unless it is a number (don't use the quotes here)
if running in batch mode, complete the "ampseq_analysis/batch.txt" as it is indicated in the file, providing the full path to the data file ("$HOME/raw_data/sample1_1.fastq.gz")

run pipeline (walkthrough)

amplugged runs through bash terminal (remember to complete the parameters file)

# enter ampseq directory
$ cd ampseq_analysis

#run main pipeline
$ ./amplugged

Running the command above wil start program.

input/output information

$ ./amplugged
# if path to raw_data is wrong, an error will be raised and a new directory will be requested (see Note 2 below)
Error: "$HOME/Desktop/wrong_path" not found. Cannot continue.
Please, provide raw data directory here: $HOME/Desktop/raw_data

$HOME/Desktop/raw_data  OK
$HOME/Desktop/raw_data/sample1_1.fastq.gz OK
$HOME/Desktop/raw_data/sample1_2.fastq.gz OK
$HOME/Desktop/raw_data/sample2_1.fastq.gz OK
$HOME/Desktop/raw_data/sample2_2.fastq.gz OK

# path to output directory is checked (see Note 3 below)
"$HOME/Desktop/new_dir_output" not found. Creating....

Note 2: if input directory exists, but samples have unknown extension (other than fq, fastq or fastq.gz), the program will exit. Check extensions and run again.

Note 3: if output directory does not exist, the program will create it...so, you just need to provide the path in the parameters file.

Perform (or not) quality check on raw samples with Fastqc

Note 4: if QUALITY_CHECK is set to "no", Fastqc will be skipped. On the contrary, if QUALITY_CHECK is set to "yes", Fastqc will be integrated to the rest of the pipeline. Finally, setting the variable to "only" will run Fastqc on samples and then exit (no CRISPResso).

Run CRISPResso2 on samples: normal (one sample at a time) or batch (all samples together; same parameters for all)

Note 5: If running in batch mode (default, but can be changed in parameters file), you don't need to provide values for FASTQR1, FASTQR2 and NAME variables in the parameters file. These are all covered in the file.batch. Note 6: there is no need to indicate that you will be running in single-end mode, just don't fill the FASTQR2 variable when running in normal mode or remove the fastq_r2 column in the file.batch in batch mode.

Go check results...tune parameters...provide new output path to avoid overwriting.

Good luck!

Alejandro La Greca | [email protected] | GitHub | twitter

Software

Fastqc: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ ([email protected])

CRISPResso2: https://doi.org/10.1038/s41587-019-0032-3 (https://github.com/pinellolab/CRISPResso2)

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
scripts		scripts
LICENSE		LICENSE
README.md		README.md
amplugged		amplugged
batch.txt		batch.txt
parameters.txt		parameters.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AMPSEQ_ANALYSIS

installation and requirements

create new environment and activate it

get ampseq_analysis

prepare raw data and files

run pipeline (walkthrough)

About

Releases

Packages

Languages

License

miriukaLab/ampseq_analysis

Folders and files

Latest commit

History

Repository files navigation

AMPSEQ_ANALYSIS

installation and requirements

create new environment and activate it

get ampseq_analysis

prepare raw data and files

run pipeline (walkthrough)

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages