GitHub - z0on/2bRAD_denovo: Genome-wide de novo genotyping with 2bRAD

Whole genome de novo genotyping with 2bRAD

Current sample preparation protocol (feel free to comment!): https://docs.google.com/document/d/1am7L_Pa5JQ4sSx0eT5j4vdNPy5FUAtMZRsJZ0Ar5g9U/edit?usp=sharing

2bRAD has been described in Wang et al. 2012: https://www.nature.com/nmeth/journal/v9/n8/abs/nmeth.2023.html

2bRAD features a very simple library prep protocol with no intermediate purification stages. It uses a triple-barcode scheme: two standard Illumina barcodes plus one in-read ligated barcode, which allows pooling libraries in 12-plexes midway through library prep to further minimize prep costs. 2bRAD generates 36-base tags that can be sequenced on either strand. The full-representation version yields one high-quality genotyped tag every 2.5-3.5 kb on average. In the reduced-representation version, which uses modified ligated adapters, the number of genotyped tags can be reduced 4- or 16-fold to save sequencing costs in applications that do not require dense genome sampling (such as population structure analysis or linkage mapping).
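
To make the tag-density figures concrete, here is a back-of-the-envelope sketch of how many genotyped tags one might expect. The function name, the 500 Mb genome size in the usage note, and the treatment of the 2.5-3.5 kb spacing as simple lower and upper bounds are illustrative assumptions, not part of the pipeline.

```python
def expected_tags(genome_size_bp, reduction_fold=1, spacing_bp=(2500, 3500)):
    """Return (low, high) rough estimates of the number of genotyped tags.

    genome_size_bp : genome size in base pairs (hypothetical input)
    reduction_fold : 1 for full representation, 4 or 16 for reduced representation
    spacing_bp     : (densest, sparsest) average spacing between genotyped tags
    """
    if reduction_fold not in (1, 4, 16):
        raise ValueError("reduction_fold must be 1, 4, or 16")
    densest, sparsest = spacing_bp
    # Wider spacing gives fewer tags, so the sparsest spacing sets the low bound.
    low = genome_size_bp // (sparsest * reduction_fold)
    high = genome_size_bp // (densest * reduction_fold)
    return low, high
```

For a hypothetical 500 Mb genome, `expected_tags(500_000_000)` gives roughly 143,000 to 200,000 tags at full representation, and `expected_tags(500_000_000, 16)` gives roughly 9,000 to 12,500 tags with 16-fold reduction.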

Unique features of 2bRAD bioinformatics are the removal of PCR duplicates based on a degenerate tag (allowing for a 128-fold dynamic range of sequencing depth per allele), the use of the direct-reverse strand ratio as a quality filtering parameter, and filtering and quality assessment based on genotyping replicates.
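
The idea behind degenerate-tag deduplication can be sketched as follows: reads that share both the same insert sequence and the same degenerate tag are treated as PCR copies of one original molecule, while reads with the same sequence but different tags are kept as independent observations. This is a minimal illustration, not the code used by the trimming scripts in this repository; the input layout (pre-parsed `(insert_seq, degenerate_tag)` pairs) and the function names are assumptions.

```python
from collections import Counter


def remove_pcr_duplicates(reads):
    """Keep one read per unique (insert sequence, degenerate tag) pair.

    reads: iterable of (insert_seq, degenerate_tag) tuples, assumed to be
    already extracted from the raw reads. First-seen order is preserved.
    """
    seen = set()
    kept = []
    for insert_seq, degenerate_tag in reads:
        key = (insert_seq, degenerate_tag)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


def depth_per_sequence(reads):
    """Count deduplicated reads supporting each insert sequence."""
    return Counter(seq for seq, _ in remove_pcr_duplicates(reads))
```

With this scheme, the deduplicated depth of a sequence can never exceed the number of distinct degenerate-tag variants, which is what bounds the dynamic range of depth per allele.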

The de novo 2bRAD pipeline is similar in ideology to STACKS, but takes advantage of the fact that 2bRAD sequences both strands. The assembled RAD loci are formatted as a fake genome, which is then used as a reference; from there on, the pipeline follows the reference-based steps. Since mid-2017 we have favored ANGSD as our main data processing toolkit (https://www.popgen.dk/angsd/index.php/ANGSD). The 2bRAD_README.sh file suggests basic steps to analyze genotyping quality, investigate population subdivision, and obtain unbiased SFS for demographic analysis (using dadi or Moments). We also keep instructions for GATK UnifiedGenotyper and variant quality score recalibration; these would be preferred for high-coverage data (10x or better).
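
Because a tag can be read from either strand, reads from the same locus show up in two orientations. One simple way to group them, sketched below, is to reduce each tag to a canonical orientation and count how many reads arrived in each orientation; those two counts are the ingredients of a direct-reverse strand ratio. The choice of the lexicographically smaller orientation as "canonical" and the function names are assumptions made for illustration, and the repository's own scripts may handle orientation differently.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_counts(tags):
    """Group tags by canonical orientation.

    Returns {canonical_tag: [n_same_orientation, n_reverse_orientation]},
    where the canonical orientation is the lexicographically smaller of the
    tag and its reverse complement (an arbitrary convention).
    """
    counts = defaultdict(lambda: [0, 0])
    for tag in tags:
        canonical = min(tag, reverse_complement(tag))
        counts[canonical][0 if tag == canonical else 1] += 1
    return dict(counts)
```

A locus whose reads come almost entirely from one orientation would then stand out as a candidate for filtering.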

This repository contains the lab protocol for sample preparation, as well as scripts and a walkthrough (2bRAD_README.sh) for:

read trimming, quality filtering, removal of PCR duplicates
assembling RAD loci
creating and formatting a fake genome out of RAD loci
mapping original reads to fake genome
exploring coverage depth and genotyping rates
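
The "fake genome" step can be pictured as concatenating the assembled loci into a small number of artificial chromosomes, separated by runs of N so that reads cannot map across locus boundaries. The sketch below illustrates that idea only; the spacer length, the number of chromosomes, the `chrN` naming, and the function name are all hypothetical, and the scripts in this repository may format the fake genome differently.

```python
def build_fake_genome(loci, n_chromosomes=2, spacer_len=4):
    """Concatenate locus sequences into fake chromosomes.

    Returns (chromosomes, positions):
      chromosomes: {name: sequence}, loci joined by 'N' spacers
      positions:   list of (locus_index, chromosome_name, zero_based_start)
    """
    if n_chromosomes < 1:
        raise ValueError("n_chromosomes must be at least 1")
    spacer = "N" * spacer_len
    per_chrom = -(-len(loci) // n_chromosomes)  # ceiling division
    chromosomes = {}
    positions = []
    for c in range(n_chromosomes):
        chunk = loci[c * per_chrom:(c + 1) * per_chrom]
        if not chunk:
            break
        name = f"chr{c + 1}"
        offset = 0
        for i, locus in enumerate(chunk):
            # Record where this locus starts within the fake chromosome.
            positions.append((c * per_chrom + i, name, offset))
            offset += len(locus) + spacer_len
        chromosomes[name] = spacer.join(chunk)
    return chromosomes, positions
```

Keeping the `positions` table makes it possible to translate coordinates on the fake chromosomes back to individual loci later.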

For <10x coverage data:

performing PCA and ADMIXTURE analysis based on "fuzzy" genotyping in ANGSD
deriving site frequency spectra using ANGSD and analyzing them in Moments
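
As a reminder of what a site frequency spectrum looks like as data, the sketch below folds an unfolded SFS, which is the usual step when the ancestral allele is unknown. It only illustrates the data structure: ANGSD estimates the SFS from genotype likelihoods rather than from hard counts, and the function name and list-based input are assumptions.

```python
def fold_sfs(unfolded):
    """Fold an unfolded site frequency spectrum.

    unfolded[i] is the number of sites where the derived allele is seen
    i times among n sampled chromosomes, so len(unfolded) == n + 1.
    The folded spectrum is indexed by minor allele count, 0 .. n // 2.
    """
    n = len(unfolded) - 1
    folded = []
    for i in range(n // 2 + 1):
        j = n - i
        # Counts i and n - i describe the same minor-allele class;
        # the middle class (i == j) must not be added to itself.
        folded.append(unfolded[i] if i == j else unfolded[i] + unfolded[j])
    return folded
```

Folding preserves the total number of sites, which is a quick sanity check on the result.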

For >10x coverage data:

calling variants using GATK's UnifiedGenotyper
variant quality score recalibration (VQSR) based on genotyping replicates
final thinning and filtering
quality assessment based on replicates
deriving site frequency spectra
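
Quality assessment based on replicates boils down to asking how often two independent preparations of the same individual receive the same genotype. A minimal sketch of that comparison is shown below; the genotype encoding (0, 1, 2 for the number of alternative alleles, `None` for missing calls) and the function name are assumptions, and the replicate scripts in this repository may compute different or additional statistics.

```python
def replicate_concordance(geno_a, geno_b):
    """Compare genotype calls of two replicates of the same individual.

    geno_a, geno_b: equal-length lists of genotype codes (0, 1, 2),
    with None marking a missing call.
    Returns (fraction_matching, n_sites_compared); the fraction is NaN
    when no site is called in both replicates.
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("replicates must cover the same sites")
    compared = 0
    matched = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip sites missing in either replicate
        compared += 1
        if a == b:
            matched += 1
    fraction = matched / compared if compared else float("nan")
    return fraction, compared
```

Sites that disagree between replicates can then serve as examples of unreliable calls when tuning filters.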

IMPORTANT: Do not sequence 2bRAD libraries alone on a HiSeq 4000 lane! Invariant bases (adaptor, restriction site) will be read very poorly, resulting in read trimming issues. To avoid this problem, mix 20% of PhiX libraries with your 2bRAD samples, or share the lane with someone else's non-2bRAD samples. (If you have already sequenced this way, no worries: there is a solution; contact me.)

2bRAD_README.sh
2bRAD_bowtie2_launch.pl
2bRAD_denovo_scripts_OLDmanual.pdf
2bRAD_oligos_order.xlsx
2bRAD_reducedRep_oligos.xlsx
2bRAD_trim_launch.pl
2bRAD_trim_launch_dedup.pl
2bRAD_trim_launch_dedup0.pl
2bRAD_trim_launch_dedup2.pl
2bRAD_trim_launch_dedup_N.pl
2bRAD_trim_launch_dedup_N2.pl
2bRAD_trim_launch_dedup_old.pl
2brad_analysis.txt
HetMajorityProb.py
PCA_adegenet.R
README.md
admix.4.Q
admixturePlotting_pcangsd.R
admixturePlotting_v5.R
alleleCounts.pl
angsd_ibs_pca.R
automatic_RAD_pipeline.sh
bayescan_plots.R
concatFasta.pl
dadi22Dsfs.py
dadi2sfs.py
dadi2sfs_fold.py
dadiBoot.pl
ddRAD_R1_denovo.txt
deltaAIC_multimodels.R
dephase.pl
detect_clones.R
dna.mixing.R
fastStructurePlotting.R
fastStructurePlotting_functions.R
filterStats.pl
fstPerGene.R
haplocall_denovo.pl
haplocall_denovo_old.pl
heterozygosity_beagle.R
hetfilter.pl
illumina_mix_data.csv
inds2pops.tab
installationInstructions_3dParty_software.txt
kmerer.pl
ktable2loci.pl
launcher_creator.py
ld2matrix.R
ls5_launcher_creator.py
ls6_launcher_creator.py
mergeKmers.pl
mergeUniq.pl
methylRAD_oligos_order.xlsx
mix_illumina_qpcr.R
ngs_concat.pl
pcaStructure.R
picogreen.R
plotQC.R
plotQCwgs.R
plot_R.r
plot_admixture_v5_function.R
poibin.py
realsfs2dadi.pl
recalibrateSNPs.pl
recalibrateSNPs_old.pl
removeBayescanOutliers.pl
removeOverlapping.pl
repMatchStats.pl
replicatesMatch.pl
retabvcf.pl
runRepsStructure.pl
s2_launcher_creator.py
sfs2dadi.R
siteSelector.R
snpByTagPos.pl
tagSharing.R
thinner.pl
thinner_dadi.R
thinner_dadi.awk
thinner_dadi.pl
tossNreads.pl
trim2bRAD.pl
trim2bRAD_2barcodes.pl
trim2bRAD_2barcodes_TNT.pl
trim2bRAD_2barcodes_dedup.pl
trim2bRAD_2barcodes_dedup2.pl
trim2bRAD_2barcodes_dedup_N.pl
trim2bRAD_2barcodes_dedup_N2.pl
trim2bRAD_2barcodes_dedup_old.pl
trim2bRAD_dedup.pl
trim_ddRAD.pl
uniq2loci.pl
uniq2loci_kmers.pl
uniq2loci_usearch.pl
uniquerOne.pl
vcf2dadi.pl
vcf2fakeGenome.pl
vcf2genepop.pl
vcf2map.pl
vhap2fasta.pl
