Extremely low mapping efficiency using bowtie2 and reference genome with many scaffolds #639

LauraRt1 · 2023-11-15T12:14:34Z

Hi!
I'm I'm using Bismark 0.24.2 with my WGBS samples (paired-end reads and not trimmed with TrimGalore cause I'm not a fan of trimming sequences).

In genome preparation (bismark_genome_preparation) I'm using an assembly (with many scaffolds in the same fasta file and same genotype as my samples) and Bowtie2 as aligner, but the mapping efficiency is really low, just 20%.

Trying to solve this low efficiency I used bowtie2 out of Bismark (parameter --very-sensitive I had almost 55%) but it is still really low, isn't it?

Using bwa-meth the mapping rate is almost 95% but these SAM files are not compatible with bismark (I'm having troubles resolving the compatibility issue between the bwa-meth SAM files and Bismark).

I'd like to use Bismark for the entire workflow (from mapping to methylation extraction) but I don't know how to increase the mapping efficiency

Thanks!!

FelixKrueger · 2023-11-15T13:46:57Z

Thanks for reaching out.

By default, Bismark runs in end-to-end mode in a fairly stringent setting (--score_min L,0,-0.2) which makes it essential that you provide it with quality- as well as adapter-trimmed input data (it is somewhat insensitive towards your personal preferences in this regard).
If you could yourself to try out trim_galore --paired file1 file2 (potentially in conjunction with somewhat more relaxed mapping parameters such as --score_min L,0,-0.4) you should see a much increased alignment rate. I you wanted to send me the sample FastQC profiles of your input files I could see if I can spot any measure that might be useful.

If you really don't like trimming at all you could use the local alignments (envoked using the option --local), which should behave very similarly to what bwa-meth does. As a word of caution though, local alignments will not remove low-quality base calls, and will most likely include incorrectly filled-in sequences that should really be removed prior to mapping (e.g. for the Accel Swift kit).

Very happy to assist you along the way, we can also continue this via email.

LauraRt1 · 2023-11-15T17:23:26Z

Thank you very much for your quick response!

I've been testing by extracting data from a pair of paired-end reads, 10,000 lines each (equivalent to 2,500 fastq) to get results as quickly as possible. From now on, all the results are from these fastq files of 10,000 lines (2500 fastq).

I trimmed my sequences with trim_galore, and then I aligned them with Bowtie2, changing the parameter --score_min L,0,-06. However, the mapping efficiency remains at 22%.

If I use trimmed sequences and add --very-sensitive-local it increases to 31%.

If I only use one read, fastq_1 (trimmed and --very-sensitive-local), the mapping efficiency increases to 76%. On the other hand, if I use only fastq_2 (trimmed and --very-sensitive-local), it is 55.9%. I used the --pbat parameter only on fastq_2, and it increases to 51.6%.

Anyway, a 50% mapping efficiency is still low. I don't understand why when I use both sequences together, it decreases to 20%.

Should I map all my fastq files as if they were single-end reads? Can I use the --pbat parameter with paired-end reads? I'm not sure how to proceed.

Thanks for your help

FelixKrueger · 2023-11-15T21:11:33Z

Hi Laura, if possible at all, could you send me a subset of the reads of say 100K reads for some quick tests on my side?

changchuanjun · 2024-02-01T13:56:45Z

Hello，are you here? I have met the similar diffcuilties as you, I am very confused, I want to know have you solved this problem? I am sincerely looking forward to your reply.

LauraRt1 · 2024-02-01T15:45:11Z

Hi changchuanjun!
I solved the problem by using Trim-Galore and the parameter --score_min L0-0.6.
I finally obtained a mapping efficiency between 70 and 75 %.

FelixKrueger · 2024-02-01T16:07:43Z

You can find a few ways of interpreting and mitigating low mapping efficiency issues here: https://felixkrueger.github.io/Bismark/faq/low_mapping/. If this doesn't solve your problems feel free to send me an email with as many details as possible.

changchuanjun · 2024-02-02T08:47:58Z

Hello LauraRt1 and FelixKrueger!
thanks for your reply! In my data process pipeline,I used fastp to remove adapter, this is my cleaned data state

actually,I used bismap to analyse my data firstly ,but the mapping efficiency is very low, just 0,very surprising, and my bismap command and result below

So I decided to change the software, But when I used bismark's paired mode to execute the script with the default parameter,the result was similar to bismap,the mapping efficiency was 0.
I refered related issues and modified my script,I used bismark's single mode to run the script with one fq1.gz,the mapping efficiency result increased significantly from 0 to 54%,the command the result below.

meanwhile,I used bismark's single mode to treat fq2.gz with the parameter --pbat,the mapping efficiency result was 57%,similar to fq1.gz

I am confused why so big difference between the paired mode and single mode in biamark mapping efficiency?
Did this result mean the sequencing data have some error?
How to combine the result of fq1.gz and fq2.gz

Looking forward to your reply sincerely

FelixKrueger · 2024-02-02T11:52:15Z

The most likely issue here is that the trimming you performed somehow interfered with the synchronisation between R1 and R2: both files need to have corresponding sequences in both files, and have the exact same number of reads in total.

I would recommend using the raw FastQ files once more, and using Trim Galore:

trim_galore --paired sample1_R1.fastq.gz sample1_R2.fastq.gz

followed by a paired-end mapping command (maybe you could relax the mapping stringency a little, e.g. to --score_min L,0,-0.4. This should really result in a very similar mapping efficiency you observe with single-end alignments. Which type of kit did you use?

changchuanjun · 2024-02-23T12:36:41Z

Hello,I am very delighted to receive your reply.I am so sorry to reply your message lately because of Spring Festival (Chinses New Year) vacation. I have found the reason why the mapping efficiency is very low(just 0).I used incorrect command when I ran the fastp script to clean the raw data. the detailed information below.The paired input file -I and -i are the same file.

when I used the correct command to execute my script,I found there was just little different no matter fastp or trim_galore in bismark's mapping efficiency.

fastp:

trimgalore:

FelixKrueger · 2024-02-23T13:44:18Z

So can we say this is a success all-around? All the best going foward!

changchuanjun · 2024-02-23T14:08:33Z

Yes, exactly. Besides, I also used bsmap software to run pipeline.It also demonstrated there was just a little different no matter fastp or trim_galore to treat the raw data in final mapping efficiency result.

Interestingly, bsmap's mapping efficiency was sightly high than bismark's mapping efficiency when used the same input file.

FelixKrueger closed this as completed Feb 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extremely low mapping efficiency using bowtie2 and reference genome with many scaffolds #639

Extremely low mapping efficiency using bowtie2 and reference genome with many scaffolds #639

LauraRt1 commented Nov 15, 2023

FelixKrueger commented Nov 15, 2023 •

edited

Loading

LauraRt1 commented Nov 15, 2023

FelixKrueger commented Nov 15, 2023

changchuanjun commented Feb 1, 2024

LauraRt1 commented Feb 1, 2024

FelixKrueger commented Feb 1, 2024

changchuanjun commented Feb 2, 2024

FelixKrueger commented Feb 2, 2024

changchuanjun commented Feb 23, 2024

FelixKrueger commented Feb 23, 2024

changchuanjun commented Feb 23, 2024

Extremely low mapping efficiency using bowtie2 and reference genome with many scaffolds #639

Extremely low mapping efficiency using bowtie2 and reference genome with many scaffolds #639

Comments

LauraRt1 commented Nov 15, 2023

FelixKrueger commented Nov 15, 2023 • edited Loading

LauraRt1 commented Nov 15, 2023

FelixKrueger commented Nov 15, 2023

changchuanjun commented Feb 1, 2024

LauraRt1 commented Feb 1, 2024

FelixKrueger commented Feb 1, 2024

changchuanjun commented Feb 2, 2024

FelixKrueger commented Feb 2, 2024

changchuanjun commented Feb 23, 2024

FelixKrueger commented Feb 23, 2024

changchuanjun commented Feb 23, 2024

FelixKrueger commented Nov 15, 2023 •

edited

Loading