Spiky position in the M-Bias plot #673

Open
neolithlee opened this issue Jun 13, 2024 · 5 comments

@neolithlee

[Images: bismark_mbias-CpG_R1 and bismark_mbias-CpG_R2 (M-bias plots, CpG context, Read 1 and Read 2)]

The data were quality-checked with FastQC and trimmed with Trim Galore, then processed with bismark, deduplicate_bismark and bismark_methylation_extractor as specified in the manual.

As can be seen in the M-bias plots (from MultiQC), the methylation level of Read 1 appears to be stable. However, the methylation level of Read 2 shows some spikes. May I ask why these spikes appear in the Read 2 plot?

@FelixKrueger
Owner

Such spikes in M-bias plots (or sometimes also in GC content plots etc.) are typically caused by individual sequences that are highly overrepresented and have a certain methylation state. You could try to identify the particular sequence via various means, the easiest probably being to look for isolated loci with a very high number of mapping reads. You could also try to see how many calls there are at this position (not sure you can do this in the MultiQC report, but you could look at the equivalent Bismark_report.html). In all likelihood such minor blips won't affect your downstream analysis overall, but are likely some very localised effects (just my gut feeling at this point).
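To check how many calls sit behind a spike, one option is to read the per-position counts directly rather than the plot. The following is a minimal sketch, not part of Bismark: it assumes the tab-separated layout of the `M-bias.txt` file written by bismark_methylation_extractor (a section header such as `CpG context (R2)`, followed by rows of position, count methylated, count unmethylated, % methylation, coverage). The function names, the window size and the 5-percentage-point threshold are all hypothetical choices for illustration.

```python
from statistics import median


def parse_mbias_section(text, header):
    """Return (position, methylated, unmethylated) rows for one section.

    Assumes the section starts with a line equal to `header` and that data
    rows are tab-separated with an integer position in the first column.
    """
    rows = []
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line == header:
            in_section = True
            continue
        if not in_section:
            continue
        fields = line.split("\t")
        if fields[0].isdigit():
            rows.append((int(fields[0]), int(fields[1]), int(fields[2])))
        elif rows:
            break  # first non-data line after the data ends the section
    return rows


def find_spikes(rows, window=5, threshold=5.0):
    """Flag positions whose % methylation deviates from the local median.

    Returns (position, % methylation, deviation, total calls) tuples for
    positions that differ from the median of up to `window` neighbours on
    each side by more than `threshold` percentage points.
    """
    pct = [100.0 * m / (m + u) if (m + u) else 0.0 for _, m, u in rows]
    spikes = []
    for i, (pos, m, u) in enumerate(rows):
        lo, hi = max(0, i - window), min(len(rows), i + window + 1)
        neighbours = pct[lo:i] + pct[i + 1:hi]
        if not neighbours:
            continue
        delta = pct[i] - median(neighbours)
        if abs(delta) > threshold:
            spikes.append((pos, round(pct[i], 1), round(delta, 1), m + u))
    return spikes
```

Comparing the total calls at a flagged position with its neighbours gives a rough idea of whether the spike coincides with an unusual number of calls at that read position.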

@neolithlee
Author

Thanks for your reply.

As you said, some of the spikes seem to be related to the number of reads.
In the case of the largest spike, the average Q-score appears to be lower than at other positions, so I will check whether there is an experimental problem.

M-bias

@FelixKrueger
Owner

These things sometimes seem to present themselves as problematic in more than one of the FastQC modules. There could for example have been a technical issue with the flowcell (which you might see in the per-tile plot), such as an air bubble, or a higher call of N at the position, or a very high number of a specific call (e.g. G when the signal from the dyes wasn't high enough), or indeed there is a very high prevalence of a certain base because of an overrepresentation of a certain (repetitive?) sequence that will in turn down-adjust quality scores and the like. But given that it manifests itself in the M-bias plot, it has to come from a sequence that is mappable, which already narrows it down substantially. Happy sleuthing!
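Because the responsible sequence has to be mappable, one way to narrow it down is to count aligned reads per genomic window and look at the most crowded windows. The sketch below is an illustration under stated assumptions rather than a Bismark feature: it takes plain SAM text lines (for example the output of `samtools view` on the Bismark BAM file), and the function name and the 1 kb bin size are hypothetical.

```python
from collections import Counter


def reads_per_bin(sam_lines, bin_size=1000):
    """Count aligned reads per (chromosome, bin start) from SAM text lines.

    Uses the standard SAM columns RNAME (3rd) and POS (4th, 1-based).
    Header lines starting with '@' and unaligned records are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        chrom, pos = fields[2], int(fields[3])
        if chrom == "*":
            continue
        bin_start = (pos - 1) // bin_size * bin_size
        counts[(chrom, bin_start)] += 1
    return counts
```

Calling `reads_per_bin(sys.stdin).most_common(10)` on piped SAM records would list the ten densest windows. A window with far more reads than its neighbours is a candidate for the overrepresented locus.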

@shaohuaihan

image
Hello Felix:

In the M-bias plot for Read 2, the line for total calls is unstable and gradually declines.

Do I need to pay special attention to this indicator? How should I address it?
Should I retain reads with higher quality values during data preprocessing?

Thank you!

@FelixKrueger
Owner

The drop in the total number of calls for Read 2 is a consequence of removing redundant methylation calls during overlap-trimming. This behaviour is expected, and it is totally fine to ignore this 'oddity'.
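A toy model can show why this produces a gradual decline along Read 2. The sketch below is a simplification and not Bismark's actual code: it treats every base as a potential call, assumes Read 1 covers the start of the fragment and Read 2 the end on the opposite strand, and drops any Read 2 position that Read 1 already covers. The function name and the example fragment lengths are hypothetical.

```python
def read2_calls_retained(fragment_lengths, read_length):
    """Count, per Read 2 cycle, how many fragments keep their call.

    For a fragment of length F, Read 1 covers fragment positions
    0 .. min(F, read_length) - 1. Read 2 cycle k (0-based) covers fragment
    position F - 1 - k. A Read 2 position is dropped when Read 1 already
    covers it, which mimics the removal of overlapping calls.
    """
    retained = [0] * read_length
    for frag_len in fragment_lengths:
        covered = min(frag_len, read_length)  # bases actually sequenced
        for k in range(covered):
            frag_pos = frag_len - 1 - k
            if frag_pos >= covered:  # beyond the last Read 1 base
                retained[k] += 1
    return retained


# Three fragments of length 10, 8 and 6 with 5 bp reads:
print(read2_calls_retained([10, 8, 6], read_length=5))  # [3, 2, 2, 1, 1]
```

The later cycles of Read 2 reach furthest back into the fragment, so they are the first to fall inside the region Read 1 has already covered. With a spread of fragment sizes, the retained count therefore falls off towards the 3' end of Read 2.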
