about unmapped reads #642

Closed
feihongloveworld opened this issue Dec 7, 2023 · 3 comments

@feihongloveworld

Hi,
I got an unmapped read after Bismark mapping with default parameters against the hg19 reference, but I don't think that is reasonable.


The sequences are below:
@XXXXF/1
CTGAAGATTTTTTATTTTGTAATGTATGTTGGAAATAATTATTTTTTTTTATTTTTTTAATAATTTTTATTATTTATATTTATTGAAATTGGAGATTTTTATTAGGGTGGAAAGAGTGGGGGATTGGGATTTTTTTTTATGATTGTTTTG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@XXXXR/1
TACCCCCAAATAACATAAAAAAAAAAAAAACAAGGAAAAAAATTTCTCCCTAATTTTACCAAAAAAAACCTCCCCTCTACCCTCTACTCTTCCATTTACAATTTTTTACTTCCCAAAATTTACTATACAAAAACCAAAACCCCTCCCTTT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

@feihongloveworld
Author

Which parameters do I need to adjust?

@FelixKrueger
Owner

FelixKrueger commented Dec 7, 2023

Well, the read 2 you supplied carries 21(!) non-bisulfite mismatches to the reference sequence (read 1 has only a single mismatch):

readID	83	17_GA_converted	43125704	11	150M	=	43125550	-304	CAAAACAATCATAAAAAAAAATCCCAATCCCCCACTCTTTCCACCCTAATAAAAATCTCCAATTTCAATAAATATAAATAATAAAAATTATTAAAAAAATAAAAAAAAATAATTATTTCCAACATACATTACAAAATAAAAAATCTTCAA	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:-6	XS:i:-111	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:149C0	YS:i:-126	YT:Z:CP
readID	163	17_GA_converted	43125550	11	150M	=	43125704	304	TACCCCCAAATAACATAAAAAAAAAAAAAACAAAAAAAAAAATTTCTCCCTAATTTTACCAAAAAAAACCTCCCCTCTACCCTCTACTCTTCCATTTACAATTTTTTACTTCCCAAAATTTACTATACAAAAACCAAAACCCCTCCCTTT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:-126	XS:i:-191	XN:i:0	XM:i:21	XO:i:0	XG:i:0	NM:i:21	MD:Z:29C0A11A4A5A7T4T4A7A14A6C2A4A2A3T1A14T5T1A2A3A1	YS:i:-6	YT:Z:CP
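The mismatch counts and alignment scores quoted here come from the optional tags at the end of each SAM record (`XM` for mismatches, `AS` for the alignment score). As a minimal sketch, the tags could be pulled out as follows. The helper name `sam_tags` is hypothetical, and the record is an abbreviated stand-in for the read 2 line above, with SEQ and QUAL replaced by `*` and the `MD` tag left out for brevity.

```python
def sam_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 SAM columns are mandatory
        tag, value_type, value = field.split(":", 2)
        # Type "i" marks an integer; everything else is kept as a string.
        tags[tag] = int(value) if value_type == "i" else value
    return tags


# Abbreviated copy of the read 2 record (SEQ/QUAL shortened to "*").
read2 = "\t".join([
    "readID", "163", "17_GA_converted", "43125550", "11", "150M", "=",
    "43125704", "304", "*", "*",
    "AS:i:-126", "XS:i:-191", "XN:i:0", "XM:i:21", "XO:i:0", "XG:i:0",
    "NM:i:21", "YS:i:-6", "YT:Z:CP",
])

tags = sam_tags(read2)
print(tags["AS"], tags["XM"])  # alignment score and mismatch count
```

For this record the alignment score of -126 is exactly 21 mismatches at -6 each.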

The option governing this is called --score_min, a linear function that determines how many mismatches a read (pair) may have before it fails to align. In your example, the alignment score (AS) threshold is calculated as follows:

minimum allowed AS = (read length in bp) * -0.2 (default), so in your case:

150 * -0.2 =  -30

As each mismatch 'costs' -6 penalty points, a read may have at most -30/-6 = 5 mismatches; anything above that will be discarded. With 21 mismatches, read 2 definitely exceeds this limit, so all is well (from the aligner's perspective).
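The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the linear form `intercept + slope * read_length` with defaults 0 and -0.2 mirrors the default described above, and every mismatch is assumed to cost the full -6 (gaps and ambiguous bases are ignored).

```python
import math


def min_alignment_score(read_length, intercept=0.0, slope=-0.2):
    """Lowest alignment score a read may have and still count as aligned."""
    return intercept + slope * read_length


def max_mismatches(read_length, mismatch_penalty=6, intercept=0.0, slope=-0.2):
    """Most mismatches tolerated, assuming each one costs the full penalty."""
    threshold = min_alignment_score(read_length, intercept, slope)
    return math.floor(-threshold / mismatch_penalty)


def passes(alignment_score, read_length, intercept=0.0, slope=-0.2):
    """True if the score is at or above the minimum allowed score."""
    return alignment_score >= min_alignment_score(read_length, intercept, slope)


read_length = 150
print(min_alignment_score(read_length))  # threshold for a 150 bp read
print(max_mismatches(read_length))       # mismatches tolerated at -6 each
print(passes(-6, read_length))           # read 1, AS:i:-6
print(passes(-126, read_length))         # read 2, AS:i:-126
```

With the defaults, read 1 (AS -6) clears the threshold of -30 while read 2 (AS -126) does not. Passing a more negative slope to these helpers shows how a more lenient setting would raise the number of tolerated mismatches.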

@feihongloveworld
Author

> As a mismatch 'costs' -6 penalty points, a read may have -30/-6 = 5 mismatches, anything above will be discarded. 21 mismatches definitely exceeds this limit, so all is well (from the aligner's perspective).

How do you have so much energy to maintain so many software tools? You are like a superman in my mind.
