MD tag generation incorrect? #37
Comments
According to this comment in the hts-spec issue seeking to clarify the MD tag, BWA (which BISCUIT is based on) should adhere to the spec for formatting MD tags (i.e., adding 0's after deletions that are followed by mismatches). I also looked at the reads with deletions in them in your BAM, and the sizes of the deletions match the number of bases printed in the MD tag. So, I don't think the deletions are the cause of the error you're seeing. That said, if the pysam issue discussion turns up that the BISCUIT MD tag is malformed, let me know and I'll figure out a fix.
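For anyone following along, here is a rough sketch of why the 0 separator matters. It uses the MD grammar from the SAM optional-fields spec; `parse_md` and the toy MD strings are made up for illustration and are not BISCUIT or pysam code.

```python
import re

# MD grammar from the SAM optional-fields spec: [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*
MD_SPEC = re.compile(r"[0-9]+(?:(?:[A-Z]|\^[A-Z]+)[0-9]+)*")
MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")


def parse_md(md):
    """Split an MD string into ("match", n), ("del", bases), ("mismatch", base)."""
    if not MD_SPEC.fullmatch(md):
        raise ValueError(f"not a well-formed MD string: {md!r}")
    ops = []
    for number, deleted, mismatch in MD_TOKEN.findall(md):
        if number:
            ops.append(("match", int(number)))
        elif deleted:
            ops.append(("del", deleted))
        else:
            ops.append(("mismatch", mismatch))
    return ops


# Deletion of AC followed by a mismatch at reference base T: the 0 keeps them apart.
print(parse_md("10^AC0T5"))
# Without the 0, the same letters read as a single three-base deletion.
print(parse_md("10^ACT5"))
```

Both strings are syntactically valid, so a missing 0 would not show up as a parse error. It would show up as a deletion that is longer than the corresponding D operation in the CIGAR.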
Please see the second half of pysam-developers/pysam#1180 (comment). It would appear that the MD values emitted by biscuit, while they are syntactically correct w.r.t. the ‘0’ separators, are not quite correct. (Or perhaps that there are ambiguity characters in the reference at these positions, which biscuit is handling in some way?)
Comparing the actual reference to the reference inferred by
And in looking through the code, it appears that BISCUIT does not properly account for the C>T / G>A conversion for the MD tag. I'll work on getting a fix put together for this. EDIT: I previously mentioned that it seemed like the NM tag was "handled properly." This is partially correct, as described below.
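As a rough illustration of what a spec-style MD looks like when a read carries a conversion, here is a small sketch that builds MD from a gapped pairwise alignment. The `build_md` helper and the `-` gap convention are assumptions for this example only; this is not the BISCUIT implementation, and it simply reports every read/reference difference, conversions included.

```python
def build_md(ref_aln, read_aln):
    """Build an MD string from two equal-length aligned strings ('-' marks a gap)."""
    parts = []
    run = 0          # length of the current run of matching bases
    in_del = False   # True while inside a ^... deletion block
    for r, q in zip(ref_aln.upper(), read_aln.upper()):
        if r == "-":             # insertion in the read: MD does not record it
            continue
        if q == "-":             # reference base deleted from the read
            if not in_del:
                parts.append(str(run))
                parts.append("^")
                run = 0
                in_del = True
            parts.append(r)
            continue
        in_del = False
        if r == q:
            run += 1
        else:                    # any mismatch, including C>T or G>A
            parts.append(str(run))
            parts.append(r)
            run = 0
    parts.append(str(run))
    return "".join(parts)


# A C>T difference at the second base is written like any other mismatch.
print(build_md("ACGTCA", "ATGTCA"))      # 1C4
# A deletion immediately followed by a mismatch gets the 0 separator.
print(build_md("ACGTACGT", "AC--TCGT"))  # 2^GT0A3
```

A tool like pysam that rebuilds the reference from the read plus MD relies on every differing base being listed this way, which is presumably why an MD that skips conversions produces an inferred reference that disagrees with the real one.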
I made a commit that now brings the BISCUIT MD tag into alignment with the hts-spec (i.e., the MD tag is the same in BISCUIT as As for the NM tag, the decision was made to leave this as the number of non-cytosine-conversion mismatches, which falls under the gray area of the hts-spec (see the italicized section below):
The NM tag that would be given by
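To make the two NM conventions concrete, here is a minimal sketch that counts differences over a gapped alignment in both ways. `count_nm` is a hypothetical helper, and it treats every C>T and G>A as a conversion regardless of strand, which is a simplifying assumption; a real bisulfite aligner would decide this per read.

```python
# (reference base, read base) pairs treated as bisulfite conversions in this sketch.
CONVERSIONS = {("C", "T"), ("G", "A")}


def count_nm(ref_aln, read_aln, skip_conversions=False):
    """Count mismatches plus inserted and deleted bases ('-' marks a gap)."""
    nm = 0
    for r, q in zip(ref_aln.upper(), read_aln.upper()):
        if r == q:
            continue
        if skip_conversions and (r, q) in CONVERSIONS:
            continue             # do not count C>T / G>A against the read
        nm += 1
    return nm


ref = "ACGTCA"
read = "ATGTAA"                  # C>T at the second base, C>A at the fifth
print(count_nm(ref, read))                          # 2
print(count_nm(ref, read, skip_conversions=True))   # 1
```

The first number is the plain edit-distance style count, and the second is the count with conversion mismatches left out, which is the convention described above for BISCUIT's NM.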
I ran into an odd issue parsing biscuit alignments with pysam (pysam-developers/pysam#1180) — and I wonder if the MD tags generated by biscuit are correctly inserting a 0 after deleted bases followed by mismatches.
This is described a bit in this comment: pysam-developers/pysam#895 (comment)
Specifically I'm running into this issue looking at this tiny bam and read
A00354:758:HVTT7DSX3:3:2333:29405:16219
with MD tag 18A5A5A1C3A13^C32^CCTAA10A1C1C32
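As a quick sanity check on that tag, the sketch below tallies what the MD string itself claims: deletion lengths, mismatch count, and reference span. `md_summary` is an illustrative helper, not a pysam function, and the read's CIGAR is not shown here, so the comparison against the D operations has to be done against the actual record.

```python
import re


def md_summary(md):
    """Summarise an MD string: deletion lengths, mismatch count, reference span."""
    deletions = re.findall(r"\^([A-Z]+)", md)
    # Blank out deletion blocks so their letters are not counted as mismatches.
    without_deletions = re.sub(r"\^[A-Z]+", " ", md)
    mismatches = len(re.findall(r"[A-Z]", without_deletions))
    matches = sum(int(n) for n in re.findall(r"\d+", md))
    deleted = sum(len(d) for d in deletions)
    return {
        "deletion_lengths": [len(d) for d in deletions],
        "mismatches": mismatches,
        "matches": matches,
        "reference_span": matches + mismatches + deleted,
    }


summary = md_summary("18A5A5A1C3A13^C32^CCTAA10A1C1C32")
print(summary)
```

For this tag the deletions are 1 and 5 bases long, so the CIGAR would be expected to contain a 1D and a 5D in that order, and the aligned portion of the read should cover 135 reference bases.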