SeqLib::BamWriter.WriteRecord() seems to be truncating the qname field #33

bkohrn · 2018-12-07T01:31:45Z

When I create a new bam record, the new bam record seems to truncate the read name at 23 characters. I am using the following code (note that this is part of a larger program; I just stopped it in the middle to check the file quickly):

SeqLib::BamReader in_bam_file;
SeqLib::BamWriter temp_bam = SeqLib::BamWriter(SeqLib::SAM);
in_bam_file.Open(input);
bool testOpen = temp_bam.Open(prefix + ".temp.bam");
temp_bam.SetHeader(in_bam_file.Header());
temp_bam.WriteHeader(); 
SeqLib::BamRecord line;
SeqLib::BamRecord temp_read1_entry;
SeqLib::BamRecord temp_bam_entry;
while (in_bam_file.GetNextRecord(line)) {
    if (paired_end_count % 2 == 1) {
        temp_read1_entry = line;
    }
    if (paired_end_count % 2 == 0) {
        std::string tagName;
        SeqLib::BamRecord temp_bam_entry;
        if (temp_read1_entry.Sequence().substr(0,taglen) + temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) > line.Sequence().substr(0,taglen) + line.Sequence().substr(taglen + spacerlen, loclen)) {
            tagName = temp_read1_entry.Sequence().substr(0,taglen) + \
            temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) + \
            line.Sequence().substr(0,taglen) + \
            line.Sequence().substr(taglen + spacerlen, loclen) + "#ab";
        }
        else if (temp_read1_entry.Sequence().substr(0,taglen) + temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) < line.Sequence().substr(0,taglen) + line.Sequence().substr(taglen + spacerlen, loclen)) {
            tagName = line.Sequence().substr(0,taglen) + \
            line.Sequence().substr(taglen + spacerlen, loclen) + \
            temp_read1_entry.Sequence().substr(0,taglen) + \
            temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) + "#ba";
        }
        else if (temp_read1_entry.Sequence().substr(0,taglen) + temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) == line.Sequence().substr(0,taglen) + line.Sequence().substr(taglen + spacerlen, loclen)) {
            paired_end_count++;
            continue;
        }
        // Write entries for Read 1
        std::string tempQualities = temp_read1_entry.Qualities().substr(taglen + spacerlen);
        temp_bam_entry = temp_read1_entry;
        temp_bam_entry.AddZTag("X?", temp_read1_entry.Qname());
        temp_bam_entry.SetQname("abcdefghijklmnopqrstuvwxyz");
        temp_bam_entry.SetSequence(temp_read1_entry.Sequence().substr(taglen + spacerlen));
        temp_bam_entry.SetQualities(tempQualities, 33);
        temp_bam.WriteRecord(temp_bam_entry);
       
        // Write entries for Read 2
        tempQualities = line.Qualities().substr(taglen + spacerlen);
        temp_bam_entry = line;
        temp_bam_entry.AddZTag("X?", line.Qname());
        temp_bam_entry.SetQname("abcdefghijklmnopqrstuvwxyz");
        temp_bam_entry.SetSequence(line.Sequence().substr(taglen + spacerlen));
        temp_bam_entry.SetQualities(tempQualities, 33);
        temp_bam.WriteRecord(temp_bam_entry);
        temp_bam.Close();
        abort();

When I look at the resulting file, I see this:

@hd VN:1.5 SO:queryname
@rg ID:A SM:testNonRef
abcdefghijklmnopqrstuvw 77 * 0 0 * * 0 0 GGGGTGCCCCCAAAGCGGGCCCGTGGGGGCCTCTGGGATGATGATGATGCACCTCGGCCATCCCAATTCGAGGAGGACCTGGCA DGHIIIIIIIIIIHIIIIIHHHIIHIIIGIIIIIIIIIHIIIIIIIIIIIIIIFIIGIIIGIIIIIHHIIIHIIIIIIGIHHII RG:Z:A X?:Z:HISEQ_0537:1:1101:10000:20114#ACGCTC
abcdefghijklmnopqrstuvw 141 * 0 0 * * 0 0 AAGAGAATGAGGGTCAAGGGTCAGGGCCAGAATCACTGTAGGCCACACAGGCTGTACGGGGTGGAAGGGCAGTCCTTTGTAGGA DGHHIIIIIIIHIHHIIIIIIIIIGHIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIGHIIIIIHIHHIIIIIFHFGHHIG RG:Z:A X?:Z:HISEQ_0537:1:1101:10000:20114#ACGCTC

Based on my understanding, the read name should be "abcdefghijklmnopqrstuvwxyz", but it isn't.

Thanks for any help you can give,
Brendan

The text was updated successfully, but these errors were encountered:

martinjvickers · 2018-12-07T09:47:05Z

That looks like a valid BAM to me, albeit unaligned to a genome. The 77 * 0 0 * * 0 0 part are columns 2-9 of the BAM/SAM specification. see section 1.4.

EDIT: Ignore above, I see what you're saying now, it's missing xyz

martinjvickers · 2018-12-07T10:10:52Z

I'm not entirely sure what is causing it for you, but I've just edited a program I made to see if SetQname is truncating but it doesn't seem to be;

BamRecordVector results;
bwa_ga.AlignSequence(seq, id, results, hardclip, secondary_cutoff, secondary_cap);

for(auto& i : results)
{
   i.SetQname("abcdefghijklmnopqrstuvwxyz");
   writer.WriteRecord(i);
}

abcdefghijklmnopqrstuvwxyz      16      chr4    5360249 60      101M    *       0       0       TAAATTAGGTTTGGATTTTGTGATAAGTTTAATTAATTATATTTATTTTAATGTTACGATGTTATATTATANTTTAGNATGCACTTAATNNNAGTNCNNTA   *       NA:i:2  NM:i:9  SB:i:0  AS:i:80

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SeqLib::BamWriter.WriteRecord() seems to be truncating the qname field #33

SeqLib::BamWriter.WriteRecord() seems to be truncating the qname field #33

bkohrn commented Dec 7, 2018

martinjvickers commented Dec 7, 2018 •

edited

Loading

martinjvickers commented Dec 7, 2018

SeqLib::BamWriter.WriteRecord() seems to be truncating the qname field #33

SeqLib::BamWriter.WriteRecord() seems to be truncating the qname field #33

Comments

bkohrn commented Dec 7, 2018

martinjvickers commented Dec 7, 2018 • edited Loading

martinjvickers commented Dec 7, 2018

martinjvickers commented Dec 7, 2018 •

edited

Loading