Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed to get modbase info for record ..., Skipped: AUX data not found #159

Open
ArnavBharti opened this issue Apr 10, 2024 · 22 comments
Open
Labels
troubleshooting workflow and data preparation questions

Comments

@ArnavBharti
Copy link

ArnavBharti commented Apr 10, 2024

I am getting this error while running modkit pileup: modkit pileup <bam> <bed>

failed to get modbase info for record <record>, Skipped: AUX data not found

Procedure

  1. nanopolish index -d
  2. minimap2 -a -x map-ont
  3. samtools sort
  4. samtools index

I also ran modbam2bed. A file was formed by it gave methylation nan or 0

What could be the reason so?

@ArtRand
Copy link
Contributor

ArtRand commented Apr 11, 2024

Hello @ArnavBharti,

You must be loosing the MM/ML/MN tags somewhere along the way. Could you check that the records have these tags after each step? If you can tell me the file formats of the outputs and inputs to the steps that could help me debug the problem.

@ArnavBharti
Copy link
Author

  1. Initially FAST5
  2. After basecalling, FASTQ
  3. Nanopolish index: (input) FAST5, FASTQ; (output) index files .fai, etc.
  4. minimap: (input) FASTA, FASTQ; (output) SAM
  5. samtools: (input) sam (output) bam

@ArtRand
Copy link
Contributor

ArtRand commented Apr 12, 2024

Hello @ArnavBharti,

Could you check that the reads have the MM/ML/MN tags at the end of each step? I have a feeling that your basecalled FASTQ does not have them. For your reference the tags I'm talking about are described in the SAM tags specification in section 1.7.

@ArnavBharti
Copy link
Author

Hi, could you tell if you check for these tags via text search or is there a tool. Because there exists 'MM' in the FASTq file upon a simple text search but I feel like I am probably missing something.

@PRIYANKA-22091995
Copy link

Hi,
I was also facing the same issue, is there a way to check the tags in the fastq file.

@PRIYANKA-22091995
Copy link

Hello @ArtRand

It would be nice if you could give insights on the same.

Hi, could you tell if you check for these tags via text search or is there a tool. Because there exists 'MM' in the FASTq file upon a simple text search but I feel like I am probably missing something.

@marcus1487
Copy link
Contributor

The SAM/BAM tags as part of the FASTQ file format is not officially supported. The "support" for this is simply via the -y argument to minimap2 which "Copy input FASTA/Q comments to output." (see manual here). Thus the only way to check for the tags in a FASTQ would be text based. I would suggest like the following awk line to print out any rows of the file with the MM tag: awk 'NR %4 == 1 && $0 ~ /MM:Z:/'

If you must use minimap2 ensure that you have the -y option specified to ensure the tags are indeed copied to the output mappings. This is the most likely step where the tags are dropped as long as modified base calling was on at basecalling time (though I think this might require unmapped BAM output).

@ArtRand ArtRand added the troubleshooting workflow and data preparation questions label Apr 16, 2024
@ArnavBharti
Copy link
Author

ArnavBharti commented Apr 17, 2024

Hi,

  1. I used y flag with minimap2 minimap2 -aY -x map-ont and it still gave same error
  2. the awk awk 'NR %4 == 1 && $0 ~ /MM:Z:/' gave no output

As per 2, Is there a problem with the basecalled files?

I'll try witth dorado aligner

@marcus1487
Copy link
Contributor

-Y and -y are different options in minimap. In this case you want the lower case one not the upper case one as shown in your command.

@ArnavBharti
Copy link
Author

Even with -y same issue

@marcus1487
Copy link
Contributor

Given that the awk line gives no results the issue is that you do not have modified base calls in your basecalls. Modified base calls are not available for FASTQ output from Dorado. If you output the default unmapped BAM the modified base calls will be there. You can also have Dorado perform the mapping while basecalling to avoid any of the minimap issues.

I'm not sure what your downstream goals are with running nanopolish etc, but you can convert your unmapped BAM file to FASTQ with the samtools fastq -T "*" command. Let me know if you have any further issues.

@PRIYANKA-22091995
Copy link

Hello @marcus1487

The basecalling was done using trained RNN model i.e. res_dna_r941_min_modbases-all-context_v001 by using guppy v3.5.1.

@PRIYANKA-22091995
Copy link

Hello @marcus1487 @ArtRand Art
Using guppy v3.5.1, I attempted to do basecalling using the trained RNN model, res_dna_r941_min_modbases-all-context_v001.Following the completion of basecalling, which produced the fastq files, minimap was used to create the bam file. I then ran modkit pileup, however it failed with the error "failed to get modbase info for record \record>." omitted: AUX data not located

Note: Since the data came from the R9 flowcell, guppy was run. Therefore, in order to do the m6A modified basecalling, I utilised the following configuration file:
res_dna_r941_min_modbases-all-context_v001

Your opinions on the same would be greatly appreciated.

Thanks

@marcus1487
Copy link
Contributor

I don't believe modified base output in FASTQ format is supported. You'll have to specify BAM output format in order to proceed from guppy basecalling.

@PRIYANKA-22091995
Copy link

I don't believe modified base output in FASTQ format is supported. You'll have to specify BAM output format in order to proceed from guppy basecalling.
Hello @marcus1487
In case of this there is no option to specify bam output, by default it will only give fastq.
Note-But by normal text search i have observed MM tags in the fastq file.

@PRIYANKA-22091995
Copy link

Hello @marcus1487 @ArtRand

It would be nice if you could help me with the above issue.

Thanks

@ArtRand
Copy link
Contributor

ArtRand commented Apr 23, 2024

Hello @PRIYANKA-22091995 and @ArnavBharti,

You need to run the basecalling such that you get and unaligned BAM file as output (not FASTQ output). I believe that the --bam_out flag will do this. From there I would align the reads with dorado aligner 0.5.3. Could you tell me when you're able to successfully generate a basecall BAM file with the MM/ML tags? You can check the tags with modkit summary.

@ArtRand
Copy link
Contributor

ArtRand commented Apr 27, 2024

@ArnavBharti @PRIYANKA-22091995,

Any update on this?

@PRIYANKA-22091995
Copy link

@ArnavBharti @PRIYANKA-22091995,

Any update on this?

there is no option to put --bam out, and even after putting --bam out, it has only given fastq output. It would be nice to know any suggestion/input for the same.

@PRIYANKA-22091995
Copy link

Hello @ArtRand
The following command was used for modified basecalling for R9 flowcell data:
-i -s guppy_output_ed_cc_cq_m6a/ -c ~/workspace/rerio/basecall_models/res_dna_r941_min_modbases-all-context_v001.cfg --recursive --calib_reference ./PlasmoDB-61_Genome.fasta -x cuda:0 -m ~/workspace/rerio/basecall_models/res_dna_r941_min_modbases-all-context_v001.jsn --barcode_kits "EXP-NBD104"

@ArtRand
Copy link
Contributor

ArtRand commented Apr 30, 2024

@ArnavBharti @PRIYANKA-22091995,

Any chance you could update guppy to at least 6.3.2? I downloaded v3.5.1 and don't see the options you'll need. If you're using R9 data, you could probably use dorado also.

@PRIYANKA-22091995
Copy link

@ArnavBharti @PRIYANKA-22091995,

Any chance you could update guppy to at least 6.3.2? I downloaded v3.5.1 and don't see the options you'll need. If you're using R9 data, you could probably use dorado also.

Earlier was trying through guppy v0.6.2, but it failed with a error [guppy/error] The pipeline has shut down prematurely due to an error condition. So, then i moved to the previous version to do modified basecalling for m6A, but again it does not provide option with bam.
to do modified basecalling with R9 data, dorado does not have option for m6A.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
troubleshooting workflow and data preparation questions
Projects
None yet
Development

No branches or pull requests

4 participants