Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

call-methylation failing - non-VBZ problem, can't run --min-mapping-quality #1011

Closed
coeyct opened this issue May 21, 2022 · 13 comments
Closed

Comments

@coeyct
Copy link

coeyct commented May 21, 2022

Hi,

I'm having a new issue with generating methylation calls that's similar to a few issues others have described: i.e., exactly half of FAST5 reads are bad reads, the files are empty except for the headers. These are not VBZ compressed, so that's not the issue. But I can't get the --min-mapping-quality modifier to work, because it says this is not recognized for call-methylation. Is this a version problem (on 0.13.2, can't change it without requesting IT to do so), or is there a specific place in the command line where that is supposed to be dropped (that is, each line of my swarm has -r -b -g -w and I don't know where --min-mapping-quality is supposed to fall in that hierarchy, or if it's even supposed to matter). Thanks.

@coeyct
Copy link
Author

coeyct commented Jun 23, 2022

Hi,

Is there any news on this issue with exactly half of FAST5 reads being called bad reads? I can't move my data analysis forward because this program is consistently failing to output, when it processed files with longer N50s (>20kb) just fine. My current data sets are closer to 8kb and these are failing over and over.

I also tried running Filtlong on the data beforehand to remove both <500bp and <1kbp fragments first, but the "calls.tsv" files are still not populating as they had for the prior data sets I used.

I have my .fastq files in the same folder as the swarm file I'm running as well as the SAM and BAM files (along with all the files generated via the initial indexing step) and the genome file. The FAST5 files are in a subfolder called "FAST5". Am I supposed to have the .fastq files somewhere else, or any of the other files somewhere else?

Here is an example line of what I am running:

nanopolish call-methylation -r S205_500Filt.fastq -b S205_500_sort.bam -g GCF_000001405.39_GRCh38.p13_genomic.fna -w "NC_000001.11" > NPchr01_500_calls.tsv

I don't really know why this worked before but is no longer working now. I am at a point where I'm going to have to switch to a different analysis tool if I can't figure out why this is failing repeatedly. Thanks.

@jts
Copy link
Owner

jts commented Jun 23, 2022

Hi,

Sorry about this. While a few users have reported this issue, the only exact cause (in a subset of cases) I've been able to determine is when the reads are duplicated in the bam. Would you be able to provide me with a minimal set of data so I can reproduce the problem you're having, so I can look into it further?

Jared

@coeyct
Copy link
Author

coeyct commented Jun 23, 2022

How would I go about doing that? Coding is hardly my forte.

@jts
Copy link
Owner

jts commented Jun 23, 2022

Well, let's try like this first. Could you run this command, and paste the post-run summary line? It will run NP on just a small region of the genome, I'd like to confirm you get the problem over this region before we investigate further:

nanopolish call-methylation -r S205_500Filt.fastq -b S205_500_sort.bam -g GCF_000001405.39_GRCh38.p13_genomic.fna -w "NC_000001.11:1-10000" > /dev/null

@coeyct
Copy link
Author

coeyct commented Jun 23, 2022

I tried this, but I'm not seeing any output whatsoever. Was it supposed to just output to screen on Terminal or generate a file? I'm running this through an interactive session on the NIH Biowulf cluster, but there isn't anything in the way of error or output files connected to the specific job that I'm currently using.

@jts
Copy link
Owner

jts commented Jun 23, 2022

It should print something to stderr when it finishes (just before it returns to the shell prompt when run interactively)

@coeyct
Copy link
Author

coeyct commented Jun 23, 2022

OK, but I don't know where I'm supposed to find that; the only line on the screen was

[bam process] iterating over region: NC_000001.11:1-10000

And then it returned to the prompt. I don't know where any written file like that would be in the directory; typically the jobs have a ".out" file or a ".e/.o" file for swarms; this gave neither.

@jts
Copy link
Owner

jts commented Jun 23, 2022

Alright, there were probably no reads there, can you try a different 10kb region where you know some reads align?

@coeyct
Copy link
Author

coeyct commented Jun 23, 2022

[post-run summary] total reads: 132, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 0, bad fast5: 66

@jts
Copy link
Owner

jts commented Jun 23, 2022

Ok, great. Can you take that region and run:

samtools view S205_500_sort.bam <region> | cut -f1 > reads.list
grep -f reads.list S205_500Filt.fastq.index.readdb > reads.index.readdb

then send me those two files?

@coeyct
Copy link
Author

coeyct commented Jun 23, 2022

Sent.

@coeyct
Copy link
Author

coeyct commented Jun 27, 2022

OK, now this is telling me it is a VBZ problem, which doesn't make any sense, because those options have never been selected in MinKNOW or guppy at any point in time - I have left that alone since before the software update I had to run on the Mk1C a few months ago.

But this is the error fast5-check just gave me:

The fast5 file is compressed with VBZ but the required plugin is not loaded. Please read the instructions here: nanoporetech/vbz_compression#5

@jts
Copy link
Owner

jts commented Jun 27, 2022

I believe at some time in the past ONT turned on vbz compression by default. This is good news though, as it should work after you install the plugin. Let me know if this fixes it

@jts jts closed this as completed Jul 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants