modkit dmr - failed to read tabix index #178

Open
lilypeck opened this issue May 2, 2024 · 6 comments
Labels
build-available: custom build produced for fix.
troubleshooting: workflow and data preparation questions

Comments

lilypeck commented May 2, 2024

Hello

modkit 0.2.8

I am getting a very basic error:

> Error! failed to read tabix index "barcode05_E_CHH.bedmethyl.gz.tbi"
>  caused by invalid reference sequence names
>  caused by expected EOF

My script is:

/u/home/l/ldpeck/project-vlsork/longreads/dist/modkit dmr pair \
  -a ${dmrD_1}_${context}.bedmethyl.gz \
  -a ${dmrD_2}_${context}.bedmethyl.gz \
  -b ${dmrC_1}_${context}.bedmethyl.gz \
  -b ${dmrC_2}_${context}.bedmethyl.gz \
  -o dmr/dmp_${run}_${context}.tab \
  --ref /u/home/l/ldpeck/genome_resources/GCF_001633185.2_ValleyOak3.2_genomic.fna \
  --base C \
  -t 24 \
  -f \
  --log-filepath dmr/dmp_${run}.log

However, I don't think the error is related to my .tbi files: I re-ran a script that previously completed successfully with modkit 0.2.7, and it now fails with this error under modkit 0.2.8.

Is there something you can see that might be causing this?

Thank you in advance!

Lily

Contributor

ArtRand commented May 2, 2024

Hello @lilypeck,

There shouldn't be any changes from modkit v0.2.7 to v0.2.8 with respect to how the tabix index is handled. However, I did update the dependencies that modkit uses, so it's possible that it picked up a bug. Could you:

  1. Check if v0.2.7 works on the same input.
  2. Attach the tabix index that is failing to this thread so I can investigate what the problem is.

Thanks.
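
For anyone following along, one quick way to tell a damaged index from a parser problem is to ask htslib's own tabix to read it. The sketch below assumes `tabix` is on the PATH and that the `.tbi` sits next to the bgzipped bedMethyl. The function names are illustrative only and are not part of modkit.

```shell
# List the sequence names that tabix (htslib) reads from the index
# next to a bgzipped bedMethyl file. If this works, the index is at
# least readable by htslib.
list_index_names() {
    tabix -l "$1"
}

# Rebuild the index from scratch using the BED preset, overwriting
# any existing .tbi (-f). Useful to rule out a stale or truncated index.
reindex_bedmethyl() {
    tabix -f -p bed "$1"
}

# Example (file name taken from the error message in this thread):
#   list_index_names barcode05_E_CHH.bedmethyl.gz
#   reindex_bedmethyl barcode05_E_CHH.bedmethyl.gz
```

If tabix lists the names without complaint but modkit still fails, the problem more likely sits in the index reader than in the file.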

ArtRand added the troubleshooting (workflow and data preparation questions) label on May 2, 2024
Author

lilypeck commented May 2, 2024

Hello @ArtRand
Thank you for your response!
I have just checked with v0.2.7 and I don't get the tabix file error -

> reading reference FASTA at "/u/home/l/ldpeck/genome_resources/GCF_001633185.2_ValleyOak3.2_genomic.fna"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 4139233 a records and 4027045 b records, calculating max coverages for 95th percentile
> calculated max coverage for a: 24 and b: 30
> running with replicates and matched samples

I have attached two .tbi files which failed. I have also checked the .bedmethyl.gz files and they are complete (their tail output matches the uncompressed versions).
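
Comparing tail output is a reasonable sanity check. A slightly stricter variant, sketched here with only standard tools, tests the whole compressed stream and looks for the 28-byte empty block that bgzip writes at the end of a complete BGZF file. The function names are hypothetical.

```shell
# Succeeds if the file ends with the 28-byte BGZF end-of-file block.
has_bgzf_eof() {
    expected=1f8b08040000000000ff0600424302001b0003000000000000000000
    # Hex-dump the last 28 bytes and strip od's spacing and newlines.
    actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$actual" = "$expected" ]
}

# Succeeds if every gzip member passes its CRC check and the
# BGZF end-of-file block is present.
bgzf_intact() {
    gzip -t < "$1" && has_bgzf_eof "$1"
}

# Example:
#   bgzf_intact barcode21_U_CG.bedmethyl.gz && echo "looks complete"
```

A file that passes both checks was not truncated during compression or transfer, although this says nothing about whether its index is valid.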

Thanks

Lily

barcode21_U_CG.bedmethyl.gz.tbi.txt
barcode21_U_CHG.bedmethyl.gz.tbi.txt

Contributor

ArtRand commented May 2, 2024

Hello @lilypeck,

I was able to reproduce the error using noodles version 0.69.0 (the version in modkit 0.2.8); the error does not occur with version 0.50.0 (the version in modkit 0.2.7). What is strange, however, is that the tabix indices I have in tests, and some others I've used, are parsed without complaint. Could you tell me what version of tabix you have? This is what I have tested:

tabix --version
tabix (htslib) 1.18
Copyright (C) 2023 Genome Research Ltd.

If you give me a few minutes I can get you a build with the older version of the library to unblock your work, but I'd like to get to the bottom of the problem also. So to summarize, please:

  1. Tell me the version of tabix you have and show me the script you're using.
  2. (If it's not too large) send me one of the bgzipped bedmethyl files.
  3. If this ends up being a noodles bug, I'd like to open an issue with the noodles developers. Could you give me permission to use your file as an example to exercise the bug?
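
Since the error mentions "invalid reference sequence names", it can help to look at the names block of the index directly. The following is a rough sketch, not how modkit or noodles read the file. It assumes the layout given in the tabix format description: a 4-byte magic `TBI\1`, eight little-endian int32 fields of which the last is `l_nm`, then `l_nm` bytes of NUL-terminated names. `tbi_names` is a made-up helper name.

```shell
# Print the reference sequence names stored in a tabix (.tbi) index,
# one per line, using only gzip, od, head, tail and tr.
tbi_names() {
    tbi=$1
    raw=$(mktemp) || return 1
    # A .tbi file is BGZF-compressed, which plain gzip can decompress.
    gzip -dc < "$tbi" > "$raw" || { rm -f "$raw"; return 1; }
    # The first three bytes of the magic should read "TBI".
    [ "$(head -c 3 "$raw")" = "TBI" ] || { rm -f "$raw"; return 1; }
    # Bytes 32..35 hold l_nm; combine them by hand so the result does
    # not depend on the byte order of the machine running od.
    set -- $(od -An -v -tu1 -j32 -N4 "$raw")
    [ $# -eq 4 ] || { rm -f "$raw"; return 1; }
    l_nm=$(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
    # Names start at byte offset 36 (tail -c counts from 1, hence +37).
    tail -c +37 "$raw" | head -c "$l_nm" | tr '\0' '\n'
    rm -f "$raw"
}

# Example:
#   tbi_names barcode21_U_CG.bedmethyl.gz.tbi
```

Comparing this output with the first column of the bedMethyl file shows whether the names in the index look sane, independently of any particular parsing library.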

Author

lilypeck commented May 2, 2024

Hello @ArtRand
Thank you very much!
Tabix is:

tabix (htslib) 1.19.1
Copyright (C) 2024 Genome Research Ltd.

The complete .bedmethyl is too big to upload, so I have uploaded the first 1 million lines. If you have an email address, I could send you a full copy. And yes, I am very happy for you to use these to exercise the bug.

Thank you very much for your help.

Lily
barcode21_U_CHH.bedmethyl.head.gz

Contributor

ArtRand commented May 2, 2024

Hello @lilypeck,

Alright, I've made a branch (build attached) where I've changed the version back. Please let me know if this build works. I'm going to investigate why the later versions don't work with tabix 1.19.1. Thanks for permission to use your files as well.

modkit_dev9c754d4c_centos7_x86_64.tar.gz

ArtRand added the build-available (custom build produced for fix.) label on May 2, 2024
Author

lilypeck commented May 3, 2024

Hi @ArtRand
Thank you so much, it is working now!
Lily

ArtRand added a commit that referenced this issue May 21, 2024
[bug] Tabix indices don't load

See merge request machine-learning/modkit!176