Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pileup does not allow gzipped reference sequence #100

Open
KloostermanJoukje opened this issue Dec 14, 2023 · 1 comment
Open

pileup does not allow gzipped reference sequence #100

KloostermanJoukje opened this issue Dec 14, 2023 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@KloostermanJoukje
Copy link

KloostermanJoukje commented Dec 14, 2023

Hello,

I am using modkit pileup to create a bedmethyl file for modified basecalling.
I basecalled with Dorado, afterwards I sorted the bam file.
However when using modkit, i get an empty bed file.

Output:

[src/logging.rs::54][2023-12-14 08:35:36][DEBUG] command line: modkit pileup vip_fam0_FAW14788_sorted.bam vip_fam0_FAW14788_cpg.bed --cpg --ref ./resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --only-tabs --log-filepath vip_fam0_FAW14788_modkit.log

[src/pileup/subcommand.rs::386][2023-12-14 08:35:36][INFO] calculated chunk size: 6, interval size 100000, processing 600000 positions concurrently

[src/pileup/subcommand.rs::493][2023-12-14 08:35:36][INFO] filtering to only CpG motifs

[src/command_utils.rs::67][2023-12-14 08:35:36][INFO] attempting to sample 10042 reads

[src/reads_sampler/mod.rs::41][2023-12-14 08:35:36][DEBUG] found BAM index, sampling reads in 1000000 base pair chunks

[src/reads_sampler/sampling_schedule.rs::127][2023-12-14 08:35:36][DEBUG] derived sampling schedule, sampling total 10179 reads from 191 contigs, 0 unmapped reads

[src/reads_sampler/sampling_schedule.rs::137][2023-12-14 08:35:36][DEBUG] schedule: SQ: 0, 1020 reads SQ: 1, 722 reads SQ: 3, 674 reads SQ: 2, 584 reads SQ: 4, 559 reads SQ: 5, 501 reads SQ: 6, 468 reads SQ: 9, 465 reads SQ: 8, 448 reads SQ: 7, 439 reads SQ: 12, 380 reads SQ: 11, 373 reads SQ: 10, 370 reads SQ: 15, 336 reads SQ: 14, 289 reads SQ: 16, 287 reads SQ: 23, 265 reads SQ: 17, 256 reads SQ: 13, 255 reads SQ: 19, 252 reads SQ: 22, 241 reads SQ: 20, 212 reads SQ: 18, 163 reads SQ: 21, 154 reads SQ: 96, 44 reads SQ: 45, 27 reads SQ: 170, 26 reads SQ: 59, 24 reads SQ: 192, 18 reads SQ: 28, 17 reads SQ: 132, 12 reads SQ: 55, 11 reads SQ: 62, 11 reads SQ: 53, 10 reads SQ: 176, 9 reads SQ: 190, 8 reads SQ: 171, 7 reads SQ: 60, 7 reads SQ: 91, 7 reads SQ: 191, 6 reads SQ: 37, 6 reads SQ: 183, 5 reads SQ: 44, 5 reads SQ: 94, 5 reads SQ: 175, 5 reads SQ: 36, 5 reads SQ: 47, 4 reads SQ: 25, 4 reads SQ: 56, 4 reads SQ: 173, 4 reads SQ: 31, 3 reads SQ: 57, 3 reads SQ: 186, 3 reads SQ: 52, 3 reads SQ: 193, 3 reads SQ: 54, 3 reads SQ: 137, 3 reads SQ: 168, 3 reads SQ: 41, 3 reads SQ: 189, 2 reads SQ: 50, 2 reads SQ: 136, 2 reads SQ: 179, 2 reads SQ: 95, 2 reads SQ: 181, 2 reads SQ: 169, 2 reads SQ: 35, 2 reads SQ: 188, 2 reads SQ: 61, 2 reads SQ: 104, 2 reads SQ: 92, 2 reads SQ: 63, 2 reads SQ: 180, 2 reads SQ: 46, 2 reads SQ: 182, 2 reads SQ: 122, 1 reads SQ: 177, 1 reads SQ: 110, 1 reads SQ: 43, 1 reads SQ: 165, 1 reads SQ: 98, 1 reads SQ: 86, 1 reads SQ: 141, 1 reads SQ: 74, 1 reads SQ: 129, 1 reads SQ: 117, 1 reads SQ: 172, 1 reads SQ: 105, 1 reads SQ: 38, 1 reads SQ: 160, 1 reads SQ: 93, 1 reads SQ: 26, 1 reads SQ: 148, 1 reads SQ: 81, 1 reads SQ: 69, 1 reads SQ: 124, 1 reads SQ: 112, 1 reads SQ: 167, 1 reads SQ: 100, 1 reads SQ: 33, 1 reads SQ: 155, 1 reads SQ: 88, 1 reads SQ: 143, 1 reads SQ: 76, 1 reads SQ: 131, 1 reads SQ: 64, 1 reads SQ: 119, 1 reads SQ: 174, 1 reads SQ: 107, 1 reads SQ: 40, 1 reads SQ: 162, 1 reads SQ: 150, 1 reads SQ: 83, 1 reads SQ: 138, 1 reads SQ: 71, 1 reads SQ: 126, 1 reads SQ: 114, 1 reads SQ: 102, 1 reads SQ: 157, 1 reads SQ: 90, 1 reads SQ: 145, 1 reads SQ: 78, 1 reads SQ: 133, 1 reads SQ: 66, 1 reads SQ: 121, 1 reads SQ: 109, 1 reads SQ: 42, 1 reads SQ: 164, 1 reads SQ: 97, 1 reads SQ: 30, 1 reads SQ: 152, 1 reads SQ: 85, 1 reads SQ: 140, 1 reads SQ: 73, 1 reads SQ: 128, 1 reads SQ: 116, 1 reads SQ: 49, 1 reads SQ: 159, 1 reads SQ: 147, 1 reads SQ: 80, 1 reads SQ: 135, 1 reads SQ: 68, 1 reads SQ: 123, 1 reads SQ: 178, 1 reads SQ: 111, 1 reads SQ: 166, 1 reads SQ: 99, 1 reads SQ: 32, 1 reads SQ: 154, 1 reads SQ: 87, 1 reads SQ: 142, 1 reads SQ: 75, 1 reads SQ: 130, 1 reads SQ: 185, 1 reads SQ: 118, 1 reads SQ: 51, 1 reads SQ: 106, 1 reads SQ: 39, 1 reads SQ: 161, 1 reads SQ: 27, 1 reads SQ: 149, 1 reads SQ: 82, 1 reads SQ: 70, 1 reads SQ: 125, 1 reads SQ: 58, 1 reads SQ: 101, 1 reads SQ: 34, 1 reads SQ: 156, 1 reads SQ: 89, 1 reads SQ: 144, 1 reads SQ: 77, 1 reads SQ: 65, 1 reads SQ: 187, 1 reads SQ: 120, 1 reads SQ: 108, 1 reads SQ: 163, 1 reads SQ: 29, 1 reads SQ: 151, 1 reads SQ: 84, 1 reads SQ: 139, 1 reads SQ: 72, 1 reads SQ: 127, 1 reads SQ: 115, 1 reads SQ: 48, 1 reads SQ: 103, 1 reads SQ: 158, 1 reads SQ: 24, 1 reads SQ: 146, 1 reads SQ: 79, 1 reads SQ: 134, 1 reads SQ: 67, 1 reads

[src/reads_sampler/mod.rs::110][2023-12-14 08:36:57][DEBUG] sampled 10979 records

[src/command_utils.rs::88][2023-12-14 08:36:57][DEBUG] estimated pass threshold 0.7246094 for primary sequence base C

[src/pileup/subcommand.rs::633][2023-12-14 08:36:57][INFO] Using filter threshold 0.7246094 for C.

[src/pileup/subcommand.rs::787][2023-12-14 08:36:57][INFO] Done, processed 0 rows. Processed ~0 reads and skipped zero reads.

can you help me? i am using version 0.2.0

@ArtRand
Copy link
Contributor

ArtRand commented Dec 14, 2023

Hello @JoukjeKloosterman,

The problem is that the Fasta parser currently in modkit (and in 0.2.0) will silently drop gzipped records. One item on the roadmap for the next release is to update this library, so I'll make sure it also accepted gzipped FASTA reference files. There should also be an error when zero references are parsed (modkit dmr does this).

tl;dr the work around is to decompress your reference. You won't need to with the next release.

@ArtRand ArtRand added this to the 0.2.4 milestone Dec 14, 2023
@ArtRand ArtRand changed the title empty bedfile pileup does not allow gzipped reference sequence Dec 14, 2023
@ArtRand ArtRand added the enhancement New feature or request label Dec 14, 2023
@ArtRand ArtRand modified the milestones: 0.2.4, 0.2.5 Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants