Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract options #215

Closed
Baix7 opened this issue Jun 20, 2024 · 4 comments
Closed

extract options #215

Baix7 opened this issue Jun 20, 2024 · 4 comments
Labels
question Further information is requested

Comments

@Baix7
Copy link

Baix7 commented Jun 20, 2024

I have a question regarding the extract subcommand. Why does the pileup subcommand provide options like --ignore h and --combine-mods, while the extract subcommand only offers --ignore h?

@ArtRand
Copy link
Contributor

ArtRand commented Jun 20, 2024

Hello @Baix7,

The --combine-mods flag in modkit pileup sums the counts of all base modifications together at a reference position, whereas modkit extract works at a per-read level. When you use --ignore h it changes the probabilities at a per-read level, so it makes sense to have this as an option to modkit extract. I suppose what you're asking for is reasonable, however, that you want a flag that produces a binary "modified/not-modified" call from modkit extract. (Correct me if I'm wrong). With the latest release candidate you can stream the read calls table through awk to produce a table where all non-canonical bases are replaced with the code for "any mod" for that base (which happens to be the same letter as the primary sequence base).

${modkit} extract ${bam} null --read-calls stdout --no-filtering 2> /dev/null | awk -v OFS="\t" '{
if ($12=="-") {
  print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,"-",$13,$14,$15,$16,$17,$18,$19,$20,$21
} else {
  # notice that the `call-code` column is replaced with the `modified_primary_base` column
  print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$17,$13,$14,$15,$16,$17,$18,$19,$20,$21
}}' > ${read_calls_tsv}

@ArtRand ArtRand added the question Further information is requested label Jun 20, 2024
@Baix7
Copy link
Author

Baix7 commented Jun 21, 2024

Thank you for the quick response.

So, in this case, is it correct that no reference genome is used? If I want to see the results for a specific motif, what should I do? Should I use bedtools intersect with the motif_bed after processing with awk?

@ArtRand
Copy link
Contributor

ArtRand commented Jun 24, 2024

Hello @Baix7,

The reference genome is optional when using modkit extract (it's only used to get the ref_kmer column). If your modBAM has aligned reads it will extract the reference aligned coordinates for each modification. You may also want to use the --mapped-only flag. If you only care about certain genome locations (i.e. motifs) you can do like you say get the locations of the motifs with modkit motif-bed then run modkit extract followed by bedtools intersect to narrow down to just those locations. You can also use modkit extract --include-positions ${motif_bed} to do the same thing - but then you only get the intersect positions.

@Baix7
Copy link
Author

Baix7 commented Jun 24, 2024

Thank you very much for your response; it helps a lot. I greatly appreciate the work your team is doing.

@Baix7 Baix7 closed this as completed Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants