Using multiple samples in modkit dmr #140

Closed
ArnavBharti opened this issue Mar 1, 2024 · 12 comments
Labels
build-available: custom build produced for fix.

Comments

@ArnavBharti

I have 5 control samples and 2 treatment samples. I used dmr multi and got multiple bed files since it does pairwise dmr.
Is it viable to use 'samtools merge' to combine the bam files and run modkit dmr pair?

These are not biological replicates. They are different and I have bam files for each of them individually.

@ArtRand
Contributor

ArtRand commented Mar 5, 2024

Hello @ArnavBharti,

Funny you should ask, the modkit v0.2.5 will do just what you're thinking. You can use modkit dmr pair with multiple replicates (docs). That being said, without more details on your experimental design it is difficult for me to make strong recommendations.

@PRIYANKA-22091995

Hello Team,

Could you please elaborate on modkit v0.2.5? I have the same question: I have multiple control samples and multiple treatment samples (with the same disease condition), and I have concatenated all the BAM files. Is it advisable to run dmr pair on them this way? It would be nice to have your insights.
Thanks in advance.

@lance0499

Hi @ArnavBharti,

I am having trouble inputting multiple replicates into modkit dmr pair, because the command is not allowing me to. Were you able to manage?

Lance

@ArnavBharti
Author

Have you upgraded the modkit program? It was throwing an error on my side too saying "error: the argument '-a <CONTROL_BED_METHYL>' cannot be used multiple times".

I pulled the repo again and rebuilt it. It now runs and finishes successfully.

@lance0499

Hi @ArnavBharti,

Thanks a lot. Pulling the repo manually again made it work.
Much appreciated.

@ArnavBharti
Author

> Hello @ArnavBharti,
>
> Funny you should ask, the modkit v0.2.5 will do just what you're thinking. You can use modkit dmr pair with multiple replicates (docs). That being said, without more details on your experimental design it is difficult for me to make strong recommendations.

My samples are sequencing runs of the same vector, one carrying the disease and one not. The control comes from 5 sources and the treatment from 2.

I used this command. It is showing the output finished, processed 0 sites successfully, 2327053 failed.

@ArtRand
Contributor

ArtRand commented Mar 7, 2024

@ArnavBharti

> I used this command. It is showing the output finished, processed 0 sites successfully, 2327053 failed.

Could you show me the command you're using? Sites will be considered "failed" when there isn't enough valid coverage in all of the samples. Maybe decrease the valid coverage requirement?
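
As a rough way to see whether coverage is the cause, one can count how many positions reach a chosen valid-coverage threshold in every input file. The sketch below is not part of modkit and does not reproduce its internal filtering. The function name `sites_covered_in_all` is hypothetical, and it assumes bedMethyl files in which column 10 holds the valid coverage and each position, modification code, and strand appears at most once per file.

```shell
# Hypothetical helper (not part of modkit): print the number of positions
# that have at least MIN valid coverage in every listed bedMethyl file.
# Assumption: column 10 is the valid-coverage count; a position is keyed by
# chrom, start, modification code, and strand (columns 1, 2, 4, 6).
# Usage: sites_covered_in_all MIN file1 [file2 ...]
sites_covered_in_all() {
  min="$1"
  shift
  n="$#"
  for f in "$@"; do
    case "$f" in
      *.gz) gzip -cd "$f" ;;   # decompress gzipped bedMethyl to stdout
      *)    cat "$f" ;;        # plain-text bedMethyl
    esac
  done | awk -v min="$min" -v n="$n" '
    # Count, per position key, how many files reach the threshold.
    ($10 + 0) >= (min + 0) { seen[$1 ":" $2 ":" $4 ":" $6]++ }
    END {
      count = 0
      for (k in seen) if (seen[k] == n) count++
      print count
    }'
}
```

Calling it with a threshold followed by all the `-a` and `-b` bedMethyl paths gives a quick sanity check. A result of 0 would suggest that no position is sufficiently covered in every sample, while a large result would point to a cause other than coverage.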

@ArnavBharti
Author

modkit dmr pair \
  -a sequencing/SEQUENCING_ANALYSIS/.../pass/pileup.bed.gz \
  -a sequencing/SEQUENCING_ANALYSIS/.../pass/barcode01/pileup.bed.gz \
  -a sequencing/SEQUENCING_ANALYSIS_1/.../pass/barcode02/pileup.bed.gz \
  -a sequencing/SEQUENCING_ANALYSIS_1/.../pass/barcode01/pileup.bed.gz \
  -a sequencing/SEQUENCING_ANALYSIS/.../pass/barcode01/pileup.bed.gz \
  -b sequencing/SEQUENCING_ANALYSIS/.../pass/barcode02/pileup.bed.gz \
  -b sequencing/SEQUENCING_ANALYSIS/.../pass/barcode10/pileup.bed.gz \
  --ref PlasmoDB-..._Genome.fasta \
  --base C \
  -o dmr_out \
  -t 10 \
  -f

> Maybe decrease the valid coverage requirement?

I will try this

@ArtRand
Contributor

ArtRand commented Mar 8, 2024

@ArnavBharti there is currently a limitation that requires the number of -a samples to equal the number of -b samples. The debug log would mention this. I can get you a build without this restriction (it shouldn't exist anyway).
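
For anyone on a build that still has this restriction, a quick pre-flight check is to count the `-a` and `-b` arguments before running the command. This is a minimal sketch, not a modkit feature. The function name `count_samples` is hypothetical, and it assumes every sample is passed with the short `-a` or `-b` flag as in the command above.

```shell
# Hypothetical helper (not part of modkit): count the -a and -b flags in an
# argument list and report whether the two groups have the same size.
# Assumption: samples are given only via the short flags -a and -b.
count_samples() {
  a=0
  b=0
  for arg in "$@"; do
    case "$arg" in
      -a) a=$((a + 1)) ;;
      -b) b=$((b + 1)) ;;
    esac
  done
  if [ "$a" -eq "$b" ]; then
    echo "balanced: a=$a b=$b"
  else
    echo "imbalanced: a=$a b=$b"
  fi
}
```

Passing it the same arguments as the command posted earlier (five `-a` files and two `-b` files) would print `imbalanced: a=5 b=2`, which matches the case this restriction rejects.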

@ArtRand
Contributor

ArtRand commented Mar 8, 2024

Here is the Linux build. I also pushed the branch ar/gh-140-imbalanced-samples to GitHub. These changes will certainly make it into the next release.
modkit-dev176205f_centos7_x86_64.tar.gz

ArtRand added the build-available label (custom build produced for fix) on Mar 8, 2024

@ArtRand
Contributor

ArtRand commented Mar 12, 2024

@ArnavBharti any luck?

ArtRand added a commit that referenced this issue Mar 16, 2024
[dmr, single-site] Don't require that there is an even number of samples.

See merge request machine-learning/modkit!153
@ArnavBharti
Author

ArnavBharti commented Mar 17, 2024

It is working without error
