What are the suitable parameters for RRBS data? #14

Open
bishwaG opened this issue Aug 28, 2018 · 6 comments

@bishwaG

bishwaG commented Aug 28, 2018

Hi,
In your paper you mention that "dmrseq is generally applicable to WGBS data". My impression is that methods that work for WGBS also work for RRBS data. Have you tried dmrseq on RRBS data? If so, what are the recommended parameters for the dmrseq() function?

@kdkorthauer
Owner

Hi @bishwaG,

Another great question! The dmrseq procedure is certainly also applicable to data from RRBS experiments. However, the default parameters for region construction and smoothing (e.g. bpSpan, minInSpan, maxGapSmooth, and maxGap) were set with WGBS data in mind. I haven't yet explored RRBS analysis extensively enough to have an official set of recommendations, so you might have to try a couple of different parameter sets on a subset of your data (e.g. a single chromosome) to see how they affect the types of regions identified.

For example, you might want to try decreasing minInSpan and increasing maxGapSmooth to allow the smoothing procedure to span gaps in coverage where the RRBS didn't measure any CpGs. For the same reason, you might also increase maxGap so that two neighboring CpGs can be in the same DMR even if there are more than 1000 bp between them (since there are likely CpGs in that span that just weren't targeted by RRBS).
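
A minimal R sketch of the kind of trial run described above. The first part is a toy illustration of a gap rule on made-up CpG positions; it simplifies the idea behind maxGap and is not dmrseq's actual region construction. The second part wraps a dmrseq() call with relaxed settings on a single chromosome. The helper names, the BSseq object `bs`, the covariate name "Group", the chromosome name, and the specific values are all assumptions chosen for illustration, not tested recommendations.

```r
# Toy illustration: consecutive CpGs share a cluster unless the distance
# between them exceeds max_gap. This is a simplification of the idea behind
# maxGap, not dmrseq's internal code.
group_by_gap <- function(pos, max_gap) {
  cumsum(c(TRUE, diff(pos) > max_gap))
}

pos <- c(100, 400, 900, 3200, 3500, 9000)  # made-up positions (bp)
group_by_gap(pos, max_gap = 1000)  # 1 1 1 2 2 3
group_by_gap(pos, max_gap = 5000)  # 1 1 1 1 1 2

# Hypothetical helper: run dmrseq on one chromosome with relaxed settings.
# `bs` is assumed to be a BSseq object whose sample annotation contains the
# column named by `test_covariate`.
run_rrbs_trial <- function(bs, chrom = "chr21", test_covariate = "Group") {
  # Restrict to a single chromosome to keep the trial run small.
  bs_sub <- bsseq::chrSelectBSseq(bs, seqnames = chrom, order = TRUE)
  dmrseq::dmrseq(
    bs_sub,
    testCovariate = test_covariate,
    minInSpan = 10,        # decreased (illustrative value)
    maxGapSmooth = 10000,  # increased (illustrative value)
    maxGap = 5000          # increased beyond the 1000 bp mentioned above
  )
}
```

Calling `run_rrbs_trial(bs)` with a few different values and comparing the resulting regions is one way to follow the "try a couple of parameter sets" advice.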

Adjusting these parameters will likely affect the sizes of DMRs identified, but either way the procedure will still provide valid inference. Hope that helps! And feel free to report back if you happen to find a particular setting works well.

Best,
Keegan

@bishwaG
Author

bishwaG commented Aug 31, 2018

Hi @kdkorthauer
Thank you very much for your reply. I tried minInSpan=10, bpSpan values between 5,000 and 10,000, maxGapSmooth between 10,000 and 100,000, and maxGap=5000, but I was unable to get any significant (qval <= 0.05) DMRs. I tried a couple of chromosomes without success. I do not expect my groups to be so homogeneous that I would get no DMRs at all. How does dmrseq perform if I disable smoothing? If I remember correctly, when using the DSS package for differential methylation analysis of bisulfite sequencing data, it is recommended not to smooth RRBS data (because it is very sparse compared to WGBS).

@kdkorthauer
Owner

Hi @bishwaG,

Thanks for reporting back! It seems like you're not seeing much difference in performance when you increase smoothing. A few things come to mind:

  • What are you using for the cutoff parameter? It is 0.10 by default, but you may want to try lowering it to something like 0.05 if you're not seeing a strong signal.

  • You could certainly try with no smoothing (smooth = FALSE). This might generate rather short regions, especially if your coverage is on the low end and the signal is a bit noisy. This is because longer regions will get broken up by short stretches of CpGs that don't exhibit signal (which is effectively smoothed over with the smoothing procedure).

  • What type of covariate are you testing and how many replicates do you have? If you are using a dichotomous covariate (2 groups), are there any additional covariates that you may want to match on? For example, a different covariate that would split the samples into two groups, where each group has some samples from each of the groups of the covariate of interest (see the documentation for the matchCovariate parameter).
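
The first two suggestions above could be tried side by side with something like the following R sketch. It runs dmrseq() with and without smoothing at a lowered cutoff and tabulates a few summary numbers. The helper names are hypothetical, `bs` is assumed to be a BSseq object, and the code assumes the result is a GRanges-like object with a `qval` column (the column the thread filters on); a NULL or empty result is treated as "no regions".

```r
# Hypothetical helper: summarize one dmrseq result.
summarize_regions <- function(regions, q_threshold = 0.05) {
  # Treat a NULL or empty result as "no candidate regions found".
  if (is.null(regions) || length(regions) == 0) {
    return(c(n_regions = 0, n_significant = 0, median_width = NA))
  }
  c(
    n_regions = length(regions),
    n_significant = sum(regions$qval <= q_threshold),
    median_width = stats::median(BiocGenerics::width(regions))
  )
}

# Hypothetical helper: run dmrseq with and without smoothing and compare.
compare_smoothing <- function(bs, test_covariate, cutoff = 0.05) {
  settings <- c(smoothed = TRUE, unsmoothed = FALSE)
  results <- lapply(settings, function(use_smooth) {
    dmrseq::dmrseq(
      bs,
      testCovariate = test_covariate,
      cutoff = cutoff,      # lowered from the default of 0.10
      smooth = use_smooth
    )
  })
  # One column per setting, one row per summary statistic.
  sapply(results, summarize_regions)
}
```

If the unsmoothed run yields many more but much shorter regions, that would be consistent with longer regions being broken up by stretches of CpGs without signal.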

Best,
Keegan

@bishwaG
Author

bishwaG commented Sep 3, 2018

Hi @kdkorthauer

Thank you for the additional insights. I have been using cutoff = 0.01. I have the following experimental design, and I would like to find methylation differences between groups A and B while adjusting for the effect of handlingTime. I have been passing adjustCovariate = "handlingTime" for this.


Sample	Group	handlingTime
S1	A	100
S2	A	152
S3	A	452
S4	A	1258
S5	B	214
S6	B	352
S7	B	574
S8	B	214
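
For reference, the design above might be encoded in R roughly as follows. This is only a sketch: the count matrices `M` and `Cov`, the vectors `chr` and `pos`, and the helper name are placeholders that do not appear in the thread, and the cutoff default is just one of the values discussed.

```r
# Sample annotation copied from the table above.
design <- data.frame(
  Group = c("A", "A", "A", "A", "B", "B", "B", "B"),
  handlingTime = c(100, 152, 452, 1258, 214, 352, 574, 214),
  row.names = paste0("S", 1:8)
)

# Hypothetical helper: build a BSseq object from methylated counts (M),
# coverage (Cov), chromosome names (chr) and positions (pos), then test
# Group while adjusting for handlingTime. Columns of M and Cov are assumed
# to be in the same order as the rows of `design`.
run_group_test <- function(M, Cov, chr, pos, design, cutoff = 0.05) {
  bs <- bsseq::BSseq(
    M = M, Cov = Cov, chr = chr, pos = pos,
    pData = design, sampleNames = rownames(design)
  )
  dmrseq::dmrseq(
    bs,
    testCovariate = "Group",
    adjustCovariate = "handlingTime",
    cutoff = cutoff
  )
}
```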

Regards,
BishwaG

@cauls19900319

Hi, I am wondering whether you have any recommendations for RRBS now, since the last discussion here. Thanks!

@kdkorthauer
Owner

Hi @cauls19900319,

Thanks for your question. In general I have not found a specific set of smoothing parameters that performs best in all cases. I still recommend testing a few different sets of parameters on a small subset of your data to see how they compare (e.g. no smoothing vs default smoothing). When comparing, you can plot the signal in the top-ranked regions and see whether certain settings tend to find more 'convincing' DMRs by eye. If it looks like a longer DMR is being broken up into smaller DMRs, for example, that would suggest increasing the smoothing.
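
One way to do this kind of visual check is sketched below in R, using the package's plotDMRs() function. The helper name is hypothetical, `bs` and `regions` are assumed to be the BSseq object and the dmrseq() result for one parameter set, and ranking by the absolute value of a `stat` column is an assumption about the result's layout made explicit here rather than relying on the returned order.

```r
# Hypothetical helper: plot the n_top regions with the largest absolute
# region-level statistic, one plot per region.
plot_top_regions <- function(bs, regions, test_covariate, n_top = 5) {
  # Assumes `regions` carries a numeric `stat` column.
  ranked <- regions[order(abs(regions$stat), decreasing = TRUE)]
  for (i in seq_len(min(n_top, length(ranked)))) {
    dmrseq::plotDMRs(bs, regions = ranked[i], testCovariate = test_covariate)
  }
  invisible(ranked)
}
```

Running it once per parameter set (for example on the smoothed and unsmoothed results) gives comparable plots for judging region boundaries by eye.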

In either case, the results still provide valid inference. The tuning will simply help to provide more accurate region boundaries.

Best,
Keegan
