Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

methylKit "unite" function error message #296

Open
mirurb opened this issue Aug 18, 2023 · 5 comments
Open

methylKit "unite" function error message #296

mirurb opened this issue Aug 18, 2023 · 5 comments
Assignees

Comments

@mirurb
Copy link

mirurb commented Aug 18, 2023

Hello,
I am sorry to bother you, but I'm hoping you will be able to help with a problem I am having with methylKit. More specifically, the program works well until the "unite" step. I am able to create the file.list, methRead and do the descriptive statistics on the Bismark-created txt files. However, keep getting the below-mentioned error when trying to "unite" the samples for comparative analysis. Here is a sample of the script I used:

file.list_test=list("UI-3063-Ml-LH-F19_dedup_sort.CX_report_sorted.txt", "UI-3063-Ml-LH-M16_dedup_sort.CX_report_sorted.txt")

myobj_test <- methRead(file.list_test,
sample.id = list("LH_F19", "LH_M16"),
assembly = "M_longipes_genome",
dbtype = "tabix",
pipeline = "bismarkCytosineReport",
header = TRUE,
sep = "\t",
context = "CpG",
resolution = "base",
dbdir = "CpG_methylDB_test",
treatment = c(2,1),
mincov = 1)

meth_test=unite(myobj_test, destrand=TRUE, min.per.group = 1L)

And here is where I get an error message:

meth_test=unite(myobj_test, destrand=TRUE, min.per.group = 1L)
destranding...
Error in value[3L] : index build failed
file: CpG_methylDB_test/LH_F19_destrand.txt.bgz
[E::hts_idx_push] Chromosome blocks not continuous

I also got this error message before, but then discovered that the .txt files created by Bismark methylation calling where not in the right chromosome order. So I sorted the files in Linux (sort -V -k 1,2), and that put every file in the same chormosome order. I thought that this would solve the problem, but the error also happened with the sorted files.

My object looks like this:

print(myobj_test)
methylRawListDB object with 2 methylRawDB objects

methylRawDB object with 19096632 rows

  chr start  end strand coverage numCs numTs

1 chr1_pp 613 613 + 3 1 2
2 chr1_pp 614 614 - 2 0 2
3 chr1_pp 684 684 - 2 0 2
4 chr1_pp 1047 1047 + 2 2 0
5 chr1_pp 1103 1103 + 3 3 0
6 chr1_pp 1166 1166 + 1 1 0

sample.id: LH_F19
assembly: M_longipes_genome
context: CpG
resolution: base
dbtype: tabix

methylRawDB object with 19213040 rows

  chr start  end strand coverage numCs numTs

1 chr1_pp 613 613 + 2 2 0
2 chr1_pp 1103 1103 + 2 1 1
3 chr1_pp 1251 1251 + 3 2 1
4 chr1_pp 1252 1252 - 1 1 0
5 chr1_pp 1307 1307 + 5 5 0
6 chr1_pp 1308 1308 - 4 1 3

sample.id: LH_M16
assembly: M_longipes_genome
context: CpG
resolution: base
dbtype: tabix

treament: 2 1

I really appreciate your help in this matter! I'm sure I have missed something as a novice to bioinformatics. Have a great day! M.

@alexg9010
Copy link
Collaborator

Hi @mirurb,

It seems that there may be a sorting issue or a problem with the naming of your chromosomes that is causing this error. However, it's difficult to determine the exact issue without a reproducible example.

Could you provide a small example using truncated versions of your bismark-generated files? Please share the files and code with me so I can assist you further.

Best,
Alex

@mirurb
Copy link
Author

mirurb commented Aug 23, 2023

Hello Alex,
Thank you very much for your help!
Before sending you the truncated files, I wanted to test them with methylKit. I tried various ways of truncating (e.g. one chromosome only, three chromosomes, first 30milj lines etc). None of these files gave me the error message. It seems that the error comes only with the original bismark file. Therefore, I prefer to send you two original files that can be downloaded from here. https://filesender.ens-lyon.fr/?vid=2441eb01-6152-7629-d3ae-0000157ab175

Here is the script I used:

library(methylKit)

file.list_test=list("UI-3063-Ml-LH-F19_bismark_CX_report.txt", "UI-3063-Ml-LH-M16_bismark_CX_report.txt")

myobj_test <- methRead(file.list_test,
sample.id = list("LH_F19", "LH_M16"),
assembly = "M_longipes_genome",
dbtype = "tabix",
pipeline = "bismarkCytosineReport",
header = TRUE,
sep = "\t",
context = "CpG",
resolution = "base",
dbdir = "CpG_methylDB_test",
treatment = c(2,1),
mincov = 1)
print(myobj_test)

meth_test=unite(myobj_test, destrand=TRUE, min.per.group = 1L)

Thank you for your help!
Have a great day! M.

@mirurb
Copy link
Author

mirurb commented Aug 24, 2023

Hello Alex,

A little update.
It seems that there is nothing wrong with the files themselves, but rather that something goes wrong when setting destrand=TRUE (the recommended setting for CpG methylation analyses) when uniting samples. The error message does not appear when destrand=FALSE. One of my colleagues suspects that during destranding and uniting the files are not sorted properly; i.e. rather than sorting the chormosomes by numbers (1,2,3 etc.), it sorts them by text (1,10,2 etc.). Have you encountered this problem before?
I am going to modify the Bismark-created .txt. files by adding 00 in front of the chromosome numbers to see it that works.
Do you have any other suggestions?

Best wishes,
Mirjam

@mirurb
Copy link
Author

mirurb commented Aug 28, 2023

Hello @alexg9010,
It does seem indeed that unite() works properly only when a bunch of 0s are added to the front of the chromosome numbers. Perhaps the future versions of methylKit could solve this issue.
Have you had a chance to look into this?
Best,
Mirjam

@alexg9010
Copy link
Collaborator

alexg9010 commented Aug 28, 2023

Hi @mirurb,

Sorry for the delayed reply and thanks for sending the Bismark files and example code.

It's great that you found a workaround for the issue! This is very helpful as long as we don't have a fix in methylKit.

So far, I haven't found a concrete explanation for the bug, but I was able to reproduce the error message. The error occurs after destranding when indexing the destranded tabix file of the first sample, just before the uniting is started.
I did some debugging of the destranding using your samples. The destranding itself is performed in chunks, where intermediate results are saved as tabix files and merged afterwards. The chunk size is the same for any chunk except the last one and intermediate tabix files should also have similar file sizes. However, I noticed that intermediate tabix files have quite large differences in file sizes (25MB vs. 5-8MB), so I suspect some issues there.

Will keep you updated once I know more.

Best,
Alex

@alexg9010 alexg9010 self-assigned this Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants