got error when processing training data #24

sindax123 · 2022-12-25T20:20:04Z

Hi, dear developer,
I got an error when processing training data with the TR0 data provided by MXfold2:

$ python process_data_newdataset.py TR0
Traceback (most recent call last):
  File "process_data_newdataset.py", line 69, in <module>
    pair_dict_all_list = [[int(item_tmp)-1,int(t2[1].split('\n')[index_tmp])-1] for index_tmp,item_tmp in enumerate(t1[1].split('\n')) if int(t2[1].split('\n')[index_tmp]) != 0]
  File "process_data_newdataset.py", line 69, in <listcomp>
    pair_dict_all_list = [[int(item_tmp)-1,int(t2[1].split('\n')[index_tmp])-1] for index_tmp,item_tmp in enumerate(t1[1].split('\n')) if int(t2[1].split('\n')[index_tmp]) != 0]
ValueError: invalid literal for int() with base 10: 'X'

I don't know exactly what the data look like, so I am confused by this problem. Could you please tell me how to fix it? Thank you!

sindax123 · 2022-12-26T10:49:43Z

When I print t0, t1, and t2 in the code, some of the files are processed successfully, while for others t0, t1, and t2 are, respectively:
(0, 'OS')
(0, '\x00\x05\x16\x07\x00\x02\x00\x00Mac')
(0, 'X')

sperfu · 2022-12-26T11:33:36Z

Hi there,

We used this script to process training data in several different formats, so we may have altered parts of process_data_newdataset.py along the way. One solution is to load those files with pickle (a Python package) and check exactly what they contain. I hope that will work.
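
A minimal sketch of that kind of inspection, assuming the file in question is a pickle that loads without any custom class definitions. The helper name `summarize_pickle` is hypothetical and is not part of the repository:

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and return a short description of its contents."""
    with Path(path).open("rb") as handle:
        obj = pickle.load(handle)
    summary = {"type": type(obj).__name__}
    # For containers, report the size and a small sample of the contents.
    if isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
        summary["first_item"] = obj[0] if obj else None
    elif isinstance(obj, dict):
        summary["length"] = len(obj)
        summary["keys"] = list(obj)[:5]
    return summary
```

Calling `summarize_pickle` on a file and printing the result shows the top-level type and one sample entry, which is usually enough to spot malformed records. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.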

Thanks

sindax123 · 2022-12-26T15:27:15Z

Thank you for your reply! I checked the contents of the data and found that some of it is invalid: the output is "OS" instead of an RNA sequence, and this accounts for at least half of the dataset. I wonder whether this is normal or whether something is wrong with my dataset. If something is wrong with my dataset, where else can I get the data?

sperfu · 2022-12-27T02:22:23Z

I wonder whether this is a format issue related to the operating system (note the "OS" and "Mac" strings); it seems you used macOS to handle those files. We processed the files on Linux (Ubuntu), so you may want to pay attention to that.
Secondly, if that doesn't solve your problem, you may refer to the MXfold2 paper, which also provides those datasets.
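
For context, the bytes '\x00\x05\x16\x07\x00\x02\x00\x00Mac' printed earlier in the thread match the start of an AppleDouble file: the magic number 0x00051607, then the version 0x00020000, then the filler text "Mac OS X". macOS writes these as "._name" sidecar files when data is copied or archived to a filesystem without native metadata support. A small sketch for detecting them by content (the function name `is_appledouble` is hypothetical):

```python
APPLEDOUBLE_MAGIC = b"\x00\x05\x16\x07"


def is_appledouble(path):
    """Return True if the file starts with the AppleDouble magic number."""
    with open(path, "rb") as handle:
        return handle.read(4) == APPLEDOUBLE_MAGIC
```

Checking the first four bytes is independent of the file name, so it also catches sidecar files that have been renamed.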

sindax123 · 2022-12-27T02:47:30Z

Thank you for your reply! I think I have figured out the problem by double-checking the data. In the TR0 folder I downloaded, each RNA sequence comes with two files, named "._bpRNA_XXXXX" and "bpRNA_XXXX" respectively. I suppose it can be fixed by adding a filtering condition.
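
One way such a filtering condition might look, as a sketch only. The part of process_data_newdataset.py that lists the input files is not shown in this thread, so the helper `list_data_files` and the way it would plug into the script are assumptions:

```python
from pathlib import Path


def list_data_files(folder):
    """Return the regular files in `folder`, skipping "._" sidecar files."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and not path.name.startswith("._")
    )
```

Passing the TR0 folder to `list_data_files` would yield only the "bpRNA_..." entries, so the "._bpRNA_..." files never reach the int() parsing step that raised the ValueError.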
