
About the txt version #11

Open
WangLilian opened this issue Aug 19, 2017 · 8 comments

Comments

@WangLilian

Hi,
I would like to ask about the file type. The data is provided in binary form, and when I convert it to a string or text with Python, some of it comes out garbled. Can I access the "txt" version directly, or could you give me some suggestions? Thanks very much!

@abisee
Owner

abisee commented Aug 20, 2017

We can't provide the data for legal reasons, but the README gives links to where you can download the original data, which is plaintext, and also a link to the bin files you get from running the processing scripts in this repo.

@WangLilian
Author

May I ask how you processed the bin files to get the inputs of the neural network? Do you need to convert them to strings first? I am having some trouble with this step.

@abisee
Owner

abisee commented Aug 20, 2017

That code is in this repository. See the function example_generator.
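The .bin files are a sequence of binary records, not text, which is why decoding a whole file as a string produces garbled output. A minimal sketch of splitting such a file into records, assuming each record is an 8-byte length packed with struct format 'q' followed by that many bytes of a serialized tf.Example (check this layout against example_generator before relying on it). The names read_records and read_records_from_bytes are hypothetical.

```python
import io
import struct


def read_records(stream):
    """Yield the raw payload bytes of each length-prefixed record in `stream`.

    Assumed layout (check against example_generator): an 8-byte length
    packed with struct format 'q', then that many payload bytes.
    """
    while True:
        len_bytes = stream.read(8)
        if not len_bytes:
            return  # clean end of file
        if len(len_bytes) != 8:
            raise ValueError("truncated length prefix")
        (str_len,) = struct.unpack('q', len_bytes)
        payload = stream.read(str_len)
        if len(payload) != str_len:
            raise ValueError("truncated record")
        yield payload


def read_records_from_bytes(data):
    """Convenience wrapper for data that is already in memory."""
    return list(read_records(io.BytesIO(data)))
```

For a real file, iterate over `read_records(open('test.bin', 'rb'))`. Each payload could then be parsed with `example_pb2.Example.FromString(payload)`, and the text of a feature read as `feature[key].bytes_list.value[0].decode('utf-8')`, assuming the tf.Example layout used elsewhere in this thread.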

@JafferWilson

JafferWilson commented Aug 21, 2017

@WangLilian @abisee Wang, if you wish to see the txt version of the data, you can convert it using the file data_convert_example.py, which converts bin to txt and also txt to bin. It is in the TensorFlow Models repository. I hope this helps you.
If it does answer your query, please let me know. Thank you.
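For the txt-to-bin direction, the tracebacks later in this thread show that data_convert_example.py writes each example as `key=value` fields joined by tabs, one example per line. A minimal sketch of the two pieces involved, assuming the same 8-byte 'q' length prefix as the .bin files; parse_text_line and encode_record are hypothetical names, and building the actual tf.Example from the parsed fields is left out.

```python
import struct


def parse_text_line(line):
    """Split one 'key=value<TAB>key=value' line into a dict of str -> str.

    Each field is split on its first '=' only, so values may contain '='.
    Values containing a tab cannot be represented in this format.
    """
    fields = {}
    for field in line.rstrip('\n').split('\t'):
        key, sep, value = field.partition('=')
        if not sep:
            raise ValueError('field without "=": %r' % field)
        fields[key] = value
    return fields


def encode_record(payload):
    """Return one length-prefixed record: an 8-byte 'q' length, then the payload."""
    return struct.pack('q', len(payload)) + payload
```

In a full conversion, `payload` would presumably be `tf_example.SerializeToString()` after the tf.Example has been filled from the parsed fields, and the encoded records would be written one after another to a file opened in `'wb'` mode.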

@qlwang25

qlwang25 commented Dec 3, 2018

@JafferWilson the link to data_convert_example.py is broken.

@JafferWilson

@qlwang25 Thank you for informing me regarding the link. Here is the data_convert_example.py. Hope this helps.
@abisee Please close this issue. I think it is resolved; it has been open for a long time and there has been no response from its creator.

@qlwang25

qlwang25 commented Dec 4, 2018

@JafferWilson
Thanks a lot. This script is very useful.
However, when I run it with --command binary_to_text --in_file test.bin --out_file test.txt,
an encoding error occurs while decoding the thirteenth example.

Traceback (most recent call last):
  File "data_convert_example.py", line 70, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "data_convert_example.py", line 64, in main
    _binary_to_text()
  File "data_convert_example.py", line 40, in _binary_to_text
    examples.append('%s=%s' % (key, tf_example.features.feature[key].bytes_list.value[0]))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 854: ordinal not in range(128)

I followed this suggestion, i.e. using .decode('utf-8') or codecs.open, and changed the line to:

examples.append('%s=%s' % (key, tf_example.features.feature[key].bytes_list.value[0].decode('utf-8')))

Then another error appeared:

Traceback (most recent call last):
  File "data_convert_example.py", line 70, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "data_convert_example.py", line 64, in main
    _binary_to_text()
  File "data_convert_example.py", line 41, in _binary_to_text
    writer.write('%s\n' % '\t'.join(examples))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 862: ordinal not in range(128)

Do you have any good ideas?
Thanks.

@qlwang25

qlwang25 commented Dec 4, 2018

I have solved this encoding problem.
Solution:

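import struct
import sys
# Python 2 only: site.py deletes sys.setdefaultencoding at startup,
# and reload(sys) makes it available again.
reload(sys)
# Make implicit str/unicode conversions use UTF-8 instead of ASCII.
sys.setdefaultencoding('utf8')

import tensorflow as tf
from tensorflow.core.example import example_pb2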
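An alternative that avoids changing the interpreter-wide default encoding is to decode each bytes value explicitly and write through a file opened with an explicit encoding. A minimal sketch, assuming each example is available as a dict mapping a feature name to its raw UTF-8 bytes (a hypothetical input shape standing in for `tf_example.features.feature[key].bytes_list.value[0]`); format_example, write_examples_as_text and convert_to_file are hypothetical names. It relies only on `io.open` and unicode literals, so it should behave the same on Python 2.7 and Python 3.

```python
import io


def format_example(example):
    """Join the fields of one example as 'key=value' pairs separated by tabs.

    Keys are sorted so the output order is deterministic; each bytes value is
    decoded explicitly as UTF-8 instead of relying on the default codec.
    """
    fields = [u'%s=%s' % (key, value.decode('utf-8'))
              for key, value in sorted(example.items())]
    return u'\t'.join(fields)


def write_examples_as_text(examples, writer):
    """Write one formatted example per line to a text-mode `writer`."""
    for example in examples:
        writer.write(u'%s\n' % format_example(example))


def convert_to_file(examples, out_path):
    """Open the output with an explicit UTF-8 encoding, so non-ASCII
    characters such as the bullet U+2022 are encoded as UTF-8 instead of
    going through the ASCII codec."""
    with io.open(out_path, 'w', encoding='utf-8') as writer:
        write_examples_as_text(examples, writer)
```

Sorting the keys is a design choice for reproducible output; the original script writes fields in whatever order it iterates over the feature map, so the column order may differ.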

Also, thank you @JafferWilson again for the script.


4 participants