
Is the tdlstm built on Python2? #3

Closed
lydemo opened this issue Jan 22, 2018 · 4 comments

lydemo commented Jan 22, 2018

I tried to run the code on Win10 with Python 3.6 installed via Anaconda 3, but I ran into trouble in twtokenizer, so I wonder whether the whole project is built on Python 2. When I run the code, the error occurs in twtokenizer:
E:\tdlstm\src>python run.py
E:\software\Anaconda\lib\site-packages\gensim\utils.py:865: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
  File "run.py", line 7, in <module>
    from optimise import TRAIN, TUNE, hyperoptTUNE, skoptTUNE
  File "E:\tdlstm\src\optimise.py", line 9, in <module>
    from utils import load_data
  File "E:\tdlstm\src\utils.py", line 6, in <module>
    import data.dataprocessor as dp
  File "..\data\dataprocessor.py", line 8, in <module>
    from data.twtokenize import tokenize
  File "..\data\twtokenize.py", line 166, in <module>
    unicode(regex_or(
NameError: name 'unicode' is not defined
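For reference, the NameError comes from the `unicode` builtin, which exists in Python 2 but was removed in Python 3 (where every str is already a Unicode string). One possible workaround for running the code under Python 3 anyway, a sketch that is not part of this repo, is a small compatibility shim near the top of twtokenize.py:

```python
# Shim for the `unicode` builtin removed in Python 3.
# On Python 2 the name already exists and is left untouched;
# on Python 3 we alias it to str, which is already unicode.
try:
    unicode          # defined on Python 2
except NameError:    # raised on Python 3
    unicode = str
```

This only patches the name lookup; any code that relies on Python 2's separate bytes/unicode semantics may still need further changes.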

lydemo (Author) commented Jan 22, 2018

And could you tell me whether the code is the one provided via the link under their paper, or your own implementation? I tried to download their code from that link, but it's unavailable now...

bwang482 (Owner) commented
1) You're right, it is written in Python 2.7, as the from __future__ import ... lines indicate.

2) This is my own implementation, and it differs slightly from Tang's version (e.g. I used ReLU rather than tanh for the activation, and I didn't set the softmax layer's clipping threshold to 200 but used other techniques instead). His original code is written in Java, I believe. There is another version at https://github.com/scaufengyang/TD-LSTM; you can check it out as well and compare performance. Hope this helps.

lydemo (Author) commented Jan 23, 2018

Thank you very much for sharing the link. Could you also tell me whether the word embeddings your code uses are the pre-trained vectors from http:https://nlp.stanford.edu/data/glove.twitter.27B.zip? I used the 100-dimensional one, and an error occurred:
Traceback (most recent call last):
  File "/Users/luoyin/Downloads/tdlstm-master/src/run.py", line 50, in <module>
    TRAIN(args, args.model)
  File "/Users/luoyin/Downloads/tdlstm-master/src/optimise.py", line 181, in TRAIN
    data = load_data(args, args.data, saved=args.load_data)
  File "/Users/luoyin/Downloads/tdlstm-master/src/utils.py", line 15, in load_data
    embedding=embedding, saved=saved, max_length=max_length)
  File "../data/dataprocessor.py", line 74, in __init__
    glove, self.glove_vec, self.glove_shape, glove_vocab = util.gensim_load_vec('../resources/wordemb/glove.twitter.27B.100d.txt')
  File "../data/util.py", line 10, in gensim_load_vec
    gensim_emb = gensim.models.KeyedVectors.load_word2vec_format(path, binary=False)
  File "/Users/luoyin/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 197, in load_word2vec_format
    vocab_size, vector_size = (int(x) for x in header.split())  # throws for invalid file format
  File "/Users/luoyin/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 197, in <genexpr>
    vocab_size, vector_size = (int(x) for x in header.split())  # throws for invalid file format
ValueError: invalid literal for int() with base 10: 'user'

bwang482 (Owner) commented
Ah right, the code loads the GloVe word vectors using Gensim's word2vec loader, which means you have to add one extra line at the top of the file containing the number of tokens and the number of dimensions (in this case, 1193514 100).

Check out:
https://radimrehurek.com/gensim/scripts/glove2word2vec.html

I think I will add this to the readme file; thanks for pointing it out.
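The linked Gensim script does exactly this conversion; the header can also be prepended by hand. A small sketch (the function name and file paths are just examples, not part of this repo) that writes the vocab_size vector_size header line the word2vec loader expects:

```python
def add_word2vec_header(glove_path, out_path):
    """Copy a raw GloVe text file, prepending the
    'vocab_size vector_size' header line that
    gensim.models.KeyedVectors.load_word2vec_format expects."""
    with open(glove_path, encoding='utf-8') as f:
        lines = f.readlines()
    vocab_size = len(lines)
    # each GloVe line is: word followed by its vector components
    vector_size = len(lines[0].split()) - 1
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write('%d %d\n' % (vocab_size, vector_size))
        f.writelines(lines)

# e.g. add_word2vec_header('glove.twitter.27B.100d.txt',
#                          'glove.twitter.27B.100d.w2v.txt')
```

The resulting file can then be passed to load_word2vec_format(path, binary=False) as the code already does.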
