
Is the tdlstm built on Python2? #3

Closed
lydemo opened this issue Jan 22, 2018 · 4 comments

lydemo commented Jan 22, 2018

I tried to run the code on Win10 with Python 3.6 installed via Anaconda 3, but I ran into trouble in twtokenizer, so I wonder whether the whole project is built on Python 2. When I run the code, the error occurs in twtokenizer:
E:\tdlstm\src>python run.py
E:\software\Anaconda\lib\site-packages\gensim\utils.py:865: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
  File "run.py", line 7, in <module>
    from optimise import TRAIN, TUNE, hyperoptTUNE, skoptTUNE
  File "E:\tdlstm\src\optimise.py", line 9, in <module>
    from utils import load_data
  File "E:\tdlstm\src\utils.py", line 6, in <module>
    import data.dataprocessor as dp
  File "..\data\dataprocessor.py", line 8, in <module>
    from data.twtokenize import tokenize
  File "..\data\twtokenize.py", line 166, in <module>
    unicode(regex_or(
NameError: name 'unicode' is not defined
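For reference, the NameError comes from the `unicode` builtin, which exists in Python 2 but was removed in Python 3 (where every str is already a Unicode string). One possible workaround for running the code under Python 3 anyway, a sketch that is not part of this repo, is a small compatibility shim near the top of twtokenize.py:

```python
# Shim for the `unicode` builtin removed in Python 3.
# On Python 2 the name already exists and is left untouched;
# on Python 3 we alias it to str, which is already unicode.
try:
    unicode          # defined on Python 2
except NameError:    # raised on Python 3
    unicode = str
```

This only patches the name lookup; any code that relies on Python 2's separate bytes/unicode semantics may still need further changes.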

lydemo (Author) commented Jan 22, 2018

And could you tell me whether the code is the one provided via the link under their paper, or your own implementation? I tried to download their code from that link, but it's unavailable now...

bwang482 (Owner) commented
1) You're right, it is written in Python 2.7, as the from __future__ import ... lines indicate.

2) This is my own implementation, and it differs slightly from Tang's version (e.g. I used ReLU rather than tanh for the activation, and I didn't set the softmax layer's clipping threshold to 200 but used other techniques instead). His original code is written in Java, I believe. There is another version at https://github.com/scaufengyang/TD-LSTM; you can check it out as well and compare performance. Hope this helps.

lydemo (Author) commented Jan 23, 2018

Thank you very much for sharing the link. Could you also tell me whether the word embeddings your code uses are the pre-trained vectors from http:https://nlp.stanford.edu/data/glove.twitter.27B.zip? I used the 100-dimensional one, and an error occurred:
Traceback (most recent call last):
  File "/Users/luoyin/Downloads/tdlstm-master/src/run.py", line 50, in <module>
    TRAIN(args, args.model)
  File "/Users/luoyin/Downloads/tdlstm-master/src/optimise.py", line 181, in TRAIN
    data = load_data(args, args.data, saved=args.load_data)
  File "/Users/luoyin/Downloads/tdlstm-master/src/utils.py", line 15, in load_data
    embedding=embedding, saved=saved, max_length=max_length)
  File "../data/dataprocessor.py", line 74, in __init__
    glove, self.glove_vec, self.glove_shape, glove_vocab = util.gensim_load_vec('../resources/wordemb/glove.twitter.27B.100d.txt')
  File "../data/util.py", line 10, in gensim_load_vec
    gensim_emb = gensim.models.KeyedVectors.load_word2vec_format(path, binary=False)
  File "/Users/luoyin/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 197, in load_word2vec_format
    vocab_size, vector_size = (int(x) for x in header.split())  # throws for invalid file format
  File "/Users/luoyin/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 197, in <genexpr>
    vocab_size, vector_size = (int(x) for x in header.split())  # throws for invalid file format
ValueError: invalid literal for int() with base 10: 'user'

bwang482 (Owner) commented
Ah right, the code loads the GloVe word vectors using Gensim's word2vec loader, which means you have to add one extra line at the top of the file containing the number of tokens and the number of dimensions (in this case, 1193514 100).

Check out:
https://radimrehurek.com/gensim/scripts/glove2word2vec.html

I think I will add this to the readme file; thanks for pointing it out.
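The linked Gensim script does exactly this conversion; the header can also be prepended by hand. A small sketch (the function name and file paths are just examples, not part of this repo) that writes the vocab_size vector_size header line the word2vec loader expects:

```python
def add_word2vec_header(glove_path, out_path):
    """Copy a raw GloVe text file, prepending the
    'vocab_size vector_size' header line that
    gensim.models.KeyedVectors.load_word2vec_format expects."""
    with open(glove_path, encoding='utf-8') as f:
        lines = f.readlines()
    vocab_size = len(lines)
    # each GloVe line is: word followed by its vector components
    vector_size = len(lines[0].split()) - 1
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write('%d %d\n' % (vocab_size, vector_size))
        f.writelines(lines)

# e.g. add_word2vec_header('glove.twitter.27B.100d.txt',
#                          'glove.twitter.27B.100d.w2v.txt')
```

The resulting file can then be passed to load_word2vec_format(path, binary=False) as the code already does.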
