Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for jieba #285

Merged
merged 10 commits into from
Oct 2, 2018
Merged

Add support for jieba #285

merged 10 commits into from
Oct 2, 2018

Conversation

Gyczero
Copy link
Contributor

@Gyczero Gyczero commented Sep 1, 2018

gensim is a famous toolkit for topic modeling
jieba is a famous Chinese toolkit for text segmentation

@rosbo rosbo self-requested a review September 14, 2018 18:22
@rosbo rosbo changed the title Add support for gensim and jieba Add support for jieba Sep 14, 2018
@rosbo
Copy link
Contributor

rosbo commented Sep 14, 2018

I resolved the conflict on your PR and removed the gensim addition because it is now already being added earlier in the Dockerfile.

@rosbo
Copy link
Contributor

rosbo commented Sep 14, 2018

@Gyczero could you add unit tests for this package to prevent regression. Just pick one simple canonical usage of this package. Here is an example: https://github.com/Kaggle/docker-python/blob/master/tests/test_fastai.py

@rosbo rosbo mentioned this pull request Sep 14, 2018
@Gyczero
Copy link
Contributor Author

Gyczero commented Sep 22, 2018

@rosbo I added unit tests for jieba and thanks for adding gensim which is very helpful for NLP!

Copy link
Contributor

@rosbo rosbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a unit test. That will help us prevent regression going forward. I added one small suggestion

Copy link
Contributor Author

@Gyczero Gyczero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add jieba

class TestJieba(unittest.TestCase):
def test_text_split(self):
sentence = "我爱北京天安门"
seg_list = jieba.cut(sentence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an assertion for this test please.

Maybe self.assertEqual(X, len(seg_list)) where X is the number of segments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I added an assertion.

Copy link
Contributor

@rosbo rosbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding tests. Merging :)

@rosbo rosbo merged commit 49eaa50 into Kaggle:master Oct 2, 2018
@rosbo rosbo mentioned this pull request Jan 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants