Releases: polm/fugashi

v1.3.0: M1 Wheels! Finally!

25 Aug 12:56

This release addresses one of the longest-standing issues, #55. Many thanks to @nikitalita for figuring out how to cross-compile MeCab for wheels.

There are no other changes.

v1.2.1: Python 3.11 Support

06 Dec 13:02

This release adds wheels for Python 3.11, with no other changes.

v1.2.0: Add nbestToNodeList, drop Python 3.6 and earlier

04 Sep 09:54

This release of fugashi adds one new feature: Tagger.nbestToNodeList returns the top N possible tokenizations of a string as node lists. Many thanks to @teowenshen for the implementation (#61).
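As a rough sketch of how the new method might be used, the helper below asks a tagger for its N best analyses and keeps only the surface forms. The names `top_tokenizations` and `demo` are made up for illustration, and the sketch assumes `nbestToNodeList` takes the text followed by the number of candidates; check the signature in your installed version.

```python
def top_tokenizations(tagger, text, n=3):
    """Return up to n candidate tokenizations as lists of surface strings.

    Assumption: tagger.nbestToNodeList(text, n) returns a list of node
    lists, and each node exposes a .surface attribute.
    """
    return [
        [node.surface for node in nodes]
        for nodes in tagger.nbestToNodeList(text, n)
    ]


def demo():
    # Needs fugashi plus a dictionary (for example unidic-lite) installed.
    from fugashi import Tagger

    tagger = Tagger()
    for candidate in top_tokenizations(tagger, "外国人参政権", n=3):
        print(" / ".join(candidate))
```

Calling `demo()` prints one candidate segmentation per line; the exact splits depend on the dictionary in use.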

This release also drops support for Python 3.6 and earlier versions. While the current source should still work with 3.5 and 3.6, wheels are not provided, and it is recommended you upgrade your Python version to one that has not reached end-of-life status. If you must use an older version, you can continue using v1.1.2.

v1.1.2: Python 3.10 Support, Cleaner Builds

16 Feb 05:43

This release adds long-overdue wheels for Python 3.10. There are no changes in functionality or API.

On the backend, in addition to fixing issues with the 3.10 version number and quoting, the build process was cleaned up considerably. Many thanks to @lambdadog for the bugfixes and cleanup!

This release does not include wheels for M1 Macs - those may be working, but I've been unable to confirm it. See #55 for details or to help out.

v1.1.1: Bug Fixes and API Cleanup

24 Jul 10:45

This release has a number of stability and API improvements.

Note that the fix to #38 has a number of side effects that may need more extensive evaluation. In particular:

  • memory use will grow very slowly over the life of a Tagger object
  • execution speed will be a bit slower, up to around 10%

It's expected that these will both be addressed before long; despite the issues, the current fix has been deemed suitable for a release because in the vast majority of use cases it will behave more correctly than the previous release.

Experimental Support for Dictionary Building Added

25 Jan 06:31

One feature fugashi hasn't had until now is the ability to build user dictionaries. This feature can be important for improving tokenization quality in many applications. This release adds fugashi-build-dict, a wrapper for MeCab's mecab-dict-index command. You can use it like this:

fugashi-build-dict -d [system-dic-dir] -u mydic.dic input.csv

If you're familiar with MeCab's user dictionary creation process, nothing has changed; any feedback on use or any errors you encounter would be appreciated. If you're not familiar with the dictionary process, just wait a bit - a guide should be released soon.
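If you'd rather drive the build from a script, a minimal sketch might look like the following. The function names are hypothetical; the only thing taken from this release is the command line shown above, and the system dictionary directory is something you have to supply yourself.

```python
import subprocess


def build_dict_command(system_dic_dir, output_dic, input_csv):
    """Assemble the fugashi-build-dict command line shown above."""
    return ["fugashi-build-dict", "-d", system_dic_dir, "-u", output_dic, input_csv]


def build_user_dict(system_dic_dir, output_dic, input_csv):
    """Run the build, raising CalledProcessError if the command fails."""
    subprocess.run(
        build_dict_command(system_dic_dir, output_dic, input_csv),
        check=True,
    )
```

Keeping the command assembly separate from the `subprocess.run` call makes it easy to print or log the exact command when a build goes wrong.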

fugashi v1.0.0

28 Jun 05:08

fugashi v1.0 has arrived. 🎊

This release does not include any major changes to the code. The main purpose of this release is to make it clear that the API has reached a point where it can remain stable moving forward. While there will surely be more patches to clean things up or add minor features, I don't have any major changes planned.

This release does include one small change: previously, __repr__ marked UNKs. This behavior is useful in some situations, but it's easier to add it to generic behavior than take it out, so I removed it. Now you can (mostly) reconstruct the input with ''.join([str(nn) for nn in nodes]).
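To make the "mostly" concrete, here is a small sketch; `reconstruct` and `round_trips` are illustrative names, not part of fugashi. Since MeCab does not include the whitespace between tokens in a node's surface, text containing spaces would not be expected to round-trip exactly.

```python
def reconstruct(nodes):
    """Join the string form of each node back into a single string."""
    return "".join(str(node) for node in nodes)


def round_trips(tagger, text):
    """Report whether tokenizing and rejoining gives back the original text.

    Assumption: calling tagger(text) returns the node list, as with a
    callable fugashi Tagger.
    """
    return reconstruct(tagger(text)) == text
```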

Thanks for using fugashi, and if there's anything you'd like to see in it please feel free to open an issue.

Command line scripts and callable Taggers

18 May 15:58

This isn't a drastic release, but since I've been dragging out the patch numbers it seemed like a good time to bump the minor version. This is v0.2.0! 🎉

The first feature in this release is the addition of command line scripts. Since it's possible to install fugashi without MeCab, you might not have a command-line binary. This fixes that so you can use fugashi as a replacement for mecab. There's also the fugashi-info script, which is similar to mecab -D in that it prints dictionary information. I hope it will be useful when dealing with bugs and installation issues.

The other feature is that Tagger instances are now callable. One of the best features of fugashi is that it makes it much easier to work with MeCab nodes, but the function associated with that - parseToNodeList - had an unfortunately long name. I didn't want to call it parse since that already has meaning in MeCab, but giving it a different name felt odd... so I realized the easiest thing is to make the Tagger instance itself callable. Here's an example of the change this makes possible:

from fugashi import Tagger
tagger = Tagger()
text = "麩菓子は、麩を主材料とした日本の菓子。"  # any sample string works here

# before
for word in tagger.parseToNodeList(text):
    print(word.surface)

# after
for word in tagger(text):
    print(word.surface)

Feels better, doesn't it? I imagine this will be particularly helpful for compact expressions like list comprehensions. And parseToNodeList is still there, so existing code can be used unmodified.
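For instance, a list comprehension over a callable tagger might look like this. It's only a sketch: `surfaces` and `demo` are names invented for the example.

```python
def surfaces(tagger, text):
    """Collect the surface form of every token by calling the tagger directly."""
    return [word.surface for word in tagger(text)]


def demo():
    # Needs fugashi plus a dictionary (for example unidic-lite) installed.
    from fugashi import Tagger

    tagger = Tagger()
    print(surfaces(tagger, "麩菓子は、麩を主材料とした日本の菓子。"))
```

Because the helper only relies on the tagger being callable, anything with the same shape (including a stub in a unit test) can be passed in.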

Lately I've been working more on optimizing SudachiPy than fugashi, but there are still ease-of-use improvements to be made here, and if it works here it can be useful in other tokenizers too. If there's anything you'd like to see let me know.

Bundled UniDic Support

15 Apr 11:03

This release adds support for installing UniDic from PyPI, whether the easy-to-install unidic-lite or the full-fledged unidic package. Special thanks to @chezou for helping with testing on Windows, which had quoting issues due to backslashes in paths.

This release greatly simplifies installing and using fugashi. Assuming no major issues are found, the next release should be 1.0.0.

OSX Build Bugfix Release

02 Apr 14:08

This release includes a fix for builds on OSX. See #16 for details; thanks to @HiromuHota for the report and help with the fix.