pykakasi
is a Python Natural Language Processing (NLP) library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet). It can handle characters in NFC form.
Its algorithms are based on the kakasi library, which is written in C.
- Install (from PyPI):
pip install pykakasi
- Install (from conda-forge):
conda install -c conda-forge pykakasi
- Documentation available on readthedocs
- pykakasi supports Python 3.6, 3.7, 3.8, 3.9, and PyPy3
Transliterate Japanese text to katakana (the `kana` key), hiragana and rōmaji:
import pykakasi
kks = pykakasi.kakasi()
text = "かな漢字"
result = kks.convert(text)
for item in result:
    print("{}: kana '{}', hiragana: '{}', romaji: '{}'".format(item['orig'], item['kana'], item['hira'], item['hepburn']))
かな: kana 'カナ', hiragana: 'かな', romaji: 'kana'
漢字: kana 'カンジ', hiragana: 'かんじ', romaji: 'kanji'
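Because `convert()` returns one dictionary per segment, the pieces can be joined into a single romanized string. The following is a minimal sketch: `join_romaji` is a hypothetical helper, not part of pykakasi, and the sample list is copied from the output shown above (restricted to the keys used there) so that the snippet runs without the library installed.

```python
def join_romaji(items, sep=" "):
    """Join the 'hepburn' field of each converted segment into one string."""
    return sep.join(item["hepburn"] for item in items)


# Stand-in for kks.convert("かな漢字"), copied from the output above.
sample = [
    {"orig": "かな", "kana": "カナ", "hira": "かな", "hepburn": "kana"},
    {"orig": "漢字", "kana": "カンジ", "hira": "かんじ", "hepburn": "kanji"},
]

print(join_romaji(sample))  # prints: kana kanji
```

In real use, pass the list returned by `kks.convert(text)` instead of `sample`.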
The following example produces output similar to furigana mode:
import pykakasi
kks = pykakasi.kakasi()
text = "かな漢字交じり文"
result = kks.convert(text)
for item in result:
    print("{}[{}] ".format(item['orig'], item['hepburn'].capitalize()), end='')
print()
かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
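The same per-segment result could also be used to build HTML ruby annotations with hiragana readings. This is only a sketch under stated assumptions: `to_ruby` is a hypothetical helper, not a pykakasi function, and the sample data is hard-coded from the first example's output rather than produced by the library.

```python
from html import escape


def to_ruby(items):
    """Wrap segments whose hiragana reading differs from the original in <ruby> tags."""
    parts = []
    for item in items:
        orig, reading = item["orig"], item["hira"]
        if orig == reading:
            # Already hiragana: no annotation needed.
            parts.append(escape(orig))
        else:
            parts.append(
                "<ruby>{}<rt>{}</rt></ruby>".format(escape(orig), escape(reading))
            )
    return "".join(parts)


# Stand-in for kks.convert("かな漢字"), copied from the first example's output.
sample = [
    {"orig": "かな", "kana": "カナ", "hira": "かな", "hepburn": "kana"},
    {"orig": "漢字", "kana": "カンジ", "hira": "かんじ", "hepburn": "kanji"},
]

print(to_ruby(sample))  # prints: かな<ruby>漢字<rt>かんじ</rt></ruby>
```

Note that under this simple rule a segment written in katakana would also receive a hiragana annotation, since its `orig` and `hira` values differ.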
Benchmark results for various versions and platforms are available at miurahr#123.
- PyKakasi:
  Copyright (C) 2010-2021 Hiroshi Miura and contributors (see AUTHORS)
- KAKASI Dictionary:
  Copyright (C) 2010-2021 Hiroshi Miura and contributors (see AUTHORS)
  Copyright (C) 1992 1993 1994 Hironobu Takahashi, Masahiko Sato, Yukiyoshi Kameyama, Miki Inooka, Akihiko Sasaki, Dai Ando, Junichi Okukawa, Katsushi Sato and Nobuhiro Yamagishi
- UniDic:
  Copyright (c) 2011-2021, The UniDic Consortium
  All rights reserved.
UniDic is released under any of the GPL2, the LGPL2.1, or the 3-clause BSD License (see src/data/unidic/BSD.txt). PyKakasi relicenses a part of UniDic under GPL3+.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.