Monaka

A Japanese parser (including support for historical Japanese)

Installation

Parse

First, download and install the appropriate UniDic dictionary:

monaka download wabun

Available dictionaries:

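name	description
gendai	現代書き言葉 (contemporary written Japanese)
spoken	現代話し言葉 (contemporary spoken Japanese)
novel	近現代口語小説 (modern colloquial fiction)
qkana	旧仮名口語 (colloquial Japanese in historical kana orthography)
kindai	近代文語 (modern literary Japanese)
kinsei	近世江戸口語 (early modern Edo colloquial Japanese)
kyogen	中世口語 (medieval colloquial Japanese)
wakan	中世文語 (medieval literary Japanese)
wabun	中古和文 (Heian-period wabun prose)
manyo	上代語 (Old Japanese)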
manyo	上代語

Then, run the parse command:

monaka parse {model} 今日はいい天気ですね

Output:

{
  "tokens": [
    "今日",
    "は",
    "いい",
    "天気",
    "です",
    "ね"
  ],
  "pos": [
    "名詞-普通名詞-副詞可能",
    "助詞-係助詞",
    "形容詞-非自立可能-形容詞",
    "名詞-普通名詞-一般",
    "助動詞-助動詞-デス",
    "助詞-終助詞"
  ],
  "luw": [
    "名詞-普通名詞-一般",
    "助詞-係助詞",
    "形容詞-一般-形容詞",
    "名詞-普通名詞-一般",
    "助動詞-助動詞-デス",
    "助詞-終助詞"
  ],
  "chunk": [
    "B",
    "I",
    "B",
    "B",
    "I",
    "I"
  ],
  "sentence": "今日はいい天気ですね"
}
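In the example above, "tokens", "pos", "luw", and "chunk" each have one entry per short-unit word (SUW), so they can be read side by side. The following is a minimal Python sketch of consuming this output, assuming that monaka parse prints the JSON object shown above to standard output. The helpers run_monaka() and align() are hypothetical and not part of monaka.

```python
import json
import subprocess

# The example output above, inlined so the sketch runs without a model.
SAMPLE = {
    "tokens": ["今日", "は", "いい", "天気", "です", "ね"],
    "pos": ["名詞-普通名詞-副詞可能", "助詞-係助詞", "形容詞-非自立可能-形容詞",
            "名詞-普通名詞-一般", "助動詞-助動詞-デス", "助詞-終助詞"],
    "luw": ["名詞-普通名詞-一般", "助詞-係助詞", "形容詞-一般-形容詞",
            "名詞-普通名詞-一般", "助動詞-助動詞-デス", "助詞-終助詞"],
    "chunk": ["B", "I", "B", "B", "I", "I"],
    "sentence": "今日はいい天気ですね",
}


def run_monaka(model: str, text: str) -> dict:
    """Hypothetical wrapper: run `monaka parse` and decode its stdout as JSON.

    Assumes the CLI prints a single JSON object, as in the example above.
    """
    completed = subprocess.run(
        ["monaka", "parse", model, text],
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)


def align(result: dict) -> list[tuple[str, str, str, str]]:
    """Zip the parallel per-SUW lists into (token, pos, luw, chunk) rows."""
    keys = ("tokens", "pos", "luw", "chunk")
    lengths = {len(result[k]) for k in keys}
    if len(lengths) != 1:
        raise ValueError(f"lists have mismatched lengths: {lengths}")
    return list(zip(*(result[k] for k in keys)))


if __name__ == "__main__":
    for row in align(SAMPLE):
        print("\t".join(row))
```

Checking the lengths before zipping matters because zip() silently truncates to the shortest list, which would hide a malformed result.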

You can also specify the output format with --output-format ("bunsetsu-split" or "luw-split"):

monaka parse {model} 今日はいい天気ですね --output-format bunsetsu-split

今日は いい 天気ですね
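In the example, this split matches the "chunk" field of the JSON output: each "B" appears to start a new bunsetsu and each "I" to append the token to the current one. The following is a minimal sketch of that grouping, assuming this B/I reading of "chunk" (inferred from the example, not taken from monaka's source). bunsetsu_split() is a hypothetical helper.

```python
def bunsetsu_split(tokens: list[str], chunk: list[str]) -> list[str]:
    """Group SUW tokens into bunsetsu using B/I chunk labels.

    Assumes "B" begins a new bunsetsu and "I" continues the current one.
    A leading "I" (no open bunsetsu yet) is treated as a start.
    """
    if len(tokens) != len(chunk):
        raise ValueError("tokens and chunk must have the same length")
    groups: list[str] = []
    for token, label in zip(tokens, chunk):
        if label == "B" or not groups:
            groups.append(token)  # start a new bunsetsu
        else:
            groups[-1] += token  # continue the current bunsetsu
    return groups


tokens = ["今日", "は", "いい", "天気", "です", "ね"]
chunk = ["B", "I", "B", "B", "I", "I"]
print(" ".join(bunsetsu_split(tokens, chunk)))  # 今日は いい 天気ですね
```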

Model download

The author provides trained models upon request. Please contact the author.

Training monaka model

LUW and Bunsetsu tokenizer/chunker

Creating Dataset

A dataset should be in JSON Lines (JSONL) format, with each line containing a JSON object with the following fields:

    {
        "sentence": "str", 
        "tokens": ["a list of SUW"],
        "pos": ["POS-tag labels for each SUW"],
        "labels": ["Target labels for each SUW"]
    }
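The following is a minimal Python sketch of writing and checking records in this format. The field names come from the schema above. The example "labels" reuse the B/I bunsetsu labels from the parse output purely for illustration, since the actual label set depends on the model being trained. write_jsonl() and validate_record() are hypothetical helpers, and the rule that "pos" and "labels" have exactly one entry per token is an assumption based on "for each SUW".

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("sentence", "tokens", "pos", "labels")


def validate_record(record: dict) -> None:
    """Check the expected fields and one POS tag / label per SUW (assumed rule)."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    n = len(record["tokens"])
    if len(record["pos"]) != n or len(record["labels"]) != n:
        raise ValueError("tokens, pos and labels must have the same length")


def write_jsonl(records: list[dict], path: Path) -> None:
    """Write one JSON object per line, keeping Japanese text unescaped."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            validate_record(record)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


record = {
    "sentence": "今日はいい天気ですね",
    "tokens": ["今日", "は", "いい", "天気", "です", "ね"],
    "pos": ["名詞-普通名詞-副詞可能", "助詞-係助詞", "形容詞-非自立可能-形容詞",
            "名詞-普通名詞-一般", "助動詞-助動詞-デス", "助詞-終助詞"],
    # Illustrative labels only (bunsetsu B/I); real labels depend on the task.
    "labels": ["B", "I", "B", "B", "I", "I"],
}
```

For example, write_jsonl([record], Path("train.jsonl")) writes a one-line dataset file. ensure_ascii=False keeps the Japanese text readable in the file instead of \u escapes.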

We provide a data conversion script for UD-Japanese data. Here is an example command that converts the UD-Japanese-GSD training data:

monaka_train ud2jsonl ja_gsd-ud-train.conllu ja_gsd-ud-train.jsonl
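To illustrate what such a conversion involves, the following minimal Python sketch reads CoNLL-U and fills the "sentence", "tokens", and "pos" fields. It is not monaka's ud2jsonl. It assumes that the sentence text is in the "# text =" comment, that SUWs come from the FORM column, and that tags come from the XPOS column. It leaves out "labels" because their derivation is specific to monaka's converter. read_conllu() is a hypothetical name.

```python
def read_conllu(text: str) -> list[dict]:
    """Parse CoNLL-U text into records with sentence, tokens and pos fields."""
    records: list[dict] = []
    sentence, tokens, pos = None, [], []

    def flush() -> None:
        nonlocal sentence, tokens, pos
        if tokens:
            # Fallback when there is no "# text =" comment: join the SUWs
            # without spaces (an assumption that suits Japanese text).
            records.append({
                "sentence": sentence if sentence is not None else "".join(tokens),
                "tokens": tokens,
                "pos": pos,
            })
        sentence, tokens, pos = None, [], []

    for line in text.splitlines():
        if not line.strip():
            flush()  # a blank line ends a sentence
        elif line.startswith("#"):
            if line.startswith("# text ="):
                sentence = line[len("# text ="):].strip()
        else:
            cols = line.split("\t")
            if len(cols) != 10:
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            token_id, form, xpos = cols[0], cols[1], cols[4]
            if "-" in token_id or "." in token_id:
                continue  # skip multiword-token ranges and empty nodes
            tokens.append(form)
            pos.append(xpos)
    flush()  # handle input without a trailing blank line
    return records
```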

After creating the dataset files, create the label and POS-tag dictionaries:

monaka_train create-vocab [output_dir] ja_gsd-ud-train.jsonl ja_gsd-ud-dev.jsonl ja_gsd-ud-test.jsonl
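The exact files that create-vocab writes to [output_dir] are not documented here. As an illustration of the general idea, the following sketch collects the distinct "labels" and "pos" values across one or more JSONL files and assigns each an integer ID. The function name build_vocabs() and the sorted-order ID scheme are assumptions, not monaka's format.

```python
import json
from pathlib import Path


def build_vocabs(paths: list[Path]) -> tuple[dict[str, int], dict[str, int]]:
    """Return (label_to_id, pos_to_id) built from all records in the JSONL files."""
    labels: set[str] = set()
    pos_tags: set[str] = set()
    for path in paths:
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines
                record = json.loads(line)
                labels.update(record["labels"])
                pos_tags.update(record["pos"])
    # Sort so that IDs are deterministic across runs.
    label_to_id = {label: i for i, label in enumerate(sorted(labels))}
    pos_to_id = {tag: i for i, tag in enumerate(sorted(pos_tags))}
    return label_to_id, pos_to_id
```

Passing the dev and test files as well, as the command above does, presumably ensures that every tag seen at evaluation time already has an ID.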

Repository: komiya-lab/monaka

Folders and files

Latest commit

History

Repository files navigation

Monaka

Installation

Parse

Model download

Training monaka model

LUW and Bunsetsu tokenizer/chunker

Creating Dataset

Creating training configuration JSON file