JPS6132169A

JPS6132169A - Word extracting system

Info

Publication number: JPS6132169A
Application number: JP15234584A
Authority: JP
Inventors: Yasuyuki Numata; 泰之沼田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-07-23
Filing date: 1984-07-23
Publication date: 1986-02-14

Abstract

PURPOSE: To extract suffixed words registered in a word dictionary while limiting the loss of retrieval speed, by generating search strings from three units only when the third kanji sound may be a suffix. CONSTITUTION: A kanji sound segmentation section 3 first segments the input kana (Japanese syllabary) character string sent from an input character string storage section 2 into kanji sounds (kanji readings written in kana). An input character string attribute discriminating section 5 then determines, from the arrangement of kanji sounds and kana in the segmented string, whose minimum units are kanji sounds and kana, whether the third minimum unit may be a suffix. Only when it may be, the search strings are generated from three minimum units; otherwise they are generated from two minimum units, as in the conventional system.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field

The present invention relates to a kana-kanji conversion processing device, and more particularly to a word extraction method for a kana-kanji conversion processing device applicable to Japanese document preparation devices, computer systems and the like.

Prior Art

A kana-kanji conversion processing device is provided with a word dictionary for kana-kanji conversion in order to convert text entered in phonetic characters (hiragana, katakana, Roman letters) into appropriate mixed kanji-kana text. The word dictionary is searched by cutting a word out of the input kana character string to form a search string and matching that search string against the headword strings in the dictionary. Because Japanese grammar is complex and homophones are numerous, a dictionary search yields multiple candidate words.

To select one of these candidate words as the conversion result, the following processing has conventionally been performed. For each extracted candidate word, the possibility of connection with the preceding converted word (the previous conversion result) is judged. The connectable candidates are then evaluated using parameters such as reading length, frequency of occurrence and connection weight, and the candidate with the highest evaluation is output as the conversion result.

Conventionally, to make word dictionary searches easier and to reduce misanalysis, the input character string is preprocessed using kanji sounds.

When written in kana, kanji sounds fall into three classes by length: (1) one-character kanji sounds, one kana long; (2) two-character kanji sounds, two kana long; and (3) three-character kanji sounds, three kana long. For example:

(1) One-character kanji sounds: most kana, such as ア (a) for 亜, and イ (i) for 以, 意, 位, 医, 異, etc.
(2) Two-character kanji sounds: アイ (ai) for 愛, 挨, 哀, etc., and アク (aku) for 悪, 握, etc.
(3) Three-character kanji sounds: シュウ (shuu) for 集, 収, 週, 衆, 終, 習, 修, 周, 就, etc., and ショウ (shou) for 相, 小, 省, 勝, 少, 商, 証, 消, 正, etc.

In the two-character and three-character kanji sounds above, the kana that can occupy the second and third positions are limited to the following 18 kinds.

イ, ウ, キ, ク, チ, ツ, ャ, ュ, ョ, ュウ, ョウ, ャク, ュク, ョク, ュツ, ュン, ッ, ン

However, not all 18 of these combine with a given first kana to form a kanji sound. For example, when the first character is ア (a):

アイ (ai): a kanji sound (see the example above)
アウ (au): not a kanji sound
アキ (aki): not a kanji sound
アク (aku): a kanji sound (see the example above)
アチ (achi): not a kanji sound

By segmenting the input character string into kanji sounds of two or more characters and creating search strings in those units, kana that are actually part of a kanji reading are no longer misanalyzed as case particles and the like. Furthermore, these kanji sounds are never used alone; they are always used in combination with other kanji sounds. Therefore, when an input string of three or more characters is matched against the kanji sounds and the first character matches but the second and third do not, it can be inferred that the first character is very likely not a kanji sound but a kana belonging to an attached word or the like. For this reason, the kanji sound table used for kanji sound lookup need not store every kanji sound, one-character kanji sounds included. It is sufficient to store the kanji sounds of two or more characters, as shown in FIGS. 3(a), 3(b) and 3(c).

Conventionally, the following preprocessing is performed using a kanji sound table such as that shown in FIGS. 3(a), 3(b) and 3(c).

For example, the kanji sound table is accessed using the input kana character string 「ぶんしょうのさくせいがひじょうによういである。」 (bunshou no sakusei ga hijou ni youi de aru, "creating documents is very easy"), and the string is segmented as follows, with kanji sounds and kana as the minimum units.

「ブン／ショウ／の／サク／セイ／が／ひ／ジョウ／に／ヨウ／い／で／あ／る／。」

Here katakana denotes a kanji sound and hiragana denotes a kana.
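The segmentation step can be pictured with a minimal Python sketch. The table contents (`KANJI_SOUNDS`), the `"S"`/`"K"` labels and the longest-first matching order are all assumptions made for illustration; the text only states that the table holds kanji sounds of two or more kana and that unmatched kana become single units.

```python
# Hypothetical excerpt of the kanji sound table: two- and three-kana readings only.
KANJI_SOUNDS = {"ぶん", "しょう", "さく", "せい", "じょう", "よう"}

def segment(kana, table=KANJI_SOUNDS):
    """Split a kana string into minimum units.

    Returns (text, kind) pairs: kind "S" marks a kanji sound found in the
    table, kind "K" marks a single kana that matched no table entry.
    """
    units = []
    i = 0
    while i < len(kana):
        for length in (3, 2):              # assumption: try the longer reading first
            piece = kana[i:i + length]
            if len(piece) == length and piece in table:
                units.append((piece, "S"))
                i += length
                break
        else:                              # nothing in the table starts here
            units.append((kana[i], "K"))
            i += 1
    return units

units = segment("ぶんしょうのさくせいがひじょうによういである。")
print("／".join(text for text, _ in units))
```

With this small table the sketch reproduces the segmentation of the example sentence; a real table would be far larger and could segment other inputs differently.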

After this segmentation, an attribute is assigned to the input character string according to the arrangement of its kanji sounds and kana, as follows.

(kanji sound) + (kanji sound) + ... : TYPE1
(kanji sound) + (kana) + ... : TYPE2
(kana) + (kanji sound) + ... : TYPE3
(kana) + (kana) + ... : TYPE4

Since 「ブン＋ショウ＋...」 is (kanji sound) + (kanji sound) + ..., the example sentence above is TYPE1.
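The four attributes depend only on the kinds of the first two minimum units, so the classification can be sketched as a small lookup. The function name `classify` and the `"S"` (kanji sound) / `"K"` (kana) labels are illustrative assumptions, and the sketch assumes the string has at least two units.

```python
# Kinds of the first two minimum units -> TYPE number.
TYPE_BY_KINDS = {
    ("S", "S"): 1,   # (kanji sound) + (kanji sound)
    ("S", "K"): 2,   # (kanji sound) + (kana)
    ("K", "S"): 3,   # (kana) + (kanji sound)
    ("K", "K"): 4,   # (kana) + (kana)
}

def classify(units):
    """Return 1-4 for a list of (text, kind) units with at least two entries."""
    first, second = units[0][1], units[1][1]
    return TYPE_BY_KINDS[(first, second)]

example = [("ぶん", "S"), ("しょう", "S"), ("の", "K")]
print(classify(example))
```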

Next, search strings are created according to the attribute, TYPE1 to TYPE4, of the input character string, as follows.

TYPE1: (1) (kanji sound) + (kanji sound); (2) (kanji sound)
TYPE2: (1) (kanji sound) + (kana); (2) (kanji sound)
TYPE3: (1) (kana) + (kanji sound); (2) (kana)
TYPE4: (1) (kana); (2) (kana) + (kana); (3) (kana) + (kana) + (kana)

Since the example sentence above is TYPE1, its search strings are set as follows.

[Search strings set]: (1) ぶんしょう; (2) ぶん

Next, the word dictionary is searched with the search strings thus set, the resulting group of candidate words is evaluated, and the most suitable candidate word is selected.
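The conventional rule for building search strings from the attribute can be sketched as follows. The function name and the `"S"`/`"K"` labels are illustrative assumptions. For TYPE4 the sketch also assumes that the three-kana string is produced only when the third unit is a kana, which the text does not spell out.

```python
def conventional_search_strings(units):
    """Search strings of the conventional method, in the order listed in the text.

    units: list of (text, kind) pairs with at least two entries,
           kind "S" = kanji sound, kind "K" = kana.
    """
    texts = [text for text, _ in units]
    kinds = [kind for _, kind in units]

    if kinds[:2] == ["K", "K"]:
        # TYPE4: one kana, two kana, then three kana (assumed: only if the third is a kana).
        run = 3 if kinds[2:3] == ["K"] else 2
        return ["".join(texts[:n]) for n in range(1, run + 1)]

    # TYPE1 to TYPE3: the first two minimum units, then the first unit alone.
    return ["".join(texts[:2]), texts[0]]

print(conventional_search_strings([("ぶん", "S"), ("しょう", "S"), ("の", "K")]))
print(conventional_search_strings([("の", "K"), ("さく", "S"), ("せい", "S")]))
```

The two calls correspond to the TYPE1 example and to the TYPE3 string that remains once the first word has been consumed.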

Suppose here that 「文章」 (bunshou, "document") is extracted as the optimal candidate word. The next character string to be analyzed is then 「のさくせいがひじょうによういである。」, so it is again segmented with kanji sounds and kana as the minimum units.

「の／サク／セイ／が／ひ／ジョウ／に／ヨウ／い／で／あ／る。」

Since 「の＋サク＋...」 is (kana) + (kanji sound) + ..., the attribute of this input character string is TYPE3. Creating search strings according to the TYPE3 attribute gives the following.

[Search strings set]: (1) のさく; (2) の

Thereafter, the remaining input character string is segmented in the same way, with kanji sounds and kana as the minimum units, and appropriate search strings are created according to the attribute determined by the arrangement of kanji sounds and kana.

Search strings are created from two minimum units in order to speed up the dictionary search, in view of the fact that most words registered in the word dictionary correspond to two or fewer minimum units.

However, the above method has the following drawback when the input character string contains a suffixed word.

The input character string 「せつめいかいでは...」 (setsumeikai de wa ..., "at the briefing session ...") is used as an example.

In this case, preprocessing with kanji sounds segments the string as 「セツ／メイ／カイ／で／は...」. Since 「セツ＋メイ＋...」 is (kanji sound) + (kanji sound) + ..., the string belongs to TYPE1, and the search strings set are (1) せつめい and (2) せつ.

Therefore, even if 「説明会」 (setsumeikai, "briefing session") is registered in the word dictionary, it is never a search target. To obtain 「説明会」 as the kana-kanji conversion result, the word dictionary must be searched (analyzed) at least twice, once for 「せつめい」 and once for 「かい」. Moreover, the two rounds of analysis do not necessarily select the suffix 「会」; 「回」, 「界」, 「開」, 「解」, 「改」 or the like may be selected instead, so the likelihood of misanalysis also rises.

Object

The object of the present invention is to eliminate the above drawbacks of the prior art and, when extracting words in a kana-kanji conversion processing device, to extract suffixed words registered in the word dictionary reliably and effectively while limiting any loss of word dictionary search speed.

Constitution

To achieve the above object, the word extraction method according to the present invention is characterized in that a kana-kanji conversion processing device having means for segmenting an input character string into minimum units consisting of kanji sounds and kana is provided with: first means for determining the arrangement of kanji sounds and kana in the input character string segmented by said means; second means for determining whether a suffix exists with the same reading as a kanji sound of the character string to be analyzed; and third means for setting search strings from three minimum units when at least the first or second minimum unit of the character string to be analyzed is a kanji sound and the third is a kanji sound with the same reading as a suffix.

Because the input character string is analyzed sequentially, the portion of it that is currently to be analyzed naturally changes from one step to the next.

In this specification, the input character string currently to be analyzed is called the character string to be analyzed.

The constitution of the present invention is described in detail below by way of an embodiment.

FIG. 1 is a block diagram of a kana-kanji conversion processing device to which a word extraction method according to an embodiment of the present invention is applied.

In FIG. 1, 1 is a kana character input section for entering the kana character string corresponding to the Japanese sentence to be created; 2 is an input character string storage section that temporarily stores the entered kana character string; 3 is a kanji sound segmentation section that segments the input character string into minimum units consisting of kanji sounds and kana; 4 is a kanji sound table like that shown in FIGS. 3(a), 3(b) and 3(c); 5 is an input character string attribute determination section that determines the attribute of the segmented kana character string from the arrangement of its kanji sounds and kana; 6 is a kanji sound attribute determination section that determines an attribute according to whether a suffix exists with the same reading as a kanji sound in the character string to be analyzed; 7 is a search string setting section that, based on the information obtained from the input character string attribute determination section 5 and the kanji sound attribute determination section 6, sets search strings so that words containing suffixes are extracted; 9 is a word dictionary; 8 is a dictionary search section that searches the word dictionary 9 with the search strings; 10 is a candidate word storage section that stores the candidate words obtained by the dictionary search section 8; 11 is a candidate word evaluation section that evaluates the candidate words and selects the most suitable one; 12 is an optimal candidate word storage section that stores the optimal candidate word selected by the candidate word evaluation section 11; and 13 is a display section that displays the optimal candidate word as the kana-kanji conversion result.

The kanji sound attribute determination section 6 determines which of the following two attributes each kanji sound in the character string to be analyzed belongs to.

Kanji sound attribute A: a suffix exists with the same reading as the kanji sound.

Examples: てき (的), かい (会), etc.

Kanji sound attribute B: no suffix exists with the same reading as the kanji sound.

Examples: あく, せつ, いき, etc.

FIG. 2 is a flowchart showing the operation of the word extraction method according to an embodiment of the present invention.

First, the kanji sound segmentation section 3 uses the kanji sound table 4 to segment the input kana character string sent from the input character string storage section 2 by kanji sounds (201). The input character string attribute determination section 5 determines whether the attribute of the segmented input character string (the character string to be analyzed), as given by the arrangement of its kanji sounds and kana, is TYPE1, TYPE2 or TYPE3 (202). If it is none of these, the search strings are set by the conventional method (203). If it is TYPE1, TYPE2 or TYPE3, the search string setting process specific to this embodiment, described below, is executed. The reason is that in the cases TYPE1: (kanji sound) + (kanji sound) + ..., TYPE2: (kanji sound) + (kana) + ..., and TYPE3: (kana) + (kanji sound) + ..., the third minimum unit may be a suffix.

Because this embodiment uses the kanji sound attributes A and B described above, after the determination in step 202 it first determines whether the character string to be analyzed is TYPE1 and, if so, whether its third minimum unit is a kanji sound (204, 205). If the third minimum unit is a kanji sound, it further determines whether that kanji sound has attribute A and, if so, creates search strings from three minimum units (205, 206, 207). The three search strings are: (1) (kanji sound) + (kanji sound) + (kanji sound of attribute A); (2) (kanji sound) + (kanji sound); (3) (kanji sound).

For example, for the input character string 「せつめいかいでは...」, segmentation by kanji sounds gives 「セツ／メイ／カイ／で／は...」, and the search strings are set as (1) せつめいかい, (2) せつめい and (3) せつ.

Of these three search strings, (1) せつめいかい has the longest reading and therefore receives the highest evaluation, so if 「説明会」 is registered in the word dictionary 9, the word 「説明会」 with the suffix 「会」 is selected. 「説明会」 is of course also selected when the longest-match method is used.
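The effect of adding the three-unit search string can be seen in a short sketch. The dictionary entries below and the rule "longest matching reading wins" are simplifying assumptions. The text evaluates candidates on reading length, frequency and connection weight together, and only reading length is modelled here.

```python
# Hypothetical excerpt of the word dictionary: reading -> candidate words.
WORD_DICTIONARY = {
    "せつめいかい": ["説明会"],
    "せつめい": ["説明"],
    "せつ": ["説", "節"],
}

def select_by_reading_length(search_strings, dictionary):
    """Look up every search string and keep the hit with the longest reading."""
    hits = [(reading, dictionary[reading])
            for reading in search_strings if reading in dictionary]
    if not hits:
        return None
    return max(hits, key=lambda hit: len(hit[0]))

# Search strings built from three minimum units versus the conventional two.
print(select_by_reading_length(["せつめいかい", "せつめい", "せつ"], WORD_DICTIONARY))
print(select_by_reading_length(["せつめい", "せつ"], WORD_DICTIONARY))
```

Under these assumptions the first call reaches 説明会 in a single pass, while the second can only return 説明 and leaves かい for a later pass.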

The possibility of being a suffix (attribute A) is checked only for the third kanji sound because, even if the first or second minimum unit is a suffix, it can already be extracted by the conventional method.

When the third minimum unit of a TYPE1 string is not a kanji sound, or is a kanji sound whose attribute is not A, the search strings are set by the conventional method.

As described above, search strings are created from three minimum units only when the third kanji sound may be a suffix, and from two minimum units, as before, in all other cases. This limits the loss of search processing speed for the word dictionary 9.

If the string is not TYPE1, it is determined whether it is TYPE2 and, if so, whether its third minimum unit is a kanji sound (204, 208, 209). If the third minimum unit is a kanji sound, it is further determined whether that kanji sound has attribute A and, if so, search strings are created from three minimum units (209, 210, 211).

The three search strings in this case are: (1) (kanji sound) + (kana) + (kanji sound of attribute A); (2) (kanji sound) + (kana); (3) (kanji sound).

When the third minimum unit of a TYPE2 string is not a kanji sound, or is a kanji sound whose attribute is not A, the search strings are set by the conventional method.

If step 208 determines that the string is not TYPE2, the character string to be analyzed can be identified as TYPE3, so it is determined whether its third minimum unit is a kanji sound (208, 212). If it is a kanji sound, its attribute is then examined and, if the attribute is A, search strings are created from three minimum units (212, 213, 214). The three search strings in this case are: (1) (kana) + (kanji sound) + (kanji sound of attribute A); (2) (kana) + (kanji sound); (3) (kana).

Finally, the word dictionary 9 is searched with the search strings thus obtained (215).
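The decision flow of steps 202 to 214 can be condensed into one function, sketched below. All names, the `"S"` (kanji sound) / `"K"` (kana) labels and the contents of `SUFFIX_READINGS` are illustrative assumptions. The sketch assumes at least two minimum units and assumes that the shorter strings for TYPE1 to TYPE3 are those of the conventional method. The three per-type branches of the flowchart are folded into a single test because they apply the same rule.

```python
# Hypothetical set of readings with attribute A (a suffix shares the reading).
SUFFIX_READINGS = {"かい", "てき"}

def set_search_strings(units, suffix_readings=SUFFIX_READINGS):
    """Return the search strings for the head of the character string to be analyzed.

    units: list of (text, kind) pairs, kind "S" = kanji sound, "K" = kana.
    """
    texts = [text for text, _ in units]
    kinds = [kind for _, kind in units]

    # Steps 202-203: TYPE4 (kana + kana) keeps the conventional strings.
    if kinds[:2] == ["K", "K"]:
        run = 3 if kinds[2:3] == ["K"] else 2
        return ["".join(texts[:n]) for n in range(1, run + 1)]

    # TYPE1 to TYPE3: start from the conventional two-unit and one-unit strings.
    strings = ["".join(texts[:2]), texts[0]]

    # Steps 205-206, 209-210, 212-213: third unit is a kanji sound with attribute A?
    if kinds[2:3] == ["S"] and texts[2] in suffix_readings:
        # Steps 207, 211, 214: put the three-unit string first.
        strings.insert(0, "".join(texts[:3]))
    return strings

print(set_search_strings([("せつ", "S"), ("めい", "S"), ("かい", "S"), ("で", "K")]))
print(set_search_strings([("ぶん", "S"), ("しょう", "S"), ("の", "K")]))
```

Only the first call produces a third, longer search string, so the extra dictionary lookup is paid only where a suffixed word is possible.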

Thus, by creating search strings from three minimum units only when the third minimum unit is a kanji sound that may be a suffix, suffixed words registered in the word dictionary can be extracted while limiting the loss of search processing speed.

Effects

As explained above, the word extraction method of the present invention makes it possible, when extracting words in a kana-kanji conversion processing device, to extract suffixed words registered in the word dictionary reliably and effectively while limiting any loss of word dictionary search speed.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a kana-kanji conversion processing device to which a word extraction method according to an embodiment of the present invention is applied, FIG. 2 is a flowchart showing the operation of the word extraction method according to an embodiment of the present invention, and FIG. 3 is a diagram showing the kanji sound table.

3: kanji sound segmentation section, 4: kanji sound table, 5: input character string attribute determination section, 6: kanji sound attribute determination section, 7: search string setting section, 8: dictionary search section, 9: word dictionary.

Claims

(1) A word extraction method for a kana-kanji conversion processing device having means for segmenting an input character string into minimum units consisting of kanji sounds and kana, characterized by providing: first means for determining the arrangement of kanji sounds and kana in the input character string segmented by said means; second means for determining whether a suffix exists with the same reading as a kanji sound of the character string to be analyzed; and third means for setting search strings from three minimum units when at least the first or second minimum unit of the character string to be analyzed is a kanji sound and the third minimum unit is a kanji sound with the same reading as a suffix.