JP5743938B2

JP5743938B2 - Associative search system, associative search server, and program

Info

Publication number: JP5743938B2
Application number: JP2012069750A
Authority: JP
Inventors: 修今一
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-03-26
Filing date: 2012-03-26
Publication date: 2015-07-01
Anticipated expiration: 2032-03-26
Also published as: JP2013200795A

Description

The present invention relates to an associative search system that retrieves documents related to a document given as a search request, and in particular to an associative search system and an associative search server that use information on the positions at which feature words appear in the given document, and to a program that implements them.

With the spread of computers and the Internet, document information is rapidly being digitized. As the amount of available information grows, finding the necessary information within it has become an important problem. There is also growing demand for examining how groups of documents in different document databases relate to one another. For example, users often want to find the encyclopedia entries related to newspaper articles that interest them.

Keyword search technology in practical use today can search several document databases by switching between them. It cannot, however, take a group of documents contained in one document database and retrieve the related documents from the same or another document database (a search method called document associative search).

Within a single document database, document associative search with a document group as the search input can be realized by computing the similarity between documents in advance. Across multiple document databases, however, the number of document pairs whose relevance must be computed in advance grows explosively as the number of document databases increases. Realizing document associative search by precomputing the similarity between documents is therefore not feasible.

To address this, Patent Document 1 discloses a method for efficiently retrieving, from any document database, the documents related to an arbitrary group of documents that a user designates in a document database.

The method disclosed in Patent Document 1 achieves fast document associative search by using only the characteristic words (the feature word group) in the search input, which is given as a document group. With this method, a user can examine the relatedness of document groups while switching among several different kinds of document databases, and can retrieve documents accurately and efficiently. The method also extracts the feature words that appear in the retrieved documents and presents them to the user as an overview (summary) of the search results, which helps the user judge whether the results are acceptable.

Patent Document 1: JP 2000-155758 A

In general, word-based document retrieval indexes each document by the words that appear in it. Patent Document 1 does the same: when extracting a feature word group from a document, it computes the importance of each word with a statistical measure (the tf*idf method being a typical example), extracts words in descending order of importance, and performs the associative search with them.

In conventional associative search, however, feature words are extracted from the document as a whole. When a document contains several topics, the words are therefore extracted with the feature words of those topics mixed together. In other words, similar documents are retrieved by judging the topics collectively, so the user does not necessarily obtain the desired results.

In view of this technical problem, the present invention provides an associative search system that can retrieve similar documents for each topic contained in the document group given as the search input.

To this end, when the present invention extracts a feature word group from a search input document in an associative search, it attaches to each word not only its importance but also its position information within the search input document. The extracted feature words are then classified on the basis of each word's importance and appearance position. The number of classes may be set automatically, by applying a threshold to the classification score used when classifying feature words according to their importance and their distance within the search input document, or the user may set it freely through the user interface. Finally, a search is executed with each of the resulting feature word groups as a search input.

According to the present invention, even when a document group containing several topics is used as the search input, the associative search returns documents similar to each classified feature word group (which corresponds to a topic contained in the documents) rather than documents similar to the document group as a whole. Results closer to what the user wants can therefore be presented. Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiments.

FIG. 1 is a diagram showing a conceptual configuration example of the associative search system.
FIG. 2 is a diagram showing a configuration example of an associative search slave server.
FIG. 3 is a diagram showing a configuration example of the search means.
FIG. 4 is a diagram showing a configuration example of the feature word extraction means.
FIG. 5 is a diagram showing an example of the distribution of feature words in a document.
FIG. 6 is a diagram showing an example of a method for extracting and classifying feature words.
FIG. 7 is a diagram showing an example of a method for extracting and classifying feature words.
FIG. 8 is a diagram showing an example of the initial screen of the search client.
FIG. 9 is a diagram showing a display example of search results in the search client.
FIG. 10 is a diagram showing a display example of search results in the search client.
FIG. 11 is a diagram showing a display example of search results in the search client.
FIG. 12 is a diagram showing a display example of the screen for confirming the feature word group classification result.
FIG. 13 is a diagram showing an example of indexing.
FIG. 14 is a diagram showing the flow of data and processing among the search client, the associative search master server, and the associative search slave server.
FIG. 15 is a diagram showing the flow of data and processing among the search client, the associative search master server, and the associative search slave server.
FIG. 16 is a diagram showing the flow of data and processing among the search client, the associative search master server, and the associative search slave server.

Embodiments of the present invention are described below with reference to the drawings. The invention is not limited to the embodiments described below, and various modifications are possible within the scope of its technical idea.

FIG. 1 shows the schematic configuration of an associative search system according to an embodiment. The system consists of a search client 20, which the user uses to enter search requests and view search results; associative search slave servers 40, 50, and 60, which search document databases; an associative search master server 30, which mediates between the search client 20 and the associative search slave servers 40, 50, and 60; and a communication network 10 that connects them.

FIG. 1 shows three associative search slave servers connected to the communication network 10, but any number of associative search slave servers may be connected. The number of search clients 20 is likewise arbitrary.

In the example of FIG. 1, the search client 20, the associative search master server 30, and the associative search slave servers 40, 50, and 60 are connected via the communication network 10, but some or all of them may be implemented on the same computer.

FIG. 2 shows a configuration example of the associative search slave server 40. The other associative search slave servers 50 and 60 have the same configuration. The associative search slave server 40 has a memory device 491, an arithmetic processing device 492, an interface device 493, an auxiliary storage device 494, an input device 495, and an output device 496, which are connected to one another via a bus 490.

The memory device 491 is a storage device such as a RAM (Random Access Memory) that holds programs read from the auxiliary storage device 494. It stores the programs corresponding to the search means 410 and the feature word extraction means 420, together with the files and data corresponding to the search index 430 and the document database 440 that those programs need.

The arithmetic processing device 492 is a processor such as a CPU (Central Processing Unit) that executes the programs stored in the memory device 491. The interface device 493 connects the server to an external network or the like. The auxiliary storage device 494 is a storage device such as an HDD (Hard Disk Drive) that stores the programs corresponding to the search means 410 and the feature word extraction means 420 and the files and data corresponding to the search index 430 and the document database 440. The input device 495 is an input device that provides a user interface (for example, a keyboard or a mouse). The output device 496 is an output device that provides a user interface (for example, a display device).

Although FIG. 2 shows the configuration of an associative search slave server, the search client 20 and the associative search master server 30 are configured in the same way, apart from the programs and data stored in their auxiliary storage devices.

FIG. 3 shows the functional block configuration of the search means 410 of the associative search slave server 40. The search means 410, which is a program, consists of a word frequency acquisition means 411, a position information acquisition means 412, a relevance calculation means 413, a proximity calculation means 414, and a score calculation means 415. Each of these means is likewise provided through program processing.

In response to a search request sent from the search request issuing means 320 of the associative search master server 30, the associative search slave server 40 retrieves highly relevant documents from the document database 440 and returns the search results, each with a relevance score, to the associative search master server 30. This search can be realized by, for example, a known keyword search technique.

To make search processing efficient, a keyword search technique splits the documents in the document database into words (by morphological analysis for Japanese documents and by stemming for English documents) and builds, in advance, a search index that records which words each document contains. When position information is also used at search time, as in the search method of this embodiment described later, the appearance positions of each word are stored in the index as well. Using the prebuilt search index at search time allows the search to be executed quickly.
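The index described above can be sketched as a positional inverted index. This is a minimal illustration, not the patent's implementation: the function name `build_positional_index` and the data layout (word to document ID to list of positions) are assumptions, and tokenization is assumed to have been done already.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Build a positional inverted index.

    `docs` maps a document ID to its list of tokens. Tokenization
    (morphological analysis or stemming) is assumed to be done upstream.
    Returns {word: {doc_id: [positions]}}.
    """
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for pos, word in enumerate(tokens):
            # Record every position at which the word occurs in this document.
            index[word].setdefault(doc_id, []).append(pos)
    return dict(index)


# Hypothetical sample data for illustration only.
docs = {
    "d1": ["search", "index", "search"],
    "d2": ["index", "word"],
}
index = build_positional_index(docs)
```

Storing positions alongside document IDs lets the same index answer both "which documents contain this word" and "where in the document does it occur", which is what the position-based steps later in the description rely on.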

In the case of FIG. 1, search indexes 430, 530, and 630 are built in advance for the document databases 440, 540, and 640 held by the associative search slave servers 40, 50, and 60, respectively, and are used in search processing.

The relevance between a search request and a search target document is calculated by the following procedure. First, the search means 410 receives the search request sent from the search request issuing means 320 of the associative search master server 30. The search means 410 retrieves the documents that contain the words included in the received search request. For each retrieved document, the word frequency acquisition means 411 obtains frequency information for those words in the document that also appear in the search request. The relevance calculation means 413 then calculates the relevance between the search request and the document. Any calculation method may be used. For example, the importance of each word is calculated by the known tf*idf method, and the sum is used as the relevance. When word proximity is to be reflected in the search score, the position information acquisition means 412 obtains the appearance positions of those words in the document that also appear in the search request, and the proximity calculation means 414 calculates a proximity score. Any calculation method may be used for the proximity score. For example, the means calculates how densely the words of the search request appear and uses the result as the proximity score. The score calculation means 415 combines the scores obtained from the relevance calculation means 413 and the proximity calculation means 414 and assigns the combined score to the document as its relevance.
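The scoring procedure above leaves the concrete formulas open, so the following is only a sketch under stated assumptions: relevance is the sum of tf*idf weights of the query words, the proximity score is a simple density (hits divided by the span that contains them), and the two are combined by a weighted sum with a hypothetical parameter `alpha`. None of these specific choices come from the source.

```python
import math


def tfidf_relevance(query_words, doc_tokens, doc_freq, num_docs):
    """Sum of tf*idf over the query words found in the document."""
    score = 0.0
    for word in set(query_words):
        tf = doc_tokens.count(word)
        df = doc_freq.get(word, 0)
        if tf and df:
            score += tf * math.log(num_docs / df)
    return score


def proximity_score(query_words, doc_tokens):
    """Density of query-word hits: hit count divided by the span covering them.

    This density measure is an assumption; the source allows any method.
    """
    wanted = set(query_words)
    positions = [i for i, w in enumerate(doc_tokens) if w in wanted]
    if len(positions) < 2:
        return 0.0
    span = positions[-1] - positions[0] + 1
    return len(positions) / span


def combined_score(relevance, proximity, alpha=0.5):
    """Weighted sum of the two scores (alpha is a hypothetical tuning knob)."""
    return relevance + alpha * proximity


# Query words that are adjacent score higher than the same words far apart.
dense = proximity_score(["a", "b"], ["a", "b", "x", "x"])
sparse = proximity_score(["a", "b"], ["a", "x", "x", "b"])
```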

FIG. 4 shows the functional block configuration of the feature word extraction means 420 of the associative search slave server 40. The feature word extraction means 420, which is a program, consists of a word frequency acquisition means 421, a position information acquisition means 422, an importance calculation means 423, a proximity clustering means 424, and a word addition means 425. Each of these means is likewise provided through program processing.

The feature word extraction means 420 extracts, from the document database 440, the feature words for the document group sent from the feature word request means 330 of the associative search master server 30. To extract feature words quickly, the feature word extraction means 420 uses the search index 430, as the search means 410 does. That is, it consults the search index 430 to find out which words a given document contains.

Feature words are extracted by the following procedure. First, the feature word extraction means 420 receives the document group sent from the feature word request means 330 of the associative search master server 30. The word frequency acquisition means 421 obtains frequency information for each word contained in the received document group. On the basis of this frequency information, the importance calculation means 423 calculates the importance of each word. Any calculation method may be used. For example, word importance is calculated by the known tf*idf method. In an associative search that does not use position information, the feature word extraction means 420 returns words to the associative search master server 30 as feature words in descending order of importance.
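The extraction step without position information can be sketched as follows. The tf*idf weighting is one example the source names; the function name `extract_feature_words`, the `top_k` cutoff, and the sample document frequencies are assumptions made for illustration.

```python
import math
from collections import Counter


def extract_feature_words(doc_tokens, doc_freq, num_docs, top_k=5):
    """Return the top_k (word, importance) pairs ranked by tf*idf.

    `doc_freq` maps a word to the number of documents that contain it.
    Words missing from `doc_freq` are skipped.
    """
    term_freq = Counter(doc_tokens)
    scored = []
    for word, count in term_freq.items():
        df = doc_freq.get(word, 0)
        if df == 0:
            continue
        scored.append((word, count * math.log(num_docs / df)))
    # Highest importance first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical corpus statistics: "the" occurs in every document.
tokens = ["apple", "apple", "the", "pear"]
doc_freq = {"apple": 2, "the": 100, "pear": 10}
features = extract_feature_words(tokens, doc_freq, num_docs=100, top_k=2)
feature_names = [word for word, _ in features]
```

A word that occurs in every document gets an importance of zero and falls to the bottom of the ranking, which is the usual reason for choosing an idf-style weight here.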

In this embodiment, the position information acquisition means 422 obtains appearance position information for each word to which an importance has been assigned. The proximity clustering means 424 then classifies the extracted words on the basis of their importance and position information. The word addition means 425 further adds, to each resulting class, words that appear close to the words in that class. The feature word extraction means 420 returns the set of feature word groups obtained in this way to the associative search master server 30. Use of the word addition means 425 is optional.

Next, the operation of the proximity clustering means 424 is described with reference to FIGS. 5, 6, and 7. FIG. 5 illustrates two documents, document 1 and document 2, that contain the same words. In document 1, the words (term1 to term6) are scattered throughout the document, whereas in document 2, term1 to term3 are concentrated in the first half of the document and term4 to term6 in the second half.

Because conventional associative search ignores where feature words appear, an associative search with document 1 as the search input and one with document 2 as the search input give the same result even in this case. When the distribution of feature words within a document is skewed, however, the document may cover several topics, so it is desirable to classify the feature words by topic.

FIG. 6 shows an example of extracting feature word groups from document 1. Here the feature words are scattered throughout the document, so the document is considered to be about a single topic. Classifying the feature words by position information therefore does not separate them, and a single feature word group results.

FIG. 7 shows an example of extracting feature word groups from document 2. Here term1 to term3 are concentrated in the first half of the document and term4 to term6 in the second half, so the document is considered to be about two topics. Classifying the feature words by position information therefore yields two feature word groups: one consisting of term1 to term3 and one consisting of term4 to term6.

The proximity clustering means 424 may classify feature words with, for example, a hierarchical clustering technique that uses each word's appearance position and weight. For a word that appears more than once, the centroid of its positions is computed beforehand. The words closest to each other in position are then merged, and a new centroid is determined for the merged group taking the weights of its words into account. Repeating this process yields the clustering result.
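A minimal sketch of such position-based agglomerative clustering follows. The stopping rules (a target `num_clusters` or a hypothetical `max_gap` threshold on centroid distance), the use of importance as the weight, and the sample positions are assumptions; the source only outlines the merge-and-recompute loop.

```python
def cluster_by_proximity(words, num_clusters=None, max_gap=None):
    """Agglomerative clustering of words along the document axis.

    `words` is a list of (word, weight, positions) with positive weights.
    Merging stops when `num_clusters` clusters remain, or (if only
    `max_gap` is given) when the closest pair is farther apart than
    `max_gap`. With neither set, everything merges into one cluster.
    """
    clusters = []
    for word, weight, positions in words:
        # A word that occurs several times is represented by its centroid.
        centroid = sum(positions) / len(positions)
        clusters.append({"words": [word], "weight": weight, "centroid": centroid})
    clusters.sort(key=lambda c: c["centroid"])

    while len(clusters) > 1:
        if num_clusters is not None and len(clusters) <= num_clusters:
            break
        # In one dimension the closest pair is always adjacent after sorting.
        gaps = [clusters[i + 1]["centroid"] - clusters[i]["centroid"]
                for i in range(len(clusters) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if num_clusters is None and max_gap is not None and gaps[i] > max_gap:
            break
        a, b = clusters[i], clusters[i + 1]
        total = a["weight"] + b["weight"]
        merged = {
            "words": a["words"] + b["words"],
            "weight": total,
            # Weighted centroid of the merged cluster; it lies between a and b,
            # so the list stays sorted.
            "centroid": (a["centroid"] * a["weight"] + b["centroid"] * b["weight"]) / total,
        }
        clusters[i:i + 2] = [merged]
    return [c["words"] for c in clusters]


# Hypothetical positions mimicking document 2 (two concentrations of terms).
document2 = [
    ("term1", 1.0, [5]), ("term2", 1.0, [10]), ("term3", 1.0, [15]),
    ("term4", 1.0, [80]), ("term5", 1.0, [85]), ("term6", 1.0, [90]),
]
groups_doc2 = cluster_by_proximity(document2, max_gap=20)

# Hypothetical positions mimicking document 1 (terms spread over the document).
document1 = [("term1", 1.0, [5, 95]), ("term2", 1.0, [10, 90]), ("term3", 1.0, [20, 80])]
groups_doc1 = cluster_by_proximity(document1, max_gap=20)
```

With these sample positions, the concentrated document splits into two groups and the spread-out document stays as one, matching the behavior described for documents 2 and 1.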

Alternatively, the range of the document covered by each word that appears more than once may be determined, and the feature words may be split at points where the coverage over the whole document is low.
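The coverage-based alternative can be illustrated in its simplest form, where the split points are the places that no word's range covers at all. This is an assumption: the source speaks of "low" coverage without giving a threshold, and the function name `split_by_coverage` and the sample positions are hypothetical.

```python
def split_by_coverage(word_positions):
    """Group words whose covered ranges overlap; split where coverage is zero.

    `word_positions` maps a word to the list of positions where it occurs.
    Each word covers the range from its first to its last occurrence.
    """
    spans = sorted((min(ps), max(ps), word) for word, ps in word_positions.items())
    groups = []
    current_end = None
    for start, end, word in spans:
        if current_end is None or start > current_end:
            # Nothing covers the gap before `start`: begin a new group.
            groups.append([word])
            current_end = end
        else:
            groups[-1].append(word)
            current_end = max(current_end, end)
    return groups


# Hypothetical positions: two runs of overlapping ranges separated by a gap.
coverage_groups = split_by_coverage({
    "term1": [2, 10],
    "term2": [5, 20],
    "term3": [15, 30],
    "term4": [60, 70],
    "term5": [65, 90],
})
```

A threshold above zero would need an extra rule for words whose range straddles a split point, which this sketch deliberately avoids.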

Two techniques have been described above for classifying feature words in the proximity clustering means 424, but any technique that classifies feature words on the basis of position information may be used.

Executing an associative search with the feature word groups obtained in this way makes it possible to obtain the search results the user wants even when a document contains several topics.

FIG. 8 shows an example of the screen provided by the search request input means 210 of the search client 20. The user enters a search request in the search request input area 211 and clicks the search instruction button 212 to instruct the search client 20 to execute the search.

FIG. 9 shows a display example of search results in the search client 20. The search results are displayed by the search result display means 220, and the feature words extracted from the search results are displayed by the feature word display means 230. Use of the feature word display means 230 is optional. The search result display means 220 also serves as a document group designation means. Clicking the associative search instruction button 213 with any number of documents selected through the document selection check boxes 221 retrieves the documents related to the selected documents. The feature word display means 230 also serves as a word group designation means. Clicking the associative search instruction button 213 with any number of words selected through the word selection check boxes 231 executes a search from those feature words. The classification number designation means 240 is used, when documents are selected and an associative search is executed, to specify how many topics the content of the documents is to be divided into. The number of classes may be entered directly as a number or specified with a slide bar, buttons, or the like. It may also be set automatically by comparing a classification score with a threshold. The classification score is defined as a score that combines the importance score and the proximity score of the feature words. When the number of classes is set automatically by the threshold, the classification number designation means 240 need not be displayed on the screen.

FIG. 10 shows an example of search results when the document given as the search input contains two topics. In this case the search result display means 220 shows the search results for each topic in two columns. Articles 1 to 5 in the left column correspond to topic 1, and articles A to E in the right column correspond to topic 2. In FIG. 10, the feature word display means 230 shows the feature words extracted from the combined search results of the two topics.

FIG. 11 is the same as FIG. 10 in that the document given as the search input contains two topics, but here the feature word display means 230 shows, in two columns, the feature words extracted separately from the search results of each topic. Feature terms 1 to 5 in the left column correspond to topic 1, and feature terms A to E in the right column correspond to topic 2. As in FIG. 9, use of the feature word display means 230 is optional.

FIG. 12 shows a screen for confirming the feature word groups classified by the proximity clustering means 424. On this screen the user judges whether the classified feature word groups are appropriate and, if they are, clicks the associative search instruction button 213. If they are not, the user enters a new number of classes in the classification number designation means 240, clicks the classification number change instruction button 241, and checks the reclassified feature word groups again. Use of this screen is optional.

FIG. 13 shows an example of the search indexes 430, 530, and 630 built from the documents in the document databases 440, 540, and 640. The document ID column holds an identifier for each document, and for each identifier the index stores the appearance positions of the words contained in that document.

Next, the flow of processing executed in the associative search system according to the embodiment is described with reference to the sequence diagram of FIG. 14. The following description takes the associative search slave server 40 as the associative search slave server.

The user enters a search request using the search request input means 210 of the search client 20. The search request is transmitted from the search client 20 to the associative search master server 30 (T11).

The search request analysis means 310 of the associative search master server 30 analyzes the search request and creates a search request to be sent to the associative search slave server 40. The search request issuing means 320 transmits this search request to the associative search slave server 40 (T12).

The search means 410 of the associative search slave server 40 searches the document database 440 using the search index 430 and returns the results to the associative search master server 30 (T13).

To extract feature words from the search results obtained, the feature word requesting means 330 of the associative search master server 30 transmits a feature word extraction request to the associative search slave server 40 (T14).

The feature word extraction means 420 of the associative search slave server 40 extracts a feature word group using the search index 430 and returns it to the associative search master server 30 (T15).

Finally, the search results and the feature word group are transmitted from the associative search master server 30 to the search client 20 (T16) and presented to the user by the search result display means 220 and the feature word display means 230 of the search client 20.
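The exchange T11 to T16 can be sketched roughly as follows. The classes `SlaveServer` and `MasterServer`, the word-overlap relevance score, and the frequency-based feature word selection are all simplifying assumptions made for illustration; the patent's own relevance and importance calculations are not reproduced here.

```python
from collections import Counter


class SlaveServer:
    """Stand-in for the associative search slave server 40."""

    def __init__(self, documents):
        self.documents = documents  # doc_id -> text, playing the role of database 440

    def search(self, query_words):
        # T13: toy relevance, the number of query words a document contains.
        scored = []
        for doc_id, text in self.documents.items():
            words = set(text.split())
            score = sum(1 for w in query_words if w in words)
            if score > 0:
                scored.append((score, doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored]

    def extract_feature_words(self, doc_ids, limit=3):
        # T15: toy importance, the most frequent words in the given documents.
        counts = Counter()
        for doc_id in doc_ids:
            counts.update(self.documents[doc_id].split())
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [word for word, _ in ranked[:limit]]


class MasterServer:
    """Stand-in for the associative search master server 30."""

    def __init__(self, slave):
        self.slave = slave

    def handle_search(self, request_text):
        query_words = request_text.split()                    # analysis after T11
        results = self.slave.search(query_words)              # T12 -> T13
        features = self.slave.extract_feature_words(results)  # T14 -> T15
        return results, features                              # T16 back to the client


slave = SlaveServer({
    "D1": "solar power plant",
    "D2": "solar panel cost",
    "D3": "wind turbine",
})
results, features = MasterServer(slave).handle_search("solar cost")
```

In this sketch the master holds no documents itself; it only forwards requests and relays results, mirroring the division of roles in the sequence diagram.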

Next, the sequence diagram shown in FIG. 15 is described. This sequence diagram shows the flow of processing when an associative search is executed from a document group obtained as a search result.

The user selects the document group to be used as the search input using the document group specifying means 220 of the search client 20. The identifiers of the selected documents are transmitted to the associative search master server 30 (T21).

To extract feature words from the selected document group, the feature word requesting means 330 of the associative search master server 30 transmits a feature word extraction request to the associative search slave server 40 (T22).

The feature word extraction means 420 of the associative search slave server 40 extracts a feature word group using the search index 430 and returns it to the associative search master server 30 (T23).

The search request issuing means 320 of the associative search master server 30 transmits the obtained feature word group to the associative search slave server 40 (T24).

The search means 410 of the associative search slave server 40 searches the document database 440 using the search index 430 and returns the results to the associative search master server 30 (T25).

To extract feature words from the search results obtained, the feature word requesting means 330 of the associative search master server 30 transmits a feature word extraction request to the associative search slave server 40 (T26).

The feature word extraction means 420 of the associative search slave server 40 extracts a feature word group using the search index 430 and returns it to the associative search master server 30 (T27).

Finally, the search results and the feature word group are transmitted from the associative search master server 30 to the search client 20 (T28) and presented to the user by the search result display means 220 and the feature word display means 230 of the search client 20.
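The distinguishing point of the FIG. 15 flow is that the query is derived from documents rather than typed by the user. A rough sketch under toy assumptions follows: `DOCUMENTS`, the frequency-based `extract_feature_words`, and the overlap-based `search` are hypothetical stand-ins, not the patent's calculations.

```python
from collections import Counter

# Hypothetical document database: doc_id -> text.
DOCUMENTS = {
    "D1": "solar panel solar cell",
    "D2": "solar cell efficiency",
    "D3": "wind turbine blade",
}


def extract_feature_words(doc_ids, limit=2):
    """Toy feature extraction: the most frequent words in the given documents."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(DOCUMENTS[doc_id].split())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


def search(query_words):
    """Toy search: rank documents by how many query words they contain."""
    scored = []
    for doc_id, text in DOCUMENTS.items():
        words = set(text.split())
        score = sum(1 for w in query_words if w in words)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def associative_search(selected_doc_ids):
    seed_words = extract_feature_words(selected_doc_ids)  # T22 -> T23
    results = search(seed_words)                          # T24 -> T25
    result_features = extract_feature_words(results)      # T26 -> T27
    return seed_words, results, result_features           # T28 returns results and features


seed_words, results, result_features = associative_search(["D1"])
```

Feature extraction runs twice in this flow: once on the selected documents to build the query, and once on the retrieved documents to produce the words shown to the user.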

Next, the sequence diagram shown in FIG. 16 is described. This sequence diagram shows the flow of processing when an associative search is executed from a document group obtained as a search result and that document group contains two topics.

The user selects the document group to be used as the search input using the document group specifying means 220 of the search client 20. The identifiers of the selected documents are transmitted from the search client 20 to the associative search master server 30 (T31).

To extract feature words from the selected document group, the feature word requesting means 330 of the associative search master server 30 transmits a feature word extraction request to the associative search slave server 40 (T32).

The feature word extraction means 420 of the associative search slave server 40 extracts feature word groups using the search index 430 and returns them to the associative search master server 30 (T33).

The search request issuing means 320 of the associative search master server 30 transmits, of the two feature word groups obtained, the group corresponding to the first topic to the associative search slave server 40 (T341).

The search means 410 of the associative search slave server 40 searches the document database 440 using the search index 430 and returns the results to the associative search master server 30 (T351).

To extract feature words from the search results obtained, the feature word requesting means 330 of the associative search master server 30 transmits a feature word extraction request to the associative search slave server 40 (T361).

The feature word extraction means 420 of the associative search slave server 40 extracts a feature word group using the search index 430 and returns it to the associative search master server 30 (T371).

Next, the search request issuing means 320 of the associative search master server 30 transmits, of the two feature word groups obtained, the group corresponding to the second topic to the associative search slave server 40 (T342).

The search means 410 of the associative search slave server 40 searches the document database 440 using the search index 430 and returns the results to the associative search master server 30 (T352).

To extract feature words from the search results obtained, the feature word requesting means 330 of the associative search master server 30 transmits a feature word extraction request to the associative search slave server 40 (T362).

The feature word extraction means 420 of the associative search slave server 40 extracts a feature word group using the search index 430 and returns it to the associative search master server 30 (T372).

When there are three or more topics, the same sequence that follows T33 (search request issuing means → T341 → search means → T351 → feature word requesting means → T361 → feature word extraction means → T371) is repeated as many times as necessary.
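Repeating the per-topic sequence for any number of topics amounts to a loop over the classified feature word groups. The sketch below assumes the search and feature extraction steps are supplied as callables; `search_per_topic`, `toy_search`, `toy_features`, and `DOCS` are hypothetical names used only for illustration.

```python
def search_per_topic(topic_word_groups, search, extract_feature_words):
    """Run search and feature extraction once per topic (T34n to T37n)."""
    per_topic = []
    for words in topic_word_groups:
        results = search(words)                    # T34n -> T35n
        features = extract_feature_words(results)  # T36n -> T37n
        per_topic.append(
            {"query_words": words, "results": results, "feature_words": features}
        )
    return per_topic


# Toy stand-ins for the slave server's search and feature extraction.
DOCS = {"D1": ["solar", "panel"], "D2": ["wind", "turbine"], "D3": ["solar", "cell"]}


def toy_search(words):
    return sorted(d for d, ws in DOCS.items() if set(ws) & set(words))


def toy_features(doc_ids):
    return sorted({w for d in doc_ids for w in DOCS[d]})


groups = [["solar"], ["wind"], ["cell", "turbine"]]
out = search_per_topic(groups, toy_search, toy_features)
```

Each topic yields its own result list and its own feature words, which corresponds to the two-column display of FIG. 11 generalized to more columns.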

When feature words are extracted from the combined search results of both topics, as shown in FIG. 10, the steps that follow T351 in the sequence diagram of FIG. 16 (feature word requesting means → T361 → feature word extraction means → T371) are omitted, and the feature word requesting means that runs after T352 sends the document group covering the search results of both topics to the associative search slave server 40.
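This variant can be sketched as searching once per topic and then issuing a single feature word extraction over the merged results. The function name, the order-preserving merge, and the toy stand-ins below are assumptions for illustration only.

```python
def search_topics_then_extract_once(topic_word_groups, search, extract_feature_words):
    """Search each topic separately, then extract feature words from all results."""
    results_by_topic = [search(words) for words in topic_word_groups]  # T341/T351, T342/T352
    combined, seen = [], set()
    for results in results_by_topic:
        for doc_id in results:
            if doc_id not in seen:  # keep each document once, in first-seen order
                seen.add(doc_id)
                combined.append(doc_id)
    shared_features = extract_feature_words(combined)  # single request after T352
    return results_by_topic, shared_features


# Toy stand-ins for the slave server's search and feature extraction.
DOCS = {"D1": ["solar", "panel"], "D2": ["wind", "turbine"], "D3": ["solar", "cell"]}


def toy_search(words):
    return sorted(d for d, ws in DOCS.items() if set(ws) & set(words))


def toy_features(doc_ids):
    return sorted({w for d in doc_ids for w in DOCS[d]})


results_by_topic, shared_features = search_topics_then_extract_once(
    [["solar"], ["wind"]], toy_search, toy_features
)
```

The per-topic result lists stay separate for display, while the feature words form a single shared list.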

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above are described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can also be added, deleted, or substituted.

Each of the configurations, functions, processing units, processing means, and the like described above may also be realized, in part or in whole, as hardware such as an integrated circuit.

10: Communication network
20: Search client
210: Search request input means
211: Search request input area
212: Search instruction button
213: Associative search instruction button
220: Search result display means (document group specifying means)
221: Document selection check box
230: Feature word display means (word group specifying means)
231: Word selection check box
240: Classification number designation means
241: Classification number change instruction button
30: Associative search master server
310: Search request analysis means
320: Search request issuing means
330: Feature word requesting means
40: Associative search slave server
410: Search means
411: Word frequency acquisition means
412: Position information acquisition means
413: Relevance calculation means
414: Proximity calculation means
415: Score calculation means
420: Feature word extraction means
421: Word frequency acquisition means
422: Position information acquisition means
423: Importance calculation means
424: Proximity clustering means
425: Word addition means
430: Search index
440: Document database
490: Bus
491: Memory device
492: Arithmetic processing device
493: Interface device
494: Auxiliary storage device
495: Input device
496: Output device
50: Associative search slave server
510: Search means
520: Feature word extraction means
530: Search index
540: Document database

Claims

An associative search system comprising:
a search client having at least input means for inputting a search request document and search result display means for displaying search results;
a document database storing a plurality of documents; and
an associative search server having search means for searching the document database for documents having a high degree of relevance to the received search request document, and feature word extraction means for extracting a feature word group from a given document group and classifying the extracted feature word group into one or more feature word groups based on the importance of each word and its appearance position information, wherein, when a plurality of feature word groups are extracted, the associative search server searches the document database for documents having a high degree of relevance to each of the classified feature word groups.

The associative search system according to claim 1,
wherein the number of classifications of the feature word group is arbitrarily input by a user through an interface.

The associative search system according to claim 2,
wherein the interface includes a button for instructing a change in the number of classifications of the feature word group.

The associative search system according to claim 1,
wherein the number of classifications of the feature word group is automatically set by comparing a classification score with a threshold value.

The associative search system according to claim 1,
wherein the feature word extraction means obtains the range covered by each word that appears multiple times and classifies the feature word group at a location where the coverage is low.

The associative search system according to claim 1,
wherein the feature word extraction means obtains the centroid position of each word that appears multiple times and classifies the feature word group around the centroid positions.

The associative search system according to claim 1,
wherein the search client includes feature word display means for displaying feature words of the retrieved document group.

An associative search server that searches a document database storing a plurality of documents for documents similar to a search request document input from a search client, the associative search server comprising:
search means for searching the document database for documents having a high degree of relevance to the received search request document; and
feature word extraction means for extracting a feature word group from a given document group and classifying the extracted feature word group into one or more feature word groups based on the importance of each word and its appearance position information,
wherein the associative search server searches the document database for documents having a high degree of relevance to each of the classified feature word groups.

The associative search server according to claim 8,
wherein the feature word extraction means classifies the feature word group based on a number of classifications arbitrarily designated by a user through an interface.

The associative search server according to claim 9,
wherein the interface includes a button for instructing a change in the number of classifications of the feature word group.

The associative search server according to claim 8,
wherein the feature word extraction means automatically sets the number of classifications of the feature word group by comparing a classification score with a threshold value.

A program that causes a computer functioning as an associative search server, which searches a document database storing a plurality of documents for documents similar to a search request document input from a search client, to execute:
a first process of searching the document database for documents having a high degree of relevance to the received search request document;
a second process of extracting a feature word group from a given document group and classifying the extracted feature word group into one or more feature word groups based on the importance of each word and its appearance position information; and
a third process of searching, when a plurality of feature word groups are extracted, the document database for documents having a high degree of relevance to each of the classified feature word groups.

The program according to claim 12,
wherein the second process classifies the feature word group based on a number of classifications arbitrarily designated by a user through an interface.

The program according to claim 13,
wherein the interface includes a button for instructing a change in the number of classifications of the feature word group.

The program according to claim 12,
wherein the second process automatically sets the number of classifications of the feature word group by comparing a classification score with a threshold value.