JP5313295B2 - Document search service providing method and system

Info

Publication number: JP5313295B2
Application number: JP2011114168A
Authority: JP
Inventors: 兌榮郭; 恩芝李; 秉學金
Original assignee: Naver Corp
Current assignee: Naver Corp
Priority date: 2007-10-10
Filing date: 2011-05-20
Publication date: 2013-10-09
Anticipated expiration: 2028-10-07
Also published as: JP2009093659A; KR100902674B1; JP2011154739A; KR20090036929A

Abstract

A method and system for providing a document search service are disclosed. For a service that lets users search documents classified in a tree structure, the tree structure is updated based on the document browsing tendencies of the users, yielding a more accurately adjusted tree structure. A document providing unit (215) provides an access means for documents. A reaction information acquisition unit (216) obtains user reaction information on the access means. A tree structure update unit (217) updates the tree structure in consideration of the reaction information. The document providing unit can provide a means for visualizing the tree structure, and the reaction information acquisition unit collects user reaction information on that visualization means. The tree structure update unit severs a link whose user reaction frequency is less than a designated value.

Description

The present invention relates to a method and system for providing a document search service.

Countless documents covering a wide range of interests exist on the web. Users can obtain information by sending a query describing what they want to a search engine. However, entering a query every time for a subject the user is interested in is very tedious.

Alternatively, to reach the desired information without steps such as entering a query, a user can visit vertical sites (sites specializing in a particular field) and blogs to obtain the latest information in that field.

The quality of the information on such vertical sites and blogs is improving daily, and they are developing into media from which the fastest and most in-depth useful information in a given field can be obtained.

However, visiting each site to browse information scattered across various vertical sites and blogs is also inconvenient for the user. To compensate, vertical sites and blogs provide RSS (Really Simple Syndication) feeds, and programs such as RSS readers are used to subscribe to them.

Each RSS feed, however, provides information independently of the others, and documents with identical or very similar content are treated as separate items, so the user must make additional effort to search and browse information efficiently.

Accordingly, an object of the present invention is to provide a method and system for providing a document search service that allows documents classified in a tree structure to be searched, in which the tree structure is suitably updated by taking user reaction information into account.

Another object of the present invention is to provide a document search service providing method and system that form a more intuitive yet accurately adjusted tree structure by updating the tree structure to reflect the document browsing tendencies of the users of the service.

According to one aspect of the present invention, there is provided a method for providing a document search service that allows documents classified in a tree structure to be searched, the method including: providing an access means for documents; obtaining user reaction information on the access means; and updating the tree structure in consideration of the reaction information.

The step of providing an access means for documents can provide a means for visualizing the tree structure. The user reaction information is obtained by collecting users' reactions to the means by which the document tree structure is visualized.

In the step of updating the tree structure, a link in the tree structure whose user reaction frequency is at or below a predetermined value can be severed.
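
The link-severing update described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Node` class, the `prune` function, and the representation of reaction frequency as a per-link count keyed by (parent, child) names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; the patent does not define a data layout."""
    name: str
    children: list["Node"] = field(default_factory=list)


def prune(node, reaction_counts, threshold):
    """Sever every parent-child link whose reaction count is <= threshold.

    reaction_counts maps (parent_name, child_name) to an observed count;
    links with no recorded reactions are treated as having a count of 0.
    """
    kept = []
    for child in node.children:
        if reaction_counts.get((node.name, child.name), 0) > threshold:
            prune(child, reaction_counts, threshold)  # recurse into kept subtrees
            kept.append(child)
    node.children = kept
    return node


root = Node("wine", [Node("origin"), Node("history")])
counts = {("wine", "origin"): 12, ("wine", "history"): 1}
prune(root, counts, threshold=1)
print([c.name for c in root.children])
```

In this sketch a severed subtree is simply dropped; a real service might instead reattach it elsewhere or keep it hidden, which the text above leaves open.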

The step of updating the tree structure can also update the tree structure by further considering query information entered by viewers of the documents.

The tree structure used to provide the document search service is formed by a tree structure forming method that includes: classifying documents by subject; extracting the titles of the documents; forming clusters based on the extracted titles; and mapping the clusters to predetermined directories belonging to the subject.

In this tree structure forming method, a document includes at least one field, and the step of extracting the title of a document can extract the title in consideration of the attributes of the fields that make up the document.

In the step of forming clusters, the extracted titles can be split into syllable units, and a portion of a title that is shared with other documents can be selected as a central concept candidate for the cluster. Central concept candidates are selected using n-gram analysis of the extracted titles.

According to another aspect of the present invention, there is provided a system for providing a document search service that allows documents classified in a tree structure to be searched, the system including: a document providing unit that provides an access means for documents; a reaction information acquisition unit that obtains user reaction information on the access means; and a tree structure update unit that updates the tree structure in consideration of the reaction information.

In the document search service providing system, the document providing unit can provide a means for visualizing the tree structure, and the reaction information acquisition unit can collect user reaction information on the visualization means provided by the document providing unit.

In the document search service providing system, the tree structure update unit can sever a link in the tree structure whose user reaction frequency is at or below a predetermined value.

The tree structure update unit can also update the document tree structure by further considering query information entered by viewers of the documents.

The document search service providing system can include a tree structure forming unit that forms the tree structure. The tree structure forming unit can include a document classification unit that classifies documents by subject, a title extraction unit that extracts the titles of the documents, a cluster forming unit that forms clusters based on the extracted titles, and a directory mapping unit that maps the clusters to predetermined directories belonging to the subject.

A document includes at least one field, and the title extraction unit can extract the title in consideration of the attributes of the fields that make up the document.

The cluster forming unit can split the extracted titles into syllable units and select a portion of a title that is shared with other documents as a central concept candidate for the cluster. Central concept candidates are selected using n-gram analysis of the extracted titles.

The document search service providing method of the present invention is performed by a computer, and a program for executing it on a computer is recorded on a computer-readable recording medium.

Other aspects, features, and advantages will become apparent from the accompanying drawings, the claims, and the detailed description of the invention.

According to a preferred embodiment of the present invention, a document search service providing method and system can be provided that, in providing a document search service that allows documents classified in a tree structure to be searched, suitably update the tree structure by taking user reaction information into account.

Also according to a preferred embodiment of the present invention, a document search service providing method and system can be provided that form a more intuitive yet accurately adjusted tree structure by updating the tree structure to reflect the document browsing tendencies of the users of the service.

A flowchart of a document search service providing method according to an embodiment of the present invention.

A configuration diagram of a document search service providing system according to an embodiment of the present invention.

A diagram illustrating a document tree structure before an update according to an embodiment of the present invention.

A diagram illustrating a document tree structure after an update according to an embodiment of the present invention.

A diagram illustrating a document search service providing screen before an update according to an embodiment of the present invention.

A diagram illustrating a document search service providing screen after an update according to an embodiment of the present invention.

A diagram illustrating a document search service providing screen that includes an advertisement display area according to an embodiment of the present invention.

Embodiments of the document search service providing method and system according to the present invention are described in detail below with reference to the accompanying drawings. The present invention is not limited to specific embodiments and should be understood to include all modifications, equivalents, and alternatives that fall within its technical idea and scope. Where a detailed description of related known technology is judged to obscure the gist of the present invention, that description is omitted. In the description based on the accompanying drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is a flowchart showing the processing flow of a document search service providing method according to an embodiment of the present invention, and FIG. 2 is a configuration diagram of a document search service providing system according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, a document search service providing system according to an embodiment of the present invention includes a search service providing server 210, a document classification unit 211, a title extraction unit 212, a cluster forming unit 213, a directory mapping unit 214, a document providing unit 215, a reaction information acquisition unit 216, a tree structure update unit 217, an original document database 221, a search service database 222, a reaction information database 223, a user terminal 230, and a tree structure forming unit 240.

The tree structure forming step (S110) forms a tree structure that represents the connection relationships among the documents provided by the document search service providing system to which the document search service providing method is applied. The tree structure forming step (S110) is performed by the tree structure forming unit 240.

The tree structure forming step (S110) can include classifying documents by subject (S112), extracting the title of each classified document (S114), forming clusters of documents based on the extracted titles (S116), and mapping the document clusters to directories (S118). The detailed operation of each step is described below.

In the step of classifying documents by subject (S112), the document classification unit 211 classifies each document in the original document database 221.

The document classification unit 211 obtains information on the documents from the original document database 221 and obtains information such as the classification structure from the search service database 222. Based on this information, it determines the classification each document matches and stores the matching relationships between documents and classifications in the search service database 222.

In this step, the document classification unit 211 can classify a document by subject by using the information contained in the document to determine whether it includes specific keywords and specific content.

For example, whether a document is suitable for matching with the classification 'wine' is determined by considering whether the document contains the classification name 'wine' itself or its synonyms, and whether it contains keywords judged to be closely related to 'wine', such as 'sommelier' and 'decanting'.

When deciding whether a document matches a classification, the presence of classification-related keywords can be quantified and used as a criterion. For example, a predetermined score is assigned for each keyword related to a particular subject that the document contains, and when the total of these scores exceeds a certain threshold, the document can be determined to match that classification.

Note that in the classification step a document is not necessarily matched with only one classification. For example, when the classifications 'wine' and 'Japanese comics' both exist, a document containing a review of 'Kami no Shizuku' (神の雫), a Japanese comic whose subject is wine, matches the 'Japanese comics' classification as well as the 'wine' classification.
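
The score-based matching described above can be sketched roughly as follows. This is an illustrative Python sketch rather than the patent's implementation: the function name `matching_categories`, the keyword weights, the threshold value, and plain substring matching are all assumptions.

```python
def matching_categories(text, category_keywords, threshold):
    """Return every category whose summed keyword score exceeds threshold.

    category_keywords maps a category name to {keyword: points}.
    A document may match several categories, or none.
    """
    lowered = text.lower()
    matched = []
    for category, weights in category_keywords.items():
        # Add the points of each related keyword that appears in the text.
        score = sum(points for kw, points in weights.items() if kw in lowered)
        if score > threshold:
            matched.append(category)
    return matched


# Hypothetical keyword tables; the weights are invented for illustration.
keywords = {
    "wine": {"wine": 2, "sommelier": 1, "decanting": 1},
    "japanese comics": {"manga": 2, "comic": 1},
}
review = "A review of a manga about a sommelier and wine"
print(matching_categories(review, keywords, threshold=1))
```

Because the function returns a list, a single document such as the review above can land in both the 'wine' and 'japanese comics' categories, as the text allows.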

The original document database 221 stores information on the original documents that are classified and reorganized by the search service providing server 210. These original documents are ultimately provided to the user terminal 230. The original documents are collected from the web by a web robot or the like.

The original documents stored in the original document database 221 can include documents having predetermined attributes. For example, structured documents used on vertical sites and blogs can be used as original documents.

Each such structured document can store its content divided into at least one area or section. These areas or sections are called fields. For example, a document such as a blog post can include a title field, a body field, a creation time field, and a keyword field for the post.

Because the author of such a document creates it by entering content corresponding to each field name, the field names and their content are useful in steps such as the title extraction step described later.

On such vertical sites and blogs, the relationships among documents may also be structured. Such structured relationships appear in the form of directories on the site.

For example, a vertical site about movies can include directories such as 'Movie Reviews', 'Movie Rankings', and 'Latest Releases' for classifying its documents, and a blog can likewise hold information about the directories that classify its posts.

Directory names on such vertical sites and blogs are used as keywords related to the subject the site covers. These keywords are used in the subject classification step described above to improve the accuracy of the classification.

In the present invention, the term 'document' is understood as a general term for electronically recorded documents. A document may be written in a markup language such as HTML and have an extension such as '*.htm', but the term is not to be construed as limited to files having a particular description format or extension.

The search service database 222 stores information on the matching relationships between documents and classifications determined by the document classification unit 211. That is, the search service database 222 can store, for each document, whether it matches each classification, and can store a quantified value indicating the presence of the related keywords for each classification.

The methods and forms of storing information in the original document database 221 and the search service database 222 described above can be varied in many ways within the scope of the object of the present invention.

In the step of extracting the title of each document (S114), the title extraction unit 212 extracts the title of each document stored in the original document database 221. The title of a document means a word, phrase, or sentence that conveys the content and subject of the document.

The title extraction unit 212 extracts the title of each document using the document information stored in the original document database 221 and stores the extracted titles in the search service database 222.

In this step, the title extraction unit 212 can extract a document's title using the information contained in the document. The structure of the document, the frequency of each word in the document, and the attributes with which the document is displayed when browsed on the user terminal 230 are used as criteria for determining the title.

That is, the document information used in extracting the title is understood as a concept that includes not only the content text directly contained in the document but also information about the form in which the document is viewed on the user terminal 230.

For example, a website such as a blog can contain structured documents. These documents can store information in at least one named field. In that case, the text contained in a field with a name such as '題目' (title) or 'title' can be selected as the title.

As another example, when a document is browsed through a web browser or the like on the user terminal 230, text that is emphasized by being displayed relatively larger than the other content in the document, or by being rendered with differentiating attributes, is also considered a candidate phrase for the title.
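
The two title sources just described, a named title field and visually emphasized text, could be combined along the following lines. This Python sketch is only an illustration under stated assumptions: the dictionary layout of a document, the `TITLE_FIELD_NAMES` tuple, and the use of font size as the sole emphasis signal are hypothetical and not specified in the text.

```python
# Hypothetical field names that are treated as holding the title.
TITLE_FIELD_NAMES = ("title", "subject")


def extract_title(document):
    """Pick a title from a named field, else from the most emphasized text.

    document is assumed to look like:
      {"fields": {name: text}, "spans": [(text, font_size), ...]}
    """
    fields = document.get("fields", {})
    for name in TITLE_FIELD_NAMES:
        value = fields.get(name, "").strip()
        if value:
            return value
    # Fallback: the text span rendered with the largest font size.
    spans = document.get("spans", [])
    if spans:
        return max(spans, key=lambda span: span[1])[0]
    return None


post = {"fields": {"title": " Wine Etiquette ", "body": "..."}}
page = {"fields": {"body": "..."}, "spans": [("intro", 12), ("Bordeaux Guide", 24)]}
print(extract_title(post), "|", extract_title(page))
```

Trying the field first reflects the observation that authors fill in named fields deliberately; word frequency, which the text also lists as a criterion, is left out of this sketch.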

In the step of forming document clusters based on the extracted titles (S116), the cluster forming unit 213 performs clustering based on the documents' title information.

The cluster forming unit 213 forms clusters of documents based on the title information obtained from the search service database 222. Information about the formed clusters is stored in the search service database 222.

A document cluster means a group of documents that share a central concept. Document clusters are formed by considering whether the documents' titles have a portion in common. Each cluster is named using its central concept.

A character string in a document's title that is shared with other documents becomes a candidate for a cluster's central concept, and when the number of documents sharing the string is at or above a predetermined value, one independent cluster is formed.

For example, when one document is titled 'Sommelier Tracking: Wine Etiquette - Wine to Enjoy Together' and another is titled 'Table Manners Part 5 - Wine Etiquette', the portion common to the two titles, 'Wine Etiquette', is extracted as the central concept.

An n-gram analysis method can be used to extract the overlapping portions of document titles as central concepts. In this case, each title is split into syllable units and recombined into character strings having a predetermined number of syllables.

Among the recombined strings, those that overlap between titles can become central concept candidates. In the example above, 'wine', which has two syllables, and 'wine etiquette', which has five syllables, are extracted as central concept candidates.
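
A rough Python sketch of finding shared n-grams between two titles follows. One substitution should be noted plainly: the text splits titles into syllables, whereas this sketch splits English titles into words, since syllable splitting depends on the original language. The function names and the `max_n` limit are assumptions.

```python
import re


def ngrams(tokens, n):
    """Return the set of all contiguous n-token sequences in tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(title_a, title_b, max_n=5):
    """Return the n-grams (n = 1..max_n) that occur in both titles.

    Word-level tokens stand in for the syllable units used in the text.
    """
    a = re.findall(r"\w+", title_a.lower())
    b = re.findall(r"\w+", title_b.lower())
    shared = set()
    for n in range(1, max_n + 1):
        shared |= ngrams(a, n) & ngrams(b, n)
    return {" ".join(gram) for gram in shared}


candidates = shared_ngrams(
    "Sommelier Tracking: Wine Etiquette - Wine to Enjoy Together",
    "Table Manners Part 5 - Wine Etiquette",
)
print(sorted(candidates))
```

For these two titles the shared strings are 'wine', 'etiquette', and 'wine etiquette'; choosing one of them as the central concept is a separate step.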

When, as above, the document titles have several overlapping portions, one of them must be chosen as the central concept. The number of syllables in the overlapping portion, the relationship between the overlapping portion and the document's classification name, and the number of documents containing the overlapping portion can be used as criteria for deciding on a single central concept.

In the example above, the overlapping portions are 'wine', 'etiquette', and 'wine etiquette'. Here 'wine' is identical to the subject 'wine' under which the documents are classified, so it is not appropriate as the central concept of a single cluster.

In terms of document counts as well, the number of documents sharing 'wine' is too large to form a single cluster. Thus, when determining central concept candidates for clusters, the number of documents sharing a candidate needs to be limited to a predetermined range.

The length of a central concept candidate can also be considered. An excessively short candidate may be a particle or a general term unsuited to a specific classification.

For a long central concept candidate, on the other hand, the documents in the cluster sharing it can be expected to be highly related and unlikely to include noise, so the longest of the candidates can be considered first.

In the example above, the other two candidates are contained in 'wine etiquette', so the longest candidate, 'wine etiquette', can be considered first; if it is judged to satisfy the other criteria, such as the number of documents sharing it, it is selected as the central concept of a single cluster.

Also, among documents under the subject 'wine', when the number of documents sharing 'etiquette' is very close to the number sharing 'wine etiquette', it is efficient to select the more specific 'wine etiquette' as the central concept.
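
The selection criteria above (exclude the subject name, keep the document count within a range, prefer the longest candidate) can be combined in a small Python sketch. The function name, the document counts, and the range bounds are invented for illustration, and length is measured in words rather than syllables.

```python
def choose_central_concept(candidates, subject, min_docs, max_docs):
    """Pick one central concept from {candidate: document_count}.

    A candidate is eligible if it differs from the subject name and its
    document count lies in [min_docs, max_docs]; the longest eligible
    candidate wins (word count first, then character count).
    """
    eligible = [
        text
        for text, count in candidates.items()
        if text != subject and min_docs <= count <= max_docs
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda text: (len(text.split()), len(text)))


# Hypothetical counts of documents sharing each overlapping portion.
counts = {"wine": 950, "etiquette": 42, "wine etiquette": 40}
print(choose_central_concept(counts, subject="wine", min_docs=5, max_docs=100))
```

With these made-up counts, 'wine' is rejected both as the subject name and as too common, and 'wine etiquette' is preferred over 'etiquette' because it is longer and shared by nearly as many documents.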

Clusters can be formed from those central concepts to which at least a predetermined number of documents are related. Information about each cluster's central concept and the documents belonging to the cluster that shares it is stored in the search service database 222.

In the step of mapping document clusters to directories (S118), the directory mapping unit 214 maps each document cluster to a directory based on the cluster's central concept.

A directory means a sub-subject that, as a subordinate concept of a document classification (that is, a subject), can contain one or more clusters. For example, when the subject under which documents are classified is 'wine', its directories can include 'Wine Regions', 'Wine History', and 'Wine Etiquette', and the 'Wine Regions' directory can contain clusters whose central concepts are place names known as wine-producing areas, such as 'Bordeaux' and 'Burgundy'.

The directory mapping unit 214 obtains information on the directory structure and on the clusters from the search service database 222 and determines the clusters to be mapped to each directory. The resulting directory mapping information is stored in the search service database 222.

クラスターがマッピングされるディレクトリは、該当のクラスターの中心概念がそのディレクトリと関連したキーワードを含んでいるかによって決定される。 The directory to which a cluster is mapped is determined by whether the central concept of the cluster includes a keyword associated with the directory.

一例として、ディレクトリが‘ワインエチケット’である場合、ディレクトリ名では、‘エチケット’がディレクトリの包含可否を決定するためのキーワードになり得る。既に‘ワイン’分類に該当すると判断された各文書に対して形成されたクラスターをマッピングする過程では、分類名である‘ワイン’自体を除いたキーワードでディレクトリをマッピングすることが効率的である。 As an example, when the directory is 'Wine Etiquette', the 'etiquette' part of the directory name can serve as the keyword that decides whether a cluster belongs to the directory. Because the clusters being mapped were formed from documents already judged to fall under the 'Wine' classification, it is efficient to map directories using keywords that exclude the classification name 'wine' itself.

一方、これらキーワードに対しては、辞典式羅列法を使用して該当のキーワードを拡張することができる。‘エチケット’は、同義語、類義語及び表記言語を異にするキーワードに拡張される。 These keywords can also be expanded using a dictionary-based enumeration method. 'Etiquette' is expanded to synonyms, near-synonyms, and equivalent keywords written in other languages.

この場合、‘礼節’、‘ｅｔｉｑｕｅｔｔｅ’、‘マナー’及び‘ｍａｎｎｅｒ’などのキーワードがディレクトリマッピングのための追加的なキーワードとして考慮される。これを通じて、ディレクトリマッピングの効率性を向上させることができる。 In this case, keywords such as 'courtesy', 'etiquette', 'manners', and 'manner' are considered as additional keywords for directory mapping. This improves the efficiency of directory mapping.

このようなディレクトリマッピングのためのキーワードも、ディレクトリ構造に関する情報の一部として探索サービスデータベース２２２に保存される。 Such a keyword for directory mapping is also stored in the search service database 222 as part of information on the directory structure.
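
The keyword-based directory mapping described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the table `DIRECTORY_KEYWORDS`, the function name `directories_for`, and the use of simple substring matching are all assumptions made for the example.

```python
# Hypothetical keyword table: directory name -> keywords after dictionary expansion.
# The classification name "wine" is deliberately left out of every keyword set.
DIRECTORY_KEYWORDS = {
    "Wine Etiquette": {"etiquette", "courtesy", "manners", "manner"},
    "Wine History": {"history"},
}


def directories_for(central_concept, directory_keywords=DIRECTORY_KEYWORDS):
    """Return the directories whose keywords appear in a cluster's central concept."""
    concept = central_concept.lower()
    return sorted(
        name
        for name, keywords in directory_keywords.items()
        if any(keyword in concept for keyword in keywords)
    )
```

With this table, a cluster whose central concept is "Wine manners" would be mapped to the 'Wine Etiquette' directory through the expanded keyword, even though the word "etiquette" never appears in it.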

以上では、ツリー構造形成段階（Ｓ１１０）の各段階の細部的な動作を説明した。形成されたツリー構造は、分類される文書自体に含まれる情報を用いて形成される。この場合、ツリー構造は、好ましくない文書、クラスター及びそれらの連結を含むことができる。このようにノイズを含むツリー構造は、更新前のツリー構造として図３に基づいて説明される。 The detailed operation of each step of the tree structure formation stage (S110) has been described above. The resulting tree structure is built using information contained in the classified documents themselves, so it can include undesirable documents, clusters, and links between them. A tree structure containing such noise is described with reference to FIG. 3 as the tree structure before updating.

文書を提供する段階（Ｓ１２０）は、文書提供部２１５が使用者端末機２３０にクラスタリングされた各文書をディレクトリ別に提供する段階である。 The document providing step (S120) is a step in which the document providing unit 215 provides the clustered documents to the user terminal 230, organized by directory.

本段階は、所定の主題、すなわち、分類に属するディレクトリ構造及びディレクトリに属するクラスターの包含関係を視覚化して提供することで、使用者が自身の関心分野の各文書を容易に探索できることに特徴がある。 This step is characterized in that, by visualizing and presenting the directory structure belonging to a given subject (classification) and the containment relationships of the clusters belonging to each directory, it lets users easily explore the documents in their field of interest.

使用者は、使用者端末機２３０を通じて自身が関心を持つ主題に関する情報を探索サービス提供サーバー２１０に伝送する。これは、該当の主題に対する探索サービスを提供するウェブページに対するリンクのクリック動作などによって行われる。 The user transmits information about the subject he is interested in to the search service providing server 210 through the user terminal 230. This is performed, for example, by clicking a link to a web page that provides a search service for the subject.

文書提供部２１５は、使用者の関心主題に関する情報が含まれた使用者端末機２３０からの要請を受信し、これに対する応答として、上述した各段階で分類されてクラスタリングされた各文書に対するアクセスリンクを含むウェブページを提供することができる。これを通じて、探索サービスが使用者に提供される。 The document providing unit 215 receives a request from the user terminal 230 containing information on the user's subject of interest and, in response, can provide a web page containing access links to the documents classified and clustered in the steps described above. In this way, the search service is provided to the user.

一方、文書探索サービスを提供するために、文書提供部２１５は、クラスタリングされた各文書に対する情報を探索サービスデータベース２２２から取得する。 On the other hand, in order to provide a document search service, the document providing unit 215 acquires information on each clustered document from the search service database 222.

文書提供部２１５の応答が提供される形態及び様式は、ＣＳＳ（ｃａｓｃａｄｉｎｇｓｔｙｌｅｓｈｅｅｔｓ）などを用いて調節される。また、別途のコンテンツマネジメントシステム（ｃｏｎｔｅｎｔｍａｎａｇｅｍｅｎｔｓｙｓｔｅｍ：ＣＭＳ）を用いることも可能である。 The form and manner in which the response of the document providing unit 215 is provided is adjusted using CSS (cascading stylesheets) or the like. It is also possible to use a separate content management system (CMS).

使用者の反応情報を取得する段階（Ｓ１３０）は、反応情報獲得部２１６が使用者の反応情報を取得する段階である。 The step (S130) of obtaining user reaction information is a step in which the reaction information acquisition unit 216 obtains user reaction information.

本実施例において、探索サービスは、文書提供部２１５によって視覚化されたツリー構造を使用者に提供することができる。このようなツリー構造の視覚化は、図５及び図６に示した分類構造表示領域５１０、探索位置表示領域５２０及びクラスター表示領域５３０などで行われる。 In this embodiment, the search service can provide the user with the tree structure visualized by the document providing unit 215. The tree structure is visualized in the classification structure display area 510, the search position display area 520, the cluster display area 530, and the like shown in FIGS. 5 and 6.

このような領域で、使用者は、各領域において自身が探索しようとするディレクトリ、クラスター及び文書と関連した部分をクリックすることができる。そして、使用者がこのように自身の探索位置を決定して文書を閲覧する使用者のアクション、すなわち、文書を閲覧するための使用者のツリー構造で分類された文書に対する反応を、当該使用者の文書に対する反応に関する情報として反応情報獲得部２１６が収集する。 In these areas, the user can click the parts associated with the directory, cluster, or document that he or she wishes to explore. The reaction information acquisition unit 216 collects these user actions of choosing a search position and viewing a document, that is, the user's reactions to the documents classified in the tree structure, as information on the user's reaction to the documents.

取得された反応情報は、反応情報データベース２２３に保存され、使用者反応情報は、使用者に対する識別情報、反応時刻及び反応対象に対する情報を含むことができる。このように収集・保存された使用者反応情報は、ツリー構造を更新する基礎資料として活用される。 The obtained reaction information is stored in the reaction information database 223, and the user reaction information can include identification information for the user, reaction time, and information about the reaction target. The user response information collected and stored in this way is used as basic data for updating the tree structure.
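
As a rough sketch of what one stored reaction record might look like, the three fields named above (user identification, reaction time, reaction target) can be modelled as a small record type. The class name `Reaction`, the field names, and the `"kind:name"` target format are assumptions for illustration only; the source does not specify a storage schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reaction:
    user_id: str          # identification information for the user
    reacted_at: datetime  # reaction time
    target: str           # reaction target, e.g. "cluster:Bordeaux" (format is an assumption)


def reactions_per_target(reactions):
    """Count how many reactions each target received."""
    return Counter(reaction.target for reaction in reactions)
```

Aggregating records per target in this way yields the kind of frequency data that a tree update step could consume.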

一方、文書提供部２１５は、ツリー構造の文書に含まれた情報を検索するための検索手段を提供することができる。このような検索手段は、文書提供部２１５が使用者端末機２３０に提供するウェブページなどに含まれた検索ウィンドウなどの形態で提供される。このような検索ウィンドウに入力された質疑語も、使用者の反応情報としてツリー構造の更新に使用される。 On the other hand, the document providing unit 215 can provide search means for searching for information contained in a tree-structured document. Such search means is provided in the form of a search window included in a web page provided by the document providing unit 215 to the user terminal 230. The question words input in such a search window are also used for updating the tree structure as user reaction information.

文書のツリー構造を更新する段階（Ｓ１４０）は、ツリー構造更新部２１７が文書のツリー構造を更新する段階である。ツリー構造更新部２１７は、反応情報データベース２２３に保存された使用者の反応情報に基づいて文書のツリー構造を更新することができる。 The step of updating the tree structure of the documents (S140) is a step in which the tree structure updating unit 217 updates the tree structure of the documents. The tree structure updating unit 217 can update the tree structure based on the user reaction information stored in the reaction information database 223.
ツリー構造の更新は、不必要なディレクトリ、クラスター及び文書をツリーから排除する形式などで行われる。 The tree structure is updated, for example, by excluding unnecessary directories, clusters, and documents from the tree.

一例として、特定のクラスターに属する文書に対するリンクが使用者に持続的に提供されていたにもかかわらず、使用者による閲覧頻度が他の文書に比べて相対的に低い場合、該当の文書とクラスターとの連結を断絶（解除）することで、ツリー構造を更新することができる。 For example, if a link to a document belonging to a specific cluster has been continuously offered to users but the document is viewed relatively rarely compared with other documents, the tree structure can be updated by severing (releasing) the link between that document and the cluster.

これと同様に、特定のクラスターに対する使用者の反応頻度が相対的に低い場合、該当のクラスターをディレクトリから排除（削除）することも可能である。 Similarly, when the user's response frequency to a specific cluster is relatively low, it is possible to exclude (delete) the corresponding cluster from the directory.

また、含まれた下位分類に対する使用者の閲覧頻度を考慮し、上位分類を変更することができる。一例として、特定のディレクトリに含まれた各クラスターに対する使用者の訪問頻度が過度に低い場合、そのディレクトリをツリー構造から排除（削除）することができる。 In addition, the upper classification can be changed in consideration of the user's browsing frequency for the included lower classification. As an example, when the frequency of user visits to each cluster included in a specific directory is excessively low, the directory can be excluded (deleted) from the tree structure.
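
One way to read "relatively low" viewing frequency is to compare each item against the average of its siblings. The sketch below takes that reading; the function name `low_view_items` and the cut-off of 10% of the mean are assumptions chosen for illustration, since the source does not fix a particular criterion.

```python
def low_view_items(view_counts, ratio=0.1):
    """Return items viewed less than `ratio` times the mean view count of their siblings.

    `view_counts` maps an item (document, cluster, or directory) to its view count.
    """
    if not view_counts:
        return []
    mean = sum(view_counts.values()) / len(view_counts)
    return sorted(item for item, views in view_counts.items() if views < ratio * mean)
```

The same function could be applied at each level: to the documents of a cluster, to the clusters of a directory, or to the directories of a classification.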

このようなツリー構造の変更は、ツリー構造の各構成要素の間の連結程度を数値化することで行われる。一例として、使用者が頻繁に用いる探索経路に対して所定の点数を付与することで、頻繁に用いられる探索経路に含まれた各連結に対して高い点数を付与することができ、その反対に、該当の連結に対して低い点数を付与することができる。この場合、所定値以下の点数を有する各連結を断絶（解除）対象の候補として選定する方法によってツリー構造の更新が可能である。 Such changes to the tree structure are made by quantifying the strength of the links between the components of the tree structure. For example, by awarding a predetermined score to each search path that users follow, the links on frequently used search paths receive high scores and, conversely, the links on rarely used paths receive low scores. The tree structure can then be updated by selecting each link whose score is at or below a predetermined value as a candidate for severing (release).
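
The link scoring just described can be sketched as follows, assuming each recorded search path is a list of node names from classification down to document. The function names and the choice of one point per traversal are illustrative assumptions, not details given in the source.

```python
from collections import Counter


def score_links(paths, points=1):
    """Add `points` to every parent-child link on each traversed search path."""
    scores = Counter()
    for path in paths:
        for parent, child in zip(path, path[1:]):
            scores[(parent, child)] += points
    return scores


def severing_candidates(all_links, scores, threshold):
    """Return the links whose score is at or below `threshold`."""
    return [link for link in all_links if scores.get(link, 0) <= threshold]
```

Links that never appear on a traversed path keep a score of zero, so they are the first to become candidates.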

また、特定の文書に対して使用者たちがアクセスした探索経路の比率も、ツリー構造を更新する情報として考慮される。文書が二つ以上の上位概念と関連している場合、使用者の流入経路（アクセス経路）が特定の経路に偏重されていると、その経路のみを残す形態でツリー構造を更新することができる。特定の経路で流入（アクセス）する比率が極めて低い場合には、その経路を排除することも可能である。 The proportions of the search paths through which users reach a specific document are also considered when updating the tree structure. When a document is associated with two or more superordinate concepts and the users' inflow (access) is heavily concentrated on one path, the tree structure can be updated so that only that path remains. A path through which the proportion of inflow (access) is extremely low can also be removed.

このように、探索経路に対する使用者の照会頻度及び特定の文書と関連した流入経路などは、文書提供部２１５によって提供された情報に対する使用者の反応情報に基づいて算出される。 In this way, the frequency with which users follow each search path and the inflow paths associated with a specific document are calculated from the users' reaction information on the information provided by the document providing unit 215.

一方、ツリー構造は、ツリー構造に属する文書を閲覧した閲覧者たちが入力した質疑語情報をさらに考慮して更新される。ウェブ上の文書を検索して閲覧する場合、その文書を含む検索結果を入力するために入力した質疑語は、その文書と関連しているものと判断される。一例として、検索エンジンで‘ボルドーワイン’という質疑語を入力した使用者が閲覧した文書は、‘ボルドーワイン’と関連した文書であると判断される。 The tree structure is also updated taking into account the queries entered by viewers of the documents belonging to it. When a user searches for and views a document on the web, the query entered to obtain the search results containing that document is judged to be related to the document. For example, a document viewed by a user who entered the query 'Bordeaux wine' in a search engine is judged to be related to 'Bordeaux wine'.
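
The inflow-ratio rule can be illustrated with a small function that decides which parent paths of a document to keep. The thresholds `dominance` and `min_share`, like the function name, are hypothetical values picked for the example; the source only speaks of a path being "biased" or having an "extremely low" ratio.

```python
def paths_to_keep(inflow_counts, dominance=0.9, min_share=0.05):
    """Decide which inflow paths to a document survive an update.

    `inflow_counts` maps each parent path to the number of accesses through it.
    If one path carries at least `dominance` of the traffic, only it is kept;
    otherwise paths carrying less than `min_share` are dropped.
    """
    total = sum(inflow_counts.values())
    if total == 0:
        return set(inflow_counts)  # no evidence yet, keep everything
    shares = {path: count / total for path, count in inflow_counts.items()}
    top = max(shares, key=shares.get)
    if shares[top] >= dominance:
        return {top}
    return {path for path, share in shares.items() if share >= min_share}
```

For a document reachable from both a 'Wine' path and a 'TV' path, 97 accesses through 'TV' out of 100 would leave only the 'TV' path under these example thresholds.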

一方、ツリー構造は、新しいディレクトリ及び新しいクラスターを形成することによっても更新される。原本文書データベース２２１に新しい文書が追加される場合、これら新しい文書は、既存のツリー構造に新しく連結される。 On the other hand, the tree structure is also updated by creating new directories and new clusters. When new documents are added to the original document database 221, these new documents are newly linked to the existing tree structure.

上述したように、文書のクラスターを形成する場合において、クラスターに含まれた文書の個数も考慮対象になり得る。新しい文書の追加によって特定のクラスターに含まれた文書の個数が過度に大きくなる場合、該当のクラスターを多数個のクラスターに分離することも可能である。この場合、クラスターをディレクトリに変更することも考慮される。 As described above, when forming a cluster of documents, the number of documents included in the cluster can also be considered. When the number of documents included in a specific cluster becomes excessively large due to the addition of a new document, it is possible to separate the corresponding cluster into a large number of clusters. In this case, changing the cluster to a directory is also considered.

このように更新されたツリー構造は、探索サービスデータベース２２２に保存される。また、更新されたツリー構造は、文書提供部２１５によって使用者に提供され、これを通じて、使用者は、より正確な分類結果を有するツリー構造を探索することができる。 The tree structure updated in this way is stored in the search service database 222. Also, the updated tree structure is provided to the user by the document providing unit 215, and through this, the user can search for a tree structure having a more accurate classification result.

一方、本発明の一実施例に係る探索サービス提供サーバー２１０は、広告提供部２１８をさらに含むことができる。 Meanwhile, the search service providing server 210 according to an embodiment of the present invention may further include an advertisement providing unit 218.

広告提供部２１８は、広告コンテンツを使用者端末機２３０に提供することができる。広告コンテンツは、広告データベース２２４に保存されるもので、広告提供部２１８によって呼び出されて使用者端末機２３０に伝送される。 The advertisement providing unit 218 can provide advertisement content to the user terminal 230. The advertisement content is stored in the advertisement database 224 and is called by the advertisement providing unit 218 and transmitted to the user terminal 230.

使用者端末機２３０に伝送される広告コンテンツを決定する要素として、使用者に関する情報及び使用者が探索する文書に関する情報などが考慮される。 Information regarding the user, information regarding a document searched by the user, and the like are considered as factors for determining the advertisement content transmitted to the user terminal 230.

一例として、使用者が文書探索サービスを用いる過程でログイン手順を行った場合、使用者の年齢、職業、性別及び居住地域などの使用者の個人情報が広告コンテンツ決定要素として考慮される。 As an example, when a user performs a login procedure in the course of using a document search service, the user's personal information such as the user's age, occupation, gender, and residential area is considered as an advertising content determination factor.

一方、広告コンテンツ決定要素として、使用者端末機２３０を通じて閲覧される文書に関する各情報が考慮され、使用者が入力した質疑語情報も考慮される。 On the other hand, each piece of information related to a document viewed through the user terminal 230 is considered as an advertisement content determining factor, and question word information input by the user is also considered.

また、使用者が本発明の一実施例に係る文書探索サービスを用いる過程で取得される各情報も、コンテンツ決定要素として考慮される。 In addition, each piece of information acquired in the process of using the document search service according to the embodiment of the present invention by the user is also considered as a content determination factor.

このように、使用者に関する情報及び使用者が探索する文書に関する情報などを用いて提供される広告コンテンツを決定することで、提供される広告の効果が極大化されるという長所がある。 As described above, by determining the advertisement content to be provided by using the information about the user and the information about the document searched by the user, there is an advantage that the effect of the provided advertisement is maximized.

図３は、本発明の一実施例に係る更新前の文書ツリー構造を例示した図で、図４は、本発明の一実施例に係る更新後の文書ツリー構造を例示した図である。 FIG. 3 is a diagram illustrating a document tree structure before update according to an embodiment of the present invention, and FIG. 4 is a diagram illustrating a document tree structure after update according to an embodiment of the present invention.

図３及び図４を参照すると、原本文書データベース２２１に属する各文書は、分類（主題）、ディレクトリ、クラスター及び文書の順に連結される階層構造によって構造化されている。 Referring to FIGS. 3 and 4, the documents belonging to the original document database 221 are structured in a hierarchy linked in the order of classification (subject), directory, cluster, and document.

図３及び図４を参照すると、各文書が構造化される上位概念である分類（主題）は‘ワイン’である。特定の文書が‘ワイン’分類に該当するかどうかは、文書分類部２１１によって判断される。 Referring to FIGS. 3 and 4, the classification (theme), which is a superordinate concept in which each document is structured, is 'wine'. The document classification unit 211 determines whether a specific document falls under the “wine” classification.

分類（主題）は、その下位概念として少なくとも一つのディレクトリを含むことができる。‘ワイン’分類は、‘ワインの産地’、‘ワインの歴史’及び‘ワインエチケット’と命名されたディレクトリを含む。 A classification (subject) can include at least one directory as its subordinate concept. The 'Wine' classification includes directories named 'Wine Origin', 'Wine History' and 'Wine Etiquette'.

ディレクトリの名称は、使用者が文書を探索しようとするグループの名として機能できるので、原本文書データベース２２１に保存された各文書の出処であるバーティカルサイト及びブログなどで使用される文書グループの名称をディレクトリ名として使用することで、使用者の文書探索効率を高めることができる。 Because a directory name serves as the name of the group in which the user will look for documents, using the document group names that appear on the vertical sites and blogs from which the documents in the original document database 221 originate as directory names can improve the user's document search efficiency.

ディレクトリは、その下位に少なくとも一つのクラスターを含むことができる。‘ワインの産地’ディレクトリは、‘ボルドー’、‘ブルゴーニュ’、‘カンパーニャ’及び‘ボルドーＴＶ’と命名されたクラスターを含む。 A directory can contain at least one cluster beneath it. The 'Wine Origin' directory contains clusters named 'Bordeaux', 'Burgundy', 'Campania', and 'Bordeaux TV'.

クラスターも少なくとも一つの文書を含むことができ、図３に例示した文書２の場合、その題目に‘ボルドー’及び‘ブルゴーニュ’を含むので、‘ボルドー’クラスターと‘ブルゴーニュ’クラスターの全てに含まれる。 A cluster can likewise contain at least one document. Document 2 illustrated in FIG. 3 has both 'Bordeaux' and 'Burgundy' in its title, so it belongs to both the 'Bordeaux' cluster and the 'Burgundy' cluster.
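
The four-level hierarchy (classification, directory, cluster, document) can be pictured as nested mappings. The sketch below is an assumed reconstruction of the pre-update tree of FIG. 3 from the text alone: the exact document membership, in particular Document 4 being placed in 'Campania', is a guess made for illustration, and the names `TREE` and `paths_to` are hypothetical.

```python
# classification -> directory -> cluster -> documents (membership partly assumed)
TREE = {
    "Wine": {
        "Wine Origin": {
            "Bordeaux": ["document 1", "document 2", "document 4"],
            "Burgundy": ["document 2"],
            "Campania": ["document 4"],
            "Bordeaux TV": ["document 3"],
        },
        "Wine History": {},
        "Wine Etiquette": {},
    }
}


def paths_to(document, tree=TREE):
    """List every (classification, directory, cluster) path that leads to a document."""
    return [
        (classification, directory, cluster)
        for classification, directories in tree.items()
        for directory, clusters in directories.items()
        for cluster, documents in clusters.items()
        if document in documents
    ]
```

Because a document may sit in several clusters, a lookup returns a list of paths rather than a single one, which is what makes path-based pruning meaningful.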

一方、バーティカルサイト及びブログで使用されるディレクトリの名称及びこのようなサイトに含まれた構造化された各文書のフィールド情報が、文書を主題別に分類してクラスタリングするために使用可能であることは、図１及び図２の詳細な説明で述べた通りである。 As stated in the detailed description of FIGS. 1 and 2, the directory names used on vertical sites and blogs, and the field information of the structured documents on such sites, can be used to classify documents by subject and cluster them.

一方、‘ワインに似たボルドーＴＶ’という題目を有する文書３は、ワインに対する文書ではない韓国の電子製品業体で生産したＴＶに関する文書で、ワインに関する文書として分類されるのに不適切なノイズ文書である。 Document 3, titled 'Bordeaux TV similar to wine', is not about wine; it concerns a TV produced by a Korean electronics manufacturer, and is a noise document that is inappropriate to classify as a wine document.

しかしながら、文書３は、その題目に‘ワイン’及び‘ボルドー’という文字列を含み、文書のコンテンツとしてワインと関連したマーケティング活動、製品開発コンセプトを含むので、‘ワイン’分類で別途のクラスターに含まれる結果をもたらす。 However, Document 3 contains the strings 'wine' and 'Bordeaux' in its title, and its content covers wine-related marketing activities and a wine-related product development concept, so it ends up in a separate cluster under the 'Wine' classification.

これら文書は、ワインに対する関心を持って文書を探索する訪問者の注意を引くことができないので、他の文書に比べて低い照会数が記録されるようになる。このような訪問者の反応情報は、ツリー構造の更新に反映される。 Since these documents cannot attract the attention of visitors searching for documents with an interest in wine, a lower number of queries will be recorded compared to other documents. Such visitor reaction information is reflected in the update of the tree structure.

一方、上述したように、文書の閲覧者が入力した質疑語情報もツリー構造の更新に反映される。文書３を閲覧する使用者が入力した質疑語の分布が‘ワイン’や‘ボルドー’でなく、製品の生産者である‘三星電子’や‘ＴＶ’に偏重されている場合、これをツリー構造に反映し、‘ワイン’分類から文書３を排除（削除）することができる。 As described above, the queries entered by document viewers are also reflected in the update of the tree structure. If the queries entered by users who view Document 3 are concentrated not on 'wine' or 'Bordeaux' but on the product's manufacturer, 'Samsung Electronics', or on 'TV', this can be reflected in the tree structure by excluding (deleting) Document 3 from the 'Wine' classification.

一方、上述したように文書に流入する経路の比率がツリー構造の更新に反映される。一例として、文書３が分類（主題）‘ワイン’のみならず、分類（主題）‘ＴＶ’（図示せず）に連結されている場合、文書３を閲覧する使用者のうち、後者と関連した探索経路で文書３にアクセスする使用者の比率が相対的に大きくなり得る。この場合、その偏重度が所定の値を超えると、文書３と関連した‘ワイン’分類の経路を遮断することで、ツリー構造を更新することができる。 As described above, the proportions of the paths through which users arrive at a document are also reflected in the update. For example, if Document 3 is linked not only to the classification (subject) 'Wine' but also to the classification (subject) 'TV' (not shown), a relatively large share of the users who view Document 3 may reach it through search paths associated with the latter. In that case, when the degree of concentration exceeds a predetermined value, the tree structure can be updated by cutting off the 'Wine' classification path to Document 3.
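
A simple way to test whether viewers' queries are concentrated away from a classification is to measure the share of queries that mention any of its terms. This sketch assumes substring matching and a hypothetical cut-off `min_ratio`; neither is specified in the source.

```python
def is_off_topic(queries, topic_terms, min_ratio=0.2):
    """Return True if too few viewer queries mention any of the topic terms."""
    if not queries:
        return False  # no query evidence, so no basis for exclusion
    terms = [term.lower() for term in topic_terms]
    hits = sum(any(term in query.lower() for term in terms) for query in queries)
    return hits / len(queries) < min_ratio
```

Under this rule, a document whose viewers mostly searched for a TV manufacturer would be flagged for exclusion from the 'Wine' classification, while a document reached mainly through wine-related queries would not.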

これによって、更新後のツリー構造を表す図４では、クラスター‘ボルドーＴＶ’と文書３が除去された。 As a result, the cluster ‘Bordeaux TV’ and the document 3 are removed in FIG. 4 representing the tree structure after the update.

一方、‘カンパーニャ産モッツァレッラチーズ−ボルドーワインにうってつけ’という題目を有する文書４の場合、含まれたコンテンツは、モッツァレッラチーズに対するもので、ワイン生産地としてのボルドーに対する内容を取り扱った文書ではない。 Document 4, titled 'Mozzarella cheese from Campania - perfect with Bordeaux wine', is about mozzarella cheese and does not deal with Bordeaux as a wine-producing region.

したがって、ワインの生産地としてのボルドーに対する関心を持ってクラスター‘ボルドー’を探索する使用者の照会頻度が、文書１に比べて低くなり得る。この場合、ツリー構造更新過程で、文書４とクラスター‘ボルドー’との連結が断絶（解除）され、図４のような結果を表す。 Users who explore the 'Bordeaux' cluster out of interest in Bordeaux as a wine-producing region may therefore view Document 4 less often than Document 1. In that case, the link between Document 4 and the 'Bordeaux' cluster is severed (released) during the tree structure update, giving the result shown in FIG. 4.

一方、この過程で照会頻度だけではなく、流入経路の比率が考慮されることは、上述した通りである。 On the other hand, as described above, not only the inquiry frequency but also the ratio of the inflow route is considered in this process.

しかしながら、‘カンパーニャ’は、イタリアワインの生産地でもあるので、文書３の場合と異なり、クラスター自体が除去されないこともある。このようにクラスター及び文書自体を主題から排除すること以外にも、ツリー構造での連結を変更することでツリー構造が更新される。 However, 'Campania' is also an Italian wine-producing region, so unlike the case of Document 3, the cluster itself may be kept. Thus, besides excluding clusters and documents themselves from the subject, the tree structure can also be updated by changing the links within it.

図５は、本発明の一実施例に係る更新前の文書探索サービス提供画面を例示した図で、図６は、本発明の一実施例に係る更新後の文書探索サービス提供画面を例示した図である。すなわち、図５は、図３に関する文書探索サービス提供画面で、図６は、図４に関する文書探索サービス提供画面である。 FIG. 5 is a diagram illustrating a document search service providing screen before updating according to an embodiment of the present invention, and FIG. 6 is a diagram illustrating the screen after updating. That is, FIG. 5 is the document search service providing screen corresponding to FIG. 3, and FIG. 6 is the screen corresponding to FIG. 4.

図５及び図６を参照すると、文書探索サービス提供画面は、分類表示領域５００、分類構造表示領域５１０、探索位置表示領域５２０、クラスター表示領域５３０及び文書リンク５３２を含んでいる。 Referring to FIGS. 5 and 6, the document search service providing screen includes a classification display area 500, a classification structure display area 510, a search position display area 520, a cluster display area 530, and document links 532.

上述したように、本実施例の文書探索サービスを提供するために、文書提供部２１５は、使用者端末機２３０に提供されるウェブページを生成して伝送することができる。 As described above, the document providing unit 215 can generate and transmit a web page provided to the user terminal 230 in order to provide the document search service of the present embodiment.

文書探索サービスで提供されるウェブページは、図５及び図６に例示したような画面構成を有することができる。このような画面構成には、探索対象文書の構造化を視覚化するための表示領域が含まれる。 The web page provided by the document search service can have a screen configuration as illustrated in FIGS. 5 and 6. Such a screen configuration includes a display area for visualizing the structure of the search target document.

分類表示領域５００は、文書が構造化される上位概念である分類（主題）に関する情報が表示される領域である。本実施例では、分類名である‘ワイン’が相対的に差別化された属性として表示されている。 The classification display area 500 is an area in which information related to classification (subject), which is a superordinate concept in which a document is structured, is displayed. In this embodiment, “Wine” as a classification name is displayed as a relatively differentiated attribute.

分類構造表示領域５１０は、図３及び図４に例示したツリー構造を使用者に提供する領域である。図４の場合、使用者が探索しているディレクトリである‘ワインの産地’とクラスター‘ボルドー’は、他の項目と差別化された属性として表示されている。 The classification structure display area 510 presents the tree structure illustrated in FIGS. 3 and 4 to the user. In the case of FIG. 4, the directory the user is exploring, 'Wine Origin', and the cluster 'Bordeaux' are displayed with attributes that distinguish them from the other items.

また、使用者が探索しているディレクトリとクラスターを表示する探索位置表示領域５２０が追加的に提供される。 In addition, a search position display area 520 for displaying the directory and cluster being searched by the user is additionally provided.

クラスター表示領域５３０は、使用者が探索しているクラスターに属する各文書に対するアクセス手段を提供する領域である。本実施例において、使用者が探索しているクラスターの中心概念は‘ボルドー’で、クラスター表示領域５３０には、‘ボルドー’という中心概念と関連した文書に対するアクセス手段としての文書リンク５３２が提供される。 The cluster display area 530 provides a means of access to each document belonging to the cluster the user is exploring. In this embodiment, the central concept of the cluster being explored is 'Bordeaux', and the cluster display area 530 provides document links 532 as the means of access to documents associated with the central concept 'Bordeaux'.

クラスター表示領域５３０には、使用者が探索しているクラスターに属する各文書に対する文書リンク５３２が提供される。文書リンク５３２は、参照する文書の題目情報をアンカーテキストとして表示することができる。 The cluster display area 530 is provided with a document link 532 for each document belonging to the cluster searched by the user. The document link 532 can display the title information of the document to be referenced as anchor text.

文書リンク５３２は、各文書に対するリンクである。このリンクを選択することで、使用者は、自身が探索しようとする情報を含む文書の内容にアクセスすることができる。この場合、選択された文書のコンテンツは、使用者端末機２３０で新しいブラウザーウィンドウを生成することで提供され、図５及び図６のような文書探索サービス提供画面が表示されたブラウザーウィンドウの一部または全部を更新することによっても提供される。 The document link 532 is a link to each document. By selecting it, the user can access the content of the document containing the information he or she is looking for. The content of the selected document can be presented by opening a new browser window on the user terminal 230, or by refreshing part or all of the browser window in which the document search service providing screen of FIGS. 5 and 6 is displayed.

使用者は、図５及び図６で提供される各領域の項目をクリック方法などで選択することで、自身の探索対象を変更することができる。これに対する応答として、文書提供部２１５は、分類構造表示領域５１０、探索位置表示領域５２０及びクラスター表示領域５３０に視覚化される文書に関する情報を提供することができる。 The user can change his / her search target by selecting items in each area provided in FIG. 5 and FIG. 6 by a click method or the like. In response to this, the document providing unit 215 can provide information regarding the document visualized in the classification structure display area 510, the search position display area 520, and the cluster display area 530.

一方、このような使用者たちの文書探索行為に関する情報は、使用者反応情報として反応情報データベース２２３に保存される。これら使用者反応情報に基づいてツリー構造更新部２１７がツリー構造を更新可能であることは、上述した通りである。 Information on such document search activity by users is stored in the reaction information database 223 as user reaction information. As described above, the tree structure updating unit 217 can update the tree structure based on this user reaction information.

上述した画面構成を通して使用者にディレクトリ構造及びクラスタリング構造を視覚的に伝達することで、使用者が訪問するバーティカルサイト及びブログなどを個別的に訪問せずにも、関心分野に対する情報を効率的に探索することができ、上述した画面に対する使用者の反応情報などをツリー構造に反映することで、ツリー構造をより正確でかつ効率的に更新することができる。 By visually conveying the directory structure and clustering structure to the user through the screen layout described above, the user can efficiently explore information in a field of interest without individually visiting vertical sites and blogs. In addition, by reflecting the users' reaction information on this screen in the tree structure, the tree structure can be updated more accurately and efficiently.

図７は、本発明の一実施例に係る広告表示領域を含む文書探索サービス提供画面を例示した図である。図７を参照すると、本発明の一実施例に係る広告表示領域を含む文書探索サービス提供画面は、分類表示領域５００、分類構造表示領域５１０、探索位置表示領域５２０、クラスター表示領域５３０及び広告表示領域７１０を含んでいる。 FIG. 7 is a diagram illustrating a document search service providing screen including an advertisement display area according to an embodiment of the present invention. Referring to FIG. 7, the screen includes a classification display area 500, a classification structure display area 510, a search position display area 520, a cluster display area 530, and an advertisement display area 710.

広告表示領域７１０は、広告提供部２１８が使用者端末機２３０に提供する広告コンテンツが表示される領域である。広告表示領域７１０には、テキスト広告コンテンツ７１１及びアニメーション広告コンテンツ７１２が表示されている。 The advertisement display area 710 is an area in which advertisement content provided by the advertisement providing unit 218 to the user terminal 230 is displayed. In the advertisement display area 710, text advertisement content 711 and animation advertisement content 712 are displayed.

テキスト広告コンテンツ７１１及びアニメーション広告コンテンツ７１２は、広告主と関連した追加的な情報を含んでいるサイトに接続可能なリンクなどを追加的に含むことができる。 The text advertisement content 711 and the animation advertisement content 712 may additionally include a link that can be connected to a site that includes additional information related to the advertiser.

広告表示領域７１０に表示される広告は、クリック回数に相応して広告費用が執行されるＰＰＣ（ｐａｙｐｅｒｃｌｉｃｋ）モデル、及び露出回数のうち少なくとも一つに相応して広告費用が執行されるＰＰＶ（ｐａｙｐｅｒｖｉｅｗ）モデルなどに基づいて運営される。 Advertisements displayed in the advertisement display area 710 are operated under a model such as PPC (pay per click), in which advertising fees are charged according to the number of clicks, or PPV (pay per view), in which advertising fees are charged according to the number of impressions.

一方、広告表示領域７１０に表示される広告コンテンツを決定する要素として、使用者に関する情報及び使用者が探索する文書に関する情報などが考慮されることは、上述した通りである。 On the other hand, as described above, information relating to the user, information relating to the document searched by the user, and the like are considered as factors for determining the advertisement content displayed in the advertisement display area 710.

一例として、文書の主題（分類）、ディレクトリ、クラスター及び文書の題目に関する情報が広告コンテンツ決定要素として考慮される。図５を参照すると、使用者が探索している主題は‘ワイン’で、ディレクトリは‘ワインの産地’で、クラスターは‘ボルドー’である。このような情報に基づいて、‘ボルドーワイン共同購買申し込み’というタイトルを有するテキスト広告コンテンツ７１１を提供することで、広告の効果を最大化することができる。 As an example, information about the subject (classification), directory, cluster, and document title of the document is considered as an advertising content determinant. Referring to FIG. 5, the subject that the user is searching for is “wine”, the directory is “wine origin”, and the cluster is “Bordeaux”. By providing the text advertisement content 711 having the title 'Bordeaux wine joint purchase application' based on such information, the effect of the advertisement can be maximized.
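
Context-based advertisement selection of this kind could be approximated by scoring each advertisement on how many of its keywords match the user's current classification, directory, and cluster. The data layout (a list of dictionaries with `title` and `keywords`) and the function name `pick_ad` are assumptions for the example; the source does not describe how the advertisement database 224 is organized.

```python
def pick_ad(ads, context_terms):
    """Return the ad sharing the most keywords with the browsing context, or None."""
    context = {term.lower() for term in context_terms}

    def overlap(ad):
        return len(context & {keyword.lower() for keyword in ad["keywords"]})

    best = max(ads, key=overlap, default=None)
    if best is None or overlap(best) == 0:
        return None
    return best
```

A user browsing 'Wine' > 'Wine Origin' > 'Bordeaux' would then be matched with an advertisement tagged with both "wine" and "Bordeaux" ahead of one tagged only with "TV".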

一方、文書探索サービス提供方法は、コンピュータプログラムで作成可能である。前記プログラムを構成する各コード及び各コードセグメントは、当該分野のコンピュータプログラマーによって容易に推論される。また、前記プログラムは、コンピュータ可読情報保存媒体に保存され、コンピュータによって読まれて実行されることで、文書探索サービス提供方法を具現する。前記情報保存媒体は、磁気記録媒体、光記録媒体及びキャリアウェーブ媒体を含む。 On the other hand, the document search service providing method can be created by a computer program. Each code and each code segment constituting the program is easily inferred by a computer programmer in the field. In addition, the program is stored in a computer-readable information storage medium, and is read and executed by a computer, thereby realizing a document search service providing method. The information storage medium includes a magnetic recording medium, an optical recording medium, and a carrier wave medium.

本出願で使用した用語は、特定の実施例を説明するために使用されたもので、本発明を限定する意図を持つものではない。単数の表現は、文脈上に明白に異なった意味を持たない限り、複数の表現を含む。 The terminology used in this application is used to describe specific examples and is not intended to limit the invention. An expression used in the singular encompasses the expression of the plural, unless it has a clearly different meaning in context.

本出願で、"含む"または"有する"などの用語は、明細書に記載された特徴、数字、段階、動作、構成要素、部品またはこれらの組み合わせが存在することを指定するもので、一つまたはそれ以上の他の特徴や数字、段階、動作、構成要素、部品またはこれらの組み合わせの存在または付加可能性を予め排除しないものとして理解されるべきである。 In this application, terms such as "including" or "having" specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
‘第１、第２’などの用語は、多様な構成要素を説明するために使用されるが、前記各構成要素は、前記各用語によって限定されてはならない。前記各用語は、一つの構成要素を他の構成要素から区別する目的のみに使用される。 Terms such as 'first' and 'second' are used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

Although embodiments of the present invention have been described above, many embodiments other than those described are within the scope of the claims of the present invention. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be embodied in a modified form without departing from its essential characteristics. Accordingly, the disclosed embodiments are to be considered in an illustrative rather than a limiting sense. The scope of the present invention is shown not in the above description but in the claims, and all differences within the equivalent scope should be construed as being included in the present invention.

210 Search service providing server
211 Document classification unit
212 Title extraction unit
213 Cluster forming unit
214 Directory mapping unit
215 Document providing unit
216 Reaction information acquisition unit
217 Tree structure updating unit
218 Advertisement providing unit
221 Original document database
222 Search service database
223 Reaction information database
224 Advertisement database
230 User terminal
240 Tree structure forming unit
500 Classification display area
510 Classification structure display area
520 Search position display area
530 Cluster display area
532 Document link
710 Advertisement display area
711 Text advertisement content
712 Animation advertisement content

Claims

A method by which a document search service providing system provides a user terminal with a document search service capable of searching classified documents, the method comprising:
receiving information on a subject of interest from the user terminal; and
extracting a document related to the information on the subject of interest from the classified documents, and providing the extracted document to the user terminal,
wherein the documents are classified by a document structure forming method comprising:
classifying the documents by subject;
extracting the title of each document;
forming a cluster based on the extracted titles; and
mapping the cluster to a predetermined directory belonging to the subject.

The document search service providing method according to claim 1, wherein the document includes at least one field, and
the title of the document is extracted in consideration of attributes of the fields constituting the document.

The document search service providing method according to claim 1, wherein, in forming the cluster,
the extracted title is divided into syllable units, and a portion of the title that is shared with another document is selected as a central concept candidate of the cluster.

A computer-readable recording medium having recorded thereon a computer program for executing the document search service providing method according to any one of claims 1 to 3.

A system for providing a user terminal with a document search service capable of searching documents classified in a tree structure, the system comprising:
a tree structure forming unit that forms a classification structure of the documents; and
a document providing unit that extracts, from the documents classified by the classification structure, a document related to information on a subject of interest received from the user terminal, and provides the extracted document to the user terminal,
wherein the tree structure forming unit includes:
a document classification unit that classifies the documents by subject;
a title extraction unit that extracts the title of each document;
a cluster forming unit that forms a cluster based on the extracted titles; and
a directory mapping unit that maps the cluster to a predetermined directory belonging to the subject.

The document search service providing system according to claim 5, wherein the document includes at least one field, and
the title extraction unit extracts the title based on attributes of the fields constituting the document.

The document search service providing system according to claim 5, wherein the cluster forming unit
divides the extracted title into syllable units and selects a portion of the title that is shared with another document as a central concept candidate of the cluster.