JP4885112B2

JP4885112B2 - Document processing apparatus, document processing method, and document processing program

Info

Publication number: JP4885112B2
Application number: JP2007293392A
Authority: JP
Inventors: 慶久大黒
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2007-11-12
Filing date: 2007-11-12
Publication date: 2012-02-29
Anticipated expiration: 2027-11-12
Also published as: JP2009122758A

Description

The present invention relates to a document processing apparatus, a document processing method, and a document processing program that perform collation between document images.

Conventionally, various techniques have been proposed for extracting character strings from a document image in which the character strings are recorded as images (character lines). For example, techniques have been proposed that can recognize a character line as a character string by applying a plurality of constraints to features (size, spacing, etc.) relating to the shape and position of the rectangles circumscribing the character lines contained in the document image (see, for example, Patent Documents 1 and 2).

Patent Document 1: JP 11-219407 A
Patent Document 2: International Publication No. 00/62243 pamphlet

However, with the techniques described in Patent Documents 1 and 2, the plurality of constraints relating to the circumscribed rectangles must be manually tuned to optimum values in order to recognize character lines accurately. In addition, although these techniques can judge how likely a region is to be a character line, they cannot recognize features relating to the content of the character line, so sufficient accuracy may not be obtained even if they are used for collation between document images. Furthermore, because they make no mention of using the relative positional relationship between multiple lines, they cannot handle the case where a partial image forming part of a document image is the collation target.

The present invention has been made in view of the above, and an object thereof is to provide a document processing apparatus, a document processing method, and a document processing program capable of determining the similarity between document images more efficiently and with higher accuracy.

In order to solve the above-described problems and achieve the object, the invention according to claim 1 is a document processing apparatus that performs collation between document images, comprising: character line cutout means for cutting out, based on the circumscribed rectangle of each character image contained in the document image, a character line in which the circumscribed rectangles are connected; quantization means for quantizing arrangement information representing the characteristics of the circumscribed rectangles within the character line into a fixed number of levels; symbol generation means for converting each piece of the quantized arrangement information into one of a fixed number of kinds of symbols; appearance frequency calculation means for calculating the appearance frequency of symbol series each consisting of a combination of a predetermined number of the symbols; collated-target selection means for collating the appearance frequencies calculated by the appearance frequency calculation means for a document image to be collated and for a plurality of document images to be collated against, and selecting a predetermined number of the document images to be collated against that have a higher correlation; distribution state deriving means for deriving, for each document image, the distribution state of the appearance positions of the circumscribed rectangles represented by any or all of the pieces of arrangement information, based on the pieces of arrangement information corresponding to the symbol series that match between the document image to be collated and each of the document images selected by the collated-target selection means; and collation result selection means for determining the similarity between the distribution state derived by the distribution state deriving means for the document image to be collated and the distribution state derived for each selected document image to be collated against, and selecting the document image having the highest similarity as the collation result.

The invention according to claim 2 is the invention according to claim 1, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles in the horizontal direction and/or the vertical direction of the document image.

The invention according to claim 3 is the invention according to claim 1 or 2, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles as a frequency distribution histogram.

The invention according to claim 4 is the invention according to claim 1 or 2, wherein the distribution state deriving means regards the distribution state of the appearance positions of the circumscribed rectangles as a normal distribution and derives the mean, standard deviation, skewness, and kurtosis of that normal distribution.

The invention according to claim 5 is the invention according to claim 4, wherein the distribution state deriving means totals the sizes of the circumscribed rectangles of the character images contained in the character lines of the document image to be collated and of the document image to be collated against, and normalizes the numerical values defining the normal distribution by the mean or the mode of those sizes.

The invention according to claim 6 is the invention according to claim 5, wherein the distribution state deriving means totals the sizes of the circumscribed rectangles represented by the arrangement information corresponding to the symbol series that match between the document image to be collated and the document image to be collated against.
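The descriptors named in claims 4 and 5 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `moments` and `typical_size` are hypothetical, population (not sample) moments are used, and only the mean and standard deviation are divided by the character size, on the assumption that skewness and kurtosis are already dimensionless.

```python
from math import sqrt
from statistics import mean, mode


def typical_size(sizes, use_mode=False):
    """Representative rectangle size: the mean or the mode of the collected sizes."""
    return mode(sizes) if use_mode else mean(sizes)


def moments(positions, char_size=1.0):
    """Return (mean, standard deviation, skewness, kurtosis) of rectangle positions.

    Assumption: mean and standard deviation carry length units, so they are
    divided by char_size; skewness and kurtosis are left unscaled.
    """
    if not positions:
        raise ValueError("at least one position is required")
    n = len(positions)
    mu = sum(positions) / n
    sigma = sqrt(sum((p - mu) ** 2 for p in positions) / n)
    if sigma == 0:
        skewness = kurtosis = 0.0  # all rectangles at the same position
    else:
        skewness = sum(((p - mu) / sigma) ** 3 for p in positions) / n
        kurtosis = sum(((p - mu) / sigma) ** 4 for p in positions) / n
    return mu / char_size, sigma / char_size, skewness, kurtosis
```

With this normalization, the same layout scanned at ten times the resolution (positions and character size both ten times larger) yields the same four values, which is the kind of scale tolerance the normalization in claim 5 appears to be aiming at.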

The invention according to claim 7 is a document processing method executed by a document processing apparatus that performs collation between document images, the method comprising: a character line cutout step in which character line cutout means cuts out, based on the circumscribed rectangle of each character image contained in the document image, a character line in which the circumscribed rectangles are connected; a quantization step in which quantization means quantizes arrangement information representing the characteristics of the circumscribed rectangles within the character line into a fixed number of levels; a symbol generation step in which symbol series generation means converts each piece of the quantized arrangement information into one of a fixed number of kinds of symbols; an appearance frequency calculation step in which appearance frequency calculation means calculates the appearance frequency of symbol series each consisting of a combination of a predetermined number of the symbols; a collated-target selection step in which collated-target selection means collates the appearance frequencies calculated by the appearance frequency calculation means for a document image to be collated and for a plurality of document images to be collated against, and selects a predetermined number of the document images to be collated against that have a higher correlation; a distribution state deriving step in which distribution state deriving means derives, for each document image, the distribution state of the appearance positions of the circumscribed rectangles represented by any or all of the pieces of arrangement information, based on the pieces of arrangement information corresponding to the symbol series that match between the document image to be collated and each of the document images selected in the collated-target selection step; and a collation result selection step in which collation result selection means determines the similarity between the distribution state derived in the distribution state deriving step for the document image to be collated and the distribution state derived for each selected document image to be collated against, and selects the document image having the highest similarity as the collation result.

The invention according to claim 8 is the invention according to claim 7, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles in the horizontal direction and/or the vertical direction of the document image.

The invention according to claim 9 is the invention according to claim 7 or 8, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles as a frequency distribution histogram.

The invention according to claim 10 is the invention according to claim 7 or 8, wherein the distribution state deriving means regards the distribution state of the appearance positions of the circumscribed rectangles as a normal distribution and derives the mean, standard deviation, skewness, and kurtosis of that normal distribution.

The invention according to claim 11 is the invention according to claim 10, wherein the distribution state deriving means totals the sizes of the circumscribed rectangles of the character images contained in the character lines of the document image to be collated and of the document image to be collated against, and normalizes the numerical values defining the normal distribution by the mean or the mode of those sizes.

The invention according to claim 12 is the invention according to claim 11, wherein the distribution state deriving means totals the sizes of the circumscribed rectangles represented by the arrangement information corresponding to the symbol series that match between the document image to be collated and the document image to be collated against.

The invention according to claim 13 causes a computer that performs collation between document images to function as: character line cutout means for cutting out, based on the circumscribed rectangle of each character image contained in the document image, a character line in which the circumscribed rectangles are connected; quantization means for quantizing arrangement information representing the characteristics of the circumscribed rectangles within the character line into a fixed number of levels; symbol generation means for converting each piece of the quantized arrangement information into one of a fixed number of kinds of symbols; appearance frequency calculation means for calculating the appearance frequency, within the symbol series, of symbol series each consisting of a combination of a predetermined number of symbols; collated-target selection means for collating the appearance frequencies calculated by the appearance frequency calculation means for a document image to be collated and for a plurality of document images to be collated against, and selecting a predetermined number of the document images to be collated against that have a higher correlation; distribution state deriving means for deriving, for each document image, the distribution state of the appearance positions of the circumscribed rectangles represented by any or all of the pieces of arrangement information, based on the pieces of arrangement information corresponding to the symbol series that match between the document image to be collated and each of the document images selected by the collated-target selection means; and collation result selection means for determining the similarity between the distribution state derived by the distribution state deriving means for the document image to be collated and the distribution state derived for each selected document image to be collated against, and selecting the document image having the highest similarity as the collation result.

The invention according to claim 14 is the invention according to claim 13, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles in the horizontal direction and/or the vertical direction of the document image.

The invention according to claim 15 is the invention according to claim 13 or 14, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles as a frequency distribution histogram.

The invention according to claim 16 is the invention according to claim 13 or 14, wherein the distribution state deriving means regards the distribution state of the appearance positions of the circumscribed rectangles as a normal distribution and derives the mean, standard deviation, skewness, and kurtosis of that normal distribution.

The invention according to claim 17 is the invention according to claim 16, wherein the distribution state deriving means totals the sizes of the circumscribed rectangles of the character images contained in the character lines of the document image to be collated and of the document image to be collated against, and normalizes the numerical values defining the normal distribution by the mean or the mode of those sizes.

The invention according to claim 18 is the invention according to claim 17, wherein the distribution state deriving means totals the sizes of the circumscribed rectangles represented by the arrangement information corresponding to the symbol series that match between the document image to be collated and the document image to be collated against.

According to the present invention, arrangement information representing the features of the circumscribed rectangles within character lines is extracted from the document image to be collated and from the document images to be collated against, and is quantized into a fixed number of levels to generate symbols. This makes it possible to extract the features of character lines without performing character recognition, and to select, from the document images to be collated against, a predetermined number of document images that correlate highly with the document image to be collated. In addition, by comparing the distribution state of the appearance positions of the matching symbol series between the document image to be collated and each selected document image, the similarity of the relative positional relationship of those symbol series can be determined, so the similarity between the document image to be collated and the document images to be collated against can be determined with high accuracy. As a result, even when a partial image within a document image is used as the document image to be collated, a document image similar to that partial image can be retrieved with high accuracy based on the positional relationship of the circumscribed rectangles of the character images contained in the partial image.

The best modes for carrying out the document processing apparatus, the document processing method, and the document processing program according to the present invention are described in detail below with reference to the accompanying drawings.

(Hardware configuration of the document processing apparatus)
FIG. 1 is a block diagram showing the hardware configuration of the document processing apparatus 100 according to the first embodiment of the present invention. As shown in FIG. 1, the document processing apparatus 100 is a computer such as a PC (Personal Computer) and comprises: a CPU (Central Processing Unit) 1 that controls each part of the document processing apparatus 100; a ROM (Read Only Memory) 2 that stores a program for starting up the CPU 1; a hard disk 3 that stores document images input through an image input unit 21 (described later), an operating system, various programs, and the like; a RAM (Random Access Memory) 4 that functions as a work area for the CPU 1; a keyboard 5 that accepts various inputs from an operator; a display device 6 that displays the input status and the like; an optical disk drive 7 that reads programs and the like stored on various optical information recording media (not shown) such as a CD-ROM; a communication device 8 that sends and receives document images over telecommunication lines such as the Internet or a LAN (Local Area Network); and a scanner 9 that optically reads original images. A bus controller 10 arbitrates the data exchanged among these parts.

In the document processing apparatus 100, when the operator turns on the power, the CPU 1 starts a program called a loader in the ROM 2, reads the operating system, a program that manages the computer's hardware and software, from the hard disk 3 into the RAM 4, and starts the operating system. The operating system starts programs, reads information, and saves information in response to the operator's operations. Windows (registered trademark) and UNIX (registered trademark) are known as typical operating systems. Programs that run on these operating systems are called application programs.

Here, the document processing apparatus 100 stores on the hard disk 3, as a program executed by the CPU 1, a document processing program for the document collation process described later. In this sense, the hard disk 3 functions as a storage medium that stores the document processing program.

In general, a program to be installed on the hard disk 3 of the document processing apparatus 100 is recorded on a storage medium, such as an optical information recording medium (for example, a CD-ROM) or a magnetic medium (for example, an FD), and is installed on the hard disk 3 from that medium. For this reason, portable storage media such as CD-ROMs and FDs can also serve as storage media that store the document processing program. Furthermore, the document processing program may be obtained from outside, for example through the communication device 8, and installed on the hard disk 3.

When the document processing program running on the operating system is started, the CPU 1 realizes the functional units described below in cooperation with the document processing program. The functional configuration of the document processing apparatus 100 is described next.

(Functional configuration of the document processing apparatus)
FIG. 2 is a block diagram showing the functional configuration of the document processing apparatus 100. As shown in FIG. 2, the document processing apparatus 100 includes, as functional units, an image input unit 21, a collation image selection unit 22, a rectangle extraction unit 23, a line cutout unit 24, a quantization unit 25, a symbol generation unit 26, an appearance frequency totaling unit 27, a candidate image selection unit 28, an appearance position distribution deriving unit 29, a collation result selection unit 30, and a display unit 31.

The image input unit 21 accepts a document image input from outside and stores it on the hard disk 3. Specifically, the function of the image input unit 21 can be realized by the optical disk drive 7, the communication device 8, and the scanner 9 shown in FIG. 1.

The collation image selection unit 22 selects, as the document image to be collated, a document image input from the image input unit 21 or a document image stored on the hard disk 3 and designated via the keyboard 5. Hereinafter, the document image to be collated is referred to as the "collation image". When a specific area in a document image is designated via the keyboard 5, the collation image selection unit 22 selects the partial document image contained in that area (a partial image) as the collation image.

The collation image selection unit 22 also selects the document images against which the collation image is collated. These may be, for example, some or all of the document images stored in advance on the hard disk 3, or document images designated via the keyboard 5. Hereinafter, a document image to be collated against is referred to as a "collated image".

The rectangle extraction unit 23 extracts the circumscribed rectangle of each character image contained in a document image. Here, a "character image" means a character of a given language represented as an image. The line cutout unit 24 cuts out character lines by connecting the circumscribed rectangles extracted by the rectangle extraction unit 23. Hereinafter, a circumscribed rectangle contained in a character line is referred to as an "in-line rectangle".

The quantization unit 25 quantizes, into a fixed number of levels, the arrangement information representing the characteristics of each in-line rectangle contained in the character lines cut out by the line cutout unit 24. Here, the characteristics of an in-line rectangle are a group of parameters, such as the black pixel density of the character image corresponding to the in-line rectangle and the height and start position of the in-line rectangle within the character line, that represent the arrangement state unique to that in-line rectangle. The quantization of the arrangement information is described later.

The symbol generation unit 26 converts each piece of arrangement information quantized by the quantization unit 25 into one of a fixed number of kinds of symbols, and generates a symbol series corresponding to each character line constituting the document image. Hereinafter, the symbol series for the entire document image is referred to as the "entire symbol series".
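As a rough illustration of quantization followed by symbol generation, the Python sketch below quantizes three of the parameters named in the description (height, start position, and black pixel density) and packs the resulting levels into a single integer symbol. The level counts, the normalization of each feature to the range [0, 1], the `InlineRect` type, and the packing scheme are all assumptions made for this example; the concrete quantization used by the apparatus is described later in the source.

```python
from dataclasses import dataclass

# Hypothetical level counts; the description only requires a fixed number of levels.
HEIGHT_LEVELS = 4
START_LEVELS = 4
DENSITY_LEVELS = 2


@dataclass
class InlineRect:
    """Arrangement information of one in-line rectangle, each value in [0, 1]."""
    height: float   # rectangle height / line height
    start: float    # offset of the rectangle's start point / line height
    density: float  # black pixels / rectangle area


def quantize(value: float, levels: int) -> int:
    """Map a value in [0, 1] to an integer level in [0, levels - 1]."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * levels), levels - 1)


def to_symbol(rect: InlineRect) -> int:
    """Pack the three quantized features into one symbol ID."""
    h = quantize(rect.height, HEIGHT_LEVELS)
    s = quantize(rect.start, START_LEVELS)
    d = quantize(rect.density, DENSITY_LEVELS)
    return (h * START_LEVELS + s) * DENSITY_LEVELS + d


def line_to_symbols(rects: list[InlineRect]) -> list[int]:
    """Symbol series for one character line, in rectangle order."""
    return [to_symbol(r) for r in rects]
```

With these level counts there are 4 × 4 × 2 = 32 possible symbols, so every in-line rectangle maps to one of a small, fixed alphabet regardless of which character it actually contains.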

The appearance frequency totaling unit 27 calculates the frequency (appearance frequency) with which symbol series consisting of a combination of a predetermined number of symbols appear within the entire symbol series. The candidate image selection unit 28 collates the appearance frequencies calculated by the appearance frequency totaling unit 27 for the collation image and for the collated images against which it is collated, and selects a predetermined number of collated images having a higher correlation. Hereinafter, a collated image selected by the candidate image selection unit 28 is referred to as a "candidate image".
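One way to picture the frequency totaling and candidate selection is the Python sketch below. It assumes that a "combination of a predetermined number of symbols" means a run of n consecutive symbols (an n-gram) and that "correlation" is measured by cosine similarity between the count tables; the source specifies neither, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_counts(symbols, n=3):
    """Count every run of n consecutive symbols in a symbol series."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def cosine(a, b):
    """Cosine similarity of two n-gram count tables (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_candidates(query_symbols, targets, k=2, n=3):
    """Return the names of the k collated images whose counts correlate best.

    targets maps an image name to that image's entire symbol series.
    """
    query = ngram_counts(query_symbols, n)
    scored = [(cosine(query, ngram_counts(series, n)), name)
              for name, series in targets.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]
```

For example, `select_candidates([1, 2, 3, 4], {"a": [1, 2, 3, 4, 5], "b": [9, 9, 9, 9], "c": [1, 2, 3, 7, 7]})` ranks `"a"` first because it shares two of the query's trigrams, then `"c"`, which shares one.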

For the collation image and each candidate image, the appearance position distribution deriving unit 29 derives, for each document image, the distribution state of the appearance positions of the in-line rectangles represented by any or all of the pieces of arrangement information corresponding to the symbol series that match between the two document images. The appearance position distribution deriving unit 29 also calculates the similarity between the distribution state for the collation image and the distribution state for each candidate image, and holds the calculated similarity in the RAM 4 or the like in association with the corresponding candidate image.

Based on the similarities calculated by the appearance position distribution deriving unit 29, the collation result selection unit 30 selects the candidate image having the highest similarity as the collation result.
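The distribution comparison and the final selection can be sketched in Python as follows. This is a minimal sketch under several assumptions that the source does not state: positions along one axis are rescaled to the span of the matched rectangles (so that a partial image and a full page can be compared), the distribution is a normalized histogram with a hypothetical bin count, and similarity is measured by histogram intersection.

```python
def position_histogram(positions, bins=8):
    """Normalized histogram of rectangle positions along one axis.

    Positions are rescaled to the span between the smallest and largest
    matched position, so only their relative arrangement matters.
    """
    if not positions:
        return [0.0] * bins
    low, high = min(positions), max(positions)
    span = high - low
    counts = [0] * bins
    for p in positions:
        relative = (p - low) / span if span else 0.0
        counts[min(int(relative * bins), bins - 1)] += 1
    return [c / len(positions) for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 when two normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def pick_best(query_positions, candidate_positions, bins=8):
    """Return the candidate whose position distribution is most similar.

    candidate_positions maps a candidate image name to the positions of its
    matched in-line rectangles.
    """
    query_hist = position_histogram(query_positions, bins)
    return max(
        candidate_positions,
        key=lambda name: histogram_intersection(
            query_hist, position_histogram(candidate_positions[name], bins)
        ),
    )
```

In practice the same comparison could be run separately for the horizontal and vertical axes and the two scores combined; how they would be combined is left open here.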

The display unit 31 displays the document images input from the image input unit 21, the progress of each process, and the like, and also displays the candidate image selected by the collation result selection unit 30. The function of the display unit 31 can be realized by the display device 6 shown in FIG. 1.

Of the various processes executed by the document processing apparatus 100, the document collation process, which is characteristic of the present embodiment, is described below.

FIG. 3 is a flowchart showing the procedure of the document collation process. First, the collation image selection unit 22 selects, as the collation image, a document image input from the image input unit 21 or a document image designated via the keyboard 5 (step S1). Next, the collation image selection unit 22 selects the collated images against which the collation image selected in step S1 is to be collated (step S2).

Subsequently, the rectangle extraction unit 23, the line cutout unit 24, the quantization unit 25, the symbol generation unit 26, and the appearance frequency totaling unit 27 execute the appearance frequency totaling process on each document image selected in steps S1 and S2 (step S3). The appearance frequency totaling process of step S3 is described below with reference to FIG. 4. The process is performed on the collation image and on each collated image, but in the following description these are referred to collectively as the "document image".

図４は、出現頻度集計処理の手順を示したフローチャートである。まず、矩形抽出部２３は、文書画像に含まれた各文字画像の黒画素に外接する外接矩形を抽出する（ステップＳ３１）。続いて、行切出部２４は、水平方向に隣接する外接矩形同士を連結して文字行に成長させた後、この文字行を夫々切り出す（ステップＳ３２）。 FIG. 4 is a flowchart showing the procedure of the appearance frequency counting process. First, the rectangle extraction unit 23 extracts a circumscribed rectangle circumscribing the black pixel of each character image included in the document image (step S31). Subsequently, the line cutout unit 24 connects the circumscribed rectangles adjacent in the horizontal direction to grow into character lines, and then cuts out each of these character lines (step S32).

The cutting out of lines from a document image is now described with reference to FIGS. 5-1 to 5-3. The rectangle extraction unit 23 obtains the connected components of black pixels in the document image (FIG. 5-1) and then the rectangles A, B, C, ... that circumscribe them (FIG. 5-2). The line cutout unit 24 then grows the circumscribed rectangles obtained by the rectangle extraction unit 23 into a character line Z by connecting rectangles that are adjacent in the horizontal direction (FIG. 5-3). Known techniques can be used to generate the in-line rectangles and to cut out the character lines, so a detailed description is omitted.
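The text leaves the line-growing step to known techniques. As a rough illustration only, the following Python sketch groups circumscribed rectangles into character lines by horizontal adjacency. The function name `grow_lines`, the `max_gap` threshold, and the vertical-overlap test are assumptions made for this sketch and are not taken from the specification.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (xs, ys, xe, ye): start point and end point


def grow_lines(rects: List[Rect], max_gap: int) -> Tuple[List[Rect], List[List[Rect]]]:
    """Group rectangles into character lines (hypothetical helper).

    A rectangle joins an existing line when it overlaps the line's bounding
    box vertically and starts no more than max_gap pixels to the right of it.
    Both criteria are assumptions of this sketch.
    """
    lines: List[List[Rect]] = []  # member rectangles of each line
    boxes: List[Rect] = []        # running bounding box of each line
    for r in sorted(rects, key=lambda rc: rc[0]):  # left to right by Xs
        for i, (bxs, bys, bxe, bye) in enumerate(boxes):
            overlaps_vertically = r[1] <= bye and bys <= r[3]
            near_horizontally = r[0] - bxe <= max_gap
            if overlaps_vertically and near_horizontally:
                lines[i].append(r)
                boxes[i] = (min(bxs, r[0]), min(bys, r[1]),
                            max(bxe, r[2]), max(bye, r[3]))
                break
        else:
            # No existing line accepted the rectangle: start a new line.
            lines.append([r])
            boxes.append(r)
    return boxes, lines
```

Each returned bounding box plays the role of a character line Z, and the member lists hold its in-line rectangles.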

A single character line is preferably cut out of the document image in units of lines, paragraphs, chapters, or the like. Because the size of the character images in a document image is generally uniform within a line, paragraph, or chapter, cutting out character lines in such units makes the size of the character images (character size) within each character line uniform. In the present embodiment the circumscribed rectangles are grown in the horizontal direction; however, the growth is not limited to this and may be performed in the vertical direction, or in both the horizontal and vertical directions, depending on the writing direction or the like.

Returning to FIG. 4, the quantization unit 25 and the symbol generation unit 26 execute symbol generation processing for each character line cut out in step S32 (step S33). The symbol generation processing of step S33 is described below with reference to FIG. 6.

図６は、シンボル生成処理の手順を示したフローチャートである。まず、量子化部２５は、ステップＳ３２で切り出された各文字行の高さを計測する（ステップＳ３３１）。 FIG. 6 is a flowchart showing a procedure of symbol generation processing. First, the quantization unit 25 measures the height of each character line cut out in step S32 (step S331).

Next, the quantization unit 25 puts the in-line rectangles of each character line in order by sorting them in ascending order of their horizontal start points (Xs) (step S332). The quantization unit 25 then obtains, for each of the ordered in-line rectangles, arrangement information representing its arrangement state and quantizes this information into a fixed number of levels (step S333). The processing of steps S332 and S333 is described below with reference to FIGS. 7-1, 7-2, 8, and 9.

FIGS. 7-1 and 7-2 are explanatory diagrams showing examples of the arrangement of in-line rectangles. As shown in FIG. 7-1, a Western character line mixes uppercase and lowercase letters and may or may not contain marks such as apostrophes, acute accents, and umlauts, so the heights of the start points of the in-line rectangles clearly concentrate at two positions, position a and position b in FIG. 7-1. In other words, the arrangement of the rectangles is not vertically symmetric. In an Asian character line, by contrast, as shown in FIG. 7-2, characters such as kanji, hiragana, katakana, and hangul have complex structures, and the start-point heights of the in-line rectangles show no clear concentration at two positions as seen in Western character lines. As in Western lines, however, the arrangement of the rectangles is symmetric neither vertically nor horizontally.

Comparing the in-line rectangles of the Western characters in FIG. 7-1 with those of the Asian characters in FIG. 7-2 shows that, regardless of the language, the arrangement of the in-line rectangles varies with the content of the character line. Extracting the circumscribed rectangles of the characters therefore captures their rough features. That is, without identifying the characters themselves, the image features of each character line can be captured simply by obtaining, as shown in FIG. 8 for example, the start point (Xs, Ys) and the end point (Xe, Ye) of each rectangle and deriving from them features that represent the arrangement state of the circumscribed rectangles of the character images.

行内矩形の配置位置が同じであっても、欧米系文字は構造が単純なためアジア系文字と較べて矩形内の黒画素密度は低くなる。なお、アジア系文字においても、構造が簡単なひらがな、カタカナの黒画素密度は低く、構造が複雑な漢字の黒画素密度が高くなることは言うまでもない。 Even if the arrangement positions of the in-line rectangles are the same, Western characters have a simple structure, so the density of black pixels in the rectangles is lower than that of Asian characters. Of course, even in Asian characters, the black pixel density of hiragana and katakana with a simple structure is low, and the black pixel density of kanji with a complicated structure is high.

As described above, the arrangement state of a single rectangle in a character line can be uniquely defined by measuring the height of the start point of the in-line rectangle, the rectangle size (width and height), the black pixel density within the in-line rectangle, and the like. In step S333, these measurements are obtained as arrangement information for each in-line rectangle of each character line and quantized into a fixed number of levels.

An example is given below in which the arrangement state of an in-line rectangle is defined with reference to the height of its start point. FIG. 9 is an explanatory diagram showing a method of quantizing a feature that indicates the arrangement state of an in-line rectangle. When the document is not known in advance, the line height varies; so that the processing does not depend on the line height, the start-point height of the in-line rectangle is normalized by the following equation, where ys is the height of the start point of the in-line rectangle and H is the line height measured in step S331.
YsRate = ys / H ... (1)

Since 0 < YsRate ≤ 1, YsRate is easily quantized into a fixed number of levels. For example, quantization into N levels can be performed as
YsVal = INT(YsRate * (N - 1)) ... (2)
where INT() truncates the fractional part. The levels are labeled 0 to (N - 1). The rectangle width w and the rectangle height h are quantized by the same procedure.
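Equations (1) and (2) can be sketched in Python as follows. The function name `quantize_ratio` and the clamping of the ratio to the range 0 to 1 are assumptions added for this sketch; the text itself only states the two equations.

```python
def quantize_ratio(value: float, line_height: float, levels: int) -> int:
    """Normalize a measurement by the line height H and quantize it.

    Follows YsRate = ys / H and YsVal = INT(YsRate * (N - 1)).
    Clamping to [0, 1] is an assumption that guards against measurements
    falling slightly outside the line box.
    """
    if line_height <= 0 or levels < 2:
        raise ValueError("line_height must be positive and levels at least 2")
    rate = value / line_height           # equation (1)
    rate = min(max(rate, 0.0), 1.0)      # assumed clamp, not in the text
    return int(rate * (levels - 1))      # equation (2): truncate the fraction


# Example: a start point 12 pixels high in a 40-pixel line, 15 levels.
ys_val = quantize_ratio(12, 40, 15)
```

With N = 15, the returned label lies between 0 and 14, matching the labeling 0 to (N - 1).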

ところで、記憶容量節約および演算量低減のためなどの理由で、画像処理においては原画像そのものではなく圧縮画像を処理対象にする場合が多い。圧縮画像は、画素数が減るために文字画像の細部に関する情報は失われる。本発明は、図９に示すように、文字画像の外接矩形に注目するものであり、画像そのものの詳細な特徴に基づくものではない。したがって、原画像だけでなく、圧縮画像に対しても有効に機能しうる。 By the way, in order to save storage capacity and reduce the amount of calculation, in image processing, a compressed image is often used as a processing target instead of an original image itself. Since the compressed image has a reduced number of pixels, information on the details of the character image is lost. As shown in FIG. 9, the present invention focuses on the circumscribed rectangle of the character image, and is not based on the detailed features of the image itself. Therefore, it can function effectively not only for the original image but also for the compressed image.

Although the above description uses the height of the start point of the in-line rectangle as the feature of the character line image, the feature is not limited to this. For example, when the height h of the in-line rectangle in FIG. 9 is used as the feature, the quantization is as follows.
HeightRate = h / H ... (3)
HeightVal = INT(HeightRate * (N - 1) + 0.5) ... (4)
where INT() truncates the fractional part, so that adding 0.5 before truncation rounds to the nearest level. The levels are labeled 0 to (N - 1).

When the width w of the in-line rectangle is used as the feature of the character line image, the quantization is as follows.
WidthRate = w / H ... (5)
WidthVal = INT(WidthRate * (N - 1) + 0.5) ... (6)
where INT() truncates the fractional part, so that adding 0.5 before truncation rounds to the nearest level. The levels are labeled 0 to (N - 1).

Returning to FIG. 6, the symbol generation unit 26 then converts each piece of arrangement information quantized in step S333 into one of a fixed number of symbol types (step S334), and the process proceeds to step S34 in FIG. 4.

The processing of step S334 is described below with reference to FIGS. 10 and 11. As described above, the arrangement information obtained in step S333 characterizes the arrangement state of the corresponding in-line rectangle. In step S334, the several kinds of measurement results contained in the quantized arrangement information are combined into a single symbol, so that one in-line rectangle corresponds to one symbol.

For example, three kinds of information are combined: the height of the start point of the rectangle, the rectangle height, and the rectangle width. Suppose that the preceding processing quantizes the start-point height (ys/H) into 15 levels, the rectangle height (h/H) into 8 levels, and the rectangle width (w/H) into 2 levels. Then, as shown in FIG. 10, the start-point height can be expressed in 4 bits, the rectangle height in 3 bits, and the rectangle width in 1 bit. Because
4 bits + 3 bits + 1 bit = 8 bits
all of the information can be stored in the bits of a single byte. The number of distinct symbols obtained by combining these three kinds of information is
15 levels × 8 levels × 2 levels = 240.
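The packing of the three quantized values into one byte can be sketched as follows. The bit layout (start-point height in the upper 4 bits, rectangle height in the next 3 bits, rectangle width in the lowest bit) and the names `pack_symbol` and `unpack_symbol` are assumptions for illustration; the layout actually shown in FIG. 10 may differ.

```python
from typing import Tuple


def pack_symbol(ys_val: int, height_val: int, width_val: int) -> int:
    """Pack three quantized features into a single 8-bit symbol.

    Assumed layout: [ys_val: 4 bits][height_val: 3 bits][width_val: 1 bit].
    """
    if not (0 <= ys_val < 15 and 0 <= height_val < 8 and 0 <= width_val < 2):
        raise ValueError("quantized value out of range")
    return (ys_val << 4) | (height_val << 1) | width_val


def unpack_symbol(symbol: int) -> Tuple[int, int, int]:
    """Recover (ys_val, height_val, width_val) from a packed symbol."""
    return symbol >> 4, (symbol >> 1) & 0b111, symbol & 0b1


# Enumerating every valid combination yields the 240 symbol types.
all_symbols = {
    pack_symbol(a, b, c)
    for a in range(15) for b in range(8) for c in range(2)
}
```

Because the start-point height uses only 15 of the 16 values available in 4 bits, 16 of the 256 byte values are never produced, which leaves 240 symbol types.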

If the features representing the arrangement state of a rectangle are regarded as the dimensions of a multidimensional vector, a rectangle can be converted into a single vector by means of those features and then vector-quantized. As is well known, vector quantization obtains, from a large variety of vector data, a small number of vectors that represent them. Labeling the representative vectors in order converts a sequence of vector data into a simple one-dimensional sequence of symbol data. Vector quantization is described in detail in "Vector Quantization and Information Compression" (Corona Publishing) by Allen Gersho and Robert M. Gray, translated by Saburo Tazaki et al.

The kinds of information to be combined and the storage area used to hold them, including its size, are of course not fixed; information suited to identifying the character lines to be discriminated can be selected and determined as appropriate. Although FIG. 10 shows an example in which the start-point height, the rectangle height, and the rectangle width are symbolized, the symbolization is not limited to this and may also include other arrangement information such as the black pixel density described above.

Through the above operations, the symbol generation unit 26 converts the in-line rectangles of each character line into symbols (labels) drawn from a fixed set. The actual arrangement of in-line rectangles can therefore be regarded as a mere sequence of symbols, as shown in FIG. 11. Recording the arrangement tendency of the symbol sequence is thus equivalent to recording the arrangement tendency of the in-line rectangles.

Returning to FIG. 4, the appearance frequency totaling unit 27 calculates and tallies, for each of the collation image and the collated images, the appearance frequency of symbol sequences each consisting of a predetermined number of symbols in the arrangement information symbolized in step S33 (step S34), and the process proceeds to step S4 in FIG. 3.

The processing of step S34 is described below. Once the arrangement information has been symbolized, it can be searched by general search techniques in the same way as text; that is, exact matches between the symbol sequences of the collation image and the collated image could simply be sought. However, because the measured features of the character rectangles vary with reading errors in the character line image, the same character line does not always yield the same symbol conversion result. Seeking only exact matches between symbol sequences may therefore fail to find an identical character line image.

In step S34, therefore, the correlation between the arrangement tendencies of the symbol sequences is obtained instead of an exact match between them. Specifically, for each of the full symbol sequences generated for the collation image and the collated images, the appearance frequency of symbol sequences each consisting of a predetermined number of symbols is calculated and tallied.
Details are described below.

One means of recording such an arrangement tendency is the n-gram model, a language model proposed by Claude Elwood Shannon. In this model, the occurrence of a symbol in a sequence is assumed to depend on the immediately preceding (n - 1) symbols (n is a natural number). A stochastic process in which the current state is determined by the preceding n inputs is called an n-th order Markov process, so the n-gram model is also called an (n - 1)-th order Markov model. The case n = 3 is called a trigram and is widely used.

Specifically, the model is expressed by equation (7) below. Then, in accordance with equation (8), the appearance frequency of each symbol sequence consisting of three symbols (trigram) is calculated from the full symbol sequence of the collation image and of each collated image.

The trigrams are then ranked by appearance frequency and tallied in descending order of frequency. Table 1 shows an example of a trigram tally.

In Table 1, the appearance frequency indicates how often the three-symbol sequence shown as a trigram, that is, the arrangement information representing three in-line rectangles, appears in this order in the full symbol sequence. For example, for the trigram [s013, s045, s032], s032 appears after s013, s045 with a frequency of 324, and for the trigram [s013, s064, s033], s033 appears after s013, s064 with a frequency of 312. Obtaining a trigram tally such as that shown in Table 1 for the full symbol sequence of a document image thus corresponds to obtaining (learning) the features of that document image.
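A trigram tally of this kind can be sketched in Python with `collections.Counter`. The function name `trigram_counts` is hypothetical, and counting trigrams within each character line separately, rather than across line boundaries, is an assumption of this sketch; the text does not state how line boundaries are handled.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Trigram = Tuple[str, str, str]


def trigram_counts(lines: Sequence[Sequence[str]]) -> Counter:
    """Count every run of three consecutive symbols in each line."""
    counts: Counter = Counter()
    for symbols in lines:
        # A line shorter than three symbols contributes no trigram.
        for i in range(len(symbols) - 2):
            counts[tuple(symbols[i:i + 3])] += 1
    return counts


# Toy symbol sequences; the labels imitate the style of Table 1.
document = [
    ["s013", "s045", "s032", "s013", "s045", "s032"],
    ["s013", "s045"],
]
tally = trigram_counts(document)
ranked: List[Tuple[Trigram, int]] = tally.most_common()  # descending frequency
```

`most_common()` lists the trigrams in descending order of frequency, which corresponds to the ordering used for the tally.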

Through the above operations, the appearance frequency totaling unit 27 derives, for each of the collation image and the collated images, a tally of trigram appearance frequencies such as that shown in Table 1.

Next, the candidate image selection unit 28 collates the aggregation result corresponding to the collation image derived in step S3 (the collation image aggregation result) with the aggregation results corresponding to the collated images (the collated image aggregation results), and selects the top n collated images with the highest correlation as candidate images (step S4). Here, n is an integer of 1 or more and can be set to any value.

When the collation image aggregation result is collated with a collated image aggregation result, the number of in-line rectangles contained in a character line often differs, so comparing the raw appearance frequencies is not meaningful. In step S4, therefore, the correlation between the collation image aggregation result and the collated image aggregation result is evaluated with the rank correlation coefficient given by equation (9) below.
Rxy = 1 - (6 * Σ(Rxi - Ryi)^2) / (n * (n^2 - 1)) ... (9)

Here, n is the number of data items, Rxi is the appearance frequency for each rank in the collation image aggregation result, and Ryi is the appearance frequency for each rank in the collated image aggregation result; Σ sums the squared difference between Rxi and Ryi over all ranks. The rank correlation coefficient is described in detail in "Nonparametric Methods" (Baifukan) by Yanagawa.

The candidate image selection unit 28 calculates the rank correlation coefficient Rxy from the appearance frequencies contained in the collation image aggregation result and in each collated image aggregation result, and selects as candidate images the n collated images whose Rxy values are closest to 1. The rank correlation coefficient may also be tested statistically, and if even the largest rank correlation coefficient is not significant, it may be determined that no collated image is similar to the collation image.
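Equation (9) has the form of Spearman's rank correlation coefficient. The sketch below evaluates it in Python under the usual Spearman convention that Rxi and Ryi are the ranks of the same item i in the two tallies. Restricting the comparison to a given list of shared keys and breaking ties by key order are assumptions of this sketch, and the names `ranks_by_frequency` and `rank_correlation` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def ranks_by_frequency(counts: Dict[Hashable, int],
                       keys: Sequence[Hashable]) -> List[int]:
    """Rank the given keys 1..n by descending frequency.

    Ties are broken by key order, which is an assumption of this sketch.
    """
    ordered = sorted(keys, key=lambda k: (-counts[k], k))
    position = {k: i + 1 for i, k in enumerate(ordered)}
    return [position[k] for k in keys]


def rank_correlation(rx: Sequence[int], ry: Sequence[int]) -> float:
    """Rxy = 1 - 6 * sum((Rxi - Ryi)^2) / (n * (n^2 - 1)), as in equation (9)."""
    n = len(rx)
    if n != len(ry) or n < 2:
        raise ValueError("need two rank lists of equal length n >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy tallies for a collation image (query) and one collated image (db).
query = {"a": 5, "b": 3, "c": 1}
db = {"a": 2, "b": 7, "c": 1}
shared = ["a", "b", "c"]
rxy = rank_correlation(ranks_by_frequency(query, shared),
                       ranks_by_frequency(db, shared))
```

Collated images would then be sorted by their Rxy values and the n with values closest to 1 kept as candidates.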

An outline of the processing of steps S1 to S4 described above is now given with reference to FIG. 12. When a collation image X and a plurality of collated images Y are selected in steps S1 and S2, each in-line rectangle in each character line of these document images is symbolized in step S3 on the basis of its arrangement information, generating a full symbol sequence X1 for the collation image X and a full symbol sequence Y1 for each collated image Y. The appearance frequencies of the trigrams in each full symbol sequence are then tallied, yielding a collation image aggregation result X2 for the collation image X and a collated image aggregation result Y2 for each collated image Y. Next, the rank correlation coefficient Rxy is calculated from the per-rank appearance frequencies in the collation image aggregation result X2 and the per-rank appearance frequencies in each collated image aggregation result Y2.

In the subsequent step S4, on the basis of the rank correlation coefficient Rxy calculated for each collated image Y, the n collated images Y whose values are closest to 1 are selected as candidate images.

図３に戻り、出現位置分布導出部２９は、出現位置分布照合処理を実行する（ステップＳ５）。以下、ステップＳ５の出現位置分布照合処理について説明する。 Returning to FIG. 3, the appearance position distribution deriving unit 29 executes an appearance position distribution matching process (step S5). Hereinafter, the appearance position distribution matching process in step S5 will be described.

図１３は、出現位置分布照合処理の手順を示したフローチャートである。まず、出現位置分布導出部２９は、ステップＳ４の処理で選定されたｎ個の候補画像から、本処理の対象とする候補画像を一つ選択する（ステップＳ５１）。 FIG. 13 is a flowchart showing the procedure of the appearance position distribution matching process. First, the appearance position distribution deriving unit 29 selects one candidate image as a target of the present process from the n candidate images selected in the process of Step S4 (Step S51).

Next, on the basis of the collated image aggregation result of the candidate image selected in step S51 and the collation image aggregation result of the collation image, the appearance position distribution deriving unit 29 selects trigrams, that is, symbol sequences each consisting of three symbols, that match between the two document images (step S52). The number of trigrams selected is not particularly limited, but trigrams with higher appearance frequencies are preferably selected. Alternatively, any one of the three symbols constituting a trigram may be selected.

次いで、出現位置分布導出部２９は、照合画像と処理対象の候補画像とについて、文書画像の水平方向および垂直方向における、ステップＳ５２で選択したシンボル系列に対応する行内矩形の出現位置の分布状態をヒストグラム（度数分布ヒストグラム）として導出する（ステップＳ５３）。 Next, the appearance position distribution deriving unit 29 determines the distribution state of the appearance position of the in-line rectangle corresponding to the symbol series selected in step S52 in the horizontal direction and the vertical direction of the document image for the collation image and the candidate image to be processed. Derived as a histogram (frequency distribution histogram) (step S53).

図８に示したように、行内矩形は始点（Ｘｓ、Ｙｓ）と終点（Ｘｅ、Ｙｅ）との２点により表現される。そのため、水平方向（Ｘ軸）に関して分布をとる場合、始点Ｘｓについてヒストグラムを生成すればよく、垂直方向（Ｙ軸）に関しては分布をとる場合、始点Ｙｓについてヒストグラムを生成すればよい。 As shown in FIG. 8, the in-line rectangle is represented by two points, a start point (Xs, Ys) and an end point (Xe, Ye). Therefore, when taking a distribution in the horizontal direction (X axis), a histogram may be generated for the start point Xs, and when taking a distribution in the vertical direction (Y axis), a histogram may be generated for the start point Ys.

図１４は、行内矩形の存在位置の分布状態をヒストグラムで表現した一例を示した図である。同図に示したように、照合画像と被照合画像との両文書画像の間で一致した行内矩形（図中Ｋ）について、文書画像の水平方向と垂直方向でのヒストグラムを夫々導出する。ヒストグラム集計にあたっての集計幅は、特に問わないものとするが、例えば、ステップＳ３の処理で切り出した各文字行の高さの平均値程度とすることとしてもよい。 FIG. 14 is a diagram showing an example in which the distribution state of the existence positions of the in-line rectangles is represented by a histogram. As shown in the figure, the histograms in the horizontal direction and the vertical direction of the document image are derived for the in-line rectangles (K in the figure) that coincide between the document images of the collation image and the collation image. The total width for the histogram total is not particularly limited, but may be, for example, about the average value of the height of each character line cut out in the process of step S3.
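The horizontal and vertical histograms of step S53 can be sketched as follows. Each matched in-line rectangle contributes its start point Xs to the horizontal histogram and its start point Ys to the vertical one. The function name `start_point_histograms` and the use of a single bin width for both axes are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (xs, ys, xe, ye)


def start_point_histograms(rects: List[Rect],
                           bin_width: int) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (horizontal, vertical) histograms of rectangle start points.

    Keys are data section (bin) numbers counted from coordinate 0, and
    values are the number of rectangles whose start point falls in the bin.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    x_hist = Counter(r[0] // bin_width for r in rects)  # distribution of Xs
    y_hist = Counter(r[1] // bin_width for r in rects)  # distribution of Ys
    return dict(x_hist), dict(y_hist)


# Three matched rectangles, binned with a width of 20 pixels
# (for example, roughly the average character line height).
matched = [(5, 3, 9, 8), (25, 3, 30, 8), (27, 43, 31, 50)]
x_hist, y_hist = start_point_histograms(matched, 20)
```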

図１３に戻り、次に出現位置分布導出部２９は、ステップＳ５３で求めた両ヒストグラムを照合し、その類似度を算出する（ステップＳ５４）。なお、本実施形態では両ヒストグラムの照合方法として、メジアン（中央値）、モード（最頻値）、平均の各々が属するデータ区間のヒストグラム値を、両ヒストグラムの間で比較するものとする。 Returning to FIG. 13, next, the appearance position distribution deriving unit 29 collates both histograms obtained in step S53, and calculates the similarity (step S54). In this embodiment, as a method for comparing both histograms, the histogram values of the data section to which each of median (median value), mode (mode), and average belong are compared between the two histograms.

Specifically, the data sections (bins) of each histogram are numbered in ascending order of coordinate, and the numbers of the data sections to which the median, the mode, and the mean belong are obtained. Writing these data section numbers as (MedianClassNo, ModeClassNo, AvClassNo), the following four tuples are obtained.
(MedianClassNoXaxQuery, ModeClassNoXaxQuery, AvClassNoXaxQuery) ... (10)
(MedianClassNoYaxQuery, ModeClassNoYaxQuery, AvClassNoYaxQuery) ... (11)
(MedianClassNoXaxDB, ModeClassNoXaxDB, AvClassNoXaxDB) ... (12)
(MedianClassNoYaxDB, ModeClassNoYaxDB, AvClassNoYaxDB) ... (13)

なお、「XaxQuery」は、照合画像の水平方向のヒストグラムを意味するものであり、上記（１０）式は、照合画像の水平方向のヒストグラムにおける、該当するデータ区間番号のヒストグラム値を夫々意味する。また、「YaxQuery」は、照合画像の垂直方向のヒストグラムを意味するものであり、上記（１１）式は、照合画像の垂直方向のヒストグラムにおける、該当するデータ区間番号のヒストグラム値を夫々意味する。また、「XaxDB」は、被照合画像の水平方向のヒストグラムを意味するものであり、上記（１２）式は、被照合画像の水平方向のヒストグラムにおける、該当するデータ区間番号のヒストグラム値を夫々意味する。また、「YaxDB」は、被照合画像の垂直方向のヒストグラムを意味するものであり、上記（１３）式は、被照合画像の垂直方向のヒストグラムにおける、該当するデータ区間番号のヒストグラム値を夫々意味する。 “XaxQuery” means a horizontal histogram of the collation image, and the above equation (10) means a histogram value of the corresponding data section number in the horizontal histogram of the collation image. “YaxQuery” means a vertical histogram of the collation image, and the above equation (11) means a histogram value of the corresponding data section number in the vertical histogram of the collation image. “XaxDB” means a horizontal histogram of the image to be verified, and the above equation (12) means a histogram value of the corresponding data section number in the horizontal histogram of the image to be verified. To do. “YaxDB” means the vertical histogram of the image to be verified, and the above equation (13) means the histogram value of the corresponding data section number in the vertical histogram of the image to be verified. To do.

After calculating the values of the above four tuples, the appearance position distribution deriving unit 29 uses equations (14) to (16) below to calculate the similarity in shape between the histogram of the collation image and the histogram of the collated image in the horizontal direction.
MedianClassNoXaxDB + CA = MedianClassNoXaxQuery ... (14)
ModeClassNoXaxDB + CA = ModeClassNoXaxQuery ... (15)
AvClassNoXaxDB + CA = AvClassNoXaxQuery ... (16)

上記（１４）〜（１６）式において、「ＣＡ」は定数であって、最初に処理する１式（例えば（１４）式）から求まる値である。出現位置分布導出部２９は、この定数ＣＡの値が残りの２式にて成立するか否か、つまり、残り２式での定数ＣＡからのずれの度合いを、照合画像のヒストグラムと、被照合画像のヒストグラムとの形状の類似度として算出する。なお、定数ＣＡからのずれの度合いは、例えば、ＣＡ’／ＣＡを算出することで導出できる。ここで、ＣＡ’は、ＣＡ＋α（αは定数ＣＡからのずれ値）であり、完全一致する際のずれの度合い、即ち類似度は“１”となる。 In the above formulas (14) to (16), “CA” is a constant and is a value obtained from one formula (for example, formula (14)) to be processed first. The appearance position distribution deriving unit 29 determines whether or not the value of the constant CA is satisfied by the remaining two expressions, that is, the degree of deviation from the constant CA in the remaining two expressions, the histogram of the collation image, and the collation target It is calculated as the similarity of the shape with the histogram of the image. The degree of deviation from the constant CA can be derived, for example, by calculating CA ′ / CA. Here, CA ′ is CA + α (α is a deviation value from the constant CA), and the degree of deviation when completely matching, that is, the similarity is “1”.

Similarly, the appearance position distribution deriving unit 29 uses equations (17) to (19) below to calculate, for the vertical direction, the similarity in shape between the histogram of the collation image and the histogram of the image to be collated.
MedianClassNoYaxDB + CB = MedianClassNoYaxQuery (17)
ModeClassNoYaxDB + CB = ModeClassNoYaxQuery (18)
AvClassNoYaxDB + CB = AvClassNoYaxQuery (19)

In equations (17) to (19), "CB" is a constant which, like CA above, is obtained from the equation processed first (for example, equation (17)). The appearance position distribution deriving unit 29 checks whether this value of CB also holds in the remaining two equations; that is, it calculates the degree of deviation from CB in the remaining two equations as the similarity in shape between the histogram of the collation image and the histogram of the image to be collated. The degree of deviation from CB can be derived in the same way as for CA.

The appearance position distribution deriving unit 29 stores the horizontal and vertical similarities calculated by the above procedure in the RAM 4 or the like, in association with the image to be collated that is being processed. The number of degrees of deviation derived for the horizontal (or vertical) direction corresponds to two equations (or three); these may be stored individually as similarities, or their average may be stored as the similarity.

In this embodiment, the similarity of histogram shapes is calculated for both the horizontal and the vertical direction of the document image, but it may be calculated for only one of the two directions. Also, in this embodiment the distribution of the appearance positions of the in-line rectangles is represented by a histogram, but it is not limited to this and may be represented, for example, by a normal distribution.

FIG. 15 shows an example in which the distribution of in-line rectangle positions is expressed as a normal distribution. As shown in the figure, when a normal distribution is used, the mean μx, standard deviation σx, skewness, and kurtosis are calculated for the horizontal direction (X axis) from the tallied start points of the in-line rectangles, and the mean μy, standard deviation σy, skewness, and kurtosis are calculated likewise for the vertical direction (Y axis).

In this case, directly comparing the means is not meaningful, because the collation image and the image to be collated may differ in image size. To determine whether the shapes of the normal distributions match, it suffices to determine whether the standard deviation, skewness, and kurtosis are similar. For example, the standard deviation, skewness, and kurtosis of the collation image are each compared with those of the image to be collated, and the closer each ratio is to 1, the more similar the shapes of the normal distributions are judged to be.
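The ratio comparison of standard deviation, skewness, and kurtosis might look like the sketch below. It assumes population moments and non-excess kurtosis, and the names `shape_stats` and `shape_ratios` are hypothetical; the text does not specify which estimator is used.

```python
import math


def shape_stats(positions):
    """Standard deviation, skewness, and kurtosis of rectangle start positions.

    Assumes at least two distinct positions, so the standard deviation
    is non-zero.
    """
    n = len(positions)
    mean = sum(positions) / n
    sigma = math.sqrt(sum((p - mean) ** 2 for p in positions) / n)
    skewness = sum((p - mean) ** 3 for p in positions) / (n * sigma ** 3)
    kurtosis = sum((p - mean) ** 4 for p in positions) / (n * sigma ** 4)
    return {"std": sigma, "skewness": skewness, "kurtosis": kurtosis}


def shape_ratios(query_positions, db_positions):
    """Ratio of each statistic (query / db); None where the db value is zero."""
    q = shape_stats(query_positions)
    d = shape_stats(db_positions)
    return {k: (q[k] / d[k] if d[k] != 0 else None) for k in q}


# Hypothetical start positions; the second set is the first one scaled and shifted.
query_x = [1, 2, 3, 10]
db_x = [2 * x + 5 for x in query_x]
ratios = shape_ratios(query_x, db_x)
```

In this example the skewness and kurtosis ratios come out at 1 because both statistics are unaffected by shifting or scaling the positions, whereas the standard deviation ratio still reflects the difference in scale between the two images.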

When the resolution of the collation image matches that of the image to be collated, the same character consists of the same number of dots; when the resolutions differ, it does not. Thus, when evaluating whether histogram or normal-distribution shapes match, the values of both images may be used as they are if the resolutions are equal, but values based on dot counts cannot be used as they are if the resolutions differ.

Therefore, when the two document images differ in resolution, or when the resolution itself is unknown, the values must be normalized. In a typical document image, characters within a paragraph have the same size, so character lines belonging to the same paragraph have equal line heights. Moreover, if the collation image is a part of the image to be collated, the line heights are clearly likely to be the same. Accordingly, for each of the image to be collated and the collation image, the line heights of the character lines are tallied, and the values that define the histogram (average, mode, median) are divided by the most frequent line height. The same applies to the normal distribution. Instead of the most frequent line height, the average line height of the character lines may be used as the divisor. Which one to choose is a design matter and may be decided according to the usage environment.
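A small sketch of this normalization, under the assumption that line heights and histogram statistics are both measured in dots. The function names and the `use_mean` switch are illustrative only.

```python
from collections import Counter


def modal_line_height(line_heights):
    """Most frequent line height; ties resolve to the value seen first."""
    return Counter(line_heights).most_common(1)[0][0]


def normalize_stats(stats, line_heights, use_mean=False):
    """Divide each statistic by the modal (or mean) line height."""
    if use_mean:
        scale = sum(line_heights) / len(line_heights)
    else:
        scale = modal_line_height(line_heights)
    return {name: value / scale for name, value in stats.items()}


# Hypothetical values: the same page scanned at two resolutions.
low_res = normalize_stats(
    {"average": 120.0, "mode": 100.0, "median": 110.0}, [20, 20, 20, 40]
)
high_res = normalize_stats(
    {"average": 240.0, "mode": 200.0, "median": 220.0}, [40, 40, 40, 80]
)
```

After division the two scans yield the same values, which is what allows statistics from images of different resolutions to be compared directly.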

Even when the collation image is a part of the image to be collated, if only an atypical part has become the collation image, the most frequent line height in the whole image may not match the most frequent line height in the partial image. For example, in a document image whose body lines and heading lines differ greatly in line height, body lines are expected to be overwhelmingly more numerous in the whole image. If a partial image of that document contains only heading lines, its most frequent line is a heading line, whose height does not match the line height estimated from the most frequent line of the whole image. Normalizing on the basis of this mismatch clearly cannot yield a correct comparison.

In such a case, the rectangle sizes are tallied only for the in-line rectangles (symbol series) that match between the collation image and the image to be collated, and the values (average, mode, median) are normalized by the dot count of the most frequent rectangle size.

Returning to FIG. 13, the appearance position distribution deriving unit 29 stores the similarity obtained in step S54 on the hard disk 3 or in the RAM 4, in association with the candidate image being processed (step S55). The appearance position distribution deriving unit 29 then determines whether all n candidate images selected in step S4 have been processed (step S56). If it determines that an unprocessed candidate image remains (step S56; No), the process returns to step S51 and one of the unprocessed candidate images is selected as the processing target.

If, on the other hand, it is determined in step S56 that all candidate images have been processed (step S56; Yes), the process proceeds to step S6 in FIG. 3.

Returning to FIG. 3, the collation result selection unit 30 selects as the collation result the candidate image with the highest similarity, that is, the candidate image whose similarity value is closest to 1, based on the similarities of the n candidate images stored in the RAM 4 or the like in step S5 (step S6).
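Selecting the candidate whose similarity is closest to 1 can be illustrated as below. The candidate names and similarity values are invented for the example, and a single similarity value per candidate is assumed (for instance an average of the per-equation values).

```python
def select_collation_result(similarities):
    """Return the candidate whose similarity value is closest to 1."""
    return min(similarities, key=lambda name: abs(similarities[name] - 1.0))


# Hypothetical similarities held for n = 4 candidate images.
candidates = {
    "candidate_1": 0.40,
    "candidate_2": 1.60,
    "candidate_3": 0.80,
    "candidate_4": 1.05,
}
best = select_collation_result(candidates)
```

Because the similarity here is a ratio, both values well below 1 and values well above 1 indicate a poor match, which is why the distance from 1 is used rather than the largest value.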

Next, the display unit 31 displays the document image selected as the collation result in step S6 on the display device 6 as the collation result for the collation image (step S7), and the process ends.

FIG. 16 is a diagram for explaining the operation of the document collation process. In the figure, D11 is the collation image, shown for the case where a partial image forming part of a particular document image has been selected as the collation image. D21 to D24 are the four candidate images selected by the processing up to step S4. The collation image D11 is a partial image of the candidate image D24; that is, the candidate image D24 is the document image most similar to the collation image D11.

As described above, the processing of step S4 collates the symbol series corresponding to the arrangement information of the in-line rectangles, and can thereby narrow the document images correlated with the collation image D11 down to the candidate images D21 to D24. The portions indicated by the rectangles K in the collation image D11 and the candidate images D21 to D24 are the in-line rectangles of the symbol series (or symbols) that match between the document images.

However, because step S4 judges similarity only from the appearance frequency of the symbol series, it cannot determine that the candidate image D24 is the document image most similar to the collation image D11, that is, that the collation image D11 is a part of the candidate image D24. Step S5 therefore collates the relative positional relationship of the symbol series that match between the document images, that is, the distribution of their appearance positions, which makes it possible to identify the candidate image D24 as the document image most similar to the collation image D11.

As described above, according to this embodiment, arrangement information representing the characteristics of the circumscribed rectangles within each character line is extracted from the collation image and the images to be collated, and is quantized into a fixed number of levels to generate symbols. This allows the features of character lines to be extracted without character recognition, and allows a predetermined number of images to be collated that correlate highly with the collation image to be selected as candidate images. In addition, collating the distribution of the appearance positions of the matching symbol series between the collation image and each candidate image makes it possible to judge the similarity of the relative positional relationship of those symbol series, so the similarity between the collation image and the candidate images can be judged with high accuracy. As a result, even when a partial image of a document image is used as the collation image, document images similar to the partial image can be retrieved with high accuracy on the basis of the positional relationship of the circumscribed rectangles of the character images it contains.

The present invention is not limited to the above embodiment as it stands; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

For example, the program for the document collation process executed in this embodiment may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program for the document collation process executed by the document processing apparatus 100 of this embodiment may also be provided or distributed via a network such as the Internet.

The program for the document collation process executed in this embodiment may also be provided by being incorporated in advance in a storage medium such as a ROM.

In the above embodiment, each functional unit shown in FIG. 2 is realized through cooperation between the CPU 1 and a predetermined program stored in the ROM 2, but the functional units may instead be realized in hardware. Specifically, when real-time performance is important, processing must be accelerated, so it is preferable to provide a separate logic circuit (not shown) and have it execute the various arithmetic operations.

In the above embodiment, attention was paid to the in-line rectangle as a unit smaller than the character line, but other units are also applicable. For example, image features in units of characters (character images) or words can likewise be digitized and quantized, and thus symbolized and collated in the same way as above. In this case, character images are cut out on the basis of black pixels, and the circumscribed rectangles of the character images are then used in units of characters or words. Division into characters or words may use a known character segmentation method such as those used in OCR (Optical Character Recognition).

A representative character segmentation method uses projection. For a horizontal line, this method counts the number of black pixels along the vertical direction, obtains their distribution, and treats positions where the black pixel count is at or below a threshold as division position candidates. The division position candidates are then evaluated for a compromise among the character width estimated from the line height, the distance to adjacent division positions, the periodicity of the division positions across the whole line, and so on, and suitable division positions are selected (vertical lines are handled in the same way).
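The first half of the projection method, counting black pixels per column and thresholding, can be sketched as follows. The binary-image representation (a list of rows of 0/1 values) and the function name are assumptions, and the later selection among candidates by character width and periodicity is not implemented here.

```python
def division_candidates(line_image, threshold=0):
    """Project a horizontal text line onto the X axis.

    line_image: list of rows, each a list of 0/1 values (1 = black pixel).
    Returns (profile, candidates), where profile[x] is the number of
    black pixels in column x and candidates lists the columns whose
    count is at or below the threshold.
    """
    width = len(line_image[0])
    profile = [sum(row[x] for row in line_image) for x in range(width)]
    candidates = [x for x, count in enumerate(profile) if count <= threshold]
    return profile, candidates


# Hypothetical 3 x 7 line image containing two glyphs separated by a gap.
image = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
]
profile, candidates = division_candidates(image)
```

For a vertical line the same projection would be taken row by row instead of column by column.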

As another method for dividing into words, for languages that customarily separate words with spaces, such as European languages, division can easily be achieved on the basis of the blank space between words. Even when the image is divided into units such as characters or words in this way, a rectangle circumscribing the image in each range can be obtained, and by using the start and end positions of that circumscribed rectangle, quantization can be performed by the same procedure as for in-line rectangles.
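Word division by inter-word blanks can be illustrated with the sketch below, which takes a column projection profile of one line and returns the start and end columns of each word. The `min_gap` parameter (the smallest run of blank columns treated as a word boundary) is an assumption; the text does not say how the blank width is chosen.

```python
def word_spans(profile, min_gap):
    """Split a line into words at runs of at least min_gap blank columns.

    profile: black-pixel count per column of one text line.
    Returns a list of (start, end) column indices, inclusive, which are
    the horizontal extent of each word's circumscribed rectangle.
    """
    spans = []
    start = None      # first inked column of the current word
    last_ink = None   # most recent inked column seen
    for x, count in enumerate(profile):
        if count == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            spans.append((start, last_ink))
            start = x
        last_ink = x
    if start is not None:
        spans.append((start, last_ink))
    return spans


# Hypothetical profile: narrow gaps inside words, wide gaps between words.
spans = word_spans([2, 3, 0, 1, 0, 0, 0, 2, 2, 0, 0, 0, 0, 1], min_gap=3)
```

The resulting start and end positions play the same role as the start and end points of an in-line rectangle in the quantization step.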

As described above, the document processing apparatus, document processing method, and document processing program according to the present invention are useful for apparatuses that collate document images, and are particularly suited to a document processing apparatus that takes a partial image forming part of a document image as the collation target and searches for document images similar to that partial image.

A block diagram showing the hardware configuration of the document processing apparatus.
A block diagram showing the functional configuration of the document processing apparatus.
A flowchart showing the procedure of the document collation process.
A flowchart showing the procedure of the appearance frequency totaling process.
A diagram for explaining the cutting out of character lines.
A diagram for explaining the cutting out of character lines.
A diagram for explaining the cutting out of character lines.
A flowchart showing the procedure of the symbol generation process.
A diagram showing an arrangement example of in-line rectangles.
A diagram showing an arrangement example of in-line rectangles.
A diagram for explaining an example of setting coordinates for in-line rectangles.
A diagram for explaining the arrangement state of in-line rectangles.
A diagram for explaining the quantization of arrangement information.
A diagram showing an example in which quantized arrangement information is symbolized.
A diagram for explaining the outline of the document collation process.
A flowchart showing the procedure of the appearance position distribution collation process.
A diagram showing an example in which the distribution of in-line rectangle positions is expressed as a histogram.
A diagram showing an example in which the distribution of in-line rectangle positions is expressed as a normal distribution.
A diagram for explaining the outline of the document collation process.

Explanation of symbols

100 Document processing apparatus
1 CPU
2 ROM
3 Hard disk
4 RAM
5 Keyboard
6 Display device
7 Optical disk drive
8 Communication device
9 Scanner
10 Bus controller
21 Image input unit
22 Collation image selection unit
23 Rectangle extraction unit
24 Line cutting unit
25 Quantization unit
26 Symbol generation unit
27 Appearance frequency totaling unit
28 Candidate image selection unit
29 Appearance position distribution deriving unit
30 Collation result selection unit
31 Display unit

Claims

A document processing apparatus that performs collation between document images, comprising:
character line cutting means for cutting out, based on the circumscribed rectangles of the character images included in a document image, character lines formed by connecting the circumscribed rectangles;
quantization means for quantizing arrangement information, which represents characteristics of the circumscribed rectangles in each character line, into a fixed number of levels;
symbol generating means for converting each piece of quantized arrangement information into one of a fixed set of symbols;
appearance frequency calculating means for calculating the appearance frequency of each symbol series, a symbol series being a combination of a predetermined number of the symbols;
collation target selection means for collating the appearance frequencies calculated by the appearance frequency calculating means for a collation image, which is the document image to be collated, and for a plurality of images to be collated against it, and selecting a predetermined number of the images to be collated in descending order of correlation;
distribution state deriving means for deriving, for each document image, the distribution state of the appearance positions of the circumscribed rectangles represented by any or all of the pieces of arrangement information corresponding to the symbol series that match between the collation image and each image to be collated selected by the collation target selection means; and
collation result selection means for determining the similarity between the distribution state derived by the distribution state deriving means for the collation image and the distribution state derived for each image to be collated, and selecting the image to be collated with the highest similarity as the collation result.

The document processing apparatus according to claim 1, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles in the horizontal direction and/or the vertical direction of the document image.

The document processing apparatus according to claim 1, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles as a frequency distribution histogram.

The distribution state deriving means regards the distribution state of the appearance position of the circumscribed rectangle as a normal distribution, and derives an average, standard deviation, skewness, and kurtosis of the normal distribution. The document processing apparatus described.

The document processing apparatus according to claim 4, wherein the distribution state deriving means tallies the sizes of the circumscribed rectangles of the character images included in the character lines of the collation image and of the image to be collated, and normalizes the numerical values defining the normal distribution by the average or the mode of those sizes.

The document processing apparatus according to claim 5, wherein the distribution state deriving means tallies the sizes of the circumscribed rectangles represented by the arrangement information corresponding to the symbol series that match between the collation image and the image to be collated.

A document processing method executed by a document processing apparatus that performs collation between document images, comprising:
a character line cutting step in which character line cutting means cuts out, based on the circumscribed rectangles of the character images included in a document image, character lines formed by connecting the circumscribed rectangles;
a quantization step in which quantization means quantizes arrangement information, which represents characteristics of the circumscribed rectangles in each character line, into a fixed number of levels;
a symbol generating step in which symbol generating means converts each piece of quantized arrangement information into one of a fixed set of symbols;
an appearance frequency calculating step in which appearance frequency calculating means calculates the appearance frequency of each symbol series, a symbol series being a combination of a predetermined number of the symbols;
a collation target selection step in which collation target selection means collates the appearance frequencies calculated in the appearance frequency calculating step for a collation image, which is the document image to be collated, and for a plurality of images to be collated against it, and selects a predetermined number of the images to be collated in descending order of correlation;
a distribution state deriving step in which distribution state deriving means derives, for each document image, the distribution state of the appearance positions of the circumscribed rectangles represented by any or all of the pieces of arrangement information corresponding to the symbol series that match between the collation image and each image to be collated selected in the collation target selection step; and
a collation result selection step in which collation result selection means determines the similarity between the distribution state derived in the distribution state deriving step for the collation image and the distribution state derived for each image to be collated, and selects the image to be collated with the highest similarity as the collation result.

The distribution state deriving means derives a distribution state of appearance positions of the circumscribed rectangles in the horizontal direction and / or the vertical direction of the document image in the distribution state deriving step. Document processing method.

The document processing method according to claim 7, wherein, in the distribution state deriving step, the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles as a frequency distribution histogram.

The document processing method according to claim 7 or 8, wherein, in the distribution state deriving step, the distribution state deriving means regards the distribution state of the appearance positions of the circumscribed rectangles as a normal distribution and derives the average, standard deviation, skewness, and kurtosis of that normal distribution.

The document processing method according to claim 10, wherein, in the distribution state deriving step, the distribution state deriving means tallies the sizes of the circumscribed rectangles of the character images included in the character lines of the collation image and of the image to be collated, and normalizes the numerical values defining the normal distribution by the average or the mode of those sizes.

The document processing method according to claim 11, wherein, in the distribution state deriving step, the distribution state deriving means tallies the sizes of the circumscribed rectangles represented by the arrangement information corresponding to the symbol series that match between the collation image and the image to be collated.

A document processing program that causes a computer that performs collation between document images to function as:
character line cutting means for cutting out, based on the circumscribed rectangles of the character images included in a document image, character lines formed by connecting the circumscribed rectangles;
quantization means for quantizing arrangement information, which represents characteristics of the circumscribed rectangles in each character line, into a fixed number of levels;
symbol generating means for converting each piece of quantized arrangement information into one of a fixed set of symbols;
appearance frequency calculating means for calculating the appearance frequency of each symbol series, a symbol series being a combination of a predetermined number of the symbols;
collation target selection means for collating the appearance frequencies calculated by the appearance frequency calculating means for a collation image, which is the document image to be collated, and for a plurality of images to be collated against it, and selecting a predetermined number of the images to be collated in descending order of correlation;
distribution state deriving means for deriving, for each document image, the distribution state of the appearance positions of the circumscribed rectangles represented by any or all of the pieces of arrangement information corresponding to the symbol series that match between the collation image and each image to be collated selected by the collation target selection means; and
collation result selection means for determining the similarity between the distribution state derived by the distribution state deriving means for the collation image and the distribution state derived for each image to be collated, and selecting the image to be collated with the highest similarity as the collation result.

The document processing program according to claim 13, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles in the horizontal direction and/or the vertical direction of the document image.

The document processing program according to claim 13, wherein the distribution state deriving means derives the distribution state of the appearance positions of the circumscribed rectangles as a frequency distribution histogram.

The distribution state deriving means regards the distribution state of the appearance position of the circumscribed rectangle as a normal distribution, and derives an average, standard deviation, skewness, and kurtosis of the normal distribution. The document processing program described.

The document processing program according to claim 16, wherein the distribution state deriving means tallies the sizes of the circumscribed rectangles of the character images included in the character lines of the collation image and of the image to be collated, and normalizes the numerical values defining the normal distribution by the average or the mode of those sizes.

The document processing program according to claim 17, wherein the distribution state deriving means tallies the sizes of the circumscribed rectangles represented by the arrangement information corresponding to the symbol series that match between the collation image and the image to be collated.