JP6559415B2

JP6559415B2 - Document image processing apparatus, information processing apparatus including the same, program, and recording medium

Info

Publication number: JP6559415B2
Application number: JP2014235989A
Authority: JP
Inventors: 松岡　輝彦; 輝彦松岡; 真彦高島; 和之濱田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2014-11-20
Filing date: 2014-11-20
Publication date: 2019-08-14
Anticipated expiration: 2034-11-20
Also published as: JP2016099793A

Description

The present invention relates to a document image processing apparatus that reconstructs a document image, an information processing apparatus including the same, a computer program, and a recording medium.

In recent years, image forming apparatuses such as copiers and multifunction peripherals have become increasingly sophisticated, and they are now expected to store a document read by a scanner as document image data (hereinafter, a document image) and to manage the stored document images. A document read by a scanner is stored as image data in a so-called fixed-layout file format, which has a fixed width and height.

Typical examples of such file formats are PDF (Portable Document Format) files and TIFF (Tagged Image File Format) files. When a document image in one of these formats is displayed on an image display device with a small display area, such as a mobile phone, smartphone, or tablet, the image may not fit in the display area, so that both vertical and horizontal scrolling are required. In that case, operation becomes very cumbersome.

One countermeasure is to adjust the display magnification to the number of pixels in the display area, so that the document image is reduced until its width matches the width of the display area; scrolling in the line direction then becomes unnecessary. However, the reduction lowers the readability of the characters in the document image. It is therefore desirable to provide the scanned document image not as a fixed-layout file but as a reflowable file whose lines can be wrapped to fit the display area.

A reflowable file has no fixed width or height. By automatically wrapping lines within the display area of the image display device, each line of text can be displayed without running outside the display area. The document can therefore be read by scrolling only in the direction orthogonal to the line direction, without scrolling along the line direction. HTML (Hyper Text Markup Language) and the file formats offered by the e-book functions available for smartphones and tablets are examples of file formats that support reflow display. Converting the scanned document image into a reflowable file format solves the readability problem described above.
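The wrapping behavior of a reflowable file can be illustrated with a minimal sketch. The function name `reflow` and the use of a width measured in characters are assumptions made for illustration; an actual viewer would wrap according to rendered glyph widths.

```python
def reflow(text: str, display_width: int) -> list[str]:
    """Greedy wrap: split text into lines no wider than display_width characters."""
    if display_width <= 0:
        raise ValueError("display_width must be positive")
    lines: list[str] = []
    current = ""
    for word in text.split():
        # A word wider than the display is cut into display-sized pieces.
        while len(word) > display_width:
            if current:
                lines.append(current)
                current = ""
            lines.append(word[:display_width])
            word = word[display_width:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= display_width:
            current = candidate
        else:
            # The word does not fit: close the current line and start a new one.
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in reflow("a fixed page needs two scroll directions", 12):
        print(line)
```

Because every output line fits the display width, only scrolling orthogonal to the line direction remains necessary, which is the property the passage above attributes to reflowable formats.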

One apparatus that converts file formats is the document file display device of Patent Document 1. That display device converts a structured document file (doc, txt, odf, xls, etc.) into a document image file (jpeg, tiff, bmp, etc.) and detects, from the converted document image, layout information that includes the region occupied by each element of the document and the direction in which the elements are arranged. Based on the detected layout information, it extracts from the document image an element image (partial image) corresponding to the region of each element, and creates line information based on the size of each extracted element image along the arrangement direction. It then determines the scroll direction of the created line information and creates paragraph information by arranging multiple pieces of line information along that scroll direction. Finally, it scroll-displays the created paragraph information within the display range of the display unit. Because the document file display device of Patent Document 1 thus displays a document image with the same width as the display area, the document image can be browsed by scrolling in one direction only.

Patent Document 1: JP 2012-230623 A (published November 22, 2012)

However, the document file display device described in Patent Document 1 converts any document file specified by the user into a reflowable document, even when the document is unsuited to reflow. For a document whose layout itself carries meaning, such as one that relies heavily on tables or indentation, the layout can collapse, and the content may become impossible to understand as a result of the conversion.

The present invention has been made in view of the above problem. Its object is to provide a document image processing apparatus and the like that determine whether a document image should be converted to a reflowable form and process the document image so that it can always be displayed in the optimal format.

To solve the above problem, a document image processing apparatus according to one aspect of the present invention is a document image processing apparatus that reconstructs a document image obtained by digitizing a document, and comprises: a structure analysis unit that performs structural analysis of the document image; a conversion determination unit that determines, based on feature amounts of character strings, figures, or tables extracted from the document image by the structural analysis, whether to reconstruct the elements of the document image, the elements being the individual characters, figures, and/or tables contained in it; and a reference list generation unit that, when the conversion determination unit determines that the elements of the document image are to be reconstructed, generates, based on the analysis result of the structure analysis unit, a reference list describing the order of the elements in the reconstructed document image.
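The division of roles among the three units can be sketched as follows. This is only an illustration under stated assumptions: the `Element` type, the table-area feature, and the threshold `max_table_ratio` are hypothetical and are not taken from the disclosure, which says only that the determination is based on feature amounts of character strings, figures, or tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    kind: str     # "text", "figure", or "table"
    order: int    # reading order assumed to come from the structure analysis
    area: float   # fraction of the page occupied by the element (0..1)


def should_reconstruct(elements: list[Element], max_table_ratio: float = 0.5) -> bool:
    """Hypothetical conversion rule: do not reflow pages dominated by tables."""
    total = sum(e.area for e in elements)
    if total == 0:
        return False
    table_area = sum(e.area for e in elements if e.kind == "table")
    return table_area / total <= max_table_ratio


def build_reference_list(elements: list[Element]) -> Optional[list[tuple[int, str]]]:
    """Return the element order for the reconstructed document, or None when
    the page is to be kept in its fixed layout."""
    if not should_reconstruct(elements):
        return None
    ordered = sorted(elements, key=lambda e: e.order)
    return [(e.order, e.kind) for e in ordered]
```

The point of the sketch is the control flow: the reference list is generated only after the determination step has accepted the page, so a page unsuited to reflow is left unchanged.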

With this configuration, it is possible to determine whether to convert a document image to a reflowable form and to process the document image so that it can always be displayed in the optimal format.

A block diagram showing the functional configuration of an image forming apparatus according to one embodiment of the present invention.
A block diagram showing the configuration of the conversion processing unit included in the image processing apparatus of the image forming apparatus.
A diagram showing examples of characters prohibited at the start of a line and characters prohibited at the end of a line.
A diagram showing an example of a horizontally written character string.
A diagram showing an example of a document image.
A diagram showing an example of the document image classified into lines.
(a) shows an example of a horizontally written two-column document, and (b) an example of a vertically written two-column document.
(a) to (f) are diagrams explaining the inter-line distance between two lines.
(a) shows an example of a document image classified into lines, and (b) an example of the document image of (a) further classified into line blocks.
A diagram showing an example of a document image classified into lines and line blocks.
A block diagram showing the configuration of the layout analysis processing unit of the conversion processing unit.
(a) shows an example of a document image classified into line blocks, (b) an example of (a) further classified into column sets, and (c) an example of (b) further classified into columns.
A diagram showing an example of a document image with a two-column layout.
A diagram showing an example of a document image classified into lines, line blocks, columns, and column sets.
A diagram showing an example of a line order list.
A diagram showing information about line blocks, columns, and column sets.
A diagram showing an example of the structure of a document structure tree.
A conceptual diagram outlining the line break determination process in the paragraph analysis processing unit of the layout analysis processing unit.
A flowchart showing the processing procedure of the paragraph analysis processing unit.
A conceptual diagram outlining the update process of the line ID buffer.
A diagram showing an example of an initialized document structure tree.
A diagram showing an example of an updated line order list.
A diagram showing an example of the structure of a document structure tree generated according to the line order list.
A diagram showing an example of a document image classified into paragraphs.
A flowchart showing the processing procedure of the rearrangement processing unit of the conversion processing unit.
A diagram showing an example of a file written in HTML.
A flowchart showing the procedure of the file description process.
A diagram showing an example of an external style sheet file written in CSS.
(a) shows an example of an external style sheet file written in CSS, and (b) an example of a reference list written in HTML.
A block diagram showing the functional configuration of an image reading apparatus according to another embodiment of the present invention.
A diagram showing an example of a document image whose block positions are not aligned.
(a) to (c) are diagrams explaining the process of extracting a line-based graph from a document image.

Embodiments of the present invention are described in detail below with reference to the drawings.

[Embodiment 1: Image forming apparatus]
The following description illustrates a form in which the document image processing apparatus according to the present invention forms part of an image processing apparatus as a conversion processing unit, and that image processing apparatus in turn forms part of an image forming apparatus.

[1. Image forming apparatus]
FIG. 1 is a block diagram showing the functional configuration of an image forming apparatus (information processing apparatus) 100 according to Embodiment 1. The image forming apparatus 100 is a digital multifunction peripheral with a copy function, a scanner function, and the like. The image forming apparatus 100 includes an image processing apparatus 1, an image input device 2, an image output device 3, and a transmission device 4.

An operation panel 6 is connected to the image input device 2, the image processing apparatus 1, the image output device 3, and the transmission device 4. The operation panel 6 includes an operation unit (not shown), such as setting buttons and a numeric keypad with which the user sets the operation mode of the image forming apparatus 100, and a display unit (not shown) such as a liquid crystal display.

The various processes executed in the image forming apparatus 100 are controlled by a control unit (not shown), which is a computer including a processor such as a CPU (Central Processing Unit) or DSP (Digital Signal Processor). The control unit of the image forming apparatus 100 performs data communication with computers, other digital multifunction peripherals, and the like connected to a network, via a network card and a LAN cable (not shown).

Each part of the image forming apparatus 100 is described in detail below.

The image input device 2 optically reads an image from a document. The image input device 2 is, for example, a color scanner with a CCD (Charge Coupled Device); it uses the CCD to read the light reflected from the document as RGB (R: red, G: green, B: blue) analog signals and outputs them to the image processing apparatus 1. The image input device 2 need not be a scanner and may be, for example, a digital camera.

The image processing apparatus 1 processes the image data read by the image input device 2 and generates a compressed file for storing or transmitting the processed image data.

The image processing apparatus 1 generates image data consisting of RGB digital signals (hereinafter, RGB signals) by applying to the RGB analog signals input from the image input device 2 the image processing, described later, of the A/D conversion unit 11, the shading correction unit 12, the document type determination unit 13, the input tone correction unit 14, and the region separation processing unit 15.

The image processing apparatus 1 also generates image data consisting of CMYK (C: cyan, M: magenta, Y: yellow, K: black) digital signals by applying to the RGB signals output by the region separation processing unit 15 the image processing, described later, of the color correction unit 16, the black generation and under color removal unit 17, the spatial filter processing unit 18, the output tone correction unit 19, and the tone reproduction processing unit 20, and outputs this image data as a stream to the image output device 3. The image data may be temporarily stored in the storage unit 5 before being output to the image output device 3. The storage unit 5 is a non-volatile storage device (for example, a hard disk).

The image output device 3 outputs an image based on the image data generated by the image processing apparatus 1. Based on the image data input from the image processing apparatus 1, the image output device 3 forms (prints) a color image on a recording sheet (for example, recording paper) by a method such as thermal transfer, electrophotography, or inkjet, and outputs it.

In this embodiment the image output device 3 outputs color images, but it may instead form and output monochrome (black-and-white) images on the recording sheet. In that case, the image processing apparatus 1 converts the color image data into monochrome image data before outputting it to the image output device 3.

Furthermore, the image processing apparatus 1 generates a compressed file containing compressed color image data by having the compression processing unit 21 apply image compression to the RGB signals output by the region separation processing unit 15, and outputs the file to the transmission device 4. The compressed file may be temporarily stored in the storage unit 5 before being output to the transmission device 4.

When the format conversion mode is selected on the operation panel 6, the image processing apparatus 1 has the conversion processing unit 22 perform format conversion on the RGB signals output by the region separation processing unit 15. As described later, the conversion processing unit 22 analyzes the document layout of the image and generates a document structure tree.

The conversion processing unit 22 functions as the document image processing apparatus according to the present invention. The converted file may be temporarily stored in the storage unit 5 before being output to the transmission device 4. When the document image input from the image input device 2 spans multiple pages, the layout analysis and document reconstruction described later can be limited to the pages specified on the operation panel 6. For example, a cover page can be excluded from reconstruction and output as is, as an image of the whole page.

The transmission device 4 transmits the compressed file generated by the image processing apparatus 1 to the outside. The transmission device 4 can be connected to a communication network (not shown) such as a public line network, a LAN (Local Area Network), or the Internet, and transmits the compressed file to the outside over the communication network by a communication method such as facsimile or e-mail. For example, when the scan to e-mail mode is selected on the operation panel 6, the transmission device 4, which uses a network card, modem, or the like, attaches the compressed file to an e-mail and sends it to the configured destination.

For facsimile transmission, the control unit of the image forming apparatus 100 has the transmission device 4, which uses a modem, carry out the communication procedure with the destination. Once a state in which transmission is possible has been secured, the compressed file undergoes any necessary processing, such as a change of compression format, and is then transmitted sequentially to the destination over the communication line.

When receiving a facsimile, the control unit of the image forming apparatus 100 receives the compressed file sent from the other party while carrying out the communication procedure in the transmission device 4, and inputs the file to the image processing apparatus 1. In the image processing apparatus 1, a compression/decompression processing unit (not shown) decompresses the received compressed file. The image data obtained by decompression is subjected, as necessary, to rotation and/or resolution conversion in a processing unit (not shown), to output tone correction in the output tone correction unit 19, and to tone reproduction processing in the tone reproduction processing unit 20. The processed image data is output to the image output device 3, which forms an image on a recording sheet.

[2. Image processing apparatus]
The configuration of the image processing apparatus 1 is described in detail below, together with the image processing and format conversion processing it performs.

The A/D conversion unit 11 converts the RGB analog signals input from the image input device 2 to the image processing apparatus 1 into RGB digital signals (that is, RGB signals).

The shading correction unit 12 removes from the RGB signals input from the A/D conversion unit 11 the various distortions that arise in the illumination system, image-forming system, and image-capturing system of the image input device 2.
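The text does not specify how the distortions are removed. One common approach, shown here purely as an assumption, normalizes each sensor element against a white reference and a black reference captured by the same element. The function name and parameters are hypothetical.

```python
def shading_correct(raw, white_ref, black_ref, max_value=255):
    """Normalize one scan line per sensor element (assumed white/black reference method).

    raw, white_ref, black_ref: equal-length sequences of sensor readings.
    """
    corrected = []
    for r, w, b in zip(raw, white_ref, black_ref):
        span = w - b
        if span <= 0:
            # Element with no usable dynamic range; output 0 rather than divide by zero.
            corrected.append(0)
            continue
        value = (r - b) / span * max_value
        # Clamp to the valid output range before rounding.
        corrected.append(int(round(min(max(value, 0), max_value))))
    return corrected
```

With this normalization, two sensor elements of different sensitivity that read the same gray patch produce the same corrected value, which is the kind of unevenness the correction is meant to remove.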

The document type determination unit 13 converts the RGB signals input from the shading correction unit 12 into density signals representing the density of each of the R, G, and B colors, and performs document type determination, which identifies the mode of the document, such as text, photograph, or photographic paper. When the user sets the document type manually on the operation panel 6, the document type determination unit 13 passes the RGB signals input from the shading correction unit 12 unchanged to the input tone correction unit 14 in the next stage. The result of the document type determination (the document type) is reflected in the subsequent image processing.

The input tone correction unit 14 performs image quality adjustment, such as color balance adjustment, removal of background density, and contrast adjustment, on the RGB signals input from the document type determination unit 13.

The region separation processing unit 15 classifies each pixel of the image represented by the RGB signals input from the input tone correction unit 14 as belonging to a text region, a halftone-dot region, or a photograph region. Based on this classification, the region separation processing unit 15 outputs a region identification signal, which indicates the region to which each pixel belongs, to the black generation and under color removal unit 17, the spatial filter processing unit 18, the tone reproduction processing unit 20, and the compression processing unit 21. The region separation processing unit 15 also passes the RGB signals input from the input tone correction unit 14 unchanged to the color correction unit 16, the compression processing unit 21, and the conversion processing unit 22 in the subsequent stages.

The color correction unit 16 converts the RGB signals input from the region separation processing unit 15 into CMY digital signals (hereinafter, CMY signals) and, to achieve faithful color reproduction, removes from the CMY signals the color impurity that results from the spectral characteristics of CMY colorants, which contain unwanted absorption components.

Based on the CMY signals input from the color correction unit 16, the black generation and under color removal unit 17 performs black generation, which generates a black (K) signal from the CMY signals, and a process that generates new CMY signals by subtracting the K signal obtained by black generation from the original CMY signals. As a result, the three-color CMY digital signals are converted into four-color CMYK digital signals (hereinafter, CMYK signals).

One commonly used method of black generation is skeleton black. In this method, let the input/output characteristic of the skeleton curve be y = f(x), the input data be C, M, and Y, the output data be C', M', Y', and K', and the UCR (Under Color Removal) rate be α (0 < α < 1). The black generation and under color removal processing is then expressed by equations (1) to (4) below.
K' = f(min(C, M, Y)) ... (1)
C' = C - αK' ... (2)
M' = M - αK' ... (3)
Y' = Y - αK' ... (4)
Here, the UCR rate α (0 < α < 1) indicates how far CMY is reduced when the portion where C, M, and Y overlap is replaced by K. Equation (1) shows that the K signal is generated according to the smallest of the C, M, and Y signal intensities.
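Equations (1) to (4) can be written directly as a short function. In this sketch the skeleton curve f defaults to the identity, which is an assumption for illustration only; the text does not give the actual curve. The function and parameter names are likewise hypothetical.

```python
def black_generation_ucr(c, m, y, alpha=0.5, f=lambda x: x):
    """Apply equations (1)-(4): skeleton-black generation with under color removal.

    c, m, y : input signal values (e.g. 0..255)
    alpha   : UCR rate, 0 < alpha < 1
    f       : skeleton curve y = f(x); identity here is an assumed placeholder
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    k_out = f(min(c, m, y))        # (1) K' = f(min(C, M, Y))
    c_out = c - alpha * k_out      # (2) C' = C - alpha * K'
    m_out = m - alpha * k_out      # (3) M' = M - alpha * K'
    y_out = y - alpha * k_out      # (4) Y' = Y - alpha * K'
    return c_out, m_out, y_out, k_out


if __name__ == "__main__":
    print(black_generation_ucr(200, 150, 100, alpha=0.5))
```

For the input (200, 150, 100) with α = 0.5 and the identity curve, K' is 100 and half of it is subtracted from each of C, M, and Y, which shows how α controls the share of the common component that moves into the black channel.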

The spatial filter processing unit 18 applies spatial filtering with a digital filter to the CMYK image data input from the black generation and under color removal unit 17, based on the region identification signal input from the region separation processing unit 15. Correcting the spatial frequency characteristics in this way reduces blurring and graininess in the image. For example, for regions classified as text by the region separation processing unit 15, the spatial filter processing unit 18 uses a filter that strongly emphasizes high-frequency components in order to improve the reproducibility of characters. For regions classified as halftone dots, it applies low-pass filtering to remove the input halftone-dot component.
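Region-dependent filtering of this kind can be sketched for a single color plane as follows. The two 3x3 kernels (a standard sharpening kernel and a box blur) and the region labels "text" and "halftone" are assumptions chosen for illustration; the text does not give the actual filter coefficients.

```python
# Assumed kernels: a sharpening kernel for text, a 3x3 box blur as the low-pass filter.
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
LOWPASS = [[1 / 9] * 3 for _ in range(3)]


def convolve_at(plane, kernel, x, y):
    """3x3 convolution at (x, y) with edge replication, clamped to 0..255."""
    h, w = len(plane), len(plane[0])
    acc = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            acc += kernel[dy + 1][dx + 1] * plane[yy][xx]
    return min(max(acc, 0.0), 255.0)


def spatial_filter(plane, regions):
    """Filter one plane, choosing the kernel per pixel from the region label."""
    out = [list(row) for row in plane]
    for y, row in enumerate(plane):
        for x, _ in enumerate(row):
            label = regions[y][x]
            if label == "text":
                out[y][x] = convolve_at(plane, SHARPEN, x, y)
            elif label == "halftone":
                out[y][x] = convolve_at(plane, LOWPASS, x, y)
            # Other regions (e.g. photograph) are left unchanged in this sketch.
    return out
```

The sharpening kernel leaves flat areas unchanged and amplifies edges, while the box blur averages out a fine dot pattern, mirroring the two cases described above.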

The output tone correction unit 19 performs output tone correction on the CMYK signals input from the spatial filter processing unit 18, based on the halftone dot area ratio, which is a characteristic of the image output device 3.

The tone reproduction processing unit 20 performs halftone processing appropriate to each region on the CMYK signals input from the output tone correction unit 19, based on the region identification signal input from the region separation processing unit 15. For example, for regions classified as text by the region separation processing unit 15, it performs binarization or multi-level quantization with a high-resolution screen suited to reproducing high-frequency components. For regions classified as halftone dots, it performs binarization or multi-level quantization with a screen that prioritizes tone reproduction. The tone reproduction processing unit 20 then outputs the processed image data to the image output device 3.
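As one possible illustration of region-dependent screening, the sketch below binarizes a plane by ordered dithering, using a small 2x2 Bayer matrix for text (finer spatial detail, fewer tone levels) and a 4x4 Bayer matrix elsewhere (more tone levels). Ordered dithering, the matrix choice, and the region label "text" are assumptions; the text does not state which screens are used.

```python
# Standard Bayer index matrices; assigning them to regions is an assumption.
BAYER_2X2 = [[0, 2], [3, 1]]
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def threshold(matrix, x, y, max_value=255):
    """Threshold for pixel (x, y) taken from the tiled dither matrix."""
    n = len(matrix)
    return (matrix[y % n][x % n] + 0.5) / (n * n) * max_value


def halftone(plane, regions):
    """Binarize one plane (values 0..255) with a region-dependent screen."""
    out = []
    for y, row in enumerate(plane):
        out_row = []
        for x, value in enumerate(row):
            matrix = BAYER_2X2 if regions[y][x] == "text" else BAYER_4X4
            out_row.append(1 if value > threshold(matrix, x, y) else 0)
        out.append(out_row)
    return out
```

The 2x2 screen can represent only a few distinct tone levels but repeats every two pixels, whereas the 4x4 screen trades spatial resolution for more levels, which parallels the distinction drawn above between text and halftone-dot regions.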

The compression processing unit 21 generates a compressed file based on the region identification signal and the RGB image data input from the region separation processing unit 15. The image data input to the compression processing unit 21 consists of pixels arranged in a matrix. The compression processing unit 21 separates this image data into a foreground layer and a background layer. The foreground layer is further converted into binary images, and each binary image is losslessly compressed, for example with MMR (Modified Modified READ). The background layer is lossily compressed, for example with JPEG. Finally, the losslessly compressed binary images, the lossily compressed background layer, and the decompression information needed to decompress them into color image data are combined into a single file. This file is the compressed file. The decompression information includes information indicating the compression format, an index color table (hereinafter, IC table), and the like. The region identification signal generated for each pixel is compressed with a lossless method such as MMR or MR (Modified READ).
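The layer separation step can be sketched as follows. The sketch covers only the split into an index color table, one binary image per foreground color, and a background layer; the MMR and JPEG encoding steps are omitted. Treating text pixels as the foreground and filling their positions in the background with a fixed color are assumptions made for illustration.

```python
def split_layers(pixels, is_text, fill=(255, 255, 255)):
    """Split an RGB image into foreground binary images and a background layer.

    pixels  : 2-D list of (r, g, b) tuples
    is_text : 2-D list of truthy/falsy flags (assumed to come from the
              region identification signal)
    fill    : color written into the background where text was removed (assumed)
    Returns (ic_table, masks, background); masks[i] is the binary image for
    the color ic_table[i].
    """
    h, w = len(pixels), len(pixels[0])
    ic_table = []   # index color table of foreground colors
    masks = []      # one binary image per entry of ic_table
    background = []
    for y in range(h):
        bg_row = []
        for x in range(w):
            if is_text[y][x]:
                color = pixels[y][x]
                if color not in ic_table:
                    ic_table.append(color)
                    masks.append([[0] * w for _ in range(h)])
                masks[ic_table.index(color)][y][x] = 1
                bg_row.append(fill)
            else:
                bg_row.append(pixels[y][x])
        background.append(bg_row)
    return ic_table, masks, background
```

Each binary image would then go to a lossless bi-level coder and the background to a lossy coder, with the index color table stored as part of the decompression information so that the foreground colors can be restored.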

変換処理部２２は、入力された文書画像に対してフォーマット変換処理を実行する。変換処理部２２の詳細について、以下で説明する。 The conversion processing unit 22 performs format conversion processing on the input document image. Details of the conversion processing unit 22 will be described below.

以上の処理は、画像形成装置１００に備えられる図示しない制御部により制御される。 The above processing is controlled by a control unit (not shown) provided in the image forming apparatus 100.

［３．変換処理部］ [3. Conversion processing unit]
図２は、変換処理部（文書画像処理装置）２２の構成を示すブロック図である。変換処理部２２は、行解析処理部（構造解析部）３１と、行ブロック解析処理部（構造解析部）３２と、レイアウト解析処理部（構造解析部）３３と、変換可否判定処理部（変換判定部）３４、再配置処理部（参照リスト生成部）３５と、を備える。以下では、文書を構成する個々の文字、図、表を要素（要素画像）と称する。図はグラフを含むものとする。 FIG. 2 is a block diagram illustrating the configuration of the conversion processing unit (document image processing apparatus) 22. The conversion processing unit 22 includes a line analysis processing unit (structure analysis unit) 31, a line block analysis processing unit (structure analysis unit) 32, a layout analysis processing unit (structure analysis unit) 33, a conversion possibility determination processing unit (conversion determination unit) 34, and a rearrangement processing unit (reference list generation unit) 35. Hereinafter, the individual characters, figures, and tables constituting a document are referred to as elements (element images). Figures include graphs.

行解析処理部３１は、文書画像から各要素を抽出して、文字（文字画像）から構成される文字列の行（文字列行）と、図（図画像）又は表（表画像）から成る行（図表行）に分類する。更に、文書の横書き、縦書きといった記述方向を示す文書第１方向を解析する。 The line analysis processing unit 31 extracts each element from the document image, and includes a character string line (character string line) composed of characters (character images) and a figure (graphic image) or a table (table image). Sort into rows (chart rows). Further, the first document direction indicating the writing direction such as horizontal writing or vertical writing of the document is analyzed.

行ブロック解析処理部３２は、行解析処理部３１で抽出された行を、少なくとも１つ以上有する行ブロックに統合する処理を行う。 The row block analysis processing unit 32 performs processing for integrating the rows extracted by the row analysis processing unit 31 into row blocks having at least one or more rows.

レイアウト解析処理部３３は、行ブロック解析処理部３２で分類された行ブロック同士の位置関係から、段組構成を解析して文書全体の行の順序付けを行い、行の前後関係から改行位置を検出することで、文書を１つ以上の段落に分類し、段落毎に行の情報を格納した文書構造ツリーを生成する。詳細は後述する。文書構造ツリーの各段落は、文字列の行の順序の情報と、図表の順序の情報をそれぞれ分けて格納することで、図表の配置を段落内で修正できるようにする。 The layout analysis processing unit 33 analyzes the column structure from the positional relationship between the row blocks classified by the line block analysis processing unit 32, orders the lines of the entire document, and detects the line feed position from the line context. As a result, the document is classified into one or more paragraphs, and a document structure tree storing line information for each paragraph is generated. Details will be described later. Each paragraph of the document structure tree stores the information on the order of the character string lines and the information on the order of the chart separately, so that the arrangement of the chart can be corrected in the paragraph.

変換可否判定処理部３４は、行解析処理部３１からレイアウト解析処理部３３までの処理にて得られた情報から、文書画像をリフロー型に変換するか否かを判定する。変換可否判定処理部３４は、リフロー型に変換しないと判定した場合は、圧縮処理部２１に判定信号を出力する。圧縮処理部２１では、上記判定信号を受信すると、ＲＧＢの画像データを例えば、ＪＰＥＧファイルフォーマットに変換して出力する。すなわち、送信装置４の送信先の表示装置にてフィックス型で表示されるように送信装置４の送信先の表示装置での表示に適したフォーマットに変換して出力する。なお、圧縮処理部２１は、ＪＰＥＧファイルフォーマット以外に、例えば、ＰＮＧファイルフォーマットあるいはＧＩＦファイルフォーマットに変換してもよい。 The conversion possibility determination processing unit 34 determines, from the information obtained by the processing of the line analysis processing unit 31 through the layout analysis processing unit 33, whether or not to convert the document image to the reflow type. When it determines that the document image is not to be converted to the reflow type, the conversion possibility determination processing unit 34 outputs a determination signal to the compression processing unit 21. Upon receiving the determination signal, the compression processing unit 21 converts the RGB image data into, for example, the JPEG file format and outputs it. That is, the data is converted into a format suited to display on the display device at the transmission destination of the transmission device 4, so that it is displayed there as a fixed type, and is then output. Note that the compression processing unit 21 may convert the data to a format other than JPEG, for example the PNG file format or the GIF file format.

再配置処理部３５は、変換可否判定処理部３４にてリフロー型に変換すると判定された文書画像に対して、レイアウト解析処理部３３で生成された文書構造ツリーに従って、文書画像の文字、図、表の各要素を順序通り参照するための命令と、段落の開始及び終了を宣言するための命令を列記した参照リストとを生成する。参照リストのフォーマットは特に固定されておらず、例えばＨＴＭＬ等のマークアップ言語で記述した文書の形式として生成したものをファイル出力してもよい。 For a document image that the conversion possibility determination processing unit 34 has determined is to be converted to the reflow type, the rearrangement processing unit 35 generates, in accordance with the document structure tree generated by the layout analysis processing unit 33, a reference list that lists instructions for referring to each element (character, figure, and table) of the document image in order and instructions for declaring the start and end of each paragraph. The format of the reference list is not fixed; for example, it may be generated as a document written in a markup language such as HTML and output as a file.

なお、本実施形態では、変換処理部２２の処理について、画像入力装置２が読み取った画像データ（文書画像）を処理する場合を用いて説明するが、ネットワークを介して受信した、あるいは、ＵＳＢメモリ等のメモリに格納されているデータ（ＰＤＦファイル又は構造化された文書ファイル（doc、txt、odf、xls等））については、次のように処理を行う。受信した、あるいは、メモリに格納されているデータを、不図示のソフトウェア処理部において、文書画像ファイル（jpeg、tiff、bmp等）に変換し、文書画像ファイルに変換されたデータを変換処理部２２に入力する。受信した、あるいは、メモリに格納されたデータが文書画像ファイルである場合は、ソフトウェア処理部において、何ら処理は行わない。 In the present embodiment, the processing of the conversion processing unit 22 is described for the case of processing image data (a document image) read by the image input device 2. Data received via a network or stored in a memory such as a USB memory (a PDF file or a structured document file (doc, txt, odf, xls, etc.)) is processed as follows. The received or stored data is converted into a document image file (jpeg, tiff, bmp, etc.) by a software processing unit (not shown), and the converted data is input to the conversion processing unit 22. If the received or stored data is already a document image file, the software processing unit performs no processing.

文書画像ファイルに変換されたデータについては、変換可否判定処理部３４において、リフロー型に変換するか否かの判定を行う。リフロー型に変換しないと判定された場合、圧縮処理部２１に文書画像ファイルに変換されたデータが出力される。圧縮処理部２１は、文書画像ファイルに変換されたデータが、ＪＰＥＧファイルフォーマットである場合は、何も処理を行わずにそのまま出力する。文書画像ファイルに変換されたデータが、tiff又はbmpデータである場合は、文書画像ファイルに変換されたデータを、例えば、ＪＰＥＧファイルフォーマットに変換して出力する。つまり、送信装置４の送信先の表示装置での表示に適したフォーマットに変換して出力する。 For the data converted into the document image file, the conversion possibility determination processing unit 34 determines whether or not to convert the data into a reflow type. If it is determined not to convert to the reflow type, the data converted into the document image file is output to the compression processing unit 21. When the data converted into the document image file is in the JPEG file format, the compression processing unit 21 outputs the data as it is without performing any processing. When the data converted into the document image file is tiff or bmp data, the data converted into the document image file is converted into, for example, a JPEG file format and output. In other words, the data is converted into a format suitable for display on the transmission destination display device of the transmission device 4 and output.

以下、変換処理部２２の各処理部について詳述する。 Hereinafter, each processing unit of the conversion processing unit 22 will be described in detail.

［４．行解析処理部］ [4. Line analysis processing unit]
＜４−１．行解析処理部の構成＞ <4-1. Configuration of the line analysis processing unit>
行解析処理部３１は、文書画像から各要素画像を抽出し、文字列行と図表行とに分類する。なおグラフは図表行に含まれるものとする。行解析処理部３１は、文字列抽出処理部３１ａ及び図表抽出処理部３１ｂを備えて構成される。行解析処理部３１は、更に、文書の横書き又は縦書きといった記述方向を示す文書第１方向を解析する。 The line analysis processing unit 31 extracts each element image from the document image and classifies it into character string lines and chart lines. Graphs are included in chart lines. The line analysis processing unit 31 includes a character string extraction processing unit 31a and a chart extraction processing unit 31b. The line analysis processing unit 31 further analyzes a document first direction, which indicates the writing direction of the document, such as horizontal or vertical writing.

＜４−２．文字列抽出処理部＞ <4-2. Character string extraction processing unit>
文字列抽出処理部３１ａは、文書画像から個々の文字を抽出（検出して切り出す）すると共に、文字が複数並べられて構成される文字列を抽出する。文字及び文字列の抽出は、次の方法により行う。文書画像から文字領域の画素を抽出し、その中から１つの文字を構成していると思われる画素の集合を囲む最小外接矩形を文字構成要素として抽出する。更に、上下左右の各方向における近隣の各文字構成要素の矩形同士の距離から文字列として連続する文字構成要素の関係にあるかを判定し、その連続する矩形の連続数から、文字列領域を特定する。このとき、左右方向における連続数が上下方向における連続数を上回る場合は横書きの文字列領域として、上下方向における連続数が左右方向における連続数を上回る場合は縦書きの文字列領域として、文字列の持つ方向（文字列の方向、記述方向）を同時に取得する。 The character string extraction processing unit 31a extracts (detects and cuts out) individual characters from the document image and also extracts character strings, each formed by arranging a plurality of characters. Characters and character strings are extracted by the following method. Pixels of character regions are extracted from the document image, and the minimum circumscribed rectangle enclosing each set of pixels considered to constitute one character is extracted as a character component. Then, from the distances between the rectangles of neighboring character components in the up, down, left, and right directions, it is determined whether the character components are consecutive as a character string, and a character string region is identified from the number of consecutive rectangles. At this time, the direction of the character string (writing direction) is acquired as well: the region is treated as a horizontally written character string region if the number of consecutive rectangles in the left-right direction exceeds that in the up-down direction, and as a vertically written character string region if the number in the up-down direction exceeds that in the left-right direction.

なお、文字及び文字列の抽出方法は、上記に記載の方法に限らず他の方法を用いることができる。例えば、光学式文字読取装置（ＯｐｔｉｃａｌＣｈａｒａｃｔｅｒＲｅｃｏｇｎｉｔｉｏｎ；以下ＯＣＲ）で個々の文字及び文字列を抽出してもよい。 The method for extracting characters and character strings is not limited to the method described above, and other methods can be used. For example, individual characters and character strings may be extracted by an optical character reader (hereinafter referred to as OCR).

＜４−３．図表抽出処理部＞ <4-3. Chart extraction processing unit>
図表抽出処理部３１ｂは、文書画像から図（図領域）及び表（表領域）を抽出する。 The chart extraction processing unit 31b extracts figures (figure regions) and tables (table regions) from the document image.

図領域の抽出は次の方法により行う。文書画像の所定領域毎に画素値の出現頻度（即ち、ヒストグラム）を求めた場合に、図領域の一つである写真領域上の各画素では濃度変化が広範囲に及ぶヒストグラムが得られることを利用して、ヒストグラムのエントロピー（平均情報量）を算出する。このようにエントロピーが高い領域を抽出することで精度よく写真領域を抽出することが可能となる。図領域の抽出方法は、上記に記載の方法に限らず他の方法を用いてもよい。 The drawing area is extracted by the following method. Utilizing the fact that when a pixel value appearance frequency (that is, a histogram) is obtained for each predetermined area of a document image, a histogram with a wide range of density changes is obtained for each pixel on the photographic area, which is one of the figure areas. Then, the entropy (average information amount) of the histogram is calculated. Thus, by extracting a region with high entropy, it is possible to accurately extract a photographic region. The method for extracting the figure region is not limited to the method described above, and other methods may be used.
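
The entropy criterion described above can be illustrated with a minimal Python sketch. The function names, the block representation (a flat list of pixel values), and the threshold of 4.0 bits are assumptions made for illustration only; the description does not specify them.

```python
import math
from collections import Counter


def histogram_entropy(pixels):
    """Shannon entropy (in bits) of the pixel-value histogram of one block."""
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # Sum of -p * log2(p) over every pixel value that occurs in the block.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_photo_block(pixels, threshold=4.0):
    """Treat a block as a photograph candidate when its entropy is high.

    The threshold value is a hypothetical choice, not taken from the source.
    """
    return histogram_entropy(pixels) >= threshold
```

A block of uniform background yields an entropy of zero, whereas a block whose values spread over the whole density range approaches the maximum of 8 bits for 8-bit data, which is the property the extraction relies on.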

また、表領域の抽出は、次の方法により行う。文書画像からラインとなる可能性のある候補画素を抽出し、前記候補画素が水平方向もしくは垂直方向に所定画素数以上連続する場合に前記連続する候補画素の集合をラインとして抽出し、前記抽出された水平方向及び垂直方向のラインの位置関係から、各ラインが表を構成する罫線であるか単一のラインであるかを判定し、同一の表を構成するラインの集合について、それら全てを囲む最小外接矩形を表領域として抽出する。このように抽出することで、精度よく表領域を検出することが可能である。なお、表領域の抽出方法は、上記に記載の方法に限らず他の方法を用いてもよい。 The table area is extracted by the following method. Extracting candidate pixels that may be lines from the document image, and extracting the set of consecutive candidate pixels as a line when the candidate pixels continue for a predetermined number of pixels in the horizontal or vertical direction, the extracted From the positional relationship between the horizontal and vertical lines, it is determined whether each line is a ruled line or a single line constituting the table, and all of the sets of lines constituting the same table are enclosed. The minimum circumscribed rectangle is extracted as a table area. By extracting in this way, it is possible to detect the table area with high accuracy. Note that the table region extraction method is not limited to the method described above, and other methods may be used.

グラフは図の一種であるが、以下のようにグラフ領域を抽出することができる。円グラフの様にその形状だけでグラフが構成されているグラフや、棒グラフ等の様にグラフの軸の線と繋がっており、ベタや網掛け等のあるグラフの場合は、前述したエントロピー値を用いてグラフ領域を抽出することが可能である。また、折れ線グラフのような線ベースのグラフでグラフの軸から離れているようなグラフの場合は、次のように、抽出可能である。図３２の（ａ）〜（ｃ）に示すように、前記エントロピー値と並行して表を構成する罫線ほどの水平又は垂直ラインではないが、単一の直線とは異なり、矩形やＬ字型やＵ字型のような水平又は垂直ラインの繋がりのある領域に対して、その領域の最外郭を矩形化処理し、その矩形の大きさが、予め定められた閾値以上の面積を持つ矩形であり、かつ、その矩形領域内にエントロピー値がある程度高い領域が存在するかを判定することでグラフ領域を抽出することが可能である。なお、グラフ領域の抽出方法は、上記に記載の方法に限らず他の方法を用いてもよい。 A graph is a kind of figure, and a graph region can be extracted as follows. For a graph composed only of its shape, such as a pie chart, or a graph that is connected to the axis lines and contains solid fills or hatching, such as a bar chart, the graph region can be extracted using the entropy value described above. A line-based graph whose plot is separated from the axes, such as a line chart, can be extracted as follows. As shown in (a) to (c) of FIG. 32, in parallel with the entropy computation, regions are found in which horizontal or vertical lines are connected in a rectangular, L-shaped, or U-shaped form; such regions have fewer lines than the ruled lines of a table but differ from a single straight line. The outermost contour of each such region is converted to a rectangle, and the region is extracted as a graph region when the rectangle has an area equal to or greater than a predetermined threshold and contains a region whose entropy value is reasonably high. The graph region extraction method is not limited to the one described above, and other methods may be used.

なお、文字列抽出処理部３１ａで抽出した文字が、図表抽出処理部３１ｂで抽出した図又は表として抽出した範囲と重複する場合、該抽出した文字をキャンセルする。特に、抽出された表には文字が含まれる可能性が高いが、表のサイズを表示領域の幅に合わせるためには表を構成する各列の幅を調整する必要がある。結果として、調整後の列幅に合わせて、表内の文字列は折り返し表示され、かえって可読性を低下させる原因となる。そのため、本実施の形態では、表として抽出された領域については、文字も含めたまま図表として抽出して表示する。 If the character extracted by the character string extraction processing unit 31a overlaps the range extracted as a diagram or table extracted by the chart extraction processing unit 31b, the extracted character is canceled. In particular, there is a high possibility that characters are included in the extracted table. However, in order to match the size of the table to the width of the display area, it is necessary to adjust the width of each column constituting the table. As a result, the character string in the table is displayed in a folded manner in accordance with the adjusted column width, which causes a decrease in readability. For this reason, in the present embodiment, the area extracted as a table is extracted and displayed as a chart including characters.

＜４−４．行ＩＤの設定＞ <4-4. Setting of line IDs>
行解析処理部３１は、以上のようにして抽出された文字列行及び図表行に対して、各行を識別する重複しない識別記号として行ＩＤ（Identification）を設定する。１つの行ＩＤについて、その行ＩＤを有する行が２つ以上存在しなければ、必ずしも文書の順序に従って行ＩＤを割り振る必要は無い。行ＩＤの設定方法として、行ＩＤが「０」の場合を存在しない行である無効行とし、例えば、各ページの文書画像において、読み取った原稿の左上を原点（０，０）とし、原点に対して、右方向をＸ座標、下方向をＹ座標となる座標系を採用し、行の範囲を表わす最も左上のＹ座標が小さい順に行ＩＤを連番で割り振る方法が挙げられる。なお、Ｙ座標が同じ行同士はＸ座標が小さい方の行を割り振りにおいて優先する。この方法を用いる場合、段組構成により必ずしも文書の読み順序通りに行ＩＤが割り振られる訳ではないが、横書き文書であれば行が上にあるほど順序が先である可能性が高いため、比較的文書の順序を反映した行ＩＤの割り振り方になると言える。行ＩＤの設定方法はこれに限らず、自由に選択することができる。 The line analysis processing unit 31 sets a line ID (identification) for each character string line and chart line extracted as described above, as a unique identifier that distinguishes each line. As long as no two lines share the same line ID, the line IDs need not be assigned in document order. One possible method is as follows: a line ID of "0" denotes an invalid line, that is, a line that does not exist; in the document image of each page, a coordinate system is adopted in which the upper left of the read original is the origin (0, 0), the X coordinate increases to the right, and the Y coordinate increases downward; and line IDs are assigned as consecutive numbers in ascending order of the Y coordinate of the upper-left corner of each line's range. Among lines with the same Y coordinate, the line with the smaller X coordinate is given priority. With this method, line IDs are not necessarily assigned in the reading order of the document, depending on the column layout; however, in a horizontally written document a line located higher on the page is likely to come earlier, so the assignment reflects the document order reasonably well. The method of setting line IDs is not limited to this and can be chosen freely.
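
As an illustration of the ID assignment described above, the following Python sketch sorts lines by the Y coordinate and then the X coordinate of the upper-left corner of their bounding rectangle and numbers them from 1, leaving 0 for the invalid line. The dictionary keys `x`, `y`, and `line_id` and the function name are hypothetical.

```python
def assign_line_ids(lines):
    """Number lines by the upper-left corner of their bounding rectangle.

    lines: list of dicts with hypothetical keys 'x' and 'y' (upper-left corner,
    origin at the upper left of the page, Y increasing downward).
    Returns new dicts carrying a 'line_id' that starts at 1; 0 stays reserved
    for the invalid (non-existent) line.
    """
    # Smaller Y first; for equal Y, the smaller X takes priority.
    ordered = sorted(lines, key=lambda r: (r["y"], r["x"]))
    return [dict(r, line_id=i) for i, r in enumerate(ordered, start=1)]
```

In a two-column page this ordering interleaves lines of the left and right columns that share a Y coordinate, which is why the reading order is established later by the layout analysis rather than by the IDs themselves.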

＜４−５．文書第１方向及び文書第２方向の決定＞ <4-5. Determination of the document first direction and the document second direction>
さらに、行解析処理部３１は、１ページの文書画像における全ての行について、文字列行か図表行かの分類が終わると、文字列の方向から、文書全体の方向を示す文書第１方向を決定する。文書第１方向は、横書きのとき水平となり、縦書きのとき垂直となる。文書第１方向は、取得した全ての文字列の持つ方向を分類し、その比率により決定する。文書第１方向を決定するための比率の算出方法の簡単な例として、単純に横書きもしくは縦書きの文字列の数をカウントして、その数の比率を算出する方法が挙げられる。この方法の場合、例えば横書きの行数と縦書きの行数を比較して、多い方の方向を文書第１方向として決定（設定）する。文書第１方向の決定は上記の方法に限らず様々な方法を採用することができる。 Further, once every line in the one-page document image has been classified as a character string line or a chart line, the line analysis processing unit 31 determines, from the directions of the character strings, the document first direction, which indicates the direction of the document as a whole. The document first direction is horizontal for horizontal writing and vertical for vertical writing. It is determined by classifying the directions of all acquired character strings and taking their ratio. A simple way to calculate this ratio is to count the horizontally written and vertically written character strings and compute the ratio of the counts. In this case, for example, the number of horizontally written lines is compared with the number of vertically written lines, and the direction with the larger count is determined (set) as the document first direction. The determination of the document first direction is not limited to this method, and various methods can be adopted.

ここで、算出した比率が所定閾値（例えば、０．７）以下である場合、文書には縦書きの行と横書きの行とが無視できない比率で混在しており文書全体の方向を一意に判別できないとして、行ブロック解析処理部３２及びレイアウト解析処理部３３での処理を行わず、変換可否判定処理部３４にて入力文書画像をリフロー型に変換しないと判定する。 Here, when the calculated ratio is equal to or less than a predetermined threshold (for example, 0.7), the vertical writing line and the horizontal writing line are mixed in a ratio that cannot be ignored in the document, and the direction of the entire document is uniquely determined. If it is not possible, the line block analysis processing unit 32 and the layout analysis processing unit 33 do not perform processing, and the conversion possibility determination processing unit 34 determines that the input document image is not converted to the reflow type.

さらに、上記の方法によって文書第１方向を決定すると、文書第１方向に直交する方向として文書第２方向を決定（設定）する。すなわち、文書第１方向が水平（横書き）の場合、文書第２方向は垂直、文書第１方向が垂直（縦書き）の場合、文書第２方向は水平となる。 Further, when the document first direction is determined by the above method, the document second direction is determined (set) as a direction orthogonal to the document first direction. That is, when the document first direction is horizontal (horizontal writing), the document second direction is vertical, and when the document first direction is vertical (vertical writing), the document second direction is horizontal.
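
The direction decision can be sketched in Python as follows. The sketch assumes that the ratio is the share of the majority direction among all character string lines, which the description does not state explicitly; the function name and the 'h'/'v' labels are likewise hypothetical.

```python
def decide_directions(line_directions, threshold=0.7):
    """Return (first_direction, second_direction), or None when mixed.

    line_directions: iterable of 'h' (horizontal) or 'v' (vertical), one entry
    per character string line. The ratio is assumed to be the share of the
    majority direction; a ratio at or below the threshold means the document
    is not converted to the reflow type.
    """
    directions = list(line_directions)
    h = directions.count("h")
    v = directions.count("v")
    if h + v == 0:
        return None
    ratio = max(h, v) / (h + v)
    if ratio <= threshold:
        return None  # directions are mixed; keep the fixed layout
    first = "horizontal" if h > v else "vertical"
    # The second direction is orthogonal to the first.
    second = "vertical" if first == "horizontal" else "horizontal"
    return first, second
```

With the example threshold of 0.7, a page with nine horizontal lines and one vertical line is treated as horizontally written, while a page split six to four is left in the fixed layout.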

＜４−６．記号（約物）の統合処理＞ <4-6. Integration processing of symbols (punctuation marks)>
個々の文字の切り出しにおいて、以下に示すような記号（約物）の統合処理を追加することができる。記号には、例えば行頭に来ることが禁止とされる（行頭禁則）ものや、行末に来ることが禁止とされる（行末禁則）ものがあり、図３に示すような文字がその一部として挙げられる。切り出した個々の文字を表示した際、行の折り返しによりこれらのルールが守られず可読性が低下する場合がある。そこで、行頭禁則の記号については、１つ前の文字と統合し、行末禁則の記号については、１つ後の文字と統合することで、単独で行頭もしくは行末に来ることがなくなる。 When individual characters are cut out, the following integration processing for symbols (punctuation marks) can be added. Some symbols are prohibited from appearing at the beginning of a line (line-start prohibition), and others are prohibited from appearing at the end of a line (line-end prohibition); the characters shown in FIG. 3 are some examples. When the cut-out characters are displayed, line wrapping may break these rules and reduce readability. Therefore, a symbol subject to line-start prohibition is integrated with the preceding character, and a symbol subject to line-end prohibition is integrated with the following character, so that the symbol never appears alone at the beginning or end of a line.

各文字が、前述したルールを持つ記号であるかの判定方法は公知の方法を使用することができる。例えば、ＯＣＲ処理を利用して文字種を照合してもよいし、文字の大きさや、文字を構成する画素の特徴から判別してもよい。例えば句読点の場合、図４のように横書きである場合に、行の下半分のみで構成され、また行の高さ（矢印で示された範囲）に比べて半分程度の幅を持つ場合、その文字が句読点である可能性が高いとして、１つ前の文字と統合してもよい。 A known method can be used as a method for determining whether each character is a symbol having the above-described rule. For example, the character type may be collated using OCR processing, or may be determined from the size of the character and the characteristics of the pixels constituting the character. For example, in the case of a punctuation mark, when it is horizontal writing as shown in FIG. 4, it is composed of only the lower half of the line and has a width about half the height of the line (the range indicated by the arrow). Assuming that the character is likely to be a punctuation mark, it may be integrated with the previous character.

図４では横書きの例を示したが、縦書きの場合も同様に統合を行うことができる。半角英小文字と区別するため、行を構成する他の文字の高さや幅の傾向から和文、英文の判定を加え、和文の場合のみ句読点と判定するようにする等の処理を追加してもよい。例えば、和文ではひらがな、カタカナ及び漢字等の全角文字が文章の大半を占めており、行の上半分もしくは下半分のみで構成される文字が少なくなる傾向がある。また、半角文字に比べて、全角文字では行の高さに対して文字の横幅が半分より大きい文字の種類が多い。従って、（１）行を上下に分割する水平方向の直線をまたぎ、（２）文字の横幅が行の高さに所定係数（例えば０．６）を乗算した値以上である、文字数をカウントし、行を構成する文字数に対して前記（１）及び（２）を満たす文字数の割合が所定閾値（例えば０．５）以上である場合に、その行が和文であるとして判定する処理を適用することができる。和文、英文の判定方法はこの方法に限らず、他の方法により判定してもよい。また、縦書きの文書である場合は自動的に和文とみなしてもよい。 FIG. 4 shows an example of horizontal writing, but integration can be performed in the same way for vertical writing. To distinguish punctuation marks from half-width lowercase Latin letters, processing may be added that determines whether a line is Japanese or English from the tendencies in the height and width of the other characters in the line and treats a character as a punctuation mark only when the line is Japanese. For example, in Japanese text, full-width characters such as hiragana, katakana, and kanji make up most of the text, so characters that occupy only the upper half or the lower half of a line tend to be few. In addition, compared with half-width characters, many full-width characters have a width greater than half the line height. Accordingly, the number of characters that (1) straddle the horizontal straight line dividing the line into upper and lower halves and (2) have a width equal to or greater than the line height multiplied by a predetermined coefficient (for example, 0.6) is counted, and the line is determined to be Japanese when the ratio of the number of characters satisfying both (1) and (2) to the number of characters in the line is equal to or greater than a predetermined threshold (for example, 0.5). The method of distinguishing Japanese from English is not limited to this, and other methods may be used. A vertically written document may automatically be regarded as Japanese.
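
The Japanese/English determination for a horizontally written line can be sketched in Python as follows. The representation of each character as a `(top, bottom, width)` tuple and the function name are assumptions for illustration; the coefficient 0.6 and the threshold 0.5 are the example values given in the description.

```python
def is_japanese_line(chars, line_top, line_bottom,
                     width_coeff=0.6, ratio_threshold=0.5):
    """Judge a horizontally written line as Japanese from character shapes.

    chars: list of (top, bottom, width) tuples, one per character rectangle
    (a hypothetical representation; Y increases downward).
    """
    if not chars:
        return False
    height = line_bottom - line_top
    midline = line_top + height / 2
    hits = 0
    for top, bottom, width in chars:
        straddles = top < midline < bottom       # condition (1)
        wide = width >= height * width_coeff     # condition (2)
        if straddles and wide:
            hits += 1
    return hits / len(chars) >= ratio_threshold
```

Full-width characters typically satisfy both conditions, so a line consisting mostly of kana and kanji exceeds the threshold, whereas narrow half-width letters fail condition (2) even when they cross the midline.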

ここで、句読点と、「ァ」等小さい和字との区別がつかない可能性もあるが、これらの小さい和字も行頭禁則であるため句読点と同様に統合しても問題無い。そのため、厳密に句読点専用の処理とする必要はない。 Here, there is a possibility that punctuation marks cannot be distinguished from small Japanese characters such as “a”. However, since these small Japanese characters are also prohibited from beginning of line, there is no problem even if they are integrated in the same manner as punctuation marks. Therefore, it is not necessary to strictly process punctuation.

＜４−７．処理例＞ <4-7. Processing example>
行解析処理部３１が実行する処理の具体例として、図５に示す構造の文書画像（１ページ）に対して行解析処理を適用する場合について説明する。行解析処理部３１は、図６に示すように、その行の要素を全て含んだ最小サイズの外接矩形の範囲を各行の領域として分離し、それぞれの行に、行ＩＤを、外接矩形の左上の垂直座標（Ｙ座標）位置の順で割り当てる。図６に示す文書画像では、行ＩＤが１０５の行が図表の行であることを除いては、残りの行はいずれも横書きの文字列の行であり、縦書きの文字列の行は１つも含まれていない。そのため、行解析処理部３１は、この文書画像における文書第１方向は水平方向であると決定する。 As a specific example of the processing executed by the line analysis processing unit 31, the case where line analysis processing is applied to a document image (one page) having the structure shown in FIG. 5 is described. As shown in FIG. 6, the line analysis processing unit 31 separates, as the region of each line, the minimum circumscribed rectangle containing all elements of that line, and assigns a line ID to each line in order of the vertical coordinate (Y coordinate) of the upper-left corner of its circumscribed rectangle. In the document image shown in FIG. 6, the line with line ID 105 is a chart line, all the remaining lines are horizontally written character string lines, and no vertically written character string line is included. The line analysis processing unit 31 therefore determines that the document first direction of this document image is horizontal.

［５．行ブロック解析処理部］ [5. Line block analysis processing unit]
＜５−１．行ブロック解析処理部の処理＞ <5-1. Processing of the line block analysis processing unit>
行ブロック解析処理部３２は、行解析処理部３１で分類された文字列行を、少なくとも１つ以上の文字列行から成る文字列の行ブロックに統合し、重複しない行ブロックＩＤを持つ新規行ブロックとして記憶部５に記憶（登録）する。行ブロック解析処理部３２は、図表行については、単一行で１つの行ブロックを構成するものとし、それぞれ重複しない行ブロックＩＤを持つ新規行ブロックとして登録する。 The line block analysis processing unit 32 integrates the character string lines classified by the line analysis processing unit 31 into character string line blocks, each consisting of one or more character string lines, and stores (registers) each in the storage unit 5 as a new line block having a unique line block ID. For chart lines, the line block analysis processing unit 32 treats each single line as one line block and registers it as a new line block having its own unique line block ID.

行ブロック解析処理部３２による文字列の行ブロック統合処理について以下で詳細に説明する。初めに、行解析処理部３１で分類された行のうち文字列行のグループから、注目行Ｌ１を選択する。続いて、注目行Ｌ１に関して、前方及び後方（定義は、後述の＜５−５＞章を参照）に連続する文字列行を探索する。具体的には、文字列行のグループの、注目行Ｌ１とは異なる文字列行から、注目行Ｌ１の前方もしくは後方に連続する文字列行を最大１つずつ選択する。注目行Ｌ１の連続行の候補となる文字列行は注目行Ｌ１を除く全ての文字列行であり、連続行の候補となる条件については後述する。 The character string row block integration processing by the row block analysis processing unit 32 will be described in detail below. First, the target line L1 is selected from the group of character string lines among the lines classified by the line analysis processing unit 31. Subsequently, with respect to the target line L1, a character string line that continues forward and backward (for details, see <5-5> section below) is searched. Specifically, at most one character string row that is continuous forward or backward of the target line L1 is selected from a character string line that is different from the target line L1 in the group of character string lines. The character string rows that are candidates for the continuous line of the target line L1 are all the character string lines except the target line L1, and the conditions that are candidates for the continuous line will be described later.

全ての文字列行について、前方及び後方に連続する文字列行を選択すると、連続する文字列行同士の繋がりから、前方及び後方の両方において連続する文字列行がなくなるまで１つの行ブロックとして分類、統合し、未割り当ての行ブロックＩＤを持つ新規行ブロックとして記憶部５に登録する。全ての文字列行がいずれかの行ブロックに登録されるまで処理を繰り返し、全ての文字列行についての登録が完了すると、行ブロック解析処理部３２は処理を終了する。 For all the character string lines, if a character string line that is continuous forward and backward is selected, it is classified as one line block from the connection of consecutive character string lines until there is no continuous character string line in both the forward and backward directions. Then, they are integrated and registered in the storage unit 5 as a new row block having an unassigned row block ID. The process is repeated until all the character string lines are registered in any one of the line blocks. When the registration for all the character string lines is completed, the line block analysis processing unit 32 ends the process.
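
One way to realize the grouping described above is to treat each pair of continuous lines as a link and collect the connected lines into one block. The following Python sketch uses a union-find structure for this purpose; this is an illustrative technique chosen here, not necessarily the procedure of the embodiment, and the function name and data layout are hypothetical.

```python
def build_line_blocks(line_ids, links):
    """Group character string lines into line blocks.

    line_ids: list of line IDs; links: iterable of (a, b) pairs meaning that
    lines a and b were selected as continuous with each other.
    Returns {block_id: sorted list of line IDs}, block IDs starting at 1.
    """
    parent = {i: i for i in line_ids}

    def find(i):
        # Follow parent pointers to the representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for i in line_ids:
        groups.setdefault(find(i), []).append(i)

    # Number the blocks in order of their smallest line ID.
    ordered = sorted(groups.values(), key=min)
    return {block_id: sorted(members)
            for block_id, members in enumerate(ordered, start=1)}
```

A line with no continuous neighbour in either direction forms a block by itself, which matches the requirement that every character string line ends up registered in exactly one block.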

＜５−２．連続行の候補の判定＞ <5-2. Determination of continuous line candidates>
ここで、注目行Ｌ１とは別に選択された文字列行Ｌ２が、注目行Ｌ１の連続行の候補であるか否かを判定する方法について説明する。行Ｌ２が行Ｌ１の連続行の候補である条件として、少なくとも下記２つの条件を満たすものとする。 Here, a method is described for determining whether a character string line L2, selected separately from the target line L1, is a candidate for a line continuous with the target line L1. For the line L2 to be a candidate for a line continuous with the line L1, at least the following two conditions must be satisfied.
条件１：一方の行の先頭から末尾までの範囲において、もう一方の行の先頭もしくは末尾のうち少なくとも一方が存在する。 Condition 1: Within the range from the beginning to the end of one line, at least one of the beginning and the end of the other line is present.
条件２：２つの行の行間変位量ｌｉｎｅｓｐａｃｅ（Ｌ１，Ｌ２）が下記の式（５）を満たす。 Condition 2: The inter-line displacement linespace(L1, L2) of the two lines satisfies the following expression (5).
ＴＨ＿ＭＩＮ＿ＬＳ≦ｌｉｎｅｓｐａｃｅ（Ｌ１，Ｌ２）≦ＴＨ＿ＭＡＸ＿ＬＳ・・・（５） TH_MIN_LS ≤ linespace(L1, L2) ≤ TH_MAX_LS ... (5)
（ＴＨ＿ＭＩＮ＿ＬＳ、ＴＨ＿ＭＡＸ＿ＬＳは、予め設定される閾値） (TH_MIN_LS and TH_MAX_LS are preset thresholds.)
なお、条件１は、異なる段に属する行を連続行の候補として判定しないために用いる。条件２は、行間が広過ぎる又は狭過ぎる行を連続行の候補として判定しないために用いる。 Condition 1 is used so that lines belonging to different columns are not determined to be continuous line candidates. Condition 2 is used so that lines whose spacing is too wide or too narrow are not determined to be continuous line candidates.

図７の（ａ）は、２段組の横書きの文書の例、図７の（ｂ）は、２段組の縦書きの文書の例である。条件１を満たすために、行Ｌ１と行Ｌ２とは、文書第１方向で一部もしくは全部が重複している必要がある。例えば、図７の例の場合、行ａと行ｂ、行ｃと行ｄ、行ｅと行ｆ、行ｇと行ｈは、条件１を満たすため、これらの組合せは互いに連続行の候補となる。しかし、行ａと行ｄ、行ｅと行ｈ等の組合せでは条件１を満たさないため、これらの組合せは互いに連続行の候補とならない。 (a) of FIG. 7 shows an example of a horizontally written two-column document, and (b) of FIG. 7 shows an example of a vertically written two-column document. To satisfy Condition 1, the lines L1 and L2 must overlap partly or entirely in the document first direction. In the example of FIG. 7, the pairs of lines a and b, lines c and d, lines e and f, and lines g and h satisfy Condition 1, so the lines in each pair are continuous line candidates for each other. In contrast, pairs such as lines a and d or lines e and h do not satisfy Condition 1, so they are not continuous line candidates for each other.
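
Condition 1 can be sketched in Python by representing each line as its extent along the document first direction. The `(start, end)` tuple representation and the function name are assumptions for illustration.

```python
def satisfies_condition1(a, b):
    """Check Condition 1 for two lines.

    a, b: (start, end) extents of each line along the document first
    direction, with start <= end (a hypothetical representation).
    True when the start or the end of one line falls inside the other.
    """
    def contains_endpoint(outer, inner):
        return any(outer[0] <= p <= outer[1] for p in inner)

    return contains_endpoint(a, b) or contains_endpoint(b, a)
```

For lines in the same column, such as an indented first line and the full-width line below it, the extents overlap and the check succeeds; for lines in different columns the extents are disjoint and the check fails.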

条件２で示す行Ｌ１と行Ｌ２との行間変位量ｌｉｎｅｓｐａｃｅ（Ｌ１，Ｌ２）は、図８に示すように、横書きであれば行Ｌ１と行Ｌ２とのうち下側にある方の行の上端座標と、もう一方の行の下端座標との差分値（図８の（ａ）〜（ｃ）参照）、縦書きであれば行Ｌ１と行Ｌ２とのうち左にある方の行の右端座標ともう一方の行の左端座標との差分値（図８の（ｄ）〜（ｆ）参照）である。行Ｌ１と行Ｌ２とが重複しないとき、行間変位量ｌｉｎｅｓｐａｃｅ（Ｌ１，Ｌ２）は２つの行の行間距離を示す。また、ＴＨ＿ＭＩＮ＿ＬＳ及びＴＨ＿ＭＡＸ＿ＬＳは、連続行同士の行間変位量として許容される差分値の最小値及び最大値を示す所定係数である。例えば、行Ｌ１の文字サイズに所定係数ｒ１（例えばｒ１＝０．１）を乗算したものを閾値ＴＨ＿ＭＩＮ＿ＬＳと設定し、所定係数ｒ２（ｒ２は正の数とする、例えばｒ２＝１．５）を乗算したものを閾値ＴＨ＿ＭＡＸ＿ＬＳとして設定する。 As shown in FIG. 8, the interline displacement amount linespace (L1, L2) between the rows L1 and L2 shown in the condition 2 is the upper end of the lower row of the rows L1 and L2 in the case of horizontal writing. The difference value between the coordinates and the lower end coordinates of the other line (see (a) to (c) of FIG. 8). In the case of vertical writing, the right end coordinates of the left line of the lines L1 and L2 And the difference value between the left end coordinates of the other line (see (d) to (f) of FIG. 8). When the line L1 and the line L2 do not overlap, the interline displacement amount linespace (L1, L2) indicates the interline distance between the two lines. Moreover, TH_MIN_LS and TH_MAX_LS are predetermined coefficients indicating the minimum value and the maximum value of the difference value allowed as the inter-row displacement amount between consecutive rows. For example, a value obtained by multiplying the character size of the line L1 by a predetermined coefficient r1 (for example, r1 = 0.1) is set as a threshold TH_MIN_LS, and the predetermined coefficient r2 (r2 is a positive number, for example, r2 = 1.5) is set. The multiplied value is set as the threshold value TH_MAX_LS.

閾値ＴＨ＿ＭＩＮ＿ＬＳ及び閾値ＴＨ＿ＭＡＸ＿ＬＳは、他の方法により設定されてもよく、例えば行Ｌ１と行Ｌ２の文字サイズの平均値に所定係数を乗算したものとしてもよい。また閾値ＴＨ＿ＭＩＮ＿ＬＳを正値に設定することで、重複のある２つの行同士を連続行として認めないようにすることができる。逆に閾値ＴＨ＿ＭＩＮ＿ＬＳを負値に設定することで、図８の（ｃ）及び（ｆ）のように、行Ｌ１と行Ｌ２とが多少重複する場合も許容することができる。 The threshold value TH_MIN_LS and the threshold value TH_MAX_LS may be set by other methods. For example, the average value of the character sizes of the lines L1 and L2 may be multiplied by a predetermined coefficient. Moreover, by setting the threshold value TH_MIN_LS to a positive value, it is possible to prevent two overlapping rows from being recognized as continuous rows. Conversely, by setting the threshold value TH_MIN_LS to a negative value, it is also possible to allow a case where the row L1 and the row L2 slightly overlap as shown in (c) and (f) of FIG.
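The inter-row displacement and its thresholds can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: it assumes horizontal writing, image coordinates in which y grows downward, and that condition 2 is the inclusive range check TH_MIN_LS ≤ linespace(L1, L2) ≤ TH_MAX_LS. The `Row` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    top: float        # y coordinate of the upper edge (y grows downward)
    bottom: float     # y coordinate of the lower edge
    char_size: float  # representative character size of the row


def linespace(a: Row, b: Row) -> float:
    """Top of the lower row minus bottom of the other row (horizontal writing).

    The value is negative when the two rows overlap vertically.
    """
    upper, lower = (a, b) if a.top <= b.top else (b, a)
    return lower.top - upper.bottom


def satisfies_condition2(l1: Row, l2: Row, r1: float = 0.1, r2: float = 1.5) -> bool:
    """Assumed form of condition 2, with thresholds scaled by the character size of L1."""
    th_min_ls = r1 * l1.char_size
    th_max_ls = r2 * l1.char_size
    return th_min_ls <= linespace(l1, l2) <= th_max_ls
```

With a positive `r1`, overlapping rows (negative `linespace`) are rejected; passing a negative `r1` tolerates a small overlap, as in FIG. 8(c) and (f).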

＜５−３．条件の強化：インデントの範囲指定＞
また、連続行の候補を判定する条件を強化するために、上記条件１，２に加えて、別の条件を設定してもよい。例えば、次式（６）を満たすことを条件として追加することができる。
ｉｎｄｅｎｔ（Ｌ１,Ｌ２）≦ＴＨ＿ＩＮＤＥＮＴ・・・（６）
ここで、ｉｎｄｅｎｔ（Ｌ１，Ｌ２）は行Ｌ１の開始位置の文書第１方向成分と行Ｌ２の開始位置の文書第１方向成分の差の大きさであり、すなわちインデントの大きさを意味する。また、閾値ＴＨ＿ＩＮＤＥＮＴは行の先頭のインデントとして許容される距離を示す所定係数である。閾値ＴＨ＿ＩＮＤＥＮＴは、例えば行Ｌ１の文字サイズに所定係数α（αは正の数とする、例えばα＝１．５）を乗算した値を与え、α文字以内のインデントを許容することができる。閾値ＴＨ＿ＩＮＤＥＮＴは他の方法により設定してもよく、例えば行Ｌ１と行Ｌ２との文字サイズの平均値に所定係数αを乗算したものとしてもよい。 <5-3. Strengthening of conditions: Specifying the indent range>
In addition to the above conditions 1 and 2, another condition may be set in order to reinforce the condition for determining candidates for continuous lines. For example, it can be added as a condition that the following expression (6) is satisfied.
indent (L1, L2) ≦ TH_INDENT (6)
Here, indent(L1, L2) is the magnitude of the difference between the document-first-direction component of the start position of row L1 and that of row L2, that is, the size of the indentation. The threshold TH_INDENT is a predetermined value indicating the distance allowed as an indent at the beginning of a row. TH_INDENT is given, for example, by multiplying the character size of row L1 by a predetermined coefficient α (α is a positive number, for example α = 1.5), which allows indentation of up to α characters. TH_INDENT may also be set by other methods; for example, the average of the character sizes of rows L1 and L2 may be multiplied by the coefficient α.

＜５−４．条件の強化：行終了位置の差異の許容範囲指定＞
連続行の候補を判定する条件を強化する他の条件として、例えば次式（７）を満たすことを条件として追加することで、行の終了位置がある程度近い行同士を連続行の候補とすることができる。
｜Ｌ１ＭＡＸ１−Ｌ２ＭＡＸ１｜≦ＴＨ＿ＤＩＦＦ＿ＥＮＤＰＯＳ・・・（７）
ここで、Ｌ１ＭＡＸ１は行Ｌ１の文書第１方向成分の最大値、Ｌ２ＭＡＸ１は行Ｌ２の文書第１方向成分の最大値である。例えば、文書第１方向が水平方向（横書き）である場合、Ｌ１ＭＡＸ１及びＬ２ＭＡＸ１は、行Ｌ１及び行Ｌ２の右端のＸ座標を指す。また、閾値ＴＨ＿ＤＩＦＦ＿ＥＮＤＰＯＳは行の終了位置の差として許容される距離を示す所定係数である。例えば行Ｌ１の文字サイズの平均値に所定係数β（βは正の数とする、例えばβ＝０．５）を乗算したものを閾値ＴＨ＿ＤＩＦＦ＿ＥＮＤＰＯＳとすることで、β文字以内のインデントを許容することになる。 <5-4. Strengthening of conditions: Specifying tolerance range for line end position differences>
As another way to strengthen the condition for determining continuous-row candidates, expression (7) below can be added as a condition, so that only rows whose end positions are reasonably close to each other become continuous-row candidates.
| L1MAX1-L2MAX1 | ≦ TH_DIFF_ENDPOS (7)
Here, L1MAX1 is the maximum value of the document-first-direction component of row L1, and L2MAX1 is that of row L2. For example, when the document first direction is horizontal (horizontal writing), L1MAX1 and L2MAX1 are the X coordinates of the right ends of rows L1 and L2. The threshold TH_DIFF_ENDPOS is a predetermined value indicating the distance allowed as a difference between row end positions. For example, setting TH_DIFF_ENDPOS to the average character size of row L1 multiplied by a predetermined coefficient β (β is a positive number, for example β = 0.5) allows an offset (indentation) of up to β characters.
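The two optional conditions, expressions (6) and (7), can be sketched as simple predicates. This is an illustrative sketch only: rows are represented as dictionaries with hypothetical keys `start`, `end` (positions along the document first direction) and `char_size`, and the thresholds are derived from the character size of L1 as in the examples in the text.

```python
def indent(start1: float, start2: float) -> float:
    """Magnitude of the difference between two row start positions."""
    return abs(start1 - start2)


def satisfies_expr6(l1: dict, l2: dict, alpha: float = 1.5) -> bool:
    """Expression (6): indent(L1, L2) <= TH_INDENT, with TH_INDENT = alpha * char size of L1."""
    th_indent = alpha * l1["char_size"]
    return indent(l1["start"], l2["start"]) <= th_indent


def satisfies_expr7(l1: dict, l2: dict, beta: float = 0.5) -> bool:
    """Expression (7): |L1MAX1 - L2MAX1| <= TH_DIFF_ENDPOS, with the threshold = beta * char size of L1."""
    th_diff_endpos = beta * l1["char_size"]
    return abs(l1["end"] - l2["end"]) <= th_diff_endpos
```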

＜５−５．前方又は後方の連続行の選択＞
行Ｌ１の連続行の候補として抽出された行から、行Ｌ１の前方で最も近い位置にある行、及び、後方で最も近い位置にある行を、それぞれ最大１つずつ選択する。なお、文書第１方向が水平方向（横書き）である場合、行Ｌ１より上にある行を前方の行、行Ｌ１より下にある行を後方の行とし、文書第１方向が垂直方向（縦書き）である場合、行Ｌ１より右にある行を前方の行、行Ｌ１より左にある行を後方の行とする。また、行の近さを表わす値として、例えば、前述の行間変位量ｌｉｎｅｓｐａｃｅ（Ｌ１，Ｌ２）を使用し、ｌｉｎｅｓｐａｃｅ（Ｌ１，Ｌ２）が小さい程、行が近いとみなすことができる。なお前方、後方とも、連続行は最大で１つずつであり、必ずしも連続行が存在する必要はない。 <5-5. Select forward or backward continuous lines>
From the rows extracted as continuous-row candidates for row L1, the row nearest to L1 on the front side and the row nearest to L1 on the rear side are selected, at most one each. When the document first direction is horizontal (horizontal writing), rows above L1 are front rows and rows below L1 are rear rows; when the document first direction is vertical (vertical writing), rows to the right of L1 are front rows and rows to the left of L1 are rear rows. As the measure of closeness, for example, the inter-row displacement linespace(L1, L2) described above can be used: the smaller linespace(L1, L2) is, the closer the rows are considered to be. At most one continuous row is selected on each side, and a continuous row need not exist on either side.
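The selection of the nearest front and rear rows can be sketched as follows. The sketch assumes horizontal writing with y growing downward, candidate rows that have already passed the candidate conditions, and hypothetical dictionary keys `top` and `bottom`; it uses linespace as the closeness measure, which the text gives as one example.

```python
def linespace(a: dict, b: dict) -> float:
    """Top of the lower row minus bottom of the other row (horizontal writing)."""
    upper, lower = (a, b) if a["top"] <= b["top"] else (b, a)
    return lower["top"] - upper["bottom"]


def select_neighbours(target: dict, candidates: list) -> tuple:
    """Return (front, rear): the closest candidate above and below the target row.

    Either element is None when no candidate exists on that side.
    """
    front, rear = None, None
    for cand in candidates:
        gap = linespace(target, cand)
        if cand["top"] < target["top"]:      # above the target: front side
            if front is None or gap < linespace(target, front):
                front = cand
        elif cand["top"] > target["top"]:    # below the target: rear side
            if rear is None or gap < linespace(target, rear):
                rear = cand
    return front, rear
```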

＜５−６．行ブロックへの分類及び統合＞
行ブロック解析処理部３２は、以上のようにして、全ての文字列行について前方及び後方の連続行を選択すると、行ブロックへの分類、統合を行う。但し、複数の行から連続行として選択されるケースもあり得るため、相互に連続行であるとされていない行のペアについては、その間の連続関係を事前に解消しておく。例えば、文書画像が図９の（ａ）である場合、前方の連続行として行Ｌ３を選択する行は、行Ｌ４と行Ｌ５との２つ存在するが、行Ｌ５は行Ｌ３の後方の連続行として選択されていない。そのため、行Ｌ３と行Ｌ５との間の連続関係は解消される。同様にして行Ｌ４と行Ｌ６との間の連続関係も解消される。このことにより、図９の（ａ）に示すような例では、行Ｌ３及び行Ｌ４において注目行Ｌ１からの連続関係が断たれるため、注目行Ｌ１と同一の行ブロックとして分類できなくなるケースも起こり得る。しかし、図９の（ｂ）に示すように複数の行ブロックとして分類することができ、後段のレイアウト解析処理部３３における段組解析処理部３７で、同一の段組、及びその段組を構成する同一の段（カラム）として統合できるため、この時点でブロックが分かれてしまっても、問題とはならない。 <5-6. Classification and integration into row blocks>
When the row block analysis processing unit 32 has selected the front and rear continuous rows for every character string row as described above, it classifies and merges the rows into row blocks. Because a row may be selected as the continuous row of more than one row, the continuity between any pair of rows that have not selected each other is cancelled beforehand. For example, for the document image in FIG. 9(a), two rows, L4 and L5, select row L3 as their front continuous row, but row L5 is not selected as the rear continuous row of row L3. The continuity between rows L3 and L5 is therefore cancelled. The continuity between rows L4 and L6 is cancelled in the same way. As a result, in an example such as FIG. 9(a), the chain of continuity from the target row L1 is broken at rows L3 and L4, and some rows can no longer be classified into the same row block as L1. However, the rows can still be classified into a plurality of row blocks as shown in FIG. 9(b), and the column analysis processing unit 37 of the subsequent layout analysis processing unit 33 can merge them into the same column set and the same column within it, so the split at this point is not a problem.

行ブロックへの分類及び統合処理は、次のように行う。まず。行ブロックとして分類されていない文字列行のうち任意の行Ｌ１（注目行Ｌ１）について、まず、行Ｌ１を新規の行ブロックとして設定する。続いて、行Ｌ１から前後の連続行をたどり、行ブロックの範囲を拡大する。前方及び後方とも、連続行が無くなると、行ブロックの拡大を終了し、その行ブロックに含まれる先頭の行から順に行ＩＤを取得する。また、行ブロックの情報として、行ブロックに含まれる全ての行に外接する最小矩形の左上座標、幅及び高さ、並びに含まれる行数を取得する。以上のようにして得られた行の順序と各種情報とを持つ行ブロックを、既に登録済みの行ブロックと重複しないＩＤを持つ新規の行ブロックとして登録を行い、またその行ブロックに含まれる各行の所属行ブロックＩＤを更新する。行ブロック解析処理部３２は、このようにして行われる行ブロックへの分類及び統合処理を、全ての文字列行がいずれかの行ブロックに分類されるまで繰り返す。 The classification and integration processing into row blocks is performed as follows. First. Regarding an arbitrary line L1 (target line L1) among character string lines not classified as a line block, first, the line L1 is set as a new line block. Subsequently, the continuous line before and after the line L1 is traced to expand the range of the line block. When there are no consecutive rows in both the front and rear, the expansion of the row block is terminated, and row IDs are acquired in order from the first row included in the row block. Further, as the row block information, the upper left coordinates, the width and the height of the smallest rectangle circumscribing all the rows included in the row block, and the number of included rows are acquired. The row block having the row order and various information obtained as described above is registered as a new row block having an ID that does not overlap with an already registered row block, and each row included in the row block is registered. Update the belonging row block ID. The row block analysis processing unit 32 repeats the classification and integration processing into the row blocks performed in this way until all the character string rows are classified into any row block.
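The cancellation of one-sided links and the growth of row blocks along the remaining links can be sketched as follows. This is an illustrative sketch, not the embodiment's implementation: `front` and `rear` are hypothetical dictionaries mapping every row ID to the ID of its selected front or rear continuous row (or `None`), both dictionaries are assumed to share the same keys, and the links are assumed to contain no cycles. Bounding rectangles of the blocks are omitted.

```python
def build_row_blocks(front: dict, rear: dict) -> tuple:
    """Group rows into row blocks by following mutually agreed links.

    Returns (blocks, block_of): blocks is a list of row-ID lists in reading
    order, block_of maps each row ID to its 1-based row block ID.
    """
    # Keep a link only when both rows selected each other.
    f = {r: p if p is not None and rear.get(p) == r else None for r, p in front.items()}
    b = {r: n if n is not None and front.get(n) == r else None for r, n in rear.items()}

    blocks, block_of = [], {}
    for row in front:
        if row in block_of:
            continue                      # already classified into a block
        head = row
        while f[head] is not None:        # walk forward to the first row of the block
            head = f[head]
        members, cur = [], head
        while cur is not None:            # walk backward collecting rows in order
            members.append(cur)
            cur = b[cur]
        block_id = len(blocks) + 1        # new ID that no registered block uses
        for m in members:
            block_of[m] = block_id
        blocks.append(members)
    return blocks, block_of
```

In the usage below, row 5 selects row 3 as its front row, but row 3 selects row 4 as its rear row, so the link between rows 3 and 5 is dropped and two blocks result.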

＜５−７．同一の行ブロックに分類できる（連続行の候補とできる）行の条件＞
なお、文書第１方向の文字列行は文書第１方向の文字列行とのみ、文書第２方向の文字列行は文書第２方向の文字列行とのみ、行ブロックを構成する。すなわち、１つの行ブロックに、文書第１方向の文字列行と文書第２方向の文字列行とが混在することは無い。従って、注目行Ｌ１の連続行の候補を探索する際、注目行Ｌ１の文字列方向と異なる方向の文字列行は連続行の候補としない。 <5-7. Conditions for rows that can be classified into the same row block (can be candidates for continuous rows)>
A character string row in the document first direction forms a row block only with other character string rows in the document first direction, and a character string row in the document second direction forms a row block only with other character string rows in the document second direction. That is, character string rows of the two directions are never mixed in one row block. Therefore, when searching for continuous-row candidates for the target row L1, character string rows whose direction differs from that of L1 are not treated as candidates.

＜５−８．処理例＞
行ブロック解析処理部３２が実行する行ブロック解析処理を、具体例を用いて説明する。既に示した図６のように文書画像から検出された複数の行に対して行ブロック解析処理を適用すると、文書画像は、図１０のように行ブロックとして分類される。図１０に示す例では、行ブロックＢ３は、章の見出しの行であり、行ブロックＢ４に比べて文字が大きい。このように文字のサイズが大きく異なる２つの行同士を連続行の候補として選択しないような、連続行の候補の判定の条件を追加することも有効な手段である。 <5-8. Processing example>
The row block analysis processing executed by the row block analysis processing unit 32 will be described using a specific example. When the row block analysis processing is applied to the rows detected from the document image as shown in FIG. 6, the document image is classified into row blocks as shown in FIG. 10. In the example of FIG. 10, row block B3 is a chapter heading row whose characters are larger than those of row block B4. Adding a candidate condition that prevents two rows with greatly different character sizes from being selected as continuous-row candidates is also an effective measure.

［６．レイアウト解析処理部］
＜６−１．レイアウト解析処理部の構成＞
図１１は、レイアウト解析処理部３３の詳細構成を示すブロック図である。レイアウト解析処理部３３は、前段の行ブロック解析処理部３２で分類された行ブロック同士の上下左右の位置関係から、行ブロック構成を解析し、文書画像中の文章（本文）の読み順を推定する処理を行うものである。レイアウト解析処理部３３は、段組解析処理部３７、行順序付け処理部３８、段落解析処理部３９を備えて構成される。 [6. Layout analysis processing unit]
<6-1. Configuration of layout analysis processing unit>
FIG. 11 is a block diagram showing the detailed configuration of the layout analysis processing unit 33. The layout analysis processing unit 33 analyzes the row block structure from the vertical and horizontal positional relationships between the row blocks classified by the preceding row block analysis processing unit 32, and estimates the reading order of the text (body) in the document image. The layout analysis processing unit 33 includes a column analysis processing unit 37, a row ordering processing unit 38, and a paragraph analysis processing unit 39.

＜６−２．段組解析処理部＞
段組解析処理部３７は、複数の行ブロックの上下及び左右の位置関係から、段組及び段組を構成する各段（カラム）を分類する段組解析処理を実行する。文書は文書第２方向に段組が配置され、各段組構成内で文書第１方向にカラムが配置されているものとして、ページ内の行ブロックの集合を、適切に境界線を設定して行ブロックをまたぐことなく分割して初期段組とする。そして、同一の初期段組に含まれる行ブロックの集合を、適切に境界線を設定して行ブロックをまたぐことなく分割して、該初期段組を構成する初期カラムとする。 <6-2. Multi-column analysis processing section>
The column analysis processing unit 37 executes column analysis processing, which classifies column sets and the columns that make up each column set from the vertical and horizontal positional relationships of the row blocks. Assuming that column sets are arranged along the document second direction and that, within each column set, columns are arranged along the document first direction, the set of row blocks in a page is divided into initial column sets by setting boundary lines appropriately so that no boundary crosses a row block. The row blocks contained in each initial column set are then divided in the same way, with boundary lines that cross no row block, into the initial columns of that column set.

境界線の設定方法は特に指定はなく、最も簡単な例として、初期段組の分類には文書第２方向と平行な直線を使用し、初期カラムの分類には文書第１方向と平行な直線を使用することが挙げられる。例えば、図１２の（ａ）のように横書きの文書画像から行ブロックの構造が解析された場合、図１２の（ｂ）のように行ブロックを初期段組に分類され、さらに初期段組は図１２の（ｃ）のようにそれぞれ初期カラムとして分類される。なお、図１２の（ｂ）及び（ｃ）では、段組間の境界線は実線で、カラム間の境界線は一点鎖線で示されている。 The method of setting the boundary lines is not particularly limited. As the simplest example, straight lines parallel to the document second direction are used to classify the initial column sets, and straight lines parallel to the document first direction are used to classify the initial columns. For example, when the row block structure has been analyzed from a horizontally written document image as in FIG. 12(a), the row blocks are classified into initial column sets as in FIG. 12(b), and each initial column set is further classified into initial columns as in FIG. 12(c). In FIGS. 12(b) and 12(c), boundaries between column sets are drawn as solid lines and boundaries between columns as dash-dot lines.

＜６−３．段組の分割禁止（同一段組として許容される行ブロック間距離の算出）＞
本来は同一段組であるが、偶然、行ブロックを分割することができるために複数の段組に分かれてしまうようなケースもまれに存在する。こうしたケースに対応するため、例えば連続する２つの行ブロック間の距離を算出し、その距離が所定値（例えば行ブロックの平均行間距離の２倍）以下の２つのブロック間には境界線を引くことを禁止する条件を追加することができる。 <6-3. Prohibition of column division (calculation of distance between row blocks allowed for the same column)>
In rare cases, row blocks that really belong to one column set are split into several column sets simply because a boundary happens to fit between them. To handle such cases, a condition can be added: for example, the distance between two consecutive row blocks is calculated, and drawing a boundary line between two blocks is prohibited when that distance is equal to or less than a predetermined value (for example, twice the average inter-row distance of the row block).

図１３は、行ブロックＢ１０，Ｂ１１，Ｂ１２を左側のカラム、行ブロックＢ２０，Ｂ２１を右側のカラムとした２段組構成の例を示す。行ブロックＢ１０と行ブロックＢ１１との間、行ブロックＢ２０と行ブロックＢ２１との間が空いているため、行ブロックＢ１０と行ブロックＢ２０から成る２段組構成、及び、行ブロックＢ１１とＢ１２と行ブロックＢ２１とから成る２段組構成として分割してしまう恐れもある。しかし、行ブロックＢ２０の平均行間距離（２０）に対して、行ブロックＢ２０と行ブロックＢ２１とのブロック間距離（３０）が所定値（２０×２＝４０）以下であるとして、行ブロックＢ２０と行ブロックＢ２１との間に境界線を引くことを禁止することで、これらの行ブロックが２つの異なる段組に分かれることを防ぐことができる。 FIG. 13 shows an example of a two-column layout in which row blocks B10, B11, and B12 form the left column and row blocks B20 and B21 form the right column. Because there are gaps between row blocks B10 and B11 and between row blocks B20 and B21, the page could be split into one two-column set consisting of row blocks B10 and B20 and another consisting of row blocks B11, B12, and B21. However, the inter-block distance between row blocks B20 and B21 (30) is no more than the predetermined value (20 × 2 = 40) derived from the average inter-row distance of row block B20 (20), so drawing a boundary line between row blocks B20 and B21 is prohibited, which prevents these row blocks from being separated into two different column sets.
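The split-prohibition check in the FIG. 13 example reduces to a single comparison, sketched below. The function name and the `factor` parameter are hypothetical; the factor of 2 is the example value given in the text.

```python
def may_draw_boundary(block_gap: float, avg_row_gap: float, factor: float = 2.0) -> bool:
    """Return True when a column-set boundary may be drawn between two row blocks.

    A boundary is prohibited when the gap between the blocks is at most
    factor times the average inter-row distance of the row block.
    """
    return block_gap > factor * avg_row_gap
```

For the values in FIG. 13, `may_draw_boundary(30, 20)` is `False`, so row blocks B20 and B21 stay in the same column set.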

＜６−４．位置関係以外の情報の活用＞
また、行ブロックの位置関係に加えて、行ブロックが持つ各種情報を利用して、段組及びカラムの分類（すなわち境界線の設定）を行うことができる。行ブロックが持つ各種情報の例として、行の長さや主要な文字のサイズ等が挙げられる。隣り合う行ブロック同士でこれらの情報が大きく異なる場合は、同一の段組に分類することを避けるようにすることができる。逆に位置が大きく離れた行ブロック同士でも、例えば同じカラム境界線を共有することができ、かつ類似する情報を持つ場合、同一段組として分類してもよい。 <6-4. Utilization of information other than positional relationships>
In addition to the positional relationships of the row blocks, various information held by the row blocks can be used to classify column sets and columns (that is, to set the boundary lines). Examples include the row length and the dominant character size. When this information differs greatly between adjacent row blocks, classifying them into the same column set can be avoided. Conversely, row blocks that are far apart may be classified into the same column set if, for example, they can share the same column boundary lines and have similar information.

＜６−５．行ブロックが０個もしくは１個しかない場合＞
なお、段組解析処理部３７に入力されたページ画像が、ただ１つの行ブロックを持つ場合、そのページ画像は１段構成の文書であるとして、境界線の設定は行わない。また、該ページ画像が、１つも行ブロックを持たない場合（すなわち白紙ページの場合）も境界線の設定は行わない。 <6-5. When there are only 0 or 1 row block>
When the page image input to the column analysis processing unit 37 has only one row block, the page image is treated as a single-column document and no boundary line is set. No boundary line is set either when the page image has no row block at all (that is, for a blank page).

また、文書画像に文書第１方向の文字列行と文書第２方向の文字列行とが混在する場合は、文書第２方向の文字列行の行ブロックを図表行の行ブロックに置き換える。このことにより、文書第１方向に記述された文章の最中に、文書第２方向に記述された文章が混じることを防ぐことができる。 Further, when the document image includes character string rows in the first direction of the document and character string rows in the second direction of the document, the row block of the character string rows in the second direction of the document is replaced with the row block of the chart row. Thus, it is possible to prevent a sentence described in the second direction of the document from being mixed with a sentence described in the first direction of the document.

＜６−６．処理例＞
段組解析処理部３７が実行する処理の具体例として、例えば既に示した図１０に示す文書画像から検出された複数の行ブロックに対して段組解析処理を適用する場合について説明する。段組解析処理部３７は、図１０に示す文書画像を、図１４に示す段組及びカラム（淡いグレー地）に分類する。カラムＣ１及びカラムＣ２は、それぞれ１段構成の段組Ｇ１及び段組Ｇ２を成し、カラムＣ３及びカラムＣ４は２段組構成の段組Ｇ３における左右のカラムを成している。なお、図１４では、行（文字列行及び図表行）を直線、行ブロックを点線、カラムを一点鎖線で囲んでいる。 <6-6. Processing example>
As a specific example of the processing executed by the column analysis processing unit 37, consider applying the column analysis processing to the row blocks detected from the document image shown in FIG. 10. The column analysis processing unit 37 classifies the document image of FIG. 10 into the column sets and columns (light gray background) shown in FIG. 14. Columns C1 and C2 form the single-column column sets G1 and G2, respectively, and columns C3 and C4 are the left and right columns of the two-column column set G3. In FIG. 14, rows (character string rows and chart rows) are enclosed by solid lines, row blocks by dotted lines, and columns by dash-dot lines.

＜６−７．行順序付け処理部＞
行順序付け処理部３８は、段組、カラム、行ブロック、及び行の位置関係から文書全体における行の順序を解析し、行順序リストを生成する処理を、以下のルール（１）〜（９）に従って行う。
（１）同じ行ブロックに属する行同士については、横書き文書であれば上から下、縦書き文書であれば右から左の順に優先順位を設定する。ここでは、上記のように既に、行ブロックに分類する際に、その行ブロックに含まれる行についての順序の情報も取得しているため、この情報を利用する。
（２）同じカラムに属する行ブロック同士については、横書き文書であれば上から下、縦書き文書であれば右から左の順に優先順位を設定する。
（３）連続する２つの行ブロック間では、優先順位の高い方の行ブロックの末尾の行の次に、優先順位の低い方の行ブロックの先頭の行が優先されるように設定する。
（４）同じ段組に属するカラム同士については、横書き文書であれば左から右、縦書き文書であれば上から下の順に優先順位を設定する。
（５）連続する２つのカラム間では、優先順位の高い方のカラムの末尾の行ブロックの次に、優先順位の低い方のカラムの先頭の行ブロックが優先されるように設定する。
（６）同じページに属する段組については、横書き文書であれば上から下、縦書き文書であれば右から左の順に優先順位を設定する。
（７）連続する２つの段組間では、優先順位の高い方の段組の末尾のカラムの次に、優先順位の低い方の段組の先頭のカラムが優先されるように設定する。
（８）同じ文書画像ファイルに属するページ同士については、ページ番号の小さい順に優先順位が高くなるよう設定する。
（９）連続する２つのページ間では、優先順位の高いページの末尾の段組の次に、優先順位の低い方のページの先頭の段組が優先されるように設定する。 <6-7. Line ordering processing section>
The row ordering processing unit 38 analyzes the order of rows in the entire document from the positional relationships of the column sets, columns, row blocks, and rows, and generates a row order list according to the following rules (1) to (9).
(1) For rows belonging to the same row block, priority is set from top to bottom for a horizontally written document and from right to left for a vertically written document. The order of the rows in each row block was already obtained when the rows were classified into row blocks, so that information is used here.
(2) For row blocks belonging to the same column, priority is set from top to bottom for a horizontally written document and from right to left for a vertically written document.
(3) Between two consecutive row blocks, the first row of the lower-priority row block comes immediately after the last row of the higher-priority row block.
(4) For columns belonging to the same column set, priority is set from left to right for a horizontally written document and from top to bottom for a vertically written document.
(5) Between two consecutive columns, the first row block of the lower-priority column comes immediately after the last row block of the higher-priority column.
(6) For column sets belonging to the same page, priority is set from top to bottom for a horizontally written document and from right to left for a vertically written document.
(7) Between two consecutive column sets, the first column of the lower-priority column set comes immediately after the last column of the higher-priority column set.
(8) For pages belonging to the same document image file, pages with smaller page numbers are given higher priority.
(9) Between two consecutive pages, the first column set of the lower-priority page comes immediately after the last column set of the higher-priority page.
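Rules (1) to (9) amount to a lexicographic ordering by page, column set, column, row block, and row. A minimal sketch follows, assuming that each row has already been annotated with its page number and with a reference position (x, y) for its column set, column, row block, and itself, and that y grows downward. All key names are hypothetical, and this is one possible realization rather than the embodiment's own procedure.

```python
def sort_key(row: dict, vertical: bool = False) -> tuple:
    """Build a key so that plain sorting follows rules (1) to (9)."""

    def second_dir(pos):
        # Row, row-block and column-set order: top to bottom for horizontal
        # writing, right to left for vertical writing.
        x, y = pos
        return -x if vertical else y

    def first_dir(pos):
        # Column order: left to right for horizontal writing,
        # top to bottom for vertical writing.
        x, y = pos
        return y if vertical else x

    return (
        row["page"],               # rules (8), (9)
        second_dir(row["set"]),    # rules (6), (7)
        first_dir(row["col"]),     # rules (4), (5)
        second_dir(row["block"]),  # rules (2), (3)
        second_dir(row["row"]),    # rule (1)
    )


def row_order_list(rows: list, vertical: bool = False) -> list:
    """Return the row IDs in reading order."""
    return [r["id"] for r in sorted(rows, key=lambda r: sort_key(r, vertical))]
```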

行順序付け処理部３８は、上記のルール（１）〜（９）に従って、ページの順序、段組の順序、カラムの順序、行ブロックの順序を決定し、それらにより行の順序付けを行う。順序付けされた行は、各行が属する行ブロック、カラム、段組及びページの順序を示す番号を保有すると共に、先頭から順に各行の行ＩＤを行順序リストに格納する。
行順序リストは、下記の規定（ａ）〜（ｃ）に従う形式であれば特に構造は問わない。
（ａ）上記順序付けルールに従って決定された順序通りに行を呼び出すことができる。
（ｂ）呼び出した行について、その座標情報や種類（文字列行か図表行か）等の各種情報を参照することができる。
（ｃ）呼び出した行について、段落情報（後述）を格納することができる。 The row ordering processing unit 38 determines the order of pages, the order of columns, the order of columns, and the order of row blocks in accordance with the above rules (1) to (9), and performs ordering of rows by them. The ordered rows hold numbers indicating the order of row blocks, columns, columns and pages to which each row belongs, and store the row ID of each row in the row order list in order from the top.
The structure of the line order list is not particularly limited as long as it conforms to the following rules (a) to (c).
(a) Rows can be retrieved in the order determined by the ordering rules above.
(b) For a retrieved row, various information such as its coordinates and type (character string row or chart row) can be referenced.
(c) For a retrieved row, paragraph information (described later) can be stored.

＜６−８．処理例＞
行順序付け処理部３８が、上記の規定（ａ）〜（ｃ）に従って、図１４の構成の文書画像について行順序リストを生成した例を、図１５に示す。行順序リストは、決定された順序の先頭から順に、行のＩＤと、行の情報として、所属する行ブロックＩＤ、行の種別（文字列行であるか図表行であるか）、及び行の範囲を示す外接矩形の左上座標及び右下座標、の情報とを格納し、さらに行毎に段落情報を格納している。なお、図１５の例では、改行が発生するときに、その行から新たな段落が始まるとして、段落情報を改行の有無を有り（Ｙｅｓ）か無し（Ｎｏ）かの２通りで示しており、事前に「Ｎｏ」で初期化している。図１６のように行ブロックの情報やカラムの情報、段組の情報を別途作成し、相互参照により各行及び各行ブロックが所属するカラム、段組、ページを参照できるようにしておくことで、冗長の少ない行順序リストを構成することができる。もちろん、行順序リスト単独で各行に関する情報を全て抽出できるようにしてもよい。 <6-8. Processing example>
FIG. 15 shows an example in which the row ordering processing unit 38 generates a row order list, in accordance with rules (a) to (c) above, for the document image structured as in FIG. 14. Starting from the head of the determined order, the row order list stores for each row its row ID and, as row information, the ID of the row block it belongs to, the row type (character string row or chart row), and the upper-left and lower-right coordinates of the circumscribed rectangle indicating the extent of the row; paragraph information is also stored for each row. In the example of FIG. 15, a new paragraph is taken to start at a row where a line break occurs, so the paragraph information is expressed as one of two values, line break present (Yes) or absent (No), and is initialized to "No" in advance. If row block information, column information, and column set information are created separately as in FIG. 16, so that the column, column set, and page to which each row and row block belongs can be found by cross-reference, the row order list can be built with little redundancy. Of course, the row order list may instead be built so that all information about each row can be extracted from it alone.

＜６−９．段落解析処理部＞
段落解析処理部３９は、各行の前後の位置関係等の情報から、その行の位置で改行が発生しているかどうかを判定し、文書画像中の各行を１つ以上の段落に分類する処理を行う。具体的には、行順序リストから複数の行を参照して段落の切れ目、すなわち改行位置を判定し（改行判定処理）、段落毎に行の順序を記述した文書構造ツリーを生成する（文書構造ツリー生成処理）。 <6-9. Paragraph analysis processing section>
The paragraph analysis processing unit 39 determines, from information such as the positional relationship between each row and the rows before and after it, whether a line break occurs at that row, and classifies the rows in the document image into one or more paragraphs. Specifically, it refers to several rows from the row order list to determine paragraph breaks, that is, line break positions (line break determination processing), and generates a document structure tree that describes the row order for each paragraph (document structure tree generation processing).

ところで、文書中の図表は、必ずしも段落の切れ目に配置されるとは限らず、例えばページの端に挿入される場合が多く、それにより文章が図表を挟んで前後に分かれることがある。この順序のまま行を呼び出し、行を構成する要素（文字、図表）を挿入していくと、図表の挿入によって不自然に途切れた文章が出力されてしまう。そこで、本実施の形態では、段落毎に、文字列行と図表行とが混在した順序ではなく、文字列行の順序と図表行の順序をそれぞれ別に保有する文書構造ツリーを生成する。図１７は、文書構造ツリーの構造を示す図である。それにより、文書画像を構成する文字列のみの順序を把握しながら、その段落に係る図表を、段落の先頭や末尾等にまとめて配置できるようにする。 By the way, charts in a document are not necessarily arranged at paragraph breaks, and are often inserted, for example, at the end of a page, so that a sentence may be divided before and after the chart. If the lines are called in this order and the elements (characters, charts) constituting the lines are inserted, sentences that are unnaturally interrupted by the insertion of the charts are output. Therefore, in the present embodiment, for each paragraph, a document structure tree is generated in which the order of character string lines and the order of chart lines are held separately, not the order in which character string lines and chart lines are mixed. FIG. 17 is a diagram showing the structure of a document structure tree. Thereby, while grasping the order of only the character strings constituting the document image, the chart relating to the paragraph can be arranged collectively at the beginning or end of the paragraph.

＜６−１０．改行判定処理＞
図１８は、段落解析処理部３９における、改行判定処理の概要を示すイメージ図である。改行判定処理は、判定の対象となる注目行と、注目行より前に順序づけられるＭ個の行と、注目行より後に順序づけられるＮ個の行と、のＭ＋Ｎ＋１個の行によって判定される。なお、図１８に示す例では、Ｍ＝Ｎ＝２としている。なお、改行判定の対象となる行、及びその前後の行は、いずれも文字列行である。本実施の形態では、Ｍ＋Ｎ＋１個の行ＩＤバッファＬ［０］，Ｌ［１］・・・，Ｌ［Ｍ＋Ｎ］を記憶部５に備え、行順序リストで参照されるＭ＋Ｎ＋１個の行ＩＤをそれぞれ格納することで、注目行と、注目行の前後の行との比較を行う。 <6-10. Line feed judgment processing>
FIG. 18 is a conceptual diagram outlining the line break determination processing in the paragraph analysis processing unit 39. The determination uses M + N + 1 rows: the target row being judged, the M rows ordered before it, and the N rows ordered after it. In the example of FIG. 18, M = N = 2. The row being judged and the rows before and after it are all character string rows. In the present embodiment, M + N + 1 row ID buffers L[0], L[1], ..., L[M+N] are provided in the storage unit 5, and the M + N + 1 row IDs referenced in the row order list are stored in them so that the target row can be compared with the rows before and after it.

以下、段落解析処理部３９の処理内容について詳細に説明する。図１９は、段落解析処理部３９の処理手順を示すフローチャートである。段落解析処理の開始にあたり、事前に初期化を済ませておく。具体的には、リスト参照番号をＬＮＯＷ＝１とし、行ＩＤバッファには全て無効行（０）を格納しておく。初期化が終わると、注目行にあたる行ＩＤバッファＬ［Ｍ］に、行ＩＤを選択して格納する（ステップＳ１、以下ではＳ１のように略す）。任意の行ＩＤバッファＬ［ｋ］（ｋ＝０，１，・・・，Ｍ＋Ｎ）の選択方法は以下の（１）〜（３）の通り行われる。
（１）行順序リストにおいて、第ＬＮＯＷ番目から順に、文字列行を探索する。
（２）最初に見つかった文字列行の行ＩＤを行ＩＤバッファＬ［ｋ］に格納し、そのときの行順序リストの位置（リスト番号）に１を加えた番号を新たなリスト参照番号ＬＮＯＷとして更新する。
（３）文字列行が見つからないまま行順序リストの末尾まで探索が終了した場合、行ＩＤバッファＬ［ｋ］には無効行（０）を格納する。 Hereinafter, the processing content of the paragraph analysis processing unit 39 will be described in detail. FIG. 19 is a flowchart showing the processing procedure of the paragraph analysis processing unit 39. Before starting the paragraph analysis process, initialize it in advance. Specifically, the list reference number is LNOW = 1, and all invalid rows (0) are stored in the row ID buffer. When the initialization is completed, the row ID is selected and stored in the row ID buffer L [M] corresponding to the target row (step S1, hereinafter abbreviated as S1). A method of selecting an arbitrary row ID buffer L [k] (k = 0, 1,..., M + N) is performed as follows (1) to (3).
(1) The row order list is searched for a character string row, starting from the LNOW-th entry.
(2) The row ID of the first character string row found is stored in the row ID buffer L[k], and the list reference number LNOW is updated to the position (list number) of that row in the row order list plus 1.
(3) If the search reaches the end of the row order list without finding a character string row, an invalid row (0) is stored in the row ID buffer L[k].
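Selection steps (1) to (3) can be sketched as a small function that scans the row order list from the 1-based position LNOW. The list representation (pairs of row ID and a type string) and the names are hypothetical. The text does not state what LNOW becomes when no character string row is found; the sketch assumes it is set just past the end of the list so that later searches also fail.

```python
INVALID_ROW = 0  # row ID used for an invalid (empty) buffer entry


def next_text_row(order_list: list, lnow: int) -> tuple:
    """Find the next character string row at or after 1-based position lnow.

    order_list holds (row_id, kind) pairs, where kind is "text" or "chart".
    Returns (row_id, new_lnow).
    """
    for pos in range(lnow, len(order_list) + 1):
        row_id, kind = order_list[pos - 1]
        if kind == "text":
            return row_id, pos + 1          # step (2): resume after this row
    # Step (3): nothing found. Moving LNOW past the end is an assumption.
    return INVALID_ROW, len(order_list) + 1
```

Filling a buffer entry L[k] then amounts to `L[k], lnow = next_text_row(order_list, lnow)`.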

行Ｌ［Ｍ］の更新後、行Ｌ［Ｍ］が有効行（ゼロでない行ＩＤを持つ行）であるかどうかを判定し（Ｓ２）、有効行である場合（Ｓ２の判定がＹＥＳ）、Ｓ３に移る。一方、Ｌ［Ｍ］が無効行である場合（Ｓ２の判定がＮＯ）、入力した文書画像には文字列行が存在しない図表行のみの文書画像であるとして、文書構造ツリーの生成処理（後述）を実行する（Ｓ７）。 After updating the row L [M], it is determined whether or not the row L [M] is a valid row (a row having a non-zero row ID) (S2). If the row L [M] is a valid row (YES in S2), Move on to S3. On the other hand, when L [M] is an invalid line (NO in S2), the document structure tree generation process (described later) is performed assuming that the input document image is a document image of only a chart row in which no character string row exists. ) Is executed (S7).

Next, line IDs are selected and stored in the line ID buffers L[M+1], ..., L[M+N], which correspond to the lines after the target line (S3). The line ID for each buffer is selected in the same way as described above, so the explanation is omitted. The line break determination for the target line L[M] is then executed (S4). This determination can be made by a known method that uses the target line L[M] together with the lines L[0], ..., L[M-1] located before it and the lines L[M+1], ..., L[M+N] located after it. A simple example is to check whether the target line is indented. If the start position of line L[M] is shifted in the positive direction along the first document direction relative to the other lines, L[M] has an indent, and a line break can be considered to occur at the position of L[M]. Likewise, if the line L[M-1] immediately before the target line is shorter than the other lines, a line break can be considered to occur at the position of L[M]. For example, for the horizontally written character string lines shown in FIG. 18 (M = N = 2), the start position of the target line L[2] is shifted to the right (the positive direction for horizontal writing) relative to the other lines, and the preceding line L[1] is shorter than the other lines. Taken together, these results make the target line L[2] likely to be judged a line break position. Note that, depending on the preference of the document's author, the first line of a paragraph may not be indented, and the last line of the preceding paragraph is not necessarily short.
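The two cues described above (an indented target line, a short preceding line) can be sketched as follows. This is only an illustration under stated assumptions: the `Line` record, the use of the median of the other lines as the reference, and the tolerances `indent_tol` and `short_ratio` are hypothetical and do not come from the description, which leaves the method to known techniques. The sketch also assumes that every entry in the window is a valid line.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    start: float   # start position along the first document direction (pixels)
    length: float  # extent of the line along the first document direction (pixels)


def is_line_break(window: list[Line], m: int,
                  indent_tol: float = 5.0, short_ratio: float = 0.8) -> bool:
    """Return True if window[m] looks like the first line of a paragraph.

    window holds L[0..M+N] and must contain at least two lines.
    indent_tol and short_ratio are assumed tuning values.
    """
    target = window[m]
    others = [ln for i, ln in enumerate(window) if i != m]
    # Cue 1: the target line starts further along the positive direction
    # than the other lines in the window.
    indented = target.start - median(ln.start for ln in others) > indent_tol
    # Cue 2: the line just before the target is clearly shorter than the rest.
    prev_short = False
    if m > 0:
        rest = [ln for i, ln in enumerate(window) if i != m - 1]
        prev_short = window[m - 1].length < short_ratio * median(ln.length for ln in rest)
    return indented or prev_short
```

Combining the cues with a plain `or` is the simplest choice; a weighted score over several cues would be closer to the combined judgment that the description recommends.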

Other examples of line break determination are as follows. By checking whether the character size of the target line differs greatly from that of the surrounding lines, a line with larger characters, such as a heading, or conversely a line with smaller characters, such as a supplementary comment, can be judged a line break position. In addition, if the spacing between the target line and the line immediately before it is larger than the spacing between two consecutive lines nearby, it is more likely that a paragraph ended at the preceding line. It is preferable to evaluate a combination of conditions such as those above when setting whether a line break occurs at the target line.

The result of the line break determination for line L[M] is recorded as paragraph information in the line information corresponding to L[M] in the line order list. The paragraph information may take any form as long as paragraph boundaries can be identified; in the simplest case, it merely indicates the presence or absence of a line break as Yes or No.

When the line break determination for the target line L[M] is finished, it is determined whether the next line L[M+1] is a valid line (S5). If L[M+1] is valid (YES in S5), the line ID buffers are updated in preparation for the line break determination of the next line (S6). Specifically, as shown in FIG. 20, the buffers are shifted by one position, L[0] = L[1], ..., L[M+N-1] = L[M+N], and a new line ID is selected from the line order list for buffer L[M+N]. The selection method for L[M+N] is the same as described above, so the explanation is omitted. After the buffers are updated, the process returns to S4 and the line break determination is performed for the updated target line L[M]. This is repeated until the determination in S5 is NO. When the determination in S5 is NO, the line break determination has been completed for all character string lines, and the document structure tree generation process is executed (S7).
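The buffer handling in S2 to S6 amounts to sliding a window of M + N + 1 line IDs over the character string lines. The sketch below is a minimal illustration; the function name is hypothetical, and it assumes that the input has already been filtered to character string line IDs and that buffers with no corresponding line hold the invalid ID 0.

```python
def sliding_windows(line_ids, m, n):
    """Yield the buffer L[0..M+N] once for each target line L[M].

    line_ids: non-zero IDs of the character string lines, in reading order.
    Buffers that have no corresponding line hold 0 (the invalid line ID).
    """
    it = iter(line_ids)
    # Initial fill: nothing precedes the first target line, so L[0..M-1] are 0;
    # L[M..M+N] are taken from the list (S3).
    buf = [0] * m + [next(it, 0) for _ in range(n + 1)]
    while buf[m] != 0:                 # S2 on the first pass, S5 afterwards
        yield list(buf)                # S4 would run on this window
        buf = buf[1:] + [next(it, 0)]  # S6: shift by one and refill L[M+N]
```

Checking `buf[m]` after the shift is equivalent to checking L[M+1] before it, as S5 does. An empty input yields no windows, which corresponds to the NO branch of S2.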

<6-11. Document structure tree generation process>
The document structure tree generation process performed by the paragraph analysis processing unit 39 is executed according to the paragraph information stored in the line order list. The document structure tree is assumed to have been initialized beforehand to a state with a single empty paragraph (paragraph 0), as shown in FIG. 21. With the initial paragraph number set to PNOW = 0, the line information is read in order from the head of the line order list. Only when the paragraph information indicates that the line is a paragraph start line (the paragraph information is YES) is the paragraph number updated as PNOW = PNOW + 1 and a new empty paragraph (paragraph PNOW) added to the document structure tree. If the line is a character string line, its line ID is appended to the end of the character string tree of paragraph PNOW in the document structure tree. If the line is a chart line, its line ID is appended to the end of the chart tree of paragraph PNOW. This is repeated for every line in the line order list, and the generation process ends when the end of the list is reached. Because one paragraph extends from one line break position of the character string lines to the next, no chart line is left out of the paragraph classification. Consequently, if a chart line follows a group of character string lines and a new paragraph (a line break) begins after it, the chart line is included in the preceding paragraph.
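The generation procedure can be sketched in a few lines. The `Paragraph` record and the tuple layout of the line order list are hypothetical representations chosen for illustration; the description does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    text_lines: list[int] = field(default_factory=list)   # character string tree
    chart_lines: list[int] = field(default_factory=list)  # chart tree


def build_structure_tree(line_order):
    """line_order: iterable of (line_id, is_text_line, starts_paragraph)."""
    tree = [Paragraph()]   # initialized with one empty paragraph (paragraph 0)
    p_now = 0
    for line_id, is_text, starts_paragraph in line_order:
        if starts_paragraph:           # paragraph information is YES
            p_now += 1
            tree.append(Paragraph())
        if is_text:
            tree[p_now].text_lines.append(line_id)
        else:
            tree[p_now].chart_lines.append(line_id)
    return tree
```

Because a chart line never starts a paragraph, it is always attached to the paragraph that is open when it is read, which matches the behavior described above.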

<6-12. Processing example>
As a specific example of the processing executed by the paragraph analysis processing unit 39, consider applying it to the document image with the structure shown in FIG. 6 (which has the initial line order list shown in FIG. 15). The heading lines with line IDs 101, 104, and 129 are judged to be line break positions from conditions such as the character size of the line and the spacing to the preceding line. The lines with line IDs 106, 112, 119, and 131 start at positions shifted in the positive direction along the first document direction relative to several lines before and after them; they are therefore regarded as indented and judged to be line break positions. The line with line ID 102 can also be judged a line break position because it follows the heading line with line ID 101. Accordingly, the lines with line IDs 101, 102, 104, 106, 112, 129, 131, and 119 are set as line break positions, and the paragraph information in the line order list is updated as shown in FIG. 22.

For example, when the line order list expresses the presence or absence of a line break (Yes or No) as paragraph information, as in FIG. 22, a new paragraph can be considered to start at a line with a line break, so a line marked Yes can be judged a paragraph start line. The lines from the start of a paragraph up to the next line break position are regarded as belonging to the same paragraph. Classifying the document image of FIG. 6 into paragraphs according to the updated paragraph information in the line order list gives the result shown in FIG. 24. In FIG. 24, paragraph R7 covers the three character string lines at the lower left (line IDs 131, 133, 135), the chart line at the upper right (line ID 105), and the four character string lines (line IDs 111, 113, 115, 117). Because there is no line break position between the character string lines with line IDs 135 and 111, they can form one continuous passage. Generating the document structure tree according to this line order list yields the tree shown in FIG. 23.

[7. Conversion possibility determination processing unit]
<7-1. Processing of the conversion possibility determination processing unit>
The conversion possibility determination processing unit 34 uses the information obtained by the processing from the line analysis processing unit 31 through the layout analysis processing unit 33 (feature quantities of the character strings, figures, or tables extracted from the document image) to determine, under the following conditions, whether to convert the document image to the reflow type, in other words, whether to reconstruct the elements contained in the document image.
First condition: The line analysis processing unit 31 has determined that vertically written lines and horizontally written lines are mixed in the document in a non-negligible ratio and that the direction of the document as a whole cannot be uniquely determined.
Second condition: Only chart lines were extracted by the line analysis processing unit 31.
Third condition: The number of character string lines extracted by the line analysis processing unit 31 is equal to or less than a predetermined threshold (THln) (first threshold).
Fourth condition: For all character string lines extracted by the line analysis processing unit 31, the height of a horizontally written line (the width of the line for vertical writing) is equal to or greater than a predetermined threshold (THcs) (second threshold).
Fifth condition: The positions of the horizontally or vertically written line blocks ordered by the layout analysis processing unit 33 are not aligned within a certain range (THrg).

If the input document image meets any of the five conditions above, the conversion possibility determination processing unit 34 determines that it is not to be converted to the reflow type and outputs a determination signal to the compression processing unit 21. On receiving the determination signal, the compression processing unit 21 converts the RGB image data into, for example, the JPEG file format and outputs it. That is, the whole image is output as is so that it is displayed in the fixed type. In this embodiment, the compression processing unit 21 thus functions as a format conversion processing unit that converts a document image determined not to be converted to the reflow type into a format that can be displayed in the fixed type.

If the input document image meets none of the five conditions above, the conversion possibility determination processing unit 34 determines that it is to be converted to the reflow type. For such a document image, the rearrangement processing unit 35 in the next stage generates a reference list.
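The decision rule (do not reflow if any of the five conditions holds) can be sketched as below. The `PageFeatures` record and its field names are hypothetical; the default thresholds THln = 5 and THcs = 40 are the example values given in this description, and the first and fifth conditions are taken as precomputed flags.

```python
from dataclasses import dataclass, field


@dataclass
class PageFeatures:
    direction_ambiguous: bool            # outcome of the first-condition check
    text_line_heights: list[float] = field(default_factory=list)  # one per character string line
    chart_line_count: int = 0
    blocks_aligned: bool = True          # False if block positions deviate beyond THrg


def blocking_conditions(f: PageFeatures, th_ln: int = 5, th_cs: float = 40.0) -> list[int]:
    """Return the numbers of the conditions that forbid reflow conversion."""
    hits = []
    n_text = len(f.text_line_heights)
    if f.direction_ambiguous:
        hits.append(1)
    if n_text == 0 and f.chart_line_count > 0:   # only chart lines were extracted
        hits.append(2)
    if n_text <= th_ln:                          # too few lines to be worth reflowing
        hits.append(3)
    if n_text > 0 and min(f.text_line_heights) >= th_cs:  # every line is already large
        hits.append(4)
    if not f.blocks_aligned:
        hits.append(5)
    return hits


def should_reflow(f: PageFeatures) -> bool:
    """Convert to the reflow type only when no condition applies."""
    return not blocking_conditions(f)
```

Returning the list of matched conditions rather than a bare boolean makes it easy to see why a page was kept in the fixed type.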

<7-2. Details of the determination conditions>
Each of the five conditions is described in detail below.

First, the first condition. As described above, when the line analysis processing unit 31 classifies the directions of all acquired character strings and determines the first document direction from their ratio, it simply counts, for example, the numbers of horizontally and vertically written character strings. It then calculates the ratio of the counts, and if the calculated ratio is equal to or less than a predetermined threshold (for example, 0.7), it determines that vertically and horizontally written lines are mixed in the document image in a non-negligible ratio. In that case the direction of the document as a whole cannot be uniquely determined, and during layout analysis it also becomes difficult to decide how vertically and horizontally written lines should be connected. The document image is therefore determined not to be converted to the reflow type.
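A minimal sketch of this check follows. It assumes that the "ratio" is the share of the more frequent writing direction among all character strings; the description does not spell out the exact definition, so this interpretation and the function name are assumptions.

```python
def direction_is_ambiguous(n_horizontal: int, n_vertical: int,
                           threshold: float = 0.7) -> bool:
    """True if neither writing direction clearly dominates the page."""
    total = n_horizontal + n_vertical
    if total == 0:
        return False  # no character strings at all; the second condition covers this
    dominant_share = max(n_horizontal, n_vertical) / total
    return dominant_share <= threshold
```

With the example threshold of 0.7, a page with 90 horizontal and 10 vertical strings is treated as horizontal, while a 60/40 split is treated as mixed and kept in the fixed type.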

Next, the second condition. If only chart lines were extracted by the line analysis processing unit 31, the content of a figure or table, even one that contains characters, may become unintelligible unless it is displayed as is without breaking its layout. The document image is therefore determined not to be converted to the reflow type.

Next, the third condition. If the number of character string lines extracted by the line analysis processing unit 31 is, for example, five or fewer (THln = 5), the text is considered too short to be worth reading in the reflow type. The document image is therefore determined not to be converted to the reflow type. The threshold (THln) may be set, for example, to the number of lines that fit on one screen, given the size of the display screen and the character size of the lines.

Next, the fourth condition. If, for all character string lines extracted by the line analysis processing unit 31, the height of a horizontally written line (the width for vertical writing) is, for example, 40 pixels or more (THcs = 40), the characters are large enough to remain readable even when the image is displayed at reduced size. The document image is therefore determined not to be converted to the reflow type. The threshold (THcs) may be set, for example, by calculating the pixel size of one character in the original image from the reduction ratio determined by the display screen size and the overall image size and from the pixel size of the smallest font that is readable after reduction.
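The suggested derivation of THcs can be written out as a short calculation. The function and parameter names are hypothetical, and fitting the whole page on the screen without enlargement is an assumed way of obtaining the reduction ratio.

```python
def char_size_threshold(image_w: float, image_h: float,
                        screen_w: float, screen_h: float,
                        min_readable_px: float) -> float:
    """Return THcs: the character size in the original image (pixels) that
    shrinks to exactly min_readable_px when the page is fitted to the screen."""
    # Reduction ratio for fitting the whole page on the screen (never enlarging).
    scale = min(screen_w / image_w, screen_h / image_h, 1.0)
    return min_readable_px / scale
```

For instance, a 2480 x 3508 pixel page shown on a 620 x 877 pixel area is reduced to one quarter, so a 10-pixel minimum readable size corresponds to a 40-pixel character in the original, which matches the example value THcs = 40.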

Next, the fifth condition. If the positions of the horizontally or vertically written line blocks ordered by the layout analysis processing unit 33 are not aligned within a certain range, the layout is probably not an orderly multi-column layout, so connecting the line blocks is likely to fail and join the wrong blocks. The document image is therefore determined not to be converted to the reflow type. For example, consider a document image that contains several vertically written line blocks and several charts, as shown in FIG. 31. For this document image, the positional offsets of the line blocks ordered by the layout analysis processing unit 33 are obtained from the block sizes and position information. If the threshold for judging alignment within a certain range is THrg = 40, every line block exceeds the threshold, which means the blocks are not aligned within the range. In such a case, connecting the lines block by block from the top gives the order vertical writing 1 → vertical writing 2 → vertical writing 4 → vertical writing 3, which is not correct. Connecting them block by block from the right gives the order vertical writing 1 → vertical writing 3 → vertical writing 2 → vertical writing 4, which is not correct either. The connection order of the lines is thus likely to be wrong, so a document image whose block positions are not aligned in this way is determined not to be converted to the reflow type.

The processing for displaying a mixture of document images (pages) that the conversion possibility determination processing unit 34 has determined not to convert to the reflow type and document images that it has determined to convert is described later in section <9-3>.

[8. Rearrangement processing unit]
<8-1. Processing of the rearrangement processing unit>
For a document image that the conversion possibility determination processing unit 34 has determined to convert to the reflow type, the rearrangement processing unit 35 generates a reference list as follows.

The rearrangement processing unit 35 calls the lines in the order defined by the layout analysis processing unit 33. If the called line is a character string line, it calls the characters in order from the head of that line; if the called line is a chart line, it calls the corresponding figure or table. It then writes the reference information of each element (the information needed to call the element) in order. In addition, if an element is the last element of a line that contains a line break position, it inserts a line break command. The reference list is generated in this way.

In other words, the reference list describes the order of the elements contained in the document image according to the ordering of the character string lines and chart lines, with the characters in a character string ordered along the first document direction. The reference list can be written in a markup language format such as HTML. The rearrangement processing unit 35 is described in detail below, taking output as an HTML file as an example.

FIG. 25 is a flowchart showing the processing procedure of the rearrangement processing unit 35. The processing is described below with reference to FIG. 25.

As shown in FIG. 25, the file header is written first (S11). The file header describes various information about the file. For an HTML file, for example, as shown in FIG. 26, it includes the declaration that the file is written in HTML (the <HTML> tag), file information that does not appear in the body, such as style definitions and the page title, comments, and the declaration that the body begins (the <BODY> tag).

The body is written next. Taking paragraph 0 of the document structure tree generated by the layout analysis processing unit 33 as the initial call position, it is determined whether paragraph information can be called from the document structure tree at the call position (S12). If it can (YES in S12), the paragraph is called from the document structure tree, and it is determined whether the paragraph has at least one line (S13). If it does (YES in S13), the file description process is executed (S14): the information of all lines making up the paragraph is called in turn, and the description needed to display the elements of each line in the file is written. If the paragraph has no lines (NO in S13), the call position is moved to the next paragraph and the process returns to S12.

FIG. 27 is a flowchart showing the procedure of the file description process in S14. For the called paragraph, a paragraph start declaration command (see section <8-2> below), which declares that a paragraph begins, is executed first (S21). The character string lines of the paragraph are then called in order from the head of the document structure tree, and the element reference process for referring to the character elements in each line is executed (S22). That is, the elements (characters) are called in order from the head of the line and a reference command is executed for each element, repeating the same processing for all elements.

When the reference commands for all elements have been executed, the reference process for that line ends, and the next character string line is called and processed in the same way. This is done for all character string lines of the paragraph in the document structure tree (all character string lines in the paragraph's character string line tree). When the element reference process has been completed for all of these lines, the element reference process for the chart lines is executed (S23). Because a chart line has only one element, one reference command is executed per chart line, after which the next chart line of the paragraph (the next chart line in the paragraph's chart line tree) is called and processed in the same way. When the reference command has been executed for all chart lines in the paragraph's chart line tree, a paragraph end declaration command (see section <8-2> below), which declares that the file description process for the paragraph ends, is executed last (S24), and the file description process (S14) ends.

Returning to FIG. 25, after S14 the call position is updated to the next paragraph and the process returns to S12.

The above processing is repeated until the determination in S12 is NO, that is, until the file description process has been completed for all paragraphs in the document structure tree. When the determination in S12 is NO, the footer is written (S15). As shown in FIG. 26, the footer includes the declarations that end the items whose start was declared in the header (for example, the body and the HTML description). When the footer has been written, the file is saved (S16) and the process ends. The file is output from the rearrangement processing unit 35.

<8-2. Examples of the commands>
One example of the paragraph start declaration command is to insert the paragraph tag <p>. In this case, the paragraph end declaration command must insert the closing paragraph tag </p>. Alternatively, the paragraph start declaration command may do nothing, with the paragraph end declaration command inserting the line break tag <br>. An element reference command may, for example, insert an image display tag <img> into the output file and specify the path of the cropped image file of the element so that it is displayed. The reference command for a chart element may be the same as that for a character element; alternatively, for chart elements only, it may be combined with the insertion of tags such as the line break tag <br> or the table tags <table> and </table> to produce a more legible layout.
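Putting the flow of FIG. 25 and FIG. 27 together with the <p> and <img> commands, a reference list could be produced roughly as follows. This is a sketch only: the dictionary layout of a paragraph, the function name, and the `figures_first` switch are hypothetical, the header is reduced to a bare skeleton, and saving the file (S16) is left to the caller.

```python
from html import escape


def build_reference_list(paragraphs, figures_first=False):
    """Return the HTML text of a reference list.

    paragraphs: list of dicts with
      'text_lines'  - list of lines, each a list of cropped character image paths
      'chart_lines' - list of cropped figure or table image paths
    """
    def img(path):
        # Element reference command: an <img> tag pointing at the cropped image.
        return f'<img src="{escape(path, quote=True)}">'

    out = ["<html>", "<head>", "</head>", "<body>"]        # S11: header
    for para in paragraphs:                                # S12: next paragraph
        if not para["text_lines"] and not para["chart_lines"]:
            continue                                       # S13: paragraph has no lines
        text = [img(p) for line in para["text_lines"] for p in line]
        charts = [img(p) for p in para["chart_lines"]]
        out.append("<p>")                                  # S21: paragraph start
        # S22 then S23; swapping the two steps places the charts first.
        out += charts + text if figures_first else text + charts
        out.append("</p>")                                 # S24: paragraph end
    out += ["</body>", "</html>"]                          # S15: footer
    return "\n".join(out)
```

The empty paragraph 0 created when the document structure tree is initialized is skipped by the S13 check, so it never produces an empty <p> element.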

<8-3. Placing charts at the beginning of a paragraph>
The procedure described above places the charts after the body text in each paragraph. To place the charts before the body text instead, S22 and S23 in FIG. 27 can simply be swapped. The user may also be allowed to choose, through the operation panel 6, among several modes, such as placing charts at the end of the paragraph, placing them at the beginning, or not displaying them (displaying character strings only).

<8-4. Defining the document style>
In addition to the commands above, a style sheet may be incorporated to change the formatting of the document. A style sheet is a known means of efficiently controlling the appearance and display format of a document structured in HTML or a similar language. This appearance information is called a "style". Style sheets are written in a dedicated computer language, an example of which is CSS (Cascading Style Sheets). There are broadly three places where a style sheet can be defined: a style element (<style> to </style>) added inside the head element (between <head> and </head>); an external file that contains the styles and is loaded from the head element of the reference list; and the individual tags in the body (between <body> and </body>), where a style that applies only to that tag is written. These methods are not mutually exclusive, and a style sheet can be defined by combining several of them. This embodiment uses these known methods, so they are not described in detail below; only simple examples are introduced.

FIG. 28 shows an example of an external style sheet file written in CSS. Range A in the figure defines the style of the <p> tag, which defines a paragraph; in this example, an indent of one character is added to the first line of each paragraph. As a result, an indent of one character is always added whenever the paragraph start declaration tag is used. Ranges B and C both define styles for the <img> tag, which refers to an image, but the style in range B applies only to elements of the class gaiji, and the style in range C applies only to elements of the class fig.

In this way, several classes, each with its own style, can be defined for the same tag and saved as an external file (named style.css here). As in the example of FIG. 29 (written in HTML, excerpt only), the external file shown in FIG. 29(a) (the same as that of FIG. 28) is loaded in the header (third line of FIG. 29(b)), and a class with the desired style is specified in each tag, which makes local style specification possible. In the example of FIG. 28 and FIG. 29, when an image is referenced, an image representing a character specifies the gaiji (external character) class by writing class="gaiji" inside the img tag, as shown in FIG. 29(b), and an image representing a chart specifies the fig (figure) class by writing class="fig". Defining an appropriate style for each element in this way makes it possible to display a better-looking document.

These styles may also be made selectable on the operation panel 6. For example, independently of the first document direction determined from the document image data, the user may be allowed to specify, through the operation panel 6, the direction of the document used at rearrangement. Specifically, in HTML output, defining the style sheet rule html{writing-mode:tb-rl;} in CSS makes the lines of the entire body run from top to bottom, with lines starting at the right and progressing to the left; that is, vertical writing becomes possible.

Accordingly, when reflow-type display in vertical writing is selected, vertical display can be realized by adding a style sheet definition such as the one above. Vertical writing may also be realized by methods other than the one described. In addition, an "automatic mode" may be provided for the document direction at output, so that output in the same direction as the first document direction obtained by the line analysis processing unit 31 is selected automatically.
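
The direction selection, including the automatic mode, can be sketched as follows. This is only an illustrative sketch: the function name writing_mode_rule and the panel values "horizontal", "vertical", and "auto" are assumptions, and detected_direction merely stands in for the first document direction obtained by the line analysis processing unit 31. The rule string uses the tb-rl value quoted in the text; the CSS Writing Modes specification names the equivalent value vertical-rl, so a real style sheet may need that form instead.

```python
from typing import Optional


def writing_mode_rule(user_choice: str, detected_direction: str) -> Optional[str]:
    """Return the CSS rule to append to the style sheet, or None for horizontal text.

    user_choice: "horizontal", "vertical", or "auto" (hypothetical panel values).
    detected_direction: "horizontal" or "vertical" (result of line analysis).
    """
    # In automatic mode, follow the direction detected in the document image.
    direction = detected_direction if user_choice == "auto" else user_choice
    if direction == "vertical":
        # Value quoted in the text; vertical-rl is the standardized equivalent.
        return "html{writing-mode:tb-rl;}"
    if direction == "horizontal":
        return None  # the default horizontal layout needs no extra rule
    raise ValueError(f"unknown direction: {direction!r}")
```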

<8-5. Other file formats>
The rearrangement processing unit 35 of this embodiment has been described for the case of outputting an HTML file from an input document image. However, the output file is not limited to an HTML file and can be selected from any file format that realizes reflow-type display (for example, an XML file or an XMDF file).

In this embodiment, as described above, the structure of the document image is analyzed to determine the order of the lines; the lines are referenced in order, and the characters, figures, and tables that make up each line are referenced in order from the beginning. In this way, information (a reference file) for conversion into a reflow-type file can be obtained even for a document image generated as a fixed-type image file. In addition, by determining whether line breaks are present and defining the range of each paragraph, no line break is inserted within a single paragraph even where lines wrap in the document image. Furthermore, the figures or tables subordinate to each paragraph are repositioned, for example grouped at the beginning or end of that paragraph, so that even when a figure or table sits between two character strings, the strings are laid out without the intervening figure or table as long as they are regarded as belonging to the same paragraph. This improves the continuity and readability of the text.

As described above, the reference list follows the ordering of the character string lines and figure/table lines and the writing direction of the document, and is information for conversion into a reflow-type file. Generating, as the reference list, the document image converted into a file format capable of reflow-type display reduces the amount of processing and allows a layout suited to the display area (display screen) of any display device. With the reference list, the display device can display the document using only scroll operations in the direction perpendicular to the writing direction of the document in the document image.

Generating the reference list as text data that lists the document structure allows it to be used for analyzing the document structure rather than for viewing. In addition, by further converting the output text-format reference list into a desired file format, processing does not have to be redone from the beginning when output in another file format is desired.

[9. Display device]
<9-1. Use of the reference list on the display device>
The generated reference list and the image data of each cut-out element are transmitted from the transmission device 4 of the image forming apparatus 100 of this embodiment and can be viewed through a viewing program (viewer), an application provided in a display device (for example, a smartphone or tablet) that is the receiving-side device (not shown). The most suitable viewing program depends on the file format of the reference list. For example, when the reference list takes the form of an HTML file, reflow-type display can easily be realized by opening it in a well-known web browser that supports HTML5, such as Internet Explorer (registered trademark).

The reference list may also be generated, without conversion into a file of a specific markup language, as text data listing the document structure obtained by the rearrangement processing unit 35, such as the order in which each element (character, figure, table) is referenced and the start and end declarations of paragraphs. Such a reference list can be used, for example, as the result of document structure analysis, or can be converted into a desired file format by a computer program (conversion program) or the like provided in the receiving-side device. Making the file format conversion a two-stage process in this way increases the number of processing steps, but allows the reference list to be shared when the same document image data is to be converted into multiple file formats. Furthermore, the layout analysis result can be corrected manually in the conversion program.
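
The second stage of such a two-stage conversion might look like the sketch below. The text does not specify the layout of the text-format reference list, so the line format used here (P_BEGIN, P_END, CHAR path, FIG path, TABLE path) is purely an assumption made for illustration, as is the function name reference_list_to_html.

```python
def reference_list_to_html(lines):
    """Convert a hypothetical text reference list into an HTML fragment.

    Each line is one of: "P_BEGIN", "P_END", "CHAR <path>", "FIG <path>",
    "TABLE <path>". This format is an assumption, not the patent's format.
    """
    out = []
    for raw in lines:
        parts = raw.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        op = parts[0]
        arg = parts[1] if len(parts) > 1 else ""
        if op == "P_BEGIN":
            out.append("<p>")
        elif op == "P_END":
            out.append("</p>")
        elif op == "CHAR":
            out.append(f'<img class="gaiji" src="{arg}"/>')
        elif op in ("FIG", "TABLE"):
            out.append(f'<img class="fig" src="{arg}"/>')
        else:
            raise ValueError(f"unknown reference list entry: {raw!r}")
    return "".join(out)
```

A converter for another output format would read the same list and differ only in the strings it emits, which is how the shared reference list avoids repeating the structural analysis.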

When the receiving-side display device receives, as the reference list transmitted from the image forming apparatus 100, a list already converted into a file format (HTML or the like) suited to the viewer application of the display device, the document converted to the reflow type can be displayed simply by loading the file in that application, so no special processing is needed. However, if the reference list is not associated with a specific viewer, for example when it merely lists coordinate information and the like as the simplest form, reflow-type display is not possible as it is. When such a reference list is received, processing for rearrangement is required, but known processing can be used.

In addition, the wrap positions of the characters referenced by the reference list change to match the display width of the display area of the display device, and figures and tables are reduced or enlarged to match the display width. For example, when the fig class is assigned to figures and tables and output is in the HTML file format, as described in <8-4> above, setting the image width (or height) as a proportion of the display width (or height) of the display device allows it to be adjusted automatically to the current display width (or height) even when the display width or display magnification of the display device changes.

As an example of setting the image width as a proportion of the display width of the display device (for horizontal writing) as described above, a percentage can be added as the size attribute of the img tag, only for img tags belonging to the fig class, as follows:
<img class="fig" src="〜〜" width="90%"/> ... (8)
Alternatively, the following can be added as the style sheet for the fig class in CSS:
.fig{width:90%;} ... (9)
With either method (8) or (9), 90% of the display width is set as the width of the figure or table, and when the display area changes, the width is automatically reset to 90% of the new display width. Method (8) applies only to the img tag to which the size attribute setting is added, that is, individually, whereas method (9) sets all figures and tables assigned the fig class at once. Known methods other than these can of course also be used.

<9-2. Another example of displaying figures and tables>
Sections <8-1> and <8-3> above disclosed methods of displaying figures and tables at the end or beginning of a paragraph. Alternatively, instead of displaying figures and tables mixed into the same file as the body text, a list of links to the images (with the figures and tables arranged in order from the first paragraph) may be created separately, so that an image selected from the link list can be displayed individually when the user wishes.

For example, a display device that receives the reference list generated by the image forming apparatus 100 and the images of the extracted (cut-out) elements has a content display area for displaying content and a separate operation area for accepting user operations, and the operation area is provided with means for executing an operation function that switches the display between the body text and the image link list. With this configuration, the content shown in the content display area can be switched between the body text and the figures and tables whenever the user wishes.

Alternatively, the display device may have a separate content display area and operation area, with links to the images displayed in the operation area in order from the top of the link list so that they can be selected. The links can be displayed using, for example, labels (preferably ones whose order can be recognized, such as FIG. 1, FIG. 2, ...) or image thumbnails. When a link is selected, the corresponding figure or table is displayed. Besides displaying it in the content display area, the figure or table may be displayed in a so-called pop-up format, in which a figure/table display area is generated superimposed on the content display area and the figure or table is displayed there.
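
Building the ordered link list can be sketched as below. The input representation (a list of (kind, src) pairs in reading order), the function name build_figure_links, and the label wording are assumptions for illustration; the text only requires that the labels make the order recognizable.

```python
def build_figure_links(elements):
    """Collect figures and tables, in reading order, as numbered HTML links.

    elements: list of (kind, src) pairs, where kind is "char", "figure",
    or "table" (a hypothetical flattened view of the reference list).
    """
    links = []
    for kind, src in elements:
        if kind not in ("figure", "table"):
            continue  # body-text characters are not part of the link list
        number = len(links) + 1  # position in the link list, starting at 1
        name = "Fig." if kind == "figure" else "Table"
        links.append(f'<a href="{src}">{name} {number}</a>')
    return links
```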

When the display device has a content display area and an operation area as described above, the operation area need not be separated from the content display area and may be displayed superimposed on it. The operation area also need not be displayed at all times and may be shown on the screen only when a display command is input. One example of inputting the display command is to keep touching the touch panel of the display device, within the range corresponding to the display area, for at least a certain time without moving a certain distance or more from the touch coordinates at the start of the touch (a so-called long press).
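
The long-press condition described above (held for at least a certain time without moving a certain distance from the starting coordinates) can be sketched as follows. The function name is_long_press, the sample format, and the default thresholds of 0.5 seconds and 10 pixels are assumptions; the text gives no concrete values.

```python
import math


def is_long_press(samples, min_duration=0.5, max_drift=10.0):
    """Decide whether a touch is a long press.

    samples: time-ordered list of (t_seconds, x, y) while the finger is down.
    Returns True once the touch has lasted min_duration seconds without
    moving max_drift or more from the coordinates at the start of the touch.
    """
    if not samples:
        return False
    t0, x0, y0 = samples[0]
    for t, x, y in samples:
        if math.hypot(x - x0, y - y0) >= max_drift:
            return False  # moved too far before the hold time elapsed
        if t - t0 >= min_duration:
            return True
    return False  # released before the hold time elapsed
```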

The above method does not itself improve the accuracy of the ordering of figures and tables or of their association with the corresponding paragraphs. However, because the display positions of figures and tables are not fixed and the user selects them from a "list of figures and tables arranged (almost) in the order of the body text", it can reduce the sense of incongruity when reading text with a complex layout in which association with the corresponding paragraphs is difficult.

<9-3. Display of reflow-type and fixed-type documents>
The following describes processing for displaying a mixture of pages that the conversion possibility determination processing unit 34 has determined cannot be converted to the reflow type and pages that it has determined can be converted.

A fixed-type page can be displayed mixed with reflow-type pages by treating the entire page as a single image. However, when a fixed-type page treated as an image is displayed mixed with reflow-type pages, the image is displayed full screen, so the characters on some pages are hard to read. In that case, configuring the display device so that it can switch between reflow-type and fixed-type display makes it possible to show each page in an easy-to-read format.
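
Mixing the two page types in one output can be sketched as below, assuming each page is described by a small dictionary. The keys convertible, reflow_html, and page_image, the class name fixedpage, and the function names are all hypothetical; the text only states that a non-convertible page is handled as a single image.

```python
def render_page(page):
    """Return the HTML for one page.

    page: dict with "convertible" (bool), "reflow_html" (str, used when the
    page was judged convertible), and "page_image" (str, path to the image
    of the whole page, used otherwise).
    """
    if page["convertible"]:
        return page["reflow_html"]
    # A page judged not convertible is embedded as one image of the whole page.
    return f'<img class="fixedpage" src="{page["page_image"]}"/>'


def render_document(pages):
    """Concatenate reflow-type and fixed-type pages in page order."""
    return "\n".join(render_page(p) for p in pages)
```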

For example, when displaying in a web browser using HTML5 or the like, a switching button (switching unit) between reflow-type display and fixed-type display is shown in the display area of the browser, and the display area is switched between reflow-type display and fixed-type display according to the button pressed. The switching button may be, for example, a separate button for selecting each type, or a toggle switch that selects one of the two types.

[Embodiment 2: Image reading apparatus]
Embodiment 1 described a configuration in which the document image processing apparatus according to the present invention is applied to the image processing apparatus of an image forming apparatus, but the invention is not limited to this. This embodiment describes an example in which the document image processing apparatus according to the present invention is applied, as a conversion processing unit, to the image processing apparatus of an image reading apparatus such as a flatbed scanner.

Members having the same functions as members shown in the drawings used to describe Embodiment 1 are given the same reference numerals in the following description, and their detailed description is not repeated here.

FIG. 30 is a block diagram showing the configuration of an image reading apparatus (information processing apparatus) 200 that includes an image processing apparatus 1a according to Embodiment 2. As shown in FIG. 30, the image reading apparatus 200 includes the image processing apparatus 1a, an image input device 2, a transmission device 4, a storage unit 5, and an operation panel 6. The image processing apparatus 1a includes an A/D conversion unit 11, a shading correction unit 12, a document type determination unit 13, an input tone correction unit 14, a region separation processing unit 15, a compression processing unit 21, and a conversion processing unit (document image processing apparatus) 22. The conversion processing unit 22 generates a reference list in the same manner as described in Embodiment 1.

The various processes executed by the image reading apparatus 200 are controlled by a control unit (a computer including a processor such as a CPU or DSP), not shown, provided in the image reading apparatus 200.

In this embodiment, the image reading apparatus 200 is not limited to a scanner and may be, for example, a digital still camera, a document camera, or an electronic device equipped with a camera (for example, a mobile phone, smartphone, or tablet terminal). Such a camera or camera-equipped electronic device may be configured to analyze the structure of the document image itself and display it in reflow-type or fixed-type form on its own display unit.

[Embodiment 3: Document image processing apparatus via a network]
The above describes examples in which the document image processing apparatus according to the present invention is applied to the image processing apparatus 1 of the image forming apparatus 100 or the image processing apparatus 1a of the image reading apparatus 200, but the invention is not limited to these. The document image processing apparatus may, for example, be applied to a server apparatus. One example configuration is a server apparatus (information processing apparatus) comprising: a receiving device that receives, via a network, a document image that has been read and subjected to various image processing by an image forming apparatus or image reading apparatus; a document image processing apparatus that executes the processing of the conversion processing unit 22 described in Embodiment 1; and a transmission device that transmits, via the network, the files (document image and reference list) output from the document image processing apparatus.

With the server apparatus configured in this way, a document image read and processed by an image forming apparatus or image reading apparatus can be received via the network, a reference list can be generated (format conversion applied) by the document image processing apparatus that executes the processing of the conversion processing unit 22, and the output file can be sent to the user's terminal device (for example, a smartphone or tablet terminal). The server apparatus also makes the format conversion function available without replacing image forming apparatuses or image reading apparatuses that are already installed. Furthermore, by storing the converted file on the server apparatus, the user can receive and view it whenever desired.

Alternatively, the document image processing apparatus according to the present invention may be applied to a communication terminal device such as a mobile phone, smartphone, tablet terminal, or dedicated e-book reader. One example configuration is a communication terminal device (information processing apparatus) comprising: a receiving unit that receives, via a network, a document image that has been read and subjected to various image processing by an image forming apparatus or image reading apparatus; the conversion processing unit 22 described in Embodiment 1; and the display device described in Embodiment 1. Such a communication terminal device can receive a digitized document image, analyze the structure of the received document image, and display it in reflow-type or fixed-type form.

The document image received by the receiving device of the server apparatus or by the communication terminal device need not be one generated by an image forming apparatus, image reading apparatus, or the like. For example, a digitized version of a structured document file, such as a Word file or PDF file, may be received as the document image.

[Embodiment 4: Recording medium and program]
The image processing apparatuses 1 and 1a (in particular, the conversion processing units 22 and 22a), the server apparatus (in particular, the document image processing apparatus), and the communication terminal device (in particular, the conversion processing unit) described above may be realized by logic circuits (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU.

In the latter case, the image processing apparatuses 1 and 1a, the server apparatus, and the communication terminal device each include a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and so on. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. The recording medium may be a "non-transitory tangible medium" such as a tape, disk, card, semiconductor memory, or programmable logic circuit. The program may also be supplied to the computer via any transmission medium capable of transmitting it (a communication network, broadcast wave, or the like). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

Although the document image processing apparatus and document image processing method described above handle color image data, they are not limited to this and may handle monochrome image data.

The present invention is not limited to the embodiments described above, and various modifications are possible. That is, embodiments obtained by combining technical means modified as appropriate without departing from the gist of the present invention are also included in the technical scope of the present invention.

[Summary]
A document image processing apparatus (conversion processing unit 22) according to aspect 1 of the present invention is a document image processing apparatus that reconstructs a document image obtained by digitizing a document, and includes: a structure analysis unit (line analysis processing unit 31, line block analysis processing unit 32, layout analysis processing unit 33) that performs structural analysis of the document image; a conversion determination unit (conversion possibility determination processing unit 34) that determines, based on feature quantities of the character strings, figures, or tables extracted from the document image by the structural analysis, whether to reconstruct the elements, namely the characters, figures, and/or tables contained in the document image; and a reference list generation unit (rearrangement processing unit 35) that, when the conversion determination unit determines that the elements of the document image are to be reconstructed, generates, based on the analysis result of the structure analysis unit, a reference list describing the order of the elements when the document image is reconstructed.

With this configuration, whether to reconstruct the elements contained in the document image can be determined based on the feature quantities of the character strings, figures, or tables extracted from the document image by the structural analysis. Because the document image processing apparatus itself decides whether to reconstruct the elements, that is, to use a reflow-type file format, or not to reconstruct them, that is, to use a fixed-type (fixed-layout) file format, display in the most suitable file format is always possible. In addition, generating the reference list, which describes the order of the elements when the document image is reconstructed based on the analysis result of the structure analysis unit, produces the information needed to put the document image into a reflow-type file format.

In a document image processing apparatus according to aspect 2 of the present invention, in aspect 1, the conversion determination unit determines not to reconstruct the document image when the writing direction of the document in the document image cannot be determined from the character strings extracted by the structural analysis.

If vertically written lines and horizontally written lines are mixed in the document image in a proportion that cannot be ignored, the direction of the document image as a whole cannot be determined uniquely, and during layout analysis it is also difficult to decide how the vertical and horizontal lines should be connected. Correct conversion to the reflow type therefore becomes difficult. Having the conversion determination unit make the above determination prevents the error of an incorrect reflow-type conversion.

In a document image processing apparatus according to aspect 3 of the present invention, in aspect 1, the conversion determination unit determines not to reconstruct the document image when the structural analysis determines that the document image consists only of figures and/or tables.

Even when a figure and/or table contains characters, its content often becomes incomprehensible unless it is displayed as it is, with the layout intact. Having the conversion determination unit make the above determination prevents a document image consisting only of figures and/or tables from being mistakenly converted to the reflow type and its content becoming unintelligible.

In a document image processing apparatus according to aspect 4 of the present invention, in aspect 1 above, the conversion determination unit determines not to reconstruct the document image when the number of character strings extracted from the document image by the structural analysis is equal to or less than a predetermined first threshold.

When there are few character strings, the text is considered too short to be worth converting to the reflow type for reading. With the above configuration, the conversion determination unit makes this determination, which prevents needless conversion to the reflow type.
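The figure/table-only check of aspect 3 and the string-count check of aspect 4 can be sketched together as follows. The `Block` type, its `kind` labels, the function name `skip_reason`, and the threshold value 3 are hypothetical; the specification does not give a concrete value for the first threshold.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # hypothetical labels: "text", "figure", or "table"


def skip_reason(blocks, first_threshold=3):
    """Return why reflow conversion should be skipped, or None to proceed."""
    text_count = sum(1 for b in blocks if b.kind == "text")
    if blocks and text_count == 0:
        return "figures_or_tables_only"  # aspect 3: layout must be preserved
    if text_count <= first_threshold:
        return "too_few_strings"  # aspect 4: not enough text to justify reflow
    return None
```

Returning a reason string rather than a bare boolean is a design choice of this sketch; it makes it easier to log which condition blocked the conversion.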

In a document image processing apparatus according to aspect 5 of the present invention, in aspect 1 above, the conversion determination unit determines not to reconstruct the document image when the height or width of the character strings extracted from the document image by the structural analysis is equal to or greater than a predetermined second threshold.

When the characters are sufficiently large, they are considered to remain readable even when displayed at reduced size, so conversion to the reflow type is unnecessary. With the above configuration, the conversion determination unit makes this determination, which prevents needless conversion to the reflow type.
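A minimal sketch of the character-size check follows. It takes the height per character for horizontally written lines and the width per character for vertically written lines, as in the claims. Aggregating the per-line sizes with a median, the tuple layout of `lines`, the function name, and the value 40.0 pixels are assumptions of this sketch, not details given in the specification.

```python
from statistics import median


def characters_large_enough(lines, second_threshold=40.0):
    """Return True if characters are large enough that reflow is unnecessary.

    lines: iterable of (direction, char_height, char_width) tuples in pixels.
    """
    # Height matters for horizontal lines, width for vertical lines.
    sizes = [h if d == "horizontal" else w for d, h, w in lines]
    if not sizes:
        return False  # no text lines, so this check does not apply
    return median(sizes) >= second_threshold
```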

In a document image processing apparatus according to aspect 6 of the present invention, in aspect 1 above, the conversion determination unit determines not to reconstruct the document image when the structural analysis finds that the positions of the blocks extracted from the document image, each consisting of a plurality of character strings or of a figure or table, are irregular.

When the positions of blocks consisting of character strings or of figures or tables are not aligned within a certain range, the document image is unlikely to have an orderly, column-based layout. Connection of line blocks is then likely to fail, joining the wrong line blocks to each other. With the above configuration, the conversion determination unit makes this determination, which prevents the error of an incorrect reflow conversion.
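The alignment check can be sketched as below under a strong simplifying assumption: a single-column, horizontally written page, where misalignment is measured as the largest deviation of any block's left edge from the median left edge. The function name, the choice of left edges, the median reference, and the value 20.0 pixels are all hypothetical; the specification only states that the magnitude of the positional shift between blocks is compared against a third threshold.

```python
from statistics import median


def layout_is_irregular(left_edges, third_threshold=20.0):
    """Return True if block left edges deviate too far from a common alignment.

    left_edges: x-coordinates (pixels) of the left edge of each block,
    assuming a single-column, horizontally written page.
    """
    if len(left_edges) < 2:
        return False  # a single block cannot be misaligned with itself
    reference = median(left_edges)
    max_shift = max(abs(x - reference) for x in left_edges)
    return max_shift > third_threshold
```

A multi-column page would need the edges grouped per column before a comparison of this kind is meaningful.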

An information processing apparatus according to aspect 7 of the present invention comprises: the document image processing apparatus according to any one of aspects 1 to 6 above; a format conversion processing unit that format-converts a document image that the conversion determination unit has determined not to reconstruct so that it can be displayed in the fixed type; and a transmission device that, for a document image that the conversion determination unit has determined to reconstruct, transmits the reference list of the document image and the image data of each element included in the document image, and, for a document image that the conversion determination unit has determined not to reconstruct, transmits the format-converted document image.

The information processing apparatus may be, for example, an image forming apparatus, an image reading apparatus, or a server apparatus. The image reading apparatus may be a scanner, a digital still camera, a document camera, or an electronic device equipped with a camera (for example, a mobile phone, a smartphone, or a tablet terminal). When the information processing apparatus is, for example, an image forming apparatus, it applies the conversion processing to the read image data to reconstruct the image data and transmits the reconstructed image data to a receiving device (for example, a tablet terminal) at a specified address. The receiving device can then present the image so that it can be browsed by scrolling in one direction only (the direction orthogonal to the description direction of the document).
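The routing performed by the transmission device of aspect 7 can be illustrated with the following sketch. The dictionary keys (`reflow`, `reference_list`, `elements`, `fixed_image`) and the payload shape are hypothetical names chosen for illustration; the specification does not define a wire format.

```python
def build_payload(page):
    """Choose what to send for one page, based on the conversion decision.

    page: dict with hypothetical keys "reflow" (bool), "reference_list",
    "elements" (per-element image data), and "fixed_image".
    """
    if page["reflow"]:
        # Reconstructed page: send the ordering plus the element images.
        return {
            "type": "reflow",
            "reference_list": page["reference_list"],
            "elements": page["elements"],
        }
    # Page left as is: send the format-converted fixed-type image.
    return {"type": "fixed", "image": page["fixed_image"]}
```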

A display device according to aspect 8 of the present invention is a display device that displays document images received from the information processing apparatus of aspect 7 above. Based on the reference list of a document image that the conversion determination unit has determined to reconstruct and on the image data of each element included in that document image, the display device displays that document image in the reflow type, and it displays the format-converted document image in the fixed type.

With the above configuration, a document image that the conversion determination unit has determined to reconstruct can be displayed in the reflow type, and the format-converted document image can be displayed in the fixed type, so that a document in which reflow-type pages and fixed-type pages are mixed can be displayed.

A display device according to aspect 9 of the present invention is the display device of aspect 8 above, which further receives, from the information processing apparatus of aspect 7 above, a document image obtained by format-converting, so as to enable fixed-type display, a document image that the conversion determination unit has determined to reconstruct, and which comprises a switching unit that switches the display of the document image determined to be reconstructed between the reflow type and the fixed type.

With the above configuration, the user can switch the display of a document image that the conversion determination unit has determined to reconstruct between the reflow type and the fixed type.
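The behavior of the switching unit can be sketched as a small state holder. The class name `PageView`, its attributes, and the choice of the reflow type as the initial mode are assumptions of this sketch; the specification only requires that the display can be switched when both representations of a page have been received.

```python
class PageView:
    """Tracks the display mode of one page on the display device."""

    def __init__(self, has_reflow, has_fixed):
        self.has_reflow = has_reflow  # reference list and element images received
        self.has_fixed = has_fixed    # format-converted fixed-type image received
        self.mode = "reflow" if has_reflow else "fixed"

    def toggle(self):
        """Switch between modes; do nothing if only one representation exists."""
        if self.has_reflow and self.has_fixed:
            self.mode = "fixed" if self.mode == "reflow" else "reflow"
        return self.mode
```

A page that was never reconstructed has only the fixed-type image, so `toggle()` leaves it in the fixed type.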

The document image processing apparatus, the information processing apparatus, or the display device may be realized by a computer. In that case, a program that realizes the document image processing apparatus, the information processing apparatus, or the display device on a computer by causing the computer to operate as each of the above units, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

The present invention can be used in, for example, a document image processing apparatus that reconstructs document images.

1, 1a Image processing apparatus
4 Transmission device
22, 22a Conversion processing unit (document image processing apparatus)
31 Line analysis processing unit (structure analysis unit)
31a Character string extraction processing unit
31b Figure/table extraction processing unit
32 Line block analysis processing unit (structure analysis unit)
33 Layout analysis processing unit (structure analysis unit)
34 Conversion feasibility determination processing unit (conversion determination unit)
35 Rearrangement processing unit (reference list generation unit)
37 Column analysis processing unit
38 Ordering processing unit
39 Paragraph analysis processing unit (paragraph analysis unit)
100 Image forming apparatus (information processing apparatus)
200 Image reading apparatus (information processing apparatus)

Claims

A document image processing apparatus comprising:
a structure analysis unit that analyzes the structure of a document image obtained by digitizing a document;
a conversion determination unit that determines whether a feature amount of the character strings extracted from the document image by the structural analysis, the feature amount including at least one of the description direction, the number, the size per character, and the position of the character strings, satisfies a predetermined condition and, when the condition is determined to be satisfied, determines to reconstruct the characters, figures, and/or tables included in the document image as elements so that the character strings in the document image can be displayed wrapped within a display area; and
a reference list generation unit that generates, based on the analysis result of the structure analysis unit, a reference list for reconstruction that describes the order of the elements of the document image,
wherein, when the conversion determination unit determines to reconstruct the elements of the document image, the reference list is generated using the reference list generation unit,
the feature amount is the description direction of the character strings, and the conversion determination unit determines that the predetermined condition is not satisfied, and determines not to reconstruct the document image, when the ratio between vertically written lines and horizontally written lines is larger than a predetermined threshold, and
for a document image that the conversion determination unit has determined to reconstruct, the reference list of the document image and the image data of each element included in the document image are output to a transmission device.

A document image processing apparatus comprising:
a structure analysis unit that analyzes the structure of a document image obtained by digitizing a document;
a conversion determination unit that determines whether a feature amount of the character strings extracted from the document image by the structural analysis, the feature amount including at least one of the description direction, the number, the size per character, and the position of the character strings, satisfies a predetermined condition and, when the condition is determined to be satisfied, determines to reconstruct the characters, figures, and/or tables included in the document image as elements so that the character strings in the document image can be displayed wrapped within a display area; and
a reference list generation unit that generates, based on the analysis result of the structure analysis unit, a reference list for reconstruction that describes the order of the elements of the document image,
wherein, when the conversion determination unit determines to reconstruct the elements of the document image, the reference list is generated using the reference list generation unit,
the feature amount is the height per character of a horizontally written character string line or the width per character of a vertically written character string line, and the conversion determination unit determines that the predetermined condition is not satisfied, and determines not to reconstruct the document image, when the height or the width is equal to or greater than a predetermined second threshold, and
for a document image that the conversion determination unit has determined to reconstruct, the reference list of the document image and the image data of each element included in the document image are output to a transmission device.

A document image processing apparatus comprising:
a structure analysis unit that analyzes the structure of a document image obtained by digitizing a document;
a conversion determination unit that determines whether a feature amount of the character strings extracted from the document image by the structural analysis, the feature amount including at least one of the description direction, the number, the size per character, and the position of the character strings, satisfies a predetermined condition and, when the condition is determined to be satisfied, determines to reconstruct the characters, figures, and/or tables included in the document image as elements so that the character strings in the document image can be displayed wrapped within a display area; and
a reference list generation unit that generates, based on the analysis result of the structure analysis unit, a reference list for reconstruction that describes the order of the elements of the document image,
wherein, when the conversion determination unit determines to reconstruct the elements of the document image, the reference list is generated using the reference list generation unit,
the feature amount is the magnitude of the positional shift between a plurality of blocks, each consisting of a plurality of character strings or of a figure or table, and the conversion determination unit determines that the predetermined condition is not satisfied, and determines not to reconstruct the document image, when the magnitude of the shift is larger than a predetermined third threshold, and
for a document image that the conversion determination unit has determined to reconstruct, the reference list of the document image and the image data of each element included in the document image are output to a transmission device.

An information processing apparatus comprising:
the document image processing apparatus according to any one of claims 1 to 3;
a format conversion processing unit that format-converts a document image that the conversion determination unit has determined not to reconstruct so that it can be displayed in the fixed type; and
a transmission device that, for a document image that the conversion determination unit has determined to reconstruct, transmits the reference list of the document image and the image data of each element included in the document image, and, for a document image that the conversion determination unit has determined not to reconstruct, transmits the format-converted document image.

A display device that displays document images received from the information processing apparatus according to claim 4,
wherein, based on the reference list of a document image that the conversion determination unit has determined to reconstruct and on the image data of each element included in that document image, the display device displays that document image in the reflow type, and displays the format-converted document image in the fixed type.

The display device according to claim 5, which receives, from the information processing apparatus according to claim 4, a document image obtained by format-converting, so as to enable fixed-type display, a document image that the conversion determination unit has determined to reconstruct,
the display device further comprising a switching unit that switches the display of the document image determined to be reconstructed by the conversion determination unit between the reflow type and the fixed type.

A program for operating the document image processing apparatus according to any one of claims 1 to 3, the program causing a computer to function as each of the above units.

A computer-readable recording medium on which the program according to claim 7 is recorded.