JPH04590A

JPH04590A - Method for character recognition

Info

Publication number: JPH04590A
Application number: JP2100831A
Authority: JP
Inventors: Koichi Higuchi (樋口 浩一); Yoshiyuki Yamashita (山下 義征)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1990-04-17
Filing date: 1990-04-17
Publication date: 1992-01-06

Abstract

PURPOSE: To stabilize the extracted features, to eliminate the need for a dictionary covering variations in character slant, and to improve processing speed, device compactness and recognition accuracy. This is done by extracting the average slope of the horizontal and vertical stroke components of a character pattern and determining the division regions from it. CONSTITUTION: A character slope extraction part 8 extracts the individual strokes of the horizontal and vertical sub-patterns supplied by a sub-pattern extraction part 7. It calculates the slope of each stroke from the coordinates of its two end points and averages these slopes to obtain an average slope value for the horizontal sub-pattern and one for the vertical sub-pattern. A marginal distribution extraction part 9 scans the character pattern stored in a pattern register 4 according to these average slope values and extracts the marginal distributions in the horizontal and vertical directions. A division point detection part 10 finds the centroid coordinates of each marginal distribution and determines the division point coordinates along the x and y axes. A feature matrix extraction part 11 divides the circumscribed frame accordingly and extracts, for each region, feature values expressing the amount of character line elements in each sub-pattern. As a result, the extracted features remain stable even for slanted characters.

Description

Detailed Description of the Invention

(Field of Industrial Application) The present invention relates to a character recognition method that recognizes input characters from a character pattern obtained by photoelectrically converting characters on a medium. In particular, it relates to a character recognition method that is fast and highly accurate.

(Prior Art) A conventional character recognition method of this type is described in Japanese Unexamined Patent Publication No. Sho 58-123171.

Conventionally, when characters on a medium are recognized, the extracted features vary because character line elements shift. In handwritten characters they shift with the writer, and in printed characters they shift with the font (typeface). A dictionary covering these feature variations therefore had to be prepared. As a result, the dictionary grew, the time needed to match against it also grew, processing speed fell, and the device became larger.

The technique of the above document addresses this as follows. A character pattern is obtained by photoelectrically converting characters on a medium such as a form. The pattern is scanned in the horizontal and vertical directions, the number of black bits on each scanning line is counted, and the distributions of black-bit counts along the X and Y axes set for the character pattern (marginal distributions) are obtained. The technique then finds the centroid coordinates of each marginal distribution and uses them to set division coordinates and division boundaries, which determine the division regions. The area inside the circumscribed rectangular frame of the character pattern is divided along these boundaries. As a result, the division boundaries, and hence the division regions, follow horizontal and vertical shifts of the character line elements.

Sub-patterns representing the line elements in the horizontal, vertical, right-diagonal and left-diagonal directions are extracted from the character pattern. A feature matrix is then extracted whose elements are the amounts of character line elements of each sub-pattern in each division region, and this matrix is matched against a dictionary prepared in advance. In this way, feature variations such as stroke position shifts caused by individual writing differences are absorbed, the dictionary can be made smaller, and processing speed is increased. The method was therefore highly effective against horizontal and vertical shifts of character line elements.
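The prior-art step just described can be sketched as follows. Black bits are counted per column and per row to give the marginal distributions along the X and Y axes, and the centroid of each distribution is then computed. The list-of-rows pattern representation and the function names are assumptions made for illustration. How the cited document turns the centroids into division coordinates is not stated here, so the sketch stops at the centroids.

```python
from typing import List, Tuple

Pattern = List[List[int]]  # rows of 0 (white) / 1 (black); row index = y, column index = x

def marginal_distributions(p: Pattern) -> Tuple[List[int], List[int]]:
    """Return (SX, SY): black-bit counts per column (x) and per row (y)."""
    height, width = len(p), len(p[0])
    sx = [sum(p[y][x] for y in range(height)) for x in range(width)]
    sy = [sum(row) for row in p]
    return sx, sy

def centroid(dist: List[int]) -> float:
    """Centroid (weighted mean position) of a one-dimensional distribution."""
    total = sum(dist)
    if total == 0:
        raise ValueError("distribution contains no black bits")
    return sum(i * v for i, v in enumerate(dist)) / total

if __name__ == "__main__":
    # A small "L" shape: a vertical bar at x=1 and a foot along the bottom row.
    pattern = [
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 1, 1],
    ]
    sx, sy = marginal_distributions(pattern)
    print(sx, sy)                      # [0, 4, 1, 1] [1, 1, 1, 3]
    print(centroid(sx), centroid(sy))  # 1.5 2.0
```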

(Problems to Be Solved by the Invention) However, the above character recognition method has the following problem.

Figs. 2(a), 2(b), 2(c) and 2(d) illustrate the conventional character pattern division method. Fig. 2(a) shows an upright (unslanted) character pattern "0" and Fig. 2(b) shows an italic "0". Fig. 2(c) is the marginal distribution of Fig. 2(a), and Fig. 2(d) is the marginal distribution of Fig. 2(b).

In the conventional character recognition method, the input character pattern is scanned horizontally and vertically to obtain the marginal distribution SXY. The area inside the circumscribed frame is then divided into a plurality of regions according to the division point coordinates D and division boundaries S set from that marginal distribution.

Problems arise when vertical division boundaries S are set for an italic character whose vertical line elements are slanted, as in Fig. 2(b). The results then differ from those for an unslanted character, as in Fig. 2(a), in two ways. First, the division point coordinates D(1) to D(5) differ from D(11) to D(15). Second, for the unslanted character the division boundaries S run parallel to the vertical line elements, whereas for the italic character they cut diagonally across the character line elements. Consequently, although the character is the same, the marginal distributions SXY(1) and SXY(2) differ greatly. The extracted features therefore also differ greatly, and recognition accuracy falls.

Conventional methods offer no way to divide italic characters with slanted vertical line elements appropriately. A dictionary covering many possible deformations therefore had to be prepared to absorb these feature variations. As a result, the dictionary grew, matching took longer, and both hardware (device) scale and processing time increased, so no technically satisfactory method had yet been achieved.

The same problem arises for characters whose horizontal line elements are slanted, such as handwritten characters that rise toward the right.

The present invention provides a character recognition method that solves these problems of the prior art, namely increased processing time and increased device size.

(Means for Solving the Problems) To solve the above problems, the present invention takes the following measures in a character recognition method that works in three steps. First, a character pattern obtained by photoelectrically converting characters on a medium is divided into a plurality of regions. Second, features are extracted for each divided region. Third, the characters are recognized by matching the extracted features against a dictionary prepared in advance.

That is, the horizontal stroke component, the vertical stroke component, or both are extracted from the character pattern. For each stroke component, the slopes of its individual strokes are averaged with each stroke's length as its weight, and this weighted average is taken as the average slope of that stroke component. The character pattern is scanned along the average slope to extract its marginal distributions, and division points (division point coordinates) are detected from the marginal distributions. The character pattern is then divided into a plurality of regions based on the division points and the average slope.

(Operation) With the character recognition method configured as above, an input character pattern is processed as follows. Its horizontal stroke component, its vertical stroke component, or both are extracted. The average slope of the strokes is then obtained separately from the horizontal stroke component and from the vertical stroke component.

Here, the average slope of each stroke component is the weighted average of the slopes of the strokes it contains, with each stroke's length as its weight. Long strokes usually reflect the slant of a character well, whereas the slopes of short strokes scatter widely around it. Computing the average slope as a length-weighted average therefore lets the slopes of long strokes dominate and suppresses the influence of short strokes.

The character pattern is scanned along the average slope obtained in this way to extract the marginal distributions. The division point coordinates are determined from the marginal distributions, and the character pattern is divided into a plurality of regions based on those coordinates and the average slope. Features are then extracted for each divided region, and the input character is recognized by matching those features against the dictionary.
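A small worked example with made-up numbers shows why the length weighting matters. Suppose a slanted character yields two long vertical strokes of length 20 with slope 0.25, and one short stroke of length 4 with slope −0.5. Here slope means horizontal displacement per unit of vertical travel.

$$\bar\theta_{\text{unweighted}} = \frac{0.25 + 0.25 - 0.5}{3} = 0, \qquad \bar\theta_{\text{weighted}} = \frac{20(0.25) + 20(0.25) + 4(-0.5)}{20 + 20 + 4} = \frac{8}{44} \approx 0.18$$

The unweighted mean reports no slant at all, while the length-weighted mean stays much closer to the 0.25 slope of the long strokes.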

Because the division regions are determined so as to follow the slant of the character pattern, the extracted features remain stable even for character patterns whose vertical or horizontal line elements are slanted. No dictionary covering slant deformations therefore needs to be prepared. This increases processing speed, allows a smaller device, and improves recognition accuracy, which solves the problems described above.

(Embodiment) Fig. 1 shows an embodiment of the present invention. It is a block diagram of a character recognition device used to explain the character recognition method.

This character recognition device has a photoelectric conversion unit 1, which converts an optical signal IN of a character image on a medium such as a form into a quantized electrical (digital) signal. A line buffer 2 is connected to the output side of unit 1. The line buffer 2 is, for example, 2048 bits wide by 128 bits high and stores the digital signal of one line of character images. A pattern register 4 is connected to the output side of the line buffer 2 via a character segmentation unit 3.

The character segmentation unit 3 extracts the digital signal of one character (hereinafter called the "character pattern") from the output of the line buffer 2 and stores it in the pattern register 4.

The pattern register 4 has a storage capacity of, for example, 64 × 64 bits. A character frame detection unit 5, a line width measurement unit 6 and a sub-pattern extraction unit 7 are connected to its output side. A character slope extraction unit 8 is connected via the sub-pattern extraction unit 7, and a marginal distribution extraction unit 9 is also connected.

The character frame detection unit 5 scans the character pattern in the pattern register 4 to detect its circumscribed frame, that is, the character frame. It supplies the result to the marginal distribution extraction unit 9 and other units. The line width measurement unit 6 measures the line width of the pattern output from the pattern register 4 and supplies the result to the sub-pattern extraction unit 7.

The sub-pattern extraction unit 7 scans the pattern register 4 in several directions to extract vertical, horizontal, right-diagonal and left-diagonal sub-patterns. It consists of four units:

- a vertical sub-pattern extraction unit 7a;
- a horizontal sub-pattern extraction unit 7b;
- a right-diagonal sub-pattern extraction unit 7c;
- a left-diagonal sub-pattern extraction unit 7d.

Each of the units 7a to 7d has its own memory for storing patterns.

The character slope extraction unit 8 extracts the slopes of the vertical and horizontal sub-patterns produced by the vertical sub-pattern extraction unit 7a and the horizontal sub-pattern extraction unit 7b. It supplies the results to the marginal distribution extraction unit 9.

The marginal distribution extraction unit 9 scans the character pattern stored in the pattern register 4 using the slopes obtained by the character slope extraction unit 8, and extracts its marginal distributions.

A division point detection unit 10 and a feature matrix extraction unit 11 are connected to the output side of the marginal distribution extraction unit 9, and an identification unit 12 is connected to their output side.

The division point detection unit 10 detects the division point coordinates used to divide the area inside the circumscribed frame into a plurality of regions.

The feature matrix extraction unit 11 extracts feature values from each of the vertical, horizontal, right-diagonal and left-diagonal sub-patterns output by the sub-pattern extraction unit 7. It builds a feature matrix F(k) from these values and supplies it to the identification unit 12.

The identification unit 12 has a dictionary memory that stores standard-character feature matrices (standard character masks) G(k), together with the names of the standard characters that have those matrices. It matches the feature matrix F(k) extracted by the feature matrix extraction unit 11 against the feature matrices G(k) in the dictionary memory. From this, it recognizes the character or figure in the circumscribed-frame region from which F(k) was obtained and outputs the character name OUT.
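The matching criterion used by the identification unit is not specified in this part of the description. As a hedged illustration only, the sketch below picks the dictionary entry whose feature matrix is nearest to F under a city-block (L1) distance. The distance measure, the flattened-list representation of F(k) and G(k), and the name `identify` are all assumptions.

```python
from typing import Dict, List

def identify(f: List[float], dictionary: Dict[str, List[float]]) -> str:
    """Return the character name whose standard feature matrix G is nearest to F.

    Assumption: F(k) and every G(k) are flattened to lists of equal length and
    compared with the city-block (L1) distance.
    """
    if any(len(g) != len(f) for g in dictionary.values()):
        raise ValueError("feature matrices must have the same size")

    def l1(a: List[float], b: List[float]) -> float:
        return sum(abs(x - y) for x, y in zip(a, b))

    # The entry with the smallest distance is taken as the recognized character.
    return min(dictionary, key=lambda name: l1(f, dictionary[name]))

if __name__ == "__main__":
    masks = {"0": [4, 0, 4, 0], "1": [0, 8, 0, 0]}
    print(identify([3, 1, 4, 0], masks))  # 0
```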

The following describes the character recognition method using the device configured as above, covering processes (I) to (IX) of the individual functional blocks.

(I) Character Pattern Generation Processing
When the optical signal IN of a character image written on a form is input to the photoelectric conversion unit 1, the unit converts it into a binary digital signal. Character line portions become "1" (called "black bits") and background portions become "0" (called "white bits"). The digital signal for one line of character images converted by the photoelectric conversion unit 1 is stored in the line buffer 2.

The character segmentation unit 3 cuts out the digital signal of one character (the character pattern) from the character-image signal stored in the line buffer 2 and stores it in the pattern register 4. In this embodiment, the form format is specified in advance, and the character segmentation unit 3 has a memory holding addresses that indicate the character positions within the line buffer 2. Character segmentation is therefore performed by reading out the contents of the line buffer 2 at the specified addresses.

(II) Character Frame Detection Processing
The character frame detection unit 5 scans the pattern in the pattern register 4 and detects its left-end coordinate XL, right-end coordinate XR, top coordinate YT and bottom coordinate YB. The circumscribed frame, that is, the character frame, is the rectangle joining the four points (XL, YT), (XL, YB), (XR, YT) and (XR, YB).

After the character frame is detected, the frame sizes needed to normalize the feature values are calculated. The y axis runs from YT at the top of the frame to YB at the bottom. The size of the character frame parallel to the X axis of the pattern register 4 (horizontal direction) is WPh = XR − XL + 1. The size perpendicular to it (vertical direction) is WPv = YB − YT + 1. The sizes of the character frame in the right-diagonal and left-diagonal 45° directions, WPr and WPl, are calculated as

$$WP_r = WP_l = \frac{WP_h + WP_v}{2}.$$

These results are supplied to the marginal distribution extraction unit 9 and the feature matrix extraction unit 11.
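The frame detection and size calculation in (II) can be sketched as follows. The sketch assumes a list-of-rows binary pattern with y increasing downward. It reads the diagonal frame size formula in the text as (WPh + WPv)/2. The function names are illustrative only.

```python
from typing import List, Tuple

Pattern = List[List[int]]  # rows of 0/1; row index = y (increasing downward), column index = x

def character_frame(p: Pattern) -> Tuple[int, int, int, int]:
    """Return (XL, XR, YT, YB), the circumscribed frame of the black bits."""
    xs = [x for row in p for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(p) if any(row)]
    if not xs:
        raise ValueError("pattern contains no black bits")
    return min(xs), max(xs), min(ys), max(ys)

def frame_sizes(xl: int, xr: int, yt: int, yb: int) -> Tuple[int, int, float, float]:
    """Return (WPh, WPv, WPr, WPl) used to normalize the feature values."""
    wph = xr - xl + 1
    wpv = yb - yt + 1
    wpr = wpl = (wph + wpv) / 2  # right- and left-diagonal 45-degree frame sizes
    return wph, wpv, wpr, wpl

if __name__ == "__main__":
    pattern = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    frame = character_frame(pattern)
    print(frame, frame_sizes(*frame))  # (1, 2, 1, 2) (2, 2, 2.0, 2.0)
```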

(III) Line Width Measurement Processing
The line width measurement unit 6 receives the digital signal from the pattern register 4. It counts the total number A of black bits, and the number Q of positions at which every point of a window (for example, 2 × 2) is a black bit. It then calculates the line width WL using the following well-known equation (1):

$$WL = \frac{A}{A - Q} \qquad (1)$$
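The line width measurement in (III) can be sketched as follows, using the 2 × 2 window given as an example in the text. The list-of-rows pattern representation and the function name are assumptions.

```python
from typing import List

Pattern = List[List[int]]  # rows of 0/1; row index = y, column index = x

def line_width(p: Pattern) -> float:
    """Estimate the stroke width WL = A / (A - Q), equation (1).

    A: total number of black bits.
    Q: number of 2x2 window positions in which all four bits are black.
    """
    h, w = len(p), len(p[0])
    a = sum(sum(row) for row in p)
    q = sum(
        1
        for y in range(h - 1)
        for x in range(w - 1)
        if p[y][x] and p[y][x + 1] and p[y + 1][x] and p[y + 1][x + 1]
    )
    if a == q:  # only possible for an empty pattern (A = Q = 0)
        return 0.0
    return a / (a - q)
```

For a straight stroke of width w and length L, A = wL and Q = (w − 1)(L − 1), so WL = wL / (L + w − 1). This value approaches w as the stroke gets longer. For example, a 3-bit-wide, 10-bit-long bar gives 30/12 = 2.5.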
(IV) Sub-pattern Extraction Processing
The sub-pattern extraction unit 7 contains four extraction units: 7a (vertical), 7b (horizontal), 7c (right-diagonal) and 7d (left-diagonal). Each scans the pattern register 4 along one main scanning direction, defined relative to the X axis set on the pattern register 4:

- vertical: perpendicular to the X axis;
- horizontal: parallel to the X axis;
- right-diagonal 45°: 45° counterclockwise from the X axis;
- left-diagonal 45°: 45° clockwise from the X axis.

From these scans they extract the vertical, horizontal, right-diagonal and left-diagonal sub-patterns, one for each main scanning direction.

That is, the vertical sub-pattern extraction unit 7a scans the entire original pattern with the vertical direction as the main scanning direction and detects runs of consecutive black bits (black runs) on each vertical scanning line. From the detected black runs, it extracts those whose length ℓ satisfies

$$\ell \ge N \cdot WL \qquad (2)$$

where ℓ is the length of a black run in the main scanning direction and N is an arbitrary constant for each sub-pattern (for example, 2). The vertical sub-pattern extraction unit 7a treats the black runs that satisfy equation (2) as the black runs forming the sub-pattern and stores them in a vertical sub-pattern memory (not shown). Black runs that do not satisfy equation (2) are treated as white bits.

Similarly, the horizontal, right-diagonal and left-diagonal sub-pattern extraction units 7b, 7c and 7d scan the original pattern along the horizontal, right-diagonal and left-diagonal directions, respectively. From the black runs on each scanning line they extract those satisfying equation (2). The extracted black runs form the respective sub-patterns and are stored in horizontal, right-diagonal and left-diagonal sub-pattern memories (not shown).
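The run-length filter of equation (2) can be sketched for all four scanning directions with one helper. The mapping of "right-diagonal" and "left-diagonal" to step vectors in image coordinates (y increasing downward) is an assumption, as are the names used.

```python
from typing import Dict, List, Tuple

Pattern = List[List[int]]  # rows of 0 (white) / 1 (black); row index = y (down), column = x
Point = Tuple[int, int]

# Main scanning directions as (dx, dy) unit steps in image coordinates (y increasing downward).
# Assumption: "right-diagonal" (45 deg counterclockwise from X) points up and to the right.
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "vertical": (0, 1),
    "horizontal": (1, 0),
    "right_diagonal": (1, -1),
    "left_diagonal": (1, 1),
}

def _keep_if_long(run: List[Point], out: Pattern, min_len: float) -> None:
    """Copy a finished black run into the sub-pattern if it satisfies l >= N * WL."""
    if run and len(run) >= min_len:
        for x, y in run:
            out[y][x] = 1

def extract_subpattern(p: Pattern, step: Tuple[int, int], min_len: float) -> Pattern:
    """Return the sub-pattern formed by black runs along `step` that satisfy equation (2)."""
    h, w = len(p), len(p[0])
    dx, dy = step
    out = [[0] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            # A scanning line starts only where the previous point lies off the pattern.
            if 0 <= x0 - dx < w and 0 <= y0 - dy < h:
                continue
            run: List[Point] = []
            x, y = x0, y0
            while 0 <= x < w and 0 <= y < h:
                if p[y][x]:
                    run.append((x, y))
                else:
                    _keep_if_long(run, out, min_len)  # a white bit ends the black run
                    run = []
                x, y = x + dx, y + dy
            _keep_if_long(run, out, min_len)  # a run that reaches the pattern edge
    return out
```

With N = 2 as in the text, the vertical sub-pattern would be obtained as `extract_subpattern(pattern, DIRECTIONS["vertical"], 2 * wl)`, where `wl` is the line width from equation (1). Short runs, such as the one-bit-wide horizontal slices of a vertical bar, are dropped from the other sub-patterns.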

Figs. 3(a) to 3(e) show typical examples of a character pattern and the sub-patterns extracted from it in this way. Fig. 3(a) shows the original pattern, a binary pattern inside its circumscribed character frame. The sub-patterns obtained from it are shown as follows:

- Fig. 3(b): vertical sub-pattern (VSP);
- Fig. 3(c): horizontal sub-pattern (HSP);
- Fig. 3(d): right-diagonal sub-pattern (RSP);
- Fig. 3(e): left-diagonal sub-pattern (LSP).

(V) Character Slope Extraction Processing
For each of the vertical and horizontal sub-patterns obtained by the sub-pattern extraction unit 7, the character slope extraction unit 8 extracts the character line element components of the sub-pattern (hereinafter called "strokes"). It then calculates the slope of each extracted stroke from the coordinates of its two end points and averages these slopes. This gives the average slope θv of the vertical stroke component from the vertical sub-pattern, and the average slope θh of the horizontal stroke component from the horizontal sub-pattern.

A specific method of extracting the average slopes θv and θh is described next.

First, the entire vertical sub-pattern is scanned horizontally to detect change points from white bits to black bits and from black bits to white bits. The coordinates of both ends of each stroke in the vertical direction (vertical stroke) are then extracted by relating the change-point coordinates on the previous scanning line to those on the current scanning line.
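The text does not spell out how change points on consecutive lines are related. The sketch below is therefore one possible reading, not the patent's own procedure. A black run on the current line is taken to continue a stroke when its x range overlaps, or touches diagonally, that stroke's run on the previous line. Each stroke end point is taken at the centre of the stroke's first or last run. Both rules and all names are assumptions.

```python
from typing import List, Tuple

Pattern = List[List[int]]
Run = Tuple[int, int]               # (first_x, last_x) of a black run on one scan line
EndPoint = Tuple[float, int]        # (x, y); x may be a run centre such as 2.5
Stroke = Tuple[EndPoint, EndPoint]  # (start, end), start y <= end y

def row_runs(row: List[int]) -> List[Run]:
    """Black runs on one horizontal scan line, from white->black and black->white changes."""
    runs: List[Run] = []
    start = None
    for x, v in enumerate(row + [0]):   # trailing white bit closes a run at the right edge
        if v and start is None:
            start = x                    # white -> black change point
        elif not v and start is not None:
            runs.append((start, x - 1))  # black -> white change point
            start = None
    return runs

def _centre(run: Run) -> float:
    return (run[0] + run[1]) / 2

def vertical_strokes(vsp: Pattern) -> List[Stroke]:
    """Trace vertical strokes in a vertical sub-pattern by linking runs line by line."""
    done: List[Stroke] = []
    active: List[Tuple[EndPoint, Run, int]] = []  # (start point, last run, last y)
    for y, row in enumerate(vsp):
        continued: List[Tuple[EndPoint, Run, int]] = []
        for run in row_runs(row):
            match = next((s for s in active
                          if s[1][0] - 1 <= run[1] and run[0] <= s[1][1] + 1), None)
            if match is not None:
                active.remove(match)
                continued.append((match[0], run, y))
            else:
                continued.append(((_centre(run), y), run, y))  # a new stroke starts here
        # Strokes not continued on this line end at their last run.
        done.extend((s[0], (_centre(s[1]), s[2])) for s in active)
        active = continued
    done.extend((s[0], (_centre(s[1]), s[2])) for s in active)
    return done
```

Horizontal strokes can be traced the same way by applying the function to the transposed horizontal sub-pattern.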

Let the two ends of the i-th extracted vertical stroke have coordinates (VXS_i, VYS_i) and (VXE_i, VYE_i). The average slope θv of the vertical stroke component is then calculated by equation (3). Here i = 1, ..., Pv, where Pv is the number of strokes extracted from the vertical sub-pattern, and VYS_i < VYE_i.

$$\theta_v = \frac{\sum_{i=1}^{P_v} VLG_i \cdot \dfrac{VXE_i - VXS_i}{VYE_i - VYS_i}}{\sum_{i=1}^{P_v} VLG_i} \qquad (3)$$

Here, VLG_i is obtained from equation (4):

$$VLG_i = \max\left(|VXE_i - VXS_i|,\ |VYE_i - VYS_i|\right) + \frac{1}{2}\min\left(|VXE_i - VXS_i|,\ |VYE_i - VYS_i|\right) \qquad (4)$$

Equation (4) approximates the distance between the two points (VXS_i, VYS_i) and (VXE_i, VYE_i), that is, the length of the vertical stroke. It takes the larger of the horizontal and vertical coordinate differences between the two points and adds half of the smaller one.

The average slope θh of the horizontal stroke component is extracted from the horizontal sub-pattern as follows.

The horizontal sub-pattern is scanned vertically, and the coordinates of both ends of each stroke in the horizontal direction (horizontal stroke) are extracted. Let these end coordinates be (HXS_j, HYS_j) and (HXE_j, HYE_j). The average slope θh of the horizontal stroke component is then calculated by equation (5). Here j = 1, ..., Ph, where Ph is the number of strokes extracted from the horizontal sub-pattern, and HXS_j < HXE_j.

$$\theta_h = \frac{\sum_{j=1}^{P_h} HLG_j \cdot \dfrac{HYE_j - HYS_j}{HXE_j - HXS_j}}{\sum_{j=1}^{P_h} HLG_j} \qquad (5)$$

Here, HLG_j is obtained from equation (6):

$$HLG_j = \max\left(|HXE_j - HXS_j|,\ |HYE_j - HYS_j|\right) + \frac{1}{2}\min\left(|HXE_j - HXS_j|,\ |HYE_j - HYS_j|\right) \qquad (6)$$

When the number of strokes is 0, the average slope is set to 0. That is, θv = 0 when Pv = 0, and θh = 0 when Ph = 0.
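Equations (3) to (6) can be sketched as one function. The slope of each stroke is taken as dx/dy for vertical strokes and dy/dx for horizontal strokes, matching how θv and θh are used in the scanning paths of (VI). Skipping degenerate strokes (zero extent along the main direction) is an assumption added to avoid division by zero.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = Tuple[Point, Point]  # ((xs, ys), (xe, ye))

def approx_length(dx: float, dy: float) -> float:
    """Equations (4)/(6): the larger coordinate difference plus half the smaller one."""
    a, b = abs(dx), abs(dy)
    return max(a, b) + min(a, b) / 2

def average_slope(strokes: List[Stroke], vertical: bool) -> float:
    """Length-weighted average slope of a stroke component.

    vertical=True : theta_v, each stroke's slope is dx/dy (equation (3)).
    vertical=False: theta_h, each stroke's slope is dy/dx (equation (5)).
    Returns 0.0 when there are no usable strokes (Pv = 0 or Ph = 0).
    """
    num = den = 0.0
    for (xs, ys), (xe, ye) in strokes:
        dx, dy = xe - xs, ye - ys
        along = dy if vertical else dx
        if along == 0:
            continue  # degenerate stroke, skipped (assumption)
        slope = (dx if vertical else dy) / along
        weight = approx_length(dx, dy)
        num += weight * slope
        den += weight
    return num / den if den else 0.0
```

The approximation in equations (4) and (6) never underestimates the true Euclidean length. It overestimates by at most about 12%, at a 2:1 ratio of coordinate differences. For example, differences of 3 and 4 give 5.5 against an exact length of 5.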

(VI) Marginal Distribution Extraction Processing
The marginal distribution extraction unit 9 scans the character pattern stored in the pattern register 4 and extracts its marginal distributions. The extraction method is described below with reference to Figs. 4(a) and 4(b).

Figs. 4(a) and 4(b) illustrate how the division regions of a slanted character pattern are determined. Fig. 4(a) shows an example character pattern, the numeral "0". Fig. 4(b) shows the marginal distribution extracted from the character pattern in Fig. 4(a).

In Fig. 4(a), arrow P indicates the scanning path when vertical scanning starts from the start address (xa, YT).

In Fig. 4(a), x and y axes are set on the character pattern. In Fig. 4(b), the x axis is plotted horizontally and the marginal distribution SX(x) corresponding to the x-axis direction is plotted vertically.

The marginal distribution extraction unit 9 determines the scanning paths through the character pattern stored in the pattern register 4 from the average slopes θv and θh obtained by the character slope extraction unit 8. It performs vertical and horizontal scanning along these paths, extracting the marginal distribution SX(x) in the x-axis direction and the marginal distribution SY(y) in the y-axis direction.

A specific method of extracting the marginal distributions SX(x) and SY(y) is described next.

For vertical scanning, a scanning start point, for example the start address (xa, YT), is first set on the top side of the character pattern. If scanning proceeds from (xa, YT) in the positive y direction, the coordinates (x_i, y_i) of the scanning path P are determined by equation (7):

$$x_1 = x_a,\quad y_1 = Y_T,\qquad x_j = x_a + \theta_v\,(y_j - Y_T),\quad y_j = y_{j-1} + 1 \qquad (7)$$

Here θv is a real number; in the case of FIG. 4(a) it is set, for example, to θv = 5/20. The fractional part of θv(yj − YT) is truncated, so all coordinate values are integers.
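As a minimal sketch, equation (7) can be written in Python. The function name `vertical_scan_path` is hypothetical, and ending the path at the bottom row of the frame (`y_bottom`) is an assumption, since the text does not state where each scan stops.

```python
def vertical_scan_path(xa, y_top, y_bottom, theta_v):
    """Coordinates (x_j, y_j) of the slanted vertical scanning path of eq. (7).

    x_1 = xa, y_1 = y_top, y_j = y_{j-1} + 1, and
    x_j = xa + theta_v * (y_j - y_top) with the fractional part truncated.
    Stopping at y_bottom is an assumption, not stated in the source.
    """
    path = []
    for y in range(y_top, y_bottom + 1):
        # int() drops the fractional part (truncation toward zero).
        x = xa + int(theta_v * (y - y_top))
        path.append((x, y))
    return path
```

With θv = 5/20, the path shifts one column for every four rows scanned.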

Next, vertical scanning is performed along the scanning path P determined by equation (7), and the number of black bits on P is counted; this gives the value SX(xa) of the marginal distribution for the start point whose x coordinate is xa. The value xa is varied over the range XL ≤ xa ≤ XR. That is, the scanning start point is set in turn to (XL, YT), (XL+1, YT), ..., (XR, YT), and for each start point the number of black bits on the scanning path given by equation (7) is counted. This yields the marginal distribution SX(x), x = XL, ..., XR.
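The following hypothetical sketch counts black bits along each eq. (7) path to build SX(x). It assumes the pattern is a list of rows of 0/1 values indexed as `pattern[y][x]`, that each scan runs from YT to YB, and that pixels falling outside the stored pattern count as white.

```python
def marginal_sx(pattern, x_left, x_right, y_top, y_bottom, theta_v):
    """SX(x) for x = x_left..x_right: black bits along each eq. (7) path.

    pattern[y][x] is 1 for a black bit and 0 for a white bit (assumed layout).
    """
    height, width = len(pattern), len(pattern[0])
    sx = {}
    for xa in range(x_left, x_right + 1):
        count = 0
        for y in range(y_top, y_bottom + 1):
            x = xa + int(theta_v * (y - y_top))  # eq. (7), truncated
            # Assumption: pixels outside the stored pattern count as white.
            if 0 <= x < width and 0 <= y < height:
                count += pattern[y][x]
        sx[xa] = count
    return sx
```

On a small slanted stroke, the slant-aware scan concentrates all eight black bits in one value of SX, whereas an upright scan (θv = 0) splits them across two columns.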

Likewise, the marginal distribution SY(y) in the y-axis direction is extracted using the average slope θh, as follows.

First, when a scanning start point is set on the y axis, for example the start address (XL, ya), the horizontal scanning path is determined by the following equation (8).

$$x_1 = X_L,\quad y_1 = y_a,\qquad x_j = x_{j-1} + 1,\quad y_j = y_a + \theta_h\,(x_j - X_L) \qquad (8)$$

Here θh is a real number; the fractional part of θh(xj − XL) is truncated, so all coordinate values are integers.

In equation (8), the value of ya is varied over the range YT ≤ ya ≤ YB; that is, (XL, YT), ..., (XL, ya), ..., (XL, YB) are set as scanning start points. Counting the number of black bits on the scanning path given by equation (8) for each start point yields the marginal distribution SY(y), y = YT, ..., YB.
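A matching sketch for eq. (8) follows, under the same hypothetical data layout. The example uses a negative θh on the assumption that y grows downward, so a stroke rising to the right has y decreasing as x increases; the text itself does not state the sign convention.

```python
def marginal_sy(pattern, x_left, x_right, y_top, y_bottom, theta_h):
    """SY(y) for y = y_top..y_bottom: black bits along each eq. (8) path.

    pattern[y][x] is 1 for a black bit and 0 for a white bit (assumed layout).
    """
    height, width = len(pattern), len(pattern[0])
    sy = {}
    for ya in range(y_top, y_bottom + 1):
        count = 0
        for x in range(x_left, x_right + 1):
            y = ya + int(theta_h * (x - x_left))  # eq. (8), truncated
            # Assumption: pixels outside the stored pattern count as white.
            if 0 <= x < width and 0 <= y < height:
                count += pattern[y][x]
        sy[ya] = count
    return sy
```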

(VII) Division point determination processing
The division point detection unit 10 determines the centroid coordinates of the marginal distributions SX(x) and SY(y) extracted by the marginal distribution extraction unit 9. For either the entire x-axis (or y-axis) range of the character frame or a sub-range delimited by centroid coordinates already found, it divides the sum of the first moments in that range by the sum of black bits in that range, thereby determining the division point coordinates DX(m) in the x-axis direction and DY(n) in the y-axis direction. Here m and n are centroid coordinate numbers assigned in ascending order of coordinate value, with m = 1, ..., (NX − 1) and n = 1, ..., (NY − 1). In this embodiment, NX = NY = 4.

Next, a specific method for determining the division point coordinates DX(m) and DY(n) is described with reference to FIGS. 4(a) and 4(b).

To obtain the division point coordinates DX(m), the x-axis range XL to XR of the character frame is considered first. The centroid coordinate DX(2), which carries the central centroid coordinate number, is obtained by dividing the sum of the first moments of the marginal distribution SX(x) of the character pattern over that range by the sum of black bits in that range, according to equation (9).

$$DX(2) = \frac{\sum_{x=X_L}^{X_R} x\,S_X(x)}{\sum_{x=X_L}^{X_R} S_X(x)} \qquad (9)$$

Since the right-hand side of equation (9) is a real number, its fractional part is rounded up to make DX(2) an integer.
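Equation (9) with the round-up rule can be sketched as below. The name `centroid_up` is hypothetical, `dist` is assumed to map each coordinate to its black-bit count (for example SX(x)), and the fallback for an all-white range is an assumption, since the text does not cover that case.

```python
def centroid_up(dist, lo, hi):
    """Centroid of a marginal distribution over lo..hi (inclusive), rounded up.

    Sum of first moments divided by the sum of black bits, as in eq. (9).
    """
    total = sum(dist[c] for c in range(lo, hi + 1))
    moment = sum(c * dist[c] for c in range(lo, hi + 1))
    if total == 0:
        # Assumption: an all-white range falls back to its rounded-up midpoint.
        return (lo + hi + 1) // 2
    # Exact integer ceiling division, avoiding floating-point rounding issues.
    return -(-moment // total)
```

For example, two black bits at x = 1 and two at x = 4 give 10 / 4 = 2.5, which rounds up to 3.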

Next, for each of the ranges XL to DX(2) and DX(2) to XR produced by dividing at the centroid coordinate DX(2) found by equation (9), the two centroid coordinates DX(1) and DX(3) are obtained by equations (10) and (11), respectively.

$$DX(1) = \frac{\sum_{x=X_L}^{DX(2)} x\,S_X(x)}{\sum_{x=X_L}^{DX(2)} S_X(x)} \qquad (10)$$

$$DX(3) = \frac{\sum_{x=DX(2)}^{X_R} x\,S_X(x)}{\sum_{x=DX(2)}^{X_R} S_X(x)} \qquad (11)$$

The centroid coordinates DX(1), DX(2) and DX(3) obtained in this way are determined as the target division point coordinates on the x axis.
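The recursive splitting of equations (9) to (11) for NX = 4 can be sketched as follows. The function name is hypothetical; treating both sub-ranges as including DX(2) follows the ranges named in the text, while applying the same round-up rule to DX(1) and DX(3) is an assumption, because the text states rounding only for DX(2). The small centroid helper repeats the eq. (9) rule so that the block runs on its own.

```python
def division_points(dist, lo, hi):
    """Return [DX(1), DX(2), DX(3)] for NX = 4 from a marginal distribution.

    dist maps each coordinate lo..hi to its black-bit count (e.g. SX(x)).
    """
    def centroid(a, b):
        total = sum(dist[c] for c in range(a, b + 1))
        moment = sum(c * dist[c] for c in range(a, b + 1))
        if total == 0:
            return (a + b + 1) // 2  # assumption: midpoint for an empty range
        return -(-moment // total)   # round up (assumed for eqs. (10), (11))

    d2 = centroid(lo, hi)   # eq. (9): whole range
    d1 = centroid(lo, d2)   # eq. (10): lo .. DX(2)
    d3 = centroid(d2, hi)   # eq. (11): DX(2) .. hi
    return [d1, d2, d3]
```

For a uniform distribution over 0..15 this places the division points at 4, 8 and 12, splitting the range into four roughly equal parts.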

Similarly, for the division point coordinates DY(n), the centroid coordinate DY(2) is first detected from the marginal distribution SY(y) on the y axis over the character-frame range YT to YB. Then, for each of the ranges YT to DY(2) and DY(2) to YB produced by dividing at DY(2), the centroid coordinates DY(1) and DY(3) of the marginal distribution SY(y) are detected.

The centroid coordinates DY(1), DY(2) and DY(3) obtained in this way are determined as the target division point coordinates on the y axis.

(VIII) Feature matrix extraction processing
Once the character frame detection unit 5 has detected the coordinates Xl, Xr, Yt and Yb that define the circumscribed frame of the character pattern, and the division point detection unit 10 has detected the target division point coordinates of the character pattern, the feature matrix extraction unit 11 extracts feature values from each of the vertical, horizontal, right-diagonal and left-diagonal subpatterns and creates a feature matrix.

That is, the feature matrix extraction unit 11 divides one circumscribed-frame region into NX × NY partial regions using the target division point coordinates and the coordinates Xl, Xr, Yt and Yb, and for each partial region extracts a feature value representing the amount of character stroke in each subpattern. The feature matrix made up of the NX × NY × 4 feature values extracted from the subpatterns within one circumscribed-frame region is taken as the feature matrix of that region.

First, feature extraction from the horizontal subpattern (HSP) is described.

Based on the target division point coordinates and the coordinates Xl, Xr, Yt and Yb (all of which serve as division point coordinates), the feature matrix extraction unit 11 divides the circumscribed-frame region into NX × NY partial regions and, for each partial region, counts the number of black bits BH(i, j) of the horizontal subpattern HSP within that region.

The divided regions are determined from the division point coordinates as follows. First, the division point coordinate DX(m) on the x axis is placed on the top side of the circumscribed frame of the character pattern. Taking this coordinate as the starting point, the left side of the coordinate sequence given by equation (12) is used as a division boundary S, as shown in FIG. 4(a).

$$x_1 = DX(m),\quad y_1 = Y_t,\qquad x_j = DX(m) + \theta_v\,(y_j - Y_t),\quad y_j = y_{j-1} + 1 \qquad (12)$$

The horizontal division boundaries S are set in the same way using DY(n) and θh.
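A sketch of how the slanted boundaries of eq. (12) could be used to place a pixel in a column of partial regions is shown below. The function names are hypothetical; truncating θv(yj − Yt) as in eq. (7) is an assumption, and so is assigning a pixel that lies exactly on a boundary to the region on its right, which is one reading of "the left side of the coordinate sequence is the boundary".

```python
from bisect import bisect_right


def boundaries_at_row(dx, y, y_top, theta_v):
    """x position of each slanted boundary S at row y, following eq. (12).

    dx holds the division points DX(1..NX-1) in ascending order.
    """
    return [d + int(theta_v * (y - y_top)) for d in dx]


def column_of(x, y, dx, y_top, theta_v):
    """0-based column of partial regions containing pixel (x, y).

    A pixel on a boundary is assigned to the region on its right (assumption).
    """
    return bisect_right(boundaries_at_row(dx, y, y_top, theta_v), x)
```

Because every boundary at a given row is shifted by the same amount, the boundary list stays sorted, which is what `bisect_right` requires.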

For each divided region obtained as described above, the number of black bits BH(i, j) of the horizontal subpattern HSP is counted. BH(i, j) is the number of black bits in the partial region at row i, column j of one circumscribed-frame region. The feature FH(i, j) for the partial region at row i, column j is then calculated according to equation (13).
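Counting BH(i, j) over the slanted grid can be sketched as follows. This is a hypothetical illustration: the subpattern is assumed to be a 0/1 list of rows already cropped to the circumscribed frame (or white outside it), index i follows the x-direction regions and j the y-direction regions (matching the ranges i = 1..NX, j = 1..NY but 0-based here), and boundary pixels go to the region on their right or below, which is an assumption.

```python
from bisect import bisect_right


def count_region_bits(subpattern, dx, dy, x_left, y_top, theta_v, theta_h):
    """Black-bit count of a subpattern in each slanted partial region.

    dx, dy: ascending division points; returns counts[i][j] (0-based).
    """
    nx, ny = len(dx) + 1, len(dy) + 1
    counts = [[0] * ny for _ in range(nx)]
    for y, row in enumerate(subpattern):
        # Slanted vertical boundaries at this row (eq. (12), truncated).
        xb = [d + int(theta_v * (y - y_top)) for d in dx]
        for x, bit in enumerate(row):
            if not bit:
                continue
            # Slanted horizontal boundaries at this column (DY(n) with theta_h).
            yb = [d + int(theta_h * (x - x_left)) for d in dy]
            counts[bisect_right(xb, x)][bisect_right(yb, y)] += 1
    return counts
```

With a 4 × 4 all-black pattern and θv = 0.5, the single vertical boundary at x = 2 shifts to x = 3 in the lower rows, so the left region gains black bits at the expense of the right one.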

... (13)
Here i = 1, 2, ..., NX and j = 1, 2, ..., NY; WL is the line width, and WPh is the character width (= Xr − Xl + 1).

Further, in the same way as for the HSP, the black-bit counts BV(i, j), BR(i, j) and BL(i, j) of the VSP, RSP and LSP in the partial region at row i, column j are counted, and the features FV(i, j), FR(i, j) and FL(i, j) of the VSP, RSP and LSP for that partial region are calculated according to equations (14) to (16).

... (14) to (16')
Here WPv is the character height (= Yb − Yt + 1), and WPr = WPl = (WPv + WPh)/2.

In this way, the VSP, HSP, RSP and LSP features are extracted for every partial region of the circumscribed-frame region, giving a feature matrix F(k) (k = 1, 2, ..., NX × NY × 4) made up of these NX × NY × 4 feature values. The feature matrix extraction unit 11 extracts the feature matrix F(k) for each circumscribed-frame region and sends the result to the identification unit 12.
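Since equations (13) to (16) are not reproduced here, the sketch below only shows how four NX × NY feature grids could be flattened into the single index k = 1, ..., NX × NY × 4. The function name is hypothetical, and the ordering (VSP, HSP, RSP, LSP, each read row by row) is an assumption; the text does not specify it.

```python
def feature_vector(fv, fh, fr, fl):
    """Flatten the VSP, HSP, RSP and LSP feature grids into one list F.

    Each argument is an NX x NY list of lists; F[k - 1] holds F(k).
    Grid order and row-major reading are assumptions.
    """
    f = []
    for grid in (fv, fh, fr, fl):
        for row in grid:
            f.extend(row)
    return f
```

Any fixed ordering works, provided the dictionary matrices G(k) are built with the same one.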

(IX) Identification processing
The identification unit 12 matches the extracted feature matrix F(k) against the feature matrices G(k) stored in a dictionary memory provided inside it, and thereby recognizes the character or figure in the circumscribed-frame region from which F(k) was obtained. In this recognition, the distance D between the feature matrices F(k) and G(k) is calculated according to equation (17), and the character name OUT (for example, a character code defined in the JIS standard) of the standard character whose feature matrix G(k) gives the minimum distance D is output as the recognition result.
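Minimum-distance matching can be sketched as below. Equation (17) is not reproduced in this text, so the city-block (L1) distance used here is a stand-in assumption, not the patent's formula; the dictionary layout (character name mapped to its G(k) list) is also hypothetical.

```python
def recognize(f, dictionary):
    """Return the name whose reference vector G is nearest to F.

    dictionary maps a character name (e.g. a JIS code) to its G(k) list.
    Assumption: L1 distance stands in for the unreproduced eq. (17).
    """
    def distance(g):
        return sum(abs(a - b) for a, b in zip(f, g))

    return min(dictionary, key=lambda name: distance(dictionary[name]))
```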

As described above, this embodiment has the following advantages.

In this embodiment, the vertical and horizontal subpatterns extracted from the character pattern are used to obtain the average slope θv of the vertical stroke components from the vertical subpattern and the average slope θh of the horizontal stroke components from the horizontal subpattern. The character pattern is scanned according to θv and θh to extract its marginal distributions. The division point coordinates are then determined from the marginal distributions, the divided regions within the circumscribed frame of the character pattern are determined from the division point coordinates and the average slopes θv and θh, and a feature matrix is extracted for the divided regions. As a result, the extracted features are stable for inclined character patterns, such as handwritten characters whose horizontal line elements slope upward to the right, or italic printed characters whose vertical line elements lean to the right. There is therefore no need to prepare, in the identification unit 12, a dictionary covering deformations due to character slope. The smaller dictionary shortens matching time and thus raises processing speed, while also reducing hardware scale and enabling character recognition with good accuracy.

The present invention is not limited to the above embodiment, and various modifications are possible, for example the following.

(i) This embodiment extracts stroke slopes in both the horizontal and vertical directions. However, since handwritten characters generally slope upward to the right, extracting only the slope of the horizontal strokes is sufficient for them; when the targets are italic printed characters, extracting the slope of the vertical strokes is sufficient. The slope extraction direction can thus be chosen to suit the material being read, which simplifies the configuration.

(ii) The configuration shown in the block diagram of FIG. 1 may be implemented not only with discrete circuits but also under program control on a computer.

(Effects of the Invention)
As described in detail above, according to the present invention, horizontal stroke components, vertical stroke components, or both are extracted from a character pattern obtained by photoelectrically converting characters on a medium. For each stroke component, the average of the slopes of its strokes, weighted by stroke length, is extracted as the average slope of that stroke component. The character pattern is scanned according to the average slope to extract a marginal distribution, division points are detected using the marginal distribution, the character pattern is divided into a plurality of regions based on the division points and the average slope, and features are extracted for each divided region.
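The length-weighted average slope can be sketched as below, with each stroke given by its two end-point coordinates. Several details are assumptions: stroke length is taken as the Euclidean distance between the end points, slopes are expressed as a shift per scanning step (dx/dy for vertical strokes, like θv, and dy/dx for horizontal strokes, like θh), and strokes with no extent along the scanning axis are skipped. The function name is hypothetical.

```python
import math


def average_slope(strokes, vertical):
    """Stroke-length-weighted average slope of ((x0, y0), (x1, y1)) strokes.

    vertical=True: slope is dx/dy (x shift per row, like theta_v).
    vertical=False: slope is dy/dx (y shift per column, like theta_h).
    """
    weighted_sum = 0.0
    total_length = 0.0
    for (x0, y0), (x1, y1) in strokes:
        dx, dy = x1 - x0, y1 - y0
        run = dy if vertical else dx
        if run == 0:
            continue  # assumption: degenerate strokes are ignored
        slope = dx / dy if vertical else dy / dx
        length = math.hypot(dx, dy)  # assumed measure of stroke length
        weighted_sum += length * slope
        total_length += length
    return weighted_sum / total_length if total_length else 0.0
```

Weighting by length lets long, well-defined strokes dominate the estimate, so short serifs or noise fragments shift the average slope less.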

As a result, stable features can be extracted even from characters whose horizontal line elements slope upward to the right, as in handwriting, or italic characters whose vertical line elements lean to the right, as in print. There is therefore no need to prepare a dictionary covering deformations due to character slope, and character recognition with high processing speed, small hardware scale and good recognition accuracy becomes possible.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a character recognition device for explaining the character recognition method according to an embodiment of the present invention; FIGS. 2(a) to 2(d) illustrate a conventional character pattern division method; FIGS. 3(a) to 3(e) show examples of a character pattern and its subpatterns; and FIGS. 4(a) and 4(b) illustrate the method of determining the divided regions of an inclined character pattern.

Description of symbols: 1... photoelectric conversion unit, 2... line buffer, 3... character segmentation unit, 4... pattern register, 5... character frame detection unit, 6... line width measurement unit, 7... subpattern extraction unit, 8... character slope extraction unit, 9... marginal distribution extraction unit, 10... division point detection unit, 11... feature matrix extraction unit, 12... identification unit.

Claims

[Claims] A character recognition method in which a character pattern obtained by photoelectrically converting characters on a medium is divided into a plurality of regions, features are extracted for each divided region, and the character is recognized by matching the extracted features against a previously prepared dictionary, the method being characterized by: extracting horizontal stroke components, vertical stroke components, or both from the character pattern; for each stroke component, extracting as its average slope the average of the slopes of the individual strokes weighted by stroke length; scanning the character pattern according to the average slope to extract a marginal distribution; detecting division points using the marginal distribution; and dividing the character pattern into a plurality of regions based on the division points and the average slope.