JP2510722B2

JP2510722B2 - How to distinguish uppercase and lowercase letters in English

Info

Publication number: JP2510722B2
Application number: JP1106054A
Authority: JP
Inventors: 泰二森
Original assignee: Fuji Electric Co Ltd; Fuji Facom Corp
Current assignee: Fuji Electric Co Ltd; Fuji Facom Corp
Priority date: 1989-04-27
Filing date: 1989-04-27
Publication date: 1996-06-26
Anticipated expiration: 2011-06-26
Also published as: JPH02285477A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a method for discriminating between uppercase and lowercase English letters in a character recognition device.

[Conventional technology]

Conventionally, uppercase and lowercase letters have been distinguished as follows. The height of a given text line is taken as the uppercase height, a threshold is derived from that height and a preset ratio of lowercase height to uppercase height, and the height of each target character is compared with this threshold to decide whether it is uppercase or lowercase.

[Problems to be Solved by the Invention]

However, with this method, a line consisting only of lowercase letters may have a line height equal to the lowercase height. The conventional method mistakenly treats this height as the uppercase height when calculating the threshold and making the comparison, so every character in the line is misjudged as uppercase.
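The failure mode described above can be reproduced with a minimal sketch. The function name `conventional_case`, the ratio of 0.7, and the choice of a threshold midway between the assumed uppercase height and the expected lowercase height are all assumptions made for illustration; the text only says that the threshold is derived from the line height and a preset ratio.

```python
def conventional_case(char_heights, lower_to_upper_ratio=0.7):
    """Classify character heights the conventional way (illustrative sketch)."""
    # The line height is assumed to be the uppercase height.
    line_height = max(char_heights)
    # Assumption: the threshold sits midway between the assumed uppercase
    # height and the expected lowercase height (line_height * ratio).
    threshold = line_height * (1 + lower_to_upper_ratio) / 2
    return ["upper" if h >= threshold else "lower" for h in char_heights]


mixed_line = [20, 14, 14, 14]      # one capital followed by x-height letters
lowercase_line = [14, 14, 14, 14]  # x-height letters only

print(conventional_case(mixed_line))      # ['upper', 'lower', 'lower', 'lower']
print(conventional_case(lowercase_line))  # ['upper', 'upper', 'upper', 'upper']
```

In the second call the line height is itself a lowercase height, so the threshold falls below every character and the whole line is reported as uppercase.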

Accordingly, the object of the present invention is to improve the accuracy of uppercase/lowercase discrimination for English letters. It does so by exploiting the fact that some English letters have different glyphs in uppercase and lowercase: the uppercase character height and the lowercase character height are obtained separately, and uppercase and lowercase are discriminated from these two values and the size of the target character.

[Means for solving the problem]

The method applies to a character recognition device that normalizes at least the size of the target English character and recognizes uppercase and lowercase letters using the same standard pattern. To discriminate uppercase from lowercase, it is first determined from the recognition result of the device whether the target character belongs to a character type whose uppercase and lowercase glyphs differ or to one whose glyphs are the same. If the glyphs differ, it is determined whether the character is uppercase or lowercase, and its character height is accumulated separately for uppercase and for lowercase. If the glyphs are the same, the height of the character is stored. This processing is performed for one line of characters. The average uppercase height and the average lowercase height are then calculated from the accumulated heights of the characters whose glyphs differ, and an uppercase/lowercase discrimination threshold is obtained. Finally, the height of each character whose glyph is the same in both cases is compared with this threshold to determine whether it is uppercase or lowercase.

[Action]

From the recognition result, it is determined whether the target character belongs to a character type whose uppercase and lowercase glyphs differ or to one whose glyphs are the same. If the glyphs differ, it is determined whether the character is uppercase or lowercase, and its height is accumulated separately for uppercase and for lowercase. If the glyphs are the same, the height of the character is stored. Once the recognition result for one line has been obtained in this way, the average uppercase height and the average lowercase height are calculated from the accumulated heights and the character counts. An uppercase/lowercase discrimination threshold is obtained from these two values, and the heights of the characters with the same glyph are compared with this threshold to discriminate uppercase from lowercase.
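The averaging and threshold step can be written compactly as follows. The symbols are introduced here for illustration: $S_U$ and $S_L$ are the accumulated uppercase and lowercase heights, $n_U$ and $n_L$ the corresponding character counts, and $h$ the stored height of a character whose glyph is the same in both cases. Taking the threshold $\theta$ as the midpoint of the two averages, and treating $h = \theta$ as uppercase, are assumptions; the text only states that the threshold is obtained from the two averages.

```latex
\bar{h}_U = \frac{S_U}{n_U}, \qquad
\bar{h}_L = \frac{S_L}{n_L}, \qquad
\theta = \frac{\bar{h}_U + \bar{h}_L}{2}

\text{case}(h) =
\begin{cases}
  \text{uppercase} & \text{if } h \ge \theta \\
  \text{lowercase} & \text{otherwise}
\end{cases}
```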

[Example]

FIG. 1 is a flowchart showing an embodiment of the present invention.

Following the flowchart, character image data is first extracted by a known image processing technique, and the target character is recognized by a likewise known technique. Next, it is determined from the recognition result whether the target character is an English letter. If it is, it is determined whether the letter has the same glyph in uppercase and lowercase, as with "C (c)", or different glyphs, as with "A (a)". If the glyphs differ, it is determined whether the letter is uppercase or lowercase. If it is uppercase, the product of its character height and the corresponding value in the relative character-height table is added to the uppercase accumulated value. If it is lowercase, the product of its character height and the corresponding table value is added to the lowercase accumulated value.

FIG. 2 shows an example of the relative table. All values in the uppercase table T1 are "1". In the lowercase table T2, however, some letters such as b, h, and l are about as tall as uppercase letters, so their table values are set to, for example, "0.5" to keep them in balance with the other lowercase letters.
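The weighted accumulation with a relative table might look like the sketch below. The names `TALL_LOWERCASE`, `table_value`, and `accumulate` are hypothetical, and giving every other lowercase letter a table value of 1.0 is an assumption; the text only gives "1" for all of T1 and "0.5" as an example for b, h, and l in T2.

```python
# Hypothetical relative table in the spirit of FIG. 2.
TALL_LOWERCASE = set("bhl")  # the tall lowercase letters named in the text


def table_value(ch):
    """Return the relative-table value for a letter whose case is known."""
    if ch.isupper():
        return 1.0  # table T1: all values are 1
    # Table T2: 0.5 for tall letters; 1.0 for the rest is an assumption.
    return 0.5 if ch in TALL_LOWERCASE else 1.0


def accumulate(chars_with_heights):
    """Accumulate weighted heights separately for uppercase and lowercase."""
    sums = {"upper": 0.0, "lower": 0.0}
    counts = {"upper": 0, "lower": 0}
    for ch, height in chars_with_heights:
        key = "upper" if ch.isupper() else "lower"
        sums[key] += height * table_value(ch)
        counts[key] += 1
    return sums, counts


sums, counts = accumulate([("A", 20), ("b", 20), ("a", 10), ("e", 10)])
print(sums)    # {'upper': 20.0, 'lower': 30.0}
print(counts)  # {'upper': 1, 'lower': 3}
```

With these sample heights, the tall "b" contributes 10 rather than 20, so the lowercase average stays at 10 instead of being pulled toward the uppercase height.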

On the other hand, if the letter has the same glyph in uppercase and lowercase, its character height is stored. The above steps are repeated until the recognition result for one line has been obtained. After the line has been recognized, the average uppercase height and the average lowercase height are calculated from the accumulated values and the character counts, and an optimal uppercase/lowercase discrimination threshold is obtained from, for example, an intermediate value between these two averages. The stored heights of the letters whose glyph is the same in both cases are then retrieved, and each is compared with the threshold to determine whether the letter is uppercase or lowercase. The retrieval and comparison steps are repeated until no stored characters remain.
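The whole per-line procedure can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: the membership of `SAME_GLYPH` beyond "c", the midpoint threshold, the 1.0 table value for ordinary lowercase letters, and the fallback when one of the averages is unavailable are all choices made here that the text does not specify.

```python
# Assumption: letters treated as having the same glyph in both cases.
# The text names only "C (c)"; the rest of this set is illustrative.
SAME_GLYPH = set("cosvwxz")
TALL_LOWERCASE = set("bhl")  # table value 0.5, following the text's example


def resolve_case(line):
    """line: list of (recognized_letter, height) for one text line.

    Letters whose glyph differs by case arrive with their case already
    decided by the recognizer; same-glyph letters are decided here.
    """
    sums = {"upper": 0.0, "lower": 0.0}
    counts = {"upper": 0, "lower": 0}
    pending = []  # (index, height) of same-glyph letters
    result = [ch for ch, _ in line]
    for i, (ch, height) in enumerate(line):
        if not ch.isalpha():
            continue
        if ch.lower() in SAME_GLYPH:
            pending.append((i, height))  # decide after the line is complete
            continue
        key = "upper" if ch.isupper() else "lower"
        weight = 0.5 if ch in TALL_LOWERCASE else 1.0
        sums[key] += height * weight
        counts[key] += 1
    if counts["upper"] == 0 or counts["lower"] == 0:
        # Assumption: the text does not describe this case; keep the
        # recognizer's output unchanged.
        return result
    upper_avg = sums["upper"] / counts["upper"]
    lower_avg = sums["lower"] / counts["lower"]
    threshold = (upper_avg + lower_avg) / 2  # midpoint is an assumption
    for i, height in pending:
        letter = result[i]
        result[i] = letter.upper() if height >= threshold else letter.lower()
    return result


line = [("A", 20), ("c", 10), ("e", 10), ("s", 20), ("a", 10), ("o", 10)]
print(resolve_case(line))  # ['A', 'c', 'e', 'S', 'a', 'o']
```

Here the uppercase average is 20 and the lowercase average is 10, giving a threshold of 15, so the stored "s" of height 20 is resolved to uppercase while "c" and "o" stay lowercase.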

[Effect of the invention]

According to the present invention, uppercase and lowercase letters can be discriminated even in English text consisting only of lowercase letters, which has the advantage of improving discrimination accuracy.

[Brief description of drawings]

FIG. 1 is a flowchart showing an embodiment of the present invention, and FIG. 2 is a schematic diagram showing the relative table.

Explanation of reference symbols: T1: uppercase table; T2: lowercase table.

Claims

(57) [Claims]

1. A method for discriminating uppercase and lowercase English letters in a character recognition device that normalizes at least the size of a target English character and recognizes uppercase and lowercase letters using the same standard pattern, the method being characterized by: determining, from the recognition result of the character recognition device, whether the target character belongs to a character type whose uppercase and lowercase glyphs differ or to a character type whose glyphs are the same; if the glyphs differ, determining whether the character is uppercase or lowercase and accumulating the character height of the target character type separately for uppercase and lowercase; if the glyphs are the same, storing the height of the character; performing this processing for one line of characters; thereafter calculating the average uppercase height and the average lowercase height from the accumulated character heights of the character types whose glyphs differ, and obtaining an uppercase/lowercase discrimination threshold; and thereafter comparing the height of each character of the character types whose glyphs are the same with this threshold to determine whether it is uppercase or lowercase.