JP2001314820A

JP2001314820A - Device for detecting address region

Info

Publication number: JP2001314820A
Application number: JP2001084835A
Authority: JP
Inventors: Hiroyoshi Miyano; 博義宮野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-03-23
Filing date: 2001-03-23
Publication date: 2001-11-13

Abstract

PROBLEM TO BE SOLVED: To prevent a region that has the same address structure as the destination address, such as the sender's address, from being erroneously detected as the address region. SOLUTION: The address-region detecting device comprises: a means 21 for generating address region candidates; a means 22 for calculating a feature quantity of each candidate; a storage means 31 that stores, in association with each feature quantity, the address probability that a candidate with that feature quantity is the address region and the sender probability that it is a region having the structure of an address other than the destination address; a means 23 that converts the feature quantity of each candidate into its address probability and sender probability by referring to the storage means 31; a means 24 that determines the order of the candidates based on the obtained address and sender probabilities; and a means 25 that selects a specified number of candidates in that order, starting from the candidate most likely to be the address.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an address-region detecting device that detects the address region, that is, the region of a mail piece on which the destination address is written.

[0002]

2. Description of the Related Art

Conventionally, an address-region detecting device of this kind divides a mail image into several partial regions, one of which contains the address. (Hereinafter, the region containing the address is called the address region, and each partial region of the mail image is called an address region candidate.) The device must output the address region without confusing it with other regions that have the same address structure, such as the sender's address (hereinafter called sender regions).

[0003]

Conventionally, therefore, only a single measure of address-region likelihood is defined for each address region candidate. A given number of candidates are then output as address regions in descending order of that measure. For example, one known method is described on p. 719 of Adrian P. Whichello and Hong Yan, "Locating Address Blocks and Postcodes in Mail-Piece Images," Proceedings of the 13th International Conference on Pattern Recognition, Volume III, Track C, 1996, pp. 716-720. It exploits the fact that a candidate closer to a point at the lower center of the mail image is more likely to be the address region, and selects candidates in order of increasing distance from that point.
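The prior-art ranking just described can be sketched as follows. This is a minimal illustration, not the cited paper's implementation. Each candidate is reduced to its center point, and the "lower center" reference point is passed in as the parameter `ref` because its exact location is not specified here. The function name `rank_by_distance` and the sample coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) center of a candidate, in pixels


def rank_by_distance(centers: Sequence[Point], ref: Point, k: int) -> List[int]:
    """Return the indices of the k candidates whose centers are closest to ref.

    Candidates nearer the reference point are treated as more likely to be
    the address region; this is the single likelihood measure of the prior art.
    """
    def dist(i: int) -> float:
        return math.hypot(centers[i][0] - ref[0], centers[i][1] - ref[1])

    return sorted(range(len(centers)), key=dist)[:k]


if __name__ == "__main__":
    # Hypothetical candidate centers and reference point.
    centers = [(600, 80), (480, 80), (300, 250)]
    print(rank_by_distance(centers, ref=(320, 300), k=2))  # [2, 1]
```

Nothing in this ranking separates a sender's address from the destination address. A sender's address lying near the reference point would be ranked first.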

[0004]

In the prior art, whether a partial image is the address region is judged solely by a measure of address-region likelihood. No measure is taken to distinguish a partial image that appears to be the address from a partial image with the same structure as an address, such as the sender's address. As a result, the sender's address can be mistaken for the destination address, and the accuracy of address-region detection has not always been high.

[0005]

SUMMARY OF THE INVENTION

In view of the above problems, an object of the present invention is to provide an address-region detecting device that reduces the risk of erroneously detecting a region having the same address structure as the destination address, such as the sender's address. The device thereby improves the accuracy of address-region detection.

[0006]

To achieve this object, the present invention comprises the following means:

- means for generating, from partial images of a mail image, a plurality of address region candidates including the address region, which is the region containing the address information;
- means for extracting, for each generated candidate, a feature quantity that serves as an index for evaluating whether the candidate is the address region;
- means for storing, in association with the feature quantity, three probabilities: an address probability that a candidate is the address region; a sender probability that it is a region having the structure of an address other than the destination address; and a pattern probability that it is a region that is neither the address nor has the structure of any other address;
- means for converting the feature quantity of each candidate extracted by the extracting means into the address, sender and pattern probabilities by referring to the storage means;
- means for calculating a score for each candidate from the address probability and sender probability among the converted probabilities, and for determining an order based on the scores;
- means for selecting a predetermined number of candidates based on the determined order.

[0007]

The present invention considers the probability that an address region candidate generated from a partial image of the mail image is the address. In addition, it newly takes into account the sender probability, that is, the probability that the candidate has the structure of an address other than the destination address, such as the sender's address. The order of the candidates is determined from these two probabilities, and a predetermined number of candidates are selected based on that order, starting from those most likely to be the address. This reduces erroneous detection of regions having the structure of another address and improves the accuracy of address-region detection.

[0008]

DESCRIPTION OF THE EMBODIMENTS

Embodiments of the present invention will now be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a first embodiment of the address-region detecting device according to the present invention. In FIG. 1:

- 1 is an input device such as a scanner;
- 2 is a data processing device that operates under program control;
- 3 is a storage device that stores information;
- 4 is an output device.

The input device 1 inputs the mail image to be processed. The data processing device 2 comprises address region candidate generating means 21, feature extracting means 22, address/sender probability calculating means 23, order determining means 24 and output candidate determining means 25. The storage device 3 includes an address/sender distribution dictionary 31. The output device 4 outputs a predetermined number of the candidates in descending order of the probability of being the address.

[0009]

Next, the internal configuration of the data processing device 2 is described, beginning with the terms used in this specification.

- A region is a partial image of the mail image input by the input device 1.
- The address region is the region that shows the address portion of that mail image.
- A sender region is a region that represents an address but not the destination address, such as the sender's address.
- A pattern region is a region that represents neither the destination address nor any other address. In other words, it is neither the address region nor a sender region.

[0010]

The address region candidate generating means 21 generates, from the mail image input by the input device 1, a plurality of regions including the address region. Each of these regions is hereinafter called an address region candidate. For example, for the mail image shown in FIG. 2, the means 21 generates address region candidates G1, G2, G3, G4 and G5 as shown in FIG. 3. FIG. 4 shows only the locations of these five candidates. Exactly one of the candidates generated by the means 21 represents the address region; every other candidate is either a sender region or a pattern region.

[0011]

The feature extracting means 22 extracts, from each candidate obtained by the means 21, the feature quantity needed to calculate the address/sender probabilities from the address/sender distribution dictionary 31. In this example, an origin O and x and y axes are set on the mail image as shown in FIG. 5. The feature quantity is then the x and y coordinates of the absolute position of the candidate's center point. The coordinates are expressed in pixels here, but this choice is not essential. FIG. 6 shows the feature quantities of candidates G1 to G5 of FIG. 4. For example, candidate G1 has x = 600 and y = 80, and candidate G2 has x = 480 and y = 80.

[0012]

The three probabilities of a candidate are defined as follows, each based on the feature quantity extracted from that candidate:

- the address probability is the probability that the candidate can be regarded as the address region;
- the sender probability is the probability that it can be regarded as a sender region;
- the pattern probability is the probability that it can be regarded as a pattern region.

Every candidate is exactly one of these three kinds, so the pattern probability is obtained by subtracting the address probability and the sender probability from 1. For candidates G1 to G5 of FIG. 4, the probabilities are, for example, as shown in FIG. 7. In that example, candidate G2 has an address probability of 0.10, a sender probability of 0.20 and a pattern probability of 0.70.
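Because the three region types are exhaustive, only p and q need to be known per candidate; r follows from them. The sketch below assumes a hypothetical `CandidateProbs` record that is not part of the original text, and checks it against the G2 values from FIG. 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateProbs:
    """Address and sender probabilities of one address region candidate."""
    p: float  # address probability: candidate is the address region
    q: float  # sender probability: candidate is a sender region

    def __post_init__(self) -> None:
        # p and q belong to mutually exclusive events, so both must be
        # non-negative and their sum must not exceed 1 (small float slack).
        if self.p < 0 or self.q < 0 or self.p + self.q > 1 + 1e-9:
            raise ValueError("require p >= 0, q >= 0 and p + q <= 1")

    @property
    def r(self) -> float:
        """Pattern probability 1 - p - q, clamped at 0 against float slack."""
        return max(0.0, 1.0 - self.p - self.q)


if __name__ == "__main__":
    g2 = CandidateProbs(p=0.10, q=0.20)  # values for G2 in FIG. 7
    print(round(g2.r, 2))  # 0.7
```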

[0013]

The address/sender probability calculating means 23 calculates the address, sender and pattern probabilities of every candidate. It does so from the feature quantities obtained by the feature extracting means 22, using the address/sender distribution dictionary 31. The order determining means 24 then arranges the candidates in order of suitability as the address region, based on the address and sender probabilities obtained by the means 23. The order is obtained as follows. First, a score is calculated for each candidate from its address probability and sender probability. Denoting the score by m, the address probability by p and the sender probability by q, the score is given by the following Equation (1).

[0014]

$$m = f(p) - g\left(\frac{q}{p+q}\right) \tag{1}$$

where f and g are the identity functions, f(u) = u and g(u) = u.

Equation (1) is a good choice because of what q/(p + q) measures. Assume the candidate is either the address region or a sender region; q/(p + q) is then the probability that it is not the address region. This score is calculated for every candidate, and the candidates are sorted by decreasing score. The resulting sequence is the order of all candidates produced by the order determination process. Equation (1) uses f(u) = u and g(u) = u, but f and g may be any monotonically increasing functions of u.
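Equation (1) and the subsequent descending sort can be sketched as follows. This is an illustration under stated assumptions:

- f and g default to the identity, as in the text.
- The source does not say how to handle p + q = 0, where q/(p + q) is undefined. Ranking such candidates last is a choice made here.
- `score_eq1`, `rank` and all sample probabilities except G2's are hypothetical.

The final slice mimics the output candidate determining means 25, which keeps the first k candidates of the order.

```python
from typing import Callable, List, Sequence, Tuple

Fn = Callable[[float], float]


def identity(u: float) -> float:
    return u


def score_eq1(p: float, q: float, f: Fn = identity, g: Fn = identity) -> float:
    """Score m = f(p) - g(q / (p + q)) from Equation (1).

    q / (p + q) is the chance that the candidate is not the address region,
    given that it is either the address region or a sender region.
    """
    if p + q == 0:
        # Assumption: q / (p + q) is undefined here; rank such candidates last.
        return float("-inf")
    return f(p) - g(q / (p + q))


def rank(probs: Sequence[Tuple[float, float]], k: int) -> List[int]:
    """Indices of the top-k candidates by descending Equation (1) score."""
    scores = [score_eq1(p, q) for p, q in probs]
    # sorted() stays stable with reverse=True, so ties keep their input order.
    order = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # (p, q) pairs: the first is G2 from FIG. 7, the others are hypothetical.
    probs = [(0.10, 0.20), (0.70, 0.20), (0.05, 0.60)]
    print(rank(probs, k=2))  # [1, 0]
```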

[0015]

Instead of Equation (1), any of the following Equations (2) to (6) may be used to calculate the score.

[0016]

$$m = \frac{f(p)}{g\left(\frac{q}{p+q}\right)} \tag{2}$$

[0017]

$$m = f(p) - g(q) \tag{3}$$

[0018]

$$m = \frac{f(p)}{g(q)} \tag{4}$$

[0019]

$$m = f(p) + g\left(\frac{p}{p+q}\right) \tag{5}$$

[0020]

$$m = f(p) \times g\left(\frac{p}{p+q}\right) \tag{6}$$

Equation (2) is a good choice for the same reason as Equation (1): assuming the candidate is either the address region or a sender region, q/(p + q) is the probability that it is not the address region. Equations (3) and (4) are good choices because f(p) expresses how likely the candidate is to be the address region and g(q) expresses how likely it is to be a sender region. Equations (5) and (6) are good choices for the same reason as Equations (1) and (2), since 1 − q/(p + q) = p/(p + q).
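With f and g fixed to the identity, the six scoring rules can be written side by side. The Python sketch below uses a hypothetical dictionary named `SCORES`. It also makes one relation concrete: since p/(p + q) = 1 − q/(p + q), Equation (5) with identity f and g equals Equation (1) plus 1, so the two produce the same order. Two edge cases need guarding by the caller:

- with identity g, Equations (2) and (4) divide by zero when q = 0;
- every rule except (3) is undefined when p = q = 0.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Score = Callable[[float, float], float]

# Equations (1)-(6) with f(u) = u and g(u) = u; p is the address probability
# and q the sender probability of one candidate.
SCORES: Dict[str, Score] = {
    "eq1": lambda p, q: p - q / (p + q),
    "eq2": lambda p, q: p / (q / (p + q)),
    "eq3": lambda p, q: p - q,
    "eq4": lambda p, q: p / q,
    "eq5": lambda p, q: p + p / (p + q),
    "eq6": lambda p, q: p * (p / (p + q)),
}


def order_by(rule: str, probs: Sequence[Tuple[float, float]]) -> List[int]:
    """Candidate indices sorted by descending score under one rule."""
    score = SCORES[rule]
    return sorted(range(len(probs)), key=lambda i: score(*probs[i]), reverse=True)


if __name__ == "__main__":
    probs = [(0.10, 0.20), (0.70, 0.20), (0.30, 0.40)]  # hypothetical (p, q)
    for name in SCORES:
        print(name, order_by(name, probs))
```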

[0021]

The order determining means 24 need not compute scores and sort the candidates in descending order of score; it may instead determine the order as follows. Suppose there are n address region candidates, and take some order of them. In that order, let p_i, q_i and r_i be the address, sender and pattern probabilities of the i-th candidate (i = 1, ..., n). The probability P that the address region comes earlier in the order than the sender region is then given by Equation (7):

[0022]

[Equation 1]

All orders of the n candidates are enumerated, and P is calculated for each order according to Equation (7). The order that maximizes P is selected.
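The exhaustive search over orders can be sketched as follows. Equation (7) is not reproduced in this text, so the sketch takes the order probability as a caller-supplied function `prob_fn`. The toy objective in the usage example only rewards placing high-p candidates early; it is not Equation (7). `best_order` and `toy_prob` are hypothetical names. There are n! orders, so exhaustive enumeration is only practical for a small number of candidates.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Probs = Tuple[float, float, float]  # (p_i, q_i, r_i) of one candidate
OrderProb = Callable[[List[Probs]], float]


def best_order(cands: Sequence[Probs], prob_fn: OrderProb) -> Tuple[int, ...]:
    """Return the permutation of candidate indices that maximizes prob_fn.

    prob_fn receives the (p, q, r) triples in the proposed order and returns
    P for that order; it stands in for Equation (7).
    """
    best: Tuple[int, ...] = ()
    best_p = float("-inf")
    for perm in permutations(range(len(cands))):  # all n! orders
        value = prob_fn([cands[i] for i in perm])
        if value > best_p:
            best, best_p = perm, value
    return best


def toy_prob(ordered: List[Probs]) -> float:
    """Toy objective (NOT Equation (7)): weight each p by how early it appears."""
    n = len(ordered)
    return sum((n - i) * p for i, (p, _q, _r) in enumerate(ordered))


if __name__ == "__main__":
    cands = [(0.10, 0.20, 0.70), (0.70, 0.20, 0.10), (0.30, 0.10, 0.60)]
    print(best_order(cands, toy_prob))  # (1, 2, 0)
```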

[0023]

The output candidate determining means 25 takes the order of the candidates obtained by the order determining means 24. It selects the given number of candidates from the beginning of that order and outputs them.

[0024]

Next, the storage device 3 is described. The storage device 3 holds the address/sender distribution dictionary 31. As shown in FIG. 8, the dictionary 31 is a conversion table that maps ranges of the feature quantity to the address, sender and pattern probabilities of candidates whose feature quantities fall within those ranges. FIG. 8 shows only part of the conversion table. In FIG. 8, "100-110" means at least 100 and less than 110.

As an example of converting a feature quantity into probabilities with the table of FIG. 8, consider a candidate whose center has an x coordinate of at least 110 and less than 120 and a y coordinate of at least 50 and less than 55. Its address, sender and pattern probabilities are 0.70, 0.20 and 0.10, respectively.
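A minimal sketch of the conversion-table lookup follows. It assumes that features are binned independently in x and y with half-open intervals [lo, hi), as the "100-110" notation indicates. The class `DistributionDictionary`, its bin edges and the table contents are illustrative. Only the cell (x in 110-120, y in 50-55) → (0.70, 0.20, 0.10) comes from the text, and returning `None` outside the table is a choice made here.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Probs = Tuple[float, float, float]  # (address p, sender q, pattern r)


class DistributionDictionary:
    """Lookup table from binned (x, y) features to (p, q, r)."""

    def __init__(self, x_edges: List[float], y_edges: List[float],
                 table: Dict[Tuple[int, int], Probs]) -> None:
        self.x_edges = x_edges  # sorted bin boundaries for x
        self.y_edges = y_edges  # sorted bin boundaries for y
        self.table = table      # (x_bin, y_bin) -> (p, q, r)

    @staticmethod
    def _bin(edges: List[float], v: float) -> Optional[int]:
        # bisect_right returns the index of the first edge strictly greater
        # than v, so v lies in bin i with edges[i] <= v < edges[i + 1].
        i = bisect_right(edges, v) - 1
        return i if 0 <= i < len(edges) - 1 else None

    def lookup(self, x: float, y: float) -> Optional[Probs]:
        bx, by = self._bin(self.x_edges, x), self._bin(self.y_edges, y)
        if bx is None or by is None:
            return None  # feature outside the table's range
        return self.table.get((bx, by))


if __name__ == "__main__":
    d = DistributionDictionary([100, 110, 120], [50, 55, 60],
                               {(1, 0): (0.70, 0.20, 0.10)})
    print(d.lookup(115, 53))  # (0.7, 0.2, 0.1)
```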

[0025]

Next, the concrete operation of this embodiment is described. First, the input device 1, such as a scanner or camera, reads the mail piece whose address is to be read. It captures an image of the side of the mail piece on which the address is written. Suppose, for example, that the input device 1 reads the mail image of FIG. 2. The mail image is passed to the address region candidate generating means 21, which, as described above, generates a plurality of candidates including the address region from partial images of the mail image.

[0026]

The address region candidate generating means 21 may, for example, create the candidates using the method described on p. 1470 of Anil K. Jain and Sushil K. Bhattacharjee, "Address Block Location on Envelopes Using Gabor Filters," Pattern Recognition, Vol. 25, No. 12, 1992, pp. 1459-1477. In this way it generates the five candidates G1, G2, G3, G4 and G5 shown in FIG. 3. FIG. 4 shows only the locations of these five candidates:

- G1 is a stamp and therefore a pattern region;
- G2 is a postmark and therefore a pattern region;
- G3 is a picture and therefore a pattern region;
- G4 is the sender's address and therefore a sender region;
- G5 is the address region.

However, immediately after the means 21 generates the candidates, it is not yet known which of them is the address region, a sender region or a pattern region.

[0027]

The candidates G1, G2, G3, G4 and G5 generated by the means 21 are supplied to the feature extracting means 22. FIG. 10 is a flowchart of the processing of the feature extracting means 22:

- Step A1: determine whether all candidates have been processed.
- Step A2: if any unprocessed candidate remains, select one of them. The selection order does not matter.
- Step A3: calculate the feature quantity of the selected candidate.

As explained with reference to FIG. 5, the feature quantity is the coordinates x, y of the absolute position of the candidate's center point, with an origin O and x and y axes defined on the mail image. The means 22 repeats steps A1 to A3 until the feature quantities of all candidates have been calculated. FIG. 6 shows the center coordinates x and y of candidates G1 to G5 of FIG. 4; for example, candidate G2 has x = 480 and y = 80. In this embodiment the coordinates are in pixels, although this is not essential.
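Steps A1 to A3 amount to a loop that computes one feature vector per candidate. The Python sketch assumes that each candidate is given as an axis-aligned bounding box (left, top, right, bottom) in pixels. The box representation and the name `extract_features` are assumptions. The G1 box below is hypothetical, chosen so that its center matches the FIG. 6 values x = 600, y = 80.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), pixels


def center(box: Box) -> Tuple[float, float]:
    """Absolute (x, y) of the box center in image coordinates."""
    left, top, right, bottom = box
    return ((left + right) / 2.0, (top + bottom) / 2.0)


def extract_features(candidates: Dict[str, Box]) -> Dict[str, Tuple[float, float]]:
    """Steps A1-A3: visit every candidate once and record its center point."""
    features: Dict[str, Tuple[float, float]] = {}
    for name, box in candidates.items():  # processing order does not matter
        features[name] = center(box)
    return features


if __name__ == "__main__":
    print(extract_features({"G1": (560, 60, 640, 100)}))  # {'G1': (600.0, 80.0)}
```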

[0028]

Feature quantities other than the absolute position of the candidate may also be used, for example:

- the position relative to other candidates;
- the position normalized by the size of the mail image;
- the width and height of the candidate's bounding rectangle;
- line-layout information. Character lines are extracted from the candidate, for example by the method described in Japanese Patent Application No. Hei 7-105575. A layout feature is then computed from them, such as whether the left ends of all extracted lines share the same x coordinate;
- run-length features. Each line image is binarized, for example by a method described in Takeshi Agui and Tomoharu Nagao, Image Processing and Recognition (1992), pp. 14-17. Features such as the number of horizontal white/black pixel transitions, or the mean or mode of horizontal black-pixel run lengths, are computed for each line image, and their average over all character lines is used.
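The run-length features mentioned above can be computed from an already binarized line image as sketched below. Binarization itself, for example by the method in the cited book, is assumed to be done already, with 1 for black and 0 for white. The function names, the dictionary keys, and the choice to pool run lengths over all scan rows of a line are assumptions made for illustration.

```python
from statistics import mean, mode
from typing import Dict, List, Sequence

Row = Sequence[int]        # one scan row of a binarized image: 1 black, 0 white
LineImage = Sequence[Row]  # one character-line image (at least one row)


def black_runs(row: Row) -> List[int]:
    """Lengths of the horizontal runs of consecutive black pixels."""
    runs: List[int] = []
    length = 0
    for px in row:
        if px:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:
        runs.append(length)
    return runs


def transitions(row: Row) -> int:
    """Number of white/black changes between horizontally adjacent pixels."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def line_features(line: LineImage) -> Dict[str, float]:
    """Per-line features: mean transitions per row, mean and mode run length."""
    runs = [r for row in line for r in black_runs(row)]
    return {
        "transitions": mean(transitions(row) for row in line),
        "run_mean": mean(runs) if runs else 0.0,
        "run_mode": mode(runs) if runs else 0.0,
    }


def candidate_features(lines: Sequence[LineImage]) -> Dict[str, float]:
    """Average each per-line feature over all character lines of a candidate."""
    if not lines:
        return {}
    per_line = [line_features(line) for line in lines]
    return {key: mean(f[key] for f in per_line) for key in per_line[0]}
```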

[0029]

As shown in FIG. 8, the address/sender distribution dictionary 31 is a conversion table between ranges of a candidate's feature quantity and the corresponding address, sender and pattern probabilities. FIG. 8 shows a table for combinations of the raw features x and y. Alternatively, a new feature quantity may be computed from several feature quantities, including features other than x and y, and a conversion table between that new feature and the probabilities may be created.

[0030]

The address/sender probability calculating means 23 refers to the dictionary 31 to determine how likely each candidate, with the feature quantity calculated by the means 22, is to be the address region, a sender region or a pattern region. In other words, it obtains each candidate's address, sender and pattern probabilities. For example, suppose the feature extracting means 22 finds that a candidate's center point is at (x = 115, y = 53). In the conversion table of FIG. 8, x falls in the range 110-120 and y falls in the range 50-55. The candidate's address probability is therefore 0.70, its sender probability 0.20 and its pattern probability 0.10. In this way the means 23 obtains, from the feature quantities of FIG. 6, the address probability p, sender probability q and pattern probability r of each candidate, as shown in FIG. 7.

[0031]

Next, the order determining means 24 determines the order of all candidates based on the address and sender probabilities obtained by the means 23. FIG. 11 is a flowchart of its processing:

- Step B1: determine whether all candidates have been processed.
- Step B2: if any unprocessed candidate remains, select one of them. The selection order does not matter.
- Step B3: with p the address probability and q the sender probability of the selected candidate, calculate the candidate's score from p and q according to Equation (1). Here the functions f and g are f(u) = u and g(u) = u.

[0032] Steps B1 to B3 are then repeated to calculate the score of every address area candidate. Once all scores have been calculated, the candidates are sorted in descending order of score (step B4); this is the order of all the address area candidates produced by the order determination processing. In step B3, taking the case of FIG. 6 as an example, calculating the scores with equation (1) gives the results shown in FIG. 9, yielding the order G5, G4, G3, G1, G2. Although f(u) = u and g(u) = u are used here, f(u) and g(u) may be any monotonically increasing functions of the argument u. As described earlier, any of equations (2) to (6) may be used for the score calculation instead of equation (1).
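Steps B1 to B4 amount to scoring every candidate and sorting by descending score. The sketch below assumes candidates are given as a mapping from a label to (p, q) and uses the difference score with f(u) = g(u) = u as the default; `order_candidates` is a hypothetical name, and the probability values in the usage example are made up for illustration (they are not the values behind FIG. 9).

```python
def order_candidates(probs, score=lambda p, q: p - q):
    """Return candidate labels sorted by descending score (steps B1 to B4).

    probs maps a candidate label to (address probability p, sending
    probability q). The default score is f(p) - g(q) with f(u) = g(u) = u.
    """
    # Steps B1 to B3: score every candidate; the visiting order is irrelevant.
    scores = {label: score(p, q) for label, (p, q) in probs.items()}
    # Step B4: sort in descending order of score.
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative values only (hypothetical).
example = {"A": (0.6, 0.3), "B": (0.5, 0.0), "C": (0.2, 0.1)}
print(order_candidates(example))  # ['B', 'A', 'C']
```

Candidate A has the highest address probability but is ranked below B because its sending probability is large, which is the effect the sending probability is meant to have.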

[0033] FIG. 12 is a flowchart showing another ordering method of the order determining means 24. First, all possible orders of the address area candidates are generated (step C1). Next, it is determined whether all of the generated orders have been processed (step C2); if any remain, one unprocessed order is selected (step C3). The selection order is arbitrary. Then, for the selected order, the probability P that the address area appears earlier in the order than the sending area is calculated using equation (7) (step C4). Steps C2 to C4 are repeated in this way to obtain P for every order. Finally, the order of the address area candidates that maximizes P over all orders is determined (step C5). This is the order of all the address area candidates produced by the order determination processing.
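The exhaustive search of steps C1 to C5 can be sketched with `itertools.permutations`. Equation (7) is not reproduced in this section, so `prob_address_before_sending` below is a hypothetical stand-in, not the patent's formula: it assumes each candidate is independently an address area (p), a sending area (q), or a pattern area (r), and computes the probability that the first non-pattern candidate in the order is an address area. Because the number of orders grows as n!, this brute-force approach is practical only for a handful of candidates.

```python
from itertools import permutations


def prob_address_before_sending(order, probs):
    """Hypothetical stand-in for equation (7).

    probs maps a label to (p, q, r). Assuming independent candidates, returns
    the probability that, scanning `order` from the head, an address area is
    met before any sending area (pattern areas are skipped over).
    """
    total = 0.0
    all_pattern_so_far = 1.0  # probability that every earlier candidate is a pattern area
    for label in order:
        p, _, r = probs[label]
        total += all_pattern_so_far * p
        all_pattern_so_far *= r
    return total


def best_order(probs, objective=prob_address_before_sending):
    """Steps C1 to C5: evaluate every order and keep the one maximizing P."""
    return max(permutations(probs), key=lambda order: objective(order, probs))


# Illustrative values only (hypothetical).
example_probs = {"A": (0.5, 0.4, 0.1), "B": (0.4, 0.0, 0.6)}
print(best_order(example_probs))  # ('B', 'A')
```

Here P(A, B) = 0.5 + 0.1 × 0.4 = 0.54 while P(B, A) = 0.4 + 0.6 × 0.5 = 0.70, so B is placed first despite its lower address probability, because it carries no sending probability.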

[0034] The output candidate determining means 25 outputs address area candidates starting from the head of the order obtained by the order determining means 24. If the number of candidates to be output is predetermined as, for example, two, G5 and G4 are output; if it is one, G5 is output.

[0035] Next, a method of creating the address/submission distribution dictionary 31 will be described. The dictionary lists, for various ranges of the feature amounts, each range together with the address probability, sending probability, and pattern probability of address area candidates whose feature amounts fall within that range. Let A denote a feature-amount range, and let p(A), q(A), and r(A) denote the address probability, sending probability, and pattern probability of candidates whose feature amounts lie in A. In addition, functions hp(A), hq(A), and hr(A) are prepared for each range A and are called the address-area frequency, sending-area frequency, and pattern-area frequency, respectively.

[0036] FIG. 13 is a flowchart showing the method of creating the address/submission distribution dictionary 31. First, m mail images are prepared in advance as learning data. For the feature amounts, namely the absolute position (x, y) of the center point of an address area candidate, the minimum value xs and maximum value xl of x and the minimum value ys and maximum value yl of y are determined appropriately in advance. For example, xs is set to 0, xl to the maximum width of the m learning mail images (width being measured along the x-axis of FIG. 5), ys to 0, and yl to the maximum height of the m learning mail images (height being measured along the y-axis of FIG. 5). Widths and heights are measured in pixels, like the feature values x and y. Then, for some integer k, the $k^2$ ranges in which x runs from $x_s + i(x_l - x_s)/k$ to $x_s + (i+1)(x_l - x_s)/k$ and y runs from $y_s + j(y_l - y_s)/k$ to $y_s + (j+1)(y_l - y_s)/k$, for all combinations of i = 0, 1, 2, ..., k - 1 and j = 0, 1, 2, ..., k - 1, are considered. In this embodiment, k = 10, giving 100 ranges in total. Although the ranges are set at equal intervals for each feature amount in this embodiment, this is not essential.
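Because the ranges are equal intervals, the range containing a point can be computed directly rather than searched. A minimal sketch, assuming half-open intervals and clamping the maximum value into the last range (both conventions are assumptions); `range_index` and `cell_of` are hypothetical helper names, and the image size in the usage example is made up.

```python
def range_index(value, lo, hi, k):
    """Index i in 0..k-1 of the interval [lo + i*(hi-lo)/k, lo + (i+1)*(hi-lo)/k)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    i = int((value - lo) * k // (hi - lo))
    return min(max(i, 0), k - 1)  # clamp so that value == hi lands in the last range


def cell_of(x, y, xs, xl, ys, yl, k=10):
    """(i, j) range of the center point (x, y); k = 10 gives the 100 ranges of this embodiment."""
    return range_index(x, xs, xl, k), range_index(y, ys, yl, k)


# Hypothetical image extent: 1000 x 500 pixels.
print(cell_of(115, 53, xs=0, xl=1000, ys=0, yl=500))  # (1, 1)
```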

[0037] Specifically, the m mail images are first input (step D1 in FIG. 13). Next, the frequencies are initialized by setting the address frequency hp(A), sending frequency hq(A), and pattern frequency hr(A) to 0 for every feature-amount range A (step D2). It is then determined whether all m mail images have been processed (step D3); if any remain, one unprocessed mail image is selected from the m images (step D4). The order does not matter. For the selected mail image, the address area candidate generating means 21 generates n address area candidates from the image (step D5). Next, the feature amount extracting means 22 calculates the feature amounts of the address area candidates, namely the absolute position (x, y) of each candidate's center point (step D6).

[0038] For each address area candidate, with A denoting the feature-amount range to which the candidate's feature amounts belong, a person judges manually whether the candidate is an address area, a sending area, or a pattern area. If it is an address area, hp(A) is incremented by 1; if it is a sending area, hq(A) is incremented by 1; and if it is a pattern area, hr(A) is incremented by 1 (step D7). In step D7, the frequencies are updated by applying this processing to the address area candidates of the mail image. Steps D3 to D7 are then repeated so that every mail image is processed in the same way. When all mail images have been processed, the address probability p(A), sending probability q(A), and pattern probability r(A) for each feature-amount range are calculated using equations (8), (9), and (10) (step D8).

[0039] $$p(A) = \frac{h_p(A)}{h_p(A) + h_q(A) + h_r(A)} \qquad (8)$$

[0040] $$q(A) = \frac{h_q(A)}{h_p(A) + h_q(A) + h_r(A)} \qquad (9)$$

[0041] $$r(A) = \frac{h_r(A)}{h_p(A) + h_q(A) + h_r(A)} \qquad (10)$$
The probabilities for each feature-amount range are thus obtained, and every range A is stored in the address/submission distribution dictionary 31 in association with p(A), q(A), and r(A).
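Steps D2 to D8 reduce to counting manual labels per range and normalizing the counts with equations (8) to (10). The sketch assumes each training candidate has already been mapped to a range key (for example with an equal-interval binning) and labeled by hand; the label strings, the function name, and the sample data are hypothetical. Ranges that received no samples are simply absent, since equations (8) to (10) are undefined when all three frequencies are 0.

```python
from collections import Counter, defaultdict

LABELS = ("address", "sending", "pattern")


def build_distribution_dictionary(samples):
    """Build {range_key: (p, q, r)} from (range_key, label) pairs.

    Counting corresponds to steps D2 and D7 (hp, hq, hr); the division
    corresponds to equations (8), (9), and (10) in step D8.
    """
    counts = defaultdict(Counter)  # counts[A] holds hp(A), hq(A), hr(A)
    for key, label in samples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        counts[key][label] += 1
    table = {}
    for key, c in counts.items():
        total = c["address"] + c["sending"] + c["pattern"]
        table[key] = tuple(c[name] / total for name in LABELS)
    return table


# Made-up labeled samples: (range key, manual label).
samples = [((1, 1), "address"), ((1, 1), "address"),
           ((1, 1), "sending"), ((1, 1), "pattern"), ((0, 9), "pattern")]
print(build_distribution_dictionary(samples))
# {(1, 1): (0.5, 0.25, 0.25), (0, 9): (0.0, 0.0, 1.0)}
```

The address probability conversion dictionary 32 described later can be built the same way, with the range key taken over (p, pmax) instead of (x, y).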

[0042] FIG. 14 is a block diagram showing the configuration of the second embodiment of the present invention. In this embodiment, an address probability conversion dictionary 32 is added to the storage device 3 of the first embodiment, and an address probability conversion means 26 is added to the data processing device 2. The address probability conversion means 26 corrects the address probability, sending probability, and pattern probability obtained by the address/sending probability calculating means 23 into different address, sending, and pattern probabilities computed from them. The address probability conversion dictionary 32 stores the information that the address probability conversion means 26 needs in order to convert the old address probability and the other old probabilities into the new ones.

[0043] In the first embodiment, the address probability and sending probability of each address area candidate are calculated independently for each area, so they do not reflect the fact that exactly one address area exists among the candidates. This embodiment improves on this point to further raise the detection accuracy of the address area. First, the necessary terms are defined. For a given address area candidate, the probability, based on the feature amounts obtained by the feature amount extracting means 22, that this candidate is the address area among all the address area candidates is redefined as the address probability of the candidate. The probability that the candidate is a sending area, under the condition that some candidate other than it is the address area, is redefined as the sending probability. Likewise, the probability that the candidate is a pattern area, under the condition that some candidate other than it is the address area, is redefined as the pattern probability.

[0044] With p', q', and r' denoting the address, sending, and pattern probabilities defined here, the probabilities that the candidate is a sending area and a pattern area can be expressed as (1 - p')q' and (1 - p')r', respectively. Hereinafter, to avoid confusion, the probabilities obtained by the address/sending probability calculating means 23 are called old probabilities, and those obtained by the address probability conversion means 26 are called new probabilities. For example, when address area candidates G1, G2, G3, G4, and G5 are obtained as shown in FIG. 4, the new address probability p', new sending probability q', and new pattern probability r' are as shown in FIG. 15. In FIG. 15, among the candidates G1 to G5, for example, the new address probability of G1 is 0.04 and that of G2 is 0.08.

[0045] The address probability conversion means 26 converts the old address probability p, old sending probability q, and old pattern probability r obtained by the address/sending probability calculating means 23 into a new address probability p', new sending probability q', and new pattern probability r' based on the address probability conversion dictionary 32. For example, the address probability conversion means 26 calculates the new address probability p' from the old address probability p obtained by the address/sending probability calculating means 23 using the following equation (11).

[0046] $$p' = \frac{p}{S} \qquad (11)$$
where S is the sum of the old address probabilities over all the address area candidates.
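Equation (11) renormalizes the old address probabilities so that they sum to 1 over the candidates of one mail image, which reflects the condition that exactly one candidate is the address area. A minimal sketch; the function name is hypothetical and the input values are illustrative, not those of FIG. 6 or FIG. 15.

```python
def renormalize_address_probabilities(old_p):
    """Equation (11): p' = p / S, where S sums the old address probabilities of all candidates."""
    s = sum(old_p.values())
    if s == 0:
        raise ValueError("all old address probabilities are zero")
    return {label: p / s for label, p in old_p.items()}


# Illustrative old address probabilities (hypothetical); S = 1.5 here.
print(renormalize_address_probabilities({"A": 0.75, "B": 0.5, "C": 0.25}))
```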

[0047] Alternatively, for example, a conversion table is prepared in the address probability conversion dictionary 32 that maps the range of the old address probability p of a candidate and the range of the maximum value pmax of the old address probabilities of the other candidates to the new address probability p' of the candidate. The address probability conversion dictionary 32 in this case can be represented, for example, as in FIG. 16; because the table is large, only part of it is shown (the parts marked "..." are omitted). Let A denote a probability range, and let p'(A), q'(A), and r'(A) denote the new address probability, new sending probability, and new pattern probability of candidates whose p and pmax lie in A. In addition, functions hp(A), hq(A), and hr(A) are prepared for each range A and are called the address-area frequency, sending-area frequency, and pattern-area frequency, respectively.
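The lookup key for a conversion table like FIG. 16 pairs a candidate's own old address probability p with pmax, the largest old address probability among the other candidates on the same mail image. A sketch assuming k = 20 equal ranges on [0, 1] for both quantities, as in this embodiment, half-open intervals with 1.0 clamped into the last range, and at least two candidates; `conversion_keys` is a hypothetical name and the probabilities are made up.

```python
def conversion_keys(old_p, k=20):
    """Map each candidate label to the (i, j) range of (p, pmax) on a k x k grid over [0, 1]."""
    if len(old_p) < 2:
        raise ValueError("pmax needs at least one other candidate")

    def bin_of(v):
        return min(int(v * k), k - 1)  # clamp v == 1.0 into the last range

    keys = {}
    for label, p in old_p.items():
        # pmax: largest old address probability among the *other* candidates
        p_max = max(v for other, v in old_p.items() if other != label)
        keys[label] = (bin_of(p), bin_of(p_max))
    return keys


print(conversion_keys({"A": 0.72, "B": 0.32, "C": 0.06}))
# {'A': (14, 6), 'B': (6, 14), 'C': (1, 14)}
```

Using pmax lets the table lower a candidate's new address probability when some other candidate on the same image looks strongly like the address.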

[0048] FIG. 17 is a flowchart showing the method of creating the address probability conversion dictionary 32. First, m mail images are prepared in advance as learning data. For the probabilities p and pmax, the minimum value as and maximum value al of p and the minimum value bs and maximum value bl of pmax are determined appropriately in advance; for example, as = 0, al = 1, bs = 0, and bl = 1. Then, for some integer k, the $k^2$ ranges in which p runs from $a_s + i(a_l - a_s)/k$ to $a_s + (i+1)(a_l - a_s)/k$ and pmax runs from $b_s + j(b_l - b_s)/k$ to $b_s + (j+1)(b_l - b_s)/k$, for all combinations of i = 0, 1, 2, ..., k - 1 and j = 0, 1, 2, ..., k - 1, are considered. In this embodiment, k = 20, giving 400 ranges in total. Although the ranges are set at equal intervals for each quantity in this embodiment, this is not essential.

[0049] Specifically, the m mail images are first input (step E1 in FIG. 17). Next, the frequencies are initialized by setting the address frequency hp(A), sending frequency hq(A), and pattern frequency hr(A) to 0 for every range A (step E2). It is then determined whether all m mail images have been processed (step E3); if any remain, one unprocessed mail image is selected from the m images (step E4). The order does not matter. Next, the address area candidate generating means 21 generates n address area candidates for the selected mail image (step E5). The feature amount extracting means 22 then calculates the feature amounts of each of the n candidates, namely the absolute position (x, y) of its center point (step E6). Further, using the feature amounts obtained in step E6, the address/sending probability calculating means 23 calculates the old address probability, old sending probability, and old pattern probability of the n candidates (step E7).

[0050] For each address area candidate, with A denoting the probability range to which the candidate's old address probability and the maximum old address probability of the other candidates belong, a person judges manually whether the candidate is an address area, a sending area, or a pattern area. If it is an address area, hp(A) is incremented by 1; if it is a sending area, hq(A) is incremented by 1; and if it is a pattern area, hr(A) is incremented by 1 (step E8). In step E8, the frequencies are updated by applying this processing to the n address area candidates of the mail image. Steps E3 to E8 are then repeated so that every mail image is processed in the same way.

[0051] When all mail images have been processed, the new address probability p'(A), new sending probability q'(A), and new pattern probability r'(A) for each probability range are calculated using the following equations (12), (13), and (14) (step E9).

[0052] $$p'(A) = \frac{h_p(A)}{h_p(A) + h_q(A) + h_r(A)} \qquad (12)$$

[0053] $$q'(A) = \frac{h_q(A)}{h_p(A) + h_q(A) + h_r(A)} \qquad (13)$$

[0054] $$r'(A) = \frac{h_r(A)}{h_p(A) + h_q(A) + h_r(A)} \qquad (14)$$
The new probabilities are thus obtained, and every range A is stored in the address probability conversion dictionary 32 in association with p'(A), q'(A), and r'(A). FIG. 16 is a table that converts the old address probabilities p of two address area candidates into the new address probability p'. Naturally, the table may be based on more candidates, and it may use the old sending probability and old pattern probability in addition to the old address probability.

[0055] Next, for each address area candidate, the new sending probability q' and new pattern probability r' of the candidate are calculated from its old sending probability q and old pattern probability r using equations (15) and (16).

[0056] $$q' = \frac{q}{q + r} \qquad (15)$$

[0057] $$r' = \frac{r}{q + r} \qquad (16)$$
For example, when the address/sending probability calculating means 23 obtains the old address probability p, old sending probability q, and old pattern probability r shown in FIG. 6, the address probability conversion means 26 calculates the new address probability p', new sending probability q', and new pattern probability r' shown in FIG. 15 using equations (11), (15), and (16). Thereafter, the new address, sending, and pattern probabilities are treated as the address, sending, and pattern probabilities, respectively, and exactly the same processing as in the first embodiment is performed. For example, in the case of FIG. 15, calculating the scores with equation (1) gives the results shown in FIG. 18, in which address area candidate G5 has the highest score, 0.07. Here the functions f and g are f(u) = u and g(u) = u for an argument u.
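Equations (15) and (16) renormalize the old sending and pattern probabilities conditionally on the candidate not being the address area. Combined with the expressions (1 - p')q' and (1 - p')r' given earlier, the three outcomes for each candidate then sum to 1, since p' + (1 - p')(q' + r') = 1. A sketch assuming q + r > 0; the function names and the numbers are hypothetical.

```python
def convert_sending_and_pattern(q, r):
    """Equations (15) and (16): q' = q / (q + r), r' = r / (q + r)."""
    total = q + r
    if total == 0:
        raise ValueError("q + r must be positive")
    return q / total, r / total


def outcome_probabilities(p_new, q, r):
    """Probabilities that one candidate is the address, a sending, or a pattern area.

    p_new is the candidate's new address probability (e.g. from equation (11));
    q and r are its old sending and pattern probabilities.
    """
    q_new, r_new = convert_sending_and_pattern(q, r)
    return p_new, (1 - p_new) * q_new, (1 - p_new) * r_new


probs = outcome_probabilities(p_new=0.5, q=0.3, r=0.1)
print(probs, sum(probs))
```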

[0058]

[Effects of the Invention] As described above, according to the present invention, when the address area of a mail item is detected from its address area candidates, the sending probability of a sending area, which has the structure of an address but is not the destination address (for example, the sender's address), is additionally taken into account and subtracted from the address probability, so that sending areas are less likely to be output. This reduces the risk of erroneously detecting an area that has an address structure but is not the destination address as the address area, and improves the detection accuracy of the address area.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.

FIG. 2 is a diagram showing an example of a mail image.

FIG. 3 is a diagram showing a plurality of address area candidates generated from the mail image of FIG. 2.

FIG. 4 is a diagram showing the address area candidates of FIG. 3 without the image.

FIG. 5 is a diagram for explaining the feature amounts used by the feature amount extracting means.

FIG. 6 is a diagram showing an example of the feature amounts of each address area candidate obtained by the feature amount extracting means.

FIG. 7 is a diagram showing an example of the probabilities of each address area candidate obtained by the address/sending probability calculating means.

FIG. 8 is a diagram showing a specific example of the conversion table of the address/submission distribution dictionary.

FIG. 9 is a diagram showing an example of the scores obtained by the order determining means.

FIG. 10 is a flowchart showing the processing of the feature amount extracting means.

FIG. 11 is a flowchart showing the processing of the order determining means.

FIG. 12 is a flowchart showing another processing flow of the order determining means.

FIG. 13 is a flowchart showing the method of creating the address/submission distribution dictionary.

FIG. 14 is a block diagram showing the configuration of the second embodiment of the present invention.

FIG. 15 is a diagram showing an example of the new probabilities of each address area candidate obtained by the address probability conversion means.

FIG. 16 is a diagram showing an example of the address probability conversion dictionary.

FIG. 17 is a flowchart showing the method of creating the address probability conversion dictionary.

FIG. 18 is a diagram showing an example of the scores of each address area candidate obtained by the order determining means.

[Explanation of symbols]

1 Input device
2 Data processing device
3 Storage device
4 Output device
21 Address area candidate generating means
22 Feature amount extracting means
23 Address/sending probability calculating means
24 Order determining means
25 Output candidate determining means
26 Address probability conversion means
31 Address/submission distribution dictionary
32 Address probability conversion dictionary

Claims

1. An address area detecting device comprising: means for generating, from partial images of a mail image, a plurality of address area candidates including an address area, which is the area of address information; means for extracting, for each generated address area candidate, a feature amount serving as an index for evaluating whether the candidate is the address area; means for storing, in association with one another, the feature amount, an address probability indicating the probability that an address area candidate having that feature amount is the address area, a sending probability indicating the probability that it is an area having an address structure other than the destination address, and a pattern probability indicating the probability that it is an area having no address structure; means for converting the feature amount of each address area candidate extracted by the extracting means into the address probability, the sending probability, and the pattern probability by referring to the storing means; means for calculating a score for each address area candidate using the address probability and the sending probability among the converted probabilities and determining an order of the address area candidates based on the scores; and means for selecting a predetermined number of address area candidates from among the address area candidates based on the determined order.

2. The address area detecting device according to claim 1, wherein the order determining means, letting p be the address probability and q the sending probability of each address area candidate and letting f(x) and g(x) be two predetermined increasing functions of an argument x, calculates the score of the candidate as f(p) - g(q) and determines the order of the address area candidates in descending order of score.

3. The address area detecting device according to claim 1, wherein the order determining means, letting p be the address probability and q the sending probability of each address area candidate and letting f(x) and g(x) be two predetermined increasing functions of an argument x, calculates the score of the candidate as f(p) - g(q/(p + q)) and determines the order of the address area candidates in descending order of score.

4. The address area detecting device according to claim 1, wherein the order determining means, letting p be the address probability and q the sending probability of each address area candidate and letting f(x) and g(x) be two predetermined increasing functions of an argument x, calculates the score of the candidate as f(p)/g(q) and determines the order of the address area candidates in descending order of score.

5. The address area detecting device according to claim 1, wherein the order determining means, letting p be the address probability and q the sending probability of each address area candidate and letting f(x) and g(x) be two predetermined increasing functions of an argument x, calculates the score of the candidate as f(p)/g(q/(p + q)) and determines the order of the address area candidates in descending order of score.

6. The address area detecting device according to claim 1, wherein the order determining means, letting p be the address probability and q the sending probability of each address area candidate and letting c be a predetermined constant, calculates the score of the candidate as p - cq and determines the order of the address area candidates in descending order of score.

7. The address area detecting device according to claim 1, wherein the order determining means, letting p be the address probability and q the sending probability of each address area candidate and letting c be a predetermined constant, calculates the score of the candidate as p - cq/(p + q) and determines the order of the address area candidates in descending order of score.