KR100204618B1

KR100204618B1 - Method and system for recognition of character or graphic

Info

Publication number: KR100204618B1
Application number: KR1019960027872A
Authority: KR
Inventors: Kazutaka Yamasaki
Original assignee: Jeffrey L. Forman; International Business Machines Corporation
Priority date: 1995-08-24
Filing date: 1996-07-11
Publication date: 1999-06-15
Also published as: CN1145494A; KR970012219A; JPH0962787A; CN1100304C

Abstract

The present invention provides a method of character recognition that uses information other than the writing order of characters input online.

A method of recognizing a character or figure input online on a writing area, the method comprising: sampling the character or figure input online to extract sampling information; determining a plurality of local regions from the input character or figure based on the sampling information; obtaining a feature vector for each local region; obtaining a vector sequence based on the arrangement of the feature vectors on the writing area; and recognizing the input character or figure based on the vector sequence.

Description

Recognition method and system of character or figure

The present invention relates to a method and system for recognizing characters or figures, and more particularly to a method and system for recognizing handwritten kanji and similar characters input online.

Many techniques have been proposed for automatic systems that recognize handwritten characters or figures input online. For example, Japanese Patent Application No. 4-220410 (U.S. Patent No. 5,343,537) discloses a method and apparatus for automatically recognizing handwritten text. The recognition is based on a suitable representation of the handwriting in one or several feature vector spaces, Gaussian modeling in each space, and a mixture that takes into account the contributions of all relevant prototypes in all spaces.

Specifically, known characters, including handwriting input from a writer, are sampled as the writer writes with a stylus on an electronic tablet. Handwriting prototypes are then built from a parameter-vector representation of the sampled known characters in handwriting space. Next, as the writer writes with the stylus on the tablet, unknown characters containing the handwriting input to be recognized are sampled, and a parameter-vector representation of them in handwriting space is produced. A ballistic comparison is made between the handwriting prototypes and the parameter-vector representation of the sampled unknown characters, and this comparison includes compiling a list of candidate characters. Based on it, the likelihood that at least one of the handwriting prototypes is a candidate character to be recognized as the unknown character is evaluated. Finally, a ballistic analysis of the candidate character list is performed to recognize the sampled unknown characters containing the handwriting input to be recognized.

In addition, Japanese Patent Application No. 4-328128 discloses a method and system for handwriting recognition using a hidden Markov model.

In general, online handwritten character recognition achieves a high recognition rate when characters can be assumed to be written in the correct stroke order. The prior art described above also performs recognition on the basis of this writing order, that is, the temporal information known as stroke order. However, for characters with many strokes, such as kanji, the writing order differs from writer to writer. The same character can therefore carry different temporal information, and may be misrecognized depending on the order in which it was written. Character recognition works by comparing a model extracted and built from the input character against many models stored in advance in the recognition system's dictionary, and computing a score for each. If models for the several stroke orders expected for each character are stored in the dictionary to cover stroke-order variation, the dictionary inevitably grows. This is undesirable because it increases dictionary size and also slows recognition.

SUMMARY OF THE INVENTION The present invention was made in view of the above problems. Its object is to propose a recognition technique that is not constrained by the input stroke order or the number of strokes. Such a technique is effective for recognizing Japanese, and kanji in particular, whose characters have many more strokes than those of English and similar scripts.

Another object of the present invention is to perform character recognition using information other than the writing order of characters, that is, information other than temporal information.

FIG. 1 is a block diagram of the handwriting recognition system in this embodiment.

FIG. 2 is a flowchart of the handwriting recognition method in this embodiment.

FIG. 3 is a flowchart detailing how the vector sequence of feature vectors is obtained.

FIGS. 4A and 4B show the set of points and the local regions when 「と」 is input into the writing area of a tablet.

FIG. 5 is a table showing the points that make up each local region.

FIG. 6 is a table showing the y-coordinate of each local region.

* Explanation of reference numerals for main parts of the drawings

102: handwriting recognition program
104: front end
108: modeling element
110: computer platform
112: operating system
114: microinstruction code
116: hardware elements
118: random access memory (RAM)
120: central processing unit (CPU)
122: input/output interface
124: terminal
126: electronic input tablet
128: data storage device
130: printer

The present invention provides a method of recognizing a character or figure input on a writing area, the method comprising: sampling the input character or figure to extract sampling information; determining a plurality of local regions from the input character or figure based on the sampling information; obtaining a feature vector for each local region; obtaining a vector sequence based on the arrangement of the feature vectors on the writing area; and recognizing the input character or figure based on the vector sequence. Here, the writing area is the area in which the character or figure to be recognized exists. The vector sequence is obtained by ordering the feature vectors according to the positions of their local regions on this area. In online character recognition, the writing area is typically the frame region or underline region in which characters are entered.

Preferably, the vector sequence is determined from the order in which the local regions are arranged on the writing area, going from one direction to the opposite direction. More specifically, for a writing area expressed in x and y coordinates, the vector sequence is preferably obtained by looking up the local region that corresponds to each feature vector and ordering the vectors in descending order of y-coordinate.

Another invention provides a method of recognizing a character or figure input online on a writing area, the method comprising: sampling the character or figure input online to extract sampling information; determining a plurality of local regions from the input character or figure based on the sampling information; obtaining a feature vector for each local region; obtaining a first vector sequence of feature vectors based on the stroke order of the character or figure written online; obtaining a second vector sequence of feature vectors based on the arrangement of the feature vectors on the writing area; and recognizing the input character or figure based on the first and second vector sequences.

Preferably, the second vector sequence is determined from the order in which the local regions are arranged on the writing area, going from one direction to the opposite direction.

Preferably, the recognizing step recognizes the input character or figure using a hidden Markov model method or a DP (dynamic programming) matching method.

Another invention provides a system for recognizing a character or figure input online on a writing area, the system comprising: means for sampling the character or figure input online to extract sampling information; means for determining a plurality of local regions from the input character or figure based on the sampling information and obtaining a feature vector for each local region; means for obtaining a vector sequence based on the arrangement of the feature vectors on the writing area; and means for recognizing the input character or figure based on the vector sequence.

Preferably, the means for obtaining the vector sequence obtains it from the order in which the local regions are arranged on the writing area, going from one direction to the opposite direction.

Online character recognition is described below as a preferred embodiment of the present invention. FIG. 1 is a block diagram of the handwriting recognition system according to the present invention. The system includes a computer platform 110. Computer platform 110 has hardware elements 116, including random access memory (RAM) 118, a central processing unit (CPU) 120, and an input/output interface 122. Computer platform 110 has an operating system 112 and may also have microinstruction code 114.

Connected to computer platform 110 is a means for sampling characters or figures input online on the writing area to extract sampling information, for example an electronic writing tablet 126. Tablet 126 lets the user write a desired character or figure in the writing area with an input pen. Various peripheral devices are also connected, such as a terminal 124, a data storage device 128, and a printer 130.

A handwriting recognition program 102 runs on platform 110. Program 102 operates a front end 104, a rearrangement processing mechanism 106, and a modeling element 108. Front end 104 determines a plurality of local regions from the input character or figure based on the sampling information, and obtains a feature vector for each local region. Rearrangement processing mechanism 106 obtains a vector sequence based on the arrangement of the feature vectors on the writing area. It can do so, for example, from the order in which the local regions are arranged on the writing area going from one direction to the opposite direction. Modeling element 108 recognizes the character or figure input by the user based on the vector sequence.

FIG. 2 is a flowchart of the handwriting recognition method in this embodiment. First, the character or figure input online is sampled to extract sampling information (step 201). What is sampled is the character or figure that the user writes on the writing area of electronic input tablet 126.

Based on this sampling information, a plurality of local regions is determined from the input character or figure (step 202). Each sample point extracted in step 201 is a point defined by coordinates (X_n, Y_n) in the writing area. The writer's speed is not constant, so the spacing between these points is generally not uniform; it depends on how fast the user wrote. Each sample point is therefore normalized into equally spaced points p(X_m, Y_m). In this way, the time-dependent sample points captured by tablet 126 are converted into a time-independent representation in which all points are equidistant.
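The normalization into equally spaced points can be illustrated with arc-length resampling. The following is a minimal Python sketch that assumes linear interpolation between consecutive raw samples of a single stroke. The text does not say how the normalization is actually performed, so the function name `resample_equidistant` and the `spacing` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_equidistant(samples: List[Point], spacing: float) -> List[Point]:
    """Resample one stroke so that consecutive output points lie `spacing`
    apart along the pen path, independent of how fast the pen moved."""
    if not samples:
        return []
    out = [samples[0]]
    carry = 0.0  # path length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = spacing - carry  # distance along this segment to the next output point
        while d <= seg:
            t = d / seg  # seg > 0 here, because d > 0
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carry = seg - (d - spacing)
    return out
```

In this sketch, any leftover path shorter than `spacing` at the end of the stroke is dropped, so the final raw sample does not always appear in the output.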

Next, local regions are determined from the normalized, equally spaced points p. A local region is a characteristic portion of the input that is needed to recognize a given character or figure. A local region typically contains the start or end point of a stroke, or a point where the x- or y-coordinate reaches a maximum or minimum. Every local region consists of the same number of points (for example, 2K+1), and each character or figure has several local regions.
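One possible reading of this region selection is sketched below in Python. Windows of 2K+1 points are centred on each stroke's start point, its end point, and its strict local extrema in x or y. Positions that fall outside the stroke are filled with `None`, which plays the role of the dummy point p₀ described for FIG. 5. The function names and the strict-neighbour extremum test are assumptions. Real pen data would usually need smoothing before this test.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Window = List[Optional[Point]]  # None stands for a padding point outside the stroke


def _is_local_extremum(values: List[float], i: int) -> bool:
    """True if values[i] is a strict local maximum or minimum."""
    a, b, c = values[i - 1], values[i], values[i + 1]
    return (b > a and b > c) or (b < a and b < c)


def local_regions(stroke: List[Point], k: int) -> List[Window]:
    """Return windows of 2k+1 points centred on the stroke's start point,
    end point, and strict local extrema of x or y."""
    n = len(stroke)
    if n == 0:
        return []
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    centres = {0, n - 1}
    for i in range(1, n - 1):
        if _is_local_extremum(xs, i) or _is_local_extremum(ys, i):
            centres.add(i)
    regions = []
    for c in sorted(centres):
        # Indices outside the stroke become padding so every window has 2k+1 slots.
        regions.append([stroke[j] if 0 <= j < n else None
                        for j in range(c - k, c + k + 1)])
    return regions
```

Placing the start or end point at the centre of its window means every region, including those at stroke ends, can be handled by the same code.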

For each of the local regions obtained in step 202, a corresponding feature vector is obtained (step 203). This step, together with step 202, is performed by front end 104 of the system shown in FIG. 1.

A feature vector is a vector whose parameters describe the features of a given local region. Examples are the coordinates of each point in the region, or relationships between points such as the curvature of the stroke. It also includes parameters indicating how far each point in the local region is displaced from the other points in the same region.
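The exact parameters of the feature vector are left open in the text. The Python sketch below uses one hypothetical choice:

- the displacement (dx, dy) of every point from the region's centre point, which captures how far points are displaced from one another;
- followed by the unit direction of the chord across the window.

Padding entries (`None`) are replaced by the nearest real point, which is also an assumption. Curvature-type features are not included.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def _fill_padding(window: List[Optional[Point]]) -> List[Point]:
    """Replace padding (None) entries by the nearest real point (edge replication)."""
    pts = list(window)
    last = None
    for i, p in enumerate(pts):  # forward pass fills trailing padding
        if p is None:
            pts[i] = last
        else:
            last = p
    last = None
    for i in range(len(pts) - 1, -1, -1):  # backward pass fills leading padding
        if pts[i] is None:
            pts[i] = last
        else:
            last = pts[i]
    return pts


def feature_vector(window: List[Optional[Point]]) -> List[float]:
    """Hypothetical features: (dx, dy) of every point relative to the centre
    point, followed by the unit direction of the chord across the window."""
    pts = _fill_padding(window)
    cx, cy = pts[len(pts) // 2]
    feats: List[float] = []
    for x, y in pts:
        feats.extend((x - cx, y - cy))
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    feats.extend(((x1 - x0) / chord, (y1 - y0) / chord) if chord > 0 else (0.0, 0.0))
    return feats
```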

If a character has N feature vectors, a vector sequence giving the order of those N feature vectors is also determined in this step. As described below, however, this order is later rearranged based on the arrangement on the writing area. What is obtained here is therefore only the primary vector sequence, and its vectors may be in any order. In general, the primary vector sequence follows the stroke order, that is, the temporal order, of the character or figure input online.

A vector sequence is then obtained by ordering the feature vectors according to their arrangement on the writing area (step 204). This includes the case in which a vector sequence has already been obtained in stroke order in step 203 and is then reordered based on the arrangement on the writing area. This step is described in detail later. It is performed by rearrangement processing mechanism 106.

Finally, the input character or figure is recognized based on the vector sequence (step 205). The system models the vector sequence obtained in step 204, which reflects the arrangement on the writing area, with a hidden Markov model (HMM) or similar model. Recognition works by comparing this vector sequence in turn with the HMMs, or similar models, of the characters stored in the system's dictionary. A score is computed for each model, and the character with the best score is selected. Recognition using the DP matching method is, of course, also possible. HMM-based and similar methods inherently recognize characters based on writing order. Applying the present invention to them allows characters to be recognized with high probability even when they are not written in the correct stroke order or are written with connected strokes.
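DP matching can be illustrated with the classic dynamic-time-warping recursion, sketched below in Python 3.8+ (for `math.dist`). The sketch makes these assumptions:

- Euclidean local distance between vectors;
- the three standard predecessor moves;
- a dictionary holding one hypothetical template sequence per character;
- a lower distance counts as the better score.

This is not the HMM scoring mentioned in the text, and the names `dp_distance` and `recognize` are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dp_distance(a: List[Vector], b: List[Vector]) -> float:
    """Dynamic-time-warping distance between two vector sequences."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(seq: List[Vector], dictionary: Dict[str, List[Vector]]) -> str:
    """Return the dictionary label whose template is closest to `seq`."""
    return min(dictionary, key=lambda label: dp_distance(seq, dictionary[label]))
```

Because the input sequence here is ordered by position on the writing area, each template would also need to be stored in that same positional order.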

Step 204 is described in more detail with reference to FIG. 3. First, the y-coordinate of one local region of the character or figure obtained in step 202 is determined (step 301). A local region consists of several points whose y-coordinates span a range, so some rule is needed to assign a single y-coordinate to the region. To specify the y-coordinate y_n of the local region corresponding to a feature vector f_n, the following rule is used: when the local region consists of 2K+1 points, the y-coordinate of its (K+1)th point is defined as the y-coordinate of the region. For example, when a local region consists of five points, the y-coordinate of the third point p is taken as the y-coordinate of the region. This definition makes it easy to specify the y-coordinate of any local region.

The y-coordinate of a local region is defined in this way to provide an evaluation criterion for reordering the vector sequence of feature vectors by their arrangement on the writing area. Definitions other than this one are therefore also possible within the present invention. For example, the y-coordinates of all 2K+1 points may be obtained, and a value computed from them, such as their average, may be used as the y-coordinate of the local region.
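The two definitions can be compared side by side in the Python sketch below. The centre-point rule follows the embodiment. The mean-based rule is one example of the alternatives mentioned. Excluding padding points from the mean is an assumption, because the text does not say how dummy points would be handled. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def region_y_centre(window: List[Optional[Point]]) -> float:
    """y of the (K+1)-th point of a 2K+1-point region (the embodiment's rule)."""
    return window[len(window) // 2][1]


def region_y_mean(window: List[Optional[Point]]) -> float:
    """Alternative rule: mean y of the real (non-padding) points."""
    return mean(p[1] for p in window if p is not None)
```

The centre rule never reads a padding point, because the centre of every window is a real sample. That is why the value of the dummy point does not matter.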

In this embodiment the vector sequence is reordered by y-coordinate, but it may of course be reordered by x-coordinate instead. The essential feature of the present invention is that recognition uses a vector sequence based on the arrangement on the writing area, instead of a vector sequence based on the stroke order of characters written online. The invention is therefore not limited to this specific method. It only requires capturing the ordering of the local-region coordinates from one direction on the writing area to the opposite direction.

Next, it is determined whether the y-coordinates of all local regions have been obtained (step 302). If not, the process returns to step 301 and repeats until the y-coordinates of all local regions have been determined. When a character or figure consists of N local regions, this loop determines the y-coordinates of all N regions.

If the answer in step 302 is yes, that is, once the y-coordinates of all local regions have been obtained, array A is created (step 303). Each element of array A is a pair (f_n, y_n) made up of a feature vector and the y-coordinate of its corresponding local region.

Array A is sorted in descending order of local-region y-coordinate (step 304). Any existing sorting algorithm can be used for this sort.

The vector sequence is then determined (step 305). It consists of the feature vectors taken from the sorted array A, in sorted order.
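Steps 301 to 305 can be condensed into the short Python sketch below. It takes each region's coordinate from its centre point, builds array A as (f_n, y_n) pairs, sorts it, and keeps only the feature vectors. The `axis` and `descending` parameters are hypothetical. They are included because the text allows the x-coordinate, or the opposite direction, to be used instead. Tie-breaking is not specified in the text; this sketch relies on Python's stable sort, so regions with equal coordinates keep their input order.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

F = TypeVar("F")
Point = Tuple[float, float]


def positional_sequence(features: Sequence[F],
                        regions: Sequence[List[Optional[Point]]],
                        axis: int = 1,
                        descending: bool = True) -> List[F]:
    """Reorder feature vectors by the position of their local regions."""
    # Steps 301-302: one coordinate per region, taken from its centre point.
    coords = [region[len(region) // 2][axis] for region in regions]
    # Step 303: array A of (f_n, y_n) pairs.
    array_a = list(zip(features, coords))
    # Step 304: sort by coordinate; Python's sort is stable (also with reverse=True),
    # so regions with equal coordinates keep their original relative order.
    array_a.sort(key=lambda pair: pair[1], reverse=descending)
    # Step 305: keep only the feature vectors, in sorted order.
    return [f for f, _ in array_a]
```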

The character can then be recognized by comparing the vector sequence obtained by the above procedure (steps 301 to 305) with the models stored in the system's dictionary, as described for step 205.

Next, as a concrete example of character recognition, consider the case in which 「と」 is input into the writing area of a tablet, as shown in FIG. 4. As shown in FIG. 4A, the character input into the writing area is normalized as a set of 34 points. Each point is specified by an x-coordinate in the horizontal direction and a y-coordinate in the vertical direction (the axes are not shown). From this set of 34 normalized points, the system determines six local regions, r₁ to r₆, as shown in FIG. 4B. Each local region consists of five points. The regions are chosen to include the start and end points of each stroke of the input character, as well as the points where the x- or y-coordinate reaches a maximum or minimum.

FIG. 5 shows the relationship between the six local regions r₁ to r₆ specified in this way and the points p_n that make them up. Regions r₁, r₂, r₃, and r₆ include a point p₀ that is not shown in FIG. 4A. This dummy point serves two purposes. It places the start or end point of a stroke at the centre of its local region. It also lets regions containing a start or end point be handled by the same algorithm as all other regions. As FIG. 5 shows, the y-coordinate of a local region is that of its (K+1)th point. Since K = 2, this is the third point, which is always a real point. As long as the third point is specified, the actual value of the dummy point p₀ does not matter.

Let f₁ to f₆ be the feature vectors obtained for local regions r₁ to r₆, respectively. The vector sequence based on the input stroke order is then (f₁, f₂, f₃, f₄, f₅, f₆). FIG. 6 shows the y-coordinate of each local region in this case. Array A, which pairs each feature vector with the y-coordinate of its local region, is therefore:

Array A: (f₁, 100), (f₂, 59), (f₃, 73), (f₄, 27), (f₅, 0), (f₆, 19)

Sorting this array in descending order of y-coordinate with a sorting algorithm gives:

Array A: (f₁, 100), (f₃, 73), (f₂, 59), (f₄, 27), (f₆, 19), (f₅, 0)

Extracting only the feature vectors then gives the vector sequence (f₁, f₃, f₂, f₄, f₆, f₅), ordered by y-coordinate.
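The worked example can be checked with the few lines of Python below, which use the y-coordinates listed in FIG. 6. Strings stand in for the feature vectors f₁ to f₆. The sketch assumes that each local region yields the same feature vector whatever order the strokes were written in. Under that assumption, the second part shows that the same regions arriving in a different order produce the same positional sequence.

```python
# Array A for 「と」, using the local-region y-coordinates listed in FIG. 6.
# Strings stand in for the feature vectors f1..f6.
array_a = [("f1", 100), ("f2", 59), ("f3", 73), ("f4", 27), ("f5", 0), ("f6", 19)]


def to_positional_sequence(pairs):
    """Sort (feature, y) pairs by descending y and keep only the features."""
    return [f for f, _ in sorted(pairs, key=lambda pair: pair[1], reverse=True)]


seq = to_positional_sequence(array_a)
print(seq)  # ['f1', 'f3', 'f2', 'f4', 'f6', 'f5']

# The same six regions arriving in another order (e.g. the strokes written in
# the opposite order) yield the same positional sequence.
reordered = [array_a[i] for i in (3, 4, 5, 0, 1, 2)]
assert to_positional_sequence(reordered) == seq
```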

This vector sequence is the same even when the stroke order of 「と」 in FIG. 4A is reversed. Conventionally, the same character produced different vector sequences depending on stroke order. Several models, one per stroke order, therefore had to be stored in the dictionary for a single character. With this method, the same vector sequence is obtained regardless of stroke order, so one model per character in the dictionary is sufficient. This reduces the storage capacity of the dictionary and shortens recognition time.

In practical character recognition, characters with few stroke-order variants are often mixed with characters that have many. In such situations, the recognition rate can be improved by using both methods and taking both scores into account: the recognition method of the present invention, based on arrangement on the writing area, and a recognition method based on stroke order. It is also effective to weight the scores differently by character. For characters whose stroke order is likely to vary between writers, the score from the method of the present invention would be weighted more heavily. For characters whose stroke order is unlikely to vary, the stroke-order score would be weighted more heavily. In this case, two vector sequences must be obtained: one with the feature vectors arranged by the stroke order of the character or figure written online, and one arranged by the positions of the corresponding local regions on the writing area. Each sequence is compared with the models, and the resulting scores are combined.
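The text does not give a formula for combining the two scores. A linear blend is one simple possibility, sketched below in Python. The sketch makes these assumptions:

- each candidate character carries a stroke-order score, a positional score, and a hypothetical per-character weight;
- the weight would be larger for characters whose stroke order varies a lot between writers;
- scores are log-likelihood-like values where higher is better.

```python
from typing import Dict, Tuple


def combined_score(stroke_score: float, position_score: float, weight: float) -> float:
    """Blend the two scores; `weight` in [0, 1] is the share given to the
    positional (writing-area) score."""
    return weight * position_score + (1.0 - weight) * stroke_score


def recognize(candidates: Dict[str, Tuple[float, float, float]]) -> str:
    """candidates maps label -> (stroke_score, position_score, weight);
    higher combined scores are better."""
    return max(candidates, key=lambda label: combined_score(*candidates[label]))
```

Setting the weight to 1.0 or 0.0 reduces the blend to the purely positional method or the purely stroke-order method, respectively.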

In the present invention, the same feature vectors are obtained in whatever stroke order the user writes a character, so the number of models per character can be reduced to one. This reduces the size of the dictionary in which the models are stored and allows character recognition to be performed faster.
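Claims 8 and 11 name DP matching (or a hidden Markov model) as the recognition step. The following sketch shows DP matching of an input vector sequence against a dictionary that holds a single model sequence per character, so each character costs exactly one comparison. It assumes Euclidean distance between feature vectors and the basic symmetric step pattern of dynamic time warping; the patent does not specify these details, and the dictionary contents are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dp_distance(seq: Sequence[Vector], model: Sequence[Vector]) -> float:
    """DP matching (dynamic time warping) distance between two vector sequences."""
    n, m = len(seq), len(model)
    # d[i][j] = cost of the best alignment of seq[:i] with model[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq[i - 1], model[j - 1])  # Euclidean local distance
            d[i][j] = cost + min(d[i - 1][j],       # model[j-1] matched again
                                 d[i][j - 1],       # seq[i-1] matched again
                                 d[i - 1][j - 1])   # advance both sequences
    return d[n][m]


def recognize(seq: Sequence[Vector], dictionary: Dict[str, List[Vector]]) -> str:
    """Return the character whose single model is closest to the input sequence."""
    return min(dictionary, key=lambda label: dp_distance(seq, dictionary[label]))


# Hypothetical dictionary: one arrangement-ordered model per character.
dictionary = {
    "と": [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (1.0, 0.0)],
    "こ": [(1.0, 0.0), (1.0, 0.0)],
}
observed = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 0.0)]  # one region fewer
print(recognize(observed, dictionary))  # と
```

DP matching tolerates the missing region in `observed` by letting one input vector align with two consecutive model vectors; a hidden Markov model would instead score each character by likelihood, which is not shown here.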

Claims

1. A method of recognizing a character or a figure, the method comprising: extracting sampling information by sampling a character or a figure present in an arbitrary area; determining a plurality of local regions in the character or figure based on the sampling information; obtaining a feature vector for each of the local regions; obtaining a vector sequence of the feature vectors by arranging the feature vectors in order based on the positions of the local regions within the area where the character or figure is present; and recognizing the character or figure based on the vector sequence.

2. The method of claim 1, wherein the vector sequence is obtained based on the order of the positions of the local regions from one direction to the opposite direction on the planar area where the character is present.

3. A method of recognizing a character or figure input online, the method comprising: extracting sampling information by sampling a character or figure input in a writing area; determining a plurality of local regions from the input character or figure based on the sampling information; obtaining a feature vector for each of the local regions; obtaining a vector sequence based on the arrangement of the feature vectors on the writing area; and recognizing the character or figure based on the vector sequence.

4. The method of claim 3, wherein the vector sequence is obtained based on the order in which the local regions are arranged from one direction to the opposite direction on the writing area.

5. The method of claim 1 or 2, wherein the character is a Chinese character.

6. A method of recognizing a character or figure input online on a writing area, the method comprising: sampling the input character or figure to extract sampling information; determining a plurality of local regions from the input character or figure based on the sampling information; obtaining a feature vector for each of the local regions; obtaining a first vector sequence consisting of a plurality of the feature vectors based on the stroke order of the character or figure written online; obtaining a second vector sequence consisting of a plurality of the feature vectors based on the arrangement of the feature vectors on the writing area; and recognizing the input character or figure based on the first vector sequence and the second vector sequence.

7. The method of claim 6, wherein the second vector sequence is obtained based on the order in which the local regions are arranged from one direction to the opposite direction on the writing area.

8. The method of recognizing a character or figure of claim 1 or 2, wherein the recognizing step recognizes the character or figure using a hidden Markov model method or a DP matching method.

9. A system for recognizing a character or figure input on a writing area, comprising: means for sampling the input character or figure to extract sampling information; means for determining a plurality of local regions from the input character or figure based on the sampling information and obtaining a feature vector for each of the local regions; means for obtaining a vector sequence based on the arrangement of the feature vectors on the writing area; and means for recognizing the input character or figure based on the vector sequence.

10. The character or figure recognition system of claim 9, wherein the means for obtaining the vector sequence obtains the vector sequence based on the order in which the local regions are arranged from one direction to the opposite direction on the writing area.

11. The system of claim 9, wherein the means for recognizing performs recognition using a hidden Markov model method or a DP matching method.