US20120114245A1 - Online Script Independent Recognition of Handwritten Sub-Word Units and Words - Google Patents


Publication number
US20120114245A1
Authority
US
United States
Prior art keywords
sub
module
recognition
word
strokes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/292,145
Other versions
US8768062B2 (en)
Inventor
Lajish Vimala Lakshmanan
Sunil Kumar KOPPARAPU
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tata Consultancy Services Ltd
Original Assignee
Tata Consultancy Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tata Consultancy Services Ltd
Assigned to TATA CONSULTANCY SERVICES LIMITED. Assignment of assignors' interest (see document for details). Assignors: KOPPARAPU, SUNIL KUMAR; LAKSHMANAN, LAJISH VIMALA
Publication of US20120114245A1
Application granted
Publication of US8768062B2
Active (current legal status)
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/333 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/142 Image acquisition using hand-held instruments; Constructional details of the instruments
    • G06V30/1423 Image acquisition using hand-held instruments; Constructional details of the instruments the instrument generating sequences of position coordinates corresponding to handwriting

Definitions

  • the present invention relates to a method and system for online script independent recognition of handwritten sub-word units and words. More particularly, the present invention relates to a system and method which enable online, script independent recognition of sub-word units and words by recognizing the individual written strokes prior to recognition of the sub-word units and words.
  • Short messaging service (SMS), chatting and e-mails are some of the common communication modes used by people all over the world. These communication modes are cost effective, easy and comfortable.
  • the present day input modes of communication means often tend to be less user friendly for individuals from places like India, especially because of the many existing Indic (Indian) language scripts. Further, communicating by short messages and e-mails in the scripts of these languages using a conventional keyboard or mobile keypad is both difficult and time consuming.
  • online handwritten character recognition is of prime importance especially in the context of communicating short messages and e-mails for script independent languages.
  • U.S. Pat. No. 5,550,931 titled “Automatic handwriting recognition using both static and dynamic parameters” provides a method and apparatus for recognizing handwritten characters in response to an input signal from a handwriting transducer.
  • Although the '931 patent provides a feature extraction and reduction procedure that relies on static or shape information, it fails to relate the temporal order in which points are captured by an electronic tablet.
  • U.S. Pat. No. 6,011,865 titled “Hybrid on-line handwriting recognition and optical character recognition system” provides a method and a system for hybrid on-line handwriting recognition and optical character recognition.
  • Although the '865 patent provides a handwriting recognition system and method that employs both online and off-line handwriting recognition to achieve a recognition accuracy that is improved over the use of either technique alone, it fails to perform feature extraction and spatio-temporal analysis for ascertaining whether the recognized sequences of strokes are valid, which would enhance character recognition accuracy.
  • U.S. Pat. Nos. 4,284,975, 6,389,166 and 4,365,235 disclose a pattern recognition system operating in particular for Chinese handwritten characters, online handwritten Chinese character recognition apparatus based on character shapes and a Chinese/Kanji online recognition system consisting of tablet electronics module, a signal filter and segment integration unit, a base stroke classification unit, a symbol element recognition unit and a symbol recognition output table respectively.
  • U.S. Pat. No. 7,587,087 titled “On-line handwriting recognition” discloses a method and a device for on-line handwriting recognition; wherein at least one auxiliary line is displayed on a touch sensitive panel.
  • Each of the auxiliary lines constitutes a portion of more than one character of a character set.
  • a character of a character set is drawn on the touch sensitive panel by completing one of the at least one auxiliary line into the character. The drawn character is recognized on the basis of said completion.
  • Although the '087 patent relates to online handwriting recognition, it fails to recognize the strokes leading to recognition of the character. Instead, the character is recognized only on completion of writing the character.
  • US patent application number 20060126936 titled “System, method, and apparatus for triggering recognition of a handwritten shape” discloses a technique that uses repetitive and reliably recognizable parts of handwriting, during digital handwriting data entry, to trigger recognition of digital ink and to repurpose handwriting task area properties.
  • Although the '936 application discloses a system and method for handwritten shape recognition, it fails to perform feature extraction and spatio-temporal analysis for ascertaining whether the recognized sequences of strokes are valid, which would enhance character recognition accuracy.
  • the recognition technique of '936 patent application attempts to find the character that most closely matches the strokes entered on the tablet and returns the results on run instead of showing results when the user finishes writing.
  • US patent application number 20080159625 titled “System, Method and Apparatus for Automatic Segmentation and Analysis of Ink Stream” discloses a technique that provides for real-time segmentation of handwritten traces during data entry into a computer.
  • Although the '625 application discloses a system and method for automatic segmentation and analysis of an ink stream, it fails to provide feature extraction and lexicon based domain specific word knowledge for recognition of characters and words.
  • US patent application number 20090003705 titled “Feature Design for HMM Based Eastern Asian Character Recognition” provides a method for online recognition of East Asian characters that includes acquiring time sequential, online ink data for a handwritten East Asian character; conditioning the ink data to produce conditioned ink data, where the conditioned ink data includes information as to the writing sequence of the handwritten East Asian character; and extracting features from the conditioned ink data, where the features include a tangent feature, a curvature feature, a local length feature, a connection point feature and an imaginary stroke feature.
  • Although the '705 application discloses a system and method for Eastern Asian character recognition, it fails to identify and construct a primitive stroke database which encompasses the handwritten script, with a recognition engine that recognizes these primitives prior to character and word recognition, and it does not provide lexicon based domain specific word knowledge for identification of characters and words.
  • PCT application number 2006090404 titled “System, Method, and Apparatus for Accommodating Variability in Chunking the Sub-Word Units of Online Handwriting” provides a technique for automatic real-time segmentation of an ink stream that does not require learning any chunking methodology, style of writing, and/or a predefined symbol set. In one example embodiment, this is achieved by drawing one or more strokes associated with a desired word of a script in one or more boxes provided on a digitizer screen using a pen.
  • Although the '404 application discloses a system and method for online handwriting recognition, it fails to provide feature extraction and spatio-temporal analysis for ascertaining whether the recognized sequences of strokes are valid, which would enhance character recognition accuracy.
  • the present invention provides a method and system for online script independent recognition of handwritten sub-word unit and words. More particularly the present invention relates to a system and method which enables online recognition of script independent sub-word unit and words by recognizing the written individual strokes prior to recognition of sub-word unit and words.
  • the principal object of the invention is to provide a system and method for online script independent recognition of handwritten sub-word units and words.
  • Another object of the invention is to enable online script independent recognition of handwritten sub-word unit and words by recognizing the written individual strokes prior to recognition of sub-word unit and words.
  • Yet another object of the invention is to provide a system and method to enable use of online script independent recognition of handwritten sub-word unit and words engine on the communication means.
  • Yet another object of the invention is to provide a system and a method for online script independent recognition of handwritten sub-word unit and words through identification of the primitive strokes and the structure of the written language script.
  • Yet another object of the present invention is to overcome the existing challenges in online handwriting recognition for scripts, such as but not limited to the large size of the sub-word unit set, the strong similarity between different sub-word units in the script and the huge variation in writing style, by providing a system and method that focus on stroke identification prior to sub-word unit and word recognition.
  • Yet another object of the invention is to provide a system and method for identification of a small (compared to the size of the sub-word unit set of the language) set of primitives, which encompasses a script; wherein the handwriting recognition engine primarily recognizes these primitives prior to sub-word units and words.
  • Yet another object of the invention is to provide a system and method for modeling and representing the stroke using Fuzzy Directional Feature (FDF) set.
  • Yet another object of the invention is to define rule sets for sub-word unit formation from a sequence of strokes and make use of the spatio-temporal knowledge of the script to ascertain the validity of the recognized sequence of strokes.
  • Yet another object of the invention is to provide an easy to use and robust system for online script independent recognition of handwritten sub-word unit and words.
  • the present invention discloses a system and method for online script independent recognition of handwritten sub-word unit and words.
  • the user provides input in the form of online script independent handwritten text input via the input means of the communication means.
  • a method for online handwritten sub-word unit recognition on a communication means using an application stored in a memory of a communication means; wherein the said method comprises the processor implemented steps of:
  • FIG. 1 of the present invention illustrates the formation of the phrase “Mera Bharat” in Devanagari script.
  • FIG. 2 of the present invention illustrates the word construction from sub-word unit, which are comprised of primitive strokes.
  • FIG. 3 of the present invention illustrates the steps involved in creating the primitive stroke database.
  • FIG. 4 of the present invention illustrates the architecture of online handwritten sub-word unit and word recognition system of the present invention for Indian languages.
  • FIG. 5 of the present invention illustrates the effect of noise removal using smoothing on a Devanagari sub-word unit.
  • FIG. 6 of the present invention illustrates the critical points extracted on the smoothed Devanagari handwritten sub-word unit.
  • FIG. 7 of the present invention illustrates how the angle θ contributes to two directions (1, 2) with different fuzzy membership values (green and red dot).
  • FIG. 8 of the present invention illustrates a block diagram indicating the steps involved in primitive stroke recognition.
  • FIG. 9 of the present invention illustrates the procedure adapted for word and sub-word unit boundary detection and stroke extraction and segmentation.
  • FIG. 10 of the present invention illustrates the methodology used for sub-word unit recognition.
  • FIG. 11 of the present invention illustrates the approach used by the system of the present invention for word recognition.
  • FIG. 12 of the present invention illustrates an exemplary embodiment, where primitives (m, ou, R, A, Ab, ***) are combined together to form sub-word unit, resulting into words.
  • FIG. 13 of the present invention illustrates a typical online Devanagari Paragraph Data collected using an electronic pen device.
  • the script independent modules comprise the stroke extraction and separation module, pre-processing module, feature extraction module, stroke level recognition module, evaluation and error analysis module, word level recognition module and sub-word unit level recognition module; whereas the spatio-temporal analysis module, the rules based sub-word unit creation module and the lexicon based word level knowledge or language model dictionary are script dependent modules.
  • the present invention provides a method for online script independent recognition of handwritten sub-word unit and words.
  • sub-word unit refers to a member of alphabetic, characters or composite characters, logographic, and/or phonetic/syllabic character set, which includes syllables, alphabets, numerals, punctuation marks, consonants, consonant modifiers, vowels, vowel modifiers, and special characters, and/or any combination thereof.
  • these sub-word units together form a word.
  • the vowels following a consonant are orthographically indicated by signs called matras to form a consonant vowel combination.
  • the modifier symbols are normally attached to the top, bottom, left or right of the base sub-word unit which is highly dependent on the consonant-vowel pair.
  • the sub-word unit in Devanagari script refers to an “akshara” or “samyukthakshara”.
  • FIG. 1 of the present invention illustrates the formation of the phrase “Mera Bharat” in Devanagari script.
  • the consonants, vowels, matras and the consonant/vowel modifiers constitute the entire alphabet set; wherein these composite sub-word units are joined together by a horizontal line called shirorekha 10 to form words 20 as shown in FIG. 2 .
  • FIG. 2 of the present invention illustrates the word construction from sub-word units, which are comprised of primitive strokes.
  • sub-word units 30 are made up of multiple strokes 40 ; wherein the sub-word units 30 are identified on recognizing a sequence of strokes 40 that make a sub-word unit 30 .
  • the recognition engine primarily recognizes these primitive strokes 40 prior to sub-word units 30 and words 20 .
  • FIG. 3 of the present invention illustrates the steps involved in creating the primitive stroke database.
  • FIG. 13 illustrates a typical online Devanagari Paragraph Data collected using an electronic pen device.
  • the online handwritten text input 90 provided in step 50 is then further analyzed and the individual strokes 40 are separated and modeled using the Fuzzy Directional Features (FDF) 160 .
  • the separated and modeled strokes 40 are extracted using the feature extraction technique 170 ; wherein the strokes 40 are extracted based on the feature set i.e. directional properties of the curve connecting two consecutive critical points identified on a stroke using fuzzy directional features (FDF) 160 .
  • the identification of the curvature points 150 is considered to be a prerequisite for fuzzy directional feature extraction 160 .
  • the extracted strokes are further classified into 69 primitive strokes and have been further clustered into Devanagari primitive stroke sets to form a primitive stroke database for Devanagari script.
  • stroke level recognition module 180 is script independent even though the shape of the strokes 40 and the number of strokes 40 to form a sub-word unit 30 and further a word 20 might vary from one language script to another language script.
  • FIG. 4 of the present invention illustrates the architecture of online script independent recognition of handwritten sub-word unit and words.
  • the online handwritten text input 90 is provided by the user in his/her handwriting to the communication means using the user interface.
  • the communication means includes, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), palm-top, mobile digital assistant, computer, laptop, notebook, personal computer or any portable communication device.
  • the input means comprises an electronic pen, stylus or stick; wherein the script independent handwritten text is written on a writing panel, an electronic tablet or the pressure sensitive touch screen of the communication means.
  • the online handwritten text input 90 provided by the user through the user interface of the communication means is typically the trace of a pen between a pen-down and a pen-up event, i.e. a set of (x, y) points. These points are uniformly sampled in time but non-uniformly sampled in space.
  • a stroke can be represented by a variable number of 2D points which are in a time sequence.
  • an online script would be represented as
  • n varies depending on the size of the stroke 40 and also the time taken to write the stroke 40 .
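The stroke representation described above can be sketched as follows. This is a minimal illustration, not the patent's data format: the `Stroke` class, the `split_into_strokes` helper and the `(x, y, pen_down)` event tuple are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    """One pen-down to pen-up trace: time-ordered (x, y) samples."""
    points: List[Point]

    def __len__(self) -> int:
        # n in the text: the number of samples, which varies per stroke.
        return len(self.points)


def split_into_strokes(events):
    """Group (x, y, pen_down) samples into strokes, closing one at each pen-up.

    The event format is a hypothetical stand-in for whatever the digitizer emits.
    """
    strokes, current = [], []
    for x, y, pen_down in events:
        if pen_down:
            current.append((x, y))
        elif current:
            strokes.append(Stroke(current))
            current = []
    if current:  # input ended while the pen was still down
        strokes.append(Stroke(current))
    return strokes
```

Because sampling is uniform in time, a slowly written stroke yields more points than a quickly written one of the same shape, which is why `len(stroke)` differs between strokes.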
  • the online handwritten text input 90 acquired initially is further subjected to spatio-temporal analysis module (B 1 ) 100 of individual strokes.
  • the spatio-temporal analysis module (B 1 ) 100 provides the ability to segment a paragraph of online handwritten text input 90 data into words 20 based on shirorekha identification 10 in case of Devanagari script, followed by identification of matras by identifying the relative position of the strokes 40 .
  • spatio-temporal analysis module 100 may be used to improve the performance of the stroke recognition.
  • the stroke recognition can be constrained to only the reference matras.
  • the individual strokes 40 are then recognized to be one of the 69 primitives of Devanagari script of the primitive stroke database (for Devanagari script) 80 .
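One way to picture how spatial position can constrain stroke recognition is sketched here. It assumes screen coordinates with y growing downward, a known y position for the shirorekha, and a made-up two-zone split; the zone names and primitive labels are hypothetical and do not come from the 69-primitive database.

```python
def zone_of(stroke, headline_y):
    """Return 'upper' if the stroke lies entirely above the headline, else 'core'.

    Assumes y grows downward, so 'above' means smaller y values.
    """
    ys = [y for _, y in stroke]
    return "upper" if max(ys) < headline_y else "core"


def restrict_candidates(stroke, headline_y, labels_by_zone):
    """Limit the primitive labels a recognizer has to consider for this stroke."""
    return labels_by_zone[zone_of(stroke, headline_y)]


# Hypothetical label sets, for illustration only.
labels_by_zone = {
    "upper": ["upper_matra_a", "upper_matra_b"],
    "core": ["core_primitive_a", "core_primitive_b", "core_primitive_c"],
}
candidates = restrict_candidates([(5, 2), (6, 3)], headline_y=10, labels_by_zone=labels_by_zone)
```

Shrinking the candidate set in this way reduces the number of classes the stroke recognizer must separate, which is the performance benefit the text refers to.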
  • the online handwritten text input 90 is further subjected to either stroke extraction and separation module (B 2 ) 110 and further to stroke level recognition module (A) 180 or sub-word unit level recognition module (C) 200 or word level recognition module (D) 220 on the basis of relative spatial information of each identified primitive stroke 40 .
  • FIG. 3 illustrates the steps involved in creating the primitive stroke database.
  • the extracted, separated and identified strokes are further subjected to the algorithms of the pre-processing module 140 ; wherein the strokes pass through the noise removal module 120 and, optionally, the size normalization module 130 .
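The optional size normalization step could look like the sketch below. The source does not say how normalization is done; scaling the stroke into a square of side `size` while preserving aspect ratio is an assumption, and `normalize_size` is a name chosen here.

```python
def normalize_size(points, size=1.0):
    """Translate a stroke to the origin and scale its longer side to `size`.

    Aspect ratio is preserved. A degenerate stroke (single point) maps to the origin.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y)
    if extent == 0:
        return [(0.0, 0.0) for _ in points]
    scale = size / extent
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]
```

Direction-based features are unaffected by uniform scaling, which may be why the text treats this step as optional.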
  • noise is inherent in online handwritten text input 90 and severely affects the performance of online sub-word unit and word recognition algorithms. When the pen or stylus moves fast, the data points are very sparse; when it moves slowly, the data points are contaminated by high frequency noise.
  • noise that contributes to the noisy data can be further classified into two types based on the following circumstances:
  • FIG. 5 of the present invention illustrates the effect of noise removal using smoothing on a Devanagari sub-word unit.
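The smoothing step can be illustrated with a simple moving average. The source only says that noise is removed by smoothing, so the filter type and the `window` parameter are assumptions made for this sketch.

```python
def smooth(points, window=3):
    """Replace each sample by the mean of its neighbours within `window` samples.

    The window is truncated at the stroke ends, so the output has the same
    number of points as the input.
    """
    half = window // 2
    n = len(points)
    smoothed = []
    for i in range(n):
        segment = points[max(0, i - half):min(n, i + half + 1)]
        mean_x = sum(x for x, _ in segment) / len(segment)
        mean_y = sum(y for _, y in segment) / len(segment)
        smoothed.append((mean_x, mean_y))
    return smoothed
```

A wider window suppresses more of the high frequency jitter seen in slow pen movement, at the cost of rounding off genuine sharp turns that later serve as critical points.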
  • feature extraction module 170 involves two stages, viz. identification of critical points (also called curvature points) 150 and fuzzy directional feature extraction 160 .
  • Identification of critical points 150 is based on the directional properties of the curve and the critical points occur at the points on the handwritten strokes where there is a large curvature change; wherein identification of the critical points is a prerequisite for feature extraction.
  • the critical points are extracted from the smoothed handwritten text data provided by the user.
  • x i and y i sequences are treated separately and the critical points are computed for each of these sequences.
  • the first difference x′ i is calculated as
  • x′ is used to compute the critical point in x sequence.
  • Point i is considered as a critical point if and only if
  • the critical points for the y sequence are calculated in the similar way.
  • the final list of critical points is the union of all the points marked as critical points in both the x and the y sequence.
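A possible reading of the critical point procedure is sketched below. The exact first-difference formula and the criticality test are not reproduced in this text, so the sketch assumes that a point is critical where the sign of the first difference changes; keeping the two stroke endpoints is a further assumption.

```python
def sign(value):
    """Return -1, 0 or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def critical_indices_1d(seq):
    """Indices where the sign of the first difference of seq changes (assumed rule)."""
    signs = [sign(seq[i + 1] - seq[i]) for i in range(len(seq) - 1)]
    return {i + 1 for i in range(len(signs) - 1) if signs[i] != signs[i + 1]}


def critical_points(points):
    """Union of the critical indices found separately in the x and y sequences."""
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    indices = critical_indices_1d(xs) | critical_indices_1d(ys)
    indices |= {0, len(points) - 1}  # assumption: endpoints are always kept
    return sorted(indices)
```

For the stroke `[(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)]` the x sequence is monotone and the y sequence turns at index 2, so the result is `[0, 2, 4]`.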
  • FIG. 6 of the present invention illustrates the critical points extracted on the smoothed Devanagari handwritten sub-word unit.
  • the second stage of feature extraction is fuzzy directional feature extraction 160 .
  • the algorithm for fuzzy directional feature extraction 160 is as follows:
  • let $k$ be the number of critical points (denoted by $c_1, c_2, \ldots, c_k$) extracted from a stroke of length $n$; wherein $k$ is usually smaller than $n$.
  • the $k$ critical points form the basis for extraction of the fuzzy directional features. Firstly, the angle between two critical points, say $c_l$ and $c_{l+1}$, is computed as
  • $\theta_l = \tan^{-1}\left(\dfrac{y_l - y_{l+1}}{x_l - x_{l+1}}\right)$
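The angle computation can be sketched as follows. The formula above uses the plain inverse tangent; this sketch swaps in `math.atan2`, which is an assumption made here so that the angle covers the full 0 to 360 degree range needed to distinguish eight directions and so that vertical segments do not divide by zero.

```python
import math


def segment_angles(critical):
    """Angle in degrees, in [0, 360), of each segment joining consecutive critical points."""
    angles = []
    for (x0, y0), (x1, y1) in zip(critical, critical[1:]):
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
        angles.append(angle)
    return angles
```

With k critical points this returns k-1 angles, one per pair of consecutive critical points.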
  • $\theta_l$, for $l = 1, \ldots, k-1$, is the angle between two consecutive critical points (where $k$ is the total number of critical points) in a handwritten primitive, and $d_1, \ldots, d_8$ are the respective directions.
  • the fuzzy membership values assigned to each direction are represented as $m^{1,2}_{1,\ldots,k-1}$ and the corresponding feature vector values as $f_1, \ldots, f_8$. Further, the sum of the membership values in a particular row, as represented in Table 2, is always 1.
  • FDF 160 is calculated by taking average across the columns, so as to form a vector of dimension eight.
  • the mean is further calculated as follows;
  • these mean values are used to construct 8 directional FDF 160 to represent a stroke 40 .
  • the membership function distributes the angle between two critical points over two directions with different membership values. In the commonly used directional features, only one direction is associated with each θ (the angle between two consecutive critical points).
  • FIG. 7 of the present invention illustrates how the angle θ contributes to two directions (1, 2) with different fuzzy membership values (green and red dot).
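The fuzzy directional feature computation can be sketched as below. The source states that each angle contributes to two of eight directions, that each row of memberships sums to 1, and that the feature is the column-wise mean; the linear (triangular) membership function and the placement of the directions at multiples of 45 degrees are assumptions made for this sketch.

```python
def fuzzy_memberships(angle_deg, n_dirs=8):
    """Split one angle between its two neighbouring directions; the row sums to 1."""
    width = 360.0 / n_dirs
    a = angle_deg % 360.0
    index = int(a // width)
    frac = (a - index * width) / width  # 0 at the lower direction, 1 at the upper
    lower = index % n_dirs
    upper = (index + 1) % n_dirs
    row = [0.0] * n_dirs
    row[lower] += 1.0 - frac
    row[upper] += frac
    return row


def fdf(angles_deg, n_dirs=8):
    """Average the membership rows column-wise to get one n_dirs-dimensional vector."""
    rows = [fuzzy_memberships(a, n_dirs) for a in angles_deg]
    if not rows:
        return [0.0] * n_dirs
    return [sum(column) / len(rows) for column in zip(*rows)]
```

An angle of 22.5 degrees contributes 0.5 to each of the first two directions, whereas a crisp directional feature would have to assign it wholly to one of them. The resulting vector has a fixed length of eight regardless of how many critical points the stroke has.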
  • the extracted handwritten text data is further subjected to stroke level recognition module (A) 180 .
  • FIG. 8 of the present invention illustrates a block diagram indicating the steps involved in primitive stroke recognition.
  • the stroke level recognition module (A) 180 is described in detail in FIG. 8 ; wherein the method of recognizing primitive strokes consists of the following two phases:
  • the primitive stroke extraction and separation (B 2 ) 110 and the stroke level recognition (A) 180 are performed prior to the sub-word unit level recognition 200 .
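Stroke level recognition against a primitive stroke database can be sketched as a nearest-prototype lookup over feature vectors. The text here does not say which classifier is used, so the Euclidean nearest-neighbour rule, the `classify_stroke` name and the prototype dictionary are all assumptions for illustration.

```python
import math


def classify_stroke(feature, prototypes):
    """Return (label, distance) of the stored prototype closest to `feature`.

    `prototypes` maps a primitive label to a reference feature vector of the
    same length as `feature`.
    """
    best_label, best_distance = None, math.inf
    for label, reference in prototypes.items():
        distance = math.dist(feature, reference)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance
```

Returning the distance alongside the label lets a later stage reject low-confidence matches instead of forcing every stroke into one of the known primitives.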
  • the rule based sub-word unit formation 210 uses the sub-word unit boundary information along with the relative position of the strokes to ascertain whether the recognized sequence of strokes is valid. If the strokes are valid, the sub-word unit level recognition 200 passes the processed online handwritten text data 90 to the word recognition level (D) 220 . If the strokes are not valid, they are further evaluated and analyzed for possible errors using the evaluation and error analysis process 190 .
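The validity check and routing described above can be sketched with a lookup table of allowed stroke sequences. The rule table, the labels `p1`, `p2`, `U1` and the function names are hypothetical; real rules would also take the relative stroke positions into account.

```python
def form_sub_word_unit(stroke_labels, rules):
    """Return (is_valid, unit) for a recognized sequence of primitive labels."""
    unit = rules.get(tuple(stroke_labels))
    return unit is not None, unit


def route(stroke_labels, rules):
    """Send valid sequences on to word recognition, invalid ones to error analysis."""
    is_valid, unit = form_sub_word_unit(stroke_labels, rules)
    if is_valid:
        return "word_recognition", unit
    return "error_analysis", list(stroke_labels)


# Hypothetical rule set: sequences of primitives that form a sub-word unit.
rules = {("p1", "p2"): "U1", ("p3",): "U2"}
```

Here `route(["p1", "p2"], rules)` yields `("word_recognition", "U1")`, while an unlisted sequence such as `["p2", "p1"]` is handed to error analysis together with the offending labels.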
  • FIG. 9 of the present invention illustrates the procedure adapted for word and sub-word unit boundary detection and stroke extraction and segmentation.
  • the online handwritten text input 90 is provided by the user through the user interface of the communication means.
  • the online handwritten text input 90 acquired initially is further subjected to spatio-temporal analysis module (B 1 ) 100 of individual strokes 40 .
  • the spatio-temporal analysis module (B 1 ) 100 provides the ability to segment a paragraph of online handwritten text input 90 data into words 20 based on shirorekha identification 10 in case of Devanagari script, followed by identification of matras by identifying the relative position of the strokes 40 .
  • the spatio-temporal analysis module 100 may be used to improve the performance of the stroke recognition. For example, once a matra is identified based on the spatial position of the stroke, the stroke recognition can be constrained to only the reference matras. The individual strokes are then recognized to be one of the 69 primitives of the primitive stroke database for Devanagari script.
  • the Devanagari handwritten text 90 can be further subjected to either stroke extraction and segmentation (B 2 ) 110 and further to stroke level recognition module 180 or sub-word unit recognition level module 200 or word level recognition module 220 on the basis of relative spatial information of each identified primitive stroke 40 .
  • the word boundaries are identified using the word boundary segmentation module 260 from the online handwritten text input 90 and the output words B 11 are then further subjected to sub-word unit boundary segmentation module 270 to identify the sub-word units B 12 .
  • each stroke is segmented and then sent to the primitive stroke recognition module 180 for recognition of the individual strokes.
  • the language script specific rules and spatio-temporal information are used to detect the word and sub-word unit boundaries.
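Word boundary detection from spatial information can be sketched with a horizontal gap rule. This is a generic stand-in, not the script specific rules referred to above (for Devanagari the text relies on shirorekha identification); the `gap_threshold` parameter and the left-to-right ordering are assumptions.

```python
def segment_words(strokes, gap_threshold):
    """Group strokes into words wherever the horizontal gap exceeds gap_threshold.

    Each stroke is a list of (x, y) points. Strokes are ordered by their left
    edge, so the original writing order is not preserved in the output.
    """
    boxes = sorted(
        ((min(x for x, _ in s), max(x for x, _ in s), s) for s in strokes),
        key=lambda box: box[0],
    )
    words, current, right_edge = [], [], None
    for left, right, stroke in boxes:
        if current and left - right_edge > gap_threshold:
            words.append(current)
            current, right_edge = [], None
        current.append(stroke)
        right_edge = right if right_edge is None else max(right_edge, right)
    if current:
        words.append(current)
    return words
```

The same idea applied with a smaller threshold inside one word gives a crude sub-word unit segmentation, though overlapping modifiers above or below a base unit would need the positional rules the text mentions.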
  • FIG. 10 of the present invention illustrates the methodology used for sub-word unit recognition.
  • the online handwritten text input 90 is provided by the user through the user interface of the communication means.
  • the online handwritten text input 90 acquired initially is further subjected to spatio-temporal analysis module (B 1 ) 100 of individual strokes 40 .
  • the spatio-temporal analysis module 100 provides the ability to segment a paragraph of online handwritten text input 90 data into words 20 based on shirorekha identification 10 in case of Devanagari script, followed by identification of matras by identifying the relative position of the strokes 40 .
  • the spatio-temporal analysis module 100 may be used to improve the performance of the stroke recognition. For example, once a matra is identified based on the spatial position of the stroke, the stroke recognition can be constrained to only the reference matras. The individual strokes are then recognized to be one of the 69 primitives of the primitive stroke database for Devanagari script.
  • the online handwritten text input 90 is further subjected to stroke extraction and segmentation module 110 and further to stroke level recognition module 180 on the basis of relative spatial information of each identified primitive stroke 40 .
  • each stroke is segmented and then sent to the primitive stroke recognition module 180 for recognition of the individual strokes.
  • the primitive stroke extraction and separation module (B 2 ) 110 and the stroke level recognition module (A) 180 are performed prior to the sub-word unit level recognition module 200 .
  • the rule based sub-word unit formation 210 uses the sub-word unit boundary information along with the relative position of the strokes to ascertain whether the recognized sequence of strokes is valid. If the strokes are valid, the sub-word unit level recognition 200 passes the processed online handwritten text data 90 to the word recognition level (D) 220 . If the strokes are not valid, they are further evaluated and analyzed for possible errors using the evaluation and error analysis process 190 .
  • FIG. 11 of the present invention illustrates the approach used by the system of the present invention for word recognition.
  • the online handwritten text input 90 is provided by the user to the communication means using the user interface.
  • the online handwritten text input 90 acquired initially is further subjected to spatio-temporal analysis module (B 1 ) 100 of individual strokes 40 .
  • the spatio-temporal analysis module 100 provides the ability to segment a paragraph of online handwritten text input 90 data into words 20 based on shirorekha identification 10 in case of Devanagari script, followed by identification of matras by identifying the relative position of the strokes 40 .
  • the word recognition module 220 is based on a lexicon based word knowledge 240, making it adaptable to any Indian language. The steps involved are illustrated in FIG. 11; wherein the output of the sub-word unit recognition level module (C) 200 as described in FIG. 10, and the word boundary information (B 1 ) 100 as described in FIG. 9, along with the primitive stroke extraction results (B 2 ) 110 from FIG. 9, are used for word recognition in the word recognition module (D) 220.
  • the words in the lexicon based word knowledge 240 are an important aspect of achieving acceptable accuracy for online handwritten sub-word unit and word recognition. Further the lexicon based word knowledge 240 is used for verifying and improving the word recognition results as represented in FIG. 11 .
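  • The lexicon based verification described above can be sketched as follows. This is only a minimal illustration: the text does not state how a recognized word is compared against the lexicon, so the use of a string similarity score (Python's standard `difflib`), the function name `verify_word`, the `cutoff` value and the romanized sample words are all assumptions made for the sketch.

```python
import difflib

def verify_word(recognized, lexicon, cutoff=0.6):
    """Return the recognized word if it is in the lexicon, else the closest
    lexicon entry, else None.

    The similarity measure (difflib's ratio) and the cutoff are assumptions
    of this sketch; the source only says the lexicon is used for verifying
    and improving word recognition results.
    """
    if recognized in lexicon:
        return recognized
    matches = difflib.get_close_matches(recognized, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical lexicon using romanized forms of the words in FIG. 1.
lexicon = ["mera", "bharat"]
corrected = verify_word("bharet", lexicon)
```

  • A word for which no lexicon entry is close enough yields `None`, which a caller could treat as a trigger for further error analysis.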
  • FIG. 12 of the present invention illustrates an exemplary embodiment, where primitive strokes 40 (m, ou, R, A, Ab, ***) are combined together to form sub-word units 30 , resulting into words 20 .
  • the formation of Devanagari sub-word unit by the concatenation of sequence of primitives strokes 40 and hence words 20 by a sequence of sub-word units 30 is illustrated in FIG. 12 ; wherein the Devanagari sub-word units are recognized based on the primitives and a sequence of primitives are analyzed to identify a sub-word unit.
  • the rules for sub-word unit formation from a sequence of strokes are formed for each sub-word unit in Devanagari.
  • the rules set along with the primitive recognition results for the recognition of sub-word units are used.
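  • As an illustration of how a rule set could turn a sequence of recognized primitives into sub-word units, the following sketch uses a lookup table and greedy longest-match. The rule table contents, the unit names `U1` to `U3`, the function name and the greedy matching strategy are all hypothetical; the actual Devanagari rules are not reproduced in this text. Only the primitive labels (m, ou, R, A, Ab) are taken from FIG. 12.

```python
# Hypothetical rule table: a tuple of primitive-stroke labels maps to a
# placeholder sub-word unit name. These are not the real Devanagari rules.
RULES = {
    ("m", "A"): "U1",
    ("R",): "U2",
    ("Ab", "A"): "U3",
}

def form_sub_word_units(primitives, rules=RULES):
    """Group recognized primitives into sub-word units.

    Returns a list of unit names, or None when some stretch of the sequence
    matches no rule (i.e. the stroke sequence is not valid and would be
    handed to error analysis). Greedy longest-match is an assumption.
    """
    max_len = max(len(key) for key in rules)
    units, i = [], 0
    while i < len(primitives):
        for size in range(min(max_len, len(primitives) - i), 0, -1):
            key = tuple(primitives[i:i + size])
            if key in rules:
                units.append(rules[key])
                i += size
                break
        else:
            return None  # no rule covers the strokes starting at position i
    return units

valid = form_sub_word_units(["m", "A", "R"])
invalid = form_sub_word_units(["m", "ou"])
```

  • Returning `None` for an unmatched sequence mirrors the validity check in the description: valid sequences go on to word recognition, invalid ones go back for evaluation and error analysis.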
  • the present invention provides a system and a method for online script independent recognition of handwritten sub-word units and words.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention relates to a method and system for online script independent recognition of handwritten sub-word unit and words. More particularly the present invention relates to a system and method which enables online recognition of script independent sub-word unit and words by recognizing the written individual strokes prior to recognition of sub-word unit and words. The present invention provides an easy and natural to use method for handwritten sub-word unit and word recognition, wherein the application can be deployed on the existing communication means.

Description

    FIELD OF THE INVENTION
  • The present invention relates to a method and system for online script independent recognition of handwritten sub-word unit and words. More particularly the present invention relates to a system and method which enables online recognition of script independent sub-word unit and words by recognizing the written individual strokes prior to recognition of sub-word unit and words.
  • BACKGROUND OF THE INVENTION AND PRIOR ART
  • With the rapid advancement in technology, the mode of communication and the means used for communication has improved by leaps and bounds to meet the ever increasing demand of the population.
  • Short messaging service (SMS), chatting and e-mail are some of the common communication modes used by people all over the world. These communication modes are cost effective, easy and comfortable.
  • In recent times PDAs, palmtops and handheld personal computer (PC) are more frequently being used for composing short messages (SMS) and e-mails. These messages are generally composed in English using the conventional keyboard of PC's or regular keypads of mobile handsets.
  • For other languages, such as Germanic, Slavic, Romanic and Indic languages, word processing is a vexing experience, given the constraint of using the regular keyboard designed for the English language.
  • The present day input modes of communication means often tend to be less user-friendly for individuals from places like India, especially because of the several existing Indic (Indian) language scripts. Further, communicating by short messages and e-mails in the scripts of these languages using the conventional keyboard or mobile keypad is both difficult and time consuming.
  • A solution that has been employed is the transliteration of these language (Germanic, Slavic, Romanic and Indic languages) texts in English, which allows the use of the English keyboard to enter the scripts of these language texts. However this requires the user to be able to write the non-English language text in English alphabets which requires English literacy.
  • For this reason, a feasible and probably the only emerging option for script independent message composition for the population not literate in English is to use an electronic pen (e-pen) or a stylus touching a pressure sensitive surface, in lieu of the keyboard, to write sub-word units.
  • Hence, online handwritten character recognition (OHCR) is of prime importance especially in the context of communicating short messages and e-mails for script independent languages.
  • Though advantageous, online handwritten character recognition (OHCR) is available for English, Chinese and Japanese languages, and surprisingly relatively less work has been reported for language scripts such as Germanic, Indic, and Romanic and so on.
  • Hence there is an urgent need to provide a method and system to enable online script independent recognition of sub-words and words.
  • Some of the inventions which deal with providing online handwritten script recognition are as follows:
  • U.S. Pat. No. 5,550,931 titled “Automatic handwriting recognition using both static and dynamic parameters” provides a method and apparatus for recognizing handwritten characters in response to an input signal from a handwriting transducer. Though '931 patent provides a feature extraction and reduction procedure that relies on static or shape information, it fails to relate the temporal order in which points are captured by an electronic tablet.
  • U.S. Pat. No. 6,011,865 titled “Hybrid on-line handwriting recognition and optical character recognition system” provides a method and a system for hybrid on-line handwriting recognition and optical character recognition. Though '865 patent provides a handwriting recognition system and method that employs both online and off-line handwriting recognition to achieve a recognition accuracy that is improved over the use of either technique alone, it fails to perform feature extraction and spatio-temporal analysis for ascertaining whether the recognized sequences of strokes are valid, and thereby fails to enhance character recognition accuracy.
  • U.S. Pat. Nos. 4,284,975, 6,389,166 and 4,365,235 disclose a pattern recognition system operating in particular for Chinese handwritten characters, online handwritten Chinese character recognition apparatus based on character shapes and a Chinese/Kanji online recognition system consisting of tablet electronics module, a signal filter and segment integration unit, a base stroke classification unit, a symbol element recognition unit and a symbol recognition output table respectively.
  • U.S. Pat. No. 7,587,087 titled “On-line handwriting recognition” discloses a method and a device for on-line handwriting recognition; wherein the use of at least one auxiliary line is displayed on a touch sensitive panel. Each of the auxiliary lines constitutes a portion of more than one character of a character set. A character of a character set is drawn on the touch sensitive panel by completing one of the at least one auxiliary line into the character. The drawn character is recognized on the basis of said completion. Though '087 patent relates to online handwriting recognition, it fails to recognize the stroke leading to recognition of the character. Instead the character is recognized only on completion of writing the character.
  • US patent application number 20060126936 titled “System, method, and apparatus for triggering recognition of a handwritten shape” discloses a technique that uses repetitive and reliably recognizable parts of handwriting, during digital handwriting data entry, to trigger recognition of digital ink and to repurpose handwriting task area properties. Though '936 application discloses a system and method for handwritten shape recognition, it fails to provide and perform feature extraction and Spatio-temporal analysis for ascertaining if the recognized sequences of strokes are valid or not and thereby enhancing character recognition accuracy. Further, the recognition technique of '936 patent application attempts to find the character that most closely matches the strokes entered on the tablet and returns the results on run instead of showing results when the user finishes writing.
  • US patent application number 20080159625 titled “System, Method and Apparatus for Automatic Segmentation and Analysis of Ink Stream” discloses a technique that provides for real-time segmentation of handwritten traces during data entry into a computer. Though '625 application discloses a system and method for automatic segmentation and analysis of ink stream, it fails to provide and perform feature extraction and lexicon based domain specific word knowledge for recognition of characters and words.
  • US patent application number 20090003705 titled “Feature Design for HMM Based Eastern Asian Character Recognition” provides a method for online character recognition of East Asian characters includes acquiring time sequential, online ink data for a handwritten East Asian character, conditioning the ink data to produce conditioned ink data where the conditioned ink data includes information as to writing sequence of the handwritten East Asian character and extracting features from the conditioned ink data where the features include a tangent feature, a curvature feature, a local length feature, a connection point feature and an imaginary stroke feature. Though '705 application discloses a system and method for Eastern Asian Character Recognition, it fails to identify and construct a primitive stroke database which encompasses the handwritten script and the recognition engine primarily which recognizes their primitives prior to character and word recognition and further does not provide for a lexicon based domain specific word knowledge used for identification of characters and words.
  • PCT application number 2006090404 titled “System, Method, and Apparatus for Accommodating Variability in Chunking the Sub-Word Units of Online Handwriting” provides a technique for automatic real-time segmentation of an ink stream that does not require learning any chunking methodology, style of writing, and/or a predefined symbol set. In one example embodiment, this is achieved by drawing one or more strokes associated with a desired word of a script in one or more boxes provided on a digitizer screen using a pen. Though '404 application discloses a system and method for online handwriting recognition, it fails to provide for feature extraction and spatio-temporal analysis for ascertaining if the recognized sequences of strokes are valid or not and thereby enhancing character recognition accuracy.
  • The current state of the art restricts the universal application of the short messaging service and e-mail communication modes for script dependent online handwritten sub-word unit and word recognition. Hence there is an urgent need to provide a method and system to enable communication using the existing short messaging service and e-mail communication means by employing an application for online recognition of script independent handwritten sub-word units and words.
  • In light of the above mentioned prior arts it is evident that there is a need to have a customizable solution for online script independent recognition of handwritten sub-word unit and words.
  • In order to address the long felt need of such a solution, the present invention provides a method and system for online script independent recognition of handwritten sub-word unit and words. More particularly the present invention relates to a system and method which enables online recognition of script independent sub-word unit and words by recognizing the written individual strokes prior to recognition of sub-word unit and words.
  • OBJECTS OF THE INVENTION
  • The principal object of the invention is to provide a system and method for online script independent recognition of handwritten sub-word unit and words.
  • Another object of the invention is to enable online script independent recognition of handwritten sub-word unit and words by recognizing the written individual strokes prior to recognition of sub-word unit and words.
  • Yet another object of the invention is to provide a system and method to enable use of online script independent recognition of handwritten sub-word unit and words engine on the communication means.
  • Yet another object of the invention is to provide a system and a method for online script independent recognition of handwritten sub-word unit and words through identification of the primitive strokes and the structure of the written language script.
  • Yet another object of the present invention is to overcome the existing challenges in online handwriting recognition for scripts, such as but not limited to the large size of the sub-word unit set, the high similarity between different sub-word units in the script and the huge variation in writing style, by providing a system and method focusing on stroke identification prior to sub-word unit and word recognition.
  • Yet another object of the invention is to provide a system and method for identification of a small (compared to the size of the sub-word unit set of the language) set of primitives, which encompasses a script; wherein the handwriting recognition engine primarily recognizes these primitives prior to sub-word units and words.
  • Yet another object of the invention is to provide a system and method for modeling and representing the stroke using Fuzzy Directional Feature (FDF) set.
  • Yet another object of the invention is to define rule sets for sub-word unit formation from a sequence of strokes and make use of the spatio-temporal knowledge of the script to ascertain the validity of the recognized sequence of strokes.
  • Yet another object of the invention is to provide an easy to use and robust system for online script independent recognition of handwritten sub-word unit and words.
  • SUMMARY OF THE INVENTION
  • The present invention discloses a system and method for online script independent recognition of handwritten sub-word unit and words.
  • The user provides input in the form of online script independent handwritten text input via the input means of the communication means.
  • According to the present invention a method for online handwritten sub-word unit recognition on a communication means, using an application stored in a memory of a communication means; wherein the said method comprises the processor implemented steps of:
      • a. providing an online handwritten text input using an input means through an user interface to the said communication means;
      • b. sending the said online handwritten text input to spatio-temporal analysis module of the application; wherein the said spatio-temporal analysis module further comprises of a recognition engine for recognizing primitive strokes prior to sub-word units;
      • c. subjecting the text input of step b) to stroke extraction and separation module of the application to extract and separate the strokes by comparing the said strokes to a primitive stroke database;
      • d. subjecting the data of strokes obtained from step c) to the pre-processing module to pre-process the said extracted, separated and identified individual strokes by:
        • i. reducing or removing the noise created due to the slow movement of the said input means using a noise removal module of the said pre-processing module to obtain noise free smooth data, and
        • ii. optionally normalizing the size of the online handwritten text data by using the size normalization module of the pre-processing module;
      • e. subjecting the pre-processed primitive stroke data of step d) to feature extraction module for modeling and representing the online handwritten text input; wherein the feature extraction module further comprises of:
        • i. critical point identification module pertaining algorithms for identifying the critical points in the pre-processed smoothed data, and
        • ii. fuzzy directional features for representing the strokes in the pre-processed smoothed data;
      • f. subjecting the feature extracted data of step e) to stroke level recognition module for recognizing the primitive strokes, and further concatenating the said primitive strokes to form a sequence of strokes to further form a sub-word unit; wherein the stroke level recognition module further analyzes for any errors in the recognition of strokes using an evaluation and error analysis module;
      • g. in case of any error, the evaluation and error analysis module helps in improving the stroke recognition;
      • h. subjecting the sequence of recognized strokes of step f) for sub-word unit formation and sub-word unit recognition using the sub-word unit level recognition module; wherein the sub-word units are formed on the basis of the sub-word unit boundary information obtained from the sub-word unit boundary segmentation module and the defined rules for sub-word unit formation of the said sub-word unit level recognition module; wherein the sub-word unit recognition module further analyzes for any error in the recognition;
      • i. in case of any error, the sub-word unit recognition results are backtracked through the error analysis module and modified based on the failure of the sub-word unit formation, namely the failure to put together a sequence of strokes to form a sub-word unit.
    BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing summary, as well as the following detailed description of preferred embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings, example constructions of the invention; however, the invention is not limited to the specific methods, script and hardware disclosed in the drawings:
  • FIG. 1 of the present invention illustrates the formation of the phrase “Mera Bharat” in Devanagari script.
  • FIG. 2 of the present invention illustrates the word construction from sub-word unit, which are comprised of primitive strokes.
  • FIG. 3 of the present invention illustrates the steps involved in creating the primitive stroke database.
  • FIG. 4 of the present invention illustrates the architecture of online handwritten sub-word unit and word recognition system of the present invention for Indian languages.
  • FIG. 5 of the present invention illustrates the effect of noise removal using smoothing on a Devanagari sub-word unit.
  • FIG. 6 of the present invention illustrates the critical points extracted on the smoothed Devanagari handwritten sub-word unit.
  • FIG. 7 of the present invention illustrates how the angle θ contributes to two directions (1, 2) with different fuzzy membership values (green and red dot).
  • FIG. 8 of the present invention illustrates a block diagram indicating the steps involved in primitive stroke recognition.
  • FIG. 9 of the present invention illustrates the procedure adapted for word and sub-word unit boundary detection and stroke extraction and segmentation.
  • FIG. 10 of the present invention illustrates the methodology used for sub-word unit recognition.
  • FIG. 11 of the present invention illustrates the approach used by the system of the present invention for word recognition.
  • FIG. 12 of the present invention illustrates an exemplary embodiment, where primitives (m, ou, R, A, Ab, ***) are combined together to form sub-word unit, resulting into words. We can recognize the Devanagari sub-word unit by recognizing the primitives and analyzing a sequence of primitives to identify a sub-word unit.
  • FIG. 13 of the present invention illustrates a typical online Devanagari Paragraph Data collected using an electronic pen device.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Before the present method and hardware enablement are described, it is to be understood that this invention is not limited to the particular methodologies, scripts and hardware described, as these may vary. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims. The words “comprising,” “having,” “containing,” and “including,” and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. The disclosed embodiments are merely exemplary methods of the invention, which may be embodied in various forms.
  • In one of the significant embodiments of the present invention, a system for online handwritten sub-word unit recognition is provided; wherein the said system comprises:
      • a. at least one communication means having an application stored in a memory of the said communication means; wherein the said communication means further comprises of an input means to provide online handwritten text input via the user interface of the said communication means; and
      • b. the said application for online handwritten sub-word unit recognition comprising:
        • script independent modules; and
        • script dependent module.
  • According to one of the embodiments of the present invention, the script independent modules comprise the stroke extraction and separation module, pre-processing module, feature extraction module, stroke level recognition module, evaluation and error analysis module, word level recognition module and sub-word unit level recognition module; whereas the script dependent modules comprise the spatio-temporal analysis module, the rules based sub-word unit creation module and the lexicon based word level knowledge or language model dictionary.
  • The present invention provides a method for online script independent recognition of handwritten sub-word unit and words.
  • In a preferred embodiment of the present invention, the term “sub-word unit” refers to a member of alphabetic, characters or composite characters, logographic, and/or phonetic/syllabic character set, which includes syllables, alphabets, numerals, punctuation marks, consonants, consonant modifiers, vowels, vowel modifiers, and special characters, and/or any combination thereof.
  • In another preferred embodiment of the present invention, these sub-word units together form a word.
  • According to one of the embodiments of the present invention, in Devanagari script, the vowels following a consonant are orthographically indicated by signs called matras to form a consonant vowel combination. The modifier symbols are normally attached to the top, bottom, left or right of the base sub-word unit which is highly dependent on the consonant-vowel pair. The sub-word unit in Devanagari script refers to an “akshara” or “samyukthakshara”.
  • FIG. 1 of the present invention illustrates the formation of the phrase “Mera Bharat” in Devanagari script.
  • In a preferred embodiment of the present invention, in Devanagari script, the consonants, vowels, matras and the consonant/vowel modifiers constitute the entire alphabet set; wherein these composite sub-word units are joined together by a horizontal line called shirorekha 10 to form words 20 as shown in FIG. 2.
  • FIG. 2 of the present invention illustrates the word construction from sub-word units, which are comprised of primitive strokes.
  • According to one of the embodiments of the present invention, sub-word units 30 are made up of multiple strokes 40; wherein the sub-word units 30 are identified on recognizing a sequence of strokes 40 that make a sub-word unit 30.
  • According to another embodiment of the present invention, the recognition engine primarily recognizes these primitive strokes 40 prior to sub-word units 30 and words 20.
  • According to another embodiment of the present invention, FIG. 3 of the present invention illustrates the steps involved in creating the primitive stroke database.
  • The steps are as follows:
  • 50: Providing online handwritten text input 90 using the user interface to the communication means. FIG. 13 illustrates a typical online Devanagari Paragraph Data collected using an electronic pen device.
  • 60: The online handwritten text input 90 provided in step 50 is then further analyzed and the individual strokes 40 are separated and modeled using the Fuzzy Directional Features (FDF) 160.
  • 70: The separated and modeled strokes 40 are extracted using the feature extraction technique 170; wherein the strokes 40 are extracted based on the feature set i.e. directional properties of the curve connecting two consecutive critical points identified on a stroke using fuzzy directional features (FDF) 160.
  • In a preferred embodiment of the present invention, the identification of the curvature points 150 is considered to be a prerequisite for fuzzy directional feature extraction 160.
  • 80: The extracted strokes are further classified into 69 primitive strokes and have been further clustered into Devanagari primitive stroke sets to form a primitive stroke database for Devanagari script.
  • The identified and clustered 69 primitive strokes for Devanagari script are represented in Table 1 below:
  • In one of the preferred embodiment of the present invention, stroke level recognition module 180 is script independent even though the shape of the strokes 40 and the number of strokes 40 to form a sub-word unit 30 and further a word 20 might vary from one language script to another language script.
  • FIG. 4 of the present invention illustrates the architecture of online script independent recognition of handwritten sub-word unit and words.
  • According to one of the embodiment of the present invention, the online handwritten text input 90 is provided by the user in his/her handwriting to the communication means using the user interface.
  • In a preferred embodiment of the invention the communication means comprises of but does not limit to a mobile phone, a Personal Digital Assistant or PDA, palm-top, mobile digital assistant, computer, laptop, notebook, personal computer or any portable communication device.
  • In a preferred embodiment of the present invention, the input means comprises of an electronic pen or stylus or stick; wherein the script independent handwritten text is written on writing panel or an electronic tablet or on the pressure sensitive touch screen of the communication means.
  • The online handwritten text input 90 provided by the user to the communication means using the user interface is typically a trace of a pen between a pen-down and a pen-up event, that is, a set of (x, y) points which are uniformly sampled in time. These points are also non-uniformly sampled in space.
  • Further, a stroke can be represented by a variable number of 2D points which are in a time sequence. For example, an online script would be represented as

  • $\{(x_{t_1}, y_{t_1}), (x_{t_2}, y_{t_2}), \ldots, (x_{t_n}, y_{t_n})\}$  (1)
  • wherein t denotes the time such that $t_1 < t_2 < \ldots < t_n$, and n represents the total number of points. Equivalently, the online text data is represented as

  • $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$  (2)
  • by dropping the variable t. The number of points denoted by n varies depending on the size of the stroke 40 and also the time taken to write the stroke 40.
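  • The stroke representation above can be sketched as a small data structure. This is a minimal illustration only: the `Stroke` class, the function name and the `(t, x, y, pen_down)` sample layout are assumptions for the sketch, not a format defined in this document.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Stroke:
    """One pen-down to pen-up trace; points are kept in sampling order."""
    points: List[Tuple[float, float]]

def strokes_from_samples(samples):
    """Split timestamped pen samples into strokes.

    `samples` is a hypothetical list of (t, x, y, pen_down) tuples. The time
    stamp is used only for ordering and is then dropped, as in the second
    representation given in the text.
    """
    strokes, current = [], []
    for _t, x, y, pen_down in sorted(samples, key=lambda s: s[0]):
        if pen_down:
            current.append((x, y))
        elif current:
            strokes.append(Stroke(current))  # pen-up closes the stroke
            current = []
    if current:
        strokes.append(Stroke(current))
    return strokes

samples = [(0, 1, 1, True), (1, 2, 2, True), (2, 2, 2, False), (3, 5, 5, True)]
strokes = strokes_from_samples(samples)
```

  • Because the points are stored in time order, the number of points per stroke varies with the stroke size and writing time, as noted above.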
  • According to another embodiment of the present invention, the online handwritten text input 90 acquired initially is further subjected to spatio-temporal analysis module (B1) 100 of individual strokes. Typically, the spatio-temporal analysis module (B1) 100 provides the ability to segment a paragraph of online handwritten text input 90 data into words 20 based on shirorekha identification 10 in case of Devanagari script, followed by identification of matras by identifying the relative position of the strokes 40.
  • Further, the spatio-temporal analysis module 100 may be used to improve the performance of the stroke recognition.
  • For example, in case of Devanagari script, once a matra is identified based on the spatial position of the stroke, the stroke recognition can be constrained to only the reference matras. The individual strokes 40 are then recognized to be one of the 69 primitives of Devanagari script of the primitive stroke database (for Devanagari script) 80.
  • On completion of the spatio-temporal analysis, the online handwritten text input 90 is further subjected to either stroke extraction and separation module (B2) 110 and further to stroke level recognition module (A) 180 or sub-word unit level recognition module (C) 200 or word level recognition module (D) 220 on the basis of relative spatial information of each identified primitive stroke 40.
  • In case of stroke extraction and separation, the primitive strokes are extracted, separated and identified as described in FIG. 3; wherein FIG. 3 of the present invention illustrates the steps involved in creating the primitive stroke database.
  • On completion of the process of stroke separation, the extracted, separated and identified stroke is further subjected to the algorithms of the pre-processing module 140; wherein the extracted, separated and identified strokes are further subjected to noise removal module 120 and optionally to size normalization module 130 processes.
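  • The optional size normalization step could look like the sketch below. The text does not specify the normalization scheme, so translating the stroke to the origin and scaling its larger bounding-box side to a target length (preserving aspect ratio) is an assumption, as are the function and parameter names.

```python
def normalize_size(points, target=1.0):
    """Translate a stroke to the origin and scale it so that the longer side
    of its bounding box equals `target`, preserving the aspect ratio.

    This particular scheme is an assumption made for illustration.
    """
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y)
    if extent == 0:
        # Degenerate stroke (a single dot): map every point to the origin.
        return [(0.0, 0.0) for _ in points]
    return [((x - min_x) / extent * target, (y - min_y) / extent * target)
            for x, y in points]

normalized = normalize_size([(10, 20), (30, 20), (30, 60)])
```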
  • According to one of the embodiments of the present invention, noise is inherent in the online handwritten text input 90 and severely affects the performance of online sub-word unit and word recognition algorithms: the data points are very sparse when the movement of the input means, such as a pen or stylus, is fast, whereas the data points are contaminated by high frequency noise when the pen movement is slow.
  • Further, the noise in the data can be classified into two types according to its source:
      • (a) noise due to the inherent shake of the writer's hand, especially at the beginning and end of the stroke, and
      • (b) noise introduced by the digitization process.
  • In order to remove or reduce the noise arising from these two sources, the following two processes are used:
      • (a) a noise removal algorithm (typically Gaussian smoothing) is applied to the raw noisy data, and
      • (b) a feature extraction algorithm is used to compensate for the noise.
  • FIG. 5 of the present invention illustrates the effect of noise removal using smoothing on a Devanagari sub-word unit.
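  • By way of illustration only, the Gaussian smoothing referred to above may be sketched as follows. The kernel width `sigma`, the window `radius`, the function names and the clamping of indices at the stroke ends are assumptions made for this sketch; the specification does not fix any of them.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_stroke(points, sigma=1.0, radius=2):
    """Smooth a time-ordered list of (x, y) pen samples.

    Indices that fall outside the stroke are clamped to the first or
    last sample (an assumption of this sketch).
    """
    kernel = gaussian_kernel(sigma, radius)
    n = len(points)
    smoothed = []
    for i in range(n):
        sx = sy = 0.0
        for offset, w in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + offset, 0), n - 1)  # clamp at the stroke ends
            sx += w * points[j][0]
            sy += w * points[j][1]
        smoothed.append((sx, sy))
    return smoothed
```

  • In this sketch the x and y sequences are smoothed with the same kernel, and the number of samples in the stroke is unchanged.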
  • According to another embodiment of the present invention, the feature extraction module 170 involves two stages, viz. identification of critical points (also called curvature points) 150 and fuzzy directional feature extraction 160.
  • Identification of critical points 150 is based on the directional properties of the curve and the critical points occur at the points on the handwritten strokes where there is a large curvature change; wherein identification of the critical points is a prerequisite for feature extraction.
  • The critical points are extracted from the smoothed handwritten text data provided by the user. The denoised sequence $(x_i, y_i)_{i=0}^{n}$ represents a noise free handwritten stroke. The $x_i$ and $y_i$ sequences are treated separately and the critical points are computed for each of these sequences. For the x sequence, the first difference $x'_i$ is calculated as

  • $$x'_i = \operatorname{sgn}(x_i - x_{i+1})$$

  • where, with $k = x_i - x_{i+1}$,

  • $$\operatorname{sgn}(k) = \begin{cases} +1 & \text{if } k > 0 \\ -1 & \text{if } k < 0 \\ 0 & \text{if } k = 0 \end{cases} \qquad (3)$$

  • $x'$ is used to compute the critical points in the x sequence. Point i is considered a critical point if and only if

  • $$x'_i - x'_{i+1} \neq 0$$
  • The critical points for the y sequence are calculated in the similar way. The final list of critical points is the union of all the points marked as critical points in both the x and the y sequence.
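  • By way of illustration only, the critical point test described above may be sketched as follows. The function names are hypothetical. The sketch marks index i exactly as the condition is written, so the marked index sits one sample before the geometric turning sample, and it does not add the stroke end points, since the description is silent on them.

```python
def sgn(k):
    """Sign function: +1, -1 or 0."""
    return (k > 0) - (k < 0)


def sign_changes(seq):
    """Indices i at which the first difference changes sign.

    d[i] plays the role of x'_i = sgn(x_i - x_{i+1}); index i is kept
    when d[i] - d[i+1] != 0.
    """
    d = [sgn(seq[i] - seq[i + 1]) for i in range(len(seq) - 1)]
    return {i for i in range(len(d) - 1) if d[i] - d[i + 1] != 0}


def critical_points(points):
    """Union of the critical indices of the x and the y sequences."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sorted(sign_changes(xs) | sign_changes(ys))
```

  • For a stroke that rises and then falls, such as (0, 0), (1, 1), (2, 2), (3, 1), (4, 0), only the y sequence changes direction, so a single critical index is reported.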
  • FIG. 6 of the present invention illustrates the critical points extracted on the smoothed Devanagari handwritten sub-word unit.
  • According to another embodiment of the present invention, the second stage of feature extraction is fuzzy directional feature extraction 160.
  • The algorithm for fuzzy directional feature extraction 160 is as follows:
  • Let k be the number of critical points (denoted by $c_1, c_2, \ldots, c_k$) extracted from a stroke of length n; usually $k \ll n$. The k critical points form the basis for extraction of the fuzzy directional features. First, the angle between two consecutive critical points, say $c_l$ and $c_{l+1}$, is computed as
  • $$\theta_l = \tan^{-1}\left(\frac{y_l - y_{l+1}}{x_l - x_{l+1}}\right)$$
  • where $(x_l, y_l)$ and $(x_{l+1}, y_{l+1})$ are the coordinates of the critical points $c_l$ and $c_{l+1}$ respectively. It is to be noted that $2\pi$ is divided into P directions with overlap. Every $\theta_l$ (for example, the angle θ that the blue dotted line makes with the horizontal axis as illustrated in FIG. 7) has two directions associated with it ($d_l^1 = 1$ and $d_l^2 = 2$ in the example); the line making an angle θ in FIG. 7 lies in both of the triangles represented by direction 1 and direction 2, with membership values $m_l^1$ and $m_l^2$ respectively (represented by the green and the red dot respectively in FIG. 7).
  • Further,
      • 1. $m_l^1 + m_l^2 = 1$, and
      • 2. $d_l^1$ and $d_l^2$ are adjacent directions; for example, if $d_l^1 = 5$ then $d_l^2$ could be either 4 or 6.
        Algorithm 1 Conversion of Angles Between Two Critical Points into 8 Directions
  • int deg2dir(double θ)
    // lower bounds are inclusive so that boundary angles are assigned a direction
    int dir = −1;
    if (θ >= −π/8 & θ < π/8) then
    dir = 1;
    end if
    if (θ >= π/8 & θ < 3π/8) then
    dir = 2;
    end if
    if (θ >= 3π/8 & θ < 5π/8) then
    dir = 3;
    end if
    if (θ >= 5π/8 & θ < 7π/8) then
    dir = 4;
    end if
    if ((θ >= 7π/8 & θ < 9π/8) ∥ (θ >= −9π/8 & θ < −7π/8)) then
    dir = 5;
    end if
    if (θ >= −7π/8 & θ < −5π/8) then
    dir = 6;
    end if
    if (θ >= −5π/8 & θ < −3π/8) then
    dir = 7;
    end if
    if (θ >= −3π/8 & θ < −π/8) then
    dir = 8;
    end if
    return(dir);
  • Algorithm 2 Triangular Fuzzy Membership Function
  • fuzzy_membership(θc, θ): m = 1.0 − |θc − θ| / (π/4);
  • return (m);
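  • By way of illustration only, Algorithm 1 and Algorithm 2 may be combined as in the following sketch for P = 8. Several points are assumptions of the sketch rather than statements of the specification: the centre of direction d is taken to be (d − 1)·π/4, angular differences are wrapped around the circle, a sector includes its lower boundary, and the helper names `angular_gap` and `fuzzy_directions` are hypothetical.

```python
import math

STEP = math.pi / 4  # spacing between the centres of the 8 directions


def deg2dir(theta):
    """Map an angle in radians to a direction 1..8 (direction 1 is centred on 0)."""
    return math.floor(theta / STEP + 0.5) % 8 + 1


def angular_gap(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def fuzzy_membership(theta_c, theta):
    """Triangular membership of theta in the direction centred on theta_c."""
    return 1.0 - angular_gap(theta_c, theta) / STEP


def fuzzy_directions(theta):
    """Return ((d1, m1), (d2, m2)): the two adjacent directions and memberships."""
    d1 = deg2dir(theta)
    c1 = (d1 - 1) * STEP
    # Signed offset from the centre of d1 decides which neighbour is d2.
    offset = (theta - c1 + math.pi) % (2 * math.pi) - math.pi
    d2 = d1 % 8 + 1 if offset >= 0 else (d1 - 2) % 8 + 1
    c2 = (d2 - 1) * STEP
    return (d1, fuzzy_membership(c1, theta)), (d2, fuzzy_membership(c2, theta))
```

  • With these assumptions an angle of π/16 belongs to direction 1 with membership 0.75 and to direction 2 with membership 0.25, so that the two memberships sum to 1.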
  • θ is used in Algorithm 1 which is assisted by triangular membership function described in Algorithm 2 for computing the FDF 160 set. The same is represented in table 2 below:
  • TABLE 2
    Angle          | Directions
                   | 1           | 2           | 3           | 4           | 5           | 6           | 7           | 8
    $\theta_1$     |             |             | $m_1^1$     | $m_1^2$     |             |             |             |
    $\theta_2$     | $m_2^1$     | $m_2^2$     |             |             |             |             |             |
    $\theta_3$     | $m_3^2$     |             |             |             |             |             |             | $m_3^1$
    $\theta_i$     |             |             |             |             |             |             | $m_i^2$     | $m_i^1$
    $\theta_{k-1}$ |             |             |             |             | $m_{k-1}^2$ | $m_{k-1}^1$ |             |
    F              | $f_1$       | $f_2$       | $f_3$       | $f_4$       | $f_5$       | $f_6$       | $f_7$       | $f_8$
    where $f_1 = (m_2^1 + m_3^2)/2$, $f_2 = m_2^2$, $f_3 = m_1^1$, $f_4 = m_1^2$, $f_5 = m_{k-1}^2$, $f_6 = m_{k-1}^1$, $f_7 = m_i^2$ and $f_8 = (m_3^1 + m_i^1)/2$.
  • $\theta_1, \ldots, \theta_{k-1}$ are the angles between consecutive critical points (where k is the total number of critical points) in a handwritten primitive, and $d_1, \ldots, d_8$ are the respective directions. Here P = 8 is considered and used to describe the process.
  • The fuzzy membership values assigned to each direction are represented as $m_{1,\ldots,k-1}^{1,2}$ and the corresponding feature vector values as $f_1, \ldots, f_8$. Further, the sum of the membership values in any row of Table 2 is always 1.
  • The FDF 160 is calculated by taking the average down each column, so as to form a vector of dimension eight. The mean is calculated as follows:
  • For each direction (1 to 8), all the membership values in that direction are summed and divided by the number of occurrences of membership values in that direction.
  • For example, as represented in Table 2, the mean for direction 1 is calculated as
  • $$f_1 = \frac{m_2^1 + m_3^2}{2} \qquad (8)$$
  • Further, these mean values are used to construct the 8-directional FDF 160 that represents a stroke 40. The membership function associates the angle between two critical points with two directions, with different membership values. In the commonly used Directional Features, only one direction is associated with each θ (the angle between two consecutive critical points).
  • Further, FIG. 7 of the present invention illustrates how the angle θ is contributing to the two directions (1, 2) with different fuzzy membership values (green and red dot).
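  • By way of illustration only, the construction of the 8-dimensional FDF vector from a list of critical points may be sketched as follows. The sketch makes three assumptions that are not taken from the specification: `math.atan2` on the forward difference is used in place of the plain inverse tangent so that all eight directions can occur, the centre of direction d is (d − 1)·π/4, and a direction that never occurs is given the value 0.0. The names `memberships` and `fdf` are hypothetical.

```python
import math

STEP = math.pi / 4  # spacing between the centres of the 8 directions


def memberships(theta):
    """Return {direction: membership} for the two directions adjacent to theta."""
    pos = (theta % (2 * math.pi)) / STEP   # position on the circle, in units of STEP
    lower = int(math.floor(pos)) % 8       # 0-based direction whose centre is at or below theta
    frac = pos - math.floor(pos)           # fractional distance towards the next centre
    return {lower + 1: 1.0 - frac, (lower + 1) % 8 + 1: frac}


def fdf(critical_pts):
    """Mean membership per direction over all consecutive critical point pairs."""
    sums = [0.0] * 8
    counts = [0] * 8
    for (x0, y0), (x1, y1) in zip(critical_pts, critical_pts[1:]):
        theta = math.atan2(y1 - y0, x1 - x0)
        for d, m in memberships(theta).items():
            sums[d - 1] += m
            counts[d - 1] += 1
    # Column mean: sum of memberships divided by the number of occurrences.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

  • Each angle contributes to exactly two adjacent columns, and each entry of the returned vector is the column sum divided by the number of contributions to that column, which mirrors the column means of Table 2.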
  • According to another embodiment of the present invention, once the feature extraction is completed for the online handwritten text input 90 provided by the user, the extracted handwritten text data is further subjected to stroke level recognition module (A) 180.
  • FIG. 8 of the present invention illustrates a block diagram indicating the steps involved in primitive stroke recognition.
  • The stroke level recognition module (A) 180 is described in detail in FIG. 8; wherein the method of recognizing primitive strokes consists of the following two phases:
      • 1. Learning phase: In the learning phase, the system learns and builds reference models for all primitive strokes 40 in the primitive stroke database 80.
      • 2. Testing Phase: In the testing phase the stroke 40 is compared with the primitive stroke database using a recognizer 250 to determine the best matching (n-best where n=3) reference stroke.
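  • By way of illustration only, the n-best matching of the testing phase may be sketched as follows. The description does not state how the recognizer 250 scores a stroke, so Euclidean distance between feature vectors is used here purely as a stand-in; the function name `n_best` and the dictionary of reference vectors are hypothetical.

```python
import math


def n_best(feature, references, n=3):
    """Return the labels of the n reference vectors closest to `feature`.

    `references` maps a primitive stroke label to its reference feature
    vector; the closest label comes first.
    """
    scored = sorted(
        (math.dist(feature, ref), label) for label, ref in references.items()
    )
    return [label for _, label in scored[:n]]
```

  • For example, with the reference vectors `{"s1": [1, 0], "s2": [0, 1], "s3": [0.5, 0.5]}` and a test vector `[0.9, 0.1]`, the sketch ranks `s1` first, then `s3`, then `s2`.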
  • The primitive stroke extraction and separation (B2) 110 and the stroke level recognition (A) 180 are performed prior to the sub-word unit level recognition 200. The recognized strokes with the n best matches (n=3) are considered for sub-word unit level recognition 200. The rule based sub-word unit formation 210 uses the sub-word unit boundary information along with the relative position of the strokes to ascertain whether the recognized sequence of strokes is valid. If the strokes are valid, the sub-word unit level recognition 200 passes the processed online handwritten text data 90 on to the word recognition level (D) 220. If the strokes are not valid, they are further evaluated and analyzed for possible errors using the evaluation and error analysis process 190.
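  • By way of illustration only, the validity check on the n-best stroke sequences may be sketched as follows. The rule table, its entries and the function name are hypothetical, and the sketch checks only the sequence of stroke labels; the boundary information and relative stroke positions described above are left out.

```python
from itertools import product

# Hypothetical rule table: sequence of primitive stroke labels -> sub-word unit label.
RULES = {
    ("m", "A"): "mA",
    ("R", "A", "ou"): "RAou",
}


def form_sub_word_unit(n_best_strokes, rules=RULES):
    """Return the first sub-word unit whose rule matches a candidate sequence.

    `n_best_strokes` holds one ranked list of candidate labels (up to
    three) per written stroke. Combinations are tried starting with the
    top candidate at every position.
    """
    for sequence in product(*n_best_strokes):
        unit = rules.get(sequence)
        if unit is not None:
            return unit
    return None  # no valid sequence: hand over to evaluation and error analysis
```

  • A return value of `None` corresponds to the case in which the strokes are not valid and are passed to the evaluation and error analysis process.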
  • FIG. 9 of the present invention illustrates the procedure adapted for word and sub-word unit boundary detection and stroke extraction and segmentation.
  • According to one of the embodiments of the present invention, the online handwritten text input 90 is provided by the user through the user interface of the communication means. The online handwritten text input 90 thus acquired is subjected to the spatio-temporal analysis module (B1) 100 of individual strokes 40. Typically, the spatio-temporal analysis module (B1) 100 provides the ability to segment a paragraph of online handwritten text input 90 data into words 20 based on shirorekha identification 10 in the case of Devanagari script, followed by identification of matras from the relative position of the strokes 40.
  • Further, the spatio-temporal analysis module 100 may be used to improve the performance of the stroke recognition. For example, once a matra is identified based on the spatial position of the stroke, the stroke recognition can be constrained to only the reference matras. The individual strokes are then recognized to be one of the 69 primitives of the primitive stroke database for Devanagari script.
  • On completion of the spatio-temporal analysis, the Devanagari handwritten text 90 can be further subjected either to stroke extraction and segmentation (B2) 110 and then to the stroke level recognition module 180, or to the sub-word unit recognition level module 200, or to the word level recognition module 220, on the basis of the relative spatial information of each identified primitive stroke 40.
  • When the online handwritten text input 90 is subjected directly to word recognition module 220, the word boundaries are identified using the word boundary segmentation module 260 from the online handwritten text input 90 and the output words B11 are then further subjected to sub-word unit boundary segmentation module 270 to identify the sub-word units B12.
  • When the online handwritten text input 90 is subjected to the stroke extraction and segmentation module 110, each stroke is segmented and then sent to the primitive stroke recognition module 180 for recognizing individual strokes.
  • In a preferred embodiment of the present invention, the language script specific rules and spatio-temporal information are used to detect the word and sub-word unit boundaries.
  • FIG. 10 of the present invention illustrates the methodology used for sub-word unit recognition.
  • According to one of the embodiments of the present invention, the online handwritten text input 90 is provided by the user through the user interface of the communication means. The online handwritten text input 90 thus acquired is subjected to the spatio-temporal analysis module (B1) 100 of individual strokes 40. Typically, the spatio-temporal analysis module 100 provides the ability to segment a paragraph of online handwritten text input 90 data into words 20 based on shirorekha identification 10 in the case of Devanagari script, followed by identification of matras from the relative position of the strokes 40.
  • Further, the spatio-temporal analysis module 100 may be used to improve the performance of the stroke recognition. For example, once a matra is identified based on the spatial position of the stroke, the stroke recognition can be constrained to only the reference matras. The individual strokes are then recognized to be one of the 69 primitives of the primitive stroke database for Devanagari script.
  • On completion of the spatio-temporal analysis, the online handwritten text input 90 is further subjected to stroke extraction and segmentation module 110 and further to stroke level recognition module 180 on the basis of relative spatial information of each identified primitive stroke 40.
  • When the online handwritten text input 90 is subjected directly to the stroke extraction and segmentation module 110, each stroke is segmented and then sent to the primitive stroke recognition module 180 for recognizing individual strokes.
  • The primitive stroke extraction and separation module (B2) 110 and the stroke level recognition module (A) 180 are applied prior to the sub-word unit level recognition module 200. The recognized strokes with the n best matches (n=3) are considered for sub-word unit level recognition 200. The rule based sub-word unit formation 210 uses the sub-word unit boundary information along with the relative position of the strokes to ascertain whether the recognized sequence of strokes is valid. If the strokes are valid, the sub-word unit level recognition 200 passes the processed online handwritten text data 90 on to the word recognition level (D) 220. If the strokes are not valid, they are further evaluated and analyzed for possible errors using the evaluation and error analysis process 190.
  • FIG. 11 of the present invention illustrates the approach used by the system of the present invention for word recognition.
  • According to one of the embodiments of the present invention, the online handwritten text input 90 is provided by the user through the user interface of the communication means. The online handwritten text input 90 thus acquired is subjected to the spatio-temporal analysis module (B1) 100 of individual strokes 40. Typically, the spatio-temporal analysis module 100 provides the ability to segment a paragraph of online handwritten text input 90 data into words 20 based on shirorekha identification 10 in the case of Devanagari script, followed by identification of matras from the relative position of the strokes 40.
  • Further, the word recognition module 220 is based on a lexicon based word knowledge 240, making it adaptable to any Indian language. The steps involved are illustrated in FIG. 11; the output of the sub-word unit recognition level module (C) 200 as described in FIG. 10, and the word boundary information (B1) 100 as described in FIG. 9, along with the primitive stroke extraction results (B2) 110 from FIG. 9, are used for word recognition in the word recognition module (D) 220. The words in the lexicon based word knowledge 240 are an important aspect of achieving acceptable accuracy for online handwritten sub-word unit and word recognition. Further, the lexicon based word knowledge 240 is used for verifying and improving the word recognition results as represented in FIG. 11.
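  • By way of illustration only, lexicon based verification of a recognized word may be sketched as follows. The description does not say how the lexicon is consulted, so the closest-match search of Python's standard `difflib` module is used here as a stand-in; the function name, the `cutoff` value and the transliterated sample words are assumptions of the sketch.

```python
import difflib


def verify_word(candidate, lexicon, cutoff=0.6):
    """Verify a recognized word against a lexicon.

    Returns the candidate if it is a lexicon entry, otherwise the most
    similar lexicon entry above `cutoff`, otherwise None.
    """
    if candidate in lexicon:
        return candidate
    close = difflib.get_close_matches(candidate, lexicon, n=1, cutoff=cutoff)
    return close[0] if close else None
```

  • For example, with the lexicon `["kamala", "namaste", "bharat"]`, the misrecognized string `"kamaia"` is mapped to `"kamala"`, while a string with no sufficiently similar entry yields `None`.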
  • FIG. 12 of the present invention illustrates an exemplary embodiment, where primitive strokes 40 (m, ou, R, A, Ab, ***) are combined together to form sub-word units 30, resulting in words 20. The formation of a Devanagari sub-word unit by the concatenation of a sequence of primitive strokes 40, and hence of words 20 by a sequence of sub-word units 30, is illustrated in FIG. 12; the Devanagari sub-word units are recognized based on the primitives, and a sequence of primitives is analyzed to identify a sub-word unit. Rules for sub-word unit formation from a sequence of strokes are formed for each sub-word unit in Devanagari. The rule set is used along with the primitive recognition results for the recognition of sub-word units.
  • Further, it is obvious to a person skilled in art that the invention is not limited to the type of script used to describe and illustrate the particular methodologies, and hardware described, as these may vary. Further, the use of particular script (s), methodologies, and hardware (s) described is not intended to limit the scope of the present invention. The disclosed embodiments are merely exemplary methods of the invention, which may be embodied in various forms for various scripts, methodologies and hardware described, as these may vary.
  • ADVANTAGES OF THE INVENTION
  • The present invention provides a system and a method for online script independent handwritten sub-word unit and word recognition.
  • Enables individuals who are not literate in English to communicate using the script independent system and method.
  • Provides a platform to enable input in non-English languages.
  • Provides an easy and natural to use method for handwritten sub-word unit and word recognition, wherein the application can be deployed on the existing communication means.

Claims (26)

1. A method for online handwritten sub-word unit recognition on a communication means, using an application stored in a memory of a communication means; wherein the said method comprises the processor implemented steps of:
a. providing an online handwritten text input using an input means through an user interface to the said communication means;
b. sending the said online handwritten text input to spatio-temporal analysis module of the application; wherein the said spatio-temporal analysis module further comprises of a recognition engine for recognizing primitive strokes prior to sub-word units;
c. subjecting the text input of step b) to stroke extraction and separation module of the application to extract and separate the strokes by comparing the said strokes to a primitive stroke database;
d. subjecting the data of strokes obtained from step c) to the pre-processing module to pre-process the said extracted, separated and identified individual strokes by:
i. reducing or removing the noise created due to the slow movement of the said input means using a noise removal module of the said pre-processing module to obtain noise free smooth data, and
ii. optionally normalizing the size of the online handwritten text data by using the size normalization module of the pre-processing module;
e. subjecting the pre-processed primitive stroke data of step d) to feature extraction module for modeling and representing the online handwritten text input; wherein the feature extraction module further comprises of:
i. critical point identification module pertaining algorithms for identifying the critical points in the pre-processed smoothed data, and
ii. fuzzy directional features for representing the strokes in the pre-processed smoothed data;
f. subjecting the feature extracted data of step e) to stroke level recognition module for recognizing the primitive strokes, and further concatenating the said primitive strokes to form a sequence of strokes to further form a sub-word unit; wherein the stroke level recognition module further analyzes for any errors in the recognition of strokes using an evaluation and error analysis module;
g. in case of any error, the evaluation and error analysis module helps in improving the stroke recognition;
h. subjecting the sequence of recognized strokes of step f) for sub-word unit formation and sub-word unit recognition using the sub-word unit level recognition module; wherein the sub-word units are formed on the basis of the sub-word unit boundary information obtained from the sub-word unit boundary segmentation module and the defined rules for sub-word unit formation of the said sub-word unit level recognition module; wherein the sub-word unit recognition module further analyzes for any error in the recognition;
i. In case of any error, the sub-word unit recognition results are backtracked through the error analysis module and modified based on the failure of the sub-word unit formation, namely the failure to put together a sequence of strokes to form a sub-word unit.
2. A method as claimed in claim 1, wherein the said communication means comprises of mobile phone, a Personal Digital Assistant, PDA, palm-top, mobile digital assistant, computer, laptop, notebook, personal computer or any portable communication device.
3. A method as claimed in claim 1, wherein the input means of the said communication means comprises of a special electronic pen, stylus or stick.
4. A method as claimed in claim 1, wherein the user interface comprises of writing panel, an electronic tablet or the pressure sensitive touch screen of the communication means.
5. A method as claimed in claim 1, wherein the user interface is receptive to online handwritten text input for capturing and storing time ordered (x, y) sequence of handwritten text with stroke begin or pen down and stroke end or pen up information.
6. A method as claimed in claim 1, wherein the sub-word units recognized and formed are further subjected to word level recognition module to recognize and form words, using the lexicon based word level knowledge or language model dictionary along with the obtained spatio-temporal information obtained.
7. A system for online handwritten sub-word unit recognition; wherein the said system comprising:
a. at least one communication means having an application stored in a memory of the said communication means; wherein the said communication means further comprises of an input means to provide online handwritten text input via the user interface of the said communication means; and
b. the said application for online handwritten sub-word unit recognition comprising:
script independent modules; and
script dependent module.
8. A system as claimed in claim 7, wherein the said script independent modules comprises of stroke extraction and separation module, pre-processing module, feature extraction module, stroke level recognition module, evaluation and error analysis module and sub-word unit level recognition module.
9. A system as claimed in claim 8, wherein the said stroke extraction and separation module extracts and separates the strokes from the online handwritten text input by comparing the said strokes to a primitive stroke database.
10. A system as claimed in claim 8, wherein the said pre-processing module further consists of noise removal module and a size normalization module to provide pre-processed smoothed data.
11. A system as claimed in claim 10, wherein the said noise removal module is used for reducing or removing the noise created due to slow movement of the input means of the said communication means.
12. A system as claimed in claim 10, wherein the said size normalization module is used for normalizing the size of the online handwritten text input.
13. A system as claimed in claim 8, wherein the said feature extraction module further consists of critical points identification module and fuzzy directional feature extraction module.
14. A system as claimed in claim 13, wherein the said critical points identification module consists of algorithms for identifying the critical points in the pre-processed smoothed data.
15. A system as claimed in claim 13, wherein the said fuzzy directional feature extraction module models and represents the strokes in the pre-processed smoothed data.
16. A system as claimed in claim 8, wherein the said stroke level recognition module recognizes the primitive strokes and further concatenates the said primitive strokes to form a sequence of strokes to further form a sub-word unit.
17. A system as claimed in claim 8, wherein the said sub-word unit level recognition module further consists of sub-word unit boundary segmentation module and rules based sub-word unit creation module to form sub-word units based on the sub-word unit boundary information and rules for sub-word unit formation.
18. A system as claimed in claim 17, wherein the said rules based sub-word unit creation module is script dependent.
19. A system as claimed in claim 8, wherein the said evaluation and error analysis module evaluates and analyses the error to improve stroke recognition and sub-word unit recognition.
20. A system as claimed in claim 7, wherein the sub-word units recognized and formed are further concatenated to form words using word level recognition module supported by lexicon based word level knowledge or language model dictionary.
21. A system as claimed in claim 20, wherein the said word level recognition module and lexicon based word level knowledge or language model dictionary are script dependent modules.
22. A system as claimed in claim 7, wherein the said script dependent modules further consists of script independent spatio-temporal analysis module; wherein the said spatio-temporal analysis module consists of a recognition engine for recognizing primitive strokes prior to sub-word unit based on the spatial information of each stroke.
23. A system as claimed in claim 7, wherein the said communication means comprises of mobile phone, a Personal Digital Assistant, PDA, palm-top, mobile digital assistant, computer, laptop, notebook, personal computer or any other portable communication device.
24. A system as claimed in claim 7, wherein the input means of the said communication means comprises of a pen, special electronic pen, stylus or stick.
25. A system as claimed in claim 7, wherein the user interface comprises of a special writing panel or an electronic tablet or the pressure sensitive touch screen of the communication means.
26. A system as claimed in claim 7, wherein the user interface is receptive to online handwritten text input for capturing and storing time ordered (x, y) sequence of handwritten text with stroke begin and end or pen up and pen down information.
US13/292,145 2010-11-09 2011-11-09 Online script independent recognition of handwritten sub-word units and words Active 2032-09-07 US8768062B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN3080/MUM/2010 2010-11-09
IN3080MU2010 2010-11-09

Publications (2)

Publication Number Publication Date
US20120114245A1 true US20120114245A1 (en) 2012-05-10
US8768062B2 US8768062B2 (en) 2014-07-01

Family

ID=46019684

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/292,145 Active 2032-09-07 US8768062B2 (en) 2010-11-09 2011-11-09 Online script independent recognition of handwritten sub-word units and words

Country Status (1)

Country Link
US (1) US8768062B2 (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130182971A1 (en) * 2012-01-18 2013-07-18 Dolby Laboratories Licensing Corporation Spatiotemporal Metrics for Rate Distortion Optimization
US20140022406A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Automatic correction of skew in natural images and video
US20140108004A1 (en) * 2012-10-15 2014-04-17 Nuance Communications, Inc. Text/character input system, such as for use with touch screens on mobile phones
US8730396B2 (en) * 2010-06-23 2014-05-20 MindTree Limited Capturing events of interest by spatio-temporal video analysis
US20140184610A1 (en) * 2012-12-27 2014-07-03 Kabushiki Kaisha Toshiba Shaping device and shaping method
US8831381B2 (en) 2012-01-26 2014-09-09 Qualcomm Incorporated Detecting and correcting skew in regions of text in natural images
US9014480B2 (en) 2012-07-19 2015-04-21 Qualcomm Incorporated Identifying a maximally stable extremal region (MSER) in an image by skipping comparison of pixels in the region
CN104615367A (en) * 2015-01-14 2015-05-13 中国船舶重工集团公司第七0九研究所 Pen interaction method and system based on handwriting input state adaptive judgment processing
US9047540B2 (en) 2012-07-19 2015-06-02 Qualcomm Incorporated Trellis based word decoder with reverse pass
US9064191B2 (en) 2012-01-26 2015-06-23 Qualcomm Incorporated Lower modifier detection and extraction from devanagari text images to improve OCR performance
US9141874B2 (en) 2012-07-19 2015-09-22 Qualcomm Incorporated Feature extraction and use with a probability density function (PDF) divergence metric
US20150339524A1 (en) * 2014-05-23 2015-11-26 Samsung Electronics Co., Ltd. Method and device for reproducing partial handwritten content
US9251412B2 (en) * 2013-12-16 2016-02-02 Google Inc. Segmentation of devanagari-script handwriting for recognition
US9262699B2 (en) 2012-07-19 2016-02-16 Qualcomm Incorporated Method of handling complex variants of words through prefix-tree based decoding for Devanagiri OCR
US20170206406A1 (en) * 2016-01-20 2017-07-20 Myscript System and method for recognizing multiple object structure
US10067669B1 (en) * 2017-07-13 2018-09-04 King Fahd University Of Petroleum And Minerals Online character recognition
US10402734B2 (en) * 2015-08-26 2019-09-03 Google Llc Temporal based word segmentation
US20210350122A1 (en) * 2020-05-11 2021-11-11 Apple Inc. Stroke based control of handwriting input
CN117332761A (en) * 2023-11-30 2024-01-02 北京一标数字科技有限公司 PDF document intelligent identification marking system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9710157B2 (en) * 2015-03-12 2017-07-18 Lenovo (Singapore) Pte. Ltd. Removing connective strokes
CN106952583B (en) * 2017-05-23 2019-04-30 深圳市华星光电技术有限公司 The production method of flexible array substrate

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5862251A (en) * 1994-12-23 1999-01-19 International Business Machines Corporation Optical character recognition of handwritten or cursive text
US6370269B1 (en) * 1997-01-21 2002-04-09 International Business Machines Corporation Optical character recognition of handwritten or cursive text in multiple languages
US7359551B2 (en) * 2001-10-15 2008-04-15 Silverbrook Research Pty Ltd Method and apparatus for decoding handwritten characters
US20080159625A1 (en) * 2005-02-23 2008-07-03 Hewlett-Packard Development Company, L.P. System, Method and Apparatus for Automatic Segmentation and Analysis of Ink Stream
US20100128985A1 (en) * 2006-07-27 2010-05-27 Bgn Technologies Ltd. Online arabic handwriting recognition

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5580183A (en) 1978-12-12 1980-06-17 Nippon Telegr & Teleph Corp <Ntt> On-line recognition processing system of hand-written character
US4365235A (en) 1980-12-31 1982-12-21 International Business Machines Corporation Chinese/Kanji on-line recognition system
US5491758A (en) 1993-01-27 1996-02-13 International Business Machines Corporation Automatic handwriting recognition using both static and dynamic parameters
US6011865A (en) 1993-05-12 2000-01-04 International Business Machines Corporation Hybrid on-line handwriting recognition and optical character recognition system
US6389166B1 (en) 1998-10-26 2002-05-14 Matsushita Electric Industrial Co., Ltd. On-line handwritten Chinese character recognition apparatus
US8849034B2 (en) 2004-12-09 2014-09-30 Hewlett-Packard Development Company, L.P. System, method, and apparatus for triggering recognition of a handwritten shape
US7587087B2 (en) 2004-12-10 2009-09-08 Nokia Corporation On-line handwriting recognition
WO2006090404A1 (en) 2005-02-23 2006-08-31 Hewlett-Packard Development Company, L.P. System, method, and apparatus for accomodating variability in chunking the sub-word units of online handwriting
US7974472B2 (en) 2007-06-29 2011-07-05 Microsoft Corporation Feature design for HMM based Eastern Asian character recognition

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8730396B2 (en) * 2010-06-23 2014-05-20 MindTree Limited Capturing events of interest by spatio-temporal video analysis
US9020294B2 (en) * 2012-01-18 2015-04-28 Dolby Laboratories Licensing Corporation Spatiotemporal metrics for rate distortion optimization
US20130182971A1 (en) * 2012-01-18 2013-07-18 Dolby Laboratories Licensing Corporation Spatiotemporal Metrics for Rate Distortion Optimization
US9064191B2 (en) 2012-01-26 2015-06-23 Qualcomm Incorporated Lower modifier detection and extraction from devanagari text images to improve OCR performance
US9053361B2 (en) 2012-01-26 2015-06-09 Qualcomm Incorporated Identifying regions of text to merge in a natural image or video frame
US8831381B2 (en) 2012-01-26 2014-09-09 Qualcomm Incorporated Detecting and correcting skew in regions of text in natural images
US9141874B2 (en) 2012-07-19 2015-09-22 Qualcomm Incorporated Feature extraction and use with a probability density function (PDF) divergence metric
US9639783B2 (en) 2012-07-19 2017-05-02 Qualcomm Incorporated Trellis based word decoder with reverse pass
US9014480B2 (en) 2012-07-19 2015-04-21 Qualcomm Incorporated Identifying a maximally stable extremal region (MSER) in an image by skipping comparison of pixels in the region
US9262699B2 (en) 2012-07-19 2016-02-16 Qualcomm Incorporated Method of handling complex variants of words through prefix-tree based decoding for Devanagiri OCR
US9047540B2 (en) 2012-07-19 2015-06-02 Qualcomm Incorporated Trellis based word decoder with reverse pass
US9183458B2 (en) 2012-07-19 2015-11-10 Qualcomm Incorporated Parameter selection and coarse localization of interest regions for MSER processing
US20140022406A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Automatic correction of skew in natural images and video
US9076242B2 (en) * 2012-07-19 2015-07-07 Qualcomm Incorporated Automatic correction of skew in natural images and video
US20140108004A1 (en) * 2012-10-15 2014-04-17 Nuance Communications, Inc. Text/character input system, such as for use with touch screens on mobile phones
US9026428B2 (en) * 2012-10-15 2015-05-05 Nuance Communications, Inc. Text/character input system, such as for use with touch screens on mobile phones
US20140184610A1 (en) * 2012-12-27 2014-07-03 Kabushiki Kaisha Toshiba Shaping device and shaping method
US9251412B2 (en) * 2013-12-16 2016-02-02 Google Inc. Segmentation of devanagari-script handwriting for recognition
US10528249B2 (en) * 2014-05-23 2020-01-07 Samsung Electronics Co., Ltd. Method and device for reproducing partial handwritten content
US20150339524A1 (en) * 2014-05-23 2015-11-26 Samsung Electronics Co., Ltd. Method and device for reproducing partial handwritten content
CN104615367A (en) * 2015-01-14 2015-05-13 中国船舶重工集团公司第七0九研究所 Pen interaction method and system based on handwriting input state adaptive judgment processing
US10402734B2 (en) * 2015-08-26 2019-09-03 Google Llc Temporal based word segmentation
US10846602B2 (en) 2015-08-26 2020-11-24 Google Llc Temporal based word segmentation
US10013603B2 (en) * 2016-01-20 2018-07-03 Myscript System and method for recognizing multiple object structure
US20170206406A1 (en) * 2016-01-20 2017-07-20 Myscript System and method for recognizing multiple object structure
US10067669B1 (en) * 2017-07-13 2018-09-04 King Fahd University Of Petroleum And Minerals Online character recognition
US10156982B1 (en) * 2017-07-13 2018-12-18 King Fahd University Of Petroleum And Minerals Writing direction extraction for character recognition
US10156983B1 (en) * 2017-07-13 2018-12-18 King Fahd University Of Petroleum And Minerals Method using statistical features for character recognition
US20210350122A1 (en) * 2020-05-11 2021-11-11 Apple Inc. Stroke based control of handwriting input
US12033411B2 (en) * 2020-05-11 2024-07-09 Apple Inc. Stroke based control of handwriting input
CN117332761A (en) * 2023-11-30 2024-01-02 北京一标数字科技有限公司 PDF document intelligent identification marking system

Also Published As

Publication number Publication date
US8768062B2 (en) 2014-07-01

Similar Documents

Publication Publication Date Title
US8768062B2 (en) Online script independent recognition of handwritten sub-word units and words
US8180160B2 (en) Method for character recognition
EP1564675B1 (en) Apparatus and method for searching for digital ink query
KR101354663B1 (en) A method and apparatus for recognition of handwritten symbols
CN107969155B (en) Improving handwriting recognition using pre-filter classification
Tagougui et al. Online Arabic handwriting recognition: a survey
CN102449640B (en) Recognizing handwritten words
Sabbour et al. A segmentation-free approach to Arabic and Urdu OCR
US10007859B2 (en) System and method for superimposed handwriting recognition technology
Biadsy et al. Segmentation-free online arabic handwriting recognition
KR20210017090A (en) Method and electronic device for converting handwriting input to text
Das et al. An algorithm for Japanese character recognition
Saba et al. Online versus offline Arabic script classification
Kasem et al. Advancements and Challenges in Arabic Optical Character Recognition: A Comprehensive Survey
CN107912062B (en) System and method for overlaying handwriting
Khosrobeigi et al. A rule-based post-processing approach to improve Persian OCR performance
KR20090111202A (en) The Optical Character Recognition method and device by the numbers of horizon, vertical and slant lines which is the element of Hanguel
Das et al. Survey of Pattern Recognition Approaches in Japanese Character Recognition
US9454706B1 (en) Arabic like online alphanumeric character recognition system and method using automatic fuzzy modeling
Urala et al. Recognition of open vocabulary, online handwritten pages in Tamil script
Rao et al. Orthographic properties based Telugu text recognition using hidden Markov models
Khare et al. Handwritten Devanagari character recognition system: a review
WO2006090404A1 (en) System, method, and apparatus for accomodating variability in chunking the sub-word units of online handwriting
Patil et al. Comparative Study of Multilingual Text Detection and Verification from Complex Scene
Kopparapu A framework for on-line devanagari handwritten character recognition

Legal Events

Date Code Title Description
AS Assignment
    Owner name: TATA CONSULTANCY SERVICES LIMITED, INDIA
    Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LAKSHMANAN, LAJISH VIMALA;KOPPARAPU, SUNIL KUMAR;REEL/FRAME:027197/0281
    Effective date: 20111104

STCF Information on status: patent grant
    Free format text: PATENTED CASE

CC Certificate of correction

CC Certificate of correction

MAFP Maintenance fee payment
    Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)
    Year of fee payment: 4

FEPP Fee payment procedure
    Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FEPP Fee payment procedure
    Free format text: 7.5 YR SURCHARGE - LATE PMT W/IN 6 MO, LARGE ENTITY (ORIGINAL EVENT CODE: M1555); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment
    Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
    Year of fee payment: 8