AU709833B2 - Tactiley-guided, voice-output reading apparatus


Info

Publication number
AU709833B2
AU709833B2 (application number AU22665/97A; also published as AU2266597A)
Authority
AU
Australia
Prior art keywords
symbology
text
feedback
hand
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU22665/97A
Other versions
AU2266597A (en)
Inventor
James T. Sears
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of AU2266597A
Application granted
Publication of AU709833B2
Anticipated expiration
Legal status: Ceased

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/014 - Hand-worn input/output arrangements, e.g. data gloves
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033 - Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/1429 - Identifying or ignoring parts by sensing at different wavelengths
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/1444 - Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 - Teaching, or communicating with, the blind, deaf or mute
    • G09B21/001 - Teaching or communicating with blind persons
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 - Teaching, or communicating with, the blind, deaf or mute
    • G09B21/001 - Teaching or communicating with blind persons
    • G09B21/007 - Teaching or communicating with blind persons using both tactile and audible presentation of the information
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 - Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/033 - Indexing scheme relating to G06F3/033
    • G06F2203/0331 - Finger worn pointing device
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Processing (AREA)

Description

TACTILEY-GUIDED, VOICE-OUTPUT READING APPARATUS

Cross-Reference to Related Patent Applications

This application is related to and claims priority from Provisional Patent Application No. 60/011,561, filed Feb. 13, 1996, titled "Hand-held Reading Device for the Location, Capture and Spoken Interpretation of Printed Text on a Surface," the contents of which are incorporated herein by reference.
Technical Field

The present invention relates to a method and apparatus for allowing persons with little or no vision to read text and other symbology, and more particularly, to a method and apparatus including a hand-held independent reading aid which provides feedback to the user of its relative position.
Background Art

People who are blind or suffer from vision impairments wish for functional independence in their lives. The blind include those who are blind congenitally, those who suffer adventitious accidents, and a large and growing number of elderly individuals whose sight is deteriorating due to age-related diseases such as macular degeneration and diabetes. In the modern world, where reading is a crucial component of everyday life, the blind and those with low vision are constantly challenged in their ability to read the written word. In recognition of this, numerous devices have been produced in an attempt to allow vision-impaired people to read independently.
For individuals with low-vision, particularly the elderly, the most frequently used device to allow them to read is a simple optical magnifier. Unfortunately, as the disease progresses and vision deteriorates, the amount of magnification required exceeds that practical with a magnifying lens. For these users, one option is greater magnification provided by closed-circuit television apparatus.
However, even this palliative device is insufficient to allow reading in situations requiring easy portability or for users in advanced stages of vision loss. For example, in advanced macular degeneration, the central portion of the visual field of the eye's retina is rendered useless due to optical distortion, leaving only peripheral vision intact. In this case, extreme magnification on a television screen at close distance can at best provide only slow character-by-character reading by these users.
An alternative approach practiced by a number of manufacturers combines a flat-bed optical scanner coupled with a computer containing optical character recognition (OCR) software and a speech synthesizer. The user places an entire page of a document to be read on the scanner, which transmits the image to the computer. The OCR software decodes the image into text, and then the speech synthesizer voices the information. This system has been used very successfully to read books and typewritten letters. This approach, however, does not work well in those cases where the spatial format of the document is significant in understanding or navigating through the information content to obtain the desired information, such as in a utility bill containing hundreds of numbers where the total amount to pay is the only desired information. Furthermore, the flat bed scanners do not work well with text appearing on non-flat surfaces, such as is often found on cans of food or medicine bottles.
Finally, these systems are non-portable, both because of the weight and size of components designed to scan entire pages of documents, and because of the general need for sufficient line power to drive the mechanical scanners.
Previous attempts to make a portable, hand-held reading device have proved unsuccessful because users were unable to identify and accurately track text across a page with the hand-held device. Such systems have included a hand-held device with a camera mounted therein. The identifying and tracking problem is especially acute with long lines of text, lines of text closely spaced, or text which is highly formatted. For example, an early system known as the Optacon was developed by Telesensory Systems, Inc. of Mountain View, California to allow blind users to read text directly through the tactile sense in their distal finger pad. This was accomplished using a small hand-held camera which magnified a typically one-quarter inch by one-quarter inch field of view on the paper being read onto a three-quarter inch by one and one-quarter inch array of 144 vibrating tactile stimulators. The user would guide the camera with one hand, while stroking an opposite hand finger pad over the vibrating array. The drawbacks of this system were that the user would typically require a six-week full-time training course to master the tracking of lines and the interpretation of individual characters. Furthermore, upon reaching a high level of proficiency, users would only be able to read twenty or thirty words per minute. Generally, only young users were able to master the tactile discrimination required to read using the device.
A second hand-held reading device was developed by Raymond Kurzweil in the 1980's. The hand-held scanner transferred page image swaths to a desktop computer system containing optical character recognition software and a speech synthesizer. This system has been long discontinued because it did not provide a useful means for the user to locate and track over individual lines of text.
Audible feedback indicated to the user when tracks of images with text-like characteristics were being passed over; however, this feedback was not specific enough to allow users to track text reliably. The audible feedback included a chirping sound indicating that the scanner was seeing, but not necessarily recognizing, image features, which could include symbols, designs, or drawings, as well as text. Furthermore, the synthesized voice output of the text was not provided until after an entire swath of text image had been scanned.
In addition, several IBM patents (Patent Nos. 5,186,629, 5,223,828, 5,287,102, and 5,374,924) disclose a mouse-driven interface for personal computers for use by blind persons.
Unfortunately, such systems generally require the host computer to be able to report, in ASCII, the symbology that is proximate to the cursor. Since it may not always be possible to obtain such ASCII information, it is desirable to have an interface which could interpret information presented on a video screen through image recognition.
It was our intention to create a device that overcomes the disadvantages of the existing systems, in order to improve the quality of life for vision-impaired people, by allowing them independent access to printed and visually displayed textual information. It is against this background and the desire to solve the problems of the prior art that the present invention has been developed.
Disclosure of Invention

Accordingly, it is an object of the present invention to provide an improved hand-held independent reading aid.
It is another object of the present invention to provide a portable independent reading aid.
It is also an object of the present invention to provide an independent reading aid useable to read symbols from a variety of surface contours, including both flat and non-flat surfaces.
It is still another object of the present invention to provide an improved reading aid which helps the user to properly track lines of text.
It is yet another object of the present invention to provide an improved reading aid capable of reading video displays.
It is still further an object of the present invention to provide a hand-held independent reading aid useable with a single hand.
It is still further an object of the present invention to provide an independent reading aid which is relatively inexpensive.
Additional objects, advantages and novel features of this invention shall be set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the following specification or may be learned by the practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities, combinations, and methods particularly pointed out in the appended claims.
To achieve the foregoing and other objects and in accordance with the purposes of the present invention, as embodied and broadly described therein, the present invention is directed to a method for converting visible symbology into another humanly perceptible version of the symbology, the method employing a hand-held imaging device operated by a user. The method includes the steps of imaging the symbology to convert the symbology from an optical image to an electronic signal representative of the optical image; performing symbology recognition on the electronic signal to convert the electronic signal into recognized symbology and to provide the location of the recognized symbology in the image; providing feedback to the user based on the location of the recognized symbology in the image, the feedback being representative of the position of the hand-held imaging device relative to the symbology; and converting the recognized symbology into a humanly perceptible version of the symbology.
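As an illustration only, the four claimed steps (imaging, recognition, locational feedback, and conversion) can be sketched in Python; every name below is a hypothetical stand-in rather than anything specified by the patent, and the imaging and recognition stages are stubbed out:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Symbol:
        char: str    # recognized identity of the symbology element
        x: float     # horizontal location within the image, in pixels
        y: float     # vertical location within the image, in pixels

    def recognize(signal) -> List[Symbol]:
        # Stand-in for the symbology recognition step (OCR, bar code, etc.).
        return [Symbol("H", 10.0, 12.0), Symbol("i", 18.0, 12.0)]

    def feedback_offset(symbols: List[Symbol], image_height: int) -> float:
        # Feedback step: position of the recognized symbology relative to
        # the device, here as a vertical offset from the image center.
        return sum(s.y for s in symbols) / len(symbols) - image_height / 2

    def transduce(symbols: List[Symbol]) -> str:
        # Conversion into a humanly perceptible version (text for speech).
        return "".join(s.char for s in symbols)

    symbols = recognize(object())          # imaging step stubbed out
    print(feedback_offset(symbols, 100))   # -> -38.0 (text lies above center)
    print(transduce(symbols))              # -> "Hi"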
The humanly perceptible version of the symbology into which the recognized symbology is converted may be an audible version. The humanly perceptible version of the symbology into which the recognized symbology is converted may be a Braille version. The feedback provided to the user may be tactile feedback. The tactile feedback may be provided by a moving element on the handheld imaging device. The moving element may include a moving pin. The moving pin may be part of an armature in an electromagnetic device, the electromagnetic device also including a solenoid coil.
The moving element may be moved at a frequency representative of the position of the hand-held imaging device relative to the symbology. The frequency can vary within a range from approximately six to sixty-five Hertz. The frequency may be inversely related to the relative distance between the hand-held imaging device and the symbology. The method may include a plurality of moving elements arranged in a linear array oriented parallel to a longitudinal axis of the hand-held imaging device to provide an indication of the position, relative to the longitudinal axis, of the hand-held imaging device relative to the symbology. The method may include a plurality of said parallel linear arrays to also provide an indication of the position, relative to an axis normal to the longitudinal axis, of the hand-held imaging device relative to the symbology. The method may include two arrays of three elements each.
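A minimal sketch of one possible frequency mapping, assuming a linear inverse relation and an arbitrary 10 mm maximum offset; only the six-to-sixty-five Hertz range comes from the text:

    F_MIN_HZ, F_MAX_HZ = 6.0, 65.0   # frequency range stated in the text

    def stimulator_frequency(offset_mm: float, max_offset_mm: float = 10.0) -> float:
        # Inverse relation: small device-to-symbology offset -> fast vibration.
        d = min(abs(offset_mm), max_offset_mm) / max_offset_mm   # normalize to [0, 1]
        return F_MAX_HZ - d * (F_MAX_HZ - F_MIN_HZ)

    # stimulator_frequency(0.0) -> 65.0 Hz; stimulator_frequency(10.0) -> 6.0 Hz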
The feedback provided to the user may be visible feedback. The visible feedback may be provided by a source of light on the hand-held imaging device. The source of light may be a light emitting diode. The method may include a plurality of light sources arranged in a linear array oriented parallel to a longitudinal axis of the hand-held imaging device to provide an indication of the position, relative to the longitudinal axis, of the hand-held imaging device relative to the symbology. The method may include a plurality of parallel linear arrays to also provide an indication of the position, relative to an axis normal to the longitudinal axis, of the hand-held imaging device relative to the symbology. The method may include two arrays of three light sources each. The plurality of light sources may include light sources which produce light of a different color than others of the plurality of light sources.
The feedback provided to the user may be audible feedback. The audible feedback may be provided when the hand-held imaging device is skewed in orientation relative to the image by more than a predetermined amount. The predetermined amount of skew may be approximately five degrees.
The method may include two modes, a search mode in which the relative position of any symbology recognized will be provided as feedback to the user and a track mode in which the relative position of a line of recognized symbology which is being tracked is provided as feedback to the user.
The user can select the search mode or the track mode. The recognized symbology may be converted into a humanly perceptible version of the recognized symbology only in the track mode.
The symbology recognition may include optical character recognition. The symbology recognition may include bar code recognition.
The visible symbology to be converted may be displayed on a video display.
The hand-held imaging device may include a camera for imaging the visible symbology on the video display and converting the image to an electronic signal. The hand-held imaging device may include a sensor to sense that the visible symbology to be converted is being displayed on a video display.
The audible version of the recognized symbology may be provided in the form of separate symbols, including letters of the alphabet. The audible version of the recognized symbology may be provided in the form of groups of symbols, including words.
The visible symbology may also be displayed in a magnified visible version.
The present invention also relates to an apparatus for converting visible symbology into another humanly perceptible version of the symbology for a user. The apparatus includes an imaging device to convert the symbology from an optical image to an electronic signal representative of the optical image; a symbology recognizer receptive of the electronic signal to recognize symbology in the image and to determine the location of the recognized symbology in the image; a feedback device to provide feedback for the user based on the location of the recognized symbology in the image, the feedback being representative of the position of the hand-held imaging device relative to the symbology; and a transducer to convert the recognized symbology into a humanly perceptible version of the recognized symbology.
The imaging device, the feedback device, and the transducer may all be located in a hand-held positioning device. The hand-held positioning device may be tapered to a narrower width at a distal end than at a proximal end. A bottom side of the hand-held positioning device may be adapted for placing against a surface with symbology to be read therefrom, and the bottom side may include a recessed channel in the center thereof to allow the bottom side to be placed against curved surfaces to read symbology therefrom. The imaging device may include an illuminator and a camera having a lens incorporated therein. The lens and the camera may be adjustable in position relative to each other to adjust the focal length thereof so as to be able to focus the imaging device on symbology at various distances therefrom.
The present invention is also related to a method for allowing a user with impaired vision to operate a host computer with a graphical user interface including information displayed on a computer display. The method includes the steps of providing a surface onto which coded symbology has been provided; imaging with a hand-held positioning device containing an imaging system to image and convert symbology to an electronic signal, the device being adapted for placing in proximity to and moving across the surface; recognizing the coded symbology in the image and determining the location of the coded symbology in the image, and based thereon, determining the position of the positioning device relative to the surface; obtaining the information in the vicinity of the position on the computer display corresponding to the position of the positioning device relative to the surface, and providing an indication of the location of the information relative to the positioning device's position; providing feedback for the user based on the locational indication, the feedback being representative of the location of the information relative to the position on the computer display corresponding to the position of the positioning device relative to the surface; and transducing the information into a humanly perceptible version of the information.
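For illustration, a toy sketch of this screen-reading method, under the assumption of a hypothetical printed grid of row/column position codes; none of the encodings or names below appear in the patent:

    def decode_position(code: str) -> tuple:
        # e.g. a printed code "R03C07" encodes row 3, column 7 on the surface
        return int(code[1:3]), int(code[4:6])

    def info_near(screen_items: dict, row: int, col: int):
        # Find the displayed item nearest the screen position corresponding
        # to the device's position, and its offset for locational feedback.
        nearest = min(screen_items, key=lambda rc: abs(rc[0] - row) + abs(rc[1] - col))
        offset = (nearest[0] - row, nearest[1] - col)
        return screen_items[nearest], offset

    screen_items = {(3, 7): "File", (3, 12): "Edit"}
    word, offset = info_near(screen_items, *decode_position("R03C07"))
    print(word, offset)   # -> File (0, 0): feedback "on target", then speak "File"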
Brief Description of the Drawings

Fig. 1 is a perspective view of the tactilely-guided, voice-output reading apparatus of the present invention.
Fig. 2 is a side view of a mouse of the apparatus shown in Fig. 1, showing the mouse internal components.
Fig. 3 is a top-level functional and software block diagram of the apparatus of Fig. 1.
Figs. 4a through 4g are diagrammatical representations of the preferred response of a tracking display of the apparatus of Fig. 1 to text alignments within the field of view when the apparatus is in a search mode.
Figs. 5a through 5e are diagrammatical representations of the preferred response of a tracking display of the apparatus of Fig. 1 to text alignments within the field of view when the apparatus is in a track mode.
Fig. 6 is a perspective view of the mouse of Fig. 2.
Fig. 7 is a flow-diagram of the control and functionality of the alternative modes of operation of the apparatus of Fig. 1.
Fig. 8 is a perspective view similar to Fig. 1, showing an optional flat panel display mounted on the lateral aspect of a computer of the apparatus, and showing the mouse exploded away from a page of text for ease of illustration.
Fig. 9 is a schematic diagram of the apparatus of Fig. 1, showing the capture of luminous text from a CRT screen using a photo diode to detect the illumination period and synchronize the camera image capture with that period.
Fig. 10 is a timing diagram graphically depicting timing relationships between the detected signal of the photo diode and the camera timing signals output by the camera timing driver, of Fig. 9.
Fig. 11 is a perspective view similar to Fig. 1, depicting the apparatus functioning as a peripheral input/output device through a host computer serial interface.
Fig. 12 is an electronic block diagram of the apparatus of Fig. 1.
Fig. 13 is an end view of the mouse of Fig. 2.
Fig. 14 is a cross-sectional schematic of a cord of the apparatus of Fig. 1.
Fig. 15 is an illustration of the method of establishing the physical and electrical end connections for the cord of Fig. 14.
Fig. 16 is a schematic block diagram showing an optional feature of the apparatus of Fig. 1, showing the use of switched single-color illumination to enhance the reading of colored text.
Fig. 17 is a schematic block diagram of many of the components of the mouse of Fig. 1.
Fig. 18 is a cross-sectional view of the tracking display on the mouse of Fig. 2.
Fig. 19 is a perspective view of the tracking display on the mouse of Fig. 2.
Fig. 20 is a bottom view of the mouse of Fig. 2.
Fig. 21 is a rear, partially-cut-away view of the mouse of Fig. 2, showing a lever for adjusting the focus of the camera.
Best Mode for Carrying Out the Invention

Functional System-Level Overview

Fig. 1 is a system depiction of the preferred embodiment. The Independent Reading Aid (IRA) system 1 is a text reading device which includes a mouse 2, a computer 3, and a cord 5 which connects the mouse 2 and the computer 3. In operation, the mouse 2 is passed over a surface containing text 7. Inside the mouse 2 is an image capture system (shown in Fig. 2) containing both an illumination system and a camera, which captures and transmits a video image through the cord 5 to the computer 3. In the computer 3, the image is processed to enhance contrast, and then the image is analyzed by a symbology recognition program incorporated in software in the computer 3 to provide the location and identity of alphanumeric or other specific symbology within the field of view. As used herein, symbology recognition may include optical character recognition (OCR), bar code recognition, or any other process by which symbology content or meaning is recognized through the spatial characteristics of the elements of each symbol.
In order for the user to direct the mouse 2 over desired text elements, it is desirable for the user to have feedback information about the location of the text elements. This locational feedback is most intuitively provided by tactile and visual indicators located on the mouse 2. A plurality of locational feedback indicators 9 are located on the upper, forward aspect of the mouse 2, and include both tactile vibrating pins and corresponding illuminated points that are useful for individuals with some residual vision. When the user locates text that he desires to read, a button 11 is depressed to command the computer 3 to vocalize the text content through a voice synthesizer located within the computer 3 and to project the speech through a pair of speakers 13.
An aspect of the invention which provides many of its practical benefits is the rapid feedback of textual location information to the user. The locational feedback indicators 9 allow the user to rapidly locate, through tactile feel and residual vision, individual text elements in fields of other text and graphics, and further allow the user to track along a line of text. This allows the image-capture system to process small images from the surface to be scanned, permitting near real-time text-to-speech conversion, so that the user's natural sense of the location of the text can be brought into play in selecting the next text to be read. This feedback is very similar to the process of reading in a sighted person. The capacity of the system to utilize small images further endows the system with the potential for great miniaturization.
Detailed Description of the System

Physical Description of IRA System 1 Components

Mouse Components

The internal components of the mouse 2 are depicted diagrammatically in Fig. 2. The mouse 2 is enclosed by a plastic housing 14 made of acrylonitrile butadiene styrene (ABS) plastic, although other materials would suffice. The mouse 2 is connected to the computer 3 through the cord 5, which is anchored to the mouse 2 with a cord grommet 205. A characteristic of this grommet mounting is to break away from the mouse 2 at a nominal tension, both to prevent entanglement with environmental objects, which could result in a potential safety hazard to the user, as well as to prevent mechanical damage to the IRA system 1. The mouse 2 is constructed in a splash-resistant manner, with seams designed so that casual spills which may occur during usage (e.g., spills occurring while using the IRA system 1 for reading in a restaurant) will not permanently damage the internal electrical components.
The text to be read is illuminated by a plurality of illuminators 15, which in the preferred embodiment are LEDs, although strobe, fluorescent or incandescent lamps may alternatively be used. The illuminators are placed so that they can illuminate the text to be read both directly and indirectly, through mirror-bounce and inside-housing bounce. A transmissive diffuser 17 is placed in close proximity to the illuminators 15, so as to smooth the distribution of the illumination on the text to be read. A multiplicity of illumination sources helps to increase the consistency of the illumination. In the preferred embodiment, the illuminators 15 are placed on both the left and right aspects of the mouse 2 for this purpose.
The transmissive diffuser 17 is preferably a controlled twenty-degree dispersion holographic diffuser, such as is available from Physical Optics Corporation of Torrance, California under Part No. LSD20PC10, in order to allow the greatest amount of light to pass and to reduce unwanted scatter in unneeded directions. However, other diffusing systems, including scattering diffusers, may work in both the reflective and the transmissive configurations.
A window 21 is provided for reading the text appearing therethrough. The window 21 is constructed from anti-reflection-coated glass, although certain window materials including sapphire and plastics may substitute successfully. The purpose of the window 21 is to allow the mouse 2 to be completely sealed against contamination, and to protect the sensitive components inside the mouse 2.
Scattered light from the illuminated text transmits back through the window 21 and bounces off of a mirror 19 toward a camera assembly 20. The use of the mirror 19 allows a longer focal length for the imaging system while remaining in a compact configuration. This longer focal length reduces image distortion and increases the depth of field.
The camera assembly 20 includes a lens 23 and a camera 25, such as is available as Part No. A53,308 from Edmund Scientific of Barrington, NJ. The lens 23 is conveniently an 8 mm focal length, f/16 aperture lens and is adjusted to focus on images located anywhere between the surface of the window 21 and the lower surface of the mouse 2, typically located three millimeters (mm) below the bottom of the window 21 surface. This provides a field of view 54 that is thirty mm wide and twenty-two mm high, although other dimensions can be used. The camera 25 is a black-and-white/grey-scale CCD camera. Alternatively, CMOS or other types of cameras may be used.
In the preferred embodiment, the black-and-white camera 25 outputs grey-scale images. Alternatively, a color camera can be used, outputting full-color images. The advantage, however, of the black-and-white camera is an enhanced sensitivity to light, approximately ten-fold that of color cameras, thus allowing a smaller optical aperture, such as 0.50 mm, with a resultant greater depth of field of approximately 6 mm.
In addition, black-and-white images require lower information bandwidth for image transfers to the computer 3.
A CRT emission sensor 27 is located in the mouse 2 so that it views the scene under the window 21.
The purpose of the sensor 27 is to determine whether the text to be read is being displayed on a CRT screen rather than on a document, in which case special functions are used to synchronize the camera 25 with the scan cycle of the CRT display. This functionality is described in more detail in a later section.
The buttons 11 and the audio speakers 13 are located on both the left and right lower lateral aspects of the mouse. Their placement is determined by the ergonomics of hand placement and ease of use, which will be considered in greater detail in a later section.
A mouse circuit board 207 contains the electronic components which interface between the IRA system 1 cord 5 and the internal components of the mouse 2. These components include the afore-mentioned buttons 11, the speakers 13, the illuminators 15, the camera 25, and the CRT emissions sensor 27, as well as a microphone 63, a mode scroll-up button 103, a mode scroll-down button 105, and a tracking display 53, whose functions will be described in a later section.
Overall Electronic Description

Fig. 12 presents the electronic block diagram of the IRA system 1. Elements in the right of the figure are input/output components contained within the mouse 2. Elements in the left of the figure are computational, power, control, and interface elements contained within the computer 3. The connections between the elements, shown in the center of the figure, are contained within the cord 5. Images are captured by the video camera 25 through the lens 23, and analog video signals are sent via a cord element 177, contained within the cord 5, to a video frame-grabber 191 contained within the computer 3. Video frame-grabbers are widely available from companies such as ImageNation of Beaverton, Oregon (the PX104PLUS), Data Translation, Inc. (the DT3152), and Epix of Buffalo Grove, Illinois. The frame-grabber 191 creates digitized bit-maps in memory from the analog video signal supplied by the camera 25. These memory bit-maps are transmitted to a computer circuit board 193 through a high-speed bus 194. The bus 194 should have a high bandwidth, such as a PCI bus or a bus conforming to the 1394 FireWire interface standard. The PCI bus is the standard communications bus found on Pentium-class and PowerPC computers. Both the PCI and 1394 FireWire communications standards are characterized by high data rates and the ability to transfer data directly to computer memory without the need for intensive main processor involvement. The computer circuit board 193 contains software for analyzing and converting the image files into text information, including an OCR program 31, a novel word detector 59, a speech synthesizer 61, and a tracking display driver 51. The computational speed of the computer circuit board 193 is critical to the real-time operation of the IRA system 1, and a processor with 133 MHz Pentium-class performance or better is preferred. The computer circuit board 193 produces digital data strings which are converted into analog waveforms by a D/A converter 185. These waveforms are output to the audio speakers 13 through a cord element 175. A volume control 187, located on the exterior of the computer 3, allows the operator to manually adjust the volume output through the audio speakers 13 to a comfortable level. Alternatively, the user may insert earphones into an earphone jack 189, which permits private reading by the operator, reading by those with hearing impairment, or reading in environments with high ambient noise.
A field programmable gate array (FPGA) 173, located within the mouse 2, provides the interface between the assorted elements of the mouse 2 and the computer circuit board 193. The FPGA 173 includes custom-designed circuitry comprising a high-speed serial interface, assorted timing elements, and hardware driver elements necessary for video camera control, LED drivers, tactile stimulator drivers, mode button interface, illumination control, and speak button interface. In the preferred embodiment, the FPGA 173 is a XC3030VQ64 FPGA, available from XILINX of San Jose, California, that is programmed upon power-up by the computer circuit board 193, although many other commercially available circuit solutions may provide equivalent function.
The FPGA 173 communicates with the computer circuit board 193 over a cord element 179 using a high-speed bi-directional communications protocol. The gate array 173 interfaces with most of the elements of the mouse 2. The output of the CRT emissions sensor 27 is amplified by an amplifier 125, and the resultant signal is binarized by the gate array 173. When in CRT mode, this information is used to modify camera control signals generated within the gate array 173, which drive the timing cycles of the camera 25. In addition, signals are sent to the computer circuit board 193, informing the computer whether the camera 25 is reading from a CRT display. The illuminators 15 are controlled by a hardware driver 167, which is commanded by signals from the gate array 173. It should be noted that in different IRA system 1 embodiments, the illuminators may be comprised of either white light lamps or variously colored LEDs, depending on the type of color discrimination desired. In the case of multiple color LEDs, the timing of the different illuminators is controlled through the FPGA 173, and reported over the cord element 179 to the computer circuit board 193.
The tracking display 53 includes a plurality of LEDs 107, 109, 111, 113, 115, and 117, as well as a plurality of tactile stimulators 65, 67, 69, 71, 73, and 75. Preferably, the upper LEDs 107 and 113 emit red light, the middle LEDs 109 and 115 emit green light, and the lower LEDs 111 and 117 emit orange light, so that low vision users can more easily distinguish which LEDs are illuminated. Both the LEDs and the tactile stimulators are controlled by the FPGA 173 through a plurality of hardware drivers 169 for the tactile stimulators and a plurality of hardware drivers 171 for the LEDs. The computer circuit board 193 sends signals to the FPGA 173 through the cord element 179 about the logical state of the tracking device 53, including the active geometrical arrangement of tactile stimulators and LEDs, as well as their frequencies and actuation strengths. This logical information is interpreted by the FPGA 173 and sent to individual hardware drivers, one for each tactile stimulator and LED, determining their state of activation.
In the preferred mode, timing oscillators for the tactile stimulators are contained within the logic of the FPGA 173. Alternatively, the computer circuit board 193 may internally decide the instantaneous state of each LED and tactile stimulator, through its own internal oscillator, and transmit this state to the FPGA 173, which directs individual hardware drivers to their correct state.
The states of the mode scroll-up button 103 and the mode scroll-down button 105 are detected by the FPGA 173 and communicated to the single-board computer 193 through the cord element 179. Likewise, the states of the buttons 11 are communicated to the computer circuit board 193 through the FPGA 173.
Fig. 17 depicts the circuit components on the mouse circuit board 207. The functions of this circuit board are predominantly performed by the FPGA 173, which includes the functions surrounded by the inner box in the figure. These conventional functions are created through standard software modules provided by the manufacturer and are well-known in the art. These software modules are transferred from the read-only memory resident in the single-board computer 193 and programmed into the FPGA 173 through the wire 179 via a serial interface 275 upon each power-up of the IRA system 1. An FPGA clock 269 uses a crystal 271 to provide timing signals for the FPGA functions. A timer oscillator 273 feeds tactile stimulator drive signals to a pin driver selector 277, whereupon individual drive signals are sent to the tactile stimulator drivers 169. The tactile stimulator drivers 169 include standard bipolar or FET transistors, sinking current through a plurality of solenoid coils 283, servicing each tactile stimulator. At each moment of current disruption through the solenoid coils 283, a current is generated from field collapse that illuminates the associated tactile stimulator LEDs 107, 109, 111, 113, 115, and 117. In this configuration, the individual hardware drivers 171 for the tracking LEDs are not required, since the stored energy in the coil 283 incidentally drives the associated LED.
A frequency counter 279 measures the pulsation frequency of light incident on the CRT sensor, whose output is amplified by the amplifier 125. The hardware drivers 167 control the current through the illuminators 15 through a plurality of resistors 285. A camera timing oscillator 281 generates the standard integration refresh and clockout signals used in the camera 25. Such signals are used in both CMOS and CCD cameras.
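As a hedged illustration of how such a frequency measurement might feed the CRT decision, the sketch below classifies a surface as a CRT when the photodiode pulses arrive at a plausible vertical refresh rate; the 50-120 Hz window and all names are assumptions, not figures from the patent:

    def looks_like_crt(pulse_times_s: list) -> bool:
        # Treat the surface as a CRT if the sensed light pulses at a
        # plausible vertical refresh rate (assumed window: 50-120 Hz).
        if len(pulse_times_s) < 2:
            return False
        periods = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
        frequency_hz = 1.0 / (sum(periods) / len(periods))
        return 50.0 <= frequency_hz <= 120.0

    print(looks_like_crt([0.000, 0.017, 0.033, 0.050]))   # ~60 Hz -> True

When the check succeeds, camera integration would be started at each detected pulse so that a full phosphor illumination cycle falls within one exposure, as depicted in Figs. 9 and 10.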
Tracking Display Description

Fig. 18 depicts an end-view cross-section of the tracking display 53. The solenoid coil 283 rests within a cylindrical recess in a magnetic steel housing 289, and when energized, pulls a vibrating armature 287 upwards across the distance of an impact standoff 295, which is typically 0.015". As the vibrating armature 287 comes to the end of this travel, it strikes the underside of a rubber-like sealing sheet 297, whereupon shock-like displacements of the sheet are sensed by tactile sensors in a human operator's finger 305, which rests within a smooth finger trough 303 fashioned in the magnetic steel housing 289. The rubber-like sealing sheet 297 is affixed to the magnetic steel housing 289 using a sealing adhesive 299 around its border, such that contamination does not interfere with the free motion of the vibrating armature 287. A bumper O-ring 291 prevents a metallic striking noise or buzz from emanating from the tactile display 53 when the operator's finger 305 is not fully in contact with the rubber sheet 297.
When the solenoid coil 283 is de-energized, a combination of gravity and stored energy in the rubber sealing sheet 297 and the bumper O-ring 291 moves the vibrating armature 287 downwards, where it strikes a rubber rebound bumper 293. The rebound rubber prevents metallic noise emanation, as well as providing a mechanically resonant amplification of the vibrating armature movement. The hardness (measured in durometers) of the rubber rebound bumper 293, the bumper O-ring 291, and the rubber-like sealing sheet 297 is selected to maximize the skin deflection and tactile sensation afforded by the assembly.
The presence of the rubber-like sealing sheet 297 has additional benefits. In addition to sealing the display, the sealing sheet 297 spreads the area of tactile vibration beyond the diameter of the tactile stimulator vibrating armature 287, making it less critical to tactile sensitivity the exact placement of the operator's finger 305 on the surface of the tracking display 53.
Of course, the mechanism for inducing skin deflection could be accomplished with means other than the use of magnetic solenoids 283 actuating vibrating armatures 287. These other means could include piezoelectric actuators, electro-rheological fluids, electro-tactile techniques, or memory metals.
The LEDs 107, 109, 111, 113, 115, and 117 are mounted in depressions on the lateral aspects of the magnetic steel housing 289, outside of the finger trough 303, so that when the operator's finger 305 is resting in the finger trough, the LEDs are visible on either side.
A plurality of electrical leads 301 connect the LEDs 107, 109, 111, 113, 115, and 117, as well as the solenoid coils 283, to a conventional flexible wiring circuit board (not shown) for transmission to the mouse circuit board 207.
Certain operators may suffer from neuropathy afflicting the tactile sensors in their fingertips. In such cases, the use of the tracking device 53 is compromised. In its stead, auxiliary tactile tracking devices may be employed. For example, a more widely spaced set of tactile stimulators may be strapped to the operator's arm, and connect with the computer 3 directly to provide feedback information to the operator.
The wider spacing of tactile stimulators in such an auxiliary tactile display is necessitated by the wider spacing of tactors in the general skin surface as contrasted with the fine spacing of tactors in fingertips.
System Power

In Fig. 12, a rechargeable battery 197 contains the electrical energy necessary to support the IRA system 1. In the preferred embodiment, this battery is chosen to provide the highest energy density possible, such as a metal hydride battery. Convenient batteries may be chosen from a variety of consumer electronics devices, including those used for portable computers or portable video cameras.
The power from the battery 197 is converted into the various voltage levels needed for the electronic devices in the IRA system 1 by a power conditioner 199. Outputs from the conditioner 199 are sent to the electronics contained within the mouse 2 through a cord element 183. Other outputs are directed to the computer circuit board 193 and other elements of the computer 3. An on/off switch 201 controls power output from the rechargeable battery 197 and may be combined with the volume control 187. A charge connector 203 provides charging means for the battery 197.
During periods of inactivity longer than a given threshold time, the computer circuit board 193 goes into a "sleep mode" in order to preserve battery power. During this sleep mode, all power-consuming elements in the mouse 2 are turned off, as well as elements in the computer 3. Activity in the computer circuit board 193, shown in Fig. 12, is reduced to the minimum necessary to remain in an active state, in which dynamic memory is retained. In addition to the communication of the buttons 11 through the FPGA 173, the buttons 11 have a direct link to the computer circuit board 193 through a cord element 181 to provide a wake-up signal once pushed, which restores power to all IRA system 1 elements. The microphones 63 are connected to an amplifier 32 in the circuit board 207, which connects through a cord element 323 to an A-to-D converter 325 for communication to the computer circuit board 193, as shown in Fig. 12.
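A minimal sketch of this power policy, assuming a hypothetical 120-second idle threshold (the patent does not give one) and invented method names:

    import time

    class PowerManager:
        def __init__(self, idle_threshold_s: float = 120.0):   # threshold assumed
            self.idle_threshold_s = idle_threshold_s
            self.last_activity = time.monotonic()
            self.asleep = False

        def note_button_press(self):
            # A button press doubles as the wake-up signal (cord element 181).
            self.last_activity = time.monotonic()
            self.asleep = False            # restore power to all elements

        def tick(self):
            idle = time.monotonic() - self.last_activity
            if not self.asleep and idle > self.idle_threshold_s:
                self.asleep = True         # power down mouse and computer
                                           # elements; retain dynamic memory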
Ergonomic Design of Physical Components

Use of the IRA system 1 will be primarily among an elderly population with little familiarity with, and often an aversion to, computer devices. The IRA system 1 needs to be easy for operators to use, and such ease is determined both by the mechanical ergonomics of the hardware construction and the design of its software user interface, both of which must make the IRA system 1 intuitive to use. This section presents those novel mechanical aspects of the construction that contribute to its ease of use.
Fig. 13 presents an end view of the mouse 2. The tracking display 53 and its associated tactile stimulators and LEDs are shown in the upper aspect, with the mode scroll-up button 103 and the mode scroll-down button 105 on the forward and upper-forward aspects of the mouse.
Of note is a U-shaped slot 209 on the bottom of the mouse 2, that extends along the length of the mouse 2 parallel to the longitudinal axis of the mouse 2. This slot assists users in aligning the mouse 2 parallel to the axis of cylindrical objects such as medical pill bottles and food cans. Using this feature, the cylindrical object is placed with its axis parallel to the longitudinal axis of the mouse 2, and is placed flush against the ridges flanking the U-shaped slot 209. In this orientation, the text on the cylindrical object is located directly beneath the window 21. The operator may conveniently rotate or translate the cylindrical object to be read within the U-shaped slot 209, using the slot 209 as a guide.
In Fig. 2, the end of the U-shaped slot 209 is shown terminated by a mouse cutout 213. The purpose of the cutout 213 is to allow bottle caps and can rims to fit underneath the end of the mouse 2, so that text near the end of the bottle or rim can be read through the window 21.
Many of the ergonomic features of the mouse 2 are readily apparent in Fig. 6. The operator's index or middle finger fits into the finger trough (which may be viewed in cross-section in Fig. 18) on the upper surface of the tracking display 53. The thumb and either the middle or ring finger rest on opposite sides of the mouse 2, resting comfortably over the buttons 11 on each side of the mouse. The tactile display is located directly over the window 21, and therefore presents tactile information to the operator in overlying alignment with the text to be read. With this overlying alignment, the human-factors kinesthetic response necessary to track the mouse 2 along text is natural and intuitive, similar to tracing along a vibrating line with an index finger. This motion is naturally performed by readers of Braille and mimics the actions of small children learning to read.
Because the mouse 2 may be both translated and rotated while tracking text to correct skew alignments of mouse and text, the bottom of the mouse 2 is made from a slick plastic with no preferred movement directionality. In the preferred mode, the bottom surface of the mouse 2, corresponding to the ridges on the sides of the U-shaped channel 209, is formed from a self-lubricating plastic, although other methods may be employed, including the use of Teflon pad inserts.
The mouse 2 is wider in a posterior region 215, fitting under the ball of the operator's palm, than in a forward region 217 gripped between the thumb and the middle or ring finger (see Fig. 1 and Fig. 6). This tapering of shape has been shown to naturally counteract the skew rotation arising from forearm pivoting when the device is scanned left or right along a line of text.
On the lower lateral aspects of the mouse 2 are a plurality of flared extensions 219 which are integral with a plastic housing 211. The operator's thumb and middle or ring finger rest half on and half off these flares during normal operation. This positioning of the fingers provides the user both rest and the ability to modulate the friction and closely monitor the movement of the mouse over the surface to be read.
The buttons 11 are located in positions such that the operating fingers rest naturally and comfortably on them. The buttons 11 provide tactile feedback to the user when pressed through a distinctive tactile click. The raised prominence of the buttons 11 allows easy finger positioning by operators, and the elongated shape of the buttons 11 allows operators with different finger lengths and hand shapes to make use of the buttons 11.
The speakers 13 within the mouse 2 are located in such a position that when the mouse 2 is comfortably gripped with either hand, at least one speaker 13 projects towards the user through the gap between the thumb and index finger. The audio speakers 13 are placed in the mouse 2, rather than in or with the computer 3, so that the location of the sound output provides audio-spatial cues to the reader about the location of text being read. In addition, this location minimizes the required audio volume, since the operator's facial attention is naturally and directionally focused on the material being read.
The microphones 63 are placed in the front of the mouse 2, so that when the user wishes to speak into either of the microphones 63, he merely raises the mouse 2 to his face, and one of the microphones 63 is naturally directed towards the operator's mouth.
All aspects of the external interface to the mouse 2, including the tracking display 53, the buttons 11, and the speakers 13, are bilaterally symmetrical. This allows equal ease of use by either hand, and may be used equally well by left- and right-handed users.
The use of the IRA system 1 is facilitated by the cord 5 having a limber construction without substantial physical memory. Such a cord is generally difficult to construct given the large number of wires contained within the cord, and the nature of molded plastic sheaths, which stiffen with cold. Fig. 14 depicts a schematic of the cord 5 in the preferred embodiment. A braided cloth exterior 221 surrounds an extension-limiting filament 225 and a plurality of wires 223, which carry the electrical signals between the mouse 2 and the computer 3. The filament 225 and the sheath 221 are bonded at each end to each other at a plurality of attachment points 227, with the filament 225 fully extended and the sheath 221 relaxed. It is a characteristic property of the braided sheath 221 that its diameter changes under tension, and the attachment between the sheath 221 and the filament 225 is performed when the sheath 221 is not under tension, so that its diameter is large and it does not cinch down on the filament 225 and the wires 223. The method of attaching the sheath 221 to the filament 225 at the attachment points 227 is by thermoplastic fusion, although adhesive bonding or mechanical cinching are also practical.
Fig. 15 depicts the method of establishing the physical and electrical end connections for the cord 5 constructed according to the method of Fig. 14. A connector 231 plugs into the computer 3 and transmits signals through the wires 223. A connector bell 229 provides the mechanical interface and protective means attaching the sheath 221 and the filament 225 through the attachment points 227. In the preferred embodiment, the diameter of the strain relief 233 through which the cord 5 is threaded is smaller than the diameter of the cord 5 at the attachment points 227, providing a secure physical grip on the cord. On the other end of the cord 5, a strain relief 233 also provides an orifice which is smaller than the attachment points 227, preventing the cord 5 from pulling through the strain relief 233. The internal wires 223 are connected to a mouse end connector 235, which is fashioned to mate with a connector located on the circuit board 207 in the mouse 2.
While the IRA system 1 can be used as a device at a fixed location, it is generally intended for use as a portable device. Fig. 1 depicts a number of features which contribute to its ease of portable use. The computer 3 is normally protected and housed in a fabric cover 247, which is comprised of a nylon cloth in the preferred embodiment, although other fabrics or materials such as leather are suitable. The mouse 2 is protected and housed during periods of non-use in a mouse pocket 241, and secured within the pocket 241 by a mouse pocket closure. The means of closure in the preferred embodiment is a pair of mated Velcro strips, although an adhesive band, buttons, or mated snaps are also suitable. The flexible cord 5, during periods of non-deployment of the mouse 2, is conveniently stowed within a cord pocket 239. The pockets 239 and 241 are integrated into the construction of the cover 247, as may be other storage pockets for miscellaneous utilitarian uses, such as for the user's wallet and keys. A convertible strap 237 and a buckle 245 are connected to the fabric cover 247 in a reconfigurable manner, such that the strap 237 may serve either as a belt or a shoulder strap for carrying the IRA system 1.
Computer Decoding of Text Images

Fig. 3 depicts a top-level software block diagram. Images of the text 7 are captured by the camera 25 and are passed to the computer 3 for processing. The computer 3 includes a number of separable functional components, which may or may not be embodied in separate hardware components. The camera image is initially placed into memory, either on the computer, or within a specialized hardware component such as the afore-mentioned frame-grabber 191. This image includes an array of pixels, which in the preferred embodiment are grey-scale pixels with a depth of 8 bits. In Fig. 3, the components physically located within the computer 3 are located within the box designated as the computer 3.
The outcome of the pre-processing is an image with high contrast, which may optionally be reduced in pixel depth from greyscale to a single-bit-depth binary image. Such binary images may be very rapidly decoded by a variety of OCR programs.
The image is then presented to the OCR program 31 in order to decode the text. Such OCR programs include the XIS OCR engine from Xerox and others that are widely available from a variety of software vendors, including Caere Corporation, International Neural Machines, Mitek, and others. In the preferred embodiment, we have used the Tiger OCR library available from Cognitive Technology Corporation. The input to the OCR program 31 is a bitmap of the text to be read, and the output is a database of text locational data 33 relating to text within the image. In the preferred embodiment, the text locational data 33 includes not only the identity of the individual text elements, but also the location of each text element, the degree of confidence with which the character or symbology was identified, the font and point size in which the text is rendered, and the degree (angle) of skew from the horizontal. In cases where the OCR program does not provide all of this information, some of the missing information may be derived from the provided information. For example, in the absence of skew information, the skew angle can be computed by trigonometry from the relative locations of adjacent text elements.
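For instance, the skew computation mentioned above can be sketched as below. The function and coordinate convention are hypothetical, assuming image coordinates with y increasing downward and OCR-reported (x, y) locations for two adjacent text elements on the same baseline.

```python
import math

def skew_angle_degrees(left_xy, right_xy):
    """Estimate the skew angle from the locations of two adjacent
    text elements on the same text baseline."""
    dx = right_xy[0] - left_xy[0]
    dy = right_xy[1] - left_xy[1]
    # Negate dy because image y grows downward; a positive result means
    # the text rises from left to right.
    return math.degrees(math.atan2(-dy, dx))

# Adjacent characters 100 px apart whose baselines differ by 5 px:
print(skew_angle_degrees((10, 200), (110, 195)))  # about 2.9 degrees
```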
Symbols and characters which overlap the boundaries of the viewing field are discarded, in order to prevent certain mistakes in OCR interpretation. For example, a character at the boundary of the viewing field may be truncated on its right side so as to appear to be a different character. By deleting characters that overlap the boundary of the viewing field, such mistakes are avoided. This procedure is facilitated because the OCR program 31 returns the location of each text element.
In many cases, text will be presented to the IRA system 1 in an orthogonal rotation. For example, a blind user picking up a document cannot easily determine whether the text is upside down or sideways (in landscape mode). In such cases, it is of high utility for the IRA system 1 to alert the reader. To perform this function, the pre-processor 29 in general computes a variability measure, which is conveniently a sum of edges in the field of view 54. Moving across each row in the bitmap, a change from a zero bit to a one bit, or from a one bit to a zero bit, increments a counter. This variability measure may be computed by a variety of alternative means. If this variability measure exceeds a threshold, the image can be said to contain contrast structure, generally either text or graphics. If alphanumeric content is not recognized in the normal orientation by the OCR program 31, the bit-image is rotated in the image pre-processor 29 by ninety degrees and presented again to the OCR program 31 for interpretation, and so forth. If text content is not recognized in any of the four orthogonal rotations, then the user is alerted by the vocalization of the word "graphics", for example. If the text is recognized in one of the orientations other than the upright presentation, the user is alerted by the vocalization of the proper rotation, such as "rotate left" or "upside down", to establish upright orientation.
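A sketch of this orientation logic follows, assuming hypothetical `ocr` and `rotate90` callbacks and an arbitrary edge-count threshold; the mapping of rotation counts to spoken cues is illustrative only.

```python
def edge_count(bitmap):
    """Variability measure: count 0-to-1 and 1-to-0 transitions
    across each row of the binary bitmap."""
    count = 0
    for row in bitmap:
        for a, b in zip(row, row[1:]):
            count += a != b
    return count

def orientation_alert(bitmap, ocr, rotate90, threshold=500):
    """Return the vocalization for the current frame: None for a blank
    field, a rotation cue for rotated text, or "graphics"."""
    if edge_count(bitmap) < threshold:
        return None  # no contrast structure in the field of view
    cues = ["upright", "rotate left", "upside down", "rotate right"]
    for quarter_turns in range(4):
        if ocr(bitmap):  # alphanumeric content recognized
            return cues[quarter_turns]
        bitmap = rotate90(bitmap)  # try the next orthogonal rotation
    return "graphics"  # contrast structure, but no text in any rotation
```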
The IRA system 1 has two fundamental modes of operation: a search mode and a track mode. When neither button 11 is pressed (thus, the system is in the search mode), software switch 41 is in position to transmit signals from a center text Y-locator 35 to the tracking display 53. When the track mode is selected by pressing and holding in either of the buttons 11, the software switch 41 moves to position 42 and a software switch 43 moves from a null or unconnected position to position 44. The center text Y-locator 35 determines and outputs the vertical location ("Y value") of the most vertically-centered line of text. If multiple lines of text are located within the image, then the vertical positions of the letters in the line closest to the vertical center of the field of view 54 are averaged. The OCR program 31 formats the text into lines, along with the vertical position of each line within the field of view 54. The center text Y-locator 35 determines which of these line positions is closest to that of the center of the field of view 54, and outputs that line's vertical position.
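A minimal sketch of this Y-locator logic, assuming the OCR stage has already grouped letters into lines and reported their vertical positions (all names are hypothetical):

```python
def center_text_y(lines, field_height):
    """Return the vertical position of the line whose letters average
    closest to the vertical center of the field of view.

    `lines` is a list of (text, letter_y_positions) pairs.
    """
    center = field_height / 2.0
    line_positions = [sum(ys) / len(ys) for _, ys in lines]
    return min(line_positions, key=lambda y: abs(y - center))

lines = [("THE QUICK", [100, 102, 98]), ("OVER THE", [240, 238, 242])]
print(center_text_y(lines, 480))  # 240.0, the most centered line
```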
With either of the buttons 11 pressed so that the system is in the track mode, as each new image frame is analyzed, a track line updater 36 determines if text patterns match that of previous elements in the current tracking line. If so, redundant elements within the two patterns are eliminated, and the track line identity is enlarged to include the new elements. For example, if the current track line includes the text "This is the tim", and the newest frame includes the text "e time for all", the track line updater 36 tests different registrations of the current track line and the newest frame for maximum correlation. At the registration which yields the largest correlation, the lines are merged to form the new track line and duplicate characters are dropped; in this case the track line is amended to be "This is the time for all".
It should be noted that the OCR program 31 may return characters with a low certainty of interpretation, and the OCR program 31 reports this lack of certainty. In such a case, the track line updater 36 will replace the isolated instances of low certainty characters with a "wild-card" symbol, permitting the track line updater 36 to match this wild-card symbol with any other symbol or text element.
This permits a correlation and merging activity to continue even in the presence of occasional low certainty characters.
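The registration-and-merge behavior of the track line updater 36, including the wild-card handling, might be sketched as follows. The scoring rule (a count of matching characters) is an assumed stand-in for "maximum correlation", and all names are hypothetical.

```python
WILDCARD = "?"  # substituted for characters the OCR flags as uncertain

def chars_match(a, b):
    return a == b or WILDCARD in (a, b)

def merge_track_line(track, frame):
    """Test each registration of the newest frame against the current
    track line, keep the best-scoring overlap, and merge, dropping the
    duplicated characters."""
    best_shift, best_score = len(track), 0
    for shift in range(len(track)):
        score = sum(chars_match(a, b)
                    for a, b in zip(track[shift:], frame))
        if score > best_score:
            best_shift, best_score = shift, score
    if best_score == 0:
        return track + frame  # no overlap found; simply append
    return track[:best_shift] + frame

print(merge_track_line("This is the tim", "e time for all"))
# -> "This is the time for all"
```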
The advantages of the technique used by the track line updater 36 for sensing text string overlap are substantial. Instead of having to merge image fragments in image space to capture entire lines, the IRA system 1 merges symbology strings in ASCII or "content" space, which requires many orders of magnitude less computation than merging in image space.
The essential logic of tracking lines of text is to find strings of symbols which are contextually related to one another. This relationship is defined for purposes of the algorithm to always involve physical contiguity, that is, letters adjacent to each other or separated by language constructs such as punctuation or small spaces between words. When the space between words exceeds some multiple of the average symbol width, a break in the track line is registered. The incorporation of this logic into text line tracking allows IRA system 1 to disambiguate multiple columns of text, since the IRA system 1 stops talking and tracking when excessive space is encountered during the course of a single push of the button.
Operationally, tracking is terminated by a space exceeding a multiple of the average symbol width, where the multiple is generally between 3 and 4 text spaces wide. In the case of proportional text, where symbol width is variable, the spacing multiple is increased to account for this intrinsic variation.
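Expressed as code, the break test might look like the sketch below; the 3.5x default multiple and the proportional-font adjustment factor are illustrative assumptions within the 3-to-4 range stated above.

```python
def is_track_line_break(gap_width, symbol_widths,
                        multiple=3.5, proportional=False):
    """Register a break in the track line when the space between words
    exceeds a multiple of the average symbol width."""
    average_width = sum(symbol_widths) / len(symbol_widths)
    if proportional:
        multiple *= 1.5  # hypothetical allowance for variable widths
    return gap_width > multiple * average_width

# A 40 px gap after symbols averaging 10 px wide terminates tracking:
print(is_track_line_break(40, [9, 10, 11, 10]))  # True
```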
A tracking text Y locator 37 outputs the average Y value of the text elements corresponding to the current track line within the current image. The tracking text Y locator 37 uses the text position information obtained from OCR program 31 to determine the average vertical position of the current tracking line in the field of view 54.
A track line outputter 39 outputs text elements newly added to the track line. When the new frame text and the current track line text are merged, new text elements which are added to the current track line are output.
In overview, when the button 11 is pressed and held, the IRA system 1 enters a mode ("track mode") in which the central line of text is used and remembered as the tracking line. As long as the button 11 remains pressed, and the track line remains within the field of view 54, the IRA system 1 will attempt to assemble new elements to that line, outputting their vertical coordinates to the tracking display driver 51, and vocalizing any complete words which are encountered. This is accomplished through the software switch 41 and the software switch 43, which are electronically manipulated via the button 11.
When the button 11 is not pressed the IRA System 1 is in the search mode in which the software switch 41 channels the Y values of the most vertically centered line from the center text locator 35 to the tracking display driver 51. This driver 51, in a manner to be described shortly, includes both software and hardware elements that deliver the Y values of the most centrally located text in the window 21 to the user through the tracking display 53 so that he may explore the locations of text elements relative to the current mouse position. When the button 11 is pressed to enter the track mode, the software switch 41 channels the Y values of the current tracking line from the track line Y locator 37 to the tracking display driver 51, so that the user may manipulate the mouse 2 through control of his hand 57 to continue tracking the text line chosen at button press.
In the search mode, the switch 43 deactivates any text output to the speech synthesizer 61. In the track mode, the switch 43 channels text elements found in the current tracking line from the track line text outputter 39 to the novel word detector 59. The novel word detector 59 determines when a word has been defined by virtue of having a space or punctuation before and after, and further determines whether the word is novel and has not been vocalized immediately prior (that is, is not repetitive with the previous vocalized word). If both conditions are met, the letter string is transmitted to the speech synthesizer 61, which is a software program, possibly with hardware assistance, that computes and generates an electronic wave form which, when played through the audio speakers 13, is heard by a human operator 55 as the novel word. Such a speech synthesizer might include a Digital Equipment Corporation DECtalk hardware device, or software programs such as AT&T's "Watson" that may be played through widely available audio-output hardware such as a Sound Blaster or a digital to analog output device 185 connected to speakers 13.
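The novel word detector's two conditions (word completion and non-repetition) can be sketched as a small state machine. The class name, boundary character set, and callback interface are hypothetical.

```python
class NovelWordDetector:
    """Emit a word only when it is complete (bounded by space or
    punctuation) and differs from the word vocalized immediately prior."""

    BOUNDARIES = set(" .,;:!?")

    def __init__(self, speak):
        self.speak = speak       # callback into the speech synthesizer
        self.buffer = ""
        self.last_word = None

    def feed(self, char):
        if char in self.BOUNDARIES:
            word, self.buffer = self.buffer, ""
            if word and word != self.last_word:
                self.last_word = word
                self.speak(word)
        else:
            self.buffer += char

detector = NovelWordDetector(print)
for c in "the the quick ":
    detector.feed(c)   # speaks "the" once, then "quick"
```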
The tracking display driver 51 presents information to the human operator 55, through tactile, visual and audio feedback, to allow the directed manipulation of the mouse 2 through hand control 57 in horizontal, vertical, and angular presentation to the material being read. The inputs to the tracking display driver 51 include the output from the center text locator 35, the track line Y locator 37, and a skew detector 47. The tracking display driver 51 includes a software program that determines which tactile stimulators and LEDs should be active and at which frequency they each should be activated. The tracking display driver 51 includes hardware components within the mouse 2 which physically activate the tactile stimulators and LEDs through electronic oscillators and transistor drivers in accordance with serial commands from the computer 3. This combination of tactile, visual and audio stimulus serves to provide the user with an intuitively understandable feedback mechanism to locate and track individual lines of text.
The tracking display 53 includes a set of six solenoids which actuate vibrating pins, hereafter known as tactile stimulators, 65, 67, 69, 71, 73, and 75, that excite propriosensory responses in the distal pad of the finger of the human operator 55. It is well noted in the scientific literature that different proprioreceptors in the fingertip respond to different frequency stimuli, and have different spatial discrimination capabilities. By using an impulse displacement stimulation which excites more than one type of proprioceptor, the perception of the pins with maximum sensitivity and spatial discrimination is achieved. In addition, a set of six LEDs, 107, 109, 111, 113, 115, and 117, corresponding to and mounted adjacent to the tactile stimulators 65, 67, 69, 71, 73 and 75, are collaterally energized to provide visual augmentation of the tactile stimulator feedback. Both the tactile stimulators and the LEDs are arranged in two columns of three elements (see Fig. 6 and Fig. 13). Information is provided to the human operator through both the pattern of tactile stimulators energized, as well as the frequency at which the tactile stimulators vibrate. A collateral effect is the audio artifact generated by the vibration of the tactile stimulators, which is perceived by the human operator as a variable frequency soft-toned "buzz." The frequency of this buzzing provides additional centering cues to the human operator. In a later section, the relationship of the patterns of tactile stimulator and LED feedback to the text locational data will be presented in detail. It should be noted that alternative arrangements of tactile stimulators are within the spirit of the invention. A higher density of tactile stimulators may provide more detailed information to operators, and a single column of tactile stimulators has been demonstrated in a prototype to communicate acceptable positional information. Even a single tactile stimulator provides important positional information that is of sufficient assistance to operators for detecting and tracking text.
The tracking display driver 51 transmits "chirps" through the audio speakers 13 in response to the skew detector 47, which detects when the text angle exceeds a threshold angle necessary for accurate OCR interpretation by the optical character recognition program 31. In the preferred embodiment, a threshold angle of five degrees is used, although this angle is highly dependent on the specific OCR program 31 used. If such a threshold is exceeded, the human operator is informed through tactile, visual and audio displays via the audio speakers 13 and the tracking display 53. The human operator 55 can then correct the skew angle of the mouse 2 to the text 7 so as to eliminate the audio feedback.
Feedback Algorithm
Fig. 4 presents the preferred response of the tracking display 53 to text alignments within the field of view 54, which is depicted schematically in alignment under the tracking display 53, when the button 11 is not pressed, so that the IRA system 1 is in the search mode. The text 7 is shown in a fixed position in the figures, while the mouse 2 is moved over the text by the human operator 55. The tracking display 53 is located on the mouse 2, whose field of view 54 moves relative to the text 7 in conjunction with the mouse 2. When the button is not pressed, the IRA system 1 is in the search mode, in which the most central recognized text in the field of view 54 is tracked. In this figure, only the tactile stimulators 65, 67, 69, 71, 73, and 75 are depicted, although corresponding LEDs would energize in coordination with the tactile stimulators. The dashed grids designate the image sectors of field of view 54 corresponding to each pin. In Fig. 4a, no recognized alphanumeric text is present within the field of view 54. In such an instance, no tactile stimulator is energized, as indicated by the unfilled pin symbols.
In Fig. 4b, alphanumeric text is recognized only within the right zone of the field of view 54. The center-text Y locator 35 determines the Y value of the text located most closely to the vertical center. In this case, the text substring "THE" of "THE QUICK BROWN FOX JUMPED" is bolded to indicate it is within the field of view 54, and it is the text closest to the vertical center. The tracking device driver 51 directs the right-lower tactile stimulator 75 to energize, as indicated by the filled symbol for the designated tactile stimulator. In all cases, the frequency of the tactile stimulator energization is inversely varied in relation to its distance from the vertical centerline. In testing, it has been discovered that a minimum frequency of six Hertz (Hz), when detection occurs at either the top or bottom extreme of the field of view 54, is a good choice for the lowest threshold frequency. As the text elements move closer to the center of the field of view 54, the frequency is exponentially increased to a maximum of sixty-five Hz to provide good tactile-frequency-based center discrimination.
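The exponential frequency mapping can be sketched as below, interpolating between the 6 Hz and 65 Hz endpoints reported above; the exact interpolation formula is an assumption.

```python
F_MIN, F_MAX = 6.0, 65.0  # Hz, the extremes reported in testing

def stimulator_frequency(y, field_height):
    """Map a line's vertical position to a vibration frequency:
    65 Hz at the vertical center, falling exponentially to 6 Hz at
    the top or bottom extreme of the field of view."""
    # Normalized distance: 0.0 at the centerline, 1.0 at either extreme
    d = abs(y - field_height / 2.0) / (field_height / 2.0)
    return F_MAX * (F_MIN / F_MAX) ** d

print(stimulator_frequency(240, 480))  # 65.0 Hz at the center
print(stimulator_frequency(0, 480))    # 6.0 Hz at the extreme
```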
In Fig. 4c, alphanumeric text is recognized only within the right zone of the field of view 54.
However, the substring "THE Q" is bolded to indicate that it is the most vertically centered text, now within the central-right zone, so that the tracking device driver 51 directs the energization of the right-central tactile stimulator 73. Note that in this algorithm, even though text is identified in more than one zone of the field of view 54, only one tactile stimulator of a column will be energized at one time. This algorithm allows the human operator to focus on centrally-located lines, removing other, potentially distracting tactile stimuli from their consideration. The tactile stimulator frequency, the audio frequency, the stimulator position, and the LED color and position provide somewhat redundant sensory information.
These stimuli are fused together within the mind of the user, providing a more robust and intuitive centering response. Another advantage is that the centering response of the operator is not eliminated even if one of the sensory modalities is disrupted (e.g., by a poorly-centered finger pad on the tracking display 53).
In Fig. 4d, alphanumeric text is recognized in both the left and right zones of the field of view 54. The most vertically centered text substring, "ICK BROWN F", spans both left and right in the field of view 54, and so both the left-central tactile stimulator 67 and the right-central tactile stimulator 73 are energized.
In this case, tactile stimulators in both left and right columns are energized, but as mentioned previously, only one tactile stimulator in each column may be energized.
In Fig. 4e, alphanumeric text is recognized in both the left and right zones of the field of view 54. The mouse 2 has moved lower on the text 7, so that the central substring of text now becomes "AZY DOG.", present within both the left-central and right-central zones, and so both the left-central tactile stimulator 67 and the right-central tactile stimulator 73 are energized. In the search mode, the IRA system 1 freely changes the text being tracked to match the identity of the most centrally located text within the field of view 54. Thus, the change of the central line from "THE QUICK BROWN FOX JUMPED" to "OVER THE LAZY DOG." will have been felt and heard by the operator as a ripple in stimulation frequency, but it has no effect on which pins are energized in the tracking display 53 during that transition.
In Fig. 4f, alphanumeric text is recognized only within the left zone of the field of view 54. The center-text Y locator 35 determines the Y value of the text located most closely to the vertical center, in this case the sub-string "ULTY.", which is located in the upper left zone of the field of view 54. The tracking device driver 51 directs the left-upper tactile stimulator 65 to energize.
As described above, the human operator 55 is alerted to excessive text skew angles identified by the skew detector 47 through audio signals. Alternatively or in addition to this audio signal, the stimulation pattern on the tactile display 53 may alert the human operator that the skew angle has been exceeded. In Fig. 4g, although the text is centered, there is a skewed presentation angle. As before, the centered text causes the left-central tactile stimulator 67 and the right-central tactile stimulator 73 to energize.
Additionally, the tracking display driver 51, with input from the skew detector 47, energizes the tactile stimulators 69 and 71 to indicate the skew of the text. In order to distinguish the skew correction signal in these tactile stimulators 69 and 71 from the central text locator signal in other tactile stimulators 67 and 73, the energizing action of these tactile stimulators 69 and 71 is a periodic pulse coordinated with a "chirp" audio output from the speakers, rather than a continuous vibration, as indicated by the unique energization symbols for tactile stimulators 69 and 71.
Fig. 5 presents the preferred response of the tracking display 53 to text alignments within the field of view 54 when the button 11 is pressed, so that the IRA system 1 is in the track mode. The text 7 is in a fixed position in the figures, while the mouse 2 is moved over the text by the human operator 55. The tracking display 53 is located on the mouse 2, whose field of view 54 moves relative to the text 7 in conjunction with the mouse. In the track mode, the operator 55 has chosen to read a line of text, and has indicated this desire to the IRA system 1 by pressing and holding the button 11 while drawing the mouse 2 across the text. To assist the operator in reading the line of text, at the moment of button press, the most central line in the field of view 54 is designated as the tracking line. The IRA system 1 maintains this designation as long as the tracking line of text remains within the field of view 54. Through feedback to the user through the tracking display, the IRA system 1 directs the user to move the mouse 2 so as to maintain the tracking line centrally in the field of view 54.
In Fig. 5a, the text substring "THE Q" is located in the most central vertical position, as determined by the track line Y locator 37, which transmits this information to the tracking display driver 51. Pressing and holding the button 11 designates this line as the tracking line, indicated by the bold font of the text.
Because the tracking line text is located only within the right-central field of view 54, only the right-central tactile stimulator 73 is energized. The speaker 13 will have spoken "the."

In Fig. 5b, the tracking line "THE QUICK BROWN FOX JUMPED" spans both left and right central zones of the field of view 54, and is recognized by virtue of the substring "UICK BROWN F" within the field of view 54. Thus, the left-central tactile stimulator 67 and the right-central tactile stimulator 73 are energized. As the track line retains elements from the start of the button 11 push, "THE Q" is also bolded. The speakers 13 will have spoken "the quick brown."

In Fig. 5c, the tracking line "THE QUICK BROWN FOX JUMPED" spans both left and right upper zones of the field of view 54. Thus, the left-upper tactile stimulator 65 and the right-upper tactile stimulator 71 are energized. Note that in the track mode, the tracking display 53 responds only to the line designated and being constructed as the track line for as long as this track line continues to remain in the field of view 54, even though "OVER THE LAZY DOG." is currently in a more vertically central location. The advantage of this algorithm is that it allows the user to recover from large mis-tracking errors and return the track line to a central position in the field of view 54, so that he may read this line continuously to its end. The speakers 13 will have spoken "the quick brown fox."

In Fig. 5d, the tracking line "THE QUICK BROWN FOX JUMPED" is now located only within the left-upper zone of the field of view 54. Thus, the left-upper tactile stimulator 65 is energized. Note that the tactile stimulator is energized even though only a single letter remains in the field of view 54. The speakers 13 will have spoken "the quick brown fox jumped."

Obtaining a response from only a single tactile stimulator alerts the operator that the field of view 54 contains either the beginning or end of a single line, or a small, isolated text element. In Fig. 5a, the response from a single tactile stimulator in the right column suggests that the field of view 54 contains the beginning of the line of text. In the tracking display response shown in Fig. 5d, the response from a single tactile stimulator in the left column suggests that the field of view 54 contains the end of a line of text.
The tactile display 53 alerts the operator that the skew angle has been exceeded in track mode also.
In the tracking display response shown in Fig. 5e, the track line is in the central zone of the field of view 54 but there is a skewed presentation angle. As before, the track line text has been bolded and causes the left-central tactile stimulator 67 and the right-central tactile stimulator 73 to energize. Additionally, the tracking display driver 51, with input from the skew detector 47, energizes in a periodic pulse mode the tactile stimulators 69 and 71 to indicate the skew of the track line text. As with the search mode, in order to distinguish the skew correction signal in the tactile stimulators 69 and 71 from the central text locator signal in the tactile stimulators 67 and 73, the energizing action of the tactile stimulators 69 and 71 is a periodic pulse coordinated with a "chirp" audio output from the speakers 13, rather than a continuous vibration, as indicated by the unique energization symbols for tactile stimulators 69 and 71.
When multiple lines of text are being read, the operator 55 will want to be able to find the next line of text following that which the IRA system 1 has been tracking. In order to do this, he must by some means reposition the mouse 2 at one carriage space below the beginning of the previously tracked line.
This is accomplished by a variety of means. In the first (manual) means, the operator 55 may leave a positioning finger on the hand not operating the mouse 2 at the beginning of the currently tracked line.
When the end of the line is reached, the mouse 2 is repositioned relative to this finger and then shifted down one line. Before this new line is tracked, the positioning finger is recentered on the current line with the IRA system 1 mouse. This technique, though manual, is very intuitive and can be fairly compared with a child's beginning reading technique.
In the second (computer-assisted) means, the IRA system 1 may intelligently assist this manual technique with an indication of the beginning of a line previously read. When the button 11 is pressed, the IRA system 1 commences to remember the entire line of text read during the button push, including the first word of the line. When the button is released, the IRA system 1 goes into the search mode, as described above. During this search mode, if the first words of the most recent track line, which is stored in IRA system 1 memory, are encountered, the operator is alerted with a combination of background-level audio, visual, and tactile stimuli. These stimuli include a brief flash of all LEDs, a brief and unique chirp audio output from the speakers 13, and a brief pulse from all tactile stimulators on the tracking display 53. This background is overlaid on the conventional tracking driver output. It is understood that within the spirit of this invention, these stimuli may take many forms. When the operator encounters and recognizes the first word of the line previously tracked, he may then index the mouse 2 down to the next adjacent line to begin reading text from this next line.
It is understood that different information can be communicated from the tactile stimulators to the operator through different vibratory modes. For example, in order to distinguish between the central tactile stimulators 67 and 73 and the other tactile stimulators, the vibrational frequency of the central tactile stimulators may be different from that of the other tactile stimulators. This additional communication mode enhances the human operator's ability to intuitively respond to the information, particularly since propriosensation in the fingertips has low spatial discrimination. In our experience, frequencies in the range of 6 to 65 Hz offer both authoritative sensation as well as a range which may be easily discriminated by users. Square-pulse energization provides an impact wave which appears to be well sensed by users in both position and frequency.
Alternative Modes of Operation
In different circumstances, the operator may wish the IRA system 1 to specialize its function. This requires the IRA system 1 to enter different modes of operation. In order for the operator to select between these different operational modes, a number of different input control means are available. The preferred methods of operator mode selection are the use of mode selector buttons on the mouse 2, as well as verbal mode selection governed through the computer 3 via a voice recognition program.
Fig. 6 depicts an isometric view of the mouse 2. The tracking display is located on the upper forward aspect of the mouse 2, and forms a curved depression into which the operator 55 places his index or middle finger. The distal pad of the finger rests on the tactile stimulators 65, 67, 69, 71, 73, and 75, in such a manner that the LEDs 107, 109, 111, 113, 115, and 117 may be viewed on each side of the finger.
The finger can be extended to depress the mode scroll buttons 103 and 105. The scroll buttons 103 and 105 control the selection of alternative modes of operation. The mode scroll-up button 103 steps forward through the available modes, while the mode scroll-down button 105 steps back through the available modes. Pushing both buttons simultaneously returns to normal read mode, as has been previously described. As alternative modes are selected, the entry of each mode is vocalized through the audio speakers 13 by naming the mode entered.
Alternatively, modes can be chosen vocally. The microphones 63 are located on the forward, lateral aspect of the mouse 2. As the operator vocalizes the name of a mode, his voice is picked up by the noise-canceling microphones 63, and sent via the cable 5 to the computer 3, where it is analyzed by a voice recognition program such as the "Watson" program from AT&T. When the name of a new mode is detected, the IRA system 1 confirms the mode selection by vocalizing the name of the mode through the speakers 13.
Fig. 7 presents a flow-diagram of the control and functionality of the alternative modes of operation.
Selection of alternative modes is made as described above through either the mode scroll-up button 103 and the mode scroll-down button 105, or through a voice recognition program 45 which uses input from the microphones 63. This information is integrated through a software mode selector 145. Depending on the mode selected, IRA system 1 function is altered as described below.
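A schematic of how the mode selector 145 might integrate the two input paths follows. The mode list ordering and the callback interface are assumptions, not the patent's specification.

```python
MODES = ["normal read", "spell out", "bar code", "big print headline",
         "currency denomination", "medicinal", "read color",
         "braille read out", "CCTV", "special search", "continuous read"]

class ModeSelector:
    """Integrate scroll-button and voice input into one mode state,
    vocalizing the name of each mode as it is entered."""

    def __init__(self, speak):
        self.index = 0
        self.speak = speak

    def scroll_up(self):                  # mode scroll-up button 103
        self.index = (self.index + 1) % len(MODES)
        self.speak(MODES[self.index])

    def scroll_down(self):                # mode scroll-down button 105
        self.index = (self.index - 1) % len(MODES)
        self.speak(MODES[self.index])

    def both_buttons(self):               # return to normal read mode
        self.index = 0
        self.speak(MODES[self.index])

    def voice_select(self, name):         # from the voice recognizer
        if name in MODES:
            self.index = MODES.index(name)
            self.speak(MODES[self.index])
```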
Spell Out Mode
In spell-out mode, the letters in a word are spelled out, rather than enunciated, by the IRA system 1.
This mode is of particular use when reading serial numbers, garbled text, foreign text, technical documentation, or names. The speech synthesis algorithms in current use may fail with unusual spellings or highly technical language, such as are frequently found in English text and are often non-phonetic.
When the operator encounters a word which is not familiar to or understood by the operator, the operator can have the IRA system 1 spell out the word. The internal operation of this mode is activated through the mode selector 145, which changes the way in which the novel word detector 59 operates.
Instead of sending entire words to the speech synthesizer 61, individual letters are rapidly vocalized.
Bar Code Mode
In bar code mode, images from the image pre-processor 29 are sent not to the optical character recognition program 31, but instead to a separate program which is specialized for the interpretation of bar and space codes, such as are commonly provided on retail products, forms, etc. Bar code interpretation programs, such as the PDF1000 software produced by Symbol Technologies of Bohemia, NY, are widely available, and may be adapted for incorporation into the IRA system 1; the interpretation may also be performed utilizing a special library for the optical character recognition program 31. Using this alternative means, the mode selector 145 alters one of a plurality of OCR recognition libraries 159 so that the OCR program 31 is optimized to interpret bar code labels. In such a mode, the tactile stimulators respond only when a bar code is within the field of view 54. The bar code mode of operation can be coupled with a database, so that instead of simply returning the code digits to the operator, the identity of the manufacturer and product are vocalized. Such a mode of operation is particularly useful for blind operators in grocery and department store shopping, and may be modified for use by blind operators in a variety of occupational contexts. It should be noted that alternative means of bar code decoding are within the spirit of the invention.
Big Print Headline Mode
The optical character recognition program 31 is optimized in most circumstances for a field of view 54 that contains a multiplicity of letters. In addition, the optical character recognition program 31 generally recognizes letters within a certain range of font sizes. When reading very large text, such as is found in newspaper or magazine headlines, book titles, building directories, restaurant menu headings, elevator button designations, or supermarket displays, the fonts will be very large, and may be so large that the individual symbols will not fit within the contact field of view 54 of the IRA system 1 camera 25. In cases where the letters fit within the field of view 54, big print headline mode will switch optical character recognition program parameters so as to be able to interpret the large fonts encountered. In certain cases, an image pre-processing software program may scale the image in order to make the symbology more easily interpreted by the OCR program 31. In cases where the letters do not fit within the contact field of view 54 of the IRA system 1 camera, the operator 55 may draw the mouse 2 back from the surface, modify the focus of the lens 23 of the camera 25, and thus expand the field of view 54 so as to enable the letters to fit. The natural field of view 54 divergence within the preferred lens system 23 is thirty degrees, so that the area covered expands with increasing distance from the window. The OCR recognition library 159 is altered in both cases through the mode selector 145 so that the OCR program 31 is optimized to interpret large font text.
Adjusting the focal plane of the IRA system 1 lens 23 may be accomplished by two means. Firstly, a lever 160 extending to the exterior of the mouse 2 may be provided and attached to the lens 23. This allows the user to rotate the lens 23, which has a coarsely-threaded barrel, so as to adjust the position of the lens 23 relative to the camera 25 to change its focus. Alternatively, the lever may pull additional optical elements into or out of the optical path, thereby adjusting the optical focal distance.
Currency Denomination Mode
In many business and social contexts, blind and low-vision users will need to handle paper currency.
Such currency does not have standard font symbology that is interpretable by the OCR program 31. In such cases, special symbology interpretation libraries must be utilized to distinguish between different currency denominations. Thus, in the preferred embodiment, in currency denomination mode, the mode selector 145 alters the OCR recognition library 159 so that the OCR program 31 is optimized to interpret currency denominations. It should be noted that alternative means of currency denomination identification are within the spirit of the invention. Instead of utilizing the alternative OCR recognition library 159, images may be transferred to a currency identification program entirely separate from the OCR program 31. Such a program could, for example, scan for images which correlate with stored images of different currency denominations.
Medicinal Mode
OCR programs are never 100% accurate, and in some cases they may correct for such inaccuracies by checking individual words against dictionaries of words in common usage. Due to the necessity of accurate and reliable interpretation of medical packaging information, it is desirable in this instance to check the information read on medicinal packages against standard medicinal usage. In medicinal mode, the mode selector 145 alters an OCR program word-checking dictionary 157 to specialize in medicinal vocabulary, or activates the OCR program dictionary 157 if such dictionary verification is not otherwise used. In addition or alternatively, the mode selector 145 alters a plurality of OCR program parameters 155 to increase the confidence level at which the OCR program matches symbols, so that weak matches are not vocalized to the user, preventing the transmission of false information.
Read Color Mode
Blind and low-vision users have expressed a strong interest in color discrimination and identification.
Many vision-impaired individuals wish to ensure that their dress is unobtrusive, and therefore desire to have clothes that are color-coordinated or socks that match. Read color mode disables the optical character recognition program, and in its place, the IRA system 1 analyzes the color information, such as hue and density, of the image in the field of view 54. This is accomplished, in a manner described in more detail below, using a black and white camera and methods such as switched single-color illumination. In this method, reflected light is quantified as the illuminators 15 are switched between different colors of illumination. By comparing the reflected light under different colored illumination, the hue and density of the surface can be determined. The information is compared with a library of color information, and the color designation is vocalized to the operator via the speech synthesizer 61 and the audio speakers 13.
In read color mode, the mode selector 145 activates a color discriminator 147 which, as a color is identified, vocalizes the name of the color from a standard library of color terms through the speech synthesizer 61 to audio speaker 13 output.
Braille Read Out Mode
Tactile stimulators 65, 67, 69, 71, 73, and 75 may function to display individual Braille characters, since the standard Braille cell includes two columns of three rows, corresponding to the layout of the tactile stimulators. In the preferred mode, when Braille read out mode is selected, the IRA system 1 searches and tracks text in the normal manner described above. After a line of text is tracked with the button 11 depressed, the operator may access this information in Braille by depressing the button 11 rapidly twice ("double-clicking"), or alternatively by the simultaneous push of both buttons 11, in which case the letters of the previous track line are translated into Braille-equivalent patterns of vibrating tactile stimulators and presented serially to the operator. In our experience, transmission rates of three to four letters per second are practical for skilled Braille readers using a single vibro-tactile Braille cell, and these rates are acceptable to these readers. In Braille read out mode, the mode selector 145 directs an output switch 149 to redirect text output to a software Braille translator 151. The Braille translator 151 pauses until the double-click input of the button 11 indicates that the operator is ready for Braille output, at which time the Braille translator 151 sends the sequence of letter Braille-equivalents for output to the tracking display driver 51.
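A sketch of the letter-to-stimulator translation follows, using the standard Grade 1 Braille dot numbering (dots 1-3 down the left column, 4-6 down the right) mapped onto the two-column stimulator layout described earlier. The pacing constant and the hardware callback interface are hypothetical.

```python
import time

# Braille dots 1-3 map to the left stimulator column (65/67/69, top to
# bottom) and dots 4-6 to the right column (71/73/75).
DOT_TO_STIMULATOR = {1: 65, 2: 67, 3: 69, 4: 71, 5: 73, 6: 75}

BRAILLE = {  # standard dot patterns for the first ten letters
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5),
    "i": (2, 4), "j": (2, 4, 5),
}

def present_braille(word, pulse, rate_hz=3.5):
    """Serially vibrate each letter's stimulator pattern at the three-
    to four-letters-per-second rate found practical in testing.

    `pulse` is a hypothetical hardware callback taking a list of
    stimulator reference numerals to energize.
    """
    for letter in word.lower():
        dots = BRAILLE.get(letter, ())
        pulse([DOT_TO_STIMULATOR[d] for d in dots])
        time.sleep(1.0 / rate_hz)

present_braille("bad", print)  # [65, 67], [65], [65, 71, 73]
```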
CCTV Mode
For low-vision users, table-mounted electronic magnification devices have been found to be quite useful for reading flat documents, and are available from a number of manufacturers such as Xerox, Telesensory and HumanWare. Portable electronic-magnification units are also available from companies such as Magni-Cam, but suffer from the difficulty that hand tremors from individuals utilizing the devices cause magnified movements in the field of view 54. Such tremors are particularly frequent in the elderly population suffering vision loss from macular degeneration and other diseases of advanced age. If the unwanted movements in the image resulting from tremors could be eliminated, such a portable magnification system as part of the IRA system 1 would allow low-vision users to interpret photographs, graphs, handwriting or fonts which are uninterpretable by the OCR program. Furthermore, the device could serve medical purposes, such as self-examination.
In CCTV mode, the video signal is transmitted as in normal operation by the IRA system 1 camera to the image pre-processor 29, which digitizes the image, converting the camera signal from analog to digital information. In CCTV mode, the digital image, in addition to being transferred to the optical character recognition program 31, is additionally transmitted to a CCTV driver 143 for display on a flat panel display 119 that is mounted to the face of the computer 3. In other embodiments, the flat panel display 119 may alternatively be a computer CRT monitor or a standard television that is connected by a video transmission cable to a connector on the computer 3. Fig. 8 depicts an Independent Reading Aid with the flat panel display 119 mounted on the lateral aspect of the computer 3. A subject image 121 is placed underneath the mouse 2, wherein the camera 25 transmits this image 121 to the computer 3. The image 121 is received by the image pre-processor 29 where it is digitized, and in addition to being sent for input to the optical character recognition program 31, the signal is also sent to the flat panel display 119, where a magnified image of the subject image 121 appears. The activation of the flat panel display 119 is normally controlled via software mode selection, although a hardware flat panel display switch 123 on the computer 3 case offers an alternative method of engaging magnified image augmentation.
The digitized image, before being transmitted to the flat panel display 119, may be modified by a number of image enhancement algorithms. Such algorithms may include contrast enhancement, image inversion, or color balancing, such as those developed specifically for low-vision applications in the CE-3000 processor from Digivision of San Diego, CA. Because many elderly people suffer from hand tremors, software within the IRA system 1 computer may also be used to stabilize the image, using algorithms that are in common application in video surveillance, military target tracking, and video camcorders.
Special Search Mode
In many circumstances, a user needs to search through a quantity of text in order to identify a specific piece of information. In general, such information is accompanied by a special word or symbol. For example, when shopping for clothes at a department store, the price will be the only information on a tag preceded by a dollar sign; through this search feature, prices could be rapidly located and vocalized.
In another example, a utility bill is characterized by a complex combination of text, tables, and considerable extraneous information. Yet, the amount to pay is commonly preceded by a dollar sign or the word "pay", which can serve as a beacon for the user in finding this information. The special search mode speeds the identification of sections of text that may subsequently be read in detail.
A list of default special words may be stored in or custom programmed into the IRA system 1 by the user. When in the above-described normal search mode, the IRA system 1 captures each image frame and interprets the image using the OCR program 31. Each set of OCR-interpreted text is scanned for correlation with the list of default special words. Whereas in normal search mode no text is vocalized and the primary feedback to the user is through the tracking display 53, in special search mode, when one of the special words is encountered, its presence is announced to the operator by vocalization of the word through the speech synthesizer 61 and the audio speakers 13. This allows the operator to rapidly screen through text for symbols or words of particular interest. In special search mode, the mode selector 145 directs the novel word detector 59 to disregard any novel words unless they are also present in a special search dictionary 153. Additionally, the mode selector 145 activates the software switch 43, so that text from the track line text outputter 39 is continuously fed to the novel word detector 59, even when the operator is operating the IRA system 1 in the search mode. If the user wishes to override or add to the words or symbols in the special search dictionary 153, he may spell a special word or say a special symbol into the microphone 63, which through the voice recognition program 45 is interpreted and transmitted to the special search dictionary 153.
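A minimal sketch of the special-search filtering applied by the novel word detector 59; the default dictionary contents and the leading-dollar-sign test are illustrative assumptions.

```python
SPECIAL_SEARCH_DICTIONARY = {"pay", "total", "due"}  # hypothetical defaults

def vocalize_special(words, speak):
    """Pass only special words (or dollar amounts) to the synthesizer;
    all other novel words are disregarded in special search mode."""
    for word in words:
        if word.lower() in SPECIAL_SEARCH_DICTIONARY or word.startswith("$"):
            speak(word)

vocalize_special(["Amount", "to", "pay", "$42.17"], print)
# speaks: pay  $42.17
```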
Alternative special-search modes can discriminate on any basis other than content, including any distinction that can be discriminated by the computer. Such distinctions could include font type, color, or formatting, such as italicization or bolding.
Continuous Read Mode
When quickly scanning through a complicated or voluminous text, the user is often initially screening the page for representational word content, rather than tracking entire lines of text. In such a case, when the IRA system 1 is in continuous read mode and the button 11 is depressed, the IRA system 1 does not attempt to track lines of information, but rather speaks individual words whenever novel words are encountered in the center zone of the field of view 54. In such a case, the IRA system 1 vocalizes words as in track mode, but the tracking display 53 operates as if it is in search mode. In continuous read mode, the mode selector 145 directs the track line text outputter 39 so that all complete words are output to the novel word detector 59, whether or not they are located within the current contiguous track line.
Computer Interface Using The IRA System
The IRA system 1 may be used to access computer software and digital information from sources such as the World Wide Web and electronic mail. This will be of great usefulness in the employment of blind and low-vision people, as well as increasingly important in entertainment and everyday life. In addition, many everyday tasks require interaction with video displays, such as the use of automatic teller machines (ATM), library catalogs, and video kiosks for store and public location directories. The IRA system 1 accomplishes these tasks through two very different modes of operation.
In the first means, the IRA system 1 treats information presented on video or computer displays as text to be read in its normal mode. The IRA system 1, however, utilizes the video camera 25 to capture images, and in order to capture such images from a cathode ray tube requires special hardware. Fig. 9 depicts the capture of luminous text from a CRT screen using a photo diode to detect the illumination period and synchronize the camera image capture with that period. An electron scanning beam 309 of a CRT 307 causes photoemission on a CRT screen 311, corresponding to the projected text 7. The CRT emission sensor 27 is located in the mouse 2 within direct view of the text 7 to be read. The CRT emission sensor 27 includes a photo diode 141 connected to the amplifier 125, which inputs the ambient light measured by the photo diode 141 to a conventional CRT scan detection circuit 127 (constructed from a phase lock loop IC CD4046 from National Semiconductor Corporation of Santa Clara, California) that is adjusted to respond only to the common screen refresh rates of sixty to seventy-five Hz. This circuit screens for periodic variations in the input light that are characteristic of common CRT screen refresh rates. Televisions and video monitors typically have a characteristic and standardized illumination frequency and duty-cycle. If the time variations correspond to the illumination signature of a cathode-ray display terminal, a timing signal is transmitted to a camera timing driver 129, which extinguishes the illuminators 15, so that this illumination does not compete with the CRT screen illumination, and synchronizes the camera 25 image capture with the illumination period of the CRT scan. The synchronization is accomplished by delaying image integration 135 until one of a plurality of text illumination periods 131 is detected, as described below.
Fig. 10 graphically depicts timing relationships between the detected signal of the photo diode 141 and the camera timing signals output by the camera timing driver 129. The vertical axis of the graph represents the light intensity in the IRA system 1 field of view 54 detected by the photo diode 141. Given the small field of view 54 of the camera 25, the view is illuminated for only a small fraction of the CRT scan. The illumination periods 131 are interspersed with a plurality of quiescent periods 133, when the field of view 54 is dark. During one illumination period 131, corresponding to the image capture period 135, the camera timing driver 129 directs the integration of the optical image by the camera 25. During subsequent quiescent periods 133, the camera image is read out to the image pre-processor 29, followed each time by a camera reset period 139. It should be noted that at common CRT refresh rates, the camera can at most capture every second refresh of the CRT screen, so that during some illumination periods, the camera 25 does not capture images.
When the CRT emission sensor 27 no longer detects scan-characteristic illumination signatures, the camera timing driver 129 reenergizes the illuminators 15, and returns the camera timing to normal image capture mode. The use of the CRT emission sensor 27 and the camera timing driver 129 allows the IRA system 1 to read text from CRT screens and other pulsed-illumination screens. The mode of IRA system 1 operation with textual information displayed on CRT screens in this manner is identical to that with textual information that the IRA system 1 would encounter on conventional printed surfaces.
In a second mode of operation, the IRA system 1 is physically connected to a target or host computer so that it can both read and provide a point-click-and-drag capability like a computer mouse.
In this case, the IRA system 1 functions as a peripheral input/output device through a serial interface, which is shown as a system depiction in Fig. 11. The computer 3 is connected through a communication cable 161 to a target computer 163. The target computer 163 must be loaded with IRA system 1 interface software to allow the interactions which will be described below. While the communications cable 161 may conveniently be a serial communications cable, so that it is more compatible with standard computer mouse drivers, the communications cable may also be a parallel or other mode of communication so as to take advantage of higher communication rates.
A coordinate pad 165 is imprinted with a regular grid of unique alphanumeric symbols. Such symbols could be pairs of letters, in which all pairs in the first row share the same first letter, all pairs in the second row share a different first letter, and so forth; likewise, all pairs in the first column share the same second letter, all pairs in the second column share a different second letter, and so forth. This two-letter designation works well, since it can be easily interpreted by the OCR recognition program 31, but a variety of other alphanumeric and graphical symbologies are possible. The symbologies may be graphical, in which case either a unique program must be utilized to interpret them, or the OCR recognition library 159 must allow the OCR program 31 to discriminate the symbols.
The arrangement of symbols on the coordinate pad 165 is such that whenever the mouse 2 is located on the pad, at least one of the unique symbols is fully within its field of view 54 and able to be translated using the OCR program 31. As the mouse 2 is translated over the coordinate pad 165, the OCR program 31 determines the identity, location and skew of the symbol within the field of view 54. Using this information, the IRA system 1 can compute the location of the center of its field of view 54 relative to the coordinate pad 165 frame of reference. This information is used as an absolute cursor positioning coordinate, which is transmitted to the target computer 163 over the communications cable 161. As the operator translates the mouse 2 over the coordinate pad 165, the position of the cursor on the target computer 163 is continuously updated. It should be noted that this mode of operation is different from that of most computer mice, which function as relative positioning devices. The IRA system 1 computer input as described above functions more closely to that of a digitizing tablet, which sends absolute positions to the target computer. Absolute positioning is important for blind and low-vision computer users, since they do not receive the visual feedback of the current location of the mouse.
Therefore, absolute positioning allows such users to rely on kinesthetic positioning cues, because the position of the cursor on the screen is directly related to the haptic or tactile-kinesthetic-sensed position of the mouse 2 on the physical coordinate pad.
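The absolute positioning computation can be sketched as follows, assuming the row/column lettering scheme described above (with "AA" at the origin) and hypothetical grid-pitch and camera-scale constants. The normalized coordinates sent to the target computer would then be these values divided by the pad's overall dimensions, as described below.

```python
GRID_PITCH_MM = 10.0    # hypothetical spacing of pad symbols
PIXELS_PER_MM = 12.0    # hypothetical camera scale at the pad surface

def pad_position(symbol, symbol_xy, image_center_xy):
    """Locate the center of the field of view in pad coordinates from
    one fully visible two-letter symbol and its pixel location."""
    row = ord(symbol[0]) - ord("A")   # first letter encodes the row
    col = ord(symbol[1]) - ord("A")   # second letter encodes the column
    dx = (image_center_xy[0] - symbol_xy[0]) / PIXELS_PER_MM
    dy = (image_center_xy[1] - symbol_xy[1]) / PIXELS_PER_MM
    return (col * GRID_PITCH_MM + dx, row * GRID_PITCH_MM + dy)

# Symbol "BC" (row 1, column 2) seen 24 px right and 12 px above center:
print(pad_position("BC", (296, 252), (320, 240)))  # (22.0, 9.0)
```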
So that the IRA system 1 can provide additional information about the current status of the graphical user interface located on the target computer, it is preferred that the target computer 163 transmit pixel information from the immediate vicinity of the cursor to the computer 3. The computer 3 will treat this image data in the same manner as that gathered from operation of the camera 25. This means that the text underneath the cursor can be converted into speech feedback for the operator, and further, that lines of text can be tracked by the operator in conjunction with the operation of the tracking display 53. While often it is the case that lines of text are already known to the target computer in line-oriented ASCII text format, available for speech output on the target computer, an increasing amount of text information, such as that located on the World Wide Web, is graphical in nature (bit-mapped), and unavailable as ASCII text to normal target computer operation. Therefore, even when the target computer has software optimized for the use of blind and low-vision operators, the described methods provide the only known access to these modern information sources.
Use of the IRA system 1 as an input/output interface for graphical user interfaces on target computers is enhanced by the use of the special OCR recognition libraries 159 that recognize specific features of the graphical user interface, in the manner of the special search mode previously described. In addition to standard fonts, these recognition libraries recognize such features as radio buttons, scroll bars, window title bars, and special toolbar button icons. Special search mode in this case is activated in the manner described above for alternative modes of operation. Note that this system as described allows blind or low-vision users to identify objects, drag and place objects, pull down and select menus, as well as locate and edit existing text, irrespective of the presence of adaptations of the operating system of the target computer to assist low-vision or blind users.
The computer 3 determines the position of the mouse 2 on the encoded pad 165 and sends X-Y coordinates, which are normalized to the pad dimensions, to the target computer 163. The target computer 163 knows the size of its own screen (in display pixels) and can compute (through direct scaling) the corresponding X-Y position on its screen. Optionally, the target computer 163 may place its cursor at this location.
The target computer 163 then provides the video data corresponding to each of the pixel values in the nearby vicinity of the cursor (an array of pixels centered around the cursor position, proportional in size to the field of view 54) to the computer 3. The computer 3 (through an alternate image mode of the image pre-processor 29) enters this video data into the OCR program in between uses of the OCR program to recognize the symbology on the coordinate pad 165. The interpreted data from this video data is used to drive the tactile display 53 and to provide words and symbology (e.g., icons) for the IRA system 1 to process into audible words for the user to hear. In between the reading and interpretation of the video data from the target computer 163, the OCR program's interpretations of the coordinates or coded symbology on the coordinate pad 165 are not voiced or used to drive the tactile display 53. They are used solely to derive the cursor position coordinates for the target computer 163.
Switched Single-Color Illumination

In many cases it is an advantage to be able to detect colored text on colored backgrounds, such as on food packaging or in magazine advertisements. Such text may be difficult to discriminate in white-light imaging systems. In such a case, it is appropriate to use a multiplicity of colored LED lamps, such as red, green, blue, and infra-red, which combine to function as the illuminators 15. Such a collection of colored lamps may be used together or one color at a time. For example, the computer may determine the average contrast of the image elements under different color illuminations and choose for further processing those images with the highest contrast. In addition, the use of switched single-color illumination provides the ability to distinguish colors within the image on the basis of differential reflectivity under different color illumination.
Fig. 16 depicts a schematic of using switched single-color illumination to enhance the reading of colored text. The illuminators 15 are replaced in this embodiment by banks of colored LEDs, comprising a plurality of red LEDs 249, blue LEDs 251, green LEDs 253, and infra-red LEDs 255. A synch-stripper 257 extracts the timing information corresponding to exposure periods from the camera 25, either by using widely available integrated circuitry (available as Part No. LM1881 from National Semiconductor Corporation of Santa Clara, California) or from the auxiliary output of a frame-grabber. This information is transmitted to a colored LED sequencer 259, which activates the illumination of the colored LEDs 249, 251, 253, and 255 in synchrony with the camera 25 exposure periods. The sequencer 259 is programmed to activate one color set of colored LEDs for a period of time during each successive exposure period, so that during each exposure only a single color of illumination is used. This illumination period may be adjusted to compensate for the reflectivity of the surface on which the text is illuminated, but in general will be timed to be less than three milliseconds so as to prevent excessive blurring of the image from movement of the mouse during the exposure. The sequencer 259 may be a separate electronic circuit, but is conveniently embedded within the functions of the FPGA 173 located within the mouse 2.
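A simple software model of the sequencer's behavior is sketched below (an illustration under assumed hardware-interface hooks; in the embodiment itself this logic resides in the FPGA 173, not in software):

```python
# Sketch of the color-sequencing scheme: one LED bank per camera
# exposure, strobed for at most ~3 ms to limit motion blur.
# wait_for_vsync, enable_bank, and disable_all are assumed hooks.

import itertools
import time

LED_BANKS = ["red", "green", "blue", "infrared"]
STROBE_SECONDS = 0.003  # "less than three milliseconds" per the text

def run_sequencer(wait_for_vsync, enable_bank, disable_all):
    # Cycle through the color banks, one color per exposure period.
    for color in itertools.cycle(LED_BANKS):
        wait_for_vsync()           # timing pulse from the synch-stripper 257
        enable_bank(color)         # illuminate with a single color only
        time.sleep(STROBE_SECONDS)
        disable_all()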
Image output from the camera 25 is sent to the image pre-processor 29, which stores separately a red field 261, a blue field 263, a green field 265, and an infra-red field 267. The pre-processor 29 assesses the relative variations in intensity in the different fields, which correspond roughly to the contrast values of the images. The higher-contrast images are then sent to the OCR program 31 for OCR interpretation.
Further, the relative total brightness recorded in the fields 261, 263, 265, and 267 is used to determine the dominant color in the field of view 54.
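The field-selection and dominant-color steps might be sketched as follows (illustrative only; the use of intensity standard deviation as the contrast measure, and of NumPy, are assumptions):

```python
# Sketch of the pre-processor's field selection: rank the color fields
# by a rough contrast measure, forward the best to OCR, and take the
# brightest field as an estimate of the dominant color.

import numpy as np

def contrast(field: np.ndarray) -> float:
    """Intensity standard deviation as a rough proxy for contrast."""
    return float(np.std(field))

def select_fields(fields: dict[str, np.ndarray]):
    """fields maps a color name ('red', 'blue', ...) to its 2-D image."""
    ranked = sorted(fields, key=lambda c: contrast(fields[c]), reverse=True)
    best_for_ocr = ranked[:2]  # forward the higher-contrast images to OCR
    dominant = max(fields, key=lambda c: float(fields[c].sum()))
    return best_for_ocr, dominant
```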
The foregoing description is considered as illustrative only of the principles of the invention.
Furthermore, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and process shown and described above.
Accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention as defined by the claims which follow.
Benefits and Advantages of the IRA System 1

The invention provides a number of advantages to low-vision and blind users:

The IRA system 1 is portable, allowing users to read labels in food stores, price tags in department stores, menus in restaurants, recipes and package instructions in kitchens, medicine bottles in bathrooms, currency denominations in taxicabs, and schedules for buses. The IRA system 1 is particularly amenable to miniaturization since it requires only inch-sized images to operate.
The IRA system 1 furnishes users with spatial feedback on the location of text, as well as the text content, thereby providing users with crucial information embodied within the layout of the text. For example, using only information about the spatial distribution of words, and not their specific identities, users can tell the number of columns, the paragraph structure, the existence of isolated words (as in a title), the location of a page number, the location of page headers, and more. This information is critical for rapidly determining the general page content, whether the user would want to read the page, and where the information lies. Consider a utility bill: this special format contains volumes of information without interest to the typical user, who only wants to know the amount to pay. Using the IRA system 1, the user may quickly "feel out" the page and read only the desired information.
Furthermore, to scan rapidly through a book for a specific page, the IRA system 1 does not require the user to scan entire pages; the user can be tactilely guided directly to the location in which the page number is printed. The IRA system 1 does not simply read the text; it interacts with the user to determine from spatial locations where the text that the user wants to read is located. Current print-reading devices force the user to scan through the text manually, listening to large amounts of irrelevant text.
The physical shape and character of the mouse 2 allows the user to read text from curved and angular surfaces on a variety of objects, such as medicine bottles, food cans, soft packages, as well as flat objects.
If text is located on a fixed or difficult-to-move object, such as the product label on an appliance, the buttons on a microwave oven, or a tag on a chair, the IRA system 1 can be taken to the text, rather than requiring the text to be taken to the scanner.
The IRA system 1 can read computer screens, allowing users to access such increasingly common devices as automated teller machines, computerized library catalogs, and computer kiosks, not to mention enabling access to personal computers without the need for handicapped-access software which, when available, is often inadequate.
The IRA system 1 may be operated with one hand, leaving the other hand free to manipulate the object being read. This coordination between the user and the object to be read results in significant gains in both speed and usability.
The IRA system 1 interface is intuitive in its presentation of tactile and aural feedback to the user, coordinated with the location and content of the text. The camera 25 and speakers 13 are located in the mouse 2, placed to capitalize on the user's natural focus of attention on the text. This placement also provides aural as well as tactile or haptic feedback as to the location of the information. This intuitive channeling of feedback is critical in making the IRA system 1 easy to learn and natural to operate.
Because the IRA system 1 requires no keyboard, screen, or large page-scanning apparatus, it is inexpensive to produce, which is a particularly important characteristic for a device serving a handicapped population having modest income.
The features and advantages listed above are, to our knowledge, not available in any existing device.
It is significant that this combination of features may contribute greatly to the employability and independent-living capability of vision-impaired individuals.
An IRA system 1 prototype has been constructed with many of the features listed. In tests with both young and elderly users, and with both low-vision and totally blind individuals, the device was quickly learned and accepted, allowing users to read common print on flat and curved surfaces. In testing of the device sponsored by the Department of Education, four young, blind users gained the ability to use the device on both flat paper and cylindrical objects within approximately 20 minutes of training. In testing supported by the National Institute of Aging, seventeen users averaging 80 years of age evaluated the device during two-hour training sessions. Of these subjects, 76% found the device mostly or completely easy to use, and 82% felt that the device would be useful in daily life and would wish to own one when fully developed.
It should be apparent to one skilled in the art that the above-mentioned embodiments are merely illustrations of a few of the many possible specific embodiments of the present invention. Numerous and varied other arrangements can be readily devised by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the following claims.

Claims (41)

1. A method for converting visible symbology into another humanly perceptible version of the symbology, the method employing a hand-held imaging device operated by a user, the method comprising the steps of: imaging the symbology to convert the symbology from an optical image to an electronic signal representative of the optical image; performing symbology recognition on the electronic signal to convert the electronic signal into recognized symbology and to provide the location of the recognized symbology in the image; providing feedback to the user based on the location of the recognized symbology in the image, the feedback being representative of the position of the hand-held imaging device relative to the symbology; and converting the recognized symbology into a humanly perceptible version of the symbology.
2. A method as defined in claim 1, wherein the humanly perceptible version of the symbology into which the recognized symbology is converted is an audible version.
3. A method as defined in claim 1, wherein the humanly perceptible version of the symbology into which the recognized symbology is converted is a Braille version.
4. A method as defined in claim 1, wherein the feedback provided to the user is tactile feedback.
5. A method as defined in claim 4, wherein the tactile feedback is provided by a moving element on the hand-held imaging device.
6. A method as defined in claim 5, wherein the moving element includes a moving pin.
7. A method as defined in claim 6, wherein the moving pin is part of an armature in an electromagnetic device, the electromagnetic device also including a solenoid coil.
8. A method as defined in claim 5, wherein the moving element is moved at a frequency representative of the position of the hand-held imaging device relative to the symbology.
9. A method as defined in claim 8, wherein the frequency can vary within a range from approximately six to sixty-five Hertz.
10. A method as defined in claim 8, wherein the frequency is inversely related to the relative distance between the hand-held imaging device and the symbology.
11. A method as defined in claim 5, including a plurality of moving elements arranged in a linear array oriented parallel to a longitudinal axis of the hand-held imaging device to provide an indication of the position, relative to the longitudinal axis, of the hand-held imaging device relative to the symbology.
12. A method as defined in claim 11, including a plurality of said parallel linear arrays to also provide an indication of the position, relative to an axis normal to the longitudinal axis, of the hand-held imaging device relative to the symbology.
13. A method as defined in claim 12, including two arrays of three elements each.
14. A method as defined in claim 1, wherein the feedback provided to the user is visible feedback.
15. A method as defined in claim 14, wherein the visible feedback is provided by a source of light on the hand-held imaging device.
16. A method as defined in claim 15, wherein the source of light is a light emitting diode.
17. A method as defined in claim 15, including a plurality of light sources arranged in a linear array oriented parallel to a longitudinal axis of the hand-held imaging device to provide an indication of the position, relative to the longitudinal axis, of the hand-held imaging device relative to the symbology.
18. A method as defined in claim 17, including a plurality of parallel linear arrays to also provide an indication of the position, relative to an axis normal to the longitudinal axis, of the hand-held imaging device relative to the symbology.
19. A method as defined in claim 18, including two arrays of three light sources each.
20. A method as defined in claim 17, wherein the plurality of light sources includes light sources which produce light of a different color than others of the plurality of light sources.
21. A method as defined in claim 1, wherein the feedback provided to the user is audible feedback.
22. A method as defined in claim 21, wherein the audible feedback is provided when the hand-held imaging device is skewed in orientation relative to the image by more than a predetermined amount.
23. A method as defined in claim 22, wherein the predetermined amount of skew is approximately five degrees.
24. A method as defined in claim 1, wherein the method includes two modes, a search mode in which the relative position of any symbology recognized will be provided as feedback to the user and a track mode in which the relative position of a line of recognized symbology which is being tracked is provided as feedback to the user.
25. A method as defined in claim 24, wherein the user can select the search mode or the track mode.
26. A method as defined in claim 25, wherein the recognized symbology is converted into a humanly perceptible version of the recognized symbology only in the track mode.
27. A method as defined in claim 1, wherein the symbology recognition includes optical character recognition.
28. A method as defined in claim 1, wherein the symbology recognition includes bar code recognition.
29. A method as defined in claim 1, wherein the visible symbology to be converted is displayed on a video display.
30. A method as defined in claim 29, wherein the hand-held imaging device includes a camera for imaging the visible symbology on the video display and converting the image to an electronic signal.
31. A method as defined in claim 30, wherein the hand-held imaging device includes a sensor to sense that the visible symbology to be converted is being displayed on a video display.
32. A method as defined in claim 2, wherein the audible version of the recognized symbology is provided in the form of separate symbols, including letters of the alphabet.
33. A method as defined in claim 2, wherein the audible version of the recognized symbology is provided in the form of groups of symbols, including words.
34. A method as defined in claim 1, wherein the visible symbology is also displayed in a magnified visible version.
35. An apparatus for converting visible symbology into another humanly perceptible version of the symbology for a user, comprising: an imaging device to convert the symbology from an optical image to an electronic signal representative of the optical image; a symbology recognizer receptive of the electronic signal to recognize symbology in the image and to determine the location of the recognized symbology in the image; a feedback device to provide feedback for the user based on the location of the recognized symbology in the image, the feedback being representative of the position of the hand-held imaging device relative to the symbology; and a transducer to convert the recognized symbology into a humanly perceptible version of the recognized symbology.
36. An apparatus as defined in claim 35, wherein the imaging device, the feedback device, and the transducer are all located in a hand-held positioning device.
37. An apparatus as defined in claim 36, wherein the hand-held positioning device is tapered to a narrower width at a distal end than at a proximal end.
38. An apparatus as defined in claim 35, wherein a bottom side of the hand-held positioning device is adapted for placing against a surface with symbology to be read therefrom, and wherein the bottom side includes a recessed channel in the center thereof to allow the bottom side to be placed against curved surfaces to read symbology therefrom.
39. An apparatus as defined in claim 35, wherein the imaging device includes an illuminator and a camera having a lens incorporated therein.
40. An apparatus as defined in claim 39, wherein the lens and the camera are adjustable in position relative to each other to adjust the focal length thereof so as to be able to focus the imaging device on symbology at various distances therefrom.
41. A method for allowing a user with impaired vision to operate a host computer with a graphical user interface including information displayed on a computer display, the method comprising the steps of: providing a surface onto which coded symbology has been provided; imaging with a hand-held positioning device containing an imaging system to image and convert symbology to an electronic signal, the device being adapted for placing in proximity to and moving across the surface; recognizing the coded symbology in the image and determining the location of the coded symbology in the image, and based thereon, determining the position of the positioning device relative to the surface; obtaining the information in the vicinity of the position on the computer display corresponding to the position of the positioning device relative to the surface, and providing an indication of the location of the information relative to the positioning device's position; providing feedback for the user based on the locational indication, the feedback being representative of the location of the information relative to the position on the computer display corresponding to the position of the positioning device relative to the surface; and transducing the information into a humanly perceptible version of the information.
AU22665/97A 1996-02-13 1997-02-11 Tactiley-guided, voice-output reading apparatus Ceased AU709833B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US1156196P 1996-02-13 1996-02-13
US60/011561 1996-02-13
PCT/US1997/002079 WO1997030415A1 (en) 1996-02-13 1997-02-11 Tactiley-guided, voice-output reading apparatus

Publications (2)

Publication Number Publication Date
AU2266597A AU2266597A (en) 1997-09-02
AU709833B2 true AU709833B2 (en) 1999-09-09

Family

ID=21750935

Family Applications (1)

Application Number Title Priority Date Filing Date
AU22665/97A Ceased AU709833B2 (en) 1996-02-13 1997-02-11 Tactiley-guided, voice-output reading apparatus

Country Status (4)

Country Link
EP (1) EP0892964A4 (en)
AU (1) AU709833B2 (en)
CA (1) CA2245769A1 (en)
WO (1) WO1997030415A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11153472B2 (en) 2005-10-17 2021-10-19 Cutting Edge Vision, LLC Automatic upload of pictures from a camera

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2769451B1 (en) * 1997-10-06 2000-04-21 Adca Electronique READING AID APPARATUS
FR2775097B1 (en) * 1998-02-16 2001-11-23 Freres Lissac DEVICE FOR HELPING VISUALLY IMPAIRED PERSONS TO READ AND/OR CONSULT DOCUMENTS
FR2834632A1 (en) * 2002-01-15 2003-07-18 Oleg Tretiakoff PORTABLE READING MACHINE FOR THE BLIND
GB2415079A (en) * 2004-06-09 2005-12-14 Darren Raymond Taylor Portable OCR reader which produces synthesised speech output
ITMI20062316A1 (en) * 2006-11-30 2008-06-01 Itex Di Marco Gregnanin METHOD AND APPARATUS FOR RECOGNIZING TEXT IN A DIGITAL IMAGE.
EP2333695B1 (en) * 2009-12-10 2017-08-02 beyo GmbH Method for optimized camera position finding for systems with optical character recognition
WO2011107982A1 (en) * 2010-03-01 2011-09-09 Noa Habas Visual and tactile display
GB2489066A (en) * 2011-12-13 2012-09-19 Rnib A portable code-reading device for helping the visually-impaired
SG11201404511TA (en) 2012-02-23 2014-09-26 Sicpa Holding Sa Audible document identification for visually impaired people
CN106961572A (en) * 2016-01-08 2017-07-18 杭州瑞杰珑科技有限公司 A kind of electronic viewing aid of self adaptation different application scene
GB201718051D0 (en) * 2017-11-01 2017-12-13 Imperial Innovations Ltd apparatus and method for providing tactile stimulus
CN111695264B (en) * 2020-06-16 2023-03-03 中国空气动力研究与发展中心高速空气动力研究所 Multi-wave-system synchronous waveform parameter propelling method for sonic boom propagation calculation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3676938A (en) * 1970-09-01 1972-07-18 Arnold Trehub Reading device for the blind
US4687444A (en) * 1986-03-31 1987-08-18 The United States Of America As Represented By The Administrator Of The National Aeronautics & Space Administration Braille reading system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3820069A (en) * 1972-12-26 1974-06-25 Ibm Electro-optical read system
US3976973A (en) * 1974-01-07 1976-08-24 Recognition Equipment Incorporated Horizontal scan vertical simulation character reading
FR2453451B1 (en) * 1979-04-04 1985-11-08 Lopez Krahe Jaime READING MACHINE FOR THE BLIND
US4793812A (en) * 1987-10-05 1988-12-27 Xerox Corporation Hand held optical scanner for omni-font character recognition
US5233333A (en) * 1990-05-21 1993-08-03 Borsuk Sherwin M Portable hand held reading unit with reading aid feature
FR2701132B1 (en) * 1993-02-04 1995-03-31 Ioan Montane Processing installation for the exploration of screens intended for blind operators and corresponding method.

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3676938A (en) * 1970-09-01 1972-07-18 Arnold Trehub Reading device for the blind
US4687444A (en) * 1986-03-31 1987-08-18 The United States Of America As Represented By The Administrator Of The National Aeronautics & Space Administration Braille reading system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11153472B2 (en) 2005-10-17 2021-10-19 Cutting Edge Vision, LLC Automatic upload of pictures from a camera
US11818458B2 (en) 2005-10-17 2023-11-14 Cutting Edge Vision, LLC Camera touchpad

Also Published As

Publication number Publication date
EP0892964A1 (en) 1999-01-27
WO1997030415A1 (en) 1997-08-21
CA2245769A1 (en) 1997-08-21
EP0892964A4 (en) 2000-02-23
AU2266597A (en) 1997-09-02

Similar Documents

Publication Publication Date Title
US6115482A (en) Voice-output reading system with gesture-based navigation
US10741167B2 (en) Document mode processing for portable reading machine enabling document navigation
US9626000B2 (en) Image resizing for optical character recognition in portable reading machine
CA2308213A1 (en) Voice-output reading system with gesture-based navigation
AU709833B2 (en) Tactiley-guided, voice-output reading apparatus
US7629989B2 (en) Reducing processing latency in optical character recognition for portable reading machine
US7659915B2 (en) Portable reading device with mode processing
US8284999B2 (en) Text stitching from multiple images
US8036895B2 (en) Cooperative processing for portable reading machine
US7505056B2 (en) Mode processing in portable reading machine
US8249309B2 (en) Image evaluation for reading mode in a reading machine
US7325735B2 (en) Directed reading mode for portable reading machine
US8186581B2 (en) Device and method to assist user in conducting a transaction with a machine
US20150043822A1 (en) Machine And Method To Assist User In Selecting Clothing
US20060071950A1 (en) Tilt adjustment for optical character recognition in portable reading machine
US20060013483A1 (en) Gesture processing with low resolution images with high resolution processing for optical character recognition for a reading machine
WO1997030415A9 (en) Tactiley-guided, voice-output reading apparatus
WO2005096760A2 (en) Portable reading device with mode processing
KR100582514B1 (en) An apparatus and a method for character recognition

Legal Events

Date Code Title Description
MK14 Patent ceased section 143(a) (annual fees not paid) or expired