ITMI20062316A1

ITMI20062316A1 - METHOD AND APPARATUS FOR RECOGNIZING TEXT IN A DIGITAL IMAGE.

Info

Publication number: ITMI20062316A1
Application number: IT002316A
Authority: IT
Inventors: Marco Gregnanin
Original assignee: Itex Di Marco Gregnanin
Priority date: 2006-11-30
Filing date: 2006-11-30
Publication date: 2008-06-01
Also published as: WO2008065520A3; WO2008065520A2

Description

P02725/IT/FC Title: "Method and apparatus for recognizing text in a digital image."

DESCRIPTION

The present invention relates to a method and an apparatus for recognizing text in a digital image, in particular to a method and an apparatus according to the preamble of claims 1 and 27.

In today's society, most information is acquired through sight. Although current legislation is increasing the amount of information available through touch (for example elevator buttons, Braille labels on medicines, and tactile guidance strips on the floors of airports and stations), there are still many areas in which a blind or visually impaired user has great difficulty making use of the available information.

For example, consider drinks in bottles or cans, the contents of food tins, boxes of food, or medicines.

In all these cases, a blind or visually impaired user can very easily find himself needing information that is available only through visual cues, precisely the cues that such a user cannot easily decode.

For example, all cans have the same shape and almost the same size, their different contents being set out in print on the surface of the can itself; glass bottles may contain wine, water, oil, vinegar or liqueur alike, and plastic bottles of the same shape may contain trichloroethylene or water; tins of the same shape may contain beans, tomato purée or tuna; cardboard boxes may contain pasta with very different cooking times; and, finally, medicines carry their name and expiry date in tactile form only on the outer package: once the blister has been separated from it, a blind or visually impaired user may find it practically impossible to decode this information correctly from the shape of the blister alone.

Ing. Aurelio PERANI, Reg. No. ALBO 277 (on his own behalf and for the others)

These examples show how misreading such visual information can have unpleasant consequences, ranging from the minor annoyance of eating overcooked pasta with tuna sauce instead of bean sauce, to poisoning caused by taking the wrong medicine or drinking a corrosive liquid instead of plain water.

To avoid such problems, blind or visually impaired users generally rely on sighted people who can read the important information for them, often forgoing outside help for less relevant information, such as the brand of a detergent or the cooking time of pasta.

It would therefore be desirable to have a system that allows visually impaired and/or blind users to do without the help of other people, so that they are less dependent on them and can also make use of less important information, which is nonetheless essential for a better quality of life.

Known in the art are digital image acquisition devices, OCR (Optical Character Recognition) programs, i.e. programs capable of converting the text contained in a digital image into character codes, and speech synthesizers able to read out the OCR result.

Image acquisition devices, whether scanners, cameras (digital or not), webcams, planetary scanners, digital video cameras or the like, suffer from imprecise setting of the image acquisition parameters; the result is frequently an image that is out of focus and/or motion-blurred, over- or underexposed, badly framed, affected by unwanted reflections, or with incorrect color balance, contrast and/or depth of field.

In addition to all the above problems concerning the quality of the starting image, there is the problem of the shape of the photographed object: in the case of a can, for example, even a sighted user may have to rotate the can about its axis to read the full brand name and thus learn its contents.

From such images it is currently impossible for an OCR system to obtain good results, because the OCR programs now on the market were essentially developed to read documents containing text printed black on white, in regular, standard typefaces (Arial, Times New Roman, Courier, etc.), at a size of at least 10-12 points, and aligned horizontally or vertically.

In practice, the results currently achievable are limited to tasks such as reading drug package leaflets.

The above problems of framing, lighting and image stability were addressed by the owner of the present invention and at least partially solved by the device covered by the patent application entitled "ASSISTANCE DEVICE FOR THE VISUALLY IMPAIRED FOR TAKING PHOTOGRAPHS", filed in the name of the same Applicant on 29 November 2006.

In view of the prior art described, the object of the present invention is to provide a method and an apparatus for recognizing text in a digital image that overcome the drawbacks of the prior art.

According to the present invention, this object is achieved by a method according to claim 1 and an apparatus according to claim 27.

This object is also achieved by a computer program product which, when loaded into the memory of a computer and run on it, implements the method according to the present invention.

The features and advantages of the present invention will become evident from the following detailed description of a practical embodiment, illustrated by way of non-limiting example in the accompanying drawings, in which:

- Figure 1 shows an apparatus according to an embodiment of the present invention;

- Figures 2A, 2B and 2C are graphical representations of the application of respective recognition strategies according to the present invention.

The following terms are used throughout this description and are defined below.

By "digital image" we mean an image in digital format, regardless of how it was originally acquired; acquisition is advantageously carried out with a digital camera, but other methods are possible, as mentioned in the introduction. The subject of the image will be clear from the context. By "object" we mean any item that can be depicted in a digital image, for example bottles, cans and jars, but also single-leaf and double-leaf books, brochures, newspapers, magazines, envelopes, signs, synoptic tables, etc.

By "text present in the image" we mean portions of the digital image containing graphic representations of text elements, whether alphanumeric symbols or other textual characters of any language, including Asian, Arabic and Cyrillic scripts, but also commercially known symbols such as ®, the "hand wash" care symbol, the "pure new wool" mark, etc.; in general, "text present in the image" covers anything that can be traced back to a character a computer can handle as "text", or otherwise to a glyph.

By "OCR means" we mean means capable of analyzing a digital image, detecting the presence and position of any text present in it, and converting that text into character codes.

With this established, and with reference to the accompanying figures, an apparatus 1 is shown for determining the text 2 contained in an image 3 depicting an object 4.

This apparatus 1 comprises:

- a processing device 5 having processing means 6 for processing the digital image 3 so as to produce a first modified image 3A;

- means 7 for performing optical character recognition (OCR) on said first modified image 3A to determine at least one region 3B containing text within said modified image 3A;

- means 8 for converting said text contained in region 3B into a first voice signal S1, said voice signal S1 being output through an audio interface 9 operatively connected to said processing device 5.

These means 7 for performing optical character recognition take the form of a scanner or, advantageously, a camera (as shown in Figure 1), a webcam, a planetary scanner or a digital video camera.

It should be noted that said means 7 for performing optical character recognition associate with the first region 3B a first reliability value C1, i.e. a value that provides an estimate of the quality of the text recognized in that region, together with the position of region 3B within the first modified image 3A.
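The OCR result described above pairs each text region with a position and a reliability value. A minimal Python sketch of such a record follows; the class name `TextRegion`, the pixel bounding-box fields and the assumption that C1 lies in [0, 1] are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRegion:
    """Hypothetical record for one OCR result: a text region 3B and its reliability C1."""

    text: str
    x: int             # left edge within the first modified image 3A, in pixels
    y: int             # top edge within image 3A, in pixels
    width: int
    height: int
    confidence: float  # first reliability value C1, assumed here to lie in [0, 1]

    def center(self) -> tuple[float, float]:
        """Center of the region in image 3A coordinates."""
        return (self.x + self.width / 2, self.y + self.height / 2)


region = TextRegion("TOMATO", x=120, y=40, width=200, height=50, confidence=0.62)
print(region.center())  # (220.0, 65.0)
```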

It should be noted that the means 7 for performing optical character recognition (OCR) are operatively connected to said processing device 5, for example via Bluetooth, USB, Wi-Fi, a cable, or another wireless link.

In particular, if a digital camera is used to acquire the digital image 3, it should be noted that, besides the image data themselves, the camera may also transfer all the information it stores in a so-called "EXIF" file (or similar proprietary formats), i.e. details about the camera, about the shooting parameters such as focus distance, aperture, exposure time, focal length, etc., and about the recording configuration.

In particular, the processing device 5 may be a computer, whether a PC, a laptop, a palmtop, a PDA, or a telecommunications device such as a mobile phone.

Advantageously, a program capable of recognizing the text contained in region 3B of the modified image is installed on this processing device 5.

To this end, said processing means 6 are able to:

(a) process said digital image 3 to produce a first modified image 3A;

(b) perform optical character recognition (OCR) on said first modified image 3A to determine at least one region 3B containing text within said modified image 3A, and associate with said first region a first reliability value C1;

(c) convert said recognized text into a first voice signal S1;

wherein said processing step (a) includes the further step of automatically correcting the graphic parameters of said image, said graphic parameters including brightness, contrast, saturation and/or equivalent color spaces.
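The text does not say how brightness and contrast are corrected automatically. As one common stand-in (not the patent's own method), the sketch below applies a percentile-based linear contrast stretch to a flat list of 8-bit grayscale values; the function name and the 1%/99% percentiles are assumptions, and saturation or other color spaces are not covered.

```python
def auto_contrast(pixels: list[int], low_pct: float = 0.01, high_pct: float = 0.99) -> list[int]:
    """Stretch 8-bit grayscale values so the chosen percentiles map to 0 and 255.

    Clipping at percentiles keeps a few reflections or deep shadows from
    dominating the stretch.
    """
    if not pixels:
        return []
    ordered = sorted(pixels)
    lo = ordered[int(low_pct * (len(ordered) - 1))]
    hi = ordered[int(high_pct * (len(ordered) - 1))]
    if hi <= lo:
        return list(pixels)  # flat image: nothing to stretch
    # Integer arithmetic keeps the mapping exact; values outside [lo, hi] are clipped.
    return [min(255, max(0, (p - lo) * 255 // (hi - lo))) for p in pixels]


print(auto_contrast([50, 100, 150], low_pct=0.0, high_pct=1.0))  # [0, 127, 255]
```

A real implementation would operate on a 2-D image array and might derive starting values from the EXIF exposure data mentioned in the text; this sketch leaves both out.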

The graphic parameters can also be corrected using the information contained in the EXIF parameters of said image 3A.

It should be noted that this automatic correction may modify one or more of the brightness, contrast and/or saturation parameters, or may operate in an equivalent color space.

Furthermore, said processing means 6 are also able to carry out the following steps:

(d) compare said first reliability value C1 with a predetermined reliability value C2;

(e) select at least one recognition strategy Xj from a plurality of n recognition strategies X if said first reliability value C1 is lower than said predetermined reliability value C2.

It should be noted that each of said plurality of recognition strategies X can modify the graphic and geometric/perspective parameters of said first modified image 3A to generate a second modified image 3C, so as to increase said first value C1.

Advantageously, steps (e), (b) and (c) listed above can be repeated until said first reliability value C1 is close to or higher than said predetermined reliability value C2.

It should be noted that each strategy Xj of said plurality of recognition strategies X can modify the graphic parameters of the first modified image 3A to generate the second modified image 3C, one or more of said strategies X being iterated until a reliability value C adequate for the purposes of the present invention is reached.

In other words, by applying one or more of these strategies X to the modified image 3A, it is possible to reach a reliability value high enough for the text contained in the text region 3B of the modified image 3A to be read with sufficient accuracy to generate a voice signal S1 that the user can understand.
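Steps (b), (d) and (e) form a loop: run OCR, compare C1 with C2, and if the reading is not reliable enough, try another strategy. The Python sketch below is one hedged reading of that loop. The OCR engine and the strategies are caller-supplied placeholders (`ocr`, `strategies` and `toy_ocr` are hypothetical names), each strategy is applied to the first modified image 3A rather than chained, and the most reliable reading is kept even if C2 is never reached.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def recognize(
    image: Image,
    ocr: Callable[[Image], Tuple[str, float]],
    strategies: Sequence[Callable[[Image], Image]],
    c2: float,
) -> Tuple[str, float]:
    """Run OCR, then try recognition strategies until the reliability C1 reaches C2.

    `ocr` returns (text, C1) for an image; each strategy turns the first modified
    image 3A into a second modified image 3C. Both are placeholders for real components.
    """
    best_text, best_c1 = ocr(image)            # step (b) on image 3A
    for strategy in strategies:                # step (e): try X1, X2, ... in turn
        if best_c1 >= c2:                      # step (d): reliability is sufficient
            break
        candidate = strategy(image)            # second modified image 3C
        text, c1 = ocr(candidate)              # step (b) repeated on 3C
        if c1 > best_c1:                       # keep the most reliable reading
            best_text, best_c1 = text, c1
    return best_text, best_c1                  # step (c) would voice best_text as S1


def toy_ocr(img: int) -> Tuple[str, float]:
    """Toy stand-in for an OCR engine: the 'image' is an int and C1 = img / 10."""
    return f"reading-{img}", img / 10


print(recognize(1, toy_ocr, [lambda i: i + 3, lambda i: i + 8], c2=0.9))
```

With the toy engine, the first strategy raises C1 to 0.4 and the second to 0.9, at which point the loop stops; a reading that already meets C2 is returned without calling any strategy.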

The strategies X for recognizing the text contained in region 3B will now be described, also with reference to Figures 2A to 2D.

A first recognition strategy X1 comprises the step of:

(f) selecting from a predefined list the type to which said object 4 belongs, said list comprising objects that can be traced back to a geometric primitive of cylindrical, spherical or conical shape, and objects that can be traced back to a single, double and/or composite perspective primitive of parallelepiped shape.

It should be noted that the selection can also be made manually by the user, by specifying the geometric or perspective primitive to which object 4 belongs.
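Step (f), whether automatic or set manually by the user, amounts to picking one entry from a fixed list and letting that choice decide which geometric correction is applied (steps (f.1) and (f.3) in the text). A hedged Python sketch follows; the enum member names are illustrative, and mapping the "composite" perspective primitive to the inverse perspective correction is an assumption, since the text names only single and double primitives for step (f.3).

```python
from enum import Enum


class Primitive(Enum):
    """Object types from the predefined list of step (f); member names are illustrative."""

    CYLINDER = "cylindrical"
    SPHERE = "spherical"
    CONE = "conical"
    SINGLE_PERSPECTIVE = "single perspective"        # e.g. single-leaf book, Figure 2B
    DOUBLE_PERSPECTIVE = "double perspective"        # e.g. double-leaf book, Figure 2C
    COMPOSITE_PERSPECTIVE = "composite perspective"  # assumed to use step (f.3) as well


CURVED = {Primitive.CYLINDER, Primitive.SPHERE, Primitive.CONE}


def correction_for(primitive: Primitive) -> str:
    """Name of the geometric correction applied to the text region for this primitive."""
    if primitive in CURVED:
        return "inverse cylindrical transform"   # step (f.1), strategy X1.1
    return "inverse perspective transform"       # step (f.3), strategy X1.3


# A manual choice arriving as a string can be looked up by value.
print(correction_for(Primitive("conical")))
```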

If object 4 belongs to the type of objects having a cylindrical, spherical and/or conical shape, strategy X1 provides that said step (f) comprises the further step (f.1), or strategy X1.1, of applying an inverse cylindrical transformation function to said text region 3B.

Indeed, with reference also to Figure 2A, this inverse cylindrical transformation maps the baselines of the text contained in region 3B, which appear curved because of the perspective, back to straight lines 3D.
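The patent does not give the formula for the inverse cylindrical transformation. The Python sketch below shows one standard version under a stated assumption (a weak-perspective camera): each horizontal circle on the cylinder appears as an ellipse with horizontal semi-axis `r` and vertical semi-axis `b`, centered on image column `cx`. Undoing the arcsine foreshortening straightens the label horizontally, and subtracting the elliptical sag turns a curved baseline into a straight one. The function name and parameters are hypothetical.

```python
import math


def inverse_cylindrical(x: float, y: float, cx: float, r: float, b: float) -> tuple[float, float]:
    """Map an image point on the visible side of a cylinder to flat label coordinates (s, t).

    Weak-perspective model (an assumption): a horizontal circle on the cylinder appears
    as an ellipse with horizontal semi-axis r and vertical semi-axis b, centered on
    column cx; b > 0 when the camera looks down on the label (image y grows downward).
    s is arc length along the label from the center column; t is the height with the
    elliptical sag removed, so points on one baseline share the same t.
    """
    u = (x - cx) / r
    if not -1.0 <= u <= 1.0:
        raise ValueError("point lies outside the cylinder silhouette")
    s = r * math.asin(u)                  # undo horizontal foreshortening
    t = y - b * math.sqrt(1.0 - u * u)    # undo the curvature of the baseline
    return s, t


# Three points on one curved baseline (cx=100, r=50, b=10) all map to t = 50.
for x, y in [(100, 60), (125, 50 + 10 * math.sqrt(0.75)), (150, 50)]:
    s, t = inverse_cylindrical(x, y, cx=100, r=50, b=10)
    print(round(s, 2), round(t, 2))
```

To build the rectified image one would iterate over output pixels (s, t) and sample the photograph at x = cx + r·sin(s/r), y = t + b·cos(s/r), which is the exact inverse of the mapping above.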

If there are two text areas 3E and 3F (for example as shown in Figure 2D), both with low reliability, on the two sides of object 4, object 4 probably has to be rotated by a predetermined angle, for example 90 degrees.

To this end, step (f) comprises the further step (f.2), or strategy X1.2, of imposing a predefined angular rotation about the vertical axis of said object 4.

Advantageously, a second voice signal S2 is generated to specify the predefined rotation.

In particular, the OCR means 6 return portions of the modified image 3A, each associated with a position and a "general" reliability value.

Furthermore, within these portions of the modified image 3A returned by the OCR means, a position and a reliability value are also associated with each individual character.

Thus, if object 4 depicted in image 3 shows two columns with very low reliability, object 4 probably belongs to a geometric primitive in the list and was photographed from the wrong side.

The voice signal S2 warns the user and suggests rotating object 4 about its vertical axis, for example by 90°.

If, on the other hand, the object 4 depicted in image 3 has two columns with medium-to-high reliability, the object 4 probably belongs to one of the geometric primitives in the list and was photographed from the correct side. In this case, the object 4 is as wide as the sum of the column widths.

Note that the width of the object 4 is calculated either as the width of the single column or as the sum of the widths of the two columns.
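The orientation check behind strategy X1.2 and the width rule above can be sketched as a small decision function. This is a minimal sketch, assuming OCR reliabilities normalized to 0..1; the names `Column` and `orientation_advice` and both reliability bands are hypothetical, since the description only speaks of "very low" and "medium-to-high" reliability.

```python
from dataclasses import dataclass

# Hypothetical reliability bands on a 0..1 scale (the patent gives no figures).
LOW_RELIABILITY = 0.3
MEDIUM_HIGH_RELIABILITY = 0.6


@dataclass
class Column:
    width: int          # width in pixels of a text column returned by OCR
    reliability: float  # overall ("general") reliability of that column, 0..1


def orientation_advice(columns, rotation_deg=90):
    """Return (spoken_prompt, object_width); either element may be None."""
    if len(columns) == 1:
        # Single column: the object is as wide as that column.
        return None, columns[0].width
    if len(columns) == 2:
        if all(c.reliability < LOW_RELIABILITY for c in columns):
            # Both columns read poorly: probably the wrong side of the object,
            # so propose a rotation (the content of voice signal S2).
            return (f"Rotate the object {rotation_deg} degrees "
                    "about its vertical axis", None)
        if all(c.reliability >= MEDIUM_HIGH_RELIABILITY for c in columns):
            # Correct side: the object is as wide as both columns together.
            return None, sum(c.width for c in columns)
    # Any other case is left undecided in this sketch.
    return None, None
```

For example, two columns of 100 and 120 pixels read with reliabilities 0.8 and 0.7 give an estimated object width of 220 pixels and no rotation prompt.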

During strategies X1.1 and X1.2, the inventive method centers the inverse cylindrical transformation function on the midpoint of the width and height of the object 4.
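A minimal sketch of an inverse cylindrical transformation follows. It assumes an orthographic view of a cylinder with a vertical axis whose visible edges lie at columns `x_left` and `x_right`. A label point at arc length s from the center line then appears at x = c + r·sin(s/r), so each unwrapped column samples the photographed column at that position. The function name, the orthographic model and the nearest-neighbor sampling are assumptions. In this simplified model only the horizontal center matters, because rows are left unchanged.

```python
import math


def unwrap_cylinder(image, x_left, x_right):
    """Flatten the label of a vertical cylinder spanning columns x_left..x_right."""
    if x_right <= x_left:
        raise ValueError("x_right must be greater than x_left")
    center = (x_left + x_right) / 2     # transformation centered on the object width
    radius = (x_right - x_left) / 2
    last_col = len(image[0]) - 1
    # The visible half of the label is pi * r pixels long once unwrapped.
    out_width = int(math.pi * radius)
    # Precompute, for each unwrapped column, which source column it samples.
    source_cols = []
    for j in range(out_width):
        s = j - out_width / 2 + 0.5     # arc length from the center line
        x = center + radius * math.sin(s / radius)
        source_cols.append(min(max(round(x), 0), last_col))
    return [[row[x] for x in source_cols] for row in image]
```

Columns near the edges of the bottle or can, which are compressed in the photograph, are stretched back out, so text baselines wrapped around the object become evenly spaced again.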

The object 4 attributable to said geometric primitives is chosen from the group comprising bottles, cans, jars and the like.

If the object 4 belongs to the type of objects attributable to single or double perspective primitives having a parallelepiped shape, such as a book-like object with one leaf (figure 2B) or two leaves (figure 2C), step (f) comprises the further step (f.3), or strategy X1.3, of applying an inverse perspective function to said text region.

Thanks to this inverse perspective function, the text baselines in region 3B, which are bent by the structure of the corresponding geometric primitive, are mapped back to straight lines by an inverse transformation, as shown in area 3D.
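The inverse perspective function of strategy X1.3 can be illustrated with a planar homography that maps the four corners of a page seen in perspective back onto an upright rectangle. This sketch assumes the corners are already known (ordered top-left, top-right, bottom-right, bottom-left) and uses nearest-neighbor sampling. The function names are hypothetical, and the patent does not specify how the transformation is computed.

```python
def solve_linear(a, b):
    """Solve a·h = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def fit_homography(dst_pts, src_pts):
    """Return h[0..7] such that each (u, v) in dst_pts maps to its (x, y) in src_pts."""
    a, b = [], []
    for (u, v), (x, y) in zip(dst_pts, src_pts):
        a.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        b.append(x)
        a.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        b.append(y)
    return solve_linear(a, b)


def map_point(h, u, v):
    """Apply the homography (with h8 fixed to 1) to the point (u, v)."""
    w = h[6] * u + h[7] * v + 1.0
    return ((h[0] * u + h[1] * v + h[2]) / w, (h[3] * u + h[4] * v + h[5]) / w)


def rectify_quad(image, corners, out_w, out_h):
    """Resample the quadrilateral `corners` of `image` onto an out_w x out_h rectangle."""
    dst = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    h = fit_homography(dst, corners)    # rectangle -> photographed page
    rows, cols = len(image), len(image[0])
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            x, y = map_point(h, u, v)
            xi = min(max(round(x), 0), cols - 1)
            yi = min(max(round(y), 0), rows - 1)
            row.append(image[yi][xi])
        out.append(row)
    return out
```

Mapping from the output rectangle back into the photograph (rather than the other way round) guarantees that every output pixel receives a value.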

The object 4 attributable to said perspective primitives having a parallelepiped shape is chosen from the group comprising one-leaf books, two-leaf books, brochures, newspapers, magazines, envelopes, signs and synoptic tables.

A further strategy X1.4, which can be executed after or independently of the strategies described above, comprises the further step (f.4) of interpolating the native resolution of said first modified image 3A to generate said second modified image 3C, whose resolution is higher than the native resolution of said first modified image.

Advantageously, strategy X1.4 makes it possible to find characters with a small type size inside said text region 3B.
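Interpolating to a higher resolution (strategies X1.4 and X2) could, for example, use bilinear interpolation. The sketch below assumes a grayscale image stored as a list of rows and an integer scale factor. `upscale_bilinear` is a hypothetical name, and bilinear interpolation is one possible choice, not necessarily the one used in the invention.

```python
def upscale_bilinear(image, factor):
    """Upscale a grayscale image by an integer factor using bilinear interpolation."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    h, w = len(image), len(image[0])

    def source_coord(i, size):
        # Map the center of output pixel i back into input coordinates,
        # clamped so that border pixels are replicated.
        c = (i + 0.5) / factor - 0.5
        c = min(max(c, 0.0), size - 1)
        lo = int(c)                     # floor, since c >= 0
        return lo, min(lo + 1, size - 1), c - lo

    out = []
    for i in range(h * factor):
        y0, y1, dy = source_coord(i, h)
        row = []
        for j in range(w * factor):
            x0, x1, dx = source_coord(j, w)
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(round(top * (1 - dy) + bottom * dy))
        out.append(row)
    return out
```

For instance, doubling the one-row image `[[0, 100]]` yields two rows of `[0, 25, 75, 100]`, giving the OCR more pixels per stroke of a small character.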

A further strategy X1.5, which can be executed after or independently of the strategies described above, comprises the further step (f.5) of applying a first vector filter of the edge-erosion type to said text-containing region.

Advantageously, strategy X1.5 makes it possible to distinguish bold text inside said text region 3B.
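One way to read the edge-erosion filter of strategies X1.5 and X3 is as a binary morphological erosion of the ink strokes: a 3x3 erosion wipes out one-pixel strokes, while thicker (bold) strokes keep a core. The sketch assumes a binarized region (1 = ink) and treats pixels outside the image as background. `erode`, `surviving_ink_ratio` and the idea of using the surviving fraction as a bold indicator are hypothetical.

```python
def erode(mask):
    """3x3 binary erosion: a pixel stays 1 only if it and all 8 neighbors are 1."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]   # border pixels stay 0 (background outside)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def surviving_ink_ratio(mask):
    """Fraction of ink pixels left after one erosion; higher values suggest bold strokes."""
    ink = sum(map(sum, mask))
    if ink == 0:
        return 0.0
    return sum(map(sum, erode(mask))) / ink
```

A thin one-pixel stroke scores 0, whereas a three-pixel-wide stroke keeps its central line and scores above 0.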

A further strategy X1.6, which can be executed after or independently of the strategies described above, comprises the further step (f.6) of applying a second vector or graphic filter of the maximum-contrast type to said text-containing region; and/or a strategy X1.7 comprises the further step (f.7) of applying a third vector or graphic filter of the minimum-contrast type to said text-containing region.
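The maximum- and minimum-contrast filters (strategies X1.6/X1.7, X4 and X5) are not defined further in the text. A plausible sketch, assuming an 8-bit grayscale region: the maximum-contrast filter linearly stretches the region's gray range to 0..255, and the minimum-contrast filter pulls each pixel toward the region mean by a hypothetical `factor`. Both function names are illustrative.

```python
def max_contrast(region):
    """Stretch the gray range of the region linearly to the full 0..255 scale."""
    lo = min(min(row) for row in region)
    hi = max(max(row) for row in region)
    if hi == lo:
        return [row[:] for row in region]   # flat region: nothing to stretch
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in region]


def min_contrast(region, factor=0.5):
    """Pull every pixel toward the region mean (0 < factor < 1 lowers contrast)."""
    pixels = [p for row in region for p in row]
    mean = sum(pixels) / len(pixels)
    return [[round(mean + (p - mean) * factor) for p in row] for row in region]
```

Stretching helps faded print, while compressing the range can tame glare or strong gradients that confuse the OCR; trying both is in keeping with the idea of alternative strategies.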

A further strategy X1.8, which can be executed after or independently of the strategies described above, comprises the further step (f.8) of applying a fourth vector or graphic filter having a first threshold value for filtering the white-to-black pixels of said text-containing region 3B; and/or a strategy X1.9 comprises applying a fifth filter having a second threshold value for filtering the black-to-white pixels of said text-containing region.

Note that the threshold values of strategies X1.8 and X1.9 can be determined as a function of the predetermined reliability value.
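The two threshold filters (strategies X1.8/X1.9 and X6) and the dependence of the threshold on the predetermined reliability value C2 could look like the following sketch. The linear mapping from C2 to a gray level, its bounds, and the exact meaning given here to "white-to-black" and "black-to-white" filtering are all assumptions made for illustration; the function names are hypothetical.

```python
def threshold_from_reliability(c2, low=64, high=192):
    """Hypothetical linear mapping of the target reliability C2 (0..1) to a gray level."""
    c2 = min(max(c2, 0.0), 1.0)
    return round(low + (high - low) * c2)


def white_to_black(region, t):
    """Fourth filter (sketch): pixels darker than t are forced to black (0)."""
    return [[0 if p < t else p for p in row] for row in region]


def black_to_white(region, t):
    """Fifth filter (sketch): pixels lighter than t are forced to white (255)."""
    return [[255 if p > t else p for p in row] for row in region]
```

With C2 = 0.5 the sketch yields a threshold of 128; the two filters then push gray pixels toward either the ink or the background.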

A second recognition strategy X2, which can be executed after or independently of the strategies described above, comprises the step of interpolating the native resolution of said first modified image to generate said second modified image having a resolution higher than the native resolution of said first modified image, as already described.

Furthermore, there may be a third recognition strategy X3, which can be executed after or independently of the strategies described above and comprises the step of applying said first edge-erosion filter to said text-containing region, as already described.

A fourth recognition strategy X4, which can be executed after or independently of the strategies described above, comprises the step of applying said second maximum-contrast filter to said text-containing region 3B.

A fifth recognition strategy X5 comprises the step of applying said third minimum-contrast filter to said text-containing region 3B.

A sixth recognition strategy X6 comprises the step of applying said fourth filter, having a first threshold value, to filter the white-to-black pixels of said text-containing region 3B.

Alternatively, the sixth recognition strategy X6 comprises the step of applying said fifth filter, having a second threshold value, to filter the black-to-white pixels of said text-containing region.

Note that said threshold value for filtering pixels from black to white and/or from white to black can be determined as a function of the predetermined reliability value.

It should also be noted that, if said first reliability value is close to zero, said at least one text-containing region within said modified image coincides with the modified image itself.
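The reliability-driven retry of the recognition strategies, together with the near-zero fallback just described, can be sketched as follows. The `ocr` callable and the strategy callables are hypothetical stand-ins for the OCR means and strategies X1 to X6; `NEAR_ZERO` and `tolerance` (used to interpret "close to" C2) are illustrative values, and the conversion into the voice signal S1 is omitted.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[int]]
Region = Tuple[int, int, int, int]                 # x, y, width, height
OcrResult = Tuple[str, Optional[Region], float]    # text, region 3B, C1
Strategy = Callable[[Image, Region], Image]        # returns a second modified image 3C

NEAR_ZERO = 0.05   # hypothetical: below this C1 the region is the whole image


def recognize(image: Image,
              ocr: Callable[[Image], OcrResult],
              strategies: Sequence[Strategy],
              c2: float,
              tolerance: float = 0.02) -> Tuple[str, float]:
    """Try strategies in order until C1 is close to or above C2."""
    text, region, c1 = ocr(image)
    if region is None or c1 < NEAR_ZERO:
        # The text region coincides with the whole modified image 3A.
        region = (0, 0, len(image[0]), len(image))
    best_text, best_c1 = text, c1
    for strategy in strategies:
        if best_c1 >= c2 - tolerance:              # compare C1 with C2
            break
        candidate = strategy(image, region)        # each strategy starts from 3A
        text, _, c1 = ocr(candidate)               # run OCR again on 3C
        if c1 > best_c1:
            best_text, best_c1 = text, c1
    return best_text, best_c1                      # text to be spoken as S1
```

Keeping the best result seen so far means that a strategy which makes the reading worse never replaces a better earlier reading.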

As can be appreciated from the above, the method and apparatus according to the invention meet the needs mentioned in the introductory part of this description and overcome the drawbacks of prior-art methods and apparatuses.

Obviously, a person skilled in the art may, in order to satisfy contingent and specific needs, make numerous modifications and variations to the configurations described above, all of which fall within the scope of protection of the invention as defined by the following claims.

Claims

1. Method for recognizing text in an image (3), in which an object (4) is represented in said image (3), comprising the steps of: (a) processing a digital image (3) to produce a first modified image (3A); (b) performing optical character recognition (OCR) on said first modified image (3A) to determine at least one region containing text (3B) within said modified image (3A) and associating with said first region (3B) a first reliability value (C1); (c) converting said recognized text into a first voice signal (S1); wherein said processing step (a) comprises a further step of automatically correcting the graphic parameters of said image (3), said graphic parameters comprising brightness, contrast, saturation and/or equivalent color spaces.

2. Method for recognizing text in an image in accordance with claim 1, comprising the further steps of: (d) comparing said first reliability value (C1) with a predetermined reliability value (C2); (e) selecting at least one recognition strategy (X1, ..., X6) among a plurality of recognition strategies if said first reliability value (C1) is lower than said predetermined reliability value (C2), each of said plurality of recognition strategies (X1, ..., X6) being adapted to modify the graphic and geometric/perspective parameters of said first modified image (3A) to generate a second modified image (3C) so as to increase said first reliability value (C1).

3. Method for recognizing text in an image in accordance with claim 2, wherein said method provides for repeating steps (e), (b) and (c) until said first reliability value (C1) is close to or higher than said predetermined reliability value (C2).

4. Method for recognizing text in an image in accordance with any one of claims 1 to 3, wherein a first recognition strategy (X1) comprises the step of: (f) selecting from a predefined list the type to which said object (4) belongs, said list including objects attributable to any geometric primitive having a cylindrical, spherical or conical shape, as well as objects attributable to any single, double and/or composite perspective primitive having a parallelepiped shape.

5. Method for recognizing text in an image according to claim 4, wherein, if said object (4) belongs to the type of objects having a cylindrical, spherical and/or conical shape, said step (f) comprises a further step (f.1) of applying (X1.1) an inverse cylindrical transformation function to said text region.

6. Method for recognizing text in an image according to claim 4 or 5, wherein said step (f) comprises the further step (f.2) of imposing (X1.2) a predefined angular rotation about the vertical axis of said object.

7. Method for recognizing text in an image according to claim 6, wherein said step (f.2) provides for generating a second voice signal (S2) specifying said predefined angular rotation.

8. Method for recognizing text in an image in accordance with claim 4, wherein, if said object (4) belongs to the type of objects attributable to single or double perspective primitives having a parallelepiped shape, said step (f) comprises the further step (f.3) of applying (X1.3) an inverse perspective function to said text region (3B).

9. Method for recognizing text in an image in accordance with any one of claims 4 to 8, wherein said step (f) comprises the further step (f.4) of interpolating (X1.4) the native resolution of said first modified image to generate said second modified image having a resolution higher than the native resolution of said first modified image (3A).

10. Method for recognizing text in an image according to any one of claims 4 to 9, wherein said step (f) comprises the further step (f.5) of applying (X1.5) a first vector or graphic filter of the edge-erosion type to said text-containing region (3B).

11. Method for recognizing text in an image according to any one of claims 4 to 10, wherein said step (f) comprises the further step (f.6) of applying (X1.6) a second vector or graphic filter of the maximum-contrast type to said text-containing region (3B).

12. Method for recognizing text in an image according to any one of claims 4 to 10, wherein said step (f) comprises the further step (f.7) of applying (X1.7) a third vector or graphic filter of the minimum-contrast type to said text-containing region (3B).

13. Method for recognizing text in an image according to any one of claims 4 to 12, wherein said step (f) comprises the further step (f.8) of applying (X1.8) a fourth vector or graphic filter having a first threshold value for filtering the white-to-black pixels of said text-containing region (3B).

14. Method for recognizing text in an image according to any one of claims 4 to 12, wherein said step (f) comprises the further step (f.9) of applying (X1.9) a fifth vector or graphic filter having a second threshold value for filtering the black-to-white pixels of said text-containing region (3B).

15. Method for recognizing text in an image according to claims 13 and 14, wherein said threshold value can be determined as a function of the predetermined reliability value (C2).

16. Method for recognizing text in an image according to any one of claims 1 to 3, wherein a second recognition strategy (X2) comprises the step of interpolating the native resolution of said first modified image (3A) to generate said second modified image (3C) having a resolution higher than the native resolution of said first modified image (3A).

17. Method for recognizing text in an image according to any one of claims 1 to 4 or 16, wherein a third recognition strategy (X3) comprises the step of applying said first edge-erosion filter to said text-containing region (3B).

18. Method for recognizing text in an image according to any one of claims 1 to 4 or 16 to 17, wherein a fourth recognition strategy (X4) comprises the step of applying said second maximum contrast filter to said region containing text (3B).

19. Method for recognizing text in an image in accordance with any one of claims 1 to 4 or 16 to 18, wherein a fifth recognition strategy (X5) comprises the step of applying said third minimum-contrast filter to said text-containing region (3B).

20. Method for recognizing text in an image in accordance with any one of claims 1 to 4 or 16 to 19, wherein a sixth recognition strategy (X6) comprises the step of applying said fourth filter having a first threshold value for filtering the white-to-black pixels of said text-containing region (3B).

21. Method for recognizing text in an image according to any one of claims 1 to 4 or 16 to 19, wherein a sixth recognition strategy (X6) comprises the step of applying said fifth filter having a second threshold value for filtering the black-to-white pixels of said text-containing region (3B).

22. Method for recognizing text in an image in accordance with claims 20 and 21, wherein said threshold value can be determined as a function of the predetermined reliability value.

23. Method for recognizing text in an image according to claim 4, wherein said object attributable to said geometric primitives is selected from the group comprising bottles, cans and jars.

24. Method for recognizing text in an image according to claim 8, wherein said object attributable to said perspective primitives having a parallelepiped shape is chosen from the group comprising one-leaf books, two-leaf books, brochures, newspapers, magazines, envelopes, signs and synoptic tables.

25. Method for recognizing text in an image in accordance with any one of claims 1 to 24, wherein, if said first reliability value (C1) is close to zero, said at least one region containing text (3B) within said modified image (3A) coincides with said modified image (3A).

26. Computer product that can be directly loaded into the memory of a computer, comprising portions of program code capable of carrying out the method according to any one of claims 1 to 25 when run on said computer.

27. Apparatus for recognizing text in an image (3), in which an object (4) is represented in said image (3), said apparatus comprising: - a processing device (1) having means for processing a digital image (3) to produce a first modified image (3A); - means (7) for carrying out optical character recognition (OCR) on said first modified image (3A) to determine at least one region containing text (3B) inside said modified image (3A) and to associate with said first region (3B) a first reliability value (C1); - means (8) for converting said recognized text into a first voice signal (S1), said voice signal (S1) being emitted through an audio interface (9) operatively connected to said processing device (1); characterized in that said means (7) for carrying out optical character recognition comprise a program in accordance with any one of claims 1 to 25.

28. Apparatus for recognizing text in an image, according to claim 27, wherein said means (7) for acquiring said digital image are operatively connected with said processing device (1).

29. Apparatus for recognizing text in an image according to claim 27 or 28, wherein said means (7) for performing optical character recognition (OCR) on said first modified image (3A) are able to generate a first reliability value (C1) for said at least one region containing text (3B).

30. Apparatus for recognizing text in an image according to claim 27, wherein said processing means are adapted to compare said first reliability value (C1) with a predetermined reliability value (C2) and to select at least one recognition strategy (X1, ..., X6) among a plurality of recognition strategies if said first reliability value (C1) is lower than said predetermined reliability value (C2), each of said plurality of recognition strategies (X1, ..., X6) being able to modify the graphic parameters of said first modified image (3A) to generate a second modified image (3C) until said first reliability value (C1) is close to or higher than said predetermined reliability value (C2).

31. Apparatus for recognizing text in an image according to any one of claims 27 to 30, wherein said processing device (1) comprises a computer, a notebook, a palmtop, a PDA or a telecommunications system such as a mobile phone.