CN114429632A - Method and device for identifying point-reading content, electronic equipment and computer storage medium - Google Patents


Info

Publication number: CN114429632A (application CN202011104395.1A; granted as CN114429632B)
Authority: CN (China)
Prior art keywords: character, indicator, position information, character line, characters
Legal status: Granted; Active (the status is an assumption and not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN114429632B (en)
Inventors: 董胜, 徐浩, 项小明
Current and original assignee: Tencent Technology Shenzhen Co Ltd (the listed assignees may be inaccurate)
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202011104395.1A

Landscapes

  • Character Input (AREA)

Abstract

The embodiments of the application provide a method and an apparatus for recognizing point-reading content, an electronic device, and a computer storage medium, relating to the technical field of cloud education. The method acquires an image of the reading material, recognizes the characters in the image and the position information of the indicator by an image recognition method, determines the character pointed to by the indicator from the characters and the indicator's position information, and finally determines the point-reading content from that character. The point-reading range is therefore no longer limited by a point-reading pen or specially prepared teaching materials: ordinary textbooks, reference materials and other reading matter can be point-read, which is highly convenient, lets the user quickly obtain answers for obscure content encountered while studying, and effectively improves learning efficiency.

Description

Method and device for identifying point-reading content, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of point-reading technologies, and in particular, to a method and an apparatus for recognizing point-reading content, an electronic device, and a computer storage medium.
Background
Cloud Computing Education (CCEDU) refers to education platform services based on the cloud-computing business model. On the cloud platform, education institutions, training institutions, enrollment service institutions, publicity institutions, industry associations, management institutions, industry media, legal institutions and the like are centralized into a resource pool in the cloud; all resources display one another and interact and communicate on demand to achieve their goals, thereby reducing education costs and improving efficiency.
The home education machine with a point-reading function is becoming an important carrier in the field of cloud education. Current methods for recognizing point-reading content generally pre-store the recognition result for each point-reading position and directly return the prepared content once the position is acquired. For example, the point-reading content is written in advance into an invisible code layer of a dedicated point-reading textbook; the point-reading pen emits light through an infrared light-emitting diode, a camera acquires the code-layer data, and the content is recognized by decoding that data. As another example, the textbook pages are marked and the point-reading pen carries a special mark; the pen triggers the point reading, and the camera acquires the marker information, thereby completing the point reading of the content.
The existing scheme has the following problems:
1. the recognition range is limited: only customized books can be recognized;
2. the cost is high: the point-reading books and point-reading pens must be custom-made, consuming considerable manpower and material resources;
3. the content of a customized point-reading book cannot be updated.
Disclosure of Invention
Embodiments of the present invention provide a method, apparatus, electronic device, and computer storage medium for identifying click-to-read content that overcome or at least partially solve the above-mentioned problems.
In a first aspect, a method for identifying read-by-touch content is provided, and the method includes:
acquiring an image of a point-reading material;
identifying the position information of the characters and the indicating objects in the image, and determining the characters pointed by the indicating objects according to the position information of the characters and the indicating objects;
and determining the point reading content according to the character pointed by the pointer.
In one possible implementation manner, recognizing the position information of the character and the indicator in the image, and determining the character pointed by the indicator according to the position information of the character and the indicator includes:
recognizing the image through an OCR recognition engine to obtain the position information of characters in the image; identifying the position information of the indicator in the image by a fingertip detection method;
and calculating the character closest to the indicator according to the position information of the character and the indicator, and taking the character as the character pointed by the indicator.
In one possible implementation manner, recognizing the position information of the character and the indicator in the image, and determining the character pointed by the indicator according to the position information of the character and the indicator includes:
recognizing the image through an OCR recognition engine to obtain character lines in the image and the positions of the character lines; identifying the position information of the indicator in the image by a fingertip detection method;
obtaining a character line closest to the indicator as a target character line according to the character line and the position information of the indicator;
and determining the pixel width occupied by a single character in the target character line, and combining the target character line and the position information of the indicator to obtain the character pointed by the indicator.
In one possible implementation manner, obtaining a character line closest to the indicator as a target character line according to the character line and the position information of the indicator, includes:
for any character line, calculating the distance from the indicating object to the bottom edge of the character line according to the character line and the position information of the indicating object;
obtaining a character line position relation coefficient according to the relative position relation between the indicating object and the bottom edge of the character line;
carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator;
and taking the character line with the minimum weighted distance as a target character line.
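The four steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 0/1 position relation coefficient (0 when the indicator sits below the line's bottom edge, the usual pointing posture in image coordinates where y grows downward) and the weight values are assumptions.

```python
def target_line(lines, tip, w_dist=1.0, w_rel=50.0):
    """Pick the character line with the smallest weighted distance to the tip.

    `lines` is a list of dicts, each with a 'y_bottom' bottom-edge ordinate;
    `tip` is the (x, y) indicator position. Weights are illustrative.
    """
    tx, ty = tip

    def weighted(line):
        d = abs(ty - line["y_bottom"])                # vertical distance to the bottom edge
        rel = 0.0 if ty >= line["y_bottom"] else 1.0  # position relation coefficient (assumed 0/1)
        return w_dist * d + w_rel * rel               # weighted sum of distance and coefficient

    # The line with the minimum weighted distance is the target character line.
    return min(lines, key=weighted)
```

With these assumed weights, a line whose bottom edge the fingertip sits just below beats a line that is vertically closer but on the wrong side of the tip.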
In one possible implementation, determining a pixel width occupied by a single character in the target text line includes:
obtaining the pixel width occupied by the target character line according to the position information of the target character line;
and obtaining the pixel width occupied by the single character in the target character line according to the quotient of the pixel width occupied by the target character line and the number of characters in the target character line.
In one possible implementation manner, obtaining the character pointed to by the indicator by combining the target character line and the position information of the indicator includes:
determining the distance between the indicator and the left end of the target character line according to the position information of the target character line and the indicator;
obtaining the sequence of the characters pointed by the indicator in the target character line according to the quotient of the distance between the indicator and the left end of the target character line and the pixel width occupied by the characters in the target character line;
and determining the character pointed by the indicator from the target character line according to the sorting.
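The width-and-ordering computation above can be sketched as a few lines of arithmetic. This assumes roughly uniform character widths within the line (which the quotient-based width estimate implies) and adds a clamp for tips that fall slightly outside the line, an assumption not stated in the patent:

```python
def pointed_character(line_text, x_min, x_max, tip_x):
    """Locate the pointed character inside the target character line.

    x_min/x_max are the line's horizontal pixel extent; tip_x is the
    indicator's abscissa.
    """
    # Pixel width of a single character: line width divided by character count.
    char_w = (x_max - x_min) / len(line_text)
    # Ordering of the pointed character: distance from the left end
    # divided by the per-character width.
    index = int((tip_x - x_min) // char_w)
    index = max(0, min(index, len(line_text) - 1))  # clamp to a valid position (assumption)
    return line_text[index]
```

For a four-character line spanning 80 pixels, a tip 45 pixels from the left end falls in the third character's 20-pixel slot.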
In one possible implementation manner, determining the point-reading content according to the character pointed by the pointer includes:
if the character pointed by the indicator is a Chinese character, determining the point reading content according to the Chinese character;
and if the character pointed by the indicator is an English character, determining the vocabulary where the English character is located, and determining the point reading content according to the vocabulary.
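The Chinese/English branching above can be sketched as follows; the ASCII test used to separate Chinese characters from English letters and the alphabetic scan used to expand to the surrounding vocabulary word are illustrative assumptions, not logic specified by the patent:

```python
def reading_unit(line_text, index):
    """Return the unit to look up for the pointed character at `index`:
    the character itself if it is Chinese, else the surrounding English word."""
    ch = line_text[index]
    if not ch.isascii():  # treat non-ASCII as a Chinese character (assumption)
        return ch
    # English character: expand left and right over contiguous ASCII letters.
    lo = index
    while lo > 0 and line_text[lo - 1].isalpha() and line_text[lo - 1].isascii():
        lo -= 1
    hi = index
    while hi + 1 < len(line_text) and line_text[hi + 1].isalpha() and line_text[hi + 1].isascii():
        hi += 1
    return line_text[lo:hi + 1]
```

Pointing at any letter of an English word thus yields the whole word, while pointing at a Chinese character yields just that character.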
In a second aspect, an apparatus for identifying read-by-touch content is provided, the apparatus comprising:
the image acquisition module is used for acquiring an image of the point reading material;
the pointing character determining module is used for identifying the characters in the image and the position information of the pointing object and determining the characters pointed by the pointing object according to the characters and the position information of the pointing object;
and the reading content determining module is used for determining the reading content according to the character pointed by the indicator.
In one possible implementation, the directional character determination module includes:
the character position determining submodule is used for identifying the image through an OCR (optical character recognition) engine to obtain the position information of the characters in the image; identifying the position information of the indicator in the image by a fingertip detection method;
and the nearest distance calculation submodule is used for calculating the character nearest to the indicator according to the position information of the character and the indicator, and the character is used as the character pointed by the indicator.
In one possible implementation, the directional character determination module includes:
the character line position determining submodule is used for identifying the image through an OCR (optical character recognition) engine to obtain the character line and the position of the character line in the image; identifying the position information of the indicator in the image by a fingertip detection method;
the target character line determining submodule is used for acquiring a character line closest to the indicator as a target character line according to the character line and the position information of the indicator;
a width combination submodule for determining the pixel width occupied by a single character in the target character line and combining the position information of the target character line and the indicator to obtain the character pointed by the indicator.
In one possible implementation, the target text line determination sub-module includes:
the distance determining unit is used for calculating the distance from the indicating object to the bottom edge of the character line according to the character line and the position information of the indicating object for any character line;
the relation coefficient determining unit is used for obtaining the position relation coefficient of the character line according to the relative position relation between the indicator and the bottom edge of the character line;
the weighted summation unit is used for carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator;
and a target character row determining unit for taking the character row with the smallest weighting distance as the target character row.
In one possible implementation, the width combining sub-module further includes a character width determining unit for determining a pixel width occupied by a single character in the target character line, and the character width determining unit includes:
the character line width calculation subunit is used for obtaining the pixel width occupied by the target character line according to the position information of the target character line;
and the character width calculation subunit is used for obtaining the pixel width occupied by a single character in the target character line according to the quotient of the pixel width occupied by the target character line and the number of characters in the target character line.
In one possible implementation manner, the width combining sub-module further includes a directional character determining unit configured to obtain the character pointed to by the indicator by combining the target character line and the position information of the indicator, where the directional character determining unit includes:
the distance determining subunit is used for determining the distance between the indicator and the left end of the target character line according to the position information of the target character line and the indicator;
the sequencing determining subunit is used for obtaining the sequencing of the characters pointed by the indicator in the target character line according to the quotient of the distance between the indicator and the left end of the target character line and the pixel width occupied by the characters in the target character line;
and the sequencing character subunit is used for determining the character pointed by the indicator from the target character line according to the sequencing.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the method, apparatus, electronic device and computer storage medium for recognizing point-reading content provided by the embodiments of the invention, an image of the point-reading material is acquired, the position information of the characters and of the indicator in the image is obtained by an image recognition method, the character pointed to by the indicator is determined from that position information, and the point-reading content is determined from the pointed character. The point-reading range is therefore not limited by a point-reading pen or specially prepared teaching materials; ordinary textbooks, reference materials and other reading matter can be point-read, which is highly convenient, allows the user to quickly obtain answers for obscure content encountered while studying, and effectively improves learning efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic system architecture diagram of a point-to-read system according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a desk lamp according to an embodiment of the present disclosure;
fig. 3 is a schematic flow chart illustrating content reading by applying the identification point of the desk lamp shown in fig. 2 according to an embodiment of the present application;
fig. 4 is a schematic flow chart of reading content by applying the table lamp shown in fig. 2 according to another embodiment of the present application;
fig. 5 is a schematic flowchart illustrating a method for recognizing click-to-read content according to an embodiment of the present application;
FIG. 6 is an interaction diagram of a system for recognizing click-to-read content according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an apparatus for recognizing click-to-read content according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The application provides a method, an apparatus, an electronic device and a computer-readable storage medium for identifying point-to-read content, which aim to solve the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a system architecture diagram of a point-to-read system to which the technical solution of the embodiment of the present application can be applied.
As shown in fig. 1, the system architecture may include a terminal device (e.g., one or more of a smartphone 101, a tablet computer 102, and a point-and-read machine 103 shown in fig. 1, but may also be a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services.
The execution method of the server in the embodiment of the application can be completed by cloud computing, a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear infinitely expandable: available at any time, obtainable on demand, and paid for per use.
As a basic capability provider of cloud computing, a cloud computing resource pool (referred to as an IaaS (Infrastructure as a Service) platform for short) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to use as needed.
According to the logic function division, a PaaS (Platform as a Service) layer can be deployed on an IaaS (Infrastructure as a Service) layer, a SaaS (Software as a Service) layer is deployed on the PaaS layer, and the SaaS can be directly deployed on the IaaS. PaaS is a platform on which software runs, such as a database, a web container, etc. SaaS is a variety of business software, such as web portal, sms, and mass texting. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
A user may use a terminal device to interact with the server 105 over the network 104 to receive or send messages or the like. The server 105 may be a server that provides various services. For example, a user uploads an image of a reading material read by the user to the server 105 by using the terminal device 103 (or the terminal device 101 or 102), the server 105 recognizes characters in the image and position information of the indicator, and determines characters pointed by the indicator according to the characters and the position information of the indicator; and determining the point reading content according to the character pointed by the pointer. The server 105 determines the positions of the characters and the indicator in the image by an image recognition method, and further determines the characters pointed by the indicator, so that the reading range is not limited any more, the reading of reading materials such as teaching materials, books and electronic books can be supported, more importantly, a textbook and a reading pen are not required any more, and the reading system is simple and flexible.
The point reading system in the embodiment of the application can be deployed as an independent software and hardware integrated system, for example, a mobile intelligent device system integrated with a camera device and an audio playing device; the system is formed by combining the independent camera device and the intelligent sound box; the system is formed by combining an independent camera device, an independent audio playing device and mobile intelligent equipment; it can also be deployed as a system combining the aforementioned terminal and server.
It is understood that in other embodiments, the terminal may not transmit the image to the server 105, and the above-mentioned series of processing performed on the image by the server 105 may be performed by the terminal itself. Only one application scenario is listed here; the execution subject of the method for recognizing point-reading content provided in the present application is not limited to the server 105, and the method may also be executed by the terminal itself.
Further, the terminal of the embodiment of the present application may integrate a camera device and an audio/video playing device. Referring to Fig. 2, which shows a schematic structural diagram of a desk lamp to which the embodiment of the present application may be applied, the terminal uses the desk lamp as a carrier and includes a base 201, a lamp post 202 and a lamp body 203;
the base 201 is fixedly placed on a desktop, and the base 201 comprises a processor 2011, a microphone 2012, a speaker 2013, a communication module 2014, a memory 2015, a display 2016, a power module 2017 and the like;
the microphone 2012, the speaker 2013, the communication module 2014, the memory 2015 and the display 2016 are connected to the processor 2011; the power supply module 2017 is used for supplying power to the above components;
the lamp body 203 can be fixedly connected, via the lamp post 202, with the base 201 placed on the desktop; the lamp body 203 can be arranged transversely or hinged to the lamp post 202 so that the irradiation angle can be changed; the lamp body 203 comprises a light source 2031, which may be an LED light source, an OLED light source or the like, wherein using a surface light source in the lamp body 203 helps reduce the shadows cast while primary and secondary school students do their homework;
wherein a camera 2032 is also installed in the lamp body 203 and may be a single camera, dual cameras, or even more cameras; the camera 2032 is connected to the processor 2011 (this connection is not shown in the figure) and is controlled by the processor; the camera may be installed at the end of the lamp body 203 away from the lamp post 202, with its shooting direction facing the desktop to obtain a better field of view.
The memory 2015 stores executable program codes, and the processor 2011 calls the executable program codes stored in the memory 2015 to execute the technical solution of the embodiments of the present application to obtain the point-reading content, which is played back as voice by the speaker 2013 and as video on the display 2016.
Therefore, the desk lamp shown in Fig. 2 can give primary and secondary school users the experience of point-reading real books directly with a non-pen indicator such as a finger or a pencil, which is highly convenient: they can promptly ask about unfamiliar content encountered while studying and obtain corresponding answers, thereby effectively improving their learning efficiency.
Fig. 3 shows a schematic flow chart of reading content by applying the table lamp identification point shown in fig. 2 in the embodiment of the present application, as shown in fig. 3:
a user places a book below the desk lamp for reading;
when a user finds an unknown character from a page, placing a finger below the character, triggering a desk lamp to photograph the finger and the page, and obtaining an image of a reading material clicked and read by the user; the finger and the characters in the page are clearly recorded in the image;
the position information of the characters and the finger in the image is recognized, the character pointed to by the finger is determined from that position information, and the click-to-read content for that Chinese character, such as its pronunciation, paraphrase, strokes and word formation, is displayed on the display.
Fig. 4 is a schematic flow chart of reading content by applying the table lamp identification point shown in fig. 2 according to another embodiment of the present application, as shown in fig. 4:
a user places a book below the desk lamp for reading;
when a user finds an unknown character from a page, a pencil is placed below the character, a desk lamp is triggered to photograph the pencil and the page, and an image of a reading material clicked and read by the user is obtained; the pen point of a pencil and characters in a page are clearly recorded in the image;
the position information of the characters and the pen point in the image is recognized, the character pointed to by the pen point is determined from that position information, and the click-to-read content for that Chinese character, such as its pronunciation, paraphrase, strokes and word formation, is displayed on the display.
The embodiment of the present application provides a method for recognizing read-on content, as shown in fig. 5, the method includes:
s101, acquiring an image of the point reading material.
In the embodiment of the present application, the image may be obtained by taking a picture in real time by a device with an image capturing function, calling a locally stored image, or receiving a picture sent by another device or a storage device, and is not limited specifically here.
The image of the embodiment of the application not only contains the characters to be recognized but also records the indicator used when the user points and reads; that is, the embodiment of the application does not require a point-reading pen, which greatly reduces the hardware requirements of point reading. It is understood that the indicator of the embodiments of the present application may, instead of a finger, be an ordinary writing implement such as a pencil, a brush or a pen, which is not limited here.
S102, recognizing the position information of the characters and the indicating object in the image, and determining the characters pointed by the indicating object according to the position information of the characters and the indicating object.
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text by a character recognition method; that is, for printed characters, the characters in a paper document are optically converted into a black-and-white dot-matrix image file, and recognition software converts the characters in the image into a text format. In the embodiments of the present application, the image can be recognized by a preset OCR recognition engine to obtain the position information of each character in the image. The position information of a character can be represented by the maximum abscissa, the minimum abscissa, the maximum ordinate and the minimum ordinate of the character's pixels in the image.
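The four-extreme-coordinate position format just described can be modelled as a small structure; the field names and the centre helper below are illustrative, not part of the patent:

```python
from dataclasses import dataclass

@dataclass
class CharBox:
    """Position of one recognised character: the maximum/minimum abscissa
    and ordinate of its pixels in the image (field names are assumptions)."""
    char: str
    x_min: int
    x_max: int
    y_min: int
    y_max: int

    @property
    def center(self):
        """Centre point of the character's bounding box."""
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)
```

The centre point is what the nearest-character comparison described later operates on.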
Specifically, the characters can be recognized by the Tencent Cloud OCR character recognition engine, which supports recognition of whole-image text in any layout and of more than ten languages, including Chinese, English, letters, numbers, Japanese and Korean; it is accurate and fast when applied to click-to-read scenarios for young children and primary and secondary school students.
Taking finger recognition as an example, the embodiments of the present application may identify the finger and its position information in the image by a fingertip detection method: fingertip detection is performed on the finger, and the coordinates of the fingertip are used as the position information of the finger. Fingertip detection is a special kind of hand key-point recognition, namely a technology for locating the tip of an extended index finger. Fingertip detection can adopt the Gesture Recognition (GR) function of Tencent Cloud, a human-computer interaction technology from Tencent's audio and video laboratories that includes static gesture recognition, key-point recognition, fingertip recognition, gesture action recognition, and other functions. For any picture, if the gesture in it is an extended index finger, the position of the fingertip is returned, where the position is represented by an abscissa and an ordinate.
After determining the position information of the characters and the indicator, the character closest to the indicator can be found and used as the character pointed at by the indicator. For example, the coordinates of the center point of each character may be determined from the character's coordinates, the distance between the center-point coordinates and the fingertip coordinates may be calculated, and the character with the smallest distance may be taken as the character pointed at by the indicator.
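As a sketch of this nearest-character search (the tuple layout for character boxes and fingertip coordinates is an assumption for illustration, not the OCR engine's actual output format):

```python
from math import hypot

def char_center(box):
    """Center of a character's bounding box; box = (min_x, max_x, min_y, max_y)."""
    min_x, max_x, min_y, max_y = box
    return ((min_x + max_x) / 2, (min_y + max_y) / 2)

def pointed_character(chars, fingertip):
    """Return the character whose center point is closest to the fingertip.

    chars: list of (character, box) pairs; fingertip: (x, y) coordinates.
    """
    fx, fy = fingertip

    def distance(item):
        cx, cy = char_center(item[1])
        return hypot(cx - fx, cy - fy)  # Euclidean distance to the fingertip

    return min(chars, key=distance)[0]
```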
S103, determining the point reading content according to the character pointed by the indicator.
It should be understood that characters refer to units or symbols of a writing system; in point reading, the effective characters are generally Chinese characters or English characters, though Korean, Japanese, Arabic characters, and the like are also possible. For non-alphabetic characters such as Chinese characters, the character pointed at by the indicator is already a complete word, and the point-reading content can be determined directly from it. Taking Chinese characters as an example, if the character pointed at by the indicator is "king", the content related to that character, such as its strokes, definitions, and pronunciation, can be used as the point-reading content.
For alphabetic characters, a single character does not constitute a complete word, so it is necessary to search forward from the character to find the word head and backward to find the word tail, determine the word in which the character is located, and then determine the point-reading content from that word. Taking English characters as an example, if the character pointed at by the indicator is the letter "i", searching forward from "i" finds the letter "n"; since a space precedes "n", "n" is determined to be the word head. Searching backward from "i" finds the letters "c" and "e" in turn; since a space follows "e", the word containing the pointed character is determined to be "nice", and the content related to "nice", such as its pronunciation, definitions, and example sentences, is used as the point-reading content.
Further, if the character pointed by the indicator is a space character, the point reading content is determined according to the non-space character adjacent to the space character, and the manner of determining the point reading content through the non-space character is similar to the above example, and is not described herein again.
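A minimal sketch of this word-boundary search for alphabetic text, including the space-character fallback just described (function and parameter names are illustrative):

```python
def word_at(line, idx):
    """Expand from the character at position idx in a text line to the full word.

    If the pointed character is a space, fall back to an adjacent non-space
    character. Returns the word, or "" if nothing usable is found.
    """
    if not line.strip():
        return ""
    if line[idx] == " ":
        # Fall back to an adjacent non-space character.
        left, right = idx - 1, idx + 1
        if left >= 0 and line[left] != " ":
            idx = left
        elif right < len(line) and line[right] != " ":
            idx = right
        else:
            return ""
    start = idx
    while start > 0 and line[start - 1] != " ":
        start -= 1                      # search forward to the word head
    end = idx
    while end < len(line) - 1 and line[end + 1] != " ":
        end += 1                        # search backward to the word tail
    return line[start:end + 1]
```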
According to the method for recognizing point-reading content provided by the embodiments of the present application, an image of the reading material is acquired, the characters in the image and the position information of the indicator are recognized by an image recognition method, the character pointed at by the indicator is determined from the characters and the position information of the indicator, and the point-reading content is determined from the pointed character. The point-reading range is therefore not limited by a point-reading pen or specific point-reading textbooks; ordinary textbooks, reference materials, and other reading matter are supported, which is highly convenient, allows a user to quickly obtain answers to obscure content encountered while studying, and effectively improves the user's learning efficiency.
It should be noted that the recognition accuracy of an OCR recognition engine is often positively correlated with its cost: the higher the recognition accuracy, the higher the cost. Therefore, on the basis of the above embodiment, which gives an example of accurately recognizing the position information of each character by an OCR recognition engine, the present application further provides, as an alternative embodiment, a method for determining the pointed character using an OCR recognition engine with lower recognition accuracy. Specifically, recognizing the characters and the position information of the indicator in the image, and determining the character pointed at by the indicator according to the characters and the position information of the indicator, includes:
s201, identifying the image through an OCR (optical character recognition) engine to obtain character lines and positions of the character lines in the image; and identifying the position information of the indicator in the image by a fingertip detection method.
In this embodiment of the application, the recognition accuracy of the OCR recognition engine is lower: although the characters in the image can be recognized, the position of each individual character cannot be known accurately, and only the position of each character line can be obtained. The position of each character line can be characterized by the maximum abscissa, the minimum abscissa, the maximum ordinate, and the minimum ordinate of the line.
Considering that, when point reading, a user generally places the indicator below a character rather than directly on the character, in order to determine the target character line more accurately the embodiment of the present application further determines the target character line by computing a weighted distance on top of the distance between each character line and the indicator. Specifically: a character-line position relation coefficient is obtained from the relative position of the indicator and the bottom edge of the character line;
if the indicator is located above the bottom edge of the character line, the position relation coefficient is a positive number; if the indicator is located below the bottom edge of the character line, the position relation coefficient is 0 or a negative number.
Carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator; and taking the character line with the minimum weighted distance as a target character line.
The formula for defining the weighted distance between the text line and the indicator is:
Fi=a*Li+b*Mi
wherein Fi represents the weighted distance between the i-th character line and the indicator, a and b are the first weight and the second weight respectively, Li represents the vertical distance from the indicator to the bottom edge of the i-th character line, and Mi represents the position relation coefficient of the i-th character line.
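The weighted-distance selection can be sketched as follows; image coordinates are assumed to grow downward, and the concrete values of the weights a and b and of the coefficient Mi are illustrative assumptions, since the source only constrains the sign of Mi:

```python
def position_coefficient(fingertip_y, bottom_y):
    """Mi: positive when the indicator is above the line's bottom edge,
    0 or negative when it is below (image y grows downward)."""
    return 1.0 if fingertip_y < bottom_y else -1.0

def target_line(lines, fingertip, a=1.0, b=20.0):
    """Pick the character line minimizing Fi = a*Li + b*Mi.

    lines: list of (text, bottom_y) pairs; fingertip: (x, y).
    """
    _, fy = fingertip

    def weighted_distance(item):
        _, bottom_y = item
        Li = abs(fy - bottom_y)                  # vertical distance to bottom edge
        Mi = position_coefficient(fy, bottom_y)  # position relation coefficient
        return a * Li + b * Mi

    return min(lines, key=weighted_distance)[0]
```

With a fingertip at y = 80, the line whose bottom edge (y = 50) lies above the fingertip is preferred over the line below it (bottom edge y = 100), reflecting the observation that users point below the text.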
The process of identifying the position information of the pointer in the image by the fingertip detection method has been described in the above embodiments, and is not described again.
S202, obtaining a character line closest to the indicator as a target character line according to the character line and the position information of the indicator.
Specifically, the bottom edge of a character line can be obtained by connecting the points at the line's maximum and minimum abscissas; the vertical distance from the ordinate of the indicator to the bottom edge of each character line is then calculated, and the character line with the smallest vertical distance is taken as the target character line.
S203, determining the pixel width occupied by the single character in the target character line, and combining the target character line and the position information of the indicator to obtain the character pointed by the indicator.
Specifically, after the target character line is determined, the pixel width occupied by the target character line is obtained according to the position information of the target character line, and the pixel width occupied by a single character in the target character line can be determined by combining the number of characters in the target character line recognized by the OCR recognition engine.
For example, if the maximum abscissa of the target character line is 100 and the minimum abscissa is 10, the width of the target character line is 90; if the number of characters in the target character line is 30, it can be inferred that the pixel width occupied by a single character is 3.
Further, the distance between the indicator and the left end of the target character line can be determined from the position information of the target character line and the indicator. It can be understood that abscissa values increase from left to right, so this distance can be determined by subtracting the minimum abscissa of the target character line from the abscissa of the indicator. For example, if the abscissa of the indicator is 60 and the minimum abscissa of the target character line is 10, the distance between the indicator and the left end of the target character line is 50.
The position of the pointed character within the target character line can then be obtained by dividing the distance between the indicator and the left end of the target character line by the pixel width occupied by a single character. For example, if the distance between the indicator and the left end of the target character line is 50 and the pixel width occupied by a single character is 3, then 50 divided by 3 gives a quotient of 16 with a remainder of 2, which means the pointed character is the 17th in the target character line. After this ordinal is obtained, the pointed character can be determined from the target character line accordingly, i.e., the 17th character in the target character line is taken as the character pointed at by the indicator.
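The arithmetic in this worked example (width 90, 30 characters, offset 50, 17th character) can be sketched as follows; the even-spacing assumption is the embodiment's own:

```python
def pointed_char_index(line_min_x, line_max_x, n_chars, fingertip_x):
    """Estimate the 1-based position of the pointed character in the target line,
    assuming characters are evenly spaced across the line's pixel width."""
    char_width = (line_max_x - line_min_x) / n_chars  # pixels per character
    offset = fingertip_x - line_min_x                 # distance from the left end
    return int(offset // char_width) + 1              # e.g. quotient 16 -> 17th
```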
Because the embodiments of the present application recognize the point-reading content by means of image recognition, in order to verify the influence of ambient illuminance on recognition accuracy, recognition accuracy statistics were collected at different times and under different illuminance levels; the statistical results are shown in Table 1:
TABLE 1 recognition accuracy statistics table
As can be seen from Table 1, the recognition accuracy of the method for recognizing point-reading content provided by the embodiments of the present application has no obvious correlation with illuminance, and satisfactory accuracy is ensured even under lower illuminance (320±30 Lux), indicating that the embodiments of the present application impose no harsh illuminance requirements and can meet the point-reading needs of ordinary users.
Furthermore, the accuracy of the embodiments of the present application was compared with that of several existing point-reading products under different illuminance levels; the comparison results are shown in Table 2.
TABLE 2 statistical table of identification results of the present application and the contests
As can be seen from Table 2, the difference in recognition accuracy of the present application between natural light and desk-lamp light is very small; in fact, the accuracy under the lower-illuminance desk-lamp light is slightly better than that under natural light. The three competing products, by contrast, either vary greatly in accuracy across illuminance levels or show significantly lower accuracy at a particular illuminance. Compared with the prior art, the embodiments of the present application therefore achieve better combined accuracy and stability under different illuminance levels.
Further, the recognition speed of the embodiments of the present application was compared with that of several existing point-reading products; the comparison results are shown in Table 3.
Product              | This application | Height of step | Aer egg | L*ka
Time consumed (ms)   | 947.5            | 1415           | 1312.5  | 2777.5

TABLE 3 Recognition speed comparison table
As can be seen from table 3, the recognition speed of the embodiment of the present application is significantly better than that of the prior art.
Fig. 6 is an interaction schematic diagram of a system for recognizing point-reading content according to an embodiment of the present disclosure. As shown in Fig. 6, the system of this embodiment includes a terminal, an access layer, and an algorithm layer. Specifically,
the terminal, in response to a user's point-reading operation on the reading material, acquires an image of the reading material and then sends a point-reading recognition request to the access layer via HTTP (Hypertext Transfer Protocol), where the request includes the image of the reading material and the user's account information.
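A sketch of the client-side request body; the field names and encoding are illustrative assumptions, since the source only states that the request carries the image and the user's account information over HTTP:

```python
import base64
import json

def build_point_reading_request(image_bytes, account_id):
    """Package the reading-material image and account info as a JSON body
    suitable for an HTTP POST to the access layer (hypothetical schema)."""
    return json.dumps({
        "account": account_id,
        # Base64 keeps the binary image safe inside a JSON/HTTP payload.
        "image": base64.b64encode(image_bytes).decode("ascii"),
    })
```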
The access layer refers to a part of the network directly facing to user connection or access, the access layer is connected with the user by using transmission media such as optical fibers, twisted pairs, coaxial cables, wireless access technologies and the like, and optionally, the access layer is composed of a wireless network card, an access point and a switch.
After receiving the point-reading recognition request, the access layer verifies the account information carried in the request in order to intercept possible malicious attacks. If the account passes verification, the request is deemed to come from a legitimate user. The request is then subjected to frequency control to prevent the computational load on the subsequent algorithm layer from exceeding its capacity, which helps the embodiments of the present application remain stable over the long term. The data is then verified, specifically the format, clarity, size, and so on of the image, and only after the data passes verification is the request sent to the algorithm layer, within a Taf framework, for processing. Taf is a high-performance RPC framework for the backend logic layer that currently supports C++, Java, and Node; it integrates extensible protocol encoding and decoding, a high-performance RPC communication framework, name routing and discovery, release monitoring, log statistics, and configuration management, enabling stable and reliable distributed applications to be built quickly as microservices with complete and effective service governance.
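The frequency-control step could look like the following per-account sliding-window limiter; the window size and request limit are illustrative assumptions, not values given by the source:

```python
import time
from collections import deque

class FrequencyController:
    """Sliding-window rate limiter per account: a minimal sketch of the
    access layer's frequency-control step."""

    def __init__(self, max_requests=5, window_seconds=1.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = {}   # account -> deque of request timestamps

    def allow(self, account, now=None):
        now = time.monotonic() if now is None else now
        q = self.history.setdefault(account, deque())
        while q and now - q[0] > self.window:
            q.popleft()                 # drop timestamps outside the window
        if len(q) >= self.max_requests:
            return False                # over the limit: reject the request
        q.append(now)
        return True
```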
The method comprises the steps that an algorithm layer firstly identifies an image through an OCR (optical character recognition) engine to obtain character lines in the image and the positions of the character lines; identifying the position information of the indicator in the image by a fingertip detection method;
then determining a target character line, including calculating the distance from the indicating object to the bottom edge of the character line according to the position information of the character line and the indicating object for any character line; obtaining a character line position relation coefficient according to the relative position relation between the indicating object and the bottom edge of the character line; carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator; and taking the character line with the minimum weighted distance as a target character line.
After the target character line is determined, analyzing the character line to determine the pixel width of a single character in the character line, specifically obtaining the pixel width occupied by the target character line according to the position information of the target character line; and obtaining the pixel width occupied by the single character in the target character line according to the quotient of the pixel width occupied by the target character line and the number of characters in the target character line.
Obtaining the pixel width occupied by a single character, combining the position information of the target character line and the indicating object to obtain the character pointed by the indicating object, and specifically determining the distance between the indicating object and the left end of the target character line according to the position information of the target character line and the indicating object; obtaining the sequence of the characters pointed by the indicator in the target character line according to the quotient of the distance between the indicator and the left end of the target character line and the pixel width occupied by the characters in the target character line; determining characters pointed by the indicator from the target character line according to the sequence, wherein if the characters pointed by the indicator are Chinese characters, the point-reading content is determined according to the Chinese characters; and if the character pointed by the indicator is an English character, determining the vocabulary where the English character is located, and determining the point reading content according to the vocabulary.
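Putting the algorithm-layer steps together, a minimal end-to-end sketch (the dict shape for OCR line output, the weights, and the ASCII-based letter test are all illustrative assumptions):

```python
def recognize_pointed_word(lines, fingertip):
    """lines: list of dicts with 'text', 'min_x', 'max_x', 'bottom_y'
    (an assumed shape for OCR line output); fingertip: (x, y)."""
    fx, fy = fingertip

    # Step 1: pick the target line by weighted distance (weights illustrative).
    def weighted(ln):
        Li = abs(fy - ln["bottom_y"])
        Mi = 1.0 if fy < ln["bottom_y"] else -1.0
        return Li + 20.0 * Mi

    line = min(lines, key=weighted)
    text = line["text"]

    # Step 2: per-character pixel width and the pointed character's index.
    char_width = (line["max_x"] - line["min_x"]) / len(text)
    idx = min(int((fx - line["min_x"]) // char_width), len(text) - 1)

    # Step 3: a Chinese character stands alone; letters expand to the word.
    ch = text[idx]
    if not ch.isascii():
        return ch
    start, end = idx, idx
    while start > 0 and text[start - 1] != " ":
        start -= 1
    while end < len(text) - 1 and text[end + 1] != " ":
        end += 1
    return text[start:end + 1]
```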
And the algorithm layer returns the point-read content to the terminal for displaying through the access layer.
An embodiment of the present application provides an apparatus for recognizing read-on content, and as shown in fig. 7, the apparatus for recognizing read-on content may include: an image acquisition module 301, a directional character determination module 302, and a click-to-read content determination module 303, wherein,
the image acquisition module 301 is configured to acquire an image of a point reading material.
In the embodiment of the present application, the image may be obtained by taking a picture in real time by a device with an image capturing function, calling a locally stored image, or receiving a picture sent by another device or a storage device, and is not limited specifically here.
The image in the embodiments of the present application not only contains the characters to be recognized, but also records the indicator used by the user during point reading; that is, unlike the prior art, the embodiments of the present application do not require a point-reading pen, and point reading is achieved with only an indicator, which greatly reduces the hardware requirements. It can be understood that, in addition to a finger, the user can also point-read with an ordinary writing instrument, such as a pencil, a paintbrush, or a pen, which is not limited here.
And a pointing character determination module 302, configured to identify the characters in the image and the position information of the pointing object, and determine the character pointed by the pointing object according to the characters and the position information of the pointing object.
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by a character recognition method; that is, for printed characters, the characters in a paper document are optically converted into a black-and-white dot-matrix image file, and the characters in the image are converted into a text format by recognition software. In the embodiments of the present application, the image can be recognized through a preset OCR recognition engine to obtain the position information of each character in the image. The position information of a character can be represented by the maximum abscissa, the minimum abscissa, the maximum ordinate, and the minimum ordinate of the character's pixel points in the image.
Specifically, the characters can be recognized through the Tencent Cloud OCR character recognition engine. Tencent Cloud OCR character recognition supports recognition of all characters in an image in any layout, covering more than ten languages and symbol sets such as Chinese, English, letters, numbers, Japanese, and Korean, and is accurate and fast when applied to point-reading scenarios for young children and primary and secondary school students.
In the embodiments of the present application, fingertip detection can be used to identify the indicator and its position information in the image: fingertip detection is performed on the indicator, and the coordinates of the fingertip are used as the position information of the indicator. Fingertip recognition is a special kind of hand key-point recognition, namely a technology for locating the tip of an extended index finger. Fingertip detection can adopt the Gesture Recognition (GR) function of Tencent Cloud, a human-computer interaction technology from Tencent's audio and video laboratories that includes static gesture recognition, key-point recognition, fingertip recognition, gesture action recognition, and other functions. For any picture, if the gesture in it is an extended index finger, the position of the fingertip is returned, where the position is represented by an abscissa and an ordinate.
After determining the position information of the characters and the indicator, the character closest to the indicator can be found and used as the character pointed at by the indicator. For example, the coordinates of the center point of each character may be determined from the character's coordinates, the distance between the center-point coordinates and the fingertip coordinates may be calculated, and the character with the smallest distance may be taken as the character pointed at by the indicator.
And a reading content determining module 303, configured to determine reading content according to the character pointed by the pointer.
It should be understood that characters refer to units or symbols of a writing system; in point reading, the effective characters are generally Chinese characters or English characters, though Korean, Japanese, Arabic characters, and the like are also possible. For non-alphabetic characters such as Chinese characters, the character pointed at by the indicator is already a complete word, and the point-reading content can be determined directly from it. Taking Chinese characters as an example, if the character pointed at by the indicator is "king", the content related to that character, such as its strokes, definitions, and pronunciation, can be used as the point-reading content.
For alphabetic characters, a single character does not constitute a complete word, so it is necessary to search forward from the character to find the word head and backward to find the word tail, determine the word in which the character is located, and then determine the point-reading content from that word. Taking English characters as an example, if the character pointed at by the indicator is the letter "i", searching forward from "i" finds the letter "n"; since a space precedes "n", "n" is determined to be the word head. Searching backward from "i" finds the letters "c" and "e" in turn; since a space follows "e", the word containing the pointed character is determined to be "nice", and the content related to "nice", such as its pronunciation, definitions, and example sentences, is used as the point-reading content.
Further, if the character pointed by the indicator is a space character, the point reading content is determined according to the non-space character adjacent to the space character, and the manner of determining the point reading content through the non-space character is similar to the above example, and is not described herein again.
The device for recognizing point-reading content provided by the embodiment of the present invention specifically executes the processes of the above method embodiments; for details of recognizing point-reading content, please refer to the above method embodiments, which are not repeated here. The device acquires an image of the reading material, recognizes the characters in the image and the position information of the indicator by an image recognition method, determines the character pointed at by the indicator from the characters and the position information of the indicator, and determines the point-reading content from the pointed character. The point-reading range is therefore not limited by a point-reading pen or specific point-reading textbooks; ordinary textbooks, reference materials, and other reading matter are supported, which is highly convenient, allows a user to quickly obtain answers to obscure content encountered while studying, and effectively improves the user's learning efficiency.
In one possible implementation, the directional character determination module includes:
the character position determining submodule is used for identifying the image through an OCR (optical character recognition) engine to obtain the position information of the characters in the image; identifying the position information of the indicator in the image by a fingertip detection method;
and the nearest distance calculation submodule is used for calculating the character nearest to the indicator according to the position information of the character and the indicator, and the character is used as the character pointed by the indicator.
It should be noted that the recognition accuracy of an OCR recognition engine is often positively correlated with its cost: the higher the recognition accuracy, the higher the cost. Thus, while the above embodiment gives an example of accurately recognizing the position information of each character by an OCR recognition engine, as an alternative embodiment the pointing character determination module includes:
the character line position determining submodule is used for identifying the image through an OCR (optical character recognition) engine to obtain the character line and the position of the character line in the image; and identifying the position information of the indicator in the image by a fingertip detection method.
In this embodiment of the application, the recognition accuracy of the OCR recognition engine is lower: although the characters in the image can be recognized, the position of each individual character cannot be known accurately, and only the position of each character line can be obtained. The position of each character line can be characterized by the maximum abscissa, the minimum abscissa, the maximum ordinate, and the minimum ordinate of the line.
Considering that, when point reading, a user generally places the indicator below a character rather than directly on the character, in order to determine the target character line more accurately the embodiment of the present application further determines the target character line by computing a weighted distance on top of the distance between each character line and the indicator. Specifically: a character-line position relation coefficient is obtained from the relative position of the indicator and the bottom edge of the character line;
if the indicator is located above the bottom edge of the character line, the position relation coefficient is a positive number; if the indicator is located below the bottom edge of the character line, the position relation coefficient is 0 or a negative number.
Carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator; and taking the character line with the minimum weighted distance as a target character line.
The formula for defining the weighted distance between the text line and the indicator is:
Fi=a*Li+b*Mi
wherein Fi represents the weighted distance between the i-th character line and the indicator, a and b are the first weight and the second weight respectively, Li represents the vertical distance from the indicator to the bottom edge of the i-th character line, and Mi represents the position relation coefficient of the i-th character line.
And the target character line determining submodule is used for acquiring a character line closest to the indicator as a target character line according to the character line and the position information of the indicator.
Specifically, the bottom edge of a character line can be obtained by connecting the points at the line's maximum and minimum abscissas; the vertical distance from the ordinate of the indicator to the bottom edge of each character line is then calculated, and the character line with the smallest vertical distance is taken as the target character line.
And the width combining submodule is used for determining the pixel width occupied by a single character in the target character line and combining the target character line and the position information of the indicator to obtain the character pointed by the indicator.
Specifically, after the target character line is determined, the pixel width occupied by the target character line is obtained according to the position information of the target character line, and the pixel width occupied by a single character in the target character line can be determined by combining the number of characters in the target character line recognized by the OCR recognition engine.
In one possible implementation, the target text line determination sub-module includes:
the distance determining unit is used for calculating the distance from the indicating object to the bottom edge of the character line according to the character line and the position information of the indicating object for any character line;
the relation coefficient determining unit is used for obtaining the position relation coefficient of the character line according to the relative position relation between the indicator and the bottom edge of the character line;
the weighted summation unit is used for carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator;
and a target character row determining unit for taking the character row with the smallest weighting distance as the target character row.
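The four units above can be sketched as a single selection routine. The following is an illustrative Python sketch only, not the patent's implementation: the field names (min_x, max_x, base_y), the default weights a and b, and the 0/1 positional relationship coefficient are all assumptions introduced for the example.

```python
def pick_target_line(lines, pointer_xy, a=1.0, b=0.5):
    """Return the character line with the smallest weighted distance
    F_i = a * L_i + b * M_i, as defined in the description above.

    lines      -- list of dicts with 'min_x', 'max_x', 'base_y' (bottom edge)
    pointer_xy -- (x, y) of the detected indicator (e.g. a fingertip)
    """
    px, py = pointer_xy
    best, best_f = None, float("inf")
    for line in lines:
        # L_i: vertical distance from the indicator to the line's bottom edge
        li = abs(py - line["base_y"])
        # M_i: assumed positional relationship coefficient -- penalise lines
        # whose bottom edge lies below the indicator, since the indicator
        # normally sits just under the line being pointed at
        mi = 0.0 if py >= line["base_y"] else 1.0
        f = a * li + b * mi
        if f < best_f:
            best, best_f = line, f
    return best
```

With two candidate lines whose bottom edges sit at y = 40 and y = 80, an indicator at (60, 78) selects the lower line: its vertical distance (2) plus the small positional penalty still yields a far smaller weighted distance than the upper line's.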
In one possible implementation, the width combining sub-module further includes a character width determining unit for determining a pixel width occupied by a single character in the target character line, and the character width determining unit includes:
and the character line width calculating operator unit is used for obtaining the pixel width occupied by the target character line according to the position information of the target character line.
And the character width calculating operator unit is used for obtaining the pixel width occupied by the single character in the target character line according to the quotient of the pixel width occupied by the target character line and the number of characters in the target character line.
For example, if the maximum abscissa of the target character line is 100 and the minimum abscissa is 10, the width of the target character line is 90; if the number of characters in the target character line is 30, it follows that the pixel width occupied by a single character is 3.
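The width computation above reduces to one division. A minimal sketch, with the parameter names as illustrative assumptions:

```python
def char_pixel_width(line_min_x, line_max_x, num_chars):
    """Pixel width of a single character: the line's pixel width
    (max abscissa minus min abscissa) divided by the OCR character count."""
    return (line_max_x - line_min_x) / num_chars

# Matches the worked example above: (100 - 10) / 30 == 3 pixels per character.
```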
In one possible implementation manner, the width combining sub-module further includes a directional character determining unit configured to obtain the character pointed to by the indicator by combining the target character line and the position information of the indicator, where the directional character determining unit includes:
and the distance determining subunit is used for determining the distance between the indicator and the left end of the target character line according to the position information of the target character line and the indicator.
Further, the distance between the indicator and the left end of the target character line can be determined from the position information of the target character line and the indicator. Since the value of the abscissa increases from left to right, this distance can be obtained by subtracting the minimum abscissa of the target character line from the abscissa of the indicator. For example, if the abscissa of the indicator is 60 and the minimum abscissa of the target character line is 10, the distance between the indicator and the left end of the target character line is 50.
The sequencing determining subunit is used for obtaining the sequencing of the characters pointed by the indicator in the target character line according to the quotient of the distance between the indicator and the left end of the target character line and the pixel width occupied by the characters in the target character line;
and the sequencing character subunit is used for determining the character pointed by the indicator from the target character line according to the sequencing.
The ordering of the character pointed to by the indicator within the target character line can be obtained by dividing the distance between the indicator and the left end of the target character line by the pixel width occupied by a single character. For example, if that distance is 50 and the pixel width of a single character is 3, then 50 divided by 3 gives 16 with a remainder of 2, meaning the pointed character is the 17th in the target character line. Once this ordering is obtained, the character pointed to by the indicator can be determined from the target character line accordingly, i.e. the 17th character in the target character line is taken as the character pointed to by the indicator.
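The ordering computation above amounts to integer division of the indicator's offset by the per-character width. A hedged sketch (the function and parameter names are assumptions; the clamp to the last character is an added safeguard not stated in the text):

```python
def pointed_character(text, line_min_x, pointer_x, char_width):
    """Return (ordering, character) for the character the indicator points to.

    text       -- the recognized text of the target character line
    line_min_x -- minimum abscissa (left end) of the target character line
    pointer_x  -- abscissa of the indicator
    char_width -- pixel width occupied by a single character
    """
    offset = pointer_x - line_min_x     # distance to the line's left end
    index = int(offset // char_width)   # 50 // 3 == 16 (zero-based index)
    index = min(index, len(text) - 1)   # clamp in case the pointer overshoots
    return index + 1, text[index]       # 1-based ordering and the character
```

Running the worked example (offset 50, character width 3) yields ordering 17, i.e. the 17th character of the line.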
An embodiment of the present application provides an electronic device, including: a memory and a processor; at least one program stored in the memory for execution by the processor, which when executed by the processor, implements: the method comprises the steps of obtaining an image of a reading object, identifying characters in the image and position information of an indicating object through an image identification method, determining the characters pointed by the indicating object by using the characters and the position information of the indicating object, and finally determining reading content according to the characters pointed by the indicating object, so that the reading range is not limited by a reading pen and a specific reading teaching material, reading of reading objects such as common teaching materials and data can be supported, great convenience is achieved, answers can be rapidly obtained for the obscure contents encountered by a user in learning, and the learning efficiency of the user is effectively improved.
In an alternative embodiment, an electronic device is provided. As shown in fig. 8, the electronic device 4000 comprises a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. In practical applications the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The memory 4003 may be a ROM (Read-Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.
The present application provides a computer-readable storage medium on which a computer program is stored; when run on a computer, the program enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, an image of the reading material is obtained, the characters in the image and the position information of the indicator are identified through an image recognition method, the character pointed to by the indicator is determined from the characters and the position information of the indicator, and the point-reading content is determined according to that character, so that the point-reading range is not limited by a reading pen and specific reading teaching materials, point reading of ordinary teaching materials, reference materials and the like can be supported with great convenience, answers can be quickly obtained for obscure content encountered by the user in learning, and the learning efficiency of the user is effectively improved.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and need not be performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A method for identifying click-to-read content, comprising:
acquiring an image of a point-reading material;
identifying the position information of the characters and the indicating object in the image, and determining the characters pointed by the indicating object according to the position information of the characters and the indicating object;
and determining the point reading content according to the character pointed by the indicator.
2. The method for recognizing click-to-read contents according to claim 1, wherein the recognizing the position information of the character and the pointing object in the image, and determining the character pointed by the pointing object according to the position information of the character and the pointing object comprises:
recognizing the image through an OCR recognition engine to obtain the position information of characters in the image; identifying position information of the indicator in the image by a fingertip detection method;
and calculating the character closest to the indicator according to the character and the position information of the indicator, and taking the character as the character pointed by the indicator.
3. The method for recognizing click-to-read contents according to claim 1, wherein the recognizing the position information of the character and the pointing object in the image, and determining the character pointed by the pointing object according to the position information of the character and the pointing object comprises:
recognizing the image through an OCR recognition engine to obtain character lines in the image and the positions of the character lines; identifying position information of the indicator in the image by a fingertip detection method;
obtaining a character line closest to the indicator as a target character line according to the character line and the position information of the indicator;
and determining the pixel width occupied by a single character in the target character line, and combining the target character line and the position information of the indicator to obtain the character pointed by the indicator.
4. The method for recognizing point-reading contents according to claim 3, wherein the obtaining of the character line closest to the pointing object as the target character line according to the character line and the position information of the pointing object comprises:
for any character line, calculating the distance from the indicating object to the bottom edge of the character line according to the character line and the position information of the indicating object;
obtaining a position relation coefficient of the character line according to the relative position relation between the indicator and the bottom edge of the character line;
carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator;
and taking the character line with the minimum weighted distance as the target character line.
5. The method for identifying click-to-read content according to claim 3, wherein the determining the pixel width occupied by a single character in the target character line comprises:
obtaining the pixel width occupied by the target character line according to the position information of the target character line;
and obtaining the pixel width occupied by the single character in the target character line according to the quotient of the pixel width occupied by the target character line and the number of characters in the target character line.
6. The method for recognizing click-to-read contents according to claim 3, wherein the obtaining of the character pointed by the pointing object by combining the target character line and the position information of the pointing object comprises:
determining the distance between the indicator and the left end of the target character line according to the position information of the target character line and the indicator;
obtaining the sequence of the characters pointed by the indicator in the target character line according to the quotient of the distance between the indicator and the left end of the target character line and the pixel width occupied by the characters in the target character line;
and determining the character pointed by the indicator from the target character line according to the sorting.
7. The method for identifying click-to-read contents according to any one of claims 1-6, wherein the step of determining click-to-read contents according to the characters pointed by the pointer comprises the following steps:
if the character pointed by the indicator is a Chinese character, determining click-to-read content according to the Chinese character;
and if the character pointed by the indicator is an English character, determining the vocabulary where the English character is located, and determining the point reading content according to the vocabulary.
8. An apparatus for recognizing click-to-read contents, comprising:
the image acquisition module is used for acquiring an image of the point reading material;
the pointing character determining module is used for identifying the characters in the image and the position information of the pointing object and determining the characters pointed by the pointing object according to the characters and the position information of the pointing object;
and the reading content determining module is used for determining the reading content according to the character pointed by the indicator.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method for recognizing click-to-read contents according to any one of claims 1 to 7 are implemented when the processor executes the program.
10. A computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the method for identifying click-to-read content according to any one of claims 1 to 7.
CN202011104395.1A 2020-10-15 2020-10-15 Method, device, electronic equipment and computer storage medium for identifying click-to-read content Active CN114429632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011104395.1A CN114429632B (en) 2020-10-15 2020-10-15 Method, device, electronic equipment and computer storage medium for identifying click-to-read content


Publications (2)

Publication Number Publication Date
CN114429632A true CN114429632A (en) 2022-05-03
CN114429632B CN114429632B (en) 2023-12-12

Family

ID=81310110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011104395.1A Active CN114429632B (en) 2020-10-15 2020-10-15 Method, device, electronic equipment and computer storage medium for identifying click-to-read content

Country Status (1)

Country Link
CN (1) CN114429632B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104157171A (en) * 2014-08-13 2014-11-19 三星电子(中国)研发中心 Point-reading system and method thereof
US9367736B1 (en) * 2015-09-01 2016-06-14 Amazon Technologies, Inc. Text detection using features associated with neighboring glyph pairs
US9378435B1 (en) * 2014-06-10 2016-06-28 David Prulhiere Image segmentation in optical character recognition using neural networks
US20170286803A1 (en) * 2016-03-29 2017-10-05 Wipro Limited System and method for optical character recognition
CN107393356A (en) * 2017-04-07 2017-11-24 深圳市友悦机器人科技有限公司 Control method, control device and early learning machine
CN108596168A (en) * 2018-04-20 2018-09-28 北京京东金融科技控股有限公司 For identification in image character method, apparatus and medium
CN109325464A (en) * 2018-10-16 2019-02-12 上海翎腾智能科技有限公司 A kind of finger point reading character recognition method and interpretation method based on artificial intelligence
WO2020010547A1 (en) * 2018-07-11 2020-01-16 深圳前海达闼云端智能科技有限公司 Character identification method and apparatus, and storage medium and electronic device
CN111353501A (en) * 2020-02-25 2020-06-30 暗物智能科技(广州)有限公司 Book point-reading method and system based on deep learning
CN111414903A (en) * 2019-01-04 2020-07-14 阿里巴巴集团控股有限公司 Method, device and equipment for identifying content of indicator
CN111459443A (en) * 2019-01-21 2020-07-28 北京字节跳动网络技术有限公司 Character point-reading method, device, equipment and readable medium
CN111711757A (en) * 2020-06-29 2020-09-25 广东小天才科技有限公司 Test question shooting method and device capable of preventing finger from being blocked, electronic equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant