CN111542817A - Information processing device, video search method, generation method, and program - Google Patents
- Publication number
- CN111542817A (application CN201980006824.0A)
- Authority
- CN
- China
- Prior art keywords
- character string
- video
- character
- displayed
- display area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
Abstract
Provided is an information processing device (10) including: a storage unit (105) that stores a database in which, for a video in which images of a plurality of 1st character strings are displayed, a 2nd character string generated by character recognition of an image of a 1st character string, time information indicating the time at which the image of the 1st character string is displayed in the video, and the video are stored in association with each other; a receiving unit (101) that receives a character string to be searched; a search unit (102) that searches the database for a 2nd character string containing the character string to be searched, the time information corresponding to that 2nd character string, and the video corresponding to that 2nd character string; and an output unit (103) that outputs a screen including a 1st display area (2001) in which the retrieved video is played and a 2nd display area (2002) in which the retrieved 2nd character strings and time information are displayed in chronological order.
Description
Cross Reference to Related Applications
The present application is based on Japanese Patent Application No. 2018-010904 filed on January 25, 2018, the contents of which are incorporated herein by reference.
Technical Field
The invention relates to an information processing apparatus, a video search method, a generation method, and a program.
Background
An online learning system is known that enables a user to learn using a web browser or the like. With an online learning system, the user can watch lecture videos of interest, check his or her degree of understanding by taking quizzes, and review the questions he or she got wrong, and can thus learn efficiently. As a remote learning support system using a network, the technique described in Patent Document 1, for example, is known.
Documents of the prior art
Patent document
Patent document 1: japanese patent laid-open No. 2001 and 188461
Disclosure of Invention
A user who wants to review a difficult subject may wish to view not the entire lecture video but only a specific portion of it. For example, a user who wants to review American history within the subject of world history may want to view only the portions of the world history lecture video in which the instructor explains the United States.
However, conventional online learning systems do not provide a function for retrieving the specific portion of a lecture video that the user wishes to view. The user therefore has to watch the lecture video from beginning to end, or hunt for the desired portion by fast-forwarding and the like. This problem is not limited to lecture videos; it can occur with any video.
Accordingly, an object of the present invention is to provide a technique capable of quickly searching for a specific portion in a video that a user wishes to view.
An information processing apparatus according to one aspect of the present invention includes: a storage unit that stores a database in which, for a video in which images of a plurality of 1st character strings are displayed, a 2nd character string generated by character recognition of an image of a 1st character string, time information indicating the time at which the image of the 1st character string is displayed in the video, and the video are stored in association with each other; a reception unit that receives a character string to be searched; a search unit that searches the database for a 2nd character string containing the character string to be searched, the time information corresponding to the 2nd character string, and the video corresponding to the 2nd character string; and an output unit that outputs a screen including a 1st display area in which the retrieved video is played and a 2nd display area in which the retrieved 2nd character strings and time information are displayed in chronological order. According to this aspect, a technique can be provided that allows the specific portion of a video that the user wishes to view to be found quickly.
In the above aspect, the output unit may output a screen in which the retrieved 2nd character strings and time information are displayed side by side in the 2nd display area, in chronological order, in the horizontal or vertical direction. According to this aspect, since the plurality of pieces of text information and time information are displayed chronologically in the 2nd display area of the screen, visibility can be improved.
In the above aspect, the output unit may further display, in the 2nd display area, a message indicating that the image of the 1st character string corresponding to the retrieved 2nd character string is displayed in the video. According to this aspect, the user can easily recognize on the screen that the search hit is a 1st character string displayed in the video.
In the above aspect, the output unit may display, superimposed on the video, information indicating the position in the video at which the image of the 1st character string corresponding to the retrieved 2nd character string is displayed. According to this aspect, the user can easily see where in the video the character string to be searched is displayed.
In the above aspect, the output unit may highlight the portion of the 2nd character string displayed in the 2nd display area that matches the character string to be searched. According to this aspect, even when the 2nd character string contains many characters, it is easy to see which part of it matches the character string to be searched.
In the above aspect, the video may be a video captured while a lecturer gives a lecture using a blackboard, and the 1st character string may be a character string consisting of a plurality of characters handwritten on the blackboard. According to this aspect, the user can easily find the portions of the lecture video in which the character string to be searched appears among the handwritten characters written on the blackboard.
An information processing apparatus according to another aspect of the present invention includes: an extraction unit that extracts a 1st image, which is an area of a video in which an image of a 1st character string is displayed, and outputs time information indicating when the image of the 1st character string starts to be displayed in the video; a dividing unit that divides the 1st image extracted by the extraction unit into 2nd images, one for each character included in the 1st character string; a character recognition unit that performs character recognition on each of the 2nd images and outputs a plurality of candidate characters for each of them; an output unit that, for the plurality of candidate character strings generated by combining the candidate characters output for the 2nd images in the order in which the characters are arranged in the 1st character string, outputs, as a 2nd character string, the character string determined to be most similar to any of those candidate character strings among a plurality of character strings that may be used in the video; and a generation unit that generates a database in which the 2nd character string output by the output unit, the time information output by the extraction unit, and the video are associated with each other. According to this aspect, the database can be generated automatically, so that the user can quickly benefit from a technique for quickly finding the specific portion of a video that he or she wishes to view.
A video search method according to another aspect of the present invention is performed by an information processing apparatus having a storage unit that stores a database in which a 2nd character string generated by character recognition of an image of a 1st character string, time information indicating the time at which the image of the 1st character string is displayed in a video, and the video are stored in association with each other, and includes: a step of receiving a character string to be searched; a step of retrieving, from the database, a 2nd character string containing the character string to be searched, the time information corresponding to the 2nd character string, and the video corresponding to the 2nd character string; and a step of outputting a screen including a 1st display area in which the retrieved video is played and a 2nd display area in which the retrieved 2nd character strings and time information are displayed in chronological order. According to this aspect, a technique can be provided that allows the specific portion of a video that the user wishes to view to be found quickly.
Another aspect of the present invention is a program for causing a computer to function as: a storage unit that stores a database in which, for a video in which images of a plurality of 1st character strings are displayed, a 2nd character string generated by character recognition of an image of a 1st character string, time information indicating the time at which the image of the 1st character string is displayed in the video, and the video are stored in association with each other; a reception unit that receives a character string to be searched; a search unit that searches the database for a 2nd character string containing the character string to be searched, the time information corresponding to the 2nd character string, and the video corresponding to the 2nd character string; and an output unit that outputs a screen including a 1st display area in which the retrieved video is played and a 2nd display area in which the retrieved 2nd character strings and time information are displayed in chronological order. According to this aspect, a technique can be provided that allows the specific portion of a video that the user wishes to view to be found quickly.
Effects of the invention
According to the present invention, it is possible to provide a technique capable of quickly searching for a specific portion in a video that a user desires to view.
Drawings
Fig. 1 is a diagram showing an example of a video distribution system according to an embodiment.
Fig. 2 is a diagram showing an example of the hardware configuration of the distribution server.
Fig. 3 is a diagram showing an example of the function block configuration of the distribution server.
Fig. 4 is a flowchart showing an example of a processing procedure when the lecture data DB is generated.
Fig. 5 is a diagram showing a specific example of processing for extracting an image of a character display region.
Fig. 6 is a diagram showing a process of specifying a keyword from an image of a character unit.
Fig. 7 is a diagram showing an example of the lecture data DB.
Fig. 8 is a diagram showing an example of a screen displayed on the terminal.
Fig. 9 is a diagram showing an example of a screen displayed on the terminal.
Detailed Description
Preferred embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, the same or similar structures are denoted by the same reference numerals.
< System architecture >
Fig. 1 is a diagram showing an example of a video distribution system according to an embodiment. The video distribution system includes a distribution server 10 and a terminal 20. The distribution server 10 and the terminal 20 can communicate with each other via a wireless or wired communication network N. One terminal 20 is illustrated in fig. 1, but a plurality of terminals 20 may be included in the present video distribution system. In the present embodiment, the distribution server 10 and the terminal 20 may be collectively referred to as an information processing apparatus, or only the distribution server 10 may be referred to as an information processing apparatus.
The distribution server 10 is a server that distributes lecture videos and has a function of transmitting the data of a lecture video requested by the terminal 20 to the terminal 20. The distribution server 10 may be one or more physical or virtual servers, or may be a cloud server.
The terminal 20 is a terminal operated by the user; any terminal having a communication function may be used, such as a smartphone, a tablet terminal, a mobile phone, a personal computer (PC), a notebook PC, a personal digital assistant (PDA), or a home game console.
In the present embodiment, by entering a character string to be searched (a search keyword), the user can find the lecture videos in which the image of a character string handwritten on the blackboard by the lecturer (hereinafter, a "handwritten character string") contains that character string. For example, when the user enters "organic compound" as the search target on the search screen of the terminal 20, the lecture videos in which a lecturer has written "organic compound" on the blackboard are listed on the screen of the terminal 20. When the user selects a lecture video to view from the listed videos, that lecture video is played on the screen of the terminal 20, and the times at which the lecturer wrote "organic compound" on the blackboard are listed along the time axis of the video (for example, around 5 minutes 30 seconds, 15 minutes 10 seconds, and 23 minutes 40 seconds in a 30-minute video). When the user selects one of the listed times, playback of the lecture video jumps to the selected time.
To realize this behavior, the distribution server 10 stores, in a database and in association with each other, text information (the 2nd character string) generated by character recognition of the image of a handwritten character string (the 1st character string), time information indicating the time at which the image of the handwritten character string is displayed in the lecture video, and the lecture video (or information uniquely identifying it). More specifically, the time information may indicate the span from when the handwritten character string starts to be displayed to when its display ends in the lecture video (hereinafter, the "appearance time"). In the present embodiment, this database is called the "lecture data DB (Database)". The distribution server 10 can thereby search the character strings written on the blackboard by the lecturer for the character string to be searched, and retrieve the lecture videos that contain it.
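As a concrete illustration, the lecture data DB can be pictured as a single relational table. The following is a minimal sketch in Python; the use of SQLite and the table and column names are assumptions made here for illustration, not details given by the embodiment.

```python
import sqlite3

# Minimal sketch of the lecture data DB (table and column names are
# illustrative assumptions). One row associates a recognized character
# string with its appearance time and the lecture video it occurs in.
conn = sqlite3.connect("lecture_data.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS lecture_data (
        video_id  TEXT,     -- identifier of the lecture video (e.g., file name)
        start_sec INTEGER,  -- time at which the handwritten string appears
        end_sec   INTEGER,  -- time at which its display ends
        text_info TEXT      -- text generated by character recognition
    )
""")
conn.commit()
```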
< Hardware configuration >
Fig. 2 is a diagram showing an example of the hardware configuration of the distribution server 10. The distribution server 10 includes a CPU (Central Processing Unit) 11, a storage device 12 such as a memory, a communication IF (Interface) 13 for wired or wireless communication, an input device 14 that receives input operations, and an output device 15 that outputs information. Each functional unit in the functional block configuration described later can be realized by the CPU 11 executing a program stored in the storage device 12. The program can be stored in, for example, a non-transitory recording medium.
< Functional block configuration >
Fig. 3 is a diagram showing an example of the functional block configuration of the distribution server 10. The distribution server 10 includes a reception unit 101, a search unit 102, an output unit 103, a generation unit 104, and a storage unit 105. The storage unit 105 stores the lecture data DB.
The reception unit 101 has a function of receiving a character string to be searched, which is input by a user on the screen of the terminal 20.
The search unit 102 searches the lecture data DB for the "text information" containing the character string to be searched received by the reception unit 101, the "appearance time" corresponding to that text information, and the "lecture video" corresponding to that text information.
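Under the schema assumed above, this search amounts to a substring match over the stored text information. A sketch follows; the subject filter assumes, as suggested by fig. 7, that the video identifier embeds the subject name.

```python
from typing import Optional

def search_lectures(conn, query: str, subject: Optional[str] = None):
    """Return (video_id, start_sec, end_sec, text_info) rows whose text
    information contains the search string, in chronological order."""
    sql = ("SELECT video_id, start_sec, end_sec, text_info "
           "FROM lecture_data WHERE text_info LIKE ?")
    params = [f"%{query}%"]
    if subject is not None:
        sql += " AND video_id LIKE ?"  # subject is part of the identifier
        params.append(f"%{subject}%")
    return conn.execute(sql + " ORDER BY video_id, start_sec",
                        params).fetchall()
```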
The output unit 103 outputs a screen including an area (1st display area) in which the lecture video retrieved by the search unit 102 is played and an area (2nd display area) in which the retrieved text information and appearance times (time information) are displayed in chronological order. The output screen is displayed on the display of the terminal 20. The output unit 103 may have a web server function and serve a web page for distributing lecture videos to the terminal 20, or it may transmit content for displaying lecture videos and the like on the screen of an application installed in the terminal 20.
The generation unit 104 generates the lecture data DB by performing character recognition on the handwritten character strings displayed in the lecture video. The generation unit 104 includes a region extraction unit 1041, a dividing unit 1042, a single-character recognition engine 1043, a character string recognition engine 1044, and a DB generation unit 1045. The processing performed by each of these is described later.
< Generation of the lecture data DB >
Next, a method of generating the lecture data DB will be described with reference to fig. 4. The following assumes that the generation unit 104 of the distribution server 10 generates the lecture data DB, but the distribution server 10 need not generate it itself; an external information processing apparatus may do so. In that case, the generation unit 104 may be mounted in an information processing apparatus other than the distribution server 10, and the lecture data DB generated by that apparatus may be registered in the storage unit 105 of the distribution server 10.
Fig. 4 is a flowchart showing an example of a processing procedure when the lecture data DB is generated.
In step S101, the region extraction unit 1041 extracts an image (1st image) of a character display region in which a handwritten character string is displayed in the lecture video. It also determines and outputs the span (appearance time) from when the handwritten character string starts to be displayed to when its display ends. If there are a plurality of handwritten character strings, the extraction of the character display region image and the determination of the appearance time are performed for each of them.
A specific example of extracting the image of a character display region and determining the appearance time for one handwritten character string will be described with reference to fig. 5. The region extraction unit 1041 processes the video in which the lecturer lectures while writing on the blackboard (fig. 5(a)) in units of a predetermined number of frames (for example, 80 frames) and extracts the regions that differ from the background. For example, the region extraction unit 1041 outputs, per pixel and per window of the predetermined number of frames, a score (probability) indicating how likely the pixel is to differ from the background image. Through this processing, pixels in regions where characters are written on the blackboard, and pixels in regions where the lecturer appears, receive scores equal to or greater than a predetermined value.
Next, the region extraction unit 1041 extracts the pixels whose score is equal to or greater than the predetermined value. An example of the extracted pixels is shown in fig. 5(b); the extraction portion 500 represents a set of extracted pixels. The region extraction unit 1041 preferably removes the region in which the lecturer appears when extracting the regions that differ from the background. For example, it may extract only pixels (regions) whose score fluctuates by no more than a predetermined threshold over a predetermined period (for example, 10 seconds). This prevents pixels belonging to the lecturer, who moves around in the video, from being extracted as regions differing from the background. Similarly, when the area of an extracted set of pixels is larger than a predetermined value, the region extraction unit 1041 may treat it as a non-extraction target, on the assumption that the lecturer rather than a character string was extracted. The region extraction unit 1041 then determines the span from when a set of pixels appears in the lecture video until it disappears as the appearance time during which the handwritten character string is displayed.
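The scoring and filtering described above could be sketched as follows. The background model, the thresholds, and the window lengths are illustrative assumptions; the embodiment specifies only per-pixel scores, a score threshold, a fluctuation check, and an area check.

```python
import numpy as np

def foreground_scores(frames: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Per-pixel score (0..1) of differing from the background, averaged over
    one window of frames. frames: (N, H, W) grayscale, background: (H, W)."""
    diff = np.abs(frames.astype(np.float32) - background.astype(np.float32))
    return diff.mean(axis=0) / 255.0

def stable_foreground_mask(score_series: np.ndarray,
                           score_thresh: float = 0.2,
                           flutter_thresh: float = 0.05) -> np.ndarray:
    """Keep pixels that score high AND fluctuate little over time, so that
    static board writing is kept while the moving lecturer is rejected.
    score_series: (T, H, W) scores covering, e.g., a 10-second span."""
    high = score_series.mean(axis=0) >= score_thresh
    steady = score_series.std(axis=0) <= flutter_thresh
    return high & steady
```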
Next, the region extraction unit 1041 determines the position (for example, the pixel position of the lower-left corner of the rectangle, with the lower left of the video as the origin) and the size (height and width) of a rectangular frame surrounding the set of pixels. The frame 510 shown in fig. 5(b) is an example of such a rectangular frame.
Next, the region extraction unit 1041 extracts the image of the character display region in which the handwritten character string is displayed by cropping the region surrounded by the rectangular frame from any one of the frames of the lecture video that fall within the appearance time.
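A sketch of the rectangle determination and cropping; grouping the extracted pixels with OpenCV connected components, and the area cut-off for lecturer-sized regions, are assumptions made here:

```python
import cv2
import numpy as np

def extract_character_regions(mask: np.ndarray, frame: np.ndarray,
                              max_area: int = 50_000):
    """Find rectangular frames around clusters of extracted pixels, drop
    clusters too large to be writing (likely the lecturer), and crop each
    remaining rectangle from a frame taken within the appearance time."""
    n, _, stats, _ = cv2.connectedComponentsWithStats(
        mask.astype(np.uint8), connectivity=8)
    regions = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area > max_area:  # treat as a non-extraction target
            continue
        regions.append(((x, y, w, h), frame[y:y + h, x:x + w]))
    return regions
```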
In step S102, the dividing unit 1042 divides the image of the character display region extracted by the region extraction unit 1041 into images (2nd images) of the individual characters that make up the handwritten character string. The dividing unit 1042 binarizes the image of the character display region and splits it into single-character images by treating, for example, a column in which the luminance of every pixel along the vertical axis falls below a prescribed threshold as a break between characters. Fig. 5(c) shows a specific example of the break positions.
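A sketch of this vertical-projection split, assuming a binarized image in which ink pixels are 1:

```python
import numpy as np

def split_characters(binary: np.ndarray):
    """Split a binarized character-region image into single-character images
    at columns that contain no ink (the breaks between characters)."""
    has_ink = binary.sum(axis=0) > 0  # vertical projection per column
    images, start = [], None
    for x, on in enumerate(has_ink):
        if on and start is None:
            start = x                          # a character begins
        elif not on and start is not None:
            images.append(binary[:, start:x])  # a break between characters
            start = None
    if start is not None:
        images.append(binary[:, start:])
    return images
```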
In step S103, the single-character recognition engine 1043 performs character recognition on each single-character image of the handwritten character string and outputs a plurality of candidate characters per image. A specific example is shown in fig. 6. Candidate characters 1 to 5 in fig. 6 are examples of the candidate characters output by recognizing the images of the characters "異", "性", and "体" (the three characters of "異性体", Japanese for "isomer").
If the single-character recognition engine 1043 has sufficiently accurate recognition capability, the process may skip step S104 and store the candidate characters output by the single-character recognition engine 1043 directly in the lecture data DB as text information. For example, in the example of fig. 6, if the engine correctly recognizes the images of "異", "性", and "体", the string "異性体" obtained by concatenating the recognized characters may be stored directly in the lecture data DB as text information.
In step S104, the character string recognition engine 1044 (output unit) generates a plurality of candidate character strings by combining the candidate characters output for each single-character image in the order in which the characters are arranged in the handwritten character string. In the example of fig. 6, 125 (5 × 5 × 5) candidate character strings are generated by combining the five candidate characters for "異", the five for "性", and the five for "体".
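This combination step is a Cartesian product over the per-character candidate lists; a sketch:

```python
from itertools import product

def candidate_strings(per_char_candidates):
    """Combine the candidate characters output for each character image,
    in writing order. Five candidates for each of three characters
    yield 5 * 5 * 5 = 125 candidate strings."""
    return ["".join(chars) for chars in product(*per_char_candidates)]
```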
Here, the character string recognition engine 1044 has learned in advance a plurality of keywords (character strings) that may be used in lecture videos, and has a function that takes an arbitrary character string as input and outputs the keyword determined to be most similar to it, together with a score indicating the degree of similarity. The keywords that may be used in a lecture video are, for example, the keywords listed in the index of a textbook, such as "Yamatai-koku" or "Tokugawa Ieyasu" in the case of a Japanese history lecture video. Since keywords generally differ from subject to subject, character string recognition engines 1044 that have learned different keywords according to the attributes of the lecture video (subject, lecture name, and so on) may be prepared, and step S104 may be performed using the engine that matches the attributes of the lecture video.
Next, the character string recognition engine 1044 outputs, as the text information corresponding to the handwritten character string, the keyword (character string) determined to be most similar to any of the generated candidate character strings among the keywords learned in advance as keywords that may be used in the lecture video. More specifically, for each of the generated candidate character strings, the engine outputs the most similar keyword and its similarity (score), and the keyword with the highest similarity overall is output as the text information corresponding to the handwritten character string.
Fig. 6 shows an example in which the character string recognition engine 1044 computes the similarity between each of the 125 candidate character strings and the learned keywords (in the example of fig. 6, at least "異性体") and outputs the learned keyword "異性体", which has the highest similarity, as the text information corresponding to the handwritten character string. Even when the single-character recognition engine 1043 fails to recognize a character correctly and "異性体" itself is not among the 125 candidate character strings, "異性体" can still be output as the text information, as long as the candidates include a string similar to it (for example, a string in which one of the three characters is misrecognized).
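A sketch of the keyword selection. Here difflib's SequenceMatcher stands in for the learned similarity function, which the embodiment does not specify:

```python
from difflib import SequenceMatcher

def best_keyword(candidates, keywords):
    """Among the keywords that may be used in the video, return the one most
    similar to ANY candidate string, together with its similarity score."""
    best, best_score = None, -1.0
    for cand in candidates:
        for kw in keywords:
            score = SequenceMatcher(None, cand, kw).ratio()
            if score > best_score:
                best, best_score = kw, score
    return best, best_score

# Even if "異性体" itself is missing from the 125 candidates, a near miss
# with one misrecognized character still selects it, e.g.:
# best_keyword(["異性休"], ["異性体"])  ->  ("異性体", 0.67)
```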
The generation unit 104 repeats the processing of steps S101 to S104 for each handwritten character string displayed in the lecture video, thereby determining a keyword and an appearance time for each of the plurality of handwritten character strings displayed in the video.
In step S105, the DB generation unit 1045 generates the lecture data DB by associating the text information output by the character string recognition engine 1044 in step S104, the appearance time output by the region extraction unit 1041 in step S101, and the lecture video being processed (which may be represented by its file name).
Fig. 7 is a diagram showing an example of the lecture data DB. The "lecture video" column stores an identifier that uniquely identifies the lecture video; the identifier includes the subject, the lecture name, and so on, and may be, for example, a file name that includes the subject of the lecture video. The "appearance time" column stores the span from when the handwritten character string appears to when it disappears in the lecture video. The "text information" column stores the text data corresponding to the handwritten character string. In the example of fig. 7, the lecture video "chemistry_1st_structure determination of organic compounds_chapter 1" stores data indicating that "complex ion formation reaction" is displayed from 0 min 05 sec to 3 min 10 sec and "elemental analysis" is displayed from 1 min 20 sec to 3 min 10 sec.
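Continuing the SQLite sketch assumed earlier, step S105 reduces to inserting one row per handwritten character string:

```python
def register_record(conn, video_id, start_sec, end_sec, text_info):
    """Associate the recognized text, its appearance time, and the lecture
    video in the lecture data DB (schema assumed above)."""
    conn.execute(
        "INSERT INTO lecture_data (video_id, start_sec, end_sec, text_info) "
        "VALUES (?, ?, ?, ?)",
        (video_id, start_sec, end_sec, text_info))
    conn.commit()

# e.g., the first row described for fig. 7 (0:05 to 3:10):
# register_record(conn, "chemistry_1st_structure_determination_ch1",
#                 5, 190, "complex ion formation reaction")
```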
< Searching for lecture videos >
Next, the procedure when the user searches lecture videos will be described. Fig. 8 and 9 show examples of screens displayed on the terminal 20. Fig. 8(a) is an example of the screen for searching lecture videos. It provides an input box 1001 for entering the character string to be searched and the subject of the lecture videos to search. When the search button to the right of the input box 1001 is pressed, the search unit 102 accesses the lecture data DB and checks whether any lecture video of the entered subject has text information containing the character string to be searched. If there is such a lecture video, the output unit 103 outputs a screen listing the retrieved lecture videos. The output unit 103 may output the list screen when a plurality of lecture videos are retrieved, and transition directly to the "screen for playing a lecture video" (fig. 9(a)) described later when exactly one is retrieved.
Fig. 8(b) is an example of the screen listing the retrieved lecture videos. The search results are listed in the display area 1003. For example, when the user selects "chemistry" as the subject and enters "ion" as the character string to be searched, the chemistry lecture videos in which a lecturer has written "ion" on the blackboard are listed as search results in the display area 1003.
Next, when the user selects the lecture video he or she wishes to view from those listed in the display area 1003, the screen transitions to the screen for playing that lecture video. Since the display area 1003 both lists the retrieved lecture videos and accepts the selection of the one to view, the screen containing it can also be called a screen for accepting the selection of the lecture video the user wishes to view.
Fig. 9(a) shows an example of the screen for playing a lecture video. It includes: a display area 2001 (1st display area) in which the lecture video is played; a display area 2002 (2nd display area) in which the text information containing the character string to be searched and the start times at which the corresponding handwritten character strings start to be displayed are arranged horizontally in chronological order; and a display area 2004 (3rd display area) that shows character strings previously searched for the subject of the lecture video being played in the display area 2001. A button 2003 for listing the start times and text information is displayed above the display area 2002. When the user presses the button 2003, as shown in fig. 9(b), a display area 2005 (2nd display area) in which the text information and start times are arranged vertically in chronological order is displayed instead of the display area 2002.
In the display areas 2002 and 2005, the word "blackboard writing" is displayed as a message indicating that the search hit is a handwritten character string displayed in the lecture video (that is, that the handwritten character string corresponding to the retrieved text information appears in the video). In addition, the number of pieces of text information containing the character string to be searched is displayed in the display area 2102 above the display areas 2002 and 2005.
The portion of the text information displayed in the display areas 2002 and 2005 that matches the character string to be searched may be highlighted. For example, in figs. 9(a) and 9(b), the "ion" portions of "complex ion formation reaction" and "hydrogen ions" are highlighted.
The display areas 2002 and 2005 may also show the time at which the display of the handwritten character string ends. For example, they may show the appearance time of the handwritten character string as "0:05 to 3:10 complex ion formation reaction".
Information indicating the position at which the handwritten character string corresponding to the retrieved text information is displayed in the lecture video may also be shown in the display area 2001, superimposed on the video. For example, as shown in figs. 9(a) and 9(b), a frame 2101 indicating where the retrieved text information "complex ion formation reaction" appears in the video may be displayed in the display area 2001. To display the frame 2101, information indicating its position and size may be stored for each record in the lecture data DB; this may be the same information as the position and size of the rectangular frame surrounding the extracted set of pixels described in step S101 of fig. 4. The frame 2101 may remain displayed in the display area 2001 for the appearance time corresponding to the retrieved text information.
When the user selects a lecture video in the display area 1003 (fig. 8(b)), playback starts in the display area 2001. When the user then selects a start time and its text information from those displayed in the display area 2002 or 2005, the lecture video in the display area 2001 is played from the selected start time or from a predetermined interval before it (for example, 10 seconds earlier). For example, when the user clicks the entry displayed at 2:15 in the display area 2002, the lecture video is played in the display area 2001 from 2:15 or from a predetermined interval before it (for example, from 2:06).
Alternatively, playback need not start automatically when the user selects the lecture video in the display area 1003 (fig. 8(b)); instead, the user may first start playback by pressing a play button displayed in the display area 2001, or by selecting the start time he or she wishes to view from the start times and text information displayed in the display area 2002 or 2005.
The user may also swipe from right to left (or left to right) in the display area 2002 to display the next (or previous) start time and text information. For example, in fig. 9(a), swiping from right to left in the display area 2002 may cause the text information with start time 0:05 to slide off to the left, the text information with start time 2:15 to move leftward, and the next piece of text information to appear from the right.
Likewise, the next (or previous) start time and text information may be displayed by swiping from top to bottom (or bottom to top) in the display area 2005.
When the text contained in the text information retrieved by the search unit 102 has a predetermined number of characters or more, the output unit 103 may display, in the display area 2002, only a part of that text that includes at least the character string to be searched. Thus, even when the text is too long to display in full in the display area 2002 or 2005, or when the terminal 20 is a smartphone or the like with a small display, the text information can be displayed without significantly sacrificing visibility.
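A sketch of this clipping; the window length is an assumed parameter:

```python
def clip_text(text: str, query: str, max_chars: int = 20) -> str:
    """When the recognized text is long, return a window of it that still
    contains the search string."""
    if len(text) <= max_chars:
        return text
    i = text.find(query)
    if i < 0:
        return text[:max_chars]
    pad = max(0, (max_chars - len(query)) // 2)  # center the match
    start = max(0, i - pad)
    return text[start:start + max_chars]
```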
The previously searched character strings shown in the display area 2004 may be displayed in descending order of how many times they were entered by the users who have searched with the present video distribution system. When the user selects a character string displayed in the display area 2004, the selected string may be entered into the input box 1001 automatically.
The present embodiment has been described above. In the present embodiment, the lecture data DB stores text information obtained by converting the characters the lecturer writes on the blackboard in the lecture video into text, and lecture videos are searched by comparing the character string to be searched against that text information. The present embodiment therefore has the technical effect of increasing the search speed compared with a method that searches for a character string by analyzing the lecture video itself at search time.
In the description above, the appearance time stored in the lecture data DB includes both the time at which the handwritten character string starts to be displayed (the time at which it is written on the blackboard) and the time at which its display ends (for example, the time at which the lecturer erases it with a blackboard eraser); however, only the start time may be stored. This reduces the data volume of the lecture data DB. The start and end times may together be called "time information", or the start time alone may be called "time information".
The description above assumed that the video displaying character strings is a lecture video in which a lecturer lectures while handwriting characters on a blackboard, but the present embodiment is not limited to lecture videos or handwritten characters; it can be applied to any video in which character strings are displayed.
The embodiments described above are intended to facilitate understanding of the present invention and are not to be construed as limiting it. The flowcharts and sequences described in the embodiments, the elements they include, and the arrangement, materials, conditions, shapes, dimensions, and the like of those elements are not limited to the examples given and may be changed as appropriate. Configurations described in different embodiments may be partially substituted or combined with each other.
Claims (9)
1. An information processing apparatus, comprising:
a storage unit that stores a database in which, for a video in which images of a plurality of 1st character strings are displayed, a 2nd character string generated by character recognition of an image of a 1st character string, time information indicating the time at which the image of the 1st character string is displayed in the video, and the video are stored in association with each other;
a reception unit that receives a character string to be searched;
a search unit that searches the database for a 2nd character string containing the character string to be searched, the time information corresponding to the 2nd character string, and the video corresponding to the 2nd character string; and
an output unit that outputs a screen including a 1st display area in which the retrieved video is played and a 2nd display area in which the retrieved 2nd character string and time information are displayed in chronological order.
2. The information processing apparatus according to claim 1, wherein the output unit outputs a screen in which the retrieved 2nd character strings and time information are displayed side by side in the 2nd display area, in chronological order, in the horizontal or vertical direction.
3. The information processing apparatus according to claim 2, wherein the output unit further displays, in the 2nd display area, a message indicating that the image of the 1st character string corresponding to the retrieved 2nd character string is displayed in the video.
4. The information processing apparatus according to any one of claims 1 to 3, wherein the output unit displays, superimposed on the video, information indicating the position in the video at which the image of the 1st character string corresponding to the retrieved 2nd character string is displayed.
5. The information processing apparatus according to any one of claims 1 to 4, wherein the output unit highlights the portion of the 2nd character string displayed in the 2nd display area that matches the character string to be searched.
6. The information processing apparatus according to any one of claims 1 to 5, wherein the video is a video captured while a lecturer gives a lecture using a blackboard, and
the 1st character string is a character string including a plurality of characters handwritten on the blackboard.
7. An information processing apparatus, comprising:
an extraction unit that extracts a 1st image, which is an area of a video in which an image of a 1st character string is displayed, and outputs time information indicating when the image of the 1st character string starts to be displayed in the video;
a dividing unit that divides the 1st image extracted by the extraction unit into 2nd images, one for each character included in the 1st character string;
a character recognition unit that performs character recognition on each of the 2nd images and outputs a plurality of candidate characters for each of the 2nd images;
an output unit that, for a plurality of candidate character strings generated by combining the candidate characters output for the 2nd images in the order in which the characters are arranged in the 1st character string, outputs, as a 2nd character string, the character string determined to be most similar to any of the candidate character strings among a plurality of character strings that may be used in the video; and
a generation unit that generates a database in which the 2nd character string output by the output unit, the time information output by the extraction unit, and the video are associated with each other.
8. A video search method performed by an information processing apparatus having a storage unit that stores a database in which a 2nd character string generated by character recognition of an image of a 1st character string, time information indicating the time at which the image of the 1st character string is displayed in a video, and the video are stored in association with each other, the video search method comprising:
a step of receiving a character string to be searched;
a step of retrieving, from the database, a 2nd character string containing the character string to be searched, the time information corresponding to the 2nd character string, and the video corresponding to the 2nd character string; and
a step of outputting a screen including a 1st display area in which the retrieved video is played and a 2nd display area in which the retrieved 2nd character string and time information are displayed in chronological order.
9. A program for causing a computer to function as:
a storage unit that stores a database in which, for a video in which images of a plurality of 1st character strings are displayed, a 2nd character string generated by character recognition of an image of a 1st character string, time information indicating the time at which the image of the 1st character string is displayed in the video, and the video are stored in association with each other;
a reception unit that receives a character string to be searched;
a search unit that searches the database for a 2nd character string containing the character string to be searched, the time information corresponding to the 2nd character string, and the video corresponding to the 2nd character string; and
an output unit that outputs a screen including a 1st display area in which the retrieved video is played and a 2nd display area in which the retrieved 2nd character string and time information are displayed in chronological order.
Applications Claiming Priority (3)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
JP2018010904A (JP6506427B1) | 2018-01-25 | 2018-01-25 | Information processing apparatus, movie search method, generation method, and program
JP2018-010904 | 2018-01-25 | |
PCT/JP2019/001084 (WO2019146466A1) | 2018-01-25 | 2019-01-16 | Information processing device, moving-image retrieval method, generation method, and program
Publications (1)

Publication Number | Publication Date
---|---
CN111542817A | 2020-08-14
Family ID: 66324237

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201980006824.0A (pending, published as CN111542817A) | Information processing device, video search method, generation method, and program | 2018-01-25 | 2019-01-16
Country Status (3)

Country | Link
---|---
JP | JP6506427B1 (en)
CN | CN111542817A (en)
WO | WO2019146466A1 (en)
Cited By (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
WO2022247935A1 | 2021-05-28 | 2022-12-01 | Vivo Mobile Communication (Hangzhou) Co., Ltd. | Display method and display apparatus
Families Citing this family (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111355999B | 2020-03-16 | 2022-07-01 | Beijing Dajia Internet Information Technology Co., Ltd. | Video playing method and device, terminal equipment and server
JP7515189B2 | 2022-06-16 | 2024-07-12 | Autopedia Co., Ltd. | System and method for determining tire tread wear using deep artificial neural network
Family Cites Families (6)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
JPH05274314A | 1992-03-25 | 1993-10-22 | Canon Inc. | Document processor
JP2959925B2 | 1993-05-25 | 1999-10-06 | Fuji Xerox Co., Ltd. | Image processing device
JP2002099558A | 2000-09-21 | 2002-04-05 | Canon Inc. | Information retrieval system and method, and recording medium
JP4744317B2 | 2006-02-16 | 2011-08-10 | Fujitsu Ltd. | Word search device, word search method, and computer program
JP5845764B2 | 2011-09-21 | 2016-01-20 | Fuji Xerox Co., Ltd. | Information processing apparatus and information processing program
JP6672645B2 | 2015-09-07 | 2020-03-25 | Casio Computer Co., Ltd. | Information terminal device and program
2018
- 2018-01-25: JP application JP2018010904A filed (patent JP6506427B1, status: Active)

2019
- 2019-01-16: CN application CN201980006824.0A filed (publication CN111542817A, status: Pending)
- 2019-01-16: WO application PCT/JP2019/001084 filed (publication WO2019146466A1, status: Application Filing)
Patent Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
JP2007036752A | 2005-07-27 | 2007-02-08 | TDK Corp. | System, method and program for reproducing contents
CN102763104A | 2010-02-26 | 2012-10-31 | Rakuten, Inc. | Information processing device, information processing method, and recording medium that has recorded information processing program
CN102782680A | 2010-02-26 | 2012-11-14 | Rakuten, Inc. | Information processing device, information processing method, and recording medium that has recorded information processing program
CN102572573A | 2010-12-30 | 2012-07-11 | Shanghai Wujie Space Information Technology Co., Ltd. | Method for pushing information according to played content
CN103714338A | 2012-09-28 | 2014-04-09 | Omron Corp. | Image processing device and image processing method
Also Published As

Publication number | Publication date
---|---
WO2019146466A1 | 2019-08-01
JP2019128850A | 2019-08-01
JP6506427B1 | 2019-04-24
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2020-08-14