CN106488257A

CN106488257A - Method and device for generating video file index information

Info

Publication number: CN106488257A
Application number: CN201510537180.1A
Authority: CN
Inventors: 吴贻刚
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2015-08-27
Filing date: 2015-08-27
Publication date: 2017-03-08
Also published as: WO2017032245A1

Abstract

This application discloses a method for generating video file index information, including: obtaining the foreground targets of a video file from the coded sequence generated by decoding the video file; obtaining feature information of the foreground targets in all video frames under the same key frame; matching, according to the feature information, the foreground targets in each pair of adjacent video frames under the same key frame to generate behavior information of the foreground targets under that key frame; and generating the index information of the video file according to the behavior information of the foreground targets under the key frame. The index information in this application is a structured description, and a target video file can be found quickly by following the index.

Description

Method and device for generating video file index information

Technical field

The application relates to the field of computer technology, and in particular to a method and device for generating video file index information.

Background

Video data accounts for the largest share of data on the Internet and contains extremely rich content information; it can be called a "data gold mine". The volume of video data produced every day is huge, but most of it is not analyzed or used further after being saved to a storage system. It is simply deleted on expiry or archived. Typically, data such as video surveillance footage is deleted after a very short time.

To address this problem, in the prior art a video capture device pushes the video stream to a storage server, after which a video analytics server analyzes the video and outputs the analysis result.

In the course of implementing this application, the applicant found that the prior art has at least the following problem:

When video data is analyzed, the resulting description is unstructured, so the video data cannot be put to good use; and because the amount of stored video data is very large, the required video information cannot be found accurately.

Summary of the invention

The purpose of the application is to provide a method and device for generating video file index information. The index information of a video file is generated from the behavior information of foreground targets, which is obtained by analyzing the foreground targets under each key frame. Through this behavior information the server gives the video file a structured description, so that the server can make better use of video files and can find a target video file quickly according to the structured index information.

The technical solution of the application is as follows:

A method for generating video file index information, the method comprising:

The server obtains the foreground targets of a video file from the coded sequence generated by decoding the video file;

The server obtains feature information of the foreground targets in all video frames under the same key frame;

The server matches, according to the feature information, the foreground targets in each pair of adjacent video frames under the same key frame, and generates the behavior information of the foreground targets under that key frame;

The server generates the index information of the video file according to the behavior information of the foreground targets under the key frame.

The server obtaining the foreground targets of the video file from the coded sequence generated by decoding the video file is specifically:

The server determines all key frames contained in the video file;

The server splits the video file into different sub-video files according to all the key frames contained in the video file;

Multiple submodules with a decoding function in the server decode the different sub-video files in a load-balanced manner, and the foreground targets of the video file are obtained from the coded sequences generated by decoding the different sub-video files.

The index information further includes:

The name of the video file, the generation time of each key frame, the storage position of each key frame in the video file, and the number of frames between adjacent key frames;

Wherein the name of the video file is generated by the server from the device number of the video capture device that captured the video file and the capture time of the video file;

The generation time of each key frame, the storage position of each key frame in the video file, and the number of frames between adjacent key frames are obtained by the server analyzing the video file;

The generation time of each key frame, the storage position of each key frame in the video file, and the number of frames between adjacent key frames are obtained as follows:

Multiple submodules with a video analysis function in the server analyze the different sub-video files, which are allocated to them in a load-balanced manner, so that the server obtains the generation time of each key frame in the video file, the storage position of each key frame in the video file, and the number of frames between adjacent key frames.

The feature information includes: physical features, texture features, structural features and mathematical features.

The server matching, according to the feature information, the foreground targets in each pair of adjacent video frames under the same key frame to generate the behavior information of the foreground targets under that key frame is specifically:

The server identifies the same foreground target across all video frames under the same key frame according to the physical features, texture features and structural features;

The server matches the same foreground target in each pair of adjacent video frames under the same key frame according to the mathematical features, and generates the behavior information of the foreground target.

A server, the server comprising:

A first acquisition module, configured to obtain the foreground targets of a video file from the coded sequence generated by decoding the video file;

A second acquisition module, configured to obtain feature information of the foreground targets in all video frames under the same key frame;

A first generation module, configured to match, according to the feature information, the foreground targets in each pair of adjacent video frames under the same key frame and to generate the behavior information of the foreground targets under that key frame;

A second generation module, configured to generate the index information of the video file according to the behavior information of the foreground targets under the key frame.

The first acquisition module is specifically configured to:

Determine all key frames contained in the video file;

Split the video file into different sub-video files according to all the key frames contained in the video file;

Multiple submodules with a decoding function in the first acquisition module decode the different sub-video files in a load-balanced manner, and the foreground targets of the video file are obtained from the coded sequences generated by decoding the different sub-video files.

The index information further includes:

The generation time of each key frame, the storage position of each key frame in the video file, and the number of frames between adjacent key frames, which are obtained by an analysis module in the server analyzing the video file;

The analysis module is specifically configured as follows:

Multiple submodules with a video analysis function in the analysis module analyze the different sub-video files, which are allocated to them in a load-balanced manner, so that the server obtains the generation time of each key frame in the video file, the storage position of each key frame in the video file, and the number of frames between adjacent key frames.

The first generation module is specifically configured to:

Identify the same foreground target across all video frames under the same key frame according to the physical features, texture features and structural features;

Match the same foreground target in each pair of adjacent video frames under the same key frame according to the mathematical features, and generate the behavior information of the foreground target.

In this application, the index information of a video file is generated from the behavior information of foreground targets, which is obtained by analyzing the foreground targets under each key frame. Through this behavior information the server gives the video file a structured description, so that the server can make better use of video files and can find a target video file quickly according to the structured index information.

Brief description of the drawings

To explain the technical solutions of the application or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are only some embodiments of the application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for generating video file index information in an embodiment of the application;

Fig. 2 is a structural diagram of a server in an embodiment of the application.

Detailed description

The technical solutions of the application are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the application fall within the scope of protection of the application.

A method for generating video file index information, as shown in Fig. 1, includes the following steps:

Step 101: the server obtains the foreground targets of the video file from the coded sequence generated by decoding the video file.

Specifically, to process the video file quickly, the video file is split into multiple sub-video files so that the server can process the sub-video files simultaneously. A video file consists of video frames, which include multiple key frames, and each key frame has a number of subframes (non-key frames) under it. A key frame is the frame in which a key action in the motion or change of a character or object occurs. Key frames are identified from the frame header of each video frame: the header carries an identification value, from which it can be judged whether the video frame is a key frame. The subframes under a key frame continue the content of that key frame and control its duration; the more subframes, the longer the duration. Different key frames in a video may present different content, that is, a video consists of several parts, and the key frames distinguish the content of the different parts of a video file, which makes it possible to quickly locate a specific sub-video file within the video file. After determining the key frames in the video file, the server therefore splits the video file. Specifically, starting from the 1st key frame, the video file can be split into multiple sub-video files according to the GOP size (GOP: Group of Pictures, a group of consecutive pictures).
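The splitting described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Frame` type and its `is_key` flag are hypothetical stand-ins for a decoded frame header and its identification value, and each sub-video is assumed to be one key frame plus the subframes that follow it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    index: int     # position of the frame in the video file
    is_key: bool   # hypothetical flag derived from the frame header's identification value


def split_by_key_frames(frames: List[Frame]) -> List[List[Frame]]:
    """Split a frame sequence into sub-videos, each starting at a key frame."""
    groups: List[List[Frame]] = []
    for frame in frames:
        if frame.is_key:
            groups.append([frame])       # a key frame opens a new sub-video
        elif groups:
            groups[-1].append(frame)     # subframes extend the current sub-video
        # frames before the 1st key frame are skipped, since splitting
        # starts from the 1st key frame
    return groups


# Ten frames with a key frame every fourth frame (an assumed GOP size of 4).
frames = [Frame(i, is_key=(i % 4 == 0)) for i in range(10)]
sub_videos = split_by_key_frames(frames)
```

Each resulting group can then be handed to a separate decoder, which is what makes the parallel processing in the following paragraph possible.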

The server has multiple submodules with a decoding function. These submodules decode the different sub-video files in a load-balanced manner, and the foreground targets of the video file are obtained from the coded sequences generated by decoding the sub-video files, so that the video file can be processed quickly. The coded sequence generated by decoding is one that can be shown on a display device (a television or computer). Specifically, the coded sequence may be in YUV format (luma and chroma, a color encoding method used to optimize the transmission of color video signals) or in RGB format (Red Green Blue, a color encoding method used to display color images on a display device). After a sub-video file is decoded into a coded sequence, a background modeling algorithm extracts the background of the sub-video file from the coded sequence, and the interference in the foreground targets is then filtered out. The foreground targets may include people, vehicles and other objects, that is, the things the video mainly presents.
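The application does not name a particular background modeling algorithm. As one possible illustration, the sketch below uses a running-average background model with a fixed difference threshold; the model, the `alpha` and `threshold` parameters, and the grayscale list-of-rows image representation are all assumptions made for this example.

```python
from typing import List

Image = List[List[float]]  # grayscale frame as rows of pixel intensities


def update_background(background: Image, frame: Image, alpha: float = 0.5) -> Image:
    """Blend the new frame into the background model (running average)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def foreground_mask(background: Image, frame: Image,
                    threshold: float = 30.0) -> List[List[bool]]:
    """Mark pixels that differ from the background by more than the threshold."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


background = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
frame = [[10.0, 200.0, 10.0], [10.0, 10.0, 12.0]]

# Only the bright pixel counts as foreground; the small change (10 -> 12)
# falls below the threshold and is treated as interference.
mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

The threshold plays the role of the interference filter mentioned above: small fluctuations are kept out of the foreground mask.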

Step 102: the server obtains feature information of the foreground targets in all video frames under the same key frame.

The feature information includes physical features, texture features, structural features and mathematical features. The physical features describe the size of the foreground target in the picture; the texture features describe the color of the foreground target; the structural features describe the shape and construction of each part of the foreground target; and the mathematical features describe the motion vector of the foreground target and the number of physical, texture and structural features.

What a video file mainly presents is its foreground targets, so a specific description of the video file can be obtained by analyzing them. Analyzing the foreground targets in a video file requires the feature information of the same foreground target in the different video frames of the sub-video file under the same key frame. Analyzing the feature information of the same foreground target across different video frames completes the analysis of the foreground targets in the sub-video file. The feature information may specifically be corner feature information, SIFT (scale-invariant feature transform) feature information, or other user-defined feature information.

Step 103: the server matches, according to the feature information, the foreground targets in each pair of adjacent video frames under the same key frame, and generates the behavior information of the foreground targets under that key frame.

Specifically, the physical features, texture features and structural features of the same foreground target are identical across the different video frames under the same key frame, so the server can use them to identify the same foreground target in those frames. Once the same foreground target has been identified, its behavior information can be determined by comparing its mathematical features in each pair of adjacent video frames under the same key frame. The behavior information may specifically be the motion vector of the foreground target (including displacement and direction of movement), whether the foreground target has disappeared, or whether the foreground target is newly added. The behavior information of the foreground targets serves as the structured description of the key frame in the video file.
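The matching in Step 103 can be sketched as below. This is a simplified illustration under stated assumptions: the `Target` type is hypothetical, the `appearance` tuple stands in for the physical, texture and structural features, identity is decided by exact equality of that tuple (a real system would compare feature vectors with some tolerance), and the behavior labels `moved`, `new` and `disappeared` are names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Appearance = Tuple[str, str, str]  # (size class, color, shape)


@dataclass(frozen=True)
class Target:
    appearance: Appearance     # stand-in for physical, texture and structural features
    position: Tuple[int, int]  # used to derive the motion vector


def match_adjacent_frames(prev: List[Target], curr: List[Target]) -> Dict[Appearance, dict]:
    """Derive behavior information for targets in two adjacent video frames."""
    prev_by_app = {t.appearance: t for t in prev}
    curr_by_app = {t.appearance: t for t in curr}
    result: Dict[Appearance, dict] = {}
    for app, target in curr_by_app.items():
        if app in prev_by_app:
            old = prev_by_app[app]
            vector = (target.position[0] - old.position[0],
                      target.position[1] - old.position[1])
            result[app] = {"behavior": "moved", "vector": vector}
        else:
            result[app] = {"behavior": "new"}          # newly added target
    for app in prev_by_app:
        if app not in curr_by_app:
            result[app] = {"behavior": "disappeared"}  # target left the picture
    return result


prev_frame = [Target(("large", "red", "car"), (10, 5)),
              Target(("small", "blue", "person"), (3, 3))]
next_frame = [Target(("small", "blue", "person"), (5, 3)),
              Target(("small", "green", "bicycle"), (0, 0))]
info = match_adjacent_frames(prev_frame, next_frame)
```

Running the same comparison over every pair of adjacent frames under a key frame yields the per-key-frame behavior information that Step 104 writes into the index.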

Step 104: the server generates the index information of the video file according to the behavior information of the foreground targets under the key frame.

The index information further includes:

The name of the video file, the generation time of each key frame, the storage position of each key frame in the video file, and the number of frames between adjacent key frames.

The name of the video file is generated by the server from the device number of the video capture device that captured the video file and the capture time of the video file.

The generation time of each key frame, the storage position of each key frame in the video file, and the number of frames between adjacent key frames are obtained by the server analyzing the video file.

After the server has split the video file into different sub-video files according to all the key frames it contains, multiple submodules with a video analysis function in the server analyze the different sub-video files, which are allocated to them in a load-balanced manner, so that the server obtains the generation time of each key frame in the video file, the storage position of each key frame in the video file, and the number of frames between adjacent key frames.

Specifically, naming a captured video file by the device number of the video capture device and the capture time of the video file not only distinguishes different video files but also makes it possible to look up the video files that a specific capture device recorded at a specific time. For example, when reviewing surveillance video a user often needs to find the video file that a particular monitoring device recorded at a particular time, and can find it directly by the file name. The generation time of each key frame in the sub-video files, the storage position of each key frame in the video file, and the number of frames between adjacent key frames are obtained by having multiple submodules with a video analysis function process multiple sub-video files together, which shortens the time needed to obtain this information for the whole video file.
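The application does not specify the exact layout of the file name. The sketch below assumes one plausible scheme, `<device number>_<capture time>`, with a compact timestamp; the function names, the separator and the timestamp format are all assumptions made for this example.

```python
from datetime import datetime
from typing import Tuple

STAMP_FORMAT = "%Y%m%dT%H%M%S"  # assumed timestamp layout


def video_file_name(device_id: str, captured_at: datetime) -> str:
    """Build a file name from the capture device number and the capture time."""
    return f"{device_id}_{captured_at.strftime(STAMP_FORMAT)}"


def parse_video_file_name(name: str) -> Tuple[str, datetime]:
    """Recover the device number and capture time from a file name."""
    # rpartition splits on the last underscore, so device numbers that
    # themselves contain underscores are still parsed correctly
    device_id, _, stamp = name.rpartition("_")
    return device_id, datetime.strptime(stamp, STAMP_FORMAT)


name = video_file_name("cam0042", datetime(2015, 8, 27, 9, 30, 0))
device, when = parse_video_file_name(name)
```

Because both components can be recovered from the name, a lookup by device and time range needs no access to the video content itself.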

The index information contains a structured description of each key frame in the video file, that is, a detailed description of each part of the video file. A specific video file can be found by the video file name in the index information; the main content presented under different key frames can be determined from the behavior information of the foreground targets under each key frame; and the sub-video file corresponding to a target key frame can be located within a video file by the generation time of the key frame, its storage position in the video file, and the number of frames between adjacent key frames. For example, suppose a user wants to find, in the video that a certain monitoring device produced at a certain time, the footage in which a certain foreground target disappears. The server finds the specific video file by the number of the monitoring device and the time the video was produced, then determines the corresponding key frame from the behavior information of the foreground targets, and finally determines the specific content the user is looking for from the generation time of that key frame, its storage position in the video file, and the number of frames between adjacent key frames. In other words, the index information identifies the content of each specific part of the video file, which spares the user from scanning through the video file bit by bit after finding it.
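One way to picture the index information is as a small record per key frame. The sketch below is illustrative only: the `KeyFrameEntry` and `VideoIndex` types, their field names, and the string behavior labels are hypothetical, and the application does not prescribe a concrete data layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyFrameEntry:
    generated_at: float   # generation time, here in seconds from the start of the file
    byte_offset: int      # storage position of the key frame in the video file
    frames_to_next: int   # number of frames between this key frame and the next
    behaviors: List[str] = field(default_factory=list)  # behavior information labels


@dataclass
class VideoIndex:
    file_name: str
    key_frames: List[KeyFrameEntry]


def find_key_frames(index: VideoIndex, behavior: str) -> List[KeyFrameEntry]:
    """Return the key frames under which the given behavior was recorded."""
    return [kf for kf in index.key_frames if behavior in kf.behaviors]


index = VideoIndex(
    file_name="cam0042_20150827T093000",
    key_frames=[
        KeyFrameEntry(0.0, 0, 25, ["moved"]),
        KeyFrameEntry(1.0, 40960, 25, ["moved", "disappeared"]),
    ],
)
hits = find_key_frames(index, "disappeared")
```

In the example from the text, the query for a disappearing target returns the second entry, whose storage position and frame interval tell a player where to seek.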

After the index information of the video file is determined, it is converted into a table structure so that the server can prepare data for big data query and analysis. When the server provides a retrieval service to the user and receives a search instruction, it searches the table content according to the user instruction. If the required video file is not found, the server uses a machine learning algorithm to classify and identify using the stored feature information of the foreground targets; if a corresponding video file exists, the specified video file is output, otherwise the retrieval service fails. When the server provides a statistics service, it first calls the retrieval service to look up the video files to be counted. If there is no corresponding video file, a clustering algorithm is used to cluster a certain class of video files; if video files are clustered, the specified video files are output, otherwise the statistics service fails. When the server provides a prediction service, it first calls the retrieval service and the statistics service, performs trend analysis on the historical video files, and gives a prediction result.
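The retrieval flow with its fallback can be sketched as follows. This is only an outline of the control flow: the `retrieve` function, the dictionary used as the table, and the stub classifier are hypothetical, and the stub merely stands in for the machine learning classification step, whose algorithm the application does not specify.

```python
from typing import Callable, Dict, List, Optional


def retrieve(query: str,
             table: Dict[str, List[str]],
             classify: Callable[[str], List[str]]) -> Optional[List[str]]:
    """Look up the table first; fall back to classification; None means failure."""
    hits = table.get(query)
    if hits:
        return hits
    hits = classify(query)   # fallback on stored foreground-target features
    return hits or None      # an empty result means the retrieval service failed


def stub_classifier(query: str) -> List[str]:
    # Placeholder for a trained classifier over foreground-target features.
    return ["cam02_b"] if query == "bicycle" else []


table = {"car": ["cam01_a"]}
direct = retrieve("car", table, stub_classifier)
fallback = retrieve("bicycle", table, stub_classifier)
failed = retrieve("boat", table, stub_classifier)
```

The statistics service described above follows the same shape, with clustering taking the place of classification as the fallback.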

After the index information is determined, the method further includes:

The server obtains the video coding type of the video file;

The server stores the video coding type in the index information of the video file, so that after finding the specific location of the target video the server can select the corresponding player according to the video coding type.

In this application, the index information of the corresponding key frame is generated from the behavior information of foreground targets, which is obtained by analyzing the foreground targets under the key frame. Through this behavior information the server gives the video file a structured description, so that the server can actually make use of video files and can find a target video file quickly according to the structured index information of the key frames.

The technical solution provided by the embodiments of the application is described in detail below with reference to a specific application scenario. In this embodiment the solution is applied in a cloud computing scenario, and the server includes a storage cluster and a computing cluster in the cloud, as follows:

After receiving a video file, the storage cluster names it according to the device number of the video capture device and the capture time of the video file, and stores it under a directory determined by the capture device; that is, video files captured by the same device are stored under the same directory.

The storage cluster analyzes the video coding type of each stored video file and obtains its video coding type, which is used to determine the corresponding player for the video file.

Starting from the 1st key frame of the video file, the storage cluster divides the video file into multiple small sub-video files according to the GOP size; that is, starting from the first key frame, the video file is divided into multiple sub-video files according to its key frames, and each sub-video file contains multiple video frames.

The storage cluster pushes the resulting sub-video files to different computing devices in the computing cluster in a load-balanced manner, so that the different computing devices perform video frame analysis and decoding on the sub-video files.

The different computing devices in the computing cluster perform video frame type analysis on the sub-video files and obtain the generation time of each key frame in the video file, the storage position of each key frame in the video file, and the number of frames between adjacent key frames.

The different computing devices in the computing cluster separate foreground from background in the decoded sub-video files. Specifically, a background modeling algorithm isolates the background of the video picture, and the interference in the foreground targets is filtered out.

Suppose the video file has 3 key frames, key frame 1, key frame 2 and key frame 3, which correspond to sub-video file 1, sub-video file 2 and sub-video file 3 respectively. Computing device 1 in the computing cluster processes sub-video file 1, computing device 2 processes sub-video file 2, and computing device 3 processes sub-video file 3.
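The application does not state which load balancing policy is used. As an illustration, the sketch below assigns each sub-video file to the currently least-loaded computing device, using frame count as the load measure; the policy, the `assign` function and the device and file names are assumptions made for this example.

```python
import heapq
from typing import Dict, List


def assign(sub_videos: Dict[str, int], devices: List[str]) -> Dict[str, List[str]]:
    """Assign sub-video files (name -> frame count) to the least-loaded device."""
    heap = [(0, device) for device in devices]  # (current load, device name)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {device: [] for device in devices}
    # Place the largest sub-videos first so the loads even out
    for name, frame_count in sorted(sub_videos.items(), key=lambda kv: -kv[1]):
        load, device = heapq.heappop(heap)
        plan[device].append(name)
        heapq.heappush(heap, (load + frame_count, device))
    return plan


plan = assign(
    {"sub_video_1": 30, "sub_video_2": 30, "sub_video_3": 30},
    ["device_1", "device_2", "device_3"],
)
```

With three equally sized sub-video files and three devices, this reproduces the one-to-one assignment of the example above; with uneven sizes or fewer devices, the heap keeps the total frame count per device roughly level.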

Computing device 1 in the computing cluster obtains the feature information of the foreground targets in all video frames of sub-video file 1, identifies the same foreground target across all video frames of sub-video file 1 according to the physical features, texture features and structural features in the feature information, and then determines the behavior information of all foreground targets according to the mathematical features in the feature information. For example, the Nth video frame has 3 foreground targets and the (N+1)th video frame has 4. The foreground targets in the Nth frame are matched with those in the (N+1)th frame according to their feature information; the behavior information of one foreground target is "newly added", and the behavior information of the other 3 foreground targets can also be determined. The behavior information of each foreground target is sent to the storage cluster. The behavior information of each foreground target has no fewer than 5 feature descriptions, and the feature vector of each feature description has no fewer than 128 dimensions. The behavior information of the foreground targets in sub-video files 2 and 3 is determined in the same way.

The storage cluster generates the index information of the video file from the name of the video file, the video coding type, the generation time of each key frame in the video file, the storage position of each key frame in the video file, the number of frames between adjacent key frames, and the behavior information of the foreground targets. Specifically, the index information contains not only the name of the video file but also, for the key frames corresponding to sub-video files 1, 2 and 3, the generation time, the storage position in the video file and the number of frames between adjacent key frames, together with the behavior information of all foreground targets in sub-video files 1, 2 and 3 and the coding information of the video file. With this index information the storage cluster can not only find the target video file containing the content specified by the user, but also find the specific position of that content within the target video file.

The index information records not only all foreground targets in the video file, but also the position, the absolute size in the video picture, and the direction of motion of every foreground target.

The cloud computing system further includes:

Data import and conversion module: formats the video file index information saved in the storage cluster and imports it into the big data analysis cluster, converting it into a table structure to facilitate big data query and analysis.

Machine learning module: learns from, analyzes and mines the video data. It mainly provides supervised learning and unsupervised learning algorithms and serves as the analysis engine for the external service module.

External service module: the external interface, which provides users with services such as retrieval of video file data, statistics on video file content, and trend analysis based on video file content.

Specifically, after the index information of the video file is determined, it is converted into a table structure so that the cloud computing system can prepare data for big data query and analysis. When the cloud computing system provides a retrieval service to the user and receives a search instruction, it searches the table content according to the user instruction. If the required video file is not found, the cloud computing system uses a machine learning algorithm to classify and identify using the stored feature information of the foreground targets; if a corresponding video file exists, the specified video file is output, otherwise the retrieval service fails. When the cloud computing system provides a statistics service, it first calls the retrieval service to look up the video files to be counted. If there is no corresponding video file, a clustering algorithm is used to cluster a certain class of video files; if video files are clustered, the specified video files are output, otherwise the statistics service fails. When the cloud computing system provides a prediction service, it first calls the retrieval service and the statistics service, performs trend analysis on the historical video files, and gives a prediction result.

Based on the same inventive concept as the above method, the application also proposes a server. As shown in Fig. 2, the server includes:

A first acquisition module 21, configured to obtain the foreground targets of a video file from the coded sequence generated by decoding the video file;

A second acquisition module 22, configured to obtain feature information of the foreground targets in all video frames under the same key frame;

A first generation module 23, configured to match, according to the feature information, the foreground targets in each pair of adjacent video frames under the same key frame and to generate the behavior information of the foreground targets under that key frame;

A second generation module 24, configured to generate the index information of the video file according to the behavior information of the foreground targets under the key frame;

Described first acquisition module, specifically for：

Determine all key frames comprising in described video file；

The index information further includes:

The analysis module is specifically configured to:

The first generation module is specifically configured to:

From the description of the above embodiments, those skilled in the art can clearly understand that the application may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the essence of the technical solution of the application, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a terminal device (which may be a mobile phone, a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the application.

The above is only the preferred implementation of the application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the application, and these improvements and refinements shall also be regarded as falling within the protection scope of the application.

Those skilled in the art will understand that the modules of the device in an embodiment may be distributed in the device of the embodiment as described, or may be changed accordingly and located in one or more devices different from those of this embodiment. The modules of the above embodiments may be integrated together or deployed separately; they may be merged into one module or further split into multiple submodules. The serial numbers of the above embodiments of the application are for description only and do not indicate the relative merits of the embodiments.

The above discloses only several specific embodiments of the application; however, the application is not limited thereto, and any change that a person skilled in the art can conceive of shall fall within the protection scope of the application.

Claims

1. A method for generating video file index information, characterized in that the method includes:

2. The method of claim 1, characterized in that the server obtaining the foreground target of the video file according to the coded sequence generated after decoding the video file is specifically:

3. The method of claim 2, characterized in that the index information further includes:

4. The method of claim 1, characterized in that the characteristic information includes: physical features, texture features, structural features, and mathematical features.

5. The method of claim 4, characterized in that the server matching, according to the characteristic information, the foreground targets in every two adjacent video frames under the same key frame to generate the behavior information of the foreground target under the key frame is specifically:

6. A server, characterized in that the server includes:

A first acquisition module, configured to obtain the foreground target of a video file according to the coded sequence generated after decoding the video file;

7. The server of claim 6, characterized in that the first acquisition module is specifically configured to:

Determine all key frames contained in the video file;

8. The server of claim 7, characterized in that the index information further includes:

The analysis module is specifically configured to:

9. The server of claim 6, characterized in that the characteristic information includes: physical features, texture features, structural features, and mathematical features.

10. The server of claim 9, characterized in that the first generation module is specifically configured to: