CN101281520A

CN101281520A - Interactive sports video retrieval method based on unsupervised learning and semantic matching features

Info

Publication number: CN101281520A
Application number: CNA2007100651801A
Authority: CN
Inventors: 胡卫明; 李华北
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2007-04-05
Filing date: 2007-04-05
Publication date: 2008-10-08
Anticipated expiration: 2027-04-05
Also published as: CN101281520B

Abstract

The invention discloses an interactive video retrieval method based on unsupervised learning and semantic matching features. The method comprises: extracting low-level image features and model matching sequence features at the image-frame level of a video database; extracting semantic matching features from the low-level image features at the high-level semantic level; performing unsupervised learning on the extracted model matching sequence features and semantic matching features to establish retrieval based on unsupervised learning alongside direct retrieval; and forming an interactive interface through relevance feedback. By integrating the mid-level features and high-level features of video with an unsupervised retrieval mechanism and an interaction mechanism, the invention forms a new, complete video retrieval system. It measures the spatio-temporal sequence information of video objects accurately and therefore retrieves better, develops a semantic understanding of sports video topics, reduces the online computational complexity and retrieval time of the system, and uses the interactive interface to improve retrieval performance substantially.

Description

Interactive sports video retrieval method based on unsupervised learning and semantic matching features

Technical field

The present invention relates to the field of computer application technology, and in particular to multimedia retrieval technology.

Background technology

With the rapid development of multimedia technology and computer networks, multimedia data worldwide, including digital images, audio and video, is growing at an astonishing rate. Thousands of megabytes of multimedia data are generated every day and, because they are distributed haphazardly, the useful information they contain is swamped as if by a flood. Faced with such abundant multimedia resources scattered across the world, how users can exploit information and Internet technology to locate, conveniently obtain and effectively manage the resources they need has become a pressing problem, and it has made multimedia retrieval an increasingly active research field.

Content-based multimedia retrieval means analyzing and understanding by computer the physical and semantic content of multimedia data so that users can query it conveniently. In essence it structures unordered multimedia data streams and extracts semantic information so that multimedia content can be retrieved quickly. Content-based video retrieval and content-based image retrieval are the two most important branches of multimedia retrieval. In recent years, with the rapid development of coding, computer media processing and network transmission technology, users can query, view and produce rich video data in real time over the high-speed Internet, such as films, animation, news and sports programs, and can use computers to process video data streams automatically. Video has become one of the main ways in which people transmit and obtain information. In essence, video is a continuous data stream composed of a continuous sequence of image frames, a three-dimensional object formed by two-dimensional digital images and a time dimension. Its main characteristics are: (1) video data contains more visual and semantic information than images; (2) the data volume of video is huge; (3) video has a low degree of abstraction and structure, which makes effective management and retrieval very difficult. Content-based video retrieval has therefore become one of the most popular research topics in computer application technology and pattern recognition, with very broad application prospects.

Video data is in essence an image sequence, and the video features used in video retrieval are built from the low-level features of image frames; the retrieval mechanisms and ideas used in image retrieval have likewise laid a solid foundation for the development of video retrieval. The basic techniques of image retrieval are therefore an indispensable link. Content-based image retrieval (CBIR) refers to retrieval techniques that query image information directly by image content. Its main idea is to analyze image information according to low-level image features such as color, texture, shape and the spatial relationships of objects, and to build feature vectors of images as an index. The user generally provides a sample image when searching; the system extracts the corresponding feature vector of that sample image, compares it with the feature vectors of all retrievable objects in the database, and returns to the user the images whose features are similar to the sample.

Content-based video retrieval relies mainly on the visual features and spatio-temporal features of video. The usual retrieval mode is query by video example: the user submits a video example, and the retrieval system returns the similar videos the user needs from a large-scale video database. Video data is a three-dimensional object formed by two-dimensional space and time, and defining a similarity measure between videos is difficult. The following key problems need to be solved:

(1) a video is not a simple set of frames but a hierarchical structure of scene, group, shot and key frame; the level at which similarity between videos is measured must be decided before videos can be compared;

(2) the visual features of key frames are the basis of the visual features of the whole video, but every video has a considerable number of key frames; for a large-scale video database, both the storage needed for the visual features of every key frame of every video and the number of pairwise comparisons are very large;

(3) whether two videos are similar is a very complicated question on which different users hold different views, and human subjective factors are involved; a reasonable video comparison algorithm must take as many of these factors into account as possible.

Video retrieval systems based on the shot centroid vector introduce the new concept of the shot centroid vector: similarity is computed at the shot level and then aggregated into similarity at the video level. By exploiting the data redundancy between key frames and sacrificing some of the video's spatio-temporal information, they greatly reduce the storage required for key-frame features, simplify the system, and realize a basic method of content-based video retrieval.

The "iARM" system uses a model-based method to model the spatio-temporal sequence information of a video accurately and maps object content onto models generated in advance. The system emphasizes accurate modeling of the video's spatio-temporal information, so its relevance feedback analysis needs only a limited number of feedback rounds and few training samples to achieve short retrieval times and good retrieval performance.

In addition to the above, graph-theoretic unsupervised learning algorithms, relevance feedback techniques for image retrieval, and information embedding techniques based on user feedback all lay a foundation for the present invention.

Summary of the invention

The objectives of the invention are: to propose new mid-level and high-level video features that reflect the spatio-temporal sequence information and the semantic topic of a video; to establish a new retrieval mechanism based on unsupervised learning that reduces the online computational complexity of similarity calculation and shortens retrieval time; and to build a new interactive retrieval interface that optimizes the query vector online, corrects semantic labels online, improves retrieval performance and expands the database. To this end, the invention provides an interactive sports video retrieval method based on unsupervised learning and semantic matching features.

To achieve the above objectives, the invention provides an interactive sports video retrieval method based on unsupervised learning and semantic matching features, comprising the following steps:

Step 1: extract low-level image features at the image-frame level of the video database;

Step 2: extract model matching sequence features from the low-level image features at the video-sequence level;

Step 3: extract semantic matching features from the low-level image features at the high-level semantic level;

Step 4: perform unsupervised learning on the extracted model matching sequence features and semantic matching features, and establish a retrieval mechanism based on unsupervised learning;

Step 5: form an interactive retrieval interface through relevance feedback techniques to optimize retrieval performance.

According to an embodiment of the invention, the model matching sequence features comprise the weighted T-Bin histogram and the model matching correlogram, which reflect the spatio-temporal sequence information of a video object. In the weighted T-Bin histogram, each dimension represents the frequency with which the video object references a given model, and the weights reflect the relative importance of the different model matching sequences.

According to an embodiment of the invention, extraction of the model matching sequence features comprises the following steps:

Step 21: treat the entire database as a set of image frames, down-sample the whole image-frame database to obtain sample frames, and arrange the low-level feature vectors of the sample frames in matrix form to generate the training set;

Step 22: learn the model set with a competitive learning algorithm;

Step 23: for each frame of a video object, find the N best matching models in the model set, so that the image frame sequence of the video generates N best matching model sequences;

Step 24: extract the weighted T-Bin histogram and the model matching correlogram from the N best matching model sequences.

According to an embodiment of the invention, the model matching correlogram is defined as follows. Let S be the best semantic matching sequence of a given video object, m_1 and m_2 ∈ S any two sequence members, MS the model set, Num_MS the number of models in the model set, and D the pixel span. The model matching correlogram of the video object is then a vector of dimension Num_MS × D. For the i-th model Model_i ∈ MS and a pixel span k ∈ D, the ((i-1) × D + k)-th dimension of the model matching correlogram is as described below:

Its physical meaning is as follows: for the model matching sequence S of a video object and any sequence member that uses Model_i, MMC^{(k)}_{Model_i}(S) gives the probability that a sequence member at a pixel span of k from it also uses Model_i. The MMC thus describes both the model reference frequency and the order information of the video object.
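The defining formula itself is not reproduced in this text. A reconstruction consistent with the stated physical meaning, assuming the usual autocorrelogram form and writing S_{Model_i} for the members of S that use Model_i (this notation is an assumption, not the patent's), might read:

```latex
MMC^{(k)}_{Model\_i}(S) =
  \Pr_{m_1 \in S_{Model\_i},\; m_2 \in S}
  \left[\, m_2 \in S_{Model\_i} \;\middle|\; |m_1 - m_2| = k \,\right]
```

Here |m_1 - m_2| is read as the span between the positions of the two members in the sequence; the original formula may differ in notation or normalization.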

According to an embodiment of the invention, extraction of the semantic matching features comprises the following steps:

Step 2a: select representative labeled video objects to form a small-scale training set which, within the descriptive power of the current low-level features, covers the sports topics contained in the database;

Step 2b: using the training set, map the N best model matching sequences further onto the semantic label layer to obtain N best label matching sequences;

Step 2c: perform histogram extraction and weighting on the N best label matching sequences to obtain the high-level semantic feature of the video object, namely the semantic matching histogram.

According to an embodiment of the invention, the retrieval mechanism based on unsupervised learning comprises: applying the dominant set clustering algorithm to the video database as unsupervised learning, so that most similarity calculations become offline operations, and using a consistency function to measure the quality of each generated dominant set and to limit the total number of dominant sets. The specific steps are:

Step 31: treat the video database as an undirected edge-weighted graph in which each video object is a node; using the model matching correlogram or the semantic matching histogram, compute the similarity of any two videos as the weight of that node pair, and generate the full similarity matrix A;

Step 32: apply the dominant set clustering algorithm, taking the index set of the nonzero components of the locally optimal solution as a dominant set;

Step 33: delete the nodes belonging to existing dominant sets from the current graph, and repeat the above steps until the graph has no nodes left.

According to an embodiment of the invention, the interactive retrieval interface applies relevance feedback at the video-sequence level and the semantic level, as follows: the optimum query relevance feedback technique uses human-computer interaction to help the computer understand the user's needs and obtain an optimized query vector, and is suited to the direct retrieval mechanism; the relational matrix relevance feedback adjusts the relationships between data clusters and captures the global semantic relationships among them, and is suited to the retrieval mechanism based on unsupervised learning; the missed suppression relevance feedback technique performs online missed suppression on data objects and expands the database, and is suited to retrieval using the semantic matching histogram.

According to an embodiment of the invention, the optimum query relevance feedback technique is as follows: after the user has marked relevant and irrelevant videos in the system's initial output, the query vector is optimized as:

f_{q}' = W_{q} \times f_{q} + W_{R} \times \left( \frac{1}{N_{R}} \sum f_{R} \right) - W_{I} \times \left( \frac{1}{N_{I}} \sum f_{I} \right)

where f_q is the original query vector, f_R and f_I are the relevant and irrelevant videos marked by the user, N_R and N_I are their respective numbers, f_q' is the optimized query vector, and W_q, W_R and W_I are constant coefficients.
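The query update above can be sketched in a few lines of Python. This is only an illustration: the function name `optimise_query` and the default coefficient values are assumptions, since the text says only that W_q, W_R and W_I are constants.

```python
def optimise_query(f_q, relevant, irrelevant, w_q=1.0, w_r=0.75, w_i=0.25):
    """Return f_q' = W_q*f_q + W_R*mean(relevant) - W_I*mean(irrelevant).

    The default weights are illustrative assumptions, not values from the text.
    """
    def mean(vectors):
        # An empty feedback set contributes a zero vector.
        if not vectors:
            return [0.0] * len(f_q)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    mean_r, mean_i = mean(relevant), mean(irrelevant)
    return [w_q * q + w_r * r - w_i * i for q, r, i in zip(f_q, mean_r, mean_i)]


new_query = optimise_query([1.0, 0.0], [[0.0, 2.0], [0.0, 4.0]], [[4.0, 0.0]])
```

With these inputs the query moves toward the mean of the relevant videos and away from the irrelevant one.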

According to an embodiment of the invention, the relational matrix relevance feedback consists of the following three steps:

Step a: initialize the relational matrix. Compute the similarity between any two cluster centers to obtain the initial relational matrix:

Correlation_Matrix[i][j] = exp(-1 * distance(Centroid_i, Centroid_j))

where Centroid_i and Centroid_j are two cluster centers and distance() is some distance function;

Step b: update the relational matrix. The relation vector F(x) represents the similarity between a given object and each cluster center:

[F(x)]_i = exp(-1 * distance(x, Centroid_i))

where x is the feature vector of the video object, Centroid_i is a cluster center and distance() is some distance function;

The relational matrix is updated by the following formula:

Correlation\_Matrix_{k} = Correlation\_Matrix_{k-1} + \sum_{i=1}^{N_{R}} F(q) F(f_{R}) - \sum_{i=1}^{N_{I}} F(q) F(f_{I});

where q is the query vector, f_R and f_I are the relevant and irrelevant videos marked by the user, N_R and N_I are their respective numbers, and k is the number of updates;

Step c: retrieve using the relational matrix. For a given query request, find the N relevant clusters in the relational matrix and return the query results from them. After each round of feedback the updated relational matrix is saved, so that performance improvements accumulate.
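Steps a to c can be illustrated with the following minimal sketch. Several choices here are assumptions rather than statements of the patent: Euclidean distance stands in for distance(), the product F(q) F(f) is read as an outer product, and the query is tied to the row of its most similar cluster when relevant clusters are looked up. All function names are hypothetical.

```python
import math


def relation_vector(x, centroids):
    # [F(x)]_i = exp(-distance(x, Centroid_i)); Euclidean distance is assumed.
    return [math.exp(-math.dist(x, c)) for c in centroids]


def initial_matrix(centroids):
    # Step a: similarity between every pair of cluster centers.
    return [relation_vector(c, centroids) for c in centroids]


def update_matrix(matrix, q, relevant, irrelevant, centroids):
    # Step b: add F(q)F(f_R) for relevant videos, subtract F(q)F(f_I) for
    # irrelevant ones (interpreted here as outer products).
    fq = relation_vector(q, centroids)
    out = [row[:] for row in matrix]
    for sign, videos in ((1.0, relevant), (-1.0, irrelevant)):
        for v in videos:
            fv = relation_vector(v, centroids)
            for i in range(len(out)):
                for j in range(len(out)):
                    out[i][j] += sign * fq[i] * fv[j]
    return out


def top_clusters(matrix, q, centroids, n):
    # Step c: take the row of the cluster closest to the query (an assumption)
    # and return the n clusters with the largest entries in that row.
    fq = relation_vector(q, centroids)
    home = fq.index(max(fq))
    return sorted(range(len(matrix)), key=lambda j: -matrix[home][j])[:n]


centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 3.0]]
m0 = initial_matrix(centroids)
m1 = update_matrix(m0, [0.0, 0.0], [[10.0, 0.0]], [], centroids)
```

In this toy example, marking a video near the second centroid as relevant to a query near the first centroid raises the entry linking the two clusters, so the second cluster enters the query's top-ranked clusters on later searches.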

According to an embodiment of the invention, the specific steps of the missed suppression relevance feedback technique are as follows:

Step d: obtain the relevant video set RS and the irrelevant video set IS from the user's feedback;

Step e: compute the mean vector RMV of the relevant video set RS and the mean vector IMV of the irrelevant video set IS;

Step f: find the two largest components RD1 and RD2 in the mean vector RMV, which indicate the two topics most relevant to the video;

Step g: find the largest component ID in the mean vector IMV, which indicates the least relevant topic;

Step h: if ID == RD1, set RD = RD2; otherwise RD is RD1; then execute step i;

Step i: optimize the semantic matching histogram of the query:

Query_SMH[RD] = 1, Query_SMH[ID] = 0;

Step j: store the new feature in the database and retrieve again.
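Steps e to i above can be sketched as follows. The function name and argument layout are hypothetical, and the reading that RD defaults to RD1 unless it collides with ID is an interpretation of step h.

```python
def missed_suppression_query(query_smh, relevant_smh, irrelevant_smh):
    """Return the corrected query SMH from user-marked relevant/irrelevant SMHs."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rmv, imv = mean(relevant_smh), mean(irrelevant_smh)               # step e
    rd1, rd2 = sorted(range(len(rmv)), key=lambda t: -rmv[t])[:2]     # step f
    id_ = max(range(len(imv)), key=lambda t: imv[t])                  # step g
    rd = rd2 if id_ == rd1 else rd1                                   # step h
    new_query = list(query_smh)
    new_query[rd] = 1.0                                               # step i
    new_query[id_] = 0.0
    return new_query


corrected = missed_suppression_query(
    [0.4, 0.3, 0.3],
    [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1]],
    [[0.1, 0.1, 0.8]],
)
```

Step j (storing the corrected feature and searching again) depends on the database layer and is left out of the sketch.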

The invention integrates the mid-level features and high-level features of video with an unsupervised learning retrieval mechanism and an interaction mechanism to form a new, complete video retrieval system. It measures the spatio-temporal sequence information of video accurately, develops a semantic understanding of sports video topics, reduces the online computational complexity and retrieval time of the system, improves retrieval performance substantially through the interactive interface, and has broad application prospects.

Description of drawings

Fig. 1 is the system architecture diagram of the invention.

Fig. 2 is a schematic diagram of model matching of a video in the model matching sequence features.

Fig. 3 is a schematic diagram of label matching of a video in the semantic matching features.

Fig. 4 is a schematic diagram of the global semantic relationships among data clusters in the relational matrix relevance feedback technique.

Fig. 5 is a schematic diagram of the program interface of "CBVR_System".

Fig. 6 shows the results returned for a volleyball query in the unsupervised learning retrieval mode using MMC.

Fig. 7 shows the results returned for a volleyball query in the direct retrieval mode using SMH.

Fig. 8 compares the direct retrieval results with the results after one round of feedback.

Embodiment

The invention is described in detail below with reference to the accompanying drawings. It should be noted that the described embodiment is intended only to aid understanding of the invention and does not limit it in any way.

The overall framework of the invention is shown in Fig. 1. The program "CBVR_System" is one concrete implementation of the method, running on a single computer and programmed in Visual C++. The interactive sports video retrieval method based on unsupervised learning and semantic matching features proposed by the invention mainly involves the following four key issues:

(1) model matching sequence features;

(2) semantic matching features;

(3) the retrieval mechanism based on unsupervised learning;

(4) the interactive retrieval interface.

The overall structure of the invention can be divided into two parts, offline operation and online operation. The offline part consists of feature extraction and unsupervised learning. Low-level image features are first extracted from the database objects at the image-frame level; model matching sequence features are then extracted at the video-sequence level; semantic matching features are extracted at the high-level semantic level; and unsupervised learning is performed on the extracted video features to establish the retrieval mechanism based on unsupervised learning. The online part is in turn divided into the retrieval mechanism and interactive feedback. The retrieval mechanism provides five retrieval modes: direct retrieval using TBH, direct retrieval using MMC, direct retrieval using SMH, unsupervised learning retrieval using MMC, and unsupervised learning retrieval using SMH. When a query request is received, the system retrieves and returns results according to the mode the user has selected. Finally, relevance feedback techniques form an interactive interface that refines system performance. Each technical issue involved in the invention is explained in detail below.

(1) Model matching sequence features

The model matching sequence feature of the invention is a mid-level video feature generated from the low-level features of image frames. It is a sequence feature that measures spatio-temporal information at the video level, and also a model-based feature that depends on learning. Its key issues are obtaining the model set and choosing the mid-level video sequence feature. Obtaining the model set is the first key to sequence feature extraction: a "good" model represents a representative, generalized image among the frames of the video database, and a "good" model set is a group of such models that are strongly representative and only weakly correlated with one another. Once a suitable model set has been obtained, a video clip can be mapped to a group of best model matching sequences by matching each video frame against the models. Which mid-level sequence feature to extract from the best model matching sequences is the second key issue: different feature extraction methods strongly affect the similarity calculation and therefore largely determine retrieval performance.

In the invention, model matching sequence features can be extracted once low-level features have been extracted from the database. The model matching sequence feature is a mid-level video sequence feature generated from the low-level features of image frames by model matching, and is used to reflect the spatio-temporal sequence information of a video object accurately. The low-level features that describe the visual information of an image are the basis of the retrieval system, but they are not the focus of the invention, and the choice of low-level feature does not affect the structure or principle of the invention; the present system therefore simply adopts the color correlogram. The extraction process consists of four steps: training set generation, model generation, model matching and feature extraction.

Step 1: training set generation. The entire database is treated as a set of image frames rather than a set of videos. Sample frames are obtained by down-sampling the whole image-frame database, and the low-level feature vectors of the sample frames are arranged in matrix form as the training set.

Step 2: model generation. The invention uses a competitive learning algorithm to learn the model set.

First, a sample x_v is drawn at random from the training set each time, and its best matching model m_{t^*} is found in the existing model set {m_t, t = 1...T} by formula (1):

\| x_{v} - m_{t^{*}} \| < \| x_{v} - m_{t} \|, \quad t = 1 \ldots T, \ t \neq t^{*}; \quad (1)

Then the competitive learning algorithm is applied by formula (2), where m is the iteration number and l(m) is a learning step coefficient that decreases with m. Once the iteration number m reaches a predetermined value, the model set is obtained.

m_{t^{*}}(m + 1) = m_{t^{*}}(m) + l(m) \left( x_{v} - m_{t^{*}}(m) \right); \quad (2)
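Formulas (1) and (2) amount to a winner-take-all update, which the following Python sketch illustrates. The initialization from random training samples, the Euclidean norm and the linear step schedule are assumptions; the text fixes none of them, and the function names are hypothetical.

```python
import math
import random


def nearest_model(x, models):
    """Index t* of the model closest to x, as in formula (1) (Euclidean norm assumed)."""
    return min(range(len(models)), key=lambda t: math.dist(x, models[t]))


def competitive_learning(train, num_models, iterations, seed=0):
    rng = random.Random(seed)
    # Assumption: models start as copies of randomly chosen training samples.
    models = [list(x) for x in rng.sample(train, num_models)]
    for m in range(iterations):
        x = rng.choice(train)                      # random sample x_v
        t_star = nearest_model(x, models)          # winner, formula (1)
        step = 0.5 * (1.0 - m / iterations)        # assumed decreasing l(m)
        models[t_star] = [w + step * (xi - w)      # update, formula (2)
                          for w, xi in zip(models[t_star], x)]
    return models


train = [[0.0, 0.0], [0.2, 0.1], [10.0, 10.0], [9.8, 10.1]]
models = competitive_learning(train, num_models=2, iterations=200)
```

Only the winning model moves at each iteration, so each model drifts toward the region of feature space in which it wins most often.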

Step 3: model mapping. The mapping process is shown in Fig. 2: given a video frame sequence object, the N best matching models are found in the model set for each of its frames. The image frame sequence of the given video thus becomes N best matching model sequences.
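As a minimal sketch of this mapping, assuming Euclidean distance as the matching criterion and a hypothetical function name, each frame is ranked against all models and the rank-r choices are collected into the r-th sequence:

```python
import math


def best_match_sequences(frames, models, n_best):
    """Row r of the result is the sequence of rank-r model indices, one per frame."""
    ranked = [
        sorted(range(len(models)), key=lambda t: math.dist(f, models[t]))[:n_best]
        for f in frames
    ]
    return [[frame_rank[r] for frame_rank in ranked] for r in range(n_best)]


sequences = best_match_sequences(
    frames=[[0.0], [0.1], [6.0]],
    models=[[0.0], [5.0], [10.0]],
    n_best=2,
)
```

Here `sequences[0]` holds the best matching model for every frame and `sequences[1]` the second-best, so a video of F frames becomes N index sequences of length F.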

Step 4: feature extraction. On the basis of the best matching model sequences, the invention defines the weighted T-Bin histogram (WTH) and the model matching correlogram (MMC).

Each dimension of the weighted T-Bin histogram represents the frequency with which a given model is referenced, and the weights reflect the relative importance of the different model matching sequences. The WTH reflects how often each model is referenced but ignores the order of the sequence.

Definition 1. Model matching correlogram: let S be the best semantic matching sequence of a given video object, m_1 and m_2 ∈ S any two sequence members, MS the model set, Num_MS the number of models in the model set, and D the pixel span. The model matching correlogram of the video object is then defined as a vector of dimension Num_MS × D. For the i-th model Model_i ∈ MS and a pixel span k ∈ D, the ((i-1) × D + k)-th dimension of the model matching correlogram is defined as follows:

For the model matching sequence S of a video object and any sequence member that uses Model_i, MMC^{(k)}_{Model_i}(S) gives the probability that a sequence member at distance k from it also uses Model_i. The MMC thus describes both the model reference frequency and the reference order information of the video object.
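Both features can be sketched as below. The weights passed to the histogram and the symmetric pair counting in the correlogram are assumptions: the text does not give the weighting scheme, and the defining formula of the MMC is not reproduced, so the code follows only the verbal description (the probability that a member k positions away from a member using Model_i also uses Model_i). Function names are hypothetical.

```python
def weighted_t_bin_histogram(sequences, num_models, weights):
    """Weighted frequency with which each model is referenced (order is ignored)."""
    hist = [0.0] * num_models
    for seq, w in zip(sequences, weights):
        for model in seq:
            hist[model] += w
    total = sum(hist)
    return [h / total for h in hist]


def model_matching_correlogram(seq, num_models, max_span):
    """Vector of length num_models * max_span; entry i*max_span + (k-1) is the
    0-indexed counterpart of dimension (i-1)*D + k in the text."""
    mmc = [0.0] * (num_models * max_span)
    for k in range(1, max_span + 1):
        for i in range(num_models):
            uses_i = both = 0
            for p in range(len(seq) - k):
                a, b = seq[p], seq[p + k]
                uses_i += (a == i) + (b == i)      # members using Model_i with a partner at span k
                both += 2 * (a == i and b == i)    # partner also uses Model_i (counted both ways)
            if uses_i:
                mmc[i * max_span + (k - 1)] = both / uses_i
    return mmc


wth = weighted_t_bin_histogram([[0, 0, 1], [1, 1, 2]], num_models=3, weights=[0.7, 0.3])
mmc = model_matching_correlogram([0, 0, 1, 0], num_models=2, max_span=2)
```

The histogram discards order, whereas the correlogram changes when the same models are referenced in a different order, which is the distinction the text draws between WTH and MMC.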

(2) Semantic matching features

The semantic matching feature defined by the invention is the semantic matching histogram (SMH). Sports videos can be indexed simply by the name of their sport, such as basketball, rugby or tennis. The semantic matching histogram is a high-level semantic video feature generated, on the basis of the low-level image features and the mid-level video features, by model matching and active learning, and is used to label sports video objects with their topic. The SMH gives the probability that a video object belongs to each semantic topic, and the video is labeled with the corresponding topic. Compared with the mid-level features, the SMH reflects the semantic content of the video to some extent, reduces the dimension of the feature vector and improves retrieval performance. To extract the semantic feature, the invention develops a model matching active learning algorithm. The algorithm actively learns from a labeled training set to obtain a set of models carrying semantic labels, uses this model set to perform model matching on a video object to obtain several best label matching sequences, and then extracts a histogram from the matching sequences to obtain the semantic matching histogram (SMH). The extraction method can be divided into the following three steps.

Step 1: training set. The training set is the source of supervision information for the active learning mechanism. The system selects representative labeled video objects to form a small-scale training set. That is, the video objects in the training set have been manually labeled in advance with their sports topic; the training set is small relative to the database; and the training set covers as many sports topics as possible within the descriptive power of the current low-level features.

Step 2: model generation and mapping. The mapping process is shown in Fig. 3. Model generation is similar to the corresponding step for the model matching sequence features, the only difference being that a different training set is used. Fig. 3 is obtained by adding the semantic label layer to Fig. 2: the N best model matching sequences are mapped further onto the semantic label layer to obtain N best label matching sequences.

Step 3: generation of the semantic matching histogram (SMH). Histogram extraction and weighting are performed on the N best label matching sequences to obtain the high-level semantic feature of the video object, the SMH. The SMH has the following characteristics: its dimension is low, equal to the number of semantic topics in the supervision information of the training set; its physical meaning is explicit, each dimension representing the probability that the video object belongs to the corresponding topic, with the object labeled by that topic; and the vector is sparse, which greatly reduces storage and simplifies similarity calculation.
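The three steps can be illustrated with a short sketch in which every model carries a topic label learned from the labeled training set. The weights and the function name are assumptions; the text states only that the label sequences are histogrammed and weighted.

```python
def semantic_matching_histogram(model_sequences, model_labels, num_topics, weights):
    """Map each best-match model index to its topic label, then build a
    weighted, normalized histogram over topics."""
    smh = [0.0] * num_topics
    for seq, w in zip(model_sequences, weights):
        for model in seq:
            smh[model_labels[model]] += w   # model -> semantic label -> weighted count
    total = sum(smh)
    return [v / total for v in smh]


# Models 0..2 are labeled with topics 0, 1, 1 (for example basketball, tennis, tennis).
smh = semantic_matching_histogram(
    model_sequences=[[0, 0, 1], [1, 1, 2]],
    model_labels=[0, 1, 1],
    num_topics=2,
    weights=[0.7, 0.3],
)
topic = max(range(len(smh)), key=smh.__getitem__)   # label the video with its top topic
```

The result has one entry per topic, which is why the SMH is much shorter than the mid-level features built over the full model set.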

(3) Retrieval based on unsupervised learning

The traditional retrieval method is a similarity ranking mechanism, which is direct, flexible and easy to combine with relevance feedback. For each query object, however, it must traverse the whole data space and compute all similarities online, and so bears a very high online computational complexity; for large-scale video databases in particular, this mechanism can hardly work at all.

The invention establishes a retrieval framework based on unsupervised learning to replace the traditional direct ranking mechanism. By performing unsupervised learning on the video database, this mechanism turns most similarity calculations into offline operations, which greatly reduces the complexity of online computation and also allows the database to be managed more effectively. Retrieval can then be divided into two parts, coarse search and fine search. Coarse search is in effect a classification process that assigns a query to an existing cluster, while fine search needs to compute only a small number of similarities online (between the query and the samples of a given cluster, and between the query and a few free samples). The key issue here is the choice of unsupervised learning algorithm. The clustering algorithm is the core of the retrieval mechanism based on unsupervised learning: clustering time, cluster purity and the number of clusters strongly affect the performance of the retrieval system.
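The coarse and fine stages can be sketched as follows. Representing each cluster by the centroid of its members and using Euclidean distance are assumptions made for illustration; free samples outside any cluster are ignored here, and the function name is hypothetical.

```python
import math


def two_stage_search(query, features, clusters, top_k):
    """clusters: lists of indices into features, produced offline by clustering."""
    def centroid(members):
        dim = len(features[members[0]])
        return [sum(features[m][d] for m in members) / len(members) for d in range(dim)]

    # Coarse search: assign the query to the cluster with the nearest centroid.
    best = min(clusters, key=lambda c: math.dist(query, centroid(c)))
    # Fine search: compute similarities online only for members of that cluster.
    return sorted(best, key=lambda m: math.dist(query, features[m]))[:top_k]


features = [[0.0], [1.0], [10.0], [11.0]]
clusters = [[0, 1], [2, 3]]
hits = two_stage_search([10.2], features, clusters, top_k=2)
```

Only the members of one cluster are compared with the query online, which is the source of the reduced online cost described above.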

The system uses the dominant set clustering algorithm to realize the retrieval mechanism based on unsupervised learning. Dominant set clustering is a graph-theoretic clustering algorithm; it provides a consistency function that measures the quality of each generated dominant set and limits the total number of dominant sets. Compared with other clustering algorithms, dominant set clustering produces clusters of higher purity, the number of clusters is determined automatically by the setting of the consistency threshold, and the computational complexity is relatively low. The clustering algorithm can be divided into three steps.

Step 1: full similarity matrix. The video database is treated as an undirected edge-weighted graph in which each video object is a node. Using the model matching correlogram or the semantic matching histogram, the similarity of any two videos is computed as the weight of that node pair, forming the full similarity matrix A.

Step 2: iterative equation. The dominant set clustering algorithm is equivalent to the following quadratic optimization problem:

\max f(u) = u^{T} A u \quad \text{s.t.} \quad u \in \Delta, \quad (4)

where

\Delta = \{ u \in R^{n} : u_{i} \geq 0 \text{ and } \sum_{i=1}^{n} u_{i} = 1 \}, \quad (5)

and A is the full similarity matrix. A locally optimal solution of this problem can be obtained by the following iterative equation:

u_{i}(t+1) = u_{i}(t) \frac{(A u(t))_{i}}{u(t)^{T} A u(t)}, \quad (6)

where t is the iteration number. The index set of the nonzero components of the locally optimal solution is taken as a dominant set;

Step 3: delete the nodes belonging to existing dominant sets from the current graph, and repeat the above steps until the graph has no nodes left.
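The three steps can be sketched with the iteration of formula (6) started from the uniform vector. The numerical cutoff used to decide which components count as nonzero, the stopping tolerance and the function names are assumptions, and the consistency-function threshold mentioned earlier is omitted from this sketch.

```python
def dominant_set(A, max_iter=1000, tol=1e-10, cutoff=1e-5):
    """Iterate formula (6) and return indices of the (numerically) nonzero components."""
    n = len(A)
    u = [1.0 / n] * n
    for _ in range(max_iter):
        Au = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        denom = sum(u[i] * Au[i] for i in range(n))
        if denom <= 0.0:          # no positive similarity left; stop iterating
            break
        new_u = [u[i] * Au[i] / denom for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:
            break
    return [i for i in range(n) if u[i] > cutoff]


def dominant_set_clustering(A):
    """Peel off dominant sets until no nodes remain (steps 2 and 3)."""
    remaining = list(range(len(A)))
    clusters = []
    while remaining:
        sub = [[A[i][j] for j in remaining] for i in remaining]
        members = dominant_set(sub) or list(range(len(remaining)))
        clusters.append([remaining[m] for m in members])
        remaining = [r for idx, r in enumerate(remaining) if idx not in members]
    return clusters


# Two groups of mutually similar videos with no similarity across groups.
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
clusters = dominant_set_clustering(A)
```

In this toy matrix the larger, tightly connected group is extracted first and the remaining pair forms the second cluster.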

(4) interactive search interface

Relevance feedback uses human-computer interaction to let the user help the computer, online, to understand the semantic information of an object and the subjective needs of the user. Because video is sequential, user feedback takes a relatively long time, so the development of relevance feedback techniques for video retrieval has been very limited. Reducing the burden on the user and obtaining the best retrieval results from the least feedback has become the development trend of video relevance feedback.

In this system, the retrieval modes that use model matching sequence features do not involve any semantic content of the video, so they lack an understanding of the semantics of the retrieved object and also ignore the subjectivity of human perception. Similarly, the retrieval mode that uses semantic matching features obtains some semantic information from the supervised training set, but this information is sometimes so limited that an accurate labelling of the sports video theme cannot always be guaranteed. To reflect the individual needs of users, bridge the semantic gap and supplement the system with online supervision information, the present invention establishes an interactive retrieval interface and implements three relevance feedback techniques at the video sequence level and the semantic level respectively: optimal query relevance feedback, relation matrix relevance feedback and missed-suppression relevance feedback, wherein:

Optimal query relevance feedback: a query vector represented by features such as TBH or MMC often cannot accurately describe the user's real need. The present invention therefore uses optimal query relevance feedback, in which human-computer interaction helps the computer understand the user's need and obtain an optimized query vector.

The user simply scores the initial retrieval results of the system, marking relevant and irrelevant videos. From this user feedback, the optimal query vector is obtained by formula (7). In this way the user helps the computer understand the retrieval request more accurately, which improves retrieval performance.

f_{q}' = W_{q} \times f_{q} + W_{R} \times (\frac{1}{N_{R}} \sum f_{R}) - W_{I} \times (\frac{1}{N_{I}} \sum f_{I}) \quad (7)
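Formula (7) can be illustrated with a short sketch. The weight values below are assumptions chosen only for the example, since the text does not give values for W_q, W_R and W_I. The function names are hypothetical, and an empty feedback set is assumed to contribute a zero vector.

```python
def mean_vector(vectors, dim):
    # Mean of a list of feature vectors; an empty list contributes zeros (assumption).
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def optimal_query(f_q, relevant, irrelevant, w_q=1.0, w_r=0.5, w_i=0.25):
    """Formula (7): move the query towards the mean of the videos marked relevant
    and away from the mean of the videos marked irrelevant."""
    dim = len(f_q)
    mean_r = mean_vector(relevant, dim)
    mean_i = mean_vector(irrelevant, dim)
    return [w_q * f_q[d] + w_r * mean_r[d] - w_i * mean_i[d] for d in range(dim)]

# Example: two relevant videos and one irrelevant video.
new_query = optimal_query([1.0, 0.0], [[0.0, 2.0], [0.0, 4.0]], [[4.0, 0.0]])
```

With these example weights, `new_query` is `[0.0, 1.5]`: the component shared with the irrelevant video is cancelled and the component shared with the relevant videos is strengthened.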

Relation matrix relevance feedback: the optimal query mechanism only optimizes the given query and ignores the database as a whole, so the performance gain obtained through interaction is lost at the next retrieval and cannot be accumulated. That is, when the same retrieval request is issued again, the whole interaction process has to be repeated.

In view of this, the present invention proposes relation matrix relevance feedback, which captures the global semantic relations among data clusters by adjusting the mutual relations between them; see Fig. 4. In Fig. 4, C_1 to C_N denote the N cluster centres in the database, and the weights W denote the similarity relations between clusters. The procedure is divided into the following three steps:

Step 1: initial relation matrix. The similarity between any two cluster centres is computed to obtain the initial relation matrix:

Correlation_Matrix[i][j] = exp(-1 × distance(Centroid_i, Centroid_j))  (8)

In the formula, Centroid_i and Centroid_j are two cluster centres and distance() is some distance function. Correlation_Matrix represents the similarity relations between clusters.

Step 2: updating the relation matrix. The relation vector F(x) represents the similarity between a given object and each cluster centre:

[F(x)]_i = exp(-1 × distance(x, Centroid_i))  (9)

In the formula, x is the feature vector of a video object, Centroid_i is a cluster centre and distance() is some distance function.

The relation matrix is updated by formula (10):

Correlation\_Matrix_{k} = Correlation\_Matrix_{k-1} + \sum_{i=1}^{N_{R}} F(q) F(f_{R}) - \sum_{i=1}^{N_{I}} F(q) F(f_{I}) \quad (10)

In the formula, q is the query vector; f_R and f_I are the relevant and irrelevant videos marked by the user, and N_R and N_I are their numbers; k is the update count.

The nonzero components of the matrix \sum_{i=1}^{N_{R}} F(q) F(f_{R}) indicate the cluster pairs that the user considers relevant; likewise, the nonzero components of \sum_{i=1}^{N_{I}} F(q) F(f_{I}) indicate the cluster pairs that the user considers irrelevant. The relation matrix is updated by strengthening the similarity relations of relevant cluster pairs and weakening those of irrelevant cluster pairs.

Step 3: retrieval using the relation matrix. For a given query request, the N most relevant clusters are found in the relation matrix, and the query results are then returned from them. After each round of feedback the update of the relation matrix is saved, so that the performance gain accumulates.
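The three steps of relation matrix feedback can be sketched as below. This is a sketch under explicit assumptions. The Euclidean distance stands in for the unspecified distance(). The product F(q) F(f) in formula (10) is read as an outer product so that the result is a matrix. In Step 3 the "most relevant clusters" are read as the largest entries in the row of the cluster nearest to the query. All function names are hypothetical.

```python
import math

def relation_vector(x, centroids):
    # Formula (9): similarity of object x to every cluster centre.
    return [math.exp(-math.dist(x, c)) for c in centroids]

def initial_relation_matrix(centroids):
    # Formula (8): pairwise similarity between cluster centres.
    return [[math.exp(-math.dist(ci, cj)) for cj in centroids] for ci in centroids]

def update_relation_matrix(M, q, relevant, irrelevant, centroids):
    # Formula (10), with F(q) F(f) taken as an outer product (assumption).
    Fq = relation_vector(q, centroids)
    n = len(centroids)
    new_M = [row[:] for row in M]
    for sign, videos in ((1.0, relevant), (-1.0, irrelevant)):
        for f in videos:
            Ff = relation_vector(f, centroids)
            for i in range(n):
                for j in range(n):
                    new_M[i][j] += sign * Fq[i] * Ff[j]
    return new_M

def related_clusters(M, q, centroids, n_top=2):
    # Step 3 (one possible reading): take the row of the cluster nearest to the
    # query and return the indices of its n_top largest entries.
    Fq = relation_vector(q, centroids)
    home = max(range(len(centroids)), key=lambda i: Fq[i])
    order = sorted(range(len(centroids)), key=lambda j: M[home][j], reverse=True)
    return order[:n_top]
```

Because the updated matrix is kept between queries, a later query that falls in the same cluster benefits from earlier feedback without repeating the interaction.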

Missed-suppression relevance feedback: this technique performs online missed suppression on data objects, improving retrieval performance while expanding the database. The algorithm is as follows:

Step 1: obtain the relevant video set RS and the irrelevant video set IS from the user feedback;

Step 2: compute the mean vector RMV of the relevant video set RS and the mean vector IMV of the irrelevant video set IS;

Step 3: find the two components of RMV with the largest values, RD1 and RD2, which indicate the two themes most relevant to the video;

Step 4: find the component of IMV with the largest value, ID, which indicates the least relevant theme;

Step 5: if ID == RD1, set RD = RD2; otherwise set RD = RD1;

Step 6: optimize the semantic matching histogram of the query:

Query_SMH[RD] = 1, Query_SMH[ID] = 0;

Step 7: store the new feature in the database and retrieve again.
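Steps 2 to 6 above can be sketched as follows. The sketch assumes that both feedback sets are non-empty, that the semantic matching histograms have at least two dimensions, and that RD defaults to the largest component RD1 unless it coincides with ID. Step 7, storing the feature and retrieving again, is left to the caller. The function names are hypothetical.

```python
def mean_vector(vectors):
    # Component-wise mean of a non-empty list of histograms.
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def missed_suppression_feedback(query_smh, relevant_set, irrelevant_set):
    rmv = mean_vector(relevant_set)      # Step 2: mean vector RMV
    imv = mean_vector(irrelevant_set)    # Step 2: mean vector IMV
    order = sorted(range(len(rmv)), key=lambda d: rmv[d], reverse=True)
    rd1, rd2 = order[0], order[1]        # Step 3: two most relevant themes
    worst = max(range(len(imv)), key=lambda d: imv[d])  # Step 4: ID
    rd = rd2 if worst == rd1 else rd1    # Step 5 (default RD = RD1 is assumed)
    new_smh = list(query_smh)
    new_smh[rd] = 1                      # Step 6: Query_SMH[RD] = 1
    new_smh[worst] = 0                   #         Query_SMH[ID] = 0
    return new_smh
```

The returned histogram is the optimized query feature that Step 7 stores in the database before retrieving again.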

Fig. 5 to Fig. 8 illustrate the technical effect of the present invention, wherein:

Fig. 5 is a schematic diagram of the program interface of "CBVR_System". The display area at the upper right shows the first frame of each video in the video database and is paged with buttons; the slider below each picture is used to receive feedback. The playback area at the upper left plays the selected video. The radio buttons select the retrieval mode, the button area provides the functional operations, and the status bar displays program run information in real time.

Fig. 6 shows the results returned for a volleyball query by the unsupervised learning retrieval mode using MMC. The radio button is used to select "unsupervised learning retrieval mode using MMC"; the display area shows the first frame of each retrieval result, and clicking an image plays the whole video in the playback area. The status bar shows the retrieval time.

Fig. 7 shows the results returned for a volleyball query by the direct retrieval mode using SMH. The radio button is used to select "direct retrieval mode using SMH"; the status bar shows the retrieval time and the semantic theme of the query object.

Fig. 8 compares a direct retrieval result with the result after one round of feedback. The upper image is the original output of the "direct retrieval mode using TBH"; the user provides feedback with the sliders (right for a relevant video, left for an irrelevant video). The lower image is the result after feedback, and the performance is clearly improved.

The above is only an embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any transformation or replacement that a person familiar with this technology can conceive within the technical scope disclosed by the present invention shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.

Claims

1. An interactive sports video retrieval method based on unsupervised learning and semantic matching features, characterized in that it comprises the following steps:

2. The retrieval method according to claim 1, characterized in that the model matching sequence features comprise a weighted T-Bin histogram and a model matching correlogram, which are used to reflect the spatio-temporal sequence information of a video object; in the weighted T-Bin histogram, each dimension represents the frequency with which the video object references a certain model, and the weights reflect the importance of different model matching sequences.

3. The retrieval method according to claim 1, characterized in that the extraction of the model matching sequence features comprises the following steps:

Step 22: a competitive learning algorithm is used to learn the model set;

4. The retrieval method according to claim 2, characterized in that the model matching correlogram is defined as follows: the best semantic matching sequence of a given video object is S, any two sequence members are m_1, m_2 ∈ S, the model set is MS, the number of models in the model set is Num_MS, and the pixel span is D; the model matching correlogram of the video object is then a vector of Num_MS × D dimensions; for the i-th model Model_i ∈ MS and a pixel span k ∈ D, dimension (i-1) × D + k of the model matching correlogram is as follows:

5. The retrieval method according to claim 1, characterized in that the extraction of the semantic matching features comprises the following steps:

6. The retrieval method according to claim 1, characterized in that the retrieval mechanism based on unsupervised learning comprises: performing unsupervised learning on the video database with the dominant set clustering algorithm, so that most similarity computations are converted into offline operations, and using a consistency function to measure the quality of each generated dominant set and to limit the total number of dominant sets, the specific steps comprising:

7. The retrieval method according to claim 1, characterized in that the interactive retrieval interface is used to implement relevance feedback at the video sequence level and the semantic level, comprising:

optimal query relevance feedback, which uses human-computer interaction to help the computer understand the user's need and obtain an optimized query vector, and which is applicable to the direct retrieval mechanism;

relation matrix relevance feedback, which adjusts the mutual relations between data clusters to capture the global semantic relations among them, and which is applicable to the retrieval mechanism based on unsupervised learning;

missed-suppression relevance feedback, which performs online missed suppression on data objects and expands the database, and which is applicable to retrieval using the semantic matching histogram.

8. The retrieval method according to claim 7, characterized in that the optimal query relevance feedback comprises: after the user marks relevant and irrelevant videos in the initial output of the system, the query vector is optimized as:

f_{q}' = W_{q} \times f_{q} + W_{R} \times (\frac{1}{N_{R}} \sum f_{R}) - W_{I} \times (\frac{1}{N_{I}} \sum f_{I})

9. The retrieval method according to claim 7, characterized in that the relation matrix relevance feedback comprises the following three steps:

Correlation_Matrix[i][j] = exp(-1 × distance(Centroid_i, Centroid_j))

[F(x)]_i = exp(-1 × distance(x, Centroid_i))

The relation matrix is updated by the following formula:

Correlation\_Matrix_{k} = Correlation\_Matrix_{k-1} + \sum_{i=1}^{N_{R}} F(q) F(f_{R}) - \sum_{i=1}^{N_{I}} F(q) F(f_{I})

10. The retrieval method according to claim 7, characterized in that the specific steps of the missed-suppression relevance feedback are as follows:

Step h: if ID == RD1, set RD = RD2, then perform step i;

Step i: optimize the semantic matching histogram of the query:

Query_SMH[RD] = 1, Query_SMH[ID] = 0;

Step j: store the new feature in the database and retrieve again.