CN101151640B - Apparatus and method for processing video data - Google Patents

Apparatus and method for processing video data

Info

Publication number
CN101151640B
CN101151640B CN2006800104697A CN200680010469A
Authority
CN
China
Prior art keywords
data
video
pixel
skeleton pattern
vertex
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2006800104697A
Other languages
Chinese (zh)
Other versions
CN101151640A (en)
Inventor
Charles Paul Pace
John Weiss
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Euclid Discoveries LLC
Original Assignee
Euclid Discoveries LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Euclid Discoveries LLC filed Critical Euclid Discoveries LLC
Publication of CN101151640A
Application granted
Publication of CN101151640B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/001 Model-based coding, e.g. wire frame
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/167 Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/527 Global motion vector estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/537 Motion estimation other than block-based
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/537 Motion estimation other than block-based
    • H04N19/54 Motion estimation other than block-based using feature points or meshes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An apparatus and methods for processing video data are described. The invention provides a representation of video data that can be used to assess agreement between the data and a fitting model for a particular parameterization of the data. This allows the comparison of different parameterization techniques and the selection of the optimal one for continued video processing of the particular data. The representation can be utilized in intermediate form as part of a larger process or as a feedback mechanism for processing video data. When utilized in its intermediate form, the invention can be used in processes for the storage, enhancement, refinement, feature extraction, compression, coding, and transmission of video data. The invention serves to extract salient information in a robust and efficient manner while addressing the problems typically associated with video data sources.

Description

Apparatus and Method for Processing Video Data
This application claims the priority of U.S. Provisional Application No. 60/653,810, filed February 17, 2005, and U.S. Provisional Application No. 60/648,094, filed January 28, 2005, both entitled "System And Method For Video Compression Employing Principal Component Analysis". This application is a continuation-in-part of U.S. Patent Application No. 11/230,686, filed September 20, 2005, and a continuation-in-part of U.S. Patent Application No. 11/280,625, filed November 16, 2005; said Application No. 11/230,686 is itself a continuation-in-part of U.S. Patent Application No. 11/191,562, filed July 28, 2005. Each of the above applications is incorporated herein by reference.
Field of the Invention
The present invention relates generally to the field of digital signal processing, and more particularly to computer apparatus and computer-implemented methods for the efficient representation and processing of signal or image data, and most particularly, video data.
Background of the Invention
The general system description of the prior art in which the present invention resides can be expressed with reference to Fig. 1, a block diagram showing a typical prior-art video processing system. Such systems generally include the following stages: an input stage 102, a processing stage 104, an output stage 106, and one or more data storage mechanisms 108.
The input stage 102 may include elements such as camera sensors, camera sensor arrays, range-finding sensors, or a means of retrieving data from a storage mechanism. The input stage provides video data representing time-correlated sequences of man-made and/or naturally occurring phenomena. The salient component of the data may be masked or contaminated by noise or other unwanted signals.
The video data, in the form of a data stream, array, or packet, may be presented to the processing stage 104 directly or through an intermediate storage element 108, in accordance with a predefined transfer protocol. The processing stage 104 may take the form of dedicated analog or digital devices, or programmable devices such as central processing units (CPUs), digital signal processors (DSPs), or field-programmable gate arrays (FPGAs), in order to execute a desired set of video data processing operations. The processing stage 104 typically includes one or more CODECs (coder/decoders).
The output stage 106 produces a signal, display, or other response capable of affecting a user or an external device. Typically, an output device is employed to generate an indicator signal, a display, a hard copy, a representation of processed data in storage, or to initiate transmission of data to a remote site. It may also be employed to provide an intermediate signal or control parameter for use in subsequent processing operations.
Storage is presented in this system as an optional element. When employed, the storage element 108 may be non-volatile, such as read-only storage media, or volatile, such as dynamic random access memory (RAM). It is not uncommon for a single video processing system to include several types of storage elements, with the elements having various relationships to the input, processing, and output stages. Examples of such storage elements include input buffers, output buffers, and processing caches.
The primary purpose of the video processing system in Fig. 1 is to process input data in order to produce output that is meaningful for a specific application. In order to achieve this goal, a variety of processing operations may be utilized, including noise reduction or cancellation, feature extraction, object segmentation and/or normalization, data categorization, event detection, editing, data selection, data re-coding, and transcoding.
Many data sources that produce poorly constrained data, in particular sound and visual images, are important to people. In most cases, the essential characteristics of these source signals adversely affect the goal of efficient data processing. The intrinsic variability of the source data is an obstacle to processing the data in a reliable and efficient manner without introducing errors arising from the use of untested empirical and heuristic methods in deriving engineering assumptions. This variability is lessened for applications when the input data are naturally or deliberately constrained into narrowly defined characteristic sets, such as a limited set of symbol values or a narrow bandwidth. These constraints all too often result in processing techniques that are of low commercial value.
The design of a signal processing system is influenced by the intended use of the system and the expected characteristics of the source signal used as an input. In most cases, the required performance efficiency will also be a significant design factor. Performance efficiency, in turn, is affected by the amount of data to be processed compared with the data storage available, as well as by the computational complexity of the application compared with the computing power available.
Conventional video processing methods suffer from a number of inefficiencies that manifest themselves in the form of slow data communication speeds, large storage requirements, and disturbing perceptual artifacts. These can be serious problems because of the variety of ways in which people desire to use and manipulate video data, and because of an innate sensitivity that people have for some forms of visual information.
An "optimal" video processing system is efficient, reliable, and robust in performing a desired set of processing operations. Such operations may include the storage, transmission, display, compression, editing, encryption, enhancement, categorization, feature detection, and recognition of the data. Secondary operations may include the integration of such processed data with other information sources. Equally important, in the case of a video processing system, the output should be compatible with human vision by avoiding the introduction of perceptual artifacts.
A video processing system may be described as "robust" if its speed, efficiency, and quality do not depend strongly on the specifics of any particular characteristics of the input data. Robustness is also related to the ability to perform operations when some of the input is erroneous. Many video processing systems fail to be robust enough to allow for general classes of applications, providing application only to the same narrowly constrained data that were used in the development of those systems.
Salient information can be lost in the discretization of a continuous-valued data source when the sampling rate of the input element does not match the signal characteristics of the sensed phenomena. Further, there is loss when the strength of the signal exceeds the limits of the sensor, resulting in saturation. Similarly, data are lost when the precision of the input data is reduced, as happens in any quantization process when the full range of values in the input data is represented by a set of discrete values, thereby reducing the precision of the data representation.
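The precision loss from quantization described above can be made concrete with a small sketch. The function and sample values below are illustrative assumptions, not taken from the patent; the point is only that once a full-range signal is mapped onto a few discrete levels, the representation error is bounded by half a quantization step and cannot be recovered.

```python
# Hypothetical illustration: re-quantizing an 8-bit-range signal to 4 levels.

def quantize(samples, levels, lo=0.0, hi=255.0):
    """Map each sample to the nearest of `levels` uniformly spaced values."""
    step = (hi - lo) / (levels - 1)
    return [round((s - lo) / step) * step + lo for s in samples]

signal = [0.0, 10.4, 63.7, 128.0, 200.9, 255.0]
coarse = quantize(signal, levels=4)          # a 2-bit representation
error = max(abs(a - b) for a, b in zip(signal, coarse))
# The worst-case error is bounded by half a quantization step (255/3 / 2).
assert error <= (255.0 / 3) / 2 + 1e-9
```

Here the mid-range sample 128.0 is forced to the nearest level, 170.0, losing 42 units of precision that no downstream stage can restore.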
Ensemble variability refers to any unpredictability in a class of data or information sources. Data representative of visual information typically have a very large degree of ensemble variability because visual information is usually unconstrained. Visual data can represent any spatial array sequence or spatio-temporal sequence that can be formed by light incident on a sensor array.
In modeling visual phenomena, video processors generally impose some set of constraints and/or structure on the manner in which the data are represented or interpreted. As a result, such methods can introduce systematic errors that may affect the quality of the output, the confidence with which the output may be regarded, and the type of subsequent processing tasks that can reliably be performed on the data.
Quantization methods reduce the precision of data in the video frames while attempting to retain the statistical variation of those data. Typically, the video data are analyzed such that the distribution of data values is collected into probability distributions. There are also methods that map the data into phase space in order to characterize the data as a mixture of spatial frequencies, thereby allowing precision reduction to be diffused in a less objectionable manner. When utilized heavily, these quantization methods often result in perceptually implausible colors and can induce abrupt pixelation in originally smooth areas of the video frame.
Differential coding is also typically used in order to capitalize on the local spatial similarity of data. Data in one part of the frame tend to be clustered around similar data in that frame and in a similar position in subsequent frames. Representing the data in terms of the spatially adjacent data can then be combined with quantization, and the net result is that, for a given precision, representing the differences is more accurate than using the absolute values of the data. This assumption works well when the spectral resolution of the original video data is limited, such as in black-and-white video or low-color video. As the spectral resolution of the video increases, the assumption of similarity breaks down significantly. The breakdown is due to the inability to selectively preserve the precision of the video data.
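A minimal sketch of the differential coding idea discussed above: because spatially adjacent samples tend to be similar, storing differences yields values with a much smaller dynamic range than the absolute samples. The scanline values and helper names are illustrative assumptions, not part of the patent.

```python
# Toy delta coding over a smooth scanline of pixel values.

def delta_encode(samples):
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)   # store the difference from the predecessor
        prev = s
    return out

def delta_decode(deltas):
    total, out = 0, []
    for d in deltas:
        total += d             # running sum restores the absolute values
        out.append(total)
    return out

row = [100, 102, 101, 105, 110, 108]          # smooth, locally similar data
deltas = delta_encode(row)
assert deltas == [100, 2, -1, 4, 5, -2]       # small dynamic range after the first
assert delta_decode(deltas) == row            # lossless round trip
```

The small differences tolerate coarser quantization than the absolute values would, which is exactly the advantage the paragraph describes, and exactly what fails when neighboring samples stop being similar.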
Residual coding is similar to differential coding in that the error of the representation is further differentially encoded in order to restore the precision of the original data to a desired level of accuracy.
Variations of these methods attempt to transform the video data into alternative representations that expose data correlations in spatial phase and scale. Once the video data have been transformed in these ways, quantization and differential coding methods can then be applied to the transformed data, resulting in an increase in the preservation of salient image features. Two of the most prevalent of these transform video compression techniques are the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). Error in the DCT transform manifests itself in a wide variation of video data values, and therefore the DCT is typically used on blocks of video data in order to localize these false correlations. The artifacts from this localization often appear along the borders of the blocks. For the DWT, more complex artifacts happen when there is a mismatch between the basis functions and certain textures, and this causes a blurring effect. To counteract the negative effects of the DCT and DWT, the precision of the representation is increased in order to lower distortion at the cost of precious bandwidth.
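The energy-compaction property that motivates block DCT coding can be sketched directly. The following is a plain O(n²) orthonormal DCT-II, not an optimized codec transform, and the 8-sample block is an assumed smooth run of pixel values.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D block (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

# A smooth 8-sample block: its energy concentrates in the low frequencies.
block = [100.0, 101.0, 103.0, 106.0, 110.0, 113.0, 115.0, 116.0]
coeffs = dct2(block)
low = sum(c * c for c in coeffs[:2])     # DC plus the first AC coefficient
total = sum(c * c for c in coeffs)       # equals sum(x*x) by Parseval
assert low / total > 0.99                # nearly all energy in two coefficients
```

Because almost all of the block's energy lands in two coefficients, the remaining coefficients can be quantized aggressively, which is the source of both the compression gain and the block-border artifacts noted above.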
Summary of the Invention
The present invention is a computer-implemented video processing method that provides both computational and analytical advantages over existing state-of-the-art methods. The principle of the inventive method is the integration of a linear decomposition method, a spatial segmentation method, and a spatial normalization method. Spatially constraining the video data greatly increases the robustness and applicability of the linear decomposition method. Additionally, a spatial segmentation of the data corresponding to the spatial normalization can further serve to increase the benefits derived from spatial normalization alone.
In particular, the present invention provides a means by which signal data can be efficiently processed into one or more beneficial representations. The present invention is efficient at processing many commonly occurring data sets and is particularly efficient at processing video and image data. The method of the invention analyzes the data and provides one or more concise representations of those data in order to facilitate their processing and encoding. Each new, more concise data representation allows a reduction in computational processing, transmission bandwidth, and storage requirements for many applications, including, but not limited to, the encoding, compression, transmission, analysis, storage, and display of video data. The invention includes methods for the identification and extraction of salient components of the video data, allowing a prioritization in the processing and representation of the data. Noise and other unwanted parts of the signal are identified as lower priority, so that further processing can be focused on analyzing and representing the higher-priority parts of the video signal. As a result, the video signal is represented more concisely than was previously possible. And the loss in accuracy is concentrated in the parts of the video signal that are perceptually unimportant.
Brief Description of the Drawings
Fig. 1 is a block diagram illustrating a prior-art video processing system.
Fig. 2 is a block diagram providing an overview of the invention, showing the major modules for processing images.
Fig. 3 is a block diagram illustrating the motion estimation method of the invention.
Fig. 4 is a block diagram illustrating the global registration method of the invention.
Fig. 5 is a block diagram illustrating the normalization method of the invention.
Fig. 6 is a block diagram illustrating the hybrid spatial normalization compression method.
Fig. 7 is a block diagram illustrating the mesh generation method of the invention as used in local normalization.
Fig. 8 is a block diagram illustrating the mesh-based normalization method of the invention as used in local normalization.
Fig. 9 is a block diagram illustrating the combined global and local normalization method of the invention.
Fig. 10 is a block diagram illustrating the GPCA-based polynomial fitting and differentiation method of the invention.
Fig. 11 is a block diagram illustrating the recursive GPCA refinement method of the invention.
Fig. 12 is a block diagram illustrating the background resolution method.
Fig. 13 is a block diagram illustrating the object segmentation method of the invention.
Detailed Description of the Invention
In video signal data, frames of video are assembled into a sequence of images usually depicting a three-dimensional scene as projected onto a two-dimensional imaging surface. Each frame, or image, is composed of picture elements (pixels) that represent an imaging sensor response to a sampled signal. Often, the sampled signal corresponds to some reflected, refracted, or emitted energy (e.g., electromagnetic, acoustic, etc.) sampled by a two-dimensional sensor array. A successive sequential sampling results in a spatio-temporal data stream, with two spatial dimensions per frame and a temporal dimension corresponding to the frame's order in the video sequence.
The present invention, as illustrated in Fig. 2, analyzes signal data and identifies the salient components. When the signal is composed of video data, analysis of the spatio-temporal stream reveals salient components that are often specific objects, such as faces. The identification process qualifies the existence and significance of the salient components and chooses one or more of the most significant of those qualified salient components. This does not limit the identification and processing of other, less salient components after or concurrently with the presently described processing. The aforementioned salient components are then further analyzed, identifying the variant and invariant subcomponents. The identification of invariant subcomponents is the process of modeling some aspect of the component, thereby revealing a parameterization of the model that allows the component to be synthesized to a desired level of accuracy.
In one embodiment of the invention, a foreground object is detected and tracked. The object's pixels are identified and segmented from each frame of the video. Block-based motion estimation is applied to the segmented object across multiple frames. These motion estimates are then integrated into a higher-order motion model. The motion model is employed to warp instances of the object toward a common spatial configuration. For certain data, more of the object's features are aligned in this configuration. This normalization allows a compact linear decomposition of the values of the object's pixels over multiple frames. The salient information pertaining to the appearance of the object is contained in this compact representation.
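The compact linear decomposition step can be illustrated with a toy sketch: once object pixels are normalized to a common spatial configuration, their appearance across frames often lies near a low-dimensional subspace. Below, a single principal component is recovered by power iteration from synthetic rank-one frame data. This illustrates the general technique (principal component analysis), not the patent's specific algorithm; all names and data are assumptions.

```python
def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def first_component(rows, iters=100):
    """First principal component of the rows, via power iteration on X^T X."""
    mu = mean_vec(rows)
    centered = [[v - m for v, m in zip(r, mu)] for r in rows]
    v = [1.0] * len(mu)   # assumed start vector, not orthogonal to the component
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered))
             for j in range(len(mu))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mu, v, centered

# Synthetic "normalized object pixels": a base appearance plus a varying
# amount of a single deformation pattern, so the centered data are rank one.
base = [10.0, 20.0, 30.0, 40.0]
mode = [1.0, -1.0, 2.0, 0.0]
frames = [[b + t * m for b, m in zip(base, mode)] for t in (-2.0, 0.0, 1.0, 3.0)]
mu, v, centered = first_component(frames)

# Each frame reconstructs from the mean plus one coefficient per frame:
# a very compact description of the object's appearance over time.
err = 0.0
for row in centered:
    c = sum(a * b for a, b in zip(row, v))   # the per-frame coefficient
    for j, rj in enumerate(row):
        err = max(err, abs(c * v[j] - rj))
assert err < 1e-6
```

Each frame is thus stored as the shared mean, a shared component vector, and a single scalar, which is the sense in which the normalized appearance data admit a compact linear representation.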
A preferred embodiment of the present invention details the linear decomposition of a foreground video object. The object is normalized spatially, thereby yielding a compact linear appearance model. A further preferred embodiment additionally segments the foreground object from the background of the video frames prior to spatial normalization.
A preferred embodiment of the invention applies the present invention to video of a person speaking into a camera while undergoing small motions.
A preferred embodiment of the invention applies the present invention to any object in the video that can be represented well through spatial transformations.
A preferred embodiment of the invention specifically employs block-based motion estimation in order to determine the finite differences between two or more frames of video. A higher-order motion model is factored from the finite differences in order to provide a more effective linear decomposition.
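Block-based motion estimation of the kind referenced above can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block in the current frame and candidate blocks in the previous frame. Block size, search range, and the toy frames are illustrative assumptions, not parameters from the patent.

```python
# Exhaustive block-matching motion estimation on tiny synthetic frames.

def sad(prev, cur, bx, by, dx, dy, bs):
    """Sum of absolute differences for the block at (bx, by) under offset (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def best_offset(prev, cur, bx, by, bs, search):
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if 0 <= by + dy and by + dy + bs <= h and \
               0 <= bx + dx and bx + dx + bs <= w:
                cost = sad(prev, cur, bx, by, dx, dy, bs)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

# A bright 2x2 patch at (1, 1) in the previous frame moves to (2, 2).
prev = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        prev[y][x] = 9
for y in (2, 3):
    for x in (2, 3):
        cur[y][x] = 9

cost, dx, dy = best_offset(prev, cur, bx=2, by=2, bs=2, search=2)
assert (cost, dx, dy) == (0, -1, -1)   # the block at (2, 2) came from (1, 1)
```

The resulting per-block offsets are the "finite differences" that a higher-order motion model would then be fit to and factored from.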
Detection and Tracking
Once the salient components of the signal have been determined, these components may be retained, and all other signal components may be diminished or removed. The process of detecting the salient component is shown in Fig. 2, where the video frame (202) is processed by one or more detect-object (206) processes, resulting in one or more objects being identified and subsequently tracked. The retained components represent the intermediate form of the video data. This intermediate data can then be encoded using techniques that are typically not available to existing video processing methods. As the intermediate data exists in several forms, standard video encoding techniques can also be used to encode several of these intermediate forms. For each instance, the present invention first determines and then employs the most efficient encoding technique.
In one preferred embodiment, a saliency analysis process detects and classifies salient signal modes. One embodiment of this process employs a combination of spatial filters specifically designed to generate a response signal whose strength is relative to the detected saliency of an object in the video frame. The classifier is applied at differing spatial scales and in different positions of the video frame. The strength of the response from the classifier indicates the likelihood of the presence of a salient signal mode. When centered over a strongly salient object, the process classifies it with a correspondingly strong response. The detection of the salient signal mode distinguishes the present invention by enabling the subsequent processing and analysis of the salient information in the video sequence.
Given the detected location of a salient signal mode in one or more frames of video, the present invention analyzes the invariant features of the salient signal mode. Additionally, the invention analyzes the residual of the signal, the "less-salient" signal modes, for invariant features. The identification of invariant features provides a basis for reducing redundant information and segmenting (i.e., separating) signal modes.
Feature point tracking
In one embodiment of the invention, spatial positions in one or more frames are determined through an analysis of spatial intensity field gradients. These features correspond to intersections of "lines", which can be described loosely as "corners". Such an embodiment further selects a set of such corners that are both strong and spatially disparate from each other, referred to herein as feature points. Further, a hierarchical multi-resolution estimation of optical flow allows the translational displacement of the feature points to be determined over time.
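The corner-selection idea above can be sketched with a small self-contained example. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses a Harris-style response (a standard way of scoring "corners" of the spatial intensity field from its gradients) plus a simple spacing rule to keep the selected feature points spatially disparate. All function names are hypothetical.

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris-style corner response from spatial intensity field gradients."""
    Iy, Ix = np.gradient(img.astype(float))  # finite-difference gradients
    def box3(a):  # simple 3x3 box smoothing of the structure-tensor products
        p = np.pad(a, 1, mode='edge')
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace

def feature_points(img, n=4, min_dist=3):
    """Pick up to n strong, spatially separated corners ('feature points')."""
    r = harris_response(img)
    picked = []
    for flat in np.argsort(r, axis=None)[::-1]:  # strongest response first
        y, x = np.unravel_index(flat, r.shape)
        if all(abs(y - py) >= min_dist or abs(x - px) >= min_dist
               for py, px in picked):
            picked.append((int(y), int(x)))
        if len(picked) == n:
            break
    return picked

# A bright square on a dark background: its four corners should be selected.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
pts = feature_points(img, n=4)
```

In a full system these points would then seed the multi-resolution optical-flow tracking described above; here they only demonstrate the selection step.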
In Fig. 2, the track-object process (220) is shown combining a detection instance from the detect-object process (208) with the further identification of correspondences (222) of the features of one or more detected objects across multiple video frames (202 and 204).
A non-limiting embodiment of feature tracking can be employed such that the features are used to qualify a more regular gradient analysis method (for example, block-based motion estimation).
Another embodiment anticipates the prediction of motion estimates based on feature tracking.
Object-based detection and tracking
In one non-limiting embodiment of the present invention, a robust object classifier is employed to track faces in video frames. Such a classifier is based on a cascaded response to oriented edges that has been trained on faces. In this classifier, the edges are defined as a set of basic Haar features together with 45-degree rotations of those features. The cascaded classifier is a variant of the AdaBoost algorithm. Additionally, the response calculations can be optimized through the use of summed-area tables.
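As a rough illustration of the summed-area-table optimization mentioned above, the sketch below evaluates a basic two-rectangle Haar-like feature in constant time per feature after a single pass over the image. This is a generic integral-image example, not the patent's cascade; the names are illustrative.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] holds the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in O(1) via four table lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_horizontal(ii, y, x, h, w):
    """Basic two-rectangle Haar feature: left half-sum minus right half-sum."""
    half = w // 2
    return box_sum(ii, y, x, h, half) - box_sum(ii, y, x + half, h, half)

# A frame whose left half is bright gives a strong positive feature response.
img = np.zeros((8, 8))
img[:, :4] = 1.0
ii = integral_image(img)
```

A cascade would evaluate many such features at many positions and scales; the table makes each evaluation cost four lookups regardless of rectangle size.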
Local registration
Registration involves the assignment of correspondences between the elements of identified objects in two or more video frames. These correspondences become the basis for modeling the spatial relationships between the video data at distinct points in time.
Various non-limiting means of registration for use in the present invention are now described, in terms of specific embodiments and their associated reductions to practice, based on well-known algorithms and on inventive derivatives of those algorithms.
One means of modeling the apparent optical flow in a spatio-temporal sequence can be achieved by generating a finite-difference field from two or more frames of the video data. The optical flow field can be sparsely estimated if its correspondences conform to certain constancy constraints in both a spatial and an intensity sense.
As shown in Fig. 3, a frame (302 or 304) may be spatially subsampled, possibly through a decimation process (306) or some other subsampling process (for instance, a low-pass filter). These spatially reduced images (310 & 312) may themselves be further subsampled.
Diamond search
Given a non-overlapping segmentation of a video frame into blocks, the previous video frame is searched for a match to each block. Full search block-based (FSBB) motion estimation finds the position in the previous video frame that has the lowest error when compared with a block in the current frame. Performing FSBB can be computationally very expensive, and it often does not yield a better match than other estimation schemes based on the assumption of localized motion. Diamond search block-based (DSBB) gradient descent motion estimation is a common substitute for FSBB; it uses a diamond-shaped search pattern of various sizes to iteratively traverse the error gradient toward the best match for a block.
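A minimal sketch of DSBB-style block matching follows, assuming a grayscale frame pair and a sum-of-absolute-differences error metric. It descends the error surface with a large diamond pattern and then refines with a small one; this illustrates the general technique under stated assumptions, not the patent's implementation.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(a.astype(float) - b.astype(float)).sum()

def diamond_search(prev, cur, by, bx, bs=8, max_iter=20):
    """Diamond-search block matching: from offset (0, 0), descend the SAD
    error surface with a large diamond pattern, then refine with a small one."""
    block = cur[by:by + bs, bx:bx + bs]
    H, W = prev.shape
    large = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]
    small = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    def cost(dy, dx):
        y, x = by + dy, bx + dx
        if y < 0 or x < 0 or y + bs > H or x + bs > W:
            return np.inf  # candidate falls outside the reference frame
        return sad(prev[y:y + bs, x:x + bs], block)
    mv = (0, 0)
    for pattern in (large, small):
        for _ in range(max_iter):
            best = min(pattern, key=lambda d: cost(mv[0] + d[0], mv[1] + d[1]))
            if best == (0, 0):       # center is the minimum: stop this pattern
                break
            mv = (mv[0] + best[0], mv[1] + best[1])
    return mv

# Synthetic test: the scene shifts down 2 and right 3 between frames,
# so the block's motion vector back into the previous frame is (-2, -3).
prev = np.arange(1024, dtype=float).reshape(32, 32)
cur = np.roll(prev, (2, 3), axis=(0, 1))
mv = diamond_search(prev, cur, 10, 10)
```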
In one embodiment of the invention, DSBB is employed in the analysis of the image gradient field between one or more video frames in order to generate finite differences whose values are later factored into higher-order motion models.
One skilled in the art is aware that block-based motion estimation can be considered equivalent to an analysis of the vertices of a regular mesh.
Mesh-based motion estimation
Mesh-based prediction uses a geometric mesh of vertices connected by edges to delineate discontinuity regions of a video frame, and subsequently predicts the deformation and motion of those regions in later frames through a deformation model governed by the positions of the mesh vertices. To predict the current frame, as the vertices move, the pixels within the regions defined by the vertices move with them. The relative motion of the original pixel values, and the resulting approximation, is accomplished through some interpolation method that relates a pixel's position to the positions of the vertices adjacent to that pixel. When such motions are present in the video signal, the additional modeling of scaling and rotation, as compared with pure translation, can yield more accurate predictions of the frame's pixels.
In general, mesh models can be defined as being either regular or adaptive. Regular mesh models are laid out without regard to the characteristics of the underlying signal, whereas adaptive methods attempt to arrange the vertices and edges spatially relative to features of the underlying video signal.
A regular mesh representation provides a means by which motion, or the deformation inherent in motion, can be predicted or modeled, provided the imaged objects have spatial discontinuities that mostly correspond to the mesh edges.
An adaptive mesh is formed with substantially more consideration of the features of the underlying video signal than a regular mesh. Additionally, the adaptive nature of such a mesh allows for various refinements of the mesh over time.
To achieve mesh-to-pixel registration, the present invention adjusts the vertex searches using a homogeneity criterion. Vertices associated with spatially heterogeneous intensity gradients have their motion estimated before vertices whose gradients are more homogeneous.
In a preferred embodiment, the vertex motion estimates of the mesh are further prioritized by interpolating the motion estimates of spatially homogeneous vertices from those of equal, or nearly equal, homogeneity.
In a preferred embodiment, the original mesh spatial configuration and the final mesh configuration are mapped to each other at the facet level by rendering the facet identifiers into image maps using standard graphics fill routines. The affine transformation associated with each triangle can be quickly found from a lookup table, and the pixel positions associated with a facet in one mesh can be quickly transformed into positions in the other mesh.
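The per-facet lookup described above presupposes solving a 2x3 affine transform from the three vertex correspondences of each triangle. A minimal sketch of that solve, with hypothetical names, follows; the three point pairs determine the six affine parameters exactly.

```python
import numpy as np

def affine_from_triangles(src_tri, dst_tri):
    """Solve the 2x3 affine transform mapping the three source vertices of a
    mesh facet onto the corresponding destination vertices."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_tri, dst_tri):
        A.append([x, y, 1, 0, 0, 0]); b.append(u)   # u = a*x + b*y + c
        A.append([0, 0, 0, x, y, 1]); b.append(v)   # v = d*x + e*y + f
    p = np.linalg.solve(np.array(A, float), np.array(b, float))
    return p.reshape(2, 3)

def apply_affine(M, pt):
    """Transform one (x, y) point by a 2x3 affine matrix."""
    x, y = pt
    return (M[0, 0] * x + M[0, 1] * y + M[0, 2],
            M[1, 0] * x + M[1, 1] * y + M[1, 2])

# A facet scaled by 2 and translated by (2, 3):
src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3), (2, 5)]
M = affine_from_triangles(src, dst)
```

In practice one such matrix would be precomputed per triangle and stored in the lookup table, so mapping any interior pixel between meshes costs one table fetch and one matrix application.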
In a preferred embodiment, a preliminary motion estimation is performed for the vertices in order to assess the residual error associated with each motion-estimation match. This preliminary estimate is additionally used to prioritize the order of the vertex motion estimation. The benefit of such residual analysis is that motion estimates associated with less distortion will result in the retention of a more plausible mesh topology.
In a preferred embodiment, the mesh vertex motion estimation is scaled down proportionally to some limited range, and multiple motion estimations are performed over several iterations, in order to allow the mesh to approach a more globally optimal and topologically correct solution.
In a preferred embodiment, block-based motion estimation using a rectangular tile neighborhood centered on each vertex is used to determine the vertex displacement, in place of considering an interpolated polygon neighborhood. In addition to avoiding the spatial interpolation and warping of pixels during error gradient descent, this technique allows the motion estimates to be computed in parallel.
Phase-based motion estimation
In the prior art, block-based motion estimation is typically implemented as a spatial search resulting in one or more spatial matches. Phase-based normalized cross-correlation (PNCC), as illustrated in Fig. 3, transforms blocks from the current frame and the previous frame into "phase space" and finds the cross-correlation of those two blocks. The cross-correlation is represented as a field of values whose positions correspond to the "phase shifts" of edges between the two blocks. These positions are isolated through thresholding and then transformed back into spatial coordinates. The spatial coordinates are distinct edge displacements, and correspond to motion vectors.
Advantages of PNCC include contrast masking, which allows the motion estimation to tolerate gain/exposure adjustments in the video stream. Additionally, PNCC yields results in a single step that might otherwise require many iterations of a spatially based motion estimation process. Further, the motion estimates are sub-pixel accurate.
One embodiment of the invention utilizes PNCC in the analysis of the image gradient field between one or more video frames in order to generate finite differences whose values are later factored into higher-order motion models.
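A compact sketch of phase-correlation-style motion estimation, in the spirit of the PNCC described above, follows: both blocks are transformed, the cross-power spectrum is normalized to unit magnitude (discarding gain/contrast), and the displacement is read off the peak of the inverse transform. This is a textbook phase-correlation example under the stated sign convention, not the patent's exact PNCC.

```python
import numpy as np

def phase_correlate(prev_block, cur_block, eps=1e-9):
    """Return (dy, dx) such that cur_block[y, x] ~ prev_block[y+dy, x+dx],
    found as the peak of the normalized cross-power spectrum."""
    F1 = np.fft.fft2(prev_block)
    F2 = np.fft.fft2(cur_block)
    cross = F1 * np.conj(F2)
    # Normalizing to unit magnitude keeps only the phase shift between blocks.
    surface = np.fft.ifft2(cross / (np.abs(cross) + eps)).real
    dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
    h, w = surface.shape
    if dy > h // 2:   # unwrap circular shifts into signed displacements
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Synthetic test: shift the content down 5 and left 3 between frames.
rng = np.random.default_rng(0)
prev = rng.random((32, 32))
cur = np.roll(prev, (5, -3), axis=(0, 1))
shift = phase_correlate(prev, cur)
```

A threshold over the correlation surface (rather than a single argmax) would recover multiple distinct edge displacements, as the description above suggests; the single-peak version is kept for brevity.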
Global registration
In one embodiment, the present invention factors one or more linear models from a field of finite-difference estimates. The field from which such a sampling occurs is referred to herein as the general population of finite differences. The described method employs robust estimation similar to that of the RANSAC algorithm.
As shown in Fig. 4, in the case of building a global motion model, the finite differences are translational motion estimates (402) collected into a general population pool (404) that is processed iteratively: random samples (410) of those motion estimates are collected, a linear model is factored, and the consensus of the samples is extracted (420). Those results are then used to condition the population (404) so as to better clarify the linear model, by excluding the outlier samples of the model found through the random process.
In one embodiment of the linear model estimation algorithm, the motion model estimator is based on a linear least-squares solution. This dependency makes the estimator susceptible to outlier data. Based on RANSAC, the disclosed method is a robust method of countering the effect of outliers through the iterative estimation of subsets of the data, probing for a motion model that will describe a significant subset of the data. The model generated by each probe is tested for the percentage of the data that it represents. If there is a sufficient number of iterations, a model will be found that fits the largest subset of the data.
As conceived and illustrated in Fig. 4, the present invention discloses innovations beyond the RANSAC algorithm, in the form of algorithmic variations involving the initial sampling of finite differences (samples) and the least-squares estimation of a linear model. Synthesis error is assessed for all the samples in the general population using the solved linear model. A rank is assigned to the linear model based on the number of samples whose residual error conforms to a preset threshold. This rank is considered the "candidate consensus".
The initial sampling, solving, and ranking are performed iteratively until a termination criterion is satisfied. Once the criterion is satisfied, the linear model with the highest rank is considered the final consensus of the population.
An optional refinement step involves iteratively analyzing subsets of the samples in the order of their best fit to the candidate model, and increasing the subset size gradually until the addition of one more sample would exceed a residual error threshold for the whole subset.
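The sample-solve-rank loop described above can be sketched for the simplest linear model, a constant translation, for which the least-squares solution of a sample is simply its mean. This is an illustrative RANSAC-style consensus loop with hypothetical names and thresholds, not the disclosed algorithm itself.

```python
import numpy as np

def ransac_translation(diffs, threshold=1.0, iterations=50, seed=0):
    """RANSAC-style consensus over a population of finite differences (motion
    vectors): repeatedly sample a subset, fit a linear model (here a constant
    translation) by least squares, and rank it by its inlier count."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, float)
    best_model, best_rank = None, -1
    for _ in range(iterations):
        sample = diffs[rng.choice(len(diffs), size=3, replace=False)]
        model = sample.mean(axis=0)        # least-squares fit of a translation
        residual = np.linalg.norm(diffs - model, axis=1)
        rank = int((residual <= threshold).sum())    # "candidate consensus"
        if rank > best_rank:
            best_model, best_rank = model, rank
    # Refinement: re-fit on the final consensus (inlier) set.
    inliers = diffs[np.linalg.norm(diffs - best_model, axis=1) <= threshold]
    return inliers.mean(axis=0), best_rank

# 20 coherent motion estimates near (3, 1) plus 5 gross outliers.
inlier_pts = [(3 + 0.1 * np.cos(i), 1 + 0.1 * np.sin(i)) for i in range(20)]
outlier_pts = [(20.0, -7.0), (15.0, 9.0), (-12.0, 4.0), (8.0, -20.0), (30.0, 30.0)]
model, rank = ransac_translation(inlier_pts + outlier_pts)
```

A higher-order (e.g., affine) global model would replace the mean with a full linear least-squares solve, but the sample/rank/refine structure is unchanged.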
As shown in Fig. 4, the global model estimation process (450) is repeated until the consensus-rank acceptability test is satisfied (452). While the rank has not been achieved, the population (404) of finite differences relevant to the discovered model is sorted in an attempt to reveal the linear model. The best (highest-ranked) motion model is added to a solution set in process 460. The model is then re-estimated in process 470. Upon completion, the population (404) is re-sorted.
The described non-limiting embodiment of the present invention can be further generalized as a method of sampling a vector space (described above as the field of finite-difference vectors) in order to determine subspace clusters in another parameter vector space that corresponds to a particular linear model.
A further result of the global registration process is that the difference between it and the local registration process yields a local registration residual. This residual is the error of the global model in approximating the local model.
Normalization
Normalization refers to the resampling of spatial intensity fields toward a standard, or common, spatial configuration. When these relative spatial configurations are invertible spatial transformations between such configurations, the resampling and accompanying interpolation of the pixels are also invertible up to a topological limit. The normalization method of the present invention is illustrated in Fig. 5.
When more than two spatial intensity fields are normalized, increased computational efficiency can be achieved by preserving intermediate normalization calculations.
The spatial transformation models used to resample images for the purpose of registration, or, equivalently, for normalization, include global models and local models. The global models are of increasing order, from translational to projective transformations. The local models are finite differences that imply an interpolant on neighboring pixels, determined basically by a block or, more complexly, by a piecewise linear mesh.
Interpolation of the original intensity fields to normalized intensity fields increases the linearity of PCA appearance models that are based on subsets of the intensity field.
As shown in Fig. 2, object pixels (232 and 234) can be resampled (240) to yield normalized versions (242 and 244) of those object pixels.
Mesh-based normalization
A further embodiment of the present invention tessellates the feature points into a triangle-based mesh, tracks the vertices of the mesh, and uses the relative positions of each triangle's vertices to estimate the three-dimensional surface normal for the plane coincident with those three vertices. When the surface normal is coincident with the projective axis of the camera, the imaged pixels provide a least-distorted rendering of the object corresponding to the triangle. Creating a normalized image that tends to favor the orthogonal surface normal can produce a pixel-preserving intermediate data type that will increase the linearity of subsequent appearance-based PCA models.
Another embodiment utilizes conventional block-based motion estimation to implicitly build a global motion model. In one non-limiting embodiment, the method factors a global affine motion model from the motion vectors described/predicted by conventional block-based motion estimation.
Fig. 9 illustrates a method combining global and local normalization.
Progressive geometric normalization
Classification of spatial discontinuities is used to align tessellated meshes so that they implicitly model the discontinuities as they coincide with mesh edges.
The boundaries of homogeneous regions are approximated by contour polygons. To determine the salient priority of each polygon vertex, the contour is successively approximated at decreasing precision. To preserve the priority of shared vertices, vertex priorities are propagated across the regions.
In one embodiment of this invention, a polygon decomposition method allows the prioritization of the boundaries associated with a homogeneous classification of the field. Pixels are classified according to some homogeneity criterion (for example, spectral similarity), and the classification labels are then spatially connected into regions. In a further preferred non-limiting embodiment, 4- or 8-connectivity criteria are used to determine the spatial connectedness.
In a preferred embodiment, the boundaries of these spatial regions are subsequently decomposed into polygons. The polygons are tessellated and joined together to form a preliminary mesh spatially covering all of the homogeneous regions. The vertices of this mesh are decomposed using some criteria to reveal simpler mesh representations that possess most of the perceptual features of the original mesh.
In a preferred embodiment, an image registration method, as disclosed in another part of this specification, is biased toward these high-priority vertices with strong image gradients. The resulting deformation models tend to preserve the spatial discontinuities associated with the geometry of the imaged object.
In a preferred embodiment, active contours are used to refine the region boundaries. The active contour of each polygon region is allowed one propagation iteration. The "deformation", or movement, of the active-contour vertices in the different regions is combined in an averaging operation, in order to account for the constrained propagation of the implied mesh in which the vertices have membership.
In a preferred embodiment, each vertex is assigned a count of the number of adjacent vertices it has in the mesh that are adjacent to contour portions of different regions; these other vertices are defined as being in an opposing state. If a vertex's count is 1, it has no opposing vertex and therefore needs to be preserved. If two adjoining opposing vertices each have a count of 1 (meaning the two vertices are in different polygons and adjacent to one another), then one vertex can be resolved to the other. When a count-1 vertex adjoins an opposing polygon vertex whose count is 2, the count-1 vertex is resolved into the count-2 vertex, whose count then becomes 1; that vertex can in turn be resolved again if it adjoins another opposing vertex. In this situation it is important to retain the original vertex counts, so that when resolving vertices the direction of resolution can be biased by the original counts: if vertex a has been resolved to vertex b, and the resolution between vertex b and vertex c is ambiguous, then vertex c should resolve to vertex b, since b has already been used in a resolution.
In a preferred embodiment, T-junctions are handled explicitly. These are points in one polygon that have no corresponding point in an adjoining polygon. In this case, each polygon vertex is first plotted onto a point map that identifies the vertex's spatial position and its polygon identifier. The perimeter of each polygon is then traversed and tested for any adjacent vertices from another polygon. If there are adjacent vertices from another region, each is tested to see whether it already has an adjacent vertex from the current polygon. If it does not, the current point is added as a vertex of the current polygon. This extra testing ensures that an isolated vertex in another polygon is used to generate a T-junction; otherwise, a new vertex would be added only where a matching vertex already existed in the region. Thus, an opposing vertex is added only when the adjacent vertex does not already oppose the current region. In a further embodiment, the efficiency of T-junction detection is increased through the use of a mask image. The polygon vertices are visited sequentially, and the mask is updated such that the vertex pixels are identified as belonging to a polygon vertex. The perimeter pixels of each polygon are then examined; if they coincide with a polygon vertex, they are registered as vertices within the current polygon.
In a preferred embodiment, when one spectral region is remapped by one or more overlapping homogeneous image-gradient regions, and another homogeneous spectral region also overlaps, the previously remapped region is given exactly the same label as the currently remapped regions. In essence, if a spectral region is covered by two homogeneous regions, then all spectral regions covered by those two homogeneous regions receive the same label, so that a spectral region covered by one homogeneous region is effectively treated the same as one covered by two.
In one embodiment of the invention, it is advantageous to process a region map, rather than a region list, in order to find the adjacency merge criteria. In a further embodiment, the spectral segmentation classifier can be modified so that the classifier is trained using non-homogeneous regions. This allows processing to be concentrated on the edges of the spectral regions. Additionally, region growing based on a different, edge-based segmentation (for example, a robust edge detector), presented as the initial polygon set for the active-contour identification, allows greater variation within homogeneous regions to be accommodated.
Local normalization
The present invention provides means by which pixels can be registered in a "local" manner within the spatio-temporal stream.
One such localized method employs the spatial application of a geometric mesh to provide a means of analyzing the pixels such that local coherency in the imaged phenomenon is accounted for when resolving the ambiguities of apparent image brightness constancy associated with the local deformation of the imaged phenomenon (or, more specifically, of the imaged object).
Such a mesh is employed to provide a piecewise linear model of surface deformation in the image plane, as a means of local normalization. The imaged phenomenon may often correspond to such a model when the temporal resolution of the video stream is high compared with the motion in the video. Exceptions to the model assumptions are handled through a variety of techniques, including: topological constraints, neighboring-vertex restrictions, and homogeneity analysis of pixel and image-gradient regions.
In one embodiment, feature points are used to generate a mesh whose vertices and corresponding triangular elements are constituted by the feature points. The corresponding feature points in other frames imply a "warping" of the triangles, and an interpolation of their corresponding pixels, that yields a local deformation model.
Fig. 7 illustrates the generation of such an object mesh. Fig. 8 illustrates the use of such an object mesh to locally normalize frames.
In a preferred embodiment, a triangle map is generated that identifies the triangles, where each pixel of the map indicates the triangle from which that pixel comes. Additionally, the affine transformation corresponding to each triangle is precomputed as an optimization step. Further, when generating the local deformation model, the spatial coordinates are traversed over the reference (previous) image to determine the sampling coordinates of the source pixels. The sampled pixels then replace the current pixel positions.
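A minimal sketch of the triangle-map-driven resampling described above follows: each output pixel looks up its facet identifier in the triangle map, applies that facet's precomputed affine to obtain a sampling coordinate in the reference frame, and copies the source pixel. The names are hypothetical, and nearest-neighbour sampling stands in for higher-order interpolation for brevity.

```python
import numpy as np

def warp_by_triangle_map(prev_img, tri_map, affines):
    """Locally normalize a frame: for every pixel, look up its mesh facet in
    the triangle map, apply that facet's precomputed 2x3 affine to find the
    sampling coordinate in the reference frame, and copy the nearest pixel."""
    H, W = tri_map.shape
    out = np.zeros((H, W), dtype=prev_img.dtype)
    for y in range(H):
        for x in range(W):
            M = affines[tri_map[y, x]]        # 2x3 affine for this facet
            sx = M[0, 0] * x + M[0, 1] * y + M[0, 2]
            sy = M[1, 0] * x + M[1, 1] * y + M[1, 2]
            iy = min(max(int(round(sy)), 0), prev_img.shape[0] - 1)
            ix = min(max(int(round(sx)), 0), prev_img.shape[1] - 1)
            out[y, x] = prev_img[iy, ix]      # nearest-neighbour sampling
    return out

# Two facets: facet 0 samples from two pixels to the right, facet 1 is identity.
prev = np.arange(64, dtype=float).reshape(8, 8)
tri_map = np.zeros((8, 8), dtype=int)
tri_map[:, 4:] = 1
affines = {0: np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]),
           1: np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])}
out = warp_by_triangle_map(prev, tri_map, affines)
```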
In another embodiment, local deformation is performed after global deformation. In a previously disclosed part of the specification, global normalization was described as a process for spatially normalizing the pixels of two or more video frames using a global registration method. The resulting globally normalized video frames can be further normalized locally. The combination of the two methods restricts the local normalization to a refinement of the globally obtained solution. This can greatly reduce the ambiguity that the local method is required to resolve.
In another non-limiting embodiment, the feature points, or the vertices in the case of a "regular mesh", are qualified by analyzing the image gradients in their neighborhoods. The image gradient can be computed directly, or through some indirect calculation (for example, the Harris response). Additionally, these points can be filtered using spatial constraints and the motion-estimation error obtained by descent of the image gradient. The qualified points can then be used as the basis of a mesh through one of many tessellation techniques, resulting in a mesh whose elements are triangles. For each triangle, an affine model is generated based on those points and their residual motion vectors.
In a preferred embodiment, a list of the triangle affine parameters is maintained. The list is constructed by iterating over a current/previous point list (using a vertex lookup map). The current/previous point list is passed to a routine that estimates, for each triangle, the affine parameter transformation. These affine parameters, or models, are then stored in the triangle affine parameter list.
In a further embodiment, the method traverses a triangle-identifier image map, in which each pixel of the map contains the identifier of the triangle, in the mesh, to which the pixel belongs. For each pixel belonging to a triangle, the corresponding global and local deformation coordinates appropriate to that pixel are calculated. Those coordinates are in turn used to sample the corresponding pixel, and its value is applied at the corresponding "normalized" position.
In a further embodiment, spatial constraints are applied to the points based on the density resulting from the image-gradient search and on the strength of the image-intensity correspondences. After the motion estimation is complete, the points are sorted based on some image-intensity residual metric. The points are then filtered based on spatial-density constraints.
In a further embodiment, spatial spectral segmentation is employed, and small homogeneous spectral regions are merged based on spatial affinity (the similarity of their intensity and/or color to that of neighboring regions). Homogeneous merging is then used to combine the spectral regions based on their overlap with homogeneous texture (image gradient) regions. A further embodiment then uses center-surrounded points (small regions surrounded by a much larger region) as qualified interesting points to support the vertices of the mesh. In a further non-limiting embodiment, a center-surrounded point is defined as a region whose bounding box is within 3x3, 5x5, or 7x7 pixels in size and for which the spatial image gradient over that bounding box is corner-shaped. The center of such a region can be classified as a corner, further qualifying that position as an advantageous vertex position.
In a further embodiment, horizontal and vertical pixel finite-difference images are used to classify the strength of each mesh edge. If an edge has many finite differences coincident with its spatial position, then that edge, and the vertices of that edge, are considered highly important to the local deformation of the imaged phenomenon. If there is a large divergence between the averages of the summed finite differences along an edge, then the region edge most likely corresponds to a texture change rather than to a quantization step.
In a further embodiment, a spatial-density model termination condition is used to optimize the processing of the mesh vertices. When the number of points examined is sufficient to cover most of the spatial area of the detection rectangle, the processing can be terminated. The termination generates a score. The vertices and feature points entering the processing are sorted by this score. A point is discarded if it is spatially too close to an existing point, or if it does not correspond to an edge in the image gradient. Otherwise, gradient descent is performed on the image gradient in the point's neighborhood, and the point is also discarded if the residual of the gradient exceeds some limit.
Regular mesh normalization
The present invention extends the above local normalization method utilizing a regular mesh. This mesh is constructed without regard to the underlying pixels; its position and size, however, correspond to a detected object.
Given a detected object region, a spatial frame position, and a scale indicating the facial size, a regular mesh is generated over the top of the facial region. In a preferred embodiment, a set of non-overlapping tiles is used to delineate a rectangular mesh, and the diagonal partitioning of the tiles then yields a regular mesh with triangular mesh elements. In a further preferred embodiment, the tiles are proportional to those used in conventional video compression algorithms (for example, MPEG-4 AVC).
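The tile-and-diagonal construction can be sketched as follows, assuming a detected-object rectangle whose dimensions are multiples of the tile size; the function name and vertex indexing are hypothetical.

```python
def regular_triangle_mesh(top, left, height, width, tile=16):
    """Build a regular mesh over a detected-object rectangle: non-overlapping
    tiles delineate a rectangular grid, and each tile is split along its
    diagonal into two triangular mesh elements."""
    rows = height // tile
    cols = width // tile
    # Grid of vertex positions, indexed row-major.
    verts = [(top + r * tile, left + c * tile)
             for r in range(rows + 1) for c in range(cols + 1)]
    stride = cols + 1
    tris = []
    for r in range(rows):
        for c in range(cols):
            tl = r * stride + c   # top-left vertex index of this tile
            tr = tl + 1
            bl = tl + stride
            br = bl + 1
            tris.append((tl, tr, bl))   # upper-left triangle of the tile
            tris.append((tr, br, bl))   # lower-right triangle of the tile
    return verts, tris

# A 32x32 tracking rectangle with 16-pixel tiles: 3x3 vertices, 8 triangles.
verts, tris = regular_triangle_mesh(0, 0, 32, 32, tile=16)
```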
In a preferred embodiment, the vertices associated with the above mesh are prioritized through an analysis of the pixel regions surrounding those vertices in the particular video frame used for training. Analysis of the gradient of such a region provides a confidence, associated with each vertex, regarding the processing that will rely on the local image gradient (for example, block-based motion estimation).
Vertex positions are found through simple gradient descent on the image gradient across the correspondences over multiple frames. In a preferred embodiment, this is achieved through block-based motion estimation. High-confidence vertices are considered to have high-confidence correspondences. Lower-confidence vertex correspondences are obtained by inference from the higher-confidence vertex correspondences, by resolving the ambiguous image gradients.
In a preferred embodiment, the regular mesh is made over the initial tracking rectangle. Tiles of 16 x 16 are generated and cut along their diagonals to form the triangular mesh. Motion estimation is performed on the vertices of these triangles; the motion estimation depends on the texture type of each point. Texture is classified into three classes, corner, edge, and homogeneous, and these also define the processing order of the vertices. For corner vertices, the estimates of neighboring vertices are used; that is, the motion estimates of nearby points (if available) are used as predictive motion vectors, and motion estimation is applied with each. The motion vector providing the least MAD (mean absolute difference) error is used as that vertex's motion vector. The search strategies used for corners are all of the wide, small, and origin-point searches. For edges, again the nearest neighboring motion vectors are used as predictive motion vectors, and the one with the least error is used; the search strategies for edges are the small and origin-point searches. For homogeneous regions, the neighboring vertices are searched and the motion estimate with the least error is used.
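The predictive-motion-vector step described above, choosing among neighbors' vectors (plus the zero vector) by least MAD error, can be sketched as follows. This is an illustration with a simplified candidate set and no subsequent search refinement; the names are hypothetical.

```python
import numpy as np

def mad(a, b):
    """Mean absolute difference between two equally sized blocks."""
    return np.abs(a.astype(float) - b.astype(float)).mean()

def best_predictive_mv(prev, cur, y, x, neighbor_mvs, bs=8):
    """Choose, among the neighbors' motion vectors plus the zero vector, the
    candidate giving the least MAD error for the block at (y, x)."""
    block = cur[y:y + bs, x:x + bs]
    H, W = prev.shape
    best, best_err = (0, 0), np.inf
    for dy, dx in [(0, 0)] + list(neighbor_mvs):
        py, px = y + dy, x + dx
        if py < 0 or px < 0 or py + bs > H or px + bs > W:
            continue  # candidate falls outside the reference frame
        err = mad(prev[py:py + bs, px:px + bs], block)
        if err < best_err:
            best, best_err = (dy, dx), err
    return best

# The scene shifts down 1 and right 2; one neighbor already carries the true
# vector (-1, -2), which wins over the zero vector and a wrong neighbor.
prev = np.arange(1024, dtype=float).reshape(32, 32)
cur = np.roll(prev, (1, 2), axis=(0, 1))
mv = best_predictive_mv(prev, cur, 10, 10, [(-1, -2), (3, 0)])
```

In a complete estimator, the winning predictive vector would seed the wide, small, or origin-point search appropriate to the vertex's texture class.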
In a preferred embodiment, the image gradient at each triangle vertex is computed and sorted based on class and magnitude. Thus corners take priority over edges, and edges over homogeneous regions. Among corners, strong corners take priority over weak corners; among edges, strong edges over weak edges.
In a preferred embodiment, the local deformation of each triangle is associated with the motion estimate for that triangle. Each triangle has an affine transform estimated for it. If the triangle does not topologically invert or become degenerate, the estimated affine is used as the basis for sampling the current image for the pixels belonging to that triangle.
In an alternative embodiment, pixel regions can be qualified based on low geometric distortion over a training period. Lower distortion indicates higher priority according to the local deformation model. Mesh-based motion estimation and normalization can be biased to favor mesh facets that tend to contain the largest number of coherently produced pixels. In a further embodiment of this technique, edge-detection responses are used to further qualify the facets.
Segmentation
The spatial discontinuities identified by the further-described segmentation processes are encoded efficiently through a geometric parameterization of their respective boundaries, referred to as spatial discontinuity models. These spatial discontinuity models may be encoded in a progressive manner allowing ever more concise boundary descriptions corresponding to subsets of the encoding. Progressive encoding provides a robust means of prioritizing the spatial geometry while retaining much of the salient aspects of the spatial discontinuities.
As shown in Figure 2, once the correspondences of the features of an object are tracked (220) and modeled (224) over time, this motion/deformation model can be used to segment the pixels corresponding to that object (230). This process can be repeated for the multiple objects (206 and 208) detected in the frames (202 and 204).
One form of invariant feature analysis employed by the present invention focuses on the identification of spatial discontinuities. These discontinuities manifest as edges, shadows, occlusions, lines, corners, or any other visible characteristic that causes an abrupt and identifiable separation between pixels in one or more frames of video. Additionally, subtle spatial discontinuities between similarly colored and/or textured objects may manifest only when the pixels of each object in the video frame are undergoing coherent motion relative to the objects themselves but different motion relative to the other objects. The present invention utilizes a combination of spectral, texture, and motion segmentation to robustly identify the spatial discontinuities associated with a salient signal mode.
Temporal segmentation
The temporal integration of translational motion vectors, or equivalently of finite-difference measurements in the spatial intensity field, into a higher-order motion model is a form of motion segmentation described in the prior art.
In one embodiment of the invention, a dense field of motion vectors is produced, representing the finite differences of object motion in the video frames. These derivatives are grouped together spatially through a regular partitioning of tiles or by some initialization procedure (for example, spatial segmentation). The "derivatives" of each group are integrated into a higher-order motion model using a linear least-squares estimator. The resulting motion models are then clustered as vectors in the motion-model space using the k-means clustering technique. The derivatives are classified according to which model best fits them. The cluster labels are then spatially clustered as an evolution of the spatial partitioning. The process continues until the spatial partitioning is stable.
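A minimal sketch of this integration-and-clustering step follows, under the simplifying assumption that each tile's motion model is purely translational (in which case the least-squares solution is the mean vector). The function names and the fixed iteration count are illustrative choices, not the patent's specification.

```python
# Hypothetical sketch: integrate each tile's motion vectors into a
# translational model, then group the models with a plain k-means pass.

def integrate_tile(vectors):
    """Least-squares translational model for one tile: the mean vector."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)

def kmeans(models, centers, iters=10):
    """Cluster motion models around the given initial centers; returns the
    per-model cluster labels and the refined centers."""
    labels = [0] * len(models)
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda c: (m[0] - centers[c][0]) ** 2
                                    + (m[1] - centers[c][1]) ** 2)
                  for m in models]
        for c in range(len(centers)):
            members = [m for m, l in zip(models, labels) if l == c]
            if members:
                centers[c] = integrate_tile(members)
    return labels, centers
```

The real method would fit higher-order (for example, affine) models per group and re-derive the spatial partitioning from the cluster labels until it stabilizes.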
In a further embodiment of the invention, the motion vectors appropriate to a given aperture are interpolated to a set of pixel positions corresponding to that aperture. When the block defined by this interpolation straddles pixels corresponding to an object boundary, the resulting classification is some irregular diagonal partitioning of the block.
In the prior art, the least-squares estimator used to integrate the derivatives is highly sensitive to outliers. This sensitivity can generate motion models that bias the motion-model clustering method severely, to the point that the iterations diverge.
In the present invention, the motion segmentation method identifies spatial discontinuities through the analysis of apparent pixel motion over two or more video frames. The apparent motion is analyzed for consistency over these video frames and integrated into parametric motion models. Spatial discontinuities associated with such consistent motion are identified. Motion segmentation can also be referred to as temporal segmentation, because temporal changes may be caused by motion. However, temporal changes may also be caused by certain other phenomena (for example, local deformation, illumination changes, and so on).
Through the described methods, the salient signal mode corresponding to the normalization method can be identified and can be separated from the ambient signal mode (the background, or non-object) through one of several background subtraction methods. Often, these methods statistically model the background as the pixels exhibiting the least amount of change at each time instance. Change can be characterized as a pixel-value difference.
A global deformation model of the segmentation perimeter is achieved by first creating a perimeter around the object and collapsing that perimeter toward the detected center of the object until the perimeter vertices have settled at positions coincident with a heterogeneous image gradient. Motion estimates are then gathered at these new vertex positions, and a robust affine estimation is used to find the global deformation model.
The finite differences of the segmentation mesh's vertex image gradients are integrated into the global deformation model.
Object segmentation
Figure 13 shows a block diagram of a preferred embodiment of object segmentation. The process begins with an ensemble of normalized images (1302), which are then pairwise differenced (1304) across the ensemble. These differences are then accumulated element-wise into an accumulation buffer (1306). The accumulation buffer is thresholded (1310) to identify the more significant error regions. The thresholded element mask is then morphologically analyzed (1312) to determine the spatial support (1310) of the accumulated error regions. The resulting extraction (1314) of the morphological analysis (1312) is then compared against the detected object position (1320) so that subsequent processing can focus on the accumulated error regions coincident with the object. The boundary (1322) of the isolated spatial region (1320) is then approximated with a polygon forming its convex hull (1324). The contour of the hull is then adjusted (1330) to better initialize the vertex positions for an active contour analysis (1332). Once the active contour analysis (1332) has converged on a low-energy solution in the accumulated error space, the contour is used as the final contour (1334); the pixels constrained within the contour are considered those most likely to be object pixels, while pixels outside the contour are considered non-object pixels.
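The difference-accumulation and thresholding steps at the start of this pipeline can be sketched as follows, with frames reduced to flat lists of pixel values. The later morphology, convex-hull, and active-contour stages are not shown, and all names are hypothetical.

```python
# Hypothetical sketch of the accumulation buffer (1306) and its
# thresholding (1310): pairwise frame differences summed per element.

def accumulate_differences(frames):
    """Pairwise element-wise absolute differences, accumulated per pixel."""
    acc = [0.0] * len(frames[0])
    for a, b in zip(frames, frames[1:]):
        for i, (pa, pb) in enumerate(zip(a, b)):
            acc[i] += abs(pa - pb)
    return acc

def threshold_mask(acc, t):
    """Binary mask of elements whose accumulated error exceeds the threshold."""
    return [1 if v > t else 0 for v in acc]
```

Pixels flagged in the mask are the candidates for the accumulated-error regions that the morphological analysis and active contour subsequently refine.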
In preferred embodiments, motion segmentation can be achieved given the detected position and scale of a specific imaging mode. A distance transform can be used to determine the distance of every pixel from the detected position. If the pixel values associated with the maximum distances are retained, a reasonable background model can be resolved. In other words, the ambient signal is resampled temporally using a signal-difference criterion.
A further embodiment involves applying a distance transform relative to the current detection position to assign a distance to each pixel. If a pixel's distance exceeds the entry in some maximum-pixel-distance table, the pixel value is recorded. After a suitable training period, a pixel whose recorded maximum distance is large is assumed to have the highest probability of being a background pixel.
Given the ambient signal model, the complete salient signal mode at each time instance can be differenced. Each of these differences can be resampled into spatially normalized signal differences (absolute differences). These differences are then aligned relative to one another and accumulated. Since the differences are spatially normalized relative to the salient signal mode, the peaks of the accumulated differences will correspond predominantly to the pixel positions associated with the salient signal mode.
In one embodiment of the invention, a training period is defined in which object detection positions are determined; the centroid of those positions is then used to determine which detection positions, being farthest from that centroid, would yield through frame differencing the background pixels with the highest probability of being non-object pixels, and to determine the optimal number of frames to consider.
In one embodiment of the invention, an active contour model is used to segment the foreground object from the non-object background by determining the contour vertex positions in a cumulative error "image". In preferred embodiments, the edges of the active contour are subdivided commensurate with the scale of the detected object in order to obtain greater degrees of freedom. In preferred embodiments, the positions of the final contour can be snapped to the nearest regular-mesh vertices to yield a regularly spaced contour.
In a non-limiting embodiment of object segmentation, an oriented kernel is used to generate error-image filter responses for temporally paired images. Responses to a filter oriented orthogonally to the gross direction of motion tend to enhance the error surface for motion relative to the background, whether background is being revealed or occluded.
The normalized frame intensity vectors of a normalized image ensemble are differenced against one or more reference frames to create residual vectors. These residual vectors are accumulated element-wise to form an accumulated residual vector. This accumulated residual vector is then examined spatially to define object boundaries for the spatial segmentation of object pixels and non-object pixels.
In a preferred embodiment, an initial statistical analysis of the accumulated residual vector is performed to derive a statistical threshold with which the accumulated residual vector can be thresholded. A preliminary object-region mask is formed through morphological operations of erosion followed by dilation. The polygonal contour points of the region are then analyzed to reveal the convex hull of those points. The convex hull is then used as the initial contour for an active contour analysis method. The active contour is propagated until it converges on the boundaries of the object in the accumulated residual space. In a further preferred embodiment, the edges of the preliminary contour are further subdivided by adding midpoint vertices until a minimum edge length is achieved for all edges. This further embodiment is meant to progressively increase the degrees of freedom of the active contour model so that it can fit the contour of the object more accurately.
In preferred embodiments, the refined contour is used to produce a pixel mask, through the polygon implied by overlaying the contour, that indicates which pixels within the normalized images belong to the object.
Non-object resolution
Figure 12 shows a block diagram of a preferred embodiment of non-object segmentation, or equivalently background resolution. The process works by determining, through "stability" and the maximum distance from detected object positions (1202), the most stably non-object pixels. After initialization of a background buffer (1206) and an initial maximum-distance buffer (1204), the process examines each pixel position (1210) for each newly detected object position (1202). For each pixel position (1210), a distance transform is used to calculate the distance from the detected object position (1202). If that pixel's distance is greater than the previously stored value (1216) in the maximum-distance buffer (1204), the previous value is replaced (1218) with the current value, and the pixel value is recorded (1220) in the pixel buffer.
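The per-pixel buffer update can be sketched as follows, with the frame and buffers flattened to lists and the distance transform assumed to have been computed already. The function name and argument layout are illustrative.

```python
# Hypothetical sketch of the Figure 12 update rule: a pixel value is kept
# as background evidence whenever its current distance from the detected
# object position exceeds the stored maximum for that pixel.

def update_background(bg, max_dist, frame, dist_to_object):
    """Per pixel: if this frame's distance to the detected object exceeds
    the stored maximum, record the distance and the pixel value."""
    for i, d in enumerate(dist_to_object):
        if d > max_dist[i]:
            max_dist[i] = d
            bg[i] = frame[i]
    return bg, max_dist
```

After a training period, `bg` approximates the "clear background" image referenced below.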
Given a resolved background image, the error between this image and the current frame can be spatially normalized and temporally accumulated. Such a resolved background image is described in the "background resolution" section.
The resulting accumulated error is then thresholded to provide an initial contour. The contour is then propagated spatially to balance the residual error against the contour deformation.
Gradient segmentation
The texture segmentation method, or equivalently intensity-gradient segmentation, analyzes the local gradients of the pixels in one or more video frames. The gradient response is a statistical measure characterizing spatial discontinuities that are local to the pixel positions in the video frame. One of several spatial clustering techniques is then used to combine the gradient responses into spatial regions. The boundaries of these regions are useful in identifying spatial discontinuities in one or more of the video frames.
In one embodiment of the invention, the summed-area table concept from computer graphics texture generation is employed for the purpose of expediting the calculation of the intensity field gradient. A field of progressively summed values is generated, making the summation of any rectangle of the original field simple to compute through four lookups combined with four addition/subtraction operations.
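A summed-area table and its four-lookup rectangle sum can be sketched as follows; this is the standard construction, not code from the patent, and the image is taken as a list of rows.

```python
# Summed-area table (integral image): sat[y][x] holds the sum of img over
# the rectangle from (0, 0) to (x, y) inclusive.

def summed_area_table(img, w, h):
    """Build the table in one pass with a running row sum."""
    sat = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            sat[y][x] = row + (sat[y - 1][x] if y else 0)
    return sat

def rect_sum(sat, x0, y0, x1, y1):
    """Sum over the inclusive rectangle using four lookups."""
    total = sat[y1][x1]
    if x0:
        total -= sat[y1][x0 - 1]
    if y0:
        total -= sat[y0 - 1][x1]
    if x0 and y0:
        total += sat[y0 - 1][x0 - 1]
    return total
```

With the table in place, any rectangular sum needed for a gradient estimate costs constant time regardless of the rectangle's size.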
A further embodiment uses the Harris response generated for an image, classifying the neighborhood of each pixel as homogeneous, edge, or corner. A response value is generated from this information and indicates the degree of edge-ness or corner-ness of each element in the frame.
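A sketch of the Harris measure and the three-way labeling follows. The response formula is the standard det(M) - k*trace(M)^2 form on the local structure tensor; the classification thresholds here are arbitrary illustrative values, not the patent's.

```python
# Harris response from the entries of the local structure tensor
# M = [[ixx, ixy], [ixy, iyy]] (sums of gradient products over a window).

def harris_response(ixx, iyy, ixy, k=0.04):
    """Harris corner measure: det(M) - k * trace(M)^2."""
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - k * trace * trace

def classify(r, t=1.0):
    """Label a neighborhood by its Harris response: large positive means
    corner, large negative means edge, near zero means homogeneous."""
    if r > t:
        return "corner"
    if r < -t:
        return "edge"
    return "homogeneous"
```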
Multi-scale gradient analysis
Embodiments of the present invention further constrain the image gradient support by generating image gradient values at several spatial scales. This method can help qualify the image gradients so that spatial discontinuities at different scales serve to support each other: as long as an "edge" can be discerned at several different spatial scales, the edge should be "salient". The more qualified image gradients will tend to correspond to the more salient features.
In a preferred embodiment, the texture response field is generated first, and the values of this field are then quantized into several bins based on a k-means binning/partitioning. The original image gradient values are then processed progressively using watershed segmentation, applying each bin as a single iteration over the range of values applicable to it. The benefit of this method is that homogeneity is defined with a strong spatial bias, in a relative sense.
Spectral segmentation
The spectral segmentation method analyzes the statistical probability distribution of the black-and-white, grayscale, or color pixels of the video signal. A spectral classifier is constructed by performing a clustering operation on the probability distribution of those pixels. The classifier is then used to classify one or more pixels as belonging to a probability class. The resulting probability classes and their pixels are then given class labels. These class labels are then spatially merged into regions of pixels with distinct boundaries. These boundaries identify spatial discontinuities in one or more video frames.
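The classification and spatial-merging steps can be sketched in one dimension as follows, assuming the spectral class means have already been obtained (for example, from a prior clustering pass). All names are hypothetical.

```python
# Hypothetical sketch: label each pixel by its nearest spectral class
# mean, then merge identically labeled neighbors into runs whose
# boundaries mark candidate spatial discontinuities.

def spectral_labels(pixels, class_means):
    """Assign each (grayscale) pixel to the nearest spectral class mean."""
    return [min(range(len(class_means)),
                key=lambda c: abs(p - class_means[c]))
            for p in pixels]

def merge_runs(labels):
    """Merge identically labeled neighbors into [label, length] runs."""
    runs = []
    for l in labels:
        if runs and runs[-1][0] == l:
            runs[-1][1] += 1
        else:
            runs.append([l, 1])
    return runs
```

In two dimensions the merge step would be a connected-component pass rather than a run-length scan, but the boundary-extraction idea is the same.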
The present invention may utilize spatial segmentation based on spectral classification to segment the pixels in video frames. Further, correspondences between regions may be determined based on the overlap of spectral regions with regions that were previously segmented.
It has been observed that when video frames are largely composed of spatially connected, continuous color regions that correspond to large regions of objects in the frames, identification and tracking of the colored (or spectral) regions can facilitate the subsequent segmentation of objects in the image sequence.
Background segmentation
The present invention includes a method of building a background model of the video frames based on the temporal maximum of spatial distance measurements between each individual pixel and the objects detected in each video frame. Given a detected object position, a distance transform is applied to produce a scalar distance value for every pixel in the frame. A map of the maximum distance of each pixel is retained over all of the video frames. When a maximum value is initially assigned, or later updated with a different new value, the corresponding pixel for that video frame is retained in a "clear background" frame.
Appearance modeling
A common goal of video processing is to model and preserve the appearance of a sequence of video frames. The present invention is aimed at allowing robust, broadly applicable appearance modeling techniques to be employed through the use of preprocessing. The registration, segmentation, and normalization described previously are expressly suited to this purpose.
The present invention discloses a means of appearance variance modeling. The primary basis of the appearance variance modeling is, in the case of a linear model, the analysis of feature vectors to reveal a compact basis for the development and exploitation of linear correlations. Feature vectors representing the pixels of the spatial intensity field can be assembled into an appearance variance model.
In an alternative embodiment, the appearance variance model is calculated from a segmented subset of the pixels. Further, the feature vector can be separated into several spatially non-overlapping feature vectors. Such a spatial decomposition may be achieved with a spatial tiling. Computational efficiency may be achieved through processing these temporal ensembles without sacrificing the dimensionality reduction of the more general PCA method.
When generating an appearance variance model, the spatial normalization of the intensity field can be employed to decrease the PCA modeling of spatial transformations.
PCA
A preferred means of generating an appearance variance model is through the assembly of the video frames, as an ensemble of pattern vectors, into a training matrix, and then the application of principal component analysis (PCA) to that training matrix. When such an expansion is truncated, the resulting PCA transform matrix is used to analyze and synthesize subsequent video frames. Based on the level of truncation, varying quality levels of the original appearance of the pixels can be achieved.
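A minimal sketch of extracting the dominant component of such a training matrix follows, using power iteration rather than a full eigendecomposition; it is an illustrative stand-in for PCA, not the patent's method, and every name is hypothetical.

```python
# Hypothetical sketch: dominant principal direction of mean-centered data
# (rows are pattern vectors) found by power iteration on the implicit
# covariance matrix.

def first_component(data, iters=100):
    """Return the unit vector along the dominant principal direction."""
    dim = len(data[0])
    means = [sum(row[i] for row in data) / len(data) for i in range(dim)]
    centred = [[v - m for v, m in zip(row, means)] for row in data]
    w = [1.0] * dim
    for _ in range(iters):
        # Multiply w by the covariance matrix, implicitly via the data.
        proj = [sum(c * wi for c, wi in zip(row, w)) for row in centred]
        w = [sum(p * row[i] for p, row in zip(proj, centred))
             for i in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w
```

Truncating a full PCA expansion keeps only the first few such components, trading reconstruction accuracy against the number of retained coefficients, as described above.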
The specific means of construction and decomposition of pattern vectors are well known to those practiced in the art.
Given the spatial segmentation of the salient signal mode from the ambient signal and the spatial normalization of this mode, the appearance of the resulting normalized signal (the pixels themselves) can be factored into linearly correlated components, with a low-rank parameterization expressing a direct trade-off between the approximation error and the bit rate required to represent the pixel appearance.
As shown in Figure 2, the normalized object pixels (242 and 244) can be projected into a vector space, and the linear correspondences can be modeled using a decomposition process (250) in order to yield a dimensionally concise version of the data (252 and 254).
Sequential PCA
PCA encodes patterns into PCA coefficients using the PCA transform. The better a pattern is expressed by the PCA transform, the fewer coefficients are needed to encode it. Recognizing that pattern vectors may degrade over time between the acquisition of the training patterns and the patterns to be encoded, updating the transform can help to offset the degradation. As an alternative to generating a new transform, sequential updating of an existing transform is computationally more efficient in certain cases.
Many state-of-the-art video compression algorithms predict a video frame from one or more other frames. The prediction model is commonly based on partitioning each predicted frame into non-overlapping tiles that are matched with corresponding patches in another frame, together with an associated translational displacement parameterized by an offset motion vector. This spatial displacement, optionally coupled with a frame index, provides a "motion-predicted" version of the tile. If the prediction error is below a certain threshold, the tile's pixels are suitable for residual coding, with a corresponding gain in compression efficiency. Otherwise, the tile's pixels are encoded directly. This tile-based, in other words block-based, motion prediction method models the image by translating tiles of pixels. When the imaged phenomena in the video adhere to this type of modeling, the corresponding encoding efficiency increases. To be consistent with the translation assumption inherent in block-based prediction, this modeling constraint assumes that a certain level of temporal resolution (or frame rate) is present for imaged objects undergoing translational motion. Another requirement of this translational model is that the spatial displacement must be limited for a given temporal resolution; that is, the time difference between the frame from which the prediction is derived and the predicted frame must be a relatively short absolute time. These temporal resolution and motion limitations facilitate the identification and modeling of certain redundant video signal components that are present in the video stream.
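The threshold decision between residual and direct coding can be sketched in one dimension as follows. The SAD error measure, the search set, and the mode names are illustrative assumptions rather than any codec's specification.

```python
# Hypothetical sketch: choose a tile's displacement by minimizing a sum of
# absolute differences, then residual-code only if the match is good enough.

def sad(tile, row, offset):
    """Sum of absolute differences between a tile and the row at offset."""
    return sum(abs(t - row[offset + i]) for i, t in enumerate(tile))

def predict_tile(tile, prev_row, pos, search, threshold):
    """Try candidate displacements around pos; residual-code when the best
    match's error is under the threshold, otherwise code directly."""
    best = min(search, key=lambda d: sad(tile, prev_row, pos + d))
    err = sad(tile, prev_row, pos + best)
    return best, ("residual" if err <= threshold else "intra")
```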
Residual-based decomposition
In MPEG video compression, the current frame is constructed by first motion-compensating the previous frame using motion vectors, then applying a residual update to the compensated blocks, and finally directly encoding, as new blocks, any blocks that lack a sufficient match.
The pixels corresponding to the residual blocks are mapped through the motion vectors onto pixels of the previous frame. The result is a temporal path that pixels trace through the video as the image is synthesized through the successive application of residual values. These pixels are identified as pixels that can be expressed most clearly using PCA.
Occlusion-based decomposition
A further enhancement of the present invention determines whether the motion vectors applied to multiple blocks will cause any pixels from the previous frame to be occluded (covered) by moving pixels. For each such occlusion event, the occluded pixels are segmented into a new layer. Pixels without a history will also be revealed. The revealed pixels are placed onto whichever layer will fit them in the current frame and on which a historical fit can also be accomplished.
The temporal continuity of pixels is supported through the splicing and grafting of pixels to different layers. Once a stable layer model is obtained, the pixels in each layer can be grouped based on membership in a coherent motion model.
Subband temporal quantization
An alternative embodiment of the present invention uses the discrete cosine transform (DCT) or the discrete wavelet transform (DWT) to decompose each frame into subband images. Principal component analysis (PCA) is then applied to each of these "subband" images. The concept is that the subband decomposition of a video frame decreases the spatial variance in any one of the subbands as compared with the original video frame.
For video of a moving object (a person), the spatial variance tends to dominate the variance modeled by PCA. Subband decomposition reduces the spatial variance in any one decomposition image.
In the case of the DCT, the decomposition coefficients of any one subband are arranged spatially into a subband image. For instance, the DC coefficients are taken from each block and arranged into a subband image that looks like a postage-stamp version of the original video. This is repeated for all of the other subbands, and each of the resulting subband images is processed using PCA.
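The regrouping of per-block coefficients into subband images can be sketched as follows, with each block's coefficients flattened to a list; coefficient 0 of every block then forms the DC subband described above. The function name is an illustrative assumption.

```python
# Hypothetical sketch: gather coefficient k of every block into subband
# image k. For DCT blocks, subband 0 (the DC terms) resembles a thumbnail
# of the original frame.

def subband_images(blocks, n_coeffs):
    """Regroup per-block transform coefficients into subband images."""
    return [[blk[k] for blk in blocks] for k in range(n_coeffs)]
```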
In the case of the DWT, the subbands are already arranged in the manner described for the DCT.
In a non-limiting embodiment, the truncation of the PCA coefficients is varied.
Wavelet
When data is decomposed using the discrete wavelet transform (DWT), multiple bandpass data sets result at lower spatial resolutions. The transform process can be applied recursively to the derived data until only a single scalar value is produced. The scaling elements are typically related in a hierarchical parent/child manner in the decomposed structure. The resulting data comprise a multi-resolution hierarchical structure as well as finite differences.
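A sketch of this recursion using the Haar wavelet follows: each level halves the resolution, and the low band is decomposed again until a single scaling value remains. The averaging convention is an illustrative choice.

```python
# Hypothetical sketch: one Haar DWT level plus the recursive pyramid that
# reduces the low band to a single scalar, collecting the high (detail)
# bands along the way.

def haar_step(signal):
    """One Haar level: low-pass averages and high-pass half-differences."""
    low = [(a + b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    high = [(a - b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    return low, high

def haar_pyramid(signal):
    """Recurse on the low band until a single scaling value remains."""
    bands = []
    low = signal
    while len(low) > 1:
        low, high = haar_step(low)
        bands.append(high)
    return low[0], bands
```

The detail bands at successive levels are exactly the parent/child hierarchy of finite differences referenced in the text.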
When the DWT is applied to a spatial intensity field, many naturally occurring imaging phenomena are expressed, with negligible perceptual loss, by the first or second low-bandpass derived data structures, owing to their low spatial frequency. Truncating this hierarchical structure provides a concise representation when the high-frequency spatial data are either absent or regarded as noise.
While PCA can be used to achieve accurate reconstruction with a small number of coefficients, the transform itself can be quite large. To reduce the size of this "initial" transform, an embedded zerotree (EZT) construction of a wavelet decomposition can be used to build progressively more accurate versions of the transform matrix.
Subspace classification
As is well understood by those practicing the art, discretely sampled phenomenological data and derived data can be represented as a set of data vectors corresponding to an algebraic vector space. These data vectors include, in a non-limiting way, the pixels in the normalized appearance of segmented objects, motion parameters, and any two- or three-dimensional structural positions of features or vertices. Each of these vectors exists in a vector space, and geometric analysis of the space can be used to yield concise representations of the sample or parameter vectors. A beneficial geometric condition is typified by parameter vectors that form compact subspaces. When one or more subspaces are mixed, forming a seemingly more complicated single subspace, those constituent subspaces can be difficult to distinguish. Several segmentation methods allow such subspaces to be separated through the examination of the data in a higher-dimensional vector space generated through some interaction (for example, the inner product) of the original vectors.
One method of segmenting a vector space involves projecting the vectors into a Veronese vector space representing polynomials. This method is well known in the prior art as the Generalized PCA, or GPCA, technique. Through such a projection the normals of the polynomials are found and clustered, and the normals associated with the original vectors can be grouped together. A practical example of this technique is the factoring of two-dimensional point correspondences, tracked over time, into a three-dimensional structural model and the motion of that model.
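The degree-2 Veronese embedding can be sketched as follows for 2-vectors. The example checks that points drawn from the union of the two coordinate axes, which satisfy the polynomial x*y = 0, are all orthogonal after embedding to that polynomial's coefficient (normal) vector; function names and the tolerance are illustrative.

```python
# Hypothetical sketch of the Veronese projection underlying GPCA: a union
# of subspaces becomes the zero set of a polynomial, i.e. a hyperplane
# (with normal vector) in the embedded space.

def veronese2(v):
    """Degree-2 Veronese map of a 2-vector: the degree-two monomials."""
    x, y = v
    return (x * x, x * y, y * y)

def fits_normal(points, normal, tol=1e-9):
    """True if every embedded point is orthogonal to the given normal."""
    return all(abs(sum(a * b for a, b in zip(veronese2(p), normal))) < tol
               for p in points)
```

Here the normal (0, 1, 0) encodes the polynomial x*y, whose zero set is exactly the union of the two axes; clustering such normals is how GPCA groups the original vectors by subspace.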
The GPCA technology as clear and definite definition only susceptible result uses when producing data vector with a little noise the time be incomplete.Prior art supposition supervisory routine user gets involved the management to the GPCA algorithm.This restriction limits the potential of this technology greatly.
The present invention has expanded the conceptual foundation of GPCA method, so that noise is being arranged and mixing and handle the identification and the fractionation of a plurality of subspaces when the codimension number exists con vigore.This reform provides unsupervised improvement for this technology on state of the art.
In the prior art, GPCA operates on the polynomial normal vector of Veronese mapping graph, does not consider the positive tangent space of those normal vectors.Method of the present invention expands GPCA, so that the positive tangent space of the orthogonal space of the normal vector that finds and find in the Veronese mapping graph usually.The subspace of using this " positive tangent space " or Veronese mapping graph then is this Veronese mapping graph factorization.
Positive tangent space is to discern by the application of Legendre conversion between position coordinates and tangent planimetric coordinates of the duality of the expression of plane wave expansion and announcement geometric object (tangent line of the polynomial normal of Veronese mapping graph in particular).Discrete Legendre conversion is to be applied to define the form that is tied with the corresponding derivative of normal vector by convextiry analysis.This method is used for splitting data vector by calculating normal vector under the situation that has noise to exist.This convextiry analysis and GPCA merge provides a kind of more strong algorithm.
The present invention utilizes the factorization method of iteration when using GPCA.Specifically, practicable being extended of finding in the prior art by same GPCA method described here based on derivative segmented the overall of grouped data vector.Be repeated to use, this technology can be used for finding out con vigore the candidate's normal vector in the Veronese mapping, uses the GPCA technology of this expansion further to limit those vectors then.With regard to factor decomposition step, from original data set, remove the raw data of the vector correlation connection that segments with that group.Remaining data set can be with this improved GPCA technical Analysis.This improvement is vital for using the GPCA algorithm in unsupervised mode.Figure 11 illustrates the recurrence segmentation of data vector.
People will further confirm, the present invention has in Veronese polynomial expression vector space the improvement of GPCA technology under a plurality of situation bigger advantage is arranged.In addition, when the normal parallel of Veronese mapping graph when prior art is running into the degeneration situation in the vector space axis, method of the present invention can not degenerated.
Figure 10 illustrates the basic polynomial fitting and differentiation method.
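To make the fitting-and-differentiation idea concrete, here is a minimal toy sketch of my own devising (not the patented algorithm; all names are assumptions): points drawn from a union of two lines through the origin in R² are embedded with the degree-2 Veronese map, the vanishing polynomial is recovered from the null space of the embedded data matrix, and the gradient of that polynomial at each sample supplies the normal vector used to segment the samples into their subspaces.

```python
import numpy as np

def veronese2(X):
    # degree-2 Veronese embedding of 2-D points: (x, y) -> (x^2, x*y, y^2)
    x, y = X[:, 0], X[:, 1]
    return np.stack([x * x, x * y, y * y], axis=1)

def vanishing_polynomial(V):
    # coefficients of the polynomial vanishing on the data: the right
    # singular vector for the smallest singular value of the embedding
    _, _, Vt = np.linalg.svd(V)
    return Vt[-1]

def point_normals(X, c):
    # gradient of p(x, y) = c0*x^2 + c1*x*y + c2*y^2, normalized per sample;
    # for clean data it is normal to the line through that sample
    x, y = X[:, 0], X[:, 1]
    g = np.stack([2 * c[0] * x + c[1] * y,
                  c[1] * x + 2 * c[2] * y], axis=1)
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def segment_two_lines(X):
    c = vanishing_polynomial(veronese2(X))
    n = point_normals(X, c)
    # samples whose normals are (anti-)parallel to the first sample's
    # normal lie on the same line as that sample
    return np.abs(n @ n[0]) > 0.9

# union of the lines y = x and y = -x (avoiding axis-parallel normals)
t = np.linspace(1.0, 2.0, 5)
X = np.vstack([np.stack([t, t], axis=1), np.stack([t, -t], axis=1)])
mask = segment_two_lines(X)
print(mask)  # first five True (line y = x), last five False
```

In the recursive scheme described above, the samples selected by `mask` would be removed and the procedure repeated on the remainder.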
Hybrid spatial normalization compression
The present invention exploits the efficiency of conventional block-based motion-prediction encoders by segmenting the video stream into "normalized" component streams. These video streams are then encoded separately so that the conventional codec's assumption of translational motion remains valid. When the normalized video streams are decoded, they are de-normalized into their proper positions and composited together to yield the original video sequence.
In one embodiment, one or more objects are detected in the video stream, and the pixels associated with each detected object are then segmented away from the non-object pixels. Next, global spatial motion models are generated for the object pixels and for the non-object pixels. These global models are used to perform the spatial normalization of the object and non-object pixels. Such normalization effectively removes non-translational motion from the video stream and reduces the occlusion between the resulting groups of images to a minimum. These are two beneficial features of the method of the present invention.
The new images of spatially normalized object and non-object pixels are provided as input to a conventional block-based compression algorithm. When these images are decoded, the parameters of the global motion models are used to de-normalize the decoded images, and the object pixels are composited onto the non-object pixels to yield an approximation of the original video stream.
As shown in Figure 6, the previously detected object instances (206 and 208) for one or more objects (630 and 650) are each processed with a separate instance of a conventional video compression method (632). In addition, the non-object (602) resulting from the segmentation (230) of the objects is also compressed with conventional video compression (632). The result of each of these separate compression encodings (632) is a separate conventional encoded stream (634), one for each video stream. At some point, possibly after transmission, these intermediate encoded streams (234) can be decompressed (636) into a composite of the normalized non-object (610) and the multiple objects (638 and 658). These synthesized pixels can be de-normalized (640) into their de-normalized versions (622, 642 and 662), placing the pixels spatially in their correct positions relative to the other pixels, so that a compositing process (670) can combine the object and non-object pixels into a full composited frame (672).
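The normalization/de-normalization round trip can be sketched with a toy nearest-neighbour affine warp (a simplified stand-in of my own devising; the patent does not specify the resampling at this level, and all function names are assumptions): normalizing a frame with a global affine model and later de-normalizing with the inverse model recovers the original pixels wherever the warp stays inside the frame.

```python
import numpy as np

def warp_affine(frame, A, b):
    """Nearest-neighbour warp: out[y, x] = frame sampled at A @ (x, y) + b.
    Sample positions outside the frame are clamped to the border."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(float)   # (h, w, 2) as (x, y)
    src = pts @ A.T + b                               # sampling positions
    sx = np.clip(np.round(src[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(src[..., 1]).astype(int), 0, h - 1)
    return frame[sy, sx]

def normalize(frame, A, b):
    # remove the global motion described by (A, b)
    return warp_affine(frame, A, b)

def denormalize(frame, A, b):
    # reapply the global motion: the inverse of the normalizing warp
    Ainv = np.linalg.inv(A)
    return warp_affine(frame, Ainv, -Ainv @ b)

frame = np.arange(64, dtype=float).reshape(8, 8)
A, b = np.eye(2), np.array([2.0, 0.0])   # toy global model: pure translation
restored = denormalize(normalize(frame, A, b), A, b)
# interior pixels survive the round trip exactly
print(np.array_equal(restored[:, 2:], frame[:, 2:]))  # True
```

In the scheme of Figure 6, the conventional codec (632/636) would sit between the `normalize` and `denormalize` calls, operating on the normalized frames.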
Integration of a hybrid codec
When the conventional block-based compression algorithm is combined with the normalization-segmentation scheme described here, several methods of the present invention result. First, specialized data structures and communication protocols are required.
The primary data structures include the global spatial deformation parameters and the object segmentation specification masks. The primary communication protocols are layers that include the transmission of the global spatial deformation parameters and the object segmentation specification masks.
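One plausible shape for those data structures (the record layout, names, and serialization here are my own assumptions for illustration; the patent fixes no encoding) is a per-frame record carrying the global affine deformation parameters and the boolean segmentation mask, serialized alongside the conventional encoded stream:

```python
import io
from dataclasses import dataclass
import numpy as np

@dataclass
class SideInfo:
    """Hypothetical per-frame side information for the hybrid codec:
    the global deformation parameters plus the segmentation mask that
    must travel alongside the conventional encoded stream."""
    affine: np.ndarray   # 2x3 global spatial deformation (affine) parameters
    mask: np.ndarray     # boolean object segmentation mask, one entry per pixel

    def pack(self) -> bytes:
        buf = io.BytesIO()
        np.savez(buf, affine=self.affine, mask=self.mask)
        return buf.getvalue()

    @staticmethod
    def unpack(data: bytes) -> "SideInfo":
        z = np.load(io.BytesIO(data))
        return SideInfo(z["affine"], z["mask"])

info = SideInfo(np.array([[1.0, 0.0, 2.5], [0.0, 1.0, -1.0]]),
                np.zeros((4, 4), dtype=bool))
round_trip = SideInfo.unpack(info.pack())
print(np.array_equal(round_trip.affine, info.affine))  # True
```

A real protocol layer would add framing and compression for the mask; the round trip above only demonstrates that the two structures named in the text travel together losslessly.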

Claims (14)

1. A computer apparatus for generating an encoded form of video signal data from a plurality of video frames, the apparatus comprising:
means for identifying corresponding elements of an object between two or more frames in a sequence of video frames of the target video data;
means for modeling the correspondences to generate a structural model, the structural model being used to segment the object from the non-object background in the sequence of video frames, wherein the structural model is established by qualifying contour vertex positions within regions of accumulated error of the image intensity consistent with the object;
means for segmenting and resampling the pixel data of the video frames associated with the object, the segmenting and resampling means utilizing the structural model;
means for restoring the spatial positions of the resampled pixel data, the restoring means utilizing the structural model;
wherein the object is one or more objects; and
wherein the resampled pixel data is an intermediate form of the video data.
2. The apparatus of claim 1, wherein the object is detected and tracked, comprising:
means for detecting the object in the sequence of video frames;
means for tracking the object through two or more frames of the sequence of video frames;
wherein the object detecting and tracking means comprises a Viola/Jones face detection algorithm.
3. The apparatus of claim 1, wherein:
the segmenting and resampling means comprises a segmenter; and
the object is segmented out of the video frames using a spatial segmentation method, the spatial segmentation method being carried out by the following:
the segmenter separating the pixel data associated with the object from the other pixel data in the sequence of video frames;
wherein the restored resampled pixel data is combined with the associated segmented data set to form the original video frames; and
wherein the segmenter employs temporal integration.
4. The apparatus of claim 1, wherein the structural model is factored into a global model, comprising:
contour modeling means for integrating contour measurements into a global motion model;
wherein the contour modeling means comprises a robust sampling consensus solution for a two-dimensional affine motion model; and
wherein the contour modeling means comprises a sample population based on finite differences generated from block-based motion estimates between two or more frames of the sequence of video frames.
5. The apparatus of claim 1, wherein the intermediate form of the video data is further encoded, comprising:
means for decomposing the object pixel data into an encoded representation;
means for recomposing the object pixel data from the encoded representation;
wherein the decomposing means comprises principal component analysis; and
wherein the recomposing means comprises principal component analysis.
6. The apparatus of claim 5, wherein the non-object pixels of the frames are modeled in the same manner as the object pixels, the non-object pixels corresponding to the remainder of the frame when the other objects are removed.
7. The apparatus of claim 5, wherein the segmented and resampled pixel data is used with a conventional video compression/decompression process, comprising:
means for supplying the resampled pixel data to the conventional video compression/decompression process as normal video data;
means for storing and transmitting the structural model correspondence data along with the corresponding encoded video data;
whereby the compression/decompression process can achieve greater compression efficiency.
8. The apparatus of claim 1, wherein the structural model is factored into a local deformation model, and wherein the local deformation model is implemented by the following:
means for defining a two-dimensional mesh overlaying the pixels corresponding to the object;
means for generating local motion models from contour measurements;
wherein the mesh defining means is based on a regular grid of vertices and edges; and
wherein the contour measurements comprise vertex displacements based on finite differences generated from block-based motion estimates between two or more frames of the sequence of video frames.
9. The apparatus of claim 8, wherein the vertices correspond to discrete image features, comprising:
means for identifying salient image features corresponding to the object; and
wherein the identifying means analyzes the Harris response of the image gradient.
10. The apparatus of claim 1, wherein the structural model is used to segment the object from the non-object background by qualifying contour vertex positions within regions of the frames associated with accumulated error.
11. The apparatus of claim 1, wherein modeling the correspondences to generate the structural model further comprises means for fitting contour positions from the structural model to mesh vertices so as to provide consistent spatial contours.
12. A method of generating an encoded form of video signal data from a plurality of video frames, the method comprising the steps of:
identifying corresponding elements of an object between two or more frames in a sequence of video frames of the target video data;
modeling the correspondences to generate a structural model, the structural model being used to segment the object from the non-object background in the sequence of video frames, wherein the structural model is established by qualifying structural model vertex positions within regions of accumulated error of the image intensity consistent with the object;
using the structural model, segmenting and resampling the pixel data of the video frames associated with the object, wherein the resampled data is an intermediate form of the video data; and
using the structural model, restoring the spatial positions of the resampled pixel data.
13. according to the method for claim 12, wherein skeleton pattern be used to by determine the profile summit with picture that cumulative errors is associated in the position split object and non-object background.
14. The method of claim 12, wherein modeling the correspondences to generate the structural model further comprises the step of fitting contour positions from the structural model to mesh vertices so as to provide consistent spatial contours.
CN2006800104697A 2005-01-28 2006-01-20 Apparatus and method for processing video data Expired - Fee Related CN101151640B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US64809405P 2005-01-28 2005-01-28
US60/648,094 2005-01-28
US65381005P 2005-02-17 2005-02-17
US60/653,810 2005-02-17
PCT/US2006/001907 WO2006083567A1 (en) 2005-01-28 2006-01-20 Apparatus and method for processing video data

Publications (2)

Publication Number Publication Date
CN101151640A CN101151640A (en) 2008-03-26
CN101151640B true CN101151640B (en) 2010-12-08

Family

ID=36777556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006800104697A Expired - Fee Related CN101151640B (en) 2005-01-28 2006-01-20 Apparatus and method for processing video data

Country Status (5)

Country Link
EP (1) EP1846892A4 (en)
JP (1) JP2008529414A (en)
KR (1) KR20070107722A (en)
CN (1) CN101151640B (en)
WO (1) WO2006083567A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI560650B (en) * 2012-09-12 2016-12-01 Realtek Semiconductor Corp Image processing method, image output processing method, and image reception processing method

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9578345B2 (en) 2005-03-31 2017-02-21 Euclid Discoveries, Llc Model-based video encoding and decoding
US9743078B2 (en) 2004-07-30 2017-08-22 Euclid Discoveries, Llc Standards-compliant model-based video encoding and decoding
US9532069B2 (en) 2004-07-30 2016-12-27 Euclid Discoveries, Llc Video compression repository and model reuse
US8902971B2 (en) 2004-07-30 2014-12-02 Euclid Discoveries, Llc Video compression repository and model reuse
KR101216161B1 (en) * 2005-03-31 2012-12-27 유클리드 디스커버리스, 엘엘씨 Apparatus and method for processing video data
WO2008091484A2 (en) 2007-01-23 2008-07-31 Euclid Discoveries, Llc Object archival systems and methods
CN101622876B (en) 2007-01-23 2012-05-30 欧几里得发现有限责任公司 Systems and methods for providing personal video services
JP2010526455A (en) 2007-01-23 2010-07-29 ユークリッド・ディスカバリーズ・エルエルシー Computer method and apparatus for processing image data
TW201016016A (en) 2008-10-07 2010-04-16 Euclid Discoveries Llc Feature-based video compression
WO2010103850A1 (en) * 2009-03-13 2010-09-16 日本電気株式会社 Image identifier extraction device
US8565479B2 (en) 2009-08-13 2013-10-22 Primesense Ltd. Extraction of skeletons from 3D maps
US8594425B2 (en) 2010-05-31 2013-11-26 Primesense Ltd. Analysis of three-dimensional scenes
KR102050444B1 (en) 2013-04-30 2019-11-29 엘지디스플레이 주식회사 Touch input system and method for detecting touch using the same
KR101463194B1 (en) * 2013-05-09 2014-11-21 한국과학기술원 System and method for efficient approach
US10097851B2 (en) 2014-03-10 2018-10-09 Euclid Discoveries, Llc Perceptual optimization for model-based video encoding
CA2942336A1 (en) 2014-03-10 2015-09-17 Euclid Discoveries, Llc Continuous block tracking for temporal prediction in video encoding
US10091507B2 (en) 2014-03-10 2018-10-02 Euclid Discoveries, Llc Perceptual optimization for model-based video encoding
KR101434514B1 (en) 2014-03-21 2014-08-26 (주) 골프존 Time synchronization method for data of different kinds of devices and data processing device for generating time-synchronized data
US10121062B2 (en) * 2014-11-03 2018-11-06 Koninklijke Philips N.V. Device, system and method for automated detection of orientation and/or location of a person

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1333976A (en) * 1999-01-29 2002-01-30 三菱电机株式会社 Method of image features encoding and method of image search
US6711278B1 (en) * 1998-09-10 2004-03-23 Microsoft Corporation Tracking semantic objects in vector image sequences

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5592228A (en) * 1993-03-04 1997-01-07 Kabushiki Kaisha Toshiba Video encoder using global motion estimation and polygonal patch motion estimation
KR100235343B1 (en) * 1994-12-29 1999-12-15 전주범 Apparatus for calculating motion vector in encoder using segmentation method
US6037988A (en) * 1996-03-22 2000-03-14 Microsoft Corp Method for generating sprites for object-based coding systems using masks and rounding average
KR100611999B1 (en) * 1999-08-27 2006-08-11 삼성전자주식회사 Motion compensating method in object based quad-tree mesh using greedy algorithm
KR100455294B1 (en) * 2002-12-06 2004-11-06 삼성전자주식회사 Method for detecting user and detecting motion, and apparatus for detecting user within security system
FR2852773A1 (en) * 2003-03-20 2004-09-24 France Telecom Video image sequence coding method, involves applying wavelet coding on different images obtained by comparison between moving image and estimated image corresponding to moving image
CN101036150B (en) * 2004-07-30 2010-06-09 欧几里得发现有限责任公司 Apparatus and method for processing image data
CN101061489B (en) * 2004-09-21 2011-09-07 欧几里得发现有限责任公司 Apparatus and method for processing video data
KR20070086350A (en) * 2004-11-17 2007-08-27 유클리드 디스커버리스, 엘엘씨 Apparatus and method for processing video data
KR101216161B1 (en) * 2005-03-31 2012-12-27 유클리드 디스커버리스, 엘엘씨 Apparatus and method for processing video data


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BILGE GUNSEL et al. Content-based access to video objects: Temporal segmentation, visual summarization, and feature extraction. Signal Processing 66(2), 1998, 261-280. *
JP Laid-Open Publication No. 2002-74375 A, 2002.03.15
Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition, vol. 1, 2001, 1-9. *


Also Published As

Publication number Publication date
WO2006083567A1 (en) 2006-08-10
EP1846892A4 (en) 2011-04-06
EP1846892A1 (en) 2007-10-24
AU2006211563A1 (en) 2006-08-10
KR20070107722A (en) 2007-11-07
CN101151640A (en) 2008-03-26
JP2008529414A (en) 2008-07-31

Similar Documents

Publication Publication Date Title
CN101151640B (en) Apparatus and method for processing video data
CN101167363B (en) Method for processing video data
CN101103364B (en) Apparatus and method for processing video data
CN101536525B (en) Apparatus and method for processing video data
CN101061489B (en) Apparatus and method for processing video data
CN101036150B (en) Apparatus and method for processing image data
US7457472B2 (en) Apparatus and method for processing video data
CN101939991A (en) Computer method and apparatus for processing image data
US7508990B2 (en) Apparatus and method for processing video data
Jung et al. Progressive modeling of 3D building rooftops from airborne Lidar and imagery

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20101208

Termination date: 20200120