US20190200003A1 - System and method for 3d space-dimension based image processing - Google Patents
- Publication number
- US20190200003A1 (application number US16/149,457)
- Authority
- US
- United States
- Prior art keywords
- skeleton
- image
- elements
- dimensional
- further adapted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G06K9/3216—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/245—Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/167—Synchronising or controlling image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/189—Recording image signals; Reproducing recorded image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
-
- G06K2209/40—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/12—Acquisition of 3D measurements of objects
Definitions
- the present invention relates to photography, image processing and animation, and more particularly, but not exclusively to three dimensional (3D) photography, three dimensional image processing and three dimensional animation.
- the present art in three-dimensional photography is based on the time dimension.
- the present invention relates to several different fields that belong to the world of 3D imagery and image processing, for example: Stereoscopic images, spherical photographing systems, 3D computer animation, 3D photography, and 3D image processing algorithms.
- Conventional 3D-stereoscopic photographing employs twin cameras having parallel optical axes and a fixed distance between their aligned lenses. These twin cameras produce a pair of images which can be displayed by any of the known in the art techniques for stereoscopic displaying and viewing. These techniques are based, in general, on the principle that the image taken by a right lens is displayed to the right eye of a viewer and the image taken by the left lens is displayed to the left eye of the viewer.
- U.S. Pat. No. 6,906,687 assigned to Texas Instruments Incorporated, entitled “Digital formatter for 3-dimensional display applications” discloses a 3D digital projection display that uses a quadruple memory buffer to store and read processed video data for both right-eye and left-eye display.
- video data is processed at a 48-frame/sec rate and read out twice (repeated) to provide a flash rate of 96 (up to 120) frames/sec, which is above the display flicker threshold.
- the data is then synchronized with a headset or goggles with the right-eye and left-eye frames being precisely out-of-phase to produce a perceived 3-D image.
- Spherical or panoramic photographing is traditionally done either by a very wide-angle lens, such as a “fish-eye” lens, or by “stitching” together overlapping adjacent images to cover a wide field of vision, up to fully spherical fields of vision.
- the panoramic or spherical images obtained by using such techniques can be two dimensional images or stereoscopic images, giving to the viewer a perception of depth.
- These images can also be computed three dimensional (3D) images, in the sense that the distance of every pixel in the image from the camera is computed using methods known in the art, such as triangulation methods.
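By way of illustration only, the per-pixel distance computation can be sketched for the simplest triangulation case: a rectified pair of cameras with parallel optical axes, where depth follows from the horizontal disparity of a feature between the two images. The function name and parameters below are hypothetical and are not taken from the disclosure; the focal length and baseline are assumed to be known from calibration.

```python
def depth_from_disparity(focal_px: float, baseline: float, disparity_px: float) -> float:
    """Depth along the optical axis for a rectified, parallel-axis camera pair.

    focal_px:     focal length expressed in pixels (assumed known from calibration)
    baseline:     distance between the two lens centers; depth uses the same unit
    disparity_px: horizontal shift of the same feature between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Similar triangles: depth / baseline == focal length / disparity
    return focal_px * baseline / disparity_px


# Hypothetical numbers, chosen only to show the call.
print(depth_from_disparity(focal_px=800.0, baseline=0.5, disparity_px=20.0))
```

Smaller disparities correspond to more distant points, so depth resolution degrades with distance for a fixed baseline.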
- U.S. Pat. No. 6,833,843, assigned to Tempest Microsystems Incorporated, teaches an image acquisition and viewing system that employs a fish-eye lens and an imager, such as a charge coupled device (CCD), to obtain a wide angle image, e.g., an image of a hemispherical field of view.
- the application teaches an imaging system for obtaining full stereoscopic spherical images of the visual environment surrounding a viewer, 360 degrees both horizontally and vertically. Displaying the images by means suitable for stereoscopic displaying, gives the viewers the ability to look everywhere around them, as well as up and down, while having stereoscopic depth perception of the displayed images.
- the disclosure teaches an array of cameras, wherein the lenses of the cameras are situated on a curved surface, pointing out from common centers of said curved surface.
- the captured images are arranged and processed to create sets of stereoscopic image pairs, wherein one image of each pair is designated for the observer's right eye and the second image for his left eye, thus creating a three dimensional perception.
- 3D computer animation relates to the field of “Virtual Reality”, that has gained popularity in recent years.
- 3D virtual reality is constructed from real images, with which synthetically made images can be interlaced.
- 3D virtual reality demands 3D computation of the photographed image to create the 3D information of the elements being shot.
- 3DV systems Incorporated https://www.3dvsystems.com/
- the Zcam™ camera is a uniquely designed camera which employs a light wall having a proper width.
- the light wall may be generated, for example, as a square laser pulse.
- the imprint carries all the information required for the reconstruction of the depth map.
- 3D computation of photographed images may also be provided using passive methods.
- Passive methods for depth construction may use triangulation techniques that make use of at least two known scene viewpoints. Corresponding features are identified, and rays are intersected to find the 3D position of each feature.
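The ray intersection step can be sketched as follows. This is a minimal illustration, not the method of any cited system: because measured rays rarely meet exactly, the sketch uses the common midpoint approach and returns the point halfway between the closest points of the two rays. The function name and the representation of a ray as a camera center plus a direction are assumptions made for the example.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Approximate the intersection of two viewing rays c + s*d.

    c1, c2: camera centers (x, y, z); d1, d2: ray directions toward the feature.
    Returns the midpoint of the shortest segment joining the two rays.
    """
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    # Ray parameters that minimize the distance between the two rays
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two hypothetical viewpoints one unit apart, both looking at the same feature.
print(triangulate_midpoint((0, 0, 0), (0.5, 0, 2), (1, 0, 0), (-0.5, 0, 2)))
```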
- Space-time stereo adds a temporal dimension to the neighborhoods used in the spatial matching function. Adding temporal stereo, using multiple frames across time, we match a single pixel from the first image against the second image. This can also be done by matching space-time trajectories of moving objects, in contrast to matching interest points (corners), as done in regular feature-based image-to-image matching techniques. The sequences are matched in space and time by enforcing consistent matching of all points along corresponding space-time trajectories, also obtaining sub-frame temporal correspondence (synchronization) between two video sequences.
- A 3D computer generated image environment is a virtual world, a designated area, created using 3D computer generated imagery software.
- the virtual world is created in a designated area where every point in the virtual world is a computer generated point. 2D or 3D real images may also be interlaced in this virtual world.
- FIG. 1 illustrates a virtual world, according to techniques known in the art.
- FIG. 2 shows a prior art virtual studio.
- the very opposite thing can also be done by monitoring a set of cameras in a predetermined space such as a basketball field, where known fixed points are predetermined, and synchronized fixed points are created in a computer generated 3D world.
- The ORAD Incorporated CyberSport™ product provides for live insertion of tied-to-the-field 3D graphics for sport events taking place in a basketball field, a football field, and the like, creating the illusion that the inserted graphic objects are integral parts of the event.
- an apparatus for 3D representation of image data comprising:
- a skeleton insertion unit associated with said structure identifier, for associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional motion and structure understanding of said image data.
- a method for 3D representation of image data comprising:
- a recording apparatus for recording input data with depth information comprising:
- a skeleton insertion unit associated with said structure identifier, for associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional motion and structural understanding of said image data
- a storage unit for recording said input data in relation to at least one of said skeleton elements and a background.
- a compression apparatus for compressing input data with depth information, comprising:
- a skeleton insertion unit associated with said structure identifier, for associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional motion and structural understanding of said image data
- a compression unit for outputting said input data in relation to at least one of said skeleton elements and a background, such as to provide compression of said input data and to provide depth information thereof.
- a recording method for recording input data with depth information comprising:
- a compression method for compressing input data with depth information comprising:
- Implementation of the method and system of the present invention involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof.
- several selected steps could be implemented by hardware or by software on any operating system of any firmware or a combination thereof.
- selected steps of the invention could be implemented as a chip or a circuit.
- selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system.
- selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.
- FIG. 1 is a photograph of prior art 3D computer generated virtual figures.
- FIG. 2 a is a first photograph of a prior art virtual studio.
- FIG. 2 b is a second photograph of a prior art virtual studio.
- FIG. 3 is a simplified block diagram of an apparatus for 3D image analysis according to a first preferred embodiment of the present invention.
- FIG. 4 is a simplified flow chart illustrating a procedure for inserting skeleton elements into a structural element identified from an image or series of images according to a preferred embodiment of the present invention.
- FIG. 5 is a simplified flow chart illustrating a modification of the procedure of FIG. 4 for the case of a series of elements being recognized as a single body.
- FIG. 6 is a simplified flow chart showing skeleton insertion and its subsequent use in providing a three-dimensional understanding of 2D image data according to a preferred embodiment of the present invention.
- FIG. 7 is a simplified diagram illustrating a multiple-layer format to provide a 3D understanding of a 2D image.
- FIG. 8 is a flow diagram illustrating two methods of obtaining object identification from a 2 dimensional image in which to insert skeleton elements according to a preferred embodiment of the present invention.
- FIG. 9 is a simplified flow chart illustrating the process of using a skeleton according to the present embodiments in order to provide a 3D understanding of a 2D moving element in the image.
- FIG. 10 is a balloon chart illustrating a series of exemplary applications of the present embodiments of the present invention.
- FIG. 11 is a depth map, illustrating possible imaging processes in accordance with a preferred embodiment of the present invention.
- FIG. 12 is a skeleton attached to a depth map, illustrating possible imaging processes in accordance with a preferred embodiment of the present invention.
- FIG. 13 is a skeleton demonstrating the process of deformation of its structure illustrating possible imaging processes in accordance with a preferred embodiment of the present invention.
- FIG. 14 illustrates how a structural element in a series of images should be processed from a frame in which it is in a position of minimal distortion.
- FIG. 15 illustrates a photographed image supplying 3D information from a specific direction.
- the present embodiments comprise a method and an apparatus for transforming time based sequences of photographed images into space based three dimensional (3D) models, enabling real-time and non real-time applications such as 3D real image animation, new time based sequences, image processing manipulations, 2D/3D motion capture and so on.
- the present embodiments identify structures within two-dimensional or partial three-dimensional data and associate three-dimensional skeleton or skeleton elements therewith.
- the skeleton or skeleton elements may be applied at a separate level from the original data, allowing the levels to be projected onto each other to provide accurate depth information to the image data.
- FIG. 3 is a simplified block diagram illustrating an apparatus for providing a three-dimensional understanding to image data.
- the image data may be two-dimensional or partial three-dimensional information, and the understanding is a unified understanding of three-dimensional structures and three-dimensional motion.
- the apparatus of FIG. 3 comprises a structure identifier 302 for identifying structures within the image data.
- the structures may be identified automatically using artificial intelligence or they may be identified with the help of user input, or a combination of both.
- the apparatus further comprises a skeleton insertion unit 304, associated with the structure identifier 302, which associates or attaches three-dimensional skeleton elements with the structures identified in the image data.
- the skeleton elements may be blocks, tubes, spheres, ovals, or any other elemental or more complex three-dimensional geometric entities. The elements have the ability to add joints to themselves and attach to each other.
- the three-dimensional shape of the element is imparted to the structure identified as above and the skeleton element is now able to move or otherwise coexist with the structures to provide a three-dimensional understanding of the structure.
- the skeleton element has a known three-dimensional structure, meaning it extends in the X, Y and Z dimensions.
- the structure's movement can be seen in the X and Y dimensions, and details of the structure's behavior in the Z dimension can be inferred from its association with the skeleton element.
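As a simplified illustration of inferring Z behavior from an associated skeleton element, consider a tube element whose axis lies parallel to the image plane. A pixel observed at a given lateral distance from the projected axis can be assigned the depth of the tube surface facing the camera. The function name, the parallel-axis assumption and the convention that smaller depth means closer to the camera are all assumptions of this sketch, not details stated in the disclosure.

```python
import math


def tube_surface_depth(axis_depth, radius, lateral_offset):
    """Depth of the camera-facing surface of a tube skeleton element.

    axis_depth:     distance from the camera to the tube axis
    radius:         known radius of the tube element
    lateral_offset: in-image distance of the pixel from the projected axis
    Returns None when the pixel falls outside the tube's silhouette.
    """
    if abs(lateral_offset) > radius:
        return None
    # Circle cross-section: offset^2 + bulge^2 == radius^2
    bulge = math.sqrt(radius ** 2 - lateral_offset ** 2)
    return axis_depth - bulge


# Hypothetical values: tube axis 100 units away, radius 5, pixel 3 units off-axis.
print(tube_surface_depth(100.0, 5.0, 3.0))
```

The pixel positions come from the X and Y data in the image, while the depth variation across the structure comes entirely from the known shape of the element.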
- the structure identifier is preferably able to recognize not just individual structures but also complex bodies made up of interrelated structures, interrelated meaning that they have defined movement relations between them.
- An example is the human body, which consists of structures such as the forearm and the upper arm.
- the forearm pivots on the end of the upper arm in a defined manner, which can be modeled by the skeleton elements of the present embodiments.
- the skeleton insertion unit attempts to construct a correspondingly complex skeleton in which movement relations between the skeleton elements are defined as for the complex body.
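A defined movement relation of this kind can be sketched as a hinge joint between two skeleton elements. The example below is a planar simplification with hypothetical names: the upper arm rotates about the shoulder, and the forearm rotates about the elbow relative to the upper arm, so moving the upper arm carries the forearm with it.

```python
import math


def arm_pose(shoulder, upper_len, forearm_len, shoulder_angle, elbow_angle):
    """Positions of the elbow and wrist for a two-element planar arm.

    Angles are in radians. elbow_angle is measured relative to the upper arm,
    which is what makes the forearm pivot on the end of the upper arm.
    """
    ex = shoulder[0] + upper_len * math.cos(shoulder_angle)
    ey = shoulder[1] + upper_len * math.sin(shoulder_angle)
    total = shoulder_angle + elbow_angle  # forearm direction in image coordinates
    wx = ex + forearm_len * math.cos(total)
    wy = ey + forearm_len * math.sin(total)
    return (ex, ey), (wx, wy)


# Upper arm held horizontal, forearm bent at a right angle.
elbow, wrist = arm_pose((0.0, 0.0), 3.0, 2.0, 0.0, math.pi / 2)
print(elbow, wrist)
```

A full skeleton would chain many such relations in three dimensions; the two-element case is shown only to make the dependency between elements concrete.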
- the system may recognize the complex body as, say, a human, and may have preset skeletons with the necessary elements and relationships preprogrammed in.
- the three-dimensional aspects of the complex body including both structure and motion can be understood. That is to say three-dimensional structure and motion within the image can be understood from a priori knowledge of an identified body. Furthermore, if the depth information for the object is known within the system, based on the skeleton, then the processing load for three-dimensional processing of the image may be significantly reduced.
- the apparatus may further comprise a movement analyzer unit 306 which may analyze relative movement within the original image data to provide movement relation definitions for the skeleton insertion unit 304 .
- the movement analyzer is able to recognize structures within the mass of pixels that make up the image and to identify movement among groups of pixels, using tracking techniques that are known in the art.
- a skeleton store 308 stores preset skeletons for use with recognized complex bodies.
- the store may for example store a preset skeleton for a human, which is used every time a human is recognized in the image data.
- the skeleton insertion unit attempts to form a skeleton from scratch by inserting geometric elements.
- the geometric elements may need to be rotated and distorted before they fit.
- a rotation unit 310, which allows the selected element to be rotated until it fits the image data.
- a distortion unit 312 which allows the element to be distorted in various ways to fit the data.
- the rotation and distortion units may be operated through user input or may operate automatically.
- a tracking unit 314 can track movement within the initial image data and move the skeleton with the image so that three-dimensional information of the motion is available.
- a process of projecting between the skeleton and the image data can be carried out, and it is possible thereby to obtain three-dimensional and movement information from a single camera.
- An animation unit 316 allows movement to be applied via the skeleton so that a figure or other object once modeled can be animated.
- the apparatus will not necessarily have both the tracking unit and the animation unit.
- An animation application would typically have the animation unit but may dispense with the tracking unit, whereas a video capture application may have a tracking unit and dispense with the animation unit.
- Rendering unit 318 is connected to either or both of the tracking unit and animation unit and renders a scene being modeled for viewing from a requested direction. That is to say, the advantage of having the 3D data is that the modeled objects etc can be viewed from any angle and not just the angle in which an image may have been initially taken. The rendering unit simply needs to make a projection of the three-dimensional model onto a plane in the requested viewing direction, apply texture etc as will be explained in greater detail below, and the scene can be viewed from the given direction.
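The projection step alone can be sketched as an orthographic projection of model points onto a plane perpendicular to the requested viewing direction. Texture application is omitted, and the function names and the choice of an orthographic rather than perspective projection are assumptions of this sketch.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def view_basis(direction):
    """Two orthonormal axes spanning the plane perpendicular to `direction`."""
    n = _normalize(direction)
    # Pick a helper 'up' vector that is not parallel to the viewing direction.
    up = (1.0, 0.0, 0.0) if abs(n[1]) > 0.999 else (0.0, 1.0, 0.0)
    u = _normalize(_cross(up, n))
    v = _cross(n, u)
    return u, v


def project(points, direction):
    """Orthographic projection of 3D points onto the viewing plane."""
    u, v = view_basis(direction)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return [(dot(p, u), dot(p, v)) for p in points]


model = [(2.0, 3.0, 5.0)]
print(project(model, (0.0, 0.0, 1.0)))  # viewed along the Z axis
print(project(model, (1.0, 0.0, 0.0)))  # same model viewed along the X axis
```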
- FIG. 4 is a simplified diagram illustrating a process for obtaining a three-dimensional model including movement data, according to a preferred embodiment of the present invention.
- Image data is obtained in stage 402; this data may be 2D data or partial or complete 3D data. Elements within the data are identified. Skeleton elements are inserted for association with the identified structural elements in stage 406. Then, in stage 408, the skeleton element is rotated, translated or scaled in order to fit the identified structural element. Translation includes distorting. Movement relations are then defined between the skeleton elements, as per the information available, in stage 410.
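The fitting of stage 408 can be illustrated in two dimensions for the simplest possible element, a line segment. Given the endpoints of a template element and the endpoints of the identified structure, the sketch below derives the scale, rotation and translation that map one onto the other. Distortion is not modeled, and the function name and the segment representation are hypothetical.

```python
import math


def fit_segment(template, target):
    """Scale, rotation and translation mapping a template segment onto a target.

    template, target: pairs of 2D endpoints ((x0, y0), (x1, y1)).
    Returns (scale, angle_in_radians, apply), where apply maps any template
    point into the target's coordinates.
    """
    (a0, a1), (b0, b1) = template, target
    ax, ay = a1[0] - a0[0], a1[1] - a0[1]
    bx, by = b1[0] - b0[0], b1[1] - b0[1]
    scale = math.hypot(bx, by) / math.hypot(ax, ay)
    angle = math.atan2(by, bx) - math.atan2(ay, ax)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def apply(p):
        # Move to the template origin, scale, rotate, then translate to the target.
        x, y = (p[0] - a0[0]) * scale, (p[1] - a0[1]) * scale
        return (b0[0] + cos_a * x - sin_a * y, b0[1] + sin_a * x + cos_a * y)

    return scale, angle, apply


scale, angle, apply = fit_segment(((0, 0), (1, 0)), ((2, 2), (2, 4)))
print(scale, angle, apply((0.5, 0.0)))
```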
- FIG. 5 is a variation of the flow chart of FIG. 4 for the case in which a complex body such as a human is recognized.
- the initial data is obtained, in stage 502 .
- the complex body is identified from the initial data.
- the appropriate skeleton is retrieved from the data store in stage 504 and is inserted in association with the complex body in stage 506 .
- the skeleton is rotated, translated or scaled. Translation includes distorting. The result is a fit between the identified structure and the skeleton elements. It is noted that the very attempt to fit skeleton elements to the complex body, as in FIG. 4 above, may lead to the identification of the complex body as, say, a human, so that an appropriate complex skeleton may be selected.
- Stages 602 to 608 are as previously described.
- Stage 612 involves the skeleton modeling the 3D object so that movement of the object is projected onto the skeleton and/or movement of the skeleton is projected onto 2D image data.
- the image data is available for rendering from any desired direction.
- FIG. 7 is a simplified diagram showing how image data may be managed in a layered structure according to a preferred embodiment of the present invention.
- the two-dimensional or partial or complete three-dimensional image data is stored in a first layer 702 .
- the three-dimensional skeleton is stored in an underlying layer 706 .
- a two-dimensional projection of the three dimensional skeleton exists in a virtual layer 704 in between.
- An ostensible two-dimensional image can be viewed from a direction different from that of the original 2D image in layer 702 by projection of the three-dimensional skeleton into that direction.
- the projection is stored in virtual layer 704 .
- FIG. 8 is a simplified diagram illustrating how objects and structures in the initial data may be recognized for the purpose of assignment of the skeleton elements.
- Two paths are shown, the first being a manual path, stage 802, in which a user simply identifies the elements, bodies and complex bodies to the apparatus.
- stage 804 a manual path
- 806 an automatic path
- 808 an automatic path
- a mixture of the two processes may be used. For example a user may point out some elements or a complex body to the system, and the system automatically identifies other elements or identifies the individual elements within the complex body.
- FIG. 9 is a simplified diagram illustrating the iterative nature of extrapolation of movement into the third dimension using embodiments of the present invention.
- Pixels are tracked in the initial 2D or partial or complete 3D image, stage 902 .
- the underlying skeleton is moving in accordance with the motion of points tracked in its associated structure, stage 904 , and an extrapolation is carried out in stage 906 to determine the three-dimensional position of the pixels in the initial image. It is noted that stages 904 and 906 are concurrent and affect each other, hence they are indicated by double arrows in the figure.
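One frame of this loop can be sketched under strong simplifying assumptions that are not part of the disclosure: the skeleton is a single chain of joints with known element lengths, projection is orthographic, and the depth of the first joint is carried over from the previous frame. The skeleton follows the tracked 2D points, and each joint's depth is extrapolated from the element length, choosing the solution closest to the previous frame. All names are hypothetical.

```python
import math


def update_skeleton(prev_joints, tracked_xy, bone_lengths):
    """Advance a chain skeleton by one frame.

    prev_joints:  [(x, y, z), ...] joint positions from the previous frame
    tracked_xy:   [(x, y), ...] tracked image positions of the same joints
    bone_lengths: known length of each element between consecutive joints
    """
    # The root joint follows the tracked point and keeps its previous depth.
    joints = [(tracked_xy[0][0], tracked_xy[0][1], prev_joints[0][2])]
    for i, length in enumerate(bone_lengths, start=1):
        px, py, pz = joints[i - 1]
        x, y = tracked_xy[i]
        planar_sq = (x - px) ** 2 + (y - py) ** 2
        # A rigid element that looks shorter in the image must extend in depth.
        dz = math.sqrt(max(0.0, length ** 2 - planar_sq))
        # Two depths are geometrically possible; keep the one nearest the
        # previous frame so that motion stays continuous.
        z = min((pz + dz, pz - dz), key=lambda c: abs(c - prev_joints[i][2]))
        joints.append((x, y, z))
    return joints


previous = [(0.0, 0.0, 0.0), (4.9, 0.0, 1.0)]
print(update_skeleton(previous, [(0.0, 0.0), (3.0, 0.0)], [5.0]))
```

Calling the function once per frame, with each result fed back as `prev_joints`, gives the mutual dependence between skeleton motion and depth extrapolation described above.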
- a computer generated time based photographic sequence may be constructed into a 3D model.
- Input for the initial data may be provided by a module for the reception and digital recording of photographed images or video clips, for example from already recorded video clips compressed in any video format known in the art, or from one or more directly connected cameras using USB or any other digital or analog connection which is known in the art.
- the initial data may be obtained, for example from one or more sequences of time based photographed images.
- video or film sequences create a time-based illusion of motion in the brain of the viewer.
- the input data is analyzed.
- the analysis involves a depth map construction of the input sequence(s), building depth maps for each of the time based sequences, and processing the depth maps, as described below in the algorithm section.
- the present method ultimately creates a 3D model for objects captured by the sequence(s) of the photographed images 530 .
- individual figures can be identified in the sequence. Once identified they may be converted into stand alone 3D models. The movement of the figure can be compared with timings of the photographs in the sequence to provide a basis for mapping the movement of the figures from their progression across the photographs. Then it is possible to adjust the time lines separately for each figure to give a different sequence of events. In this way the figures being modeled can be morphed.
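Separate adjustment of time lines can be sketched with each figure's movement held as its own list of timed positions. The representation as `(time, position)` keyframes, the function names and the use of linear interpolation are assumptions of this example. Retiming one figure leaves the others unchanged, which yields a different sequence of events from the same models.

```python
def retime(keyframes, offset=0.0, speed=1.0):
    """Return a figure's keyframes shifted by `offset` and played at `speed`."""
    return [(offset + t / speed, pos) for t, pos in keyframes]


def sample(keyframes, t):
    """Position of a figure at time t, linearly interpolated and clamped.

    Keyframe times are assumed to be strictly increasing.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return tuple(a + w * (b - a) for a, b in zip(p0, p1))


# Two hypothetical figures recorded over the same two seconds.
walker = [(0.0, (0.0, 0.0)), (2.0, (4.0, 0.0))]
runner = [(0.0, (0.0, 1.0)), (2.0, (8.0, 1.0))]

fast_walker = retime(walker, speed=2.0)   # the walker now finishes in one second
late_runner = retime(runner, offset=1.0)  # the runner now starts one second later

print(sample(fast_walker, 0.5), sample(late_runner, 0.5))
```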
- the apparatus enables the user to create several different kinds of outputs from the media based on the 3D space based models created.
- the user may use the 3D space based models created with any external image processing, animation, broadcast or similar program known in the art, or may use an internal tool such as an editor.
- Such an editor may enable the user to create and edit two main kinds of outputs, linear media and non linear media.
- Linear media refers to time line based media, thus a sequence of images taken at specified time intervals. The user is able to create a clip based on time lined events he wishes to show.
- He is then able to export the results in a variety of viewing formats, thus for example: real time live video image processing, video clips, motion capture, stills images, DVD, spherical images, 2D images, stereoscopy images or any format which is known in the art.
- the apparatus of the present embodiments may also create a non time-lined, that is to say non-linear media.
- Such non-time-lined output may comprise for example, a 3D surrounding comprising a set of images, animation, and texts.
- the apparatus of the present embodiments provides the ability to present this output as a three-dimensional virtual environment in which say a user can fully walk through any route of his choice, reach any point, look 360 degrees around at that point, interact with any figure, and so on.
- examples of such non-linear output include computer games, medical surgery simulators, flight simulators, etc.
- the apparatus may include an animation editor, as per animation unit 316 of FIG. 3 .
- the animation editor 316 is a tool which gives life to every object the user chooses. The animation editor 316 also assigns to the object a certain movement, such as a tree blowing in the wind or a walking human figure, with unique characteristics: how the figure acts when walking, running, angry or sad, its facial expressions, lip movements and so on.
- the animation editor may also attach to the object a set of predefined movements from computer animation or motion capture from an external source or using the apparatus motion capture tool, and can also define a set of movements and characteristics that characterize every object, a little limp for example, wrinkles in his forehead and so on. These movements are characteristics that assist in creating the personality of the figure.
- the animation editor may also allow creating voice characteristics using the apparatus motion capture tool for an object, which may enable him to speak.
- the software preferably employs the method and algorithms that are described and illustrated below.
- the basic platform of a preferred embodiment of the present invention is placed in a computer generated 3D axis engine, utilizing three vectors, corresponding to the 3D axis engine, and a space-time vector as explained below.
- Input image sequence S is the sequence of images which is input to the platform.
- a preferred embodiment may implement the algorithms described below.
- Sequence S is divided into Nf(s) frames, for example 25 fps in a PAL video display standard.
- the anchor points have two major elements: one element is the correspondence between the elements inside Si (where 0 ≤ i ≤ number of sequences), and the second element is the correspondence between Si and the 3D axis engine, denoted as F.
- An algorithm may receive S0 as input and may use all the sequence frames therein for generating depth maps for the sequence.
- Factor D is defined as the depth vector of s(0,0) (will be defined later).
- z is the set of depth values of different pixels from frame s(0,0).
- min(z/zi), i 0 . . .
- D of frame S(0,i) is a 3D matrix.
- the D of frame S(0,i) is a vector of 2D mask matrices. If the depth of pixel d(i,j) is not defined for some reason, then d(i,j) = ∞, and it is marked as 0 in the Boolean 2D matrix.
- the algorithm will try to define it using the data from multiple frames of the same sequence S (the sequence from which S(0,i) is taken). If the depth map of s(0,0), or a part of it, cannot be defined, due to bad lighting for example, the SP temporarily treats d(0,0), or the relevant part of it, as ∞ (infinity), and using s(0,1) . . . s(0,i) of Si (where 0 ≤ i ≤ number of sequences), it tries to compute s(0,0).
- SP also finds the infimum of the values from the depth map matrix. If all the frames of the depth map of sequence S0 are successfully processed, SP finds the supremum and infimum anchor points of sequence S0 at every defined moment in time.
- d(0,0) is the nearest point of depth.
- the deepest point of depth in s(0,0) is denoted as d(0,h), with D0 ranging over {d(0,0), d(0,1) . . . d(0,h)}.
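- The handling of undefined depth values and of the nearest and deepest points can be sketched as follows. This is only an illustrative sketch, not the patented algorithm: it assumes the frames are already aligned pixel to pixel, and the function names `fill_undefined` and `depth_range` are hypothetical.

```python
import math

def fill_undefined(frames):
    """Fill undefined (infinite) pixels of frames[0] from later frames.

    Assumption: all frames are already aligned, so pixel (y, x) refers to
    the same scene point in every frame.
    """
    base = [row[:] for row in frames[0]]
    for y, row in enumerate(base):
        for x, d in enumerate(row):
            if math.isfinite(d):
                continue
            for other in frames[1:]:
                if math.isfinite(other[y][x]):
                    base[y][x] = other[y][x]  # first defined value wins
                    break
    return base

def depth_range(depth_map):
    """Return (nearest, deepest) over the defined pixels of a depth map."""
    finite = [d for row in depth_map for d in row if math.isfinite(d)]
    return min(finite), max(finite)

# Frame 0 has one undefined pixel; frame 1 supplies the missing value.
frames = [[[math.inf, 2.0]], [[5.0, 2.0]]]
filled = fill_undefined(frames)
nearest, deepest = depth_range(filled)
```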
- the factor D is a Class of depth vectors in the algorithm, where several D vectors are used to analyze the data as a working tool to correlate the image depth structured maps.
- the SP structure map is built in F, using multiple new matrixes that are opened inside F for modeling static and moving elements, and for representing parts of elements (for example hands, legs and so on).
- D is built such that every point along this vector contains its corresponding depth information at a current depth, and furthermore expresses the depth values along the depth slice of every point just as altitude lines in a topological map do.
- D is a 3D matrix, built as a 2D Boolean image matrix (x,y) for every Z point along D, marking “1” in every 2D image matrix (x,y) only for the information included in the image at the corresponding depth point (Z).
- the system treats each frame (S(0,0), S(0,1) . . . S(0,n)) as a sub space (that is, a vector space in itself) of the vector space S0, which is over the field F.
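- One possible way to picture D as a stack of Boolean 2D matrices, one per depth level, is the following sketch. The quantisation into `num_levels` equal depth bins and the name `depth_slices` are assumptions made for illustration only.

```python
import math

def depth_slices(depth_map, num_levels):
    """Return one 0/1 mask per depth level; undefined pixels stay 0 everywhere."""
    finite = [d for row in depth_map for d in row if math.isfinite(d)]
    near, far = min(finite), max(finite)
    span = (far - near) or 1.0  # avoid division by zero for a flat map
    height, width = len(depth_map), len(depth_map[0])
    slices = [[[0] * width for _ in range(height)] for _ in range(num_levels)]
    for y, row in enumerate(depth_map):
        for x, d in enumerate(row):
            if not math.isfinite(d):
                continue  # undefined depth: marked 0 in every slice
            level = min(int((d - near) / span * num_levels), num_levels - 1)
            slices[level][y][x] = 1
    return slices

depth_map = [[1.0, 2.5], [math.inf, 4.0]]
D = depth_slices(depth_map, 3)
```

- Each mask then behaves like one altitude line of a topological map: it marks only the pixels that lie in that depth slice.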
- W0 the span of each basis extends over W0
- the roles of the base vectors are similar to those known in the art of mathematics.
- the depth alignment for rigid objects such as an image background is carried out in two stages.
- the system creates 4 vectors of reference from the vector base W(0,0): a horizontal vector, a vertical vector, a depth vector and a space/time vector.
- the first vector Z ∈ S(0,0) reflects the number of base vectors in every point of d0 ∈ S(0,0) and creates a Z vector which expresses the depth information of the base vectors in the frame.
- the midpoint of Z is also expressed as d(anc) and is the midpoint of the frame itself.
- d(anc) can be a point that the system temporarily marks as the 0 point of the axis XYZ ∈ F.
- the horizontal and vertical vectors express the vectors in every horizontal and vertical point of the image matrix along the Z vector. The fourth vector of reference is a space/time vector, which is used as the transformation vector from the time dimension to the space dimension.
- the system has created 3 reference vectors for the alignment unified as D′, to be used between S(0,0) and S(0,1).
- the differences between frames may be a factor of lighting, moving elements inside the frame and camera behavior such as track in/out, track left/right, crane up/down, tilt up/down, pan left/right and zoom (for optical versus digital zoom, the difference may be found in the amount of pixels per inch, which is lower in digital zoom).
- the different shifts between the frames are mostly found in the location of the pixels, thus there may be a shift of the pixels of some manner between the frames, and so SP 1 computes the three reference vectors of the frames of S0 as a function of the space/time vector. Three corresponding vectors are constructed for the 3D alignment of the images, where the vertical and horizontal vectors correspond to a spatial window (X,Y), and the Z vector corresponds to the depth vector.
- Each factor in the spatial (X,Y) vectors reflects the base vectors of the image in the spatial domain along the Z vector in every point of the image.
- the matching function should aspire to zero difference, or up to a minimum predetermined point between the vectors V(h0/v0/z0) of image 1 and the vectors V(h1/v1/z1) of image 2, targeted to find 0 difference in as many points as possible.
- after the alignment of the unified section of the vectors, there may be an inconsistency at the respectively opposite edges of both vectors, at the points of difference between the frames. These points of difference may refer to different information that is added in the new frame but does not appear in the previous frame.
- the three vectors are the outcome of the three dimensional positional information of the images and have no relation to the visual information but rather represent the base vectors of the image in every point.
- the horizontal, vertical, and depth vectors are compared with each vector separately, to find minimal differences in as many points as possible.
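- A matching function that aspires to zero difference between two reference vectors can be sketched as an exhaustive shift search. This is a simplified illustration under stated assumptions: each vector is a 1D list of numbers, only integer shifts are tried, and `best_shift` is a hypothetical name. The same routine could be called separately for the horizontal, vertical and depth vectors.

```python
def best_shift(ref, cur, max_shift):
    """Return (shift, cost): the shift of `cur` that best matches `ref`.

    cost is the mean absolute difference over the overlapping samples,
    where ref[i] is compared with cur[i + shift].
    """
    best = (float("inf"), 0)
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(ref[i], cur[i + shift])
                 for i in range(len(ref)) if 0 <= i + shift < len(cur)]
        if not pairs:
            continue
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        best = min(best, (cost, shift))
    return best[1], best[0]

ref = [1, 2, 3, 4, 5, 6]
cur = [0, 0, 1, 2, 3, 4]  # same profile, moved two samples to the right
shift, cost = best_shift(ref, cur, 3)
```

- The samples outside the overlap correspond to the points of difference at the opposite edges of the two vectors, that is, information present in only one of the frames.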
- An optical element such as a lens of a camera creates distortions of the photographed image and may create minor differences in the depth map due to distortion of the same object.
- FIG. 14 shows two images and illustrates a point about camera movement distortions.
- in first frame 1401, a stone column 1403 appears in the center of the frame.
- in second frame 1405, the same column 1403 is on the right part of the frame.
- the solution is to identify a best image of a given object as one where it appears relatively centrally in a frame.
- the pixels receive the 3D location obtained from this most accurate measurement.
- SP sets a threshold for the deviation, and regards locations having a bigger difference as pertaining to different objects.
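- The preference for the most central view of an object, together with the deviation threshold, can be sketched as below. The data layout (a pixel position paired with a measured 3D location) and the names `most_central_location` and `same_object` are assumptions for illustration only.

```python
import math

def most_central_location(observations, frame_size):
    """Pick the 3D location measured where the object was nearest the frame centre.

    observations: list of ((px, py), (X, Y, Z)) pairs, one per frame.
    """
    cx, cy = frame_size[0] / 2, frame_size[1] / 2
    _, location = min(
        observations,
        key=lambda o: math.hypot(o[0][0] - cx, o[0][1] - cy),
    )
    return location

def same_object(loc_a, loc_b, threshold):
    """Locations further apart than the threshold are treated as different objects."""
    return math.dist(loc_a, loc_b) <= threshold

observations = [
    ((600, 240), (1.0, 0.0, 5.2)),  # object near the right edge, more distorted
    ((330, 250), (1.0, 0.0, 5.0)),  # object near the centre
]
location = most_central_location(observations, (640, 480))
```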
- the unified sections are now treated as a sub space and the vectors are recomputed as a reference of this sub space.
- the zoom factor comes into consideration by computing vectors with a “Scalar” factor over the F field, a scalar that may multiply the vectors or divide them and thus mimic the zoom/track in or out of the camera, where the same relation between the elements in the frames is preserved, but the resolution may vary.
- Using the assistance of the scalar we can align the vectors of S(0,0) with S(0,1). This process may align the images, and may also point to the alignment direction of the next frame.
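- As a rough illustration of the scalar factor, the following sketch estimates a single least-squares scale between two vectors. The least-squares criterion and the name `zoom_scalar` are assumptions, since the text does not state how the scalar is computed.

```python
def zoom_scalar(v_ref, v_cur):
    """Return the scalar s that minimises the squared error of s * v_cur - v_ref."""
    denom = sum(c * c for c in v_cur)
    if denom == 0:
        raise ValueError("v_cur must contain a non-zero value")
    return sum(r * c for r, c in zip(v_ref, v_cur)) / denom

# The second frame sees the same relations between elements at half the scale.
v_ref = [2.0, 4.0, 6.0]
v_cur = [1.0, 2.0, 3.0]
s = zoom_scalar(v_ref, v_cur)
aligned = [s * c for c in v_cur]
```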
- the space/time vector relates to the transformation from the time domain to the space domain. The new alignment is now regarded as a unified frame ∈ F, and the next frame is aligned with the previous unified frames. This can also reduce the computation, especially when the frames repeat already aligned areas.
- the Space/time vector is the reference vector for the transformation from the time dimension to the space dimension.
- the apparatus preferably opens a new vector plane, denoted as F1.
- the new plane is an empty XYZ coordinate system in which the process of this algorithm starts from the beginning.
- the user of the system is asked if he wishes to leave F0 and F1 as different locations or rather choose to align them.
- the user may then be asked to manually align F0 and F1 using tools such as rotation, zoom, flip and so on (alternatively, the user may command the system to align F0 and F1 automatically).
- after the user manually aligns F0 and F1, he commands the system to compute this alignment, and the system tries to align the fields using the alignment algorithm. If the fields are well aligned, the system announces it; if not, SP asks the user to set a lower standard for the misalignment factor (less accurate alignment). The system further provides the user with a tool box for overcoming discontinuities in the image plane using image processing tools known in the art.
- the system 1 defines the temporary resolution of “F0” (the field of the XYZ axis), denoted as “R0”.
- R is defined by the number of points of reference per inch.
- the resolution is the outcome of the combination factor of image resolution in terms of pixels in the time dimension, and the combination of points of depth in the space dimension.
- a resolution tool can assist, as an example, in the alignment of two video clips shooting the same location from different distances.
- a table may be shot in a high resolution clip where there may be more points of reference between the parts of the table for example from one leg to the next, or from a closer location compared to a second clip that has lower resolution, or using digital zoom or from a greater distance resulting in a lower number of points of reference.
- a point of reference for dealing with the resolution issue is the 3D location of every pixel with reference to reality.
- the 3D location of the pixel in the space dimension is the computed position after transformation from the time dimension.
- the resolution allows D0 to correspond with S0.
- the visual information of the points of reference may be layered in F0 as the visual layer of information, as further explained below.
- the skeleton consists of moving graphical elements and defines their relative positions and movement patterns, so as to construct a highly accurate 3D geometrical model of a moving element, preserve its motion capture, and attach the photographed visual information to the 3D model.
- the system enables the automation of the identification and reconstruction process.
- the system has to learn that it has a moving element in the data image sequence.
- the next stage is to identify the element in the data image with a presorted or user defined skeleton element.
- the system carries out a reconstruction of the 3D structures of the element using pre determined 3D structures & skeletons or the system creates a stand alone new 3D structure built gradually, based on the characteristics of the element.
- Moving elements that add different information than their background with respect to the camera can be semi static objects that add minor information over time, such as a tree which moves in the wind, or a person who crosses the frame, turns around and steps out of the frame on the other side.
- the system firstly learns that it has a moving object in the sequence.
- the system identifies this object using a set of predetermined 3D elements or skeletons.
- the user may define and attach a skeleton or elements to the figure.
- the system constructs the 3D structure of the figure using the predetermined 3D elements or skeleton or a new user defined element.
- the system searches for discontinuity of depth pixels in the sequence over space and time. That is to say that there may be a certain special 3D structure in S0 that is not coherent with the solid points of S0 with respect to the camera and background in the space dimension, but rather changes its information over the time dimension.
- the system may conclude that there is a moving object in the frame.
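- The search for depth information that changes over the time dimension can be sketched as a per-pixel test. This is a minimal sketch assuming the depth maps have already been aligned so that camera motion is compensated; the `tolerance` parameter and the name `moving_mask` are hypothetical.

```python
def moving_mask(depth_frames, tolerance):
    """Mark with 1 the pixels whose depth varies over time by more than `tolerance`."""
    height, width = len(depth_frames[0]), len(depth_frames[0][0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [frame[y][x] for frame in depth_frames]
            if max(values) - min(values) > tolerance:
                mask[y][x] = 1  # not coherent with the static background
    return mask

# A background at depth 5 with an object at depth 2 moving across two pixels.
depth_frames = [
    [[5.0, 5.0, 5.0]],
    [[5.0, 2.0, 5.0]],
    [[5.0, 5.0, 2.0]],
]
mask = moving_mask(depth_frames, 0.5)
```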
- the system reconstructs a 3D model of the moving element, where the table is a static element.
- the matching vector can be constructed from the region around an element in question.
- a rectangular window of size N×M×Z can be chosen, thus a 3D matrix.
- N and M are the spatial sizes of the window, and Z is the depth dimension.
- a fourth vector can be provided to define a transformation dimension of the element or object from the time dimension to the space dimension, to lead to the construction of the 3D elements and figures. Matching the element both in the time and space dimensions enforces a consistent matching of all points along corresponding 3D structure maps that may be separately built for each element or object.
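- The window around an element, and its extension over time, can be sketched with plain nested lists. The `volume[z][y][x]` index order and the function names are assumptions for illustration; stacking one window per frame gives a 4D structure of the kind described.

```python
def extract_window(volume, origin, size):
    """Cut an N x M x Z window out of volume[z][y][x], starting at origin (x, y, z)."""
    x0, y0, z0 = origin
    n, m, z = size
    return [
        [row[x0:x0 + n] for row in plane[y0:y0 + m]]
        for plane in volume[z0:z0 + z]
    ]

def extract_4d(volumes_over_time, origin, size):
    """One 3D window per time step: a 4D matrix indexed as [t][z][y][x]."""
    return [extract_window(v, origin, size) for v in volumes_over_time]

# Each cell stores 100*z + 10*y + x so that the cut is easy to inspect.
volume = [[[100 * zz + 10 * yy + xx for xx in range(4)]
           for yy in range(4)] for zz in range(4)]
window = extract_window(volume, (1, 2, 0), (2, 1, 3))
window_4d = extract_4d([volume, volume], (1, 2, 0), (2, 1, 3))
```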
- the algorithm of the present embodiments is based on the projection of the 3D information structure on to the 2D image to assist in the tracking of the moving element in the frame in relation to its background and the absolute 3D surrounding.
- identifying the current element from the image data may be carried out with the assistance of a set of presorted 3D structures.
- the system operates in steps to determine the form of the element in question or parts of it, up to the identification of the whole structure, and also assists the user to construct new structures.
- the system may be provided with a data base comprising a set of 3D structures for the skeleton elements, beginning with simple 3D geometrical models such as a ball, box, pipes and so on, and up to full skeletons of rigid and non rigid bodies.
- Skeletons can range from rectangular regions for identifying and modeling a car, for example, up to animal and human skeletons as shown in FIG. 12 .
- a skeleton is a complex 3D data structure that includes three crucial elements:
- the assembling of the skeleton refers to taking the structure of the skeleton and defining its parts, down to the smallest definition of body parts, in the sense that from these parts, the skeleton elements, the system can understand and build the body in question or build new bodies at the request of the user.
- the human arm may be based on a 3D cylinder, connected via a joint to another cylinder which may represent a hand.
- the head may start with the simple figure of a 3D ball, and may be connected to a joint which represents the neck.
- the neck in turn is connected to a big cylinder which represents the trunk.
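- The assembly of a skeleton from simple elements and joints can be sketched as a small data structure. The class names, fields and the joint name "wrist" are hypothetical; only the cylinder, ball, neck and trunk arrangement comes from the description.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    name: str
    shape: str          # "cylinder" or "ball"
    radius: float
    length: float = 0.0  # unused for a ball

@dataclass
class Skeleton:
    parts: dict = field(default_factory=dict)
    joints: list = field(default_factory=list)  # (joint name, part a, part b)

    def add_part(self, part):
        self.parts[part.name] = part

    def connect(self, joint, a, b):
        if a not in self.parts or b not in self.parts:
            raise KeyError("both parts must be added before connecting them")
        self.joints.append((joint, a, b))

    def neighbours(self, name):
        """Names of the parts connected to `name` through any joint."""
        out = []
        for _, a, b in self.joints:
            if a == name:
                out.append(b)
            elif b == name:
                out.append(a)
        return out

skeleton = Skeleton()
skeleton.add_part(Part("head", "ball", radius=0.1))
skeleton.add_part(Part("trunk", "cylinder", radius=0.2, length=0.6))
skeleton.add_part(Part("arm", "cylinder", radius=0.05, length=0.3))
skeleton.add_part(Part("hand", "cylinder", radius=0.04, length=0.1))
skeleton.connect("neck", "head", "trunk")
skeleton.connect("wrist", "arm", "hand")
```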
- the ability to shape the skeleton according to the 3D input is used in the identification process and in the reconstruction process as explained below, with respect to FIG. 13 which shows a skeleton in which a part thereof undergoes a deformation.
- the system determines the identification of the element. This process can be done automatically or manually by the user as shown in FIG. 8 , and involves identifying the element in question to the system and attaching an internal skeleton to the figure or building a new structure.
- the system attempts to identify it and attach it to a set of matching skeleton elements, chosen from a previously defined set of skeleton elements, or a specific skeleton, defined for the moving object by the user, preferably using a set of tools which is provided by the system.
- the attached skeleton elements are automatically adjusted to the size, shape and movement pattern of the moving object, so as to fit the moving object in terms of size, shape and movement pattern.
- the system completes the set of skeleton elements with an appropriately overlaid texture.
- the system further provides tools for extrapolating the moving objects onto a 2D plane for any desired point of view.
- Exploiting the properties of the 3D structure based alignment allows us to match information in various situations such as between different video sequences, matching under scale (zoom) differences, under different sensing modalities (IR and visible-light cameras) and so on.
- An element may be attached with a basic skeleton made of tubes and joints attached to the arms, legs and body, and a ball attached to the head. The depth alignment may add new information to the creation of the 3D element structure and to the correlation with the 3D figure, such as the physical behavior of the basic skeleton, the length and thickness of the tubes with respect to the arm, body and legs, the size of the ball attached to the head and so on.
- the software tries to identify the structure of the moving object, as much as possible using its depth information. With the assistance of a set of previously defined 3D elements, step by step, the software determines the form of the object parts, to complete the full structure even if some of the visual information does not exist.
- the first step is to identify the object and determine its basic form. Then the system attempts to complete it as much as possible.
- SP tries to reconstruct the object details using a set of 3D skeleton elements (such as a ball, a box, pipes and so on).
- the system may receive a full depth 3D map of an image.
- There are algorithms known in the art for constructing depth maps of images, including their moving elements. Depth structure maps using space time stereo algorithms, for example, make use of at least two cameras.
- the present algorithm can be used for extrapolating depth maps of moving elements using a single camera, as described above.
- the system may use a pre-acquired depth value of the static rigid background using known in art algorithms, and refer to the moving element as a stand alone 4D matrix with relation to its background, using reference points.
- the projection of the 3D information structure (such as a pre made 3D skeleton) onto the 2D image plane assists in the tracking of the moving element in each frame in terms of the depth axis.
- the ability to create depth maps of the element is provided.
- the attachment of the skeleton and organs fits the image to the depth maps, synthetically duplicating any object and capturing its motion. The latter process may further involve overlaying texture of the element on the reconstructed skeleton, completing the re-construction process as further explained below.
- the present method thus forces the creation of the 3D map of the moving element in the frame.
- the first step of depth extrapolation is the tracking of the 2D position of every pixel along each frame, creating a trajectory for each pixel.
- This tracking is performed using known in art tracking algorithms.
- passive methods for finding the same pixel on two images and also along the time dimension use alignment of color, shade, brightness, and ambiguities of the pixels to locate the same pixel in two frames, and along the time axis.
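- A passive tracker of the kind referred to, matching brightness around a pixel from frame to frame, can be sketched as follows. Block matching with a sum of absolute differences is a swapped-in, generic technique and not the specific tracking algorithm of the embodiments; the names `track_point` and `trajectory` are hypothetical, and the tracked point is assumed to stay at least `patch` pixels away from the image border.

```python
def track_point(prev, cur, point, patch=1, search=2):
    """Return the (x, y) in `cur` whose neighbourhood best matches `point` in `prev`."""
    h, w = len(prev), len(prev[0])
    px, py = point

    def sad(cx, cy):
        # Sum of absolute brightness differences between the two patches.
        return sum(
            abs(prev[py + dy][px + dx] - cur[cy + dy][cx + dx])
            for dy in range(-patch, patch + 1)
            for dx in range(-patch, patch + 1)
        )

    candidates = [
        (cx, cy)
        for cy in range(max(patch, py - search), min(h - patch, py + search + 1))
        for cx in range(max(patch, px - search), min(w - patch, px + search + 1))
    ]
    return min(candidates, key=lambda c: sad(*c))

def trajectory(frames, start):
    """Follow one point through all frames, giving its 2D trajectory."""
    points = [start]
    for prev, cur in zip(frames, frames[1:]):
        points.append(track_point(prev, cur, points[-1]))
    return points

def blank():
    return [[0] * 7 for _ in range(7)]

f0, f1, f2 = blank(), blank(), blank()
f0[2][2] = 9  # bright feature at x=2, y=2
f1[3][4] = 9  # moved to x=4, y=3
f2[4][5] = 9  # moved to x=5, y=4
path = trajectory([f0, f1, f2], (2, 2))
```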
- the present tracking algorithms in the 2D image plane in time lack any projected pattern to assist the identification and thus tend to collect errors over time.
- each frame is on the one hand a 2D projection of a 3D data structure, and on the other hand a 3D projection of a 2D data structure, with identified organs of the body, say hand, left leg, right leg and so on.
- a predetermined 3D skeleton is projected onto the 2D image plane.
- the system in effect creates a shadow like image in an additional layer of information as explained above with respect to FIG. 7 .
- the extra layer of information pinpoints the parts of the image that need to be tracked, cutting the errors immediately and preventing their growth.
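- The projection of a 3D skeleton onto the 2D image plane can be illustrated with a simple pinhole camera. The pinhole model, the `focal` and `center` parameters and the name `project` are assumptions made for this sketch; the text does not specify the camera model.

```python
def project(points_3d, focal, center):
    """Project named 3D points (X, Y, Z) in camera coordinates to pixel positions.

    Points at or behind the camera plane (Z <= 0) are skipped.
    """
    shadow = {}
    for name, (x, y, z) in points_3d.items():
        if z <= 0:
            continue
        shadow[name] = (center[0] + focal * x / z, center[1] + focal * y / z)
    return shadow

skeleton_points = {"head": (0.0, -1.0, 4.0), "hand": (1.0, 0.0, 2.0)}
shadow = project(skeleton_points, focal=100.0, center=(320, 240))
```

- The resulting named 2D positions play the role of the shadow like layer: each tracked image region can be tied to an identified part of the skeleton.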
- Such a stage allows for tracking and extrapolating the depth of a moving element such as a walking person, who includes both rigid and non rigid elements.
- the 3D skeleton may then be used for extrapolating the 3D depth map of the already tracked 2D element in motion.
- n: Number of frames.
- A: a moving element with respect to a background.
- the input is thus “M” with n frames.
- the system identifies the moving element, as is explained elsewhere herein, and there then follows a process of aligning F(b)→G(A), the 2D projection B of the 3D skeleton on A. Alignment is to an initially defined threshold, which can be changed.
- the process continues by searching and tracking Q, thus creating a T for each Q, where δ, that is δ(f(a,b)i, q i+1), is the function of the trajectory vector. Tracking obtains the location of the feature Qj in frame i on image a and shadow b, and adds its location in frame i+1, and so on for (k) frames. It attaches the new point in i+1 to image B, and the new information on frame i+1 enables image B to be moved according to the movement of image A. Thus the leg in B will follow the leg in A. The process then receives a new infinite number of Q accurately positioned in every new frame. For each T we extrapolate Z, and the output is an exact super resolution depth map of the moving element.
- the Z dimension may then be extrapolated using reference points from the 4D matrix surrounding the element at t, and t+1 and so on. This may be done with respect to the camera's motion and focal point with respect to the background. Rays from the reference points then enable the computation of:
- the depth extrapolation process could be for example the following.
- ‘t’ is a 3D point (pixel or feature) on the element at time ‘t’ (whose 3D coordinates we wish to find), and
- ‘t+1’ is the same 3D point but at time ‘t+1’ (for which we also wish to find 3D coordinates).
- 3D structure maps of the elements in motion assist the system to further fully reconstruct the 3D model of the element, recover the 3D geometry between different sequences, and handle differences in appearance between different sequences.
- the present method enables forcing of the creation of the 3D model of the moving element in the frame.
- the present algorithm is space based in concept.
- the projection of 3D information onto a 2D image plane also enables the extrapolation of 3D information.
- the space dimension based algorithm projects the 2D world with its 3D depth maps into a space based 3D world.
- the N×M×Z window referred to above, which was chosen around an element in motion, is actually a 3D matrix (that turns into a 4D matrix), a new (XYZ) axis field “f”, where the user can attach a predefined internal skeleton or parts of 3D figures (tubes, joints etc).
- the process of depth extrapolation also includes the identification of each pixel and feature movements between frames, creating a 2D motion flow of pixels and features over time.
- the system transforms the 2D motion flow into a 3D motion flow.
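- The transformation of a 2D motion flow into a 3D motion flow can be sketched by back-projecting each tracked pixel with its depth. The pinhole back-projection and the names `backproject` and `flow_3d` are assumptions for illustration only.

```python
def backproject(pixel, depth, focal, center):
    """Lift a pixel (u, v) with known depth Z to a 3D point (X, Y, Z)."""
    u, v = pixel
    return (
        (u - center[0]) * depth / focal,
        (v - center[1]) * depth / focal,
        depth,
    )

def flow_3d(track_2d, depths, focal, center):
    """Turn a 2D trajectory plus per-frame depth into per-frame 3D displacements."""
    points = [backproject(p, d, focal, center) for p, d in zip(track_2d, depths)]
    return [
        tuple(b_i - a_i for a_i, b_i in zip(a, b))
        for a, b in zip(points, points[1:])
    ]

track_2d = [(320, 240), (370, 240)]  # pixel moves 50 px to the right
depths = [2.0, 2.0]                  # at a constant depth of 2 units
motion = flow_3d(track_2d, depths, focal=100.0, center=(320, 240))
```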
- the system may use the reconstruction algorithm to correspond between the factor D of each frame (and the unified frames) with the internal attached skeleton over time (with its own factor D′) to define and construct the proportions of the internal skeleton transforming the 3D matrix in to a 4D matrix over the space & time dimension using the fourth vector referred to above, that is transformation between space and time.
- the system may create a full 3D structure depth map or receive a full 3D structure depth map of the moving elements.
- the static surroundings are modeled separately from the moving elements as previously was explained.
- the present embodiments enable a full 3D super resolution reconstruction of a 2D body such as a human in motion from an original 2D or partial 3D image, to a 3D model.
- the process involves also capturing the 3D structure texture and motion, constructed on the base of an internal 3D skeleton.
- the skeleton may be built with full internal bones and muscle using a full skeleton physics' data base.
- the system enables infinite manipulation on the 3D reconstructed model, for example for animation, motion capture, real time modeling and so on.
- FIG. 12 illustrates the building up of a full anatomical model from individual skeleton elements.
- Projection is carried out using the 3D depth map of the image and body, using the 4D matrix around the element with respect to the background, using the reference points.
- Tracking of the 3D movements of the element over frames is based on the DTM optical flow of the pixels and trajectories. Tracking allows for learning as much 3D information as possible on the 3D formation of the element, and interpolating the skeleton's 3D structure onto the depth map acquires its 3D formation.
- the skeleton's 3D structure preserves the learned information of the element over time, in order to design the skeleton's 3D structure to represent the element in the image as accurately as possible.
- a first stage involves assigning a skeleton to the object.
- a second stage involves using the skeleton to learn the details of the object, such as the structure of the face, eyes, nose etc. Having learned the kind of structure from the skeleton it is now more a process of expecting to receive certain details and adjusting the right 3D details to the figure.
- the system in accordance with a configured policy, expects to receive 3D and visual information of, for example, the eyes, nose, etc. Since the SP expects the eyes and eyebrows, for example, in certain areas of the head, it is easier and faster for the SP to analyze this information with respect to the 3D figure.
- the present embodiments can be used as a motion capture tool to acquire the movement of the element, enabling the user to motion capture the element in the frame, not only as a 2D image but as a 3D model with texture.
- the process of the reconstruction can be carried out in the following way.
- the initial configuration is a 4D matrix, the input will be a DTM.
- the DTM can be from an external algorithm.
- the DTM can also be from the depth extrapolation algorithm of the present embodiments.
- the modeling process is a parallel process to the depth extrapolation process, where intuitively speaking several matrixes are located one beneath the other, in which the first matrix is the image 2D matrix, below that is the 2D projection matrix (of the 3D structure), and below that is the 3D data structure.
- the input also includes 3D trajectories based on the 2D tracking of the pixels and in particular of feature points that are set along the frames to allow movement to be followed.
- the feature points may be based on color or other properties easy to track.
- the trajectories are transformed into the DTM, and the system transforms them into 3D trajectories that mark the 3D position of pixels and feature points along the frames.
- the system sets a constraint between the projection of the 3D skeleton and the input depth maps along the time axis, making an exact tracking with infinite new tracking points in every frame generated from the 3D projection with identified organs of the element. It thus knows where the hidden parts of the 3D body are and where they are in the 3D space.
- the system carries out 3D tracking of the points using the 3D skeleton data structure to force the creation of the exact super resolution 3D model of the moving element.
- T3d: Trajectories (3D point locations vector of Q3d),
- δ3d: The transition function of T3d on Q3d.
- Model: the reconstructed 3D model.
- the system aligns F(E)→G(S) such that the 3D skeleton S is aligned into the DTM E.
- the system uses T3d to 3D track the Q3d on Et to the next DTM Et+1, where δ3d, that is δ3d(f(s,e)i, q3d i+1), is the function of the trajectory vector. Tracking obtains the location of the feature Q3d j in frame i on skeleton S and DTM E, and adds thereto its location in frame i+1, and so on for (k) frames for each Q3d.
- For each frame, the system attaches the new point in i+1 to S, and the new information on frame
- Factor D and D′ enables the system to change the formation of S, according to the formation of the collective 3D information of E3d, in the 4D matrix surrounding the element & skeleton in t, and t+1 (and so on).
- the system infers from the model where limbs and other elements may be expected to appear in the following frame.
- D is the key factor in the complex mathematical structure that synthetically duplicates S3d in the form of the 3D element.
- the system transforms the 3D skeleton in to a new data structure, collects and saves the gathered formation in the sequence of the DTM's on the 3D skeleton data structure, by tracking the 3D position of the pixels or feature points for creating an exact super resolution 3D model duplication of the moving element.
- ⁇ a, b, c, . . . ⁇ are the points with 3D coordinates on E3d i.
- An estimation is done to attach the corresponding points on S3d i ⁇ a′,b′,c′ ⁇ to the points in E3d i.
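- One simple way to estimate which point on E3d corresponds to each skeleton point is a nearest-neighbour search, sketched below. The nearest-neighbour rule and the name `match_points` are assumptions; the text does not state how the estimation is done.

```python
import math

def match_points(skeleton_pts, element_pts):
    """Map each named skeleton point to the index of the closest element point."""
    return {
        name: min(range(len(element_pts)),
                  key=lambda i: math.dist(p, element_pts[i]))
        for name, p in skeleton_pts.items()
    }

skeleton_pts = {"a'": (0.0, 0.0, 0.0), "b'": (1.0, 0.0, 0.0)}
element_pts = [(0.9, 0.1, 0.0), (0.1, 0.0, 0.0)]  # points a, b on the depth map
matches = match_points(skeleton_pts, element_pts)
```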
- the system aligns the factor D′ of the sub space of the 4D matrix to the factor D of E3d i, aligns S3d i as a unified unit, and also splits S3d into predefined miniature 4D matrices, each of which holds one D′ factor as a stand alone 4D sub space, thus reconfiguring the formation of S3d i to the formation of E3d i, and then on to i+1 . . . .
- the output is then an exact super resolution reconstruction of the element (shape and texture, as will be further explained), and the 3D motion capture of the moving element.
- the texture overlaying is part of the modeling process, as will further be explained hereinbelow.
- the above mentioned constraint allows for fully reconstructing the 3D model of the element. It enables the system to also recover the 3D geometry between different sequences, and handle differences in appearance between different sequences. Exploiting the properties of the 3D structure based alignment allows us to match information in situations which are extremely difficult such as between different video sequences, matching under scale (zoom) differences, under different sensing modalities (IR and visible-light cameras) and so on.
- the movement of the element may relate the model to its 3D motion capture. Using the space dimension in the above image processing adds value in every different frame, in that it supplies more 3D and visual information for the reconstruction process, yielding a more accurate 3D model of the object.
- the model can be kept as a separate figure from its origin and background, and can be used for future animation. Its original movements can be used for motion capture. It may add more visual information from different times or locations to the same figure, and can be changed to a new 3D figure depending on the user's actions, just as a computer generated image on top of a polygon internal skeleton could be.
- the figure becomes independent with respect to the background it had, and can be used for further animation.
- SP may combine the information obtained from the different times or locations.
- the visual information may be taken in different times or locations, and the computed information is added to the master 3D model of the figure.
- given the 3D structure of the object, we can create animation, mimic the face, add voice and so on, at the level of the object itself, with no dependency on the specifically shot background.
- the system can also use its ability to capture the motion within moving elements, thus to animate an existing 3D model using motion capture of a full body animation or part of it such as the mimic of the face.
- the assistance of the user may be needed.
- the user may be asked to assist the system to define the 3D figure in the frame, to indicate what 3D information the system should use, and whether to leave this figure as a time based sequence, without any attachment of animation.
- this element may be edited using regular image processing tools.
- the only difference between the present and the previous example is that the present example remains a time based 3D object and not a space based 3D object.
- Image processing tools allow the user to attach together surroundings of different times and locations, correct distortion in the image, remove elements and create new elements based on the information created by the input and also to create 3D computer generated figures or to input computer generated figures from different 3D computer animation programs.
- After receiving its three dimensional location in the space based three dimensional model, and depending on the determined resolution, each point receives visual information layer(s), including the values of color and brightness as recorded in the digital information of the image.
- the resolution of the model compared with the resolution of the photographed image, the spherical information of each pixel, and different qualities of the visual information from different cameras or from different clips, etc.
- the image resolution is higher than the determined resolution of F, and thus there is more than the needed amount of information for every pixel in the 3D model. For example, if the photographed image is 5 times larger in terms of the number of pixels per inch, then the system sums and averages the visual information for every 5 pixels into one pixel and creates a new pixel in the 3D model with the new computed value.
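- The averaging step in the 5-to-1 example above can be sketched as follows. This is a minimal illustration only: the function name `downsample_average`, the RGB tuple representation, and the choice to drop an incomplete trailing group are assumptions, not details taken from the disclosure.

```python
def downsample_average(pixels, factor):
    """Average each run of `factor` consecutive samples into one model pixel.

    `pixels` is a list of (R, G, B) tuples from the photographed image.
    Hypothetical helper: an incomplete trailing group is simply dropped.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out = []
    for start in range(0, len(pixels) - factor + 1, factor):
        block = pixels[start:start + factor]
        # Average each colour channel across the block.
        out.append(tuple(sum(channel) / factor for channel in zip(*block)))
    return out


# Five image pixels collapse into one pixel of the 3D model.
row = [(10, 20, 30), (20, 30, 40), (30, 40, 50), (40, 50, 60), (50, 60, 70)]
model_row = downsample_average(row, 5)
```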
- the second case is where the resolution of the 3D model is larger than the resolution of the photographed image.
- every frame generates texture pixels within the frame, and if the camera moves a little, pixels will photograph neighboring 3D points, making it possible to collect more visual information for a unified model than the total amount of pixels in the image.
- Such a case can happen for example while shooting an image from a distance or using digital zoom and so on.
- the system extracts the information for each pixel from the neighboring pixels and along the time dimension.
- a key element of multiple layers of visual information for every pixel is crucial and will further be discussed.
- New pixels are now created and overlaid on the surface of the model, at the level of the system resolution.
- Each new pixel now has a three dimensional position in the 3D space based model, and just as in real life can be observed from the full 360 degrees.
- individual pixels are not observed from 360 degrees.
- a point in a wall may be looked at from 180 degrees (the back of the wall has different information in different pixels depending on their 3D location), a corner of a stone is observed from 270 degrees, and so on.
- FIG. 15 illustrates a photographed 3D image supplying visual information from a specific direction.
- Each photographed image supplies visual information from a specific direction. If SP receives visual information of a pixel from a specific direction only, it flattens the pixel, so that it can be looked at from 180 degrees. This case creates some distortions in the visual quality when looking at this pixel from a side direction.
- a preferred embodiment of the present invention provides a half spherical pixel formation unifying multiple layers of visual information for each pixel with respect to its 3D location in the space dimension. It is possible to add an infinite number of pixels, and in terms of visual quality we are creating super resolution.
- the super resolution also relates to the number of depth points it is possible to collect in the unified model creating super resolution 3D points.
- the depth points allow deformations of the surface in the most accurate way.
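- One possible way to picture a pixel that unifies several layers of visual information, each recorded from a specific direction, is sketched below. The class `LayeredPixel`, its methods, and the rule of picking the layer whose recording direction is closest to the viewing direction are illustrative assumptions; the disclosure does not specify a data structure or a selection rule.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


class LayeredPixel:
    """A 3D point holding one visual layer per direction it was photographed from."""

    def __init__(self, position):
        self.position = position
        self.layers = []  # list of (unit direction toward the camera, colour)

    def add_layer(self, direction, color):
        self.layers.append((normalize(direction), color))

    def color_from(self, view_direction):
        """Pick the layer recorded from the direction closest to the viewer (assumed rule)."""
        if not self.layers:
            return None
        v = normalize(view_direction)
        best = max(self.layers, key=lambda layer: sum(a * b for a, b in zip(layer[0], v)))
        return best[1]


pixel = LayeredPixel((0.0, 0.0, 0.0))
pixel.add_layer((0.0, 0.0, 1.0), "front")
pixel.add_layer((1.0, 0.0, 0.0), "side")
```

A pixel with only one layer returns that layer from every direction, which corresponds to the flattened case described above.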
- every pixel can be photographed in many frames along each clip. Nevertheless, not all of this information is needed, as not all of it has the same level of quality. Discarding low quality information can lower the computation needed for the image processing, but every piece of information is preferably used in order to enhance the image quality where the source suffers from poor image quality, lighting, camera resolution and so on.
- the system creates a grade of quality Q, where each new layer of information, which mines the information from every new frame, is examined for the quality of its visual information and for its resolution.
- the visual information is graded by two factors, one is the image quality in the time dimension, the second is the image quality in the space dimension.
- SP may receive two clips, shot from the same location inside of a building, using different apertures, for photographing the inside of a room and an external garden.
- the camera uses an aperture with high exposure. This enables the camera to receive good visual information of the interior parts of the image, while the external parts of the garden are over exposed and appear in the image as burned or excessively bright.
- the camera uses a low exposure aperture. This creates very dark visual information of the internal parts of the image, but the external parts of the image are very balanced and well exposed.
- Each of these clips may not be well balanced as a stand alone unit, and the histogram of each of them will show unbalanced results.
- a first factor is based on the time dimension, mining the histogram for every frame as a separate unit, and its quality with respect to F, and
- the second factor is from the space domain in which the already composed images refer to certain areas in the frame to achieve higher quality, even if in F they suffer from poor Q.
- the system searches the new clip for better visual quality in the specific parts needed for SP, not in any correlation to the neighboring frame pixels, but with correlation to poor Q of F neighboring pixels, as was explained above with respect to FIG. 14 .
- the system creates a well balanced image that in the present example, gives a very well exposed image that shows the external garden as well as the interior room in the best quality possible as if it were shot using a different aperture at the same time in the same image.
- the system regards image information up to a certain minimum level of Q, meaning that if the image is lower than that minimum for both of the two factors above, then there is no point in using this information or in adding its values to the existing values of the pixel's texture.
- the adding process of the new information is on the basis of Q ⁇ SP,
- the higher Q is, the greater the participation of the information in the pixel value, and the lower Q is, the smaller the participation of its information in the pixel value.
- the system unifies the information from both clips to yield a balanced and well exposed view of both the inside of the room and the external garden.
- the system may set a threshold for quality Q, and discard visual information accordingly.
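- The quality-weighted combination and the threshold described above can be sketched as a weighted average. This is a simplified illustration: it assumes a single scalar grade Q in [0, 1] per sample (the text grades by two factors), and the function name `blend_by_quality` and the default threshold `q_min` are hypothetical.

```python
def blend_by_quality(samples, q_min=0.2):
    """Combine candidate values for one pixel, weighting each by its quality grade.

    `samples` is a list of (value, q) pairs. Samples graded below `q_min`
    are discarded; the rest contribute in proportion to q.
    Returns None when no sample passes the threshold.
    """
    kept = [(value, q) for value, q in samples if q >= q_min]
    if not kept:
        return None
    total_q = sum(q for _, q in kept)
    return sum(value * q for value, q in kept) / total_q


# An over-exposed sample (low Q) is dropped; the others are blended by Q.
brightness = blend_by_quality([(250, 0.1), (120, 0.9), (80, 0.3)])
```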
- the image processing may also include processing methods such as those employed by standard camera control units (CCU) in order to balance the image and achieve uniformity between adjacent images.
- the constructed space based 3D model, including its visual information, is the collective result captured from all the image sequences fed to the SP.
- virtual cameras are arranged in such a way that the field of vision of two adjacent lenses is overlapped to a great extent by the fields of view of the two adjacent lenses lying on the lens sides, with respect to a horizontal axis. Consequently, stereoscopic images can be generated.
- a preferred embodiment facilitates generation of full time based sequence, Live sequence, non linear output, stereoscopic/3D spherical imaging, and so on.
- virtual cameras are arranged in a specific configuration, wherein the field of vision of any of the lenses is overlapped to any desired extent by the fields of vision of all adjacent lenses surrounding the lens. The collective field of vision comprises a collection of fully circular images wherein any point within each field of vision is captured by at least two virtual lenses for creating stereoscopic spherical imaging, or by one virtual lens for creating 2D spherical imaging, or by at least two virtual lenses for creating 3D spherical imaging from any view point.
- stereoscopic data can be made available for viewing a scene filmed through a single camera.
- the images created by SP can be displayed to a viewer in various formats, such as stills, video, stereoscopic viewing, virtual reality, and so on.
- the images formed can be displayed on a flat screen such as a TV or a computer screen, or by using a display device for virtual reality such as a virtual reality headset, where the part of the image being displayed changes according to the user's viewpoint.
- Virtual reality visual linear and non-linear information is provided to the user, using known in the art virtual reality means.
- Such means may be a headset having sensors to detect the head position of the viewer, or a virtual glove having a sensor to detect the hand position, or any known in art viewing software.
- the viewing parameters of a user are taken from a user held pointing device (for example: a mouse or a joystick), programmed for this purpose.
- the system can gather the user's own movements using, for example, this invention's real time motion capture capabilities, or any motion capture from any external device.
- the viewing parameters are detected and received by the displaying system.
- the viewer's viewing parameters include the viewer's viewing direction and viewer's horizon.
- the viewer's field of vision is determined in terms of the coordinates of the surrounding of the viewer and the image is projected into the viewing means.
- the present invention is not limited with regards to the type of camera(s) used for capturing the images or sequences of images fed to the SP.
- the camera(s) may be selected from any known in the art digital or analog video cameras.
- the camera(s) may also be non-digital, in which case any known in the art technique may be used to convert the images into a digital format.
- digital images may be manipulated for enhancing their quality prior to storage and conversion into a space based 3D model.
- FIG. 10 is a balloon chart illustrating different applications of the present invention.
- when a complete space based 3D model, constructed as explained above, is available, the user can place virtual cameras within the virtual environment to, in effect, re-photograph the scene from view points where no cameras were located in the original sequence. Furthermore, this can also be done in real time. For example, in a basketball game virtual cameras can be placed to shoot the game from view points where there are no actual cameras. All that is needed is to have previously modeled the arena and the individual players. In fact, the modeling can be achieved in real time early during the broadcast, as an alternative to doing so beforehand.
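- Re-photographing a modeled scene from a new view point amounts to projecting the known 3D points into a virtual camera. The sketch below uses a standard pinhole projection as an assumed camera model; the function name `project_point`, the unrotated camera looking along +Z, and the focal length value are illustrative and are not taken from the disclosure.

```python
def project_point(point, cam_pos, focal_length):
    """Project a 3D model point into a virtual pinhole camera.

    The camera sits at `cam_pos` and looks along +Z with no rotation
    (a simplifying assumption). Returns image-plane (x, y), or None if
    the point is not in front of the camera.
    """
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    if z <= 0:
        return None
    return (focal_length * x / z, focal_length * y / z)


# The same model point seen from two virtual camera positions.
player = (1.0, 2.0, 5.0)
view_a = project_point(player, (0.0, 0.0, 0.0), 1.0)
view_b = project_point(player, (1.0, 0.0, 0.0), 1.0)
```

Moving `cam_pos` changes the resulting image coordinates without touching the model, which is the sense in which the scene can be viewed from positions where no physical camera stood.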
- each figure once captured from the sequence can be re-animated by the user, who may also use motion capture for example from an external source or the motion capture of the SP, thus changing how the figure moves in the original clip. That is to say, the user may reconstruct a model from the original photographed image, but output in real time other movements of the figures.
- the user may modify the original figure from the image or even replace the figure with a completely new manipulated figure.
- new animation can be given to each figure with no dependency on the original movement of the figure during its photographed clips, by replacing the figure with a 3D model thereof, allowing the creation of new movie clips with the figure itself, using the techniques discussed herein.
- the figure may also be manipulated in real time by a user in computer games, console games, TV games, etc.
- a preferred embodiment introduces new lighting into the 3D model using known in art techniques, for adding light to a scene in animation or during post production of a video clip.
- a preferred embodiment comprises depth extrapolation in the arena to any desired point of reference of each element and background, as part of the 3D modeling of the elements and backgrounds.
- Depth extrapolation comprises a depth map analysis of the sequence(s) of photographic figure input to the system, which can be carried out in a number of ways as will be explained in more detail below.
- Preferred embodiments may allow various manipulations such as motion blur on the image.
- the user can create a full motion picture from the figures and backgrounds.
- the user can create a full computer game (console game, TV game, etc.) using the 3D space based model where all the figures are real image based 3D models.
- computer generated images can be added to the three-dimensional environment and three-dimensional models therein. These images can have effects such as altering the skin of the model, adding further computer generated elements to the model or the background, and so on.
- the user can use the time line information associated with individual figures within the sequences to reconstruct the motion of the figure in a motion capture stage.
- the present techniques work using a sequence of images from a single camera or from images from two or more cameras.
- two dimensional and three dimensional tracking can be applied to any of the figures and backgrounds identified, based on their movements in the time based clips.
- the tracking can be done in real time, or later as part of re-animating the clips.
- the user may also add moving or static elements to the figures or backgrounds in the space based 3D environment.
- the user can create new arenas that were not originally photographed.
- the user may combine several different surroundings into a unified arena, or combine a photographed arena with a synthetic arena which is computer generated.
- the user can use a figure that is reconstructed using the present embodiments in a 3D model, remove it from its background, and relocate it to different arenas, or to export it to any computer generated program.
- the user can create a new figure based on the reconstructed figures.
- the user may further add or change its texture, organs, and so on.
- the user may use existing footage, for example an old movie, and use the data of the movie to model figures and backgrounds of the movie. This may be done by creating a full 3D space based environment or arena of the figures and locations therein, and then create a new movie made from the original figures and surroundings, based on the 3D environment that he has created.
- virtual gathering can be done using virtual 3D replication of the user.
- Such a virtual gathering may involve motion capture of the user.
- An application is allowing the user to participate in a virtual martial arts lesson where the teacher can see the 3D figure of the user and correct his movement, and each student may see the other students as 3D figures.
- the motion capture can be done using the user's own web camera.
- Such an application may also be used for additional educational purposes, virtual physical training, virtual video conferencing, etc.
- the 3D model and motion capture may also be used for virtual exhibitions, multiplayer games, or even virtual dating.
- the space based 3D model may be used in simulation, simulating combat arena for training soldiers, flight simulation, and so on.
- the 3D arena can be used in medical devices. It may be used for manipulating images acquired from one or more sensors. The images may be used to create a 3D model of a body organ for use during an actual surgical procedure in real time or for the purposes of simulation.
- the 3D models and environments described herein may be used for planning and design, for example, in architecture and construction engineering.
- models and environments described herein may also be used for transition between different video standards, such as between PAL and NTSC.
- One application of the techniques provided herein is video compression.
- space based 3D modeling using the photographed clip allows for transmission of the model, after which almost all that is necessary is the transmission of movement information.
- Such a technique represents a large saving in bandwidth over transmission of video frames, and is applicable to various uses of video and various quality specifications, from motion pictures to cellular video clips.
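- The bandwidth argument above can be made concrete with a back-of-envelope comparison between sending every frame and sending the model once followed by per-frame movement information. All figures below (frame size, model size, joint count, 12 bytes per joint for three 4-byte floats) are made-up illustrative assumptions, not measurements or values from the disclosure, and both functions are hypothetical.

```python
def raw_video_bytes(width, height, frames, bytes_per_pixel=3):
    """Size of an uncompressed frame sequence."""
    return width * height * bytes_per_pixel * frames


def model_stream_bytes(model_bytes, joints, frames, bytes_per_joint=12):
    """Size of one model transmission plus a 3D position per joint per frame."""
    return model_bytes + joints * bytes_per_joint * frames


# Illustrative numbers only: 100 frames of 640x480 video versus a
# 5 MB model followed by 20 skeleton joint positions per frame.
raw = raw_video_bytes(640, 480, 100)
modeled = model_stream_bytes(5_000_000, 20, 100)
```

Under these assumed numbers the per-frame cost after the model is sent is 240 bytes, so the one-time model transmission dominates for short clips and is amortized over longer ones.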
- the present embodiments provide a new method for video recording wherein the recording is directly made into or applied on the 3D space based model of the present embodiments.
- the video frames themselves can be reproduced after the information has been extracted to the model.
- the 3D model of the present embodiments can be used for capturing and modeling moving elements in real time from a single source, and viewing them from any direction.
- multiple users at different screens are able to view these figures from any direction or zoom, in real time.
- a device may be used in real time for capturing 3D movement of the user, and using it for fully operating the computer with the 3D movements of hands or body, for any computer program.
- This implementation may utilize a specified camera, a regular camera such as a regular video camera, a stills camera or a cellular camera.
- the user may be immersed within a computer game where one of the existing 2D or 3D characters in the game moves according to the movements of the user. This can also be done in the user interface of the cellular mobile phones or any other hand held mobile devices.
- users can model themselves as a full or a partial 3D model and immerse themselves in a computer game or any other relevant computer program.
- 3D modeling can be done using any kind of sensor gathered information such as infra red, etc.
- microscopic information can also be modeled into the novel 3D space based model using data gathered from suitable sensors.
- 3D models and texture can be used to create new user defined 2D/3D arena based on data gathered from sensors without optical information, such as subatomic particles, distant stars, or even areas the sensors cannot capture (for example—behind a wall).
- the 3D SP process may be used in machine vision enablement.
- it may be used to provide three-dimensional spatial understanding of a scene to a robot.
- the robot is thus able to relate to a human as a unified three-dimensional entity and not as a partial image in multiple frames.
- the resulting robot may have applications for example assisting disabled people and so on.
- the 3D SP process may create a super resolution reconstructed 3D model in terms of the number of texture pixels per inch and number of depth points that construct the 3D formation of the model.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Processing Or Creating Images (AREA)
- Image Processing (AREA)
- Image Generation (AREA)
Abstract
An apparatus for 3D representation of image data, comprising: a structure identifier for identifying structures in motion within image data, and a skeleton insertion unit, which associates three-dimensional skeleton elements with the identified structures. The skeleton elements are able to move with the structures to provide a three-dimensional motion and structural understanding of said image data which can be projected back onto the input data. As well as individual elements, complex bodies can be modeled by complex skeletons having multiple elements. The skeleton elements themselves can be used to identify the complex objects.
Description
- The present invention relates to photography, image processing and animation, and more particularly, but not exclusively to three dimensional (3D) photography, three dimensional image processing and three dimensional animation.
- The present art in three-dimensional photography is based on the time dimension.
- The present invention relates to several different fields that belong to the world of 3D imagery and image processing, for example: Stereoscopic images, spherical photographing systems, 3D computer animation, 3D photography, and 3D image processing algorithms.
- Conventional 3D-stereoscopic photographing employs twin cameras having parallel optical axes and a fixed distance between their aligned lenses. These twin cameras produce a pair of images which can be displayed by any of the known in the art techniques for stereoscopic displaying and viewing. These techniques are based, in general, on the principle that the image taken by a right lens is displayed to the right eye of a viewer and the image taken by the left lens is displayed to the left eye of the viewer.
- For example, U.S. Pat. No. 6,906,687, assigned to Texas Instruments Incorporated, entitled “Digital formatter for 3-dimensional display applications” discloses a 3D digital projection display that uses a quadruple memory buffer to store and read processed video data for both right-eye and left-eye display. With this formatter video data is processed at a 48-frame/sec rate and readout twice (repeated) to provide a flash rate of 96 (up to 120) frames/sec, which is above the display flicker threshold. The data is then synchronized with a headset or goggles with the right-eye and left-eye frames being precisely out-of-phase to produce a perceived 3-D image.
- Spherical or panoramic photographing is traditionally done either by a very wide-angle lens, such as a “fish-eye” lens, or by “stitching” together overlapping adjacent images to cover a wide field of vision, up to fully spherical fields of vision. The panoramic or spherical images obtained by using such techniques can be two dimensional images or stereoscopic images, giving to the viewer a perception of depth. These images can also be computed three dimensional (3D) images in terms of computing the distance of every pixel in the image from the camera using known in art methods such as triangulation methods.
- For example, U.S. Pat. No. 6,833,843, assigned to Tempest Microsystems Incorporated, teaches an image acquisition and viewing system that employs a fish-eye lens and an imager such as, a charge coupled device (CCD), to obtain a wide angle image, e.g., an image of a hemispherical field of view.
- Reference is also made to applicant's co-pending U.S. patent application Ser. No. 10/416,533 filed Nov. 28, 2001, the contents of which are hereby incorporated by reference. The application teaches an imaging system for obtaining full stereoscopic spherical images of the visual environment surrounding a viewer, 360 degrees both horizontally and vertically. Displaying the images by means suitable for stereoscopic displaying, gives the viewers the ability to look everywhere around them, as well as up and down, while having stereoscopic depth perception of the displayed images. The disclosure teaches an array of cameras, wherein the lenses of the cameras are situated on a curved surface, pointing out from C common centers of said curved surface. The captured images are arranged and processed to create sets of stereoscopic image pairs, wherein one image of each pair is designated for the observer's right eye and the second image for his left eye, thus creating a three dimensional perception.
- 3D computer animation relates to the field of “Virtual Reality”, that has gained popularity in recent years. 3D Virtual reality is constructed from real images, with which synthetically made images can be interlaced in. There also exists fully computer generated Virtual reality. 3D virtual reality demands 3D computation of the photographed image to create the 3D information of the elements being shot.
- This can be done in real time using active methods.
- For example, 3DV systems Incorporated (https://www.3dvsystems.com/) provides the ZCam™ camera which captures, in real time, the depth value of each pixel in the scene in addition to the color value, thus creating a depth map for every frame of the scene by grey level scaling of the distances. The ZCam™ camera is a uniquely designed camera which employs a light wall having a proper width. The light wall may be generated, for example, as a square laser pulse. As the light wall hits objects in a photographed scene, it is reflected towards the ZCam™ camera carrying an imprint of the objects. The imprint carries all the information required for the reconstruction of the depth map.
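- A depth map built by grey level scaling of distances, as mentioned above, can be sketched as a linear mapping from distance to an 8-bit grey value. The function `depth_to_grey`, the near-is-bright convention, and the clamping of out-of-range distances are assumptions for illustration; they do not describe how any particular camera encodes depth.

```python
def depth_to_grey(depths, near, far):
    """Map per-pixel distances to 8-bit grey levels.

    Assumed convention: `near` maps to 255 (white) and `far` to 0 (black).
    Distances outside [near, far] are clamped to the range.
    """
    if far <= near:
        raise ValueError("far must be greater than near")
    greys = []
    for d in depths:
        d = min(max(d, near), far)
        greys.append(round(255 * (far - d) / (far - near)))
    return greys


# Distances in arbitrary units for four pixels of one scan line.
scan_line = depth_to_grey([1.0, 2.0, 5.0, 10.0], near=1.0, far=5.0)
```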
- 3D computation of photographed images may also be provided using passive methods.
- Passive methods for depth construction may use triangulation techniques that make use of at least two known scene viewpoints. Corresponding features are identified, and rays are intersected to find the 3D position of each feature. Space-time stereo adds a temporal dimension to the neighborhoods used in the spatial matching function. Adding temporal stereo, using multiple frames across time, we match a single pixel from the first image against the second image. This can also be done by matching space-time trajectories of moving objects, in contrast to matching interest points (corners), as done in regular feature-based image-to-image matching techniques. The sequences are matched in space and time by enforcing consistent matching of all points along corresponding space-time trajectories, also obtaining sub-frame temporal correspondence (synchronization) between two video sequences.
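- The ray intersection step of triangulation can be sketched as follows. Because two rays computed from real images rarely meet exactly, this illustration returns the midpoint of their closest approach, which is one common choice and not necessarily the one used by any system discussed here. The function names and the ray representation (origin plus direction) are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(p1, d1, p2, d2, eps=1e-12):
    """Estimate the 3D position of a feature seen along two rays.

    Ray i starts at viewpoint p_i and points along d_i. Returns the
    midpoint of the shortest segment joining the rays, or None when the
    rays are (nearly) parallel and no unique solution exists.
    """
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    t = (b * e - c * d) / denom  # parameter along ray 1
    s = (a * e - b * d) / denom  # parameter along ray 2
    q1 = tuple(p + t * k for p, k in zip(p1, d1))
    q2 = tuple(p + s * k for p, k in zip(p2, d2))
    return tuple((x + y) / 2 for x, y in zip(q1, q2))


# Two viewpoints 2 units apart, both sighting a feature at (1, 0, 5).
feature = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 5.0),
                               (2.0, 0.0, 0.0), (-1.0, 0.0, 5.0))
```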
- 3D computer generated imagery (CGI) is a virtual world, a designated area, created using 3D computer generated images software. The virtual world is created in a designated area where every point in the virtual world is a computer generated point. 2D or 3D real images may also be interlaced in this virtual world.
- Reference is now made to FIG. 1 which illustrates a virtual world, according to techniques known in the art.
- The 3D position of every point in this virtual world is known. Adding to certain points in the space details such as color, brightness and so on, creates shapes in space (FIG. 1). Introducing a virtual camera into this world enables to create time based sequences in the virtual world, to create stereo images, and so on.
- We can synchronize between photographed images and the computer generated world using space synchronization, and then time synchronization, fitting real world images in the virtual world in spatial and temporal terms.
- Reference is now made to FIG. 2 which shows a prior art virtual studio.
- In this example we use a virtual studio where the camera enables to create separation between a human figure and its background, in a technique which is known in art as blue/green screen. Isolating the human figure from its surrounding we can interlace the figure in the virtual world created in a computer, as shown in FIG. 3.
- The very opposite thing can also be done by monitoring a set of cameras in a pre-determined space such as a basketball field, where known fixed points are pre determined, and synchronized fix points are created in a computer generated 3D world. With such a technique, we can isolate a CGI figure and interlace it in the basketball field. For example, ORAD Incorporated CyberSport™ product provides for live insertion of tied-to-the-field 3D graphics for sport events taking place in a basketball field, a football field, and the like, creating the illusion that the inserted graphic objects are integral parts of the event.
- As described above, traditional methods and systems for 3D imaging and stereoscopic photography are based on special cameras, special lenses, predetermined positioning of two or more cameras and dedicated algorithms.
- There is thus a widely recognized need for, and it would be highly advantageous to have a system and method for photography and imaging
- According to one aspect of the present invention there is provided an apparatus for 3D representation of image data, the apparatus comprising:
- a structure identifier for identifying structures in motion within said image data, and
- a skeleton insertion unit, associated with said structure identifier, for associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional motion and structure understanding of said image data.
- According to a second aspect of the present invention there is provided a method for 3D representation of image data, comprising:
- identifying structures within said image data, and
- associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional understanding of said structures.
- According to a third aspect of the present invention there is provided a recording apparatus for recording input data with depth information, comprising:
- a structure identifier for identifying structures in motion within said image data,
- a skeleton insertion unit, associated with said structure identifier, for associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional motion and structural understanding of said image data, and
- a storage unit for recording said input data in relation to at least one of said skeleton elements and a background.
- According to a fourth aspect of the present invention there is provided a compression apparatus for compressing input data with depth information, comprising:
- a structure identifier for identifying structures in motion within said image data,
- a skeleton insertion unit, associated with said structure identifier, for associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional motion and structural understanding of said image data, and
- a compression unit for outputting said input data in relation to at least one of said skeleton elements and a background, such as to provide compression of said input data and to provide depth information thereof.
- According to a fifth aspect of the present invention there is provided a recording method for recording input data with depth information, comprising:
- identifying structures in motion within said image data,
- associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional motion and structural understanding of said image data, and
- recording said input data in relation to at least one of said skeleton elements and a background.
- According to a sixth aspect of the present invention there is provided a compression method for compressing input data with depth information, comprising:
- identifying structures in motion within said image data,
- associating three-dimensional skeleton elements with said structures, such that said skeleton elements are able to move with said structures to provide a three-dimensional motion and structural understanding of said image data, and
- outputting said input data in relation to at least one of said skeleton elements and a background, such as to provide compression of said input data and to provide depth information thereof.
- Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The materials, methods, and examples provided herein are illustrative only and not intended to be limiting.
- Implementation of the method and system of the present invention involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and system of the present invention, several selected steps could be implemented by hardware or by software on any operating system of any firmware or a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.
- The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in order to provide what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
- In the drawings:
-
FIG. 1 is a photograph of prior art 3D computer generated virtual figures. -
FIG. 2a is a first photograph of a prior art virtual studio. -
FIG. 2b is a second photograph of a prior art virtual studio. -
FIG. 3 is a simplified block diagram of an apparatus for 3D image analysis according to a first preferred embodiment of the present invention. -
FIG. 4 is a simplified flow chart illustrating a procedure for inserting skeleton elements into a structural element identified from an image or series of images according to a preferred embodiment of the present invention; -
FIG. 5 is a simplified flow chart illustrating a modification of the procedure of FIG. 4 for the case of a series of elements being recognized as a single body. -
FIG. 6 is a simplified flow chart showing skeleton insertion and its subsequent use in providing a three-dimensional understanding of 2d image data according to a preferred embodiment of the present invention; -
FIG. 7 is a simplified diagram illustrating a multiple-layer format to provide a 3D understanding of a 2D image. -
FIG. 8 is a flow diagram illustrating two methods of obtaining object identification from a 2 dimensional image in which to insert skeleton elements according to a preferred embodiment of the present invention. -
FIG. 9 is a simplified flow chart illustrating the process of using a skeleton according to the present embodiments in order to provide a 3D understanding of a 2D moving element in the image. -
FIG. 10 is a balloon chart illustrating a series of exemplary applications of the present embodiments of the present invention. -
FIG. 11 is a depth map, illustrating possible imaging processes in accordance with a preferred embodiment of the present invention. -
FIG. 12 is a skeleton attached to a depth map, illustrating possible imaging processes in accordance with a preferred embodiment of the present invention. -
FIG. 13 is a skeleton demonstrating the process of deformation of its structure illustrating possible imaging processes in accordance with a preferred embodiment of the present invention. -
FIG. 14 illustrates how a structural element in a series of images should be processed from a frame in which it is in a position of minimal distortion. -
FIG. 15 illustrates a photographed image supplying 3D information from a specific direction. - The present embodiments comprise a method and an apparatus for transforming time based sequences of photographed images into space based three dimensional (3D) models, enabling real-time and non real-time applications such as 3D real image animation, new time based sequences, image processing manipulations, 2D/3D motion capture and so on.
- The present embodiments identify structures within two-dimensional or partial three-dimensional data and associate three-dimensional skeleton or skeleton elements therewith. The skeleton or skeleton elements may be applied at a separate level from the original data, allowing the levels to be projected onto each other to provide accurate depth information to the image data.
- The principles and operation of a method and apparatus according to the present invention may be better understood with reference to the drawings and accompanying description.
- Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
- Reference is now made to
FIG. 3 , which is a simplified block diagram illustrating an apparatus for providing a three-dimensional understanding to image data. The image data may be two-dimensional or partial three-dimensional information, and the understanding is a unified understanding of three-dimensional structures and three-dimensional motion. - The apparatus of
FIG. 3 comprises a structure identifier 302 for identifying structures within the image data. As will be discussed in greater detail below, the structures may be identified automatically using artificial intelligence, with the help of user input, or by a combination of both. - The apparatus further comprises a
skeleton insertion unit 304, associated with said rigid structure identifier, which associates or attaches three-dimensional skeleton elements with the structures identified in the image data. The skeleton elements may be blocks, tubes, spheres, ovals, or any other elemental or more complex three-dimensional geometric entities. The elements have the ability to add joints to themselves and to attach to each other. The three-dimensional shape of the element is imparted to the structure identified as above, and the skeleton element is now able to move or otherwise coexist with the structures to provide a three-dimensional understanding of the structure. - That is to say, the skeleton element has a known three-dimensional structure, meaning it extends in the X, Y and Z dimensions. The structure's movement can be seen in the X and Y dimensions, and details of the structure's behavior in the Z dimension can be inferred from its association with the skeleton element.
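- As a minimal illustration of inferring Z from an attached skeleton element, the sketch below assumes a sphere element of known center and radius, an orthographic view along the Z axis, and smaller Z meaning nearer to the camera. The function name and these conventions are assumptions made for illustration and are not taken from the embodiments.

```python
import math


def sphere_front_depth(x, y, center, radius):
    """Depth (Z) of the camera-facing surface of a sphere element at image point (x, y).

    Assumes an orthographic view along +Z, with smaller Z nearer to the camera.
    Returns None when (x, y) falls outside the element's 2D silhouette.
    """
    cx, cy, cz = center
    planar_sq = (x - cx) ** 2 + (y - cy) ** 2
    if planar_sq > radius ** 2:
        return None
    # The visible surface is the near half of the sphere.
    return cz - math.sqrt(radius ** 2 - planar_sq)


# A point seen at the middle of the silhouette is nearest to the camera,
# while a point on the silhouette edge lies at the depth of the center.
middle_depth = sphere_front_depth(0, 0, (0, 0, 10), 2)
edge_depth = sphere_front_depth(2, 0, (0, 0, 10), 2)
```

Only X and Y are observed in the image; the Z value comes entirely from the known geometry of the element, which is the point made in the paragraph above.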
- The structure identifier is preferably able to recognize not just individual structures but also complex bodies made up of interrelated structures, interrelated meaning that they have defined movement relations between them. An example is the human body, which consists of structures such as the forearm and the upper arm. The forearm pivots on the end of the upper arm in a defined manner, which can be modeled by the skeleton elements of the present embodiments.
- In the event that such a complex body is recognized from the image data, the skeleton insertion unit attempts to construct a correspondingly complex skeleton in which movement relations between the skeleton elements are defined as for the complex body. As will be explained below, one way to achieve this is to recognize the complex body, say as a human, and to have preset skeletons with the necessary elements and relationships preprogrammed in.
- Using such prestored or preset skeletons, the three-dimensional aspects of the complex body, including both structure and motion can be understood. That is to say three-dimensional structure and motion within the image can be understood from a priori knowledge of an identified body. Furthermore, if the depth information for the object is known within the system, based on the skeleton, then the processing load for three-dimensional processing of the image may be significantly reduced.
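- A preset skeleton with defined movement relations can be pictured as a chain of elements, each rotating about the joint at the end of its parent. The following sketch is a simplified planar (2D) version of such a chain for an upper arm and forearm; the `Bone` class, the lengths and the angles are hypothetical and serve only to illustrate the idea.

```python
import math
from dataclasses import dataclass


@dataclass
class Bone:
    name: str
    length: float
    angle_deg: float  # rotation relative to the parent bone, about the shared joint


def joint_positions(origin, bones):
    """Return the joint positions of a chain of bones, starting at origin."""
    x, y = origin
    heading = 0.0
    points = [(x, y)]
    for bone in bones:
        # Each bone pivots on the end of the previous one.
        heading += math.radians(bone.angle_deg)
        x += bone.length * math.cos(heading)
        y += bone.length * math.sin(heading)
        points.append((x, y))
    return points


# Hypothetical preset: the forearm pivots 90 degrees on the end of the upper arm.
arm = [Bone("upper_arm", 3.0, 0.0), Bone("forearm", 2.0, 90.0)]
shoulder, elbow, wrist = joint_positions((0.0, 0.0), arm)
```

Changing only the forearm angle moves the wrist while the elbow stays fixed, which is the kind of defined movement relation a prestored skeleton would encode.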
- The apparatus may further comprise a
movement analyzer unit 306 which may analyze relative movement within the original image data to provide movement relation definitions for the skeleton insertion unit 304. The movement analyzer is able to recognize structures within the mass of pixels that make up the image and to identify movement among groups of pixels, using tracking techniques that are known in the art. - A
skeleton store 308 stores preset skeletons for use with recognized complex bodies. The store may for example store a preset skeleton for a human, which is used every time a human is recognized in the image data. - Assume that the structure is not recognized as having a preset skeleton. The skeleton insertion unit then attempts to form a skeleton from scratch by inserting geometric elements. However, the geometric elements may need to be rotated and distorted before they fit. There is thus provided a rotation unit 310, which allows the selected element to be rotated until it fits the image data, and a
distortion unit 312 which allows the element to be distorted in various ways to fit the data. The rotation and distortion units may be operated through user input or may operate automatically. - Having fitted the skeleton, the structures within the image are now modeled as three-dimensional models. A
tracking unit 314 can track movement within the initial image data and move the skeleton with the image so that three-dimensional information of the motion is available. A process of projecting between the skeleton and the image data can be carried out, and it is possible thereby to obtain three-dimensional and movement information from a single camera. - An
animation unit 316 allows movement to be applied via the skeleton so that a figure or other object once modeled can be animated. - It will be appreciated that depending on the application, the apparatus will not necessarily have both the tracking unit and the animation unit. An animation application would typically have the animation unit but may dispense with the tracking unit, whereas a video capture application may have a tracking unit and dispense with the animation unit.
-
Rendering unit 318 is connected to either or both of the tracking unit and animation unit and renders a scene being modeled for viewing from a requested direction. That is to say, the advantage of having the 3D data is that the modeled objects can be viewed from any angle and not just the angle from which an image may have been initially taken. The rendering unit simply needs to make a projection of the three-dimensional model onto a plane in the requested viewing direction and apply texture and the like, as will be explained in greater detail below, and the scene can be viewed from the given direction. - Reference is now made to
FIG. 4 , which is a simplified diagram illustrating a process for obtaining a three-dimensional model including movement data, according to a preferred embodiment of the present invention. - Image data is obtained in
stage 402. This data may be 2D data or partial or complete 3D data. Elements within the data are identified. Skeleton elements are inserted for association with the identified structural element in stage 406. Then in stage 408 the skeleton element is rotated, translated or scaled in order to fit the identified structural element. Translation includes distorting. Movement relations are then defined between the skeleton elements, as per the information available, in stage 410. - Reference is now made to
FIG. 5, which is a variation of the flow chart of FIG. 4 for the case in which a complex body such as a human is recognized. Again, the initial data is obtained, in stage 502. The complex body is identified from the initial data. The appropriate skeleton is retrieved from the data store in stage 504 and is inserted in association with the complex body in stage 506. Then in stage 508 the skeleton is rotated, translated or scaled. Translation includes distorting. The result is to produce a fit between the identified structure and the skeleton elements. It is noted that the very attempt to fit skeleton elements to the complex body, as in FIG. 4 above, may lead to the identification of the complex body as, say, a human, so that an appropriate complex skeleton may be selected. - Reference is now made to
FIG. 6, which extends the process of FIGS. 4 and 5 to movement of the object being modeled. Stages 602 to 608 are as previously described. Stage 612 involves the skeleton modeling the 3D object so that movement of the object is projected onto the skeleton and/or movement of the skeleton is projected onto 2D image data. Then in stage 614 the image data is available for rendering from any desired direction. - Reference is now made to
FIG. 7, which is a simplified diagram showing how image data may be managed in a layered structure according to a preferred embodiment of the present invention. The two-dimensional or partial or complete three-dimensional image data is stored in a first layer 702. The three-dimensional skeleton is stored in an underlying layer 706. A two-dimensional projection of the three-dimensional skeleton exists in a virtual layer 704 in between. An ostensible two-dimensional image can be viewed from a direction different from that of the original 2D image in layer 702 by projection of the three-dimensional skeleton into that direction. The projection is stored in virtual layer 704. - Reference is now made to
FIG. 8 , which is a simplified diagram illustrating how objects and structures in the initial data may be recognized for the purpose of assignment of the skeleton elements. - Two paths are shown, the first being a manual path,
stage 802, in which a user simply identifies the elements, bodies and complex bodies to the apparatus. As an alternative, an automatic path can be provided for identifying the structures, consisting of stage - It will be appreciated that grouping and the decision about whether to continue could be viewed as a single stage. Points or pixels are traced over a series of images and points that move together are grouped together. The process of grouping is repeated iteratively until a stable identification is reached.
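- The automatic path, in which points that move together are grouped and the grouping is repeated until it is stable, might be sketched as follows. The tolerance, the track format and the function names are assumptions for illustration; the embodiments do not specify a particular grouping criterion.

```python
def displacements(track):
    """Frame-to-frame displacement vectors of one tracked point."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def moves_together(track_a, track_b, tol=0.5):
    """True when two points show the same displacement in every frame, within tol."""
    return all(
        abs(ax - bx) <= tol and abs(ay - by) <= tol
        for (ax, ay), (bx, by) in zip(displacements(track_a), displacements(track_b))
    )


def group_points(tracks, tol=0.5):
    """Merge groups of points until no further merge is possible (stable grouping)."""
    groups = [[name] for name in tracks]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(moves_together(tracks[a], tracks[b], tol)
                       for a in groups[i] for b in groups[j]):
                    groups[i].extend(groups.pop(j))
                    changed = True
                    break
            if changed:
                break
    return [sorted(group) for group in groups]


# Hypothetical tracks: p1 and p2 drift right together, p3 moves upward.
tracks = {
    "p1": [(0, 0), (1, 0), (2, 0)],
    "p2": [(5, 5), (6, 5), (7, 5)],
    "p3": [(9, 9), (9, 10), (9, 11)],
}
groups = group_points(tracks)
```

The outer loop stops only when a full pass produces no merge, which corresponds to the stable identification mentioned above.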
- It is also noted that a mixture of the two processes may be used. For example a user may point out some elements or a complex body to the system, and the system automatically identifies other elements or identifies the individual elements within the complex body.
- Reference is now made to
FIG. 9, which is a simplified diagram illustrating the iterative nature of extrapolation of movement into the third dimension using embodiments of the present invention. Pixels are tracked in the initial 2D or partial or complete 3D image, stage 902. The underlying skeleton moves in accordance with the motion of points tracked in its associated structure, stage 904, and an extrapolation is carried out in stage 906 to determine the three-dimensional position of the pixels in the initial image. It is noted that stages - In a preferred embodiment of the present invention a computer generated time based photographic sequence may be constructed into a 3D model.
- Input for the initial data may be provided by a module for the reception and digital recording of photographed images or video clips, for example from already recorded video clips compressed in any video format known in the art, or from one or more directly connected cameras using USB or any other digital or analog connection which is known in the art.
- Referring again to
FIGS. 4 and 5, the initial data may be obtained, for example, from one or more sequences of time based photographed images. Video or film sequences, as known in the art, create an illusion of motion over time in the brain of the viewer. - The input data is analyzed. Preferably, the analysis involves a depth map construction of the input sequence(s), building depth maps for each of the time based sequences, and processing the depth maps, as described below in the algorithm section.
- In a preferred embodiment, the present method ultimately creates a 3D model for objects captured by the sequence(s) of the photographed images 530.
- These models are reconstructed from the real images, or from graphical clips or the like, where the time dimension is converted to the space dimension where all the figures and the static backgrounds are three dimensional models.
- These 3D models may enable many manipulations that were previously possible only in a computer generated 3D virtual world.
- According to a preferred embodiment of the present invention, individual figures can be identified in the sequence. Once identified they may be converted into stand alone 3D models. The movement of the figure can be compared with timings of the photographs in the sequence to provide a basis for mapping the movement of the figures from their progression across the photographs. Then it is possible to adjust the time lines separately for each figure to give a different sequence of events. In this way the figures being modeled can be morphed.
- For example if, in our sequence, two people cross the street and person A reaches the other side before person B, now, since we have each figure modeled separately, we can alter the timing of the individual figures. Thus, we may decide that person B should cross the street before person A, thus altering the original time line of the photographed sequence and hence carrying out morphing of the sequence.
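- The street-crossing example can be made concrete with a small sketch in which each figure carries its own time line that may be delayed independently. The one-dimensional positions, the `delay` parameter and the function names are hypothetical and are used only to illustrate per-figure retiming.

```python
def position_at(track, frame, delay=0):
    """Position of a figure at an output frame, with its own time line delayed by `delay` frames."""
    index = min(max(frame - delay, 0), len(track) - 1)
    return track[index]


def arrival_frame(track, goal, delay=0):
    """First output frame at which the figure has reached the goal position."""
    return next(
        frame for frame in range(len(track) + delay)
        if position_at(track, frame, delay) >= goal
    )


# Hypothetical distance across the street per frame; the far side is at 10.
person_a = [0, 2, 4, 6, 8, 10]
person_b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

original_order = (arrival_frame(person_a, 10), arrival_frame(person_b, 10))
# Delaying only person A's time line lets person B reach the other side first.
retimed_order = (arrival_frame(person_a, 10, delay=8), arrival_frame(person_b, 10))
```

Because the two figures are modeled separately, changing one delay leaves the other figure's motion untouched.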
- The apparatus enables the user to create several different kinds of outputs from the media based on the 3D space based models created. The user may use the 3D space based models with any external image processing, animation or broadcast program known in the art, or may use an internal tool such as an editor. Such an editor may enable the user to create and edit two main kinds of outputs, linear media and non-linear media. Linear media refers to time line based media, that is, a sequence of images taken at specified time intervals. The user is able to create a clip based on the time lined events he wishes to show. He is then able to export the results in a variety of viewing formats, for example: real time live video image processing, video clips, motion capture, still images, DVD, spherical images, 2D images, stereoscopic images or any format which is known in the art.
- The apparatus of the present embodiments may also create non-time-lined, that is to say non-linear, media. Such non-time-lined output may comprise, for example, a 3D surrounding comprising a set of images, animation, and texts.
- The apparatus of the present embodiments provides the ability to present this output as a three-dimensional virtual environment in which say a user can fully walk through any route of his choice, reach any point, look 360 degrees around at that point, interact with any figure, and so on. There are many examples for such a non-linear output: computer games, medical surgery simulators, flight simulators, etc.
- The apparatus may include an animation editor, as per
animation unit 316 of FIG. 3. The animation editor 316 is a tool which gives life to every object the user chooses. The animation editor 316 also assigns to the object a certain movement, such as a tree blowing in the wind or a walking human figure, with unique characteristics: how the figure acts when walking, when running, when angry or sad, its facial expressions, lip movements and so on. The animation editor may also attach to the object a set of predefined movements from computer animation or motion capture, from an external source or using the apparatus motion capture tool, and can also define a set of movements and characteristics that characterize every object, a little limp for example, wrinkles in the forehead and so on. These movements are characteristics that assist in creating the personality of the figure. The animation editor may also allow creating voice characteristics for an object using the apparatus motion capture tool, which may enable the object to speak. - The software preferably employs the method and algorithms that are described and illustrated below.
- The 3D space based model creation in greater detail
- The basic platform of a preferred embodiment of the present invention is placed in a computer generated 3D axis engine, utilizing three vectors, corresponding to the 3D axis engine, and a space-time vector as explained below.
- Input image sequence S is the sequence of images which is input to the platform.
- A preferred embodiment may implement the algorithms described below.
- Sequence S is divided into Nf(S) frames, at a rate of, for example, 25 fps in the PAL video display standard.
- The first frame of S0 (the first sequence) is denoted s(0,0), the second frame of the first sequence s(0,1), and the last frame s(0,n), so that Nf(S0)=n+1 frames.
- A number of anchor points are used. The anchor points have two major elements: one is the correspondence between the elements inside Si (where 0≤i≤number of sequences), and the second is the correspondence between Si and the 3D axis engine, denoted F.
- Input, Depth Map & Anchor Points
- An algorithm, according to a preferred embodiment of the present invention, may receive S0 as input and may use all the sequence frames therein for generating depth maps for the sequence. Factor D is defined as the depth vector of s(0,0) (to be defined below). Assume that z is the set of depth values of the different pixels of frame s(0,0). In s(0,0), d(0,0) is the set of points of the frame whose depth value equals z0, where z0=min(z). d(0,h) is the set of points of the frame whose depth equals zh, where zh=max(z). In general, zi=min(z\{z0, . . . , z[i−1]}) for i=1 . . . h (where z\{z0, . . . , z[i−1]} means the set z without the elements z0 through z[i−1]), so {z0, z1, z2, . . . , zh} is the set of depth layers of frame S(0,0), which gives the vector D0. This set is sorted in ascending order, and it is clear from the definition that the numbers in this set are the layers of the vector D0 with respect to F's resolution factor, as will be further explained. For example, for the depth values {2,5,6,9,13,56,22,89} we obtain (z0=2, z1=5, z2=6, z3=9, z4=13, z5=22, z6=56, z7=89). D0 is the depth vector for S(0,0): D0={d(0,0), d(0,1), . . . , d(0,h)}.
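- The construction of the depth layers z0 . . . zh and of the point sets d(0,k) can be sketched as below. The depth map is represented as a plain list of rows, and the function names are assumptions for illustration; the depth values are those of the example above.

```python
def depth_layers(depth_map):
    """Distinct depth values of a frame in ascending order: [z0, z1, ..., zh]."""
    return sorted({z for row in depth_map for z in row})


def points_per_layer(depth_map):
    """Map each depth layer zk to the set of points d(0,k) lying at that depth."""
    return {
        z: [(x, y) for y, row in enumerate(depth_map) for x, value in enumerate(row) if value == z]
        for z in depth_layers(depth_map)
    }


# A hypothetical 2x4 frame holding the depth values of the example above.
frame_depth = [
    [2, 5, 6, 9],
    [13, 56, 22, 89],
]
layers = depth_layers(frame_depth)
points = points_per_layer(frame_depth)
```

Here `layers[0]` plays the role of z0=min(z) and `layers[-1]` the role of zh=max(z).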
- D of frame S(0,i) is a 3D matrix, that is, a vector of 2D mask matrices. If the depth of pixel d(i,j) is not defined for some reason, then d(i,j)=infinity, and the pixel is marked in the
Boolean 2D matrix as 0. - Note: if the depth value of a pixel d(i,j) cannot be defined from the data of the 2D image of frame S(0,i), the algorithm tries to define it using data from multiple frames of the same sequence S (the sequence from which S(0,i) is taken). If the depth map of s(0,0), or a part of it, cannot be defined, due to bad lighting for example, the SP temporarily treats d(0,0), or a defined part of it, as “∞” (infinity) as well, and using s(0,1) . . . s(0,i) of Si (where 0≤i≤number of sequences) it tries to compute s(0,0).
- In the case that the depth maps of the frames {S(0,i)|i=0, 1, 2 . . . n} (ϵS(sequence no., frame no.)) are computed, D0={d(0,0), d(0,1), . . . d(0,h)} (ϵD(frame no., depth enc, points)). The software finds the supremum of the values of the depth map matrix (the maximum of the set of depth values).
- SP also finds the infimum of the values of the depth map matrix. If the depth maps of all the frames of sequence S0 are successfully processed, SP finds the supremum and infimum anchor points of sequence S0 at every defined moment in time.
- In s(0,0), d(0,0) is the nearest point of depth. The deepest point of depth in s(0,0) is denoted d(0,h), with D0={d(0,0), d(0,1) . . . d(0,h)}.
- The factor D is a Class of depth vectors in the algorithm, where several D vectors are used to analyze the data as a working tool to correlate the image depth structured maps. The SP structure map is built in F, using multiple new matrixes that are opened inside F for modeling static and moving elements, and for representing parts of elements (for example hands, legs and so on). D is built such that every point along this vector contains its corresponding depth information at a current depth, and furthermore expresses the depth values along the depth slice of every point just as altitude lines in a topological map do.
- D is a 3D matrix, built as a 2D Boolean image matrix (x,y) for every Z point along D, marking “1” in every 2D image matrix (x,y) only for the information included in the image at the corresponding depth point (Z).
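- One way to picture D as a stack of Boolean masks is sketched below, with one 2D mask per depth layer and with pixels of undefined (infinite) depth left at 0 in every mask. The list-of-lists representation and the function name are assumptions for illustration.

```python
import math


def depth_mask_stack(depth_map):
    """Build one 2D 0/1 mask per defined depth layer, ordered from nearest to deepest.

    Pixels whose depth is undefined (math.inf) are 0 in every mask.
    """
    layers = sorted({z for row in depth_map for z in row if z != math.inf})
    masks = [
        [[1 if value == z else 0 for value in row] for row in depth_map]
        for z in layers
    ]
    return layers, masks


# Hypothetical 2x2 frame in which one pixel has no defined depth.
frame_depth = [
    [2, 5],
    [math.inf, 2],
]
layers, masks = depth_mask_stack(frame_depth)
```

Each mask behaves like one altitude line of a topological map: it marks exactly the pixels that sit at that depth.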
- To find anchor points of reference between the frames of the video sequence S0, the system proceeds as follows:
- {x,y,z}=(horizontal, vertical, depth)={(1,0,0),(0,1,0),(0,0,1)} are the spatial vectors of F, which we will call the “world coordinate system”.
- Now consider some frame S(0,t), for some number t. This frame has its own local coordinate system {index of column, index of row, depth}.
- Consider the k-th anchor point on this frame. Its position in the frame's local coordinate system is (utk=i, vtk=j, depth(i,j)).
- The basis of this frame's local coordinate system, expressed in the world coordinate system, is {it, jt, kt}, where it, jt and kt=(it×jt) are vectors in the world coordinate system.
- In the frame's local coordinate system, it is (1,0,0), jt is (0,1,0) and kt is (0,0,1).
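- The relation between a frame's local coordinates and the world coordinate system can be sketched as follows, with kt obtained as the cross product it×jt. The frame origin passed to the function is an assumption added for illustration, since the text above defines only the basis vectors.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def local_to_world(origin, i_t, j_t, anchor):
    """Express a local anchor point (u, v, depth) in world coordinates.

    origin is the assumed world position of the frame's local origin;
    i_t and j_t are the frame's column and row axes given in world coordinates.
    """
    k_t = cross(i_t, j_t)  # depth axis of the frame
    u, v, depth = anchor
    return tuple(
        o + u * i + v * j + depth * k
        for o, i, j, k in zip(origin, i_t, j_t, k_t)
    )


# Hypothetical frame rotated 90 degrees about the world depth axis.
world_point = local_to_world((10, 0, 0), (0, 1, 0), (-1, 0, 0), (2, 3, 4))
```

For a frame whose axes coincide with the world axes and whose origin is the world origin, the local and world coordinates of an anchor point are identical.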
- To find anchor points of reference between the frames of the video sequence S0, the system treats each frame (S(0,0), S(0,1), . . . S(0,n)) as a sub space (that is, a vector space in itself) of the vector space S0, which is above the field F. The system computes the vector base W0ϵS(0,0), where the span of W0 (W0=Sp{w1, w2, . . . wm}) extends over the sub space S(0,0). There can be several different bases for each subspace, but the span of each basis extends over W0, and the roles of the base vectors are similar to those known in the art of mathematics. These vectors create the sub space W(0,0)ϵS(0,0).
- Depth Alignment
- The depth alignment for rigid objects such as an image background is carried out in two stages. In the first stage, the system finds the vector base of W(0,0), where Sp(w1, . . . wm)=W(0,0)ϵS(0,0). For fast alignment the system creates 4 vectors of reference from the vector base W(0,0): a horizontal vector, a vertical vector, a depth vector and a space/time vector.
- The first vector ZϵS(0,0) reflects the number of base vectors in every point of d0ϵS(0,0) and creates a Z vector which expresses the depth information of the base vectors in the frame. The midpoint of Z is also expressed as d(anc) and is the midpoint of the frame itself.
- d(anc) can be a point that the system temporarily marks as the 0 point of the axes XYZϵF. The horizontal and vertical vectors express the vectors in every horizontal and vertical point of the image matrix along the Z vector. The fourth reference vector is a space/time vector, which is used as the transformation vector from the time dimension to the space dimension. Now the system has created 3 reference vectors for the alignment, unified as D′, to be used between S(0,0) and S(0,1).
- The differences between frames may be a factor of lighting, moving elements inside the frame and camera behavior such as track in/out, track left/right, crane up/down, tilt up/down, pan left/right and zoom (regarding optical or digital photographs, the difference may be found in the amount of pixels per inch, which is lower in digital zoom). The shifts between the frames are mostly found in the location of the pixels, that is, there may be a shift of the pixels of some manner between the frames, and so SP 1 computes the three reference vectors of the frames of S0 as a function of the space/time vector. Three corresponding vectors are constructed for the 3D alignment of the images, where the vertical and horizontal vectors correspond to a spatial window (X,Y), and the Z vector corresponds to the depth vector.
- Each factor in the spatial (X,Y) vectors reflects the base vectors of the image in the spatial domain along the Z vector in every point of the image.
- The matching function should aspire to zero difference, or to a difference below a predetermined minimum, between the vectors V(h0/v0/z0) of image 1 and the vectors V(h1/v1/z1) of image 2, aiming for zero difference at as many points as possible. Regarding the alignment of the unified section of the vectors, at the respectively opposite edges of both vectors there may be an inconsistency at the points of difference between the frames. These points of difference may refer to the different information that may be added to the new frame but does not appear in the previous frame.
- V′0ϵV0ϵV0∩V1,
- V′1ϵV1ϵV0∩V1,
- The three vectors are the outcome of the three dimensional positional information of the images and have no relation to the visual information but rather represent the base vectors of the image in every point.
- Preferably, the horizontal, vertical, and depth vectors are each compared separately with the corresponding vector of the other frame, to find minimal differences in as many points as possible.
- There may be inconsistency at the points of difference between frames. These inconsistencies may indicate that different information appears in one frame that does not appear in a previous frame.
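- A matching function of this kind can be illustrated for a single reference vector by searching for the shift that minimizes the mean absolute difference over the overlapping (unified) section. The cost measure, the shift range and the minimum overlap are assumptions chosen for the sketch and are not prescribed by the embodiments.

```python
def best_shift(reference, new, max_shift, min_overlap=3):
    """Find the shift of `new` against `reference` with the smallest mean absolute difference.

    new[i] is compared with reference[i + shift]; only the overlapping section is scored.
    Returns (shift, cost), or None when no shift gives enough overlap.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        pairs = [
            (reference[i + shift], new[i])
            for i in range(len(new))
            if 0 <= i + shift < len(reference)
        ]
        if len(pairs) < min_overlap:
            continue
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if best is None or cost < best[1]:
            best = (shift, cost)
    return best


# Hypothetical depth vectors: the second frame is shifted by two samples and
# ends with two values (9, 6) that do not appear in the first frame.
frame_0 = [1, 4, 2, 8, 5, 7, 3]
frame_1 = [2, 8, 5, 7, 3, 9, 6]
shift, cost = best_shift(frame_0, frame_1, max_shift=3)
new_information = frame_1[len(frame_0) - shift:]
```

The samples left unmatched at the edge correspond to the information that appears in the new frame but not in the previous one.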
- Dealing with Distortions and Camera Movements
- An optical element such as a lens of a camera creates distortions of the photographed image and may create minor differences in the depth map due to distortion of the same object.
- Reference is now made to FIG. 14, which shows two images and illustrates a point about camera movement distortions.
- In first frame 1401 a stone column 1403 appears in the center of the frame. In a second frame 1405 the same column 1403 is on the right part of the frame. Now there may be some distortion as a result of the optical process and the depth map may bear minor differences that are due to those distortions. Consequently, there may be some discontinuity in the outcome results of the aligned images, which discontinuity is the outcome of these distortions. In other words the structure of the block appears to change for optical reasons as it moves from the center to the side of the frame.
- The solution is to identify a best image of a given object as one where it appears relatively centrally in a frame. The pixels receive the 3D location obtained from this most accurate measurement.
- When aligning the Z vector, we may also suffer from differences in the zoom factor between the images (optical or Digital), or track “in” or “out” of the camera.
- The smaller is the shift of the camera between the frames, the bigger is the correlation between the vectors, and the better the result is. The bigger the difference between the images, the smaller is the correlation between the vectors. SP sets a threshold for the deviation, and regards locations having a bigger difference as pertaining to different objects.
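- The thresholding step can be sketched as follows, assuming the two vectors have already been aligned sample by sample. The function name `split_by_deviation` and the threshold value are hypothetical and chosen only for illustration.

```python
def split_by_deviation(v0, v1, threshold):
    """Split sample indices into those that agree within the threshold
    and those that deviate by more than the threshold."""
    same, different = [], []
    for i, (a, b) in enumerate(zip(v0, v1)):
        if abs(a - b) <= threshold:
            same.append(i)
        else:
            # A larger deviation is regarded as pertaining to a different object.
            different.append(i)
    return same, different


aligned_v0 = [2.0, 2.1, 5.0, 5.0]
aligned_v1 = [2.0, 2.0, 5.1, 9.0]
same, different = split_by_deviation(aligned_v0, aligned_v1, threshold=0.25)
```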
- After aligning the unified sections of the horizontal vertical and depth vectors:
- V′ h/v(0)ϵV h/v(0)ϵV h/v(0)∩V h/v(1),
- V′ h/v(1)ϵV h/v(1)ϵV h/v(0)∩V h/v(1),
- V′ z0ϵV z0ϵV z0∩V z1,
- V′ z1ϵV z1ϵV z0∩V z1
- The unified sections are now treated as a sub space and the vectors are recomputed as a reference of this sub space. The zoom factor comes into consideration by computing vectors with a “Scalar” factor over the F field, a scalar that may multiply the vectors or divide them and thus mimic the zoom/track in or out of the camera, where the same relation between the elements in the frames is preserved, but the resolution may vary. With the assistance of the scalar we can align the vectors of S(0,0) with S(0,1). This process may align the images, and may also point to the alignment direction of the next frame. The space/time vector relates to the transformation from the time domain to the space domain. The new alignment is now regarded as a unified frame ϵ F, and the next frame is aligned with the previous unified frames. This can also reduce the computation, especially when the frames repeat already aligned areas. The space/time vector is the reference vector for the transformation from the time dimension to the space dimension.
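- One way to picture the scalar factor is as the single multiplier that best maps one vector onto the other. The sketch below estimates it by least squares, s = (v0·v1)/(v0·v0). Using least squares here is an assumption made for illustration; the specification does not state how the scalar is computed, and `zoom_scalar` is a hypothetical name.

```python
def zoom_scalar(v0, v1):
    """Least-squares scalar s that minimizes the squared error of s * v0 against v1."""
    numerator = sum(a * b for a, b in zip(v0, v1))
    denominator = sum(a * a for a in v0)
    if denominator == 0:
        raise ValueError("v0 must contain at least one non-zero sample")
    return numerator / denominator


# Hypothetical vectors: the second is the first seen under a 1.5x zoom.
v_frame0 = [1.0, 2.0, 4.0]
v_frame1 = [1.5, 3.0, 6.0]
s = zoom_scalar(v_frame0, v_frame1)
rescaled = [s * a for a in v_frame0]
```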
- Dealing with Inconsistent Depth Information
- In the event of an inconsistency of depth information the apparatus preferably opens a new vector plane, denoted as F1. The new plane is an empty XYZ coordinate system in which the process of this algorithm starts from the beginning.
- At the end of such a process the user of the system is asked if he wishes to leave F0 and F1 as different locations or rather choose to align them. The user may then be asked to manually align F0 and F1 using tools such as rotation, zoom, flip and so on (alternatively, the user may command the system to automatically align F0 and F1).
- After the user manually aligns F0 and F1 he commands the system to compute this alignment and the system tries to align the fields using the alignment algorithm. If the fields are well aligned then the system announces it; if not, SP asks the user to set a lower standard for the misalignment factor (less accurate alignment). The system further provides the user with a tool box for overcoming discontinuities in the image plane using image processing tools known in the art.
- SP Resolution
- The system 1 defines the temporary resolution of “F0” (the field of the XYZ axis), denoted as “R0”. R is defined by the number of points of reference per inch. The resolution is the outcome of the combination of image resolution in terms of pixels in the time dimension and the points of depth in the space dimension. A resolution tool can assist, as an example, in the alignment of two video clips shooting the same location from different distances.
- For example, a table may be shot in a high resolution clip, or from a closer location, where there may be more points of reference between the parts of the table, for example from one leg to the next, compared to a second clip that has lower resolution, or uses digital zoom, or is shot from a greater distance, resulting in a lower number of points of reference.
- A point of reference for dealing with the resolution issue is the 3D location of every pixel with reference to reality. Thus—the 3D location of the pixel in the space dimension is the computed position after transformation from the time dimension. The resolution allows D0 to correspond with S0. The middle point of D0=d0 (anc) may be temporarily placed in the center of the axis field at the point (0,0,0)ϵ(X Y Z)ϵF0
- The visual information of the points of reference may be layered in F0 as the visual layer of information, as further explained below.
- Identifying & Reconstructing Moving Elements
- In cases where moving elements appear in the image, a skeleton is used. The skeleton consists of moving graphical elements and defines their relative positions and movement patterns, so as to construct a highly accurate 3D geometrical model of a moving element, preserve its motion capture, and attach the photographed visual information to the 3D model. The system enables the automation of the identification and reconstruction process.
- First the system has to learn that it has a moving element in the data image sequence. The next stage is to identify the element in the data image with a presorted or user defined skeleton element. Finally the system carries out a reconstruction of the 3D structures of the element using predetermined 3D structures and skeletons, or the system creates a stand alone new 3D structure built gradually, based on the characteristics of the element.
- Moving elements that add different information than their background with respect to the camera can be semi static objects that add minor information over time, such as a tree which moves in the wind, or a person who crosses the frame, turns around and steps out of the frame on the other side.
- As mentioned above, the system firstly learns that it has a moving object in the sequence. Next, the system identifies this object using a set of predetermined 3D elements or skeletons. Alternatively, the user may define and attach a skeleton or elements to the figure.
- Then the system constructs the 3D structure of the figure using the predetermined 3D elements or skeleton or a new user defined element.
- For identifying that there is a moving element in the frame, the system searches for discontinuity of depth pixels in the sequence over space and time. That is to say that there may be a certain special 3D structure in S0 that is not coherent with the solid points of S0 with respect to the camera and background in the space dimension, but rather changes its information over the time dimension.
- In other words there is a misalignment of space over time. For example, if we shoot a table using a camera which moves to the right, the table first appears in the right side of the frame and then moves toward the left part of the frame.
- If there is a 3D element in front of the table whose information varies over the time dimension the system may conclude that there is a moving object in the frame. The system reconstructs a 3D model of the moving element, where the table is a static element.
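- The search for depth information that changes over the time dimension can be sketched as follows. The sketch assumes the depth maps of the frames have already been aligned into a common coordinate system, so that a static point keeps the same depth in every frame. The function name `moving_mask`, the tolerance and the sample depth maps are illustrative assumptions.

```python
def moving_mask(aligned_depth_frames, tolerance):
    """Mark positions whose depth varies across the aligned frames by more than
    the tolerance; such positions are candidates for a moving element."""
    rows = len(aligned_depth_frames[0])
    cols = len(aligned_depth_frames[0][0])
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            values = [frame[r][c] for frame in aligned_depth_frames]
            row.append(max(values) - min(values) > tolerance)
        mask.append(row)
    return mask


# Three hypothetical aligned depth maps: a near object (depth 2.0) crosses the
# top row, while the bottom row (the static table) barely changes.
frames = [
    [[5.0, 5.0, 5.0], [3.0, 3.0, 3.0]],
    [[5.0, 2.0, 5.0], [3.0, 3.0, 3.1]],
    [[5.0, 5.0, 2.0], [3.0, 3.0, 3.0]],
]
mask = moving_mask(frames, tolerance=0.5)
```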
- For the creation of the 3D element in a dimension-based image processing, the matching vector can be constructed from the region around an element in question.
- A rectangular window of size N×M×Z can be chosen, thus a 3D Matrix. N and M are the spatial sizes of the window, and Z is the depth dimension. A fourth vector can be provided to define a transformation dimension of the element or object from the time dimension to the space dimension, to lead to the construction of the 3D elements and figures. Matching the element both in the time and space dimensions enforces a consistent matching of all points along corresponding 3D structure maps that may be separately built for each element or object.
- The algorithm of the present embodiments is based on the projection of the 3D information structure on to the 2D image to assist in the tracking of the moving element in the frame in relation to its background and the absolute 3D surrounding.
- Identification of the Element
- Identifying the current element from the image data may be carried out with the assistance of a set of presorted 3D structures. The system operates in steps to determine the form of the element in question or parts of it, up to the identification of the whole structure, and also assists the user to construct new structures.
- The system may be provided with a data base comprising a set of 3D structures for the skeleton elements, beginning with simple 3D geometrical models such as a ball, box, pipes and so on, and up to full skeletons of rigid and non rigid bodies. Skeletons can be rectangular regions for identifying and modeling a car for example, and up to animal and human skeletons as shown in FIG. 12.
- A skeleton is a complex 3D data structure that includes three crucial elements:
- 1. the physical assembly of the skeleton, that is the shapes and interrelationships of the constituent skeleton elements,
- 2. shaping information of the skeleton according to the input 3D information,
- 3. incorporating of internal information such as the physical structure of a body (bones, muscles, joints and so on) and physical behavior of a body.
- The above three aspects are required for the identification and reconstruction process, according to the algorithm of the present embodiments.
- 1. The assembling of the skeleton refers to taking the structure of the skeleton and defining its parts, down to the smallest definition of body parts, in the sense that from these parts, the skeleton elements, the system can understand and build the body in question or build new bodies at the request of the user.
- For example—the human arm may be based on a 3D cylinder, connected via a joint to another cylinder which may represent a hand. In another example, the head may start with the simple figure of a 3D ball, and may be connected to a joint which represents the neck. The neck in turn is connected to a big cylinder which represents the trunk. The different physical behavior of the skeleton's parts and physical behavior of the individual elements in humans, animals, and so on are incorporated to reconstruct the basic configuration in question, thus assisting the system to identify and reconstruct the figure.
- 2. The ability to shape the skeleton according to the 3D input is used in the identification process and in the reconstruction process as explained below, with respect to FIG. 13, which shows a skeleton in which a part thereof undergoes a deformation.
- 3. Internal information such as the physical structure of the body (bones, muscles, joints and so on) and its physical behavior are used in the identification process and in the reconstruction process as explained herein below.
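- The assembly of a skeleton from simple elements and joints, as in the arm, head, neck and trunk example above, can be pictured with a small data structure. This is a minimal sketch only: the class name `SkeletonElement`, its fields and the sizes are hypothetical and do not describe the data base format of the system.

```python
from dataclasses import dataclass, field


@dataclass
class SkeletonElement:
    name: str
    shape: str    # simple 3D primitive, e.g. "ball" or "cylinder"
    size: tuple   # (radius,) for a ball, (radius, length) for a cylinder
    children: list = field(default_factory=list)  # (joint name, child element) pairs

    def attach(self, joint_name, child):
        """Connect a child element to this element through a named joint."""
        self.children.append((joint_name, child))
        return child

    def walk(self):
        """Yield this element and every element attached below it."""
        yield self
        for _, child in self.children:
            yield from child.walk()


# A big cylinder for the trunk, a ball for the head attached through the neck
# joint, and an arm cylinder connected through a joint to a hand cylinder.
trunk = SkeletonElement("trunk", "cylinder", (0.20, 0.60))
trunk.attach("neck", SkeletonElement("head", "ball", (0.10,)))
arm = trunk.attach("shoulder", SkeletonElement("arm", "cylinder", (0.05, 0.60)))
arm.attach("wrist", SkeletonElement("hand", "cylinder", (0.04, 0.15)))

element_names = [element.name for element in trunk.walk()]
```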
- Using the set of 3D structures and skeletons, the system determines the identification of the element. This process can be done automatically or manually by the user as shown in FIG. 8, and involves identifying the element in question to the system and attaching an internal skeleton to the figure or building a new structure.
- As a moving object is located in the sequence of frames, the system attempts to identify it and attach it to a set of matching skeleton elements, chosen from a previously defined set of skeleton elements, or a specific skeleton, defined for the moving object by the user, preferably using a set of tools which is provided by the system. Preferably, the attached skeleton elements are automatically adjusted to the size, shape and movement pattern of the moving object, so as to fit the moving object in terms of size, shape and movement pattern. The system completes the set of skeleton elements with an appropriately overlaid texture.
- In a preferred embodiment of the present invention, the system further provides tools for extrapolating the moving objects onto a 2D plane for any desired point of view.
- Exploiting the properties of the 3D structure based alignment allows us to match information in various situations such as between different video sequences, matching under scale (zoom) differences, under different sensing modalities (IR and visible-light cameras) and so on.
- The creation of 3D structures of elements from the moving objects is the basic factor that assists the system to handle differences in appearance between different sequences.
- An element may be attached with a basic skeleton made of tubes and joints attached to arms, legs and body, and a ball to the head. The depth alignment may add new information to the creation of the 3D element structure and to the correlation with the 3D figure, such as the physical behavior of the basic skeleton, the length and thickness of the tubes with respect to the arm, body and legs, the size of the ball attached to the head and so on.
- The construction of the full 3D figure out of these separate tubes and balls may reveal their mutual behavior, that is, how they are attached to each other or move. At this stage, the system determines what kind of element it faces, or decides that it cannot determine what the element is and asks the user to assist in determining the figure in question. The user may also form a new structure that does not exist in the basic set of predetermined figures.
- As described above, the software tries to identify the structure of the moving object, as much as possible using its depth information. With the assistance of a set of previously defined 3D elements, step by step, the software determines the form of the object parts, to complete the full structure even if some of the visual information does not exist.
- The first step is to identify the object and determine its basic form. Then the system attempts to complete it as much as possible.
- Using the basic form, one can learn about the element in the spatial domain and the depth domain. SP tries to reconstruct the object details using a set of 3D skeleton elements (such as a ball, a box, pipes and so on).
- 3D Structure Maps of Moving Elements Using a Single Camera
- The system may receive a full depth 3D map of an image. There are known in the art algorithms for constructing depth maps of images including its moving elements. Depth structure maps using space time stereo algorithms for example, make use of at least two cameras.
- There are known in the art algorithms for constructing depth maps of images of static surroundings using space time stereo algorithms for example, with a single camera.
- There are known in the art algorithms for creating static models from video sequences using one camera, also without extrapolating the depth map.
- The present algorithm can be used for extrapolating depth maps of moving elements using a single camera, as described above.
- For construction of a depth value of a moving element the system may use a pre-acquired depth value of the static rigid background using known in art algorithms, and refer to the moving element as a stand alone 4D matrix with relation to its background, using reference points.
- The projection of the 3D information structure (such as a pre made 3D skeleton) onto the 2D image plane assists in the tracking of the moving element in each frame in terms of the depth axis. Together with the relation to the projection of the 2D image plane into the 3D space, there is provided the ability to create depth maps of the element. The attachment of the skeleton and organs fits the image to the depth maps, synthetically duplicating any object and capturing its motion. The latter process may further involve overlaying texture of the element on the reconstructed skeleton, completing the reconstruction process as further explained below.
- The present method thus forces the creation of the 3D map of the moving element in the frame.
- The first step of depth extrapolation is the tracking of the 2D position of every pixel along each frame, creating a trajectory for each pixel.
- This tracking is performed using known in art tracking algorithms.
- As previously explained, passive methods for finding the same pixel on two images and also along the time dimension use alignment of color, shade, brightness, and ambiguities of the pixels to locate the same pixel in two frames, and along the time axis.
- The present tracking algorithms in the 2D image plane in time lack any projected pattern to assist the identification and thus tend to collect errors over time.
- The above observation is especially true if one tries to implement the above depth extrapolation algorithms to track the same picture or pixel from one camera in a movie clip of a moving 3D element, where one has hidden information (a hand behind the body that appears, change of lights and so on). The independent movement of pixels incoherently with the background results in problems such as finding the wrong tracking points from one frame to the next few frames. Furthermore, using one camera with a single angle has disadvantages in that the camera does not have a full simultaneous view. Thus in imaging a person, the person usually has two legs and two hands. However the 2D-based tracking technique often does not distinguish between the legs of a person or the hands, which for some of the frames can be hidden from the angle of the camera and suddenly (re)appear with no continuity. Thus tracking in 2D becomes a complicated challenge. Extrapolating depth values from such 2D tracking cannot result in a real depth map.
- However, when using the 3D skeleton as the projected data structure on the 2D image plane, each frame is on the one hand a 2D projection of a 3D data structure, and on the other hand a 3D projection of a 2D data structure, with identified organs of the body, say hand, left leg, right leg and so on.
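- The projection of a labeled 3D skeleton onto the 2D image plane can be sketched with a simple pinhole camera model. The pinhole model, the focal length, the image center and the joint coordinates are all assumptions made for illustration; the specification does not fix a camera model. The point of the sketch is that every projected point keeps the label of the organ it belongs to.

```python
def project_point(point_3d, focal_length, center):
    """Project a 3D point in camera coordinates onto the image plane (pinhole model)."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    cx, cy = center
    return (focal_length * x / z + cx, focal_length * y / z + cy)


def project_skeleton(joints_3d, focal_length, center):
    """Project every labeled joint, so each 2D point keeps its organ label."""
    return {name: project_point(p, focal_length, center)
            for name, p in joints_3d.items()}


# Hypothetical skeleton joints in camera coordinates (meters).
skeleton = {
    "head": (0.0, -0.8, 4.0),
    "left_hand": (-0.4, 0.0, 4.0),
    "right_hand": (0.4, 0.0, 5.0),
}
shadow = project_skeleton(skeleton, focal_length=500.0, center=(320.0, 240.0))
```

- Because the projection is computed from the 3D structure, a joint that is occluded in the source image still receives a 2D location and a label.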
- The result is that tracking mistakes cease to occur. Since the 3D posture of the skeleton is projected onto the 2D plane along the time axis, exact tracking with infinite new tracking points in every frame is generated from the 3D projection, so that the system knows where the hidden parts of the 3D body are and where they are in the 3D space. The system may just project the 3D parts onto any requested 2D image plane, even when the parts requested are currently invisible to the source 2D image.
- A predetermined 3D skeleton is projected onto the 2D image plane. The system in effect creates a shadow like image in an additional layer of information as explained above with respect to FIG. 7. The extra layer of information pinpoints the parts of the image that need to be tracked, cutting the errors immediately and preventing their growth. Such a stage allows for tracking and extrapolating the depth of a moving element such as a walking person, who includes both rigid and non rigid elements. The 3D skeleton may then be used for extrapolating the 3D depth map of the already tracked 2D element in motion. Using the infinite points located on the 3D skeleton to force trajectories along time of pixels in the 2D image, using reference points, is made possible in that the 4D matrix referred to above surrounds the elements as projection points in the 3D space with respect to the 2D space and vice versa. The system is thus able to use triangulation along time and 3D tracking of points using the 3D skeleton data structure to force the creation of the exact super-resolution level required, with depth map information of the moving element in each frame. In an example, the work flow may proceed as follows:
- Given M—is an R.G.B Matrix of 2D (x,y) pixels,
- n—Number of frames.
- A—a moving element with respect to a background,
- B—the shadow like layer—a 2D matrix of a gray scale “Shadow” figure,
- Q—Feature points in every frame, with defined threshold,
- T—Trajectories (2D point locations vector of Q),
- δ—The transition function of T on Q.
- K—The number of frames that Q has (=the length of T),
- Z—3D extrapolation.
- The input is thus “M” with n frames. The system identifies the moving element, as is explained elsewhere herein, and there then follows a process of aligning F(b)←G(A), the 2D projection B of the 3D skeleton on A. Alignment is to an initial defined threshold, which can be changed.
- The process continues by searching and tracking Q, thus creating a T for each Q, where δ, written δ(f(a,b)i, q i−1), is the function of the trajectory vector. Tracking obtains the location of the feature Qj in frame i on image a and shadow b, and adds its location in frame i+1, and so on for k frames. It attaches the new point in frame i+1 to image B; the new information on frame i+1 makes it possible to move image B according to the movement of image A. Thus the leg in B will follow the leg in A. The process then receives a new infinite number of Q accurately positioned in every new frame. For each T we will extrapolate Z, and the output will be an exact super resolution depth map of the moving element.
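- One step of this work flow can be sketched as follows, assuming each feature point Q is already labeled with the body part it belongs to and that the 2D tracker has supplied its location in the next frame. The function name `track_step`, the dictionary layout and the coordinates are hypothetical; the sketch only shows how a trajectory T grows by one frame and how the matching part of the shadow layer B follows the movement in A.

```python
def track_step(trajectories, shadow, new_locations):
    """Extend each trajectory by one frame and move the matching shadow part
    by the same 2D displacement."""
    for part, new_xy in new_locations.items():
        old_xy = trajectories[part][-1]
        dx = new_xy[0] - old_xy[0]
        dy = new_xy[1] - old_xy[1]
        trajectories[part].append(new_xy)        # T gains the location in frame i+1
        sx, sy = shadow[part]
        shadow[part] = (sx + dx, sy + dy)        # the leg in B follows the leg in A


# Hypothetical feature locations in frame i and shadow positions in B.
T = {"left_leg": [(100, 300)], "right_leg": [(140, 300)]}
B = {"left_leg": (102, 298), "right_leg": (139, 301)}

# Locations of the same features in frame i+1, as reported by the 2D tracker.
track_step(T, B, {"left_leg": (108, 300), "right_leg": (140, 300)})
```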
- The Z dimension may then be extrapolated using reference points from the 4D matrix surrounding the element at t, and t+1 and so on. This may be done with respect to the camera's motion and focal point with respect to the background. Rays from the reference points then enable the computation of:
- the 2D transformation with respect to the 3D data structure.
- the 3D transformation with respect to the 2D data structure of pixels, or feature points for using triangulation extrapolation of Z, or for tracking the 3D position of pixels or
- feature points for creating an exact super resolution depth map of the moving element, from a video clip of a single camera.
- The depth extrapolation process could be for example the following.
- Assume (a, b, c, . . . ) are 3D ref points for which we know the 3D coordinates.
- t—is a 3D point (pixels or feature) on the element in time T (whose 3D coordinates we wish to find), and
- ‘t+1’ is the same 3D point but at time ‘t+1’ (for which we also want to find the 3D coordinates).
- Projecting rays from the ref points to t and t+1 creates triangles [t+1, t, a], [t+1, t, b], [t+1, t, c] . . . .
- In every triangle we know the 3D coordinates of its reference point. We have 6 unknowns (every triangle consists of the unknowns {t, t+1}, where t and t+1 are 3D points).
- In the 3D space all triangles must have the same distance between ‘t’ and ‘t+1’. Solving this equation system we find t and t+1 for each pixel or feature of every Q in every frame, and compute the 3D coordinates of the triangles out of the 2D projection of the triangles on the image plane.
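- The equation system above is not reproduced here. As a simpler, related illustration of recovering a 3D position from rays, the sketch below uses the standard midpoint method: given two rays with known origins and directions, it returns the point halfway between their closest points. It assumes the observed point does not move between the two views, which is a simplification that does not hold for the moving element itself. The function names and the sample rays are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ray_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between rays o1 + s*d1 and o2 + t*d2."""
    w = tuple(a - b for a, b in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be triangulated")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((x + y) / 2 for x, y in zip(p1, p2))


# Two hypothetical viewing rays that both pass through the point (1, 1, 5).
point = ray_midpoint((0.0, 0.0, 0.0), (1.0, 1.0, 5.0),
                     (2.0, 0.0, 0.0), (-1.0, 1.0, 5.0))
```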
- The construction of 3D structure maps of the elements in motion assists the system to further fully reconstruct the 3D model of the element, recover the 3D geometry between different sequences, and handle differences in appearance between different sequences.
- 3D Reconstruction
- Following is an explanation regarding the creation of the element's model, using the projection of the 3D skeleton onto the 3D depth maps and the depth maps into the 3D SP space, while attaching the skeleton organs fitting it in to the depth map's formation, synthetically duplicating it and capturing its motion. The last process will be to overlay the texture of the element on the reconstructed skeleton, completing the reconstruction process as will further be explained.
- The present method enables forcing of the creation of the 3D model of the moving element in the frame.
- The present algorithm is space based in concept. On the one hand, the projection of 3D information onto a 2D image plane also enables extrapolation of 3D information; on the other hand, the space dimension based algorithm projects the 2D world with its 3D depth maps into a space based 3D world.
- The N×M×Z window referred to above, which was chosen around an element in motion, is actually a 3D matrix (that turns into a 4D matrix), a new (XYZ) axis field “f”, where the user can attach a predefined internal skeleton or parts of 3D figures (tubes, joints etc).
- The process of depth extrapolation also includes the identification of each pixel and feature movements between frames, creating a 2D motion flow of pixels and features over time. The system transforms the 2D motion flow into a 3D motion flow.
- Given a 3D depth structure map of S(o), or a set of 3D trajectories, or as will be explained while still in the process of depth extrapolation, the system may use the reconstruction algorithm to correspond between the factor D of each frame (and the unified frames) with the internal attached skeleton over time (with its own factor D′) to define and construct the proportions of the internal skeleton transforming the 3D matrix in to a 4D matrix over the space & time dimension using the fourth vector referred to above, that is transformation between space and time.
- The process of depth extrapolation and reconstruction is, intuitively speaking, a layered machine in which a 2D reflection of a 3D structure is layered underneath the 2D image matrix. The 3D structure itself is layered underneath the 2D reflection of the 3D structure, and is used to construct the synthetic 3D reconstruction of the element in the frame. This three-layer structure is as described above with respect to FIG. 7. Working under the space dimension enables reconstructing the 3D structure and texture, as will be explained herein, and even preserving the element's motion, where the output is motion capture of the element in the frame as a 3D model. Or for that matter the output could be a specific 2D projection.
- For the reconstruction process of the moving elements, the system may create a full 3D structure depth map or receive a full 3D structure depth map of the moving elements. The static surroundings are modeled separately from the moving elements, as previously explained.
- The present embodiments enable a full 3D super resolution reconstruction of a 2D body such as a human in motion from an original 2D or partial 3D image, to a 3D model. The process involves also capturing the 3D structure texture and motion, constructed on the base of an internal 3D skeleton. The skeleton may be built with full internal bones and muscle using a full skeleton physics' data base. The system enables infinite manipulation on the 3D reconstructed model, for example for animation, motion capture, real time modeling and so on.
- FIG. 12 illustrates the building up of a full anatomical model from individual skeleton elements.
- Projection is carried out using the 3D depth map of the image and body, using the 4D matrix around the element with respect to the background, using the reference points.
- Elements are identified as explained above with respect to FIG. 8, using automatic identification or manual identification and attachment of the 3D skeleton or parts thereof to the 3D depth map as previously explained. The system projects the 3D data structure, namely the 3D skeleton, into the 3D depth map.
- Tracking of the 3D movements of the element over frames is based on the DTM optical flow of the pixels and trajectories. Tracking allows for learning as much 3D information as possible on the 3D formation of the element, and interpolating the skeleton's 3D structure onto the depth map acquires its 3D formation. The skeleton's 3D structure preserves the learned information of the element over time, so that the skeleton's 3D structure represents the element in the image as accurately as possible.
- For example, let us consider a photographed man. A first stage involves assigning a skeleton to the object. A second stage involves using the skeleton to learn the details of the object, such as the structure of the face, eyes, nose etc. Having learned the kind of structure from the skeleton it is now more a process of expecting to receive certain details and adjusting the right 3D details to the figure.
- For example, after recognizing that the structure in the image is a man, the system, in accordance with a configured policy, expects to receive 3D and visual information of, for example, the eyes, nose, etc. Since the SP expects the eyes and eyebrows, for example, in certain areas of the head, it is easier and faster for the SP to analyze this information with respect to the 3D figure.
- Computing the location and distance between the organs of the moving object, assists the SP to more accurately estimate the relations of the other organs of the object and coordinate this information with that of the 3D figure.
- By using the space dimension in the image processes, there is an added value to the movement of the element over the time dimension with respect to the fact that every different frame supplies more 3D and visual information to the construction of the 3D element and figure.
- Parallel to the reconstruction process the present embodiments can be used as a motion capture tool to acquire the movement of the element, enabling the user to motion capture the element in the frame, not only as a 2D image but as a 3D model with texture.
- The process of the reconstruction can be carried out in the following way.
- The initial configuration is a 4D matrix, the input will be a DTM.
- The DTM can be from an external algorithm. The DTM can also be from the depth extrapolation algorithm of the present embodiments. In terms of processing time, the modeling process is a parallel process to the depth extrapolation process, where intuitively speaking several matrices are located one beneath the other, in which the first matrix is the image 2D matrix, below that is the 2D projection matrix (of the 3D structure), and below that is the 3D data structure. The input also includes 3D trajectories based on the 2D tracking of the pixels and in particular of feature points that are set along the frames to allow movement to be followed. The feature points may be based on color or other properties that are easy to track.
- The trajectories are transformed into the DTM, and the system transforms them into 3D trajectories that mark the 3D position of pixels and feature points along the frames.
- The system sets a constraint between the projection of the 3D skeleton and the input depth maps along the time axis, making an exact tracking with infinite new tracking points in every frame generated from the 3D projection with identified organs of the element. It thus knows where the hidden parts of the 3D body are and where they are in the 3D space. The system carries out 3D tracking of the points using the 3D skeleton data structure to force the creation of the exact super resolution 3D model of the moving element.
- The work flow may be as follows. Given:
- E3d—sequence of DTM's—3D Matrix of (x,y,z) of the moving element,
- n—Number of frames,
- S3d—the 3D skeleton.
- Q3d—3D feature points in every frame.
- T3d—Trajectories (3D point locations vector of Q3d),
- δ3d—The transition function of T3d on Q3d.
- K—The number of frames which Q3d has (=the length of T3d),
- Model—the reconstructed 3D model.
- The system aligns F(E)←G(S) such that the 3D skeleton S is aligned into the DTM E. The system uses T3d to 3D track the Q3d on Et to the next DTM Et+1, where δ3d, written δ3d(f(s,e)i, q3d i+1), is the function of the trajectory vector. Tracking obtains the location of the feature Q3d j in frame i on skeleton S and DTM E, and adds thereto its location in frame i+1, and so on for k frames for each Q3d. For each frame it attaches the new point in frame i+1 to S, and the new information on frame i+1 allows alignment of S according to the new position of E (e.g., the leg in S will align with the leg in E).
- The result is the ability to receive a new infinite number of Q3d accurately positioned in every new DTM. Factors D and D′ enable the system to change the formation of S according to the formation of the collective 3D information of E3d, in the 4D matrix surrounding the element and skeleton in t and t+1 (and so on). The system infers from the model where limbs and other elements may be expected to appear in the following frame. D is the key factor in the complex mathematical structure that synthetically duplicates S3d in the form of the 3D element. The system transforms the 3D skeleton into a new data structure, and collects and saves the formation gathered in the sequence of the DTMs on the 3D skeleton data structure, by tracking the 3D position of the pixels or feature points, for creating an exact super resolution 3D model duplication of the moving element.
- Assume {a, b, c, . . . } are the points with 3D coordinates on E3d i. An estimation is done to attach the corresponding points on S3d i {a′, b′, c′} to the points in E3d i. The system aligns the factor D′ of the sub space of the 4D matrix to the factor D of E3d i, aligns S3d i as a unified unit, and also splits S3d into predefined miniature 4D matrices which each hold one D1 factor as a stand alone 4D sub space, thus reconfiguring the formation of S3d i to the formation of E3d i, and then on to i+1 and so on. The output is then an exact super resolution reconstruction of the element (shape and texture, as will be further explained), and the 3D motion capture of the moving element.
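- The attachment of skeleton points {a′, b′, c′} to DTM points {a, b, c} and the alignment of the skeleton "as a unified unit" can be illustrated with a minimal sketch. It assumes nearest-neighbor matching and a pure translation, neither of which is stated in the text; the names `attach_points` and `align_as_unit` are hypothetical and do not model the factors D and D′.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]


def attach_points(skeleton: List[Point], dtm: List[Point]) -> List[int]:
    """For each skeleton point, return the index of the nearest DTM point.

    Nearest-neighbor matching is an assumption made for illustration only.
    """
    return [min(range(len(dtm)), key=lambda j: dist(s, dtm[j])) for s in skeleton]


def align_as_unit(skeleton: List[Point], dtm: List[Point]) -> List[Point]:
    """Translate the whole skeleton by the mean offset to its matched DTM points."""
    matches = attach_points(skeleton, dtm)
    n = len(skeleton)
    # Mean displacement per axis between each skeleton point and its match.
    offset = tuple(
        sum(dtm[m][k] - s[k] for s, m in zip(skeleton, matches)) / n
        for k in range(3)
    )
    return [tuple(s[k] + offset[k] for k in range(3)) for s in skeleton]


if __name__ == "__main__":
    s3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    e3d = [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)]
    print(attach_points(s3d, e3d))
    print(align_as_unit(s3d, e3d))
```

- Repeating the same step for frame i+1 with the newly tracked points would move the skeleton along with the element; splitting the skeleton into parts and aligning each part separately would be the analogue of the per-sub-space alignment described above.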
- The texture overlaying is part of the modeling process, as will further be explained hereinbelow.
- The above mentioned constraint allows for fully reconstructing the 3D model of the element. It enables the system to also recover the 3D geometry between different sequences, and handle differences in appearance between different sequences. Exploiting the properties of the 3D structure based alignment allows us to match information in situations which are extremely difficult such as between different video sequences, matching under scale (zoom) differences, under different sensing modalities (IR and visible-light cameras) and so on.
- In the case S0 was shot while an element was changing velocity differentially over time and space, as in the example of blowing up a hand held air balloon, three options exist, each as a stand alone or as part of a unified solution. The system can model the balloon with a changing velocity over time and mark the frame, the series of frames, or the strangely behaving object with the problematic configuration, thus leaving the issue in the time domain. Alternatively, the user can assist the automatic system to define the 3D figure in the frame, thus telling it what 3D information and visual information to use.
- At the end of the process, the movement of the element may relate the model to its 3D motion capture. Using the space dimension in the above image processing adds value in every different frame, in that it supplies more 3D and visual information to the reconstruction process for a more accurate 3D model of the object.
- The construction of the full 3D figure out of these separate tubes, balls, and other skeleton elements over the time dimension may reveal their mutual behavior—how they are attached to each other, or move together, and may assist in the further animation of the figure.
- Once the model has been completed, it can be kept as a figure separate from its origin and background, and can be used for future animation. Its original movements can be used for motion capture. More visual information from different times or locations may be added to the same figure, and the figure can be changed to a new 3D figure depending on the user's actions, just as a computer generated image on top of a polygon internal skeleton could be.
- Furthermore, the figure becomes independent of the background it had, and can be used for further animation. If the object is photographed or filmed at different times or locations, SP may combine the information obtained from the different times or locations. For example, the visual information may be taken at different times or locations, and the computed information is added to the master 3D model of the figure. Using the 3D structure of the object, we can create animation, mimic the face, add voice and so on, at the level of the object itself, with no dependency on the specifically shot background.
- The system can also use its ability to capture the motion within moving elements to animate an existing 3D model using motion capture of a full body animation or of part of it, such as the mimicry of the face.
- In the case that there are elements that change their relative velocity over both time and space, like blowing up a hand held air balloon in the example given above, the assistance of the user may be needed. The user may be asked to assist the system to define the 3D figure in the frame, to indicate what 3D information the system should use, and whether to leave this figure as a time based sequence, without any attachment of animation. In the latter case, this element may be edited using regular image processing tools. The only difference between the present and the previous example is that the present example remains a time based 3D object and not a space based 3D object.
- Image processing tools allow the user to attach together surroundings of different times and locations, correct distortion in the image, remove elements and create new elements based on the information created by the input and also to create 3D computer generated figures or to input computer generated figures from different 3D computer animation programs.
- Visual Information
- After receiving the three dimensional location in the space based three dimensional model, depending on the determined resolution, each point receives visual information layer(s), including the values of color and brightness as recorded in the digital information of the image.
- There are several visual parameters to take into consideration: the resolution of the model compared with the resolution of the photographed image, the spherical information of each pixel, the different qualities of the visual information from different cameras or from different clips, etc.
- With regard to different image resolutions, there may be two cases.
- In the first case, the image resolution is higher than the determined resolution of F, and thus there is more than the needed amount of information for every pixel in the 3D model. For example, if the photographed image is 5 times larger in terms of the number of pixels per inch, then the system sums and averages the visual information for every 5 pixels into one pixel and creates a new pixel in the 3D model with the new computed value.
- In the second case, the resolution of the 3D model is larger than the resolution of the photographed image. Using video sequences, every frame generates texture pixels within the frame, and if the camera moves a little, pixels will photograph neighboring 3D points, making it possible to collect more visual information for a unified model than the total number of pixels in the image. Such a case can happen, for example, while shooting an image from a distance or using digital zoom. In this case the system extracts the information for each pixel from the neighboring pixels and along the time dimension. Here the key element of multiple layers of visual information for every pixel is crucial, and will be discussed further.
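- The first case above (summing and averaging every 5 image pixels into one model pixel) can be sketched as follows. This is a minimal illustration over a single row of brightness values; the function name `average_groups` and the handling of a shorter trailing group are assumptions, not details given in the text.

```python
from typing import List


def average_groups(values: List[float], factor: int) -> List[float]:
    """Average every `factor` consecutive samples into one output sample.

    A trailing group shorter than `factor` is averaged over the samples it has.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out = []
    for start in range(0, len(values), factor):
        group = values[start:start + factor]
        out.append(sum(group) / len(group))
    return out


if __name__ == "__main__":
    row = [10, 20, 30, 40, 50, 0, 0, 0, 0, 10]
    print(average_groups(row, 5))  # two model pixels from ten image pixels
```

- A full image would apply the same averaging per color channel and in both image directions; the one dimensional form is used here only to keep the arithmetic visible.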
- New pixels are now created and overlaid on the surface of the model, at the level of the system resolution. Each new pixel now has a three dimensional position in the 3D space based model, and just as in real life can be observed from the full 360 degrees.
- In general, individual pixels are not observed from 360 degrees. For example, a point in a wall may be looked at from 180 degrees (the back of the wall has different information in different pixels, depending on their 3D location), a corner of a stone is observed from 270 degrees, and so on.
- Reference is now made to FIG. 15, which illustrates a photographed 3D image supplying visual information from a specific direction. Each photographed image supplies visual information from a specific direction. If SP receives visual information of a pixel from a specific direction only, it flattens the pixel, making it possible to look at it from 180 degrees. This case creates some distortions in the visual quality when looking at this pixel from a side direction.
- A preferred embodiment of the present invention provides a half spherical pixel formation unifying multiple layers of visual information for each pixel with respect to its 3D location in the space dimension. It is possible to add an infinite number of pixels, and in terms of visual quality we are creating super resolution.
- The super resolution also relates to the number of depth points it is possible to collect in the unified model, creating super resolution 3D points. The depth points allow deformations of the surface in the most accurate way.
- In terms of 3 dimensional visual information, the more angles that are covered by the visual information provided to the system, the better is the ability of the software to mimic looking at the pixel from every needed direction, and the better is the spherical information for this pixel. Thus multiple images may be taken from various angles around the pixel. Every pixel can be photographed in many frames along each clip. Not all of this information is needed, since not all of it has the same level of quality. However, recording multi-layered visual information for each pixel may assist in lowering the computation needed for the image processing and in enhancing the image quality.
- In terms of visual quality, every pixel can be photographed in many frames along each clip. Nevertheless, not all of this information is needed, as not all of it has the same level of quality. Discarding low quality information can assist in lowering the computation needed for the image processing, but every piece of information is preferably used in order to enhance image quality that suffers from poor source image quality, lighting, camera resolution and so on.
- The system creates a grade of quality Q, where each new layer of information, mined from every new frame, is examined for the quality of its visual information and its resolution. The visual information is graded by two factors: one is the image quality in the time dimension, and the second is the image quality in the space dimension.
- For example, SP may receive two clips, shot from the same location inside of a building, using different apertures, for photographing the inside of a room and an external garden.
- In the first clip the camera uses an aperture with high exposure. This enables the camera to receive good visual information of the interior parts of the image, while the external parts of the garden are over exposed and appear in the image as burned or excessively bright.
- In the second clip the camera uses a low exposure aperture. This creates very dark visual information of the internal parts of the image, but the external parts of the image are very balanced and well exposed.
- Each of these clips may not be well balanced as a stand alone unit, and the histogram of each of them will show unbalanced results.
- But when the system is transformed from the time dimension, where every frame is examined separately, to the space domain, where the surroundings are examined as a whole, then, as the system receives new visual information, it checks the clip based on two factors as follows:
- a first factor is based on the time dimension, mining the histogram for every frame as a separate unit, and its quality with respect to F, and
- the second factor is from the space domain in which the already composed images refer to certain areas in the frame to achieve higher quality, even if in F they suffer from poor Q.
- The system searches the new clip for better visual quality in the specific parts needed for SP, not in any correlation to the neighboring frame pixels, but in correlation to the poor Q of neighboring pixels in F, as was explained above with respect to FIG. 14. The system creates a well balanced image that, in the present example, gives a very well exposed image showing the external garden as well as the interior room in the best quality possible, as if it were shot using a different aperture at the same time in the same image.
- The system regards image information only down to a certain minimum level of Q, meaning that if the image is lower than that minimum for both of the two factors above, then there is no point in using this information or in adding its values to the existing values of the pixel's texture.
- The adding process of the new information is on the basis of Q ∈ SP.
- The higher Q is, the greater the participation of the information in the pixel value, and the lower Q is, the smaller the participation of its information in the pixel value.
- The system unifies the information from both clips to yield a balanced and well exposed view of both the inside of the room and the external garden.
- The system may set a threshold for quality Q, and discard visual information accordingly.
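- The quality based adding process and the threshold on Q can be illustrated with a minimal sketch. It assumes that each layer carries a single combined grade Q between 0 and 1 and that "participation" means a weighted average; the text grades by two factors and does not give a formula, so the function `blend_samples` and its weighting rule are hypothetical.

```python
from typing import List, Optional, Tuple


def blend_samples(samples: List[Tuple[float, float]], q_min: float) -> Optional[float]:
    """Blend (value, Q) layers of one pixel into a single value.

    Layers with Q below q_min are discarded; the rest contribute in
    proportion to Q. Returns None if no usable layer remains.
    """
    kept = [(value, q) for value, q in samples if q >= q_min]
    total_q = sum(q for _, q in kept)
    if total_q == 0:
        return None
    return sum(value * q for value, q in kept) / total_q


if __name__ == "__main__":
    # Brightness of the same model pixel as seen in three clips, with grades.
    layers = [(200.0, 0.75), (100.0, 0.25), (0.0, 0.05)]
    print(blend_samples(layers, q_min=0.1))
```

- In the room and garden example, the well exposed clip would carry a high Q for garden pixels and therefore dominate their value, while the other clip would dominate the interior pixels.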
- The image processing may also include processing methods such as those employed by standard camera control units (CCU) in order to balance the image and achieve uniformity between adjacent images.
- Viewing
- The constructed space based 3D model, including its visual information, is the collective result captured from all the image sequences fed to the SP.
- Any point in the collective fields can be viewed from any viewpoint using any viewing method known in the art. Following are some examples.
- In one embodiment of the present invention, virtual cameras are arranged in such a way that the field of vision of two adjacent lenses is overlapped to a great extent by the fields of view of the two adjacent lenses lying on the lens sides, with respect to a horizontal axis. Consequently, stereoscopic images can be generated.
- A preferred embodiment facilitates generation of full time based sequence, Live sequence, non linear output, stereoscopic/3D spherical imaging, and so on.
- In a preferred embodiment for providing stereoscopic images, virtual cameras are arranged in a specific configuration, wherein the field of vision of any of the lenses is overlapped to any desired extent by the fields of vision of all adjacent lenses surrounding the lens. The collective field of vision comprises a collection of fully circular images wherein any point within each field of vision is captured by at least two virtual lenses for creating stereoscopic spherical imaging, or by one virtual lens for creating 2D spherical imaging, or by at least two virtual lenses for creating 3D spherical imaging from any viewpoint.
- As a result, stereoscopic data can be made available for viewing a scene filmed through a single camera.
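- How two virtual cameras with overlapping fields of vision yield stereoscopic data from a single 3D model can be sketched as follows. The sketch assumes ideal pinhole cameras that look along the z axis and are separated horizontally by a baseline; the names `project` and `stereo_pair` and the camera model are illustrative assumptions, not the configuration claimed in the text.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(point: Point3, cam_x: float, focal: float) -> Point2:
    """Project a 3D point through a pinhole camera at (cam_x, 0, 0) looking along +z."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal * (x - cam_x) / z, focal * y / z)


def stereo_pair(point: Point3, baseline: float, focal: float) -> Tuple[Point2, Point2]:
    """Return the left and right image positions of one model point."""
    left = project(point, -baseline / 2, focal)
    right = project(point, baseline / 2, focal)
    return left, right


if __name__ == "__main__":
    left, right = stereo_pair((0.0, 0.0, 10.0), baseline=1.0, focal=100.0)
    # Horizontal disparity equals focal * baseline / depth for this camera model.
    print(left, right, left[0] - right[0])
```

- Because the depth of every model point is known, the disparity between the two views follows directly from the model, which is why footage from a single camera can be re-rendered stereoscopically.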
- The images created by SP can be displayed to a viewer in various formats, such as stills, video, stereoscopic viewing, virtual reality, and so on. The images formed can be displayed on a flat screen such as a TV or a computer screen, or by using a display device for virtual reality such as a virtual reality headset, where the part of the image being displayed changes according to the user's viewpoint. Surrounding a viewer 360 degrees, both horizontally and vertically, by a suitable means for virtual reality displaying gives the viewer the ability to look everywhere around him, as well as up and down, while having 3D depth perception of the displayed images.
- Virtual reality visual linear and non-linear information is provided to the user, using known in the art virtual reality means. Such means may be a headset having sensors to detect the head position of the viewer, or a virtual glove having a sensor to detect the hand position, or any known in art viewing software.
- For displaying on a flat screen, such as a TV or a computer screen, the viewing parameters of a user are taken from a user held pointing device (for example, a mouse or a joystick) programmed for this purpose. The system can gather the user's own movements using, for example, this invention's real time motion capture capabilities, or motion capture from any external device.
- When a viewer selects a specific view, either by actually turning his head while wearing a virtual reality headset, or by a user held pointing device coupled to a computer device, the viewing parameters are detected and received by the displaying system. The viewer's viewing parameters include the viewer's viewing direction and the viewer's horizon. In accordance with these parameters, the viewer's field of vision is determined in terms of the coordinates of the surroundings of the viewer, and the image is projected into the viewing means.
- Type of Camera(s)
- The present invention is not limited with regards to the type of camera(s) used for capturing the images or sequences of images fed to the SP. The camera(s) may be selected from any known in the art digital or analog video cameras. The camera(s) may also be non-digital, in which case any known in the art technique may be used to convert the images into a digital format.
- In a preferred embodiment, digital images may be manipulated for enhancing their quality prior to storage and conversion into a space based 3D model.
- Applications
- Reference is now made to FIG. 10, which is a balloon chart illustrating different applications of the present invention.
- According to a preferred embodiment, when a complete space based 3D model, constructed as explained above, is available, the user can place virtual cameras within the virtual environment to in effect re-photograph the scene from viewpoints where no cameras were located in the original sequence. Furthermore, this can also be done in real time. For example, in a basketball game virtual cameras can be placed to shoot the game from viewpoints where there are no actual cameras. All that is needed is to have previously modeled the arena and the individual players. In fact, the modeling can be achieved in real time early during the broadcast, as an alternative to doing so beforehand.
- According to a preferred embodiment, using the above modeling, each figure once captured from the sequence, can be re-animated by the user, who may also use motion capture for example from an external source or the motion capture of the SP, thus changing how the figure moves in the original clip. That is to say, the user may reconstruct a model from the original photographed image, but output in real time other movements of the figures.
- According to a preferred embodiment, the user may modify the original figure from the image or even replace the figure with a completely new manipulated figure.
- According to this embodiment, new animation can be given to each figure with no dependency on the original movement of the figure during its photographed clips, by replacing the figure with a 3D model thereof, allowing the creation of new movie clips with the figure itself, using the techniques discussed herein. The figure may also be manipulated in real time by a user in computer games, console games, TV games, etc.
- A preferred embodiment introduces new lighting into the 3D model using known in art techniques, for adding light to a scene in animation or during post production of a video clip.
- A preferred embodiment comprises depth extrapolation in the arena to any desired point of reference of each element and background, as part of the 3D modeling of the elements and backgrounds. Depth extrapolation comprises a depth map analysis of the sequence(s) of photographic figure input to the system, which can be carried out in a number of ways as will be explained in more detail below.
- Preferred embodiments may allow various manipulations such as motion blur on the image.
- Using the techniques described herein, all the different kinds of manipulation that can be done while photographing a scene can also be done in the 3D virtual arena, such as changing the focus, modifying the zoom and the lighting, etc.
- Using the techniques mentioned herein, the user can create a full motion picture from the figures and backgrounds.
- Using the techniques mentioned herein, the user can create a full computer game (console game, TV game, etc.) using the 3D space based model where all the figures are real image based 3D models.
- Preferably, computer generated images can be added to the three-dimensional environment and three-dimensional models therein. These images can have effects such as altering the skin of the model, adding further computer generated elements to the model or the background, and so on.
- According to this embodiment, the user can use the time line information associated with individual figures within the sequences to reconstruct the motion of the figure in a motion capture stage. The present techniques work using a sequence of images from a single camera or from images from two or more cameras.
- In the procedure described herein, two dimensional and three dimensional tracking can be applied to any of the figures and backgrounds identified, based on their movements in the time based clips. The tracking can be done in real time, or later as part of re-animating the clips.
- According to a preferred embodiment of the present invention, the user may also add moving or static elements to the figures or backgrounds in the space based 3D environment.
- According to a preferred embodiment of the present invention, the user can create new arenas that were not originally photographed. For example, the user may combine several different surroundings into a unified arena, or combine a photographed arena with a synthetic arena which is computer generated.
- According to a preferred embodiment of the present invention, the user can use a figure that is reconstructed using the present embodiments in a 3D model, remove it from its background, and relocate it to different arenas, or to export it to any computer generated program.
- According to a preferred embodiment of the present invention, the user can create a new figure based on the reconstructed figures. The user may further add or change its texture, organs, and so on.
- According to a preferred embodiment, the user may use existing footage, for example an old movie, and use the data of the movie to model figures and backgrounds of the movie. This may be done by creating a full 3D space based environment or arena of the figures and locations therein, and then create a new movie made from the original figures and surroundings, based on the 3D environment that he has created.
- According to a preferred embodiment of the present invention, virtual gathering can be done using virtual 3D replication of the user. Such a virtual gathering may involve motion capture of the user. An application is allowing the user to participate in a virtual martial arts lesson where the teacher can see the 3D figure of the user and correct his movement, and each student may see the other students as 3D figures. The motion capture can be done using the user's own web camera.
- Such an application may also be used for additional educational purposes, virtual physical training, virtual video conferencing, etc. The 3D model and motion capture may also be used for virtual exhibitions, multiplayer games, or even virtual dating.
- According to a preferred embodiment of the present invention, the space based 3D model may be used in simulation, simulating combat arena for training soldiers, flight simulation, and so on.
- According to a preferred embodiment of the present invention, the 3D arena can be used in medical devices. It may be used for manipulating images acquired from one or more sensors. The images may be used to create a 3D model of a body organ for use during an actual surgical procedure in real time or for the purposes of simulation.
- The 3D models and environments described herein may be used for planning and design, for example, in architecture and construction engineering.
- In one particular application of the present invention, the models and environments described herein may also be used for transition between different video standards, such as between PAL and NTSC.
- One application of the techniques provided herein is video compression. In this application, space based 3D modeling using the photographed clip allows for transmission of the model, after which almost all that is necessary is the transmission of movement information. Such a technique represents a large saving in bandwidth over transmission of video frames. The application is applicable to various uses of video and various quality specifications, from motion pictures to cellular video clips.
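- The bandwidth argument can be made concrete with a small arithmetic sketch that compares sending every frame uncompressed with sending the model once and then only per-frame movement data. All figures used below (frame size, model size, number of skeleton joints, values per joint) are hypothetical inputs chosen for illustration, and the comparison ignores conventional video codecs.

```python
def raw_video_bytes(width: int, height: int, bytes_per_pixel: int, frames: int) -> int:
    """Bytes needed to send every frame as uncompressed pixels."""
    return width * height * bytes_per_pixel * frames


def model_stream_bytes(model_bytes: int, joints: int, values_per_joint: int,
                       bytes_per_value: int, frames: int) -> int:
    """Bytes needed to send the 3D model once, then movement data per frame."""
    per_frame = joints * values_per_joint * bytes_per_value
    return model_bytes + per_frame * frames


if __name__ == "__main__":
    frames = 100
    raw = raw_video_bytes(640, 480, 3, frames)
    modeled = model_stream_bytes(5_000_000, joints=20, values_per_joint=3,
                                 bytes_per_value=4, frames=frames)
    print(raw, modeled)
```

- With these assumed inputs the one-time model cost dominates the modeled stream, so the relative saving grows with the length of the clip.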
- Furthermore, the present embodiments provide a new method for video recording wherein the recording is directly made into or applied on the 3D space based model of the present embodiments. The video frames themselves can be reproduced after the information has been extracted to the model.
- The 3D model of the present embodiments can be used for capturing and modeling moving elements in real time from a single source, and viewing them from any direction. In one application, multiple users at different screens are able to view these figures from any direction or zoom, in real time.
- A device according to a preferred embodiment of the present invention may be used in real time for capturing 3D movement of the user, and for using it to fully operate the computer with the 3D movements of hands or body, for any computer program. This implementation may utilize a specified camera or a regular camera, such as a regular video camera, a stills camera or a cellular camera. For example, the user may be immersed within a computer game where one of the existing 2D or 3D characters in the game moves according to the movements of the user. This can also be done in the user interface of cellular mobile phones or any other hand held mobile devices.
- According to a preferred embodiment of the present invention, users can model themselves as a full or a partial 3D model and immerse themselves in a computer game or any other relevant computer program.
- Applications of the present embodiments allow for creating full real image 2D/3D figures and backgrounds in computer games, simulators, or any variation of such a platform.
- According to a preferred embodiment of the present invention, 3D modeling can be done using any kind of sensor gathered information, such as infrared.
- According to a preferred embodiment of the present invention, microscopic information can also be modeled into the novel 3D space based model using data gathered from suitable sensors.
- According to a preferred embodiment of the present invention, 3D models and texture can be used to create new user defined 2D/3D arena based on data gathered from sensors without optical information, such as subatomic particles, distant stars, or even areas the sensors cannot capture (for example—behind a wall).
- According to a preferred embodiment of the present invention, the 3D SP process may be used in machine vision enablement. For example it may be used to provide three-dimensional spatial understanding of a scene to a robot. The robot is thus able to relate to a human as a unified three-dimensional entity and not as a partial image in multiple frames. The resulting robot may have applications for example assisting disabled people and so on.
- As needed by the application, the 3D SP process may create a super resolution reconstructed 3D model in terms of the number of texture pixels per inch and number of depth points that construct the 3D formation of the model.
- It is expected that during the life of this patent many relevant photography and imaging devices and systems will be developed and the scope of the terms herein, particularly of the terms “3D model”, “image capture”, “depth map”, “Clip”, “Virtual Reality”, and “Computer”, is intended to include all such new technologies a priori.
- Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.
- It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
- Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.
Claims (16)
1. Apparatus for image based human machine interfacing by estimating a 3D representation of at least a portion of a user based on a set of 2D image data of the at least a portion of the user, comprising:
image processing circuits for: (1) identifying non-rigid structures of the user within said image data, and (2) associating three-dimensional skeleton model elements with identified structures, such that model defined spatial constraints between skeleton model elements and spatial relations between identified non-rigid structures in the 2D image data set are used to fit the model to the 2D image data set and to approximate 3D coordinates of at least one of said non-rigid structures of the non-rigid body.
2. Apparatus according to claim 1, wherein said image processing circuitry is further adapted to identify within said image data a complex body made up of a plurality of interrelated structures whose three-dimensional movement constraints relative to one another are defined by the three-dimensional skeleton model.
3. Apparatus according to claim 2, further adapted to analyze relative movements within a series of said sets of 2D image data, thereby providing three-dimensional movement information.
4. Apparatus according to claim 2, further adapted to store a plurality of predetermined skeleton models.
5. Apparatus according to claim 1, further adapted to associate a given skeleton element with an identified structure and to adjust a size of the given skeleton element to correspond to a dimension of the identified structure.
6. Apparatus according to claim 1, further adapted to associate a given skeleton element to a given identified structure and to deform the given skeleton element corresponding to the identified structure.
7. Apparatus according to claim 1, further adapted to associate a given skeleton element to a given identified structure and to apply texture to the given skeleton element.
8. Apparatus according to claim 1, further adapted to track respective structures at a first level and move corresponding skeleton elements at a second level.
9. Apparatus according to claim 8, further adapted to track respective structures at said second level and calculate the deviation of the structure at said first level.
10. Apparatus according to claim 1, further adapted to animate said image data by applying motion to said skeleton elements.
11. Apparatus according to claim 1, further adapted to select a first viewpoint and project using said three dimensional skeleton elements onto a 2D plane associated with said first viewpoint.
12. Apparatus according to claim 11, further adapted to select a second viewpoint and project using said three-dimensional skeleton elements onto a 2D plane associated with said second viewpoint.
13. Apparatus according to claim 12, further adapted to select said first and second viewpoints to provide stereoscopic vision.
14. Apparatus according to claim 1, further adapted to capture input data onto said skeleton elements, to provide three-dimensional motion capture.
15. Apparatus according to claim 1, further adapted to store data relative to said skeleton elements, to provide three-dimensional image data compression and recording.
16. Apparatus according to claim 1, further adapted to form images at a given resolution by fitting interpolated pixels at said given resolution over said skeleton elements.
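The viewpoint projection recited in claims 11 to 13 can be illustrated with a minimal sketch. This is not the claimed implementation: it assumes an idealized pinhole camera looking along the +z axis, and the names `project_skeleton` and `stereo_pair`, the joint dictionary, and the `eye_x`, `baseline`, and `focal` parameters are all hypothetical, introduced only for illustration.

```python
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_skeleton(
    joints: Dict[str, Point3D], eye_x: float, focal: float = 1.0
) -> Dict[str, Point2D]:
    """Project 3D skeleton joints onto the 2D plane of one viewpoint.

    Assumption: the viewpoint is a pinhole camera at (eye_x, 0, 0)
    looking along +z, with the image plane at distance `focal`.
    """
    projected: Dict[str, Point2D] = {}
    for name, (x, y, z) in joints.items():
        if z <= 0:
            raise ValueError(f"joint {name!r} is not in front of the viewpoint")
        # Perspective divide: shift into the camera frame, then scale by focal / depth.
        projected[name] = (focal * (x - eye_x) / z, focal * y / z)
    return projected


def stereo_pair(
    joints: Dict[str, Point3D], baseline: float = 0.06, focal: float = 1.0
) -> Tuple[Dict[str, Point2D], Dict[str, Point2D]]:
    """Project the same skeleton from two horizontally offset viewpoints.

    Assumption: the first and second viewpoints sit at -baseline/2 and
    +baseline/2 on the x axis, which yields a left/right stereoscopic pair.
    """
    half = baseline / 2.0
    left = project_skeleton(joints, -half, focal)
    right = project_skeleton(joints, +half, focal)
    return left, right
```

Under these assumptions, the horizontal offset between the left and right projections of a joint at depth `z` equals `focal * baseline / z`, so nearer skeleton elements show larger disparity between the two views.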
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/149,457 US20190200003A1 (en) | 2004-07-30 | 2018-10-02 | System and method for 3d space-dimension based image processing |
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US59213604P | 2004-07-30 | 2004-07-30 | |
PCT/IL2005/000813 WO2006011153A2 (en) | 2004-07-30 | 2005-07-31 | A system and method for 3d space-dimension based image processing |
US57295807A | 2007-01-30 | 2007-01-30 | |
US11/742,609 US8237775B2 (en) | 2004-07-30 | 2007-05-01 | System and method for 3D space-dimension based image processing |
US13/531,543 US9177220B2 (en) | 2004-07-30 | 2012-06-24 | System and method for 3D space-dimension based image processing |
US14/883,702 US20160105661A1 (en) | 2004-07-30 | 2015-10-15 | System and method for 3d space-dimension based image processing |
US16/149,457 US20190200003A1 (en) | 2004-07-30 | 2018-10-02 | System and method for 3d space-dimension based image processing |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/883,702 Continuation US20160105661A1 (en) | 2004-07-30 | 2015-10-15 | System and method for 3d space-dimension based image processing |
Publications (1)
Publication Number | Publication Date |
---|---|
US20190200003A1 (en) | 2019-06-27 |
Family ID: 35786586
Family Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/572,958 Active 2025-08-04 US8114172B2 (en) | 2004-07-30 | 2005-07-31 | System and method for 3D space-dimension based image processing |
US11/742,609 Active 2030-03-10 US8237775B2 (en) | 2004-07-30 | 2007-05-01 | System and method for 3D space-dimension based image processing |
US11/742,634 Active 2028-10-31 US8111284B1 (en) | 2004-07-30 | 2007-05-01 | System and method for 3D space-dimension based image processing |
US13/531,543 Active 2027-04-10 US9177220B2 (en) | 2004-07-30 | 2012-06-24 | System and method for 3D space-dimension based image processing |
US14/883,702 Abandoned US20160105661A1 (en) | 2004-07-30 | 2015-10-15 | System and method for 3d space-dimension based image processing |
US16/149,457 Abandoned US20190200003A1 (en) | 2004-07-30 | 2018-10-02 | System and method for 3d space-dimension based image processing |
Country Status (6)
Country | Link |
---|---|
US (6) | US8114172B2 (en) |
EP (1) | EP1789928A4 (en) |
JP (4) | JP4904264B2 (en) |
KR (5) | KR101183000B1 (en) |
CA (1) | CA2575704C (en) |
WO (1) | WO2006011153A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220027656A1 (en) * | 2020-07-24 | 2022-01-27 | Ricoh Company, Ltd. | Image matching method and apparatus and non-transitory computer-readable medium |
US20230068731A1 (en) * | 2020-03-17 | 2023-03-02 | Sony Group Corporation | Image processing device and moving image data generation method |
Families Citing this family (202)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8035612B2 (en) | 2002-05-28 | 2011-10-11 | Intellectual Ventures Holding 67 Llc | Self-contained interactive video display system |
US8300042B2 (en) * | 2001-06-05 | 2012-10-30 | Microsoft Corporation | Interactive video display system using strobed light |
US20050122308A1 (en) * | 2002-05-28 | 2005-06-09 | Matthew Bell | Self-contained interactive video display system |
US7710391B2 (en) * | 2002-05-28 | 2010-05-04 | Matthew Bell | Processing an image utilizing a spatially varying pattern |
WO2004055776A1 (en) * | 2002-12-13 | 2004-07-01 | Reactrix Systems | Interactive directed light/sound system |
CN102034197A (en) | 2003-10-24 | 2011-04-27 | 瑞克楚斯系统公司 | Method and system for managing an interactive video display system |
US8681100B2 (en) | 2004-07-30 | 2014-03-25 | Extreme Realty Ltd. | Apparatus system and method for human-machine-interface |
US8872899B2 (en) * | 2004-07-30 | 2014-10-28 | Extreme Reality Ltd. | Method circuit and system for human to machine interfacing by hand gestures |
US8114172B2 (en) | 2004-07-30 | 2012-02-14 | Extreme Reality Ltd. | System and method for 3D space-dimension based image processing |
US8806435B2 (en) | 2004-12-31 | 2014-08-12 | Intel Corporation | Remote logging mechanism |
US9128519B1 (en) | 2005-04-15 | 2015-09-08 | Intellectual Ventures Holding 67 Llc | Method and system for state-based control of objects |
US8081822B1 (en) | 2005-05-31 | 2011-12-20 | Intellectual Ventures Holding 67 Llc | System and method for sensing a feature of an object in an interactive video display |
US8659668B2 (en) | 2005-10-07 | 2014-02-25 | Rearden, Llc | Apparatus and method for performing motion capture using a random pattern on capture surfaces |
US9330324B2 (en) | 2005-10-11 | 2016-05-03 | Apple Inc. | Error compensation in three-dimensional mapping |
EP1934945A4 (en) | 2005-10-11 | 2016-01-20 | Apple Inc | Method and system for object reconstruction |
US9046962B2 (en) | 2005-10-31 | 2015-06-02 | Extreme Reality Ltd. | Methods, systems, apparatuses, circuits and associated computer executable code for detecting motion, position and/or orientation of objects within a defined spatial region |
US20070285554A1 (en) * | 2005-10-31 | 2007-12-13 | Dor Givon | Apparatus method and system for imaging |
US8098277B1 (en) | 2005-12-02 | 2012-01-17 | Intellectual Ventures Holding 67 Llc | Systems and methods for communication between a reactive video system and a mobile communication device |
KR101331543B1 (en) | 2006-03-14 | 2013-11-20 | 프라임센스 엘티디. | Three-dimensional sensing using speckle patterns |
GB0613352D0 (en) * | 2006-07-05 | 2006-08-16 | Ashbey James A | Improvements in stereoscopic imaging systems |
US8021160B2 (en) * | 2006-07-22 | 2011-09-20 | Industrial Technology Research Institute | Learning assessment method and device using a virtual tutor |
CN101517568A (en) | 2006-07-31 | 2009-08-26 | 生命力有限公司 | System and method for performing motion capture and image reconstruction |
US8888592B1 (en) | 2009-06-01 | 2014-11-18 | Sony Computer Entertainment America Llc | Voice overlay |
KR100842568B1 (en) * | 2007-02-08 | 2008-07-01 | 삼성전자주식회사 | Apparatus and method for making compressed image data and apparatus and method for output compressed image data |
US8788848B2 (en) | 2007-03-22 | 2014-07-22 | Microsoft Corporation | Optical DNA |
US8837721B2 (en) | 2007-03-22 | 2014-09-16 | Microsoft Corporation | Optical DNA based on non-deterministic errors |
US20080231835A1 (en) * | 2007-03-23 | 2008-09-25 | Keigo Iizuka | Divergence ratio distance mapping camera |
US8493496B2 (en) | 2007-04-02 | 2013-07-23 | Primesense Ltd. | Depth mapping using projected patterns |
JP5147933B2 (en) * | 2007-04-15 | 2013-02-20 | エクストリーム リアリティー エルティーディー. | Man-machine interface device system and method |
WO2008155770A2 (en) | 2007-06-19 | 2008-12-24 | Prime Sense Ltd. | Distance-varying illumination and imaging techniques for depth mapping |
WO2008156318A2 (en) * | 2007-06-19 | 2008-12-24 | Electronics And Telecommunications Research Institute | Metadata structure for storing and playing stereoscopic data, and method for storing stereoscopic content file using this metadata |
WO2009015501A1 (en) * | 2007-07-27 | 2009-02-05 | ETH Zürich | Computer system and method for generating a 3d geometric model |
CA2699628A1 (en) | 2007-09-14 | 2009-03-19 | Matthew Bell | Gesture-based user interactions with status indicators for acceptable inputs in volumetric zones |
US10162474B2 (en) * | 2007-09-26 | 2018-12-25 | Autodesk, Inc. | Navigation system for a 3D virtual scene |
US8159682B2 (en) | 2007-11-12 | 2012-04-17 | Intellectual Ventures Holding 67 Llc | Lens system |
US8147339B1 (en) | 2007-12-15 | 2012-04-03 | Gaikai Inc. | Systems and methods of serving game video |
US8613673B2 (en) | 2008-12-15 | 2013-12-24 | Sony Computer Entertainment America Llc | Intelligent game loading |
US8968087B1 (en) | 2009-06-01 | 2015-03-03 | Sony Computer Entertainment America Llc | Video game overlay |
US8259163B2 (en) | 2008-03-07 | 2012-09-04 | Intellectual Ventures Holding 67 Llc | Display with built in 3D sensing |
US8121351B2 (en) * | 2008-03-09 | 2012-02-21 | Microsoft International Holdings B.V. | Identification of objects in a 3D video using non/over reflective clothing |
US8595218B2 (en) | 2008-06-12 | 2013-11-26 | Intellectual Ventures Holding 67 Llc | Interactive display management systems and methods |
MX2010014093A (en) * | 2008-06-19 | 2011-03-04 | Thomson Licensing | Display of two-dimensional content during three-dimensional presentation. |
US8456517B2 (en) | 2008-07-09 | 2013-06-04 | Primesense Ltd. | Integrated processor for 3D mapping |
US20110163948A1 (en) * | 2008-09-04 | 2011-07-07 | Dor Givon | Method system and software for providing image sensor based human machine interfacing |
CA2741559A1 (en) | 2008-10-24 | 2010-04-29 | Extreme Reality Ltd. | A method system and associated modules and software components for providing image sensor based human machine interfacing |
US8926435B2 (en) | 2008-12-15 | 2015-01-06 | Sony Computer Entertainment America Llc | Dual-mode program execution |
US20100167248A1 (en) * | 2008-12-31 | 2010-07-01 | Haptica Ltd. | Tracking and training system for medical procedures |
JP5877065B2 (en) * | 2009-01-26 | 2016-03-02 | トムソン ライセンシングThomson Licensing | Frame packing for video encoding |
US8588465B2 (en) * | 2009-01-30 | 2013-11-19 | Microsoft Corporation | Visual target tracking |
US8295546B2 (en) | 2009-01-30 | 2012-10-23 | Microsoft Corporation | Pose tracking pipeline |
US8866821B2 (en) * | 2009-01-30 | 2014-10-21 | Microsoft Corporation | Depth map movement tracking via optical flow and velocity prediction |
US20100195867A1 (en) * | 2009-01-30 | 2010-08-05 | Microsoft Corporation | Visual target tracking using model fitting and exemplar |
WO2010087778A1 (en) * | 2009-02-02 | 2010-08-05 | Agency For Science, Technology And Research | Method and system for rendering an entertainment animation |
US9135948B2 (en) * | 2009-07-03 | 2015-09-15 | Microsoft Technology Licensing, Llc | Optical medium with added descriptor to reduce counterfeiting |
US8786682B2 (en) * | 2009-03-05 | 2014-07-22 | Primesense Ltd. | Reference image techniques for three-dimensional sensing |
US8717417B2 (en) * | 2009-04-16 | 2014-05-06 | Primesense Ltd. | Three-dimensional mapping and imaging |
JP5573316B2 (en) * | 2009-05-13 | 2014-08-20 | セイコーエプソン株式会社 | Image processing method and image processing apparatus |
US8803889B2 (en) * | 2009-05-29 | 2014-08-12 | Microsoft Corporation | Systems and methods for applying animations or motions to a character |
US9182814B2 (en) * | 2009-05-29 | 2015-11-10 | Microsoft Technology Licensing, Llc | Systems and methods for estimating a non-visible or occluded body part |
US8506402B2 (en) | 2009-06-01 | 2013-08-13 | Sony Computer Entertainment America Llc | Game execution environments |
US9582889B2 (en) | 2009-07-30 | 2017-02-28 | Apple Inc. | Depth mapping based on pattern matching and stereoscopic information |
US20110025830A1 (en) * | 2009-07-31 | 2011-02-03 | 3Dmedia Corporation | Methods, systems, and computer-readable storage media for generating stereoscopic content via depth map creation |
US9218126B2 (en) | 2009-09-21 | 2015-12-22 | Extreme Reality Ltd. | Methods circuits apparatus and systems for human machine interfacing with an electronic appliance |
US8878779B2 (en) | 2009-09-21 | 2014-11-04 | Extreme Reality Ltd. | Methods circuits device systems and associated computer executable code for facilitating interfacing with a computing platform display screen |
JP5337658B2 (en) * | 2009-10-02 | 2013-11-06 | 株式会社トプコン | Wide-angle imaging device and measurement system |
US8963829B2 (en) * | 2009-10-07 | 2015-02-24 | Microsoft Corporation | Methods and systems for determining and tracking extremities of a target |
US7961910B2 (en) | 2009-10-07 | 2011-06-14 | Microsoft Corporation | Systems and methods for tracking a model |
US8867820B2 (en) * | 2009-10-07 | 2014-10-21 | Microsoft Corporation | Systems and methods for removing a background of an image |
US8564534B2 (en) | 2009-10-07 | 2013-10-22 | Microsoft Corporation | Human tracking system |
US8817071B2 (en) * | 2009-11-17 | 2014-08-26 | Seiko Epson Corporation | Context constrained novel view interpolation |
US8830227B2 (en) | 2009-12-06 | 2014-09-09 | Primesense Ltd. | Depth-based gain control |
US8803951B2 (en) | 2010-01-04 | 2014-08-12 | Disney Enterprises, Inc. | Video capture system control using virtual cameras for augmented reality |
US20110199469A1 (en) * | 2010-02-15 | 2011-08-18 | Gallagher Andrew C | Detection and display of stereo images |
US20110199463A1 (en) * | 2010-02-15 | 2011-08-18 | Gallagher Andrew C | Display with integrated camera |
US20110199468A1 (en) * | 2010-02-15 | 2011-08-18 | Gallagher Andrew C | 3-dimensional display with preferences |
US8384774B2 (en) * | 2010-02-15 | 2013-02-26 | Eastman Kodak Company | Glasses for viewing stereo images |
JP2011170857A (en) * | 2010-02-22 | 2011-09-01 | Ailive Inc | System and method for performing motion recognition with minimum delay |
US8730309B2 (en) | 2010-02-23 | 2014-05-20 | Microsoft Corporation | Projectors and depth cameras for deviceless augmented reality and interaction |
US8982182B2 (en) * | 2010-03-01 | 2015-03-17 | Apple Inc. | Non-uniform spatial resource allocation for depth mapping |
CA2696925A1 (en) * | 2010-03-19 | 2011-09-19 | Bertrand Nepveu | Integrated field-configurable headset and system |
JP5087101B2 (en) | 2010-03-31 | 2012-11-28 | 株式会社バンダイナムコゲームス | Program, information storage medium, and image generation system |
KR20110116525A (en) * | 2010-04-19 | 2011-10-26 | 엘지전자 주식회사 | Image display device and operating method for the same |
CN101840508B (en) * | 2010-04-26 | 2013-01-09 | 中国科学院计算技术研究所 | Method and system for automatically identifying characteristic points in human body chain structure. |
US8803888B2 (en) | 2010-06-02 | 2014-08-12 | Microsoft Corporation | Recognition system for sharing information |
JP2011258159A (en) * | 2010-06-11 | 2011-12-22 | Namco Bandai Games Inc | Program, information storage medium and image generation system |
US8928659B2 (en) * | 2010-06-23 | 2015-01-06 | Microsoft Corporation | Telepresence systems with viewer perspective adjustment |
US8976230B1 (en) * | 2010-06-28 | 2015-03-10 | Vlad Vendrow | User interface and methods to adapt images for approximating torso dimensions to simulate the appearance of various states of dress |
US8676591B1 (en) | 2010-08-02 | 2014-03-18 | Sony Computer Entertainment America Llc | Audio deceleration |
US9098931B2 (en) | 2010-08-11 | 2015-08-04 | Apple Inc. | Scanning projectors and image capture modules for 3D mapping |
US8730302B2 (en) * | 2010-08-27 | 2014-05-20 | Broadcom Corporation | Method and system for enhancing 3D effects for 3D video rendering |
CN103403694B (en) | 2010-09-13 | 2019-05-21 | 索尼电脑娱乐美国公司 | Add-on assemble management |
KR20130090898A (en) | 2010-09-13 | 2013-08-14 | 소니 컴퓨터 엔터테인먼트 아메리카 엘엘씨 | Dual mode program execution and loading |
US20130208926A1 (en) * | 2010-10-13 | 2013-08-15 | Microsoft Corporation | Surround sound simulation with virtual skeleton modeling |
US20130208899A1 (en) * | 2010-10-13 | 2013-08-15 | Microsoft Corporation | Skeletal modeling for positioning virtual object sounds |
US20130208897A1 (en) * | 2010-10-13 | 2013-08-15 | Microsoft Corporation | Skeletal modeling for world space object sounds |
US20130208900A1 (en) * | 2010-10-13 | 2013-08-15 | Microsoft Corporation | Depth camera with integrated three-dimensional audio |
US9522330B2 (en) * | 2010-10-13 | 2016-12-20 | Microsoft Technology Licensing, Llc | Three-dimensional audio sweet spot feedback |
US9628755B2 (en) * | 2010-10-14 | 2017-04-18 | Microsoft Technology Licensing, Llc | Automatically tracking user movement in a video chat application |
US9196076B1 (en) * | 2010-11-17 | 2015-11-24 | David MacLeod | Method for producing two-dimensional animated characters |
WO2012066501A1 (en) | 2010-11-19 | 2012-05-24 | Primesense Ltd. | Depth mapping using time-coded illumination |
US9131136B2 (en) | 2010-12-06 | 2015-09-08 | Apple Inc. | Lens arrays for pattern projection and imaging |
KR20120065834A (en) * | 2010-12-13 | 2012-06-21 | 한국전자통신연구원 | Apparatus for generating digital actor based on multiple cameras and method thereof |
US9129438B2 (en) | 2011-01-18 | 2015-09-08 | NedSense Loft B.V. | 3D modeling and rendering from 2D images |
CA2806520C (en) * | 2011-01-23 | 2016-02-16 | Extreme Reality Ltd. | Methods, systems, devices and associated processing logic for generating stereoscopic images and video |
JP2012160039A (en) * | 2011-02-01 | 2012-08-23 | Fujifilm Corp | Image processor, stereoscopic image printing system, image processing method and program |
US8761437B2 (en) * | 2011-02-18 | 2014-06-24 | Microsoft Corporation | Motion recognition |
US9857868B2 (en) | 2011-03-19 | 2018-01-02 | The Board Of Trustees Of The Leland Stanford Junior University | Method and system for ergonomic touch-free interface |
TWI424377B (en) * | 2011-04-01 | 2014-01-21 | Altek Corp | Method for analyzing object motion in multi frames |
US9030528B2 (en) | 2011-04-04 | 2015-05-12 | Apple Inc. | Multi-zone imaging sensor and lens array |
US8840466B2 (en) | 2011-04-25 | 2014-09-23 | Aquifi, Inc. | Method and system to create three-dimensional mapping in a two-dimensional game |
US9594430B2 (en) * | 2011-06-01 | 2017-03-14 | Microsoft Technology Licensing, Llc | Three-dimensional foreground selection for vision system |
US9597587B2 (en) | 2011-06-08 | 2017-03-21 | Microsoft Technology Licensing, Llc | Locational node device |
DE102011104524A1 (en) * | 2011-06-15 | 2012-12-20 | Ifakt Gmbh | Method and device for determining and reproducing virtual location-related information for a room area |
WO2013028908A1 (en) * | 2011-08-24 | 2013-02-28 | Microsoft Corporation | Touch and social cues as inputs into a computer |
JP5746937B2 (en) * | 2011-09-01 | 2015-07-08 | ルネサスエレクトロニクス株式会社 | Object tracking device |
US9786083B2 (en) | 2011-10-07 | 2017-10-10 | Dreamworks Animation L.L.C. | Multipoint offset sampling deformation |
ES2895454T3 (en) | 2011-11-09 | 2022-02-21 | Abyssal S A | System and method of operation for remotely operated vehicles with superimposed 3D images |
US9161012B2 (en) | 2011-11-17 | 2015-10-13 | Microsoft Technology Licensing, Llc | Video compression using virtual skeleton |
US8666119B1 (en) * | 2011-11-29 | 2014-03-04 | Lucasfilm Entertainment Company Ltd. | Geometry tracking |
US20130141433A1 (en) * | 2011-12-02 | 2013-06-06 | Per Astrand | Methods, Systems and Computer Program Products for Creating Three Dimensional Meshes from Two Dimensional Images |
US8811938B2 (en) | 2011-12-16 | 2014-08-19 | Microsoft Corporation | Providing a user interface experience based on inferred vehicle state |
KR101908284B1 (en) | 2012-01-13 | 2018-10-16 | 삼성전자주식회사 | Apparatus and method for analysising body parts association |
US8854433B1 (en) | 2012-02-03 | 2014-10-07 | Aquifi, Inc. | Method and system enabling natural user interface gestures with an electronic system |
US9651417B2 (en) | 2012-02-15 | 2017-05-16 | Apple Inc. | Scanning depth engine |
US20130271472A1 (en) * | 2012-04-12 | 2013-10-17 | Motorola Mobility, Inc. | Display of Value Changes in Between Keyframes in an Animation Using a Timeline |
US9111135B2 (en) | 2012-06-25 | 2015-08-18 | Aquifi, Inc. | Systems and methods for tracking human hands using parts based template matching using corresponding pixels in bounded regions of a sequence of frames that are a specified distance interval from a reference camera |
US8934675B2 (en) | 2012-06-25 | 2015-01-13 | Aquifi, Inc. | Systems and methods for tracking human hands by performing parts based template matching using images from multiple viewpoints |
US20140028662A1 (en) * | 2012-07-24 | 2014-01-30 | Sharp Laboratories Of America, Inc. | Viewer reactive stereoscopic display for head detection |
US20140047393A1 (en) * | 2012-08-07 | 2014-02-13 | Samsung Electronics Co., Ltd. | Method and portable apparatus with a gui |
US9696427B2 (en) | 2012-08-14 | 2017-07-04 | Microsoft Technology Licensing, Llc | Wide angle depth detection |
US8836768B1 (en) | 2012-09-04 | 2014-09-16 | Aquifi, Inc. | Method and system enabling natural user interface gestures with user wearable glasses |
US9386298B2 (en) * | 2012-11-08 | 2016-07-05 | Leap Motion, Inc. | Three-dimensional image sensors |
CN103077546B (en) * | 2012-12-27 | 2015-10-28 | 江苏太奇通软件有限公司 | The three-dimensional perspective transform method of X-Y scheme |
JP6075066B2 (en) * | 2012-12-28 | 2017-02-08 | 株式会社リコー | Image management system, image management method, and program |
US9092665B2 (en) | 2013-01-30 | 2015-07-28 | Aquifi, Inc | Systems and methods for initializing motion tracking of human hands |
US9129155B2 (en) | 2013-01-30 | 2015-09-08 | Aquifi, Inc. | Systems and methods for initializing motion tracking of human hands using template matching within bounded regions determined using a depth map |
US9052746B2 (en) * | 2013-02-15 | 2015-06-09 | Microsoft Technology Licensing, Llc | User center-of-mass and mass distribution extraction using depth images |
US9298266B2 (en) | 2013-04-02 | 2016-03-29 | Aquifi, Inc. | Systems and methods for implementing three-dimensional (3D) gesture based graphical user interfaces (GUI) that incorporate gesture reactive interface objects |
KR101428467B1 (en) * | 2013-07-03 | 2014-08-12 | 연세대학교 산학협력단 | Method and apparatus for automated updating 4d cad construction model |
EP3686754A1 (en) * | 2013-07-30 | 2020-07-29 | Kodak Alaris Inc. | System and method for creating navigable views of ordered images |
US9798388B1 (en) | 2013-07-31 | 2017-10-24 | Aquifi, Inc. | Vibrotactile system to augment 3D input systems |
US20230156350A1 (en) * | 2013-09-30 | 2023-05-18 | Duelight Llc | Systems, methods, and computer program products for digital photography |
KR20150043818A (en) | 2013-10-15 | 2015-04-23 | 삼성전자주식회사 | Image processing apparatus and control method thereof |
CN103578127B (en) * | 2013-11-13 | 2016-08-31 | 北京像素软件科技股份有限公司 | A kind of object turns round operation realizing method and device |
US20150138311A1 (en) * | 2013-11-21 | 2015-05-21 | Panavision International, L.P. | 360-degree panoramic camera systems |
TWI472231B (en) | 2013-11-27 | 2015-02-01 | Ind Tech Res Inst | Video pre-processing method and apparatus for motion estimation |
US9418465B2 (en) * | 2013-12-31 | 2016-08-16 | Dreamworks Animation Llc | Multipoint offset sampling deformation techniques |
US9507417B2 (en) | 2014-01-07 | 2016-11-29 | Aquifi, Inc. | Systems and methods for implementing head tracking based graphical user interfaces (GUI) that incorporate gesture reactive interface objects |
US20150215530A1 (en) * | 2014-01-27 | 2015-07-30 | Microsoft Corporation | Universal capture |
US9619105B1 (en) | 2014-01-30 | 2017-04-11 | Aquifi, Inc. | Systems and methods for gesture based interaction with viewpoint dependent user interfaces |
KR101381580B1 (en) * | 2014-02-04 | 2014-04-17 | (주)나인정보시스템 | Method and system for detecting position of vehicle in image of influenced various illumination environment |
WO2015124388A1 (en) * | 2014-02-19 | 2015-08-27 | Koninklijke Philips N.V. | Motion adaptive visualization in medical 4d imaging |
KR20150101915A (en) * | 2014-02-27 | 2015-09-04 | 삼성전자주식회사 | Method for displaying 3 dimension graphic user interface screen and device for performing the same |
US20150262380A1 (en) * | 2014-03-17 | 2015-09-17 | Qualcomm Incorporated | Adaptive resolution in optical flow computations for an image processing system |
US10321117B2 (en) * | 2014-04-11 | 2019-06-11 | Lucasfilm Entertainment Company Ltd. | Motion-controlled body capture and reconstruction |
US10313656B2 (en) | 2014-09-22 | 2019-06-04 | Samsung Electronics Company Ltd. | Image stitching for three-dimensional video |
US11205305B2 (en) | 2014-09-22 | 2021-12-21 | Samsung Electronics Company, Ltd. | Presentation of three-dimensional video |
US20160284135A1 (en) * | 2015-03-25 | 2016-09-29 | Gila Kamhi | Reality Animation Mechanism |
RU2586566C1 (en) * | 2015-03-25 | 2016-06-10 | Общество с ограниченной ответственностью "Лаборатория 24" | Method of displaying object |
US10268781B2 (en) * | 2015-07-01 | 2019-04-23 | Paddy Dunning | Visual modeling apparatuses, methods and systems |
CN105184738A (en) * | 2015-09-08 | 2015-12-23 | 郑州普天信息技术有限公司 | Three-dimensional virtual display device and method |
US10276210B2 (en) * | 2015-11-18 | 2019-04-30 | International Business Machines Corporation | Video enhancement |
RU2606874C1 (en) * | 2015-12-02 | 2017-01-10 | Виталий Витальевич Аверьянов | Method of augmented reality environment generating device controlling |
US10150034B2 (en) | 2016-04-11 | 2018-12-11 | Charles Chungyohl Lee | Methods and systems for merging real world media within a virtual world |
US10718613B2 (en) * | 2016-04-19 | 2020-07-21 | Massachusetts Institute Of Technology | Ground-based system for geolocation of perpetrators of aircraft laser strikes |
RU2636676C2 (en) * | 2016-05-12 | 2017-11-27 | Общество с ограниченной ответственностью "Торговый дом "Технолайн" ООО "Торговый дом "Технолайн" | Method of creating augmented reality elements and graphic media for its implementation |
US10071314B2 (en) * | 2016-06-30 | 2018-09-11 | Electronic Arts Inc. | Multi-character interaction scenario |
JP2018116537A (en) * | 2017-01-19 | 2018-07-26 | ソニー株式会社 | Information processing apparatus, information processing method, and program |
JP6944180B2 (en) * | 2017-03-23 | 2021-10-06 | 株式会社Free−D | Video conversion system, video conversion method and video conversion program |
US20230107110A1 (en) * | 2017-04-10 | 2023-04-06 | Eys3D Microelectronics, Co. | Depth processing system and operational method thereof |
US10417276B2 (en) * | 2017-05-15 | 2019-09-17 | Adobe, Inc. | Thumbnail generation from panoramic images |
US10431000B2 (en) * | 2017-07-18 | 2019-10-01 | Sony Corporation | Robust mesh tracking and fusion by using part-based key frames and priori model |
EP3669330A4 (en) | 2017-08-15 | 2021-04-07 | Nokia Technologies Oy | Encoding and decoding of volumetric video |
WO2019034807A1 (en) | 2017-08-15 | 2019-02-21 | Nokia Technologies Oy | Sequential encoding and decoding of volymetric video |
US10607079B2 (en) * | 2017-09-26 | 2020-03-31 | Toyota Research Institute, Inc. | Systems and methods for generating three dimensional skeleton representations |
US10529074B2 (en) | 2017-09-28 | 2020-01-07 | Samsung Electronics Co., Ltd. | Camera pose and plane estimation using active markers and a dynamic vision sensor |
US10839547B2 (en) | 2017-09-28 | 2020-11-17 | Samsung Electronics Co., Ltd. | Camera pose determination and tracking |
US10460512B2 (en) * | 2017-11-07 | 2019-10-29 | Microsoft Technology Licensing, Llc | 3D skeletonization using truncated epipolar lines |
US11645719B2 (en) | 2017-12-05 | 2023-05-09 | International Business Machines Corporation | Dynamic event depiction facilitating automatic resource(s) diverting |
US11113887B2 (en) * | 2018-01-08 | 2021-09-07 | Verizon Patent And Licensing Inc | Generating three-dimensional content from two-dimensional images |
CN108765529A (en) * | 2018-05-04 | 2018-11-06 | 北京比特智学科技有限公司 | Video generation method and device |
CN108765263A (en) * | 2018-05-21 | 2018-11-06 | 电子科技大学 | A kind of human skeleton model method for building up based on two dimensional image |
JP7017207B2 (en) * | 2018-06-12 | 2022-02-08 | アイレック技建株式会社 | Image inspection device and its image inspection method |
US10595000B1 (en) * | 2018-08-02 | 2020-03-17 | Facebook Technologies, Llc | Systems and methods for using depth information to extrapolate two-dimentional images |
US10872459B2 (en) | 2019-02-05 | 2020-12-22 | X Development Llc | Scene recognition using volumetric substitution of real world objects |
KR102639725B1 (en) * | 2019-02-18 | 2024-02-23 | 삼성전자주식회사 | Electronic device for providing animated image and method thereof |
JP7095628B2 (en) * | 2019-03-07 | 2022-07-05 | 日本電信電話株式会社 | Coordinate system transformation parameter estimator, method and program |
JP7331927B2 (en) * | 2019-07-23 | 2023-08-23 | 富士通株式会社 | Generation method, generation program and information processing device |
US11270121B2 (en) | 2019-08-20 | 2022-03-08 | Microsoft Technology Licensing, Llc | Semi supervised animated character recognition in video |
US11366989B2 (en) * | 2019-08-20 | 2022-06-21 | Microsoft Technology Licensing, Llc | Negative sampling algorithm for enhanced image classification |
US11958183B2 (en) | 2019-09-19 | 2024-04-16 | The Research Foundation For The State University Of New York | Negotiation-based human-robot collaboration via augmented reality |
CN112634339B (en) * | 2019-09-24 | 2024-05-31 | 阿里巴巴集团控股有限公司 | Commodity object information display method and device and electronic equipment |
CN111126300B (en) * | 2019-12-25 | 2023-09-08 | 成都极米科技股份有限公司 | Human body image detection method and device, electronic equipment and readable storage medium |
WO2021236468A1 (en) * | 2020-05-19 | 2021-11-25 | Intelligent Security Systems Corporation | Technologies for analyzing behaviors of objects or with respect to objects based on stereo imageries therof |
US11263454B2 (en) * | 2020-05-25 | 2022-03-01 | Jingdong Digits Technology Holding Co., Ltd. | System and method for video-based pig counting in the crowd |
CN112383679A (en) * | 2020-11-02 | 2021-02-19 | 北京德火科技有限责任公司 | Remote same-screen remote interview mode of AR immersive panoramic simulation system at different places and control method thereof |
CN112379773B (en) * | 2020-11-12 | 2024-05-24 | 深圳市洲明科技股份有限公司 | Multi-person three-dimensional motion capturing method, storage medium and electronic equipment |
US11450107B1 (en) | 2021-03-10 | 2022-09-20 | Microsoft Technology Licensing, Llc | Dynamic detection and recognition of media subjects |
CN115937379A (en) * | 2021-08-16 | 2023-04-07 | 北京字跳网络技术有限公司 | Special effect generation method and device, electronic equipment and storage medium |
US20230306616A1 (en) * | 2022-03-25 | 2023-09-28 | Logistics and Supply Chain MultiTech R&D Centre Limited | Device and method for capturing and analyzing a motion of a user |
US11948234B1 (en) * | 2023-08-30 | 2024-04-02 | Illuscio, Inc. | Systems and methods for dynamic enhancement of point cloud animations |
CN117278731B (en) * | 2023-11-21 | 2024-05-28 | 启迪数字科技(深圳)有限公司 | Multi-video and three-dimensional scene fusion method, device, equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6144375A (en) * | 1998-08-14 | 2000-11-07 | Praja Inc. | Multi-perspective viewer for content-based interactivity |
US6208360B1 (en) * | 1997-03-10 | 2001-03-27 | Kabushiki Kaisha Toshiba | Method and apparatus for graffiti animation |
US6317130B1 (en) * | 1996-10-31 | 2001-11-13 | Konami Co., Ltd. | Apparatus and method for generating skeleton-based dynamic picture images as well as medium storing therein program for generation of such picture images |
US20050063596A1 (en) * | 2001-11-23 | 2005-03-24 | Yosef Yomdin | Encoding of geometric modeled images |
US20060002601A1 (en) * | 2004-06-30 | 2006-01-05 | Accuray, Inc. | DRR generation using a non-linear attenuation model |
US20060002615A1 (en) * | 2004-06-30 | 2006-01-05 | Accuray, Inc. | Image enhancement method and system for fiducial-less tracking of treatment targets |
US20060074292A1 (en) * | 2004-09-30 | 2006-04-06 | Accuray, Inc. | Dynamic tracking of moving targets |
US7823066B1 (en) * | 2000-03-03 | 2010-10-26 | Tibco Software Inc. | Intelligent console for content-based interactivity |
Family Cites Families (127)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4376950A (en) | 1980-09-29 | 1983-03-15 | Ampex Corporation | Three-dimensional television system using holographic techniques |
US5130794A (en) | 1990-03-29 | 1992-07-14 | Ritchey Kurtis J | Panoramic display system |
US5515183A (en) | 1991-08-08 | 1996-05-07 | Citizen Watch Co., Ltd. | Real-time holography system |
US5691885A (en) | 1992-03-17 | 1997-11-25 | Massachusetts Institute Of Technology | Three-dimensional interconnect having modules with vertical top and bottom connectors |
JP3414417B2 (en) | 1992-09-30 | 2003-06-09 | 富士通株式会社 | 3D image information transmission system |
JP2627483B2 (en) * | 1994-05-09 | 1997-07-09 | 株式会社エイ・ティ・アール通信システム研究所 | Attitude detection apparatus and method |
US5920319A (en) * | 1994-10-27 | 1999-07-06 | Wake Forest University | Automatic analysis in virtual endoscopy |
US5745719A (en) | 1995-01-19 | 1998-04-28 | Falcon; Fernando D. | Commands functions invoked from movement of a control input device |
US5835133A (en) | 1996-01-23 | 1998-11-10 | Silicon Graphics, Inc. | Optical system for single camera stereo video |
US6115482A (en) | 1996-02-13 | 2000-09-05 | Ascent Technology, Inc. | Voice-output reading system with gesture-based navigation |
JP3337938B2 (en) * | 1996-04-25 | 2002-10-28 | 松下電器産業株式会社 | Motion transmitting / receiving device having three-dimensional skeleton structure and motion transmitting / receiving method |
US5909218A (en) * | 1996-04-25 | 1999-06-01 | Matsushita Electric Industrial Co., Ltd. | Transmitter-receiver of three-dimensional skeleton structure motions and method thereof |
US6445814B2 (en) | 1996-07-01 | 2002-09-03 | Canon Kabushiki Kaisha | Three-dimensional information processing apparatus and method |
US5852450A (en) * | 1996-07-11 | 1998-12-22 | Lamb & Company, Inc. | Method and apparatus for processing captured motion data |
US5831633A (en) * | 1996-08-13 | 1998-11-03 | Van Roy; Peter L. | Designating, drawing and colorizing generated images by computer |
JPH10334270A (en) * | 1997-05-28 | 1998-12-18 | Mitsubishi Electric Corp | Operation recognition device and recorded medium recording operation recognition program |
JPH1145351A (en) * | 1997-07-28 | 1999-02-16 | Matsushita Electric Ind Co Ltd | Information processor |
JPH11175765A (en) * | 1997-12-11 | 1999-07-02 | Alpine Electron Inc | Method and device for generating three-dimensional model and storage medium |
US6243106B1 (en) * | 1998-04-13 | 2001-06-05 | Compaq Computer Corporation | Method for figure tracking using 2-D registration and 3-D reconstruction |
US6256882B1 (en) * | 1998-07-14 | 2001-07-10 | Cascade Microtech, Inc. | Membrane probing system |
US6681031B2 (en) | 1998-08-10 | 2004-01-20 | Cybernet Systems Corporation | Gesture-controlled interfaces for self-service machines and other applications |
JP2000155606A (en) * | 1998-11-24 | 2000-06-06 | Ricoh Elemex Corp | Operation control system |
US6303924B1 (en) | 1998-12-21 | 2001-10-16 | Microsoft Corporation | Image sensing operator input device |
US6529643B1 (en) * | 1998-12-21 | 2003-03-04 | Xerox Corporation | System for electronic compensation of beam scan trajectory distortion |
US6657670B1 (en) | 1999-03-16 | 2003-12-02 | Teco Image Systems Co., Ltd. | Diaphragm structure of digital still camera |
DE19917660A1 (en) | 1999-04-19 | 2000-11-02 | Deutsch Zentr Luft & Raumfahrt | Method and input device for controlling the position of an object to be graphically represented in a virtual reality |
US6597801B1 (en) * | 1999-09-16 | 2003-07-22 | Hewlett-Packard Development Company L.P. | Method for object registration via selection of models with dynamically ordered features |
US7123292B1 (en) | 1999-09-29 | 2006-10-17 | Xerox Corporation | Mosaicing images with an offset lens |
JP2001246161A (en) | 1999-12-31 | 2001-09-11 | Square Co Ltd | Device and method for game using gesture recognizing technic and recording medium storing program to realize the method |
GB2358098A (en) | 2000-01-06 | 2001-07-11 | Sharp Kk | Method of segmenting a pixelled image |
EP1117072A1 (en) * | 2000-01-17 | 2001-07-18 | Koninklijke Philips Electronics N.V. | Text improvement |
US6674877B1 (en) * | 2000-02-03 | 2004-01-06 | Microsoft Corporation | System and method for visually tracking occluded objects in real time |
US7370983B2 (en) | 2000-03-02 | 2008-05-13 | Donnelly Corporation | Interior mirror assembly with display |
KR100355815B1 (en) * | 2000-04-11 | 2002-10-19 | 이지로보틱스 주식회사 | Apparatus for motion capture and motion animation using multiple mobile robot |
US6554706B2 (en) * | 2000-05-31 | 2003-04-29 | Gerard Jounghyun Kim | Methods and apparatus of displaying and evaluating motion data in a motion game apparatus |
JP2002032788A (en) * | 2000-07-14 | 2002-01-31 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for providing virtual reality and recording medium with virtual reality providing program recorded threreon |
US7227526B2 (en) | 2000-07-24 | 2007-06-05 | Gesturetek, Inc. | Video-based image control system |
US6906687B2 (en) | 2000-07-31 | 2005-06-14 | Texas Instruments Incorporated | Digital formatter for 3-dimensional display applications |
JP4047575B2 (en) * | 2000-11-15 | 2008-02-13 | 株式会社セガ | Display object generation method in information processing apparatus, program for controlling execution thereof, and recording medium storing the program |
IL139995A (en) | 2000-11-29 | 2007-07-24 | Rvc Llc | System and method for spherical stereoscopic photographing |
US7116330B2 (en) * | 2001-02-28 | 2006-10-03 | Intel Corporation | Approximating motion using a three-dimensional model |
JP4695275B2 (en) * | 2001-03-07 | 2011-06-08 | 独立行政法人科学技術振興機構 | Video generation system |
US7061532B2 (en) | 2001-03-27 | 2006-06-13 | Hewlett-Packard Development Company, L.P. | Single sensor chip digital stereo camera |
US9400921B2 (en) * | 2001-05-09 | 2016-07-26 | Intel Corporation | Method and system using a data-driven model for monocular face tracking |
US6961055B2 (en) * | 2001-05-09 | 2005-11-01 | Free Radical Design Limited | Methods and apparatus for constructing virtual environments |
US6862121B2 (en) | 2001-06-05 | 2005-03-01 | California Institute Of Technolgy | Method and apparatus for holographic recording of fast phenomena |
JP4596220B2 (en) | 2001-06-26 | 2010-12-08 | ソニー株式会社 | Image processing apparatus and method, recording medium, and program |
WO2003025859A1 (en) | 2001-09-17 | 2003-03-27 | National Institute Of Advanced Industrial Science And Technology | Interface apparatus |
CA2359269A1 (en) | 2001-10-17 | 2003-04-17 | Biodentity Systems Corporation | Face imaging system for recordal and automated identity confirmation |
WO2003039698A1 (en) | 2001-11-02 | 2003-05-15 | Atlantis Cyberspace, Inc. | Virtual reality game system with pseudo 3d display driver & mission control |
JP4077622B2 (en) * | 2001-11-15 | 2008-04-16 | 独立行政法人科学技術振興機構 | 3D human moving image generation system |
US6833843B2 (en) | 2001-12-03 | 2004-12-21 | Tempest Microsystems | Panoramic imaging and display system with canonical magnifier |
US7683929B2 (en) | 2002-02-06 | 2010-03-23 | Nice Systems, Ltd. | System and method for video content analysis-based detection, surveillance and alarm management |
JP2003331315A (en) * | 2002-03-05 | 2003-11-21 | Matsushita Electric Works Ltd | Moving image information production system for virtual human body |
AU2003280516A1 (en) | 2002-07-01 | 2004-01-19 | The Regents Of The University Of California | Digital processing of video images |
JP3866168B2 (en) * | 2002-07-31 | 2007-01-10 | 独立行政法人科学技術振興機構 | Motion generation system using multiple structures |
US8013852B2 (en) * | 2002-08-02 | 2011-09-06 | Honda Giken Kogyo Kabushiki Kaisha | Anthropometry-based skeleton fitting |
US8460103B2 (en) | 2004-06-18 | 2013-06-11 | Igt | Gesture controlled casino gaming system |
JP3960536B2 (en) | 2002-08-12 | 2007-08-15 | 株式会社国際電気通信基礎技術研究所 | Computer-implemented method and computer-executable program for automatically adapting a parametric dynamic model to human actor size for motion capture |
KR100507780B1 (en) * | 2002-12-20 | 2005-08-17 | 한국전자통신연구원 | Apparatus and method for high-speed marker-free motion capture |
CN1739119A (en) | 2003-01-17 | 2006-02-22 | 皇家飞利浦电子股份有限公司 | Full depth map acquisition |
US9177387B2 (en) | 2003-02-11 | 2015-11-03 | Sony Computer Entertainment Inc. | Method and apparatus for real time motion capture |
US7257237B1 (en) | 2003-03-07 | 2007-08-14 | Sandia Corporation | Real time markerless motion tracking using linked kinematic chains |
US8745541B2 (en) | 2003-03-25 | 2014-06-03 | Microsoft Corporation | Architecture for controlling a computer using hand gestures |
WO2004094943A1 (en) | 2003-04-22 | 2004-11-04 | Hiroshi Arisawa | Motion capturing method, motion capturing device, and motion capturing marker |
US20070098250A1 (en) | 2003-05-01 | 2007-05-03 | Delta Dansk Elektronik, Lys Og Akustik | Man-machine interface based on 3-D positions of the human body |
US7418134B2 (en) | 2003-05-12 | 2008-08-26 | Princeton University | Method and apparatus for foreground segmentation of video sequences |
WO2004114063A2 (en) * | 2003-06-13 | 2004-12-29 | Georgia Tech Research Corporation | Data reconstruction using directional interpolation techniques |
JP2005020227A (en) | 2003-06-25 | 2005-01-20 | Pfu Ltd | Picture compression device |
JP2005025415A (en) | 2003-06-30 | 2005-01-27 | Sony Corp | Position detector |
US7257250B2 (en) * | 2003-10-29 | 2007-08-14 | International Business Machines Corporation | System, method, and program product for extracting a multiresolution quadrilateral-based subdivision surface representation from an arbitrary two-manifold polygon mesh |
US7755608B2 (en) | 2004-01-23 | 2010-07-13 | Hewlett-Packard Development Company, L.P. | Systems and methods of interfacing with a machine |
KR100853605B1 (en) | 2004-03-23 | 2008-08-22 | 후지쯔 가부시끼가이샤 | Distinguishing tilt and translation motion components in handheld devices |
US20070183633A1 (en) * | 2004-03-24 | 2007-08-09 | Andre Hoffmann | Identification, verification, and recognition method and system |
US8036494B2 (en) | 2004-04-15 | 2011-10-11 | Hewlett-Packard Development Company, L.P. | Enhancing image resolution |
US7308112B2 (en) | 2004-05-14 | 2007-12-11 | Honda Motor Co., Ltd. | Sign based human-machine interaction |
US7519223B2 (en) | 2004-06-28 | 2009-04-14 | Microsoft Corporation | Recognizing gestures and using gestures for interacting with software applications |
US8872899B2 (en) | 2004-07-30 | 2014-10-28 | Extreme Reality Ltd. | Method circuit and system for human to machine interfacing by hand gestures |
US8114172B2 (en) | 2004-07-30 | 2012-02-14 | Extreme Reality Ltd. | System and method for 3D space-dimension based image processing |
US8432390B2 (en) | 2004-07-30 | 2013-04-30 | Extreme Reality Ltd | Apparatus system and method for human-machine interface |
GB0424030D0 (en) | 2004-10-28 | 2004-12-01 | British Telecomm | A method and system for processing video data |
US7386150B2 (en) | 2004-11-12 | 2008-06-10 | Safeview, Inc. | Active subject imaging with body identification |
US7903141B1 (en) | 2005-02-15 | 2011-03-08 | Videomining Corporation | Method and system for event detection by multi-scale image invariant analysis |
WO2006099597A2 (en) | 2005-03-17 | 2006-09-21 | Honda Motor Co., Ltd. | Pose estimation based on critical point analysis |
US7774713B2 (en) | 2005-06-28 | 2010-08-10 | Microsoft Corporation | Dynamic user experience with semantic rich objects |
US20070285554A1 (en) | 2005-10-31 | 2007-12-13 | Dor Givon | Apparatus method and system for imaging |
US9046962B2 (en) | 2005-10-31 | 2015-06-02 | Extreme Reality Ltd. | Methods, systems, apparatuses, circuits and associated computer executable code for detecting motion, position and/or orientation of objects within a defined spatial region |
US8265349B2 (en) | 2006-02-07 | 2012-09-11 | Qualcomm Incorporated | Intra-mode region-of-interest video object segmentation |
US9395905B2 (en) | 2006-04-05 | 2016-07-19 | Synaptics Incorporated | Graphical scroll wheel |
JP2007302223A (en) | 2006-04-12 | 2007-11-22 | Hitachi Ltd | Non-contact input device for in-vehicle apparatus |
EP2033164B1 (en) | 2006-06-23 | 2015-10-07 | Imax Corporation | Methods and systems for converting 2d motion pictures for stereoscopic 3d exhibition |
US8022935B2 (en) | 2006-07-06 | 2011-09-20 | Apple Inc. | Capacitance sensing electrode with integrated I/O mechanism |
US7783118B2 (en) | 2006-07-13 | 2010-08-24 | Seiko Epson Corporation | Method and apparatus for determining motion in images |
US7701439B2 (en) | 2006-07-13 | 2010-04-20 | Northrop Grumman Corporation | Gesture recognition simulation system and method |
US8139067B2 (en) * | 2006-07-25 | 2012-03-20 | The Board Of Trustees Of The Leland Stanford Junior University | Shape completion, animation and marker-less motion capture of people, animals or characters |
US7907117B2 (en) | 2006-08-08 | 2011-03-15 | Microsoft Corporation | Virtual controller for visual displays |
US7936932B2 (en) | 2006-08-24 | 2011-05-03 | Dell Products L.P. | Methods and apparatus for reducing storage size |
US8356254B2 (en) | 2006-10-25 | 2013-01-15 | International Business Machines Corporation | System and method for interacting with a display |
US20080104547A1 (en) | 2006-10-25 | 2008-05-01 | General Electric Company | Gesture-based communications |
US8756516B2 (en) | 2006-10-31 | 2014-06-17 | Scenera Technologies, Llc | Methods, systems, and computer program products for interacting simultaneously with multiple application programs |
US7885480B2 (en) | 2006-10-31 | 2011-02-08 | Mitutoyo Corporation | Correlation peak finding method for image correlation displacement sensing |
US8793621B2 (en) | 2006-11-09 | 2014-07-29 | Navisense | Method and device to control touchless recognition |
US8023726B2 (en) * | 2006-11-10 | 2011-09-20 | University Of Maryland | Method and system for markerless motion capture using multiple cameras |
US8055073B1 (en) * | 2006-12-19 | 2011-11-08 | Playvision Technologies, Inc. | System and method for enabling meaningful interaction with video based characters and objects |
US8075499B2 (en) | 2007-05-18 | 2011-12-13 | Vaidhi Nathan | Abnormal motion detector and monitor |
US7916944B2 (en) | 2007-01-31 | 2011-03-29 | Fuji Xerox Co., Ltd. | System and method for feature level foreground segmentation |
JP5147933B2 (en) | 2007-04-15 | 2013-02-20 | エクストリーム リアリティー エルティーディー. | Man-machine interface device system and method |
WO2008134745A1 (en) | 2007-04-30 | 2008-11-06 | Gesturetek, Inc. | Mobile video-based therapy |
US8432377B2 (en) | 2007-08-30 | 2013-04-30 | Next Holdings Limited | Optical touchscreen with improved illumination |
US8041116B2 (en) | 2007-09-27 | 2011-10-18 | Behavioral Recognition Systems, Inc. | Identifying stale background pixels in a video analysis system |
US8005263B2 (en) | 2007-10-26 | 2011-08-23 | Honda Motor Co., Ltd. | Hand sign recognition using label assignment |
US9451142B2 (en) | 2007-11-30 | 2016-09-20 | Cognex Corporation | Vision sensors, systems, and methods |
US8107726B2 (en) | 2008-06-18 | 2012-01-31 | Samsung Electronics Co., Ltd. | System and method for class-specific object segmentation of image data |
AU2009281762A1 (en) | 2008-08-15 | 2010-02-18 | Brown University | Method and apparatus for estimating body shape |
US20110163948A1 (en) | 2008-09-04 | 2011-07-07 | Dor Givon | Method system and software for providing image sensor based human machine interfacing |
CA2741559A1 (en) | 2008-10-24 | 2010-04-29 | Extreme Reality Ltd. | A method system and associated modules and software components for providing image sensor based human machine interfacing |
US8289440B2 (en) | 2008-12-08 | 2012-10-16 | Lytro, Inc. | Light field data acquisition devices, and methods of using and manufacturing same |
KR101738569B1 (en) | 2009-02-17 | 2017-05-22 | 인텔 코포레이션 | Method and system for gesture recognition |
US8320619B2 (en) | 2009-05-29 | 2012-11-27 | Microsoft Corporation | Systems and methods for tracking a model |
US8466934B2 (en) | 2009-06-29 | 2013-06-18 | Min Liang Tan | Touchscreen interface |
US8270733B2 (en) | 2009-08-31 | 2012-09-18 | Behavioral Recognition Systems, Inc. | Identifying anomalous object types during classification |
US9218126B2 (en) | 2009-09-21 | 2015-12-22 | Extreme Reality Ltd. | Methods circuits apparatus and systems for human machine interfacing with an electronic appliance |
US8878779B2 (en) | 2009-09-21 | 2014-11-04 | Extreme Reality Ltd. | Methods circuits device systems and associated computer executable code for facilitating interfacing with a computing platform display screen |
US8659592B2 (en) | 2009-09-24 | 2014-02-25 | Shenzhen Tcl New Technology Ltd | 2D to 3D video conversion |
US20110292036A1 (en) | 2010-05-31 | 2011-12-01 | Primesense Ltd. | Depth sensor with application interface |
CA2806520C (en) | 2011-01-23 | 2016-02-16 | Extreme Reality Ltd. | Methods, systems, devices and associated processing logic for generating stereoscopic images and video |
US9251422B2 (en) | 2011-11-13 | 2016-02-02 | Extreme Reality Ltd. | Methods systems apparatuses circuits and associated computer executable code for video based subject characterization, categorization, identification and/or presence response |
2005
- 2005-07-31: US application US11/572,958, published as US8114172B2 (status: Active)
- 2005-07-31: CA application CA2575704A, published as CA2575704C (status: Active)
- 2005-07-31: KR application KR1020077004917A, published as KR101183000B1 (status: IP Right Cessation)
- 2005-07-31: EP application EP05763283A, published as EP1789928A4 (status: Withdrawn)
- 2005-07-31: KR application KR1020137014664A, published as KR101424942B1 (status: IP Right Cessation)
- 2005-07-31: JP application JP2007523240A, published as JP4904264B2 (status: Expired - Fee Related)
- 2005-07-31: WO application PCT/IL2005/000813, published as WO2006011153A2 (status: Application Filing)
- 2005-07-31: KR application KR1020127007138A, published as KR101238608B1 (status: IP Right Cessation)
- 2005-07-31: KR application KR1020127021137A, published as KR101295471B1 (status: IP Right Cessation)
- 2005-07-31: KR application KR1020137000289A, published as KR101323966B1 (status: IP Right Cessation)
2007
- 2007-05-01: US application US11/742,609, published as US8237775B2 (status: Active)
- 2007-05-01: US application US11/742,634, published as US8111284B1 (status: Active)
2011
- 2011-10-07: JP application JP2011223094A, published as JP5244951B2 (status: Expired - Fee Related)
2012
- 2012-06-24: US application US13/531,543, published as US9177220B2 (status: Active)
2013
- 2013-02-15: JP application JP2013027962A, published as JP2013137785A (status: Pending)
- 2013-04-05: JP application JP2013079602A, published as JP2013157014A (status: Pending)
2015
- 2015-10-15: US application US14/883,702, published as US20160105661A1 (status: Abandoned)
2018
- 2018-10-02: US application US16/149,457, published as US20190200003A1 (status: Abandoned)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230068731A1 (en) * | 2020-03-17 | 2023-03-02 | Sony Group Corporation | Image processing device and moving image data generation method |
US20220027656A1 (en) * | 2020-07-24 | 2022-01-27 | Ricoh Company, Ltd. | Image matching method and apparatus and non-transitory computer-readable medium |
US11948343B2 (en) * | 2020-07-24 | 2024-04-02 | Ricoh Company, Ltd. | Image matching method and apparatus and non-transitory computer-readable medium |
Also Published As
Publication number | Publication date |
---|---|
US20120320052A1 (en) | 2012-12-20 |
US8111284B1 (en) | 2012-02-07 |
US9177220B2 (en) | 2015-11-03 |
KR101238608B1 (en) | 2013-02-28 |
JP2012038334A (en) | 2012-02-23 |
CA2575704C (en) | 2014-03-04 |
JP5244951B2 (en) | 2013-07-24 |
CA2575704A1 (en) | 2006-02-02 |
KR20120040751A (en) | 2012-04-27 |
KR20070048752A (en) | 2007-05-09 |
US8114172B2 (en) | 2012-02-14 |
KR101424942B1 (en) | 2014-08-01 |
KR20130086061A (en) | 2013-07-30 |
WO2006011153A2 (en) | 2006-02-02 |
KR101323966B1 (en) | 2013-10-31 |
US8237775B2 (en) | 2012-08-07 |
EP1789928A2 (en) | 2007-05-30 |
WO2006011153A3 (en) | 2008-10-16 |
JP4904264B2 (en) | 2012-03-28 |
JP2013157014A (en) | 2013-08-15 |
KR20120096600A (en) | 2012-08-30 |
US20070285419A1 (en) | 2007-12-13 |
KR101295471B1 (en) | 2013-08-09 |
EP1789928A4 (en) | 2011-03-16 |
JP2008508590A (en) | 2008-03-21 |
JP2013137785A (en) | 2013-07-11 |
KR101183000B1 (en) | 2012-09-18 |
KR20130020717A (en) | 2013-02-27 |
US20160105661A1 (en) | 2016-04-14 |
US20080037829A1 (en) | 2008-02-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20190200003A1 (en) | | System and method for 3d space-dimension based image processing |
US8462198B2 (en) | | Animation generation systems and methods |
CN110544301A (en) | | Three-dimensional human body action reconstruction system, method and action training system |
US20120038739A1 (en) | | Methods, systems, and computer readable media for shader-lamps based physical avatars of real and virtual people |
WO2019140945A1 (en) | | Mixed reality method applied to flight simulator |
CN113421328A (en) | | Three-dimensional human body virtual reconstruction method and device |
Cheung et al. | | Markerless human motion transfer |
JP6799468B2 (en) | | Image processing equipment, image processing methods and computer programs |
Vasudevan et al. | | A methodology for remote virtual interaction in teleimmersive environments |
JP2023057498A (en) | | Motion attitude evaluating system by overlapping comparison of images |
KR100753965B1 (en) | | Posture capture system of puppet and method thereof |
CN109309827A (en) | | More people's apparatus for real time tracking and method for 360 ° of suspension light field three-dimensional display systems |
Blasko | | Vision-based camera matching using markers |
JP4822307B2 (en) | | 3D object restoration method and apparatus |
WO2023022606A1 (en) | | Systems and methods for computer animation of an artificial character using facial poses from a live actor |
Magnenat-Thalmann et al. | | VIRTUAL MIRROR: A real-time motion capture application for virtual-try-on |
Rasool | | Tangible images |
Delbridge | | Directing for the 360 degree frame: developing a directorial approach to performance capture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |