CN108765529A

CN108765529A - Video generation method and device

Info

Publication number: CN108765529A
Application number: CN201810419784.XA
Authority: CN
Inventors: 邓澍军
Original assignee: Beijing Bit Intelligence Technology Co Ltd
Current assignee: Beijing Bit Intelligence Technology Co Ltd
Priority date: 2018-05-04
Filing date: 2018-05-04
Publication date: 2018-11-06

Abstract

The application proposes a kind of video generation method and device, wherein method includes：Depth information acquisition is carried out to the first object in the first video, the depth information based on acquisition builds the 3D models of first object；Obtain the face image data of the second object；It is added to the face image data in the 3D models of first object, obtains the 3D models of second object；Using the 3D models of second object, the 3D models of first object in first video are replaced, generate the second video.Pass through this method, it can realize and 3D model replacements are carried out to the figure image having in video, obtain the individualized video of 3 D stereo, the personage in video can be replaced with the image of oneself by user according to demand, it is participated in video thereby using family, improves sense of participation and the experience sense of user.

Description

Video generation method and device

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a video generation method and apparatus.

Background

In order to meet the requirement of users for independently making videos, some video making software is produced at the same time. However, most of the existing video production software obtains a video file by splicing a plurality of pictures, and in the obtained video file, each character image still appears in the form of a two-dimensional image, so that the stereoscopic impression is poor.

In addition, in the related art, the editing operation on the video file generally only supports simple processing operations such as clipping, creating subtitles, adding pictures, background music, and the like, and cannot perform personalized editing on characters or things contained in the video file.

Disclosure of Invention

The application provides a video generation method and device, which aim to solve the technical problem that in the prior art, the stereoscopic impression of videos which cannot be individually edited and manufactured by people or things contained in video files is poor.

Therefore, the first aspect of the present application provides a video generation method to implement 3D model replacement on a character image in an existing video to obtain a three-dimensional personalized video, and a user can replace the character in the video with its own image according to the requirement, so that the user participates in the video, and the participation and experience of the user are improved.

A second aspect of the present application provides a video generating apparatus.

A third aspect of the present application provides an electronic device.

A fourth aspect of the present application provides a non-transitory computer-readable storage medium.

A fifth aspect of the present application proposes a computer program product.

An embodiment of a first aspect of the present application provides a video generation method, including:

acquiring depth information of a first object in a first video, and constructing a 3D model of the first object based on the acquired depth information;

acquiring face image data of a second object;

adding the face image data to the 3D model of the first object to obtain a 3D model of the second object;

and replacing the 3D model of the first object in the first video by the 3D model of the second object to generate a second video.

According to the video generation method, the first object in the first video is subjected to depth information acquisition, the 3D model of the first object is built based on the acquired depth information, the face image data of the second object is obtained, the face image data is added into the 3D model of the first object, the 3D model of the second object is obtained, and then the 3D model of the first object in the first video is replaced by the 3D model of the second object, and the second video is obtained. Therefore, the 3D model of the first object in the first video is replaced by the 3D model of the second object, the 3D model replacement of the character image in the existing video can be achieved, the three-dimensional personalized video is obtained, and the user can replace the character in the video with the image of the user according to the requirement, so that the user participates in the video, and the participation and experience of the user are improved.

An embodiment of a second aspect of the present application provides a video generating apparatus, including:

the device comprises a construction module, a data acquisition module and a data processing module, wherein the construction module is used for acquiring depth information of a first object in a first video and constructing a 3D model of the first object based on the acquired depth information;

the acquisition module is used for acquiring face image data of a second object;

the fitting module is used for adding the face image data into the 3D model of the first object to obtain a 3D model of the second object;

and the generating module is used for replacing the 3D model of the first object in the first video by utilizing the 3D model of the second object to generate a second video.

The video generation device in the embodiment of the application acquires depth information of a first object in a first video, constructs a 3D model of the first object based on the acquired depth information, acquires face image data of a second object, adds the face image data into the 3D model of the first object to obtain the 3D model of the second object, and then replaces the 3D model of the first object in the first video with the 3D model of the second object to obtain a second video. Therefore, the 3D model of the first object in the first video is replaced by the 3D model of the second object, the 3D model replacement of the character image in the existing video can be achieved, the three-dimensional personalized video is obtained, and the user can replace the character in the video with the image of the user according to the requirement, so that the user participates in the video, and the participation and experience of the user are improved.

An embodiment of a third aspect of the present application provides an electronic device, including: a processor and a memory; wherein the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the video generation method according to the embodiment of the first aspect.

A fourth aspect of the present application is directed to a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the video generation method according to the first aspect.

An embodiment of a fifth aspect of the present application provides a computer program product, where instructions of the computer program product, when executed by a processor, perform the video generation method according to the embodiment of the first aspect.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

fig. 1 is a schematic flowchart of a video generation method according to an embodiment of the present application;

fig. 2 is a schematic flowchart of another video generation method provided in the embodiment of the present application;

fig. 3 is a schematic flowchart of another video generation method according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present disclosure;

fig. 5 is a schematic structural diagram of another video generating apparatus according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of another video generating apparatus according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

fig. 8 is a diagram illustrating a hardware structure of an electronic device according to an embodiment of the present application; and

fig. 9 is a schematic diagram illustrating a computer-readable storage medium according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.

A video generation method and apparatus of an embodiment of the present application are described below with reference to the drawings.

Fig. 1 is a schematic flowchart of a video generation method according to an embodiment of the present application.

As shown in fig. 1, the video generation method includes the steps of:

step 101, collecting depth information of a first object in a first video, and constructing a 3D model of the first object based on the collected depth information.

The first object may be a real person model, such as a foreign child, participating in recording the first video, or may be a cartoon character image in the first video.

As an example, when the first object is a cartoon character image, the first video may be a 3D animation including the first object, the cartoon character image and the object in the 3D animation being presented in the form of a three-dimensional solid, the cartoon character image in the 3D animation carrying depth information. Thus, in this example, the depth information of the first object may be directly acquired from the 3D animation, and a 3D model of the first object may be constructed using the acquired depth information. For example, a user may select a cartoon character from a first video as a first object by means of clicking, after receiving a click operation of the user on the cartoon character in the first video, the device determines the cartoon character targeted by the click operation as the first object selected by the user, acquires depth information of the cartoon character (i.e., the first object) from the first video, and constructs a 3D model of the first object by using the depth information.

As an example, when the first object is a real model, the first video may be a video file recorded in a real scene, a cartoon scene, or the like using the first object, for example, the first video may be a video of the recorded first object and a foreign teacher talking or singing on a grass, or the first video may be a music clip, a story clip, or the like in which the first object participates, which is shot using a green or blue curtain as a background. In this example, the depth information of the first object may be previously acquired and the 3D model may be constructed by the depth information acquiring means before or after the first video recording.

For example, a structured light projector may be provided in the apparatus to form depth information of the actual scene by collecting reflected light of the structured light on the actual scene. When the depth information of the first object needs to be acquired, the structured light projector is started, and after the structured light emitted by the structured light projector reaches the first object, the first object obstructs the structured light, so that the structured light is reflected at the first object to form reflected light. At this time, reflected light formed on the first object by the structured light may be collected by a camera installed in the apparatus, and depth information of the first object is formed using the collected reflected light. Furthermore, a 3D model of the first object may be constructed from the acquired depth information.

When constructing the 3D model of the first object, the feature point data for forming the 3D model may be extracted from the depth information, and the feature points may be connected into a network according to the extracted feature point data. For example, according to the distance relationship of each point in space, points of the same plane or points with distances within a threshold range are connected into a triangular network, and then the networks are spliced, so that the 3D model can be constructed.

In practical application, when the first object is a real-person model, a 3D model of the first object may be established in advance, a first video including the first object may be recorded in advance, and the 3D model of the first object and the corresponding at least one first video may be stored in advance for selection by a user. When the user wants to participate in the first video only including a character, the 3D model of the first object is determined after the user selects the first video; when the first video in which the user wants to participate includes at least two character images, the user may display the character images included in the first video to the user after selecting the first video, select one of the character images as a first object by the user, and determine a 3D model corresponding to the first object from the pre-stored 3D models after selecting the first object by the user.

Step 102, facial image data of a second subject is acquired.

The second object is a person who wants to participate in the first video, and may be, for example, the user himself or a family or a friend of the user.

As an example, when the face image data of the second object is acquired, the face image data may be acquired by performing face detection on the second object by the image sensor. For example, when a user wants to participate in a first video, the user may turn on a front camera of the device, place a human face within a visible range of the front camera, and start the image sensor to acquire the facial image data of the user after the front camera detects the human face.

As an example, when acquiring the face image data of the second object, candidate images including a face region of the second object may be acquired from an image library of the second object, and then for each candidate image, a ratio of the face region in the candidate image is extracted, and if the ratio exceeds a preset threshold, the candidate image is taken as a target image, and then the face image data is extracted from the target image. For example, a preset threshold may be set to 50%, and when the proportion of the face region in the candidate image reaches 50%, the candidate image is determined as the target image. If the number of candidate images of the face area in the candidate images reaches the threshold value is multiple, determining the candidate image with the highest ratio as the target image; if the ratio of the face area in the candidate image does not reach the preset threshold value, the candidate image with the highest ratio can be determined as the target image, or the second object is reminded to start the camera to acquire the face image data. Furthermore, a related face recognition technique may be employed to extract face image data from the determined target image.

The face image data is extracted by determining the image with the face area having the ratio in the candidate image reaching the preset threshold as the target image, so that the face image data with higher definition can be ensured to be acquired.

Step 103, adding the face image data to the 3D model of the first object to obtain the 3D model of the second object.

In this embodiment, after the face image data of the second object is obtained, the face region of the first object may be recognized from the 3D model of the first object, and then, the face image data of the second object is placed on the face region of the first object recognized from the 3D model of the first object, so as to obtain the 3D model of the second object.

In order to ensure the accuracy and the integrity of the attachment of the face image data and avoid that the face image data of the second object is too large or too small after being added into the 3D model of the first object, so that the obtained 3D model is not coordinated enough, in a possible implementation manner of the embodiment of the present application, a face region can be identified from the 3D model of the first object, and the size of the face region is obtained, and then the size of the face image corresponding to the face image data of the second object is adjusted according to the obtained size, so that the adjusted face image is matched with the size of the face region in the 3D model, and then the adjusted face image is attached into the face region, so that a relatively complete and coordinated 3D model of the second object is obtained.

And 104, replacing the 3D model of the first object in the first video by the 3D model of the second object to generate a second video.

The first video is a video file in which the second object wants to participate, and the first video comprises the first object; the second video includes a 3D model of a second object.

In this embodiment, after obtaining the 3D model of the second object, the 3D model of the first object in the first video may be replaced with the 3D model of the second object, so as to obtain a second video file including the 3D model of the second object.

In the video generation method of this embodiment, depth information is acquired for a first object in a first video, a 3D model of the first object is constructed based on the acquired depth information, face image data of a second object is acquired, the face image data is added to a face region in the 3D model of the first object to obtain the 3D model of the second object, and the 3D model of the first object in the first video is replaced by the 3D model of the second object to obtain a second video. Therefore, the 3D model of the first object in the first video is replaced by the 3D model of the second object, the 3D model replacement of the character image in the existing video can be achieved, the three-dimensional personalized video is obtained, and the user can replace the character in the video with the image of the user according to the requirement, so that the user participates in the video, and the participation and experience of the user are improved.

In order to describe more clearly the implementation process of generating the second video by replacing the 3D model of the first object in the first video with the 3D model of the second object in the foregoing embodiment, the embodiment of the present application proposes another video generation method,

fig. 2 is a schematic flowchart of another video generation method according to an embodiment of the present disclosure.

As shown in fig. 2, step 104 may include the following steps based on the embodiment shown in fig. 1:

step 201, aiming at each frame of a first picture in a first video, identifying a first object from the first picture according to the characteristic information of the first object.

The feature information of the first object may be, for example, facial feature information of the first object, including one or more of eyes, eyebrows, nose, mouth, and face.

In this embodiment, for each frame of the first picture in the first video, the first object may be identified from the first picture according to the feature information of the first object. For example, the face feature information of each character included in the first screen may be matched with the face feature information of the first object, and the character having the highest degree of matching may be determined as the first object.

Step 202, a 3D model of the first object is extracted from the first picture, and a 3D model of the second object is filled in a blank area after the 3D model of the first object is extracted to form a second picture.

In this embodiment, after the first object is identified from the first image, the 3D model of the first object in the first image may be extracted, and then the acquired 3D model of the second object is filled into a blank area of the first image after the 3D model of the first object is extracted, so as to obtain a second image including the 3D model of the second object.

And step 203, synthesizing a second video by using the second picture.

In this embodiment, after replacing the 3D model of the first object in each frame of the first image with the 3D model of the second object, a plurality of frames of the second image are obtained, and then, the second image is utilized to obtain a second video by synthesis, where the second video includes the 3D model of the second object.

Specifically, when the second video is synthesized by using the second picture, the expression data on the 3D model of the first object may be extracted from the first picture, and then the expression on the 3D model of the second object in the second picture of the same frame as the first picture is controlled according to the expression data, so that the second video is synthesized by using the second picture carrying the expression.

In a specific implementation, after the first object is recognized from the first picture, expression data on the 3D model of the first object may be extracted, and the extracted expression data may be cached. And after the second picture is obtained, controlling the expression on the 3D model of the second object in the second picture by using the cached expression data of the first object, so that the expression on the 3D model of the second object in the second picture is consistent with the expression on the 3D model of the first object in the first picture.

The expression data on the 3D model of the first object is used for controlling the expression on the 3D model of the second object in the second picture, so that the expression of the second object in the obtained second video can be consistent with the expression of the first object in the first video, the richness of the expression of the second object in the second video is improved, and the second object in the second picture is more vivid.

Because the dressing style is different from person to person, especially the colors and styles of dresses are greatly different between boys and girls. In order to improve the reality of the second object in the second video, so that the second object in the second video conforms to its own real image, in one possible implementation manner of the embodiment of the present application, when the facial image data of the second object is obtained, the clothing feature of the second object may also be obtained synchronously, where the clothing feature may include, but is not limited to, the style, color, and the like of the second object. Therefore, before the second video is synthesized by the second picture, the clothing feature of the second object in the second picture can be updated by the obtained clothing feature of the second object, and the clothing feature is updated for the second object of which the clothing feature in the second picture is the clothing feature of the first object, so that after the clothing feature is updated, the clothing feature of the second object in the second picture is consistent with the actual dress of the second object, and the reality of the second object in the second video is improved.

Since the first object participating in the recording of the first video may be different from the country to which the second object belongs and the skin color of people in different countries may be different, for example, the skin color of people in european countries is white and the skin color of people in asian countries is yellow. In order to improve the reality of the second object in the second video, so that the skin color of the second object in the second video is consistent with the real skin color thereof, in a possible implementation manner of the embodiment of the application, when the face image data of the second object is obtained, the skin color feature of the second object may also be obtained from the face image data. For example, the color values of the pixel points in the face region in the face image data may be obtained, and then the skin color feature of the second object may be determined according to the color values. Furthermore, before the second video is synthesized by using the second picture, the skin color feature of the second object in the second picture can be updated by using the obtained skin color feature of the second object, wherein the skin color feature of the second object before updating is the skin color feature of the first object. Therefore, after the skin color feature of the second object in the second picture is updated by using the obtained skin color feature of the second object, the skin color feature of the second object in the second picture can be consistent with the skin color of the second object, and the authenticity of the second object in the second video is improved.

According to the video generation method, the first object is identified from the first image according to the characteristic information of the first object aiming at each frame of the first image in the first video, the 3D model of the first object is extracted from the first image, the 3D model of the second object is filled in a blank area after the 3D model of the first object is extracted to form the second image, the second image is further synthesized to form the second video, the image of the person in the video can be replaced by the image of the user, the personalized editing of the video is realized, and the participation and experience of the user are improved.

In order to more clearly describe that the facial image data is added to the 3D model of the first object to obtain the 3D model of the second object in the foregoing embodiment, another video generation method is proposed in the embodiment of the present application, and fig. 3 is a flowchart of another video generation method provided in the embodiment of the present application.

As shown in fig. 3, step 103 may include the following steps based on the embodiment shown in fig. 1:

step 301, identifying the key point position of the second object from the face image data, and obtaining the first central point of the key point of the second object according to each key point position of the second object.

Wherein the keypoints of the second subject may be, for example, the eyes, eyebrows, mouth, nose, and ears of the second subject.

In this embodiment, after the face image data of the second object is obtained, the positions of the key points of the second object may be identified from the face image data according to the shapes of the facial organs such as ears and nose, and then the first central point of the key point of the second object is obtained according to the position of each key point of the second object.

Example one, the first center point may be one. In this example, the first center point may be a region where a nose of the second object is located; or the first center point may be determined according to the positions of all the key points of the second object, a dot of the minimum circular region including all the key points of the second object may be used as the first center point, or an intersection point of two diagonal lines of the minimum rectangular region including all the key points of the second object may be used as the first center point, and the determination manner of the first center point is not limited in the present application.

Example two, the number of first center points is consistent with the number of keypoints. In this example, for each keypoint, the first central point may be determined according to the central position of the area covered by the keypoint. For example, when the key point is an eye, the first central point may be a pixel point where the central position of the eye is located; when the key point is the mouth, the first center point may be the midpoint of the mouth corner line.

Step 302, identifying a face region of the first object from the 3D model of the first object, identifying key point positions of the first object from the face region, and obtaining a second center point of the key point of the first object according to each key point position of the first object.

In this embodiment, for the constructed 3D model of the first object, a face recognition technology may be adopted to recognize a face region of the first object from the 3D model of the first object, and the key point position of the first object may be recognized from the face region of the first object according to the shape of the face organs such as ears and nose. Wherein the key points of the first object and the key points of the second object comprise the same facial organs, for example, the key points of the second object are ears and noses, and the identified key points of the first object also comprise ears and noses; the key points of the second object are mouth, eyes and nose, and the identified key points of the first object also include mouth, eyes and nose. Further, the second center point of the first object keypoint may be obtained from each keypoint location of the first object in the same way as the first center point of the second object keypoint is obtained.

Step 303, adding the face image data to the face region in the 3D model of the first object according to the first center point and the second center point.

In this embodiment, after the first central point and the second central point are obtained, the face image data of the second object may be added to the face region in the 3D model of the first object according to the first central point and the second central point, so as to obtain the 3D model of the second object.

As an example, when the number of the first central point and the second central point is one, the first central point may be placed at a position where the second central point is located, and then the key point of the second object is attached to the face region in the 3D model of the first object according to the relative position relationship between the key point of the second object and the first central point.

In a second example, when the number of the first central points and the second central points is multiple, for a key point that is the same as the first object and the second object, the key point of the second object may be attached to the face region in the 3D model of the first object according to the first central point and the second central point that correspond to the key point.

In the video generation method of this embodiment, by identifying the keypoint of the second object from the face image data, acquiring the first center point of the keypoint of the second object, identifying the face region of the first object from the 3D model of the first object, identifying the keypoint position of the first object from the face region, acquiring the second center point of the keypoint of the first object according to each keypoint position of the first object, and adding the face image data to the face region in the 3D model of the first object according to the first center point and the second center point to obtain the 3D model of the second object, the 3D model of the second object that is consistent with the 3D model contour of the first object can be obtained, so as to provide a condition for replacing the 3D model of the first object in the first video with the 3D model of the second object.

In order to implement the above embodiments, the present application further provides a video generating apparatus.

Fig. 4 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application.

As shown in fig. 4, the video generation apparatus 40 includes: a build module 410, an acquire module 420, a fit module 430, and a generate module 440.

The constructing module 410 is configured to perform depth information acquisition on a first object in a first video, and construct a 3D model of the first object based on the acquired depth information.

An obtaining module 420, configured to obtain face image data of the second object.

Specifically, the obtaining module 420 is configured to perform face detection on the second object through the image sensor, and obtain face image data; or, acquiring a candidate image including a face region of the second object from an image library of the second object, extracting the occupation ratio of the face region in the candidate image for each candidate image, and if the occupation ratio exceeds a preset threshold, taking the candidate image as a target image, and further extracting face image data from the target image.

And the fitting module 430 is configured to add the face image data to the 3D model of the first object to obtain a 3D model of the second object.

A generating module 440, configured to replace the 3D model of the first object in the first video with the 3D model of the second object, and generate a second video.

Further, in a possible implementation manner of the embodiment of the present application, as shown in fig. 5, on the basis of the embodiment shown in fig. 4, the generating module 440 may include:

the identifying unit 441 is configured to identify, for each frame of the first picture in the first video, the first object from the first picture according to the feature information of the first object.

A filling unit 442, configured to extract a 3D model of the first object from the first frame, and fill the 3D model of the second object in a blank area after extracting the 3D model of the first object to form a second frame.

A synthesizing unit 443 configured to synthesize the second video using the second picture.

Specifically, the synthesizing unit 443 is configured to extract expression data on a 3D model of the first object from the first screen; controlling the expression on the 3D model of the second object in the second picture of the same frame as the first picture according to the expression data; and synthesizing a second video by using the second picture carrying the expression.

In a possible implementation manner of the embodiment of the present application, when the obtaining module 420 obtains the face image data of the second object, the clothing feature of the second object may also be obtained. Thus, in this embodiment, the synthesizing unit 443 is further configured to update the clothing feature of the second object in the second screen with the obtained clothing feature of the second object before synthesizing the second video; and the clothing feature of the second object is the clothing feature of the first object before updating, so that the clothing feature of the second object in the second picture is consistent with the actual clothing of the second object, and the reality of the second object in the second video is improved.

In a possible implementation manner of the embodiment of the present application, when the obtaining module 420 obtains the face image data of the second object, the skin color feature of the second object may also be obtained from the face image data. Therefore, in this embodiment, the synthesizing unit 443 is further configured to update the skin color feature of the second object in the second picture by using the obtained skin color feature of the second object before synthesizing the second video; and the skin color feature of the second object is the skin color feature of the first object before updating, so that the skin color feature of the second object in the second picture is consistent with the skin color of the second object, and the reality of the second object in the second video is improved.

According to each frame of the first picture in the first video, the first object is identified from the first picture according to the characteristic information of the first object, then the 3D model of the first object is extracted from the first picture, the 3D model of the second object is filled in a blank area after the 3D model of the first object is extracted to form a second picture, then the second picture is used for synthesizing the second video, the figure image in the video can be replaced by the image of the user, the personalized editing of the video is realized, and the participation and experience of the user are improved.

In a possible implementation manner of the embodiment of the present application, as shown in fig. 6, on the basis of the embodiment shown in fig. 4, the attaching module 430 may include:

an obtaining unit 431, configured to identify a key point position of the second object from the face image data, and obtain a first center point of a key point of the second object according to each key point position of the second object; and identifying a face region of the first object from the 3D model of the first object, identifying key point positions of the first object from the face region, and acquiring a second central point of the key points of the first object according to each key point position of the first object.

And the fitting unit 432 is configured to add the face image data to the face region in the 3D model of the first object according to the first center point and the second center point.

The 3D model of the second object can be obtained by obtaining a first central point of a key point of the second object and a second central point of the key point of the first object and then adding the face image data into the face region in the 3D model of the first object according to the first central point and the second central point, so that the 3D model of the second object with the contour consistent with that of the 3D model of the first object can be obtained, and conditions are provided for replacing the 3D model of the first object in the first video by using the 3D model of the second object.

It should be noted that the foregoing explanation on the embodiment of the video generation method is also applicable to the video generation apparatus of the embodiment, and the implementation principle thereof is similar and will not be described herein again.

The video generation apparatus of this embodiment acquires depth information of a first object in a first video, constructs a 3D model of the first object based on the acquired depth information, acquires face image data of a second object, adds the face image data to the 3D model of the first object to obtain the 3D model of the second object, and further replaces the 3D model of the first object in the first video with the 3D model of the second object to obtain a second video. Therefore, the 3D model of the first object in the first video is replaced by the 3D model of the second object, the 3D model replacement of the character image in the existing video can be achieved, the three-dimensional personalized video is obtained, and the user can replace the character in the video with the image of the user according to the requirement, so that the user participates in the video, and the participation and experience of the user are improved.

In order to implement the above embodiments, the present application further provides an electronic device.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic apparatus 80 includes: a processor 801 and a memory 802. Wherein, the processor 801 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 802 for implementing the video generation method as described in the foregoing embodiments.

Fig. 8 is a diagram illustrating a hardware structure of an electronic device according to an embodiment of the present application. The electronic device may be implemented in various forms, and the electronic device in the present application may include, but is not limited to, mobile terminal devices such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation apparatus, a vehicle-mounted terminal device, a vehicle-mounted display terminal, a vehicle-mounted electronic rear view mirror, and the like, and fixed terminal devices such as a digital TV, a desktop computer, and the like.

As shown in fig. 8, the electronic apparatus 1100 may include a wireless communication unit 1110, an a/V (audio/video) input unit 1120, a user input unit 1130, a sensing unit 1140, an output unit 1150, a memory 1160, an interface unit 1170, a controller 1180, a power supply unit 1190, and the like. Fig. 8 shows a terminal device having various components, but it is to be understood that not all of the illustrated components are required to be implemented. More or fewer components may alternatively be implemented.

The wireless communication unit 1110 allows radio communication between the electronic apparatus 1100 and a wireless communication system or a network, among others. The a/V input unit 1120 is for receiving an audio or video signal. The user input unit 1130 may generate key input data to control various operations of the electronic apparatus according to a command input by a user. The sensing unit 1140 detects a current state of the electronic device 1100, a position of the electronic device 1100, presence or absence of a touch input by a user to the electronic device 1100, an orientation of the electronic device 1100, acceleration or deceleration movement and direction of the electronic device 1100, and the like, and generates a command or signal for controlling an operation of the electronic device 1100. The interface unit 1170 serves as an interface through which at least one external device is connected to the electronic apparatus 1100. The output unit 1150 is configured to provide output signals in a visual, audio, and/or tactile manner. The memory 1160 may store software programs and the like for processing and controlling operations performed by the controller 1180, or may temporarily store data that has been output or is to be output. Memory 1160 may include at least one type of storage media. Also, the electronic apparatus 1100 may cooperate with network storage that performs storage functions of the memory 1160 via a network connection. The controller 1180 generally controls the overall operation of the electronic device. In addition, the controller 1180 may include a multimedia module for reproducing or playing back multimedia data. The controller 1180 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image. The power supply unit 1190 receives external power or internal power and provides appropriate power required to operate the various elements and components under the control of the controller 1180.

Various embodiments of the video generation methods presented herein may be implemented using a computer-readable medium, such as computer software, hardware, or any combination thereof. For a hardware implementation, various embodiments of the video generation method proposed by the present application may be implemented by using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, an electronic unit designed to perform the functions described herein, and in some cases, various embodiments of the video generation method proposed by the present application may be implemented in the controller 1180. For software implementation, various embodiments of the video generation method presented herein may be implemented with separate software modules that allow for performing at least one function or operation. The software codes may be implemented by software applications (or programs) written in any suitable programming language, which may be stored in memory 1160 and executed by controller 1180.

In order to implement the above embodiments, the present application also proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video generation method as described in the foregoing embodiments.

Fig. 9 is a schematic diagram illustrating a computer-readable storage medium according to an embodiment of the present application. As shown in fig. 9, a computer-readable storage medium 300 having non-transitory computer-readable instructions 310 stored thereon according to an embodiment of the application. The non-transitory computer readable instructions 310, when executed by a processor, perform all or a portion of the steps of the video generation method of the embodiments of the present application as previously described.

To implement the above embodiments, the present application also proposes a computer program product, in which instructions, when executed by a processor, implement the video generation method as described in the foregoing embodiments.

In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. A method of video generation, comprising:

acquiring face image data of a second object;

2. The method of claim 1, wherein replacing the 3D model of the first object in the first video with the 3D model of the second object to generate a second video comprises:

for each frame of a first picture in the first video, identifying the first object from the first picture according to the characteristic information of the first object;

the 3D model of the first object is extracted from the first picture, and the 3D model of the second object is filled in a blank area after the 3D model of the first object is extracted to form a second picture;

and synthesizing the second video by using the second picture.

3. The method of claim 2, wherein the synthesizing the second video using the second picture comprises:

extracting expression data on a 3D model of the first object from the first picture;

controlling the expression on the 3D model of the second object in the second picture of the same frame as the first picture according to the expression data;

and synthesizing the second video by using the second picture carrying the expression.

4. The method of claim 2 or 3, further comprising:

synchronously acquiring the clothing characteristics of the second object when acquiring the face image data of the second object;

before the synthesizing the second video, the method further comprises:

updating the clothing feature of the second object in the second picture by using the obtained clothing feature of the second object; wherein the clothing feature of the second object before updating is the clothing feature of the first object.

5. The method of claim 2 or 3, further comprising:

obtaining skin color features of the second object from the face image data;

before the synthesizing the second video, the method further comprises:

updating the skin color feature of the second object in the second picture by using the obtained skin color feature of the second object; wherein the skin tone feature of the second object before updating is the skin tone feature of the first object.

6. The method of claim 1, wherein said adding the facial image data to the 3D model of the first object comprises:

identifying the key point position of the second object from the face image data, and acquiring a first central point of the key point of the second object according to each key point position of the second object;

identifying a face region of the first object from the 3D model of the first object, identifying key point positions of the first object from the face region, and acquiring a second central point of the key points of the first object according to each key point position of the first object;

and adding the face image data into a face region in the 3D model of the first object according to the first central point and the second central point.

7. The method of claim 1, wherein the obtaining facial image data of the second subject comprises:

carrying out face detection on the second object through an image sensor to obtain face image data; or,

acquiring a candidate image comprising a face region of the second object from an image library of the second object;

for each candidate image, extracting the proportion of the face region in the candidate image, and if the proportion exceeds a preset threshold value, taking the candidate image as a target image;

extracting the facial image data from the target image.

8. A video generation apparatus, comprising:

9. An electronic device comprising a processor and a memory;

wherein the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory for implementing the video generation method according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the video generation method of any one of claims 1-7.