CN111709875B - Image processing method, device, electronic equipment and storage medium
- Publication number
- CN111709875B (application number CN202010549839.6A)
- Authority
- CN
- China
- Prior art date: 2020-06-16
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T3/00—Geometric image transformations in the plane of the image
        - G06T3/04—Context-preserving transformations, e.g. by using an importance map
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computing arrangements based on biological models
        - G06N3/02—Neural networks
          - G06N3/04—Architecture, e.g. interconnection topology
            - G06N3/045—Combinations of networks
          - G06N3/08—Learning methods
Abstract
The application discloses an image processing method and apparatus, an electronic device and a storage medium, and relates to the fields of image processing and deep learning. The specific implementation scheme is as follows: acquiring a training image containing a face image of a first type; selecting feature images of a second type corresponding to at least part of the facial regions of the first-type face image, wherein the first type of style is different from the second type of style; adjusting the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image; and determining a target network by using the training image containing the adjusted face image, the target network being a trained network for converting an input image to be processed that contains a face image into an output image containing a face image of the second type.
Description
Technical Field
The present application relates to the field of information processing, and in particular to the fields of image processing and deep learning.
Background
In the related art, a CycleGAN-style generative adversarial network (GAN) is generally used to convert images between different styles. However, such generative networks are strongly affected by the training data, and the finally generated images are prone to being uncontrollable and unclear.
Disclosure of Invention
The disclosure provides an image processing method, an image processing device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided an image processing method including:
acquiring a training image containing a face image of a first type;
selecting feature images of a second type corresponding to at least part of the facial regions of the first-type face image; wherein the first type of style is different from the second type of style;
adjusting the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image; and
determining a target network by using a training image containing the adjusted face image; the target network is a trained network for converting an input image to be processed that contains a face image into an output image containing a face image of the second type.
According to another aspect of the present disclosure, there is provided an image processing apparatus including:
an image acquisition module, configured to acquire a training image containing a face image of a first type;
an image preprocessing module, configured to select feature images of a second type corresponding to at least part of the facial regions of the first-type face image, and to adjust the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image; and
a training module, configured to determine a target network by using a training image containing the adjusted face image; the target network is a trained network for converting an input image to be processed that contains a face image into an output image containing a face image of the second type.
According to an aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the aforementioned method.
According to an aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the aforementioned method.
According to an aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
According to the technology of the application, when the target network is trained the training image is first preprocessed: the facial regions in the training image are adjusted to feature images of the second type, and the target network is then trained on the adjusted images. This reduces the task difficulty during training, lightens the load on the network, and makes the images generated by the network more controllable.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
FIG. 1 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of a second image processing method according to an embodiment of the present application;
FIG. 3 is a cartoon face image obtained after image conversion in the related art;
FIG. 4 is a semi-finished image produced during face image preprocessing according to an embodiment of the present application;
FIG. 5 is a first schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 6 is a second schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of an electronic device for implementing an image processing method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An embodiment of the present application provides an image processing method, as shown in fig. 1, including:
s101: acquiring a training image containing a face image of a first type;
s102: selecting feature images of a second type corresponding to at least part of the facial regions of the first-type face image; wherein the first type of style is different from the second type of style;
s103: adjusting the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image;
s104: determining a target network by using a training image containing the adjusted face image; the target network is a trained network for converting an input image to be processed that contains a face image into an output image containing a face image of the second type.
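By way of illustration only, the four steps above can be written as a minimal Python sketch. The helpers `detect_face_regions`, `analyze_region`, `select_cartoon_feature`, `paste_feature` and `train_cycle_gan` are hypothetical names introduced here (the latter two are sketched further below); none of them come from the patent itself.

```python
from PIL import Image

# Hypothetical preset configuration: which facial regions to adjust (cf. S102/S103).
CONFIG_REGIONS = ("eyes", "mouth")

def prepare_image(image, library, config_regions=CONFIG_REGIONS):
    """S102/S103: select matching second-type feature images and paste
    them onto the configured facial regions of a first-type photo."""
    regions = detect_face_regions(image)            # hypothetical detector: name -> box
    for name, box in regions.items():
        if name not in config_regions:              # regions come from the preset configuration
            continue
        attrs = analyze_region(image, box, name)                # hypothetical feature analysis
        feature = select_cartoon_feature(name, attrs, library)  # see matching sketch below
        image = paste_feature(image, feature, box)              # see pasting sketch below
    return image

# S101 + S104 (sketch): preprocess every training photo, then train the target network.
# adjusted = [prepare_image(Image.open(p).convert("RGB"), LIBRARY) for p in photo_paths]
# target_net = train_cycle_gan(adjusted, cartoon_images)   # hypothetical training routine
```

Because the pasted regions already look like the second type of style, the network trained in S104 only has to refine the remaining areas, which is the controllability argument developed below.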
The solution provided in this embodiment may be applied to an electronic device, for example, a server or a terminal device, which is not limited herein.
The training images containing first-type face images may be all or at least part of the images in a training image set. That is, in the process of training the target network, the foregoing method may be applied to every image in the training image set.
The first type of style is different from the second type of style. In one example, the first type of style can be understood as a photograph obtained by shooting, i.e., an image of a real person; the second type of style can be a cartoon style, an oil painting style, a traditional Chinese painting style, or the like. The two styles can of course be determined according to the practical situation; this example is not exhaustive.
In S102, selecting the feature images of the second type corresponding to at least part of the facial regions of the first-type face image includes:
selecting, from a preset image library, a feature image of the second type that matches the features of each facial region, based on the features of each facial region in the at least partial facial regions of the first-type face image.
Here, the preset image library may be different from the training image set; it mainly contains feature images of the second type.
Specifically, the second-type feature images are feature images of the different facial regions of a second-type face.
A facial region may be one of the facial features (the "five sense organs") in the face image, for example one of the eyes, nose, mouth, eyebrows and ears. Accordingly, a feature image of the second type is an image of such a facial feature (any one of the eyes, nose, mouth, eyebrows or ears) rendered in the second type of style.
In one example, selecting a feature image of the second type that matches the features of each facial region may specifically be:
selecting, from the preset image library, a feature image of the second type that matches at least one feature of each facial region.
The at least one feature may include at least one of: the size of the facial region, the gender of the face to which the facial region belongs, and the opening-and-closing angle of the facial region. Further features are of course possible; this embodiment is described by way of example only and not by way of limitation.
For example, the facial region may be an eye, and the corresponding features may include: the size of the eye, the gender of the face (e.g., female), whether the eye is open, the opening angle of the eye, and so on. Alternatively, the facial region may be a mouth, and the corresponding features may include: the width and height of the mouth, the gender of the face (e.g., male), whether the mouth is closed, and so on. Other facial regions and their corresponding features are possible and are not exhaustively listed.
Further, selecting the feature image of the second type that matches the features of each facial region may be done as follows:
matching the features of the facial region in turn against each feature image in the preset image library to obtain the matched feature image of the second type;
or, where the feature images stored in the preset image library carry corresponding labels or feature descriptions, matching the features of the facial region in turn against the labels or feature descriptions of each feature image in the preset image library to obtain the matched feature image of the second type.
In an example, the eyes and mouth are identified in the face image; the eyes are analysed to obtain features such as their size and opening angle, the mouth is analysed to obtain features such as its size and maximum opening angle, and the gender corresponding to the face image (male or female) is determined. Then, based on the analysed eye and mouth features, the corresponding second-type feature images of the eyes and mouth, i.e., cartoon-style eye and mouth images, are selected from the preset image library.
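A minimal sketch of this attribute-based selection follows, assuming the preset image library is a plain dictionary of labelled entries. The label keys (`gender`, `open_angle`) and the scoring rule are illustrative assumptions of this sketch, not part of the patent:

```python
def select_cartoon_feature(region_name, attrs, library):
    """Pick the second-type (cartoon) feature image whose stored labels
    best match the attributes analysed from the photo.
    `library` maps a region name to a list of (labels, image) entries,
    e.g. labels = {"gender": "female", "open_angle": 30.0}."""
    def score(labels):
        s = 0.0
        if labels.get("gender") == attrs.get("gender"):
            s += 1.0                                   # same gender preferred
        if "open_angle" in labels and "open_angle" in attrs:
            # smaller difference in opening angle -> higher score
            s -= abs(labels["open_angle"] - attrs["open_angle"]) / 90.0
        return s
    best_labels, best_image = max(library[region_name], key=lambda e: score(e[0]))
    return best_image
```

In practice the scoring rule would be tuned to the library; the point is only that selection is driven by the analysed attributes.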
Even if, say, the regions of the eyes, nose, mouth and ears can all be identified in the face image, the preset configuration may specify that only the eyes and mouth are to be adjusted; the "at least partial facial regions" are then the eyes and mouth among all the regions in the face image. That is, the at least partial regions may be understood as regions determined according to a preset configuration.
Alternatively, the at least partial facial regions may be the recognizable facial regions: for example, the face image may include eyes, nose, ears and mouth, but some region, say the ears, may not be clearly recognizable; the at least partial facial regions are then the eyes, nose and mouth.
Selecting the second-type feature images in this way yields images that better fit the features of the face image being adjusted, so that the output of the finally trained target network better matches the user's expectations.
In S103, adjusting the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image includes:
adding the second-type feature image corresponding to each facial region onto the corresponding facial region of the face image to obtain the adjusted face image.
For example, the at least partial facial regions may be the eyes and mouth, and the corresponding second-type feature images may be cartoon eye and mouth images; the cartoon eye and mouth images are pasted onto the eye and mouth positions of the face image to obtain the adjusted face image.
Equally, the corresponding facial regions of the face image can be replaced by the second-type feature images, that is, the images of the corresponding regions in the face image are replaced with the cartoon eye and mouth images to obtain the adjusted face image.
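The pasting step can be sketched with Pillow as follows; the box format and the alpha-compositing choice are assumptions of this sketch, not specified by the patent:

```python
from PIL import Image

def paste_feature(image, feature, box):
    """Paste ('map') a cartoon feature image over a facial region.
    `box` is (left, top, right, bottom) in the photo's pixel coordinates."""
    left, top, right, bottom = box
    feature = feature.convert("RGBA").resize((right - left, bottom - top))
    out = image.convert("RGBA")
    out.alpha_composite(feature, dest=(left, top))  # respects the sticker's alpha channel
    return out.convert("RGB")
```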
In S104, the target network may be a CycleGAN, but other networks are possible and are not exhaustively listed here.
It should be noted that if the target network is a CycleGAN, corresponding cartoon images (i.e., face images of the second type) also need to be provided for the other domain during training.
The training process of the CycleGAN itself is not described here.
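For orientation only, the following PyTorch fragment sketches one generator-side term of a standard CycleGAN objective; the patent does not specify the loss, and the names `G`, `F_back` and `D_cartoon` as well as the LSGAN/L1 choices are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def generator_loss(G, F_back, D_cartoon, photo, lam=10.0):
    """One generator-side term of a CycleGAN objective:
    G maps photo -> cartoon, F_back maps cartoon -> photo,
    D_cartoon judges whether a cartoon image looks real.
    With the pasted 'semi-finished' inputs, G only has to learn
    small refinements rather than the whole style conversion."""
    fake_cartoon = G(photo)
    pred = D_cartoon(fake_cartoon)
    adv = F.mse_loss(pred, torch.ones_like(pred))      # LSGAN-style adversarial loss
    cyc = F.l1_loss(F_back(fake_cartoon), photo)       # cycle-consistency loss
    return adv + lam * cyc
```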
With the target network trained, in a further example the solution provided in this embodiment may additionally include the following steps, as shown in fig. 2:
s201: acquiring an image to be processed, and extracting a first-type face image to be processed from the image to be processed;
s202: selecting feature images of the second type that match at least part of the facial regions in the first-type face image to be processed;
s203: adding the second-type feature images onto the at least partial facial regions of the face image to be processed to obtain an adjusted image to be processed;
s204: inputting the adjusted image to be processed into the target network to obtain an output image containing a face image of the second type.
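Again as a sketch only, inference reuses the same preprocessing and then runs a single forward pass; `prepare_image` is the hypothetical helper from the training sketch above, and the [0, 1] tensor range is an assumption:

```python
import torch
from torchvision import transforms

def cartoonize(photo, target_net, library):
    """S202-S204: apply the same pasting preprocessing as in training,
    then let the trained target network finish the style conversion."""
    adjusted = prepare_image(photo, library)           # reuses the preprocessing sketch above
    x = transforms.ToTensor()(adjusted).unsqueeze(0)   # 1x3xHxW, values in [0, 1]
    with torch.no_grad():
        y = target_net(x)
    return transforms.ToPILImage()(y.squeeze(0).clamp(0, 1))
```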
The image to be processed may be any image taken by the user, a photograph input by the user, or the like.
The first type of style and the second type of style in S201-S204 are the same as in the foregoing embodiments and are not described again here.
In S202, selecting the feature images of the second type that match at least part of the facial regions in the first-type face image to be processed includes:
selecting, from a preset image library, a feature image of the second type that matches the features of each facial region, based on the features of each facial region in the first-type face image to be processed.
The specific processing of S202 is similar to selecting the corresponding feature images for a training image and is not repeated; likewise, the adjustment of the image to be processed in S203 is similar to that of the training image described above and is not described again.
In the related art, processing the input image with a CycleGAN easily produces problems such as unclear lines. Referring to fig. 3, which shows an image converted from an input face image (the input is a photograph, not shown in the figure), it can be seen that the generated cartoon portrait is disordered: the eyebrow lines are abnormal and unclear lines appear on the face. The conversion result in the related art is therefore not controllable.
According to the scheme of the application, by contrast, at least one of the eyes, nose, eyebrows and mouth of the second type that matches the facial-region features of the original image is first attached in the form of a map (sticker), generating a semi-finished two-dimensional cartoon image (as shown in fig. 4); the semi-finished image is then used as the input for training the CycleGAN. This reduces the task difficulty and the load on the network: the model only needs to make a few fine adjustments to the input semi-finished image, so the generated image is more controllable. Similarly, at inference time, at least one of the eyes, nose, eyebrows and mouth of the second type that matches the facial-region features of the image to be processed may first be attached as a map to generate a semi-finished two-dimensional cartoon image (the image shown in fig. 4 may serve as an example); the semi-finished image is then input into the CycleGAN to obtain the converted cartoon face image, making the generated image more controllable.
An embodiment of the present application further provides an image processing apparatus, as shown in fig. 5, including:
an image acquisition module 51, configured to acquire a training image containing a face image of a first type;
an image preprocessing module 52, configured to select feature images of a second type corresponding to at least part of the facial regions of the first-type face image, and to adjust the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image;
a training module 53, configured to determine a target network by using a training image containing the adjusted face image; the target network is a trained network for converting an input image to be processed that contains a face image into an output image containing a face image of the second type.
The image preprocessing module 52 is configured to select, from a preset image library, a feature image of the second type that matches the features of each facial region, based on the features of each facial region in the at least partial facial regions of the first-type face image.
The image preprocessing module 52 is further configured to add the second-type feature image corresponding to each facial region onto the corresponding facial region of the face image to obtain the adjusted face image.
In one example, as shown in fig. 6, the apparatus further comprises:
the image processing module 54 is configured to obtain an image to be processed, and extract a first type of face image to be processed from the image to be processed; selecting a feature image of a second type which is matched with at least part of facial areas in the face images to be processed of the first type; adding the feature images of the second type to at least part of the face areas of the face images to be processed to obtain adjusted images to be processed; and inputting the adjusted image to be processed into the target network to obtain an output image containing the face image of the second type.
The image processing module 54 is configured to select, from a preset image library, a feature image of the second type that matches the features of each facial region, based on the features of each facial region in the first-type face image to be processed.
It should be understood that, in this embodiment, the processing that each module in the image processing apparatus can execute is the same as that in the foregoing method embodiment, and will not be described herein.
In addition, the image processing apparatus may be implemented in a single electronic device, i.e., with all modules disposed in the same electronic device. Alternatively, the modules may be distributed across different electronic devices; for example, the image acquisition module, the image preprocessing module and the training module may be disposed in a first electronic device while the image processing module is disposed in a second electronic device. Many other arrangements of the modules are of course possible and are not exhaustively listed; a sketch of the module decomposition follows.
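To make the module decomposition concrete, the following class is a minimal sketch of the apparatus of fig. 5/fig. 6, reusing the hypothetical helpers from the earlier sketches; the class and method names are illustrative, not taken from the patent:

```python
from PIL import Image

class ImageProcessingApparatus:
    """Sketch of the four modules; `prepare_image`, `train_cycle_gan` and
    `cartoonize` are the hypothetical helpers sketched earlier."""
    def __init__(self, library, target_net=None):
        self.library = library          # preset image library of second-type feature images
        self.target_net = target_net    # trained CycleGAN-style network, once available

    # image acquisition module 51
    def acquire(self, path):
        return Image.open(path).convert("RGB")

    # image preprocessing module 52
    def preprocess(self, image):
        return prepare_image(image, self.library)

    # training module 53
    def train(self, photo_paths, cartoon_images):
        adjusted = [self.preprocess(self.acquire(p)) for p in photo_paths]
        self.target_net = train_cycle_gan(adjusted, cartoon_images)  # hypothetical routine

    # image processing module 54
    def process(self, photo):
        return cartoonize(photo, self.target_net, self.library)
```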
According to embodiments of the present application, the present application also provides an electronic device, a readable storage medium and a computer program product.
As shown in fig. 7, there is a block diagram of an electronic device for the image processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit the implementations of the application described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information for a GUI on an external input/output device, such as a display coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 801 is taken as an example in fig. 7.
Memory 802 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the image processing method provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the image processing method provided by the present application.
The memory 802, as a non-transitory computer-readable storage medium, is used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the image processing method in the embodiments of the present application (e.g., the image acquisition module, image preprocessing module, training module and image processing module shown in fig. 6). By running the non-transitory software programs, instructions and modules stored in the memory 802, the processor 801 executes the various functional applications and data processing of the server, that is, implements the image processing method in the above method embodiment.
Memory 802 may include a program storage area and a data storage area; the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the electronic device, etc. In addition, memory 802 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 802 may optionally include memory located remotely from processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the image processing method may further include: an input device 803 and an output device 804. The processor 801, memory 802, input devices 803, and output devices 804 may be connected by a bus or other means, for example in fig. 7.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, and the like. The output device 804 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that overcomes the defects of difficult management and weak service expansibility found in traditional physical hosts and VPS services.
According to the technical solution of the embodiments of the application, when the target network is trained the training image is first preprocessed: the facial regions in the training image are adjusted to feature images of the second type of style, and the target network is then trained on the adjusted images. This reduces the task difficulty during training, lightens the load on the network, and makes the images generated by the network more controllable.
It should be appreciated that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the disclosed embodiments are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.
Claims (9)
1. An image processing method, comprising:
acquiring a training image containing a face image of a first type;
selecting feature images of a second type corresponding to at least part of the facial regions of the first-type face image; wherein the first type of style is different from the second type of style;
adjusting the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image;
determining a target network by using a training image containing the adjusted face image; the target network being a trained network for converting an input image to be processed that contains a face image into an output image containing a face image of the second type;
wherein selecting the feature images of the second type corresponding to at least part of the facial regions of the first-type face image comprises the following steps:
selecting, from a preset image library, a feature image of the second type that matches the features of each facial region, based on the features of each facial region in the at least partial facial regions of the first-type face image;
wherein the at least partial facial regions are regions determined according to a preset configuration, only at least one of the facial features (five sense organs) in the face image being adjusted according to the preset configuration; alternatively, the at least partial facial regions are recognizable facial regions;
the method further comprising the following steps:
acquiring an image to be processed, and extracting a first-type face image to be processed from the image to be processed;
selecting feature images of the second type that match at least part of the facial regions in the first-type face image to be processed;
adding the second-type feature images onto the at least partial facial regions of the face image to be processed to obtain an adjusted image to be processed; the feature images being added in the form of a map; and
inputting the adjusted image to be processed into the target network to obtain an output image containing a face image of the second type.
2. The method according to claim 1, wherein adjusting the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image comprises:
adding the second-type feature image corresponding to each facial region onto the corresponding facial region of the face image to obtain the adjusted face image.
3. The method of claim 1, wherein selecting the feature images of the second type that match at least part of the facial regions in the first-type face image to be processed comprises:
selecting, from a preset image library, a feature image of the second type that matches the features of each facial region, based on the features of each facial region in the first-type face image to be processed.
4. An image processing apparatus comprising:
an image acquisition module, configured to acquire a training image containing a face image of a first type;
an image preprocessing module, configured to select feature images of a second type corresponding to at least part of the facial regions of the first-type face image, and to adjust the face image based on the second-type feature images corresponding to the at least partial facial regions to obtain an adjusted face image;
a training module, configured to determine a target network by using a training image containing the adjusted face image; the target network being a trained network for converting an input image to be processed that contains a face image into an output image containing a face image of the second type;
wherein the image preprocessing module is configured to select, from a preset image library, a feature image of the second type that matches the features of each facial region, based on the features of each facial region in the at least partial facial regions of the first-type face image;
wherein the at least partial facial regions are regions determined according to a preset configuration, only at least one of the facial features (five sense organs) in the face image being adjusted according to the preset configuration; alternatively, the at least partial facial regions are recognizable facial regions;
the apparatus further comprising:
an image processing module, configured to acquire an image to be processed and extract a first-type face image to be processed from the image to be processed; select feature images of the second type that match at least part of the facial regions in the first-type face image to be processed; add the second-type feature images onto the at least partial facial regions of the face image to be processed to obtain an adjusted image to be processed, the feature images being added in the form of a map; and input the adjusted image to be processed into the target network to obtain an output image containing a face image of the second type.
5. The apparatus of claim 4, wherein the image preprocessing module is configured to add the second-type feature image corresponding to each facial region onto the corresponding facial region of the face image to obtain the adjusted face image.
6. The apparatus according to claim 4, wherein the image processing module is configured to select, from a preset image library, a feature image of the second type that matches the features of each facial region, based on the features of each facial region in the first-type face image to be processed.
7. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
8. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-3.
9. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-3.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010549839.6A | 2020-06-16 | 2020-06-16 | Image processing method, device, electronic equipment and storage medium |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111709875A (application publication) | 2020-09-25 |
| CN111709875B (granted patent) | 2023-11-14 |
Family
ID=72540719

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010549839.6A (Active) | Image processing method, device, electronic equipment and storage medium | 2020-06-16 | 2020-06-16 |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN111709875B |
Families Citing this family (6)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112329752B * | 2021-01-06 | 2021-04-06 | Tencent Technology (Shenzhen) Co., Ltd. | Training method of human eye image processing model, image processing method and device |
| CN112802162B * | 2021-02-02 | 2024-05-10 | NetEase (Hangzhou) Network Co., Ltd. | Face adjusting method and device for virtual character, electronic equipment and storage medium |
| CN112991150A * | 2021-02-08 | 2021-06-18 | Beijing Zitiao Network Technology Co., Ltd. | Style image generation method, model training method, device and equipment |
| CN113378696B * | 2021-06-08 | 2024-11-05 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Image processing method, device, equipment and storage medium |
| CN113901997A * | 2021-09-29 | 2022-01-07 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Image style conversion method, device, equipment, storage medium and program product |
| CN115100576A * | 2022-07-28 | 2022-09-23 | Beijing Zitiao Network Technology Co., Ltd. | Image recognition model training and image recognition method, device, equipment and medium |
Citations (6)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108717719A * | 2018-05-23 | 2018-10-30 | Tencent Technology (Shenzhen) Co., Ltd. | Generation method, device and the computer storage media of cartoon human face image |
| CN109308681A * | 2018-09-29 | 2019-02-05 | Beijing ByteDance Network Technology Co., Ltd. | Image processing method and device |
| CN109376582A * | 2018-09-04 | 2019-02-22 | University of Electronic Science and Technology of China | A kind of interactive human face cartoon method based on generation confrontation network |
| CN110321849A * | 2019-07-05 | 2019-10-11 | Tencent Technology (Shenzhen) Co., Ltd. | Image processing method, device and computer readable storage medium |
| CN111047509A * | 2019-12-17 | 2020-04-21 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences | Image special effect processing method and device and terminal |
| CN111275784A * | 2020-01-20 | 2020-06-12 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for generating image |

Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107924579A * | 2015-08-14 | 2018-04-17 | Metail Limited | The method for generating personalization 3D head models or 3D body models |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |