CN111223046B - Image super-resolution reconstruction method and device - Google Patents

Image super-resolution reconstruction method and device

Info

Publication number
CN111223046B
CN111223046B (application number CN201911140450.XA)
Authority
CN
China
Prior art keywords
module
image
output
convolution layer
input
Prior art date
Legal status
Active
Application number
CN201911140450.XA
Other languages
Chinese (zh)
Other versions
CN111223046A (en)
Inventor
孙旭
董晓宇
高连如
雷莉萍
张兵
Current Assignee
Institute of Remote Sensing and Digital Earth of CAS
Original Assignee
Institute of Remote Sensing and Digital Earth of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Remote Sensing and Digital Earth of CAS filed Critical Institute of Remote Sensing and Digital Earth of CAS
Priority to CN201911140450.XA
Publication of CN111223046A
Application granted
Publication of CN111223046B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image super-resolution reconstruction method and device, wherein the method comprises the following steps: inputting a low-resolution image and a resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image. The trained preset network comprises a preset number of multi-perception branch modules, and any multi-perception branch module comprises a plurality of cascaded residual channel attention groups; any residual channel attention group includes a plurality of cascaded enhanced residual blocks, any enhanced residual block including: a second convolution layer, a third convolution layer, a rectification module, and a second summation module. The image input to the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input to the rectification module; the output of the rectification module is input to the third convolution layer; and the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer are respectively input to the second summation module. The reconstructed image has higher spatial resolution and higher information fidelity.

Description

Image super-resolution reconstruction method and device
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and an apparatus for reconstructing super-resolution images.
Background
Super-resolution reconstruction (SR) refers to restoring a high-resolution image from one or more low-resolution images of the same scene.
Super-resolution reconstruction is an important digital image processing technology with wide application in medicine, remote sensing, and many areas of daily life. The current mainstream approach is super-resolution reconstruction based on deep learning. Specifically, a neural network is constructed to learn the mapping relation between pairs of high-resolution and low-resolution training samples, and the learned prior knowledge is then used to perform high-resolution reconstruction of the various low-resolution images input to the network.
However, the fidelity of the high-resolution images reconstructed in this way is low.
Disclosure of Invention
The application provides an image super-resolution reconstruction method and device, and aims to solve the problem that the fidelity of an image obtained by super-resolution reconstruction is low.
In order to achieve the above object, the present application provides the following technical solutions:
the application provides an image super-resolution reconstruction method, which comprises the following steps:
acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branching modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
The low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branching module and the output of the first convolution layer are respectively input into the first summation module; the first summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for up-sampling the output of the first channel attention module by the resolution improvement times; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: the second convolution layer, the third convolution layer, the rectifying module and the second summation module; an image for inputting the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectifying module is input into the third convolution layer; the image for inputting the enhanced residual block, the output of the rectifying module, and the output of the third convolution layer are respectively input to the second summing module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image to obtain a multi-channel image;
Outputting the high-resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolution layer;
and the fourth convolution layer carries out convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Optionally, any one of the residual channel attention groups further includes: a second channel attention module, a fifth convolution layer, and a third summation module;
an image for inputting the residual channel attention group is input into a first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block inputs a second enhanced residual block of the residual channel attention group; the output of the (B-1) th enhanced residual block in the residual channel attention group is input into the (B) th enhanced residual block in the residual channel attention group; the output of the B enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image for inputting the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
And the third summation module is used for summing pixel values of the pixel points at the same position of the same channel in the input image and outputting a multi-channel image.
Optionally, the rectifying module is a linear rectifying module.
The application also provides an image super-resolution reconstruction device, which comprises:
the acquisition module is used for acquiring the low-resolution image to be reconstructed and a preset resolution improvement multiple;
the reconstruction module is used for inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branching modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branching module and the output of the first convolution layer are respectively input into the first summation module; the first summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for up-sampling the output of the first channel attention module by the resolution improvement times; the up-sampling module obtains the high-resolution reconstructed image;
Any one of the enhanced residual blocks includes: the second convolution layer, the third convolution layer, the rectifying module and the second summation module; an image for inputting the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectifying module is input into the third convolution layer; the image for inputting the enhanced residual block, the output of the rectifying module, and the output of the third convolution layer are respectively input to the second summing module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image to obtain a multi-channel image;
and the output module is used for outputting the high-resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolution layer;
and the fourth convolution layer carries out convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Optionally, any one of the residual channel attention groups further includes: a second channel attention module, a fifth convolution layer, and a third summation module;
An image for inputting the residual channel attention group is input into a first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block inputs a second enhanced residual block of the residual channel attention group; the output of the (B-1) th enhanced residual block in the residual channel attention group is input into the (B) th enhanced residual block in the residual channel attention group; the output of the B enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image for inputting the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of the pixel points at the same position of the same channel in the input image and outputting a multi-channel image.
Optionally, the rectifying module is a linear rectifying module.
The application also provides a storage medium comprising a stored program, wherein the program executes any one of the image super-resolution reconstruction methods.
The application also provides a device comprising at least one processor, and at least one memory and a bus connected with the processor; the processor and the memory complete communication with each other through the bus; the processor is configured to invoke the program instructions in the memory to perform any of the above-described image super-resolution reconstruction methods.
In the image super-resolution reconstruction method and device described above, the preset network comprises a preset number of multi-perception branch modules; any one multi-perception branch module comprises a plurality of cascaded residual channel attention groups, and each residual channel attention group comprises a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module, and a second summation module. The image input to the enhanced residual block is input to the second convolution layer, the output of the second convolution layer is input to the rectification module, and the output of the rectification module is input to the third convolution layer; thus, the input of the third convolution layer in any one enhanced residual block is obtained through the computation of the second convolution layer, so the third convolution layer achieves a perception scale different from, and larger than, that of the second convolution layer.
The image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer are respectively input to the second summation module, which sums the pixel values of pixel points at the same position of the same channel in its inputs to obtain a multi-channel image. In other words, the enhanced residual block extracts three levels of information: the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer. Moreover, the feature information of these three levels all serves as input to the subsequently cascaded enhanced residual block, and so on; thus any multi-perception branch module can extract information at more levels (each level comprising a plurality of channels), and in turn the preset number of multi-perception branch modules can extract information at still more levels.
In addition, the preset network further comprises a first summation module and a first channel attention module. The first summation module sums the pixel values of pixel points at the same position of the same channel in the multi-level information output by the first convolution layer and by each multi-perception branch module to obtain a multi-channel image, and the first channel attention module assigns different weights to different channels of that multi-channel image, so that the features of different channels are utilized adaptively and to different extents. Meanwhile, the up-sampling module raises the resolution of the output of the first channel attention module by the preset multiple. Consequently, compared with high-resolution images reconstructed by existing methods, the high-resolution image reconstructed by the embodiments of the present application has higher fidelity.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a multi-perception attention network according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of any one multi-perception branch module according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of any one residual channel attention group disclosed in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of any one enhanced residual block disclosed in an embodiment of the present application;
Fig. 5 is a schematic diagram of the training process of the multi-perception attention network according to an embodiment of the present application;
Fig. 6 is a flowchart of an image super-resolution reconstruction method disclosed in an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an image super-resolution reconstruction device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In research, the inventors of the present application found that the reasons for the low fidelity of high-resolution images reconstructed by existing deep-learning-based image super-resolution reconstruction methods include: first, the image information learned by the neural network is not fully utilized, i.e., the perception capability is limited; second, the information extracted at different levels in the neural network is used directly for the final reconstruction, i.e., the channel-level feature differences among the information extracted at different levels are ignored.
In one aspect, the network provided in the embodiments of the present application includes a preset number of multi-perception branch modules; any one multi-perception branch module includes a plurality of cascaded residual channel attention groups, and each residual channel attention group includes a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module, and a second summation module. The image input to the enhanced residual block is input to the second convolution layer, the output of the second convolution layer is input to the rectification module, and the output of the rectification module is input to the third convolution layer; thus, the input of the third convolution layer in any one enhanced residual block is obtained through the computation of the second convolution layer, so the third convolution layer achieves a perception scale different from, and larger than, that of the second convolution layer.
On the other hand, the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer are respectively input to the second summation module, which sums the pixel values of pixel points at the same position of the same channel in its inputs to obtain a multi-channel image. In other words, the enhanced residual block extracts three levels of information: the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer. Moreover, the feature information of these three levels all serves as input to the subsequently cascaded enhanced residual block, and so on; thus any multi-perception branch module can extract information at more levels (each level comprising a plurality of channels), and in turn the preset number of multi-perception branch modules can output information at still more levels.
In addition, the network provided by the embodiments of the present application further comprises a first summation module and a first channel attention module. The first summation module sums the pixel values of pixel points at the same position of the same channel in the multi-level information output by the first convolution layer and by each multi-perception branch module to obtain a multi-channel image, and the first channel attention module assigns different weights to different channels of that multi-channel image, so that the features of different channels are utilized adaptively and to different extents.
In summary, compared with the high-resolution image reconstructed by the existing method, the high-resolution image reconstructed by the embodiment of the application has higher fidelity.
Fig. 1 shows the structure of the multi-perception attention network according to an embodiment of the present application, which includes:
the device comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module, an up-sampling module and a fourth convolution layer.
The low-resolution image to be reconstructed is input to the first convolution layer, and the output of the first convolution layer is respectively input to each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input to the first summation module, which sums the pixel values of pixel points at the same position of the same channel in its input images. The output of the first summation module is input to the first channel attention module; the output of the first channel attention module is input to the up-sampling module, which performs an r-times up-sampling operation on it; the output of the up-sampling module is input to the fourth convolution layer, which performs a convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Specifically, the first summation module sums the pixel values of pixel points at the same position of the same channel in its input images, meaning the following. Assume the output image of each multi-perception branch module is an n-channel image, with channels 1, 2, 3, ..., n, and the output of the first convolution layer is likewise an n-channel image with channels 1, 2, 3, ..., n. In this embodiment, the first summation module sums the pixel values of pixel points at the same position of channel 1 in the image output by each multi-perception branch module and channel 1 in the image output by the first convolution layer; it likewise sums the pixel values of pixel points at the same position of channel 2 in the image output by each multi-perception branch module and channel 2 in the image output by the first convolution layer; and so on, up to channel n. That is, the first summation module sums the pixel values of pixels at the same position in the same channel across the image output by each multi-perception branch module and the image output by the first convolution layer.
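Concretely, this per-channel, per-position summation is just elementwise tensor addition. A minimal sketch, illustrated here in PyTorch (the framework choice, tensor shapes, and variable names are illustrative assumptions, not taken from the patent):

```python
import torch

# Illustrative shapes: a batch of one n-channel feature map on an M x N grid, K branches.
M, N, n, K = 48, 48, 64, 2

x0 = torch.randn(1, n, M, N)                            # output of the first convolution layer
branches = [torch.randn(1, n, M, N) for _ in range(K)]  # outputs of the K multi-perception branches

# First summation module: per-channel, per-pixel addition across all inputs.
fused = x0 + torch.stack(branches, dim=0).sum(dim=0)
assert fused.shape == x0.shape
```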
It should be noted that, in the embodiments of the present application, the fourth convolution layer is optional: if the multi-perception attention network includes the fourth convolution layer, the fourth convolution layer outputs the high-resolution image; if it does not, the up-sampling module outputs the high-resolution image.
Specifically, in the present embodiment, a low-resolution image input to the multi-perception attention network (hereinafter referred to as MPAN for convenience of description) is denoted as $X$, where $X$ is an M-row, N-column, C-channel image. The parameters of the first convolution layer are denoted $w_{MPAN,1}$ and the parameters of the fourth convolution layer are denoted $w_{MPAN,2}$. The output of the first convolution layer is denoted $X_0$; specifically, $X_0 = F_{Conv}(X, w_{MPAN,1})$, where $F_{Conv}$ represents a convolution operation. If the first convolution layer includes n convolution kernels, then $X_0$ is an M-row, N-column, n-channel image.
In this embodiment, the output of the first convolution layer is input to each multi-perception branch module (for convenience of description, any one multi-perception branch module is simply referred to as MPB), and the convolution layer set of the k-th MPB in the MPAN is represented as $W_{MPB}^{(k)}$, where $k = 1, \ldots, K$ and K is the number of multi-perception branch modules in the MPAN. In practice, to ensure that the high-resolution image reconstructed by the MPAN has higher fidelity, testing shows that K should be not less than 2; of course, K may take other values, and this embodiment does not specifically limit it.
In this embodiment, the low-resolution image X input into the MPAN is processed according to the following formula (1):

$$X_{SR} = F_{Conv}\Big(F_{up}\Big(F_{CA}\Big(X_0 + \sum_{k=1}^{K} F_{MPB}^{(k)}(X_0)\Big),\, r\Big),\, w_{MPAN,2}\Big) \tag{1}$$

Here, $F_{MPB}^{(1)}(X_0)$ represents the computation of the first MPB on the output $X_0$ of the first convolution layer, $F_{MPB}^{(2)}(X_0)$ that of the second MPB, and $F_{MPB}^{(K)}(X_0)$ that of the K-th MPB. The output of any MPB for $X_0$ is an M-row, N-column, n-channel image.

The sum $X_0 + \sum_{k=1}^{K} F_{MPB}^{(k)}(X_0)$ represents the first summation module summing the pixel values of pixel points at the same position of the same channel in $X_0$ and in the output of each MPB; the first summation module outputs an M-row, N-column, n-channel image.

$F_{CA}(\cdot)$ represents the computation of the first channel attention module on the M-row, N-column, n-channel image output by the first summation module. Denoting that image $X_1$, $F_{CA}(X_1)$ is computed with two convolution layers: $w_{down}$, a convolution layer consisting of n/r' convolution kernels of size 1×1×n, and $w_{up}$, a convolution layer consisting of n convolution kernels of size 1×1×(n/r'), where r' is the vector dimension transform factor in the channel attention module.

$F_{up}(\cdot, r)$ represents the up-sampling module up-sampling the output of the first channel attention module, where r refers to the image size magnification, i.e., the resolution improvement multiple or up-sampling rate.

Finally, $F_{Conv}(\cdot, w_{MPAN,2})$ represents the fourth convolution layer performing a convolution on the output of the up-sampling module to reconstruct the high-resolution image $X_{SR}$.
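The shapes given for $w_{down}$ and $w_{up}$ match the standard squeeze-and-excitation form of channel attention, which the sketch below assumes: global average pooling, reduction by $w_{down}$, ReLU, expansion by $w_{up}$, and a sigmoid gate. The pooling step, class and variable names, and the value of r' are assumptions, since the original formula image is not legible:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Assumed squeeze-and-excitation form of F_CA: global average pooling, w_down,
    ReLU, w_up, then a sigmoid gate that reweights the n channels."""
    def __init__(self, n: int, r_prime: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                      # squeeze each channel to a scalar
        self.w_down = nn.Conv2d(n, n // r_prime, kernel_size=1)  # n/r' kernels of size 1x1xn
        self.w_up = nn.Conv2d(n // r_prime, n, kernel_size=1)    # n kernels of size 1x1x(n/r')
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.sigmoid(self.w_up(self.relu(self.w_down(self.pool(x)))))
        return x * w                                             # adaptive channel reweighting

ca = ChannelAttention(n=64, r_prime=16)
out = ca(torch.randn(1, 64, 48, 48))  # -> (1, 64, 48, 48)
```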
In the present embodiment, the k-th MPB in the MPAN includes $G^{(k)}$ residual channel attention groups (hereinafter referred to as RCAG for convenience of description) and a sixth convolution layer; the structure of the k-th MPB is shown in Fig. 2. As can be seen from Fig. 2, the $G^{(k)}$ RCAGs and the sixth convolution layer are cascaded: the output of the first RCAG is input to the second RCAG, the output of the second RCAG is input to the third RCAG, and so on; the output of the $(G^{(k)}-1)$-th RCAG is input to the $G^{(k)}$-th RCAG, and the output of the $G^{(k)}$-th RCAG is input to the sixth convolution layer. In this embodiment, the convolution layer set of the g-th RCAG in the k-th MPB is denoted $W_{RCAG}^{(k,g)}$ and the sixth convolution layer is denoted $w_{MPB}^{(k)}$; therefore, the convolution layer set of the k-th MPB in the MPAN is $W_{MPB}^{(k)} = \{W_{RCAG}^{(k,1)}, \ldots, W_{RCAG}^{(k,G^{(k)})}, w_{MPB}^{(k)}\}$.
For any one MPB, assuming its input data is X, the MPB computes on X according to the following formula (2):

$$F_{MPB}(X) = F_{Conv}\Big(F_{RCAG}^{(G)}\big(F_{RCAG}^{(G-1)}(\cdots F_{RCAG}^{(1)}(X)\cdots)\big),\, w_{MPB}\Big) \tag{2}$$

where X is the input image, $W_{RCAG}^{(g)}$ represents the convolution layer set of the g-th RCAG, and $w_{MPB}$ represents the sixth convolution layer in the MPB. $F_{RCAG}^{(g)}(\cdot)$ represents the operation of the g-th RCAG in the MPB on its input; for example, $F_{RCAG}^{(1)}(X)$ represents computing on X using $W_{RCAG}^{(1)}$.
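A minimal PyTorch sketch of formula (2). The 3×3 kernel size of the sixth convolution layer and the factory-style wiring are assumptions; the concrete RCAG is sketched after formula (3) below:

```python
import torch
import torch.nn as nn

class MPB(nn.Module):
    """Sketch of formula (2): G cascaded RCAGs followed by the sixth convolution layer."""
    def __init__(self, n: int, G: int, rcag_factory):
        super().__init__()
        # rcag_factory() should build one residual channel attention group (formula (3)).
        self.rcags = nn.Sequential(*[rcag_factory() for _ in range(G)])
        self.w_mpb = nn.Conv2d(n, n, kernel_size=3, padding=1)  # sixth conv layer, 3x3 assumed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_mpb(self.rcags(x))

# Smoke test with identity stand-ins until the real RCAG (sketched below) is plugged in.
mpb = MPB(n=64, G=3, rcag_factory=nn.Identity)
out = mpb(torch.randn(1, 64, 48, 48))  # -> (1, 64, 48, 48)
```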
The structure of the g-th RCAG in the k-th MPB is shown in Fig. 3 and comprises a plurality of cascaded enhanced residual blocks, a second channel attention module, a fifth convolution layer, and a third summation module. Assume the image input to the RCAG is $X_2$. First, $X_2$ is input to the first enhanced residual block; the output of the first enhanced residual block is input to the second enhanced residual block, the output of the second to the third, and so on; the output of the last enhanced residual block is input to the second channel attention module, and the output of the second channel attention module is input to the fifth convolution layer. The output of the fifth convolution layer and the image $X_2$ are respectively input to the third summation module, which sums the pixel values of pixel points at the same position of the same channel in its inputs and outputs a multi-channel image. If the image $X_2$ input to the RCAG is an M-row, N-column, n-channel image, the third summation module outputs an M-row, N-column, n-channel image.
Specifically, any one RCAG computes on its input image according to the following formula (3):

$$F_{RCAG}(X) = X + F_{Conv}\Big(F_{CA}\big(F_{ERB}^{(B)}(\cdots F_{ERB}^{(1)}(X)\cdots)\big),\, w_{RCAG}\Big) \tag{3}$$

where X is the input image of the RCAG and $W_{RCAG}$ is the set of convolution layers in the RCAG. $w_{ERB}^{(b)}$ represents the set of convolution layers of the b-th ERB in the RCAG, and $F_{ERB}^{(b)}(\cdot)$ represents computing on the input using $w_{ERB}^{(b)}$. $w_{RCAG}$ denotes the convolution layer at the end of the RCAG, i.e., the fifth convolution layer in this embodiment. For the $w_{up}$ and $w_{down}$ of the second channel attention module $F_{CA}$, see $w_{up}$ and $w_{down}$ of the first channel attention module in the MPAN; they are not described in detail here.
In this embodiment, the structure of any one enhanced residual block in the g-th RCAG of the k-th MPB is shown in Fig. 4. It comprises a second convolution layer, a third convolution layer, a rectification module, and a second summation module. The computation of the enhanced residual block on its input image proceeds as follows: the input image is first input to the second convolution layer; the output of the second convolution layer is input to the rectification module; the output of the rectification module is input to the third convolution layer; and the output of the third convolution layer, the output of the rectification module, and the input image are respectively input to the second summation module.
Assuming the image input to the enhanced residual block is denoted X, the enhanced residual block computes on the image according to the following formula (4):

$$F_{ERB}(X) = X + F_{ReLU}\big(F_{Conv}(X, w_{ERB,1})\big) + F_{Conv}\Big(F_{ReLU}\big(F_{Conv}(X, w_{ERB,1})\big),\, w_{ERB,2}\Big) \tag{4}$$

where $w_{ERB} = \{w_{ERB,1}, w_{ERB,2}\}$ is the set of convolution layers in the enhanced residual block, $w_{ERB,1}$ representing the second convolution layer and $w_{ERB,2}$ the third convolution layer. $F_{Conv}(X, w_{ERB,1})$ represents the convolution performed on image X by the second convolution layer; $F_{ReLU}(F_{Conv}(X, w_{ERB,1}))$ represents the rectification module's computation on the output of the second convolution layer; and $F_{Conv}(F_{ReLU}(F_{Conv}(X, w_{ERB,1})), w_{ERB,2})$ represents the third convolution layer's convolution on the output of the rectification module.
In this embodiment, the rectifying module may be specifically a linear rectifying module, that is, the rectifying module calculates the output of the first convolution layer according to a linear rectifying function. The linear rectification function is the prior art, and is not described herein.
Fig. 5 is a training process of a multi-awareness network according to an embodiment of the present application, including the following steps:
s501, acquiring an image set to be trained and resolution improvement multiples.
In this embodiment, the resolution improvement factor is a super resolution improvement factor that needs to be achieved by the MPAN network obtained by training in this embodiment. For example, if the MPAN network trained by the embodiment is to achieve a 3-fold resolution improvement effect, the resolution improvement factor in this step is 3.
In this step, the image set to be trained includes: a preset high resolution image set and a preset low resolution image set. The low-resolution image set is obtained by r times degradation of the high-resolution image set. Wherein, the value of r is the resolution improvement multiple in the step.
It should be noted that, the resolution improvement factor in this embodiment may be set according to actual situations, and this embodiment does not limit the specific value of the resolution improvement factor.
In particular, the high-resolution image set is denoted $\{I_{HR}^{(i)}\}_{i=1}^{P}$ and the low-resolution image set is denoted $\{I_{LR}^{(i)}\}_{i=1}^{P}$, where each low-resolution image is obtained by r-times degradation of the corresponding high-resolution image and r is the resolution improvement multiple.
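The patent does not specify the degradation operator that produces the low-resolution set; bicubic downsampling by the factor r is a common choice and is assumed in this sketch:

```python
import torch
import torch.nn.functional as F

def degrade(hr: torch.Tensor, r: int) -> torch.Tensor:
    """Produce a low-resolution image by r-times degradation of a high-resolution image.
    hr: (1, C, H, W) tensor; bicubic interpolation is an assumed degradation model."""
    return F.interpolate(hr, scale_factor=1.0 / r, mode="bicubic", align_corners=False)

lr_img = degrade(torch.randn(1, 3, 96, 96), r=3)  # -> (1, 3, 32, 32)
```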
S502, initializing convolution kernels in each convolution layer in the MPAN network.
Specifically, the MPAN network in this embodiment is an MPAN network provided in fig. 1 in this embodiment of the present application.
In this step, the size of the convolution kernel in each convolution layer in the MPAN network is t×t×c, and the number of convolution kernels in each convolution layer may be n.
S503, inputting the low-resolution image set and the resolution improvement times into an MPAN network, and obtaining a result image set after the MPAN network rebuilds the low-resolution image set.
In this step, after the low resolution image set and the resolution improvement factor are input into the MPAN network, the MPAN network reconstructs the low resolution image, and outputs a high resolution image reconstructed from the low resolution image set, which is referred to as a result image for convenience of description.
S504, calculating a loss function value between the result image set and the high-resolution image set according to a preset loss function.
In this embodiment, the loss function may take the form shown in the following formula (6):

$$L_1(W_{MPAN}) = \frac{1}{P} \sum_{i=1}^{P} \big\| F_{MPAN}(I_{LR}^{(i)}) - I_{HR}^{(i)} \big\|_1 \tag{6}$$

where $L_1(W_{MPAN})$ denotes the loss function value, $F_{MPAN}(I_{LR}^{(i)})$ denotes the result image computed by the MPAN network for the low-resolution image $I_{LR}^{(i)}$, and $I_{HR}^{(i)}$ denotes the high-resolution image in the high-resolution image set corresponding to $I_{LR}^{(i)}$.
In this embodiment, the loss function may also take the form shown in the following formula (7):

$$L_2(W_{MPAN}) = \frac{1}{P} \sum_{i=1}^{P} \big\| F_{MPAN}(I_{LR}^{(i)}) - I_{HR}^{(i)} \big\|_2^2 \tag{7}$$

where $L_2(W_{MPAN})$ denotes the loss function value, $F_{MPAN}(I_{LR}^{(i)})$ denotes the result image computed by the MPAN network for the low-resolution image $I_{LR}^{(i)}$, and $I_{HR}^{(i)}$ denotes the corresponding high-resolution image.
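Up to the averaging convention, formulas (6) and (7) are the standard L1 and L2 reconstruction losses, which map directly onto PyTorch built-ins:

```python
import torch
import torch.nn.functional as F

sr = torch.randn(4, 3, 96, 96)   # result images reconstructed by the MPAN network
hr = torch.randn(4, 3, 96, 96)   # corresponding high-resolution images

l1 = F.l1_loss(sr, hr)           # formula (6): mean absolute error
l2 = F.mse_loss(sr, hr)          # formula (7): mean squared error
```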
In the embodiment, only two specific formulas of the loss function are provided, in practice, the calculation formulas of the loss function may also use formulas of other forms, and the specific form of the loss function is not limited in the embodiment.
S505, adjust the convolution operation weights of all convolution layers in the MPAN network according to the loss function value, and return to the step of inputting the low-resolution image set and the resolution improvement multiple into the MPAN network to obtain the result image set, repeating until the loss function value no longer decreases, thereby obtaining the trained MPAN network.
The goal of training the MPAN network in this embodiment is: by adjusting the convolution operation weights of all convolution layers in the MPAN network, the loss function value between the result image computed by the MPAN network for an input low-resolution image and the high-resolution image corresponding to that low-resolution image reaches a minimum, at which point the trained MPAN network is obtained. That is, the set of convolution operation weights of all convolution layers in the trained MPAN network is optimal for the specified resolution improvement multiple r.
Specifically, in this step, the specific implementation process of adjusting the convolution operation weights of all the convolution layers in the MPAN network according to the loss function value is the prior art, and will not be described herein.
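Steps S503-S505 amount to a standard gradient-descent loop. A minimal sketch, assuming an `mpan` model and a `loader` of (low-resolution, high-resolution) pairs; the Adam optimizer, learning rate, and fixed epoch count are illustrative assumptions (the patent instead iterates until the loss no longer decreases):

```python
import torch
import torch.nn.functional as F

def train(mpan, loader, epochs: int = 100, lr: float = 1e-4):
    opt = torch.optim.Adam(mpan.parameters(), lr=lr)
    for _ in range(epochs):                   # or: stop when the loss no longer decreases
        for lr_img, hr_img in loader:
            sr_img = mpan(lr_img)             # S503: reconstruct the low-resolution batch
            loss = F.l1_loss(sr_img, hr_img)  # S504: loss between result and HR images
            opt.zero_grad()
            loss.backward()                   # S505: adjust all convolution weights
            opt.step()
    return mpan
```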
Fig. 6 is a schematic diagram of an image super-resolution reconstruction method according to an embodiment of the present application, including the following steps:
s601, acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple.
In this step, the manner of acquiring the low resolution image to be reconstructed is the prior art, and is not described herein. The resolution improvement factor in this step may be set by the user according to the actual situation, and the value of the resolution improvement factor is not limited in this embodiment.
S602, inputting the low-resolution image and the resolution improvement times into a trained preset network to obtain a high-resolution reconstructed image.
The trained preset network in this step is the MPAN network obtained through the training in the embodiment corresponding to Fig. 5, and the high-resolution reconstructed image output by the trained MPAN network is obtained.
S603, outputting a high-resolution reconstruction image.
In this step, a specific implementation manner of outputting the high-resolution reconstructed image is the prior art, and will not be described herein.
Fig. 7 is a schematic diagram of an image super-resolution reconstruction device according to an embodiment of the present application, including: an acquisition module 701, a reconstruction module 702 and an output module 703.
The acquiring module 701 is configured to acquire a low-resolution image to be reconstructed and a preset resolution improvement factor. The reconstruction module 702 is configured to input the low-resolution image and the resolution enhancement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any multi-perception branching module comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups includes a plurality of cascaded enhanced residual blocks;
The low-resolution image is input into a first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into a first summation module; the first summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input into the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for up-sampling the output of the first channel attention module by a resolution improvement multiple; the up-sampling module obtains a high-resolution reconstruction image;
any one of the enhanced residual blocks includes: the second convolution layer, the third convolution layer, the rectifying module and the second summation module; an image for inputting the enhanced residual block is input to a second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectifying module is input into the third convolution layer; the image used for inputting the enhancement residual block, the output of the rectifying module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image to obtain a multi-channel image;
An output module 703 for outputting a high resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer; the output of the up-sampling module is input into a fourth convolution layer; and the fourth convolution layer carries out convolution operation on the output of the up-sampling module to obtain a high-resolution reconstructed image.
Optionally, any one of the residual channel attention groups further includes: a second channel attention module, a fifth convolution layer, and a third summation module; an image for inputting the residual channel attention group is input into a first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block inputs a second enhanced residual block of the residual channel attention group; the output of the (B-1) th enhanced residual block in the residual channel attention group is input into the (B) th enhanced residual block in the residual channel attention group; the output of the B enhanced residual block is input into a second channel attention module; the output of the second channel attention module is input into the fifth convolution layer; the image of the attention group of the residual channel and the output of the fifth convolution layer are input into a third summation module respectively; and the third summation module is used for summing pixel values of the pixel points at the same position of the same channel in the input image and outputting a multi-channel image.
Optionally, the rectifying module is a linear rectifying module.
An embodiment of the present application provides an apparatus, as shown in fig. 8, including at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory complete communication with each other through a bus; the processor is used for calling the program instructions in the memory to execute the image super-resolution reconstruction method. The device herein may be a server, PC, PAD, cell phone, etc.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, the device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory such as random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM), among other forms of computer-readable media; the memory includes at least one memory chip. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises that element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. An image super-resolution reconstruction method, which is characterized by comprising the following steps:
acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branching modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branching module and the output of the first convolution layer are respectively input into the first summation module; the first summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for up-sampling the output of the first channel attention module by the resolution improvement times; the up-sampling module obtains the high-resolution reconstructed image;
Any one of the enhanced residual blocks includes: the second convolution layer, the third convolution layer, the rectifying module and the second summation module; an image for inputting the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectifying module is input into the third convolution layer; the image for inputting the enhanced residual block, the output of the rectifying module, and the output of the third convolution layer are respectively input to the second summing module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image to obtain a multi-channel image;
outputting the high-resolution reconstructed image.
2. The method of claim 1, wherein the predetermined number is not less than 2.
3. The method of claim 1, wherein the pre-set network further comprises: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolution layer;
and the fourth convolution layer carries out convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
4. A method according to claim 3, wherein any one of said residual channel attention groups further comprises: a second channel attention module, a fifth convolution layer, and a third summation module;
An image for inputting the residual channel attention group, a first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block inputs a second enhanced residual block of the residual channel attention group; the output of the (B-1) th enhanced residual block in the residual channel attention group is input into the (B) th enhanced residual block in the residual channel attention group; the output of the B enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image for inputting the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of the pixel points at the same position of the same channel in the input image and outputting a multi-channel image.
5. The method of any one of claims 1-4, wherein the rectifying module is a linear rectifying module.
6. An image super-resolution reconstruction apparatus, comprising:
an acquisition module configured to acquire a low-resolution image to be reconstructed and a preset resolution improvement multiple;
a reconstruction module configured to input the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image, wherein the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module, and an up-sampling module; each multi-perception branch module comprises a plurality of cascaded residual channel attention groups; and each residual channel attention group comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input to the first convolution layer, and the output of the first convolution layer is input to each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are each input to the first summation module; the first summation module is configured to sum the pixel values of pixels at the same position in the same channel across its input images; the output of the first summation module is input to the first channel attention module; the output of the first channel attention module is input to the up-sampling module; and the up-sampling module is configured to up-sample the output of the first channel attention module by the resolution improvement multiple to obtain the high-resolution reconstructed image;
each enhanced residual block comprises a second convolution layer, a third convolution layer, a rectifying module, and a second summation module; the image input to the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input to the rectifying module; the output of the rectifying module is input to the third convolution layer; the image input to the enhanced residual block, the output of the rectifying module, and the output of the third convolution layer are each input to the second summation module; and the second summation module is configured to sum the pixel values of pixels at the same position in the same channel across its input images to obtain a multi-channel image;
and an output module configured to output the high-resolution reconstructed image.
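Assembling the pieces, the overall preset network of claims 1 and 6 can be sketched as below, again reusing the classes above. The branch count, group count, channel width, and the sub-pixel (PixelShuffle) realization of the up-sampling module are assumptions; only the topology, including the fourth convolution layer of claims 3 and 7, comes from the claims.

```python
class MultiPerceptionBranch(nn.Module):
    """One multi-perception branch module: cascaded residual channel
    attention groups (group count assumed)."""

    def __init__(self, channels: int = 64, num_groups: int = 3):
        super().__init__()
        self.groups = nn.Sequential(
            *[ResidualChannelAttentionGroup(channels) for _ in range(num_groups)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.groups(x)


class PresetNetwork(nn.Module):
    def __init__(self, in_channels: int = 3, channels: int = 64,
                 num_branches: int = 2, scale: int = 2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, 3, padding=1)   # "first convolution layer"
        self.branches = nn.ModuleList(
            [MultiPerceptionBranch(channels) for _ in range(num_branches)]
        )
        self.attention = ChannelAttention(channels)                   # "first channel attention module"
        self.upsample = nn.Sequential(                                # assumed sub-pixel up-sampling
            nn.Conv2d(channels, channels * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                                   # by the resolution improvement multiple
        )
        self.conv4 = nn.Conv2d(channels, in_channels, 3, padding=1)   # "fourth convolution layer" (claims 3, 7)

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        shallow = self.conv1(lr)
        # "First summation module": element-wise sum of every branch
        # output and the first convolution layer's output.
        fused = shallow + sum(branch(shallow) for branch in self.branches)
        return self.conv4(self.upsample(self.attention(fused)))
```

Under these assumptions, PresetNetwork(scale=2)(torch.randn(1, 3, 32, 32)) yields a 1×3×64×64 tensor, i.e. the input up-sampled by the resolution improvement multiple.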
7. The apparatus of claim 6, wherein the preset network further comprises a fourth convolution layer;
the output of the up-sampling module is input to the fourth convolution layer;
and the fourth convolution layer performs a convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
8. The apparatus of claim 6, wherein each residual channel attention group further comprises: a second channel attention module, a fifth convolution layer, and a third summation module;
the image input to the residual channel attention group is input to the first enhanced residual block of the residual channel attention group; the output of the first enhanced residual block is input to the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block of the residual channel attention group is input to the B-th enhanced residual block of the residual channel attention group; the output of the B-th enhanced residual block is input to the second channel attention module; the output of the second channel attention module is input to the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are each input to the third summation module;
and the third summation module is configured to sum the pixel values of pixels at the same position in the same channel across its input images and to output a multi-channel image.
9. A storage medium comprising a stored program, wherein the program, when run, performs the image super-resolution reconstruction method according to any one of claims 1-5.
10. An apparatus comprising at least one processor, at least one memory, and a bus connected to the processor and the memory; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the image super-resolution reconstruction method according to any one of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911140450.XA 2019-11-20 2019-11-20 Image super-resolution reconstruction method and device

Publications (2)

Publication Number Publication Date
CN111223046A CN111223046A (en) 2020-06-02
CN111223046B (en) 2023-04-25

Family

ID=70832773

Country Status (1)

CN (1) CN111223046B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant