CN111223046B - Image super-resolution reconstruction method and device - Google Patents

Image super-resolution reconstruction method and device

Info

Publication number
CN111223046B
CN111223046B (application number CN201911140450.XA)
Authority
CN
China
Prior art keywords
module
image
output
convolution layer
input
Prior art date
Legal status
Active
Application number
CN201911140450.XA
Other languages
Chinese (zh)
Other versions
CN111223046A (en)
Inventor
孙旭
董晓宇
高连如
雷莉萍
张兵
Current Assignee
Institute of Remote Sensing and Digital Earth of CAS
Original Assignee
Institute of Remote Sensing and Digital Earth of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Remote Sensing and Digital Earth of CAS filed Critical Institute of Remote Sensing and Digital Earth of CAS
Priority to CN201911140450.XA
Publication of CN111223046A
Application granted
Publication of CN111223046B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image super-resolution reconstruction method and device, wherein the method comprises the following steps: inputting a low-resolution image and a resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image. The trained preset network comprises a preset number of multi-perception branch modules, and any multi-perception branch module comprises a plurality of cascaded residual channel attention groups; any residual channel attention group includes a plurality of cascaded enhanced residual blocks, any enhanced residual block including: a second convolution layer, a third convolution layer, a rectification module, and a second summation module. The image input to the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input to the rectification module; the output of the rectification module is input to the third convolution layer; and the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer are respectively input to the second summation module. The reconstructed image has higher spatial resolution and higher information fidelity.

Description

Image super-resolution reconstruction method and device
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and an apparatus for reconstructing super-resolution images.
Background
Super-resolution reconstruction (SR) refers to restoring a high-resolution image from one or more low-resolution images of the same scene.
Super-resolution reconstruction is an important digital image processing technology with wide application in medicine, remote sensing, and many areas of daily life. The current mainstream approach is super-resolution reconstruction based on deep learning. Specifically, a neural network is constructed to learn the mapping relation between pairs of high-resolution and low-resolution training samples, and the learned prior knowledge is then used to perform high-resolution reconstruction of the various low-resolution images input to the network.
However, the fidelity of the high-resolution images reconstructed in this way is low.
Disclosure of Invention
The application provides an image super-resolution reconstruction method and device, and aims to solve the problem that the fidelity of an image obtained by super-resolution reconstruction is low.
In order to achieve the above object, the present application provides the following technical solutions:
the application provides an image super-resolution reconstruction method, which comprises the following steps:
acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branching modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
The low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branching module and the output of the first convolution layer are respectively input into the first summation module; the first summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for up-sampling the output of the first channel attention module by the resolution improvement times; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: the second convolution layer, the third convolution layer, the rectifying module and the second summation module; an image for inputting the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectifying module is input into the third convolution layer; the image for inputting the enhanced residual block, the output of the rectifying module, and the output of the third convolution layer are respectively input to the second summing module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image to obtain a multi-channel image;
Outputting the high-resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolution layer;
and the fourth convolution layer carries out convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Optionally, any one of the residual channel attention groups further includes: a second channel attention module, a fifth convolution layer, and a third summation module;
an image for inputting the residual channel attention group is input into a first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block inputs a second enhanced residual block of the residual channel attention group; the output of the (B-1) th enhanced residual block in the residual channel attention group is input into the (B) th enhanced residual block in the residual channel attention group; the output of the B enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image for inputting the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
And the third summation module is used for summing pixel values of the pixel points at the same position of the same channel in the input image and outputting a multi-channel image.
Optionally, the rectifying module is a linear rectifying module.
The application also provides an image super-resolution reconstruction device, which comprises:
the acquisition module is used for acquiring the low-resolution image to be reconstructed and a preset resolution improvement multiple;
the reconstruction module is used for inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branching modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branching module and the output of the first convolution layer are respectively input into the first summation module; the first summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for up-sampling the output of the first channel attention module by the resolution improvement times; the up-sampling module obtains the high-resolution reconstructed image;
Any one of the enhanced residual blocks includes: the second convolution layer, the third convolution layer, the rectifying module and the second summation module; an image for inputting the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectifying module is input into the third convolution layer; the image for inputting the enhanced residual block, the output of the rectifying module, and the output of the third convolution layer are respectively input to the second summing module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image to obtain a multi-channel image;
and the output module is used for outputting the high-resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolution layer;
and the fourth convolution layer carries out convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Optionally, any one of the residual channel attention groups further includes: a second channel attention module, a fifth convolution layer, and a third summation module;
An image for inputting the residual channel attention group is input into a first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block inputs a second enhanced residual block of the residual channel attention group; the output of the (B-1) th enhanced residual block in the residual channel attention group is input into the (B) th enhanced residual block in the residual channel attention group; the output of the B enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image for inputting the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of the pixel points at the same position of the same channel in the input image and outputting a multi-channel image.
Optionally, the rectifying module is a linear rectifying module.
The application also provides a storage medium comprising a stored program, wherein the program executes any one of the image super-resolution reconstruction methods.
The application also provides a device comprising at least one processor, and at least one memory and a bus connected with the processor; the processor and the memory complete communication with each other through the bus; the processor is configured to invoke the program instructions in the memory to perform any of the above-described image super-resolution reconstruction methods.
In the image super-resolution reconstruction method and device described above, the preset network comprises a preset number of multi-perception branch modules; any one multi-perception branch module comprises a plurality of cascaded residual channel attention groups, and each residual channel attention group comprises a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module, and a second summation module. The image input to the enhanced residual block is input to the second convolution layer, the output of the second convolution layer is input to the rectification module, and the output of the rectification module is input to the third convolution layer; thus, the input of the third convolution layer in any one enhanced residual block is obtained through the computation of the second convolution layer, so the third convolution layer achieves a perception scale different from, and larger than, that of the second convolution layer.
The image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer are respectively input to the second summation module, which sums the pixel values of pixel points at the same position of the same channel in its inputs to obtain a multi-channel image. In other words, the enhanced residual block extracts three levels of information: the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer. Moreover, the feature information of these three levels all serves as input to the subsequently cascaded enhanced residual block, and so on; thus any multi-perception branch module can extract information at more levels (each level comprising a plurality of channels), and in turn the preset number of multi-perception branch modules can extract information at still more levels.
In addition, the preset network further comprises a first summation module and a first channel attention module. The first summation module sums the pixel values of pixel points at the same position of the same channel in the multi-level information output by the first convolution layer and by each multi-perception branch module to obtain a multi-channel image, and the first channel attention module assigns different weights to different channels of that multi-channel image, so that the features of different channels are utilized adaptively and to different extents. Meanwhile, the up-sampling module raises the resolution of the output of the first channel attention module by the preset multiple. Consequently, compared with high-resolution images reconstructed by existing methods, the high-resolution image reconstructed by the embodiments of the present application has higher fidelity.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a multi-perception attention network according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of any one multi-perception branch module according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of any one residual channel attention group disclosed in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of any one enhanced residual block disclosed in an embodiment of the present application;
Fig. 5 is a schematic diagram of the training process of the multi-perception attention network according to an embodiment of the present application;
Fig. 6 is a flowchart of an image super-resolution reconstruction method disclosed in an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an image super-resolution reconstruction device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In research, the inventors of the present application found that the reasons for the low fidelity of high-resolution images reconstructed by existing deep-learning-based image super-resolution reconstruction methods include: first, the image information learned by the neural network is not fully utilized, i.e., the perception capability is limited; second, the information extracted at different levels in the neural network is used directly for the final reconstruction, i.e., the channel-level feature differences among the information extracted at different levels are ignored.
In one aspect, the network provided in the embodiments of the present application includes a preset number of multi-perception branch modules; any one multi-perception branch module includes a plurality of cascaded residual channel attention groups, and each residual channel attention group includes a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module, and a second summation module. The image input to the enhanced residual block is input to the second convolution layer, the output of the second convolution layer is input to the rectification module, and the output of the rectification module is input to the third convolution layer; thus, the input of the third convolution layer in any one enhanced residual block is obtained through the computation of the second convolution layer, so the third convolution layer achieves a perception scale different from, and larger than, that of the second convolution layer.
On the other hand, the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer are respectively input to the second summation module, which sums the pixel values of pixel points at the same position of the same channel in its inputs to obtain a multi-channel image. In other words, the enhanced residual block extracts three levels of information: the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer. Moreover, the feature information of these three levels all serves as input to the subsequently cascaded enhanced residual block, and so on; thus any multi-perception branch module can extract information at more levels (each level comprising a plurality of channels), and in turn the preset number of multi-perception branch modules can output information at still more levels.
In addition, the network provided by the embodiments of the present application further comprises a first summation module and a first channel attention module. The first summation module sums the pixel values of pixel points at the same position of the same channel in the multi-level information output by the first convolution layer and by each multi-perception branch module to obtain a multi-channel image, and the first channel attention module assigns different weights to different channels of that multi-channel image, so that the features of different channels are utilized adaptively and to different extents.
In summary, compared with the high-resolution image reconstructed by the existing method, the high-resolution image reconstructed by the embodiment of the application has higher fidelity.
Fig. 1 shows the structure of the multi-perception attention network according to an embodiment of the present application, which includes:
the device comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module, an up-sampling module and a fourth convolution layer.
The low-resolution image to be reconstructed is input to the first convolution layer, and the output of the first convolution layer is respectively input to each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input to the first summation module, which sums the pixel values of pixel points at the same position of the same channel in its input images. The output of the first summation module is input to the first channel attention module; the output of the first channel attention module is input to the up-sampling module, which performs an r-times up-sampling operation on it; the output of the up-sampling module is input to the fourth convolution layer, which performs a convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Specifically, the first summation module sums the pixel values of pixel points at the same position of the same channel in its input images, meaning the following. Assume the output image of each multi-perception branch module is an n-channel image, with channels 1, 2, 3, ..., n, and the output of the first convolution layer is likewise an n-channel image with channels 1, 2, 3, ..., n. In this embodiment, the first summation module sums the pixel values of pixel points at the same position of channel 1 in the image output by each multi-perception branch module and channel 1 in the image output by the first convolution layer; it likewise sums the pixel values of pixel points at the same position of channel 2 in the image output by each multi-perception branch module and channel 2 in the image output by the first convolution layer; and so on, up to channel n. That is, the first summation module sums the pixel values of pixels at the same position in the same channel across the image output by each multi-perception branch module and the image output by the first convolution layer.
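Concretely, this per-channel, per-position summation is just elementwise tensor addition. A minimal sketch, illustrated here in PyTorch (the framework choice, tensor shapes, and variable names are illustrative assumptions, not taken from the patent):

```python
import torch

# Illustrative shapes: a batch of one n-channel feature map on an M x N grid, K branches.
M, N, n, K = 48, 48, 64, 2

x0 = torch.randn(1, n, M, N)                            # output of the first convolution layer
branches = [torch.randn(1, n, M, N) for _ in range(K)]  # outputs of the K multi-perception branches

# First summation module: per-channel, per-pixel addition across all inputs.
fused = x0 + torch.stack(branches, dim=0).sum(dim=0)
assert fused.shape == x0.shape
```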
It should be noted that, in the embodiments of the present application, the fourth convolution layer is optional: if the multi-perception attention network includes the fourth convolution layer, the fourth convolution layer outputs the high-resolution image; if it does not, the up-sampling module outputs the high-resolution image.
Specifically, in the present embodiment, a low-resolution image input to the multi-perception attention network (hereinafter referred to as MPAN for convenience of description) is denoted as $X$, where $X$ is an M-row, N-column, C-channel image. The parameters of the first convolution layer are denoted $w_{MPAN,1}$ and the parameters of the fourth convolution layer are denoted $w_{MPAN,2}$. The output of the first convolution layer is denoted $X_0$; specifically, $X_0 = F_{Conv}(X, w_{MPAN,1})$, where $F_{Conv}$ represents a convolution operation. If the first convolution layer includes n convolution kernels, then $X_0$ is an M-row, N-column, n-channel image.
In this embodiment, the output of the first convolution layer is input to each multi-perception branch module (for convenience of description, any one multi-perception branch module is simply referred to as MPB), and the convolution layer set of the k-th MPB in the MPAN is represented as $W_{MPB}^{(k)}$, where $k = 1, \ldots, K$ and K is the number of multi-perception branch modules in the MPAN. In practice, to ensure that the high-resolution image reconstructed by the MPAN has higher fidelity, testing shows that K should be not less than 2; of course, K may take other values, and this embodiment does not specifically limit it.
In this embodiment, the low-resolution image X input into the MPAN is processed according to the following formula (1):

$$X_{SR} = F_{Conv}\Big(F_{up}\Big(F_{CA}\Big(X_0 + \sum_{k=1}^{K} F_{MPB}^{(k)}(X_0)\Big),\, r\Big),\, w_{MPAN,2}\Big) \tag{1}$$

Here, $F_{MPB}^{(1)}(X_0)$ represents the computation of the first MPB on the output $X_0$ of the first convolution layer, $F_{MPB}^{(2)}(X_0)$ that of the second MPB, and $F_{MPB}^{(K)}(X_0)$ that of the K-th MPB. The output of any MPB for $X_0$ is an M-row, N-column, n-channel image.

The sum $X_0 + \sum_{k=1}^{K} F_{MPB}^{(k)}(X_0)$ represents the first summation module summing the pixel values of pixel points at the same position of the same channel in $X_0$ and in the output of each MPB; the first summation module outputs an M-row, N-column, n-channel image.

$F_{CA}(\cdot)$ represents the computation of the first channel attention module on the M-row, N-column, n-channel image output by the first summation module. Denoting that image $X_1$, $F_{CA}(X_1)$ is computed with two convolution layers: $w_{down}$, a convolution layer consisting of n/r' convolution kernels of size 1×1×n, and $w_{up}$, a convolution layer consisting of n convolution kernels of size 1×1×(n/r'), where r' is the vector dimension transform factor in the channel attention module.

$F_{up}(\cdot, r)$ represents the up-sampling module up-sampling the output of the first channel attention module, where r refers to the image size magnification, i.e., the resolution improvement multiple or up-sampling rate.

Finally, $F_{Conv}(\cdot, w_{MPAN,2})$ represents the fourth convolution layer performing a convolution on the output of the up-sampling module to reconstruct the high-resolution image $X_{SR}$.
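The shapes given for $w_{down}$ and $w_{up}$ match the standard squeeze-and-excitation form of channel attention, which the sketch below assumes: global average pooling, reduction by $w_{down}$, ReLU, expansion by $w_{up}$, and a sigmoid gate. The pooling step, class and variable names, and the value of r' are assumptions, since the original formula image is not legible:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Assumed squeeze-and-excitation form of F_CA: global average pooling, w_down,
    ReLU, w_up, then a sigmoid gate that reweights the n channels."""
    def __init__(self, n: int, r_prime: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                      # squeeze each channel to a scalar
        self.w_down = nn.Conv2d(n, n // r_prime, kernel_size=1)  # n/r' kernels of size 1x1xn
        self.w_up = nn.Conv2d(n // r_prime, n, kernel_size=1)    # n kernels of size 1x1x(n/r')
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.sigmoid(self.w_up(self.relu(self.w_down(self.pool(x)))))
        return x * w                                             # adaptive channel reweighting

ca = ChannelAttention(n=64, r_prime=16)
out = ca(torch.randn(1, 64, 48, 48))  # -> (1, 64, 48, 48)
```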
In the present embodiment, the k-th MPB in the MPAN includes $G^{(k)}$ residual channel attention groups (hereinafter referred to as RCAG for convenience of description) and a sixth convolution layer; the structure of the k-th MPB is shown in Fig. 2. As can be seen from Fig. 2, the $G^{(k)}$ RCAGs and the sixth convolution layer are cascaded: the output of the first RCAG is input to the second RCAG, the output of the second RCAG is input to the third RCAG, and so on; the output of the $(G^{(k)}-1)$-th RCAG is input to the $G^{(k)}$-th RCAG, and the output of the $G^{(k)}$-th RCAG is input to the sixth convolution layer. In this embodiment, the convolution layer set of the g-th RCAG in the k-th MPB is denoted $W_{RCAG}^{(k,g)}$ and the sixth convolution layer is denoted $w_{MPB}^{(k)}$; therefore, the convolution layer set of the k-th MPB in the MPAN is $W_{MPB}^{(k)} = \{W_{RCAG}^{(k,1)}, \ldots, W_{RCAG}^{(k,G^{(k)})}, w_{MPB}^{(k)}\}$.
For any one MPB, assuming its input data is X, the MPB computes on X according to the following formula (2):

$$F_{MPB}(X) = F_{Conv}\Big(F_{RCAG}^{(G)}\big(F_{RCAG}^{(G-1)}(\cdots F_{RCAG}^{(1)}(X)\cdots)\big),\, w_{MPB}\Big) \tag{2}$$

where X is the input image, $W_{RCAG}^{(g)}$ represents the convolution layer set of the g-th RCAG, and $w_{MPB}$ represents the sixth convolution layer in the MPB. $F_{RCAG}^{(g)}(\cdot)$ represents the operation of the g-th RCAG in the MPB on its input; for example, $F_{RCAG}^{(1)}(X)$ represents computing on X using $W_{RCAG}^{(1)}$.
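A minimal PyTorch sketch of formula (2). The 3×3 kernel size of the sixth convolution layer and the factory-style wiring are assumptions; the concrete RCAG is sketched after formula (3) below:

```python
import torch
import torch.nn as nn

class MPB(nn.Module):
    """Sketch of formula (2): G cascaded RCAGs followed by the sixth convolution layer."""
    def __init__(self, n: int, G: int, rcag_factory):
        super().__init__()
        # rcag_factory() should build one residual channel attention group (formula (3)).
        self.rcags = nn.Sequential(*[rcag_factory() for _ in range(G)])
        self.w_mpb = nn.Conv2d(n, n, kernel_size=3, padding=1)  # sixth conv layer, 3x3 assumed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_mpb(self.rcags(x))

# Smoke test with identity stand-ins until the real RCAG (sketched below) is plugged in.
mpb = MPB(n=64, G=3, rcag_factory=nn.Identity)
out = mpb(torch.randn(1, 64, 48, 48))  # -> (1, 64, 48, 48)
```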
The structure of the g-th RCAG in the k-th MPB is shown in Fig. 3 and comprises a plurality of cascaded enhanced residual blocks, a second channel attention module, a fifth convolution layer, and a third summation module. Assume the image input to the RCAG is $X_2$. First, $X_2$ is input to the first enhanced residual block; the output of the first enhanced residual block is input to the second enhanced residual block, the output of the second to the third, and so on; the output of the last enhanced residual block is input to the second channel attention module, and the output of the second channel attention module is input to the fifth convolution layer. The output of the fifth convolution layer and the image $X_2$ are respectively input to the third summation module, which sums the pixel values of pixel points at the same position of the same channel in its inputs and outputs a multi-channel image. If the image $X_2$ input to the RCAG is an M-row, N-column, n-channel image, the third summation module outputs an M-row, N-column, n-channel image.
Specifically, any one RCAG computes on its input image according to the following formula (3):

$$F_{RCAG}(X) = X + F_{Conv}\Big(F_{CA}\big(F_{ERB}^{(B)}(\cdots F_{ERB}^{(1)}(X)\cdots)\big),\, w_{RCAG}\Big) \tag{3}$$

where X is the input image of the RCAG and $W_{RCAG}$ is the set of convolution layers in the RCAG. $w_{ERB}^{(b)}$ represents the set of convolution layers of the b-th ERB in the RCAG, and $F_{ERB}^{(b)}(\cdot)$ represents computing on the input using $w_{ERB}^{(b)}$. $w_{RCAG}$ denotes the convolution layer at the end of the RCAG, i.e., the fifth convolution layer in this embodiment. For the $w_{up}$ and $w_{down}$ of the second channel attention module $F_{CA}$, see $w_{up}$ and $w_{down}$ of the first channel attention module in the MPAN; they are not described in detail here.
In this embodiment, the structure of any one enhanced residual block in the g-th RCAG of the k-th MPB is shown in Fig. 4. It comprises a second convolution layer, a third convolution layer, a rectification module, and a second summation module. The computation of the enhanced residual block on its input image proceeds as follows: the input image is first input to the second convolution layer; the output of the second convolution layer is input to the rectification module; the output of the rectification module is input to the third convolution layer; and the output of the third convolution layer, the output of the rectification module, and the input image are respectively input to the second summation module.
Assuming the image input to the enhanced residual block is denoted X, the enhanced residual block computes on the image according to the following formula (4):

$$F_{ERB}(X) = X + F_{ReLU}\big(F_{Conv}(X, w_{ERB,1})\big) + F_{Conv}\Big(F_{ReLU}\big(F_{Conv}(X, w_{ERB,1})\big),\, w_{ERB,2}\Big) \tag{4}$$

where $w_{ERB} = \{w_{ERB,1}, w_{ERB,2}\}$ is the set of convolution layers in the enhanced residual block, $w_{ERB,1}$ representing the second convolution layer and $w_{ERB,2}$ the third convolution layer. $F_{Conv}(X, w_{ERB,1})$ represents the convolution performed on image X by the second convolution layer; $F_{ReLU}(F_{Conv}(X, w_{ERB,1}))$ represents the rectification module's computation on the output of the second convolution layer; and $F_{Conv}(F_{ReLU}(F_{Conv}(X, w_{ERB,1})), w_{ERB,2})$ represents the third convolution layer's convolution on the output of the rectification module.
In this embodiment, the rectifying module may be specifically a linear rectifying module, that is, the rectifying module calculates the output of the first convolution layer according to a linear rectifying function. The linear rectification function is the prior art, and is not described herein.
Fig. 5 is a training process of a multi-awareness network according to an embodiment of the present application, including the following steps:
s501, acquiring an image set to be trained and resolution improvement multiples.
In this embodiment, the resolution improvement factor is a super resolution improvement factor that needs to be achieved by the MPAN network obtained by training in this embodiment. For example, if the MPAN network trained by the embodiment is to achieve a 3-fold resolution improvement effect, the resolution improvement factor in this step is 3.
In this step, the image set to be trained includes: a preset high resolution image set and a preset low resolution image set. The low-resolution image set is obtained by r times degradation of the high-resolution image set. Wherein, the value of r is the resolution improvement multiple in the step.
It should be noted that, the resolution improvement factor in this embodiment may be set according to actual situations, and this embodiment does not limit the specific value of the resolution improvement factor.
In particular, the high-resolution image set is denoted $\{I_{HR}^{(i)}\}_{i=1}^{P}$ and the low-resolution image set is denoted $\{I_{LR}^{(i)}\}_{i=1}^{P}$, where each low-resolution image is obtained by r-times degradation of the corresponding high-resolution image and r is the resolution improvement multiple.
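The patent does not specify the degradation operator that produces the low-resolution set; bicubic downsampling by the factor r is a common choice and is assumed in this sketch:

```python
import torch
import torch.nn.functional as F

def degrade(hr: torch.Tensor, r: int) -> torch.Tensor:
    """Produce a low-resolution image by r-times degradation of a high-resolution image.
    hr: (1, C, H, W) tensor; bicubic interpolation is an assumed degradation model."""
    return F.interpolate(hr, scale_factor=1.0 / r, mode="bicubic", align_corners=False)

lr_img = degrade(torch.randn(1, 3, 96, 96), r=3)  # -> (1, 3, 32, 32)
```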
S502, initializing convolution kernels in each convolution layer in the MPAN network.
Specifically, the MPAN network in this embodiment is an MPAN network provided in fig. 1 in this embodiment of the present application.
In this step, the size of the convolution kernel in each convolution layer in the MPAN network is t×t×c, and the number of convolution kernels in each convolution layer may be n.
S503, inputting the low-resolution image set and the resolution improvement times into an MPAN network, and obtaining a result image set after the MPAN network rebuilds the low-resolution image set.
In this step, after the low resolution image set and the resolution improvement factor are input into the MPAN network, the MPAN network reconstructs the low resolution image, and outputs a high resolution image reconstructed from the low resolution image set, which is referred to as a result image for convenience of description.
S504, calculating a loss function value between the result image set and the high-resolution image set according to a preset loss function.
In this embodiment, the loss function may take the form shown in the following formula (6):

$$L_1(W_{MPAN}) = \frac{1}{P} \sum_{i=1}^{P} \big\| F_{MPAN}(I_{LR}^{(i)}) - I_{HR}^{(i)} \big\|_1 \tag{6}$$

where $L_1(W_{MPAN})$ denotes the loss function value, $F_{MPAN}(I_{LR}^{(i)})$ denotes the result image computed by the MPAN network for the low-resolution image $I_{LR}^{(i)}$, and $I_{HR}^{(i)}$ denotes the high-resolution image in the high-resolution image set corresponding to $I_{LR}^{(i)}$.
In this embodiment, the loss function may also take the form shown in the following formula (7):

$$L_2(W_{MPAN}) = \frac{1}{P} \sum_{i=1}^{P} \big\| F_{MPAN}(I_{LR}^{(i)}) - I_{HR}^{(i)} \big\|_2^2 \tag{7}$$

where $L_2(W_{MPAN})$ denotes the loss function value, $F_{MPAN}(I_{LR}^{(i)})$ denotes the result image computed by the MPAN network for the low-resolution image $I_{LR}^{(i)}$, and $I_{HR}^{(i)}$ denotes the corresponding high-resolution image.
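Up to the averaging convention, formulas (6) and (7) are the standard L1 and L2 reconstruction losses, which map directly onto PyTorch built-ins:

```python
import torch
import torch.nn.functional as F

sr = torch.randn(4, 3, 96, 96)   # result images reconstructed by the MPAN network
hr = torch.randn(4, 3, 96, 96)   # corresponding high-resolution images

l1 = F.l1_loss(sr, hr)           # formula (6): mean absolute error
l2 = F.mse_loss(sr, hr)          # formula (7): mean squared error
```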
In the embodiment, only two specific formulas of the loss function are provided, in practice, the calculation formulas of the loss function may also use formulas of other forms, and the specific form of the loss function is not limited in the embodiment.
S505, adjust the convolution operation weights of all convolution layers in the MPAN network according to the loss function value, and return to the step of inputting the low-resolution image set and the resolution improvement multiple into the MPAN network to obtain the result image set, repeating until the loss function value no longer decreases, thereby obtaining the trained MPAN network.
The goal of training the MPAN network in this embodiment is: by adjusting the convolution operation weights of all convolution layers in the MPAN network, the loss function value between the result image computed by the MPAN network for an input low-resolution image and the high-resolution image corresponding to that low-resolution image reaches a minimum, at which point the trained MPAN network is obtained. That is, the set of convolution operation weights of all convolution layers in the trained MPAN network is optimal for the specified resolution improvement multiple r.
Specifically, in this step, the specific implementation process of adjusting the convolution operation weights of all the convolution layers in the MPAN network according to the loss function value is the prior art, and will not be described herein.
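Steps S503-S505 amount to a standard gradient-descent loop. A minimal sketch, assuming an `mpan` model and a `loader` of (low-resolution, high-resolution) pairs; the Adam optimizer, learning rate, and fixed epoch count are illustrative assumptions (the patent instead iterates until the loss no longer decreases):

```python
import torch
import torch.nn.functional as F

def train(mpan, loader, epochs: int = 100, lr: float = 1e-4):
    opt = torch.optim.Adam(mpan.parameters(), lr=lr)
    for _ in range(epochs):                   # or: stop when the loss no longer decreases
        for lr_img, hr_img in loader:
            sr_img = mpan(lr_img)             # S503: reconstruct the low-resolution batch
            loss = F.l1_loss(sr_img, hr_img)  # S504: loss between result and HR images
            opt.zero_grad()
            loss.backward()                   # S505: adjust all convolution weights
            opt.step()
    return mpan
```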
Fig. 6 is a schematic diagram of an image super-resolution reconstruction method according to an embodiment of the present application, including the following steps:
s601, acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple.
In this step, the manner of acquiring the low resolution image to be reconstructed is the prior art, and is not described herein. The resolution improvement factor in this step may be set by the user according to the actual situation, and the value of the resolution improvement factor is not limited in this embodiment.
S602, inputting the low-resolution image and the resolution improvement times into a trained preset network to obtain a high-resolution reconstructed image.
The trained preset network in this step is the MPAN network obtained through the training in the embodiment corresponding to Fig. 5, and the high-resolution reconstructed image output by the trained MPAN network is obtained.
S603, outputting a high-resolution reconstruction image.
In this step, a specific implementation manner of outputting the high-resolution reconstructed image is the prior art, and will not be described herein.
Fig. 7 is a schematic diagram of an image super-resolution reconstruction device according to an embodiment of the present application, including: an acquisition module 701, a reconstruction module 702 and an output module 703.
The acquiring module 701 is configured to acquire a low-resolution image to be reconstructed and a preset resolution improvement factor. The reconstruction module 702 is configured to input the low-resolution image and the resolution enhancement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any multi-perception branching module comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups includes a plurality of cascaded enhanced residual blocks;
The low-resolution image is input into a first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into a first summation module; the first summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input into the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for up-sampling the output of the first channel attention module by a resolution improvement multiple; the up-sampling module obtains a high-resolution reconstruction image;
any one of the enhanced residual blocks includes: the second convolution layer, the third convolution layer, the rectifying module and the second summation module; an image for inputting the enhanced residual block is input to a second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectifying module is input into the third convolution layer; the image used for inputting the enhancement residual block, the output of the rectifying module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image to obtain a multi-channel image;
An output module 703 for outputting a high resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer; the output of the up-sampling module is input into a fourth convolution layer; and the fourth convolution layer carries out convolution operation on the output of the up-sampling module to obtain a high-resolution reconstructed image.
Optionally, any one of the residual channel attention groups further includes: a second channel attention module, a fifth convolution layer, and a third summation module; an image for inputting the residual channel attention group is input into a first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block inputs a second enhanced residual block of the residual channel attention group; the output of the (B-1) th enhanced residual block in the residual channel attention group is input into the (B) th enhanced residual block in the residual channel attention group; the output of the B enhanced residual block is input into a second channel attention module; the output of the second channel attention module is input into the fifth convolution layer; the image of the attention group of the residual channel and the output of the fifth convolution layer are input into a third summation module respectively; and the third summation module is used for summing pixel values of the pixel points at the same position of the same channel in the input image and outputting a multi-channel image.
Optionally, the rectifying module is a linear rectifying module.
An embodiment of the present application provides an apparatus, as shown in fig. 8, including at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory complete communication with each other through a bus; the processor is used for calling the program instructions in the memory to execute the image super-resolution reconstruction method. The device herein may be a server, PC, PAD, cell phone, etc.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, the device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory such as random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM), among other forms of computer-readable media; the memory includes at least one memory chip. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises that element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. An image super-resolution reconstruction method, which is characterized by comprising the following steps:
acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branching modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branching module and the output of the first convolution layer are respectively input into the first summation module; the first summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for up-sampling the output of the first channel attention module by the resolution improvement times; the up-sampling module obtains the high-resolution reconstructed image;
Any one of the enhanced residual blocks includes: the second convolution layer, the third convolution layer, the rectifying module and the second summation module; an image for inputting the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectifying module is input into the third convolution layer; the image for inputting the enhanced residual block, the output of the rectifying module, and the output of the third convolution layer are respectively input to the second summing module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input image to obtain a multi-channel image;
outputting the high-resolution reconstructed image.
2. The method of claim 1, wherein the predetermined number is not less than 2.
3. The method of claim 1, wherein the pre-set network further comprises: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolution layer;
and the fourth convolution layer carries out convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
4. A method according to claim 3, wherein any one of said residual channel attention groups further comprises: a second channel attention module, a fifth convolution layer, and a third summation module;
An image for inputting the residual channel attention group, a first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block inputs a second enhanced residual block of the residual channel attention group; the output of the (B-1) th enhanced residual block in the residual channel attention group is input into the (B) th enhanced residual block in the residual channel attention group; the output of the B enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image for inputting the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of the pixel points at the same position of the same channel in the input image and outputting a multi-channel image.
5. The method of any one of claims 1-4, wherein the rectifying module is a linear rectifying module.
6. An image super-resolution reconstruction apparatus, comprising:
an acquisition module configured to acquire a low-resolution image to be reconstructed and a preset resolution improvement multiple;
a reconstruction module configured to input the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image, wherein the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module, and an up-sampling module; each multi-perception branch module comprises a plurality of cascaded residual channel attention groups; and each residual channel attention group comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input to the first convolution layer, and the output of the first convolution layer is input to each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are each input to the first summation module; the first summation module is configured to sum the pixel values of pixels at the same position in the same channel across its input images; the output of the first summation module is input to the first channel attention module; the output of the first channel attention module is input to the up-sampling module; and the up-sampling module is configured to up-sample the output of the first channel attention module by the resolution improvement multiple to obtain the high-resolution reconstructed image;
each enhanced residual block comprises a second convolution layer, a third convolution layer, a rectifying module, and a second summation module; the image input to the enhanced residual block is input to the second convolution layer; the output of the second convolution layer is input to the rectifying module; the output of the rectifying module is input to the third convolution layer; the image input to the enhanced residual block, the output of the rectifying module, and the output of the third convolution layer are each input to the second summation module; and the second summation module is configured to sum the pixel values of pixels at the same position in the same channel across its input images to obtain a multi-channel image;
and an output module configured to output the high-resolution reconstructed image.
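Assembling the pieces, the overall preset network of claims 1 and 6 can be sketched as below, again reusing the classes above. The branch count, group count, channel width, and the sub-pixel (PixelShuffle) realization of the up-sampling module are assumptions; only the topology, including the fourth convolution layer of claims 3 and 7, comes from the claims.

```python
class MultiPerceptionBranch(nn.Module):
    """One multi-perception branch module: cascaded residual channel
    attention groups (group count assumed)."""

    def __init__(self, channels: int = 64, num_groups: int = 3):
        super().__init__()
        self.groups = nn.Sequential(
            *[ResidualChannelAttentionGroup(channels) for _ in range(num_groups)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.groups(x)


class PresetNetwork(nn.Module):
    def __init__(self, in_channels: int = 3, channels: int = 64,
                 num_branches: int = 2, scale: int = 2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, 3, padding=1)   # "first convolution layer"
        self.branches = nn.ModuleList(
            [MultiPerceptionBranch(channels) for _ in range(num_branches)]
        )
        self.attention = ChannelAttention(channels)                   # "first channel attention module"
        self.upsample = nn.Sequential(                                # assumed sub-pixel up-sampling
            nn.Conv2d(channels, channels * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                                   # by the resolution improvement multiple
        )
        self.conv4 = nn.Conv2d(channels, in_channels, 3, padding=1)   # "fourth convolution layer" (claims 3, 7)

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        shallow = self.conv1(lr)
        # "First summation module": element-wise sum of every branch
        # output and the first convolution layer's output.
        fused = shallow + sum(branch(shallow) for branch in self.branches)
        return self.conv4(self.upsample(self.attention(fused)))
```

Under these assumptions, PresetNetwork(scale=2)(torch.randn(1, 3, 32, 32)) yields a 1×3×64×64 tensor, i.e. the input up-sampled by the resolution improvement multiple.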
7. The apparatus of claim 6, wherein the preset network further comprises a fourth convolution layer;
the output of the up-sampling module is input to the fourth convolution layer;
and the fourth convolution layer performs a convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
8. The apparatus of claim 6, wherein each residual channel attention group further comprises: a second channel attention module, a fifth convolution layer, and a third summation module;
the image input to the residual channel attention group is input to the first enhanced residual block of the residual channel attention group; the output of the first enhanced residual block is input to the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block of the residual channel attention group is input to the B-th enhanced residual block of the residual channel attention group; the output of the B-th enhanced residual block is input to the second channel attention module; the output of the second channel attention module is input to the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are each input to the third summation module;
and the third summation module is configured to sum the pixel values of pixels at the same position in the same channel across its input images and to output a multi-channel image.
9. A storage medium comprising a stored program, wherein the program, when run, performs the image super-resolution reconstruction method according to any one of claims 1-5.
10. An apparatus comprising at least one processor, at least one memory, and a bus connected to the processor and the memory; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the image super-resolution reconstruction method according to any one of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911140450.XA 2019-11-20 2019-11-20 Image super-resolution reconstruction method and device

Publications (2)

Publication Number Publication Date
CN111223046A CN111223046A (en) 2020-06-02
CN111223046B (en) 2023-04-25

Family

ID=70832773

Country Status (1)

CN (1) CN111223046B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant