CN114998457B - Image compression method, image decompression method, related device and readable storage medium - Google Patents


Info

Publication number
CN114998457B
Authority
CN
China
Prior art keywords: nth, layer, vector, global, local
Prior art date
Legal status: Active
Application number
CN202210915500.2A
Other languages
Chinese (zh)
Other versions
CN114998457A (en)
Inventor
梁永生
郑琳峰
鲍有能
谭文
李超
Current Assignee: Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology (Shenzhen)
Original Assignee: Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology (Shenzhen)
Application filed by Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology (Shenzhen)
Priority to CN202210915500.2A
Publication of CN114998457A
Application granted; publication of CN114998457B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00: Image coding
    • G06T 9/002: Image coding using neural networks
    • G06T 9/001: Model-based coding, e.g. wire frame
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The embodiments of the present application disclose an image compression method, an image decompression method, related devices, and a readable storage medium, which are used to fit low-, medium-, and high-frequency information and improve the rate-distortion performance of image compression. The method in the embodiments of the present application comprises: obtaining a two-dimensional coordinate vector, in the spatial domain, of each pixel point in an image to be compressed; inputting the image to be compressed and the two-dimensional coordinate vectors into the input layer of a multilayer perceptron to obtain, from the input layer, a multi-dimensional vector of each pixel point at each frequency in the frequency domain; for each pixel point, inputting its multi-dimensional vector into N cascaded hidden layers of the multilayer perceptron, which cascade the global features and local features of the multi-dimensional vector to produce an Nth comprehensive feature vector output by the Nth hidden layer; for each pixel point, inputting its Nth comprehensive feature vector into the output layer of the multilayer perceptron, which outputs a compressed pixel value for that pixel point; and obtaining a compressed target image from the compressed pixel values of all pixel points.

Description

Image compression method, image decompression method, related device and readable storage medium
Technical Field
The embodiment of the application relates to the field of image processing, in particular to an image compression method, an image decompression method, related equipment and a readable storage medium.
Background
Image compression has many benefits: it reduces the storage space occupied by an image file and the network bandwidth consumed when the file is transmitted. Compressing images is therefore of practical significance.
An existing image compression method proceeds as follows: obtain a two-dimensional coordinate vector, in the spatial domain, of each pixel point in the image to be compressed; input the image to be compressed and the two-dimensional coordinate vectors into the input layer of a pre-trained multilayer perceptron to obtain a multi-dimensional vector of each pixel point at each frequency in the frequency domain; for each pixel point, input its multi-dimensional vector into N cascaded fully-connected layers of the multilayer perceptron, which cascade the features of the multi-dimensional vector to produce an Nth comprehensive feature vector output by the Nth fully-connected layer; for each pixel point, input its Nth comprehensive feature vector into the output layer of the multilayer perceptron, which outputs a compressed pixel value for that pixel point; and obtain a compressed target image from the compressed pixel values of all pixel points.
However, because the nonlinear activation function adopted by the fully-connected layers is typically the ReLU activation function, a multilayer perceptron composed of an input layer, N fully-connected layers, and an output layer exhibits spectral bias: it can fit low-frequency information but struggles to fit medium- and high-frequency information. As a result, the Nth comprehensive feature vector output by the Nth fully-connected layer represents the features of the image to be compressed incompletely, and the rate-distortion performance of image compression is low.
Disclosure of Invention
The embodiments of the present application provide an image compression method, an image decompression method, related devices, and a readable storage medium, which are used to fit low-, medium-, and high-frequency information and improve the rate-distortion performance of image compression.
In a first aspect, an embodiment of the present application provides an image compression method, including:
obtaining a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, inputting the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
for each pixel point, inputting the multidimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade the global features and the local features of the multidimensional vector, obtaining an Nth comprehensive feature vector output by the Nth hidden layer; n is an integer greater than or equal to 2;
for each pixel point, inputting the Nth comprehensive characteristic vector of the pixel point into an output layer in the multilayer perceptron, and outputting a compressed pixel value corresponding to the pixel point by the output layer;
and obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point.
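The method above treats the image as an implicit neural representation: the multilayer perceptron maps each pixel point's two-dimensional coordinate to a pixel value, so storing the trained network is storing the image. A minimal sketch of the input layer's coordinate-to-frequency-domain mapping, using a fixed sine/cosine encoding (the fixed frequencies here are illustrative assumptions; the embodiment described later uses random Fourier mapping):

```python
import math

def fourier_features(x, y, freqs):
    """Map a pixel's 2-D spatial-domain coordinate (x, y) to a
    multi-dimensional frequency-domain vector, one sin/cos pair per
    coordinate per frequency (hypothetical fixed frequencies)."""
    feats = []
    for f in freqs:
        feats += [math.sin(2 * math.pi * f * x), math.cos(2 * math.pi * f * x),
                  math.sin(2 * math.pi * f * y), math.cos(2 * math.pi * f * y)]
    return feats

# Normalised coordinate of one pixel, encoded at two frequencies
v = fourier_features(0.25, 0.75, freqs=[1.0, 2.0])
```

Each two-dimensional coordinate thus becomes a multi-dimensional vector with several components per frequency, which is what the cascaded hidden layers consume.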
Optionally, each hidden layer includes a local branch unit, a global branch unit, and a synthesis unit respectively connected to the local branch unit and the global branch unit;
after the N hidden layers cascade the global features and the local features of the multidimensional vector, an nth comprehensive feature vector output by the nth hidden layer is obtained, including:
each local branch unit of the N hidden layers is used for extracting the local features of the multidimensional vector in a cascading manner to obtain an Nth-level local feature vector output by an Nth-level local branch unit;
extracting global features of the multidimensional vector by each global branch unit of the N hidden layers in a cascading manner to obtain an Nth-level global feature vector output by an Nth-level global branch unit;
and performing comprehensive processing on the Nth-level local feature vector and the Nth-level global feature vector by a comprehensive unit of the Nth-level hidden layer to output an Nth comprehensive feature vector of the Nth-level hidden layer.
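A minimal sketch of one such hidden layer: a local branch unit (linear layer followed by a Gaussian activation, as described below), a global branch unit (linear layer followed by a nonlinear activation, here assumed to be ReLU), and a synthesis unit. Elementwise summation in the synthesis unit is an assumption; the patent only specifies "comprehensive processing":

```python
import math

def linear(weights, bias, x):
    """Plain linear (fully-connected) layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def gaussian(v, sigma=1.0):
    """Gaussian activation of the local branch: exp(-u^2 / (2*sigma^2))."""
    return [math.exp(-u * u / (2.0 * sigma * sigma)) for u in v]

def relu(v):
    """Assumed nonlinear activation of the global branch."""
    return [max(0.0, u) for u in v]

def hidden_layer(x, w_loc, b_loc, w_glob, b_glob):
    local_feat = gaussian(linear(w_loc, b_loc, x))    # local branch unit
    global_feat = relu(linear(w_glob, b_glob, x))     # global branch unit
    # Synthesis unit: elementwise sum (an assumed form of the
    # "comprehensive processing" of the two feature vectors)
    return [l + g for l, g in zip(local_feat, global_feat)]
```

The Gaussian activation, unlike ReLU, is a localized bump, which is what lets the local branch respond to fine (high-frequency) structure.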
Optionally, each of the local branching units includes a linear layer and a gaussian activation function layer connected to each other;
the step of extracting the local features of the multidimensional vector by the cascade connection of the local branch units of the N hidden layers to obtain an nth-level local feature vector output by an nth-level local branch unit includes:
for a level 1 local branch unit, inputting local features of the multi-dimensional vector into a level 1 local branch unit of the N hidden layers, processing the local features of the multi-dimensional vector by a linear layer of the level 1 local branch unit to output a first local linear feature vector, processing the first local linear feature vector by a Gaussian activation function layer of the level 1 local branch unit to output a level 1 local feature vector;
for the nth stage local branch unit, inputting the nth-1 synthesized feature vector output by the nth-1 stage hidden layer into the nth stage local branch unit, processing the nth-1 synthesized feature vector by a linear layer of the nth stage local branch unit to output an nth local linear feature vector, and processing the nth local linear feature vector by a Gaussian activation function layer of the nth stage local branch unit to output an nth stage local feature vector; wherein N is more than or equal to 2 and less than or equal to N.
Optionally, each of the global branching units includes a linear layer and a nonlinear activation function layer connected to each other;
the extracting, by the cascade connection of each global branching unit of the N hidden layers, the global feature of the multidimensional vector to obtain an nth level global feature vector output by an nth level global branching unit includes:
for a first-level global branch unit, inputting the multidimensional vector into a 1 st-level global branch unit in the N hidden layers, processing global features of the multidimensional vector by a linear layer of the 1 st-level global branch unit to output a first global linear feature vector, and processing the first global linear feature vector by a non-linear activation function layer of the 1 st-level global branch unit to output a 1 st-level global feature vector;
for an nth level global branch unit, inputting an nth-1 level global feature vector output by the nth-1 level hidden layer into the nth level global branch unit, processing the nth-1 level global feature vector by a linear layer of the nth level global branch unit to output an nth global linear feature vector, and processing the nth global linear feature vector by a nonlinear activation function layer of the nth level global branch unit to output an nth level global feature vector; wherein N is more than or equal to 2 and less than or equal to N.
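Putting the two optional embodiments above together, the wiring of the N cascaded hidden layers can be sketched as follows: the nth local branch unit consumes the (n-1)th comprehensive feature vector, the global branch units form their own chain, and each synthesis unit combines the two branch outputs. Identity linear layers and elementwise-sum synthesis are simplifying assumptions:

```python
import math

def gaussian(v):
    return [math.exp(-u * u / 2.0) for u in v]

def relu(v):
    return [max(0.0, u) for u in v]

def cascade(x, n_layers):
    """N cascaded hidden layers with identity linear layers (a
    simplification): the nth local branch consumes the (n-1)th
    comprehensive feature vector, the global branches chain among
    themselves, and each synthesis unit sums the two branch outputs."""
    comprehensive = x    # fed to the 1st-level local branch unit
    global_feat = x      # fed to the 1st-level global branch unit
    for _ in range(n_layers):
        local_feat = gaussian(comprehensive)   # local branch unit
        global_feat = relu(global_feat)        # global branch unit
        comprehensive = [l + g for l, g in zip(local_feat, global_feat)]
    return comprehensive                       # Nth comprehensive feature vector

out = cascade([0.0, 1.0], n_layers=2)
```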
Optionally, before the image to be compressed and the two-dimensional coordinate vector are input to an input layer in a pre-trained multi-layer perceptron, the method further includes:
obtaining a two-dimensional coordinate vector of each sample pixel point in a space domain in a multi-frame image sample; wherein, each sample pixel point is respectively marked with a pixel value;
inputting the image sample and the two-dimensional coordinate vector of the image sample into an initial multilayer perceptron, and outputting a predicted pixel value corresponding to each sample pixel point by the initial multilayer perceptron;
and calculating the loss between the predicted pixel value and the labeled pixel value of each sample pixel point according to a regression loss function, and obtaining the trained multilayer perceptron when the loss meets the convergence condition.
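The training procedure above, minimizing a regression loss between predicted and labeled pixel values until a convergence condition is met, can be sketched with a one-parameter stand-in for the multilayer perceptron (the learning rate, tolerance, and toy model are illustrative assumptions):

```python
def predict(w, p):
    """Toy one-parameter stand-in for the multilayer perceptron."""
    return w * p

def mse_loss(w, samples):
    """Regression loss between predicted and labeled pixel values."""
    return sum((predict(w, p) - y) ** 2 for p, y in samples) / len(samples)

def train(samples, lr=0.1, tol=1e-8, max_steps=10000):
    w = 0.0
    for _ in range(max_steps):
        # gradient of the mean-square-error loss with respect to w
        grad = sum(2.0 * (predict(w, p) - y) * p
                   for p, y in samples) / len(samples)
        w -= lr * grad
        if mse_loss(w, samples) < tol:   # convergence condition met
            break
    return w

# Sample pixel points (coordinate, labeled pixel value), here lying on y = 2p
samples = [(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]
w_trained = train(samples)
```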
Optionally, the inputting the image to be compressed and the two-dimensional coordinate vector into an input layer of a pre-trained multi-layer perceptron includes:
obtaining a first fitting parameter value of the initial multi-layer perceptron and a second fitting parameter value of the trained multi-layer perceptron;
subtracting the first fitting parameter value from the second fitting parameter value to obtain a residual fitting parameter value;
converting the residual error fitting parameter value from a floating point number to an integer to obtain a target residual error fitting parameter value;
adding the target residual error fitting parameter value and the first fitting parameter value to obtain a target fitting parameter value;
and inputting the image to be compressed and the two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron with fitting parameters as the target fitting parameter values.
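The four parameter-processing steps above can be sketched directly. The fixed-point scale used for the float-to-integer conversion is an assumption; the patent does not specify one:

```python
def quantize_residual(theta_init, theta_trained, scale=2 ** 8):
    """Residual coding of the fitting parameters: subtract the first
    (initial) values from the second (trained) values, convert the
    residual from floating point to integer, and add it back to the
    initial values to obtain the target fitting parameter values.
    `scale` is a hypothetical fixed-point step."""
    residual = [t - t0 for t, t0 in zip(theta_trained, theta_init)]
    residual_int = [round(r * scale) for r in residual]   # float -> integer
    theta_target = [t0 + q / scale
                    for t0, q in zip(theta_init, residual_int)]
    return residual_int, theta_target

ints, theta = quantize_residual([0.10, -0.20], [0.35, -0.18])
```

The integer residuals are what would then be entropy-encoded; only a quantization error bounded by the fixed-point step is introduced into the target fitting parameter values.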
In a second aspect, an embodiment of the present application provides an image decompression method, including:
determining the size of an image to be decompressed of a target image;
obtaining a target two-dimensional coordinate vector of each pixel point of the target image in a spatial domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
for each pixel point, inputting a target multi-dimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after cascading processing is carried out on global features and local features of the target multi-dimensional vector by the N hidden layers, obtaining an Nth target comprehensive feature vector output by the Nth-level hidden layer; n is an integer greater than or equal to 2;
for each pixel point, inputting the Nth target comprehensive characteristic vector of the pixel point into an output layer in the multilayer perceptron, and outputting a decompressed pixel value corresponding to the pixel point by the output layer;
and obtaining a decompressed image according to the decompressed pixel value corresponding to each pixel point.
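Because the multilayer perceptron maps coordinates to pixel values, decompression at a chosen image size reduces to evaluating the network on a coordinate grid of that size. A sketch of generating the target two-dimensional coordinate vectors (normalization to [0, 1] is an assumption):

```python
def coordinate_grid(height, width):
    """Target two-dimensional coordinate vector of every pixel point at
    the requested decompressed image size, normalised to [0, 1]
    (the normalisation convention is an assumption)."""
    return [(i / (height - 1), j / (width - 1))
            for i in range(height)
            for j in range(width)]

grid = coordinate_grid(3, 2)   # 3x2 target image -> 6 coordinate vectors
```

Since the grid can be built at any height and width, the same trained network supports decompression at sizes other than the original.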
In a third aspect, an embodiment of the present application provides an image compression apparatus, including:
the input unit is used for obtaining a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, inputting the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
the cascade processing unit is used for inputting the multidimensional vectors of the pixel points into N cascaded hidden layers in the multilayer perceptron, and after the global features and the local features of the multidimensional vectors are subjected to cascade processing by the N hidden layers, an Nth comprehensive feature vector output by the Nth-level hidden layer is obtained; n is an integer greater than or equal to 2;
the output unit is used for inputting the Nth comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and outputting a compressed pixel value corresponding to the pixel point by the output layer;
and the obtaining unit is used for obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point.
In a fourth aspect, an embodiment of the present application provides an image decompression apparatus, including:
a determining unit, configured to determine an image size to be decompressed for the target image;
the input unit is used for obtaining a target two-dimensional coordinate vector of each pixel point of the target image in a space domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
the cascade processing unit is used for inputting the target multi-dimensional vectors of the pixel points into N cascaded hidden layers in the multilayer perceptron, and after the global features and the local features of the target multi-dimensional vectors are subjected to cascade processing by the N hidden layers, the N target comprehensive feature vectors output by the N-level hidden layers are obtained; n is an integer greater than or equal to 2;
the output unit is used for inputting the Nth target comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and outputting a decompressed pixel value corresponding to the pixel point by the output layer;
an obtaining unit, configured to obtain a decompressed image according to the decompressed pixel value corresponding to each pixel point.
In a fifth aspect, an embodiment of the present application provides an image processing apparatus, including:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient storage memory or a persistent storage memory;
the central processor is configured to communicate with the memory and execute the instruction operations in the memory to perform the aforementioned image compression method and image decompression method.
In a sixth aspect, the present application provides a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to execute the foregoing image compression method and image decompression method.
In a seventh aspect, the present application provides a computer program product including instructions, which when run on a computer, causes the computer to execute the foregoing image compression method and image decompression method.
According to the technical solutions above, the embodiments of the present application have the following advantages: a two-dimensional coordinate vector, in the spatial domain, of each pixel point in the image to be compressed is obtained; the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of a pre-trained multilayer perceptron to obtain the multi-dimensional vector of each pixel point at each frequency; for each pixel point, the multi-dimensional vector is input into N cascaded hidden layers of the multilayer perceptron, which cascade the global features and local features of the multi-dimensional vector to obtain the Nth comprehensive feature vector output by the Nth-level hidden layer; for each pixel point, the Nth comprehensive feature vector is input into the output layer of the multilayer perceptron, which outputs the compressed pixel value corresponding to the pixel point; and the compressed target image is obtained according to the compressed pixel value corresponding to each pixel point. In this way, low-, medium-, and high-frequency information can all be fitted, the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be compressed with good integrity, and the rate-distortion performance of image compression is high.
Drawings
Fig. 1 is a schematic flowchart of an image compression method disclosed in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for quantizing and entropy-encoding the difference between the first fitting parameter values and the second fitting parameter values disclosed in an embodiment of the present application;
FIG. 3 is a block diagram of the overall architecture of a multilayer perceptron disclosed in an embodiment of the present application;
fig. 4 is a schematic flowchart of an image decompression method disclosed in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image compression apparatus disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another image compression apparatus disclosed in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image decompression apparatus disclosed in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image processing apparatus disclosed in an embodiment of the present application.
Detailed Description
The embodiments of the present application provide an image compression method, an image decompression method, related devices, and a readable storage medium, which are used to fit low-, medium-, and high-frequency information and improve the rate-distortion performance of image compression.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image compression method disclosed in an embodiment of the present application, the method including:
101. Obtaining a two-dimensional coordinate vector of each pixel point in the image to be compressed in the spatial domain, inputting the image to be compressed and the two-dimensional coordinate vector into the input layer of a pre-trained multilayer perceptron, and obtaining the multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer.
When image compression is carried out, a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain can be obtained, the to-be-compressed image and the two-dimensional coordinate vector are input into an input layer in a pre-trained multilayer perceptron, and a multi-dimensional vector of each frequency of each pixel point output by the input layer in a frequency domain is obtained.
102. For each pixel point, inputting the multidimensional vector of the pixel point into N cascaded hidden layers in a multilayer perceptron, and after cascading processing is carried out on the global features and the local features of the multidimensional vector by the N hidden layers, obtaining an N comprehensive feature vector output by an N-th hidden layer; n is an integer greater than or equal to 2.
After the multi-dimensional vectors of each pixel point at each frequency in the frequency domain are obtained from the input layer, for each pixel point, the multi-dimensional vector is input into the N cascaded hidden layers of the multilayer perceptron, and after the N hidden layers cascade the global features and local features of the multi-dimensional vector, the Nth comprehensive feature vector output by the Nth hidden layer is obtained; N is an integer greater than or equal to 2. It can be understood that the N hidden layers may cascade the global and local features of the multi-dimensional vector as follows: the local branch units of the N hidden layers extract the local features of the multi-dimensional vector in a cascading manner to obtain the Nth-level local feature vector output by the Nth-level local branch unit, while the global branch units of the N hidden layers extract the global features of the multi-dimensional vector in a cascading manner to obtain the Nth-level global feature vector output by the Nth-level global branch unit. Other reasonable cascade-processing methods are also possible, and no limitation is imposed here.
103. For each pixel point, inputting the Nth comprehensive feature vector of the pixel point into the output layer of the multilayer perceptron, and outputting the compressed pixel value corresponding to the pixel point by the output layer.
After the nth comprehensive characteristic vector output by the nth-level hidden layer is obtained, the nth comprehensive characteristic vector of the pixel point can be input into an output layer of the multilayer perceptron for each pixel point, and the output layer outputs a compressed pixel value corresponding to the pixel point.
104. And obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point.
After the compressed pixel values corresponding to the pixel points are output by the output layer, the compressed target image can be obtained according to the compressed pixel values corresponding to each pixel point.
In the embodiment of the present application, a two-dimensional coordinate vector, in the spatial domain, of each pixel point in the image to be compressed can be obtained; the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of a pre-trained multilayer perceptron to obtain the multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer; for each pixel point, the multi-dimensional vector is input into the N cascaded hidden layers of the multilayer perceptron, which cascade the global features and local features of the multi-dimensional vector to obtain the Nth comprehensive feature vector output by the Nth-level hidden layer; for each pixel point, the Nth comprehensive feature vector is input into the output layer of the multilayer perceptron, which outputs the compressed pixel value corresponding to the pixel point; and the compressed target image is obtained according to the compressed pixel value corresponding to each pixel point. In this way, low-, medium-, and high-frequency information can all be fitted, the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be compressed with good integrity, and the rate-distortion performance of image compression is high.
In this embodiment of the present application, after the global features and the local features of the multidimensional vector are cascade-processed by N hidden layers, there may be a variety of methods for obtaining an nth comprehensive feature vector output by an nth hidden layer, and one of the methods is described below based on the image compression method shown in fig. 1.
In this embodiment, before performing image compression, the multi-layer perceptron needs to be trained in advance, and there may be a variety of training methods, one of which is described below:
Firstly, prior knowledge is embedded into the multilayer perceptron by training it with a meta-learning algorithm, yielding an initial multilayer perceptron and the first fitting parameters of the initial multilayer perceptron. Specifically, the meta-learning algorithm may be based on the MAML algorithm, and training comprises two parts, inner-loop optimization and outer-loop optimization. Let the neural network parameters be θ, the per-parameter learning rate of the inner loop be α, and the number of inner-loop update steps be k. First, θ and α are initialized. In each outer-loop iteration, several samples are randomly drawn from the meta-dataset and divided into training samples and test samples, and the inner-loop network parameters are assigned as θ'₀ = θ. In the inner loop, the parameters are updated according to the prediction loss of the model on the training samples; after the inner loop finishes, the parameters θ'ₖ obtained after k update steps are available. The gradient is then determined from the prediction loss of the model on the test samples; θ'ₖ is not updated again, and instead the outer-loop parameters θ and the learning rate α are updated according to this gradient. Cycling in this way finally yields the first fitting parameters θ₀ and the learning rate α.
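The inner-loop/outer-loop procedure above can be sketched on toy one-parameter regression tasks. This is a simplified, first-order variant of MAML with a fixed scalar α rather than the per-parameter learned learning rate described above:

```python
import random

def loss_and_grad(w, samples):
    n = len(samples)
    loss = sum((w * p - y) ** 2 for p, y in samples) / n
    grad = sum(2.0 * (w * p - y) * p for p, y in samples) / n
    return loss, grad

def maml(tasks, meta_lr=0.05, alpha=0.1, k=3, outer_steps=200):
    """First-order MAML on scalar regression tasks. Each outer step
    samples a task split into training and test samples, runs k inner
    updates from the shared initialisation, then moves the
    initialisation along the gradient of the test loss."""
    w = 0.0
    for _ in range(outer_steps):
        train_s, test_s = random.choice(tasks)
        w_inner = w                           # assign inner-loop parameters
        for _ in range(k):                    # k inner-loop update steps
            _, g = loss_and_grad(w_inner, train_s)
            w_inner -= alpha * g              # update on training loss
        _, g_test = loss_and_grad(w_inner, test_s)
        w -= meta_lr * g_test                 # outer update; w_inner is discarded
    return w

random.seed(0)
# Each task fits y = a * p for a different slope a
tasks = [([(1.0, a)], [(1.0, a)]) for a in (1.8, 2.0, 2.2)]
w0 = maml(tasks)
```

The meta-learned w0 is an initialisation that adapts quickly to any of the tasks, which is the role of the first fitting parameters θ₀.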
After the prior knowledge is embedded into the multilayer perceptron to obtain the initial multilayer perceptron and its first fitting parameters θ₀, the initial multilayer perceptron may be trained. The specific training method is as follows: first, obtain a two-dimensional coordinate vector, in the spatial domain, of each sample pixel point in multiple frames of image samples, where each sample pixel point is labeled with a pixel value. Then input the image samples and their two-dimensional coordinate vectors into the initial multilayer perceptron, which outputs a predicted pixel value for each sample pixel point. Calculate the loss between the predicted pixel value and the labeled pixel value of each sample pixel point according to a regression loss function; when the loss meets the convergence condition, the trained multilayer perceptron is obtained. For example, if the image samples are 2D images, let x denote a single frame image and y denote the labeled pixel values, and let the initial multilayer perceptron be a neural network f with weights θ that represents the frame image. When a two-dimensional coordinate vector p is input, the output predicted pixel value is f_θ(p). The loss between the predicted and labeled pixel values of each sample pixel point is computed by minimizing the mean square error, i.e.

L(θ) = Σ_p ‖f_θ(p) − y(p)‖²,

and when the loss meets the convergence condition, the trained multilayer perceptron is obtained.
After the multilayer perceptron is trained, when image compression is carried out, a two-dimensional coordinate vector, in the spatial domain, of each pixel point in the image to be compressed can be obtained, and the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron to obtain the multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer. Specifically, after the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron, the obtained multi-dimensional vectors are multi-dimensional vectors produced by random Fourier mapping.
It should be noted that, after obtaining the trained multilayer perceptron and before inputting the image to be compressed and the two-dimensional coordinate vector into the input layer of the pre-trained multilayer perceptron, in order to reduce the code rate and thereby improve the rate-distortion performance of the compressed image, the difference between the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron may be quantized and entropy-encoded. Please refer to fig. 2, which is a schematic flow chart of a method for quantizing and entropy-encoding this difference, as disclosed in an embodiment of the present application. The method includes: first obtaining the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron, subtracting the first fitting parameter value from the second fitting parameter value to obtain a residual fitting parameter value, converting the residual fitting parameter value from a floating point number to an integer to obtain a target residual fitting parameter value, and adding the target residual fitting parameter value and the first fitting parameter value to obtain a target fitting parameter value. Specifically, let the whole multilayer perceptron be the mapping $f_{\theta}$, the neural network shown in fig. 2. The parameter of the initial multilayer perceptron is the first fitting parameter value $\theta_{0}$ (i.e. the meta-learning initialization parameter shown in fig. 2, the parameter obtained after meta-learning training). The initial multilayer perceptron is trained to obtain the trained multilayer perceptron, whose second fitting parameter value is $\theta$. Subtracting the first fitting parameter value $\theta_{0}$ from the second fitting parameter value $\theta$ gives the residual fitting parameter value $\Delta\theta = \theta - \theta_{0}$. The residual fitting parameter value is quantized according to a quantization unit $s$, converting it from a floating point number to an integer, $\hat{\Delta\theta} = \mathrm{round}(\Delta\theta / s)$, thereby further compressing the data. The lossless coding module AE then converts $\hat{\Delta\theta}$ into a binary code stream; after the binary code stream passes through the transmission channel, the lossless decoding module AD restores it to the integer $\hat{\Delta\theta}$, which is rescaled by $s$ and added to the first fitting parameter value $\theta_{0}$ to obtain the target fitting parameter value $\hat{\theta} = \theta_{0} + s\,\hat{\Delta\theta}$.
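A minimal NumPy sketch of this residual pipeline follows. The parameter values and the quantization unit $s$ are made up for illustration, and the lossless AE/AD stage is omitted (the integers would pass through any entropy coder unchanged):

```python
import numpy as np

theta0 = np.array([0.50, -1.20, 0.75])    # first fitting parameter values (assumed)
theta  = np.array([0.53, -1.17, 0.70])    # second (trained) values (assumed)

s = 0.01                                  # quantization unit (assumed)
residual = theta - theta0                 # floating-point residual delta-theta
q = np.round(residual / s).astype(int)    # integer residual -> entropy coder

# receiver side: decode the integers, rescale by s, add back theta0
theta_hat = theta0 + q * s
max_err = float(np.max(np.abs(theta_hat - theta)))   # bounded by s / 2
```

Transmitting only the small integer residuals against the shared meta-learned initialization is what lowers the code rate; the reconstruction error per parameter is at most half the quantization unit.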
After the difference between the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron is quantized and entropy-encoded, the image to be compressed may be compressed. Please refer to fig. 3, which is an overall architecture diagram of the multilayer perceptron disclosed in this embodiment of the present application. The image to be compressed and the two-dimensional coordinate vector may be input into the input layer (i.e. the position encoding layer in fig. 3) of the pre-trained multilayer perceptron whose fitting parameters are the target fitting parameter values, so as to obtain the multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer. Specifically, the two-dimensional coordinate vector may first be normalized to obtain a normalized two-dimensional coordinate vector $\mathbf{v}$. The normalized two-dimensional coordinate vector $\mathbf{v}$ is input into the input layer $\gamma$ of the multilayer perceptron, and the input layer outputs the multi-dimensional vector of each pixel point at each frequency in the frequency domain:

$$\gamma(\mathbf{v}) = \left[ a\cos(2\pi B\mathbf{v}),\; a\sin(2\pi B\mathbf{v}) \right]^{\mathrm{T}},$$

where $a$ is the amplitude of the input layer, $d$ is the dimension of the input coordinates ($d = 2$ for a 2D image), and each entry of the mapping matrix $B \in \mathbb{R}^{m \times d}$ is sampled from a Gaussian distribution with mean 0 and standard deviation $\sigma$, a hyper-parameter (the parameter of the Gaussian distribution) set to a default value. As an example of the normalization, for a 2D image with an image size of 3 pixels by 3 pixels, the two-dimensional coordinate vectors of the pixel points in the image are [0,0], [0,1], [0,2], [1,0], [1,1], [1,2], [2,0], [2,1], [2,2]; after normalization, these become [-1,-1], [-1,0], [-1,1], [0,-1], [0,0], [0,1], [1,-1], [1,0], [1,1].
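The normalization and random Fourier mapping can be sketched in NumPy as follows; the mapping size $m$, amplitude $a$ and standard deviation $\sigma$ used here are illustrative assumptions, not values fixed by the embodiment:

```python
import numpy as np

H = W = 3
ij = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"),
              axis=-1).reshape(-1, 2).astype(float)
v = 2 * ij / (np.array([H, W]) - 1) - 1        # normalize coordinates to [-1, 1]

rng = np.random.default_rng(0)
m, a, sigma = 16, 1.0, 10.0                    # hyper-parameters (assumed)
B = rng.normal(0.0, sigma, (m, 2))             # entries ~ N(0, sigma^2), d = 2

def gamma(x):                                  # random Fourier feature mapping
    proj = 2 * np.pi * x @ B.T
    return a * np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

features = gamma(v)                            # (9, 2m) per-frequency features
```

For the 3×3 example above, `v` reproduces exactly the nine normalized coordinate vectors listed in the text, and each pixel is lifted to a $2m$-dimensional frequency vector.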
After the multi-dimensional vector of each pixel point at each frequency in the frequency domain is obtained from the input layer, the multi-dimensional vector of each pixel point can be input into the N cascaded hidden layers in the multilayer perceptron, and the Nth comprehensive feature vector output by the Nth-level hidden layer is obtained after the N hidden layers perform cascade processing on the global features and the local features of the multi-dimensional vector; N is an integer greater than or equal to 2. Specifically, the N cascaded hidden layers may be N cascaded Wavelet Base Units (WBUs), or may be other neural network layers capable of performing cascade processing on the global and local features of the multi-dimensional vector, which is not limited here. Specifically, the Nth comprehensive feature vector $w$ is the output vector of the N cascaded hidden layers.

The method for obtaining the Nth comprehensive feature vector output by the Nth-level hidden layer after the N hidden layers perform cascade processing on the global and local features of the multi-dimensional vector may be: after obtaining the nth-level local feature vector and the nth-level global feature vector, the synthesis unit of the nth-level hidden layer performs synthesis processing on them to output the nth-level comprehensive feature vector of the nth-level hidden layer. Each hidden layer comprises a local branch unit, a global branch unit, and a synthesis unit connected to both. Specifically, the synthesis processing may be a dot (element-wise) multiplication of the nth-level local feature vector and the nth-level global feature vector, or may be addition, matrix multiplication, and the like; the specific method of the synthesis processing is not limited here.
With reference to fig. 3, each dashed box in fig. 3 contains one hidden layer: the linear layer and the Gaussian activation function in the upper part of each hidden layer form the local branch unit, the linear layer and the ReLU (or squared ReLU) activation function in the lower part form the global branch unit, and the part of each hidden layer where the outputs of the local and global branch units are combined is the synthesis unit.
The method for extracting the local features of the multi-dimensional vector may be as follows. For the 1st-level local branch unit, the multi-dimensional vector is input into the 1st-level local branch unit of the N hidden layers; the linear layer of the 1st-level local branch unit processes it to output a first local linear feature vector, and the Gaussian activation function layer of the 1st-level local branch unit processes the first local linear feature vector to output the 1st-level local feature vector. For the nth-level local branch unit, the (n-1)th comprehensive feature vector output by the (n-1)th-level hidden layer is input into the nth-level local branch unit; the linear layer of the nth-level local branch unit processes it to output an nth local linear feature vector, and the Gaussian activation function layer of the nth-level local branch unit processes the nth local linear feature vector to output the nth-level local feature vector, where 2 ≤ n ≤ N. Each local branch unit comprises a linear layer and a Gaussian activation function layer connected to each other. It is understood that the method for extracting the local features of the multi-dimensional vector may be another reasonable method besides the above, which is not limited here.
The method for extracting the global features of the multi-dimensional vector may be as follows. For the 1st-level global branch unit, the multi-dimensional vector is input into the 1st-level global branch unit of the N hidden layers; the linear layer of the 1st-level global branch unit processes it to output a first global linear feature vector, and the nonlinear activation function layer of the 1st-level global branch unit processes the first global linear feature vector to output the 1st-level global feature vector. For the nth-level global branch unit, the (n-1)th-level global feature vector output by the (n-1)th-level hidden layer is input into the nth-level global branch unit; the linear layer of the nth-level global branch unit processes it to output an nth global linear feature vector, and the nonlinear activation function layer of the nth-level global branch unit processes the nth global linear feature vector to output the nth-level global feature vector, where 2 ≤ n ≤ N. Each global branch unit comprises a linear layer and a nonlinear activation function layer connected to each other. The nonlinear activation function layer may be a ReLU activation function layer or a squared ReLU activation function layer, which is not limited here.
It is understood that the method for extracting the global feature of the multidimensional vector may be other reasonable methods besides the above method, and is not limited herein.
It can also be understood that, the method for obtaining the nth comprehensive feature vector output by the nth hidden layer after the global feature and the local feature of the multidimensional vector are cascade-processed by the N hidden layers may be other reasonable methods besides the above method, and is not limited herein.
Specifically, let the inputs of the local branch unit and the global branch unit of the $i$-th level hidden layer be $v_i$ and $u_i$ respectively. Then the iterative process of each level of hidden layer is as follows:

$$u_{i+1} = \rho\!\left(W_i^{(g)} u_i\right) \qquad \text{(formula one)}$$

$$v_{i+1} = g\!\left(W_i^{(l)} v_i\right) \odot u_{i+1} \qquad \text{(formula two)}$$

where $W_i^{(g)}$ and $W_i^{(l)}$ respectively represent the fitting parameters of the linear layers of the global and local branch units in the $i$-th level hidden layer, $\rho$ denotes the nonlinear activation function of the global branch unit, and $g$ denotes the Gaussian activation function of the local branch unit. Formula one represents the mapping relation between the input $u_{i+1}$ of the global branch unit of the $(i+1)$-th level hidden layer and the input $u_i$ of the global branch unit of the $i$-th level hidden layer: $u_{i+1}$, the $i$-th level global feature vector, is obtained from $u_i$ after linear processing by the linear layer and nonlinear processing by the nonlinear activation function layer. Formula two represents the mapping relation between the input $v_{i+1}$ of the local branch unit of the $(i+1)$-th level hidden layer and the input $v_i$ of the local branch unit of the $i$-th level hidden layer: first, $v_i$ undergoes linear processing by the linear layer and nonlinear processing by the Gaussian activation function layer to give the $i$-th level local feature vector $g(W_i^{(l)} v_i)$; the $i$-th level local feature vector is then dot-multiplied with the $i$-th level global feature vector $u_{i+1}$ to obtain the $i$-th comprehensive feature vector output by the $i$-th level hidden layer, and this $i$-th comprehensive feature vector serves as the input $v_{i+1}$ of the local branch unit of the $(i+1)$-th level hidden layer.
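A minimal NumPy sketch of this two-branch iteration follows, assuming toy layer widths, no bias terms, a unit-width Gaussian activation, and ReLU for the global branch (all illustrative choices, not the embodiment's fixed configuration):

```python
import numpy as np

def gaussian(x, s=1.0):                     # local-branch activation g
    return np.exp(-x ** 2 / (2 * s ** 2))

def relu(x):                                # global-branch activation rho
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
N, dim = 3, 8                               # N cascaded hidden layers (assumed)
Wg = [rng.normal(0, 0.3, (dim, dim)) for _ in range(N)]  # global linear layers
Wl = [rng.normal(0, 0.3, (dim, dim)) for _ in range(N)]  # local linear layers

def hidden_layers(z):                       # z: position-encoded input vector
    u, v = z, z                             # global / local branch inputs
    for i in range(N):
        u = relu(u @ Wg[i])                 # i-th level global feature vector
        local = gaussian(v @ Wl[i])         # i-th level local feature vector
        v = local * u                       # dot product -> comprehensive vector
    return v                                # Nth comprehensive feature vector w

w = hidden_layers(rng.normal(size=dim))
```

The element-wise product gates the global ReLU features with localized Gaussian responses, which is what gives each Wavelet Base Unit its combined global/local character.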
After the Nth comprehensive feature vector of the Nth-level hidden layer is output, the Nth comprehensive feature vector of each pixel point can be input into the output layer of the multilayer perceptron, and the output layer outputs the compressed pixel value corresponding to that pixel point. The compressed pixel value may be an RGB value of the image, or may be another pixel value, which is not limited here. Specifically, the output layer comprises a linear layer and a Sigmoid activation function layer. For each pixel point, the Nth comprehensive feature vector of the pixel point is input into the output layer of the multilayer perceptron, and the output layer outputs the compressed pixel value $y$ corresponding to the pixel point:

$$y = \mathrm{Sigmoid}\!\left(W_o\, w\right) \qquad \text{(formula three)}$$

Formula three represents the mapping relation between the compressed pixel value $y$ and the Nth comprehensive feature vector $w$: the Nth comprehensive feature vector $w$ is input into the linear layer of the output layer (with fitting parameters $W_o$) for linear processing, then input into the Sigmoid activation function layer for nonlinear processing, finally giving the compressed pixel value $y$.
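Formula three can be sketched in a few lines of NumPy; the 8-dimensional feature width and the 3 output channels (RGB) are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
Wo = rng.normal(0, 0.3, (8, 3)); bo = np.zeros(3)   # output-layer linear params

w = rng.normal(size=8)                  # Nth comprehensive feature vector
y = sigmoid(w @ Wo + bo)                # compressed pixel value, channels in (0, 1)
```

The Sigmoid keeps every output channel in (0, 1), which matches pixel intensities after normalization.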
After the output layer outputs the compressed pixel values corresponding to the pixel points, a compressed target image can be obtained according to the compressed pixel value of each pixel point; the compressed target image is the reconstructed image obtained after compression by the multilayer perceptron.
It is understood that the encoding scheme by which the input layer maps the two-dimensional coordinate vector into a high-dimensional vector includes, but is not limited to, sine and cosine position encoding, random Fourier feature position encoding, Gaussian function position encoding, and other position encoding schemes. The activation function of the local branch unit includes, but is not limited to, the Gaussian activation function, ReLU function, GeLU function, squaring function, or squared ReLU function. The activation function of the output layer may be selected according to the range of data values; specific embodiments include, but are not limited to, a linear function, Sigmoid function, tanh function, ReLU function, and the like. The values output by the output layer of the multilayer perceptron can take different data types according to the data to be fitted, including but not limited to the amplitude of an audio signal, the RGB values of 2D images and video, the signed distance function values of a 3D surface, the attribute values of a 3D point cloud, and the like. The pixel value may be an RGB value of the image, or other pixel values representing the image, which is not limited here. Algorithms for meta-learning training of the multilayer perceptron include, but are not limited to, the MAML algorithm and the Reptile algorithm. Quantization methods include, but are not limited to, uniform quantization, non-uniform quantization, and vector quantization. The lossless coding module AE and the lossless decoding module AD are entropy coding methods; specific embodiments include, but are not limited to, Huffman coding, arithmetic coding, range coding, asymmetric numeral system coding, and the like.
In this embodiment, a two-dimensional coordinate vector of each pixel point of the image to be compressed in the spatial domain can be obtained, and the image to be compressed and the two-dimensional coordinate vector are input into the input layer of the pre-trained multilayer perceptron to obtain the multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer. For each pixel point, the multi-dimensional vector is input into the N cascaded hidden layers of the multilayer perceptron, which perform cascade processing on its global and local features to obtain the Nth comprehensive feature vector output by the Nth-level hidden layer. The Nth comprehensive feature vector of each pixel point is then input into the output layer of the multilayer perceptron, which outputs the compressed pixel value corresponding to the pixel point, and the compressed target image is obtained according to the compressed pixel value of each pixel point. By designing an efficient activation function and neural network structure, the data fitting capability of the multilayer perceptron is improved, so that low-, medium- and high-frequency information can be fitted; the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be compressed well, and the rate-distortion performance of image compression is high. Secondly, by introducing the first fitting parameters of the initial multilayer perceptron, prior knowledge can be merged into the fitting parameters of the multilayer perceptron; using this prior knowledge, the gradient direction that decreases the loss as fast as possible can be found quickly, which improves the convergence speed when training the multilayer perceptron.

Moreover, quantizing and entropy-encoding the difference between the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron reduces the parameter quantity of the multilayer perceptron, reduces the computational complexity, reduces the code rate, and improves the rate-distortion performance of image compression.
The image compression method in the embodiment of the present application is described above, and the image decompression method in the embodiment of the present application is described below, please refer to fig. 4, where fig. 4 is a schematic flow chart of an image decompression method disclosed in the embodiment of the present application, and the method includes:
401. the size of the image that needs to be decompressed for the target image is determined.
When image decompression is performed, the size of the image that needs to be decompressed for the target image may be determined.
402. And obtaining a target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point output by the input layer in the frequency domain.
After the size of an image to be decompressed of a target image is determined, a target two-dimensional coordinate vector of each pixel point of the target image in a spatial domain under the image size can be obtained, the target image and the target two-dimensional coordinate vector are input into an input layer of a pre-trained multilayer perceptron, and a target multi-dimensional vector of each frequency of each pixel point output by the input layer in a frequency domain is obtained.
403. For each pixel point, inputting a target multi-dimensional vector of the pixel point into N cascaded hidden layers in a multi-layer perceptron, and after cascading processing is carried out on global features and local features of the target multi-dimensional vector by the N hidden layers, obtaining an Nth target comprehensive feature vector output by an Nth-level hidden layer; n is an integer greater than or equal to 2.
After a target multi-dimensional vector of each pixel output by the input layer at each frequency in a frequency domain is obtained, the target multi-dimensional vector of each pixel can be input into N cascaded hidden layers in the multi-layer perceptron aiming at each pixel, and after global features and local features of the target multi-dimensional vector are cascaded by the N hidden layers, an Nth target comprehensive feature vector output by an Nth-level hidden layer is obtained; n is an integer greater than or equal to 2.
404. And inputting the Nth target comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and outputting the decompressed pixel value corresponding to the pixel point by the output layer.
After the nth target comprehensive characteristic vector output by the nth-level hidden layer is obtained, the nth target comprehensive characteristic vector of the pixel point can be input into an output layer in the multilayer perceptron for each pixel point, and the output layer outputs the decompressed pixel value corresponding to the pixel point.
405. A decompressed image is obtained according to the decompressed pixel value of each pixel point.

After the output layer outputs the decompressed pixel values corresponding to the pixel points, the decompressed image can be obtained according to the decompressed pixel value of each pixel point. Specifically, the decompressed image is the reconstructed image obtained by decompressing with the multilayer perceptron.

In this embodiment, the size of the image into which the target image needs to be decompressed can be determined, and a target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain at that image size is obtained. The target image and the target two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron to obtain the target multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer. The target multi-dimensional vector of each pixel point is input into the N cascaded hidden layers of the multilayer perceptron, and the Nth target comprehensive feature vector output by the Nth-level hidden layer is obtained after the N hidden layers perform cascade processing on the global and local features of the target multi-dimensional vector, where N is an integer greater than or equal to 2. The Nth target comprehensive feature vector of each pixel point is input into the output layer of the multilayer perceptron, which outputs the decompressed pixel value corresponding to the pixel point, and the decompressed image is obtained according to the decompressed pixel value of each pixel point. By designing an efficient activation function and neural network structure, the data fitting capability of the multilayer perceptron is improved, so that low-, medium- and high-frequency information can be fitted; the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be decompressed well, and the rate-distortion performance of image decompression is high. Secondly, an image of any image size can be decompressed according to the requirements of the user, which improves the flexibility of decompressing and reconstructing the compressed image.
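Because the network maps coordinates to pixel values, decompression can query any grid size the user chooses, which is what steps 401-405 exploit. A minimal NumPy sketch follows; `mlp` here is a hypothetical stand-in for the trained multilayer perceptron, not the actual model:

```python
import numpy as np

def mlp(coords):                          # placeholder for the trained network
    return 0.5 + 0.5 * np.sin(coords.sum(axis=-1))   # grayscale in [0, 1]

def decompress(h, w):
    # build normalized coordinates for the requested image size (step 402)
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    coords = np.stack([ys, xs], axis=-1).reshape(-1, 2)
    return mlp(coords).reshape(h, w)      # query the network per pixel (403-405)

small = decompress(4, 4)                  # the same model serves any output size
large = decompress(16, 16)
```

The same trained weights reconstruct the image at either resolution; only the coordinate grid changes.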
With reference to fig. 5, the image decompression method in the embodiment of the present application is described above, and the image compression apparatus in the embodiment of the present application is described below, where an embodiment of the image compression apparatus in the embodiment of the present application includes:
the input unit 501 is configured to obtain a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, input the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multi-layer perceptron, and obtain a multi-dimensional vector of each frequency of each pixel point in a frequency domain, where the multi-dimensional vector is output by the input layer;
a cascade processing unit 502, configured to input, for each pixel point, the multidimensional vector of the pixel point obtained by the input unit 501 into N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers perform cascade processing on the global features and the local features of the multidimensional vector, obtain an nth comprehensive feature vector output by the nth-stage hidden layer; n is an integer greater than or equal to 2;
an output unit 503, configured to, for each pixel point, input the nth synthesized feature vector of the pixel point obtained by the cascade processing unit 502 into an output layer in the multilayer perceptron, and output, by the output layer, a compressed pixel value corresponding to the pixel point;
an obtaining unit 504, configured to obtain a compressed target image according to the compressed pixel value corresponding to the pixel point obtained by each output unit 503.
In the embodiment of the application, a two-dimensional coordinate vector of each pixel point of the image to be compressed in the spatial domain can be obtained; the image to be compressed and the two-dimensional coordinate vector are input into the input layer of the pre-trained multilayer perceptron to obtain the multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer; for each pixel point, the multi-dimensional vector is input into the N cascaded hidden layers of the multilayer perceptron, and after the N hidden layers perform cascade processing on the global and local features of the multi-dimensional vector, the Nth comprehensive feature vector output by the Nth-level hidden layer is obtained; the Nth comprehensive feature vector of each pixel point is input into the output layer of the multilayer perceptron, which outputs the compressed pixel value corresponding to the pixel point; and a compressed target image is obtained according to the compressed pixel value of each pixel point. Low-, medium- and high-frequency information can thus be fitted; the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be compressed well, and the rate-distortion performance of image compression is high.
Referring to fig. 6, an image compression apparatus in an embodiment of the present application is described in detail below, where another embodiment of the image compression apparatus in the embodiment of the present application includes:
the input unit 601 is configured to obtain a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, input the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multi-layer perceptron, and obtain a multi-dimensional vector of each frequency of each pixel point in a frequency domain, where the multi-dimensional vector is output by the input layer;
a cascade processing unit 602, configured to input, for each pixel point, the multidimensional vector of the pixel point obtained by the input unit 601 into N cascaded hidden layers in the multilayer perceptron, and after cascade processing is performed on global features and local features of the multidimensional vector by the N hidden layers, an nth comprehensive feature vector output by the nth hidden layer is obtained; n is an integer greater than or equal to 2;
an output unit 603, configured to, for each pixel point, input the nth synthesized feature vector of the pixel point obtained by the cascade processing unit 602 into an output layer in the multilayer perceptron, and output, by the output layer, a compressed pixel value corresponding to the pixel point;
an obtaining unit 604, configured to obtain a compressed target image according to the compressed pixel value corresponding to the pixel point obtained by each output unit 603.
The cascade processing unit 602 is specifically configured to extract, through the cascaded local branch units of the N hidden layers, the local features of the multi-dimensional vector to obtain the Nth-level local feature vector output by the Nth-level local branch unit; extract, through the cascaded global branch units of the N hidden layers, the global features of the multi-dimensional vector to obtain the Nth-level global feature vector output by the Nth-level global branch unit; and perform, through the synthesis unit of the Nth-level hidden layer, synthesis processing on the Nth-level local feature vector and the Nth-level global feature vector to output the Nth-level comprehensive feature vector of the Nth-level hidden layer.
The cascade processing unit 602 is specifically configured to: for the 1st-level local branch unit, input the multi-dimensional vector into the 1st-level local branch unit of the N hidden layers, process it by the linear layer of the 1st-level local branch unit to output a first local linear feature vector, and process the first local linear feature vector by the Gaussian activation function layer of the 1st-level local branch unit to output the 1st-level local feature vector; for the nth-level local branch unit, input the (n-1)th comprehensive feature vector output by the (n-1)th-level hidden layer into the nth-level local branch unit, process it by the linear layer of the nth-level local branch unit to output an nth local linear feature vector, and process the nth local linear feature vector by the Gaussian activation function layer of the nth-level local branch unit to output the nth-level local feature vector, where 2 ≤ n ≤ N.
The cascade processing unit 602 is specifically configured to: for the 1st-level global branch unit, input the multi-dimensional vector into the 1st-level global branch unit of the N hidden layers, process it by the linear layer of the 1st-level global branch unit to output a first global linear feature vector, and process the first global linear feature vector by the nonlinear activation function layer of the 1st-level global branch unit to output the 1st-level global feature vector; for the nth-level global branch unit, input the (n-1)th-level global feature vector output by the (n-1)th-level hidden layer into the nth-level global branch unit, process it by the linear layer of the nth-level global branch unit to output an nth global linear feature vector, and process the nth global linear feature vector by the nonlinear activation function layer of the nth-level global branch unit to output the nth-level global feature vector, where 2 ≤ n ≤ N.
The image compression apparatus further includes: a calculation unit 605;
the obtaining unit 604 is further configured to obtain a two-dimensional coordinate vector of each sample pixel point in the multi-frame image sample in the spatial domain; wherein, each sample pixel point is respectively marked with a pixel value;
the input unit 601 is further configured to input the image sample and the two-dimensional coordinate vector of the image sample into an initial multi-layer perceptron, and output a predicted pixel value corresponding to each sample pixel point by the initial multi-layer perceptron;
the calculating unit 605 is specifically configured to calculate a loss between the predicted pixel value and the labeled pixel value of each sample pixel point according to a regression loss function, and when the loss meets a convergence condition, obtain a trained multi-layer perceptron.
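A minimal sketch of the loss computation performed by the calculating unit 605 is given below. The patent names only "a regression loss function" and "a convergence condition"; the mean-squared error, the toy pixel values, and the 1e-3 threshold are assumptions for illustration:

```python
import numpy as np

def regression_loss(predicted, labeled):
    # mean-squared error between predicted and labeled pixel values (assumed form)
    return np.mean((predicted - labeled) ** 2)

predicted = np.array([0.51, 0.49, 0.52])   # predicted pixel values (toy data)
labeled   = np.array([0.50, 0.50, 0.50])   # labeled pixel values (toy data)
loss = regression_loss(predicted, labeled)
converged = loss < 1e-3                    # convergence threshold is an assumption
```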
The input unit 601 is specifically configured to obtain a first fitting parameter value of the initial multi-layer perceptron and a second fitting parameter value of the trained multi-layer perceptron, subtract the first fitting parameter value from the second fitting parameter value to obtain a residual fitting parameter value, convert the residual fitting parameter value from a floating-point number to an integer to obtain a target residual fitting parameter value, add the target residual fitting parameter value to the first fitting parameter value to obtain a target fitting parameter value, and input the to-be-compressed image and the two-dimensional coordinate vector into the input layer of the pre-trained multi-layer perceptron whose fitting parameters are the target fitting parameter values.
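The residual-parameter steps above (subtract, convert float to integer, add back) can be sketched as follows. The patent does not state how the float-to-integer conversion is performed; a fixed-point quantization with an assumed scale factor of 4096 is used here for illustration:

```python
import numpy as np

def residual_quantize(theta_init, theta_trained, scale=4096):
    # residual between the trained and initial fitting parameter values
    residual = theta_trained - theta_init
    # floating point -> integer (fixed-point; the scale factor is an assumption)
    residual_int = np.round(residual * scale).astype(np.int32)
    # target fitting parameters = initial parameters + de-quantized residual
    theta_target = theta_init + residual_int / scale
    return residual_int, theta_target

theta_init    = np.array([0.10, -0.20, 0.30])     # toy initial parameters
theta_trained = np.array([0.1234, -0.1987, 0.2951])  # toy trained parameters
residual_int, theta_target = residual_quantize(theta_init, theta_trained)
```

Only the integer residuals need to be stored; the target parameters are recovered from them and the shared initial parameters, with an error bounded by half a quantization step.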
In this embodiment, each unit in the image compression device performs the operation of the image compression device in the embodiment shown in fig. 1, which is not described herein again.
The image compression apparatus in the embodiment of the present application is described above; the image decompression apparatus in the embodiment of the present application is described below with reference to fig. 7. An embodiment of the image decompression apparatus in the embodiment of the present application includes:
a determining unit 701, configured to determine an image size to be decompressed for a target image;
an input unit 702, configured to obtain a target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain in the image size determined by the determining unit 701, input the target image and the target two-dimensional coordinate vector into an input layer in a pre-trained multi-layer perceptron, and obtain a target multi-dimensional vector of each frequency of each pixel point in a frequency domain output by the input layer;
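The mapping performed by the input layer, from a two-dimensional coordinate to a multi-dimensional vector of per-frequency components, is not detailed in the text. A common choice for such coordinate-based networks is a Fourier-feature encoding, which is assumed here purely as a sketch (the number of frequencies and the frequency schedule are likewise assumptions):

```python
import numpy as np

def frequency_encode(coord, num_freqs=4):
    # map a 2-D coordinate in [0, 1]^2 to a vector of sin/cos components at
    # num_freqs frequencies per axis (Fourier-feature mapping, assumed form)
    coord = np.asarray(coord, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # pi, 2*pi, 4*pi, 8*pi
    angles = np.outer(freqs, coord).ravel()         # every frequency x every axis
    return np.concatenate([np.sin(angles), np.cos(angles)])

v = frequency_encode([0.25, 0.75])   # one pixel point's target multi-dimensional vector
```

With 4 frequencies over 2 axes, each coordinate yields a 16-dimensional vector covering low through high frequencies.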
the cascade processing unit 703, configured to, for each pixel point, input the target multi-dimensional vector of the pixel point obtained by the input unit 702 into N cascaded hidden layers in the multi-layer perceptron, and after the global features and the local features of the target multi-dimensional vector are cascade-processed by the N hidden layers, obtain an Nth target comprehensive feature vector output by the Nth-level hidden layer; N is an integer greater than or equal to 2;
the output unit 704, configured to, for each pixel point, input the Nth target comprehensive feature vector of the pixel point obtained by the cascade processing unit 703 into an output layer in the multi-layer perceptron, and output, by the output layer, a decompressed pixel value corresponding to the pixel point;
an obtaining unit 705, configured to obtain a decompressed image according to the decompressed pixel value of each pixel point obtained by the output unit 704.
In this embodiment, the image size to be decompressed of a target image can be determined; a target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain under that image size is obtained; the target image and the target two-dimensional coordinate vector are input into the input layer of a pre-trained multi-layer perceptron to obtain a target multi-dimensional vector, output by the input layer, of each frequency of each pixel point in the frequency domain; for each pixel point, the target multi-dimensional vector of the pixel point is input into N cascaded hidden layers in the multi-layer perceptron, and after the N hidden layers cascade-process the global features and local features of the target multi-dimensional vector, an Nth target comprehensive feature vector output by the Nth-level hidden layer is obtained, where N is an integer greater than or equal to 2; for each pixel point, the Nth target comprehensive feature vector of the pixel point is input into the output layer of the multi-layer perceptron, and the output layer outputs a decompressed pixel value corresponding to the pixel point; a decompressed image is then obtained from the decompressed pixel value of each pixel point. By designing an efficient activation function and neural network structure, the data-fitting capability of the multi-layer perceptron is improved so that low-, medium- and high-frequency information can all be fitted; the Nth comprehensive feature vector output by the Nth-level hidden layer therefore represents the features of the image to be decompressed with good integrity, and the rate-distortion performance of image decompression is high. In addition, an image of any size can be decompressed according to the user's requirements, which improves the flexibility of decompressing and reconstructing the compressed image.
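The arbitrary-size property follows from the coordinate-based representation: decompressing at a different size just means evaluating the trained perceptron on a coordinate grid of that size. A sketch is given below; `toy_mlp` is a hypothetical stand-in for the trained multi-layer perceptron, and the unit coordinate range is an assumption:

```python
import numpy as np

def decompress(mlp, height, width):
    # evaluate a coordinate-based model on a grid of the requested size;
    # any (height, width) works because the coordinates are continuous
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, height),
                         np.linspace(0.0, 1.0, width), indexing="ij")
    coords = np.stack([ys, xs], axis=-1).reshape(-1, 2)
    pixels = np.array([mlp(c) for c in coords])
    return pixels.reshape(height, width)

# toy stand-in for the trained multi-layer perceptron (hypothetical)
toy_mlp = lambda c: float(np.sin(np.pi * c[0]) * np.cos(np.pi * c[1]))
small = decompress(toy_mlp, 8, 12)    # decompress at one size...
large = decompress(toy_mlp, 16, 24)   # ...or at another, from the same model
```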
Referring to fig. 8, another embodiment of an image processing apparatus 800 according to an embodiment of the present application includes:
a central processing unit 801, a memory 805, an input/output interface 804, a wired or wireless network interface 803 and a power supply 802;
memory 805 is a transient storage memory or a persistent storage memory;
the central processor 801 is configured to communicate with the memory 805 and execute the operations of the instructions in the memory 805 to perform the methods described in the embodiments illustrated in fig. 1 above.
The embodiment of the present application further provides a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to perform the method in the foregoing embodiment shown in fig. 1.
The embodiment of the present application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method in the foregoing embodiment shown in fig. 1.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed sequentially as indicated by the arrows, they are not necessarily performed in the order indicated. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order illustrated, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their execution order is not necessarily sequential, and they may be performed in turn or in alternation with other steps or with at least part of the sub-steps or stages of other steps.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Claims (10)

1. An image compression method, comprising:
obtaining a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, inputting the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
for each pixel point, inputting the multidimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade the global features and the local features of the multidimensional vector, obtaining an Nth comprehensive feature vector output by an Nth-level hidden layer; n is an integer greater than or equal to 2;
for each pixel point, inputting the Nth comprehensive characteristic vector of the pixel point into an output layer in the multilayer perceptron, and outputting a compressed pixel value corresponding to the pixel point by the output layer;
obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point;
each hidden layer comprises a local branch unit, a global branch unit and a comprehensive unit which is respectively connected with the local branch unit and the global branch unit;
after the N hidden layers cascade the global features and the local features of the multidimensional vector, an Nth comprehensive feature vector output by the Nth-level hidden layer is obtained, including:
each local branch unit of the N hidden layers is used for extracting the local features of the multidimensional vector in a cascading manner to obtain an Nth-level local feature vector output by an Nth-level local branch unit;
extracting global features of the multidimensional vector by each global branch unit of the N hidden layers in a cascade manner to obtain an Nth-level global feature vector output by an Nth-level global branch unit;
performing, by the comprehensive unit of the Nth-level hidden layer, comprehensive processing on the Nth-level local feature vector and the Nth-level global feature vector to output an Nth comprehensive feature vector of the Nth-level hidden layer;
the global feature extracted from the global branch of the previous hidden layer is used as the input of the global branch of the next hidden layer.
2. The method of claim 1, wherein each of the local branching units comprises a linear layer and a gaussian activation function layer connected to each other;
the step of extracting the local features of the multidimensional vector by the cascade connection of the local branch units of the N hidden layers to obtain an Nth-level local feature vector output by an Nth-level local branch unit includes:
for a level 1 local branch unit, inputting local features of the multi-dimensional vector into a level 1 local branch unit of the N hidden layers, processing the local features of the multi-dimensional vector by a linear layer of the level 1 local branch unit to output a first local linear feature vector, processing the first local linear feature vector by a Gaussian activation function layer of the level 1 local branch unit to output a level 1 local feature vector;
for the nth-level local branch unit, inputting the (n-1)th comprehensive feature vector output by the (n-1)th-level hidden layer into the nth-level local branch unit, processing the (n-1)th comprehensive feature vector by a linear layer of the nth-level local branch unit to output an nth local linear feature vector, and processing the nth local linear feature vector by a Gaussian activation function layer of the nth-level local branch unit to output an nth-level local feature vector; wherein n is greater than or equal to 2 and less than or equal to N.
3. The method of claim 1, wherein each of the global branching units comprises a linear layer and a nonlinear activation function layer connected to each other;
the extracting, by the cascade connection of each global branch unit of the N hidden layers, the global features of the multidimensional vector to obtain an Nth-level global feature vector output by an Nth-level global branch unit includes:
for a first-level global branch unit, inputting the multidimensional vector into a 1 st-level global branch unit in the N hidden layers, processing global features of the multidimensional vector by a linear layer of the 1 st-level global branch unit to output a first global linear feature vector, and processing the first global linear feature vector by a non-linear activation function layer of the 1 st-level global branch unit to output a 1 st-level global feature vector;
for the nth-level global branch unit, inputting the (n-1)th-level global feature vector output by the (n-1)th-level hidden layer into the nth-level global branch unit, processing the (n-1)th-level global feature vector by a linear layer of the nth-level global branch unit to output an nth global linear feature vector, and processing the nth global linear feature vector by a nonlinear activation function layer of the nth-level global branch unit to output an nth-level global feature vector; wherein n is greater than or equal to 2 and less than or equal to N.
4. The method of claim 1, wherein before inputting the image to be compressed and the two-dimensional coordinate vector into an input layer in a pre-trained multi-layer perceptron, the method further comprises:
obtaining a two-dimensional coordinate vector of each sample pixel point in a space domain in a multi-frame image sample; wherein, each sample pixel point is respectively marked with a pixel value;
inputting the image sample and the two-dimensional coordinate vector of the image sample into an initial multi-layer perceptron, and outputting a predicted pixel value corresponding to each sample pixel point by the initial multi-layer perceptron;
and calculating the loss between the predicted pixel value and the labeled pixel value of each sample pixel point according to a regression loss function, and obtaining the trained multilayer perceptron when the loss meets the convergence condition.
5. The method according to claim 4, wherein the inputting the image to be compressed and the two-dimensional coordinate vector into an input layer of a pre-trained multi-layer perceptron comprises:
obtaining a first fitting parameter value of the initial multi-layer perceptron and a second fitting parameter value of the trained multi-layer perceptron;
subtracting the first fitting parameter value from the second fitting parameter value to obtain a residual fitting parameter value;
converting the residual error fitting parameter value from a floating point number to an integer to obtain a target residual error fitting parameter value;
adding the target residual error fitting parameter value and the first fitting parameter value to obtain a target fitting parameter value;
and inputting the image to be compressed and the two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron with fitting parameters as the target fitting parameter values.
6. An image decompression method, comprising:
determining the size of an image to be decompressed of a target image;
obtaining a target two-dimensional coordinate vector of each pixel point of the target image in a spatial domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
for each pixel point, inputting the target multi-dimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after the global features and the local features of the target multi-dimensional vector are cascaded by the N hidden layers, obtaining an Nth target comprehensive feature vector output by an Nth-level hidden layer; n is an integer greater than or equal to 2;
for each pixel point, inputting the Nth target comprehensive characteristic vector of the pixel point into an output layer in the multilayer perceptron, and outputting a decompressed pixel value corresponding to the pixel point by the output layer;
obtaining a decompressed image according to the decompressed pixel value of each pixel point;
each hidden layer comprises a local branch unit, a global branch unit and a comprehensive unit which is respectively connected with the local branch unit and the global branch unit;
after the N hidden layers cascade the global features and the local features of the multidimensional vector, an Nth comprehensive feature vector output by the Nth-level hidden layer is obtained, including:
each local branch unit of the N hidden layers is used for extracting local features of the multi-dimensional vectors in a cascade mode to obtain an Nth-level local feature vector output by an Nth-level local branch unit;
extracting global features of the multidimensional vector by each global branch unit of the N hidden layers in a cascade manner to obtain an Nth-level global feature vector output by an Nth-level global branch unit;
performing, by the comprehensive unit of the Nth-level hidden layer, comprehensive processing on the Nth-level local feature vector and the Nth-level global feature vector to output an Nth comprehensive feature vector of the Nth-level hidden layer;
the global feature extracted from the global branch of the previous hidden layer is used as the input of the global branch of the next hidden layer.
7. An image compression apparatus characterized by comprising:
the input unit is used for obtaining a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, inputting the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
the cascade processing unit is used for inputting the multidimensional vectors of the pixel points into N cascaded hidden layers in the multilayer perceptron aiming at each pixel point, and after the N hidden layers carry out cascade processing on the global features and the local features of the multidimensional vectors, the N comprehensive feature vectors output by the N-level hidden layers are obtained; n is an integer greater than or equal to 2;
the output unit is also used for inputting the Nth comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and outputting a compressed pixel value corresponding to the pixel point by the output layer;
the obtaining unit is used for obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point;
the cascade processing unit is specifically configured to extract, in a cascade manner, local features of the multidimensional vector by each local branch unit of the N hidden layers to obtain an Nth-level local feature vector output by the Nth-level local branch unit, extract, in a cascade manner, global features of the multidimensional vector by each global branch unit of the N hidden layers to obtain an Nth-level global feature vector output by the Nth-level global branch unit, and perform, by the comprehensive unit of the Nth-level hidden layer, comprehensive processing on the Nth-level local feature vector and the Nth-level global feature vector to output an Nth comprehensive feature vector of the Nth-level hidden layer, where each hidden layer includes a local branch unit, a global branch unit, and a comprehensive unit connected to the local branch unit and the global branch unit, respectively; the global feature extracted by the global branch of the previous hidden layer is used as the input of the global branch of the next hidden layer.
8. An image decompression apparatus characterized by comprising:
a determining unit, configured to determine an image size of a target image to be decompressed;
the input unit is used for obtaining a target two-dimensional coordinate vector of each pixel point of the target image in a space domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
the cascade processing unit is further used for inputting the target multi-dimensional vectors of the pixel points into N cascaded hidden layers in the multilayer perceptron, and after the global features and the local features of the target multi-dimensional vectors are subjected to cascade processing by the N hidden layers, the N target comprehensive feature vectors output by the N-level hidden layers are obtained; n is an integer greater than or equal to 2;
the output unit is also used for inputting the Nth target comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and the output layer outputs a decompressed pixel value corresponding to the pixel point;
an obtaining unit, configured to obtain a decompressed image according to the decompressed pixel value of each pixel point;
the cascade processing unit is specifically configured to extract, in a cascade manner, local features of the multidimensional vector by each local branch unit of the N hidden layers to obtain an Nth-level local feature vector output by the Nth-level local branch unit, extract, in a cascade manner, global features of the multidimensional vector by each global branch unit of the N hidden layers to obtain an Nth-level global feature vector output by the Nth-level global branch unit, and perform, by the comprehensive unit of the Nth-level hidden layer, comprehensive processing on the Nth-level local feature vector and the Nth-level global feature vector to output an Nth comprehensive feature vector of the Nth-level hidden layer, where each hidden layer includes a local branch unit, a global branch unit, and a comprehensive unit connected to the local branch unit and the global branch unit, respectively; and the global features extracted by the global branch of the previous hidden layer are used as the input of the global branch of the next hidden layer.
9. An image processing apparatus characterized by comprising:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory and execute the operations of the instructions in the memory to perform the method of any of claims 1 to 6.
10. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 6.
CN202210915500.2A 2022-08-01 2022-08-01 Image compression method, image decompression method, related device and readable storage medium Active CN114998457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210915500.2A CN114998457B (en) 2022-08-01 2022-08-01 Image compression method, image decompression method, related device and readable storage medium

Publications (2)

Publication Number Publication Date
CN114998457A CN114998457A (en) 2022-09-02
CN114998457B true CN114998457B (en) 2022-11-22

Family

ID=83022107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210915500.2A Active CN114998457B (en) 2022-08-01 2022-08-01 Image compression method, image decompression method, related device and readable storage medium

Country Status (1)

Country Link
CN (1) CN114998457B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101795344A (en) * 2010-03-02 2010-08-04 北京大学 Digital hologram compression method and system, decoding method and system, and transmission method and system
JP2019016166A (en) * 2017-07-06 2019-01-31 日本放送協会 Neural network, encoder, decoder, learning method, control method, and program
CN109919864A (en) * 2019-02-20 2019-06-21 重庆邮电大学 A kind of compression of images cognitive method based on sparse denoising autoencoder network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11924445B2 (en) * 2020-09-25 2024-03-05 Qualcomm Incorporated Instance-adaptive image and video compression using machine learning systems
CN112381790A (en) * 2020-11-13 2021-02-19 天津大学 Abnormal image detection method based on depth self-coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Joint Global and Local Hierarchical Priors for Learned Image Compression; Jun-Hyuk Kim et al.; arXiv; 2021-07-21; pp. 1-5 *


Similar Documents

Publication Publication Date Title
EP3735658A1 (en) Generating a compressed representation of a neural network with proficient inference speed and power consumption
CN111669587B (en) Mimic compression method and device of video image, storage medium and terminal
US20180239992A1 (en) Processing artificial neural network weights
US20160292589A1 (en) Ultra-high compression of images based on deep learning
CN114402596B (en) Neural network model decoding method, device, system and medium
CN114581544A (en) Image compression method, computer device and computer storage medium
US20100008592A1 (en) Image signal transforming and inverse-transforming method and computer program product with pre-encoding filtering features
KR20200089635A (en) Systems and methods for image compression at multiple, different bitrates
Chen et al. Compressive sensing multi-layer residual coefficients for image coding
CN115022637A (en) Image coding method, image decompression method and device
CN114998457B (en) Image compression method, image decompression method, related device and readable storage medium
Feng et al. Neural subspaces for light fields
Zhuang et al. A robustness and low bit-rate image compression network for underwater acoustic communication
TW202406344A (en) Point cloud geometry data augmentation method and apparatus, encoding method and apparatus, decoding method and apparatus, and encoding and decoding system
WO2023205969A1 (en) Point cloud geometric information compression method and apparatus, point cloud geometric information decompression method and apparatus, point cloud video encoding method and apparatus, and point cloud video decoding method and apparatus
Thepade et al. New clustering algorithm for Vector Quantization using Haar sequence
CN112616058B (en) Video encoding or decoding method, apparatus, computer device, and storage medium
KR20240025629A (en) Video compression using optical flow
CN113554719B (en) Image encoding method, decoding method, storage medium and terminal equipment
CN111107360B (en) Spectrum-space dimension combined hyperspectral image lossless compression method and system
CN114638002A (en) Compressed image encryption method supporting similarity retrieval
Navaneethakrishnan Study of image compression techniques
CN116916033B (en) Combined space-time video compression method based on random self-adaptive Fourier decomposition
Joshi et al. Reducing Image Compression Time using Improvised Discrete Cosine Transform Algorithm
Nihal et al. A Survey and Study of Image Compression Methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant