CN114998457B - Image compression method, image decompression method, related device and readable storage medium - Google Patents
Image compression method, image decompression method, related device and readable storage medium
- Publication number
- CN114998457B (application CN202210915500.2A)
- Authority
- CN
- China
- Prior art keywords
- nth
- layer
- vector
- global
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/001—Model-based coding, e.g. wire frame
Abstract
The embodiment of the application discloses an image compression method, an image decompression method, related equipment and a readable storage medium, which are used for fitting low, medium and high frequency information and improving the rate-distortion performance of image compression. The method in the embodiment of the application comprises the following steps: obtaining a two-dimensional coordinate vector of each pixel point in an image to be compressed in the spatial domain; inputting the image to be compressed and the two-dimensional coordinate vectors into an input layer of a multilayer perceptron, and obtaining the multi-dimensional vector of each frequency of each pixel point in the frequency domain output by the input layer; for each pixel point, inputting the multi-dimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade-process the global features and the local features of the multi-dimensional vector, obtaining the Nth comprehensive feature vector output by the Nth-level hidden layer; inputting the Nth comprehensive feature vector of the pixel point into an output layer of the multilayer perceptron, and outputting the compressed pixel value corresponding to the pixel point by the output layer; and obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point.
Description
Technical Field
The embodiment of the application relates to the field of image processing, in particular to an image compression method, an image decompression method, related equipment and a readable storage medium.
Background
Image compression has many benefits, such as reducing the storage space occupied by an image file and reducing the network bandwidth consumed when the image file is transmitted, so compressing images is of practical significance.
An existing image compression method comprises: obtaining a two-dimensional coordinate vector of each pixel point in an image to be compressed in the spatial domain; inputting the image to be compressed and the two-dimensional coordinate vectors into an input layer of a pre-trained multilayer perceptron, and obtaining the multi-dimensional vector of each frequency of each pixel point in the frequency domain output by the input layer; for each pixel point, inputting the multi-dimensional vector of the pixel point into N cascaded fully-connected layers in the multilayer perceptron, and after the N fully-connected layers cascade-process the features of the multi-dimensional vector, obtaining the Nth comprehensive feature vector output by the Nth-level fully-connected layer; for each pixel point, inputting the Nth comprehensive feature vector of the pixel point into an output layer of the multilayer perceptron, and outputting the compressed pixel value corresponding to the pixel point by the output layer; and obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point.
However, because the nonlinear activation function adopted by the fully-connected layers is generally the ReLU activation function, a multilayer perceptron composed of an input layer, N fully-connected layers and an output layer suffers from spectral bias: it can fit low-frequency information, but middle- and high-frequency information is difficult to fit. As a result, the Nth comprehensive feature vector output by the Nth-level fully-connected layer represents the features of the image to be compressed with poor integrity, and the rate-distortion performance of image compression is low.
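To make the spectral-bias limitation concrete, the following is a minimal sketch of the kind of ReLU-based coordinate multilayer perceptron described above (in Python/PyTorch); the layer widths, depth and output activation are illustrative assumptions rather than values taken from the prior art being discussed:

```python
import torch

class BaselineCoordinateMLP(torch.nn.Module):
    """Prior-art style coordinate network: positional-encoding input, N ReLU
    fully-connected layers, and an output layer producing pixel values.
    Widths and N are illustrative, not values fixed by the text."""
    def __init__(self, in_dim=256, hidden=256, n_layers=4, out_dim=3):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(n_layers):
            layers += [torch.nn.Linear(dim, hidden), torch.nn.ReLU()]
            dim = hidden
        layers += [torch.nn.Linear(dim, out_dim), torch.nn.Sigmoid()]
        self.net = torch.nn.Sequential(*layers)

    def forward(self, encoded_coords: torch.Tensor) -> torch.Tensor:
        return self.net(encoded_coords)
```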
Disclosure of Invention
The embodiment of the application provides an image compression method, an image decompression method, related equipment and a readable storage medium, which are used for fitting low, medium and high frequency information and improving the rate distortion performance of image compression.
In a first aspect, an embodiment of the present application provides an image compression method, including:
obtaining a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, inputting the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
for each pixel point, inputting the multidimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade the global features and the local features of the multidimensional vector, obtaining an Nth comprehensive feature vector output by the Nth hidden layer; n is an integer greater than or equal to 2;
for each pixel point, inputting the Nth comprehensive characteristic vector of the pixel point into an output layer in the multilayer perceptron, and outputting a compressed pixel value corresponding to the pixel point by the output layer;
and obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point.
Optionally, each hidden layer includes a local branch unit, a global branch unit, and a synthesis unit respectively connected to the local branch unit and the global branch unit;
after the N hidden layers cascade the global features and the local features of the multidimensional vector, an nth comprehensive feature vector output by the nth hidden layer is obtained, including:
each local branch unit of the N hidden layers is used for extracting the local features of the multidimensional vector in a cascading manner to obtain an Nth-level local feature vector output by an Nth-level local branch unit;
extracting global features of the multidimensional vector by each global branch unit of the N hidden layers in a cascading manner to obtain an Nth-level global feature vector output by an Nth-level global branch unit;
and performing comprehensive processing on the Nth-level local feature vector and the Nth-level global feature vector by the synthesis unit of the Nth-level hidden layer to output an Nth comprehensive feature vector of the Nth-level hidden layer.
Optionally, each of the local branching units includes a linear layer and a gaussian activation function layer connected to each other;
the step of extracting the local features of the multidimensional vector by the cascade connection of the local branch units of the N hidden layers to obtain an nth-level local feature vector output by an nth-level local branch unit includes:
for a level 1 local branch unit, inputting local features of the multi-dimensional vector into a level 1 local branch unit of the N hidden layers, processing the local features of the multi-dimensional vector by a linear layer of the level 1 local branch unit to output a first local linear feature vector, processing the first local linear feature vector by a Gaussian activation function layer of the level 1 local branch unit to output a level 1 local feature vector;
for the nth-level local branch unit, inputting the (n-1)th comprehensive feature vector output by the (n-1)th-level hidden layer into the nth-level local branch unit, processing the (n-1)th comprehensive feature vector by a linear layer of the nth-level local branch unit to output an nth local linear feature vector, and processing the nth local linear feature vector by a Gaussian activation function layer of the nth-level local branch unit to output an nth-level local feature vector; wherein n is greater than or equal to 2 and less than or equal to N.
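For illustration, a minimal sketch of a Gaussian activation function layer is given below; the exact functional form and the bandwidth σ are not fixed by the text above, so both are assumptions:

```python
import torch

class GaussianActivation(torch.nn.Module):
    """Element-wise Gaussian activation: exp(-x^2 / (2 * sigma^2)).

    The exact form and the bandwidth sigma are assumptions; the text only
    states that the local branch uses a Gaussian activation function layer."""
    def __init__(self, sigma: float = 0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.exp(-(x ** 2) / (2.0 * self.sigma ** 2))
```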
Optionally, each of the global branching units includes a linear layer and a nonlinear activation function layer connected to each other;
the extracting, by the cascade connection of each global branching unit of the N hidden layers, the global feature of the multidimensional vector to obtain an Nth-level global feature vector output by an Nth-level global branching unit includes:
for a first-level global branch unit, inputting the multidimensional vector into a 1 st-level global branch unit in the N hidden layers, processing global features of the multidimensional vector by a linear layer of the 1 st-level global branch unit to output a first global linear feature vector, and processing the first global linear feature vector by a non-linear activation function layer of the 1 st-level global branch unit to output a 1 st-level global feature vector;
for the nth-level global branch unit, inputting the (n-1)th-level global feature vector output by the (n-1)th-level hidden layer into the nth-level global branch unit, processing the (n-1)th-level global feature vector by a linear layer of the nth-level global branch unit to output an nth global linear feature vector, and processing the nth global linear feature vector by a nonlinear activation function layer of the nth-level global branch unit to output an nth-level global feature vector; wherein n is greater than or equal to 2 and less than or equal to N.
Optionally, before the image to be compressed and the two-dimensional coordinate vector are input to an input layer in a pre-trained multi-layer perceptron, the method further includes:
obtaining a two-dimensional coordinate vector of each sample pixel point in a space domain in a multi-frame image sample; wherein, each sample pixel point is respectively marked with a pixel value;
inputting the image sample and the two-dimensional coordinate vector of the image sample into an initial multilayer perceptron, and outputting a predicted pixel value corresponding to each sample pixel point by the initial multilayer perceptron;
and calculating the loss between the predicted pixel value and the labeled pixel value of each sample pixel point according to a regression loss function, and obtaining the trained multilayer perceptron when the loss meets the convergence condition.
Optionally, the inputting the image to be compressed and the two-dimensional coordinate vector into an input layer of a pre-trained multi-layer perceptron includes:
obtaining a first fitting parameter value of the initial multi-layer perceptron and a second fitting parameter value of the trained multi-layer perceptron;
subtracting the first fitting parameter value from the second fitting parameter value to obtain a residual fitting parameter value;
converting the residual error fitting parameter value from a floating point number to an integer to obtain a target residual error fitting parameter value;
adding the target residual error fitting parameter value and the first fitting parameter value to obtain a target fitting parameter value;
and inputting the image to be compressed and the two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron with fitting parameters as the target fitting parameter values.
In a second aspect, an embodiment of the present application provides an image decompression method, including:
determining the image size to which a target image is to be decompressed;
obtaining a target two-dimensional coordinate vector of each pixel point of the target image in a spatial domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
for each pixel point, inputting a target multi-dimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after cascading processing is carried out on global features and local features of the target multi-dimensional vector by the N hidden layers, obtaining an Nth target comprehensive feature vector output by the Nth-level hidden layer; n is an integer greater than or equal to 2;
for each pixel point, inputting the Nth target comprehensive characteristic vector of the pixel point into an output layer in the multilayer perceptron, and outputting a decompressed pixel value corresponding to the pixel point by the output layer;
and obtaining a decompressed image according to the decompressed pixel value of each pixel point.
In a third aspect, an embodiment of the present application provides an image compression apparatus, including:
the input unit is used for obtaining a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, inputting the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
the cascade processing unit is used for inputting the multidimensional vectors of the pixel points into N cascaded hidden layers in the multilayer perceptron, and after the global features and the local features of the multidimensional vectors are subjected to cascade processing by the N hidden layers, an Nth comprehensive feature vector output by the Nth-level hidden layer is obtained; n is an integer greater than or equal to 2;
the output unit is used for inputting the Nth comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and outputting a compressed pixel value corresponding to the pixel point by the output layer;
and the obtaining unit is used for obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point.
In a fourth aspect, an embodiment of the present application provides an image decompression apparatus, including:
a determining unit, configured to determine an image size to be decompressed for the target image;
the input unit is used for obtaining a target two-dimensional coordinate vector of each pixel point of the target image in a space domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
the cascade processing unit is used for inputting the target multi-dimensional vectors of the pixel points into N cascaded hidden layers in the multilayer perceptron, and after the global features and the local features of the target multi-dimensional vectors are subjected to cascade processing by the N hidden layers, the N target comprehensive feature vectors output by the N-level hidden layers are obtained; n is an integer greater than or equal to 2;
the output unit is used for inputting the Nth target comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and outputting a decompressed pixel value corresponding to the pixel point by the output layer;
an obtaining unit, configured to obtain a decompressed image according to the decompressed pixel value of each pixel point.
In a fifth aspect, an embodiment of the present application provides an image processing apparatus, including:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient storage memory or a persistent storage memory;
the central processor is configured to communicate with the memory and execute the instruction operations in the memory to perform the aforementioned image compression method and image decompression method.
In a sixth aspect, the present application provides a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to execute the foregoing image compression method and image decompression method.
In a seventh aspect, the present application provides a computer program product including instructions, which when run on a computer, causes the computer to execute the foregoing image compression method and image decompression method.
According to the technical scheme, the embodiment of the application has the following advantages: a two-dimensional coordinate vector of each pixel point in an image to be compressed in the spatial domain is obtained; the image to be compressed and the two-dimensional coordinate vectors are input into an input layer of a pre-trained multilayer perceptron to obtain the multi-dimensional vector of each frequency of each pixel point output by the input layer; for each pixel point, the multi-dimensional vector of the pixel point is input into N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade-process the global features and the local features of the multi-dimensional vector, the Nth comprehensive feature vector output by the Nth-level hidden layer is obtained; for each pixel point, the Nth comprehensive feature vector of the pixel point is input into an output layer of the multilayer perceptron, and the output layer outputs the compressed pixel value corresponding to the pixel point; a compressed target image is obtained according to the compressed pixel value corresponding to each pixel point. Low, medium and high frequency information can thus be fitted, the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be compressed with good integrity, and the rate-distortion performance of image compression is high.
Drawings
Fig. 1 is a schematic flowchart of an image compression method disclosed in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for quantizing and entropy-encoding a difference between a first fitting parameter value and a second fitting parameter value according to an embodiment of the present disclosure;
FIG. 3 is a block diagram of an overall architecture of a multilayer perceptron according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of an image decompression method disclosed in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image compression apparatus disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another image compression apparatus disclosed in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image decompression apparatus disclosed in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image processing apparatus disclosed in an embodiment of the present application.
Detailed Description
The embodiment of the application provides an image compression method, an image decompression method, related equipment and a readable storage medium, which are used for fitting low, medium and high frequency information and improving the rate distortion performance of image compression.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image compression method disclosed in an embodiment of the present application, the method including:
101. and obtaining a two-dimensional coordinate vector of each pixel point in the image to be compressed in a spatial domain, inputting the image to be compressed and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multidimensional vector of each frequency of each pixel point output by the input layer in a frequency domain.
When image compression is carried out, a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain can be obtained, the to-be-compressed image and the two-dimensional coordinate vector are input into an input layer in a pre-trained multilayer perceptron, and a multi-dimensional vector of each frequency of each pixel point output by the input layer in a frequency domain is obtained.
102. For each pixel point, inputting the multidimensional vector of the pixel point into N cascaded hidden layers in a multilayer perceptron, and after cascading processing is carried out on the global features and the local features of the multidimensional vector by the N hidden layers, obtaining an N comprehensive feature vector output by an N-th hidden layer; n is an integer greater than or equal to 2.
After the multi-dimensional vector of each frequency of each pixel point output by the input layer in the frequency domain is obtained, for each pixel point, the multi-dimensional vector of the pixel point is input into the N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade-process the global features and the local features of the multi-dimensional vector, the Nth comprehensive feature vector output by the Nth-level hidden layer is obtained; N is an integer greater than or equal to 2. It can be understood that the N hidden layers may cascade-process the global features and the local features of the multi-dimensional vector by cascade-extracting the local features of the multi-dimensional vector with the local branch units of the N hidden layers to obtain the Nth-level local feature vector output by the Nth-level local branch unit, and cascade-extracting the global features of the multi-dimensional vector with the global branch units of the N hidden layers to obtain the Nth-level global feature vector output by the Nth-level global branch unit; other reasonable cascade processing methods may also be used, which is not limited here.
103. And (4) inputting the Nth comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and outputting a compressed pixel value corresponding to the pixel point by the output layer.
After the nth comprehensive characteristic vector output by the nth-level hidden layer is obtained, the nth comprehensive characteristic vector of the pixel point can be input into an output layer of the multilayer perceptron for each pixel point, and the output layer outputs a compressed pixel value corresponding to the pixel point.
104. And obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point.
After the compressed pixel values corresponding to the pixel points are output by the output layer, the compressed target image can be obtained according to the compressed pixel values corresponding to each pixel point.
In the embodiment of the application, the two-dimensional coordinate vector of each pixel point in the image to be compressed in the spatial domain can be obtained, the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron to obtain the multi-dimensional vector of each frequency of each pixel point output by the input layer, for each pixel point the multi-dimensional vector of the pixel point is input into the N cascaded hidden layers in the multilayer perceptron, the N hidden layers cascade-process the global features and the local features of the multi-dimensional vector to obtain the Nth comprehensive feature vector output by the Nth-level hidden layer, for each pixel point the Nth comprehensive feature vector of the pixel point is input into the output layer of the multilayer perceptron, the output layer outputs the compressed pixel value corresponding to the pixel point, and the compressed target image is obtained according to the compressed pixel value corresponding to each pixel point. In this way, low, medium and high frequency information can be fitted, the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be compressed with good integrity, and the rate-distortion performance of image compression is high.
In this embodiment of the present application, after the global features and the local features of the multi-dimensional vector are cascade-processed by the N hidden layers, there may be a variety of methods for obtaining the Nth comprehensive feature vector output by the Nth-level hidden layer; one of them is described below based on the image compression method shown in fig. 1.
In this embodiment, before performing image compression, the multi-layer perceptron needs to be trained in advance, and there may be a variety of training methods, one of which is described below:
Firstly, prior knowledge is embedded into the multilayer perceptron by training it with a meta-learning algorithm, so as to obtain the initial multilayer perceptron and the first fitting parameter of the initial multilayer perceptron. Specifically, the method for training the multilayer perceptron with the meta-learning algorithm is as follows: the meta-learning algorithm can be based on the MAML algorithm, and the training comprises two parts, inner-loop training optimization and outer-loop training optimization. Let the neural network parameters be θ, the per-parameter learning rate of the inner loop be α, and the number of inner-loop update steps be k. First, θ and α are initialized. In each outer-loop iteration, a number of samples are randomly drawn from the meta data set and divided into training samples and test samples, and the network parameters of the inner loop are initialized to θ. In the inner loop, the parameters are updated according to the prediction loss of the model on the training samples, and after the inner loop finishes, the parameters θ_k obtained after k update steps are available. The gradient is then determined from the prediction loss of the model on the test samples; θ_k is not updated again, but the outer-loop parameters θ and the learning rate α are updated according to this gradient. The outer loop is repeated until the final outer-loop parameters, namely the first fitting parameter θ_0 and the learning rate α, are obtained.
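The inner/outer loop just described can be sketched as follows; this is a simplified, first-order MAML-style step in PyTorch in which the update of the per-parameter learning rate α is omitted, and the functional model, loss function and hyper-parameters are placeholders rather than the embodiment's exact configuration:

```python
import torch

def maml_outer_step(meta_params, alpha, k, task, forward, loss_fn, meta_lr=1e-3):
    """One outer-loop step of a (first-order) MAML-style update.

    meta_params: list of leaf tensors (shared initialization, requires_grad=True)
    alpha:       per-parameter inner-loop learning rates (same shapes or scalars)
    k:           number of inner-loop update steps
    task:        (train_x, train_y, test_x, test_y) sampled from the meta data set
    forward:     functional model, forward(params, x) -> prediction
    """
    train_x, train_y, test_x, test_y = task

    # Inner loop: adapt a copy of the meta parameters on the training samples.
    params = [p.detach().clone().requires_grad_(True) for p in meta_params]
    for _ in range(k):
        loss = loss_fn(forward(params, train_x), train_y)
        grads = torch.autograd.grad(loss, params)
        params = [(p - a * g).detach().requires_grad_(True)
                  for p, a, g in zip(params, alpha, grads)]

    # Outer loop: the gradient of the test-sample loss at the adapted parameters
    # is applied to the meta parameters (first-order approximation).
    meta_loss = loss_fn(forward(params, test_x), test_y)
    meta_grads = torch.autograd.grad(meta_loss, params)
    with torch.no_grad():
        for p, g in zip(meta_params, meta_grads):
            p -= meta_lr * g
    return meta_loss.item()
```

Repeating this step over many sampled image tasks yields the meta-learned initialization, i.e. the first fitting parameter θ_0.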
After the prior knowledge is embedded into the multilayer perceptron to obtain the initial multilayer perceptron and the first fitting parameter θ_0 of the initial multilayer perceptron, the initial multilayer perceptron may then be trained. The specific training method is as follows: firstly, the two-dimensional coordinate vector of each sample pixel point in a multi-frame image sample in the spatial domain is obtained, and each sample pixel point is labeled with a pixel value; the image samples and the two-dimensional coordinate vectors of the image samples are input into the initial multilayer perceptron, and the initial multilayer perceptron outputs the predicted pixel value corresponding to each sample pixel point; the loss between the predicted pixel value and the labeled pixel value of each sample pixel point is calculated according to a regression loss function, and when the loss meets the convergence condition the trained multilayer perceptron is obtained. For example, if the image sample is a 2D image, let I denote a single frame image and y(p) denote the labeled pixel value at the two-dimensional coordinate vector p; the initial multilayer perceptron is a neural network f_θ with weights θ that represents the frame image, so that when the two-dimensional coordinate vector p is input, the output predicted pixel value is f_θ(p). The loss between the predicted pixel value and the labeled pixel value of each sample pixel point is calculated by minimizing the mean square error, i.e. L(θ) = Σ_p ‖f_θ(p) − y(p)‖², and when the loss meets the convergence condition, the trained multilayer perceptron is obtained.
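The per-image training step described above amounts to overfitting the network to one image by minimizing the mean square error between predicted and labeled pixel values; a minimal sketch, assuming a generic model that maps normalized pixel coordinates to pixel values (the optimizer, step count and learning rate are assumptions):

```python
import torch

def fit_image(model, coords, pixels, steps=2000, lr=1e-3):
    """Overfit `model` to one image: coords are the (num_pixels, 2) normalized
    coordinates, pixels the (num_pixels, 3) labeled RGB values in [0, 1]."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        pred = model(coords)                     # predicted pixel values
        loss = torch.mean((pred - pixels) ** 2)  # mean square error loss
        loss.backward()
        optimizer.step()
    return loss.item()
```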
After the multilayer perceptron is trained, when image compression is carried out, the two-dimensional coordinate vector of each pixel point in the image to be compressed in the spatial domain can be obtained, the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron, and the multi-dimensional vector of each frequency of each pixel point output by the input layer in the frequency domain is obtained. Specifically, after the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron, the obtained multi-dimensional vector is a multi-dimensional vector subjected to random Fourier mapping.
It should be noted that, after the trained multilayer perceptron is obtained and before the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron, in order to reduce the code rate and thereby improve the rate-distortion performance of image compression, the difference between the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron may be quantized and entropy-encoded. Referring to fig. 2, fig. 2 is a schematic flowchart of a method, disclosed in an embodiment of the present application, for quantizing and entropy-encoding the difference between the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron. The method includes: first obtaining the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron, subtracting the first fitting parameter value from the second fitting parameter value to obtain a residual fitting parameter value, converting the residual fitting parameter value from a floating point number to an integer to obtain a target residual fitting parameter value, and adding the target residual fitting parameter value and the first fitting parameter value to obtain a target fitting parameter value. Specifically, let the whole multilayer perceptron be the mapping f_θ, i.e. the neural network shown in fig. 2. The parameter of the initial multilayer perceptron is the first fitting parameter value θ_0 (i.e. the meta-learning initialization parameter shown in fig. 2, obtained after meta-learning training). The initial multilayer perceptron is trained to obtain the trained multilayer perceptron, whose parameter is the second fitting parameter value θ. The first fitting parameter value θ_0 is subtracted from the second fitting parameter value θ to obtain the residual fitting parameter value Δθ = θ − θ_0. The residual fitting parameter value is quantized according to the quantization unit and converted from a floating point number to an integer, thereby further compressing the data. The integer is then converted into a binary code stream by the lossless coding module AE; after the binary code stream passes through the transmission channel, it is restored to an integer by the lossless decoding module AD, and the restored residual is added to the first fitting parameter value θ_0 to obtain the target fitting parameter value.
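As a concrete illustration of this residual-parameter pipeline, the sketch below subtracts the meta-learned initialization, quantizes the residual to integers with a quantization step, losslessly compresses it and reconstructs the target fitting parameters on the decoding side; the quantization step value and the use of zlib as a stand-in for the AE/AD modules are assumptions, not the embodiment's actual entropy coder:

```python
import zlib
import numpy as np

def encode_parameters(theta, theta0, q_step=1e-3):
    """Quantize the residual (theta - theta0) and losslessly compress it."""
    residual = theta - theta0                                   # residual fitting parameters
    q_residual = np.round(residual / q_step).astype(np.int32)   # float -> integer
    return zlib.compress(q_residual.tobytes())                  # stand-in for the AE module

def decode_parameters(bitstream, theta0, shape, q_step=1e-3):
    """Recover the target fitting parameters from the bitstream and theta0."""
    q_residual = np.frombuffer(zlib.decompress(bitstream), dtype=np.int32).reshape(shape)
    return theta0 + q_residual * q_step                         # target fitting parameters

theta0 = np.random.randn(1000).astype(np.float32)               # meta-learned initialization
theta = theta0 + 0.01 * np.random.randn(1000).astype(np.float32)  # trained parameters
stream = encode_parameters(theta, theta0)
theta_hat = decode_parameters(stream, theta0, theta.shape)
print(len(stream), np.max(np.abs(theta_hat - theta)))           # code size, max reconstruction error
```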
After the difference between the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron is quantized and entropy-encoded, the image to be compressed may be compressed. Referring to fig. 3, fig. 3 is an overall architecture diagram of the multilayer perceptron disclosed in this embodiment of the present application; the image to be compressed and the two-dimensional coordinate vectors may be input into the input layer (i.e. the position encoding layer in fig. 3) of the pre-trained multilayer perceptron whose fitting parameters are the target fitting parameter values, so as to obtain the multi-dimensional vector of each frequency of each pixel point output by the input layer in the frequency domain. Specifically, the two-dimensional coordinate vector may first be normalized to obtain the normalized two-dimensional coordinate vector p; the normalized two-dimensional coordinate vector p is input into the input layer γ of the multilayer perceptron, and the input layer outputs the multi-dimensional vector of each frequency of the pixel point in the frequency domain, γ(p) = [a_1·sin(2π·b_1ᵀp), a_1·cos(2π·b_1ᵀp), …, a_m·sin(2π·b_mᵀp), a_m·cos(2π·b_mᵀp)], wherein a_j is the amplitude of the input layer, the dimension of p is the dimension of the input coordinates (two for a 2D image), and each frequency vector b_j is drawn from a Gaussian distribution with mean 0 and standard deviation σ; the hyper-parameter σ (the parameter of the Gaussian distribution) is set to a default value. As an example of the normalization, for a 2D image with an image size of 3 pixels by 3 pixels, the two-dimensional coordinate vectors of the pixel points in the image are [0,0], [0,1], [0,2], [1,0], [1,1], [1,2], [2,0], [2,1], [2,2]; after the two-dimensional coordinate vectors are normalized, [-1,-1], [-1,0], [-1,1], [0,-1], [0,0], [0,1], [1,-1], [1,0], [1,1] are obtained.
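A small sketch of the coordinate normalization and random Fourier mapping described above follows; the exact mapping form and the default σ are stated as one common choice and should be read as assumptions where the text leaves details open:

```python
import numpy as np

def normalized_grid(height, width):
    """Normalize integer pixel coordinates to [-1, 1] in each dimension."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([ys, xs], axis=-1).reshape(-1, 2).astype(np.float32)
    denom = np.array([max(height - 1, 1), max(width - 1, 1)], dtype=np.float32)
    return 2.0 * coords / denom - 1.0

def fourier_features(coords, num_frequencies=128, sigma=10.0, seed=0):
    """Random Fourier mapping p -> [sin(2*pi*B p), cos(2*pi*B p)],
    with B drawn from a zero-mean Gaussian of standard deviation sigma."""
    rng = np.random.default_rng(seed)
    B = rng.normal(0.0, sigma, size=(num_frequencies, coords.shape[-1]))
    proj = 2.0 * np.pi * coords @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

coords = normalized_grid(3, 3)        # the 3x3 example gives [-1,-1] ... [1,1]
features = fourier_features(coords)   # per-pixel multi-dimensional frequency vector
print(coords)
print(features.shape)                 # (9, 256)
```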
After the multi-dimensional vectors of each frequency of each pixel point output by the input layer in the frequency domain are obtained, for each pixel point, the multi-dimensional vector of the pixel point can be input into the N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade-process the global features and the local features of the multi-dimensional vector, the Nth comprehensive feature vector output by the Nth-level hidden layer is obtained; N is an integer greater than or equal to 2. Specifically, the N cascaded hidden layers may be N cascaded Wavelet Base Units (WBUs), or other neural network layers capable of cascade-processing the global features and the local features of the multi-dimensional vector, which is not limited here. Specifically, the Nth comprehensive feature vector is the output vector of the N cascaded hidden layers, and may be denoted w.
The Nth comprehensive feature vector output by the Nth-level hidden layer after the N hidden layers cascade-process the global features and the local features of the multi-dimensional vector may be obtained as follows: each hidden layer comprises a local branch unit, a global branch unit and a synthesis unit respectively connected with the local branch unit and the global branch unit; after the Nth-level local feature vector and the Nth-level global feature vector are obtained, the synthesis unit of the Nth-level hidden layer may perform comprehensive processing on the Nth-level local feature vector and the Nth-level global feature vector to output the Nth comprehensive feature vector of the Nth-level hidden layer. Specifically, the comprehensive processing may be dot multiplication of the Nth-level local feature vector and the Nth-level global feature vector, or may be addition, matrix multiplication, and the like; the specific method of the comprehensive processing is not limited here.
With reference to fig. 3, each dashed box in fig. 3 encloses one hidden layer: in each hidden layer, the upper linear layer and Gaussian activation function form the local branch unit, the lower linear layer and ReLU activation function (or squared ReLU activation function) form the global branch unit, and the part of each hidden layer where the local branch unit and the global branch unit are combined is the synthesis unit.
The method for extracting the local features of the multi-dimensional vector may be as follows: for the 1st-level local branch unit, the local features of the multi-dimensional vector are input into the 1st-level local branch unit of the N hidden layers, the local features of the multi-dimensional vector are processed by the linear layer of the 1st-level local branch unit to output the first local linear feature vector, and the first local linear feature vector is processed by the Gaussian activation function layer of the 1st-level local branch unit to output the 1st-level local feature vector. For the nth-level local branch unit, the (n-1)th comprehensive feature vector output by the (n-1)th-level hidden layer is input into the nth-level local branch unit, the (n-1)th comprehensive feature vector is processed by the linear layer of the nth-level local branch unit to output the nth local linear feature vector, and the nth local linear feature vector is processed by the Gaussian activation function layer of the nth-level local branch unit to output the nth-level local feature vector; wherein n is greater than or equal to 2 and less than or equal to N. Each local branch unit comprises a linear layer and a Gaussian activation function layer connected with each other. It is understood that the local features of the multi-dimensional vector may also be extracted by other reasonable methods besides the above method, which is not limited here.

The method for extracting the global features of the multi-dimensional vector may be as follows: for the 1st-level global branch unit, the multi-dimensional vector is input into the 1st-level global branch unit of the N hidden layers, the global features of the multi-dimensional vector are processed by the linear layer of the 1st-level global branch unit to output the first global linear feature vector, and the first global linear feature vector is processed by the nonlinear activation function layer of the 1st-level global branch unit to output the 1st-level global feature vector. For the nth-level global branch unit, the (n-1)th-level global feature vector output by the (n-1)th-level hidden layer is input into the nth-level global branch unit, the (n-1)th-level global feature vector is processed by the linear layer of the nth-level global branch unit to output the nth global linear feature vector, and the nth global linear feature vector is processed by the nonlinear activation function layer of the nth-level global branch unit to output the nth-level global feature vector; wherein n is greater than or equal to 2 and less than or equal to N. Each global branch unit comprises a linear layer and a nonlinear activation function layer connected with each other. The nonlinear activation function layer may be a ReLU activation function layer or a squared ReLU activation function layer, which is not limited here.
It is understood that the method for extracting the global feature of the multidimensional vector may be other reasonable methods besides the above method, and is not limited herein.
It can also be understood that, the method for obtaining the nth comprehensive feature vector output by the nth hidden layer after the global feature and the local feature of the multidimensional vector are cascade-processed by the N hidden layers may be other reasonable methods besides the above method, and is not limited herein.
Specifically, let the input quantities of the global branch unit and the local branch unit of the i-th level hidden layer be z_i and w_i, respectively. The iterative process of each level of hidden layer may then be written as:

z_{i+1} = ρ( W_g^(i) · z_i + b_g^(i) )    (formula 1)

w_{i+1} = g( W_l^(i) · w_i + b_l^(i) ) ⊙ z_{i+1}    (formula 2)

wherein W_g^(i), b_g^(i) and W_l^(i), b_l^(i) respectively represent the fitting parameters of the linear layers of the global branch unit and the local branch unit in the i-th level hidden layer, ρ represents the nonlinear activation function of the global branch unit, g represents the Gaussian activation function of the local branch unit, and ⊙ represents dot (element-wise) multiplication. Formula 1 represents the mapping relation between the input quantity z_{i+1} of the global branch unit of the (i+1)-th level hidden layer and the input quantity z_i of the global branch unit of the i-th level hidden layer: z_{i+1}, i.e. the i-th level global feature vector, is obtained after z_i undergoes the linear processing of the linear layer and the nonlinear processing of the nonlinear activation function layer. Formula 2 represents the mapping relation between the input quantity w_{i+1} of the local branch unit of the (i+1)-th level hidden layer and the input quantity w_i of the local branch unit of the i-th level hidden layer: first, w_i undergoes the linear processing of the linear layer and the nonlinear processing of the Gaussian activation function layer to obtain the i-th level local feature vector; the i-th level local feature vector and the i-th level global feature vector are then dot-multiplied to obtain the i-th comprehensive feature vector output by the i-th level hidden layer, and this i-th comprehensive feature vector is used as the input quantity w_{i+1} of the local branch unit of the (i+1)-th level hidden layer.
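Read as code, the two recurrences give one hidden layer (one WBU): the global branch is a linear layer followed by a nonlinear activation, the local branch a linear layer followed by the Gaussian activation, and the synthesis unit multiplies the two element-wise. A sketch under these assumptions (the ReLU choice for ρ, equal widths and the value of σ are illustrative):

```python
import torch

class WaveletBaseUnit(torch.nn.Module):
    """One hidden layer: z_out = relu(Wg z + bg); w_out = gauss(Wl w + bl) * z_out."""
    def __init__(self, dim, sigma=0.1):
        super().__init__()
        self.global_linear = torch.nn.Linear(dim, dim)
        self.local_linear = torch.nn.Linear(dim, dim)
        self.sigma = sigma

    def forward(self, w, z):
        z_out = torch.relu(self.global_linear(z))                          # formula (1)
        local = torch.exp(-self.local_linear(w) ** 2 / (2 * self.sigma ** 2))
        w_out = local * z_out                                              # formula (2), dot product
        return w_out, z_out
```

Stacking N such units, feeding w_out of one unit into the local branch of the next and z_out into the global branch of the next, reproduces the cascade described above.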
After the Nth comprehensive feature vector of the Nth-level hidden layer is output, for each pixel point, the Nth comprehensive feature vector of the pixel point can be input into the output layer of the multilayer perceptron, and the output layer outputs the compressed pixel value corresponding to the pixel point. The compressed pixel value may be an RGB value of the image, or may be another pixel value, which is not limited here. Specifically, the output layer comprises a linear layer and a Sigmoid activation function layer; for each pixel point, the Nth comprehensive feature vector w of the pixel point is input into the output layer of the multilayer perceptron, and the output layer outputs the compressed pixel value y corresponding to the pixel point, as follows:

y = Sigmoid( W_o · w + b_o )    (formula 3)

wherein W_o and b_o are the fitting parameters of the linear layer of the output layer. Formula 3 represents the mapping relation between the compressed pixel value y and the Nth comprehensive feature vector w: the Nth comprehensive feature vector w is input into the linear layer of the output layer for linear processing, then input into the Sigmoid activation function layer for nonlinear processing, and the compressed pixel value y is finally obtained.
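Putting the pieces together, the forward pass of the whole multilayer perceptron (input layer, N hidden layers, output layer with linear layer and Sigmoid) can be sketched as follows; it reuses the WaveletBaseUnit class from the earlier sketch, and the number of layers, number of frequencies and other sizes are illustrative assumptions:

```python
import math
import torch

class CoordinateCompressor(torch.nn.Module):
    """Input layer (random Fourier mapping) -> N hidden layers (WBUs) ->
    output layer (linear + Sigmoid). Assumes WaveletBaseUnit from the sketch above."""
    def __init__(self, n_layers=4, n_freq=128, out_dim=3, sigma_b=10.0):
        super().__init__()
        # Fixed random frequency matrix of the position-encoding input layer.
        self.register_buffer("B", sigma_b * torch.randn(n_freq, 2))
        dim = 2 * n_freq  # the encoding width feeds the hidden layers directly
        self.units = torch.nn.ModuleList([WaveletBaseUnit(dim) for _ in range(n_layers)])
        self.head = torch.nn.Linear(dim, out_dim)

    def forward(self, coords):            # coords: (num_pixels, 2), normalized to [-1, 1]
        proj = 2.0 * math.pi * coords @ self.B.t()
        enc = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
        w = z = enc                       # both branches start from the encoded vector
        for unit in self.units:
            w, z = unit(w, z)             # formulas (1) and (2) at each level
        return torch.sigmoid(self.head(w))  # formula (3): compressed pixel values in [0, 1]
```

Combined with the fit_image sketch above, compressing an image then amounts to fitting this network to the image and entropy-coding the residual of its parameters against the meta-learned initialization.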
After the compressed pixel values corresponding to the pixel points are output by the output layer, a compressed target image can be obtained according to the compressed pixel values corresponding to each pixel point, and the compressed target image is a reconstructed image obtained after being compressed according to the multilayer perceptron.
It is understood that the encoding scheme implemented by the input layer to map the two-dimensional coordinate vector into a high-dimensional vector includes, but is not limited to, sine-cosine position encoding, random Fourier feature position encoding, Gaussian function position encoding, and other position encoding schemes. The activation function of the local branch unit includes, but is not limited to, a Gaussian activation function, a ReLU function, a GeLU function, a squaring function, or a squared ReLU function. The activation function of the output layer may be selected according to the range of the data values; specific embodiments include, but are not limited to, a linear function, a Sigmoid function, a tanh function, a ReLU function, and the like. The values output by the output layer of the multilayer perceptron may take different data values according to the data type to be fitted, including but not limited to the amplitude of an audio signal, the RGB values of 2D images and videos, the signed distance function value of a 3D surface, the attribute values of a 3D point cloud, and the like. The pixel value may be an RGB value of the image, or another pixel value representing the image, which is not limited here. Algorithms for meta-learning training of the multilayer perceptron include, but are not limited to, the MAML algorithm and the Reptile algorithm. Quantization methods include, but are not limited to, uniform quantization, non-uniform quantization, and vector quantization. The lossless coding module AE and the lossless decoding module AD are entropy coding methods; specific embodiments include, but are not limited to, Huffman coding, arithmetic coding, range coding, asymmetric numeral system coding, and the like.
In the embodiment, the two-dimensional coordinate vector of each pixel point in the image to be compressed in the spatial domain can be obtained, the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron to obtain the multi-dimensional vector of each frequency of each pixel point output by the input layer, for each pixel point the multi-dimensional vector of the pixel point is input into the N cascaded hidden layers in the multilayer perceptron, the N hidden layers cascade-process the global features and the local features of the multi-dimensional vector to obtain the Nth comprehensive feature vector output by the Nth-level hidden layer, for each pixel point the Nth comprehensive feature vector of the pixel point is input into the output layer of the multilayer perceptron, the output layer outputs the compressed pixel value corresponding to the pixel point, and the compressed target image is obtained according to the compressed pixel value corresponding to each pixel point. By designing an efficient activation function and neural network structure, the data fitting capability of the multilayer perceptron is improved, so that low, medium and high frequency information can be fitted, the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be compressed with good integrity, and the rate-distortion performance of image compression is high. Secondly, by introducing the first fitting parameters of the initial multilayer perceptron, prior knowledge can be merged into the fitting parameters of the multilayer perceptron, the grasped prior knowledge can be used to quickly find the gradient direction along which the loss decreases fastest, and the convergence speed when training the multilayer perceptron can be improved. Moreover, the difference between the first fitting parameter value of the initial multilayer perceptron and the second fitting parameter value of the trained multilayer perceptron is quantized and entropy-coded, which reduces the parameter quantity of the multilayer perceptron, reduces the calculation complexity, reduces the code rate, and improves the rate-distortion performance of image compression.
The image compression method in the embodiment of the present application is described above, and the image decompression method in the embodiment of the present application is described below, please refer to fig. 4, where fig. 4 is a schematic flow chart of an image decompression method disclosed in the embodiment of the present application, and the method includes:
401. the size of the image that needs to be decompressed for the target image is determined.
When image decompression is performed, the size of the image that needs to be decompressed for the target image may be determined.
402. And obtaining a target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point output by the input layer in the frequency domain.
After the image size to which the target image is to be decompressed is determined, the target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain under that image size can be obtained, the target image and the target two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron, and the target multi-dimensional vector of each frequency of each pixel point output by the input layer in the frequency domain is obtained.
403. For each pixel point, inputting a target multi-dimensional vector of the pixel point into N cascaded hidden layers in a multi-layer perceptron, and after cascading processing is carried out on global features and local features of the target multi-dimensional vector by the N hidden layers, obtaining an Nth target comprehensive feature vector output by an Nth-level hidden layer; n is an integer greater than or equal to 2.
After a target multi-dimensional vector of each pixel output by the input layer at each frequency in a frequency domain is obtained, the target multi-dimensional vector of each pixel can be input into N cascaded hidden layers in the multi-layer perceptron aiming at each pixel, and after global features and local features of the target multi-dimensional vector are cascaded by the N hidden layers, an Nth target comprehensive feature vector output by an Nth-level hidden layer is obtained; n is an integer greater than or equal to 2.
404. And inputting the Nth target comprehensive characteristic vector of each pixel point into an output layer in the multilayer perceptron, and outputting the decompressed pixel value corresponding to the pixel point by the output layer.
After the nth target comprehensive characteristic vector output by the nth-level hidden layer is obtained, the nth target comprehensive characteristic vector of the pixel point can be input into an output layer in the multilayer perceptron for each pixel point, and the output layer outputs the decompressed pixel value corresponding to the pixel point.
405. And obtaining a decompressed image according to the decompressed pixel value of each pixel point.

After the decompressed pixel values corresponding to the pixel points are output by the output layer, the decompressed image can be obtained according to the decompressed pixel value of each pixel point. Specifically, the decompressed image is a reconstructed image obtained by decompressing according to the multilayer perceptron.
In the embodiment, the image size to which the target image needs to be decompressed can be determined, the target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain under that image size is obtained, the target image and the target two-dimensional coordinate vectors are input into the input layer of the pre-trained multilayer perceptron to obtain the target multi-dimensional vector of each frequency of each pixel point output by the input layer in the frequency domain, the target multi-dimensional vector of each pixel point is input into the N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade-process the global features and the local features of the target multi-dimensional vector, the Nth target comprehensive feature vector output by the Nth-level hidden layer is obtained, where N is an integer greater than or equal to 2; the Nth target comprehensive feature vector of each pixel point is input into the output layer of the multilayer perceptron, the output layer outputs the decompressed pixel value corresponding to the pixel point, and the decompressed image is obtained according to the decompressed pixel value of each pixel point. By designing an efficient activation function and neural network structure, the data fitting capability of the multilayer perceptron is improved, so that low, medium and high frequency information can be fitted, the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be decompressed with good integrity, and the rate-distortion performance of image decompression is high. Secondly, an image of any image size can be decompressed according to the user's requirements, which improves the flexibility of decompressing and reconstructing the compressed image.
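Decompression then reduces to evaluating the trained network on a coordinate grid of the requested size; a minimal sketch, assuming the normalized_grid helper and a trained model of the form sketched earlier:

```python
import torch

def decompress(model, height, width):
    """Reconstruct an image of the requested size by evaluating the trained
    coordinate network on a normalized coordinate grid of that size."""
    coords = torch.from_numpy(normalized_grid(height, width)).float()
    with torch.no_grad():
        pixels = model(coords)                        # decompressed pixel values in [0, 1]
    return pixels.reshape(height, width, -1).numpy()  # H x W x C reconstructed image
```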
The image decompression method in the embodiment of the present application is described above; the image compression apparatus in the embodiment of the present application is described below. Referring to fig. 5, an embodiment of the image compression apparatus in the embodiment of the present application includes:
the input unit 501 is configured to obtain a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, input the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multi-layer perceptron, and obtain a multi-dimensional vector of each frequency of each pixel point in a frequency domain, where the multi-dimensional vector is output by the input layer;
a cascade processing unit 502, configured to input, for each pixel point, the multidimensional vector of the pixel point obtained by the input unit 501 into N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers perform cascade processing on the global features and the local features of the multidimensional vector, obtain an nth comprehensive feature vector output by the nth-stage hidden layer; n is an integer greater than or equal to 2;
an output unit 503, configured to, for each pixel point, input the nth synthesized feature vector of the pixel point obtained by the cascade processing unit 502 into an output layer in the multilayer perceptron, and output, by the output layer, a compressed pixel value corresponding to the pixel point;
an obtaining unit 504, configured to obtain a compressed target image according to the compressed pixel value, obtained by the output unit 503, corresponding to each pixel point.
In the embodiment of the application, a two-dimensional coordinate vector of each pixel point of the image to be compressed in the spatial domain can be obtained; the image to be compressed and the two-dimensional coordinate vectors are input into the input layer of a pre-trained multilayer perceptron to obtain the multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer; for each pixel point, the multi-dimensional vector is input into N cascaded hidden layers of the multilayer perceptron, and after the N hidden layers cascade-process the global features and local features of the multi-dimensional vector, the Nth comprehensive feature vector output by the Nth-level hidden layer is obtained; for each pixel point, the Nth comprehensive feature vector is input into the output layer of the multilayer perceptron, which outputs the compressed pixel value corresponding to that pixel point; and the compressed target image is obtained according to the compressed pixel value corresponding to each pixel point. In this way, low-, medium- and high-frequency information can all be fitted, the Nth comprehensive feature vector output by the Nth-level hidden layer represents the features of the image to be compressed with good completeness, and the rate-distortion performance of image compression is high.
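The text does not spell out how the input layer maps a two-dimensional spatial coordinate to a multi-dimensional vector per frequency. The following minimal sketch assumes a sine/cosine positional (Fourier-feature) encoding; the function name encode_coordinates, the dyadic frequency schedule, and the output dimensionality are illustrative assumptions only.

```python
import math
import torch

def encode_coordinates(coords: torch.Tensor, num_frequencies: int = 10) -> torch.Tensor:
    """Map normalized 2-D pixel coordinates to a frequency-domain feature vector.

    coords: (P, 2) tensor of pixel coordinates in [0, 1].
    Returns: (P, 4 * num_frequencies) tensor with one sine and one cosine
             component per coordinate axis and per frequency.
    """
    frequencies = 2.0 ** torch.arange(num_frequencies)            # assumed dyadic frequency schedule
    angles = 2.0 * math.pi * coords[..., None] * frequencies      # (P, 2, num_frequencies)
    features = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return features.flatten(start_dim=1)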
Referring to fig. 6, an image compression apparatus in an embodiment of the present application is described in detail below, where another embodiment of the image compression apparatus in the embodiment of the present application includes:
the input unit 601 is configured to obtain a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, input the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multi-layer perceptron, and obtain a multi-dimensional vector of each frequency of each pixel point in a frequency domain, where the multi-dimensional vector is output by the input layer;
a cascade processing unit 602, configured to input, for each pixel point, the multidimensional vector of the pixel point obtained by the input unit 601 into N cascaded hidden layers in the multilayer perceptron, and after cascade processing is performed on global features and local features of the multidimensional vector by the N hidden layers, an nth comprehensive feature vector output by the nth hidden layer is obtained; n is an integer greater than or equal to 2;
an output unit 603, configured to, for each pixel point, input the nth synthesized feature vector of the pixel point obtained by the cascade processing unit 602 into an output layer in the multilayer perceptron, and output, by the output layer, a compressed pixel value corresponding to the pixel point;
an obtaining unit 604, configured to obtain a compressed target image according to the compressed pixel value, obtained by the output unit 603, corresponding to each pixel point.
The cascade processing unit 602 is specifically configured to: extract, by the cascaded local branch units of the N hidden layers, the local features of the multi-dimensional vector to obtain the Nth-level local feature vector output by the Nth-level local branch unit; extract, by the cascaded global branch units of the N hidden layers, the global features of the multi-dimensional vector to obtain the Nth-level global feature vector output by the Nth-level global branch unit; and perform, by the synthesis unit of the Nth-level hidden layer, synthesis processing on the Nth-level local feature vector and the Nth-level global feature vector to output the Nth comprehensive feature vector of the Nth-level hidden layer.
The cascade processing unit 602 is specifically configured to: for the 1st-level local branch unit, input the multi-dimensional vector into the 1st-level local branch unit of the N hidden layers, process the local features of the multi-dimensional vector by the linear layer of the 1st-level local branch unit to output a first local linear feature vector, and process the first local linear feature vector by the Gaussian activation function layer of the 1st-level local branch unit to output the 1st-level local feature vector; for the nth-level local branch unit, input the (n-1)th comprehensive feature vector output by the (n-1)th-level hidden layer into the nth-level local branch unit, process the (n-1)th comprehensive feature vector by the linear layer of the nth-level local branch unit to output an nth local linear feature vector, and process the nth local linear feature vector by the Gaussian activation function layer of the nth-level local branch unit to output the nth-level local feature vector; wherein n is greater than or equal to 2 and less than or equal to N.
The cascade processing unit 602 is specifically configured to: for the 1st-level global branch unit, input the multi-dimensional vector into the 1st-level global branch unit of the N hidden layers, process the global features of the multi-dimensional vector by the linear layer of the 1st-level global branch unit to output a first global linear feature vector, and process the first global linear feature vector by the nonlinear activation function layer of the 1st-level global branch unit to output the 1st-level global feature vector; for the nth-level global branch unit, input the (n-1)th-level global feature vector output by the (n-1)th-level hidden layer into the nth-level global branch unit, process the (n-1)th-level global feature vector by the linear layer of the nth-level global branch unit to output an nth global linear feature vector, and process the nth global linear feature vector by the nonlinear activation function layer of the nth-level global branch unit to output the nth-level global feature vector; wherein n is greater than or equal to 2 and less than or equal to N.
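As a hedged sketch of one such hidden layer, the code below pairs a local branch (a linear layer followed by a Gaussian activation function layer) with a global branch (a linear layer followed by a nonlinear activation, here assumed to be ReLU) and combines the two in a synthesis step. The hidden width, the Gaussian scale sigma, and the use of element-wise addition as the synthesis operation are assumptions not fixed by the text above.

```python
import torch
import torch.nn as nn

class GaussianActivation(nn.Module):
    """Element-wise Gaussian activation exp(-x^2 / (2 * sigma^2)); sigma is an assumed hyperparameter."""
    def __init__(self, sigma: float = 1.0):
        super().__init__()
        self.sigma = sigma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.exp(-x.pow(2) / (2.0 * self.sigma ** 2))

class HiddenLayer(nn.Module):
    """One hidden layer with a local branch, a global branch, and a synthesis step.

    The local branch of layer n takes the (n-1)th synthesized feature vector
    (the encoded multi-dimensional vector for layer 1); the global branch takes
    the (n-1)th-level global feature vector (the encoded vector for layer 1)
    and passes its output on to the next layer's global branch.
    """
    def __init__(self, local_in: int, global_in: int, width: int):
        super().__init__()
        self.local_linear = nn.Linear(local_in, width)
        self.local_act = GaussianActivation()
        self.global_linear = nn.Linear(global_in, width)
        self.global_act = nn.ReLU()

    def forward(self, local_input: torch.Tensor, global_input: torch.Tensor):
        local_feat = self.local_act(self.local_linear(local_input))
        global_feat = self.global_act(self.global_linear(global_input))
        synthesized = local_feat + global_feat   # assumed synthesis: element-wise sum
        return synthesized, global_feat
```

Stacking N such layers, feeding the synthesized output of layer n-1 into layer n's local branch and the global output of layer n-1 into layer n's global branch, reproduces the cascade described above; the Nth synthesized vector would then be passed through the output layer to produce a pixel value.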
The image compression apparatus further includes: a calculation unit 605;
the obtaining unit 604 is further configured to obtain a two-dimensional coordinate vector of each sample pixel point in the multi-frame image sample in the spatial domain; wherein, each sample pixel point is respectively marked with a pixel value;
the input unit 601 is further configured to input the image sample and the two-dimensional coordinate vector of the image sample into an initial multi-layer perceptron, and output a predicted pixel value corresponding to each sample pixel point by the initial multi-layer perceptron;
the calculating unit 605 is specifically configured to calculate a loss between the predicted pixel value and the labeled pixel value of each sample pixel point according to a regression loss function, and when the loss meets a convergence condition, obtain a trained multi-layer perceptron.
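A minimal sketch of this training step is given below, assuming an L2 regression loss, an Adam optimizer, and a simple loss-threshold convergence test; none of these choices (nor the function name fit_perceptron) are specified by the embodiment.

```python
import torch

def fit_perceptron(model: torch.nn.Module, coords: torch.Tensor,
                   labeled_pixels: torch.Tensor, max_steps: int = 10_000,
                   lr: float = 1e-3, tol: float = 1e-6) -> torch.nn.Module:
    """Fit the initial multilayer perceptron to one image sample.

    coords:         (P, 2) two-dimensional coordinate vectors of the sample pixel points.
    labeled_pixels: (P, C) pixel values labeling each sample pixel point.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()   # regression loss between predicted and labeled pixel values
    for _ in range(max_steps):
        optimizer.zero_grad()
        predicted = model(coords)
        loss = loss_fn(predicted, labeled_pixels)
        loss.backward()
        optimizer.step()
        if loss.item() < tol:      # assumed convergence condition
            break
    return model
```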
The input unit 601 is specifically configured to obtain a first fitting parameter value of the initial multilayer perceptron and a second fitting parameter value of the trained multilayer perceptron, subtract the first fitting parameter value from the second fitting parameter value to obtain a residual fitting parameter value, convert the residual fitting parameter value from a floating point number to an integer to obtain a target residual fitting parameter value, add the target residual fitting parameter value to the first fitting parameter value to obtain a target fitting parameter value, and input the image to be compressed and the two-dimensional coordinate vector into the input layer of the pre-trained multilayer perceptron whose fitting parameters are the target fitting parameter values.
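The parameter handling described above can be sketched as follows; the uniform quantization step and the helper name quantize_residual_parameters are assumptions, since the text only states that the residual fitting parameter values are converted from floating point numbers to integers and then added back to the first fitting parameter values.

```python
import torch

def quantize_residual_parameters(initial_model: torch.nn.Module,
                                 trained_model: torch.nn.Module,
                                 step: float = 1e-3):
    """Form target fitting parameters = first parameters + quantized residual.

    Returns the trained model reloaded with the target fitting parameter
    values, plus the integer residuals that would actually be stored.
    """
    first_state = initial_model.state_dict()
    second_state = trained_model.state_dict()
    integer_residuals, target_state = {}, {}
    for name, first_value in first_state.items():
        residual = second_state[name] - first_value                 # second minus first fitting parameters
        integer_residuals[name] = torch.round(residual / step).to(torch.int32)
        target_state[name] = first_value + integer_residuals[name].to(residual.dtype) * step
    trained_model.load_state_dict(target_state)
    return trained_model, integer_residuals
```

Only the integer residuals (and the shared first fitting parameter values) would then need to be kept, which is what makes this parameter representation compact.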
In this embodiment, each unit in the image compression apparatus performs operations similar to those described for the image compression method in the embodiment shown in fig. 1, which are not repeated here.
The image compression apparatus in the embodiment of the present application is described above. With reference to fig. 7, the image decompression apparatus in the embodiment of the present application is described below; an embodiment of the image decompression apparatus in the embodiment of the present application includes:
a determining unit 701, configured to determine an image size to be decompressed for a target image;
an input unit 702, configured to obtain a target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain in the image size determined by the determining unit 701, input the target image and the target two-dimensional coordinate vector into an input layer in a pre-trained multi-layer perceptron, and obtain a target multi-dimensional vector of each frequency of each pixel point in a frequency domain output by the input layer;
a cascade processing unit 703, configured to, for each pixel point, input the target multi-dimensional vector of the pixel point obtained by the input unit 702 into N cascaded hidden layers of the multilayer perceptron and, after the N hidden layers cascade-process the global features and local features of the target multi-dimensional vector, obtain the Nth target comprehensive feature vector output by the Nth-level hidden layer; N is an integer greater than or equal to 2;
an output unit 704, configured to, for each pixel point, input the Nth target comprehensive feature vector of the pixel point obtained by the cascade processing unit 703 into the output layer of the multilayer perceptron, and output, by the output layer, the decompressed pixel value corresponding to the pixel point;
an obtaining unit 705, configured to obtain a decompressed image according to the decompressed pixel value, obtained by the output unit 704, of each pixel point.
In this embodiment, the image size at which a target image is to be decompressed can be determined; a target two-dimensional coordinate vector of each pixel point of the target image in the spatial domain at that image size is obtained; the target image and the target two-dimensional coordinate vectors are input into the input layer of a pre-trained multilayer perceptron to obtain the target multi-dimensional vector of each pixel point at each frequency in the frequency domain output by the input layer; for each pixel point, the target multi-dimensional vector is input into N cascaded hidden layers of the multilayer perceptron, and after the N hidden layers cascade-process the global features and local features of the target multi-dimensional vector, the Nth target comprehensive feature vector output by the Nth-level hidden layer is obtained, where N is an integer greater than or equal to 2; for each pixel point, the Nth target comprehensive feature vector is input into the output layer of the multilayer perceptron, which outputs the decompressed pixel value corresponding to that pixel point; and the decompressed image is obtained from the decompressed pixel value of each pixel point. By designing an efficient activation function and neural network structure, the data-fitting capability of the multilayer perceptron is improved so that low-, medium- and high-frequency information can all be fitted; the Nth comprehensive feature vector output by the Nth-level hidden layer therefore represents the features of the image to be decompressed with good completeness, and the rate-distortion performance of image decompression is high. In addition, an image of any size can be decompressed according to the user's requirements, which improves the flexibility of decompressing and reconstructing the compressed image.
Referring to fig. 8, another embodiment of an image processing apparatus 800 according to an embodiment of the present application includes:
a central processing unit 801, a memory 805, an input/output interface 804, a wired or wireless network interface 803 and a power supply 802;
the central processor 801 is configured to communicate with the memory 805 and execute the instructions in the memory 805 to perform the method described in the embodiment shown in fig. 1 above.
The embodiment of the present application further provides a computer-readable storage medium including instructions which, when executed on a computer, cause the computer to perform the method in the embodiment shown in fig. 1.
The embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, causes the computer to perform the method in the embodiment shown in fig. 1.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the execution order of these steps, and they may be executed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily executed at the same time and may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Claims (10)
1. An image compression method, comprising:
obtaining a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, inputting the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
for each pixel point, inputting the multidimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after the N hidden layers cascade the global features and the local features of the multidimensional vector, obtaining an Nth comprehensive feature vector output by an Nth-level hidden layer; n is an integer greater than or equal to 2;
for each pixel point, inputting the Nth comprehensive characteristic vector of the pixel point into an output layer in the multilayer perceptron, and outputting a compressed pixel value corresponding to the pixel point by the output layer;
obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point;
each hidden layer comprises a local branch unit, a global branch unit and a comprehensive unit which is respectively connected with the local branch unit and the global branch unit;
obtaining the Nth comprehensive feature vector output by the Nth-level hidden layer after the N hidden layers cascade the global features and the local features of the multi-dimensional vector includes:
each local branch unit of the N hidden layers is used for extracting the local features of the multidimensional vector in a cascading manner to obtain an Nth-level local feature vector output by an Nth-level local branch unit;
extracting global features of the multi-dimensional vector by each global branch unit of the N hidden layers in a cascade mode to obtain an Nth-level global feature vector output by an Nth-level global branch unit;
performing, by the synthesis unit of the nth-level hidden layer, synthesis processing on the nth-level local feature vector and the nth-level global feature vector to output an nth synthesis feature vector of the nth-level hidden layer;
the global feature extracted from the global branch of the previous hidden layer is used as the input of the global branch of the next hidden layer.
2. The method of claim 1, wherein each of the local branching units comprises a linear layer and a gaussian activation function layer connected to each other;
the step of extracting the local features of the multi-dimensional vector by the cascade connection of the local branch units of the N hidden layers to obtain an Nth-level local feature vector output by an Nth-level local branch unit includes:
for a level 1 local branch unit, inputting local features of the multi-dimensional vector into a level 1 local branch unit of the N hidden layers, processing the local features of the multi-dimensional vector by a linear layer of the level 1 local branch unit to output a first local linear feature vector, processing the first local linear feature vector by a Gaussian activation function layer of the level 1 local branch unit to output a level 1 local feature vector;
for the nth-level local branch unit, inputting the (n-1)th comprehensive feature vector output by the (n-1)th-level hidden layer into the nth-level local branch unit, processing the (n-1)th comprehensive feature vector by a linear layer of the nth-level local branch unit to output an nth local linear feature vector, and processing the nth local linear feature vector by a Gaussian activation function layer of the nth-level local branch unit to output an nth-level local feature vector; wherein n is greater than or equal to 2 and less than or equal to N.
3. The method of claim 1, wherein each of the global branching units comprises a linear layer and a nonlinear activation function layer connected to each other;
the extracting, by the cascade connection of each global branch unit of the N hidden layers, the global features of the multi-dimensional vector to obtain an Nth-level global feature vector output by an Nth-level global branch unit includes:
for a first-level global branch unit, inputting the multidimensional vector into a 1 st-level global branch unit in the N hidden layers, processing global features of the multidimensional vector by a linear layer of the 1 st-level global branch unit to output a first global linear feature vector, and processing the first global linear feature vector by a non-linear activation function layer of the 1 st-level global branch unit to output a 1 st-level global feature vector;
for the nth-level global branch unit, inputting the (n-1)th-level global feature vector output by the (n-1)th-level hidden layer into the nth-level global branch unit, processing the (n-1)th-level global feature vector by a linear layer of the nth-level global branch unit to output an nth global linear feature vector, and processing the nth global linear feature vector by a nonlinear activation function layer of the nth-level global branch unit to output an nth-level global feature vector; wherein n is greater than or equal to 2 and less than or equal to N.
4. The method of claim 1, wherein before inputting the image to be compressed and the two-dimensional coordinate vector into an input layer in a pre-trained multi-layer perceptron, the method further comprises:
obtaining a two-dimensional coordinate vector of each sample pixel point in a space domain in a multi-frame image sample; wherein, each sample pixel point is respectively marked with a pixel value;
inputting the image sample and the two-dimensional coordinate vector of the image sample into an initial multi-layer perceptron, and outputting a predicted pixel value corresponding to each sample pixel point by the initial multi-layer perceptron;
and calculating the loss between the predicted pixel value and the labeled pixel value of each sample pixel point according to a regression loss function, and obtaining the trained multilayer perceptron when the loss meets the convergence condition.
5. The method according to claim 4, wherein the inputting the image to be compressed and the two-dimensional coordinate vector into an input layer of a pre-trained multi-layer perceptron comprises:
obtaining a first fitting parameter value of the initial multi-layer perceptron and a second fitting parameter value of the trained multi-layer perceptron;
subtracting the first fitting parameter value from the second fitting parameter value to obtain a residual fitting parameter value;
converting the residual error fitting parameter value from a floating point number to an integer to obtain a target residual error fitting parameter value;
adding the target residual error fitting parameter value and the first fitting parameter value to obtain a target fitting parameter value;
and inputting the image to be compressed and the two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron with fitting parameters as the target fitting parameter values.
6. An image decompression method, comprising:
determining the size of an image to be decompressed of a target image;
obtaining a target two-dimensional coordinate vector of each pixel point of the target image in a spatial domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
for each pixel point, inputting the target multi-dimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and after the global features and the local features of the target multi-dimensional vector are cascaded by the N hidden layers, obtaining an Nth target comprehensive feature vector output by an Nth-level hidden layer; n is an integer greater than or equal to 2;
for each pixel point, inputting the Nth target comprehensive characteristic vector of the pixel point into an output layer in the multilayer perceptron, and outputting a decompressed pixel value corresponding to the pixel point by the output layer;
obtaining a decompressed image according to the decompressed pixel value of each pixel point;
each hidden layer comprises a local branch unit, a global branch unit and a comprehensive unit which is respectively connected with the local branch unit and the global branch unit;
obtaining the Nth comprehensive feature vector output by the Nth-level hidden layer after the N hidden layers cascade the global features and the local features of the multi-dimensional vector includes:
each local branch unit of the N hidden layers is used for extracting local features of the multi-dimensional vectors in a cascade mode to obtain an Nth-level local feature vector output by an Nth-level local branch unit;
extracting global features of the multi-dimensional vector by each global branch unit of the N hidden layers in a cascade mode to obtain an Nth-level global feature vector output by an Nth-level global branch unit;
performing, by the synthesis unit of the nth-level hidden layer, synthesis processing on the nth-level local feature vector and the nth-level global feature vector to output an nth synthesis feature vector of the nth-level hidden layer;
the global feature extracted from the global branch of the previous hidden layer is used as the input of the global branch of the next hidden layer.
7. An image compression apparatus characterized by comprising:
the input unit is used for obtaining a two-dimensional coordinate vector of each pixel point in a to-be-compressed image in a spatial domain, inputting the to-be-compressed image and the two-dimensional coordinate vector into an input layer in a pre-trained multilayer perceptron, and obtaining a multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
the cascade processing unit is used for inputting the multidimensional vectors of the pixel points into N cascaded hidden layers in the multilayer perceptron aiming at each pixel point, and after the N hidden layers carry out cascade processing on the global features and the local features of the multidimensional vectors, the N comprehensive feature vectors output by the N-level hidden layers are obtained; n is an integer greater than or equal to 2;
the output unit is used for inputting, for each pixel point, the Nth comprehensive feature vector of the pixel point into an output layer in the multilayer perceptron, and outputting, by the output layer, a compressed pixel value corresponding to the pixel point;
the obtaining unit is used for obtaining a compressed target image according to the compressed pixel value corresponding to each pixel point;
the cascade processing unit is specifically configured to extract, by the cascaded local branch units of the N hidden layers, the local features of the multi-dimensional vector to obtain the Nth-level local feature vector output by the Nth-level local branch unit, extract, by the cascaded global branch units of the N hidden layers, the global features of the multi-dimensional vector to obtain the Nth-level global feature vector output by the Nth-level global branch unit, and perform, by the integrating unit of the Nth-level hidden layer, integration processing on the Nth-level local feature vector and the Nth-level global feature vector to output the Nth integrated feature vector of the Nth-level hidden layer, where each hidden layer includes a local branch unit, a global branch unit, and an integrating unit connected to the local branch unit and the global branch unit, respectively; the global feature extracted by the global branch of the previous hidden layer is used as the input of the global branch of the next hidden layer.
8. An image decompression apparatus characterized by comprising:
a determining unit, configured to determine an image size of a target image to be decompressed;
the input unit is used for obtaining a target two-dimensional coordinate vector of each pixel point of the target image in a space domain under the image size, inputting the target image and the target two-dimensional coordinate vector into an input layer of a pre-trained multilayer perceptron, and obtaining a target multi-dimensional vector of each frequency of each pixel point in a frequency domain, which is output by the input layer;
the cascade processing unit is used for inputting, for each pixel point, the target multi-dimensional vector of the pixel point into N cascaded hidden layers in the multilayer perceptron, and obtaining, after the global features and the local features of the target multi-dimensional vector are cascade-processed by the N hidden layers, the Nth target comprehensive feature vector output by the Nth-level hidden layer; N is an integer greater than or equal to 2;
the output unit is used for inputting, for each pixel point, the Nth target comprehensive feature vector of the pixel point into an output layer in the multilayer perceptron, and outputting, by the output layer, a decompressed pixel value corresponding to the pixel point;
an obtaining unit, configured to obtain a decompressed image according to the decompressed pixel value of each pixel point;
the cascade processing unit is specifically configured to extract, by the cascaded local branch units of the N hidden layers, the local features of the target multi-dimensional vector to obtain the Nth-level local feature vector output by the Nth-level local branch unit, extract, by the cascaded global branch units of the N hidden layers, the global features of the target multi-dimensional vector to obtain the Nth-level global feature vector output by the Nth-level global branch unit, and perform, by the integrating unit of the Nth-level hidden layer, integration processing on the Nth-level local feature vector and the Nth-level global feature vector to output the Nth integrated feature vector of the Nth-level hidden layer, where each hidden layer includes a local branch unit, a global branch unit, and an integrating unit connected to the local branch unit and the global branch unit, respectively; and the global feature extracted by the global branch of the previous hidden layer is used as the input of the global branch of the next hidden layer.
9. An image processing apparatus characterized by comprising:
a central processing unit, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory and execute the instructions in the memory to perform the method of any one of claims 1 to 6.
10. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210915500.2A CN114998457B (en) | 2022-08-01 | 2022-08-01 | Image compression method, image decompression method, related device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210915500.2A CN114998457B (en) | 2022-08-01 | 2022-08-01 | Image compression method, image decompression method, related device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114998457A CN114998457A (en) | 2022-09-02 |
CN114998457B true CN114998457B (en) | 2022-11-22 |
Family
ID=83022107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210915500.2A Active CN114998457B (en) | 2022-08-01 | 2022-08-01 | Image compression method, image decompression method, related device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114998457B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101795344A (en) * | 2010-03-02 | 2010-08-04 | 北京大学 | Digital hologram compression method and system, decoding method and system, and transmission method and system |
JP2019016166A (en) * | 2017-07-06 | 2019-01-31 | 日本放送協会 | Neural network, encoder, decoder, learning method, control method, and program |
CN109919864A (en) * | 2019-02-20 | 2019-06-21 | 重庆邮电大学 | A kind of compression of images cognitive method based on sparse denoising autoencoder network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11924445B2 (en) * | 2020-09-25 | 2024-03-05 | Qualcomm Incorporated | Instance-adaptive image and video compression using machine learning systems |
CN112381790A (en) * | 2020-11-13 | 2021-02-19 | 天津大学 | Abnormal image detection method based on depth self-coding |
- 2022
  - 2022-08-01 CN CN202210915500.2A patent/CN114998457B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101795344A (en) * | 2010-03-02 | 2010-08-04 | 北京大学 | Digital hologram compression method and system, decoding method and system, and transmission method and system |
JP2019016166A (en) * | 2017-07-06 | 2019-01-31 | 日本放送協会 | Neural network, encoder, decoder, learning method, control method, and program |
CN109919864A (en) * | 2019-02-20 | 2019-06-21 | 重庆邮电大学 | A kind of compression of images cognitive method based on sparse denoising autoencoder network |
Non-Patent Citations (1)
Title |
---|
Joint Global and Local Hierarchical Priors for Learned Image Compression; Jun-Hyuk Kim et al.; arXiv; 2021-07-21; pp. 1-5 *
Also Published As
Publication number | Publication date |
---|---|
CN114998457A (en) | 2022-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3735658A1 (en) | Generating a compressed representation of a neural network with proficient inference speed and power consumption | |
CN111669587B (en) | Mimic compression method and device of video image, storage medium and terminal | |
US20180239992A1 (en) | Processing artificial neural network weights | |
US20160292589A1 (en) | Ultra-high compression of images based on deep learning | |
CN114402596B (en) | Neural network model decoding method, device, system and medium | |
CN114581544A (en) | Image compression method, computer device and computer storage medium | |
US20100008592A1 (en) | Image signal transforming and inverse-transforming method and computer program product with pre-encoding filtering features | |
KR20200089635A (en) | Systems and methods for image compression at multiple, different bitrates | |
Chen et al. | Compressive sensing multi-layer residual coefficients for image coding | |
CN115022637A (en) | Image coding method, image decompression method and device | |
CN114998457B (en) | Image compression method, image decompression method, related device and readable storage medium | |
Feng et al. | Neural subspaces for light fields | |
Zhuang et al. | A robustness and low bit-rate image compression network for underwater acoustic communication | |
TW202406344A (en) | Point cloud geometry data augmentation method and apparatus, encoding method and apparatus, decoding method and apparatus, and encoding and decoding system | |
WO2023205969A1 (en) | Point cloud geometric information compression method and apparatus, point cloud geometric information decompression method and apparatus, point cloud video encoding method and apparatus, and point cloud video decoding method and apparatus | |
Thepade et al. | New clustering algorithm for Vector Quantization using Haar sequence | |
CN112616058B (en) | Video encoding or decoding method, apparatus, computer device, and storage medium | |
KR20240025629A (en) | Video compression using optical flow | |
CN113554719B (en) | Image encoding method, decoding method, storage medium and terminal equipment | |
CN111107360B (en) | Spectrum-space dimension combined hyperspectral image lossless compression method and system | |
CN114638002A (en) | Compressed image encryption method supporting similarity retrieval | |
Navaneethakrishnan | Study of image compression techniques | |
CN116916033B (en) | Combined space-time video compression method based on random self-adaptive Fourier decomposition | |
Joshi et al. | Reducing Image Compression Time using Improvised Discrete Cosine Transform Algorithm | |
Nihal et al. | A Survey and Study of Image Compression Methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |