CN116095183A - Data compression method and related equipment - Google Patents

Data compression method and related equipment

Info

Publication number
CN116095183A
Authority
CN
China
Prior art keywords
data
sub
probability distribution
bit stream
neural network
Prior art date
Legal status
Pending
Application number
CN202310077949.0A
Other languages
Chinese (zh)
Inventor
张琛
汤姆·莱德
康宁
张世枫
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2023/081315 (published as WO2023174256A1)
Publication of CN116095183A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 - Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/04 - Protocols for data compression, e.g. ROHC
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/90 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to the field of artificial intelligence and discloses a data compression method, which comprises the following steps: acquiring first target data, wherein the first target data comprises first sub-data and second sub-data; obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, the first probability distribution being used as a conditional probability distribution of the second sub-data; compressing the second sub-data by an entropy encoder according to the first probability distribution to obtain a first bit stream; and compressing the first sub-data to the first bit stream to obtain a second bit stream. Compared with the additional initial bits required by the bits-back coding mechanism in the prior art, the embodiment of the application does not need additional initial bits, so that compression of a single data point can be realized and the compression cost during parallel compression is reduced.

Description

Data compression method and related equipment
The present application claims priority to the Chinese patent application No. 202210249906.1, entitled "A data compression method and related apparatus", filed with the Chinese Patent Office on March 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a data compression method and related devices.
Background
Nowadays, multimedia data accounts for the great majority of internet traffic. Compression of image data plays a vital role in the storage and efficient transmission of multimedia data, and image coding is therefore a technique of great practical value.
Research on image coding has a long history: researchers have proposed a large number of methods and established various international image coding standards such as JPEG, JPEG2000, WebP, and BPG. Although widely used at present, these conventional methods show certain limitations in the face of the ever-increasing amount of image data and the continually emerging new media types.
Lossless compression schemes based on artificial intelligence exploit the fact that deep generative models can estimate the probability distribution of data more accurately than conventional schemes, and thereby obtain compression ratios far superior to conventional lossless compression schemes. In artificial-intelligence-based lossless compression schemes, widely used deep generative models include the autoregressive model, the variational autoencoder (VAE), and the flow model (normalizing flow). In general, the autoregressive model is well compatible with arithmetic coders and Huffman coding; the variational autoencoder, combined with a bits-back coding mechanism, is well compatible with the asymmetric numeral system (ANS); and the flow model is compatible with all three of these entropy coders. Besides the compression ratio, lossless compression solutions are also evaluated by their throughput. For lossless compression solutions based on artificial intelligence, the overall throughput is lower than that of conventional solutions because the model size is much larger. Moreover, when the two metrics of compression ratio and throughput are considered together, no lossless compression solution based on a particular generative model is absolutely superior to the others; current research is still exploring the Pareto front of compression schemes based on different generative models.
The variational autoencoder model is a hidden-variable model, unlike a fully observed model (e.g., an autoregressive model). Instead of directly modeling the data itself, this model additionally introduces one (or more) hidden variables and then models the prior distribution, the likelihood function, and an approximate posterior distribution. Because the marginal distribution of the data cannot be obtained directly from the variational autoencoder, conventional entropy coding methods cannot be used directly. In order to enable lossless compression of data using a variational autoencoder, variational-autoencoder lossless compression schemes based on the bits-back coding mechanism have been proposed. Bits-back ANS is the original form of this scheme; it applies to a variational autoencoder model containing only one hidden variable and can be generalized to variational autoencoder models containing multiple hidden variables.
Existing variational-autoencoder lossless compression schemes based on the bits-back coding mechanism all require additional initial bits from which a sample of the hidden variable is decoded. These additional initial bits are randomly generated data whose size has to be counted in the compression cost; when the number of data points to be compressed in series is small, the additional average cost is high. Moreover, since the number of additional initial bits required is proportional to the number of data points to be compressed, efficient parallel compression cannot be achieved.
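For context, the following is a minimal sketch of why a classic bits-back encoder needs these extra initial bits: encoding a data point begins by decoding a hidden-variable sample from the current bit stream, so an empty stream must first be padded with random bits. The helper names (pop, push) and the padding length are illustrative assumptions, not part of any particular library or of this application.

```python
# Hypothetical sketch of classic bits-back encoding (the prior art described above).
# pop(stream, dist) -> (symbol, shorter_stream); push(stream, symbol, dist) -> longer_stream.
import random
from typing import Any, Callable, List

BitStream = List[int]

def bits_back_encode(stream: BitStream, datum: Any,
                     var_encoder: Callable, decoder: Callable, prior: Any,
                     pop: Callable, push: Callable, init_len: int = 64) -> BitStream:
    if not stream:
        # The extra initial bits: random padding that must be counted in the compression cost.
        stream = [random.getrandbits(1) for _ in range(init_len)]
    latent, stream = pop(stream, var_encoder(datum))   # decode a hidden-variable sample from existing bits
    stream = push(stream, datum, decoder(latent))      # encode the data given the hidden variable
    return push(stream, latent, prior)                 # encode the hidden variable under the prior
```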
Disclosure of Invention
Compared with the additional initial bits required by the bits-back coding mechanism in the prior art, the data compression method provided by the application does not need additional initial bits, so that compression of a single data point can be realized and the compression cost during parallel compression is greatly reduced.
In a first aspect, the present application provides a data compression method, including: acquiring first target data, wherein the first target data comprises first sub-data and second sub-data;
in one possible implementation, the first target data may be image data for compression or other data (e.g., text, video, binary stream, etc.).
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block; or,
the first target data is a text sequence, and the first sub data and the second sub data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub data and the second sub data are obtained by data segmentation of the binary stream; or,
The first target data is a video, and the first sub data and the second sub data are obtained by data segmentation of a plurality of image frames of the video.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension. Wherein for image data one channel dimension (C) and two spatial dimensions (width W and height H) are included.
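As a concrete illustration of such a segmentation (with an assumed split point that is not specified by the application), an image block of shape (C, H, W) can be divided along either the channel dimension or a spatial dimension:

```python
# A minimal NumPy sketch of splitting an image block into first/second sub-data.
import numpy as np

x = np.arange(3 * 4 * 4).reshape(3, 4, 4)          # toy image block: C=3, H=4, W=4

# Channel-dimension split (assumed split point: first channel vs. remaining channels).
first_sub_c, second_sub_c = x[:1], x[1:]

# Spatial-dimension split (assumed split point: top half vs. bottom half of H).
first_sub_s, second_sub_s = x[:, :2, :], x[:, 2:, :]

print(first_sub_c.shape, second_sub_c.shape)        # (1, 4, 4) (2, 4, 4)
print(first_sub_s.shape, second_sub_s.shape)        # (3, 2, 4) (3, 2, 4)
```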
According to the first sub-data, a first probability distribution is obtained through a first decoder of a variational autoencoder, and the first probability distribution is used as the conditional probability distribution of the second sub-data; the second sub-data is compressed by an entropy encoder according to the first probability distribution to obtain a first bit stream; and the first sub-data is compressed with the first bit stream as the initial bit stream (i.e., the first sub-data is compressed to the first bit stream) to obtain a second bit stream.
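The encode order described above can be summarized by the following minimal sketch; the function and parameter names are hypothetical placeholders for the first decoder, the entropy encoder, and the bits-back step of the first sub-data, not the application's actual implementation.

```python
# Hypothetical sketch of the encode order: the second sub-data is entropy-coded first,
# and the resulting bit stream then serves as the initial bits for the first sub-data.
from typing import Any, Callable, List

BitStream = List[int]

def compress(first_sub: Any, second_sub: Any,
             first_decoder: Callable[[Any], Any],                         # first sub-data -> first probability distribution
             entropy_encode: Callable[[BitStream, Any, Any], BitStream],  # (stream, symbols, dist) -> longer stream
             compress_first_sub: Callable[[BitStream, Any], BitStream]    # bits-back step for the first sub-data
             ) -> BitStream:
    p_second = first_decoder(first_sub)                                 # conditional distribution of the second sub-data
    first_bit_stream = entropy_encode([], second_sub, p_second)         # first bit stream
    second_bit_stream = compress_first_sub(first_bit_stream, first_sub) # uses the first bit stream as initial bits
    return second_bit_stream
```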
Compared with the additional initial bits required by the bits-back coding mechanism in the prior art, the embodiment of the application does not need additional initial bits, can realize compression of a single data point, and greatly reduces the compression cost during parallel compression.
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
In one possible implementation, the variational autoencoder may include a variational encoder, decoders (e.g., the first decoder and the second decoder in embodiments of the present application), and a prior distribution of the hidden variables.
In one possible implementation, the decoder may be formed by decoder layers (e.g., the first convolutional neural network and the second convolutional neural network in the embodiments of the present application), and the number of decoder layers is the same as the number of hidden variables in the variational autoencoder. A decoder layer takes the hidden variable of a deeper layer as input and outputs the conditional probability distribution of the current-layer data (the current-layer data may be a shallower-layer hidden variable or the data itself).
In the existing variational autoencoder model, the variational encoder needs to take the whole data as input to predict the approximate posterior distribution of the hidden variable, and the hidden variable input to the decoder directly predicts the conditional probability distribution of the whole data. In the embodiment of the present application, the data to be compressed is divided into at least two parts, namely the first sub-data and the second sub-data. Unlike the prior art, in which all of the data is input to the variational encoder, in the embodiment of the present application only a part of the data (the first sub-data) is input to the variational encoder to predict the approximate posterior distribution of the hidden variable, and the hidden variable is input to the second decoder to predict the conditional probability distribution of the first sub-data; the conditional probability distribution of the second sub-data depends on the first sub-data and may be determined specifically by inputting the first sub-data into the first decoder.
In one possible implementation, the decoder may implement a pixel reset operation.
In one possible implementation, the first decoder may include a first convolutional neural network and a second convolutional neural network, and the obtaining, according to the first sub-data, a first probability distribution through the first decoder of the variational autoencoder may specifically include: performing a space-to-channel pixel reset operation on second target data comprising the second sub-data to obtain third sub-data, wherein the size of the second target data is consistent with that of the first target data, and the third sub-data and the first sub-data have the same size in the spatial dimension;
the second target data including the second sub data may be data having the same size as the first target data, where elements other than the second sub data in the first target data may be set to zero (or other preset values) to obtain the second target data, and after performing the pixel resetting operation on the second target data, the second target data may be converted into third sub data having the same size as the first sub data in the spatial dimension.
In the embodiment of the application, a coding layer based on an autoregressive structure defined by channel-first pixel resetting is used, which makes full use of the correlation between picture pixels, so that the number of parameters required by the model is greatly reduced while a low coding length is obtained, the compression throughput is improved, and the space cost of model storage is reduced.
In one possible implementation, fourth sub-data may be obtained through the first convolutional neural network according to the first sub-data, where the fourth sub-data and the third sub-data have the same size in the channel dimension. That is, feature extraction and size transformation may be performed on the first sub-data through the first convolutional neural network to obtain fourth sub-data whose size in the channel dimension is the same as that of the third sub-data.
In one possible implementation, the third sub-data and the fourth sub-data may be fused to obtain fused sub-data. Alternatively, the fusion manner may be data substitution of the corresponding channel.
In one possible implementation, the fusing the third sub data and the fourth sub data may specifically include: and replacing the data of part of channels in the fourth sub data with the data of the corresponding channels in the third sub data to obtain the fused sub data.
In one possible implementation, the first probability distribution may be obtained from the fused sub-data through the second convolutional neural network.
In one possible implementation, the fused sub-data and the first sub-data may also be subjected to a splicing operation (concat) along a channel dimension to obtain spliced sub-data; furthermore, the obtaining, according to the fused sub-data, the first probability distribution through the second convolutional neural network may specifically include: and obtaining the first probability distribution through the second convolution neural network according to the spliced sub-data.
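Putting the pieces of the first decoder together, the following PyTorch sketch shows one possible forward pass under toy, assumed sizes: the first convolutional network maps the first sub-data to the fourth sub-data, half of its channels (an assumed split) are replaced by the third sub-data, the result is concatenated with the first sub-data, and the second convolutional network outputs the parameters of the first probability distribution (here assumed to be a mean and a log-scale per element). This is an illustrative sketch, not the application's actual model.

```python
# A minimal PyTorch sketch of the first-decoder pipeline under assumed toy sizes.
import torch
import torch.nn as nn

C, H, W, r = 3, 4, 4, 2                                  # toy sizes; r is the space-to-depth factor
first_sub = torch.randn(1, C, H // r, W // r)            # first sub-data (at the reduced spatial size)
third_sub = torch.randn(1, C * r * r, H // r, W // r)    # pixel-reset second target data

conv1 = nn.Conv2d(C, C * r * r, kernel_size=3, padding=1)                  # first convolutional network
conv2 = nn.Conv2d(C * r * r + C, 2 * C * r * r, kernel_size=3, padding=1)  # second network: mean/log-scale

fourth_sub = conv1(first_sub)                            # same channel count as the third sub-data
fused = fourth_sub.clone()
k = C * r * r // 2                                       # assumed: replace half of the channels
fused[:, :k] = third_sub[:, :k]                          # fusion by replacing corresponding channels

spliced = torch.cat([fused, first_sub], dim=1)           # concat along the channel dimension
params = conv2(spliced)                                  # parameters of the first probability distribution
mean, log_scale = params.chunk(2, dim=1)
print(mean.shape, log_scale.shape)                       # torch.Size([1, 12, 2, 2]) twice
```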
In a second aspect, the present application provides a data decompression method, including:
acquiring a second bit stream;
decoding first sub-data from the second bit stream to obtain a first bit stream;
according to the first sub-data, a first probability distribution is obtained through a first decoder of a variational autoencoder, and the first probability distribution is used as a conditional probability distribution of second sub-data;
according to the first probability distribution, the second sub-data is obtained from the first bit stream through decoding by an entropy coder; the first sub-data and the second sub-data are used to restore the first target data.
In one possible implementation, the decoding the first sub-data from the second bit stream to obtain the first bit stream includes:
acquiring prior distribution of hidden variables;
according to the prior distribution, the hidden variable is decoded from the second bit stream by an entropy coder to obtain a fourth bit stream;
obtaining a second probability distribution through a second decoder of the variational autoencoder according to the hidden variable; the second probability distribution is used as a conditional probability distribution of the first sub-data;
according to the second probability distribution, the first sub-data is decoded from the fourth bit stream by the entropy coder to obtain a third bit stream;
obtaining an approximate posterior distribution of the hidden variable through a variational encoder in the variational autoencoder according to the first sub-data;
and compressing the hidden variable to the third bit stream through the entropy coder according to the approximate posterior distribution to obtain a first bit stream.
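The decode order listed above can be summarized by the following sketch, where pop and push are hypothetical stack-like entropy-coder helpers (an ANS-style coder is assumed); the names are placeholders rather than a real library API.

```python
# Hypothetical sketch of the decode order of the first sub-data described above.
# pop(stream, dist) -> (symbol, shorter_stream); push(stream, symbol, dist) -> longer_stream.
from typing import Any, Callable, Tuple

def decode_first_sub(second_bit_stream: Any, prior: Any,
                     var_encoder: Callable, second_decoder: Callable,
                     pop: Callable, push: Callable) -> Tuple[Any, Any]:
    latent, fourth_bit_stream = pop(second_bit_stream, prior)       # decode the hidden variable under the prior
    p_first = second_decoder(latent)                                # second probability distribution
    first_sub, third_bit_stream = pop(fourth_bit_stream, p_first)   # decode the first sub-data
    q_posterior = var_encoder(first_sub)                            # approximate posterior of the hidden variable
    first_bit_stream = push(third_bit_stream, latent, q_posterior)  # re-encode the hidden variable (bits-back)
    return first_sub, first_bit_stream
```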
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block; or,
the first target data is a text sequence, and the first sub data and the second sub data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub data and the second sub data are obtained by data segmentation of the binary stream; or alternatively
The first target data is a video, and the first sub data and the second sub data are obtained by data segmentation of a plurality of image frames of the video.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
In one possible implementation, the first decoder includes a first convolutional neural network and a second convolutional neural network, and the obtaining the first probability distribution through the first decoder of the variational autoencoder according to the first sub-data includes:
performing pixel reset operation from a space dimension to a channel dimension on second target data comprising the second sub data to obtain third sub data, wherein the sizes of the second target data and the first target data are consistent, and the sizes of the third sub data and the first sub data in the space dimension are the same;
according to the first sub-data, fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data are the same in size of a channel dimension;
fusing the third sub data and the fourth sub data to obtain fused sub data;
and obtaining the first probability distribution through the second convolution neural network according to the fused sub-data.
In one possible implementation, the fusing the third sub data and the fourth sub data includes:
and replacing the data of part of channels in the fourth sub data with the data of the corresponding channels in the third sub data to obtain the fused sub data.
In one possible implementation, the method further comprises:
splicing the fused sub-data and the first sub-data along the channel dimension to obtain spliced sub-data;
and obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data, wherein the first probability distribution comprises the following steps: and obtaining the first probability distribution through the second convolution neural network according to the spliced sub-data.
In a third aspect, the present application provides a data compression apparatus comprising:
the acquisition module is used for acquiring first target data, wherein the first target data comprises first sub-data and second sub-data;
the compression module is used for obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, the first probability distribution being used as a conditional probability distribution of the second sub-data;
compressing the second sub-data by an entropy encoder according to the first probability distribution to obtain a first bit stream;
and compressing the first sub data to the first bit stream to obtain a second bit stream.
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block; or,
The first target data is a text sequence, and the first sub data and the second sub data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub data and the second sub data are obtained by data segmentation of the binary stream; or alternatively
The first target data is a video, and the first sub data and the second sub data are obtained by data segmentation of a plurality of image frames of the video.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
In one possible implementation, the compression module is specifically configured to:
obtaining an approximate posterior distribution of the hidden variable through a variational encoder in the variational autoencoder according to the first sub-data;
decoding the hidden variable from the first bit stream through the entropy coder according to the approximate posterior distribution to obtain a third bit stream;
obtaining a second probability distribution through a second decoder of the variational autoencoder according to the hidden variable; the second probability distribution is used as a conditional probability distribution of the first sub-data;
Compressing the first sub-data to the third bit stream by the entropy encoder according to the second probability distribution to obtain a fourth bit stream;
and compressing the hidden variable to the fourth bit stream through the entropy coder according to the prior distribution of the hidden variable to obtain a second bit stream.
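The bits-back encode order of the first sub-data listed above mirrors the decode order; a minimal sketch with the same hypothetical pop/push helpers is shown below. Note that the hidden variable is decoded from the already-produced first bit stream, so no separately generated initial bits are needed.

```python
# Hypothetical sketch of the bits-back encode order of the first sub-data described above.
# pop(stream, dist) -> (symbol, shorter_stream); push(stream, symbol, dist) -> longer_stream.
from typing import Any, Callable

def encode_first_sub(first_bit_stream: Any, first_sub: Any, prior: Any,
                     var_encoder: Callable, second_decoder: Callable,
                     pop: Callable, push: Callable) -> Any:
    q_posterior = var_encoder(first_sub)                            # approximate posterior of the hidden variable
    latent, third_bit_stream = pop(first_bit_stream, q_posterior)   # decode the hidden variable from existing bits
    p_first = second_decoder(latent)                                # second probability distribution
    fourth_bit_stream = push(third_bit_stream, first_sub, p_first)  # encode the first sub-data
    second_bit_stream = push(fourth_bit_stream, latent, prior)      # encode the hidden variable under the prior
    return second_bit_stream
```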
In one possible implementation, the first decoder includes a first convolutional neural network and a second convolutional neural network, and the compression module is specifically configured to:
performing pixel reset operation from a space dimension to a channel dimension on second target data comprising the second sub data to obtain third sub data, wherein the sizes of the second target data and the first target data are consistent, and the sizes of the third sub data and the first sub data in the space dimension are the same;
according to the first sub-data, fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data are the same in size of a channel dimension;
fusing the third sub data and the fourth sub data to obtain fused sub data;
and obtaining the first probability distribution through the second convolution neural network according to the fused sub-data.
In one possible implementation, the fusing the third sub data and the fourth sub data includes:
and replacing the data of part of channels in the fourth sub data with the data of the corresponding channels in the third sub data to obtain the fused sub data.
In one possible implementation, the apparatus further includes:
the splicing module is used for carrying out splicing operation on the fused sub-data and the first sub-data along the channel dimension so as to obtain spliced sub-data;
and obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data, wherein the first probability distribution comprises the following steps: and obtaining the first probability distribution through the second convolution neural network according to the spliced sub-data.
In a fourth aspect, the present application provides a data decompression apparatus, including:
the acquisition module is used for acquiring a second bit stream;
a decompression module, configured to decode first sub-data from the second bit stream to obtain a first bit stream;
according to the first sub-data, a first probability distribution is obtained through a first decoder of a variational autoencoder, and the first probability distribution is used as a conditional probability distribution of second sub-data;
according to the first probability distribution, the second sub-data is obtained from the first bit stream through decoding by an entropy coder; the first sub-data and the second sub-data are used to restore the first target data.
In one possible implementation, the decompression module is specifically configured to:
acquiring prior distribution of hidden variables;
according to the prior distribution, the hidden variable is decoded from the second bit stream by an entropy coder to obtain a fourth bit stream;
obtaining a second probability distribution through a second decoder of the variational autoencoder according to the hidden variable; the second probability distribution is used as a conditional probability distribution of the first sub-data;
according to the second probability distribution, the first sub-data is decoded from the fourth bit stream by the entropy coder to obtain a third bit stream;
obtaining an approximate posterior distribution of the hidden variable through a variational encoder in the variational autoencoder according to the first sub-data;
and compressing the hidden variable to the third bit stream through the entropy coder according to the approximate posterior distribution to obtain a first bit stream.
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block; or,
The first target data is a text sequence, and the first sub data and the second sub data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub data and the second sub data are obtained by data segmentation of the binary stream; or alternatively
The first target data is a video, and the first sub data and the second sub data are obtained by data segmentation of a plurality of image frames of the video.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
In one possible implementation, the first decoder includes a first convolutional neural network and a second convolutional neural network, and the decompression module is specifically configured to:
performing pixel reset operation from a space dimension to a channel dimension on second target data comprising the second sub data to obtain third sub data, wherein the sizes of the second target data and the first target data are consistent, and the sizes of the third sub data and the first sub data in the space dimension are the same;
According to the first sub-data, fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data are the same in size of a channel dimension;
fusing the third sub data and the fourth sub data to obtain fused sub data;
and obtaining the first probability distribution through the second convolution neural network according to the fused sub-data.
In one possible implementation, the fusing the third sub data and the fourth sub data includes:
and replacing the data of part of channels in the fourth sub data with the data of the corresponding channels in the third sub data to obtain the fused sub data.
In one possible implementation, the apparatus further includes:
the splicing module is used for carrying out splicing operation on the fused sub-data and the first sub-data along the channel dimension so as to obtain spliced sub-data;
and obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data, wherein the first probability distribution comprises the following steps: and obtaining the first probability distribution through the second convolution neural network according to the spliced sub-data.
In a fifth aspect, the present application provides a data compression apparatus comprising a storage medium, a processing circuit, and a bus system; wherein the storage medium is configured to store instructions, and the processing circuit is configured to execute the instructions in the storage medium to perform the data compression method according to any one of the first aspect.
In a sixth aspect, the present application provides a data decompression apparatus comprising a storage medium, a processing circuit, and a bus system; wherein the storage medium is configured to store instructions, and the processing circuit is configured to execute the instructions in the storage medium to perform the data decompression method according to any one of the second aspect.
In a seventh aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, which when run on a computer, causes the computer to perform the data compression method according to any one of the first aspects above.
In an eighth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored therein, which, when run on a computer, causes the computer to perform the data decompression method according to any one of the second aspect above.
In a ninth aspect, embodiments of the present application provide a computer program which, when run on a computer, causes the computer to perform the data compression method of any one of the first aspects described above.
In a tenth aspect, embodiments of the present application provide a computer program which, when run on a computer, causes the computer to perform the data decompression method of any one of the second aspect described above.
In an eleventh aspect, the present application provides a chip system comprising a processor for supporting an execution device (e.g. a data compression means or a data decompression means) or a training device to perform the functions involved in the above aspects, e.g. to send or process data and/or information involved in the above methods. In one possible design, the chip system further includes a memory for holding program instructions and data necessary for the execution device or the training device. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
The embodiment of the application provides a data compression method, which comprises the following steps: acquiring first target data, wherein the first target data comprises first sub-data and second sub-data; obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, the first probability distribution being used as the conditional probability distribution of the second sub-data; compressing the second sub-data by an entropy encoder according to the first probability distribution to obtain a first bit stream; and compressing the first sub-data to the first bit stream to obtain a second bit stream. Compared with the additional initial bits required by the bits-back coding mechanism in the prior art, the embodiment of the application does not need additional initial bits, can realize compression of a single data point, and reduces the compression cost during parallel compression.
Drawings
FIG. 1 is a schematic diagram of a structure of an artificial intelligence main body frame;
fig. 2 is an application scenario illustration of an embodiment of the present application;
fig. 3 is an application scenario illustration of an embodiment of the present application;
FIG. 4 is a schematic illustration of a CNN-based data processing procedure;
FIG. 5 is a schematic illustration of a CNN-based data processing procedure;
FIG. 6 is a schematic diagram of a system architecture according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a chip according to an embodiment of the present application;
fig. 8 is a flowchart of a data compression method according to an embodiment of the present application;
FIG. 9 is an exemplary illustration of a pixel reset operation provided in an embodiment of the present application;
fig. 10 is a schematic flowchart of a decoder according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a decoder provided in an embodiment of the present application;
fig. 12 is a flowchart of a data compression method according to an embodiment of the present application;
fig. 13 is a flowchart of a data compression method according to an embodiment of the present application;
fig. 14 is a flowchart of a data decompression method according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a data compression device according to an embodiment of the present disclosure;
Fig. 16 is a schematic structural diagram of a data decompression device according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of an execution device according to an embodiment of the present application.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments of the invention only and is not intended to be limiting of the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can appreciate, with the development of technology and the appearance of new scenes, the technical solutions provided in the embodiments of the present application are applicable to similar technical problems.
The terms "first", "second", and the like in the description, the claims, and the above drawings of the present application are used to distinguish between similar objects and are not necessarily used to describe a particular sequence or chronological order. It should be understood that the terms so used are interchangeable under appropriate circumstances, and are merely a way of distinguishing objects of the same nature when describing the embodiments of the application. Furthermore, the terms "comprises", "comprising", and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 shows a schematic structural diagram of the artificial intelligence main body framework. The framework is described below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing information) to the industrial ecology of the system.
(1) Infrastructure of
The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the base platform. Communicating with the outside through the sensor; the computing power is provided by a smart chip (CPU, NPU, GPU, ASIC, FPGA and other hardware acceleration chips); the basic platform comprises a distributed computing framework, a network and other relevant platform guarantees and supports, and can comprise cloud storage, computing, interconnection and interworking networks and the like. For example, the sensor and external communication obtains data that is provided to a smart chip in a distributed computing system provided by the base platform for computation.
(2) Data
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to the internet of things data of the traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capability
After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application
Intelligent products and industry applications refer to the products and applications of the artificial intelligence system in various fields; they encapsulate the overall artificial intelligence solution, turn intelligent information decisions into products, and realize practical applications. The main application fields include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and the like.
The method and the device can be applied to the field of data compression in the field of artificial intelligence, and a plurality of application scenes of products falling to the ground are described below.
1. Image compression process applied to terminal equipment
The image compression method provided by the embodiment of the application can be applied to an image compression process in a terminal device, and in particular to the photo album, video surveillance, and other functions on the terminal device. Specifically, referring to fig. 2, which is an application scenario illustration of an embodiment of the present application, the terminal device may obtain a picture to be compressed, where the picture to be compressed may be a photograph taken by a camera or a frame captured from a video. The terminal device may perform feature extraction on the acquired picture to be compressed through an artificial intelligence (AI) encoding unit in an embedded neural-network processing unit (NPU), convert the image data into output features with lower redundancy, and generate a probability estimate for each point in the output features; a central processing unit (CPU) then performs arithmetic coding on the extracted output features using these probability estimates, which reduces the coding redundancy of the output features and thus the amount of data transferred in the image compression process, and stores the encoded data in the form of a data file at a corresponding storage location. When a user needs the file stored at that location, the CPU can acquire and load the stored file from the corresponding storage location, obtain the decoded feature map based on arithmetic decoding, and reconstruct the feature map through an AI decoding unit in the NPU to obtain a reconstructed image.
2. Image compression process applied to cloud side
The image compression method provided by the embodiment of the application can also be applied to an image compression process on the cloud side, and in particular to functions such as a cloud photo album on a cloud-side server. Specifically, referring to fig. 3, which is an application scenario illustration of an embodiment of the present application, the terminal device may obtain a picture to be compressed, where the picture to be compressed may be a photograph taken by a camera or a frame captured from a video. The terminal device can perform lossless coding compression on the picture to be compressed through its CPU to obtain encoded data, for example based on, but not limited to, any lossless compression method in the prior art, and transmit the encoded data to the cloud-side server. The server performs the corresponding lossless decoding on the received encoded data to obtain the picture to be compressed, performs feature extraction on the obtained picture through an AI encoding unit in a graphics processing unit (GPU), converts the image data into output features with lower redundancy, and generates a probability estimate for each point in the output features; the CPU then performs arithmetic coding on the extracted output features using these probability estimates, which reduces the coding redundancy of the output features and thus the amount of data transferred in the image compression process, and stores the encoded data in the form of a data file at a corresponding storage location. When a user needs the file stored at that location, the CPU can acquire and load the stored file from the corresponding storage location, obtain the decoded feature map based on arithmetic decoding, and reconstruct the feature map through an AI decoding unit in the NPU to obtain a reconstructed image; the server can then perform lossless coding compression on the picture to be compressed through the CPU to obtain encoded data, for example based on, but not limited to, any lossless compression method in the prior art, and transmit the encoded data to the terminal device, which performs the corresponding lossless decoding on the received encoded data to obtain the decoded image.
Since embodiments of the present application relate to a large number of applications of neural networks, for ease of understanding, related terms and concepts of the neural networks to which embodiments of the present application may relate are first described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be an area composed of several neural units.
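A tiny numeric example of this neural-unit formula (with arbitrary toy values and a sigmoid activation) is:

```python
# Toy evaluation of output = sigmoid(sum_s W_s * x_s + b); the values are arbitrary.
import math

def neuron(xs, ws, b):
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))        # sigmoid activation

print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # ~0.525
```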
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to the positions of the different layers, the layers inside a DNN can be divided into three types: the input layer, the hidden layers, and the output layer. Typically the first layer is the input layer, the last layer is the output layer, and all intermediate layers are hidden layers. The layers are fully connected; that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
Although a DNN looks complicated, the work of each layer is not complicated at all: it is simply the following linear relational expression:

\vec{y} = \alpha(W \vec{x} + \vec{b})

where \vec{x} is the input vector, \vec{y} is the output vector, \vec{b} is the offset vector, W is the weight matrix (also called the coefficients), and \alpha(\cdot) is the activation function. Each layer simply performs this operation on the input vector \vec{x} to obtain the output vector \vec{y}. Since a DNN has many layers, the number of coefficient matrices W and offset vectors \vec{b} is also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 indicates the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
In summary, the coefficient from the kth neuron of layer L-1 to the jth neuron of layer L is defined as W^L_{jk}. It should be noted that the input layer has no W parameters. In a deep neural network, more hidden layers make the network better able to characterize complex situations in the real world. Theoretically, a model with more parameters has higher complexity and a larger "capacity", which means that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final objective is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
(3) A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers, which can be regarded as a filter. A convolutional layer is a neuron layer in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some of the neurons of the adjacent layer. A convolutional layer usually contains several feature planes, and each feature plane may be composed of a number of neural units arranged in a rectangular pattern. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as extracting features in a way that is independent of location. The convolution kernel can be initialized in the form of a matrix of random size, and reasonable weights can be obtained through learning during the training of the convolutional neural network. In addition, a direct benefit of sharing weights is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
CNN is a very common neural network, and the structure of CNN is described in detail below with reference to fig. 4. As described in the foregoing description of the basic concept, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning architecture, where the deep learning architecture refers to learning at multiple levels at different abstraction levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to an image input thereto.
As shown in fig. 4, convolutional Neural Network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a fully-connected layer (fully connected layer) 230.
Convolution layer/pooling layer 220:
convolution layer:
the convolution/pooling layer 220 as shown in fig. 4 may include layers as examples 221-226, for example: in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, 221, 222 are convolutional layers, 223 are pooling layers, 224, 225 are convolutional layers, and 226 are pooling layers. I.e. the output of the convolution layer may be used as input to a subsequent pooling layer or as input to another convolution layer to continue the convolution operation.
The internal principle of operation of one convolution layer will be described below using the convolution layer 221 as an example.
The convolution layer 221 may include a plurality of convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is typically processed along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby completing the task of extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends over the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolution output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same dimensions (rows by columns) are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where this dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features from the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a particular color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows by columns), the feature maps extracted by these weight matrices also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
The weight values in the weight matrices are required to be obtained through a large amount of training in practical application, and each weight matrix formed by the weight values obtained through training can be used for extracting information from an input image, so that the convolutional neural network 200 can perform correct prediction.
When convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (e.g., 221) tends to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 200 increases, features extracted by the later convolutional layers (e.g., 226) become more complex, such as features of high level semantics, which are more suitable for the problem to be solved.
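As a small illustration of the sliding weight-matrix description above, the following NumPy snippet convolves a toy single-channel image with one 2x2 kernel at stride 1 (the values are chosen arbitrarily):

```python
# A toy single-channel convolution: slide a 2x2 weight matrix over a 4x4 image.
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])              # a 2x2 weight matrix

out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+2, j:j+2] * kernel)   # stride 1, no padding
print(out)   # every entry is -5: each pixel minus its lower-right neighbour
```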
Pooling layer:
since it is often desirable to reduce the number of training parameters, the convolutional layers often require periodic introduction of pooling layers, one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers, as illustrated by layers 221-226 in FIG. 4, 220. The only purpose of the pooling layer during image processing is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image. The average pooling operator may calculate pixel values in the image over a particular range to produce an average as a result of the average pooling. The max pooling operator may take the pixel with the largest value in a particular range as the result of max pooling. In addition, just as the size of the weighting matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after the processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel point in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
Full connection layer 230:
after processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet sufficient to output the desired output information. Because, as previously described, the convolution/pooling layer 220 will only extract features and reduce the parameters imposed by the input image. However, in order to generate the final output information (the required class information or other relevant information), convolutional neural network 200 needs to utilize fully-connected layer 230 to generate the output of the required number of classes or groups. Thus, multiple hidden layers (231, 232 to 23n as shown in fig. 4) may be included in the fully-connected layer 230, and parameters included in the multiple hidden layers may be pre-trained according to relevant training data of specific task types, such as image recognition, image classification, image super-resolution reconstruction, and the like … …
After the hidden layers in the fully connected layer 230, the final layer of the overall convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to the categorical cross-entropy, specifically used for calculating the prediction error. Once the forward propagation of the overall convolutional neural network 200 (propagation in the direction from 210 to 240 in fig. 4) is completed, the backward propagation (propagation in the direction from 240 to 210 in fig. 4) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
It should be noted that the convolutional neural network 200 shown in fig. 4 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models, for example, only includes a part of the network structure shown in fig. 4, for example, the convolutional neural network used in the embodiment of the present application may include only the input layer 210, the convolutional layer/pooling layer 220, and the output layer 240.
It should be noted that the convolutional neural network 200 shown in FIG. 4 is only an example of a convolutional neural network; in a specific application, the convolutional neural network may also exist in the form of other network models, for example, with a plurality of parallel convolutional layers/pooling layers as shown in FIG. 5, where the features extracted by each branch are all input to the fully connected layer 230 for processing.
(4) Loss function
In training a deep neural network, since it is expected that the output of the deep neural network is as close as possible to the value actually desired, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value and adjusting the weight vector according to the difference between them (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted to make the predicted value lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value", which leads to the loss function (loss function) or objective function (objective function): important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function, the larger the difference, and the training of the deep neural network then becomes a process of reducing the loss as much as possible.
(5) Back propagation algorithm
In the training process, the neural network can adopt a back propagation (BP) algorithm to correct the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is passed forward until the output produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward propagation process dominated by the error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
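The relationship between the loss function in item (4) below and the back propagation described here can be illustrated with a minimal single-layer example (a generic sketch in Python/NumPy on assumed synthetic data; it is not the model of the embodiment): the squared-error loss measures the gap between the predicted value and the target value, and its gradient is propagated backward to update the weights.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))                # input samples (assumed synthetic data)
w_true = np.array([1.0, -2.0, 0.5])
y = x @ w_true                               # the actually desired target values

w = np.zeros(3)                              # pre-configured (initialized) parameters
lr = 0.1
for step in range(200):
    pred = x @ w                             # forward propagation
    loss = np.mean((pred - y) ** 2)          # loss function: the larger the loss, the larger the difference
    grad = 2 * x.T @ (pred - y) / len(y)     # back propagation: gradient of the loss with respect to w
    w -= lr * grad                           # update the weights to reduce the loss
print(w)                                     # approaches w_true as the error loss converges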
(6) Lossless compression: a data compression technique in which the compressed data length is smaller than the original data length, and the data recovered by decompressing the compressed data must be identical to the original data.
(7) Compression length: the storage space occupied by the compressed data.
(8) Compression ratio: the ratio of the original data length to the compressed data length. If not compressed, the value is 1. The larger the value, the better.
(9) Number of bits per dimension: the average number of bits occupied by each dimension (byte) of the data after compression. The calculation formula is: 8 / compression ratio. If no compression is performed, this value is 8. The smaller the value, the better.
(10) Throughput rate: the average amount of data processed per second.
(11) Hidden variables: data having a specific probability distribution, which can be related to the original data by establishing a conditional probability between the hidden variables and the original data.
(12) Encoding/decoding: the process of data compression is encoding and the process of decompression is decoding.
(13) Inverse coding (bits-back coding): a special coding technique that uses extra binary data stored in the system to generate specific data by means of decoding.
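The metrics defined in items (8) to (10) above can be computed, for example, as follows (an illustrative Python sketch; the byte sizes and the timing in the example are hypothetical):

def compression_metrics(original_bytes: int, compressed_bytes: int, seconds: float):
    ratio = original_bytes / compressed_bytes     # (8) compression ratio; 1 if no compression
    bits_per_dim = 8.0 / ratio                    # (9) number of bits per dimension = 8 / compression ratio
    throughput = original_bytes / seconds         # (10) throughput rate: bytes processed per second
    return ratio, bits_per_dim, throughput

# e.g. a 3072-byte image block compressed to 1600 bytes in 0.002 s (hypothetical numbers)
print(compression_metrics(3072, 1600, 0.002))     # ratio 1.92, about 4.17 bits per dimension, 1.536e6 B/s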
The system architecture provided in the embodiment of the present application is described in detail below with reference to fig. 6. Fig. 6 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in fig. 6, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data acquisition system 560.
The execution device 510 includes a computing module 511, an I/O interface 512, a preprocessing module 513, and a preprocessing module 514. The calculation module 511 may include a target model/rule 501 therein, with the preprocessing module 513 and preprocessing module 514 being optional.
As an example, the execution device 510 may be a terminal device such as a mobile phone, a tablet, a notebook computer, or an intelligent wearable device, and the terminal device may perform compression processing on an acquired picture. As another example, the terminal device may be a virtual reality (VR) device. As another example, the embodiment of the present application may also be applied to intelligent monitoring, where a camera may be configured in the intelligent monitoring device, and the intelligent monitoring device may then obtain a picture to be compressed through the camera. It should be understood that the embodiment of the present application may also be applied to other scenarios where image compression is required, which are not enumerated one by one here.
The data acquisition device 560 is used to acquire training data. After the training data is collected, the data collection device 560 stores the training data in the database 530 and the training device 520 trains the target model/rule 501 based on the training data maintained in the database 530.
The target model/rule 501 (e.g., the variation self-encoder, the entropy encoder, etc. in the embodiment of the present application) can be used to implement the task of compressing and decompressing data, that is, the data to be processed (e.g., the first target data in the embodiment of the present application) is input into the target model/rule 501, and the compressed data (e.g., the second bit stream in the embodiment of the present application) can be obtained. In practical applications, the training data maintained in the database 530 is not necessarily acquired by the data acquisition device 560, but may be received from other devices. It should be noted that the training device 520 is not necessarily completely based on the training data maintained by the database 530 to train the target model/rule 501, and it is also possible to acquire the training data from the cloud or other places to train the model, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 501 obtained by training according to the training device 520 may be applied to different systems or devices, such as the execution device 510 shown in fig. 6, where the execution device 510 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (augmented reality, AR)/Virtual Reality (VR) device, a vehicle-mounted terminal, or may also be a server or cloud. In fig. 6, an execution device 510 configures an input/output (I/O) interface 512 for data interaction with external devices, and a user may input data to the I/O interface 512 through a client device 540.
The preprocessing module 513 and the preprocessing module 514 are used for preprocessing according to the input data received by the I/O interface 512. It should be appreciated that there may be no pre-processing module 513 and pre-processing module 514 or only one pre-processing module. When the preprocessing module 513 and the preprocessing module 514 are not present, the calculation module 511 may be directly employed to process the input data.
In preprocessing input data by the execution device 510, or in performing processing related to computation or the like by the computation module 511 of the execution device 510, the execution device 510 may call data, codes or the like in the data storage system 550 for corresponding processing, or may store data, instructions or the like obtained by corresponding processing in the data storage system 550.
Finally, the I/O interface 512 presents the processing results to the client device 540 for presentation to the user.
In the case shown in FIG. 6, the user may manually give input data, and this manual operation may be performed through an interface provided by the I/O interface 512. In another case, the client device 540 may automatically send input data to the I/O interface 512; if automatic sending of input data by the client device 540 requires authorization from the user, the user may set the corresponding permission in the client device 540. The user may view the result output by the execution device 510 at the client device 540, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 540 may also be used as a data collection terminal to collect the input data input to the I/O interface 512 and the output result output from the I/O interface 512 as new sample data, and store the new sample data in the database 530. Of course, instead of being collected by the client device 540, the I/O interface 512 may directly store the input data input to the I/O interface 512 and the output result output from the I/O interface 512 as new sample data into the database 530.
It should be noted that fig. 6 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 6, the data storage system 550 is an external memory with respect to the execution device 510, and in other cases, the data storage system 550 may be disposed in the execution device 510.
The following describes a chip hardware structure provided in the embodiments of the present application.
Fig. 7 is a hardware architecture diagram of a chip according to an embodiment of the present application, where the chip includes a neural network processor 700. The chip may be provided in an execution device 510 as shown in fig. 6 to perform the calculation of the calculation module 511. The chip may also be provided in a training device 520 as shown in fig. 6 for completing training work of the training device 520 and outputting the target model/rule 501. The algorithms of the layers in the image processing model shown in fig. 6 can be implemented in the chip shown in fig. 7.
The neural network processor (neural processing unit, NPU) 700 is mounted as a co-processor to a main central processing unit (host central processing unit, host CPU), and the host CPU distributes tasks. The core part of the NPU is the arithmetic circuit 703; the controller 704 controls the arithmetic circuit 703 to extract data from a memory (the weight memory 702 or the input memory 701) and perform arithmetic.
In some implementations, the arithmetic circuitry 703 internally includes a plurality of processing units (PEs). In some implementations, the arithmetic circuit 703 is a two-dimensional systolic array. The arithmetic circuit 703 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 703 is a general purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit 703 takes the data corresponding to the matrix B from the weight memory 702 and buffers the data on each PE in the arithmetic circuit 703. The arithmetic circuit 703 takes the matrix a data from the input memory 701 and performs matrix operation with the matrix B, and the obtained partial result or the final result of the matrix is stored in an accumulator (accumulator) 708.
The vector calculation unit 707 may further process the output of the operation circuit 703, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, vector computation unit 707 may be used for network computation of non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector computation unit 707 can store the vector of processed outputs to the unified memory 706. For example, the vector calculation unit 707 may apply a nonlinear function to an output of the operation circuit 703, such as a vector of accumulated values, to generate an activation value. In some implementations, the vector calculation unit 707 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as an activation input to the arithmetic circuit 703, for example for use in subsequent layers in a neural network.
The unified memory 706 is used for storing input data and output data.
A direct memory access controller (direct memory access controller, DMAC) 705 transfers the input data in an external memory to the input memory 701 and/or the unified memory 706, stores the weight data in the external memory into the weight memory 702, and stores the data in the unified memory 706 into the external memory.
A bus interface unit (bus interface unit, BIU) 710 is used for interfacing among the main CPU, the DMAC and the instruction fetch memory 709 over a bus.
An instruction fetch memory (instruction fetch buffer) 709 coupled to the controller 704 for storing instructions for use by the controller 704.
The controller 704 is configured to invoke an instruction cached in the instruction fetch memory 709, so as to control a working process of the operation accelerator.
Typically, the unified memory 706, the input memory 701, the weight memory 702, and the instruction fetch memory 709 are on-chip memories, and the external memory is a memory external to the NPU, which may be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or other readable and writable memory.
Lossless compression of data is one of the important fundamental directions in the field of information technology. Its aim is to create a bijection from the original data space to the coding space, so that data with a higher occurrence frequency are represented by shorter codes and data with a lower occurrence frequency by longer codes, thereby obtaining a shorter representation length of the data in the average sense; according to the bijection, a one-to-one conversion between the original data space and the coding space can be realized. According to the Shannon source coding theorem, the optimal lossless compression length of data is determined by the Shannon information entropy of the data probability distribution; and the more accurately the data probability distribution is estimated, the closer to the optimal lossless compression length one can get.
Lossless compression schemes based on artificial intelligence exploit the fact that deep generative models can estimate the probability distribution of data more accurately than traditional schemes, and thereby obtain compression ratios far superior to those of traditional lossless compression schemes. In artificial-intelligence-based lossless compression schemes, widely used deep generative models include autoregressive models, variational auto-encoders (VAE), and flow models (normalizing flows). In general, autoregressive models are well compatible with arithmetic coders and Huffman coding; the variational auto-encoder, combined with an inverse coding (bits-back) mechanism, is well compatible with asymmetric numeral systems; and flow models are compatible with all three of the above entropy coders. In addition to the compression ratio, throughput is another indicator for evaluating a lossless compression solution. For lossless compression solutions based on artificial intelligence, the overall throughput is lower than that of traditional solutions because the model size is much larger. Moreover, when the two indicators of compression ratio and throughput are considered together, there is no absolute superiority among lossless compression solutions based on different generative models; current research is still at the stage of exploring the Pareto front of compression schemes based on different generative models.
Wherein the variational auto-encoder model is a hidden variable model, unlike a full observation model (e.g., an auto-regression model). Instead of modeling the data itself directly, this model additionally introduces one (or more) hidden variables, and then models the prior distribution, likelihood function, and approximate posterior distribution. The conventional entropy coding method cannot be directly used because marginal distribution of data cannot be directly obtained from the variable self-encoder. In order to enable lossless compression of data using a variable self-encoder, a variable self-encoding lossless compression scheme based on an inverse coding scheme is proposed. The bits-back ANS is the original form of the scheme and is applicable to a variant self-encoder model that contains only one hidden variable and can be generalized to a variant self-encoder model that contains multiple hidden variables.
Taking the bits-back ANS and a variational self-encoder containing one hidden variable as an example, the model of a variational self-encoder containing one hidden variable can be divided into three modules, namely: a prior module, a variational encoder module, and a decoder module. The above three modules can be used to determine the parameters of the following three distributions, respectively: the prior distribution of the hidden variable, the likelihood function of the hidden variable (the conditional probability distribution of the data), and the approximate posterior distribution of the hidden variable.
The data compression step in the technical scheme is as follows:
1. acquiring data to be compressed;
2. acquiring additional initial bit data (bit stream 1);
3. inputting the data to be compressed into the variational encoder so as to obtain the approximate posterior distribution of the hidden variable; decoding a sample of the hidden variable from the bit stream 1 using an entropy encoder according to the approximate posterior distribution, thereby obtaining a bit stream 2;
4. inputting the decompressed hidden variable samples into a decoder, thereby obtaining conditional probability distribution of the data; compressing data to be compressed into a bit stream 2 using an entropy encoder according to a conditional probability distribution of the data, thereby obtaining a bit stream 3;
5. acquiring prior distribution of hidden variables from a prior module; compressing samples of the hidden variables into a bit stream 3 using an entropy encoder according to a priori distribution of the hidden variables, thereby obtaining a bit stream 4;
6. the bit stream 4 is output as final compressed bit data.
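The six compression steps above may be summarized by the following sketch (Python-style, under assumed interfaces: entropy_encode/entropy_decode stand for an entropy coder operating on a bit-stream stack in the ANS style, and var_encoder, decoder and prior stand for the three modules of the variational self-encoder; none of these names appear in the original text):

def bits_back_compress(x, initial_bits, var_encoder, decoder, prior,
                       entropy_decode, entropy_encode):
    bitstream = initial_bits                       # step 2: additional initial bits (bit stream 1)
    q_z = var_encoder(x)                           # step 3: approximate posterior of the hidden variable
    z, bitstream = entropy_decode(bitstream, q_z)  #         decode a sample of z; bit stream 1 -> bit stream 2
    p_x_given_z = decoder(z)                       # step 4: conditional probability distribution of the data
    bitstream = entropy_encode(bitstream, x, p_x_given_z)   # compress x; bit stream 2 -> bit stream 3
    p_z = prior()                                  # step 5: prior distribution of the hidden variable
    bitstream = entropy_encode(bitstream, z, p_z)  #         compress z; bit stream 3 -> bit stream 4
    return bitstream                               # step 6: bit stream 4 is the final compressed bit data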
The decompression step of the data in the technical scheme is as follows:
1. acquiring bit data (bit stream 4) to be decompressed;
2. acquiring the prior distribution of the hidden variable from the prior module; decoding, using an entropy encoder, the hidden variable sample used in the compression stage from the bit stream 4 according to the prior distribution of the hidden variable, thereby obtaining a bit stream 3;
3. Inputting the hidden variable samples into a decoder, thereby obtaining conditional probability distribution of data; decompressing the compressed data from the bit stream 3 using an entropy encoder according to the conditional probability distribution of the data and obtaining a bit stream 2;
4. inputting the decompressed data into a variation encoder, thereby obtaining approximate posterior distribution of hidden variables; compressing the above-mentioned hidden variable samples into a bit stream 2 using an entropy encoder according to an approximate posterior distribution, thereby obtaining a bit stream 1;
5. outputting bit stream 1 as the recovered additional initial bits;
6. outputting the decompressed data.
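Correspondingly, the decompression steps invert the above in reverse order (the same assumed interfaces as in the previous sketch):

def bits_back_decompress(bitstream, var_encoder, decoder, prior,
                         entropy_decode, entropy_encode):
    p_z = prior()                                   # step 2: prior of the hidden variable
    z, bitstream = entropy_decode(bitstream, p_z)   #         recover z; bit stream 4 -> bit stream 3
    p_x_given_z = decoder(z)                        # step 3: conditional distribution of the data
    x, bitstream = entropy_decode(bitstream, p_x_given_z)   # recover x; bit stream 3 -> bit stream 2
    q_z = var_encoder(x)                            # step 4: approximate posterior of the hidden variable
    bitstream = entropy_encode(bitstream, z, q_z)   #         encode z back; bit stream 2 -> bit stream 1
    return x, bitstream                             # steps 5-6: decompressed data and recovered initial bits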
Current variational self-encoder lossless compression schemes based on the inverse coding (bits-back) mechanism all require additional initial bits from which to decode the samples of the hidden variable. The additional initial bits are randomly generated data whose size needs to be counted in the compression cost; when the number of data items to be compressed in series is small, this additional average cost is high. Moreover, since the required additional initial bits are proportional to the number of data points to be compressed, efficient parallel compression cannot be achieved.
Based on this technical background, the present application aims at improving the artificial-intelligence lossless compression scheme based on the variational self-encoder, and addresses two major pain points in this subdivided field: first, a special autoregressive structure is introduced to reduce the number of parameters required for the variational self-encoder to reach the same compression ratio, thereby improving the throughput rate; second, by introducing a special variational encoder and decoder structure and proposing a new inverse coding algorithm, the random initial bits previously necessary for the lossless compression scheme based on the variational self-encoder are removed, thereby realizing single-data-point compression/decompression and efficient parallel compression/decompression.
Referring to fig. 8, fig. 8 is an embodiment schematic diagram of a data compression method provided in an embodiment of the present application, and as shown in fig. 8, the data compression method provided in the embodiment of the present application includes:
801. first target data is acquired, wherein the first target data comprises first sub-data and second sub-data.
In one possible implementation, the first target data may be image data to be compressed or other data (such as text, video, etc.). The first target data may be an image (or a part of an image) captured by the terminal device through a camera, or the first target data may also be an image obtained from inside the terminal device (for example, an image stored in an album of the terminal device, or a picture obtained by the terminal device from the cloud). It should be understood that the first target data may be any data having a compression requirement, and the present application does not limit the source of the first target data.
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension. Wherein for image data one channel dimension (C) and two spatial dimensions (width W and height H) are included.
For example, the first target data may include 6 channels, the first sub data may be data of the first three channels in the first target data, and the second sub data may be data of the last three channels in the first target data.
For example, the size of the first target data in the spatial dimension is n×n, the first sub data may be data of the first target data in the spatial dimension (0 to N/2) ×n, and the second sub data may be data of the first target data in the spatial dimension (N/2 to N) ×n.
It should be understood that the present application is not limited to the segmentation rule for performing data segmentation on the first target data.
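For illustration, a half-and-half segmentation of the kind described above (along the channel dimension or along a spatial dimension) may be sketched as follows in Python/NumPy; the function names and the even split are only one possible segmentation rule and are not prescribed by the embodiment:

import numpy as np

def split_channel(x: np.ndarray):
    # Split a C x H x W block along the channel dimension into first and second sub-data.
    c = x.shape[0]
    return x[: c // 2], x[c // 2:]

def split_spatial(x: np.ndarray):
    # Split a C x H x W block along the height (spatial) dimension into first and second sub-data.
    h = x.shape[1]
    return x[:, : h // 2, :], x[:, h // 2:, :]

block = np.zeros((6, 8, 8))          # e.g. a 6-channel 8 x 8 image block
s1, s2 = split_channel(block)        # two 3-channel halves, as in the 6-channel example above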
802. According to the first sub-data, a first probability distribution is obtained through a first decoder of a variable self-encoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data;
in one possible implementation, the variance self-encoder may include a variance encoder, a decoder (e.g., a first decoder and a second decoder in embodiments of the present application), and a priori distribution of hidden variables.
In one possible implementation, the decoder may be formed by decoder layers (e.g., the first convolutional neural network and the second convolutional neural network in the embodiments of the present application), and the number of decoder layers is the same as the number of hidden variables in the variant-self encoder. The decoder layer functions to input hidden variables of a deeper layer and output a conditional probability distribution of current layer data (the current layer data may be hidden variables of a shallower layer or data).
In the existing variational self-encoder model, the whole data needs to be input into the variational encoder to predict the approximate posterior distribution of the hidden variable, and the hidden variable input into the decoder directly predicts the conditional probability distribution of the whole data. In the embodiment of the present application, the data to be compressed is divided into at least two parts, namely: the first sub-data and the second sub-data. Unlike the prior art in which all of the data is input into the variational encoder, in the embodiment of the present application only a part of the data (the first sub-data) is input into the variational encoder to predict the approximate posterior distribution of the hidden variable, and the hidden variable is input into the second decoder to predict the conditional probability distribution of the first sub-data; the conditional probability distribution of the second sub-data depends on the first sub-data and may in particular be determined by inputting the first sub-data into the first decoder.
Next, the structure of the decoder in the embodiment of the present application will be described:
in one possible implementation, the decoder may implement a pixel reset operation.
Next, pixel reset operation from spatial dimension to channel dimension in the embodiment of the present application is described:
in one possible implementation, a parameter (denoted as k) and two reversible operations (denoted as space-to-channel operation and channel-to-space operation) for pixel reset in the channel dimension and the space dimension may be configured. The parameter k takes a positive integer, and determines the ratio of dimensional changes of the input tensor space and the output tensor space in the two reversible operations. For the same k, the space-to-channel operation and the channel-to-space operation are reciprocal.
For images, the image data may be represented as tensors containing one channel dimension (C) and two spatial dimensions (width W and height H). Due to the data batching commonly used in deep learning techniques, the corresponding tensor representation has one more batch dimension (N), i.e., the image data tensor contains four dimensions (NCHW or NHWC). Taking NCHW as an example, a tensor of size n1*c1*h1*w1 is changed, through a space-to-channel operation with parameter k, into a tensor of size n1*(k^2*c1)*(h1/k)*(w1/k), where h1 and w1 are required to be divisible by k. A tensor of size n2*c2*h2*w2 is changed, through a channel-to-space operation with parameter k, into a tensor of size n2*(c2/k^2)*(k*h2)*(k*w2). It can be seen that the two operations described above do not change the total number of elements in the tensor, only the positions of the elements within it. Different pixel reset rules give different pixel reset methods; the pixel reset operation used in the embodiments of the present application adopts a channel-first manner. Since the space-to-channel and channel-to-space operations are mutually inverse for a fixed k, FIG. 9 only shows the effect of the space-to-channel operation for n = 1, h = w = 4, c = 3, and k = 2.
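The two reversible pixel reset operations can be realized, for example, by the following NumPy sketch (one possible element ordering; the exact channel ordering of the channel-first pixel reset used in the embodiment may differ, but the shape change and the mutual invertibility are as described above):

import numpy as np

def space_to_channel(x: np.ndarray, k: int) -> np.ndarray:
    # x has shape (N, C, H, W); H and W must be divisible by k.
    n, c, h, w = x.shape
    x = x.reshape(n, c, h // k, k, w // k, k)
    x = x.transpose(0, 3, 5, 1, 2, 4)                 # move the k x k block offsets in front of the channels
    return x.reshape(n, k * k * c, h // k, w // k)    # output shape (N, k^2*C, H/k, W/k)

def channel_to_space(x: np.ndarray, k: int) -> np.ndarray:
    # Inverse of space_to_channel for the same k; the channel count must be divisible by k^2.
    n, c, h, w = x.shape
    x = x.reshape(n, k, k, c // (k * k), h, w)
    x = x.transpose(0, 3, 4, 1, 5, 2)
    return x.reshape(n, c // (k * k), h * k, w * k)   # output shape (N, C/k^2, k*H, k*W)

x = np.arange(48, dtype=np.float32).reshape(1, 3, 4, 4)   # n = 1, c = 3, h = w = 4, as in FIG. 9
y = space_to_channel(x, 2)                                # shape (1, 12, 2, 2)
assert np.array_equal(channel_to_space(y, 2), x)          # the two operations are mutually inverse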
Taking the first decoder as an example, in one possible implementation, the first decoder may include a first convolutional neural network and a second convolutional neural network, and the obtaining, according to the first sub-data, a first probability distribution by changing the first decoder of the self-encoder may specifically include: performing pixel reset operation from a space dimension to a channel dimension on second target data comprising the second sub data to obtain third sub data, wherein the sizes of the second target data and the first target data are consistent, and the sizes of the third sub data and the first sub data in the space dimension are the same;
The second target data including the second sub data may be data having the same size as the first target data, where elements other than the second sub data in the first target data may be set to zero (or other preset values) to obtain the second target data, and after performing the pixel resetting operation on the second target data, the second target data may be converted into third sub data having the same size as the first sub data in the spatial dimension. For example, reference may be made to fig. 10, where the current layer variable may be the second target data described above, and a pixel reset operation may be performed on the current layer variable.
In one possible implementation, fourth sub-data may be obtained through the first convolutional neural network according to the first sub-data, where the fourth sub-data and the third sub-data have the same size in the channel dimension. That is, the first sub-data may be subjected to feature extraction and size transformation through the first convolutional neural network, so as to obtain fourth sub-data whose size in the channel dimension is the same as that of the third sub-data.
In one possible implementation, the third sub-data and the fourth sub-data may be fused to obtain fused sub-data. Alternatively, the fusion manner may be data substitution of the corresponding channel.
In one possible implementation, the fusing the third sub data and the fourth sub data may specifically include: and replacing the data of part of channels in the fourth sub data with the data of the corresponding channels in the third sub data to obtain the fused sub data.
Specifically, when calculating the probability distribution of the (i+1)-th channel pixel of the second sub-data, the first i channels of the third sub-data may replace the corresponding first i channels of the fourth sub-data.
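A possible sketch of this channel-replacement fusion is given below (illustrative only; i denotes the number of channels that have already been determined):

import numpy as np

def fuse_by_channel_replacement(third: np.ndarray, fourth: np.ndarray, i: int) -> np.ndarray:
    # third and fourth are (N, C, H, W) tensors with the same channel count;
    # the first i channels of the fourth sub-data are replaced by those of the third sub-data.
    fused = fourth.copy()
    fused[:, :i] = third[:, :i]
    return fused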
In one possible implementation, the first probability distribution may be obtained from the fused sub-data through the second convolutional neural network.
In one possible implementation, the fused sub-data and the first sub-data may also be subjected to a splicing operation (concat) along a channel dimension to obtain spliced sub-data; furthermore, the obtaining, according to the fused sub-data, the first probability distribution through the second convolutional neural network may specifically include: and obtaining the first probability distribution through the second convolution neural network according to the spliced sub-data.
Illustratively, referring to FIG. 10, the input of decoder layer one is the deeper hidden variable z_i, and its output is the conditional probability distribution of the current-layer variable z_{i-1}. The input of neural network one is z_i, and its output is a tensor z'_i with the same size as z_{i-1} (containing the same number of elements). The tensor z'_i is changed, through a space-to-channel pixel reset operation with parameter k, into a tensor z''_i with the same spatial dimensions as z_i. Since the tensor z''_i and the tensor z_i have the same spatial dimensions, a splicing operation can be performed along the channel dimension to obtain a spliced tensor. The tensor z_{i-1} is likewise changed, through a space-to-channel pixel reset operation with parameter k, into a tensor z'_{i-1} with the same spatial dimensions as z_i. Decoder layer one introduces an autoregressive structure in the following way: the spliced tensor is input into neural network two to obtain the probability distribution parameters of the first channel pixels of z'_{i-1}; then the first i channels of z'_{i-1} replace the corresponding first i channels of z''_i, and the result (spliced with z_i in the same way) is input into neural network two to obtain the probability distribution parameters of the (i+1)-th channel pixels of z'_{i-1}.
Referring to fig. 11, fig. 11 is a schematic diagram of a variable self-encoder according to an embodiment of the present application, where the first decoder corresponds to the second decoder according to an embodiment of the present application, and the second decoder corresponds to the first decoder according to an embodiment of the present application.
Exemplarily, Table 1 below shows an exemplary flow for a variational self-encoder containing hidden variables. The compression flow is shown on the left side of Table 1, and the decompression flow on the right side. The parameters of the approximate posterior distribution q(z_1|x) are given by the variational encoder after the data to be compressed is input. The parameters of the prior distribution of the deepest hidden variable are given directly by the prior distribution module of the deepest hidden variable in the model. The parameters of the remaining conditional probability distributions are each output by the corresponding decoder layer after the value of the conditioning data is input. The structure of each decoder layer involved may refer to the description of the decoder in the above embodiments. In Table 1, x_1, …, x_12 denote the 12 channels obtained by converting the input data x (containing 3 channels) from the spatial dimension to the channel dimension through the channel-first pixel reset (with parameter k = 2); the same applies to the channels of the hidden variables.
TABLE 1
[Table 1 is provided as an image in the original publication and is not reproduced here.]
In the embodiment of the present application, a decoder layer based on the autoregressive structure defined by channel-first pixel reset is used, and the correlation between picture pixels is fully utilized, so that the number of parameters required by the model is greatly reduced while a lower coding length is obtained, thereby improving the compression throughput rate and reducing the space cost of model storage.
803. Compressing the second sub-data by an entropy encoder according to the first probability distribution to obtain a first bit stream;
in one possible implementation, the second sub-data may be compressed by an entropy encoder according to the first probability distribution to obtain a first bit stream. The first bit stream may be used as an initial bit stream and the first sub data is compressed. Compared with the additional initial bit required by the anti-coding mechanism in the prior art, the embodiment of the invention does not need the additional initial bit, can realize the compression of single data point, and greatly reduces the compression ratio during parallel compression.
804. The first sub-data is compressed to the first bit stream.
In one possible implementation, the compressing the first sub-data with the first bit stream as an initial bit stream may specifically include: obtaining approximate posterior distribution of hidden variables through a variable encoder in the variable sub-encoder according to the first sub-data; obtaining the hidden variable from the first bit stream through the entropy coder according to the approximate posterior distribution, and obtaining a third bit stream; obtaining a second probability distribution through a second decoder of the variable self-encoder according to the hidden variable; the second probability distribution is used as a conditional probability distribution of the first sub-data; compressing the first sub-data to the third bit stream by the entropy encoder according to the second probability distribution to obtain a fourth bit stream; and compressing the hidden variable to the fourth bit stream through the entropy coder according to the prior distribution of the hidden variable to obtain a second bit stream.
Illustratively, on the encoding side, the following steps may be performed:
1. acquiring first target data;
2. dividing first target data into first sub-data and second sub-data, and inputting the first sub-data into a first decoder to obtain conditional probability distribution of the second sub-data; compressing the second sub-data using a conditional probability distribution of the second sub-data to obtain initial bit data (first bit stream);
3. inputting the first sub-data into a variation encoder, thereby obtaining an approximate posterior distribution of the hidden variable; decoding a sample of an hidden variable from the first bit stream using an entropy encoder according to the approximate posterior distribution and obtaining a third bit stream;
4. inputting the decompressed hidden variable samples into a second decoder, thereby obtaining the conditional probability distribution of the data; compressing the first sub-data into a third bit stream using an entropy encoder according to a conditional probability distribution of the data, thereby obtaining a fourth bit stream;
5. acquiring the prior distribution of the hidden variable from the prior module; compressing the sample of the hidden variable into the fourth bit stream using an entropy encoder according to the prior distribution of the hidden variable, thereby obtaining the second bit stream;
6. and outputting the second bit stream as the finally compressed bit data.
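Under the same assumed interfaces as the earlier bits-back sketch, the encoding-side steps above may be written as follows; decoder_s2_given_s1 and decoder_s1_given_z denote, by function, the decoder that maps the first sub-data to the conditional distribution of the second sub-data and the decoder that maps the hidden variable to the conditional distribution of the first sub-data, and empty_bitstream denotes an initially empty bit stream (all of these names are hypothetical). The first bit stream produced in step 2 plays the role of the initial bits, so no additional random bits are needed:

def compress_without_initial_bits(x, split, var_encoder, decoder_s2_given_s1,
                                  decoder_s1_given_z, prior,
                                  entropy_decode, entropy_encode, empty_bitstream):
    s1, s2 = split(x)                                      # step 2: first sub-data and second sub-data
    bitstream = entropy_encode(empty_bitstream, s2,
                               decoder_s2_given_s1(s1))    #         compress s2 -> first bit stream (initial bits)
    q_z = var_encoder(s1)                                  # step 3: approximate posterior of the hidden variable
    z, bitstream = entropy_decode(bitstream, q_z)          #         decode a sample of z -> third bit stream
    bitstream = entropy_encode(bitstream, s1,
                               decoder_s1_given_z(z))      # step 4: compress s1 -> fourth bit stream
    bitstream = entropy_encode(bitstream, z, prior())      # step 5: compress z -> second bit stream
    return bitstream                                       # step 6: the second bit stream is the final output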
Accordingly, in the decoding section, the decompression step may be:
1. acquiring bit data (second bit stream) to be decompressed;
2. acquiring prior distribution of hidden variables from a prior module; decompressing, using an entropy encoder, from the second bitstream, the hidden variable samples used in the compression stage according to the a priori distribution of the hidden variables, thereby obtaining a fourth bitstream;
3. inputting the hidden variable sample into the second decoder, thereby obtaining the conditional probability distribution of the first sub-data; decoding the compressed first sub-data from the fourth bit stream using an entropy encoder according to the conditional probability distribution of the first sub-data, thereby obtaining the third bit stream;
4. inputting the decompressed first sub-data into the variational encoder, thereby obtaining the approximate posterior distribution of the hidden variable; compressing the hidden variable sample into the third bit stream using an entropy encoder according to the approximate posterior distribution, thereby obtaining the first bit stream;
5. inputting the first sub-data into the first decoder, thereby obtaining the conditional probability distribution of the second sub-data; decoding the second sub-data from the first bit stream according to the conditional probability distribution of the second sub-data, consuming the bit stream;
6. recombining the first sub-data and the second sub-data by inverting the segmentation used during compression, thereby obtaining the original data, and outputting the decompressed original data (the first target data).
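The corresponding decoding-side sketch, with the same assumed interfaces and a merge function that inverts the segmentation, is:

def decompress_without_initial_bits(bitstream, merge, var_encoder, decoder_s2_given_s1,
                                    decoder_s1_given_z, prior,
                                    entropy_decode, entropy_encode):
    z, bitstream = entropy_decode(bitstream, prior())           # step 2: recover z; second -> fourth bit stream
    s1, bitstream = entropy_decode(bitstream,
                                   decoder_s1_given_z(z))       # step 3: recover s1; fourth -> third bit stream
    bitstream = entropy_encode(bitstream, z, var_encoder(s1))   # step 4: encode z back; third -> first bit stream
    s2, bitstream = entropy_decode(bitstream,
                                   decoder_s2_given_s1(s1))     # step 5: recover s2, consuming the first bit stream
    return merge(s1, s2)                                        # step 6: invert the segmentation to recover the data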
Referring to fig. 12, fig. 12 is an illustration of a data compression process when the hidden variable number is 1, where S is first target data, S1 is first sub data, and S2 is second sub data.
In order to compare the difference between the embodiment of the present application and the prior art, the flows are shown in FIG. 13. The left side of FIG. 13 shows the compression/decompression core flow of the prior-art scheme; the right side of FIG. 13 shows the compression/decompression core flow of the embodiment of the present application, which requires no additional initial bits.
Referring to table 2, table 2 is a core method flow for a variable auto-encoder lossless compression solution based on a decoder layer of an auto-regressive structure defined by channel-first pixel reset and without additional initial bits.
TABLE 2
[Table 2 is provided as an image in the original publication and is not reproduced here.]
The beneficial effects of the embodiments of the present application are described next in conjunction with experimental results.
In the embodiment of the present application, the encoder used makes better use of the correlation among pixels of the picture data, and the model size can be reduced by a factor of 100 while giving a better coding length than lossless compression schemes based on the same type of model.
Table 3 shows the average number of coded bits per dimension (bpd) on public data sets for this scheme (SHVC) and other leading schemes in the industry. It can be seen that the effect of this scheme is optimal or near-optimal among all the comparative schemes, including the traditional schemes, the VAE model schemes and the flow model schemes, and is optimal among schemes of the same type (VAE-based).
Table 4 shows that, in addition to the advantage in coding length, the inference time of this scheme is greatly reduced due to the smaller number of model parameters, so that the throughput rate of compression and decompression is improved. The statistics are collected on 10000 CIFAR10 pictures with a batch size of 100, and the hardware is a V100 graphics card.
TABLE 3
[Table 3 is provided as an image in the original publication and is not reproduced here.]
TABLE 4
[Table 4 is provided as an image in the original publication and is not reproduced here.]
In addition, the embodiment of the present application can realize single-data-point compression and efficient parallel compression while avoiding the additional initial bits required by the existing inverse coding mechanism. Table 5 shows the average per-dimension code length, taking the extra initial bits into account, for three cases: the embodiment of the present application (SHVC), the SHVC-ARIB variant, and the variant using a deterministic posterior (essentially a self-encoder model) without the inverse coding mechanism (SHVC-Det). It can be seen from Table 5 that this scheme can reduce the extra space cost by up to 30 times compared with current inverse-coding-based compression algorithms.
TABLE 5
Dataset        SHVC    SHVC-ARIB    SHVC-Det
CIFAR10        4.18    3.19         3.37
ImageNet32     5.03    4.00         4.17
ImageNet64     4.57    3.71         3.90
It should be appreciated that embodiments of the present application may also be used in lossy compression of data.
The embodiment of the application provides a data compression method, which comprises the following steps: acquiring first target data, wherein the first target data comprises first sub-data and second sub-data; according to the first sub-data, a first probability distribution is obtained through a first decoder of a variable self-encoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data; compressing the second sub-data by an entropy encoder according to the first probability distribution to obtain a first bit stream; and compressing the first sub data by taking the first bit stream as an initial bit stream to obtain a second bit stream. Compared with the additional initial bit required by the anti-coding mechanism in the prior art, the embodiment of the invention does not need the additional initial bit, can realize the compression of single data point, and greatly reduces the compression ratio during parallel compression.
Referring to fig. 14, fig. 14 is a flowchart of a data decompression method provided in an embodiment of the present application, and as shown in fig. 14, the data decompression method provided in the embodiment of the present application includes:
1401. acquiring a priori distribution of the hidden variables;
1402. decoding the hidden variable from the second bit stream through an entropy encoder according to the prior distribution, to obtain a fourth bit stream;
1403. obtaining a second probability distribution through a second decoder of the variable self-encoder according to the hidden variable; the second probability distribution is used as a conditional probability distribution of the first sub-data;
1404. decoding the first sub-data from the fourth bit stream through the entropy encoder according to the second probability distribution, to obtain a third bit stream;
1405. obtaining approximate posterior distribution of hidden variables through a variable encoder in the variable sub-encoder according to the first sub-data;
1406. compressing the hidden variable to the third bit stream by the entropy coder according to the approximate posterior distribution to obtain a first bit stream;
1407. according to the first sub-data, a first probability distribution is obtained through a first decoder of the variable self-encoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data;
1408. decoding the second sub-data from the first bit stream through the entropy encoder according to the first probability distribution; the first sub-data and the second sub-data are used to determine the first target data.
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
In one possible implementation, the first decoder includes a first convolutional neural network and a second convolutional neural network, and the deriving the first probability distribution by varying the first decoder of the self-encoder according to the first sub-data includes:
performing pixel reset operation from a space dimension to a channel dimension on second target data comprising the second sub data to obtain third sub data, wherein the sizes of the second target data and the first target data are consistent, and the sizes of the third sub data and the first sub data in the space dimension are the same;
According to the first sub-data, fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data are the same in size of a channel dimension;
fusing the third sub data and the fourth sub data to obtain fused sub data;
and obtaining the first probability distribution through the second convolution neural network according to the fused sub-data.
In one possible implementation, the fusing the third sub data and the fourth sub data includes:
and replacing the data of part of channels in the fourth sub data with the data of the corresponding channels in the third sub data to obtain the fused sub data.
In one possible implementation, the method further comprises:
splicing the fused sub-data and the first sub-data along the channel dimension to obtain spliced sub-data;
and obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data, wherein the first probability distribution comprises the following steps: and obtaining the first probability distribution through the second convolution neural network according to the spliced sub-data.
For a description of the data decompression method, reference may be made to fig. 8 and a description related to data decompression in a corresponding embodiment, which will not be repeated here.
In order to better implement the above-described solutions according to the embodiments of the present application, on the basis of the embodiments corresponding to fig. 1 to 14, the following further provides related devices for implementing the above-described solutions. Referring specifically to fig. 15, fig. 15 is a schematic structural diagram of a data compression device 1500 provided in an embodiment of the present application, where the data compression device 1500 may be a terminal device or a server, and the data compression device 1500 includes:
an obtaining module 1501, configured to obtain first target data, where the first target data includes first sub-data and second sub-data;
for a specific description of the acquisition module 1501, reference may be made to the description of step 801 in the above embodiment, which is not repeated here.
A compression module 1502, configured to obtain a first probability distribution according to the first sub-data by changing a first decoder of a self-encoder, where the first probability distribution is used as a conditional probability distribution of the second sub-data;
compressing the second sub-data by an entropy encoder according to the first probability distribution to obtain a first bit stream;
And compressing the first sub data by taking the first bit stream as an initial bit stream to obtain a second bit stream.
For a specific description of the compression module 1502, reference may be made to descriptions of steps 802 to 804 in the above embodiments, which are not repeated here.
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
In one possible implementation, the compression module is specifically configured to:
obtaining approximate posterior distribution of hidden variables through a variable encoder in the variable sub-encoder according to the first sub-data;
obtaining the hidden variable from the first bit stream through the entropy coder according to the approximate posterior distribution, and obtaining a third bit stream;
obtaining a second probability distribution through a second decoder of the variable self-encoder according to the hidden variable; the second probability distribution is used as a conditional probability distribution of the first sub-data;
Compressing the first sub-data to the third bit stream by the entropy encoder according to the second probability distribution to obtain a fourth bit stream;
and compressing the hidden variable to the fourth bit stream through the entropy coder according to the prior distribution of the hidden variable to obtain a second bit stream.
In one possible implementation, the first decoder includes a first convolutional neural network and a second convolutional neural network, and the compression module is specifically configured to:
performing pixel reset operation from a space dimension to a channel dimension on second target data comprising the second sub data to obtain third sub data, wherein the sizes of the second target data and the first target data are consistent, and the sizes of the third sub data and the first sub data in the space dimension are the same;
according to the first sub-data, fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data are the same in size of a channel dimension;
fusing the third sub data and the fourth sub data to obtain fused sub data;
and obtaining the first probability distribution through the second convolution neural network according to the fused sub-data.
In one possible implementation, the fusing the third sub data and the fourth sub data includes:
and replacing the data of part of channels in the fourth sub data with the data of the corresponding channels in the third sub data to obtain the fused sub data.
In one possible implementation, the apparatus further includes:
the splicing module is used for carrying out splicing operation on the fused sub-data and the first sub-data along the channel dimension so as to obtain spliced sub-data;
and obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data, wherein the first probability distribution comprises the following steps: and obtaining the first probability distribution through the second convolution neural network according to the spliced sub-data.
Referring to fig. 16, fig. 16 is a schematic structural diagram of a data decompression apparatus 1600 provided in an embodiment of the present application, where the data decompression apparatus 1600 may be a terminal device or a server, and the data decompression apparatus 1600 may include:
an acquisition module 1601, configured to acquire a second bit stream and a priori distribution of hidden variables;
for a specific description of the acquisition module 1601, reference may be made to the description of step 1401 in the above embodiment, and a detailed description is omitted here.
A decompression module 1602, configured to decompress the hidden variable from the second bit stream by an entropy encoder according to the prior distribution, to obtain a fourth bit stream;
obtaining a second probability distribution through a second decoder of the variable self-encoder according to the hidden variable; the second probability distribution is used as a conditional probability distribution of the first sub-data;
decoding the first sub-data from the fourth bit stream through the entropy encoder according to the second probability distribution, to obtain a third bit stream;
obtaining approximate posterior distribution of hidden variables through a variable encoder in the variable sub-encoder according to the first sub-data;
compressing the hidden variable to the third bit stream by the entropy coder according to the approximate posterior distribution to obtain a first bit stream;
according to the first sub-data, a first probability distribution is obtained through a first decoder of the variable self-encoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data;
decoding the second sub-data from the first bit stream through the entropy encoder according to the first probability distribution; the first sub-data and the second sub-data are used to determine the first target data.
For a specific description of the decompression module 1602, reference may be made to descriptions of steps 1402 to 1408 in the above embodiments, which are not repeated here.
In one possible implementation, the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block.
In one possible implementation, the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
In one possible implementation, the first decoder includes a first convolutional neural network and a second convolutional neural network, and the decompression module is specifically configured to:
performing pixel reset operation from a space dimension to a channel dimension on second target data comprising the second sub data to obtain third sub data, wherein the sizes of the second target data and the first target data are consistent, and the sizes of the third sub data and the first sub data in the space dimension are the same;
according to the first sub-data, fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data are the same in size of a channel dimension;
Fusing the third sub data and the fourth sub data to obtain fused sub data;
and obtaining the first probability distribution through the second convolution neural network according to the fused sub-data.
In one possible implementation, the fusing the third sub data and the fourth sub data includes:
and replacing the data of part of channels in the fourth sub data with the data of the corresponding channels in the third sub data to obtain the fused sub data.
In one possible implementation, the apparatus further includes:
the splicing module is used for carrying out splicing operation on the fused sub-data and the first sub-data along the channel dimension so as to obtain spliced sub-data;
and obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data, wherein the first probability distribution comprises the following steps: and obtaining the first probability distribution through the second convolution neural network according to the spliced sub-data.
Next, referring to FIG. 17, FIG. 17 is a schematic structural diagram of an execution device provided in an embodiment of the present application. The execution device 1700 may specifically be embodied as a virtual reality (VR) device, a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a monitoring data processing device, or the like, which is not limited here. The execution device 1700 may be provided with the data compression device described in the embodiment corresponding to FIG. 15 or the data decompression device described in the embodiment corresponding to FIG. 16. Specifically, the execution device 1700 may include: a receiver 1701, a transmitter 1702, a processor 1703 and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more; one processor is illustrated in FIG. 17), where the processor 1703 may include an application processor 17031 and a communication processor 17032. In some embodiments of the present application, the receiver 1701, the transmitter 1702, the processor 1703 and the memory 1704 may be connected by a bus or in other ways.
The memory 1704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1703. A portion of the memory 1704 may also include a non-volatile random access memory (NVRAM). The memory 1704 stores processor-executable operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations.
The processor 1703 controls the operation of the execution device. In a specific application, the individual components of the execution device are coupled together by a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of description, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the embodiments of the present application may be applied to the processor 1703 or implemented by the processor 1703. The processor 1703 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above methods may be performed by integrated logic circuits in hardware or by instructions in the form of software in the processor 1703. The processor 1703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 1703 may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704 and performs the steps of the above methods in combination with its hardware.
The receiver 1701 may be used to receive input numeric or character information and to generate signal inputs related to the relevant settings and function control of the execution device. The transmitter 1702 may be used to output numeric or character information via a first interface; the transmitter 1702 may also be used to send instructions to a disk group via the first interface to modify data in the disk group; and the transmitter 1702 may also include a display device such as a display screen.
An embodiment of the present application also provides a computer program product including instructions which, when run on a computer, cause the computer to perform the steps of the method described in the embodiment shown in fig. 8, or cause the computer to perform the steps of the method described in the embodiment shown in fig. 14.
An embodiment of the present application also provides a computer-readable storage medium storing a program for signal processing, which, when run on a computer, causes the computer to perform the steps of the method described in the embodiment shown in fig. 8, or causes the computer to perform the steps of the method described in the embodiment shown in fig. 14.
It should be further noted that the above-described apparatus embodiments are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationship between modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general-purpose hardware, or of course by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, all functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can vary, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software program implementation is a preferred embodiment in many cases. Based on such an understanding, the technical solution of the present application, or the part contributing to the prior art, may essentially be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, including several instructions for causing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a training device or a data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Claims (29)

1. A method of data compression, comprising:
acquiring first target data, wherein the first target data comprises first sub-data and second sub-data;
obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, wherein the first probability distribution is used as a conditional probability distribution of the second sub-data;
compressing the second sub-data by an entropy encoder according to the first probability distribution to obtain a first bit stream;
and compressing the first sub-data into the first bit stream to obtain a second bit stream.
2. The method according to claim 1, wherein the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block; or,
the first target data is a text sequence, and the first sub data and the second sub data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub data and the second sub data are obtained by data segmentation of the binary stream; or,
The first target data is a video, and the first sub data and the second sub data are obtained by data segmentation of a plurality of image frames of the video.
3. The method according to claim 1 or 2, wherein the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
4. The method according to any one of claims 1 to 3, wherein the compressing the first sub-data into the first bit stream comprises:
obtaining an approximate posterior distribution of a hidden variable through an encoder of the variational autoencoder according to the first sub-data;
decoding the hidden variable from the first bit stream through the entropy encoder according to the approximate posterior distribution to obtain a third bit stream;
obtaining a second probability distribution through a second decoder of the variational autoencoder according to the hidden variable, wherein the second probability distribution is used as a conditional probability distribution of the first sub-data;
compressing the first sub-data into the third bit stream by the entropy encoder according to the second probability distribution to obtain a fourth bit stream;
and compressing the hidden variable into the fourth bit stream through the entropy encoder according to a prior distribution of the hidden variable to obtain the second bit stream.
5. The method of any one of claims 1 to 4, wherein the first decoder comprises a first convolutional neural network and a second convolutional neural network, and wherein the obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data comprises:
performing a pixel rearrangement operation from the spatial dimension to the channel dimension on second target data comprising the second sub-data to obtain third sub-data, wherein the size of the second target data is consistent with that of the first target data, and the third sub-data and the first sub-data have the same size in the spatial dimension;
obtaining fourth sub-data through the first convolutional neural network according to the first sub-data, wherein the fourth sub-data and the third sub-data have the same size in the channel dimension;
fusing the third sub-data and the fourth sub-data to obtain fused sub-data;
and obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data.
6. The method of claim 5, wherein the fusing the third sub-data and the fourth sub-data comprises:
replacing data of some of the channels in the fourth sub-data with data of the corresponding channels in the third sub-data to obtain the fused sub-data.
7. The method according to claim 5 or 6, characterized in that the method further comprises:
splicing the fused sub-data and the first sub-data along the channel dimension to obtain spliced sub-data;
and the obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data comprises: obtaining the first probability distribution through the second convolutional neural network according to the spliced sub-data.
8. A method of data decompression, comprising:
acquiring a second bit stream;
decoding first sub-data from the second bit stream to obtain a first bit stream;
obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, wherein the first probability distribution is used as a conditional probability distribution of second sub-data;
and decoding second sub-data from the first bit stream through an entropy coder according to the first probability distribution, wherein the first sub-data and the second sub-data are used to restore first target data.
9. The method of claim 8, wherein decoding the first sub-data from the second bit stream to obtain the first bit stream comprises:
acquiring a prior distribution of a hidden variable;
decoding the hidden variable from the second bit stream through the entropy coder according to the prior distribution to obtain a fourth bit stream;
obtaining a second probability distribution through a second decoder of the variational autoencoder according to the hidden variable, wherein the second probability distribution is used as a conditional probability distribution of the first sub-data;
decoding the first sub-data from the fourth bit stream through the entropy coder according to the second probability distribution to obtain a third bit stream;
obtaining an approximate posterior distribution of the hidden variable through an encoder of the variational autoencoder according to the first sub-data;
and compressing the hidden variable into the third bit stream through the entropy coder according to the approximate posterior distribution to obtain the first bit stream.
10. The method according to claim 8 or 9, wherein the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block; or,
the first target data is a text sequence, and the first sub data and the second sub data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub-data and the second sub-data are obtained by data segmentation of the binary stream; or,
The first target data is a video, and the first sub data and the second sub data are obtained by data segmentation of a plurality of image frames of the video.
11. The method according to any one of claims 8 to 10, wherein the first sub-data and the second sub-data are obtained by performing data segmentation on the image block in a spatial dimension or a channel dimension.
12. The method according to any one of claims 8 to 11, wherein the first decoder comprises a first convolutional neural network and a second convolutional neural network, and wherein the obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data comprises:
performing a pixel rearrangement operation from the spatial dimension to the channel dimension on second target data comprising the second sub-data to obtain third sub-data, wherein the size of the second target data is consistent with that of the first target data, and the third sub-data and the first sub-data have the same size in the spatial dimension;
obtaining fourth sub-data through the first convolutional neural network according to the first sub-data, wherein the fourth sub-data and the third sub-data have the same size in the channel dimension;
fusing the third sub-data and the fourth sub-data to obtain fused sub-data;
and obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data.
13. The method of claim 12, wherein the fusing the third sub-data and the fourth sub-data comprises:
replacing data of some of the channels in the fourth sub-data with data of the corresponding channels in the third sub-data to obtain the fused sub-data.
14. The method according to claim 12 or 13, characterized in that the method further comprises:
Splicing the fused sub-data and the first sub-data along the channel dimension to obtain spliced sub-data;
and the obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data comprises: obtaining the first probability distribution through the second convolutional neural network according to the spliced sub-data.
15. A data compression apparatus, comprising:
the acquisition module is used for acquiring first target data, wherein the first target data comprises first sub-data and second sub-data;
the compression module is used for obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, wherein the first probability distribution is used as a conditional probability distribution of the second sub-data;
compressing the second sub-data by an entropy encoder according to the first probability distribution to obtain a first bit stream;
and compressing the first sub-data into the first bit stream to obtain a second bit stream.
16. The apparatus of claim 15, wherein the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block; or,
The first target data is a text sequence, and the first sub data and the second sub data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub data and the second sub data are obtained by data segmentation of the binary stream; or,
the first target data is a video, and the first sub data and the second sub data are obtained by data segmentation of a plurality of image frames of the video.
17. The apparatus according to claim 15 or 16, wherein the compression module is specifically configured to:
obtain an approximate posterior distribution of a hidden variable through an encoder of the variational autoencoder according to the first sub-data;
decode the hidden variable from the first bit stream through the entropy encoder according to the approximate posterior distribution to obtain a third bit stream;
obtain a second probability distribution through a second decoder of the variational autoencoder according to the hidden variable, wherein the second probability distribution is used as a conditional probability distribution of the first sub-data;
compress the first sub-data into the third bit stream by the entropy encoder according to the second probability distribution to obtain a fourth bit stream;
and compress the hidden variable into the fourth bit stream through the entropy encoder according to a prior distribution of the hidden variable to obtain the second bit stream.
18. The apparatus according to any one of claims 15 to 17, wherein the first decoder comprises a first convolutional neural network and a second convolutional neural network, the compression module being specifically configured to:
perform a pixel rearrangement operation from the spatial dimension to the channel dimension on second target data comprising the second sub-data to obtain third sub-data, wherein the size of the second target data is consistent with that of the first target data, and the third sub-data and the first sub-data have the same size in the spatial dimension;
obtain fourth sub-data through the first convolutional neural network according to the first sub-data, wherein the fourth sub-data and the third sub-data have the same size in the channel dimension;
fuse the third sub-data and the fourth sub-data to obtain fused sub-data;
and obtain the first probability distribution through the second convolutional neural network according to the fused sub-data.
19. The apparatus of claim 18, wherein the fusing the third sub-data and the fourth sub-data comprises:
replacing data of some of the channels in the fourth sub-data with data of the corresponding channels in the third sub-data to obtain the fused sub-data.
20. The apparatus according to claim 18 or 19, characterized in that the apparatus further comprises:
the splicing module is used for carrying out splicing operation on the fused sub-data and the first sub-data along the channel dimension so as to obtain spliced sub-data;
and the obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data comprises: obtaining the first probability distribution through the second convolutional neural network according to the spliced sub-data.
21. A data decompression apparatus, comprising:
the acquisition module is used for acquiring a second bit stream;
a decompression module, configured to decode first sub-data from the second bit stream to obtain a first bit stream;
obtain a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, wherein the first probability distribution is used as a conditional probability distribution of second sub-data;
and decode second sub-data from the first bit stream through an entropy coder according to the first probability distribution, wherein the first sub-data and the second sub-data are used to restore first target data.
22. The apparatus of claim 21, wherein the decompression module is specifically configured to:
acquire a prior distribution of a hidden variable;
decode the hidden variable from the second bit stream through the entropy coder according to the prior distribution to obtain a fourth bit stream;
obtain a second probability distribution through a second decoder of the variational autoencoder according to the hidden variable, wherein the second probability distribution is used as a conditional probability distribution of the first sub-data;
decode the first sub-data from the fourth bit stream through the entropy coder according to the second probability distribution to obtain a third bit stream;
obtain an approximate posterior distribution of the hidden variable through an encoder of the variational autoencoder according to the first sub-data;
and compress the hidden variable into the third bit stream through the entropy coder according to the approximate posterior distribution to obtain the first bit stream.
23. The apparatus according to claim 21 or 22, wherein the first target data is an image block, and the first sub data and the second sub data are obtained by performing data segmentation on the image block; or,
The first target data is a text sequence, and the first sub data and the second sub data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub-data and the second sub-data are obtained by data segmentation of the binary stream; or,
The first target data is a video, and the first sub data and the second sub data are obtained by data segmentation of a plurality of image frames of the video.
24. The apparatus according to any one of claims 21 to 23, wherein the first decoder comprises a first convolutional neural network and a second convolutional neural network, and the decompression module is specifically configured to:
perform a pixel rearrangement operation from the spatial dimension to the channel dimension on second target data comprising the second sub-data to obtain third sub-data, wherein the size of the second target data is consistent with that of the first target data, and the third sub-data and the first sub-data have the same size in the spatial dimension;
obtain fourth sub-data through the first convolutional neural network according to the first sub-data, wherein the fourth sub-data and the third sub-data have the same size in the channel dimension;
fuse the third sub-data and the fourth sub-data to obtain fused sub-data;
and obtain the first probability distribution through the second convolutional neural network according to the fused sub-data.
25. The apparatus of claim 24, wherein the fusing the third sub-data and the fourth sub-data comprises:
replacing data of some of the channels in the fourth sub-data with data of the corresponding channels in the third sub-data to obtain the fused sub-data.
26. The apparatus according to claim 24 or 25, characterized in that the apparatus further comprises:
the splicing module is used for carrying out splicing operation on the fused sub-data and the first sub-data along the channel dimension so as to obtain spliced sub-data;
and the obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data comprises: obtaining the first probability distribution through the second convolutional neural network according to the spliced sub-data.
27. A data compression device, comprising a storage medium, a processing circuit, and a bus system; wherein the storage medium is configured to store instructions, and the processing circuit is configured to execute the instructions in the storage medium to perform the steps of the method according to any one of claims 1 to 14.
28. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method of any of claims 1 to 14.
29. A computer program product, characterized in that it comprises code for implementing the steps of the method according to any one of claims 1 to 14, when said code is executed.
CN202310077949.0A 2022-03-14 2023-01-13 Data compression method and related equipment Pending CN116095183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/081315 WO2023174256A1 (en) 2022-03-14 2023-03-14 Data compression method and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210249906 2022-03-14
CN2022102499061 2022-03-14

Publications (1)

Publication Number Publication Date
CN116095183A true CN116095183A (en) 2023-05-09

Family

ID=86198942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310077949.0A Pending CN116095183A (en) 2022-03-14 2023-01-13 Data compression method and related equipment

Country Status (2)

Country Link
CN (1) CN116095183A (en)
WO (1) WO2023174256A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150243B (en) * 2023-10-27 2024-01-30 湘江实验室 Fault isolation and estimation method based on fault influence decoupling network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671889B2 (en) * 2018-09-27 2020-06-02 Deepmind Technologies Limited Committed information rate variational autoencoders
CN110290387B (en) * 2019-05-17 2021-05-04 北京大学 Image compression method based on generative model
CN113569243A (en) * 2021-08-03 2021-10-29 上海海事大学 Deep semi-supervised learning network intrusion detection method based on self-supervised variation LSTM
CN113810058A (en) * 2021-09-17 2021-12-17 哲库科技(上海)有限公司 Data compression method, data decompression method, device and electronic equipment

Also Published As

Publication number Publication date
WO2023174256A1 (en) 2023-09-21

Similar Documents

Publication Publication Date Title
WO2021018163A1 (en) Neural network search method and apparatus
CN113259665B (en) Image processing method and related equipment
CN110222718B (en) Image processing method and device
WO2022179588A1 (en) Data coding method and related device
US20240078414A1 (en) Parallelized context modelling using information shared between patches
WO2022028197A1 (en) Image processing method and device thereof
WO2022088063A1 (en) Method and apparatus for quantizing neural network model, and method and apparatus for processing data
WO2023207836A1 (en) Image encoding method and apparatus, and image decompression method and apparatus
TWI826160B (en) Image encoding and decoding method and apparatus
CN113066018A (en) Image enhancement method and related device
WO2023068953A1 (en) Attention-based method for deep point cloud compression
CN117499711A (en) Training method, device, equipment and storage medium of video generation model
CN116095183A (en) Data compression method and related equipment
CN114501031B (en) Compression coding and decompression method and device
CN115913245A (en) Data encoding method, data decoding method, and data processing apparatus
CN114298289A (en) Data processing method, data processing equipment and storage medium
WO2023051408A1 (en) Feature map processing method and related device
TW202348029A (en) Operation of a neural network with clipped input data
CN116309226A (en) Image processing method and related equipment thereof
CN115409697A (en) Image processing method and related device
CN115409150A (en) Data compression method, data decompression method and related equipment
WO2023177318A1 (en) Neural network with approximated activation function
CN116543060A (en) Point cloud coding method and device based on tree structure division
WO2023075630A1 (en) Adaptive deep-learning based probability prediction method for point cloud compression
CN115424085A (en) Model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination