CN110751265A - Lightweight neural network construction method and system and electronic equipment - Google Patents
- Publication number
- CN110751265A (application number CN201910904649.9A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- tensor
- neural network
- decomposition
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The application relates to a lightweight neural network construction method and system, and an electronic device. The method comprises the following steps: step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm; step b: quantizing the tensor-decomposed matrix core parameters with a weight quantization algorithm to obtain an optimized depthwise separable convolution; step c: constructing a lightweight neural network using the optimized depthwise separable convolution. Compressing the 1 × 1 convolution of the depthwise separable convolution with a tensor train decomposition algorithm greatly reduces its parameter count while maintaining model performance. Quantizing the tensor-decomposed core matrix parameters from 32 bits to a low bit width with a weight quantization algorithm reduces the model's amount of computation and speeds up its forward inference, so the lightweight neural network constructed by this method can be better deployed on embedded devices with limited compute and storage.
Description
Technical Field
The application belongs to the technical field of deep neural networks, and particularly relates to a lightweight neural network construction method and system, and an electronic device.
Background
With the development of deep learning, increasingly good results have been achieved in many fields such as image recognition, natural language processing, and speech recognition. To reach extremely high accuracy, researchers generally adopt deeper and more complex network structures, but the parameters and amount of computation of such neural networks grow greatly, placing ever higher requirements on hardware (processor, memory, compute card, and bandwidth); it is difficult to directly deploy these large deep neural networks on embedded devices with limited compute and storage and still reach a usable speed. As artificial intelligence is applied across industries, the demand for deploying these large networks on embedded devices keeps increasing, and how to compress and accelerate neural networks is an important issue that must be considered for the industrialization of artificial intelligence.
To deploy a deep neural network on an embedded device, the limited storage space and computing capacity of the device must be considered first, so a very compact and efficient lightweight neural network structure needs to be designed. MobileNet [Howard AG, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications [J]. arXiv preprint arXiv:1704.04861, 2017] is currently the most representative lightweight neural network; it replaces the traditional convolution operation with depthwise separable convolution (Depth-wise separable convolution), obviously reducing the operation amount of convolution while preserving model performance. The depthwise separable convolution splits the conventional convolution into two steps: the first step is Depthwise Convolution, in which each convolution kernel is convolved only with its corresponding feature map; the second step is Pointwise Convolution, in which the convolution kernel size is 1 × 1 (i.e., 1 × 1 convolution), realizing linear combination of the different channels of the feature map. The 1 × 1 convolution in the depthwise separable convolution can be regarded as mapping a set of feature maps through a fully-connected matrix; the most significant share of parameters comes from this fully-connected mapping matrix, which contains a large number of redundant parameters (the 1 × 1 convolutions in MobileNet account for about 75% of the parameters and 95% of the computation).
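The scale of this redundancy can be made concrete with a short sketch (the layer shape below is an illustrative assumption, not a figure from this application) that counts the parameters of a standard convolution versus a depthwise separable one and shows that the pointwise 1 × 1 stage dominates the latter:

```python
# Illustrative parameter counts: standard vs. depthwise separable convolution.
# The layer shape (3x3 kernels, 512 -> 512 channels) is an assumed example.

def standard_conv_params(k, m, n):
    """A k x k standard convolution: one k x k x m kernel per output channel."""
    return k * k * m * n

def separable_conv_params(k, m, n):
    """Depthwise (one k x k kernel per input channel) + pointwise (1 x 1 x m x n)."""
    depthwise = k * k * m
    pointwise = m * n
    return depthwise, pointwise

k, m, n = 3, 512, 512
dw, pw = separable_conv_params(k, m, n)
print(f"standard conv:  {standard_conv_params(k, m, n)} params")
print(f"separable conv: {dw + pw} params (pointwise share: {pw / (dw + pw):.1%})")
```

For this example layer the pointwise stage holds well over 90% of the separable convolution's parameters, consistent with the observation above that MobileNet's 1 × 1 convolutions dominate its parameter and computation budget.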
Disclosure of Invention
The present application provides a method, a system and an electronic device for constructing a lightweight neural network, which aim to solve at least one of the above technical problems in the prior art to a certain extent.
In order to solve the above problems, the present application provides the following technical solutions:
a lightweight neural network construction method comprises the following steps:
step a: decomposing a1 x 1 convolution parameter matrix in the depth separable convolution by using a tensor train decomposition algorithm;
step b: carrying out quantization operation on the matrix kernel parameters subjected to tensor decomposition by using a weight quantization algorithm to obtain an optimized depth separable convolution;
step c: a lightweight neural network is constructed using the optimized deep separable convolution.
The technical scheme adopted by the embodiment of the application further comprises: in step a, the convolution kernel parameter matrix is set to 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, so the total parameter count of the 1 × 1 convolution is MN. Decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm specifically includes:
step a1: extracting the current layer's 1 × 1 convolution kernel parameter matrix and reshaping it into a tensor A of dimensions (m1n1, m2n2, ..., mdnd), where M = m1m2...md and N = n1n2...nd;
step a2: performing tensor train decomposition on the tensor A to obtain the core matrices Gk[mk, nk];
step a3: factoring the channel number M of the input feature map in the same way, reshaping the input feature map into a tensor X(x, y, m1, ..., md);
step a4: obtaining the output feature map Y(x, y, n1, ..., nd) after the tensor operation, where N = n1n2...nd; the operation of the decomposed 1 × 1 convolution is expressed as:
Y(x, y, n1, ..., nd) = Σ(m1, ..., md) G1[m1, n1] G2[m2, n2] ... Gd[md, nd] X(x, y, m1, ..., md)
the technical scheme adopted by the embodiment of the application further comprises the following steps: in the step b, the quantizing the matrix kernel parameters after tensor decomposition by using a weight quantization algorithm specifically includes:
step b 1: extracting the weight of the current layer, and calculating the values of a scaling coefficient S and a zero point Z;
step b 2: calculating a quantization value q corresponding to the actual value r through the scaling coefficient S and the zero point Z, wherein the q is r/S + Z;
step b 3: performing corresponding calculation on input data and the quantized weight parameters, and storing the obtained result in a uint32 form;
step b 4: adding the offset in the form of the uint32 and the result in step b3 and quantifying the result to the form of uint 8;
step b 5: the result in the form of the uint8 is input to the activation function, resulting in output data for the layer, the result being in the form of the uint 8.
The technical scheme adopted by the embodiment of the application further comprises: in step c, the separable convolution kernels of the lightweight neural network are all of size 3 × 3, and the activation function is ReLU6.
Another technical scheme adopted by the embodiment of the application is as follows: a lightweight neural network construction system, comprising:
a network decomposition module: for decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
a parameter quantization module: for quantizing the tensor-decomposed matrix core parameters with a weight quantization algorithm to obtain an optimized depthwise separable convolution;
a model construction module: for constructing a lightweight neural network using the optimized depthwise separable convolution.
The technical scheme adopted by the embodiment of the application further comprises: assuming the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter count of the 1 × 1 convolution is MN. The network decomposition module decomposes the 1 × 1 convolution parameter matrix in the depthwise separable convolution using the tensor train decomposition algorithm as follows: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m1n1, m2n2, ..., mdnd), where M = m1m2...md and N = n1n2...nd; perform tensor train decomposition on the tensor A to obtain the core matrices Gk[mk, nk]; factor the channel number M of the input feature map in the same way, reshaping the input feature map into a tensor X(x, y, m1, ..., md); obtain the output feature map Y(x, y, n1, ..., nd) after the tensor operation. The operation of the decomposed 1 × 1 convolution is expressed as: Y(x, y, n1, ..., nd) = Σ(m1, ..., md) G1[m1, n1] G2[m2, n2] ... Gd[md, nd] X(x, y, m1, ..., md).
the technical scheme adopted by the embodiment of the application further comprises the following steps: the parameter quantization module performs quantization operation on the matrix kernel parameters after tensor decomposition by using a weight quantization algorithm, and specifically comprises the following steps: 1: extracting the weight of the current layer, and calculating the values of a scaling coefficient S and a zero point Z; 2: calculating a quantization value q corresponding to the actual value r through the scaling coefficient S and the zero point Z, wherein the q is r/S + Z; 3: performing corresponding calculation on input data and the quantized weight parameters, and storing the obtained result in a uint32 form; 4: adding the offset in the form of the uint32 and the result in 3 and quantizing the result to the form of uint 8; 5: the result in the form of the uint8 is input to the activation function, resulting in output data for the layer, the result being in the form of the uint 8.
The technical scheme adopted by the embodiment of the application further comprises: the separable convolution kernels of the lightweight neural network are all of size 3 × 3, and the activation function is ReLU6.
The embodiment of the application adopts another technical scheme: an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the following operations of the lightweight neural network construction method described above:
step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
step b: quantizing the tensor-decomposed matrix core parameters with a weight quantization algorithm to obtain an optimized depthwise separable convolution;
step c: constructing a lightweight neural network using the optimized depthwise separable convolution.
Compared with the prior art, the embodiment of the application has the following beneficial effects: the lightweight neural network construction method, system, and electronic device of the embodiment of the application compress the 1 × 1 convolution of the depthwise separable convolution using a tensor train decomposition algorithm, greatly reducing its parameter count while maintaining model performance. Quantizing the tensor-decomposed core matrix parameters from 32 bits to a low bit width with a weight quantization algorithm reduces the model's amount of computation and speeds up its forward inference. The lightweight neural network constructed by this method needs less storage space and computing power, and can be better deployed on embedded devices with limited compute and storage.
Drawings
FIG. 1 is a flow chart of a lightweight neural network construction method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of tensor train decomposition;
FIG. 3 is a schematic diagram of the depthwise separable convolution after introducing tensor train decomposition;
FIG. 4 is a schematic diagram of the operation process of the quantized 1 × 1 convolution;
FIG. 5 is a schematic structural diagram of a lightweight neural network construction system according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a hardware device for the lightweight neural network construction method provided by an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In order to overcome the defects in the prior art, the application provides a depthwise separable convolution based on tensor train decomposition and, combined with a weight quantization algorithm, builds a lightweight neural network for embedded devices. First, to address the redundancy of the 1 × 1 convolution parameters in the depthwise separable convolution, a tensor train decomposition algorithm is used to decompose the fully-connected mapping matrix of the 1 × 1 convolution, further reducing the parameter count of the lightweight neural network; second, the tensor-decomposed matrix core parameters are quantized with a weight quantization method, accelerating the model's forward inference and reducing the model size; finally, a lightweight neural network for the embedded device is constructed using the optimized depthwise separable convolution.
Specifically, please refer to fig. 1, which is a flowchart of a lightweight neural network construction method according to an embodiment of the present application. The lightweight neural network construction method comprises the following steps:
Step 100: compressing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm, reducing the parameter count of the lightweight network;
In step 100, the tensor train decomposition algorithm (Tensor-Train, TT) is a tensor decomposition algorithm in which each element of a high-dimensional tensor can be expressed as a product of matrices (a matrix product state), that is:
A(i1, i2, ..., id) = G1(i1) G2(i2) ... Gd(id) (1)
In formula (1), Gk(ik) is a matrix of size rk-1 × rk, where the rk are the tensor train decomposition ranks (TT-ranks); to ensure that the matrix product yields a scalar, r0 = rd = 1.
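As a numerical illustration of formula (1) (the core shapes, TT-ranks, and random values below are assumptions for the sketch, not taken from this application), each tensor element can be evaluated as a chain of small matrix products:

```python
# Illustrative evaluation of A(i1,...,id) = G1(i1) G2(i2) ... Gd(id).
# G_k(i_k) is an r_{k-1} x r_k matrix; r_0 = r_d = 1 makes the product a scalar.
import numpy as np

rng = np.random.default_rng(0)
mode_sizes = [4, 5, 6]      # tensor dimensions (assumed for the sketch)
ranks = [1, 3, 2, 1]        # TT-ranks r_0..r_d, with r_0 = r_d = 1

# Core k stacks mode_sizes[k] matrices of shape (r_{k-1}, r_k).
cores = [rng.standard_normal((mode_sizes[k], ranks[k], ranks[k + 1]))
         for k in range(len(mode_sizes))]

def tt_element(cores, index):
    """Multiply out G1(i1) G2(i2) ... Gd(id); the 1 x 1 result is the element."""
    out = np.eye(1)
    for core, i in zip(cores, index):
        out = out @ core[i]
    return out.item()

print(tt_element(cores, (1, 2, 3)))
```

Storing the cores takes sum_k(m_k · r_{k-1} · r_k) numbers instead of prod_k(m_k), which is where the parameter saving of the decomposition comes from.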
Figure 2 is a schematic diagram of tensor train decomposition. Applying the tensor train decomposition algorithm to the 1 × 1 convolution effectively reduces the parameter count of the depthwise separable convolution while retaining good operational performance. In the embodiment of the present application, the principle of compressing the 1 × 1 convolution parameter matrix in the depthwise separable convolution with the tensor train decomposition algorithm is as follows: the essence of the 1 × 1 convolution is to linearly combine the input feature maps to realize information exchange between them. Let the convolution kernel parameter matrix be 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels; the total parameter count of the 1 × 1 convolution is then MN. This parameter matrix is a fully-connected matrix and contains a large amount of redundant parameters; compressing it by tensor train decomposition further reduces the parameter count of the model. The depthwise separable convolution after introducing tensor train decomposition is shown in figure 3. The tensor train decomposition algorithm is implemented in the following concrete steps:
Step 101: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m1n1, m2n2, ..., mdnd), where M = m1m2...md and N = n1n2...nd;
Step 102: perform tensor train decomposition on the converted tensor A to obtain the core matrices Gk[mk, nk];
Step 103: factor the channel number M of the input feature map in the same way, reshaping the input feature map into a tensor X(x, y, m1, ..., md);
Step 104: obtain the output feature map Y(x, y, n1, ..., nd) after the tensor operation, where N = n1n2...nd. The operation of the decomposed 1 × 1 convolution is expressed as:
Y(x, y, n1, ..., nd) = Σ(m1, ..., md) G1[m1, n1] G2[m2, n2] ... Gd[md, nd] X(x, y, m1, ..., md)
step 200: carrying out quantization operation on the matrix kernel parameters subjected to tensor decomposition by using a weight quantization algorithm to obtain an optimized depth separable convolution;
In step 200, although the tensor train decomposition algorithm reduces the parameter count, the parameter matrix is decomposed into several matrix cores, which increases the number of matrix computation layers, so the required amount of computation is not obviously reduced. Therefore, in the present application, the weight quantization algorithm is applied to the decomposed 1 × 1 convolution parameter matrix cores, quantizing the 32-bit parameters to a low bit width (in the embodiment of the present application, quantizing from 32 bits to 8 bits is preferred; the width can be set according to actual operation). This obviously reduces the amount of computation, speeds up the forward operation of the model, reduces the model size, and compresses the storage space the model requires. The weight quantization algorithm [Krishnamoorthi R. Quantizing deep convolutional networks for efficient inference: A whitepaper [J]. arXiv preprint arXiv:1806.08342, 2018] is a widely used neural network forward acceleration technique: quantizing the weight parameters from 32 bits to a low bit width obviously reduces the operation amount of the neural network with almost no loss of precision. A schematic diagram of the operation process of the quantized 1 × 1 convolution is shown in fig. 4.
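The quantized computation described above can be sketched as follows (the shapes, random data, zero-point handling, and signed 32-bit accumulation are illustrative assumptions; the application itself stores intermediate results in uint32):

```python
# Illustrative quantized forward pass for one layer: affine quantization
# (q = round(r/S) + Z), integer matmul with 32-bit accumulation, bias add,
# ReLU6, and requantization of the output to uint8. Shapes/data are assumed.
import numpy as np

def quant_params(r, num_bits=8):
    """Scaling coefficient S and zero point Z covering the value range of r."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(float(r.min()), 0.0), max(float(r.max()), 0.0)
    S = (hi - lo) / qmax or 1.0          # guard against an all-zero range
    Z = int(round(-lo / S))
    return S, Z

def quantize(r, S, Z):
    """Map real values r to uint8 codes via q = round(r / S) + Z."""
    return np.clip(np.round(r / S) + Z, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3)).astype(np.float32)    # current-layer weights
x = rng.standard_normal(3).astype(np.float32)         # input data
bias = rng.integers(-5, 5, size=4).astype(np.int32)   # bias kept in 32 bits

Sw, Zw = quant_params(w)                              # steps 201-202
Sx, Zx = quant_params(x)
qw, qx = quantize(w, Sw, Zw), quantize(x, Sx, Zx)

# step 203: integer matmul accumulated in 32 bits (zero points removed first)
acc = (qw.astype(np.int32) - Zw) @ (qx.astype(np.int32) - Zx)
acc = acc + bias                                      # step 204: 32-bit bias
y = np.clip(acc * (Sw * Sx), 0.0, 6.0)                # step 205: ReLU6
Sy, Zy = quant_params(y)
qy = quantize(y, Sy, Zy)                              # layer output, uint8
print(qy)
```

All heavy arithmetic happens on small integers; the floating-point scale Sw·Sx is applied only once per output, which is what makes this scheme fast on embedded hardware.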
Specifically, the weight quantization algorithm includes the following steps:
Step 201: extract the weights of the current layer and calculate the scaling coefficient S and the zero point Z;
Step 202: calculate the quantized value q corresponding to each actual value r via the scaling coefficient S and zero point Z, where q = r/S + Z;
Step 203: perform the corresponding computation on the input data and the quantized weight parameters, storing the obtained result in uint32 form;
Step 204: add the uint32 bias to the result of step 203 and quantize the sum to uint8 form in the same manner as steps 201 and 202;
Step 205: input the result of step 204 into the activation function to obtain the output data of this layer, the result being in uint8 form.
Step 300: constructing a lightweight neural network for the embedded device using the optimized depth separable convolution;
In step 300, the lightweight neural network of the embodiment of the present application is mainly composed of depthwise separable convolutions based on tensor train decomposition, which replace the traditional depthwise separable convolutions. The separable convolution kernels are all 3 × 3: stacking several small 3 × 3 kernels uses fewer parameters and provides better nonlinear representation than a single larger kernel. The activation function is ReLU6, and Batch Normalization is used. The depthwise separable convolution parameters after tensor train decomposition are quantized from 32 bits to 8 bits with the weight quantization algorithm, so model accuracy is maintained while the model size and inference speed are obviously improved relative to MobileNet.
For the ImageNet dataset, the main architecture of the lightweight neural network of the embodiments of the present application is shown in the following table:
table 1: main body architecture of light weight type neural network
The number of parameters before compression is 3.19M and after compression is 0.922M, which greatly reduces the parameter count of the depthwise separable convolution.
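As an illustrative sketch of how such a reduction arises (the channel factorizations, TT-ranks, and random weights below are assumptions, not the application's actual configuration), a plain TT-SVD of a 1 × 1 convolution weight matrix can be compared against the full matrix:

```python
# Illustrative TT-SVD compression of a 1 x 1 convolution weight matrix.
# M = N = 1024 channels, the factorizations M = N = 16*16*4, and max TT-rank 8
# are assumptions for the sketch, not the application's actual configuration.
import numpy as np

def tt_svd(tensor, max_rank):
    """Plain TT-SVD: sweep SVDs over successive unfoldings, truncating ranks."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        mat = mat.reshape(r_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        mat = np.diag(s[:r]) @ vt[:r]
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

M = N = 1024
m, n = (16, 16, 4), (16, 16, 4)                       # factorizations of M and N
W = np.random.default_rng(0).standard_normal((M, N))  # stand-in 1x1 conv matrix
A = W.reshape(*(mi * ni for mi, ni in zip(m, n)))     # dims (m1n1, m2n2, m3n3)
cores = tt_svd(A, max_rank=8)
tt_params = sum(c.size for c in cores)
print(f"full 1x1 conv: {M * N} params, TT cores: {tt_params} params")
```

With these assumed ranks the cores store 18,560 numbers instead of 1,048,576; in practice the ranks are chosen (and the network fine-tuned) to balance compression against accuracy.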
Please refer to fig. 5, which is a schematic structural diagram of a lightweight neural network construction system according to an embodiment of the present application. The lightweight neural network construction system comprises a network decomposition module, a parameter quantification module and a model construction module.
A network decomposition module: used for compressing the 1 × 1 convolution parameter matrix in the depthwise separable convolution with a tensor train decomposition algorithm to reduce the parameter count of the lightweight network. The tensor train decomposition algorithm is a tensor decomposition algorithm in which each element of a high-dimensional tensor can be expressed as a product of matrices (a matrix product state), that is:
A(i1, i2, ..., id) = G1(i1) G2(i2) ... Gd(id) (1)
In formula (1), Gk(ik) is a matrix of size rk-1 × rk, where the rk are the tensor train decomposition ranks (TT-ranks); to ensure that the matrix product yields a scalar, r0 = rd = 1.
Applying the tensor train decomposition algorithm to the 1 × 1 convolution effectively reduces the parameter count of the depthwise separable convolution while retaining good operational performance. In the embodiment of the present application, the principle of compressing the 1 × 1 convolution parameter matrix in the depthwise separable convolution with the tensor train decomposition algorithm is as follows: the essence of the 1 × 1 convolution is to linearly combine the input feature maps to realize information exchange between them. Let the convolution kernel parameter matrix be 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels; the total parameter count of the 1 × 1 convolution is then MN. This parameter matrix is a fully-connected matrix and contains a large amount of redundant parameters; compressing it by tensor train decomposition further reduces the parameter count of the model. The depthwise separable convolution after introducing tensor train decomposition is shown in figure 3.
The tensor train decomposition algorithm specifically comprises the following steps: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m1n1, m2n2, ..., mdnd), where M = m1m2...md and N = n1n2...nd; perform tensor train decomposition on the converted tensor A to obtain the core matrices Gk[mk, nk]; factor the channel number M of the input feature map in the same way, reshaping the input feature map into a tensor X(x, y, m1, ..., md); obtain the output feature map Y(x, y, n1, ..., nd) after the tensor operation, where N = n1n2...nd. The operation of the decomposed 1 × 1 convolution is expressed as: Y(x, y, n1, ..., nd) = Σ(m1, ..., md) G1[m1, n1] G2[m2, n2] ... Gd[md, nd] X(x, y, m1, ..., md).
a parameter quantization module: the matrix kernel parameter after tensor decomposition is subjected to quantization operation by using a weight quantization algorithm to obtain an optimized depth separable convolution; although the parameter quantity is reduced by the tensor train decomposition algorithm, the parameter matrix is decomposed into a plurality of matrix cores, the number of operation layers of the matrix is increased, and the required calculation quantity is not obviously reduced. Therefore, the weight quantization algorithm is applied to the decomposed 1 × 1 convolution parameter matrix kernel, the 32bit parameter is quantized to a low bit, the calculation amount can be obviously reduced, the forward calculation speed of the model is accelerated, the size of the model is reduced, and the storage space required by the model is compressed. The weight quantization algorithm is a neural network forward acceleration technology which is widely used at present, the weight parameters are quantized from 32 bits to low bits, the operation amount of the neural network can be obviously reduced, and almost no precision loss can be realized.
Specifically, the weight quantization algorithm specifically includes:
1: extract the weights of the current layer and calculate the scaling coefficient S and the zero point Z;
2: calculate the quantized value q corresponding to each actual value r via the scaling coefficient S and zero point Z, where q = r/S + Z;
3: perform the corresponding computation on the input data and the quantized weight parameters, storing the obtained result in uint32 form;
4: add the uint32 bias to the result of 3 and quantize the sum to uint8 form in the same manner;
5: input the result of 4 into the activation function to obtain the output data of this layer, the result being in uint8 form.
A model construction module: for constructing a lightweight neural network for the embedded device using the optimized depthwise separable convolution. The lightweight neural network of the embodiment of the present application is mainly composed of depthwise separable convolutions based on tensor train decomposition, which replace the traditional depthwise separable convolutions. The separable convolution kernels are all 3 × 3: stacking several small 3 × 3 kernels uses fewer parameters and provides better nonlinear representation than a single larger kernel. The activation function is ReLU6, and Batch Normalization is used. The depthwise separable convolution parameters after tensor train decomposition are quantized from 32 bits to 8 bits with the weight quantization algorithm, so model accuracy is maintained while the model size and inference speed are obviously improved relative to MobileNet.
For the ImageNet dataset, the main architecture of the lightweight neural network of the embodiments of the present application is shown in the following table:
table 1: main body architecture of light weight type neural network
The number of parameters before compression is 3.19M and after compression is 0.922M, which greatly reduces the parameter count of the depth separable convolution.
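The source of this reduction can be checked with a back-of-the-envelope count: a tensor-train factorized M × N matrix stores cores of shape (r_{k-1}, m_k, n_k, r_k) instead of the full MN entries. The factorization and ranks below are illustrative, not the entries of Table 1:

```python
def tt_param_count(ms, ns, ranks):
    """Parameter count of a tensor-train factorized M x N matrix.

    ms, ns : factorizations with M = prod(ms), N = prod(ns)
    ranks  : TT-ranks (r_0, ..., r_d) with r_0 = r_d = 1
    Each core G_k has shape (r_{k-1}, m_k, n_k, r_k).
    """
    return sum(ranks[k] * ms[k] * ns[k] * ranks[k + 1]
               for k in range(len(ms)))

# Illustrative numbers: a 512 x 512 pointwise convolution factored
# as 8*8*8 on each side with TT-ranks (1, 4, 4, 1).
full = 512 * 512                                      # 262144
tt = tt_param_count([8, 8, 8], [8, 8, 8], [1, 4, 4, 1])  # 1536
```

Even modest TT-ranks shrink the pointwise matrix by two orders of magnitude, consistent in spirit with the 3.19M → 0.922M figure above.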
Fig. 6 is a schematic structural diagram of a hardware device for the lightweight neural network construction method provided in an embodiment of the present application. As shown in fig. 6, the device includes one or more processors and a memory; one processor is taken as an example in fig. 6. The apparatus may further include an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or other means, as exemplified by the bus connection in fig. 6.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following for any of the above method embodiments:
step a: decomposing a 1 × 1 convolution parameter matrix in the depth separable convolution by using a tensor train decomposition algorithm;
step b: carrying out quantization operation on the matrix kernel parameters subjected to tensor decomposition by using a weight quantization algorithm to obtain an optimized depth separable convolution;
step c: a lightweight neural network is constructed using the optimized deep separable convolution.
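A generic TT-SVD routine for step a might look like the following sketch. The patent does not specify the decomposition procedure, so the sequential-SVD scheme, the rank truncation, and the reshaping conventions here are assumptions, and the function name is hypothetical:

```python
import numpy as np

def tt_decompose_pointwise(W, ms, ns, max_rank):
    """TT-SVD sketch for a 1x1 convolution weight matrix (step a).

    W  : (M, N) pointwise weights, with M = prod(ms), N = prod(ns)
    Returns cores G_k of shape (r_{k-1}, m_k, n_k, r_k) whose
    contraction approximates W; ranks are truncated at max_rank.
    """
    d = len(ms)
    # reshape (M, N) -> (m1, ..., md, n1, ..., nd), interleave index
    # pairs, then merge each (m_k, n_k) pair into one mode m_k * n_k
    T = W.reshape(*ms, *ns)
    perm = [i for pair in zip(range(d), range(d, 2 * d)) for i in pair]
    T = T.transpose(perm).reshape([m * n for m, n in zip(ms, ns)])
    cores, r_prev = [], 1
    C = T.reshape(r_prev * ms[0] * ns[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(s))              # truncate the TT-rank
        cores.append(U[:, :r].reshape(r_prev, ms[k], ns[k], r))
        C = (s[:r, None] * Vt[:r]).reshape(r * ms[k + 1] * ns[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, ms[-1], ns[-1], 1))
    return cores
```

With no truncation (large `max_rank`) the contraction of the cores reproduces the original matrix exactly; smaller `max_rank` trades accuracy for fewer parameters.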
The product can execute the method provided by the embodiments of the present application, and has functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium having stored thereon computer-executable instructions that may perform the following operations:
step a: decomposing a 1 × 1 convolution parameter matrix in the depth separable convolution by using a tensor train decomposition algorithm;
step b: carrying out quantization operation on the matrix kernel parameters subjected to tensor decomposition by using a weight quantization algorithm to obtain an optimized depth separable convolution;
step c: a lightweight neural network is constructed using the optimized deep separable convolution.
Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following:
step a: decomposing a 1 × 1 convolution parameter matrix in the depth separable convolution by using a tensor train decomposition algorithm;
step b: carrying out quantization operation on the matrix kernel parameters subjected to tensor decomposition by using a weight quantization algorithm to obtain an optimized depth separable convolution;
step c: a lightweight neural network is constructed using the optimized deep separable convolution.
The lightweight neural network construction method, system and electronic device of the embodiments of the present application compress the 1 × 1 convolution of the depth separable convolution by using a tensor train decomposition algorithm, so that the parameter quantity of the depth separable convolution is greatly reduced while the model performance is maintained. By using a weight quantization algorithm to quantize the kernel matrix parameters after tensor decomposition from 32 bits to low bit widths, the calculation amount of the model is reduced and the forward inference speed is increased. The lightweight neural network constructed in this way requires less storage space and computing power, and can be better deployed on embedded devices with limited computation and storage.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. A lightweight neural network construction method is characterized by comprising the following steps:
step a: decomposing a 1 × 1 convolution parameter matrix in the depth separable convolution by using a tensor train decomposition algorithm;
step b: carrying out quantization operation on the matrix kernel parameters subjected to tensor decomposition by using a weight quantization algorithm to obtain an optimized depth separable convolution;
step c: a lightweight neural network is constructed using the optimized deep separable convolution.
2. The lightweight neural network construction method according to claim 1, wherein in step a, assuming that the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter number of the 1 × 1 convolution is MN, and the decomposing of the 1 × 1 convolution parameter matrix in the depth separable convolution by using the tensor train decomposition algorithm is specifically:
step a1: extracting the current-layer 1 × 1 convolution kernel parameter matrix and converting it into a tensor A of dimension (m1n1, ..., mdnd), where M = m1m2...md and N = n1n2...nd;
step a2: carrying out tensor train decomposition on the tensor A to obtain kernel matrices Gk[mk, nk], such that A((i1, ..., id), (j1, ..., jd)) = G1[i1, j1]G2[i2, j2]...Gd[id, jd];
step a3: factorizing the channel number M of the input feature map as M = m1m2...md, and reshaping the input feature map into a tensor X(x, y, i1, ..., id);
step a4: obtaining an output feature map Y(x, y, j1, ..., jd) after the tensor operation, wherein the operational procedure of the decomposed 1 × 1 convolution is expressed as Y(x, y, j1, ..., jd) = Σi1,...,id X(x, y, i1, ..., id) G1[i1, j1]G2[i2, j2]...Gd[id, jd].
3. The lightweight neural network construction method according to claim 2, wherein in step b, the quantizing of the matrix kernel parameters after tensor decomposition by using the weight quantization algorithm specifically includes:
step b1: extracting the weights of the current layer and calculating the values of the scaling coefficient S and the zero point Z;
step b2: calculating the quantization value q corresponding to the actual value r through the scaling coefficient S and the zero point Z, where q = r/S + Z;
step b3: performing the corresponding calculation on the input data and the quantized weight parameters, and storing the result in uint32 form;
step b4: adding the uint32-form offset to the result of step b3 and quantizing the sum to uint8 form;
step b5: inputting the uint8-form result of step b4 into the activation function to obtain the output data of the layer, the result being in uint8 form.
4. The lightweight neural network construction method according to any one of claims 1 to 3, wherein in step c, the separable convolution kernels of the lightweight neural network are all 3 × 3 in size, and the activation function is ReLU6.
5. A lightweight neural network construction system, comprising:
a network decomposition module: for decomposing a1 x 1 convolution parameter matrix in a depth separable convolution using a tensor train decomposition algorithm;
a parameter quantization module: the matrix kernel parameter after tensor decomposition is subjected to quantization operation by using a weight quantization algorithm to obtain an optimized depth separable convolution;
a model construction module: for constructing lightweight neural networks using optimized deep separable convolutions.
6. The lightweight neural network construction system according to claim 5, wherein, assuming that the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter number of the 1 × 1 convolution is MN, the network decomposition module decomposes the 1 × 1 convolution parameter matrix in the depth separable convolution by using the tensor train decomposition algorithm specifically as follows: the current-layer 1 × 1 convolution kernel parameter matrix is extracted and converted into a tensor A of dimension (m1n1, ..., mdnd), where M = m1m2...md and N = n1n2...nd; tensor train decomposition is carried out on the tensor A to obtain kernel matrices Gk[mk, nk], such that A((i1, ..., id), (j1, ..., jd)) = G1[i1, j1]G2[i2, j2]...Gd[id, jd]; the channel number M of the input feature map is factorized as M = m1m2...md, and the input feature map is reshaped into a tensor X(x, y, i1, ..., id); an output feature map Y(x, y, j1, ..., jd) is obtained after the tensor operation, wherein the operational procedure of the decomposed 1 × 1 convolution is expressed as Y(x, y, j1, ..., jd) = Σi1,...,id X(x, y, i1, ..., id) G1[i1, j1]G2[i2, j2]...Gd[id, jd].
7. The system according to claim 6, wherein the parameter quantization module quantizes the matrix kernel parameters after tensor decomposition by using the weight quantization algorithm specifically as follows: 1: extracting the weights of the current layer and calculating the values of the scaling coefficient S and the zero point Z; 2: calculating the quantization value q corresponding to the actual value r through the scaling coefficient S and the zero point Z, where q = r/S + Z; 3: performing the corresponding calculation on the input data and the quantized weight parameters, and storing the result in uint32 form; 4: adding the uint32-form offset to the result of step 3 and quantizing the sum to uint8 form; 5: inputting the uint8-form result into the activation function to obtain the output data of the layer, the result being in uint8 form.
8. The lightweight neural network construction system according to any one of claims 5 to 7, wherein the separable convolution kernels of the lightweight neural network are all 3 × 3 in size, and the activation function is ReLU6.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the following operations of the lightweight neural network construction method of any one of claims 1 to 4 above:
step a: decomposing a 1 × 1 convolution parameter matrix in the depth separable convolution by using a tensor train decomposition algorithm;
step b: carrying out quantization operation on the matrix kernel parameters subjected to tensor decomposition by using a weight quantization algorithm to obtain an optimized depth separable convolution;
step c: a lightweight neural network is constructed using the optimized deep separable convolution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910904649.9A CN110751265A (en) | 2019-09-24 | 2019-09-24 | Lightweight neural network construction method and system and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110751265A true CN110751265A (en) | 2020-02-04 |
Family
ID=69276977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910904649.9A Pending CN110751265A (en) | 2019-09-24 | 2019-09-24 | Lightweight neural network construction method and system and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110751265A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11562231B2 (en) * | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11983630B2 (en) | 2018-09-03 | 2024-05-14 | Tesla, Inc. | Neural networks for embedded devices |
CN111291317A (en) * | 2020-02-26 | 2020-06-16 | 上海海事大学 | Approximate matrix convolution neural network binary greedy recursion method |
CN111291317B (en) * | 2020-02-26 | 2023-03-24 | 上海海事大学 | Approximate matrix convolution neural network binary greedy recursion method |
CN113470653A (en) * | 2020-03-31 | 2021-10-01 | 华为技术有限公司 | Voiceprint recognition method, electronic equipment and system |
CN113537485A (en) * | 2020-04-15 | 2021-10-22 | 北京金山数字娱乐科技有限公司 | Neural network model compression method and device |
CN113869517A (en) * | 2020-06-30 | 2021-12-31 | 阿里巴巴集团控股有限公司 | Inference method based on deep learning model |
WO2022068623A1 (en) * | 2020-09-30 | 2022-04-07 | 华为技术有限公司 | Model training method and related device |
EP4241206A4 (en) * | 2020-12-01 | 2024-01-03 | Huawei Technologies Co., Ltd. | Device and method for implementing a tensor-train decomposition operation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110751265A (en) | Lightweight neural network construction method and system and electronic equipment | |
CN107516129B (en) | Dimension self-adaptive Tucker decomposition-based deep network compression method | |
Liu et al. | Frequency-domain dynamic pruning for convolutional neural networks | |
WO2020233130A1 (en) | Deep neural network compression method and related device | |
CN111079781B (en) | Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition | |
Yu et al. | On compressing deep models by low rank and sparse decomposition | |
CN110663048B (en) | Execution method, execution device, learning method, learning device, and recording medium for deep neural network | |
CN107395211B (en) | Data processing method and device based on convolutional neural network model | |
CN112508125A (en) | Efficient full-integer quantization method of image detection model | |
CN106557812A (en) | The compression of depth convolutional neural networks and speeding scheme based on dct transform | |
CN113595993B (en) | Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation | |
CN108197707A (en) | Compression method based on the convolutional neural networks that global error is rebuild | |
CN115713109A (en) | Multi-head attention model compression method for image classification | |
WO2023051335A1 (en) | Data encoding method, data decoding method, and data processing apparatus | |
CN114978189A (en) | Data coding method and related equipment | |
CN114154626B (en) | Filter pruning method for image classification task | |
Qi et al. | Learning low resource consumption cnn through pruning and quantization | |
CN115022637A (en) | Image coding method, image decompression method and device | |
CN111882028B (en) | Convolution operation device for convolution neural network | |
CN114677545B (en) | Lightweight image classification method based on similarity pruning and efficient module | |
CN114372565B (en) | Target detection network compression method for edge equipment | |
CN115564043A (en) | Image classification model pruning method and device, electronic equipment and storage medium | |
Brillet et al. | Tunable cnn compression through dimensionality reduction | |
CN114154621A (en) | Convolutional neural network image processing method and device based on FPGA | |
CN114841342A (en) | Tensor-based efficient Transformer construction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200204 ||