CN110751265A - Lightweight neural network construction method and system and electronic equipment - Google Patents

Lightweight neural network construction method and system and electronic equipment

Info

Publication number
CN110751265A
Authority
CN
China
Prior art keywords
convolution
tensor
neural network
decomposition
matrix
Prior art date
Legal status
Pending
Application number
CN201910904649.9A
Other languages
Chinese (zh)
Inventor
周阳
张涌
宁立
王书强
邬晶晶
姜元爽
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201910904649.9A priority Critical patent/CN110751265A/en
Publication of CN110751265A publication Critical patent/CN110751265A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to a lightweight neural network construction method and system and an electronic device. The method comprises the following steps: step a: decompose the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm; step b: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution; step c: construct a lightweight neural network using the optimized depth-separable convolution. Compressing the 1 × 1 convolution of the depth-separable convolution with the tensor train decomposition algorithm greatly reduces its parameter count while maintaining model performance. Quantizing the core matrix parameters after tensor decomposition from 32 bits to low bit widths with a weight quantization algorithm reduces the model's computation and speeds up its forward inference, so the lightweight neural network constructed by this method can be readily deployed on embedded devices with limited compute and storage.

Description

Lightweight neural network construction method and system and electronic equipment
Technical Field
The application belongs to the technical field of deep neural networks, and particularly relates to a lightweight neural network construction method and system and an electronic device.
Background
Deep learning has achieved increasingly strong results in many fields, such as image recognition, natural language processing, and speech recognition. To reach very high accuracy, researchers generally adopt deeper and more complex network structures, but this greatly increases the parameter count and computation of the neural network and raises the hardware requirements (processor, memory, compute card, and bandwidth), making it difficult to deploy such large deep neural networks directly on embedded devices with limited compute and storage while achieving usable speed. As artificial intelligence is applied across industries, the demand for deploying these large networks on embedded devices keeps growing, and how to compress and accelerate neural networks is a key problem that must be addressed to industrialize artificial intelligence.
To deploy a deep neural network on an embedded device, the device's limited storage space and computing capability must be considered first, so a very compact and efficient lightweight neural network structure needs to be designed. MobileNet [Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications [J]. arXiv preprint arXiv:1704.04861, 2017] is currently the most representative lightweight neural network; it replaces the traditional convolution operation with depth-wise separable convolution, which significantly reduces the computation of the convolution operation while preserving model performance. The depth-separable convolution splits the conventional convolution operation into two steps: the first is Depthwise Convolution, where each convolution kernel convolves only its corresponding feature map; the second is Pointwise Convolution with kernel size 1 × 1, i.e. 1 × 1 convolution, which linearly combines the different channels of the feature maps. The 1 × 1 convolution in the depth-separable convolution can be regarded as mapping a set of feature maps through a fully-connected matrix, and this fully-connected mapping matrix accounts for the most significant share of the parameters and contains a large number of redundant parameters (the 1 × 1 convolutions in MobileNet occupy about 75% of the parameters and 95% of the computation).
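For concreteness, a quick back-of-the-envelope sketch in Python (with illustrative channel counts, not figures from the patent) compares the parameter count of a standard convolution with its depth-separable counterpart, and shows why the 1 × 1 pointwise step dominates the separable layer's parameter budget:

```python
# Parameter counts for one layer with a 3x3 kernel, M input channels and
# N output channels (M = N = 256 is an illustrative choice).
K, M, N = 3, 256, 256

standard_conv = K * K * M * N          # traditional convolution: 589,824
depthwise     = K * K * M              # step 1: one 3x3 filter per input channel
pointwise     = M * N                  # step 2: the 1x1 convolution
separable     = depthwise + pointwise  # 67,840 in total

print(f"standard : {standard_conv:,}")
print(f"separable: {separable:,} ({separable / standard_conv:.1%} of standard)")
print(f"1x1 share of separable params: {pointwise / separable:.1%}")  # ~96.6%
```

Even though the separable layer uses roughly a tenth of the standard layer's parameters, almost all of what remains sits in the 1 × 1 fully-connected mapping matrix — which is exactly the redundancy the tensor train decomposition below targets.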
Disclosure of Invention
The present application provides a lightweight neural network construction method and system and an electronic device, which aim to solve, at least to some extent, one of the above technical problems in the prior art.
In order to solve the above problems, the present application provides the following technical solutions:
a lightweight neural network construction method comprises the following steps:
step a: decompose the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
step b: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution;
step c: construct a lightweight neural network using the optimized depth-separable convolution.
The technical scheme adopted by the embodiment of the application further includes: in step a, let the convolution kernel parameter matrix be 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, so that the total parameter count of the 1 × 1 convolution is MN. Decomposing the 1 × 1 convolution parameter matrix in the depth-separable convolution with the tensor train decomposition algorithm specifically comprises:
step a1: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m_1 n_1, ..., m_d n_d), where M = m_1 m_2 ... m_d and N = n_1 n_2 ... n_d;
step a2: perform tensor train decomposition on the tensor A to obtain the core matrices G_k[m_k, n_k];
step a3: factor the channel dimension M of the input feature map in the same way, reshaping the input X(x, y, m) into the tensor X(x, y, m_1, ..., m_d), where the channel index m corresponds one-to-one to the tuple (m_1, ..., m_d);
step a4: obtain the output feature map Y(x, y, n_1, ..., n_d) after the tensor operation, where the output channel index n corresponds to the tuple (n_1, ..., n_d).
The operation of the decomposed 1 × 1 convolution is expressed as:
Y(x, y, n_1, ..., n_d) = Σ_{m_1,...,m_d} X(x, y, m_1, ..., m_d) G_1[m_1, n_1] G_2[m_2, n_2] ... G_d[m_d, n_d]
The technical scheme adopted by the embodiment of the application further includes: in step b, quantizing the tensor-decomposed core matrix parameters with the weight quantization algorithm specifically comprises:
step b1: extract the weights of the current layer and compute the scaling coefficient S and the zero point Z;
step b2: compute the quantized value q corresponding to each actual value r from the scaling coefficient S and zero point Z, where q = r/S + Z;
step b3: perform the layer's computation on the input data with the quantized weight parameters, storing the result in uint32 form;
step b4: add the uint32-form bias to the result of step b3 and quantize the sum to uint8 form;
step b5: feed the uint8 result into the activation function to obtain the layer's output data, also in uint8 form.
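As a concrete illustration of steps b1 and b2, the following Python sketch derives S and Z from a layer's weight range and applies q = r/S + Z; the asymmetric uint8 min/max scheme is an assumption made here for illustration, since the patent does not spell out how S and Z are chosen.

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    """Steps b1-b2: derive scale S and zero point Z from the layer's weight
    range, then map each real value r to an integer q = r/S + Z."""
    qmin, qmax = 0, 2 ** num_bits - 1
    rmin, rmax = min(w.min(), 0.0), max(w.max(), 0.0)  # range must include 0
    S = (rmax - rmin) / (qmax - qmin)                  # scaling coefficient
    Z = int(round(qmin - rmin / S))                    # zero point
    q = np.clip(np.round(w / S + Z), qmin, qmax).astype(np.uint8)
    return q, S, Z

w = np.random.randn(64, 64).astype(np.float32) * 0.1   # toy 32-bit weights
q, S, Z = quantize_weights(w)
w_hat = S * (q.astype(np.float32) - Z)                  # dequantize: r = S(q - Z)
print("max abs quantization error:", np.abs(w - w_hat).max())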
The technical scheme adopted by the embodiment of the application further includes: in step c, the depthwise convolution kernels of the lightweight neural network are all of size 3 × 3, and the activation function is ReLU6.
Another technical scheme adopted by the embodiment of the application is as follows: a lightweight neural network construction system, comprising:
a network decomposition module: for decomposing the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
a parameter quantization module: for quantizing the tensor-decomposed core matrix parameters with a weight quantization algorithm to obtain an optimized depth-separable convolution;
a model construction module: for constructing a lightweight neural network using the optimized depth-separable convolution.
The technical scheme adopted by the embodiment of the application further includes: assuming the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter count of the 1 × 1 convolution is MN, and the network decomposition module decomposes the 1 × 1 convolution parameter matrix in the depth-separable convolution with the tensor train decomposition algorithm as follows: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m_1 n_1, ..., m_d n_d), where M = m_1 m_2 ... m_d and N = n_1 n_2 ... n_d; perform tensor train decomposition on the tensor A to obtain the core matrices G_k[m_k, n_k]; factor the channel dimension M of the input feature map in the same way, reshaping the input X(x, y, m) into the tensor X(x, y, m_1, ..., m_d); obtain the output feature map Y(x, y, n_1, ..., n_d) after the tensor operation. The operation of the decomposed 1 × 1 convolution is expressed as:
Y(x, y, n_1, ..., n_d) = Σ_{m_1,...,m_d} X(x, y, m_1, ..., m_d) G_1[m_1, n_1] G_2[m_2, n_2] ... G_d[m_d, n_d]
The technical scheme adopted by the embodiment of the application further includes: the parameter quantization module quantizes the tensor-decomposed core matrix parameters with the weight quantization algorithm as follows: 1: extract the weights of the current layer and compute the scaling coefficient S and the zero point Z; 2: compute the quantized value q corresponding to each actual value r from S and Z, where q = r/S + Z; 3: perform the layer's computation on the input data with the quantized weight parameters, storing the result in uint32 form; 4: add the uint32-form bias to the result of 3 and quantize the sum to uint8 form; 5: feed the uint8 result into the activation function to obtain the layer's output data, also in uint8 form.
The technical scheme adopted by the embodiment of the application further includes: the depthwise convolution kernels of the lightweight neural network are all of size 3 × 3, and the activation function is ReLU6.
The embodiment of the application adopts another technical scheme that: an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the following operations of the lightweight neural network construction method described above:
step a: decompose the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
step b: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution;
step c: construct a lightweight neural network using the optimized depth-separable convolution.
Compared with the prior art, the embodiments of the application are advantageous in that: the lightweight neural network construction method, system and electronic device of the embodiments compress the 1 × 1 convolution of the depth-separable convolution using the tensor train decomposition algorithm, greatly reducing the parameter count of the depth-separable convolution while maintaining model performance. Quantizing the core matrix parameters after tensor decomposition from 32 bits to low bit widths with a weight quantization algorithm reduces the model's computation and speeds up its forward inference. The lightweight neural network constructed by this method needs less storage space and computing power and can be readily deployed on embedded devices with limited compute and storage.
Drawings
FIG. 1 is a flow chart of a method of constructing a lightweight neural network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of tensor train decomposition;
FIG. 3 is a schematic diagram of a depth separable convolution after tensor train decomposition is introduced;
FIG. 4 is a schematic diagram of the operation process of the 1 × 1 convolution after quantization;
FIG. 5 is a schematic structural diagram of a lightweight neural network construction system according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a hardware device of a lightweight neural network construction method provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To overcome the defects of the prior art, the application proposes a depth-separable convolution based on tensor train decomposition and, combined with a weight quantization algorithm, builds a lightweight neural network targeted at embedded devices. First, to address the parameter redundancy of the 1 × 1 convolution in the depth-separable convolution, a tensor train decomposition algorithm is used to decompose the 1 × 1 convolution's fully-connected mapping matrix, further reducing the parameter count of the lightweight neural network; second, the tensor-decomposed core matrix parameters are quantized with a weight quantization method, accelerating the model's forward inference and reducing the model size; finally, a lightweight neural network for embedded devices is constructed using the optimized depth-separable convolution.
Specifically, please refer to fig. 1, which is a flowchart of a method for constructing a lightweight neural network according to an embodiment of the present application. The lightweight neural network construction method comprises the following steps:
step 100: compress the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm, reducing the parameter count of the lightweight network;
In step 100, the tensor train (Tensor-Train) decomposition algorithm is a tensor decomposition algorithm in which every element of a high-dimensional tensor can be expressed as a product of matrices (a Matrix Product State), that is:
A(i_1, i_2, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d)   (1)
In formula (1), G_k(i_k) is a matrix of size r_{k-1} × r_k, where the r_k are the tensor train decomposition ranks (TT-ranks); to ensure that the result of the matrix product is a scalar, r_0 = r_d = 1.
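To make formula (1) concrete, here is a small NumPy sketch (dimensions and TT-ranks are illustrative assumptions) that stores a 4-way tensor as a chain of cores and evaluates a single element as the matrix product G_1(i_1) G_2(i_2) ... G_d(i_d):

```python
import numpy as np

dims  = [4, 4, 4, 4]      # a tensor A of shape 4x4x4x4 (256 elements)
ranks = [1, 3, 3, 3, 1]   # TT-ranks r_0..r_d, with r_0 = r_d = 1
# Core k holds, for each index i_k, a matrix G_k(i_k) of size r_{k-1} x r_k.
cores = [np.random.randn(dims[k], ranks[k], ranks[k + 1])
         for k in range(len(dims))]

def tt_element(cores, index):
    """Formula (1): A(i_1,...,i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d)."""
    out = np.eye(1)
    for core, i in zip(cores, index):
        out = out @ core[i]   # chain of matrix products, shape stays 1 x r_k
    return out.item()         # r_0 = r_d = 1, so the product is a scalar

print(tt_element(cores, (0, 1, 2, 3)))
# Storage: 256 full elements vs. sum(c.size for c in cores) = 96 core entries.
```

The storage saving grows quickly with tensor order and size, which is what makes the format attractive for compressing the 1 × 1 convolution's mapping matrix.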
FIG. 2 is a schematic diagram of tensor train decomposition. Applying the tensor train decomposition algorithm to the 1 × 1 convolution can effectively reduce the parameter count of the depth-separable convolution while keeping good operational performance. In the embodiment of the application, the principle of compressing the 1 × 1 convolution parameter matrix in the depth-separable convolution with the tensor train decomposition algorithm is as follows: the essence of the 1 × 1 convolution is to linearly combine the input feature maps, realizing information exchange between them. Let the convolution kernel parameter matrix be 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels; the total parameter count of the 1 × 1 convolution is then MN. This parameter matrix is a fully-connected matrix containing a great deal of redundancy; compressing it by tensor train decomposition can further reduce the model's parameter count. The depth-separable convolution after introducing tensor train decomposition is shown in FIG. 3. The tensor train decomposition algorithm is implemented in the following steps:
step 101: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m_1 n_1, ..., m_d n_d), where M = m_1 m_2 ... m_d and N = n_1 n_2 ... n_d;
step 102: perform tensor train decomposition on the converted tensor A to obtain the core matrices G_k[m_k, n_k];
step 103: factor the channel dimension M of the input feature map in the same way, reshaping the input X(x, y, m) into the tensor X(x, y, m_1, ..., m_d), where the channel index m corresponds one-to-one to the tuple (m_1, ..., m_d);
step 104: obtain the output feature map Y(x, y, n_1, ..., n_d) after the tensor operation, where the output channel index n corresponds to the tuple (n_1, ..., n_d).
The operation of the decomposed 1 × 1 convolution is expressed as:
Y(x, y, n_1, ..., n_d) = Σ_{m_1,...,m_d} X(x, y, m_1, ..., m_d) G_1[m_1, n_1] G_2[m_2, n_2] ... G_d[m_d, n_d]
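The following NumPy sketch runs steps 101–104 end to end for d = 2 factors (illustrative sizes: M = m_1 m_2 = 64 input channels, N = n_1 n_2 = 256 output channels, TT-rank r = 4, all assumptions, not values from the patent); the einsum contracts the two cores with the factored channel axes exactly as in the formula above:

```python
import numpy as np

m1, m2, n1, n2, r = 8, 8, 16, 16, 4        # M = 64, N = 256 (illustrative)
G1 = np.random.randn(1, m1, n1, r)         # core G_1[m_1, n_1], ranks (1, r)
G2 = np.random.randn(r, m2, n2, 1)         # core G_2[m_2, n_2], ranks (r, 1)

H = W = 7
X = np.random.randn(H, W, m1, m2)          # input reshaped to X(x, y, m1, m2)

# Y(x,y,n1,n2) = sum_{m1,m2} X(x,y,m1,m2) G1[m1,n1] G2[m2,n2]
Y = np.einsum('xypq,ipcr,rqdj->xycd', X, G1, G2)
print(Y.shape)                             # (7, 7, 16, 16) -> reshape to (7, 7, 256)

# Parameter count: full 1x1 mapping matrix vs. the two TT cores
print((m1 * m2) * (n1 * n2), G1.size + G2.size)   # 16384 vs. 1024
```

At these toy sizes the TT cores already hold 16× fewer parameters than the full M × N mapping matrix.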
step 200: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution;
In step 200, although the tensor train decomposition algorithm reduces the parameter count, it decomposes the parameter matrix into several matrix cores, which increases the number of matrix operations, so the required computation is not obviously reduced. Therefore, the application applies a weight quantization algorithm to the decomposed 1 × 1 convolution's matrix cores, quantizing the 32-bit parameters to low bit widths (in the embodiment of the application, quantizing from 32 bits to 8 bits is preferred; the width can be set according to the actual operation). This clearly reduces the computation, speeds up the model's forward operation, shrinks the model, and compresses the storage space it needs. The weight quantization algorithm [Krishnamoorthi R. Quantizing deep convolutional networks for efficient inference: A whitepaper [J]. arXiv preprint arXiv:1806.08342, 2018] is a widely used forward-acceleration technique for neural networks; quantizing the weight parameters from 32 bits to low bit widths can markedly reduce the computation of the neural network with almost no loss of accuracy. A schematic diagram of the operation of the quantized 1 × 1 convolution is shown in FIG. 4.
Specifically, the weight quantization algorithm comprises the following steps:
step 201: extract the weights of the current layer and compute the scaling coefficient S and the zero point Z;
step 202: compute the quantized value q corresponding to each actual value r from the scaling coefficient S and zero point Z, where q = r/S + Z;
step 203: perform the layer's computation on the input data with the quantized weight parameters, storing the result in uint32 form;
step 204: add the uint32-form bias to the result of step 203 and quantize the sum to uint8 form in the same manner as steps 201 and 202;
step 205: feed the result of step 204 into the activation function to obtain the layer's output data, in uint8 form.
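A minimal NumPy sketch of steps 201–205 for one fully-connected (1 × 1) layer follows. Treating the accumulator as signed 32-bit and folding the zero points into the integer matmul is an implementation assumption (the patent only says the 32-bit result and bias are stored, without fixing the exact arithmetic), and ReLU6 is applied directly on the quantized values:

```python
import numpy as np

def quantized_layer(x_q, w_q, bias_q32, S_x, Z_x, S_w, Z_w, S_y, Z_y):
    # step 203: integer multiply-accumulate in a 32-bit accumulator
    acc = (x_q.astype(np.int32) - Z_x) @ (w_q.astype(np.int32) - Z_w)
    # step 204: add the 32-bit bias, then requantize to the uint8 output scale
    acc += bias_q32
    y_q = np.round(acc * (S_x * S_w / S_y) + Z_y)
    y_q = np.clip(y_q, 0, 255).astype(np.uint8)
    # step 205: ReLU6 on quantized values: clamp between q(0) = Z_y and q(6)
    six_q = int(round(6.0 / S_y + Z_y))
    return np.clip(y_q, Z_y, six_q).astype(np.uint8)

# Toy call with made-up scales/zero points for a 64 -> 32 layer
x_q = np.random.randint(0, 256, (1, 64), dtype=np.uint8)
w_q = np.random.randint(0, 256, (64, 32), dtype=np.uint8)
b_q = np.random.randint(-2**10, 2**10, (32,), dtype=np.int32)
y = quantized_layer(x_q, w_q, b_q, 0.02, 128, 0.005, 128, 0.04, 0)
print(y.dtype, y.shape)   # uint8 (1, 32)
```

The whole forward pass thus stays in integer arithmetic, which is what yields the inference speed-up on embedded hardware.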
Step 300: construct a lightweight neural network for embedded devices using the optimized depth-separable convolution;
In step 300, the lightweight neural network of the embodiment is built mainly from depth-separable convolutions based on tensor train decomposition, which replace the conventional depth-separable convolutions. The depthwise convolution kernels are all 3 × 3: stacking several small 3 × 3 kernels uses fewer parameters and yields better nonlinear representation than a single larger kernel. ReLU6 is used as the activation function, together with Batch Normalization. The depth-separable convolution parameters after tensor train decomposition are quantized from 32 bits to 8 bits with the weight quantization algorithm, so model accuracy is maintained while the model size and inference speed improve markedly over MobileNet.
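Putting the pieces together, below is a PyTorch sketch of one such building block — a 3 × 3 depthwise convolution followed by a TT-decomposed pointwise convolution (an assumed two-core factorization M = m_1 m_2, N = n_1 n_2), BatchNorm and ReLU6. The patent's exact layer stack is given only in Table 1, so this block is illustrative:

```python
import torch
import torch.nn as nn

class TTSeparableBlock(nn.Module):
    """3x3 depthwise conv + TT-decomposed 1x1 conv + BatchNorm + ReLU6.
    The two-core factorization and all shapes are illustrative assumptions."""
    def __init__(self, m1, m2, n1, n2, rank):
        super().__init__()
        M = m1 * m2
        self.dims = (m1, m2, n1, n2)
        self.dw = nn.Conv2d(M, M, 3, padding=1, groups=M, bias=False)
        self.g1 = nn.Parameter(0.1 * torch.randn(1, m1, n1, rank))   # G_1
        self.g2 = nn.Parameter(0.1 * torch.randn(rank, m2, n2, 1))   # G_2
        self.bn = nn.BatchNorm2d(n1 * n2)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        m1, m2, n1, n2 = self.dims
        x = self.dw(x)                        # depthwise 3x3
        z, _, h, w = x.shape
        x = x.view(z, m1, m2, h, w)           # factor the channel axis
        # TT pointwise: Y = sum_{m1,m2} X G_1[m1,n1] G_2[m2,n2]
        y = torch.einsum('zpqhw,ipcr,rqdj->zcdhw', x, self.g1, self.g2)
        return self.act(self.bn(y.reshape(z, n1 * n2, h, w)))

block = TTSeparableBlock(8, 8, 16, 16, rank=4)    # 64 -> 256 channels
out = block(torch.randn(2, 64, 32, 32))
print(out.shape)                                   # torch.Size([2, 256, 32, 32])
```

Training such a block end to end and then applying the weight quantization of step 200 gives the deployable 8-bit model described above.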
For the ImageNet dataset, the main architecture of the lightweight neural network of the embodiments of the present application is shown in the following table:
table 1: main body architecture of light weight type neural network
(Table 1 is provided as an image in the original publication and is not reproduced here.)
The parameter count before compression is 3.19M and after compression is 0.922M, a large reduction in the parameters of the depth-separable convolutions.
Please refer to fig. 5, which is a schematic structural diagram of a lightweight neural network construction system according to an embodiment of the present application. The lightweight neural network construction system comprises a network decomposition module, a parameter quantification module and a model construction module.
A network decomposition module: used for compressing the 1 × 1 convolution parameter matrix in the depth-separable convolution with a tensor train decomposition algorithm to reduce the parameter count of the lightweight network. The tensor train decomposition algorithm is a tensor decomposition algorithm in which every element of a high-dimensional tensor can be expressed as a product of matrices (a Matrix Product State), that is:
A(i_1, i_2, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d)   (1)
In formula (1), G_k(i_k) is a matrix of size r_{k-1} × r_k, where the r_k are the tensor train decomposition ranks (TT-ranks); to ensure that the result of the matrix product is a scalar, r_0 = r_d = 1.
Applying the tensor train decomposition algorithm to the 1 × 1 convolution can effectively reduce the parameter count of the depth-separable convolution while keeping good operational performance. In the embodiment of the application, the principle of compressing the 1 × 1 convolution parameter matrix in the depth-separable convolution with the tensor train decomposition algorithm is as follows: the essence of the 1 × 1 convolution is to linearly combine the input feature maps, realizing information exchange between them. Let the convolution kernel parameter matrix be 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels; the total parameter count of the 1 × 1 convolution is then MN. This parameter matrix is a fully-connected matrix containing a great deal of redundancy; compressing it by tensor train decomposition can further reduce the model's parameter count. The depth-separable convolution after introducing tensor train decomposition is shown in FIG. 3.
The tensor train decomposition algorithm specifically comprises: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m_1 n_1, ..., m_d n_d), where M = m_1 m_2 ... m_d and N = n_1 n_2 ... n_d; perform tensor train decomposition on the converted tensor A to obtain the core matrices G_k[m_k, n_k]; factor the channel dimension M of the input feature map in the same way, reshaping the input X(x, y, m) into the tensor X(x, y, m_1, ..., m_d); obtain the output feature map Y(x, y, n_1, ..., n_d) after the tensor operation. The operation of the decomposed 1 × 1 convolution is expressed as:
Y(x, y, n_1, ..., n_d) = Σ_{m_1,...,m_d} X(x, y, m_1, ..., m_d) G_1[m_1, n_1] G_2[m_2, n_2] ... G_d[m_d, n_d]
A parameter quantization module: used for quantizing the tensor-decomposed core matrix parameters with a weight quantization algorithm to obtain an optimized depth-separable convolution. Although the tensor train decomposition algorithm reduces the parameter count, it decomposes the parameter matrix into several matrix cores, which increases the number of matrix operations, so the required computation is not obviously reduced. Therefore, the weight quantization algorithm is applied to the decomposed 1 × 1 convolution's matrix cores, quantizing the 32-bit parameters to low bit widths; this clearly reduces the computation, speeds up the model's forward operation, shrinks the model, and compresses the storage space it needs. The weight quantization algorithm is a widely used forward-acceleration technique for neural networks; quantizing the weight parameters from 32 bits to low bit widths can markedly reduce the computation of the neural network with almost no loss of accuracy.
Specifically, the weight quantization algorithm comprises:
1: extract the weights of the current layer and compute the scaling coefficient S and the zero point Z;
2: compute the quantized value q corresponding to each actual value r from the scaling coefficient S and zero point Z, where q = r/S + Z;
3: perform the layer's computation on the input data with the quantized weight parameters, storing the result in uint32 form;
4: add the uint32-form bias to the result of 3 and quantize the sum to uint8 form in the same manner;
5: feed the result of 4 into the activation function to obtain the layer's output data, in uint8 form.
A model construction module: used for constructing a lightweight neural network for embedded devices with the optimized depth-separable convolution. The lightweight neural network of the embodiment is built mainly from depth-separable convolutions based on tensor train decomposition, which replace the conventional depth-separable convolutions. The depthwise convolution kernels are all 3 × 3: stacking several small 3 × 3 kernels uses fewer parameters and yields better nonlinear representation than a single larger kernel. ReLU6 is used as the activation function, together with Batch Normalization. The depth-separable convolution parameters after tensor train decomposition are quantized from 32 bits to 8 bits with the weight quantization algorithm, so model accuracy is maintained while the model size and inference speed improve markedly over MobileNet.
For the ImageNet dataset, the main architecture of the lightweight neural network of the embodiments of the present application is shown in the following table:
table 1: main body architecture of light weight type neural network
(Table 1 is provided as an image in the original publication and is not reproduced here.)
The parameter count before compression is 3.19M and after compression is 0.922M, a large reduction in the parameters of the depth-separable convolutions.
Fig. 6 is a schematic structural diagram of a hardware device of a lightweight neural network construction method provided in an embodiment of the present application. As shown in fig. 6, the device includes one or more processors and memory. Taking a processor as an example, the apparatus may further include: an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or other means, as exemplified by the bus connection in fig. 6.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following for any of the above method embodiments:
step a: decompose the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
step b: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution;
step c: construct a lightweight neural network using the optimized depth-separable convolution.
The product can execute the method provided by the embodiments of the application, and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium having stored thereon computer-executable instructions that may perform the following operations:
step a: decompose the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
step b: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution;
step c: construct a lightweight neural network using the optimized depth-separable convolution.
Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following:
step a: decompose the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
step b: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution;
step c: construct a lightweight neural network using the optimized depth-separable convolution.
The lightweight neural network construction method, system and electronic device of the embodiments compress the 1 × 1 convolution of the depth-separable convolution using the tensor train decomposition algorithm, greatly reducing the parameter count of the depth-separable convolution while maintaining model performance. Quantizing the core matrix parameters after tensor decomposition from 32 bits to low bit widths with a weight quantization algorithm reduces the model's computation and speeds up its forward inference. The lightweight neural network constructed by this method needs less storage space and computing power and can be readily deployed on embedded devices with limited compute and storage.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A lightweight neural network construction method, characterized by comprising the following steps:
step a: decompose the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
step b: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution;
step c: construct a lightweight neural network using the optimized depth-separable convolution.
2. The lightweight neural network construction method of claim 1, wherein in step a, assuming the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter count of the 1 × 1 convolution is MN, and decomposing the 1 × 1 convolution parameter matrix in the depth-separable convolution using the tensor train decomposition algorithm specifically comprises:
step a1: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m_1 n_1, ..., m_d n_d), where M = m_1 m_2 ... m_d and N = n_1 n_2 ... n_d;
step a2: perform tensor train decomposition on the tensor A to obtain the core matrices G_k[m_k, n_k];
step a3: factor the channel dimension M of the input feature map in the same way, reshaping the input X(x, y, m) into the tensor X(x, y, m_1, ..., m_d);
step a4: obtain the output feature map Y(x, y, n_1, ..., n_d) after the tensor operation.
The operation of the decomposed 1 × 1 convolution is expressed as:
Y(x, y, n_1, ..., n_d) = Σ_{m_1,...,m_d} X(x, y, m_1, ..., m_d) G_1[m_1, n_1] G_2[m_2, n_2] ... G_d[m_d, n_d]
3. The lightweight neural network construction method of claim 2, wherein in step b, quantizing the tensor-decomposed core matrix parameters with the weight quantization algorithm specifically comprises:
step b1: extract the weights of the current layer and compute the scaling coefficient S and the zero point Z;
step b2: compute the quantized value q corresponding to each actual value r from the scaling coefficient S and zero point Z, where q = r/S + Z;
step b3: perform the layer's computation on the input data with the quantized weight parameters, storing the result in uint32 form;
step b4: add the uint32-form bias to the result of step b3 and quantize the sum to uint8 form;
step b5: feed the uint8 result of step b4 into the activation function to obtain the layer's output data, also in uint8 form.
4. The lightweight neural network construction method of any one of claims 1 to 3, wherein in step c, the depthwise convolution kernels of the lightweight neural network are all of size 3 × 3, and the activation function is ReLU6.
5. A lightweight neural network construction system, characterized by comprising:
a network decomposition module: for decomposing the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
a parameter quantization module: for quantizing the tensor-decomposed core matrix parameters with a weight quantization algorithm to obtain an optimized depth-separable convolution;
a model construction module: for constructing a lightweight neural network using the optimized depth-separable convolution.
6. The lightweight neural network construction system of claim 5, wherein, assuming the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter count of the 1 × 1 convolution is MN, and the network decomposition module decomposes the 1 × 1 convolution parameter matrix in the depth-separable convolution with the tensor train decomposition algorithm as follows: extract the current layer's 1 × 1 convolution kernel parameter matrix and reshape it into a tensor A of dimensions (m_1 n_1, ..., m_d n_d), where M = m_1 m_2 ... m_d and N = n_1 n_2 ... n_d; perform tensor train decomposition on the tensor A to obtain the core matrices G_k[m_k, n_k]; factor the channel dimension M of the input feature map in the same way, reshaping the input X(x, y, m) into the tensor X(x, y, m_1, ..., m_d); obtain the output feature map Y(x, y, n_1, ..., n_d) after the tensor operation. The operation of the decomposed 1 × 1 convolution is expressed as:
Y(x, y, n_1, ..., n_d) = Σ_{m_1,...,m_d} X(x, y, m_1, ..., m_d) G_1[m_1, n_1] G_2[m_2, n_2] ... G_d[m_d, n_d]
7. The system of claim 6, wherein the parameter quantization module quantizes the tensor-decomposed core matrix parameters with the weight quantization algorithm as follows: 1: extract the weights of the current layer and compute the scaling coefficient S and the zero point Z; 2: compute the quantized value q corresponding to each actual value r from S and Z, where q = r/S + Z; 3: perform the layer's computation on the input data with the quantized weight parameters, storing the result in uint32 form; 4: add the uint32-form bias to the result of 3 and quantize the sum to uint8 form; 5: feed the uint8 result into the activation function to obtain the layer's output data, also in uint8 form.
8. The lightweight neural network construction system of any one of claims 5 to 7, wherein the depthwise convolution kernels of the lightweight neural network are all of size 3 × 3 and the activation function is ReLU6.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the following operations of the lightweight neural network construction method of any one of claims 1 to 4:
step a: decompose the 1 × 1 convolution parameter matrix in the depth-separable convolution using a tensor train decomposition algorithm;
step b: quantize the tensor-decomposed core matrix parameters using a weight quantization algorithm to obtain an optimized depth-separable convolution;
step c: construct a lightweight neural network using the optimized depth-separable convolution.
CN201910904649.9A 2019-09-24 2019-09-24 Lightweight neural network construction method and system and electronic equipment Pending CN110751265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910904649.9A CN110751265A (en) 2019-09-24 2019-09-24 Lightweight neural network construction method and system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910904649.9A CN110751265A (en) 2019-09-24 2019-09-24 Lightweight neural network construction method and system and electronic equipment

Publications (1)

Publication Number Publication Date
CN110751265A true CN110751265A (en) 2020-02-04

Family

ID=69276977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910904649.9A Pending CN110751265A (en) 2019-09-24 2019-09-24 Lightweight neural network construction method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN110751265A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11562231B2 (en) * 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11983630B2 (en) 2018-09-03 2024-05-14 Tesla, Inc. Neural networks for embedded devices
CN111291317A (en) * 2020-02-26 2020-06-16 上海海事大学 Approximate matrix convolution neural network binary greedy recursion method
CN111291317B (en) * 2020-02-26 2023-03-24 上海海事大学 Approximate matrix convolution neural network binary greedy recursion method
CN113470653A (en) * 2020-03-31 2021-10-01 华为技术有限公司 Voiceprint recognition method, electronic equipment and system
CN113537485A (en) * 2020-04-15 2021-10-22 北京金山数字娱乐科技有限公司 Neural network model compression method and device
CN113869517A (en) * 2020-06-30 2021-12-31 阿里巴巴集团控股有限公司 Inference method based on deep learning model
WO2022068623A1 (en) * 2020-09-30 2022-04-07 华为技术有限公司 Model training method and related device
EP4241206A4 (en) * 2020-12-01 2024-01-03 Huawei Technologies Co., Ltd. Device and method for implementing a tensor-train decomposition operation

Similar Documents

Publication Publication Date Title
CN110751265A (en) Lightweight neural network construction method and system and electronic equipment
CN107516129B (en) Dimension self-adaptive Tucker decomposition-based deep network compression method
Liu et al. Frequency-domain dynamic pruning for convolutional neural networks
WO2020233130A1 (en) Deep neural network compression method and related device
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
Yu et al. On compressing deep models by low rank and sparse decomposition
CN110663048B (en) Execution method, execution device, learning method, learning device, and recording medium for deep neural network
CN107395211B (en) Data processing method and device based on convolutional neural network model
CN112508125A (en) Efficient full-integer quantization method of image detection model
CN106557812A (en) The compression of depth convolutional neural networks and speeding scheme based on dct transform
CN113595993B (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN108197707A (en) Compression method based on the convolutional neural networks that global error is rebuild
CN115713109A (en) Multi-head attention model compression method for image classification
WO2023051335A1 (en) Data encoding method, data decoding method, and data processing apparatus
CN114978189A (en) Data coding method and related equipment
CN114154626B (en) Filter pruning method for image classification task
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN115022637A (en) Image coding method, image decompression method and device
CN111882028B (en) Convolution operation device for convolution neural network
CN114677545B (en) Lightweight image classification method based on similarity pruning and efficient module
CN114372565B (en) Target detection network compression method for edge equipment
CN115564043A (en) Image classification model pruning method and device, electronic equipment and storage medium
Brillet et al. Tunable cnn compression through dimensionality reduction
CN114154621A (en) Convolutional neural network image processing method and device based on FPGA
CN114841342A (en) Tensor-based efficient Transformer construction method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200204

RJ01 Rejection of invention patent application after publication