CN112232497A - Method, system, device and medium for compiling AI chip - Google Patents

Method, system, device and medium for compiling AI chip

Info

Publication number
CN112232497A
CN112232497A (application CN202011083320.XA)
Authority
CN
China
Prior art keywords
network model
calculation
chip
network
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011083320.XA
Other languages
Chinese (zh)
Inventor
沈付旺
景璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011083320.XA priority Critical patent/CN112232497A/en
Publication of CN112232497A publication Critical patent/CN112232497A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a method, a system, a device and a storage medium for compiling an AI chip, wherein the method comprises the following steps: quantizing the network model of the AI chip, and performing precision adjustment on the quantized network model based on the weights of the network model; in the precision-adjusted network model, distributing the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model; and acquiring hardware architecture parameters, and generating a continuous calculation flow according to the hardware architecture parameters and the network model. By quantizing the network model and adjusting its precision, the invention makes the quantization function more complete and the quantization precision higher; the calculation of the network is optimized to the greatest extent, the number of calculation steps is reduced, and the utilization rate of the MAC calculation units is maximized; and by generating a calculation flow that keeps the computing device busy throughout network inference, the utilization rate of the AI device is improved.

Description

Method, system, device and medium for compiling AI chip
Technical Field
The present invention relates to the field of artificial intelligence, and more particularly, to a method, system, computer device and readable medium for compiling an AI chip.
Background
Artificial Intelligence (AI) is currently one of the most popular and promising technologies and has developed rapidly in recent years with the support of powerful computing devices. Related applications have penetrated many aspects of life, such as weather prediction, automatic driving and natural language translation. The basis of AI technology, however, is massive mathematical computation that relies on powerful computing equipment. In the early stage of AI applications, because there was no device dedicated to AI computation, general-purpose processors such as CPUs and GPUs were typically used to deploy AI calculation instances. Although these general-purpose computing devices are flexible and versatile, they tend to have low computational efficiency, high power consumption and high hardware cost when performing AI-related calculations, so people have sought a computing device dedicated to AI calculation, the AI ASIC (AI application-specific chip). The purpose of the AI-specific chip is to make AI calculation more efficient, the computing power of AI equipment stronger, and the energy consumption and hardware cost lower, thereby reducing the overall cost of deploying AI equipment and improving its processing speed.
The flow for running a calculation instance on an AI chip or AI device is: 1) obtain the algorithm or deep network model file to be run from an existing deep learning framework such as TensorFlow, PyTorch, Caffe or ONNX; 2) use an AI compiler to generate an instruction set or calculation configuration file that can be executed directly on the AI device; 3) use a top-level application or runtime to dispatch the instruction set or file from step 2) to the driver; 4) the driver forwards the instruction set or related calculation configuration to the AI device; 5) the AI device completes the calculation and outputs the calculation result or precision data.
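To make the five-step flow concrete, the following is a minimal, purely illustrative Python sketch of such a host-side pipeline; every name in it (load_model, AICompiler, Driver, run_instance) is a hypothetical placeholder invented for this example and is not an API defined by the patent or by any framework.

# Minimal illustrative sketch of the compile-and-deploy flow described above.
# All names are hypothetical placeholders, not a real API.

class AICompiler:
    def compile(self, model: dict) -> dict:
        # 2) turn the imported model into a device-executable instruction set / configuration
        return {"instructions": [f"exec:{layer}" for layer in model["layers"]]}

class Driver:
    def dispatch(self, program: dict) -> list:
        # 4) forward the instruction set to the AI device; 5) the device computes and returns results
        return [f"done:{instr}" for instr in program["instructions"]]

def load_model(path: str, framework: str) -> dict:
    # 1) import an algorithm / deep network model file from TensorFlow, PyTorch, Caffe or ONNX
    return {"path": path, "framework": framework, "layers": ["conv1", "relu1", "fc1"]}

def run_instance(path: str, framework: str) -> list:
    model = load_model(path, framework)
    program = AICompiler().compile(model)
    return Driver().dispatch(program)   # 3) the top-level application/runtime hands the program to the driver

if __name__ == "__main__":
    print(run_instance("resnet50.onnx", "ONNX"))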
However, the prior art for compiling AI chips has the following disadvantages:
(1) the quantization function is not complete enough, and the quantization precision needs to be improved;
(2) calculation optimization only supports operator-fusion-type optimization, and cannot optimize the calculation comprehensively and deeply;
(3) in terms of flexibility, some mainstream frameworks such as PyTorch cannot be supported directly, which imposes considerable limitations on users;
(4) the compiler targets only certain specific hardware platforms and cannot be extended;
(5) the usage process is too complicated, which is not conducive to users deploying actual calculation instances;
(6) the computing device is sometimes idle during network inference calculation, and the utilization rate of the device needs to be improved;
(7) calculation optimization only supports very basic operator-fusion-type optimization, so the calculation of the network cannot be optimized to the greatest extent, the number of calculation steps cannot be reduced, and the utilization rate of the MAC calculation units cannot be maximized.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method, a system, a computer device and a computer-readable storage medium for compiling an AI chip, which make the quantization function more complete by quantizing the network model and adjusting its precision; optimize the calculation of the network to the greatest extent, reduce the number of calculation steps, and maximize the utilization rate of the MAC calculation units; and improve the utilization rate of the AI device by generating a calculation flow that keeps the computing device busy throughout network inference.
In view of the above, an aspect of the embodiments of the present invention provides a method for compiling an AI chip, including the following steps: quantizing the network model of the AI chip, and performing precision adjustment on the quantized network model based on the weights of the network model; in the precision-adjusted network model, distributing the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model; and acquiring hardware architecture parameters, and generating a continuous calculation flow according to the hardware architecture parameters and the network model.
In some embodiments, performing precision adjustment on the quantized network model based on the weights of the network model includes: performing linear quantization on the weights of the network model to generate first measurement data; generating a feature map of the network model and quantizing the network model based on the feature map to generate second measurement data; and performing quantized inference using the first measurement data and the second measurement data, comparing the inference result with the inference result of the original network model, and adjusting the network model according to the comparison result.
In some embodiments, distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model includes: judging whether the size of the picture input into the network model exceeds a threshold value; and in response to the size of the picture input into the network model exceeding the threshold value, slicing the picture and sending the resulting sub-pictures to different AI chip cores.
In some embodiments, distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model includes: judging whether the number of channels of the picture input into the network model exceeds a second threshold value; and in response to the number of channels exceeding the second threshold value, splitting the weights of the network model and sending the resulting sub-weights to different AI chip cores.
In some embodiments, generating a continuous calculation flow according to the hardware architecture parameters and the network model includes: in response to generating a new calculation flow, obtaining the current computing-power usage of each time period and assigning the new calculation flow to the time period with the lowest current computing-power usage.
In some embodiments, the method further comprises: retraining the network model so as to clip and compress the network model.
In some embodiments, the method further comprises: combining like terms among the calculations in the computation graph of the network model so as to simplify the calculation process.
In another aspect of the embodiments of the present invention, a system for compiling an AI chip is further provided, including: a quantization module configured to quantize the network model of the AI chip and perform precision adjustment on the quantized network model based on the weights of the network model; a calculation module configured to distribute, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model; and a flow module configured to acquire hardware architecture parameters and generate a continuous calculation flow according to the hardware architecture parameters and the network model.
In another aspect of the embodiments of the present invention, a computer device is also provided, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, implementing the steps of the above method.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that, when executed by a processor, implements the steps of the above method.
The invention has the following beneficial technical effects:
(1) the invention solves the interface problems of the different deep learning frameworks PyTorch, TensorFlow, Caffe and ONNX, and can provide a comprehensive and reliable quantization function;
(2) it provides comprehensive calculation optimization functions, including operator fusion, clipping and compression, sparse deployment, hardware-oriented adaptive optimization based on AutoML and reinforcement learning, and automatic selection of the optimal computation graph, thereby maximizing the utilization rate of the MAC calculation units, maximizing the calculation speed and minimizing the latency;
(3) according to the hardware architecture and the network calculation process, a calculation flow that maximally utilizes the AI device is generated automatically, so that the calculation efficiency approaches the ideal 100%;
(4) it provides users with several flexible development modes, such as the development of user-defined operators and user-defined networks; users can choose the development language they are most proficient in to implement operators and networks not yet supported by the compiler;
(5) through an implementation of an LLVM (Low Level Virtual Machine) back end, users can adapt the AI compiler to their own hardware, and a change of hardware architecture does not require extensive secondary development of the AI compiler, which saves users a great deal of research and development time and allows them to focus on the design of the hardware architecture.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other embodiments from these drawings without creative effort.
FIG. 1 is a diagram illustrating an embodiment of a method for compiling an AI chip according to the present invention;
FIG. 2 is a flow chart of the custom operator and network provided by the present invention;
FIG. 3 is a flow diagram of a customized underlying compiler provided by the present invention;
FIG. 4 is a schematic diagram of a hardware structure of a computer device for compiling an AI chip according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that have the same name but are not identical. Thus "first" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and subsequent embodiments will not explain this again.
In view of the above objects, a first aspect of an embodiment of the present invention proposes an embodiment of a method for compiling an AI chip. Fig. 1 is a schematic diagram illustrating an embodiment of a method for compiling an AI chip according to the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
S1, quantizing the network model of the AI chip, and performing precision adjustment on the quantized network model based on the weights of the network model;
S2, distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model; and
S3, acquiring hardware architecture parameters, and generating a continuous calculation flow according to the hardware architecture parameters and the network model.
The basis of AI technology is deep learning together with dedicated AI computing chips, that is, chips designed specifically for deep learning calculation. Designing a powerful and efficient chip requires comprehensive consideration of the algorithm top layer, the AI compiler, the architecture, the hardware design and other aspects. The algorithms are mainly deep neural networks, such as convolutional networks and recurrent neural networks. These algorithms are existing open-source algorithms whose accuracy has been verified on related data sets, and the user selects the appropriate algorithm according to the actual application scenario. The chip is designed to support these algorithms more flexibly, comprehensively and efficiently.
The network model of the AI chip is quantized, and precision adjustment is performed on the quantized network model based on the weights of the network model. The embodiment of the invention provides a network model quantization function that can quantize an original FP32 high-bit network model based on TensorFlow, PyTorch, ONNX, Caffe and the like to INT16/FP16, INT8 and even INT4 low-bit types, while the quantized network still satisfies the accuracy requirements.
There are currently many deep learning algorithms, and many deep learning frameworks that implement them. Before an AI compiler can compile an instruction set or configuration file that runs directly on hardware, it must first parse the original algorithms implemented in these different frameworks, so a comprehensive model interface and parsing function are required.
In some embodiments, performing precision adjustment on the quantized network model based on the weights of the network model includes: performing linear quantization on the weights of the network model to generate first measurement data; generating a feature map of the network model and quantizing the network model based on the feature map to generate second measurement data; and performing quantized inference using the first measurement data and the second measurement data, comparing the inference result with the inference result of the original network model, and adjusting the network model according to the comparison result. Specifically, the weights of the network model are first linearly quantized to generate the Scale data of the weights (the first measurement data); then the deployed network model is run on the corresponding data set to generate FP32 inference results, that is, the feature map files; next, the feature maps or activation values of the network model are quantized to generate the inter-layer quantization coefficients or Scale data (the second measurement data); finally, quantized inference is performed using the quantized weights, the weight Scale data and the feature-map Scale data, the final inference result is compared with the original FP32 inference result, and fine adjustment is made according to the error of a given layer or the precision on the data set, finally obtaining a quantized model with no precision loss or a precision loss within 1%.
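As a concrete illustration of this procedure, the sketch below performs symmetric linear quantization of one layer's weights and activations with per-tensor Scale factors and compares the dequantized result against the FP32 reference; it is a minimal example under those assumptions, not the patent's implementation, and all function names are the author's own.

import numpy as np

def scale_for(tensor, num_bits=8):
    # Symmetric linear quantization: map the largest |value| onto the largest signed integer.
    qmax = 2 ** (num_bits - 1) - 1
    return float(np.max(np.abs(tensor))) / qmax if np.any(tensor) else 1.0

def quantize(tensor, scale, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(tensor / scale), -qmax - 1, qmax).astype(np.int32)

# FP32 reference layer: y = x @ W
x = np.random.randn(4, 64).astype(np.float32)        # activation / feature map
w = np.random.randn(64, 32).astype(np.float32)       # layer weights
y_fp32 = x @ w

w_scale = scale_for(w)                                # "first measurement data": weight Scale
x_scale = scale_for(x)                                # "second measurement data": feature-map Scale
y_int = quantize(x, x_scale) @ quantize(w, w_scale)   # INT8 multiply-accumulate (INT32 accumulator)
y_deq = y_int * (x_scale * w_scale)                   # dequantize the result

# Compare quantized inference with the FP32 reference and decide whether fine adjustment is needed.
rel_err = np.abs(y_deq - y_fp32).mean() / np.abs(y_fp32).mean()
print(f"mean relative error: {rel_err:.4%}")

Per-channel Scale factors, asymmetric quantization or lower bit widths such as INT4 follow the same pattern with a different scale computation.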
After quantization of the network model is completed, the calculation flow of the whole network model is further optimized and merged, so that the data-movement and calculation steps are reduced as much as possible and the utilization rate of the MAC calculation units is maximized.
In some embodiments, the method further comprises: retraining the network model so as to clip and compress the network model. For example, the compiler evaluates the network model and performs the necessary clipping on it. Specifically, clipping the network model means removing, through retraining, layers or channels that are unimportant in the network model, until further clipping would make the precision unrecoverable, thereby achieving the maximum clipping ratio without precision loss and realizing a smaller network model with a greatly reduced calculation load. After network clipping and compression, the calculation structure of each layer becomes sparse, so sparse calculation needs to be matched with the hardware architecture to complete the deployment of sparse calculation.
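A minimal sketch of such clipping by channel importance is given below; the L1-norm importance criterion, the accuracy tolerance and the stubbed retraining step are illustrative assumptions of this example, not details specified by the patent.

import numpy as np

def clip_channels(weights, keep_ratio):
    # weights: (out_channels, in_channels, kh, kw); rank output channels by L1 norm.
    importance = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    keep = max(1, int(round(keep_ratio * weights.shape[0])))
    kept_idx = np.argsort(importance)[::-1][:keep]
    return weights[np.sort(kept_idx)]

def retrain_and_measure(clipped):
    # Placeholder: in practice the clipped model is retrained on the data set and its
    # accuracy is measured; here accuracy is simply assumed to degrade smoothly.
    return 0.76 - 0.05 * (1.0 - clipped.shape[0] / 64)

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
baseline_acc, tolerance = 0.76, 0.01
w_best = w

# Clip progressively harder until retraining can no longer recover the accuracy.
for keep_ratio in (0.9, 0.75, 0.5, 0.25):
    clipped = clip_channels(w, keep_ratio)
    acc = retrain_and_measure(clipped)
    if baseline_acc - acc > tolerance:
        break
    w_best = clipped
print("kept output channels:", w_best.shape[0])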
In some embodiments, the method further comprises: combining like terms among the calculations in the computation graph of the network model so as to simplify the calculation process. For example, within one operator or one calculation step, calculations of the same type are grouped and merged, and common subexpressions are eliminated, which simplifies the calculation steps of the operator and further simplifies the calculation. Operators or operations in the computation graph that can be merged are fused, which reduces the number of calculation steps, reduces the data movement before and after each calculation, and saves a large amount of calculation time.
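As a toy illustration of these two transformations, the sketch below runs common-subexpression elimination and a conv+bias+ReLU fusion pass over a tiny hand-written expression graph; the dictionary-based graph representation and all node names are illustrative choices for this example, not the patent's internal representation.

# Toy computation graph: each node is (op, inputs). All names are illustrative only.
graph = {
    "a":   ("input", ()),
    "t1":  ("mul", ("a", "a")),
    "t2":  ("mul", ("a", "a")),      # duplicate of t1
    "c1":  ("conv", ("t1",)),
    "b1":  ("bias_add", ("c1",)),
    "r1":  ("relu", ("b1",)),
    "out": ("add", ("r1", "t2")),
}

def eliminate_common_subexpressions(g):
    seen, alias, out = {}, {}, {}
    for name, (op, ins) in g.items():
        ins = tuple(alias.get(i, i) for i in ins)
        key = (op, ins)
        if key in seen and op != "input":
            alias[name] = seen[key]       # reuse the earlier identical node
        else:
            seen[key] = name
            out[name] = (op, ins)
    return out

def fuse_conv_bias_relu(g):
    # Fuse conv -> bias_add -> relu chains into one fused operator.
    fused, consumed = {}, set()
    for name, (op, ins) in g.items():
        if op == "relu" and g[ins[0]][0] == "bias_add" and g[g[ins[0]][1][0]][0] == "conv":
            conv = g[ins[0]][1][0]
            fused[name] = ("conv_bias_relu", g[conv][1])
            consumed.update({ins[0], conv})
        else:
            fused[name] = (op, ins)
    return {k: v for k, v in fused.items() if k not in consumed}

print(fuse_conv_bias_relu(eliminate_common_subexpressions(graph)))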
The method can also use the latest reinforcement learning and adaptive-learning AutoML techniques to perform adaptively learned calculation optimization for the user's current AI device, so that the calculation optimization reaches the best match between calculation and hardware.
In the precision-adjusted network model, the calculation of the convolutional neural network is distributed to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model. A convolutional network usually consists of many convolutional layers, and the basic calculation process of each convolutional layer is: feature map (Feature) data input, weight (Weight) loading, multiply-accumulate calculation, and result output or temporary storage. When multiple pictures are input, each picture can be assigned to a separate core; because the weights are the same for every picture, the weights are shared and reused across the cores.
In some embodiments, distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model includes: judging whether the size of the picture input into the network model exceeds a threshold value; and in response to the size of the picture input into the network model exceeding the threshold value, slicing the picture and sending the resulting sub-pictures to different AI chip cores. In some practical scenarios the picture may be very large; when its size exceeds a preset threshold, the picture can be sliced and the slices sent to different AI chip cores, and the weights in this case are shared and reused.
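One simple way to picture this slicing is sketched below: when the input picture exceeds a size threshold it is split into row bands, each band is processed by a different core with the same shared weights, and the partial outputs are concatenated; the threshold value, the 1x1-convolution stand-in for a core's MAC array and the core count are illustrative assumptions only.

import numpy as np

SIZE_THRESHOLD = 512          # illustrative threshold on the picture height/width
NUM_CORES = 4

def conv_on_core(tile, weight):
    # Stand-in for one core's MAC array: a 1x1 convolution expressed as a channel-mixing matmul.
    h, w, cin = tile.shape
    return (tile.reshape(-1, cin) @ weight).reshape(h, w, -1)

def run_layer(picture, weight):
    if max(picture.shape[:2]) <= SIZE_THRESHOLD:
        return conv_on_core(picture, weight)            # small picture: single core
    # Large picture: slice into row bands, one band per core, weights shared and reused.
    bands = np.array_split(picture, NUM_CORES, axis=0)
    outputs = [conv_on_core(band, weight) for band in bands]
    return np.concatenate(outputs, axis=0)

picture = np.random.randn(1024, 1024, 3).astype(np.float32)
weight = np.random.randn(3, 16).astype(np.float32)      # the same weights on every core
print(run_layer(picture, weight).shape)                  # (1024, 1024, 16)

Row bands are exact for the 1x1 stand-in; a real convolution with a larger kernel would additionally exchange halo rows between neighbouring slices.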
In some embodiments, distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model includes: judging whether the number of channels of the picture input into the network model exceeds a second threshold value; and in response to the number of channels exceeding the second threshold value, splitting the weights of the network model and sending the resulting sub-weights to different AI chip cores. During network calculation the picture may become smaller and smaller while its number of channels grows larger and larger; when the number of channels exceeds the preset second threshold, the weights can be split and sent to different calculation cores, and in this case the input (feature map) is shared and reused.
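The complementary case, splitting the weights along the output-channel dimension while the feature map is shared and reused by all cores, can be sketched in the same style; again the second threshold, the 1x1-convolution stand-in and the core count are illustrative assumptions.

import numpy as np

CHANNEL_THRESHOLD = 256        # illustrative "second threshold" on the number of channels
NUM_CORES = 4

def run_layer(feature, weight):
    # feature: (h, w, cin); weight: (cin, cout) -- a 1x1 convolution stand-in for one layer.
    h, w, cin = feature.shape
    flat = feature.reshape(-1, cin)
    if cin <= CHANNEL_THRESHOLD:
        return (flat @ weight).reshape(h, w, -1)                  # few channels: single core
    # Many channels: split the weights by output channel; the feature map is shared by all cores.
    sub_weights = np.array_split(weight, NUM_CORES, axis=1)
    partial = [flat @ sw for sw in sub_weights]                   # each core computes with one sub-weight
    return np.concatenate(partial, axis=1).reshape(h, w, -1)

feature = np.random.randn(14, 14, 512).astype(np.float32)        # small picture, many channels
weight = np.random.randn(512, 1024).astype(np.float32)
print(run_layer(feature, weight).shape)                           # (14, 14, 1024)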
Hardware architecture parameters are acquired, and a continuous calculation flow is generated according to the hardware architecture parameters and the network model. The AI compiler adapts itself to the hardware configuration and to the calculation flow of the network model, ensuring that after the hardware deploys the calculation task there is no hardware idle state caused by data waiting, data movement, hardware sub-task configuration and so on, so that the hardware utilization rate approaches the ideal 100%. Specifically, the method comprises the following steps: first, the hardware architecture file is imported; if this parameter does not exist or no file is imported, the compiler defaults to its built-in hardware architecture; then the network model file is imported; and finally an efficient, seamless calculation flow is generated according to the hardware architecture, the network model and the related calculation configuration file data.
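As a rough illustration of this adaptation step, the sketch below imports an optional hardware architecture description (falling back to a built-in default when no file is supplied) and emits one calculation step per layer sized to the described MAC resources; the JSON schema and every field name in it are hypothetical.

import json, os

DEFAULT_ARCH = {"cores": 4, "macs_per_core": 1024, "sram_kb": 2048}   # built-in default architecture

def load_arch(path):
    # Import the hardware architecture file; if the parameter is absent or the file
    # does not exist, fall back to the compiler's built-in default architecture.
    if path and os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return DEFAULT_ARCH

def build_stream(arch, layers):
    # Emit one back-to-back calculation step per layer, sized to keep the MAC units busy.
    total_macs = arch["cores"] * arch["macs_per_core"]
    steps = []
    for layer in layers:
        steps.append({
            "layer": layer["name"],
            "passes": -(-layer["macs"] // total_macs),   # ceiling division: MAC-array passes needed
            "prefetch_next_weights": True,                # hide data movement behind computation
        })
    return steps

layers = [{"name": "conv1", "macs": 118_013_952}, {"name": "fc1", "macs": 4_194_304}]
print(build_stream(load_arch(None), layers))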
In some embodiments, generating a continuous calculation flow according to the hardware architecture parameters and the network model includes: in response to generating a new calculation flow, obtaining the current computing-power usage of each time period and assigning the new calculation flow to the time period with the lowest current computing-power usage.
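This placement rule amounts to a greedy least-loaded assignment over time periods, as the short sketch below shows; the number of periods and the load units are illustrative.

# Greedy placement of a new calculation flow into the least-loaded time period.
# Slot layout and load units are illustrative, not from the patent.

time_slots = [0.85, 0.40, 0.65, 0.30, 0.90]   # current computing-power usage per period

def place_new_stream(slots, stream_load):
    target = min(range(len(slots)), key=lambda i: slots[i])   # period with the lowest usage
    slots[target] += stream_load                               # schedule the new flow there
    return target

print("assigned to period", place_new_stream(time_slots, 0.25))
print("updated usage:", time_slots)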
For a novel network or a user-defined network, the AI compiler of the invention provides interfaces for implementing user-defined networks and user-defined operators. Users can program custom operators and network definitions in the language they are most proficient in, for example Python or C++.
FIG. 2 is a flow chart illustrating the user-defined operators and networks provided by the present invention. As shown in FIG. 2, the flow is as follows: analyze the specific function and mathematical expression of the operator, make its inputs and outputs explicit, determine the development mode and the calculation interface to be used, and determine the file name, operator name and operator type of the operator implementation; then define the operator information and prototype separately, and implement the corresponding operator function in code. Operator compilation comprises: writing a Makefile for the implemented operator and compiling it with make. The operator test comprises: testing, with specific input data, whether the operator's output is consistent with the expected output; if so, proceed to the next step; otherwise check the operator definition and implementation code and redevelop until the test finally passes. The AI compiler is then recompiled, and the newly developed operator is automatically added to the operator library of the AI compiler. The user-defined network model is then converted with the regenerated AI compiler, an instruction set or configuration file that the hardware can execute directly is generated, and the network inference result is verified. Likewise, the next task is carried out according to the final inference result, or problems in the operator development process are checked.
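The register-implement-test loop just described can be summarized as in the sketch below, where an operator is defined with its name, type and implementation, checked against an expected output on specific input data, and only then kept in the operator library; the registry functions shown are a hypothetical stand-in, not the AI compiler's real interface.

import numpy as np

OPERATOR_LIBRARY = {}   # hypothetical stand-in for the AI compiler's operator library

def register_operator(name, op_type, func):
    # Define the operator information/prototype and add the implementation to the library.
    OPERATOR_LIBRARY[name] = {"type": op_type, "impl": func}

def test_operator(name, test_input, expected):
    # Test with specific input data whether the operator output matches the expected output.
    actual = OPERATOR_LIBRARY[name]["impl"](test_input)
    return np.allclose(actual, expected, atol=1e-6)

def leaky_relu(x, slope=0.1):
    # Example user-defined operator implementation.
    return np.where(x > 0, x, slope * x)

register_operator("leaky_relu", "activation", leaky_relu)

x = np.array([-2.0, 0.5, 3.0], dtype=np.float32)
ok = test_operator("leaky_relu", x, np.array([-0.2, 0.5, 3.0], dtype=np.float32))
print("operator test passed" if ok else "check the definition and implementation, then redevelop")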
A programming language is ultimately compiled into the corresponding machine instructions for execution by the hardware. However, the target platform of the machine code generated by the compilers of these languages is mainly the CPU, so a customized compiler is needed for a specific hardware platform so that the compiled code can be executed efficiently not only on general-purpose architecture platforms such as CPUs but also on the designed AI chip. The embodiment of the invention provides, based on LLVM, a customized compiler development function for the user's hardware architecture. The LLVM compiler architecture is divided mainly into a front end, a middle part and a back end. Front-end development mainly adds support for a language: LLVM analyzes the grammar of the language and generates the intermediate representation, LLVM IR. The middle part is the LLVM optimizer, which is independent of both the language and the target platform; neither the front end nor the middle part requires the user's attention, and the existing implementations can be used. The back end mainly generates the executable code for the target platform, and the AI compiler implements a customized LLVM back end supporting the specific AI chip ASIC.
FIG. 3 illustrates a flow diagram of the customized underlying compiler provided by the present invention. As shown in FIG. 3, the customized underlying compiler involves creating the target machine, target machine registration, creating the register set, instruction set selection, instruction scheduling, and JIT support. Creating the target machine means describing the characteristics of one's own target machine by creating a TargetMachine subclass. Target registration registers the target machine through LLVM's target registration interface. Creating the register set for the target machine hardware is done with TableGen, which generates code for register definitions, register aliases and register classes; a subclass inheriting from the TargetRegisterInfo class represents the information that facilitates register allocation and inter-register interaction. Instruction set selection converts device-independent IR instructions into device-dependent DAG (Directed Acyclic Graph) nodes. Instruction scheduling schedules and executes instructions efficiently by means of list scheduling, a greedy heuristic. JIT (Just-In-Time) support implements runtime compilation and execution of a called function or program segment by writing a subclass inheriting from the TargetJITInfo class.
The embodiment of the invention solves the interface problems of the different deep learning frameworks PyTorch, TensorFlow, Caffe and ONNX, and can provide a comprehensive and reliable quantization function. The invention provides comprehensive calculation optimization functions, including operator fusion, clipping and compression, sparse deployment, hardware-oriented adaptive optimization based on AutoML and reinforcement learning, and automatic selection of the optimal computation graph, so that the utilization rate of the MAC calculation units is maximized, the calculation speed is maximized and the latency is minimized. The invention can automatically generate, according to the hardware architecture and the network calculation flow, a calculation flow that maximally utilizes the AI device, so that the calculation efficiency approaches the ideal 100%. The invention provides users with several flexible development modes, such as the development of user-defined operators and user-defined networks; users can choose the development language they are most proficient in to implement operators and networks that the compiler does not yet support. As for extensibility, through the implementation of an LLVM back end, users can adapt the AI compiler to their own hardware, and a change of hardware architecture does not require extensive secondary development of the AI compiler, which saves users a great deal of research and development time and lets them focus on the design of the hardware architecture.
It should be particularly noted that the steps in the embodiments of the method for compiling an AI chip described above can be interleaved, replaced, added or deleted with respect to one another; therefore, methods for compiling an AI chip obtained by such reasonable permutations, combinations and transformations also belong to the scope of the present invention, and the scope of the present invention should not be limited to the embodiments.
In view of the above object, according to a second aspect of the embodiments of the present invention, there is provided a system for compiling an AI chip, including: a quantization module configured to quantize the network model of the AI chip and perform precision adjustment on the quantized network model based on the weights of the network model; a calculation module configured to distribute, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model; and a flow module configured to acquire hardware architecture parameters and generate a continuous calculation flow according to the hardware architecture parameters and the network model.
In some embodiments, the quantization module is configured to: perform linear quantization on the weights of the network model to generate first measurement data; generate a feature map of the network model and quantize the network model based on the feature map to generate second measurement data; and perform quantized inference using the first measurement data and the second measurement data, compare the inference result with the inference result of the original network model, and adjust the network model according to the comparison result.
In some embodiments, the calculation module is configured to: judge whether the size of the picture input into the network model exceeds a threshold value; and in response to the size of the picture input into the network model exceeding the threshold value, slice the picture and send the resulting sub-pictures to different AI chip cores.
In some embodiments, the calculation module is configured to: judge whether the number of channels of the picture input into the network model exceeds a second threshold value; and in response to the number of channels exceeding the second threshold value, split the weights of the network model and send the resulting sub-weights to different AI chip cores.
In some embodiments, the flow module is configured to: in response to generating a new calculation flow, obtain the current computing-power usage of each time period and assign the new calculation flow to the time period with the lowest current computing-power usage.
In some embodiments, the system further comprises: a retraining module configured to retrain the network model so as to clip and compress the network model.
In some embodiments, the system further comprises: a simplification module configured to combine like terms among the calculations in the computation graph of the network model so as to simplify the calculation process.
In view of the above object, a third aspect of the embodiments of the present invention provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executed by the processor to perform the following steps: S1, quantizing the network model of the AI chip, and performing precision adjustment on the quantized network model based on the weights of the network model; S2, distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model; and S3, acquiring hardware architecture parameters, and generating a continuous calculation flow according to the hardware architecture parameters and the network model.
In some embodiments, performing precision adjustment on the quantized network model based on the weights of the network model includes: performing linear quantization on the weights of the network model to generate first measurement data; generating a feature map of the network model and quantizing the network model based on the feature map to generate second measurement data; and performing quantized inference using the first measurement data and the second measurement data, comparing the inference result with the inference result of the original network model, and adjusting the network model according to the comparison result.
In some embodiments, distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model includes: judging whether the size of the picture input into the network model exceeds a threshold value; and in response to the size of the picture input into the network model exceeding the threshold value, slicing the picture and sending the resulting sub-pictures to different AI chip cores.
In some embodiments, distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model includes: judging whether the number of channels of the picture input into the network model exceeds a second threshold value; and in response to the number of channels exceeding the second threshold value, splitting the weights of the network model and sending the resulting sub-weights to different AI chip cores.
In some embodiments, generating a continuous calculation flow according to the hardware architecture parameters and the network model includes: in response to generating a new calculation flow, obtaining the current computing-power usage of each time period and assigning the new calculation flow to the time period with the lowest current computing-power usage.
In some embodiments, the steps further comprise: retraining the network model so as to clip and compress the network model.
In some embodiments, the steps further comprise: combining like terms among the calculations in the computation graph of the network model so as to simplify the calculation process.
Fig. 4 is a schematic diagram of a hardware structure of an embodiment of the computer device for compiling the AI chip according to the present invention.
Taking the apparatus shown in fig. 4 as an example, the apparatus includes a processor 301 and a memory 302, and may further include: an input device 303 and an output device 304.
The processor 301, the memory 302, the input device 303 and the output device 304 may be connected by a bus or other means, and fig. 4 illustrates the connection by a bus as an example.
The memory 302, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method for compiling an AI chip in the embodiments of the present application. The processor 301 executes the various functional applications and data processing of the server, that is, implements the method for compiling an AI chip of the above method embodiments, by running the non-volatile software programs, instructions and modules stored in the memory 302.
The memory 302 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the method of compiling the AI chip, and the like. Further, the memory 302 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 302 optionally includes memory located remotely from processor 301, which may be connected to a local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 303 may receive information such as a user name and a password that are input. The output means 304 may comprise a display device such as a display screen.
One or more program instructions/modules corresponding to the method of compiling the AI chip are stored in the memory 302, and when executed by the processor 301, perform the method of compiling the AI chip in any of the above-described method embodiments.
Any embodiment of a computer device for executing the method for compiling the AI chip can achieve the same or similar effects as any corresponding method embodiment.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the method as above.
Finally, it should be noted that, as those of ordinary skill in the art will appreciate, all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the related hardware. The program of the method for compiling an AI chip can be stored in a computer-readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM) or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples; within the idea of the embodiments of the invention, the technical features of the above embodiments or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like made within the spirit and principles of the embodiments of the present invention shall be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method of compiling an AI chip, comprising the steps of:
quantizing the network model of the AI chip, and performing precision adjustment on the quantized network model based on the weights of the network model;
in the precision-adjusted network model, distributing the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model; and
acquiring hardware architecture parameters, and generating a continuous calculation flow according to the hardware architecture parameters and the network model.
2. The method of claim 1, wherein performing precision adjustment on the quantized network model based on the weights of the network model comprises:
performing linear quantization on the weights of the network model to generate first measurement data;
generating a feature map of the network model and quantizing the network model based on the feature map to generate second measurement data; and
performing quantized inference using the first measurement data and the second measurement data, comparing the inference result with the inference result of the original network model, and adjusting the network model according to the comparison result.
3. The method of claim 1, wherein distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model comprises:
judging whether the size of the picture input into the network model exceeds a threshold value; and
in response to the size of the picture input into the network model exceeding the threshold value, slicing the picture and sending the resulting sub-pictures to different AI chip cores.
4. The method of claim 1, wherein distributing, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model comprises:
judging whether the number of channels of the picture input into the network model exceeds a second threshold value; and
in response to the number of channels exceeding the second threshold value, splitting the weights of the network model and sending the resulting sub-weights to different AI chip cores.
5. The method of claim 1, wherein generating a continuous calculation flow according to the hardware architecture parameters and the network model comprises:
in response to generating a new calculation flow, obtaining the current computing-power usage of each time period and assigning the new calculation flow to the time period with the lowest current computing-power usage.
6. The method of claim 1, further comprising:
the network model is retrained to crop and compress the network model.
7. The method of claim 1, further comprising:
combining like terms among the calculations in the computation graph of the network model so as to simplify the calculation process.
8. A system for compiling an AI chip, comprising:
a quantization module configured to quantize the network model of the AI chip and perform precision adjustment on the quantized network model based on the weights of the network model;
a calculation module configured to distribute, in the precision-adjusted network model, the calculation of the convolutional neural network to different MAC calculation units of the AI chip so as to optimize the calculation flow of the network model; and
a flow module configured to acquire hardware architecture parameters and generate a continuous calculation flow according to the hardware architecture parameters and the network model.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202011083320.XA 2020-10-12 2020-10-12 Method, system, device and medium for compiling AI chip Withdrawn CN112232497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011083320.XA CN112232497A (en) 2020-10-12 2020-10-12 Method, system, device and medium for compiling AI chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011083320.XA CN112232497A (en) 2020-10-12 2020-10-12 Method, system, device and medium for compiling AI chip

Publications (1)

Publication Number Publication Date
CN112232497A true CN112232497A (en) 2021-01-15

Family

ID=74112076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011083320.XA Withdrawn CN112232497A (en) 2020-10-12 2020-10-12 Method, system, device and medium for compiling AI chip

Country Status (1)

Country Link
CN (1) CN112232497A (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12020476B2 (en) 2017-03-23 2024-06-25 Tesla, Inc. Data synthesis for autonomous control systems
US11487288B2 (en) 2017-03-23 2022-11-01 Tesla, Inc. Data synthesis for autonomous control systems
US11403069B2 (en) 2017-07-24 2022-08-02 Tesla, Inc. Accelerated mathematical engine
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US12086097B2 (en) 2017-07-24 2024-09-10 Tesla, Inc. Vector computational unit
US11681649B2 (en) 2017-07-24 2023-06-20 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11797304B2 (en) 2018-02-01 2023-10-24 Tesla, Inc. Instruction set architecture for a vector computational unit
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11734562B2 (en) 2018-06-20 2023-08-22 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11841434B2 (en) 2018-07-20 2023-12-12 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US12079723B2 (en) 2018-07-26 2024-09-03 Tesla, Inc. Optimizing neural network structures for embedded systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11983630B2 (en) 2018-09-03 2024-05-14 Tesla, Inc. Neural networks for embedded devices
US11893774B2 (en) 2018-10-11 2024-02-06 Tesla, Inc. Systems and methods for training machine models with augmented data
US11665108B2 (en) 2018-10-25 2023-05-30 Tesla, Inc. QoS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11908171B2 (en) 2018-12-04 2024-02-20 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US11748620B2 (en) 2019-02-01 2023-09-05 Tesla, Inc. Generating ground truth for machine learning from time series elements
US12014553B2 (en) 2019-02-01 2024-06-18 Tesla, Inc. Predicting three-dimensional features for autonomous driving
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US11790664B2 (en) 2019-02-19 2023-10-17 Tesla, Inc. Estimating object properties using visual image data
CN113918507A (en) * 2021-12-09 2022-01-11 之江实验室 Method and device for adapting deep learning framework to AI acceleration chip
CN114004352A (en) * 2021-12-31 2022-02-01 杭州雄迈集成电路技术股份有限公司 Simulation implementation method, neural network compiler and computer readable storage medium
US12136030B2 (en) 2023-03-16 2024-11-05 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
CN117851270A (en) * 2024-03-07 2024-04-09 中国电子科技集团公司第十五研究所 Method and device for testing system-on-chip compiler, electronic equipment and storage medium
CN117851270B (en) * 2024-03-07 2024-05-03 中国电子科技集团公司第十五研究所 Method and device for testing system-on-chip compiler, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112232497A (en) Method, system, device and medium for compiling AI chip
CN107563512B (en) Data processing method, device and storage medium
CN115659281B (en) Method and device for fusing adaptive acceleration operators
CN110058883A (en) A kind of CNN accelerated method and system based on OPU
CN113703775B (en) Compiling method, compiling device, compiling equipment and storage medium
CN109656544B (en) Cloud service API (application program interface) adaptation method based on execution path similarity
CN108786112B (en) Application scene configuration method, device and storage medium
CN111527501A (en) Chip adaptation determining method and related product
CN116702835A (en) Neural network reasoning acceleration method, target detection method, device and storage medium
CN110750298B (en) AI model compiling method, equipment and storage medium
US20220172044A1 (en) Method, electronic device, and computer program product for deploying machine learning model
CN113204373A (en) Operation method, device and related product
CN115829006A (en) Compiling method and device of neural network model, electronic equipment and storage medium
CN115525436A (en) Model deployment and operation method and device, offline analysis tool and electronic equipment
CN117196000A (en) Edge side model reasoning acceleration method for containerized deployment
CN113885845B (en) Calculation map generation method, system, equipment and medium of deep learning compiler
Grimaldi et al. Optimality assessment of memory-bounded convnets deployed on resource-constrained risc cores
WO2022246986A1 (en) Data processing method, apparatus and device, and computer-readable storage medium
CN110968404B (en) Equipment data processing method and device
CN114936015A (en) Deep learning compiler based on hardware computation graph
CN112015426B (en) Code management method, device and equipment
CN116560666B (en) AI front end unified computing method, device and medium based on multi-level code generation
CN112001494A (en) Method for realizing support of FPGA (field programmable Gate array) back-end equipment by nGraph framework
CN117271057A (en) Large model deployment method, device and product based on server non-perception calculation
CN114217881B (en) Task unloading method and related device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210115

WW01 Invention patent application withdrawn after publication