This repo contains the source code for the U-Boost NAS method presented in this paper. It jointly optimizes hardware resource utilization, task accuracy, and latency to maximize inference performance, and it estimates hardware utilization using a novel computational model for DNN accelerators.
- `src/search`: contains the code for the microarchitecture and channel search stages
- `src/vanilla`: contains the code for the final training stage
- `src/data`: contains the code for dataloaders and preprocessing for various datasets
- `src/hardware`: contains the code for the hardware model and cycle-accurate hardware simulations
This code implements the microarchitecture search as in DARTS-like methods and the channel search using the DMaskNAS method; a rough sketch of the DARTS-style relaxation is given below.
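As an illustration of that relaxation (the class and candidate set below are hypothetical, not the repo's actual search space in `src/search`), the discrete choice between candidate operations is replaced by a softmax-weighted sum over all of them, which makes the architecture parameters trainable by gradient descent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed operation: architecture parameters `alpha`
    weight the outputs of all candidate ops via a softmax."""
    def __init__(self, channels):
        super().__init__()
        # Hypothetical candidate set; the real one lives in src/search.
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Identity(),
        ])
        # One architecture parameter per candidate, learned jointly with the weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

After the search converges, the candidate with the largest `alpha` is kept; DMaskNAS-style channel search applies an analogous differentiable masking over channel counts.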
Computes the following runtime (in cycles) for convolutional cells (a runnable sketch of this computation is given after the variable list):

$$\text{cycles} = \left\lceil \frac{B \cdot h \cdot w}{s_1} \right\rceil \cdot \left\lceil \frac{f}{s_2} \right\rceil \cdot c \cdot k_1 \cdot k_2$$

with "matrixification" of the tensors, where:

- $B$ is the number of batches
- $h$ is the height of the input
- $w$ is the width of the input
- $c$ is the number of channels
- $f$ is the number of filters
- $k_1$ is one dimension of the kernel
- $k_2$ is the other dimension of the kernel
- $s_1$ is one dimension of the systolic array
- $s_2$ is the other dimension of the systolic array
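As a rough illustration (a hypothetical helper, not the actual model in `src/hardware`), the formula can be implemented directly with exact ceilings: the convolution is lowered to a matrix multiplication, and the number of tiles mapped onto the systolic array is multiplied by the cycles each tile occupies the array.

```python
import math

def conv_cycles(B, h, w, c, f, k1, k2, s1, s2):
    """Cycle estimate for a convolution lowered ("matrixified") to a
    (B*h*w) x (c*k1*k2) times (c*k1*k2) x f matrix multiplication,
    tiled over an s1 x s2 systolic array."""
    tiles = math.ceil(B * h * w / s1) * math.ceil(f / s2)
    return tiles * c * k1 * k2

# Example: batch 1, 56x56 input, 64 channels, 128 filters, 3x3 kernel, 32x32 array:
print(conv_cycles(1, 56, 56, 64, 128, 3, 3, 32, 32))  # 98 * 4 * 576 = 225792
```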
However, the ceil function is not differentiable and can only be used as a collection of point estimates. This hinders the neural architecture search, allowing only evolutionary or reinforcement learning methods, which require orders of magnitude more computational resources than differentiable methods. For this reason, the ceil function is replaced with a soft approximation, the smooth ceiling:

$$\widetilde{\lceil x \rceil} = \sum_{i} \frac{1}{1 + e^{-\beta (x - w_i)}}$$

for $w_i$ intervals between zero and a fixed value. This model corresponds more closely to the realistic case (see the sketch below).

TODO: explain the realistic case in more detail.
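A minimal differentiable sketch of such a smooth ceiling, assuming a sum of logistic steps at integer thresholds $w_i$ with a sharpness hyperparameter $\beta$ (my reading of the formula above, not necessarily the repo's exact parameterization):

```python
import torch

def smooth_ceil(x, max_val, beta=20.0):
    """Differentiable approximation of ceil(x) on (0, max_val]:
    a sum of sigmoid steps, one per threshold w_i = 0, 1, ..., max_val-1."""
    w = torch.arange(0.0, max_val)
    return torch.sigmoid(beta * (x.unsqueeze(-1) - w)).sum(-1)

x = torch.tensor([0.2, 1.0, 2.7], requires_grad=True)
print(smooth_ceil(x, 5))  # ~[1.0, 1.5, 3.0]; approaches ceil(x) as beta grows
```

Because every term is a sigmoid, gradients flow through the latency and utilization estimates, which is what makes gradient-based architecture search possible here.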
```bash
python main.py  # --help for information about optional arguments
```
If you use this code, please cite our paper.