An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design

MMBench: End-to-End Benchmarking Tool for Analyzing the Hardware-Software Implications of Multi-modal DNNs

Ⅰ. Introduction & Background

Multi-modal DNNs have become increasingly popular across application domains thanks to their significant accuracy improvements over state-of-the-art uni-modal DNNs.

(Figure: representative multi-modal DNN applications — self-driving, medical, multimedia, and robotics.)

To understand the implications of multi-modal DNNs for hardware-software co-design, we developed MMBench, an end-to-end benchmarking tool that evaluates the performance of multi-modal DNNs at both the architecture and system levels.

Ⅱ. Overview of MMBench

Proposed method

MMBench builds its profiling tools on profilers integrated with CPUs and NVIDIA GPUs, including the PyTorch profiler, Nsight Systems, and Nsight Compute. These tools enable researchers to comprehensively understand the execution of multi-modal DNNs. See the figure below for how they work together to analyze DNN performance.

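As an illustrative sketch (not taken from the MMBench code) of the operator-level data these profilers expose, the following snippet runs a small model under the PyTorch profiler on CPU and prints a per-operator time breakdown:

```python
# Minimal torch.profiler sketch, assuming only that PyTorch is installed.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(32, 64)

# Record CPU activity for one forward pass (add ProfilerActivity.CUDA on GPU)
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(x)

# Operator-level breakdown, sorted by total CPU time
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

Nsight Systems and Nsight Compute complement this view with system-level timelines and GPU kernel metrics, respectively.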

Unique features

In all, MMBench has the following unique features, closely tied to the characteristics of multi-modal DNNs, which distinguish it from general-purpose benchmarks:

  • Fine-grained network characterization
  • End-to-end application execution
  • User-friendly profiler integration

Ⅲ. Implementation Details

Workloads in MMBench

MMBench includes nine applications drawn from five important multi-modal research domains, as shown below, covering a wide range of today's multi-modal DNN workloads.

| Application | Domain | Size | Modalities | Unimodal models | Fusion models | Task type |
| --- | --- | --- | --- | --- | --- | --- |
| Avmnist | Multimedia | Small | Image, audio | CNN | Concat/Tensor | Classification |
| MMimdb | Multimedia | Medium | Image, text | CNN+Transformer | Concat/Tensor | Classification |
| CMU-MOSEI | Affective computing | Large | Language, vision, audio | CNN+Transformer | Concat/Tensor/Transformer | Regression |
| Sarcasm | Affective computing | Small | Language, vision, audio | CNN+Transformer | Concat/Tensor/Transformer | Classification |
| Medical VQA | Medical | Large | Image, text | CNN+Transformer | Transformer | Generation |
| Medical Segmentation | Medical | Large | MRI scans (T1, T1c, T2, FLAIR) | CNN+Transformer | Transformer | Segmentation |
| MuJoCo Push | Robotics | Medium | Image, force, proprioception, control | CNN+RNN | Concat/Tensor/Transformer | Classification |
| Vision & Touch | Robotics | Large | Image, force, proprioception, depth | CNN+RNN | Concat/Tensor | Classification |
| TransFuser | Autonomous driving | Large | Image, LiDAR | ResNet-34, ResNet-18 | Transformer | Classification |


Encoders, fusion and head methods

From the software perspective, the chosen applications employ many kinds of subnetworks (mainly as encoders), fusion methods, and head methods, which together constitute a complete multi-modal DNN.

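This encoder–fusion–head structure can be sketched as a minimal PyTorch module. All names below are hypothetical illustrations, not MMBench code; the example uses the simplest fusion method, concatenation:

```python
# Sketch of a two-modality network: per-modality encoders, concatenation
# fusion, and a classification head. Dimensions are arbitrary examples.
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    def __init__(self, img_dim=256, audio_dim=128, hidden=64, n_classes=10):
        super().__init__()
        # Unimodal encoders: one subnetwork per input modality
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        # Fusion by concatenation, followed by a task head
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, img, audio):
        fused = torch.cat([self.img_enc(img), self.audio_enc(audio)], dim=-1)
        return self.head(fused)

model = LateFusionNet()
logits = model(torch.randn(4, 256), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```

Tensor and transformer-based fusion replace the `torch.cat` step with an outer product or cross-modal attention, but the encoder/fusion/head decomposition stays the same.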

Ⅳ. Profiling Method and Code

Nsight Systems and Nsight Compute

Measurement scripts for Nsight Systems and Nsight Compute are provided in the scripts folder; follow the instructions there to run the experiments.

PyTorch Profiler

The code for measuring with the PyTorch profiler is contained in each application's own folder. Results are written to the log folder.

Ⅴ. Acknowledgement

Some code and applications were adapted from MultiBench.

Ⅵ. Contributors

Our team has been working on related technologies since 2018. Thanks to everyone who has contributed to this project.

Correspondence to:

Ⅶ. Related Publications

Characterizing and Understanding End-to-End Multi-modal Neural Networks on GPUs
Xiaofeng Hou, Cheng Xu, Jiacheng Liu, Xuehan Tang, Lingyu Sun, Chao Li and Kwang-Ting Cheng
IEEE Co