MMBench: End-to-End Benchmarking Tool for Analyzing the Hardware-Software Implications of Multi-modal DNNs
Multi-modal DNNs have become increasingly popular across various application domains due to their significant accuracy improvements over SOTA uni-modal DNNs.
*Figure: multi-modal DNNs span application domains such as self-driving, medical, multimedia, and robotics.*
To understand the implications of multi-modal DNNs for hardware-software co-design, we have developed MMBench, an end-to-end benchmarking tool that evaluates the performance of multi-modal DNNs at both the architecture and system levels.
MMBench provides profiling tools built on the integrated profilers for both CPUs and NVIDIA GPUs, including the PyTorch Profiler, Nsight Systems, and Nsight Compute. Together, these tools enable researchers to comprehensively understand the execution of multi-modal DNNs.
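For instance, here is a minimal sketch (the model attributes and stage names are hypothetical placeholders, not MMBench APIs) of annotating a forward pass with NVTX ranges, so that Nsight Systems can attribute GPU activity to each multi-modal stage when the script is launched under `nsys profile`:

```python
import torch

# Hypothetical stage names; wrap whichever encoder/fusion/head calls your
# application uses. Run under Nsight Systems (e.g. `nsys profile python app.py`)
# to see these ranges on the GPU timeline.
def annotated_forward(model, img, audio):
    torch.cuda.nvtx.range_push("encoder_image")
    z_img = model.img_encoder(img)
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("encoder_audio")
    z_audio = model.audio_encoder(audio)
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("fusion_and_head")
    out = model.head(torch.cat([z_img, z_audio], dim=-1))
    torch.cuda.nvtx.range_pop()
    return out
```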
In all, MMBench offers the following unique features, closely tied to the characteristics of multi-modal DNNs, which distinguish it from general-purpose benchmarks:
- Fine-grained Network Characterization
- End-to-End Application Execution
- User-friendly Profiler Integration
MMBench includes nine applications drawn from the five most important multi-modal research domains, as shown below, covering a wide range of today's multi-modal DNN workloads.
Application | Domain | Size | Modalities | Unimodal models | Fusion models | Task type |
---|---|---|---|---|---|---|
Avmnist | Multimedia | Small | Image, audio | CNN | Concat/Tensor | Classification |
MMimdb | Multimedia | Medium | Image, text | CNN+Transformer | Concat/Tensor | Classification |
CMU-MOSEI | Affective computing | Large | Language, vision, audio | CNN+Transformer | Concat/Tensor/Transformer | Regression |
Sarcasm | Affective computing | Small | Language, vision, audio | CNN+Transformer | Concat/Tensor/Transformer | Classification |
Medical VQA | Medical | Large | Image, text | CNN+Transformer | Transformer | Generation |
Medical Segmentation | Medical | Large | MRI scans (T1, T1c, T2, FLAIR) | CNN+Transformer | Transformer | Segmentation |
MuJoCo Push | Robotics | Medium | Image, force, proprioception, control | CNN+RNN | Concat/Tensor/Transformer | Classification |
Vision & Touch | Robotics | Large | Image, force, proprioception, depth | CNN+RNN | Concat/Tensor | Classification |
TransFuser | Autonomous driving | Large | Image, LiDAR | ResNet-34, ResNet-18 | Transformer | Classification |
From a software perspective, the chosen applications cover many kinds of subnets (mainly used as encoders), fusion methods, and head methods, which together constitute a complete multi-modal DNN.
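As an illustration, here is a minimal sketch (not MMBench code; all names are hypothetical) of how these pieces compose, with two unimodal encoders, a fusion step, and a classification head:

```python
import torch
import torch.nn as nn

class TwoModalNet(nn.Module):
    """Toy multi-modal DNN: unimodal encoders -> fusion -> head."""
    def __init__(self, img_dim=64, audio_dim=64, hidden=128, n_classes=10):
        super().__init__()
        # Unimodal subnets (stand-ins for the CNN/transformer encoders above).
        self.img_encoder = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        # Classification head over the concatenated features ("Concat" fusion).
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, img, audio):
        z_img = self.img_encoder(img)
        z_audio = self.audio_encoder(audio)
        fused = torch.cat([z_img, z_audio], dim=-1)
        return self.head(fused)

def tensor_fusion(z_a, z_b):
    """"Tensor" fusion: batched outer product of two feature vectors,
    flattened to shape (batch, d_a * d_b)."""
    return torch.bmm(z_a.unsqueeze(2), z_b.unsqueeze(1)).flatten(1)

# Usage: logits = TwoModalNet()(torch.randn(8, 64), torch.randn(8, 64))
```

Swapping the fusion function (concat, tensor, or a transformer block) while keeping the encoders and head fixed is what the "Fusion models" column in the table above refers to.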
Nsight Systems and Nsight Compute measurement scripts are provided in the `scripts` folder; follow the instructions there to run experiments.
The code for measuring with the PyTorch Profiler is contained in each application's folder; results are written to the `log` folder.
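The per-application code varies, but a minimal sketch of the measurement pattern (the function name and arguments here are illustrative, not MMBench's exact interface) looks like:

```python
import os
import torch
from torch.profiler import profile, ProfilerActivity

def measure(model, inputs, logdir="log"):
    os.makedirs(logdir, exist_ok=True)
    # Record CPU and CUDA activity for one inference pass.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                 record_shapes=True) as prof:
        with torch.no_grad():
            model(*inputs)
    # Per-operator summary, sorted by GPU time.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
    # Chrome trace, viewable in chrome://tracing or Perfetto.
    prof.export_chrome_trace(os.path.join(logdir, "trace.json"))
```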
Some code and applications were adapted from MultiBench.
Our team has been working on related technologies since 2018. Thanks to everyone who has contributed to this project.
Correspondence to:
- Cheng Xu ([email protected])
- Xuehan Tang ([email protected])
- Jiacheng Liu ([email protected])
- Xiaofeng Hou ([email protected])
- Chao Li ([email protected])
- Jieping Ye ([email protected])
- Lingyu Sun ([email protected])
- Tongqiao Xu ([email protected])
- Peng Tang ([email protected])
- Guangya Li ([email protected])
- Yinglei Teng ([email protected])
- Tianhao Huang ([email protected])
- Xiaozhi Zhu ([email protected])
- Mo Niu ([email protected])
- Tianyu Zang ([email protected])
- Minyi Guo ([email protected])
Characterizing and Understanding End-to-End Multi-modal Neural Networks on GPUs
Xiaofeng Hou, Cheng Xu, Jiacheng Liu, Xuehan Tang, Lingyu Sun, Chao Li and Kwang-Ting Cheng
IEEE Computer Architecture Letters