[Improvement] Exclude data loading time when benchmarking speed #900

Open
wants to merge 2 commits into master

Conversation

xvjiarui (Collaborator)

Motivation

Exclude data loading time when benchmarking speed

Modification

Call data = next(data_loader_iter) explicitly.
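
A minimal sketch of the resulting timing structure (variable names follow the benchmark script; model and dataloader setup are omitted):

data_loader_iter = iter(data_loader)
for i in range(total_iters):
    # fetch the batch explicitly so data loading is not counted
    data = next(data_loader_iter)

    # only the forward pass is timed
    start_time = time.perf_counter()
    with torch.no_grad():
        model(return_loss=False, rescale=True, **data)
    elapsed = time.perf_counter() - start_time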

BC-breaking (Optional)

May need to re-run all inference speed benchmarks.

codecov bot commented Sep 23, 2021

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.67%. Comparing base (0b11d58) to head (a703956).
Report is 334 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #900      +/-   ##
==========================================
+ Coverage   89.05%   89.67%   +0.61%     
==========================================
  Files         112      114       +2     
  Lines        6060     6431     +371     
  Branches      970     1007      +37     
==========================================
+ Hits         5397     5767     +370     
+ Misses        468      462       -6     
- Partials      195      202       +7     
Flag Coverage Δ
unittests 89.67% <ø> (+0.61%) ⬆️

Flags with carried forward coverage won't be shown.


@RockeyCoss (Contributor)

Please merge the master branch into your branch, thank you.

@mfernezir (Contributor)

This version also excludes the time needed to move the data onto the GPU. Are you sure you want it like that? From a practical standpoint, an API receiving an image will also have to handle CPU-to-GPU placement along with inference.

I've combined what's here with the current master, but I've kept the CPU-to-GPU transfer inside the timed section, as in the master branch.

# Copyright (c) OpenMMLab. All rights reserved.

# Added modifications:

# Excludes data loading time as in https://github.com/open-mmlab/mmsegmentation/pull/900
# Preserves --repeat-times from the master branch
# Includes the time to place the data on the GPU

import argparse
import os.path as osp
import time

import mmcv
import numpy as np
import torch
from mmcv import Config
from mmcv.parallel import scatter
from mmcv.runner import load_checkpoint, wrap_fp16_model

from mmseg.datasets import build_dataloader, build_dataset
from mmseg.models import build_segmentor


def parse_args():
    parser = argparse.ArgumentParser(description='MMSeg benchmark a model')
    parser.add_argument('config', help='test config file path')
    parser.add_argument('checkpoint', help='checkpoint file')
    parser.add_argument(
        '--log-interval', type=int, default=50, help='interval of logging')
    parser.add_argument(
        '--work-dir',
        help=('if specified, the results will be dumped '
              'into the directory as json'))
    parser.add_argument('--repeat-times', type=int, default=1)
    args = parser.parse_args()
    return args


def main():
    args = parse_args()

    cfg = Config.fromfile(args.config)
    timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
    if args.work_dir is not None:
        mmcv.mkdir_or_exist(osp.abspath(args.work_dir))
        json_file = osp.join(args.work_dir, f'fps_{timestamp}.json')
    else:
        # use config filename as default work_dir if args.work_dir is None
        work_dir = osp.join('./work_dirs',
                            osp.splitext(osp.basename(args.config))[0])
        mmcv.mkdir_or_exist(osp.abspath(work_dir))
        json_file = osp.join(work_dir, f'fps_{timestamp}.json')

    repeat_times = args.repeat_times
    # disable cudnn benchmark so autotuning does not affect the timings
    torch.backends.cudnn.benchmark = False
    cfg.model.pretrained = None
    cfg.data.test.test_mode = True

    benchmark_dict = dict(config=args.config, unit='img / s')
    overall_fps_list = []
    for time_index in range(repeat_times):
        print(f'Run {time_index + 1}:')
        # build the dataloader
        # TODO: support multiple images per gpu (only minor changes are needed)
        dataset = build_dataset(cfg.data.test)
        data_loader = build_dataloader(
            dataset,
            samples_per_gpu=1,
            workers_per_gpu=0,
            dist=False,
            shuffle=False,
            persistent_workers=False)

        # build the model and load checkpoint
        cfg.model.train_cfg = None
        model = build_segmentor(cfg.model, test_cfg=cfg.get('test_cfg'))
        fp16_cfg = cfg.get('fp16', None)
        if fp16_cfg is not None:
            wrap_fp16_model(model)
        if 'checkpoint' in args and osp.exists(args.checkpoint):
            load_checkpoint(model, args.checkpoint, map_location='cpu')

        model = model.cuda()
        device = next(model.parameters()).device  # model device

        model.eval()

        # the first several iterations may be very slow so skip them
        num_warmup = 5
        pure_inf_time = 0
        total_iters = 200

        data_loader_iter = iter(data_loader)
        # benchmark with 200 images and take the average
        for i in range(total_iters):
            data = next(data_loader_iter)

            # torch.cuda.synchronize()
            start_time = time.perf_counter()
            data = scatter(data, [device])[0]

            with torch.no_grad():
                model(return_loss=False, rescale=True, **data)

            # torch.cuda.synchronize()
            elapsed = time.perf_counter() - start_time

            if i >= num_warmup:
                pure_inf_time += elapsed
                if (i + 1) % args.log_interval == 0:
                    fps = (i + 1 - num_warmup) / pure_inf_time
                    print(f'Done image [{i + 1:<3}/ {total_iters}], '
                          f'fps: {fps:.2f} img / s')

        fps = (total_iters - num_warmup) / pure_inf_time
        print(f'Overall fps: {fps:.2f} img / s')
        benchmark_dict[f'overall_fps_{time_index + 1}'] = round(fps, 2)
        overall_fps_list.append(fps)

    benchmark_dict['average_fps'] = round(np.mean(overall_fps_list), 2)
    benchmark_dict['fps_variance'] = round(np.var(overall_fps_list), 4)
    print(f'Average fps of {repeat_times} evaluations: '
          f'{benchmark_dict["average_fps"]}')
    print(f'The variance of {repeat_times} evaluations: '
          f'{benchmark_dict["fps_variance"]}')
    mmcv.dump(benchmark_dict, json_file, indent=4)


if __name__ == '__main__':
    main()
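
For reference, the adapted script can be run like the other MMSegmentation tools; the config and checkpoint paths below are placeholders:

python tools/benchmark.py \
    configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
    checkpoints/pspnet_r50.pth \
    --repeat-times 3 \
    --work-dir work_dirs/benchmark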
