[Improvement] Exclude data loading time when benchmarking speed #900

Open
wants to merge 2 commits into master

Conversation

xvjiarui (Collaborator)

Motivation

Exclude data loading time when benchmarking speed

Modification

Call data = next(data_loader_iter) explicitly.
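
A minimal sketch of the resulting timing structure (variable names follow the benchmark script; model and dataloader setup are omitted):

data_loader_iter = iter(data_loader)
for i in range(total_iters):
    # fetch the batch explicitly so data loading is not counted
    data = next(data_loader_iter)

    # only the forward pass is timed
    start_time = time.perf_counter()
    with torch.no_grad():
        model(return_loss=False, rescale=True, **data)
    elapsed = time.perf_counter() - start_time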

BC-breaking (Optional)

May need to re-run all inference speed benchmarks.

codecov bot commented Sep 23, 2021

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.67%. Comparing base (0b11d58) to head (a703956).
Report is 334 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #900      +/-   ##
==========================================
+ Coverage   89.05%   89.67%   +0.61%     
==========================================
  Files         112      114       +2     
  Lines        6060     6431     +371     
  Branches      970     1007      +37     
==========================================
+ Hits         5397     5767     +370     
+ Misses        468      462       -6     
- Partials      195      202       +7     
Flag Coverage Δ
unittests 89.67% <ø> (+0.61%) ⬆️

Flags with carried forward coverage won't be shown.


@RockeyCoss (Contributor)

Please merge the master branch into your branch, thank you.

@mfernezir (Contributor)

This version also excludes the time needed to move the data onto the GPU. Are you sure you want it like that? From a practical standpoint, an API receiving an image will also have to handle CPU-to-GPU placement along with inference.

I've combined what's here with the current master, but I've kept the CPU-to-GPU transfer inside the timed section, as in the master branch.

# Copyright (c) OpenMMLab. All rights reserved.

# Added modifications:

# Excludes data loading time as in https://github.com/open-mmlab/mmsegmentation/pull/900
# Preserves --repeat-times from the master branch
# Includes the time to place the data on the GPU

import argparse
import os.path as osp
import time

import mmcv
import numpy as np
import torch
from mmcv import Config
from mmcv.parallel import scatter
from mmcv.runner import load_checkpoint, wrap_fp16_model

from mmseg.datasets import build_dataloader, build_dataset
from mmseg.models import build_segmentor


def parse_args():
    parser = argparse.ArgumentParser(description='MMSeg benchmark a model')
    parser.add_argument('config', help='test config file path')
    parser.add_argument('checkpoint', help='checkpoint file')
    parser.add_argument(
        '--log-interval', type=int, default=50, help='interval of logging')
    parser.add_argument(
        '--work-dir',
        help=('if specified, the results will be dumped '
              'into the directory as json'))
    parser.add_argument('--repeat-times', type=int, default=1)
    args = parser.parse_args()
    return args


def main():
    args = parse_args()

    cfg = Config.fromfile(args.config)
    timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
    if args.work_dir is not None:
        mmcv.mkdir_or_exist(osp.abspath(args.work_dir))
        json_file = osp.join(args.work_dir, f'fps_{timestamp}.json')
    else:
        # use config filename as default work_dir if args.work_dir is None
        work_dir = osp.join('./work_dirs',
                            osp.splitext(osp.basename(args.config))[0])
        mmcv.mkdir_or_exist(osp.abspath(work_dir))
        json_file = osp.join(work_dir, f'fps_{timestamp}.json')

    repeat_times = args.repeat_times
    # disable cudnn benchmark so autotuning does not affect the timings
    torch.backends.cudnn.benchmark = False
    cfg.model.pretrained = None
    cfg.data.test.test_mode = True

    benchmark_dict = dict(config=args.config, unit='img / s')
    overall_fps_list = []
    for time_index in range(repeat_times):
        print(f'Run {time_index + 1}:')
        # build the dataloader
        # TODO: support multiple images per gpu (only minor changes are needed)
        dataset = build_dataset(cfg.data.test)
        data_loader = build_dataloader(
            dataset,
            samples_per_gpu=1,
            workers_per_gpu=0,
            dist=False,
            shuffle=False,
            persistent_workers=False)

        # build the model and load checkpoint
        cfg.model.train_cfg = None
        model = build_segmentor(cfg.model, test_cfg=cfg.get('test_cfg'))
        fp16_cfg = cfg.get('fp16', None)
        if fp16_cfg is not None:
            wrap_fp16_model(model)
        if 'checkpoint' in args and osp.exists(args.checkpoint):
            load_checkpoint(model, args.checkpoint, map_location='cpu')

        model = model.cuda()
        device = next(model.parameters()).device  # model device

        model.eval()

        # the first several iterations may be very slow so skip them
        num_warmup = 5
        pure_inf_time = 0
        total_iters = 200

        data_loader_iter = iter(data_loader)
        # benchmark with 200 images and take the average
        for i in range(total_iters):
            data = next(data_loader_iter)

            # torch.cuda.synchronize()
            start_time = time.perf_counter()
            data = scatter(data, [device])[0]

            with torch.no_grad():
                model(return_loss=False, rescale=True, **data)

            # torch.cuda.synchronize()
            elapsed = time.perf_counter() - start_time

            if i >= num_warmup:
                pure_inf_time += elapsed
                if (i + 1) % args.log_interval == 0:
                    fps = (i + 1 - num_warmup) / pure_inf_time
                    print(f'Done image [{i + 1:<3}/ {total_iters}], '
                          f'fps: {fps:.2f} img / s')

        fps = (total_iters - num_warmup) / pure_inf_time
        print(f'Overall fps: {fps:.2f} img / s')
        benchmark_dict[f'overall_fps_{time_index + 1}'] = round(fps, 2)
        overall_fps_list.append(fps)

    benchmark_dict['average_fps'] = round(np.mean(overall_fps_list), 2)
    benchmark_dict['fps_variance'] = round(np.var(overall_fps_list), 4)
    print(f'Average fps of {repeat_times} evaluations: '
          f'{benchmark_dict["average_fps"]}')
    print(f'The variance of {repeat_times} evaluations: '
          f'{benchmark_dict["fps_variance"]}')
    mmcv.dump(benchmark_dict, json_file, indent=4)


if __name__ == '__main__':
    main()
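
For reference, the adapted script can be run like the other MMSegmentation tools; the config and checkpoint paths below are placeholders:

python tools/benchmark.py \
    configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
    checkpoints/pspnet_r50.pth \
    --repeat-times 3 \
    --work-dir work_dirs/benchmark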
