This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Refactor ImageRecordIter #14824

Merged
merged 6 commits into apache:master from loader_cpu on May 5, 2019

Conversation

ZhennanQin
Contributor

Description

This PR brings the following changes to ImageRecordIter:

  • Add a new parameter dtype, making ImageRecordIter(dtype='uint8') equivalent to ImageRecordUInt8Iter.
  • Add a new optional parameter ctx, which indicates the device context the loader runs on. When ctx='cpu' is specified, a CPU-backend-optimized data loader is used.
  • Add a new CPU-backend-optimized implementation in which the data loader runs as an engine operator. Overall throughput improves, and the data-loading overhead can be profiled with the built-in profiler. A usage sketch follows this list.
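
A minimal usage sketch of the new parameters (the file path, data shape, and batch size below are placeholders; the profiler calls assume the standard mx.profiler API):

import mxnet as mx

# Hypothetical usage sketch: path_imgrec, data_shape, and batch_size are placeholders.
data_iter = mx.io.ImageRecordIter(
    path_imgrec='data/train.rec',   # placeholder RecordIO file
    data_shape=(3, 224, 224),
    batch_size=32,
    dtype='uint8',                  # new: behaves like ImageRecordUInt8Iter
    ctx='cpu')                      # new: selects the CPU-optimized backend

# Optionally profile the data-loading overhead with the built-in profiler.
mx.profiler.set_config(profile_all=True, filename='profile_dataloader.json')
mx.profiler.set_state('run')
for batch in data_iter:
    pass                            # consume batches as usual
mx.profiler.set_state('stop')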

@pengzhao-intel @TaoLv @xinyu-intel @anirudh2290

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@ZhennanQin ZhennanQin requested a review from szha as a code owner April 28, 2019 07:54
@pengzhao-intel pengzhao-intel added this to Review in progress in CPU Performance and Quantization Apr 28, 2019
Contributor

@pengzhao-intel pengzhao-intel left a comment


This change makes things more convenient for CPU users by avoiding the need to tune how many threads are used for data loading.

n_parsed_ = 0;
overflow = false;
rnd_.seed(kRandMagic + record_param_.seed);
int maxthread, threadget;
#pragma omp parallel
{
-  // be conservative, set number of real cores
-  maxthread = std::max(omp_get_num_procs() / 2 - 1, 1);
+  maxthread = std::max(omp_get_num_procs(), 1);
Contributor


Does this get the number of logical cores?

Contributor Author


Yes. I think we should let the user make the decision, just as with a normal operator. Before this change, the data loader would only use n/2 cores even with export OMP_NUM_THREADS=n; see the numeric sketch below.
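
A minimal numeric sketch of the effect, assuming omp_get_num_procs() reports 8 (mirroring the std::max expressions in the diff above):

num_procs = 8                              # assumed value of omp_get_num_procs()
old_threads = max(num_procs // 2 - 1, 1)   # previous cap: 3 threads
new_threads = max(num_procs, 1)            # new cap: 8 threads
print(old_threads, new_threads)            # prints: 3 8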

Contributor


Makes sense.

Member


Now that the iterator is pushed to the engine, will this OMP threading cause trouble?

@pengzhao-intel
Contributor

@wkcn could you help take a review?

CPU Performance and Quantization automation moved this from Review in progress to Reviewer approved Apr 29, 2019
Member

@wkcn wkcn left a comment


LGTM. Thanks for your contribution!

Could you please provide a performance comparison?

@pengzhao-intel
Contributor

@szha to confirm the API enhancement.

@roywei
Member

roywei commented Apr 29, 2019

@mxnet-label-bot add [Data-loading]

Contributor

@pengzhao-intel pengzhao-intel left a comment


LGTM

Let's wait a moment to see if there are other comments; otherwise, I will merge this PR within 24 hours.

@pengzhao-intel
Contributor

@xinyu-intel please help rebase the code and paste the performance data, as requested by @wkcn.

@szha szha requested a review from zhreshold May 1, 2019 20:28
Member

@zhreshold zhreshold left a comment


The wrapper looks neat.

@pengzhao-intel pengzhao-intel merged commit 621b391 into apache:master May 5, 2019
CPU Performance and Quantization automation moved this from Reviewer approved to Done May 5, 2019
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
* cpu optimized data loader

* Fix CI

* Fix CI

* Fix ci

* Fix doc
@ZhennanQin ZhennanQin deleted the loader_cpu branch May 31, 2019 02:07
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* cpu optimized data loader

* Fix CI

* Fix CI

* Fix ci

* Fix doc