Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-349] Histogram Operator #10931

Merged
merged 5 commits into from
Jun 25, 2018
Merged

[MXNET-349] Histogram Operator #10931

merged 5 commits into from
Jun 25, 2018

Conversation

haojin2
Copy link
Contributor

@haojin2 haojin2 commented May 14, 2018

Description

This PR implements histogram operator.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http:https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Implementation of internal _histogram operator
  • User-facing Python interface
  • Unit tests

Comments

The user-facing interface resembles basic behaviors of Numpy's histogram: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.histogram.html.
Three versions are supported:
histogram(x, bins=7) - divided into 7 equal bins in the range of [x.min, x.max]
histogram(x, bins=7, range=(a,b)) - divided into 7 equal bins in the range of [a, b], anything outside the range is ignored
histogram(x, bins=NDArray) - bin bounds are specified by the bins NDArray.

@haojin2 haojin2 requested a review from szha as a code owner May 14, 2018 08:18
@haojin2 haojin2 force-pushed the histogram branch 2 times, most recently from a1bf838 to 49a9ae4 Compare May 14, 2018 08:31
return _internal._histogram(data=a, bins=bins)
elif isinstance(bins, int):
if range_ is None:
range_ = (float(a.min().asnumpy()[0]), float(a.max().asnumpy()[0]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't do this in the frontend.

elif isinstance(bins, int):
if range_ is None:
range_ = (float(a.min().asnumpy()[0]), float(a.max().asnumpy()[0]))
return _internal._histogram(data=a, bins=array([]), bin_cnt=bins, range=range_)
Copy link
Contributor

@piiswrong piiswrong May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why make an invalid array/

Copy link
Contributor Author

@haojin2 haojin2 May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bins is not used in this version of _histogram, but it's one of the inputs so I need a mock array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define a mode variable in HistogramParam for distinguishing the cases of 1 and 2 ndarray inputs. When register the op, use set_num_inputs([](const NodeAttrs& attrs) to define the number of inputs based upon the mode.
See example:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/convolution.cc#L471

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.add_argument("bins", "NDArray-or-Symbol", "Input ndarray")
.add_arguments(HistogramParam::__FIELDS__());

NNVM_REGISTER_OP(_backward_histogram)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historgram op is not differentiable. Don't register backward op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will get rid of it.

@haojin2 haojin2 force-pushed the histogram branch 3 times, most recently from 7c44e7f to b17585a Compare May 14, 2018 20:50
@@ -3740,3 +3740,32 @@ def empty(shape, ctx=None, dtype=None):
if dtype is None:
dtype = mx_real_t
return NDArray(handle=_new_alloc_handle(shape, ctx, False, dtype))


def histogram(a, bins=10, range_=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why underscore suffix for range?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint complains if we override the keyword range

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piiswrong numpy.histogram has parameter range. Should we disable pylint check here to make mx.nd.histogram parameter naming consistent with numpy?

"""

# pylint: disable= no-member, protected-access
if isinstance(bins, NDArray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check whether bins is a 1d array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's checked inside the op

# pylint: disable= no-member, protected-access
if isinstance(bins, NDArray):
return _internal._histogram(data=a, bins=bins)
elif isinstance(bins, int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace int with integer_types.

def f(x, bins=10, range=None):
return np.histogram(x, bins, range=range)

shape = (3, 3, 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Randomize shape and test ndarrays from 1D-5D.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

x = rand_ndarray(shape, stype='default')
mx_bins = mx.nd.array([-1.0, 0.5, 2.0, 4.5, 50.0])
np_bins = mx_bins.asnumpy()
bin_cnt = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should test more settings of bin_cnt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

elif isinstance(bins, int):
if range_ is None:
range_ = (float(a.min().asnumpy()[0]), float(a.max().asnumpy()[0]))
return _internal._histogram(data=a, bins=array([]), bin_cnt=bins, range=range_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define a mode variable in HistogramParam for distinguishing the cases of 1 and 2 ndarray inputs. When register the op, use set_num_inputs([](const NodeAttrs& attrs) to define the number of inputs based upon the mode.
See example:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/convolution.cc#L471

SHAPE_ASSIGN_CHECK(*out_attrs, 1, in_attrs->at(1));
}

return out_attrs->at(0).ndim() == 1U && out_attrs->at(0).Size() != 0U &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call shape_is_none to simplify the code. ndim == 1 should have been guaranteed before this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


TYPE_ASSIGN_CHECK(*out_attrs, 0, mshadow::kInt64);
TYPE_ASSIGN_CHECK(*out_attrs, 1, in_attrs->at(0));
return out_attrs->at(0) != -1 && out_attrs->at(1) != -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call !type_is_none(out_attrs->at(i))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@haojin2 haojin2 force-pushed the histogram branch 4 times, most recently from 26a9d38 to c2a9f1d Compare May 30, 2018 20:38
@haojin2 haojin2 changed the title [MXNET-349] [WIP] Histogram Operator [MXNET-349] Histogram Operator May 30, 2018
@haojin2 haojin2 force-pushed the histogram branch 4 times, most recently from 240eecc to 55d6ccb Compare June 2, 2018 01:38
@haojin2
Copy link
Contributor Author

haojin2 commented Jun 2, 2018

@reminisce @piiswrong @anirudh2290 @rahul003 Unit tests added, this is ready, please give a review if you have time. Thanks!

res, bin_bounds = np.histogram(a.asnumpy(), bins=bins)
return array(res), array(bin_bounds)
return _internal._histogram(data=a, bin_cnt=bins, range=range)
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why return None? Shouldn't this cause an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I should throw an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an exception for the illegal input case.

@haojin2 haojin2 force-pushed the histogram branch 2 times, most recently from 39fe74e to c53e477 Compare June 4, 2018 23:47
.set_attr<nnvm::FInferShape>("FInferShape", HistogramOpShape)
.set_attr<nnvm::FInferType>("FInferType", HistogramOpType)
.set_attr<FCompute>("FCompute<cpu>", HistogramOpForward<cpu>)
.set_attr<nnvm::FInplaceOption>("FInplaceOption",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to register this if it's empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

template<typename cpu>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a template specialization? Same for other places.

template<>
void HistogramForwardImpl<cpu>()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

template<>
void HistogramForwardImpl<cpu>(const OpContext& ctx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this back to using OpContext? If using Stream for dispatching, you can get rid of the templates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not using OpContext, I'm just getting rid of the stream argument here.

@haojin2 haojin2 force-pushed the histogram branch 3 times, most recently from 8689ebd to ee38e0c Compare June 6, 2018 20:39
@haojin2
Copy link
Contributor Author

haojin2 commented Jun 7, 2018

@piiswrong Should be ready for merge, please take a look when you have time, thanks!

@piiswrong
Copy link
Contributor

So symbolic is not supported?

@haojin2
Copy link
Contributor Author

haojin2 commented Jun 12, 2018

@piiswrong There's no backward pass for this function, I wonder if symbolic is necessary in this case?

@haojin2
Copy link
Contributor Author

haojin2 commented Jun 22, 2018

@piiswrong symbol is added for histogram.

@haojin2 haojin2 force-pushed the histogram branch 2 times, most recently from 8bc8e18 to 4595fd9 Compare June 23, 2018 16:24
@piiswrong piiswrong merged commit ed7e360 into apache:master Jun 25, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
* implementation of histogram operator

* address code reviews and code re-design

* add exception for invalid inputs

* address code reviews

* add symbol and symbolic forward check for histogram
@haojin2 haojin2 deleted the histogram branch March 25, 2019 08:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants