
variance nonzero for constant array #9631

Open · gerritholl opened this issue Aug 30, 2017 · 10 comments

Comments

@gerritholl
Contributor

NumPy may give a nonzero variance (and thus standard deviation) for a constant array. This may be due to loss of numerical precision, but Python's built-in variance routine gives the correct answer of 0, so clearly the loss is preventable:

In [45]: x = [6715266981.538051]*10

In [46]: statistics.variance(x)
Out[46]: 0.0

In [47]: numpy.array(x).var()
Out[47]: 9.0949470177292824e-13

I think it would be highly desirable if x.std() and x.var() for a constant array could be relied on to be exactly zero. I'm aware one should not compare floating-point numbers, but zero is a bit of a special case.

@njsmith
Member

njsmith commented Aug 31, 2017

FWIW, statistics.variance appears to do all its arithmetic using exact arbitrary-precision fractions, which, in addition to being extremely slow, will probably blow up your memory on reasonably sized datasets.
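
For illustration, a rough sketch of what exact-rational arithmetic involves (my example, not the statistics module's actual code): every binary float converts exactly to a Fraction, so the mean and the squared deviations are computed without rounding, but each intermediate value carries an ever-growing numerator and denominator.

from fractions import Fraction

def exact_sample_variance(data):
    # Sample variance (ddof=1, like statistics.variance) in exact rational arithmetic.
    xs = [Fraction(v) for v in data]   # binary floats convert to Fraction exactly
    n = len(xs)
    mean = sum(xs) / n
    return sum((v - mean) ** 2 for v in xs) / (n - 1)

print(exact_sample_variance([6715266981.538051] * 10))   # 0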

@bashtage
Contributor

If you want to check for 0 variance, you should use ptp().
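
For reference, a minimal sketch of that guard (my example, not bashtage's code): ptp() is the peak-to-peak range (max minus min), which is exactly zero only when every element is identical, so it can short-circuit the variance to an exact zero.

import numpy as np

x = np.array([6715266981.538051] * 10)

# np.ptp (max - min) is exactly 0.0 iff all elements are equal,
# so a constant array can be mapped to an exact zero variance.
var = 0.0 if np.ptp(x) == 0.0 else x.var()
print(var)   # 0.0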

@gerritholl
Contributor Author

gerritholl commented Aug 31, 2017

I see; the statistics approach is indeed too high a price to pay for NumPy. I figured there might be a reformulated numerical algorithm where loss of precision does not cause this problem, but I might be wrong there.

I ran into this in a vectorised situation, where I would have noticed a bug in my code months ago if x.std() had been zero instead of small (due to things being infinite). I'm prepending a .ptp() check now to act correctly where ptp() is zero.

Perhaps I'm asking too much here. FWIW, I confirmed that Matlab also gives a nonzero answer (although not the same as numpy!).

@njsmith
Member

njsmith commented Aug 31, 2017

I'm not aware of any general-purpose numerical algorithms that can give that kind of precision guarantee. I think the only ways you could guarantee exact 0 from a variance calculation are by using an infinite precision calculation, or by doing a preprocessing pass to explicitly check for all the values being equal. I guess the latter isn't even necessarily all that expensive in common cases where you can bail out after checking a few values, but it's extra complexity (esp. in the vectorized case) for a pretty specific and unusual case. I dunno.
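
To make the bail-out idea concrete (a hypothetical wrapper of my own, not a NumPy proposal), the check can probe a few elements before paying for a full comparison:

import numpy as np

def var_with_constant_check(a, probe=8):
    # Hypothetical wrapper: return an exact 0.0 for constant input,
    # otherwise fall back to the usual calculation.
    a = np.asarray(a).ravel()
    if a.size == 0:
        return np.var(a)   # keep NumPy's behaviour for the empty case
    # Cheap probe first: most non-constant arrays already differ in the
    # first few elements, so the full comparison is rarely reached.
    if np.all(a[:probe] == a[0]) and np.all(a == a[0]):
        return 0.0
    return np.var(a)

print(var_with_constant_check([6715266981.538051] * 10))   # 0.0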

@WarrenWeckesser
Member

> I'm not aware of any general-purpose numerical algorithms that can give that kind of precision guarantee.

FWIW, it is not hard to show that Welford's algorithm [1] will give a variance that is exactly 0 when the input is constant.
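
A minimal pure-Python sketch of Welford's update (my illustration, not NumPy's implementation): for constant input the running mean equals the data exactly after the first element, so every subsequent increment of the sum of squared deviations is exactly zero.

import numpy as np

def welford_var(a):
    # One-pass (online) population variance, matching np.var's default ddof=0.
    a = np.asarray(a, dtype=float).ravel()
    if a.size == 0:
        return np.nan
    mean = 0.0
    m2 = 0.0
    for n, x in enumerate(a, start=1):
        delta = x - mean
        mean += delta / n          # exact for constant input: mean == x after step 1
        m2 += delta * (x - mean)   # delta is exactly 0.0 from step 2 onwards
    return m2 / n

print(welford_var([6715266981.538051] * 10))   # 0.0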

NumPy uses a two-pass algorithm (compute the mean, then average the squared deviations), and the underlying problem appears to be that NumPy's calculation of the mean of x is off by one ULP:

In [258]: x = np.array([6715266981.538051]*10)

In [259]: np.mean(x)
Out[259]: 6715266981.5380497

In [260]: np.mean(x) == x[0]
Out[260]: False
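
For what it's worth, the numbers line up with a one-ULP error in the mean (my back-of-the-envelope check, not from the thread): the spacing of doubles near 6.7e9 is 2**-20, and the variance reported at the top is 2**-40, i.e. exactly that error squared.

import numpy as np

x = np.array([6715266981.538051] * 10)
print(np.spacing(x[0]))          # one ULP at this magnitude: 2**-20 ≈ 9.54e-07
print(x[0] - np.mean(x))         # the computed mean is low by one ULP
print((x[0] - np.mean(x)) ** 2)  # 2**-40 ≈ 9.09e-13, matching x.var() above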

[1] https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm

@njsmith
Member

njsmith commented Sep 1, 2017

Ah, you're quite right -- I actually checked that one before posting, but misread the details :-). Do we know how it compares to our current algorithm in terms of speed or accuracy?

@WarrenWeckesser
Member

I don't know about the speed. John D. Cook blogged about the accuracy: https://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/

@sk1p

sk1p commented Jan 18, 2019

Related paper with accuracy and runtime comparisons of various methods: https://dl.acm.org/citation.cfm?id=3223036

@mhvk
Contributor

mhvk commented Jan 19, 2019

@sk1p - thanks for the link! I think it would be great to use that one-pass algorithm, ideally as a new gufunc that would calculate (weighted) mean and variance in one go.
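
For context, a rough Python sketch of what such a kernel could compute per call (the weighted incremental update usually attributed to West; this is my illustration of the inner loop, not the proposed gufunc and not any existing NumPy API):

import numpy as np

def weighted_mean_var(x, w):
    # One-pass weighted mean and (population-style, frequency-weight) variance.
    x = np.asarray(x, dtype=float).ravel()
    w = np.asarray(w, dtype=float).ravel()
    w_sum = 0.0
    mean = 0.0
    s = 0.0
    for xi, wi in zip(x, w):
        w_sum += wi
        delta = xi - mean
        mean += (wi / w_sum) * delta
        s += wi * delta * (xi - mean)
    if w_sum == 0.0:
        return np.nan, np.nan
    return mean, s / w_sum

mean, var = weighted_mean_var([6715266981.538051] * 10, np.ones(10))
print(mean, var)   # the constant value itself, and exactly 0.0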

@mhvk
Contributor

mhvk commented Mar 30, 2019

Following up on this: a quick gufunc implementation by @0x0L of the algorithm referenced by @sk1p above correctly gives a variance of zero for the example at the top of this issue.
