SUM function has calculation precision problems #1471

KarasKsenia · 2024-06-03T11:02:31Z

EPPlus usage

Commercial use (I have a commercial license)

Environment

Windows

Epplus version

7.1.3

Spreadsheet application

Excel

Description

Hi everybody!

We discussed this issue previously, all the info can be seen here #767. The main case is that SUM function had precision error and provided unexpected results, for example when adding 0.2 and 21.9, the result was less than 22.1.
The KahanSum algorithm was implemented in the new major version of EPPlus (v7) to handle this kind of issues. Unfortunately after some simple testing it looks like the issue wasn't handled and there is still no way to get stable precise sum results for very simple equations. The test, that we used, is here:

[TestMethod]
public void Sum_must_not_have_double_precision_error_issue_766()
{
    using var p = new ExcelPackage();
    p.Workbook.Worksheets.Add("first");

    var sheet = p.Workbook.Worksheets.First();

    sheet.Cells["A1"].Value = 0.4;
    sheet.Cells["B1"].Value = 44.4;
    sheet.Cells["D1"].Value = 44.8;
    sheet.Cells["E1"].Formula = "=D1-SUM(A1:B1)";

    sheet.Cells["A2"].Value = 0.2;
    sheet.Cells["B2"].Value = 21.9;
    sheet.Cells["D2"].Value = 22.1;
    sheet.Cells["E2"].Formula = "=D2-SUM(A2:B2)";

    sheet.Cells["A3"].Formula = "=E1-E2";

    p.Workbook.Calculate();

    Assert.AreEqual("0", sheet.Cells["E1"].Value.ToString());
    Assert.AreEqual("0", sheet.Cells["E2"].Value.ToString());
    Assert.AreEqual("0", sheet.Cells["A3"].Value.ToString());
}

The second Assert fails because the sum of 0.2 and 21.9 is somehow not equal to 22.1, as expected.
I checked the source code, and despite the algorithm looks perfect and basically works, the main part of the precision error handling seems not working. I'm talking of this part:

Unfortunately, these low-order digits (variable _c), which are lost after adding up two doubles, have no impact on the value of the KahanSum. The returned result is always equal to the value of the _sum variable, no matter whether _c has a value greater than 0 or not.

The text was updated successfully, but these errors were encountered:

OssianEPPlus · 2024-06-03T11:21:18Z

Thanks for your report! We've verifed the issue and are looking into it.

swmal · 2024-06-03T11:52:53Z

Hi @KarasKsenia,

Kahansum does unfortunately not remove all IEEE 754 floating point errors, what it does best is to avoid that the floating point error grows when summarizing larger datasets. When using Assert.AreEqual or similar evaluation methods that requires a perfect match you can still get these small deviations when using System.Double.

When we are checking for equality with doubles that has been used in numeric operations ourselves we rather use a method like this:

static double precision = 0.000000000000001;

bool Equals(double lnum, double rnum)
{
   return (Math.Abs(lnum - rnum) < precision);
}

KarasKsenia · 2024-06-04T10:07:13Z

Hi, thanks for your answer!

Is there any chance that in future SUM function will use decimal instead of double? I suppose KahanSum works good for larger datasets, but basically SUM function should work as expected for any dataset size.
I understand that current users have already customized their products for the current behavior. Maybe it is possible to create kind of setting for ExcelPackage to use decimals? So by default it uses doubles, but it would also be possible to make precise calculations by using decimals.

swmal · 2024-06-04T11:36:25Z

Hi,

We are not planning to replace System.Double with System.Decimal.

The Decimal data type is not free from floating point issues either, even if the precision is higher.
It is much slower in calculations than the double. In our tests System.Double performs approximately 5 times faster than Decimal.
It consumes twice the amount of memory as the Double.
Excel, Libre Office, etc uses the IEEE 754 double internally (a.k.a. System.Double), see https://en.wikipedia.org/wiki/Numeric_precision_in_Microsoft_Excel). Therefore the higher precision provided by the Decimal datatype will not be reflected in the spreadsheet program.
We use the same data type in all our functions internally so the result from one function can be processed by another function without type conversion.

What we have discussed and might do in the future is to implement "Precision as displayed" for cell values. In such a feature we would round numeric values according to the format of the cell (which will have some performance issues but still might be valuable).

You can also replace the SUM function provided by EPPlus with your own custom implementation that uses Decimal instead of Double if you like. See this wiki link and the source code for our implementation of SUM.

honoyang · 2024-06-11T06:15:08Z

Hi all,
Recently I met a problem, maybe is related to the precision difference when calculate formula.

https://stackoverflow.com/questions/78580381/c-sharp-epplus-calculation-is-different-from-excel

I just wanted to provide some information for reference. (or this is a test case)
Thanks.

KarasKsenia added the bug Something isn't working label Jun 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SUM function has calculation precision problems #1471

SUM function has calculation precision problems #1471

KarasKsenia commented Jun 3, 2024

OssianEPPlus commented Jun 3, 2024

swmal commented Jun 3, 2024

KarasKsenia commented Jun 4, 2024

swmal commented Jun 4, 2024

honoyang commented Jun 11, 2024

SUM function has calculation precision problems #1471

SUM function has calculation precision problems #1471

Comments

KarasKsenia commented Jun 3, 2024

EPPlus usage

Environment

Epplus version

Spreadsheet application

Description

OssianEPPlus commented Jun 3, 2024

swmal commented Jun 3, 2024

KarasKsenia commented Jun 4, 2024

swmal commented Jun 4, 2024

honoyang commented Jun 11, 2024