Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SUM function has calculation precision problems #1471

Open
KarasKsenia opened this issue Jun 3, 2024 · 5 comments
Open

SUM function has calculation precision problems #1471

KarasKsenia opened this issue Jun 3, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@KarasKsenia
Copy link

EPPlus usage

Commercial use (I have a commercial license)

Environment

Windows

Epplus version

7.1.3

Spreadsheet application

Excel

Description

Hi everybody!

We discussed this issue previously, all the info can be seen here #767. The main case is that SUM function had precision error and provided unexpected results, for example when adding 0.2 and 21.9, the result was less than 22.1.
The KahanSum algorithm was implemented in the new major version of EPPlus (v7) to handle this kind of issues. Unfortunately after some simple testing it looks like the issue wasn't handled and there is still no way to get stable precise sum results for very simple equations. The test, that we used, is here:

[TestMethod]
public void Sum_must_not_have_double_precision_error_issue_766()
{
    using var p = new ExcelPackage();
    p.Workbook.Worksheets.Add("first");

    var sheet = p.Workbook.Worksheets.First();

    sheet.Cells["A1"].Value = 0.4;
    sheet.Cells["B1"].Value = 44.4;
    sheet.Cells["D1"].Value = 44.8;
    sheet.Cells["E1"].Formula = "=D1-SUM(A1:B1)";

    sheet.Cells["A2"].Value = 0.2;
    sheet.Cells["B2"].Value = 21.9;
    sheet.Cells["D2"].Value = 22.1;
    sheet.Cells["E2"].Formula = "=D2-SUM(A2:B2)";

    sheet.Cells["A3"].Formula = "=E1-E2";

    p.Workbook.Calculate();

    Assert.AreEqual("0", sheet.Cells["E1"].Value.ToString());
    Assert.AreEqual("0", sheet.Cells["E2"].Value.ToString());
    Assert.AreEqual("0", sheet.Cells["A3"].Value.ToString());
}

The second Assert fails because the sum of 0.2 and 21.9 is somehow not equal to 22.1, as expected.
I checked the source code, and despite the algorithm looks perfect and basically works, the main part of the precision error handling seems not working. I'm talking of this part:
image
Unfortunately, these low-order digits (variable _c), which are lost after adding up two doubles, have no impact on the value of the KahanSum. The returned result is always equal to the value of the _sum variable, no matter whether _c has a value greater than 0 or not.

@KarasKsenia KarasKsenia added the bug Something isn't working label Jun 3, 2024
@OssianEPPlus
Copy link
Contributor

Thanks for your report! We've verifed the issue and are looking into it.

@swmal
Copy link
Contributor

swmal commented Jun 3, 2024

Hi @KarasKsenia,

Kahansum does unfortunately not remove all IEEE 754 floating point errors, what it does best is to avoid that the floating point error grows when summarizing larger datasets. When using Assert.AreEqual or similar evaluation methods that requires a perfect match you can still get these small deviations when using System.Double.

When we are checking for equality with doubles that has been used in numeric operations ourselves we rather use a method like this:

static double precision = 0.000000000000001;

bool Equals(double lnum, double rnum)
{
   return (Math.Abs(lnum - rnum) < precision);
}

@KarasKsenia
Copy link
Author

Hi, thanks for your answer!

Is there any chance that in future SUM function will use decimal instead of double? I suppose KahanSum works good for larger datasets, but basically SUM function should work as expected for any dataset size.
I understand that current users have already customized their products for the current behavior. Maybe it is possible to create kind of setting for ExcelPackage to use decimals? So by default it uses doubles, but it would also be possible to make precise calculations by using decimals.

@swmal
Copy link
Contributor

swmal commented Jun 4, 2024

Hi,

We are not planning to replace System.Double with System.Decimal.

  1. The Decimal data type is not free from floating point issues either, even if the precision is higher.
  2. It is much slower in calculations than the double. In our tests System.Double performs approximately 5 times faster than Decimal.
  3. It consumes twice the amount of memory as the Double.
  4. Excel, Libre Office, etc uses the IEEE 754 double internally (a.k.a. System.Double), see https://en.wikipedia.org/wiki/Numeric_precision_in_Microsoft_Excel). Therefore the higher precision provided by the Decimal datatype will not be reflected in the spreadsheet program.
  5. We use the same data type in all our functions internally so the result from one function can be processed by another function without type conversion.

What we have discussed and might do in the future is to implement "Precision as displayed" for cell values. In such a feature we would round numeric values according to the format of the cell (which will have some performance issues but still might be valuable).

You can also replace the SUM function provided by EPPlus with your own custom implementation that uses Decimal instead of Double if you like. See this wiki link and the source code for our implementation of SUM.

@honoyang
Copy link

Hi all,
Recently I met a problem, maybe is related to the precision difference when calculate formula.

https://stackoverflow.com/questions/78580381/c-sharp-epplus-calculation-is-different-from-excel

I just wanted to provide some information for reference. (or this is a test case)
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants