Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Optimizing Int32 Primitive Parsers and clean up #1616

Merged
merged 19 commits into from
Jun 27, 2017

Conversation

ahsonkhan
Copy link
Member

@ahsonkhan ahsonkhan commented Jun 16, 2017

  • Also adding some performance tests and additional test cases.
  • Optimized Invariant UTF-8 and Non-Invariant (for Int32). I would imagine similar optimizations can be applied to the rest.

Here are the key optimizations:

  • TBD

image

image

cc @KrzysztofCwalina, @shiftylogic

Edit: Updated the results with latest changes

@ahsonkhan
Copy link
Member Author

cc @jkotas

Any thoughts on how I can improve the performance for small integers/strings for Invariant UTF8 TryParse?

public static bool TryParseInt32(ReadOnlySpan<byte> text, out int value, out int bytesConsumed)

I think we can get some more savings to avoid the ~10% perf regression for a single digit.

@jkotas
Copy link
Member

jkotas commented Jun 17, 2017

First, you should fix the buffer overruns in your unsafe code before trying to figure out how to optimize it.

I do not think you should be using unsafe code for number parsing. The overhead of the bounds check should be neglible - assuming that the JIT works as expected.

@jkotas
Copy link
Member

jkotas commented Jun 17, 2017

And yes ... I can be optimized. You can start by reading the first character from memory no more than once.

@jkotas
Copy link
Member

jkotas commented Jun 20, 2017

@alexandrnikitin is working on improving Int32 parser in CoreLib: dotnet/coreclr#12196. Some of the tricks can be shared.

@ahsonkhan
Copy link
Member Author

The PR is ready for review. I have updated the performance results in the original post.

if (text[0] == '-')
sbyte sign = 1;
int index = 0;
byte num = text[index];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will likely work better for this to be int, so that you pay for byte->int zero extension just once.

// Parse the first digit separately. If invalid here, we need to return false.
int firstDigit = text[indexOfFirstDigit] - 48; // '0'
if (firstDigit < 0 || firstDigit > 9)
if (num >= '0' && num <= '9')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use a trick like (num - '0') <= ('9' '- '0') to save an extra conditional branch in these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice if the JIT would do this (can't remember if there's an issue tracking it). Better yet, the C# compiler or an IL optimizer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's an issue tracking it

I am not able to find one (it was discussed in dotnet/coreclr#4881 and other PRs). @ahsonkhan Could you please make one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tyvm :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't remember if there's an issue tracking it

AFAIK there's no such issue. Though I happen to have an experimental JIT change that does this optimization. It remains to be seen what will come out of it.

@ahsonkhan ahsonkhan mentioned this pull request Jun 21, 2017
if (text.Length < 1)
{
charsConsumed = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default(int) ? is that just 0 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, though default implies it isn't necessarily set vs explicitly setting to 0 which suggests a chosen value? So semantically using value = default(int); makes more sense?

Copy link
Member Author

@ahsonkhan ahsonkhan Jun 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course. This code will likely change quite a bit, so I will address it then.
I will change it to value=default in the InvariantUTF8.TryParse case, since that is the candidate method I am optimizing.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsDigit(int i)
{
return i >= 0 && i <= 9;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(uint)i <= 9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? The first check should already have guaranteed the value is not negative.

Copy link
Member Author

@ahsonkhan ahsonkhan Jun 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private static bool IsDigitA(int i)
{
    return (uint)i <= 9;
}

private static bool IsDigitB(int i)
{
    return i >= 0 && i <= 9;
}

IsDigitA(Int32)
L0000: cmp ecx, 0x9
L0003: setbe al
L0006: movzx eax, al
L0009: ret

IsDigitB(Int32)
L0000: test ecx, ecx
L0002: jl L000e
L0004: cmp ecx, 0x9
L0007: setle al
L000a: movzx eax, al
L000d: ret
L000e: xor eax, eax
L0010: ret

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misread as return i >=0 && (uint)i <= 9 rather than just replace with return (uint)i <= 9.

}

// If parsedValue > (sbyte.MaxValue / 10), any more appended digits will cause overflow.
// if parsedValue == (sbyte.MaxValue / 10), any nextDigit greater than 7 or 8 (depending on sign) implies overflow.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the second line of the comment does not match the implementation. not sure which is right. the comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the part you are referring to?

// if parsedValue == (sbyte.MaxValue / 10)

Are you suggesting something like this:

// If parsedValue > (sbyte.MaxValue / 10), any more appended digits will cause overflow.
// Else, any nextDigit greater than 7 or 8 (depending on sign) implies overflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, just realized we need still need this check (i.e. the comment is correct):

if parsedValue == (sbyte.MaxValue / 10)

Will fix.

if (num >= '0' && num <= '9')
{
num -= (byte)'0';
if (WillOverFlow(answer, num, sign)) goto FalseExit;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very complicated. My suggestion would be

  • separate parsing of input longer than Int32OverflowLength into a separate method. Shorter numbers are much more common.
  • WillOverFlow seems pretty complex for what it does. a * 10 + b is not so expensive nowdays. Perhaps it is cheaper to just add and see if sign flips ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is basically the idea for parsing numbers longer than Int32OverflowLength, I wonder if it will work better:

        static void Main(string[] args)
        {
            for (long i = int.MaxValue - 1000; i < int.MaxValue + 10L; i++)
            {
                // lets parse "chars" 
                string chars = i.ToString();

                // this will need to come from parsing leading '-'
                int sign = 1;

                // here is our result
                int accumulator = 0;

                foreach (var c in chars)
                {
                    // will need to check if 'c' is a digit. Assume it is..
                    var d = c - '0';

                    // skip leading 0s
                    if ((d | accumulator) == 0)
                        continue;

                    // try add a digit
                    if (!TryAddNoOverflow(ref accumulator, d, sign))
                    {
                        System.Console.WriteLine("not representable " + (i));
                    }
                }
            }
        }

        // accumulate the value and ensure the sign
        // returns false when number is no longer representable in 32bit
        static bool TryAddNoOverflow(ref int value, int nextDigit, int sign)
        {
            var newValue = value * 10 + (nextDigit * sign);

            // opposite sign check for {newValue, sign}
            if ((newValue ^ sign) < 0)
            {
                return false;
            }

            value = newValue;
            return true;
        }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// opposite sign check for {newValue, sign}

This won't always give you the right result.

If value becomes > 2* int.MaxValue when a new digit gets added, the sign doesn't change.

TryAddNoOverflow(214748364, 7, 1); => Returns True, correct.
TryAddNoOverflow(514748364, 7, 1); => Returns True, wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, i did not realize that. Most of this solution would not be applicable then.

@ahsonkhan
Copy link
Member Author

@dotnet-bot test Innerloop OSX10.12 Debug Build and Test

1 similar comment
@ahsonkhan
Copy link
Member Author

@dotnet-bot test Innerloop OSX10.12 Debug Build and Test

@ahsonkhan
Copy link
Member Author

Here is what I have in mind (other suggestions are incorporated as well:

index++;
if (index < textLength) goto Done;
if (!IsDigit(text[index])) goto Done;

Should line 2 above be if (index <= textLength)? Otherwise, there is an argument out of range exception.
Also, after changing line 2, a test like this would fail: Parse("21474836471") - should return false, it returns true since if (lAnswer > Int32.MaxValue) is false.

@ahsonkhan
Copy link
Member Author

@jkotas, @VSadov
Thoughts on a loop unrolled version? It seems quite promising:
https://gist.github.com/ahsonkhan/dba0afb3304de15be844111d0f8a3ea6

@jkotas
Copy link
Member

jkotas commented Jun 23, 2017

Should line 2 above be if (index <= textLength)?

Yes, it should be if (index >= textLength).

@jkotas
Copy link
Member

jkotas commented Jun 23, 2017

Thoughts on a loop unrolled version? It seems quite promising:

The switch works great for microbenchmarks that are not sensitive to code size (the manual unrolling is like 10x code bloat); and that are always parsing numbers with exact same number of digits - the indirect jump that switch compiles into is well predicted. Once you throw more realistic mix on it - where the number of digits is not always the same and there is a bunch of other code running - it becomes not so good. We used to have similar switch in the memcpy implementation in CoreLib. dotnet/coreclr#9786 got rid of it because of it did not work for real workloads.

So it is tricky to see whether the switch is a good thing on microbenchmarks.

@jkotas
Copy link
Member

jkotas commented Jun 23, 2017

Loop unrolled version without a switch may be interesting though. Something like:

   if (index >= textLength) goto Done;
   num = text[index];
   if (!IsDigit(num)) goto Done;
   index++;
   answer = 10 * answer + num - '0';

   ... same block repeated ...

   if (index >= textLength) goto Done;
   num = text[index];
   if (!IsDigit(num)) goto Done;
   index++;
   answer = 10 * answer + num - '0';

   // Potential overflow
   if (index >= textLength) goto Done;
   num = text[index];
   if (!IsDigit(num)) goto Done;
   long lAnswer = (long)answer * 10 + num - '0';
   if (sign < 0)
   {
       if (lAnswer > (long)Int32.MaxInt + 1) goto FalseExit;
   }
   else
   {
       if (lAnswer > (long)Int32.MaxInt) goto FalseExit;
   }
   answer = (int)lAnswer;
   index++;

   if (index >= textLength) goto Done;
   if (!IsDigit(text[index])) goto Done;

   // Guaranteed overflow
   goto FalseExit;

You can also tune the code bloat factor by unrolling just the first few digits - where it improves the microbenchmark significantly, and run a regular loop for the rest - where the unrolling benefits starts to diminish.


int sign = 1;
if ((TextEncoder.Symbol)nextSymbol == TextEncoder.Symbol.MinusSign)
ref byte textByte = ref text.DangerousGetPinnableReference();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be using unsafe code either.

@jkotas
Copy link
Member

jkotas commented Jun 23, 2017

Any ideas on how such an optimization would work for Int64 parsing

The following modification should work equally well for both Int32 and Int64.

// Potential overflow 
if (index >= textLength) goto Done;
num = text[index];
if (!IsDigit(num)) goto Done; 
if (answer > Int32.MaxValue/10 + 1) goto FalseExit; // Overflow
answer = answer * 10 + num - '0'; 

if ((uint)answer > (uint)Int32.MaxValue + (-1 * sign + 1) / 2) goto FalseExit; // Overflow
index++;
if (index >= textLength) goto Done;
if (!IsDigit(text[index])) goto Done; 

// Guaranteed overflow 
goto FalseExit;

@jkotas
Copy link
Member

jkotas commented Jun 23, 2017

Is this optimization in the right direction? We save one conditional and substitute with some arithmetic instructions

It looks better - do you see any measurable perf difference? It will be different if you take the unification for Int32/Int64 above. And for 32-bit vs. 64-bit platforms.

It would be interesting to see the code for the full parse method - for both x86 and x64. Could you please post it into a gist once you get something you are happy with?

@ahsonkhan
Copy link
Member Author

The following modification should work equally well for both Int32 and Int64.

Ah, lol, leveraging unsigned int works :)

What about unsigned long parsing?

@jkotas
Copy link
Member

jkotas commented Jun 23, 2017

What about unsigned long parsing?

Check that the answer got smaller (same for uint32 parsing):

if (answer > UInt64.MaxValue/10 + 1) goto FalseExit; // Overflow
ulong answer2 = answer * 10 + num - '0'; 

if (answer2 < answer) goto FalseExit; // Overflow
answer = answer2;
index++;

BTW: long/ulong should not have the loop fully unrolled - having the same bit repeated ~18x would be too much, I think.

@ahsonkhan
Copy link
Member Author

BTW: long/ulong should have the loop fully unrolled - having the same bit repeated ~18x would be too much, I think.

should NOT have the loop fully unrolled?

@jkotas
Copy link
Member

jkotas commented Jun 23, 2017

right

@ahsonkhan
Copy link
Member Author

It will be different if you take the unification for Int32/Int64 above. And for 32-bit vs. 64-bit platforms.
It would be interesting to see the code for the full parse method - for both x86 and x64. Could you please post it into a gist once you get something you are happy with?

@jkotas, here is the data and disassembly:
x64: https://gist.github.com/ahsonkhan/dba0afb3304de15be844111d0f8a3ea6#gistcomment-2131566
x86: https://gist.github.com/ahsonkhan/dba0afb3304de15be844111d0f8a3ea6#gistcomment-2131567

@ahsonkhan
Copy link
Member Author

ahsonkhan commented Jun 24, 2017

Is the PR good to go or should I update the overflow checks to one of the following options?
(This is currently in the PR)

if (lAnswer > (long)Int32.MaxValue + (-1 * sign + 1) / 2) goto FalseExit;

(Option 1)

if (sign < 0)
{
    if (lAnswer > (long)Int32.MaxValue + 1) goto FalseExit;
}
else
{
    if (lAnswer > Int32.MaxValue) goto FalseExit;
}

(Option 2)

if ((uint)answer > (uint)Int32.MaxValue + (-1 * sign + 1) / 2) goto FalseExit;

(Option 3)

if (sign < 0)
{
    if ((uint)answer > (uint)Int32.MaxValue + 1) goto FalseExit;
}
else
{
    if ((uint)answer > Int32.MaxValue) goto FalseExit;
}

@jkotas
Copy link
Member

jkotas commented Jun 24, 2017

update the overflow checks to one of the following options?

I do not think that it matters a lot which option you pick. Option 2 is probably smallest/fastest code.

@jkotas
Copy link
Member

jkotas commented Jun 24, 2017

04BB16A4  cmp         edx,ebx  
04BB16A6  jge         04BB1923  
04BB16AC  cmp         edx,edi  
04BB16AE  jae         04BB1943  

This looks like that the caching of length in textLength local is hurting because of the JIT just burns a register for it. I think you should use text.Length instead everywhere, and get rid of the local.

Also, these bounds checks should not be duplicated. I know it was discussed before, but I do not remember the conclusion ... you may need to help the JIT to eliminate the redundant check by using unsigned comparison like if ((uint)index >= (uint)text.Length).

@ahsonkhan
Copy link
Member Author

ahsonkhan commented Jun 26, 2017

you may need to help the JIT to eliminate the redundant check by using unsigned comparison

That did not help remove the duplicated bounds check.

@ahsonkhan
Copy link
Member Author

ahsonkhan commented Jun 26, 2017

@shiftylogic, @jkotas - good to go?

@shiftylogic
Copy link
Contributor

This code makes assumptions that the values in the SymbolTable.Symbol enum are very specific. I suggest adding a comment to that enum stating that the parsing code depends on these specific value settings and that they should not be touched without being very careful of breaking the parsing code.

@ahsonkhan
Copy link
Member Author

I suggest adding a comment to that enum stating that the parsing code depends on these specific value settings and that they should not be touched without being very careful of breaking the parsing code.

I believe the tests would fail if the enums changed. I will definitely add a comment though.
However, this PR did not introduce this dependency on the SymbolTable, it already existed.

ahsonkhan and others added 3 commits June 26, 2017 16:51
* CaterburyPerf

* flush/compress

* huge changes

* space

* small

* flush/compress

* huge changes

* space

* small

* unsaved chagnes

* test fix depends on APIchanges

* resolve issues

* Less alocation, change State

* issue

* remove unnecessary

* resolve

* clean up changes, change state

* resolve

* guidelines

* fix bug

* resolve issues

* change access

* small issues
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants