-
Notifications
You must be signed in to change notification settings - Fork 345
Optimizing Int32 Primitive Parsers and clean up #1616
Conversation
cc @jkotas Any thoughts on how I can improve the performance for small integers/strings for Invariant UTF8 TryParse? public static bool TryParseInt32(ReadOnlySpan<byte> text, out int value, out int bytesConsumed) I think we can get some more savings to avoid the ~10% perf regression for a single digit. |
First, you should fix the buffer overruns in your unsafe code before trying to figure out how to optimize it. I do not think you should be using unsafe code for number parsing. The overhead of the bounds check should be neglible - assuming that the JIT works as expected. |
And yes ... I can be optimized. You can start by reading the first character from memory no more than once. |
@alexandrnikitin is working on improving Int32 parser in CoreLib: dotnet/coreclr#12196. Some of the tricks can be shared. |
The PR is ready for review. I have updated the performance results in the original post. |
if (text[0] == '-') | ||
sbyte sign = 1; | ||
int index = 0; | ||
byte num = text[index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will likely work better for this to be int
, so that you pay for byte->int zero extension just once.
// Parse the first digit separately. If invalid here, we need to return false. | ||
int firstDigit = text[indexOfFirstDigit] - 48; // '0' | ||
if (firstDigit < 0 || firstDigit > 9) | ||
if (num >= '0' && num <= '9') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use a trick like (num - '0') <= ('9' '- '0')
to save an extra conditional branch in these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if the JIT would do this (can't remember if there's an issue tracking it). Better yet, the C# compiler or an IL optimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's an issue tracking it
I am not able to find one (it was discussed in dotnet/coreclr#4881 and other PRs). @ahsonkhan Could you please make one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logged https://github.com/dotnet/roslyn/issues/20375 on the C# side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tyvm :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't remember if there's an issue tracking it
AFAIK there's no such issue. Though I happen to have an experimental JIT change that does this optimization. It remains to be seen what will come out of it.
if (text.Length < 1) | ||
{ | ||
charsConsumed = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default(int)
? is that just 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, though default
implies it isn't necessarily set vs explicitly setting to 0
which suggests a chosen value? So semantically using value = default(int);
makes more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, of course. This code will likely change quite a bit, so I will address it then.
I will change it to value=default
in the InvariantUTF8.TryParse case, since that is the candidate method I am optimizing.
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
private static bool IsDigit(int i) | ||
{ | ||
return i >= 0 && i <= 9; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(uint)i <= 9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? The first check should already have guaranteed the value is not negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static bool IsDigitA(int i)
{
return (uint)i <= 9;
}
private static bool IsDigitB(int i)
{
return i >= 0 && i <= 9;
}
IsDigitA(Int32)
L0000: cmp ecx, 0x9
L0003: setbe al
L0006: movzx eax, al
L0009: ret
IsDigitB(Int32)
L0000: test ecx, ecx
L0002: jl L000e
L0004: cmp ecx, 0x9
L0007: setle al
L000a: movzx eax, al
L000d: ret
L000e: xor eax, eax
L0010: ret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I misread as return i >=0 && (uint)i <= 9
rather than just replace with return (uint)i <= 9
.
} | ||
|
||
// If parsedValue > (sbyte.MaxValue / 10), any more appended digits will cause overflow. | ||
// if parsedValue == (sbyte.MaxValue / 10), any nextDigit greater than 7 or 8 (depending on sign) implies overflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the second line of the comment does not match the implementation. not sure which is right. the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the part you are referring to?
// if parsedValue == (sbyte.MaxValue / 10)
Are you suggesting something like this:
// If parsedValue > (sbyte.MaxValue / 10), any more appended digits will cause overflow.
// Else, any nextDigit greater than 7 or 8 (depending on sign) implies overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, just realized we need still need this check (i.e. the comment is correct):
if parsedValue == (sbyte.MaxValue / 10)
Will fix.
if (num >= '0' && num <= '9') | ||
{ | ||
num -= (byte)'0'; | ||
if (WillOverFlow(answer, num, sign)) goto FalseExit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very complicated. My suggestion would be
- separate parsing of input longer than Int32OverflowLength into a separate method. Shorter numbers are much more common.
- WillOverFlow seems pretty complex for what it does.
a * 10 + b
is not so expensive nowdays. Perhaps it is cheaper to just add and see if sign flips ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is basically the idea for parsing numbers longer than Int32OverflowLength, I wonder if it will work better:
static void Main(string[] args)
{
for (long i = int.MaxValue - 1000; i < int.MaxValue + 10L; i++)
{
// lets parse "chars"
string chars = i.ToString();
// this will need to come from parsing leading '-'
int sign = 1;
// here is our result
int accumulator = 0;
foreach (var c in chars)
{
// will need to check if 'c' is a digit. Assume it is..
var d = c - '0';
// skip leading 0s
if ((d | accumulator) == 0)
continue;
// try add a digit
if (!TryAddNoOverflow(ref accumulator, d, sign))
{
System.Console.WriteLine("not representable " + (i));
}
}
}
}
// accumulate the value and ensure the sign
// returns false when number is no longer representable in 32bit
static bool TryAddNoOverflow(ref int value, int nextDigit, int sign)
{
var newValue = value * 10 + (nextDigit * sign);
// opposite sign check for {newValue, sign}
if ((newValue ^ sign) < 0)
{
return false;
}
value = newValue;
return true;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// opposite sign check for {newValue, sign}
This won't always give you the right result.
If value becomes > 2* int.MaxValue when a new digit gets added, the sign doesn't change.
TryAddNoOverflow(214748364, 7, 1); => Returns True, correct.
TryAddNoOverflow(514748364, 7, 1); => Returns True, wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, i did not realize that. Most of this solution would not be applicable then.
@dotnet-bot test Innerloop OSX10.12 Debug Build and Test |
1 similar comment
@dotnet-bot test Innerloop OSX10.12 Debug Build and Test |
index++;
if (index < textLength) goto Done;
if (!IsDigit(text[index])) goto Done; Should line 2 above be |
@jkotas, @VSadov |
Yes, it should be |
The switch works great for microbenchmarks that are not sensitive to code size (the manual unrolling is like 10x code bloat); and that are always parsing numbers with exact same number of digits - the indirect jump that switch compiles into is well predicted. Once you throw more realistic mix on it - where the number of digits is not always the same and there is a bunch of other code running - it becomes not so good. We used to have similar switch in the memcpy implementation in CoreLib. dotnet/coreclr#9786 got rid of it because of it did not work for real workloads. So it is tricky to see whether the switch is a good thing on microbenchmarks. |
Loop unrolled version without a switch may be interesting though. Something like:
You can also tune the code bloat factor by unrolling just the first few digits - where it improves the microbenchmark significantly, and run a regular loop for the rest - where the unrolling benefits starts to diminish. |
|
||
int sign = 1; | ||
if ((TextEncoder.Symbol)nextSymbol == TextEncoder.Symbol.MinusSign) | ||
ref byte textByte = ref text.DangerousGetPinnableReference(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be using unsafe code either.
The following modification should work equally well for both Int32 and Int64.
|
It looks better - do you see any measurable perf difference? It will be different if you take the unification for Int32/Int64 above. And for 32-bit vs. 64-bit platforms. It would be interesting to see the code for the full parse method - for both x86 and x64. Could you please post it into a gist once you get something you are happy with? |
Ah, lol, leveraging unsigned int works :) What about unsigned long parsing? |
Check that the answer got smaller (same for uint32 parsing):
BTW: long/ulong should not have the loop fully unrolled - having the same bit repeated ~18x would be too much, I think. |
should NOT have the loop fully unrolled? |
right |
@jkotas, here is the data and disassembly: |
Is the PR good to go or should I update the overflow checks to one of the following options? if (lAnswer > (long)Int32.MaxValue + (-1 * sign + 1) / 2) goto FalseExit; (Option 1) if (sign < 0)
{
if (lAnswer > (long)Int32.MaxValue + 1) goto FalseExit;
}
else
{
if (lAnswer > Int32.MaxValue) goto FalseExit;
} (Option 2) if ((uint)answer > (uint)Int32.MaxValue + (-1 * sign + 1) / 2) goto FalseExit; (Option 3) if (sign < 0)
{
if ((uint)answer > (uint)Int32.MaxValue + 1) goto FalseExit;
}
else
{
if ((uint)answer > Int32.MaxValue) goto FalseExit;
} |
I do not think that it matters a lot which option you pick. Option 2 is probably smallest/fastest code. |
This looks like that the caching of length in textLength local is hurting because of the JIT just burns a register for it. I think you should use Also, these bounds checks should not be duplicated. I know it was discussed before, but I do not remember the conclusion ... you may need to help the JIT to eliminate the redundant check by using unsigned comparison like |
That did not help remove the duplicated bounds check. |
@shiftylogic, @jkotas - good to go? |
This code makes assumptions that the values in the SymbolTable.Symbol enum are very specific. I suggest adding a comment to that enum stating that the parsing code depends on these specific value settings and that they should not be touched without being very careful of breaking the parsing code. |
I believe the tests would fail if the enums changed. I will definitely add a comment though. |
* CaterburyPerf * flush/compress * huge changes * space * small * flush/compress * huge changes * space * small * unsaved chagnes * test fix depends on APIchanges * resolve issues * Less alocation, change State * issue * remove unnecessary * resolve * clean up changes, change state * resolve * guidelines * fix bug * resolve issues * change access * small issues
Here are the key optimizations:
cc @KrzysztofCwalina, @shiftylogic
Edit: Updated the results with latest changes