This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Optimize Buffer.MemoryCopy #9786

Merged: 1 commit merged into dotnet:master on Mar 4, 2017

Conversation

@vkvenkat commented Feb 25, 2017

This PR optimizes Buffer.MemoryCopy for lengths up to a threshold. Performance measurements were made on a Skylake desktop (Intel(R) Core(TM) i7-6700K CPU @ 4.00 GHz, 32 GB RAM) running Windows 10 Enterprise and Ubuntu 14.04, using a micro-benchmark that exercises multiple buckets of pseudo-random copy sizes from 0 to 4 KB (details below).

Performance improvements with the optimization:

  • 2% to 73% for x64 Windows
  • -4% to 73% for x86 Windows
  • 0% to 70% for x64 Ubuntu

The charts below show the performance speedup:

[chart image]

[chart image]

The table below lists absolute clocks, in ticks, for one million copy operations:

[table image]

Here's the pseudo-code for the micro-benchmark:

// 10 buckets of copy sizes within specified ranges: [0,8], [9,16], [17,32], ..., [2049,4096]
for (int i = 0; i < MAX_BUCKETS; i++)
{
    Random rnd = new Random(1);
    for (int j = 0; j < COPYLEN; j++)
    {
        // Randomize copy length within the bucket's bounds
        copyLen[j] = (ushort)rnd.Next(0, 9); // range changes with bucket
    }

    begin = DateTime.Now.Ticks;

    for (int k = 0; k < 1000000; k++)
    {
        for (count = 0; count < COPYLEN; count++)
        {
            Buffer.MemoryCopy(pSource, pTarget, 8, copyLen[count]); // destination max size changes with bucket
        }
    }

    time = DateTime.Now.Ticks - begin; // time reported for this bucket
}

@vkvenkat (Author):

@jkotas PTAL

PInvoke:
#if AMD64 && !PLATFORM_UNIX
FCopy:
_MemoryCopy(dest, src, len);
@jkotas (Member), Feb 25, 2017:

This should go through PInvoke like before so that it does not starve GC for large copies.

(Author):

Sure, will do. Buffer::BlockCopy (JIT_MemCpy(dstPtr, srcPtr, count);) uses a similar FCALL to JIT_MemCpy as well.

@jkotas (Member), Feb 25, 2017:

Yes, we have places like the use of memcpy in Buffer::BlockCopy that can starve the GC and lead to long GC pause times. We are trying not to introduce more of them, and to fix the ones that are there.
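
For readers following the GC point: a long copy done inside an FCALL keeps the thread in cooperative mode, so the GC cannot suspend it until the copy finishes, whereas a P/Invoke transition leaves the thread preemptive for the duration of the native call. A minimal sketch of the shape of that dispatch; the threshold value, NativeMemmove, and the library binding are illustrative assumptions, not the actual corelib entry points:

using System.Runtime.InteropServices;

static unsafe class MemmoveSketch
{
    const uint CopyThreshold = 2048; // assumed value, for the sketch only

    public static void Memmove(byte* dest, byte* src, nuint len)
    {
        if (len > CopyThreshold)
        {
            // P/Invoke: the thread is GC-preemptive while the native copy runs,
            // so a large copy cannot delay a GC suspension.
            NativeMemmove(dest, src, len);
            return;
        }

        // Managed fast path for small copies (the real code uses wide unrolled moves).
        for (nuint i = 0; i < len; i++)
        {
            *dest++ = *src++;
        }
    }

    [DllImport("NativeCopyLib", EntryPoint = "memmove")] // hypothetical binding
    static extern void* NativeMemmove(byte* dest, byte* src, nuint len);
}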

// check can produce false positives for lengths greater than Int32.MaxInt. But, this is fine because we want to FCALL/PInvoke
// for large lengths anyways.

if ((len > CopyThreshold) || ((nuint)dest < (nuint)srcEnd))
(Member):

The (nuint)dest < (nuint)srcEnd check does not look right. It means we will hit the slow path any time the destination is lower than the source.

(Member):

Why is it profitable to check for the CopyThreshold here, and not later? It adds overhead for small copies.

(Author):

Had re-purposed the if ((nuint)dest - (nuint)src < len) goto PInvoke overlap check from the original. Shall think of an alternative for this.

private struct Block16 { }

[StructLayout(LayoutKind.Sequential, Size = 64)]
private struct Block64 { }
(Member):

Does it work well / will it work well on ARM64?

(Author):

Shall special-case the usage of custom structs to AMD64 @jamesqo

(Comment):

x86 CoreCLR switched to RyuJIT, so there shouldn't be any differences between x86 and x64 here; both should use Block64.

I don't know what ARM64 does, but it would make sense for it to use the best available method for moving a 64-byte block; that should be the same as or better than moving 4 bytes at a time.

It may be better to handle this by adding some specific define at the top of the file so it's easy to change:

#if AMD64 || X86
#define HAS_BLOCK64
#endif

(Member):

BTW: Have you done the perf testing with a crossgened corelib (corelib is always crossgened in shipping configs)? I do not think the 64-byte blocks will make a difference for a crossgened corelib because it is limited to SSE2; it should not hurt though.

(Author):

@jkotas Yes, all of the results posted are with crossgened corelib

(Comment):

@vkvenkat Great work optimizing this. I didn't have much time to look over the code, but one thing I noticed-- why are there Block16 / Block64 structs but no Block32?

(Comment):

Lol, I just realized you made that comment a year ago. Still asking if the code is still there, though.

@jkotas (Member) commented Feb 25, 2017

cc @jamesqo

@jamesqo left a comment:

Nice changes! I have my own PR to modify MemoryCopy that I haven't gotten around to finishing yet: #9143. If this is merged, I will gladly close mine in favor of yours, however.

#endif // AMD64

// FCALL/PInvoke to the native version when the copy length exceeds the threshold or when the buffers are overlapping. This
// check can produce false positives for lengths greater than Int32.MaxInt. But, this is fine because we want to FCALL/PInvoke
(Comment):

Nit: int.MaxValue

if (len > 64) goto MCPY05;

MCPY00:
// len > 16 && len <= 64
(Comment):

Nit: Could be a Debug.Assert instead of a comment (same for other comments like this)


MCPY04:
// len < 4
if ((len & len) == 0) return;
@jamesqo, Feb 25, 2017:

len & len is always going to be len.

return;

MCPY02:
if ((len & 24) == 0) goto MCPY03;
(Comment):

Interesting that replacing the switch table with all of these branches results in better perf.

(Comment):

The switch statement uses a table in memory, and memory is slow. Second, it will pollute your data cache.
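
To make the comparison concrete, here is a rough, self-contained sketch (not the PR's exact code) of the two dispatch styles for a small tail copy; the dense switch typically compiles to an indirect jump through a table in memory, while the bit-test version is a few predictable compares plus end-relative writes:

static unsafe class TailCopyDemo
{
    // Jump-table style: one case per remaining length (0-7 bytes).
    public static void TailCopySwitch(byte* dest, byte* src, int len)
    {
        switch (len)
        {
            case 7: dest[6] = src[6]; goto case 6;
            case 6: dest[5] = src[5]; goto case 5;
            case 5: dest[4] = src[4]; goto case 4;
            case 4: *(int*)dest = *(int*)src; break;
            case 3: dest[2] = src[2]; goto case 2;
            case 2: *(short*)dest = *(short*)src; break;
            case 1: dest[0] = src[0]; break;
        }
    }

    // Branch style: test length bits; overlapping end-relative writes remove the
    // need for a distinct case per length. Requires 0 <= len < 8.
    public static void TailCopyBranches(byte* dest, byte* src, int len)
    {
        if ((len & 4) != 0)
        {
            *(int*)dest = *(int*)src;
            *(int*)(dest + len - 4) = *(int*)(src + len - 4);
            return;
        }
        if ((len & 2) != 0)
        {
            *(short*)dest = *(short*)src;
            *(short*)(dest + len - 2) = *(short*)(src + len - 2);
            return;
        }
        if ((len & 1) != 0)
            dest[0] = src[0];
    }
}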

*(int*)(destEnd - 8) = *(int*)(srcEnd - 8);
*(int*)(destEnd - 4) = *(int*)(srcEnd - 4);
#endif
return;
@jamesqo, Feb 25, 2017:

There are now 2/4 writes being incurred here instead of 1/2 for len == 8. I scanned your test code and it seems to indicate that for random numbers in [0, 8] the change is a net win, but have you tried testing consistently with len == 8?

(Author):

Seeing a 10% perf improvement with our version for a million copies with len == 8 on x64 Windows. Looks like eliminating the switch table led to this, despite the extra write.

// len >= 4 && len < 8
*(int*)dest = *(int*)src;
*(int*)(destEnd - 4) = *(int*)(srcEnd - 4);
return;
(Comment):

Same here, we're incurring an extra write for len == 4.

(Author):

Saw an 8% perf drop for len == 4 for a million copies.

#if BIT64
*(long*)(dest + i) = *(long*)(src + i);
*(long*)(dest + i + 8) = *(long*)(src + i + 8);
*(Block64*)dest = *(Block64*)src;
(Comment):

SIMD is not enabled for ARM64 yet, I think. I think all of the Block* usages should be under AMD64 actually; I believe RyuJIT is the default JIT on x86 now, so AMD64 || (BIT32 && !ARM)? /cc @jkotas
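
For context, the block-struct idea under discussion reduces to the following; a minimal sketch assuming the JIT lowers the struct assignment to the widest moves the target supports (SIMD under RyuJIT on x86/x64, ordinary integer moves elsewhere):

using System.Runtime.InteropServices;

// Empty structs whose only meaningful property is their Size: dereferencing a
// Block64* copies 64 bytes in a single struct assignment.
[StructLayout(LayoutKind.Sequential, Size = 16)]
struct Block16 { }

[StructLayout(LayoutKind.Sequential, Size = 64)]
struct Block64 { }

static unsafe class BlockCopyDemo
{
    public static void Copy64(byte* dest, byte* src)
    {
        *(Block64*)dest = *(Block64*)src; // one 64-byte block copy
    }
}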

*(int*)(dest + 48) = *(int*)(src + 48);
*(int*)(dest + 52) = *(int*)(src + 52);
*(int*)(dest + 56) = *(int*)(src + 56);
*(int*)(dest + 60) = *(int*)(src + 60);
(Comment):

Is it beneficial to unroll the loop this much? In my experiments last year I recall 64 being slightly worse than doing chunks of 48.

(Author):

Yes, 64 performs better compared to 48. Seeing a regression with 48 for all 3 copy size buckets involved here.

n--;
if (n != 0) goto MCPY06;

len = len & 0x3f;
(Comment):

Nit: len &= 0x3f

(Comment):

Nit: Also might be better to just write %= 64 to make it more understandable for others.

@jamesqo commented Feb 25, 2017

/cc @benaadams @Anderman

@DrewScoggins (Member):

test Windows_NT_X64 perf
test Ubuntu14.04_x64 perf

@DrewScoggins (Member):

test Ubuntu14.04 perf

@vkvenkat (Author):

@jkotas @jamesqo Addressed all of the CR suggestions and pushed my latest. PTAL.

*(int*)(dest + 4) = *(int*)(src + 4);
*(int*)(dest + 8) = *(int*)(src + 8);
*(int*)(dest + 12) = *(int*)(src + 12);
*(int*)dest = *(int*)src;
(Member):

These should be done as long for BIT64


switch (len)
// PInvoke to the native version when the buffers are overlapping.
if ((((nuint)dest <= (nuint)src) && ((nuint)destEnd > (nuint)src)) ||
(Member):

Why are you not using the original condition if ((nuint)dest - (nuint)src < len) goto PInvoke; ?

(Author):

src + len needs to be re-computed in quite a few places, so I created srcEnd to avoid this. Had optimized the original condition to if ((nuint)dest < (nuint)srcEnd) and used it in the PR submitted last week. Revised it in favor of a more thorough overlap check as suggested in one of your CR comments.

@jkotas (Member), Feb 28, 2017:

Had optimized the original condition to if ((nuint)dest < (nuint)srcEnd)

This optimization was incorrect. The original condition had a subtle dependency on integer overflow; rearranging the left and right sides broke it.

Revised it in favor of a more thorough overlap check

I do not think we need a more thorough overlap check here. We just need the original one that had a single compare and jump (unless there is some non-obvious reason why the one with two compares is faster...).
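
A short worked example of that overflow dependency (illustrative, not corelib code): with unsigned nuint arithmetic, the single compare acts as a range check.

static unsafe class OverlapCheckDemo
{
    // Original form: if ((nuint)dest - (nuint)src < len) goto PInvoke;
    //
    // Case 1: dest ahead of src, inside the source range, e.g.
    //   src = 0x1000, dest = 0x1004, len = 16
    //   dest - src = 4, and 4 < 16        -> slow path (a forward copy would clobber src)
    //
    // Case 2: dest below src, e.g.
    //   src = 0x1004, dest = 0x1000, len = 16
    //   dest - src wraps to 0xFFFF...FFFC -> not < 16, fast path (safe for a forward copy)
    //
    // Rearranging to (nuint)dest < (nuint)srcEnd, i.e. dest < src + len, loses the
    // wraparound, so Case 2 is also sent to the slow path: every destination below
    // the source gets flagged, which is the regression described above.
    public static bool OriginalCheck(byte* dest, byte* src, nuint len)
        => (nuint)dest - (nuint)src < len;
}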

*(int*)(dest + i + 4) = *(int*)(src + i + 4);
*(int*)(dest + i + 8) = *(int*)(src + i + 8);
*(int*)(dest + i + 12) = *(int*)(src + i + 12);
*(int*)dest = *(int*)src;
(Member):

use long for BIT64

@vkvenkat changed the title from "Extending optimized JIT helpers to Buffer.MemoryCopy" to "Optimize Buffer.MemoryCopy" on Mar 1, 2017
@jkotas (Member) commented Mar 2, 2017

@dotnet-bot test Windows_NT x64 Release Priority 1 Build and Test please

@jkotas (Member) commented Mar 2, 2017

@vkvenkat Several tests are consistently failing. Could you please take a look?

CoreMangLib._cti_system_text_stringbuilder_StringBuilderReplace4_StringBuilderReplace4_._cti_system_text_stringbuilder_StringBuilderReplace4_StringBuilderReplace4_cmd
CoreMangLib._cti_system_version_VersionToString2_VersionToString2_._cti_system_version_VersionToString2_VersionToString2_cmd
CoreMangLib._cti_system_version_VersionToString1_VersionToString1_._cti_system_version_VersionToString1_VersionToString1_cmd

#if BIT64
*(long*)dest = *(long*)src;
*(long*)(dest + 8) = *(long*)(src + 8);
if (len <= 48) goto MCPY01;
@briansull, Mar 2, 2017:

-- EDIT: I see that the overrun issue is handled by using 'destEnd' and 'srcEnd'.

-- Original comment:
Shouldn't this be goto MCPY02?

Going to MCPY01 will result in copying 16 bytes unconditionally and returning, which will overrun the end of the dest buffer for values less than 48.

(Author):

As you explained below, MCPY01 copies the last 16 bytes of src to dest, and an overrun doesn't occur here. MCPY01 uses destEnd & srcEnd as opposed to dest & src in other places.

*(long*)dest = *(long*)src;
*(long*)(dest + 8) = *(long*)(src + 8);

MCPY01:
@briansull, Mar 2, 2017:

You should add a comment on each label describing what can be expected when we reach here:

MCPY01: // Unconditionally copy the last 16 bytes using destEnd and srcEnd and return.

(Author):

We have debug asserts after labels like Debug.Assert(len > 16 && len <= 64) after MCPY00. Do you prefer switching these to comments?

*(int*)dest = *(int*)src;
*(int*)(dest + 4) = *(int*)(src + 4);
*(int*)(destEnd - 8) = *(int*)(srcEnd - 8);
*(int*)(destEnd - 4) = *(int*)(srcEnd - 4);
@briansull, Mar 2, 2017:

You are using an interesting trick here to prevent the normal overrun that would occur when performing all 8 byte writes when len is not a multiple of 8.

You should explain how this trick works.

Also, MCPY01 and MCPY02 perform similar operations but have a very subtle difference in their input requirements.

I hope that the processor doesn't introduce a stall when len < 8 and we introduce an overlapping write.
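
For reference, the trick in isolation looks roughly like this (a hedged sketch, not the exact corelib code): the final writes are addressed from the end pointers, so for in-between lengths they overlap bytes that were already written instead of branching on the exact remainder, and they never run past destEnd.

static unsafe class TailWriteDemo
{
    // Copy 8..15 bytes with exactly two 8-byte writes.
    // Requires 8 <= len <= 15, destEnd == dest + len, srcEnd == src + len.
    public static void Copy8To15(byte* dest, byte* src, byte* destEnd, byte* srcEnd)
    {
        *(long*)dest = *(long*)src;                   // first 8 bytes
        *(long*)(destEnd - 8) = *(long*)(srcEnd - 8); // last 8 bytes; overlaps the first
                                                      // write by (16 - len) bytes but
                                                      // never writes past destEnd
    }
}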

(Author):

Sure, will do.

@jkotas (Member), Mar 3, 2017:

Hmm, this trick may need a full upfront overlap check to work ... it may explain the failure on Ubuntu.

(Author):

@jkotas Thanks. Investigating the test failures. Shall push a fix for this.

(Author):

Updated the overlap check to if (((nuint)dest - (nuint)src < len) || ((nuint)src - (nuint)dest < len)) goto PInvoke;. This should resolve test failures.

(Author):

@briansull Added comments after labels to explain copy logic.

@vkvenkat (Author) commented Mar 4, 2017

@jkotas Addressed latest CR feedback, fixed test failures & updated results. PTAL

@jkotas (Member) commented Mar 4, 2017

Looks great - thank you!

@jkotas merged commit c6372c5 into dotnet:master on Mar 4, 2017
kouvel added a commit to kouvel/coreclr that referenced this pull request Mar 7, 2017
Fixes #9161

PR dotnet#9786 fixes perf of span copy of types that don't contain references
kouvel added a commit to kouvel/coreclr that referenced this pull request Mar 8, 2017
Fixes #9161

PR dotnet#9786 fixes perf of span copy of types that don't contain references
kouvel added a commit that referenced this pull request Mar 9, 2017
Improve span copy of pointers and structs containing pointers

Fixes #9161

PR #9786 fixes perf of span copy of types that don't contain references
jorive pushed a commit to guhuro/coreclr that referenced this pull request May 4, 2017

Improve span copy of pointers and structs containing pointers (#9999)

Fixes #9161

PR dotnet#9786 fixes perf of span copy of types that don't contain references
sergiy-k added a commit to sergiy-k/corert that referenced this pull request May 8, 2017
sergiy-k added a commit to sergiy-k/corert that referenced this pull request May 9, 2017
sergiy-k added a commit to dotnet/corert that referenced this pull request May 9, 2017
Port optimization in Buffer.Memmove from CoreCLR

This is just a copy of changes made in dotnet/coreclr#9786

* Code review feedback
@karelz modified the milestone: 2.0.0 on Aug 28, 2017
@glenn-slayden:

I posted a comment for c6372c5 here, but I don't think that was the right place; I apologize for repeating the remarks here...


Just curious why all these improvements to Buffer.Memmove (plus more recent work) aren't deployed in the IL cpblk instruction instead? It appears that this commit c6372c5 removes a comment to the effect that cpblk is where all this effort naturally belongs (that is, in case the proliferation of conditional #if code here wasn't already somewhat suggestive), and I agree.

I know everyone wants there to be one central nexus for obtaining an optimal memmove, with proper alignment, chunking, tuning, and everything else done right; shouldn't cpblk be that place?

I also note that this commit removes all address alignment behavior. Now while it's true that the alignment code in ea9bee5 was a bit meek in only considering the destination address--at the possible expense of source alignment:

if (((int)dest & 3) != 0)
{
    if (((int)dest & 1) != 0)
    {
        *(dest + i) = *(src + i);
        i += 1;
        if (((int)dest & 2) != 0)
            goto IntAligned;
    }
    *(short*)(dest + i) = *(short*)(src + i);
    i += 2;
}

IntAligned:

#if BIT64
// On 64-bit IntPtr.Size == 8, so we want to advance to the next 8-aligned address. If
// (int)dest % 8 is 0, 5, 6, or 7, we will already have advanced by 0, 3, 2, or 1
// bytes to the next aligned address (respectively), so do nothing. On the other hand,
// if it is 1, 2, 3, or 4 we will want to copy-and-advance another 4 bytes until
// we're aligned.
// The thing 1, 2, 3, and 4 have in common that the others don't is that if you
// subtract one from them, their 3rd lsb will not be set. Hence, the below check.
if ((((int)dest - 1) & 4) == 0)
{
    *(int*)(dest + i) = *(int*)(src + i);
    i += 4;
}
#endif // BIT64

...at least this was something, and a reasonably motivated approach, since misaligned store penalties are typically higher than for reads.

But in this commit there's neither. Did the experiments in #9786 empirically determine that physically aligning these operations is always a waste of time? For severe misalignment between source/target, it would be possible to write a split-access loop, albeit complex, that marshals quad-aligned reads to quad-aligned writes. Was this type of thing evaluated before taking the decision to abandon all efforts at alignment?

Finally, note that by insisting on alignment to the destination address exclusively, the alignment code (that was removed in this commit, see above) may have been ruining a naturally ideal source alignment, with the overall result of making things worse. This could have unfairly penalized test results for the existing code, when it would be a trivial matter to only do that adjustment when src/dest are jointly aligned to some appropriate level, e.g., wrap the above code with:

if ((((int)src ^ (int)dest) & 0x7) == 0)        // (or some other alignment value)
{
    // ...
}

As far as I can tell, there has never been code in buffer.cs that considers the source and destination physical alignment jointly, in order to then proceed sensibly on the basis of 3 broadly defined cases (a rough sketch follows the list):

  • Input src/dst physical addresses already jointly aligned (or len less than max align size);
  • Joint alignment can be improved (to some degree) prior to starting the gang-operation (code removed in c6372c5)
  • Perverse src/dst alignment detected; proceed as-is, or, if worthwhile, use split-access loop to fully remediate.
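
A rough sketch of that classification, assuming an 8-byte target alignment (the names and the three-way split are illustrative, not from the PR):

static unsafe class AlignmentDemo
{
    public enum JointAlignment
    {
        AlreadyAligned,     // both pointers already 8-byte aligned
        CanBeJointlyFixed,  // same offset within an 8-byte word: one prologue aligns both
        Perverse            // different offsets: aligning one necessarily misaligns the other
    }

    public static JointAlignment Classify(byte* src, byte* dest)
    {
        int srcMod  = (int)src  & 7;
        int destMod = (int)dest & 7;

        if (srcMod == 0 && destMod == 0)
            return JointAlignment.AlreadyAligned;

        // Equivalent to the (((int)src ^ (int)dest) & 0x7) == 0 test quoted above.
        if (srcMod == destMod)
            return JointAlignment.CanBeJointlyFixed;

        return JointAlignment.Perverse;
    }
}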

@glenn-slayden:

Commit c6372c5 makes a change to the detection of overlap between the source and destination in Buffer.Memmove:

if ((nuint)dest - (nuint)src < len) goto PInvoke;

...becomes...

if (((nuint)dest - (nuint)src < len) || ((nuint)src - (nuint)dest < len)) goto PInvoke;

I must admit I puzzled over the original code since the unsigned subtraction and compare "incorrectly" fails to identify source/destination overlap in the case where the destination (address) precedes the source (address) in memory. Indeed c6372c5 clarifies this and would also seem to fix a bug, but how did the original work anyway?

Then I realized of course that the answer is that the original code is stealthily incorporating a strong assumption about--and dependency upon--the copy direction of the managed Buffer.Memmove implementation which follows it.

The unfortunate upshot is that, while c6372c5 certainly makes the code clearer and more readable, it now excludes all overlap cases and thus no longer allows the performance benefit of fully-managed operation (i.e., non-P/Invoke) in the 50% of cases where the overlap happens to be compatible with the copy direction implemented in Buffer.cs. I blame the original code for not documenting its entirely non-obvious design and/or operation.

Anyway, can we revisit this code to restore the optimization that was lost here? Namely, once again allow the managed code path to prevail (still subject to overall size considerations, etc.) instead of P/Invoke for the half of the detected overlaps which comport with the copy direction that's hard-coded into Buffer.Memmove?
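
To spell out the direction dependence, a minimal sketch (illustrative only; the real fast path copies in larger blocks, but the same ordering argument applies):

static unsafe class ForwardCopyDemo
{
    // When dest < src, dest + i < src + i for every i, so each source byte is read
    // before any write from this loop can reach it: the overlap is benign. When dest
    // lies inside [src, src + len), the same ascending loop would overwrite source
    // bytes before reading them, which is the half of the overlap cases that must go
    // to the native memmove.
    public static void ForwardCopy(byte* dest, byte* src, int len)
    {
        for (int i = 0; i < len; i++)
            dest[i] = src[i];
    }
}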
